CUDA Error: An Illegal Memory Access Was Encountered - Part 1 ...
Could you confirm the PyTorch version?
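To answer a version question like the one above, a short script can collect the relevant details in one place. This is a minimal sketch, not an official diagnostic tool: the function name `environment_report` and the dictionary keys are made up for illustration, and it assumes only that PyTorch, if installed, is importable as `torch`.

```python
import importlib.util
import platform


def environment_report():
    """Collect version details that are useful in a CUDA bug report.

    Hypothetical helper: the name and the keys are illustrative only.
    """
    report = {"python": platform.python_version()}

    # Avoid a hard failure when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        report["torch"] = None
        return report

    import torch

    report["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["cuda_build"] = torch.version.cuda
    # cuDNN version as an integer, or None if cuDNN is unavailable.
    report["cudnn"] = torch.backends.cudnn.version()
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(torch.cuda.current_device())
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

The driver version is not covered here; it is usually read from `nvidia-smi` instead.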
Some of the suggested workarounds were (a) lowering the batch size and (b) selecting a specific GPU with torch.cuda.set_device(1).
This seems like a PyTorch issue. It is not clear to me that the issue is completely fixed.
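The two workarounds mentioned above can be sketched as follows. This is only an illustration under stated assumptions: `select_device` and `run_in_chunks` are hypothetical helper names, the device index 1 comes from the snippet above, and splitting a batch into chunks is just one way to lower the number of samples per forward call. Neither workaround addresses the underlying cause.

```python
import torch


def select_device(index=1):
    """Pin work to one GPU if it exists, otherwise fall back to the CPU.

    Hypothetical helper; index=1 mirrors torch.cuda.set_device(1) above.
    """
    if torch.cuda.is_available() and index < torch.cuda.device_count():
        torch.cuda.set_device(index)
        return torch.device("cuda", index)
    return torch.device("cpu")


def run_in_chunks(model, batch, chunk_size):
    """Run the model on smaller slices of the batch and rejoin the outputs.

    Hypothetical helper: lowers the per-call batch size without changing
    the total number of samples processed.
    """
    outputs = [model(chunk) for chunk in torch.split(batch, chunk_size, dim=0)]
    return torch.cat(outputs, dim=0)


if __name__ == "__main__":
    device = select_device(1)
    model = torch.nn.Linear(4, 2).to(device)
    batch = torch.randn(32, 4, device=device)
    print(run_in_chunks(model, batch, chunk_size=8).shape)
```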
RuntimeError: CUDA error: an illegal memory access was encountered
Opened 07:55AM - 15 Jun 19 UTC, closed 06:01PM - 02 Oct 20 UTC

Hi, everyone! I ran into a strange illegal memory access error. It happens randomly, without any regular pattern. The code is really simple. It is PointNet for point cloud segmentation. I don't think there is anything wrong in the code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import os

class InstanceSeg(nn.Module):
    def __init__(self, num_points=1024):
        super(InstanceSeg, self).__init__()

        self.num_points = num_points

        self.conv1 = nn.Conv1d(9, 64, 1)
        self.conv2 = nn.Conv1d(64, 64, 1)
        self.conv3 = nn.Conv1d(64, 64, 1)
        self.conv4 = nn.Conv1d(64, 128, 1)
        self.conv5 = nn.Conv1d(128, 1024, 1)
        self.conv6 = nn.Conv1d(1088, 512, 1)
        self.conv7 = nn.Conv1d(512, 256, 1)
        self.conv8 = nn.Conv1d(256, 128, 1)
        self.conv9 = nn.Conv1d(128, 128, 1)
        self.conv10 = nn.Conv1d(128, 2, 1)
        self.max_pool = nn.MaxPool1d(num_points)

    def forward(self, x):
        batch_size = x.size()[0] # (x has shape (batch_size, 9, num_points))

        out = F.relu(self.conv1(x)) # (shape: (batch_size, 64, num_points))
        out = F.relu(self.conv2(out)) # (shape: (batch_size, 64, num_points))
        point_features = out

        out = F.relu(self.conv3(out)) # (shape: (batch_size, 64, num_points))
        out = F.relu(self.conv4(out)) # (shape: (batch_size, 128, num_points))
        out = F.relu(self.conv5(out)) # (shape: (batch_size, 1024, num_points))
        global_feature = self.max_pool(out) # (shape: (batch_size, 1024, 1))

        global_feature_repeated = global_feature.repeat(1, 1, self.num_points) # (shape: (batch_size, 1024, num_points))
        out = torch.cat([global_feature_repeated, point_features], 1) # (shape: (batch_size, 1024+64=1088, num_points))

        out = F.relu(self.conv6(out)) # (shape: (batch_size, 512, num_points))
        out = F.relu(self.conv7(out)) # (shape: (batch_size, 256, num_points))
        out = F.relu(self.conv8(out)) # (shape: (batch_size, 128, num_points))
        out = F.relu(self.conv9(out)) # (shape: (batch_size, 128, num_points))

        out = self.conv10(out) # (shape: (batch_size, 2, num_points))

        out = out.transpose(2,1).contiguous() # (shape: (batch_size, num_points, 2))
        out = F.log_softmax(out.view(-1, 2), dim=1) # (shape: (batch_size*num_points, 2))
        out = out.view(batch_size, self.num_points, 2) # (shape: (batch_size, num_points, 2))

        return out

Num = 0
network = InstanceSeg()
network.cuda()
while(1):
    input0 = torch.randn(32, 3, 1024).cuda()
    input1 = torch.randn(32, 3, 1024).cuda()
    input2 = torch.randn(32, 3, 1024).cuda()
    input = torch.cat((input0, input1, input2), 1)
    out = network(input)
    Num = Num+1
    print(Num)
```

After a random number of steps, the error is raised. The error report is:

```
Traceback (most recent call last):
  File "/home/wangye/Frustum-PointNet_Test/frustum_pointnet.py", line 58, in <module>
    input0 = torch.randn(32, 3, 1024).cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
```

When I added "os.environ['CUDA_LAUNCH_BLOCKING'] = '1'" at the top of this script, the error report changed to this:

```
Traceback (most recent call last):
  File "/home/wangye/Frustum-PointNet_Test/frustum_pointnet.py", line 64, in <module>
    out = network(input)
  File "/home/wangye/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wangye/Frustum-PointNet_Test/frustum_pointnet.py", line 35, in forward
    out = F.relu(self.conv5(out)) # (shape: (batch_size, 1024, num_points))
  File "/home/wangye/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wangye/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 187, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
```

I know that some incorrect indexing operations and incorrect use of loss functions can lead to an illegal memory access error. But this script contains no such operations.

I am quite sure this error is not caused by running out of memory, since only about 2G of GPU memory is used and I have 12G of GPU memory in total.

This is my environment information:

```
OS: Ubuntu 16.04 LTS 64-bit
Command: conda install pytorch torchvision cudatoolkit=9.0 -c pytorch
GPU: Titan XP
Driver Version: 410.93
Python Version: 3.6
cuda Version: cuda_9.0.176_384.81_linux
cudnn Version: cudnn-9.0-linux-x64-v7.4.2.24
pytorch Version: pytorch-1.0.1-py3.6_cuda9.0.176_cudnn7.4.2_2
```

I have been stuck here for a long time. In fact, this is not the only project that hits this error; many other projects hit similar errors on my computer. I don't think there is anything wrong in the code, since it runs correctly for some steps. Maybe the error is caused by the environment. I am not sure.

Does anyone have any idea about this situation? If more detailed information is needed, please let me know. Thanks for any suggestion.
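The two tracebacks in the issue differ because CUDA kernels run asynchronously: without CUDA_LAUNCH_BLOCKING the error is reported at whichever later call happens to notice it, while with it the failing call itself is reported. A minimal sketch of that debugging setup is shown below. It assumes the variable is set before CUDA is initialized (setting it before importing torch is the cautious choice), and `checked_step` is a hypothetical helper name, not part of PyTorch.

```python
import os

# Set before CUDA is initialized so kernel launches become synchronous.
# This slows execution down and is meant for debugging only.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch


def checked_step(step_fn, *args):
    """Run one step, then wait for the GPU so errors surface at this call.

    Hypothetical helper: torch.cuda.synchronize() blocks until all queued
    kernels finish, so an asynchronous failure is raised here rather than
    at some unrelated later line.
    """
    result = step_fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return result


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    layer = torch.nn.Conv1d(9, 64, 1).to(device)
    x = torch.randn(32, 9, 1024, device=device)
    print(checked_step(layer, x).shape)
```

Wrapping each training or inference step this way narrows down which step fails, even when the environment variable cannot be set early enough.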