Illegal Memory Access Problem CUDA - GPU - Julia Discourse
I am creating some dynamic shared memory boolean arrays in a kernel, and it consistently gives me

```
ERROR: LoadError: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA C:\Users\1\.julia\packages\CUDA\9T5Sq\lib\cudadrv\error.jl:105
  [2] query
    @ C:\Users\1\.julia\packages\CUDA\9T5Sq\lib\cudadrv\stream.jl:102 [inlined]
  [3] synchronize(stream::CuStream; blocking::Bool)
    @ CUDA C:\Users\1\.julia\packages\CUDA\9T5Sq\lib\cudadrv\stream.jl:130
  [4] synchronize (repeats 2 times)
    @ C:\Users\1\.julia\packages\CUDA\9T5Sq\lib\cudadrv\stream.jl:117 [inlined]
  [5] unsafe_copyto!(dest::Vector{UInt16}, doffs::Int64, src::CuArray{UInt16, 1, CUDA.Mem.DeviceBuffer}, soffs::Int64, n::Int64)
    @ CUDA C:\Users\1\.julia\packages\CUDA\9T5Sq\src\array.jl:389
  [6] copyto!
    @ C:\Users\1\.julia\packages\CUDA\9T5Sq\src\array.jl:349 [inlined]
  [7] getindex(xs::CuArray{UInt16, 1, CUDA.Mem.DeviceBuffer}, I::Int64)
    @ GPUArrays C:\Users\1\.julia\packages\GPUArrays\3sW6s\src\host\indexing.jl:89
  [8] top-level scope
    @ c:\GitHub\GitHub\NuclearMedEval\src\playgrounds\convolutionsPlay.jl:71
```

I am wondering what is wrong here. My assumption is that the true size of a boolean array in bytes is the number of its entries divided by 8, i.e. 1 bit per entry; am I correct?
```julia
using CUDA

dataBdim = (32, 24, 32)
fp = CUDA.zeros(UInt16, 1)
sumInBits = (dataBdim[1]+2)+(dataBdim[2]+2)+(dataBdim[3]+2)+dataBdim[1]+dataBdim[2]+dataBdim[3]
shmemSum = cld(sumInBits, 8) # in bytes

function testKernelA(dataBdim, fp)
    resShmem = @cuDynamicSharedMem(Bool, ((dataBdim[1]+2), (dataBdim[2]+2), (dataBdim[3]+2)))
    sourceShmem = @cuDynamicSharedMem(Bool, (dataBdim[1], dataBdim[2], dataBdim[3]))
    # naive loop just for presentation of the problem
    for i in 1:(dataBdim[1]+2), j in 1:(dataBdim[2]+2), n in 1:(dataBdim[3]+2)
        resShmem[i, j, n] = false
    end
    for i in 1:dataBdim[1], j in 1:dataBdim[2], n in 1:dataBdim[3]
        sourceShmem[i, j, n] = false
    end
    fp[1] = 1
    return
end

@cuda threads=(32,5) blocks=(2) shmem=shmemSum testKernelA(dataBdim, fp)
fp[1]
```

maleadt November 22, 2021, 7:04am

> sumInBits = (dataBdim[1]+2)+(dataBdim[2]+2)+(dataBdim[3]+2)+dataBdim[1]+dataBdim[2]+dataBdim[3]
How is this "in bits" when you're not multiplying by sizeof(UInt16) anywhere (or 8*sizeof if you actually want this size to be in bits)?
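In other words, the `shmem` argument to `@cuda` is a byte count, so the element count has to be scaled by the element size. A minimal illustration of the arithmetic (a flat `UInt16` buffer is assumed here just for the example):

```julia
n = 32 * 24 * 32                  # number of elements in the buffer
shmemBytes = n * sizeof(UInt16)   # bytes of dynamic shared memory to request
```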
Jakub_Mitura November 22, 2021, 6:07pm

You are right; still, changing it to

```julia
sumInBits = (dataBdim[1]+2)*(dataBdim[2]+2)*(dataBdim[3]+2) + dataBdim[1]*dataBdim[2]*dataBdim[3]
```

does not solve the problem. But is sizeof needed if this is a boolean array? Is it not just a BitArray?
maleadt November 23, 2021, 6:55am

There's still no sizeof in that expression? And it is needed: CuArray{Bool} doesn't have the same BitArray-like optimization implemented.
Also, try out CUDA.jl#master; there the dynamic shared memory accesses are bounds checked, so they will throw a BoundsError instead of crashing CUDA with an illegal memory access.
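Putting both fixes together (element counts as products, scaled by the element size), a corrected version of the original snippet might look like the sketch below. This is untested here: it assumes a Bool occupies one byte in device arrays, and that `@cuDynamicSharedMem` accepts a byte offset as a third argument so the second view starts after the first rather than aliasing the beginning of the shared-memory block (the two views in the original both started at offset 0).

```julia
using CUDA

dataBdim = (32, 24, 32)
fp = CUDA.zeros(UInt16, 1)

nRes    = (dataBdim[1] + 2) * (dataBdim[2] + 2) * (dataBdim[3] + 2)
nSource = prod(dataBdim)
shmemBytes = (nRes + nSource) * sizeof(Bool)   # sizeof(Bool) == 1 byte

function testKernelA(dataBdim, fp)
    resShmem = @cuDynamicSharedMem(Bool, ((dataBdim[1] + 2), (dataBdim[2] + 2), (dataBdim[3] + 2)))
    # start the second view after the first one in the shared-memory block
    offset = (dataBdim[1] + 2) * (dataBdim[2] + 2) * (dataBdim[3] + 2) * sizeof(Bool)
    sourceShmem = @cuDynamicSharedMem(Bool, dataBdim, offset)
    for i in 1:(dataBdim[1] + 2), j in 1:(dataBdim[2] + 2), n in 1:(dataBdim[3] + 2)
        resShmem[i, j, n] = false
    end
    for i in 1:dataBdim[1], j in 1:dataBdim[2], n in 1:dataBdim[3]
        sourceShmem[i, j, n] = false
    end
    fp[1] = 1
    return
end

@cuda threads=(32,5) blocks=2 shmem=shmemBytes testKernelA(dataBdim, fp)
```

On current CUDA.jl the function form `CuDynamicSharedArray(T, dims, offset)` replaces the macro, with the same arguments.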
Jakub_Mitura November 23, 2021, 9:23am

OK, so can I use a bit type in shared memory?

And what do you mean by master? I suppose you deduced that I am using some branch, which was not intended by me. I had found somewhere that the shared memory initialization macro should now be a function; is this what you mean?

Thanks!
carstenbauer November 23, 2021, 10:01am

> And what do you mean by master
The master branch on GitHub, i.e. `] add CUDA#master`.
You’re probably on the latest stable release (if you didn’t do anything fancy).
1 Like

Jakub_Mitura November 23, 2021, 10:09am

OK, thanks
Jakub_Mitura November 24, 2021, 6:00am

So I understand it now, I suppose. Still, is there a way to use a 3-dimensional bit array in shared memory? It would be extremely useful.
No, the BitArray optimization has not been implemented for CuArray. Just use a regular Bool array. If space is a problem, you’ll need to look into implementing BitArray’s packed layout.
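For completeness, packing Bools into machine words by hand looks roughly like the sketch below. These helpers are hypothetical (not part of CUDA.jl or Base), and they are not thread-safe: inside a kernel, threads writing different bits of the same word would race, so the read-modify-write in `setbit!` would need `CUDA.atomic_or!` / `CUDA.atomic_and!` instead.

```julia
# Hypothetical BitArray-style packing over a buffer of UInt32 words.
# Linear, 1-based indices; 32 bits per word.

# word index and bit position within the word for linear index i
@inline bitpos(i) = (((i - 1) >> 5) + 1, (i - 1) & 31)

# read bit i
@inline function getbit(words, i)
    w, b = bitpos(i)
    ((words[w] >> b) & one(UInt32)) == one(UInt32)
end

# write bit i (NOT thread-safe: in a kernel, replace the |=/&= with
# CUDA.atomic_or! / CUDA.atomic_and! on the word)
@inline function setbit!(words, i, v::Bool)
    w, b = bitpos(i)
    if v
        words[w] |= one(UInt32) << b
    else
        words[w] &= ~(one(UInt32) << b)
    end
    return nothing
end
```

With this layout the shared-memory request shrinks to `cld(nbits, 32) * sizeof(UInt32)` bytes, at the cost of the extra shift-and-mask work on every access.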
1 Like