Vulkan Memory Allocator: Recommended Usage Patterns
Vulkan gives great flexibility in memory allocation. This chapter shows the most common patterns.
See also slides from talk: Sawicki, Adam. Advanced Graphics Techniques Tutorial: Memory management in Vulkan and DX12. Game Developers Conference, 2018
GPU-only resource
When: Any resources that you frequently write and read on GPU, e.g. images used as color attachments (aka "render targets"), depth-stencil attachments, images/buffers used as storage image/buffer (aka "Unordered Access View (UAV)").
What to do: Let the library select the optimal memory type, which will likely have VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT.
    VkImageCreateInfo imgCreateInfo = { VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO };
    imgCreateInfo.imageType = VK_IMAGE_TYPE_2D;
    imgCreateInfo.extent.width = 3840;
    imgCreateInfo.extent.height = 2160;
    imgCreateInfo.extent.depth = 1;
    imgCreateInfo.mipLevels = 1;
    imgCreateInfo.arrayLayers = 1;
    imgCreateInfo.format = VK_FORMAT_R8G8B8A8_UNORM;
    imgCreateInfo.tiling = VK_IMAGE_TILING_OPTIMAL;
    imgCreateInfo.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    imgCreateInfo.usage = VK_IMAGE_USAGE_SAMPLED_BIT | VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
    imgCreateInfo.samples = VK_SAMPLE_COUNT_1_BIT;

    VmaAllocationCreateInfo allocCreateInfo = {};
    allocCreateInfo.usage = VMA_MEMORY_USAGE_AUTO;
    allocCreateInfo.flags = VMA_ALLOCATION_CREATE_DEDICATED_MEMORY_BIT;
    allocCreateInfo.priority = 1.0f;

    VkImage img;
    VmaAllocation alloc;
    vmaCreateImage(allocator, &imgCreateInfo, &allocCreateInfo, &img, &alloc, nullptr);

Also consider: Create such resources as dedicated allocations using VMA_ALLOCATION_CREATE_DEDICATED_MEMORY_BIT, especially if they are large or if you plan to destroy and recreate them with different sizes, e.g. when the display resolution changes. Prefer to create such resources first and all other GPU resources (like textures and vertex buffers) later. When the VK_EXT_memory_priority extension is enabled, it is also worth setting a high priority on such allocations to decrease the chance that the operating system evicts them to system memory.
Staging copy for upload
When: A "staging" buffer that you want to map and fill from CPU code, then use as the source of a transfer to some GPU resource.
What to do: Use flag VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT. Let the library select the optimal memory type, which will always have VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT.
    VkBufferCreateInfo bufCreateInfo = { VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO };
    bufCreateInfo.size = 65536;
    bufCreateInfo.usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT;

    VmaAllocationCreateInfo allocCreateInfo = {};
    allocCreateInfo.usage = VMA_MEMORY_USAGE_AUTO;
    allocCreateInfo.flags = VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT |
        VMA_ALLOCATION_CREATE_MAPPED_BIT;

    VkBuffer buf;
    VmaAllocation alloc;
    VmaAllocationInfo allocInfo;
    vmaCreateBuffer(allocator, &bufCreateInfo, &allocCreateInfo, &buf, &alloc, &allocInfo);

    ...

    memcpy(allocInfo.pMappedData, myData, myDataSize);

Also consider: You can map the allocation using vmaMapMemory(), or you can create it as persistently mapped using VMA_ALLOCATION_CREATE_MAPPED_BIT, as in the example above.
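The memcpy() in the staging example assumes that the mapped pointer is valid and that the data fits in the buffer. A minimal sketch of a defensive copy helper follows. MappedBuffer and copyToMapped are hypothetical names introduced here for illustration, not part of VMA, and the sketch deliberately avoids Vulkan headers so it compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a persistently mapped staging buffer:
// `data` plays the role of VmaAllocationInfo::pMappedData and
// `capacity` the role of VkBufferCreateInfo::size.
struct MappedBuffer {
    void*       data;
    std::size_t capacity;
};

// Copies `size` bytes from `src` into the mapped buffer at `offset`.
// Returns false without writing when a pointer is null or the
// requested range does not fit inside the buffer.
inline bool copyToMapped(const MappedBuffer& dst, std::size_t offset,
                         const void* src, std::size_t size)
{
    if (dst.data == nullptr || src == nullptr) return false;
    // Written this way to avoid overflow in `offset + size`.
    if (offset > dst.capacity || size > dst.capacity - offset) return false;
    std::memcpy(static_cast<char*>(dst.data) + offset, src, size);
    return true;
}
```

In real code the two fields would be filled from allocInfo.pMappedData and bufCreateInfo.size after vmaCreateBuffer() succeeds.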
Readback
When: Buffers for data written by or transferred from the GPU that you want to read back on the CPU, e.g. results of some computations.
What to do: Use flag VMA_ALLOCATION_CREATE_HOST_ACCESS_RANDOM_BIT. Let the library select the optimal memory type, which will always have VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_HOST_CACHED_BIT.
    VkBufferCreateInfo bufCreateInfo = { VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO };
    bufCreateInfo.size = 65536;
    bufCreateInfo.usage = VK_BUFFER_USAGE_TRANSFER_DST_BIT;

    VmaAllocationCreateInfo allocCreateInfo = {};
    allocCreateInfo.usage = VMA_MEMORY_USAGE_AUTO;
    allocCreateInfo.flags = VMA_ALLOCATION_CREATE_HOST_ACCESS_RANDOM_BIT |
        VMA_ALLOCATION_CREATE_MAPPED_BIT;

    VkBuffer buf;
    VmaAllocation alloc;
    VmaAllocationInfo allocInfo;
    vmaCreateBuffer(allocator, &bufCreateInfo, &allocCreateInfo, &buf, &alloc, &allocInfo);

    ...

    const float* downloadedData = (const float*)allocInfo.pMappedData;

Advanced data uploading
For resources that you frequently write on CPU via mapped pointer and frequently read on GPU e.g. as a uniform buffer (also called "dynamic"), multiple options are possible:
- Easiest solution is to have one copy of the resource in HOST_VISIBLE memory, even if it means system RAM (not DEVICE_LOCAL) on systems with a discrete graphics card, and make the device reach out to that resource directly.
  - Reads performed by the device will then go through PCI Express bus. The performance of this access may be limited, but it may be fine depending on the size of this resource (whether it is small enough to quickly end up in GPU cache) and the sparsity of access.
- On systems with unified memory (e.g. AMD APU or Intel integrated graphics, mobile chips), a memory type may be available that is both HOST_VISIBLE (available for mapping) and DEVICE_LOCAL (fast to access from the GPU). Then, it is likely the best choice for such type of resource.
- Systems with a discrete graphics card and separate video memory may or may not expose a memory type that is both HOST_VISIBLE and DEVICE_LOCAL, also known as Base Address Register (BAR). If they do, it represents a piece of VRAM (or entire VRAM, if ReBAR is enabled in the motherboard BIOS) that is available to CPU for mapping.
  - Writes performed by the host to that memory go through PCI Express bus. The performance of these writes may be limited, but it may be fine, especially on PCIe 4.0, as long as the rules for using uncached and write-combined memory are followed: only sequential writes and no reads.
- Finally, you may need or prefer to create a separate copy of the resource in DEVICE_LOCAL memory, a separate "staging" copy in HOST_VISIBLE memory and perform an explicit transfer command between them.
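The "only sequential writes and no reads" rule for write-combined memory mentioned in the list can be illustrated with a small sketch: assemble the data in an ordinary CPU-side variable, then write it to the mapped destination in one pass. FrameUniforms and uploadUniforms are hypothetical names with an illustrative layout, and a plain byte vector can stand in for the mapped pointer when trying it out.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical per-frame uniform data; the layout is illustrative only.
struct FrameUniforms {
    float viewProj[16];
    float time;
    float padding[3];
};

// Builds the data in a normal (cached) local variable and then writes it
// to the mapped destination in one sequential pass. The function never
// reads through `mappedDst`, in line with the write-combined rule.
inline void uploadUniforms(void* mappedDst, float time)
{
    FrameUniforms local{};              // zero-initialized scratch copy
    for (int i = 0; i < 4; ++i)
        local.viewProj[i * 5] = 1.0f;   // diagonal of an identity matrix
    local.time = time;
    std::memcpy(mappedDst, &local, sizeof(local));
}
```

By contrast, an update such as `mapped->time += dt` performed directly on the mapped pointer would read from that memory before writing, which is the access pattern the rule warns against.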
Thankfully, VMA offers an aid to create and use such resources in the way optimal for the current Vulkan device. To help the library make the best choice, use flag VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT together with VMA_ALLOCATION_CREATE_HOST_ACCESS_ALLOW_TRANSFER_INSTEAD_BIT. It will then prefer a memory type that is both DEVICE_LOCAL and HOST_VISIBLE (integrated memory or BAR), but if no such memory type is available or allocation from it fails (PC graphics cards have only 256 MB of BAR by default, unless ReBAR is supported and enabled in BIOS), it will fall back to DEVICE_LOCAL memory for fast GPU access. It is then up to you to detect that the allocation ended up in a memory type that is not HOST_VISIBLE, in which case you need to create another "staging" allocation and perform explicit transfers.
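The preference order described in this paragraph can be sketched without any Vulkan calls. The code below is an illustration only, not VMA's actual selection algorithm: chooseUploadPath, Choice and UploadPath are hypothetical names, and the two constants are local stand-ins that use the same numeric values as the corresponding VK_MEMORY_PROPERTY_* bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local stand-ins with the same numeric values as the Vulkan bits,
// so the sketch compiles without Vulkan headers.
constexpr std::uint32_t kDeviceLocal = 0x1; // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr std::uint32_t kHostVisible = 0x2; // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT

enum class UploadPath { DirectWrite, StagingTransfer, None };

struct Choice {
    int        memoryTypeIndex; // -1 when nothing suitable exists
    UploadPath path;
};

// Illustrative preference order only (an assumption, not VMA's code):
// 1) DEVICE_LOCAL + HOST_VISIBLE -> write through a mapped pointer
// 2) DEVICE_LOCAL only           -> needs a staging buffer and a transfer
inline Choice chooseUploadPath(const std::vector<std::uint32_t>& typeFlags)
{
    int deviceLocalOnly = -1;
    for (int i = 0; i < static_cast<int>(typeFlags.size()); ++i) {
        const std::uint32_t f = typeFlags[i];
        if ((f & kDeviceLocal) && (f & kHostVisible))
            return {i, UploadPath::DirectWrite};
        if ((f & kDeviceLocal) && deviceLocalOnly < 0)
            deviceLocalOnly = i;        // remember the first VRAM-only type
    }
    if (deviceLocalOnly >= 0)
        return {deviceLocalOnly, UploadPath::StagingTransfer};
    return {-1, UploadPath::None};
}
```

With VMA itself you do not write this loop; you check the result with vmaGetAllocationMemoryProperties() and branch on VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT, as the library example does.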
    VkBufferCreateInfo bufCreateInfo = { VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO };
    bufCreateInfo.size = 65536;
    bufCreateInfo.usage = VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT;

    VmaAllocationCreateInfo allocCreateInfo = {};
    allocCreateInfo.usage = VMA_MEMORY_USAGE_AUTO;
    allocCreateInfo.flags = VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT |
        VMA_ALLOCATION_CREATE_HOST_ACCESS_ALLOW_TRANSFER_INSTEAD_BIT |
        VMA_ALLOCATION_CREATE_MAPPED_BIT;

    VkBuffer buf;
    VmaAllocation alloc;
    VmaAllocationInfo allocInfo;
    VkResult result = vmaCreateBuffer(allocator, &bufCreateInfo, &allocCreateInfo, &buf, &alloc, &allocInfo);
    // Check result...

    VkMemoryPropertyFlags memPropFlags;
    vmaGetAllocationMemoryProperties(allocator, alloc, &memPropFlags);

    if(memPropFlags & VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT)
    {
        // The Allocation ended up in a mappable memory.
        // Calling vmaCopyMemoryToAllocation() does vmaMapMemory(), memcpy(), vmaUnmapMemory(), and vmaFlushAllocation().
        result = vmaCopyMemoryToAllocation(allocator, myData, alloc, 0, myDataSize);
        // Check result...

        VkBufferMemoryBarrier bufMemBarrier = { VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER };
        bufMemBarrier.srcAccessMask = VK_ACCESS_HOST_WRITE_BIT;
        bufMemBarrier.dstAccessMask = VK_ACCESS_UNIFORM_READ_BIT;
        bufMemBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        bufMemBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        bufMemBarrier.buffer = buf;
        bufMemBarrier.offset = 0;
        bufMemBarrier.size = VK_WHOLE_SIZE;

        // It's important to insert a buffer memory barrier here to ensure writing to the buffer has finished.
        vkCmdPipelineBarrier(cmdBuf, VK_PIPELINE_STAGE_HOST_BIT, VK_PIPELINE_STAGE_VERTEX_SHADER_BIT,
            0, 0, nullptr, 1, &bufMemBarrier, 0, nullptr);
    }
    else
    {
        // Allocation ended up in a non-mappable memory - a transfer using a staging buffer is required.
        VkBufferCreateInfo stagingBufCreateInfo = { VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO };
        stagingBufCreateInfo.size = 65536;
        stagingBufCreateInfo.usage = VK_BUFFER_USAGE_TRANSFER_SRC_BIT;

        VmaAllocationCreateInfo stagingAllocCreateInfo = {};
        stagingAllocCreateInfo.usage = VMA_MEMORY_USAGE_AUTO;
        stagingAllocCreateInfo.flags = VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT |
            VMA_ALLOCATION_CREATE_MAPPED_BIT;

        VkBuffer stagingBuf;
        VmaAllocation stagingAlloc;
        VmaAllocationInfo stagingAllocInfo;
        result = vmaCreateBuffer(allocator, &stagingBufCreateInfo, &stagingAllocCreateInfo,
            &stagingBuf, &stagingAlloc, &stagingAllocInfo);
        // Check result...

        // Calling vmaCopyMemoryToAllocation() does vmaMapMemory(), memcpy(), vmaUnmapMemory(), and vmaFlushAllocation().
        result = vmaCopyMemoryToAllocation(allocator, myData, stagingAlloc, 0, myDataSize);
        // Check result...

        VkBufferMemoryBarrier bufMemBarrier = { VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER };
        bufMemBarrier.srcAccessMask = VK_ACCESS_HOST_WRITE_BIT;
        bufMemBarrier.dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT;
        bufMemBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        bufMemBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        bufMemBarrier.buffer = stagingBuf;
        bufMemBarrier.offset = 0;
        bufMemBarrier.size = VK_WHOLE_SIZE;

        // Insert a buffer memory barrier to make sure writing to the staging buffer has finished.
        vkCmdPipelineBarrier(cmdBuf, VK_PIPELINE_STAGE_HOST_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
            0, 0, nullptr, 1, &bufMemBarrier, 0, nullptr);

        VkBufferCopy bufCopy = {
            0, // srcOffset
            0, // dstOffset
            myDataSize, // size
        };

        vkCmdCopyBuffer(cmdBuf, stagingBuf, buf, 1, &bufCopy);

        VkBufferMemoryBarrier bufMemBarrier2 = { VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER };
        bufMemBarrier2.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
        bufMemBarrier2.dstAccessMask = VK_ACCESS_UNIFORM_READ_BIT; // We created a uniform buffer
        bufMemBarrier2.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        bufMemBarrier2.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
        bufMemBarrier2.buffer = buf;
        bufMemBarrier2.offset = 0;
        bufMemBarrier2.size = VK_WHOLE_SIZE;

        // Make sure copying from staging buffer to the actual buffer has finished by inserting a buffer memory barrier.
        vkCmdPipelineBarrier(cmdBuf, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_VERTEX_SHADER_BIT,
            0, 0, nullptr, 1, &bufMemBarrier2, 0, nullptr);
    }

Other use cases
Here are some other, less obvious use cases and their recommended settings:
- An image that is used only as transfer source and destination, but it should stay on the device, as it is used to temporarily store a copy of some texture, e.g. from the current to the next frame, for temporal antialiasing or other temporal effects.
  - Use VkImageCreateInfo::usage = VK_IMAGE_USAGE_TRANSFER_SRC_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT
  - Use VmaAllocationCreateInfo::usage = VMA_MEMORY_USAGE_AUTO
- An image that is used only as transfer source and destination, but it should be placed in system RAM even though it doesn't need to be mapped, because it serves as a "swap" copy to evict least recently used textures from VRAM.
  - Use VkImageCreateInfo::usage = VK_IMAGE_USAGE_TRANSFER_SRC_BIT | VK_IMAGE_USAGE_TRANSFER_DST_BIT
  - Use VmaAllocationCreateInfo::usage = VMA_MEMORY_USAGE_AUTO_PREFER_HOST, as VMA needs a hint here to differentiate from the previous case.
- A buffer that you want to map and write from the CPU, directly read from the GPU (e.g. as a uniform or vertex buffer), but you have a clear preference to place it in device or host memory due to its large size.
  - Use VkBufferCreateInfo::usage = VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT
  - Use VmaAllocationCreateInfo::usage = VMA_MEMORY_USAGE_AUTO_PREFER_DEVICE or VMA_MEMORY_USAGE_AUTO_PREFER_HOST
  - Use VmaAllocationCreateInfo::flags = VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT