Draft: Add amdgpu hsakmt native context support to enable OpenCL based on the AMD ROCm stack
Background:
This idea comes from the virtgpu native context work for Mesa graphics (virtgpu-native-context: Add msm native context).
In the AMD ROCm compute stack, hsakmt plays a role similar to drm in the amdgpu graphics stack; the goal is to let the guest use the native libhsakmt driver to enable the amdgpu ROCm compute stack.
OpenCL support is currently in progress. libhsakmt is quite different from libdrm, so more modifications are needed, and as far as I can see this draft MR is still at a very early stage.
Implementation details:
- Add a libhsakmt backend, the AMD rbtree for memory management, a new blob flag, and a create function.
- libhsakmt needs the userptr feature, which uses user-space memory directly (generally called SVA/SVM), so the host libhsakmt must be able to access guest system memory directly; this is the first challenge of the implementation. The implementation of WSL GPADL (Guest Physical Address Descriptor Lists) is used as a reference: guest user memory used in the hsakmt native context is not movable, so the backend driver and the GPU hardware can access it without data errors. We also plan to forward MMU notifier messages to the backend so that guest user memory no longer needs to be pinned. (See the first sketch after this list.)
- Unlike libdrm, a libhsakmt BO is address-based rather than handle-based, and the ROCm runtime submits commands using the guest BO address directly. So we need to mirror guest and host addresses; this is the second challenge. An rbtree is used to manage the libhsakmt BO addresses, keeping every BO address returned by libhsakmt identical to the guest address within a reserved memory range. (See the second sketch after this list.)
- The biggest difference between libhsakmt and libdrm is that libhsakmt does not return a file handle when the device is opened; libhsakmt is tied to the process, and this is the third challenge. As a result, different guest processes share one real libhsakmt backend. We are trying to modify libhsakmt to support multiple handles in one process; alternatively, building a multi-process backend stack may be better. (See the third sketch after this list.)
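For reference on the first challenge, this is roughly how the userptr (SVM) path looks against the public libhsakmt API on bare metal; the native context has to make the equivalent host-side calls land on guest memory. A minimal sketch, assuming the standard hsaKmt* entry points and trimming error handling:

```c
#include <hsakmt/hsakmt.h>   /* may be <hsakmt.h> on older ROCm installs */

/* Register a plain user-space buffer with KFD and map it for GPU access.
 * In the native-context case this buffer lives in guest system memory,
 * so the host backend must be able to reach (and, for now, pin) its pages. */
int register_userptr(void *buf, HSAuint64 size)
{
    HSAuint64 gpu_va;

    if (hsaKmtRegisterMemory(buf, size) != HSAKMT_STATUS_SUCCESS)
        return -1;

    if (hsaKmtMapMemoryToGPU(buf, size, &gpu_va) != HSAKMT_STATUS_SUCCESS) {
        hsaKmtDeregisterMemory(buf);
        return -1;
    }
    return 0;
}
```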
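For the address-mirroring challenge, the rough idea is that the host reserves the same VA range the guest uses and then asks libhsakmt for fixed-address allocations inside it, so the BO addresses embedded in guest-built command streams are also valid on the host. A sketch assuming the HsaMemFlags FixedAddress bit; the MR itself tracks these ranges in the rbtree-based vamgr, which is omitted here:

```c
#include <stddef.h>
#include <hsakmt/hsakmt.h>

/* Allocate a BO at exactly the address the guest chose, so command
 * streams built by the guest ROCm runtime (which embed raw BO
 * addresses) stay valid on the host. Illustrative only; error paths
 * and the vamgr bookkeeping are left out. */
void *alloc_bo_at_guest_va(HSAuint32 node, void *guest_va, HSAuint64 size)
{
    HsaMemFlags flags = {0};
    void *addr = guest_va;          /* requested address on input */

    flags.ui32.HostAccess   = 1;
    flags.ui32.FixedAddress = 1;    /* fail unless we get exactly guest_va */

    if (hsaKmtAllocMemory(node, size, flags, &addr) != HSAKMT_STATUS_SUCCESS)
        return NULL;
    return addr;                    /* == guest_va on success */
}
```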
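The third challenge follows from the shape of the libhsakmt entry points themselves: unlike drmOpen(), hsaKmtOpenKFD() takes no device path and returns no fd, so there is one KFD connection per process. A minimal sketch of why a single host backend has to refcount that connection across guest contexts (the vhsakmt_* helpers are hypothetical, not from the MR):

```c
#include <hsakmt/hsakmt.h>

static int kfd_refcount;    /* one shared KFD connection per host process */

int vhsakmt_context_open(void)
{
    /* hsaKmtOpenKFD() returns a status, not a file handle, so separate
     * guest contexts cannot each get their own device fd. */
    if (kfd_refcount == 0 && hsaKmtOpenKFD() != HSAKMT_STATUS_SUCCESS)
        return -1;
    kfd_refcount++;
    return 0;
}

void vhsakmt_context_close(void)
{
    if (--kfd_refcount == 0)
        hsaKmtCloseKFD();
}
```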
V2:
- Add VA checks in APIs that take a virtual address.
- Add malloc checks on every response allocation path.
- Add failure/error handling in vamgr.
- Remove hsakmt_set_frag_free and hsakmt_set_frag_used in vamgr.
- Remove all bool and void* in the proto.
- Add padding, static size checks, and -Wpadded in the proto.
- Use drm_* for logging.
- Abort when dereserve fails.
- Propagate some error return values to the guest UMD.
- Add bounds checks in some commands.
- Modify the meson build; the libhsakmt backend now requires drm and amdgpu drm.
- Reformat the code to match the current style.
V3:
- Use va_handle when creating guest blob mapped resources.
- Fix VHSAKMT_CCMD.
- Add else do { } while (false) in VHSA_CHECK_VA (see the macro sketch after this list).
- Fix the failure handling path in VHSAKMT_CCMD_QUERY_TILE_CONFIG.
- Add a new flag to ensure the AQL read/write memory is freed after the AQL queue and the guest BO.
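For context on the VHSA_CHECK_VA change: the trailing else do { } while (false) is the usual C idiom for making a multi-statement checking macro parse as a single statement, so it stays safe inside an unbraced if/else. A generic sketch; the real macro body in the MR differs:

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the MR's vhsa_log (see hsakmt_util.h in V4). */
#define vhsa_log(...) fprintf(stderr, __VA_ARGS__)

/* The "else do { } while (false)" tail swallows the caller's trailing
 * semicolon, so the macro expands to exactly one statement and an
 * enclosing if/else still binds the way the caller expects. */
#define VHSA_CHECK_VA(va)                     \
    if (!(va)) {                              \
        vhsa_log("invalid VA\n");             \
        return;                               \
    } else                                    \
        do { } while (false)

void handle(void *va, int checked)
{
    if (checked)
        VHSA_CHECK_VA(va);   /* the ';' completes the do/while */
    else
        vhsa_log("VA not checked\n");
}
```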
V4:
- Use drm_context functions in the hsakmt device, but replace drm_context* with vhsakmt_context* to allow further upgrades.
- Add hsakmt_util.h to reuse drm_log or use its own vhsa_log.
- Add hsakmt_vm.c and put the virtual address manager in it.
- Add the vhsakmt backend for initializing vamgr.
- Add libhsakmt virtio feature APIs, relying on HSAKMT_VIRTIO to enable or disable them.
- Add hsakmt_hw.h to reuse drm_hw.h.
TODO:
- Modify the capset id.
- Modify the native context reference code to bring it closer to the DRM helpers.
- Add HW version check.
- Modify the rbtree code copyright notice.
- Maybe using a typedef to reuse the drm_context* functions is better; needs discussion.
Performance:
Achieved 97% of bare-metal performance (13000 vs. 13300) in Geekbench 6 using the OpenCL API under the Xen hypervisor. All OpenCL CTS basic tests currently pass.