* [PATCH 0/2] drm/panthor: Fix panthor+FEX-Emu @ 2025-04-17 10:05 Boris Brezillon 2025-04-17 10:05 ` [PATCH 1/2] drm/panthor: Fix missing explicit padding in drm_panthor_gpu_info Boris Brezillon 2025-04-17 10:05 ` [PATCH 2/2] drm/panthor: Fix the user MMIO offset logic for emulators Boris Brezillon 0 siblings, 2 replies; 13+ messages in thread From: Boris Brezillon @ 2025-04-17 10:05 UTC (permalink / raw) To: Boris Brezillon, Steven Price, Liviu Dudau, Adrián Larumbe Cc: dri-devel, kernel Hello, This is an attempt a couple bugs exposed by FEX-Emu. The first one is pretty trivial and should be uncontroversial, since it's just a missing padding field in one of our uAPI structs. We are getting away with it on arm32 because of the alignment rules provided by the Arm ABI, but x86 has relaxed constraints for u64 fields, and this bug is definitely hit when running a 32-bit x86 mesa binary under FEX Emu. The second fix is addressing a problem we have because FEX-Emu is an aarch64 process executing 32-bit x86 code, meaning the check we do on the is-32bit-task check we do to figure out the MMIO offset seen by the user won't work. In order to fix that, we add an ioctl to let the user explicitly set this offset. The offset can only be set early on, if no MMIO range has been mapped before. With those, and the mesa MR at [1], I managed to run a 32-bit x86 glmark2 through FEX without using the host mesa (if we were to use the thunked mesa lib, both the kernel and mesa would use MMIO_OFFSET_64BIT, and the problem doesn't exist anymore). Regards, Boris [1]https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34573 Boris Brezillon (2): drm/panthor: Fix missing explicit padding in drm_panthor_gpu_info drm/panthor: Fix the user MMIO offset logic for emulators drivers/gpu/drm/panthor/panthor_device.h | 18 +++++++ drivers/gpu/drm/panthor/panthor_drv.c | 60 +++++++++++++++++++----- include/uapi/drm/panthor_drm.h | 44 +++++++++++++++++ 3 files changed, 109 insertions(+), 13 deletions(-) -- 2.49.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/2] drm/panthor: Fix missing explicit padding in drm_panthor_gpu_info 2025-04-17 10:05 [PATCH 0/2] drm/panthor: Fix panthor+FEX-Emu Boris Brezillon @ 2025-04-17 10:05 ` Boris Brezillon 2025-04-17 10:24 ` Steven Price 2025-04-17 11:26 ` Liviu Dudau 2025-04-17 10:05 ` [PATCH 2/2] drm/panthor: Fix the user MMIO offset logic for emulators Boris Brezillon 1 sibling, 2 replies; 13+ messages in thread From: Boris Brezillon @ 2025-04-17 10:05 UTC (permalink / raw) To: Boris Brezillon, Steven Price, Liviu Dudau, Adrián Larumbe Cc: dri-devel, kernel drm_panthor_gpu_info::shader_present is currently automatically offset by 4 byte to meet Arm's 32-bit/64-bit field alignment rules, but those constraints don't stand on 32-bit x86 and cause a mismatch when running an x86 binary in a user emulated environment like FEX. It's also generally agreed that uAPIs should explicitly pad their struct fields, which we originally intended to do, but a mistake slipped through during the submission process, leading drm_panthor_gpu_info::shader_present to be misaligned. This uAPI change doesn't break any of the existing users of panthor which are either arm32 or arm64 where the 64-bit alignment of u64 fields is already enforced a the compiler level. Fixes: 0f25e493a246 ("drm/panthor: Add uAPI") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> --- include/uapi/drm/panthor_drm.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h index 97e2c4510e69..1379a2d4548c 100644 --- a/include/uapi/drm/panthor_drm.h +++ b/include/uapi/drm/panthor_drm.h @@ -293,6 +293,18 @@ struct drm_panthor_gpu_info { /** @as_present: Bitmask encoding the number of address-space exposed by the MMU. */ __u32 as_present; + /** + * @garbage: Unused field that's not even zero-checked. + * + * This originates from a missing padding that leaked in the initial driver submission + * and was only found when testing the driver in a 32-bit x86 environment, where + * u64 field alignment rules are relaxed compared to aarch32. + * + * This field can't be repurposed, because it's never been checked by the driver and + * userspace is not guaranteed to zero it out. + */ + __u32 garbage; + /** @shader_present: Bitmask encoding the shader cores exposed by the GPU. */ __u64 shader_present; -- 2.49.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/panthor: Fix missing explicit padding in drm_panthor_gpu_info 2025-04-17 10:05 ` [PATCH 1/2] drm/panthor: Fix missing explicit padding in drm_panthor_gpu_info Boris Brezillon @ 2025-04-17 10:24 ` Steven Price 2025-04-17 11:24 ` Boris Brezillon 2025-04-17 11:26 ` Liviu Dudau 1 sibling, 1 reply; 13+ messages in thread From: Steven Price @ 2025-04-17 10:24 UTC (permalink / raw) To: Boris Brezillon, Liviu Dudau, Adrián Larumbe; +Cc: dri-devel, kernel On 17/04/2025 11:05, Boris Brezillon wrote: > drm_panthor_gpu_info::shader_present is currently automatically offset > by 4 byte to meet Arm's 32-bit/64-bit field alignment rules, but those > constraints don't stand on 32-bit x86 and cause a mismatch when running > an x86 binary in a user emulated environment like FEX. It's also > generally agreed that uAPIs should explicitly pad their struct fields, > which we originally intended to do, but a mistake slipped through during > the submission process, leading drm_panthor_gpu_info::shader_present to > be misaligned. > > This uAPI change doesn't break any of the existing users of panthor > which are either arm32 or arm64 where the 64-bit alignment of > u64 fields is already enforced a the compiler level. > > Fixes: 0f25e493a246 ("drm/panthor: Add uAPI") > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > --- > include/uapi/drm/panthor_drm.h | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h > index 97e2c4510e69..1379a2d4548c 100644 > --- a/include/uapi/drm/panthor_drm.h > +++ b/include/uapi/drm/panthor_drm.h > @@ -293,6 +293,18 @@ struct drm_panthor_gpu_info { > /** @as_present: Bitmask encoding the number of address-space exposed by the MMU. */ > __u32 as_present; > > + /** > + * @garbage: Unused field that's not even zero-checked. > + * > + * This originates from a missing padding that leaked in the initial driver submission > + * and was only found when testing the driver in a 32-bit x86 environment, where > + * u64 field alignment rules are relaxed compared to aarch32. > + * > + * This field can't be repurposed, because it's never been checked by the driver and > + * userspace is not guaranteed to zero it out. Why would user space be providing this structure? This is meant to be provided from the kernel to user space, and (fingers-crossed) we've been zeroing the padding even though not explicitly? (rather than leaking some kernel data). Other than the comment - yes this is a uAPI mistake we should fix. I'm not sure how much we care about historic x86 uAPI but it also should be possible to identify an old x86 client using the x86 padding because the structure will be too short. But my preference would be to say "it's always been broken on x86" and therefore there's no regression. Thanks, Steve > + */ > + __u32 garbage; > + > /** @shader_present: Bitmask encoding the shader cores exposed by the GPU. */ > __u64 shader_present; > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/panthor: Fix missing explicit padding in drm_panthor_gpu_info 2025-04-17 10:24 ` Steven Price @ 2025-04-17 11:24 ` Boris Brezillon 0 siblings, 0 replies; 13+ messages in thread From: Boris Brezillon @ 2025-04-17 11:24 UTC (permalink / raw) To: Steven Price; +Cc: Liviu Dudau, Adrián Larumbe, dri-devel, kernel On Thu, 17 Apr 2025 11:24:32 +0100 Steven Price <steven.price@arm.com> wrote: > On 17/04/2025 11:05, Boris Brezillon wrote: > > drm_panthor_gpu_info::shader_present is currently automatically offset > > by 4 byte to meet Arm's 32-bit/64-bit field alignment rules, but those > > constraints don't stand on 32-bit x86 and cause a mismatch when running > > an x86 binary in a user emulated environment like FEX. It's also > > generally agreed that uAPIs should explicitly pad their struct fields, > > which we originally intended to do, but a mistake slipped through during > > the submission process, leading drm_panthor_gpu_info::shader_present to > > be misaligned. > > > > This uAPI change doesn't break any of the existing users of panthor > > which are either arm32 or arm64 where the 64-bit alignment of > > u64 fields is already enforced a the compiler level. > > > > Fixes: 0f25e493a246 ("drm/panthor: Add uAPI") > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > --- > > include/uapi/drm/panthor_drm.h | 12 ++++++++++++ > > 1 file changed, 12 insertions(+) > > > > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h > > index 97e2c4510e69..1379a2d4548c 100644 > > --- a/include/uapi/drm/panthor_drm.h > > +++ b/include/uapi/drm/panthor_drm.h > > @@ -293,6 +293,18 @@ struct drm_panthor_gpu_info { > > /** @as_present: Bitmask encoding the number of address-space exposed by the MMU. */ > > __u32 as_present; > > > > + /** > > + * @garbage: Unused field that's not even zero-checked. > > + * > > + * This originates from a missing padding that leaked in the initial driver submission > > + * and was only found when testing the driver in a 32-bit x86 environment, where > > + * u64 field alignment rules are relaxed compared to aarch32. > > + * > > + * This field can't be repurposed, because it's never been checked by the driver and > > + * userspace is not guaranteed to zero it out. > > Why would user space be providing this structure? This is meant to be > provided from the kernel to user space, and (fingers-crossed) we've been > zeroing the padding even though not explicitly? (rather than leaking > some kernel data). Uh, you're right this doesn't matter for kernel -> user data transfers. I guess we can just make it a regular padding field, and allow it to be repurposed if needed (as long as the expected default value is zero, of course). > > Other than the comment - yes this is a uAPI mistake we should fix. > > I'm not sure how much we care about historic x86 uAPI but it also should > be possible to identify an old x86 client using the x86 padding because > the structure will be too short. But my preference would be to say "it's > always been broken on x86" and therefore there's no regression. That's also my opinion: we don't have native x86 users of panthor, and emulated ones are just broken right now, because the kernel side (which is arm32/64) already has a different layout. > > Thanks, > Steve > > > + */ > > + __u32 garbage; > > + > > /** @shader_present: Bitmask encoding the shader cores exposed by the GPU. */ > > __u64 shader_present; > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] drm/panthor: Fix missing explicit padding in drm_panthor_gpu_info 2025-04-17 10:05 ` [PATCH 1/2] drm/panthor: Fix missing explicit padding in drm_panthor_gpu_info Boris Brezillon 2025-04-17 10:24 ` Steven Price @ 2025-04-17 11:26 ` Liviu Dudau 1 sibling, 0 replies; 13+ messages in thread From: Liviu Dudau @ 2025-04-17 11:26 UTC (permalink / raw) To: Boris Brezillon; +Cc: Steven Price, Adrián Larumbe, dri-devel, kernel On Thu, Apr 17, 2025 at 12:05:02PM +0200, Boris Brezillon wrote: > drm_panthor_gpu_info::shader_present is currently automatically offset > by 4 byte to meet Arm's 32-bit/64-bit field alignment rules, but those > constraints don't stand on 32-bit x86 and cause a mismatch when running > an x86 binary in a user emulated environment like FEX. It's also > generally agreed that uAPIs should explicitly pad their struct fields, > which we originally intended to do, but a mistake slipped through during > the submission process, leading drm_panthor_gpu_info::shader_present to > be misaligned. > > This uAPI change doesn't break any of the existing users of panthor > which are either arm32 or arm64 where the 64-bit alignment of > u64 fields is already enforced a the compiler level. > > Fixes: 0f25e493a246 ("drm/panthor: Add uAPI") > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > --- > include/uapi/drm/panthor_drm.h | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h > index 97e2c4510e69..1379a2d4548c 100644 > --- a/include/uapi/drm/panthor_drm.h > +++ b/include/uapi/drm/panthor_drm.h > @@ -293,6 +293,18 @@ struct drm_panthor_gpu_info { > /** @as_present: Bitmask encoding the number of address-space exposed by the MMU. */ > __u32 as_present; > > + /** > + * @garbage: Unused field that's not even zero-checked. By whom? Kernel provides it and initializes it to zero, so I guess that's why it's not checked. > + * > + * This originates from a missing padding that leaked in the initial driver submission > + * and was only found when testing the driver in a 32-bit x86 environment, where > + * u64 field alignment rules are relaxed compared to aarch32. > + * > + * This field can't be repurposed, because it's never been checked by the driver and > + * userspace is not guaranteed to zero it out. The direction is the other way around, it's provided by kernel. I would loosen the restriction on repurposing, maybe when 32bit x86 is no longer such a hot market we can reuse it for some other chicken bits. With the comment updated, Acked-by: Liviu Dudau <liviu.dudau@arm.com> Best regards, Liviu > + */ > + __u32 garbage; > + > /** @shader_present: Bitmask encoding the shader cores exposed by the GPU. */ > __u64 shader_present; > > -- > 2.49.0 > -- ==================== | I would like to | | fix the world, | | but they're not | | giving me the | \ source code! / --------------- ¯\_(ツ)_/¯ ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/2] drm/panthor: Fix the user MMIO offset logic for emulators 2025-04-17 10:05 [PATCH 0/2] drm/panthor: Fix panthor+FEX-Emu Boris Brezillon 2025-04-17 10:05 ` [PATCH 1/2] drm/panthor: Fix missing explicit padding in drm_panthor_gpu_info Boris Brezillon @ 2025-04-17 10:05 ` Boris Brezillon 2025-04-17 10:41 ` Steven Price 1 sibling, 1 reply; 13+ messages in thread From: Boris Brezillon @ 2025-04-17 10:05 UTC (permalink / raw) To: Boris Brezillon, Steven Price, Liviu Dudau, Adrián Larumbe Cc: dri-devel, kernel Currently, we pick the MMIO offset based on the size of the pgoff_t type seen by the process that manipulates the FD, such that a 32-bit process can always map the user MMIO ranges. But this approach doesn't work well for emulators like FEX, where the emulator is a 64-bit binary which might be executing 32-bit code. In that case, the kernel thinks it's the 64-bit process and assumes DRM_PANTHOR_USER_MMIO_OFFSET_64BIT is in use, but the UMD library expects DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, because it can't mmap() anything above the pgoff_t size. In order to solve that, we need a way to explicitly set the user MMIO offset from the UMD, such that the kernel doesn't have to guess it from the TIF_32BIT flag set on user thread. We keep the old behavior if DRM_PANTHOR_SET_USER_MMIO_OFFSET is never called. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> --- drivers/gpu/drm/panthor/panthor_device.h | 18 +++++++ drivers/gpu/drm/panthor/panthor_drv.c | 60 +++++++++++++++++++----- include/uapi/drm/panthor_drm.h | 32 +++++++++++++ 3 files changed, 97 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h index 4c27b6d85f46..b97aba89132a 100644 --- a/drivers/gpu/drm/panthor/panthor_device.h +++ b/drivers/gpu/drm/panthor/panthor_device.h @@ -10,6 +10,7 @@ #include <linux/io-pgtable.h> #include <linux/regulator/consumer.h> #include <linux/pm_runtime.h> +#include <linux/rwsem.h> #include <linux/sched.h> #include <linux/spinlock.h> @@ -219,6 +220,23 @@ struct panthor_file { /** @ptdev: Device attached to this file. */ struct panthor_device *ptdev; + /** @user_mmio: User MMIO related fields. */ + struct { + /** @offset: Offset used for user MMIO mappings. */ + u64 offset; + + /** + * @offset_immutable: True if the user MMIO offset became immutable. + * + * Set to true after the first mmap() targeting a page in the user MMIO range. + * After this point, the user MMIO offset can't be changed. + */ + bool offset_immutable; + + /** @offset_lock: Lock used to protect offset changes. */ + struct rw_semaphore offset_lock; + } user_mmio; + /** @vms: VM pool attached to this file. */ struct panthor_vm_pool *vms; diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c index 7cd131af340d..6a8931492536 100644 --- a/drivers/gpu/drm/panthor/panthor_drv.c +++ b/drivers/gpu/drm/panthor/panthor_drv.c @@ -1336,6 +1336,29 @@ static int panthor_ioctl_vm_get_state(struct drm_device *ddev, void *data, return 0; } +static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev, + void *data, struct drm_file *file) +{ + struct drm_panthor_set_user_mmio_offset *args = data; + struct panthor_file *pfile = file->driver_priv; + int ret; + + if (args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_32BIT && + args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_64BIT) + return -EINVAL; + + down_write(&pfile->user_mmio.offset_lock); + if (pfile->user_mmio.offset_immutable) { + ret = pfile->user_mmio.offset != args->offset ? -EINVAL : 0; + } else { + pfile->user_mmio.offset = args->offset; + ret = 0; + } + up_write(&pfile->user_mmio.offset_lock); + + return ret; +} + static int panthor_open(struct drm_device *ddev, struct drm_file *file) { @@ -1353,6 +1376,19 @@ panthor_open(struct drm_device *ddev, struct drm_file *file) } pfile->ptdev = ptdev; + init_rwsem(&pfile->user_mmio.offset_lock); + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET; + +#ifdef CONFIG_ARM64 + /* + * With 32-bit systems being limited by the 32-bit representation of + * mmap2's pgoffset field, we need to make the MMIO offset arch + * specific. + */ + if (test_tsk_thread_flag(current, TIF_32BIT)) + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; +#endif + ret = panthor_vm_pool_create(pfile); if (ret) @@ -1405,6 +1441,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = { PANTHOR_IOCTL(TILER_HEAP_CREATE, tiler_heap_create, DRM_RENDER_ALLOW), PANTHOR_IOCTL(TILER_HEAP_DESTROY, tiler_heap_destroy, DRM_RENDER_ALLOW), PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW), + PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW), }; static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) @@ -1418,20 +1455,16 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) if (!drm_dev_enter(file->minor->dev, &cookie)) return -ENODEV; -#ifdef CONFIG_ARM64 - /* - * With 32-bit systems being limited by the 32-bit representation of - * mmap2's pgoffset field, we need to make the MMIO offset arch - * specific. This converts a user MMIO offset into something the kernel - * driver understands. - */ - if (test_tsk_thread_flag(current, TIF_32BIT) && - offset >= DRM_PANTHOR_USER_MMIO_OFFSET_32BIT) { - offset += DRM_PANTHOR_USER_MMIO_OFFSET_64BIT - - DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; + /* Adjust the user MMIO to match the offset used kernel side. */ + down_read(&pfile->user_mmio.offset_lock); + if (offset >= pfile->user_mmio.offset && + pfile->user_mmio.offset != DRM_PANTHOR_USER_MMIO_OFFSET) { + offset -= pfile->user_mmio.offset; + offset += DRM_PANTHOR_USER_MMIO_OFFSET; vma->vm_pgoff = offset >> PAGE_SHIFT; + pfile->user_mmio.offset_immutable = true; } -#endif + up_read(&pfile->user_mmio.offset_lock); if (offset >= DRM_PANTHOR_USER_MMIO_OFFSET) ret = panthor_device_mmap_io(ptdev, vma); @@ -1514,6 +1547,7 @@ static void panthor_debugfs_init(struct drm_minor *minor) * - 1.2 - adds DEV_QUERY_GROUP_PRIORITIES_INFO query * - adds PANTHOR_GROUP_PRIORITY_REALTIME priority * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag + * - 1.4 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl */ static const struct drm_driver panthor_drm_driver = { .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ | @@ -1527,7 +1561,7 @@ static const struct drm_driver panthor_drm_driver = { .name = "panthor", .desc = "Panthor DRM driver", .major = 1, - .minor = 3, + .minor = 4, .gem_create_object = panthor_gem_create_object, .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table, diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h index 1379a2d4548c..2a16ca86113c 100644 --- a/include/uapi/drm/panthor_drm.h +++ b/include/uapi/drm/panthor_drm.h @@ -127,6 +127,20 @@ enum drm_panthor_ioctl_id { /** @DRM_PANTHOR_TILER_HEAP_DESTROY: Destroy a tiler heap. */ DRM_PANTHOR_TILER_HEAP_DESTROY, + + /** + * @DRM_PANTHOR_SET_USER_MMIO_OFFSET: Set the offset to use as the user MMIO offset. + * + * The default behavior is to pick the MMIO offset based on the size of the pgoff_t + * type seen by the process that manipulates the FD, such that a 32-bit process can + * always map the user MMIO ranges. But this approach doesn't work well for emulators + * like FEX, where the emulator is an 64-bit binary which might be executing 32-bit + * code. In that case, the kernel thinks it's the 64-bit process and assumes + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT is in use, but the UMD library expects + * DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, because it can't mmap() anything above the + * pgoff_t size. + */ + DRM_PANTHOR_SET_USER_MMIO_OFFSET, }; /** @@ -989,6 +1003,22 @@ struct drm_panthor_tiler_heap_destroy { __u32 pad; }; +/** + * struct drm_panthor_set_user_mmio_offset - Arguments passed to + * DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET + */ +struct drm_panthor_set_user_mmio_offset { + /** + * @offset: User MMIO offset to use. + * + * Must be either DRM_PANTHOR_USER_MMIO_OFFSET_32BIT or + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT. The common use case is to pass + * DRM_PANTHOR_USER_MMIO_OFFSET which picks the right value based on the size of + * pgoff_t (AKA unsigned long). + */ + __u64 offset; +}; + /** * DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number * @__access: Access type. Must be R, W or RW. @@ -1031,6 +1061,8 @@ enum { DRM_IOCTL_PANTHOR(WR, TILER_HEAP_CREATE, tiler_heap_create), DRM_IOCTL_PANTHOR_TILER_HEAP_DESTROY = DRM_IOCTL_PANTHOR(WR, TILER_HEAP_DESTROY, tiler_heap_destroy), + DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET = + DRM_IOCTL_PANTHOR(WR, SET_USER_MMIO_OFFSET, set_user_mmio_offset), }; #if defined(__cplusplus) -- 2.49.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/panthor: Fix the user MMIO offset logic for emulators 2025-04-17 10:05 ` [PATCH 2/2] drm/panthor: Fix the user MMIO offset logic for emulators Boris Brezillon @ 2025-04-17 10:41 ` Steven Price 2025-04-17 11:33 ` Liviu Dudau 2025-04-17 12:10 ` Boris Brezillon 0 siblings, 2 replies; 13+ messages in thread From: Steven Price @ 2025-04-17 10:41 UTC (permalink / raw) To: Boris Brezillon, Liviu Dudau, Adrián Larumbe; +Cc: dri-devel, kernel On 17/04/2025 11:05, Boris Brezillon wrote: > Currently, we pick the MMIO offset based on the size of the pgoff_t > type seen by the process that manipulates the FD, such that a 32-bit > process can always map the user MMIO ranges. But this approach doesn't > work well for emulators like FEX, where the emulator is a 64-bit binary > which might be executing 32-bit code. In that case, the kernel thinks > it's the 64-bit process and assumes DRM_PANTHOR_USER_MMIO_OFFSET_64BIT > is in use, but the UMD library expects DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, > because it can't mmap() anything above the pgoff_t size. I'm not a fan of the FEX behaviour here. I know I won't be popular, but can FEX not just handle this difference internally? > In order to solve that, we need a way to explicitly set the user MMIO > offset from the UMD, such that the kernel doesn't have to guess it > from the TIF_32BIT flag set on user thread. We keep the old behavior > if DRM_PANTHOR_SET_USER_MMIO_OFFSET is never called. Although I agree this is probably a better uAPI that we should have had from the beginning (hindsight and all that!). > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > --- > drivers/gpu/drm/panthor/panthor_device.h | 18 +++++++ > drivers/gpu/drm/panthor/panthor_drv.c | 60 +++++++++++++++++++----- > include/uapi/drm/panthor_drm.h | 32 +++++++++++++ > 3 files changed, 97 insertions(+), 13 deletions(-) > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h > index 4c27b6d85f46..b97aba89132a 100644 > --- a/drivers/gpu/drm/panthor/panthor_device.h > +++ b/drivers/gpu/drm/panthor/panthor_device.h > @@ -10,6 +10,7 @@ > #include <linux/io-pgtable.h> > #include <linux/regulator/consumer.h> > #include <linux/pm_runtime.h> > +#include <linux/rwsem.h> > #include <linux/sched.h> > #include <linux/spinlock.h> > > @@ -219,6 +220,23 @@ struct panthor_file { > /** @ptdev: Device attached to this file. */ > struct panthor_device *ptdev; > > + /** @user_mmio: User MMIO related fields. */ > + struct { > + /** @offset: Offset used for user MMIO mappings. */ > + u64 offset; > + > + /** > + * @offset_immutable: True if the user MMIO offset became immutable. > + * > + * Set to true after the first mmap() targeting a page in the user MMIO range. > + * After this point, the user MMIO offset can't be changed. > + */ > + bool offset_immutable; Do we need this complexity? Does it really matter if user space confuses itself by changing the offsets? > + > + /** @offset_lock: Lock used to protect offset changes. */ > + struct rw_semaphore offset_lock; Equally the lock seems slightly overkill - AFAICT user space can only harm itself. > + } user_mmio; > + > /** @vms: VM pool attached to this file. */ > struct panthor_vm_pool *vms; > > diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c > index 7cd131af340d..6a8931492536 100644 > --- a/drivers/gpu/drm/panthor/panthor_drv.c > +++ b/drivers/gpu/drm/panthor/panthor_drv.c > @@ -1336,6 +1336,29 @@ static int panthor_ioctl_vm_get_state(struct drm_device *ddev, void *data, > return 0; > } > > +static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev, > + void *data, struct drm_file *file) > +{ > + struct drm_panthor_set_user_mmio_offset *args = data; > + struct panthor_file *pfile = file->driver_priv; > + int ret; > + > + if (args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_32BIT && > + args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_64BIT) > + return -EINVAL; Note we're not preventing a 32 bit client requesting to use 64 bit offsets here. > + > + down_write(&pfile->user_mmio.offset_lock); > + if (pfile->user_mmio.offset_immutable) { > + ret = pfile->user_mmio.offset != args->offset ? -EINVAL : 0; > + } else { > + pfile->user_mmio.offset = args->offset; > + ret = 0; > + } > + up_write(&pfile->user_mmio.offset_lock); > + > + return ret; > +} > + > static int > panthor_open(struct drm_device *ddev, struct drm_file *file) > { > @@ -1353,6 +1376,19 @@ panthor_open(struct drm_device *ddev, struct drm_file *file) > } > > pfile->ptdev = ptdev; > + init_rwsem(&pfile->user_mmio.offset_lock); > + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET; > + > +#ifdef CONFIG_ARM64 > + /* > + * With 32-bit systems being limited by the 32-bit representation of > + * mmap2's pgoffset field, we need to make the MMIO offset arch > + * specific. > + */ > + if (test_tsk_thread_flag(current, TIF_32BIT)) > + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; > +#endif > + > > ret = panthor_vm_pool_create(pfile); > if (ret) > @@ -1405,6 +1441,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = { > PANTHOR_IOCTL(TILER_HEAP_CREATE, tiler_heap_create, DRM_RENDER_ALLOW), > PANTHOR_IOCTL(TILER_HEAP_DESTROY, tiler_heap_destroy, DRM_RENDER_ALLOW), > PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW), > + PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW), > }; > > static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) > @@ -1418,20 +1455,16 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) > if (!drm_dev_enter(file->minor->dev, &cookie)) > return -ENODEV; > > -#ifdef CONFIG_ARM64 > - /* > - * With 32-bit systems being limited by the 32-bit representation of > - * mmap2's pgoffset field, we need to make the MMIO offset arch > - * specific. This converts a user MMIO offset into something the kernel > - * driver understands. > - */ > - if (test_tsk_thread_flag(current, TIF_32BIT) && > - offset >= DRM_PANTHOR_USER_MMIO_OFFSET_32BIT) { > - offset += DRM_PANTHOR_USER_MMIO_OFFSET_64BIT - > - DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; > + /* Adjust the user MMIO to match the offset used kernel side. */ > + down_read(&pfile->user_mmio.offset_lock); > + if (offset >= pfile->user_mmio.offset && > + pfile->user_mmio.offset != DRM_PANTHOR_USER_MMIO_OFFSET) { > + offset -= pfile->user_mmio.offset; > + offset += DRM_PANTHOR_USER_MMIO_OFFSET; > vma->vm_pgoff = offset >> PAGE_SHIFT; > + pfile->user_mmio.offset_immutable = true; > } > -#endif > + up_read(&pfile->user_mmio.offset_lock); I can't help feeling we can just simplify this to: u64 mmio_offset = pfile->user_mmio.offset; if (offset >= mmio_offset) { offset -= mmio_offset; offset += DRM_PANTHOR_USER_MMIO_OFFSET; vma->vm_pgoff = offset >> PAGE_SHIFT; ret = panthor_device_mmap_io(ptdev, vma); } else { ret = drm_gem_mmap(filp, vma); } Or even go further and push the offset calculations into panthor_device_mmap_io(). > > if (offset >= DRM_PANTHOR_USER_MMIO_OFFSET) > ret = panthor_device_mmap_io(ptdev, vma); > @@ -1514,6 +1547,7 @@ static void panthor_debugfs_init(struct drm_minor *minor) > * - 1.2 - adds DEV_QUERY_GROUP_PRIORITIES_INFO query > * - adds PANTHOR_GROUP_PRIORITY_REALTIME priority > * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag > + * - 1.4 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl > */ > static const struct drm_driver panthor_drm_driver = { > .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ | > @@ -1527,7 +1561,7 @@ static const struct drm_driver panthor_drm_driver = { > .name = "panthor", > .desc = "Panthor DRM driver", > .major = 1, > - .minor = 3, > + .minor = 4, > > .gem_create_object = panthor_gem_create_object, > .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table, > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h > index 1379a2d4548c..2a16ca86113c 100644 > --- a/include/uapi/drm/panthor_drm.h > +++ b/include/uapi/drm/panthor_drm.h > @@ -127,6 +127,20 @@ enum drm_panthor_ioctl_id { > > /** @DRM_PANTHOR_TILER_HEAP_DESTROY: Destroy a tiler heap. */ > DRM_PANTHOR_TILER_HEAP_DESTROY, > + > + /** > + * @DRM_PANTHOR_SET_USER_MMIO_OFFSET: Set the offset to use as the user MMIO offset. > + * > + * The default behavior is to pick the MMIO offset based on the size of the pgoff_t > + * type seen by the process that manipulates the FD, such that a 32-bit process can > + * always map the user MMIO ranges. But this approach doesn't work well for emulators > + * like FEX, where the emulator is an 64-bit binary which might be executing 32-bit > + * code. In that case, the kernel thinks it's the 64-bit process and assumes > + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT is in use, but the UMD library expects > + * DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, because it can't mmap() anything above the > + * pgoff_t size. > + */ > + DRM_PANTHOR_SET_USER_MMIO_OFFSET, > }; > > /** > @@ -989,6 +1003,22 @@ struct drm_panthor_tiler_heap_destroy { > __u32 pad; > }; > > +/** > + * struct drm_panthor_set_user_mmio_offset - Arguments passed to > + * DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET > + */ > +struct drm_panthor_set_user_mmio_offset { > + /** > + * @offset: User MMIO offset to use. > + * > + * Must be either DRM_PANTHOR_USER_MMIO_OFFSET_32BIT or > + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT. The common use case is to pass > + * DRM_PANTHOR_USER_MMIO_OFFSET which picks the right value based on the size of > + * pgoff_t (AKA unsigned long). "The common use case" is not to call this ioctl ;) Although if we were designing this uAPI from scratch I'd just say require user space to decide where it wants the MMIO region and not have two offsets to choose from. Thanks, Steve > + */ > + __u64 offset; > +}; > + > /** > * DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number > * @__access: Access type. Must be R, W or RW. > @@ -1031,6 +1061,8 @@ enum { > DRM_IOCTL_PANTHOR(WR, TILER_HEAP_CREATE, tiler_heap_create), > DRM_IOCTL_PANTHOR_TILER_HEAP_DESTROY = > DRM_IOCTL_PANTHOR(WR, TILER_HEAP_DESTROY, tiler_heap_destroy), > + DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET = > + DRM_IOCTL_PANTHOR(WR, SET_USER_MMIO_OFFSET, set_user_mmio_offset), > }; > > #if defined(__cplusplus) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/panthor: Fix the user MMIO offset logic for emulators 2025-04-17 10:41 ` Steven Price @ 2025-04-17 11:33 ` Liviu Dudau 2025-04-17 12:16 ` Boris Brezillon 2025-04-17 12:10 ` Boris Brezillon 1 sibling, 1 reply; 13+ messages in thread From: Liviu Dudau @ 2025-04-17 11:33 UTC (permalink / raw) To: Steven Price; +Cc: Boris Brezillon, Adrián Larumbe, dri-devel, kernel On Thu, Apr 17, 2025 at 11:41:18AM +0100, Steven Price wrote: > On 17/04/2025 11:05, Boris Brezillon wrote: > > Currently, we pick the MMIO offset based on the size of the pgoff_t > > type seen by the process that manipulates the FD, such that a 32-bit > > process can always map the user MMIO ranges. But this approach doesn't > > work well for emulators like FEX, where the emulator is a 64-bit binary > > which might be executing 32-bit code. In that case, the kernel thinks > > it's the 64-bit process and assumes DRM_PANTHOR_USER_MMIO_OFFSET_64BIT > > is in use, but the UMD library expects DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, > > because it can't mmap() anything above the pgoff_t size. > > I'm not a fan of the FEX behaviour here. I know I won't be popular, but > can FEX not just handle this difference internally? > > > In order to solve that, we need a way to explicitly set the user MMIO > > offset from the UMD, such that the kernel doesn't have to guess it > > from the TIF_32BIT flag set on user thread. We keep the old behavior > > if DRM_PANTHOR_SET_USER_MMIO_OFFSET is never called. > > Although I agree this is probably a better uAPI that we should have had > from the beginning (hindsight and all that!). > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > --- > > drivers/gpu/drm/panthor/panthor_device.h | 18 +++++++ > > drivers/gpu/drm/panthor/panthor_drv.c | 60 +++++++++++++++++++----- > > include/uapi/drm/panthor_drm.h | 32 +++++++++++++ > > 3 files changed, 97 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h > > index 4c27b6d85f46..b97aba89132a 100644 > > --- a/drivers/gpu/drm/panthor/panthor_device.h > > +++ b/drivers/gpu/drm/panthor/panthor_device.h > > @@ -10,6 +10,7 @@ > > #include <linux/io-pgtable.h> > > #include <linux/regulator/consumer.h> > > #include <linux/pm_runtime.h> > > +#include <linux/rwsem.h> > > #include <linux/sched.h> > > #include <linux/spinlock.h> > > > > @@ -219,6 +220,23 @@ struct panthor_file { > > /** @ptdev: Device attached to this file. */ > > struct panthor_device *ptdev; > > > > + /** @user_mmio: User MMIO related fields. */ > > + struct { > > + /** @offset: Offset used for user MMIO mappings. */ > > + u64 offset; > > + > > + /** > > + * @offset_immutable: True if the user MMIO offset became immutable. > > + * > > + * Set to true after the first mmap() targeting a page in the user MMIO range. > > + * After this point, the user MMIO offset can't be changed. > > + */ > > + bool offset_immutable; > > Do we need this complexity? Does it really matter if user space confuses > itself by changing the offsets? > > > + > > + /** @offset_lock: Lock used to protect offset changes. */ > > + struct rw_semaphore offset_lock; > > Equally the lock seems slightly overkill - AFAICT user space can only > harm itself. > > > + } user_mmio; > > + > > /** @vms: VM pool attached to this file. */ > > struct panthor_vm_pool *vms; > > > > diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c > > index 7cd131af340d..6a8931492536 100644 > > --- a/drivers/gpu/drm/panthor/panthor_drv.c > > +++ b/drivers/gpu/drm/panthor/panthor_drv.c > > @@ -1336,6 +1336,29 @@ static int panthor_ioctl_vm_get_state(struct drm_device *ddev, void *data, > > return 0; > > } > > > > +static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev, > > + void *data, struct drm_file *file) > > +{ > > + struct drm_panthor_set_user_mmio_offset *args = data; > > + struct panthor_file *pfile = file->driver_priv; > > + int ret; > > + > > + if (args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_32BIT && > > + args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_64BIT) > > + return -EINVAL; > > Note we're not preventing a 32 bit client requesting to use 64 bit > offsets here. > > > + > > + down_write(&pfile->user_mmio.offset_lock); > > + if (pfile->user_mmio.offset_immutable) { > > + ret = pfile->user_mmio.offset != args->offset ? -EINVAL : 0; > > + } else { > > + pfile->user_mmio.offset = args->offset; > > + ret = 0; > > + } > > + up_write(&pfile->user_mmio.offset_lock); > > + > > + return ret; > > +} > > + > > static int > > panthor_open(struct drm_device *ddev, struct drm_file *file) > > { > > @@ -1353,6 +1376,19 @@ panthor_open(struct drm_device *ddev, struct drm_file *file) > > } > > > > pfile->ptdev = ptdev; > > + init_rwsem(&pfile->user_mmio.offset_lock); > > + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET; > > + > > +#ifdef CONFIG_ARM64 > > + /* > > + * With 32-bit systems being limited by the 32-bit representation of > > + * mmap2's pgoffset field, we need to make the MMIO offset arch > > + * specific. > > + */ > > + if (test_tsk_thread_flag(current, TIF_32BIT)) > > + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; > > +#endif > > + > > > > ret = panthor_vm_pool_create(pfile); > > if (ret) > > @@ -1405,6 +1441,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = { > > PANTHOR_IOCTL(TILER_HEAP_CREATE, tiler_heap_create, DRM_RENDER_ALLOW), > > PANTHOR_IOCTL(TILER_HEAP_DESTROY, tiler_heap_destroy, DRM_RENDER_ALLOW), > > PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW), > > + PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW), > > }; > > > > static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) > > @@ -1418,20 +1455,16 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) > > if (!drm_dev_enter(file->minor->dev, &cookie)) > > return -ENODEV; > > > > -#ifdef CONFIG_ARM64 > > - /* > > - * With 32-bit systems being limited by the 32-bit representation of > > - * mmap2's pgoffset field, we need to make the MMIO offset arch > > - * specific. This converts a user MMIO offset into something the kernel > > - * driver understands. > > - */ > > - if (test_tsk_thread_flag(current, TIF_32BIT) && > > - offset >= DRM_PANTHOR_USER_MMIO_OFFSET_32BIT) { > > - offset += DRM_PANTHOR_USER_MMIO_OFFSET_64BIT - > > - DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; > > + /* Adjust the user MMIO to match the offset used kernel side. */ > > + down_read(&pfile->user_mmio.offset_lock); > > + if (offset >= pfile->user_mmio.offset && > > + pfile->user_mmio.offset != DRM_PANTHOR_USER_MMIO_OFFSET) { > > + offset -= pfile->user_mmio.offset; > > + offset += DRM_PANTHOR_USER_MMIO_OFFSET; > > vma->vm_pgoff = offset >> PAGE_SHIFT; > > + pfile->user_mmio.offset_immutable = true; > > } > > -#endif > > + up_read(&pfile->user_mmio.offset_lock); > > I can't help feeling we can just simplify this to: > > u64 mmio_offset = pfile->user_mmio.offset; > > if (offset >= mmio_offset) { > offset -= mmio_offset; > offset += DRM_PANTHOR_USER_MMIO_OFFSET; > vma->vm_pgoff = offset >> PAGE_SHIFT; > > ret = panthor_device_mmap_io(ptdev, vma); > } else { > ret = drm_gem_mmap(filp, vma); > } > > Or even go further and push the offset calculations into > panthor_device_mmap_io(). > > > > > if (offset >= DRM_PANTHOR_USER_MMIO_OFFSET) > > ret = panthor_device_mmap_io(ptdev, vma); > > @@ -1514,6 +1547,7 @@ static void panthor_debugfs_init(struct drm_minor *minor) > > * - 1.2 - adds DEV_QUERY_GROUP_PRIORITIES_INFO query > > * - adds PANTHOR_GROUP_PRIORITY_REALTIME priority > > * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag > > + * - 1.4 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl > > */ > > static const struct drm_driver panthor_drm_driver = { > > .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ | > > @@ -1527,7 +1561,7 @@ static const struct drm_driver panthor_drm_driver = { > > .name = "panthor", > > .desc = "Panthor DRM driver", > > .major = 1, > > - .minor = 3, > > + .minor = 4, > > > > .gem_create_object = panthor_gem_create_object, > > .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table, > > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h > > index 1379a2d4548c..2a16ca86113c 100644 > > --- a/include/uapi/drm/panthor_drm.h > > +++ b/include/uapi/drm/panthor_drm.h > > @@ -127,6 +127,20 @@ enum drm_panthor_ioctl_id { > > > > /** @DRM_PANTHOR_TILER_HEAP_DESTROY: Destroy a tiler heap. */ > > DRM_PANTHOR_TILER_HEAP_DESTROY, > > + > > + /** > > + * @DRM_PANTHOR_SET_USER_MMIO_OFFSET: Set the offset to use as the user MMIO offset. > > + * > > + * The default behavior is to pick the MMIO offset based on the size of the pgoff_t > > + * type seen by the process that manipulates the FD, such that a 32-bit process can > > + * always map the user MMIO ranges. But this approach doesn't work well for emulators > > + * like FEX, where the emulator is an 64-bit binary which might be executing 32-bit > > + * code. In that case, the kernel thinks it's the 64-bit process and assumes > > + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT is in use, but the UMD library expects > > + * DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, because it can't mmap() anything above the > > + * pgoff_t size. > > + */ > > + DRM_PANTHOR_SET_USER_MMIO_OFFSET, > > }; > > > > /** > > @@ -989,6 +1003,22 @@ struct drm_panthor_tiler_heap_destroy { > > __u32 pad; > > }; > > > > +/** > > + * struct drm_panthor_set_user_mmio_offset - Arguments passed to > > + * DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET > > + */ > > +struct drm_panthor_set_user_mmio_offset { > > + /** > > + * @offset: User MMIO offset to use. > > + * > > + * Must be either DRM_PANTHOR_USER_MMIO_OFFSET_32BIT or > > + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT. The common use case is to pass > > + * DRM_PANTHOR_USER_MMIO_OFFSET which picks the right value based on the size of > > + * pgoff_t (AKA unsigned long). > > "The common use case" is not to call this ioctl ;) Although if we were > designing this uAPI from scratch I'd just say require user space to > decide where it wants the MMIO region and not have two offsets to choose > from. I have to say that I'm with Steve here. Can we not actually change the IOCTL to userspace passing an offset mask so that we can restrict the offset range? We don't need all the locking or to let the user space decide the offsets. Best regards, Liviu > > Thanks, > Steve > > > + */ > > + __u64 offset; > > +}; > > + > > /** > > * DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number > > * @__access: Access type. Must be R, W or RW. > > @@ -1031,6 +1061,8 @@ enum { > > DRM_IOCTL_PANTHOR(WR, TILER_HEAP_CREATE, tiler_heap_create), > > DRM_IOCTL_PANTHOR_TILER_HEAP_DESTROY = > > DRM_IOCTL_PANTHOR(WR, TILER_HEAP_DESTROY, tiler_heap_destroy), > > + DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET = > > + DRM_IOCTL_PANTHOR(WR, SET_USER_MMIO_OFFSET, set_user_mmio_offset), > > }; > > > > #if defined(__cplusplus) > -- ==================== | I would like to | | fix the world, | | but they're not | | giving me the | \ source code! / --------------- ¯\_(ツ)_/¯ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/panthor: Fix the user MMIO offset logic for emulators 2025-04-17 11:33 ` Liviu Dudau @ 2025-04-17 12:16 ` Boris Brezillon 2025-04-17 13:05 ` Liviu Dudau 0 siblings, 1 reply; 13+ messages in thread From: Boris Brezillon @ 2025-04-17 12:16 UTC (permalink / raw) To: Liviu Dudau; +Cc: Steven Price, Adrián Larumbe, dri-devel, kernel On Thu, 17 Apr 2025 12:33:01 +0100 Liviu Dudau <liviu.dudau@arm.com> wrote: > On Thu, Apr 17, 2025 at 11:41:18AM +0100, Steven Price wrote: > > On 17/04/2025 11:05, Boris Brezillon wrote: > > > Currently, we pick the MMIO offset based on the size of the pgoff_t > > > type seen by the process that manipulates the FD, such that a 32-bit > > > process can always map the user MMIO ranges. But this approach doesn't > > > work well for emulators like FEX, where the emulator is a 64-bit binary > > > which might be executing 32-bit code. In that case, the kernel thinks > > > it's the 64-bit process and assumes DRM_PANTHOR_USER_MMIO_OFFSET_64BIT > > > is in use, but the UMD library expects DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, > > > because it can't mmap() anything above the pgoff_t size. > > > > I'm not a fan of the FEX behaviour here. I know I won't be popular, but > > can FEX not just handle this difference internally? > > > > > In order to solve that, we need a way to explicitly set the user MMIO > > > offset from the UMD, such that the kernel doesn't have to guess it > > > from the TIF_32BIT flag set on user thread. We keep the old behavior > > > if DRM_PANTHOR_SET_USER_MMIO_OFFSET is never called. > > > > Although I agree this is probably a better uAPI that we should have had > > from the beginning (hindsight and all that!). > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > > --- > > > drivers/gpu/drm/panthor/panthor_device.h | 18 +++++++ > > > drivers/gpu/drm/panthor/panthor_drv.c | 60 +++++++++++++++++++----- > > > include/uapi/drm/panthor_drm.h | 32 +++++++++++++ > > > 3 files changed, 97 insertions(+), 13 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h > > > index 4c27b6d85f46..b97aba89132a 100644 > > > --- a/drivers/gpu/drm/panthor/panthor_device.h > > > +++ b/drivers/gpu/drm/panthor/panthor_device.h > > > @@ -10,6 +10,7 @@ > > > #include <linux/io-pgtable.h> > > > #include <linux/regulator/consumer.h> > > > #include <linux/pm_runtime.h> > > > +#include <linux/rwsem.h> > > > #include <linux/sched.h> > > > #include <linux/spinlock.h> > > > > > > @@ -219,6 +220,23 @@ struct panthor_file { > > > /** @ptdev: Device attached to this file. */ > > > struct panthor_device *ptdev; > > > > > > + /** @user_mmio: User MMIO related fields. */ > > > + struct { > > > + /** @offset: Offset used for user MMIO mappings. */ > > > + u64 offset; > > > + > > > + /** > > > + * @offset_immutable: True if the user MMIO offset became immutable. > > > + * > > > + * Set to true after the first mmap() targeting a page in the user MMIO range. > > > + * After this point, the user MMIO offset can't be changed. > > > + */ > > > + bool offset_immutable; > > > > Do we need this complexity? Does it really matter if user space confuses > > itself by changing the offsets? > > > > > + > > > + /** @offset_lock: Lock used to protect offset changes. */ > > > + struct rw_semaphore offset_lock; > > > > Equally the lock seems slightly overkill - AFAICT user space can only > > harm itself. > > > > > + } user_mmio; > > > + > > > /** @vms: VM pool attached to this file. */ > > > struct panthor_vm_pool *vms; > > > > > > diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c > > > index 7cd131af340d..6a8931492536 100644 > > > --- a/drivers/gpu/drm/panthor/panthor_drv.c > > > +++ b/drivers/gpu/drm/panthor/panthor_drv.c > > > @@ -1336,6 +1336,29 @@ static int panthor_ioctl_vm_get_state(struct drm_device *ddev, void *data, > > > return 0; > > > } > > > > > > +static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev, > > > + void *data, struct drm_file *file) > > > +{ > > > + struct drm_panthor_set_user_mmio_offset *args = data; > > > + struct panthor_file *pfile = file->driver_priv; > > > + int ret; > > > + > > > + if (args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_32BIT && > > > + args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_64BIT) > > > + return -EINVAL; > > > > Note we're not preventing a 32 bit client requesting to use 64 bit > > offsets here. > > > > > + > > > + down_write(&pfile->user_mmio.offset_lock); > > > + if (pfile->user_mmio.offset_immutable) { > > > + ret = pfile->user_mmio.offset != args->offset ? -EINVAL : 0; > > > + } else { > > > + pfile->user_mmio.offset = args->offset; > > > + ret = 0; > > > + } > > > + up_write(&pfile->user_mmio.offset_lock); > > > + > > > + return ret; > > > +} > > > + > > > static int > > > panthor_open(struct drm_device *ddev, struct drm_file *file) > > > { > > > @@ -1353,6 +1376,19 @@ panthor_open(struct drm_device *ddev, struct drm_file *file) > > > } > > > > > > pfile->ptdev = ptdev; > > > + init_rwsem(&pfile->user_mmio.offset_lock); > > > + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET; > > > + > > > +#ifdef CONFIG_ARM64 > > > + /* > > > + * With 32-bit systems being limited by the 32-bit representation of > > > + * mmap2's pgoffset field, we need to make the MMIO offset arch > > > + * specific. > > > + */ > > > + if (test_tsk_thread_flag(current, TIF_32BIT)) > > > + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; > > > +#endif > > > + > > > > > > ret = panthor_vm_pool_create(pfile); > > > if (ret) > > > @@ -1405,6 +1441,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = { > > > PANTHOR_IOCTL(TILER_HEAP_CREATE, tiler_heap_create, DRM_RENDER_ALLOW), > > > PANTHOR_IOCTL(TILER_HEAP_DESTROY, tiler_heap_destroy, DRM_RENDER_ALLOW), > > > PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW), > > > + PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW), > > > }; > > > > > > static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) > > > @@ -1418,20 +1455,16 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) > > > if (!drm_dev_enter(file->minor->dev, &cookie)) > > > return -ENODEV; > > > > > > -#ifdef CONFIG_ARM64 > > > - /* > > > - * With 32-bit systems being limited by the 32-bit representation of > > > - * mmap2's pgoffset field, we need to make the MMIO offset arch > > > - * specific. This converts a user MMIO offset into something the kernel > > > - * driver understands. > > > - */ > > > - if (test_tsk_thread_flag(current, TIF_32BIT) && > > > - offset >= DRM_PANTHOR_USER_MMIO_OFFSET_32BIT) { > > > - offset += DRM_PANTHOR_USER_MMIO_OFFSET_64BIT - > > > - DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; > > > + /* Adjust the user MMIO to match the offset used kernel side. */ > > > + down_read(&pfile->user_mmio.offset_lock); > > > + if (offset >= pfile->user_mmio.offset && > > > + pfile->user_mmio.offset != DRM_PANTHOR_USER_MMIO_OFFSET) { > > > + offset -= pfile->user_mmio.offset; > > > + offset += DRM_PANTHOR_USER_MMIO_OFFSET; > > > vma->vm_pgoff = offset >> PAGE_SHIFT; > > > + pfile->user_mmio.offset_immutable = true; > > > } > > > -#endif > > > + up_read(&pfile->user_mmio.offset_lock); > > > > I can't help feeling we can just simplify this to: > > > > u64 mmio_offset = pfile->user_mmio.offset; > > > > if (offset >= mmio_offset) { > > offset -= mmio_offset; > > offset += DRM_PANTHOR_USER_MMIO_OFFSET; > > vma->vm_pgoff = offset >> PAGE_SHIFT; > > > > ret = panthor_device_mmap_io(ptdev, vma); > > } else { > > ret = drm_gem_mmap(filp, vma); > > } > > > > Or even go further and push the offset calculations into > > panthor_device_mmap_io(). > > > > > > > > if (offset >= DRM_PANTHOR_USER_MMIO_OFFSET) > > > ret = panthor_device_mmap_io(ptdev, vma); > > > @@ -1514,6 +1547,7 @@ static void panthor_debugfs_init(struct drm_minor *minor) > > > * - 1.2 - adds DEV_QUERY_GROUP_PRIORITIES_INFO query > > > * - adds PANTHOR_GROUP_PRIORITY_REALTIME priority > > > * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag > > > + * - 1.4 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl > > > */ > > > static const struct drm_driver panthor_drm_driver = { > > > .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ | > > > @@ -1527,7 +1561,7 @@ static const struct drm_driver panthor_drm_driver = { > > > .name = "panthor", > > > .desc = "Panthor DRM driver", > > > .major = 1, > > > - .minor = 3, > > > + .minor = 4, > > > > > > .gem_create_object = panthor_gem_create_object, > > > .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table, > > > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h > > > index 1379a2d4548c..2a16ca86113c 100644 > > > --- a/include/uapi/drm/panthor_drm.h > > > +++ b/include/uapi/drm/panthor_drm.h > > > @@ -127,6 +127,20 @@ enum drm_panthor_ioctl_id { > > > > > > /** @DRM_PANTHOR_TILER_HEAP_DESTROY: Destroy a tiler heap. */ > > > DRM_PANTHOR_TILER_HEAP_DESTROY, > > > + > > > + /** > > > + * @DRM_PANTHOR_SET_USER_MMIO_OFFSET: Set the offset to use as the user MMIO offset. > > > + * > > > + * The default behavior is to pick the MMIO offset based on the size of the pgoff_t > > > + * type seen by the process that manipulates the FD, such that a 32-bit process can > > > + * always map the user MMIO ranges. But this approach doesn't work well for emulators > > > + * like FEX, where the emulator is an 64-bit binary which might be executing 32-bit > > > + * code. In that case, the kernel thinks it's the 64-bit process and assumes > > > + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT is in use, but the UMD library expects > > > + * DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, because it can't mmap() anything above the > > > + * pgoff_t size. > > > + */ > > > + DRM_PANTHOR_SET_USER_MMIO_OFFSET, > > > }; > > > > > > /** > > > @@ -989,6 +1003,22 @@ struct drm_panthor_tiler_heap_destroy { > > > __u32 pad; > > > }; > > > > > > +/** > > > + * struct drm_panthor_set_user_mmio_offset - Arguments passed to > > > + * DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET > > > + */ > > > +struct drm_panthor_set_user_mmio_offset { > > > + /** > > > + * @offset: User MMIO offset to use. > > > + * > > > + * Must be either DRM_PANTHOR_USER_MMIO_OFFSET_32BIT or > > > + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT. The common use case is to pass > > > + * DRM_PANTHOR_USER_MMIO_OFFSET which picks the right value based on the size of > > > + * pgoff_t (AKA unsigned long). > > > > "The common use case" is not to call this ioctl ;) Although if we were > > designing this uAPI from scratch I'd just say require user space to > > decide where it wants the MMIO region and not have two offsets to choose > > from. > > I have to say that I'm with Steve here. Can we not actually change the IOCTL to userspace > passing an offset mask so that we can restrict the offset range? We don't need all the > locking or to let the user space decide the offsets. For the reasons explained in my reply to Steve, I think I'd prefer to have very restrictive constraints first and relax them once we have a need for random MMIO offsets. I mean, once that need arises, we'll have to update userspace binaries anyway, and bumping the KMD version is pretty trivial, so, better safe than sorry. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/panthor: Fix the user MMIO offset logic for emulators 2025-04-17 12:16 ` Boris Brezillon @ 2025-04-17 13:05 ` Liviu Dudau 0 siblings, 0 replies; 13+ messages in thread From: Liviu Dudau @ 2025-04-17 13:05 UTC (permalink / raw) To: Boris Brezillon; +Cc: Steven Price, Adrián Larumbe, dri-devel, kernel On Thu, Apr 17, 2025 at 02:16:11PM +0200, Boris Brezillon wrote: > On Thu, 17 Apr 2025 12:33:01 +0100 > Liviu Dudau <liviu.dudau@arm.com> wrote: > > > On Thu, Apr 17, 2025 at 11:41:18AM +0100, Steven Price wrote: > > > On 17/04/2025 11:05, Boris Brezillon wrote: > > > > Currently, we pick the MMIO offset based on the size of the pgoff_t > > > > type seen by the process that manipulates the FD, such that a 32-bit > > > > process can always map the user MMIO ranges. But this approach doesn't > > > > work well for emulators like FEX, where the emulator is a 64-bit binary > > > > which might be executing 32-bit code. In that case, the kernel thinks > > > > it's the 64-bit process and assumes DRM_PANTHOR_USER_MMIO_OFFSET_64BIT > > > > is in use, but the UMD library expects DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, > > > > because it can't mmap() anything above the pgoff_t size. > > > > > > I'm not a fan of the FEX behaviour here. I know I won't be popular, but > > > can FEX not just handle this difference internally? > > > > > > > In order to solve that, we need a way to explicitly set the user MMIO > > > > offset from the UMD, such that the kernel doesn't have to guess it > > > > from the TIF_32BIT flag set on user thread. We keep the old behavior > > > > if DRM_PANTHOR_SET_USER_MMIO_OFFSET is never called. > > > > > > Although I agree this is probably a better uAPI that we should have had > > > from the beginning (hindsight and all that!). > > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > > > --- > > > > drivers/gpu/drm/panthor/panthor_device.h | 18 +++++++ > > > > drivers/gpu/drm/panthor/panthor_drv.c | 60 +++++++++++++++++++----- > > > > include/uapi/drm/panthor_drm.h | 32 +++++++++++++ > > > > 3 files changed, 97 insertions(+), 13 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h > > > > index 4c27b6d85f46..b97aba89132a 100644 > > > > --- a/drivers/gpu/drm/panthor/panthor_device.h > > > > +++ b/drivers/gpu/drm/panthor/panthor_device.h > > > > @@ -10,6 +10,7 @@ > > > > #include <linux/io-pgtable.h> > > > > #include <linux/regulator/consumer.h> > > > > #include <linux/pm_runtime.h> > > > > +#include <linux/rwsem.h> > > > > #include <linux/sched.h> > > > > #include <linux/spinlock.h> > > > > > > > > @@ -219,6 +220,23 @@ struct panthor_file { > > > > /** @ptdev: Device attached to this file. */ > > > > struct panthor_device *ptdev; > > > > > > > > + /** @user_mmio: User MMIO related fields. */ > > > > + struct { > > > > + /** @offset: Offset used for user MMIO mappings. */ > > > > + u64 offset; > > > > + > > > > + /** > > > > + * @offset_immutable: True if the user MMIO offset became immutable. > > > > + * > > > > + * Set to true after the first mmap() targeting a page in the user MMIO range. > > > > + * After this point, the user MMIO offset can't be changed. > > > > + */ > > > > + bool offset_immutable; > > > > > > Do we need this complexity? Does it really matter if user space confuses > > > itself by changing the offsets? > > > > > > > + > > > > + /** @offset_lock: Lock used to protect offset changes. */ > > > > + struct rw_semaphore offset_lock; > > > > > > Equally the lock seems slightly overkill - AFAICT user space can only > > > harm itself. > > > > > > > + } user_mmio; > > > > + > > > > /** @vms: VM pool attached to this file. */ > > > > struct panthor_vm_pool *vms; > > > > > > > > diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c > > > > index 7cd131af340d..6a8931492536 100644 > > > > --- a/drivers/gpu/drm/panthor/panthor_drv.c > > > > +++ b/drivers/gpu/drm/panthor/panthor_drv.c > > > > @@ -1336,6 +1336,29 @@ static int panthor_ioctl_vm_get_state(struct drm_device *ddev, void *data, > > > > return 0; > > > > } > > > > > > > > +static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev, > > > > + void *data, struct drm_file *file) > > > > +{ > > > > + struct drm_panthor_set_user_mmio_offset *args = data; > > > > + struct panthor_file *pfile = file->driver_priv; > > > > + int ret; > > > > + > > > > + if (args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_32BIT && > > > > + args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_64BIT) > > > > + return -EINVAL; > > > > > > Note we're not preventing a 32 bit client requesting to use 64 bit > > > offsets here. > > > > > > > + > > > > + down_write(&pfile->user_mmio.offset_lock); > > > > + if (pfile->user_mmio.offset_immutable) { > > > > + ret = pfile->user_mmio.offset != args->offset ? -EINVAL : 0; > > > > + } else { > > > > + pfile->user_mmio.offset = args->offset; > > > > + ret = 0; > > > > + } > > > > + up_write(&pfile->user_mmio.offset_lock); > > > > + > > > > + return ret; > > > > +} > > > > + > > > > static int > > > > panthor_open(struct drm_device *ddev, struct drm_file *file) > > > > { > > > > @@ -1353,6 +1376,19 @@ panthor_open(struct drm_device *ddev, struct drm_file *file) > > > > } > > > > > > > > pfile->ptdev = ptdev; > > > > + init_rwsem(&pfile->user_mmio.offset_lock); > > > > + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET; > > > > + > > > > +#ifdef CONFIG_ARM64 > > > > + /* > > > > + * With 32-bit systems being limited by the 32-bit representation of > > > > + * mmap2's pgoffset field, we need to make the MMIO offset arch > > > > + * specific. > > > > + */ > > > > + if (test_tsk_thread_flag(current, TIF_32BIT)) > > > > + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; > > > > +#endif > > > > + > > > > > > > > ret = panthor_vm_pool_create(pfile); > > > > if (ret) > > > > @@ -1405,6 +1441,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = { > > > > PANTHOR_IOCTL(TILER_HEAP_CREATE, tiler_heap_create, DRM_RENDER_ALLOW), > > > > PANTHOR_IOCTL(TILER_HEAP_DESTROY, tiler_heap_destroy, DRM_RENDER_ALLOW), > > > > PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW), > > > > + PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW), > > > > }; > > > > > > > > static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) > > > > @@ -1418,20 +1455,16 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) > > > > if (!drm_dev_enter(file->minor->dev, &cookie)) > > > > return -ENODEV; > > > > > > > > -#ifdef CONFIG_ARM64 > > > > - /* > > > > - * With 32-bit systems being limited by the 32-bit representation of > > > > - * mmap2's pgoffset field, we need to make the MMIO offset arch > > > > - * specific. This converts a user MMIO offset into something the kernel > > > > - * driver understands. > > > > - */ > > > > - if (test_tsk_thread_flag(current, TIF_32BIT) && > > > > - offset >= DRM_PANTHOR_USER_MMIO_OFFSET_32BIT) { > > > > - offset += DRM_PANTHOR_USER_MMIO_OFFSET_64BIT - > > > > - DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; > > > > + /* Adjust the user MMIO to match the offset used kernel side. */ > > > > + down_read(&pfile->user_mmio.offset_lock); > > > > + if (offset >= pfile->user_mmio.offset && > > > > + pfile->user_mmio.offset != DRM_PANTHOR_USER_MMIO_OFFSET) { > > > > + offset -= pfile->user_mmio.offset; > > > > + offset += DRM_PANTHOR_USER_MMIO_OFFSET; > > > > vma->vm_pgoff = offset >> PAGE_SHIFT; > > > > + pfile->user_mmio.offset_immutable = true; > > > > } > > > > -#endif > > > > + up_read(&pfile->user_mmio.offset_lock); > > > > > > I can't help feeling we can just simplify this to: > > > > > > u64 mmio_offset = pfile->user_mmio.offset; > > > > > > if (offset >= mmio_offset) { > > > offset -= mmio_offset; > > > offset += DRM_PANTHOR_USER_MMIO_OFFSET; > > > vma->vm_pgoff = offset >> PAGE_SHIFT; > > > > > > ret = panthor_device_mmap_io(ptdev, vma); > > > } else { > > > ret = drm_gem_mmap(filp, vma); > > > } > > > > > > Or even go further and push the offset calculations into > > > panthor_device_mmap_io(). > > > > > > > > > > > if (offset >= DRM_PANTHOR_USER_MMIO_OFFSET) > > > > ret = panthor_device_mmap_io(ptdev, vma); > > > > @@ -1514,6 +1547,7 @@ static void panthor_debugfs_init(struct drm_minor *minor) > > > > * - 1.2 - adds DEV_QUERY_GROUP_PRIORITIES_INFO query > > > > * - adds PANTHOR_GROUP_PRIORITY_REALTIME priority > > > > * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag > > > > + * - 1.4 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl > > > > */ > > > > static const struct drm_driver panthor_drm_driver = { > > > > .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ | > > > > @@ -1527,7 +1561,7 @@ static const struct drm_driver panthor_drm_driver = { > > > > .name = "panthor", > > > > .desc = "Panthor DRM driver", > > > > .major = 1, > > > > - .minor = 3, > > > > + .minor = 4, > > > > > > > > .gem_create_object = panthor_gem_create_object, > > > > .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table, > > > > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h > > > > index 1379a2d4548c..2a16ca86113c 100644 > > > > --- a/include/uapi/drm/panthor_drm.h > > > > +++ b/include/uapi/drm/panthor_drm.h > > > > @@ -127,6 +127,20 @@ enum drm_panthor_ioctl_id { > > > > > > > > /** @DRM_PANTHOR_TILER_HEAP_DESTROY: Destroy a tiler heap. */ > > > > DRM_PANTHOR_TILER_HEAP_DESTROY, > > > > + > > > > + /** > > > > + * @DRM_PANTHOR_SET_USER_MMIO_OFFSET: Set the offset to use as the user MMIO offset. > > > > + * > > > > + * The default behavior is to pick the MMIO offset based on the size of the pgoff_t > > > > + * type seen by the process that manipulates the FD, such that a 32-bit process can > > > > + * always map the user MMIO ranges. But this approach doesn't work well for emulators > > > > + * like FEX, where the emulator is an 64-bit binary which might be executing 32-bit > > > > + * code. In that case, the kernel thinks it's the 64-bit process and assumes > > > > + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT is in use, but the UMD library expects > > > > + * DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, because it can't mmap() anything above the > > > > + * pgoff_t size. > > > > + */ > > > > + DRM_PANTHOR_SET_USER_MMIO_OFFSET, > > > > }; > > > > > > > > /** > > > > @@ -989,6 +1003,22 @@ struct drm_panthor_tiler_heap_destroy { > > > > __u32 pad; > > > > }; > > > > > > > > +/** > > > > + * struct drm_panthor_set_user_mmio_offset - Arguments passed to > > > > + * DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET > > > > + */ > > > > +struct drm_panthor_set_user_mmio_offset { > > > > + /** > > > > + * @offset: User MMIO offset to use. > > > > + * > > > > + * Must be either DRM_PANTHOR_USER_MMIO_OFFSET_32BIT or > > > > + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT. The common use case is to pass > > > > + * DRM_PANTHOR_USER_MMIO_OFFSET which picks the right value based on the size of > > > > + * pgoff_t (AKA unsigned long). > > > > > > "The common use case" is not to call this ioctl ;) Although if we were > > > designing this uAPI from scratch I'd just say require user space to > > > decide where it wants the MMIO region and not have two offsets to choose > > > from. > > > > I have to say that I'm with Steve here. Can we not actually change the IOCTL to userspace > > passing an offset mask so that we can restrict the offset range? We don't need all the > > locking or to let the user space decide the offsets. > > For the reasons explained in my reply to Steve, I think I'd prefer to > have very restrictive constraints first and relax them once we have a > need for random MMIO offsets. I mean, once that need arises, we'll > have to update userspace binaries anyway, and bumping the KMD version > is pretty trivial, so, better safe than sorry. Sorry, I've linked myself to the wrong part of the reply from Steve. What I'm trying to say is that I would change the whole patch into adding a new IOCTL that sets a mask for the offsets returned by DRM_IOCTL_PANTHOR_BO_MMAP_OFFSET if they are in the user MMIO range. Best regards, Liviu -- ==================== | I would like to | | fix the world, | | but they're not | | giving me the | \ source code! / --------------- ¯\_(ツ)_/¯ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/panthor: Fix the user MMIO offset logic for emulators 2025-04-17 10:41 ` Steven Price 2025-04-17 11:33 ` Liviu Dudau @ 2025-04-17 12:10 ` Boris Brezillon 2025-04-17 12:49 ` Boris Brezillon 2025-04-17 14:58 ` Steven Price 1 sibling, 2 replies; 13+ messages in thread From: Boris Brezillon @ 2025-04-17 12:10 UTC (permalink / raw) To: Steven Price; +Cc: Liviu Dudau, Adrián Larumbe, dri-devel, kernel On Thu, 17 Apr 2025 11:41:18 +0100 Steven Price <steven.price@arm.com> wrote: > On 17/04/2025 11:05, Boris Brezillon wrote: > > Currently, we pick the MMIO offset based on the size of the pgoff_t > > type seen by the process that manipulates the FD, such that a 32-bit > > process can always map the user MMIO ranges. But this approach doesn't > > work well for emulators like FEX, where the emulator is a 64-bit binary > > which might be executing 32-bit code. In that case, the kernel thinks > > it's the 64-bit process and assumes DRM_PANTHOR_USER_MMIO_OFFSET_64BIT > > is in use, but the UMD library expects DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, > > because it can't mmap() anything above the pgoff_t size. > > I'm not a fan of the FEX behaviour here. I know I won't be popular, but > can FEX not just handle this difference internally? The only way it could do that is by intercepting the mmap() calls and adjusting the offset if the dev file is backed by a panthor device and the emulated arch is x86-32. Besides, every userspace emulator would have to do the same if they want to execute an x86-32 mesa lib, so it feels like the wrong place to address the problem. > > > In order to solve that, we need a way to explicitly set the user MMIO > > offset from the UMD, such that the kernel doesn't have to guess it > > from the TIF_32BIT flag set on user thread. We keep the old behavior > > if DRM_PANTHOR_SET_USER_MMIO_OFFSET is never called. > > Although I agree this is probably a better uAPI that we should have had > from the beginning (hindsight and all that!). > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > --- > > drivers/gpu/drm/panthor/panthor_device.h | 18 +++++++ > > drivers/gpu/drm/panthor/panthor_drv.c | 60 +++++++++++++++++++----- > > include/uapi/drm/panthor_drm.h | 32 +++++++++++++ > > 3 files changed, 97 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h > > index 4c27b6d85f46..b97aba89132a 100644 > > --- a/drivers/gpu/drm/panthor/panthor_device.h > > +++ b/drivers/gpu/drm/panthor/panthor_device.h > > @@ -10,6 +10,7 @@ > > #include <linux/io-pgtable.h> > > #include <linux/regulator/consumer.h> > > #include <linux/pm_runtime.h> > > +#include <linux/rwsem.h> > > #include <linux/sched.h> > > #include <linux/spinlock.h> > > > > @@ -219,6 +220,23 @@ struct panthor_file { > > /** @ptdev: Device attached to this file. */ > > struct panthor_device *ptdev; > > > > + /** @user_mmio: User MMIO related fields. */ > > + struct { > > + /** @offset: Offset used for user MMIO mappings. */ > > + u64 offset; > > + > > + /** > > + * @offset_immutable: True if the user MMIO offset became immutable. > > + * > > + * Set to true after the first mmap() targeting a page in the user MMIO range. > > + * After this point, the user MMIO offset can't be changed. > > + */ > > + bool offset_immutable; > > Do we need this complexity? Does it really matter if user space confuses > itself by changing the offsets? I initially thought we had a few if (is_mmio_offset) do_something else do_something_else conditionals happening after the mmap() call (like when we need to evict an mmio-mmap-ed region on suspend), but it turns out those are done with the fixed MMIO_OFFSET that's seen by the driver. My main worry here is if we start introducing such tests, and userspace starts moving things behind our back, leading to potential invalid branching in kernel space, which we know is never good. That's the very reason I wanted this offset to be set in stone as soon as the first mmio mmap() is done (maybe it should even be the first mmap even). I guess if we make it very clear in the doc that no test should happen on this field apart from the one done in panthor_mmap(), and this test is done on a local variable with a READ_ONCE() to ensure we don't deref the field twice (once for the offset >= mmio_offset, and once for the offset adjustment), we can probably drop the lock and the immutable bool. > > > + > > + /** @offset_lock: Lock used to protect offset changes. */ > > + struct rw_semaphore offset_lock; > > Equally the lock seems slightly overkill - AFAICT user space can only > harm itself. If we want to ensure things are not moving behind our back, we need the lock. Of course, if we drop offset_immutable and let userspace change the offset at any point, that's pointless. > > > + } user_mmio; > > + > > /** @vms: VM pool attached to this file. */ > > struct panthor_vm_pool *vms; > > > > diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c > > index 7cd131af340d..6a8931492536 100644 > > --- a/drivers/gpu/drm/panthor/panthor_drv.c > > +++ b/drivers/gpu/drm/panthor/panthor_drv.c > > @@ -1336,6 +1336,29 @@ static int panthor_ioctl_vm_get_state(struct drm_device *ddev, void *data, > > return 0; > > } > > > > +static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev, > > + void *data, struct drm_file *file) > > +{ > > + struct drm_panthor_set_user_mmio_offset *args = data; > > + struct panthor_file *pfile = file->driver_priv; > > + int ret; > > + > > + if (args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_32BIT && > > + args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_64BIT) > > + return -EINVAL; > > Note we're not preventing a 32 bit client requesting to use 64 bit > offsets here. I know. What's likely to happen is that the first mmap() on the MMIO region is going to fail or point to a user BO, which only hurts the client itself. I guess I can check the TIF bit here, but that won't prevent an emulated x86 binary setting a 64-bit offset, and there's no way I can ensure it's picking the right offset, which is why I strongly suggest users of this ioctl() to always pass DRM_PANTHOR_USER_MMIO_OFFSET, not DRM_PANTHOR_USER_MMIO_OFFSET_{32,64}BIT, in the doc. > > > + > > + down_write(&pfile->user_mmio.offset_lock); > > + if (pfile->user_mmio.offset_immutable) { > > + ret = pfile->user_mmio.offset != args->offset ? -EINVAL : 0; > > + } else { > > + pfile->user_mmio.offset = args->offset; > > + ret = 0; > > + } > > + up_write(&pfile->user_mmio.offset_lock); > > + > > + return ret; > > +} > > + > > static int > > panthor_open(struct drm_device *ddev, struct drm_file *file) > > { > > @@ -1353,6 +1376,19 @@ panthor_open(struct drm_device *ddev, struct drm_file *file) > > } > > > > pfile->ptdev = ptdev; > > + init_rwsem(&pfile->user_mmio.offset_lock); > > + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET; > > + > > +#ifdef CONFIG_ARM64 > > + /* > > + * With 32-bit systems being limited by the 32-bit representation of > > + * mmap2's pgoffset field, we need to make the MMIO offset arch > > + * specific. > > + */ > > + if (test_tsk_thread_flag(current, TIF_32BIT)) > > + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; > > +#endif > > + > > > > ret = panthor_vm_pool_create(pfile); > > if (ret) > > @@ -1405,6 +1441,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = { > > PANTHOR_IOCTL(TILER_HEAP_CREATE, tiler_heap_create, DRM_RENDER_ALLOW), > > PANTHOR_IOCTL(TILER_HEAP_DESTROY, tiler_heap_destroy, DRM_RENDER_ALLOW), > > PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW), > > + PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW), > > }; > > > > static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) > > @@ -1418,20 +1455,16 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) > > if (!drm_dev_enter(file->minor->dev, &cookie)) > > return -ENODEV; > > > > -#ifdef CONFIG_ARM64 > > - /* > > - * With 32-bit systems being limited by the 32-bit representation of > > - * mmap2's pgoffset field, we need to make the MMIO offset arch > > - * specific. This converts a user MMIO offset into something the kernel > > - * driver understands. > > - */ > > - if (test_tsk_thread_flag(current, TIF_32BIT) && > > - offset >= DRM_PANTHOR_USER_MMIO_OFFSET_32BIT) { > > - offset += DRM_PANTHOR_USER_MMIO_OFFSET_64BIT - > > - DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; > > + /* Adjust the user MMIO to match the offset used kernel side. */ > > + down_read(&pfile->user_mmio.offset_lock); > > + if (offset >= pfile->user_mmio.offset && > > + pfile->user_mmio.offset != DRM_PANTHOR_USER_MMIO_OFFSET) { > > + offset -= pfile->user_mmio.offset; > > + offset += DRM_PANTHOR_USER_MMIO_OFFSET; > > vma->vm_pgoff = offset >> PAGE_SHIFT; > > + pfile->user_mmio.offset_immutable = true; > > } > > -#endif > > + up_read(&pfile->user_mmio.offset_lock); > > I can't help feeling we can just simplify this to: > > u64 mmio_offset = pfile->user_mmio.offset; > > if (offset >= mmio_offset) { > offset -= mmio_offset; > offset += DRM_PANTHOR_USER_MMIO_OFFSET; > vma->vm_pgoff = offset >> PAGE_SHIFT; > > ret = panthor_device_mmap_io(ptdev, vma); > } else { > ret = drm_gem_mmap(filp, vma); > } > > Or even go further and push the offset calculations into > panthor_device_mmap_io(). Sure, I can do that. > > > > > if (offset >= DRM_PANTHOR_USER_MMIO_OFFSET) > > ret = panthor_device_mmap_io(ptdev, vma); > > @@ -1514,6 +1547,7 @@ static void panthor_debugfs_init(struct drm_minor *minor) > > * - 1.2 - adds DEV_QUERY_GROUP_PRIORITIES_INFO query > > * - adds PANTHOR_GROUP_PRIORITY_REALTIME priority > > * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag > > + * - 1.4 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl > > */ > > static const struct drm_driver panthor_drm_driver = { > > .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ | > > @@ -1527,7 +1561,7 @@ static const struct drm_driver panthor_drm_driver = { > > .name = "panthor", > > .desc = "Panthor DRM driver", > > .major = 1, > > - .minor = 3, > > + .minor = 4, > > > > .gem_create_object = panthor_gem_create_object, > > .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table, > > diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h > > index 1379a2d4548c..2a16ca86113c 100644 > > --- a/include/uapi/drm/panthor_drm.h > > +++ b/include/uapi/drm/panthor_drm.h > > @@ -127,6 +127,20 @@ enum drm_panthor_ioctl_id { > > > > /** @DRM_PANTHOR_TILER_HEAP_DESTROY: Destroy a tiler heap. */ > > DRM_PANTHOR_TILER_HEAP_DESTROY, > > + > > + /** > > + * @DRM_PANTHOR_SET_USER_MMIO_OFFSET: Set the offset to use as the user MMIO offset. > > + * > > + * The default behavior is to pick the MMIO offset based on the size of the pgoff_t > > + * type seen by the process that manipulates the FD, such that a 32-bit process can > > + * always map the user MMIO ranges. But this approach doesn't work well for emulators > > + * like FEX, where the emulator is an 64-bit binary which might be executing 32-bit > > + * code. In that case, the kernel thinks it's the 64-bit process and assumes > > + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT is in use, but the UMD library expects > > + * DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, because it can't mmap() anything above the > > + * pgoff_t size. > > + */ > > + DRM_PANTHOR_SET_USER_MMIO_OFFSET, > > }; > > > > /** > > @@ -989,6 +1003,22 @@ struct drm_panthor_tiler_heap_destroy { > > __u32 pad; > > }; > > > > +/** > > + * struct drm_panthor_set_user_mmio_offset - Arguments passed to > > + * DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET > > + */ > > +struct drm_panthor_set_user_mmio_offset { > > + /** > > + * @offset: User MMIO offset to use. > > + * > > + * Must be either DRM_PANTHOR_USER_MMIO_OFFSET_32BIT or > > + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT. The common use case is to pass > > + * DRM_PANTHOR_USER_MMIO_OFFSET which picks the right value based on the size of > > + * pgoff_t (AKA unsigned long). > > "The common use case" is not to call this ioctl ;). Fair enough, I'll mention in the doc that this is mostly useful for emulators. > Although if we were > designing this uAPI from scratch I'd just say require user space to > decide where it wants the MMIO region and not have two offsets to choose > from. Actually we also need to make sure it doesn't overlap with the range reserved for GEM mappings [DRM_FILE_PAGE_OFFSET_START:DRM_FILE_PAGE_OFFSET_START+DRM_FILE_PAGE_OFFSET_SIZE], so at the time I just figured I'd assign fixed offsets that are high enough in the range covered by pgoff_t and fix that in stone. I'm not seeing any value in letting the user select the offset here, but maybe I'm missing something. BTW, I quite like the DRM_PANTHOR_USER_MMIO_OFFSET macro making this decision for the caller, because I can't think of any good reason for 64-bit userspace deciding to use DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, and 32-bit userspace just can't use DRM_PANTHOR_USER_MMIO_OFFSET_64BIT as an offset, otherwise it won't be able to mmap() MMIO stuff. The only case where we'd want an explicit OFFSET_32/64BIT is if a 64bit/32bit lib/bin takes the decision for a lib that has a different instruction size. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/panthor: Fix the user MMIO offset logic for emulators 2025-04-17 12:10 ` Boris Brezillon @ 2025-04-17 12:49 ` Boris Brezillon 2025-04-17 14:58 ` Steven Price 1 sibling, 0 replies; 13+ messages in thread From: Boris Brezillon @ 2025-04-17 12:49 UTC (permalink / raw) To: Steven Price; +Cc: Liviu Dudau, Adrián Larumbe, dri-devel, kernel On Thu, 17 Apr 2025 14:10:31 +0200 Boris Brezillon <boris.brezillon@collabora.com> wrote: > > I can't help feeling we can just simplify this to: > > > > u64 mmio_offset = pfile->user_mmio.offset; > > > > if (offset >= mmio_offset) { > > offset -= mmio_offset; > > offset += DRM_PANTHOR_USER_MMIO_OFFSET; > > vma->vm_pgoff = offset >> PAGE_SHIFT; > > > > ret = panthor_device_mmap_io(ptdev, vma); > > } else { > > ret = drm_gem_mmap(filp, vma); > > } > > > > Or even go further and push the offset calculations into > > panthor_device_mmap_io(). > > Sure, I can do that. Actually, if I drop the lock in favor of a u64 mmio_offset = READ_ONCE(pfile->user_mmio.offset); I can't move the vm_pgoff/offset adjustement to panthor_device_mmap_io() because userspace might have called SET_MMIO_OFFSET in the meantime, thus changing the final offset. One option would be to pass the mmio_offset to panthor_device_mmap_io(), but I think I prefer keeping the offset adjustment here if you don't mind. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] drm/panthor: Fix the user MMIO offset logic for emulators 2025-04-17 12:10 ` Boris Brezillon 2025-04-17 12:49 ` Boris Brezillon @ 2025-04-17 14:58 ` Steven Price 1 sibling, 0 replies; 13+ messages in thread From: Steven Price @ 2025-04-17 14:58 UTC (permalink / raw) To: Boris Brezillon; +Cc: Liviu Dudau, Adrián Larumbe, dri-devel, kernel On 17/04/2025 13:10, Boris Brezillon wrote: > On Thu, 17 Apr 2025 11:41:18 +0100 > Steven Price <steven.price@arm.com> wrote: > >> On 17/04/2025 11:05, Boris Brezillon wrote: >>> Currently, we pick the MMIO offset based on the size of the pgoff_t >>> type seen by the process that manipulates the FD, such that a 32-bit >>> process can always map the user MMIO ranges. But this approach doesn't >>> work well for emulators like FEX, where the emulator is a 64-bit binary >>> which might be executing 32-bit code. In that case, the kernel thinks >>> it's the 64-bit process and assumes DRM_PANTHOR_USER_MMIO_OFFSET_64BIT >>> is in use, but the UMD library expects DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, >>> because it can't mmap() anything above the pgoff_t size. >> >> I'm not a fan of the FEX behaviour here. I know I won't be popular, but >> can FEX not just handle this difference internally? > > The only way it could do that is by intercepting the mmap() calls and > adjusting the offset if the dev file is backed by a panthor device and > the emulated arch is x86-32. Besides, every userspace emulator would > have to do the same if they want to execute an x86-32 mesa lib, so it > feels like the wrong place to address the problem. Yeah, I've had that argument before - my view is that if you try to call 64 bit ioctls from code which thinks it's 32 bit then you get to keep the pieces. Ultimately attempting to pass through x86-32 calls to arm64 is going to be painful. I'd much prefer there was some attempt to virtualise devices in user space (i.e. x86 Mesa talks to a virtual GPU implemented in arm64 code which uses arm64 Mesa to talk to the real GPU). However I'll accept the MMIO offset is something that's bad uAPI in panthor - I know I made the mistake of assuming 32 bit wasn't important for a new driver. So let's see if we can improve things kernel side. >> >>> In order to solve that, we need a way to explicitly set the user MMIO >>> offset from the UMD, such that the kernel doesn't have to guess it >>> from the TIF_32BIT flag set on user thread. We keep the old behavior >>> if DRM_PANTHOR_SET_USER_MMIO_OFFSET is never called. >> >> Although I agree this is probably a better uAPI that we should have had >> from the beginning (hindsight and all that!). >> >>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> >>> --- >>> drivers/gpu/drm/panthor/panthor_device.h | 18 +++++++ >>> drivers/gpu/drm/panthor/panthor_drv.c | 60 +++++++++++++++++++----- >>> include/uapi/drm/panthor_drm.h | 32 +++++++++++++ >>> 3 files changed, 97 insertions(+), 13 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h >>> index 4c27b6d85f46..b97aba89132a 100644 >>> --- a/drivers/gpu/drm/panthor/panthor_device.h >>> +++ b/drivers/gpu/drm/panthor/panthor_device.h >>> @@ -10,6 +10,7 @@ >>> #include <linux/io-pgtable.h> >>> #include <linux/regulator/consumer.h> >>> #include <linux/pm_runtime.h> >>> +#include <linux/rwsem.h> >>> #include <linux/sched.h> >>> #include <linux/spinlock.h> >>> >>> @@ -219,6 +220,23 @@ struct panthor_file { >>> /** @ptdev: Device attached to this file. */ >>> struct panthor_device *ptdev; >>> >>> + /** @user_mmio: User MMIO related fields. */ >>> + struct { >>> + /** @offset: Offset used for user MMIO mappings. */ >>> + u64 offset; >>> + >>> + /** >>> + * @offset_immutable: True if the user MMIO offset became immutable. >>> + * >>> + * Set to true after the first mmap() targeting a page in the user MMIO range. >>> + * After this point, the user MMIO offset can't be changed. >>> + */ >>> + bool offset_immutable; >> >> Do we need this complexity? Does it really matter if user space confuses >> itself by changing the offsets? > > I initially thought we had a few > > if (is_mmio_offset) > do_something > else > do_something_else > > conditionals happening after the mmap() call (like when we need to evict > an mmio-mmap-ed region on suspend), but it turns out those are done with > the fixed MMIO_OFFSET that's seen by the driver. > > My main worry here is if we start introducing such tests, and userspace > starts moving things behind our back, leading to potential invalid > branching in kernel space, which we know is never good. That's the > very reason I wanted this offset to be set in stone as soon as the first > mmio mmap() is done (maybe it should even be the first mmap even). I > guess if we make it very clear in the doc that no test should happen on > this field apart from the one done in panthor_mmap(), and this test is > done on a local variable with a READ_ONCE() to ensure we don't deref > the field twice (once for the offset >= mmio_offset, and once for the > offset adjustment), we can probably drop the lock and the immutable > bool. Personally I do see the MMIO region as a hack for mmap(). Ideally we'd have a separate mmap-like function for doing the MMIO. But hiding mmap behind an ioctl breaks things like valgrind, and having to have a whole extra file descriptor just for (current) just one magic region would be overkill. Hence we have a 'this is higher than you will need so we've repurposed it' hack. So if we allow user space to choose this "higher than you will need" address we permit user space to shoot itself in the foot, but assuming it knows what it's doing it can make use of it. So in this case we have a 64 bit user space which is promising that it's only going to use 32 bit addresses. I agree a comment explaining that this check should only happen in panthor_mmap() would be good as a reminder our future selfs. >> >>> + >>> + /** @offset_lock: Lock used to protect offset changes. */ >>> + struct rw_semaphore offset_lock; >> >> Equally the lock seems slightly overkill - AFAICT user space can only >> harm itself. > > If we want to ensure things are not moving behind our back, we need the > lock. Of course, if we drop offset_immutable and let userspace change > the offset at any point, that's pointless. > >> >>> + } user_mmio; >>> + >>> /** @vms: VM pool attached to this file. */ >>> struct panthor_vm_pool *vms; >>> >>> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c >>> index 7cd131af340d..6a8931492536 100644 >>> --- a/drivers/gpu/drm/panthor/panthor_drv.c >>> +++ b/drivers/gpu/drm/panthor/panthor_drv.c >>> @@ -1336,6 +1336,29 @@ static int panthor_ioctl_vm_get_state(struct drm_device *ddev, void *data, >>> return 0; >>> } >>> >>> +static int panthor_ioctl_set_user_mmio_offset(struct drm_device *ddev, >>> + void *data, struct drm_file *file) >>> +{ >>> + struct drm_panthor_set_user_mmio_offset *args = data; >>> + struct panthor_file *pfile = file->driver_priv; >>> + int ret; >>> + >>> + if (args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_32BIT && >>> + args->offset != DRM_PANTHOR_USER_MMIO_OFFSET_64BIT) >>> + return -EINVAL; >> >> Note we're not preventing a 32 bit client requesting to use 64 bit >> offsets here. > > I know. What's likely to happen is that the first mmap() on the MMIO > region is going to fail or point to a user BO, which only hurts the > client itself. I guess I can check the TIF bit here, but that won't > prevent an emulated x86 binary setting a 64-bit offset, and there's > no way I can ensure it's picking the right offset, which is why I > strongly suggest users of this ioctl() to always pass > DRM_PANTHOR_USER_MMIO_OFFSET, not > DRM_PANTHOR_USER_MMIO_OFFSET_{32,64}BIT, in the doc. Yep, I'm happy to allow a client to shoot itself in the foot - so no need to check. It just seemed at odds with the other immutable checking you were doing. In this particular case of a 32 bit client wanting a 64 bit offset then the effect is that it can never access the MMIO region - so breakage is immediately obvious. >> >>> + >>> + down_write(&pfile->user_mmio.offset_lock); >>> + if (pfile->user_mmio.offset_immutable) { >>> + ret = pfile->user_mmio.offset != args->offset ? -EINVAL : 0; >>> + } else { >>> + pfile->user_mmio.offset = args->offset; >>> + ret = 0; >>> + } >>> + up_write(&pfile->user_mmio.offset_lock); >>> + >>> + return ret; >>> +} >>> + >>> static int >>> panthor_open(struct drm_device *ddev, struct drm_file *file) >>> { >>> @@ -1353,6 +1376,19 @@ panthor_open(struct drm_device *ddev, struct drm_file *file) >>> } >>> >>> pfile->ptdev = ptdev; >>> + init_rwsem(&pfile->user_mmio.offset_lock); >>> + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET; >>> + >>> +#ifdef CONFIG_ARM64 >>> + /* >>> + * With 32-bit systems being limited by the 32-bit representation of >>> + * mmap2's pgoffset field, we need to make the MMIO offset arch >>> + * specific. >>> + */ >>> + if (test_tsk_thread_flag(current, TIF_32BIT)) >>> + pfile->user_mmio.offset = DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; >>> +#endif >>> + >>> >>> ret = panthor_vm_pool_create(pfile); >>> if (ret) >>> @@ -1405,6 +1441,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = { >>> PANTHOR_IOCTL(TILER_HEAP_CREATE, tiler_heap_create, DRM_RENDER_ALLOW), >>> PANTHOR_IOCTL(TILER_HEAP_DESTROY, tiler_heap_destroy, DRM_RENDER_ALLOW), >>> PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW), >>> + PANTHOR_IOCTL(SET_USER_MMIO_OFFSET, set_user_mmio_offset, DRM_RENDER_ALLOW), >>> }; >>> >>> static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) >>> @@ -1418,20 +1455,16 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma) >>> if (!drm_dev_enter(file->minor->dev, &cookie)) >>> return -ENODEV; >>> >>> -#ifdef CONFIG_ARM64 >>> - /* >>> - * With 32-bit systems being limited by the 32-bit representation of >>> - * mmap2's pgoffset field, we need to make the MMIO offset arch >>> - * specific. This converts a user MMIO offset into something the kernel >>> - * driver understands. >>> - */ >>> - if (test_tsk_thread_flag(current, TIF_32BIT) && >>> - offset >= DRM_PANTHOR_USER_MMIO_OFFSET_32BIT) { >>> - offset += DRM_PANTHOR_USER_MMIO_OFFSET_64BIT - >>> - DRM_PANTHOR_USER_MMIO_OFFSET_32BIT; >>> + /* Adjust the user MMIO to match the offset used kernel side. */ >>> + down_read(&pfile->user_mmio.offset_lock); >>> + if (offset >= pfile->user_mmio.offset && >>> + pfile->user_mmio.offset != DRM_PANTHOR_USER_MMIO_OFFSET) { >>> + offset -= pfile->user_mmio.offset; >>> + offset += DRM_PANTHOR_USER_MMIO_OFFSET; >>> vma->vm_pgoff = offset >> PAGE_SHIFT; >>> + pfile->user_mmio.offset_immutable = true; >>> } >>> -#endif >>> + up_read(&pfile->user_mmio.offset_lock); >> >> I can't help feeling we can just simplify this to: >> >> u64 mmio_offset = pfile->user_mmio.offset; >> >> if (offset >= mmio_offset) { >> offset -= mmio_offset; >> offset += DRM_PANTHOR_USER_MMIO_OFFSET; >> vma->vm_pgoff = offset >> PAGE_SHIFT; >> >> ret = panthor_device_mmap_io(ptdev, vma); >> } else { >> ret = drm_gem_mmap(filp, vma); >> } >> >> Or even go further and push the offset calculations into >> panthor_device_mmap_io(). > > Sure, I can do that. As you say in the follow up email there are some complexities in pushing into panthor_device_mmap_io(). So don't worry if that doesn't work out. >> >>> >>> if (offset >= DRM_PANTHOR_USER_MMIO_OFFSET) >>> ret = panthor_device_mmap_io(ptdev, vma); >>> @@ -1514,6 +1547,7 @@ static void panthor_debugfs_init(struct drm_minor *minor) >>> * - 1.2 - adds DEV_QUERY_GROUP_PRIORITIES_INFO query >>> * - adds PANTHOR_GROUP_PRIORITY_REALTIME priority >>> * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag >>> + * - 1.4 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl >>> */ >>> static const struct drm_driver panthor_drm_driver = { >>> .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ | >>> @@ -1527,7 +1561,7 @@ static const struct drm_driver panthor_drm_driver = { >>> .name = "panthor", >>> .desc = "Panthor DRM driver", >>> .major = 1, >>> - .minor = 3, >>> + .minor = 4, >>> >>> .gem_create_object = panthor_gem_create_object, >>> .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table, >>> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h >>> index 1379a2d4548c..2a16ca86113c 100644 >>> --- a/include/uapi/drm/panthor_drm.h >>> +++ b/include/uapi/drm/panthor_drm.h >>> @@ -127,6 +127,20 @@ enum drm_panthor_ioctl_id { >>> >>> /** @DRM_PANTHOR_TILER_HEAP_DESTROY: Destroy a tiler heap. */ >>> DRM_PANTHOR_TILER_HEAP_DESTROY, >>> + >>> + /** >>> + * @DRM_PANTHOR_SET_USER_MMIO_OFFSET: Set the offset to use as the user MMIO offset. >>> + * >>> + * The default behavior is to pick the MMIO offset based on the size of the pgoff_t >>> + * type seen by the process that manipulates the FD, such that a 32-bit process can >>> + * always map the user MMIO ranges. But this approach doesn't work well for emulators >>> + * like FEX, where the emulator is an 64-bit binary which might be executing 32-bit >>> + * code. In that case, the kernel thinks it's the 64-bit process and assumes >>> + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT is in use, but the UMD library expects >>> + * DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, because it can't mmap() anything above the >>> + * pgoff_t size. >>> + */ >>> + DRM_PANTHOR_SET_USER_MMIO_OFFSET, >>> }; >>> >>> /** >>> @@ -989,6 +1003,22 @@ struct drm_panthor_tiler_heap_destroy { >>> __u32 pad; >>> }; >>> >>> +/** >>> + * struct drm_panthor_set_user_mmio_offset - Arguments passed to >>> + * DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET >>> + */ >>> +struct drm_panthor_set_user_mmio_offset { >>> + /** >>> + * @offset: User MMIO offset to use. >>> + * >>> + * Must be either DRM_PANTHOR_USER_MMIO_OFFSET_32BIT or >>> + * DRM_PANTHOR_USER_MMIO_OFFSET_64BIT. The common use case is to pass >>> + * DRM_PANTHOR_USER_MMIO_OFFSET which picks the right value based on the size of >>> + * pgoff_t (AKA unsigned long). >> >> "The common use case" is not to call this ioctl ;). > > Fair enough, I'll mention in the doc that this is mostly useful for > emulators. > >> Although if we were >> designing this uAPI from scratch I'd just say require user space to >> decide where it wants the MMIO region and not have two offsets to choose >> from. > > Actually we also need to make sure it doesn't overlap with the > range reserved for GEM mappings > [DRM_FILE_PAGE_OFFSET_START:DRM_FILE_PAGE_OFFSET_START+DRM_FILE_PAGE_OFFSET_SIZE], > so at the time I just figured I'd assign fixed offsets that are high > enough in the range covered by pgoff_t and fix that in stone. I'm not > seeing any value in letting the user select the offset here, but maybe > I'm missing something. I don't think there's benefit in giving clients full control of the offset at this time. With hindsight if we'd added something like drm_panthor_set_user_mmio_offset from the beginning then the whole issue with 32 bit could have been solved easily without the macro magic. It might well have been sensible to restrict the values that the client can choose to avoid them choosing bad values which cause hard to track down bugs. What you've currently got forcing one of two values makes sense from that point of view. I just wonder if we're going to see a future loosening of this when another use case comes up. But we can tackle that when we actually understand the new use case. > BTW, I quite like the DRM_PANTHOR_USER_MMIO_OFFSET macro making this > decision for the caller, because I can't think of any good reason for > 64-bit userspace deciding to use DRM_PANTHOR_USER_MMIO_OFFSET_32BIT, > and 32-bit userspace just can't use DRM_PANTHOR_USER_MMIO_OFFSET_64BIT > as an offset, otherwise it won't be able to mmap() MMIO stuff. The only > case where we'd want an explicit OFFSET_32/64BIT is if a 64bit/32bit > lib/bin takes the decision for a lib that has a different instruction > size. Agreed it makes sense. Although you already know the "good reason" for 64 bit userspace wanting OFFSET_32BIT - where the 64 bit user space is pretending to be 32 bit ;) But at least with FEX some of that "64 bit" user space is being compiled with a 32 bit long. Thanks, Steve ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-04-17 14:58 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-04-17 10:05 [PATCH 0/2] drm/panthor: Fix panthor+FEX-Emu Boris Brezillon 2025-04-17 10:05 ` [PATCH 1/2] drm/panthor: Fix missing explicit padding in drm_panthor_gpu_info Boris Brezillon 2025-04-17 10:24 ` Steven Price 2025-04-17 11:24 ` Boris Brezillon 2025-04-17 11:26 ` Liviu Dudau 2025-04-17 10:05 ` [PATCH 2/2] drm/panthor: Fix the user MMIO offset logic for emulators Boris Brezillon 2025-04-17 10:41 ` Steven Price 2025-04-17 11:33 ` Liviu Dudau 2025-04-17 12:16 ` Boris Brezillon 2025-04-17 13:05 ` Liviu Dudau 2025-04-17 12:10 ` Boris Brezillon 2025-04-17 12:49 ` Boris Brezillon 2025-04-17 14:58 ` Steven Price
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.