Linux Documentation

* Re: [Linaro-mm-sig] Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian König @ 2026-05-19  7:53 UTC (permalink / raw)
  To: Albert Esteve
  Cc: Barry Song, T.J. Mercier, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
	linux-kernel, linux-media, dri-, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CADSE00Lc42s2bzXzV5D7t1Enf56u4BVj-yXLp3Yxhm0=qMPvuw@mail.gmail.com>

On 5/18/26 14:06, Albert Esteve wrote:
>>>>> udmabufs are already
>>>>> memcg-charged, so adding a separate MEMCG_DMABUF would double count.
>>>>> Are there any other exporters you had in mind that would benefit from
>>>>> this approach?
>>
>> Well, apart from DMA-buf, memfd_create() is one of the things which has broken our neck a couple of times in the past.
>>
>> But thinking more about it: instead of making this DMA-buf heaps specific, what if we have a general cgroups function which allows changing the accounting of a buffer referenced by a file descriptor to a different process?
>>
>> That would cover not only the DMA-buf heaps use case, but also all other DMA-buf with dmem and whatever we come up with in the future as well.
> 
> I removed a draft adding an ioctl for charge transfer from the series
> before sending because I wanted to focus on the charge_pid_fd approach
> and keep things simple, deferring the recharge path to a follow-up
> depending on feedback.
> 
> The main difference between my removed draft and what you're
> describing, iiuc, is scope and layer: my draft was an explicit ioctl
> on the dma-buf fd that the consumer calls to claim the charge (see
> below), while you seem to be suggesting a more general kernel-internal
> function that could work across buffer types and cgroup controllers,
> so not necessarily userspace-initiated? A kernel-internal function
> will need a way to identify the target process, which sounds similar
> to the binder-backed approach from TJ [1]. For everything else, the
> receiver still needs to declare itself, which the ioctl accomplishes.
> 
> ```
> /* When an app imports a daemon-allocated buffer, it can transfer the
>  * charge to itself: */
> int buf_fd = receive_dmabuf_from_daemon();
> ioctl(buf_fd, DMA_BUF_IOCTL_XFER_CHARGE); /* charge now attributed to
>                                            * the app's cgroup */
> ```

Well, that thinking goes in the right direction, but the requirements are still not completely covered as far as I can see.

Let me explain a bit more below.

> 
> [1] https://lore.kernel.org/cgroups/20230109213809.418135-1-tjmercier@google.com/
> 
>>
>> The only drawback I can see is that DMA-buf heap allocations would be temporarily accounted to the memory allocation daemon, but I don't think that this would be a problem.
> 
> The main reasons we moved away from TJ's transfer-based approach
> toward `charge_pid_fd` are: avoid the transient charge window on the
> daemon's cgroup; and to decouple from Binder, allowing any allocator
> to use it.

Yeah those concerns are completely correct.

The application should not volunteer by saying 'Charge that buffer to me'; rather, the daemon should say 'Force-charge that buffer to this application, and tell me when the application is over its limit.'

> 
> Technically, both approaches could coexist, though. Of the three
> scenarios TJ described:
> - Scenario 2 is directly addressed by charge_pid_fd approach without
> any transient charge on the daemon at the cost of one extra field in
> the heap ioctl uAPI struct.

Yeah, extending the uAPI to pass in the pid at allocation time is not much of a problem, but you also need to modify the whole stack above it, and that is a bit trickier.

> - Scenario 3 can be handled by the charge transfer function without
> changes to SurfaceFlinger. The app or dequeueBuffer claims the charge
> for itself or the app, respectively (depending on whether we include a
> pid_fd field in the transfer ioctl). It also covers non-heap
> exporters. The con in both variants is the transient charge window on
> the daemon.

It should be trivial for the daemon to charge the buffer to an application before handing it out.

> Both approaches shift the responsibility for correct charging
> attribution to userspace: first, `charge_pid_fd` on the allocator's
> side, and the transfer charge on the consumer's side.

Yeah, that's why I said it would be better if we did that without any uAPI change, but with all the ways we have to transfer file descriptors (dup(), fork(), passing FDs over sockets, etc.) it could be really tricky to implement.

> Deciding on one, the other or both depends on how much we value
> avoiding transient attribution, and how much we need a non-heap
> generic solution. With the XFER_CHARGE we can cover both. Thus, the
> `charge_pid_fd` approach in this RFC can be seen as a
> performance/strictness optimisation, eliminating transient charges to
> the daemon at the cost of a permanent uAPI addition to the heap ioctl
> struct, but not strictly required for correctness.

Well, all we need is a uAPI which says: charge this buffer (file descriptor) to that cgroup (pidfd).

With this at hand we should be able to handle all use cases at the same time.
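
To make the shape of such a call concrete, here is a minimal sketch from the userspace side. Everything in it is an assumption for illustration: the struct `dma_buf_charge_sketch`, the ioctl name `DMA_BUF_IOCTL_CHARGE_SKETCH` and the number 0x7f are hypothetical and exist in no kernel; only the 'b' ioctl base matches the existing DMA-buf ioctls.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * HYPOTHETICAL uAPI: neither this struct nor this ioctl number exists in
 * any kernel. Only the 'b' base matches DMA_BUF_BASE in linux/dma-buf.h.
 */
struct dma_buf_charge_sketch {
	int32_t  pidfd;  /* pidfd of the process whose cgroup takes the charge */
	uint32_t flags;  /* reserved for future use, zero for now */
};

#define DMA_BUF_IOCTL_CHARGE_SKETCH \
	_IOW('b', 0x7f, struct dma_buf_charge_sketch)

/*
 * Ask the kernel to charge the buffer behind buf_fd to the cgroup of the
 * process behind pidfd. Returns 0 on success, -1 with errno set otherwise.
 */
int dma_buf_charge_to(int buf_fd, int pidfd)
{
	struct dma_buf_charge_sketch arg = { .pidfd = pidfd, .flags = 0 };

	return ioctl(buf_fd, DMA_BUF_IOCTL_CHARGE_SKETCH, &arg);
}
```

A daemon would call something like `dma_buf_charge_to(buf_fd, app_pidfd)` before handing the buffer out, and an application could pass a pidfd for itself, so the same entry point would serve both directions.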

> On the other hand,
> if we agree on the end goal of migrating other exporters to use
> dma-buf heaps

That won't work. DMA-buf heaps are actually only a rather small and Android-specific use case.

We have tons of other interfaces for allocating DMA-bufs which need to stay around because of HW restrictions, and we need a solution for them as well.

Regards,
Christian.

>, and scenario 3 is addressed by adding the app's pid_fd
> to SurfaceFlinger, then `charge_pid_fd` alone is a coherent/sufficient
> approach despite the uAPI change.
> 
>>
>> Regards,
>> Christian.
>>
>>>
>>> Thanks
>>> Barry
>>
> 



* Re: [PATCH v4 04/30] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
From: David Woodhouse @ 2026-05-19  7:50 UTC (permalink / raw)
  To: Dongli Zhang, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Thomas Gleixner,
	Sean Christopherson, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Vitaly Kuznetsov, x86, Marc Zyngier, Juergen Gross,
	Boris Ostrovsky, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
	Jack Allister, Joey Gouly, joe.jin, linux-doc, linux-kernel,
	xen-devel, linux-kselftest
In-Reply-To: <935312be-9a86-49fd-8bb4-2c998a68e2df@oracle.com>

On Mon, 2026-05-18 at 17:57 -0700, Dongli Zhang wrote:
> On 2026-05-18 1:48 AM, David Woodhouse wrote:
> > ...
> 
> I have fixed the Thunderbird configuration. Does it look better to you?

The date is certainly better, thank you. But although I *was* up late
that night frowning at clocks, I didn't think I was up *quite* as late
(almost 2am) as it suggests.

But I suspect that getting *that* right is beyond the limit of
Thunderbird's configurability.

Thanks :)

> I really appreciate guidelines like the ones below.
> 
> https://lore.kernel.org/all/20240522001817.619072-8-dwmw2@infradead.org
> 
> Assuming I am a user of the new API, I feel confused about whether the goal is
> to replace KVM_SET_CLOCK with KVM_SET_CLOCK_GUEST, or whether the latter is
> meant to supplement the former.

The issue is that KVM_SET_CLOCK_GUEST can only be used in 'masterclock'
mode, when the TSC is reliable and the guest TSCs are all in sync.

Which ought to be *all* of the time, on modern hardware and sane
configurations. And in this series, I don't even let the *guest* screw
that over by setting different TSC offsets on different vCPUs any more
(we stay in masterclock mode in that case now). But the VMM can cause
its guest to come out of masterclock mode, by setting different TSC
*speeds* on different vCPUs.

So there remain some pathological cases where the kvmclock actually
still has a justification to exist, and those are the cases where it
needs to be set in its own right as a function of host time
(KVM_SET_CLOCK), not purely as a function of the guest TSC
(KVM_SET_CLOCK_GUEST).
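
From the VMM's side, that split could look roughly like the sketch below. It is only an illustration under stated assumptions: the two callbacks stand in for wrappers around the KVM_SET_CLOCK_GUEST and KVM_SET_CLOCK ioctls, any failure of the first is treated as "not in masterclock mode", and all names (`restore_kvmclock`, `clock_setter`, the stubs) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical wrapper type: returns 0 on success, -1 on failure. */
typedef int (*clock_setter)(void *ctx);

enum clock_restore_path {
	RESTORED_FROM_GUEST_TSC,  /* the KVM_SET_CLOCK_GUEST-style path worked */
	RESTORED_FROM_HOST_TIME,  /* fell back to the KVM_SET_CLOCK-style path */
	RESTORE_FAILED,
};

/* Prefer the guest-TSC-based setter, fall back to the host-time-based one. */
enum clock_restore_path restore_kvmclock(clock_setter set_clock_guest,
					 clock_setter set_clock,
					 void *ctx)
{
	if (set_clock_guest(ctx) == 0)
		return RESTORED_FROM_GUEST_TSC;
	if (set_clock(ctx) == 0)
		return RESTORED_FROM_HOST_TIME;
	return RESTORE_FAILED;
}

/* Stand-ins for real ioctl wrappers, so the sketch can be exercised. */
int stub_setter_ok(void *ctx)   { (void)ctx; return 0; }
int stub_setter_fail(void *ctx) { (void)ctx; return -1; }
```

For example, `restore_kvmclock(stub_setter_fail, stub_setter_ok, NULL)` models a VM that has left masterclock mode and therefore ends up on the host-time path.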

> 
> If we are going to use KVM_SET_CLOCK_GUEST when KVM_SET_CLOCK is not needed, I
> would appreciate it if the API could carry more data in addition to struct
> pvclock_vcpu_time_info.
> 
> +#define KVM_SET_CLOCK_GUEST    _IOW(KVMIO, 0xd6, struct pvclock_vcpu_time_info)
> +#define KVM_GET_CLOCK_GUEST    _IOR(KVMIO, 0xd7, struct pvclock_vcpu_time_info)
> 
> 
> In the future, if we need to carry additional data, we could simply reuse the
> padding fields instead of introducing another KVM_SET_CLOCK_GUEST2.
> 
> The following is an example of how additional data could be carried.
> 
> KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c68dc1b577eabd5605c6c7c08f3e07ae18d30d5d

I'm not very keen on the way that KVM_[GS]ET_CLOCK threw extra time
references in over the years without any of them ever actually making
*sense*, which makes me reluctant. 

KVM_[GS]ET_CLOCK_GUEST on the other hand does *one* thing: it exports
the relationship between the guest TSC and kvmclock. We don't *need* to
litter it with time values from other clocks willy-nilly.
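
As a rough illustration of that relationship, the sketch below computes kvmclock from a guest TSC reading using the usual pvclock scaling arithmetic. The struct is a field-for-field stand-in for pvclock_vcpu_time_info, and the names `pvclock_sketch` and `kvmclock_from_guest_tsc` are made up for this example rather than taken from the series.

```c
#include <stdint.h>

/* Stand-in mirroring the fields of struct pvclock_vcpu_time_info. */
struct pvclock_sketch {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;      /* guest TSC at the reference point */
	uint64_t system_time;        /* kvmclock in ns at the reference point */
	uint32_t tsc_to_system_mul;  /* 32-bit fixed-point fraction, ns per cycle */
	int8_t   tsc_shift;          /* binary shift applied to the TSC delta */
	uint8_t  flags;
	uint8_t  pad[2];
} __attribute__((__packed__));

/* kvmclock in nanoseconds for a given guest TSC reading. */
uint64_t kvmclock_from_guest_tsc(const struct pvclock_sketch *pv,
				 uint64_t guest_tsc)
{
	uint64_t delta = guest_tsc - pv->tsc_timestamp;
	uint64_t mul = pv->tsc_to_system_mul;

	if (pv->tsc_shift < 0)
		delta >>= -pv->tsc_shift;
	else
		delta <<= pv->tsc_shift;

	/* (delta * mul) >> 32, split to avoid a 128-bit intermediate. */
	return pv->system_time +
	       (delta >> 32) * mul +
	       (((delta & 0xffffffffu) * mul) >> 32);
}
```

Nothing here depends on host time: given the structure and a guest TSC value, the clock is fully determined.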

But yes, you are right in principle.

And in fact, the other part of this conversation has really drawn my
attention to the ugliness of the "try KVM_SET_CLOCK_GUEST and if that
doesn't take then fall back to KVM_SET_CLOCK" which we are pushing onto
userspace.

Yes, I covered it in the guidelines: but we should always abide by the
mantra, "if it *needs* documenting, fix it first".

So perhaps we could build a variant which does both at once. It
provides the guest clock as a function of guest TSC for the sane
masterclock case, but *also* includes CLOCK_TAI at the same moment, in
case it needs to fall back to that.

Although... the actual reference time part of the existing pvclock
might be significantly in the past, especially with all the
MASTERCLOCK_UPDATE elimination that we've been doing. And we'd have to
regenerate it to get a simultaneous realtime reading, wouldn't we?
> > 

> > > Do we need a KVM_CHECK_EXTENSION capability for this? If userspace wants to
> > > support the new API, should it detect availability via KVM_CHECK_EXTENSION, or
> > > simply try the ioctl and handle failure?
> > 
> > That might be conventional, I suppose. But I suspect Jack's thinking
> > was that userspace is going to have to *try* it anyway, and still might
> > have to fall back to what KVM_SET_CLOCK can manage, so userspace
> > probably wouldn't even bother to check that capability; it doesn't
> > matter.
> > 
> > Since then, we've added some more attributes in this series though, and
> > it probably is worth adding a cap which advertises them *all*?
> > Something like KVM_CAP_CLOCK_PRECISION_API?
> 
> From an API user's perspective, userspace may need to distinguish between an API
> failure and the API not being available.

That's -ENOTTY vs. -EINVAL for the ioctl, isn't it? And isn't there
something similarly unambiguous for the attributes?

But I have no objection to adding a capability. The lack of it was more
oversight than intent.
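
For what it's worth, the errno-based distinction could be sketched in userspace as follows. This assumes the convention discussed above (ENOTTY when the kernel does not know the ioctl at all, any other errno when it knows the ioctl but rejects the call); the enum and function names are hypothetical.

```c
#include <errno.h>

enum ioctl_outcome {
	IOCTL_OK,           /* the call succeeded */
	IOCTL_UNSUPPORTED,  /* the kernel does not implement this ioctl */
	IOCTL_REJECTED,     /* the ioctl exists but this call failed */
};

/* Classify an ioctl() return value together with the errno it left behind. */
enum ioctl_outcome classify_ioctl(int ret, int err)
{
	if (ret >= 0)
		return IOCTL_OK;
	if (err == ENOTTY)
		return IOCTL_UNSUPPORTED;
	return IOCTL_REJECTED;
}
```

A caller would use it as `classify_ioctl(ioctl(fd, req, &arg), errno)`, saving errno immediately after the call.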

> > > 
> > > From my perspective, I am also curious how we should reason about this in other
> > > scenarios in the future. Specifically, when do we need to process
> > > KVM_REQ_MASTERCLOCK_UPDATE before KVM_REQ_CLOCK_UPDATE, and when is it
> > > acceptable not to? I noticed that kvm_cpuid() already processes only
> > > KVM_REQ_CLOCK_UPDATE.
> > 
> > The way I've been thinking about it — and I'm only two cups of coffee
> > into Monday so take those words literally and don't think of them as
> > British understatement of something I believe is absolute truth — is
> > that MASTERCLOCK_UPDATE is updating the actual clock for the whole VM,
> > while CLOCK_UPDATE is about *putting* that information into the per-
> > vCPU pvclock structures.
> > 
> > So after a MASTERCLOCK_UPDATE, we need to do a CLOCK_UPDATE on all
> > vCPUs to disseminate the result. Which means that if CLOCK_UPDATE is
> > already pending before a MASTERCLOCK_UPDATE, it's probably redundant
> > and might as well be cleared because it's only going to get set *again*
> > in kvm_end_pvclock_update()? 
> 
> Another scenario is when only MASTERCLOCK_UPDATE is pending and there is no
> pending CLOCK_UPDATE.
> 
> In this scenario, is it fine to skip processing MASTERCLOCK_UPDATE before saving
> pvclock_vcpu_time_info?
> 

I'm not sure I understand that scenario. 

MASTERCLOCK_UPDATE means we have to actually recalculate the master
clock (which really *should* be rare, now!). And then any time we do
that, we also have to do a CLOCK_UPDATE on every vCPU to disseminate
the new information. Which is why kvm_end_pvclock_update() does exactly
that.

So your "MASTERCLOCK_UPDATE is pending and there is no pending
CLOCK_UPDATE" doesn't make much sense to me. If MASTERCLOCK_UPDATE is
pending, then there *will* be a CLOCK_UPDATE pending.
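
A toy model of that relationship, for illustration only: it is not the KVM code, every name in it is invented, and it merely assumes that a master clock recalculation always queues a clock update on every vCPU.

```c
#include <stddef.h>

#define TOY_REQ_CLOCK_UPDATE        (1u << 0)
#define TOY_REQ_MASTERCLOCK_UPDATE  (1u << 1)
#define TOY_MAX_VCPUS               4

struct toy_vcpu {
	unsigned int requests;   /* pending request bits */
	int pvclock_generation;  /* which master clock this vCPU last published */
};

struct toy_vm {
	int master_generation;   /* bumped on every master clock recalculation */
	size_t nr_vcpus;
	struct toy_vcpu vcpus[TOY_MAX_VCPUS];
};

/* Recalculate the VM-wide clock, then ask every vCPU to republish it. */
void toy_masterclock_update(struct toy_vm *vm)
{
	vm->master_generation++;
	for (size_t i = 0; i < vm->nr_vcpus; i++)
		vm->vcpus[i].requests |= TOY_REQ_CLOCK_UPDATE;
}

/* Handle the master clock request first, then the per-vCPU update. */
void toy_process_requests(struct toy_vm *vm, struct toy_vcpu *v)
{
	if (v->requests & TOY_REQ_MASTERCLOCK_UPDATE) {
		v->requests &= ~TOY_REQ_MASTERCLOCK_UPDATE;
		toy_masterclock_update(vm);
	}
	if (v->requests & TOY_REQ_CLOCK_UPDATE) {
		v->requests &= ~TOY_REQ_CLOCK_UPDATE;
		v->pvclock_generation = vm->master_generation;
	}
}
```

In this model a vCPU that starts with only the master clock bit set still ends up publishing the new clock, because processing that bit queues the per-vCPU update for everyone, itself included.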

> > > 
> > > Would it be helpful to validate that the delta is within a reasonable range,
> > > e.g. that the drift can never be more than five minutes (forward or backward)?
> > 
> > If a guest has been running for months on a previous host and is
> > migrated to a new host, don't we expect that the KVM clock of the new
> > VM on the new host is tweaked from its default near-zero after
> > creation, to some large amount?
> > 
> 
> Regarding live migration, my own investigation does not show a proportional
> relationship between VM uptime and the amount of drift.

You're comparing the VM on the source host, with the VM on the
destination post-migration.

Perhaps I misunderstood, but I thought your suggested validation of a
'reasonable range' would also apply when adjusting the kvmclock of the
nascent VM on the destination host, from "newly created" to "has been
running for months" while migrating the state of the actual guest onto
a clean new slate.

> Just taking QEMU + KVM as an example: suppose TSC scaling is inactive, the
> amount of drift does not depend on how long the VM has been running before live
> migration.
> 
> Instead, it depends on the delta between when we call MSR_IA32_TSC and
> KVM_GET_CLOCK, and between MSR_IA32_TSC and KVM_SET_CLOCK.
> 
> The guest TSC stops at P1 and resumes at P3.
> The kvmclock stops at P2 and resumes at P4.
> 
> We expect P1 == P2 and P3 == P4.
> 
> On source host.
> 
> - kvm_get_msr_common(MSR_IA32_TSC) for vCPU=0 ===> P1

Here's where it all starts going wrong. Line 1.

Any API which lets you get a single time value in isolation, and thus
which is already out of date by the time the system call even returns,
is fundamentally unsuitable for migration.

> - kvm_get_msr_common(MSR_IA32_TSC) for vCPU=1
> - kvm_get_msr_common(MSR_IA32_TSC) for vCPU=2
> - kvm_get_msr_common(MSR_IA32_TSC) for vCPU=3
> - kvm_get_msr_common(MSR_IA32_TSC) for vCPU=4
> ... ...
> - kvm_get_msr_common(MSR_IA32_TSC) for vCPU=N
> - KVM_GET_CLOCK                               ===> P2
> 
> On target host.
> 
> - kvm_set_msr_common(MSR_IA32_TSC) for vCPU=1 ===> P3
> - kvm_set_msr_common(MSR_IA32_TSC) for vCPU=2

At this point, the nasty hack in the kernel steps in, realises that the
value you're setting on vCPU 2 is within a second or so of the value
you had previously set on vCPU 1, and snaps it back to be precisely the
same. To work around the fundamental brokenness of this method.

> - kvm_set_msr_common(MSR_IA32_TSC) for vCPU=3
> - kvm_set_msr_common(MSR_IA32_TSC) for vCPU=4
> - kvm_set_msr_common(MSR_IA32_TSC) for vCPU=5
> ... ...
> - kvm_set_msr_common(MSR_IA32_TSC) for vCPU=N
> - KVM_SET_CLOCK                               ====> P4
> 
> 
> Here is my equation to predict the drift.

I'm sure you're right, but I didn't get that far when looking at this.
I'd already thrown up in my mouth a little bit by line one.

Here's my equation to predict the drift of a live update done correctly
on the same host using the method I've now put in the documentation:

0.

:)


* Re: [PATCH 4/8] drm/panthor: Add support for protected memory allocation in panthor
From: Boris Brezillon @ 2026-05-19  7:39 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Liviu Dudau, Marcin Ślusarz, Ketil Johnsen, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Benjamin Gaignard, Brian Starkey, John Stultz, T.J. Mercier,
	Christian König, Steven Price, Daniel Almeida, Alice Ryhl,
	Matthias Brugger, AngeloGioacchino Del Regno, dri-devel,
	linux-doc, linux-kernel, linux-media, linaro-mm-sig,
	linux-arm-kernel, linux-mediatek, Florent Tomasin, nd
In-Reply-To: <CAPaKu7R9ET767qc3eppBUfG2RAeyrg7E-gE0turgp-u_FU4+Vg@mail.gmail.com>

On Mon, 18 May 2026 17:36:40 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Mon, May 18, 2026 at 12:16 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
> >
> > On Wed, 13 May 2026 12:31:32 -0700
> > Chia-I Wu <olvaffe@gmail.com> wrote:
> >  
> > > On Tue, May 12, 2026 at 8:39 AM Liviu Dudau <liviu.dudau@arm.com> wrote:  
> > > >
> > > > On Tue, May 12, 2026 at 04:11:11PM +0200, Boris Brezillon wrote:  
> > > > > On Tue, 12 May 2026 14:47:27 +0100
> > > > > Liviu Dudau <liviu.dudau@arm.com> wrote:
> > > > >  
> > > > > > On Thu, May 07, 2026 at 01:53:56PM +0200, Boris Brezillon wrote:  
> > > > > > > On Thu, 7 May 2026 11:02:26 +0200
> > > > > > > Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
> > > > > > >  
> > > > > > > > On Tue, May 05, 2026 at 06:15:23PM +0200, Boris Brezillon wrote:  
> > > > > > > > > > @@ -277,9 +286,21 @@ int panthor_device_init(struct panthor_device *ptdev)
> > > > > > > > > >                     return ret;
> > > > > > > > > >     }
> > > > > > > > > >
> > > > > > > > > > +   /* If a protected heap name is specified but not found, defer the probe until created */
> > > > > > > > > > +   if (protected_heap_name && strlen(protected_heap_name)) {  
> > > > > > > > >
> > > > > > > > > > Do we really need this strlen() > 0? Won't dma_heap_find() fail if the
> > > > > > > > > name is "" already?  
> > > > > > > >
> > > > > > > > > If dma_heap_find() fails, then the whole probe will fail too.
> > > > > > > > This check prevents that.  
> > > > > > >
> > > > > > > Yeah, that's also a questionable design choice. I mean, we can
> > > > > > > currently probe and boot the FW even though we never setup the
> > > > > > > protected FW sections, so why should we defer the probe here? Can't we
> > > > > > > just retry the next time a group with the protected bit is created and
> > > > > > > > fail if we can't find a protected heap?  
> > > > > >
> > > > > > The problem we have with the current firmware is that it does a number of setup steps at "boot"
> > > > > > time only. One of the steps is preparing its internal structures for when it enters protected
> > > > > > mode and it stores them in the buffer passed in at firmware loading. We cannot later run the
> > > > > > process when we have a group with protected mode set.  
> > > > >
> > > > > No, but we can force a full/slow reset and have that thing
> > > > > re-initialized, can't we? I mean, that's basically what we do when a
> > > > > fast reset fails: we re-initialize all the sections and reset again, at
> > > > > which point the FW should start from a fresh state, and be able to
> > > > > properly initialize the protected-related stuff if protected sections
> > > > > are populated. Am I missing something?  
> > > >
> > > > Right, we can do that. For some reason I keep associating the reset with the
> > > > error handling and not with "normal" operations.  
> > > I kind of hope we end up with either
> > >
> > >  - panthor knows the exact heap to use and fails with EPROBE_DEFER if
> > > the heap is missing, or
> > >  - panthor gets a dma-buf from userspace and does the full reset
> > >    - userspace also needs to provide a dma-buf for each protected
> > > group for the suspend buffer
> > >
> > > than something in-between. The latter is more ad-hoc and basically
> > > kicks the issue to the userspace.  
> >
> > Indeed, the second option is more ad-hoc, but when you think about it,
> > userspace has to have this knowledge, because it needs to know the
> > dma-heap to use for buffer allocations that cross a device boundary
> > anyway. Think about frames produced by a video decoder, and composited
> > by the GPU into a protected scanout buffer that's passed to the KMS
> > device. Why would the GPU driver be the source of truth when it comes to
> > choosing the heap to use to allocate protected buffers for the video
> > decoder or those used for the display?  
> I don't think the GPU driver is ever the source of truth. If the
> system integrator wants to specify the source of truth (SoT) from
> kernel space, they should use the device tree (or module params /
> config options). If they want to specify the SoT in userspace, then we
> don't really care how it is done other than providing an ioctl.
> Panthor is always on the receiving end.

Okay, we're on the same page then.

> 
> If we don't want to delay this functionality, but it takes time to
> converge on SoT, maybe a solution that is not a long-term promise can
> work? Of the options on the table (dt, module params, kconfig options,
> ioctls), a kconfig option, potentially marked as experimental, seems
> like a good candidate.

If Panthor is only a consumer, I actually think it'd be easier to just
let userspace pass the protected FW section as an imported buffer
through an ioctl for now. It means we don't need any of the
modifications to the dma_heap API in this series, and userspace is free
to choose its SoT (efuse, DT, ...) and pass the info back to mesa/GBM
somehow (envvar, driconf, ...). The only thing we need to ensure is that
lazy protected FW section allocation is going to work, but given the
current code purely and simply ignores those sections, and the FW is
still able to boot and act properly (at least on v10-v13), I'm pretty
confident this is okay, unless there's some trick the MCU can do to
detect that the protected section isn't mapped (which I doubt, because
the MCU doesn't know it lives behind an MMU).

Of course, once we have a consensus on how to describe this in the DT,
we can switch Panthor over to "protected dma_heap selection through DT",
and reflect that through the ioctl that exposes whether protected
support is ready or not (would be a DEV_QUERY), such that userspace can
skip this "PROTM initialization" step.

We're talking about an extra ioctl to set those buffers, and a
DEV_QUERY to query the state (ready or not), the size of the global
protected buffer (protected FW section) and the size of the protected
suspend buffer. The protected suspend buffer would be allocated and
passed at group creation time (extra arg passed to the existing
GROUP_CREATE ioctl). So, overall, I don't consider it a huge liability
in terms of maintenance cost.
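
To give an idea of the size of that interface, here is a rough sketch of what userspace might see. All names and layouts are hypothetical placeholders, not a proposed uAPI: `protm_query_sketch` stands for the DEV_QUERY result, `protm_init_sketch` for the argument of the extra ioctl, and the helpers only show how userspace could decide whether the "PROTM initialization" step is needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL result of a DEV_QUERY for protected-mode state. */
struct protm_query_sketch {
	uint32_t ready;             /* non-zero once the protected FW section is set */
	uint32_t pad;
	uint64_t fw_section_size;   /* size of the global protected buffer */
	uint64_t suspend_buf_size;  /* size of the per-group protected suspend buffer */
};

/* HYPOTHETICAL argument of the ioctl that hands over the FW section. */
struct protm_init_sketch {
	int32_t  fw_section_dmabuf_fd;  /* imported protected dma-buf */
	uint32_t pad;
};

/* Userspace only has to run the init step while the kernel is not ready. */
bool protm_needs_userspace_init(const struct protm_query_sketch *q)
{
	return q->ready == 0;
}

/* Check that a candidate dma-buf is large enough for the FW section. */
bool protm_fw_buffer_ok(const struct protm_query_sketch *q, uint64_t dmabuf_size)
{
	return dmabuf_size >= q->fw_section_size;
}
```

Once the heap is selected through DT, the query would simply report ready from the start and userspace would skip the init ioctl altogether.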

> 
> >  
> > >
> > > For the former, expressing the relation in DT seems to be the best,
> > > but only if possible :-). Otherwise, a kconfig option (instead of
> > > module param) should be easier to work with.
> > >
> > > Looking at the userspace implementation, can we also have a panthor
> > > ioctl to return the heap to userspace?  
> >
> > Yes, it's something we can add, but again, I'm questioning the
> > usefulness of this: how can we ensure the heap used by panthor to
> > allocate its protected FW buffers is suitable for scanout buffers
> > (buffers that can be used by display drivers)? There needs to be some glue
> > living in userland and taking the decision, and I'm not too sure
> > trusting any of the components in the chain (vdec, gpu, display) is the
> > right thing to do.  
> The heap returned by panthor is only for panfrost/panvk. It says
> nothing about compatibility with other components on the system.

Okay, if it's used only for internal buffers, I guess that's fine.


* Re: [PATCH v4 16/30] KVM: x86: Restructure kvm_guest_time_update() for TSC upscaling
From: Dongli Zhang @ 2026-05-19  7:38 UTC (permalink / raw)
  To: David Woodhouse, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
	Paul Durrant, Jonathan Cameron, Sascha Bischoff, Marc Zyngier,
	Joey Gouly, Jack Allister, joe.jin, linux-doc, linux-kernel,
	xen-devel, linux-kselftest
In-Reply-To: <20260509224824.3264567-17-dwmw2@infradead.org>

I encountered the following build error with this patch.

Perhaps it is because all uses of "flags" were removed.

$ make -j32 > /dev/null
arch/x86/kvm/x86.c: In function ‘kvm_guest_time_update’:
arch/x86/kvm/x86.c:3359:23: error: unused variable ‘flags’ [-Werror=unused-variable]
 3359 |         unsigned long flags;
      |                       ^~~~~
cc1: all warnings being treated as errors
make[4]: *** [scripts/Makefile.build:289: arch/x86/kvm/x86.o] Error 1
make[3]: *** [scripts/Makefile.build:548: arch/x86/kvm] Error 2
make[2]: *** [scripts/Makefile.build:548: arch/x86] Error 2
make[1]: *** [/home/opc/ext4/mainline-linux/Makefile:2143: .] Error 2
make: *** [Makefile:248: __sub-make] Error 2

Thank you very much!

Dongli Zhang

On 2026-05-09 3:46 PM, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Restructure kvm_guest_time_update() so that kernel_ns/host_tsc are
> always "now" when doing TSC catchup, then swap in the master clock
> reference values afterward for the hv_clock.
> 
> This makes the TSC upscaling code considerably simpler: the catchup
> adjustment is computed as the delta between what the guest TSC *should*
> be at "now" and what it actually is, rather than mixing "now" and
> "master clock reference" timestamps.
> 
> The seqcount loop now also contains the kvm_get_time_and_clockread()
> call (matching get_kvmclock's pattern), with the same WARN for
> unexpected failure.
> 
> Based on a suggestion by Sean Christopherson.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  arch/x86/kvm/x86.c | 67 ++++++++++++++++++++++++++++++++--------------
>  1 file changed, 47 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e281c49561fa..8e4993ef4f6b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3363,39 +3363,51 @@ int kvm_guest_time_update(struct kvm_vcpu *v)
>  	struct kvm_arch *ka = &v->kvm->arch;
>  	s64 kernel_ns;
>  	u64 tsc_timestamp, host_tsc;
> +	u64 master_host_tsc = 0;
> +	s64 master_kernel_ns = 0;
>  	bool use_master_clock;
>  
> -	kernel_ns = 0;
> -	host_tsc = 0;
> -
>  	/*
>  	 * If the host uses TSC clock, then passthrough TSC as stable
>  	 * to the guest.
>  	 */
>  	do {
>  		seq = read_seqcount_begin(&ka->pvclock_sc);
> +
>  		use_master_clock = ka->use_master_clock;
> -		if (use_master_clock) {
> -			host_tsc = ka->master_cycle_now;
> -			kernel_ns = ka->master_kernel_ns;
> -		}
> +
> +		/*
> +		 * The TSC read and the call to get_cpu_tsc_khz() must happen
> +		 * on the same CPU.
> +		 */
> +		get_cpu();
> +
> +		tgt_tsc_hz = (u64)get_cpu_tsc_khz() * 1000;
> +
> +		if (use_master_clock &&
> +		    !kvm_get_time_and_clockread(&kernel_ns, &host_tsc) &&
> +		    WARN_ON_ONCE(!read_seqcount_retry(&ka->pvclock_sc, seq)))
> +			use_master_clock = false;
> +
> +		put_cpu();
> +
> +		if (!use_master_clock)
> +			break;
> +
> +		master_host_tsc = ka->master_cycle_now;
> +		master_kernel_ns = ka->master_kernel_ns;
>  	} while (read_seqcount_retry(&ka->pvclock_sc, seq));
>  
> -	/* Keep irq disabled to prevent changes to the clock */
> -	local_irq_save(flags);
> -	tgt_tsc_hz = (u64)get_cpu_tsc_khz() * 1000;
>  	if (unlikely(tgt_tsc_hz == 0)) {
> -		local_irq_restore(flags);
>  		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
>  		return 1;
>  	}
> +
>  	if (!use_master_clock) {
>  		host_tsc = rdtsc();
>  		kernel_ns = get_kvmclock_base_ns();
>  	}
>  
> -	tsc_timestamp = kvm_read_l1_tsc(v, host_tsc);
> -
>  	/*
>  	 * We may have to catch up the TSC to match elapsed wall clock
>  	 * time for two reasons, even if kvmclock is used.
> @@ -3404,17 +3416,32 @@ int kvm_guest_time_update(struct kvm_vcpu *v)
>  	 *      entry to avoid unknown leaps of TSC even when running
>  	 *      again on the same CPU.  This may cause apparent elapsed
>  	 *      time to disappear, and the guest to stand still or run
> -	 *	very slowly.
> +	 *      very slowly.
>  	 */
>  	if (vcpu->tsc_catchup) {
> -		u64 tsc = compute_guest_tsc(v, kernel_ns);
> -		if (tsc > tsc_timestamp) {
> -			adjust_tsc_offset_guest(v, tsc - tsc_timestamp);
> -			tsc_timestamp = tsc;
> -		}
> +		s64 adjustment;
> +
> +		/*
> +		 * Calculate the delta between what the guest TSC *should* be
> +		 * and what it actually is according to kvm_read_l1_tsc().
> +		 */
> +		adjustment = compute_guest_tsc(v, kernel_ns) -
> +			     kvm_read_l1_tsc(v, host_tsc);
> +		if (adjustment > 0)
> +			adjust_tsc_offset_guest(v, adjustment);
>  	}
>  
> -	local_irq_restore(flags);
> +	/*
> +	 * Now that TSC upscaling is out of the way, the remaining calculations
> +	 * are all relative to the reference time that's placed in hv_clock.
> +	 * If the master clock is NOT in use, the reference time is "now".  If
> +	 * master clock is in use, the reference time comes from there.
> +	 */
> +	if (use_master_clock) {
> +		host_tsc = master_host_tsc;
> +		kernel_ns = master_kernel_ns;
> +	}
> +	tsc_timestamp = kvm_read_l1_tsc(v, host_tsc);
>  
>  	/* With all the info we got, fill in the values */
>  
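
For reference, the catchup arithmetic in the hunk above can be modelled in isolation. This is a standalone sketch rather than the kernel code: `tsc_catchup_adjustment` is a made-up name, and its two arguments stand in for the results of compute_guest_tsc() and kvm_read_l1_tsc().

```c
#include <stdint.h>

/*
 * Number of cycles to add to the guest TSC offset so that the guest TSC
 * catches up with where it should be. Returns 0 when the guest TSC is
 * already at or ahead of the expected value.
 */
uint64_t tsc_catchup_adjustment(uint64_t expected_guest_tsc,
				uint64_t actual_guest_tsc)
{
	/* Signed delta, so "guest is ahead" shows up as a negative value. */
	int64_t adjustment = (int64_t)(expected_guest_tsc - actual_guest_tsc);

	return adjustment > 0 ? (uint64_t)adjustment : 0;
}
```

Since both inputs are taken at the same "now", the delta never mixes a current timestamp with the master clock reference.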



* Re: [PATCH v4 03/30] UAPI: x86: Move pvclock-abi to UAPI for x86 platforms
From: Dongli Zhang @ 2026-05-19  7:35 UTC (permalink / raw)
  To: David Woodhouse, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Thomas Gleixner,
	Sean Christopherson, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86, H. Peter Anvin, Vitaly Kuznetsov, Juergen Gross,
	Boris Ostrovsky, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
	Marc Zyngier, Joey Gouly, Jack Allister, joe.jin, linux-doc,
	linux-kernel, xen-devel, linux-kselftest
In-Reply-To: <20260509224824.3264567-4-dwmw2@infradead.org>

I encountered the build warning below.

Perhaps it is because of PATCH 03?

In file included from ./include/linux/types.h:5,
                 from ./arch/x86/include/uapi/asm/pvclock-abi.h:5,
                 from ./arch/x86/include/asm/xen/interface.h:197,
                 from ./include/xen/interface/xen.h:13,
                 from <command-line>:
./include/uapi/linux/types.h:10:2: warning: #warning "Attempt to use kernel headers from user space, see https://kernelnewbies.org/KernelHeaders" [-Wcpp]
   10 | #warning "Attempt to use kernel headers from user space, see https://kernelnewbies.org/KernelHeaders"
      |  ^~~~~~~
In file included from ./include/linux/types.h:5,
                 from ./arch/x86/include/uapi/asm/pvclock-abi.h:5,
                 from ./arch/x86/include/asm/xen/interface.h:197,
                 from ./include/xen/interface/xen.h:13,
                 from ./include/xen/interface/xenpmu.h:5,
                 from <command-line>:
./include/uapi/linux/types.h:10:2: warning: #warning "Attempt to use kernel headers from user space, see https://kernelnewbies.org/KernelHeaders" [-Wcpp]
   10 | #warning "Attempt to use kernel headers from user space, see https://kernelnewbies.org/KernelHeaders"
      |  ^~~~~~~

Thank you very much!

Dongli Zhang

On 2026-05-09 3:46 PM, David Woodhouse wrote:
> From: Jack Allister <jalliste@amazon.com>
> 
> A subsequent commit will provide a new KVM interface for performing a
> fixup/correction of the KVM clock against the reference TSC. The
> KVM_[GS]ET_CLOCK_GUEST API requires a pvclock_vcpu_time_info, as such
> the caller must know about this definition.
> 
> Move the definition to the UAPI folder so that it is exported to
> usermode and also change the type definitions to use the standard for
> UAPI exports.
> 
> Signed-off-by: Jack Allister <jalliste@amazon.com>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Reviewed-by: Paul Durrant <paul@xen.org>
> ---
>  MAINTAINERS                                   |  4 +--
>  arch/x86/include/{ => uapi}/asm/pvclock-abi.h | 27 ++++++++++---------
>  2 files changed, 17 insertions(+), 14 deletions(-)
>  rename arch/x86/include/{ => uapi}/asm/pvclock-abi.h (82%)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e0b307b2108c..e49676955c0c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14406,7 +14406,7 @@ S:	Supported
>  T:	git git://git.kernel.org/pub/scm/virt/kvm/kvm.git
>  F:	arch/um/include/asm/kvm_para.h
>  F:	arch/x86/include/asm/kvm_para.h
> -F:	arch/x86/include/asm/pvclock-abi.h
> +F:	arch/x86/include/uapi/asm/pvclock-abi.h
>  F:	arch/x86/include/uapi/asm/kvm_para.h
>  F:	arch/x86/kernel/kvm.c
>  F:	arch/x86/kernel/kvmclock.c
> @@ -29087,7 +29087,7 @@ R:	Boris Ostrovsky <boris.ostrovsky@oracle.com>
>  L:	xen-devel@lists.xenproject.org (moderated for non-subscribers)
>  S:	Supported
>  F:	arch/x86/configs/xen.config
> -F:	arch/x86/include/asm/pvclock-abi.h
> +F:	arch/x86/include/uapi/asm/pvclock-abi.h
>  F:	arch/x86/include/asm/xen/
>  F:	arch/x86/platform/pvh/
>  F:	arch/x86/xen/
> diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/uapi/asm/pvclock-abi.h
> similarity index 82%
> rename from arch/x86/include/asm/pvclock-abi.h
> rename to arch/x86/include/uapi/asm/pvclock-abi.h
> index b9fece5fc96d..6d70cf640362 100644
> --- a/arch/x86/include/asm/pvclock-abi.h
> +++ b/arch/x86/include/uapi/asm/pvclock-abi.h
> @@ -1,6 +1,9 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>  #ifndef _ASM_X86_PVCLOCK_ABI_H
>  #define _ASM_X86_PVCLOCK_ABI_H
> +
> +#include <linux/types.h>
> +
>  #ifndef __ASSEMBLER__
>  
>  /*
> @@ -24,20 +27,20 @@
>   */
>  
>  struct pvclock_vcpu_time_info {
> -	u32   version;
> -	u32   pad0;
> -	u64   tsc_timestamp;
> -	u64   system_time;
> -	u32   tsc_to_system_mul;
> -	s8    tsc_shift;
> -	u8    flags;
> -	u8    pad[2];
> +	__u32   version;
> +	__u32   pad0;
> +	__u64   tsc_timestamp;
> +	__u64   system_time;
> +	__u32   tsc_to_system_mul;
> +	__s8    tsc_shift;
> +	__u8    flags;
> +	__u8    pad[2];
>  } __attribute__((__packed__)); /* 32 bytes */
>  
>  struct pvclock_wall_clock {
> -	u32   version;
> -	u32   sec;
> -	u32   nsec;
> +	__u32   version;
> +	__u32   sec;
> +	__u32   nsec;
>  } __attribute__((__packed__));
>  
>  #define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
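
For anyone who wants to check the "32 bytes" layout comment in the quoted patch from user space, here is a minimal sketch. It assumes a C11 compiler with GCC-style attributes and substitutes the <stdint.h> fixed-width types for the kernel's __u32/__u64/__s8/__u8. The mirror_* names are hypothetical and exist only for this illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of struct pvclock_vcpu_time_info. */
struct mirror_vcpu_time_info {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
	uint8_t  flags;
	uint8_t  pad[2];
} __attribute__((__packed__));

/* Hypothetical user-space mirror of struct pvclock_wall_clock. */
struct mirror_wall_clock {
	uint32_t version;
	uint32_t sec;
	uint32_t nsec;
} __attribute__((__packed__));

/* Compile-time checks: the build fails if the layout ever drifts. */
static_assert(sizeof(struct mirror_vcpu_time_info) == 32,
	      "vcpu time info must stay 32 bytes");
static_assert(offsetof(struct mirror_vcpu_time_info, tsc_timestamp) == 8,
	      "tsc_timestamp follows version and pad0");
static_assert(offsetof(struct mirror_vcpu_time_info, flags) == 29,
	      "flags follows tsc_shift");
static_assert(sizeof(struct mirror_wall_clock) == 12,
	      "wall clock is three 32-bit fields");
```

Because the checks are static assertions, simply compiling the file is enough to validate the layout.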



* Re: [PATCH 2/6] mm/damon/sysfs: implement update_schemes_quota_goals command
From: Maksym Shcherba @ 2026-05-19  7:33 UTC (permalink / raw)
  To: SeongJae Park
  Cc: Maksym Shcherba, Maksym Shcherba, akpm, david, ljs, liam, vbabka,
	rppt, surenb, mhocko, corbet, skhan, damon, linux-mm,
	linux-kernel, linux-doc, linux-kselftest
In-Reply-To: <20260519001703.99264-1-sj@kernel.org>

On Mon, 18 May 2026 17:17:02 -0700 SeongJae Park <sj@kernel.org> wrote:

> On Mon, 18 May 2026 22:09:28 +0300 Maksym Shcherba <mshcherba2000@gmail.com> wrote:
> 
> > Add the logic to copy the current_value from the internal
> > damos_quota_goal structure to the damos_sysfs_quota_goal sysfs structure.
> > Introduce the DAMON_SYSFS_CMD_UPDATE_SCHEMES_QUOTA_GOALS command
> > and integrate it with the sysfs interface via the 'state' file.
> 
> Could you please further elaborate why you think this change is needed?  What
> is the expected use case and benefit?
>

Hi SJ,

The documentation (`Documentation/admin-guide/mm/damon/usage.rst`)
states that users can read the `current_value` file. However, the
kernel currently never updates this value in sysfs, preventing users
from reading the actual metrics.

This patch series implements the missing logic to align the code
with the documentation.

If the design intent was to intentionally keep `current_value`
internal and not expose it via sysfs, then the documentation is
incorrect. Let me know if that's the case, and I will send a v2
that drops the code changes and only fixes the documentation.

(Apologies for missing the cover letter where this should have
been explained; this is my first patch submission.)

Thanks,
Maksym Shcherba

[...]


* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian König @ 2026-05-19  7:19 UTC (permalink / raw)
  To: T.J. Mercier
  Cc: Albert Esteve, Christian Brauner, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-media, dri-devel, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CABdmKX3yZubjDKbVqwrjHAiKyj_ioHzOoxd0wzFbJK=PAGOqcQ@mail.gmail.com>

On 5/19/26 01:39, T.J. Mercier wrote:
> On Mon, May 18, 2026 at 7:07 AM Christian König
> <christian.koenig@amd.com> wrote:
>>
>> On 5/18/26 14:50, Albert Esteve wrote:
>>> On Mon, May 18, 2026 at 9:20 AM Christian König
>>> <christian.koenig@amd.com> wrote:
>>>>
>>>> On 5/15/26 19:06, T.J. Mercier wrote:
>>>>> On Fri, May 15, 2026 at 6:53 AM Christian Brauner <brauner@kernel.org> wrote:
>>>>>>
>>>>>> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
>>>>>>> On embedded platforms a central process often allocates dma-buf
>>>>>>> memory on behalf of client applications. Without a way to
>>>>>>> attribute the charge to the requesting client's cgroup, the
>>>>>>> cost lands on the allocator, making per-cgroup memory limits
>>>>>>> ineffective for the actual consumers.
>>>>>>>
>>>>>>> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
>>>>>>
>>>>>> Please be aware that pidfds come in two flavors:
>>>>>>
>>>>>> thread-group pidfds and thread-specific pidfds. Make sure that your API
>>>>>> doesn't implicitly depend on this distinction not existing.
>>>>>
>>>>> Hi Christian,
>>>>>
>>>>> Memcg is not a controller that supports "thread mode" so all threads
>>>>> in a group should belong to the same memcg.
>>>>
>>>> BTW: Exactly that is the requirement automotive has with their native context use case.
>>>>
>>>> The use case is that you have a daemon which has multiple threads where each one is acting on behalf of some other process.
>>>>
>>>> At the moment we basically say they are simply not using cgroups for that use case, but it would be really nice if we could handle that as well.
>>>>
>>>> Summarizing the requirement of that use case: You need a different cgroup for each thread of a process.
>>>
>>> Hi Christian,
>>>
>>> Thanks for sharing this automotive use case. If I understand correctly,
>>> the actual requirement is attributing dma-buf charges to the right
>>> client, not putting each daemon thread in a different cgroup?
>>
>> Nope, exactly that's the difference.
>>
>> The thread acts as a filtering agent for both memory allocation and command submission for somebody else. The process on whose behalf the daemon does things can even be in a client VM, completely remote over some network, or even something like a microcontroller.
>>
>> Everything the thread does regarding CPU time, GPU driver memory allocation as well as resources like GPU processing and I/O time etc. needs to be accounted to one client, which can be different for each thread of the process.
>>
>> The only thing which is shared with the main process thread is CPU memory resources, e.g. malloc() because that is basically just needed for housekeeping and pretty much irrelevant for this kind of use case.
>>
>> The problem is now you can't do that with cgroups at the moment but unfortunately only the kernel has the information you need to know to do this.
>>
>> So what you end up with is to define tons of interfaces just to get the necessary information from the kernel into userspace and then essentially duplicate the same infrastructure cgroup provides in the kernel in userspace again.
>>
>>> If so,
>>> the `charge_pid_fd` approach achieves this directly by passing the
>>> client's `pid_fd`, without needing to add per-thread cgroup
>>> infrastructure.
>>
>> Well it's already a massive improvement, we could basically stop doing the whole duplication part for the GPU driver stack and just use cgroups for this part.
>>
>> Doing that automatically for CPU and I/O time would just be nice to have additionally.
>>
>> Regards,
>> Christian.
> 
> Hopefully I'm following correctly here.... So you are duplicating the
> GPU driver stack to achieve remote accounting on a per-thread basis?

Not quite; we are duplicating in userspace the handling that cgroups provide in the kernel.

For this, memory usage information as well as execution times of the GPU kernel driver are exposed in fdinfo, for example.

> Does this mean for GPU allocations you currently have some GFP_ACCOUNT
> magic in your driver to attribute GPU memory to the correct remote
> client?

No, we just expose what the kernel driver has allocated for itself, e.g. page tables, buffers etc.

When userspace allocates something using memfd_create(), for example, we just ignore that.

> So this series would close the gap for dma-buf allocations,
> but what about private GPU driver memory allocated on behalf of a
> client?

Well, we would need a cgroup which isn't associated with any process, where we could charge the GPU driver allocations against.

But good point, charging against a pid wouldn't work in this use case.

Regards,
Christian.


* Re: [PATCH v2 1/3] dt-bindings: iio: dac: Add AD5529R
From: Janani Sunil @ 2026-05-19  7:13 UTC (permalink / raw)
  To: David Lechner, Jonathan Cameron, Janani Sunil
  Cc: Lars-Peter Clausen, Michael Hennerich, Nuno Sá,
	Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Philipp Zabel, Jonathan Corbet, Shuah Khan, linux-iio, devicetree,
	linux-kernel, linux-doc, rodrigo.alencar
In-Reply-To: <53d547ee-1ac3-42b9-92a6-e7f48b72fee3@baylibre.com>


On 5/16/26 21:25, David Lechner wrote:
> On 5/8/26 7:48 AM, Jonathan Cameron wrote:
>> On Fri, 8 May 2026 13:55:47 +0200
>> Janani Sunil <janani.sunil@analog.com> wrote:
>>
>>> Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
>>> buffered voltage output digital-to-analog converter (DAC) with an
>>> integrated precision reference.
>>>
>>> Signed-off-by: Janani Sunil <janani.sunil@analog.com>
>>> ---
> ...
>
>>> +  * Multiplexer for output voltage, load current sense and die temperature
>>> +
>>> +  Datasheet: https://www.analog.com/media/en/technical-documentation/data-sheets/ad5529r.pdf
>>> +
>>> +properties:
>>> +  compatible:
>>> +    const: adi,ad5529r
>>> +
>>> +  reg:
>>> +    maxItems: 1
>>> +
>>> +  spi-max-frequency:
>>> +    maximum: 50000000
>>> +
>>> +  reset-gpios:
>>> +    maxItems: 1
>>> +    description:
>>> +      GPIO connected to the RESET pin. Active low. When asserted low,
>>> +      performs a power-on reset and initializes the device to its default state.
>>> +
>>> +  vdd-supply:
>>> +    description: Digital power supply (typically 3.3V)
>>> +
>>> +  avdd-supply:
>>> +    description: Analog power supply (typically 5V)
>>> +
>>> +  hvdd-supply:
>>> +    description: High voltage positive supply (up to 40V for output range)
>>> +
>>> +  hvss-supply:
>>> +    description: High voltage negative supply (ground or negative voltage)
>> I don't mind doing it this way but in some similar cases where 0 is something that
>> can be considered the 'default' we've made the supply optional.  What was
>> your reasoning for requiring it in this case?
>>
>> dt-bindings should be as complete as we can make them - with that in mind...
>>
>> There are some more interesting corners on this device the binding doesn't
>> currently cover such as mux_out pin.  We'd normally do that by making the
>> driver potentially a client of an ADC
>>
>> Easier though is !alarm which smells like an interrupt.
>> !clear probably a gpio. TG0-3 also GPIOs.
> also optional vref-supply for external vs internal reference

I will add a binding for the optional vref-supply in the next version.

Best Regards,
Janani Sunil



* Re: [PATCH v2 2/3] iio: dac: Add AD5529R DAC driver support
From: Janani Sunil @ 2026-05-19  7:11 UTC (permalink / raw)
  To: David Lechner, Janani Sunil, Lars-Peter Clausen,
	Michael Hennerich, Jonathan Cameron, Nuno Sá,
	Andy Shevchenko, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Philipp Zabel, Jonathan Corbet, Shuah Khan
  Cc: linux-iio, devicetree, linux-kernel, linux-doc
In-Reply-To: <2f74e76e-b066-40ac-9cb4-c75137c9825d@baylibre.com>


On 5/16/26 21:35, David Lechner wrote:
> On 5/8/26 6:55 AM, Janani Sunil wrote:
>> Add support for AD5529R 16-channel, 12/16 bit Digital to Analog Converter
>>
> ...
>
>
>> +		.realbits = (bits),				\
>> +		.storagebits = 16,				\
>> +	},							\
>> +}
>> +static struct regmap *ad5529r_get_regmap(struct ad5529r_state *st, unsigned int reg)
>> +{
>> +	if (reg <= AD5529R_8BIT_REG_MAX)
>> +		return st->regmap_8bit;
>> +
>> +	return st->regmap_16bit;
>> +}
> Another way we have done this is make custom read/write functions for the
> regmap itself so that we don't have to have two regmaps.

The dual regmap approach was chosen here because:

1) It leverages regmap's val_bits validation and endianness handling for the 16-bit registers,
rather than implementing them manually.

2) The two distinct register banks (8-bit and 16-bit) map naturally to separate regmap configs.

3) Each regmap has focused rd_table/wr_table ranges matching the hardware, rather than one complex unified table.

The routing overhead is just an address comparison, similar to what custom functions would need, but with automatic validation
and endianness handling.
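
As a rough illustration of that routing cost, the sketch below models the address comparison in plain user-space C. The boundary value 0x11 (AD5529R_REG_INTERFACE_STATUS_A, the last register in the 8-bit readable ranges quoted elsewhere in this thread) is only assumed here to stand in for AD5529R_8BIT_REG_MAX, and the sketch_* names are hypothetical. The real helper returns a struct regmap pointer rather than an enum.

```c
#include <assert.h>

/* Assumed boundary between the 8-bit and 16-bit banks (hypothetical). */
#define SKETCH_8BIT_REG_MAX 0x11u

enum sketch_bank { SKETCH_BANK_8BIT, SKETCH_BANK_16BIT };

/* Pick the register bank from the address alone: one comparison. */
static enum sketch_bank sketch_pick_bank(unsigned int reg)
{
	if (reg <= SKETCH_8BIT_REG_MAX)
		return SKETCH_BANK_8BIT;

	return SKETCH_BANK_16BIT;
}
```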

Best Regards,
Janani Sunil



* Re: [Linaro-mm-sig] Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian König @ 2026-05-19  7:09 UTC (permalink / raw)
  To: Barry Song
  Cc: T.J. Mercier, Albert Esteve, Tejun Heo, Johannes Weiner,
	Michal Koutný, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song,
	Andrew Morton, Benjamin Gaignard, Brian Starkey, John Stultz,
	Christian Brauner, Paul Moore, James Morris, Serge E. Hallyn,
	Stephen Smalley, Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc,
	linux-kernel, linux-media, dri-, linaro-mm-sig, linux-mm,
	linux-security-module, selinux, linux-kselftest, mripard,
	echanude
In-Reply-To: <CAGsJ_4z121v4tK_3+j-hkD7HH0gH3w8tWD8nk0CwRhFE5T+4Og@mail.gmail.com>

On 5/19/26 01:00, Barry Song wrote:
> On Mon, May 18, 2026 at 3:34 PM Christian König
> <christian.koenig@amd.com> wrote:
>>
>> On 5/16/26 11:19, Barry Song wrote:
>>> On Thu, May 14, 2026 at 12:35 AM T.J. Mercier <tjmercier@google.com> wrote:
>>> [...]
>>>>>> I have a question about this part. Albert I guess you are interested
>>>>>> only in accounting dmabuf-heap allocations, or do you expect to add
>>>>>> __GFP_ACCOUNT or mem_cgroup_charge_dmabuf calls to other
>>>>>> non-dmabuf-heap exporters?
>>>>>
>>>>> We're scoping this to dma-buf heaps for now. CMA heaps and the dmem
>>>>> controller are on the radar for follow-up/parallel work (there will be
>>>>> dragons and will surely need discussion). For DRM and V4L2 the
>>>>> long-term intent is migration to heaps, which would make direct
>>>>> accounting on those paths unnecessary.
>>>>
>>>> Ah I see. GEM buffers exported to dmabufs are what I had in mind. I
>>>> guess this would only leave the odd non-DRM driver with the need to
>>>> add their own accounting calls, which I don't expect would be a big
>>>> problem.
>>>>
>>>
>>> sounds like we still have a long way to go to correctly account for
>>> various v4l2, drm, GEM, CMA, etc. In patch 1, the charging is done in
>>> dma_buf_export(), so I guess it covers all dma-buf types except
>>> dma_heap, but the problem is that it has no remote charging support at
>>> all?
>>
>> No, just the other way around
>>
>> DMA-buf heaps can be handled here because we know that it is pure system memory and nothing special so memcg always applies.
>>
>> dma_buf_export() on the other hand handles tons of different use cases, ranging from buffer accounted to dmem, over special resources which aren't even memory all the way to buffers which can migrate from dmem to memcg and back during their lifetime.
>>
> 
> Hi Christian,
> 
> Thanks very much for your explanation. So basically it seems that
> dma_buf_export() is not the proper place to charge, since it may end up
> mixing in non-system-memory accounting?

Yes, exactly that.

> My question is also about the global view for both heap and non-heap cases.
> After reading the discussion, I’ve tried to summarize it—please let me know
> if my understanding is correct.
> 
> for dma_heap, we have the ioctl DMA_HEAP_IOCTL_ALLOC, where users can pass a
> remote pidfd or similar information to indicate where the dma-buf should be
> charged, as in Albert's patchset.

Well that's the current proposal, but I think we need to come up with something more general.

> For non-dma_heap dma-bufs, we don’t have an obvious userspace entry point that
> triggers the allocation. So we likely need other approaches. We could either
> move more drivers over to dma-heap, or introduce something like
> DMA_BUF_IOCTL_XFER_CHARGE, as you are discussing, to let userspace explicitly
> declare a charge.

Yeah, but that's not only for DMA-buf; we need that for file descriptors returned by memfd_create() as well.

Regards,
Christian.

> Best Regards
> Barry



* Re: [PATCH v2 2/3] iio: dac: Add AD5529R DAC driver support
From: Janani Sunil @ 2026-05-19  7:07 UTC (permalink / raw)
  To: Jonathan Cameron, Janani Sunil
  Cc: Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc
In-Reply-To: <20260508143017.28f86551@jic23-huawei>


On 5/8/26 15:30, Jonathan Cameron wrote:
> On Fri, 8 May 2026 13:55:48 +0200
> Janani Sunil <janani.sunil@analog.com> wrote:
>
>> Add support for AD5529R 16-channel, 12/16 bit Digital to Analog Converter
>>
>> Signed-off-by: Janani Sunil <janani.sunil@analog.com>
>> +/* Register Map */
>> +#define AD5529R_REG_INTERFACE_CONFIG_A		0x00
>> +#define AD5529R_REG_INTERFACE_CONFIG_B		0x01
>> +#define AD5529R_REG_DEVICE_CONFIG		0x02
>> +#define AD5529R_REG_CHIP_TYPE			0x03
>> +#define AD5529R_REG_PRODUCT_ID_L		0x04
>> +#define AD5529R_REG_PRODUCT_ID_H		0x05
>> +#define AD5529R_REG_CHIP_GRADE			0x06
>> +#define AD5529R_REG_SCRATCH_PAD			0x0A
>> +#define AD5529R_REG_SPI_REVISION		0x0B
>> +#define AD5529R_REG_VENDOR_L			0x0C
>> +#define AD5529R_REG_VENDOR_H			0x0D
>> +#define AD5529R_REG_STREAM_MODE			0x0E
>> +#define AD5529R_REG_TRANSFER_CONFIG		0x0F
>> +#define AD5529R_REG_INTERFACE_CONFIG_C		0x10
>> +#define AD5529R_REG_INTERFACE_STATUS_A		0x11
>> +
>> +/* Configuration registers */
>> +#define AD5529R_REG_MULTI_DAC_CH_SEL		(0x14 + 1)
> Feels like this would all be simpler if you used autoincrement rather than
> default value of autodecrement.  What breaks if you do that?
> Superficially feels like all the +1 would go away - though with need
> for a byte swap?  Might be worth that pain for the simpler code.
> Should just be a regmap_config parameter.

Switching to auto increment is feasible. I'll switch to auto increment and
eliminate all the +1 offsets.

>> +
>> +static const struct regmap_range ad5529r_8bit_readable_ranges[] = {
>> +	regmap_reg_range(AD5529R_REG_INTERFACE_CONFIG_A, AD5529R_REG_CHIP_GRADE),
>> +	regmap_reg_range(AD5529R_REG_SCRATCH_PAD, AD5529R_REG_VENDOR_H),
>> +	regmap_reg_range(AD5529R_REG_STREAM_MODE, AD5529R_REG_INTERFACE_STATUS_A),
>> +};
>> +
>> +static const struct regmap_range ad5529r_16bit_readable_ranges[] = {
> Tricky bit here is you are saying it's a 16 bit regmap but then providing
> address ranges including the ones we shouldn't use. We need to hide those
> intermediate addresses.  Various things might work depending on the addresses.
> Can we hide the bottom bit of each address then write it to appropriate value
> under the hood. That is divide addresses by 2?

I'll address this by using reg_stride = 2 in the 16-bit regmap configuration,
which automatically handles the address spacing and eliminates the need for manual
address range exclusion.

>> +	int ret;
>> +
>> +	switch (mask) {
>> +	case IIO_CHAN_INFO_RAW:
>> +		reg_addr = AD5529R_REG_DAC_INPUT_A(chan->channel);
>> +		ret = regmap_read(st->regmap_16bit, reg_addr, &reg_val_h);
>> +		if (ret)
>> +			return ret;
>> +
>> +		*val = reg_val_h;
>> +
>> +		return IIO_VAL_INT;
>> +	case IIO_CHAN_INFO_SCALE:
>> +		/*
>> +		 * Using default 0-5V range: VOUTn = A × D/2^N + B
>> +		 * where A = 5V, B = 0V, D = digital code, N = resolution
>> +		 * Scale = 5000mV / 2^resolution
> See the comment on the dt-binding. I think we need support for
> dt described output ranges from the start. This is a rare multi range
> device where we could set a safe default but to me it makes little sense
> and the driver will be doing something unexpected if a newer DT is
> provided with a different range.

I will add devicetree properties for per-channel output range configuration.
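
For reference, the transfer function quoted above, VOUTn = A × D/2^N + B, works out as follows for the default 0-5 V range. This is a stand-alone user-space sketch, not driver code; the function name and the integer millivolt representation are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Output voltage in millivolts for a DAC code.
 * span_mv is A, offset_mv is B, bits is N (12 or 16 for this part).
 */
static int64_t sketch_code_to_mv(uint32_t code, unsigned int bits,
				 int64_t span_mv, int64_t offset_mv)
{
	/* Reject resolutions and codes that the formula does not cover. */
	assert(bits > 0 && bits < 32);
	assert(code < (UINT32_C(1) << bits));

	/* Integer division truncates toward zero. */
	return span_mv * (int64_t)code / (INT64_C(1) << bits) + offset_mv;
}
```

With A = 5000 mV and B = 0, mid-scale on either variant (code 32768 at 16 bits, code 2048 at 12 bits) gives 2500 mV. In IIO terms the same ratio can be reported as a fraction with a power-of-two denominator (IIO_VAL_FRACTIONAL_LOG2), with val = 5000 and val2 = N.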

>> +
>> +static int ad5529r_probe(struct spi_device *spi)
>> +{
>> +	struct device *dev = &spi->dev;
>> +	struct iio_dev *indio_dev;
>> +	struct ad5529r_state *st;
>> +	int ret;
>> +
>> +	indio_dev = devm_iio_device_alloc(dev, sizeof(*st));
>> +	if (!indio_dev)
>> +		return -ENOMEM;
>> +
>> +	st = iio_priv(indio_dev);
>> +
>> +	st->spi = spi;
>> +
>> +	ret = devm_regulator_bulk_get_enable(dev, AD5529R_NUM_SUPPLIES,
>> +					     ad5529r_supply_names);
>> +	if (ret)
>> +		return dev_err_probe(dev, ret, "Failed to get and enable regulators\n");
>> +
>> +	st->regmap_8bit = devm_regmap_init_spi(spi, &ad5529r_regmap_8bit_config);
>> +	if (IS_ERR(st->regmap_8bit))
>> +		return dev_err_probe(dev, PTR_ERR(st->regmap_8bit),
>> +				     "Failed to initialize 8-bit regmap\n");
>> +
>> +	st->regmap_16bit = devm_regmap_init_spi(spi, &ad5529r_regmap_16bit_config);
>> +	if (IS_ERR(st->regmap_16bit))
>> +		return dev_err_probe(dev, PTR_ERR(st->regmap_16bit),
>> +				     "Failed to initialize 16-bit regmap\n");
>> +
>> +	ret = ad5529r_reset(st);
>> +	if (ret)
>> +		return dev_err_probe(dev, ret, "Failed to reset device\n");
>> +
>> +	ret = ad5529r_detect_device(st);
>> +	if (ret)
>> +		return dev_err_probe(dev, ret, "Failed to detect device variant\n");
> No to this. It breaks the use of fallback device tree compatibles.  As such we
> never fail on an ID mismatch. Instead we just believe firmware when it says
> whatever is there is compatible with this device. See below on why I think
> we need to break this into separate compatibles.

I'll create separate compatibles and remove the device ID detection logic.

Best Regards,
Janani Sunil



* Re: [PATCH v4 09/10] dt-bindings: firmware: add arm,ras-cper
From: Krzysztof Kozlowski @ 2026-05-19  7:04 UTC (permalink / raw)
  To: Ahmed Tiba, rafael, bp, saket.dumbre, will, xueshuai, mchehab,
	krzk+dt, dave, conor+dt, vishal.l.verma, jic23, corbet, guohanjun,
	dave.jiang, catalin.marinas, lenb, tony.luck, skhan, djbw,
	alison.schofield, ira.weiny, robh
  Cc: devicetree, linux-acpi, linux-doc, Dmitry.Lamerov, linux-cxl,
	Michael.Zhao2, acpica-devel, linux-kernel, linux-arm-kernel,
	linux-edac
In-Reply-To: <20260518-topics-ahmtib01-ras_ffh_arm_internal_review-v4-9-42698675ba61@arm.com>

On 18/05/2026 13:57, Ahmed Tiba wrote:
> Describe the DeviceTree node that exposes the Arm firmware-first
> CPER provider and hook the file into MAINTAINERS so the
> binding has an owner.
> 
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>

Please implement previous comments.


> ---
>  .../devicetree/bindings/firmware/arm,ras-cper.yaml | 71 ++++++++++++++++++++++
>  MAINTAINERS                                        |  5 ++
>  2 files changed, 76 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/firmware/arm,ras-cper.yaml b/Documentation/devicetree/bindings/firmware/arm,ras-cper.yaml
> new file mode 100644
> index 000000000000..81dc37390af5
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/firmware/arm,ras-cper.yaml
> @@ -0,0 +1,71 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/firmware/arm,ras-cper.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Arm RAS CPER provider
> +
> +maintainers:
> +  - Ahmed Tiba <ahmed.tiba@arm.com>
> +
> +description:
> +  Arm Reliability, Availability and Serviceability (RAS) firmware can expose
> +  a firmware-first CPER error source directly via DeviceTree. Firmware
> +  provides the CPER Generic Error Status block and notifies the OS through
> +  an interrupt.
> +
> +properties:
> +  compatible:
> +    const: arm,ras-cper
> +
> +  memory-region:
> +    oneOf:
> +      - items:
> +          - description:
> +              CPER Generic Error Status block exposed by firmware
> +      - items:
> +          - description:
> +              CPER Generic Error Status block exposed by firmware.

Also, this is just a list with minItems. No need for oneOf.

Best regards,
Krzysztof


* Re: [PATCH v2 1/3] dt-bindings: iio: dac: Add AD5529R
From: Janani Sunil @ 2026-05-19  6:59 UTC (permalink / raw)
  To: Jonathan Cameron, Janani Sunil
  Cc: Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, rodrigo.alencar
In-Reply-To: <20260508140814.67800e4a@jic23-huawei>


On 5/8/26 15:08, Jonathan Cameron wrote:
> On Fri, 8 May 2026 13:48:43 +0100
> Jonathan Cameron <jic23@kernel.org> wrote:
>
>> On Fri, 8 May 2026 13:55:47 +0200
>> Janani Sunil <janani.sunil@analog.com> wrote:
>>
>>> Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
>>> buffered voltage output digital-to-analog converter (DAC) with an
>>> integrated precision reference.
>>>
>>> Signed-off-by: Janani Sunil <janani.sunil@analog.com>
>>> ---
>>>   .../devicetree/bindings/iio/dac/adi,ad5529r.yaml   | 96 ++++++++++++++++++++++
>>>   MAINTAINERS                                        |  7 ++
>>>   2 files changed, 103 insertions(+)
>>>
>>> diff --git a/Documentation/devicetree/bindings/iio/dac/adi,ad5529r.yaml b/Documentation/devicetree/bindings/iio/dac/adi,ad5529r.yaml
>>> new file mode 100644
>>> index 000000000000..f531b4865b01
>>> --- /dev/null
>>> +++ b/Documentation/devicetree/bindings/iio/dac/adi,ad5529r.yaml
>>> @@ -0,0 +1,96 @@
>>> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
>>> +%YAML 1.2
>>> +---
>>> +$id: http://devicetree.org/schemas/iio/dac/adi,ad5529r.yaml#
>>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>>> +
>>> +title: Analog Devices AD5529R 16-Channel 12/16-bit High Voltage DAC
>> How is one device both 12 and 16-bit? That sometimes happens for
>> ADCs where it is really reflecting oversampling or for device with hardware
>> FIFOs where storage space is saved by using lower bit rate. I'm not sure either
>> applies here.
> Having read the driver I now understand. This is supporting two parts and
> doing device ID based detection.  In an unusual step for Analog they have
> the same base part number with a post fix.  Whilst this approach works today
> it fundamentally breaks fallback dt-compatibles being used in future (the
> driver fails for any non match of WHOAMI value as it needs them to look
> up device specific data)  As such I think you need to have separate
> compatibles for the 12 and 16 bit versions.

The AD5529R comes in two variants, a 12-bit part and a 16-bit part. They share the same register interface and pin configuration
but differ in DAC resolution. I will add separate compatibles for them.

Best Regards,
Janani Sunil



* [syzbot ci] Re: Introduce Per-CPU Work helpers (was QPW)
From: syzbot ci @ 2026-05-19  6:58 UTC (permalink / raw)
  To: akpm, axelrasmussen, baohua, bhe, boqun, bp, brauner, chrisl, cl,
	corbet, coxu, dapeng1.mi, david, dianders, ebiggers, elver,
	feng.tang, frederic, gary, hannes, hao.li, harry, jackmanb, jannh,
	kasong, kees, kuba, leobras.c, liam, linux-doc, linux-kernel,
	linux-mm, linux-rt-devel, lirongqing, ljs, longman, masahiroy,
	mhocko, mingo, mtosatti, nathan, nphamcs, nsc, ojeda,
	pasha.tatashin, paulmck, peterz, pfalcato, qi.zheng, rdunlap
  Cc: syzbot, syzkaller-bugs
In-Reply-To: <20260519012754.240804-1-leobras.c@gmail.com>

syzbot ci has tested the following series

[v4] Introduce Per-CPU Work helpers (was QPW)
https://lore.kernel.org/all/20260519012754.240804-1-leobras.c@gmail.com
* [PATCH v4 1/4] Introducing pw_lock() and per-cpu queue & flush work
* [PATCH v4 2/4] mm/swap: move bh draining into a separate workqueue
* [PATCH v4 3/4] swap: apply new pw_queue_on() interface
* [PATCH v4 4/4] slub: apply new pw_queue_on() interface

and found the following issue:
WARNING in __pcs_replace_empty_main

Full report is available here:
https://ci.syzbot.org/series/804f81bd-77b4-490e-bd57-6345ad2aa923

***

WARNING in __pcs_replace_empty_main

tree:      drm-next
URL:       https://gitlab.freedesktop.org/drm/kernel.git
base:      5200f5f493f79f14bbdc349e402a40dfb32f23c8
arch:      amd64
compiler:  Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config:    https://ci.syzbot.org/builds/3ea80958-13bd-49da-9c64-6deb788113f8/config

clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
Zone ranges:
  DMA      [mem 0x0000000000001000-0x0000000000ffffff]
  DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
  Normal   [mem 0x0000000100000000-0x000000023fffffff]
  Device   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000000001000-0x000000000009efff]
  node   0: [mem 0x0000000000100000-0x000000007ffdefff]
  node   0: [mem 0x0000000100000000-0x0000000160000fff]
  node   1: [mem 0x0000000160001000-0x000000023fffffff]
Initmem setup node 0 [mem 0x0000000000001000-0x0000000160000fff]
Initmem setup node 1 [mem 0x0000000160001000-0x000000023fffffff]
On node 0, zone DMA: 1 pages in unavailable ranges
On node 0, zone DMA: 97 pages in unavailable ranges
On node 0, zone Normal: 33 pages in unavailable ranges
setup_percpu: NR_CPUS:8 nr_cpumask_bits:2 nr_cpu_ids:2 nr_node_ids:2
percpu: Embedded 71 pages/cpu s250632 r8192 d31992 u2097152
kvm-guest: PV spinlocks disabled, no host support
Kernel command line: earlyprintk=serial net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1 ima_policy=tcb nf-conntrack-ftp.ports=20000 nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000 nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000 binder.debug_mask=0 rcupdate.rcu_expedited=1 rcupdate.rcu_cpu_stall_cputime=1 no_hash_pointers page_owner=on sysctl.vm.nr_hugepages=4 sysctl.vm.nr_overcommit_hugepages=4 secretmem.enable=1 sysctl.max_rcu_stall_to_panic=1 msr.allow_writes=off coredump_filter=0xffff root=/dev/sda console=ttyS0 vsyscall=native numa=fake=2 kvm-intel.nested=1 spec_store_bypass_disable=prctl nopcid vivid.n_devs=64 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2 netrom.nr_ndevs=32 rose.rose_ndevs=32 smp.csd_lock_timeout=100000 watchdog_thresh=55 workqueue.watchdog_thresh=140 sysctl.net.core.netdev_unregister_timeout_secs=140 dummy_hcd.num=32 max_loop=32 nbds_max=32 \
Kernel command line: comedi.comedi_num_legacy_minors=4 panic_on_warn=1 root=/dev/sda console=ttyS0 root=/dev/sda1
Unknown kernel command line parameters "nbds_max=32", will be passed to user space.
printk: log buffer data + meta data: 262144 + 917504 = 1179648 bytes
software IO TLB: area num 2.
Fallback order for Node 0: 0 1 
Fallback order for Node 1: 1 0 
Built 2 zonelists, mobility grouping on.  Total pages: 1834877
Policy zone: Normal
mem auto-init: stack:all(zero), heap alloc:on, heap free:off
stackdepot: allocating hash table via alloc_large_system_hash
stackdepot hash table entries: 1048576 (order: 12, 16777216 bytes, linear)
stackdepot: allocating space for 8192 stack pools via memblock
**********************************************************
**   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
**                                                      **
** This system shows unhashed kernel memory addresses   **
** via the console, logs, and other interfaces. This    **
** might reduce the security of your system.            **
**                                                      **
** If you see this message and you are not debugging    **
** the kernel, report this immediately to your system   **
** administrator!                                       **
**                                                      **
** Use hash_pointers=always to force this mode off      **
**                                                      **
**   NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE   **
**********************************************************
------------[ cut here ]------------
debug_locks && !(lock_is_held(&(&s->cpu_sheaves->lock)->dep_map) != 0)
WARNING: mm/slub.c:4601 at __pcs_replace_empty_main+0x51b/0x6e0, CPU#0: swapper/0
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted syzkaller #0 PREEMPT(undef) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__pcs_replace_empty_main+0x51b/0x6e0
Code: 48 85 f6 74 15 4c 89 ff 48 89 c6 e8 af 5e ff ff 4d 89 74 24 38 e9 36 fc ff ff 49 89 44 24 40 4d 89 74 24 38 e9 27 fc ff ff 90 <0f> 0b 90 83 7b 2c 00 0f 85 23 fb ff ff 48 8b 1b e8 20 cd 82 09 41
RSP: 0000:ffffffff8e607d58 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffffffff91bb8398 RCX: 0000000000000002
RDX: 0000000000000cc0 RSI: ffffffff8e21ec94 RDI: ffffffff8c28b160
RBP: 0000000000000cc0 R08: 0000000000005e00 R09: 00000000477ac845
R10: 0000000047d13f7f R11: 000000002fa01ecd R12: ffff88812103f308
R13: 0000000000000000 R14: ffffffff91bb8398 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88818dc8a000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88823ffff000 CR3: 000000000e74a000 CR4: 00000000000000b0
Call Trace:
 <TASK>
 kmem_cache_alloc_node_noprof+0x441/0x690
 do_kmem_cache_create+0x172/0x620
 create_boot_cache+0xbf/0x120
 kmem_cache_init+0x11a/0x1e0
 mm_core_init+0x7e/0xb0
 start_kernel+0x15a/0x3e0
 x86_64_start_reservations+0x24/0x30
 x86_64_start_kernel+0x143/0x1c0
 common_startup_64+0x13e/0x147
 </TASK>


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.

To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).

The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.

^ permalink raw reply

* Re: [PATCH 11/15] accel/qda: Add PRIME DMA-BUF import support
From: Christian König @ 2026-05-19  6:55 UTC (permalink / raw)
  To: ekansh.gupta, Oded Gabbay, Jonathan Corbet, Shuah Khan,
	Joerg Roedel, Will Deacon, Robin Murphy, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Sumit Semwal
  Cc: Bharath Kumar, Chenna Kesava Raju, srini, dmitry.baryshkov,
	andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
	linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig
In-Reply-To: <20260519-qda-series-v1-11-b2d984c297f8@oss.qualcomm.com>

On 5/19/26 08:16, Ekansh Gupta via B4 Relay wrote:
> From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> 
> Allow user-space to import DMA-BUF file descriptors from other
> subsystems (GPU, camera, video) into the QDA driver via the standard
> DRM PRIME interface.
> 
> qda_prime.c
>   Implements qda_gem_prime_import(), which is set as the driver's
>   .gem_prime_import callback. On import it:
>   1. Short-circuits self-import: if the dma_buf was exported by this
>      device and is not itself an import, the existing GEM object is
>      returned with an incremented reference count.
>   2. Attaches to the dma_buf and maps it with DMA_BIDIRECTIONAL via
>      dma_buf_map_attachment_unlocked(), obtaining an sg_table whose
>      DMA addresses are IOMMU virtual addresses in the CB device's
>      address space.
>   3. Calls qda_memory_manager_alloc() to record the IOMMU mapping and
>      encode the SID in the upper 32 bits of the DMA address, matching
>      the convention used for natively allocated buffers.
> 
>   qda_prime_fd_to_handle() wraps drm_gem_prime_fd_to_handle() under
>   qdev->import_lock, storing the calling file_priv in
>   qdev->current_import_file_priv so that qda_gem_prime_import() can
>   retrieve it (the .gem_prime_import callback does not receive
>   file_priv directly).
> 
> qda_gem.c
>   qda_gem_free_object() is extended to handle the imported-buffer
>   teardown path: unmap the sg_table, detach from the dma_buf, and
>   release the dma_buf reference.
>   qda_gem_mmap_obj() rejects mmap requests on imported objects.
> 
> qda_memory_manager.c
>   qda_memory_manager_map_imported() records the IOMMU-mapped DMA
>   address from the first sg entry (the IOMMU maps the buffer as a
>   contiguous range) and encodes the SID prefix.

No, it doesn't.

>   qda_memory_manager_free() skips the DMA free path for imported
>   buffers since the memory is owned by the exporter.
> 
> Assisted-by: Claude:claude-4-6-sonnet
> Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
> ---
>  drivers/accel/qda/Makefile             |   1 +
>  drivers/accel/qda/qda_drv.c            |  12 ++-
>  drivers/accel/qda/qda_drv.h            |   4 +
>  drivers/accel/qda/qda_gem.c            |  25 ++++-
>  drivers/accel/qda/qda_gem.h            |   8 ++
>  drivers/accel/qda/qda_memory_manager.c |  47 ++++++++-
>  drivers/accel/qda/qda_prime.c          | 184 +++++++++++++++++++++++++++++++++
>  drivers/accel/qda/qda_prime.h          |  18 ++++
>  8 files changed, 295 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/accel/qda/Makefile b/drivers/accel/qda/Makefile
> index a46ddceecfc5..fb092e56d7f3 100644
> --- a/drivers/accel/qda/Makefile
> +++ b/drivers/accel/qda/Makefile
> @@ -12,6 +12,7 @@ qda-y := \
>         qda_ioctl.o \
>         qda_memory_dma.o \
>         qda_memory_manager.o \
> +       qda_prime.o \
>         qda_rpmsg.o
> 
>  obj-$(CONFIG_DRM_ACCEL_QDA_COMPUTE_BUS) += qda_compute_bus.o
> diff --git a/drivers/accel/qda/qda_drv.c b/drivers/accel/qda/qda_drv.c
> index c9b9e56dcb28..ef8bd573b836 100644
> --- a/drivers/accel/qda/qda_drv.c
> +++ b/drivers/accel/qda/qda_drv.c
> @@ -7,10 +7,12 @@
>  #include <drm/drm_file.h>
>  #include <drm/drm_gem.h>
>  #include <drm/drm_ioctl.h>
> +#include <drm/drm_prime.h>
>  #include <drm/drm_print.h>
>  #include <drm/qda_accel.h>
> 
>  #include "qda_drv.h"
> +#include "qda_prime.h"
>  #include "qda_ioctl.h"
>  #include "qda_rpmsg.h"
> 
> @@ -64,6 +66,8 @@ static const struct drm_driver qda_drm_driver = {
>         .postclose = qda_postclose,
>         .ioctls = qda_ioctls,
>         .num_ioctls = ARRAY_SIZE(qda_ioctls),
> +       .gem_prime_import = qda_gem_prime_import,
> +       .prime_fd_to_handle = qda_prime_fd_to_handle,
>         .name = QDA_DRIVER_NAME,
>         .desc = "Qualcomm DSP Accelerator Driver",
>  };
> @@ -100,6 +104,7 @@ static int init_memory_manager(struct qda_dev *qdev)
> 
>  void qda_deinit_device(struct qda_dev *qdev)
>  {
> +       mutex_destroy(&qdev->import_lock);
>         cleanup_memory_manager(qdev);
>  }
> 
> @@ -107,9 +112,14 @@ int qda_init_device(struct qda_dev *qdev)
>  {
>         int ret;
> 
> +       mutex_init(&qdev->import_lock);
> +       qdev->current_import_file_priv = NULL;
> +
>         ret = init_memory_manager(qdev);
> -       if (ret)
> +       if (ret) {
>                 drm_err(&qdev->drm_dev, "Failed to initialize memory manager: %d\n", ret);
> +               mutex_destroy(&qdev->import_lock);
> +       }
> 
>         return ret;
>  }
> diff --git a/drivers/accel/qda/qda_drv.h b/drivers/accel/qda/qda_drv.h
> index 8a7d647ac8fc..96ce4135e2d9 100644
> --- a/drivers/accel/qda/qda_drv.h
> +++ b/drivers/accel/qda/qda_drv.h
> @@ -47,6 +47,10 @@ struct qda_dev {
>         struct list_head cb_devs;
>         /** @iommu_mgr: IOMMU/memory manager instance */
>         struct qda_memory_manager *iommu_mgr;
> +       /** @import_lock: Lock protecting prime import context */
> +       struct mutex import_lock;
> +       /** @current_import_file_priv: Current file_priv during prime import */
> +       struct drm_file *current_import_file_priv;
>         /** @dsp_name: Name of the DSP domain (e.g. "cdsp", "adsp") */
>         const char *dsp_name;
>  };
> diff --git a/drivers/accel/qda/qda_gem.c b/drivers/accel/qda/qda_gem.c
> index 568b3c2e64b7..9e1ac7582d0c 100644
> --- a/drivers/accel/qda/qda_gem.c
> +++ b/drivers/accel/qda/qda_gem.c
> @@ -9,6 +9,7 @@
>  #include "qda_gem.h"
>  #include "qda_memory_manager.h"
>  #include "qda_memory_dma.h"
> +#include "qda_prime.h"
> 
>  static void setup_vma_flags(struct vm_area_struct *vma)
>  {
> @@ -25,8 +26,20 @@ void qda_gem_free_object(struct drm_gem_object *gem_obj)
>         struct qda_gem_obj *qda_gem_obj = to_qda_gem_obj(gem_obj);
>         struct qda_dev *qdev = qda_dev_from_drm(gem_obj->dev);
> 
> -       if (qda_gem_obj->virt && qdev->iommu_mgr)
> -               qda_memory_manager_free(qdev->iommu_mgr, qda_gem_obj);
> +       if (qda_gem_obj->is_imported) {
> +               if (qda_gem_obj->attachment && qda_gem_obj->sgt)
> +                       dma_buf_unmap_attachment_unlocked(qda_gem_obj->attachment,
> +                                                         qda_gem_obj->sgt, DMA_BIDIRECTIONAL);
> +               if (qda_gem_obj->attachment)
> +                       dma_buf_detach(qda_gem_obj->dma_buf, qda_gem_obj->attachment);
> +               if (qda_gem_obj->dma_buf)
> +                       dma_buf_put(qda_gem_obj->dma_buf);
> +               if (qda_gem_obj->iommu_dev && qdev->iommu_mgr)
> +                       qda_memory_manager_free(qdev->iommu_mgr, qda_gem_obj);
> +       } else {
> +               if (qda_gem_obj->virt && qdev->iommu_mgr)
> +                       qda_memory_manager_free(qdev->iommu_mgr, qda_gem_obj);
> +       }
> 
>         drm_gem_object_release(gem_obj);
>         kfree(qda_gem_obj);
> @@ -44,6 +57,10 @@ int qda_gem_mmap_obj(struct drm_gem_object *drm_obj, struct vm_area_struct *vma)
>         struct qda_gem_obj *qda_gem_obj = to_qda_gem_obj(drm_obj);
>         int ret;
> 
> +       /* Imported dma-buf objects must be mmap'd through the exporter, not the importer */
> +       if (qda_gem_obj->is_imported)
> +               return -EINVAL;
> +
>         /* Reset vm_pgoff for DMA mmap */
>         vma->vm_pgoff = 0;
> 
> @@ -143,6 +160,10 @@ struct drm_gem_object *qda_gem_create_object(struct drm_device *drm_dev,
>         qda_gem_obj = qda_gem_alloc_object(drm_dev, aligned_size);
>         if (IS_ERR(qda_gem_obj))
>                 return ERR_CAST(qda_gem_obj);
> +       qda_gem_obj->is_imported = false;
> +       qda_gem_obj->dma_buf = NULL;
> +       qda_gem_obj->attachment = NULL;
> +       qda_gem_obj->sgt = NULL;
> 
>         ret = qda_memory_manager_alloc(iommu_mgr, qda_gem_obj, file_priv);
>         if (ret) {
> diff --git a/drivers/accel/qda/qda_gem.h b/drivers/accel/qda/qda_gem.h
> index bb18f8155aa4..0878f57715f6 100644
> --- a/drivers/accel/qda/qda_gem.h
> +++ b/drivers/accel/qda/qda_gem.h
> @@ -22,12 +22,20 @@ struct qda_gem_obj {
>         struct drm_gem_object base;
>         /** @iommu_dev: IOMMU context bank device that performed the allocation */
>         struct qda_iommu_device *iommu_dev;
> +       /** @dma_buf: Reference to imported dma_buf */
> +       struct dma_buf *dma_buf;
> +       /** @attachment: DMA buf attachment */
> +       struct dma_buf_attachment *attachment;
> +       /** @sgt: Scatter-gather table */
> +       struct sg_table *sgt;
>         /** @virt: Kernel virtual address of the allocated DMA memory */
>         void *virt;
>         /** @dma_addr: DMA address (with SID encoded in upper 32 bits) */
>         dma_addr_t dma_addr;
>         /** @size: Size of the buffer in bytes */
>         size_t size;
> +       /** @is_imported: True if buffer is imported, false if allocated */
> +       bool is_imported;
>  };
> 
>  /**
> diff --git a/drivers/accel/qda/qda_memory_manager.c b/drivers/accel/qda/qda_memory_manager.c
> index 82111275f420..d2aa0e0e65f5 100644
> --- a/drivers/accel/qda/qda_memory_manager.c
> +++ b/drivers/accel/qda/qda_memory_manager.c
> @@ -202,6 +202,41 @@ static struct qda_iommu_device *get_or_assign_iommu_device(struct qda_memory_man
>         return NULL;
>  }
> 
> +static int qda_memory_manager_map_imported(struct qda_memory_manager *mem_mgr,
> +                                          struct qda_gem_obj *gem_obj,
> +                                          struct qda_iommu_device *iommu_dev)
> +{
> +       struct scatterlist *sg;
> +       dma_addr_t dma_addr;
> +
> +       if (!gem_obj->is_imported || !gem_obj->sgt || !iommu_dev) {
> +               drm_err(gem_obj->base.dev, "Invalid parameters for imported buffer mapping\n");
> +               return -EINVAL;
> +       }
> +
> +       sg = gem_obj->sgt->sgl;
> +       if (!sg) {
> +               drm_err(gem_obj->base.dev, "Invalid scatter-gather list for imported buffer\n");
> +               return -EINVAL;
> +       }
> +
> +       gem_obj->iommu_dev = iommu_dev;
> +
> +       /*
> +        * After dma_buf_map_attachment_unlocked(), sg_dma_address() returns the
> +        * IOMMU virtual address, not the physical address. The IOMMU maps the
> +        * entire buffer as a contiguous range in the IOMMU address space even if
> +        * the underlying physical memory is non-contiguous. Therefore the first
> +        * sg entry's DMA address is the start of the complete contiguous
> +        * IOMMU-mapped range and is sufficient to describe the buffer to the DSP.
> +        */
> +       dma_addr = sg_dma_address(sg);
> +       dma_addr += ((u64)iommu_dev->sid << 32);
> +       gem_obj->dma_addr = dma_addr;

The handling here is completely broken, since it assumes that the exporter maps the buffer as a contiguous range.

But that's in no way guaranteed.
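
As a rough illustration only (this is plain userspace C, not kernel code; `struct dma_seg` and `dma_segs_contiguous()` are made-up names that stand in for walking the mapped sg_table with sg_dma_address() and sg_dma_len()), an importer that really needs a single linear range would have to verify something like the following before relying on the first entry alone:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct dma_seg {
	uint64_t dma_addr;	/* value sg_dma_address() would return */
	uint64_t dma_len;	/* value sg_dma_len() would return */
};

/*
 * Return true only if the segments form one linear DMA range, i.e. each
 * segment starts exactly where the previous one ends. An empty list is
 * treated as "not usable" and returns false.
 */
bool dma_segs_contiguous(const struct dma_seg *segs, size_t nents)
{
	size_t i;

	assert(segs != NULL || nents == 0);

	if (nents == 0)
		return false;

	for (i = 1; i < nents; i++) {
		uint64_t prev_end = segs[i - 1].dma_addr + segs[i - 1].dma_len;

		/* A gap or overlap means the first address does not describe the buffer. */
		if (segs[i].dma_addr != prev_end)
			return false;
	}

	return true;
}
```

When such a check fails, the first entry's address covers only the first segment, so the import would either have to be rejected or every segment would have to be described to the device individually.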

Regards,
Christian.

> +
> +       return 0;
> +}
> +
>  /**
>   * qda_memory_manager_alloc() - Allocate memory for a GEM object
>   * @mem_mgr: Pointer to memory manager
> @@ -237,7 +272,11 @@ int qda_memory_manager_alloc(struct qda_memory_manager *mem_mgr, struct qda_gem_
>                 return -ENOMEM;
>         }
> 
> -       ret = qda_dma_alloc(selected_dev, gem_obj, size);
> +       if (gem_obj->is_imported)
> +               ret = qda_memory_manager_map_imported(mem_mgr, gem_obj, selected_dev);
> +       else
> +               ret = qda_dma_alloc(selected_dev, gem_obj, size);
> +
>         if (ret) {
>                 drm_err(gem_obj->base.dev, "Allocation failed: size=%zu, device_id=%u, ret=%d\n",
>                         size, selected_dev->id, ret);
> @@ -262,6 +301,12 @@ void qda_memory_manager_free(struct qda_memory_manager *mem_mgr, struct qda_gem_
>                 return;
>         }
> 
> +       if (gem_obj->is_imported) {
> +               drm_dbg_driver(gem_obj->base.dev,
> +                              "Freed imported buffer tracking (no DMA free needed)\n");
> +               return;
> +       }
> +
>         qda_dma_free(gem_obj);
>  }
> 
> diff --git a/drivers/accel/qda/qda_prime.c b/drivers/accel/qda/qda_prime.c
> new file mode 100644
> index 000000000000..acb0ac8c40fd
> --- /dev/null
> +++ b/drivers/accel/qda/qda_prime.c
> @@ -0,0 +1,184 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +// Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> +#include <drm/drm_gem.h>
> +#include <drm/drm_prime.h>
> +#include <drm/drm_print.h>
> +#include <linux/slab.h>
> +#include <linux/dma-mapping.h>
> +#include "qda_drv.h"
> +#include "qda_gem.h"
> +#include "qda_prime.h"
> +#include "qda_memory_manager.h"
> +
> +static struct drm_gem_object *check_own_buffer(struct drm_device *dev, struct dma_buf *dma_buf)
> +{
> +       struct drm_gem_object *existing_gem;
> +
> +       /* Only safe to access priv if this dma-buf was exported by this device */
> +       if (!drm_gem_is_prime_exported_dma_buf(dev, dma_buf))
> +               return NULL;
> +
> +       existing_gem = dma_buf->priv;
> +       if (existing_gem->dev != dev)
> +               return NULL;
> +
> +       if (to_qda_gem_obj(existing_gem)->is_imported)
> +               return NULL;
> +
> +       drm_gem_object_get(existing_gem);
> +       return existing_gem;
> +}
> +
> +static struct qda_iommu_device *get_iommu_device_for_import(struct qda_dev *qdev,
> +                                                           struct drm_file **file_priv_out)
> +{
> +       struct drm_file *file_priv;
> +       struct qda_file_priv *qda_file_priv;
> +       struct qda_iommu_device *iommu_dev = NULL;
> +       int ret;
> +
> +       file_priv = qdev->current_import_file_priv;
> +       *file_priv_out = file_priv;
> +
> +       if (!file_priv || !file_priv->driver_priv)
> +               return NULL;
> +
> +       qda_file_priv = (struct qda_file_priv *)file_priv->driver_priv;
> +       iommu_dev = qda_file_priv->assigned_iommu_dev;
> +
> +       if (!iommu_dev) {
> +               ret = qda_memory_manager_assign_device(qdev->iommu_mgr, file_priv);
> +               if (ret) {
> +                       drm_err(&qdev->drm_dev, "Failed to assign IOMMU device: %d\n", ret);
> +                       return NULL;
> +               }
> +
> +               iommu_dev = qda_file_priv->assigned_iommu_dev;
> +       }
> +
> +       return iommu_dev;
> +}
> +
> +static int setup_dma_buf_mapping(struct qda_gem_obj *qda_gem_obj, struct dma_buf *dma_buf,
> +                                struct device *attach_dev, struct qda_dev *qdev)
> +{
> +       struct dma_buf_attachment *attachment;
> +       struct sg_table *sgt;
> +       int ret;
> +
> +       attachment = dma_buf_attach(dma_buf, attach_dev);
> +       if (IS_ERR(attachment)) {
> +               ret = PTR_ERR(attachment);
> +               drm_err(&qdev->drm_dev, "Failed to attach dma_buf: %d\n", ret);
> +               return ret;
> +       }
> +       qda_gem_obj->attachment = attachment;
> +
> +       sgt = dma_buf_map_attachment_unlocked(attachment, DMA_BIDIRECTIONAL);
> +       if (IS_ERR(sgt)) {
> +               ret = PTR_ERR(sgt);
> +               drm_err(&qdev->drm_dev, "Failed to map dma_buf attachment: %d\n", ret);
> +               dma_buf_detach(dma_buf, attachment);
> +               return ret;
> +       }
> +       qda_gem_obj->sgt = sgt;
> +
> +       return 0;
> +}
> +
> +/**
> + * qda_gem_prime_import() - Import a DMA-BUF as a GEM object
> + * @dev: DRM device structure
> + * @dma_buf: DMA-BUF to import
> + *
> + * Return: Pointer to the imported GEM object on success, ERR_PTR on failure
> + */
> +struct drm_gem_object *qda_gem_prime_import(struct drm_device *dev, struct dma_buf *dma_buf)
> +{
> +       struct qda_dev *qdev = qda_dev_from_drm(dev);
> +       struct qda_gem_obj *qda_gem_obj;
> +       struct drm_file *file_priv;
> +       struct qda_iommu_device *iommu_dev;
> +       struct drm_gem_object *existing_gem;
> +       size_t aligned_size;
> +       int ret;
> +
> +       if (!qdev->iommu_mgr) {
> +               drm_err(dev, "Invalid iommu_mgr\n");
> +               return ERR_PTR(-ENODEV);
> +       }
> +
> +       existing_gem = check_own_buffer(dev, dma_buf);
> +       if (existing_gem)
> +               return existing_gem;
> +
> +       iommu_dev = get_iommu_device_for_import(qdev, &file_priv);
> +       if (!iommu_dev || !iommu_dev->dev) {
> +               drm_err(dev, "No IOMMU device assigned for prime import\n");
> +               return ERR_PTR(-ENODEV);
> +       }
> +
> +       drm_dbg_driver(dev, "Using IOMMU device %u for prime import\n", iommu_dev->id);
> +
> +       aligned_size = PAGE_ALIGN(dma_buf->size);
> +       qda_gem_obj = qda_gem_alloc_object(dev, aligned_size);
> +       if (IS_ERR(qda_gem_obj))
> +               return ERR_CAST(qda_gem_obj);
> +
> +       qda_gem_obj->is_imported = true;
> +       qda_gem_obj->dma_buf = dma_buf;
> +       qda_gem_obj->virt = NULL;
> +       qda_gem_obj->iommu_dev = iommu_dev;
> +
> +       get_dma_buf(dma_buf);
> +
> +       ret = setup_dma_buf_mapping(qda_gem_obj, dma_buf, iommu_dev->dev, qdev);
> +       if (ret)
> +               goto err_put_dma_buf;
> +
> +       ret = qda_memory_manager_alloc(qdev->iommu_mgr, qda_gem_obj, file_priv);
> +       if (ret) {
> +               drm_err(dev, "Failed to allocate IOMMU mapping: %d\n", ret);
> +               goto err_unmap;
> +       }
> +
> +       drm_dbg_driver(dev, "Prime import completed successfully size=%zu\n", aligned_size);
> +       return &qda_gem_obj->base;
> +
> +err_unmap:
> +       dma_buf_unmap_attachment_unlocked(qda_gem_obj->attachment,
> +                                         qda_gem_obj->sgt, DMA_BIDIRECTIONAL);
> +       dma_buf_detach(dma_buf, qda_gem_obj->attachment);
> +err_put_dma_buf:
> +       dma_buf_put(dma_buf);
> +       qda_gem_cleanup_object(qda_gem_obj);
> +       return ERR_PTR(ret);
> +}
> +
> +/**
> + * qda_prime_fd_to_handle() - Convert a PRIME fd to a GEM handle
> + * @dev: DRM device structure
> + * @file_priv: DRM file private data
> + * @prime_fd: File descriptor of the PRIME buffer
> + * @handle: Output GEM handle
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +int qda_prime_fd_to_handle(struct drm_device *dev, struct drm_file *file_priv,
> +                          int prime_fd, u32 *handle)
> +{
> +       struct qda_dev *qdev = qda_dev_from_drm(dev);
> +       int ret;
> +
> +       mutex_lock(&qdev->import_lock);
> +       qdev->current_import_file_priv = file_priv;
> +
> +       ret = drm_gem_prime_fd_to_handle(dev, file_priv, prime_fd, handle);
> +
> +       qdev->current_import_file_priv = NULL;
> +       mutex_unlock(&qdev->import_lock);
> +
> +       return ret;
> +}
> +
> +MODULE_IMPORT_NS("DMA_BUF");
> diff --git a/drivers/accel/qda/qda_prime.h b/drivers/accel/qda/qda_prime.h
> new file mode 100644
> index 000000000000..9b3850d54fa7
> --- /dev/null
> +++ b/drivers/accel/qda/qda_prime.h
> @@ -0,0 +1,18 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
> + */
> +
> +#ifndef __QDA_PRIME_H__
> +#define __QDA_PRIME_H__
> +
> +#include <drm/drm_device.h>
> +#include <drm/drm_file.h>
> +#include <drm/drm_gem.h>
> +#include <linux/dma-buf.h>
> +
> +struct drm_gem_object *qda_gem_prime_import(struct drm_device *dev, struct dma_buf *dma_buf);
> +int qda_prime_fd_to_handle(struct drm_device *dev, struct drm_file *file_priv,
> +                          int prime_fd, u32 *handle);
> +
> +#endif /* __QDA_PRIME_H__ */
> 
> --
> 2.34.1
> 
> 


^ permalink raw reply

* Re: [PATCH v2 1/3] dt-bindings: iio: dac: Add AD5529R
From: Janani Sunil @ 2026-05-19  6:55 UTC (permalink / raw)
  To: Jonathan Cameron, Janani Sunil
  Cc: Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc, rodrigo.alencar
In-Reply-To: <20260508134843.7646c4f5@jic23-huawei>


On 5/8/26 14:48, Jonathan Cameron wrote:
> On Fri, 8 May 2026 13:55:47 +0200
> Janani Sunil <janani.sunil@analog.com> wrote:
>
>> Devicetree bindings for AD5529R 16 channel 12/16 bit high voltage,
>> buffered voltage output digital-to-analog converter (DAC) with an
>> integrated precision reference.
>>
>> Signed-off-by: Janani Sunil <janani.sunil@analog.com>
>> ---
>>   .../devicetree/bindings/iio/dac/adi,ad5529r.yaml   | 96 ++++++++++++++++++++++
>>   MAINTAINERS                                        |  7 ++
>>   2 files changed, 103 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/iio/dac/adi,ad5529r.yaml b/Documentation/devicetree/bindings/iio/dac/adi,ad5529r.yaml
>> new file mode 100644
>> index 000000000000..f531b4865b01
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/iio/dac/adi,ad5529r.yaml
>> @@ -0,0 +1,96 @@
>> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
>> +%YAML 1.2
>> +---
>> +$id: http://devicetree.org/schemas/iio/dac/adi,ad5529r.yaml#
>> +$schema: http://devicetree.org/meta-schemas/core.yaml#
>> +
>> +title: Analog Devices AD5529R 16-Channel 12/16-bit High Voltage DAC
>> +  * Multiplexer for output voltage, load current sense and die temperature
>> +
>> +  Datasheet: https://www.analog.com/media/en/technical-documentation/data-sheets/ad5529r.pdf
>> +
>> +properties:
>> +  compatible:
>> +    const: adi,ad5529r
>> +
>> +  reg:
>> +    maxItems: 1
>> +
>> +  spi-max-frequency:
>> +    maximum: 50000000
>> +
>> +  reset-gpios:
>> +    maxItems: 1
>> +    description:
>> +      GPIO connected to the RESET pin. Active low. When asserted low,
>> +      performs a power-on reset and initializes the device to its default state.
>> +
>> +  vdd-supply:
>> +    description: Digital power supply (typically 3.3V)
>> +
>> +  avdd-supply:
>> +    description: Analog power supply (typically 5V)
>> +
>> +  hvdd-supply:
>> +    description: High voltage positive supply (up to 40V for output range)
>> +
>> +  hvss-supply:
>> +    description: High voltage negative supply (ground or negative voltage)
> I don't mind doing it this way but in some similar cases where 0 is something that
> can be considered the 'default' we've made the supply optional.  What was
> your reasoning for requiring it in this case?
>
> dt-bindings should be as complete as we can make them - with that in mind...
>
> There are some more interesting corners on this device the binding doesn't
> currently cover such as mux_out pin.  We'd normally do that by making the
> driver potentially a client of an ADC
>
> Easier though is !alarm which smells like an interrupt.
> !clear probably a gpio. TG0-3 also GPIOs.

You are right: for unipolar operation, HVSS can default to ground, so I will make HVSS optional.
I will also add bindings for alarm, clear, TG1, TG2, TG3 and mux out.

Best Regards,
Janani Sunil


^ permalink raw reply

* Re: [PATCH v2 3/3] Documentation: iio: Add AD5529R Documentation
From: Janani Sunil @ 2026-05-19  6:49 UTC (permalink / raw)
  To: Jonathan Cameron, Janani Sunil
  Cc: Lars-Peter Clausen, Michael Hennerich, David Lechner,
	Nuno Sá, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	linux-iio, devicetree, linux-kernel, linux-doc
In-Reply-To: <20260508140029.35ff63b0@jic23-huawei>


On 5/8/26 15:00, Jonathan Cameron wrote:
> On Fri, 8 May 2026 13:55:49 +0200
> Janani Sunil <janani.sunil@analog.com> wrote:
>
>> Add documentation for AD5529R high voltage, 16-channel 12/16 bit DAC
> Whilst it is good to have documentation for devices - I've made some
> comments below on not providing documentation of standard things (too much
> duplication) and being careful to work out who the document is for.
> These tend to be for users and board integrators etc so we don't tend
> to have much about the internals of the driver.  For that see driver!
>
> Jonathan

After reviewing your feedback about removing standard IIO content and
driver internals, the remaining user-relevant content becomes too minimal
to justify a separate documentation file. I'll drop this patch from the next version.

Best Regards,
Janani Sunil


^ permalink raw reply

* htmldocs: Warning: drivers/scsi/mpt3sas/Kconfig references a file that doesn't exist: Documentation/hwmon/mpt3sas.rst
From: kernel test robot @ 2026-05-19  6:28 UTC (permalink / raw)
  To: Louis Sautier; +Cc: oe-kbuild-all, 0day robot, linux-doc

tree:   https://github.com/intel-lab-lkp/linux/commits/Louis-Sautier/scsi-mpt3sas-add-IO-Unit-Page-7-config-accessor/20260519-030206
head:   82f70fb1a3a62df368d90847eec6afb9adbf9d2e
commit: 82f70fb1a3a62df368d90847eec6afb9adbf9d2e scsi: mpt3sas: add hwmon support
date:   11 hours ago
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
docutils: docutils (Docutils 0.21.2, Python 3.13.5, on linux)
reproduce: (https://download.01.org/0day-ci/archive/20260519/202605190857.DUejkpQ7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605190857.DUejkpQ7-lkp@intel.com/

All warnings (new ones prefixed by >>):

   Warning: Documentation/translations/zh_CN/scsi/scsi_mid_low_api.rst references a file that doesn't exist: Documentation/Configure.help
   Warning: MAINTAINERS references a file that doesn't exist: Documentation/ABI/testing/sysfs-platform-ayaneo
   Warning: MAINTAINERS references a file that doesn't exist: Documentation/devicetree/bindings/display/bridge/megachips-stdpxxxx-ge-b850v3-fw.txt
   Warning: arch/powerpc/sysdev/mpic.c references a file that doesn't exist: Documentation/devicetree/bindings/powerpc/fsl/mpic.txt
   Warning: drivers/net/ethernet/smsc/Kconfig references a file that doesn't exist: file:Documentation/networking/device_drivers/ethernet/smsc/smc9.rst
>> Warning: drivers/scsi/mpt3sas/Kconfig references a file that doesn't exist: Documentation/hwmon/mpt3sas.rst
   Warning: rust/kernel/sync/atomic/ordering.rs references a file that doesn't exist: srctree/tools/memory-model/Documentation/explanation.txt
   Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: Documentation/virtual/lguest/lguest.c
   Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: m,\b(\S*)(Documentation/[A-Za-z0-9
   Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: Documentation/devicetree/dt-object-internal.txt
   Warning: tools/docs/documentation-file-ref-check references a file that doesn't exist: m,^Documentation/scheduler/sched-pelt

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply

* Re: [RFC PATCH 0/7] mm/damon: hardware-sampled access reports + AMD IBS Op example
From: SeongJae Park @ 2026-05-19  6:19 UTC (permalink / raw)
  To: Ravi Jonnalagadda
  Cc: SeongJae Park, damon, linux-mm, linux-kernel, linux-doc, akpm,
	corbet, bijan311, ajayjoshi, honggyu.kim, yunjeong.mun, bharata,
	Akinobu Mita
In-Reply-To: <20260516223439.4033-1-ravis.opensrc@gmail.com>

+ Akinobu

Hello Ravi,

On Sat, 16 May 2026 15:34:25 -0700 Ravi Jonnalagadda <ravis.opensrc@gmail.com> wrote:

> Hi all,
> 
> This is an RFC, not for merge.  The series exercises and validates
> damon_report_access() -- the consumer API SeongJae introduced in [1]
> -- as a substrate for ingesting access reports from hardware-sampling
> sources.  The series includes one worked-example backend, an AMD IBS
> Op module (damon_ibs.ko), that runs on Zen 3+ silicon via the
> existing perf event subsystem.

Thank you for sharing this great RFC series!

[...]
> Why a hardware-source primitive complements existing primitives
> ===============================================================
[...]
> Both primitives produce a view of hotness that converges to the
> true distribution over the aggregation interval.  For systems where
> the address space is small relative to the aggregation rate, this is
> the right tool.  On large heterogeneous-memory systems with goal-
> driven schemes asking the closed-loop tuner to converge on a target
> distribution, a complementary lower-latency view of accesses can
> tighten the loop -- reducing the time DAMON's nr_accesses takes to
> reflect the workload's actual access distribution, which in turn
> reduces ramp duration and oscillation amplitude during convergence
> of goal-driven schemes.
> 
> A hardware-sampling primitive provides this complementary view:
> hardware retirement records each access at its natural event rate,
> with a physical address per sample, independent of TLB state and
> independent of the unmap/fault path.

Yes, I fully agree.  Different access check primitives have different
characteristics.

[...]

> Demonstration
> =============
[...]
> In both regimes, convergence to target is quick, and the workload's
> measured DRAM share then holds within 1.3 percentage points of
> target with standard deviation under 1.3 percentage points, sustained
> over runs of 15-30 minutes per target.

I understand this demonstration shows your AMD IBS-based version of DAMON is
functioning as expected.  Thank you for sharing this!

[...]
> What's in this series
> =====================
> 
>   Patch 1.  mm/damon/core: refcount ops owner module to prevent
>             rmmod UAF
>   Patch 2.  mm/damon/paddr: export damon_pa_* ops for IBS module
>   Patch 3.  mm/damon/core: replace mutex-protected report buffer
>             with per-CPU lockless ring
>   Patch 4.  mm/damon/core: flat-array snapshot + bsearch in ring-
>             drain loop
>   Patch 5.  mm/damon: add sysfs binding and dispatch hookup for
>             paddr_ibs operations
>   Patch 6.  mm/damon/core: accept paddr_ibs in node_eligible_mem_bp
>             ops check
>   Patch 7.  mm/damon/damon_ibs: add AMD IBS-based access sampling
>             backend
> 
> Patches 1, 3, and 4 are general infrastructure that benefits any
> consumer of damon_report_access().  Patches 2, 5, 6, and 7 are the
> worked-example backend (paddr_ibs ops, sysfs binding, IBS module).

I haven't read the detailed code of each patch, but my high-level understanding
is as below.

Patches 1 and 2 are needed for supporting loadable module-based DAMON operation
sets (access sampling backend).

Patch 3 is needed for supporting access check primitives that can provide
access information only in NMI context.  It can also speed up access reporting
in general, though.
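
(Illustration only, not code from patch 3.)  The reason a lockless ring suits
NMI context is that a mutex cannot be taken there, while a
single-producer/single-consumer ring needs only atomic loads and stores.  Below
is a minimal userspace sketch in C11, assuming one ring per CPU with the NMI
handler as the only producer and one drain thread as the only consumer.  All
names (report_ring, ring_push, ring_pop, RING_SIZE, access_report) are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical capacity; must be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be 2^n");

struct access_report {		/* hypothetical report payload */
	uint64_t paddr;
};

struct report_ring {
	_Atomic uint32_t head;	/* written only by the producer */
	_Atomic uint32_t tail;	/* written only by the consumer */
	struct access_report slots[RING_SIZE];
};

/* Producer side: never blocks; drops the report when the ring is full. */
static bool ring_push(struct report_ring *r, uint64_t paddr)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;	/* full */
	r->slots[head & (RING_SIZE - 1)].paddr = paddr;
	/* Publish the slot only after its contents are written. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false when no report is pending. */
static bool ring_pop(struct report_ring *r, uint64_t *paddr)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail == head)
		return false;	/* empty */
	*paddr = r->slots[tail & (RING_SIZE - 1)].paddr;
	/* Hand the slot back to the producer. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

Dropping on overflow instead of waiting is a deliberate choice in this sketch:
the producer must never spin or sleep, and a sampled access source can
tolerate lost samples.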

Patch 4 makes DAMON's internal reported access information retrieval faster, so
will help any reporting-based DAMON operation set use case.

Patches 5-7 are required only for the IBS-based DAMON operations set
(paddr_ibs).

So I agree patch 4 is a general infrastructure improvement that benefits
multiple use cases.

Patch 3 is also arguably a general infrastructure improvement, as it will make
reporting faster in general.

Patch 1 is not technically coupled with paddr_ibs, and will be needed for
general loadable-module-based access check primitives.  But should we support
loadable modules?  If so, why?

Patch 2 is also not technically coupled with paddr_ibs, to my understanding, so
should it be categorized together with patch 1?  In other words, if we agree we
should support loadable-module-based DAMON operation sets, this should be
useful not only for paddr_ibs but also for more general cases.

Correct me if I'm wrong.

> 
> 
> Patches worth folding into damon/next
> =====================================
> 
> Patches 1, 3, and 4 are not specific to IBS or to this RFC's
> backend.  Each is preparatory infrastructure that any consumer of
> damon_report_access() will need:
> 
>   - Patch 1 (refcount ops owner) -- any modular ops set, including
>     out-of-tree backends, needs clean module unload to avoid UAF
>     on damon_unregister_ops.
>   - Patch 3 (per-CPU lockless ring) -- damon_report_access() cannot
>     be called from NMI context with the current mutex-protected
>     buffer.  Hardware samplers all need NMI-safe submission.
>   - Patch 4 (flat-array snapshot + bsearch drain) -- the linear-
>     scan drain is O(reports x regions) and exceeds the sample
>     interval at high-CPU x large-region products.  Bsearch brings
>     it to O(reports x log regions).
> 
> If these belong directly on damon/next as preparatory patches for
> damon_report_access() rather than living inside an IBS-specific
> track, we are happy to rebase and resend them that way.

So I'm a bit unsure about patch 1.  If we don't have a plan to support
loadable-module-based DAMON operation sets, we might not need it for now.

For patches 3 and 4, I agree those will be useful in general.  Nonetheless, I'd
slightly prefer to do those optimizations in a later part of the long-term
project.
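
(Illustration only, not code from patch 4.)  The complexity claim quoted above,
O(reports x log regions) instead of O(reports x regions), follows from looking
up each reported address with a binary search over a sorted, non-overlapping
array of regions.  A minimal userspace sketch, assuming such a flat snapshot
exists, could look like the following.  The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct region_snap {		/* hypothetical flat snapshot entry */
	uint64_t start;		/* inclusive */
	uint64_t end;		/* exclusive */
	unsigned int nr_reports;
};

/* bsearch() comparator: the key is an address, the element is a region. */
static int addr_in_region_cmp(const void *key, const void *elem)
{
	uint64_t addr = *(const uint64_t *)key;
	const struct region_snap *r = elem;

	if (addr < r->start)
		return -1;
	if (addr >= r->end)
		return 1;
	return 0;
}

/* Returns the region containing addr, or NULL if addr falls in a gap. */
static struct region_snap *find_region(struct region_snap *snap, size_t nr,
				       uint64_t addr)
{
	return bsearch(&addr, snap, nr, sizeof(*snap), addr_in_region_cmp);
}

/*
 * Accounts every reported address to its region: O(log nr) per report.
 * Returns how many reports matched a region.
 */
static size_t drain_reports(struct region_snap *snap, size_t nr,
			    const uint64_t *addrs, size_t nr_addrs)
{
	size_t i, hits = 0;

	assert(snap != NULL || nr == 0);
	for (i = 0; i < nr_addrs; i++) {
		struct region_snap *r = find_region(snap, nr, addrs[i]);

		if (r) {
			r->nr_reports++;
			hits++;
		}
	}
	return hits;
}
```

The sketch relies on the snapshot being sorted by start address with no
overlaps; without that invariant the comparator would not define a consistent
ordering for bsearch().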

> 
> 
> Relation to prior and ongoing work
> ==================================
> 
> The IBS sampling pattern in patch 7 -- attr.config=0 to use IBS Op
> default config, dc_phy_addr_valid filter, NMI-safe sample submission
> -- is derived from concepts in Bharata B Rao's pghot RFC v5 [3].
> The attribution header is in mm/damon/damon_ibs.c and the patch
> carries a Suggested-by: trailer.
> 
> Bharata's pghot v7 [4] introduces a different IBS driver targeting
> the new IBS Memory Profiler (IBS-MProf) facility, which Bharata
> describes as a facility "that will be present in future AMD
> processors" -- a separate IBS instance from the one this RFC's
> backend uses.  This version of the driver, based on v5 [3], is an
> example of how DAMON can benefit from an AMD IBS hardware source,
> and independently validates the importance of IBS information.
> It is not meant to be merged in its current form.
> @Bharata, if you see a path where IBS samples can be consumed
> by DAMON at some point, we will be happy to collaborate.
>  
> Akinobu Mita's perf-event-based access-check RFC [5] explores a
> configurable perf-event-driven access source for DAMON.  IBS has
> vendor-specific MSR setup beyond what perf_event_attr alone
> expresses (e.g. dc_phy_addr_valid filtering on the produced sample,
> not on the perf attr), so the IBS path here appears complementary
> to [5] -- operators choose based on whether their hardware sampler
> fits stock perf or needs additional kernel-side setup.

So apparently there are multiple approaches to developing and using h/w-based
access monitoring.  Akinobu and you are trying to do that using DAMON as the
frontend, and have already made working prototypes.  Other people have also
shown interest in and willingness to contribute to this project.  I 100% agree
h/w-based access monitoring can be useful, and I of course think using DAMON as
the frontend is the right approach.  I'm all for getting this upstreamed.

I was therefore spending time thinking about the long-term maintainable shape
in which this capability can successfully be upstreamed.  I suggested
damon_report_access() as the internal interface between DAMON and the h/w-based
access check primitives, and apparently we all (I, Ravi and Akinobu in this
context) agreed.  Akinobu thankfully revised his implementation based on the
damon_report_access() interface.  Ravi also implemented this RFC based on the
interface.

After reaching consensus with Akinobu, I spent time on the user space
interface.  When I was discussing with Akinobu, my idea was to extend the user
interface for the page faults based monitoring v3 [1].  But recently I decided
to make this more general, so I proposed the data attributes monitoring
extension [2] at LSFMMBPF.  The patch series for the initial change [3] was
merged into mm-new for more testing today.  The cover letter of the patch
series also shares how it will be extended for h/w-based access monitoring in
the long term.

I of course want us to go in this direction.  I believe you have already had a
chance to look at the long-term plan, and didn't speak up because you don't
strongly disagree with it.  If that is not the case, please speak up.

Assuming you have no concerns about the long-term plan yet, I will take time to
write down a more formal and detailed plan.  It will explain the overall
roadmap, timeline and how we could collaborate.  We could discuss further on
top of that.

> 
> 
> Specific asks
> =============
> 
> To SeongJae:
> 
>   1. Patches 1, 3, and 4 are infrastructure that benefits any consumer
>      of damon_report_access(), not just the IBS backend in this RFC.
>      Would these belong directly on damon/next as preparatory patches
>      for damon_report_access(), rather than living inside an
>      IBS-specific track?  Happy to rebase and resend them that way if
>      you'd prefer that shape.  Tested-by: tags can come along.

I'm still thinking about how we can collaborate well.  The answer to the above
question would be a part of that.  In other words, I have no good answer right
now, sorry.  Could you please give me more time to think and share the
plan?  I will share the plan in a separate mail, and we could discuss further
on that thread.  Of course, we could also have additional discussions, such as
DAMON beer/coffee/tea chats [4], before/after/during the plan discussion.

So, long story short, we agreed this project (h/w-based data access monitoring)
should be upstreamed.  But please give me a little more time to think about how
we will do it and collaborate.  It will take some time, so please bear with me.
Sorry for making you wait, but I'm pretty sure, and promise, that we will
eventually make it.

[1] https://lore.kernel.org/20251208062943.68824-1-sj@kernel.org
[2] https://lwn.net/Articles/1071256/
[3] https://lore.kernel.org/20260518234119.97569-1-sj@kernel.org
[4] https://docs.google.com/document/d/1v43Kcj3ly4CYqmAkMaZzLiM2GEnWfgdGbZAH3mi2vpM/edit?usp=sharing

Thanks,
SJ

[...]

* [PATCH 15/15] accel/qda: Add remote memory unmap from DSP address space
From: Ekansh Gupta via B4 Relay @ 2026-05-19  6:16 UTC (permalink / raw)
  To: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
	Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
	Christian König
  Cc: Bharath Kumar, Chenna Kesava Raju, srini, dmitry.baryshkov,
	andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
	linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig,
	Ekansh Gupta
In-Reply-To: <20260519-qda-series-v1-0-b2d984c297f8@oss.qualcomm.com>

From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>

Implement DRM_IOCTL_QDA_REMOTE_MUNMAP (command 0x06), which unmaps
a previously mapped memory region from the DSP's virtual address space.
Two unmap modes mirror the two map modes:

QDA_MUNMAP_REQUEST_LEGACY (FASTRPC_RMID_INIT_MUNMAP)
  Legacy single-argument unmap: sends a fastrpc_munmap_req_msg
  containing the session ID, the DSP virtual address (vaddrout from
  the original map response), and the region size.

QDA_MUNMAP_REQUEST_ATTR (FASTRPC_RMID_INIT_MEM_UNMAP)
  Attribute-based unmap: sends a fastrpc_mem_unmap_req_msg which
  additionally carries the original DMA-BUF fd and virtual address,
  matching the fd-based MEM_MAP path.

DRM_QDA_REMOTE_MUNMAP is assigned command number 0x06, filling the
slot that was previously reserved for this purpose.

Assisted-by: Claude:claude-4-6-sonnet
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
---
 drivers/accel/qda/qda_drv.c     |  1 +
 drivers/accel/qda/qda_fastrpc.c | 84 +++++++++++++++++++++++++++++++++++++++++
 drivers/accel/qda/qda_fastrpc.h | 34 +++++++++++++++++
 drivers/accel/qda/qda_ioctl.c   | 28 ++++++++++++++
 drivers/accel/qda/qda_ioctl.h   |  1 +
 include/uapi/drm/qda_accel.h    | 36 +++++++++++++++++-
 6 files changed, 183 insertions(+), 1 deletion(-)
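
Illustration only, not part of the patch: a minimal userspace sketch of how
the two request types described above would populate struct drm_qda_mem_unmap.
The struct and the QDA_MUNMAP_REQUEST_* values are mirrored locally from the
uapi hunk in this patch (assuming the final header keeps that layout); the
helper names qda_unmap_legacy() and qda_unmap_attr() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the uapi definitions added by this patch. */
#define QDA_MUNMAP_REQUEST_LEGACY	1
#define QDA_MUNMAP_REQUEST_ATTR		2

struct drm_qda_mem_unmap {
	uint32_t request;
	int32_t  fd;
	uint64_t vaddr;
	uint64_t vaddrout;
	uint64_t size;
};

/* 4 + 4 + 8 + 8 + 8 bytes: the layout has no implicit padding. */
static_assert(sizeof(struct drm_qda_mem_unmap) == 32, "unexpected padding");

/* Hypothetical helper: a LEGACY request uses vaddrout and size only. */
static struct drm_qda_mem_unmap qda_unmap_legacy(uint64_t dsp_vaddr,
						 uint64_t size)
{
	struct drm_qda_mem_unmap req;

	memset(&req, 0, sizeof(req));	/* unused fields stay zero */
	req.request = QDA_MUNMAP_REQUEST_LEGACY;
	req.vaddrout = dsp_vaddr;	/* vaddrout from the map response */
	req.size = size;
	return req;
}

/* Hypothetical helper: an ATTR request uses fd, vaddr and size. */
static struct drm_qda_mem_unmap qda_unmap_attr(int32_t dmabuf_fd,
					       uint64_t vaddr, uint64_t size)
{
	struct drm_qda_mem_unmap req;

	memset(&req, 0, sizeof(req));	/* unused fields stay zero */
	req.request = QDA_MUNMAP_REQUEST_ATTR;
	req.fd = dmabuf_fd;
	req.vaddr = vaddr;
	req.size = size;
	return req;
}
```

A caller would then pass the filled struct to
ioctl(drm_fd, DRM_IOCTL_QDA_REMOTE_MUNMAP, &req).  That call is left out of the
sketch because it needs the device node and the real uapi header.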

diff --git a/drivers/accel/qda/qda_drv.c b/drivers/accel/qda/qda_drv.c
index 3640e4a41605..41cc207447b4 100644
--- a/drivers/accel/qda/qda_drv.c
+++ b/drivers/accel/qda/qda_drv.c
@@ -68,6 +68,7 @@ static const struct drm_ioctl_desc qda_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(QDA_GEM_MMAP_OFFSET, qda_ioctl_gem_mmap_offset, 0),
 	DRM_IOCTL_DEF_DRV(QDA_REMOTE_SESSION_CREATE, qda_ioctl_init_create, 0),
 	DRM_IOCTL_DEF_DRV(QDA_REMOTE_MAP, qda_ioctl_mmap, 0),
+	DRM_IOCTL_DEF_DRV(QDA_REMOTE_MUNMAP, qda_ioctl_munmap, 0),
 	DRM_IOCTL_DEF_DRV(QDA_REMOTE_INVOKE, qda_ioctl_invoke, 0),
 };
 
diff --git a/drivers/accel/qda/qda_fastrpc.c b/drivers/accel/qda/qda_fastrpc.c
index cab3a560ceb5..0513beede428 100644
--- a/drivers/accel/qda/qda_fastrpc.c
+++ b/drivers/accel/qda/qda_fastrpc.c
@@ -887,6 +887,84 @@ static int fastrpc_prepare_args_mem_map_attr(struct fastrpc_invoke_context *ctx,
 	return err;
 }
 
+static int fastrpc_prepare_args_munmap(struct fastrpc_invoke_context *ctx, char __user *argp)
+{
+	struct drm_qda_fastrpc_invoke_args *args;
+	struct fastrpc_munmap_req_msg *req_msg;
+	struct drm_qda_mem_unmap uargs;
+	void *req;
+	int err;
+
+	memcpy(&uargs, argp, sizeof(uargs));
+
+	args = kzalloc_obj(*args);
+	if (!args)
+		return -ENOMEM;
+
+	req = kzalloc_obj(*req_msg);
+	if (!req) {
+		err = -ENOMEM;
+		goto err_free_args;
+	}
+	req_msg = (struct fastrpc_munmap_req_msg *)req;
+
+	req_msg->remote_session_id = ctx->remote_session_id;
+	req_msg->size  = uargs.size;
+	req_msg->vaddr = uargs.vaddrout;
+
+	setup_single_arg(args, req_msg, sizeof(*req_msg));
+	ctx->sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MUNMAP, 1, 0);
+	ctx->args = args;
+	ctx->req = req;
+	ctx->handle = FASTRPC_INIT_HANDLE;
+
+	return 0;
+
+err_free_args:
+	kfree(args);
+	return err;
+}
+
+static int fastrpc_prepare_args_mem_unmap_attr(struct fastrpc_invoke_context *ctx,
+					       char __user *argp)
+{
+	struct drm_qda_fastrpc_invoke_args *args;
+	struct fastrpc_mem_unmap_req_msg *req_msg;
+	struct drm_qda_mem_unmap uargs;
+	void *req;
+	int err;
+
+	memcpy(&uargs, argp, sizeof(uargs));
+
+	args = kzalloc_obj(*args);
+	if (!args)
+		return -ENOMEM;
+
+	req = kzalloc_obj(*req_msg);
+	if (!req) {
+		err = -ENOMEM;
+		goto err_free_args;
+	}
+	req_msg = (struct fastrpc_mem_unmap_req_msg *)req;
+
+	req_msg->remote_session_id = ctx->remote_session_id;
+	req_msg->fd      = uargs.fd;		/* DMA-BUF fd forwarded to DSP */
+	req_msg->vaddrin = uargs.vaddr;
+	req_msg->len     = uargs.size;
+
+	setup_single_arg(args, req_msg, sizeof(*req_msg));
+	ctx->sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MEM_UNMAP, 1, 0);
+	ctx->args = args;
+	ctx->req = req;
+	ctx->handle = FASTRPC_INIT_HANDLE;
+
+	return 0;
+
+err_free_args:
+	kfree(args);
+	return err;
+}
+
 static int fastrpc_prepare_args_invoke(struct fastrpc_invoke_context *ctx, char __user *argp)
 {
 	struct drm_qda_invoke_args invoke_args;
@@ -945,6 +1023,12 @@ int qda_fastrpc_prepare_args(struct fastrpc_invoke_context *ctx, char __user *ar
 	case FASTRPC_RMID_INIT_MEM_MAP:
 		err = fastrpc_prepare_args_mem_map_attr(ctx, argp);
 		break;
+	case FASTRPC_RMID_INIT_MUNMAP:
+		err = fastrpc_prepare_args_munmap(ctx, argp);
+		break;
+	case FASTRPC_RMID_INIT_MEM_UNMAP:
+		err = fastrpc_prepare_args_mem_unmap_attr(ctx, argp);
+		break;
 	case FASTRPC_RMID_INVOKE_DYNAMIC:
 		err = fastrpc_prepare_args_invoke(ctx, argp);
 		break;
diff --git a/drivers/accel/qda/qda_fastrpc.h b/drivers/accel/qda/qda_fastrpc.h
index 71812eaf9a54..030e9b954f7a 100644
--- a/drivers/accel/qda/qda_fastrpc.h
+++ b/drivers/accel/qda/qda_fastrpc.h
@@ -275,9 +275,11 @@ struct fastrpc_invoke_context {
 /* Remote Method ID table - identifies initialization and control operations */
 #define FASTRPC_RMID_INIT_RELEASE	1	/* Release DSP process */
 #define FASTRPC_RMID_INIT_MMAP		4	/* Map memory region to DSP */
+#define FASTRPC_RMID_INIT_MUNMAP	5	/* Unmap DSP memory region */
 #define FASTRPC_RMID_INIT_CREATE	6	/* Create DSP process */
 #define FASTRPC_RMID_INIT_CREATE_ATTR	7	/* Create DSP process with attributes */
 #define FASTRPC_RMID_INIT_MEM_MAP	10	/* Map DMA buffer with attributes to DSP */
+#define FASTRPC_RMID_INIT_MEM_UNMAP	11	/* Unmap DMA buffer from DSP */
 #define FASTRPC_RMID_INVOKE_DYNAMIC	0xFFFFFFFF	/* Dynamic method invocation */
 
 /* Common handle for initialization operations */
@@ -345,6 +347,38 @@ struct fastrpc_map_rsp_msg {
 	u64 vaddrout;
 };
 
+/**
+ * struct fastrpc_mem_unmap_req_msg - Memory unmap request message with attributes
+ *
+ * This message structure is sent to the DSP to request unmapping
+ * of a previously mapped memory region (ATTR request).
+ */
+struct fastrpc_mem_unmap_req_msg {
+	/** @remote_session_id: Client identifier for the session */
+	s32 remote_session_id;
+	/** @fd: DMA-BUF file descriptor of the buffer to unmap */
+	s32 fd;
+	/** @vaddrin: DSP virtual address of the mapped region to unmap */
+	u64 vaddrin;
+	/** @len: Size of the region to unmap in bytes */
+	u64 len;
+};
+
+/**
+ * struct fastrpc_munmap_req_msg - Legacy memory unmap request message
+ *
+ * This message structure is sent to the DSP to request unmapping
+ * of a previously mapped memory region.
+ */
+struct fastrpc_munmap_req_msg {
+	/** @remote_session_id: Client identifier for the session */
+	s32 remote_session_id;
+	/** @vaddr: DSP virtual address of the mapped region to unmap */
+	u64 vaddr;
+	/** @size: Size of the region to unmap in bytes */
+	u64 size;
+};
+
 void qda_fastrpc_context_free(struct kref *ref);
 struct fastrpc_invoke_context *qda_fastrpc_context_alloc(void);
 int qda_fastrpc_prepare_args(struct fastrpc_invoke_context *ctx, char __user *argp);
diff --git a/drivers/accel/qda/qda_ioctl.c b/drivers/accel/qda/qda_ioctl.c
index 283eb7535c45..aeba6190182e 100644
--- a/drivers/accel/qda/qda_ioctl.c
+++ b/drivers/accel/qda/qda_ioctl.c
@@ -254,6 +254,34 @@ int qda_ioctl_mmap(struct drm_device *dev, void *data, struct drm_file *file_pri
 	}
 }
 
+/**
+ * qda_ioctl_munmap() - Unmap memory from DSP address space
+ * @dev: DRM device structure
+ * @data: User-space data (struct drm_qda_mem_unmap)
+ * @file_priv: DRM file private data
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_ioctl_munmap(struct drm_device *dev, void *data, struct drm_file *file_priv)
+{
+	struct drm_qda_mem_unmap *unmap_req;
+
+	if (!data)
+		return -EINVAL;
+
+	unmap_req = (struct drm_qda_mem_unmap *)data;
+
+	switch (unmap_req->request) {
+	case QDA_MUNMAP_REQUEST_LEGACY:
+		return fastrpc_invoke(FASTRPC_RMID_INIT_MUNMAP, dev, data, file_priv);
+	case QDA_MUNMAP_REQUEST_ATTR:
+		return fastrpc_invoke(FASTRPC_RMID_INIT_MEM_UNMAP, dev, data, file_priv);
+	default:
+		drm_err(dev, "Invalid munmap request type: %u\n", unmap_req->request);
+		return -EINVAL;
+	}
+}
+
 /**
  * qda_ioctl_invoke() - Perform a dynamic FastRPC method invocation
  * @dev: DRM device structure
diff --git a/drivers/accel/qda/qda_ioctl.h b/drivers/accel/qda/qda_ioctl.h
index 457ceccede08..e14a39050d09 100644
--- a/drivers/accel/qda/qda_ioctl.h
+++ b/drivers/accel/qda/qda_ioctl.h
@@ -14,5 +14,6 @@ int qda_ioctl_gem_create(struct drm_device *dev, void *data, struct drm_file *fi
 int qda_ioctl_gem_mmap_offset(struct drm_device *dev, void *data, struct drm_file *file_priv);
 int qda_ioctl_invoke(struct drm_device *dev, void *data, struct drm_file *file_priv);
 int qda_ioctl_mmap(struct drm_device *dev, void *data, struct drm_file *file_priv);
+int qda_ioctl_munmap(struct drm_device *dev, void *data, struct drm_file *file_priv);
 
 #endif /* __QDA_IOCTL_H__ */
diff --git a/include/uapi/drm/qda_accel.h b/include/uapi/drm/qda_accel.h
index 173f59abd361..e3b5c9a963bf 100644
--- a/include/uapi/drm/qda_accel.h
+++ b/include/uapi/drm/qda_accel.h
@@ -21,9 +21,10 @@ extern "C" {
 #define DRM_QDA_QUERY		0x00
 #define DRM_QDA_GEM_CREATE		0x01
 #define DRM_QDA_GEM_MMAP_OFFSET	0x02
-/* Command number 0x03 reserved for INIT_ATTACH; 0x06 reserved for MUNMAP */
+/* Command number 0x03 reserved for INIT_ATTACH */
 #define DRM_QDA_REMOTE_SESSION_CREATE		0x04
 #define DRM_QDA_REMOTE_MAP			0x05
+#define DRM_QDA_REMOTE_MUNMAP			0x06
 #define DRM_QDA_REMOTE_INVOKE			0x07
 
 /*
@@ -44,6 +45,8 @@ extern "C" {
 		 struct drm_qda_init_create)
 #define DRM_IOCTL_QDA_REMOTE_MAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_REMOTE_MAP, \
 					  struct drm_qda_mem_map)
+#define DRM_IOCTL_QDA_REMOTE_MUNMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_REMOTE_MUNMAP, \
+					  struct drm_qda_mem_unmap)
 #define DRM_IOCTL_QDA_REMOTE_INVOKE	DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_REMOTE_INVOKE, \
 					  struct drm_qda_invoke_args)
 
@@ -51,6 +54,10 @@ extern "C" {
 #define QDA_MAP_REQUEST_LEGACY    1  /* Legacy MMAP operation */
 #define QDA_MAP_REQUEST_ATTR      2  /* Handle-based MEM_MAP operation with attributes */
 
+/* Request type definitions for qda_mem_unmap */
+#define QDA_MUNMAP_REQUEST_LEGACY    1  /* Legacy MUNMAP operation */
+#define QDA_MUNMAP_REQUEST_ATTR      2  /* Handle-based MEM_UNMAP operation */
+
 /**
  * struct drm_qda_query - Device information query structure
  * @dsp_name: Name of DSP (e.g., "adsp", "cdsp", "cdsp1", "gdsp0", "gdsp1")
@@ -188,6 +195,33 @@ struct drm_qda_mem_map {
 	__u64 vaddrout;
 };
 
+/**
+ * struct drm_qda_mem_unmap - Memory unmapping request structure
+ * @request: Request type (QDA_MUNMAP_REQUEST_LEGACY or QDA_MUNMAP_REQUEST_ATTR)
+ * @fd: DMA-BUF file descriptor (used for ATTR request)
+ * @vaddr: Virtual address (used for ATTR request)
+ * @vaddrout: DSP virtual address (used for LEGACY request)
+ * @size: Size of the memory region to unmap in bytes
+ *
+ * This structure is used to request unmapping of a previously mapped
+ * memory region from the DSP's virtual address space.
+ *
+ * For QDA_MUNMAP_REQUEST_LEGACY (value 1):
+ *   - Uses fields: vaddrout, size
+ *   - Legacy MUNMAP operation for backward compatibility
+ *
+ * For QDA_MUNMAP_REQUEST_ATTR (value 2):
+ *   - Uses fields: fd, vaddr, size
+ *   - Handle-based MEM_UNMAP operation
+ */
+struct drm_qda_mem_unmap {
+	__u32 request;
+	__s32 fd;
+	__u64 vaddr;
+	__u64 vaddrout;
+	__u64 size;
+};
+
 #if defined(__cplusplus)
 }
 #endif

-- 
2.34.1



* [PATCH 14/15] accel/qda: Add remote memory mapping to DSP address space
From: Ekansh Gupta via B4 Relay @ 2026-05-19  6:16 UTC (permalink / raw)
  To: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
	Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
	Christian König
  Cc: Bharath Kumar, Chenna Kesava Raju, srini, dmitry.baryshkov,
	andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
	linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig,
	Ekansh Gupta
In-Reply-To: <20260519-qda-series-v1-0-b2d984c297f8@oss.qualcomm.com>

From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>

Implement DRM_IOCTL_QDA_REMOTE_MAP, which maps a DMA buffer into the
DSP's virtual address space and returns the DSP virtual address to
user-space. Two mapping modes are supported:

QDA_MAP_REQUEST_LEGACY (FASTRPC_RMID_INIT_MMAP)
  Legacy three-argument mapping: sends a fastrpc_map_req_msg to the
  DSP containing the session ID, mapping flags, and virtual address
  hint, together with the physical page descriptor resolved from the
  DMA-BUF fd. The DSP returns the assigned virtual address in
  fastrpc_map_rsp_msg.vaddrout.

QDA_MAP_REQUEST_ATTR (FASTRPC_RMID_INIT_MEM_MAP)
  Attribute-based four-argument mapping: sends a
  fastrpc_mem_map_req_msg which additionally carries the DMA-BUF fd,
  byte offset, and SMMU attribute flags. The DSP uses these to apply
  custom cache and permission attributes to the mapping.

In both cases qda_fastrpc_return_result() writes the DSP virtual
address back into the drm_qda_mem_map.vaddrout field so the DRM
framework copies it to user-space on IOCTL return.

The DMA-BUF fd is resolved to a fastrpc_phy_page descriptor via
setup_mmap_pages(), which imports the fd as a GEM object to obtain
the IOMMU-mapped dma_addr and then releases the extra reference.

Assisted-by: Claude:claude-4-6-sonnet
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
---
 drivers/accel/qda/qda_drv.c     |   1 +
 drivers/accel/qda/qda_fastrpc.c | 237 ++++++++++++++++++++++++++++++++++++++++
 drivers/accel/qda/qda_fastrpc.h |  56 ++++++++++
 drivers/accel/qda/qda_ioctl.c   |  36 ++++++
 drivers/accel/qda/qda_ioctl.h   |   1 +
 include/uapi/drm/qda_accel.h    |  45 +++++++-
 6 files changed, 375 insertions(+), 1 deletion(-)
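
Illustration only, not part of the patch: the "three-argument" and
"four-argument" wording above corresponds to the input and output buffer
counts passed to FASTRPC_SCALARS(), (2, 1) for the legacy path and (3, 1) for
the ATTR path.  The macro itself is not defined in this patch; the sketch below
assumes it follows the bit layout of FASTRPC_BUILD_SCALARS in the existing
drivers/misc/fastrpc.c, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Method IDs as defined by this patch. */
#define FASTRPC_RMID_INIT_MMAP		4
#define FASTRPC_RMID_INIT_MEM_MAP	10

/*
 * Assumed layout (modeled on drivers/misc/fastrpc.c, not taken from QDA):
 * bits 24-28 method, bits 16-23 input buffers, bits 8-15 output buffers.
 */
static uint32_t sketch_scalars(uint32_t method, uint32_t in, uint32_t out)
{
	assert(method <= 0x1f && in <= 0xff && out <= 0xff);
	return (method << 24) | (in << 16) | (out << 8);
}

static uint32_t sketch_inbufs(uint32_t sc)
{
	return (sc >> 16) & 0xff;
}

static uint32_t sketch_outbufs(uint32_t sc)
{
	return (sc >> 8) & 0xff;
}

/* Total number of invoke args the scalars word describes. */
static uint32_t sketch_nr_args(uint32_t sc)
{
	return sketch_inbufs(sc) + sketch_outbufs(sc);
}
```

Under that assumption, the args array length in each prepare function (3 and
4 respectively) equals the sum of the two counts encoded in ctx->sc.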

diff --git a/drivers/accel/qda/qda_drv.c b/drivers/accel/qda/qda_drv.c
index 4eaba9b050c0..3640e4a41605 100644
--- a/drivers/accel/qda/qda_drv.c
+++ b/drivers/accel/qda/qda_drv.c
@@ -67,6 +67,7 @@ static const struct drm_ioctl_desc qda_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(QDA_GEM_CREATE, qda_ioctl_gem_create, 0),
 	DRM_IOCTL_DEF_DRV(QDA_GEM_MMAP_OFFSET, qda_ioctl_gem_mmap_offset, 0),
 	DRM_IOCTL_DEF_DRV(QDA_REMOTE_SESSION_CREATE, qda_ioctl_init_create, 0),
+	DRM_IOCTL_DEF_DRV(QDA_REMOTE_MAP, qda_ioctl_mmap, 0),
 	DRM_IOCTL_DEF_DRV(QDA_REMOTE_INVOKE, qda_ioctl_invoke, 0),
 };
 
diff --git a/drivers/accel/qda/qda_fastrpc.c b/drivers/accel/qda/qda_fastrpc.c
index 305915022b91..cab3a560ceb5 100644
--- a/drivers/accel/qda/qda_fastrpc.c
+++ b/drivers/accel/qda/qda_fastrpc.c
@@ -524,6 +524,44 @@ int qda_fastrpc_invoke_unpack(struct fastrpc_invoke_context *ctx,
 	return err;
 }
 
+static int fastrpc_return_result_mem_map(struct fastrpc_invoke_context *ctx, char __user *argp)
+{
+	struct drm_qda_mem_map margs;
+	struct fastrpc_map_rsp_msg *rsp_msg;
+
+	rsp_msg = ctx->rsp;
+
+	memcpy(&margs, argp, sizeof(margs));
+
+	margs.vaddrout = rsp_msg->vaddrout;
+
+	memcpy(argp, &margs, sizeof(margs));
+	return 0;
+}
+
+/**
+ * qda_fastrpc_return_result() - Return invocation result to user-space
+ * @ctx: FastRPC invocation context
+ * @argp: User-space pointer to write result into
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_fastrpc_return_result(struct fastrpc_invoke_context *ctx, char __user *argp)
+{
+	int err = 0;
+
+	switch (ctx->type) {
+	case FASTRPC_RMID_INIT_MMAP:
+	case FASTRPC_RMID_INIT_MEM_MAP:
+		err = fastrpc_return_result_mem_map(ctx, argp);
+		break;
+	default:
+		break;
+	}
+
+	return err;
+}
+
 static void setup_create_process_args(struct drm_qda_fastrpc_invoke_args *args,
 				      struct fastrpc_create_process_inbuf *inbuf,
 				      struct drm_qda_init_create *init,
@@ -561,6 +599,37 @@ static void setup_single_arg(struct drm_qda_fastrpc_invoke_args *args, const voi
 	args[0].fd = -1;
 }
 
+/*
+ * setup_mmap_pages() - Resolve a DMA-BUF fd to a physical page descriptor
+ *
+ * Imports the DMA-BUF fd as a GEM object to obtain the IOMMU-mapped
+ * dma_addr, fills in the fastrpc_phy_page entry, then releases the extra
+ * GEM object reference.  The handle table keeps the object alive.
+ */
+static int setup_mmap_pages(struct fastrpc_invoke_context *ctx, int dmabuf_fd,
+			    struct fastrpc_phy_page *pages)
+{
+	struct drm_gem_object *gem_obj;
+	struct qda_gem_obj *qda_gem_obj;
+	int err;
+
+	if (dmabuf_fd <= 0) {
+		pages->addr = 0;
+		pages->size = 0;
+		return 0;
+	}
+
+	err = get_gem_obj_from_dmabuf_fd(ctx, dmabuf_fd, &gem_obj);
+	if (err)
+		return err;
+
+	qda_gem_obj = to_qda_gem_obj(gem_obj);
+	setup_pages_from_gem_obj(qda_gem_obj, pages);
+
+	drm_gem_object_put(gem_obj);
+	return 0;
+}
+
 static int fastrpc_prepare_args_release_process(struct fastrpc_invoke_context *ctx)
 {
 	struct drm_qda_fastrpc_invoke_args *args;
@@ -656,6 +725,168 @@ static int fastrpc_prepare_args_init_create(struct fastrpc_invoke_context *ctx,
 	return err;
 }
 
+static int fastrpc_prepare_args_map(struct fastrpc_invoke_context *ctx, char __user *argp)
+{
+	struct drm_qda_mem_map margs;
+	struct drm_qda_fastrpc_invoke_args *args;
+	void *req, *rsp;
+	struct fastrpc_map_req_msg *req_msg;
+	struct fastrpc_map_rsp_msg *rsp_msg;
+	int err;
+
+	memcpy(&margs, argp, sizeof(margs));
+
+	args = kzalloc_objs(*args, 3);
+	if (!args)
+		return -ENOMEM;
+
+	req = kzalloc_obj(*req_msg);
+	if (!req) {
+		err = -ENOMEM;
+		goto err_free_args;
+	}
+	req_msg = (struct fastrpc_map_req_msg *)req;
+
+	rsp = kzalloc_obj(*rsp_msg);
+	if (!rsp) {
+		err = -ENOMEM;
+		goto err_free_req;
+	}
+	rsp_msg = (struct fastrpc_map_rsp_msg *)rsp;
+
+	ctx->input_pages = kzalloc_objs(*ctx->input_pages, 1);
+	if (!ctx->input_pages) {
+		err = -ENOMEM;
+		goto err_free_rsp;
+	}
+
+	req_msg->remote_session_id = ctx->remote_session_id;
+	req_msg->flags = margs.flags;
+	req_msg->vaddr = margs.vaddrin;
+	req_msg->num = sizeof(*ctx->input_pages);
+
+	args[0].ptr = (u64)(uintptr_t)req;
+	args[0].length = sizeof(*req_msg);
+	args[0].fd = -1;
+
+	/* Resolve DMA-BUF fd to physical page descriptor */
+	err = setup_mmap_pages(ctx, margs.fd, ctx->input_pages);
+	if (err)
+		goto err_free_input_pages;
+
+	args[1].ptr = (u64)(uintptr_t)ctx->input_pages;
+	args[1].length = sizeof(*ctx->input_pages);
+	args[1].fd = -1;
+
+	args[2].ptr = (u64)(uintptr_t)rsp;
+	args[2].length = sizeof(*rsp_msg);
+	args[2].fd = -1;
+
+	ctx->sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MMAP, 2, 1);
+	ctx->args = args;
+	ctx->req = req;
+	ctx->rsp = rsp;
+	ctx->handle = FASTRPC_INIT_HANDLE;
+
+	return 0;
+
+err_free_input_pages:
+	kfree(ctx->input_pages);
+	ctx->input_pages = NULL;
+err_free_rsp:
+	kfree(rsp);
+err_free_req:
+	kfree(req);
+err_free_args:
+	kfree(args);
+	return err;
+}
+
+static int fastrpc_prepare_args_mem_map_attr(struct fastrpc_invoke_context *ctx, char __user *argp)
+{
+	struct drm_qda_mem_map margs;
+	struct drm_qda_fastrpc_invoke_args *args;
+	void *req, *rsp;
+	struct fastrpc_mem_map_req_msg *req_msg;
+	struct fastrpc_map_rsp_msg *rsp_msg;
+	int err;
+
+	memcpy(&margs, argp, sizeof(margs));
+
+	args = kzalloc_objs(*args, 4);
+	if (!args)
+		return -ENOMEM;
+
+	req = kzalloc_obj(*req_msg);
+	if (!req) {
+		err = -ENOMEM;
+		goto err_free_args;
+	}
+	req_msg = (struct fastrpc_mem_map_req_msg *)req;
+
+	rsp = kzalloc_obj(*rsp_msg);
+	if (!rsp) {
+		err = -ENOMEM;
+		goto err_free_req;
+	}
+	rsp_msg = (struct fastrpc_map_rsp_msg *)rsp;
+
+	ctx->input_pages = kzalloc_objs(*ctx->input_pages, 1);
+	if (!ctx->input_pages) {
+		err = -ENOMEM;
+		goto err_free_rsp;
+	}
+
+	req_msg->remote_session_id = ctx->remote_session_id;
+	req_msg->fd       = margs.fd;		/* DMA-BUF fd forwarded to DSP */
+	req_msg->offset   = margs.offset;
+	req_msg->flags    = margs.flags;
+	req_msg->vaddrin  = margs.vaddrin;
+	req_msg->num      = sizeof(*ctx->input_pages);
+	req_msg->data_len = 0;
+
+	args[0].ptr = (u64)(uintptr_t)req;
+	args[0].length = sizeof(*req_msg);
+	args[0].fd = -1;
+
+	/* Resolve DMA-BUF fd to physical page descriptor */
+	err = setup_mmap_pages(ctx, margs.fd, ctx->input_pages);
+	if (err)
+		goto err_free_input_pages;
+
+	args[1].ptr = (u64)(uintptr_t)ctx->input_pages;
+	args[1].length = sizeof(*ctx->input_pages);
+	args[1].fd = -1;
+
+	/* args[2] is a zero-length handle-only entry required by the DSP protocol */
+	args[2].ptr = (u64)(uintptr_t)ctx->input_pages;
+	args[2].length = 0;
+	args[2].fd = -1;
+
+	args[3].ptr = (u64)(uintptr_t)rsp;
+	args[3].length = sizeof(*rsp_msg);
+	args[3].fd = -1;
+
+	ctx->sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MEM_MAP, 3, 1);
+	ctx->args = args;
+	ctx->req = req;
+	ctx->rsp = rsp;
+	ctx->handle = FASTRPC_INIT_HANDLE;
+
+	return 0;
+
+err_free_input_pages:
+	kfree(ctx->input_pages);
+	ctx->input_pages = NULL;
+err_free_rsp:
+	kfree(rsp);
+err_free_req:
+	kfree(req);
+err_free_args:
+	kfree(args);
+	return err;
+}
+
 static int fastrpc_prepare_args_invoke(struct fastrpc_invoke_context *ctx, char __user *argp)
 {
 	struct drm_qda_invoke_args invoke_args;
@@ -708,6 +939,12 @@ int qda_fastrpc_prepare_args(struct fastrpc_invoke_context *ctx, char __user *ar
 		ctx->pd = QDA_USER_PD;
 		err = fastrpc_prepare_args_init_create(ctx, argp);
 		break;
+	case FASTRPC_RMID_INIT_MMAP:
+		err = fastrpc_prepare_args_map(ctx, argp);
+		break;
+	case FASTRPC_RMID_INIT_MEM_MAP:
+		err = fastrpc_prepare_args_mem_map_attr(ctx, argp);
+		break;
 	case FASTRPC_RMID_INVOKE_DYNAMIC:
 		err = fastrpc_prepare_args_invoke(ctx, argp);
 		break;
diff --git a/drivers/accel/qda/qda_fastrpc.h b/drivers/accel/qda/qda_fastrpc.h
index 1c1236f9525e..71812eaf9a54 100644
--- a/drivers/accel/qda/qda_fastrpc.h
+++ b/drivers/accel/qda/qda_fastrpc.h
@@ -274,8 +274,10 @@ struct fastrpc_invoke_context {
 
 /* Remote Method ID table - identifies initialization and control operations */
 #define FASTRPC_RMID_INIT_RELEASE	1	/* Release DSP process */
+#define FASTRPC_RMID_INIT_MMAP		4	/* Map memory region to DSP */
 #define FASTRPC_RMID_INIT_CREATE	6	/* Create DSP process */
 #define FASTRPC_RMID_INIT_CREATE_ATTR	7	/* Create DSP process with attributes */
+#define FASTRPC_RMID_INIT_MEM_MAP	10	/* Map DMA buffer with attributes to DSP */
 #define FASTRPC_RMID_INVOKE_DYNAMIC	0xFFFFFFFF	/* Dynamic method invocation */
 
 /* Common handle for initialization operations */
@@ -290,11 +292,65 @@ struct fastrpc_invoke_context {
 /* Maximum initialization file size (4 MB) */
 #define FASTRPC_INIT_FILELEN_MAX	(4 * 1024 * 1024)
 
+/* Message structures for internal FastRPC calls */
+
+/**
+ * struct fastrpc_mem_map_req_msg - Memory map request message with attributes
+ *
+ * This message structure is sent to the DSP to request mapping
+ * of a DMA buffer with custom attributes (ATTR request).
+ */
+struct fastrpc_mem_map_req_msg {
+	/** @remote_session_id: Client identifier for the session */
+	s32 remote_session_id;
+	/** @fd: DMA-BUF file descriptor of the buffer to map */
+	s32 fd;
+	/** @offset: Byte offset within the buffer */
+	s32 offset;
+	/** @flags: Mapping flags (cache attributes, permissions) */
+	u32 flags;
+	/** @vaddrin: Virtual address hint for the DSP mapping */
+	u64 vaddrin;
+	/** @num: Size of the physical page descriptor array in bytes */
+	s32 num;
+	/** @data_len: Length of additional inline data */
+	s32 data_len;
+};
+
+/**
+ * struct fastrpc_map_req_msg - Legacy memory map request message
+ *
+ * This message structure is sent to the DSP to request mapping
+ * of a DMA buffer into the DSP's virtual address space.
+ */
+struct fastrpc_map_req_msg {
+	/** @remote_session_id: Client identifier for the session */
+	s32 remote_session_id;
+	/** @flags: Mapping flags (cache attributes, permissions) */
+	u32 flags;
+	/** @vaddr: Virtual address hint for the DSP mapping */
+	u64 vaddr;
+	/** @num: Size of the physical page descriptor array in bytes */
+	s32 num;
+};
+
+/**
+ * struct fastrpc_map_rsp_msg - Memory map response message
+ *
+ * This message structure is returned by the DSP after successfully
+ * mapping a buffer, providing the virtual address for future access.
+ */
+struct fastrpc_map_rsp_msg {
+	/** @vaddrout: DSP virtual address assigned to the mapped buffer */
+	u64 vaddrout;
+};
+
 void qda_fastrpc_context_free(struct kref *ref);
 struct fastrpc_invoke_context *qda_fastrpc_context_alloc(void);
 int qda_fastrpc_prepare_args(struct fastrpc_invoke_context *ctx, char __user *argp);
 int qda_fastrpc_get_header_size(struct fastrpc_invoke_context *ctx, size_t *out_size);
 int qda_fastrpc_invoke_pack(struct fastrpc_invoke_context *ctx, struct qda_msg *msg);
 int qda_fastrpc_invoke_unpack(struct fastrpc_invoke_context *ctx, struct qda_msg *msg);
+int qda_fastrpc_return_result(struct fastrpc_invoke_context *ctx, char __user *argp);
 
 #endif /* __QDA_FASTRPC_H__ */
diff --git a/drivers/accel/qda/qda_ioctl.c b/drivers/accel/qda/qda_ioctl.c
index 33f0a798ad13..283eb7535c45 100644
--- a/drivers/accel/qda/qda_ioctl.c
+++ b/drivers/accel/qda/qda_ioctl.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 // Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
 #include <drm/drm_ioctl.h>
+#include <drm/drm_print.h>
 #include <drm/qda_accel.h>
 #include "qda_drv.h"
 #include "qda_fastrpc.h"
@@ -178,6 +179,10 @@ static int fastrpc_invoke(int type, struct drm_device *dev, void *data,
 	if (err)
 		goto err_context_free;
 
+	err = qda_fastrpc_return_result(ctx, (char __user *)data);
+	if (err)
+		goto err_context_free;
+
 	fastrpc_context_put_id(ctx, qdev);
 	kref_put(&ctx->refcount, qda_fastrpc_context_free);
 	return 0;
@@ -218,6 +223,37 @@ int qda_release_dsp_process(struct qda_dev *qdev, struct drm_file *file_priv)
 	return fastrpc_invoke(FASTRPC_RMID_INIT_RELEASE, &qdev->drm_dev, NULL, file_priv);
 }
 
+/**
+ * qda_ioctl_mmap() - Map memory to DSP address space
+ * @dev: DRM device structure
+ * @data: User-space data (struct drm_qda_mem_map)
+ * @file_priv: DRM file private data
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_ioctl_mmap(struct drm_device *dev, void *data, struct drm_file *file_priv)
+{
+	struct drm_qda_mem_map *map_req;
+
+	if (!data)
+		return -EINVAL;
+
+	map_req = (struct drm_qda_mem_map *)data;
+
+	if (map_req->pad)
+		return -EINVAL;
+
+	switch (map_req->request) {
+	case QDA_MAP_REQUEST_LEGACY:
+		return fastrpc_invoke(FASTRPC_RMID_INIT_MMAP, dev, data, file_priv);
+	case QDA_MAP_REQUEST_ATTR:
+		return fastrpc_invoke(FASTRPC_RMID_INIT_MEM_MAP, dev, data, file_priv);
+	default:
+		drm_err(dev, "Invalid map request type: %u\n", map_req->request);
+		return -EINVAL;
+	}
+}
+
 /**
  * qda_ioctl_invoke() - Perform a dynamic FastRPC method invocation
  * @dev: DRM device structure
diff --git a/drivers/accel/qda/qda_ioctl.h b/drivers/accel/qda/qda_ioctl.h
index 192565434363..457ceccede08 100644
--- a/drivers/accel/qda/qda_ioctl.h
+++ b/drivers/accel/qda/qda_ioctl.h
@@ -13,5 +13,6 @@ int qda_ioctl_init_create(struct drm_device *dev, void *data, struct drm_file *f
 int qda_ioctl_gem_create(struct drm_device *dev, void *data, struct drm_file *file_priv);
 int qda_ioctl_gem_mmap_offset(struct drm_device *dev, void *data, struct drm_file *file_priv);
 int qda_ioctl_invoke(struct drm_device *dev, void *data, struct drm_file *file_priv);
+int qda_ioctl_mmap(struct drm_device *dev, void *data, struct drm_file *file_priv);
 
 #endif /* __QDA_IOCTL_H__ */
diff --git a/include/uapi/drm/qda_accel.h b/include/uapi/drm/qda_accel.h
index 711e2523a570..173f59abd361 100644
--- a/include/uapi/drm/qda_accel.h
+++ b/include/uapi/drm/qda_accel.h
@@ -21,8 +21,9 @@ extern "C" {
 #define DRM_QDA_QUERY		0x00
 #define DRM_QDA_GEM_CREATE		0x01
 #define DRM_QDA_GEM_MMAP_OFFSET	0x02
-/* Command number 0x03 reserved for INIT_ATTACH; 0x05-0x06 reserved for MAP, MUNMAP */
+/* Command number 0x03 reserved for INIT_ATTACH; 0x06 reserved for MUNMAP */
 #define DRM_QDA_REMOTE_SESSION_CREATE		0x04
+#define DRM_QDA_REMOTE_MAP			0x05
 #define DRM_QDA_REMOTE_INVOKE			0x07
 
 /*
@@ -41,9 +42,15 @@ extern "C" {
 #define DRM_IOCTL_QDA_REMOTE_SESSION_CREATE					\
 	DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_REMOTE_SESSION_CREATE,		\
 		 struct drm_qda_init_create)
+#define DRM_IOCTL_QDA_REMOTE_MAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_REMOTE_MAP, \
+					  struct drm_qda_mem_map)
 #define DRM_IOCTL_QDA_REMOTE_INVOKE	DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_REMOTE_INVOKE, \
 					  struct drm_qda_invoke_args)
 
+/* Request type definitions for struct drm_qda_mem_map */
+#define QDA_MAP_REQUEST_LEGACY    1  /* Legacy MMAP operation */
+#define QDA_MAP_REQUEST_ATTR      2  /* Handle-based MEM_MAP operation with attributes */
+
 /**
  * struct drm_qda_query - Device information query structure
  * @dsp_name: Name of DSP (e.g., "adsp", "cdsp", "cdsp1", "gdsp0", "gdsp1")
@@ -145,6 +152,42 @@ struct drm_qda_invoke_args {
 	__u64 args;
 };
 
+/**
+ * struct drm_qda_mem_map - Memory mapping request structure
+ * @request: Request type (QDA_MAP_REQUEST_LEGACY or QDA_MAP_REQUEST_ATTR)
+ * @flags: Mapping flags for DSP (cache attributes, permissions)
+ * @fd: DMA-BUF file descriptor of the buffer to map
+ * @attrs: Mapping attributes (used for ATTR request)
+ * @offset: Offset within buffer (used for ATTR request)
+ * @pad: Padding for 64-bit alignment (must be zero)
+ * @vaddrin: Optional virtual address hint for mapping
+ * @size: Size of the memory region to map in bytes
+ * @vaddrout: Output DSP virtual address after successful mapping
+ *
+ * This structure is used to request mapping of a DMA buffer into the
+ * DSP's virtual address space. The DSP will map the buffer according
+ * to the specified flags and return the virtual address in vaddrout.
+ *
+ * For QDA_MAP_REQUEST_LEGACY (value 1):
+ *   - Uses fields: fd, flags, vaddrin, size, vaddrout
+ *   - Legacy MMAP operation for backward compatibility
+ *
+ * For QDA_MAP_REQUEST_ATTR (value 2):
+ *   - Uses all fields including attrs and offset
+ *   - FD-based MEM_MAP operation with custom SMMU attributes
+ */
+struct drm_qda_mem_map {
+	__u32 request;
+	__u32 flags;
+	__s32 fd;
+	__u32 attrs;
+	__u32 offset;
+	__u32 pad;
+	__u64 vaddrin;
+	__u64 size;
+	__u64 vaddrout;
+};
+
 #if defined(__cplusplus)
 }
 #endif

-- 
2.34.1
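
For readers following the uAPI, the request validation and dispatch in
qda_ioctl_mmap() can be modelled in plain user-space C. This is only a
sketch, not code from the patch: the model_* names and MODEL_* macros are
hypothetical, and the struct mirrors struct drm_qda_mem_map with <stdint.h>
types instead of the kernel __u32/__u64 types. The numeric values are the
ones defined in the diff above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space mirror of struct drm_qda_mem_map from the patch. */
struct model_qda_mem_map {
	uint32_t request;
	uint32_t flags;
	int32_t  fd;
	uint32_t attrs;
	uint32_t offset;
	uint32_t pad;		/* must be zero */
	uint64_t vaddrin;
	uint64_t size;
	uint64_t vaddrout;
};

/* Six 32-bit fields come first, so the 64-bit fields start at byte 24. */
static_assert(offsetof(struct model_qda_mem_map, vaddrin) == 24,
	      "vaddrin offset");
static_assert(sizeof(struct model_qda_mem_map) == 48, "struct size");

/* Values copied from the patch; the MODEL_ prefix is ours. */
#define MODEL_MAP_REQUEST_LEGACY	1
#define MODEL_MAP_REQUEST_ATTR		2
#define MODEL_RMID_INIT_MMAP		4
#define MODEL_RMID_INIT_MEM_MAP		10

/*
 * Return the remote method ID a request would be routed to, or -EINVAL
 * for a NULL request, non-zero padding, or an unknown request type.
 */
static inline int model_map_dispatch(const struct model_qda_mem_map *req)
{
	if (!req || req->pad)
		return -EINVAL;

	switch (req->request) {
	case MODEL_MAP_REQUEST_LEGACY:
		return MODEL_RMID_INIT_MMAP;
	case MODEL_MAP_REQUEST_ATTR:
		return MODEL_RMID_INIT_MEM_MAP;
	default:
		return -EINVAL;
	}
}
```

Rejecting a non-zero pad up front keeps the field available for future
extensions without breaking existing callers.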




* [PATCH 13/15] accel/qda: Add DSP process creation and release
From: Ekansh Gupta via B4 Relay @ 2026-05-19  6:16 UTC (permalink / raw)
  To: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
	Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
	Christian König
  Cc: Bharath Kumar, Chenna Kesava Raju, srini, dmitry.baryshkov,
	andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
	linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig,
	Ekansh Gupta
In-Reply-To: <20260519-qda-series-v1-0-b2d984c297f8@oss.qualcomm.com>

From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>

Implement the REMOTE_SESSION_CREATE and INIT_RELEASE FastRPC
operations, which establish and tear down a user process on the
DSP.

DRM_IOCTL_QDA_REMOTE_SESSION_CREATE (drm_qda_init_create)
  Creates a new process on the DSP by sending an INIT_CREATE message
  via the FastRPC INIT_HANDLE. The caller provides an ELF file (via
  DMA-BUF fd or direct pointer) and optional process attributes. A
  4 MB GEM buffer is allocated per session to hold the DSP process
  image; this buffer is stored in qda_file_priv and reused for the
  lifetime of the session.

  If attrs is non-zero, INIT_CREATE_ATTR is used instead of
  INIT_CREATE to pass the extended attribute and signature fields.

INIT_RELEASE
  Sends a release message to the DSP when the DRM file is closed
  (qda_postclose via qda_release_dsp_process), freeing the remote
  process and its resources. The release is skipped if the device
  has already been unplugged.

qda_fastrpc.c
  fastrpc_prepare_args_init_create() marshals the six-argument
  create-process payload: the inbuf descriptor, process name,
  ELF file, physical pages, attrs, and siglen.
  fastrpc_prepare_args_release_process() marshals the single-
  argument release payload (remote_session_id).
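
The scalars word built by FASTRPC_SCALARS() can be illustrated in
user-space C. The QDA definition of the macro is not shown in this patch,
so the bit layout below is an assumption borrowed from the existing
drivers/misc/fastrpc.c driver (method in bits 24-28, input buffer count in
bits 16-23, output buffer count in bits 8-15). The model_* names are
hypothetical; the method IDs are the ones added by this patch.

```c
#include <stdint.h>

/* Method IDs from this patch; the MODEL_ prefix is ours. */
#define MODEL_RMID_INIT_CREATE		6
#define MODEL_RMID_INIT_CREATE_ATTR	7

/* Assumed layout: method[28:24] | inbufs[23:16] | outbufs[15:8]. */
static inline uint32_t model_scalars(uint32_t method, uint32_t in, uint32_t out)
{
	return ((method & 0x1f) << 24) | ((in & 0xff) << 16) |
	       ((out & 0xff) << 8);
}

/* Number of input buffers encoded in a scalars word. */
static inline uint32_t model_scalars_inbufs(uint32_t sc)
{
	return (sc >> 16) & 0xff;
}

/* Number of output buffers encoded in a scalars word. */
static inline uint32_t model_scalars_outbufs(uint32_t sc)
{
	return (sc >> 8) & 0xff;
}

/*
 * Mirror of the selection in fastrpc_prepare_args_init_create():
 * non-zero attrs selects INIT_CREATE_ATTR, otherwise INIT_CREATE.
 * Both use the (4, 0) buffer counts passed in the patch.
 */
static inline uint32_t model_create_scalars(uint32_t attrs)
{
	uint32_t method = attrs ? MODEL_RMID_INIT_CREATE_ATTR
				: MODEL_RMID_INIT_CREATE;

	return model_scalars(method, 4, 0);
}
```

Under that assumed layout, model_create_scalars(0) evaluates to 0x06040000
and any non-zero attrs value yields 0x07040000.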

qda_drv.c
  qda_postclose() is extended to call qda_release_dsp_process()
  under drm_dev_enter() so the release message is only sent while
  the device is still accessible.

Assisted-by: Claude:claude-4-6-sonnet
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
---
 drivers/accel/qda/qda_drv.c     |   8 +++
 drivers/accel/qda/qda_drv.h     |   5 ++
 drivers/accel/qda/qda_fastrpc.c | 140 ++++++++++++++++++++++++++++++++++++++++
 drivers/accel/qda/qda_fastrpc.h |  39 +++++++++--
 drivers/accel/qda/qda_ioctl.c   |  52 +++++++++++++++
 drivers/accel/qda/qda_ioctl.h   |   1 +
 include/uapi/drm/qda_accel.h    |  32 ++++++++-
 7 files changed, 270 insertions(+), 7 deletions(-)

diff --git a/drivers/accel/qda/qda_drv.c b/drivers/accel/qda/qda_drv.c
index 704c7d3127d2..4eaba9b050c0 100644
--- a/drivers/accel/qda/qda_drv.c
+++ b/drivers/accel/qda/qda_drv.c
@@ -36,6 +36,13 @@ static int qda_open(struct drm_device *dev, struct drm_file *file)
 static void qda_postclose(struct drm_device *dev, struct drm_file *file)
 {
 	struct qda_file_priv *qda_file_priv = file->driver_priv;
+	int idx;
+
+	/* Only send the DSP release message while the device is accessible */
+	if (drm_dev_enter(dev, &idx)) {
+		qda_release_dsp_process(qda_file_priv->qda_dev, file);
+		drm_dev_exit(idx);
+	}
 
 	if (qda_file_priv->assigned_iommu_dev) {
 		struct qda_iommu_device *iommu_dev = qda_file_priv->assigned_iommu_dev;
@@ -59,6 +66,7 @@ static const struct drm_ioctl_desc qda_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(QDA_QUERY, qda_ioctl_query, 0),
 	DRM_IOCTL_DEF_DRV(QDA_GEM_CREATE, qda_ioctl_gem_create, 0),
 	DRM_IOCTL_DEF_DRV(QDA_GEM_MMAP_OFFSET, qda_ioctl_gem_mmap_offset, 0),
+	DRM_IOCTL_DEF_DRV(QDA_REMOTE_SESSION_CREATE, qda_ioctl_init_create, 0),
 	DRM_IOCTL_DEF_DRV(QDA_REMOTE_INVOKE, qda_ioctl_invoke, 0),
 };
 
diff --git a/drivers/accel/qda/qda_drv.h b/drivers/accel/qda/qda_drv.h
index 420cccff42bf..4b4639961d95 100644
--- a/drivers/accel/qda/qda_drv.h
+++ b/drivers/accel/qda/qda_drv.h
@@ -28,6 +28,8 @@ struct qda_file_priv {
 	struct qda_dev *qda_dev;
 	/** @assigned_iommu_dev: IOMMU device assigned to this process */
 	struct qda_iommu_device *assigned_iommu_dev;
+	/** @init_mem_gem_obj: GEM object for PD initialization memory */
+	struct qda_gem_obj *init_mem_gem_obj;
 	/** @pid: Process ID for tracking */
 	pid_t pid;
 	/** @remote_session_id: Unique session identifier */
@@ -83,4 +85,7 @@ void qda_deinit_device(struct qda_dev *qdev);
 int qda_register_device(struct qda_dev *qdev);
 void qda_unregister_device(struct qda_dev *qdev);
 
+/* DSP process / protection domain management */
+int qda_release_dsp_process(struct qda_dev *qdev, struct drm_file *file_priv);
+
 #endif /* __QDA_DRV_H__ */
diff --git a/drivers/accel/qda/qda_fastrpc.c b/drivers/accel/qda/qda_fastrpc.c
index 0ec37175a098..305915022b91 100644
--- a/drivers/accel/qda/qda_fastrpc.c
+++ b/drivers/accel/qda/qda_fastrpc.c
@@ -524,6 +524,138 @@ int qda_fastrpc_invoke_unpack(struct fastrpc_invoke_context *ctx,
 	return err;
 }
 
+static void setup_create_process_args(struct drm_qda_fastrpc_invoke_args *args,
+				      struct fastrpc_create_process_inbuf *inbuf,
+				      struct drm_qda_init_create *init,
+				      struct fastrpc_phy_page *pages)
+{
+	args[0].ptr = (u64)(uintptr_t)inbuf;
+	args[0].length = sizeof(*inbuf);
+	args[0].fd = -1;
+
+	args[1].ptr = (u64)(uintptr_t)current->comm;
+	args[1].length = inbuf->namelen;
+	args[1].fd = -1;
+
+	args[2].ptr = (u64)init->file;
+	args[2].length = inbuf->filelen;
+	args[2].fd = init->filefd;	/* DMA-BUF fd forwarded to DSP */
+
+	args[3].ptr = (u64)(uintptr_t)pages;
+	args[3].length = 1 * sizeof(*pages);
+	args[3].fd = -1;
+
+	args[4].ptr = (u64)(uintptr_t)&inbuf->attrs;
+	args[4].length = sizeof(inbuf->attrs);
+	args[4].fd = -1;
+
+	args[5].ptr = (u64)(uintptr_t)&inbuf->siglen;
+	args[5].length = sizeof(inbuf->siglen);
+	args[5].fd = -1;
+}
+
+static void setup_single_arg(struct drm_qda_fastrpc_invoke_args *args, const void *ptr, size_t size)
+{
+	args[0].ptr = (u64)(uintptr_t)ptr;
+	args[0].length = size;
+	args[0].fd = -1;
+}
+
+static int fastrpc_prepare_args_release_process(struct fastrpc_invoke_context *ctx)
+{
+	struct drm_qda_fastrpc_invoke_args *args;
+
+	args = kzalloc_obj(*args);
+	if (!args)
+		return -ENOMEM;
+
+	setup_single_arg(args, &ctx->remote_session_id, sizeof(ctx->remote_session_id));
+	ctx->sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_RELEASE, 1, 0);
+	ctx->args = args;
+	ctx->handle = FASTRPC_INIT_HANDLE;
+
+	return 0;
+}
+
+static int fastrpc_prepare_args_init_create(struct fastrpc_invoke_context *ctx,
+					    char __user *argp)
+{
+	struct drm_qda_init_create init;
+	struct drm_qda_fastrpc_invoke_args *args;
+	struct fastrpc_create_process_inbuf *inbuf;
+	int err;
+	u32 sc;
+
+	args = kcalloc(FASTRPC_CREATE_PROCESS_NARGS, sizeof(*args), GFP_KERNEL);
+	if (!args)
+		return -ENOMEM;
+
+	ctx->input_pages = kcalloc(1, sizeof(*ctx->input_pages), GFP_KERNEL);
+	if (!ctx->input_pages) {
+		err = -ENOMEM;
+		goto err_free_args;
+	}
+
+	ctx->inbuf = kcalloc(1, sizeof(*inbuf), GFP_KERNEL);
+	if (!ctx->inbuf) {
+		err = -ENOMEM;
+		goto err_free_input_pages;
+	}
+	inbuf = ctx->inbuf;
+
+	memcpy(&init, argp, sizeof(init));
+
+	if (init.filelen > FASTRPC_INIT_FILELEN_MAX) {
+		err = -EINVAL;
+		goto err_free_inbuf;
+	}
+
+	/*
+	 * Validate that the DMA-BUF fd is importable.  The fd itself is kept
+	 * in init.filefd and forwarded to the DSP via setup_create_process_args().
+	 */
+	if (init.filelen && init.filefd > 0) {
+		struct drm_gem_object *file_gem_obj;
+
+		err = get_gem_obj_from_dmabuf_fd(ctx, init.filefd, &file_gem_obj);
+		if (err) {
+			err = -EINVAL;
+			goto err_free_inbuf;
+		}
+		drm_gem_object_put(file_gem_obj);
+	}
+
+	inbuf->remote_session_id = ctx->remote_session_id;
+	inbuf->namelen = strlen(current->comm) + 1;
+	inbuf->filelen = init.filelen;
+	inbuf->pageslen = 1;
+	inbuf->attrs = init.attrs;
+	inbuf->siglen = init.siglen;
+
+	setup_pages_from_gem_obj(ctx->init_mem_gem_obj, &ctx->input_pages[0]);
+
+	setup_create_process_args(args, inbuf, &init, ctx->input_pages);
+
+	sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE, 4, 0);
+	if (init.attrs)
+		sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE_ATTR, 4, 0);
+	ctx->sc = sc;
+	ctx->args = args;
+	ctx->handle = FASTRPC_INIT_HANDLE;
+
+	return 0;
+
+err_free_inbuf:
+	kfree(ctx->inbuf);
+	ctx->inbuf = NULL;
+err_free_input_pages:
+	kfree(ctx->input_pages);
+	ctx->input_pages = NULL;
+err_free_args:
+	kfree(args);
+	return err;
+}
+
 static int fastrpc_prepare_args_invoke(struct fastrpc_invoke_context *ctx, char __user *argp)
 {
 	struct drm_qda_invoke_args invoke_args;
@@ -568,6 +700,14 @@ int qda_fastrpc_prepare_args(struct fastrpc_invoke_context *ctx, char __user *ar
 	int err;
 
 	switch (ctx->type) {
+	case FASTRPC_RMID_INIT_RELEASE:
+		err = fastrpc_prepare_args_release_process(ctx);
+		break;
+	case FASTRPC_RMID_INIT_CREATE:
+	case FASTRPC_RMID_INIT_CREATE_ATTR:
+		ctx->pd = QDA_USER_PD;
+		err = fastrpc_prepare_args_init_create(ctx, argp);
+		break;
 	case FASTRPC_RMID_INVOKE_DYNAMIC:
 		err = fastrpc_prepare_args_invoke(ctx, argp);
 		break;
diff --git a/drivers/accel/qda/qda_fastrpc.h b/drivers/accel/qda/qda_fastrpc.h
index ce77baeccfba..1c1236f9525e 100644
--- a/drivers/accel/qda/qda_fastrpc.h
+++ b/drivers/accel/qda/qda_fastrpc.h
@@ -127,6 +127,27 @@ struct fastrpc_invoke_buf {
 	u32 pgidx;
 };
 
+/**
+ * struct fastrpc_create_process_inbuf - Input buffer for process creation
+ *
+ * This structure defines the input buffer format for creating a new
+ * process on the remote DSP.
+ */
+struct fastrpc_create_process_inbuf {
+	/** @remote_session_id: Client identifier for the session */
+	int remote_session_id;
+	/** @namelen: Length of the process name string including NUL terminator */
+	u32 namelen;
+	/** @filelen: Length of the ELF shell file in bytes */
+	u32 filelen;
+	/** @pageslen: Number of physical page descriptors */
+	u32 pageslen;
+	/** @attrs: Process attribute flags */
+	u32 attrs;
+	/** @siglen: Length of the signature data in bytes */
+	u32 siglen;
+};
+
 /**
  * struct fastrpc_msg - FastRPC wire message for remote invocations
  *
@@ -153,10 +174,6 @@ struct fastrpc_msg {
 
 /**
  * struct qda_msg - FastRPC message with kernel-internal bookkeeping
- *
- * The wire-format portion is kept in the embedded @fastrpc member (must
- * be first) so that &qda_msg->fastrpc can be passed directly to
- * rpmsg_send() without a copy.
  */
 struct qda_msg {
 	/**
@@ -245,7 +262,7 @@ struct fastrpc_invoke_context {
 	struct qda_gem_obj *msg_gem_obj;
 	/** @file_priv: DRM file private data */
 	struct drm_file *file_priv;
-	/** @init_mem_gem_obj: GEM object for protection domain init memory */
+	/** @init_mem_gem_obj: GEM object for PD initialization memory */
 	struct qda_gem_obj *init_mem_gem_obj;
 	/** @req: Pointer to kernel-internal request buffer */
 	void *req;
@@ -256,11 +273,23 @@ struct fastrpc_invoke_context {
 };
 
 /* Remote Method ID table - identifies initialization and control operations */
+#define FASTRPC_RMID_INIT_RELEASE	1	/* Release DSP process */
+#define FASTRPC_RMID_INIT_CREATE	6	/* Create DSP process */
+#define FASTRPC_RMID_INIT_CREATE_ATTR	7	/* Create DSP process with attributes */
 #define FASTRPC_RMID_INVOKE_DYNAMIC	0xFFFFFFFF	/* Dynamic method invocation */
 
 /* Common handle for initialization operations */
 #define FASTRPC_INIT_HANDLE		0x1
 
+/* Protection Domain (PD) identifiers */
+#define QDA_ROOT_PD		(0)
+#define QDA_USER_PD		(1)
+
+/* Number of arguments for process creation */
+#define FASTRPC_CREATE_PROCESS_NARGS	6
+/* Maximum initialization file size (4 MB) */
+#define FASTRPC_INIT_FILELEN_MAX	(4 * 1024 * 1024)
+
 void qda_fastrpc_context_free(struct kref *ref);
 struct fastrpc_invoke_context *qda_fastrpc_context_alloc(void);
 int qda_fastrpc_prepare_args(struct fastrpc_invoke_context *ctx, char __user *argp);
diff --git a/drivers/accel/qda/qda_ioctl.c b/drivers/accel/qda/qda_ioctl.c
index c81268c20b04..33f0a798ad13 100644
--- a/drivers/accel/qda/qda_ioctl.c
+++ b/drivers/accel/qda/qda_ioctl.c
@@ -109,6 +109,7 @@ static int fastrpc_invoke(int type, struct drm_device *dev, void *data,
 	struct drm_gem_object *gem_obj;
 	int err;
 	size_t hdr_size;
+	size_t initmem_size = FASTRPC_INIT_FILELEN_MAX;
 
 	ctx = qda_fastrpc_context_alloc();
 	if (IS_ERR(ctx))
@@ -124,6 +125,27 @@ static int fastrpc_invoke(int type, struct drm_device *dev, void *data,
 	ctx->file_priv = file_priv;
 	ctx->remote_session_id = qda_file_priv->remote_session_id;
 
+	if (type == FASTRPC_RMID_INIT_CREATE) {
+		struct drm_gem_object *initmem_gem_obj;
+
+		if (qda_file_priv->init_mem_gem_obj) {
+			drm_gem_object_put(&qda_file_priv->init_mem_gem_obj->base);
+			qda_file_priv->init_mem_gem_obj = NULL;
+		}
+
+		initmem_gem_obj = qda_gem_create_object(dev, qdev->iommu_mgr,
+							initmem_size, file_priv);
+		if (IS_ERR(initmem_gem_obj)) {
+			err = PTR_ERR(initmem_gem_obj);
+			goto err_context_free;
+		}
+
+		ctx->init_mem_gem_obj = to_qda_gem_obj(initmem_gem_obj);
+		qda_file_priv->init_mem_gem_obj = ctx->init_mem_gem_obj;
+	} else if (type == FASTRPC_RMID_INIT_RELEASE) {
+		ctx->init_mem_gem_obj = qda_file_priv->init_mem_gem_obj;
+	}
+
 	err = qda_fastrpc_prepare_args(ctx, (char __user *)data);
 	if (err)
 		goto err_context_free;
@@ -161,11 +183,41 @@ static int fastrpc_invoke(int type, struct drm_device *dev, void *data,
 	return 0;
 
 err_context_free:
+	if (type == FASTRPC_RMID_INIT_RELEASE && !err && qda_file_priv->init_mem_gem_obj) {
+		drm_gem_object_put(&qda_file_priv->init_mem_gem_obj->base);
+		qda_file_priv->init_mem_gem_obj = NULL;
+	}
+
 	fastrpc_context_put_id(ctx, qdev);
 	kref_put(&ctx->refcount, qda_fastrpc_context_free);
 	return err;
 }
 
+/**
+ * qda_ioctl_init_create() - Create a DSP process
+ * @dev: DRM device structure
+ * @data: User-space data (struct drm_qda_init_create)
+ * @file_priv: DRM file private data
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_ioctl_init_create(struct drm_device *dev, void *data, struct drm_file *file_priv)
+{
+	return fastrpc_invoke(FASTRPC_RMID_INIT_CREATE, dev, data, file_priv);
+}
+
+/**
+ * qda_release_dsp_process() - Release DSP process resources for a file
+ * @qdev: QDA device structure
+ * @file_priv: DRM file private data
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_release_dsp_process(struct qda_dev *qdev, struct drm_file *file_priv)
+{
+	return fastrpc_invoke(FASTRPC_RMID_INIT_RELEASE, &qdev->drm_dev, NULL, file_priv);
+}
+
 /**
  * qda_ioctl_invoke() - Perform a dynamic FastRPC method invocation
  * @dev: DRM device structure
diff --git a/drivers/accel/qda/qda_ioctl.h b/drivers/accel/qda/qda_ioctl.h
index 3bb9cfd98370..192565434363 100644
--- a/drivers/accel/qda/qda_ioctl.h
+++ b/drivers/accel/qda/qda_ioctl.h
@@ -9,6 +9,7 @@
 #include "qda_drv.h"
 
 int qda_ioctl_query(struct drm_device *dev, void *data, struct drm_file *file_priv);
+int qda_ioctl_init_create(struct drm_device *dev, void *data, struct drm_file *file_priv);
 int qda_ioctl_gem_create(struct drm_device *dev, void *data, struct drm_file *file_priv);
 int qda_ioctl_gem_mmap_offset(struct drm_device *dev, void *data, struct drm_file *file_priv);
 int qda_ioctl_invoke(struct drm_device *dev, void *data, struct drm_file *file_priv);
diff --git a/include/uapi/drm/qda_accel.h b/include/uapi/drm/qda_accel.h
index 72512213741f..711e2523a570 100644
--- a/include/uapi/drm/qda_accel.h
+++ b/include/uapi/drm/qda_accel.h
@@ -21,8 +21,9 @@ extern "C" {
 #define DRM_QDA_QUERY		0x00
 #define DRM_QDA_GEM_CREATE		0x01
 #define DRM_QDA_GEM_MMAP_OFFSET	0x02
-/* Command numbers 0x03-0x06 reserved for INIT_ATTACH, INIT_CREATE, MAP, MUNMAP */
-#define DRM_QDA_REMOTE_INVOKE		0x07
+/* Command number 0x03 reserved for INIT_ATTACH; 0x05-0x06 reserved for MAP, MUNMAP */
+#define DRM_QDA_REMOTE_SESSION_CREATE		0x04
+#define DRM_QDA_REMOTE_INVOKE			0x07
 
 /*
  * QDA IOCTL definitions
@@ -37,6 +38,9 @@ extern "C" {
 					  struct drm_qda_gem_create)
 #define DRM_IOCTL_QDA_GEM_MMAP_OFFSET	DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_GEM_MMAP_OFFSET, \
 					  struct drm_qda_gem_mmap_offset)
+#define DRM_IOCTL_QDA_REMOTE_SESSION_CREATE					\
+	DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_REMOTE_SESSION_CREATE,		\
+		 struct drm_qda_init_create)
 #define DRM_IOCTL_QDA_REMOTE_INVOKE	DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_REMOTE_INVOKE, \
 					  struct drm_qda_invoke_args)
 
@@ -99,6 +103,30 @@ struct drm_qda_fastrpc_invoke_args {
 	__u32 attr;
 };
 
+/**
+ * struct drm_qda_init_create - Accelerator process initialization parameters
+ * @filelen: Length of the ELF file in bytes
+ * @filefd: DMA-BUF file descriptor containing the ELF file
+ * @attrs: Process attribute flags
+ * @siglen: Length of signature data in bytes
+ * @file: Pointer to ELF file data if not using filefd
+ *
+ * This structure is used with DRM_IOCTL_QDA_REMOTE_SESSION_CREATE to
+ * initialize a new process on the accelerator. The process code is provided
+ * either via a file descriptor (filefd, typically a GEM object) or a direct
+ * pointer (file). Set file to 0 if using filefd.
+ *
+ * The attrs field contains bit flags for debug mode, privileged execution,
+ * and other process attributes.
+ */
+struct drm_qda_init_create {
+	__u32 filelen;
+	__s32 filefd;
+	__u32 attrs;
+	__u32 siglen;
+	__u64 file;
+};
+
 /**
  * struct drm_qda_invoke_args - Dynamic FastRPC invocation parameters
  * @handle: Remote handle to invoke on the DSP

-- 
2.34.1




* [PATCH 11/15] accel/qda: Add PRIME DMA-BUF import support
From: Ekansh Gupta via B4 Relay @ 2026-05-19  6:16 UTC (permalink / raw)
  To: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
	Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
	Christian König
  Cc: Bharath Kumar, Chenna Kesava Raju, srini, dmitry.baryshkov,
	andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
	linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig,
	Ekansh Gupta
In-Reply-To: <20260519-qda-series-v1-0-b2d984c297f8@oss.qualcomm.com>

From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>

Allow user-space to import DMA-BUF file descriptors from other
subsystems (GPU, camera, video) into the QDA driver via the standard
DRM PRIME interface.

qda_prime.c
  Implements qda_gem_prime_import(), which is set as the driver's
  .gem_prime_import callback. On import it:
  1. Short-circuits self-import: if the dma_buf was exported by this
     device and is not itself an import, the existing GEM object is
     returned with an incremented reference count.
  2. Attaches to the dma_buf and maps it with DMA_BIDIRECTIONAL via
     dma_buf_map_attachment_unlocked(), obtaining an sg_table whose
     DMA addresses are IOMMU virtual addresses in the CB device's
     address space.
  3. Calls qda_memory_manager_alloc() to record the IOMMU mapping and
     encode the SID in the upper 32 bits of the DMA address, matching
     the convention used for natively allocated buffers.
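
  The SID encoding in step 3 can be sketched in user-space C as below.
  This is an illustration only: the model_* helpers are hypothetical and
  assume the IOMMU virtual address fits in the lower 32 bits, which the
  commit message implies but the code here does not show.

```c
#include <stdint.h>

/* Place the stream ID in bits 63:32 and keep the IOVA in bits 31:0. */
static inline uint64_t model_encode_sid(uint64_t iova, uint32_t sid)
{
	return (iova & 0xffffffffULL) | ((uint64_t)sid << 32);
}

/* Recover the stream ID from an encoded address. */
static inline uint32_t model_sid_of(uint64_t encoded)
{
	return (uint32_t)(encoded >> 32);
}

/* Recover the IOVA from an encoded address. */
static inline uint64_t model_iova_of(uint64_t encoded)
{
	return encoded & 0xffffffffULL;
}
```

  With an IOVA of 0x80001000 and a SID of 5 the encoded value is
  0x0000000580001000.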

  qda_prime_fd_to_handle() wraps drm_gem_prime_fd_to_handle() under
  qdev->import_lock, storing the calling file_priv in
  qdev->current_import_file_priv so that qda_gem_prime_import() can
  retrieve it (the .gem_prime_import callback does not receive
  file_priv directly).
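
  The hand-off through qdev->current_import_file_priv follows a common
  pattern: stash the caller's context in a device field under a lock,
  invoke the callback that lacks the argument, then clear the field.
  The user-space model below uses a pthread mutex in place of the kernel
  mutex; all model_* names are hypothetical and the callback body is a
  stand-in, not the driver's import logic.

```c
#include <pthread.h>
#include <stddef.h>

struct model_file {
	int id;
};

struct model_dev {
	pthread_mutex_t import_lock;
	struct model_file *current_import_file_priv;
};

/*
 * Stand-in for the .gem_prime_import callback: it receives only the
 * device, so it reads the caller's file context from the device field.
 */
static inline int model_import_cb(struct model_dev *dev)
{
	struct model_file *file = dev->current_import_file_priv;

	return file ? file->id : -1;
}

/*
 * Stand-in for the fd-to-handle wrapper: publish the file context under
 * the lock, run the callback, and clear the field before unlocking so a
 * stale pointer is never left behind.
 */
static inline int model_fd_to_handle(struct model_dev *dev,
				     struct model_file *file)
{
	int ret;

	pthread_mutex_lock(&dev->import_lock);
	dev->current_import_file_priv = file;
	ret = model_import_cb(dev);
	dev->current_import_file_priv = NULL;
	pthread_mutex_unlock(&dev->import_lock);

	return ret;
}
```

  The cost of this approach is that imports on one device are serialized
  by the lock for the full duration of the callback.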

qda_gem.c
  qda_gem_free_object() is extended to handle the imported-buffer
  teardown path: unmap the sg_table, detach from the dma_buf, and
  release the dma_buf reference.
  qda_gem_mmap_obj() rejects mmap requests on imported objects.

qda_memory_manager.c
  qda_memory_manager_map_imported() records the IOMMU-mapped DMA
  address from the first sg entry (the IOMMU maps the buffer as a
  contiguous range) and encodes the SID prefix.
  qda_memory_manager_free() skips the DMA free path for imported
  buffers since the memory is owned by the exporter.

Assisted-by: Claude:claude-4-6-sonnet
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
---
 drivers/accel/qda/Makefile             |   1 +
 drivers/accel/qda/qda_drv.c            |  12 ++-
 drivers/accel/qda/qda_drv.h            |   4 +
 drivers/accel/qda/qda_gem.c            |  25 ++++-
 drivers/accel/qda/qda_gem.h            |   8 ++
 drivers/accel/qda/qda_memory_manager.c |  47 ++++++++-
 drivers/accel/qda/qda_prime.c          | 184 +++++++++++++++++++++++++++++++++
 drivers/accel/qda/qda_prime.h          |  18 ++++
 8 files changed, 295 insertions(+), 4 deletions(-)

diff --git a/drivers/accel/qda/Makefile b/drivers/accel/qda/Makefile
index a46ddceecfc5..fb092e56d7f3 100644
--- a/drivers/accel/qda/Makefile
+++ b/drivers/accel/qda/Makefile
@@ -12,6 +12,7 @@ qda-y := \
 	qda_ioctl.o \
 	qda_memory_dma.o \
 	qda_memory_manager.o \
+	qda_prime.o \
 	qda_rpmsg.o
 
 obj-$(CONFIG_DRM_ACCEL_QDA_COMPUTE_BUS) += qda_compute_bus.o
diff --git a/drivers/accel/qda/qda_drv.c b/drivers/accel/qda/qda_drv.c
index c9b9e56dcb28..ef8bd573b836 100644
--- a/drivers/accel/qda/qda_drv.c
+++ b/drivers/accel/qda/qda_drv.c
@@ -7,10 +7,12 @@
 #include <drm/drm_file.h>
 #include <drm/drm_gem.h>
 #include <drm/drm_ioctl.h>
+#include <drm/drm_prime.h>
 #include <drm/drm_print.h>
 #include <drm/qda_accel.h>
 
 #include "qda_drv.h"
+#include "qda_prime.h"
 #include "qda_ioctl.h"
 #include "qda_rpmsg.h"
 
@@ -64,6 +66,8 @@ static const struct drm_driver qda_drm_driver = {
 	.postclose = qda_postclose,
 	.ioctls = qda_ioctls,
 	.num_ioctls = ARRAY_SIZE(qda_ioctls),
+	.gem_prime_import = qda_gem_prime_import,
+	.prime_fd_to_handle = qda_prime_fd_to_handle,
 	.name = QDA_DRIVER_NAME,
 	.desc = "Qualcomm DSP Accelerator Driver",
 };
@@ -100,6 +104,7 @@ static int init_memory_manager(struct qda_dev *qdev)
 
 void qda_deinit_device(struct qda_dev *qdev)
 {
+	mutex_destroy(&qdev->import_lock);
 	cleanup_memory_manager(qdev);
 }
 
@@ -107,9 +112,14 @@ int qda_init_device(struct qda_dev *qdev)
 {
 	int ret;
 
+	mutex_init(&qdev->import_lock);
+	qdev->current_import_file_priv = NULL;
+
 	ret = init_memory_manager(qdev);
-	if (ret)
+	if (ret) {
 		drm_err(&qdev->drm_dev, "Failed to initialize memory manager: %d\n", ret);
+		mutex_destroy(&qdev->import_lock);
+	}
 
 	return ret;
 }
diff --git a/drivers/accel/qda/qda_drv.h b/drivers/accel/qda/qda_drv.h
index 8a7d647ac8fc..96ce4135e2d9 100644
--- a/drivers/accel/qda/qda_drv.h
+++ b/drivers/accel/qda/qda_drv.h
@@ -47,6 +47,10 @@ struct qda_dev {
 	struct list_head cb_devs;
 	/** @iommu_mgr: IOMMU/memory manager instance */
 	struct qda_memory_manager *iommu_mgr;
+	/** @import_lock: Lock protecting prime import context */
+	struct mutex import_lock;
+	/** @current_import_file_priv: Current file_priv during prime import */
+	struct drm_file *current_import_file_priv;
 	/** @dsp_name: Name of the DSP domain (e.g. "cdsp", "adsp") */
 	const char *dsp_name;
 };
diff --git a/drivers/accel/qda/qda_gem.c b/drivers/accel/qda/qda_gem.c
index 568b3c2e64b7..9e1ac7582d0c 100644
--- a/drivers/accel/qda/qda_gem.c
+++ b/drivers/accel/qda/qda_gem.c
@@ -9,6 +9,7 @@
 #include "qda_gem.h"
 #include "qda_memory_manager.h"
 #include "qda_memory_dma.h"
+#include "qda_prime.h"
 
 static void setup_vma_flags(struct vm_area_struct *vma)
 {
@@ -25,8 +26,20 @@ void qda_gem_free_object(struct drm_gem_object *gem_obj)
 	struct qda_gem_obj *qda_gem_obj = to_qda_gem_obj(gem_obj);
 	struct qda_dev *qdev = qda_dev_from_drm(gem_obj->dev);
 
-	if (qda_gem_obj->virt && qdev->iommu_mgr)
-		qda_memory_manager_free(qdev->iommu_mgr, qda_gem_obj);
+	if (qda_gem_obj->is_imported) {
+		if (qda_gem_obj->attachment && qda_gem_obj->sgt)
+			dma_buf_unmap_attachment_unlocked(qda_gem_obj->attachment,
+							  qda_gem_obj->sgt, DMA_BIDIRECTIONAL);
+		if (qda_gem_obj->attachment)
+			dma_buf_detach(qda_gem_obj->dma_buf, qda_gem_obj->attachment);
+		if (qda_gem_obj->dma_buf)
+			dma_buf_put(qda_gem_obj->dma_buf);
+		if (qda_gem_obj->iommu_dev && qdev->iommu_mgr)
+			qda_memory_manager_free(qdev->iommu_mgr, qda_gem_obj);
+	} else {
+		if (qda_gem_obj->virt && qdev->iommu_mgr)
+			qda_memory_manager_free(qdev->iommu_mgr, qda_gem_obj);
+	}
 
 	drm_gem_object_release(gem_obj);
 	kfree(qda_gem_obj);
@@ -44,6 +57,10 @@ int qda_gem_mmap_obj(struct drm_gem_object *drm_obj, struct vm_area_struct *vma)
 	struct qda_gem_obj *qda_gem_obj = to_qda_gem_obj(drm_obj);
 	int ret;
 
+	/* Imported dma-buf objects must be mmap'd through the exporter, not the importer */
+	if (qda_gem_obj->is_imported)
+		return -EINVAL;
+
 	/* Reset vm_pgoff for DMA mmap */
 	vma->vm_pgoff = 0;
 
@@ -143,6 +160,10 @@ struct drm_gem_object *qda_gem_create_object(struct drm_device *drm_dev,
 	qda_gem_obj = qda_gem_alloc_object(drm_dev, aligned_size);
 	if (IS_ERR(qda_gem_obj))
 		return ERR_CAST(qda_gem_obj);
+	qda_gem_obj->is_imported = false;
+	qda_gem_obj->dma_buf = NULL;
+	qda_gem_obj->attachment = NULL;
+	qda_gem_obj->sgt = NULL;
 
 	ret = qda_memory_manager_alloc(iommu_mgr, qda_gem_obj, file_priv);
 	if (ret) {
diff --git a/drivers/accel/qda/qda_gem.h b/drivers/accel/qda/qda_gem.h
index bb18f8155aa4..0878f57715f6 100644
--- a/drivers/accel/qda/qda_gem.h
+++ b/drivers/accel/qda/qda_gem.h
@@ -22,12 +22,20 @@ struct qda_gem_obj {
 	struct drm_gem_object base;
 	/** @iommu_dev: IOMMU context bank device that performed the allocation */
 	struct qda_iommu_device *iommu_dev;
+	/** @dma_buf: Reference to imported dma_buf */
+	struct dma_buf *dma_buf;
+	/** @attachment: DMA buf attachment */
+	struct dma_buf_attachment *attachment;
+	/** @sgt: Scatter-gather table */
+	struct sg_table *sgt;
 	/** @virt: Kernel virtual address of the allocated DMA memory */
 	void *virt;
 	/** @dma_addr: DMA address (with SID encoded in upper 32 bits) */
 	dma_addr_t dma_addr;
 	/** @size: Size of the buffer in bytes */
 	size_t size;
+	/** @is_imported: True if buffer is imported, false if allocated */
+	bool is_imported;
 };
 
 /**
diff --git a/drivers/accel/qda/qda_memory_manager.c b/drivers/accel/qda/qda_memory_manager.c
index 82111275f420..d2aa0e0e65f5 100644
--- a/drivers/accel/qda/qda_memory_manager.c
+++ b/drivers/accel/qda/qda_memory_manager.c
@@ -202,6 +202,41 @@ static struct qda_iommu_device *get_or_assign_iommu_device(struct qda_memory_man
 	return NULL;
 }
 
+static int qda_memory_manager_map_imported(struct qda_memory_manager *mem_mgr,
+					   struct qda_gem_obj *gem_obj,
+					   struct qda_iommu_device *iommu_dev)
+{
+	struct scatterlist *sg;
+	dma_addr_t dma_addr;
+
+	if (!gem_obj->is_imported || !gem_obj->sgt || !iommu_dev) {
+		drm_err(gem_obj->base.dev, "Invalid parameters for imported buffer mapping\n");
+		return -EINVAL;
+	}
+
+	sg = gem_obj->sgt->sgl;
+	if (!sg) {
+		drm_err(gem_obj->base.dev, "Invalid scatter-gather list for imported buffer\n");
+		return -EINVAL;
+	}
+
+	gem_obj->iommu_dev = iommu_dev;
+
+	/*
+	 * After dma_buf_map_attachment_unlocked(), sg_dma_address() returns the
+	 * IOMMU virtual address, not the physical address. The IOMMU maps the
+	 * entire buffer as a contiguous range in the IOMMU address space even if
+	 * the underlying physical memory is non-contiguous. Therefore the first
+	 * sg entry's DMA address is the start of the complete contiguous
+	 * IOMMU-mapped range and is sufficient to describe the buffer to the DSP.
+	 */
+	dma_addr = sg_dma_address(sg);
+	dma_addr += ((u64)iommu_dev->sid << 32);
+	gem_obj->dma_addr = dma_addr;
+
+	return 0;
+}
+
 /**
  * qda_memory_manager_alloc() - Allocate memory for a GEM object
  * @mem_mgr: Pointer to memory manager
@@ -237,7 +272,11 @@ int qda_memory_manager_alloc(struct qda_memory_manager *mem_mgr, struct qda_gem_
 		return -ENOMEM;
 	}
 
-	ret = qda_dma_alloc(selected_dev, gem_obj, size);
+	if (gem_obj->is_imported)
+		ret = qda_memory_manager_map_imported(mem_mgr, gem_obj, selected_dev);
+	else
+		ret = qda_dma_alloc(selected_dev, gem_obj, size);
+
 	if (ret) {
 		drm_err(gem_obj->base.dev, "Allocation failed: size=%zu, device_id=%u, ret=%d\n",
 			size, selected_dev->id, ret);
@@ -262,6 +301,12 @@ void qda_memory_manager_free(struct qda_memory_manager *mem_mgr, struct qda_gem_
 		return;
 	}
 
+	if (gem_obj->is_imported) {
+		drm_dbg_driver(gem_obj->base.dev,
+			       "Freed imported buffer tracking (no DMA free needed)\n");
+		return;
+	}
+
 	qda_dma_free(gem_obj);
 }
 
diff --git a/drivers/accel/qda/qda_prime.c b/drivers/accel/qda/qda_prime.c
new file mode 100644
index 000000000000..acb0ac8c40fd
--- /dev/null
+++ b/drivers/accel/qda/qda_prime.c
@@ -0,0 +1,184 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+#include <drm/drm_gem.h>
+#include <drm/drm_prime.h>
+#include <drm/drm_print.h>
+#include <linux/slab.h>
+#include <linux/dma-mapping.h>
+#include "qda_drv.h"
+#include "qda_gem.h"
+#include "qda_prime.h"
+#include "qda_memory_manager.h"
+
+static struct drm_gem_object *check_own_buffer(struct drm_device *dev, struct dma_buf *dma_buf)
+{
+	struct drm_gem_object *existing_gem;
+
+	/* Only safe to access priv if this dma-buf was exported by this device */
+	if (!drm_gem_is_prime_exported_dma_buf(dev, dma_buf))
+		return NULL;
+
+	existing_gem = dma_buf->priv;
+	if (existing_gem->dev != dev)
+		return NULL;
+
+	if (to_qda_gem_obj(existing_gem)->is_imported)
+		return NULL;
+
+	drm_gem_object_get(existing_gem);
+	return existing_gem;
+}
+
+static struct qda_iommu_device *get_iommu_device_for_import(struct qda_dev *qdev,
+							    struct drm_file **file_priv_out)
+{
+	struct drm_file *file_priv;
+	struct qda_file_priv *qda_file_priv;
+	struct qda_iommu_device *iommu_dev = NULL;
+	int ret;
+
+	file_priv = qdev->current_import_file_priv;
+	*file_priv_out = file_priv;
+
+	if (!file_priv || !file_priv->driver_priv)
+		return NULL;
+
+	qda_file_priv = (struct qda_file_priv *)file_priv->driver_priv;
+	iommu_dev = qda_file_priv->assigned_iommu_dev;
+
+	if (!iommu_dev) {
+		ret = qda_memory_manager_assign_device(qdev->iommu_mgr, file_priv);
+		if (ret) {
+			drm_err(&qdev->drm_dev, "Failed to assign IOMMU device: %d\n", ret);
+			return NULL;
+		}
+
+		iommu_dev = qda_file_priv->assigned_iommu_dev;
+	}
+
+	return iommu_dev;
+}
+
+static int setup_dma_buf_mapping(struct qda_gem_obj *qda_gem_obj, struct dma_buf *dma_buf,
+				 struct device *attach_dev, struct qda_dev *qdev)
+{
+	struct dma_buf_attachment *attachment;
+	struct sg_table *sgt;
+	int ret;
+
+	attachment = dma_buf_attach(dma_buf, attach_dev);
+	if (IS_ERR(attachment)) {
+		ret = PTR_ERR(attachment);
+		drm_err(&qdev->drm_dev, "Failed to attach dma_buf: %d\n", ret);
+		return ret;
+	}
+	qda_gem_obj->attachment = attachment;
+
+	sgt = dma_buf_map_attachment_unlocked(attachment, DMA_BIDIRECTIONAL);
+	if (IS_ERR(sgt)) {
+		ret = PTR_ERR(sgt);
+		drm_err(&qdev->drm_dev, "Failed to map dma_buf attachment: %d\n", ret);
+		dma_buf_detach(dma_buf, attachment);
+		return ret;
+	}
+	qda_gem_obj->sgt = sgt;
+
+	return 0;
+}
+
+/**
+ * qda_gem_prime_import() - Import a DMA-BUF as a GEM object
+ * @dev: DRM device structure
+ * @dma_buf: DMA-BUF to import
+ *
+ * Return: Pointer to the imported GEM object on success, ERR_PTR on failure
+ */
+struct drm_gem_object *qda_gem_prime_import(struct drm_device *dev, struct dma_buf *dma_buf)
+{
+	struct qda_dev *qdev = qda_dev_from_drm(dev);
+	struct qda_gem_obj *qda_gem_obj;
+	struct drm_file *file_priv;
+	struct qda_iommu_device *iommu_dev;
+	struct drm_gem_object *existing_gem;
+	size_t aligned_size;
+	int ret;
+
+	if (!qdev->iommu_mgr) {
+		drm_err(dev, "Invalid iommu_mgr\n");
+		return ERR_PTR(-ENODEV);
+	}
+
+	existing_gem = check_own_buffer(dev, dma_buf);
+	if (existing_gem)
+		return existing_gem;
+
+	iommu_dev = get_iommu_device_for_import(qdev, &file_priv);
+	if (!iommu_dev || !iommu_dev->dev) {
+		drm_err(dev, "No IOMMU device assigned for prime import\n");
+		return ERR_PTR(-ENODEV);
+	}
+
+	drm_dbg_driver(dev, "Using IOMMU device %u for prime import\n", iommu_dev->id);
+
+	aligned_size = PAGE_ALIGN(dma_buf->size);
+	qda_gem_obj = qda_gem_alloc_object(dev, aligned_size);
+	if (IS_ERR(qda_gem_obj))
+		return ERR_CAST(qda_gem_obj);
+
+	qda_gem_obj->is_imported = true;
+	qda_gem_obj->dma_buf = dma_buf;
+	qda_gem_obj->virt = NULL;
+	qda_gem_obj->iommu_dev = iommu_dev;
+
+	get_dma_buf(dma_buf);
+
+	ret = setup_dma_buf_mapping(qda_gem_obj, dma_buf, iommu_dev->dev, qdev);
+	if (ret)
+		goto err_put_dma_buf;
+
+	ret = qda_memory_manager_alloc(qdev->iommu_mgr, qda_gem_obj, file_priv);
+	if (ret) {
+		drm_err(dev, "Failed to allocate IOMMU mapping: %d\n", ret);
+		goto err_unmap;
+	}
+
+	drm_dbg_driver(dev, "Prime import completed successfully size=%zu\n", aligned_size);
+	return &qda_gem_obj->base;
+
+err_unmap:
+	dma_buf_unmap_attachment_unlocked(qda_gem_obj->attachment,
+					  qda_gem_obj->sgt, DMA_BIDIRECTIONAL);
+	dma_buf_detach(dma_buf, qda_gem_obj->attachment);
+err_put_dma_buf:
+	dma_buf_put(dma_buf);
+	qda_gem_cleanup_object(qda_gem_obj);
+	return ERR_PTR(ret);
+}
+
+/**
+ * qda_prime_fd_to_handle() - Convert a PRIME fd to a GEM handle
+ * @dev: DRM device structure
+ * @file_priv: DRM file private data
+ * @prime_fd: File descriptor of the PRIME buffer
+ * @handle: Output GEM handle
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_prime_fd_to_handle(struct drm_device *dev, struct drm_file *file_priv,
+			   int prime_fd, u32 *handle)
+{
+	struct qda_dev *qdev = qda_dev_from_drm(dev);
+	int ret;
+
+	mutex_lock(&qdev->import_lock);
+	qdev->current_import_file_priv = file_priv;
+
+	ret = drm_gem_prime_fd_to_handle(dev, file_priv, prime_fd, handle);
+
+	qdev->current_import_file_priv = NULL;
+	mutex_unlock(&qdev->import_lock);
+
+	return ret;
+}
+
+MODULE_IMPORT_NS("DMA_BUF");
diff --git a/drivers/accel/qda/qda_prime.h b/drivers/accel/qda/qda_prime.h
new file mode 100644
index 000000000000..9b3850d54fa7
--- /dev/null
+++ b/drivers/accel/qda/qda_prime.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ */
+
+#ifndef __QDA_PRIME_H__
+#define __QDA_PRIME_H__
+
+#include <drm/drm_device.h>
+#include <drm/drm_file.h>
+#include <drm/drm_gem.h>
+#include <linux/dma-buf.h>
+
+struct drm_gem_object *qda_gem_prime_import(struct drm_device *dev, struct dma_buf *dma_buf);
+int qda_prime_fd_to_handle(struct drm_device *dev, struct drm_file *file_priv,
+			   int prime_fd, u32 *handle);
+
+#endif /* __QDA_PRIME_H__ */

-- 
2.34.1




* [PATCH 12/15] accel/qda: Add FastRPC invocation support
From: Ekansh Gupta via B4 Relay @ 2026-05-19  6:16 UTC (permalink / raw)
  To: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
	Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
	Christian König
  Cc: Bharath Kumar, Chenna Kesava Raju, srini, dmitry.baryshkov,
	andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
	linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig,
	Ekansh Gupta
In-Reply-To: <20260519-qda-series-v1-0-b2d984c297f8@oss.qualcomm.com>

From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>

Implement the FastRPC remote procedure call path, allowing user-space
to invoke methods on the DSP via DRM_IOCTL_QDA_REMOTE_INVOKE.

qda_fastrpc.c / qda_fastrpc.h
  Implements the FastRPC protocol layer: argument marshalling
  (qda_fastrpc_invoke_pack), response unmarshalling
  (qda_fastrpc_invoke_unpack), and invocation context lifecycle
  management. Each invocation allocates a fastrpc_invoke_context
  which tracks buffer descriptors, GEM objects, and the completion
  used to synchronise with the DSP response.

  Buffer arguments are handled in three ways:
  - DMA-BUF fd: imported via PRIME, IOMMU-mapped dma_addr used
  - Direct (inline): copied into the GEM-backed message buffer
  - DMA handle: fd forwarded to DSP, physical page descriptor computed
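
  The three cases above can be sketched in plain user-space C. This is
  only an illustration of the dispatch rule as it appears in
  fastrpc_get_args() in this patch: buffer arguments with fd > 0 are
  DMA-BUF backed, the remaining buffer arguments are inline, and
  arguments at or above nbufs are DMA handles. The names
  enum qda_arg_kind, classify_arg() and page_span() are hypothetical,
  and the 4 KiB page size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Hypothetical labels for the three buffer-argument cases. */
enum qda_arg_kind {
	QDA_ARG_DMABUF_FD,	/* imported via PRIME, dma_addr used */
	QDA_ARG_DIRECT,		/* copied into the message buffer */
	QDA_ARG_DMA_HANDLE,	/* fd forwarded to the DSP */
};

/*
 * idx is the scalar index, nbufs the number of in/out buffers.
 * In the patch, a handle argument with fd <= 0 is passed through as a
 * plain pointer/length pair; that sub-case is not modelled here.
 */
static enum qda_arg_kind classify_arg(unsigned int idx, unsigned int nbufs, int fd)
{
	if (idx >= nbufs)
		return QDA_ARG_DMA_HANDLE;
	return fd > 0 ? QDA_ARG_DMABUF_FD : QDA_ARG_DIRECT;
}

/* Bytes covered by the pages that [ptr, ptr + len) touches; len must be > 0. */
static uint64_t page_span(uint64_t ptr, uint64_t len)
{
	uint64_t first, last;

	assert(len > 0);
	first = ptr >> SKETCH_PAGE_SHIFT;
	last = (ptr + len - 1) >> SKETCH_PAGE_SHIFT;
	return (last - first + 1) * SKETCH_PAGE_SIZE;
}
```

  page_span() mirrors the arithmetic of calculate_page_aligned_size():
  a 32-byte buffer that straddles a page boundary is described to the
  DSP as two full pages.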

qda_rpmsg.c
  Implements qda_rpmsg_send_msg() which sends the wire-format
  fastrpc_msg (embedded as the first member of qda_msg) directly
  via rpmsg_send(), and qda_rpmsg_wait_for_rsp() which blocks on
  the context completion. The RPMsg callback dispatches responses
  to waiting contexts via the ctx_xa XArray.
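
  As a rough user-space analogy for the ID-based dispatch described
  above, the sketch below registers a context under the lowest free ID
  (starting at 1, as XA_FLAGS_ALLOC1 does for ctx_xa) and later
  completes it by ID. It is not the driver code: the fixed-size table
  stands in for the XArray, the done flag stands in for the completion,
  and every sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CTX 8	/* assumption: tiny fixed capacity */

/* Stand-in for an in-flight invocation context. */
struct sketch_ctx {
	int retval;
	int done;	/* stand-in for the completion */
};

/* Slot 0 is never used so that valid IDs start at 1. */
struct sketch_table {
	struct sketch_ctx *slot[SKETCH_MAX_CTX + 1];
};

/* Store ctx under the lowest free ID; returns the ID, or 0 if full. */
static unsigned int sketch_ctx_register(struct sketch_table *t, struct sketch_ctx *ctx)
{
	assert(ctx != NULL);
	for (unsigned int id = 1; id <= SKETCH_MAX_CTX; id++) {
		if (!t->slot[id]) {
			t->slot[id] = ctx;
			return id;
		}
	}
	return 0;
}

/* Response path: look the context up by ID and mark it complete. */
static int sketch_dispatch(struct sketch_table *t, unsigned int id, int retval)
{
	struct sketch_ctx *ctx;

	if (id == 0 || id > SKETCH_MAX_CTX)
		return -1;
	ctx = t->slot[id];
	if (!ctx)
		return -1;	/* unknown or already released ID */
	ctx->retval = retval;
	ctx->done = 1;
	return 0;
}
```

  A response carrying an ID with no registered context is rejected
  rather than dereferenced, which is the property the lookup step is
  there to provide.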

qda_ioctl.c
  qda_ioctl_invoke() drives the full invocation lifecycle:
  allocate context → assign XArray ID → prepare args → allocate
  GEM message buffer → pack → send → wait → unpack → free.

qda_drv.h / qda_drv.c
  qda_dev gains ctx_xa (XArray for in-flight context lookup) and
  remote_session_id_counter (atomic counter for session IDs).
  qda_file_priv gains remote_session_id for per-session tracking.
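
  The per-open session ID scheme can be illustrated with C11 atomics.
  This is a user-space sketch, not the kernel code: next_session_id()
  is a hypothetical helper that mimics the increment-and-return
  behaviour of atomic_inc_return() on a counter that starts at 0.
  Treating 0 as "no session" is an assumption drawn from the pack
  path, which sends 0 for FASTRPC_INIT_HANDLE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Returns the incremented value, so the first caller gets 1. */
static uint32_t next_session_id(atomic_uint *counter)
{
	/* atomic_fetch_add() returns the old value; add 1 to get the new one. */
	uint32_t id = (uint32_t)atomic_fetch_add(counter, 1u) + 1u;

	/* Assumption: 0 means "no session", so a wrapped counter is a bug here. */
	assert(id != 0);
	return id;
}
```

  Two concurrent opens each receive a distinct value because the
  increment and the read happen as one atomic operation.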

include/uapi/drm/qda_accel.h
  Adds DRM_IOCTL_QDA_REMOTE_INVOKE (command 0x07; command numbers
  0x03–0x06 are reserved) and the associated drm_qda_invoke_args
  and drm_qda_fastrpc_invoke_args structures.

Assisted-by: Claude:claude-4-6-sonnet
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
---
 drivers/accel/qda/Makefile      |   1 +
 drivers/accel/qda/qda_drv.c     |  17 ++
 drivers/accel/qda/qda_drv.h     |   8 +
 drivers/accel/qda/qda_fastrpc.c | 597 ++++++++++++++++++++++++++++++++++++++++
 drivers/accel/qda/qda_fastrpc.h | 271 ++++++++++++++++++
 drivers/accel/qda/qda_ioctl.c   | 104 +++++++
 drivers/accel/qda/qda_ioctl.h   |   1 +
 drivers/accel/qda/qda_rpmsg.c   | 136 ++++++++-
 drivers/accel/qda/qda_rpmsg.h   |  17 ++
 include/uapi/drm/qda_accel.h    |  39 +++
 10 files changed, 1189 insertions(+), 2 deletions(-)

diff --git a/drivers/accel/qda/Makefile b/drivers/accel/qda/Makefile
index fb092e56d7f3..2d10420cd1ec 100644
--- a/drivers/accel/qda/Makefile
+++ b/drivers/accel/qda/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_DRM_ACCEL_QDA)	:= qda.o
 qda-y := \
 	qda_cb.o \
 	qda_drv.o \
+	qda_fastrpc.o \
 	qda_gem.o \
 	qda_ioctl.o \
 	qda_memory_dma.o \
diff --git a/drivers/accel/qda/qda_drv.c b/drivers/accel/qda/qda_drv.c
index ef8bd573b836..704c7d3127d2 100644
--- a/drivers/accel/qda/qda_drv.c
+++ b/drivers/accel/qda/qda_drv.c
@@ -26,6 +26,8 @@ static int qda_open(struct drm_device *dev, struct drm_file *file)
 
 	qda_file_priv->pid = current->pid;
 	qda_file_priv->qda_dev = qda_dev_from_drm(dev);
+	qda_file_priv->remote_session_id =
+		atomic_inc_return(&qda_file_priv->qda_dev->remote_session_id_counter);
 	file->driver_priv = qda_file_priv;
 
 	return 0;
@@ -57,6 +59,7 @@ static const struct drm_ioctl_desc qda_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(QDA_QUERY, qda_ioctl_query, 0),
 	DRM_IOCTL_DEF_DRV(QDA_GEM_CREATE, qda_ioctl_gem_create, 0),
 	DRM_IOCTL_DEF_DRV(QDA_GEM_MMAP_OFFSET, qda_ioctl_gem_mmap_offset, 0),
+	DRM_IOCTL_DEF_DRV(QDA_REMOTE_INVOKE, qda_ioctl_invoke, 0),
 };
 
 static const struct drm_driver qda_drm_driver = {
@@ -93,6 +96,17 @@ static void cleanup_memory_manager(struct qda_dev *qdev)
 	}
 }
 
+static void cleanup_device_resources(struct qda_dev *qdev)
+{
+	xa_destroy(&qdev->ctx_xa);
+}
+
+static void init_device_resources(struct qda_dev *qdev)
+{
+	atomic_set(&qdev->remote_session_id_counter, 0);
+	xa_init_flags(&qdev->ctx_xa, XA_FLAGS_ALLOC1);
+}
+
 static int init_memory_manager(struct qda_dev *qdev)
 {
 	qdev->iommu_mgr = kzalloc_obj(*qdev->iommu_mgr);
@@ -106,6 +120,7 @@ void qda_deinit_device(struct qda_dev *qdev)
 {
 	mutex_destroy(&qdev->import_lock);
 	cleanup_memory_manager(qdev);
+	cleanup_device_resources(qdev);
 }
 
 int qda_init_device(struct qda_dev *qdev)
@@ -114,10 +129,12 @@ int qda_init_device(struct qda_dev *qdev)
 
 	mutex_init(&qdev->import_lock);
 	qdev->current_import_file_priv = NULL;
+	init_device_resources(qdev);
 
 	ret = init_memory_manager(qdev);
 	if (ret) {
 		drm_err(&qdev->drm_dev, "Failed to initialize memory manager: %d\n", ret);
+		cleanup_device_resources(qdev);
 		mutex_destroy(&qdev->import_lock);
 	}
 
diff --git a/drivers/accel/qda/qda_drv.h b/drivers/accel/qda/qda_drv.h
index 96ce4135e2d9..420cccff42bf 100644
--- a/drivers/accel/qda/qda_drv.h
+++ b/drivers/accel/qda/qda_drv.h
@@ -6,10 +6,12 @@
 #ifndef __QDA_DRV_H__
 #define __QDA_DRV_H__
 
+#include <linux/atomic.h>
 #include <linux/device.h>
 #include <linux/list.h>
 #include <linux/rpmsg.h>
 #include <linux/types.h>
+#include <linux/xarray.h>
 #include <drm/drm_device.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_file.h>
@@ -28,6 +30,8 @@ struct qda_file_priv {
 	struct qda_iommu_device *assigned_iommu_dev;
 	/** @pid: Process ID for tracking */
 	pid_t pid;
+	/** @remote_session_id: Unique session identifier */
+	u32 remote_session_id;
 };
 
 /**
@@ -51,8 +55,12 @@ struct qda_dev {
 	struct mutex import_lock;
 	/** @current_import_file_priv: Current file_priv during prime import */
 	struct drm_file *current_import_file_priv;
+	/** @ctx_xa: XArray for FastRPC context management */
+	struct xarray ctx_xa;
 	/** @dsp_name: Name of the DSP domain (e.g. "cdsp", "adsp") */
 	const char *dsp_name;
+	/** @remote_session_id_counter: Atomic counter for unique session IDs */
+	atomic_t remote_session_id_counter;
 };
 
 /**
diff --git a/drivers/accel/qda/qda_fastrpc.c b/drivers/accel/qda/qda_fastrpc.c
new file mode 100644
index 000000000000..0ec37175a098
--- /dev/null
+++ b/drivers/accel/qda/qda_fastrpc.c
@@ -0,0 +1,597 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/sort.h>
+#include <linux/completion.h>
+#include <linux/dma-buf.h>
+#include <drm/drm_gem.h>
+#include "qda_fastrpc.h"
+#include "qda_drv.h"
+#include "qda_gem.h"
+#include "qda_memory_manager.h"
+#include "qda_prime.h"
+
+/**
+ * get_gem_obj_from_dmabuf_fd() - Import a DMA-BUF fd and return the GEM object
+ * @ctx:       FastRPC invocation context
+ * @dmabuf_fd: DMA-BUF file descriptor supplied by user space
+ * @gem_obj:   Output GEM object (caller must call drm_gem_object_put() when done)
+ *
+ * Imports the DMA-BUF fd into the QDA device via qda_prime_fd_to_handle()
+ * (which performs IOMMU device assignment for newly imported buffers) and
+ * then looks up the resulting GEM object.  The caller is responsible for
+ * calling drm_gem_object_put() on the returned object.
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+static int get_gem_obj_from_dmabuf_fd(struct fastrpc_invoke_context *ctx,
+				      int dmabuf_fd,
+				      struct drm_gem_object **gem_obj)
+{
+	struct drm_device *dev = ctx->file_priv->minor->dev;
+	u32 handle;
+	int ret;
+
+	ret = qda_prime_fd_to_handle(dev, ctx->file_priv, dmabuf_fd, &handle);
+	if (ret)
+		return ret;
+
+	*gem_obj = drm_gem_object_lookup(ctx->file_priv, handle);
+	if (!*gem_obj)
+		return -ENOENT;
+
+	return 0;
+}
+
+static void setup_pages_from_gem_obj(struct qda_gem_obj *qda_gem_obj,
+				     struct fastrpc_phy_page *pages)
+{
+	pages->addr = qda_gem_obj->dma_addr;
+	pages->size = qda_gem_obj->size;
+}
+
+static u64 calculate_vma_offset(u64 user_ptr)
+{
+	struct vm_area_struct *vma;
+	u64 user_ptr_page_mask = user_ptr & PAGE_MASK;
+	u64 vma_offset = 0;
+
+	mmap_read_lock(current->mm);
+	vma = find_vma(current->mm, user_ptr);
+	if (vma && user_ptr >= vma->vm_start)
+		vma_offset = user_ptr_page_mask - vma->vm_start;
+	mmap_read_unlock(current->mm);
+
+	return vma_offset;
+}
+
+static u64 calculate_page_aligned_size(u64 ptr, u64 len)
+{
+	u64 pg_start = (ptr & PAGE_MASK) >> PAGE_SHIFT;
+	u64 pg_end = ((ptr + len - 1) & PAGE_MASK) >> PAGE_SHIFT;
+	u64 aligned_size = (pg_end - pg_start + 1) * PAGE_SIZE;
+
+	return aligned_size;
+}
+
+static struct fastrpc_invoke_buf *fastrpc_invoke_buf_start(union fastrpc_remote_arg *pra, int len)
+{
+	return (struct fastrpc_invoke_buf *)(&pra[len]);
+}
+
+static struct fastrpc_phy_page *fastrpc_phy_page_start(struct fastrpc_invoke_buf *buf, int len)
+{
+	return (struct fastrpc_phy_page *)(&buf[len]);
+}
+
+static int fastrpc_get_meta_size(struct fastrpc_invoke_context *ctx)
+{
+	int size = 0;
+
+	size = (sizeof(struct fastrpc_remote_buf) +
+		sizeof(struct fastrpc_invoke_buf) +
+		sizeof(struct fastrpc_phy_page)) * ctx->nscalars +
+		sizeof(u64) * FASTRPC_MAX_FDLIST +
+		sizeof(u32) * FASTRPC_MAX_CRCLIST;
+
+	return size;
+}
+
+static u64 fastrpc_get_payload_size(struct fastrpc_invoke_context *ctx, int metalen)
+{
+	u64 size = 0;
+	int oix;
+
+	size = ALIGN(metalen, FASTRPC_ALIGN);
+
+	for (oix = 0; oix < ctx->nbufs; oix++) {
+		int i = ctx->olaps[oix].raix;
+
+		if (ctx->args[i].fd == 0 || ctx->args[i].fd == -1) {
+			if (ctx->olaps[oix].offset == 0)
+				size = ALIGN(size, FASTRPC_ALIGN);
+
+			size += (ctx->olaps[oix].mend - ctx->olaps[oix].mstart);
+		}
+	}
+
+	return size;
+}
+
+/**
+ * qda_fastrpc_context_free() - Free an invocation context
+ * @ref: Reference counter embedded in the context
+ *
+ * Called when the reference count reaches zero; releases all resources
+ * associated with the invocation context.
+ */
+void qda_fastrpc_context_free(struct kref *ref)
+{
+	struct fastrpc_invoke_context *ctx;
+	int i;
+
+	ctx = container_of(ref, struct fastrpc_invoke_context, refcount);
+	if (ctx->gem_objs) {
+		for (i = 0; i < ctx->nscalars; ++i) {
+			if (ctx->gem_objs[i])
+				drm_gem_object_put(ctx->gem_objs[i]);
+		}
+		kfree(ctx->gem_objs);
+	}
+
+	if (ctx->msg_gem_obj)
+		drm_gem_object_put(&ctx->msg_gem_obj->base);
+
+	kfree(ctx->olaps);
+
+	kfree(ctx->args);
+	kfree(ctx->req);
+	kfree(ctx->rsp);
+	kfree(ctx->input_pages);
+	kfree(ctx->inbuf);
+
+	kfree(ctx);
+}
+
+#define CMP(aa, bb) ((aa) == (bb) ? 0 : (aa) < (bb) ? -1 : 1)
+
+static int olaps_cmp(const void *a, const void *b)
+{
+	struct fastrpc_buf_overlap *pa = (struct fastrpc_buf_overlap *)a;
+	struct fastrpc_buf_overlap *pb = (struct fastrpc_buf_overlap *)b;
+	/* sort with lowest starting buffer first */
+	int st = CMP(pa->start, pb->start);
+	/* sort with highest ending buffer first */
+	int ed = CMP(pb->end, pa->end);
+
+	return st == 0 ? ed : st;
+}
+
+static void fastrpc_get_buff_overlaps(struct fastrpc_invoke_context *ctx)
+{
+	u64 max_end = 0;
+	int i;
+
+	for (i = 0; i < ctx->nbufs; ++i) {
+		ctx->olaps[i].start = ctx->args[i].ptr;
+		ctx->olaps[i].end = ctx->olaps[i].start + ctx->args[i].length;
+		ctx->olaps[i].raix = i;
+	}
+
+	sort(ctx->olaps, ctx->nbufs, sizeof(*ctx->olaps), olaps_cmp, NULL);
+
+	for (i = 0; i < ctx->nbufs; ++i) {
+		if (ctx->olaps[i].start < max_end) {
+			ctx->olaps[i].mstart = max_end;
+			ctx->olaps[i].mend = ctx->olaps[i].end;
+			ctx->olaps[i].offset = max_end - ctx->olaps[i].start;
+
+			if (ctx->olaps[i].end > max_end) {
+				max_end = ctx->olaps[i].end;
+			} else {
+				ctx->olaps[i].mend = 0;
+				ctx->olaps[i].mstart = 0;
+			}
+		} else {
+			ctx->olaps[i].mend = ctx->olaps[i].end;
+			ctx->olaps[i].mstart = ctx->olaps[i].start;
+			ctx->olaps[i].offset = 0;
+			max_end = ctx->olaps[i].end;
+		}
+	}
+}
+
+/**
+ * qda_fastrpc_context_alloc() - Allocate a new FastRPC invocation context
+ *
+ * Return: Pointer to allocated context, or ERR_PTR on failure
+ */
+struct fastrpc_invoke_context *qda_fastrpc_context_alloc(void)
+{
+	struct fastrpc_invoke_context *ctx = NULL;
+
+	ctx = kzalloc_obj(*ctx);
+	if (!ctx)
+		return ERR_PTR(-ENOMEM);
+
+	INIT_LIST_HEAD(&ctx->node);
+
+	ctx->retval = -1;
+	ctx->pid = current->pid;
+	init_completion(&ctx->work);
+	ctx->msg_gem_obj = NULL;
+	kref_init(&ctx->refcount);
+
+	return ctx;
+}
+
+/*
+ * process_fd_buffer() - Handle an in/out buffer argument backed by a DMA-BUF fd
+ *
+ * args[i].fd is a DMA-BUF fd.  We import it to obtain the GEM object and its
+ * IOMMU-mapped dma_addr for the physical page descriptor.  The DSP uses that
+ * DMA (IOVA) address directly for this buffer type; the fd is not forwarded.
+ */
+static int process_fd_buffer(struct fastrpc_invoke_context *ctx, int i,
+			     union fastrpc_remote_arg *rpra, struct fastrpc_phy_page *pages)
+{
+	struct drm_gem_object *gem_obj;
+	struct qda_gem_obj *qda_gem_obj;
+	int err;
+	u64 len = ctx->args[i].length;
+	u64 vma_offset;
+
+	err = get_gem_obj_from_dmabuf_fd(ctx, ctx->args[i].fd, &gem_obj);
+	if (err)
+		return err;
+
+	ctx->gem_objs[i] = gem_obj;
+	qda_gem_obj = to_qda_gem_obj(gem_obj);
+
+	rpra[i].buf.pv = (u64)ctx->args[i].ptr;
+
+	pages[i].addr = qda_gem_obj->dma_addr;
+
+	vma_offset = calculate_vma_offset(ctx->args[i].ptr);
+	pages[i].addr += vma_offset;
+	pages[i].size = calculate_page_aligned_size(ctx->args[i].ptr, len);
+
+	return 0;
+}
+
+static int process_direct_buffer(struct fastrpc_invoke_context *ctx, int i, int oix,
+				 union fastrpc_remote_arg *rpra, struct fastrpc_phy_page *pages,
+				 uintptr_t *args, u64 *rlen, u64 pkt_size)
+{
+	int mlen;
+	u64 len = ctx->args[i].length;
+	int inbufs = ctx->inbufs;
+
+	if (ctx->olaps[oix].offset == 0) {
+		*rlen -= ALIGN(*args, FASTRPC_ALIGN) - *args;
+		*args = ALIGN(*args, FASTRPC_ALIGN);
+	}
+
+	mlen = ctx->olaps[oix].mend - ctx->olaps[oix].mstart;
+
+	if (*rlen < mlen)
+		return -ENOSPC;
+
+	rpra[i].buf.pv = *args - ctx->olaps[oix].offset;
+
+	pages[i].addr = ctx->msg->phys - ctx->olaps[oix].offset + (pkt_size - *rlen);
+	pages[i].addr = pages[i].addr & PAGE_MASK;
+	pages[i].size = calculate_page_aligned_size(rpra[i].buf.pv, len);
+
+	*args = *args + mlen;
+	*rlen -= mlen;
+
+	if (i < inbufs) {
+		void *dst = (void *)(uintptr_t)rpra[i].buf.pv;
+		void *src = (void *)(uintptr_t)ctx->args[i].ptr;
+
+		/*
+		 * For user-space invocations (INVOKE_DYNAMIC), ptr is a user
+		 * virtual address and must be copied safely. For all other
+		 * (kernel-internal) invocations, ptr is a kernel address set
+		 * by the driver itself and can be copied directly.
+		 */
+		if (ctx->type == FASTRPC_RMID_INVOKE_DYNAMIC) {
+			if (copy_from_user(dst, (void __user *)src, len))
+				return -EFAULT;
+		} else {
+			memcpy(dst, src, len);
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * process_dma_handle() - Handle a DMA-handle scalar argument
+ *
+ * args[i].fd is a DMA-BUF fd.  We import it to get the physical page
+ * descriptor for the kernel, but forward the original DMA-BUF fd to the
+ * DSP in rpra[i].dma.fd so the DSP can identify the buffer by its fd.
+ */
+static int process_dma_handle(struct fastrpc_invoke_context *ctx, int i,
+			      union fastrpc_remote_arg *rpra, struct fastrpc_phy_page *pages)
+{
+	if (ctx->args[i].fd > 0) {
+		struct drm_gem_object *gem_obj;
+		struct qda_gem_obj *qda_gem_obj;
+		int err;
+
+		err = get_gem_obj_from_dmabuf_fd(ctx, ctx->args[i].fd, &gem_obj);
+		if (err)
+			return err;
+
+		ctx->gem_objs[i] = gem_obj;
+		qda_gem_obj = to_qda_gem_obj(gem_obj);
+
+		setup_pages_from_gem_obj(qda_gem_obj, &pages[i]);
+
+		/* Forward the original DMA-BUF fd to the DSP */
+		rpra[i].dma.fd     = ctx->args[i].fd;
+		rpra[i].dma.len    = ctx->args[i].length;
+		rpra[i].dma.offset = (u64)ctx->args[i].ptr;
+	} else {
+		rpra[i].buf.pv  = ctx->args[i].ptr;
+		rpra[i].buf.len = ctx->args[i].length;
+	}
+
+	return 0;
+}
+
+/**
+ * qda_fastrpc_get_header_size() - Compute the FastRPC message header size
+ * @ctx: FastRPC invocation context
+ * @out_size: Pointer to store the aligned packet size in bytes
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_fastrpc_get_header_size(struct fastrpc_invoke_context *ctx, size_t *out_size)
+{
+	ctx->inbufs = REMOTE_SCALARS_INBUFS(ctx->sc);
+	ctx->metalen = fastrpc_get_meta_size(ctx);
+	ctx->pkt_size = fastrpc_get_payload_size(ctx, ctx->metalen);
+
+	ctx->aligned_pkt_size = PAGE_ALIGN(ctx->pkt_size);
+	if (ctx->aligned_pkt_size == 0)
+		return -EINVAL;
+
+	*out_size = ctx->aligned_pkt_size;
+	return 0;
+}
+
+static int fastrpc_get_args(struct fastrpc_invoke_context *ctx)
+{
+	union fastrpc_remote_arg *rpra;
+	struct fastrpc_invoke_buf *list;
+	struct fastrpc_phy_page *pages;
+	int i, oix, err = 0;
+	u64 rlen;
+	uintptr_t args;
+	size_t hdr_size;
+
+	ctx->inbufs = REMOTE_SCALARS_INBUFS(ctx->sc);
+	err = qda_fastrpc_get_header_size(ctx, &hdr_size);
+	if (err)
+		return err;
+
+	ctx->msg->buf = ctx->msg_gem_obj->virt;
+	ctx->msg->phys = ctx->msg_gem_obj->dma_addr;
+
+	memset(ctx->msg->buf, 0, ctx->aligned_pkt_size);
+
+	rpra = (union fastrpc_remote_arg *)ctx->msg->buf;
+	ctx->list = fastrpc_invoke_buf_start(rpra, ctx->nscalars);
+	ctx->pages = fastrpc_phy_page_start(ctx->list, ctx->nscalars);
+	list = ctx->list;
+	pages = ctx->pages;
+	args = (uintptr_t)ctx->msg->buf + ctx->metalen;
+	rlen = ctx->pkt_size - ctx->metalen;
+	ctx->rpra = rpra;
+
+	for (oix = 0; oix < ctx->nbufs; ++oix) {
+		i = ctx->olaps[oix].raix;
+
+		rpra[i].buf.pv = 0;
+		rpra[i].buf.len = ctx->args[i].length;
+		list[i].num = ctx->args[i].length ? 1 : 0;
+		list[i].pgidx = i;
+
+		if (!ctx->args[i].length)
+			continue;
+
+		if (ctx->args[i].fd > 0)
+			err = process_fd_buffer(ctx, i, rpra, pages);
+		else
+			err = process_direct_buffer(ctx, i, oix, rpra, pages, &args, &rlen,
+						    ctx->pkt_size);
+
+		if (err)
+			goto bail_gem;
+	}
+
+	for (i = ctx->nbufs; i < ctx->nscalars; ++i) {
+		list[i].num = ctx->args[i].length ? 1 : 0;
+		list[i].pgidx = i;
+
+		err = process_dma_handle(ctx, i, rpra, pages);
+		if (err)
+			goto bail_gem;
+	}
+
+	return 0;
+
+bail_gem:
+	if (ctx->msg_gem_obj) {
+		drm_gem_object_put(&ctx->msg_gem_obj->base);
+		ctx->msg_gem_obj = NULL;
+	}
+
+	return err;
+}
+
+static int fastrpc_put_args(struct fastrpc_invoke_context *ctx, struct qda_msg *msg)
+{
+	union fastrpc_remote_arg *rpra;
+	int i, err = 0;
+
+	if (!ctx)
+		return -EINVAL;
+
+	rpra = ctx->rpra;
+	if (!rpra)
+		return -EINVAL;
+
+	for (i = ctx->inbufs; i < ctx->nbufs; ++i) {
+		if (ctx->args[i].fd <= 0) {
+			void *src = (void *)(uintptr_t)rpra[i].buf.pv;
+			void *dst = (void *)(uintptr_t)ctx->args[i].ptr;
+			u64 len = rpra[i].buf.len;
+
+			if (ctx->type == FASTRPC_RMID_INVOKE_DYNAMIC)
+				err = copy_to_user((void __user *)dst, src, len) ? -EFAULT : 0;
+			else
+				memcpy(dst, src, len);
+			if (err)
+				break;
+		}
+	}
+
+	return err;
+}
+
+/**
+ * qda_fastrpc_invoke_pack() - Pack an invocation context into a QDA message
+ * @ctx: FastRPC invocation context
+ * @msg: QDA message structure to pack into
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_fastrpc_invoke_pack(struct fastrpc_invoke_context *ctx,
+			    struct qda_msg *msg)
+{
+	int err = 0;
+
+	if (ctx->handle == FASTRPC_INIT_HANDLE)
+		msg->fastrpc.remote_session_id = 0;
+	else
+		msg->fastrpc.remote_session_id = ctx->remote_session_id;
+
+	ctx->msg = msg;
+
+	err = fastrpc_get_args(ctx);
+	if (err)
+		return err;
+
+	dma_wmb();
+
+	msg->fastrpc.tid    = ctx->pid;
+	msg->fastrpc.ctx    = ctx->ctxid | ctx->pd;
+	msg->fastrpc.handle = ctx->handle;
+	msg->fastrpc.sc     = ctx->sc;
+	msg->fastrpc.addr   = ctx->msg->phys;
+	msg->fastrpc.size   = roundup(ctx->pkt_size, PAGE_SIZE);
+	msg->fastrpc_ctx    = ctx;
+	msg->file_priv      = ctx->file_priv;
+
+	return 0;
+}
+
+/**
+ * qda_fastrpc_invoke_unpack() - Unpack a response message into an invocation context
+ * @ctx: FastRPC invocation context
+ * @msg: QDA message structure to unpack from
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_fastrpc_invoke_unpack(struct fastrpc_invoke_context *ctx,
+			      struct qda_msg *msg)
+{
+	int err;
+
+	dma_rmb();
+
+	err = fastrpc_put_args(ctx, msg);
+	if (err)
+		return err;
+
+	err = ctx->retval;
+	return err;
+}
+
+static int fastrpc_prepare_args_invoke(struct fastrpc_invoke_context *ctx, char __user *argp)
+{
+	struct drm_qda_invoke_args invoke_args;
+	struct drm_qda_fastrpc_invoke_args *args = NULL;
+	u32 nscalars;
+
+	/* argp is DRM ioctl data (kernel pointer); args pointer within it is user-space */
+	memcpy(&invoke_args, argp, sizeof(invoke_args));
+
+	ctx->handle = invoke_args.handle;
+	ctx->sc = invoke_args.sc;
+
+	nscalars = REMOTE_SCALARS_LENGTH(ctx->sc);
+	if (!nscalars) {
+		ctx->args = NULL;
+		return 0;
+	}
+
+	args = kcalloc(nscalars, sizeof(*args), GFP_KERNEL);
+	if (!args)
+		return -ENOMEM;
+
+	if (copy_from_user(args, u64_to_user_ptr(invoke_args.args),
+			   nscalars * sizeof(*args))) {
+		kfree(args);
+		return -EFAULT;
+	}
+
+	ctx->args = args;
+	return 0;
+}
+
+/**
+ * qda_fastrpc_prepare_args() - Prepare arguments for a FastRPC invocation
+ * @ctx: FastRPC invocation context
+ * @argp: DRM ioctl data (kernel copy); its embedded args pointer is user-space
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_fastrpc_prepare_args(struct fastrpc_invoke_context *ctx, char __user *argp)
+{
+	int err;
+
+	switch (ctx->type) {
+	case FASTRPC_RMID_INVOKE_DYNAMIC:
+		err = fastrpc_prepare_args_invoke(ctx, argp);
+		break;
+	default:
+		return -EINVAL;
+	}
+	if (err)
+		return err;
+
+	ctx->nscalars = REMOTE_SCALARS_LENGTH(ctx->sc);
+	ctx->nbufs = REMOTE_SCALARS_INBUFS(ctx->sc) + REMOTE_SCALARS_OUTBUFS(ctx->sc);
+
+	if (ctx->nscalars) {
+		ctx->gem_objs = kcalloc(ctx->nscalars, sizeof(*ctx->gem_objs), GFP_KERNEL);
+		if (!ctx->gem_objs)
+			return -ENOMEM;
+		ctx->olaps = kcalloc(ctx->nscalars, sizeof(*ctx->olaps), GFP_KERNEL);
+		if (!ctx->olaps) {
+			kfree(ctx->gem_objs);
+			ctx->gem_objs = NULL;
+			return -ENOMEM;
+		}
+		fastrpc_get_buff_overlaps(ctx);
+	}
+
+	return err;
+}
diff --git a/drivers/accel/qda/qda_fastrpc.h b/drivers/accel/qda/qda_fastrpc.h
new file mode 100644
index 000000000000..ce77baeccfba
--- /dev/null
+++ b/drivers/accel/qda/qda_fastrpc.h
@@ -0,0 +1,271 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ */
+
+#ifndef __QDA_FASTRPC_H__
+#define __QDA_FASTRPC_H__
+
+#include <linux/completion.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/types.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+#include <drm/qda_accel.h>
+
+/* Forward declarations */
+struct qda_gem_obj;
+
+/*
+ * FastRPC scalar extraction macros
+ *
+ * These macros extract different fields from the scalar value that describes
+ * the arguments passed in a FastRPC invocation.
+ */
+#define REMOTE_SCALARS_INBUFS(sc)	(((sc) >> 16) & 0x0ff)
+#define REMOTE_SCALARS_OUTBUFS(sc)	(((sc) >> 8) & 0x0ff)
+#define REMOTE_SCALARS_INHANDLES(sc)	(((sc) >> 4) & 0x0f)
+#define REMOTE_SCALARS_OUTHANDLES(sc)	((sc) & 0x0f)
+#define REMOTE_SCALARS_LENGTH(sc)	(REMOTE_SCALARS_INBUFS(sc) +   \
+					 REMOTE_SCALARS_OUTBUFS(sc) +  \
+					 REMOTE_SCALARS_INHANDLES(sc) + \
+					 REMOTE_SCALARS_OUTHANDLES(sc))
+
+/* FastRPC configuration constants */
+#define FASTRPC_ALIGN		128		/* Alignment requirement */
+#define FASTRPC_MAX_FDLIST	16		/* Maximum file descriptors */
+#define FASTRPC_MAX_CRCLIST	64		/* Maximum CRC list entries */
+
+/*
+ * FastRPC scalar construction macros
+ *
+ * These macros build the scalar value that describes the arguments
+ * for a FastRPC invocation.
+ */
+#define FASTRPC_BUILD_SCALARS(attr, method, in, out, oin, oout)		\
+				((((attr) & 0x07) << 29) |		\
+				(((method) & 0x1f) << 24) |		\
+				(((in) & 0xff) << 16) |			\
+				(((out) & 0xff) <<  8) |		\
+				(((oin) & 0x0f) <<  4) |		\
+				((oout) & 0x0f))
+
+#define FASTRPC_SCALARS(method, in, out) \
+		FASTRPC_BUILD_SCALARS(0, method, in, out, 0, 0)
+
+/**
+ * struct fastrpc_buf_overlap - Buffer overlap tracking structure
+ *
+ * Tracks overlapping buffer regions to optimise memory mapping and avoid
+ * redundant mappings of the same physical memory.
+ */
+struct fastrpc_buf_overlap {
+	/** @start: Start address of the buffer in user virtual address space */
+	u64 start;
+	/** @end: End address of the buffer in user virtual address space */
+	u64 end;
+	/** @raix: Remote argument index associated with this overlap */
+	int raix;
+	/** @mstart: Start address of the mapped region */
+	u64 mstart;
+	/** @mend: End address of the mapped region */
+	u64 mend;
+	/** @offset: Offset within the mapped region */
+	u64 offset;
+};
+
+/**
+ * struct fastrpc_remote_dmahandle - Remote DMA handle descriptor
+ */
+struct fastrpc_remote_dmahandle {
+	/** @fd: DMA-BUF file descriptor */
+	s32 fd;
+	/** @offset: Byte offset within the DMA-BUF */
+	u32 offset;
+	/** @len: Length of the region in bytes */
+	u32 len;
+};
+
+/**
+ * struct fastrpc_remote_buf - Remote buffer descriptor
+ */
+struct fastrpc_remote_buf {
+	/** @pv: Buffer pointer (user virtual address) */
+	u64 pv;
+	/** @len: Length of the buffer in bytes */
+	u64 len;
+};
+
+/**
+ * union fastrpc_remote_arg - Remote argument (buffer or DMA handle)
+ */
+union fastrpc_remote_arg {
+	/** @buf: Inline buffer descriptor */
+	struct fastrpc_remote_buf buf;
+	/** @dma: DMA-BUF handle descriptor */
+	struct fastrpc_remote_dmahandle dma;
+};
+
+/**
+ * struct fastrpc_phy_page - Physical page descriptor
+ */
+struct fastrpc_phy_page {
+	/** @addr: Physical (IOMMU) address of the page */
+	u64 addr;
+	/** @size: Size of the contiguous region in bytes */
+	u64 size;
+};
+
+/**
+ * struct fastrpc_invoke_buf - Invoke buffer descriptor
+ */
+struct fastrpc_invoke_buf {
+	/** @num: Number of contiguous physical regions */
+	u32 num;
+	/** @pgidx: Index into the physical page array */
+	u32 pgidx;
+};
+
+/**
+ * struct fastrpc_msg - FastRPC wire message for remote invocations
+ *
+ * Sent to the remote processor via RPMsg. This is the exact layout
+ * the DSP expects; do not reorder or add fields without DSP firmware
+ * coordination.
+ */
+struct fastrpc_msg {
+	/** @remote_session_id: Session identifier on the remote processor */
+	int remote_session_id;
+	/** @tid: Thread ID of the invoking thread */
+	int tid;
+	/** @ctx: Context identifier for matching request/response */
+	u64 ctx;
+	/** @handle: Handle of the remote method to invoke */
+	u32 handle;
+	/** @sc: Scalars value encoding in/out buffer counts */
+	u32 sc;
+	/** @addr: Physical address of the message payload buffer */
+	u64 addr;
+	/** @size: Size of the message payload in bytes */
+	u64 size;
+};
+
+/**
+ * struct qda_msg - FastRPC message with kernel-internal bookkeeping
+ *
+ * The wire-format portion is kept in the embedded @fastrpc member (must
+ * be first) so that &qda_msg->fastrpc can be passed directly to
+ * rpmsg_send() without a copy.
+ */
+struct qda_msg {
+	/**
+	 * @fastrpc: Wire-format message sent to the DSP via RPMsg.
+	 * Must be the first member.
+	 */
+	struct fastrpc_msg fastrpc;
+	/** @buf: Kernel virtual address of the payload buffer */
+	void *buf;
+	/** @phys: Physical/DMA address of the payload buffer */
+	u64 phys;
+	/** @ret: Return value from the remote processor */
+	int ret;
+	/** @fastrpc_ctx: Back-pointer to the owning invocation context */
+	struct fastrpc_invoke_context *fastrpc_ctx;
+	/** @file_priv: DRM file private data for GEM object lookup */
+	struct drm_file *file_priv;
+};
+
+/**
+ * struct fastrpc_invoke_context - Remote procedure call invocation context
+ *
+ * Maintains all state for a single remote procedure call, including buffer
+ * management, synchronisation, and result handling.
+ */
+struct fastrpc_invoke_context {
+	/** @node: List node for linking contexts in a queue */
+	struct list_head node;
+	/** @ctxid: Unique context identifier (XArray key shifted left by 4) */
+	u64 ctxid;
+	/** @inbufs: Number of input buffers */
+	int inbufs;
+	/** @outbufs: Number of output buffers */
+	int outbufs;
+	/** @handles: Number of DMA-BUF handle arguments */
+	int handles;
+	/** @nscalars: Total number of scalar arguments */
+	int nscalars;
+	/** @nbufs: Total number of buffer arguments (inbufs + outbufs) */
+	int nbufs;
+	/** @pid: Process ID of the calling process */
+	int pid;
+	/** @retval: Return value from the remote invocation */
+	int retval;
+	/** @metalen: Length of the FastRPC metadata header in bytes */
+	int metalen;
+	/** @remote_session_id: Session identifier on the remote processor */
+	int remote_session_id;
+	/** @pd: Protection domain identifier encoded into the context ID */
+	int pd;
+	/** @type: Invocation type (e.g. FASTRPC_RMID_INVOKE_DYNAMIC) */
+	int type;
+	/** @sc: Scalars value encoding in/out buffer counts */
+	u32 sc;
+	/** @handle: Handle of the remote method being invoked */
+	u32 handle;
+	/** @crc: Pointer to CRC values for data integrity checking */
+	u32 *crc;
+	/** @fdlist: Pointer to array of DMA-BUF file descriptors */
+	u64 *fdlist;
+	/** @pkt_size: Total payload size in bytes */
+	u64 pkt_size;
+	/** @aligned_pkt_size: Page-aligned payload size for GEM allocation */
+	u64 aligned_pkt_size;
+	/** @list: Array of invoke buffer descriptors */
+	struct fastrpc_invoke_buf *list;
+	/** @pages: Array of physical page descriptors for all arguments */
+	struct fastrpc_phy_page *pages;
+	/** @input_pages: Array of physical page descriptors for input buffers */
+	struct fastrpc_phy_page *input_pages;
+	/** @work: Completion used to synchronise with the DSP response */
+	struct completion work;
+	/** @msg: Pointer to the QDA message structure for this invocation */
+	struct qda_msg *msg;
+	/** @rpra: Array of remote procedure arguments */
+	union fastrpc_remote_arg *rpra;
+	/** @gem_objs: Array of GEM objects imported for argument buffers */
+	struct drm_gem_object **gem_objs;
+	/** @args: User-space invoke argument descriptors */
+	struct drm_qda_fastrpc_invoke_args *args;
+	/** @olaps: Array of buffer overlap descriptors for deduplication */
+	struct fastrpc_buf_overlap *olaps;
+	/** @refcount: Reference counter for context lifetime management */
+	struct kref refcount;
+	/** @msg_gem_obj: GEM object backing the message payload buffer */
+	struct qda_gem_obj *msg_gem_obj;
+	/** @file_priv: DRM file private data */
+	struct drm_file *file_priv;
+	/** @init_mem_gem_obj: GEM object for protection domain init memory */
+	struct qda_gem_obj *init_mem_gem_obj;
+	/** @req: Pointer to kernel-internal request buffer */
+	void *req;
+	/** @rsp: Pointer to kernel-internal response buffer */
+	void *rsp;
+	/** @inbuf: Pointer to kernel-internal input buffer */
+	void *inbuf;
+};
+
+/* Remote Method ID table - identifies initialization and control operations */
+#define FASTRPC_RMID_INVOKE_DYNAMIC	0xFFFFFFFF	/* Dynamic method invocation */
+
+/* Common handle for initialization operations */
+#define FASTRPC_INIT_HANDLE		0x1
+
+void qda_fastrpc_context_free(struct kref *ref);
+struct fastrpc_invoke_context *qda_fastrpc_context_alloc(void);
+int qda_fastrpc_prepare_args(struct fastrpc_invoke_context *ctx, char __user *argp);
+int qda_fastrpc_get_header_size(struct fastrpc_invoke_context *ctx, size_t *out_size);
+int qda_fastrpc_invoke_pack(struct fastrpc_invoke_context *ctx, struct qda_msg *msg);
+int qda_fastrpc_invoke_unpack(struct fastrpc_invoke_context *ctx, struct qda_msg *msg);
+
+#endif /* __QDA_FASTRPC_H__ */
diff --git a/drivers/accel/qda/qda_ioctl.c b/drivers/accel/qda/qda_ioctl.c
index 1769c85a3e98..c81268c20b04 100644
--- a/drivers/accel/qda/qda_ioctl.c
+++ b/drivers/accel/qda/qda_ioctl.c
@@ -3,8 +3,10 @@
 #include <drm/drm_ioctl.h>
 #include <drm/qda_accel.h>
 #include "qda_drv.h"
+#include "qda_fastrpc.h"
 #include "qda_gem.h"
 #include "qda_ioctl.h"
+#include "qda_rpmsg.h"
 
 /**
  * qda_ioctl_query() - Query DSP device information
@@ -74,3 +76,105 @@ int qda_ioctl_gem_mmap_offset(struct drm_device *dev, void *data, struct drm_fil
 
 	return drm_gem_dumb_map_offset(file_priv, dev, args->handle, &args->offset);
 }
+
+static int fastrpc_context_get_id(struct fastrpc_invoke_context *ctx, struct qda_dev *qdev)
+{
+	int ret;
+	u32 id;
+
+	if (!qdev)
+		return -EINVAL;
+
+	ret = xa_alloc(&qdev->ctx_xa, &id, ctx, xa_limit_32b, GFP_KERNEL);
+	if (ret)
+		return ret;
+
+	ctx->ctxid = (u64)id << 4;
+	return 0;
+}
+
+static void fastrpc_context_put_id(struct fastrpc_invoke_context *ctx, struct qda_dev *qdev)
+{
+	if (qdev)
+		xa_erase(&qdev->ctx_xa, ctx->ctxid >> 4);
+}
+
+static int fastrpc_invoke(int type, struct drm_device *dev, void *data,
+			  struct drm_file *file_priv)
+{
+	struct qda_file_priv *qda_file_priv = file_priv->driver_priv;
+	struct qda_dev *qdev = qda_file_priv->qda_dev;
+	struct qda_msg msg;
+	struct fastrpc_invoke_context *ctx;
+	struct drm_gem_object *gem_obj;
+	int err;
+	size_t hdr_size;
+
+	ctx = qda_fastrpc_context_alloc();
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	err = fastrpc_context_get_id(ctx, qdev);
+	if (err) {
+		kref_put(&ctx->refcount, qda_fastrpc_context_free);
+		return err;
+	}
+
+	ctx->type = type;
+	ctx->file_priv = file_priv;
+	ctx->remote_session_id = qda_file_priv->remote_session_id;
+
+	err = qda_fastrpc_prepare_args(ctx, (char __user *)data);
+	if (err)
+		goto err_context_free;
+
+	err = qda_fastrpc_get_header_size(ctx, &hdr_size);
+	if (err)
+		goto err_context_free;
+
+	gem_obj = qda_gem_create_object(dev, qdev->iommu_mgr, hdr_size, file_priv);
+	if (IS_ERR(gem_obj)) {
+		err = PTR_ERR(gem_obj);
+		goto err_context_free;
+	}
+
+	ctx->msg_gem_obj = to_qda_gem_obj(gem_obj);
+
+	err = qda_fastrpc_invoke_pack(ctx, &msg);
+	if (err)
+		goto err_context_free;
+
+	err = qda_rpmsg_send_msg(qdev, &msg);
+	if (err)
+		goto err_context_free;
+
+	err = qda_rpmsg_wait_for_rsp(ctx);
+	if (err)
+		goto err_context_free;
+
+	err = qda_fastrpc_invoke_unpack(ctx, &msg);
+	if (err)
+		goto err_context_free;
+
+	fastrpc_context_put_id(ctx, qdev);
+	kref_put(&ctx->refcount, qda_fastrpc_context_free);
+	return 0;
+
+err_context_free:
+	fastrpc_context_put_id(ctx, qdev);
+	kref_put(&ctx->refcount, qda_fastrpc_context_free);
+	return err;
+}
+
+/**
+ * qda_ioctl_invoke() - Perform a dynamic FastRPC method invocation
+ * @dev: DRM device structure
+ * @data: DRM ioctl data (struct drm_qda_invoke_args)
+ * @file_priv: DRM file private data
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_ioctl_invoke(struct drm_device *dev, void *data, struct drm_file *file_priv)
+{
+	return fastrpc_invoke(FASTRPC_RMID_INVOKE_DYNAMIC, dev, data, file_priv);
+}
diff --git a/drivers/accel/qda/qda_ioctl.h b/drivers/accel/qda/qda_ioctl.h
index d1cbbfb6d965..3bb9cfd98370 100644
--- a/drivers/accel/qda/qda_ioctl.h
+++ b/drivers/accel/qda/qda_ioctl.h
@@ -11,5 +11,6 @@
 int qda_ioctl_query(struct drm_device *dev, void *data, struct drm_file *file_priv);
 int qda_ioctl_gem_create(struct drm_device *dev, void *data, struct drm_file *file_priv);
 int qda_ioctl_gem_mmap_offset(struct drm_device *dev, void *data, struct drm_file *file_priv);
+int qda_ioctl_invoke(struct drm_device *dev, void *data, struct drm_file *file_priv);
 
 #endif /* __QDA_IOCTL_H__ */
diff --git a/drivers/accel/qda/qda_rpmsg.c b/drivers/accel/qda/qda_rpmsg.c
index 719dabb028c5..44b12a9f2808 100644
--- a/drivers/accel/qda/qda_rpmsg.c
+++ b/drivers/accel/qda/qda_rpmsg.c
@@ -1,14 +1,81 @@
 // SPDX-License-Identifier: GPL-2.0-only
 // Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+#include <linux/completion.h>
 #include <linux/module.h>
 #include <linux/of.h>
 #include <linux/rpmsg.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
 #include <drm/drm_print.h>
 
 #include "qda_cb.h"
 #include "qda_drv.h"
+#include "qda_fastrpc.h"
 #include "qda_rpmsg.h"
 
+static int validate_device_availability(struct qda_dev *qdev)
+{
+	if (!qdev)
+		return -ENODEV;
+
+	if (!qdev->rpdev) {
+		drm_dbg_driver(&qdev->drm_dev, "RPMsg device unavailable: rpdev is NULL\n");
+		return -ENODEV;
+	}
+	return 0;
+}
+
+static struct fastrpc_invoke_context *get_and_validate_context(struct qda_msg *msg,
+							       struct qda_dev *qdev)
+{
+	struct fastrpc_invoke_context *ctx = msg->fastrpc_ctx;
+
+	if (!ctx) {
+		drm_dbg_driver(&qdev->drm_dev, "FastRPC context not found in message\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	kref_get(&ctx->refcount);
+	return ctx;
+}
+
+static int validate_callback_params(struct qda_dev *qdev, void *data, int len)
+{
+	if (!qdev)
+		return -ENODEV;
+
+	if (len < sizeof(struct qda_invoke_rsp)) {
+		drm_dbg_driver(&qdev->drm_dev, "Invalid message size from remote: %d\n", len);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static unsigned long extract_context_id(struct qda_invoke_rsp *resp_msg)
+{
+	return resp_msg->ctx >> 4;
+}
+
+static struct fastrpc_invoke_context *find_context_by_id(struct qda_dev *qdev,
+							 unsigned long ctxid)
+{
+	struct fastrpc_invoke_context *ctx;
+
+	ctx = xa_load(&qdev->ctx_xa, ctxid);
+	if (!ctx) {
+		drm_dbg_driver(&qdev->drm_dev, "FastRPC context not found for ctxid: %lu\n", ctxid);
+		return ERR_PTR(-ENOENT);
+	}
+	return ctx;
+}
+
+static void complete_context_processing(struct fastrpc_invoke_context *ctx, int retval)
+{
+	ctx->retval = retval;
+	complete(&ctx->work);
+	kref_put(&ctx->refcount, qda_fastrpc_context_free);
+}
+
 static struct qda_dev *alloc_and_init_qdev(struct rpmsg_device *rpdev)
 {
 	struct qda_dev *qdev;
@@ -24,11 +91,76 @@ static struct qda_dev *alloc_and_init_qdev(struct rpmsg_device *rpdev)
 	return qdev;
 }
 
+int qda_rpmsg_send_msg(struct qda_dev *qdev, struct qda_msg *msg)
+{
+	int ret, idx;
+	struct fastrpc_invoke_context *ctx;
+
+	if (!qdev)
+		return -ENODEV;
+
+	if (!drm_dev_enter(&qdev->drm_dev, &idx))
+		return -ENODEV;
+
+	ret = validate_device_availability(qdev);
+	if (ret)
+		goto out_exit;
+
+	ctx = get_and_validate_context(msg, qdev);
+	if (IS_ERR(ctx)) {
+		ret = PTR_ERR(ctx);
+		goto out_exit;
+	}
+
+	ret = rpmsg_send(qdev->rpdev->ept, &msg->fastrpc, sizeof(msg->fastrpc));
+	if (ret) {
+		drm_err(&qdev->drm_dev, "rpmsg_send failed: %d\n", ret);
+		kref_put(&ctx->refcount, qda_fastrpc_context_free);
+	}
+
+out_exit:
+	drm_dev_exit(idx);
+	return ret;
+}
+
+int qda_rpmsg_wait_for_rsp(struct fastrpc_invoke_context *ctx)
+{
+	return wait_for_completion_interruptible(&ctx->work);
+}
+
 static int qda_rpmsg_cb(struct rpmsg_device *rpdev, void *data, int len,
 			void *priv, u32 src)
 {
-	/* Placeholder: responses will be dispatched here */
-	return 0;
+	struct qda_dev *qdev = dev_get_drvdata(&rpdev->dev);
+	struct qda_invoke_rsp *resp_msg = (struct qda_invoke_rsp *)data;
+	struct fastrpc_invoke_context *ctx;
+	unsigned long ctxid;
+	int ret, idx;
+
+	if (!qdev)
+		return -ENODEV;
+
+	if (!drm_dev_enter(&qdev->drm_dev, &idx))
+		return -ENODEV;
+
+	ret = validate_callback_params(qdev, data, len);
+	if (ret)
+		goto out_exit;
+
+	ctxid = extract_context_id(resp_msg);
+
+	ctx = find_context_by_id(qdev, ctxid);
+	if (IS_ERR(ctx)) {
+		ret = PTR_ERR(ctx);
+		goto out_exit;
+	}
+
+	complete_context_processing(ctx, resp_msg->retval);
+	ret = 0;
+
+out_exit:
+	drm_dev_exit(idx);
+	return ret;
 }
 
 static void qda_rpmsg_remove(struct rpmsg_device *rpdev)
diff --git a/drivers/accel/qda/qda_rpmsg.h b/drivers/accel/qda/qda_rpmsg.h
index 5229d834b34b..bf601e915017 100644
--- a/drivers/accel/qda/qda_rpmsg.h
+++ b/drivers/accel/qda/qda_rpmsg.h
@@ -6,6 +6,23 @@
 #ifndef __QDA_RPMSG_H__
 #define __QDA_RPMSG_H__
 
+#include "qda_drv.h"
+#include "qda_fastrpc.h"
+
+/**
+ * struct qda_invoke_rsp - Response structure for FastRPC invocations
+ */
+struct qda_invoke_rsp {
+	/** @ctx: Invoke caller context for matching request/response */
+	u64 ctx;
+	/** @retval: Return value from the remote invocation */
+	int retval;
+};
+
+/* RPMsg transport layer functions */
+int qda_rpmsg_send_msg(struct qda_dev *qdev, struct qda_msg *msg);
+int qda_rpmsg_wait_for_rsp(struct fastrpc_invoke_context *ctx);
+
 /* RPMsg transport layer registration */
 int qda_rpmsg_register(void);
 void qda_rpmsg_unregister(void);
diff --git a/include/uapi/drm/qda_accel.h b/include/uapi/drm/qda_accel.h
index 319e21aae0d6..72512213741f 100644
--- a/include/uapi/drm/qda_accel.h
+++ b/include/uapi/drm/qda_accel.h
@@ -21,6 +21,8 @@ extern "C" {
 #define DRM_QDA_QUERY		0x00
 #define DRM_QDA_GEM_CREATE		0x01
 #define DRM_QDA_GEM_MMAP_OFFSET	0x02
+/* Command numbers 0x03-0x06 reserved for INIT_ATTACH, INIT_CREATE, MAP, MUNMAP */
+#define DRM_QDA_REMOTE_INVOKE		0x07
 
 /*
  * QDA IOCTL definitions
@@ -35,6 +37,8 @@ extern "C" {
 					  struct drm_qda_gem_create)
 #define DRM_IOCTL_QDA_GEM_MMAP_OFFSET	DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_GEM_MMAP_OFFSET, \
 					  struct drm_qda_gem_mmap_offset)
+#define DRM_IOCTL_QDA_REMOTE_INVOKE	DRM_IOWR(DRM_COMMAND_BASE + DRM_QDA_REMOTE_INVOKE, \
+					  struct drm_qda_invoke_args)
 
 /**
  * struct drm_qda_query - Device information query structure
@@ -78,6 +82,41 @@ struct drm_qda_gem_mmap_offset {
 	__u32 pad;
 };
 
+/**
+ * struct drm_qda_fastrpc_invoke_args - FastRPC invocation argument descriptor
+ * @ptr: Pointer to argument data (user virtual address)
+ * @length: Length of the argument data in bytes
+ * @fd: DMA-BUF file descriptor for buffer arguments, -1/0 for scalar arguments
+ * @attr: Argument attributes and flags
+ *
+ * This structure describes a single argument passed to a FastRPC invocation.
+ * Arguments can be either scalar values or buffer references (via DMA-BUF fd).
+ */
+struct drm_qda_fastrpc_invoke_args {
+	__u64 ptr;
+	__u64 length;
+	__s32 fd;
+	__u32 attr;
+};
+
+/**
+ * struct drm_qda_invoke_args - Dynamic FastRPC invocation parameters
+ * @handle: Remote handle to invoke on the DSP
+ * @sc: FastRPC scalars value encoding the number of in/out buffers
+ * @args: User-space pointer to array of drm_qda_fastrpc_invoke_args descriptors;
+ *        the fd field in each entry must be a DMA-BUF fd (or -1/0 for
+ *        inline scalar buffers)
+ *
+ * This structure is used with DRM_IOCTL_QDA_REMOTE_INVOKE to perform a
+ * dynamic remote procedure call on the DSP. The args pointer must reference
+ * an array of REMOTE_SCALARS_LENGTH(sc) drm_qda_fastrpc_invoke_args entries.
+ */
+struct drm_qda_invoke_args {
+	__u32 handle;
+	__u32 sc;
+	__u64 args;
+};
+
 #if defined(__cplusplus)
 }
 #endif

-- 
2.34.1
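
For anyone checking the scalars layout by hand, here is a minimal
user-space sketch of the bit packing used by FASTRPC_BUILD_SCALARS() and
the REMOTE_SCALARS_*() macros in qda_fastrpc.h. The helper names
(sc_build, sc_inbufs, sc_outbufs, sc_length) are made up for
illustration and are not part of the patch or the uapi header; only the
field widths and shifts are taken from the macros in the patch.

```c
#include <stdint.h>

/*
 * Hypothetical user-space helpers (illustrative names, not part of the
 * patch). They mirror the bit layout of FASTRPC_BUILD_SCALARS() and the
 * REMOTE_SCALARS_*() macros in qda_fastrpc.h.
 */
static inline uint32_t sc_build(uint32_t attr, uint32_t method, uint32_t in,
				uint32_t out, uint32_t oin, uint32_t oout)
{
	return ((attr & 0x07u) << 29) |		/* bits 31:29 attributes */
	       ((method & 0x1fu) << 24) |	/* bits 28:24 method index */
	       ((in & 0xffu) << 16) |		/* bits 23:16 input buffers */
	       ((out & 0xffu) << 8) |		/* bits 15:8  output buffers */
	       ((oin & 0x0fu) << 4) |		/* bits 7:4   input handles */
	       (oout & 0x0fu);			/* bits 3:0   output handles */
}

static inline uint32_t sc_inbufs(uint32_t sc)
{
	return (sc >> 16) & 0xffu;
}

static inline uint32_t sc_outbufs(uint32_t sc)
{
	return (sc >> 8) & 0xffu;
}

/* Total number of argument descriptors the args array must hold. */
static inline uint32_t sc_length(uint32_t sc)
{
	return sc_inbufs(sc) + sc_outbufs(sc) +
	       ((sc >> 4) & 0x0fu) + (sc & 0x0fu);
}
```

A caller that builds sc for method 3 with two input buffers and one
output buffer would then size its drm_qda_fastrpc_invoke_args array to
sc_length(sc) entries, which is what the kernel side reads via
REMOTE_SCALARS_LENGTH(sc).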




* [PATCH 09/15] accel/qda: Add DMA-backed GEM objects and memory manager integration
From: Ekansh Gupta via B4 Relay @ 2026-05-19  6:15 UTC (permalink / raw)
  To: Oded Gabbay, Jonathan Corbet, Shuah Khan, Joerg Roedel,
	Will Deacon, Robin Murphy, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
	Christian König
  Cc: Bharath Kumar, Chenna Kesava Raju, srini, dmitry.baryshkov,
	andersson, konradybcio, robin.clark, linux-kernel, dri-devel,
	linux-doc, linux-arm-msm, iommu, linux-media, linaro-mm-sig,
	Ekansh Gupta
In-Reply-To: <20260519-qda-series-v1-0-b2d984c297f8@oss.qualcomm.com>

From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>

Introduce DMA-coherent buffer management for the QDA driver, wiring
together the GEM subsystem, the IOMMU memory manager, and a DMA
allocation backend.

qda_gem.c / qda_gem.h
  Implements the GEM object lifecycle for QDA buffers. Each buffer is
  represented by a qda_gem_obj which embeds a drm_gem_object and
  carries the kernel virtual address, DMA address, and a pointer to
  the IOMMU device that performed the allocation. The .free callback
  delegates to the memory manager, and the .mmap callback uses
  dma_mmap_coherent() via the DMA backend.

qda_memory_dma.c / qda_memory_dma.h
  DMA coherent allocation backend. qda_dma_alloc() calls
  dma_alloc_coherent() on the CB device and encodes the stream ID
  (SID) in the upper 32 bits of the returned DMA address, following
  the Qualcomm FastRPC convention for IOMMU address space tagging.
  qda_dma_free() strips the SID prefix before calling
  dma_free_coherent().
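
  As a rough illustration of the SID tagging described above (not the
  code from this patch), the encode and strip steps could look like the
  sketch below. The helper names and the SID_SHIFT constant are
  hypothetical, and the sketch assumes the untagged DMA address fits in
  the lower 32 bits.

```c
#include <stdint.h>

/* Assumption taken from the commit text: the SID occupies bits 63:32. */
#define SID_SHIFT	32
#define SID_ADDR_MASK	0xffffffffull

/* Tag a DMA address with the stream ID (hypothetical helper). */
static inline uint64_t sid_encode(uint64_t dma_addr, uint32_t sid)
{
	return dma_addr | ((uint64_t)sid << SID_SHIFT);
}

/* Recover the address that would be handed back to the DMA API. */
static inline uint64_t sid_strip(uint64_t tagged)
{
	return tagged & SID_ADDR_MASK;
}

/* Extract the stream ID from a tagged address. */
static inline uint32_t sid_of(uint64_t tagged)
{
	return (uint32_t)(tagged >> SID_SHIFT);
}
```

  The cast to a 64-bit type before shifting matters: shifting a 32-bit
  SID by 32 would otherwise be undefined in C.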

qda_memory_manager.c
  Adds process-to-device assignment: each DRM file (process) is
  assigned one IOMMU context bank device for the lifetime of the
  session. qda_memory_manager_assign_device() first checks whether
  the process already has a device (reusing it with a refcount
  increment), then falls back to claiming an unassigned device.
  qda_memory_manager_alloc() and qda_memory_manager_free() delegate
  to the DMA backend after resolving the correct CB device for the
  calling process.

qda_drv.c / qda_drv.h
  qda_file_priv gains an assigned_iommu_dev pointer and a pid field.
  The .postclose callback decrements the IOMMU device refcount and
  clears the process assignment when the last reference is dropped.
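
The assignment policy described for qda_memory_manager.c and the
release done in .postclose can be modelled in plain C as below. This is
a simplified sketch under stated assumptions, not the driver code:
struct cb_dev, cb_assign() and cb_release() are hypothetical names, a
pid of 0 is taken to mean "unassigned" (as in qda_postclose()), and the
locking and refcount_t usage of the real driver are left out.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct qda_iommu_device. */
struct cb_dev {
	int assigned_pid;	/* 0 means unassigned */
	unsigned int refcount;	/* open references held by assigned_pid */
};

/* Reuse the device already owned by @pid, else claim a free one. */
static struct cb_dev *cb_assign(struct cb_dev *devs, size_t n, int pid)
{
	size_t i;

	/* First pass: the process already has a device, take another ref. */
	for (i = 0; i < n; i++) {
		if (devs[i].assigned_pid == pid) {
			devs[i].refcount++;
			return &devs[i];
		}
	}

	/* Second pass: fall back to the first unassigned device. */
	for (i = 0; i < n; i++) {
		if (devs[i].assigned_pid == 0) {
			devs[i].assigned_pid = pid;
			devs[i].refcount = 1;
			return &devs[i];
		}
	}

	return NULL;	/* every context bank belongs to another process */
}

/* Drop one reference; clear the assignment when the last one goes. */
static void cb_release(struct cb_dev *dev)
{
	if (dev->refcount && --dev->refcount == 0)
		dev->assigned_pid = 0;
}
```

With two context banks, a third process asking for a device gets NULL
until one of the first two has dropped its last reference.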

Assisted-by: Claude:claude-4-6-sonnet
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
---
 drivers/accel/qda/Makefile             |   2 +
 drivers/accel/qda/qda_drv.c            |  13 ++
 drivers/accel/qda/qda_drv.h            |   4 +
 drivers/accel/qda/qda_gem.c            | 156 +++++++++++++++++++++++
 drivers/accel/qda/qda_gem.h            |  54 ++++++++
 drivers/accel/qda/qda_memory_dma.c     | 110 ++++++++++++++++
 drivers/accel/qda/qda_memory_dma.h     |  17 +++
 drivers/accel/qda/qda_memory_manager.c | 224 +++++++++++++++++++++++++++++++++
 drivers/accel/qda/qda_memory_manager.h |  28 ++++-
 9 files changed, 607 insertions(+), 1 deletion(-)

diff --git a/drivers/accel/qda/Makefile b/drivers/accel/qda/Makefile
index b658dad35fee..a46ddceecfc5 100644
--- a/drivers/accel/qda/Makefile
+++ b/drivers/accel/qda/Makefile
@@ -8,7 +8,9 @@ obj-$(CONFIG_DRM_ACCEL_QDA)	:= qda.o
 qda-y := \
 	qda_cb.o \
 	qda_drv.o \
+	qda_gem.o \
 	qda_ioctl.o \
+	qda_memory_dma.o \
 	qda_memory_manager.o \
 	qda_rpmsg.o
 
diff --git a/drivers/accel/qda/qda_drv.c b/drivers/accel/qda/qda_drv.c
index becd831d10be..1b534fea50c8 100644
--- a/drivers/accel/qda/qda_drv.c
+++ b/drivers/accel/qda/qda_drv.c
@@ -22,6 +22,7 @@ static int qda_open(struct drm_device *dev, struct drm_file *file)
 	if (!qda_file_priv)
 		return -ENOMEM;
 
+	qda_file_priv->pid = current->pid;
 	qda_file_priv->qda_dev = qda_dev_from_drm(dev);
 	file->driver_priv = qda_file_priv;
 
@@ -32,6 +33,18 @@ static void qda_postclose(struct drm_device *dev, struct drm_file *file)
 {
 	struct qda_file_priv *qda_file_priv = file->driver_priv;
 
+	if (qda_file_priv->assigned_iommu_dev) {
+		struct qda_iommu_device *iommu_dev = qda_file_priv->assigned_iommu_dev;
+		unsigned long flags;
+
+		if (refcount_dec_and_test(&iommu_dev->refcount)) {
+			spin_lock_irqsave(&iommu_dev->lock, flags);
+			iommu_dev->assigned_pid = 0;
+			iommu_dev->assigned_file_priv = NULL;
+			spin_unlock_irqrestore(&iommu_dev->lock, flags);
+		}
+	}
+
 	kfree(qda_file_priv);
 	file->driver_priv = NULL;
 }
diff --git a/drivers/accel/qda/qda_drv.h b/drivers/accel/qda/qda_drv.h
index eb089e586b17..8a7d647ac8fc 100644
--- a/drivers/accel/qda/qda_drv.h
+++ b/drivers/accel/qda/qda_drv.h
@@ -24,6 +24,10 @@
 struct qda_file_priv {
 	/** @qda_dev: Back-pointer to device structure */
 	struct qda_dev *qda_dev;
+	/** @assigned_iommu_dev: IOMMU device assigned to this process */
+	struct qda_iommu_device *assigned_iommu_dev;
+	/** @pid: Process ID for tracking */
+	pid_t pid;
 };
 
 /**
diff --git a/drivers/accel/qda/qda_gem.c b/drivers/accel/qda/qda_gem.c
new file mode 100644
index 000000000000..568b3c2e64b7
--- /dev/null
+++ b/drivers/accel/qda/qda_gem.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+#include <drm/drm_gem.h>
+#include <drm/drm_prime.h>
+#include <drm/drm_print.h>
+#include <linux/slab.h>
+#include <linux/dma-mapping.h>
+#include "qda_drv.h"
+#include "qda_gem.h"
+#include "qda_memory_manager.h"
+#include "qda_memory_dma.h"
+
+static void setup_vma_flags(struct vm_area_struct *vma)
+{
+	vm_flags_set(vma, VM_DONTEXPAND);
+	vm_flags_set(vma, VM_DONTDUMP);
+}
+
+/**
+ * qda_gem_free_object() - Free a GEM object and its associated resources
+ * @gem_obj: DRM GEM object to free
+ */
+void qda_gem_free_object(struct drm_gem_object *gem_obj)
+{
+	struct qda_gem_obj *qda_gem_obj = to_qda_gem_obj(gem_obj);
+	struct qda_dev *qdev = qda_dev_from_drm(gem_obj->dev);
+
+	if (qda_gem_obj->virt && qdev->iommu_mgr)
+		qda_memory_manager_free(qdev->iommu_mgr, qda_gem_obj);
+
+	drm_gem_object_release(gem_obj);
+	kfree(qda_gem_obj);
+}
+
+/**
+ * qda_gem_mmap_obj() - Map a GEM object into userspace
+ * @drm_obj: DRM GEM object to map
+ * @vma: Virtual memory area to map into
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_gem_mmap_obj(struct drm_gem_object *drm_obj, struct vm_area_struct *vma)
+{
+	struct qda_gem_obj *qda_gem_obj = to_qda_gem_obj(drm_obj);
+	int ret;
+
+	/* Reset vm_pgoff for DMA mmap */
+	vma->vm_pgoff = 0;
+
+	ret = qda_dma_mmap(qda_gem_obj, vma);
+	if (ret == 0)
+		setup_vma_flags(vma);
+
+	return ret;
+}
+
+static const struct drm_gem_object_funcs qda_gem_object_funcs = {
+	.free = qda_gem_free_object,
+	.mmap = qda_gem_mmap_obj,
+};
+
+/**
+ * qda_gem_alloc_object() - Allocate a new QDA GEM object
+ * @drm_dev: DRM device
+ * @aligned_size: Size of the object in bytes (must be page-aligned)
+ *
+ * Return: Pointer to the new GEM object, or ERR_PTR on failure
+ */
+struct qda_gem_obj *qda_gem_alloc_object(struct drm_device *drm_dev, size_t aligned_size)
+{
+	struct qda_gem_obj *qda_gem_obj;
+	int ret;
+
+	qda_gem_obj = kzalloc_obj(*qda_gem_obj);
+	if (!qda_gem_obj)
+		return ERR_PTR(-ENOMEM);
+
+	ret = drm_gem_object_init(drm_dev, &qda_gem_obj->base, aligned_size);
+	if (ret) {
+		drm_err(drm_dev, "Failed to initialize GEM object: %d\n", ret);
+		kfree(qda_gem_obj);
+		return ERR_PTR(ret);
+	}
+
+	qda_gem_obj->base.funcs = &qda_gem_object_funcs;
+	qda_gem_obj->size = aligned_size;
+
+	drm_dbg_driver(drm_dev, "Allocated GEM object size=%zu\n", aligned_size);
+	return qda_gem_obj;
+}
+
+void qda_gem_cleanup_object(struct qda_gem_obj *qda_gem_obj)
+{
+	drm_gem_object_release(&qda_gem_obj->base);
+	kfree(qda_gem_obj);
+}
+
+struct drm_gem_object *qda_gem_lookup_object(struct drm_file *file_priv, u32 handle)
+{
+	struct drm_gem_object *gem_obj;
+
+	gem_obj = drm_gem_object_lookup(file_priv, handle);
+	if (!gem_obj)
+		return ERR_PTR(-ENOENT);
+
+	return gem_obj;
+}
+
+int qda_gem_create_handle(struct drm_file *file_priv, struct drm_gem_object *gem_obj, u32 *handle)
+{
+	int ret;
+
+	ret = drm_gem_handle_create(file_priv, gem_obj, handle);
+	drm_gem_object_put(gem_obj);
+
+	return ret;
+}
+
+/**
+ * qda_gem_create_object() - Allocate and initialize a GEM object with DMA backing
+ * @drm_dev: DRM device
+ * @iommu_mgr: Memory manager to use for DMA allocation
+ * @size: Requested size in bytes
+ * @file_priv: DRM file private data for process association
+ *
+ * Return: Pointer to the base DRM GEM object on success, ERR_PTR on failure
+ */
+struct drm_gem_object *qda_gem_create_object(struct drm_device *drm_dev,
+					     struct qda_memory_manager *iommu_mgr, size_t size,
+					     struct drm_file *file_priv)
+{
+	struct qda_gem_obj *qda_gem_obj;
+	size_t aligned_size;
+	int ret;
+
+	if (size == 0) {
+		drm_err(drm_dev, "Invalid size for GEM object creation\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	aligned_size = PAGE_ALIGN(size);
+
+	qda_gem_obj = qda_gem_alloc_object(drm_dev, aligned_size);
+	if (IS_ERR(qda_gem_obj))
+		return ERR_CAST(qda_gem_obj);
+
+	ret = qda_memory_manager_alloc(iommu_mgr, qda_gem_obj, file_priv);
+	if (ret) {
+		drm_err(drm_dev, "Memory manager allocation failed: %d\n", ret);
+		qda_gem_cleanup_object(qda_gem_obj);
+		return ERR_PTR(ret);
+	}
+
+	drm_dbg_driver(drm_dev, "GEM object created successfully size=%zu\n", aligned_size);
+	return &qda_gem_obj->base;
+}
diff --git a/drivers/accel/qda/qda_gem.h b/drivers/accel/qda/qda_gem.h
new file mode 100644
index 000000000000..bb18f8155aa4
--- /dev/null
+++ b/drivers/accel/qda/qda_gem.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ */
+#ifndef __QDA_GEM_H__
+#define __QDA_GEM_H__
+
+#include <linux/dma-mapping.h>
+#include <linux/xarray.h>
+#include <drm/drm_device.h>
+#include <drm/drm_gem.h>
+#include "qda_memory_manager.h"
+
+/**
+ * struct qda_gem_obj - QDA GEM buffer object
+ *
+ * Represents a GEM buffer object that can be allocated by the driver
+ * or imported from another driver via DMA-BUF.
+ */
+struct qda_gem_obj {
+	/** @base: DRM GEM object base — must be first member */
+	struct drm_gem_object base;
+	/** @iommu_dev: IOMMU context bank device that performed the allocation */
+	struct qda_iommu_device *iommu_dev;
+	/** @virt: Kernel virtual address of the allocated DMA memory */
+	void *virt;
+	/** @dma_addr: DMA address (with SID encoded in upper 32 bits) */
+	dma_addr_t dma_addr;
+	/** @size: Size of the buffer in bytes */
+	size_t size;
+};
+
+/**
+ * to_qda_gem_obj() - Cast a drm_gem_object pointer to qda_gem_obj
+ * @gem_obj: Pointer to the embedded drm_gem_object
+ */
+#define to_qda_gem_obj(gem_obj) container_of(gem_obj, struct qda_gem_obj, base)
+
+/* GEM object lifecycle */
+struct drm_gem_object *qda_gem_create_object(struct drm_device *drm_dev,
+					     struct qda_memory_manager *iommu_mgr,
+					     size_t size, struct drm_file *file_priv);
+void qda_gem_free_object(struct drm_gem_object *gem_obj);
+int qda_gem_mmap_obj(struct drm_gem_object *gem_obj, struct vm_area_struct *vma);
+
+/* Internal helpers (also used by PRIME import) */
+struct qda_gem_obj *qda_gem_alloc_object(struct drm_device *drm_dev, size_t aligned_size);
+void qda_gem_cleanup_object(struct qda_gem_obj *qda_gem_obj);
+
+/* Utility functions */
+struct drm_gem_object *qda_gem_lookup_object(struct drm_file *file_priv, u32 handle);
+int qda_gem_create_handle(struct drm_file *file_priv, struct drm_gem_object *gem_obj, u32 *handle);
+
+#endif /* __QDA_GEM_H__ */
diff --git a/drivers/accel/qda/qda_memory_dma.c b/drivers/accel/qda/qda_memory_dma.c
new file mode 100644
index 000000000000..97488c755d2d
--- /dev/null
+++ b/drivers/accel/qda/qda_memory_dma.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+#include <linux/slab.h>
+#include <linux/dma-mapping.h>
+#include "qda_drv.h"
+#include "qda_gem.h"
+#include "qda_memory_dma.h"
+
+static dma_addr_t get_actual_dma_addr(struct qda_gem_obj *gem_obj)
+{
+	return gem_obj->dma_addr - ((u64)gem_obj->iommu_dev->sid << 32);
+}
+
+static void setup_gem_object(struct qda_gem_obj *gem_obj, void *virt,
+			     dma_addr_t dma_addr, struct qda_iommu_device *iommu_dev)
+{
+	gem_obj->virt = virt;
+	gem_obj->dma_addr = dma_addr;
+	gem_obj->iommu_dev = iommu_dev;
+}
+
+static void cleanup_gem_object_fields(struct qda_gem_obj *gem_obj)
+{
+	gem_obj->virt = NULL;
+	gem_obj->dma_addr = 0;
+	gem_obj->iommu_dev = NULL;
+}
+
+/**
+ * qda_dma_alloc() - Allocate DMA coherent memory for a GEM object
+ * @iommu_dev: Pointer to the QDA IOMMU device structure
+ * @gem_obj: Pointer to GEM object to allocate memory for
+ * @size: Size of memory to allocate in bytes
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_dma_alloc(struct qda_iommu_device *iommu_dev,
+		  struct qda_gem_obj *gem_obj, size_t size)
+{
+	void *virt;
+	dma_addr_t dma_addr;
+
+	if (!iommu_dev || !iommu_dev->dev) {
+		pr_err("qda: Invalid iommu_dev or device for DMA allocation\n");
+		return -EINVAL;
+	}
+
+	virt = dma_alloc_coherent(iommu_dev->dev, size, &dma_addr, GFP_KERNEL);
+	if (!virt)
+		return -ENOMEM;
+
+	dma_addr += ((u64)iommu_dev->sid << 32);
+
+	dev_dbg(iommu_dev->dev, "DMA address with SID prefix: 0x%llx (sid=%u)\n",
+		(u64)dma_addr, iommu_dev->sid);
+
+	setup_gem_object(gem_obj, virt, dma_addr, iommu_dev);
+
+	return 0;
+}
+
+/**
+ * qda_dma_free() - Free DMA coherent memory for a GEM object
+ * @gem_obj: Pointer to GEM object to free memory for
+ */
+void qda_dma_free(struct qda_gem_obj *gem_obj)
+{
+	if (!gem_obj || !gem_obj->iommu_dev) {
+		pr_debug("qda: Invalid gem_obj or iommu_dev for DMA free\n");
+		return;
+	}
+
+	dev_dbg(gem_obj->iommu_dev->dev, "DMA freeing: size=%zu, device_id=%u, dma_addr=0x%llx\n",
+		gem_obj->size, gem_obj->iommu_dev->id, (u64)gem_obj->dma_addr);
+
+	dma_free_coherent(gem_obj->iommu_dev->dev, gem_obj->size,
+			  gem_obj->virt, get_actual_dma_addr(gem_obj));
+
+	cleanup_gem_object_fields(gem_obj);
+}
+
+/**
+ * qda_dma_mmap() - Map DMA memory into userspace
+ * @gem_obj: Pointer to GEM object containing DMA memory
+ * @vma: Virtual memory area to map into
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_dma_mmap(struct qda_gem_obj *gem_obj, struct vm_area_struct *vma)
+{
+	struct qda_iommu_device *iommu_dev;
+	int ret;
+
+	if (!gem_obj || !gem_obj->virt || !gem_obj->iommu_dev || !gem_obj->iommu_dev->dev) {
+		pr_err("qda: Invalid parameters for DMA mmap\n");
+		return -EINVAL;
+	}
+
+	iommu_dev = gem_obj->iommu_dev;
+
+	ret = dma_mmap_coherent(iommu_dev->dev, vma, gem_obj->virt,
+				get_actual_dma_addr(gem_obj), gem_obj->size);
+	if (ret) {
+		dev_err(iommu_dev->dev, "DMA mmap failed: size=%zu, device_id=%u, ret=%d\n",
+			gem_obj->size, iommu_dev->id, ret);
+		return ret;
+	}
+
+	return 0;
+}
diff --git a/drivers/accel/qda/qda_memory_dma.h b/drivers/accel/qda/qda_memory_dma.h
new file mode 100644
index 000000000000..99352a99dc33
--- /dev/null
+++ b/drivers/accel/qda/qda_memory_dma.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) Qualcomm Technologies, Inc. and/or its subsidiaries.
+ */
+
+#ifndef __QDA_MEMORY_DMA_H__
+#define __QDA_MEMORY_DMA_H__
+
+#include <linux/dma-mapping.h>
+#include "qda_memory_manager.h"
+
+int qda_dma_alloc(struct qda_iommu_device *iommu_dev,
+		  struct qda_gem_obj *gem_obj, size_t size);
+void qda_dma_free(struct qda_gem_obj *gem_obj);
+int qda_dma_mmap(struct qda_gem_obj *gem_obj, struct vm_area_struct *vma);
+
+#endif /* __QDA_MEMORY_DMA_H__ */
diff --git a/drivers/accel/qda/qda_memory_manager.c b/drivers/accel/qda/qda_memory_manager.c
index 00a9c0ae4224..82111275f420 100644
--- a/drivers/accel/qda/qda_memory_manager.c
+++ b/drivers/accel/qda/qda_memory_manager.c
@@ -6,8 +6,11 @@
 #include <linux/spinlock.h>
 #include <linux/xarray.h>
 #include <drm/drm_file.h>
+#include <drm/drm_print.h>
 #include "qda_drv.h"
+#include "qda_gem.h"
 #include "qda_memory_manager.h"
+#include "qda_memory_dma.h"
 
 static void cleanup_all_memory_devices(struct qda_memory_manager *mem_mgr)
 {
@@ -28,6 +31,14 @@ static void cleanup_all_memory_devices(struct qda_memory_manager *mem_mgr)
 	pr_debug("qda: Completed cleanup of all memory devices\n");
 }
 
+static void init_iommu_device_fields(struct qda_iommu_device *iommu_dev)
+{
+	spin_lock_init(&iommu_dev->lock);
+	refcount_set(&iommu_dev->refcount, 0);
+	iommu_dev->assigned_pid = 0;
+	iommu_dev->assigned_file_priv = NULL;
+}
+
 static int allocate_device_id(struct qda_memory_manager *mem_mgr,
 			      struct qda_iommu_device *iommu_dev, u32 *id)
 {
@@ -44,6 +55,216 @@ static int allocate_device_id(struct qda_memory_manager *mem_mgr,
 	return 0;
 }
 
+static struct qda_iommu_device *find_device_for_pid(struct qda_memory_manager *mem_mgr,
+						    pid_t pid)
+{
+	unsigned long index;
+	void *entry;
+	struct qda_iommu_device *found_dev = NULL;
+	unsigned long flags;
+
+	xa_lock_irqsave(&mem_mgr->device_xa, flags);
+	xa_for_each(&mem_mgr->device_xa, index, entry) {
+		struct qda_iommu_device *iommu_dev = entry;
+
+		spin_lock(&iommu_dev->lock);
+		if (iommu_dev->assigned_pid == pid) {
+			found_dev = iommu_dev;
+			refcount_inc(&found_dev->refcount);
+			dev_dbg(found_dev->dev, "Reusing device id=%u for PID=%d (refcount=%u)\n",
+				found_dev->id, pid, refcount_read(&found_dev->refcount));
+			spin_unlock(&iommu_dev->lock);
+			break;
+		}
+		spin_unlock(&iommu_dev->lock);
+	}
+	xa_unlock_irqrestore(&mem_mgr->device_xa, flags);
+
+	return found_dev;
+}
+
+static struct qda_iommu_device *assign_available_device_to_pid(struct qda_memory_manager *mem_mgr,
+							       pid_t pid,
+							       struct drm_file *file_priv)
+{
+	unsigned long index;
+	void *entry;
+	struct qda_iommu_device *selected_dev = NULL;
+	unsigned long flags;
+
+	xa_lock_irqsave(&mem_mgr->device_xa, flags);
+	xa_for_each(&mem_mgr->device_xa, index, entry) {
+		struct qda_iommu_device *iommu_dev = entry;
+
+		spin_lock(&iommu_dev->lock);
+		if (iommu_dev->assigned_pid == 0) {
+			iommu_dev->assigned_pid = pid;
+			iommu_dev->assigned_file_priv = file_priv;
+			selected_dev = iommu_dev;
+			refcount_set(&selected_dev->refcount, 1);
+			dev_dbg(selected_dev->dev, "Assigned device id=%u to PID=%d\n",
+				selected_dev->id, pid);
+			spin_unlock(&iommu_dev->lock);
+			break;
+		}
+		spin_unlock(&iommu_dev->lock);
+	}
+	xa_unlock_irqrestore(&mem_mgr->device_xa, flags);
+
+	return selected_dev;
+}
+
+static struct qda_iommu_device *get_process_iommu_device(struct qda_memory_manager *mem_mgr,
+							 struct drm_file *file_priv)
+{
+	struct qda_file_priv *qda_priv;
+
+	if (!file_priv || !file_priv->driver_priv)
+		return NULL;
+
+	qda_priv = file_priv->driver_priv;
+	return qda_priv->assigned_iommu_dev;
+}
+
+/**
+ * qda_memory_manager_assign_device() - Assign an IOMMU device to a process
+ * @mem_mgr: Pointer to memory manager
+ * @file_priv: DRM file private data for process association
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_memory_manager_assign_device(struct qda_memory_manager *mem_mgr,
+				     struct drm_file *file_priv)
+{
+	struct qda_file_priv *qda_priv;
+	struct qda_iommu_device *selected_dev = NULL;
+	int ret = 0;
+	pid_t current_pid;
+
+	if (!file_priv || !file_priv->driver_priv) {
+		pr_err("qda: Invalid file_priv or driver_priv\n");
+		return -EINVAL;
+	}
+
+	qda_priv = file_priv->driver_priv;
+	current_pid = qda_priv->pid;
+
+	mutex_lock(&mem_mgr->process_assignment_lock);
+
+	if (qda_priv->assigned_iommu_dev) {
+		dev_dbg(qda_priv->assigned_iommu_dev->dev,
+			"PID=%d already has device id=%u assigned\n",
+			current_pid, qda_priv->assigned_iommu_dev->id);
+		ret = 0;
+		goto unlock_and_return;
+	}
+
+	selected_dev = find_device_for_pid(mem_mgr, current_pid);
+
+	if (selected_dev) {
+		qda_priv->assigned_iommu_dev = selected_dev;
+		goto unlock_and_return;
+	}
+
+	selected_dev = assign_available_device_to_pid(mem_mgr, current_pid, file_priv);
+
+	if (!selected_dev) {
+		pr_err("qda: No available device for PID=%d\n", current_pid);
+		ret = -ENOMEM;
+		goto unlock_and_return;
+	}
+
+	qda_priv->assigned_iommu_dev = selected_dev;
+
+unlock_and_return:
+	mutex_unlock(&mem_mgr->process_assignment_lock);
+	return ret;
+}
+
+static struct qda_iommu_device *get_or_assign_iommu_device(struct qda_memory_manager *mem_mgr,
+							   struct drm_file *file_priv)
+{
+	struct qda_iommu_device *iommu_dev;
+	int ret;
+
+	iommu_dev = get_process_iommu_device(mem_mgr, file_priv);
+	if (iommu_dev)
+		return iommu_dev;
+
+	ret = qda_memory_manager_assign_device(mem_mgr, file_priv);
+	if (ret)
+		return NULL;
+
+	iommu_dev = get_process_iommu_device(mem_mgr, file_priv);
+	if (iommu_dev)
+		return iommu_dev;
+
+	return NULL;
+}
+
+/**
+ * qda_memory_manager_alloc() - Allocate memory for a GEM object
+ * @mem_mgr: Pointer to memory manager
+ * @gem_obj: Pointer to GEM object to allocate memory for
+ * @file_priv: DRM file private data for process association
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int qda_memory_manager_alloc(struct qda_memory_manager *mem_mgr, struct qda_gem_obj *gem_obj,
+			     struct drm_file *file_priv)
+{
+	struct qda_iommu_device *selected_dev;
+	size_t size;
+	int ret;
+
+	if (!mem_mgr || !gem_obj || !file_priv) {
+		pr_err("qda: Invalid parameters for memory allocation\n");
+		return -EINVAL;
+	}
+
+	size = gem_obj->size;
+	if (size == 0) {
+		drm_err(gem_obj->base.dev, "Invalid allocation size: 0\n");
+		return -EINVAL;
+	}
+
+	selected_dev = get_or_assign_iommu_device(mem_mgr, file_priv);
+
+	if (!selected_dev) {
+		drm_err(gem_obj->base.dev,
+			"Failed to get/assign device for allocation (size=%zu)\n",
+			size);
+		return -ENOMEM;
+	}
+
+	ret = qda_dma_alloc(selected_dev, gem_obj, size);
+	if (ret) {
+		drm_err(gem_obj->base.dev, "Allocation failed: size=%zu, device_id=%u, ret=%d\n",
+			size, selected_dev->id, ret);
+		return ret;
+	}
+
+	drm_dbg_driver(gem_obj->base.dev,
+		       "Successfully allocated: size=%zu, device_id=%u, dma_addr=0x%llx\n",
+		       size, selected_dev->id, (u64)gem_obj->dma_addr);
+	return 0;
+}
+
+/**
+ * qda_memory_manager_free() - Free memory for a GEM object
+ * @mem_mgr: Pointer to memory manager
+ * @gem_obj: Pointer to GEM object to free memory for
+ */
+void qda_memory_manager_free(struct qda_memory_manager *mem_mgr, struct qda_gem_obj *gem_obj)
+{
+	if (!gem_obj || !gem_obj->iommu_dev) {
+		pr_debug("qda: Invalid gem_obj or iommu_dev for free\n");
+		return;
+	}
+
+	qda_dma_free(gem_obj);
+}
+
 /**
  * qda_memory_manager_register_device() - Register an IOMMU device
  * @mem_mgr: Pointer to memory manager
@@ -57,6 +278,8 @@ int qda_memory_manager_register_device(struct qda_memory_manager *mem_mgr,
 	int ret;
 	u32 id;
 
+	init_iommu_device_fields(iommu_dev);
+
 	ret = allocate_device_id(mem_mgr, iommu_dev, &id);
 	if (ret) {
 		dev_err(iommu_dev->dev,
@@ -95,6 +318,7 @@ int qda_memory_manager_init(struct qda_memory_manager *mem_mgr)
 	pr_debug("qda: Initializing memory manager\n");
 
 	xa_init_flags(&mem_mgr->device_xa, XA_FLAGS_ALLOC);
+	mutex_init(&mem_mgr->process_assignment_lock);
 
 	pr_debug("qda: Memory manager initialized successfully\n");
 	return 0;
diff --git a/drivers/accel/qda/qda_memory_manager.h b/drivers/accel/qda/qda_memory_manager.h
index 0243f9c0c5aa..252459bc10d0 100644
--- a/drivers/accel/qda/qda_memory_manager.h
+++ b/drivers/accel/qda/qda_memory_manager.h
@@ -7,8 +7,15 @@
 #define __QDA_MEMORY_MANAGER_H__
 
 #include <linux/device.h>
+#include <linux/mutex.h>
+#include <linux/refcount.h>
+#include <linux/spinlock.h>
 #include <linux/xarray.h>
-#include "qda_drv.h"
+#include <drm/drm_file.h>
+
+/* Forward declarations */
+struct qda_dev;
+struct qda_gem_obj;
 
 /**
  * struct qda_iommu_device - IOMMU device instance for memory management
@@ -21,10 +28,18 @@ struct qda_iommu_device {
 	struct device *dev;
 	/** @qdev: Back-pointer to the parent QDA device */
 	struct qda_dev *qdev;
+	/** @assigned_file_priv: DRM file private data for the assigned process */
+	struct drm_file *assigned_file_priv;
 	/** @id: Unique identifier assigned by the memory manager XArray */
 	u32 id;
 	/** @sid: Stream ID for IOMMU transactions */
 	u32 sid;
+	/** @assigned_pid: Process ID of the process assigned to this device */
+	pid_t assigned_pid;
+	/** @refcount: Number of users from the assigned process sharing this device */
+	refcount_t refcount;
+	/** @lock: Spinlock protecting @assigned_pid, @assigned_file_priv and @refcount */
+	spinlock_t lock;
 };
 
 /**
@@ -36,6 +51,8 @@ struct qda_iommu_device {
 struct qda_memory_manager {
 	/** @device_xa: XArray storing all registered IOMMU devices */
 	struct xarray device_xa;
+	/** @process_assignment_lock: Mutex protecting process-to-device assignments */
+	struct mutex process_assignment_lock;
 };
 
 int qda_memory_manager_init(struct qda_memory_manager *mem_mgr);
@@ -46,4 +63,13 @@ int qda_memory_manager_register_device(struct qda_memory_manager *mem_mgr,
 void qda_memory_manager_unregister_device(struct qda_memory_manager *mem_mgr,
 					  struct qda_iommu_device *iommu_dev);
 
+int qda_memory_manager_assign_device(struct qda_memory_manager *mem_mgr,
+				     struct drm_file *file_priv);
+
+int qda_memory_manager_alloc(struct qda_memory_manager *mem_mgr,
+			     struct qda_gem_obj *gem_obj,
+			     struct drm_file *file_priv);
+void qda_memory_manager_free(struct qda_memory_manager *mem_mgr,
+			     struct qda_gem_obj *gem_obj);
+
 #endif /* __QDA_MEMORY_MANAGER_H__ */

-- 
2.34.1

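Note on the SID arithmetic in qda_memory_dma.c: qda_dma_alloc() adds
((u64)sid << 32) to the address returned by dma_alloc_coherent(), and
get_actual_dma_addr() subtracts it again before dma_free_coherent() and
dma_mmap_coherent(). Below is a minimal userspace sketch of that round trip.
The helper names (sid_encode, sid_decode, sid_of) are hypothetical and do not
appear in the patch, and uint64_t stands in for dma_addr_t on the assumption
that dma_addr_t is 64 bits wide on the target.

```c
#include <stdint.h>

/* Hypothetical helper: place the stream ID in the upper 32 bits. */
static inline uint64_t sid_encode(uint64_t iova, uint32_t sid)
{
	return iova + ((uint64_t)sid << 32);
}

/* Hypothetical helper: recover the address the DMA API handed out. */
static inline uint64_t sid_decode(uint64_t encoded, uint32_t sid)
{
	return encoded - ((uint64_t)sid << 32);
}

/*
 * Hypothetical helper: read the SID back from an encoded address.
 * Only meaningful when the original address fits in the lower 32 bits.
 */
static inline uint32_t sid_of(uint64_t encoded)
{
	return (uint32_t)(encoded >> 32);
}
```

For example, sid_encode(0x1000, 5) yields 0x500001000, and decoding that
value with the same SID gives back 0x1000. If dma_addr_t were only 32 bits
wide, the shifted SID would be lost on assignment, so the scheme appears to
rely on a 64-bit dma_addr_t.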

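Note on the assignment policy in qda_memory_manager_assign_device(): the
function first looks for a device already bound to the caller's PID and takes
another reference on it, and only then claims the first unassigned device
(assigned_pid == 0) with a refcount of 1. Below is a minimal userspace sketch
of that two-pass policy. The names struct slot and slot_get_for_pid are
hypothetical, a plain array stands in for the XArray, and the sketch has no
locking, whereas the patch holds process_assignment_lock, the XArray lock and
the per-device spinlock.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the per-device assignment state. */
struct slot {
	pid_t assigned_pid;	/* 0 means unassigned, as in the patch */
	unsigned int refcount;
};

/* Return the slot for @pid, claiming a free one if none exists yet. */
static struct slot *slot_get_for_pid(struct slot *slots, size_t n, pid_t pid)
{
	size_t i;

	/* Pass 1: reuse a slot already bound to this PID. */
	for (i = 0; i < n; i++) {
		if (slots[i].assigned_pid == pid) {
			slots[i].refcount++;
			return &slots[i];
		}
	}

	/* Pass 2: claim the first unassigned slot. */
	for (i = 0; i < n; i++) {
		if (slots[i].assigned_pid == 0) {
			slots[i].assigned_pid = pid;
			slots[i].refcount = 1;
			return &slots[i];
		}
	}

	return NULL;	/* every slot belongs to another PID */
}
```

With two slots, two calls for PID 10 return the same slot with refcount 2,
a call for PID 11 takes the second slot, and a call for PID 12 returns NULL.
That last case is the one the patch reports as -ENOMEM.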
