From: Philippe Gerum <rpm@xenomai.org>
To: "François Legal" <devel@thom.fr.eu.org>
Cc: xenomai@xenomai.org
Subject: Re: Doing DMA from peripheral to userland memory
Date: Thu, 02 Sep 2021 19:12:50 +0200
Message-ID: <871r669a0d.fsf@xenomai.org>
In-Reply-To: <10a7-6130ff00-12b-35a6e6c0@10227466>


François Legal <devel@thom.fr.eu.org> writes:

> On Wednesday, September 01, 2021 10:24 CEST, François Legal via Xenomai <xenomai@xenomai.org> wrote:
>  
>> On Tuesday, August 31, 2021 19:37 CEST, Philippe Gerum <rpm@xenomai.org> wrote:
>>  
>> > 
>> > François Legal <devel@thom.fr.eu.org> writes:
>> > 
>> > > On Friday, August 27, 2021 16:36 CEST, Philippe Gerum <rpm@xenomai.org> wrote:
>> > >  
>> > >> 
>> > >> François Legal <devel@thom.fr.eu.org> writes:
>> > >> 
>> > >> > On Friday, August 27, 2021 15:54 CEST, Philippe Gerum <rpm@xenomai.org> wrote:
>> > >> >  
>> > >> >> 
>> > >> >> François Legal <devel@thom.fr.eu.org> writes:
>> > >> >> 
>> > >> >> > On Friday, August 27, 2021 15:01 CEST, Philippe Gerum <rpm@xenomai.org> wrote:
>> > >> >> >  
>> > >> >> >> 
>> > >> >> >> François Legal via Xenomai <xenomai@xenomai.org> writes:
>> > >> >> >> 
>> > >> >> >> > Hello,
>> > >> >> >> >
>> > >> >> >> > Working on a Zynq-7000 target (ARM Cortex-A9), we have a peripheral that generates loads of data (many kbytes per ms).
>> > >> >> >> >
>> > >> >> >> > We would like to move that data directly from the peripheral memory (the OCM of the SoC) to our RT application's user memory using DMA.
>> > >> >> >> >
>> > >> >> >> > For one part of the data, we would like the DMA to de-interlace that data while moving it. We figured out that the PL330 peripheral on the SoC should be able to do it; however, we would like, as much as possible, to keep one or two channels of the PL330 available for plain Linux, non-RT use (via dmaengine).
>> > >> >> >> >
>> > >> >> >> > My first attempt would be to extend the dmaengine API with an RT API, then implement the RT API calls in the PL330 driver.
>> > >> >> >> >
>> > >> >> >> > What do you think of this approach, and is it achievable at all (DMA directly to userland memory, and/or having some DMA channels used by Xenomai and others by Linux)?
>> > >> >> >> >
>> > >> >> >> > Thanks in advance
>> > >> >> >> >
>> > >> >> >> > François
>> > >> >> >> 
>> > >> >> >> As a starting point, you may want to have a look at this document:
>> > >> >> >> https://evlproject.org/core/oob-drivers/dma/
>> > >> >> >> 
>> > >> >> >> This is part of the EVL core documentation, but this is actually a
>> > >> >> >> Dovetail feature.
>> > >> >> >> 
>> > >> >> >
>> > >> >> > Well, that's pretty much what I want to do, so it is very good news that this is already available going forward. However, I need it through the I-pipe right now, but I guess the process stays the same (patching the dmaengine API and the DMA engine driver).
>> > >> >> >
>> > >> >> > I would guess the modifications to the DMA engine driver could then be easily ported to Dovetail?
>> > >> >> >
>> > >> >> 
>> > >> >> Since they should follow the same pattern used for the controllers
>> > >> >> Dovetail currently supports, I think so. You should actually be able
>> > >> >> to simplify the code when porting it to Dovetail.
>> > >> >> 
>> > >> >
>> > >> > That's what I thought. Thanks a lot.
>> > >> >
>> > >> > So now, regarding the "to userland memory" aspect: I guess that, to make this happen, I will somehow have to change the PTE flags to make these pages non-cacheable (using dma_map_page maybe), but I wonder whether I have to map the userland pages into kernel space, and whether I have to pin the userland pages in memory (I believe mlockall in the userland process does that already)?
>> > >> >
>> > >> 
>> > >> The out-of-band SPI support available from EVL illustrates a possible
>> > >> implementation. This code [2] implements what is described in this page
>> > >> [1].
>> > >> 
>> > >
>> > > Thanks for the example. I think what I'm trying to do is a little different from this, however.
>> > > For the record, this is what I do (and it seems to be working):
>> > > - as soon as the userland buffers are allocated, tell the driver to pin the userland buffer pages in memory (with get_user_pages_fast). I'm not sure this is required, as I think mlockall in the app would already take care of that.
>> > > - whenever I need to transfer data to the userland buffer, instruct the driver to DMA-map those userland pages (with dma_map_page), then give the DMA controller the physical address of these pages.
>> > > et voilà
>> > >
>> > > This seems to work correctly and repeatedly so far.
>> > >
>> > 
>> > Are transfers controlled from the real-time stage, and if so, how do you
>> > deal with cache maintenance between transfers?
>> 
>> That is my next problem to fix. As long as I run the test program in the debugger, displaying the buffer filled by the DMA in GDB, everything is fine. When GDB gets out of the way, I seem to read data that was in the D-cache before the DMA did the transfer.
>> I tried adding a flush_dcache_range before triggering the DMA, but it did not help.
>> 
>> Any suggestions?
>> 
>> Thanks
>> 
>> François
>> 
>
> So I dug deep into the kernel cache management code for my (ARMv7) arch, but could find neither an answer nor a solution.
> I now wonder whether this (DMA to userland memory) is possible on this arch at all, because of what is suggested in [1], even if that's a bit old.
>
> I saw that flush_dcache_range on ARMv7 is pretty much a no-op, so I tried dmac_flush_range (which does the real thing with CP15), passing either the userland virtual address directly or first getting a kernel mapping with kmap_atomic, but that did not change anything. Most of the time, I still get the first two cache lines of data wrong in the userland application after the DMA transfer is done.
>
> I'm not sure where to look next.
>

DMA to userland memory is a non-issue in the regular in-band
context. The problem starts with cache maintenance when you want to run
these I/O requests from the oob stage, hence my previous question.

The rule of thumb is that a driver should not fiddle with the innards of
cache maintenance directly, and certainly not with flush_dcache_range()
and friends. This includes Xenomai drivers. The DMA API hides these
details in a portable way: typically, the DMA streaming API cleans
and/or invalidates the cache layers when mapping and unmapping buffers.
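
To illustrate why that invalidate step matters, here is a toy
user-space model in C. It is not kernel code and does not reflect how
any real cache controller is programmed. Every name in it (ram, cache,
cpu_read, dma_write, invalidate_range, run_transfer) and the 32-byte
line size are assumptions made up for this sketch. It only shows how a
line cached before a transfer hides what the device wrote afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32	/* assumed cache line size, toy value */
#define NLINES    4
#define MEM_SIZE  (NLINES * LINE_SIZE)

static uint8_t ram[MEM_SIZE];	/* what the DMA engine writes to */
static uint8_t cache[MEM_SIZE];	/* what the CPU sees on a cache hit */
static bool valid[NLINES];	/* per-line valid bit */

/* CPU load: a miss fills the whole line from RAM, a hit ignores RAM. */
static uint8_t cpu_read(size_t addr)
{
	size_t line = addr / LINE_SIZE;

	assert(addr < MEM_SIZE);
	if (!valid[line]) {
		memcpy(&cache[line * LINE_SIZE],
		       &ram[line * LINE_SIZE], LINE_SIZE);
		valid[line] = true;
	}
	return cache[addr];
}

/* Device write: goes straight to RAM, the cache is not updated. */
static void dma_write(size_t addr, uint8_t value, size_t len)
{
	assert(addr + len <= MEM_SIZE);
	memset(&ram[addr], value, len);
}

/* Drop every cached line overlapping [addr, addr + len). */
static void invalidate_range(size_t addr, size_t len)
{
	size_t first = addr / LINE_SIZE;
	size_t last = (addr + len - 1) / LINE_SIZE;

	for (size_t line = first; line <= last; line++)
		valid[line] = false;
}

/* Returns the first byte the CPU reads back after a DMA of 0xAA bytes. */
uint8_t run_transfer(bool invalidate_after_dma)
{
	memset(ram, 0, sizeof(ram));
	memset(valid, 0, sizeof(valid));

	(void)cpu_read(0);			/* line 0 gets cached */
	dma_write(0, 0xAA, 2 * LINE_SIZE);	/* device fills two lines */
	if (invalidate_after_dma)
		invalidate_range(0, 2 * LINE_SIZE);

	return cpu_read(0);
}
```

In this model, run_transfer(false) returns the stale 0x00 while
run_transfer(true) returns 0xAA. In a real driver this bookkeeping
belongs to the DMA API, not to hand-written cache calls.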

Problem: we may not use the regular DMA API from oob context.  For
instance, if some IOMMU is involved, or bounce buffers of some sort
exist, or complex cache management layers in the kernel are traversed in
general (e.g. some outer L2 caches are ugly), then things might get
pretty nasty if this rule is not followed. For this reason, if using
coherent memory is practical performance-wise for the use case, then
this is a sane option for oob I/O, and you can do that as illustrated by
the example I referred to.

In this case, the kernel should allocate a suitable chunk of coherent
memory for your application to perform I/O, rather than your application
requesting ordinary cached memory from its address space to be pinned
and used for DMA.
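
A minimal sketch of that arrangement, assuming a plain character
device with an in-band mmap handler. This is not the EVL SPI code
referred to earlier, and struct demo_dev and the demo_* helpers are
hypothetical names. The fragment only builds inside a kernel tree.
Allocation and mapping happen once, in-band, so that the oob path only
has to hand dma_addr to the DMA controller.

```c
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/* Hypothetical per-device state, for illustration only. */
struct demo_dev {
	struct device *dev;	/* the DMA-capable device */
	void *cpu_addr;		/* kernel virtual address of the buffer */
	dma_addr_t dma_addr;	/* address to program into the controller */
	size_t size;
};

/* In-band setup (e.g. probe or ioctl): allocate coherent memory once. */
static int demo_alloc(struct demo_dev *d, size_t size)
{
	d->size = PAGE_ALIGN(size);
	d->cpu_addr = dma_alloc_coherent(d->dev, d->size, &d->dma_addr,
					 GFP_KERNEL);
	return d->cpu_addr ? 0 : -ENOMEM;
}

/* In-band mmap handler: expose the same buffer to the application. */
static int demo_mmap(struct file *filp, struct vm_area_struct *vma)
{
	struct demo_dev *d = filp->private_data;

	if (vma->vm_end - vma->vm_start > d->size)
		return -EINVAL;

	return dma_mmap_coherent(d->dev, vma, d->cpu_addr, d->dma_addr,
				 d->size);
}

/* In-band teardown: release the buffer once no transfer is pending. */
static void demo_free(struct demo_dev *d)
{
	dma_free_coherent(d->dev, d->size, d->cpu_addr, d->dma_addr);
}
```

The application then mmap()s the device node and reads the data in
place, with no per-transfer mapping or cache maintenance.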

-- 
Philippe.



Thread overview: 12+ messages
2021-08-27  9:29 Doing DMA from peripheral to userland memory François Legal
2021-08-27 13:01 ` Jan Kiszka
2021-08-27 13:01 ` Philippe Gerum
2021-08-27 13:44   ` François Legal
2021-08-27 13:54     ` Philippe Gerum
2021-08-27 14:09       ` François Legal
2021-08-27 14:36         ` Philippe Gerum
2021-08-31  9:36           ` François Legal
2021-08-31 17:37             ` Philippe Gerum
2021-09-01  8:24               ` François Legal
2021-09-02 16:41                 ` François Legal
2021-09-02 17:12                   ` Philippe Gerum [this message]
