From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
To: Maxime Ripard <mripard@redhat.com>
Cc: Lucas Stach <dev@lynxeye.de>, Milan Zamazal <mzamazal@redhat.com>,
Christoph Hellwig <hch@lst.de>,
iommu@lists.linux.dev, Will Deacon <will@kernel.org>,
catalin.marinas@arm.com,
Bryan O'Donoghue <bryan.odonoghue@linaro.org>,
Andrey Konovalov <andrey.konovalov.ynk@gmail.com>,
Pavel Machek <pavel@ucw.cz>,
kieran.bingham@ideasonboard.com,
Hans de Goede <hdegoede@redhat.com>
Subject: Re: Re: Uncached buffers from CMA DMA heap on some Arm devices?
Date: Mon, 29 Jan 2024 14:05:20 +0200
Message-ID: <20240129120520.GA8131@pendragon.ideasonboard.com>
In-Reply-To: <rhfec3v57xeoliukqxaives3uerb6phbxphgcua6mvxkdnzrzr@xm5zvoevxf4v>
Hi Maxime,
On Fri, Jan 26, 2024 at 01:17:50PM +0100, Maxime Ripard wrote:
> On Thu, Jan 25, 2024 at 12:41:01PM +0100, Lucas Stach wrote:
> > On Wednesday, 24.01.2024 at 19:27 +0100, Milan Zamazal wrote:
> > > Hello,
> > >
> > > in the libcamera project, we experience a major performance problem related to
> > > DMA buffers while working on camera image processing on the CPU. This happens
> > > only with some Arm boards; we have observed it on Debix Model A (NXP i.MX 8M
> > > Plus) and PinePhone. We use the /dev/dma_heap/linux,cma (or reserved) DMA
> > > buffer heap on Arm.
> > >
> > > Reading V4L2 camera data from the buffers is very slow. When we memcpy the data
> > > from the buffer to malloc'ed memory before working with it (reading each byte
> > > multiple times, without any big non-sequential jumps across the data), we get
> > > a speedup of more than 10 times. It looks like the input buffer is uncached.
> > >
> > That's right and a reality you have to deal with on those small ARM
> > systems. The ARM architecture allows for systems that don't enforce
> > hardware coherency across the whole SoC and many of the small/cheap SoC
> > variants make use of this architectural feature.
> >
> > What this means is that the CPU caches aren't coherent when it comes to
> > DMA from other masters like the video capture units. There are two ways
> > to enforce DMA coherency on such systems:
> > 1. map the DMA buffers uncached on the CPU
> > 2. require explicit cache maintenance when touching DMA buffers with
> > the CPU
> >
> > Option 1 is what you are seeing in your setup, as it is simple,
> > straightforward and doesn't require any synchronization points.
> >
> > Option 2 could be implemented by allocating cached DMA buffers in the
> > V4L2 device and then executing the necessary cache synchronization in
> > qbuf/dqbuf when ownership of the DMA buffer changes between CPU and DMA
> > master. However this isn't guaranteed to be any faster, as the cache
> > synchronization itself is a pretty heavy-weight operation when you are
> > dealing with buffers that are potentially multi-megabytes in size.
>
> My understanding was that the CMA DMA Heap is already allocating
> cacheable buffers,
I'll be a bit pedantic here. As far as I understand, the CMA heap
doesn't allocate "cacheable" buffers. It allocates pages, and they are
not inherently cached or uncached. Whether a page is mapped to the CPU
as cached or uncached is a decision made at mapping time. Unless I'm
mistaken, the CMA heap maps pages to userspace cached.
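To make the allocate-then-map distinction concrete, here is a minimal
userspace sketch, assuming the DMA heap uAPI from <linux/dma-heap.h>. The
helper names heap_alloc() and heap_map() are hypothetical and only serve the
illustration. The allocation ioctl returns a dmabuf fd, and the CPU mapping
attributes only come into play later, when the exporter handles mmap().

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/dma-heap.h>

/*
 * Allocate len bytes from a DMA heap such as /dev/dma_heap/linux,cma.
 * Returns a dmabuf fd on success, -1 on failure. Nothing is mapped to
 * the CPU at this point. (Hypothetical helper, not an existing API.)
 */
int heap_alloc(const char *heap_path, size_t len)
{
	struct dma_heap_allocation_data data = {
		.len = len,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	int heap_fd = open(heap_path, O_RDONLY | O_CLOEXEC);
	int ret;

	if (heap_fd < 0)
		return -1;

	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
	close(heap_fd);

	return ret < 0 ? -1 : (int)data.fd;
}

/*
 * Map the dmabuf into the process. Whether the mapping ends up cached
 * or uncached is decided by the exporter's mmap handler, not by the
 * caller. Returns MAP_FAILED on error. (Hypothetical helper.)
 */
void *heap_map(int dmabuf_fd, size_t len)
{
	return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
		    dmabuf_fd, 0);
}
```

A caller would typically check both return values, and close() the fd and
munmap() the mapping when done.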
> with the expectation that you need to call the dma-buf cache
> management ioctl. Is it not?
Someone has to manage the cache, yes. It can be done explicitly by
userspace through the dmabuf sync ioctl, or implicitly within the
kernel. For instance, when queueing a dmabuf to a V4L2 device that uses
videobuf2-dma-contig, the QBUF ioctl ends up calling
flush_kernel_vmap_range() and dma_sync_sgtable_for_device() (see
vb2_dc_prepare()). videobuf2-vmalloc, on the other hand, has no cache
handling, which is a known issue when sharing buffers with the display.
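For the explicit userspace path, a minimal sketch of bracketing CPU reads
with the dmabuf sync ioctl could look as follows. It assumes the uAPI from
<linux/dma-buf.h>; the helper names are hypothetical. The retry on EINTR and
EAGAIN follows what the dma-buf documentation asks of callers.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue DMA_BUF_IOCTL_SYNC, restarting it if it was interrupted. */
static int dmabuf_sync(int dmabuf_fd, __u64 flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/* Call before the CPU starts reading the mapped buffer. */
int dmabuf_cpu_read_begin(int dmabuf_fd)
{
	return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ);
}

/* Call once the CPU is done reading, before handing the buffer back. */
int dmabuf_cpu_read_end(int dmabuf_fd)
{
	return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

Userspace would call dmabuf_cpu_read_begin() after dequeuing the buffer and
dmabuf_cpu_read_end() before queueing it again. For CPU writes,
DMA_BUF_SYNC_WRITE or DMA_BUF_SYNC_RW would replace DMA_BUF_SYNC_READ.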
On a side note, the cache handling in videobuf2-dma-contig.c seems
problematic to me, as vb2 shouldn't assume much about imported dmabufs.
It should instead use the operations exposed by dmabuf to delegate cache
handling to the exporter.
--
Regards,
Laurent Pinchart