Re: [RFC v1 3/4] swiotlb: Allow dynamic allocation of bounce buffers


From: Juerg Haefliger <juergh@proton.me>
To: Petr Tesarik <petrtesarik@huaweicloud.com>
Cc: Christoph Hellwig <hch@lst.de>, Jonathan Corbet <corbet@lwn.net>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>, Borislav Petkov <bp@suse.de>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Damien Le Moal <damien.lemoal@opensource.wdc.com>,
	Kim Phillips <kim.phillips@amd.com>,
	"Steven Rostedt (Google)" <rostedt@goodmis.org>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:DMA MAPPING HELPERS" <iommu@lists.linux.dev>,
	Roberto Sassu <roberto.sassu@huawei.com>,
	petr@tesarici.cz, Alexander Graf <graf@amazon.com>
Subject: Re: [RFC v1 3/4] swiotlb: Allow dynamic allocation of bounce buffers
Date: Fri, 31 Mar 2023 07:26:09 +0000
Message-ID: <20230331092553.677e9649@smeagol>
In-Reply-To: <4268fa4e-4f0f-a2f6-a2a5-5b78ca4a073d@huaweicloud.com>


On Tue, 28 Mar 2023 09:54:35 +0200
Petr Tesarik <petrtesarik@huaweicloud.com> wrote:

> On 3/28/2023 6:07 AM, Christoph Hellwig wrote:
> > [adding Alex as he has been interested in this in the past]
> >
> > On Mon, Mar 20, 2023 at 01:28:15PM +0100, Petr Tesarik wrote:
> >> Second, on the Raspberry Pi 4, swiotlb is used by dma-buf for pages
> >> moved from the rendering GPU (v3d driver), which can access all
> >> memory, to the display output (vc4 driver), which is connected to a
> >> bus with an address limit of 1 GiB and no IOMMU. These buffers can
> >> be large (several megabytes) and cannot be handled by SWIOTLB,
> >> because they exceed the maximum segment size of 256 KiB. Such mapping
> >> failures can be easily reproduced on a Raspberry Pi 4: Starting
> >> GNOME remote desktop results in a flood of kernel messages like
> >> these:
> >
> > Shouldn't we make sure dma-buf allocates the buffers for the most
> > restricted devices, and more importantly does something like a dma
> > coherent allocation instead of a dynamic mapping of random memory?
> >
> > While a larger swiotlb works around this I don't think this fixes the root
> > cause.
>
> I tend to agree here. However, it's the DMABUF design itself that causes
> some trouble. The buffer is allocated by the v3d driver, which does not
> have the restriction, so the DMA API typically allocates an address
> somewhere near the 4G boundary. Userspace then exports the buffer, sends
> it to another process as a file descriptor and imports it into the vc4
> driver, which requires DMA below 1G. In the beginning, v3d had no idea
> that the buffer would be exported to userspace, much less that it would
> be later imported into vc4.
>
> Anyway, I suspected that the buffers need not be imported into the vc4
> driver (also hinted by Eric Anholt in a 2018 blog post [1]), and it
> seems I was right. I encountered the issue with Ubuntu 22.10; I
> installed latest openSUSE Tumbleweed yesterday, and I was not able to
> reproduce the issue there, most likely because the Mesa drivers have
> been fixed meanwhile. This makes the specific case of the Raspberry Pi 4
> drivers moot. The issue may still affect other combinations of drivers,
> but I don't have any other real-world example ATM.

I'm only seeing this problem with Wayland, no issue when switching Ubuntu to
X. It seems Tumbleweed is using X by default.

...Juerg


> [1] https://anholt.github.io/twivc4/2018/02/12/twiv/
>
> >> 1. The value is limited to ULONG_MAX, which is too little both for
> >>    physical addresses (e.g. x86 PAE or 32-bit ARM LPAE) and DMA
> >>    addresses (e.g. Xen guests on 32-bit ARM).
> >>
> >> 2. Since buffers are currently allocated with page granularity, a
> >>    PFN can be used instead. However, some values are reserved by
> >>    the maple tree implementation. Liam suggests to use
> >>    xa_mk_value() in that case, but that reduces the usable range by
> >>    half. Luckily, 31 bits are still enough to hold a PFN on all
> >>    32-bit platforms.
> >>
> >> 3. Software IO TLB is used from interrupt context. The maple tree
> >>    implementation is not IRQ-safe (MT_FLAGS_LOCK_IRQ does nothing
> >>    AFAICS). Instead, I use an external lock, spin_lock_irqsave() and
> >>    spin_unlock_irqrestore().
> >>
> >> Note that bounce buffers are never allocated dynamically if the
> >> software IO TLB is in fact a DMA restricted pool, which is intended
> >> to stay in its designated location in physical memory.
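
To make points 1 and 2 above concrete, here is a small userspace sketch
(not code from the patch). It assumes 4 KiB pages and uses uint32_t as a
stand-in for a 32-bit kernel's unsigned long; all sketch_* names and
addr_fits_ulong32() are made up for illustration. The tagging mirrors what
xa_mk_value() and xa_to_value() do: shift left by one and set bit 0, which
is why only 31 bits remain for the payload.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stand-in for unsigned long on a 32-bit kernel. */
typedef uint32_t ulong32_t;

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Same encoding as xa_mk_value(): shift left by one, set the tag bit. */
static inline ulong32_t sketch_mk_value(ulong32_t v)
{
	return (v << 1) | 1;
}

/* Same decoding as xa_to_value(): drop the tag bit. */
static inline ulong32_t sketch_to_value(ulong32_t entry)
{
	return entry >> 1;
}

/* Point 1: does a 64-bit address fit in a 32-bit unsigned long? */
static inline int addr_fits_ulong32(uint64_t addr)
{
	return addr <= UINT32_MAX;
}

/* Point 2: store the PFN instead; it must fit in the remaining 31 bits. */
static inline ulong32_t sketch_encode_addr(uint64_t addr)
{
	uint64_t pfn = addr >> SKETCH_PAGE_SHIFT;

	assert(pfn < (UINT64_C(1) << 31));
	return sketch_mk_value((ulong32_t)pfn);
}

/* Recover the page-aligned address from a tagged entry. */
static inline uint64_t sketch_decode_addr(ulong32_t entry)
{
	return (uint64_t)sketch_to_value(entry) << SKETCH_PAGE_SHIFT;
}
```

With these helpers, an address of 0x100000000 (just above 4 GiB) does not
fit in the 32-bit type, but its PFN 0x100000 encodes to 0x200001 and
decodes back to the original page address. With 4 KiB pages, 31 bits of
PFN cover 8 TiB, which is more than a 36-bit PAE or 40-bit LPAE physical
address space needs.
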
> >
> > I'm a little worried about all that because it causes quite a bit
> > of overhead even for callers that don't end up going into the
> > dynamic range or do not use swiotlb at all.  I don't really have a
> > good answer here except for the usual "avoid bounce buffering whenever
> > you can", which might not always be easy to do.
>
> I'm also worried about all this overhead. OTOH I was not able to confirm
> it, because the difference between two successive fio test runs on an
> unmodified kernel was bigger than the difference between a vanilla and a
> patched kernel, except the maximum completion latency, which OTOH
> affected less than 0.01% of all requests.
>
> BTW my testing also suggests that the streaming DMA API is quite
> inefficient, because UAS performance _improved_ with swiotlb=force.
> Sure, this should probably be addressed in the UAS and/or xHCI driver,
> but what I mean is that moving away from swiotlb may even cause
> performance regressions, which is counter-intuitive. At least I would
> _not_ have expected it.
>
> >> +	gfp = (attrs & DMA_ATTR_MAY_SLEEP) ? GFP_KERNEL : GFP_NOWAIT;
> >> +	slot = kmalloc(sizeof(*slot), gfp | __GFP_NOWARN);
> >> +	if (!slot)
> >> +		goto err;
> >> +
> >> +	slot->orig_addr = orig_addr;
> >> +	slot->alloc_size = alloc_size;
> >> +	slot->page = dma_direct_alloc_pages(dev, PAGE_ALIGN(alloc_size),
> >> +					    &slot->dma_addr, dir,
> >> +					    gfp | __GFP_NOWARN);
> >> +	if (!slot->page)
> >> +		goto err_free_slot;
> >
> > Without GFP_NOIO allocations this will deadlock eventually.
>
> Ah, that would affect the sleeping case (GFP_KERNEL), right?
>
> Petr T
>



