From: Catalin Marinas <catalin.marinas@arm.com>
To: "Petr Tesařík" <petr@tesarici.cz>
Cc: Christoph Hellwig <hch@lst.de>,
"Michael Kelley (LINUX)" <mikelley@microsoft.com>,
Petr Tesarik <petrtesarik@huaweicloud.com>,
Jonathan Corbet <corbet@lwn.net>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
Maxime Ripard <mripard@kernel.org>,
Thomas Zimmermann <tzimmermann@suse.de>,
David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Robin Murphy <robin.murphy@arm.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Borislav Petkov <bp@suse.de>,
Randy Dunlap <rdunlap@infradead.org>,
Damien Le Moal <damien.lemoal@opensource.wdc.com>,
Kim Phillips <kim.phillips@amd.com>,
"Steven Rostedt (Google)" <rostedt@goodmis.org>,
Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
Hans de Goede <hdegoede@redhat.com>,
Jason Gunthorpe <jgg@ziepe.ca>, Kees Cook <keescook@chromium.org>,
Thomas Gleixner <tglx@linutronix.de>,
"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
open list <linux-kernel@vger.kernel.org>,
"open list:DRM DRIVERS" <dri-devel@lists.freedesktop.org>,
"open list:DMA MAPPING HELPERS" <iommu@lists.linux.dev>,
Roberto Sassu <roberto.sassu@huawei.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: Re: [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce buffers
Date: Tue, 16 May 2023 18:59:30 +0100
Message-ID: <ZGPEgsplBSsI9li3@arm.com>
In-Reply-To: <20230516083942.0303b5fb@meshulam.tesarici.cz>

On Tue, May 16, 2023 at 08:39:42AM +0200, Petr Tesařík wrote:
> On Tue, 16 May 2023 08:13:09 +0200
> Christoph Hellwig <hch@lst.de> wrote:
> > On Mon, May 15, 2023 at 07:43:52PM +0000, Michael Kelley (LINUX) wrote:
> > > FWIW, I don't think the approach you have implemented here will be
> > > practical to use for CoCo VMs (SEV, TDX, whatever else). The problem
> > > is that dma_direct_alloc_pages() and dma_direct_free_pages() must
> > > call dma_set_decrypted() and dma_set_encrypted(), respectively. In CoCo
> > > VMs, these calls are expensive because they require a hypercall to the host,
> > > and the operation on the host isn't trivial either. I haven't measured the
> > > overhead, but doing a hypercall on every DMA map operation and on
> > > every unmap operation has long been something we thought we must
> > > avoid. The fixed swiotlb bounce buffer space solves this problem by
> > > doing set_decrypted() in batch at boot time, and never
> > > doing set_encrypted().
> >
> > I also suspect it doesn't really scale too well due to the number of
> > allocations. I suspect a better way to implement things would be to
> > add more large chunks that are used just like the main swiotlb buffers.
> >
> > That is when we run out of space try to allocate another chunk of the
> > same size in the background, similar to what we do with the pool in
> > dma-pool.c. This means we'll do a fairly large allocation, so we'll
> > need compaction or even CMA to back it up, but the other big upside
> > is that it also reduces the number of buffers that need to be checked
> > in is_swiotlb_buffer or the free / sync side.
>
> I have considered this approach. The two main issues I ran into were:
>
> 1. MAX_ORDER allocations were too small (at least with 4K pages), and
> even then they would often fail.
>
> 2. Allocating from CMA did work but only from process context.
> I made a stab at modifying the CMA allocator to work from interrupt
> context, but there are non-trivial interactions with the buddy
> allocator. Making them safe from interrupt context looked like a
> major task.

Can you kick off a worker thread when the swiotlb allocation gets past
some reserve limit? There is still a risk of failing to bounce until
the swiotlb buffer has been extended.

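To make the idea concrete, here is a rough userspace model of it, not
swiotlb code. The struct and function names are made up for
illustration; in the kernel the "grow_pending" flag would be a queued
work item, and the worker body would do the actual large allocation in
process context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a bounce-buffer pool; not struct io_tlb_mem. */
struct bounce_pool {
	size_t total_slots;	/* slots currently owned by the pool */
	size_t used_slots;	/* slots handed out to mappings */
	size_t reserve_slots;	/* ask for growth when fewer are free */
	size_t chunk_slots;	/* slots added by one growth step */
	bool grow_pending;	/* stands in for a queued work item */
};

/* Map path: may run in atomic context, so it never allocates memory. */
static bool pool_alloc_slot(struct bounce_pool *p)
{
	if (p->used_slots == p->total_slots)
		return false;	/* exhausted: bouncing fails until growth */

	p->used_slots++;
	if (p->total_slots - p->used_slots < p->reserve_slots)
		p->grow_pending = true;	/* kernel: queue the work item here */
	return true;
}

/* Unmap path: return one slot to the pool. */
static void pool_free_slot(struct bounce_pool *p)
{
	assert(p->used_slots > 0);
	p->used_slots--;
}

/* Worker body: runs later, where sleeping allocations are allowed. */
static void pool_grow_worker(struct bounce_pool *p)
{
	if (!p->grow_pending)
		return;
	p->total_slots += p->chunk_slots;	/* model of adding one chunk */
	p->grow_pending = false;
}
```

The window between pool_alloc_slot() returning false and
pool_grow_worker() running is exactly the failure risk mentioned above;
a larger reserve shrinks that window but does not close it.
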
> I also had some fears about the length of the dynamic buffer list. I
> observed maximum length for block devices, and then it roughly followed
> the queue depth. Walking a few hundred buffers was still fast enough.
> I admit the list length may become an issue with high-end NVMe and
> I/O-intensive applications.

You could replace the list with an rbtree. An O(log n) look-up instead
of O(n) could be faster if you have many bounces active.

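As a userspace sketch of such a look-up (assumption: this uses POSIX
tsearch()/tfind(), which glibc implements as a red-black tree although
POSIX does not require that; kernel code would use <linux/rbtree.h>
instead, and struct bounce_range is a made-up name):

```c
#include <assert.h>
#include <search.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one dynamically allocated bounce buffer. */
struct bounce_range {
	uintptr_t start;
	size_t size;
};

/*
 * Order non-overlapping ranges by address. Overlapping ranges compare
 * equal, so a one-byte probe matches the buffer that contains it.
 */
static int range_cmp(const void *a, const void *b)
{
	const struct bounce_range *ra = a, *rb = b;

	if (ra->start + ra->size <= rb->start)
		return -1;
	if (rb->start + rb->size <= ra->start)
		return 1;
	return 0;
}

/* Returns 0 on success, -1 on allocation failure or overlap. */
static int range_insert(void **root, struct bounce_range *r)
{
	void *slot;

	assert(r->size > 0);
	slot = tsearch(r, root, range_cmp);
	return (slot && *(struct bounce_range **)slot == r) ? 0 : -1;
}

/* Returns the buffer containing addr, or NULL if addr is not bounced. */
static struct bounce_range *range_find(void *const *root, uintptr_t addr)
{
	struct bounce_range probe = { .start = addr, .size = 1 };
	void *slot = tfind(&probe, root, range_cmp);

	return slot ? *(struct bounce_range **)slot : NULL;
}
```

A check along the lines of is_swiotlb_buffer() would then be a single
range_find() call per address rather than a walk over every active
buffer.
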
--
Catalin