From: Robin Murphy <robin.murphy@arm.com>
To: "GuoRui.Yu" <GuoRui.Yu@linux.alibaba.com>,
linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
konrad.wilk@oracle.com, linux-coco@lists.linux.dev
Subject: Re: [RFC] swiotlb: Add a new cc-swiotlb implementation for Confidential VMs
Date: Mon, 30 Jan 2023 13:03:18 +0000 [thread overview]
Message-ID: <0ee204fa-091d-43d9-9c2c-0c64cf0c1fdd@arm.com> (raw)
In-Reply-To: <20230128083254.86012-1-GuoRui.Yu@linux.alibaba.com>
On 2023-01-28 08:32, GuoRui.Yu wrote:
> This patch series adds a new swiotlb implementation, cc-swiotlb, for
> Confidential VMs (such as TDX and SEV-SNP). The new cc-swiotlb allocates
> the DMA TLB buffer dynamically in runtime instead of allocating at boot
> with a fixed size. Furthermore, future optimization and security
> enhancement could be applied on cc-swiotlb without "infecting" the
> legacy swiotlb.
>
> Background
> ==========
> Under COnfidential COmputing (CoCo) scenarios, the VMM cannot access
> guest memory directly but requires the guest to explicitly mark the
> memory as shared (decrypted). To make the streaming DMA mappings work,
> the current implementation relays on legacy SWIOTLB to bounce the DMA
> buffer between private (encrypted) and shared (decrypted) memory.
>
> However, the legacy swiotlb is designed for compatibility rather than
> efficiency and CoCo purpose, which will inevitably introduce some
> unnecessary restrictions.
>
> 1. Fixed immutable swiotlb size cannot accommodate to requirements of
> multiple devices. And 1GiB (current maximum size) of swiotlb in our
> testbed cannot afford multiple disks reads/writes simultaneously.
That's not a very logical argument - if a particular use-case needs a
particular total amount of SWIOTLB capacity, then that's how much it
needs. Whether that capacity is allocated up-front or allocated
gradually over time doesn't affect that. The obvious solution to this
issue as presented is "make the maximum size bigger", not "add a whole
other SWIOTLB implementation".
> 2. Fixed immutable IO_TLB_SIZE (2KiB) cannot satisfy various kinds of
> devices. At the moment, the minimal size of a swiotlb buffer is 2KiB,
> which will waste memory on small network packets (under 256 bytes) and
> decrease efficiency on a large block (up to 256KiB) size reads/writes of
> disks. And it is hard to have a trade-off on legacy swiotlb to rule them
> all.
That's clearly a general argument, so why should any improvement be
arbitrarily restricted to Confidential Compute scenarios?
> 3. The legacy swiotlb cannot efficiently support larger swiotlb buffers.
> In the worst case, the current implementation requires a full scan of
> the entire swiotlb buffer, which can cause severe performance hits.
Isn't that the main reason we just recently introduced the multiple area
stuff?
Robin.
> Changes in this patch set
> =========================
> Instead of keeping "infecting" the legacy swiotlb code with CoCo logic,
> this patch tries to introduce a new cc-swiotlb for Confidential VMs.
>
> Confidential VMs usually have reasonable modern devices (virtio devices,
> NVME, etc.), which can access memory above 4GiB, cc-swiotlb could
> allocate TLB buffers at any position dynamically. Since
> set_memory_{decrypted,encrypted} is time-consuming and cannot be used in
> interrupt context, a new kernel thread "kccd" has been added to populate
> new TLB buffers on-demand, which solved the problem 1.
>
> In addition, the cc-swiotlb manages TLB buffers by different sizes
> (512B, 2KiB, 4KiB, 16KiB, and 512KiB). The above values come from the
> following observations (boot with 8core, 32 GiB, 1 nvme disk, and 1
> virtio-net):
> - Allocations of 512 bytes and below account for 3.5% of the total DMA
> cache allocations;
> - Allocations of 2 KiB and below account for 57.7%;
> - Allocations of 4 KiB and below account for 91.3%;
> - Allocations of 16 KiB and below account for 96.0%;
> - Allocations of 512 KiB and below accounted for 100%;
> - At the end of booting, cc-swiotlb uses 288 MiB in total.
>
> For comparison, legacy swiotlb reserves memory at 6%, which requires
> min(1GiB, 32GiB * 0.06) = 1GiB, and will hang when operating multiple
> disks simultaneously due to no memory for the swiotlb buffer.
>
> These patches were tested with fio (using different iodepth and block
> size) on a platform with 96 cores, 384 GiB, and 20 NVME disks, and no IO
> hang or error was observed.
>
> For simplicity, the current RFC version cannot switch between legacy
> implementation with cmdline but through compile options. I am open to
> discussing how to integrate the cc-swiotlb into the legacy one.
>
> Patch Organization
> ==================
> - swiotlb: Split common code from swiotlb.{c,h}
> - swiotlb: Add a new cc-swiotlb implementation for Confidential VMs
> - swiotlb: Add tracepoint swiotlb_unbounced
> - cc-swiotlb: Allow set swiotlb watermark from cmdline
>
> Thanks for your time!
>
> Have a nice day,
> Guorui
>
>
next prev parent reply other threads:[~2023-01-30 13:03 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-28 8:32 [RFC] swiotlb: Add a new cc-swiotlb implementation for Confidential VMs GuoRui.Yu
2023-01-28 8:32 ` [PATCH 1/4] swiotlb: Split common code from swiotlb.{c,h} GuoRui.Yu
2023-01-28 8:32 ` [PATCH 2/4] swiotlb: Add a new cc-swiotlb implementation for Confidential VMs GuoRui.Yu
2023-01-28 12:03 ` kernel test robot
2023-01-28 16:41 ` Randy Dunlap
2023-01-29 1:54 ` Guorui Yu
2023-01-29 16:58 ` Andi Kleen
2023-01-30 2:25 ` Guorui Yu
2023-01-30 6:46 ` Andi Kleen
2023-01-30 13:45 ` Guorui Yu
2023-01-31 17:16 ` Andi Kleen
2023-02-01 2:08 ` Guorui Yu
2023-01-28 8:32 ` [PATCH 3/4] swiotlb: Add tracepoint swiotlb_unbounced GuoRui.Yu
2023-01-28 8:32 ` [PATCH 4/4] cc-swiotlb: Allow set swiotlb watermark from cmdline GuoRui.Yu
2023-01-28 20:19 ` kernel test robot
2023-01-28 9:03 ` [RFC] swiotlb: Add a new cc-swiotlb implementation for Confidential VMs Guorui Yu
2023-01-30 6:54 ` Christoph Hellwig
2023-01-30 13:03 ` Robin Murphy [this message]
2023-01-30 14:37 ` Guorui Yu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0ee204fa-091d-43d9-9c2c-0c64cf0c1fdd@arm.com \
--to=robin.murphy@arm.com \
--cc=GuoRui.Yu@linux.alibaba.com \
--cc=iommu@lists.linux-foundation.org \
--cc=konrad.wilk@oracle.com \
--cc=linux-coco@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).