Re: [RFC] swiotlb: Add a new cc-swiotlb implementation for Confidential VMs

linux-coco.lists.linux.dev archive mirror
 help / color / mirror / Atom feed

From: Robin Murphy <robin.murphy@arm.com>
To: "GuoRui.Yu" <GuoRui.Yu@linux.alibaba.com>,
	linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
	konrad.wilk@oracle.com, linux-coco@lists.linux.dev
Subject: Re: [RFC] swiotlb: Add a new cc-swiotlb implementation for Confidential VMs
Date: Mon, 30 Jan 2023 13:03:18 +0000	[thread overview]
Message-ID: <0ee204fa-091d-43d9-9c2c-0c64cf0c1fdd@arm.com> (raw)
In-Reply-To: <20230128083254.86012-1-GuoRui.Yu@linux.alibaba.com>

On 2023-01-28 08:32, GuoRui.Yu wrote:
> This patch series adds a new swiotlb implementation, cc-swiotlb, for
> Confidential VMs (such as TDX and SEV-SNP). The new cc-swiotlb allocates
> the DMA TLB buffer dynamically in runtime instead of allocating at boot
> with a fixed size. Furthermore, future optimization and security
> enhancement could be applied on cc-swiotlb without "infecting" the
> legacy swiotlb.
> 
> Background
> ==========
> Under COnfidential COmputing (CoCo) scenarios, the VMM cannot access
> guest memory directly but requires the guest to explicitly mark the
> memory as shared (decrypted). To make the streaming DMA mappings work,
> the current implementation relays on legacy SWIOTLB to bounce the DMA
> buffer between private (encrypted) and shared (decrypted) memory.
> 
> However, the legacy swiotlb is designed for compatibility rather than
> efficiency and CoCo purpose, which will inevitably introduce some
> unnecessary restrictions.
> 
> 1. Fixed immutable swiotlb size cannot accommodate to requirements of
> multiple devices. And 1GiB (current maximum size) of swiotlb in our
> testbed cannot afford multiple disks reads/writes simultaneously.

That's not a very logical argument - if a particular use-case needs a 
particular total amount of SWIOTLB capacity, then that's how much it 
needs. Whether that capacity is allocated up-front or allocated 
gradually over time doesn't affect that. The obvious solution to this 
issue as presented is "make the maximum size bigger", not "add a whole 
other SWIOTLB implementation".

> 2. Fixed immutable IO_TLB_SIZE (2KiB) cannot satisfy various kinds of
> devices. At the moment, the minimal size of a swiotlb buffer is 2KiB,
> which will waste memory on small network packets (under 256 bytes) and
> decrease efficiency on a large block (up to 256KiB) size reads/writes of
> disks. And it is hard to have a trade-off on legacy swiotlb to rule them
> all.

That's clearly a general argument, so why should any improvement be 
arbitrarily restricted to Confidential Compute scenarios?

> 3. The legacy swiotlb cannot efficiently support larger swiotlb buffers.
> In the worst case, the current implementation requires a full scan of
> the entire swiotlb buffer, which can cause severe performance hits.

Isn't that the main reason we just recently introduced the multiple area 
stuff?

Robin.

> Changes in this patch set
> =========================
> Instead of keeping "infecting" the legacy swiotlb code with CoCo logic,
> this patch tries to introduce a new cc-swiotlb for Confidential VMs.
> 
> Confidential VMs usually have reasonable modern devices (virtio devices,
> NVME, etc.), which can access memory above 4GiB, cc-swiotlb could
> allocate TLB buffers at any position dynamically. Since
> set_memory_{decrypted,encrypted} is time-consuming and cannot be used in
> interrupt context, a new kernel thread "kccd" has been added to populate
> new TLB buffers on-demand, which solved the problem 1.
> 
> In addition, the cc-swiotlb manages TLB buffers by different sizes
> (512B, 2KiB, 4KiB, 16KiB, and 512KiB). The above values come from the
> following observations (boot with 8core, 32 GiB, 1 nvme disk, and 1
> virtio-net):
> - Allocations of 512 bytes and below account for 3.5% of the total DMA
>    cache allocations;
> - Allocations of 2 KiB and below account for 57.7%;
> - Allocations of 4 KiB and below account for 91.3%;
> - Allocations of 16 KiB and below account for 96.0%;
> - Allocations of 512 KiB and below accounted for 100%;
> - At the end of booting, cc-swiotlb uses 288 MiB in total.
> 
> For comparison, legacy swiotlb reserves memory at 6%, which requires
> min(1GiB, 32GiB * 0.06) = 1GiB, and will hang when operating multiple
> disks simultaneously due to no memory for the swiotlb buffer.
> 
> These patches were tested with fio (using different iodepth and block
> size) on a platform with 96 cores, 384 GiB, and 20 NVME disks, and no IO
> hang or error was observed.
> 
> For simplicity, the current RFC version cannot switch between legacy
> implementation with cmdline but through compile options. I am open to
> discussing how to integrate the cc-swiotlb into the legacy one.
> 
> Patch Organization
> ==================
> - swiotlb: Split common code from swiotlb.{c,h}
> - swiotlb: Add a new cc-swiotlb implementation for Confidential VMs
> - swiotlb: Add tracepoint swiotlb_unbounced
> - cc-swiotlb: Allow set swiotlb watermark from cmdline
> 
> Thanks for your time!
> 
> Have a nice day,
> Guorui
> 
>

next prev parent reply	other threads:[~2023-01-30 13:03 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-28  8:32 [RFC] swiotlb: Add a new cc-swiotlb implementation for Confidential VMs GuoRui.Yu
2023-01-28  8:32 ` [PATCH 1/4] swiotlb: Split common code from swiotlb.{c,h} GuoRui.Yu
2023-01-28  8:32 ` [PATCH 2/4] swiotlb: Add a new cc-swiotlb implementation for Confidential VMs GuoRui.Yu
2023-01-28 12:03   ` kernel test robot
2023-01-28 16:41   ` Randy Dunlap
2023-01-29  1:54     ` Guorui Yu
2023-01-29 16:58   ` Andi Kleen
2023-01-30  2:25     ` Guorui Yu
2023-01-30  6:46       ` Andi Kleen
2023-01-30 13:45         ` Guorui Yu
2023-01-31 17:16           ` Andi Kleen
2023-02-01  2:08             ` Guorui Yu
2023-01-28  8:32 ` [PATCH 3/4] swiotlb: Add tracepoint swiotlb_unbounced GuoRui.Yu
2023-01-28  8:32 ` [PATCH 4/4] cc-swiotlb: Allow set swiotlb watermark from cmdline GuoRui.Yu
2023-01-28 20:19   ` kernel test robot
2023-01-28  9:03 ` [RFC] swiotlb: Add a new cc-swiotlb implementation for Confidential VMs Guorui Yu
2023-01-30  6:54   ` Christoph Hellwig
2023-01-30 13:03 ` Robin Murphy [this message]
2023-01-30 14:37   ` Guorui Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0ee204fa-091d-43d9-9c2c-0c64cf0c1fdd@arm.com \
    --to=robin.murphy@arm.com \
    --cc=GuoRui.Yu@linux.alibaba.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).