Re: Memory corruption with CONFIG_SWIOTLB_DYNAMIC=y

linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Halil Pasic <pasic@linux.ibm.com>
To: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Petr Tesarik <petr.tesarik1@huawei-partners.com>,
	Ross Lagerwall <ross.lagerwall@citrix.com>,
	linux-pci <linux-pci@vger.kernel.org>,
	linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	Halil Pasic <pasic@linux.ibm.com>
Subject: Re: Memory corruption with CONFIG_SWIOTLB_DYNAMIC=y
Date: Fri, 3 Nov 2023 17:14:47 +0100	[thread overview]
Message-ID: <20231103171447.02759771.pasic@linux.ibm.com> (raw)
In-Reply-To: <104a8c8fedffd1ff8a2890983e2ec1c26bff6810.camel@linux.ibm.com>

On Fri, 03 Nov 2023 16:13:03 +0100
Niklas Schnelle <schnelle@linux.ibm.com> wrote:

> The reason for 1) is a bit more convoluted and not entirely understood
> by us. We are certain though that the function swiotlb_find_slots()
> allocates a pool with nr_slots(alloc_size), where this alloc_size is
> the alloc_size from swiotlb_tbl_map_single() + swiotlb_align_offset(),
> but for alignment reasons some slots may get "skipped over" in
> swiotlb_area_find_slots() causing the picked slots to overrun the
> allocation.
> 
> Not sure how to properly fix this as the different alignment
> requirements get pretty complex quickly. So would appreciate your
> input.

Let me post a more detailed analysis of why do we observe
swiotlb_area_find_slots() considering the slot with the
index 0 invalid in our particular case, and how does that
relate to the whole "alignment" complex.

Currently there are three distinct mechanisms that dictate the "alignment":
a) min_align_mask (introduced by 36950f2da1ea ("driver core: add a
   min_align_mask))
field to struct device_dma_parameters"))
b) alloc_size >= PAGE_SIZE which requires "page alignment"
c) alloc_aligned_mask.

In our case min_align_mask == 0 and a) is thus not applicable, because b) and
c) we end up with iotlb_align_mask = 0x800. And because orig_add & 0x800 ==
0x800 but pool->start & 0x800 == 0 and the slot at index i is skipped over. The
slot 0 is skipped over because it is page aligned, when !!((1UL << PAGE_SHIFT)
& orig_addr) 

Let us note that with the current implementation the min_align_size mask, that
is mechanism a) also controls the tlb_addr within the first slot so that:
tlb_addr & min_align_mask == orig_addr & min_align_mask. In that sense a) is
very unlike b) and c).

For example, if !min_align_mask, then tlb_addr & (IO_TLB_SIZE - 1) is always 0,
even if the alloc_size is >= PAGE_SIZE or if alloc_aligned_size is non 0.

If with b) and c) the goal is that the swiotlb buffer shall not stretch over
more pages or address space blocks of a size dictated by alloc_aligned_mask
then, that goal is accomplished. If however the goal is to preserve the offsets
modulo some exponent of 2 dictated either by PAGE_SHIFT or by alloc_aligned
mask, then that goal is not reached. 

But there is more to it! In the beginning there was b), or more precisely in the
olden days for mapping_size >= PAGE_SIZE we used to allocate properly page
aligned bounce buffers. That is tlb_addr & (~PAGE_MASK) == 0 regardless of what
orig_addr & (~PAGE_MASK) & (IO_TLB_SIZE - 1) is. That first got borked by
commit 1f221a0d0dbf ("swiotlb: respect min_align_mask") and then it did not get
fixed by commit 0eee5ae10256 ("swiotlb: fix slot alignment checks").

Let us also note that if more than one of the above mechanisms are applicable,
then for the slot selection the idea is apparently to go with the strictest
"alignment requirement", while for the offset within the slot only a) matters
(if applicable, i.e. min_align_mask != 0), which may appear strange if
not thoroughly documented.

In our opinion the first step towards getting this right is to figure out what
the different kinds of alignments are really supposed to mean. For each of the
mechanisms we need to understand and document, whether making sure that the
bounce buffer does not stretch over more of certain units of memory (like,
pages, iova granule size, whatever), or is it about preserving offset within a
certain unit of memory, and if yes to what extent (the least significant n-bits
of the orig_addr dictated by the respective mask, or something different).

Thank you for your help in advance!

Regards,
Halil and Niklas

next prev parent reply	other threads:[~2023-11-03 16:15 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-03 15:13 Memory corruption with CONFIG_SWIOTLB_DYNAMIC=y Niklas Schnelle
2023-11-03 16:14 ` Halil Pasic [this message]
2023-11-03 20:50   ` Petr Tesařík
2023-11-06  7:42     ` Christoph Hellwig
2023-11-07 17:24       ` Halil Pasic
2023-11-08  7:30         ` Christoph Hellwig
2023-11-06 10:08     ` Halil Pasic
2023-11-07 17:24     ` Halil Pasic
2023-11-08  9:13       ` Petr Tesařík
2023-11-23 10:16         ` Petr Tesařík
2023-11-27 15:59           ` Christoph Hellwig
2023-11-28  7:16             ` Petr Tesařík
2023-11-03 18:59 ` Petr Tesařík
2023-11-06  7:44   ` Christoph Hellwig
2023-11-06 12:46     ` Petr Tesarik
2023-11-08 10:52   ` Halil Pasic
2023-11-08 11:04     ` Petr Tesarik
2023-11-08 14:32       ` Halil Pasic
2023-11-08 14:45         ` Petr Tesarik
2023-11-10  9:22           ` Halil Pasic

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231103171447.02759771.pasic@linux.ibm.com \
    --to=pasic@linux.ibm.com \
    --cc=bhelgaas@google.com \
    --cc=hch@lst.de \
    --cc=iommu@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=mjrosato@linux.ibm.com \
    --cc=petr.tesarik1@huawei-partners.com \
    --cc=robin.murphy@arm.com \
    --cc=ross.lagerwall@citrix.com \
    --cc=schnelle@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).