From: Nicolin Chen <nicolinc@nvidia.com>
To: Will Deacon <will@kernel.org>
Cc: <sagi@grimberg.me>, <hch@lst.de>, <axboe@kernel.dk>,
	<kbusch@kernel.org>, <joro@8bytes.org>, <robin.murphy@arm.com>,
	<jgg@nvidia.com>, <linux-nvme@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>, <iommu@lists.linux.dev>,
	<murphyt7@tcd.ie>, <baolu.lu@linux.intel.com>
Subject: Re: [PATCH v1 0/2] nvme-pci: Fix dma-iommu mapping failures when PAGE_SIZE=64KB
Date: Wed, 14 Feb 2024 11:57:32 -0800
Message-ID: <Zc0bLAIXSAqsQJJv@Asurada-Nvidia>
In-Reply-To: <20240214164138.GA31927@willie-the-truck>

Hi Will,

On Wed, Feb 14, 2024 at 04:41:38PM +0000, Will Deacon wrote:
> Hi Nicolin,
> 
> On Tue, Feb 13, 2024 at 01:53:55PM -0800, Nicolin Chen wrote:
> > It's observed that an NVME device is causing timeouts when Ubuntu boots
> > with a kernel configured with PAGE_SIZE=64KB due to failures in swiotlb:
> >     systemd[1]: Started Journal Service.
> >  => nvme 0000:00:01.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 32 (slots)
> >     note: journal-offline[392] exited with irqs disabled
> >     note: journal-offline[392] exited with preempt_count 1
> >
> > An NVME device under a PCIe bus can be behind an IOMMU, so dma mappings
> > going through dma-iommu might be also redirected to swiotlb allocations.
> > Similar to dma_direct_max_mapping_size(), dma-iommu should implement its
> > dma_map_ops->max_mapping_size to return swiotlb_max_mapping_size() too.
> >
> > Though an iommu_dma_max_mapping_size() is a must, it alone can't fix the
> > issue. The swiotlb_max_mapping_size() returns 252KB, calculated from the
> > default pool 256KB subtracted by min_align_mask NVME_CTRL_PAGE_SIZE=4KB,
> > while dma-iommu can roundup a 252KB mapping to 256KB at its "alloc_size"
> > when PAGE_SIZE=64KB via iova->granule that is often set to PAGE_SIZE. So
> > this mismatch between NVME_CTRL_PAGE_SIZE=4KB and PAGE_SIZE=64KB results
> > in a similar failure, though its signature has a fixed size "256KB":
> >     systemd[1]: Started Journal Service.
> >  => nvme 0000:00:01.0: swiotlb buffer is full (sz: 262144 bytes), total 32768 (slots), used 128 (slots)
> >     note: journal-offline[392] exited with irqs disabled
> >     note: journal-offline[392] exited with preempt_count 1
> >
> > Both failures above occur to NVME behind IOMMU when PAGE_SIZE=64KB. They
> > were likely introduced for the security feature by:
> > commit 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers"),
> >
> > So, this series bundles two fixes together against that. They should be
> > taken at the same time to entirely fix the mapping failures.
> 
> It's a bit of a shot in the dark, but I've got a pending fix to some of
> the alignment handling in swiotlb. It would be interesting to know if
> patch 1 has any impact at all on your NVME allocations:
> 
> https://lore.kernel.org/r/20240205190127.20685-1-will@kernel.org

I applied these three patches locally for a test.

I am building against a v6.6 kernel, though, and I see some warnings:
                 from kernel/dma/swiotlb.c:26:
kernel/dma/swiotlb.c: In function ‘swiotlb_area_find_slots’:
./include/linux/minmax.h:21:35: warning: comparison of distinct pointer types lacks a cast
   21 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
      |                                   ^~
./include/linux/minmax.h:27:18: note: in expansion of macro ‘__typecheck’
   27 |                 (__typecheck(x, y) && __no_side_effects(x, y))
      |                  ^~~~~~~~~~~
./include/linux/minmax.h:37:31: note: in expansion of macro ‘__safe_cmp’
   37 |         __builtin_choose_expr(__safe_cmp(x, y), \
      |                               ^~~~~~~~~~
./include/linux/minmax.h:75:25: note: in expansion of macro ‘__careful_cmp’
   75 | #define max(x, y)       __careful_cmp(x, y, >)
      |                         ^~~~~~~~~~~~~
kernel/dma/swiotlb.c:1007:26: note: in expansion of macro ‘max’
 1007 |                 stride = max(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
      |                          ^~~

Replacing max() with max_t() fixes these.
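
For illustration, a minimal userspace sketch of that max_t() replacement.
It assumes the warning comes from the two operands of max() having
different types (stride taken to be unsigned int here, the shift
expression being a plain int), which the v6.6 type check rejects. The
max_t() below is a simplified stand-in for the kernel macro, and
adjust_stride() is a hypothetical helper, not a function in swiotlb.c:

```c
/*
 * Simplified stand-in for the kernel's max_t(): cast both operands to
 * one type before comparing. The real macro in include/linux/minmax.h
 * also guards against evaluating its arguments twice.
 */
#define max_t(type, x, y) ((type)(x) > (type)(y) ? (type)(x) : (type)(y))

#define PAGE_SHIFT   16	/* assumption: kernel built with PAGE_SIZE=64KB */
#define IO_TLB_SHIFT 11	/* swiotlb slot size is 2KB */

/*
 * Hypothetical helper mirroring the statement flagged at
 * kernel/dma/swiotlb.c:1007, with max() swapped for max_t().
 * The unsigned int type of stride is an assumption.
 */
static unsigned int adjust_stride(unsigned int stride)
{
	return max_t(unsigned int, stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
}
```

With PAGE_SHIFT=16 the right-hand operand is 6, so adjust_stride() raises
any smaller stride to 6 and leaves larger ones untouched.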

And things seem to get worse with the patches applied, as even a 64KB mapping now fails:
[    0.239821] nvme 0000:00:01.0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)

With a printk, I found that iotlb_align_mask isn't correct:
   swiotlb_area_find_slots:alloc_align_mask 0xffff, iotlb_align_mask 0x800

But even after fixing iotlb_align_mask to 0x7ff, the 64KB mapping
still fails.
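
The numbers in the logs above can be cross-checked with a small sketch
of the slot and mask arithmetic. IO_TLB_SHIFT and IO_TLB_SEGSIZE are the
values from include/linux/swiotlb.h; nr_slots() is modeled on the kernel
helper of the same name, and is_low_bit_mask() is a hypothetical helper
added only for this check. It restates the arithmetic and does not say
which mask value swiotlb_area_find_slots() ought to compute:

```c
#define IO_TLB_SHIFT   11			/* 2KB per swiotlb slot */
#define IO_TLB_SIZE    (1UL << IO_TLB_SHIFT)
#define IO_TLB_SEGSIZE 128			/* max contiguous slots: 256KB */

/* Number of 2KB slots needed to hold a mapping of the given size. */
static unsigned long nr_slots(unsigned long bytes)
{
	return (bytes + IO_TLB_SIZE - 1) >> IO_TLB_SHIFT;
}

/* Hypothetical check: true if mask has the form 2^n - 1 (contiguous low bits). */
static int is_low_bit_mask(unsigned long mask)
{
	return (mask & (mask + 1)) == 0;
}
```

Under these definitions the failing 65536-byte mapping needs 32 slots,
well below IO_TLB_SEGSIZE, while the 327680-byte request from the
original report needs 160 slots and the 262144-byte one needs exactly
128. Of the printed masks, 0xffff and 0x7ff (IO_TLB_SIZE - 1) are
contiguous low-bit masks, whereas 0x800 is a single bit.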

Thanks
Nicolin
