Re: AMD IOMMU problem after NIC uses multi-page allocation

From: Jakub Kicinski <kuba@kernel.org>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <joro@8bytes.org>,
	Vasant Hegde <vasant.hegde@amd.com>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	iommu@lists.linux.dev,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Willem de Bruijn <willemb@google.com>,
	Saeed Mahameed <saeed@kernel.org>
Subject: Re: AMD IOMMU problem after NIC uses multi-page allocation
Date: Thu, 30 Mar 2023 21:06:05 -0700
Message-ID: <20230330210605.02406324@kernel.org>
In-Reply-To: <76c7e508-c7ca-e2d9-5915-545b394623ae@arm.com>

On Thu, 30 Mar 2023 14:10:09 +0100 Robin Murphy wrote:
> > There is that old issue already mentioned where there seems to be some 
> > interplay between the IOVA caching and the lazy flush queue, which we 
> > never really managed to get to the bottom of. IIRC my hunch was that 
> > with a sufficiently large number of CPUs, fq_flush_timeout() overwhelms 
> > the rcache depot and gets into a pathological state where it then 
> > continually thrashes the IOVA rbtree in a fight with the caching system.
> > 
> > Another (simpler) possibility which comes to mind is if the 9K MTU 
> > (which I guess means 16KB IOVA allocations) puts you up against the 
> > threshold of available 32-bit IOVA space - if you keep using the 16K 
> > entries then you'll mostly be recycling them out of the IOVA caches, 
> > which is nice and fast. However once you switch back to 1500 so needing 
> > 2KB IOVAs, you've now got a load of IOVA space hogged by all the 16KB 
> > entries that are now hanging around in caches, which could push you into 
> > the case where the optimistic 32-bit allocation starts to fail (but 
> > because it *can* fall back to a 64-bit allocation, it's not going to 
> > purge those unused 16KB entries to free up more 32-bit space). If the 
> > 32-bit space then *stays* full, alloc_iova should stay in fail-fast 
> > mode, but if some 2KB allocations were below 32 bits and eventually get 
> > freed back to the tree, then subsequent attempts are liable to spend 
> > ages doing their best to scrape up all the available 32-bit space 
> > until it's definitely full again. For that case, [1] should help.  
> 
> ...where by "2KB" I obviously mean 4KB, since apparently in remembering 
> that the caches round up to powers of two I managed to forget that 
> that's still in units of IOVA pages, derp.
> 
> Robin.
> 
> > 
> > Even in the second case, though, I think hitting the rbtree much at all 
> > still implies that the caches might not be well-matched to the 
> > workload's map/unmap pattern, and maybe scaling up the depot size could 
> > still be the biggest win.
> > 
> > Thanks,
> > Robin.
> > 
> > [1] 
> > https://lore.kernel.org/linux-iommu/e9abc601b00e26fd15a583fcd55f2a8227903077.1674061620.git.robin.murphy@arm.com/
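
For reference, the size arithmetic in the quoted explanation can be
sketched as below. This is only an illustration, not the kernel code: it
assumes a 4 KiB IOVA page and that cached sizes round up to a power of
two counted in IOVA pages, as Robin's correction describes. The names
IOVA_PAGE_SIZE, iova_pages() and cached_iova_bytes() are made up for the
example, and the 9000- and 1500-byte inputs stand in for the two MTUs,
ignoring whatever headroom a driver adds.

```python
IOVA_PAGE_SIZE = 4096  # assumption: 4 KiB IOVA granule


def iova_pages(nbytes: int) -> int:
    """Number of IOVA pages needed to cover nbytes (ceiling division)."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    return -(-nbytes // IOVA_PAGE_SIZE)


def cached_iova_bytes(nbytes: int) -> int:
    """Size of the allocation after rounding the page count up to a
    power of two, which is the rounding described in the thread."""
    pages = iova_pages(nbytes)
    rounded_pages = 1 << (pages - 1).bit_length()
    return rounded_pages * IOVA_PAGE_SIZE


print(cached_iova_bytes(9000))  # 3 pages round up to 4 -> 16384
print(cached_iova_bytes(1500))  # 1 page stays 1 -> 4096
```

So under these assumptions a 9K-MTU buffer occupies a 16KB IOVA entry
and a 1500-MTU buffer a 4KB one, matching the figures quoted above.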

Alright, can confirm! :) 
That patch on top of Linus's tree fixes the issue for me!

Noob question about large systems, if you'll indulge me - I ran into this
after enabling the IOMMU driver to get large (255+ thread) AMD machines
to work. Is there a general dependency on IOMMU for such x86 systems, or
is the tie between IOMMU and x2apic AMD-specific? Or am I completely
confused?

I couldn't find anything in the kernel docs and I'm trying to wrap my
head around getting the kernel to work the same across a heterogeneous*
fleet of machines (* in terms of vendor and CPU count).

Thread overview: 7+ messages
2023-03-30  1:14 AMD IOMMU problem after NIC uses multi-page allocation Jakub Kicinski
2023-03-30  2:36 ` Yunsheng Lin
2023-03-30  7:41 ` Joerg Roedel
2023-03-30 12:07   ` Vasant Hegde
2023-03-30 13:04   ` Robin Murphy
2023-03-30 13:10     ` Robin Murphy
2023-03-31  4:06       ` Jakub Kicinski [this message]
