From: dwmw2@infradead.org (David Woodhouse)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
Date: Fri, 11 Nov 2011 13:27:28 +0000
Message-ID: <1321018048.2027.44.camel@shinybook.infradead.org>
In-Reply-To: <20111111125837.GF13213@amd.com>

On Fri, 2011-11-11 at 13:58 +0100, Joerg Roedel wrote:
> For AMD IOMMU there is a feature called not-present cache. It says that
> the IOMMU caches non-present entries as well and needs an IOTLB flush
> when something is mapped (meant for software implementations of the
> IOMMU).
> So it can't be really taken out of the fast-path. But the IOMMU driver
> can optimize the function so that it only flushes the IOTLB when there
> was an unmap-call before. 

We have exactly the same situation with the Intel IOMMU (we call it
'Caching Mode') for the same reasons.

I'd be wary about making the IOMMU driver *track* whether there was an
unmap call before - that seems like hard work and more cache contention,
especially if the ->commit() call happens on a CPU other than the one
that just did the unmap.

I'm also not sure exactly when you'd call the ->commit() function when
the DMA API is being used, and which 'side' of that API the
deferred-flush optimisations would live.

Would the optimisation be done on the generic side, only calling
->commit when it absolutely *has* to happen? (Or periodically after
unmaps have happened to avoid entries hanging around for ever?)

Or would the optimisation be done in the IOMMU driver, thus turning the
->commit() function into more of a *hint*? You could add a 'simon_says'
boolean argument to it, I suppose...?
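
To make the second option concrete, here is a minimal sketch of a
driver-side ->commit() that treats the call as a hint unless the caller
forces it. Everything in it is hypothetical: struct sketch_domain, the
sketch_* functions and the flush counter are illustration only and are
not part of the IOMMU API or of any driver. Note that the shared
unmap_pending flag is exactly the tracking state whose cache line would
bounce between CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-domain state; NOT the kernel's struct iommu_domain. */
struct sketch_domain {
	atomic_bool unmap_pending;	/* set by unmap, cleared by commit */
	unsigned int flushes;		/* counts flushes, for illustration */
};

/* Stand-in for the real IOTLB invalidation. */
static void sketch_flush_iotlb(struct sketch_domain *d)
{
	d->flushes++;
}

/* Unmap only records that a flush is owed; it does not flush. */
static void sketch_unmap(struct sketch_domain *d)
{
	atomic_store(&d->unmap_pending, true);
}

/*
 * Commit as a hint: flush only if an unmap happened since the last
 * flush, or if the caller insists via simon_says.
 * Returns true if a flush was issued.
 */
static bool sketch_commit(struct sketch_domain *d, bool simon_says)
{
	/* This exchange is the cross-CPU traffic the tracking costs. */
	bool pending = atomic_exchange(&d->unmap_pending, false);

	if (!pending && !simon_says)
		return false;

	sketch_flush_iotlb(d);
	return true;
}
```

With this shape, a commit after a map-only sequence costs one atomic
exchange and no flush, while a forced commit always flushes.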

> It is also an improvement over the current
> situation where every iommu_unmap call results in a flush implicitly.
> This is pretty much a no-go for using IOMMU-API in DMA mapping at the
> moment.

Right. That definitely needs to be handled. We just need to work out the
(above and other) details.

> > But also, it's not *so* much of an issue to divide the space up even
> > when it's limited. The idea was not to have it *strictly* per-CPU, but
> > just for a CPU to try allocating from "its own" subrange first?
> 
> Yeah, I get the idea. I fear that the memory consumption will get pretty
> high with that approach. It basically means one round-robin allocator
> per cpu and device. What does that mean on a 4096 CPU machine :)

Well, if your network device is taking interrupts, and mapping/unmapping
buffers across all 4096 CPUs, then your performance is screwed anyway :)

Certainly your concerns are valid, but I think we can cope with them
fairly reasonably. If we *do* have a large number of CPUs allocating for
a given domain, we can move to a per-node rather than per-CPU allocator.
And we can have dynamically sized allocation regions, so we aren't
wasting too much space on unused bitmaps if you map just *one* page from
each of your 4096 CPUs.
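
As a rough illustration of "try your own subrange first, then fall back",
the sketch below splits a small address space into fixed subranges, one
per CPU, and lets an allocation spill into a neighbour's subrange when
its own is full. The sizes, the names (sketch_iova_space, sketch_alloc)
and the flat bool array are all assumptions made for the example; locking
is left out, and the subranges here are fixed rather than dynamically
sized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS	4
#define SKETCH_PER_CPU	8	/* pages per subrange (arbitrary) */
#define SKETCH_NR_PAGES	(SKETCH_NR_CPUS * SKETCH_PER_CPU)

/* Hypothetical IOVA space: one flag per page, true if allocated. */
struct sketch_iova_space {
	bool used[SKETCH_NR_PAGES];
};

/* Take the first free page in one subrange; return its index or -1. */
static int sketch_scan(struct sketch_iova_space *s, unsigned int range)
{
	size_t base = (size_t)range * SKETCH_PER_CPU;

	for (size_t i = 0; i < SKETCH_PER_CPU; i++) {
		if (!s->used[base + i]) {
			s->used[base + i] = true;
			return (int)(base + i);
		}
	}
	return -1;
}

/*
 * Not strictly per-CPU: start with the caller's own subrange and only
 * move on to the others when it is exhausted. A real version would
 * take a per-subrange lock around each scan.
 */
static int sketch_alloc(struct sketch_iova_space *s, unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);

	for (unsigned int n = 0; n < SKETCH_NR_CPUS; n++) {
		int page = sketch_scan(s, (cpu + n) % SKETCH_NR_CPUS);

		if (page >= 0)
			return page;
	}
	return -1;	/* whole space exhausted */
}
```

In the common case each CPU touches only its own subrange, so contention
appears only once a subrange fills up and allocations start to spill.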

> How much lock contention will be lowered also depends on the work-load.
> If dma-handles are frequently freed from another cpu than they were
> allocated from the same problem re-appears.

The idea is that dma handles are *infrequently* freed, in batches. So
we'll bounce the lock's cache line occasionally, but not all the time.

In "strict" or "unmap_flush" mode, you get to go slowly unless you do
the unmap on the same CPU that you mapped it from. I can live with that.
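
The difference between the two modes can be sketched as a small queue of
handles waiting to be freed. In the deferred case the queue is drained
once per batch, so the allocator lock and the IOTLB flush are paid once
per batch; in the strict case they are paid on every unmap. The queue
type, the batch size of 4 and the counters are made up for illustration
and do not correspond to any existing driver code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BATCH 4	/* arbitrary batch size for the example */

/* Hypothetical queue of DMA handles waiting to be freed. */
struct sketch_free_queue {
	unsigned long pending[SKETCH_BATCH];
	size_t count;
	unsigned int flushes;		/* IOTLB flushes issued */
	unsigned int lock_trips;	/* allocator lock acquisitions */
};

/* Flush once, take the allocator lock once, release the whole batch. */
static void sketch_drain(struct sketch_free_queue *q)
{
	if (q->count == 0)
		return;

	q->flushes++;
	q->lock_trips++;
	q->count = 0;	/* handles would go back to the allocator here */
}

/* Deferred mode: queue the handle, drain only when the batch is full. */
static void sketch_unmap_deferred(struct sketch_free_queue *q,
				  unsigned long iova)
{
	assert(q->count < SKETCH_BATCH);
	q->pending[q->count++] = iova;
	if (q->count == SKETCH_BATCH)
		sketch_drain(q);
}

/* Strict mode: every unmap pays for its own flush and lock trip. */
static void sketch_unmap_strict(struct sketch_free_queue *q,
				unsigned long iova)
{
	assert(q->count < SKETCH_BATCH);
	q->pending[q->count++] = iova;
	sketch_drain(q);
}
```

A real implementation would also drain on a timer, so that a partly
filled batch does not keep stale entries around indefinitely.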

> But in the end we have to try it out and see what works best :)

Indeed. I'm just trying to work out if I should try to do the allocator
thing purely inside the Intel code first, and then try to move it out
and make it generic - or if I should start with making the DMA API work
with a wrapper around the IOMMU API, with your ->commit() and other
necessary changes. I think I'd prefer the latter, if we can work out how
it should look.

-- 
dwmw2
