public inbox for linux-kernel@vger.kernel.org
From: Andi Kleen <ak@muc.de>
To: Anton Blanchard <anton@samba.org>
Cc: mark_salyzyn@adaptec.com, Christoph Hellwig <hch@infradead.org>,
	Alan Cox <alan@redhat.com>,
	linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: PATCH: Further aacraid work
Date: Thu, 17 Jun 2004 21:10:43 +0200
Message-ID: <m3smcut2z0.fsf@averell.firstfloor.org>
In-Reply-To: <286Qp-5EU-19@gated-at.bofh.it> (Anton Blanchard's message of "Thu, 17 Jun 2004 17:20:09 +0200")

Anton Blanchard <anton@samba.org> writes:

> Please divert some of your anger towards your manufacturer of dodgy
> hardware. Any sane hardware with an IOMMU handles this just fine.
> eg on ppc64 running a disk test:
>
> sg size    in        out
> 1           3      47569
> 2           0       2591
> 3           0       1123
> 4           0        447
> 5           0        429
> ...
> 62       5095          0
> 64      47061          0
>
> The IOMMU is taking 62-64 entry SG lists and producing 1-5 entry lists.

The AMD64 IOMMU could do it too (and the code to do it exists in
2.6). But the problem is that the current IO layer doesn't provide a
sufficient fallback path when this fails. You have to promise in
advance that you can merge and then later it's too late to change your
mind without signalling an IO error.
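To make the "promise in advance" problem concrete, here is a minimal
userspace sketch, not kernel code. It assumes a toy aperture of 8 pages
and requests made of single-page entries; toy_iommu, toy_alloc_range and
toy_map_sg are hypothetical names for illustration only. Merging succeeds
when a contiguous aperture range is free; otherwise the list is passed
through unmerged, which only works if the device can take that many
entries.

```c
#include <assert.h>
#include <stddef.h>

#define APERTURE_PAGES 8

/* Toy IOMMU aperture: one flag per page, nonzero means in use. */
struct toy_iommu {
    unsigned char used[APERTURE_PAGES];
};

/* First-fit search for `count` contiguous free pages.
 * Marks them used and returns the start index, or -1 on failure. */
static int toy_alloc_range(struct toy_iommu *mmu, int count)
{
    assert(mmu != NULL && count > 0);
    for (int start = 0; start + count <= APERTURE_PAGES; start++) {
        int len = 0;
        while (len < count && !mmu->used[start + len])
            len++;
        if (len == count) {
            for (int i = 0; i < count; i++)
                mmu->used[start + i] = 1;
            return start;
        }
    }
    return -1;
}

/* Map a request of `nents` single-page entries for a device that
 * accepts at most `hw_max_sg` entries per command.
 * Returns the number of entries the device sees, or -1 (IO error). */
static int toy_map_sg(struct toy_iommu *mmu, int nents, int hw_max_sg)
{
    if (toy_alloc_range(mmu, nents) >= 0)
        return 1;       /* merged into one contiguous bus range */
    if (nents <= hw_max_sg)
        return nents;   /* fallback: hand over the unmerged list */
    return -1;          /* request was sized assuming a merge: too late */
}
```

The last branch is the case described above: if the upper layer built the
request on the assumption that merging would succeed, there is nothing
left to do at mapping time except fail the IO.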

This is a real problem on AMD64, because the IOMMU aperture is
relatively small and can fragment.

I had a chat with James about this at last year's OLS. The consensus
was, iirc, that it needs driver interface changes at least.

If there was a sane fallback path for this I would enable merging
always (and add some fragmentation avoidance algorithms to the
bitmap allocator to make failure less likely).
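The mail does not say which fragmentation avoidance algorithm is meant.
As one possible illustration only, the sketch below uses a best-fit
search over a page bitmap: it picks the smallest free run that still
fits, so large free runs are kept intact for large requests. MAP_PAGES
and best_fit are hypothetical names, and this is plain userspace C, not
the actual AMD64 allocator.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_PAGES 16

/* Best-fit search over a page map (nonzero = page in use).
 * Returns the start of the smallest free run holding at least
 * `count` pages, or -1 if no run is large enough. */
static int best_fit(const unsigned char *used, int count)
{
    int best_start = -1;
    int best_len = MAP_PAGES + 1;
    int i = 0;

    assert(used != NULL);
    if (count <= 0)
        return -1;

    while (i < MAP_PAGES) {
        if (used[i]) {
            i++;
            continue;
        }
        /* Measure the free run starting at i. */
        int start = i;
        while (i < MAP_PAGES && !used[i])
            i++;
        int len = i - start;
        if (len >= count && len < best_len) {
            best_start = start;
            best_len = len;
        }
    }
    return best_start;
}
```

A first-fit allocator would place a 2-page request at the start of the
first free run it finds, even a large one; best-fit uses a small hole
when one exists. It costs a full scan of the map per allocation, which
is the usual trade-off.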

It's also a balancing act in terms of performance. The IOMMU setup
is relatively slow (it has to do a PCI config space write and
an uncached memory access), and it depends on the device whether it's
actually faster to go through the IOMMU. I did some benchmarks
and it seems to help on MPT Fusion controllers, but slows down
ethernet. Most probably we need a driver function call where
the driver can tell the IOMMU layer "I am slow at merging;
give me merged sg lists and I can handle errors by falling back".
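No such driver call existed at the time of this mail. The sketch below
only shows what the shape of such a hint could be, in userspace C with
entirely hypothetical names (toy_dev, toy_set_merge_hint,
toy_choose_policy). The assumption is that the IOMMU layer merges only
for devices that both ask for it and declare they can cope with an
unmerged list when merging fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device hints supplied by the driver. */
struct toy_dev {
    bool wants_merge;   /* device is faster with short sg lists */
    bool can_fallback;  /* driver handles a failed merge itself */
};

enum map_policy {
    MAP_DIRECT,         /* bypass the IOMMU, pass the list through */
    MAP_MERGE           /* try to merge through the IOMMU */
};

/* Driver-side call: record the hints for this device. */
static void toy_set_merge_hint(struct toy_dev *dev, bool wants_merge,
                               bool can_fallback)
{
    assert(dev != NULL);
    dev->wants_merge = wants_merge;
    dev->can_fallback = can_fallback;
}

/* IOMMU-layer side: merge only when the driver asked for it and
 * promised to survive a failed merge. */
static enum map_policy toy_choose_policy(const struct toy_dev *dev)
{
    assert(dev != NULL);
    return (dev->wants_merge && dev->can_fallback) ? MAP_MERGE
                                                   : MAP_DIRECT;
}
```

Under this model a storage controller driver would set both hints, while
an ethernet driver, where the benchmarks above showed a slowdown, would
leave wants_merge off and stay on the direct path.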

Also, of course, when merging is used you will always get
addresses <32bit and can potentially use smaller descriptors.
But again you need a fallback, because on AMD64 the IOMMUs
can be quite small and it's possible to overflow them
in extreme traffic situations.
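The descriptor-size point can be sketched as a check a driver might run
on a mapped list before picking a descriptor format. This is a
userspace illustration with a hypothetical helper name (fits_32bit); it
assumes bus addresses and lengths are given as parallel arrays. When the
fallback path hands back unmerged addresses above 4GB, the check fails
and the driver would have to use its larger descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if every segment [addr, addr + len) lies entirely
 * below 4GB, so 32-bit address fields are enough. */
static bool fits_32bit(const uint64_t *addr, const uint32_t *len, size_t n)
{
    assert(n == 0 || (addr != NULL && len != NULL));
    for (size_t i = 0; i < n; i++) {
        if (len[i] == 0)
            continue;                       /* empty segment, ignore */
        uint64_t last = addr[i] + (len[i] - 1);
        if (last < addr[i])                 /* wrapped around 64 bits */
            return false;
        if (last > UINT32_MAX)              /* crosses or exceeds 4GB */
            return false;
    }
    return true;
}
```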

-Andi

