From: Matthew Wilcox <matthew@wil.cx>
To: Grant Grundler <grundler@parisc-linux.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Notes from LPC PCI/MSI BoF session
Date: Wed, 24 Sep 2008 09:44:40 -0600 [thread overview]
Message-ID: <20080924154439.GM27204@parisc-linux.org> (raw)
In-Reply-To: <20080924055116.GA3101@colo.lackof.org>
On Tue, Sep 23, 2008 at 11:51:16PM -0600, Grant Grundler wrote:
> Being one of the "driver guys", let me add my thoughts.
> For the following discussion, I think we can treat MSI and MSI-X the
> same and will just say "MSI".
I really don't think so. MSI suffers from numerous problems, including
on x86 the need to have all interrupts targeted at the same CPU. You
effectively can't reprogram the number of MSIs allocated while the device
is active. So I would say this discussion applies *only* to MSI-X.
> Dave Miller (and others) have clearly stated they don't want to see
> CPU affinity handled in the device drivers and want irqbalanced
> to handle interrupt distribution. The problem with this is irqbalanced
> needs to know how each device driver is binding multiple MSI to its queues.
> Some devices could prefer several MSI go to the same processor and
> others want each MSI bound to a different "node" (NUMA).
But that's *policy*. It's not what the device wants, it's what the
sysadmin wants.
> A second solution I thought of later might be for the device driver to
> export (sysfs?) to irqbalanced which MSIs the driver instance owns and
> how many "domains" those MSIs can serve. irqbalanced can then write
> back into the same (sysfs?) the mapping of MSI to domains and update
> the smp_affinity mask for each of those MSI.
>
> The driver could quickly look up the reverse map CPUs to "domains".
> When a process attempts to start an IO, driver wants to know which
> queue pair the IO should be placed on so the completion event will
> be handled in the same "domain". The result is IOs could start/complete
> on the same (now warm) "CPU cache" with minimal spinlock bouncing.
>
> I'm not clear on details right now. I believe this would allow
> irqbalanced to manage IRQs in an optimal way without having to
> have device specific code in it. Unfortunately, I'm not in a position
> to propose patches due to current work/family commitments. It would
> be fun to work on. *sigh*
I think looking at this in terms of MSIs is the wrong level. The driver
needs to be instructed how many and what type of *queues* to create.
Then allocation of MSIs falls out naturally from that.
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
Thread overview: 7+ messages
2008-09-22 19:29 Notes from LPC PCI/MSI BoF session Jesse Barnes
2008-09-24 5:51 ` Grant Grundler
2008-09-24 6:47 ` David Miller
2008-09-25 15:53 ` Grant Grundler
2008-09-24 15:44 ` Matthew Wilcox [this message]
2008-09-25 16:15 ` Grant Grundler
2008-10-01 15:00 ` Matthew Wilcox