From: Matthew Wilcox <matthew@wil.cx>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: linux-kernel@vger.kernel.org, Greg Kroah-Hartman <gregkh@suse.de>,
Jesse Barnes <jbarnes@virtuousgeek.org>,
linux-pci@vger.kernel.org
Subject: Re: [PATCH] sysfs: add per pci device msi[x] irq listing (v3)
Date: Thu, 22 Sep 2011 07:54:28 -0600
Message-ID: <20110922135428.GC16740@parisc-linux.org>
In-Reply-To: <1316447235-31345-1-git-send-email-nhorman@tuxdriver.com>

On Mon, Sep 19, 2011 at 11:47:15AM -0400, Neil Horman wrote:
> So a while back, I wanted to provide a way for irqbalance (and other apps) to
> definitively map irqs to devices, which, for msi[x] irqs is currently not really
> possible in user space. My first attempt went not so well:
> https://lkml.org/lkml/2011/4/21/308
>
> It was plagued by the same issues that prior attempts were, namely that it
> violated the one-file-one-value sysfs rule. I wandered off but have recently
> come back to this. I've got a new implementation here that exports a new
> subdirectory for every pci device, called msi_irqs. This subdirectory contains
> a variable number of numbered subdirectories, in which the number represents an
> msi irq. Each numbered subdirectory contains attributes for that irq, which
> currently is only the mode it is operating in (msi vs. msix). I think this fits
> within the constraints sysfs requires, and will allow irqbalance to properly map
> msi irqs to devices without having to rely on rickety, best guess methods like
> interface name matching.
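
For illustration only, here is how a userspace tool might consume the layout
described in the quoted text. This is a minimal sketch, not code from the
patch: it assumes the msi_irqs directory sits directly under each device in
/sys/bus/pci/devices, and that the per-irq attribute is a file named "mode"
(that file name is an assumption; the description only says the mode is
exported). The function name map_msi_irqs is hypothetical.

```python
import os


def map_msi_irqs(pci_devices_root="/sys/bus/pci/devices"):
    """Return {irq_number: (device_name, mode)} for the assumed layout.

    Assumed layout (hypothetical attribute name "mode"):
        <root>/<device>/msi_irqs/<irq>/mode
    """
    mapping = {}
    if not os.path.isdir(pci_devices_root):
        return mapping
    for dev in sorted(os.listdir(pci_devices_root)):
        msi_dir = os.path.join(pci_devices_root, dev, "msi_irqs")
        if not os.path.isdir(msi_dir):
            continue  # device has no MSI/MSI-X irqs exported
        for entry in os.listdir(msi_dir):
            if not entry.isdigit():
                continue  # only numbered subdirectories name an irq
            mode_path = os.path.join(msi_dir, entry, "mode")
            try:
                with open(mode_path) as f:
                    mode = f.read().strip()
            except OSError:
                mode = None  # attribute missing or unreadable
            mapping[int(entry)] = (dev, mode)
    return mapping
```

The root directory is a parameter so the walk can be pointed at a scratch
tree instead of a live sysfs.
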

This approach feels like building bigger rockets instead of a space
elevator :-)

What we need is to allow device drivers to ask for per-CPU interrupts,
and implement them in terms of MSI-X. I've made a couple of stabs at
implementing this, but haven't got anything working yet. It would solve
a number of problems:

1. NUMA cacheline fetch. At the moment, desc->istate gets modified by
handle_edge_irq. handle_percpu_irq doesn't need to worry about any
of that stuff, so doesn't touch desc->istate. I've heard this is a
significant problem for the high-speed networking people.
2. /proc/interrupts is unmanageable on large machines. There are hundreds
of interrupts and dozens of CPUs. This would go a long way to reducing
the number of rows in the table (doesn't do anything about the columns).
i.e. instead of this:
 79:        0        0        0        0        0        0        0        0  PCI-MSI-edge  eth1
 80:        0        0  9275611        0        0        0        0        0  PCI-MSI-edge  eth1-TxRx-0
 81:        0        0  9275611        0        0        0        0        0  PCI-MSI-edge  eth1-TxRx-1
 82:        0        0        0        0  9275611        0        0        0  PCI-MSI-edge  eth1-TxRx-2
 83:        0        0        0        0  9275611        0        0        0  PCI-MSI-edge  eth1-TxRx-3
 84:        0        0        0        0        0  9275611        0        0  PCI-MSI-edge  eth1-TxRx-4
 85:        0        0        0        0        0  9275611        0        0  PCI-MSI-edge  eth1-TxRx-5
 86:        0        0        0        0        0        0  9275611        0  PCI-MSI-edge  eth1-TxRx-6
 87:        0        0        0        0        0        0  9275611        0  PCI-MSI-edge  eth1-TxRx-7
We'd get this:
 79:        0        0        0        0        0        0        0        0  PCI-MSI-edge  eth1
 80:  9275611  9275611  9275611  9275611  9275611  9275611  9275611  9275611  PCI-MSI-edge  eth1-TxRx
3. /proc/irq/x/smp_affinity actually makes sense again. It can be a
mask of which interrupts are active instead of being a degenerate case
in which only the lowest set bit is actually honoured.
4. Easier to manage for the device driver. All it needs is to call
request_percpu_irq(...) instead of trying to figure out how many
threads/cores/numa nodes/... there are in the machine, and how many
other multi-interrupt devices there are; and thus how many interrupts
it should allocate. That can be left to the interrupt core which at
least has a chance of getting it right.
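
To make the contrast in point 3 concrete, here is a small userspace sketch
of the two readings of an smp_affinity mask: the full set of CPUs named by
the mask, and the "lowest set bit only" reading described above as the
current degenerate case. It assumes the usual text format of
/proc/irq/x/smp_affinity (hex digits, optionally split into comma-separated
32-bit groups). The helper names are hypothetical, and this says nothing
about how the kernel itself would implement per-CPU interrupts.

```python
def cpus_in_mask(mask_text):
    """Decode a hex CPU mask such as "00000000,00000014" into a sorted CPU list."""
    value = int(mask_text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def lowest_cpu(mask_text):
    """Return the lowest CPU in the mask, or None if the mask is empty."""
    cpus = cpus_in_mask(mask_text)
    return cpus[0] if cpus else None


if __name__ == "__main__":
    mask = "00000014"          # bits 2 and 4 set
    print(cpus_in_mask(mask))  # [2, 4]
    print(lowest_cpu(mask))    # 2
```

With the mask 00000014, the first function reports CPUs 2 and 4, while the
second reports only CPU 2.
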
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."