From: Matthew Wilcox <matthew@wil.cx>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: linux-kernel@vger.kernel.org, Greg Kroah-Hartman <gregkh@suse.de>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	linux-pci@vger.kernel.org
Subject: Re: [PATCH] sysfs: add per pci device msi[x] irq listing (v3)
Date: Thu, 22 Sep 2011 07:54:28 -0600
Message-ID: <20110922135428.GC16740@parisc-linux.org>
In-Reply-To: <1316447235-31345-1-git-send-email-nhorman@tuxdriver.com>
On Mon, Sep 19, 2011 at 11:47:15AM -0400, Neil Horman wrote:
> So a while back, I wanted to provide a way for irqbalance (and other apps) to
> definitively map irqs to devices, which, for msi[x] irqs is currently not really
> possible in user space. My first attempt went not so well:
> https://lkml.org/lkml/2011/4/21/308
>
> It was plagued by the same issues that prior attempts were, namely that it
> violated the one-file-one-value sysfs rule. I wandered off but have recently
> come back to this. I've got a new implementation here that exports a new
> subdirectory for every pci device, called msi_irqs. This subdirectory contains
> a variable number of numbered subdirectories, in which the number represents an
> msi irq. Each numbered subdirectory contains attributes for that irq, which
> currently is only the mode it is operating in (msi vs. msix). I think this fits
> within the constraints sysfs requires, and will allow irqbalance to properly map
> msi irqs to devices without having to rely on rickety, best guess methods like
> interface name matching.
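
For illustration only, the layout described in the quoted text could be
walked from user space roughly as follows. This is a minimal sketch, not
code from the patch: the function name msi_irqs_for_device is hypothetical,
and the attribute file name "mode" is an assumption (the description above
only says each numbered subdirectory exposes the mode).

```python
import os


def msi_irqs_for_device(dev_dir):
    """Return {irq_number: mode} for one PCI device directory.

    Assumed layout (from the patch description, file name is a guess):
        <dev_dir>/msi_irqs/<irq>/mode
    """
    irq_root = os.path.join(dev_dir, "msi_irqs")
    result = {}
    if not os.path.isdir(irq_root):
        # No MSI/MSI-X irqs allocated, or the kernel lacks this interface.
        return result
    for name in os.listdir(irq_root):
        if not name.isdigit():
            continue  # only numbered subdirectories represent irqs
        mode_path = os.path.join(irq_root, name, "mode")
        try:
            with open(mode_path) as f:
                result[int(name)] = f.read().strip()
        except OSError:
            # The irq may have been freed between listdir() and open().
            result[int(name)] = None
    return result
```

A caller such as irqbalance would pass a device directory under
/sys/bus/pci/devices/ and get back the irq numbers that belong to it,
with no need to match on interface names.
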
This approach feels like building bigger rockets instead of a space
elevator :-)

What we need is to allow device drivers to ask for per-CPU interrupts,
and implement them in terms of MSI-X. I've made a couple of stabs at
implementing this, but haven't got anything working yet. It would solve
a number of problems:
1. NUMA cacheline fetch. At the moment, desc->istate gets modified by
handle_edge_irq. handle_percpu_irq doesn't need to worry about any
of that stuff, so doesn't touch desc->istate. I've heard this is a
significant problem for the high-speed networking people.
2. /proc/interrupts is unmanageable on large machines. There are hundreds
of interrupts and dozens of CPUs. This would go a long way to reducing
the number of rows in the table (though it does nothing about the columns).
i.e. instead of this:
 79:        0        0        0        0        0        0        0        0  PCI-MSI-edge  eth1
 80:        0        0  9275611        0        0        0        0        0  PCI-MSI-edge  eth1-TxRx-0
 81:        0        0  9275611        0        0        0        0        0  PCI-MSI-edge  eth1-TxRx-1
 82:        0        0        0        0  9275611        0        0        0  PCI-MSI-edge  eth1-TxRx-2
 83:        0        0        0        0  9275611        0        0        0  PCI-MSI-edge  eth1-TxRx-3
 84:        0        0        0        0        0  9275611        0        0  PCI-MSI-edge  eth1-TxRx-4
 85:        0        0        0        0        0  9275611        0        0  PCI-MSI-edge  eth1-TxRx-5
 86:        0        0        0        0        0        0  9275611        0  PCI-MSI-edge  eth1-TxRx-6
 87:        0        0        0        0        0        0  9275611        0  PCI-MSI-edge  eth1-TxRx-7
We'd get this:
 79:        0        0        0        0        0        0        0        0  PCI-MSI-edge  eth1
 80:  9275611  9275611  9275611  9275611  9275611  9275611  9275611  9275611  PCI-MSI-edge  eth1-TxRx
3. /proc/irq/x/smp_affinity actually makes sense again. It can be a
mask of which interrupts are active instead of being a degenerate case
in which only the lowest set bit is actually honoured.
4. Easier to manage for the device driver. All it needs is to call
request_percpu_irq(...) instead of trying to figure out how many
threads/cores/numa nodes/... there are in the machine, and how many
other multi-interrupt devices there are; and thus how many interrupts
it should allocate. That can be left to the interrupt core which at
least has a chance of getting it right.
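
To make point 3 concrete, here is a small user-space sketch of how an
smp_affinity value decodes into a set of CPUs, next to the degenerate
"lowest set bit" reading described above. It assumes the usual text
format of that file (a hexadecimal bitmask, with commas separating
32-bit groups on large machines); the function names cpus_in_mask and
lowest_cpu are made up for this example and are not kernel interfaces.

```python
def cpus_in_mask(mask_text):
    """Decode an smp_affinity-style hex mask into a sorted list of CPUs.

    Commas only separate 32-bit groups, so dropping them leaves one
    hexadecimal number in which bit N stands for CPU N.
    """
    value = int(mask_text.strip().replace(",", ""), 16)
    cpus = []
    cpu = 0
    while value:
        if value & 1:
            cpus.append(cpu)
        value >>= 1
        cpu += 1
    return cpus


def lowest_cpu(mask_text):
    """Return only the lowest CPU in the mask, or None for an empty mask.

    This models the degenerate case where a single CPU out of the whole
    mask ends up receiving the interrupt.
    """
    cpus = cpus_in_mask(mask_text)
    return cpus[0] if cpus else None
```

With a per-CPU interrupt, the whole list from cpus_in_mask() would be
meaningful; today only the lowest_cpu() result effectively is.
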
--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."