From: ebiederm@xmission.com (Eric W. Biederman)
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>,
Alan Cox <alan@lxorguk.ukuu.org.uk>,
"H. Peter Anvin" <hpa@zytor.com>,
Jesse Barnes <jbarnes@virtuousgeek.org>,
Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org,
Andrew Vasquez <andrew.vasquez@qlogic.com>
Subject: Re: [PATCH] pci: change msi-x vector to 32bit
Date: Mon, 18 Aug 2008 12:59:58 -0700
Message-ID: <m14p5i2iwx.fsf@frodo.ebiederm.org>
In-Reply-To: <1218928162.3940.62.camel@localhost.localdomain> (James Bottomley's message of "Sat, 16 Aug 2008 18:09:22 -0500")
James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> On Sat, 2008-08-16 at 15:17 -0700, Yinghai Lu wrote:
>> On Sat, Aug 16, 2008 at 1:45 PM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> >> > What I still don't quite get is the benefit of large IRQ spaces ...
>> >> > particularly if you encode things the system doesn't really need to know
>> >> > in them.
>> >>
>> >> then set nr_irqs = nr_cpu_ids * NR_VECTORS))
>> >> and count down for msi/msi-x?
>> >
>> > No, what I mean is that msis can trip directly to CPUs, so this is an
>> > affinity thing (that MSI is directly bound to that CPU now), so in the
>> > matrixed way we display this in show_interrupts() with the CPU along the
>> > top and the IRQ down the side, it doesn't make sense to me to encode IRQ
>> > affinity in the irq number again. So it makes more sense to assign the
>> > vectors based on both the irq number and the CPU affinity so that if the
>> > PCI MSI for qla is assigned to CPU4 you can reassign it to CPU5 and so
>> > on.
>>
>> msi-x entry index, cpu_vector, irq number...
>>
>> you want to different cpus have same vector?
>
> Obviously I'm not communicating very well. Your apparent assumption is
> that irq number == vector.
Careful. There are two entities termed vector in this conversation.
There is the MSI-X vector, of which a device can have up to 4096.
There is the idt vector, of which each cpu has 256.
> What I'm saying is that's not what we've
> done for individually vectored CPU interrupts in other architectures.
> In those we did (cpu no, irq) == vector. i.e. the affinity and the irq
> number identify the vector. For non-numa systems, this is effectively
> what you're interested in doing anyway. For numa systems, it just
> becomes a sparse matrix.
I believe assign_irq_vector on x86_64, and soon on x86_32, does this already.

The number that was being changed was the irq number for the
msi-x ``vectors'', from some random free irq number to roughly
bus(8 bits):device+function(8 bits):msix-vector(12 bits), so that we
could have a stable irq number for msi irqs.
Once pci domain is considered it is hard to claim we have enough bits.
I expect we need at least one pci domain per NUMA node in the general
case.
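
To make the bit budget concrete, here is a minimal sketch of the packing
described above. The helper names (msix_irq_encode and friends) are
hypothetical and are not taken from the patch; only the field widths come
from the layout in this message. The three fields use 28 bits, which leaves
4 bits in a 32-bit irq number for a pci domain.

```c
#include <stdint.h>

/* Field widths from the bus:devfn:msix-vector layout described above. */
#define MSIX_BITS   12u  /* msi-x entry index within one function */
#define DEVFN_BITS   8u  /* device + function */
#define BUS_BITS     8u  /* bus number */

#define MSIX_IRQ_BITS_USED (BUS_BITS + DEVFN_BITS + MSIX_BITS)
#define MSIX_IRQ_BITS_LEFT (32u - MSIX_IRQ_BITS_USED)

/* Hypothetical helper: pack bus:devfn:entry into a single irq number. */
static inline uint32_t msix_irq_encode(uint8_t bus, uint8_t devfn,
                                       uint16_t entry)
{
    return ((uint32_t)bus << (DEVFN_BITS + MSIX_BITS)) |
           ((uint32_t)devfn << MSIX_BITS) |
           ((uint32_t)entry & ((1u << MSIX_BITS) - 1u));
}

/* Hypothetical helpers: recover each field from the packed irq number. */
static inline uint8_t msix_irq_bus(uint32_t irq)
{
    return (uint8_t)(irq >> (DEVFN_BITS + MSIX_BITS));
}

static inline uint8_t msix_irq_devfn(uint32_t irq)
{
    return (uint8_t)(irq >> MSIX_BITS);
}

static inline uint16_t msix_irq_entry(uint32_t irq)
{
    return (uint16_t)(irq & ((1u << MSIX_BITS) - 1u));
}
```

With this layout the irq number for a given bus:devfn:entry is the same on
every boot, which is the point of the change, but any domain identifier
wider than MSIX_IRQ_BITS_LEFT bits will not fit alongside it.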
The big motivation for killing NR_IRQS sized arrays comes from two directions:
msi-x, which allows up to 4096 irqs per device, with nic vendors starting
to produce cards with 256 queues; and large SGI systems that don't do
I/O and want to be supported with the same kernel build as smaller systems.
A kernel built to handle 4096*32 irqs, which is more or less reasonable if
the system is I/O heavy, needs a ridiculously sized array on smaller machines.
So a static irq_desc is out. And since with the combination of msi-x and
hotplug we can not tell how many irq sources, and thus irq numbers, the
machine is going to have, we can not reasonably even have a dynamic array
sized at boot time. Further, we also want to allocate the irq_desc entries in
node-local memory on NUMA machines for better performance. This means we need
to dynamically allocate irq_desc entries and have some lookup mechanism from
irq# to irq_desc entry.
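
As a rough illustration of what such a lookup could look like, here is a
userspace sketch that uses a small chained hash table keyed by irq number.
Everything in it is an assumption made for illustration: struct
irq_desc_sketch is a stand-in for the real irq_desc, the function names are
hypothetical, and the choice of a hash table (rather than, say, a tree or a
list) is not something this message settles.

```c
#include <stdint.h>
#include <stdlib.h>

#define IRQ_HASH_BITS 8u
#define IRQ_HASH_SIZE (1u << IRQ_HASH_BITS)

/* Simplified stand-in for the kernel's irq_desc (illustrative only). */
struct irq_desc_sketch {
    uint32_t irq;                  /* sparse irq number used as the key */
    int node;                      /* NUMA node requested for this entry */
    struct irq_desc_sketch *next;  /* next entry in the same hash bucket */
};

static struct irq_desc_sketch *irq_hash[IRQ_HASH_SIZE];

/* Multiplicative hash that keeps the top IRQ_HASH_BITS bits. */
static unsigned int irq_hash_fn(uint32_t irq)
{
    return (uint32_t)(irq * 2654435761u) >> (32u - IRQ_HASH_BITS);
}

/* Return the descriptor for irq, or NULL if none has been allocated. */
static struct irq_desc_sketch *irq_to_desc_sketch(uint32_t irq)
{
    struct irq_desc_sketch *d = irq_hash[irq_hash_fn(irq)];

    while (d && d->irq != irq)
        d = d->next;
    return d;
}

/* Look up irq, allocating and inserting a descriptor on first use. */
static struct irq_desc_sketch *irq_desc_get_or_alloc(uint32_t irq, int node)
{
    struct irq_desc_sketch *d = irq_to_desc_sketch(irq);
    unsigned int h;

    if (d)
        return d;

    /* A kernel would allocate from memory local to 'node' here;
     * calloc() only records the requested node. */
    d = calloc(1, sizeof(*d));
    if (!d)
        return NULL;

    d->irq = irq;
    d->node = node;
    h = irq_hash_fn(irq);
    d->next = irq_hash[h];
    irq_hash[h] = d;
    return d;
}
```

Memory use then grows with the number of irqs actually in use rather than
with the largest irq number, so sparse numbers cost nothing extra. The
sketch ignores locking and freeing, both of which a real implementation
would need.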
So once we have all of that, it becomes possible to look at assigning a static
irq number to each pci (bus:device:function:msi-x vector) tuple so the system
is more reproducible.
Eric