public inbox for linux-kernel@vger.kernel.org
From: ebiederm@xmission.com (Eric W. Biederman)
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Andrew Vasquez <andrew.vasquez@qlogic.com>
Subject: Re: [PATCH] pci: change msi-x vector to 32bit
Date: Mon, 18 Aug 2008 12:59:58 -0700
Message-ID: <m14p5i2iwx.fsf@frodo.ebiederm.org>
In-Reply-To: <1218928162.3940.62.camel@localhost.localdomain> (James Bottomley's message of "Sat, 16 Aug 2008 18:09:22 -0500")

James Bottomley <James.Bottomley@HansenPartnership.com> writes:

> On Sat, 2008-08-16 at 15:17 -0700, Yinghai Lu wrote:
>> On Sat, Aug 16, 2008 at 1:45 PM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> >> > What I still don't quite get is the benefit of large IRQ spaces ...
>> >> > particularly if you encode things the system doesn't really need to know
>> >> > in them.
>> >>
>> >> then set nr_irqs = nr_cpu_ids * NR_VECTORS
>> >> and count down for msi/msi-x?
>> >
>> > No, what I mean is that msis can trip directly to CPUs, so this is an
>> > affinity thing (that MSI is directly bound to that CPU now), so in the
>> > matrixed way we display this in show_interrupts() with the CPU along the
>> > top and the IRQ down the side, it doesn't make sense to me to encode IRQ
>> > affinity in the irq number again.   So it makes more sense to assign the
>> > vectors based on both the irq number and the CPU affinity so that if the
>> > PCI MSI for qla is assigned to CPU4 you can reassign it to CPU5 and so
>> > on.
>> 
>> msi-x entry index, cpu_vector, irq number...
>> 
>> you want different cpus to have the same vector?
>
> Obviously I'm not communicating very well.  Your apparent assumption is
> that irq number == vector.  

Careful.  There are two entities termed vector in this conversation.
There is the MSI-X vector: an entry in a device's MSI-X table, which can
hold up to 4096 entries per device.  There is the IDT vector, of which
there are 256 per cpu.

> What I'm saying is that's not what we've
> done for individually vectored CPU interrupts in other architectures.
> In those we did (cpu no, irq) == vector.  i.e. the affinity and the irq
> number identify the vector.  For non-numa systems, this is effectively
> what you're interested in doing anyway.  For numa systems, it just
> becomes a sparse matrix.

I believe assign_irq_vector already does this on x86_64, and soon will on x86_32.
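The `(cpu no, irq) == vector` scheme can be pictured as a per-CPU table mapping IDT vector to irq number. The following is a userspace model only, not the kernel's assign_irq_vector: the function names, the CPU count and the first device vector (0x41) are hypothetical. It shows the point in dispute: two CPUs can use the same IDT vector number for different irqs, and moving an irq to another CPU changes its (cpu, vector) slot but not its irq number.

```c
#include <assert.h>

/* Userspace model of per-CPU vector tables; all sizes are illustrative. */
#define DEMO_NR_CPUS         4     /* hypothetical number of CPUs */
#define NR_VECTORS           256   /* IDT entries per CPU on x86 */
#define FIRST_DEVICE_VECTOR  0x41  /* hypothetical first vector given to devices */
#define VECTOR_UNUSED        (-1)

/* vector_irq[cpu][vector] holds the irq number bound to that slot. */
static int vector_irq[DEMO_NR_CPUS][NR_VECTORS];

void vectors_init(void)
{
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        for (int v = 0; v < NR_VECTORS; v++)
            vector_irq[cpu][v] = VECTOR_UNUSED;
}

/* Bind irq to the lowest free device vector on cpu; -1 if none is free. */
int vector_assign(int irq, int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    for (int v = FIRST_DEVICE_VECTOR; v < NR_VECTORS; v++) {
        if (vector_irq[cpu][v] == VECTOR_UNUSED) {
            vector_irq[cpu][v] = irq;
            return v;
        }
    }
    return -1;
}

/* Free every slot irq occupies on cpu. */
void vector_release(int irq, int cpu)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    for (int v = 0; v < NR_VECTORS; v++)
        if (vector_irq[cpu][v] == irq)
            vector_irq[cpu][v] = VECTOR_UNUSED;
}

/* What interrupt entry needs: the local CPU plus the vector yields the irq. */
int vector_to_irq(int cpu, int vector)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    assert(vector >= 0 && vector < NR_VECTORS);
    return vector_irq[cpu][vector];
}

/* Move irq to another CPU: its (cpu, vector) slot changes, its irq number
 * does not.  The old slot is freed only after the new one is taken. */
int vector_migrate(int irq, int from, int to)
{
    assert(from != to);
    int v = vector_assign(irq, to);
    if (v >= 0)
        vector_release(irq, from);
    return v;
}
```

After `vectors_init()`, binding irq 100 to cpu 0 and irq 200 to cpu 1 gives both vector 0x41; migrating irq 100 to cpu 1 then lands it on 0x42 there, and it is still irq 100.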

The number being changed was the irq number for the msi-x ``vectors'':
from some random free irq number to, roughly,
bus (8 bits) : device+function (8 bits) : msix-vector (12 bits), so that
we could have a stable irq number for msi irqs.
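The proposed numbering packs three fields into a single 28-bit value. A minimal sketch, assuming the field order and widths given above; the helper names are hypothetical and this is not the code from the patch being discussed. Because the number depends only on where the device sits and which MSI-X entry is used, it comes out the same on every boot, unlike an irq taken from a pool of free numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bus (8) | device+function (8) | MSI-X entry (12). */
#define BUS_BITS        8
#define DEVFN_BITS      8
#define MSIX_ENTRY_BITS 12

_Static_assert(BUS_BITS + DEVFN_BITS + MSIX_ENTRY_BITS <= 32,
               "encoded irq number must fit in 32 bits");

static inline uint32_t msix_irq_encode(uint8_t bus, uint8_t devfn, uint16_t entry)
{
    assert(entry < (1u << MSIX_ENTRY_BITS));  /* entry must fit in 12 bits */
    return ((uint32_t)bus << (DEVFN_BITS + MSIX_ENTRY_BITS)) |
           ((uint32_t)devfn << MSIX_ENTRY_BITS) |
           entry;
}

/* Inverse helpers: pull each field back out of the irq number. */
static inline uint8_t msix_irq_bus(uint32_t irq)
{
    return (uint8_t)(irq >> (DEVFN_BITS + MSIX_ENTRY_BITS));
}

static inline uint8_t msix_irq_devfn(uint32_t irq)
{
    return (uint8_t)(irq >> MSIX_ENTRY_BITS);
}

static inline uint16_t msix_irq_entry(uint32_t irq)
{
    return (uint16_t)(irq & ((1u << MSIX_ENTRY_BITS) - 1));
}
```

For example, bus 0x03, devfn 0x01, entry 5 encodes to 0x301005, and the largest possible value is 0x0fffffff.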

Once the pci domain is taken into account, it is hard to claim we have
enough bits.  In the general case I expect we need at least one pci
domain per NUMA node.
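A quick bit budget makes the concern concrete. This sketch assumes a 16-bit domain number (the width of an ACPI PCI segment group number) on top of the bus:devfn:msix-vector layout described above; the function names are hypothetical. Adding a full domain field needs 44 bits, and a 32-bit irq number leaves only 4 spare bits, i.e. 16 domains.

```c
#include <assert.h>

/* Field widths. DOMAIN_BITS assumes a 16-bit PCI domain (ACPI segment
 * group) number; the others follow the bus:devfn:entry layout. */
enum {
    DOMAIN_BITS     = 16,
    BUS_BITS        = 8,
    DEVFN_BITS      = 8,
    MSIX_ENTRY_BITS = 12,
};

/* Bits needed to give every MSI-X entry its own irq number when
 * domain_bits bits are reserved for the PCI domain. */
static int irq_bits_needed(int domain_bits)
{
    return domain_bits + BUS_BITS + DEVFN_BITS + MSIX_ENTRY_BITS;
}

/* How many distinct domains fit if the irq number is irq_bits wide.
 * Returns 0 if not even the bus:devfn:entry part fits. */
static long long domains_encodable(int irq_bits)
{
    int spare = irq_bits - irq_bits_needed(0);
    assert(spare < 63);                  /* keep the shift in range */
    return spare < 0 ? 0 : 1LL << spare;
}
```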

The big motivation for killing NR_IRQS-sized arrays comes from two
directions: from msi-x, which allows up to 4096 irqs per device, with nic
vendors starting to produce cards with 256 queues; and from large SGI
systems that don't do I/O and want to be supported by the same kernel
build as smaller systems.  A kernel built to handle 4096*32 irqs, which is
more or less reasonable if the system is I/O heavy, has a ridiculously
sized array on smaller machines.
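The cost of a static table is simple multiplication. The descriptor size below is a placeholder assumption (the real `sizeof(struct irq_desc)` depends on kernel version and configuration), so only the scaling matters, not the exact figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor size; 256 bytes is only a placeholder. */
#define DEMO_DESC_SIZE 256u

/* Bytes taken by a statically sized descriptor array of nr_irqs entries. */
static size_t static_table_bytes(size_t nr_irqs, size_t desc_size)
{
    assert(desc_size == 0 || nr_irqs <= SIZE_MAX / desc_size); /* no overflow */
    return nr_irqs * desc_size;
}

/* Bytes wasted when only nr_used of those entries are ever populated. */
static size_t wasted_bytes(size_t nr_irqs, size_t nr_used, size_t desc_size)
{
    assert(nr_used <= nr_irqs);
    return static_table_bytes(nr_irqs - nr_used, desc_size);
}
```

With the placeholder 256-byte descriptor, 4096*32 = 131072 entries come to 32 MiB, almost all of it unused on a machine that only has a few dozen interrupt sources.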

So a static irq_desc array is out.  And since, with the combination of
msi-x and hotplug, we cannot tell how many irq sources (and thus irq
numbers) the machine is going to have, we cannot reasonably even size a
dynamic array at boot time.  Further, we also want to allocate the
irq_desc entries in node-local memory on NUMA machines for better
performance.  That means we need to dynamically allocate irq_desc
entries and have some lookup mechanism from irq# to irq_desc entry.
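One possible shape for "allocate on demand, then look up by irq#" is sketched below as a userspace model. A chained hash table is chosen here only for brevity; the struct, the function names and the hash constant are assumptions, not the kernel's data structures. In a kernel the allocation would be node-local; plain `calloc()` just records the requested node. Descriptors are never freed in this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal stand-in for a per-irq descriptor; a real irq_desc holds far more. */
struct demo_irq_desc {
    uint32_t irq;
    int node;                      /* NUMA node requested for this descriptor */
    unsigned long count;           /* e.g. interrupts handled */
    struct demo_irq_desc *next;    /* hash-chain link */
};

#define DESC_HASH_BITS 8
#define DESC_HASH_SIZE (1u << DESC_HASH_BITS)

static struct demo_irq_desc *desc_hash[DESC_HASH_SIZE];

/* Multiplicative hash: spreads sparse, encoded irq numbers across buckets. */
static unsigned int desc_hash_fn(uint32_t irq)
{
    return (uint32_t)(irq * 2654435761u) >> (32 - DESC_HASH_BITS);
}

/* irq# -> descriptor, or NULL if this irq has never been set up. */
struct demo_irq_desc *irq_to_desc_lookup(uint32_t irq)
{
    for (struct demo_irq_desc *d = desc_hash[desc_hash_fn(irq)]; d; d = d->next)
        if (d->irq == irq)
            return d;
    return NULL;
}

/* Return the descriptor for irq, allocating it on first use. */
struct demo_irq_desc *irq_to_desc_alloc(uint32_t irq, int node)
{
    struct demo_irq_desc *d = irq_to_desc_lookup(irq);
    if (d)
        return d;
    d = calloc(1, sizeof(*d));     /* a kernel would allocate on `node` */
    if (!d)
        return NULL;
    d->irq = irq;
    d->node = node;
    unsigned int b = desc_hash_fn(irq);
    d->next = desc_hash[b];        /* push onto the bucket's chain */
    desc_hash[b] = d;
    return d;
}
```

Memory now grows with the number of irqs actually in use, and irq numbers may be large and sparse (such as a bus:devfn:entry encoding) without any table sized to the largest possible number.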

Once we have all of that, it becomes possible to look at assigning a
static irq number to each pci (bus:device:function:msi-x vector) pair,
so the system is more reproducible.

Eric
