public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Andrew Vasquez <andrew.vasquez@qlogic.com>
Subject: Re: [PATCH] pci: change msi-x vector to 32bit
Date: Mon, 18 Aug 2008 15:59:18 -0500	[thread overview]
Message-ID: <1219093158.3261.70.camel@localhost.localdomain> (raw)
In-Reply-To: <m14p5i2iwx.fsf@frodo.ebiederm.org>

On Mon, 2008-08-18 at 12:59 -0700, Eric W. Biederman wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> 
> > On Sat, 2008-08-16 at 15:17 -0700, Yinghai Lu wrote:
> >> On Sat, Aug 16, 2008 at 1:45 PM, James Bottomley
> >> <James.Bottomley@hansenpartnership.com> wrote:
> >> >> > What I still don't quite get is the benefit of large IRQ spaces ...
> >> >> > particularly if you encode things the system doesn't really need to know
> >> >> > in them.
> >> >>
> >> >> then set nr_irqs = nr_cpu_ids * NR_VECTORS))
> >> >> and count down for msi/msi-x?
> >> >
> >> > No, what I mean is that msis can trip directly to CPUs, so this is an
> >> > affinity thing (that MSI is directly bound to that CPU now), so in the
> >> > matrixed way we display this in show_interrupts() with the CPU along the
> >> > top and the IRQ down the side, it doesn't make sense to me to encode IRQ
> >> > affinity in the irq number again.   So it makes more sense to assign the
> >> > vectors based on both the irq number and the CPU affinity so that if the
> >> > PCI MSI for qla is assigned to CPU4 you can reassign it to CPU5 and so
> >> > on.
> >> 
> >> msi-x entry index, cpu_vector, irq number...
> >> 
> >> you want to different cpus have same vector?
> >
> > Obviously I'm not communicating very well.  Your apparent assumption is
> > that irq number == vector.  
> 
> Careful.  There are two entities termed vector in this conversation.
> There is the MSI-X vector which can hold up to 4096 entries per device.
> There is the idt vector which has 256 entries per cpu.

Yes ... I was trying to discriminate, but not very successfully.

Where I'm coming from is parisc CPUs have 64 CPU vectors (what you'd
call the idt in the local apic) and the bus hardware has a variable
number per bridge (These aren't MSI they're bridge triggered:  The way
PSI works is that bridge<->CPU is done as MSI).

Anyway, the argument about vector or matrix representation goes to the
core of what irq number represents below.

> > What I'm saying is that's not what we've
> > done for individually vectored CPU interrupts in other architectures.
> > In those we did (cpu no, irq) == vector.  i.e. the affinity and the irq
> > number identify the vector.  For non-numa systems, this is effectively
> > what you're interested in doing anyway.  For numa systems, it just
> > becomes a sparse matrix.
> 
> I believe assign_irq_vector on x86_64 and soon on x86_32 does this already.
> 
> The number that was being changed was the irq number of for the
> msi-x ``vectors'' from some random free irq number to roughly
> bus(8 bits):device+function(8 bits):msix-vector(12 bits) so that we
> could have a stable irq number for msi irqs.
> 
> Once pci domain is considered it is hard to claim we have enough bits.
> I expect we need at least pci domains to have one per NUMA node, in
> the general case.

Right ... I think the idea of trying to give some meaningful number to
irq is a lost cause.  Since IRQ is just a handle, I'd actually use
something like dev->bus_id and individual card functions rather than a
number (i.e. no longer exposing the raw number to user space).  Then
what irq number means and how it's used becomes an internal matter.

The only real concern I had was not wanting the per-cpu idt vectors to
be mangled into the irq number because they're essentially columns in
the sparse matrix with CPU across the top.

> The big motivation for killing NR_IRQS sized arrays comes from 2 directions.
> msi-x which allows up to 4096 irqs per device and nic vendors starting
> to produce cards with 256 queues, and from large SGI systems that don't do
> I/O and want to be supported with the same kernel build as smaller systems.
> A kernel built to handle 4096*32 irqs which is more or less reasonable if
> the system was I/O heavy is a ridiculously sized array on smaller machines.
> 
> So a static irq_desc is out.  And since with the combination of msi-x hotplug
> we can not tell how many irq sources and thus irq numbers the machine is going
> to have we can not reasonably even have a dynamic array at boot time.  Further
> we also want to allocate the irq_desc entries in node-local memory on NUMA
> machines for better performance.  Which means we need to dynamically allocate
> irq_desc entries and have some lookup mechanism from irq# to irq_desc entry.
> 
> So once we have all of that.  It becomes possible to look at assigning a static
> irq number to each pci (bus:device:function:msi-x vector) pair so the system
> is more reproducible.

Yes, agree with the above except the static irq number ... although once
that's hidden from the user, I suppose I really don't care any more.

James




  reply	other threads:[~2008-08-18 20:59 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-08-16  3:26 [PATCH] pci: change msi-x vector to 32bit H. Peter Anvin
2008-08-16  6:42 ` Yinghai Lu
2008-08-16 14:50   ` James Bottomley
2008-08-16 15:39     ` Alan Cox
2008-08-16 16:13       ` James Bottomley
2008-08-16 18:56         ` Yinghai Lu
2008-08-16 20:10           ` Andrew Vasquez
2008-08-16 20:25           ` James Bottomley
2008-08-16 20:34             ` Yinghai Lu
2008-08-16 20:45               ` James Bottomley
2008-08-16 22:17                 ` Yinghai Lu
2008-08-16 23:09                   ` James Bottomley
2008-08-16 23:21                     ` Yinghai Lu
2008-08-18 19:59                     ` Eric W. Biederman
2008-08-18 20:59                       ` James Bottomley [this message]
2008-08-18 21:45                         ` Eric W. Biederman
2008-08-18 22:04                           ` James Bottomley
2008-08-18 21:51                             ` Alan Cox
2008-08-18 22:13                               ` H. Peter Anvin
2008-08-18 22:27                               ` James Bottomley
2008-08-18 21:24                       ` H. Peter Anvin
2008-08-16  8:17 ` Eric W. Biederman
2008-08-16  9:00   ` Yinghai Lu
  -- strict thread matches above, loose matches on Subject: below --
2008-08-16  2:36 Yinghai Lu
2008-08-21 20:33 ` Jesse Barnes
2008-08-21 20:47   ` Eric W. Biederman
2008-08-21 23:07     ` Jesse Barnes
2008-08-22  0:11       ` Eric W. Biederman
2008-08-22  0:35         ` Jesse Barnes
2008-08-27 23:34 ` Jesse Barnes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1219093158.3261.70.camel@localhost.localdomain \
    --to=james.bottomley@hansenpartnership.com \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=andrew.vasquez@qlogic.com \
    --cc=ebiederm@xmission.com \
    --cc=hpa@zytor.com \
    --cc=jbarnes@virtuousgeek.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    --cc=yhlu.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox