Re: [PATCH 00/16] dyn_array and nr_irqs support v2

From: Mike Travis <travis@sgi.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>, Ingo Molnar <mingo@elte.hu>,
	Thomas Gleixner <tglx@linutronix.de>, hpa <hpa@zytor.com>,
	Dhaval Giani <dhaval@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/16] dyn_array and nr_irqs support v2
Date: Mon, 04 Aug 2008 05:57:52 -0700
Message-ID: <4896FCD0.8050006@sgi.com>
In-Reply-To: <m163qkjirc.fsf@frodo.ebiederm.org>

Eric W. Biederman wrote:
> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> 
>>>> Increase NR_IRQS to 512 for x86_64?
>>> x86_32 has it set to 1024 so 512 is too small.  I think your patch
>>> which essentially restores the old behavior is the right way to go for
>>> this merge window.  I just want to carefully look at it and ensure we
>>> are restoring the old heuristics.  On a lot of large machines we wind
>>> up having irqs for pci slots that are never filled with cards.
>> it seems 32bit summit needs NR_IRQS=256, NR_IRQ_VECTOR=1024
> 
> Yes.  Which is 1024 irq sources/gsis, only 1/4 of them used, so it will fit into 256 irqs.
> 
> On x86_64 we have removed the confusing and brittle irq compression
> code.  So to handle that many irqs we would need 1024 irqs.
> 
> I expect modern big systems that can only run x86_64 are larger still.
> 
>>> You have noticed how much of those arrays I have collapsed into irq_cfg
>>> on x86_64.  We can ultimately do the same on x86_32.  The
>>> tricky one is irq_2_pin.  I believe the proper solution is to just
>>> dynamically allocate entries and place a pointer in irq_cfg.  Although
>>> we may be able to simply place a single entry in irq_cfg.
> 
>> so there will be irq_desc and irq_cfg lists?
> Or we place irq_desc in irq_cfg.
> 
>> wonder if helper to get irq_desc and irq_cfg for one irq_no could be bottleneck?
> 
> Nah.  We look up whatever we need in the 256 entry vector_irq table.
> I expect we can do the container_of trick beyond that.
> 
> If the helper, which we should only see on the slow path, is a bottleneck
> we can easily organize irq_desc into a tree structure.  Ultimately
> I think we want drivers to have a struct irq *irq pointer but we need
> to get the arch backend working first.
> 
>> PS: cpumask_t domain in irq_cfg needs to be updated... it wastes 512 bytes
>> when NR_CPUS=4096.
>> could change it to unsigned int: in logical mode (flat, x2apic logical) it is a mask,
>> and in physical mode (physical flat, x2apic physical) it is a cpu number.
> 
> Certainly there is the potential to simplify things.
> 
>>> I agree with your sentiment if we can actually allocate the irqs by
>>> demand instead of preallocating them based on worst case usage we
>>> should use much less memory.
>> yes.
>>
>>> I figure that keeping any type of nr_irqs around you are requiring
>>> us to estimate the worst case number of irqs we need to deal with.
>> need to compromise between flexibility and performance..., or say waste some
>> space to get some performance...
> 
> The thing is there is no good upper bound on how many irqs we can see
> short of NR_PCI_DEVICES*4096.
> 
>>> The challenge is that we have hot plug devices with MSI-X capabilities
>>> on them.  Just one of those could add 4K irqs (worst case).  256 or
>>> so I have actually heard hardware guys talking about.
> 
>> good to know. so one cpu handles one card? or need 16 cpus to serve one
>> card? or they got new cpu to NR_VECTORS  with 32bit?
> 
> Yes.  For the current worst case it requires 16 cpus.
> The biggest I have heard of a card using at this point is 256 irqs.
> A lot of the goal in those cards is so they can have 2 irqs per cpu:
> 1 rx irq and 1 tx irq, allowing them to implement per cpu queues.
> 
>> then need to keep struct irq_desc, can not put everything into it.
> 
> Yes.  But we can put all the arch specific code in irq_cfg, and put
> irq_desc in irq_cfg.
> 
>>> But even one msi vector on a pci card that doesn't have normal irqs could
>>> mess up a tightly sized nr_irqs based solely on acpi_madt probing.
>> v2 double that last_gsi_end
> 
> Which is usable, but nowhere near as nice as not having a fixed upper bound.
> 
> 
>>> Sorry I was referring to the MSI-X source vector number which is a 12
>>> bit index into an array of MSI-X vectors on the pci device, not the
>>> vector we receive the irq at on the pci card.
>> cpu is going to check those vectors in addition to vectors in IDT?
> 
> No. The destination cpu and destination vector number are encoded in
> the MSI message.  Each MSI-X source ``vector'' has a different MSI message.
> 
> So on my wish list is to stably encode the MSI interrupt numbers.  And
> using a sparse irq address space I can, as it only takes 28 bits to hold
> the complete bus + device + function + msi source [ 0-4095 ].
> 
> Eric

Don't you need "domain" (node) in the bus:device:function:vector combination?
(Or [hack] use a lot bigger field for bus with the node encoded into it.)
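
To make the bit counting concrete, here is a rough sketch of such an
encoding.  It assumes the standard PCI widths (8-bit bus, 5-bit device,
3-bit function) plus the 12-bit MSI-X source index, which adds up to the
28 bits mentioned above.  The helper names and the 16-bit domain width
are assumptions for illustration only, not anything from the patch set.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths.  Bus/device/function are the standard PCI widths;
 * the 12-bit entry matches the [0-4095] MSI-X source index. */
#define MSI_ENTRY_BITS   12
#define PCI_FUNC_BITS     3
#define PCI_DEV_BITS      5
#define PCI_BUS_BITS      8
#define PCI_DOMAIN_BITS  16   /* assumption: 16-bit domain/segment */

#define PCI_FUNC_SHIFT   MSI_ENTRY_BITS                    /* 12 */
#define PCI_DEV_SHIFT    (PCI_FUNC_SHIFT + PCI_FUNC_BITS)  /* 15 */
#define PCI_BUS_SHIFT    (PCI_DEV_SHIFT + PCI_DEV_BITS)    /* 20 */
#define PCI_DOMAIN_SHIFT (PCI_BUS_SHIFT + PCI_BUS_BITS)    /* 28 */

/* Hypothetical helper: pack bus:device:function:entry into 28 bits. */
static inline uint32_t msi_irq_encode(unsigned bus, unsigned dev,
                                      unsigned fn, unsigned entry)
{
	assert(bus   < (1u << PCI_BUS_BITS));
	assert(dev   < (1u << PCI_DEV_BITS));
	assert(fn    < (1u << PCI_FUNC_BITS));
	assert(entry < (1u << MSI_ENTRY_BITS));

	return ((uint32_t)bus << PCI_BUS_SHIFT) |
	       ((uint32_t)dev << PCI_DEV_SHIFT) |
	       ((uint32_t)fn  << PCI_FUNC_SHIFT) |
	       (uint32_t)entry;
}

/* Hypothetical helper: the same encoding with a domain on top.
 * With a 16-bit domain the result needs 28 + 16 = 44 bits. */
static inline uint64_t msi_irq_encode_domain(unsigned domain, unsigned bus,
                                             unsigned dev, unsigned fn,
                                             unsigned entry)
{
	assert(domain < (1u << PCI_DOMAIN_BITS));

	return ((uint64_t)domain << PCI_DOMAIN_SHIFT) |
	       msi_irq_encode(bus, dev, fn, entry);
}
```

Under those assumptions, a full-width domain pushes the number past
what a 32-bit unsigned irq number can hold, which is why the domain
either has to be narrow or has to be folded into a wider bus field.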

Thanks,
Mike

Thread overview: 39+ messages
2008-08-01  9:37 [PATCH 00/16] dyn_array and nr_irqs support v2 Yinghai Lu
2008-08-01  9:37 ` [PATCH 01/16] x86: 64bit support more than 256 irq Yinghai Lu
2008-08-01  9:37   ` [PATCH 02/16] x86: introduce nr_irqs for 64bit v3 Yinghai Lu
2008-08-01  9:37     ` [PATCH 03/16] add dyn_array support Yinghai Lu
2008-08-01  9:37       ` [PATCH 04/16] make irq_timer_state to use dyn_array Yinghai Lu
2008-08-01  9:37         ` [PATCH 05/16] make irq2_iommu " Yinghai Lu
2008-08-01  9:37           ` [PATCH 06/16] make irq_desc " Yinghai Lu
2008-08-01  9:37             ` [PATCH 07/16] x86: make 64bit support dyn_array Yinghai Lu
2008-08-01  9:37               ` [PATCH 08/16] serial: change remove NR_IRQS in 8250.c v2 Yinghai Lu
2008-08-01  9:37                 ` [PATCH 09/16] add per_cpu_dyn_array support Yinghai Lu
2008-08-01  9:37                   ` [PATCH 10/16] irq: make irqs in kernel stat use per_cpu_dyn_array Yinghai Lu
2008-08-01  9:37                     ` [PATCH 11/16] x86 remove irq_vectors_limit.h Yinghai Lu
2008-08-01  9:37                       ` [PATCH 12/16] x86: make 32bit use dyn_array Yinghai Lu
2008-08-01  9:37                         ` [PATCH 13/16] add per_cpu_dyn_array for arch percpu support Yinghai Lu
2008-08-01  9:37                           ` [PATCH 14/16] x86: get mp_irqs from madt Yinghai Lu
2008-08-01  9:37                             ` [PATCH 15/16] x86: make 32bit more like with io_apic/dyn_array to 64 bit Yinghai Lu
2008-08-01  9:37                               ` [PATCH 16/16] x86: alloc dyn_array all alltogether Yinghai Lu
2008-08-01 20:46 ` [PATCH 00/16] dyn_array and nr_irqs support v2 Eric W. Biederman
2008-08-01 21:30   ` Yinghai Lu
2008-08-01 21:57     ` Yinghai Lu
2008-08-01 22:45       ` Eric W. Biederman
2008-08-01 22:10     ` Yinghai Lu
2008-08-01 22:38     ` Eric W. Biederman
2008-08-02  1:09       ` Yinghai Lu
2008-08-02  1:36         ` H. Peter Anvin
2008-08-02  1:41         ` Eric W. Biederman
2008-08-02  2:01           ` Yinghai Lu
2008-08-02  2:03             ` H. Peter Anvin
2008-08-02  2:39               ` Eric W. Biederman
2008-08-02  3:28                 ` H. Peter Anvin
2008-08-02  4:42                   ` Eric W. Biederman
2008-08-02 15:41                     ` H. Peter Anvin
2008-08-02 20:20                       ` Eric W. Biederman
2008-08-04 12:57           ` Mike Travis [this message]
2008-08-05  2:38             ` H. Peter Anvin
2008-08-05  3:40               ` Eric W. Biederman
2008-08-05  3:48                 ` H. Peter Anvin
2008-08-01 21:47   ` Mike Travis
2008-08-02  2:58   ` Yinghai Lu