From: Mike Travis <travis@sgi.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>, Ingo Molnar <mingo@elte.hu>,
	Thomas Gleixner <tglx@linutronix.de>, hpa <hpa@zytor.com>,
	Dhaval Giani <dhaval@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/16] dyn_array and nr_irqs support v2
Date: Mon, 04 Aug 2008 05:57:52 -0700
Message-ID: <4896FCD0.8050006@sgi.com>
In-Reply-To: <m163qkjirc.fsf@frodo.ebiederm.org>

Eric W. Biederman wrote:
> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> 
>>>> Increase NR_IRQS to 512 for x86_64?
>>> x86_32 has it set to 1024 so 512 is too small.  I think your patch
>>> which essentially restores the old behavior is the right way to go for
>>> this merge window.  I just want to carefully look at it and ensure we
>>> are restoring the old heuristics.  On a lot of large machines we wind
>>> up having irqs for pci slots that are never filled with cards.
>> it seems 32bit summit needs NR_IRQS=256, NR_IRQ_VECTOR=1024
> 
> Yes.  Which is 1024 irq sources/gsis only 1/4 used so it will fit into 256 irqs.
> 
> On x86_64 we have removed the confusing and brittle irq compression
> code.  So to handle that many irqs we would need 1024 irqs.
> 
> I expect modern big systems that can only run x86_64 are larger still.
> 
>>> You have noticed how much of those arrays I have collapsed into irq_cfg
>>> on x86_64.  We can ultimately do the same on x86_32.  The
>>> tricky one is irq_2_pin.  I believe the proper solution is to just
>>> dynamically allocate entries and place a pointer in irq_cfg.  Although
>>> we may be able to simply place a single entry in irq_cfg.
> 
>> so there will be irq_desc and irq_cfg lists?
> Or we place irq_desc in irq_cfg.
> 
>> wonder if helper to get irq_desc and irq_cfg for one irq_no could be bottleneck?
> 
> Nah.  We look up whatever we need in the 256-entry vector_irq table.
> I expect we can do the container_of trick beyond that.
> 
> If the helper, which we should only see on the slow path, is a bottleneck
> we can easily organize irq_desc into a tree structure.  Ultimately
> I think we want drivers to have a struct irq *irq pointer but we need
> to get the arch backend working first.
> 
>> PS: cpumask_t domain in irq_cfg needs to be updated... it wastes 512 bytes
>> when NR_CPUS=4096
>> could change it to unsigned int. In logical mode (flat, x2apic logical) it is a mask
>> and in physical mode (physical flat, x2apic physical) it is a cpu number.
> 
> Certainly there is the potential to simplify things.
> 
>>> I agree with your sentiment if we can actually allocate the irqs by
>>> demand instead of preallocating them based on worst case usage we
>>> should use much less memory.
>> yes.
>>
>>> I figure that keeping any type of nr_irqs around you are requiring
>>> us to estimate the worst case number of irqs we need to deal with.
>> need to compromise between flexibility and performance..., or say waste some
>> space to get some performance...
> 
> The thing is there is no good upper bound on how many irqs we can see
> short of NR_PCI_DEVICES*4096.
> 
>>> The challenge is that we have hot plug devices with MSI-X capabilities
>>> on them.  Just one of those could add 4K irqs (worst case).  256 or
>>> so I have actually heard hardware guys talking about.
> 
>> good to know. so one cpu handles one card? or need 16 cpus to serve one
>> card? or they got new cpu to NR_VECTORS  with 32bit?
> 
> Yes.  Currently for the current worst case it requires 16 cpus.
> The biggest I have heard a card using at this point is 256 irqs.
> A lot of the goal in those cards is so they can have 2 irqs per cpu.
> 1 rx irq and 1 tx irq.  Allowing them to implement per cpu queues.
> 
>> then need to keep struct irq_desc, can not put everything into it.
> 
> Yes.  But we can put all the arch specific code in irq_cfg, and put
> irq_desc in irq_cfg.
> 
>>> But even one msi vector on a pci card that doesn't have normal irqs could
>>> mess up a tightly sized nr_irqs based solely on acpi_madt probing.
>> v2 double that last_gsi_end
> 
> Which is usable, but nowhere near as nice as not having a fixed upper bound.
> 
> 
>>> Sorry I was referring to the MSI-X source vector number which is a 12
>>> bit index into an array of MSI-X vectors on the pci device, not the
>>> vector we receive the irq at on the pci card.
>> cpu is going to check those vectors in addition to vectors in IDT?
> 
> No. The destination cpu and destination vector number are encoded in
> the MSI message.  Each MSI-X source ``vector'' has a different MSI message.
> 
> So on my wish list is to stably encode the MSI interrupt numbers.  And
> using a sparse irq address space I can.  As it only takes 28 bits to hold
> the complete bus + device + function + msi source [ 0-4095 ].
> 
> Eric

Don't you need "domain" (node) in the bus:device:function:vector combination?
(Or [hack] use a lot bigger field for bus with the node encoded into it.)
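
To make the question concrete, here is a rough sketch of the packing, not code from the patch series. It assumes the usual PCI field widths (8-bit bus, 5-bit device, 3-bit function) plus the 12-bit MSI-X source index, which gives the 28 bits mentioned above. It also assumes a 16-bit domain (segment) placed above the bus. The struct, macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field widths; only the 12-bit MSI-X source comes from the thread. */
#define MSI_SRC_BITS   12u  /* MSI-X source index, 0-4095 */
#define PCI_FN_BITS     3u  /* function, 0-7 */
#define PCI_DEV_BITS    5u  /* device, 0-31 */
#define PCI_BUS_BITS    8u  /* bus, 0-255 */
#define PCI_DOMAIN_BITS 16u /* assumption: 16-bit domain/segment */

/* Hypothetical description of one MSI-X source on one PCI function. */
struct msi_source {
	uint16_t domain;
	uint8_t  bus;
	uint8_t  dev;
	uint8_t  fn;
	uint16_t src;
};

/* Pack domain:bus:device:function:source into one stable identifier. */
static uint64_t msi_source_encode(struct msi_source s)
{
	uint64_t id;

	/* Reject out-of-range fields instead of silently truncating them. */
	assert(s.dev < (1u << PCI_DEV_BITS));
	assert(s.fn < (1u << PCI_FN_BITS));
	assert(s.src < (1u << MSI_SRC_BITS));

	id = s.domain;
	id = (id << PCI_BUS_BITS) | s.bus;
	id = (id << PCI_DEV_BITS) | s.dev;
	id = (id << PCI_FN_BITS) | s.fn;
	id = (id << MSI_SRC_BITS) | s.src;
	return id;
}

/* Inverse of msi_source_encode(). */
static struct msi_source msi_source_decode(uint64_t id)
{
	struct msi_source s;

	s.src = (uint16_t)(id & ((1u << MSI_SRC_BITS) - 1));
	id >>= MSI_SRC_BITS;
	s.fn = (uint8_t)(id & ((1u << PCI_FN_BITS) - 1));
	id >>= PCI_FN_BITS;
	s.dev = (uint8_t)(id & ((1u << PCI_DEV_BITS) - 1));
	id >>= PCI_DEV_BITS;
	s.bus = (uint8_t)(id & ((1u << PCI_BUS_BITS) - 1));
	id >>= PCI_BUS_BITS;
	s.domain = (uint16_t)(id & ((1u << PCI_DOMAIN_BITS) - 1));
	return s;
}
```

With domain 0 the result stays within 28 bits. Under these assumed widths a non-zero domain can push it to 44 bits, which no longer fits in a 32-bit irq number, hence the question.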

Thanks,
Mike
