From: Yinghai Lu <yinghai@kernel.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] irq: sparseirq enabling
Date: Mon, 24 Nov 2008 11:22:33 -0800 [thread overview]
Message-ID: <492AFEF9.609@kernel.org> (raw)
In-Reply-To: <20081124144007.GA30725@elte.hu>
Ingo Molnar wrote:
> * Yinghai Lu <yinghai@kernel.org> wrote:
>
>> +/*
>> + * Protect the sparse_irqs_free freelist:
>> + */
>> +static DEFINE_SPINLOCK(sparse_irq_lock);
>> +LIST_HEAD(sparse_irqs_head);
>> +
>> +/*
>> + * The sparse irqs are in a hash-table as well, for fast lookup:
>> + */
>> +#define SPARSEIRQHASH_BITS (13 - 1)
>> +#define SPARSEIRQHASH_SIZE (1UL << SPARSEIRQHASH_BITS)
>> +#define __sparseirqhashfn(key) hash_long((unsigned long)key, SPARSEIRQHASH_BITS)
>> +#define sparseirqhashentry(key) (sparseirqhash_table + __sparseirqhashfn((key)))
>> +
>> +static struct list_head sparseirqhash_table[SPARSEIRQHASH_SIZE];
>> +
>
>> +struct irq_desc *irq_to_desc(unsigned int irq)
>> +{
>> + struct irq_desc *desc;
>> + struct list_head *hash_head;
>> +
>> + hash_head = sparseirqhashentry(irq);
>> +
>> + /*
>> + * We can walk the hash lockfree, because the hash only
>> + * grows, and we are careful when adding entries to the end:
>> + */
>> + list_for_each_entry(desc, hash_head, hash_entry) {
>> + if (desc->irq == irq)
>> + return desc;
>> + }
>> +
>> + return NULL;
>> +}
>
> I have talked to Thomas about the current design of sparseirq, and the
> current code looks pretty good already, but we'd like to suggest one
> more refinement to the hashing scheme:
>
> Please simplify it by using a sparse pointers static array instead of
> a full hash. I.e. do it as a:
>
> struct irq_desc *irq_desc_ptrs[NR_IRQS] __read_mostly;
>
> This can scale up just fine to 4096 CPUs without measurable impact to
> the kernel image size on x86 distro kernels.
>
> Data structure rules:
>
> 1) allocation: an entry would be allocated at IRQ number creation time
> - never at any other time. Pure access to the desc returns NULL -
> it's atomic and lockless.
>
> 2) freeing: we never free them. If a system allocates an irq_desc[],
> it signals that it uses it. This makes all access lockless.
>
> 3) lookup: irq_to_desc() just does a simple irq_desc_ptrs[irq]
> dereference (inlined) - no locking or hash lookup needed. Since
> irq_desc_ptrs[] is accessed __read_mostly, this will scale really
> well even on NUMA.
>
> 4) iterators: we still keep NR_IRQS as a limit for the
> irq_desc_ptrs[] array - but it would never be used directly by any
> iteration loop, in generic code.
>
> 5) bootup, pre-kmalloc irq_desc access: on x86 we should preallocate a
> pool of ~32 irq_desc entries for allocation use. This can be a
> simple static array of struct irq_desc that is fed into the
> first 32 legacy irq_desc[] slots or so. No bootmem-alloc and
> no after_bootmem flaggery needed. All SMP init and secondary CPU
> irq desc allocation happens after kmalloc is active already -
> cleaning up the allocation path.
>
> 6) limits on x86: please scale NR_IRQS to this rule:
> max(32*MAX_IO_APICS, 8*NR_CPUS)
>
> That gives a minimum of 4096 IRQs on 64-bit systems. On even larger
> systems, we scale linearly up to 32K IRQs on 4K CPUs. That should
> be more than enough (and it can be increased if really needed).
>
> Please keep all the irq_desc[] abstraction APIs and cleanups you did
> so far - they are great. In the far future we can still make irq_desc
> a full hash if needed - but right now we'll be just fine with such a
> simpler scheme as well, and scale fine up to 16K CPUs or so.
>
> This should simplify quite a few things in the sparseirq code.
ok, will change to that.
1. nr_irqs will be legacy number or GSI numbers. MSI will start to use irq nr from nr_irqs
2. or MSI still to use irq from NR_IRQS - 1, and ... go smaller?
YH
next prev parent reply other threads:[~2008-11-24 19:23 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-11-24 2:59 [PATCH 1/2] irq: sparseirq enabling Yinghai Lu
2008-11-24 14:40 ` Ingo Molnar
2008-11-24 19:22 ` Yinghai Lu [this message]
2008-11-24 22:26 ` Thomas Gleixner
2008-11-25 3:57 ` [PATCH 1/2] irq: sparseirq enabling v2 Yinghai Lu
2008-11-25 3:58 ` [PATCH 2/2] irq: move irq_desc according to smp_affinity v2 Yinghai Lu
2008-11-26 7:48 ` [PATCH 1/2] irq: sparseirq enabling v2 Ingo Molnar
2008-11-26 8:02 ` Yinghai Lu
2008-11-26 8:17 ` Ingo Molnar
2008-11-26 18:33 ` Yinghai Lu
2008-11-27 2:26 ` [PATCH 1/2] irq: sparseirq enabling v3 Yinghai Lu
2008-11-27 2:26 ` [PATCH 2/2] irq: move irq_desc according to smp_affinity v3 Yinghai Lu
2008-11-28 16:34 ` [PATCH 1/2] irq: sparseirq enabling v3 Ingo Molnar
2008-11-29 7:13 ` [PATCH] irq: sparseirq enabling v4 Yinghai Lu
2008-11-29 10:02 ` Ingo Molnar
2008-11-29 10:26 ` Ingo Molnar
2008-12-01 4:44 ` [PATCH] irq: sparse irq_desc[] support - fix Yinghai Lu
2008-11-29 10:57 ` [PATCH] irq: sparseirq enabling v4 Sam Ravnborg
2008-11-29 14:33 ` Ingo Molnar
2008-11-29 17:54 ` Sam Ravnborg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=492AFEF9.609@kernel.org \
--to=yinghai@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.