From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752810AbYKXOke (ORCPT ); Mon, 24 Nov 2008 09:40:34 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752575AbYKXOk0 (ORCPT ); Mon, 24 Nov 2008 09:40:26 -0500 Received: from mx2.mail.elte.hu ([157.181.151.9]:60144 "EHLO mx2.mail.elte.hu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752563AbYKXOkZ (ORCPT ); Mon, 24 Nov 2008 09:40:25 -0500 Date: Mon, 24 Nov 2008 15:40:07 +0100 From: Ingo Molnar To: Yinghai Lu Cc: Thomas Gleixner , "H. Peter Anvin" , Andrew Morton , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH 1/2] irq: sparseirq enabling Message-ID: <20081124144007.GA30725@elte.hu> References: <492A1877.4090304@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <492A1877.4090304@kernel.org> User-Agent: Mutt/1.5.18 (2008-05-17) X-ELTE-VirusStatus: clean X-ELTE-SpamScore: -1.5 X-ELTE-SpamLevel: X-ELTE-SpamCheck: no X-ELTE-SpamVersion: ELTE 2.0 X-ELTE-SpamCheck-Details: score=-1.5 required=5.9 tests=BAYES_00 autolearn=no SpamAssassin version=3.2.3 -1.5 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Yinghai Lu wrote: > +/* > + * Protect the sparse_irqs_free freelist: > + */ > +static DEFINE_SPINLOCK(sparse_irq_lock); > +LIST_HEAD(sparse_irqs_head); > + > +/* > + * The sparse irqs are in a hash-table as well, for fast lookup: > + */ > +#define SPARSEIRQHASH_BITS (13 - 1) > +#define SPARSEIRQHASH_SIZE (1UL << SPARSEIRQHASH_BITS) > +#define __sparseirqhashfn(key) hash_long((unsigned long)key, SPARSEIRQHASH_BITS) > +#define sparseirqhashentry(key) (sparseirqhash_table + __sparseirqhashfn((key))) > + > +static struct list_head sparseirqhash_table[SPARSEIRQHASH_SIZE]; > + > +struct irq_desc *irq_to_desc(unsigned int irq) > +{ > + struct irq_desc *desc; > + struct list_head *hash_head; > + > + hash_head = sparseirqhashentry(irq); > + > + /* > + * We can walk the hash lockfree, because the hash only > + * grows, and we are careful when adding entries to the end: > + */ > + list_for_each_entry(desc, hash_head, hash_entry) { > + if (desc->irq == irq) > + return desc; > + } > + > + return NULL; > +} I have talked to Thomas about the current design of sparseirq, and the current code looks pretty good already, but we'd like to suggest one more refinement to the hashing scheme: Please simplify it by using a sparse pointers static array instead of a full hash. I.e. do it as a: struct irq_desc *irq_desc_ptrs[NR_IRQS] __read_mostly; This can scale up just fine to 4096 CPUs without measurable impact to the kernel image size on x86 distro kernels. Data structure rules: 1) allocation: an entry would be allocated at IRQ number creation time - never at any other time. Pure access to the desc returns NULL - it's atomic and lockless. 2) freeing: we never free them. If a system allocates an irq_desc[], it signals that it uses it. This makes all access lockless. 3) lookup: irq_to_desc() just does a simple irq_desc_ptrs[irq] dereference (inlined) - no locking or hash lookup needed. Since irq_desc_ptrs[] is accessed __read_mostly, this will scale really well even on NUMA. 4) iterators: we still keep NR_IRQS as a limit for the irq_desc_ptrs[] array - but it would never be used directly by any iteration loop, in generic code. 5) bootup, pre-kmalloc irq_desc access: on x86 we should preallocate a pool of ~32 irq_desc entries for allocation use. This can be a simple static array of struct irq_desc that is fed into the first 32 legacy irq_desc[] slots or so. No bootmem-alloc and no after_bootmem flaggery needed. All SMP init and secondary CPU irq desc allocation happens after kmalloc is active already - cleaning up the allocation path. 6) limits on x86: please scale NR_IRQS to this rule: max(32*MAX_IO_APICS, 8*NR_CPUS) That gives a minimum of 4096 IRQs on 64-bit systems. On even larger systems, we scale linearly up to 32K IRQs on 4K CPUs. That should be more than enough (and it can be increased if really needed). Please keep all the irq_desc[] abstraction APIs and cleanups you did so far - they are great. In the far future we can still make irq_desc a full hash if needed - but right now we'll be just fine with such a simpler scheme as well, and scale fine up to 16K CPUs or so. This should simplify quite a few things in the sparseirq code. Ingo