All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Yinghai Lu <yinghai@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] irq: sparseirq enabling
Date: Mon, 24 Nov 2008 15:40:07 +0100	[thread overview]
Message-ID: <20081124144007.GA30725@elte.hu> (raw)
In-Reply-To: <492A1877.4090304@kernel.org>


* Yinghai Lu <yinghai@kernel.org> wrote:

> +/*
> + * Protect the sparse_irqs_free freelist:
> + */
> +static DEFINE_SPINLOCK(sparse_irq_lock);
> +LIST_HEAD(sparse_irqs_head);
> +
> +/*
> + * The sparse irqs are in a hash-table as well, for fast lookup:
> + */
> +#define SPARSEIRQHASH_BITS          (13 - 1)
> +#define SPARSEIRQHASH_SIZE          (1UL << SPARSEIRQHASH_BITS)
> +#define __sparseirqhashfn(key)      hash_long((unsigned long)key, SPARSEIRQHASH_BITS)
> +#define sparseirqhashentry(key)     (sparseirqhash_table + __sparseirqhashfn((key)))
> +
> +static struct list_head sparseirqhash_table[SPARSEIRQHASH_SIZE];
> +

> +struct irq_desc *irq_to_desc(unsigned int irq)
> +{
> +	struct irq_desc *desc;
> +	struct list_head *hash_head;
> +
> +	hash_head = sparseirqhashentry(irq);
> +
> +	/*
> +	 * We can walk the hash lockfree, because the hash only
> +	 * grows, and we are careful when adding entries to the end:
> +	 */
> +	list_for_each_entry(desc, hash_head, hash_entry) {
> +		if (desc->irq == irq)
> +			return desc;
> +	}
> +
> +	return NULL;
> +}

I have talked to Thomas about the current design of sparseirq, and the 
current code looks pretty good already, but we'd like to suggest one 
more refinement to the hashing scheme:

Please simplify it by using a sparse pointers static array instead of 
a full hash. I.e. do it as a:

       struct irq_desc *irq_desc_ptrs[NR_IRQS] __read_mostly;

This can scale up just fine to 4096 CPUs without measurable impact to 
the kernel image size on x86 distro kernels.

Data structure rules:

1) allocation: an entry would be allocated at IRQ number creation time 
   - never at any other time. Pure access to the desc returns NULL - 
   it's atomic and lockless.

2) freeing: we never free them. If a system allocates an irq_desc[], 
   it signals that it uses it. This makes all access lockless.

3) lookup: irq_to_desc() just does a simple irq_desc_ptrs[irq]
   dereference (inlined) - no locking or hash lookup needed. Since 
   irq_desc_ptrs[] is accessed __read_mostly, this will scale really 
   well even on NUMA.

4) iterators: we still keep NR_IRQS as a limit for the
   irq_desc_ptrs[] array - but it would never be used directly by any 
   iteration loop, in generic code.

5) bootup, pre-kmalloc irq_desc access: on x86 we should preallocate a 
   pool of ~32 irq_desc entries for allocation use. This can be a
   simple static array of struct irq_desc that is fed into the
   first 32 legacy irq_desc[] slots or so. No bootmem-alloc and
   no after_bootmem flaggery needed. All SMP init and secondary CPU 
   irq desc allocation happens after kmalloc is active already - 
   cleaning up the allocation path.

6) limits on x86: please scale NR_IRQS to this rule:
               max(32*MAX_IO_APICS, 8*NR_CPUS)

   That gives a minimum of 4096 IRQs on 64-bit systems. On even larger 
   systems, we scale linearly up to 32K IRQs on 4K CPUs. That should 
   be more than enough (and it can be increased if really needed).

Please keep all the irq_desc[] abstraction APIs and cleanups you did 
so far - they are great. In the far future we can still make irq_desc 
a full hash if needed - but right now we'll be just fine with such a 
simpler scheme as well, and scale fine up to 16K CPUs or so.

This should simplify quite a few things in the sparseirq code.

	Ingo

  reply	other threads:[~2008-11-24 14:40 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-24  2:59 [PATCH 1/2] irq: sparseirq enabling Yinghai Lu
2008-11-24 14:40 ` Ingo Molnar [this message]
2008-11-24 19:22   ` Yinghai Lu
2008-11-24 22:26     ` Thomas Gleixner
2008-11-25  3:57   ` [PATCH 1/2] irq: sparseirq enabling v2 Yinghai Lu
2008-11-25  3:58     ` [PATCH 2/2] irq: move irq_desc according to smp_affinity v2 Yinghai Lu
2008-11-26  7:48     ` [PATCH 1/2] irq: sparseirq enabling v2 Ingo Molnar
2008-11-26  8:02       ` Yinghai Lu
2008-11-26  8:17         ` Ingo Molnar
2008-11-26 18:33           ` Yinghai Lu
2008-11-27  2:26           ` [PATCH 1/2] irq: sparseirq enabling v3 Yinghai Lu
2008-11-27  2:26             ` [PATCH 2/2] irq: move irq_desc according to smp_affinity v3 Yinghai Lu
2008-11-28 16:34             ` [PATCH 1/2] irq: sparseirq enabling v3 Ingo Molnar
2008-11-29  7:13               ` [PATCH] irq: sparseirq enabling v4 Yinghai Lu
2008-11-29 10:02                 ` Ingo Molnar
2008-11-29 10:26                   ` Ingo Molnar
2008-12-01  4:44                     ` [PATCH] irq: sparse irq_desc[] support - fix Yinghai Lu
2008-11-29 10:57                   ` [PATCH] irq: sparseirq enabling v4 Sam Ravnborg
2008-11-29 14:33                     ` Ingo Molnar
2008-11-29 17:54                       ` Sam Ravnborg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081124144007.GA30725@elte.hu \
    --to=mingo@elte.hu \
    --cc=akpm@linux-foundation.org \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.