linux-arch.vger.kernel.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	linux-arch@vger.kernel.org, Linus Torvalds <torvalds@osdl.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mundt <lethal@linux-sh.org>,
	Russell King <linux@arm.linux.org.uk>,
	David Woodhouse <dwmw2@infradead.org>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	Yinghai Lu <yinghai@kernel.org>,
	Grant Likely <grant.likely@secretlab.ca>
Subject: Re: [patch 00/47] Sparse irq rework
Date: Sun, 3 Oct 2010 21:16:47 +0200 (CEST)
Message-ID: <alpine.LFD.2.00.1010032034420.14550@localhost6.localdomain6>
In-Reply-To: <m1wrpzfctu.fsf@fess.ebiederm.org>

On Sun, 3 Oct 2010, Eric W. Biederman wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> > Rationale:
> > ----------
> >
> > The current sparse_irq allocator has several shortcomings due to
> > failures in the design or the lack of it:
> >
> >  - Requires iteration over the number of active irqs to find a free slot
> >    Some architectures have grown their own workarounds for this.
> >
> >  - Freeing of irq descriptors is not possible
> >
> >  - Racy between create_irq_nr and destroy_irq plugged by horrible
> >    callbacks
> >
> >  - Migration of active irq descriptors is not possible
> 
> I believe you have distorted the design when aiming for migration
> of active irq descriptors (which you have not even implemented yet).
> 
> How do you plan to remove the radix tree lookup from the irq
> handling path?

Not at all, and it's not even a requirement to remove the lookup
for implementing live migration.

> On x86 the obvious implementation is to store a pointer to the irq_desc
> in our 256 entry per cpu tables.  Please implement this and see how
> it affects the design.  The code is pretty trivial.

Thought about that already, but that's a pure optimization which does
not change anything about the underlying problem.
 
> From what I can see of your migration plan it seems incompatible with
> removing the radix tree lookup in the path to generic_handle_irq().
> 
> >  - No bulk allocation of irq ranges
> 
> Where is that a shortcoming?

In embedded, where you have modular irq expanders loaded which
prefer to have a consecutive number space.

> > Aside from that, the sparse irq design failure caused us to sprinkle
> > irq_desc references all over the place outside of kernel/irq/. That
> > makes it extremely hard to do the core changes which are necessary to
> > do further cleanups and improvements like the migration of active irq
> > descriptors. The arch code needs only to know about the irq chip and
> > the data associated with the irq. The irq descriptor itself is solely
> > a core code data structure.
> 
> If by core you mean arch code irq handling code certainly and
> msi fits that bill.

Right. The chip functions are changing from (unsigned int) to (struct
irq_data *data). And that's what my first series is providing.
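
A minimal user-space sketch of the signature change described above,
for illustration only. The struct and function names (demo_irq_data,
demo_irq_to_data, demo_mask_old, demo_mask_new) are hypothetical
stand-ins, not the kernel's definitions, and the array lookup merely
stands in for the radix tree lookup that sparse irq requires.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the per-irq data; NOT the kernel's struct. */
struct demo_irq_data {
	unsigned int irq;	/* interrupt number */
	void *chip_data;	/* chip private data (here: an int flag) */
};

#define DEMO_NR_IRQS 16
static struct demo_irq_data demo_table[DEMO_NR_IRQS];
static unsigned int demo_lookups;	/* counts number -> data lookups */

/* Stand-in for the number -> data lookup (a tree walk with sparse irq). */
static struct demo_irq_data *demo_irq_to_data(unsigned int irq)
{
	demo_lookups++;
	return irq < DEMO_NR_IRQS ? &demo_table[irq] : NULL;
}

/* Old style: the callback gets a number and has to look up its data. */
static void demo_mask_old(unsigned int irq)
{
	struct demo_irq_data *d = demo_irq_to_data(irq);

	if (d && d->chip_data)
		*(int *)d->chip_data = 1;
}

/* New style: the caller hands down the pointer it already holds. */
static void demo_mask_new(struct demo_irq_data *d)
{
	assert(d != NULL && d->chip_data != NULL);
	*(int *)d->chip_data = 1;
}
```

In this model the new-style callback never touches the lookup, and it
never needs to see anything beyond the data handed to it.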
 
> > The reason is that with the non sparse code access to the irq data was
> > just array pointer math and most code (aside of the old __do_IRQ()
> > users) used the provided accessor functions.
> >
> > With sparse it requires a radix tree lookup, which caused performance
> > problems. Instead of tackling the problem at the chip function level
> > and handing down a pointer to the associated data instead of an irq
> > number, the low level code acquired a reference to irq_desc and
> > populated that all over the place. Yeah, it's easier than doing a full
> > cleanup and a sensible migration path, but the resulting mess is just
> > disgusting.
> >
> > The previous chip functions series on which this series is based is
> > addressing this issue on the chip level side by handing down the
> > associated interrupt data instead of the interrupt number. The x86
> > cleanup is making use of it.
> 
> And always handing down the data structure so you can do the same
> thing with sparse irq enabled or not is a much needed code cleanup.

Well, that's the plan. I just don't want to do the full tree sweep
myself. I have implemented a migration path in the first series which
allows a step by step cleanup of the chip implementations.
 
> > New implementation:
> > -------------------
> >
> > I've implemented a sane allocator which fixes the above shortcomings
> > (though migration of active descriptors still needs a full tree wide
> > cleanup of the direct and mostly unlocked access to irq_desc).
> >
> > The new allocator still uses a radix_tree, but uses a bitmap for
> > keeping track of allocated irq numbers. That results in:
> 
> I don't know that I have a problem with this but I do have a problem
> with using a bitmap.  A lot of the kernel's irq usage has been distorted
> because we use a compact array that we cannot grow over time.  Using a
> bitmap here essentially removes 90% of the point of sparse irq: the
> ability to remove a hard coded NR_IRQS from the kernel.

Well, let's look at some (un)realistic numbers:

Assume 16k cores and 32 irqs / core. That's 512k interrupts and
requires a 64kB bitmap (one bit per irq number).

If we hit that limit, then we have some other more serious problems to
solve.
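
The arithmetic behind that estimate, as a small sketch. The helper name
irq_bitmap_bytes is an assumption made for illustration; it assumes one
bit per irq number and takes 16k to mean 16384.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Bytes needed for a bitmap with one bit per irq number,
 * rounded up to a whole byte. Hypothetical helper, not kernel code.
 */
static size_t irq_bitmap_bytes(size_t cores, size_t irqs_per_core)
{
	size_t nr_irqs = cores * irqs_per_core;

	return (nr_irqs + CHAR_BIT - 1) / CHAR_BIT;
}
```

With 16384 cores and 32 irqs per core this gives 524288 irq numbers and
524288 / 8 = 65536 bytes.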

And I really do not see a point to have a truly random 64bit number
space for interrupts. Especially the dynamically allocated interrupts
(MSI & co) do not care about the number space at all. They care about
getting a unique number, nothing else.

> >  - Fast lookup of a free slot
> >
> >  - The removal of disposed descriptors (destroy_irq())
> >
> >  - Prevents the create/destroy race
> >
> >  - Bulk (de)allocation of consecutive irq ranges
> >
> >  - Migration of live descriptors after further cleanups
> 
> You should be able to do all of that by walking your radix tree in the
> sparse irq case.

The bitmap makes the design way simpler and gets rid of useless tree
walks and looped lookups for bulk allocations.
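
A rough user-space sketch of why a bitmap makes bulk allocation simple:
finding a run of free irq numbers is a linear scan over bits, with no
tree walk. All names here (demo_map, demo_alloc_range, demo_free_range,
DEMO_MAX_IRQS) are hypothetical, and this is not the code from the
patch series; the kernel has its own bitmap helpers, which this sketch
deliberately does not use.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_MAX_IRQS	64
#define DEMO_WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)

/* One bit per irq number; a set bit means "allocated". */
static unsigned long demo_map[(DEMO_MAX_IRQS + DEMO_WORD_BITS - 1) / DEMO_WORD_BITS];

static bool demo_test(unsigned int n)
{
	return (demo_map[n / DEMO_WORD_BITS] >> (n % DEMO_WORD_BITS)) & 1UL;
}

static void demo_set(unsigned int n)
{
	demo_map[n / DEMO_WORD_BITS] |= 1UL << (n % DEMO_WORD_BITS);
}

static void demo_clear(unsigned int n)
{
	demo_map[n / DEMO_WORD_BITS] &= ~(1UL << (n % DEMO_WORD_BITS));
}

/*
 * Claim cnt consecutive free numbers at or after 'from'.
 * Returns the first number of the range, or -1 if none fits.
 */
static int demo_alloc_range(unsigned int from, unsigned int cnt)
{
	unsigned int start, i;

	if (cnt == 0 || cnt > DEMO_MAX_IRQS || from >= DEMO_MAX_IRQS)
		return -1;

	for (start = from; start + cnt <= DEMO_MAX_IRQS; start++) {
		for (i = 0; i < cnt; i++)
			if (demo_test(start + i))
				break;
		if (i == cnt) {
			for (i = 0; i < cnt; i++)
				demo_set(start + i);
			return (int)start;
		}
		/* Jump to the used bit; the loop increment steps past it. */
		start += i;
	}
	return -1;
}

/* Release cnt consecutive numbers starting at 'from'. */
static void demo_free_range(unsigned int from, unsigned int cnt)
{
	unsigned int i;

	assert(from <= DEMO_MAX_IRQS && cnt <= DEMO_MAX_IRQS - from);
	for (i = 0; i < cnt; i++)
		demo_clear(from + i);
}
```

Freeing is just clearing bits, so released numbers become reusable
without any further bookkeeping in this model.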
 
> > Full conversion and clean up of x86:
> > ------------------------------------
> >
> > I spent quite some time to come up with a sane and splittable concept,
> > which does not reach out into drivers/pci/[msi|ht|dmar] and whatever.
> >
> > But that's simply impossible because everything is twisted together
> > mainly by optimization hacks done over time. (i.e. handing down
> > irq_desc to low level msi functions instead of irq_desc.msi_desc would
> > have kept the mess confined to x86).
> 
> Those files provide the genirq irq chip implementation especially
> drivers/pci/msi.c.  Of course they will do what every other irq_chip
> implementation does to get access to data.  There is an unpleasant
> difference between which generic irq data field htirq.c uses and msi.c
> which may be worth cleaning up.  But otherwise I don't see any
> fundamental problems.

The fundamental problem I hit was the hack which handed down irq_desc
to avoid the lookup. If it had been msi_desc in the first place, then
I would not even need to touch the msi code to clean up x86.

> The big difference is those are the irq controllers that we have code
> for that is not necessarily architecture specific.
> 
> > So I went there and started to convert stuff piece by piece in x86 and
> > added the drivers/pci/* fixes as separate patches along the way. Not
> > nice, but it turned out to be the only way which avoided even more
> > churn.
> 
> You should be able to convert msi.c and company directly to using
> irq_data immediately following your previous patchset, shouldn't you?
> Perhaps with two flavors of helper functions during the transition
> to passing irq_data everywhere.

That's already in the first series. Otherwise it would not be possible
to convert one irq chip after the other.
 
> I don't see any code in the msi code that is arch specific or sparse irq
> specific.

I only realized the irq_desc handdown to msi late, when I gradually
converted the irq chips which are used in io_apic.c. I can push that
patch further down in the queue, but that does not make a difference.
 
Thanks,

	tglx

