From: Thomas Gleixner <tglx@linutronix.de>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
linux-arch@vger.kernel.org, Linus Torvalds <torvalds@osdl.org>,
Andrew Morton <akpm@linux-foundation.org>,
x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mundt <lethal@linux-sh.org>,
Russell King <linux@arm.linux.org.uk>,
David Woodhouse <dwmw2@infradead.org>,
Jesse Barnes <jbarnes@virtuousgeek.org>,
Yinghai Lu <yinghai@kernel.org>,
Grant Likely <grant.likely@secretlab.ca>
Subject: Re: [patch 00/47] Sparse irq rework
Date: Sun, 3 Oct 2010 21:16:47 +0200 (CEST)
Message-ID: <alpine.LFD.2.00.1010032034420.14550@localhost6.localdomain6>
In-Reply-To: <m1wrpzfctu.fsf@fess.ebiederm.org>
On Sun, 3 Oct 2010, Eric W. Biederman wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> > Rationale:
> > ----------
> >
> > The current sparse_irq allocator has several shortcomings due to
> > failures in the design or the lack of it:
> >
> > - Requires iteration over the number of active irqs to find a free slot
> > Some architectures have grown their own workarounds for this.
> >
> > - Freeing of irq descriptors is not possible
> >
> > - Racy between create_irq_nr and destroy_irq plugged by horrible
> > callbacks
> >
> > - Migration of active irq descriptors is not possible
>
> I believe you have distorted the design when aiming for migration
> of active irq descriptors (which you have not even implemented yet).
>
> How do you plan to remove the radix tree lookup from the irq
> handling path?
Not at all, and it's not even a requirement to remove the lookup
for implementing live migration.
> On x86 the obvious implementation is to store a pointer to the irq_desc
> in our 256 entry per cpu tables. Please implement this and see how
> it affects the design. The code is pretty trivial.
Thought about that already, but that's a pure optimization which does
not change anything about the underlying problem.
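
For readers who want to picture the optimization being discussed, here
is a minimal userspace sketch of a per-cpu vector table that stores
descriptor pointers, so the interrupt entry path does no tree lookup.
Everything in it (struct demo_irq_desc, demo_vector_desc,
demo_bind_vector(), demo_handle_vector(), the CPU count) is a
hypothetical stand-in, not the x86 code:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS    2     /* hypothetical, kept tiny for the demo */
#define DEMO_NR_VECTORS 256   /* matches the "256 entry" tables mentioned above */

/* Hypothetical stand-in for an irq descriptor. */
struct demo_irq_desc {
	unsigned int irq;
	unsigned long count;  /* how often the handler ran */
};

/* One table per CPU: vector number -> descriptor pointer (NULL if unused). */
static struct demo_irq_desc *demo_vector_desc[DEMO_NR_CPUS][DEMO_NR_VECTORS];

/* Install a descriptor pointer for (cpu, vector). */
void demo_bind_vector(unsigned int cpu, unsigned int vector,
		      struct demo_irq_desc *desc)
{
	assert(cpu < DEMO_NR_CPUS && vector < DEMO_NR_VECTORS);
	demo_vector_desc[cpu][vector] = desc;
}

/* Entry path: plain array indexing, no lookup by irq number.
 * Returns 0 if a descriptor was found, -1 for an unbound vector. */
int demo_handle_vector(unsigned int cpu, unsigned int vector)
{
	struct demo_irq_desc *desc;

	if (cpu >= DEMO_NR_CPUS || vector >= DEMO_NR_VECTORS)
		return -1;

	desc = demo_vector_desc[cpu][vector];
	if (!desc)
		return -1;

	desc->count++;
	return 0;
}
```

The sketch only shows where the pointer would live. It says nothing
about how such pointers would be kept valid if a descriptor were moved,
which is the underlying problem referred to above.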
> From what I can see of your migration plan it seems incompatible with
> removing the radix tree look up in the path to generic_handle_irq().
>
> > - No bulk allocation of irq ranges
>
> Where is that a shortcoming?
In embedded, where you have modular irq expanders loaded which
prefer to have a consecutive number space.
> > Aside of that the sparse irq design failure caused that we sprinkled
> > irq_desc references all over the place outside of kernel/irq/. That
> > makes it extremely hard to do the core changes which are necessary to
> > do further cleanups and improvements like the migration of active irq
> > descriptors. The arch code needs only to know about the irq chip and
> > the data associated with the irq. The irq descriptor itself is solely
> > a core code data structure.
>
> If by core you mean arch code irq handling code certainly and
> msi fits that bill.
Right. The chip functions are changing from (unsigned int) to (struct
irq_data *data). And that's what my first series is providing.
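
As an illustration of that signature change, here is a small userspace
sketch. It is not the kernel's struct irq_chip or struct irq_data; the
demo_* types, the lookup counter and the mask register are assumptions
made up to show why handing down the data pointer saves the callee a
lookup by irq number:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR 8  /* hypothetical number of irqs in the demo */

/* Hypothetical per-chip hardware state. */
struct demo_chip_hw {
	unsigned int mask_reg;
};

/* Hypothetical stand-in for the per-irq data handed to chip functions. */
struct demo_irq_data {
	unsigned int irq;
	void *chip_data;
};

static struct demo_irq_data demo_table[DEMO_NR];
unsigned int demo_lookups;  /* counts lookups by irq number */

/* Stand-in for the (possibly expensive) number -> data lookup. */
struct demo_irq_data *demo_lookup(unsigned int irq)
{
	demo_lookups++;
	return irq < DEMO_NR ? &demo_table[irq] : NULL;
}

/* Old style: the chip function gets a number and must look the data up. */
void demo_mask_old(unsigned int irq)
{
	struct demo_irq_data *d = demo_lookup(irq);
	struct demo_chip_hw *hw;

	assert(d && d->chip_data);
	hw = d->chip_data;
	hw->mask_reg |= 1u << d->irq;
}

/* New style: the caller already holds the data and hands it down. */
void demo_mask_new(struct demo_irq_data *d)
{
	struct demo_chip_hw *hw = d->chip_data;

	assert(hw && d->irq < DEMO_NR);
	hw->mask_reg |= 1u << d->irq;
}
```

In the new style the chip code never needs to see the descriptor
itself, only the data that belongs to the chip.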
> > The reason is that with the non sparse code access to the irq data was
> > just array pointer math and most code (aside of the old __do_IRQ()
> > users) used the provided accessor functions.
> >
> > With sparse it requires a radix tree lookup, which caused performance
> > problems. Instead of tackling the problem at the chip function level
> > and handing down a pointer to the associated data instead of an irq
> > number, the low level code acquired a reference to irq_desc and
> > populated that all over the place. Yeah, it's easier than doing a full
> > cleanup and a sensible migration path, but the resulting mess is just
> > disgusting.
> >
> > The previous chip functions series on which this series is based is
> > addressing this issue on the chip level side by handing down the
> > associated interrupt data instead of the interrupt number. The x86
> > cleanup is making use of it.
>
> And always handing down the data structure so you can do the same
> thing with sparse irq enabled or not is a much needed code cleanup.
Well, that's the plan. I just don't want to do the full tree sweep
myself. I have implemented a migration path in the first series which
allows a step by step cleanup of the chip implementations.
> > New implementation:
> > -------------------
> >
> > I've implemented a sane allocator which fixes the above shortcomings
> > (though migration of active descriptors still needs a full tree wide
> > cleanup of the direct and mostly unlocked access to irq_desc).
> >
> > The new allocator still uses a radix_tree, but uses a bitmap for
> > keeping track of allocated irq numbers. That results in:
>
> I don't know that I have a problem with this but I do have a problem
> with using a bitmap. A lot of the kernel's irq usage has been distorted
> because we use a compact array, that we cannot grow over time. Using a
> bitmap here essentially removes 90% of the point of sparse irq. The
> ability to remove a hard coded NR_IRQS from the kernel.
Well, let's look at some (un)realistic numbers:
Assume 16k cores and 32 irqs / core. That's 512k interrupts and
requires a 64 KiB bitmap (one bit per interrupt).
If we hit that limit, then we have some other more serious problems to
solve.
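
The arithmetic can be checked with a few lines of C. This is only a
sketch: total_irqs() and irq_bitmap_bytes() are names invented for
illustration, not kernel functions, and one bit per irq number is
assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Total irq numbers for `cores` cores with `irqs_per_core` irqs each. */
size_t total_irqs(size_t cores, size_t irqs_per_core)
{
	/* Guard against overflow in the multiplication. */
	assert(irqs_per_core == 0 || cores <= (size_t)-1 / irqs_per_core);
	return cores * irqs_per_core;
}

/* Bytes needed for a bitmap with one bit per irq number, rounded up. */
size_t irq_bitmap_bytes(size_t nr_irqs)
{
	return nr_irqs / CHAR_BIT + (nr_irqs % CHAR_BIT != 0);
}

/*
 * 16384 cores * 32 irqs = 524288 irq numbers (512k)
 * 524288 bits / 8       = 65536 bytes        (64 KiB)
 */
```

Calling irq_bitmap_bytes(total_irqs(16384, 32)) gives 65536, which is
the 64 KiB figure.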
And I really do not see a point to have a truly random 64bit number
space for interrupts. Especially the dynamically allocated interrupts
(MSI & co) do not care about the number space at all. They care about
getting a unique number, nothing else.
> > - Fast lookup of a free slot
> >
> > - The removal of disposed descriptors (destroy_irq())
> >
> > - Prevents the create/destroy race
> >
> > - Bulk (de)allocation of consecutive irq ranges
> >
> > - Migration of live descriptors after further cleanups
>
> You should be able to do all of that by walking your radix tree in the
> sparse irq case.
The bitmap makes the design way simpler and gets rid of useless tree
walks and looped lookups for bulk allocations.
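
To make the bulk allocation point concrete, here is a minimal userspace
sketch of a bitmap that tracks allocated irq numbers and hands out
consecutive ranges. It is not the code from patch 14/47; the demo_*
names and the 64 entry number space are assumptions for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_IRQS  64  /* hypothetical, tiny number space for the demo */
#define DEMO_WORDBITS (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NWORDS   ((DEMO_NR_IRQS + DEMO_WORDBITS - 1) / DEMO_WORDBITS)

/* One bit per irq number: set means allocated. */
static unsigned long demo_allocated[DEMO_NWORDS];

static bool demo_test_bit(unsigned int nr)
{
	return (demo_allocated[nr / DEMO_WORDBITS] >> (nr % DEMO_WORDBITS)) & 1UL;
}

static void demo_set_bit(unsigned int nr)
{
	demo_allocated[nr / DEMO_WORDBITS] |= 1UL << (nr % DEMO_WORDBITS);
}

static void demo_clear_bit(unsigned int nr)
{
	demo_allocated[nr / DEMO_WORDBITS] &= ~(1UL << (nr % DEMO_WORDBITS));
}

/*
 * Claim `cnt` consecutive free numbers at or above `from`.
 * Returns the first number of the range, or -1 if none fits.
 */
int demo_alloc_irqs(unsigned int from, unsigned int cnt)
{
	unsigned int start, i;

	if (cnt == 0 || cnt > DEMO_NR_IRQS || from >= DEMO_NR_IRQS)
		return -1;

	for (start = from; start + cnt <= DEMO_NR_IRQS; start++) {
		for (i = 0; i < cnt; i++) {
			if (demo_test_bit(start + i))
				break;
		}
		if (i == cnt) {
			for (i = 0; i < cnt; i++)
				demo_set_bit(start + i);
			return (int)start;
		}
		/* Skip past the used bit; the loop increment does the +1. */
		start += i;
	}
	return -1;
}

/* Release `cnt` consecutive numbers starting at `from`. */
void demo_free_irqs(unsigned int from, unsigned int cnt)
{
	unsigned int i;

	assert(from < DEMO_NR_IRQS && cnt <= DEMO_NR_IRQS - from);
	for (i = 0; i < cnt; i++)
		demo_clear_bit(from + i);
}
```

Finding a free slot, allocating a consecutive range and freeing it
again are all plain bit operations here. A real implementation would
also need locking and would pair each set bit with a descriptor in the
tree, both of which the sketch leaves out.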
> > Full conversion and clean up of x86:
> > ------------------------------------
> >
> > I spent quite some time to come up with a sane and splittable concept,
> > which does not reach out into drivers/pci/[msi|ht|dmar] and whatever.
> >
> > But that's simply impossible because everything is twisted together
> > mainly by optimization hacks done over time. (i.e. handing down
> > irq_desc to low level msi functions instead of irq_desc.msi_desc would
> > have kept the mess confined to x86).
>
> Those files provide the genirq irq chip implementation especially
> drivers/pci/msi.c. Of course they will do what every other irq_chip
> implementation does to get access to data. There is an unpleasant
> difference between which generic irq data field htirq.c uses and msi.c
> which may be worth cleaning up. But otherwise I don't see any
> fundamental problems.
The fundamental problem I hit was the hack which handed down irq_desc
to avoid the lookup. If it had been msi_desc in the first place, then
I would not even need to touch the msi code to clean up x86.
> The big difference is those are the irq controllers that we have code
> for that is not necessarily architecture specific.
>
> > So I went there and started to convert stuff piece by piece in x86 and
> > added the drivers/pci/* fixes as separate patches along the way. Not
> > nice, but it turned out to be the only way which avoided even more
> > churn.
>
> You should be able to convert msi.c and company directly to using
> irq_data immediately following your previous patchset, shouldn't you?
> Perhaps with two flavors of helper functions during the transition
> to passing irq_data everywhere.
That's already in the first series. Otherwise it would not be possible
to convert one irq chip after the other.
> I don't see any code in the msi code that is arch specific or sparse irq
> specific.
I only realized the irq_desc handdown to msi late, when I gradually
converted the irq chips which are used in io_apic.c. I can push that
patch further down in the queue, but that does not make a difference.
Thanks,
tglx