From: ebiederm@xmission.com (Eric W. Biederman)
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
linux-arch@vger.kernel.org, Linus Torvalds <torvalds@osdl.org>,
Andrew Morton <akpm@linux-foundation.org>,
x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mundt <lethal@linux-sh.org>,
Russell King <linux@arm.linux.org.uk>,
David Woodhouse <dwmw2@infradead.org>,
Jesse Barnes <jbarnes@virtuousgeek.org>,
Yinghai Lu <yinghai@kernel.org>,
Grant Likely <grant.likely@secretlab.ca>
Subject: Re: [patch 00/47] Sparse irq rework
Date: Sun, 03 Oct 2010 09:41:49 -0700
Message-ID: <m1wrpzfctu.fsf@fess.ebiederm.org>
In-Reply-To: <20100930221351.682772535@linutronix.de> (Thomas Gleixner's message of "Thu, 30 Sep 2010 23:14:27 -0000")
Thomas Gleixner <tglx@linutronix.de> writes:
> The following patch series cleans up and mostly reimplements the core
> sparse irq implementation and sanitizes the most complex (ab)user:
> arch/x86

Overall this patchset looks pretty sane, but I don't see a clear picture
of what everything is going to look like when the dust settles.

> The series is based on the previous rework of irq chip functions which
> is available at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git irq/core
>
> A combined throwaway git repository with all the following patches on top of
> tip/irq/core is available at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-sparse-irq.git
>
> The overall changes are (full changelog below):
> 56 files changed, 1229 insertions(+), 1682 deletions(-)
>
> The series consists of 3 parts:
>
> - cleanup of kernel/irq code and implementation of new allocator
> - conversion of x86 to new irq_chip functions and new allocator
> - trivial cleanup of the remaining users and removal of the old stuff
>
> It's fully bisectable and survived a night of testing in my testfarm.
>
> There are two bugfix patches (1/47, 2/47), which resulted from staring
> at that maze for way too long. They are targeted for mainline urgent,
> but I left them in the queue to avoid further churn.
>
>
> Rationale:
> ----------
>
> The current sparse_irq allocator has several short comings due to
> failures in the design or the lack of it:
>
> - Requires iteration over the number of active irqs to find a free slot
> Some architectures have grown their own workarounds for this.
>
> - Freeing of irq descriptors is not possible
>
> - Racy between create_irq_nr and destroy_irq plugged by horrible
> callbacks
>
> - Migration of active irq descriptors is not possible

I believe you have distorted the design when aiming for migration
of active irq descriptors (which you have not even implemented yet).
How do you plan to remove the radix tree lookup from the irq
handling path?

On x86 the obvious implementation is to store a pointer to the irq_desc
in our 256 entry per cpu tables. Please implement this and see how
it affects the design. The code is pretty trivial.

From what I can see of your migration plan it seems incompatible with
removing the radix tree lookup in the path to generic_handle_irq().

> - No bulk allocation of irq ranges

Where is that a shortcoming?

> Aside of that the sparse irq design failure caused that we sprinkled
> irq_desc references all over the place outside of kernel/irq/. That
> makes it extremely hard to do the core changes which are necessary to
> do further cleanups and improvements like the migration of active irq
> descriptors. The arch code needs only to know about the irq chip and
> the data associated with the irq. The irq descriptor itself is solely
> a core code data structure.

If by core you also mean the arch irq handling code, then certainly,
and msi fits that bill.

> The reason is that with the non sparse code access to the irq data was
> just array pointer math and most code (aside of the old __do_IRQ()
> users) used the provided accessor functions.
>
> With sparse it requires a radix tree lookup, which caused performance
> problems. Instead of tackling the problem at the chip function level
> and handing down a pointer to the associated data instead of an irq
> number, the low level code acquired a reference to irq_desc and
> populated that all over the place. Yeah, it's easier than doing a full
> cleanup and a sensible migration path, but the resulting mess is just
> disgusting.
>
> The previous chip functions series on which this series is based is
> addressing this issue on the chip level side by handing down the
> associated interrupt data instead of the interrupt number. The x86
> cleanup is making use of it.

And always handing down the data structure so you can do the same
thing with sparse irq enabled or not is a much needed code cleanup.

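A small user-space sketch in C of what "handing down the data structure"
means for a chip callback. The types here (cb_irq_data, cb_irq_chip,
cb_line) are hypothetical stand-ins invented for this illustration, only
loosely modelled on the idea in the thread. The callback receives the
per-interrupt data directly, so it does the same thing whether descriptors
live in a flat array or in a radix tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-interrupt data handed to every chip callback. */
struct cb_irq_data {
	unsigned int irq;	/* interrupt number, informational only */
	void *chip_data;	/* chip-private state for this interrupt */
};

/* Hypothetical chip-private state: a single mask flag per line. */
struct cb_line {
	int masked;
};

/* Callbacks take the data pointer, not an irq number to look up. */
struct cb_irq_chip {
	const char *name;
	void (*irq_mask)(struct cb_irq_data *data);
	void (*irq_unmask)(struct cb_irq_data *data);
};

static void cb_mask(struct cb_irq_data *data)
{
	struct cb_line *line = data->chip_data;

	assert(line != NULL);
	line->masked = 1;
}

static void cb_unmask(struct cb_irq_data *data)
{
	struct cb_line *line = data->chip_data;

	assert(line != NULL);
	line->masked = 0;
}

static const struct cb_irq_chip cb_chip = {
	.name = "demo",
	.irq_mask = cb_mask,
	.irq_unmask = cb_unmask,
};
```

Neither callback ever asks "which descriptor belongs to irq N", so the
lookup cost is paid once by whoever already holds the data.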
> New implementation:
> -------------------
>
> I've implemented a sane allocator which fixes the above short comings
> (though migration of active descriptors still needs a full tree wide
> cleanup of the direct and mostly unlocked access to irq_desc).
>
> The new allocator still uses a radix_tree, but uses a bitmap for
> keeping track of allocated irq numbers. That results in:

I don't know that I have a problem with this but I do have a problem
with using a bitmap. A lot of the kernel's irq usage has been distorted
because we use a compact array that we cannot grow over time. Using a
bitmap here essentially removes 90% of the point of sparse irq: the
ability to remove a hard coded NR_IRQS from the kernel.

> - Fast lookup of a free slot
>
> - The removal of disposed descriptors (destroy_irq())
>
> - Prevents the create/destroy race
>
> - Bulk (de)allocation of consecutive irq ranges
>
> - Migration of live descriptors after further cleanups

You should be able to do all of that by walking your radix tree in the
sparse irq case.

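For reference, a minimal user-space sketch in C of a bitmap that tracks
allocated irq numbers and hands out consecutive ranges. It is an
illustration written for this discussion, not the allocator from patch
14/47; BM_MAX_IRQS and the bm_* helpers are assumed names. It shows both
sides of the argument: finding a free range is a simple scan, but the
bitmap has to be sized by a compile-time upper bound.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BM_MAX_IRQS 64	/* assumption: the fixed cap a bitmap forces on us */
#define BM_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define BM_WORDS ((BM_MAX_IRQS + BM_WORD_BITS - 1) / BM_WORD_BITS)

static unsigned long bm_allocated[BM_WORDS];

static int bm_test(unsigned int nr)
{
	return (int)((bm_allocated[nr / BM_WORD_BITS] >> (nr % BM_WORD_BITS)) & 1UL);
}

static void bm_set(unsigned int nr)
{
	bm_allocated[nr / BM_WORD_BITS] |= 1UL << (nr % BM_WORD_BITS);
}

static void bm_clear(unsigned int nr)
{
	bm_allocated[nr / BM_WORD_BITS] &= ~(1UL << (nr % BM_WORD_BITS));
}

/*
 * Allocate cnt consecutive irq numbers, searching upward from 'from'.
 * Returns the first number of the range, or -1 if nothing fits.
 */
static int bm_alloc_range(unsigned int from, unsigned int cnt)
{
	unsigned int start, i;

	if (cnt == 0 || cnt > BM_MAX_IRQS || from >= BM_MAX_IRQS)
		return -1;

	for (start = from; start + cnt <= BM_MAX_IRQS; start++) {
		for (i = 0; i < cnt; i++)
			if (bm_test(start + i))
				break;
		if (i == cnt) {
			for (i = 0; i < cnt; i++)
				bm_set(start + i);
			return (int)start;
		}
		start += i;	/* resume just past the busy bit */
	}
	return -1;
}

/* Release a previously allocated range so the numbers can be reused. */
static void bm_free_range(unsigned int start, unsigned int cnt)
{
	unsigned int i;

	assert(start < BM_MAX_IRQS && cnt <= BM_MAX_IRQS - start);
	for (i = 0; i < cnt; i++)
		bm_clear(start + i);
}
```

Any request that would reach past BM_MAX_IRQS fails no matter how much
memory is free, which is the hard-coded limit objected to above.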
> Full conversion and clean up of x86:
> ------------------------------------
>
> I spent quite a time to come up with a sane and splitable concept,
> which does not reach out into drivers/pci/[msi|ht|dmar] and whatever.
>
> But that's simply impossible because everything is twisted together
> mainly by optimization hacks done over time. (i.e. handing down
> irq_desc to low level msi functions instead of irq_desc.msi_desc would
> have kept the mess confined to x86).

Those files provide the genirq irq chip implementation, especially
drivers/pci/msi.c. Of course they will do what every other irq_chip
implementation does to get access to data. There is an unpleasant
difference between the generic irq data field that htirq.c uses and the
one that msi.c uses, which may be worth cleaning up. But otherwise I
don't see any fundamental problems.

The big difference is those are the irq controllers that we have code
for that is not necessarily architecture specific.

> So I went there and started to convert stuff piece by piece in x86 and
> added the drivers/pci/* fixes as separate patches along the way. Not
> nice, but it turned out to be the only way which avoided even more
> churn.

You should be able to convert msi.c and company directly to using
irq_data immediately following your previous patchset, shouldn't you?
Perhaps with two flavors of helper functions during the transition
to passing irq_data everywhere.

I don't see any code in the msi code that is arch specific or sparse irq
specific.

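One way to read "two flavors of helper functions", sketched in user-space
C. Every name here (tr_irq_data, tr_msi_desc, tr_get_irq_msi and so on) is
hypothetical and the fixed-size table is only a stand-in for whatever
lookup the core provides. Converted callers use the flavor that takes the
data pointer, and unconverted callers keep the flavor that takes an irq
number, which is a thin wrapper around the first.

```c
#include <assert.h>
#include <stddef.h>

#define TR_NR_IRQS 8	/* assumption: tiny table standing in for the lookup */

/* Hypothetical MSI descriptor and per-interrupt data. */
struct tr_msi_desc {
	unsigned int mask_bits;
};

struct tr_irq_data {
	unsigned int irq;
	struct tr_msi_desc *msi_desc;
};

static struct tr_irq_data tr_table[TR_NR_IRQS];

/* Stand-in for the core lookup from irq number to per-interrupt data. */
static struct tr_irq_data *tr_irq_get_data(unsigned int irq)
{
	return irq < TR_NR_IRQS ? &tr_table[irq] : NULL;
}

/* New flavor: the caller already holds the data, so no lookup is needed. */
static struct tr_msi_desc *tr_irq_data_get_msi(struct tr_irq_data *data)
{
	return data->msi_desc;
}

/* Old flavor: kept for unconverted callers, wraps the new one. */
static struct tr_msi_desc *tr_get_irq_msi(unsigned int irq)
{
	struct tr_irq_data *data = tr_irq_get_data(irq);

	return data ? tr_irq_data_get_msi(data) : NULL;
}

/* Setup helper used by the sketch to attach a descriptor to an irq. */
static void tr_set_irq_msi(unsigned int irq, struct tr_msi_desc *desc)
{
	assert(irq < TR_NR_IRQS);
	tr_table[irq].irq = irq;
	tr_table[irq].msi_desc = desc;
}
```

Once every caller passes the data pointer, the number-based wrapper can be
deleted without touching the converted code.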
> Further work:
> -------------
>
> - Cleanup the irq_desc references all over the tree, which should become
> easier after the remaining __do_IRQ() users are gone.
>
> - Implement migration of active irq descriptors
>
> - Implement node bound late allocation of low level irq vectors which
> solves an existing (SGI) problem on large machines.
>
>
> How to merge:
> -------------
>
> It needs:
>
> - ack to the new allocator design
>
> - ack to merge the whole arch/!x86 and driver related cleanups
> along with the core changes and the x86 cleanup
Eric