* Re: + msi-refactor-and-move-the-msi-irq_chip-into-the-arch-code.patch added to -mm tree [not found] <200609272215.k8RMFbuH018967@shell0.pdx.osdl.net> @ 2006-09-27 22:50 ` David Miller 2006-09-27 23:09 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 4+ messages in thread From: David Miller @ 2006-09-27 22:50 UTC (permalink / raw) To: linux-kernel, akpm; +Cc: ebiederm, ak, benh, greg, mingo, tglx, tony.luck From: akpm@osdl.org Date: Wed, 27 Sep 2006 15:15:37 -0700 > Subject: msi: refactor and move the msi irq_chip into the arch code > From: "Eric W. Biederman" <ebiederm@xmission.com> > > It turns out msi_ops was simply not enough to abstract the architecture > specific details of msi. So I have moved the resposibility of constructing > the struct irq_chip to the architectures, and have two architecture specific > functions arch_setup_msi_irq, and arch_teardown_msi_irq. > > For simple architectures those functions can do all of the work. For > architectures with platform dependencies they can call into the appropriate > platform code. > > With this msi.c is finally free of assuming you have an apic, and this > actually takes less code. > > The helpers for the architecture specific code are declared in the linux/msi.h > to keep them separate from the msi functions used by drivers in linux/pci.h > > Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> > Cc: Ingo Molnar <mingo@elte.hu> > Cc: Tony Luck <tony.luck@intel.com> > Cc: Andi Kleen <ak@suse.de> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Greg KH <greg@kroah.com> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> > Signed-off-by: Andrew Morton <akpm@osdl.org> Eric, thanks so much for doing this work. Once this goes in I'll try to add support for MSI on sparc64 Niagara boxes. I suppose the PowerPC folks can make use of this as well. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: + msi-refactor-and-move-the-msi-irq_chip-into-the-arch-code.patch added to -mm tree 2006-09-27 22:50 ` + msi-refactor-and-move-the-msi-irq_chip-into-the-arch-code.patch added to -mm tree David Miller @ 2006-09-27 23:09 ` Benjamin Herrenschmidt 2006-09-28 5:50 ` Eric W. Biederman 0 siblings, 1 reply; 4+ messages in thread From: Benjamin Herrenschmidt @ 2006-09-27 23:09 UTC (permalink / raw) To: David Miller Cc: linux-kernel, akpm, ebiederm, ak, greg, mingo, tglx, tony.luck, Michael Ellerman > Eric, thanks so much for doing this work. > > Once this goes in I'll try to add support for MSI on sparc64 > Niagara boxes. I suppose the PowerPC folks can make use of > this as well. Well, it's definitely a great step in the right direction. But I have some issues with one basic element of the design which is the allocation of linux irqs being made generic which I can't see how it can apply to us in any shape or form unfortunately without yet another significant rework of the ppc interrupt management code (ugh). We have our allocation mecanism for interrupts, which involves binding them to a controller among other things. Unfortunately, we actually wrote something for MSIs, but Michael didn't finish yet and didn't post it to lkml yet. We kept the idea of msi_ops though that isn't necessary (we could hide them in the arch if necessary) because we need different mecanisms for different busses in the same machine. We had a pci_get_msi_ops(pdev) provided by the arch returning the ops for a given device, and the ops callbacks we had were along the lines of: - alloc: allocates linux irqs for MSIs, (on powerpc, that includes binds them to a controller) and setup the irq_desc / irq_chip for them. This got passed the array of MSI(X) from the pci_enable_msi/x call (we use a one entry array for MSI-nonX iirc). - setup: returns for one given MSI the address/data to write to the device. Could be better named I agree as it doesn't actually setup anything, maybe get_data ? - free. The important thing here is that alloc is arch specific. We don't absolutely need the msi_ops mecanism exposed generically, it could be instead something like pcibios_alloc_msis() and pcibios_setup_msi() etc... and internally, powerpc implementation could use per-bus function pointers, but the base idea is that allocation is something that is left to the platform. Also, another important thing we have to do to be compatible with our hypervisor based platforms is to have the option of setup being NULL in the ops (which could also be done differently using a special result code from alloc). In this case, alloc does everything and is essentially a re-implementation at the top level of pci_enable_msi(x). We need that because on IBM PAPR at least, the hypervisor does everything: allocating hardware vectors, configuring the device, etc... so all the platform code does is to allocate linux irq numbers and bind them to the vectors returned by HV. There are additional "nits" with thus patch as I see it. He puts the msi_desc in the irq_desc->chip_data. That cannot work for us. MSIs can be routed in hardware to various chips like the MPIC, the XICS, etc... and thus their irq_chip (as setup by our alloc callback) will be our normal MPIC or XICS irq_chips... those controller implementations already use irq_desc->chip_data for their own use and having the MSI layer hijack it will just blow things up. (Typically we can have multiple PIC instances and we put the instance pointer in there). I think the msi_desc (or whatever data structure that holds the state of the MSI(X), backup LSI, etc... for one device) shall be hanging off struct pci_dev instead. Michael, can you post your current patch somewhere where it can be seen as an example of our approach ? Even if it is not complete yet in the sense that we haven't tested it with a backend yet, we were in the process of doing so... It should certainly be possible to modify Eric approach to fit our needs, but that probably involves not having the irq allocation mecanism made generic. Cheers, Ben. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: + msi-refactor-and-move-the-msi-irq_chip-into-the-arch-code.patch added to -mm tree 2006-09-27 23:09 ` Benjamin Herrenschmidt @ 2006-09-28 5:50 ` Eric W. Biederman 2006-09-28 6:16 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 4+ messages in thread From: Eric W. Biederman @ 2006-09-28 5:50 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: David Miller, linux-kernel, akpm, ebiederm, ak, greg, mingo, tglx, tony.luck, Michael Ellerman Benjamin Herrenschmidt <benh@kernel.crashing.org> writes: >> Eric, thanks so much for doing this work. >> >> Once this goes in I'll try to add support for MSI on sparc64 >> Niagara boxes. I suppose the PowerPC folks can make use of >> this as well. > > Well, it's definitely a great step in the right direction. But I have > some issues with one basic element of the design which is the allocation > of linux irqs being made generic which I can't see how it can apply to > us in any shape or form unfortunately without yet another significant > rework of the ppc interrupt management code (ugh). So for those architectures where there are not weird ties I think it makes sense. One function to allocate a linux irq number and another function to use it for something. I just looked and it looks relatively easy to refactor that part of the code so we do the work with one architecture call and not too. Both calls always happen within close proximity to each other. So that fact I use two functions appears to be incidental instead of fundamental. > We have our allocation mecanism for interrupts, which involves binding > them to a controller among other things. > > Unfortunately, we actually wrote something for MSIs, but Michael didn't > finish yet and didn't post it to lkml yet. We kept the idea of msi_ops > though that isn't necessary (we could hide them in the arch if > necessary) because we need different mecanisms for different busses in > the same machine. We had a pci_get_msi_ops(pdev) provided by the arch > returning the ops for a given device, and the ops callbacks we had were > along the lines of: My apologies for the conflict. I had hoped to catch you at OLS so I could get a better understanding of how things look in ppc land but we never ran into each other. I also posted a precursor to this design with the hypertransport irqs and asked for comments. > - alloc: allocates linux irqs for MSIs, (on powerpc, that includes > binds them to a controller) and setup the irq_desc / irq_chip for them. > This got passed the array of MSI(X) from the pci_enable_msi/x call (we > use a one entry array for MSI-nonX iirc). > > - setup: returns for one given MSI the address/data to write to the > device. Could be better named I agree as it doesn't actually setup > anything, maybe get_data ? > > - free. > > The important thing here is that alloc is arch specific. We don't > absolutely need the msi_ops mecanism exposed generically, it could be > instead something like pcibios_alloc_msis() and pcibios_setup_msi() > etc... and internally, powerpc implementation could use per-bus function > pointers, but the base idea is that allocation is something that is left > to the platform. Sure, and I left it to the architecture, so there should be no problem there. Mostly it looks like to meet your needs more of msi.c needs to be turned into library functions that most architectures can use but your weird cases on ppc can skip. > Also, another important thing we have to do to be compatible with our > hypervisor based platforms is to have the option of setup being NULL in > the ops (which could also be done differently using a special result > code from alloc). In this case, alloc does everything and is essentially > a re-implementation at the top level of pci_enable_msi(x). We need that > because on IBM PAPR at least, the hypervisor does everything: allocating > hardware vectors, configuring the device, etc... so all the platform > code does is to allocate linux irq numbers and bind them to the vectors > returned by HV. I knew there was a hypervisor issue but I don't have enough visibility there so I didn't even try. I have a weird concern coming from working with the infinipath driver, and that is what happens if someone almost implements msi and puts the registers in the wrong location. > There are additional "nits" with thus patch as I see it. He puts the > msi_desc in the irq_desc->chip_data. That cannot work for us. MSIs can > be routed in hardware to various chips like the MPIC, the XICS, etc... > and thus their irq_chip (as setup by our alloc callback) will be our > normal MPIC or XICS irq_chips... those controller implementations > already use irq_desc->chip_data for their own use and having the MSI > layer hijack it will just blow things up. (Typically we can have > multiple PIC instances and we put the instance pointer in there). The problem is simply we have 2 irq controllers in play. The one on the chip and the one your the msi is targeted at. It might be easiest to add an extra pointer in struct irq to handle the case of irq controllers on plug in devices. > I think the msi_desc (or whatever data structure that holds the state of > the MSI(X), backup LSI, etc... for one device) shall be hanging off > struct pci_dev instead. That doesn't feel right when you can have up to 4K irqs per device. Using using the msi_desc array is a reasonable short term solution but I am hoping at some point to kill all of the NR_IRQ arrays so we can more efficiently have a sparsely populated irq space. But I would really like something like a slot I can use in struct irq_desc. > Michael, can you post your current patch somewhere where it can be seen > as an example of our approach ? Even if it is not complete yet in the > sense that we haven't tested it with a backend yet, we were in the > process of doing so... > > It should certainly be possible to modify Eric approach to fit our > needs, but that probably involves not having the irq allocation mecanism > made generic. Sounds good to me. One small step at a time. As long as we keep things in terms of linux irq numbers in the generic code. I really don't much care. Let's just take this one small step at a time and fix the issues that we see and understand well. All I know is that the original code was so horrible that when I replicated the code across 3 different architectures the total number of lines went down. There are things in that code like msi_lock that I think are totally unnecessary but haven't been annoying enough for me to kill. I'm just glad things are now close enough people can imagine how to go from where we are to where the code needs to be. Eric ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: + msi-refactor-and-move-the-msi-irq_chip-into-the-arch-code.patch added to -mm tree 2006-09-28 5:50 ` Eric W. Biederman @ 2006-09-28 6:16 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 4+ messages in thread From: Benjamin Herrenschmidt @ 2006-09-28 6:16 UTC (permalink / raw) To: Eric W. Biederman Cc: David Miller, linux-kernel, akpm, ak, greg, mingo, tglx, tony.luck, Michael Ellerman > My apologies for the conflict. I had hoped to catch you at OLS so > I could get a better understanding of how things look in ppc land but > we never ran into each other. I also posted a precursor to this design > with the hypertransport irqs and asked for comments. No worries. We just both got sidetracked and Michael forgot to send you his preliminary code... anyway, it's not that fundamentally alien to our needs. I'm tempted to let Michael finish and push what he has so we have a solution for 2.6.19 (we have some emergency there) and in a second step, look into reconciling the 2 approaches. > > The important thing here is that alloc is arch specific. We don't > > absolutely need the msi_ops mecanism exposed generically, it could be > > instead something like pcibios_alloc_msis() and pcibios_setup_msi() > > etc... and internally, powerpc implementation could use per-bus function > > pointers, but the base idea is that allocation is something that is left > > to the platform. > > Sure, and I left it to the architecture, so there should be no problem > there. Ok. Good. > Mostly it looks like to meet your needs more of msi.c needs to be > turned into library functions that most architectures can use but your > weird cases on ppc can skip. Sort-of... We didn't that much use the library model though with our implementation as I didn't expect alloc to be generic on some archs, but you are right, it could be made that way. (I sort of assume that all archs have different means of allocating HW vectors and matching them to linux IRQs, or at least discovering which linux IRQs are not in potential use by hardware...). > I knew there was a hypervisor issue but I don't have enough visibility > there so I didn't even try. > > I have a weird concern coming from working with the infinipath driver, > and that is what happens if someone almost implements msi and puts the > registers in the wrong location. Heh... well, our hypervisor is nasty in the sense that it will do everything, and doesn't give us the choice, on MSI-X, of what individual MSI-X to enable. Only how many (starting with the first one). That doesn't quite match the linux API, though looking at how that API is used, I haven't found somebody requesting a discontiguous range of MSIs to be enabled ... > The problem is simply we have 2 irq controllers in play. The > one on the chip and the one your the msi is targeted at. It might > be easiest to add an extra pointer in struct irq to handle the case > of irq controllers on plug in devices. Well, our current approach on PPC hides the "MSI controller" part in the code for the PIC that receives the MSI. In practice, that works fairly well because we almost never have to handle different combinations: the Apple U4's PCIe is always driven by MPIC, the HT controllers (HT2000 typically is what we use) too, thus we just added code to the MPIC driver to also handle MSI for both types of controllers. The Cell stuff is separate and just a self-contained cascaded controller itself (the MSI part), etc... In some cases (like the HT and Apple PCIe), the MSI logic is comletely transparent, it's just a small decoder that turns MSI writes to a magic address or HT interrupts into inputs on the MPIC. Thus the irq_chip is really only the "standard" MPIC irq_chip... Same with hypervisor, where they just appear as just some more irqs to the existing virtual IRQ controller, and thus have the exact same irq_chip... those (at least MPIC does) need to use the chip_data already for other means. Thus it's the alloc function which is routed to the appropriate controller code for a given device, which allocates the low level hardware vectors, binds them to linux irqs, and sets up an appropriate irq_desc/irq_chip for it. That's all local to the alloc function, and those alloc functions are currently provided by PIC code. > > I think the msi_desc (or whatever data structure that holds the state of > > the MSI(X), backup LSI, etc... for one device) shall be hanging off > > struct pci_dev instead. > > That doesn't feel right when you can have up to 4K irqs per device. Why ? I didn't say it shall be embedded in the pci_dev, but hanging off it (a pointer). > Using using the msi_desc array is a reasonable short term solution > but I am hoping at some point to kill all of the NR_IRQ arrays > so we can more efficiently have a sparsely populated irq space. > But I would really like something like a slot I can use in struct > irq_desc. Might be useful too yes... I think our current implementation carries a mapping table of IRQs to pci_dev to get to the pointer in pci_dev iirc (at least that's how we discussed it a while ago with Michael, I don't know if he actually followed that). It might be better indeed to just have a pointer off irq_desc for exclusive use by MSIs ... > Sounds good to me. One small step at a time. As long as we keep > things in terms of linux irq numbers in the generic code. I really > don't much care. Yes, we all agree there. Hardware number are not visible to non-arch code. > Let's just take this one small step at a time and fix the issues that > we see and understand well. All I know is that the original code was > so horrible that when I replicated the code across 3 different > architectures the total number of lines went down. Yes, you work as I said is definitely a good step in the right direction. I don't think the divergences are that fundamental. > There are things in that code like msi_lock that I think are totally > unnecessary but haven't been annoying enough for me to kill. > > I'm just glad things are now close enough people can imagine > how to go from where we are to where the code needs to be. Yeah :) Cheers, Ben. ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2006-09-28 6:17 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <200609272215.k8RMFbuH018967@shell0.pdx.osdl.net>
2006-09-27 22:50 ` + msi-refactor-and-move-the-msi-irq_chip-into-the-arch-code.patch added to -mm tree David Miller
2006-09-27 23:09 ` Benjamin Herrenschmidt
2006-09-28 5:50 ` Eric W. Biederman
2006-09-28 6:16 ` Benjamin Herrenschmidt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox