* No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: Stefan Richter @ 2007-12-17 23:53 UTC
To: sparclinux
Cc: linux-kernel, David S. Miller, Greg Kroah-Hartman, Kristian Høgsberg, linux1394-devel

Stefan Richter wrote on 2007-10-13:
> bugme-daemon@bugzilla.kernel.org wrote:
>> http://bugzilla.kernel.org/show_bug.cgi?id=9160

It's a 100% reproducible oops on Sparc (with FireWire controller) for
2.6.23 and 2.6.24 kernels, but not 2.6.22.  The reporter confirmed that
the bug also happens
  - with plain 2.6.24-rc5,
  - with 2.6.23.y and the firewire subsystem fully reverted to that of
    2.6.22.

This has also been reported independently once before against
2.6.23-rc3, http://marc.info/?l=linux-sparc&m=118751438108687 in August.

>> Oct 13 20:26:04 succubus OOPS: Bogus kernel PC [0000000000000000] in fault handler
>> Oct 13 20:26:04 succubus OOPS: RPC [0000000010068cd0]
>> Oct 13 20:26:04 succubus RPC: <ar_context_add_page+0xd8/0x160 [firewire_ohci]>
>> Oct 13 20:26:04 succubus OOPS: Fault was to vaddr[1004e000]
>> Oct 13 20:26:04 succubus Call Trace:
>> Oct 13 20:26:04 succubus [00000000004076f4] sparc64_realfault_common+0x18/0x20
>> Oct 13 20:26:04 succubus [0000000010068cd0] ar_context_add_page+0xd8/0x160 [firewire_ohci]
>> Oct 13 20:26:04 succubus [0000000010068d90] ar_context_init+0x38/0x60 [firewire_ohci]
>> Oct 13 20:26:04 succubus [000000001006ac50] pci_probe+0xf8/0x340 [firewire_ohci]
>> Oct 13 20:26:04 succubus [00000000005299bc] pci_device_probe+0x64/0xa0
>> Oct 13 20:26:04 succubus [0000000000550a28] driver_probe_device+0x90/0x1c0
>> Oct 13 20:26:04 succubus [0000000000550bc0] __driver_attach+0x68/0x80
>> Oct 13 20:26:04 succubus [000000000054fe5c] bus_for_each_dev+0x44/0x80
>> Oct 13 20:26:04 succubus [0000000000550218] bus_add_driver+0x80/0x1c0
>> Oct 13 20:26:04 succubus [0000000000529b7c] __pci_register_driver+0x44/0xa0
>> Oct 13 20:26:04 succubus [000000000047d5cc] sys_init_module+0x134/0x1400
>> Oct 13 20:26:04 succubus [0000000000406094] linux_sparc_syscall32+0x3c/0x40
>> Oct 13 20:26:04 succubus [00000000000133d8] 0x133e0
>> Oct 13 20:26:04 succubus Unable to handle kernel NULL pointer dereference
>> Oct 13 20:26:04 succubus tsk->{mm,active_mm}->context = 00000000000001f9
>> Oct 13 20:26:04 succubus tsk->{mm,active_mm}->pgd = fffff8004f9f8000
[...]

The fault happens due to dma_sync_single_for_device() which
drivers/firewire/fw-ohci.c calls in ar_context_add_page() when still
being in its pci_probe method.  I suspect that --- at least on Sparc and
after 2.6.22 --- it is not possible anymore to use dma_sync_* before the
pci_device's or device's probe was finished.

Would that be a bug in the Sparc platform code?  Or a bug in driver core
code or in PCI code?  Or am I expected to refrain from dma_sync_* calls
until after the probe returned?

(Doing the latter might be tricky, but I suspect that the AR buffers in
fw-ohci would generally be better off using coherent allocations.  The
DMA mapping and syncing in this part of fw-ohci is currently slightly
messy.)

Thanks for any comments,
-- 
Stefan Richter
-=====-=-=== ==-- =--=-
http://arcgraph.de/sr/

* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: David Miller @ 2007-12-18 0:50 UTC
To: stefanr
Cc: sparclinux, linux-kernel, gregkh, krh, linux1394-devel

From: Stefan Richter <stefanr@s5r6.in-berlin.de>
Date: Tue, 18 Dec 2007 00:53:03 +0100

> The fault happens due to dma_sync_single_for_device() which
> drivers/firewire/fw-ohci.c calls in ar_context_add_page() when still
> being in its pci_probe method.  I suspect that --- at least on Sparc and
> after 2.6.22 --- it is not possible anymore to use dma_sync_* before the
> pci_device's or device's probe was finished.
>
> Would that be a bug in the Sparc platform code?  Or a bug in driver core
> code or in PCI code?  Or am I expected to refrain from dma_sync_* calls
> until after the probe returned?

The problem is likely what device struct you are passing to
dma_sync_single_for_device(), it has to be a real pci_dev or similar
that has its dev_archdata properly initialized.

I bet dev_archdata in whatever "struct device" is being passed in has
a NULL iommu pointer or something like that.

Oh yeah, I see what you're doing, that won't work, please pass in
the correct device struct pointer.  Please pass in the &pci_dev->dev
not this ohci->card.device thing.

* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: Stefan Richter @ 2007-12-18 10:38 UTC
To: David Miller
Cc: sparclinux, linux-kernel, gregkh, krh, linux1394-devel

[As pointed out elsewhere in the thread, this is indeed about sparc64,
not sparc per se.]

David Miller wrote:
> From: Stefan Richter <stefanr@s5r6.in-berlin.de>
> Date: Tue, 18 Dec 2007 00:53:03 +0100
>
>> The fault happens due to dma_sync_single_for_device() which
>> drivers/firewire/fw-ohci.c calls in ar_context_add_page() when still
>> being in its pci_probe method.  I suspect that --- at least on Sparc and
>> after 2.6.22 --- it is not possible anymore to use dma_sync_* before the
>> pci_device's or device's probe was finished.
>>
>> Would that be a bug in the Sparc platform code?  Or a bug in driver core
>> code or in PCI code?  Or am I expected to refrain from dma_sync_* calls
>> until after the probe returned?
>
> The problem is likely what device struct you are passing to
> dma_sync_single_for_device(), it has to be a real pci_dev or similar
> that has its dev_archdata properly initialized.
>
> I bet dev_archdata in whatever "struct device" is being passed in has
> a NULL iommu pointer or something like that.
>
> Oh yeah, I see what you're doing, that won't work, please pass in
> the correct device struct pointer.  Please pass in the &pci_dev->dev
> not this ohci->card.device thing.

No, the dev argument is alright.  We use it a few lines above in the
same function in a call to dma_map_single().  The dev argument is IMO
correctly obtained here:

	static int
	pci_probe(struct pci_dev *dev, const struct pci_device_id *ent)
	{
		...
		fw_card_initialize(&ohci->card, &ohci_driver, &dev->dev);
		...
	}

	void
	fw_card_initialize(struct fw_card *card,
			   const struct fw_card_driver *driver,
			   struct device *device)
	{
		...
		card->device = device;
		...
	}

So, ohci->card.device is in fact &pci_dev->dev.

Also note:
  - The very same code did not oops at this point in 2.6.22.  It only
    started doing so in 2.6.23.
  - There has been no other report of this kind for any other
    architecture yet.  I would expect e.g. the PPC64 folks to report
    bugs in our dma mappings eventually.

-----

Two footnotes:

  - Although the 2.6.22 firewire subsystem does not oops during the
    pci_probe like it does in 2.6.23 and 2.6.24, it does lock up
    sometime later during actual use.  However this is not surprising,
    as I found and fixed some potential DMA mapping issues in the
    fw-sbp2 highlevel driver sometime after 2.6.22.  But due to the
    pci_probe problem, the firewire subsystem doesn't get as far on
    sparc64 on 2.6.23 and 2.6.24.

  - One thing which we do slightly wrong in ar_context_add_page() is
    that we
      1.) dma-map the buffer,
      2.) continue to write into the buffer from the CPU,
      3.) then sync it for the device.
    I let the reporter try a patch which inserted a
    dma_sync_single_for_cpu() right after the dma_map_single() in order
    to be clearly entitled to access the buffer by the CPU, but that
    didn't fix it.
-- 
Stefan Richter
-=====-=-=== ==-- =--=-
http://arcgraph.de/sr/

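The buffer-ownership ordering discussed in the second footnote can be modelled in plain userspace C. This is only a minimal sketch under stated assumptions: `mock_mapping`, `mock_map_single()`, `mock_sync_for_cpu()`, `mock_sync_for_device()` and `mock_cpu_write()` are hypothetical stand-ins invented for illustration, not kernel functions, and they track nothing but which side is entitled to touch a streaming buffer at each step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: who may currently touch a streaming DMA buffer. */
enum mock_owner { OWNER_CPU, OWNER_DEVICE };

struct mock_mapping {
	unsigned char *buf;
	size_t size;
	enum mock_owner owner;
};

/* Stand-in for mapping a buffer: afterwards the device owns it. */
static void mock_map_single(struct mock_mapping *m, unsigned char *buf,
			    size_t size)
{
	assert(m != NULL && buf != NULL);
	m->buf = buf;
	m->size = size;
	m->owner = OWNER_DEVICE;
}

/* Stand-in for a sync-for-cpu call: hands the buffer to the CPU. */
static void mock_sync_for_cpu(struct mock_mapping *m)
{
	m->owner = OWNER_CPU;
}

/* Stand-in for a sync-for-device call: hands the buffer back. */
static void mock_sync_for_device(struct mock_mapping *m)
{
	m->owner = OWNER_DEVICE;
}

/*
 * CPU write into the mapped buffer.  Returns 0 on success, or -1 if the
 * CPU does not own the buffer or the offset is out of range.
 */
static int mock_cpu_write(struct mock_mapping *m, size_t offset,
			  unsigned char value)
{
	if (m->owner != OWNER_CPU || offset >= m->size)
		return -1;
	m->buf[offset] = value;
	return 0;
}
```

In this model, a write issued straight after `mock_map_single()` is rejected, which corresponds to steps 1.) and 2.) of the footnote; calling `mock_sync_for_cpu()` first makes the write legal, and `mock_sync_for_device()` then returns ownership to the device. As the message notes, adding that extra sync did not cure the oops, so the ordering is a separate matter from the crash itself.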
* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: David Miller @ 2007-12-18 22:29 UTC
To: stefanr
Cc: sparclinux, linux-kernel, gregkh, krh, linux1394-devel

From: Stefan Richter <stefanr@s5r6.in-berlin.de>
Date: Tue, 18 Dec 2007 11:38:27 +0100

> Also note:
>   - The very same code did not oops at this point in 2.6.22.  It only
>     started doing so in 2.6.23.

2.6.23 is when the sparc64 IOMMU code started relying upon
the dev_archdata bits being correct.

* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: Stefan Richter @ 2007-12-19 16:33 UTC
To: David Miller
Cc: sparclinux, linux-kernel, gregkh, krh, linux1394-devel

David Miller wrote:
> From: Stefan Richter <stefanr@s5r6.in-berlin.de>
> Date: Tue, 18 Dec 2007 11:38:27 +0100
>
>> Also note:
>>   - The very same code did not oops at this point in 2.6.22.  It only
>>     started doing so in 2.6.23.
>
> 2.6.23 is when the sparc64 IOMMU code started relying upon
> the dev_archdata bits being correct.

And why are the dev_archdata corrupt?

Does arch/sparc64/kernel/pci.c fill them in incorrectly or too late?

drivers/firewire/fw-ohci.c needs them for dma_map_single() +
dma_sync_single_for_device() in the pci_driver.probe(), sometime after
it called pci_enable_device(), before finishing the probe.
-- 
Stefan Richter
-=====-=-=== ==-- =--==
http://arcgraph.de/sr/

* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: David Miller @ 2007-12-19 23:06 UTC
To: stefanr
Cc: sparclinux, linux-kernel, gregkh, krh, linux1394-devel

From: Stefan Richter <stefanr@s5r6.in-berlin.de>
Date: Wed, 19 Dec 2007 17:33:05 +0100

> drivers/firewire/fw-ohci.c needs them for dma_map_single() +
> dma_sync_single_for_device() in the pci_driver.probe(), sometime after
> it called pci_enable_device(), before finishing the probe.

I'll take a look at this and try to figure out exactly what
might be going wrong.  The dev_archdata should be fully set up
at this time.

* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: David Miller @ 2007-12-20 8:40 UTC
To: stefanr
Cc: sparclinux, linux-kernel, gregkh, krh, linux1394-devel

From: Stefan Richter <stefanr@s5r6.in-berlin.de>
Date: Wed, 19 Dec 2007 17:33:05 +0100

> Does arch/sparc64/kernel/pci.c fill them in incorrectly or too late?

The problem is that I created indirection that was totally unused, the
operation vectors members for these cases thus didn't get filled in,
and we OOPS trying to call NULL pointers as functions :-)

This should fix the crash:

diff --git a/include/asm-sparc64/dma-mapping.h b/include/asm-sparc64/dma-mapping.h
index 1fc6554..38cbec7 100644
--- a/include/asm-sparc64/dma-mapping.h
+++ b/include/asm-sparc64/dma-mapping.h
@@ -25,15 +25,9 @@ struct dma_ops {
 	void (*sync_single_for_cpu)(struct device *dev,
 				    dma_addr_t dma_handle, size_t size,
 				    enum dma_data_direction direction);
-	void (*sync_single_for_device)(struct device *dev,
-				       dma_addr_t dma_handle, size_t size,
-				       enum dma_data_direction direction);
 	void (*sync_sg_for_cpu)(struct device *dev, struct scatterlist *sg,
 				int nelems,
 				enum dma_data_direction direction);
-	void (*sync_sg_for_device)(struct device *dev, struct scatterlist *sg,
-				   int nelems,
-				   enum dma_data_direction direction);
 };
 
 extern const struct dma_ops *dma_ops;
@@ -105,7 +99,7 @@ static inline void dma_sync_single_for_device(struct device *dev,
 					      size_t size,
 					      enum dma_data_direction direction)
 {
-	dma_ops->sync_single_for_device(dev, dma_handle, size, direction);
+	/* No flushing needed to sync cpu writes to the device. */
 }
 
 static inline void dma_sync_single_range_for_cpu(struct device *dev,
@@ -123,7 +117,7 @@ static inline void dma_sync_single_range_for_device(struct device *dev,
 						    size_t size,
 						    enum dma_data_direction direction)
 {
-	dma_sync_single_for_device(dev, dma_handle+offset, size, direction);
+	/* No flushing needed to sync cpu writes to the device. */
 }
 
 
@@ -138,7 +132,7 @@ static inline void dma_sync_sg_for_device(struct device *dev,
 					  struct scatterlist *sg, int nelems,
 					  enum dma_data_direction direction)
 {
-	dma_ops->sync_sg_for_device(dev, sg, nelems, direction);
+	/* No flushing needed to sync cpu writes to the device. */
 }
 
 static inline int dma_mapping_error(dma_addr_t dma_addr)

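The failure mode described in the message above, an operations vector with members that never get filled in, can be reproduced in miniature outside the kernel. The following is a minimal userspace sketch under stated assumptions: `mock_device`, `mock_dma_ops`, `partially_filled_ops`, `sync_for_device_via_ops()` and `sync_for_device_noop()` are hypothetical names invented here, and the explicit NULL check exists only so the sketch can report the problem instead of crashing. The kernel wrapper had no such check and simply jumped through the NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct mock_device {
	const char *name;
};

struct mock_dma_ops {
	void (*sync_single_for_cpu)(struct mock_device *dev, size_t size);
	void (*sync_single_for_device)(struct mock_device *dev, size_t size);
};

static int cpu_syncs;	/* counts calls, so the filled-in member does work */

static void mock_sync_for_cpu(struct mock_device *dev, size_t size)
{
	(void)dev;
	(void)size;
	cpu_syncs++;
}

/*
 * Only one member is set.  The other is implicitly NULL, mirroring the
 * situation where part of an ops vector is never filled in.
 */
static const struct mock_dma_ops partially_filled_ops = {
	.sync_single_for_cpu = mock_sync_for_cpu,
};

/*
 * Indirect-call shape.  Returns -1 where a blind call would have jumped
 * to address 0, and 0 if the member was present and called.
 */
static int sync_for_device_via_ops(const struct mock_dma_ops *ops,
				   struct mock_device *dev, size_t size)
{
	assert(ops != NULL);
	if (ops->sync_single_for_device == NULL)
		return -1;
	ops->sync_single_for_device(dev, size);
	return 0;
}

/* Shape after the patch: no indirection, the operation is a no-op. */
static int sync_for_device_noop(struct mock_device *dev, size_t size)
{
	(void)dev;
	(void)size;
	return 0;
}
```

Calling `sync_for_device_via_ops()` with `partially_filled_ops` reports the missing member, while `sync_for_device_noop()` always succeeds because it no longer depends on the vector. Removing an unused indirection, as the patch does, sidesteps the need to keep every member populated.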
* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: Emanuele Rocca @ 2007-12-20 20:19 UTC
To: David Miller
Cc: stefanr, sparclinux, linux-kernel, gregkh, krh, linux1394-devel

* David Miller <davem@davemloft.net>, [2007-12-20 0:40 -0800]:
> The problem is that I created indirection that was totally unused, the
> operation vectors members for these cases thus didn't get filled in,
> and we OOPS trying to call NULL pointers as functions :-)
>
> This should fix the crash:

It does, tested on a Sun Blade 2000.

Thank you David.

ciao,
    ema

* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: Stefan Richter @ 2007-12-22 13:10 UTC
To: David Miller
Cc: sparclinux, linux-kernel, gregkh, krh, linux1394-devel

Emanuele Rocca wrote:
> * David Miller <davem@davemloft.net>, [2007-12-20 0:40 -0800]:
>> The problem is that I created indirection that was totally unused, the
>> operation vectors members for these cases thus didn't get filled in,
>> and we OOPS trying to call NULL pointers as functions :-)
>>
>> This should fix the crash:
>
> It does, tested on a Sun Blade 2000.

Thanks David and Emanuele.  I haven't got feedback from the other
reporter yet but I assume this fixes the issue
(http://bugzilla.kernel.org/show_bug.cgi?id=9160).
-- 
Stefan Richter
-=====-=-=== ==-- =-==-
http://arcgraph.de/sr/

* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: David Miller @ 2007-12-18 22:30 UTC
To: stefanr
Cc: sparclinux, linux-kernel, gregkh, krh, linux1394-devel

From: Stefan Richter <stefanr@s5r6.in-berlin.de>
Date: Tue, 18 Dec 2007 11:38:27 +0100

>   - There has been no other report of this kind for any other
>     architecture yet.  I would expect e.g. the PPC64 folks to report
>     bugs in our dma mappings eventually.

Irrelevant fact, powerpc handles its dev_archdata differently
from sparc64.

* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: Benjamin Herrenschmidt @ 2007-12-19 21:08 UTC
To: Stefan Richter
Cc: David Miller, sparclinux, linux-kernel, gregkh, krh, linux1394-devel

On Tue, 2007-12-18 at 11:38 +0100, Stefan Richter wrote:
> So, ohci->card.device is in fact &pci_dev->dev.
>
> Also note:
>   - The very same code did not oops at this point in 2.6.22.  It only
>     started doing so in 2.6.23.
>   - There has been no other report of this kind for any other
>     architecture yet.  I would expect e.g. the PPC64 folks to report
>     bugs in our dma mappings eventually.

Ignore my previous message... if you are indeed passing &pci_dev->dev,
it should work.

Ben.

* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: Benjamin Herrenschmidt @ 2007-12-19 21:07 UTC
To: David Miller
Cc: stefanr, sparclinux, linux-kernel, gregkh, krh, linux1394-devel

On Mon, 2007-12-17 at 16:50 -0800, David Miller wrote:
> The problem is likely what device struct you are passing to
> dma_sync_single_for_device(), it has to be a real pci_dev or similar
> that has its dev_archdata properly initialized.
>
> I bet dev_archdata in whatever "struct device" is being passed in has
> a NULL iommu pointer or something like that.
>
> Oh yeah, I see what you're doing, that won't work, please pass in
> the correct device struct pointer.  Please pass in the &pci_dev->dev
> not this ohci->card.device thing.

Yup, this would crash on powerpc 64 bits as well for the same reason.

Ben.

* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: Chris Newport @ 2007-12-18 2:58 UTC
To: Stefan Richter
Cc: sparclinux, linux-kernel, David S. Miller, Greg Kroah-Hartman, Kristian Høgsberg, linux1394-devel

On Tue, 18 Dec 2007, Stefan Richter wrote:

> It's a 100% reproducible oops on Sparc (with FireWire controller) for
> 2.6.23 and 2.6.24 kernels, but not 2.6.22.  The reporter confirmed that
> the bug also happens

How do you achieve a sparc system with firewire ?
AFAIK there is no SBUS firewire card.

Only sparc64 and some rare javastations have PCI slots.

* Re: No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression)
From: David Miller @ 2007-12-18 3:03 UTC
To: crn
Cc: stefanr, sparclinux, linux-kernel, gregkh, krh, linux1394-devel

From: Chris Newport <crn@netunix.com>
Date: Tue, 18 Dec 2007 02:58:29 +0000 (GMT)

> On Tue, 18 Dec 2007, Stefan Richter wrote:
>
> > It's a 100% reproducible oops on Sparc (with FireWire controller) for
> > 2.6.23 and 2.6.24 kernels, but not 2.6.22.  The reporter confirmed that
> > the bug also happens
>
> How do you achieve a sparc system with firewire ?
> AFAIK there is no SBUS firewire card.

He means sparc64; many of those systems have PCI firewire onboard.

Thread overview: 14+ messages
[not found] <bug-9160-4803@http.bugzilla.kernel.org/>
[not found] ` <47112797.7060003@s5r6.in-berlin.de>
2007-12-17 23:53 ` No dma_sync_* during pci_probe? (Sparc, post 2.6.22 regression) Stefan Richter
2007-12-18 0:50 ` David Miller
2007-12-18 10:38 ` Stefan Richter
2007-12-18 22:29 ` David Miller
2007-12-19 16:33 ` Stefan Richter
2007-12-19 23:06 ` David Miller
2007-12-20 8:40 ` David Miller
2007-12-20 20:19 ` Emanuele Rocca
2007-12-22 13:10 ` Stefan Richter
2007-12-18 22:30 ` David Miller
2007-12-19 21:08 ` Benjamin Herrenschmidt
2007-12-19 21:07 ` Benjamin Herrenschmidt
2007-12-18 2:58 ` Chris Newport
2007-12-18 3:03 ` David Miller