* [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
@ 2005-05-01 7:49 Grant Grundler
[not found] ` <4274FC81.4060906@tiscali.be>
2005-05-01 19:24 ` James Bottomley
0 siblings, 2 replies; 8+ messages in thread
From: Grant Grundler @ 2005-05-01 7:49 UTC (permalink / raw)
To: parisc-linux
Trying to get "tvflash" running on pa8800. tvflash is a userspace
firmware flash tool for mellanox infiniband boards.
Works fine on x86, x86-64, ia64, ppc, and sparc64.
"tvflash -i" is supposed to go off and identify all
installed mellanox boards and which firmware rev they
currently have.
The panic is in pfn_to_nid():
r = pfnnid_map[i];
BUG_ON(r == 0xff); <---- panic
Any clue what's broken?
[ sorry - feels like I'm reporting a lot more problems than I'm
fixing lately *sigh* ]
thanks,
grant
grundler@ion:/usr/src/openib_gen2/src/userspace/tvflash$ sudo src/tvflash -i
open_hca(0)
kernel BUG at include/asm/mmzone.h:85!
Backtrace:
[<0000000010113060>] dump_stack+0x18/0x28
[<000000001018125c>] remap_pfn_range+0x37c/0x4b8
[<00000000102413e4>] mmap_mem+0x2c/0x40
[<0000000010187120>] do_mmap_pgoff+0x478/0x850
[<0000000010114f84>] do_mmap2+0xa4/0x108
[<0000000010115030>] sys_mmap+0x28/0x38
[<0000000010107f80>] syscall_exit+0x0/0x14
Kernel panic - not syncing: BUG!
<0>Rebooting in 5 seconds..
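
The backtrace above goes through mmap_mem, the /dev/mem mmap handler, so the tool is presumably mapping a physical address range from userspace. A minimal sketch of that pattern follows. It is not taken from tvflash; map_phys, struct phys_map, and the path argument are hypothetical names used only for illustration.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper types and names; not from tvflash or the kernel. */
struct phys_map {
    void  *base;  /* page-aligned start of the mapping */
    size_t span;  /* number of bytes actually mapped */
    void  *ptr;   /* pointer corresponding to the requested address */
};

/*
 * Map `len` bytes starting at physical offset `phys` of `path`
 * (normally "/dev/mem"). mmap offsets must be page aligned, so the
 * start is rounded down and the difference is added back to the pointer.
 * Returns 0 on success, -1 on failure.
 */
static int map_phys(const char *path, uint64_t phys, size_t len,
                    struct phys_map *m)
{
    uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t start = phys & ~(page - 1);
    size_t skew = (size_t)(phys - start);
    int fd;

    if (len == 0)
        return -1;

    fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    m->span = len + skew;
    m->base = mmap(NULL, m->span, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t)start);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (m->base == MAP_FAILED)
        return -1;

    m->ptr = (char *)m->base + skew;
    return 0;
}
```

A caller would pass "/dev/mem" and an MMIO address, then release the mapping with munmap(m.base, m.span). In the kernel, that mmap call is what ends up in remap_pfn_range.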
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
[not found] ` <4274FC81.4060906@tiscali.be>
@ 2005-05-01 17:24 ` Grant Grundler
0 siblings, 0 replies; 8+ messages in thread
From: Grant Grundler @ 2005-05-01 17:24 UTC (permalink / raw)
To: Joel Soete; +Cc: parisc-linux
On Sun, May 01, 2005 at 03:57:53PM +0000, Joel Soete wrote:
> it's CONFIG_DISCONTIGMEM
What's that mean?
I was hoping someone could explain what the BUG_ON()
that tripped would mean.
> so first of all do you need DISCONTIGMEM?
Erm, define "need". I try to run the same kernels on several
a500s - several of which have > 4GB RAM.
pa8800/ZX1 has 4GB RAM installed but only 1GB is in 32-bit address space.
The rest is remapped by HW to > 32-bit.
> even so, what happens if you don't use this option?
No.
> (it would also boot just miss some ram)
"some" on pa8800 is not ok.
1GB for a 4-way, 1GHz box just isn't right.
thanks,
grant
* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
2005-05-01 7:49 [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic Grant Grundler
[not found] ` <4274FC81.4060906@tiscali.be>
@ 2005-05-01 19:24 ` James Bottomley
2005-05-02 0:08 ` Grant Grundler
1 sibling, 1 reply; 8+ messages in thread
From: James Bottomley @ 2005-05-01 19:24 UTC (permalink / raw)
To: Grant Grundler; +Cc: PARISC list
On Sun, 2005-05-01 at 01:49 -0600, Grant Grundler wrote:
> Trying to get "tvflash" running on pa8800. tvflash is a userspace
> firmware flash tool for mellanox infiniband boards.
> Works fine on x86, x86-64, ia64, ppc, and sparc64.
> "tvflash -i" is supposed to go off and identify all
> installed mellanox boards and which firmware rev they
> currently have.
>
> The panic is in pfn_to_nid():
> r = pfnnid_map[i];
> BUG_ON(r == 0xff); <---- panic
>
> Any clue what's broken?
pfnnid_map is a map per 1gb (currently what PFNNID_SHIFT defines) of our
memory range showing which discontig chunk this maps to. 0xff means the
range maps nowhere.
Without better debugging, it's hard to say, but I guess that this flash
tool is actually trying to mmap a region of memory on the PCI card and
it's tripping over this section of code. My second order guess would be
that we don't update the pfnnid_map when we actually declare a card I/O
range, so the kernel thinks it can map the region OK but we erroneously
trip this bug.
I'm with Joel on this one: Can you reproduce the problem without
CONFIG_DISCONTIGMEM?
James
* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
2005-05-01 19:24 ` James Bottomley
@ 2005-05-02 0:08 ` Grant Grundler
2005-05-02 0:31 ` James Bottomley
0 siblings, 1 reply; 8+ messages in thread
From: Grant Grundler @ 2005-05-02 0:08 UTC (permalink / raw)
To: James Bottomley; +Cc: PARISC list
On Sun, May 01, 2005 at 02:24:20PM -0500, James Bottomley wrote:
> pfnnid_map is a map per 1gb (currently what PFNNID_SHIFT defines) of our
> memory range showing which discontig chunk this maps to. 0xff means the
> range maps nowhere.
ok
> Without better debugging, it's hard to say, but I guess that this flash
> tool is actually trying to mmap a region of memory on the PCI card and
> it's tripping over this section of code.
Yes. A copy of the openib_gen2 tree is parked on gsyprf11:/usr/src/ as well.
> My second order guess would be
> that we don't update the pfnnid_map when we actually declare a card I/O
> range, so the kernel thinks it can map the region OK but we erroneously
> trip this bug.
*nod*. Where should we be telling the VM about MMIO ranges?
We clearly need to be advertising them.
> I'm with Joel on this one:
> Can you reproduce the problem without CONFIG_DISCONTIGMEM?
Yes. I rebuilt the kernel with CONFIG_DISCONTIGMEM=n.
I'm having doubts though that I rebooted the right kernel.
I believe so, but I'll post again later if not.
ion:/usr/src/openib_gen2/src/userspace/tvflash# src/tvflash -i
open_hca(0)
kernel BUG at include/asm/mmzone.h:85!
Backtrace:
[<0000000010113060>] dump_stack+0x18/0x28
[<00000000101813fc>] remap_pfn_range+0x37c/0x4b8
[<0000000010241934>] mmap_mem+0x2c/0x40
[<0000000010187288>] do_mmap_pgoff+0x478/0x858
[<0000000010114f84>] do_mmap2+0xa4/0x108
[<0000000010115030>] sys_mmap+0x28/0x38
[<0000000010107f80>] syscall_exit+0x0/0x14
Kernel panic - not syncing: BUG!
<0>Rebooting in 5 seconds..
thanks,
grant
* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
2005-05-02 0:08 ` Grant Grundler
@ 2005-05-02 0:31 ` James Bottomley
2005-05-02 4:12 ` Grant Grundler
0 siblings, 1 reply; 8+ messages in thread
From: James Bottomley @ 2005-05-02 0:31 UTC (permalink / raw)
To: Grant Grundler; +Cc: PARISC list
On Sun, 2005-05-01 at 18:08 -0600, Grant Grundler wrote:
> On Sun, May 01, 2005 at 02:24:20PM -0500, James Bottomley wrote:
> > My second order guess would be
> > that we don't update the pfnnid_map when we actually declare a card I/O
> > range, so the kernel thinks it can map the region OK but we erroneously
> > trip this bug.
>
> *nod*. Where should we be telling the VM about MMIO ranges?
> We clearly need to be advertising them.
Is this a 64 bit mmio region? Our pfn_is_io heuristics are a bit simple
(i.e. top F only set).
> > I'm with Joel on this one:
> > Can you reproduce the problem without CONFIG_DISCONTIGMEM?
>
> Yes. I rebuilt the kernel with CONFIG_DISCONTIGMEM=n.
> I'm having doubts though that I rebooted the right kernel.
> I believe so but post again later if not.
>
> ion:/usr/src/openib_gen2/src/userspace/tvflash# src/tvflash -i
> open_hca(0)
> kernel BUG at include/asm/mmzone.h:85!
> Backtrace:
> [<0000000010113060>] dump_stack+0x18/0x28
> [<00000000101813fc>] remap_pfn_range+0x37c/0x4b8
> [<0000000010241934>] mmap_mem+0x2c/0x40
> [<0000000010187288>] do_mmap_pgoff+0x478/0x858
> [<0000000010114f84>] do_mmap2+0xa4/0x108
> [<0000000010115030>] sys_mmap+0x28/0x38
> [<0000000010107f80>] syscall_exit+0x0/0x14
Yes, wrong kernel. That bug physically cannot occur with
CONFIG_DISCONTIGMEM=n. What I'm interested in there is if the tool
actually works.
James
* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
2005-05-02 0:31 ` James Bottomley
@ 2005-05-02 4:12 ` Grant Grundler
2005-05-02 14:51 ` James Bottomley
0 siblings, 1 reply; 8+ messages in thread
From: Grant Grundler @ 2005-05-02 4:12 UTC (permalink / raw)
To: James Bottomley; +Cc: PARISC list
On Sun, May 01, 2005 at 07:31:37PM -0500, James Bottomley wrote:
> > *nod*. Where should we be telling the VM about MMIO ranges?
> > We clearly need to be advertising them.
>
> Is this a 64 bit mmio region? Our pfn_is_io heuristics are a bit simple
> (i.e. top F only set).
It's a 64-bit BAR but not an address > 32-bits.
ion:~# lspci -vs 81:
0000:81:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1)
Subsystem: Hewlett-Packard Company: Unknown device 12ce
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 24
Memory at c0800000 (64-bit, non-prefetchable) [size=1M]
Memory at c0000000 (64-bit, prefetchable) [size=8M]
Capabilities: [40] #11 [001f]
Capabilities: [50] Vital Product Data
Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
Capabilities: [70] PCI-X non-bridge device.
The 3rd BAR isn't exposed by the current version of firmware.
If it were exposed, then we would see a 128MB or 256MB MMIO space.
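
To illustrate the distinction between a 64-bit BAR and an address above 32 bits, here is a sketch that decodes a memory BAR using the standard PCI flag bits (bit 0: I/O space, bits 2:1: type, bit 3: prefetchable). The demo_ names are hypothetical, and the raw register values in the usage note are reconstructed from the lspci output above as an assumption. lspci shows the kernel's resource view, which may differ from the raw register contents.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of one PCI base address register. */
struct demo_bar {
    bool     is_io;         /* bit 0 set: I/O space, not memory */
    bool     is_64bit;      /* type field (bits 2:1) == 0b10 */
    bool     prefetchable;  /* bit 3 */
    uint64_t addr;          /* base address with flag bits masked off */
};

/*
 * Decode a BAR from its low dword and, for 64-bit memory BARs,
 * the following dword holding the upper 32 address bits.
 */
static struct demo_bar demo_decode_bar(uint32_t lo, uint32_t hi)
{
    struct demo_bar b = {0};

    b.is_io = (lo & 0x1u) != 0;
    if (b.is_io) {
        b.addr = lo & ~0x3u;
        return b;
    }

    b.is_64bit = ((lo >> 1) & 0x3u) == 0x2u;
    b.prefetchable = ((lo >> 3) & 0x1u) != 0;
    b.addr = lo & ~0xfu;
    if (b.is_64bit)
        b.addr |= (uint64_t)hi << 32;
    return b;
}

/* True only if the programmed address itself needs more than 32 bits. */
static bool demo_above_4g(const struct demo_bar *b)
{
    return (b->addr >> 32) != 0;
}
```

Under those assumptions, demo_decode_bar(0xc0800004, 0) describes a 64-bit, non-prefetchable BAR at 0xc0800000, and demo_above_4g() is false for it. The BAR is 64-bit capable, but the address it holds is still below 4GB.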
> Yes, wrong kernel. That bug physically cannot occur with
> CONFIG_DISCONTIGMEM=n. What I'm interested in there is if the tool
> actually works.
Sorry, it was the old kernel. I expect the make -j2 failure scrolled
off the screen after one of the threads segfaulted,
i.e. no new kernel got built.
Now the system just resets. No tombstone of any kind. :^(
ion:/usr/src/openib_gen2/src/userspace/tvflash# cat /proc/meminfo
MemTotal: 1024548 kB
MemFree: 988272 kB
Buffers: 2016 kB
Cached: 15172 kB
...
ion:/usr/src/openib_gen2/src/userspace/tvflash# src/tvflash -i
open_hca(0)
Firmware Version 44.24
Duplex Console IO Dependent Code (IODC) revision 1
PC and RP from PIM dump + System.map:
CPU0 IOAQ 0x1010aadc $$remoI+350
CPU0 GR02 0x1010d110 __udivdi3+198
CPU1 IOAQ 0x101011b0 flush_data_cache_local+8
CPU1 GR02 0x10113b04 update_mmu_cache+94
Looks like CPU1 died right away trying to flush an uncacheable region.
No surprise that didn't work too well.
And no clue what's up with CPU0.
thanks,
grant
* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
2005-05-02 4:12 ` Grant Grundler
@ 2005-05-02 14:51 ` James Bottomley
2005-05-02 16:00 ` Grant Grundler
0 siblings, 1 reply; 8+ messages in thread
From: James Bottomley @ 2005-05-02 14:51 UTC (permalink / raw)
To: Grant Grundler; +Cc: PARISC list
On Sun, 2005-05-01 at 22:12 -0600, Grant Grundler wrote:
> Memory at c0800000 (64-bit, non-prefetchable) [size=1M]
> Memory at c0000000 (64-bit, prefetchable) [size=8M]
Exactly ... that's the 3rd gigabyte; I assume there's no physical memory
there, so the pfnnid_map is 0xff (except the bit where the io region
check works).
However, there's a total screw up here: 0xc0000000 is outside of our
premapped I/O region (0xf0000000-0xffffffff) so one of our assumptions
about pa is broken; either the mercury doesn't obey the I/O window rules
and we need to update the OS, or the card has the wrong address.
> Now the system just resets. No tombstone of any kind. :^(
[...]
> CPU0 IOAQ 0x1010aadc $$remoI+350
> CPU0 GR02 0x1010d110 __udivdi3+198
>
> CPU1 IOAQ 0x101011b0 flush_data_cache_local+8
> CPU1 GR02 0x10113b04 update_mmu_cache+94
>
>
> Looks like CPU1 died right away trying to flush an uncacheable region.
> No surprise that didn't work too well.
>
> And no clue what's up with CPU0.
CPU0 was probably executing a different thread when it was halted by the
HPMC.
Flushing an uncacheable area doesn't cause a HPMC, but flushing a
non-existent (and non-responding) area would ...
James
* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
2005-05-02 14:51 ` James Bottomley
@ 2005-05-02 16:00 ` Grant Grundler
0 siblings, 0 replies; 8+ messages in thread
From: Grant Grundler @ 2005-05-02 16:00 UTC (permalink / raw)
To: James Bottomley; +Cc: PARISC list
On Mon, May 02, 2005 at 09:51:50AM -0500, James Bottomley wrote:
> On Sun, 2005-05-01 at 22:12 -0600, Grant Grundler wrote:
> > Memory at c0800000 (64-bit, non-prefetchable) [size=1M]
> > Memory at c0000000 (64-bit, prefetchable) [size=8M]
>
> Exactly ... that's the 3rd gigabyte; I assume there's no physical memory
> there, so the pfnnid_map is 0xff (except the bit where the io region
> check works).
ok
> However, there's a total screw up here: 0xc0000000 is outside of our
> premapped I/O region (0xf0000000-0xffffffff) so one of our assumptions
> about pa is broken; either the mercury doesn't obey the I/O window rules
> and we need to update the OS,
We need to update the OS.
> or the card has the wrong address.
The card has the right address for the Rope it's under.
This is a ZX1 chipset. Similar to N-class, MMIO space is in the 2-4GB address range.
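
The mismatch between the two views can be sketched as a pair of range checks. This is only an illustration under stated assumptions: the demo_ names are hypothetical, the first window is the premapped 0xf0000000-0xffffffff region quoted above, and the second takes "2-4GB" literally as 0x80000000-0xffffffff. Neither function is the kernel's actual test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Premapped I/O region as quoted in the thread. */
#define DEMO_F_SPACE_START  0xf0000000ULL
#define DEMO_F_SPACE_END    0xffffffffULL

/* Assumed reading of "MMIO space is 2-4GB" on ZX1. */
#define DEMO_ZX1_MMIO_START 0x80000000ULL
#define DEMO_ZX1_MMIO_END   0xffffffffULL

/* True if the address lies in the premapped F-space window. */
static bool demo_in_f_space(uint64_t phys)
{
    return phys >= DEMO_F_SPACE_START && phys <= DEMO_F_SPACE_END;
}

/* True if the address lies anywhere in the assumed 2-4GB MMIO range. */
static bool demo_in_zx1_mmio(uint64_t phys)
{
    return phys >= DEMO_ZX1_MMIO_START && phys <= DEMO_ZX1_MMIO_END;
}
```

For the card's BAR at 0xc0800000, demo_in_f_space() is false while demo_in_zx1_mmio() is true. A check that only recognizes F-space would therefore treat this valid MMIO address as ordinary, unpopulated memory.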
> > CPU1 IOAQ 0x101011b0 flush_data_cache_local+8
> > CPU1 GR02 0x10113b04 update_mmu_cache+94
> >
> >
> > Looks like CPU1 died right away trying to flush an uncacheable region.
> > No surprise that didn't work too well.
> >
> > And no clue what's up with CPU0.
>
> CPU0 was probably executing a different thread when it was halted by the
> HPMC.
Maybe. The fact that it died early in the routine suggests otherwise.
> flushing an uncacheable area doesn't cause a HPMC,
This is the part I'm not sure about. John Marvin?
> but flushing a non-existent (and non-responding) area would ...
Yes, probably.
thanks,
grant