Linux PARISC architecture development
* [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
From: Grant Grundler @ 2005-05-01  7:49 UTC
  To: parisc-linux

Trying to get "tvflash" running on pa8800. tvflash is a userspace
firmware flash tool for mellanox infiniband boards.
Works fine on x86, x86-64, ia64, ppc, and sparc64.
"tvflash -i" is supposed to go off and identify all
installed mellanox boards and which firmware rev they
currently have.

The panic is in pfn_to_nid():
        r = pfnnid_map[i];
        BUG_ON(r == 0xff);	<---- panic

Any clue what's broken?


[ sorry - feels like I'm reporting a lot more problems than I'm
fixing lately *sigh* ]

thanks,
grant

grundler@ion:/usr/src/openib_gen2/src/userspace/tvflash$ sudo src/tvflash -i
open_hca(0)
kernel BUG at include/asm/mmzone.h:85!
Backtrace:
 [<0000000010113060>] dump_stack+0x18/0x28
 [<000000001018125c>] remap_pfn_range+0x37c/0x4b8
 [<00000000102413e4>] mmap_mem+0x2c/0x40
 [<0000000010187120>] do_mmap_pgoff+0x478/0x850
 [<0000000010114f84>] do_mmap2+0xa4/0x108
 [<0000000010115030>] sys_mmap+0x28/0x38
 [<0000000010107f80>] syscall_exit+0x0/0x14

Kernel panic - not syncing: BUG!
 <0>Rebooting in 5 seconds..
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
From: Grant Grundler @ 2005-05-01 17:24 UTC
  To: Joel Soete; +Cc: parisc-linux

On Sun, May 01, 2005 at 03:57:53PM +0000, Joel Soete wrote:
> it's CONFIG_DISCONTIGMEM

What's that mean?
I was hoping someone could explain what the BUG_ON()
that tripped would mean.

> so first of all do you need DISCONTIGMEM?

Erm, define "need". I try to run the same kernels on several
a500s - several of which have > 4GB RAM.

pa8800/ZX1 has 4GB RAM installed but only 1GB is in 32-bit address space.
The rest is remapped by HW to > 32-bit.

> even though, what happened if you don't use this option?

No.

> (it would also boot, just missing some RAM)

"some" on pa8800 is not ok.
1GB for a 4-way, 1GHz box just isn't right.

thanks,
grant

* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
From: James Bottomley @ 2005-05-01 19:24 UTC
  To: Grant Grundler; +Cc: PARISC list

On Sun, 2005-05-01 at 01:49 -0600, Grant Grundler wrote:
> Trying to get "tvflash" running on pa8800. tvflash is a userspace
> firmware flash tool for mellanox infiniband boards.
> Works fine on x86, x86-64, ia64, ppc, and sparc64.
> "tvflash -i" is supposed to go off and identify all
> installed mellanox boards and which firmware rev they
> currently have.
> 
> The panic is in pfn_to_nid():
>         r = pfnnid_map[i];
>         BUG_ON(r == 0xff);	<---- panic
> 
> Any clue what's broken?

pfnnid_map is a map per 1gb (currently what PFNNID_SHIFT defines) of our
memory range showing which discontig chunk this maps to.  0xff means the
range maps nowhere.
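A minimal sketch of the lookup described above, assuming 4 KB pages (so one 1 GB entry corresponds to PFNNID_SHIFT = 30 - 12 = 18) and a hypothetical layout in which only the first gigabyte holds RAM. The table contents, PFNNID_ENTRIES, and both function names are illustrative, not the kernel's own code; pfn_to_nid_checked() uses assert() where the kernel uses BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KB pages, one pfnnid_map entry per 1 GB of physical space. */
#define PAGE_SHIFT      12
#define PFNNID_SHIFT    (30 - PAGE_SHIFT)
#define PFNNID_NONE     0xff	/* "this range maps nowhere" */
#define PFNNID_ENTRIES  4	/* hypothetical: covers 0-4 GB only */

/* Hypothetical layout: RAM (node 0) only in the first gigabyte. */
static const uint8_t pfnnid_map[PFNNID_ENTRIES] = {
	0, PFNNID_NONE, PFNNID_NONE, PFNNID_NONE
};

/* Return the map entry for a pfn, or PFNNID_NONE if it is unmapped. */
static unsigned int pfnnid_lookup(uint64_t pfn)
{
	uint64_t i = pfn >> PFNNID_SHIFT;

	if (i >= PFNNID_ENTRIES)
		return PFNNID_NONE;
	return pfnnid_map[i];
}

/* Mirrors the failing check: the kernel does BUG_ON(r == 0xff) here. */
static unsigned int pfn_to_nid_checked(uint64_t pfn)
{
	unsigned int r = pfnnid_lookup(pfn);

	assert(r != PFNNID_NONE);
	return r;
}
```

With this layout the pfn of an MMIO address such as 0xc0000000 indexes entry 3 and comes back as 0xff, which is exactly the value the BUG_ON() rejects.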

Without better debugging, it's hard to say, but I guess that this flash
tool is actually trying to mmap a region of memory on the PCI card and
it's tripping over this section of code.  My second order guess would be
that we don't update the pfnnid_map when we actually declare a card I/O
range, so the kernel thinks it can map the region OK but we erroneously
trip this bug.

I'm with Joel on this one: Can you reproduce the problem without
CONFIG_DISCONTIGMEM?

James


* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
From: Grant Grundler @ 2005-05-02  0:08 UTC
  To: James Bottomley; +Cc: PARISC list

On Sun, May 01, 2005 at 02:24:20PM -0500, James Bottomley wrote:
> pfnnid_map is a map per 1gb (currently what PFNNID_SHIFT defines) of our
> memory range showing which discontig chunk this maps to.  0xff means the
> range maps nowhere.

ok

> Without better debugging, it's hard to say, but I guess that this flash
> tool is actually trying to mmap a region of memory on the PCI card and
> it's tripping over this section of code.

Yes. A copy of the openib_gen2 tree is parked on gsyprf11:/usr/src/ as well.

>   My second order guess would be
> that we don't update the pfnnid_map when we actually declare a card I/O
> range, so the kernel thinks it can map the region OK but we erroneously
> trip this bug.

*nod*. Where should we be telling the VM about MMIO ranges?
We clearly need to be advertising them.

> I'm with Joel on this one:
> Can you reproduce the problem without CONFIG_DISCONTIGMEM?

Yes. I rebuilt the kernel with CONFIG_DISCONTIGMEM=n.
I'm having doubts though that I rebooted the right kernel.
I believe so, but will post again later if not.

ion:/usr/src/openib_gen2/src/userspace/tvflash# src/tvflash -i
open_hca(0)
kernel BUG at include/asm/mmzone.h:85!
Backtrace:
 [<0000000010113060>] dump_stack+0x18/0x28
 [<00000000101813fc>] remap_pfn_range+0x37c/0x4b8
 [<0000000010241934>] mmap_mem+0x2c/0x40
 [<0000000010187288>] do_mmap_pgoff+0x478/0x858
 [<0000000010114f84>] do_mmap2+0xa4/0x108
 [<0000000010115030>] sys_mmap+0x28/0x38
 [<0000000010107f80>] syscall_exit+0x0/0x14

Kernel panic - not syncing: BUG!
 <0>Rebooting in 5 seconds..


thanks,
grant

* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
From: James Bottomley @ 2005-05-02  0:31 UTC
  To: Grant Grundler; +Cc: PARISC list

On Sun, 2005-05-01 at 18:08 -0600, Grant Grundler wrote:
> On Sun, May 01, 2005 at 02:24:20PM -0500, James Bottomley wrote:
> >   My second order guess would be
> > that we don't update the pfnnid_map when we actually declare a card I/O
> > range, so the kernel thinks it can map the region OK but we erroneously
> > trip this bug.
> 
> *nod*. Where should we be telling the VM about MMIO ranges?
> We clearly need to be advertising them.

Is this a 64 bit mmio region?  Our pfn_is_io heuristics are a bit simple
(i.e. top F only set).
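As a rough illustration of a "top F only" test, the sketch below treats a pfn as I/O when bits 28-31 of its physical address are all set, i.e. the address sits in 0xf0000000-0xffffffff. The macro and function names and the 4 KB page size are assumptions; the real pfn_is_io() may differ in detail. Because only those four bits are examined, a BAR at 0xc0000000 is not classed as I/O, while an address above 4 GB such as 0x1f0000000 would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT       12		/* assumption: 4 KB pages */
#define F_SPACE_BITS     0xf0000000ULL	/* top nibble of a 32-bit address */
#define F_SPACE_PFN_MASK (F_SPACE_BITS >> PAGE_SHIFT)

/* Compile-time sanity check on the derived pfn mask. */
static_assert(F_SPACE_PFN_MASK == 0xf0000ULL, "unexpected pfn mask");

/* Hypothetical heuristic: I/O iff physical address bits 28-31 are all ones. */
static bool pfn_is_io_top_f(uint64_t pfn)
{
	return (pfn & F_SPACE_PFN_MASK) == F_SPACE_PFN_MASK;
}
```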

> > I'm with Joel on this one:
> > Can you reproduce the problem without CONFIG_DISCONTIGMEM?
> 
> Yes. I rebuilt the kernel with CONFIG_DISCONTIGMEM=n.
> I'm having doubts though that I rebooted the right kernel.
> I believe so, but will post again later if not.
> 
> ion:/usr/src/openib_gen2/src/userspace/tvflash# src/tvflash -i
> open_hca(0)
> kernel BUG at include/asm/mmzone.h:85!
> Backtrace:
>  [<0000000010113060>] dump_stack+0x18/0x28
>  [<00000000101813fc>] remap_pfn_range+0x37c/0x4b8
>  [<0000000010241934>] mmap_mem+0x2c/0x40
>  [<0000000010187288>] do_mmap_pgoff+0x478/0x858
>  [<0000000010114f84>] do_mmap2+0xa4/0x108
>  [<0000000010115030>] sys_mmap+0x28/0x38
>  [<0000000010107f80>] syscall_exit+0x0/0x14

Yes, wrong kernel.  That bug physically cannot occur with
CONFIG_DISCONTIGMEM=n.  What I'm interested in there is if the tool
actually works.

James


* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
From: Grant Grundler @ 2005-05-02  4:12 UTC
  To: James Bottomley; +Cc: PARISC list

On Sun, May 01, 2005 at 07:31:37PM -0500, James Bottomley wrote:
> > *nod*. Where should we be telling the VM about MMIO ranges?
> > We clearly need to be advertising them.
> 
> Is this a 64 bit mmio region?  Our pfn_is_io heuristics are a bit simple
> (i.e. top F only set).

It's a 64-bit BAR but not an address > 32-bits.
ion:~# lspci -vs 81:
0000:81:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1)
        Subsystem: Hewlett-Packard Company: Unknown device 12ce
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 24
        Memory at c0800000 (64-bit, non-prefetchable) [size=1M]
        Memory at c0000000 (64-bit, prefetchable) [size=8M]
        Capabilities: [40] #11 [001f]
        Capabilities: [50] Vital Product Data
        Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
        Capabilities: [70] PCI-X non-bridge device.

The 3rd BAR isn't exposed by the current version of firmware.
If it were exposed, then we would see a 128MB or 256MB MMIO space.


> Yes, wrong kernel.  That bug physically cannot occur with
> CONFIG_DISCONTIGMEM=n.  What I'm interested in there is if the tool
> actually works.

Sorry, it was the old kernel. I expect the make -j2 failure scrolled
off the screen after one of the jobs segfaulted,
i.e. no new kernel got built.

Now the system just resets. No tombstone of any kind. :^(

ion:/usr/src/openib_gen2/src/userspace/tvflash# cat /proc/meminfo 
MemTotal:      1024548 kB
MemFree:        988272 kB
Buffers:          2016 kB
Cached:          15172 kB
...
ion:/usr/src/openib_gen2/src/userspace/tvflash# src/tvflash -i
open_hca(0)



Firmware Version  44.24

Duplex Console IO Dependent Code (IODC) revision 1


PC and RP from PIM dump + System.map:

CPU0 IOAQ 0x1010aadc $$remoI+350
CPU0 GR02 0x1010d110 __udivdi3+198

CPU1 IOAQ 0x101011b0 flush_data_cache_local+8
CPU1 GR02 0x10113b04 update_mmu_cache+94


Looks like CPU1 died right away trying to flush an uncacheable region.
No surprise that didn't work too well.

And no clue what's up with CPU0.

thanks,
grant

* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
From: James Bottomley @ 2005-05-02 14:51 UTC
  To: Grant Grundler; +Cc: PARISC list

On Sun, 2005-05-01 at 22:12 -0600, Grant Grundler wrote:
>         Memory at c0800000 (64-bit, non-prefetchable) [size=1M]
>         Memory at c0000000 (64-bit, prefetchable) [size=8M]

Exactly ... that's the 3rd gigabyte; I assume there's no physical memory
there, so the pfnnid_map is 0xff (except the bit where the io region
check works).

However, there's a total screw up here: 0xc0000000 is outside of our
premapped I/O region (0xf0000000-0xffffffff) so one of our assumptions
about pa is broken; either the mercury doesn't obey the I/O window rules
and we need to update the OS, or the card has the wrong address.
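To make the mismatch concrete, the sketch below checks the two BARs from the lspci output (1M at 0xc0800000, 8M at 0xc0000000) against the premapped window quoted above. The helper name range_in_window() is hypothetical; the addresses and sizes are the ones Grant posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Premapped I/O window quoted in the thread (inclusive bounds). */
#define IO_WINDOW_START 0xf0000000ULL
#define IO_WINDOW_END   0xffffffffULL

/* True if [base, base + size) lies entirely within [start, end]. */
static bool range_in_window(uint64_t base, uint64_t size,
			    uint64_t start, uint64_t end)
{
	assert(size > 0);
	/* Compare offsets instead of computing base + size, avoiding overflow. */
	return base >= start && base <= end && size - 1 <= end - base;
}
```

Both BARs fail this check, which is the "outside of our premapped I/O region" condition.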

> Now the system just resets. No tombstone of any kind. :^(
[...]
> CPU0 IOAQ 0x1010aadc $$remoI+350
> CPU0 GR02 0x1010d110 __udivdi3+198
> 
> CPU1 IOAQ 0x101011b0 flush_data_cache_local+8
> CPU1 GR02 0x10113b04 update_mmu_cache+94
> 
> 
> Looks like CPU1 died right away trying to flush an uncacheable region.
> No surprise that didn't work too well.
> 
> And no clue what's up with CPU0.

CPU0 was probably executing a different thread when it was halted by the
HPMC.

flushing an uncacheable area doesn't cause a HPMC, but flushing a non-
existent (and non-responding) area would ...

James


* Re: [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic
From: Grant Grundler @ 2005-05-02 16:00 UTC
  To: James Bottomley; +Cc: PARISC list

On Mon, May 02, 2005 at 09:51:50AM -0500, James Bottomley wrote:
> On Sun, 2005-05-01 at 22:12 -0600, Grant Grundler wrote:
> >         Memory at c0800000 (64-bit, non-prefetchable) [size=1M]
> >         Memory at c0000000 (64-bit, prefetchable) [size=8M]
> 
> Exactly ... that's the 3rd gigabyte; I assume there's no physical memory
> there, so the pfnnid_map is 0xff (except the bit where the io region
> check works).

ok

> However, there's a total screw up here: 0xc0000000 is outside of our
> premapped I/O region (0xf0000000-0xffffffff) so one of our assumptions
> about pa is broken; either the mercury doesn't obey the I/O window rules
> and we need to update the OS,

We need to update the OS.

> or the card has the wrong address.

The card has the right address for the Rope it's under.
This is a ZX1 chipset. As on N-class, MMIO space is the 2-4GB address range.
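One possible shape for "update the OS" (purely a sketch, not a fix proposed in this thread) is a per-platform table of MMIO windows instead of a single hard-wired F-space range. The struct, table names, and the exact 0x80000000-0xffffffff bounds for ZX1 (read from "MMIO space is the 2-4GB address range") are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct mmio_window {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* inclusive */
};

/* Hypothetical tables: legacy F-space vs. a ZX1-style 2-4 GB window. */
static const struct mmio_window fspace_windows[] = {
	{ 0xf0000000ULL, 0xffffffffULL },
};
static const struct mmio_window zx1_windows[] = {
	{ 0x80000000ULL, 0xffffffffULL },
};

/* True if addr falls inside any window of the given table. */
static bool addr_is_mmio(uint64_t addr, const struct mmio_window *w, size_t n)
{
	assert(w != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (addr >= w[i].start && addr <= w[i].end)
			return true;
	}
	return false;
}
```

With the ZX1 table, both Mellanox BARs are recognised as MMIO, while RAM in the first gigabyte is not.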

> > CPU1 IOAQ 0x101011b0 flush_data_cache_local+8
> > CPU1 GR02 0x10113b04 update_mmu_cache+94
> > 
> > 
> > Looks like CPU1 died right away trying to flush an uncacheable region.
> > No surprise that didn't work too well.
> > 
> > And no clue what's up with CPU0.
> 
> CPU0 was probably executing a different thread when it was halted by the
> HPMC.

Maybe. The fact that it died early in the routine suggests otherwise.

> flushing an uncacheable area doesn't cause a HPMC,

This is the part I'm not sure about. John Marvin?

> but flushing a non-existent (and non-responding) area would ...

Yes, probably.

thanks,
grant

Thread overview: 8+ messages
2005-05-01  7:49 [parisc-linux] BUG 2.6.12-rc3-pa1 PCI mmap panic Grant Grundler
     [not found] ` <4274FC81.4060906@tiscali.be>
2005-05-01 17:24   ` Grant Grundler
2005-05-01 19:24 ` James Bottomley
2005-05-02  0:08   ` Grant Grundler
2005-05-02  0:31     ` James Bottomley
2005-05-02  4:12       ` Grant Grundler
2005-05-02 14:51         ` James Bottomley
2005-05-02 16:00           ` Grant Grundler
