public inbox for linux-kernel@vger.kernel.org
* Kernel BUG at objrmap:325 in 2.6.5-7.151-smp (SuSE, x86_64)
@ 2005-07-13  9:07 Christian Boehme
  2005-07-13  9:16 ` Arjan van de Ven
  2005-07-13  9:59 ` Jan Engelhardt
  0 siblings, 2 replies; 5+ messages in thread
From: Christian Boehme @ 2005-07-13  9:07 UTC
  To: linux-kernel

We often see the following kernel-bug in our logs:

kernel: Kernel BUG at objrmap:325
kernel: invalid operand: 0000 [1] SMP
kernel: CPU 0
kernel: Pid: 4752, comm: mhd3d.opteron Tainted: G  U (2.6.5-7.151-smp SLES9_SP1_BRANCH-200503181131210000)
kernel: RIP: 0010:[<ffffffff8017d1de>] <ffffffff8017d1de>{page_add_rmap+334}
kernel: RSP: 0000:00000103e68d5db8  EFLAGS: 00010246
kernel: RAX: 000000000100806d RBX: 0000010007cefc48 RCX: 0000000000000000
kernel: RDX: ffffffff80596340 RSI: 00000101fb0d9d90 RDI: 0000010007cefc48
kernel: RBP: 00000000000165ae R08: 0000000000000000 R09: 000000000043c300
kernel: R10: 000000000d9b0b10 R11: 0000000000000002 R12: 00000101fb0d9d90
kernel: R13: 0000000000000003 R14: 00000103f0947800 R15: 000000000043c300
kernel: FS:  0000002a961864c0(0000) GS:ffffffff80554200(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000000000043c300 CR3: 0000000000101000 CR4: 00000000000006e0
kernel: Process mhd3d.opteron (pid: 4752, threadinfo 00000103e68d4000, task 00000101fb6e8c40)
kernel: Stack: 0000000000000246 ffffffff801740db 00000103e5028800 0000010231caf010
kernel:        0000000100000000 00000101fb0d9d90 00000103ea3b01e0 0000000000000000
kernel:        000000000043c300 00000103f0947800
kernel: Call Trace:<ffffffff801740db>{do_no_page+2987} <ffffffff801758a5>{handle_mm_fault+405}
kernel:        <ffffffff80122554>{do_page_fault+468} <ffffffff801477c1>{sys_rt_sigaction+113}
kernel:        <ffffffff80111041>{error_exit+0}
kernel:
kernel: Code: 0f 0b 10 ec 37 80 ff ff ff ff 45 01 8b 07 a9 00 80 00 00 75
kernel: RIP <ffffffff8017d1de>{page_add_rmap+334} RSP <00000103e68d5db8>
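
The `Code:` bytes appear to carry the same information as the first line
of the oops. A minimal sketch, assuming the x86_64 BUG() layout of that
kernel generation (the two-byte `ud2` opcode `0f 0b`, then an 8-byte
pointer to the file name, then a 16-bit line number, all little-endian).
That layout is an assumption, not something stated in this thread, and
the name `decode_bug_frame` is hypothetical:

```python
import struct

# "Code:" bytes copied from the oops above.
CODE_LINE = "0f 0b 10 ec 37 80 ff ff ff ff 45 01 8b 07 a9 00 80 00 00 75"


def decode_bug_frame(code_hex):
    """Decode an assumed BUG() frame: ud2, 8-byte pointer, 16-bit line."""
    raw = bytes.fromhex(code_hex.replace(" ", ""))
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("code does not start with ud2 (0f 0b)")
    # Little-endian unsigned 64-bit pointer, then unsigned 16-bit line number.
    name_ptr, line = struct.unpack_from("<QH", raw, 2)
    return name_ptr, line


name_ptr, line = decode_bug_frame(CODE_LINE)
print(f"file-name pointer: {name_ptr:#018x}, line: {line}")
```

Under that assumption the 16-bit field `45 01` decodes to 0x145 = 325,
which matches the `objrmap:325` reported at the top of the oops.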

The bug always hits the same user-compiled executable,
always at the time the daily cron-jobs are run, so there
seems to be a dependency on another process. The process
4752 is part of an MPI-parallel application, and another
instance of it runs on the same node with PID 4751. Interestingly
after this bug the /proc/4751 dir (i.e., the one of the instance
not cited in the bug) is inaccessible. Thanks for any help
with this issue!

Regards

Christian Boehme

-- 
Dr. Christian Boehme
GWDG                            Private:
Am Fassberg                     Wilhelm-Raabe-Str. 15
37077 Göttingen                 37083 Göttingen
email: Christian.Boehme@gwdg.de ChristianBoehme@web.de
phone: +49 (0)551 201-1839      +49 (0)551 3077000
fax:   +49 (0)551 201-2150      +49 (0)551 3077077



* Re: Kernel BUG at objrmap:325 in 2.6.5-7.151-smp (SuSE, x86_64)
  2005-07-13  9:07 Kernel BUG at objrmap:325 in 2.6.5-7.151-smp (SuSE, x86_64) Christian Boehme
@ 2005-07-13  9:16 ` Arjan van de Ven
  2005-07-13  9:59 ` Jan Engelhardt
  1 sibling, 0 replies; 5+ messages in thread
From: Arjan van de Ven @ 2005-07-13  9:16 UTC
  To: Christian Boehme; +Cc: linux-kernel

On Wed, 2005-07-13 at 11:07 +0200, Christian Boehme wrote:
> We often see the following kernel-bug in our logs:

you really should call SuSE support for this... after all that's what
you're paying them for ;)


> 
> kernel: Kernel BUG at objrmap:325
> kernel: invalid operand: 0000 [1] SMP
> kernel: CPU 0
> kernel: Pid: 4752, comm: mhd3d.opteron Tainted: G  U (2.6.5-7.151-smp SLES9_SP1_BRANCH-200503181131210000)

which modules do you use ?




* Re: Kernel BUG at objrmap:325 in 2.6.5-7.151-smp (SuSE, x86_64)
       [not found] ` <4pLzx-1KX-23@gated-at.bofh.it>
@ 2005-07-13  9:52   ` Christian Boehme
  0 siblings, 0 replies; 5+ messages in thread
From: Christian Boehme @ 2005-07-13  9:52 UTC
  To: Arjan van de Ven; +Cc: linux-kernel

Arjan van de Ven wrote:
> you really should call SuSE support for this... after all that's what
> you're paying them for ;)

We use SLES because it is a requirement for getting support for some
vendor-supplied MPI libraries... but you are of course still right. I
will file a bug report with them as well.

>>kernel: Kernel BUG at objrmap:325
>>kernel: invalid operand: 0000 [1] SMP
>>kernel: CPU 0
>>kernel: Pid: 4752, comm: mhd3d.opteron Tainted: G  U (2.6.5-7.151-smp SLES9_SP1_BRANCH-200503181131210000)
> 
> 
> which modules do you use ?

Here's the list:

joydev                 19584  0
sg                     51000  0
st                     51236  0
sr_mod                 26788  0
thermal                22668  0
processor              28352  1 thermal
fan                    12808  0
button                 15648  0
battery                17928  0
ac                     13832  0
ipv6                  317304  21
tg3                    88324  0
ohci_hcd               29700  0
evdev                  18944  0
usbcore               131824  3 ohci_hcd
ib_sdp                136000  0
ib_useraccess_cm       28320  0
ib_cm                  59000  2 ib_sdp,ib_useraccess_cm
ib_udapl               49852  0
ib_ip2pr               38200  3 ib_sdp,ib_useraccess_cm,ib_udapl
ib_ipoib               75160  2 ib_udapl,ib_ip2pr
ib_useraccess          21540  0
ib_sa_client           41232  3 ib_udapl,ib_ip2pr,ib_ipoib
ib_client_query        27040  4 ib_udapl,ib_ip2pr,ib_ipoib,ib_sa_client
ib_tavor               40964  7 ib_useraccess_cm
mod_vapi              169824  3 ib_useraccess_cm,ib_udapl,ib_tavor
mod_vipkl             262520  1 mod_vapi
mod_thh               258272  1 mod_vapi
mod_hh                 31888  2 mod_vipkl,mod_thh
mod_mpga               35200  1 mod_vapi
mod_vapi_common       103296  6 ib_useraccess_cm,ib_udapl,ib_tavor,mod_vapi,mod_vipkl,mod_thh
mosal                 155340  5 mod_vapi,mod_vipkl,mod_thh,mod_mpga,mod_vapi_common
ib_mad                 34200  4 ib_cm,ib_useraccess,ib_client_query,ib_tavor
ib_core               286744  10 ib_sdp,ib_useraccess_cm,ib_cm,ib_udapl,ib_ip2pr,ib_ipoib,ib_useraccess,ib_sa_client,ib_tavor,ib_mad
ib_poll                34768  4 ib_sdp,ib_cm,ib_ip2pr,ib_client_query
ib_services            28484  13 ib_sdp,ib_useraccess_cm,ib_cm,ib_udapl,ib_ip2pr,ib_ipoib,ib_useraccess,ib_sa_client,ib_client_query,ib_tavor,ib_mad,ib_core,ib_poll
mst_pciconf            95488  0
mst_pci                93312  2
w83627hf               39172  0
lm85                   34052  0
i2c_sensor             11520  2 w83627hf,lm85
i2c_isa                11008  0
i2c_amd756             15620  0
i2c_core               35716  5 w83627hf,lm85,i2c_sensor,i2c_isa,i2c_amd756
dm_mod                 67776  0
ext3                  133104  3
jbd                    83144  1 ext3
sata_sil               17284  4
libata                 51200  1 sata_sil,[permanent]
sd_mod                 29568  5
scsi_mod              140800  5 sg,st,sr_mod,libata,sd_mod

The ib_* modules are for the Infiniband network connections. Thanks for
your help!

Best wishes

Christian Boehme




* Re: Kernel BUG at objrmap:325 in 2.6.5-7.151-smp (SuSE, x86_64)
  2005-07-13  9:07 Kernel BUG at objrmap:325 in 2.6.5-7.151-smp (SuSE, x86_64) Christian Boehme
  2005-07-13  9:16 ` Arjan van de Ven
@ 2005-07-13  9:59 ` Jan Engelhardt
  2005-07-13 10:03   ` Jan Engelhardt
  1 sibling, 1 reply; 5+ messages in thread
From: Jan Engelhardt @ 2005-07-13  9:59 UTC
  To: Christian Boehme; +Cc: linux-kernel


> We often see the following kernel-bug in our logs:

> kernel: Process mhd3d.opteron (pid: 4752, threadinfo 00000103e68d4000, task 00000101fb6e8c40)
> kernel: Stack: 0000000000000246 ffffffff801740db 00000103e5028800 0000010231caf010
> kernel:        0000000100000000 00000101fb0d9d90 00000103ea3b01e0 0000000000000000
> kernel:        000000000043c300 00000103f0947800
> kernel: Call Trace:<ffffffff801740db>{do_no_page+2987} <ffffffff801758a5>{handle_mm_fault+405}
> kernel:        <ffffffff80122554>{do_page_fault+468} <ffffffff801477c1>{sys_rt_sigaction+113}
> kernel:        <ffffffff80111041>{error_exit+0}

Contact SUSE (bugzilla.novell.com) (or ask Eberhard). But you probably
won't learn much there either, since the stack trace doesn't contain much.

I assume the userspace application that causes this gets a segfault. So
you could run strace alongside it and see which "userspace syscall"
triggers it. Actually it's right there in the trace: rt_sigaction()
[man rt_sigaction], but that doesn't tell you much either.

What is mhd3d.opteron anyway? Otherwise, just update to a newer SUSE;
they have gone from 2.6.5 to 2.6.8 to 2.6.11 within a short time, at
least in the KOTD.

> Interestingly
> after this bug the /proc/4751 dir (i.e., the one of the instance
> not cited in the bug) is inaccessible. Thanks for any help
> with this issue!

Which error number? Have you tried getting in as root?
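
One way to capture that error number is to attempt the directory listing
and report the errno by name. This is only a sketch: the helper name
`probe_dir` is hypothetical, and PID 4751 is simply the one from the
report above.

```python
import errno
import os


def probe_dir(path):
    """Try to list a directory; return (entry_count, None) or (None, errno_name)."""
    try:
        return len(os.listdir(path)), None
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. EACCES or ENOENT.
        return None, errno.errorcode.get(exc.errno, str(exc.errno))


# Substitute the PID whose /proc entry is inaccessible.
print(probe_dir("/proc/4751"))
```

Running it once as the owning user and once as root would show whether
the failure is a permission problem or something else.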



Jan Engelhardt
-- 
| Alphagate Systems, http://alphagate.hopto.org/



* Re: Kernel BUG at objrmap:325 in 2.6.5-7.151-smp (SuSE, x86_64)
  2005-07-13  9:59 ` Jan Engelhardt
@ 2005-07-13 10:03   ` Jan Engelhardt
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Engelhardt @ 2005-07-13 10:03 UTC
  Cc: Linux Kernel Mailing List

>> not cited in the bug) is inaccessible. Thanks for any help
>> with this issue!
>Which error number? Have you tried getting in as root?

<whoops, this should not have made it to the list; sorry>

