* Analysis of Samba configure oops
From: Keith M Wesolowski @ 2000-07-17 1:24 UTC (permalink / raw)
To: linux-mips
Hi all,
I've been analysing the infamous Samba configure oops and I think I've
taken it as far as I can. Hopefully someone here will know what's
wrong.
The configuration: SGI Indy 4400SC/200 (R4000SC V6.0), current CVS.
The trigger (from the Samba configure tests, tests/shared_mmap.c):
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>

#define DATA "conftest.mmap"

#ifndef MAP_FILE
#define MAP_FILE 0
#endif

int main()
{
    int *buf;
    int i;
    int fd = open(DATA,O_RDWR|O_CREAT|O_TRUNC,0666);
    int count=7;

    if (fd == -1) exit(1);

    for (i=0;i<10000;i++) {
        write(fd,&i,sizeof(i));
    }

    close(fd);

    if (fork() == 0) {
        fd = open(DATA,O_RDWR);
        if (fd == -1) exit(1);

        buf = (int *)mmap(NULL, 10000*sizeof(int),
                          (PROT_READ | PROT_WRITE),
                          MAP_FILE | MAP_SHARED,
                          fd, 0);

        while (count-- && buf[9124] != 55732) sleep(1);

        if (count <= 0) exit(1);

        buf[1763] = 7268;
        exit(0);
    }

    fd = open(DATA,O_RDWR);
    if (fd == -1) exit(1);

    buf = (int *)mmap(NULL, 10000*sizeof(int),
                      (PROT_READ | PROT_WRITE),
                      MAP_FILE | MAP_SHARED,
                      fd, 0);

    if (buf == (int *)-1) exit(1);

    buf[9124] = 55732;

    while (count-- && buf[1763] != 7268) sleep(1);

    unlink(DATA);

    if (count > 0) exit(0);
    exit(1);
}
The oops:
Unable to handle kernel paging request at virtual address 0000003c, epc == 8801eb20, ra == 8803a108
$0 : 00000000 1000fc00 000003f8 00000000
$4 : 000000fb 00005eda 2acfe000 8816c8a0
$8 : 8d6d8360 ffff00ff 1000fc01 00000001
$12: 00000001 7ffffa10 00000003 00400c0c
$16: 00346e80 0d1ba79f 00000001 2acfe000
$20: 000fe000 8da1c3f8 000fd000 8d6d8360
$24: 00000014 2ac26590
$28: 8d776000 8d777e18 8d9372ac 8803a108
epc : 8801eb20
Status: 1000fc02
Cause : 00000008
Process conftest (pid: 5154, stackpage=8d776000)
Stack: 88502c80 8d9372ac 00000001 2acf9000 000fd000 003fffff 8d9372ac 2ad07000
00000000 2acfd000 2ac00000 00107000 00000000 00107000 00000000 003fffff
2ac00000 00000000 8d6d8360 00000001 2acfd000 8816c8a0 0000a000 8d6d86a0
00000000 1002e510 00000000 8803a39c 00004000 2acfd000 00000000 00000001
88036828 880368b0 8816c8a0 8fcc9b60 000478e5 00000065 00000001 8816c8a0
8d776000 ...
Call Trace: [<8803a39c>] [<88036828>] [<880368b0>] [<880296b0>] [<88025208>] [<8802c914>] [<88013ef8>] [<8802cb8c>] [<88010c28>] [<88010c28>]
Code: 8f83002c 8d040014 8ce5003c <8c62003c> 10a2007d 30840004 3c028819 8c42
>>RA; 8803a108 <filemap_sync+204/488>
>>PC; 8801eb20 <r4k_flush_cache_page_s128d16i16+78/324> <=====
Trace; 8803a39c <filemap_unmap+10/1c>
Trace; 88036828 <exit_mmap+78/198>
Trace; 880368b0 <exit_mmap+100/198>
Trace; 880296b0 <mmput+40/7c>
Trace; 88025208 <sys_rt_sigaction+88/e8>
Code; 8801eb14 <r4k_flush_cache_page_s128d16i16+6c/324>
0000000000000000 <_PC>:
Code; 8801eb14 <r4k_flush_cache_page_s128d16i16+6c/324>
0: 8f83002c lw $v1,44($gp)
Code; 8801eb18 <r4k_flush_cache_page_s128d16i16+70/324>
4: 8d040014 lw $a0,20($t0)
Code; 8801eb1c <r4k_flush_cache_page_s128d16i16+74/324>
8: 8ce5003c lw $a1,60($a3)
Code; 8801eb20 <r4k_flush_cache_page_s128d16i16+78/324> <=====
c: 8c62003c lw $v0,60($v1) <=====
Code; 8801eb24 <r4k_flush_cache_page_s128d16i16+7c/324>
10: 10a2007d beq $a1,$v0,208 <_PC+0x208> 8801ed1c <r4k_flush_cache_page_s128d16i16+274/324>
Code; 8801eb28 <r4k_flush_cache_page_s128d16i16+80/324>
14: 30840004 andi $a0,$a0,0x4
Code; 8801eb2c <r4k_flush_cache_page_s128d16i16+84/324>
18: 3c028819 lui $v0,0x8819
Code; 8801eb30 <r4k_flush_cache_page_s128d16i16+88/324>
1c: 8c420000 lw $v0,0($v0)
The matching code (arch/mips/mm/r4xx0.c:1600):
if(mm->context != current->mm->context) {
The analysis:
The fault address is 0x3c. The offset of mm in current is 0x2c. Thus
the immediate cause appears to be that current->mm is 0x10, obviously
nonsense.
Further investigation yields this call sequence: exit_mmap =>
filemap_unmap => filemap_sync => filemap_sync_pmd_range =>
filemap_sync_pte_range => filemap_sync_pte => flush_cache_page
Anybody?
--
Keith M Wesolowski <wesolows@foobazco.org> http://foobazco.org/~wesolows/
(( Project Foobazco Coordinator and Network Administrator )) aiieeeeeeee!
"The list of people so amazingly stupid they can't even tie their shoes?"
"Yeah, you know, /etc/passwd."
* Re: Analysis of Samba configure oops
From: Keith M Wesolowski @ 2000-07-17 17:05 UTC (permalink / raw)
To: Keith M Wesolowski; +Cc: linux-mips
On Sun, Jul 16, 2000 at 06:24:28PM -0700, Keith M Wesolowski wrote:
Responding to my own mail, yeesh. I was obviously suffering a dumbass
attack when I wrote this.
> Code; 8801eb1c <r4k_flush_cache_page_s128d16i16+74/324>
> 8: 8ce5003c lw $a1,60($a3)
> Code; 8801eb20 <r4k_flush_cache_page_s128d16i16+78/324> <=====
> c: 8c62003c lw $v0,60($v1) <=====
>
> The fault address is 0x3c. The offset of mm in current is 0x2c. Thus
> the immediate cause appears to be that current->mm is 0x10, obviously
> nonsense.
The interesting bit is not current->mm, but current->mm->context. The
offset of context is 60 as shown above in the disassembly. 60 = 0x3c, so
it's clear that current->mm is in fact NULL.
Hope this makes things a bit clearer.
--
Keith M Wesolowski wesolows@chem.unr.edu
University of Nevada http://www.chem.unr.edu
Chemistry Department Systems and Network Administrator
* Re: Analysis of Samba configure oops
From: Ralf Baechle @ 2000-07-18 3:18 UTC (permalink / raw)
To: Keith M Wesolowski; +Cc: linux-mips
On Mon, Jul 17, 2000 at 10:05:34AM -0700, Keith M Wesolowski wrote:
> Responding to my own mail, yeesh. I was obviously suffering a dumbass
> attack when I wrote this.
>
> > Code; 8801eb1c <r4k_flush_cache_page_s128d16i16+74/324>
> > 8: 8ce5003c lw $a1,60($a3)
> > Code; 8801eb20 <r4k_flush_cache_page_s128d16i16+78/324> <=====
> > c: 8c62003c lw $v0,60($v1) <=====
> >
> > The fault address is 0x3c. The offset of mm in current is 0x2c. Thus
> > the immediate cause appears to be that current->mm is 0x10, obviously
> > nonsense.
>
> The interesting bit is not current->mm, but current->mm->context. The
> offset of context is 60 as shown above in the disassembly. 60 = 0x3c, so
> it's clear that current->mm is in fact NULL.
>
> Hope this makes things a bit clearer.
Indeed, it does. I've committed a patch for this bug to CVS and would like
to hear reports.
Ralf
* Re: Analysis of Samba configure oops
From: Keith M Wesolowski @ 2000-07-19 4:33 UTC (permalink / raw)
To: Ralf Baechle; +Cc: linux-mips
On Tue, Jul 18, 2000 at 05:18:28AM +0200, Ralf Baechle wrote:
> Indeed, it does. I've committed a patch for this bug to CVS and would like
> to hear reports.
I am pleased to report that without this fix I observe the
oft-reported problem when using two disks simultaneously on IP22:
SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 27010000
I/O error: dev 08:11, sector 1885720
but with this fix I no longer see it. How many more bugs will this
fix, I wonder...
--
Keith M Wesolowski wesolows@chem.unr.edu
University of Nevada http://www.chem.unr.edu
Chemistry Department Systems and Network Administrator
* Re: Analysis of Samba configure oops
From: Ralf Baechle @ 2000-07-19 14:10 UTC (permalink / raw)
To: Keith M Wesolowski; +Cc: linux-mips
On Tue, Jul 18, 2000 at 09:33:15PM -0700, Keith M Wesolowski wrote:
> > Indeed, it does. I've committed a patch for this bug to CVS and would like
> > to hear reports.
>
> I am pleased to report that without this fix I observe the
> oft-reported problem when using two disks simultaneously on IP22:
>
> SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 27010000
> I/O error: dev 08:11, sector 1885720
>
> but with this fix I no longer see it. How many more bugs will this
> fix, I wonder...
Funny. It's not obvious why this happened, but I gratefully accept this
bug being fixed as well. Now that this cure was so successful, I'll have
to research whether mips64 is also affected.
Ralf
* Re: Analysis of Samba configure oops
From: Keith M Wesolowski @ 2000-07-19 23:40 UTC (permalink / raw)
To: Ralf Baechle; +Cc: linux-mips
On Wed, Jul 19, 2000 at 04:10:12PM +0200, Ralf Baechle wrote:
> Funny. It's not obvious why this happened, but I gratefully accept this
> bug being fixed as well. Now that this cure was so successful, I'll have
> to research whether mips64 is also affected.
Klaus and I have investigated further. Apparently the problem does
not manifest itself with cp -rd or similar, but a
tar cf - | tar xf - pipeline does trigger it. It's not clear why.
--
Keith M Wesolowski wesolows@chem.unr.edu
University of Nevada http://www.chem.unr.edu
Chemistry Department Systems and Network Administrator
* Re: Analysis of Samba configure oops
From: Ralf Baechle @ 2000-07-20 12:19 UTC (permalink / raw)
To: Keith M Wesolowski; +Cc: linux-mips
On Wed, Jul 19, 2000 at 04:40:15PM -0700, Keith M Wesolowski wrote:
> > Funny. It's not obvious why this happened, but I gratefully accept this
> > bug being fixed as well. Now that this cure was so successful, I'll have
> > to research whether mips64 is also affected.
>
> Klaus and I have investigated further. Apparently the problem does
> not manifest itself with cp -rd or similar, but a
> tar cf - | tar xf - pipeline does trigger it. It's not clear why.
I don't see why this should make a difference to SCSI. But the tar variant
involves two processes and lots of context switching between them.
Ralf