Analysis of Samba configure oops

Linux MIPS Architecture development

* Analysis of Samba configure oops
@ 2000-07-17  1:24 Keith M Wesolowski
  2000-07-17 17:05 ` Keith M Wesolowski
  0 siblings, 1 reply; 7+ messages in thread
From: Keith M Wesolowski @ 2000-07-17  1:24 UTC (permalink / raw)
  To: linux-mips

Hi all,

I've been analysing the infamous Samba configure oops and I think I've
taken it as far as I can. Hopefully someone here will know what's
wrong.

The configuration: SGI Indy 4400sc/200 (R4000SC V6.0), current CVS.

The trigger (from samba configuration, tests/shared_mmap.c):

#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>

#define DATA "conftest.mmap"

#ifndef MAP_FILE
#define MAP_FILE 0
#endif

int main()
{
	int *buf;
	int i; 
	int fd = open(DATA,O_RDWR|O_CREAT|O_TRUNC,0666);
	int count=7;

	if (fd == -1) exit(1);

	for (i=0;i<10000;i++) {
		write(fd,&i,sizeof(i));
	}

	close(fd);

	if (fork() == 0) {
		fd = open(DATA,O_RDWR);
		if (fd == -1) exit(1);

		buf = (int *)mmap(NULL, 10000*sizeof(int), 
				   (PROT_READ | PROT_WRITE), 
				   MAP_FILE | MAP_SHARED, 
				   fd, 0);

		while (count-- && buf[9124] != 55732) sleep(1);

		if (count <= 0) exit(1);

		buf[1763] = 7268;
		exit(0);
	}

	fd = open(DATA,O_RDWR);
	if (fd == -1) exit(1);

	buf = (int *)mmap(NULL, 10000*sizeof(int), 
			   (PROT_READ | PROT_WRITE), 
			   MAP_FILE | MAP_SHARED, 
			   fd, 0);

	if (buf == (int *)-1) exit(1);

	buf[9124] = 55732;

	while (count-- && buf[1763] != 7268) sleep(1);

	unlink(DATA);
		
	if (count > 0) exit(0);
	exit(1);
}

The oops:

Unable to handle kernel paging request at virtual address 0000003c, epc == 8801eb20, ra == 8803a108
$0 : 00000000 1000fc00 000003f8 00000000
$4 : 000000fb 00005eda 2acfe000 8816c8a0
$8 : 8d6d8360 ffff00ff 1000fc01 00000001
$12: 00000001 7ffffa10 00000003 00400c0c
$16: 00346e80 0d1ba79f 00000001 2acfe000
$20: 000fe000 8da1c3f8 000fd000 8d6d8360
$24: 00000014 2ac26590
$28: 8d776000 8d777e18 8d9372ac 8803a108
epc   : 8801eb20
Status: 1000fc02
Cause : 00000008
Process conftest (pid: 5154, stackpage=8d776000)
Stack: 88502c80 8d9372ac 00000001 2acf9000 000fd000 003fffff 8d9372ac 2ad07000
       00000000 2acfd000 2ac00000 00107000 00000000 00107000 00000000 003fffff
       2ac00000 00000000 8d6d8360 00000001 2acfd000 8816c8a0 0000a000 8d6d86a0
       00000000 1002e510 00000000 8803a39c 00004000 2acfd000 00000000 00000001
       88036828 880368b0 8816c8a0 8fcc9b60 000478e5 00000065 00000001 8816c8a0
       8d776000 ...
Call Trace: [<8803a39c>] [<88036828>] [<880368b0>] [<880296b0>] [<88025208>]
 [<8802c914>] [<88013ef8>] [<8802cb8c>] [<88010c28>] [<88010c28>]
Code: 8f83002c  8d040014  8ce5003c <8c62003c> 10a2007d  30840004  3c028819  8c42

>>RA;  8803a108 <filemap_sync+204/488>
>>PC;  8801eb20 <r4k_flush_cache_page_s128d16i16+78/324>   <=====
Trace; 8803a39c <filemap_unmap+10/1c>
Trace; 88036828 <exit_mmap+78/198>
Trace; 880368b0 <exit_mmap+100/198>
Trace; 880296b0 <mmput+40/7c>
Trace; 88025208 <sys_rt_sigaction+88/e8>
Code;  8801eb14 <r4k_flush_cache_page_s128d16i16+6c/324>
0000000000000000 <_PC>:
Code;  8801eb14 <r4k_flush_cache_page_s128d16i16+6c/324>
   0:   8f83002c  lw      $v1,44($gp)
Code;  8801eb18 <r4k_flush_cache_page_s128d16i16+70/324>
   4:   8d040014  lw      $a0,20($t0)
Code;  8801eb1c <r4k_flush_cache_page_s128d16i16+74/324>
   8:   8ce5003c  lw      $a1,60($a3)
Code;  8801eb20 <r4k_flush_cache_page_s128d16i16+78/324>   <=====
   c:   8c62003c  lw      $v0,60($v1)   <=====
Code;  8801eb24 <r4k_flush_cache_page_s128d16i16+7c/324>
  10:   10a2007d  beq     $a1,$v0,208 <_PC+0x208> 8801ed1c <r4k_flush_cache_page_s128d16i16+274/324>
Code;  8801eb28 <r4k_flush_cache_page_s128d16i16+80/324>
  14:   30840004  andi    $a0,$a0,0x4
Code;  8801eb2c <r4k_flush_cache_page_s128d16i16+84/324>
  18:   3c028819  lui     $v0,0x8819
Code;  8801eb30 <r4k_flush_cache_page_s128d16i16+88/324>
  1c:   8c420000  lw      $v0,0($v0)

The matching code (arch/mips/mm/r4xx0.c:1600):

        if(mm->context != current->mm->context) {

The analysis:

The fault address is 0x3c. The offset of mm in current is 0x2c. Thus
the immediate cause appears to be that current->mm is 0x10, obviously
nonsense.

Further investigation yields this call sequence: exit_mmap =>
filemap_unmap => filemap_sync => filemap_sync_pmd_range =>
filemap_sync_pte_range => filemap_sync_pte => flush_cache_page

Anybody?

-- 
Keith M Wesolowski <wesolows@foobazco.org> http://foobazco.org/~wesolows/
(( Project Foobazco Coordinator and Network Administrator )) aiieeeeeeee!
"The list of people so amazingly stupid they can't even tie their shoes?"
"Yeah, you know, /etc/passwd."


* Re: Analysis of Samba configure oops
  2000-07-17  1:24 Analysis of Samba configure oops Keith M Wesolowski
@ 2000-07-17 17:05 ` Keith M Wesolowski
  2000-07-18  3:18   ` Ralf Baechle
  0 siblings, 1 reply; 7+ messages in thread
From: Keith M Wesolowski @ 2000-07-17 17:05 UTC (permalink / raw)
  To: Keith M Wesolowski; +Cc: linux-mips

On Sun, Jul 16, 2000 at 06:24:28PM -0700, Keith M Wesolowski wrote:

Responding to my own mail, yeesh. I was obviously suffering a dumbass
attack when I wrote this.

> Code;  8801eb1c <r4k_flush_cache_page_s128d16i16+74/324>
>    8:   8ce5003c  lw      $a1,60($a3)
> Code;  8801eb20 <r4k_flush_cache_page_s128d16i16+78/324>   <=====
>    c:   8c62003c  lw      $v0,60($v1)   <=====
> 
> The fault address is 0x3c. The offset of mm in current is 0x2c. Thus
> the immediate cause appears to be that current->mm is 0x10, obviously
> nonsense.

The interesting bit is not current->mm, but current->mm->context. The
offset of context is 60, as shown above in the disassembly. 60 decimal
is 0x3c, the fault address, so it's clear that current->mm is in fact
NULL.

Hope this makes things a bit clearer.

-- 
Keith M Wesolowski			wesolows@chem.unr.edu
University of Nevada			http://www.chem.unr.edu
Chemistry Department Systems and Network Administrator


* Re: Analysis of Samba configure oops
  2000-07-17 17:05 ` Keith M Wesolowski
@ 2000-07-18  3:18   ` Ralf Baechle
  2000-07-19  4:33     ` Keith M Wesolowski
  0 siblings, 1 reply; 7+ messages in thread
From: Ralf Baechle @ 2000-07-18  3:18 UTC (permalink / raw)
  To: Keith M Wesolowski; +Cc: linux-mips

On Mon, Jul 17, 2000 at 10:05:34AM -0700, Keith M Wesolowski wrote:

> Responding to my own mail, yeesh. I was obviously suffering a dumbass
> attack when I wrote this.
> 
> > Code;  8801eb1c <r4k_flush_cache_page_s128d16i16+74/324>
> >    8:   8ce5003c  lw      $a1,60($a3)
> > Code;  8801eb20 <r4k_flush_cache_page_s128d16i16+78/324>   <=====
> >    c:   8c62003c  lw      $v0,60($v1)   <=====
> > 
> > The fault address is 0x3c. The offset of mm in current is 0x2c. Thus
> > the immediate cause appears to be that current->mm is 0x10, obviously
> > nonsense.
> 
> The interesting bit is not current->mm, but current->mm->context. The
> offset of context is 60, as shown above in the disassembly. 60 decimal
> is 0x3c, the fault address, so it's clear that current->mm is in fact
> NULL.
> 
> Hope this makes things a bit clearer.

Indeed, it does.  I've committed a patch for this bug to CVS and would like
to hear reports.

  Ralf


* Re: Analysis of Samba configure oops
  2000-07-18  3:18   ` Ralf Baechle
@ 2000-07-19  4:33     ` Keith M Wesolowski
  2000-07-19 14:10       ` Ralf Baechle
  0 siblings, 1 reply; 7+ messages in thread
From: Keith M Wesolowski @ 2000-07-19  4:33 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

On Tue, Jul 18, 2000 at 05:18:28AM +0200, Ralf Baechle wrote:

> Indeed, it does.  I've committed a patch for this bug to CVS and would like
> to hear reports.

I am pleased to report that without this fix I observe the
oft-reported problem when using two disks simultaneously on IP22:

  SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 27010000
   I/O error: dev 08:11, sector 1885720

but with this fix I no longer see this. How many more bugs will this
fix I wonder...

-- 
Keith M Wesolowski			wesolows@chem.unr.edu
University of Nevada			http://www.chem.unr.edu
Chemistry Department Systems and Network Administrator


* Re: Analysis of Samba configure oops
  2000-07-19  4:33     ` Keith M Wesolowski
@ 2000-07-19 14:10       ` Ralf Baechle
  2000-07-19 23:40         ` Keith M Wesolowski
  0 siblings, 1 reply; 7+ messages in thread
From: Ralf Baechle @ 2000-07-19 14:10 UTC (permalink / raw)
  To: Keith M Wesolowski; +Cc: linux-mips

On Tue, Jul 18, 2000 at 09:33:15PM -0700, Keith M Wesolowski wrote:

> > Indeed, it does.  I've committed a patch for this bug to CVS and would like
> > to hear reports.
> 
> I am pleased to report that without this fix I observe the
> oft-reported problem when using two disks simultaneously on IP22:
> 
>   SCSI disk error : host 0 channel 0 id 2 lun 0 return code = 27010000
>    I/O error: dev 08:11, sector 1885720
> 
> but with this fix I no longer see this. How many more bugs will this
> fix I wonder...

Funny.  It's not obvious why this happened, but I gratefully accept this
bug being fixed as well.  Now that this cure was so successful, I'll have
to research whether mips64 is also affected.

  Ralf


* Re: Analysis of Samba configure oops
  2000-07-19 14:10       ` Ralf Baechle
@ 2000-07-19 23:40         ` Keith M Wesolowski
  2000-07-20 12:19           ` Ralf Baechle
  0 siblings, 1 reply; 7+ messages in thread
From: Keith M Wesolowski @ 2000-07-19 23:40 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips

On Wed, Jul 19, 2000 at 04:10:12PM +0200, Ralf Baechle wrote:

> Funny.  It's not obvious why this happened, but I gratefully accept this
> bug being fixed as well.  Now that this cure was so successful, I'll have
> to research whether mips64 is also affected.

Klaus and I have investigated further. Apparently the problem does
not manifest itself with cp -rd or similar, but using tar cf - | tar
xf - does trigger it. It's not clear why this is.

-- 
Keith M Wesolowski			wesolows@chem.unr.edu
University of Nevada			http://www.chem.unr.edu
Chemistry Department Systems and Network Administrator


* Re: Analysis of Samba configure oops
  2000-07-19 23:40         ` Keith M Wesolowski
@ 2000-07-20 12:19           ` Ralf Baechle
  0 siblings, 0 replies; 7+ messages in thread
From: Ralf Baechle @ 2000-07-20 12:19 UTC (permalink / raw)
  To: Keith M Wesolowski; +Cc: linux-mips

On Wed, Jul 19, 2000 at 04:40:15PM -0700, Keith M Wesolowski wrote:

> > Funny.  It's not obvious why this happened, but I gratefully accept this
> > bug being fixed as well.  Now that this cure was so successful, I'll have
> > to research whether mips64 is also affected.
> 
> Klaus and I have investigated further. Apparently the problem does
> not manifest itself with cp -rd or similar, but using tar cf - | tar
> xf - does trigger it. It's not clear why this is.

I don't see why this should make a difference to SCSI.  But the tar variant
involves two processes and lots of context switching between them.

  Ralf
