[uml-devel] debugging UML cores


* [uml-devel] debugging UML cores
@ 2004-08-11 15:32 Joe Marzot
  2004-08-12  5:41 ` Jeff Dike
  0 siblings, 1 reply; 23+ messages in thread
From: Joe Marzot @ 2004-08-11 15:32 UTC (permalink / raw)
  To: user-mode-linux-devel

Hi UML developers,

I am getting a variety of cores from UML - fairly intermittent. The bad 
part is that examining these cores with GDB is utterly fruitless (for 
me). Are there some tricks beyond the normal stuff below I should be 
doing to get a better sense of what is going wrong? I see no back trace 
and no thread info. thanks for any help, Giovanni

[root@wbl6y227 plankton]# /usr/local/builds/gdb-6.2/gdb/gdb -c 
/bne/home/gmarzot/proj/celp/cores/core_sanity_crash_8_11 celp/linux.celp
GNU gdb 6.2
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db 
library "/lib/libthread_db.so.1".

Core was generated by `/vob/neptune/plankton/celp/linux.celp (CSC-0-4-0) 
[/sbin/modprobe]            '.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000 in ?? ()
(gdb) where
#0  0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
(gdb) info thr
* 1 process 12718  0x00000000 in ?? ()
warning: Couldn't restore frame in current thread, at frame 0
0x00000000 in ?? ()
(gdb)

-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


* Re: [uml-devel] debugging UML cores
  2004-08-11 15:32 [uml-devel] debugging UML cores Joe Marzot
@ 2004-08-12  5:41 ` Jeff Dike
  2004-08-12 15:21   ` Joe Marzot
  2004-08-12 15:36   ` Joe Marzot
  0 siblings, 2 replies; 23+ messages in thread
From: Jeff Dike @ 2004-08-12  5:41 UTC (permalink / raw)
  To: Joe Marzot; +Cc: user-mode-linux-devel

gmarzot@nortelnetworks.com said:
> I am getting a variety of cores from UML - fairly intermittent. The
> bad  part is that examining these cores with GDB is utterly fruitless
> (for  me). Are there some tricks beyond the normal stuff below I
> should be  doing to get a better sense of what is going wrong? I see
> no back trace  and no thread info.

That is utterly confused.  I would make sure that you are absolutely positive
that you are giving gdb the exact binary that created the core.

				Jeff




* Re: [uml-devel] debugging UML cores
  2004-08-12  5:41 ` Jeff Dike
@ 2004-08-12 15:21   ` Joe Marzot
  2004-08-12 16:56     ` Jeff Dike
  2004-08-12 15:36   ` Joe Marzot
  1 sibling, 1 reply; 23+ messages in thread
From: Joe Marzot @ 2004-08-12 15:21 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

Jeff Dike wrote:
> gmarzot@nortelnetworks.com said:
>  > I am getting a variety of cores from UML - fairly intermittent. The
>  > bad  part is that examining these cores with GDB is utterly fruitless
>  > (for  me). Are there some tricks beyond the normal stuff below I
>  > should be  doing to get a better sense of what is going wrong? I see
>  > no back trace  and no thread info.
> 
> That is utterly confused.  I would make sure that you are absolutely 
> positive
> that you are giving gdb the exact binary that created the core.
> 
>                                 Jeff
> 
> 

Hi,

I am 99% sure. Is there a way to check from the core whether it agrees with 
the binary, for example to see whether some key info matches?

I can try to make both available to you some place if that helps. 
regards, Giovanni




* Re: [uml-devel] debugging UML cores
  2004-08-12  5:41 ` Jeff Dike
  2004-08-12 15:21   ` Joe Marzot
@ 2004-08-12 15:36   ` Joe Marzot
  2004-08-12 15:47     ` Joe Marzot
  2004-08-13 15:46     ` [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores] Joe Marzot
  1 sibling, 2 replies; 23+ messages in thread
From: Joe Marzot @ 2004-08-12 15:36 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Joe Marzot, user-mode-linux-devel

Jeff Dike wrote:
> gmarzot@nortelnetworks.com said:
>  > I am getting a variety of cores from UML - fairly intermittent. The
>  > bad  part is that examining these cores with GDB is utterly fruitless
>  > (for  me). Are there some tricks beyond the normal stuff below I
>  > should be  doing to get a better sense of what is going wrong? I see
>  > no back trace  and no thread info.
> 
> That is utterly confused.  I would make sure that you are absolutely 
> positive
> that you are giving gdb the exact binary that created the core.
> 
>                                 Jeff
> 
> 
here is a better one produced under similar conditions - this time the 
core is readable (I do get the unreadable cores quite often though).

this is host RH8 + skas3 patch

guest is 2.4.2x + 2.4.24-1um

[root@wbl6y227 plankton]# /usr/local/builds/gdb-6.2/gdb/gdb -c 
~szhimin/tmp/joe/core.13456 
/view/build_neptune_dev_int144.resp3/vob/neptune/plankton/celp/linux.celp
GNU gdb 6.2
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db 
library "/lib/libthread_db.so.1".

Core was generated by `/vob/neptune/plankton/celp/linux.celp (DSC-0-0-0) 
[nameServer]                '.
Program terminated with signal 6, Aborted.
#0  0xa01643e1 in kill ()
     at 
/localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
486                     case 1: COMMON("\n\tstosb"); return s;
(gdb) where
#0  0xa01643e1 in kill ()
     at 
/localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
#1  0xa018cbdb in raise ()
     at 
/localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
#2  0xa01646cd in abort ()
     at 
/localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
#3  0xa00d01e4 in handle_trap (pid=13461, regs=0xa5f7827c) at process.c:90
#4  0xa00d0438 in userspace (regs=0xa5f7827c) at process.c:168
#5  0xa00d0bfa in fork_handler (sig=10) at process_kern.c:102
#6  <signal handler called>
#7  0xa01643e1 in kill ()
     at 
/localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
#8  0xa00d4734 in os_usr1_process (pid=13456) at process.c:95
#9  0xa00d04ce in new_thread (stack=Cannot access memory at address 0x8
) at process.c:205
Previous frame inner to this frame (corrupt stack?)
(gdb) info thr
* 1 process 13456  0xa01643e1 in kill ()
     at 
/localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
(gdb)






* Re: [uml-devel] debugging UML cores
  2004-08-12 15:36   ` Joe Marzot
@ 2004-08-12 15:47     ` Joe Marzot
  2004-08-13 15:46     ` [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores] Joe Marzot
  1 sibling, 0 replies; 23+ messages in thread
From: Joe Marzot @ 2004-08-12 15:47 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

Joe Marzot wrote:
> 
> [root@wbl6y227 plankton]# /usr/local/builds/gdb-6.2/gdb/gdb -c 
> ~szhimin/tmp/joe/core.13456 
> /view/build_neptune_dev_int144.resp3/vob/neptune/plankton/celp/linux.celp
> GNU gdb 6.2
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you 
> are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db 
> library "/lib/libthread_db.so.1".
> 
> Core was generated by `/vob/neptune/plankton/celp/linux.celp (DSC-0-0-0) 
> [nameServer]                '.
> Program terminated with signal 6, Aborted.
> #0  0xa01643e1 in kill ()
>     at 
> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486


I wonder what is producing these bogus source file names and line numbers 
though. -gsm




* Re: [uml-devel] debugging UML cores
  2004-08-12 16:56     ` Jeff Dike
@ 2004-08-12 16:16       ` Joe Marzot
  0 siblings, 0 replies; 23+ messages in thread
From: Joe Marzot @ 2004-08-12 16:16 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Joe Marzot, user-mode-linux-devel

Jeff Dike wrote:
> gmarzot@nortelnetworks.com said:
> 
>>I am 99% sure. Is there a way from the core to see if it agrees with
>>binary...to see if some key info matches. 
> 
> 
> I would be most concerned about whether the binary had been rebuilt since the
> core happened.  gdb will tell you the path of the thing that dumped core -
> presumably that matched.  The other thing would be to check the dates, to
> see that the binary is older than the core.

all checks out.

> 
> If that's all OK, you could try a different version of gdb.  We've had problems
> with some versions not producing good stacks.

that's what I thought so I got a new gdb 6.2 and no help.

do you recommend a combination gcc/gdb libc libpthread that all play 
nice together?

regards, G

> 
> 				Jeff
> 
> 






* Re: [uml-devel] debugging UML cores
  2004-08-12 15:21   ` Joe Marzot
@ 2004-08-12 16:56     ` Jeff Dike
  2004-08-12 16:16       ` Joe Marzot
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff Dike @ 2004-08-12 16:56 UTC (permalink / raw)
  To: Joe Marzot; +Cc: user-mode-linux-devel

gmarzot@nortelnetworks.com said:
> I am 99% sure. Is there a way from the core to see if it agrees with
> binary...to see if some key info matches. 

I would be most concerned about whether the binary had been rebuilt since the
core happened.  gdb will tell you the path of the thing that dumped core -
presumably that matched.  The other thing would be to check the dates, to
see that the binary is older than the core.

If that's all OK, you could try a different version of gdb.  We've had problems
with some versions not producing good stacks.

				Jeff
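
The date check described above can be scripted. The following is a minimal sketch in C, assuming a POSIX host; the helper name `binary_predates_core` is hypothetical and is not part of gdb or UML. It only compares modification times, so a passing result suggests, but does not prove, that the binary was not rebuilt after the core was written.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of gdb or UML).
 * Returns  1 if the binary's mtime is not later than the core's mtime,
 *          0 if the binary was modified after the core was written,
 *         -1 if either file cannot be stat()ed.
 */
int binary_predates_core(const char *binary_path, const char *core_path)
{
    struct stat bin_st, core_st;

    assert(binary_path != NULL && core_path != NULL);

    if (stat(binary_path, &bin_st) != 0 || stat(core_path, &core_st) != 0)
        return -1;

    return bin_st.st_mtime <= core_st.st_mtime ? 1 : 0;
}
```

Called with the path of the binary handed to gdb and the path of the core file, a return value of 0 would mean the binary is newer than the core, in which case the backtrace should not be trusted.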




* [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores]
  2004-08-12 15:36   ` Joe Marzot
  2004-08-12 15:47     ` Joe Marzot
@ 2004-08-13 15:46     ` Joe Marzot
  2004-08-13 18:01       ` Joe Marzot
                         ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Joe Marzot @ 2004-08-13 15:46 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

Joe Marzot wrote:

> here is a better one produced under similar conditions - this time the 
> core is readable (I do get the unreadable cores quite often though).
> 
> this is host RH8 + skas3 patch
> 
> guest is 2.4.2x + 2.4.24-1um

so looking deeper in this core in handle_trap I see the call to waitpid 
fails with a status of 383 and an err of 13456

(gdb) p err
$6 = 13456 (the pid of the child who exited)
(gdb) p status
$7 = 383

WSTOPSIG(err) = SIGHUP

does this give any clues...any ideas of what else to look at?

thanks, GSM
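
The status value can be decoded with the standard wait macros. The sketch below assumes a Linux host, where 383 is 0x17f: the low byte 0x7f marks a stopped child and the next byte holds the stop signal. The helper names `stop_signal_of` and `print_wait_status` are made up for illustration and do not come from the UML source.

```c
#include <assert.h>   /* for callers that want to assert on the result */
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: returns the signal that stopped the child,
 * or -1 if the status word does not describe a stopped child. */
int stop_signal_of(int status)
{
    return WIFSTOPPED(status) ? WSTOPSIG(status) : -1;
}

/* Prints a one-line summary of a waitpid() status word. */
void print_wait_status(int status)
{
    if (WIFSTOPPED(status))
        printf("stopped by signal %d\n", WSTOPSIG(status));
    else if (WIFSIGNALED(status))
        printf("killed by signal %d\n", WTERMSIG(status));
    else if (WIFEXITED(status))
        printf("exited with code %d\n", WEXITSTATUS(status));
    else
        printf("unrecognized status %d\n", status);
}
```

Note that the macros are applied to the status word filled in by waitpid, not to its return value, which is the pid of the child.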

> 
> [root@wbl6y227 plankton]# /usr/local/builds/gdb-6.2/gdb/gdb -c 
> ~szhimin/tmp/joe/core.13456 
> /view/build_neptune_dev_int144.resp3/vob/neptune/plankton/celp/linux.celp
> GNU gdb 6.2
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you 
> are
> welcome to change it and/or distribute copies of it under certain 
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db 
> library "/lib/libthread_db.so.1".
> 
> Core was generated by `/vob/neptune/plankton/celp/linux.celp (DSC-0-0-0) 
> [nameServer]                '.
> Program terminated with signal 6, Aborted.
> #0  0xa01643e1 in kill ()
>     at 
> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> 486                     case 1: COMMON("\n\tstosb"); return s;
> (gdb) where
> #0  0xa01643e1 in kill ()
>     at 
> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> #1  0xa018cbdb in raise ()
>     at 
> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> #2  0xa01646cd in abort ()
>     at 
> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> #3  0xa00d01e4 in handle_trap (pid=13461, regs=0xa5f7827c) at process.c:90
> #4  0xa00d0438 in userspace (regs=0xa5f7827c) at process.c:168
> #5  0xa00d0bfa in fork_handler (sig=10) at process_kern.c:102
> #6  <signal handler called>
> #7  0xa01643e1 in kill ()
>     at 
> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> #8  0xa00d4734 in os_usr1_process (pid=13456) at process.c:95
> #9  0xa00d04ce in new_thread (stack=Cannot access memory at address 0x8
> ) at process.c:205
> Previous frame inner to this frame (corrupt stack?)
> (gdb) info thr
> * 1 process 13456  0xa01643e1 in kill ()
>     at 
> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> (gdb)
> 
> 
> 






* Re: [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores]
  2004-08-13 15:46     ` [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores] Joe Marzot
@ 2004-08-13 18:01       ` Joe Marzot
  2004-08-13 21:47       ` Jeff Dike
  2004-09-13 15:39       ` [uml-devel] handle_trap - failed to wait at end of syscall Joe Marzot
  2 siblings, 0 replies; 23+ messages in thread
From: Joe Marzot @ 2004-08-13 18:01 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

Marzot, Joe [BL60:NP72:EXCH] wrote:
> (gdb) p status
> $7 = 383
> 
> WSTOPSIG(err) = SIGHUP
> 

that should be

WSTOPSIG(status) = SIGHUP

of course. -g




* Re: handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores]
  2004-08-13 15:46     ` [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores] Joe Marzot
  2004-08-13 18:01       ` Joe Marzot
@ 2004-08-13 21:47       ` Jeff Dike
  2004-08-16 17:47         ` [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml- devel] " Joe Marzot
  2004-08-20 11:46         ` handle_trap - failed to wait at end of syscall [was Re: [uml-devel] " BlaisorBlade
  2004-09-13 15:39       ` [uml-devel] handle_trap - failed to wait at end of syscall Joe Marzot
  2 siblings, 2 replies; 23+ messages in thread
From: Jeff Dike @ 2004-08-13 21:47 UTC (permalink / raw)
  To: Joe Marzot; +Cc: user-mode-linux-devel

gmarzot@nortelnetworks.com said:
> WSTOPSIG(err) = SIGHUP
> does this give any clues...any ideas of what else to look at?

Do you have any idea how you're making this happen?  The userspace process is
getting a SIGHUP in the middle of having a system call nullified.  This is OK
since a SIGHUP can happen any time if you log out on it or something, but
I'd like to know exactly what's going on so I can decide what the right reaction
to it is.

Simplistically, we could just handle it there and ignore it, since UML probably
got the SIGHUP as well, and will deal with it then.

				Jeff





* [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml- devel] debugging UML cores]
  2004-08-13 21:47       ` Jeff Dike
@ 2004-08-16 17:47         ` Joe Marzot
  2004-08-16 19:25           ` Joe Marzot
  2004-08-20 11:46         ` handle_trap - failed to wait at end of syscall [was Re: [uml-devel] " BlaisorBlade
  1 sibling, 1 reply; 23+ messages in thread
From: Joe Marzot @ 2004-08-16 17:47 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Joe Marzot, user-mode-linux-devel

Jeff Dike wrote:
> gmarzot@nortelnetworks.com said:
>  > WSTOPSIG(err) = SIGHUP
>  > does this give any clues...any ideas of what else to look at?
> 
> Do you have any idea how you're making this happen? 

unfortunately not...the UML instance is being used as a test harness for 
a complex set of interacting processes. all sorts of things are going on 
prior to the crash.

> The userspace 
> process is
> getting a SIGHUP in the middle of having a system call nullified.  

what does it mean to nullify a system call?

I am also unsure whether this is a simulated signal inside the UML 
userspace app or a host signal being delivered to the host-resident UML 
userspace thread.

> This is OK
> since a SIGHUP can happen any time if you log out on it or something, but
> I'd like to know exactly what's going on so I can decide what the right 
> reaction
> to it is.

as it is a test harness there are lots of scripts being invoked - 
shells are being spawned and exited. There may be expect scripts logging 
into the UML and logging out if that's what you mean.

> 
> Simplistically, we could just handle it there and ignore it, since UML 
> probably
> got the SIGHUP as well, and will deal with it then.

something like this?

if((err < 0) || !WIFSTOPPED(status) || ((WSTOPSIG(status) != SIGTRAP) && 
(WSTOPSIG(status) != SIGHUP))) {
    ....
} else {
    handle_syscall(regs);
}

regards, GSM
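
A note on the shape of that test: a stop signal can never equal both SIGTRAP and SIGHUP, so `(sig != SIGTRAP) || (sig != SIGHUP)` is true for every signal and would always take the error branch. Expressing the check as a positive predicate avoids the trap. The sketch below is only an illustration; `stop_is_tolerated` is a hypothetical name, not a function in the UML source, and whether SIGHUP should really be treated like SIGTRAP at this point is the open question in this thread.

```c
#include <assert.h>   /* for callers that want to assert on the result */
#include <signal.h>
#include <sys/wait.h>

/*
 * Hypothetical predicate (name and placement are assumptions).
 * err    : return value of waitpid()
 * status : status word filled in by waitpid()
 * Returns 1 when the child is stopped by SIGTRAP or SIGHUP, 0 otherwise.
 */
int stop_is_tolerated(int err, int status)
{
    if (err < 0 || !WIFSTOPPED(status))
        return 0;

    return WSTOPSIG(status) == SIGTRAP || WSTOPSIG(status) == SIGHUP;
}
```

With a predicate like this, the error branch would be guarded by `!stop_is_tolerated(err, status)` and the `handle_syscall(regs)` call would stay in the else branch.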

> 
>                                 Jeff
> 
> 
> 






* [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml- devel] debugging UML cores]
  2004-08-16 17:47         ` [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml- devel] " Joe Marzot
@ 2004-08-16 19:25           ` Joe Marzot
  2004-08-16 19:53             ` D. Bahi
  0 siblings, 1 reply; 23+ messages in thread
From: Joe Marzot @ 2004-08-16 19:25 UTC (permalink / raw)
  Cc: Jeff Dike, user-mode-linux-devel

Joe Marzot wrote:
> Jeff Dike wrote:
> 
>> gmarzot@nortelnetworks.com said:
>>  > WSTOPSIG(err) = SIGHUP
>>  > does this give any clues...any ideas of what else to look at?
>>
>> Do you have any idea how you're making this happen? 

here's another twist - looks like a different crash but stimulated by 
the same tests being performed inside UML. This back trace goes on down 
to zero just like this ->  sig 11, change_sig 10, sig 11...

looks like a loadable kernel module might have corrupted kernel mem...or 
does this look familiar to other UML'ers?

#2156 <signal handler called>
#2157 0xa0151ac0 in sigismember ()
     at 
/localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
#2158 0xa00c09eb in change_sig (signal=10, on=1) at signal_user.c:57
#2159 0xa00c4a01 in sig_handler_common_skas (sig=11, sc_ptr=0xa00cc100)
     at trap_user.c:31
#2160 0xa00c2746 in sig_handler (sig=11, sc=
       {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 
43, __dsh = 0, edi = 10, esi = 2685191148, ebp = 2685191428, esp = 
2685191128, ebx = 2685191276, edx = 2685191276, ecx = 2685191276, eax = 
354011904, trapno = 14, err = 6, eip = 2685737664, cs = 35, __csh = 0, 
eflags = 66050, esp_at_signal = 2685191128, ss = 43, __ssh = 0, fpstate 
= 0x0, oldmask = 134217792, cr2 = 354011904})
     at trap_user.c:102
#2161 <signal handler called>
#2162 0xa0151ac0 in sigismember ()
     at 
/localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
#2163 0xa00c09eb in change_sig (signal=10, on=1) at signal_user.c:57
---Type <return> to continue, or q <return> to quit---
#2164 0xa00c4a01 in sig_handler_common_skas (sig=0, sc_ptr=0xa00cc560)
     at trap_user.c:31
#2165 0xa00c2746 in sig_handler (sig=Cannot access memory at address 0x16
) at trap_user.c:102
Previous frame inner to this frame (corrupt stack?)

anyone have any tips on interesting fields to look at?

regards, Giovanni
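
On the question of which fields to look at: in the sigcontext printed in frame #2160, `trapno = 14` is the x86 page-fault vector, `err` is the page-fault error code, and `cr2` is the faulting address. `eip = 2685737664` is 0xa0151ac0 in hex, which matches the `sigismember ()` frame address. The sketch below decodes the low three bits of the error code; the helper names are made up for illustration, and the reading of the user-mode bit assumes the usual arrangement in which the UML kernel itself runs as a host user-space process.

```c
#include <assert.h>   /* for callers that want to assert on the result */
#include <stdio.h>

/* Low bits of the x86 page-fault error code. */
#define PF_ERR_PRESENT 0x1UL /* set: protection violation; clear: page not present */
#define PF_ERR_WRITE   0x2UL /* set: write access; clear: read access */
#define PF_ERR_USER    0x4UL /* set: fault raised while the CPU was in user mode */

#define TRAP_PAGE_FAULT 14UL /* x86 exception vector for a page fault */

/* Hypothetical helpers, one per field of interest. */
int is_page_fault(unsigned long trapno)      { return trapno == TRAP_PAGE_FAULT; }
int fault_page_present(unsigned long err)    { return (err & PF_ERR_PRESENT) != 0; }
int fault_was_write(unsigned long err)       { return (err & PF_ERR_WRITE) != 0; }
int fault_in_user_mode(unsigned long err)    { return (err & PF_ERR_USER) != 0; }

/* Prints a short description of the trapno/err/cr2 triple from a sigcontext. */
void describe_fault(unsigned long trapno, unsigned long err, unsigned long cr2)
{
    if (!is_page_fault(trapno)) {
        printf("trap %lu is not a page fault\n", trapno);
        return;
    }
    printf("%s at address 0x%lx, page %s, raised in %s mode\n",
           fault_was_write(err) ? "write" : "read",
           cr2,
           fault_page_present(err) ? "present (protection fault)" : "not present",
           fault_in_user_mode(err) ? "user" : "kernel");
}
```

For the values in the trace, `describe_fault(14, 6, 354011904)` would describe a write to a page that is not mapped. The same value appears in `eax`, which may be worth checking against the instruction at `eip`.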

> 
> 
> unfortunately not...the UML instance is being used as a test harness for 
> a complex set of interacting processes. all sorts of things are going on 
> prior to the crash.
> 
>> The userspace process is
>> getting a SIGHUP in the middle of having a system call nullified.  
> 
> 
> what does it mean to nullify a system call?
> 
> I am also unsure whether this is a simulated signal inside the UML 
> userspace app or a host signal being delivered to the host-resident UML 
> userspace thread.
> 
>> This is OK
>> since a SIGHUP can happen any time if you log out on it or something, but
>> I'd like to know exactly what's going on so I can decide what the 
>> right reaction
>> to it is.
> 
> 
> as it is a test harness there are lots of scripts being invoked - 
> shells are being spawned and exited. There may be expect scripts logging 
> into the UML and logging out if that's what you mean.
> 
>>
>> Simplistically, we could just handle it there and ignore it, since UML 
>> probably
>> got the SIGHUP as well, and will deal with it then.
> 
> 
> something like this?
> 
> if((err < 0) || !WIFSTOPPED(status) || ((WSTOPSIG(status) != SIGTRAP) && 
> (WSTOPSIG(status) != SIGHUP))) {
>    ....
> } else {
>    handle_syscall(regs);
> }
> 
> regards, GSM
> 
>>
>>                                 Jeff
>>
>>
>>
> 
> 
> 






* Re: [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml- devel] debugging UML cores]
  2004-08-16 19:25           ` Joe Marzot
@ 2004-08-16 19:53             ` D. Bahi
  2004-08-17  5:26               ` Jeff Dike
  0 siblings, 1 reply; 23+ messages in thread
From: D. Bahi @ 2004-08-16 19:53 UTC (permalink / raw)
  To: Joe Marzot; +Cc: Jeff Dike, user-mode-linux-devel


[-- Attachment #1.1: Type: text/plain, Size: 3957 bytes --]

does this look familiar? hmm, here's 2.4.26-3um, backtrace attached.

we do have kernel modules loaded... and lots of communication with
a modified uml_switch going on... otherwise this can happen in a
relatively idle UML after some random period of time.

i have not seen this in a vanilla 2.4.26-3 with a generic redhat 9 file
system just doing 'ls -R' over and over for exercise -- btw: it has no
modules loaded... and none in the filesystem to load for a quick test.

i'm installing Expect.pm so i can play with the test scripts and try
to isolate this and the hostfs troubles... fun.

db

Joe Marzot wrote:

> Joe Marzot wrote:
> 
>> Jeff Dike wrote:
>>
>>> gmarzot@nortelnetworks.com said:
>>>  > WSTOPSIG(err) = SIGHUP
>>>  > does this give any clues...any ideas of what else to look at?
>>>
>>> Do you have any idea how you're making this happen? 
> 
> 
> here's another twist - looks like a different crash but stimulated by 
> the same tests being performed inside UML. This back trace goes on down 
> to zero just like this ->  sig 11, change_sig 10, sig 11...
> 
> looks like a loadable kernel module might have corrupted kernel mem...or 
> does this look familiar to other UML'ers?
> 
> #2156 <signal handler called>
> #2157 0xa0151ac0 in sigismember ()
>     at 
> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> #2158 0xa00c09eb in change_sig (signal=10, on=1) at signal_user.c:57
> #2159 0xa00c4a01 in sig_handler_common_skas (sig=11, sc_ptr=0xa00cc100)
>     at trap_user.c:31
> #2160 0xa00c2746 in sig_handler (sig=11, sc=
>       {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 
> 43, __dsh = 0, edi = 10, esi = 2685191148, ebp = 2685191428, esp = 
> 2685191128, ebx = 2685191276, edx = 2685191276, ecx = 2685191276, eax = 
> 354011904, trapno = 14, err = 6, eip = 2685737664, cs = 35, __csh = 0, 
> eflags = 66050, esp_at_signal = 2685191128, ss = 43, __ssh = 0, fpstate 
> = 0x0, oldmask = 134217792, cr2 = 354011904})
>     at trap_user.c:102
> #2161 <signal handler called>
> #2162 0xa0151ac0 in sigismember ()
>     at 
> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> #2163 0xa00c09eb in change_sig (signal=10, on=1) at signal_user.c:57
> ---Type <return> to continue, or q <return> to quit---
> #2164 0xa00c4a01 in sig_handler_common_skas (sig=0, sc_ptr=0xa00cc560)
>     at trap_user.c:31
> #2165 0xa00c2746 in sig_handler (sig=Cannot access memory at address 0x16
> ) at trap_user.c:102
> Previous frame inner to this frame (corrupt stack?)
> 
> anyone have any tips on interesting fields to look at?
> 
> regards, Giovanni
> 
>>
>>
>> unfortunately not...the UML instance is being used as a test harness 
>> for a complex set of interacting processes. all sorts of things are 
>> going on prior to the crash.
>>
>>> The userspace process is
>>> getting a SIGHUP in the middle of having a system call nullified.  
>>
>>
>>
>> what does it mean to nullify a system call?
>>
>> I am also unsure whether this is a simulated signal inside the UML 
>> userspace app or a host signal being delivered to the host-resident 
>> UML userspace thread.
>>
>>> This is OK
>>> since a SIGHUP can happen any time if you log out on it or something, 
>>> but
>>> I'd like to know exactly what's going on so I can decide what the 
>>> right reaction
>>> to it is.
>>
>>
>>
>> as it is a test harness there are lots of scripts being invoked - 
>> shells are being spawned and exited. There may be expect scripts 
>> logging into the UML and logging out if that's what you mean.
>>
>>>
>>> Simplistically, we could just handle it there and ignore it, since 
>>> UML probably
>>> got the SIGHUP as well, and will deal with it then.
>>
>>
>>
>> something like this?
>>
>> if((err < 0) || !WIFSTOPPED(status) || ((WSTOPSIG(status) != SIGTRAP) 
>> && (WSTOPSIG(status) != SIGHUP))) {
>>    ....
>> } else {
>>    handle_syscall(regs);
>> }
>>
>> regards, GSM
>>
>>>
>>>                                 Jeff


[-- Attachment #1.2: randomkernelpanic.txt --]
[-- Type: text/plain, Size: 3210 bytes --]

#35 0x080dd263 in linux_main (argc=12, argv=0x20000000) at um_arch.c:393
#34 0x080debae in start_uml_skas () at process_kern.c:193
#33 0x080de4e5 in start_idle_thread (stack=0x81e8000, switch_buf_ptr=0x81e8578, fork_buf_ptr=0x0) at process.c:303
#32 0x0815a325 in siglongjmp () at proc_fs.h:154
#31 0x0815a691 in kill () at proc_fs.h:154
#30 <signal handler called>
#29 0x080de886 in new_thread_handler (sig=10) at process_kern.c:72
#28 0x080d90ed in run_kernel_thread (fn=0x80deb34 <start_kernel_proc>, arg=0x0, jmp_ptr=0x81e8000) at process.c:231
#27 0x080deb5b in start_kernel_proc (unused=0x0) at process_kern.c:179
#26 0x0804950a in start_kernel () at init/main.c:440
#25 0x0805144e in rest_init () at init/main.c:346
#24 0x080d94f1 in cpu_idle () at process_kern.c:209
#23 0x080dc27a in idle_sleep (secs=-4) at time.c:132
#22 0x0816787a in nanosleep () at proc_fs.h:154
#21 <signal handler called>
#20 0x080dce1e in sig_handler (sig=29, sc={gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43, __dsh = 0, edi = 136216576, esi = 136216576, ebp = 136248204, esp = 136248176, ebx = 136248196, edx = 136216576, ecx = 0, eax = 4294967292, trapno = 14, err = 6, eip = 135690362, cs = 35, __csh = 0, eflags = 582, esp_at_signal = 136248176, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 0, cr2 = 681033728}) at trap_user.c:109
#19 0x080df1f5 in sig_handler_common_skas (sig=29, sc_ptr=0xe8) at trap_user.c:35
#18 0x080d72bb in sigio_handler (sig=29, regs=0x81e8270) at irq_user.c:73
#17 0x080d6c57 in do_IRQ (irq=5, regs=0x81e8270) at irq.c:336
#16 0x0805ae62 in do_softirq () at softirq.c:90
#15 0x08109b50 in net_rx_action (h=0x8203590) at dev.c:1626
#14 0x08109a35 in process_backlog (backlog_dev=0x82038e8, budget=0x81ef7ac) at dev.c:1563
#13 0x08109915 in netif_receive_skb (skb=0x25ba6d20) at dev.c:1530
#12 0x08136763 in arp_process (skb=0x25ba6d20) at arp.c:946
#11 0x0810db92 in neigh_update (neigh=0x260535a0, lladdr=0x20cc7858 "\002", new=2 '\002', override=1, arp=1) at neighbour.c:895
#10 0x0810ef94 in neigh_app_notify (n=0x260535a0) at neighbour.c:1477
#9  0x0810eb11 in neigh_fill_info (skb=0x83cca80, n=0x260535a0, pid=1, seq=1, event=1) at neighbour.c:1341
#8  <signal handler called>
#7  0x080dce1e in sig_handler (sig=11, sc={gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43, __dsh = 0, edi = 525299216, esi = 138201728, ebp = 136246892, esp = 136246836, ebx = 525299200, edx = 637875616, ecx = 136246660, eax = 1, trapno = 14, err = 4, eip = 135326481, cs = 35, __csh = 0, eflags = 66050, esp_at_signal = 136246836, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 134217728, cr2 = 61}) at trap_user.c:109
#6  0x080df1f5 in sig_handler_common_skas (sig=11, sc_ptr=0x58) at trap_user.c:35
#5  0x080dcdfd in segv_handler (sig=11, regs=0x81e8270) at trap_user.c:74
#4  0x080dcab9 in segv (address=61, ip=0, is_write=0, is_user=0, sc=0x81e8270) at trap_kern.c:149
#3  0x08056215 in panic (fmt=0x81c8b60 "Kernel mode fault at addr 0x%lx, ip 0x%lx") at panic.c:77
#2  0x080612a6 in notifier_call_chain (n=0xf4240, val=0, v=0x820d1c0) at sys.c:148
#1  0x080dd3a9 in panic_exit (self=0x81f6c34, unused1=0, unused2=0x820d1c0) at um_arch.c:403
#0  stop () at user_util.c:52

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 187 bytes --]


* Re: [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml- devel] debugging UML cores]
  2004-08-16 19:53             ` D. Bahi
@ 2004-08-17  5:26               ` Jeff Dike
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff Dike @ 2004-08-17  5:26 UTC (permalink / raw)
  To: D. Bahi; +Cc: Joe Marzot, user-mode-linux-devel

dbahi@enterasys.com said:
> does this look familiar? Hmm, here's 2.4.26-3um, backtrace attached. 

Not even close to the same bug.  It's segfaulting at neighbour.c, line 1341,
which is this:

  1340          ci.ndm_used = now - n->used;
  1341          ci.ndm_confirmed = now - n->confirmed;
  1342          ci.ndm_updated = now - n->updated;

This is mystifying because whatever address that line 1341 could have faulted 
on, line 1340 should have faulted.  The fault address is 61 (== 0x3d), which
I can't see in that code either.  If n were 0, then you'd get a fault on some
low address, but
	n = 0x260535a0 (and I assume you're using the new load-low option)
	n->confirmed is a 4-byte aligned field, and would not fault on an
odd address.

So, I think I don't totally trust gdb's line number reporting in this case.
What I would do is disassemble neigh_fill_info, and see what line the faulting
instruction really belongs to.
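
The address argument above can be sketched in a few lines of C. This is only an illustration: `struct fake_neigh` and its layout are made up for the example and are not the real `struct neighbour`. The helper names are hypothetical too. The point is that any access through n = 0x260535a0 touches an address at or just above n, so a fault at 0x3d cannot come from reading a field of n.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; the real struct neighbour has a different layout. */
struct fake_neigh {
    void *next;
    unsigned long used;
    unsigned long confirmed;
    unsigned long updated;
};

/* Address touched when reading the field at field_offset of the object at base. */
static uintptr_t field_address(uintptr_t base, size_t field_offset)
{
    return base + field_offset;
}

/* Nonzero if fault_addr lies inside an object of obj_size bytes starting at base. */
static int fault_within_object(uintptr_t fault_addr, uintptr_t base, size_t obj_size)
{
    return fault_addr >= base && fault_addr - base < obj_size;
}

/* Nonzero if addr is a multiple of align (align must be nonzero). */
static int is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0);
    return addr % align == 0;
}
```

With these helpers, 0x3d is neither inside the object at 0x260535a0 nor 4-byte aligned, which is why the reported line looks suspicious.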

				Jeff




* Re: handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores]
  2004-08-13 21:47       ` Jeff Dike
  2004-08-16 17:47         ` [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml- devel] " Joe Marzot
@ 2004-08-20 11:46         ` BlaisorBlade
  1 sibling, 0 replies; 23+ messages in thread
From: BlaisorBlade @ 2004-08-20 11:46 UTC (permalink / raw)
  To: user-mode-linux-devel; +Cc: Jeff Dike, Joe Marzot

On Friday 13 August 2004 23:47, Jeff Dike wrote:
> gmarzot@nortelnetworks.com said:
> > WSTOPSIG(status) = SIGHUP
> > does this give any clues...any ideas of what else to look at?
>
> Do you have any idea how you're making this happen?  The userspace process
> is getting a SIGHUP in the middle of having a system call nullified.  This
> is OK since a SIGHUP can happen any time if you log out on it or something,
> but I'd like to know exactly what's going on so I can decide what the right
> reaction to it is.
I'm getting a similar problem in another situation. With my current tree (any 
2.6.7-bb should also do, but you can test it on -bb4 to be sure), on a 2.6.7 
host with host-skas3-2.6.7-v2.patch (note the -v2; it contains SYSEMU), if I 
do "echo 0 > /proc/sysemu" I get the same failure, but with 2943 as status 
(i.e. WSTOPSIG == SIGSEGV) and EINTR as errno. I never got this on a 2.4 host, 
but I'll retest. I think the EINTR is left over from a previous loop 
iteration, since I've applied the catch-EINTR patch: probably it did the 
syscall, it was interrupted, and it did the syscall again, which returned 
SIGSEGV as the stop status. This is definitely reproducible with 
"echo 0 > /proc/sysemu". When booting with nosysemu it works, and I can even 
re-enable it with "echo 1 > /proc/sysemu"; but if I try disabling it again, I 
get the same problem.
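
For anyone decoding these raw status values by hand, here is a minimal sketch using the standard wait macros. The helper name `stop_signal` is made up for the example, and the numeric encoding assumed here is the Linux one. On Linux, 2943 (0xB7F) decodes to a stop with SIGSEGV, and the 383 (0x17F) reported earlier in this thread decodes to a stop with SIGHUP.

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

/* Return the stop signal encoded in a waitpid() status, or -1 if the
 * status does not describe a stopped child. */
static int stop_signal(int status)
{
    if (!WIFSTOPPED(status))
        return -1;
    return WSTOPSIG(status);
}
```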
-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729



* Re: [uml-devel] handle_trap - failed to wait at end of syscall
  2004-08-13 15:46     ` [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores] Joe Marzot
  2004-08-13 18:01       ` Joe Marzot
  2004-08-13 21:47       ` Jeff Dike
@ 2004-09-13 15:39       ` Joe Marzot
  2004-09-13 19:39         ` BlaisorBlade
  2 siblings, 1 reply; 23+ messages in thread
From: Joe Marzot @ 2004-09-13 15:39 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

Remember this one? The latest take on this is that because we launch 
UMLs from a perl script (using fork/exec), when the perl script exits a 
SIGHUP is sent to the UML process, which sometimes interrupts a 
waitpid(). If that interruption occurs during the nullification of a 
syscall (now that I know what that means :) then you get a kernel panic 
like the one below.

I made a small fix that seems to be working for me and looks like what's 
going on in CATCH_EINTR

do {
   CATCH_EINTR(err = waitpid(pid, &status, WUNTRACED));
} while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGHUP));

can't do this globally in CATCH_EINTR since some waitpids don't check 
status...maybe they should...maybe there is a more correct way to do 
this altogether...

thoughts?

regards, Giovanni

Marzot, Joe [BL60:NP72:EXCH] wrote:
> Joe Marzot wrote:
> 
>> here is a better one produced under similar conditions - this time the 
>> core is readable (I do get the unreadable cores quite often though).
>>
>> this is host RH8 + skas3 patch
>>
>> guest is 2.4.2x + 2.4.24-1um
> 
> 
> so looking deeper in this core in handle_trap I see the call to waitpid 
> fails with a status of 383 and an err of 13456
> 
> (gdb) p err
> $6 = 13456 (the pid of the child who exited)
> (gdb) p status
> $7 = 383
> 
> WSTOPSIG(status) = SIGHUP
> 
> does this give any clues...any ideas of what else to look at?
> 
> thanks, GSM
> 
>>
>> [root@wbl6y227 plankton]# /usr/local/builds/gdb-6.2/gdb/gdb -c 
>> ~szhimin/tmp/joe/core.13456 
>> /view/build_neptune_dev_int144.resp3/vob/neptune/plankton/celp/linux.celp
>> GNU gdb 6.2
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and 
>> you are
>> welcome to change it and/or distribute copies of it under certain 
>> conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB.  Type "show warranty" for 
>> details.
>> This GDB was configured as "i686-pc-linux-gnu"...Using host 
>> libthread_db library "/lib/libthread_db.so.1".
>>
>> Core was generated by `/vob/neptune/plankton/celp/linux.celp 
>> (DSC-0-0-0) [nameServer]                '.
>> Program terminated with signal 6, Aborted.
>> #0  0xa01643e1 in kill ()
>>     at 
>> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> 486                     case 1: COMMON("\n\tstosb"); return s;
>> (gdb) where
>> #0  0xa01643e1 in kill ()
>>     at 
>> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> #1  0xa018cbdb in raise ()
>>     at 
>> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> #2  0xa01646cd in abort ()
>>     at 
>> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> #3  0xa00d01e4 in handle_trap (pid=13461, regs=0xa5f7827c) at 
>> process.c:90
>> #4  0xa00d0438 in userspace (regs=0xa5f7827c) at process.c:168
>> #5  0xa00d0bfa in fork_handler (sig=10) at process_kern.c:102
>> #6  <signal handler called>
>> #7  0xa01643e1 in kill ()
>>     at 
>> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> #8  0xa00d4734 in os_usr1_process (pid=13456) at process.c:95
>> #9  0xa00d04ce in new_thread (stack=Cannot access memory at address 0x8
>> ) at process.c:205
>> Previous frame inner to this frame (corrupt stack?)
>> (gdb) info thr
>> * 1 process 13456  0xa01643e1 in kill ()
>>     at 
>> /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> (gdb)
>>
>>
>>
> 
> 
> 
> 
> 
> 






* Re: [uml-devel] handle_trap - failed to wait at end of syscall
  2004-09-13 15:39       ` [uml-devel] handle_trap - failed to wait at end of syscall Joe Marzot
@ 2004-09-13 19:39         ` BlaisorBlade
  2004-09-13 22:14           ` Jeff Dike
  0 siblings, 1 reply; 23+ messages in thread
From: BlaisorBlade @ 2004-09-13 19:39 UTC (permalink / raw)
  To: user-mode-linux-devel; +Cc: Joe Marzot, Jeff Dike

On Monday 13 September 2004 17:39, Joe Marzot wrote:
> Remember this one?...the latest take on this is that because we launch
> UMLs from a perl script (using fork/exec) when the perl script exits a
> SIGHUP is transmitted to the UML proc which sometimes interrupts a
> waitpid()...if that interruption occurs during the nullification of a
> syscall (now that I know what that means:) then you get a kernel panic
> like below.

> I made a small fix that seems to be working for me and looks like what's
> going on in CATCH_EINTR

> can't do this globally in CATCH_EINTR since some waitpids don't check
> status...maybe they should...maybe there is a more correct way to do
> this altogether...
I'm going to merge something like this. Also, sorry: I should have done this 
about a month ago, but I kept forgetting while working on other stuff.

However, it is neither possible nor desirable to do this in CATCH_EINTR: 
retrying when errno == EINTR is a general rule valid in any Unix program, 
while this is very specific to this call.
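
To illustrate the general rule, here is a minimal sketch of an EINTR retry helper. `RETRY_ON_EINTR`, `flaky_call` and `call_with_retry` are names invented for this example; it is written in the spirit of CATCH_EINTR but is not claimed to be UML's actual definition. The stand-in call fails with EINTR a given number of times before succeeding, so the retry logic can be exercised without real signals.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical retry helper: evaluate expr again for as long as it
 * fails (returns < 0) with errno == EINTR. */
#define RETRY_ON_EINTR(expr) \
    while ((errno = 0, (expr) < 0) && errno == EINTR)

/* Stand-in for an interruptible call: fails with EINTR until *budget
 * reaches zero, then succeeds and returns 42. */
static int flaky_call(int *budget)
{
    if (*budget > 0) {
        (*budget)--;
        errno = EINTR;
        return -1;
    }
    return 42;
}

/* Run flaky_call under the retry helper and return its final result. */
static int call_with_retry(int interruptions)
{
    int ret;

    assert(interruptions >= 0);
    RETRY_ON_EINTR(ret = flaky_call(&interruptions));
    return ret;
}
```

Checking which signal stopped the child is a separate decision made on the returned status, which is why it does not belong inside a generic helper like this.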

> do {
>    CATCH_EINTR(err = waitpid(pid, &status, WUNTRACED));
> } while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGHUP));

I'll turn that into WSTOPSIG(status) != SIGTRAP. I'm getting the same problem 
with SIGSEGV instead (IIRC).

However, maybe that must be != SIGTRAP and != <other signal>. I don't think 
so, but I must check to be sure. Jeff, what do you think?

Also, we don't make a distinction between a real SIGTRAP and a syscall stop.

Jeff, would you agree to using PTRACE_O_TRACESYSGOOD?
See arch/i386/kernel/ptrace.c:do_syscall_trace for an explanation of this 
functionality.
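
As a rough sketch of what that option buys: with the ptrace option documented as PTRACE_O_TRACESYSGOOD set on the tracee, a syscall stop is reported with stop signal SIGTRAP | 0x80 rather than plain SIGTRAP, so the tracer can tell the cases apart from the wait status alone. The enum and the function name `classify_stop` below are invented for the example.

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

enum stop_kind { NOT_STOPPED, SYSCALL_STOP, REAL_SIGTRAP, OTHER_SIGNAL };

/* Classify a waitpid() status from a tracee, assuming the tracer has
 * enabled PTRACE_O_TRACESYSGOOD so syscall stops carry bit 0x80. */
static enum stop_kind classify_stop(int status)
{
    int sig;

    if (!WIFSTOPPED(status))
        return NOT_STOPPED;
    sig = WSTOPSIG(status);
    if (sig == (SIGTRAP | 0x80))
        return SYSCALL_STOP;   /* entry to or exit from a system call */
    if (sig == SIGTRAP)
        return REAL_SIGTRAP;   /* e.g. breakpoint or single-step */
    return OTHER_SIGNAL;       /* e.g. the SIGHUP or SIGSEGV seen here */
}
```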

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729



* Re: [uml-devel] handle_trap - failed to wait at end of syscall
  2004-09-13 19:39         ` BlaisorBlade
@ 2004-09-13 22:14           ` Jeff Dike
  2004-09-14 10:41             ` BlaisorBlade
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff Dike @ 2004-09-13 22:14 UTC (permalink / raw)
  To: BlaisorBlade; +Cc: user-mode-linux-devel, Joe Marzot

blaisorblade_spam@yahoo.it said:
> However, it is not possible nor desirable to do this in CATCH_EINTR -
> retry if  errno == EINTR is a general rule valid in every Unix program
> ever, while this  is very specific to this call.
>
> do {
>    CATCH_EINTR(err = waitpid(pid, &status, WUNTRACED));
> } while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGHUP));

I'd still like to understand exactly what's going on here.  UML interprets
a SIGHUP to itself as a "shut down now" command, while it should not see
SIGHUP from a terminal going away.

Figuring out why it is should point us at the correct fix.

				Jeff




* Re: [uml-devel] handle_trap - failed to wait at end of syscall
  2004-09-13 22:14           ` Jeff Dike
@ 2004-09-14 10:41             ` BlaisorBlade
  2004-09-14 16:09               ` Joe Marzot
  0 siblings, 1 reply; 23+ messages in thread
From: BlaisorBlade @ 2004-09-14 10:41 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel, Joe Marzot

On Tuesday 14 September 2004 00:14, Jeff Dike wrote:
> blaisorblade_spam@yahoo.it said:
> > However, it is not possible nor desirable to do this in CATCH_EINTR -
> > retry if  errno == EINTR is a general rule valid in every Unix program
> > ever, while this  is very specific to this call.
> >
> > do {
> >    CATCH_EINTR(err = waitpid(pid, &status, WUNTRACED));
> > } while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGHUP));
>
> I'd still like to understand exactly what's going on here.  UML interprets
> a SIGHUP to itself as a "shut down now" command, while it should not see
> SIGHUP from a terminal going away.
>
> Figuring out why it is should point us at the correct fix.

Well, I've a situation where I consistently get SIGSEGV instead of SIGHUP 
here, but only on 2.6 host. The scenario is to do "echo 0 > /proc/sysemu". 
You can test that with 2.6.9-rc2 or with 2.6.7-bb6 (both include /proc/sysemu 
support).

The problem (at least in my scenario) is that the signal in 2.4 is delivered 
only to the kernel thread, while on 2.6 (for some reason) it is delivered 
first to the userspace thread. You too mentioned 2.6 signal delivery changes 
as the reason for some fixes.

So, Joe, since you can get this panic consistently, could you try reproducing 
the scenario on a 2.4 host kernel? I guess you shouldn't be able to, but I could 
be wrong. Also, a 2.4 RH kernel does not qualify as a true 2.4 host kernel, 
since it contains some NPTL code - if you can, try just a 2.4 vanilla + SKAS.

About the fix, most signals get delivered to all threads, so we can probably 
safely ignore them when received through waitpid(). But Ulrich Drepper says 
here:

http://people.redhat.com/drepper/posix-signal-model.xml

that SIGSEGV should be delivered only to the generating thread; the document 
lists changes to be done to Linux, so maybe this is implemented in 2.6 and 
not in 2.4. However, OTOH, he also says that signal handlers are 
process-wide, so we should be safe anyway. And anyway, the code works 
perfectly on 2.4 hosts.
-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729


* Re: [uml-devel] handle_trap - failed to wait at end of syscall
  2004-09-14 10:41             ` BlaisorBlade
@ 2004-09-14 16:09               ` Joe Marzot
  2004-09-14 21:23                 ` Jeff Dike
  0 siblings, 1 reply; 23+ messages in thread
From: Joe Marzot @ 2004-09-14 16:09 UTC (permalink / raw)
  To: BlaisorBlade
  Cc: Jeff Dike, user-mode-linux-devel, Joe Marzot,
	Smith, Paul [BL60:NP52:EXCH]

BlaisorBlade wrote:

>>Figuring out why it is should point us at the correct fix.
> 

Agreed...unfortunately I do not really understand where the SIGHUP is 
coming from exactly. It is being delivered to the userspace thread since 
the waitpid() in the kernel thread returns it in status.

In our case the UML is launched like this:

perl script
   my $pid = fork();
   if ($pid == 0) {
      setpgrp(); # give all UMLs the same group id so I can renice them
      exec($cmd);
   }

where $cmd is something like:

'exec linux umid=foo ubd0=cow,rootfs mem=256M con0=xterm con=pts 
eth0=tuntap,tap0,02:00:00:04:00:01, fakehd fake_ide < /dev/null > 
/tmp/uml.log'

We have pretty well correlated the SIGHUP delivery with the exit of the 
parent perl script...although it occurs only about 10% of the time...if 
we put a delay before the script exits it still produces the same crash 
rate, just delayed.

> 
> Well, I've a situation where I consistently get SIGSEGV instead of SIGHUP 
> here, but only on 2.6 host. The scenario is to do "echo 0 > /proc/sysemu". 
> You can test that with 2.6.9-rc2 or with 2.6.7-bb6 (both include /proc/sysemu 
> support).
> 
> The problem (at least in my scenario) is that the signal in 2.4 is delivered 
> only to the kernel thread, while on 2.6 (for some reason) it is delivered 
> first to the userspace thread. You too mentioned 2.6 signal delivery changes 
> as the reason for some fixes.

In my case I see the signal is being delivered to the userspace process.

I am running a 2.4.18-19.8.0 RHish host with SKAS3 patch and a 2.4.22ish 
guest w/ 2.4.24-1um patch. No NPTL here anywhere that I know of.

> 
> So, Joe, since you can get this panic consistently, could you try reproducing 
> the scenario on a 2.4 host kernel? I guess you shouldn't be able, but I could 
> be wrong. Also, a 2.4 RH kernel does not qualify as a true 2.4 host kernel, 
> since it contains some NPTL code - if you can, try just a 2.4 vanilla + SKAS.

I am not sure I understand the request - I am already using a 2.4 host. 
Would like to help though if you can think of something I can do with 
the base I have. I have no /proc/sysemu on guest or host.

> 
> About the fix, most signals get delivered to all threads, so we can probably 
> safely ignore them when received through waitpid(). But Ulrich Drepper says 
> here:
>  
> http://people.redhat.com/drepper/posix-signal-model.xml

cool article - thanks.

> 
> that SIGSEGV should be delivered only to the generating thread; the document 
> lists changes to be done to Linux, so maybe this is implemented in 2.6 and 
> not in 2.4. However, OTOH, he also says that signal handlers are 
> process-wide, so we should be safe anyway. And anyway, the code works 
> perfectly on 2.4 hosts.






* Re: [uml-devel] handle_trap - failed to wait at end of syscall
  2004-09-14 16:09               ` Joe Marzot
@ 2004-09-14 21:23                 ` Jeff Dike
  2004-09-15  5:00                   ` Richard Potter
  2004-09-15 19:35                   ` Joe Marzot
  0 siblings, 2 replies; 23+ messages in thread
From: Jeff Dike @ 2004-09-14 21:23 UTC (permalink / raw)
  To: Joe Marzot
  Cc: BlaisorBlade, user-mode-linux-devel, Smith, Paul [BL60:NP52:EXCH]

gmarzot@nortelnetworks.com said:
> In our case the UML is launched like this:
> perl script
>    my $pid = fork();
>    if ($pid == 0) {
>       setpgrp(); # give all UMLs the same group id so I can renice them
>       exec($cmd); 

I think I understand what's happening.  You are (unwittingly) sending UML
(and every process that belongs to it) a HUP when you, in effect, detach it
from its parent terminal.

My first thought was that SIGHUP isn't disabled in the userspace process,
and it should be, but it is, so I don't know why it's even being seen by
the ptracer.  It should only see signals which are enabled in the child.

In the meantime, while we figure this out, you might try just adding "nohup"
to the beginning of your $cmd.
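
For context, the core of what "nohup" does can be sketched as follows: it sets SIGHUP to be ignored before running the command, and an ignored disposition is inherited across exec. The helper name `ignore_sighup` is invented for this example, and this is only a sketch of the idea, not nohup's actual source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set SIGHUP to be ignored in the calling process. A launcher would
 * call this just before exec'ing the real command. Returns 0 on
 * success, -1 on failure (as sigaction() does). */
static int ignore_sighup(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}
```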

				Jeff




* Re: [uml-devel] handle_trap - failed to wait at end of syscall
  2004-09-14 21:23                 ` Jeff Dike
@ 2004-09-15  5:00                   ` Richard Potter
  2004-09-15 19:35                   ` Joe Marzot
  1 sibling, 0 replies; 23+ messages in thread
From: Richard Potter @ 2004-09-15  5:00 UTC (permalink / raw)
  To: Jeff Dike
  Cc: Joe Marzot, BlaisorBlade, user-mode-linux-devel,
	Smith, Paul [BL60:NP52:EXCH]

SBUML (sbuml.sf.net) had a big SIGHUP problem that produced intermittent
crashes, also about 10% of the time.  After definitively tracing the
problem to SIGHUP, it seemed like nohup in bash would solve it but I did
not have any luck with it. The solution that finally worked was using
setsid:

setsid linux umid=foo .....
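
If the launcher is a program rather than a shell command line, roughly the same effect can be had by calling setsid() in the child between fork and exec. A minimal sketch, with invented helper names `spawn_in_new_session` and `wait_for_exit`. The 126 and 127 exit codes are just a convention chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, move the child into a new session (detaching it from the
 * controlling terminal), then exec argv[0]. Returns the child's pid
 * in the parent, or -1 if fork() failed. */
static pid_t spawn_in_new_session(char *const argv[])
{
    pid_t pid = fork();

    if (pid == 0) {
        /* A freshly forked child is never a process group leader,
         * so setsid() is expected to succeed here. */
        if (setsid() < 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127);            /* exec failed */
    }
    return pid;
}

/* Wait for the child and return its exit code, or -1 if it did not
 * exit normally or the wait failed. */
static int wait_for_exit(pid_t pid)
{
    int status;

    if (pid <= 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```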

--Richard

> gmarzot@nortelnetworks.com said:
> > In our case the UML is launched like this:
> > perl script
> >    my $pid = fork();
> >    if ($pid == 0) {
> >       setpgrp(); # give all UMLs the same group id so I can renice them
> >       exec($cmd); 
> 
> I think I understand what's happening.  You are (unwittingly) sending UML
> (and every process that belongs to it) a HUP when you, in effect, detach it
> from its parent terminal.
> 
> My first thought was that SIGHUP isn't disabled in the userspace process,
> and it should be, but it is, so I don't know why it's even being seen by
> the ptracer.  It should only see signals which are enabled in the child.
> 
> In the meantime, while we figure this out, you might try just adding "nohup"
> to the beginning of your $cmd.
> 
> 				Jeff
> 
> 
> 





* Re: [uml-devel] handle_trap - failed to wait at end of syscall
  2004-09-14 21:23                 ` Jeff Dike
  2004-09-15  5:00                   ` Richard Potter
@ 2004-09-15 19:35                   ` Joe Marzot
  1 sibling, 0 replies; 23+ messages in thread
From: Joe Marzot @ 2004-09-15 19:35 UTC (permalink / raw)
  To: Jeff Dike
  Cc: Joe Marzot, BlaisorBlade, user-mode-linux-devel,
	Smith, Paul [BL60:NP52:EXCH]

Jeff Dike wrote:
> gmarzot@nortelnetworks.com said:
> 
>>In our case the UML is launched like this:
>>perl script
>>   my $pid = fork();
>>   if ($pid == 0) {
>>      setpgrp(); # give all UMLs the same group id so I can renice them
>>      exec($cmd); 
> 
> 
> I think I understand what's happening.  You are (unwittingly) sending UML
> (and every process that belongs to it) a HUP when you, in effect, detach it
> from its parent terminal.

where am I detaching from the parent terminal?

I just did a little test and instead of invoking UML I start a little 
perl script in exactly the same way as above to catch SIGHUP...but no 
SIGHUP arrives.

> 
> My first thought was that SIGHUP isn't disabled in the userspace process,
> and it should be, but it is, so I don't know why it's even being seen by
> the ptracer.  It should only see signals which are enabled in the child.

more oddness.

> 
> In the meantime, while we figure this out, you might try just adding "nohup"
> to the beginning of your $cmd.

I originally tried this but something bad happened...I forget what. I could 
go back and check ... and also try setsid as the other poster suggested.

regards, GSM

> 
> 				Jeff
> 
> 
> 






end of thread, other threads:[~2004-09-15 19:37 UTC | newest]

Thread overview: 23+ messages
2004-08-11 15:32 [uml-devel] debugging UML cores Joe Marzot
2004-08-12  5:41 ` Jeff Dike
2004-08-12 15:21   ` Joe Marzot
2004-08-12 16:56     ` Jeff Dike
2004-08-12 16:16       ` Joe Marzot
2004-08-12 15:36   ` Joe Marzot
2004-08-12 15:47     ` Joe Marzot
2004-08-13 15:46     ` [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores] Joe Marzot
2004-08-13 18:01       ` Joe Marzot
2004-08-13 21:47       ` Jeff Dike
2004-08-16 17:47         ` [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml- devel] " Joe Marzot
2004-08-16 19:25           ` Joe Marzot
2004-08-16 19:53             ` D. Bahi
2004-08-17  5:26               ` Jeff Dike
2004-08-20 11:46         ` handle_trap - failed to wait at end of syscall [was Re: [uml-devel] " BlaisorBlade
2004-09-13 15:39       ` [uml-devel] handle_trap - failed to wait at end of syscall Joe Marzot
2004-09-13 19:39         ` BlaisorBlade
2004-09-13 22:14           ` Jeff Dike
2004-09-14 10:41             ` BlaisorBlade
2004-09-14 16:09               ` Joe Marzot
2004-09-14 21:23                 ` Jeff Dike
2004-09-15  5:00                   ` Richard Potter
2004-09-15 19:35                   ` Joe Marzot
