* [uml-devel] debugging UML cores
@ 2004-08-11 15:32 Joe Marzot
  2004-08-12 5:41 ` Jeff Dike
  0 siblings, 1 reply; 23+ messages in thread
From: Joe Marzot @ 2004-08-11 15:32 UTC (permalink / raw)
  To: user-mode-linux-devel

Hi UML developers,

I am getting a variety of cores from UML - fairly intermittent. The bad part is that examining these cores with GDB is utterly fruitless (for me). Are there some tricks beyond the normal stuff below I should be doing to get a better sense of what is going wrong? I see no back trace and no thread info.

thanks for any help,
Giovanni

[root@wbl6y227 plankton]# /usr/local/builds/gdb-6.2/gdb/gdb -c /bne/home/gmarzot/proj/celp/cores/core_sanity_crash_8_11 celp/linux.celp
GNU gdb 6.2
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `/vob/neptune/plankton/celp/linux.celp (CSC-0-4-0) [/sbin/modprobe] '.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000 in ?? ()
(gdb) where
#0 0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
(gdb) info thr
* 1 process 12718 0x00000000 in ?? ()
warning: Couldn't restore frame in current thread, at frame 0
0x00000000 in ?? ()
(gdb)

-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [uml-devel] debugging UML cores
  2004-08-11 15:32 [uml-devel] debugging UML cores Joe Marzot
@ 2004-08-12 5:41 ` Jeff Dike
  2004-08-12 15:21 ` Joe Marzot
  2004-08-12 15:36 ` Joe Marzot
  0 siblings, 2 replies; 23+ messages in thread
From: Jeff Dike @ 2004-08-12 5:41 UTC (permalink / raw)
  To: Joe Marzot; +Cc: user-mode-linux-devel

gmarzot@nortelnetworks.com said:
> I am getting a variety of cores from UML - fairly intermittent. The
> bad part is that examining these cores with GDB is utterly fruitless
> (for me). Are there some tricks beyond the normal stuff below I
> should be doing to get a better sense of what is going wrong? I see
> no back trace and no thread info.

That is utterly confused. I would make sure that you are absolutely positive that you are giving gdb the exact binary that created the core.

Jeff

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [uml-devel] debugging UML cores
  2004-08-12 5:41 ` Jeff Dike
@ 2004-08-12 15:21 ` Joe Marzot
  2004-08-12 16:56 ` Jeff Dike
  2004-08-12 15:36 ` Joe Marzot
  1 sibling, 1 reply; 23+ messages in thread
From: Joe Marzot @ 2004-08-12 15:21 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

Jeff Dike wrote:
> gmarzot@nortelnetworks.com said:
> > I am getting a variety of cores from UML - fairly intermittent. The
> > bad part is that examining these cores with GDB is utterly fruitless
> > (for me). Are there some tricks beyond the normal stuff below I
> > should be doing to get a better sense of what is going wrong? I see
> > no back trace and no thread info.
>
> That is utterly confused. I would make sure that you are absolutely positive
> that you are giving gdb the exact binary that created the core.
>
> Jeff
>
>

hi,

I am 99% sure. Is there a way from the core to see if it agrees with binary...to see if some key info matches.

I can try to make both available to you some place if that helps.

regards, Giovanni

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [uml-devel] debugging UML cores
  2004-08-12 15:21 ` Joe Marzot
@ 2004-08-12 16:56 ` Jeff Dike
  2004-08-12 16:16 ` Joe Marzot
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff Dike @ 2004-08-12 16:56 UTC (permalink / raw)
  To: Joe Marzot; +Cc: user-mode-linux-devel

gmarzot@nortelnetworks.com said:
> I am 99% sure. Is there a way from the core to see if it agrees with
> binary...to see if some key info matches.

I would be most concerned about whether the binary had been rebuilt since the core happened. gdb will tell you the path of the thing that dumped core - presumably that matched. The other thing would be to check the dates, to see that the binary is older than the core.

If that's all OK, you could try a different version of gdb. We've had problems with some versions not producing good stacks.

Jeff

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [uml-devel] debugging UML cores
  2004-08-12 16:56 ` Jeff Dike
@ 2004-08-12 16:16 ` Joe Marzot
  0 siblings, 0 replies; 23+ messages in thread
From: Joe Marzot @ 2004-08-12 16:16 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Joe Marzot, user-mode-linux-devel

Jeff Dike wrote:
> gmarzot@nortelnetworks.com said:
>
>>I am 99% sure. Is there a way from the core to see if it agrees with
>>binary...to see if some key info matches.
>
>
> I would be most concerned about whether the binary had been rebuilt since the
> core happened. gdb will tell you the path of the thing that dumped core -
> presumably that matched. The other thing would be to check the dates, to
> see that the binary is older than the core.

all checks out.

>
> If that's all OK, you could try a different version of gdb. We've had problems
> with some versions not producing good stacks.

that's what I thought, so I got a new gdb 6.2 and it was no help.

do you recommend a combination of gcc/gdb, libc and libpthread that all play nice together?

regards, G

>
> Jeff
>
>

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [uml-devel] debugging UML cores
  2004-08-12 5:41 ` Jeff Dike
  2004-08-12 15:21 ` Joe Marzot
@ 2004-08-12 15:36 ` Joe Marzot
  2004-08-12 15:47 ` Joe Marzot
  2004-08-13 15:46 ` [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores] Joe Marzot
  1 sibling, 2 replies; 23+ messages in thread
From: Joe Marzot @ 2004-08-12 15:36 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Joe Marzot, user-mode-linux-devel

Jeff Dike wrote:
> gmarzot@nortelnetworks.com said:
> > I am getting a variety of cores from UML - fairly intermittent. The
> > bad part is that examining these cores with GDB is utterly fruitless
> > (for me). Are there some tricks beyond the normal stuff below I
> > should be doing to get a better sense of what is going wrong? I see
> > no back trace and no thread info.
>
> That is utterly confused. I would make sure that you are absolutely positive
> that you are giving gdb the exact binary that created the core.
>
> Jeff
>
>

here is a better one produced under similar conditions - this time the core is readable (I do get the unreadable cores quite often though).

this is host RH8 + skas3 patch

guest is 2.4.2x + 2.4.24-1um

[root@wbl6y227 plankton]# /usr/local/builds/gdb-6.2/gdb/gdb -c ~szhimin/tmp/joe/core.13456 /view/build_neptune_dev_int144.resp3/vob/neptune/plankton/celp/linux.celp
GNU gdb 6.2
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `/vob/neptune/plankton/celp/linux.celp (DSC-0-0-0) [nameServer] '.
Program terminated with signal 6, Aborted.
#0 0xa01643e1 in kill ()
    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
486 case 1: COMMON("\n\tstosb"); return s;
(gdb) where
#0 0xa01643e1 in kill ()
    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
#1 0xa018cbdb in raise ()
    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
#2 0xa01646cd in abort ()
    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
#3 0xa00d01e4 in handle_trap (pid=13461, regs=0xa5f7827c) at process.c:90
#4 0xa00d0438 in userspace (regs=0xa5f7827c) at process.c:168
#5 0xa00d0bfa in fork_handler (sig=10) at process_kern.c:102
#6 <signal handler called>
#7 0xa01643e1 in kill ()
    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
#8 0xa00d4734 in os_usr1_process (pid=13456) at process.c:95
#9 0xa00d04ce in new_thread (stack=Cannot access memory at address 0x8
) at process.c:205
Previous frame inner to this frame (corrupt stack?)
(gdb) info thr
* 1 process 13456 0xa01643e1 in kill ()
    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
(gdb)

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [uml-devel] debugging UML cores
  2004-08-12 15:36 ` Joe Marzot
@ 2004-08-12 15:47 ` Joe Marzot
  2004-08-13 15:46 ` [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores] Joe Marzot
  1 sibling, 0 replies; 23+ messages in thread
From: Joe Marzot @ 2004-08-12 15:47 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

Joe Marzot wrote:
>
> [root@wbl6y227 plankton]# /usr/local/builds/gdb-6.2/gdb/gdb -c
> ~szhimin/tmp/joe/core.13456
> /view/build_neptune_dev_int144.resp3/vob/neptune/plankton/celp/linux.celp
> GNU gdb 6.2
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db
> library "/lib/libthread_db.so.1".
>
> Core was generated by `/vob/neptune/plankton/celp/linux.celp (DSC-0-0-0)
> [nameServer] '.
> Program terminated with signal 6, Aborted.
> #0 0xa01643e1 in kill ()
>     at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486

I wonder what is producing these bogus source file and line numbers though.

-gsm

^ permalink raw reply [flat|nested] 23+ messages in thread
* [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores]
  2004-08-12 15:36 ` Joe Marzot
  2004-08-12 15:47 ` Joe Marzot
@ 2004-08-13 15:46 ` Joe Marzot
  2004-08-13 18:01 ` Joe Marzot
  ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Joe Marzot @ 2004-08-13 15:46 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

Joe Marzot wrote:
> here is a better one produced under similar conditions - this time the
> core is readable (I do get the unreadable cores quite often though).
>
> this is host RH8 + skas3 patch
>
> guest is 2.4.2x + 2.4.24-1um

so, looking deeper into this core, in handle_trap I see the call to waitpid fail with a status of 383 and an err of 13456

(gdb) p err
$6 = 13456 (the pid of the child that exited)
(gdb) p status
$7 = 383

WSTOPSIG(err) = SIGHUP

does this give any clues...any ideas of what else to look at?

thanks, GSM

>
> [root@wbl6y227 plankton]# /usr/local/builds/gdb-6.2/gdb/gdb -c
> ~szhimin/tmp/joe/core.13456
> /view/build_neptune_dev_int144.resp3/vob/neptune/plankton/celp/linux.celp
> GNU gdb 6.2
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db
> library "/lib/libthread_db.so.1".
>
> Core was generated by `/vob/neptune/plankton/celp/linux.celp (DSC-0-0-0)
> [nameServer] '.
> Program terminated with signal 6, Aborted.
> #0 0xa01643e1 in kill ()
>     at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> 486 case 1: COMMON("\n\tstosb"); return s;
> (gdb) where
> #0 0xa01643e1 in kill ()
>     at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> #1 0xa018cbdb in raise ()
>     at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> #2 0xa01646cd in abort ()
>     at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> #3 0xa00d01e4 in handle_trap (pid=13461, regs=0xa5f7827c) at process.c:90
> #4 0xa00d0438 in userspace (regs=0xa5f7827c) at process.c:168
> #5 0xa00d0bfa in fork_handler (sig=10) at process_kern.c:102
> #6 <signal handler called>
> #7 0xa01643e1 in kill ()
>     at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> #8 0xa00d4734 in os_usr1_process (pid=13456) at process.c:95
> #9 0xa00d04ce in new_thread (stack=Cannot access memory at address 0x8
> ) at process.c:205
> Previous frame inner to this frame (corrupt stack?)
> (gdb) info thr
> * 1 process 13456 0xa01643e1 in kill ()
>     at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> (gdb)
>
>
>

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores]
  2004-08-13 15:46 ` [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores] Joe Marzot
@ 2004-08-13 18:01 ` Joe Marzot
  2004-08-13 21:47 ` Jeff Dike
  2004-09-13 15:39 ` [uml-devel] handle_trap - failed to wait at end of syscall Joe Marzot
  2 siblings, 0 replies; 23+ messages in thread
From: Joe Marzot @ 2004-08-13 18:01 UTC (permalink / raw)
  To: Jeff Dike; +Cc: user-mode-linux-devel

Marzot, Joe [BL60:NP72:EXCH] wrote:
> (gdb) p status
> $7 = 383
>
> WSTOPSIG(err) = SIGHUP
>

that should be WSTOPSIG(status) = SIGHUP of course.

-g

^ permalink raw reply [flat|nested] 23+ messages in thread
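For readers following the arithmetic: 383 is 0x17f, so the low byte is 0x7f (the "stopped" marker in the Linux wait status encoding) and the next byte is 1, which is SIGHUP on Linux. The sketch below decodes the status with the standard <sys/wait.h> macros; the helper names stop_signal_of and check_reported_status are made up for illustration and do not appear in the UML sources.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: return the signal that stopped the child,
 * or -1 if the wait status does not describe a stopped child.
 */
int stop_signal_of(int status)
{
    if (!WIFSTOPPED(status))
        return -1;
    return WSTOPSIG(status);
}

/* Re-checks the value printed by gdb in the message above. */
void check_reported_status(void)
{
    int status = 383;   /* 0x17f: low byte 0x7f, next byte 0x01 */

    assert(WIFSTOPPED(status));
    assert(stop_signal_of(status) == SIGHUP);
}
```

This matches the corrected reading in the message above: the macro has to be applied to status, since err holds the pid returned by waitpid.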
* Re: handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores]
  2004-08-13 15:46 ` [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores] Joe Marzot
  2004-08-13 18:01 ` Joe Marzot
@ 2004-08-13 21:47 ` Jeff Dike
  2004-08-16 17:47 ` [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml-devel] " Joe Marzot
  2004-08-20 11:46 ` handle_trap - failed to wait at end of syscall [was Re: [uml-devel] " BlaisorBlade
  2004-09-13 15:39 ` [uml-devel] handle_trap - failed to wait at end of syscall Joe Marzot
  2 siblings, 2 replies; 23+ messages in thread
From: Jeff Dike @ 2004-08-13 21:47 UTC (permalink / raw)
  To: Joe Marzot; +Cc: user-mode-linux-devel

gmarzot@nortelnetworks.com said:
> WSTOPSIG(err) = SIGHUP
> does this give any clues...any ideas of what else to look at?

Do you have any idea how you're making this happen? The userspace process is getting a SIGHUP in the middle of having a system call nullified. This is OK since a SIGHUP can happen any time if you log out on it or something, but I'd like to know exactly what's going on so I can decide what the right reaction to it is.

Simplistically, we could just handle it there and ignore it, since UML probably got the SIGHUP as well, and will deal with it then.

Jeff

^ permalink raw reply [flat|nested] 23+ messages in thread
* [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores]
  2004-08-13 21:47 ` Jeff Dike
@ 2004-08-16 17:47 ` Joe Marzot
  2004-08-16 19:25 ` Joe Marzot
  2004-08-20 11:46 ` handle_trap - failed to wait at end of syscall [was Re: [uml-devel] " BlaisorBlade
  1 sibling, 1 reply; 23+ messages in thread
From: Joe Marzot @ 2004-08-16 17:47 UTC (permalink / raw)
  To: Jeff Dike; +Cc: Joe Marzot, user-mode-linux-devel

Jeff Dike wrote:
> gmarzot@nortelnetworks.com said:
> > WSTOPSIG(err) = SIGHUP
> > does this give any clues...any ideas of what else to look at?
>
> Do you have any idea how you're making this happen?

unfortunately not...the UML instance is being used as a test harness for a complex set of interacting processes. all sorts of things are going on prior to the crash.

> The userspace process is
> getting a SIGHUP in the middle of having a system call nullified.

what does it mean to nullify a system call?

I am also losing track of whether this is a simulated signal inside the UML userspace app or a host signal being delivered to the host resident UML userspace thread.

> This is OK
> since a SIGHUP can happen any time if you log out on it or something, but
> I'd like to know exactly what's going on so I can decide what the right
> reaction to it is.

as it is a test harness there are lots of scripts being invoked - shells are being spawned and exited. There may be expect scripts logging into the UML and logging out, if that's what you mean.

>
> Simplistically, we could just handle it there and ignore it, since UML
> probably got the SIGHUP as well, and will deal with it then.

something like this?

if((err < 0) || !WIFSTOPPED(status) || (WSTOPSIG(status) != SIGTRAP) ||
   (WSTOPSIG(status) != SIGHUP)) {
        ....
} else {
        handle_syscall(regs);
}

regards, GSM

>
> Jeff
>
>
>

^ permalink raw reply [flat|nested] 23+ messages in thread
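One detail of the condition suggested above is worth checking in isolation: a stop signal cannot be both SIGTRAP and SIGHUP, so joining the two inequality tests with || makes the error branch fire for every status. The sketch below compares the posted form with an && variant. The function names and the make_stop_status helper are hypothetical, the status encoding is the Linux one, and the code only models the predicate. It does not answer whether calling handle_syscall after a SIGHUP stop is the right reaction, which the thread leaves open.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: Linux-style wait status for "stopped by sig". */
int make_stop_status(int sig)
{
    return (sig << 8) | 0x7f;
}

/* The condition exactly as posted: true for every possible status. */
int rejected_as_posted(int err, int status)
{
    return (err < 0) || !WIFSTOPPED(status) ||
           (WSTOPSIG(status) != SIGTRAP) ||
           (WSTOPSIG(status) != SIGHUP);
}

/* Variant that rejects only stops that are neither SIGTRAP nor SIGHUP. */
int rejected_with_and(int err, int status)
{
    return (err < 0) || !WIFSTOPPED(status) ||
           ((WSTOPSIG(status) != SIGTRAP) &&
            (WSTOPSIG(status) != SIGHUP));
}

void compare_conditions(void)
{
    int trap = make_stop_status(SIGTRAP);
    int hup  = make_stop_status(SIGHUP);   /* 383, the value from the core */
    int segv = make_stop_status(SIGSEGV);

    /* Posted form: even the expected SIGTRAP stop is rejected. */
    assert(rejected_as_posted(13456, trap) == 1);
    assert(rejected_as_posted(13456, hup) == 1);

    /* && form: SIGTRAP and SIGHUP pass, anything else is rejected. */
    assert(rejected_with_and(13456, trap) == 0);
    assert(rejected_with_and(13456, hup) == 0);
    assert(rejected_with_and(13456, segv) == 1);
    assert(rejected_with_and(-1, trap) == 1);
}
```

With the posted form the else branch is unreachable, so handle_syscall would never run; the && form at least separates the two tolerated stop signals from everything else.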
* [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores]
  2004-08-16 17:47 ` [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml-devel] " Joe Marzot
@ 2004-08-16 19:25 ` Joe Marzot
  2004-08-16 19:53 ` D. Bahi
  0 siblings, 1 reply; 23+ messages in thread
From: Joe Marzot @ 2004-08-16 19:25 UTC (permalink / raw)
  Cc: Jeff Dike, user-mode-linux-devel

Joe Marzot wrote:
> Jeff Dike wrote:
>
>> gmarzot@nortelnetworks.com said:
>> > WSTOPSIG(err) = SIGHUP
>> > does this give any clues...any ideas of what else to look at?
>>
>> Do you have any idea how you're making this happen?

here's another twist - looks like a different crash but stimulated by the same tests being performed inside UML. This back trace goes on down to zero just like this -> sig 11, change_sig 10, sig 11...

looks like a klm might have corrupted kernel mem...or does this look familiar to other UML'ers?

#2156 <signal handler called>
#2157 0xa0151ac0 in sigismember ()
    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
#2158 0xa00c09eb in change_sig (signal=10, on=1) at signal_user.c:57
#2159 0xa00c4a01 in sig_handler_common_skas (sig=11, sc_ptr=0xa00cc100) at trap_user.c:31
#2160 0xa00c2746 in sig_handler (sig=11, sc=
    {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43, __dsh = 0, edi = 10, esi = 2685191148, ebp = 2685191428, esp = 2685191128, ebx = 2685191276, edx = 2685191276, ecx = 2685191276, eax = 354011904, trapno = 14, err = 6, eip = 2685737664, cs = 35, __csh = 0, eflags = 66050, esp_at_signal = 2685191128, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 134217792, cr2 = 354011904})
    at trap_user.c:102
#2161 <signal handler called>
#2162 0xa0151ac0 in sigismember ()
    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
#2163 0xa00c09eb in change_sig (signal=10, on=1) at signal_user.c:57
---Type <return> to continue, or q <return> to quit---
#2164 0xa00c4a01 in sig_handler_common_skas (sig=0, sc_ptr=0xa00cc560) at trap_user.c:31
#2165 0xa00c2746 in sig_handler (sig=Cannot access memory at address 0x16
) at trap_user.c:102
Previous frame inner to this frame (corrupt stack?)

anyone have any tips on interesting fields to look at?

regards, Giovanni

>
>
> unfortunately not...the UML instance is being used as a test harness for
> a complex set of interacting processes. all sorts of things are going
> prior to the crash.
>
>> The userspace process is
>> getting a SIGHUP in the middle of having a system call nullified.
>
>
> what does it mean to nullify a system call?
>
> I am also losing whether this is a simulated signal inside the UML
> userspace app or a host signal being delivered to the host resident UML
> usespace thread.
>
>> This is OK
>> since a SIGHUP can happen any time if you log out on it or something, but
>> I'd like to know exactly what's going on so I can decide what the
>> right reaction
>> to it is.
>
>
> as it is a test harness there are lot's of scripts being invoked -
> shells are being spawned and exited. There may be expect scripts logging
> into the UML and logging out if that's what mean.
>
>>
>> Simplistically, we could just handle it there and ignore it, since UML
>> probably
>> got the SIGHUP as well, and will deal with it then.
>
>
> something like this?
>
> if((err < 0) || !WIFSTOPPED(status) || (WSTOPSIG(status) != SIGTRAP) ||
>    (WSTOPSIG(status) != SIGHUP)) {
>         ....
> } else {
>         handle_syscall(regs);
> }
>
> regards, GSM
>
>>
>> Jeff
>>
>>
>>
>
>
>

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores]
  2004-08-16 19:25 ` Joe Marzot
@ 2004-08-16 19:53 ` D. Bahi
  2004-08-17 5:26 ` Jeff Dike
  0 siblings, 1 reply; 23+ messages in thread
From: D. Bahi @ 2004-08-16 19:53 UTC (permalink / raw)
  To: Joe Marzot; +Cc: Jeff Dike, user-mode-linux-devel

[-- Attachment #1.1: Type: text/plain, Size: 3957 bytes --]

does this look familiar? humm, here's 2.4.26-3um, backtrace attached.

we do have kernel modules loaded... and lots of communication with a modified uml_switch going on... otherwise this can happen in a relatively idle UML after some random period of time.

i have not seen this in a vanilla 2.4.26-3 with a generic redhat 9 file system just doing 'ls -R' over and over for exercise -- btw: it has no modules loaded... and none in the filesystem to load for a quick test.

i'm installing Expect.pm so i can play with the test scripts and try to isolate this and the hostfs troubles... fun.

db

Joe Marzot wrote:
> Joe Marzot wrote:
>
>> Jeff Dike wrote:
>>
>>> gmarzot@nortelnetworks.com said:
>>> > WSTOPSIG(err) = SIGHUP
>>> > does this give any clues...any ideas of what else to look at?
>>>
>>> Do you have any idea how you're making this happen?
>
>
> here's another twist - looks like a different crash but stimulated by
> the same tests being performed inside UML. This back trace goes on down
> to zero just like this -> sig 11, change_sig 10, sig 11...
>
> looks like a klm might have corrupted kernel mem...or does this look
> familiar to other UML'ers?
>
> #2156 <signal handler called>
> #2157 0xa0151ac0 in sigismember ()
>     at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> #2158 0xa00c09eb in change_sig (signal=10, on=1) at signal_user.c:57
> #2159 0xa00c4a01 in sig_handler_common_skas (sig=11, sc_ptr=0xa00cc100)
>     at trap_user.c:31
> #2160 0xa00c2746 in sig_handler (sig=11, sc=
>     {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds =
> 43, __dsh = 0, edi = 10, esi = 2685191148, ebp = 2685191428, esp =
> 2685191128, ebx = 2685191276, edx = 2685191276, ecx = 2685191276, eax =
> 354011904, trapno = 14, err = 6, eip = 2685737664, cs = 35, __csh = 0,
> eflags = 66050, esp_at_signal = 2685191128, ss = 43, __ssh = 0, fpstate
> = 0x0, oldmask = 134217792, cr2 = 354011904})
>     at trap_user.c:102
> #2161 <signal handler called>
> #2162 0xa0151ac0 in sigismember ()
>     at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
> #2163 0xa00c09eb in change_sig (signal=10, on=1) at signal_user.c:57
> ---Type <return> to continue, or q <return> to quit---
> #2164 0xa00c4a01 in sig_handler_common_skas (sig=0, sc_ptr=0xa00cc560)
>     at trap_user.c:31
> #2165 0xa00c2746 in sig_handler (sig=Cannot access memory at address 0x16
> ) at trap_user.c:102
> Previous frame inner to this frame (corrupt stack?)
>
> anyone have any tips on interesting fields to look at?
>
> regards, Giovanni
>
>>
>>
>> unfortunately not...the UML instance is being used as a test harness
>> for a complex set of interacting processes. all sorts of things are
>> going prior to the crash.
>>
>>> The userspace process is
>>> getting a SIGHUP in the middle of having a system call nullified.
>>
>>
>>
>> what does it mean to nullify a system call?
>>
>> I am also losing whether this is a simulated signal inside the UML
>> userspace app or a host signal being delivered to the host resident
>> UML usespace thread.
>>
>>> This is OK
>>> since a SIGHUP can happen any time if you log out on it or something,
>>> but
>>> I'd like to know exactly what's going on so I can decide what the
>>> right reaction
>>> to it is.
>>
>>
>>
>> as it is a test harness there are lot's of scripts being invoked -
>> shells are being spawned and exited. There may be expect scripts
>> logging into the UML and logging out if that's what mean.
>>
>>>
>>> Simplistically, we could just handle it there and ignore it, since
>>> UML probably
>>> got the SIGHUP as well, and will deal with it then.
>>
>>
>>
>> something like this?
>>
>> if((err < 0) || !WIFSTOPPED(status) || (WSTOPSIG(status) != SIGTRAP)
>> || (WSTOPSIG(status) != SIGHUP)) {
>> ....
>> } else {
>> handle_syscall(regs);
>> }
>>
>> regards, GSM
>>
>>>
>>> Jeff

[-- Attachment #1.2: randomkernelpanic.txt --]
[-- Type: text/plain, Size: 3210 bytes --]

#35 0x080dd263 in linux_main (argc=12, argv=0x20000000) at um_arch.c:393
#34 0x080debae in start_uml_skas () at process_kern.c:193
#33 0x080de4e5 in start_idle_thread (stack=0x81e8000, switch_buf_ptr=0x81e8578, fork_buf_ptr=0x0) at process.c:303
#32 0x0815a325 in siglongjmp () at proc_fs.h:154
#31 0x0815a691 in kill () at proc_fs.h:154
#30 <signal handler called>
#29 0x080de886 in new_thread_handler (sig=10) at process_kern.c:72
#28 0x080d90ed in run_kernel_thread (fn=0x80deb34 <start_kernel_proc>, arg=0x0, jmp_ptr=0x81e8000) at process.c:231
#27 0x080deb5b in start_kernel_proc (unused=0x0) at process_kern.c:179
#26 0x0804950a in start_kernel () at init/main.c:440
#25 0x0805144e in rest_init () at init/main.c:346
#24 0x080d94f1 in cpu_idle () at process_kern.c:209
#23 0x080dc27a in idle_sleep (secs=-4) at time.c:132
#22 0x0816787a in nanosleep () at proc_fs.h:154
#21 <signal handler called>
#20 0x080dce1e in sig_handler (sig=29, sc={gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43, __dsh = 0, edi = 136216576, esi = 136216576, ebp = 136248204, esp = 136248176, ebx = 136248196, edx = 136216576, ecx = 0, eax = 4294967292, trapno = 14, err = 6, eip = 135690362, cs = 35, __csh = 0, eflags = 582, esp_at_signal = 136248176, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 0, cr2 = 681033728}) at trap_user.c:109
#19 0x080df1f5 in sig_handler_common_skas (sig=29, sc_ptr=0xe8) at trap_user.c:35
#18 0x080d72bb in sigio_handler (sig=29, regs=0x81e8270) at irq_user.c:73
#17 0x080d6c57 in do_IRQ (irq=5, regs=0x81e8270) at irq.c:336
#16 0x0805ae62 in do_softirq () at softirq.c:90
#15 0x08109b50 in net_rx_action (h=0x8203590) at dev.c:1626
#14 0x08109a35 in process_backlog (backlog_dev=0x82038e8, budget=0x81ef7ac) at dev.c:1563
#13 0x08109915 in netif_receive_skb (skb=0x25ba6d20) at dev.c:1530
#12 0x08136763 in arp_process (skb=0x25ba6d20) at arp.c:946
#11 0x0810db92 in neigh_update (neigh=0x260535a0, lladdr=0x20cc7858 "\002", new=2 '\002', override=1, arp=1) at neighbour.c:895
#10 0x0810ef94 in neigh_app_notify (n=0x260535a0) at neighbour.c:1477
#9 0x0810eb11 in neigh_fill_info (skb=0x83cca80, n=0x260535a0, pid=1, seq=1, event=1) at neighbour.c:1341
#8 <signal handler called>
#7 0x080dce1e in sig_handler (sig=11, sc={gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43, __dsh = 0, edi = 525299216, esi = 138201728, ebp = 136246892, esp = 136246836, ebx = 525299200, edx = 637875616, ecx = 136246660, eax = 1, trapno = 14, err = 4, eip = 135326481, cs = 35, __csh = 0, eflags = 66050, esp_at_signal = 136246836, ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 134217728, cr2 = 61}) at trap_user.c:109
#6 0x080df1f5 in sig_handler_common_skas (sig=11, sc_ptr=0x58) at trap_user.c:35
#5 0x080dcdfd in segv_handler (sig=11, regs=0x81e8270) at trap_user.c:74
#4 0x080dcab9 in segv (address=61, ip=0, is_write=0, is_user=0, sc=0x81e8270) at trap_kern.c:149
#3 0x08056215 in panic (fmt=0x81c8b60 "Kernel mode fault at addr 0x%lx, ip 0x%lx") at panic.c:77
#2 0x080612a6 in notifier_call_chain (n=0xf4240, val=0, v=0x820d1c0) at sys.c:148
#1 0x080dd3a9 in panic_exit (self=0x81f6c34, unused1=0, unused2=0x820d1c0) at um_arch.c:403
#0 stop () at user_util.c:52

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 187 bytes --]

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores]
From: Jeff Dike @ 2004-08-17 5:26 UTC
To: D. Bahi; +Cc: Joe Marzot, user-mode-linux-devel

dbahi@enterasys.com said:
> does this look familiar? hmm, here's 2.4.26-3um, backtrace attached.

Not even close to the same bug. It's segfaulting at neighbour.c, line 1341, which is this:

1340	ci.ndm_used = now - n->used;
1341	ci.ndm_confirmed = now - n->confirmed;
1342	ci.ndm_updated = now - n->updated;

This is mystifying because whatever address line 1341 could have faulted on, line 1340 should have faulted on first.

The fault address is 61 (== 0x3d), which I can't see in that code either. If n were 0, then you'd get a fault on some low address, but n = 0x260535a0 (and I assume you're using the new load-low option). n->confirmed is a 4-byte aligned field, and would not fault on an odd address.

So, I think I don't totally trust gdb's line number reporting in this case. What I would do is disassemble neigh_fill_info, and see what line the faulting instruction really belongs to.

Jeff
* Re: handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores]
From: BlaisorBlade @ 2004-08-20 11:46 UTC
To: user-mode-linux-devel; +Cc: Jeff Dike, Joe Marzot

On Friday 13 August 2004 at 23:47, Jeff Dike wrote:
> gmarzot@nortelnetworks.com said:
> > WSTOPSIG(status) = SIGHUP
> > does this give any clues...any ideas of what else to look at?
>
> Do you have any idea how you're making this happen? The userspace process
> is getting a SIGHUP in the middle of having a system call nullified. This
> is OK since a SIGHUP can happen any time if you log out on it or something,
> but I'd like to know exactly what's going on so I can decide what the right
> reaction to it is.

I'm getting a similar problem, in another situation.

With my current tree (any 2.6.7-bb should also do, but you can test it on -bb4 to be sure), on a 2.6.7 host with host-skas3-2.6.7-v2.patch (notice -v2, it contains SYSEMU), if I do "echo 0 > /proc/sysemu" I get the same failure, but with 2943 as status (i.e. WSTOPSIG=SIGSEGV) and EINTR as errno. I never got this on a 2.4 host, but I'll retest. I think the EINTR is left over from a previous loop iteration, since I've applied the catch-EINTR patch: probably it did the syscall, it was interrupted, and it did the syscall again, which returned SIGSEGV as the stop status.

And this is definitely reproducible with echo 0 > /proc/sysemu. When booting with nosysemu it works, and I can even reenable it with echo 1 > /proc/sysemu; but if I try disabling it again, I get the same problem.

--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
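For anyone checking the arithmetic on the status value quoted above: 2943 is 0xB7F, i.e. a low byte of 0x7F (the "stopped" marker) with signal 11 in the next byte. The following is a minimal sketch of that decoding, assuming the Linux/glibc wait-status encoding; the names stop_signal_of and describe_wait_status are hypothetical helpers, not anything from the UML tree.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

/* Return the stop signal encoded in a waitpid() status, or -1 if the
 * status does not describe a stopped child. */
int stop_signal_of(int status)
{
    return WIFSTOPPED(status) ? WSTOPSIG(status) : -1;
}

/* Write a short human-readable description of a wait status into buf.
 * Returns whatever snprintf() returns. */
int describe_wait_status(int status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (WIFSTOPPED(status))
        return snprintf(buf, len, "stopped by signal %d", WSTOPSIG(status));
    if (WIFSIGNALED(status))
        return snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    if (WIFEXITED(status))
        return snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    return snprintf(buf, len, "unrecognized status %d", status);
}
```

Calling stop_signal_of(2943) yields SIGSEGV on Linux, which matches the WSTOPSIG=SIGSEGV reading given in the message.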
* Re: [uml-devel] handle_trap - failed to wait at end of syscall
From: Joe Marzot @ 2004-09-13 15:39 UTC
To: Jeff Dike; +Cc: user-mode-linux-devel

Remember this one? The latest take on this is that because we launch UMLs from a perl script (using fork/exec), when the perl script exits a SIGHUP is transmitted to the UML proc, which sometimes interrupts a waitpid(). If that interruption occurs during the nullification of a syscall (now that I know what that means :) then you get a kernel panic like the one below.

I made a small fix that seems to be working for me and looks like what's going on in CATCH_EINTR:

do {
	CATCH_EINTR(err = waitpid(pid, &status, WUNTRACED));
} while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGHUP));

I can't do this globally in CATCH_EINTR since some waitpids don't check status...maybe they should...maybe there is a more correct way to do this altogether...

thoughts?

regards, Giovanni

Marzot, Joe [BL60:NP72:EXCH] wrote:
> Joe Marzot wrote:
>
>> here is a better one produced under similar conditions - this time the
>> core is readable (I do get the unreadable cores quite often though).
>>
>> this is host RH8 + skas3 patch
>>
>> guest is 2.4.2x + 2.4.24-1um
>
> so looking deeper in this core in handle_trap I see the call to waitpid
> fails with a status of 383 and an err of 13456
>
> (gdb) p err
> $6 = 13456 (the pid of the child who exited)
> (gdb) p status
> $7 = 383
>
> WSTOPSIG(status) = SIGHUP
>
> does this give any clues...any ideas of what else to look at?
>
> thanks, GSM
>
>> [root@wbl6y227 plankton]# /usr/local/builds/gdb-6.2/gdb/gdb -c ~szhimin/tmp/joe/core.13456 /view/build_neptune_dev_int144.resp3/vob/neptune/plankton/celp/linux.celp
>> GNU gdb 6.2
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
>>
>> Core was generated by `/vob/neptune/plankton/celp/linux.celp (DSC-0-0-0) [nameServer] '.
>> Program terminated with signal 6, Aborted.
>> #0 0xa01643e1 in kill ()
>>    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> 486 case 1: COMMON("\n\tstosb"); return s;
>> (gdb) where
>> #0 0xa01643e1 in kill ()
>>    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> #1 0xa018cbdb in raise ()
>>    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> #2 0xa01646cd in abort ()
>>    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> #3 0xa00d01e4 in handle_trap (pid=13461, regs=0xa5f7827c) at process.c:90
>> #4 0xa00d0438 in userspace (regs=0xa5f7827c) at process.c:168
>> #5 0xa00d0bfa in fork_handler (sig=10) at process_kern.c:102
>> #6 <signal handler called>
>> #7 0xa01643e1 in kill ()
>>    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> #8 0xa00d4734 in os_usr1_process (pid=13456) at process.c:95
>> #9 0xa00d04ce in new_thread (stack=Cannot access memory at address 0x8
>> ) at process.c:205
>> Previous frame inner to this frame (corrupt stack?)
>> (gdb) info thr
>> * 1 process 13456 0xa01643e1 in kill ()
>>    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> (gdb)
* Re: [uml-devel] handle_trap - failed to wait at end of syscall
From: BlaisorBlade @ 2004-09-13 19:39 UTC
To: user-mode-linux-devel; +Cc: Joe Marzot, Jeff Dike

On Monday 13 September 2004 17:39, Joe Marzot wrote:
> remember this one?...the latest take on this is that because we launch
> UMLs from a perl script (using fork/exec) when the perl script exits a
> SIGHUP is transmitted to the UML proc which sometimes interrupts a
> waitpid()...if that interruption occurs during the nullification of a
> syscall (now that I know what that means:) then you get a kernel panic
> like below.

> I made a small fix that seems to be working for me and looks like what's
> going on in CATCH_EINTR

> can't do this globally in CATCH_EINTR since some waitpids don't check
> status...maybe they should...maybe there is a more correct way to do
> this altogether...

I'm going to merge something like this. Also, sorry: I should have done this about a month ago, but I keep forgetting because I'm working on other stuff.

However, it is neither possible nor desirable to do this in CATCH_EINTR: retrying when errno == EINTR is a general rule valid in every Unix program, while this is very specific to this call.

> do {
> 	CATCH_EINTR(err = waitpid(pid, &status, WUNTRACED));
> } while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGHUP));

I'll turn that into WSTOPSIG(status) != SIGTRAP. I'm getting the same problem with SIGSEGV instead (IIRC). However, maybe that must be != SIGTRAP and != <other signal>. I don't think so, but I must check to be sure. Jeff, what do you think?

Also, we don't make a distinction between a real SIGTRAP and a syscall stop. Jeff, would you agree to using PTRACE_O_TRACESYSGOOD? See arch/i386/kernel/ptrace.c:do_syscall_trace for an explanation of this feature.

--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
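To illustrate the distinction raised at the end of that message: with the ptrace option in question (PTRACE_O_TRACESYSGOOD in the Linux headers, enabled through PTRACE_SETOPTIONS), a syscall stop is reported with stop signal SIGTRAP | 0x80, so it can be told apart from a genuine SIGTRAP. The sketch below only classifies a wait status; the enum, the function names, and make_stop_status (which builds a status using the Linux encoding, purely for experimenting) are hypothetical and are not part of UML.

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

enum stop_kind {
    NOT_STOPPED,   /* status does not describe a stop at all */
    SYSCALL_STOP,  /* syscall entry/exit stop (SIGTRAP | 0x80) */
    REAL_SIGTRAP,  /* plain SIGTRAP */
    OTHER_SIGNAL   /* stop caused by any other signal, e.g. SIGHUP */
};

/* Classify a waitpid() status, assuming PTRACE_O_TRACESYSGOOD is set
 * on the tracee. */
enum stop_kind classify_stop(int status)
{
    int sig;

    if (!WIFSTOPPED(status))
        return NOT_STOPPED;
    sig = WSTOPSIG(status);
    if (sig == (SIGTRAP | 0x80))
        return SYSCALL_STOP;
    if (sig == SIGTRAP)
        return REAL_SIGTRAP;
    return OTHER_SIGNAL;
}

/* Build a "stopped by sig" status the way Linux encodes it:
 * low byte 0x7f, stop signal in the next byte. For experiments only. */
int make_stop_status(int sig)
{
    assert(sig > 0 && sig < 256);
    return (sig << 8) | 0x7f;
}
```

A tracer using something like this could treat OTHER_SIGNAL as "not the stop I was waiting for" without having to enumerate SIGHUP, SIGSEGV and so on individually.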
* Re: [uml-devel] handle_trap - failed to wait at end of syscall
From: Jeff Dike @ 2004-09-13 22:14 UTC
To: BlaisorBlade; +Cc: user-mode-linux-devel, Joe Marzot

blaisorblade_spam@yahoo.it said:
> However, it is not possible nor desirable to do this in CATCH_EINTR -
> retry if errno == EINTR is a general rule valid in every Unix program
> ever, while this is very specific to this call.
> > do {
> > 	CATCH_EINTR(err = waitpid(pid, &status, WUNTRACED));
> > } while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGHUP));

I'd still like to understand exactly what's going on here. UML interprets a SIGHUP to itself as a "shut down now" command, while it should not see SIGHUP from a terminal going away.

Figuring out why it is should point us at the correct fix.

Jeff
* Re: [uml-devel] handle_trap - failed to wait at end of syscall
From: BlaisorBlade @ 2004-09-14 10:41 UTC
To: Jeff Dike; +Cc: user-mode-linux-devel, Joe Marzot

On Tuesday 14 September 2004 00:14, Jeff Dike wrote:
> blaisorblade_spam@yahoo.it said:
> > However, it is not possible nor desirable to do this in CATCH_EINTR -
> > retry if errno == EINTR is a general rule valid in every Unix program
> > ever, while this is very specific to this call.
> >
> > do {
> > 	CATCH_EINTR(err = waitpid(pid, &status, WUNTRACED));
> > } while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGHUP));
>
> I'd still like to understand exactly what's going on here. UML interprets
> a SIGHUP to itself as a "shut down now" command, while it should not see
> SIGHUP from a terminal going away.
>
> Figuring out why it is should point us at the correct fix.

Well, I have a situation where I consistently get SIGSEGV instead of SIGHUP here, but only on a 2.6 host. The scenario is to do "echo 0 > /proc/sysemu". You can test that with 2.6.9-rc2 or with 2.6.7-bb6 (both include /proc/sysemu support).

The problem (at least in my scenario) is that on 2.4 the signal is delivered only to the kernel thread, while on 2.6 (for some reason) it is delivered first to the userspace thread. You too mentioned 2.6 signal delivery changes as the reason for some fixes.

So, Joe, since you can get this panic consistently, could you try reproducing the scenario on a 2.4 host kernel? I guess you won't be able to, but I could be wrong. Also, a 2.4 RH kernel does not qualify as a true 2.4 host kernel, since it contains some NPTL code; if you can, try a vanilla 2.4 + SKAS.

About the fix: most signals get delivered to all threads, so we can probably safely ignore them when received through waitpid(). But Ulrich Drepper says here:

http://people.redhat.com/drepper/posix-signal-model.xml

that SIGSEGV should be delivered only to the generating thread; the document lists changes to be made to Linux, so maybe this is implemented in 2.6 and not in 2.4. On the other hand, he also says that signal handlers are process-wide, so we should be safe anyway. And in any case, the code works perfectly on 2.4 hosts.

--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
* Re: [uml-devel] handle_trap - failed to wait at end of syscall
From: Joe Marzot @ 2004-09-14 16:09 UTC
To: BlaisorBlade; +Cc: Jeff Dike, user-mode-linux-devel, Joe Marzot, Smith, Paul [BL60:NP52:EXCH]

BlaisorBlade wrote:
>> Figuring out why it is should point us at the correct fix.

Agreed...unfortunately I do not really understand where the SIGHUP is coming from exactly. It is being delivered to the userspace thread, since the waitpid() in the kernel thread returns it in status. In our case the UML is launched like this:

perl script
	my $pid = fork();
	if ($pid == 0) {
		setpgrp(); # give all UMLs the same group id so I can renice them
		exec($cmd);

where $cmd is something like:

'exec linux umid=foo ubd0=cow,rootfs mem=256M con0=xterm con=pts eth0=tuntap,tap0,02:00:00:04:00:01, fakehd fake_ide < /dev/null > /tmp/uml.log'

We have pretty well correlated the SIGHUP delivery with the exit of the parent perl script, although it occurs only about 10% of the time. If we put a delay before the script exits it still produces the same crash rate, just delayed.

> Well, I've a situation where I consistently get SIGSEGV instead of SIGHUP
> here, but only on 2.6 host. The scenario is to do "echo 0 > /proc/sysemu".
> You can test that with 2.6.9-rc2 or with 2.6.7-bb6 (both include /proc/sysemu
> support).
>
> The problem (at least in my scenario) is that the signal in 2.4 is delivered
> only to the kernel thread, while on 2.6 (for some reason) it is delivered
> first to the userspace thread. You too mentioned 2.6 signal delivery changes
> as the reason for some fixes.

In my case I see the signal is being delivered to the userspace process. I am running a 2.4.18-19.8.0 RHish host with the SKAS3 patch and a 2.4.22ish guest w/ the 2.4.24-1um patch. No NPTL here anywhere that I know of.

> So, Joe, since you can get this panic consistently, could you try reproducing
> the scenario on a 2.4 host kernel? I guess you shouldn't be able, but I could
> be wrong. Also, a 2.4 RH kernel does not qualify as a true 2.4 host kernel,
> since it contains some NPTL code - if you can, try just a 2.4 vanilla + SKAS.

I am not sure I understand the request - I am already using a 2.4 host. I would like to help though, if you can think of something I can do with the base I have. I have no /proc/sysemu on guest or host.

> About the fix, most signals get delivered to all threads, so we can probably
> safely ignore them when received through waitpid(). But Ulrich Drepper says
> here:
>
> http://people.redhat.com/drepper/posix-signal-model.xml

cool article - thanks.

> that SIGSEGV should be delivered only to the generating thread; the document
> lists changes to be done to Linux, so maybe this is implemented in 2.6 and
> not in 2.4. However, OTOH, he also says that signal handlers are
> process-wide, so we should be safe anyway. And anyway, the code works
> perfectly on 2.4 hosts.
* Re: [uml-devel] handle_trap - failed to wait at end of syscall
From: Jeff Dike @ 2004-09-14 21:23 UTC
To: Joe Marzot; +Cc: BlaisorBlade, user-mode-linux-devel, Smith, Paul [BL60:NP52:EXCH]

gmarzot@nortelnetworks.com said:
> In our case the UML is launched like this:
> perl script
> 	my $pid = fork();
> 	if ($pid == 0) {
> 		setpgrp(); # give all UMLs the same group id so I can renice them
> 		exec($cmd);

I think I understand what's happening. You are (unwittingly) sending UML (and every process that belongs to it) a HUP when you, in effect, detach it from its parent terminal.

My first thought was that SIGHUP isn't disabled in the userspace process, and it should be, but it is, so I don't know why it's even being seen by the ptracer. It should only see signals which are enabled in the child.

In the meantime, while we figure this out, you might try just adding "nohup" to the beginning of your $cmd.

Jeff
* Re: [uml-devel] handle_trap - failed to wait at end of syscall
From: Richard Potter @ 2004-09-15 5:00 UTC
To: Jeff Dike; +Cc: Joe Marzot, BlaisorBlade, user-mode-linux-devel, Smith, Paul [BL60:NP52:EXCH]

SBUML (sbuml.sf.net) had a big SIGHUP problem that produced intermittent crashes, also about 10% of the time. After definitively tracing the problem to SIGHUP, it seemed like nohup in bash would solve it, but I did not have any luck with it. The solution that finally worked was using setsid:

setsid linux umid=foo .....

--Richard

> gmarzot@nortelnetworks.com said:
> > In our case the UML is launched like this:
> > perl script
> > 	my $pid = fork();
> > 	if ($pid == 0) {
> > 		setpgrp(); # give all UMLs the same group id so I can renice them
> > 		exec($cmd);
>
> I think I understand what's happening. You are (unwittingly) sending UML
> (and every process that belongs to it) a HUP when you, in effect, detach it
> from its parent terminal.
>
> My first thought was that SIGHUP isn't disabled in the userspace process,
> and it should be, but it is, so I don't know why it's even being seen by
> the ptracer. It should only see signals which are enabled in the child.
>
> In the meantime, while we figure this out, you might try just adding "nohup"
> to the beginning of your $cmd.
>
> Jeff
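For a launcher written in C rather than perl or shell, the setsid approach described above might look roughly like the sketch below. It is only an illustration under assumptions: spawn_in_new_session and wait_exit_code are hypothetical names, the exit codes 126 and 127 are arbitrary choices, and nothing here has been checked against the SIGHUP problem reported in this thread. The child calls setsid() before exec, so the program starts in its own session with no controlling terminal.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, put the child in a new session, then exec argv[0] from PATH.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t spawn_in_new_session(char *const argv[])
{
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid == 0) {
        /* Child: a freshly forked process is never a process group
         * leader, so setsid() is allowed here. */
        if (setsid() < 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127);  /* only reached if exec failed */
    }
    return pid;
}

/* Wait for the child and return its exit code, or -1 if it did not
 * exit normally or could not be waited for. */
int wait_exit_code(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass the same argument vector it would otherwise hand to exec, for example the linux binary followed by its umid=... options.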
* Re: [uml-devel] handle_trap - failed to wait at end of syscall
From: Joe Marzot @ 2004-09-15 19:35 UTC
To: Jeff Dike; +Cc: Joe Marzot, BlaisorBlade, user-mode-linux-devel, Smith, Paul [BL60:NP52:EXCH]

Jeff Dike wrote:
> gmarzot@nortelnetworks.com said:
>
>> In our case the UML is launched like this:
>> perl script
>> 	my $pid = fork();
>> 	if ($pid == 0) {
>> 		setpgrp(); # give all UMLs the same group id so I can renice them
>> 		exec($cmd);
>
> I think I understand what's happening. You are (unwittingly) sending UML
> (and every process that belongs to it) a HUP when you, in effect, detach it
> from its parent terminal.

Where am I detaching from the parent terminal? I just did a little test: instead of invoking UML, I started a little perl script in exactly the same way as above to catch SIGHUP...but no SIGHUP arrives.

> My first thought was that SIGHUP isn't disabled in the userspace process,
> and it should be, but it is, so I don't know why it's even being seen by
> the ptracer. It should only see signals which are enabled in the child.

more oddness.

> In the meantime, while we figure this out, you might try just adding "nohup"
> to the beginning of your $cmd.

I originally tried this but something bad happened...I forget what. I could go back and see, and also try setsid as the other poster suggested.

regards, GSM

> Jeff
Thread overview: 23+ messages
2004-08-11 15:32 [uml-devel] debugging UML cores Joe Marzot
2004-08-12  5:41 ` Jeff Dike
2004-08-12 15:21 ` Joe Marzot
2004-08-12 16:56 ` Jeff Dike
2004-08-12 16:16 ` Joe Marzot
2004-08-12 15:36 ` Joe Marzot
2004-08-12 15:47 ` Joe Marzot
2004-08-13 15:46 ` [uml-devel] handle_trap - failed to wait at end of syscall [was Re: [uml-devel] debugging UML cores] Joe Marzot
2004-08-13 18:01 ` Joe Marzot
2004-08-13 21:47 ` Jeff Dike
2004-08-16 17:47 ` [uml-devel] Re: handle_trap - failed to wait at end of syscall [was Re: [uml-devel] " Joe Marzot
2004-08-16 19:25 ` Joe Marzot
2004-08-16 19:53 ` D. Bahi
2004-08-17  5:26 ` Jeff Dike
2004-08-20 11:46 ` handle_trap - failed to wait at end of syscall [was Re: [uml-devel] " BlaisorBlade
2004-09-13 15:39 ` [uml-devel] handle_trap - failed to wait at end of syscall Joe Marzot
2004-09-13 19:39 ` BlaisorBlade
2004-09-13 22:14 ` Jeff Dike
2004-09-14 10:41 ` BlaisorBlade
2004-09-14 16:09 ` Joe Marzot
2004-09-14 21:23 ` Jeff Dike
2004-09-15  5:00 ` Richard Potter
2004-09-15 19:35 ` Joe Marzot