From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from sc8-sf-mx2-b.sourceforge.net ([10.3.1.12] helo=sc8-sf-mx2.sourceforge.net)
	by sc8-sf-list1.sourceforge.net with esmtp (Exim 4.30)
	id 1C6svw-0006rX-Co
	for user-mode-linux-devel@lists.sourceforge.net; Mon, 13 Sep 2004 08:40:00 -0700
Received: from zrtps0kn.nortelnetworks.com ([47.140.192.55])
	by sc8-sf-mx2.sourceforge.net with esmtp (Exim 4.34)
	id 1C6svv-0005UU-NG
	for user-mode-linux-devel@lists.sourceforge.net; Mon, 13 Sep 2004 08:40:00 -0700
Message-ID: <4145BF3F.3090502@nortelnetworks.com>
From: Joe Marzot
MIME-Version: 1.0
Subject: Re: [uml-devel] handle_trap - failed to wait at end of syscall
References: <200408120541.i7C5faJd010923@ccure.user-mode-linux.org>
	<411B8E89.5040407@nortelnetworks.com>
	<411CE23C.4070903@nortelnetworks.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: user-mode-linux-devel-admin@lists.sourceforge.net
Errors-To: user-mode-linux-devel-admin@lists.sourceforge.net
List-Id: The user-mode Linux development list
Date: Mon, 13 Sep 2004 11:39:43 -0400
To: Jeff Dike
Cc: user-mode-linux-devel@lists.sourceforge.net

Remember this one? The latest take on it is that, because we launch
UMLs from a Perl script (using fork/exec), a SIGHUP is sent to the UML
process when the Perl script exits, and that sometimes interrupts a
waitpid(). If the interruption occurs during the nullification of a
syscall (now that I know what that means :) then you get a kernel panic
like the one below.

I made a small fix that seems to be working for me and looks like
what's going on in CATCH_EINTR:

    do {
        CATCH_EINTR(err = waitpid(pid, &status, WUNTRACED));
    } while (WIFSTOPPED(status) && (WSTOPSIG(status) == SIGHUP));

I can't do this globally in CATCH_EINTR, since some waitpid() callers
don't check status. Maybe they should; maybe there is a more correct
way to do this altogether.

Thoughts?
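To make the loop above concrete, here is a minimal, self-contained sketch in C. It is not UML code: wait_ignoring_sighup, the wait_fn typedef and fake_waitpid are hypothetical names introduced for illustration, and the inner loop only assumes that CATCH_EINTR retries while the call fails with EINTR. Unlike the snippet above, the sketch checks the return value before looking at status. The scripted stand-in replays an EINTR, then the status 383 reported from the core (a stop on SIGHUP), then a SIGTRAP stop, using the Linux wait-status encoding.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Same signature as waitpid(), so a scripted stand-in can be substituted. */
typedef pid_t (*wait_fn)(pid_t pid, int *status, int options);

/*
 * Hypothetical helper (not part of UML): wait for `pid`, retrying when the
 * call is interrupted (EINTR) and waiting again when the child is reported
 * as stopped by SIGHUP. Returns whatever the final wait call returned.
 */
static pid_t wait_ignoring_sighup(wait_fn do_wait, pid_t pid, int *status)
{
    pid_t ret;

    assert(status != NULL);
    do {
        /* Assumed equivalent of CATCH_EINTR: retry while failing with EINTR. */
        do {
            errno = 0;
            ret = do_wait(pid, status, WUNTRACED);
        } while (ret < 0 && errno == EINTR);
        /* Only inspect status if the wait succeeded. */
    } while (ret > 0 && WIFSTOPPED(*status) && WSTOPSIG(*status) == SIGHUP);

    return ret;
}

/* Scripted stand-in for waitpid(), for demonstration only. */
static int fake_calls = 0;

static pid_t fake_waitpid(pid_t pid, int *status, int options)
{
    (void)options;
    switch (fake_calls++) {
    case 0:                               /* interrupted system call */
        errno = EINTR;
        return -1;
    case 1:                               /* 383 == 0x17f: stopped by SIGHUP */
        *status = 383;
        return pid;
    default:                              /* stopped by SIGTRAP (Linux encoding) */
        *status = (SIGTRAP << 8) | 0x7f;
        return pid;
    }
}
```

In real use the first argument would simply be waitpid, whose signature matches wait_fn. Whether a traced child also has to be resumed before waiting again is not covered by this sketch.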
regards,
Giovanni

Marzot, Joe [BL60:NP72:EXCH] wrote:
> Joe Marzot wrote:
>
>> here is a better one produced under similar conditions - this time the
>> core is readable (I do get the unreadable cores quite often though).
>>
>> this is host RH8 + skas3 patch
>>
>> guest is 2.4.2x + 2.4.24-1um
>
>
> so looking deeper in this core in handle_trap I see the call to waitpid
> fails with a status of 383 and an err of 13456
>
> (gdb) p err
> $6 = 13456 (the pid of the child who exited)
> (gdb) p status
> $7 = 383
>
> WSTOPSIG(status) = SIGHUP
>
> does this give any clues...any ideas of what else to look at?
>
> thanks, GSM
>
>>
>> [root@wbl6y227 plankton]# /usr/local/builds/gdb-6.2/gdb/gdb -c
>> ~szhimin/tmp/joe/core.13456
>> /view/build_neptune_dev_int144.resp3/vob/neptune/plankton/celp/linux.celp
>> GNU gdb 6.2
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "i686-pc-linux-gnu"...Using host
>> libthread_db library "/lib/libthread_db.so.1".
>>
>> Core was generated by `/vob/neptune/plankton/celp/linux.celp
>> (DSC-0-0-0) [nameServer] '.
>> Program terminated with signal 6, Aborted.
>> #0 0xa01643e1 in kill ()
>>    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> 486 case 1: COMMON("\n\tstosb"); return s;
>> (gdb) where
>> #0 0xa01643e1 in kill ()
>>    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> #1 0xa018cbdb in raise ()
>>    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> #2 0xa01646cd in abort ()
>>    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> #3 0xa00d01e4 in handle_trap (pid=13461, regs=0xa5f7827c) at process.c:90
>> #4 0xa00d0438 in userspace (regs=0xa5f7827c) at process.c:168
>> #5 0xa00d0bfa in fork_handler (sig=10) at process_kern.c:102
>> #6
>> #7 0xa01643e1 in kill ()
>>    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> #8 0xa00d4734 in os_usr1_process (pid=13456) at process.c:95
>> #9 0xa00d04ce in new_thread (stack=Cannot access memory at address 0x8
>> ) at process.c:205
>> Previous frame inner to this frame (corrupt stack?)
>> (gdb) info thr
>> * 1 process 13456 0xa01643e1 in kill ()
>>    at /localdisk/builds/3pc/2.4.22-i686sim/2.4.22/include/asm/arch/string.h:486
>> (gdb)
>
> _______________________________________________
> User-mode-linux-devel mailing list
> User-mode-linux-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel