From: Frank Steiner
Date: Fri, 01 Oct 2004 07:38:53 +0000
Subject: Re: Hanging udev process on nfs-mounted /dev
Message-Id: <415D098D.7070707@bio.ifi.lmu.de>
References: <415980BF.1020401@bio.ifi.lmu.de>
In-Reply-To: <415980BF.1020401@bio.ifi.lmu.de>
To: linux-hotplug@vger.kernel.org

Hi,

here we go :-) On reboot, one of the clients ran into the hanging udev process. Although the timeout patch was applied, the hanging udev process was not killed. It apparently blocked a lot of other processes, since /var/log/messages contains "timeout reached" messages. I had to reboot the PC (the professor's client :-)), but I tried to collect all the information that might be helpful.

I've put all the logs on a website. They include /var/log/messages from the point where the system booted until it hung, a "ps -aux" output taken while udev was hanging, and the straces for all udev processes started during the boot. Recall that I replaced /sbin/udev and /sbin/udevstart with a wrapper script that runs

strace -o /var/log/udev.log.`uname -n`.${$} -f /sbin/utest/`basename $0` $@

and moved the original udev and udevstart to /sbin/utest/.

All the information is here: http://www.bio.ifi.lmu.de/~steiner/udev/

The udev traces are sorted in "ls -lat" order. The hanging udev process had PID 9700; the matching strace is udev.log.noether.9652 (the file name carries the PID of the wrapper shell, not of the traced udev process, and strace -f prefixes each line with the PID of the traced process). After calling "pkill udev" to make the host usable again, three of the strace logs changed. Both versions of those are listed, so one can see what happened after the kill (I don't know if this helps).

Again, the udev process hung after F_SETLKW: ...
9700 fcntl64(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start(8, len=1}) = 0
9700 fcntl64(5, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, startt924, len=1}) = 0
9700 fcntl64(5, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, startt924, len=1}) = 0
9700 fcntl64(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start4, len=1}) = 0
9700 --- SIGALRM (Alarm clock) @ 0 (0) ---
9700 time([1096612648]) = 1096612648
9700 rt_sigaction(SIGPIPE, {0x40116ae0, [], SA_RESTORER, 0x40067aa8}, {SIG_DFL}, 8) = 0
9700 send(0, "<14>Oct 1 08:37:28 udev: error:"..., 137, 0) = 137
9700 rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
9700 sigreturn() = ? (mask now [])

And after "pkill udev", these lines were added:

9700 --- SIGTERM (Terminated) @ 0 (0) ---
9700 munmap(0x4001a000, 81920) = 0
9700 close(5) = 0
9700 exit_group(35) = ?

I hope this information helps!

cu,
Frank

-- 
Dipl.-Inform. Frank Steiner   Web:  http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik     Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17            Phone: +49 89 2180-4049
80333 Muenchen, Germany        Fax:   +49 89 2180-99-4049
* You can only understand recursion once you have understood recursion. *
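For anyone digging through these logs: because strace -f prefixes every line with the PID of the traced process, the log that holds a given PID (such as 9700) can be located with grep, and its last calls printed. This is a minimal sketch, assuming the logs are named udev.log.<host>.<pid> as produced by the wrapper described above; the helper names find_trace_for_pid and last_calls_for_pid are hypothetical and not part of udev or strace.

```shell
#!/bin/sh
# Hypothetical helpers for reading logs written by a wrapper that runs
#   strace -o /var/log/udev.log.`uname -n`.${$} -f ...
# strace -f prefixes each line with the PID of the traced process,
# so a PID can be matched at the start of a line.

# find_trace_for_pid PID [DIR]
# Print the name of every udev.log.* file in DIR (default /var/log)
# that contains at least one line written by PID.
find_trace_for_pid() {
    local pid="$1"
    local dir="${2:-/var/log}"
    # Anchor the PID at line start and require whitespace after it,
    # so that PID 970 does not match lines from PID 9700.
    grep -l -E "^${pid}[[:space:]]" "$dir"/udev.log.* 2>/dev/null
}

# last_calls_for_pid PID FILE [COUNT]
# Print the last COUNT (default 5) lines that PID wrote to FILE,
# e.g. to see which call a hanging process made last.
last_calls_for_pid() {
    local pid="$1"
    local file="$2"
    local count="${3:-5}"
    grep -E "^${pid}[[:space:]]" "$file" | tail -n "$count"
}
```

With the logs described above, "find_trace_for_pid 9700" would be expected to name udev.log.noether.9652, and "last_calls_for_pid 9700 /var/log/udev.log.noether.9652" shows where that process stopped. Matching on the line prefix rather than on the file name is the point here, since the file name carries the wrapper's PID, not the traced process's.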