From: Frank Steiner
Date: Tue, 28 Sep 2004 15:18:23 +0000
Subject: Hanging udev process on nfs-mounted /dev
Message-Id: <415980BF.1020401@bio.ifi.lmu.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-hotplug@vger.kernel.org

Hi,

I also sent this to the NFS list, because I'm not sure whether this is an NFS or a udev problem. I hope it's OK to ask here!

The issue:
==========

From time to time a udev process goes mad and consumes almost all the CPU power, making the whole system terribly slow.

The software:
=============

SuSE 9.1, kernel 2.6.8.1, udev 032, klibc 0.179, sysfsutils 1.1.0, hotplug 0.44. Only hotplug comes from the SuSE package; the rest (udev, klibc, sysfsutils, kernel) has been replaced by newer versions.

My guess:
=========

Maybe something related to NFS? The udev process hangs at/after calling F_SETLKW; see the traces below.

We have diskless clients which mount their own /dev over NFS from a server (not shared with other clients; each client has its own /dev). Since the clients don't have local hard disks, I guess this cannot be done without NFS: we maintain some permanent links like /dev/cdrecorder -> /dev/hdd, so we need a persistent filesystem for /dev. I thought about using a ramdisk for /dev on the clients and letting udev set up such links, but I'm not sure whether a ramdisk for /dev is better than an NFS mount. Would that be worth a try?

/dev is mounted in /etc/init.d/boot, before any other start script runs. Thus it is mounted with "nolock", because at that point no lockd etc. is running.

How I got the logs:
===================

I moved /sbin/udev to /sbin/utest/ and created a script /sbin/udev with

#!/bin/bash
# Wrapper: run the real binary from /sbin/utest under strace (-f follows
# child processes), writing one log file per host name and wrapper PID.
strace -o /var/log/udev.log.$(uname -n).$$ -f /sbin/utest/$(basename "$0") "$@"

The first two logs are from the same udev call, initiated during system start, most likely by /etc/init.d/boot.device-mapper. The .hangs log is from the hanging udev process.
The hanging process was not killed manually, because we did not have a shell at this point, so after some time we hard-rebooted the computer. The .succeeds log is from the same computer, 5 boots later, so the error does not occur every time.

The other two logs are from a "/sbin/udev block" call initiated by /etc/hotplug/block.agent when we ran "pktsetup mycd /dev/cdrecorder" manually. Here we killed the hanging process after 15 minutes, as you can see.

Within each pair, the logs are almost identical, apart from some differences with respect to locking. For example, the hanging pktsetup process issues calls such as

fcntl64(0, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=4, len=1}) = -1 EAGAIN (Resource temporarily unavailable)

The main difference is at the end: the hanging process hangs after an F_SETLKW call.

udevstart.hangs:
================
...
1073 unlink("/dev/mapper/control") = 0
1073 symlink("../device-mapper", "/dev/mapper/control") = 0
1073 fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start32, len=1}) = 0
1073 fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start4, len=1}) = 0

udevstart.succeeds:
===================
...
1073 unlink("/dev/mapper/control") = 0
1073 symlink("../device-mapper", "/dev/mapper/control") = 0
1073 fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start32, len=1}) = 0
1073 fcntl64(0, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start32, len=1}) = 0
1073 open("/etc/dev.d/device-mapper", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
1073 open("/etc/dev.d/misc", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
1073 open("/etc/dev.d/default", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
1073 munmap(0x4001a000, 16384) = 0
1073 close(0) = 0
1073 exit_group(0) = ?

udev.pktsetup.hangs:
====================
...
lstat64("/sys/block/pktcdvd0/device", 0xbffff37c) = -1 ENOENT (No such file or directory)
time(NULL) = 1095859917
fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, startB4, len=1}) = 0
fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start4, len=1}) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
< we killed it here >
munmap(0x4001a000, 114688) = 0
close(0) = 0
exit_group(35) = ?

udev.pktsetup.succeeds:
=======================
...
3161 chmod("/dev/pktcdvd0", 060600) = 0
3161 fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, startB4, len=1}) = 0
3161 fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start4, len=1}) = 0
3161 fcntl64(0, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start4, len=1}) = 0
3161 fcntl64(0, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, startB4, len=1}) = 0
3161 open("/etc/dev.d/pktcdvd0", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
3161 open("/etc/dev.d/block", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
3161 open("/etc/dev.d/default", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
3161 munmap(0x4001a000, 32768) = 0
3161 close(0) = 0
3161 exit_group(0) = ?

Could this be a problem related to /dev being mounted with "nolock" via NFS, or is it just some bug in NFS?

I didn't include the full logs because they are quite long. They are here in case someone needs to take a look:

http://www.bio.ifi.lmu.de/~steiner/udevstart.hangs
http://www.bio.ifi.lmu.de/~steiner/udevstart.succeeds
http://www.bio.ifi.lmu.de/~steiner/udev.pktsetup.hangs
http://www.bio.ifi.lmu.de/~steiner/udev.pktsetup.succeeds

I'd greatly appreciate any hints, because this problem hits our hosts quite often, and users cannot kill the hanging udev process, so they have to find an admin to kill it :-(

cu,
Frank

-- 
Dipl.-Inform. Frank Steiner   Web:   http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail:  http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* You can only understand recursion once you have understood recursion. *

_______________________________________________
Linux-hotplug-devel mailing list http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel