linux-hotplug.vger.kernel.org archive mirror
* Hanging udev process on nfs-mounted /dev
@ 2004-09-28 15:18 Frank Steiner
From: Frank Steiner @ 2004-09-28 15:18 UTC
  To: linux-hotplug

Hi,

I also sent this to the NFS list, because I'm not sure whether this is an
NFS or a udev problem. I hope it's ok to ask here!


The issue:
=====
From time to time a udev process goes mad and consumes almost all of
the CPU, making the whole system terribly slow.

The software:
=============
SuSE 9.1, kernel 2.6.8.1, udev 032, klibc 0.179, sysfsutils 1.1.0,
hotplug 0.44. Hotplug is the SuSE package; udev, klibc, sysfsutils and
the kernel have been replaced by newer versions.

My guess:
=========
Maybe something related to NFS? The udev process hangs in or right after
an F_SETLKW call; see the traces below.

We have diskless clients that mount their own /dev over NFS from a
server (each client has its own /dev, not shared with other clients).
We maintain some permanent links like /dev/cdrecorder -> /dev/hdd, so
/dev has to live on a persistent filesystem, and since the clients have
no local hard disks, I guess that means NFS.
I thought about using a ramdisk for /dev on the clients and having udev
set up such links, but I'm not sure whether a ramdisk for /dev is better
than an NFS mount. Would that be worth a try?

/dev is mounted in /etc/init.d/boot, before any other start script
runs. It is therefore mounted with "nolock", because lockd etc. are not
running yet at that point.

How I got the logs:
=========
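I moved /sbin/udev to /sbin/utest/ and created a wrapper script /sbin/udev
containing:

    #!/bin/bash
    # Run the real binary from /sbin/utest/ under strace, following forks (-f),
    # with one log file per host name and wrapper PID.
    strace -f -o "/var/log/udev.log.$(uname -n).$$" "/sbin/utest/$(basename "$0")" "$@"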

The first two logs are from the same udev call made during system start,
probably by /etc/init.d/boot.device-mapper. The .hangs log is from the
hanging udev process. We could not kill it manually because there was no
shell at that point, so after a while we hard-rebooted the computer.
The udevstart.succeeds log is from the same computer, five boots later,
so the error does not occur on every boot.

The other two logs are from a "/sbin/udev block" call issued by
/etc/hotplug/block.agent when we ran "pktsetup mycd /dev/cdrecorder"
manually. Here we killed the process after 15 minutes, as you can see.

The logs are almost identical, apart from some differences in locking.
For example, the hanging pktsetup process issues some calls like
    fcntl64(0, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=4, len=1}) = -1 EAGAIN (Resource temporarily unavailable).

The main difference is at the end: the hanging process hangs after an F_SETLKW call:

udevstart.hangs:
========
...
1073  unlink("/dev/mapper/control")     = 0
1073  symlink("../device-mapper", "/dev/mapper/control") = 0
1073  fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=332, len=1}) = 0
1073  fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
<hangs forever>

udevstart.succeeds:
===================
...
1073  unlink("/dev/mapper/control")     = 0
1073  symlink("../device-mapper", "/dev/mapper/control") = 0
1073  fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=332, len=1}) = 0
1073  fcntl64(0, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=332, len=1}) = 0
1073  open("/etc/dev.d/device-mapper", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
1073  open("/etc/dev.d/misc", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
1073  open("/etc/dev.d/default", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
1073  munmap(0x4001a000, 16384)         = 0
1073  close(0)                          = 0
1073  exit_group(0)                     = ?

udev.pktsetup.hangs:
==========
...
lstat64("/sys/block/pktcdvd0/device", 0xbffff37c) = -1 ENOENT (No such file or directory)
time(NULL)                              = 1095859917
fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=424, len=1}) = 0
fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) --- < we killed it here >
munmap(0x4001a000, 114688)              = 0
close(0)                                = 0
exit_group(35)                          = ?


udev.pktsetup.succeeds:
===========
...
3161  chmod("/dev/pktcdvd0", 060600)    = 0
3161  fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=424, len=1}) = 0
3161  fcntl64(0, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
3161  fcntl64(0, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=164, len=1}) = 0
3161  fcntl64(0, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=424, len=1}) = 0
3161  open("/etc/dev.d/pktcdvd0", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
3161  open("/etc/dev.d/block", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
3161  open("/etc/dev.d/default", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = -1 ENOENT (No such file or directory)
3161  munmap(0x4001a000, 32768)         = 0
3161  close(0)                          = 0
3161  exit_group(0)                     = ?

Could this be a problem with /dev being mounted over NFS with "nolock",
or just some bug in NFS?

I didn't include the full logs because they are quite long. They are here
in case someone needs to take a look:
http://www.bio.ifi.lmu.de/~steiner/udevstart.hangs
http://www.bio.ifi.lmu.de/~steiner/udevstart.succeeds
http://www.bio.ifi.lmu.de/~steiner/udev.pktsetup.hangs
http://www.bio.ifi.lmu.de/~steiner/udev.pktsetup.succeeds

I greatly appreciate any hints, because this problem hits our hosts
quite often and users cannot kill this udev process, so they have
to find some admin to kill it :-(

cu,
Frank



-- 
Dipl.-Inform. Frank Steiner   Web:  http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik    Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17           Phone: +49 89 2180-4049
80333 Muenchen, Germany       Fax:   +49 89 2180-99-4049
* You can only understand recursion once you have understood recursion. *


_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel



Thread overview: 26+ messages
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
2004-09-29 17:18 ` Greg KH
2004-09-29 23:39 ` Kay Sievers
2004-09-30  2:11 ` Kay Sievers
2004-09-30  6:18 ` Frank Steiner
2004-09-30  6:21 ` Frank Steiner
2004-09-30 14:07 ` Kay Sievers
2004-10-01  6:25 ` Frank Steiner
2004-10-01  7:36 ` Kay Sievers
2004-10-01  7:38 ` Frank Steiner
2004-10-01  7:55 ` Frank Steiner
2004-10-01  8:08 ` Kay Sievers
2004-10-01  9:43 ` Frank Steiner
2004-10-01  9:57 ` Kay Sievers
2004-10-01 10:43 ` Kay Sievers
2004-10-01 22:18 ` Kay Sievers
2004-10-03 21:10 ` Frank Steiner
2004-10-03 23:07 ` Kay Sievers
2004-10-04  6:15 ` Frank Steiner
2004-10-04 14:19 ` Kay Sievers
2004-10-04 14:53 ` Frank Steiner
2004-10-05 15:37 ` Kay Sievers
2004-10-06  6:06 ` Frank Steiner
2004-10-06 12:00 ` Kay Sievers
2004-10-06 12:29 ` Frank Steiner
2004-10-08  5:59 ` Frank Steiner
