From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kay Sievers
Date: Thu, 30 Sep 2004 14:07:22 +0000
Subject: Re: Hanging udev process on nfs-mounted /dev
Message-Id: <1096553242.5010.29.camel@localhost.localdomain>
List-Id:
References: <415980BF.1020401@bio.ifi.lmu.de>
In-Reply-To: <415980BF.1020401@bio.ifi.lmu.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-hotplug@vger.kernel.org

On Thu, 2004-09-30 at 08:18 +0200, Frank Steiner wrote:
> Kay Sievers wrote
>
> >>Seems we have two different problems here, one that sounds like a loop
> >>consuming all the CPU and another one, like the trace, which looks like
> >>a F_SETLKW deadlock.
> >>The traces are indicating a deadlock, where processes are simply waiting
> >>for each other for a write-lock on the udev.tdb to be released.
>
> That would match my observation that there seemed to be 2-3 udev processes
> started almost at the same time. Since I recorded all the udev traces
> with the ppid in the log name, I could see that there were always three
> processes started close together (the log files having the same timestamp
> and the ppids not differing much, like pids 29465, 29470 and 29473), so
> they might deadlock.

It would be nice to know whether one process is possibly spinning at
this time and blocking all the other processes, or whether there is a
"real" deadlock, where all processes are blocked in the lock call.
You may increase the alarm() timeout to have more than 20 seconds to
investigate this :)

> However, also note that these problems so far occurred only on hosts
> having /dev/ mounted via NFS. Maybe the slow NFS traffic (in comparison
> to the local hard disk) is well-suited for triggering the deadlock.

Sounds possible.

> > Here is a patch that implements a timeout for the dead udev process. After
> > 20 seconds the lock system call is interrupted and the error debug from tdb
> > is logged to the syslog.
> > I needed to port the sleep() calls, cause they
> > are not compatible with alarm().
>
> Thanks for the patch, I will apply it and try to reproduce the situations!
> If I get a log, I will send it here.
>
> A general question: Someone on the NFS mailing list proposed to remove
> the NFS mount for dev and replace it by some tmpfs mounted on /dev.
> SuSE is not really prepared for it, so udevstart misses a lot of devices
> like /dev/stderr etc., but I could hack this myself easily.
> Is it safe to assume that one should have less problems with a tmpfs
> dev compared to a NFS mount?

Yeah, it does not sound very sane to do concurrent writing to the same
file over nfs without proper locking. A local tmpfs-based /dev seems more
appropriate for that. It should be faster anyway and there is no reason
to store the /dev anywhere while using udev.

Thanks,
Kay