From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nishanth Aravamudan
Date: Mon, 25 Oct 2004 17:27:18 +0000
Subject: Re: udev hangs under high loads
Message-Id: <20041025172718.GD2209@us.ibm.com>
List-Id:
References: <20041023054119.GA11915@kroah.com>
In-Reply-To: <20041023054119.GA11915@kroah.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-hotplug@vger.kernel.org

On Mon, Oct 25, 2004 at 04:51:14AM +0200, Kay Sievers wrote:
> On Sun, Oct 24, 2004 at 07:08:41AM +0200, Kay Sievers wrote:
> > On Sat, Oct 23, 2004 at 08:31:04AM +0200, Kay Sievers wrote:
> > > On Fri, Oct 22, 2004 at 10:41:19PM -0700, Greg KH wrote:
> > > > One of my coworkers (hi Nish) is trying to get 10000 disk support tested
> > > > and working properly with Linux and ran into a nasty udev issue
> > > > yesterday. It seems that under a very high load, with a whole bunch of
> > > > hotplug events happening (for when the disks get added to the system),
> > > > udev hangs.
> > > >
> > > > It hangs in the "grabbing the database lock" portion of the code (don't
> > > > have actual logs of where it hangs, will try to get that next week.)
> > > > But the interesting thing is we almost handle everything properly.
> > > > udev creates the node, tries to write to the database. Then the timer
> > > > expires and we report this. After the timer expires, udev is done for.
> > > > It just sits and spins, doing a nanosleep constantly. Having 500 of those
> > > > instances all running at once, all at a nice level of -10, is a sure way
> > > > to bring a box (even a relatively big one) down hard.
> > > >
> > > > So, while I'll agree finding the root tdb locking bug is a good idea, I
> > > > wanted to point out that perhaps we should just exit udev if our timeout
> > > > expires, instead of still waiting around. Or do you have a better
> > > > solution?
> > > >
> > > Maybe the time the udev process locks the db is too long for that setup
> > > and the serialization of the concurrent locks will take longer than the
> > > timeout of the process?
> > >
> > > Here is a patch that tries to limit the access time to the db.
> > > The current udev opens the db, reads all the rules, writes to the
> > > db, executes the scripts and then closes the db. With this patch we open
> > > the db after the rules and close it directly after writing to it.
> > >
> > > A rate limit in udevd may help here too, to keep that under control.
> > >
> > > > Try testing this out on your own using the scsi_debug module and adding
> > > > a few hundred disks. It also helps if you have scsi generic support
> > > > enabled, as that creates 2 udev events for every disk created.
> > >
> > > I expect that this is completely different on an SMP machine, but I will
> > > try again. I once tried it with 200 disks and this was working well.
> >
> > Here is another idea. I've ripped out the tdb code completely. We are
> > maintaining a directory of files at /dev/.udevdb/* now. Every node will
> > have a corresponding file (the slashes in DEVPATH are replaced by another
> > char), that carries all necessary data. The files are human readable:
> >
> > [root@pim default]# ls -l /dev/.udevdb/*hda*
> > -rw-r--r-- 1 root root 26 Oct 24 06:32 /dev/.udevdb/block@hda
> > -rw-r--r-- 1 root root 32 Oct 24 06:32 /dev/.udevdb/block@hda@hda1
> > -rw-r--r-- 1 root root 32 Oct 24 06:32 /dev/.udevdb/block@hda@hda2
> > -rw-r--r-- 1 root root 32 Oct 24 06:32 /dev/.udevdb/block@hda@hda3
> > -rw-r--r-- 1 root root 54 Oct 24 06:32 /dev/.udevdb/block@hdc
> >
> > [root@pim default]# cat /dev/.udevdb/block@hdc
> > P:/block/hdc
> > N:hdc
> > S:cdrom dvd cdwriter dvdwriter
> > A:0
> >
> > This way we have _no_ locking at all in userspace, every event operation
> > is completely independent from other events. No need to synchronize anything,
> > which is expected to be much much faster and reliable on your 10.000 disk
> > system. It may also work over NFS.
> >
> > The patch is complete, but not well tested. I just stopped working on
> > it, after the regression test was successful and my box was able to
> > reboot and remove the symlinks grabbed from the files :)
>
> New patch with a fix for a bug introduced in udevinfo.

I am going to rerun my test with this patch applied. I will respond soon
with the results.

-Nish

_______________________________________________
Linux-hotplug-devel mailing list http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel
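
For readers following the thread, here is a minimal sketch of the
one-file-per-device layout described in the quoted message. It is not code
from the patch. It assumes that the replacement character is '@' and that the
leading slash of DEVPATH is dropped, both inferred from the sample listing
(/block/hda/hda1 becomes block@hda@hda1). The function names db_filename,
write_record and read_record are hypothetical, Python is used only for
brevity, and the write-to-temp-then-rename step is an added assumption about
how a reader could avoid seeing a half-written file. The meaning of the
P/N/S/A keys is not interpreted; they are stored as opaque key:value lines.

```python
import os
import tempfile


def db_filename(devpath):
    """Map a DEVPATH to a flat file name ('/' replaced by '@', assumed)."""
    return devpath.lstrip("/").replace("/", "@")


def format_record(fields):
    """Serialize (key, value) pairs as 'K:value' lines, one per pair."""
    return "".join("%s:%s\n" % (key, value) for key, value in fields)


def parse_record(text):
    """Parse 'K:value' lines back into a list of (key, value) pairs."""
    fields = []
    for line in text.splitlines():
        if not line:
            continue
        key, _, value = line.partition(":")
        fields.append((key, value))
    return fields


def write_record(db_dir, devpath, fields):
    """Write one device's record; no lock is shared with other devices."""
    path = os.path.join(db_dir, db_filename(devpath))
    # Assumption: write to a temp file in the same directory, then rename,
    # so a concurrent reader sees either the old file or the new one.
    fd, tmp = tempfile.mkstemp(dir=db_dir)
    with os.fdopen(fd, "w") as f:
        f.write(format_record(fields))
    os.replace(tmp, path)
    return path


def read_record(db_dir, devpath):
    """Read one device's record from its own file."""
    with open(os.path.join(db_dir, db_filename(devpath))) as f:
        return parse_record(f.read())
```

Each event touches only the file named after its own DEVPATH, for example
write_record("/dev/.udevdb", "/block/hdc", [("P", "/block/hdc"), ("N", "hdc")]),
so concurrent events for different devices never contend for a shared
database lock.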