From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kay Sievers
Date: Sat, 23 Oct 2004 06:31:04 +0000
Subject: Re: udev hangs under high loads
Message-Id: <20041023063104.GA23512@vrfy.org>
MIME-Version: 1
Content-Type: multipart/mixed; boundary="C7zPtVaVf+AK4Oqc"
List-Id: 
References: <20041023054119.GA11915@kroah.com>
In-Reply-To: <20041023054119.GA11915@kroah.com>
To: linux-hotplug@vger.kernel.org

--C7zPtVaVf+AK4Oqc
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Fri, Oct 22, 2004 at 10:41:19PM -0700, Greg KH wrote:
> One of my coworkers (hi Nish) is trying to get 10000 disk support tested
> and working properly with linux and ran into a nasty udev issue
> yesterday.  It seems that under a very high load, with a whole bunch of
> hotplug events happening (for when the disks get added to the system),
> udev hangs.
>
> It hangs in the "grabbing the database lock" portion of the code (don't
> have actual logs of where it hangs, will try to get that next week.)
> But the interesting thing is we almost handle everything properly.
> udev creates the node, tries to write to the database.  Then the timer
> expires and we report this.  After the timer expires, udev is done for.
> It just sits and spins, doing a nanosleep constantly.  Have 500 of those
> instances all running at once, all at a nice level of -10 is a sure way
> to bring a box (even a relatively big one) down hard.
>
> So, while I'll agree finding the root tdb locking bug is a good idea, I
> wanted to point out that perhaps we should just exit udev if our timeout
> expires, instead of still waiting around.  Or do you have a better
> solution?

Maybe each udev process holds the db lock too long for that setup, so
serializing all the concurrent lock requests takes longer than the
timeout of the process?

Here is a patch that tries to shorten the time each udev process keeps
the db open. The current udev opens the db, reads all the rules, writes
to the db, executes the scripts and then closes the db.

With this patch we open the db only after reading the rules, and close
it directly after writing to it. A rate limit in udevd may also help
here, to keep the number of concurrently running udev processes under
control.

> Try testing this out on your own using the scsi_debug module and adding
> a few hundred disks.  It also helps if you have scsi generic support
> enabled, as that creates 2 udev events for every disk created.

I expect this behaves completely differently on an SMP machine, but I
will try again. I once tried it with 200 disks and it worked well.

Kay

--C7zPtVaVf+AK4Oqc
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline; filename="udev-db-small-window-01.patch"

===== udev.c 1.74 vs edited =====
--- 1.74/udev.c	2004-10-20 06:13:33 +02:00
+++ edited/udev.c	2004-10-23 08:03:29 +02:00
@@ -156,15 +156,17 @@ int main(int argc, char *argv[], char *e
 	/* trigger timout to interrupt blocking syscalls */
 	alarm(ALARM_TIMEOUT);
 
-	/* initialize udev database */
-	if (udevdb_init(UDEVDB_DEFAULT) != 0)
-		info("error: unable to initialize database, continuing without database");
-
 	switch(act_type) {
 	case UDEVSTART:
 		dbg("udevstart");
 		namedev_init();
+
+		/* add node for every device in sysfs, store everything in db and run scripts */
+		if (udevdb_init(UDEVDB_DEFAULT) != 0)
+			info("error: unable to initialize database, continuing without database");
 		retval = udev_start();
+		udevdb_exit();
+
 		break;
 	case ADD:
 		dbg("udev add");
@@ -182,7 +184,10 @@ int main(int argc, char *argv[], char *e
 		namedev_init();
 
 		/* name, create node, store in db */
+		if (udevdb_init(UDEVDB_DEFAULT) != 0)
+			info("error: unable to initialize database, continuing without database");
 		retval = udev_add_device(&udev, class_dev);
+		udevdb_exit();
 
 		/* run scripts */
 		dev_d_execute(&udev);
@@ -192,15 +197,15 @@ int main(int argc, char *argv[], char *e
 	case REMOVE:
 		dbg("udev remove");
 
-		/* get node from db, delete it*/
+		/* get node from db, delete node, delete db entry */
+		if (udevdb_init(UDEVDB_DEFAULT) != 0)
+			info("error: unable to initialize database, continuing without database");
 		retval = udev_remove_device(&udev);
+		udevdb_exit();
 
 		/* run scripts */
 		dev_d_execute(&udev);
 	}
-
-	udevdb_exit();
-
 exit:
 	logging_close();
 	return retval;

--C7zPtVaVf+AK4Oqc--

_______________________________________________
Linux-hotplug-devel mailing list  http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel