* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
@ 2004-09-29 17:18 ` Greg KH
2004-09-29 23:39 ` Kay Sievers
` (23 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Greg KH @ 2004-09-29 17:18 UTC (permalink / raw)
To: linux-hotplug
On Tue, Sep 28, 2004 at 05:18:23PM +0200, Frank Steiner wrote:
> Hi,
>
> I also sent this to the NFS list, because I'm not sure if this is an
> NFS or an udev problem. I hope it's ok to ask here!
>
>
> The issue:
> =====
> From time to time some udev process goes mad and consumes almost all
> the CPU power, making the whole system terribly slow.
This isn't an NFS-specific bug. I've had a number of reports of this in
the past. It traces back to a tdb "issue" where the internal database
links get messed up and loop back on themselves.
Unfortunately I haven't had the time to look into this fully, but
hopefully near the end of this week I will.
I'll post here if I find anything out.
thanks,
greg k-h
-------------------------------------------------------
This SF.net email is sponsored by: IT Product Guide on ITManagersJournal
Use IT products in your business? Tell us what you think of them. Give us
Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more
http://productguide.itmanagersjournal.com/guidepromo.tmpl
_______________________________________________
Linux-hotplug-devel mailing list http://linux-hotplug.sourceforge.net
Linux-hotplug-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-hotplug-devel
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
2004-09-29 17:18 ` Greg KH
@ 2004-09-29 23:39 ` Kay Sievers
2004-09-30 2:11 ` Kay Sievers
` (22 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Kay Sievers @ 2004-09-29 23:39 UTC (permalink / raw)
To: linux-hotplug
On Wed, 2004-09-29 at 10:18 -0700, Greg KH wrote:
> On Tue, Sep 28, 2004 at 05:18:23PM +0200, Frank Steiner wrote:
> > Hi,
> >
> > I also sent this to the NFS list, because I'm not sure if this is an
> > NFS or an udev problem. I hope it's ok to ask here!
> >
> >
> > The issue:
> > =====
> > From time to time some udev process goes mad and consumes almost all
> > the CPU power, making the whole system terribly slow.
>
> This isn't an NFS-specific bug. I've had a number of reports of this in
> the past. It traces back to a tdb "issue" where the internal database
> links get messed up and loop back on themselves.
Seems we have two different problems here, one that sounds like a loop
consuming all the CPU and another one, like the trace, which looks like
an F_SETLKW deadlock.
The traces are indicating a deadlock, where processes are simply waiting
for each other for a write-lock on the udev.tdb to be released.
Kay
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
2004-09-29 17:18 ` Greg KH
2004-09-29 23:39 ` Kay Sievers
@ 2004-09-30 2:11 ` Kay Sievers
2004-09-30 6:18 ` Frank Steiner
` (21 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Kay Sievers @ 2004-09-30 2:11 UTC (permalink / raw)
To: linux-hotplug
[-- Attachment #1: Type: text/plain, Size: 1972 bytes --]
On Thu, Sep 30, 2004 at 01:39:46AM +0200, Kay Sievers wrote:
> On Wed, 2004-09-29 at 10:18 -0700, Greg KH wrote:
> > On Tue, Sep 28, 2004 at 05:18:23PM +0200, Frank Steiner wrote:
> > > Hi,
> > >
> > > I also sent this to the NFS list, because I'm not sure if this is an
> > > NFS or an udev problem. I hope it's ok to ask here!
> > >
> > >
> > > The issue:
> > > ==========
> > > From time to time some udev process goes mad and consumes almost all
> > > the CPU power, making the whole system terribly slow.
> >
> > This isn't an NFS-specific bug. I've had a number of reports of this in
> > the past. It traces back to a tdb "issue" where the internal database
> > links get messed up and loop back on themselves.
>
> Seems we have two different problems here, one that sounds like a loop
> > consuming all the CPU and another one, like the trace, which looks like
> > an F_SETLKW deadlock.
> The traces are indicating a deadlock, where processes are simply waiting
> for each other for a write-lock on the udev.tdb to be released.
Here is a patch that implements a timeout for the deadlocked udev process.
After 20 seconds the blocking lock system call is interrupted and the error
debug output from tdb is logged to syslog. I needed to port the sleep()
calls, because they are not compatible with alarm().
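
The ported wait loops in the attached patch all follow the same pattern: check
a condition, sleep a fraction of a second with usleep(), and give up after a
fixed number of tries. A minimal standalone sketch of that pattern follows. It
reuses the WAIT_FOR_FILE_SECONDS and WAIT_FOR_FILE_RETRY_FREQ values from the
patch; wait_for_file() is a hypothetical helper name, and stat() merely stands
in for the sysfs lookups udev actually performs.

```c
#define _DEFAULT_SOURCE          /* for usleep() under strict -std= modes */
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Values taken from the patch to udev.h. */
#define WAIT_FOR_FILE_SECONDS    10
#define WAIT_FOR_FILE_RETRY_FREQ 10

/*
 * Hypothetical helper: poll until `path` exists.
 * Makes at most `tries` attempts and sleeps 1/WAIT_FOR_FILE_RETRY_FREQ
 * seconds after each failed one. Returns 0 if found, -1 otherwise.
 */
int wait_for_file(const char *path, int tries)
{
	struct stat st;

	assert(tries > 0);
	while (tries--) {
		if (stat(path, &st) == 0)
			return 0;
		usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
	}
	return -1;
}
```

A caller would pass WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ as
`tries`, which keeps the overall limit at roughly 10 seconds while checking
ten times per second instead of once.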
As I can't reproduce this on my box, I locked the complete database with a
simple test program. A deadlock in the db-open call now looks like this:
udev: main: looking at '/block/hda'
udev: error: timout reached, node probably not created, please report to <linux-hotplug-devel@lists.sourceforge.net>
udev: tdb_brlock failed (fd=4) at offset 0 rw_type=1 lck_type=7
udev: tdb_open_ex: failed to get global lock on /dev/.udev.tdb: Interrupted system call
udev: udevdb_init: unable to initialize database at '/dev/.udev.tdb'
udev: main: unable to initialize database
Maybe this will help to bring some light into the tdb failure.
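
The timeout itself comes down to arming alarm() before the blocking
fcntl(F_SETLKW) call and installing the SIGALRM handler without SA_RESTART, so
that the call fails with EINTR instead of being restarted. The following is a
minimal standalone sketch of that idea, not the patch's code:
lock_with_timeout() and on_alarm() are hypothetical names, the locked byte
range is arbitrary, and the handler only sets a flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_alarm;

static void on_alarm(int signum)
{
	(void)signum;
	got_alarm = 1;	/* only set a flag inside the handler */
}

/*
 * Hypothetical helper: take a write lock on byte 0 of `fd`, giving up
 * after `timeout_sec` seconds. Returns 0 on success; on timeout it
 * returns -1 with errno set to EINTR.
 */
int lock_with_timeout(int fd, unsigned int timeout_sec)
{
	struct sigaction act, old;
	struct flock fl;
	int ret, saved_errno;

	assert(timeout_sec > 0);	/* alarm(0) would cancel, not arm */

	memset(&act, 0, sizeof(act));
	act.sa_handler = on_alarm;
	sigemptyset(&act.sa_mask);
	act.sa_flags = 0;		/* no SA_RESTART: fcntl() returns EINTR */
	sigaction(SIGALRM, &act, &old);

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 1;

	got_alarm = 0;
	alarm(timeout_sec);
	ret = fcntl(fd, F_SETLKW, &fl);	/* blocks until locked or signalled */
	saved_errno = errno;
	alarm(0);			/* disarm a still-pending alarm */
	sigaction(SIGALRM, &old, NULL);
	errno = saved_errno;
	return ret;
}
```

With another process holding the byte range, a call such as
lock_with_timeout(fd, 20) should return -1 with errno set to EINTR after about
20 seconds, which corresponds to the "Interrupted system call" line in the log
above.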
Good luck,
Kay
[-- Attachment #2: udev-deadlock-debug-01.patch --]
[-- Type: text/plain, Size: 5874 bytes --]
===== namedev.c 1.146 vs edited =====
--- 1.146/namedev.c 2004-09-08 15:17:55 +02:00
+++ edited/namedev.c 2004-09-30 04:03:42 +02:00
@@ -29,7 +29,6 @@
#include <ctype.h>
#include <unistd.h>
#include <errno.h>
-#include <time.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
@@ -353,7 +352,6 @@ static struct bus_file {
{}
};
-#define SECONDS_TO_WAIT_FOR_FILE 10
static void wait_for_device_to_initialize(struct sysfs_device *sysfs_device)
{
/* sleep until we see the file for this specific bus type show up this
@@ -367,14 +365,14 @@ static void wait_for_device_to_initializ
struct bus_file *b = &bus_files[0];
struct sysfs_attribute *tmpattr;
int found = 0;
- int loop = SECONDS_TO_WAIT_FOR_FILE;
+ int loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
while (1) {
if (b->bus == NULL) {
if (!found)
break;
- /* sleep to give the kernel a chance to create the file */
- sleep(1);
+ /* give the kernel a chance to create the file */
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
--loop;
if (loop == 0)
break;
@@ -682,7 +680,6 @@ static struct sysfs_device *get_sysfs_de
{
struct sysfs_device *sysfs_device;
struct sysfs_class_device *class_dev_parent;
- struct timespec tspec;
int loop;
/* Figure out where the device symlink is at. For char devices this will
@@ -698,16 +695,14 @@ static struct sysfs_device *get_sysfs_de
if (class_dev_parent != NULL)
dbg("given class device has a parent, use this instead");
- tspec.tv_sec = 0;
- tspec.tv_nsec = 10000000; /* sleep 10 millisec */
- loop = 10;
+ loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
while (loop--) {
if (udev_sleep) {
if (whitelist_search(class_dev)) {
sysfs_device = NULL;
goto exit;
}
- nanosleep(&tspec, NULL);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
}
if (class_dev_parent)
@@ -729,11 +724,9 @@ device_found:
if (sysfs_device->bus[0] != '\0')
goto bus_found;
- loop = 10;
- tspec.tv_nsec = 10000000;
while (loop--) {
if (udev_sleep)
- nanosleep(&tspec, NULL);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
sysfs_get_device_bus(sysfs_device);
if (sysfs_device->bus[0] != '\0')
===== udev-add.c 1.73 vs edited =====
--- 1.73/udev-add.c 2004-08-05 00:41:08 +02:00
+++ edited/udev-add.c 2004-09-30 02:18:31 +02:00
@@ -340,11 +340,10 @@ exit:
/* wait for the "dev" file to show up in the directory in sysfs.
* If it doesn't happen in about 10 seconds, give up.
*/
-#define SECONDS_TO_WAIT_FOR_FILE 10
static int sleep_for_file(const char *path, char* file)
{
char filename[SYSFS_PATH_MAX + 6];
- int loop = SECONDS_TO_WAIT_FOR_FILE;
+ int loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
int retval;
strfieldcpy(filename, sysfs_path);
@@ -360,7 +359,7 @@ static int sleep_for_file(const char *pa
goto exit;
/* sleep to give the kernel a chance to create the dev file */
- sleep(1);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
}
retval = -ENODEV;
exit:
===== udev.c 1.62 vs edited =====
--- 1.62/udev.c 2004-09-14 02:25:32 +02:00
+++ edited/udev.c 2004-09-30 03:52:06 +02:00
@@ -36,6 +36,9 @@
#include "namedev.h"
#include "udevdb.h"
+/* timeout flag for udevdb */
+extern sig_atomic_t gotalarm;
+
/* global variables */
char **main_argv;
char **main_envp;
@@ -58,6 +61,11 @@ void log_message(int level, const char *
asmlinkage static void sig_handler(int signum)
{
switch (signum) {
+ case SIGALRM:
+ gotalarm = 1;
+ info("error: timout reached, node probably not created, "
+ "please report to <linux-hotplug-devel@lists.sourceforge.net> ");
+ break;
case SIGINT:
case SIGTERM:
udevdb_exit();
@@ -147,14 +155,21 @@ int main(int argc, char *argv[], char *e
/* set signal handlers */
act.sa_handler = sig_handler;
sigemptyset (&act.sa_mask);
+
+ /* alarm should interrupt */
+ sigaction(SIGALRM, &act, NULL);
+
act.sa_flags = SA_RESTART;
sigaction(SIGINT, &act, NULL);
sigaction(SIGTERM, &act, NULL);
+ /* trigger timout */
+ alarm(20);
+
/* initialize udev database */
if (udevdb_init(UDEVDB_DEFAULT) != 0) {
dbg("unable to initialize database");
- goto exit;
+ exit(1);
}
switch(act_type) {
===== udev.h 1.62 vs edited =====
--- 1.62/udev.h 2004-09-14 14:29:10 +02:00
+++ edited/udev.h 2004-09-30 02:46:13 +02:00
@@ -26,6 +26,8 @@
#include <sys/param.h>
#include "libsysfs/sysfs/libsysfs.h"
+#define WAIT_FOR_FILE_SECONDS 10
+#define WAIT_FOR_FILE_RETRY_FREQ 10
#define COMMENT_CHARACTER '#'
#define NAME_SIZE 256
===== udevdb.c 1.30 vs edited =====
--- 1.30/udevdb.c 2004-06-29 14:51:35 +02:00
+++ edited/udevdb.c 2004-09-30 03:01:59 +02:00
@@ -42,7 +42,19 @@
#include "tdb/tdb.h"
static TDB_CONTEXT *udevdb;
+sig_atomic_t gotalarm;
+static void tdb_log(TDB_CONTEXT *tdb, int level, const char *format, ...)
+{
+ va_list args;
+
+ if (!udev_log)
+ return;
+
+ va_start(args, format);
+ vsyslog(level, format, args);
+ va_end(args);
+}
int udevdb_add_dev(const char *path, const struct udevice *dev)
{
@@ -121,7 +133,9 @@ int udevdb_init(int init_flag)
if (init_flag != UDEVDB_DEFAULT && init_flag != UDEVDB_INTERNAL)
return -EINVAL;
- udevdb = tdb_open(udev_db_filename, 0, init_flag, O_RDWR | O_CREAT, 0644);
+ tdb_set_lock_alarm(&gotalarm);
+
+ udevdb = tdb_open_ex(udev_db_filename, 0, init_flag, O_RDWR | O_CREAT, 0644, tdb_log);
if (udevdb == NULL) {
if (init_flag == UDEVDB_INTERNAL)
dbg("unable to initialize in-memory database");
@@ -137,7 +151,7 @@ int udevdb_init(int init_flag)
*/
int udevdb_open_ro(void)
{
- udevdb = tdb_open(udev_db_filename, 0, 0, O_RDONLY, 0);
+ udevdb = tdb_open_ex(udev_db_filename, 0, 0, O_RDONLY, 0, tdb_log);
if (udevdb == NULL) {
dbg("unable to open database at '%s'", udev_db_filename);
return -EACCES;
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (2 preceding siblings ...)
2004-09-30 2:11 ` Kay Sievers
@ 2004-09-30 6:18 ` Frank Steiner
2004-09-30 6:21 ` Frank Steiner
` (20 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Frank Steiner @ 2004-09-30 6:18 UTC (permalink / raw)
To: linux-hotplug
Kay Sievers wrote
>>Seems we have two different problems here, one that sounds like a loop
>>consuming all the CPU and another one, like the trace, which looks like
>>an F_SETLKW deadlock.
>>The traces are indicating a deadlock, where processes are simply waiting
>>for each other for a write-lock on the udev.tdb to be released.
That would match my observation that there seemed to be 2-3 udev processes
started almost at the same time. Since I recorded all the udev traces
with the ppid in the log name, I could see that there were always three
processes started close together (the log files having the same timestamp
and the ppids not differing much, like pids 29465, 29470 and 29473), so
they might deadlock.
However, also note that these problems so far occurred only on hosts
having /dev/ mounted via NFS. Maybe the slow NFS traffic (in comparison
to the local hard disk) is well-suited for triggering the deadlock.
> Here is a patch that implements a timeout for the dead udev process. After
> 20 seconds the lock system call is interrupted and the error debug from tdb
> is logged to the syslog. I needed to port the sleep() calls, because they
> are not compatible with alarm().
Thanks for the patch, I will apply it and try to reproduce the situations!
If I get a log, I will send it here.
A general question: Someone on the NFS mailing list proposed to remove
the NFS mount for dev and replace it by some tmpfs mounted on /dev.
SuSE is not really prepared for it, so udevstart misses a lot of devices
like /dev/stderr etc., but I could hack this myself easily.
Is it safe to assume that one should have fewer problems with a tmpfs
dev compared to an NFS mount?
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (3 preceding siblings ...)
2004-09-30 6:18 ` Frank Steiner
@ 2004-09-30 6:21 ` Frank Steiner
2004-09-30 14:07 ` Kay Sievers
` (19 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Frank Steiner @ 2004-09-30 6:21 UTC (permalink / raw)
To: linux-hotplug
Just one more comment:
Kay Sievers wrote
>>The traces are indicating a deadlock, where processes are simply waiting
>>for each other for a write-lock on the udev.tdb to be released.
The /dev mounts are mounted with "nolock". Could that be a reason for
that problem? Usually I would expect both processes to write concurrently
without the "lock" option, maybe destroying the udev.tdb? But perhaps
the effect of "nolock" is different in this case, causing the deadlock...
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (4 preceding siblings ...)
2004-09-30 6:21 ` Frank Steiner
@ 2004-09-30 14:07 ` Kay Sievers
2004-10-01 6:25 ` Frank Steiner
` (18 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Kay Sievers @ 2004-09-30 14:07 UTC (permalink / raw)
To: linux-hotplug
On Thu, 2004-09-30 at 08:18 +0200, Frank Steiner wrote:
> Kay Sievers wrote
>
> >>Seems we have two different problems here, one that sounds like a loop
> >>consuming all the CPU and another one, like the trace, which looks like
> >>an F_SETLKW deadlock.
> >>The traces are indicating a deadlock, where processes are simply waiting
> >>for each other for a write-lock on the udev.tdb to be released.
>
> That would match my observation that there seemed to be 2-3 udev processes
> started almost at the same time. Since I recorded all the udev traces
> with the ppid in the log name, I could see that there were always three
> processes started close together (the log files having the same timestamp
> and the ppids not differing much, like pids 29465, 29470 and 29473), so
> they might deadlock.
It would be nice to know whether there is possibly one process spinning at
this time, which blocks all the other processes, or whether there is a "real"
deadlock, where all processes are blocking in the lock call.
You may increase the alarm() timeout to have more than 20 seconds to
investigate this :)
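
One way to tell the two cases apart from the outside is to ask the kernel
which process holds the conflicting lock. Below is a minimal sketch using
fcntl(F_GETLK), which is standard POSIX. write_lock_holder() is a hypothetical
helper name, and the byte range to probe is an assumption the caller has to
supply, for example an offset seen in an strace of a blocked process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: report who would block a write lock on the range
 * [start, start + len) of `fd`. Returns the pid of a process holding a
 * conflicting lock, 0 if the range is free, or -1 on error.
 */
pid_t write_lock_holder(int fd, off_t start, off_t len)
{
	struct flock fl;

	assert(start >= 0 && len >= 0);

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = start;
	fl.l_len = len;

	/* F_GETLK does not take the lock; it only describes a conflict. */
	if (fcntl(fd, F_GETLK, &fl) == -1)
		return -1;
	if (fl.l_type == F_UNLCK)
		return 0;
	return fl.l_pid;
}
```

If the returned pid belongs to a process that is burning CPU, the "one spinner
blocks everyone" case applies; if it belongs to a process that is itself
blocked in a lock call, it looks more like a real deadlock. Whether F_GETLK
reports anything useful on an NFS mount using "nolock" is not established in
this thread.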
> However, also note that these problems so far occurred only on hosts
> having /dev/ mounted via NFS. Maybe the slow NFS traffic (in comparison
> to the local hard disk) is well-suited for triggering the deadlock.
Sounds possible.
> > Here is a patch that implements a timeout for the dead udev process. After
> > 20 seconds the lock system call is interrupted and the error debug from tdb
> > is logged to the syslog. I needed to port the sleep() calls, because they
> > are not compatible with alarm().
>
> Thanks for the patch, I will apply it and try to reproduce the situations!
> If I get a log, I will send it here.
>
> A general question: Someone on the NFS mailing list proposed to remove
> the NFS mount for dev and replace it by some tmpfs mounted on /dev.
> SuSE is not really prepared for it, so udevstart misses a lot of devices
> like /dev/stderr etc., but I could hack this myself easily.
> Is it safe to assume that one should have fewer problems with a tmpfs
> dev compared to an NFS mount?
Yeah, it does not sound very sane to do concurrent writing to the same
file over nfs without proper locking. A local tmpfs-based /dev seems
more appropriate for that. It should be faster anyway and there is no
reason to store the /dev anywhere while using udev.
Thanks,
Kay
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (5 preceding siblings ...)
2004-09-30 14:07 ` Kay Sievers
@ 2004-10-01 6:25 ` Frank Steiner
2004-10-01 7:36 ` Kay Sievers
` (17 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Frank Steiner @ 2004-10-01 6:25 UTC (permalink / raw)
To: linux-hotplug
Kay Sievers wrote
> It would be nice to know whether there is possibly one process spinning at
> this time, which blocks all the other processes, or whether there is a "real"
> deadlock, where all processes are blocking in the lock call.
As far as I remember, when the udev process was running with 90% cpu time,
it was the only udev process (pgrep udev).
> You may increase the alarm() timeout to have more than 20 seconds to
> investigate this :)
I will try, but the hosts where the problem occurred most frequently are
desktop clients of research assistants. So it is not that easy to debug
it without stopping the users from working :-)
> Yeah, it does not sound very sane to do concurrent writing to the same
> file over nfs without proper locking. A local tmpfs-based /dev seems
> more appropriate for that. It should be faster anyway and there is no
> reason to store the /dev anywhere while using udev.
With debugging and logging enabled now (needed for your patch to compile),
I get lots of messages from udev broadcasted to every shell, which is
quite annoying for the users, because they get all their xterms filled:
Oct 1 05:46:50 noether udevinfo[336]: rec_read bad magic 0xd9fee666 at offset=7912
And this about 15 times, every time the card reader reconnects. On the
hosts where I already replaced /dev/ with a tmpfs, those messages do
not appear at all. Apart from /dev-over-nfs vs. /dev-over-tmpfs there
are no differences between these hosts. So it seems that the nfs causes
some other problems, that tmpfs is not suffering from. I would like
to switch all hosts to tmpfs immediately, but I'm afraid I won't get
any deadlocks anymore so that we cannot do useful debugging :-)
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (6 preceding siblings ...)
2004-10-01 6:25 ` Frank Steiner
@ 2004-10-01 7:36 ` Kay Sievers
2004-10-01 7:38 ` Frank Steiner
` (16 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Kay Sievers @ 2004-10-01 7:36 UTC (permalink / raw)
To: linux-hotplug
On Fri, 2004-10-01 at 08:25 +0200, Frank Steiner wrote:
> Kay Sievers wrote
>
> > It would be nice to know whether there is possibly one process spinning at
> > this time, which blocks all the other processes, or whether there is a "real"
> > deadlock, where all processes are blocking in the lock call.
>
> As far as I remember, when the udev process was running with 90% cpu time,
> it was the only udev process (pgrep udev).
This may be the process that blocks all the other ones. If you can find
one of these beasts, please attach gdb to the running process and look
if we find something in the backtrace. Here is a sample from my
"lock the whole file"-test application:
* [root@pim ~]# gdb -p 14727
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
Attaching to process 14727
...
Reading symbols from /home/kay/src/lock...(no debugging symbols found)...done.
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
0x0804839f in spin ()
* (gdb) bt
#0 0x0804839f in spin ()
#1 0x08048405 in main ()
* (gdb) q
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /home/kay/src/lock, process 14727
[root@pim ~]#
> With debugging and logging enabled now (needed for your patch to compile),
> I get lots of messages from udev broadcasted to every shell, which is
> quite annoying for the users, because they get all their xterms filled:
>
> Oct 1 05:46:50 noether udevinfo[336]: rec_read bad magic 0xd9fee666 at offset=7912
Oops, that is from the tdb code and indicates a corrupt database, which
is likely the reason for all the bad behavior. You may try to
"rm /dev/.udev.tdb" and see whether these messages go away. The next
udev run will create a new one.
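
The magic value in that message is informative by itself. Assuming the
constants below match the tdb copy bundled with udev (they are recalled from
tdb's source and should be checked against the tree), 0xd9fee666 is the
bitwise complement of the live-record magic, i.e. the marker tdb puts on
free-list records. That would suggest a reader followed a link to a record
that had been freed in the meantime. classify_magic() is a hypothetical
helper, shown for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, recalled from tdb's source; verify against tdb/tdb.c. */
#define TDB_MAGIC      0x26011999U
#define TDB_FREE_MAGIC (~TDB_MAGIC)
#define TDB_DEAD_MAGIC 0xFEE1DEADU

/* The value from the log is exactly the complement of TDB_MAGIC. */
static_assert(TDB_FREE_MAGIC == 0xd9fee666U,
	      "0xd9fee666 is the free-record magic");

/* Hypothetical helper: name the record state a magic value stands for. */
const char *classify_magic(uint32_t magic)
{
	if (magic == TDB_MAGIC)
		return "live record";
	if (magic == TDB_FREE_MAGIC)
		return "free record";
	if (magic == TDB_DEAD_MAGIC)
		return "dead record";
	return "unknown";
}
```

Under these assumptions, classify_magic(0xd9fee666U) returns "free record".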
Does "udevinfo -d" (database dump) print anything?
The /dev is stored on NFS and not cleaned and recreated with udevstart
before mounting, right? So the database may have been corrupt for a long
time?
Best,
Kay
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (7 preceding siblings ...)
2004-10-01 7:36 ` Kay Sievers
@ 2004-10-01 7:38 ` Frank Steiner
2004-10-01 7:55 ` Frank Steiner
` (15 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Frank Steiner @ 2004-10-01 7:38 UTC (permalink / raw)
To: linux-hotplug
Hi,
here we go :-) On reboot, one of the clients ran into the hanging
udev process. Although the timeout patch was applied, the hanging
udev process was not killed.
But it blocked a lot of other processes, because there are messages
about "timeout reached" in /var/log/messages. I had to reboot the
PC (the professor's client :-)), but I tried to collect all information
that might be helpful.
I've put all the logs on a website. They include /var/log/messages
from the point where the system booted until it hung, a "ps -aux" output
while udev was hanging, and the straces for all udev processes started
during the boot. Recall that I replaced /sbin/udev{start} by
strace -o /var/log/udev.log.`uname -n`.${$} -f /sbin/utest/`basename $0` $@
and moved the original udev and udevstart to /sbin/utest/.
All the information is here: http://www.bio.ifi.lmu.de/~steiner/udev/
The udev traces are sorted in "ls -lat" order.
The udev process that was hanging had pid 9700. The matching strace
is udev.log.noether.9652. After calling "pkill udev" to make the
host usable again, three straces were changed. Those are listed
with both versions, so that one can see what happened after killing
(don't know if this helps). Again, the hanging udev process hung
after F_SETLKW:
...
9700 fcntl64(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=288, len=1}) = 0
9700 fcntl64(5, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=74924, len=1}) = 0
9700 fcntl64(5, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=74924, len=1}) = 0
9700 fcntl64(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
9700 --- SIGALRM (Alarm clock) @ 0 (0) ---
9700 time([1096612648]) = 1096612648
9700 rt_sigaction(SIGPIPE, {0x40116ae0, [], SA_RESTORER, 0x40067aa8}, {SIG_DFL}, 8) = 0
9700 send(0, "<14>Oct 1 08:37:28 udev: error:"..., 137, 0) = 137
9700 rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
9700 sigreturn() = ? (mask now [])
And after "pkill udev" those lines were added:
9700 --- SIGTERM (Terminated) @ 0 (0) ---
9700 munmap(0x4001a000, 81920) = 0
9700 close(5) = 0
9700 exit_group(35) = ?
I hope this information helps!
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (8 preceding siblings ...)
2004-10-01 7:38 ` Frank Steiner
@ 2004-10-01 7:55 ` Frank Steiner
2004-10-01 8:08 ` Kay Sievers
` (14 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Frank Steiner @ 2004-10-01 7:55 UTC (permalink / raw)
To: linux-hotplug
Kay Sievers wrote
> This may be the process that blocks all the other ones. If you can find
> one of these beasts, please attach gdb to the running process and look
> if we find something in the backtrace. Here is a sample from my
> "lock the whole file"-test application:
Arrrgggh damn! I wish I had waited a bit longer before killing the process
so that I had read this mail first and could have tried gdb :-((
But since it was the professor's (my boss :-)) client, he wanted to have
it back working quickly.
I will try to get the lock again on another host by rebooting it over and
over again, maybe I can trigger the lock.
>>With debugging and logging enabled now (needed for your patch to compile),
>>I get lots of messages from udev broadcasted to every shell, which is
>>quite annoying for the users, because they get all their xterms filled:
>>
>>Oct 1 05:46:50 noether udevinfo[336]: rec_read bad magic 0xd9fee666 at offset=7912
>
>
> Oops, that is from the tdb-code and indicates a corrupt database, which
> is likely the reason for all the bad behavior. You may try to
> "rm /dev/.udev.tdb" and look if these messages are going away. The next
> udev run will create a new one.
Hmm, this sounds like the problem is NFS without locking. Maybe two
processes indeed write concurrently to the database, thus corrupting it.
That would also explain why I don't see any of these messages on the
tmpfs hosts.
I wish there was a solution for nfsroot with nfs locking :-(
> Does "udevinfo -d" (database dump) print anything?
Not very much, just 4 entries:
noether /var/log# udevinfo -d
P: /block/fd0
N: fd0
T: b
M: 060660
S:
O: root
G: disk
F:
L: 0
U: 55
P: /block/loop4
N: loop4
T: b
M: 060660
S:
O: root
G: disk
F:
L: 0
U: 55
P: /block/ram0
N: ram0
T: b
M: 060660
S:
O: root
G: disk
F:
L: 0
U: 56
P: /class/scsi_generic/sg0
N: sg0
T: c
M: 020640
S: by-path/usb-storage-00000000710D:0:0:0-generic
O: root
G: disk
F: /etc/udev/rules.d/udev.rules
L: 6
U: 498
noether /var/log#
Message from syslogd@noether at Fri Oct 1 09:44:59 2004 ...
noether udevinfo[11629]: rec_read bad magic 0xd9fee666 at offset=63176
And that's it. On a hardware-identical host with /dev being a tmpfs,
I have about 180 entries!
>
> The /dev is stored on nfs and not cleaned and recreated with udevstart
> before mounting, right? So the database may be corrupt since a long
> time?
boot.udev is run on boot, so it recreates the database on every start,
and thus, it looks like it gets corrupted again on almost every boot.
Note that my scenario here is a little bit mixed because I just started
using udev by backporting the hotplug stuff from SuSE 9.1 to SuSE 9.0.
But SuSE 9.1 is still using a static /dev and udev just for certain
things like hotplugging of e.g. usb devices or pktsetup etc. Since
most of the boot scripts from SuSE are not prepared for working with
an empty /dev, many devices are missing if I run udevstart on an empty
/dev. E.g, things like /dev/stdin etc. Because I didn't want to hack
every SuSE script, I kept their static devices (from a devs.rpm) but
boot.udev is still running.
I can try to reproduce it on another host to get the gdb stuff, but
I feel pretty sure now that it is a problem with the "nolock" mount
option for the NFS-based /dev...
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (9 preceding siblings ...)
2004-10-01 7:55 ` Frank Steiner
@ 2004-10-01 8:08 ` Kay Sievers
2004-10-01 9:43 ` Frank Steiner
` (13 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Kay Sievers @ 2004-10-01 8:08 UTC (permalink / raw)
To: linux-hotplug
On Fri, 2004-10-01 at 09:38 +0200, Frank Steiner wrote:
> Hi,
>
> here we go :-) On reboot, one of the clients ran into the hanging
> udev process. Although the timeout patch was applied, the hanging
> udev process was not killed.
That's ok. The signal handler does not kill the process. It is just a
timeout to interrupt a system call waiting for the kernel. The tdb code
returns unsuccessfully if it catches that timeout. The hanging udev version
is spinning by itself (not hanging in a system call) and therefore will
do that forever.
> But it blocked a lot of other processes because there are messages
> about "timeout reached" in /var/log/messages. I had to reboot the
> PC (the professor's client :-)), but I tried to collect all information
> that might be helpful.
Yes, sure, it is. We're getting closer.
> I've put all the logs on a website. They include /var/log/messages
> from the point where the system booted until it hung, a "ps -aux" output
> while udev was hanging, and the straces for all udev processes started
> during the boot. Recall that I replaced /sbin/udev{start} by
>
> strace -o /var/log/udev.log.`uname -n`.${$} -f /sbin/utest/`basename $0` $@
>
> and moved the original udev and udevstart to /sbin/utest/.
> All the information is here: http://www.bio.ifi.lmu.de/~steiner/udev/
> The udev traces are sorted in "ls -lat" order.
>
> The udev process that was hanging had pid 9700. The matching strace
> is udev.log.noether.9652. After calling "pkill udev" to make the
> host usable again, three straces were changed. Those are listed
> with both versions, so that one can see what happened after killing
> (don't know if this helps). Again, the hanging udev process hung
> after F_SETLKW:
> ...
> 9700 fcntl64(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=288, len=1}) = 0
> 9700 fcntl64(5, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=74924, len=1}) = 0
> 9700 fcntl64(5, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=74924, len=1}) = 0
> 9700 fcntl64(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
> 9700 --- SIGALRM (Alarm clock) @ 0 (0) ---
> 9700 time([1096612648]) = 1096612648
> 9700 rt_sigaction(SIGPIPE, {0x40116ae0, [], SA_RESTORER, 0x40067aa8}, {SIG_DFL}, 8) = 0
> 9700 send(0, "<14>Oct 1 08:37:28 udev: error:"..., 137, 0) = 137
> 9700 rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
> 9700 sigreturn() = ? (mask now [])
Yes, that's the fault. Seems that this process locks the db-file and
then keeps spinning forever without doing system calls. It's just a loop
inside of the tdb code. It consumed a lot of your CPU:
> root 9688 0.0 0.0 1696 600 ? S< 08:37 0:00 strace -o /var/log/udev.log.noether.9652 -f /sbin/utest/udev scsi_generic
> root 9700 99.9 0.0 1664 604 ? R< 08:37 17:37 /sbin/utest/udev scsi_generic
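For readers unfamiliar with the fcntl64() lines in the strace output above, here is a minimal sketch of how a one-byte POSIX record lock is requested from C. `lock_byte` is a hypothetical helper written for this note, not a function from tdb or udev. With `wait` set it uses F_SETLKW, which blocks until the lock is granted or a signal interrupts the call; without it, F_SETLK fails immediately if the byte is held by another process.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Apply a one-byte record lock at `offset` in the file behind fd.
 * type is F_WRLCK, F_RDLCK or F_UNLCK.
 * Returns 0 on success, -1 with errno set on failure.
 */
int lock_byte(int fd, off_t offset, short type, int wait)
{
    struct flock fl;

    assert(fd >= 0);

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;    /* offset is counted from the start of the file */
    fl.l_start = offset;
    fl.l_len = 1;              /* lock exactly one byte */

    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}
```

Note that a return value of 0 only says the local kernel accepted the request. Whether that gives real exclusion on a given filesystem is a separate question.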
Thanks,
Kay
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (10 preceding siblings ...)
2004-10-01 8:08 ` Kay Sievers
@ 2004-10-01 9:43 ` Frank Steiner
2004-10-01 9:57 ` Kay Sievers
` (12 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Frank Steiner @ 2004-10-01 9:43 UTC (permalink / raw)
To: linux-hotplug
Kay Sievers wrote
>>boot.udev is run on boot, so it recreates the database on every start,
>>and thus, it looks like it gets corrupted again on almost every boot.
>
>
> Weird! Please double check that this script is really running. There is
> nothing in the boot.msg file, but I don't know if there should be some.
It is definitely running: The link is in boot.d:
noether /root# ls -la /etc/init.d/boot.d/S01boot.udev
lrwxrwxrwx 1 root root 12 2004-09-30 08:57 /etc/init.d/boot.d/S01boot.udev -> ../boot.udev
and boot.udev contains:
case "$1" in
start)
if [ -x /sbin/udev -a -x /sbin/udevstart ] ; then
echo -n "creating device nodes "
rm -f /dev/.udev.tdb
/sbin/udevstart
You can also see this in the strace log of the first udev process,
which is the last on the page (the lowest pid number). The first
line of this strace is
1037 execve("/sbin/utest/udevstart", ["/sbin/utest/udevstart"], [/* 174 vars */]) = 0
A general question for understanding things better: Let's assume that
the errors indeed are caused by missing nfs locking on my /dev dir.
It sounds reasonable that udev must be able to rely on proper locking
for maintaining its database, so one should not expect it to work on
a fs without locking.
Would it be reasonable to issue a warning if udev detects it's running
on a fs without locking (if this is possible to detect)? Or, if in
case of missing locks the hangs cannot be prevented, udev could even
refuse to do any work. If one gets a message from udev "No locks available.
I will not create any devices until you give me locks" this would
definitely help people doing stupid things like mounting /dev via NFS :-)
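One conceivable way to detect this case is to look at the mount options instead of probing with fcntl(), since the traces in this thread show lock requests returning 0 on the affected host. The sketch below is an assumption-laden illustration, not anything udev does: `line_is_nfs_nolock` and `has_mount_option` are hypothetical helpers, and the sketch assumes the kernel lists "nolock" among the options in /proc/mounts. For an nfsroot setup the mount point to check would be "/" rather than "/dev".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the comma-separated option list contains exactly `opt`. */
int has_mount_option(const char *opts, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = opts;

    assert(len > 0);

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t n = end ? (size_t)(end - p) : strlen(p);

        if (n == len && strncmp(p, opt, len) == 0)
            return 1;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 0;
}

/*
 * Inspect one line in /proc/mounts format:
 *   device mountpoint fstype options dump pass
 * Return 1 if it describes an NFS mount at `dir` that carries "nolock".
 */
int line_is_nfs_nolock(const char *line, const char *dir)
{
    char dev[256], mnt[256], type[64], opts[512];

    if (sscanf(line, "%255s %255s %63s %511s", dev, mnt, type, opts) != 4)
        return 0;
    if (strcmp(mnt, dir) != 0)
        return 0;
    if (strncmp(type, "nfs", 3) != 0)    /* matches nfs, nfs4, ... */
        return 0;
    return has_mount_option(opts, "nolock");
}
```

A caller would read /proc/mounts line by line with fgets() and log a warning on the first match.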
I'm still trying to get another host hanging, so that I can login and
call gdb. I'll let you know if it happens :-)
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (11 preceding siblings ...)
2004-10-01 9:43 ` Frank Steiner
@ 2004-10-01 9:57 ` Kay Sievers
2004-10-01 10:43 ` Kay Sievers
` (11 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Kay Sievers @ 2004-10-01 9:57 UTC (permalink / raw)
To: linux-hotplug
On Fri, 2004-10-01 at 09:55 +0200, Frank Steiner wrote:
> Kay Sievers wrote
>
> > This may be the process that blocks all the other ones. If you can find
> > one of these beasts, please attach gdb to the running process and look
> > if we find something in the backtrace. Here is a sample from my
> > "lock the whole file"-test application:
>
>
> Arrrgggh damn! I wish I had waited a bit longer with killing the process
> so that I had read this mail before and could have tried the gdb :-((
> But since it was the professor's (my boss :-)) client, he wanted to have
> it back working quickly.
Hey, it already consumed 17 minutes of the CPU. I think he has a better
job for that CPU than a spinning udev :)
> >>Oct 1 05:46:50 noether udevinfo[336]: rec_read bad magic 0xd9fee666 at offset=7912
> >
> >
> > Oops, that is from the tdb-code and indicates a corrupt database, which
> > is likely the reason for all the bad behavior. You may try to
> > "rm /dev/.udev.tdb" and look if these messages are going away. The next
> > udev run will create a new one.
>
> Hmm, this sounds like the problem is NFS without locking. Maybe two
> processes indeed write concurrently to the database, thus corrupting it.
> That would also explain why I don't see any of these messages on the
> tmpfs hosts.
This is probably what happens here, yes.
> > Does "udevinfo -d" (database dump) print anything?
>
> Not very much, just 4 entries:
This database is definitely corrupt.
> > The /dev is stored on nfs and not cleaned and recreated with udevstart
> > before mounting, right? So the database may be corrupt since a long
> > time?
>
> boot.udev is run on boot, so it recreates the database on every start,
> and thus, it looks like it gets corrupted again on almost every boot.
Weird! Please double check that this script is really running. There is
nothing in the boot.msg file, but I don't know if there should be some.
Thanks,
Kay
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (12 preceding siblings ...)
2004-10-01 9:57 ` Kay Sievers
@ 2004-10-01 10:43 ` Kay Sievers
2004-10-01 22:18 ` Kay Sievers
` (10 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Kay Sievers @ 2004-10-01 10:43 UTC (permalink / raw)
To: linux-hotplug
On Fri, 2004-10-01 at 11:43 +0200, Frank Steiner wrote:
> A general question for understanding things better: Let's assume that
> the errors indeed are caused by missing nfs locking on my /dev dir.
> It sounds reasonable that udev must be able to rely on proper locking
> for maintaining its database, so one should not expect it to work on
> a fs without locking.
> Would it be reasonable to issue a warning if udev detects it's running
> on a fs without locking (if this is possible to detect)? Or, if in
> case of missing locks the hangs cannot be prevented, udev could even
> refuse to do any work. If one gets a message from udev "No locks available.
> I will not create any devices until you give me locks" this would
> definitely help people doing stupid things like mounting /dev via NFS :-)
We should find the loop bug first. Then the alarm() should be sufficient
to prevent a hanging udev. Yes, we may include a hint in the logged
error.
We still can create nodes with udev, even with a corrupt database (I
will change the alarm() patch to act like this later). Only the remove
event will eventually fail, if a rule has set a custom name for the
device.
Other programs asking with udevinfo, may also not work correctly, but
it's better to create the node without bookkeeping than to do nothing,
I think.
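The idea of creating the node even when bookkeeping is unavailable can be sketched as follows. Everything here is a toy stand-in chosen for illustration: `struct toy_db`, `db_record` and `handle_add` are hypothetical, a counter replaces the real node creation, and none of this is the actual udev code path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the database handle. */
struct toy_db {
    int entries;
};

/* Record a device name; fails cleanly when no database is available. */
int db_record(struct toy_db *db, const char *name)
{
    (void)name;
    if (db == NULL)
        return -1;
    db->entries++;
    return 0;
}

/*
 * Handle an "add" event. The node is always created (a counter stands in
 * for the real work); *recorded reports whether bookkeeping succeeded.
 */
int handle_add(struct toy_db *db, const char *name,
               int *nodes_created, int *recorded)
{
    assert(name != NULL);

    (*nodes_created)++;
    *recorded = (db_record(db, name) == 0);
    if (!*recorded)
        fprintf(stderr, "warning: no database, '%s' not recorded\n", name);
    return 0;
}
```

The cost of this choice is the one named above: without the record, a later remove event cannot look up a custom name.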
Best,
Kay
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (13 preceding siblings ...)
2004-10-01 10:43 ` Kay Sievers
@ 2004-10-01 22:18 ` Kay Sievers
2004-10-03 21:10 ` Frank Steiner
` (9 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Kay Sievers @ 2004-10-01 22:18 UTC (permalink / raw)
To: linux-hotplug
[-- Attachment #1: Type: text/plain, Size: 1931 bytes --]
On Fri, Oct 01, 2004 at 12:43:47PM +0200, Kay Sievers wrote:
> On Fri, 2004-10-01 at 11:43 +0200, Frank Steiner wrote:
>
> > A general question for understanding things better: Let's assume that
> > the errors indeed are caused by missing nfs locking on my /dev dir.
> > It sounds reasonable that udev must be able to rely on proper locking
> > for maintaining its database, so one should not expect it to work on
> > a fs without locking.
> > Would it be reasonable to issue a warning if udev detects it's running
> > on a fs without locking (if this is possible to detect)? Or, if in
> > case of missing locks the hangs cannot be prevented, udev could even
> > refuse to do any work. If one gets a message from udev "No locks available.
> > I will not create any devices until you give me locks" this would
> > definitely help people doing stupid things like mounting /dev via NFS :-)
>
> We should find the loop bug first. Then the alarm() should be sufficient
> to prevent a hanging udev. Yes, we may include a hint in the logged
> error.
>
> We still can create nodes with udev, even with a corrupt database (I
> will change the alarm() patch to act like this later). Only the remove
> event will eventually fail, if a rule has set a custom name for the
> device.
> Other programs asking with udevinfo, may also not work correctly, but
> it's better to create the node without bookkeeping than to do nothing,
> I think.
Here is a new patch to try to recover from a corrupted udev database. udev
will now continue without database support in that case and log the failure
to syslog. So the node should be generated in any case, remove will obviously
not work for custom names. All tdb errors will be logged to syslog.
I've added two iteration limits to the tdb-code at the places I expect
the endless loop. In case we try to find more than 100,000 db-entries
with the same hash, we better give up :)
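The effect of such an iteration limit can be shown on a toy hash chain, independent of tdb. In this sketch records live in an array and `next` holds the index of the following record, with 0 as the end marker, loosely mirroring how a zero offset ends a chain. `struct rec`, `chain_find` and the return codes are hypothetical names for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHAIN_STEPS 100000

struct rec {
    unsigned int key;
    size_t next;    /* index of the next record, 0 terminates the chain */
};

enum { FIND_NOT_FOUND = -1, FIND_CORRUPT = -2 };

/*
 * Walk the chain starting at `head` and return the index of the record
 * with the given key. A chain that links back onto itself is reported
 * as corrupt instead of being followed forever.
 */
long chain_find(const struct rec *recs, size_t nrecs,
                size_t head, unsigned int key)
{
    size_t cur = head;
    long steps = MAX_CHAIN_STEPS;

    assert(recs != NULL);

    while (cur != 0) {
        if (cur >= nrecs)
            return FIND_CORRUPT;    /* link points outside the table */
        if (recs[cur].key == key)
            return (long)cur;
        cur = recs[cur].next;
        if (--steps == 0)
            return FIND_CORRUPT;    /* too many hops: assume a loop */
    }
    return FIND_NOT_FOUND;
}
```

A fixed cap is crude compared with real cycle detection, but it is cheap and turns an endless spin into an error the caller can log.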
Good luck,
Kay
[-- Attachment #2: udev-deadlock-debug-02.patch --]
[-- Type: text/plain, Size: 8377 bytes --]
===== namedev.c 1.146 vs edited =====
--- 1.146/namedev.c 2004-09-08 15:17:55 +02:00
+++ edited/namedev.c 2004-10-01 22:17:52 +02:00
@@ -29,7 +29,6 @@
#include <ctype.h>
#include <unistd.h>
#include <errno.h>
-#include <time.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
@@ -353,7 +352,6 @@
{}
};
-#define SECONDS_TO_WAIT_FOR_FILE 10
static void wait_for_device_to_initialize(struct sysfs_device *sysfs_device)
{
/* sleep until we see the file for this specific bus type show up this
@@ -367,14 +365,14 @@
struct bus_file *b = &bus_files[0];
struct sysfs_attribute *tmpattr;
int found = 0;
- int loop = SECONDS_TO_WAIT_FOR_FILE;
+ int loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
while (1) {
if (b->bus == NULL) {
if (!found)
break;
- /* sleep to give the kernel a chance to create the file */
- sleep(1);
+ /* give the kernel a chance to create the file */
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
--loop;
if (loop == 0)
break;
@@ -394,7 +392,8 @@
}
if (!found)
dbg("did not find bus type '%s' on list of bus_id_files, "
- "contact greg@kroah.com", sysfs_device->bus);
+ "please report to <linux-hotplug-devel@lists.sourceforge.net>",
+ sysfs_device->bus);
exit:
return; /* here to prevent compiler warning... */
}
@@ -682,7 +681,6 @@
{
struct sysfs_device *sysfs_device;
struct sysfs_class_device *class_dev_parent;
- struct timespec tspec;
int loop;
/* Figure out where the device symlink is at. For char devices this will
@@ -698,16 +696,14 @@
if (class_dev_parent != NULL)
dbg("given class device has a parent, use this instead");
- tspec.tv_sec = 0;
- tspec.tv_nsec = 10000000; /* sleep 10 millisec */
- loop = 10;
+ loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
while (loop--) {
if (udev_sleep) {
if (whitelist_search(class_dev)) {
sysfs_device = NULL;
goto exit;
}
- nanosleep(&tspec, NULL);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
}
if (class_dev_parent)
@@ -729,11 +725,9 @@
if (sysfs_device->bus[0] != '\0')
goto bus_found;
- loop = 10;
- tspec.tv_nsec = 10000000;
while (loop--) {
if (udev_sleep)
- nanosleep(&tspec, NULL);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
sysfs_get_device_bus(sysfs_device);
if (sysfs_device->bus[0] != '\0')
===== udev-add.c 1.73 vs edited =====
--- 1.73/udev-add.c 2004-08-05 00:41:08 +02:00
+++ edited/udev-add.c 2004-09-30 02:18:31 +02:00
@@ -340,11 +340,10 @@
/* wait for the "dev" file to show up in the directory in sysfs.
* If it doesn't happen in about 10 seconds, give up.
*/
-#define SECONDS_TO_WAIT_FOR_FILE 10
static int sleep_for_file(const char *path, char* file)
{
char filename[SYSFS_PATH_MAX + 6];
- int loop = SECONDS_TO_WAIT_FOR_FILE;
+ int loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
int retval;
strfieldcpy(filename, sysfs_path);
@@ -360,7 +359,7 @@
goto exit;
/* sleep to give the kernel a chance to create the dev file */
- sleep(1);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
}
retval = -ENODEV;
exit:
===== udev.c 1.62 vs edited =====
--- 1.62/udev.c 2004-09-14 02:25:32 +02:00
+++ edited/udev.c 2004-10-01 23:50:59 +02:00
@@ -36,6 +36,9 @@
#include "namedev.h"
#include "udevdb.h"
+/* timeout flag for udevdb */
+extern sig_atomic_t gotalarm;
+
/* global variables */
char **main_argv;
char **main_envp;
@@ -58,6 +61,11 @@
asmlinkage static void sig_handler(int signum)
{
switch (signum) {
+ case SIGALRM:
+ gotalarm = 1;
+ info("error: timeout reached, event probably not handled correctly, "
+ "please report to <linux-hotplug-devel@lists.sourceforge.net> ");
+ break;
case SIGINT:
case SIGTERM:
udevdb_exit();
@@ -94,7 +102,8 @@
dbg("version %s", UDEV_VERSION);
- /* initialize our configuration */
+ init_logging("udev");
+
udev_init_config();
if (strstr(argv[0], "udevstart")) {
@@ -147,15 +156,20 @@
/* set signal handlers */
act.sa_handler = sig_handler;
sigemptyset (&act.sa_mask);
+
+ /* alarm should interrupt */
+ sigaction(SIGALRM, &act, NULL);
+
act.sa_flags = SA_RESTART;
sigaction(SIGINT, &act, NULL);
sigaction(SIGTERM, &act, NULL);
+ /* trigger timeout to interrupt blocking syscalls */
+ alarm(ALARM_TIMEOUT);
+
/* initialize udev database */
- if (udevdb_init(UDEVDB_DEFAULT) != 0) {
- dbg("unable to initialize database");
- goto exit;
- }
+ if (udevdb_init(UDEVDB_DEFAULT) != 0)
+ info("error: unable to initialize database, continuing without database");
switch(act_type) {
case UDEVSTART:
===== udev.h 1.62 vs edited =====
--- 1.62/udev.h 2004-09-14 14:29:10 +02:00
+++ edited/udev.h 2004-10-01 22:20:37 +02:00
@@ -26,6 +26,9 @@
#include <sys/param.h>
#include "libsysfs/sysfs/libsysfs.h"
+#define ALARM_TIMEOUT 20
+#define WAIT_FOR_FILE_SECONDS 10
+#define WAIT_FOR_FILE_RETRY_FREQ 10
#define COMMENT_CHARACTER '#'
#define NAME_SIZE 256
===== udevdb.c 1.30 vs edited =====
--- 1.30/udevdb.c 2004-06-29 14:51:35 +02:00
+++ edited/udevdb.c 2004-10-01 23:46:47 +02:00
@@ -42,13 +42,28 @@
#include "tdb/tdb.h"
static TDB_CONTEXT *udevdb;
+sig_atomic_t gotalarm;
+static void tdb_log(TDB_CONTEXT *tdb, int level, const char *format, ...)
+{
+ va_list args;
+
+ if (!udev_log)
+ return;
+
+ va_start(args, format);
+ vsyslog(level, format, args);
+ va_end(args);
+}
int udevdb_add_dev(const char *path, const struct udevice *dev)
{
TDB_DATA key, data;
char keystr[SYSFS_PATH_MAX];
+ if (udevdb == NULL)
+ return -1;
+
if ((path == NULL) || (dev == NULL))
return -ENODEV;
@@ -68,6 +83,9 @@
{
TDB_DATA key, data;
+ if (udevdb == NULL)
+ return -1;
+
if (path == NULL)
return -ENODEV;
@@ -88,6 +106,9 @@
TDB_DATA key;
char keystr[SYSFS_PATH_MAX];
+ if (udevdb == NULL)
+ return -1;
+
if (path == NULL)
return -EINVAL;
@@ -121,7 +142,9 @@
if (init_flag != UDEVDB_DEFAULT && init_flag != UDEVDB_INTERNAL)
return -EINVAL;
- udevdb = tdb_open(udev_db_filename, 0, init_flag, O_RDWR | O_CREAT, 0644);
+ tdb_set_lock_alarm(&gotalarm);
+
+ udevdb = tdb_open_ex(udev_db_filename, 0, init_flag, O_RDWR | O_CREAT, 0644, tdb_log);
if (udevdb == NULL) {
if (init_flag == UDEVDB_INTERNAL)
dbg("unable to initialize in-memory database");
@@ -137,7 +160,7 @@
*/
int udevdb_open_ro(void)
{
- udevdb = tdb_open(udev_db_filename, 0, 0, O_RDONLY, 0);
+ udevdb = tdb_open_ex(udev_db_filename, 0, 0, O_RDONLY, 0, tdb_log);
if (udevdb == NULL) {
dbg("unable to open database at '%s'", udev_db_filename);
return -EACCES;
@@ -159,6 +182,9 @@
int udevdb_call_foreach(int (*user_record_handler) (char *path, struct udevice *dev))
{
int retval = 0;
+
+ if (udevdb == NULL)
+ return -1;
if (user_record_handler == NULL) {
dbg("invalid user record handling function");
===== tdb/tdb.c 1.3 vs edited =====
--- 1.3/tdb/tdb.c 2003-12-17 01:23:27 +01:00
+++ edited/tdb/tdb.c 2004-10-01 23:53:17 +02:00
@@ -980,12 +980,14 @@
struct list_struct *r)
{
tdb_off rec_ptr;
-
+ int maxloop;
+
/* read in the hash top */
if (ofs_read(tdb, TDB_HASH_TOP(hash), &rec_ptr) == -1)
return 0;
/* keep looking until we find the right record */
+ maxloop = 100000;
while (rec_ptr) {
if (rec_read(tdb, rec_ptr, r) == -1)
return 0;
@@ -1005,6 +1007,12 @@
SAFE_FREE(k);
}
rec_ptr = r->next;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "tdb_find maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, 0);
+ }
}
return TDB_ERRCODE(TDB_ERR_NOEXIST, 0);
}
@@ -1187,6 +1195,7 @@
{
tdb_off last_ptr, i;
struct list_struct lastrec;
+ int maxloop;
if (tdb->read_only) return -1;
@@ -1201,9 +1210,18 @@
/* find previous record in hash chain */
if (ofs_read(tdb, TDB_HASH_TOP(rec->full_hash), &i) == -1)
return -1;
- for (last_ptr = 0; i != rec_ptr; last_ptr = i, i = lastrec.next)
+
+ maxloop = 100000;
+ for (last_ptr = 0; i != rec_ptr; last_ptr = i, i = lastrec.next) {
if (rec_read(tdb, i, &lastrec) == -1)
return -1;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "(tdb)do_delete: maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
+ }
+ }
/* unlink it: next ptr is at start of record. */
if (last_ptr == 0)
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (14 preceding siblings ...)
2004-10-01 22:18 ` Kay Sievers
@ 2004-10-03 21:10 ` Frank Steiner
2004-10-03 23:07 ` Kay Sievers
` (8 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Frank Steiner @ 2004-10-03 21:10 UTC (permalink / raw)
To: linux-hotplug
Hi,
I got two hosts locked with those hanging udev processes. I can leave
them in this state for at least the whole Monday.
Kay Sievers wrote
> This may be the process that blocks all the other ones. If you can find
> one of these beasts, please attach gdb to the running process and look
> if we find something in the backtrace. Here is a sample from my
> "lock the whole file"-test application:
>
> * [root@pim ~]# gdb -p 14727
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> Attaching to process 14727
> ...
> Reading symbols from /home/kay/src/lock...(no debugging symbols found)...done.
> Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/ld-linux.so.2
> 0x0804839f in spin ()
> * (gdb) bt
> #0 0x0804839f in spin ()
> #1 0x08048405 in main ()
> * (gdb) q
> The program is running. Quit anyway (and detach it)? (y or n) y
> Detaching from program: /home/kay/src/lock, process 14727
> [root@pim ~]#
Unfortunately, this fails :-(
GNU gdb 5.3.92
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux".
Attaching to process 24498
ptrace: Operation not permitted.
/export/localhome/root/24498: No such file or directory.
From googling I learned that this could be a problem with child processes
that were forked or sth. similar...
Anything else I can do on the hanging hosts to provide helpful information?
I hope that they don't crash too soon, because there are a lot of hanging
udev processes now, with only one taking all the CPU time. Likely the others
are all just waiting for the one to unlock the database...
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: -4054
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (15 preceding siblings ...)
2004-10-03 21:10 ` Frank Steiner
@ 2004-10-03 23:07 ` Kay Sievers
2004-10-04 6:15 ` Frank Steiner
` (7 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Kay Sievers @ 2004-10-03 23:07 UTC (permalink / raw)
To: linux-hotplug
On Sun, 2004-10-03 at 23:10 +0200, Frank Steiner wrote:
> Hi,
>
> I got two hosts locked with those hanging udev processes. I can leave
> them in this state for at least the whole Monday.
>
> Kay Sievers wrote
>
> > This may be the process that blocks all the other ones. If you can find
> > one of these beasts, please attach gdb to the running process and look
> > if we find something in the backtrace. Here is a sample from my
> > "lock the whole file"-test application:
> >
> > * [root@pim ~]# gdb -p 14727
> > Copyright 2004 Free Software Foundation, Inc.
> > GDB is free software, covered by the GNU General Public License, and you are
> > welcome to change it and/or distribute copies of it under certain conditions.
> > Type "show copying" to see the conditions.
> > There is absolutely no warranty for GDB. Type "show warranty" for details.
> > Attaching to process 14727
> > ...
> > Reading symbols from /home/kay/src/lock...(no debugging symbols found)...done.
> > Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
> > Loaded symbols for /lib/ld-linux.so.2
> > 0x0804839f in spin ()
> > * (gdb) bt
> > #0 0x0804839f in spin ()
> > #1 0x08048405 in main ()
> > * (gdb) q
> > The program is running. Quit anyway (and detach it)? (y or n) y
> > Detaching from program: /home/kay/src/lock, process 14727
> > [root@pim ~]#
>
> Unfortunately, this fails :-(
>
> GNU gdb 5.3.92
> Copyright 2003 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i586-suse-linux".
> Attaching to process 24498
> ptrace: Operation not permitted.
> /export/localhome/root/24498: No such file or directory.
>
>
> From googling I learned that this could be a problem with child processes
> that were forked or sth. similar...
Oh, bad.
It's your running strace that prevents gdb from taking control of the
process, I expect. Just try to send the strace parent-process a SIGUSR1
which may leave the udev process running.
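A process can have only one tracer at a time, which is why gdb is refused while strace is attached. On Linux the current tracer is listed in the TracerPid field of /proc/<pid>/status, so it can be checked before trying gdb. The parser below is a small sketch for that check: `parse_tracer_pid` is a hypothetical helper, and it takes the file contents as a string so that reading the file is left to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the TracerPid value from text in /proc/<pid>/status format.
 * Returns 0 if no tracer is attached, the tracer's PID otherwise,
 * and -1 if the field is not present.
 */
long parse_tracer_pid(const char *status_text)
{
    static const char field[] = "TracerPid:";
    const char *p;

    assert(status_text != NULL);

    p = status_text;
    while (p != NULL && *p != '\0') {
        /* the field name must start at the beginning of a line */
        if (strncmp(p, field, sizeof(field) - 1) == 0)
            return strtol(p + sizeof(field) - 1, NULL, 10);
        p = strchr(p, '\n');
        if (p != NULL)
            p++;
    }
    return -1;
}
```

A non-zero result names the process that has to detach or exit before another debugger can attach.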
Thanks,
Kay
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (16 preceding siblings ...)
2004-10-03 23:07 ` Kay Sievers
@ 2004-10-04 6:15 ` Frank Steiner
2004-10-04 14:19 ` Kay Sievers
` (6 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Frank Steiner @ 2004-10-04 6:15 UTC (permalink / raw)
To: linux-hotplug
[-- Attachment #1: Type: text/plain, Size: 1048 bytes --]
Kay Sievers wrote
> Oh, bad.
> It's your running strace that prevents gdb from taking control of the
> process, I expect. Just try to send the strace parent-process a SIGUSR1
> which may leave the udev process running.
Yes, that worked :-) On one host there were already three udev processes
sharing all the CPU time; on the second it was just one. All four
traces look slightly different, which is why I've attached all
four of them. I hope you can get something useful from the traces...
cu,
Frank
P.S.: I will hold off on applying the patch you sent until you tell me that
you have all the information you need from the hanging udev processes. I can
leave the hosts up and hanging for today, one of them even until the end of
the week!
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *
[-- Attachment #2: udev.trace.1 --]
[-- Type: text/x-troff-man, Size: 2150 bytes --]
turan /root# gdb -p 24498
GNU gdb 5.3.92
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux".
Attaching to process 24498
Reading symbols from /sbin/utest/udev...done.
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_compat.so.2...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
0x400b91be in memcpy () from /lib/i686/libc.so.6
(gdb) bt
#0 0x400b91be in memcpy () from /lib/i686/libc.so.6
#1 0x08056164 in tdb_read (tdb=0x80640a8, off=1073871052, buf=0xbffff2d8,
len=3221222108, cv=0) at tdb/tdb.c:407
#2 0x080562b9 in ofs_read (tdb=0x4, offset=22728, d=0xbffff2d8)
at tdb/tdb.c:447
#3 0x080567e4 in remove_from_freelist (tdb=0x80640a8, off=28176, next=696)
at tdb/tdb.c:628
#4 0x080568d4 in tdb_free (tdb=0x80640a8, offset=27268, rec=0xbffff370)
at tdb/tdb.c:662
#5 0x08056e2e in tdb_allocate (tdb=0x80640a8, length=884, rec=0xbffff3e0)
at tdb/tdb.c:910
#6 0x08057e5a in tdb_store (tdb=0x80640a8, key=
{dptr = 0xbffff450 "/class/scsi_generic/sg3", dsize = 24}, dbuf=
{dptr = 0xbffff6a0 "sg3", dsize = 856}, flag=1) at tdb/tdb.c:1497
#7 0x0804c361 in udevdb_add_dev (path=0xbfffff67 "/class/scsi_generic/sg3",
dev=0xbffff6a0) at udevdb.c:76
#8 0x0804bb66 in udev_add_device (path=0xbfffff67 "/class/scsi_generic/sg3",
subsystem=0xbfffff45 "scsi_generic", fake=0) at udev-add.c:446
#9 0x0804971f in main (argc=2, argv=0xbffffd70, envp=0x4) at udev.c:185
(gdb)
[-- Attachment #3: udev.trace.2 --]
[-- Type: text/x-troff-man, Size: 1783 bytes --]
knuth /root# gdb -p 12063
GNU gdb 5.3.92
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux".
Attaching to process 12063
Reading symbols from /sbin/utest/udev...done.
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
tdb_oob (tdb=0x80640a8, len=3221222944, probe=7004) at tdb/tdb.c:342
342 tdb/tdb.c: No such file or directory.
in tdb/tdb.c
(gdb) bt
#0 tdb_oob (tdb=0x80640a8, len=3221222944, probe=7004) at tdb/tdb.c:342
#1 0x0805638b in rec_read (tdb=0x80640a8, offset=7004, rec=0xbffff620)
at tdb/tdb.c:466
#2 0x0805703e in tdb_find (tdb=0x80640a8, key=
{dptr = 0xbfffff60 "/class/scsi_device/26:0:0:0", dsize = 28}, hash=536231552,
r=0xbffff620) at tdb/tdb.c:990
#3 0x080571b8 in tdb_find_lock (tdb=0x80640a8, key=
{dptr = 0xbfffff60 "/class/scsi_device/26:0:0:0", dsize = 28}, locktype=0,
rec=0xbffff620) at tdb/tdb.c:1035
#4 0x080572ef in tdb_fetch (tdb=0x80640a8, key=
{dptr = 0xbfffff60 "/class/scsi_device/26:0:0:0", dsize = 28}) at tdb/tdb.c:1113
#5 0x0804c3a3 in udevdb_get_dev (path=0xffffff00 <Address 0xffffff00 out of bounds>,
dev=0xbffff6a0) at udevdb.c:89
#6 0x0804c17f in udev_remove_device (path=0xbfffff60 "/class/scsi_device/26:0:0:0",
subsystem=0xbfffff45 "block") at udev-remove.c:170
#7 0x08049760 in main (argc=2, argv=0xbffffd70, envp=0x1b74) at udev.c:189
[-- Attachment #4: udev.trace.3 --]
[-- Type: text/x-troff-man, Size: 1855 bytes --]
knuth /root# gdb -p 23587
GNU gdb 5.3.92
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux".
Attaching to process 23587
Reading symbols from /sbin/utest/udev...done.
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0x08055f9a in tdb_oob (tdb=0x80640a8, len=7028, probe=0) at tdb/tdb.c:344
344 tdb/tdb.c: No such file or directory.
in tdb/tdb.c
(gdb) bt
#0 0x08055f9a in tdb_oob (tdb=0x80640a8, len=7028, probe=0) at tdb/tdb.c:344
#1 0x0805613c in tdb_read (tdb=0x80640a8, off=7004, buf=0xbffff620, len=24, cv=0)
at tdb/tdb.c:403
#2 0x0805631e in rec_read (tdb=0x80640a8, offset=7004, rec=0xbffff620)
at tdb/tdb.c:458
#3 0x0805703e in tdb_find (tdb=0x80640a8, key=
{dptr = 0xbfffff61 "/class/scsi_device/9:0:0:2", dsize = 27}, hash=3221149257,
r=0xbffff620) at tdb/tdb.c:990
#4 0x080571b8 in tdb_find_lock (tdb=0x80640a8, key=
{dptr = 0xbfffff61 "/class/scsi_device/9:0:0:2", dsize = 27}, locktype=0,
rec=0xbffff620) at tdb/tdb.c:1035
#5 0x080572ef in tdb_fetch (tdb=0x80640a8, key=
{dptr = 0xbfffff61 "/class/scsi_device/9:0:0:2", dsize = 27}) at tdb/tdb.c:1113
#6 0x0804c3a3 in udevdb_get_dev (path=0x0, dev=0xbffff6a0) at udevdb.c:89
#7 0x0804c17f in udev_remove_device (path=0xbfffff61 "/class/scsi_device/9:0:0:2",
subsystem=0xbfffff46 "block") at udev-remove.c:170
#8 0x08049760 in main (argc=2, argv=0xbffffd70, envp=0x0) at udev.c:189
(gdb)
[-- Attachment #5: udev.trace.4 --]
[-- Type: text/x-troff-man, Size: 1606 bytes --]
knuth /root# gdb -p 25580
GNU gdb 5.3.92
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux".
Attaching to process 25580
Reading symbols from /sbin/utest/udev...done.
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
ofs_read (tdb=0xbffff438, offset=20016, d=0x0) at tdb/tdb.c:446
446 tdb/tdb.c: No such file or directory.
in tdb/tdb.c
(gdb) bt
#0 ofs_read (tdb=0xbffff438, offset=20016, d=0x0) at tdb/tdb.c:446
#1 0x080567e4 in remove_from_freelist (tdb=0x80640a8, off=20016, next=0)
at tdb/tdb.c:628
#2 0x080568d4 in tdb_free (tdb=0x80640a8, offset=19108, rec=0xbffff520)
at tdb/tdb.c:662
#3 0x08057586 in do_delete (tdb=0x80640a8, rec_ptr=19108, rec=0xbffff520)
at tdb/tdb.c:1215
#4 0x08057cc4 in tdb_delete (tdb=0x80640a8, key=
{dptr = 0xbffff570 "/class/scsi_generic/sg2", dsize = 24}) at tdb/tdb.c:1434
#5 0x0804c475 in udevdb_delete_dev (path=0xbfffff64 "/class/scsi_generic/sg2")
at udevdb.c:112
#6 0x0804c218 in udev_remove_device (path=0xbfffff64 "/class/scsi_generic/sg2",
subsystem=0xbfffff42 "scsi_generic") at udev-remove.c:182
#7 0x08049760 in main (argc=2, argv=0xbffffd70, envp=0xbffff438) at udev.c:189
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (17 preceding siblings ...)
2004-10-04 6:15 ` Frank Steiner
@ 2004-10-04 14:19 ` Kay Sievers
2004-10-04 14:53 ` Frank Steiner
` (5 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Kay Sievers @ 2004-10-04 14:19 UTC (permalink / raw)
To: linux-hotplug
[-- Attachment #1: Type: text/plain, Size: 4999 bytes --]
On Mon, Oct 04, 2004 at 08:15:46AM +0200, Frank Steiner wrote:
> Kay Sievers wrote
>
> >Oh, bad.
> >It's your running strace that prevents gdb from taking control of the
> >process, I expect. Just try sending the strace parent process a SIGUSR1,
> >which may leave the udev process running.
>
> Yes, that worked :-) On one host there were already 3 udev processes
> sharing all the CPU time, on the second it was just one. All four
> traces look a little bit different, that's why I've attached all
> four of them. I hope you can fetch sth. useful from the traces...
Ok, let's look at it:
> (gdb) bt
> #0 0x400b91be in memcpy () from /lib/i686/libc.so.6
> #1 0x08056164 in tdb_read (tdb=0x80640a8, off=1073871052, buf=0xbffff2d8,
> len=3221222108, cv=0) at tdb/tdb.c:407
> #2 0x080562b9 in ofs_read (tdb=0x4, offset=22728, d=0xbffff2d8)
> at tdb/tdb.c:447
> #3 0x080567e4 in remove_from_freelist (tdb=0x80640a8, off=28176, next=696)
> at tdb/tdb.c:628
> #4 0x080568d4 in tdb_free (tdb=0x80640a8, offset=27268, rec=0xbffff370)
> at tdb/tdb.c:662
> #5 0x08056e2e in tdb_allocate (tdb=0x80640a8, length=884, rec=0xbffff3e0)
> at tdb/tdb.c:910
> #6 0x08057e5a in tdb_store (tdb=0x80640a8, key=
> {dptr = 0xbffff450 "/class/scsi_generic/sg3", dsize = 24}, dbuf=
> {dptr = 0xbffff6a0 "sg3", dsize = 856}, flag=1) at tdb/tdb.c:1497
> #7 0x0804c361 in udevdb_add_dev (path=0xbfffff67 "/class/scsi_generic/sg3",
> dev=0xbffff6a0) at udevdb.c:76
> #8 0x0804bb66 in udev_add_device (path=0xbfffff67 "/class/scsi_generic/sg3",
> subsystem=0xbfffff45 "scsi_generic", fake=0) at udev-add.c:446
> #9 0x0804971f in main (argc=2, argv=0xbffffd70, envp=0x4) at udev.c:185
This seems to be a loop in tdb_allocate().
> (gdb) bt
> #0 tdb_oob (tdb=0x80640a8, len=3221222944, probe=7004) at tdb/tdb.c:342
> #1 0x0805638b in rec_read (tdb=0x80640a8, offset=7004, rec=0xbffff620)
> at tdb/tdb.c:466
> #2 0x0805703e in tdb_find (tdb=0x80640a8, key=
> {dptr = 0xbfffff60 "/class/scsi_device/26:0:0:0", dsize = 28}, hash=536231552,
> r=0xbffff620) at tdb/tdb.c:990
> #3 0x080571b8 in tdb_find_lock (tdb=0x80640a8, key=
> {dptr = 0xbfffff60 "/class/scsi_device/26:0:0:0", dsize = 28}, locktype=0,
> rec=0xbffff620) at tdb/tdb.c:1035
> #4 0x080572ef in tdb_fetch (tdb=0x80640a8, key=
> {dptr = 0xbfffff60 "/class/scsi_device/26:0:0:0", dsize = 28}) at tdb/tdb.c:1113
> #5 0x0804c3a3 in udevdb_get_dev (path=0xffffff00 <Address 0xffffff00 out of bounds>,
> dev=0xbffff6a0) at udevdb.c:89
> #6 0x0804c17f in udev_remove_device (path=0xbfffff60 "/class/scsi_device/26:0:0:0",
> subsystem=0xbfffff45 "block") at udev-remove.c:170
> #7 0x08049760 in main (argc=2, argv=0xbffffd70, envp=0x1b74) at udev.c:189
This is the loop in tdb_find().
> (gdb) bt
> #0 0x08055f9a in tdb_oob (tdb=0x80640a8, len=7028, probe=0) at tdb/tdb.c:344
> #1 0x0805613c in tdb_read (tdb=0x80640a8, off=7004, buf=0xbffff620, len=24, cv=0)
> at tdb/tdb.c:403
> #2 0x0805631e in rec_read (tdb=0x80640a8, offset=7004, rec=0xbffff620)
> at tdb/tdb.c:458
> #3 0x0805703e in tdb_find (tdb=0x80640a8, key=
> {dptr = 0xbfffff61 "/class/scsi_device/9:0:0:2", dsize = 27}, hash=3221149257,
> r=0xbffff620) at tdb/tdb.c:990
> #4 0x080571b8 in tdb_find_lock (tdb=0x80640a8, key=
> {dptr = 0xbfffff61 "/class/scsi_device/9:0:0:2", dsize = 27}, locktype=0,
> rec=0xbffff620) at tdb/tdb.c:1035
> #5 0x080572ef in tdb_fetch (tdb=0x80640a8, key=
> {dptr = 0xbfffff61 "/class/scsi_device/9:0:0:2", dsize = 27}) at tdb/tdb.c:1113
> #6 0x0804c3a3 in udevdb_get_dev (path=0x0, dev=0xbffff6a0) at udevdb.c:89
> #7 0x0804c17f in udev_remove_device (path=0xbfffff61 "/class/scsi_device/9:0:0:2",
> subsystem=0xbfffff46 "block") at udev-remove.c:170
> #8 0x08049760 in main (argc=2, argv=0xbffffd70, envp=0x0) at udev.c:189
The same tdb_find() loop.
> (gdb) bt
> #0 ofs_read (tdb=0xbffff438, offset=20016, d=0x0) at tdb/tdb.c:446
> #1 0x080567e4 in remove_from_freelist (tdb=0x80640a8, off=20016, next=0)
> at tdb/tdb.c:628
> #2 0x080568d4 in tdb_free (tdb=0x80640a8, offset=19108, rec=0xbffff520)
> at tdb/tdb.c:662
> #3 0x08057586 in do_delete (tdb=0x80640a8, rec_ptr=19108, rec=0xbffff520)
> at tdb/tdb.c:1215
> #4 0x08057cc4 in tdb_delete (tdb=0x80640a8, key=
> {dptr = 0xbffff570 "/class/scsi_generic/sg2", dsize = 24}) at tdb/tdb.c:1434
> #5 0x0804c475 in udevdb_delete_dev (path=0xbfffff64 "/class/scsi_generic/sg2")
> at udevdb.c:112
> #6 0x0804c218 in udev_remove_device (path=0xbfffff64 "/class/scsi_generic/sg2",
> subsystem=0xbfffff42 "scsi_generic") at udev-remove.c:182
> #7 0x08049760 in main (argc=2, argv=0xbffffd70, envp=0xbffff438) at udev.c:189
Seems to be a loop in remove_from_freelist().
All failure paths known so far are now covered by the attached patch, which
limits the iteration count of loops over data read from disk. Let's
see what happens next :)
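The idea of bounding a loop over links read from disk can be sketched in
isolation as follows. This is only an illustration, not the tdb code: the
struct record layout, the find_in_chain() helper and the MAX_CHAIN_STEPS
name are hypothetical, and an in-memory array stands in for the database
file. Only the limit of 100000 steps is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an on-disk record: only the chain link matters. */
struct record {
    uint32_t next;   /* index of the next record, 0 terminates the chain */
};

#define MAX_CHAIN_STEPS 100000  /* same bound the patch uses for maxloop */

/*
 * Walk a chain starting at 'start' looking for 'target'.
 * Returns 1 if found, 0 if the chain ends without a match,
 * and -1 if a link is out of range or the step limit is hit
 * (in both cases the chain is probably corrupt).
 */
static int find_in_chain(const struct record *recs, size_t nrecs,
                         uint32_t start, uint32_t target)
{
    uint32_t cur = start;
    int maxloop = MAX_CHAIN_STEPS;

    assert(recs != NULL);
    while (cur != 0) {
        if (cur >= nrecs)
            return -1;          /* link points outside the table */
        if (cur == target)
            return 1;
        cur = recs[cur].next;
        if (--maxloop == 0)
            return -1;          /* give up instead of spinning forever */
    }
    return 0;
}
```

With a chain that links back onto itself, the function returns -1 after
the step limit instead of consuming CPU indefinitely, which is the
behaviour the patch aims for in tdb_find() and the freelist walkers.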
Kay
[-- Attachment #2: udev-deadlock-debug-03.patch --]
[-- Type: text/plain, Size: 10735 bytes --]
===== namedev.c 1.146 vs edited =====
--- 1.146/namedev.c 2004-09-08 15:17:55 +02:00
+++ edited/namedev.c 2004-10-04 15:24:31 +02:00
@@ -29,7 +29,6 @@
#include <ctype.h>
#include <unistd.h>
#include <errno.h>
-#include <time.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
@@ -353,7 +352,6 @@ static struct bus_file {
{}
};
-#define SECONDS_TO_WAIT_FOR_FILE 10
static void wait_for_device_to_initialize(struct sysfs_device *sysfs_device)
{
/* sleep until we see the file for this specific bus type show up this
@@ -367,14 +365,14 @@ static void wait_for_device_to_initializ
struct bus_file *b = &bus_files[0];
struct sysfs_attribute *tmpattr;
int found = 0;
- int loop = SECONDS_TO_WAIT_FOR_FILE;
+ int loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
while (1) {
if (b->bus == NULL) {
if (!found)
break;
- /* sleep to give the kernel a chance to create the file */
- sleep(1);
+ /* give the kernel a chance to create the file */
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
--loop;
if (loop == 0)
break;
@@ -394,7 +392,8 @@ static void wait_for_device_to_initializ
}
if (!found)
dbg("did not find bus type '%s' on list of bus_id_files, "
- "contact greg@kroah.com", sysfs_device->bus);
+ "please report to <linux-hotplug-devel@lists.sourceforge.net>",
+ sysfs_device->bus);
exit:
return; /* here to prevent compiler warning... */
}
@@ -682,7 +681,6 @@ static struct sysfs_device *get_sysfs_de
{
struct sysfs_device *sysfs_device;
struct sysfs_class_device *class_dev_parent;
- struct timespec tspec;
int loop;
/* Figure out where the device symlink is at. For char devices this will
@@ -698,16 +696,14 @@ static struct sysfs_device *get_sysfs_de
if (class_dev_parent != NULL)
dbg("given class device has a parent, use this instead");
- tspec.tv_sec = 0;
- tspec.tv_nsec = 10000000; /* sleep 10 millisec */
- loop = 10;
+ loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
while (loop--) {
if (udev_sleep) {
if (whitelist_search(class_dev)) {
sysfs_device = NULL;
goto exit;
}
- nanosleep(&tspec, NULL);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
}
if (class_dev_parent)
@@ -729,11 +725,9 @@ device_found:
if (sysfs_device->bus[0] != '\0')
goto bus_found;
- loop = 10;
- tspec.tv_nsec = 10000000;
while (loop--) {
if (udev_sleep)
- nanosleep(&tspec, NULL);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
sysfs_get_device_bus(sysfs_device);
if (sysfs_device->bus[0] != '\0')
===== udev-add.c 1.73 vs edited =====
--- 1.73/udev-add.c 2004-08-05 00:41:08 +02:00
+++ edited/udev-add.c 2004-10-04 15:24:31 +02:00
@@ -340,11 +340,10 @@ exit:
/* wait for the "dev" file to show up in the directory in sysfs.
* If it doesn't happen in about 10 seconds, give up.
*/
-#define SECONDS_TO_WAIT_FOR_FILE 10
static int sleep_for_file(const char *path, char* file)
{
char filename[SYSFS_PATH_MAX + 6];
- int loop = SECONDS_TO_WAIT_FOR_FILE;
+ int loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
int retval;
strfieldcpy(filename, sysfs_path);
@@ -360,7 +359,7 @@ static int sleep_for_file(const char *pa
goto exit;
/* sleep to give the kernel a chance to create the dev file */
- sleep(1);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
}
retval = -ENODEV;
exit:
===== udev.c 1.62 vs edited =====
--- 1.62/udev.c 2004-09-14 02:25:32 +02:00
+++ edited/udev.c 2004-10-04 15:24:31 +02:00
@@ -36,6 +36,9 @@
#include "namedev.h"
#include "udevdb.h"
+/* timeout flag for udevdb */
+extern sig_atomic_t gotalarm;
+
/* global variables */
char **main_argv;
char **main_envp;
@@ -58,6 +61,11 @@ void log_message(int level, const char *
asmlinkage static void sig_handler(int signum)
{
switch (signum) {
+ case SIGALRM:
+ gotalarm = 1;
+ info("error: timeout reached, event probably not handled correctly, "
+ "please report to <linux-hotplug-devel@lists.sourceforge.net> ");
+ break;
case SIGINT:
case SIGTERM:
udevdb_exit();
@@ -94,7 +102,8 @@ int main(int argc, char *argv[], char *e
dbg("version %s", UDEV_VERSION);
- /* initialize our configuration */
+ init_logging("udev");
+
udev_init_config();
if (strstr(argv[0], "udevstart")) {
@@ -147,15 +156,20 @@ int main(int argc, char *argv[], char *e
/* set signal handlers */
act.sa_handler = sig_handler;
sigemptyset (&act.sa_mask);
+
+ /* alarm should interrupt */
+ sigaction(SIGALRM, &act, NULL);
+
act.sa_flags = SA_RESTART;
sigaction(SIGINT, &act, NULL);
sigaction(SIGTERM, &act, NULL);
+ /* trigger timeout to interrupt blocking syscalls */
+ alarm(ALARM_TIMEOUT);
+
/* initialize udev database */
- if (udevdb_init(UDEVDB_DEFAULT) != 0) {
- dbg("unable to initialize database");
- goto exit;
- }
+ if (udevdb_init(UDEVDB_DEFAULT) != 0)
+ info("error: unable to initialize database, continuing without database");
switch(act_type) {
case UDEVSTART:
===== udev.h 1.62 vs edited =====
--- 1.62/udev.h 2004-09-14 14:29:10 +02:00
+++ edited/udev.h 2004-10-04 15:24:31 +02:00
@@ -26,6 +26,9 @@
#include <sys/param.h>
#include "libsysfs/sysfs/libsysfs.h"
+#define ALARM_TIMEOUT 20
+#define WAIT_FOR_FILE_SECONDS 10
+#define WAIT_FOR_FILE_RETRY_FREQ 10
#define COMMENT_CHARACTER '#'
#define NAME_SIZE 256
===== udevdb.c 1.30 vs edited =====
--- 1.30/udevdb.c 2004-06-29 14:51:35 +02:00
+++ edited/udevdb.c 2004-10-04 15:24:31 +02:00
@@ -42,13 +42,28 @@
#include "tdb/tdb.h"
static TDB_CONTEXT *udevdb;
+sig_atomic_t gotalarm;
+static void tdb_log(TDB_CONTEXT *tdb, int level, const char *format, ...)
+{
+ va_list args;
+
+ if (!udev_log)
+ return;
+
+ va_start(args, format);
+ vsyslog(level, format, args);
+ va_end(args);
+}
int udevdb_add_dev(const char *path, const struct udevice *dev)
{
TDB_DATA key, data;
char keystr[SYSFS_PATH_MAX];
+ if (udevdb == NULL)
+ return -1;
+
if ((path == NULL) || (dev == NULL))
return -ENODEV;
@@ -68,6 +83,9 @@ int udevdb_get_dev(const char *path, str
{
TDB_DATA key, data;
+ if (udevdb == NULL)
+ return -1;
+
if (path == NULL)
return -ENODEV;
@@ -88,6 +106,9 @@ int udevdb_delete_dev(const char *path)
TDB_DATA key;
char keystr[SYSFS_PATH_MAX];
+ if (udevdb == NULL)
+ return -1;
+
if (path == NULL)
return -EINVAL;
@@ -121,7 +142,9 @@ int udevdb_init(int init_flag)
if (init_flag != UDEVDB_DEFAULT && init_flag != UDEVDB_INTERNAL)
return -EINVAL;
- udevdb = tdb_open(udev_db_filename, 0, init_flag, O_RDWR | O_CREAT, 0644);
+ tdb_set_lock_alarm(&gotalarm);
+
+ udevdb = tdb_open_ex(udev_db_filename, 0, init_flag, O_RDWR | O_CREAT, 0644, tdb_log);
if (udevdb == NULL) {
if (init_flag == UDEVDB_INTERNAL)
dbg("unable to initialize in-memory database");
@@ -137,7 +160,7 @@ int udevdb_init(int init_flag)
*/
int udevdb_open_ro(void)
{
- udevdb = tdb_open(udev_db_filename, 0, 0, O_RDONLY, 0);
+ udevdb = tdb_open_ex(udev_db_filename, 0, 0, O_RDONLY, 0, tdb_log);
if (udevdb == NULL) {
dbg("unable to open database at '%s'", udev_db_filename);
return -EACCES;
@@ -159,6 +182,9 @@ static int traverse_callback(TDB_CONTEXT
int udevdb_call_foreach(int (*user_record_handler) (char *path, struct udevice *dev))
{
int retval = 0;
+
+ if (udevdb == NULL)
+ return -1;
if (user_record_handler == NULL) {
dbg("invalid user record handling function");
===== tdb/tdb.c 1.3 vs edited =====
--- 1.3/tdb/tdb.c 2003-12-17 01:23:27 +01:00
+++ edited/tdb/tdb.c 2004-10-04 16:01:26 +02:00
@@ -616,8 +616,10 @@ int tdb_printfreelist(TDB_CONTEXT *tdb)
static int remove_from_freelist(TDB_CONTEXT *tdb, tdb_off off, tdb_off next)
{
tdb_off last_ptr, i;
+ int maxloop;
/* read in the freelist top */
+ maxloop = 100000;
last_ptr = FREELIST_TOP;
while (ofs_read(tdb, last_ptr, &i) != -1 && i != 0) {
if (i == off) {
@@ -626,6 +628,12 @@ static int remove_from_freelist(TDB_CONT
}
/* Follow chain (next offset is at start of record) */
last_ptr = i;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "remove_from_freelist: maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
+ }
}
TDB_LOG((tdb, 0,"remove_from_freelist: not on list at off=%d\n", off));
return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
@@ -852,6 +860,7 @@ static tdb_off tdb_allocate(TDB_CONTEXT
{
tdb_off rec_ptr, last_ptr, newrec_ptr;
struct list_struct newrec;
+ int maxloop;
if (tdb_lock(tdb, -1, F_WRLCK) == -1)
return 0;
@@ -867,6 +876,7 @@ static tdb_off tdb_allocate(TDB_CONTEXT
goto fail;
/* keep looking until we find a freelist record big enough */
+ maxloop = 100000;
while (rec_ptr) {
if (rec_free_read(tdb, rec_ptr, rec) == -1)
goto fail;
@@ -918,6 +928,12 @@ static tdb_off tdb_allocate(TDB_CONTEXT
/* move to the next record */
last_ptr = rec_ptr;
rec_ptr = rec->next;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "tdb_allocate: maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, 0);
+ }
}
/* we didn't find enough space. See if we can expand the
database and if we can then try again */
@@ -980,12 +996,14 @@ static tdb_off tdb_find(TDB_CONTEXT *tdb
struct list_struct *r)
{
tdb_off rec_ptr;
-
+ int maxloop;
+
/* read in the hash top */
if (ofs_read(tdb, TDB_HASH_TOP(hash), &rec_ptr) == -1)
return 0;
/* keep looking until we find the right record */
+ maxloop = 100000;
while (rec_ptr) {
if (rec_read(tdb, rec_ptr, r) == -1)
return 0;
@@ -1005,6 +1023,12 @@ static tdb_off tdb_find(TDB_CONTEXT *tdb
SAFE_FREE(k);
}
rec_ptr = r->next;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "tdb_find maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, 0);
+ }
}
return TDB_ERRCODE(TDB_ERR_NOEXIST, 0);
}
@@ -1187,6 +1211,7 @@ static int do_delete(TDB_CONTEXT *tdb, t
{
tdb_off last_ptr, i;
struct list_struct lastrec;
+ int maxloop;
if (tdb->read_only) return -1;
@@ -1201,9 +1226,18 @@ static int do_delete(TDB_CONTEXT *tdb, t
/* find previous record in hash chain */
if (ofs_read(tdb, TDB_HASH_TOP(rec->full_hash), &i) == -1)
return -1;
- for (last_ptr = 0; i != rec_ptr; last_ptr = i, i = lastrec.next)
+
+ maxloop = 100000;
+ for (last_ptr = 0; i != rec_ptr; last_ptr = i, i = lastrec.next) {
if (rec_read(tdb, i, &lastrec) == -1)
return -1;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "(tdb)do_delete: maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
+ }
+ }
/* unlink it: next ptr is at start of record. */
if (last_ptr == 0)
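Besides the loop limits, the patch installs the SIGALRM handler before
SA_RESTART is set and then calls alarm(ALARM_TIMEOUT), so that a blocking
system call fails with EINTR instead of being restarted. A minimal sketch of
that pattern, assuming a plain read() on a pipe in place of the database
lock; read_with_timeout() and on_alarm() are hypothetical names and are not
part of udev or tdb:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t gotalarm;

/* Only set a flag here; most other calls are not async-signal-safe. */
static void on_alarm(int signum)
{
    (void)signum;
    gotalarm = 1;
}

/*
 * Read from fd, but give up after 'seconds'.
 * Returns the read() result; on timeout it returns -1 with
 * errno == EINTR and gotalarm set to 1.
 */
static ssize_t read_with_timeout(int fd, void *buf, size_t len,
                                 unsigned seconds)
{
    struct sigaction act, old;
    ssize_t n;
    int saved_errno;

    assert(buf != NULL);
    memset(&act, 0, sizeof(act));
    act.sa_handler = on_alarm;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;           /* no SA_RESTART: read() fails with EINTR */
    sigaction(SIGALRM, &act, &old);

    gotalarm = 0;
    alarm(seconds);
    n = read(fd, buf, len);
    saved_errno = errno;
    alarm(0);                   /* cancel the alarm if read() returned early */
    sigaction(SIGALRM, &old, NULL);
    errno = saved_errno;
    return n;
}
```

Calling read_with_timeout() on the read end of an empty pipe returns -1
with errno set to EINTR once the alarm fires. The handler in this sketch
only sets the flag; logging from inside a signal handler, as the patch does
with info(), is generally not async-signal-safe.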
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (18 preceding siblings ...)
2004-10-04 14:19 ` Kay Sievers
@ 2004-10-04 14:53 ` Frank Steiner
2004-10-05 15:37 ` Kay Sievers
` (4 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Frank Steiner @ 2004-10-04 14:53 UTC (permalink / raw)
To: linux-hotplug
Kay Sievers wrote
> All here known failure paths are now covered in the attached patch by
> limiting the iteration count for loops over data read from disk. Let's
> see what happens next :)
Ok :-) I'm running this patch in parallel on hosts using tmpfs for /dev
as well as "old" ones still using NFS. I will report success or failures!
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (19 preceding siblings ...)
2004-10-04 14:53 ` Frank Steiner
@ 2004-10-05 15:37 ` Kay Sievers
2004-10-06 6:06 ` Frank Steiner
` (3 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Kay Sievers @ 2004-10-05 15:37 UTC (permalink / raw)
To: linux-hotplug
On Mon, 2004-10-04 at 16:53 +0200, Frank Steiner wrote:
> Kay Sievers wrote
>
> > All here known failure paths are now covered in the attached patch by
> > limiting the iteration count for loops over data read from disk. Let's
> > see what happens next :)
>
> Ok :-) I'm running this patch in parallel on hosts using tmpfs for /dev
> as well as "old" ones still using NFS. I will report success or failures!
Nice, thanks for taking the time to investigate this. I hope we catch
all these failures now.
Here is the debug output from the corrupt tdb file you sent me. As expected,
concurrent writes from multiple processes on a filesystem without
proper locking created an endless loop. The dump below shows a chain that
cycles over three records:
696 -> 16384 -> 18200 ==> 696 -> 16384 -> 18200 ==> ...
[kay@pim ~]$ ./dumpit /home/kay/udev.tdb
hash=11
rec: offset=696 next=16384 rec_len=884 key_len=24 data_len=856 full_hash=0x13792fac magic=0x26011999
rec: offset=16384 next=18200 rec_len=884 key_len=24 data_len=856 full_hash=0x13792fac magic=0xd9fee666
rec: offset=18200 next=696 rec_len=884 key_len=24 data_len=856 full_hash=0xa713efac magic=0x26011999
rec: offset=696 next=16384 rec_len=884 key_len=24 data_len=856 full_hash=0x13792fac magic=0x26011999
rec: offset=16384 next=18200 rec_len=884 key_len=24 data_len=856 full_hash=0x13792fac magic=0xd9fee666
rec: offset=18200 next=696 rec_len=884 key_len=24 data_len=856 full_hash=0xa713efac magic=0x26011999
rec: offset=696 next=16384 rec_len=884 key_len=24 data_len=856 full_hash=0x13792fac magic=0x26011999
rec: offset=16384 next=18200 rec_len=884 key_len=24 data_len=856 full_hash=0x13792fac magic=0xd9fee666
rec: offset=18200 next=696 rec_len=884 key_len=24 data_len=856 full_hash=0xa713efac magic=0x26011999
rec: offset=696 next=16384 rec_len=884 key_len=24 data_len=856 full_hash=0x13792fac magic=0x26011999
rec: offset=16384 next=18200 rec_len=884 key_len=24 data_len=856 full_hash=0x13792fac magic=0xd9fee666
rec: offset=18200 next=696 rec_len=884 key_len=24 data_len=856 full_hash=0xa713efac magic=0x26011999
rec: offset=696 next=16384 rec_len=884 key_len=24 data_len=856 full_hash=0x13792fac magic=0x26011999
...
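A chain like 696 -> 16384 -> 18200 -> 696 can also be detected explicitly
rather than by counting iterations. The following is a rough sketch using
Floyd's tortoise-and-hare check; it is not what the patch does (the patch
uses a fixed iteration limit), and struct link, next_of() and
chain_has_cycle() are hypothetical helpers working on an in-memory copy of
the links, not on a tdb file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of the chain: record offset -> next offset. */
struct link {
    uint32_t offset;
    uint32_t next;   /* 0 terminates the chain */
};

/* Look up 'next' for a record offset; an unknown offset yields 0. */
static uint32_t next_of(const struct link *links, size_t n, uint32_t offset)
{
    for (size_t i = 0; i < n; i++)
        if (links[i].offset == offset)
            return links[i].next;
    return 0;
}

/*
 * Floyd's cycle check: 'fast' advances two links per round, 'slow' one.
 * Returns 1 if following 'next' from 'start' runs into a cycle,
 * 0 if the chain terminates.
 */
static int chain_has_cycle(const struct link *links, size_t n, uint32_t start)
{
    uint32_t slow = start, fast = start;

    assert(links != NULL);
    while (fast != 0) {
        fast = next_of(links, n, fast);
        if (fast == 0)
            break;
        fast = next_of(links, n, fast);
        slow = next_of(links, n, slow);
        if (fast != 0 && fast == slow)
            return 1;
    }
    return 0;
}
```

Fed with the three links from the chain above, chain_has_cycle() reports a
cycle; with the last link set to 0 instead of 696 it reports none. A fixed
iteration limit, as in the patch, is simpler and needs no second pointer, at
the cost of not distinguishing a cycle from a very long chain.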
Thanks,
Kay
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (20 preceding siblings ...)
2004-10-05 15:37 ` Kay Sievers
@ 2004-10-06 6:06 ` Frank Steiner
2004-10-06 12:00 ` Kay Sievers
` (2 subsequent siblings)
24 siblings, 0 replies; 26+ messages in thread
From: Frank Steiner @ 2004-10-06 6:06 UTC (permalink / raw)
To: linux-hotplug
[-- Attachment #1: Type: text/plain, Size: 3353 bytes --]
Frank Steiner wrote
> Kay Sievers wrote
>
>
>>All here known failure paths are now covered in the attached patch by
>>limiting the iteration count for loops over data read from disk. Let's
>>see what happens next :)
>
>
> Ok :-) I'm running this patch in parallel on hosts using tmpfs for /dev
> as well as "old" ones still using NFS. I will report success or failures!
Well, we are not done yet :-) Another udev hanging with 100% CPU, with
the latest deadlock patch applied. Here's the trace (I removed the "(gdb) s"
lines):
knuth /root# gdb -p 3480
GNU gdb 5.3.92
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux".
Attaching to process 3480
Reading symbols from /sbin/udev...done.
Reading symbols from /lib/i686/libc.so.6...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0x4010db79 in chown@@GLIBC_2.1 () from /lib/i686/libc.so.6
(gdb) bt
#0 0x4010db79 in chown@@GLIBC_2.1 () from /lib/i686/libc.so.6
#1 0x0804be05 in secure_unlink (filename=0xbffff360 "/dev/sg2173337579")
at udev-remove.c:78
#2 0x0804c018 in delete_node (dev=0xbffff690) at udev-remove.c:127
#3 0x0804c2c5 in udev_remove_device (path=0xbfffff70 "/class/scsi_generic/sg2",
subsystem=0xbfffff3b "scsi_generic") at udev-remove.c:185
#4 0x080497f1 in main (argc=2, argv=0xbffffd60, envp=0xfffffe00) at udev.c:188
(gdb) s
Single stepping until exit from function chown@@GLIBC_2.1,
which has no line number information.
secure_unlink (filename=0xbffff360 "/dev/sg2173337579") at udev-remove.c:79
79 udev-remove.c: No such file or directory.
in udev-remove.c
81 in udev-remove.c
log_message (level=7, format=0x805a060 "%s: chown(%s, 0, 0) failed with error '%s'")
at udev.c:52
52 udev.c: No such file or directory.
in udev.c
55 in udev.c
56 in udev.c
58 in udev.c
secure_unlink (filename=0xbffff360 "/dev/sg2173337579") at udev-remove.c:87
87 udev-remove.c: No such file or directory.
in udev-remove.c
88 in udev-remove.c
90 in udev-remove.c
log_message (level=7, format=0x805a0a0 "%s: chmod(%s, 0000) failed with error '%s'")
at udev.c:52
52 udev.c: No such file or directory.
in udev.c
55 in udev.c
56 in udev.c
...
and so on. The device in question does not exist (at least not after I exited
gdb):
knuth /root# ls -la /dev/sg2173337579
ls: /dev/sg2173337579: No such file or directory
Just in case it helps, I have also attached the .udev.tdb from this state. I'll
leave the host in this state until you say you don't need more info from it :-)
All the hosts running tmpfs have not reported any errors yet!
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *
[-- Attachment #2: .udev.tdb --]
[-- Type: application/octet-stream, Size: 81920 bytes --]
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (21 preceding siblings ...)
2004-10-06 6:06 ` Frank Steiner
@ 2004-10-06 12:00 ` Kay Sievers
2004-10-06 12:29 ` Frank Steiner
2004-10-08 5:59 ` Frank Steiner
24 siblings, 0 replies; 26+ messages in thread
From: Kay Sievers @ 2004-10-06 12:00 UTC (permalink / raw)
To: linux-hotplug
[-- Attachment #1: Type: text/plain, Size: 1162 bytes --]
On Wed, Oct 06, 2004 at 08:06:29AM +0200, Frank Steiner wrote:
> Frank Steiner wrote
>
> >Kay Sievers wrote
> >
> >
> >>All here known failure paths are now covered in the attached patch by
> >>limiting the iteration count for loops over data read from disk. Let's
> >>see what happens next :)
> >
> >
> >Ok :-) I'm running this patch in parallel on hosts using tmpfs for /dev
> >as well as "old" ones still using NFS. I will report success or failures!
>
> Well, we are not done yet :-) Another udev hanging with 100% CPU, with
> the latest deadlock patch applied. Here's the trace (I removed the "(gdb) s"
> lines):
Funny, this is not inside the tdb, it's caused by getting garbage from the
udev database.
You don't use {all_partitions}, right?
I expect that the payload data from the tdb is corrupt and that the record
claims udev_add has added > 2,000,000 partition nodes for the
device :) You should have something like this in the syslog:
"removing partitions 'sg2[1-24532432]'"
Here is a patch for it. It's against the current tree after our patch
session yesterday. But it should successfully apply to your tree with
offsets.
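
The corrupt partition count described above can be guarded against with a plain range check before the removal loop. What follows is only a minimal sketch under stated assumptions, not the code from the attached patch: `struct db_record`, `MAX_PARTITIONS` and `remove_partition_nodes()` are hypothetical names chosen for illustration, and the limit of 64 is an arbitrary example value.

```c
#include <stdio.h>

/* Hypothetical upper bound on partition nodes per device (assumption). */
#define MAX_PARTITIONS 64

/* Hypothetical stand-in for a record read back from the database. */
struct db_record {
	char name[32];
	int partitions;
};

/*
 * Build the node names "<root><name>1" .. "<root><name>N" that a remove
 * event would have to unlink. Returns the number of names visited, or -1
 * if the stored count is implausible and the record is treated as garbage.
 */
int remove_partition_nodes(const char *root, const struct db_record *rec)
{
	char node[128];
	int i;

	if (rec->partitions < 0 || rec->partitions > MAX_PARTITIONS)
		return -1;

	for (i = 1; i <= rec->partitions; i++) {
		snprintf(node, sizeof(node), "%s%s%d", root, rec->name, i);
		/* a real implementation would unlink the node here */
		printf("would remove %s\n", node);
	}
	return rec->partitions;
}
```

With a record claiming 24532432 partitions the function returns -1 immediately instead of issuing millions of unlink attempts, which is the behaviour the gdb trace showed for /dev/sg2.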
Thanks,
Kay
[-- Attachment #2: udev-deadlock-debug-04.patch --]
[-- Type: text/plain, Size: 11731 bytes --]
===== namedev.c 1.147 vs edited =====
--- 1.147/namedev.c 2004-09-20 16:01:58 +02:00
+++ edited/namedev.c 2004-10-06 13:09:20 +02:00
@@ -29,7 +29,6 @@
#include <ctype.h>
#include <unistd.h>
#include <errno.h>
-#include <time.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
@@ -353,7 +352,6 @@ static struct bus_file {
{}
};
-#define SECONDS_TO_WAIT_FOR_FILE 10
static void wait_for_device_to_initialize(struct sysfs_device *sysfs_device)
{
/* sleep until we see the file for this specific bus type show up this
@@ -367,14 +365,14 @@ static void wait_for_device_to_initializ
struct bus_file *b = &bus_files[0];
struct sysfs_attribute *tmpattr;
int found = 0;
- int loop = SECONDS_TO_WAIT_FOR_FILE;
+ int loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
while (1) {
if (b->bus == NULL) {
if (!found)
break;
- /* sleep to give the kernel a chance to create the file */
- sleep(1);
+ /* give the kernel a chance to create the file */
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
--loop;
if (loop == 0)
break;
@@ -394,7 +392,8 @@ static void wait_for_device_to_initializ
}
if (!found)
dbg("did not find bus type '%s' on list of bus_id_files, "
- "contact greg@kroah.com", sysfs_device->bus);
+ "please report to <linux-hotplug-devel@lists.sourceforge.net>",
+ sysfs_device->bus);
exit:
return; /* here to prevent compiler warning... */
}
@@ -680,7 +679,6 @@ static struct sysfs_device *get_sysfs_de
{
struct sysfs_device *sysfs_device;
struct sysfs_class_device *class_dev_parent;
- struct timespec tspec;
int loop;
/* Figure out where the device symlink is at. For char devices this will
@@ -696,16 +694,14 @@ static struct sysfs_device *get_sysfs_de
if (class_dev_parent != NULL)
dbg("given class device has a parent, use this instead");
- tspec.tv_sec = 0;
- tspec.tv_nsec = 10000000; /* sleep 10 millisec */
- loop = 10;
+ loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
while (loop--) {
if (udev_sleep) {
if (whitelist_search(class_dev)) {
sysfs_device = NULL;
goto exit;
}
- nanosleep(&tspec, NULL);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
}
if (class_dev_parent)
@@ -727,11 +723,9 @@ device_found:
if (sysfs_device->bus[0] != '\0')
goto bus_found;
- loop = 10;
- tspec.tv_nsec = 10000000;
while (loop--) {
if (udev_sleep)
- nanosleep(&tspec, NULL);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
sysfs_get_device_bus(sysfs_device);
if (sysfs_device->bus[0] != '\0')
===== udev-add.c 1.74 vs edited =====
--- 1.74/udev-add.c 2004-10-06 00:22:36 +02:00
+++ edited/udev-add.c 2004-10-06 13:09:20 +02:00
@@ -348,11 +348,10 @@ exit:
/* wait for the "dev" file to show up in the directory in sysfs.
* If it doesn't happen in about 10 seconds, give up.
*/
-#define SECONDS_TO_WAIT_FOR_FILE 10
static int sleep_for_file(const char *path, char* file)
{
char filename[SYSFS_PATH_MAX + 6];
- int loop = SECONDS_TO_WAIT_FOR_FILE;
+ int loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
int retval;
strfieldcpy(filename, sysfs_path);
@@ -368,7 +367,7 @@ static int sleep_for_file(const char *pa
goto exit;
/* sleep to give the kernel a chance to create the dev file */
- sleep(1);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
}
retval = -ENODEV;
exit:
===== udev-remove.c 1.33 vs edited =====
--- 1.33/udev-remove.c 2004-08-05 00:40:11 +02:00
+++ edited/udev-remove.c 2004-10-06 13:24:28 +02:00
@@ -109,6 +109,7 @@ static int delete_node(struct udevice *d
int i;
char *pos;
int len;
+ int num;
strfieldcpy(filename, udev_root);
strfieldcat(filename, dev->name);
@@ -118,10 +119,15 @@ static int delete_node(struct udevice *d
if (retval)
return retval;
- /* remove partition nodes */
- if (dev->partitions > 0) {
- info("removing partitions '%s[1-%i]'", filename, dev->partitions);
- for (i = 1; i <= dev->partitions; i++) {
+ /* remove all_partitions nodes */
+ num = dev->partitions;
+ if (num > 0) {
+ info("removing all_partitions '%s[1-%i]'", filename, num);
+ if (num > PARTITIONS_COUNT) {
+ info("garbage from udev database, skip all_partitions removal");
+ return -1;
+ }
+ for (i = 1; i <= num; i++) {
strfieldcpy(partitionname, filename);
strintcat(partitionname, i);
secure_unlink(partitionname);
===== udev.c 1.62 vs edited =====
--- 1.62/udev.c 2004-09-14 02:25:32 +02:00
+++ edited/udev.c 2004-10-06 13:09:20 +02:00
@@ -36,6 +36,9 @@
#include "namedev.h"
#include "udevdb.h"
+/* timeout flag for udevdb */
+extern sig_atomic_t gotalarm;
+
/* global variables */
char **main_argv;
char **main_envp;
@@ -58,6 +61,11 @@ void log_message(int level, const char *
asmlinkage static void sig_handler(int signum)
{
switch (signum) {
+ case SIGALRM:
+ gotalarm = 1;
+ info("error: timeout reached, event probably not handled correctly, "
+ "please report to <linux-hotplug-devel@lists.sourceforge.net> ");
+ break;
case SIGINT:
case SIGTERM:
udevdb_exit();
@@ -94,7 +102,8 @@ int main(int argc, char *argv[], char *e
dbg("version %s", UDEV_VERSION);
- /* initialize our configuration */
+ init_logging("udev");
+
udev_init_config();
if (strstr(argv[0], "udevstart")) {
@@ -147,15 +156,20 @@ int main(int argc, char *argv[], char *e
/* set signal handlers */
act.sa_handler = sig_handler;
sigemptyset (&act.sa_mask);
+
+ /* alarm should interrupt */
+ sigaction(SIGALRM, &act, NULL);
+
act.sa_flags = SA_RESTART;
sigaction(SIGINT, &act, NULL);
sigaction(SIGTERM, &act, NULL);
+ /* trigger timeout to interrupt blocking syscalls */
+ alarm(ALARM_TIMEOUT);
+
/* initialize udev database */
- if (udevdb_init(UDEVDB_DEFAULT) != 0) {
- dbg("unable to initialize database");
- goto exit;
- }
+ if (udevdb_init(UDEVDB_DEFAULT) != 0)
+ info("error: unable to initialize database, continuing without database");
switch(act_type) {
case UDEVSTART:
===== udev.h 1.62 vs edited =====
--- 1.62/udev.h 2004-09-14 14:29:10 +02:00
+++ edited/udev.h 2004-10-06 13:09:21 +02:00
@@ -26,6 +26,9 @@
#include <sys/param.h>
#include "libsysfs/sysfs/libsysfs.h"
+#define ALARM_TIMEOUT 20
+#define WAIT_FOR_FILE_SECONDS 10
+#define WAIT_FOR_FILE_RETRY_FREQ 10
#define COMMENT_CHARACTER '#'
#define NAME_SIZE 256
===== udevdb.c 1.30 vs edited =====
--- 1.30/udevdb.c 2004-06-29 14:51:35 +02:00
+++ edited/udevdb.c 2004-10-06 13:09:21 +02:00
@@ -42,13 +42,28 @@
#include "tdb/tdb.h"
static TDB_CONTEXT *udevdb;
+sig_atomic_t gotalarm;
+static void tdb_log(TDB_CONTEXT *tdb, int level, const char *format, ...)
+{
+ va_list args;
+
+ if (!udev_log)
+ return;
+
+ va_start(args, format);
+ vsyslog(level, format, args);
+ va_end(args);
+}
int udevdb_add_dev(const char *path, const struct udevice *dev)
{
TDB_DATA key, data;
char keystr[SYSFS_PATH_MAX];
+ if (udevdb == NULL)
+ return -1;
+
if ((path == NULL) || (dev == NULL))
return -ENODEV;
@@ -68,6 +83,9 @@ int udevdb_get_dev(const char *path, str
{
TDB_DATA key, data;
+ if (udevdb == NULL)
+ return -1;
+
if (path == NULL)
return -ENODEV;
@@ -88,6 +106,9 @@ int udevdb_delete_dev(const char *path)
TDB_DATA key;
char keystr[SYSFS_PATH_MAX];
+ if (udevdb == NULL)
+ return -1;
+
if (path == NULL)
return -EINVAL;
@@ -121,7 +142,9 @@ int udevdb_init(int init_flag)
if (init_flag != UDEVDB_DEFAULT && init_flag != UDEVDB_INTERNAL)
return -EINVAL;
- udevdb = tdb_open(udev_db_filename, 0, init_flag, O_RDWR | O_CREAT, 0644);
+ tdb_set_lock_alarm(&gotalarm);
+
+ udevdb = tdb_open_ex(udev_db_filename, 0, init_flag, O_RDWR | O_CREAT, 0644, tdb_log);
if (udevdb == NULL) {
if (init_flag == UDEVDB_INTERNAL)
dbg("unable to initialize in-memory database");
@@ -137,7 +160,7 @@ int udevdb_init(int init_flag)
*/
int udevdb_open_ro(void)
{
- udevdb = tdb_open(udev_db_filename, 0, 0, O_RDONLY, 0);
+ udevdb = tdb_open_ex(udev_db_filename, 0, 0, O_RDONLY, 0, tdb_log);
if (udevdb == NULL) {
dbg("unable to open database at '%s'", udev_db_filename);
return -EACCES;
@@ -159,6 +182,9 @@ static int traverse_callback(TDB_CONTEXT
int udevdb_call_foreach(int (*user_record_handler) (char *path, struct udevice *dev))
{
int retval = 0;
+
+ if (udevdb == NULL)
+ return -1;
if (user_record_handler == NULL) {
dbg("invalid user record handling function");
===== tdb/tdb.c 1.4 vs edited =====
--- 1.4/tdb/tdb.c 2004-09-20 16:01:58 +02:00
+++ edited/tdb/tdb.c 2004-10-06 13:09:21 +02:00
@@ -617,8 +617,10 @@ int tdb_printfreelist(TDB_CONTEXT *tdb)
static int remove_from_freelist(TDB_CONTEXT *tdb, tdb_off off, tdb_off next)
{
tdb_off last_ptr, i;
+ int maxloop;
/* read in the freelist top */
+ maxloop = 100000;
last_ptr = FREELIST_TOP;
while (ofs_read(tdb, last_ptr, &i) != -1 && i != 0) {
if (i == off) {
@@ -627,6 +629,12 @@ static int remove_from_freelist(TDB_CONT
}
/* Follow chain (next offset is at start of record) */
last_ptr = i;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "remove_from_freelist: maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
+ }
}
TDB_LOG((tdb, 0,"remove_from_freelist: not on list at off=%d\n", off));
return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
@@ -853,6 +861,7 @@ static tdb_off tdb_allocate(TDB_CONTEXT
{
tdb_off rec_ptr, last_ptr, newrec_ptr;
struct list_struct newrec;
+ int maxloop;
if (tdb_lock(tdb, -1, F_WRLCK) == -1)
return 0;
@@ -868,6 +877,7 @@ static tdb_off tdb_allocate(TDB_CONTEXT
goto fail;
/* keep looking until we find a freelist record big enough */
+ maxloop = 100000;
while (rec_ptr) {
if (rec_free_read(tdb, rec_ptr, rec) == -1)
goto fail;
@@ -919,6 +929,12 @@ static tdb_off tdb_allocate(TDB_CONTEXT
/* move to the next record */
last_ptr = rec_ptr;
rec_ptr = rec->next;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "tdb_allocate: maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, 0);
+ }
}
/* we didn't find enough space. See if we can expand the
database and if we can then try again */
@@ -981,12 +997,14 @@ static tdb_off tdb_find(TDB_CONTEXT *tdb
struct list_struct *r)
{
tdb_off rec_ptr;
-
+ int maxloop;
+
/* read in the hash top */
if (ofs_read(tdb, TDB_HASH_TOP(hash), &rec_ptr) == -1)
return 0;
/* keep looking until we find the right record */
+ maxloop = 100000;
while (rec_ptr) {
if (rec_read(tdb, rec_ptr, r) == -1)
return 0;
@@ -1006,6 +1024,12 @@ static tdb_off tdb_find(TDB_CONTEXT *tdb
SAFE_FREE(k);
}
rec_ptr = r->next;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "tdb_find maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, 0);
+ }
}
return TDB_ERRCODE(TDB_ERR_NOEXIST, 0);
}
@@ -1188,6 +1212,7 @@ static int do_delete(TDB_CONTEXT *tdb, t
{
tdb_off last_ptr, i;
struct list_struct lastrec;
+ int maxloop;
if (tdb->read_only) return -1;
@@ -1202,9 +1227,18 @@ static int do_delete(TDB_CONTEXT *tdb, t
/* find previous record in hash chain */
if (ofs_read(tdb, TDB_HASH_TOP(rec->full_hash), &i) == -1)
return -1;
- for (last_ptr = 0; i != rec_ptr; last_ptr = i, i = lastrec.next)
+
+ maxloop = 100000;
+ for (last_ptr = 0; i != rec_ptr; last_ptr = i, i = lastrec.next) {
if (rec_read(tdb, i, &lastrec) == -1)
return -1;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "(tdb)do_delete: maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
+ }
+ }
/* unlink it: next ptr is at start of record. */
if (last_ptr == 0)
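
The tdb hunks above all apply the same guard: every loop that follows "next" offsets read from disk gets an iteration cap, so a chain that links back to itself ends in an error instead of spinning forever. The idea can be sketched in isolation as below. This is an illustration only, assuming a toy in-memory table in place of the tdb file; `chain_find()` and `MAX_CHAIN_STEPS` are hypothetical names, and only the cap value of 100000 is taken from the patch.

```c
#include <stddef.h>

/* Iteration cap, mirroring the value 100000 used in the patch. */
#define MAX_CHAIN_STEPS 100000

/*
 * Toy model of an on-disk chain: next[i] holds the offset of the record
 * that follows record i, and offset 0 terminates the chain.
 *
 * Walk the chain from `start` looking for `target`.
 * Returns 1 if found, 0 if the chain ends without it, and -1 if an
 * offset is out of range or the step cap is hit (likely a cyclic chain).
 */
int chain_find(const unsigned int *next, size_t nrec,
	       unsigned int start, unsigned int target)
{
	unsigned int off = start;
	int steps = MAX_CHAIN_STEPS;

	while (off != 0) {
		if (off >= nrec)
			return -1;	/* offset points outside the table */
		if (off == target)
			return 1;
		off = next[off];
		if (--steps == 0)
			return -1;	/* looped too long: treat as corrupt */
	}
	return 0;
}
```

A fixed cap is crude compared with real cycle detection, but it needs no extra memory and is easy to retrofit onto existing loops, which is presumably why it suits a debugging patch.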
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (22 preceding siblings ...)
2004-10-06 12:00 ` Kay Sievers
@ 2004-10-06 12:29 ` Frank Steiner
2004-10-08 5:59 ` Frank Steiner
24 siblings, 0 replies; 26+ messages in thread
From: Frank Steiner @ 2004-10-06 12:29 UTC (permalink / raw)
To: linux-hotplug
Kay Sievers wrote
>
> Funny, this is not inside the tdb, it's caused by getting garbage from the
> udev database.
>
> You don't use {all_partitions}, right?
No. I just took over the SuSE udev.rules, and they don't use it, but everything
without card readers etc. always worked fine, so I never looked into this.
>
> I expect that the payload data from the tdb is corrupt and that the record
> claims udev_add has added > 2,000,000 partition nodes for the
> device :) You should have something like this in the syslog:
> "removing partitions 'sg2[1-24532432]'"
No, but syslog messages are missing for about 9 hours. Something very
weird must have happened to this host tonight :-)
> Here is a patch for it. It's against the current tree after our patch
> session yesterday. But it should successfully apply to your tree with
> offsets.
Thanks, let's wait for the next run :-)
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *
* Re: Hanging udev process on nfs-mounted /dev
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
` (23 preceding siblings ...)
2004-10-06 12:29 ` Frank Steiner
@ 2004-10-08 5:59 ` Frank Steiner
24 siblings, 0 replies; 26+ messages in thread
From: Frank Steiner @ 2004-10-08 5:59 UTC (permalink / raw)
To: linux-hotplug
Hi Kay,
with version 04 of your deadlock patch, the three hosts which still use
/dev over nfs have now been rebooting in a loop (rebooting every 5 minutes
with extra, parallel udevstart calls) for two days without any lockups.
The same rebooting mechanism reliably locked them up in all the former
tests, so I guess that all possible deadlocks (at least those caused by
non-locking NFS) are fixed! I see all the warnings in the logs, so I can
tell that the deadlock situations happen a few times a day, but the patch
unbreaks them all. So I guess you can make it final.
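
For readers wondering how a hung call gets "unbroken": the patch arms alarm() and installs the SIGALRM handler without SA_RESTART, so a blocked system call returns with EINTR instead of waiting forever. The sketch below shows that mechanism in isolation. It is an assumption-laden illustration, not udev code: it uses a blocking read() as a stand-in for the database lock call, and `read_with_timeout()` and `on_alarm()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_alarm;

/* Only set a flag: very little else is safe inside a signal handler. */
static void on_alarm(int signum)
{
	(void)signum;
	got_alarm = 1;
}

/*
 * Read from fd, but give up after `seconds`.
 * Returns the byte count from read(), or -1 with errno set to
 * ETIMEDOUT if the alarm interrupted the call first.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, unsigned int seconds)
{
	struct sigaction act, old;
	ssize_t n;
	int saved_errno;

	memset(&act, 0, sizeof(act));
	act.sa_handler = on_alarm;
	sigemptyset(&act.sa_mask);
	act.sa_flags = 0;	/* no SA_RESTART, so read() fails with EINTR */

	got_alarm = 0;
	sigaction(SIGALRM, &act, &old);
	alarm(seconds);

	n = read(fd, buf, len);
	saved_errno = errno;

	alarm(0);				/* cancel a pending alarm */
	sigaction(SIGALRM, &old, NULL);		/* restore previous handler */

	if (n == -1 && saved_errno == EINTR && got_alarm)
		saved_errno = ETIMEDOUT;
	errno = saved_errno;
	return n;
}
```

Calling it on the read end of an empty pipe with a one-second timeout returns -1 after about a second, while a pipe that already holds data returns at once. Note that the handler here only sets a flag and does not log from signal context.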
The hosts running tmpfs didn't have any problems at all, with or without
the patch.
Just a minor problem: With the deadlock patch I can't compile udev with
"USE_LOG=false DEBUG=false". I fixed that by putting an #ifdef around, i.e.,
+static void tdb_log(TDB_CONTEXT *tdb, int level, const char *format, ...)
+{
+ va_list args;
+
+ if (!udev_log)
+ return;
+
+#ifdef LOG
+ va_start(args, format);
+ vsyslog(level, format, args);
+ va_end(args);
+#endif
+}
Not sure if that makes any sense except making it compile, but you
will know better :-)
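
A variant of the same idea, for comparison: when logging is compiled out, the whole body of the callback can be swapped for a no-op that still consumes its parameters, so no unused variable is left behind. This is a hedged sketch, not a proposed replacement for the patch; `ENABLE_LOG`, `log_enabled` and `log_callback()` are hypothetical names standing in for udev's LOG define, udev_log and tdb_log.

```c
#include <stdarg.h>

/* Hypothetical build switch standing in for udev's LOG define (assumption). */
#ifdef ENABLE_LOG
#include <syslog.h>
static int log_enabled = 1;	/* runtime switch, like udev_log in the patch */
#endif

/*
 * Variadic log callback. With ENABLE_LOG it forwards to vsyslog();
 * without it, the parameters are consumed and nothing is done.
 * Returns 1 if a message was emitted, otherwise 0.
 */
int log_callback(int level, const char *format, ...)
{
#ifdef ENABLE_LOG
	va_list args;

	if (!log_enabled)
		return 0;

	va_start(args, format);
	vsyslog(level, format, args);
	va_end(args);
	return 1;
#else
	(void)level;
	(void)format;
	return 0;
#endif
}
```

Built without ENABLE_LOG, the function needs neither syslog.h nor the runtime flag, so it compiles regardless of how logging was configured.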
Thanks again for all your help and consistent patching :-)
cu,
Frank
--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *