From: Kay Sievers <kay.sievers@vrfy.org>
To: linux-hotplug@vger.kernel.org
Subject: Re: Hanging udev process on nfs-mounted /dev
Date: Mon, 04 Oct 2004 14:19:57 +0000 [thread overview]
Message-ID: <20041004141957.GA3086@vrfy.org> (raw)
In-Reply-To: <415980BF.1020401@bio.ifi.lmu.de>
[-- Attachment #1: Type: text/plain, Size: 4999 bytes --]
On Mon, Oct 04, 2004 at 08:15:46AM +0200, Frank Steiner wrote:
> Kay Sievers wrote
>
> >Oh, bad.
> >It's your running strace that prevents gbd to take the control over the
> >process, I expect. Just try to send the strace parent-process a SIGUSR1
> >which may leave the udev process running.
>
> Yes, that worked :-) On one host there were already 3 udev processes
> sharing all the CPU time, on the second it was just one. All four
> traces look a little bit different, that's why I've attached all
> four of them. I hope you can fetch sth. useful from the traces...
Ok, let's look at it:
> (gdb) bt
> #0 0x400b91be in memcpy () from /lib/i686/libc.so.6
> #1 0x08056164 in tdb_read (tdb=0x80640a8, off=1073871052, buf=0xbffff2d8,
> len=3221222108, cv=0) at tdb/tdb.c:407
> #2 0x080562b9 in ofs_read (tdb=0x4, offset=22728, d=0xbffff2d8)
> at tdb/tdb.c:447
> #3 0x080567e4 in remove_from_freelist (tdb=0x80640a8, off=28176, next=696)
> at tdb/tdb.c:628
> #4 0x080568d4 in tdb_free (tdb=0x80640a8, offset=27268, rec=0xbffff370)
> at tdb/tdb.c:662
> #5 0x08056e2e in tdb_allocate (tdb=0x80640a8, length=884, rec=0xbffff3e0)
> at tdb/tdb.c:910
> #6 0x08057e5a in tdb_store (tdb=0x80640a8, key=
> {dptr = 0xbffff450 "/class/scsi_generic/sg3", dsize = 24}, dbuf=
> {dptr = 0xbffff6a0 "sg3", dsize = 856}, flag=1) at tdb/tdb.c:1497
> #7 0x0804c361 in udevdb_add_dev (path=0xbfffff67 "/class/scsi_generic/sg3",
> dev=0xbffff6a0) at udevdb.c:76
> #8 0x0804bb66 in udev_add_device (path=0xbfffff67 "/class/scsi_generic/sg3",
> subsystem=0xbfffff45 "scsi_generic", fake=0) at udev-add.c:446
> #9 0x0804971f in main (argc=2, argv=0xbffffd70, envp=0x4) at udev.c:185
This seems to be a loop in tdb_allocate().
> (gdb) bt
> #0 tdb_oob (tdb=0x80640a8, len=3221222944, probe=7004) at tdb/tdb.c:342
> #1 0x0805638b in rec_read (tdb=0x80640a8, offset=7004, rec=0xbffff620)
> at tdb/tdb.c:466
> #2 0x0805703e in tdb_find (tdb=0x80640a8, key=
> {dptr = 0xbfffff60 "/class/scsi_device/26:0:0:0", dsize = 28}, hash=536231552,
> r=0xbffff620) at tdb/tdb.c:990
> #3 0x080571b8 in tdb_find_lock (tdb=0x80640a8, key=
> {dptr = 0xbfffff60 "/class/scsi_device/26:0:0:0", dsize = 28}, locktype=0,
> rec=0xbffff620) at tdb/tdb.c:1035
> #4 0x080572ef in tdb_fetch (tdb=0x80640a8, key=
> {dptr = 0xbfffff60 "/class/scsi_device/26:0:0:0", dsize = 28}) at tdb/tdb.c:1113
> #5 0x0804c3a3 in udevdb_get_dev (path=0xffffff00 <Address 0xffffff00 out of bounds>,
> dev=0xbffff6a0) at udevdb.c:89
> #6 0x0804c17f in udev_remove_device (path=0xbfffff60 "/class/scsi_device/26:0:0:0",
> subsystem=0xbfffff45 "block") at udev-remove.c:170
> #7 0x08049760 in main (argc=2, argv=0xbffffd70, envp=0x1b74) at udev.c:189
This is the loop in tdb_find().
> (gdb) bt
> #0 0x08055f9a in tdb_oob (tdb=0x80640a8, len=7028, probe=0) at tdb/tdb.c:344
> #1 0x0805613c in tdb_read (tdb=0x80640a8, off=7004, buf=0xbffff620, len=24, cv=0)
> at tdb/tdb.c:403
> #2 0x0805631e in rec_read (tdb=0x80640a8, offset=7004, rec=0xbffff620)
> at tdb/tdb.c:458
> #3 0x0805703e in tdb_find (tdb=0x80640a8, key=
> {dptr = 0xbfffff61 "/class/scsi_device/9:0:0:2", dsize = 27}, hash=3221149257,
> r=0xbffff620) at tdb/tdb.c:990
> #4 0x080571b8 in tdb_find_lock (tdb=0x80640a8, key=
> {dptr = 0xbfffff61 "/class/scsi_device/9:0:0:2", dsize = 27}, locktype=0,
> rec=0xbffff620) at tdb/tdb.c:1035
> #5 0x080572ef in tdb_fetch (tdb=0x80640a8, key=
> {dptr = 0xbfffff61 "/class/scsi_device/9:0:0:2", dsize = 27}) at tdb/tdb.c:1113
> #6 0x0804c3a3 in udevdb_get_dev (path=0x0, dev=0xbffff6a0) at udevdb.c:89
> #7 0x0804c17f in udev_remove_device (path=0xbfffff61 "/class/scsi_device/9:0:0:2",
> subsystem=0xbfffff46 "block") at udev-remove.c:170
> #8 0x08049760 in main (argc=2, argv=0xbffffd70, envp=0x0) at udev.c:189
The same tdb_find() loop.
> (gdb) bt
> #0 ofs_read (tdb=0xbffff438, offset=20016, d=0x0) at tdb/tdb.c:446
> #1 0x080567e4 in remove_from_freelist (tdb=0x80640a8, off=20016, next=0)
> at tdb/tdb.c:628
> #2 0x080568d4 in tdb_free (tdb=0x80640a8, offset=19108, rec=0xbffff520)
> at tdb/tdb.c:662
> #3 0x08057586 in do_delete (tdb=0x80640a8, rec_ptr=19108, rec=0xbffff520)
> at tdb/tdb.c:1215
> #4 0x08057cc4 in tdb_delete (tdb=0x80640a8, key=
> {dptr = 0xbffff570 "/class/scsi_generic/sg2", dsize = 24}) at tdb/tdb.c:1434
> #5 0x0804c475 in udevdb_delete_dev (path=0xbfffff64 "/class/scsi_generic/sg2")
> at udevdb.c:112
> #6 0x0804c218 in udev_remove_device (path=0xbfffff64 "/class/scsi_generic/sg2",
> subsystem=0xbfffff42 "scsi_generic") at udev-remove.c:182
> #7 0x08049760 in main (argc=2, argv=0xbffffd70, envp=0xbffff438) at udev.c:189
Seems to be a loop in remove_from_freelist().
All here known failure paths are now covered in the attached patch by
limiting the iteration count for loops over data read from disk. Let's
see what happens next :)
Kay
[-- Attachment #2: udev-deadlock-debug-03.patch --]
[-- Type: text/plain, Size: 10735 bytes --]
===== namedev.c 1.146 vs edited =====
--- 1.146/namedev.c 2004-09-08 15:17:55 +02:00
+++ edited/namedev.c 2004-10-04 15:24:31 +02:00
@@ -29,7 +29,6 @@
#include <ctype.h>
#include <unistd.h>
#include <errno.h>
-#include <time.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
@@ -353,7 +352,6 @@ static struct bus_file {
{}
};
-#define SECONDS_TO_WAIT_FOR_FILE 10
static void wait_for_device_to_initialize(struct sysfs_device *sysfs_device)
{
/* sleep until we see the file for this specific bus type show up this
@@ -367,14 +365,14 @@ static void wait_for_device_to_initializ
struct bus_file *b = &bus_files[0];
struct sysfs_attribute *tmpattr;
int found = 0;
- int loop = SECONDS_TO_WAIT_FOR_FILE;
+ int loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
while (1) {
if (b->bus == NULL) {
if (!found)
break;
- /* sleep to give the kernel a chance to create the file */
- sleep(1);
+ /* give the kernel a chance to create the file */
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
--loop;
if (loop == 0)
break;
@@ -394,7 +392,8 @@ static void wait_for_device_to_initializ
}
if (!found)
dbg("did not find bus type '%s' on list of bus_id_files, "
- "contact greg@kroah.com", sysfs_device->bus);
+ "please report to <linux-hotplug-devel@lists.sourceforge.net>",
+ sysfs_device->bus);
exit:
return; /* here to prevent compiler warning... */
}
@@ -682,7 +681,6 @@ static struct sysfs_device *get_sysfs_de
{
struct sysfs_device *sysfs_device;
struct sysfs_class_device *class_dev_parent;
- struct timespec tspec;
int loop;
/* Figure out where the device symlink is at. For char devices this will
@@ -698,16 +696,14 @@ static struct sysfs_device *get_sysfs_de
if (class_dev_parent != NULL)
dbg("given class device has a parent, use this instead");
- tspec.tv_sec = 0;
- tspec.tv_nsec = 10000000; /* sleep 10 millisec */
- loop = 10;
+ loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
while (loop--) {
if (udev_sleep) {
if (whitelist_search(class_dev)) {
sysfs_device = NULL;
goto exit;
}
- nanosleep(&tspec, NULL);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
}
if (class_dev_parent)
@@ -729,11 +725,9 @@ device_found:
if (sysfs_device->bus[0] != '\0')
goto bus_found;
- loop = 10;
- tspec.tv_nsec = 10000000;
while (loop--) {
if (udev_sleep)
- nanosleep(&tspec, NULL);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
sysfs_get_device_bus(sysfs_device);
if (sysfs_device->bus[0] != '\0')
===== udev-add.c 1.73 vs edited =====
--- 1.73/udev-add.c 2004-08-05 00:41:08 +02:00
+++ edited/udev-add.c 2004-10-04 15:24:31 +02:00
@@ -340,11 +340,10 @@ exit:
/* wait for the "dev" file to show up in the directory in sysfs.
* If it doesn't happen in about 10 seconds, give up.
*/
-#define SECONDS_TO_WAIT_FOR_FILE 10
static int sleep_for_file(const char *path, char* file)
{
char filename[SYSFS_PATH_MAX + 6];
- int loop = SECONDS_TO_WAIT_FOR_FILE;
+ int loop = WAIT_FOR_FILE_SECONDS * WAIT_FOR_FILE_RETRY_FREQ;
int retval;
strfieldcpy(filename, sysfs_path);
@@ -360,7 +359,7 @@ static int sleep_for_file(const char *pa
goto exit;
/* sleep to give the kernel a chance to create the dev file */
- sleep(1);
+ usleep(1000 * 1000 / WAIT_FOR_FILE_RETRY_FREQ);
}
retval = -ENODEV;
exit:
===== udev.c 1.62 vs edited =====
--- 1.62/udev.c 2004-09-14 02:25:32 +02:00
+++ edited/udev.c 2004-10-04 15:24:31 +02:00
@@ -36,6 +36,9 @@
#include "namedev.h"
#include "udevdb.h"
+/* timeout flag for udevdb */
+extern sig_atomic_t gotalarm;
+
/* global variables */
char **main_argv;
char **main_envp;
@@ -58,6 +61,11 @@ void log_message(int level, const char *
asmlinkage static void sig_handler(int signum)
{
switch (signum) {
+ case SIGALRM:
+ gotalarm = 1;
+ info("error: timeout reached, event probably not handled correctly, "
+ "please report to <linux-hotplug-devel@lists.sourceforge.net> ");
+ break;
case SIGINT:
case SIGTERM:
udevdb_exit();
@@ -94,7 +102,8 @@ int main(int argc, char *argv[], char *e
dbg("version %s", UDEV_VERSION);
- /* initialize our configuration */
+ init_logging("udev");
+
udev_init_config();
if (strstr(argv[0], "udevstart")) {
@@ -147,15 +156,20 @@ int main(int argc, char *argv[], char *e
/* set signal handlers */
act.sa_handler = sig_handler;
sigemptyset (&act.sa_mask);
+
+ /* alarm should interrupt */
+ sigaction(SIGALRM, &act, NULL);
+
act.sa_flags = SA_RESTART;
sigaction(SIGINT, &act, NULL);
sigaction(SIGTERM, &act, NULL);
+ /* trigger timout to interrupt blocking syscalls */
+ alarm(ALARM_TIMEOUT);
+
/* initialize udev database */
- if (udevdb_init(UDEVDB_DEFAULT) != 0) {
- dbg("unable to initialize database");
- goto exit;
- }
+ if (udevdb_init(UDEVDB_DEFAULT) != 0)
+ info("error: unable to initialize database, continuing without database");
switch(act_type) {
case UDEVSTART:
===== udev.h 1.62 vs edited =====
--- 1.62/udev.h 2004-09-14 14:29:10 +02:00
+++ edited/udev.h 2004-10-04 15:24:31 +02:00
@@ -26,6 +26,9 @@
#include <sys/param.h>
#include "libsysfs/sysfs/libsysfs.h"
+#define ALARM_TIMEOUT 20
+#define WAIT_FOR_FILE_SECONDS 10
+#define WAIT_FOR_FILE_RETRY_FREQ 10
#define COMMENT_CHARACTER '#'
#define NAME_SIZE 256
===== udevdb.c 1.30 vs edited =====
--- 1.30/udevdb.c 2004-06-29 14:51:35 +02:00
+++ edited/udevdb.c 2004-10-04 15:24:31 +02:00
@@ -42,13 +42,28 @@
#include "tdb/tdb.h"
static TDB_CONTEXT *udevdb;
+sig_atomic_t gotalarm;
+static void tdb_log(TDB_CONTEXT *tdb, int level, const char *format, ...)
+{
+ va_list args;
+
+ if (!udev_log)
+ return;
+
+ va_start(args, format);
+ vsyslog(level, format, args);
+ va_end(args);
+}
int udevdb_add_dev(const char *path, const struct udevice *dev)
{
TDB_DATA key, data;
char keystr[SYSFS_PATH_MAX];
+ if (udevdb == NULL)
+ return -1;
+
if ((path == NULL) || (dev == NULL))
return -ENODEV;
@@ -68,6 +83,9 @@ int udevdb_get_dev(const char *path, str
{
TDB_DATA key, data;
+ if (udevdb == NULL)
+ return -1;
+
if (path == NULL)
return -ENODEV;
@@ -88,6 +106,9 @@ int udevdb_delete_dev(const char *path)
TDB_DATA key;
char keystr[SYSFS_PATH_MAX];
+ if (udevdb == NULL)
+ return -1;
+
if (path == NULL)
return -EINVAL;
@@ -121,7 +142,9 @@ int udevdb_init(int init_flag)
if (init_flag != UDEVDB_DEFAULT && init_flag != UDEVDB_INTERNAL)
return -EINVAL;
- udevdb = tdb_open(udev_db_filename, 0, init_flag, O_RDWR | O_CREAT, 0644);
+ tdb_set_lock_alarm(&gotalarm);
+
+ udevdb = tdb_open_ex(udev_db_filename, 0, init_flag, O_RDWR | O_CREAT, 0644, tdb_log);
if (udevdb == NULL) {
if (init_flag == UDEVDB_INTERNAL)
dbg("unable to initialize in-memory database");
@@ -137,7 +160,7 @@ int udevdb_init(int init_flag)
*/
int udevdb_open_ro(void)
{
- udevdb = tdb_open(udev_db_filename, 0, 0, O_RDONLY, 0);
+ udevdb = tdb_open_ex(udev_db_filename, 0, 0, O_RDONLY, 0, tdb_log);
if (udevdb == NULL) {
dbg("unable to open database at '%s'", udev_db_filename);
return -EACCES;
@@ -159,6 +182,9 @@ static int traverse_callback(TDB_CONTEXT
int udevdb_call_foreach(int (*user_record_handler) (char *path, struct udevice *dev))
{
int retval = 0;
+
+ if (udevdb == NULL)
+ return -1;
if (user_record_handler == NULL) {
dbg("invalid user record handling function");
===== tdb/tdb.c 1.3 vs edited =====
--- 1.3/tdb/tdb.c 2003-12-17 01:23:27 +01:00
+++ edited/tdb/tdb.c 2004-10-04 16:01:26 +02:00
@@ -616,8 +616,10 @@ int tdb_printfreelist(TDB_CONTEXT *tdb)
static int remove_from_freelist(TDB_CONTEXT *tdb, tdb_off off, tdb_off next)
{
tdb_off last_ptr, i;
+ int maxloop;
/* read in the freelist top */
+ maxloop = 100000;
last_ptr = FREELIST_TOP;
while (ofs_read(tdb, last_ptr, &i) != -1 && i != 0) {
if (i == off) {
@@ -626,6 +628,12 @@ static int remove_from_freelist(TDB_CONT
}
/* Follow chain (next offset is at start of record) */
last_ptr = i;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "remove_from_freelist: maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
+ }
}
TDB_LOG((tdb, 0,"remove_from_freelist: not on list at off=%d\n", off));
return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
@@ -852,6 +860,7 @@ static tdb_off tdb_allocate(TDB_CONTEXT
{
tdb_off rec_ptr, last_ptr, newrec_ptr;
struct list_struct newrec;
+ int maxloop;
if (tdb_lock(tdb, -1, F_WRLCK) == -1)
return 0;
@@ -867,6 +876,7 @@ static tdb_off tdb_allocate(TDB_CONTEXT
goto fail;
/* keep looking until we find a freelist record big enough */
+ maxloop = 100000;
while (rec_ptr) {
if (rec_free_read(tdb, rec_ptr, rec) == -1)
goto fail;
@@ -918,6 +928,12 @@ static tdb_off tdb_allocate(TDB_CONTEXT
/* move to the next record */
last_ptr = rec_ptr;
rec_ptr = rec->next;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "tdb_allocate: maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, 0);
+ }
}
/* we didn't find enough space. See if we can expand the
database and if we can then try again */
@@ -980,12 +996,14 @@ static tdb_off tdb_find(TDB_CONTEXT *tdb
struct list_struct *r)
{
tdb_off rec_ptr;
-
+ int maxloop;
+
/* read in the hash top */
if (ofs_read(tdb, TDB_HASH_TOP(hash), &rec_ptr) == -1)
return 0;
/* keep looking until we find the right record */
+ maxloop = 100000;
while (rec_ptr) {
if (rec_read(tdb, rec_ptr, r) == -1)
return 0;
@@ -1005,6 +1023,12 @@ static tdb_off tdb_find(TDB_CONTEXT *tdb
SAFE_FREE(k);
}
rec_ptr = r->next;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "tdb_find maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, 0);
+ }
}
return TDB_ERRCODE(TDB_ERR_NOEXIST, 0);
}
@@ -1187,6 +1211,7 @@ static int do_delete(TDB_CONTEXT *tdb, t
{
tdb_off last_ptr, i;
struct list_struct lastrec;
+ int maxloop;
if (tdb->read_only) return -1;
@@ -1201,9 +1226,18 @@ static int do_delete(TDB_CONTEXT *tdb, t
/* find previous record in hash chain */
if (ofs_read(tdb, TDB_HASH_TOP(rec->full_hash), &i) == -1)
return -1;
- for (last_ptr = 0; i != rec_ptr; last_ptr = i, i = lastrec.next)
+
+ maxloop = 100000;
+ for (last_ptr = 0; i != rec_ptr; last_ptr = i, i = lastrec.next) {
if (rec_read(tdb, i, &lastrec) == -1)
return -1;
+
+ maxloop--;
+ if (maxloop == 0) {
+ TDB_LOG((tdb, 0, "(tdb)do_delete: maxloop reached; corrupt database!\n"));
+ return TDB_ERRCODE(TDB_ERR_CORRUPT, -1);
+ }
+ }
/* unlink it: next ptr is at start of record. */
if (last_ptr == 0)
next prev parent reply other threads:[~2004-10-04 14:19 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-09-28 15:18 Hanging udev process on nfs-mounted /dev Frank Steiner
2004-09-29 17:18 ` Greg KH
2004-09-29 23:39 ` Kay Sievers
2004-09-30 2:11 ` Kay Sievers
2004-09-30 6:18 ` Frank Steiner
2004-09-30 6:21 ` Frank Steiner
2004-09-30 14:07 ` Kay Sievers
2004-10-01 6:25 ` Frank Steiner
2004-10-01 7:36 ` Kay Sievers
2004-10-01 7:38 ` Frank Steiner
2004-10-01 7:55 ` Frank Steiner
2004-10-01 8:08 ` Kay Sievers
2004-10-01 9:43 ` Frank Steiner
2004-10-01 9:57 ` Kay Sievers
2004-10-01 10:43 ` Kay Sievers
2004-10-01 22:18 ` Kay Sievers
2004-10-03 21:10 ` Frank Steiner
2004-10-03 23:07 ` Kay Sievers
2004-10-04 6:15 ` Frank Steiner
2004-10-04 14:19 ` Kay Sievers [this message]
2004-10-04 14:53 ` Frank Steiner
2004-10-05 15:37 ` Kay Sievers
2004-10-06 6:06 ` Frank Steiner
2004-10-06 12:00 ` Kay Sievers
2004-10-06 12:29 ` Frank Steiner
2004-10-08 5:59 ` Frank Steiner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20041004141957.GA3086@vrfy.org \
--to=kay.sievers@vrfy.org \
--cc=linux-hotplug@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).