On Mon, Oct 04, 2004 at 08:15:46AM +0200, Frank Steiner wrote:
> Kay Sievers wrote
>
> >Oh, bad.
> >It's your running strace that prevents gdb from taking control of the
> >process, I expect. Just try to send the strace parent-process a SIGUSR1
> >which may leave the udev process running.
>
> Yes, that worked :-) On one host there were already 3 udev processes
> sharing all the CPU time, on the second it was just one. All four
> traces look a little bit different, that's why I've attached all
> four of them. I hope you can fetch sth. useful from the traces...

Ok, let's look at it:

> (gdb) bt
> #0  0x400b91be in memcpy () from /lib/i686/libc.so.6
> #1  0x08056164 in tdb_read (tdb=0x80640a8, off=1073871052, buf=0xbffff2d8,
>     len=3221222108, cv=0) at tdb/tdb.c:407
> #2  0x080562b9 in ofs_read (tdb=0x4, offset=22728, d=0xbffff2d8)
>     at tdb/tdb.c:447
> #3  0x080567e4 in remove_from_freelist (tdb=0x80640a8, off=28176, next=696)
>     at tdb/tdb.c:628
> #4  0x080568d4 in tdb_free (tdb=0x80640a8, offset=27268, rec=0xbffff370)
>     at tdb/tdb.c:662
> #5  0x08056e2e in tdb_allocate (tdb=0x80640a8, length=884, rec=0xbffff3e0)
>     at tdb/tdb.c:910
> #6  0x08057e5a in tdb_store (tdb=0x80640a8, key=
>       {dptr = 0xbffff450 "/class/scsi_generic/sg3", dsize = 24}, dbuf=
>       {dptr = 0xbffff6a0 "sg3", dsize = 856}, flag=1) at tdb/tdb.c:1497
> #7  0x0804c361 in udevdb_add_dev (path=0xbfffff67 "/class/scsi_generic/sg3",
>     dev=0xbffff6a0) at udevdb.c:76
> #8  0x0804bb66 in udev_add_device (path=0xbfffff67 "/class/scsi_generic/sg3",
>     subsystem=0xbfffff45 "scsi_generic", fake=0) at udev-add.c:446
> #9  0x0804971f in main (argc=2, argv=0xbffffd70, envp=0x4) at udev.c:185

This seems to be a loop in tdb_allocate().

> (gdb) bt
> #0  tdb_oob (tdb=0x80640a8, len=3221222944, probe=7004) at tdb/tdb.c:342
> #1  0x0805638b in rec_read (tdb=0x80640a8, offset=7004, rec=0xbffff620)
>     at tdb/tdb.c:466
> #2  0x0805703e in tdb_find (tdb=0x80640a8, key=
>       {dptr = 0xbfffff60 "/class/scsi_device/26:0:0:0", dsize = 28}, hash=536231552,
>     r=0xbffff620) at tdb/tdb.c:990
> #3  0x080571b8 in tdb_find_lock (tdb=0x80640a8, key=
>       {dptr = 0xbfffff60 "/class/scsi_device/26:0:0:0", dsize = 28}, locktype=0,
>     rec=0xbffff620) at tdb/tdb.c:1035
> #4  0x080572ef in tdb_fetch (tdb=0x80640a8, key=
>       {dptr = 0xbfffff60 "/class/scsi_device/26:0:0:0", dsize = 28}) at tdb/tdb.c:1113
> #5  0x0804c3a3 in udevdb_get_dev (path=0xffffff00,
>     dev=0xbffff6a0) at udevdb.c:89
> #6  0x0804c17f in udev_remove_device (path=0xbfffff60 "/class/scsi_device/26:0:0:0",
>     subsystem=0xbfffff45 "block") at udev-remove.c:170
> #7  0x08049760 in main (argc=2, argv=0xbffffd70, envp=0x1b74) at udev.c:189

This is the loop in tdb_find().

> (gdb) bt
> #0  0x08055f9a in tdb_oob (tdb=0x80640a8, len=7028, probe=0) at tdb/tdb.c:344
> #1  0x0805613c in tdb_read (tdb=0x80640a8, off=7004, buf=0xbffff620, len=24, cv=0)
>     at tdb/tdb.c:403
> #2  0x0805631e in rec_read (tdb=0x80640a8, offset=7004, rec=0xbffff620)
>     at tdb/tdb.c:458
> #3  0x0805703e in tdb_find (tdb=0x80640a8, key=
>       {dptr = 0xbfffff61 "/class/scsi_device/9:0:0:2", dsize = 27}, hash=3221149257,
>     r=0xbffff620) at tdb/tdb.c:990
> #4  0x080571b8 in tdb_find_lock (tdb=0x80640a8, key=
>       {dptr = 0xbfffff61 "/class/scsi_device/9:0:0:2", dsize = 27}, locktype=0,
>     rec=0xbffff620) at tdb/tdb.c:1035
> #5  0x080572ef in tdb_fetch (tdb=0x80640a8, key=
>       {dptr = 0xbfffff61 "/class/scsi_device/9:0:0:2", dsize = 27}) at tdb/tdb.c:1113
> #6  0x0804c3a3 in udevdb_get_dev (path=0x0, dev=0xbffff6a0) at udevdb.c:89
> #7  0x0804c17f in udev_remove_device (path=0xbfffff61 "/class/scsi_device/9:0:0:2",
>     subsystem=0xbfffff46 "block") at udev-remove.c:170
> #8  0x08049760 in main (argc=2, argv=0xbffffd70, envp=0x0) at udev.c:189

The same tdb_find() loop.

> (gdb) bt
> #0  ofs_read (tdb=0xbffff438, offset=20016, d=0x0) at tdb/tdb.c:446
> #1  0x080567e4 in remove_from_freelist (tdb=0x80640a8, off=20016, next=0)
>     at tdb/tdb.c:628
> #2  0x080568d4 in tdb_free (tdb=0x80640a8, offset=19108, rec=0xbffff520)
>     at tdb/tdb.c:662
> #3  0x08057586 in do_delete (tdb=0x80640a8, rec_ptr=19108, rec=0xbffff520)
>     at tdb/tdb.c:1215
> #4  0x08057cc4 in tdb_delete (tdb=0x80640a8, key=
>       {dptr = 0xbffff570 "/class/scsi_generic/sg2", dsize = 24}) at tdb/tdb.c:1434
> #5  0x0804c475 in udevdb_delete_dev (path=0xbfffff64 "/class/scsi_generic/sg2")
>     at udevdb.c:112
> #6  0x0804c218 in udev_remove_device (path=0xbfffff64 "/class/scsi_generic/sg2",
>     subsystem=0xbfffff42 "scsi_generic") at udev-remove.c:182
> #7  0x08049760 in main (argc=2, argv=0xbffffd70, envp=0xbffff438) at udev.c:189

Seems to be a loop in remove_from_freelist().

All failure paths known so far are now covered by the attached patch, which
limits the iteration count of loops over data read from disk.
Let's see what happens next :)

Kay
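
For readers without the attachment, here is a rough sketch of the general idea of capping a loop that follows offsets read from disk. It is not the patch itself: the array-based chain, chain_length() and CHAIN_MAX_STEPS are all hypothetical names made up for this illustration, and the real tdb code walks file offsets rather than array indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cap on chain length; the value is illustrative only and is
 * not taken from the attached patch. */
#define CHAIN_MAX_STEPS 1024u

/*
 * Hypothetical stand-in for an on-disk record chain: next[i] is the
 * index of the record following record i, and 0 terminates the chain.
 *
 * Returns the number of records visited, or -1 if the chain points
 * outside the table or takes more than max_steps steps. A corrupted
 * chain that loops back on itself ends up in the second case instead
 * of spinning forever.
 */
int chain_length(const uint32_t *next, size_t nrec, uint32_t head,
                 unsigned max_steps)
{
    unsigned steps = 0;
    uint32_t off = head;

    assert(next != NULL);

    while (off != 0) {
        if (off >= nrec)
            return -1;          /* offset outside the table */
        if (++steps > max_steps)
            return -1;          /* too many hops: assume corruption */
        off = next[off];
    }
    return (int)steps;
}
```

With a cap like this, a damaged database costs one failed operation rather than a udev process that eats CPU indefinitely. The price is that a legitimate chain longer than the cap would also be rejected, so the limit has to be chosen generously.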