* Failed reads from RAID-0 array (from newbie who has read the FAQ)
From: Michael Schwarz @ 2007-03-17 2:20 UTC
To: linux-raid

I'm not a Linux newbie (I've even written a couple of books and done some
very light device driver work), but I'm completely new to the software
raid subsystem.

I'm doing something rather oddball. I'm making an array of USB flash
drives and comparing read and write rates.

Well, I've had great success writing. I've got seven flash drives on a
hub. I've joined them up both as linear and as raid0 and written large
amounts of data to them. But when it comes time to read from them, linear
works, while raid0 hangs after transferring just shy of 2G of data. It
doesn't matter whether it is reading from one file or from many files whose
cumulative size is just shy of 2G. It doesn't matter whether I'm using "dd"
or "cp" to read the file or files.

The process doing the transfer is unkillable, by either kill -15 or
kill -9. It won't die, but it also won't make progress.

"Linear" always works. Raid-0 always hangs.

Here is my mdadm command to create the linear array:

    mdadm --create /dev/md0 --level=linear --auto=md --chunk=32 --raid-devices=7 /dev/sd?

(The wildcard works because the seven flash drives are the only SCSI
devices on the system.)

The command for the raid-0 array is the same except that it uses
"--level=0" instead of "--level=linear".

I then use "mkfs" to make the filesystem and mount the resulting array at
"/mnt".

Can anyone give a raid newbie some tips? Is there something obvious I'm
missing? Would it help to provide strace/ltrace/ptrace output for the
hanging copy command?

Any help (including URLs of manuals I should RTFM) would be most welcome.

Thanks!

-- 
Michael Schwarz
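For readers new to the two modes being compared: with --level=0 and --chunk=32, md stripes the array across the members in 32 KiB chunks dealt out round-robin, whereas --level=linear simply concatenates the drives end to end. The Python sketch below shows the raid0 arithmetic for this seven-drive setup. raid0_map is a hypothetical helper (not part of md or mdadm), and it assumes equal-sized members whose data starts at byte 0, ignoring where md keeps its superblock.

```python
CHUNK_BYTES = 32 * 1024  # --chunk=32 (mdadm takes the chunk size in KiB)
N_MEMBERS = 7            # --raid-devices=7


def raid0_map(offset, chunk=CHUNK_BYTES, n_members=N_MEMBERS):
    """Map a byte offset on /dev/md0 to (member index, byte offset on that member).

    Sketch only: assumes equal-sized members (a single md "zone") whose data
    starts at byte 0, i.e. it ignores superblock/data-offset details.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    chunk_no, within_chunk = divmod(offset, chunk)  # which chunk, and where in it
    stripe, member = divmod(chunk_no, n_members)    # chunks go round-robin
    return member, stripe * chunk + within_chunk


print(raid0_map(0))              # (0, 0)
print(raid0_map(32 * 1024))      # (1, 0): the next chunk goes to the next drive
print(raid0_map(7 * 32 * 1024))  # (0, 32768): the eighth chunk wraps back to drive 0
```

On a linear array the same offsets would all land on the first drive until its capacity is used up.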
* Re: Failed reads from RAID-0 array (from newbie who has read the FAQ)
From: Neil Brown @ 2007-03-17 5:31 UTC
To: mschwarz; +Cc: linux-raid, linux-usb-users

On Friday March 16, mschwarz@multitool.net wrote:
> I'm not a Linux newbie (I've even written a couple of books and done some
> very light device driver work), but I'm completely new to the software
> raid subsystem.
>
> I'm doing something rather oddball. I'm making an array of USB flash
> drives and comparing read and write rates.
>
> Well, I've had great success writing. I've got seven flash drives on a
> hub. I've joined them up both linear and raid0 and written large amounts
> of data to them. But come time to read from them, linear works, but raid0
> hangs after transferring just shy of 2G of data. It doesn't matter if it
> reading from one file or from many files whose cumulative size is just shy
> of 2G. It doesn't matter if I'm using "dd" or "cp" to read the file or
> files.
>
> The process doing the transfer is unkillable. Not with a kill -15 or a
> kill -9. It won't die, but it also won't make progress.
>
> "Linear" always works. Raid-0 always hangs.

My guess would be a locking bug in the usb storage driver or some
lower level USB driver.
A significant difference between raid0 and linear is that a largish IO
will touch all drives for raid-0, but only one or two for linear.
That gives much more opportunity for locking bugs to hit.

When it is in the hanging state, do
   echo t > /proc/sysrq-trigger

and look in the kernel logs for the stack trace of all processes.
Hopefully the stack trace for the processes in 'D' state will be
informative.
NeilBrown

>
> Here are my mdadm commands to create the array:
>
> mdadm --create /dev/md0 --level=linear --auto=md --chunk=32
> --raid-devices=7 /dev/sd?
>
> (The wildcard works because the seven flash drives are the only scsi
> devices on the system).
>
> The command for the raid-0 array is the same as above except for the
> "--level=0" it takes to make a raid 0 array.
>
> I then use "mkfs" to make the filesystem and mount the resulting array at
> "/mnt"
>
> Can anyone give a raid newbiw some tips? Is there something obvious I'm
> missing? Would it help to provide strace/ltrace/ptrace of the hanging copy
> command?
>
> Any help (including URLs of manuals I should RTFM) would be most welcome.
>
> Thanks!
>
>
> -- 
> Michael Schwarz
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Linux-usb-users@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-users
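Neil's observation that a largish I/O touches every drive under raid-0 but only one or two under linear can be checked with simple arithmetic. The Python sketch below uses hypothetical helper names and assumes equal-sized members; the 1 GiB member size and 256 KiB request in the example are illustrative guesses, since the thread never gives the flash drives' capacity or the request sizes involved.

```python
def members_touched_raid0(offset, length, chunk, n_members):
    """Member indices hit by the byte range [offset, offset + length) on raid0.

    Assumes equal-sized members (a single md "zone") with chunks dealt
    round-robin: array chunk k lives on member k % n_members.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // chunk
    last = (offset + length - 1) // chunk
    if last - first + 1 >= n_members:
        return list(range(n_members))  # the request wraps across every member
    return sorted({k % n_members for k in range(first, last + 1)})


def members_touched_linear(offset, length, member_size, n_members):
    """Member indices hit by the same range on a linear (concatenated) array."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // member_size
    last = (offset + length - 1) // member_size
    if last >= n_members:
        raise ValueError("request runs past the end of the array")
    return list(range(first, last + 1))


KiB, GiB = 1024, 1024 ** 3
# A 256 KiB read at array offset 0, 32 KiB chunks, seven members:
print(members_touched_raid0(0, 256 * KiB, 32 * KiB, 7))  # [0, 1, 2, 3, 4, 5, 6]
print(members_touched_linear(0, 256 * KiB, 1 * GiB, 7))  # [0]
```

With 32 KiB chunks and seven drives, any request of 7 x 32 KiB = 224 KiB or more must reach every member, while on the linear array a request smaller than one drive spans two members only when it straddles a drive boundary.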
* Re: Failed reads from RAID-0 array (from newbie who has read the FAQ)
From: Michael Schwarz @ 2007-03-17 18:01 UTC
To: Neil Brown; +Cc: linux-raid, linux-usb-users

Neil:

Relevant stack trace follows. Any suggestions? blk_backing_dev_unplug...
Does that mean the raid subsystem thinks one of the usb drives has been
removed? I assure you that physically this is untrue, but that doesn't
mean that some sort of logical disconnect hasn't happened...

Makes me wonder if one of my USB hub connections is intermittent...

I would also welcome tips on any other developer group to follow up
with. I haven't hacked any kernel code since the 2.2.x kernel and things
have changed a bit! I don't mind digging into this, but I suspect I could
get things cleared up fast if I could find the right subject expert!

=======================
cp D E2FBEDB0 1784 4271 4270 (NOTLB)
       e2fbedb4 00200086 c15dc550 e2fbedb0 00000001 00200082 00001000 00000000
       00000000 c15dc550 0000000a e94182b0 f3161430 26320f40 000001c5 00000000
       e94183bc c1c8c480 00000000 ecd7d300 c04e0bf2 c042e0e4 f7d767f8 003b6622
Call Trace:
 [<c04e0bf2>] blk_backing_dev_unplug+0x73/0x7b
 [<c042e0e4>] getnstimeofday+0x30/0xb6
 [<c061ec7e>] io_schedule+0x3a/0x5c
 [<c045626b>] sync_page+0x0/0x3b
 [<c04562a3>] sync_page+0x38/0x3b
 [<c061ed8a>] __wait_on_bit_lock+0x2a/0x52
 [<c045625d>] __lock_page+0x58/0x5e
 [<c043788e>] wake_bit_function+0x0/0x3c
 [<c04569e3>] do_generic_mapping_read+0x1e0/0x459
 [<c0458b0d>] generic_file_aio_read+0x173/0x1a6
 [<c0456070>] file_read_actor+0x0/0xe0
 [<c047202f>] do_sync_read+0xc7/0x10a
 [<c0437859>] autoremove_wake_function+0x0/0x35
 [<c0471f68>] do_sync_read+0x0/0x10a
 [<c04728b6>] vfs_read+0xa6/0x152
 [<c0472d0f>] sys_read+0x41/0x67
 [<c0403f64>] syscall_call+0x7/0xb
=======================

-- 
Michael Schwarz
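A full sysrq-t dump lists every task, so it can help to pull out only the tasks in 'D' (uninterruptible sleep) state, which are the ones Neil suggested looking at. The rough Python sketch below does that. The regular expressions assume the task-header layout seen in the traces in this thread (name, state letter, a hex word, free stack, then pid), optionally preceded by a syslog prefix ending in "kernel: ", and the function name blocked_tasks is made up for illustration.

```python
import re

# Task header of a sysrq-t dump, optionally behind a syslog prefix, e.g.
#   "cp D E2FBEDB0 1784 4271 4270 (NOTLB)"
#   "Mar 17 16:13:09 localhost kernel: scsi_eh_11 D 00000000 3688 4476 7 ..."
# Assumed layout: name, state letter, hex word, free stack, pid, ...
HEADER = re.compile(
    r"^(?:.*kernel: )?(?P<name>\S+)\s+(?P<state>[RSDTZX])\s+"
    r"[0-9A-Fa-f]{8}\s+\d+\s+(?P<pid>\d+)\b"
)
# One call-trace frame, e.g. " [<c061ec7e>] io_schedule+0x3a/0x5c"
FRAME = re.compile(r"\[<[0-9A-Fa-f]+>\]\s+(?P<symbol>\S+)")


def blocked_tasks(lines):
    """Return [(name, pid, [frame symbols])] for every task in state 'D'."""
    tasks = []
    current = None
    for line in lines:
        header = HEADER.match(line)
        if header:
            current = {"name": header["name"], "state": header["state"],
                       "pid": int(header["pid"]), "trace": []}
            tasks.append(current)
            continue
        frame = FRAME.search(line)
        if frame and current is not None:
            current["trace"].append(frame["symbol"])  # frame of the last task seen
    return [(t["name"], t["pid"], t["trace"]) for t in tasks if t["state"] == "D"]
```

Run over a dump with one log entry per line (as syslog writes it), it should pick out only scsi_eh_11 (pid 4476) from the kernel-thread set posted later in this thread, waiting in wait_for_completion under usb_storage's command_abort.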
* Re: Failed reads from RAID-0 array (from newbie who has read the FAQ)
From: Alan Stern @ 2007-03-17 20:49 UTC
To: Michael Schwarz; +Cc: Neil Brown, linux-raid, linux-usb-users

On Sat, 17 Mar 2007, Michael Schwarz wrote:

> Neil:
>
> Relevant stack trace follows. Any suggestions? blk_backing_dev_unplug...
> Does that mean the raid subsystem thinks one of the usb drives has been
> removed? I assure you that physically this is untrue, but that doesn't
> mean that some sort logical disconnect hasn't happened...
>
> Makes me wonder if one of my USB hub connections is intermittent...
>
> I would also welcome any tips on any other developers group to follow up
> with. I haven't hacked any kernel code since the 2.2.x kernel and things
> have changed a bit! I don't mind digging into this, but I suspect I could
> get things cleared up fast if I could find the right subject expert!
>
>
>
> =======================
> cp D E2FBEDB0 1784 4271 4270 (NOTLB)
> e2fbedb4 00200086 c15dc550 e2fbedb0 00000001 00200082 00001000
> 00000000
> 00000000 c15dc550 0000000a e94182b0 f3161430 26320f40 000001c5
> 00000000
> e94183bc c1c8c480 00000000 ecd7d300 c04e0bf2 c042e0e4 f7d767f8
> 003b6622
> Call Trace:
> [<c04e0bf2>] blk_backing_dev_unplug+0x73/0x7b
> [<c042e0e4>] getnstimeofday+0x30/0xb6
> [<c061ec7e>] io_schedule+0x3a/0x5c
> [<c045626b>] sync_page+0x0/0x3b
> [<c04562a3>] sync_page+0x38/0x3b
> [<c061ed8a>] __wait_on_bit_lock+0x2a/0x52
> [<c045625d>] __lock_page+0x58/0x5e
> [<c043788e>] wake_bit_function+0x0/0x3c
> [<c04569e3>] do_generic_mapping_read+0x1e0/0x459
> [<c0458b0d>] generic_file_aio_read+0x173/0x1a6
> [<c0456070>] file_read_actor+0x0/0xe0
> [<c047202f>] do_sync_read+0xc7/0x10a
> [<c0437859>] autoremove_wake_function+0x0/0x35
> [<c0471f68>] do_sync_read+0x0/0x10a
> [<c04728b6>] vfs_read+0xa6/0x152
> [<c0472d0f>] sys_read+0x41/0x67
> [<c0403f64>] syscall_call+0x7/0xb
> =======================

This isn't much help. The important processes here are khubd,
usb-storage, and scsi_eh_*. Possibly some raid-related processes too, but
I don't know which they would be.

It also would help a lot to see your dmesg log. Especially if you would
build your kernel with CONFIG_USB_DEBUG turned on.

> Update:
>
> (For those who've been waiting breathlessly). It hangs at a particular
> point in a particular file. In other words, it doesn't depend on the total
> number of bytes transfered. Rather, when it reaches a particular point in
> a particular file (12267520 bytes into a file that is 1073709056 bytes
> long) it hangs.
>
> I begin to suspect that I have a "dead spot" in my USB hub. But what gets
> me if that is true is why does the write work? Do cp and dd not check to
> see if writes succeed?

Depends what you mean. They do check the return codes from the underlying
device drivers, but they don't try to read the data back to make sure it
really was written.
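Alan's point, that cp and dd check return codes but never read the data back, suggests an explicit write-then-verify test. The minimal Python sketch below (write_and_verify is a hypothetical name) writes the data, fsyncs it, asks the kernel to drop the file's cached pages with posix_fadvise(POSIX_FADV_DONTNEED) so the re-read is more likely to come from the device than from the page cache, then compares SHA-256 digests. The fadvise call is only advisory; unmounting and remounting the array is the more certain way to force a cold read.

```python
import hashlib
import os


def write_and_verify(path, data, block_size=1 << 20):
    """Write `data` to `path`, flush it to the device, read it back, compare."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data out of the page cache to the device
        # Advisory: drop this file's clean cached pages so the re-read below is
        # not simply served from memory (POSIX-only call, hence the guard).
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)

    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block_size), b""):
            digest.update(chunk)
    return digest.digest() == hashlib.sha256(data).digest()
```

Holding the whole payload in memory keeps the sketch short; for gigabyte-sized test files you would generate and hash the data in blocks on the write side as well.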
> I know it isn't a particular flash drive because I've used two different
> sets of 7 USB drives and it seems to fail consistently no matter which.

But you haven't tried using different hubs, different USB cables, or
different computers.

> Nonetheless, I'm beginning to think I'm dealing with a hardware issue, not
> a kernel issue, just because it is so consistent.

People have reported problems in which the hardware fails when it
encounters a certain pattern of bytes in the data stream. Maybe you're
seeing the same sort of thing.

Alan Stern
* Re: Failed reads from RAID-0 array (from newbie who has read the FAQ)
From: Michael Schwarz @ 2007-03-17 21:35 UTC
To: Alan Stern; +Cc: Neil Brown, linux-raid, linux-usb-users

Comments/questions below...

-- 
Michael Schwarz

>
> This isn't much help. The important processes here are khubd,
> usb-storage, and scsi_eh_*. Possibly some raid-related processes too, but
> I don't know which they would be.

I have no copy of khubd running.

What is the list policy on attachments? I have the kernel stack traces for
the kernel threads you want, but the text is rather long. Being new to the
group, I do not know if I should just go ahead and paste something that
long in here, or if it should be a MIME attachment. Can you (or someone)
tell me, and I'll provide what is asked for.

Aw, heck... I'll just put it at the end...

> It also would help a lot to see your dmesg log. Especially if you would
> build your kernel with CONFIG_USB_DEBUG turned on.

I saw nothing unusual in the dmesg log, and I'm fairly familiar with
reading it. I'm working with a stock FC6 kernel right now, but I will
eventually roll my own with that flag on if we get to where we need
that... I'm hoping some wise heads here will be able to make something
out of some more logs and stack traces before I go that far...

>
>> I begin to suspect that I have a "dead spot" in my USB hub. But what gets
>> me if that is true is why does the write work? Do cp and dd not check to
>> see if writes succeed?
>
> Depends what you mean. They do check the return codes from the underlying
> device drivers, but they don't try to read the data back to make sure it
> really was written.

Yeah, I meant return codes from the system calls.
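Before rebuilding the stock FC6 kernel with CONFIG_USB_DEBUG, it may be worth checking whether the running kernel already has the option. A kernel .config uses two line forms, "CONFIG_FOO=value" and "# CONFIG_FOO is not set"; the hypothetical Python helpers below parse just those. The /boot/config-<release> location is an assumption (Fedora kernel packages normally install the config there), so adjust the path if your system differs.

```python
import os


def kconfig_value(config_text, option):
    """Return the value of `option` ('y', 'm', a string, ...) or None if unset."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {option} is not set":
            return None
        if line.startswith(option + "="):  # the "=" avoids matching longer names
            return line.split("=", 1)[1]
    return None


def running_kernel_option(option, boot_dir="/boot"):
    """Look `option` up in <boot_dir>/config-<running kernel release>."""
    path = os.path.join(boot_dir, "config-" + os.uname().release)
    with open(path) as f:
        return kconfig_value(f.read(), option)
```

For example, running_kernel_option("CONFIG_USB_DEBUG") returns "y" when the option is built in and None when it is not set.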
>
>> I know it isn't a particular flash drive because I've used two different
>> sets of 7 USB drives and it seems to fail consistently no matter which.
>
> But you haven't tried using different hubs, different USB cables, or
> different computers.

I just got back from buying a different hub and different cables to do
just that. And I do have a different computer to try, but will try that
last.

>
>> Nonetheless, I'm beginning to think I'm dealing with a hardware issue, not
>> a kernel issue, just because it is so consistent.
>
> People have reported problems in which the hardware fails when it
> encounters a certain pattern of bytes in the data stream. Maybe you're
> seeing the same sort of thing.

This theory is (somewhat) blown by the fact that I have tried different
data sources (from vobcopied DVD images to dd'ed images of Linux distros).
I suppose I could try just a large file of zeros...

Anyway, while the list readers ponder this message, I'm going to try the
new hub and cables.

Thanks Alan and Neil!
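One way to separate "a particular spot on the array" from "a particular pattern of bytes" is a test file whose contents are known block by block. The hypothetical Python sketch below stamps every 8-byte word of each block with that block's index, so a read that stalls or returns wrong data can be tied to an exact block; the 4 KiB block size is an assumption chosen to match a common page size. For instance, the stall point reported above, 12267520 bytes into the file, is exactly block 2995 (12267520 / 4096). An all-zeros file of the same size (for example from dd if=/dev/zero) gives the contrasting "no pattern" case.

```python
import struct

BLOCK_SIZE = 4096  # assumed block size (a common 4 KiB page size)


def stamped_block(index, block_size=BLOCK_SIZE):
    """A block whose every 8-byte little-endian word holds its block index."""
    return struct.pack("<Q", index) * (block_size // 8)


def write_stamped_file(path, n_blocks, block_size=BLOCK_SIZE):
    """Write n_blocks stamped blocks to `path`."""
    with open(path, "wb") as f:
        for i in range(n_blocks):
            f.write(stamped_block(i, block_size))


def first_bad_block(path, block_size=BLOCK_SIZE):
    """Index of the first block whose contents don't match its stamp, else None."""
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(block_size)
            if not data:
                return None
            if data != stamped_block(index, block_size)[: len(data)]:
                return index
            index += 1
```

Because no two blocks share contents, a hang that recurs at the same block index with this file and with an all-zeros file points at location rather than data.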
Nasty big stack trace set follows:

Mar 17 16:13:08 localhost kernel: =======================
Mar 17 16:13:08 localhost kernel: scsi_eh_7 S 00000000 3744 4385 7 4386 2257 (L-TLB)
Mar 17 16:13:08 localhost kernel: ebe2af68 00000046 00000000 00000000 c1c8c94c f7da30f0 c0420c7c c1c8c480
Mar 17 16:13:08 localhost kernel: f7da30f0 ebe2af28 00000008 e7676d30 f7da25f0 a0caa580 00000032 00000000
Mar 17 16:13:08 localhost kernel: e7676e3c c1c8c480 00000000 f7f53880 f7da30f0 c1c8c94c f7d98f2c f7f53880
Mar 17 16:13:08 localhost kernel: Call Trace:
Mar 17 16:13:08 localhost kernel: [<c0420c7c>] enqueue_task+0x29/0x39
Mar 17 16:13:08 localhost kernel: [<c061fc0d>] _spin_unlock_irq+0x5/0x7
Mar 17 16:13:08 localhost kernel: [<c061e5c1>] __sched_text_start+0x999/0xa21
Mar 17 16:13:08 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:08 localhost kernel: [<f88e5474>] scsi_error_handler+0x5f/0x9b6 [scsi_mod]
Mar 17 16:13:08 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:08 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:08 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:08 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:08 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:08 localhost kernel: =======================
Mar 17 16:13:08 localhost kernel: usb-storage S F7E08840 3048 4386 7 4410 4385 (L-TLB)
Mar 17 16:13:08 localhost kernel: f3272f5c 00000046 f3272eec f7e08840 e764f000 f3efa000 f88cbd5f c1c8c4d4
Mar 17 16:13:08 localhost kernel: f7da0b30 c0420c7c 0000000a f7da25f0 f7da0b30 6ce38240 0000096b 00000000
Mar 17 16:13:08 localhost kernel: f7da26fc c1c8c480 00000000 ec92ec40 00000000 0000000f 00000000 00000000
Mar 17 16:13:08 localhost kernel: Call Trace:
Mar 17 16:13:08 localhost kernel: [<f88cbd5f>] usb_stor_bulk_transfer_buf+0x61/0x98 [usb_storage]
Mar 17 16:13:08 localhost kernel: [<c0420c7c>] enqueue_task+0x29/0x39
Mar 17 16:13:08 localhost kernel: [<c061fa0d>] __down_interruptible+0xab/0xf0
Mar 17 16:13:08 localhost kernel: [<c042f220>] del_timer+0x41/0x47
Mar 17 16:13:08 localhost kernel: [<c04226ab>] default_wake_function+0x0/0xc
Mar 17 16:13:08 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:08 localhost kernel: [<c061f8ef>] __down_failed_interruptible+0x7/0xc
Mar 17 16:13:08 localhost kernel: [<f88cd067>] usb_stor_control_thread+0x45/0x1a3 [usb_storage]
Mar 17 16:13:08 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:08 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:08 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:08 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:08 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:08 localhost kernel: =======================
Mar 17 16:13:08 localhost kernel: scsi_eh_8 S 00000000 3744 4410 7 4411 4386 (L-TLB)
Mar 17 16:13:08 localhost kernel: f1ad2f68 00000046 00000000 00000000 c1c8c4d4 f7da30f0 c0420c7c c1c8c480
Mar 17 16:13:08 localhost kernel: f7da30f0 f1ad2f28 00000009 f3f150f0 f3f145f0 b0d342c0 00000032 00000000
Mar 17 16:13:08 localhost kernel: f3f151fc c1c8c480 00000000 f7f53880 f7da30f0 c1c8c4d4 f7d98f2c f7f53880
Mar 17 16:13:08 localhost kernel: Call Trace:
Mar 17 16:13:08 localhost kernel: [<c0420c7c>] enqueue_task+0x29/0x39
Mar 17 16:13:08 localhost kernel: [<c061fc0d>] _spin_unlock_irq+0x5/0x7
Mar 17 16:13:09 localhost kernel: [<c061e5c1>] __sched_text_start+0x999/0xa21
Mar 17 16:13:09 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<f88e5474>] scsi_error_handler+0x5f/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:09 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:09 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:09 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:09 localhost kernel: =======================
Mar 17 16:13:09 localhost kernel: usb-storage S F3F03B40 3068 4411 7 4436 4410 (L-TLB)
Mar 17 16:13:09 localhost kernel: e3339f5c 00000046 e3339eec f3f03b40 e9355400 f3efa080 f88cbd5f c1c8c4d4
Mar 17 16:13:09 localhost kernel: f7da0b30 c0420c7c 0000000a f3f145f0 f7da0b30 7578c640 0000096b 00000000
Mar 17 16:13:09 localhost kernel: f3f146fc c1c8c480 00000000 e9ddad40 00000000 0000000f 00000000 00000000
Mar 17 16:13:09 localhost kernel: Call Trace:
Mar 17 16:13:09 localhost kernel: [<f88cbd5f>] usb_stor_bulk_transfer_buf+0x61/0x98 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c0420c7c>] enqueue_task+0x29/0x39
Mar 17 16:13:09 localhost kernel: [<c061fa0d>] __down_interruptible+0xab/0xf0
Mar 17 16:13:09 localhost kernel: [<c042f220>] del_timer+0x41/0x47
Mar 17 16:13:09 localhost kernel: [<c04226ab>] default_wake_function+0x0/0xc
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c061f8ef>] __down_failed_interruptible+0x7/0xc
Mar 17 16:13:09 localhost kernel: [<f88cd067>] usb_stor_control_thread+0x45/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:09 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:09 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:09 localhost kernel: =======================
Mar 17 16:13:09 localhost kernel: scsi_eh_9 S 00000000 3744 4436 7 4437 4411 (L-TLB)
Mar 17 16:13:09 localhost kernel: e31b4f68 00000046 00000000 00000000 c1c8c4d4 f7da30f0 c0420c7c c1c8c480
Mar 17 16:13:09 localhost kernel: f7da30f0 e31b4f28 00000009 f3e2c2b0 e2c70db0 c0cc9dc0 00000032 00000000
Mar 17 16:13:09 localhost kernel: f3e2c3bc c1c8c480 00000000 f3ce14c0 f7da30f0 c1c8c4d4 f7d98f2c f3ce14c0
Mar 17 16:13:09 localhost kernel: Call Trace:
Mar 17 16:13:09 localhost kernel: [<c0420c7c>] enqueue_task+0x29/0x39
Mar 17 16:13:09 localhost kernel: [<c061fc0d>] _spin_unlock_irq+0x5/0x7
Mar 17 16:13:09 localhost kernel: [<c061e5c1>] __sched_text_start+0x999/0xa21
Mar 17 16:13:09 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<f88e5474>] scsi_error_handler+0x5f/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:09 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:09 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:09 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:09 localhost kernel: =======================
Mar 17 16:13:09 localhost kernel: usb-storage S F3D0B440 3048 4437 7 4451 4436 (L-TLB)
Mar 17 16:13:09 localhost kernel: e2cc8f5c 00000046 e2cc8eec f3d0b440 f3cc0400 f3efa100 f88cbd5f c1c8c4d4
Mar 17 16:13:09 localhost kernel: f7da0b30 c0420c7c 0000000a e2c70db0 f7da0b30 5cf96980 0000096b 00000000
Mar 17 16:13:09 localhost kernel: e2c70ebc c1c8c480 00000000 e9316180 00000000 0000000f 00000000 00000000
Mar 17 16:13:09 localhost kernel: Call Trace:
Mar 17 16:13:09 localhost kernel: [<f88cbd5f>] usb_stor_bulk_transfer_buf+0x61/0x98 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c0420c7c>] enqueue_task+0x29/0x39
Mar 17 16:13:09 localhost kernel: [<c061fa0d>] __down_interruptible+0xab/0xf0
Mar 17 16:13:09 localhost kernel: [<c042f220>] del_timer+0x41/0x47
Mar 17 16:13:09 localhost kernel: [<c04226ab>] default_wake_function+0x0/0xc
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c061f8ef>] __down_failed_interruptible+0x7/0xc
Mar 17 16:13:09 localhost kernel: [<f88cd067>] usb_stor_control_thread+0x45/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:09 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:09 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:09 localhost kernel: =======================
Mar 17 16:13:09 localhost kernel: scsi_eh_10 S 00000000 3744 4451 7 4452 4437 (L-TLB)
Mar 17 16:13:09 localhost kernel: e3368f68 00000046 00000000 00000000 c1c8c94c f7da30f0 c0420c7c c1c8c480
Mar 17 16:13:09 localhost kernel: f7da30f0 e3368f28 00000009 ea739870 ea738270 d0d53b00 00000032 00000000
Mar 17 16:13:09 localhost kernel: ea73997c c1c8c480 00000000 f7f53880 f7d086f0 c1c8c94c f7d10c80 f7f53880
Mar 17 16:13:09 localhost kernel: Call Trace:
Mar 17 16:13:09 localhost kernel: [<c0420c7c>] enqueue_task+0x29/0x39
Mar 17 16:13:09 localhost kernel: [<c061fc0d>] _spin_unlock_irq+0x5/0x7
Mar 17 16:13:09 localhost kernel: [<c061e5c1>] __sched_text_start+0x999/0xa21
Mar 17 16:13:09 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<f88e5474>] scsi_error_handler+0x5f/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:09 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:09 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:09 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:09 localhost kernel: =======================
Mar 17 16:13:09 localhost kernel: usb-storage S F3F03CC0 3032 4452 7 4476 4451 (L-TLB)
Mar 17 16:13:09 localhost kernel: e3352f5c 00000046 e3352eec f3f03cc0 f3cbc000 f3efa180 f88cbd5f c1c8c4d4
Mar 17 16:13:09 localhost kernel: f7da0b30 c0420c7c 0000000a ea738270 f7da0b30 aab2df80 0000096b 00000000
Mar 17 16:13:09 localhost kernel: ea73837c c1c8c480 00000000 f3fb8540 00000000 0000000f 00000000 00000000
Mar 17 16:13:09 localhost kernel: Call Trace:
Mar 17 16:13:09 localhost kernel: [<f88cbd5f>] usb_stor_bulk_transfer_buf+0x61/0x98 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c0420c7c>] enqueue_task+0x29/0x39
Mar 17 16:13:09 localhost kernel: [<c061fa0d>] __down_interruptible+0xab/0xf0
Mar 17 16:13:09 localhost kernel: [<c042f220>] del_timer+0x41/0x47
Mar 17 16:13:09 localhost kernel: [<c04226ab>] default_wake_function+0x0/0xc
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c061f8ef>] __down_failed_interruptible+0x7/0xc
Mar 17 16:13:09 localhost kernel: [<f88cd067>] usb_stor_control_thread+0x45/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:09 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:09 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:09 localhost kernel: =======================
Mar 17 16:13:09 localhost kernel: scsi_eh_11 D 00000000 3688 4476 7 4477 4452 (L-TLB)
Mar 17 16:13:09 localhost kernel: f32eff30 00000046 00000000 00000000 00000000 00000000 f3f16260 c1c89780
Mar 17 16:13:09 localhost kernel: 00000000 00000000 0000000a e32a2e30 f3f160b0 31528500 00000254 00000000
Mar 17 16:13:09 localhost kernel: e32a2f3c c1c8c480 00000000 e9316b80 00000096 f7cb1400 f7e08640 f7e08640
Mar 17 16:13:09 localhost kernel: Call Trace:
Mar 17 16:13:09 localhost kernel: [<c0586b4f>] unlink1+0x74/0x86
Mar 17 16:13:09 localhost kernel: [<c061e701>] wait_for_completion+0x73/0x98
Mar 17 16:13:09 localhost kernel: [<c04226ab>] default_wake_function+0x0/0xc
Mar 17 16:13:09 localhost kernel: [<f88cb53a>] command_abort+0x64/0x6d [usb_storage]
Mar 17 16:13:09 localhost kernel: [<f88e57a3>] scsi_error_handler+0x38e/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:09 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:09 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:09 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:09 localhost kernel: =======================
Mar 17 16:13:09 localhost kernel: usb-storage S 00000010 3048 4477 7 4500 4476 (L-TLB)
Mar 17 16:13:09 localhost kernel: e31dee78 00000046 f88459c0 00000010 f7e0865c f3da8364 c0587c0e 00000010
Mar 17 16:13:09 localhost kernel: 00000000 f7da25f0 0000000a edb68630 f7da25f0 390b2d00 00000246 00000000
Mar 17 16:13:09 localhost kernel: edb6873c c1c8c480 00000000 ec92ec40 00219434 c04062cf 00000073 ffffffff
Mar 17 16:13:09 localhost kernel: Call Trace:
Mar 17 16:13:09 localhost kernel: [<c0587c0e>] usb_hcd_submit_urb+0x6cd/0x773
Mar 17 16:13:09 localhost kernel: [<c04062cf>] do_IRQ+0xc6/0xdb
Mar 17 16:13:09 localhost kernel: [<c061ecc2>] schedule_timeout+0x13/0x8d
Mar 17 16:13:09 localhost kernel: [<c061e925>] wait_for_completion_interruptible_timeout+0x99/0xd5
Mar 17 16:13:09 localhost kernel: [<c04226ab>] default_wake_function+0x0/0xc
Mar 17 16:13:09 localhost kernel: [<f88cb90c>] usb_stor_msg_common+0xc9/0xe8 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<f88cbd5f>] usb_stor_bulk_transfer_buf+0x61/0x98 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<f88cc2a9>] usb_stor_Bulk_transport+0xcb/0x221 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<f88cc414>] usb_stor_invoke_transport+0x15/0x259 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c061fa40>] __down_interruptible+0xde/0xf0
Mar 17 16:13:09 localhost kernel: [<c04226ab>] default_wake_function+0x0/0xc
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<f88cd14a>] usb_stor_control_thread+0x128/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:09 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:09 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:09 localhost kernel: =======================
Mar 17 16:13:09 localhost kernel: scsi_eh_12 S 00000000 3744 4500 7 4501 4477 (L-TLB)
Mar 17 16:13:09 localhost kernel: e2d43f68 00000046 00000000 00000000 c1c8c4d4 f7da30f0 c0420c7c c1c8c480
Mar 17 16:13:09 localhost kernel: f7da30f0 e2d43f28 00000009 e2d4d370 e2c71330 f0c7f100 00000032 00000000
Mar 17 16:13:09 localhost kernel: e2d4d47c c1c8c480 00000000 f7f53880 f7da30f0 c1c8c4d4 f7d98f2c f7f53880
Mar 17 16:13:09 localhost kernel: Call Trace:
Mar 17 16:13:09 localhost kernel: [<c0420c7c>] enqueue_task+0x29/0x39
Mar 17 16:13:09 localhost kernel: [<c061fc0d>] _spin_unlock_irq+0x5/0x7
Mar 17 16:13:09 localhost kernel: [<c061e5c1>] __sched_text_start+0x999/0xa21
Mar 17 16:13:09 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<f88e5474>] scsi_error_handler+0x5f/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:09 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:09 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:09 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:09 localhost kernel: =======================
Mar 17 16:13:09 localhost kernel: usb-storage S F3D0B740 3048 4501 7 4518 4500 (L-TLB)
Mar 17 16:13:09 localhost kernel: f1315f5c 00000046 f1315eec f3d0b740 e764f400 f3efa280 f88cbd5f c1c8c4d4
Mar 17 16:13:09 localhost kernel: f7da0b30 c0420c7c 0000000a e2c71330 f7da0b30 a4522ec0 0000096b 00000000
Mar 17 16:13:09 localhost kernel: e2c7143c c1c8c480 00000000 e9ddab40 00000000 0000000f 00000000 00000000
Mar 17 16:13:09 localhost kernel: Call Trace:
Mar 17 16:13:09 localhost kernel: [<f88cbd5f>] usb_stor_bulk_transfer_buf+0x61/0x98 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c0420c7c>] enqueue_task+0x29/0x39
Mar 17 16:13:09 localhost kernel: [<c061fa0d>] __down_interruptible+0xab/0xf0
Mar 17 16:13:09 localhost kernel: [<c042f220>] del_timer+0x41/0x47
Mar 17 16:13:09 localhost kernel: [<c04226ab>] default_wake_function+0x0/0xc
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c061f8ef>] __down_failed_interruptible+0x7/0xc
Mar 17 16:13:09 localhost kernel: [<f88cd067>] usb_stor_control_thread+0x45/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:09 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:09 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:09 localhost kernel: =======================
Mar 17 16:13:09 localhost kernel: scsi_eh_13 S 00000000 3744 4518 7 4519 4501 (L-TLB)
Mar 17 16:13:09 localhost kernel: e2d2af68 00000046 00000000 00000000 c1c8c94c f7da30f0 c0420c7c c1c8c480
Mar 17 16:13:09 localhost kernel: f7da30f0 e2d2af28 00000009 e2d4d8f0 e2c718b0 00c14c00 00000033 00000000
Mar 17 16:13:09 localhost kernel: e2d4d9fc c1c8c480 00000000 f7f53880 f7d086f0 c1c8c94c f7d10c80 f7f53880
Mar 17 16:13:09 localhost kernel: Call Trace:
Mar 17 16:13:09 localhost kernel: [<c0420c7c>] enqueue_task+0x29/0x39
Mar 17 16:13:09 localhost kernel: [<c061fc0d>] _spin_unlock_irq+0x5/0x7
Mar 17 16:13:09 localhost kernel: [<c061e5c1>] __sched_text_start+0x999/0xa21
Mar 17 16:13:09 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<f88e5474>] scsi_error_handler+0x5f/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:09 localhost kernel: [<f88e5415>] scsi_error_handler+0x0/0x9b6 [scsi_mod]
Mar 17 16:13:09 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:09 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:09 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
Mar 17 16:13:09 localhost kernel: =======================
Mar 17 16:13:09 localhost kernel: usb-storage S F7E088C0 3068 4519 7 4976 4518 (L-TLB)
Mar 17 16:13:09 localhost kernel: e2c3ff5c 00000046 e2c3feec f7e088c0 f3cc1400 f3efa300 f88cbd5f c1c8c4d4
Mar 17 16:13:09 localhost kernel: f7da0b30 c0420c7c 0000000a e2c718b0 f7da0b30 9f5071c0 0000096b 00000000
Mar 17 16:13:09 localhost kernel: e2c719bc c1c8c480 00000000 e2c445c0 00000000 0000000f 00000000 00000000
Mar 17 16:13:09 localhost kernel: Call Trace:
Mar 17 16:13:09 localhost kernel: [<f88cbd5f>] usb_stor_bulk_transfer_buf+0x61/0x98 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c0420c7c>] enqueue_task+0x29/0x39
Mar 17 16:13:09 localhost kernel: [<c061fa0d>] __down_interruptible+0xab/0xf0
Mar 17 16:13:09 localhost kernel: [<c042f220>] del_timer+0x41/0x47
Mar 17 16:13:09 localhost kernel: [<c04226ab>] default_wake_function+0x0/0xc
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c061f8ef>] __down_failed_interruptible+0x7/0xc
Mar 17 16:13:09 localhost kernel: [<f88cd067>] usb_stor_control_thread+0x45/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c0420a03>] complete+0x39/0x48
Mar 17 16:13:09 localhost kernel: [<f88cd022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
Mar 17 16:13:09 localhost kernel: [<c043779f>] kthread+0xb0/0xd9
Mar 17 16:13:09 localhost kernel: [<c04376ef>] kthread+0x0/0xd9
Mar 17 16:13:09 localhost kernel: [<c0404b33>] kernel_thread_helper+0x7/0x10
* Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-17 21:35 ` Michael Schwarz @ 2007-03-18 2:06 ` Alan Stern 2007-03-18 2:12 ` Alan Stern 1 sibling, 0 replies; 23+ messages in thread From: Alan Stern @ 2007-03-18 2:06 UTC (permalink / raw) To: Michael Schwarz; +Cc: Neil Brown, linux-raid, linux-usb-users On Sat, 17 Mar 2007, Michael Schwarz wrote: > Comments/questions below... > > -- > Michael Schwarz > > > This isn't much help. The important processes here are khubd, > > usb-storage, and scsi_eh_*. Possibly some raid-related processes too, but > > I don't know which they would be. > > I have no copy khubd running. That in itself is a very bad sign. You need to look at the dmesg log. Alan Stern ^ permalink raw reply [flat|nested] 23+ messages in thread
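Alan's point that the interesting tasks are khubd, usb-storage and scsi_eh_* can be applied mechanically to a long SysRq-T dump. The sketch below is a hypothetical helper (the name `sysrq_tasks` is made up here). It assumes unprefixed dmesg output and the task-header layout seen in the traces in this thread (name, one-letter state, a hex word, free stack, then the PID); other kernel versions may lay the header out differently.

```shell
# Hypothetical helper: from a SysRq-T dump on stdin, print "name state pid"
# for the task headers of khubd, usb-storage and scsi_eh_N threads.
# Assumed header layout (as in this thread's traces):
#   usb-storage S F3D0B740 3048 4501 7 4518 4500 (L-TLB)
#   $1=name     $2=state $3=hex $4=free stack $5=pid
sysrq_tasks() {
    awk '$1 ~ /^(khubd|usb-storage|scsi_eh_[0-9]+)$/ && $2 ~ /^[RSDTZX]$/ {
        print $1, $2, $5
    }'
}

# Example: dmesg | sysrq_tasks
```

A thread that never appears in the dump (as khubd apparently did not here) simply produces no output line, which is itself worth noting when reporting.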
* Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-17 21:35 ` Michael Schwarz 2007-03-18 2:06 ` [Linux-usb-users] " Alan Stern @ 2007-03-18 2:12 ` Alan Stern 2007-03-18 4:42 ` Michael Schwarz 1 sibling, 1 reply; 23+ messages in thread From: Alan Stern @ 2007-03-18 2:12 UTC (permalink / raw) To: Michael Schwarz; +Cc: Neil Brown, linux-raid, linux-usb-users On Sat, 17 Mar 2007, Michael Schwarz wrote: > Nasty big stack trace set follows: This format is kind of awkward. For one thing, a lot of lines were wrapped by your email program. For another, you copied the stack trace from the syslog log file. That is not a good way to do it; syslogd is liable to miss bits and pieces of the kernel log when a lot of information comes along all at once. You're much better off getting the stack trace data directly from dmesg. (And when you do, you don't end up with 30 columns of wasted data added to the beginning of each line.) Alan Stern ^ permalink raw reply [flat|nested] 23+ messages in thread
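If a trace has already been captured through syslog, the timestamp and host prefix Alan mentions can at least be stripped afterwards. This is a minimal sed sketch that assumes the classic syslogd prefix format `Mon DD HH:MM:SS host kernel: `. It does nothing about lines syslogd may have dropped, so capturing straight from dmesg remains the better option.

```shell
# Strip a classic syslogd prefix such as "Mar 17 16:13:09 localhost kernel: "
# from each line on stdin; lines without that prefix pass through unchanged.
# Day numbers may be space-padded ("Mar  7"), hence "[ 0-9][0-9]".
strip_syslog_prefix() {
    sed -e 's/^[A-Z][a-z][a-z] [ 0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [^ ]* kernel: //'
}

# Example: strip_syslog_prefix < /var/log/messages > trace.txt
```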
* Re: Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-18 2:12 ` Alan Stern @ 2007-03-18 4:42 ` Michael Schwarz 2007-03-18 16:56 ` [Linux-usb-users] " Michael Schwarz 0 siblings, 1 reply; 23+ messages in thread From: Michael Schwarz @ 2007-03-18 4:42 UTC (permalink / raw) To: Alan Stern; +Cc: Neil Brown, linux-raid, linux-usb-users Yeah, I understand that. Sorry, I use squirrelmail. Pretty limited... I'll get you a "raw" dmesg output when I replicate the problem. Let me clarify about khubd: There is such an entry in my process table, but there was no kernel thread stack trace for it when I dumped the traces. I don't know if that is a bad sign... Right now I thought it would be best to verify my hardware, so I'm working with the new hubs and cables, writing a large file to each of seven attached (non-md) flash drives and diff-ing the usb drive contents against the original file. If I have dead cables, connectors, or flash drives, finding them now would save you all a lot of hassle. When I'm done with that, I'll again replicate my problem, grab the logs straight from dmesg, and post another entry here. I'll even fire up kmail or mutt to avoid bad formatting. Thanks again. -- Michael Schwarz > On Sat, 17 Mar 2007, Michael Schwarz wrote: > >> Nasty big stack trace set follows: > > This format is kind of awkward. For one thing, a lot of lines were > wrapped by your email program. > > For another, you copied the stack trace from the syslog log file. That is > not a good way to do it; syslogd is liable to miss bits and pieces of > the kernel log when a lot of information comes along all at once. You're > much better off getting the stack trace data directly from dmesg. (And > when you do, you don't end up with 30 columns of wasted data added to the > beginning of each line.) > > Alan Stern > > ^ permalink raw reply [flat|nested] 23+ messages in thread
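The per-drive check described above (write a large file to each flash drive, then diff it against the original) could be scripted roughly as follows. `verify_copies` is a hypothetical helper and the mount points in the example are assumptions. One caveat: comparing immediately after the copy may be satisfied from the page cache rather than from the flash itself, so unmounting and remounting each drive before the compare is a stronger test.

```shell
# Hypothetical helper: copy one reference file onto each directory given
# (e.g. each flash drive's own mount point) and compare the copy with cmp.
# Prints "OK <dir>" or "BAD <dir>"; returns 1 if any copy failed or differed.
verify_copies() {
    src=$1
    shift
    rc=0
    for dir in "$@"; do
        dest="$dir/$(basename "$src")"
        if cp "$src" "$dest" && sync && cmp -s "$src" "$dest"; then
            echo "OK $dir"
        else
            echo "BAD $dir"
            rc=1
        fi
    done
    return "$rc"
}

# Example (mount points are assumptions):
# verify_copies /tmp/big.iso /media/flash1 /media/flash2 /media/flash3
```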
* Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-18 4:42 ` Michael Schwarz @ 2007-03-18 16:56 ` Michael Schwarz 2007-03-18 17:44 ` Michael Schwarz ` (2 more replies) 0 siblings, 3 replies; 23+ messages in thread From: Michael Schwarz @ 2007-03-18 16:56 UTC (permalink / raw) To: mschwarz; +Cc: Alan Stern, Neil Brown, linux-raid, linux-usb-users [-- Attachment #1: Type: text/plain, Size: 1591 bytes --] Okay. I've verified my hardware (by doing large writes/reads to non-raid file systems on each of the seven USB flash drives on the hub). So this morning I booted cold and began gathering log data. I'm sending it to you guys (you should know this) before looking at it myself. Here's the sequence:

Cold boot
Gnome login
echo t > /proc/sysrq-trigger
dmesg > dmesg-0-beforehub.log
Attached hub with 7 drives
dmesg > dmesg-1-afterhub.log
mdadm --create /dev/md0 --auto=md --level=0 --raid-devices=7 /dev/sd?
dmesg > dmesg-2-aftermdcreate.log
mke2fs -b 4096 -R stride=16 /dev/md0
dmesg > dmesg-3-aftermkfs.log
mount /dev/md0 /mnt
cp -rv ~mschwarz/FUTURAMA_S2D2/* /mnt
dmesg > dmesg-4-afterbigwrite.log
cp -rv /mnt/* fs2d2/

At this point, the process hangs. So I ran:

echo t > /proc/sysrq-trigger
dmesg > dmesg-5-hungread.log

...in a different root window. All these operations were performed as root (in order to be as dangerous as possible -- actually, in order to rule out possible permissions issues, although I don't think there are any). All of these dmesg logs are attached in gzip format. I don't know what majordomo will do with those, but the cc's going directly to Alan and Neil should come through. I'm going to start combing these files myself, so if you guys want to save time, you can certainly give me a couple hours to get started! ;-) As always, I'm very grateful for your assistance and that of the group! -- Michael Schwarz > Yeah, I understand that. > > Sorry, I use squirrelmail. Pretty limited... 
> > I'll get you a "raw" dmseg output when I replicate the problem. > [-- Attachment #2: dmesg-0-beforehub.log.gz --] [-- Type: application/x-gzip, Size: 8407 bytes --] [-- Attachment #3: dmesg-1-afterhub.log.gz --] [-- Type: application/x-gzip, Size: 9085 bytes --] [-- Attachment #4: dmesg-2-aftermdcreate.log.gz --] [-- Type: application/x-gzip, Size: 9340 bytes --] [-- Attachment #5: dmesg-3-aftermkfs.log.gz --] [-- Type: application/x-gzip, Size: 9336 bytes --] [-- Attachment #6: dmesg-4-afterbigwrite.log.gz --] [-- Type: application/x-gzip, Size: 9350 bytes --] [-- Attachment #7: dmesg-5-hungread.log.gz --] [-- Type: application/x-gzip, Size: 16862 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
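The `-R stride=16` in the mke2fs step of the sequence above is meant to match the md chunk size: the ext2 stride is the RAID chunk size expressed in filesystem blocks. The arithmetic sketch below assumes the array used mdadm's default chunk size of that era (64 KiB), since this create command passes no `--chunk`. With the `--chunk=32` from the original report, the matching stride for 4096-byte blocks would be 8 instead.

```shell
# ext2/ext3 RAID "stride": number of filesystem blocks per RAID chunk.
#   $1 = md chunk size in KiB, $2 = filesystem block size in bytes
stride_blocks() {
    echo $(( $1 * 1024 / $2 ))
}

stride_blocks 64 4096   # prints 16 (assumed 64 KiB default chunk)
stride_blocks 32 4096   # prints 8  (the --chunk=32 case)
```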
* Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-18 16:56 ` [Linux-usb-users] " Michael Schwarz @ 2007-03-18 17:44 ` Michael Schwarz 2007-03-18 21:55 ` Michael Schwarz 2007-03-18 21:57 ` Neil Brown 2 siblings, 0 replies; 23+ messages in thread From: Michael Schwarz @ 2007-03-18 17:44 UTC (permalink / raw) To: mschwarz; +Cc: Alan Stern, Neil Brown, linux-raid, linux-usb-users As I suspected, majordomo doesn't like attachments. I looked through the logs. The only odd thing I see before the read that hangs is this message: smartd[3069]: Device: /dev/hda, 1 Currently unreadable (pending) sectors which I only see in /var/log/messages because the stack dump blows whatever buffer size is reserved for dmesg (the whole stack trace doesn't make it in). I'm going to try a different computer running a different OS next. Alan, Neil, I wasn't able to make anything of those logs. I've also grabbed /var/log/messages to get the gap between dmesg-4-* and dmesg-5-*. I'll send that to you two in a separate message. If anyone else would like my logs, let me know. -- Michael Schwarz > Okay. I've verified my hardware (by doing large write/reads to non-raid > file systems on each of the seven USB flash drives on the hub). > > So this morning I booted cold and began gathering log data. I'm sending it > to you guys (you hsould know this) before looking at it myself. Here's > the sequence: > > Cold boot > Gnome login > echo t > /proc/sysrq-trigger > dmesg > dmesg-0-beforehub.log > Attached hub with 7 drives > dmesg > dmesg-1-afterhub.log > mdadm --create /dev/md0 --auto=md --level=0 --raid-devices=7 /dev/sd? > dmesg > dmesg-2-aftermdcreate.log > mke2fs -b 4096 -R stride=16 /dev/md0 > dmesg > dmesg-3-aftermkfs.log > mount /dev/md0 /mnt > cp -rv ~mschwarz/FUTURAMA_S2D2/* /mnt > dmesg > dmesg-4-afterbigwrite.log > cp -rv /mnt/* fs2d2/ > > At this point, the process hangs. 
So I ran: > > echo t > /proc/sysrq-trigger > dmesg > dmesg-5-hungread.log > > ...in a different root window. All these operations were performed as root > (in order to be as dangerous as possible -- actually, in order to reduce > possible permissions issues; although I don't think there are any) > > All of these dmesg logs are attached in gzip format. I don't know what > majordomo will do with those, but the cc's going directly to Alan and Neil > should come through. I'm going to start combing these files myself, so if > you guys want to save time, you can certainly give me a couple hours to > get started! ;-) > > As always, I'm very grateful for your assistance and that of the group! > > -- > Michael Schwarz > >> Yeah, I understand that. >> >> Sorry, I use squirrelmail. Pretty limited... >> >> I'll get you a "raw" dmseg output when I replicate the problem. >> > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-18 16:56 ` [Linux-usb-users] " Michael Schwarz 2007-03-18 17:44 ` Michael Schwarz @ 2007-03-18 21:55 ` Michael Schwarz 2007-03-18 21:57 ` Neil Brown 2 siblings, 0 replies; 23+ messages in thread From: Michael Schwarz @ 2007-03-18 21:55 UTC (permalink / raw) To: mschwarz; +Cc: Alan Stern, Neil Brown, linux-raid, linux-usb-users Just tried it on a stock Ubuntu Edgy install. Same thing. Locks on read. I've got a dmesg (w/stack trace) file from the Ubuntu attempt (it was clean prior to doing the read) which I will send to Alan and Neil (and anyone else who asks for it). There were no error messages in dmesg prior to running the stack trace. -- Michael Schwarz > Okay. I've verified my hardware (by doing large write/reads to non-raid > file systems on each of the seven USB flash drives on the hub). > > So this morning I booted cold and began gathering log data. I'm sending it > to you guys (you hsould know this) before looking at it myself. Here's > the sequence: > > Cold boot > Gnome login > echo t > /proc/sysrq-trigger > dmesg > dmesg-0-beforehub.log > Attached hub with 7 drives > dmesg > dmesg-1-afterhub.log > mdadm --create /dev/md0 --auto=md --level=0 --raid-devices=7 /dev/sd? > dmesg > dmesg-2-aftermdcreate.log > mke2fs -b 4096 -R stride=16 /dev/md0 > dmesg > dmesg-3-aftermkfs.log > mount /dev/md0 /mnt > cp -rv ~mschwarz/FUTURAMA_S2D2/* /mnt > dmesg > dmesg-4-afterbigwrite.log > cp -rv /mnt/* fs2d2/ > > At this point, the process hangs. So I ran: > > echo t > /proc/sysrq-trigger > dmesg > dmesg-5-hungread.log > > ...in a different root window. All these operations were performed as root > (in order to be as dangerous as possible -- actually, in order to reduce > possible permissions issues; although I don't think there are any) > > All of these dmesg logs are attached in gzip format. 
I don't know what > majordomo will do with those, but the cc's going directly to Alan and Neil > should come through. I'm going to start combing these files myself, so if > you guys want to save time, you can certainly give me a couple hours to > get started! ;-) > > As always, I'm very grateful for your assistance and that of the group! > > -- > Michael Schwarz > >> Yeah, I understand that. >> >> Sorry, I use squirrelmail. Pretty limited... >> >> I'll get you a "raw" dmseg output when I replicate the problem. >> > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-18 16:56 ` [Linux-usb-users] " Michael Schwarz 2007-03-18 17:44 ` Michael Schwarz 2007-03-18 21:55 ` Michael Schwarz @ 2007-03-18 21:57 ` Neil Brown 2007-03-19 3:27 ` Michael Schwarz 2 siblings, 1 reply; 23+ messages in thread From: Neil Brown @ 2007-03-18 21:57 UTC (permalink / raw) To: mschwarz; +Cc: Alan Stern, linux-raid, linux-usb-users On Sunday March 18, mschwarz@multitool.net wrote: > cp -rv /mnt/* fs2d2/ > > At this point, the process hangs. So I ran: > > echo t > /proc/sysrq-trigger > dmesg > dmesg-5-hungread.log Unfortunately (as you say), the whole trace doesn't fit. Could you try compiling the kernel with a larger value for CONFIG_LOG_BUF_SHIFT? It looks like you have 17. 21 is the max. 19 should probably be sufficient. Two things look a bit odd. 1/ hald-addon-st (process 3974) seems to be hung doing a 'test_unit_ready' after a media-changed signal. Any idea why? Could you try killing off hald while running the test? 2/ one usb-storage thread (3667) appears to be waiting for IO to complete (though that is just a guess really). Maybe usb-storage is waiting for the hald test-unit-ready? But I'm a bit out of my depth here, so I'll leave it to the USB experts. 
NeilBrown

=======================
hald-addon-st D EF9FBD00 2812 3974 2935 3977 3966 (NOTLB)
ef9fbd14 00000086 00000002 ef9fbd00 ef9fbcfc 00000000 00000000 ed4fcbe4
c04dc5cc 00000086 0000000a ed407770 c06fb480 18f88700 00000206 00000000
ed40787c c1c8c480 00000000 ebe7adc0 001d605d db30e9c8 00000096 ffffffff
Call Trace:
[<c04dc5cc>] elv_next_request+0xfe/0x1ac
[<c061e701>] wait_for_completion+0x73/0x98
[<c04226ab>] default_wake_function+0x0/0xc
[<c04df415>] blk_execute_rq+0xcf/0xe5
[<c04de74f>] blk_end_sync_rq+0x0/0x23
[<c04dbdf0>] elv_set_request+0x14/0x22
[<c04decda>] get_request+0x205/0x2b2
[<c04df4e7>] get_request_wait+0x26/0x16c
[<f8de1116>] scsi_execute+0xc6/0xd9 [scsi_mod]
[<f8de11e0>] scsi_execute_req+0xb7/0xd5 [scsi_mod]
[<f8de1241>] scsi_test_unit_ready+0x43/0x80 [scsi_mod]
[<f8d726a5>] sd_media_changed+0x60/0xb5 [sd_mod]
[<c04e8c82>] kobject_get+0xf/0x13
[<c0491481>] check_disk_change+0x16/0x5c
[<c055890a>] class_device_get+0xe/0x14
[<f8d72b70>] sd_open+0x92/0x120 [sd_mod]
[<c04e14cc>] exact_match+0x0/0x4
[<c0491b65>] do_open+0x19f/0x255
[<c0491d8e>] blkdev_open+0x0/0x4d
[<c0491db3>] blkdev_open+0x25/0x4d
[<c0470cac>] __dentry_open+0xc3/0x17a
[<c0470ddd>] nameidata_to_filp+0x24/0x33
[<c0470e1e>] do_filp_open+0x32/0x39
[<c061f0e0>] do_nanosleep+0x42/0x66
[<c0470bdf>] get_unused_fd+0xb3/0xbd
[<c0470e67>] do_sys_open+0x42/0xbe
[<c0470f1c>] sys_open+0x1c/0x1e
[<c0403f64>] syscall_call+0x7/0xb
=======================
usb-storage S 00000010 3048 3667 7 3669 3666 (L-TLB)
ebcaee78 00000046 f88459c0 00000010 ebc6b7dc f6de08e4 c0587c0e 00000010
00000000 c06fb480 0000000a ed5f2bb0 d80fa9b0 e8b0e880 00000205 00000000
ed5f2cbc c1c8c480 00000000 ebe7a9c0 001d5d31 00000205 00000000 ffffffff
Call Trace:
[<c0587c0e>] usb_hcd_submit_urb+0x6cd/0x773
[<c061ecc2>] schedule_timeout+0x13/0x8d
[<c061e925>] wait_for_completion_interruptible_timeout+0x99/0xd5
[<c04226ab>] default_wake_function+0x0/0xc
[<f8db090c>] usb_stor_msg_common+0xc9/0xe8 [usb_storage]
[<f8db0d5f>] usb_stor_bulk_transfer_buf+0x61/0x98 [usb_storage]
[<f8db12a9>] usb_stor_Bulk_transport+0xcb/0x221 [usb_storage]
[<f8db2022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
[<f8db1414>] usb_stor_invoke_transport+0x15/0x259 [usb_storage]
[<c061fa40>] __down_interruptible+0xde/0xf0
[<c04226ab>] default_wake_function+0x0/0xc
[<f8db2022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
[<f8db214a>] usb_stor_control_thread+0x128/0x1a3 [usb_storage]
[<c0420a03>] complete+0x39/0x48
[<f8db2022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage]
[<c043779f>] kthread+0xb0/0xd9
[<c04376ef>] kthread+0x0/0xd9
[<c0404b33>] kernel_thread_helper+0x7/0x10
=======================
^ permalink raw reply [flat|nested] 23+ messages in thread
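For scale: CONFIG_LOG_BUF_SHIFT is the base-2 logarithm of the kernel log buffer size, so the values Neil mentions work out as below. This is plain shell arithmetic, not a kernel interface; on distributions that install the kernel config under /boot, something like `grep LOG_BUF_SHIFT /boot/config-$(uname -r)` may show the current setting.

```shell
# Kernel printk buffer size is 2^CONFIG_LOG_BUF_SHIFT bytes.
log_buf_bytes() {
    echo $(( 1 << $1 ))
}

for n in 17 19 21; do
    echo "CONFIG_LOG_BUF_SHIFT=$n -> $(log_buf_bytes "$n") bytes"
done
# CONFIG_LOG_BUF_SHIFT=17 -> 131072 bytes   (128 KiB)
# CONFIG_LOG_BUF_SHIFT=19 -> 524288 bytes   (512 KiB)
# CONFIG_LOG_BUF_SHIFT=21 -> 2097152 bytes  (2 MiB)
```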
* Re: Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-18 21:57 ` Neil Brown @ 2007-03-19 3:27 ` Michael Schwarz 2007-03-19 14:29 ` Bill Davidsen 0 siblings, 1 reply; 23+ messages in thread From: Michael Schwarz @ 2007-03-19 3:27 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid, Alan Stern, linux-usb-users More than ever, I am convinced that it is actually a hardware problem, but I am curious about the opinions of both of you on whether the "system" (meaning, I guess, the combination of usb-storage driver and raid) is really doing the best with what it has. My last effort was to switch to a different computer. When I did, I got messages in the dmesg log (unfortunately not preserved, although I should be able to recreate them) saying that one of the flash drives had bad blocks. Some part of the system eventually decided it was a "dead device" (I believe dmesg indicated the scsi subsystem said so). The device (it happened to be /dev/sdc) was peremptorily dropped from the system. This appears to be what hung the raid system. (Why these messages never appeared on the other computer is beyond me; obviously some difference in how the actual USB controller reports errors, but, as I said, I've never studied USB drivers or hardware. In fact, once you get beyond the UARTs, it gets too sophisticated for me.) I've built an array of five known-good devices and so far it works swimmingly (at least on the hardware that was better at error reporting). So it seems to me that there is probably nothing actually wrong with the drivers or their interactions, and that leaves me only asking if there should be some sort of improvement in error reporting/recovery up to userland. If I am right and the scsi system was marking a device as dead, shouldn't the userland read against the md device get an error instead of an indefinite hang? 
Beyond this question which I leave to you (although I'd love to hear your answers/thoughts), I think we can safely say that the problem was hardware (even if hard to find). If either of you would like, I'd be happy to find time this week to recreate the error on my "better" PC and send that along. As for rolling a custom kernel with more message buffer, well, I'm going to be getting into a new device driver in the coming months, so a custom debug kernel is definitely in my future, but I'm not sure when. I must say, the kernel has become a much more complex beastie since 2.2.x! (Although it also appears to be improved and somewhat more organized -- but definitely MUCH larger!) Thank you both so much! I wouldn't even have diagnosed my hardware problem without your prompts. I'm very grateful. Let me know if you'd like those dmesg logs or if you'd just like to let it go! -- Michael Schwarz > On Sunday March 18, mschwarz@multitool.net wrote: >> cp -rv /mnt/* fs2d2/ >> >> At this point, the process hangs. So I ran: >> >> echo t > /proc/sysrq-trigger >> dmesg > dmesg-5-hungread.log > > Unfortunate (as you say) the whole trace doesn't fit. > Could you try compiling the kernel with a larger value for > CONFIG_LOG_BUF_SHIFT ?? It looks like you have 17. 21 is the max. > 19 should probably be sufficient. > > Two things look a bit odd. > 1/ hald-addon-st (process 3974) seems to be hung doing a > 'test_unit_ready' after a media-changed signal. Any idea why? > Could you try killing of hald while running the test? > > 2/ one usb-storage thread (3667) appears to be waiting for > IO to complete (though that is just a guess really). > > Maybe usb-storage is waiting for the hald test-unit-ready? > > But I'm a bit out of my depth here, so I'll leave it to the USB > experts. 
> > NeilBrown > > ======================= > hald-addon-st D EF9FBD00 2812 3974 2935 3977 3966 (NOTLB) > ef9fbd14 00000086 00000002 ef9fbd00 ef9fbcfc 00000000 00000000 > ed4fcbe4 > c04dc5cc 00000086 0000000a ed407770 c06fb480 18f88700 00000206 > 00000000 > ed40787c c1c8c480 00000000 ebe7adc0 001d605d db30e9c8 00000096 > ffffffff > Call Trace: > [<c04dc5cc>] elv_next_request+0xfe/0x1ac > [<c061e701>] wait_for_completion+0x73/0x98 > [<c04226ab>] default_wake_function+0x0/0xc > [<c04df415>] blk_execute_rq+0xcf/0xe5 > [<c04de74f>] blk_end_sync_rq+0x0/0x23 > [<c04dbdf0>] elv_set_request+0x14/0x22 > [<c04decda>] get_request+0x205/0x2b2 > [<c04df4e7>] get_request_wait+0x26/0x16c > [<f8de1116>] scsi_execute+0xc6/0xd9 [scsi_mod] > [<f8de11e0>] scsi_execute_req+0xb7/0xd5 [scsi_mod] > [<f8de1241>] scsi_test_unit_ready+0x43/0x80 [scsi_mod] > [<f8d726a5>] sd_media_changed+0x60/0xb5 [sd_mod] > [<c04e8c82>] kobject_get+0xf/0x13 > [<c0491481>] check_disk_change+0x16/0x5c > [<c055890a>] class_device_get+0xe/0x14 > [<f8d72b70>] sd_open+0x92/0x120 [sd_mod] > [<c04e14cc>] exact_match+0x0/0x4 > [<c0491b65>] do_open+0x19f/0x255 > [<c0491d8e>] blkdev_open+0x0/0x4d > [<c0491db3>] blkdev_open+0x25/0x4d > [<c0470cac>] __dentry_open+0xc3/0x17a > [<c0470ddd>] nameidata_to_filp+0x24/0x33 > [<c0470e1e>] do_filp_open+0x32/0x39 > [<c061f0e0>] do_nanosleep+0x42/0x66 > [<c0470bdf>] get_unused_fd+0xb3/0xbd > [<c0470e67>] do_sys_open+0x42/0xbe > [<c0470f1c>] sys_open+0x1c/0x1e > [<c0403f64>] syscall_call+0x7/0xb > ======================= > usb-storage S 00000010 3048 3667 7 3669 3666 (L-TLB) > ebcaee78 00000046 f88459c0 00000010 ebc6b7dc f6de08e4 c0587c0e > 00000010 > 00000000 c06fb480 0000000a ed5f2bb0 d80fa9b0 e8b0e880 00000205 > 00000000 > ed5f2cbc c1c8c480 00000000 ebe7a9c0 001d5d31 00000205 00000000 > ffffffff > Call Trace: > [<c0587c0e>] usb_hcd_submit_urb+0x6cd/0x773 > [<c061ecc2>] schedule_timeout+0x13/0x8d > [<c061e925>] wait_for_completion_interruptible_timeout+0x99/0xd5 > [<c04226ab>] 
default_wake_function+0x0/0xc > [<f8db090c>] usb_stor_msg_common+0xc9/0xe8 [usb_storage] > [<f8db0d5f>] usb_stor_bulk_transfer_buf+0x61/0x98 [usb_storage] > [<f8db12a9>] usb_stor_Bulk_transport+0xcb/0x221 [usb_storage] > [<f8db2022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage] > [<f8db1414>] usb_stor_invoke_transport+0x15/0x259 [usb_storage] > [<c061fa40>] __down_interruptible+0xde/0xf0 > [<c04226ab>] default_wake_function+0x0/0xc > [<f8db2022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage] > [<f8db214a>] usb_stor_control_thread+0x128/0x1a3 [usb_storage] > [<c0420a03>] complete+0x39/0x48 > [<f8db2022>] usb_stor_control_thread+0x0/0x1a3 [usb_storage] > [<c043779f>] kthread+0xb0/0xd9 > [<c04376ef>] kthread+0x0/0xd9 > [<c0404b33>] kernel_thread_helper+0x7/0x10 > ======================= > ^ permalink raw reply [flat|nested] 23+ messages in thread
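Whether the SCSI layer really has given up on a member while the read is hung can be checked from a second shell through sysfs. The sketch assumes a 2.6-era layout where `/sys/block/sdX/device/state` reads values such as `running` or `offline`; `scsi_disk_states` is a hypothetical helper, and its optional argument exists only so it can be pointed at a test tree. A device that was removed outright will typically just be missing from the list.

```shell
# Hypothetical helper: print "<disk> <state>" for each sd* disk whose SCSI
# device exposes a sysfs "state" attribute (running, offline, ...).
# $1 is the sysfs mount point; it defaults to /sys and is only a parameter
# so the function can be tried against a fake directory tree.
scsi_disk_states() {
    sysfs=${1:-/sys}
    for f in "$sysfs"/block/sd*/device/state; do
        [ -r "$f" ] || continue          # no match, or not readable
        disk=${f#"$sysfs"/block/}        # e.g. sdc/device/state
        disk=${disk%%/*}                 # e.g. sdc
        echo "$disk $(cat "$f")"
    done
}

# Example: run in a second shell while the cp is hung
# scsi_disk_states
```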
* Re: Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-19 3:27 ` Michael Schwarz @ 2007-03-19 14:29 ` Bill Davidsen 2007-03-19 14:54 ` [Linux-usb-users] " Michael Schwarz 0 siblings, 1 reply; 23+ messages in thread From: Bill Davidsen @ 2007-03-19 14:29 UTC (permalink / raw) To: mschwarz; +Cc: Neil Brown, linux-raid, Alan Stern, linux-usb-users Michael Schwarz wrote: > More than ever, I am convinced that it is actually a hardware problem, but > I am curious for the opinions of both of you on whether the "system" > (meaning, I guess, the combination of usb-storage driver and raid) is > really doing the best with what it has. > See below, but the short answer is there is probably room for improvement. > My last effort was to switch to a different computer. When I did, I got in > the dmesg log (unfortunately, not preserved, although I should be able to > recreate) that one of the flash drives had bad blocks. Some part of the > system eventually decided it was a "dead device" (I believe dmesg indicate > the scsi subsystem said so). The device (it happened to be /dev/sdc) was > peremptorially dropped from the system. This appears to be what hanged the > raid system. > > (Why these messages never appeared on the other computer is beyond me; > obviously some difference in how the actual USB controller reports errors, > but, as I said, I've never studied USB drivers or hardware. In fact, once > you get beyond the UARTs you are getting sophisticated to me) > > I've built an array of five known-good devices and so far it works > swimmingly (at least on the hardware that was better at error reporting). > > So it seems to me that there is probably nothing actually wrong with the > drivers or their interactions at it leaves me only asking if there should > be some sort of improvement in error reporting/recovery up to userland. 
> > If I am right and the scsi system was marking a device as dead, shouldn't > the userland read against the md device get an error instead of an > indefinite hang? > Let me make sure I have this scenario right... one write process (dd or cp) hangs, but you can still access data on the array, so the devices (all of them?) are working. It would be useful at that point to see if /proc/mdstat shows one device as failed. Assuming I have described the behavior correctly, I would think that there is still a problem in the driver or md somewhere: hangs should time out, errors should be reported up, and if this is caused by a lost write completion, I would hope that would be timed out and reported. That's my read on it: these "just hangs" cases are probably undetected or mishandled errors, which should be passed up and reported to the application, or retried and completed, or handled in some better way than what you describe. Bad hardware is a fact of life. If you feel like chasing this more, an understanding of what the hardware did wrong and what the kernel didn't do right would be helpful. Of course the failure mode may be so rare, and the fix so time-consuming, that it won't get fixed, but it can get documented. > Beyond this question which I leave to you (although I'd love to hear your > answers/thoughts), I think we can safely say that the problem was hardware > (even if hard to find). If either of you would like, I'd be happy to find > time this week to recreate the error on my "better" PC and send that > along. > > As for rolling a custom kernel with more message buffer, well, I'm going > to be getting into a new device driver in the coming months, so a custom > debug kernel is definitely in my future, but I'm not sure when. > > I must say, the kernel has become a much more complex beastie since 2.2.x! > (Although it also appears to be improved and somewhat more organized -- > but definitely MUCH larger!) > > Thank you both so much! 
I wouldn't even have diagnosed my hardware problem > without your prompts. I'm very grateful. Let me know if you'd like those > dmesg logs or if you'd just like to let it go! > > -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 23+ messages in thread
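Bill's suggestion of looking at /proc/mdstat during the hang can be narrowed to the members md has flagged as faulty, which appear with an `(F)` suffix such as `sdc[2](F)`. This is a sketch: the mdstat text in the test is made up, it relies on `grep -o` (a GNU/BSD extension, not POSIX), and for a RAID-0 array md may never mark a member faulty at all, since there is no redundancy to fall back on.

```shell
# Hypothetical helper: list md members flagged faulty, i.e. entries such as
# "sdc[2](F)" in /proc/mdstat-style text. Reads the file named by $1
# (default /proc/mdstat). Uses grep -o, a GNU/BSD extension.
md_failed_members() {
    grep -o '[A-Za-z0-9_-]*\[[0-9]*](F)' "${1:-/proc/mdstat}" | sed 's/\[.*//'
}

# Example: md_failed_members    # prints e.g. "sdc", or nothing
```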
* Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-19 14:29 ` Bill Davidsen @ 2007-03-19 14:54 ` Michael Schwarz 2007-03-19 15:31 ` Alan Stern 0 siblings, 1 reply; 23+ messages in thread From: Michael Schwarz @ 2007-03-19 14:54 UTC (permalink / raw) To: Bill Davidsen; +Cc: Neil Brown, Alan Stern, linux-raid, linux-usb-users I'm going to hang on to the hardware. This is a pilot/demo that may lead to development of a new device, and, if so, I'll be getting back into device driver writing. Working this problem would be great practice for that. So I will do it. The only problem is I don't know when! I believe I can replicate the problem, so I'll find time (perhaps next weekend) to capture the data of interest. Mr. Stern: Where might I go for low level programming information on USB devices? I'm interested in registers/DMA/packet formats, etc. I've found info on the USB protocol itself, but I haven't found info on devices. Obviously I can dig through kernel source, but documents would be nice! Again, if this is an unreasonable request for you to "do my homework," just say so! I won't be offended. I'm sure I can find it myself given time, but if you happen to have some URLs handy, they'd be appreciated. YET AGAIN thank you both! You've been of great help. -- Michael Schwarz > Michael Schwarz wrote: >> More than ever, I am convinced that it is actually a hardware problem, >> but >> I am curious for the opinions of both of you on whether the "system" >> (meaning, I guess, the combination of usb-storage driver and raid) is >> really doing the best with what it has. >> > > See below, but the short answer is there is probably room for improvement. >> My last effort was to switch to a different computer. When I did, I got >> in >> the dmesg log (unfortunately, not preserved, although I should be able >> to >> recreate) that one of the flash drives had bad blocks. 
Some part of the >> system eventually decided it was a "dead device" (I believe dmesg >> indicated >> the scsi subsystem said so). The device (it happened to be /dev/sdc) was >> peremptorily dropped from the system. This appears to be what hung >> the >> raid system. >> >> (Why these messages never appeared on the other computer is beyond me; >> obviously some difference in how the actual USB controller reports >> errors, >> but, as I said, I've never studied USB drivers or hardware. In fact, >> once >> you get beyond the UARTs you are getting sophisticated to me.) >> >> I've built an array of five known-good devices and so far it works >> swimmingly (at least on the hardware that was better at error >> reporting). >> >> So it seems to me that there is probably nothing actually wrong with the >> drivers or their interactions, and it leaves me only asking if there >> should >> be some sort of improvement in error reporting/recovery up to userland. >> >> If I am right and the scsi system was marking a device as dead, >> shouldn't >> the userland read against the md device get an error instead of an >> indefinite hang? >> > > Let me make sure I have this scenario right... one write process (dd or > cp) hangs, but you can still access data on the array, so the devices > (all of them?) are working. It would be useful at that point to see if > /proc/mdstat shows one device as failed. > > Given that I have described the behavior, I would think that there is > still a problem in the driver or md somewhere: hangs should time out, > errors should be reported up, and if this is caused by a lost write > completion, I would hope that would be timed out and reported. That's my > read on it; these "just hangs" cases probably are undetected or > mishandled errors which should be passed up and reported to the > application or retried and completed. Or handled in some better way than > what you describe.
> > Bad hardware is a fact of life, if you feel like chasing this more, an > understanding of what the hardware did wrong and what the kernel didn't > do right would be helpful. Of course the failure mode may be so rare, > and the fix so time-consuming that it won't get fixed, but it can get > documented. >> Beyond this question which I leave to you (although I'd love to hear >> your >> answers/thoughts), I think we can safely say that the problem was >> hardware >> (even if hard to find). If either of you would like, I'd be happy to >> find >> time this week to recreate the error on my "better" PC and send that >> along. >> >> As for rolling a custom kernel with more message buffer, well, I'm going >> to be getting into a new device driver in the coming months, so a custom >> debug kernel is definitely in my future, but I'm not sure when. >> >> I must say, the kernel has become a much more complex beastie since >> 2.2.x! >> (Although it also appears to be improved and somewhat more organized -- >> but definitely MUCH larger!) >> >> Thank you both so much! I wouldn't even have diagnosed my hardware >> problem >> without your prompts. I'm very grateful. Let me know if you'd like those >> dmesg logs or if you'd just like to let it go! >> >> > -- > > bill davidsen <davidsen@tmr.com> > CTO TMR Associates, Inc > Doing interesting things with small computers since 1979 > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 23+ messages in thread
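Bill's suggestion to look at /proc/mdstat when the copy hangs can be scripted. The Python sketch below is only an illustration, not part of md or mdadm: it assumes the usual mdstat layout, where each array line starts with the md device name and a failed member carries an "(F)" suffix such as "sdc[2](F)". The helper name failed_members is hypothetical. Note that raid0 has no redundancy, so md may never mark a raid0 member as failed; an empty list there does not rule out a dead device.

```python
import re

# Matches a member entry such as "sdc[2]" or "sdc[2](F)" on an mdstat array line.
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\](\(F\))?")
# Array lines look like "md0 : active raid0 sdg[6] sdf[5] ...".
ARRAY_RE = re.compile(r"^(md\w*)\s*:\s*\S+")


def failed_members(mdstat_text):
    """Return {array_name: [member devices flagged (F)]} from /proc/mdstat text."""
    result = {}
    for line in mdstat_text.splitlines():
        m = ARRAY_RE.match(line)
        if not m:
            continue  # skip "Personalities", block-count lines, "unused devices"
        members = line.split(":", 1)[1]
        result[m.group(1)] = [
            dev for dev, _index, flag in MEMBER_RE.findall(members) if flag
        ]
    return result
```

On a live system one would call it as failed_members(open("/proc/mdstat").read()) while the hung process is still stuck.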
* Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-19 14:54 ` [Linux-usb-users] " Michael Schwarz @ 2007-03-19 15:31 ` Alan Stern 2007-03-19 16:58 ` Michael Schwarz 0 siblings, 1 reply; 23+ messages in thread From: Alan Stern @ 2007-03-19 15:31 UTC (permalink / raw) To: Michael Schwarz; +Cc: Bill Davidsen, Neil Brown, linux-raid, linux-usb-users On Mon, 19 Mar 2007, Michael Schwarz wrote: > I'm going to hang on to the hardware. This is a pilot/demo that may lead > to development of a new device, and, if so, I'll be getting back into > device driver writing. Working this problem would be great practice for > that. So I will do it. The only problem is I don't know when! > > I believe I can replicate the problem, so I'll find time (perhaps next > weekend) to capture the data of interest. Michael, you don't seem to appreciate the basic principles for tracking down problems. First: Simplify. Get rid of everything that isn't relevant to the problem and could serve to distract you. In particular, don't run X. That will eliminate around half of your running processes and shrink the stack dump down so that it might fit in the kernel buffer without overflowing. Second: Simplify. Don't run kernels that have been modified by Fedora or anybody else. Use a plain vanilla kernel from kernel.org. Third: Simplify. Try not to collect the same data over and over again (take a look at the starts of all those dmesg files you compressed and emailed). You can clear the kernel's log buffer after dumping it by doing "dmesg -c >/dev/null". Fourth: Be prepared to make changes. This means making changes to the kernel configuration or source code, another reason for using a stock kernel. To get some really useful data, you need to build a kernel with CONFIG_USB_DEBUG turned on. Without that setting there won't be any helpful debugging information in the log. Then you should run a minimal system. 
Single-user mode would be best, but that can be _too_ bare-bones. No GUI will suffice. Then you should clear the kernel log before starting the big file copy. Basically nothing that happens before then is important, because nothing has gone wrong. Then after the hang occurs, see what shows up in the dmesg log. And get a stack dump. > Mr. Stern: Where might I go for low level programming information on USB > devices? I'm interested in registers/DMA/packet formats, etc. Are you interested in USB devices (i.e., flash drives, webcams, and so on -- the things you plug in to a USB connection) or USB controllers (the hardware in your computer that manages the USB bus)? > I've found info on the USB protocol itself, but I haven't found info on > devices. Obviously I can dig through kernel source, but documents would be > nice! Again, if this is an unreasonable request for you to "do my > homework," just say so! I won't be offended. I'm sure I can find it myself > given time, but if you happen to have some URLs handy, they'd be > appreciated. There are three types of USB controllers used in personal computers: UHCI, OHCI, and EHCI. Links to their specifications are available here: http://www.usb.org/developers/resources/ Specifications for various classes of USB devices are available here: http://www.usb.org/developers/devclass_docs Alan Stern ^ permalink raw reply [flat|nested] 23+ messages in thread
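Alan's procedure (clear the log with "dmesg -c >/dev/null", reproduce the hang, run "echo t > /proc/sysrq-trigger", then save the dmesg output) produces a long dump in which the blocked tasks usually matter most. A rough Python filter for tasks in 'D' (uninterruptible sleep) state is sketched below. It assumes the 2.6-era task header layout, with the command name left-justified in a 13-character field, one space, then the state letter, optionally preceded by a printk timestamp; other kernel versions may format the dump differently. The function name d_state_tasks is hypothetical.

```python
import re

# Optional printk timestamp prefix such as "[  812.000001] ".
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def d_state_tasks(dump_text):
    """Return command names of tasks shown in state 'D' in a SysRq-T dump.

    Assumes header lines of the form "<comm padded to 13> <state> ...";
    stack-trace lines are indented and therefore skipped.
    """
    tasks = []
    for raw in dump_text.splitlines():
        line = TIMESTAMP_RE.sub("", raw)
        # Header lines start in column 0 and have a space right after the name field.
        if len(line) < 15 or line[0].isspace() or line[13] != " ":
            continue
        # The state letter must stand alone (end of line or followed by a space).
        if line[14] == "D" and (len(line) == 15 or line[15] == " "):
            tasks.append(line[:13].rstrip())
    return tasks
```

Feeding it the saved dump should list the stuck cp or dd, plus any other tasks blocked in 'D' state whose stack traces are then worth reading first.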
* Re: [Linux-usb-users] Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-19 15:31 ` Alan Stern @ 2007-03-19 16:58 ` Michael Schwarz 2007-03-19 18:17 ` Alan Stern 0 siblings, 1 reply; 23+ messages in thread From: Michael Schwarz @ 2007-03-19 16:58 UTC (permalink / raw) To: Alan Stern; +Cc: Bill Davidsen, Neil Brown, linux-raid, linux-usb-users Comments below. -- Michael Schwarz > On Mon, 19 Mar 2007, Michael Schwarz wrote: > >> I'm going to hang on to the hardware. This is a pilot/demo that may lead >> to development of a new device, and, if so, I'll be getting back into >> device driver writing. Working this problem would be great practice for >> that. So I will do it. The only problem is I don't know when! >> >> I believe I can replicate the problem, so I'll find time (perhaps next >> weekend) to capture the data of interest. > > Michael, you don't seem to appreciate the basic principles for tracking > down problems. I want to bristle at this. I've been a professional software developer for nearly 20 years. But I can't because all of your points below are, of course, dead on for tracking down a device-level problem. > > First: Simplify. Get rid of everything that isn't relevant > to the problem and could serve to distract you. In particular, > don't run X. That will eliminate around half of your running > processes and shrink the stack dump down so that it might fit > in the kernel buffer without overflowing. Right on. And I know this; I should have had two boxes where I was working: one where I could do browsy-emaily things, separate from the problem I was working on. > > Second: Simplify. Don't run kernels that have been modified by > Fedora or anybody else. Use a plain vanilla kernel from > kernel.org. Yeah, but here was where I lacked confidence. I used to know every inch of my kernel and my hardware, but, as previously stated, that was back in the 2.2.x days.
I wasn't confident that I could run my hardware with a plain-vanilla kernel or that I could successfully roll my own working 2.6.x kernel in a timely manner. But, of course, I understand why this is a good idea. > > Third: Simplify. Try not to collect the same data over and over > again (take a look at the starts of all those dmesg files you > compressed and emailed). You can clear the kernel's log buffer > after dumping it by doing "dmesg -c >/dev/null". Thanks, I actually didn't know that flag. Makes me feel pretty stupid... > > Fourth: Be prepared to make changes. This means making changes > to the kernel configuration or source code, another reason for > using a stock kernel. I agree -- I just lacked confidence doing so with newer kernels. I used to ALWAYS build my own kernel right up through the 2.2.x series, building the kernel to exactly match my hardware. I just haven't kept up. And if you compare the 2.2.x kernel's configuration parameter list to the 2.6.x, well, you can maybe understand why I was reluctant to launch on that when under time pressure. But your point (I gather) is that if I had, it might well have taken less time than it did... > > To get some really useful data, you need to build a kernel with > CONFIG_USB_DEBUG turned on. Without that setting there won't be any > helpful debugging information in the log. Before I send any more info on this problem, I will do this and all of the above. > > Then you should run a minimal system. Single-user mode would be best, > but that can be _too_ bare-bones. No GUI will suffice. Will do. > > Then you should clear the kernel log before starting the big file > copy. Basically nothing that happens before then is important, because > nothing has gone wrong. > > Then after the hang occurs, see what shows up in the dmesg log. And get a > stack dump. >> Mr. Stern: Where might I go for low level programming information on USB >> devices? I'm interested in registers/DMA/packet formats, etc.
> > Are you interested in USB devices (i.e., flash drives, webcams, and so on > -- the things you plug in to a USB connection) or USB controllers (the > hardware in your computer that manages the USB bus)? Firstly the controllers, then specific devices. > >> I've found info on the USB protocol itself, but I haven't found info on >> devices. Obviously I can dig through kernel source, but documents would >> be >> nice! Again, if this is an unreasonable request for you to "do my >> homework," just say so! I won't be offended. I'm sure I can find it >> myself >> given time, but if you happen to have some URLs handy, they'd be >> appreciated. > > There are three types of USB controllers used in personal computers: UHCI, > OHCI, and EHCI. Links to their specifications are available here: > > http://www.usb.org/developers/resources/ Thanks. This is just what I wanted. > > Specifications for various classes of USB devices are available here: > > http://www.usb.org/developers/devclass_docs And this. Thank you much. I won't post on this issue again until I've "cleared the decks" of the items you mention above. Thanks again. > > Alan Stern > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Failed reads from RAID-0 array (from newbie who has read the FAQ) 2007-03-19 16:58 ` Michael Schwarz @ 2007-03-19 18:17 ` Alan Stern 0 siblings, 0 replies; 23+ messages in thread From: Alan Stern @ 2007-03-19 18:17 UTC (permalink / raw) To: Michael Schwarz; +Cc: Neil Brown, linux-raid, Bill Davidsen, linux-usb-users On Mon, 19 Mar 2007, Michael Schwarz wrote: > Yeah; But here was where I lacked confidence. I used to know every inch of > my kernel and my hardware, but, as previously stated, that was back in the > 2.2.x days. I wasn't confident that I could run my hardware with a > plain-vanilla kernel or that I could successfully roll my own working > 2.6.x kernel in a timely manner. But, of course, I understand why this is > a good idea. It's not so hard to do, if you start from a known-good configuration. For instance, you could take the config your current distribution's kernel is built from and just use it, although it would take a long time to build because it includes so many drivers. Whittling it down to just the drivers you need would be tedious but not very difficult. Alan Stern ^ permalink raw reply [flat|nested] 23+ messages in thread
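When starting from the distribution's config as Alan suggests (Fedora and Ubuntu typically install it as /boot/config-$(uname -r)), it is worth confirming that the option he named, CONFIG_USB_DEBUG, actually ended up enabled before building. The sketch below is a hypothetical helper, not a kernel build tool: it only understands the "CONFIG_X=value" and "# CONFIG_X is not set" lines found in a .config file, and treats an absent option as not set.

```python
def config_options(config_text):
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    opts = {}
    suffix = " is not set"
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            opts[key] = value.strip('"')  # string options are quoted
    return opts


def missing_options(config_text, wanted):
    """Return the names in `wanted` whose value in the config differs."""
    opts = config_options(config_text)
    return [name for name, value in wanted.items() if opts.get(name, "n") != value]
```

For example, missing_options(text, {"CONFIG_USB_DEBUG": "y"}) returns ["CONFIG_USB_DEBUG"] until the option has been switched on.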
[parent not found: <45FC33A4.2090408@tmr.com>]
* Re: Failed reads from RAID-0 array; still no joy in Mudville. [not found] ` <45FC33A4.2090408@tmr.com> @ 2007-03-17 19:13 ` Michael Schwarz 2007-03-17 19:21 ` Michael Schwarz 0 siblings, 1 reply; 23+ messages in thread From: Michael Schwarz @ 2007-03-17 19:13 UTC (permalink / raw) To: Bill Davidsen; +Cc: Neil Brown, linux-raid, linux-usb-users I'll try playing around with IO sizes with dd. What I'm finding so far is ABSOLUTE consistency on where it locks. If it were a race condition with kernel locks I guess I would expect it to be more indeterminate (in my limited experience) unless it is due to a specific "deadly embrace" condition between the usb driver(s) and the raid subsystem. I must admit that I'm not familiar enough with either one. I will also mention that I experienced this lockup phenomenon with both a stock Fedora Core 6 i686 kernel and with a stock Ubuntu kernel, so the behavior isn't terribly kernel compile/module mix sensitive. I've downloaded the kernel-devel package for my Fedora kernel and I'm going to start working backwards from the stack trace I've captured to see where I'm hanging and why. strace wasn't particularly helpful since the write to file was buffered and so I can't be sure I have the call that failed. (I'll take a look and see if there's an 'unbuffered write' switch on strace -- there probably is). Anyways, I'm still hoping someone who knows a lot will see this and say "oh, yeah! That's because of BLAH." I don't mind becoming more knowledgeable about the 2.6.x kernel, but this wasn't how I wanted to go about it! ;-) Thanks again, all... What I find odd is that it seems to be a "per-process" problem. I can still access the md drive from other processes when the copy is hung. I'm going to see if it is "positional" by copying the file that is "hung" alone and see if it hangs in the same place on the same file, or if it hangs later or what... There will be more posts from me. (Fair warning to all!)
-- Michael Schwarz > Neil Brown wrote: >> On Friday March 16, mschwarz@multitool.net wrote: >> >>> I'm not a Linux newbie (I've even written a couple of books and done >>> some >>> very light device driver work), but I'm completely new to the software >>> raid subsystem. >>> >>> I'm doing something rather oddball. I'm making an array of USB flash >>> drives and comparing read and write rates. >>> >>> Well, I've had great success writing. I've got seven flash drives on a >>> hub. I've joined them up both linear and raid0 and written large >>> amounts >>> of data to them. But come time to read from them, linear works, but >>> raid0 >>> hangs after transferring just shy of 2G of data. It doesn't matter if >>> it's >>> reading from one file or from many files whose cumulative size is just >>> shy >>> of 2G. It doesn't matter if I'm using "dd" or "cp" to read the file or >>> files. >>> >>> The process doing the transfer is unkillable. Not with a kill -15 or a >>> kill -9. It won't die, but it also won't make progress. >>> >>> "Linear" always works. Raid-0 always hangs. >>> >> >> My guess would be a locking bug in the usb storage driver or some >> lower level USB driver.. >> A significant difference between raid0 and linear is that a largish IO >> will touch all drives for raid-0, but only one or two for linear. >> That gives much more opportunity for locking bugs to hit. >> >> When it is in the hanging state, do >> echo t > /proc/sysrq-trigger >> >> and look in the kernel logs for the stack trace of all processes. >> Hopefully the stack trace for the processes in 'D' state will be >> informative. >> >> NeilBrown >> >> >> >>> Here are my mdadm commands to create the array: >>> >>> mdadm --create /dev/md0 --level=linear --auto=md --chunk=32 >>> --raid-devices=7 /dev/sd? >>> >>> (The wildcard works because the seven flash drives are the only scsi >>> devices on the system).
>>> >>> The command for the raid-0 array is the same as above except for the >>> "--level=0" it takes to make a raid 0 array. >>> >>> I then use "mkfs" to make the filesystem and mount the resulting array >>> at >>> "/mnt" >>> >>> Can anyone give a raid newbie some tips? Is there something obvious I'm >>> missing? Would it help to provide strace/ltrace/ptrace of the hanging >>> copy >>> command? >>> >>> Any help (including URLs of manuals I should RTFM) would be most >>> welcome. >>> >>> Thanks! >>> >>> >>> -- >>> Michael Schwarz >>> > > Neil, would retrying this with small i/o show anything, assuming your > thought is the cause? Also, would it give useful information to use dd > with direct i/o on read: > dd if=/dev/md0 iflag=direct bs=1024k of=/dev/null > and see if large buffer with O_DIRECT works? > > These are suggestions on getting more info, if the trace doesn't clarify > the problem. > > -- > bill davidsen <davidsen@tmr.com> > CTO TMR Associates, Inc > Doing interesting things with small computers since 1979 > > ^ permalink raw reply [flat|nested] 23+ messages in thread
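Neil's point that a largish I/O touches every drive under raid0 but only one or two under linear, and Bill's question about small versus large I/O, can be made concrete with some arithmetic. The sketch below assumes seven equal-size members and the 32 KiB chunk from the mdadm command above, and ignores md's zone handling for unequal members; for linear it takes a hypothetical per-member size. It maps a byte range on /dev/md0 to the member indices that range lands on.

```python
CHUNK = 32 * 1024  # --chunk=32 (KiB), as in the mdadm command in this thread
NDEV = 7           # seven flash drives


def raid0_members(offset, length, chunk=CHUNK, ndev=NDEV):
    """Member indices touched by [offset, offset + length) on an equal-size raid0."""
    if length <= 0:
        return []
    first = offset // chunk
    last = (offset + length - 1) // chunk
    # Consecutive chunks rotate round-robin across the members.
    return sorted({c % ndev for c in range(first, last + 1)})


def linear_members(offset, length, member_size):
    """Member indices touched by the same range on a linear (concatenated) array."""
    if length <= 0:
        return []
    first = offset // member_size
    last = (offset + length - 1) // member_size
    return list(range(first, last + 1))
```

Under these assumptions a 4 KiB read lands on a single member, while one 1 MiB request (as with Bill's bs=1024k) spans 32 chunks and therefore all seven drives; the same request on a linear array stays on one member unless it straddles a member boundary.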
* Re: Failed reads from RAID-0 array; still no joy in Mudville. 2007-03-17 19:13 ` Failed reads from RAID-0 array; still no joy in Mudville Michael Schwarz @ 2007-03-17 19:21 ` Michael Schwarz 2007-03-18 17:22 ` Bill Davidsen 0 siblings, 1 reply; 23+ messages in thread From: Michael Schwarz @ 2007-03-17 19:21 UTC (permalink / raw) To: mschwarz; +Cc: linux-raid, linux-usb-users Update: (For those who've been waiting breathlessly). It hangs at a particular point in a particular file. In other words, it doesn't depend on the total number of bytes transferred. Rather, when it reaches a particular point in a particular file (12267520 bytes into a file that is 1073709056 bytes long) it hangs. I begin to suspect that I have a "dead spot" in my USB hub. But what gets me, if that is true, is: why does the write work? Do cp and dd not check to see if writes succeed? I know it isn't a particular flash drive because I've used two different sets of 7 USB drives and it seems to fail consistently no matter which. Nonetheless, I'm beginning to think I'm dealing with a hardware issue, not a kernel issue, just because it is so consistent. Thanks again for all the help. -- Michael Schwarz > I'll try playing around with IO sizes with dd. > > What I'm finding so far is ABSOLUTE consistency on where it locks. If it > were a race condition with kernel locks I guess I would expect it to be > more indeterminate (in my limited experience) unless it is due to a specific > "deadly embrace" condition between the usb driver(s) and the raid > subsystem. > > I must admit that I'm not familiar enough with either one. I will also > mention that I experienced this lockup phenomenon with both a stock Fedora > Core 6 i686 kernel and with a stock Ubuntu kernel, so the behavior isn't > terribly kernel compile/module mix sensitive. > > I've downloaded the kernel-devel package for my Fedora kernel and I'm > going to start working backwards from the stack trace I've captured to see > where I'm hanging and why.
strace wasn't particularly helpful since the > write to file was buffered and so I can't be sure I have the call that > failed. (I'll take a look and see if there's an 'unbuffered write' switch > on strace -- there probably is). > > Anyways, I'm still hoping someone who knows a lot will see this and say > "oh, yeah! That's because of BLAH." I don't mind becoming more > knowledgeable about the 2.6.x kernel, but this wasn't how I wanted to go > about it! ;-) > > Thanks again, all... > > What I find odd is that it seems to be a "per-process" problem. I can > still access the md drive from other processes when the copy is hung. I'm > going to see if it is "positional" by copying the file that is "hung" > alone and see if it hangs in the same place on the same file, or if it > hangs later or what... There will be more posts from me. (Fair warning to > all!) > > -- > Michael Schwarz > >> Neil Brown wrote: >>> On Friday March 16, mschwarz@multitool.net wrote: >>> >>>> I'm not a Linux newbie (I've even written a couple of books and done >>>> some >>>> very light device driver work), but I'm completely new to the software >>>> raid subsystem. >>>> >>>> I'm doing something rather oddball. I'm making an array of USB flash >>>> drives and comparing read and write rates. >>>> >>>> Well, I've had great success writing. I've got seven flash drives on a >>>> hub. I've joined them up both linear and raid0 and written large >>>> amounts >>>> of data to them. But come time to read from them, linear works, but >>>> raid0 >>>> hangs after transferring just shy of 2G of data. It doesn't matter if >>>> it's >>>> reading from one file or from many files whose cumulative size is just >>>> shy >>>> of 2G. It doesn't matter if I'm using "dd" or "cp" to read the file or >>>> files. >>>> >>>> The process doing the transfer is unkillable. Not with a kill -15 or a >>>> kill -9. It won't die, but it also won't make progress. >>>> >>>> "Linear" always works. Raid-0 always hangs.
>>>> >>> >>> My guess would be a locking bug in the usb storage driver or some >>> lower level USB driver.. >>> A significant difference between raid0 and linear is that a largish IO >>> will touch all drives for raid-0, but only one or two for linear. >>> That gives much more opportunity for locking bugs to hit. >>> >>> When it is in the hanging state, do >>> echo t > /proc/sysrq-trigger >>> >>> and look in the kernel logs for the stack trace of all processes. >>> Hopefully the stack trace for the processes in 'D' state will be >>> informative. >>> >>> NeilBrown >>> >>> >>> >>>> Here are my mdadm commands to create the array: >>>> >>>> mdadm --create /dev/md0 --level=linear --auto=md --chunk=32 >>>> --raid-devices=7 /dev/sd? >>>> >>>> (The wildcard works because the seven flash drives are the only scsi >>>> devices on the system). >>>> >>>> The command for the raid-0 array is the same as above except for the >>>> "--level=0" it takes to make a raid 0 array. >>>> >>>> I then use "mkfs" to make the filesystem and mount the resulting array >>>> at >>>> "/mnt" >>>> >>>> Can anyone give a raid newbie some tips? Is there something obvious >>>> I'm >>>> missing? Would it help to provide strace/ltrace/ptrace of the hanging >>>> copy >>>> command? >>>> >>>> Any help (including URLs of manuals I should RTFM) would be most >>>> welcome. >>>> >>>> Thanks! >>>> >>>> >>>> -- >>>> Michael Schwarz >>>> >> >> Neil, would retrying this with small i/o show anything, assuming your >> thought is the cause? Also, would it give useful information to use dd >> with direct i/o on read: >> dd if=/dev/md0 iflag=direct bs=1024k of=/dev/null >> and see if large buffer with O_DIRECT works? >> >> These are suggestions on getting more info, if the trace doesn't clarify >> the problem.
>> >> -- >> bill davidsen <davidsen@tmr.com> >> CTO TMR Associates, Inc >> Doing interesting things with small computers since 1979 >> >> > > ^ permalink raw reply [flat|nested] 23+ messages in thread
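On Michael's question of whether cp and dd check that writes succeed: they do check what each write(2) returns, but for an ordinary buffered file that call typically succeeds as soon as the data is in the page cache, so an error from the device may only surface later, at fsync or close, if it is reported to the writer at all. The Python sketch below is a hypothetical helper, not a replacement for either tool. It copies in fixed blocks, reports the running offset through a callback (so the last value printed also shows where a stalled read stopped), and calls os.fsync before returning so write-back errors are raised as OSError. The 64 KiB block size is an arbitrary choice.

```python
import os

BLOCK = 64 * 1024  # arbitrary copy block size for this sketch


def copy_with_fsync(src, dst, block=BLOCK, progress=None):
    """Copy src to dst, calling progress(bytes_copied) after each block.

    The final os.fsync waits for the data to reach the device, so an I/O
    error during write-back is raised here instead of being silently lost.
    """
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(block)
            if not buf:
                break
            fout.write(buf)
            copied += len(buf)
            if progress:
                progress(copied)
        fout.flush()             # move Python's buffer into the kernel
        os.fsync(fout.fileno())  # raises OSError if the device reports an error
    return copied
```

Something like copy_with_fsync("/mnt/big.mpg", "/tmp/big.mpg", progress=print) (paths hypothetical) would print each offset as it goes, which fits the "positional" test Michael describes.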
* Re: Failed reads from RAID-0 array; still no joy in Mudville. 2007-03-17 19:21 ` Michael Schwarz @ 2007-03-18 17:22 ` Bill Davidsen 2007-03-18 17:39 ` Michael Schwarz 0 siblings, 1 reply; 23+ messages in thread From: Bill Davidsen @ 2007-03-18 17:22 UTC (permalink / raw) To: mschwarz; +Cc: linux-raid, linux-usb-users Michael Schwarz wrote: > Update: > > (For those who've been waiting breathlessly). It hangs at a particular > point in a particular file. In other words, it doesn't depend on the total > number of bytes transferred. Rather, when it reaches a particular point in > a particular file (12267520 bytes into a file that is 1073709056 bytes > long) it hangs. > I have an odd thought: have you tried copying that same file to /dev/null or similar? The reason I ask is that if it were by any chance a sparse file, while the program is reading all those unwritten bytes odd things may happen. Sorry, I haven't seen this in years, but I do remember seeing a filesystem on the destination end running out of space because all those unwritten pages were now being "really written" as zeros. Use of cp with the --sparse= flag may change the behavior if this is the case. > I begin to suspect that I have a "dead spot" in my USB hub. But what gets > me, if that is true, is: why does the write work? Do cp and dd not check to > see if writes succeed? > > I know it isn't a particular flash drive because I've used two different > sets of 7 USB drives and it seems to fail consistently no matter which. > > Nonetheless, I'm beginning to think I'm dealing with a hardware issue, not > a kernel issue, just because it is so consistent. > > Thanks again for all the help. > > > -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 23+ messages in thread
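Bill's sparse-file theory can be checked without reading the file at all: compare the bytes actually allocated with the apparent size. The sketch below uses os.stat, whose st_blocks field is counted in 512-byte units on Linux regardless of the filesystem block size. It is a heuristic only: filesystems that compress data or store small files inline can also report fewer allocated bytes than the file size. The function name looks_sparse is hypothetical.

```python
import os


def looks_sparse(path):
    """Heuristic: True if fewer bytes are allocated than the file's apparent size."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux.
    return st.st_blocks * 512 < st.st_size
```

A file of compressed MPEG data written sequentially would normally come back False.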
* Re: Failed reads from RAID-0 array; still no joy in Mudville. 2007-03-18 17:22 ` Bill Davidsen @ 2007-03-18 17:39 ` Michael Schwarz 2007-03-18 18:21 ` Bill Davidsen 0 siblings, 1 reply; 23+ messages in thread From: Michael Schwarz @ 2007-03-18 17:39 UTC (permalink / raw) To: Bill Davidsen; +Cc: linux-raid, linux-usb-users I've tried both single and multiple files. The files are not sparse. They are highly compressed files (mpeg files) that would, to the filesystem, be nearly random with no repeated patterns or voids. -- Michael Schwarz > Michael Schwarz wrote: >> Update: >> >> (For those who've been waiting breathlessly). It hangs at a particular >> point in a particular file. In other words, it doesn't depend on the >> total >> number of bytes transferred. Rather, when it reaches a particular point >> in >> a particular file (12267520 bytes into a file that is 1073709056 bytes >> long) it hangs. >> > > I have an odd thought: have you tried copying that same file to > /dev/null or similar? The reason I ask is that if it were by any chance > a sparse file, while the program is reading all those unwritten bytes > odd things may happen. Sorry, I haven't seen this in years, but I do > remember seeing a filesystem on the destination end running out of space > because all those unwritten pages were now being "really written" as > zeros. > > Use of cp with the --sparse= flag may change the behavior if this is the > case. >> I begin to suspect that I have a "dead spot" in my USB hub. But what >> gets >> me, if that is true, is: why does the write work? Do cp and dd not check to >> see if writes succeed? >> >> I know it isn't a particular flash drive because I've used two different >> sets of 7 USB drives and it seems to fail consistently no matter which. >> >> Nonetheless, I'm beginning to think I'm dealing with a hardware issue, >> not >> a kernel issue, just because it is so consistent. >> >> Thanks again for all the help.
>> >> >> > > > -- > bill davidsen <davidsen@tmr.com> > CTO TMR Associates, Inc > Doing interesting things with small computers since 1979 > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Failed reads from RAID-0 array; still no joy in Mudville. 2007-03-18 17:39 ` Michael Schwarz @ 2007-03-18 18:21 ` Bill Davidsen 0 siblings, 0 replies; 23+ messages in thread From: Bill Davidsen @ 2007-03-18 18:21 UTC (permalink / raw) To: mschwarz; +Cc: linux-raid, linux-usb-users Michael Schwarz wrote: > I've tried both single and multiple files. The files are not sparse. They > are highly compressed files (mpeg files) that would, to the filesystem, be > nearly random with no repeated patterns or voids. > > Good, one possible cause eliminated. -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 23+ messages in thread
Thread overview: 23+ messages
2007-03-17 2:20 Failed reads from RAID-0 array (from newbie who has read the FAQ) Michael Schwarz
2007-03-17 5:31 ` Neil Brown
2007-03-17 18:01 ` Michael Schwarz
2007-03-17 20:49 ` Alan Stern
2007-03-17 21:35 ` Michael Schwarz
2007-03-18 2:06 ` [Linux-usb-users] " Alan Stern
2007-03-18 2:12 ` Alan Stern
2007-03-18 4:42 ` Michael Schwarz
2007-03-18 16:56 ` [Linux-usb-users] " Michael Schwarz
2007-03-18 17:44 ` Michael Schwarz
2007-03-18 21:55 ` Michael Schwarz
2007-03-18 21:57 ` Neil Brown
2007-03-19 3:27 ` Michael Schwarz
2007-03-19 14:29 ` Bill Davidsen
2007-03-19 14:54 ` [Linux-usb-users] " Michael Schwarz
2007-03-19 15:31 ` Alan Stern
2007-03-19 16:58 ` Michael Schwarz
2007-03-19 18:17 ` Alan Stern
[not found] ` <45FC33A4.2090408@tmr.com>
2007-03-17 19:13 ` Failed reads from RAID-0 array; still no joy in Mudville Michael Schwarz
2007-03-17 19:21 ` Michael Schwarz
2007-03-18 17:22 ` Bill Davidsen
2007-03-18 17:39 ` Michael Schwarz
2007-03-18 18:21 ` Bill Davidsen