* readahead logic and I/O errors
From: Michael Tokarev @ 2006-02-15 20:35 UTC
To: Linux-kernel

The thing is: I just fired a cdrom drive in my PC. It was a good device, and now it's dead. And the reason is the readahead logic, plus an unreadable (damaged, scratched) CD-rom.

By default, linux enables read-ahead for all block devices (I can't be certain about "all", but it's true for hard disks, cd-roms and floppies at least). And when a read request comes to the device, according to the readahead logic, linux tries to get more data from the drive than was requested.

Now, one of the blocks requested can't be read, for whatever reason. It happens on removable media; it's a normal (sort of, anyway) condition.

Looks like I had many unreadable blocks on that CD. And, instead of doing the obvious (in my view anyway) thing, namely STOPPING the readahead operation after the FIRST I/O error and returning only the data which was requested by the application (it was just `cp'), linux continued trying to read ahead, discovering more and more failed blocks, as dmesg said.

And finally, after about 100 blocks failed in a row, the drive fired.

I wasn't able to stop the process - `cp' was in D state, sitting in one read() syscall and thus unkillable. Umount didn't work (obviously). The only option left was to reboot/poweroff the system, which I was finally forced to do.. but too late.

I don't understand the logic in ll_rw_blk.c (the file is quite large), but my guess is that initially the read request was a single one for several blocks, but when it failed, linux continued trying to get the next block, the next-to-next block and so on - hence all the consecutive failed block numbers in dmesg. I.e., it split the original large request into several smaller chunks, failing each in turn.

Even if I'm wrong here and it originally made "short" reads, the question remains:

Why does linux try to continue the readahead operation after I/O errors, instead of just dropping the reads and repeating them WHEN ACTUALLY NEEDED? `cp' will stop after the first failed read(), so only one bad sector would be tried, instead of numerous ones, finally firing the drive.

When setting readahead to 0, the same CD can be read up to the first error, and will fail correctly.

The same happens with floppy as well - exactly the same thing. I deliberately damaged a floppy disk (I still have several here), just to verify - after ~15 minutes of waiting for it to finish its readahead nonsense, I became bored and finally rebooted the system.

So, to sum up: current linux behaviour (similar since 2.4 -- at least I remember seeing something like that, repeated attempts to read something from a bad drive, -- up to the current 2.6.15) is that it's trivial to kill the system just by inserting a cd-rom or floppy with some unreadable sectors. Worse, it's possible to FIRE the system this way (with certain drives anyway), as just happened here. (I understand that maybe, after some more waiting, when it has tried all 128 or 256 more sectors, it WILL finally do something useful again and stop retrying. But e.g. on this drive of mine, each read attempt took about 30 sec, with the drive retrying several times on its own, so the total time is umm.. quite large, and there's nothing one can do about it but power off the PC.)

From my point of view, it's a serious bug (even not counting my dead CD drive).

Thanks.

/mjt
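
To put rough numbers on the complaint above, here is a toy model of the two policies: keep walking the readahead window after an error, or drop the rest of the window at the first error. It is only a sketch and not kernel code. The function name, the one-block-at-a-time request pattern and the 128-block window are assumptions made for illustration; the 100 bad blocks and the 30 seconds per failed attempt are the figures reported in the message.

```python
def simulate_read(bad_blocks, start, requested, readahead, stop_on_error):
    """Toy model of one read() plus its readahead window.

    Hypothetical helper: walks the window block by block and returns
    (number of device read attempts, list of blocks that failed).
    """
    attempts = 0
    failed = []
    for block in range(start, start + requested + readahead):
        attempts += 1
        if block in bad_blocks:
            failed.append(block)
            if stop_on_error:
                # Proposed policy: give up on the rest of the window.
                break
    return attempts, failed


# 100 consecutive unreadable blocks, as in the report above.
bad = set(range(10, 110))

# Assumed figures: the application asks for 4 blocks, readahead adds 128.
keep_going = simulate_read(bad, start=8, requested=4, readahead=128,
                           stop_on_error=False)
stop_early = simulate_read(bad, start=8, requested=4, readahead=128,
                           stop_on_error=True)

SECONDS_PER_FAILED_READ = 30  # figure quoted in the message
print("continue after error:", keep_going[0], "attempts,",
      len(keep_going[1]) * SECONDS_PER_FAILED_READ, "s spent on bad blocks")
print("stop at first error: ", stop_early[0], "attempts,",
      len(stop_early[1]) * SECONDS_PER_FAILED_READ, "s spent on bad blocks")
```

Under these assumptions the first policy touches every one of the 100 bad blocks before the application sees anything, while the second touches exactly one.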
* Re: readahead logic and I/O errors
From: linux-os (Dick Johnson) @ 2006-02-15 20:59 UTC
To: Michael Tokarev; +Cc: Linux-kernel

On Wed, 15 Feb 2006, Michael Tokarev wrote:

> The thing is: I just fired a cdrom drive in my PC.
> It was a good device, and now it's dead.
> And the reason is the readahead logic, plus an unreadable
> (damaged, scratched) CD-rom.
>
> By default, linux enables read-ahead for all block devices
> (I can't be certain about "all", but it's true for hard
> disks, cd-roms and floppies at least). And when a read
> request comes to the device, according to readahead logic,
> linux tries to get more data from the drive than requested.
>
> Now, one of blocks requested to be read can't be read due
> to whatever condition. It happens on removable media, it's
> a normal (sort of, anyway) condition.
>
> Looks like I had many unreadable blocks on that CD. And,
> instead of obvious (in my view anyway) thing to do, namely,
> to STOP readahead operation and return only whatever data
> which was requested by an application (it was just `cp')
> after FIRST I/O error, linux continued trying reading-ahead,
> discovering more and more failed blocks, as dmesg said.
>
> And finally, after about 100 blocks failed in a row, the
> drive fired.
>
> I wasn't able to stop the process - `cp' was in D state
> sitting in one read() syscall thus unkillable. Umount
> didn't work (obviously). Only one option left was to
> reboot/poweroff the system, which I was finally forced
> to do.. but too late.
>
> I don't understand the logic in ll_rw_blk.c (the file
> is quite large), but my guessing is that initially,
> the read request was one for several blocks, but when
> it failed, linux continued trying to get next block,
> next-to-next block and so on - hence all consecutive
> failed block numbers in dmesg, ie, it split original
> large request into several smaller chunks, failing each
> in turn.
>
> Even if I'm wrong here and it originally made "short"
> reads, the question remains:
>
> Why linux tries to continue readahead operation after
> I/O errors, instead of just dropping the reads, and
> repeat them WHEN ACTUALLY NEEDED? `cp' will stop after
> first failed read(), so only one bad sector will be tried,
> instead of numerous, finally firing the drive?
>
> When setting readahead to 0, the same CD can be read up
> to the first error, and will fail correctly.
>
> The same happens with floppy as well - exactly the same
> thing. I especially damaged a floppy disk (I still have
> several here), just to verify - after ~15 minutes of waiting
> while it will finish its readahead nonsense, I become bored
> and finally rebooted the system.
>
> So, to sum: current linux behaviour (similar since 2.4 -- at
> least I remember seeing something like that, repeated attempts
> to read something from a bad drive, -- up to current 2.6.15)
> is that it's trivial to kill the system just by inserting a
> cd-rom or floppy with some unreadable sectors. Worse, it's
> possible to FIRE the system this way (with certain drives
> anyway), as just happened here (I understand, maybe, after
> some more waiting, when it will try all 128 or 256 more sectors,
> it WILL finally do something useful again and stop retrying,
> but eg on this my drive, each read attempt took about 30 sec,
> retrying several times by its own, so total time is umm..
> quite large, and there's nothing one can do with it but
> to poweroff the PC).
>
> From my point of view, it's a serious bug (even if not counting
> my dead CD drive).
>
> Thanks.
>
> /mjt

Aside from the obvious read-ahead bug you discovered, have you tried your CD drive after the reboot? It is not possible to kill those things with software, even if attempting to write with too much infrared LED drive. There is nothing except for the removable disc to get hurt. Even the "head" isn't in contact. It's just some infrared light, focused with a voice-coil, that moves on a fixed platform.

Of course you can make many cow-flops from discs, but can you describe how you think the drive got "fired" as you say?

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.66 BogoMips).
Warning : 98.36% of all statistics are fiction.

* Re: readahead logic and I/O errors
From: Michael Tokarev @ 2006-02-15 22:02 UTC
To: linux-os (Dick Johnson); +Cc: Linux-kernel

linux-os (Dick Johnson) wrote:
> On Wed, 15 Feb 2006, Michael Tokarev wrote:
>
>>The thing is: I just fired a cdrom drive in my PC.
>>It was a good device, and now it's dead.
>>And the reason is the readahead logic, plus an unreadable
>>(damaged, scratched) CD-rom.
[]

> Aside from the obvious read-ahead bug you discovered, have you
> tried your CD drive after the reboot? It is not possible to
> kill those things with software, even if attempting to write
> with too much infrared LED drive. There is nothing except for
> the removable disc to get hurt. Even the "head" isn't in contact.
> It's just some infrared light, focused with a voice-coil, that
> moves on a fixed platform.

I opened the drive after BIOS didn't detect it on reboot (after power-off). There's one fired (burned? perished? how's that in english?) chip on the plate, which smells like fired silicone. It looks like a ~5mm pit in the center of the square chip, full of ash, and there's a crack across it. The drive is dead.

I think it's a chip which controls one of the motors of the drive, most probably the one which moves the head, because the head motor connector is right near the chip.

When I turned off power, the drive was *hot*, and it started "trembling" (or chattering) when I turned power off.

It was a dvd-cd combo (read dvd, read-write cd) Teac drive. I don't remember the model (there's no label on the drive, and I can't send an inquiry/identify command to it anymore, obviously).

Yes, it looks like a problem in the drive *too*, as it should not behave like that in the first place. But the thing is, I did know something bad was going on, I saw it, but I wasn't able to stop it from linux; only poweroff stopped things.

/mjt

* Re: readahead logic and I/O errors
From: Phillip Susi @ 2006-02-15 22:51 UTC
To: Michael Tokarev; +Cc: linux-os (Dick Johnson), Linux-kernel

Michael Tokarev wrote:
> I opened the drive after BIOS didn't detect it on reboot (after
> power-off). There's one fired (burned? perished? how's that in

I think you mean "fried", which is what happens when something is heated to excess. "fired" is what happens to you when your boss catches you stealing office supplies or sleeping with his daughter. It is also what happens to the bullet when you pull the trigger.

> english?) chip on the plate, which smells like fired silicone.
> It looks like a ~5mm pit in the center of square chip, full of
> ash, and there's a crack across it. The drive is dead.
>
> I think it's a chip which controls one of the motors of the drive,
> most probably the one which moves the head, because the head
> motor connector is right near the chip.
>
> When I turned off power, the drive was *hot*, and it started
> "trembling" (or chattering) when I turned power off.
>
> It was a dvd-cd combo (read dvd, read-write cd) Teac drive, I
> don't remember the model (there's no label on the drive, and
> I can't send inquiry/identify command to it anymore, obviously).
>
> Yes it looks like a problem in the drive *too*, as it should
> not behave like that in the first place. But the thing is, I
> did know something's bad going on, I saw it, but I wasn't able
> to stop it from linux, only poweroff stopped things from going.
>
> /mjt

I assume the drive was no longer under warranty? It certainly should not have burned itself out like that, but yeah, the kernel should not keep trying to read ahead on error. I ran into a similar problem where I mounted a filesystem on a CD-RW read/write and wrote some files to it. When I went to unmount it, all the writes failed because the media was not properly formatted to be writable. The kernel kept trying to write each sector though, blocked the umount process in the unkillable D state for a good 20 minutes, and kept the drive door locked. Needless to say, I was very annoyed.

* Re: readahead logic and I/O errors
From: Jan Knutar @ 2006-02-16 6:53 UTC
To: Michael Tokarev; +Cc: Linux-kernel

On Wednesday 15 February 2006 22:35, Michael Tokarev wrote:
> after FIRST I/O error, linux continued trying reading-ahead,
> discovering more and more failed blocks, as dmesg said.

Sorry for hijacking the thread, but on another note, is there any way to tell linux to tell the drive not to bother retrying read errors? It would be perfect for streaming video from a CD or DVD. Usually video players have excellent error recovery themselves, which probably looks better on the screen than the movie coming to a grinding halt due to the retries.

I remember this being discussed quite some time ago, but I don't remember if anything came out of it?

* Re: readahead logic and I/O errors
From: Michael Tokarev @ 2006-02-16 10:42 UTC
To: Jan Knutar; +Cc: Linux-kernel

Jan Knutar wrote:
> On Wednesday 15 February 2006 22:35, Michael Tokarev wrote:
>
>>after FIRST I/O error, linux continued trying reading-ahead,
>>discovering more and more failed blocks, as dmesg said.
>
> Sorry for hijacking the thread, but on another note, is there
> any way to tell linux to tell the drive to not bother retrying
> read errors? Would be perfect for streaming video from a
> CD or DVD. Usually video players have excellent error
> recovery themselves, which probably looks better on the
> screen than the movie coming to a grinding halt due to
> the retries.

It looks like exactly the same issue. When you set readahead for the drive to 0 (to disable it), the only retry which is done is the one by the drive itself. Linux returns an I/O error to the application as soon as the drive reports it, and it's up to the application to decide what to do next - abort (like `cp' does), or continue with the next (or next-to-next) sector etc.

I don't know how to control retries in the drive (if it's at all possible).

/mjt
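
The "continue with the next sector" choice mentioned above can be sketched on the application side roughly as follows. This is a minimal illustration and not a tested recovery tool. The helper names (salvage, pread_reader) and the 2048-byte sector size are assumptions, and the sketch presumes readahead is already disabled, so that one failed read corresponds to one bad sector.

```python
import os

SECTOR_SIZE = 2048  # assumed data-CD sector size for this sketch


def salvage(read_at, total_size, sector_size=SECTOR_SIZE):
    """Read sector by sector, skipping sectors that fail with an I/O error.

    read_at(offset, length) must return bytes or raise OSError.
    Returns (data with failed sectors zero-filled, list of bad offsets).
    """
    data = bytearray()
    bad_offsets = []
    offset = 0
    while offset < total_size:
        length = min(sector_size, total_size - offset)
        try:
            chunk = read_at(offset, length)
        except OSError:
            # Unlike cp, do not abort: note the bad sector and move on.
            bad_offsets.append(offset)
            chunk = b""
        # Zero-fill failed or short reads so later offsets stay aligned.
        data += chunk.ljust(length, b"\0")
        offset += length
    return bytes(data), bad_offsets


def pread_reader(fd):
    """Adapter for a real file descriptor, using os.pread (Unix only)."""
    return lambda offset, length: os.pread(fd, length, offset)
```

For a real device one would open it with os.open() and pass pread_reader(fd) together with the device size. A player could treat the zero-filled sectors as damaged frames instead of stalling on them.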

* Re: readahead logic and I/O errors
From: Erik Mouw @ 2006-02-20 10:57 UTC
To: Jan Knutar; +Cc: Michael Tokarev, Linux-kernel

On Thu, Feb 16, 2006 at 08:53:16AM +0200, Jan Knutar wrote:
> Sorry for hijacking the thread, but on another note, is there
> any way to tell linux to tell the drive to not bother retrying
> read errors? Would be perfect for streaming video from a
> CD or DVD. Usually video players have excellent error
> recovery themselves, which probably looks better on the
> screen than the movie coming to a grinding halt due to
> the retries.

FWIW, ATAPI defines a streaming IO command set, but that isn't used by any driver.

Erik

--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands