* [patch] floppy: suspend/resume fix
2006-11-11 20:48 [BUG] floppy: broken after resume due to 2.6.18-rc1 lockdep changes Mikael Pettersson
@ 2006-11-12 15:47 ` Ingo Molnar
0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2006-11-12 15:47 UTC (permalink / raw)
To: Mikael Pettersson; +Cc: linux-kernel, Andrew Morton
* Mikael Pettersson <mikpe@it.uu.se> wrote:
> On my old Dell Latitude laptop, the first access to the floppy after
> having resumed from APM suspend fails miserably and generates these
> kernel messages (from 2.6.19-rc5):
[...]
> It's only the first post-resume access that triggers this failure,
> subsequent accesses do work.
>
> I've traced the cause to Ingo's lockdep patch in 2.6.18-rc1 (see
> below): reverting it makes the floppy work after resume again.
could you check the patch below? I had to add a platform driver to
floppy.c to get suspend/resume callbacks, but otherwise it's relatively
straightforward.
Ingo
----------------------->
Subject: [patch] floppy: suspend/resume fix
From: Ingo Molnar <mingo@elte.hu>
introduce a floppy platform-driver and suspend/resume ops to
stop/start the floppy driver. Bug reported by Mikael Pettersson.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
drivers/block/floppy.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
Index: linux/drivers/block/floppy.c
===================================================================
--- linux.orig/drivers/block/floppy.c
+++ linux/drivers/block/floppy.c
@@ -4157,6 +4157,28 @@ static void floppy_device_release(struct
complete(&device_release);
}
+static int floppy_suspend(struct platform_device *dev, pm_message_t state)
+{
+ floppy_release_irq_and_dma();
+
+ return 0;
+}
+
+static int floppy_resume(struct platform_device *dev)
+{
+ floppy_grab_irq_and_dma();
+
+ return 0;
+}
+
+static struct platform_driver floppy_driver = {
+ .suspend = floppy_suspend,
+ .resume = floppy_resume,
+ .driver = {
+ .name = "floppy",
+ },
+};
+
static struct platform_device floppy_device[N_DRIVE];
static struct kobject *floppy_find(dev_t dev, int *part, void *data)
@@ -4205,10 +4227,14 @@ static int __init floppy_init(void)
if (err)
goto out_put_disk;
+ err = platform_driver_register(&floppy_driver);
+ if (err)
+ goto out_unreg_blkdev;
+
floppy_queue = blk_init_queue(do_fd_request, &floppy_lock);
if (!floppy_queue) {
err = -ENOMEM;
- goto out_unreg_blkdev;
+ goto out_unreg_driver;
}
blk_queue_max_sectors(floppy_queue, 64);
@@ -4352,6 +4378,8 @@ out_flush_work:
out_unreg_region:
blk_unregister_region(MKDEV(FLOPPY_MAJOR, 0), 256);
blk_cleanup_queue(floppy_queue);
+out_unreg_driver:
+ platform_driver_unregister(&floppy_driver);
out_unreg_blkdev:
unregister_blkdev(FLOPPY_MAJOR, "fd");
out_put_disk:
@@ -4543,6 +4571,7 @@ void cleanup_module(void)
init_completion(&device_release);
blk_unregister_region(MKDEV(FLOPPY_MAJOR, 0), 256);
unregister_blkdev(FLOPPY_MAJOR, "fd");
+ platform_driver_unregister(&floppy_driver);
for (drive = 0; drive < N_DRIVE; drive++) {
del_timer_sync(&motor_off_timer[drive]);
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
@ 2006-11-12 17:53 Mikael Pettersson
2006-11-12 18:09 ` Ingo Molnar
0 siblings, 1 reply; 25+ messages in thread
From: Mikael Pettersson @ 2006-11-12 17:53 UTC (permalink / raw)
To: mingo; +Cc: akpm, linux-kernel
On Sun, 12 Nov 2006 16:47:12 +0100, Ingo Molnar wrote:
>* Mikael Pettersson <mikpe@it.uu.se> wrote:
>
>> On my old Dell Latitude laptop, the first access to the floppy after
>> having resumed from APM suspend fails miserably and generates these
>> kernel messages (from 2.6.19-rc5):
>[...]
>
>> It's only the first post-resume access that triggers this failure,
>> subsequent accesses do work.
>>
>> I've traced the cause to Ingo's lockdep patch in 2.6.18-rc1 (see
>> below): reverting it makes the floppy work after resume again.
>
>could you check the patch below? I had to add a platform driver to
>floppy.c to get suspend/resume callbacks, but otherwise it's relatively
>straightforward.
Sorry, no joy. The first access post-resume still fails and generates:
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 8
Buffer I/O error on device fd0, logical block 1
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 16
Buffer I/O error on device fd0, logical block 2
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 24
Buffer I/O error on device fd0, logical block 3
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 32
Buffer I/O error on device fd0, logical block 4
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 40
Buffer I/O error on device fd0, logical block 5
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 48
Buffer I/O error on device fd0, logical block 6
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 56
Buffer I/O error on device fd0, logical block 7
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
The 2nd etc post-resume accesses work like before.
Your patch did change the failure behaviour slightly.
Previously the first post-resume access would experience
a delay of about a second or so, and then report an error.
With this patch the error is immediate.
/Mikael
> Ingo
>
>----------------------->
>Subject: [patch] floppy: suspend/resume fix
>From: Ingo Molnar <mingo@elte.hu>
>
>introduce a floppy platform-driver and suspend/resume ops to
>stop/start the floppy driver. Bug reported by Mikael Pettersson.
>
>Signed-off-by: Ingo Molnar <mingo@elte.hu>
>---
> drivers/block/floppy.c | 31 ++++++++++++++++++++++++++++++-
> 1 file changed, 30 insertions(+), 1 deletion(-)
>
>Index: linux/drivers/block/floppy.c
>===================================================================
>--- linux.orig/drivers/block/floppy.c
>+++ linux/drivers/block/floppy.c
>@@ -4157,6 +4157,28 @@ static void floppy_device_release(struct
> complete(&device_release);
> }
>
>+static int floppy_suspend(struct platform_device *dev, pm_message_t state)
>+{
>+ floppy_release_irq_and_dma();
>+
>+ return 0;
>+}
>+
>+static int floppy_resume(struct platform_device *dev)
>+{
>+ floppy_grab_irq_and_dma();
>+
>+ return 0;
>+}
>+
>+static struct platform_driver floppy_driver = {
>+ .suspend = floppy_suspend,
>+ .resume = floppy_resume,
>+ .driver = {
>+ .name = "floppy",
>+ },
>+};
>+
> static struct platform_device floppy_device[N_DRIVE];
>
> static struct kobject *floppy_find(dev_t dev, int *part, void *data)
>@@ -4205,10 +4227,14 @@ static int __init floppy_init(void)
> if (err)
> goto out_put_disk;
>
>+ err = platform_driver_register(&floppy_driver);
>+ if (err)
>+ goto out_unreg_blkdev;
>+
> floppy_queue = blk_init_queue(do_fd_request, &floppy_lock);
> if (!floppy_queue) {
> err = -ENOMEM;
>- goto out_unreg_blkdev;
>+ goto out_unreg_driver;
> }
> blk_queue_max_sectors(floppy_queue, 64);
>
>@@ -4352,6 +4378,8 @@ out_flush_work:
> out_unreg_region:
> blk_unregister_region(MKDEV(FLOPPY_MAJOR, 0), 256);
> blk_cleanup_queue(floppy_queue);
>+out_unreg_driver:
>+ platform_driver_unregister(&floppy_driver);
> out_unreg_blkdev:
> unregister_blkdev(FLOPPY_MAJOR, "fd");
> out_put_disk:
>@@ -4543,6 +4571,7 @@ void cleanup_module(void)
> init_completion(&device_release);
> blk_unregister_region(MKDEV(FLOPPY_MAJOR, 0), 256);
> unregister_blkdev(FLOPPY_MAJOR, "fd");
>+ platform_driver_unregister(&floppy_driver);
>
> for (drive = 0; drive < N_DRIVE; drive++) {
> del_timer_sync(&motor_off_timer[drive]);
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-12 17:53 Mikael Pettersson
@ 2006-11-12 18:09 ` Ingo Molnar
2006-11-12 19:30 ` Andrew Morton
2006-11-12 19:40 ` Russell King
0 siblings, 2 replies; 25+ messages in thread
From: Ingo Molnar @ 2006-11-12 18:09 UTC (permalink / raw)
To: Mikael Pettersson; +Cc: akpm, linux-kernel
* Mikael Pettersson <mikpe@it.uu.se> wrote:
> Sorry, no joy. The first access post-resume still fails and generates:
ok, then someone who knows the floppy driver better than me should put
the right stuff into the suspend/resume hooks :-)
Ingo
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-12 18:09 ` Ingo Molnar
@ 2006-11-12 19:30 ` Andrew Morton
2006-11-12 20:35 ` Arjan van de Ven
2006-11-12 19:40 ` Russell King
1 sibling, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2006-11-12 19:30 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Mikael Pettersson, linux-kernel
On Sun, 12 Nov 2006 19:09:53 +0100
Ingo Molnar <mingo@elte.hu> wrote:
>
> * Mikael Pettersson <mikpe@it.uu.se> wrote:
>
> > Sorry, no joy. The first access post-resume still fails and generates:
>
> ok, then someone who knows the floppy driver better than me should put
> the right stuff into the suspend/resume hooks :-)
I don't think anyone understands the floppy driver.
How about we just revert the lockdep change?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-12 18:09 ` Ingo Molnar
2006-11-12 19:30 ` Andrew Morton
@ 2006-11-12 19:40 ` Russell King
1 sibling, 0 replies; 25+ messages in thread
From: Russell King @ 2006-11-12 19:40 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Mikael Pettersson, akpm, linux-kernel
On Sun, Nov 12, 2006 at 07:09:53PM +0100, Ingo Molnar wrote:
>
> * Mikael Pettersson <mikpe@it.uu.se> wrote:
>
> > Sorry, no joy. The first access post-resume still fails and generates:
>
> ok, then someone who knows the floppy driver better than me should put
> the right stuff into the suspend/resume hooks :-)
At a guess, what's probably happening is that the floppy drive, when
powered on after resume, reports "disk changed" because it doesn't
know any better.
We interpret "disk changed" to mean the disk has been removed and
possibly changed (which _is_ correct) and thereby abort any further
IO (irrespective of resume.)
Now, consider the following two scenarios:
1. You suspend and then resume, leaving the disk in the floppy drive.
2. You suspend, remove the floppy disk, insert a totally different disk
in the same drive, and then resume.
What should you do? (Hint: without reading the disk and comparing it
with what you have cached you don't know if the disk has been changed
or not.)
If you argue that in case (1) you should continue to allow IO, then
you potentially end up scribbling over a disk when someone does (2).
So I'd argue that the behaviour being seen by Mikael is the _safest_
behaviour, and the most correct behaviour given the limitations of
the hardware.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-12 19:30 ` Andrew Morton
@ 2006-11-12 20:35 ` Arjan van de Ven
0 siblings, 0 replies; 25+ messages in thread
From: Arjan van de Ven @ 2006-11-12 20:35 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ingo Molnar, Mikael Pettersson, linux-kernel
On Sun, 2006-11-12 at 11:30 -0800, Andrew Morton wrote:
> On Sun, 12 Nov 2006 19:09:53 +0100
> Ingo Molnar <mingo@elte.hu> wrote:
>
> >
> > * Mikael Pettersson <mikpe@it.uu.se> wrote:
> >
> > > Sorry, no joy. The first access post-resume still fails and generates:
> >
> > ok, then someone who knows the floppy driver better than me should put
> > the right stuff into the suspend/resume hooks :-)
>
> I don't think anyone understands the floppy driver.
>
> How about we just revert the lockdep change?
the lockdep change wasn't "just" an annotation, but an actual bugfix
though :(
but yeah arguably for a bug that isn't hit much :(
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
@ 2006-11-12 20:47 Mikael Pettersson
2006-11-12 21:29 ` Russell King
0 siblings, 1 reply; 25+ messages in thread
From: Mikael Pettersson @ 2006-11-12 20:47 UTC (permalink / raw)
To: mingo, rmk+lkml; +Cc: akpm, linux-kernel
On Sun, 12 Nov 2006 19:40:01 +0000, Russell King wrote:
> On Sun, Nov 12, 2006 at 07:09:53PM +0100, Ingo Molnar wrote:
> >
> > * Mikael Pettersson <mikpe@it.uu.se> wrote:
> >
> > > Sorry, no joy. The first access post-resume still fails and generates:
> >
> > ok, then someone who knows the floppy driver better than me should put
> > the right stuff into the suspend/resume hooks :-)
>
> At a guess, what's probably happening is that the floppy drive, when
> powered on after resume, reports "disk changed" because it doesn't
> know any better.
>
> We interpret "disk changed" to mean the disk has been removed and
> possibly changed (which _is_ correct) and thereby abort any further
> IO (irrespective of resume.)
>
> Now, consider the following two scenarios:
>
> 1. You suspend and then resume, leaving the disk in the floppy drive.
>
> 2. You suspend, remove the floppy disk, insert a totally different disk
> in the same drive, and then resume.
>
> What should you do? (Hint: without reading the disk and comparing it
> with what you have cached you don't know if the disk has been changed
> or not.)
>
> If you argue that in case (1) you should continue to allow IO, then
> you potentially end up scribbling over a disk when someone does (2).
>
> So I'd argue that the behaviour being seen by Mikael is the _safest_
> behaviour, and the most correct behaviour given the limitations of
> the hardware.
Note that my usage scenario consists of separate I/O commands:
1. boot
2. tar cf /dev/fd0 somefile
3. tar tvf /dev/fd0
4. suspend (close lid) and later resume (open lid)
5. tar tvf /dev/fd0 (this one fails since 2.6.18-rc1)
Ages and ages ago the floppy driver would cache data so that
after e.g. one "tar tvf /dev/fd0" command, a new "tar tvf /dev/fd0"
wouldn't actually do any I/O. But nowadays, at least in all 2.6
kernels, that's not done and both "tar tvf /dev/fd0" commands
will read the media.
Thus I don't see any excuse for the kernel failing step 5 above.
/Mikael
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-12 20:47 Mikael Pettersson
@ 2006-11-12 21:29 ` Russell King
2006-11-12 22:03 ` Ingo Molnar
0 siblings, 1 reply; 25+ messages in thread
From: Russell King @ 2006-11-12 21:29 UTC (permalink / raw)
To: Mikael Pettersson; +Cc: mingo, akpm, linux-kernel
On Sun, Nov 12, 2006 at 09:47:08PM +0100, Mikael Pettersson wrote:
> On Sun, 12 Nov 2006 19:40:01 +0000, Russell King wrote:
> > On Sun, Nov 12, 2006 at 07:09:53PM +0100, Ingo Molnar wrote:
> > >
> > > * Mikael Pettersson <mikpe@it.uu.se> wrote:
> > >
> > > > Sorry, no joy. The first access post-resume still fails and generates:
> > >
> > > ok, then someone who knows the floppy driver better than me should put
> > > the right stuff into the suspend/resume hooks :-)
> >
> > At a guess, what's probably happening is that the floppy drive, when
> > powered on after resume, reports "disk changed" because it doesn't
> > know any better.
> >
> > We interpret "disk changed" to mean the disk has been removed and
> > possibly changed (which _is_ correct) and thereby abort any further
> > IO (irrespective of resume.)
> >
> > Now, consider the following two scenarios:
> >
> > 1. You suspend and then resume, leaving the disk in the floppy drive.
> >
> > 2. You suspend, remove the floppy disk, insert a totally different disk
> > in the same drive, and then resume.
> >
> > What should you do? (Hint: without reading the disk and comparing it
> > with what you have cached you don't know if the disk has been changed
> > or not.)
> >
> > If you argue that in case (1) you should continue to allow IO, then
> > you potentially end up scribbling over a disk when someone does (2).
> >
> > So I'd argue that the behaviour being seen by Mikael is the _safest_
> > behaviour, and the most correct behaviour given the limitations of
> > the hardware.
>
> Note that my usage scenario consists of separate I/O commands:
>
> 1. boot
> 2. tar cf /dev/fd0 somefile
> 3. tar tvf /dev/fd0
> 4. suspend (close lid) and later resume (open lid)
> 5. tar tvf /dev/fd0 (this one fails since 2.6.18-rc1)
>
> Ages and ages ago the floppy driver would cache data so that
> after e.g. one "tar tvf /dev/fd0" command, a new "tar tvf /dev/fd0"
> wouldn't actually do any I/O. But nowadays, at least in all 2.6
> kernels, that's not done and both "tar tvf /dev/fd0" commands
> will read the media.
>
> Thus I don't see any excuse for the kernel failing step 5 above.
In which case isn't the real regression that it does IO?
Nevertheless, I give you two options:
1. Abort all IO do inserted floppy disk after resume.
2. Corrupt replaced floppy disk after resume.
You have to pick one and exactly one. Which is inherently less risky to
the end user?
(I'm not saying that I can change the behaviour; I'm merely trying to
point out that you're arguing for an inherently dangerous behaviour.)
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
[not found] ` <7gtvd-7xg-23@gated-at.bofh.it>
@ 2006-11-12 21:44 ` Bodo Eggert
2006-11-14 11:05 ` Pavel Machek
0 siblings, 1 reply; 25+ messages in thread
From: Bodo Eggert @ 2006-11-12 21:44 UTC (permalink / raw)
To: Ingo Molnar, Mikael Pettersson, akpm, linux-kernel
Russell King <rmk+lkml@arm.linux.org.uk> wrote:
> At a guess, what's probably happening is that the floppy drive, when
> powered on after resume, reports "disk changed" because it doesn't
> know any better.
>
> We interpret "disk changed" to mean the disk has been removed and
> possibly changed (which _is_ correct) and thereby abort any further
> IO (irrespective of resume.)
>
> Now, consider the following two scenarios:
>
> 1. You suspend and then resume, leaving the disk in the floppy drive.
>
> 2. You suspend, remove the floppy disk, insert a totally different disk
> in the same drive, and then resume.
>
> What should you do? (Hint: without reading the disk and comparing it
> with what you have cached you don't know if the disk has been changed
> or not.)
>
> If you argue that in case (1) you should continue to allow IO, then
> you potentially end up scribbling over a disk when someone does (2).
>
> So I'd argue that the behaviour being seen by Mikael is the _safest_
> behaviour, and the most correct behaviour given the limitations of
> the hardware.
- If a user suspends with a floppy in the drive, it will mostly be an error,
and he'll unsuspend in order to correct it.
- If it is no error, putting a different/modified floppy into the drive
before resume is unlikely
- Even if somebody does this, you can mostly detect the different disk
by comparing the first sector or just the FAT "serial number".
Therefore you can implement a relatively safe resume that will mostly DTRT
but destroy data in some unlikely cases, while doing the "safe thing" would
mostly cause a harmless (unless not noticed) kind of data loss and sometimes
safe you from real data loss.
I think you should let the user choose which foot to shoot.
--
Ich danke GMX dafür, die Verwendung meiner Adressen mittels per SPF
verbreiteten Lügen zu sabotieren.
http://david.woodhou.se/why-not-spf.html
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-12 21:29 ` Russell King
@ 2006-11-12 22:03 ` Ingo Molnar
2006-11-12 23:54 ` Russell King
0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2006-11-12 22:03 UTC (permalink / raw)
To: Mikael Pettersson, akpm, linux-kernel
* Russell King <rmk+lkml@arm.linux.org.uk> wrote:
> In which case isn't the real regression that it does IO?
>
> Nevertheless, I give you two options:
>
> 1. Abort all IO do inserted floppy disk after resume.
> 2. Corrupt replaced floppy disk after resume.
>
> You have to pick one and exactly one. Which is inherently less risky
> to the end user?
this isnt about in-flight IO (suspend doesnt succeed if IO is in flight
anyway). The bug is this:
1) you use the floppy and then stop using it
2) 1 hour passes. Nothing uses the floppy.
3) you suspend and later resume
4) another hour passes. Nothing uses the floppy.
5) you try to use the floppy: you get a bunch of IO errors!
6) you try to use the floppy again: this time it works
that's the regression. For some reason suspend/resume puts the floppy
hardware into a state that confuses the floppy driver.
my patch adds all the right suspend/resume hooks for this, without
reintroducing the bug that was noticed by lockdep and which was fixed by
my patch that introduced this regression - we just have to figure out
what to do upon resume to get the floppy driver and the hardware match
up each other, without passing spurious IO errors to the user.
Ingo
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
@ 2006-11-12 22:40 Mikael Pettersson
2006-11-12 23:58 ` Russell King
0 siblings, 1 reply; 25+ messages in thread
From: Mikael Pettersson @ 2006-11-12 22:40 UTC (permalink / raw)
To: 7eggert, akpm, linux-kernel, mikpe, mingo
On Sun, 12 Nov 2006 22:44:28 +0100, Bodo Eggert wrote:
> > 1. You suspend and then resume, leaving the disk in the floppy drive.
> >
> > 2. You suspend, remove the floppy disk, insert a totally different disk
> > in the same drive, and then resume.
> >
> > What should you do? (Hint: without reading the disk and comparing it
> > with what you have cached you don't know if the disk has been changed
> > or not.)
> >
> > If you argue that in case (1) you should continue to allow IO, then
> > you potentially end up scribbling over a disk when someone does (2).
> >
> > So I'd argue that the behaviour being seen by Mikael is the _safest_
> > behaviour, and the most correct behaviour given the limitations of
> > the hardware.
>
> - If a user suspends with a floppy in the drive, it will mostly be an error,
> and he'll unsuspend in order to correct it.
> - If it is no error, putting a different/modified floppy into the drive
> before resume is unlikely
> - Even if somebody does this, you can mostly detect the different disk
> by comparing the first sector or just the FAT "serial number".
>
> Therefore you can implement a relatively safe resume that will mostly DTRT
> but destroy data in some unlikely cases, while doing the "safe thing" would
> mostly cause a harmless (unless not noticed) kind of data loss and sometimes
> safe you from real data loss.
The bug occurs regardless of whether I leave the floppy disc in the drive
during suspend or not. 2.6.19-rc5 (vanilla and with Ingo's suspend/resume
hooks) fails the following use case as well:
1. boot
2. insert floppy disc
3. tar tvf /dev/fd0 (works)
4. manually eject floppy disc
5. suspend, later resume
6. insert floppy disc
7. tar tvf /dev/fd0 (fails with I/O errors)
8. tar tvf /dev/fd0 (works)
Like Ingo said, something happens to the HW during suspend and we
need to figure out how to reinitialise the HW and the driver so that
things work immediately after resume.
/Mikael
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-12 22:03 ` Ingo Molnar
@ 2006-11-12 23:54 ` Russell King
2006-11-14 11:09 ` Pavel Machek
0 siblings, 1 reply; 25+ messages in thread
From: Russell King @ 2006-11-12 23:54 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Mikael Pettersson, akpm, linux-kernel
On Sun, Nov 12, 2006 at 11:03:18PM +0100, Ingo Molnar wrote:
>
> * Russell King <rmk+lkml@arm.linux.org.uk> wrote:
>
> > In which case isn't the real regression that it does IO?
> >
> > Nevertheless, I give you two options:
> >
> > 1. Abort all IO do inserted floppy disk after resume.
> > 2. Corrupt replaced floppy disk after resume.
> >
> > You have to pick one and exactly one. Which is inherently less risky
> > to the end user?
>
> this isnt about in-flight IO (suspend doesnt succeed if IO is in flight
> anyway).
I wasn't talking about in-flight IO. Take a moment to think about it.
- You have a floppy inserted and mounted.
- You write a file to it, and then suspend the machine.
*** No IO was in progress when the suspend occurred.
- You remove the floppy disk and insert a different disk
- You resume
- The kernel submits the dirty buffers for writing out to the disk.
That would lead to a corrupted floppy disk.
> The bug is this:
>
> 1) you use the floppy and then stop using it
> 2) 1 hour passes. Nothing uses the floppy.
> 3) you suspend and later resume
> 4) another hour passes. Nothing uses the floppy.
> 5) you try to use the floppy: you get a bunch of IO errors!
> 6) you try to use the floppy again: this time it works
>
> that's the regression. For some reason suspend/resume puts the floppy
> hardware into a state that confuses the floppy driver.
>
> my patch adds all the right suspend/resume hooks for this, without
> reintroducing the bug that was noticed by lockdep and which was fixed by
> my patch that introduced this regression - we just have to figure out
> what to do upon resume to get the floppy driver and the hardware match
> up each other, without passing spurious IO errors to the user.
You need to step the head off track 0 and back again. That clears the
disk changed status on the floppy drive hardware.
Now, if you'd actually taken the time to read my messages you'd have
realised that the floppy drive hardware defaults to reporting a
"disk changed" status after power on. Ergo, after resume, it says
"disk changed" as well.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-12 22:40 [patch] floppy: suspend/resume fix Mikael Pettersson
@ 2006-11-12 23:58 ` Russell King
2006-11-15 18:53 ` Ingo Molnar
0 siblings, 1 reply; 25+ messages in thread
From: Russell King @ 2006-11-12 23:58 UTC (permalink / raw)
To: Mikael Pettersson; +Cc: 7eggert, akpm, linux-kernel, mingo
On Sun, Nov 12, 2006 at 11:40:28PM +0100, Mikael Pettersson wrote:
> The bug occurs regardless of whether I leave the floppy disc in the drive
> during suspend or not. 2.6.19-rc5 (vanilla and with Ingo's suspend/resume
> hooks) fails the following use case as well:
>
> 1. boot
> 2. insert floppy disc
> 3. tar tvf /dev/fd0 (works)
> 4. manually eject floppy disc
> 5. suspend, later resume
> 6. insert floppy disc
> 7. tar tvf /dev/fd0 (fails with I/O errors)
> 8. tar tvf /dev/fd0 (works)
>
> Like Ingo said, something happens to the HW during suspend and we
> need to figure out how to reinitialise the HW and the driver so that
> things work immediately after resume.
Now this is interesting - I know there's been a long standing bug with
kernels on my Thinkpad which behave in a similar way to your description
above. Basically whenever I change the disk in the drive I tend to need
_two_ goes to do anything with it - the first mostly always fails with
IO errors.
I've not bothered to report it because it runs a _very_ old 2.6 kernel
and upgrading the machine to something modern would be extremely painful
(read - would need new hard drive, and probably more RAM than the laptop
can take.)
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-12 21:44 ` Bodo Eggert
@ 2006-11-14 11:05 ` Pavel Machek
0 siblings, 0 replies; 25+ messages in thread
From: Pavel Machek @ 2006-11-14 11:05 UTC (permalink / raw)
To: Bodo Eggert; +Cc: Ingo Molnar, Mikael Pettersson, akpm, linux-kernel
Hi!
> - Even if somebody does this, you can mostly detect the different disk
> by comparing the first sector or just the FAT "serial number".
Feel free to send a patch; it would be useful, but it would probably
also mean an ugly layering violation.
> I think you should let the user choose which foot to shoot.
-ENOPATCH.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-12 23:54 ` Russell King
@ 2006-11-14 11:09 ` Pavel Machek
2006-11-14 16:34 ` Lee Revell
2006-11-15 18:46 ` Ingo Molnar
0 siblings, 2 replies; 25+ messages in thread
From: Pavel Machek @ 2006-11-14 11:09 UTC (permalink / raw)
To: Ingo Molnar, Mikael Pettersson, akpm, linux-kernel
Hi!
> > > Nevertheless, I give you two options:
> > >
> > > 1. Abort all IO do inserted floppy disk after resume.
> > > 2. Corrupt replaced floppy disk after resume.
> > >
> > > You have to pick one and exactly one. Which is inherently less risky
> > > to the end user?
> >
> > this isnt about in-flight IO (suspend doesnt succeed if IO is in flight
> > anyway).
>
> I wasn't talking about in-flight IO. Take a moment to think about it.
>
> - You have a floppy inserted and mounted.
Notice that Ingo is not talking about floppy being mounted.
> - You write a file to it, and then suspend the machine.
> *** No IO was in progress when the suspend occurred.
> - You remove the floppy disk and insert a different disk
> - You resume
> - The kernel submits the dirty buffers for writing out to the disk.
>
> That would lead to a corrupted floppy disk.
Suspending with mounted floppy is a user error. Write tarball to a
floppy, suspend, resume, write another tarball to a floppy is not, and
we want to fix that.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-14 11:09 ` Pavel Machek
@ 2006-11-14 16:34 ` Lee Revell
2006-11-15 20:24 ` Pavel Machek
2006-11-15 18:46 ` Ingo Molnar
1 sibling, 1 reply; 25+ messages in thread
From: Lee Revell @ 2006-11-14 16:34 UTC (permalink / raw)
To: Pavel Machek; +Cc: Ingo Molnar, Mikael Pettersson, akpm, linux-kernel
On Tue, 2006-11-14 at 12:09 +0100, Pavel Machek wrote:
> Suspending with mounted floppy is a user error.
Huh? How so?
Lee
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-14 11:09 ` Pavel Machek
2006-11-14 16:34 ` Lee Revell
@ 2006-11-15 18:46 ` Ingo Molnar
1 sibling, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2006-11-15 18:46 UTC (permalink / raw)
To: Pavel Machek; +Cc: Mikael Pettersson, akpm, linux-kernel
* Pavel Machek <pavel@ucw.cz> wrote:
> > I wasn't talking about in-flight IO. Take a moment to think about
> > it.
> >
> > - You have a floppy inserted and mounted.
>
> Notice that Ingo is not talking about floppy being mounted.
yeah. I was thinking in terms of "mdir a:".
if a floppy is mounted then i agree that suspending is probably a bit
too much to expect from the kernel - but this particular bug affects
normal mtool use (which you can think of to be equivalent to mounting,
using and then umounting the floppy).
Ingo
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-12 23:58 ` Russell King
@ 2006-11-15 18:53 ` Ingo Molnar
0 siblings, 0 replies; 25+ messages in thread
From: Ingo Molnar @ 2006-11-15 18:53 UTC (permalink / raw)
To: Mikael Pettersson, 7eggert, akpm, linux-kernel
* Russell King <rmk+lkml@arm.linux.org.uk> wrote:
> On Sun, Nov 12, 2006 at 11:40:28PM +0100, Mikael Pettersson wrote:
> > The bug occurs regardless of whether I leave the floppy disc in the drive
> > during suspend or not. 2.6.19-rc5 (vanilla and with Ingo's suspend/resume
> > hooks) fails the following use case as well:
> >
> > 1. boot
> > 2. insert floppy disc
> > 3. tar tvf /dev/fd0 (works)
> > 4. manually eject floppy disc
> > 5. suspend, later resume
> > 6. insert floppy disc
> > 7. tar tvf /dev/fd0 (fails with I/O errors)
> > 8. tar tvf /dev/fd0 (works)
> >
> > Like Ingo said, something happens to the HW during suspend and we
> > need to figure out how to reinitialise the HW and the driver so that
> > things work immediately after resume.
>
> Now this is interesting - I know there's been a long standing bug with
> kernels on my Thinkpad which behave in a similar way to your
> description above. Basically whenever I change the disk in the drive
> I tend to need _two_ goes to do anything with it - the first mostly
> always fails with IO errors.
yeah. But somehow the pre-lockdep-change driver gets this right - purely
by virtue of unregistering the IRQ line and the DMA channel - neither of
which should have any material effect on behavior ... [and when i
restored this in suspend/resume it didnt fix the bug] Weird.
Ingo
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-14 16:34 ` Lee Revell
@ 2006-11-15 20:24 ` Pavel Machek
2006-11-15 20:34 ` Lee Revell
2006-11-15 20:49 ` Alan
0 siblings, 2 replies; 25+ messages in thread
From: Pavel Machek @ 2006-11-15 20:24 UTC (permalink / raw)
To: Lee Revell; +Cc: Ingo Molnar, Mikael Pettersson, akpm, linux-kernel
On Tue 2006-11-14 11:34:21, Lee Revell wrote:
> On Tue, 2006-11-14 at 12:09 +0100, Pavel Machek wrote:
> > Suspending with mounted floppy is a user error.
>
> Huh? How so?
Floppy is removable, and you are expected to umount removable devices
before suspend.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-15 20:24 ` Pavel Machek
@ 2006-11-15 20:34 ` Lee Revell
2006-11-15 20:49 ` Alan
1 sibling, 0 replies; 25+ messages in thread
From: Lee Revell @ 2006-11-15 20:34 UTC (permalink / raw)
To: Pavel Machek; +Cc: Ingo Molnar, Mikael Pettersson, akpm, linux-kernel
On Wed, 2006-11-15 at 21:24 +0100, Pavel Machek wrote:
> On Tue 2006-11-14 11:34:21, Lee Revell wrote:
> > On Tue, 2006-11-14 at 12:09 +0100, Pavel Machek wrote:
> > > Suspending with mounted floppy is a user error.
> >
> > Huh? How so?
>
> Floppy is removable, and you are expected to umount removable devices
> before suspend.
Ah, OK. I read that as "it's an error for the user to close the lid
with a floppy or CD in the drive". But really HAL or the desktop
environment or whatever is supposed to do it...
Lee
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-15 20:24 ` Pavel Machek
2006-11-15 20:34 ` Lee Revell
@ 2006-11-15 20:49 ` Alan
2006-11-15 20:49 ` Pavel Machek
1 sibling, 1 reply; 25+ messages in thread
From: Alan @ 2006-11-15 20:49 UTC (permalink / raw)
To: Pavel Machek, linux-kernel
On Wed, 15 Nov 2006 21:24:18 +0100
Pavel Machek <pavel@ucw.cz> wrote:
> On Tue 2006-11-14 11:34:21, Lee Revell wrote:
> > On Tue, 2006-11-14 at 12:09 +0100, Pavel Machek wrote:
> > > Suspending with mounted floppy is a user error.
> >
> > Huh? How so?
>
> Floppy is removable, and you are expected to umount removable devices
> before suspend.
That seems pretty crude. There are lots of cases where an apparently
removable device is/should be preserved properly and left mounted (eg
builtin CF).
We really want to be smarter than that - which means the drivers ought to
be doing stuff in their suspend/resume paths to figure out if the media
changed when really possible (eg IDE removable)
Floppy is probably not too fixable, but calling it a "user error" is
insulting - user expectation is reasonable that suspend/resume should
just work. The implementation is just rather trickier/nonsensical in this
case.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-15 20:49 ` Alan
@ 2006-11-15 20:49 ` Pavel Machek
2006-11-15 21:03 ` Rafael J. Wysocki
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: Pavel Machek @ 2006-11-15 20:49 UTC (permalink / raw)
To: Alan; +Cc: linux-kernel
Hi!
> > > > Suspending with mounted floppy is a user error.
> > >
> > > Huh? How so?
> >
> > Floppy is removable, and you are expected to umount removable devices
> > before suspend.
>
> That seems pretty crude. There are lots of cases where an apparently
> removable device is/should be preserved properly and left mounted (eg
> builtin CF).
>
> We really want to be smarter than that - which means the drivers ought to
> be doing stuff in their suspend/resume paths to figure out if the media
> changed when really possible (eg IDE removable)
>
> Floppy is probably not too fixable, but calling it a "user error" is
> insulting - user expectation is reasonable that suspend/resume should
> just work. The implementation is just rather trickier/nonsensical in this
> case.
Yep, it would be nice to do something about that; but I'm not sure how
this "was media changed" should be implemented, and if it should be
done in kernel or in userland.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-15 20:49 ` Pavel Machek
@ 2006-11-15 21:03 ` Rafael J. Wysocki
2006-11-15 21:14 ` Arjan van de Ven
2006-11-16 12:38 ` Russell King
2 siblings, 0 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2006-11-15 21:03 UTC (permalink / raw)
To: Pavel Machek; +Cc: Alan, linux-kernel
On Wednesday, 15 November 2006 21:49, Pavel Machek wrote:
> Hi!
>
> > > > > Suspending with mounted floppy is a user error.
> > > >
> > > > Huh? How so?
> > >
> > > Floppy is removable, and you are expected to umount removable devices
> > > before suspend.
> >
> > That seems pretty crude. There are lots of cases where an apparently
> > removable device is/should be preserved properly and left mounted (eg
> > builtin CF).
> >
> > We really want to be smarter than that - which means the drivers ought to
> > be doing stuff in their suspend/resume paths to figure out if the media
> > changed when really possible (eg IDE removable)
> >
> > Floppy is probably not too fixable, but calling it a "user error" is
> > insulting - user expectation is reasonable that suspend/resume should
> > just work. The implementation is just rather trickier/nonsensical in this
> > case.
>
> Yep, it would be nice to do something about that; but I'm not sure how
> this "was media changed" should be implemented, and if it should be
> done in kernel or in userland.
<heresy>
Use something like subfs?
</heresy>
Greetings,
Rafael
--
You never change things by fighting the existing reality.
R. Buckminster Fuller
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-15 20:49 ` Pavel Machek
2006-11-15 21:03 ` Rafael J. Wysocki
@ 2006-11-15 21:14 ` Arjan van de Ven
2006-11-16 12:38 ` Russell King
2 siblings, 0 replies; 25+ messages in thread
From: Arjan van de Ven @ 2006-11-15 21:14 UTC (permalink / raw)
To: Pavel Machek; +Cc: Alan, linux-kernel
> Yep, it would be nice to do something about that; but I'm not sure how
> this "was media changed" should be implemented, and if it should be
> done in kernel or in userland.
well I guess step 1 is to sync_bdev() or whatever it is called nowadays
before suspend. And maybe force unmount on resume always?
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
2006-11-15 20:49 ` Pavel Machek
2006-11-15 21:03 ` Rafael J. Wysocki
2006-11-15 21:14 ` Arjan van de Ven
@ 2006-11-16 12:38 ` Russell King
2 siblings, 0 replies; 25+ messages in thread
From: Russell King @ 2006-11-16 12:38 UTC (permalink / raw)
To: Pavel Machek; +Cc: Alan, linux-kernel
On Wed, Nov 15, 2006 at 09:49:33PM +0100, Pavel Machek wrote:
> > > > > Suspending with mounted floppy is a user error.
> > > >
> > > > Huh? How so?
> > >
> > > Floppy is removable, and you are expected to umount removable devices
> > > before suspend.
> >
> > That seems pretty crude. There are lots of cases where an apparently
> > removable device is/should be preserved properly and left mounted (eg
> > builtin CF).
> >
> > We really want to be smarter than that - which means the drivers ought to
> > be doing stuff in their suspend/resume paths to figure out if the media
> > changed when really possible (eg IDE removable)
> >
> > Floppy is probably not too fixable, but calling it a "user error" is
> > insulting - user expectation is reasonable that suspend/resume should
> > just work. The implementation is just rather trickier/nonsensical in this
> > case.
>
> Yep, it would be nice to do something about that; but I'm not sure how
> this "was media changed" should be implemented, and if it should be
> done in kernel or in userland.
With a floppy the only way to do that is to read data off the disk and
compare it with what we know was on the disk prior to suspend/resume.
Since we don't have a "standard floppy format" (since we're flexible)
you can't rely on MSDOS boot sectors and the like to uniquely identify
floppy disks.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2006-11-16 12:39 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-12 22:40 [patch] floppy: suspend/resume fix Mikael Pettersson
2006-11-12 23:58 ` Russell King
2006-11-15 18:53 ` Ingo Molnar
[not found] <7grMO-2YO-55@gated-at.bofh.it>
[not found] ` <7gs69-46A-37@gated-at.bofh.it>
[not found] ` <7gtvd-7xg-23@gated-at.bofh.it>
2006-11-12 21:44 ` Bodo Eggert
2006-11-14 11:05 ` Pavel Machek
-- strict thread matches above, loose matches on Subject: below --
2006-11-12 20:47 Mikael Pettersson
2006-11-12 21:29 ` Russell King
2006-11-12 22:03 ` Ingo Molnar
2006-11-12 23:54 ` Russell King
2006-11-14 11:09 ` Pavel Machek
2006-11-14 16:34 ` Lee Revell
2006-11-15 20:24 ` Pavel Machek
2006-11-15 20:34 ` Lee Revell
2006-11-15 20:49 ` Alan
2006-11-15 20:49 ` Pavel Machek
2006-11-15 21:03 ` Rafael J. Wysocki
2006-11-15 21:14 ` Arjan van de Ven
2006-11-16 12:38 ` Russell King
2006-11-15 18:46 ` Ingo Molnar
2006-11-12 17:53 Mikael Pettersson
2006-11-12 18:09 ` Ingo Molnar
2006-11-12 19:30 ` Andrew Morton
2006-11-12 20:35 ` Arjan van de Ven
2006-11-12 19:40 ` Russell King
2006-11-11 20:48 [BUG] floppy: broken after resume due to 2.6.18-rc1 lockdep changes Mikael Pettersson
2006-11-12 15:47 ` [patch] floppy: suspend/resume fix Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox