public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [patch] floppy: suspend/resume fix
@ 2006-11-12 22:40 Mikael Pettersson
  2006-11-12 23:58 ` Russell King
  0 siblings, 1 reply; 25+ messages in thread
From: Mikael Pettersson @ 2006-11-12 22:40 UTC (permalink / raw)
  To: 7eggert, akpm, linux-kernel, mikpe, mingo

On Sun, 12 Nov 2006 22:44:28 +0100, Bodo Eggert wrote:
> > 1. You suspend and then resume, leaving the disk in the floppy drive.
> > 
> > 2. You suspend, remove the floppy disk, insert a totally different disk
> >    in the same drive, and then resume.
> > 
> > What should you do?  (Hint: without reading the disk and comparing it
> > with what you have cached you don't know if the disk has been changed
> > or not.)
> > 
> > If you argue that in case (1) you should continue to allow IO, then
> > you potentially end up scribbling over a disk when someone does (2).
> > 
> > So I'd argue that the behaviour being seen by Mikael is the _safest_
> > behaviour, and the most correct behaviour given the limitations of
> > the hardware.
> 
> - If a user suspends with a floppy in the drive, it will mostly be an error,
>   and he'll unsuspend in order to correct it.
> - If it is no error, putting a different/modified floppy into the drive
>   before resume is unlikely
> - Even if somebody does this, you can mostly detect the different disk
>   by comparing the first sector or just the FAT "serial number".
> 
> Therefore you can implement a relatively safe resume that will mostly DTRT
> but destroy data in some unlikely cases, while doing the "safe thing" would
> mostly cause a harmless (unless not noticed) kind of data loss and sometimes
> safe you from real data loss.

The bug occurs regardless of whether I leave the floppy disc in the drive
during suspend or not. 2.6.19-rc5 (vanilla and with Ingo's suspend/resume
hooks) fails the following use case as well:

1. boot
2. insert floppy disc
3. tar tvf /dev/fd0 (works)
4. manually eject floppy disc
5. suspend, later resume 
6. insert floppy disc
7. tar tvf /dev/fd0 (fails with I/O errors)
8. tar tvf /dev/fd0 (works)

Like Ingo said, something happens to the HW during suspend and we
need to figure out how to reinitialise the HW and the driver so that
things work immediately after resume.

/Mikael

^ permalink raw reply	[flat|nested] 25+ messages in thread
[parent not found: <7grMO-2YO-55@gated-at.bofh.it>]
* Re: [patch] floppy: suspend/resume fix
@ 2006-11-12 20:47 Mikael Pettersson
  2006-11-12 21:29 ` Russell King
  0 siblings, 1 reply; 25+ messages in thread
From: Mikael Pettersson @ 2006-11-12 20:47 UTC (permalink / raw)
  To: mingo, rmk+lkml; +Cc: akpm, linux-kernel

On Sun, 12 Nov 2006 19:40:01 +0000, Russell King wrote:
> On Sun, Nov 12, 2006 at 07:09:53PM +0100, Ingo Molnar wrote:
> > 
> > * Mikael Pettersson <mikpe@it.uu.se> wrote:
> > 
> > > Sorry, no joy. The first access post-resume still fails and generates:
> > 
> > ok, then someone who knows the floppy driver better than me should put 
> > the right stuff into the suspend/resume hooks :-)
> 
> At a guess, what's probably happening is that the floppy drive, when
> powered on after resume, reports "disk changed" because it doesn't
> know any better.
> 
> We interpret "disk changed" to mean the disk has been removed and
> possibly changed (which _is_ correct) and thereby abort any further
> IO (irrespective of resume.)
> 
> Now, consider the following two scenarios:
> 
> 1. You suspend and then resume, leaving the disk in the floppy drive.
> 
> 2. You suspend, remove the floppy disk, insert a totally different disk
>    in the same drive, and then resume.
> 
> What should you do?  (Hint: without reading the disk and comparing it
> with what you have cached you don't know if the disk has been changed
> or not.)
> 
> If you argue that in case (1) you should continue to allow IO, then
> you potentially end up scribbling over a disk when someone does (2).
> 
> So I'd argue that the behaviour being seen by Mikael is the _safest_
> behaviour, and the most correct behaviour given the limitations of
> the hardware.

Note that my usage scenario consists of separate I/O commands:

1. boot
2. tar cf /dev/fd0 somefile
3. tar tvf /dev/fd0
4. suspend (close lid) and later resume (open lid) 
5. tar tvf /dev/fd0 (this one fails since 2.6.18-rc1)

Ages and ages ago the floppy driver would cache data so that
after e.g. one "tar tvf /dev/fd0" command, a new "tar tvf /dev/fd0"
wouldn't actually do any I/O. But nowadays, at least in all 2.6
kernels, that's not done and both "tar tvf /dev/fd0" commands
will read the media.

Thus I don't see any excuse for the kernel failing step 5 above.

/Mikael

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [patch] floppy: suspend/resume fix
@ 2006-11-12 17:53 Mikael Pettersson
  2006-11-12 18:09 ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Mikael Pettersson @ 2006-11-12 17:53 UTC (permalink / raw)
  To: mingo; +Cc: akpm, linux-kernel

On Sun, 12 Nov 2006 16:47:12 +0100, Ingo Molnar wrote:
>* Mikael Pettersson <mikpe@it.uu.se> wrote:
>
>> On my old Dell Latitude laptop, the first access to the floppy after 
>> having resumed from APM suspend fails miserably and generates these 
>> kernel messages (from 2.6.19-rc5):
>[...]
>
>> It's only the first post-resume access that triggers this failure, 
>> subsequent accesses do work.
>> 
>> I've traced the cause to Ingo's lockdep patch in 2.6.18-rc1 (see 
>> below): reverting it makes the floppy work after resume again.
>
>could you check the patch below? I had to add a platform driver to 
>floppy.c to get suspend/resume callbacks, but otherwise it's relatively 
>straightforward.

Sorry, no joy. The first access post-resume still fails and generates:

floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 8
Buffer I/O error on device fd0, logical block 1
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 16
Buffer I/O error on device fd0, logical block 2
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 24
Buffer I/O error on device fd0, logical block 3
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 32
Buffer I/O error on device fd0, logical block 4
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 40
Buffer I/O error on device fd0, logical block 5
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 48
Buffer I/O error on device fd0, logical block 6
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 56
Buffer I/O error on device fd0, logical block 7
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0

The 2nd etc post-resume accesses work like before.

Your patch did change the failure behaviour slightly.
Previously the first post-resume access would experience
a delay of about a second or so, and then report an error.
With this patch the error is immediate.

/Mikael

>	Ingo
>
>----------------------->
>Subject: [patch] floppy: suspend/resume fix
>From: Ingo Molnar <mingo@elte.hu>
>
>introduce a floppy platform-driver and suspend/resume ops to
>stop/start the floppy driver. Bug reported by Mikael Pettersson.
>
>Signed-off-by: Ingo Molnar <mingo@elte.hu>
>---
> drivers/block/floppy.c |   31 ++++++++++++++++++++++++++++++-
> 1 file changed, 30 insertions(+), 1 deletion(-)
>
>Index: linux/drivers/block/floppy.c
>===================================================================
>--- linux.orig/drivers/block/floppy.c
>+++ linux/drivers/block/floppy.c
>@@ -4157,6 +4157,28 @@ static void floppy_device_release(struct
> 	complete(&device_release);
> }
> 
>+static int floppy_suspend(struct platform_device *dev, pm_message_t state)
>+{
>+	floppy_release_irq_and_dma();
>+
>+	return 0;
>+}
>+
>+static int floppy_resume(struct platform_device *dev)
>+{
>+	floppy_grab_irq_and_dma();
>+
>+	return 0;
>+}
>+
>+static struct platform_driver floppy_driver = {
>+	.suspend	= floppy_suspend,
>+	.resume		= floppy_resume,
>+	.driver		= {
>+		.name	= "floppy",
>+	},
>+};
>+
> static struct platform_device floppy_device[N_DRIVE];
> 
> static struct kobject *floppy_find(dev_t dev, int *part, void *data)
>@@ -4205,10 +4227,14 @@ static int __init floppy_init(void)
> 	if (err)
> 		goto out_put_disk;
> 
>+	err = platform_driver_register(&floppy_driver);
>+	if (err)
>+		goto out_unreg_blkdev;
>+
> 	floppy_queue = blk_init_queue(do_fd_request, &floppy_lock);
> 	if (!floppy_queue) {
> 		err = -ENOMEM;
>-		goto out_unreg_blkdev;
>+		goto out_unreg_driver;
> 	}
> 	blk_queue_max_sectors(floppy_queue, 64);
> 
>@@ -4352,6 +4378,8 @@ out_flush_work:
> out_unreg_region:
> 	blk_unregister_region(MKDEV(FLOPPY_MAJOR, 0), 256);
> 	blk_cleanup_queue(floppy_queue);
>+out_unreg_driver:
>+	platform_driver_unregister(&floppy_driver);
> out_unreg_blkdev:
> 	unregister_blkdev(FLOPPY_MAJOR, "fd");
> out_put_disk:
>@@ -4543,6 +4571,7 @@ void cleanup_module(void)
> 	init_completion(&device_release);
> 	blk_unregister_region(MKDEV(FLOPPY_MAJOR, 0), 256);
> 	unregister_blkdev(FLOPPY_MAJOR, "fd");
>+	platform_driver_unregister(&floppy_driver);
> 
> 	for (drive = 0; drive < N_DRIVE; drive++) {
> 		del_timer_sync(&motor_off_timer[drive]);
>

^ permalink raw reply	[flat|nested] 25+ messages in thread
* [BUG] floppy: broken after resume due to 2.6.18-rc1 lockdep changes
@ 2006-11-11 20:48 Mikael Pettersson
  2006-11-12 15:47 ` [patch] floppy: suspend/resume fix Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: Mikael Pettersson @ 2006-11-11 20:48 UTC (permalink / raw)
  To: mingo; +Cc: linux-kernel

On my old Dell Latitude laptop, the first access to the floppy
after having resumed from APM suspend fails miserably and generates
these kernel messages (from 2.6.19-rc5):

floppy driver state
-------------------
now=1023244 last interrupt=256121 diff=767123 last called handler=c01e7580
timeout_message=lock fdc
last output bytes:
 4 90 256004
 0 90 256004
 1 90 256004
 2 90 256004
12 90 256004
1b 90 256004
ff 90 256004
 f 80 256061
 0 90 256061
 5 90 256061
 8 81 256062
e6 80 256064
 0 90 256064
 5 90 256064
 0 90 256064
 1 90 256064
 2 90 256064
12 90 256064
1b 90 256064
ff 90 256064
last result at 256121
last redo_fd_request at 256121
 4  0  0  6  0  1  2 
status=0
fdc_busy=1
do_floppy=c01ed580
cont=c02a5b24
current_req=00000000
command_status=-1

floppy0: floppy timeout called
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 8
Buffer I/O error on device fd0, logical block 1
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 16
Buffer I/O error on device fd0, logical block 2
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 24
Buffer I/O error on device fd0, logical block 3
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 32
Buffer I/O error on device fd0, logical block 4
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 40
Buffer I/O error on device fd0, logical block 5
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 48
Buffer I/O error on device fd0, logical block 6
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 56
Buffer I/O error on device fd0, logical block 7
floppy0: disk absent or changed during operation
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0

It's only the first post-resume access that triggers this failure,
subsequent accesses do work.

I've traced the cause to Ingo's lockdep patch in 2.6.18-rc1
(see below): reverting it makes the floppy work after resume again.

Before these changes the driver reserved and released the HW
dynamically, but after the changes it no longer does that.
What I suspect is happening is that an APM suspend/resume cycle
doesn't fully preserve the HW state. Previously the driver
wouldn't notice since it would re-reserve and reinitialise
the HW anyway when needed.

So putting a floppy_release_irq_and_dma() in a ->suspend()
hook and a floppy_grab_irq_and_dma() in a ->resume() hook
should do the trick. Unfortunately I'm not sure where in
floppy.c to do that since these routines aren't per-device
but work on all devices in one go.

/Mikael

>From: Ingo Molnar <mingo@elte.hu>
Date: Mon, 3 Jul 2006 07:24:23 +0000 (-0700)
Subject: [PATCH] lockdep: floppy.c irq release fix
X-Git-Tag: v2.6.18-rc1
X-Git-Url: http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3e541a4ae534a7e59ad464af9abea382b3035724

[PATCH] lockdep: floppy.c irq release fix

The lock validator triggered a number of bugs in the floppy driver, all
related to the floppy driver allocating and freeing irq and dma resources from
interrupt context.  The initial solution was to use schedule_work() to push
this into process context, but this caused further problems: for example the
current floppy driver in -mm2 is totally broken and all floppy commands time
out with an error.  (as reported by Barry K.  Nathan)

This patch tries another solution: simply get rid of all that dynamic IRQ and
DMA allocation/freeing.  I doubt it made much sense back in the heydays of
floppies (if two devices raced for DMA or IRQ resources then we didnt handle
those cases too gracefully anyway), and today it makes near zero sense.

So the new code does the simplest and most straightforward thing: allocate IRQ
and DMA resources at module init time, and free them at module removal time.
Dont try to release while the driver is operational.  This, besides making the
floppy driver functional again has an added bonus, floppy IRQ stats are
finally persistent and visible in /proc/interrupts:

  6: 63 XT-PIC-level floppy

Besides normal floppy IO i have also tested IO error handling, motor-off
timeouts, etc.  - and everything seems to be working fine.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -249,18 +249,6 @@ static int irqdma_allocated;
 #include <linux/cdrom.h>	/* for the compatibility eject ioctl */
 #include <linux/completion.h>
 
-/*
- * Interrupt freeing also means /proc VFS work - dont do it
- * from interrupt context. We push this work into keventd:
- */
-static void fd_free_irq_fn(void *data)
-{
-	fd_free_irq();
-}
-
-static DECLARE_WORK(fd_free_irq_work, fd_free_irq_fn, NULL);
-
-
 static struct request *current_req;
 static struct request_queue *floppy_queue;
 static void do_fd_request(request_queue_t * q);
@@ -826,15 +814,6 @@ static int set_dor(int fdc, char mask, c
 			UDRS->select_date = jiffies;
 		}
 	}
-	/*
-	 *      We should propagate failures to grab the resources back
-	 *      nicely from here. Actually we ought to rewrite the fd
-	 *      driver some day too.
-	 */
-	if (newdor & FLOPPY_MOTOR_MASK)
-		floppy_grab_irq_and_dma();
-	if (olddor & FLOPPY_MOTOR_MASK)
-		floppy_release_irq_and_dma();
 	return olddor;
 }
 
@@ -892,8 +871,6 @@ static int _lock_fdc(int drive, int inte
 		       line);
 		return -1;
 	}
-	if (floppy_grab_irq_and_dma() == -1)
-		return -EBUSY;
 
 	if (test_and_set_bit(0, &fdc_busy)) {
 		DECLARE_WAITQUEUE(wait, current);
@@ -915,6 +892,8 @@ static int _lock_fdc(int drive, int inte
 
 		set_current_state(TASK_RUNNING);
 		remove_wait_queue(&fdc_wait, &wait);
+
+		flush_scheduled_work();
 	}
 	command_status = FD_COMMAND_NONE;
 
@@ -948,7 +927,6 @@ static inline void unlock_fdc(void)
 	if (elv_next_request(floppy_queue))
 		do_fd_request(floppy_queue);
 	spin_unlock_irqrestore(&floppy_lock, flags);
-	floppy_release_irq_and_dma();
 	wake_up(&fdc_wait);
 }
 
@@ -3694,8 +3672,8 @@ static int floppy_release(struct inode *
 	}
 	if (!UDRS->fd_ref)
 		opened_bdev[drive] = NULL;
-	floppy_release_irq_and_dma();
 	mutex_unlock(&open_lock);
+
 	return 0;
 }
 
@@ -3726,9 +3704,6 @@ static int floppy_open(struct inode *ino
 	if (UDRS->fd_ref == -1 || (UDRS->fd_ref && (filp->f_flags & O_EXCL)))
 		goto out2;
 
-	if (floppy_grab_irq_and_dma())
-		goto out2;
-
 	if (filp->f_flags & O_EXCL)
 		UDRS->fd_ref = -1;
 	else
@@ -3805,7 +3780,6 @@ out:
 		UDRS->fd_ref--;
 	if (!UDRS->fd_ref)
 		opened_bdev[drive] = NULL;
-	floppy_release_irq_and_dma();
 out2:
 	mutex_unlock(&open_lock);
 	return res;
@@ -3822,14 +3796,9 @@ static int check_floppy_change(struct ge
 		return 1;
 
 	if (time_after(jiffies, UDRS->last_checked + UDP->checkfreq)) {
-		if (floppy_grab_irq_and_dma()) {
-			return 1;
-		}
-
 		lock_fdc(drive, 0);
 		poll_drive(0, 0);
 		process_fd_request();
-		floppy_release_irq_and_dma();
 	}
 
 	if (UTESTF(FD_DISK_CHANGED) ||
@@ -4346,7 +4315,6 @@ static int __init floppy_init(void)
 	fdc = 0;
 	del_timer(&fd_timeout);
 	current_drive = 0;
-	floppy_release_irq_and_dma();
 	initialising = 0;
 	if (have_no_fdc) {
 		DPRINT("no floppy controllers found\n");
@@ -4504,7 +4472,7 @@ static void floppy_release_irq_and_dma(v
 	if (irqdma_allocated) {
 		fd_disable_dma();
 		fd_free_dma();
-		schedule_work(&fd_free_irq_work);
+		fd_free_irq();
 		irqdma_allocated = 0;
 	}
 	set_dor(0, ~0, 8);
@@ -4600,8 +4568,6 @@ void cleanup_module(void)
 	/* eject disk, if any */
 	fd_eject(0);
 
-	flush_scheduled_work();		/* fd_free_irq() might be pending */
-
 	wait_for_completion(&device_release);
 }
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2006-11-16 12:39 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-11-12 22:40 [patch] floppy: suspend/resume fix Mikael Pettersson
2006-11-12 23:58 ` Russell King
2006-11-15 18:53   ` Ingo Molnar
     [not found] <7grMO-2YO-55@gated-at.bofh.it>
     [not found] ` <7gs69-46A-37@gated-at.bofh.it>
     [not found]   ` <7gtvd-7xg-23@gated-at.bofh.it>
2006-11-12 21:44     ` Bodo Eggert
2006-11-14 11:05       ` Pavel Machek
  -- strict thread matches above, loose matches on Subject: below --
2006-11-12 20:47 Mikael Pettersson
2006-11-12 21:29 ` Russell King
2006-11-12 22:03   ` Ingo Molnar
2006-11-12 23:54     ` Russell King
2006-11-14 11:09       ` Pavel Machek
2006-11-14 16:34         ` Lee Revell
2006-11-15 20:24           ` Pavel Machek
2006-11-15 20:34             ` Lee Revell
2006-11-15 20:49             ` Alan
2006-11-15 20:49               ` Pavel Machek
2006-11-15 21:03                 ` Rafael J. Wysocki
2006-11-15 21:14                 ` Arjan van de Ven
2006-11-16 12:38                 ` Russell King
2006-11-15 18:46         ` Ingo Molnar
2006-11-12 17:53 Mikael Pettersson
2006-11-12 18:09 ` Ingo Molnar
2006-11-12 19:30   ` Andrew Morton
2006-11-12 20:35     ` Arjan van de Ven
2006-11-12 19:40   ` Russell King
2006-11-11 20:48 [BUG] floppy: broken after resume due to 2.6.18-rc1 lockdep changes Mikael Pettersson
2006-11-12 15:47 ` [patch] floppy: suspend/resume fix Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox