Problem in IDE Disks cache handling in kernel 2.4.XX

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Problem in IDE Disks cache handling in kernel 2.4.XX
@ 2003-01-10  9:54 Francis Verscheure
  2003-01-10 10:19 ` John Bradford
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Francis Verscheure @ 2003-01-10  9:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: marcelo

Hello to everybody  and a Happy New Year.
Thanks a lot for all the great job you are all doing.

Having EXT2 file system corruption after suspend on a notebook I investigate 
kernel code and it seems to me that IDE Disk cache handling is wrong in 
kernel 2.4.XX.

Sources extracts are from kernel 2.4.20.

If you look at  struct hd_driveid  in hdreg.h you will find :

unsigned short  command_set_1;	/* (word 82) supported
	 *  6:	look-ahead
	 *  5:	write cache			==== this means the Disk has a disk cache =====

unsigned short  command_set_2;	/* (word 83)
	 * 13:	FLUSH CACHE EXT		==== Those fields were RESERVED in ATA/ATAPI 5 
	 * 12:	FLUSH CACHE			==== and are used ONLY in ATA/ATAPI 6 ====
	 * 10:	48-bit Address Feature Set

unsigned short  cfs_enable_1;	/* (word 85)
	 *  6:	look-ahead
	 *  5:	write cache			==== this means the Disk cache is enabled or disabled
					 */
unsigned short  cfs_enable_2;	/* (word 86)
	 * 13:	FLUSH CACHE EXT		==== Those fields were RESERVED in ATA/ATAPI 5
	 * 12:	FLUSH CACHE			==== and are used ONLY in ATA/ATAPI 6 ====
	 * 10:	48-bit Address Feature Set

In ide-disk.c you have

static int write_cache (ide_drive_t *drive, int arg)
{
	struct hd_drive_task_hdr taskfile;
	struct hd_drive_hob_hdr hobfile;
	memset(&taskfile, 0, sizeof(struct hd_drive_task_hdr));
	memset(&hobfile, 0, sizeof(struct hd_drive_hob_hdr));
	taskfile.feature	= (arg) ? SETFEATURES_EN_WCACHE : SETFEATURES_DIS_WCACHE;
	taskfile.command	= WIN_SETFEATURES;

	if (!(drive->id->cfs_enable_2 & 0x3000))	<==== WRONG ! ONLY FOR ATA/ATAPI 6
		return 1;

	(void) ide_wait_taskfile(drive, &taskfile, &hobfile, NULL);
	drive->wcache = arg;
	return 0;
}

In fact for ATA/ATAPI 5 cfs_enable_2  has no meaning.  The fields to test are 
write cache bits in command_set_1 and cfs_enable_1.
And in both cases the FLUSH CACHE command ALWAYS EXISTS !
The test of cfs_enable_2 must only be used for ATA/ATAPI 6 drives to use 
FLUSH CACHE or FLUSH CACHE EXT in case of 48 bit addressing mode.

And it seems to me that when an IDE drive has a cache enabled wcache must be 
initialized to say so ? Or you have to do a STANDY or SLEEP before APM 
suspend or power off to be sure that the cache has been written to the disk.

I had a look at patch 2.4.21pre3 and the code looks the same.

And by the way how are powered off the IDE drives ?
Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the 
drive with cache enabled or you will enjoy lost data.

I am not on the list so thank you to CC me.

Best regards to all.

Francis Verscheure

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problem in IDE Disks cache handling in kernel 2.4.XX
  2003-01-10  9:54 Problem in IDE Disks cache handling in kernel 2.4.XX Francis Verscheure
@ 2003-01-10 10:19 ` John Bradford
  2003-01-10 11:37 ` Alan Cox
  2003-01-10 17:23 ` Alan Cox
  2 siblings, 0 replies; 16+ messages in thread
From: John Bradford @ 2003-01-10 10:19 UTC (permalink / raw)
  To: fverscheure; +Cc: marcelo, linux-kernel

> And by the way how are powered off the IDE drives ?
> Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before
> powering off the drive with cache enabled or you will enjoy lost
> data.

This was discussed on the list a few months ago:

http://marc.theaimsgroup.com/?l=linux-kernel&m=103188486216124&w=2

I'm not sure it really got fully resolved, I had disks that would
spin down and then spin up again, because of the order that the
standby and flush cache commands were sent.

John

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problem in IDE Disks cache handling in kernel 2.4.XX
  2003-01-10 11:37 ` Alan Cox
@ 2003-01-10 11:07   ` Benjamin Herrenschmidt
  2003-01-10 11:14   ` Andre Hedrick
  1 sibling, 0 replies; 16+ messages in thread
From: Benjamin Herrenschmidt @ 2003-01-10 11:07 UTC (permalink / raw)
  To: Alan Cox
  Cc: fverscheure, Linux Kernel Mailing List, Marcelo Tosatti,
	Andre Hedrick

On Fri, 2003-01-10 at 12:37, Alan Cox wrote:

> > And by the way how are powered off the IDE drives ?
> > Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the 
> > drive with cache enabled or you will enjoy lost data
> 
> IDE disagrees with itself over this but when we get a controlled power
> off we do this. The same ATA5/ATA6 problem may well be present there
> too. I will review both

I did fix a data loss problem with some PPC laptops that way too, that
is just before shutdown and just before machine sleep, sending a STANDBYNOW
command. In the case of machine sleep, I make sure not to accept any more
request (mark the hwif busy) after that and until machine wake up (at which
point I do a full hard reset or poweron reset of the drive).

Ben.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problem in IDE Disks cache handling in kernel 2.4.XX
  2003-01-10 11:37 ` Alan Cox
  2003-01-10 11:07   ` Benjamin Herrenschmidt
@ 2003-01-10 11:14   ` Andre Hedrick
  2003-01-10 13:35     ` Alan Cox
  1 sibling, 1 reply; 16+ messages in thread
From: Andre Hedrick @ 2003-01-10 11:14 UTC (permalink / raw)
  To: Alan Cox; +Cc: fverscheure, Linux Kernel Mailing List, Marcelo Tosatti

On 10 Jan 2003, Alan Cox wrote:

> On Fri, 2003-01-10 at 09:54, Francis Verscheure wrote:
> > In fact for ATA/ATAPI 5 cfs_enable_2  has no meaning.  The fields to test are 
> > write cache bits in command_set_1 and cfs_enable_1.
> > And in both cases the FLUSH CACHE command ALWAYS EXISTS !
> > The test of cfs_enable_2 must only be used for ATA/ATAPI 6 drives to use 
> > FLUSH CACHE or FLUSH CACHE EXT in case of 48 bit addressing mode.
> 
> Thanks for the report. I need to go reread the spec before I can
> definitively comemnt on this.

I had started to work out a ruleset from ATA-3 forward based on the
capablities supported/enabled/supported_enabled ...

But it it is not clear how to even hand 2 or 3 bits in word 93, I have
little hope with out huge amounts of help to classify a rule set for at
least 48-bits of supported against another 48-bits of enabled, traversing
ATA-3,4,5,6,7 major releases, and about 15 minor revisions.

> > And it seems to me that when an IDE drive has a cache enabled wcache must be 
> > initialized to say so ? Or you have to do a STANDY or SLEEP before APM 
> > suspend or power off to be sure that the cache has been written to the disk.
> 
> Technically - no. In the real world its a very very good idea

They default wcache enabled.

bdflush,sync,spin,flushcache,check_error,(OMG's),STANDY/SLEEP

OMG:

The drive does random and automatic flush caches, if an error happens it
does not report. *sigh*  When APM hits it with a flush and pray the error
is from this flush, but it does not matter ... the kernel does not have
the paths to deal this issue ... so bye bye data!  Now it if the current
flush is not the owner of the error OMFG is suggested.

OMFG:

Since there is not a mechanism to assist the drive, and the drive can not
help it self ... you can expect a device lockup and fatal operations.

Stuff people really do not want to know.

> > I had a look at patch 2.4.21pre3 and the code looks the same.
> > 
> > And by the way how are powered off the IDE drives ?
> > Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the 
> > drive with cache enabled or you will enjoy lost data
> 
> IDE disagrees with itself over this but when we get a controlled power
> off we do this. The same ATA5/ATA6 problem may well be present there
> too. I will review both

Not true, the firmware knows to commit the data to platter.
If it was true you would be screaming long ago.

> Any specific opinion Andre ?

A dirty trick used to date is to pop the STANDY or SLEEP, and depend on
the drive to deal with the double dirty flush error.  If the FLUSH CACHE
was not valid, the drive would spin back up from STANDY, but not from
SLEEP, this could be a problem.  However SLEEP issued by the driver only
happens at shutdown unless it has been changed.  In the shutdown process,
each partition unmount was flushed and also once extra when the usage
count was set to zero.  Worst case was 2 flush min.

So it is a pig in a poke ...

Cheers,

Andre Hedrick
LAD Storage Consulting Group

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problem in IDE Disks cache handling in kernel 2.4.XX
  2003-01-10  9:54 Problem in IDE Disks cache handling in kernel 2.4.XX Francis Verscheure
  2003-01-10 10:19 ` John Bradford
@ 2003-01-10 11:37 ` Alan Cox
  2003-01-10 11:07   ` Benjamin Herrenschmidt
  2003-01-10 11:14   ` Andre Hedrick
  2003-01-10 17:23 ` Alan Cox
  2 siblings, 2 replies; 16+ messages in thread
From: Alan Cox @ 2003-01-10 11:37 UTC (permalink / raw)
  To: fverscheure; +Cc: Linux Kernel Mailing List, Marcelo Tosatti, Andre Hedrick

On Fri, 2003-01-10 at 09:54, Francis Verscheure wrote:
> In fact for ATA/ATAPI 5 cfs_enable_2  has no meaning.  The fields to test are 
> write cache bits in command_set_1 and cfs_enable_1.
> And in both cases the FLUSH CACHE command ALWAYS EXISTS !
> The test of cfs_enable_2 must only be used for ATA/ATAPI 6 drives to use 
> FLUSH CACHE or FLUSH CACHE EXT in case of 48 bit addressing mode.

Thanks for the report. I need to go reread the spec before I can
definitively comemnt on this.

> And it seems to me that when an IDE drive has a cache enabled wcache must be 
> initialized to say so ? Or you have to do a STANDY or SLEEP before APM 
> suspend or power off to be sure that the cache has been written to the disk.

Technically - no. In the real world its a very very good idea

> I had a look at patch 2.4.21pre3 and the code looks the same.
> 
> And by the way how are powered off the IDE drives ?
> Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the 
> drive with cache enabled or you will enjoy lost data

IDE disagrees with itself over this but when we get a controlled power
off we do this. The same ATA5/ATA6 problem may well be present there
too. I will review both


Any specific opinion Andre ?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problem in IDE Disks cache handling in kernel 2.4.XX
  2003-01-10 13:35     ` Alan Cox
@ 2003-01-10 13:03       ` Andre Hedrick
  2003-01-10 14:13         ` Alan Cox
  0 siblings, 1 reply; 16+ messages in thread
From: Andre Hedrick @ 2003-01-10 13:03 UTC (permalink / raw)
  To: Alan Cox; +Cc: fverscheure, Linux Kernel Mailing List, Marcelo Tosatti


Oh, just let the darn thing barf a 0x51/0x04 is fine with me!
Just an abort/unsupported command.

Cheers,


On 10 Jan 2003, Alan Cox wrote:

> On Fri, 2003-01-10 at 11:14, Andre Hedrick wrote:
> > The drive does random and automatic flush caches, if an error happens it
> > does not report. *sigh*  When APM hits it with a flush and pray the error
> > is from this flush, but it does not matter ... the kernel does not have
> > the paths to deal this issue ... so bye bye data!  Now it if the current
> > flush is not the owner of the error OMFG is suggested.
> 
> For that matter the BIOS tends to issue the flush, in fact APM is
> supposed to be transparent so the BIOS is required to handle it and
> since a critical shutdown from the APM PM might not even hit the OS
> it has to. Of course pigs also fly 8)
> 
> > > > I had a look at patch 2.4.21pre3 and the code looks the same.
> > > > 
> > > > And by the way how are powered off the IDE drives ?
> > > > Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the 
> > > > drive with cache enabled or you will enjoy lost data
> > > 
> > > IDE disagrees with itself over this but when we get a controlled power
> > > off we do this. The same ATA5/ATA6 problem may well be present there
> > > too. I will review both
> > 
> > Not true, the firmware knows to commit the data to platter.
> > If it was true you would be screaming long ago.
> 
> IDE disagrees with itself because it is meant to work compatibly but if
> you run it compatibly you lose data on poweroff.
> 
> > 
> > > Any specific opinion Andre ?
> > 
> > A dirty trick used to date is to pop the STANDY or SLEEP, and depend on
> > the drive to deal with the double dirty flush error.  If the FLUSH CACHE
> > was not valid, the drive would spin back up from STANDY, but not from
> > SLEEP, this could be a problem.  However SLEEP issued by the driver only
> > happens at shutdown unless it has been changed.  In the shutdown process,
> > each partition unmount was flushed and also once extra when the usage
> > count was set to zero.  Worst case was 2 flush min.
> > 
> 
> The original question however is whether we are skipping issuing the flush
> and sleep on ATA3-5 devices when we should not, because the test is over
> strong.
> 
> It seems weakening the test is the best option, it fixes ATA-5 and any device
> told to sleep, standby or flush that doesn't know the command is just going
> to go "Huh ?" and we'll get a nice easy to handle error.
> 
> Alan
> 
> 

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problem in IDE Disks cache handling in kernel 2.4.XX
  2003-01-10 11:14   ` Andre Hedrick
@ 2003-01-10 13:35     ` Alan Cox
  2003-01-10 13:03       ` Andre Hedrick
  0 siblings, 1 reply; 16+ messages in thread
From: Alan Cox @ 2003-01-10 13:35 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: fverscheure, Linux Kernel Mailing List, Marcelo Tosatti

On Fri, 2003-01-10 at 11:14, Andre Hedrick wrote:
> The drive does random and automatic flush caches, if an error happens it
> does not report. *sigh*  When APM hits it with a flush and pray the error
> is from this flush, but it does not matter ... the kernel does not have
> the paths to deal this issue ... so bye bye data!  Now it if the current
> flush is not the owner of the error OMFG is suggested.

For that matter the BIOS tends to issue the flush, in fact APM is
supposed to be transparent so the BIOS is required to handle it and
since a critical shutdown from the APM PM might not even hit the OS
it has to. Of course pigs also fly 8)

> > > I had a look at patch 2.4.21pre3 and the code looks the same.
> > > 
> > > And by the way how are powered off the IDE drives ?
> > > Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the 
> > > drive with cache enabled or you will enjoy lost data
> > 
> > IDE disagrees with itself over this but when we get a controlled power
> > off we do this. The same ATA5/ATA6 problem may well be present there
> > too. I will review both
> 
> Not true, the firmware knows to commit the data to platter.
> If it was true you would be screaming long ago.

IDE disagrees with itself because it is meant to work compatibly but if
you run it compatibly you lose data on poweroff.

> 
> > Any specific opinion Andre ?
> 
> A dirty trick used to date is to pop the STANDY or SLEEP, and depend on
> the drive to deal with the double dirty flush error.  If the FLUSH CACHE
> was not valid, the drive would spin back up from STANDY, but not from
> SLEEP, this could be a problem.  However SLEEP issued by the driver only
> happens at shutdown unless it has been changed.  In the shutdown process,
> each partition unmount was flushed and also once extra when the usage
> count was set to zero.  Worst case was 2 flush min.
> 

The original question however is whether we are skipping issuing the flush
and sleep on ATA3-5 devices when we should not, because the test is over
strong.

It seems weakening the test is the best option, it fixes ATA-5 and any device
told to sleep, standby or flush that doesn't know the command is just going
to go "Huh ?" and we'll get a nice easy to handle error.

Alan



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problem in IDE Disks cache handling in kernel 2.4.XX
  2003-01-10 13:03       ` Andre Hedrick
@ 2003-01-10 14:13         ` Alan Cox
  2003-01-10 16:48           ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Alan Cox @ 2003-01-10 14:13 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: fverscheure, Linux Kernel Mailing List, Marcelo Tosatti

On Fri, 2003-01-10 at 13:03, Andre Hedrick wrote:
> Oh, just let the darn thing barf a 0x51/0x04 is fine with me!
> Just an abort/unsupported command.

Sounds ok to me. We do need a barfsupressor option so we can issue
commands that may fail without confusing the user (eg multiwrite setup)

ie
	ide_hwif_barfsupress(hwif);
	ide_command....

That's very much true irrespective of the flush thing


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problem in IDE Disks cache handling in kernel 2.4.XX
  2003-01-10 14:13         ` Alan Cox
@ 2003-01-10 16:48           ` Jens Axboe
  2003-01-10 18:12             ` Alan Cox
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2003-01-10 16:48 UTC (permalink / raw)
  To: Alan Cox, Andre Hedrick
  Cc: fverscheure, Linux Kernel Mailing List, Marcelo Tosatti

On Fri, Jan 10 2003, Alan Cox wrote:
> On Fri, 2003-01-10 at 13:03, Andre Hedrick wrote:
> > Oh, just let the darn thing barf a 0x51/0x04 is fine with me!
> > Just an abort/unsupported command.
> 
> Sounds ok to me. We do need a barfsupressor option so we can issue
> commands that may fail without confusing the user (eg multiwrite setup)
> 
> ie
> 	ide_hwif_barfsupress(hwif);
> 	ide_command....
> 
> That's very much true irrespective of the flush thing

In the barrier patches, I just used drive->quiet to supress ide_error()
complaining too much (on cache flushes, too). Whether that's per-drive
of per-hwif entity, dunno...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problem in IDE Disks cache handling in kernel 2.4.XX
  2003-01-10 17:23 ` Alan Cox
@ 2003-01-10 17:02   ` John Bradford
  0 siblings, 0 replies; 16+ messages in thread
From: John Bradford @ 2003-01-10 17:02 UTC (permalink / raw)
  To: Alan Cox; +Cc: fverscheure, linux-kernel, marcelo, andre

> > And by the way how are powered off the IDE drives ?
> > Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before
> > powering off the drive with cache enabled or you will enjoy lost
> > data.
> 
> We always issue standby or sleep commands to a drive before powering
> off which means the cache flush thing should never have been an
> issue.

I experienced drives spinning back up after they had been flushed on
powerdown, which is not necessarily wrong, (I.E. I never noticed any
data loss), but it's not ideal.  Can't we do:

* Standby
* Flush
* Standby

or is there a reason not to?  I know there were discussions about the
right order to do the standyby and flush, and as far as I remember, we
never reached a conclusion :-).

John.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problem in IDE Disks cache handling in kernel 2.4.XX
  2003-01-10  9:54 Problem in IDE Disks cache handling in kernel 2.4.XX Francis Verscheure
  2003-01-10 10:19 ` John Bradford
  2003-01-10 11:37 ` Alan Cox
@ 2003-01-10 17:23 ` Alan Cox
  2003-01-10 17:02   ` John Bradford
  2 siblings, 1 reply; 16+ messages in thread
From: Alan Cox @ 2003-01-10 17:23 UTC (permalink / raw)
  To: fverscheure; +Cc: Linux Kernel Mailing List, Marcelo Tosatti, Andre Hedrick

[-- Attachment #1: Type: text/plain, Size: 553 bytes --]

On Fri, 2003-01-10 at 09:54, Francis Verscheure wrote:
> And by the way how are powered off the IDE drives ?
> Because a FLUSH CACHE or STANDY or SLEEP is MANDATORY before powering off the 
> drive with cache enabled or you will enjoy lost data.

We always issue standby or sleep commands to a drive before powering off which means
the cache flush thing should never have been an issue.

Having been over the other stuff here is a fairly paranoid first cut. Marcelo
please *don't* apply this yet, I'd like an Andre review of it and some testing
first.


[-- Attachment #2: a1 --]
[-- Type: text/plain, Size: 2990 bytes --]

--- ../linux.21pre3/drivers/ide/ide-disk.c	2003-01-07 14:03:08.000000000 +0000
+++ drivers/ide/ide-disk.c	2003-01-10 17:15:02.000000000 +0000
@@ -38,9 +38,11 @@
  * Version 1.15		convert all calls to ide_raw_taskfile
  *				since args will return register content.
  * Version 1.16		added suspend-resume-checkpower
+ * Version 1.17		do flush on standy, do flush on ATA < ATA6
+ *			fix wcache setup.
  */
 
-#define IDEDISK_VERSION	"1.16"
+#define IDEDISK_VERSION	"1.17"
 
 #undef REALLY_SLOW_IO		/* most systems can safely undef this */
 
@@ -67,7 +69,7 @@
 #include <asm/uaccess.h>
 #include <asm/io.h>
 
-/* FIXME: soem day we shouldnt need to look in here! */
+/* FIXME: some day we shouldnt need to look in here! */
 
 #include "legacy/pdc4030.h"
 
@@ -716,6 +718,7 @@
 	MOD_INC_USE_COUNT;
 	if (drive->removable && drive->usage == 1) {
 		ide_task_t args;
+		int cf;
 		memset(&args, 0, sizeof(ide_task_t));
 		args.tfRegister[IDE_COMMAND_OFFSET] = WIN_DOORLOCK;
 		args.command_type = ide_cmd_type_parser(&args);
@@ -727,12 +730,40 @@
 		 */
 		if (drive->doorlocking && ide_raw_taskfile(drive, &args, NULL))
 			drive->doorlocking = 0;
+		drive->wcache = 0;
+		/* Cache enabled ? */
+		if (drive->id->csfo & 1)
+			drive->wcache = 1;
+		/* Cache command set available ? */
+		if (drive->id->cfs_enable_1 & (1<<5))
+			drive->wcache = 1;
+		/* ATA6 cache extended commands */
+		cf = drive->id->command_set_2 >> 24;
+		if((cf & 0xC0) == 0x40 && (cf & 0x30) != 0)
+			drive->wcache = 1;
 	}
 	return 0;
 }
 
 static int do_idedisk_flushcache(ide_drive_t *drive);
 
+static int ide_cacheflush_p(ide_drive_t *drive)
+{
+	if(drive->wcache)
+	{
+		printk(KERN_INFO "ide: performing cache flush.\n");
+		if (do_idedisk_flushcache(drive))
+		{
+			printk (KERN_INFO "%s: Write Cache FAILED Flushing!\n",
+				drive->name);
+			return -EIO;
+		}
+		return 1;
+	}
+	printk(KERN_INFO "ide: no cache flush required.\n");
+	return 0;
+}
+
 static void idedisk_release (struct inode *inode, struct file *filp, ide_drive_t *drive)
 {
 	if (drive->removable && !drive->usage) {
@@ -744,10 +775,7 @@
 		if (drive->doorlocking && ide_raw_taskfile(drive, &args, NULL))
 			drive->doorlocking = 0;
 	}
-	if ((drive->id->cfs_enable_2 & 0x3000) && drive->wcache)
-		if (do_idedisk_flushcache(drive))
-			printk (KERN_INFO "%s: Write Cache FAILED Flushing!\n",
-				drive->name);
+	ide_cacheflush_p(drive);
 	MOD_DEC_USE_COUNT;
 }
 
@@ -1435,7 +1463,7 @@
 {
 	if (drive->suspend_reset)
 		return 1;
-
+	ide_cacheflush_p(drive);
 	return call_idedisk_suspend(drive, 0);
 }
 
@@ -1679,10 +1707,7 @@
 
 static int idedisk_cleanup(ide_drive_t *drive)
 {
-	if ((drive->id->cfs_enable_2 & 0x3000) && drive->wcache)
-		if (do_idedisk_flushcache(drive))
-			printk (KERN_INFO "%s: Write Cache FAILED Flushing!\n",
-				drive->name);
+	ide_cacheflush_p(drive);
 	return ide_unregister_subdriver(drive);
 }
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* PATCH: [2.4.21-pre3] Fix for SMP race condition in IDE code
  2003-01-10 18:12             ` Alan Cox
@ 2003-01-10 18:12               ` Ross Biro
  2003-01-10 19:29                 ` Alan Cox
  2003-01-10 20:22                 ` Andre Hedrick
  2003-01-10 18:25               ` Problem in IDE Disks cache handling in kernel 2.4.XX Jens Axboe
  1 sibling, 2 replies; 16+ messages in thread
From: Ross Biro @ 2003-01-10 18:12 UTC (permalink / raw)
  To: Alan Cox, Andre Hedrick, Marcelo Tosatti; +Cc: Linux Kernel Mailing List


There is a race condition in all versions of the IDE code that I've 
looked at including 2.4.18 and 2.4.21-pre3. Basically on an SMP system 
if mutiple IDE channels are on the same interrupt and 1 channel sends 
has an interrupt pending on 1 processor while the other processor is 
calling ide_set_handler, then the interrupt can be mistaken for command 
completion on both channels, and a command can be completed before it is 
even issued.

This problem can be triggered with the following code

cd /proc/ide
(while true; do for i in hd[a-z]; do for j in 1 2 3 ; do cat 
$i/smart_values >/dev/null; done; done; done) &
(while true; do for i in hd[a-z]; do dd if=/dev/$i of=/dev/null bs=4096k 
skip=0 & done; wait; done) &

And can be seen by properly instrumenting drive_cmd_intr to check for 
errors.

On a dual proc machine with 4 channels on a single interrupt and 2.4.18 
I expec to see an error about once every twenty minutes with the above 
code.  I believe it should occur much less often on 2.4.20 and above.  

Drives react to this problem in different ways.  Often they simply lock 
up and refuse to talk to the host until they have been properly reset. 
 Some drives require a power cycle before they will work properly again.

This problem required the use of over 200 machines, approximately 2000 
drives, a bus analyzer, and a lot of cooperation from a couple of drive 
manufacturers to go from "something goes wrong once in a while" to 
something we could easily reproduce.

This patch has only been minimally tested and then only with 1 brand of 
ide hard drive.

----- snip ------

diff -durbB linux-2.4.20/drivers/ide/ide-cd.c 
linux-2.4.20-p1/drivers/ide/ide-cd.c
--- linux-2.4.20/drivers/ide/ide-cd.c    Thu Jan  9 11:14:01 2003
+++ linux-2.4.20-p1/drivers/ide/ide-cd.c    Wed Jan  8 16:25:04 2003
@@ -863,11 +864,15 @@
         HWIF(drive)->OUTB(drive->ctl, IDE_CONTROL_REG);
 
     if (CDROM_CONFIG_FLAGS (drive)->drq_interrupt) {
+                unsigned long flags;
         if (HWGROUP(drive)->handler != NULL)
             BUG();
-        ide_set_handler (drive, handler, WAIT_CMD, cdrom_timer_expiry);
+                spin_lock_irqsave(&io_request_lock, flags);
         /* packet command */
         HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
+        ide_set_handler_nolock (drive, handler, WAIT_CMD, 
cdrom_timer_expiry);
+                ide_delay_400ns();
+                spin_unlock_irqrestore(&io_request_lock, flags);
         return ide_started;
     } else {
         /* packet command */
diff -durbB linux-2.4.20/drivers/ide/ide-disk.c 
linux-2.4.20-p1/drivers/ide/ide-disk.c
--- linux-2.4.20/drivers/ide/ide-disk.c    Thu Jan  9 11:14:01 2003
+++ linux-2.4.20-p1/drivers/ide/ide-disk.c    Wed Jan  8 16:25:17 2003
@@ -467,12 +467,15 @@
 #endif /* CONFIG_BLK_DEV_IDEDMA */
         if (HWGROUP(drive)->handler != NULL)
             BUG();
-        ide_set_handler(drive, &read_intr, WAIT_CMD, NULL);
 
         command = ((drive->mult_count) ?
                ((lba48) ? WIN_MULTREAD_EXT : WIN_MULTREAD) :
                ((lba48) ? WIN_READ_EXT : WIN_READ));
+                spin_lock_irqsave(&io_request_lock, flags);
         hwif->OUTB(command, IDE_COMMAND_REG);
+        ide_set_handler_nolock(drive, &read_intr, WAIT_CMD, NULL);
+                ide_delay_400ns();
+                spin_unlock_irqrestore(&io_request_lock, flags);
         return ide_started;
     } else if (rq_data_dir(rq) == WRITE) {
         ide_startstop_t startstop;
diff -durbB linux-2.4.20/drivers/ide/ide-dma.c 
linux-2.4.20-p1/drivers/ide/ide-dma.c
--- linux-2.4.20/drivers/ide/ide-dma.c    Thu Jan  9 11:14:01 2003
+++ linux-2.4.20-p1/drivers/ide/ide-dma.c    Thu Jan  9 15:32:09 2003
@@ -655,6 +655,7 @@
     unsigned int count    = 0;
     u8 dma_stat = 0, lba48    = (drive->addressing == 1) ? 1 : 0;
     task_ioreg_t command    = WIN_NOP;
+        unsigned long flags;
 
     if (!(count = ide_build_dmatable(drive, rq, PCI_DMA_FROMDEVICE)))
         /* try PIO instead of DMA */
@@ -673,7 +674,6 @@
     /* paranoia check */
     if (HWGROUP(drive)->handler != NULL)
         BUG();
-    ide_set_handler(drive, &ide_dma_intr, 2*WAIT_CMD, dma_timer_expiry);
 
     /*
      * FIX ME to use only ACB ide_task_t args Struct
@@ -691,7 +691,11 @@
     }
 #endif
     /* issue cmd to drive */
+        spin_lock_irqsave(&io_request_lock, flags);
     hwif->OUTB(command, IDE_COMMAND_REG);
+    ide_set_handler_nolock(drive, &ide_dma_intr, 2*WAIT_CMD, 
dma_timer_expiry);
+        ide_delay_400ns();
+        spin_unlock_irqrestore(&io_request_lock, flags);
 
     return HWIF(drive)->ide_dma_count(drive);
 }
@@ -707,6 +711,7 @@
     unsigned int count    = 0;
     u8 dma_stat = 0, lba48    = (drive->addressing == 1) ? 1 : 0;
     task_ioreg_t command    = WIN_NOP;
+        unsigned long flags;
 
     if (!(count = ide_build_dmatable(drive, rq, PCI_DMA_TODEVICE)))
         /* try PIO instead of DMA */
@@ -725,7 +730,6 @@
     /* paranoia check */
     if (HWGROUP(drive)->handler != NULL)
         BUG();
-    ide_set_handler(drive, &ide_dma_intr, 2*WAIT_CMD, dma_timer_expiry);
     /*
      * FIX ME to use only ACB ide_task_t args Struct
      */
@@ -742,7 +746,13 @@
     }
 #endif
     /* issue cmd to drive */
+        spin_lock_irqsave(&io_request_lock, flags);
     hwif->OUTB(command, IDE_COMMAND_REG);
+    ide_set_handler_nolock(drive, &ide_dma_intr,
+                               2*WAIT_CMD, dma_timer_expiry);
+        ide_delay_400ns();
+        spin_unlock_irqrestore(&io_request_lock, flags);
+       
     return HWIF(drive)->ide_dma_count(drive);
 }
 
diff -durbB linux-2.4.20/drivers/ide/ide-floppy.c 
linux-2.4.20-p1/drivers/ide/ide-floppy.c
--- linux-2.4.20/drivers/ide/ide-floppy.c    Thu Jan  9 11:14:01 2003
+++ linux-2.4.20-p1/drivers/ide/ide-floppy.c    Wed Jan  8 16:15:17 2003
@@ -1123,14 +1123,17 @@
     }
    
     if (test_bit(IDEFLOPPY_DRQ_INTERRUPT, &floppy->flags)) {
+                unsigned long flags;
         if (HWGROUP(drive)->handler != NULL)
             BUG();
+        /* Issue the packet command */
+                spin_lock_irqsave(&io_request_lock, flags);
+        HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
         ide_set_handler(drive,
                 pkt_xfer_routine,
                 IDEFLOPPY_WAIT_CMD,
                 NULL);
-        /* Issue the packet command */
-        HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
+                spin_unlock_irqrestore(&io_request_lock, flags);
         return ide_started;
     } else {
         /* Issue the packet command */
diff -durbB linux-2.4.20/drivers/ide/ide-io.c 
linux-2.4.20-p1/drivers/ide/ide-io.c
--- linux-2.4.20/drivers/ide/ide-io.c    Thu Jan  9 11:14:01 2003
+++ linux-2.4.20-p1/drivers/ide/ide-io.c    Wed Jan  8 16:25:37 2003
@@ -363,14 +363,21 @@
 void ide_cmd (ide_drive_t *drive, u8 cmd, u8 nsect, ide_handler_t *handler)
 {
     ide_hwif_t *hwif = HWIF(drive);
+        unsigned long flags;
+
     if (HWGROUP(drive)->handler != NULL)
         BUG();
-    ide_set_handler(drive, handler, WAIT_CMD, NULL);
     if (IDE_CONTROL_REG)
         hwif->OUTB(drive->ctl,IDE_CONTROL_REG);    /* clear nIEN */
     SELECT_MASK(drive,0);
     hwif->OUTB(nsect,IDE_NSECTOR_REG);
+
+        spin_lock_irqsave(&io_request_lock, flags);
     hwif->OUTB(cmd,IDE_COMMAND_REG);
+    ide_set_handler_nolock(drive, handler, WAIT_CMD, NULL);
+        ide_delay_400ns();
+        spin_unlock_irqrestore(&io_request_lock, flags);
+
 }
 
 EXPORT_SYMBOL(ide_cmd);
diff -durbB linux-2.4.20/drivers/ide/ide-iops.c 
linux-2.4.20-p1/drivers/ide/ide-iops.c
--- linux-2.4.20/drivers/ide/ide-iops.c    Thu Jan  9 11:14:01 2003
+++ linux-2.4.20-p1/drivers/ide/ide-iops.c    Wed Jan  8 15:54:18 2003
@@ -908,13 +908,14 @@
  * timer is started to prevent us from waiting forever in case
  * something goes wrong (see the ide_timer_expiry() handler later on).
  */
-void ide_set_handler (ide_drive_t *drive, ide_handler_t *handler,
+
+/* This version doesn't get the spinlock, so you must call it with a 
spinlock
+   on io_request_lock. */
+void ide_set_handler_nolock (ide_drive_t *drive, ide_handler_t *handler,
               unsigned int timeout, ide_expiry_t *expiry)
 {
-    unsigned long flags;
     ide_hwgroup_t *hwgroup = HWGROUP(drive);
 
-    spin_lock_irqsave(&io_request_lock, flags);
     if (hwgroup->handler != NULL) {
         printk("%s: ide_set_handler: handler not null; "
             "old=%p, new=%p\n",
@@ -924,6 +925,15 @@
     hwgroup->expiry        = expiry;
     hwgroup->timer.expires    = jiffies + timeout;
     add_timer(&hwgroup->timer);
+}
+
+/* This version grabs and releases the io_request_lock, so must be called
+   with out the spinlock grabbed. */
+void ide_set_handler (ide_drive_t *drive, ide_handler_t *handler,
+              unsigned int timeout, ide_expiry_t *expiry) {
+    unsigned long flags;
+    spin_lock_irqsave(&io_request_lock, flags);
+        ide_set_handler_nolock(drive, handler, timeout, expiry);
     spin_unlock_irqrestore(&io_request_lock, flags);
 }
 
diff -durbB linux-2.4.20/drivers/ide/ide-tape.c 
linux-2.4.20-p1/drivers/ide/ide-tape.c
--- linux-2.4.20/drivers/ide/ide-tape.c    Thu Jan  9 11:14:01 2003
+++ linux-2.4.20-p1/drivers/ide/ide-tape.c    Wed Jan  8 16:20:09 2003
@@ -2457,13 +2457,17 @@
         set_bit(PC_DMA_IN_PROGRESS, &pc->flags);
 #endif /* CONFIG_BLK_DEV_IDEDMA */
     if (test_bit(IDETAPE_DRQ_INTERRUPT, &tape->flags)) {
+                unsigned long flags;
         if (HWGROUP(drive)->handler != NULL)
             BUG();
-        ide_set_handler(drive,
+
+                spin_lock_irqsave(&io_request_lock, flags);
+        HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
+        ide_set_handler_nolock(drive,
                 &idetape_transfer_pc,
                 IDETAPE_WAIT_CMD,
                 NULL);
-        HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
+                spin_unlock_irqrestore(&io_request_lock, flags);
         return ide_started;
     } else {
         HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
diff -durbB linux-2.4.20/drivers/ide/ide-taskfile.c 
linux-2.4.20-p1/drivers/ide/ide-taskfile.c
--- linux-2.4.20/drivers/ide/ide-taskfile.c    Thu Jan  9 11:14:01 2003
+++ linux-2.4.20-p1/drivers/ide/ide-taskfile.c    Wed Jan  8 16:25:43 2003
@@ -173,6 +173,7 @@
     task_struct_t *taskfile    = (task_struct_t *) task->tfRegister;
     hob_struct_t *hobfile    = (hob_struct_t *) task->hobRegister;
     u8 HIHI            = (drive->addressing == 1) ? 0xE0 : 0xEF;
+        unsigned long flags;
 
 #ifdef CONFIG_IDE_TASK_IOCTL_DEBUG
     void debug_taskfile(drive, task);
@@ -201,8 +202,14 @@
 
     hwif->OUTB((taskfile->device_head & HIHI) | drive->select.all, 
IDE_SELECT_REG);
     if (task->handler != NULL) {
-        ide_set_handler(drive, task->handler, WAIT_WORSTCASE, NULL);
+                spin_lock_irqsave(&io_request_lock, flags);
         hwif->OUTB(taskfile->command, IDE_COMMAND_REG);
+                /* We need to give the drive time to set the busy
+                   flag, or we may mistake an interrupt from another drive
+                   for the command completion on this drive. */
+        ide_set_handler_nolock(drive, task->handler, WAIT_WORSTCASE, NULL);
+                ide_delay_400ns();
+                spin_unlock_irqrestore(&io_request_lock, flags);
         if (task->prehandler != NULL)
             return task->prehandler(drive, task->rq);
         return ide_started;
@@ -1832,6 +1839,7 @@
     ide_hwif_t *hwif    = HWIF(drive);
     task_struct_t *taskfile    = (task_struct_t *) task->tfRegister;
     hob_struct_t *hobfile    = (hob_struct_t *) task->hobRegister;
+        unsigned long flags;
 #if DEBUG_TASKFILE
     u8 status;
 #endif
@@ -1929,9 +1937,13 @@
              if (task->handler == NULL)
                 return ide_stopped;
 
-            ide_set_handler(drive, task->handler, WAIT_WORSTCASE, NULL);
+                       
             /* Issue the command */
+                        spin_lock_irqsave(&io_request_lock, flags);
             hwif->OUTB(taskfile->command, IDE_COMMAND_REG);
+            ide_set_handler_nolock(drive, task->handler,
+                                               WAIT_WORSTCASE, NULL);
+                        spin_unlock_irqrestore(&io_request_lock, flags);
             if (task->prehandler != NULL)
                 return task->prehandler(drive, HWGROUP(drive)->rq);
     }
diff -durbB linux-2.4.20/include/asm-i386/ide.h 
linux-2.4.20-p1/include/asm-i386/ide.h
--- linux-2.4.20/include/asm-i386/ide.h    Thu Jan  9 11:17:05 2003
+++ linux-2.4.20-p1/include/asm-i386/ide.h    Fri Jan 10 09:54:07 2003
@@ -14,6 +14,7 @@
 #ifdef __KERNEL__
 
 #include <linux/config.h>
+#include <linux/delay.h>
 
 #ifndef MAX_HWIFS
 # ifdef CONFIG_BLK_DEV_IDEPCI
@@ -22,6 +23,16 @@
 #define MAX_HWIFS    6
 # endif
 #endif
+
+
+
+/* The ATA spec requires 400ns delays all over the place. */
+/* Do the same fixed point trick the udelay does to get our delay. */
+#define IDE_DELAY_400NS
+static __inline__ void ide_delay_400ns(void)
+{
+        __const_udelay (400 * 4);
+}
 
 static __inline__ int ide_default_irq(ide_ioreg_t base)
 {
diff -durbB linux-2.4.20/include/linux/ide.h 
linux-2.4.20-p1/include/linux/ide.h
--- linux-2.4.20/include/linux/ide.h    Thu Jan  9 11:17:05 2003
+++ linux-2.4.20-p1/include/linux/ide.h    Thu Jan  9 15:37:22 2003
@@ -18,6 +18,7 @@
 #include <linux/bitops.h>
 #include <linux/highmem.h>
 #include <linux/pci.h>
+#include <linux/delay.h>
 #include <asm/byteorder.h>
 #include <asm/system.h>
 #include <asm/hdreg.h>
@@ -354,6 +355,11 @@
 
 #include <asm/ide.h>
 
+#ifndef IDE_DELAY_400NS
+#define IDE_DELAY_400NS
+static inline void ide_delay_400ns(void) { udelay(1); }
+#endif
+
 /* Currently only m68k, apus and m8xx need it */
 #ifdef IDE_ARCH_ACK_INTR
 extern int ide_irq_lock;
@@ -1282,6 +1288,7 @@
  * and also to start the safety timer.
  */
 extern void ide_set_handler(ide_drive_t *, ide_handler_t *, unsigned 
int, ide_expiry_t *);
+extern void ide_set_handler_nolock(ide_drive_t *, ide_handler_t *, 
unsigned int, ide_expiry_t *);
 
 /*
  * Error reporting, in human readable form (luxurious, but a memory hog).



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problem in IDE Disks cache handling in kernel 2.4.XX
  2003-01-10 16:48           ` Jens Axboe
@ 2003-01-10 18:12             ` Alan Cox
  2003-01-10 18:12               ` PATCH: [2.4.21-pre3] Fix for SMP race condition in IDE code Ross Biro
  2003-01-10 18:25               ` Problem in IDE Disks cache handling in kernel 2.4.XX Jens Axboe
  0 siblings, 2 replies; 16+ messages in thread
From: Alan Cox @ 2003-01-10 18:12 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andre Hedrick, fverscheure, Linux Kernel Mailing List,
	Marcelo Tosatti

On Fri, 2003-01-10 at 16:48, Jens Axboe wrote:
> In the barrier patches, I just used drive->quiet to supress ide_error()
> complaining too much (on cache flushes, too). Whether that's per-drive
> of per-hwif entity, dunno...

Commands are queued per hwif so it doesn't actually matter I suspect.

BTW do you plan to fix up the oopses in the tcq code or should I just mark
it disabled for anyone who has the time to finish the job ? There are a 
whole pile of drivers that fail with tcq - mostly because they have custom
dma end functions



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problem in IDE Disks cache handling in kernel 2.4.XX
  2003-01-10 18:12             ` Alan Cox
  2003-01-10 18:12               ` PATCH: [2.4.21-pre3] Fix for SMP race condition in IDE code Ross Biro
@ 2003-01-10 18:25               ` Jens Axboe
  1 sibling, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2003-01-10 18:25 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andre Hedrick, fverscheure, Linux Kernel Mailing List,
	Marcelo Tosatti

On Fri, Jan 10 2003, Alan Cox wrote:
> On Fri, 2003-01-10 at 16:48, Jens Axboe wrote:
> > In the barrier patches, I just used drive->quiet to supress ide_error()
> > complaining too much (on cache flushes, too). Whether that's per-drive
> > of per-hwif entity, dunno...
> 
> Commands are queued per hwif so it doesn't actually matter I suspect.

True

> BTW do you plan to fix up the oopses in the tcq code or should I just mark
> it disabled for anyone who has the time to finish the job ? There are a 
> whole pile of drivers that fail with tcq - mostly because they have custom
> dma end functions

Yes I will get around to it, probably next week.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PATCH: [2.4.21-pre3] Fix for SMP race condition in IDE code
  2003-01-10 18:12               ` PATCH: [2.4.21-pre3] Fix for SMP race condition in IDE code Ross Biro
@ 2003-01-10 19:29                 ` Alan Cox
  2003-01-10 20:22                 ` Andre Hedrick
  1 sibling, 0 replies; 16+ messages in thread
From: Alan Cox @ 2003-01-10 19:29 UTC (permalink / raw)
  To: Ross Biro; +Cc: Andre Hedrick, Marcelo Tosatti, Linux Kernel Mailing List

On Fri, 2003-01-10 at 18:12, Ross Biro wrote:
> There is a race condition in all versions of the IDE code that I've 
> looked at including 2.4.18 and 2.4.21-pre3. Basically on an SMP system 
> if mutiple IDE channels are on the same interrupt and 1 channel sends 
> has an interrupt pending on 1 processor while the other processor is 
> calling ide_set_handler, then the interrupt can be mistaken for command 
> completion on both channels, and a command can be completed before it is 
> even issued.

Thanks Ross. I'll go over this in some detail. 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: PATCH: [2.4.21-pre3] Fix for SMP race condition in IDE code
  2003-01-10 18:12               ` PATCH: [2.4.21-pre3] Fix for SMP race condition in IDE code Ross Biro
  2003-01-10 19:29                 ` Alan Cox
@ 2003-01-10 20:22                 ` Andre Hedrick
  1 sibling, 0 replies; 16+ messages in thread
From: Andre Hedrick @ 2003-01-10 20:22 UTC (permalink / raw)
  To: Ross Biro; +Cc: Alan Cox, Marcelo Tosatti, Linux Kernel Mailing List


Ross,

Sheesh, I know who is a candiated for me to dump the maintainership on in
the future!  This will take a little time to reveiw; however, I know of no
one who runs systems in this scale.

Also "t" is the last drive.

If you need more drives, use the upper half of the current majors
/dev/hdu 3,128
/dev/hdv 3,192

Repeat ... that will make it to 20 channels.

Cheers,


On Fri, 10 Jan 2003, Ross Biro wrote:

> 
> There is a race condition in all versions of the IDE code that I've 
> looked at including 2.4.18 and 2.4.21-pre3. Basically on an SMP system 
> if mutiple IDE channels are on the same interrupt and 1 channel sends 
> has an interrupt pending on 1 processor while the other processor is 
> calling ide_set_handler, then the interrupt can be mistaken for command 
> completion on both channels, and a command can be completed before it is 
> even issued.
> 
> This problem can be triggered with the following code
> 
> cd /proc/ide
> (while true; do for i in hd[a-z]; do for j in 1 2 3 ; do cat 
> $i/smart_values >/dev/null; done; done; done) &
> (while true; do for i in hd[a-z]; do dd if=/dev/$i of=/dev/null bs=4096k 
> skip=0 & done; wait; done) &
> 
> And can be seen by properly instrumenting drive_cmd_intr to check for 
> errors.
> 
> On a dual proc machine with 4 channels on a single interrupt and 2.4.18 
> I expec to see an error about once every twenty minutes with the above 
> code.  I believe it should occur much less often on 2.4.20 and above.  
> 
> Drives react to this problem in different ways.  Often they simply lock 
> up and refuse to talk to the host until they have been properly reset. 
>  Some drives require a power cycle before they will work properly again.
> 
> This problem required the use of over 200 machines, approximately 2000 
> drives, a bus analyzer, and a lot of cooperation from a couple of drive 
> manufacturers to go from "something goes wrong once in a while" to 
> something we could easily reproduce.
> 
> This patch has only been minimally tested and then only with 1 brand of 
> ide hard drive.
> 
> ----- snip ------
> 
> diff -durbB linux-2.4.20/drivers/ide/ide-cd.c 
> linux-2.4.20-p1/drivers/ide/ide-cd.c
> --- linux-2.4.20/drivers/ide/ide-cd.c    Thu Jan  9 11:14:01 2003
> +++ linux-2.4.20-p1/drivers/ide/ide-cd.c    Wed Jan  8 16:25:04 2003
> @@ -863,11 +864,15 @@
>          HWIF(drive)->OUTB(drive->ctl, IDE_CONTROL_REG);
>  
>      if (CDROM_CONFIG_FLAGS (drive)->drq_interrupt) {
> +                unsigned long flags;
>          if (HWGROUP(drive)->handler != NULL)
>              BUG();
> -        ide_set_handler (drive, handler, WAIT_CMD, cdrom_timer_expiry);
> +                spin_lock_irqsave(&io_request_lock, flags);
>          /* packet command */
>          HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> +        ide_set_handler_nolock (drive, handler, WAIT_CMD, 
> cdrom_timer_expiry);
> +                ide_delay_400ns();
> +                spin_unlock_irqrestore(&io_request_lock, flags);
>          return ide_started;
>      } else {
>          /* packet command */
> diff -durbB linux-2.4.20/drivers/ide/ide-disk.c 
> linux-2.4.20-p1/drivers/ide/ide-disk.c
> --- linux-2.4.20/drivers/ide/ide-disk.c    Thu Jan  9 11:14:01 2003
> +++ linux-2.4.20-p1/drivers/ide/ide-disk.c    Wed Jan  8 16:25:17 2003
> @@ -467,12 +467,15 @@
>  #endif /* CONFIG_BLK_DEV_IDEDMA */
>          if (HWGROUP(drive)->handler != NULL)
>              BUG();
> -        ide_set_handler(drive, &read_intr, WAIT_CMD, NULL);
>  
>          command = ((drive->mult_count) ?
>                 ((lba48) ? WIN_MULTREAD_EXT : WIN_MULTREAD) :
>                 ((lba48) ? WIN_READ_EXT : WIN_READ));
> +                spin_lock_irqsave(&io_request_lock, flags);
>          hwif->OUTB(command, IDE_COMMAND_REG);
> +        ide_set_handler_nolock(drive, &read_intr, WAIT_CMD, NULL);
> +                ide_delay_400ns();
> +                spin_unlock_irqrestore(&io_request_lock, flags);
>          return ide_started;
>      } else if (rq_data_dir(rq) == WRITE) {
>          ide_startstop_t startstop;
> diff -durbB linux-2.4.20/drivers/ide/ide-dma.c 
> linux-2.4.20-p1/drivers/ide/ide-dma.c
> --- linux-2.4.20/drivers/ide/ide-dma.c    Thu Jan  9 11:14:01 2003
> +++ linux-2.4.20-p1/drivers/ide/ide-dma.c    Thu Jan  9 15:32:09 2003
> @@ -655,6 +655,7 @@
>      unsigned int count    = 0;
>      u8 dma_stat = 0, lba48    = (drive->addressing == 1) ? 1 : 0;
>      task_ioreg_t command    = WIN_NOP;
> +        unsigned long flags;
>  
>      if (!(count = ide_build_dmatable(drive, rq, PCI_DMA_FROMDEVICE)))
>          /* try PIO instead of DMA */
> @@ -673,7 +674,6 @@
>      /* paranoia check */
>      if (HWGROUP(drive)->handler != NULL)
>          BUG();
> -    ide_set_handler(drive, &ide_dma_intr, 2*WAIT_CMD, dma_timer_expiry);
>  
>      /*
>       * FIX ME to use only ACB ide_task_t args Struct
> @@ -691,7 +691,11 @@
>      }
>  #endif
>      /* issue cmd to drive */
> +        spin_lock_irqsave(&io_request_lock, flags);
>      hwif->OUTB(command, IDE_COMMAND_REG);
> +    ide_set_handler_nolock(drive, &ide_dma_intr, 2*WAIT_CMD, 
> dma_timer_expiry);
> +        ide_delay_400ns();
> +        spin_unlock_irqrestore(&io_request_lock, flags);
>  
>      return HWIF(drive)->ide_dma_count(drive);
>  }
> @@ -707,6 +711,7 @@
>      unsigned int count    = 0;
>      u8 dma_stat = 0, lba48    = (drive->addressing == 1) ? 1 : 0;
>      task_ioreg_t command    = WIN_NOP;
> +        unsigned long flags;
>  
>      if (!(count = ide_build_dmatable(drive, rq, PCI_DMA_TODEVICE)))
>          /* try PIO instead of DMA */
> @@ -725,7 +730,6 @@
>      /* paranoia check */
>      if (HWGROUP(drive)->handler != NULL)
>          BUG();
> -    ide_set_handler(drive, &ide_dma_intr, 2*WAIT_CMD, dma_timer_expiry);
>      /*
>       * FIX ME to use only ACB ide_task_t args Struct
>       */
> @@ -742,7 +746,13 @@
>      }
>  #endif
>      /* issue cmd to drive */
> +        spin_lock_irqsave(&io_request_lock, flags);
>      hwif->OUTB(command, IDE_COMMAND_REG);
> +    ide_set_handler_nolock(drive, &ide_dma_intr,
> +                               2*WAIT_CMD, dma_timer_expiry);
> +        ide_delay_400ns();
> +        spin_unlock_irqrestore(&io_request_lock, flags);
> +       
>      return HWIF(drive)->ide_dma_count(drive);
>  }
>  
> diff -durbB linux-2.4.20/drivers/ide/ide-floppy.c 
> linux-2.4.20-p1/drivers/ide/ide-floppy.c
> --- linux-2.4.20/drivers/ide/ide-floppy.c    Thu Jan  9 11:14:01 2003
> +++ linux-2.4.20-p1/drivers/ide/ide-floppy.c    Wed Jan  8 16:15:17 2003
> @@ -1123,14 +1123,17 @@
>      }
>     
>      if (test_bit(IDEFLOPPY_DRQ_INTERRUPT, &floppy->flags)) {
> +                unsigned long flags;
>          if (HWGROUP(drive)->handler != NULL)
>              BUG();
> +        /* Issue the packet command */
> +                spin_lock_irqsave(&io_request_lock, flags);
> +        HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
>          ide_set_handler(drive,
>                  pkt_xfer_routine,
>                  IDEFLOPPY_WAIT_CMD,
>                  NULL);
> -        /* Issue the packet command */
> -        HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> +                spin_unlock_irqrestore(&io_request_lock, flags);
>          return ide_started;
>      } else {
>          /* Issue the packet command */
> diff -durbB linux-2.4.20/drivers/ide/ide-io.c 
> linux-2.4.20-p1/drivers/ide/ide-io.c
> --- linux-2.4.20/drivers/ide/ide-io.c    Thu Jan  9 11:14:01 2003
> +++ linux-2.4.20-p1/drivers/ide/ide-io.c    Wed Jan  8 16:25:37 2003
> @@ -363,14 +363,21 @@
>  void ide_cmd (ide_drive_t *drive, u8 cmd, u8 nsect, ide_handler_t *handler)
>  {
>      ide_hwif_t *hwif = HWIF(drive);
> +        unsigned long flags;
> +
>      if (HWGROUP(drive)->handler != NULL)
>          BUG();
> -    ide_set_handler(drive, handler, WAIT_CMD, NULL);
>      if (IDE_CONTROL_REG)
>          hwif->OUTB(drive->ctl,IDE_CONTROL_REG);    /* clear nIEN */
>      SELECT_MASK(drive,0);
>      hwif->OUTB(nsect,IDE_NSECTOR_REG);
> +
> +        spin_lock_irqsave(&io_request_lock, flags);
>      hwif->OUTB(cmd,IDE_COMMAND_REG);
> +    ide_set_handler_nolock(drive, handler, WAIT_CMD, NULL);
> +        ide_delay_400ns();
> +        spin_unlock_irqrestore(&io_request_lock, flags);
> +
>  }
>  
>  EXPORT_SYMBOL(ide_cmd);
> diff -durbB linux-2.4.20/drivers/ide/ide-iops.c 
> linux-2.4.20-p1/drivers/ide/ide-iops.c
> --- linux-2.4.20/drivers/ide/ide-iops.c    Thu Jan  9 11:14:01 2003
> +++ linux-2.4.20-p1/drivers/ide/ide-iops.c    Wed Jan  8 15:54:18 2003
> @@ -908,13 +908,14 @@
>   * timer is started to prevent us from waiting forever in case
>   * something goes wrong (see the ide_timer_expiry() handler later on).
>   */
> -void ide_set_handler (ide_drive_t *drive, ide_handler_t *handler,
> +
> +/* This version doesn't get the spinlock, so you must call it with a 
> spinlock
> +   on io_request_lock. */
> +void ide_set_handler_nolock (ide_drive_t *drive, ide_handler_t *handler,
>                unsigned int timeout, ide_expiry_t *expiry)
>  {
> -    unsigned long flags;
>      ide_hwgroup_t *hwgroup = HWGROUP(drive);
>  
> -    spin_lock_irqsave(&io_request_lock, flags);
>      if (hwgroup->handler != NULL) {
>          printk("%s: ide_set_handler: handler not null; "
>              "old=%p, new=%p\n",
> @@ -924,6 +925,15 @@
>      hwgroup->expiry        = expiry;
>      hwgroup->timer.expires    = jiffies + timeout;
>      add_timer(&hwgroup->timer);
> +}
> +
> +/* This version grabs and releases the io_request_lock, so must be called
> +   with out the spinlock grabbed. */
> +void ide_set_handler (ide_drive_t *drive, ide_handler_t *handler,
> +              unsigned int timeout, ide_expiry_t *expiry) {
> +    unsigned long flags;
> +    spin_lock_irqsave(&io_request_lock, flags);
> +        ide_set_handler_nolock(drive, handler, timeout, expiry);
>      spin_unlock_irqrestore(&io_request_lock, flags);
>  }
>  
> diff -durbB linux-2.4.20/drivers/ide/ide-tape.c 
> linux-2.4.20-p1/drivers/ide/ide-tape.c
> --- linux-2.4.20/drivers/ide/ide-tape.c    Thu Jan  9 11:14:01 2003
> +++ linux-2.4.20-p1/drivers/ide/ide-tape.c    Wed Jan  8 16:20:09 2003
> @@ -2457,13 +2457,17 @@
>          set_bit(PC_DMA_IN_PROGRESS, &pc->flags);
>  #endif /* CONFIG_BLK_DEV_IDEDMA */
>      if (test_bit(IDETAPE_DRQ_INTERRUPT, &tape->flags)) {
> +                unsigned long flags;
>          if (HWGROUP(drive)->handler != NULL)
>              BUG();
> -        ide_set_handler(drive,
> +
> +                spin_lock_irqsave(&io_request_lock, flags);
> +        HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> +        ide_set_handler_nolock(drive,
>                  &idetape_transfer_pc,
>                  IDETAPE_WAIT_CMD,
>                  NULL);
> -        HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> +                spin_unlock_irqrestore(&io_request_lock, flags);
>          return ide_started;
>      } else {
>          HWIF(drive)->OUTB(WIN_PACKETCMD, IDE_COMMAND_REG);
> diff -durbB linux-2.4.20/drivers/ide/ide-taskfile.c 
> linux-2.4.20-p1/drivers/ide/ide-taskfile.c
> --- linux-2.4.20/drivers/ide/ide-taskfile.c    Thu Jan  9 11:14:01 2003
> +++ linux-2.4.20-p1/drivers/ide/ide-taskfile.c    Wed Jan  8 16:25:43 2003
> @@ -173,6 +173,7 @@
>      task_struct_t *taskfile    = (task_struct_t *) task->tfRegister;
>      hob_struct_t *hobfile    = (hob_struct_t *) task->hobRegister;
>      u8 HIHI            = (drive->addressing == 1) ? 0xE0 : 0xEF;
> +        unsigned long flags;
>  
>  #ifdef CONFIG_IDE_TASK_IOCTL_DEBUG
>      void debug_taskfile(drive, task);
> @@ -201,8 +202,14 @@
>  
>      hwif->OUTB((taskfile->device_head & HIHI) | drive->select.all, 
> IDE_SELECT_REG);
>      if (task->handler != NULL) {
> -        ide_set_handler(drive, task->handler, WAIT_WORSTCASE, NULL);
> +                spin_lock_irqsave(&io_request_lock, flags);
>          hwif->OUTB(taskfile->command, IDE_COMMAND_REG);
> +                /* We need to give the drive time to set the busy
> +                   flag, or we may mistake an interrupt from another drive
> +                   for the command completion on this drive. */
> +        ide_set_handler_nolock(drive, task->handler, WAIT_WORSTCASE, NULL);
> +                ide_delay_400ns();
> +                spin_unlock_irqrestore(&io_request_lock, flags);
>          if (task->prehandler != NULL)
>              return task->prehandler(drive, task->rq);
>          return ide_started;
> @@ -1832,6 +1839,7 @@
>      ide_hwif_t *hwif    = HWIF(drive);
>      task_struct_t *taskfile    = (task_struct_t *) task->tfRegister;
>      hob_struct_t *hobfile    = (hob_struct_t *) task->hobRegister;
> +        unsigned long flags;
>  #if DEBUG_TASKFILE
>      u8 status;
>  #endif
> @@ -1929,9 +1937,13 @@
>               if (task->handler == NULL)
>                  return ide_stopped;
>  
> -            ide_set_handler(drive, task->handler, WAIT_WORSTCASE, NULL);
> +                       
>              /* Issue the command */
> +                        spin_lock_irqsave(&io_request_lock, flags);
>              hwif->OUTB(taskfile->command, IDE_COMMAND_REG);
> +            ide_set_handler_nolock(drive, task->handler,
> +                                               WAIT_WORSTCASE, NULL);
> +                        spin_unlock_irqrestore(&io_request_lock, flags);
>              if (task->prehandler != NULL)
>                  return task->prehandler(drive, HWGROUP(drive)->rq);
>      }
> diff -durbB linux-2.4.20/include/asm-i386/ide.h 
> linux-2.4.20-p1/include/asm-i386/ide.h
> --- linux-2.4.20/include/asm-i386/ide.h    Thu Jan  9 11:17:05 2003
> +++ linux-2.4.20-p1/include/asm-i386/ide.h    Fri Jan 10 09:54:07 2003
> @@ -14,6 +14,7 @@
>  #ifdef __KERNEL__
>  
>  #include <linux/config.h>
> +#include <linux/delay.h>
>  
>  #ifndef MAX_HWIFS
>  # ifdef CONFIG_BLK_DEV_IDEPCI
> @@ -22,6 +23,16 @@
>  #define MAX_HWIFS    6
>  # endif
>  #endif
> +
> +
> +
> +/* The ATA spec requires 400ns delays all over the place. */
> +/* Do the same fixed point trick the udelay does to get our delay. */
> +#define IDE_DELAY_400NS
> +static __inline__ void ide_delay_400ns(void)
> +{
> +        __const_udelay (400 * 4);
> +}
>  
>  static __inline__ int ide_default_irq(ide_ioreg_t base)
>  {
> diff -durbB linux-2.4.20/include/linux/ide.h 
> linux-2.4.20-p1/include/linux/ide.h
> --- linux-2.4.20/include/linux/ide.h    Thu Jan  9 11:17:05 2003
> +++ linux-2.4.20-p1/include/linux/ide.h    Thu Jan  9 15:37:22 2003
> @@ -18,6 +18,7 @@
>  #include <linux/bitops.h>
>  #include <linux/highmem.h>
>  #include <linux/pci.h>
> +#include <linux/delay.h>
>  #include <asm/byteorder.h>
>  #include <asm/system.h>
>  #include <asm/hdreg.h>
> @@ -354,6 +355,11 @@
>  
>  #include <asm/ide.h>
>  
> +#ifndef IDE_DELAY_400NS
> +#define IDE_DELAY_400NS
> +static inline void ide_delay_400ns(void) { udelay(1); }
> +#endif
> +
>  /* Currently only m68k, apus and m8xx need it */
>  #ifdef IDE_ARCH_ACK_INTR
>  extern int ide_irq_lock;
> @@ -1282,6 +1288,7 @@
>   * and also to start the safety timer.
>   */
>  extern void ide_set_handler(ide_drive_t *, ide_handler_t *, unsigned 
> int, ide_expiry_t *);
> +extern void ide_set_handler_nolock(ide_drive_t *, ide_handler_t *, 
> unsigned int, ide_expiry_t *);
>  
>  /*
>   * Error reporting, in human readable form (luxurious, but a memory hog).
> 
> 

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2003-01-10 20:15 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-01-10  9:54 Problem in IDE Disks cache handling in kernel 2.4.XX Francis Verscheure
2003-01-10 10:19 ` John Bradford
2003-01-10 11:37 ` Alan Cox
2003-01-10 11:07   ` Benjamin Herrenschmidt
2003-01-10 11:14   ` Andre Hedrick
2003-01-10 13:35     ` Alan Cox
2003-01-10 13:03       ` Andre Hedrick
2003-01-10 14:13         ` Alan Cox
2003-01-10 16:48           ` Jens Axboe
2003-01-10 18:12             ` Alan Cox
2003-01-10 18:12               ` PATCH: [2.4.21-pre3] Fix for SMP race condition in IDE code Ross Biro
2003-01-10 19:29                 ` Alan Cox
2003-01-10 20:22                 ` Andre Hedrick
2003-01-10 18:25               ` Problem in IDE Disks cache handling in kernel 2.4.XX Jens Axboe
2003-01-10 17:23 ` Alan Cox
2003-01-10 17:02   ` John Bradford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox