PATCH: Fixes for ide-disk.c

All of lore.kernel.org
 help / color / mirror / Atom feed

* PATCH: Fixes for ide-disk.c
@ 2003-04-05  7:31 Nigel Cunningham
  2003-04-05  9:27 ` John Bradford
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Nigel Cunningham @ 2003-04-05  7:31 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

Hi.

In working on swsusp, I've found the following patches were needed.
The first fragment is trivial, as you'll see.
The second and third handle the fact that the suspend & resume functions
can be called multiple times. Pavel asked me to pass it on immediately.
The fourth handles the fact that writeback caches seem to be implemented
in two ways. People using swsusp under 2.4 found that everything worked
fine if they rebooted after writing the image, but powering down at the
end of writing the image caused corruption. I got the additional check
from the source for hdparm, which only does the new check to determine
if a drive has a writeback cache.

Regards,

Nigel

--- linux-2.5.66-original/drivers/ide/ide-disk.c	2003-03-26 08:56:49.000000000 +1200
+++ linux-2.5.66/drivers/ide/ide-disk.c	2003-04-05 18:51:17.000000000 +1200
@@ -1515,7 +1515,7 @@
 {
 	ide_drive_t *drive = dev->driver_data;
 
-	printk("Suspending device %p\n", dev->driver_data);
+	printk(KERN_INFO "Suspending device %p\n", dev->driver_data);
 
 	/* I hope that every freeze operation from the upper levels have
 	 * already been done...
@@ -1527,7 +1527,7 @@
 	/* set the drive to standby */
 	printk(KERN_INFO "suspending: %s ", drive->name);
 	do_idedisk_standby(drive);
-	drive->blocked = 1;
+	drive->blocked++;
 
 	BUG_ON (HWGROUP(drive)->handler);
 	return 0;
@@ -1539,8 +1539,8 @@
 
 	if (level != RESUME_RESTORE_STATE)
 		return 0;
-	BUG_ON(!drive->blocked);
-	drive->blocked = 0;
+	if (drive->blocked)
+		drive->blocked--;
 	return 0;
 }
 
@@ -1804,7 +1804,8 @@
 	if ((!drive->head || drive->head > 16) && !drive->select.b.lba) {
 		printk(KERN_ERR "%s: INVALID GEOMETRY: %d PHYSICAL HEADS?\n",
 			drive->name, drive->head);
-		if ((drive->id->cfs_enable_2 & 0x3000) && drive->wcache)
+		if (((drive->id->cfs_enable_2 & 0x3000) && drive->wcache) ||
+		    ((drive->id->command_set_1 & 0x20) && drive->id->cfs_enable_1 & 0x20))
 			if (do_idedisk_flushcache(drive))
 				printk (KERN_INFO "%s: Write Cache FAILED Flushing!\n",
 					drive->name);




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-05  7:31 PATCH: Fixes for ide-disk.c Nigel Cunningham
@ 2003-04-05  9:27 ` John Bradford
  2003-04-05 10:57   ` Andre Hedrick
  2003-04-05 16:46 ` Alan Cox
  2003-04-05 16:46 ` Alan Cox
  2 siblings, 1 reply; 26+ messages in thread
From: John Bradford @ 2003-04-05  9:27 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: alan, linux-kernel

> People using swsusp under 2.4 found that everything worked
> fine if they rebooted after writing the image, but powering down at the
> end of writing the image caused corruption. I got the additional check
> from the source for hdparm, which only does the new check to determine
> if a drive has a writeback cache.

Did we ever establish what the best way to ensure that the write cache is
flushed, is?  An explicit cache flush and spin down are both necessary, but
I had problems with drives spinning back up when we did the spindown first.

John.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
@ 2003-04-05 10:32 Chuck Ebbert
  2003-04-05 11:30 ` Andre Hedrick
  0 siblings, 1 reply; 26+ messages in thread
From: Chuck Ebbert @ 2003-04-05 10:32 UTC (permalink / raw)
  To: linux-kernel


John Bradford wrote:


>Did we ever establish what the best way to ensure
>that the write cache is flushed, is?  An explicit
>cache flush and spin down are both necessary, but
>I had problems with drives spinning back up when
>we did the spindown first.


Disks that don't support flush should be sent an IDLE command, IIRC.



--
 Chuck
 I am not a number!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-05  9:27 ` John Bradford
@ 2003-04-05 10:57   ` Andre Hedrick
  2003-04-05 13:09     ` John Bradford
  0 siblings, 1 reply; 26+ messages in thread
From: Andre Hedrick @ 2003-04-05 10:57 UTC (permalink / raw)
  To: John Bradford; +Cc: Nigel Cunningham, alan, linux-kernel


Are you issuing "standby" or "suspend" ?

The shutdown issues a "standby" and not a "suspend", this is why you get
the spinup on a flush-cache.

On Sat, 5 Apr 2003, John Bradford wrote:

> > People using swsusp under 2.4 found that everything worked
> > fine if they rebooted after writing the image, but powering down at the
> > end of writing the image caused corruption. I got the additional check
> > from the source for hdparm, which only does the new check to determine
> > if a drive has a writeback cache.
> 
> Did we ever establish what the best way to ensure that the write cache is
> flushed, is?  An explicit cache flush and spin down are both necessary, but
> I had problems with drives spinning back up when we did the spindown first.
> 
> John.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-05 10:32 Chuck Ebbert
@ 2003-04-05 11:30 ` Andre Hedrick
  0 siblings, 0 replies; 26+ messages in thread
From: Andre Hedrick @ 2003-04-05 11:30 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: linux-kernel


If the drive is compliant it will issue an abort if not supported.
Otherwise one should check the identify page; however, there are several
cases where the bits are improperly set.  Another double edge sword to
make the driver more interesting.

Cheers,

On Sat, 5 Apr 2003, Chuck Ebbert wrote:

> 
> John Bradford wrote:
> 
> 
> >Did we ever establish what the best way to ensure
> >that the write cache is flushed, is?  An explicit
> >cache flush and spin down are both necessary, but
> >I had problems with drives spinning back up when
> >we did the spindown first.
> 
> 
> Disks that don't support flush should be sent an IDLE command, IIRC.
> 
> 
> 
> --
>  Chuck
>  I am not a number!
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-05 10:57   ` Andre Hedrick
@ 2003-04-05 13:09     ` John Bradford
  0 siblings, 0 replies; 26+ messages in thread
From: John Bradford @ 2003-04-05 13:09 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: linux-kernel

> Are you issuing "standby" or "suspend" ?
> 
> The shutdown issues a "standby" and not a "suspend", this is why you get
> the spinup on a flush-cache.

I noticed this happening a few months ago when I was testing 2.4.20 on a
couple of similar 486 laptops.  About 25% of the time, the disk would spin
down, then up again, after the flushing IDE devices message.

I think that the argument against doing the flush first, then the standby,
was that some drives would abort, or wait on the flush, but I never saw a
conclusive answer to it.

Am I the only person who observed the spin down, and immediate spin up
behavior?

John.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-05  7:31 PATCH: Fixes for ide-disk.c Nigel Cunningham
  2003-04-05  9:27 ` John Bradford
@ 2003-04-05 16:46 ` Alan Cox
  2003-04-05 19:25   ` Nigel Cunningham
  2003-04-05 16:46 ` Alan Cox
  2 siblings, 1 reply; 26+ messages in thread
From: Alan Cox @ 2003-04-05 16:46 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: Linux Kernel Mailing List

On Sad, 2003-04-05 at 08:31, Nigel Cunningham wrote:
> @@ -1527,7 +1527,7 @@
>  	/* set the drive to standby */
>  	printk(KERN_INFO "suspending: %s ", drive->name);
>  	do_idedisk_standby(drive);
> -	drive->blocked = 1;
> +	drive->blocked++;
>  

Drive->blocked is a single bit field. Its not a counting lock. Either
we are blocked or not.

Alan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-05  7:31 PATCH: Fixes for ide-disk.c Nigel Cunningham
  2003-04-05  9:27 ` John Bradford
  2003-04-05 16:46 ` Alan Cox
@ 2003-04-05 16:46 ` Alan Cox
  2 siblings, 0 replies; 26+ messages in thread
From: Alan Cox @ 2003-04-05 16:46 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: Linux Kernel Mailing List

On Sad, 2003-04-05 at 08:31, Nigel Cunningham wrote:
> @@ -1527,7 +1527,7 @@
>  	/* set the drive to standby */
>  	printk(KERN_INFO "suspending: %s ", drive->name);
>  	do_idedisk_standby(drive);
> -	drive->blocked = 1;
> +	drive->blocked++;
>  

Drive->blocked is a single bit field. Its not a counting lock. Either
we are blocked or not.

Alan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: PATCH: Fixes for ide-disk.c
@ 2003-04-05 18:34 Mudama, Eric
  2003-04-05 19:52 ` Andre Hedrick
  0 siblings, 1 reply; 26+ messages in thread
From: Mudama, Eric @ 2003-04-05 18:34 UTC (permalink / raw)
  To: 'Andre Hedrick ', 'Chuck Ebbert '; +Cc: 'linux-kernel '

If you guys see a problem with how the bits are set on a Maxtor drive,
please let me know so I can fix them.

If you try to spin down, the drive shouldn't allow that to happen with dirty
write data, period.

To ensure the cache is flushed, you must issue FLUSH CACHE (EXT).  That is
the only "according to spec" method.  I know of other things you can do (on
a Maxtor drive) that have cache flush as a side effect, but that is the only
specified way.

Once the flush cache has returned 0x50 status, there can be no dirty data
left in the drive.  Note that this FLUSH CACHE can easily take 6-10 seconds
depending on drive cache size and workload pattern.

-----Original Message-----
From: Andre Hedrick
To: Chuck Ebbert
Cc: linux-kernel
Sent: 4/5/03 4:30 AM
Subject: Re: PATCH: Fixes for ide-disk.c

If the drive is compliant it will issue an abort if not supported.
Otherwise one should check the identify page; however, there are several
cases where the bits are improperly set.  Another double edge sword to
make the driver more interesting.

Cheers,

On Sat, 5 Apr 2003, Chuck Ebbert wrote:

> 
> John Bradford wrote:
> 
> 
> >Did we ever establish what the best way to ensure
> >that the write cache is flushed, is?  An explicit
> >cache flush and spin down are both necessary, but
> >I had problems with drives spinning back up when
> >we did the spindown first.
> 
> 
> Disks that don't support flush should be sent an IDLE command, IIRC.
> 
> 
> 
> --
>  Chuck
>  I am not a number!
> -
> To unsubscribe from this list: send the line "unsubscribe
linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

Andre Hedrick
LAD Storage Consulting Group

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-05 16:46 ` Alan Cox
@ 2003-04-05 19:25   ` Nigel Cunningham
  2003-04-05 20:58     ` Andre Hedrick
                       ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Nigel Cunningham @ 2003-04-05 19:25 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

Hi.

On Sun, 2003-04-06 at 04:46, Alan Cox wrote:
> Drive->blocked is a single bit field. Its not a counting lock. Either
> we are blocked or not.

Okay. We need a different solution then, but the problem remains - the
driver can get multiple, nexted calls to block and unblock. Can it
become a counting lock?

Regards,

Nigel

-- 
Nigel Cunningham
495 St Georges Road South, Hastings 4201, New Zealand

Be diligent to present yourself approved to God as a workman who does
not need to be ashamed, handling accurately the word of truth.
	-- 2 Timothy 2:14, NASB.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: PATCH: Fixes for ide-disk.c
  2003-04-05 18:34 Mudama, Eric
@ 2003-04-05 19:52 ` Andre Hedrick
  0 siblings, 0 replies; 26+ messages in thread
From: Andre Hedrick @ 2003-04-05 19:52 UTC (permalink / raw)
  To: Mudama, Eric; +Cc: 'Chuck Ebbert ', 'linux-kernel '



Eric,

Ask Leroy in Longmont how it should behave.

6-10 seconds is a nice idea, the reality as we both know it is up to 30
seconds to return because of OOB seeks to write on reallocations.

FLUSH CACHE/FLUSH CACHE EXT ...

If you have a drive that is not larger than 28-bit in geometry regardless
if it supports the 48-bit feature set, if it fails to comply with 28-bit
flush cache it is a "BAD DEVICE", period.  You will not be allowed to push
off the 48-bit feature set rules.  Regardless if the new smart data is
set the the GPL and not Smart Logs.

If you are suggesting a pole for completion on the FLUSH, say so.
Otherwise, standard non-data INTQ completion is default.

Cheers,

On Sat, 5 Apr 2003, Mudama, Eric wrote:

> 
> If you guys see a problem with how the bits are set on a Maxtor drive,
> please let me know so I can fix them.
> 
> If you try to spin down, the drive shouldn't allow that to happen with dirty
> write data, period.
> 
> To ensure the cache is flushed, you must issue FLUSH CACHE (EXT).  That is
> the only "according to spec" method.  I know of other things you can do (on
> a Maxtor drive) that have cache flush as a side effect, but that is the only
> specified way.
> 
> Once the flush cache has returned 0x50 status, there can be no dirty data
> left in the drive.  Note that this FLUSH CACHE can easily take 6-10 seconds
> depending on drive cache size and workload pattern.
> 
>  
> 
> -----Original Message-----
> From: Andre Hedrick
> To: Chuck Ebbert
> Cc: linux-kernel
> Sent: 4/5/03 4:30 AM
> Subject: Re: PATCH: Fixes for ide-disk.c
> 
> 
> If the drive is compliant it will issue an abort if not supported.
> Otherwise one should check the identify page; however, there are several
> cases where the bits are improperly set.  Another double edge sword to
> make the driver more interesting.
> 
> Cheers,
> 
> On Sat, 5 Apr 2003, Chuck Ebbert wrote:
> 
> > 
> > John Bradford wrote:
> > 
> > 
> > >Did we ever establish what the best way to ensure
> > >that the write cache is flushed, is?  An explicit
> > >cache flush and spin down are both necessary, but
> > >I had problems with drives spinning back up when
> > >we did the spindown first.
> > 
> > 
> > Disks that don't support flush should be sent an IDLE command, IIRC.
> > 
> > 
> > 
> > --
> >  Chuck
> >  I am not a number!
> > -
> > To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> 
> Andre Hedrick
> LAD Storage Consulting Group
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: PATCH: Fixes for ide-disk.c
@ 2003-04-05 20:17 Mudama, Eric
  2003-04-05 20:54 ` Andre Hedrick
  0 siblings, 1 reply; 26+ messages in thread
From: Mudama, Eric @ 2003-04-05 20:17 UTC (permalink / raw)
  To: 'Andre Hedrick', Mudama, Eric
  Cc: 'Chuck Ebbert ', 'linux-kernel '

> -----Original Message-----
> From: Andre Hedrick [mailto:andre@linux-ide.org]
> Sent: Saturday, April 05, 2003 12:52 PM
> To: Mudama, Eric
> Subject: RE: PATCH: Fixes for ide-disk.c
>
> 6-10 seconds is a nice idea, the reality as we both know it is up to 30
> seconds to return because of OOB seeks to write on reallocations.

Of course, 6-10 is the non-error case.  Depending on how hard one has been
pounding the drive with writes, there is no reason that error recovery can't
creep up into the multiple-minute domain in some cases if you happen to have
excessively high temperature, vibrations, and be working on an old drive
with ratty servo bursts in a certain location or something.

> FLUSH CACHE/FLUSH CACHE EXT ...
>
> If you have a drive that is not larger than 28-bit in geometry 
> regardless if it supports the 48-bit feature set, if it fails
> to comply with 28-bit flush cache it is a "BAD DEVICE", period.

Agreed.  They're different opcodes, E7 and EA.

In either case, according to the spec, when a FLUSH CACHE (normal or EXT,
doesn't matter) completes with "good" status (0x50) all the write cache has
been successfully flushed to the disk.

A 48-bit sized drive should still commit dirty data when issued the 28-bit
FLUSH CACHE command, it will simply have problems reporting potential error
locations if it gets a fatal error on the write.  

> You will not be allowed to push off the 48-bit feature set rules.
> Regardless if the new smart data is set the the GPL and not Smart
> Logs.

I don't understand what you mean.  I am looking at the ATA7 r1a spec now,
and don't see the 48-bit specific feature set rules that you're referring to
being "pushed off".

>If you are suggesting a pole for completion on the FLUSH, say so.
>Otherwise, standard non-data INTQ completion is default.

Clearing of the busy bit with a status of 0x50 is what to look for.  Polled
or INTQ doesn't matter.

If the drive reports 0x50 without a completely clean write cache, it is
broken.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: PATCH: Fixes for ide-disk.c
  2003-04-05 20:17 Mudama, Eric
@ 2003-04-05 20:54 ` Andre Hedrick
  0 siblings, 0 replies; 26+ messages in thread
From: Andre Hedrick @ 2003-04-05 20:54 UTC (permalink / raw)
  To: Mudama, Eric; +Cc: 'Chuck Ebbert ', 'linux-kernel '

On Sat, 5 Apr 2003, Mudama, Eric wrote:

> In either case, according to the spec, when a FLUSH CACHE (normal or EXT,
> doesn't matter) completes with "good" status (0x50) all the write cache has
> been successfully flushed to the disk.
> 
> A 48-bit sized drive should still commit dirty data when issued the 28-bit
> FLUSH CACHE command, it will simply have problems reporting potential error
> locations if it gets a fatal error on the write.  
> 
> > You will not be allowed to push off the 48-bit feature set rules.
> > Regardless if the new smart data is set the the GPL and not Smart
> > Logs.
> 
> I don't understand what you mean.  I am looking at the ATA7 r1a spec now,
> and don't see the 48-bit specific feature set rules that you're referring to
> being "pushed off".

This is the result of a flush cache error returning a failure and the
device logging the error.  If it was a 48-bit command it should commit to
the GPL (smart extentions), otherwise a 28-bit would commit to 28-bit
smart logs.  Since the Linux does not track the returns on completion, the
report of any flush cache error is worthless.  Since the error reports the
LBA where it happened but that request is long gone, the data is toast.

It will take 2.7 or later to address the data integrity and recovery on a
device that fails completion on a flush.  This is a bad host, but the host
is getting smarter.  The difference is this host's author knows the
issues, and just needs to communicate them out.

> >If you are suggesting a pole for completion on the FLUSH, say so.
> >Otherwise, standard non-data INTQ completion is default.
> 
> Clearing of the busy bit with a status of 0x50 is what to look for.  Polled
> or INTQ doesn't matter.
> 
> If the drive reports 0x50 without a completely clean write cache, it is
> broken.

This is a DEVICE side problem, the HOST can only respond to the content
returned in the taskfiles.  The DEVICE can LIE all day long and the HOST
can only follow the lead cue from the DEVICE.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-05 19:25   ` Nigel Cunningham
@ 2003-04-05 20:58     ` Andre Hedrick
  2003-04-06 15:03     ` Alan Cox
  2003-04-10 18:37     ` Pavel Machek
  2 siblings, 0 replies; 26+ messages in thread
From: Andre Hedrick @ 2003-04-05 20:58 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: Alan Cox, Linux Kernel Mailing List


Why?

Drive->blocked N times is still blocked.

I refuse to play tracking games to figure out whose Drive->blocked is
whose!

Cheers,

On Sun, 6 Apr 2003, Nigel Cunningham wrote:

> Hi.
> 
> On Sun, 2003-04-06 at 04:46, Alan Cox wrote:
> > Drive->blocked is a single bit field. Its not a counting lock. Either
> > we are blocked or not.
> 
> Okay. We need a different solution then, but the problem remains - the
> driver can get multiple, nexted calls to block and unblock. Can it
> become a counting lock?
> 
> Regards,
> 
> Nigel
> 
> -- 
> Nigel Cunningham
> 495 St Georges Road South, Hastings 4201, New Zealand
> 
> Be diligent to present yourself approved to God as a workman who does
> not need to be ashamed, handling accurately the word of truth.
> 	-- 2 Timothy 2:14, NASB.
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-05 19:25   ` Nigel Cunningham
  2003-04-05 20:58     ` Andre Hedrick
@ 2003-04-06 15:03     ` Alan Cox
  2003-04-06 20:59       ` Nigel Cunningham
                         ` (2 more replies)
  2003-04-10 18:37     ` Pavel Machek
  2 siblings, 3 replies; 26+ messages in thread
From: Alan Cox @ 2003-04-06 15:03 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: Linux Kernel Mailing List

On Sad, 2003-04-05 at 20:25, Nigel Cunningham wrote:
> Hi.
> 
> On Sun, 2003-04-06 at 04:46, Alan Cox wrote:
> > Drive->blocked is a single bit field. Its not a counting lock. Either
> > we are blocked or not.
> 
> Okay. We need a different solution then, but the problem remains - the
> driver can get multiple, nexted calls to block and unblock. Can it
> become a counting lock?

Blocked is a binary power management described state, its not a lock.
What are you actually trying to do ?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-06 15:03     ` Alan Cox
@ 2003-04-06 20:59       ` Nigel Cunningham
  2003-04-06 21:02         ` Alan Cox
  2003-04-06 21:12       ` Nigel Cunningham
  2003-04-10 18:38       ` Pavel Machek
  2 siblings, 1 reply; 26+ messages in thread
From: Nigel Cunningham @ 2003-04-06 20:59 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

Ok. I just figured that if there are two (say) calls to suspend a
driver, there should be two calls to resume it before it actually is
resumed. Does the spec say anything about how multiple suspends &
resumes should be handled?

Regards,

Nigel

> Blocked is a binary power management described state, its not a lock.
> What are you actually trying to do ?
-- 
Nigel Cunningham
495 St Georges Road South, Hastings 4201, New Zealand

Be diligent to present yourself approved to God as a workman who does
not need to be ashamed, handling accurately the word of truth.
	-- 2 Timothy 2:14, NASB.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-06 21:12       ` Nigel Cunningham
@ 2003-04-06 21:00         ` Alan Cox
  2003-04-06 21:58           ` Nigel Cunningham
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Cox @ 2003-04-06 21:00 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: Linux Kernel Mailing List

On Sul, 2003-04-06 at 22:12, Nigel Cunningham wrote:
> Hi again.
> 
> I've just reread your message. Isn't it used as a lock at the moment?
> IIRC, you get a BUG if you try to use the driver while its blocked.
> Perhaps that's where I'm getting confused.

There are multiple "blocked" fields to deal with. drive->blocked
indicates we can't feed it commands because it is blocked due to
power management.

idedisk_suspend sets the drive->blocked field so that we panic if
anything is sent to the disk after we turned it off (since it
would be a corrupter).

idedisk_resume is called on the resume path and marks the drive
as safe to use.

So if you hit that bug, you did I/O after shutting the drive down,
which is not allowed.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-06 20:59       ` Nigel Cunningham
@ 2003-04-06 21:02         ` Alan Cox
  0 siblings, 0 replies; 26+ messages in thread
From: Alan Cox @ 2003-04-06 21:02 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: Linux Kernel Mailing List

On Sul, 2003-04-06 at 21:59, Nigel Cunningham wrote:
> Ok. I just figured that if there are two (say) calls to suspend a
> driver, there should be two calls to resume it before it actually is
> resumed. Does the spec say anything about how multiple suspends &
> resumes should be handled?

The IDE layer assumes the power management layer doesn't generate
bogus repeated suspend requests. If it does then lots of drivers
are broken and the core code needs fixing.

If its because there are two different notifiers trying to park
the IDE disk that might make more sense.

However getting this wrong loses data so I will accept no changes to
this code until someone precisely explains what is going on so we
can be sure the fix itself is safe.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-06 15:03     ` Alan Cox
  2003-04-06 20:59       ` Nigel Cunningham
@ 2003-04-06 21:12       ` Nigel Cunningham
  2003-04-06 21:00         ` Alan Cox
  2003-04-10 18:38       ` Pavel Machek
  2 siblings, 1 reply; 26+ messages in thread
From: Nigel Cunningham @ 2003-04-06 21:12 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

Hi again.

I've just reread your message. Isn't it used as a lock at the moment?
IIRC, you get a BUG if you try to use the driver while its blocked.
Perhaps that's where I'm getting confused.

Regards,

Nigel

On Mon, 2003-04-07 at 03:03, Alan Cox wrote:
> Blocked is a binary power management described state, its not a lock.
> What are you actually trying to do ?



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-06 21:00         ` Alan Cox
@ 2003-04-06 21:58           ` Nigel Cunningham
  0 siblings, 0 replies; 26+ messages in thread
From: Nigel Cunningham @ 2003-04-06 21:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

On Mon, 2003-04-07 at 09:00, Alan Cox wrote:
> There are multiple "blocked" fields to deal with. drive->blocked
> indicates we can't feed it commands because it is blocked due to
> power management.
> 
> idedisk_suspend sets the drive->blocked field so that we panic if
> anything is sent to the disk after we turned it off (since it
> would be a corrupter).
> 
> idedisk_resume is called on the resume path and marks the drive
> as safe to use.
> 
> So if you hit that bug, you did I/O after shutting the drive down,
> which is not allowed.

Okay. So the right behaviour is to ignore multiple idedisk_suspend calls
and ignore multiple idedisk_resume calls?

Regards,

Nigel

-- 
Nigel Cunningham
495 St Georges Road South, Hastings 4201, New Zealand

Be diligent to present yourself approved to God as a workman who does
not need to be ashamed, handling accurately the word of truth.
	-- 2 Timothy 2:14, NASB.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-05 19:25   ` Nigel Cunningham
  2003-04-05 20:58     ` Andre Hedrick
  2003-04-06 15:03     ` Alan Cox
@ 2003-04-10 18:37     ` Pavel Machek
  2003-04-11  3:30       ` Andre Hedrick
  2 siblings, 1 reply; 26+ messages in thread
From: Pavel Machek @ 2003-04-10 18:37 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: Alan Cox, Linux Kernel Mailing List

Hi!

> > Drive->blocked is a single bit field. Its not a counting lock. Either
> > we are blocked or not.
> 
> Okay. We need a different solution then, but the problem remains - the
> driver can get multiple, nexted calls to block and unblock. Can it
> become a counting lock?

Simplest fix seems to be make sure it only
reacts to one "suspending level" (suspend_save_state)
and that really should not be nested.
(If that nests, fix caller)
-- 
				Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-06 15:03     ` Alan Cox
  2003-04-06 20:59       ` Nigel Cunningham
  2003-04-06 21:12       ` Nigel Cunningham
@ 2003-04-10 18:38       ` Pavel Machek
  2003-04-10 21:55         ` Alan Cox
  2 siblings, 1 reply; 26+ messages in thread
From: Pavel Machek @ 2003-04-10 18:38 UTC (permalink / raw)
  To: Alan Cox; +Cc: Nigel Cunningham, Linux Kernel Mailing List

Hi!

> > Okay. We need a different solution then, but the problem remains - the
> > driver can get multiple, nexted calls to block and unblock. Can it
> > become a counting lock?
> 
> Blocked is a binary power management described state, its not a lock.
> What are you actually trying to do ?

He's trying to fix swsusp. Just now it likes
to BUG() for some people.
-- 
				Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-10 18:38       ` Pavel Machek
@ 2003-04-10 21:55         ` Alan Cox
  2003-04-10 23:36           ` Nigel Cunningham
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Cox @ 2003-04-10 21:55 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Nigel Cunningham, Linux Kernel Mailing List

On Thu, 2003-04-10 at 19:38, Pavel Machek wrote:
> Hi!
> 
> > > Okay. We need a different solution then, but the problem remains - the
> > > driver can get multiple, nexted calls to block and unblock. Can it
> > > become a counting lock?
> > 
> > Blocked is a binary power management described state, its not a lock.
> > What are you actually trying to do ?
> 
> He's trying to fix swsusp. Just now it likes
> to BUG() for some people.

I meant at a slightly lower and more detailed level


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-10 21:55         ` Alan Cox
@ 2003-04-10 23:36           ` Nigel Cunningham
  2003-04-13 11:17             ` Pavel Machek
  0 siblings, 1 reply; 26+ messages in thread
From: Nigel Cunningham @ 2003-04-10 23:36 UTC (permalink / raw)
  To: Alan Cox; +Cc: Pavel Machek, Linux Kernel Mailing List

Hi.

As far as ide code goes, I'm trying to:

1) Get rid of messages that don't need to appear during suspend (so the
'nice display' looks nice (hence printk changes)
2) Make sure the data actually does make it to disk before we poweroff
(hence write back cache interest)
3) Make sure ide_suspend/unsuspend do the right thing (however that's
defined).

I hadn't looked at the code yet to see why it's called twice - too busy
with other things. I did wonder if 

        device_suspend(4, SUSPEND_NOTIFY);
        device_suspend(4, SUSPEND_SAVE_STATE);
        device_suspend(4, SUSPEND_DISABLE);

might do it, but it's an untested theory.

Regards,

Nigel

On Fri, 2003-04-11 at 09:55, Alan Cox wrote:
> On Thu, 2003-04-10 at 19:38, Pavel Machek wrote:
> > Hi!
> > 
> > > > Okay. We need a different solution then, but the problem remains - the
> > > > driver can get multiple, nexted calls to block and unblock. Can it
> > > > become a counting lock?
> > > 
> > > Blocked is a binary power management described state, its not a lock.
> > > What are you actually trying to do ?
> > 
> > He's trying to fix swsusp. Just now it likes
> > to BUG() for some people.
> 
> I meant at a slightly lower and more detailed level
-- 
Nigel Cunningham
495 St Georges Road South, Hastings 4201, New Zealand

Be diligent to present yourself approved to God as a workman who does
not need to be ashamed, handling accurately the word of truth.
	-- 2 Timothy 2:14, NASB.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-10 18:37     ` Pavel Machek
@ 2003-04-11  3:30       ` Andre Hedrick
  0 siblings, 0 replies; 26+ messages in thread
From: Andre Hedrick @ 2003-04-11  3:30 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Nigel Cunningham, Alan Cox, Linux Kernel Mailing List


Hi!

<rant>
Well you authored the mess^H^H^H^Hwork for the suspend, and it fails the
binary logic of on or off.  You failed to abort the caller for attempting
to set multiple blocks.  Fix the kernel because the caller has to be
assumed dumb and brain dead ... it is user space.
</rant>

Andre Hedrick
LAD Storage Consulting Group


On Thu, 10 Apr 2003, Pavel Machek wrote:

> Hi!
> 
> > > Drive->blocked is a single bit field. Its not a counting lock. Either
> > > we are blocked or not.
> > 
> > Okay. We need a different solution then, but the problem remains - the
> > driver can get multiple, nexted calls to block and unblock. Can it
> > become a counting lock?
> 
> Simplest fix seems to be make sure it only
> reacts to one "suspending level" (suspend_save_state)
> and that really should not be nested.
> (If that nests, fix caller)
> -- 
> 				Pavel
> Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PATCH: Fixes for ide-disk.c
  2003-04-10 23:36           ` Nigel Cunningham
@ 2003-04-13 11:17             ` Pavel Machek
  0 siblings, 0 replies; 26+ messages in thread
From: Pavel Machek @ 2003-04-13 11:17 UTC (permalink / raw)
  To: Nigel Cunningham; +Cc: Alan Cox, Pavel Machek, Linux Kernel Mailing List

Hi!

> I hadn't looked at the code yet to see why it's called twice - too busy
> with other things. I did wonder if 
> 
>         device_suspend(4, SUSPEND_NOTIFY);
>         device_suspend(4, SUSPEND_SAVE_STATE);
>         device_suspend(4, SUSPEND_DISABLE);
> 
> might do it, but it's an untested theory.

ide should only care about one of those
levels.
-- 
				Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2003-04-12 20:55 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-04-05  7:31 PATCH: Fixes for ide-disk.c Nigel Cunningham
2003-04-05  9:27 ` John Bradford
2003-04-05 10:57   ` Andre Hedrick
2003-04-05 13:09     ` John Bradford
2003-04-05 16:46 ` Alan Cox
2003-04-05 19:25   ` Nigel Cunningham
2003-04-05 20:58     ` Andre Hedrick
2003-04-06 15:03     ` Alan Cox
2003-04-06 20:59       ` Nigel Cunningham
2003-04-06 21:02         ` Alan Cox
2003-04-06 21:12       ` Nigel Cunningham
2003-04-06 21:00         ` Alan Cox
2003-04-06 21:58           ` Nigel Cunningham
2003-04-10 18:38       ` Pavel Machek
2003-04-10 21:55         ` Alan Cox
2003-04-10 23:36           ` Nigel Cunningham
2003-04-13 11:17             ` Pavel Machek
2003-04-10 18:37     ` Pavel Machek
2003-04-11  3:30       ` Andre Hedrick
2003-04-05 16:46 ` Alan Cox
  -- strict thread matches above, loose matches on Subject: below --
2003-04-05 10:32 Chuck Ebbert
2003-04-05 11:30 ` Andre Hedrick
2003-04-05 18:34 Mudama, Eric
2003-04-05 19:52 ` Andre Hedrick
2003-04-05 20:17 Mudama, Eric
2003-04-05 20:54 ` Andre Hedrick

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.