public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] ipr: Fix for adapter shutdown issue
@ 2005-06-15 15:15 brking
  2005-06-15 15:17 ` Christoph Hellwig
  2005-06-15 15:28 ` James Bottomley
  0 siblings, 2 replies; 12+ messages in thread
From: brking @ 2005-06-15 15:15 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi, brking, haren



James,
Haren found this in some recent kexec testing. Without this fix, the
ipr adapter's write cache never gets flushed. I'd like to see this
pushed into 2.6.12 if possible.

Thanks
Brian

From: Haren Myneni <haren@us.ibm.com>
Subject: [PATCH] Fix for IPR shutdown  issue

Hi Brian,
    The ipr_shutdown function will never get called because of changes 
made in the recent kernel. pci_driver->driver->shutdown is reset to 
pci_device_shutdown() when the the IPR driver is registered. Hence, the 
normal kexec boot was not successful. The following patch should fix 
this problem. Please send it to Andrew if you are Ok with this fix.

Thanks
Haren

Signed-off-by: Brian King <brking@us.ibm.com>
---

 linux-2.6-bjking1/drivers/scsi/ipr.c |   10 ++++------
 1 files changed, 4 insertions(+), 6 deletions(-)

diff -puN drivers/scsi/ipr.c~ipr_shutdown_fix drivers/scsi/ipr.c
--- linux-2.6/drivers/scsi/ipr.c~ipr_shutdown_fix	2005-06-15 08:33:02.000000000 -0500
+++ linux-2.6-bjking1/drivers/scsi/ipr.c	2005-06-15 08:33:18.000000000 -0500
@@ -5991,7 +5991,7 @@ static int __devinit ipr_probe(struct pc
 
 /**
  * ipr_shutdown - Shutdown handler.
- * @dev:	device struct
+ * @pdev:	PCI device struct
  *
  * This function is invoked upon system shutdown/reboot. It will issue
  * an adapter shutdown to the adapter to flush the write cache.
@@ -5999,9 +5999,9 @@ static int __devinit ipr_probe(struct pc
  * Return value:
  * 	none
  **/
-static void ipr_shutdown(struct device *dev)
+static void ipr_shutdown(struct pci_dev *pdev)
 {
-	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(to_pci_dev(dev));
+	struct ipr_ioa_cfg *ioa_cfg = pci_get_drvdata(pdev);
 	unsigned long lock_flags = 0;
 
 	spin_lock_irqsave(ioa_cfg->host->host_lock, lock_flags);
@@ -6047,9 +6047,7 @@ static struct pci_driver ipr_driver = {
 	.id_table = ipr_pci_table,
 	.probe = ipr_probe,
 	.remove = ipr_remove,
-	.driver = {
-		.shutdown = ipr_shutdown,
-	},
+	.shutdown = ipr_shutdown
 };
 
 /**
_

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] ipr: Fix for adapter shutdown issue
  2005-06-15 15:15 [PATCH 1/1] ipr: Fix for adapter shutdown issue brking
@ 2005-06-15 15:17 ` Christoph Hellwig
  2005-06-15 16:01   ` Greg KH
  2005-06-15 15:28 ` James Bottomley
  1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2005-06-15 15:17 UTC (permalink / raw)
  To: brking; +Cc: James.Bottomley, linux-scsi, haren, gregkh

On Wed, Jun 15, 2005 at 10:15:22AM -0500, brking@us.ibm.com wrote:
> 
> 
> James,
> Haren found this in some recent kexec testing. Without this fix, the
> ipr adapter's write cache never gets flushed. I'd like to see this
> pushed into 2.6.12 if possible.

Umm, there's tons of other drivers doing this. Greg, please revert
your pci-layer ->shutdown thing for 2.6.12 and make convert all drivers
when re-introducing it.  Or even better make sure it doesn't break
drivers setting ->shutdown themselves.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] ipr: Fix for adapter shutdown issue
  2005-06-15 15:15 [PATCH 1/1] ipr: Fix for adapter shutdown issue brking
  2005-06-15 15:17 ` Christoph Hellwig
@ 2005-06-15 15:28 ` James Bottomley
  2005-06-15 15:34   ` Brian King
  1 sibling, 1 reply; 12+ messages in thread
From: James Bottomley @ 2005-06-15 15:28 UTC (permalink / raw)
  To: brking; +Cc: SCSI Mailing List, haren

On Wed, 2005-06-15 at 10:15 -0500, brking@us.ibm.com wrote:
> Haren found this in some recent kexec testing. Without this fix, the
> ipr adapter's write cache never gets flushed. I'd like to see this
> pushed into 2.6.12 if possible.

Just out of curiosity, what are you doing with the SYNCHRONIZE_CACHE
that comes down from sd on shutdown?

James



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] ipr: Fix for adapter shutdown issue
  2005-06-15 15:28 ` James Bottomley
@ 2005-06-15 15:34   ` Brian King
  2005-06-15 15:47     ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: Brian King @ 2005-06-15 15:34 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI Mailing List, haren

James Bottomley wrote:
> On Wed, 2005-06-15 at 10:15 -0500, brking@us.ibm.com wrote:
> 
>>Haren found this in some recent kexec testing. Without this fix, the
>>ipr adapter's write cache never gets flushed. I'd like to see this
>>pushed into 2.6.12 if possible.
> 
> 
> Just out of curiosity, what are you doing with the SYNCHRONIZE_CACHE
> that comes down from sd on shutdown?

For scsi disks attached to an ipr adapter, the SYNCHRONIZE_CACHE command
gets sent to the disk and works just like when attached to any other
HBA. The ipr disk array devices, however, do not support the SYNC_CACHE command,
nor do they support the caching mode page, so SYNC_CACHE never gets sent.
So, the shutdown hook is needed to flush the adapter's battery backed write
cache for all attached disk arrays on system shutdown.

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] ipr: Fix for adapter shutdown issue
  2005-06-15 15:34   ` Brian King
@ 2005-06-15 15:47     ` James Bottomley
  2005-06-15 16:11       ` Brian King
  0 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2005-06-15 15:47 UTC (permalink / raw)
  To: brking; +Cc: SCSI Mailing List, haren

On Wed, 2005-06-15 at 10:34 -0500, Brian King wrote:
> For scsi disks attached to an ipr adapter, the SYNCHRONIZE_CACHE command
> gets sent to the disk and works just like when attached to any other
> HBA. The ipr disk array devices, however, do not support the SYNC_CACHE command,
> nor do they support the caching mode page, so SYNC_CACHE never gets sent.
> So, the shutdown hook is needed to flush the adapter's battery backed write
> cache for all attached disk arrays on system shutdown.

Well, that means you have a whole lot more trouble in 2.6.12 than simply
failing to flush a cache on shutdown.  The barrier code now uses cache
synchronization commands, so if you crash the on-disk image will not be
what a journalling filesystem expects.  As long as the battery keeps the
information alive in the cache, I assume this corrects itself when ipr
next powers up, but if you trusted this, you wouldn't be fussing about
the shutdown cache flush, now would you?

James



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] ipr: Fix for adapter shutdown issue
  2005-06-15 15:17 ` Christoph Hellwig
@ 2005-06-15 16:01   ` Greg KH
  2005-06-15 16:14     ` Brian King
  0 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2005-06-15 16:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: brking, James.Bottomley, linux-scsi, haren

On Wed, Jun 15, 2005 at 04:17:42PM +0100, Christoph Hellwig wrote:
> On Wed, Jun 15, 2005 at 10:15:22AM -0500, brking@us.ibm.com wrote:
> > 
> > 
> > James,
> > Haren found this in some recent kexec testing. Without this fix, the
> > ipr adapter's write cache never gets flushed. I'd like to see this
> > pushed into 2.6.12 if possible.
> 
> Umm, there's tons of other drivers doing this. Greg, please revert
> your pci-layer ->shutdown thing for 2.6.12 and make convert all drivers
> when re-introducing it.  Or even better make sure it doesn't break
> drivers setting ->shutdown themselves.

I'm sorry, what pci-layer ->shutdown thing?  Have a pointer to the patch
in question that people are having problems with?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] ipr: Fix for adapter shutdown issue
  2005-06-15 15:47     ` James Bottomley
@ 2005-06-15 16:11       ` Brian King
  2005-06-15 16:44         ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: Brian King @ 2005-06-15 16:11 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI Mailing List, haren

James Bottomley wrote:
> On Wed, 2005-06-15 at 10:34 -0500, Brian King wrote:
> 
>>For scsi disks attached to an ipr adapter, the SYNCHRONIZE_CACHE command
>>gets sent to the disk and works just like when attached to any other
>>HBA. The ipr disk array devices, however, do not support the SYNC_CACHE command,
>>nor do they support the caching mode page, so SYNC_CACHE never gets sent.
>>So, the shutdown hook is needed to flush the adapter's battery backed write
>>cache for all attached disk arrays on system shutdown.
> 
> 
> Well, that means you have a whole lot more trouble in 2.6.12 than simply
> failing to flush a cache on shutdown.  The barrier code now uses cache
> synchronization commands, so if you crash the on-disk image will not be
> what a journalling filesystem expects.  As long as the battery keeps the
> information alive in the cache, I assume this corrects itself when ipr
> next powers up, but if you trusted this, you wouldn't be fussing about
> the shutdown cache flush, now would you?

System crash and controlled power off are two different things. Yes, if the
adapter is not told about the shutdown so it can flush its cache, the
data will be maintained, but it can only be maintained for as long as
the battery lasts. Currently, with 2.6.12-rc, if you do a normal power off
and leave the system powered off for a long enough period of time (usually
a week or so, depending on the battery on the card), you will suffer data
loss. This is not acceptable. With 2.6.11, the adapter is told to flush
the write cache on a controlled power off, so the system can remain powered
off indefinitely with no data loss.

Regarding supporting the sync cache command to a disk array, I did consider
requesting support be added to the adapter microcode, but decided against it
since it would make a RAID array perform horribly and it did not seem to be
the appropriate model to follow for NV write cache. Some of the ipr adapters
have over 500 MB of write cache, which can take an awful long time to flush,
much longer than you would want for a journal barrier. So, all the barrier/
SYNC_CACHE code works great for a volatile write cache on a disk, but
doesn't necessarily seem like the best model for large non-volatile write
caches for RAID adapters.


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] ipr: Fix for adapter shutdown issue
  2005-06-15 16:01   ` Greg KH
@ 2005-06-15 16:14     ` Brian King
  0 siblings, 0 replies; 12+ messages in thread
From: Brian King @ 2005-06-15 16:14 UTC (permalink / raw)
  To: Greg KH; +Cc: Christoph Hellwig, James.Bottomley, linux-scsi, haren

Greg KH wrote:
> On Wed, Jun 15, 2005 at 04:17:42PM +0100, Christoph Hellwig wrote:
> 
>>On Wed, Jun 15, 2005 at 10:15:22AM -0500, brking@us.ibm.com wrote:
>>
>>>
>>>James,
>>>Haren found this in some recent kexec testing. Without this fix, the
>>>ipr adapter's write cache never gets flushed. I'd like to see this
>>>pushed into 2.6.12 if possible.
>>
>>Umm, there's tons of other drivers doing this. Greg, please revert
>>your pci-layer ->shutdown thing for 2.6.12 and make convert all drivers
>>when re-introducing it.  Or even better make sure it doesn't break
>>drivers setting ->shutdown themselves.
> 
> 
> I'm sorry, what pci-layer ->shutdown thing?  Have a pointer to the patch
> in question that people are having problems with?

This is the one:

http://marc.theaimsgroup.com/?l=linux-pci&m=111751146113738&w=2



-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] ipr: Fix for adapter shutdown issue
  2005-06-15 16:11       ` Brian King
@ 2005-06-15 16:44         ` James Bottomley
  2005-06-15 17:05           ` Patrick Mansfield
  0 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2005-06-15 16:44 UTC (permalink / raw)
  To: brking; +Cc: SCSI Mailing List, haren

On Wed, 2005-06-15 at 11:11 -0500, Brian King wrote:
> System crash and controlled power off are two different things. Yes, if the
> adapter is not told about the shutdown so it can flush its cache, the
> data will be maintained, but it can only be maintained for as long as
> the battery lasts. Currently, with 2.6.12-rc, if you do a normal power off
> and leave the system powered off for a long enough period of time (usually
> a week or so, depending on the battery on the card), you will suffer data
> loss. This is not acceptable. With 2.6.11, the adapter is told to flush
> the write cache on a controlled power off, so the system can remain powered
> off indefinitely with no data loss.
> 
> Regarding supporting the sync cache command to a disk array, I did consider
> requesting support be added to the adapter microcode, but decided against it
> since it would make a RAID array perform horribly and it did not seem to be
> the appropriate model to follow for NV write cache. Some of the ipr adapters
> have over 500 MB of write cache, which can take an awful long time to flush,
> much longer than you would want for a journal barrier. So, all the barrier/
> SYNC_CACHE code works great for a volatile write cache on a disk, but
> doesn't necessarily seem like the best model for large non-volatile write
> caches for RAID adapters.

Well, I still think you're operating unsafely ... some people let these
batteries run out, you know ...

To fix this, add a

blk_queue_ordered(sdev->request_queue, QUEUE_ORDERED_NONE);

to the slave configure routine for each of the devices that doesn't
synchronize the cache.  That should correct the SCSI layer assumptions.

James



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] ipr: Fix for adapter shutdown issue
  2005-06-15 16:44         ` James Bottomley
@ 2005-06-15 17:05           ` Patrick Mansfield
  2005-06-15 17:23             ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: Patrick Mansfield @ 2005-06-15 17:05 UTC (permalink / raw)
  To: James Bottomley; +Cc: brking, SCSI Mailing List, haren

On Wed, Jun 15, 2005 at 11:44:59AM -0500, James Bottomley wrote:

> 
> Well, I still think you're operating unsafely ... some people let these
> batteries run out, you know ...
> 
> To fix this, add a
> 
> blk_queue_ordered(sdev->request_queue, QUEUE_ORDERED_NONE);
> 
> to the slave configure routine for each of the devices that doesn't
> synchronize the cache.  That should correct the SCSI layer assumptions.

I don't get it.

First in scsi_alloc_queue, the flush/sync code is only used if we don't
have ordered tags, and if the adapter driver explicitly allows it
(shost->order_flush).

Plus we clear order_flush in scsi_host_alloc if can_queue > 1.

So ipr will never get those sync cache commands, correct?

And we default to QUEUE_ORDERED_NONE (0) if ordered_tag and ordered_flush
are not set.

But I can't find any where that we set ordered_tag at all, am I missing
something??? Let alone in ipr.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] ipr: Fix for adapter shutdown issue
  2005-06-15 17:05           ` Patrick Mansfield
@ 2005-06-15 17:23             ` James Bottomley
  2005-06-16 12:00               ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2005-06-15 17:23 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: brking, SCSI Mailing List, haren

On Wed, 2005-06-15 at 10:05 -0700, Patrick Mansfield wrote:
> First in scsi_alloc_queue, the flush/sync code is only used if we don't
> have ordered tags, and if the adapter driver explicitly allows it
> (shost->order_flush).
> 
> Plus we clear order_flush in scsi_host_alloc if can_queue > 1.
> 
> So ipr will never get those sync cache commands, correct?

Actually, yes.  I thought Jens turned it on globally, but apparently
it's only set for the sata controllers.

the can_queue check is a bit mysterious since non TCQ devices should
have can_queue == 2 (so they have one command executing and one command
ready to roll).

> And we default to QUEUE_ORDERED_NONE (0) if ordered_tag and ordered_flush
> are not set.
> 
> But I can't find any where that we set ordered_tag at all, am I missing
> something??? Let alone in ipr.

I think that was also added by the barrier patch set. Although it's well
known what the difficulties of using ordered tag queueing to implement
barriers are:

1) Race in queuecommand() where barrier gets BUSY or QUEUE_FULL but next
command is accepted before we see this.
2) Error handling is not robust to ordering.
3) Qerr bit of the control mode page needs to be set correctly.

James



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] ipr: Fix for adapter shutdown issue
  2005-06-15 17:23             ` James Bottomley
@ 2005-06-16 12:00               ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2005-06-16 12:00 UTC (permalink / raw)
  To: James Bottomley; +Cc: Patrick Mansfield, brking, SCSI Mailing List, haren

On Wed, Jun 15 2005, James Bottomley wrote:
> On Wed, 2005-06-15 at 10:05 -0700, Patrick Mansfield wrote:
> > First in scsi_alloc_queue, the flush/sync code is only used if we don't
> > have ordered tags, and if the adapter driver explicitly allows it
> > (shost->order_flush).
> > 
> > Plus we clear order_flush in scsi_host_alloc if can_queue > 1.
> > 
> > So ipr will never get those sync cache commands, correct?
> 
> Actually, yes.  I thought Jens turned it on globally, but apparently
> it's only set for the sata controllers.

Yeah, I figured most other scsi controllers supported TCQ...

> the can_queue check is a bit mysterious since non TCQ devices should
> have can_queue == 2 (so they have one command executing and one command
> ready to roll).

But that will cause the mid layer to invoke ->queuecommand() when you
already have one command queued, thus the driver needs to check if it
really can queue one more. I greatly prefer the driver setting the exact
depth instead. If you want to have the next command pre-setup, you
should do so in the mid layer.

> > And we default to QUEUE_ORDERED_NONE (0) if ordered_tag and ordered_flush
> > are not set.
> > 
> > But I can't find any where that we set ordered_tag at all, am I missing
> > something??? Let alone in ipr.
> 
> I think that was also added by the barrier patch set. Although it's well
> known what the difficulties of using ordered tag queueing to implement
> barriers are:
> 
> 1) Race in queuecommand() where barrier gets BUSY or QUEUE_FULL but next
> command is accepted before we see this.
> 2) Error handling is not robust to ordering.
> 3) Qerr bit of the control mode page needs to be set correctly.

Some work still left to fully supported barriers with tags, Tejun is
making great progress though.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2005-06-16 11:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-06-15 15:15 [PATCH 1/1] ipr: Fix for adapter shutdown issue brking
2005-06-15 15:17 ` Christoph Hellwig
2005-06-15 16:01   ` Greg KH
2005-06-15 16:14     ` Brian King
2005-06-15 15:28 ` James Bottomley
2005-06-15 15:34   ` Brian King
2005-06-15 15:47     ` James Bottomley
2005-06-15 16:11       ` Brian King
2005-06-15 16:44         ` James Bottomley
2005-06-15 17:05           ` Patrick Mansfield
2005-06-15 17:23             ` James Bottomley
2005-06-16 12:00               ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox