SYNCHRONIZE_CACHE command is not retried

* SYNCHRONIZE_CACHE command is not retried
From: Hannes Reinecke @ 2010-05-04 12:23 UTC
  To: SCSI Mailing List

Hi all,

I'm facing an issue here where the 'SYNCHRONIZE CACHE' command is not retried:

[  652.602637] sd 0:0:0:0: [sda] Send: 
[  652.602640] sd 0:0:0:0: [sda] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[  652.604943] sd 0:0:0:0: [sda] Done: SUCCESS
[  652.604947] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[  652.604951] sd 0:0:0:0: [sda] CDB: Synchronize Cache(10): 35 00 00 00 00 00 00 00 00 00
[  652.604958] sd 0:0:0:0: [sda] Sense Key : Unit Attention [current] 
[  652.604962] sd 0:0:0:0: [sda] Add. Sense: Reported luns data has changed
[  652.604972] sd 0:0:0:0: [sda] Send: 
[  652.604974] sd 0:0:0:0: [sda] CDB: Write(10): 2a 08 01 58 7b d6 00 00 08 00
[  652.605176] sd 0:0:0:0: [sda] Done: SUCCESS
[  652.605179] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[  652.605182] sd 0:0:0:0: [sda] CDB: Write(10): 2a 08 01 58 7b d6 00 00 08 00
[  652.605190] end_request: I/O error, dev sda, sector 22576086
[  652.605194] Buffer I/O error on device sda2, logical block 2295882
[  652.605196] lost page write due to I/O error on sda2
[  652.605207] Aborting journal on device sda2.
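The 'Unit Attention' / 'Reported luns data has changed' lines above are decoded from SPC fixed-format sense data (sense key 0x6, ASC/ASCQ 0x3f/0x0e). Below is a minimal user-space sketch of that decoding. It assumes fixed format (response code 0x70 or 0x71), and the struct and helper names are hypothetical, not kernel API. The length calculation uses the same arithmetic as the scsi_io_completion() excerpt quoted further down the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as the kernel's SCSI_SENSE_BUFFERSIZE. */
#define SENSE_BUFFERSIZE 96

/* SPC sense key and ASC/ASCQ matching the log above. */
#define SENSE_KEY_UNIT_ATTENTION   0x6
#define ASC_REPORTED_LUNS_CHANGED  0x3f
#define ASCQ_REPORTED_LUNS_CHANGED 0x0e

/* Hypothetical container for the decoded fields. */
struct sense_fields {
	uint8_t response_code; /* 0x70 = current, 0x71 = deferred */
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* Valid sense length: 8 header bytes plus the additional sense length
 * in byte 7, clamped to the buffer size. */
static size_t sense_len(const uint8_t *buf)
{
	size_t len;

	assert(buf != NULL);
	len = 8 + (size_t)buf[7];
	return len > SENSE_BUFFERSIZE ? SENSE_BUFFERSIZE : len;
}

/* Decode fixed-format sense data; returns 0 on success, -1 if the data
 * is too short or not in fixed format. */
static int decode_fixed_sense(const uint8_t *buf, size_t len,
			      struct sense_fields *out)
{
	assert(buf != NULL && out != NULL);
	if (len < 14)
		return -1;
	out->response_code = buf[0] & 0x7f; /* bit 7 is the VALID bit */
	if (out->response_code != 0x70 && out->response_code != 0x71)
		return -1;
	out->sense_key = buf[2] & 0x0f;
	out->asc = buf[12];
	out->ascq = buf[13];
	return 0;
}

/* True for the "Reported luns data has changed" unit attention. */
static int is_luns_changed_ua(const struct sense_fields *s)
{
	return s->sense_key == SENSE_KEY_UNIT_ATTENTION &&
	       s->asc == ASC_REPORTED_LUNS_CHANGED &&
	       s->ascq == ASCQ_REPORTED_LUNS_CHANGED;
}
```

For the unit attention in the log, a buffer with byte 0 = 0x70, byte 2 = 0x06, byte 7 = 10 and bytes 12/13 = 0x3f/0x0e gives a valid length of 18, and is_luns_changed_ua() returns true.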

The 'SYNCHRONIZE CACHE' command is inserted by this function in sd.c:

static void sd_prepare_flush(struct request_queue *q, struct request *rq)
{
	rq->cmd_type = REQ_TYPE_BLOCK_PC;	/* issued as a SCSI pass-through request */
	rq->timeout = SD_TIMEOUT;
	rq->cmd[0] = SYNCHRONIZE_CACHE;		/* opcode 0x35 */
	rq->cmd_len = 10;			/* SYNCHRONIZE CACHE(10) CDB */
}

However, as the command type is set to REQ_TYPE_BLOCK_PC, the request is
not retried by the SCSI midlayer, but rather passed back upwards.
'Upwards' here being the block layer, which has no truck with retrying.
Bad.
The same sense code is retried when it occurs during normal READ/WRITE
commands.
So it really would make sense to have it retried here, too.
Shouldn't the flush command be marked as 'REQ_TYPE_SPECIAL' here?
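A simplified user-space model of the behaviour described above: BLOCK_PC requests get their error handed straight back, while other request types reach the sense-key switch that requeues a UNIT ATTENTION on non-removable devices. This is a sketch under the assumptions in this message. The enums and complete_cmd() are hypothetical, not kernel API, and treating REQ_SPECIAL like a filesystem request reflects the proposal here rather than confirmed kernel behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the block-layer request types. */
enum req_type { REQ_FS, REQ_BLOCK_PC, REQ_SPECIAL };

/* Hypothetical outcome of completing a command that returned sense data. */
enum completion { COMPLETE_OK, REQUEUE, FAIL_EIO };

/*
 * Model only: BLOCK_PC requests hand the error straight back (-EIO);
 * other types reach the UNIT ATTENTION case, which requeues the command
 * on non-removable devices and fails it on removable ones.
 */
static enum completion complete_cmd(enum req_type type, bool unit_attention,
				    bool removable)
{
	assert(type == REQ_FS || type == REQ_BLOCK_PC || type == REQ_SPECIAL);

	if (!unit_attention)
		return COMPLETE_OK;
	if (type == REQ_BLOCK_PC)
		return FAIL_EIO;	/* passed upwards; block layer does not retry */
	if (removable)
		return FAIL_EIO;	/* media change: refuse further access */
	return REQUEUE;			/* power glitch or bus reset: retry */
}
```

Under these assumptions, complete_cmd(REQ_BLOCK_PC, true, false) yields FAIL_EIO, matching the failed flush in the log, while the same sense on a REQ_FS request yields REQUEUE.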

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: SYNCHRONIZE_CACHE command is not retried
From: Bernd Schubert @ 2010-05-04 12:53 UTC
  To: Hannes Reinecke; +Cc: SCSI Mailing List

On Tuesday 04 May 2010, Hannes Reinecke wrote:
> Hi all,
> 
> I'm facing an issue here where the 'SYNCHRONIZE CACHE' command is not
>  retried:


Interesting that suddenly several people are running into this, when I noticed
it long ago. Hmm, you work for SUSE; maybe you got the ticket I asked
our customer to open for their SLES system? ;)

Recent discussion is here:
http://kerneltrap.org/mailarchive/linux-scsi/2010/4/19/6884638

Sorry, I didn't have time yet to update the patch there.


Cheers,
Bernd


* Re: SYNCHRONIZE_CACHE command is not retried
From: Hannes Reinecke @ 2010-05-04 13:07 UTC
  To: Bernd Schubert; +Cc: SCSI Mailing List

Bernd Schubert wrote:
> Recent discussion is here:
> http://kerneltrap.org/mailarchive/linux-scsi/2010/4/19/6884638
> 
> Sorry, I didn't have time yet to update the patch there.
> 
Well, yes, and no.

Your patch focused primarily on the SYNC CACHE command as sent
from e.g. sd_suspend().
There it's quite easy, as one just has to intercept the return
values and everything's dandy.

sd_prepare_flush(), OTOH, just prepares the command and hopes
the lower levels will do the right thing.
Which, apparently, they don't.
And setting 'retries' or 'timeout' wouldn't help here at all,
as the number of retries is never evaluated:

scsi_check_sense() returns 'SUCCESS', causing
scsi_decide_disposition() to never evaluate ->retries.
Then (eventually) scsi_io_completion() is called,
which flags an error:

	if (blk_pc_request(req)) { /* SG_IO ioctl from block level */
		req->errors = result;
		if (result) {
			if (sense_valid && req->sense) {
				/*
				 * SG_IO wants current and deferred errors
				 */
				int len = 8 + cmd->sense_buffer[7];

				if (len > SCSI_SENSE_BUFFERSIZE)
					len = SCSI_SENSE_BUFFERSIZE;
				memcpy(req->sense, cmd->sense_buffer,  len);
				req->sense_len = len;
			}
			if (!sense_deferred)
				error = -EIO;
		}
		/* ... */
	}


which ends up in the block layer as an I/O error, causing the journal abort.
At least, that's my interpretation.
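The disposition chain described above can be modelled in a few lines: the retry counter only matters when sense evaluation asks for a retry, so a verdict of SUCCESS bypasses it no matter what 'allowed' is set to. The names below are hypothetical stand-ins, and the branch where exhausted retries are reported back as success mirrors my understanding of scsi_decide_disposition(); treat it as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the midlayer's disposition codes. */
enum disposition { DISP_SUCCESS, DISP_NEEDS_RETRY };

/* Hypothetical command state: retries done so far and the allowed maximum. */
struct cmd_state {
	int retries;
	int allowed;
};

/*
 * Model only: the retry counter is consulted solely when the sense
 * verdict asks for a retry. A DISP_SUCCESS verdict is returned as-is,
 * so 'allowed' never matters in that case.
 */
static enum disposition decide_disposition(enum disposition sense_verdict,
					   struct cmd_state *cmd)
{
	assert(cmd != NULL);
	if (sense_verdict == DISP_SUCCESS)
		return DISP_SUCCESS;
	if (++cmd->retries <= cmd->allowed)
		return DISP_NEEDS_RETRY;
	return DISP_SUCCESS; /* retries exhausted: report back upwards */
}
```

For example, with allowed = 5 and a DISP_SUCCESS verdict, decide_disposition() returns DISP_SUCCESS and leaves retries at 0, which is why raising 'retries' on the flush request alone would not change anything.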

So by simply using e.g. 'REQ_TYPE_SPECIAL' we would avoid
this trap and actually retry the command here:

	if (sense_valid && !sense_deferred) {
		switch (sshdr.sense_key) {
		case UNIT_ATTENTION:
			if (cmd->device->removable) {
				/* Detected disc change.  Set a bit
				 * and quietly refuse further access.
				 */
				cmd->device->changed = 1;
				scsi_end_request(cmd, -EIO, this_count, 1);
				return;
			} else {
				/* Must have been a power glitch, or a
				 * bus reset.  Could not have been a
				 * media change, so we just retry the
				 * request and see what happens.
				 */
				scsi_requeue_command(q, cmd);
				return;
			}
			break;
		/* ... other sense keys ... */
		}
		/* ... */
	}

provided that using REQ_TYPE_SPECIAL is in fact correct here.

Let's see what the powers that be say to this reasoning.

Cheers,

Hannes

* Re: SYNCHRONIZE_CACHE command is not retried
From: James Bottomley @ 2010-05-04 14:33 UTC
  To: Hannes Reinecke; +Cc: Bernd Schubert, SCSI Mailing List

On Tue, 2010-05-04 at 15:07 +0200, Hannes Reinecke wrote:
> Let's see what the powers that be say to this reasoning.

The actual powers that be are on holiday at the moment; I'm just the dog
sitter.  However, I don't think looping forever on unit attention is a
good idea (there are known error cases where devices return unit
attention forever).  If the device is set up with device mapper, there should
already be a device_handler module intercepting this sense code, so we
shouldn't switch paths because of it.  If there's no device mapper, I
think I'd really rather just add the usual number of retries to the sync
cache command.
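The suggestion here, adding a bounded number of retries rather than requeueing on every unit attention, can be sketched as a simple loop. This is a user-space model under stated assumptions: the simulated device, the result codes and the function names are hypothetical, and the retry count is a parameter rather than any particular kernel constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of issuing one SYNCHRONIZE CACHE. */
enum sync_result { SYNC_OK, SYNC_UNIT_ATTENTION };

/* Hypothetical simulated device: reports UNIT ATTENTION 'ua_left' more
 * times (or forever if 'ua_forever' is set) before succeeding. */
struct fake_dev {
	int ua_left;
	bool ua_forever;
	int issued; /* commands issued so far */
};

static enum sync_result issue_sync_cache(struct fake_dev *dev)
{
	dev->issued++;
	if (dev->ua_forever)
		return SYNC_UNIT_ATTENTION;
	if (dev->ua_left > 0) {
		dev->ua_left--;
		return SYNC_UNIT_ATTENTION;
	}
	return SYNC_OK;
}

/* Retry on UNIT ATTENTION at most 'max_retries' times, so a device that
 * reports it forever cannot make the caller loop forever. */
static enum sync_result sync_cache_bounded(struct fake_dev *dev,
					   int max_retries)
{
	enum sync_result res;
	int attempt = 0;

	assert(dev != NULL && max_retries >= 0);
	do {
		res = issue_sync_cache(dev);
	} while (res == SYNC_UNIT_ATTENTION && attempt++ < max_retries);
	return res;
}
```

With max_retries = 5, a device that reports unit attention twice succeeds on the third command, while one that reports it forever is given up on after six commands (one initial attempt plus five retries) instead of looping.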

I already said I'd be happy with a patch adding this, as well as one
allowing a user configurable timeout for the suspend problem.

James
