* Re: noisy reservation conflicts
From: Doug Ledford @ 2004-06-15 12:54 UTC
To: Mike Anderson; +Cc: Douglas Gilbert, James Bottomley, linux-scsi mailing list
On Mon, 2004-06-14 at 15:18, Mike Anderson wrote:
> Douglas Gilbert [dougg@torque.net] wrote:
> > While testing persistent reservations it is quite common
> > to get a status of "reservation conflict". When using
> > the SG_IO ioctl this status is returned via the
> > sg_io_hdr::status field and is simple to process.
> >
> > However this lk 2.6.7-rc3 code fragment in
> > scsi_decide_disposition() [scsi_error.c] makes it very
> > noisy (on the console and in the log):
> >
> > case RESERVATION_CONFLICT:
> > printk("scsi%d (%d,%d,%d) : reservation conflict\n",
> > scmd->device->host->host_no,
> > scmd->device->channel,
> > scmd->device->id, scmd->device->lun);
> > return SUCCESS; /* causes immediate i/o error */
> >
>
> If we think it is important enough to leave we should wrap it with
> either a SCSI_LOG_MLCOMPLETE or SCSI_LOG_ERROR_RECOVERY to stay
> consistent with the scsi_decide_disposition function. Though I think the
> use of SCSI_LOG_ERROR_RECOVERY maybe should be replaced with
> SCSI_LOG_MLCOMPLETE in scsi_decide_disposition.
>
> The upper level drivers could also report similar info if this was
> removed as sd has SCSI_LOG_HLCOMPLETE that should cover sg_io ios
> through sd and sg_cmd_done has SCSI_LOG_TIMEOUT (correct?).

Sounds reasonable, but if you are going to be changing this anyway, you
might as well put in the STONITH option. In order for SCSI reservations
to be a reliable method of host interaction control on a shared device,
you need the ability for a reservation conflict to cause the machine
that got the conflict to fall over dead (aka STONITH, Shoot The Other
Node In The Head). If you don't have a remote-controlled power switch,
then you can implement this in the SCSI stack with a panic on
reservation conflict.

The basic scenario is a multi-node failover cluster: the primary
machine starts up and gets the reservation, and as long as it holds it
everything is fine. If the primary machine fails its status check by
the failover node, the failover node can cause a bus reset, steal the
reservation, and pick up where the main node left off (speaking
specifically of SCSI-2 reservations here, obviously). In that case, the
next time the main node tries to access the device, it will get a
reservation conflict, and that is the main node's signal to panic/reboot
and come back as the new failover node, with the old failover node now
the primary. A simple four-line change is all that is required to make
SCSI-2 reservations useful.
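
As a rough illustration of the idea only, here is a minimal userspace
sketch of a disposition routine with an optional fence-on-conflict
switch. Every name in it (decide_disposition, DISP_PANIC,
panic_on_reservation_conflict) is hypothetical and not taken from
scsi_error.c; the only fact borrowed from SCSI is that the
RESERVATION CONFLICT status byte is 0x18. Real kernel code would call
panic() where the sketch returns DISP_PANIC.

```c
#include <stdio.h>

/* Userspace model only; all identifiers here are hypothetical. */
enum disposition { DISP_SUCCESS, DISP_FAILED, DISP_PANIC };

#define STATUS_GOOD                 0x00
#define STATUS_RESERVATION_CONFLICT 0x18 /* SCSI status byte value */

/* Assumed tunable: nonzero means "take this node down on a conflict". */
static int panic_on_reservation_conflict;

static enum disposition decide_disposition(unsigned char status)
{
	switch (status) {
	case STATUS_GOOD:
		return DISP_SUCCESS;
	case STATUS_RESERVATION_CONFLICT:
		if (panic_on_reservation_conflict)
			return DISP_PANIC;  /* kernel code would panic() here */
		return DISP_SUCCESS;        /* command completes with an i/o error */
	default:
		return DISP_FAILED;
	}
}

/* Small helper so a caller can print what the model decided. */
static const char *disposition_name(enum disposition d)
{
	switch (d) {
	case DISP_SUCCESS: return "SUCCESS";
	case DISP_PANIC:   return "PANIC";
	default:           return "FAILED";
	}
}
```

With the switch left at 0 the conflict is handled exactly as today; a
cluster node that relies on SCSI-2 reservations for fencing would set
it to 1 so that losing the reservation takes the node down.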
--
Doug Ledford <dledford@redhat.com> 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
* [PATCH] lk 2.6.7-bk6 [Re: noisy reservation conflicts]
From: Douglas Gilbert @ 2004-06-25 6:17 UTC
To: Mike Anderson; +Cc: James Bottomley, linux-scsi
Mike Anderson wrote:
> Douglas Gilbert [dougg@torque.net] wrote:
>
>>While testing persistent reservations it is quite common
>>to get a status of "reservation conflict". When using
>>the SG_IO ioctl this status is returned via the
>>sg_io_hdr::status field and is simple to process.
>>
>>However this lk 2.6.7-rc3 code fragment in
>>scsi_decide_disposition() [scsi_error.c] makes it very
>>noisy (on the console and in the log):
>>
>> case RESERVATION_CONFLICT:
>> printk("scsi%d (%d,%d,%d) : reservation conflict\n",
>> scmd->device->host->host_no,
>> scmd->device->channel,
>> scmd->device->id, scmd->device->lun);
>> return SUCCESS; /* causes immediate i/o error */
>>
>
>
> If we think it is important enough to leave we should wrap it with
> either a SCSI_LOG_MLCOMPLETE or SCSI_LOG_ERROR_RECOVERY to stay
> consistent with the scsi_decide_disposition function. Though I think the
> use of SCSI_LOG_ERROR_RECOVERY maybe should be replaced with
> SCSI_LOG_MLCOMPLETE in scsi_decide_disposition.
>
> The upper level drivers could also report similar info if this was
> removed as sd has SCSI_LOG_HLCOMPLETE that should cover sg_io ios
> through sd and sg_cmd_done has SCSI_LOG_TIMEOUT (correct?).

Mike,

Yes, the sg driver uses SCSI_LOG_TIMEOUT to avoid other usages
and a logging loop.

Just so this thread isn't forgotten, attached is a patch that
wraps the printk in question in a SCSI_LOG_ERROR_RECOVERY
macro.

Doug Gilbert

[-- Attachment #2: scsi_error_rconflict.diff --]
[-- Type: text/x-patch, Size: 661 bytes --]
--- linux/drivers/scsi/scsi_error.c 2004-06-25 15:10:49.000000000 +1000
+++ linux/drivers/scsi/scsi_error.c267bk6rc 2004-06-25 16:04:20.161354808 +1000
@@ -1375,9 +1375,10 @@
 		return SUCCESS;
 	case RESERVATION_CONFLICT:
-		printk("scsi%d (%d,%d,%d) : reservation conflict\n",
-		       scmd->device->host->host_no, scmd->device->channel,
-		       scmd->device->id, scmd->device->lun);
+		SCSI_LOG_ERROR_RECOVERY(5, printk("scsi%d (%d,%d,%d) : "
+			"reservation conflict\n",
+			scmd->device->host->host_no, scmd->device->channel,
+			scmd->device->id, scmd->device->lun));
 		return SUCCESS; /* causes immediate i/o error */
 	default:
 		return FAILED;
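
For readers unfamiliar with the SCSI_LOG_* wrappers, the effect of the
patch can be modelled in userspace roughly as follows. This is a sketch
under stated assumptions, not the kernel's definition: the DEMO_* and
demo_* names are hypothetical, and the field layout (a 3-bit
error-recovery level at bit 0 of a packed logging word) and the
"configured level must exceed the macro's level" comparison are
assumptions about how the real macro in drivers/scsi/scsi_logging.h
behaves.

```c
#include <stdio.h>

/* Userspace model; DEMO_ and demo_ names are hypothetical. */
#define DEMO_ERROR_SHIFT 0 /* assumed bit position of the error-recovery field */
#define DEMO_ERROR_BITS  3 /* assumed field width, giving levels 0..7 */

static unsigned int demo_logging_level; /* packed per-facility levels */
static int demo_lines_logged;           /* counts messages actually emitted */

/* Extract one facility's level from the packed word. */
#define DEMO_LEVEL(shift, bits) \
	((demo_logging_level >> (shift)) & ((1u << (bits)) - 1))

/* Run CMD only when the configured level exceeds LEVEL (assumed rule). */
#define DEMO_LOG_ERROR_RECOVERY(level, cmd)                                    \
	do {                                                                   \
		if (DEMO_LEVEL(DEMO_ERROR_SHIFT, DEMO_ERROR_BITS) > (level)) { \
			cmd;                                                   \
		}                                                              \
	} while (0)

static void log_conflict(int host, int channel, int id, int lun)
{
	demo_lines_logged++;
	printf("scsi%d (%d,%d,%d) : reservation conflict\n",
	       host, channel, id, lun);
}

/* Mirrors the patched case: the message is now gated, not unconditional. */
static void report_conflict(int host, int channel, int id, int lun)
{
	DEMO_LOG_ERROR_RECOVERY(5, log_conflict(host, channel, id, lun));
}
```

Under these assumptions the message stays silent at the default level
of 0 and only appears once the error-recovery level is raised above 5,
which is the quiet-by-default behaviour the thread asks for.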