How to resurrect offlined SCSI devices?

public inbox for linux-scsi@vger.kernel.org

* How to resurrect offlined SCSI devices?
@ 2004-05-26 13:14 Martin Peschke3
  2004-05-26 14:45 ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Martin Peschke3 @ 2004-05-26 13:14 UTC (permalink / raw)
  To: linux-scsi

Is there a SCSI mid layer interface that allows an lldd to get a
SCSI device back online?

The SCSI mid layer sets devices offline if it fails to recover them.
Nevertheless, an lldd might see some event that indicates
that a device has become accessible again. Why not give the mid
layer a hint in order to have it re-enable the device?
One aspect that could make this harder is that the mid layer
may wish to test the device instead of accepting the lldd's indication
sight unseen. The alternative would be to try to use the device and to
offline it again if needed.

I have seen a system that loses paths (EVMS) to SCSI
devices that were set offline. Getting these paths operational
again seems to be required for a reliable failback.

Is scsi_device_resume() a good starting point?
Any thoughts?

Cheers,
Martin
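
The "try to use the device and offline it again if needed" alternative
can be sketched from user space. This is only a rough illustration, not
a mid layer interface: the function name try_resurrect is hypothetical,
the state file path and the probe command are supplied by the caller,
and it assumes a writable per-device state attribute (as mentioned
later in this thread) that accepts both "running" and "offline".

```shell
#!/bin/sh
# try_resurrect STATE_FILE PROBE_CMD [ARGS...]   (hypothetical helper)
# Optimistically set the device running, run a caller-supplied probe,
# and put the device back offline if the probe fails.
try_resurrect() {
    state_file=$1
    shift
    # Assumption: writing "running" to the state attribute re-enables I/O.
    echo running > "$state_file" || return 1
    # Run the probe command exactly as given by the caller.
    if "$@"; then
        return 0
    fi
    # Probe failed: assume the attribute also accepts "offline".
    echo offline > "$state_file"
    return 1
}
```

A caller would pass the real state attribute and some cheap read test
as the probe; which probe is adequate is a policy question this sketch
does not answer.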


* Re: How to resurrect offlined SCSI devices?
  2004-05-26 13:14 Martin Peschke3
@ 2004-05-26 14:45 ` James Bottomley
  0 siblings, 0 replies; 14+ messages in thread
From: James Bottomley @ 2004-05-26 14:45 UTC (permalink / raw)
  To: Martin Peschke3; +Cc: SCSI Mailing List

On Wed, 2004-05-26 at 08:14, Martin Peschke3 wrote:
> Is there a SCSI mid layer interface that allows an lldd to get a
> SCSI device back online?

actually

echo "running" > /sys/...<device path>.../state

will work from user land.

Really, it isn't safe for a LLD to try to bring a device back online on
its own without user intervention.

James
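
A slightly more defensive version of the one-liner above might look
like the following sketch. The function name set_scsi_state is
hypothetical, and the state file path is deliberately left to the
caller rather than guessed here.

```shell
#!/bin/sh
# set_scsi_state STATE_FILE NEW_STATE   (hypothetical helper)
# Write NEW_STATE into a SCSI device's sysfs state attribute and
# report the transition. The attribute path comes from the caller.
set_scsi_state() {
    state_file=$1
    new_state=$2
    if [ ! -w "$state_file" ]; then
        echo "cannot write $state_file" >&2
        return 1
    fi
    old_state=$(cat "$state_file")
    printf '%s\n' "$new_state" > "$state_file" || return 1
    # Read the attribute back so the log shows what was actually stored.
    echo "$state_file: $old_state -> $(cat "$state_file")"
}
```

Logging the old and new state keeps a record of the manual
intervention, which helps when looking through the logs afterwards.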




* Re: How to resurrect offlined SCSI devices?
@ 2004-05-26 15:04 Martin Peschke3
  2004-05-26 15:11 ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Martin Peschke3 @ 2004-05-26 15:04 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI Mailing List

James,

> Really, it isn't safe for a LLD to try to bring a device back online on
> its own without user intervention.

How does the user know that it is safe?
What if they just try to bring it online? What is happening in the worst
case?
My point is that users probably set the appropriate sys-attribute
in a "trial and error" fashion, if they think they have got a problem,
while an lldd might even know better.

Martin



* Re: How to resurrect offlined SCSI devices?
  2004-05-26 15:04 Martin Peschke3
@ 2004-05-26 15:11 ` James Bottomley
  0 siblings, 0 replies; 14+ messages in thread
From: James Bottomley @ 2004-05-26 15:11 UTC (permalink / raw)
  To: Martin Peschke3; +Cc: SCSI Mailing List

On Wed, 2004-05-26 at 10:04, Martin Peschke3 wrote:
> How does the user know that it is safe?
> What if they just try to bring it online? What is happening in the worst
> case?
> My point is that users probably set the appropriate sys-attribute
> in a "trial and error" fashion, if they think they have got a problem,
> while an lldd might even know better.

The LLD rarely knows why a device was set offline.  The user at least
has the logs to look through.  Use of the state interface is caveat
emptor anyway.

The way it would work in the scenario you outline is that the event
indicating that the device is available triggers a hotplug event, whose
handler may then take the action of setting the device running again.

Setting devices back on line after a failure is a policy decision that
should not be taken by the LLD.

James


* RE: How to resurrect offlined SCSI devices?
@ 2004-05-26 16:17 Infante, Jon
  2004-05-26 16:55 ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Infante, Jon @ 2004-05-26 16:17 UTC (permalink / raw)
  To: 'James Bottomley', Martin Peschke3; +Cc: SCSI Mailing List

Wouldn't it be acceptable for the lldd to call scsi_scan_host() at this
point to force a SCSI mid layer rescan of the devices attached to the HBA?
In a fibre channel environment a device can disappear for a long period of
time, then come back. The driver knows exactly when it comes back and should
be able to tell the scsi layer.

Jon Infante
Emulex




* RE: How to resurrect offlined SCSI devices?
@ 2004-05-26 16:48 Salyzyn, Mark
  0 siblings, 0 replies; 14+ messages in thread
From: Salyzyn, Mark @ 2004-05-26 16:48 UTC (permalink / raw)
  To: James Bottomley, Martin Peschke3; +Cc: SCSI Mailing List

The LLD hardware RAID drivers, on the other hand, have exact
knowledge as to why a device was taken offline.

Sincerely -- Mark Salyzyn




* RE: How to resurrect offlined SCSI devices?
  2004-05-26 16:17 How to resurrect offlined SCSI devices? Infante, Jon
@ 2004-05-26 16:55 ` James Bottomley
  2004-05-26 17:15   ` Mike Anderson
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2004-05-26 16:55 UTC (permalink / raw)
  To: Infante, Jon; +Cc: Martin Peschke3, SCSI Mailing List

On Wed, 2004-05-26 at 11:17, Infante, Jon wrote:
> Wouldn't it be acceptable for the lldd to call  scsi_scan_host() at this
> point to force a scsi mid layer rescan of the devices attached to the hba?
> In a fibre channel environment a device can disappear for a long period of
> time, then come back. The driver knows exactly when it comes back and should
> be able to tell the scsi layer.

Well, only if the LLD previously did a remove of the device, which I
don't think it will have done.

I'd really like to see all fibre events (like loop up/down, device
add/remove) handled inside the FC transport class.  From there, it
probably still makes sense to use hotplug as the mechanism for importing
user policy.

James




* Re: How to resurrect offlined SCSI devices?
  2004-05-26 16:55 ` James Bottomley
@ 2004-05-26 17:15   ` Mike Anderson
  2004-05-26 17:26     ` Mike Christie
  2004-05-26 17:37     ` James Bottomley
  0 siblings, 2 replies; 14+ messages in thread
From: Mike Anderson @ 2004-05-26 17:15 UTC (permalink / raw)
  To: James Bottomley; +Cc: Infante, Jon, Martin Peschke3, SCSI Mailing List

James Bottomley [James.Bottomley@steeleye.com] wrote:
> I'd really like to see all fibre events (like loop up/down, device
> add/remove) handled inside the FC transport class.  From there, it
> probably still makes sense to use hotplug as the mechanism for importing
> user policy.

Currently there are no hotplug events generated from SCSI except those
from add / remove of sysfs related structures. Are you suggesting that
SCSI should call call_usermodehelper and generate events?

The general issue I believe Martin is investigating is in the context of
device mapper multi-path (Martin, correct me if I have this wrong).
In the failure transition of a path, a scsi_device can be marked
offline.  Until the device is restored, path checking / re-enablement
cannot proceed. One issue is that path testing tools need to have
SCSI-specific knowledge to know to change a device state. The second
issue is that, while this could be done with a daemon, a more efficient
method would appear to be some form of state / event notification of
device- and transport-specific events.

-andmike
--
Michael Anderson
andmike@us.ibm.com
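
To illustrate the first issue, here is a rough sketch of what a path
checker has to do today: look at the SCSI-specific state before it can
even test the path. The function name path_status, its three result
strings, and the caller-supplied probe command are all hypothetical;
the state file path is likewise passed in by the caller.

```shell
#!/bin/sh
# path_status STATE_FILE PROBE_CMD [ARGS...]   (hypothetical helper)
# Prints "offline" if the SCSI state blocks testing, otherwise "up"
# or "down" depending on the caller-supplied probe.
path_status() {
    state_file=$1
    shift
    state=$(cat "$state_file" 2>/dev/null)
    if [ "$state" = "offline" ]; then
        # SCSI-specific knowledge: probing is pointless while the
        # device is offline, so report that instead of a path failure.
        echo offline
        return 0
    fi
    if "$@" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}
```

A generic multipath tool would prefer to see only "up" or "down"; the
extra "offline" case is exactly the subsystem-specific detail under
discussion.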


* Re: How to resurrect offlined SCSI devices?
  2004-05-26 17:15   ` Mike Anderson
@ 2004-05-26 17:26     ` Mike Christie
  2004-05-26 17:48       ` Mike Anderson
  2004-05-26 17:37     ` James Bottomley
  1 sibling, 1 reply; 14+ messages in thread
From: Mike Christie @ 2004-05-26 17:26 UTC (permalink / raw)
  To: Mike Anderson
  Cc: James Bottomley, Infante, Jon, Martin Peschke3, SCSI Mailing List

Mike Anderson wrote:
> James Bottomley [James.Bottomley@steeleye.com] wrote:
> 
>>I'd really like to see all fibre events (like loop up/down, device
>>add/remove) handled inside the FC transport class.  From there, it
>>probably still makes sense to use hotplug as the mechanism for importing
>>user policy.
> 
> 
> Currently there are no hotplug events generated from SCSI except those
> from add / remove of sysfs related structures. Are you suggesting that
> SCSI should call call_usermodehelper and generate events?

What about kobject_hotplug()?

Mike


* Re: How to resurrect offlined SCSI devices?
  2004-05-26 17:15   ` Mike Anderson
  2004-05-26 17:26     ` Mike Christie
@ 2004-05-26 17:37     ` James Bottomley
  2004-05-26 18:02       ` Mike Anderson
  1 sibling, 1 reply; 14+ messages in thread
From: James Bottomley @ 2004-05-26 17:37 UTC (permalink / raw)
  To: Mike Anderson; +Cc: Infante, Jon, Martin Peschke3, SCSI Mailing List

On Wed, 2004-05-26 at 12:15, Mike Anderson wrote:
> James Bottomley [James.Bottomley@steeleye.com] wrote:
> > I'd really like to see all fibre events (like loop up/down, device
> > add/remove) handled inside the FC transport class.  From there, it
> > probably still makes sense to use hotplug as the mechanism for importing
> > user policy.
> 
> Currently there are no hotplug events generated from SCSI except those
> from add / remove of sysfs related structures. Are you suggesting that
> SCSI should call call_usermodehelper and generate events?

Not necessarily roll our own, just plug into the hotplug
infrastructure.  But generate events, definitely.

James




* Re: How to resurrect offlined SCSI devices?
  2004-05-26 17:26     ` Mike Christie
@ 2004-05-26 17:48       ` Mike Anderson
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Anderson @ 2004-05-26 17:48 UTC (permalink / raw)
  To: Mike Christie
  Cc: James Bottomley, Infante, Jon, Martin Peschke3, SCSI Mailing List

Mike Christie [mikenc@us.ibm.com] wrote:
> Mike Anderson wrote:
> >James Bottomley [James.Bottomley@steeleye.com] wrote:
> >
> >>I'd really like to see all fibre events (like loop up/down, device
> >>add/remove) handled inside the FC transport class.  From there, it
> >>probably still makes sense to use hotplug as the mechanism for importing
> >>user policy.
> >
> >
> >Currently there are no hotplug events generated from SCSI except those
> >from add / remove of sysfs related structures. Are you suggesting that
> >SCSI should call call_usermodehelper and generate events?
> 
> What about kobject_hotplug()?

Thanks, my mistake. I did not look closely enough and was still thinking
that the interfaces were not exported and that we would need to roll our own.

-andmike
--
Michael Anderson
andmike@us.ibm.com


* Re: How to resurrect offlined SCSI devices?
  2004-05-26 17:37     ` James Bottomley
@ 2004-05-26 18:02       ` Mike Anderson
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Anderson @ 2004-05-26 18:02 UTC (permalink / raw)
  To: James Bottomley; +Cc: Infante, Jon, Martin Peschke3, SCSI Mailing List

James Bottomley [James.Bottomley@SteelEye.com] wrote:
> Not necessarily roll our own, just plug into the hotplug
> infrastructure.  But generate events, definitely.

Yes, we do not need to roll our own; Mike C's mail pointed out that these
interfaces are exported.

We should be able to add a call to kobject_hotplug in
scsi_device_set_state for certain / all state transitions, depending on
how many events one would want to go to user space. Unexpected state
transitions would seem like a good starting point so as not to generate
too much user space traffic (i.e., running -> offline, offline -> running).

This also assumes that anyone who expects user space to react to these
events on a root device has already prepared the needed tools in a
ramdisk environment.

-andmike
--
Michael Anderson
andmike@us.ibm.com
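
On the user space side, the reaction to such an event could be as small
as the following sketch. Everything here is an assumption made for
illustration: the function name resurrect_policy, the event name
"available", and the "auto" / "manual" policy knob are hypothetical,
and the state file path is supplied by the caller. It is not the
interface of any existing hotplug agent.

```shell
#!/bin/sh
# resurrect_policy EVENT STATE_FILE POLICY   (hypothetical agent logic)
# EVENT:  hypothetical event name; "available" means the transport
#         reported that the device came back.
# POLICY: hypothetical knob; only "auto" permits re-enabling.
resurrect_policy() {
    event=$1
    state_file=$2
    policy=$3
    # Ignore every event other than the one we care about.
    [ "$event" = "available" ] || return 0
    # Nothing to do unless the device is currently offline.
    [ "$(cat "$state_file" 2>/dev/null)" = "offline" ] || return 0
    if [ "$policy" != "auto" ]; then
        echo "policy is $policy; leaving $state_file offline" >&2
        return 0
    fi
    echo running > "$state_file"
}
```

This keeps the decision in user space, as discussed above: the kernel
only reports the event, and the administrator's policy decides whether
the device is set running again.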


* RE: How to resurrect offlined SCSI devices?
@ 2004-05-27 20:04 Martin Peschke3
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Peschke3 @ 2004-05-27 20:04 UTC (permalink / raw)
  To: James Bottomley; +Cc: Infante, Jon, SCSI Mailing List

> I'd really like to see all fibre events (like loop up/down, device
> add/remove) handled inside the FC transport class.  From there, it
> probably still makes sense to use hotplug as the mechanism for importing
> user policy.

I am fine with deploying user space to handle policy decisions, as long as
the kernel's SCSI code allows events to be propagated, which is needed to
make it work well.

Martin


* Re: How to resurrect offlined SCSI devices?
@ 2004-05-27 20:05 Martin Peschke3
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Peschke3 @ 2004-05-27 20:05 UTC (permalink / raw)
  To: Mike Anderson; +Cc: James Bottomley, Infante, Jon, SCSI Mailing List

> The general issue I believe Martin is investigating is in the context of
> device mapper multi-path (Martin correct me if I have this incorrect).

Right. If an lldd tries to accommodate multipathed I/O by returning
I/O promptly (instead of delaying, retrying, queueing, and working
around the threat of offlined devices and I/O errors), then the issue
is most apparent. However, the subject question also applies to
non-multipathed I/O.

> In the failure transition of a path a scsi_device can be marked
> offline.  Until the device is restored path checking / re-enablement
> cannot proceed. One issue is that path testing tools need to have SCSI
> specific knowledge to know to change a device state.

Exactly. It would be ugly to have generic multipath tools care about
I/O-subsystem-specific flags. I am wondering why we need that special
offline bit in SCSI and how such matters are handled by other subsystems.
How does it fit into the generic device model and hotplug scheme?
If it fits well, then it could probably reside above SCSI. If it doesn't
fit, then it could be an indication that SCSI does it differently
(for historical reasons perhaps) and that there might be an alternative
which makes the offline bit dispensable.
I am not jeopardising 2.6 SCSI as it is. And I would be fine with
some user land helpers triggered by the kernel to get SCSI devices
back online. But perhaps there is something to think about for 2.7.

> The second issue is
> that while this could be done with a daemon it would appear a more
> efficient method would be some form of state / event notification of
> device and transport specific events.

Right, indications available to lower layers should not be dismissed,
regardless of whatever layer is finally doing the actual work.

Martin

