* sg_remove and pending write request
@ 2006-10-17 18:45 Qi, Yanling
2006-10-17 19:36 ` Douglas Gilbert
0 siblings, 1 reply; 4+ messages in thread
From: Qi, Yanling @ 2006-10-17 18:45 UTC (permalink / raw)
To: linux-scsi
Hi All,
We are running a SAS cable pull/push test case on a SAS RAID system.
After the SAS cable is pulled from the SAS RAID, the SCSI devices are deleted.
When the cable is pushed back, the SCSI device with the same
H:C:T:L is sometimes assigned to a different sgX.
Reading through sg.c, it seems that if the sg device has a pending
write request, the sg slot will not be freed (sg_dev_arr[k] = NULL)
during sg_remove. Can someone confirm this?
If this is the case, what can the user space process do to prevent this from
happening?
I see that sg.c sends SIGPOLL to the user space process
(kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);). What will this signal
be translated to as the user space return code from a read/write call?
Thanks,
Yanling
* Re: sg_remove and pending write request
2006-10-17 18:45 sg_remove and pending write request Qi, Yanling
@ 2006-10-17 19:36 ` Douglas Gilbert
0 siblings, 0 replies; 4+ messages in thread
From: Douglas Gilbert @ 2006-10-17 19:36 UTC (permalink / raw)
To: Qi, Yanling; +Cc: linux-scsi
Qi, Yanling wrote:
> Hi All,
>
> We are running a test case of SAS cable pull/push on a SAS RAID system.
> After the SAS cable is pulled from a SAS RAID, scsi devices are deleted.
> And then when the cable is pushed back, the scsi device with the same
> H:C:T:L is sometimes assigned to a different sgX.
There is no guarantee of the naming stability of sg
nodes (e.g. /dev/sg3) when devices disappear and re-appear.
Actually the design of lk 2.6 seems to actively discourage
user space programs from making that assumption. The same
applies to all SCSI device nodes (and host numbers).
In the case of SAS, you really should be looking at the
target port SAS address in the device identification VPD
page (page 0x83). If the device in question is a SATA disk
then you have more work to do.
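As a rough illustration of the VPD page 0x83 suggestion above, the sketch below scans an already-fetched Device Identification page for a target port NAA designator (association 1, designator type 3, binary code set, 8 bytes). It is not from the original message: the function name vpd83_target_port_naa is hypothetical, and fetching the page itself (an INQUIRY with EVPD=1 and page code 0x83, for example through the SG_IO ioctl) is assumed to happen elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: search a Device Identification VPD page (0x83)
 * for an 8-byte NAA designator associated with the target port.
 * Returns 0 and fills out[] on success, -1 if none is found.
 */
int vpd83_target_port_naa(const uint8_t *page, size_t len, uint8_t out[8])
{
    if (len < 4 || page[1] != 0x83)
        return -1;                      /* too short or wrong page code */

    /* Bytes 2..3 hold the page length (big-endian), excluding the header. */
    size_t end = 4 + (((size_t)page[2] << 8) | page[3]);
    if (end > len)
        end = len;                      /* tolerate a truncated buffer */

    size_t off = 4;
    while (off + 4 <= end) {
        const uint8_t *d = page + off;
        unsigned code_set = d[0] & 0x0f;        /* 1 = binary */
        unsigned assoc    = (d[1] >> 4) & 0x03; /* 1 = target port */
        unsigned type     = d[1] & 0x0f;        /* 3 = NAA */
        size_t   dlen     = d[3];               /* designator length */

        if (off + 4 + dlen > end)
            break;                      /* malformed descriptor */
        if (code_set == 1 && assoc == 1 && type == 3 && dlen == 8) {
            memcpy(out, d + 4, 8);
            return 0;
        }
        off += 4 + dlen;                /* advance to the next descriptor */
    }
    return -1;
}
```

An application could compare the returned 8 bytes against a stored address to recognise the same target port after a reconnect, regardless of which sgX node it lands on.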
> Reading through the sg.c, it seems that if the sg device has a pending
> write request, the sg slot (sg_dev_arr[k] = NULL) will not be freed
> during sg_remove time. Can someone confirm this?
Yes, I can confirm that. The sg driver waits for the mid
level to call back with the outstanding IO completions (or
timeouts). If the user kills the process, the sg driver
still waits for IO completion. [A problem arises if the
user tries to 'rmmod sg'.] The device could well re-appear
during that "wait" time, in which case the sg driver will
assign a different device node (i.e. the first unused slot in
sg_dev_arr[]).
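The "first unused slot" behaviour described above can be pictured with a toy model. This is only a sketch and is not the code in sg.c: struct slot_model, model_attach and model_release are made-up names, and the table size is arbitrary.

```c
#define MODEL_MAX_DEVS 8    /* arbitrary size for this toy model */

/* Toy stand-in for a slot table such as sg_dev_arr[]: 1 = slot occupied. */
struct slot_model {
    int in_use[MODEL_MAX_DEVS];
};

/* Attach a device: take the first unused slot, or return -1 if full. */
int model_attach(struct slot_model *m)
{
    for (int k = 0; k < MODEL_MAX_DEVS; k++) {
        if (!m->in_use[k]) {
            m->in_use[k] = 1;
            return k;
        }
    }
    return -1;
}

/* Free a slot, e.g. once outstanding IO has completed and the fd is closed. */
void model_release(struct slot_model *m, int k)
{
    if (k >= 0 && k < MODEL_MAX_DEVS)
        m->in_use[k] = 0;
}
```

In this model, if slots 0 to 2 are occupied and the device in slot 1 is unplugged while its slot is still held, a reattach lands in slot 3. Only after model_release(&m, 1) would a later attach get slot 1 back, and then only if nothing else attached first.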
> If this is the case, what can the user space process do to prevent this from
> happening?
Develop a user space program that applies fast acting
super glue to the SAS connectors when IOs are in flight
and hands approach.
As I said above, you cannot assume device node names will
be stable across disconnect, reconnect cycles.
> I see that sg.c sends SIGPOLL to the user space process
> (kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);). What will this signal
> be translated to as the user space return code from a read/write call?
You would need to be running asynchronous IO with the
sg driver (i.e. write(), poll(), read() rather than SG_IO)
and POLLHUP should appear in struct pollfd::revents.
You should also be able to run poll() from a signal handler
that catches SIGPOLL. [My knowledge is a bit rusty in this
area.]
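A minimal sketch of the poll() step in that write(), poll(), read() sequence might look like the following. It is not from the original message: the enum and the function name sg_wait_event are hypothetical, and the sg file descriptor is assumed to have been opened, and a command submitted with write(), elsewhere.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical result codes for this sketch. */
enum sg_event { SG_EV_TIMEOUT, SG_EV_READY, SG_EV_HANGUP, SG_EV_ERROR };

/*
 * Wait up to timeout_ms for something to happen on fd.
 * A hangup is reported ahead of readable data, so a caller that still
 * wants to collect pending responses would need to check POLLIN itself.
 */
enum sg_event sg_wait_event(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);   /* retry if a signal interrupts */
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return SG_EV_ERROR;
    if (rc == 0)
        return SG_EV_TIMEOUT;
    if (pfd.revents & POLLHUP)
        return SG_EV_HANGUP;              /* device has gone away */
    if (pfd.revents & (POLLERR | POLLNVAL))
        return SG_EV_ERROR;
    if (pfd.revents & POLLIN)
        return SG_EV_READY;               /* a response can be read() */
    return SG_EV_ERROR;
}
```

On SG_EV_HANGUP the application would typically close the descriptor and look the device up again by a persistent identifier rather than by its old node name.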
Doug Gilbert
* RE: sg_remove and pending write request
@ 2006-10-17 21:33 Qi, Yanling
2006-10-17 22:02 ` Stefan Richter
0 siblings, 1 reply; 4+ messages in thread
From: Qi, Yanling @ 2006-10-17 21:33 UTC (permalink / raw)
To: dougg; +Cc: linux-scsi
Thank you for your reply and comments, Douglas. The user land product is
waiting for release, and the time frame doesn't allow many changes to
it at this point.
I read sg.c again. It seems that the reattached SAS devices would take
the same sg slot if the following conditions are met:
1. Wait 2+ minutes for a pending SG_IO write request to come back
before pushing the cable back. The 2+ minutes gives the SCSI mid
level time to time out the pending IO request and do error recovery if
needed.
2. Close the user space fd properly (sg_release will try to do the
sg_dev_arr[k] = NULL).
Do you see any other conditions?
Thanks,
Yanling
* Re: sg_remove and pending write request
2006-10-17 21:33 Qi, Yanling
@ 2006-10-17 22:02 ` Stefan Richter
0 siblings, 0 replies; 4+ messages in thread
From: Stefan Richter @ 2006-10-17 22:02 UTC (permalink / raw)
To: Qi, Yanling; +Cc: dougg, linux-scsi
Qi, Yanling wrote:
> It seems that the reattached SAS devices would take
> the same sg slot if the following conditions are met:
> 1. Wait 2+ minutes for a pending SG_IO write request to come back
> before pushing the cable back. The 2+ minutes gives the SCSI mid
> level time to time out the pending IO request and do error recovery if
> needed.
> 2. Close the user space fd properly (sg_release will try to do the
> sg_dev_arr[k] = NULL).
>
> Do you see any other conditions?
3. Do not attach other SCSI devices in the meantime.
4. Hope that the kernel re-adds the missing devices in exactly the same
order as before.
That's why many SCSI transports provide unique persistent identifiers.
You can let udev name device files according to these identifiers, or
you can query them in your application software.
--
Stefan Richter
-=====-=-==- =-=- =---=
http://arcgraph.de/sr/