* [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates
@ 2006-05-18 19:59 Michael Reed
2006-05-20 15:33 ` James Bottomley
0 siblings, 1 reply; 12+ messages in thread
From: Michael Reed @ 2006-05-18 19:59 UTC (permalink / raw)
To: linux-scsi
[-- Attachment #1: Type: text/plain, Size: 367 bytes --]
mpt_config() can return EAGAIN. When this happens, fibre channel target
discovery can prematurely terminate with fewer than the total number of
targets discovered. This patch detects EAGAIN and reschedules the scan
work.
Generally, this situation only occurs when the lsiutil program is being
used to reset the board.
Signed-off-by: Michael Reed <mdr@sgi.com>
[-- Attachment #2: 01-mptfc_eagain.patch --]
[-- Type: text/x-patch, Size: 2073 bytes --]
--- rc3u/drivers/message/fusion/mptfc.c	2006-05-01 16:06:13.311966423 -0500
+++ rc3/drivers/message/fusion/mptfc.c	2006-05-03 14:16:47.669834844 -0500
@@ -634,6 +634,7 @@
 	MPT_ADAPTER		*ioc = (MPT_ADAPTER *)arg;
 	int			ii;
 	int			work_to_do;
+	int			rc=0;
 	u64			pn;
 	unsigned long		flags;
 	struct mptfc_rport_info *ri;
@@ -651,9 +652,13 @@
 		 * will reregister existing rports
 		 */
 		for (ii=0; ii < ioc->facts.NumberOfPorts; ii++) {
-			(void) mptbase_GetFcPortPage0(ioc, ii);
+			rc = mptbase_GetFcPortPage0(ioc, ii);
+			if (rc == -EAGAIN)
+				break;
 			mptfc_init_host_attr(ioc,ii);	/* refresh */
-			mptfc_GetFcDevPage0(ioc,ii,mptfc_register_dev);
+			rc = mptfc_GetFcDevPage0(ioc,ii,mptfc_register_dev);
+			if (rc == -EAGAIN)
+				break;
 		}
 
 		/* delete devices still missing */
@@ -686,6 +691,20 @@
 		work_to_do = --ioc->fc_rescan_work_count;
 		spin_unlock_irqrestore(&ioc->fc_rescan_work_lock, flags);
 	} while (work_to_do);
+
+	/* if last pass failed with EAGAIN, reschedule work for a later attempt */
+	if (rc == -EAGAIN) {
+		spin_lock_irqsave(&ioc->fc_rescan_work_lock, flags);
+		if (ioc->fc_rescan_work_q) {
+			queue_delayed_work(ioc->fc_rescan_work_q, &ioc->fc_rescan_work, HZ);
+			ioc->fc_rescan_work_count = 1;
+		}
+		spin_unlock_irqrestore(&ioc->fc_rescan_work_lock, flags);
+		dfcprintk ((MYIOC_s_INFO_FMT
+			"mptfc_rescan.%d: rescheduling work\n",
+			ioc->name,
+			ioc->sh->host_no));
+	}
 }
 
 static int
@@ -981,6 +1000,7 @@
 		spin_lock_irqsave(&ioc->fc_rescan_work_lock, flags);
 		ioc->fc_rescan_work_q = NULL;
 		spin_unlock_irqrestore(&ioc->fc_rescan_work_lock, flags);
+		cancel_delayed_work(&ioc->fc_rescan_work);
 		destroy_workqueue(work_q);
 	}
 
* RE: [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates
@ 2006-05-19 23:41 Moore, Eric
0 siblings, 0 replies; 12+ messages in thread
From: Moore, Eric @ 2006-05-19 23:41 UTC (permalink / raw)
To: Michael Reed, linux-scsi
On Thursday, May 18, 2006 2:00 PM, Michael Reed wrote:
> Subject: [PATCH 1/6] mpt fusion - fibre channel target
> discovery prematurely terminates
>
ACK
* Re: [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates
2006-05-18 19:59 Michael Reed
@ 2006-05-20 15:33 ` James Bottomley
2006-05-22 17:31 ` Michael Reed
0 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2006-05-20 15:33 UTC (permalink / raw)
To: Michael Reed; +Cc: linux-scsi
On Thu, 2006-05-18 at 14:59 -0500, Michael Reed wrote:
> mpt_config() can return EAGAIN. When this happens, fibre channel target
> discovery can prematurely terminate with fewer than the total number of
> targets discovered. This patch detects EAGAIN and reschedules the scan
> work.
>
> Generally, this situation only occurs when the lsiutil program is being
> used to reset the board.
mpt_config() only returns EAGAIN when it's out of message frames. That
should be a very transient condition, so if this rarely occurs anyway,
why not just put an msleep(10) on the condition and then retry? It
would save all the requeue logic.
James
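A minimal sketch of the sleep-and-retry idea suggested above, written as
plain userspace C rather than driver code. The names retry_on_eagain(),
config_fn, delay_fn, busy_then_ok() and fake_msleep() are hypothetical and
do not exist in the mpt fusion driver; in the driver the request would be
mpt_config() and the delay would be msleep().

```c
#include <errno.h>

/* Hypothetical stand-in for a config request such as mpt_config():
 * returns 0 on success or a negative errno value. */
typedef int (*config_fn)(void *ctx);

/* Hypothetical stand-in for the kernel's msleep(). */
typedef void (*delay_fn)(long ms);

/* Call fn until it stops reporting -EAGAIN, sleeping delay_ms between
 * attempts, and give up after max_tries attempts. */
static int retry_on_eagain(config_fn fn, void *ctx, int max_tries,
			   delay_fn delay, long delay_ms)
{
	int rc = -EAGAIN;
	int i;

	for (i = 0; i < max_tries; i++) {
		rc = fn(ctx);
		if (rc != -EAGAIN)
			break;			/* success or a hard error */
		if (i + 1 < max_tries)
			delay(delay_ms);	/* back off before the next try */
	}
	return rc;
}

/* Demo request: reports -EAGAIN until the counter in ctx reaches zero. */
static int busy_then_ok(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}

/* Demo delay: records the requested sleep time instead of sleeping. */
static long slept_ms;

static void fake_msleep(long ms)
{
	slept_ms += ms;
}
```

The retry bound matters: a 10 ms delay only covers a condition that clears
within max_tries * 10 ms, so a window of several seconds would need on the
order of a thousand attempts.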
* Re: [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates
2006-05-20 15:33 ` James Bottomley
@ 2006-05-22 17:31 ` Michael Reed
2006-05-22 17:34 ` Michael Reed
0 siblings, 1 reply; 12+ messages in thread
From: Michael Reed @ 2006-05-22 17:31 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi
James Bottomley wrote:
> On Thu, 2006-05-18 at 14:59 -0500, Michael Reed wrote:
>> mpt_config() can return EAGAIN. When this happens, fibre channel target
>> discovery can prematurely terminate with fewer than the total number of
>> targets discovered. This patch detects EAGAIN and reschedules the scan
>> work.
>>
>> Generally, this situation only occurs when the lsiutil program is being
>> used to reset the board.
>
> mpt_config() only returns EAGAIN when it's out of message frames. That
> should be a very transient condition, so if this rarely occurs anyway,
> why not just put an msleep(10) on the condition and then retry? It
> would save all the requeue logic.
It's not so transient during reset processing. I've measured 10 to
15 seconds of elapsed time. But, it always eventually succeeds.
Mike
>
> James
>
>
* Re: [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates
2006-05-22 17:31 ` Michael Reed
@ 2006-05-22 17:34 ` Michael Reed
0 siblings, 0 replies; 12+ messages in thread
From: Michael Reed @ 2006-05-22 17:34 UTC (permalink / raw)
To: James Bottomley; +Cc: Michael Reed, linux-scsi
Michael Reed wrote:
>
> James Bottomley wrote:
>> On Thu, 2006-05-18 at 14:59 -0500, Michael Reed wrote:
>>> mpt_config() can return EAGAIN. When this happens, fibre channel target
>>> discovery can prematurely terminate with fewer than the total number of
>>> targets discovered. This patch detects EAGAIN and reschedules the scan
>>> work.
>>>
>>> Generally, this situation only occurs when the lsiutil program is being
>>> used to reset the board.
>> mpt_config() only returns EAGAIN when it's out of message frames. That
>> should be a very transient condition, so if this rarely occurs anyway,
>> why not just put an msleep(10) on the condition and then retry? It
>> would save all the requeue logic.
>
> It's not so transient during reset processing. I've measured 10 to
> 15 seconds of elapsed time. But, it always eventually succeeds.
Well, I should have said that the event occurs with some regularity
during reset processing. And that the duration of the EAGAIN response
is 10 to 15 seconds. I'd rather reschedule than msleep.
Mike
>
> Mike
>
>> James
>>
>>
>
* RE: [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates
@ 2006-05-22 17:45 Moore, Eric
2006-05-22 17:57 ` James Bottomley
2006-05-22 18:05 ` Michael Reed
0 siblings, 2 replies; 12+ messages in thread
From: Moore, Eric @ 2006-05-22 17:45 UTC (permalink / raw)
To: Michael Reed, James Bottomley; +Cc: linux-scsi
On Monday, May 22, 2006 11:35 AM, Michael Reed wrote:
> >
> > It's not so transient during reset processing. I've measured 10 to
> > 15 seconds of elapsed time. But, it always eventually succeeds.
>
> Well, I should have said that the event occurs with some regularity
> during reset processing. And that the duration of the EAGAIN response
> is 10 to 15 seconds. I'd rather reschedule than msleep.
>
Why?
Anyways ... if we're going to sleep, I'd rather the sleeping/waiting be
done from mpt_config() when we are calling mpt_get_msg_frame(), instead
of in the calling functions. Perhaps mpt_get_msg_frame() could trigger a
signal or something when it has a freed mf.
Eric
* RE: [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates
2006-05-22 17:45 Moore, Eric
@ 2006-05-22 17:57 ` James Bottomley
2006-05-22 18:05 ` Michael Reed
1 sibling, 0 replies; 12+ messages in thread
From: James Bottomley @ 2006-05-22 17:57 UTC (permalink / raw)
To: Moore, Eric; +Cc: Michael Reed, linux-scsi
On Mon, 2006-05-22 at 11:45 -0600, Moore, Eric wrote:
> Anyways ... if we're going to sleep, I'd rather the sleeping/waiting be
> done from mpt_config() when we are calling mpt_get_msg_frame(), instead
> of in the calling functions. Perhaps mpt_get_msg_frame() could trigger a
> signal or something when it has a freed mf.
That would be the perfect solution ... if you can guarantee that
mpt_config always has user context, that is.
James
* Re: [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates
2006-05-22 17:45 Moore, Eric
2006-05-22 17:57 ` James Bottomley
@ 2006-05-22 18:05 ` Michael Reed
1 sibling, 0 replies; 12+ messages in thread
From: Michael Reed @ 2006-05-22 18:05 UTC (permalink / raw)
To: Moore, Eric; +Cc: James Bottomley, linux-scsi
Moore, Eric wrote:
>
> On Monday, May 22, 2006 11:35 AM, Michael Reed wrote:
>>> It's not so transient during reset processing. I've measured 10 to
>>> 15 seconds of elapsed time. But, it always eventually succeeds.
>> Well, I should have said that the event occurs with some regularity
>> during reset processing. And that the duration of the EAGAIN response
>> is 10 to 15 seconds. I'd rather reschedule than msleep.
>>
>
> Why?
>
> Anyways ... if we're going to sleep, I'd rather the sleeping/waiting be
> done from mpt_config() when we are calling mpt_get_msg_frame(), instead
> of in the calling functions. Perhaps mpt_get_msg_frame() could trigger a
> signal or something when it has a freed mf.
Changing mpt_config() to sleep changes the behavior in that EAGAIN might
no longer be returned. lan, ctl, sas, spi, and fc all make use of
mpt_config(). This may be non-trivial with regard to testing. Or it may
not. And, as James points out, we have to ensure that the caller is in a
context which can sleep.
Can we leave the interface alone for the moment and accept the patch
as written? Then, look at changing mpt_config() and then evaluate the
testing burden that the change might impose?
My vested interest is in getting the functionality into certain
distros of interest. I have no problem with rearchitecting the
patchset as described above. I'm just concerned with the timing.
I suspect that the testing required will push the patch's acceptance
beyond my potential window of opportunity. As written, the change
is confined to fibre channel so will not potentially introduce
regressions into the other drivers.
Thanks,
Mike
>
> Eric
* RE: [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates
@ 2006-05-22 18:15 Moore, Eric
2006-05-22 18:40 ` Michael Reed
0 siblings, 1 reply; 12+ messages in thread
From: Moore, Eric @ 2006-05-22 18:15 UTC (permalink / raw)
To: Michael Reed; +Cc: James Bottomley, linux-scsi
On Monday, May 22, 2006 12:06 PM, Michael Reed wrote:
>
>
> Changing mpt_config() to sleep changes the behavior in that EAGAIN might
> no longer be returned. lan, ctl, sas, spi, and fc all make use of
> mpt_config(). This may be non-trivial with regard to testing. Or it may
> not. And, as James points out, we have to ensure that the caller is in a
> context which can sleep.
>
The caller context of mpt_config() *MUST* be able to sleep.
Towards the end of this function we call wait_event(). This is
because we are waiting on the firmware reply. If the fw reply
doesn't come, the watchdog timer kicks in and mpt_timer_expired()
is called.
> Can we leave the interface alone for the moment and accept the patch
> as written? Then, look at changing mpt_config() and then evaluate the
> testing burden that the change might impose?
>
James?
> My vested interest is in getting the functionality into certain
> distros of interest. I have no problem with rearchitecting the
> patchset as described above. I'm just concerned with the timing.
> I suspect that the testing required will push the patch's acceptance
> beyond my potential window of opportunity. As written, the change
> is confined to fibre channel so will not potentially introduce
> regressions into the other drivers.
>
How?
I doubt there would be regressions. We would need to implement a timeout
on not receiving an mf, then return EAGAIN as we do today.
Eric
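A rough sketch of the wait-inside-the-callee idea described above: the
frame allocator waits a bounded time for a free message frame and only then
reports -EAGAIN. This is a single-threaded userspace model, not driver
code. struct frame_pool, get_frame_timeout(), wait_tick_fn, tick_release()
and ticks_until_free are hypothetical names; the real driver would sleep on
a wait queue instead of calling a tick callback.

```c
#include <errno.h>

/* Hypothetical pool of message frames; only the free count is modeled. */
struct frame_pool {
	int free_frames;
};

/* Hypothetical hook that blocks for one time slice; in this model it may
 * also return a frame to the pool. */
typedef void (*wait_tick_fn)(struct frame_pool *pool);

/* Take a frame, waiting up to max_ticks slices for one to become free.
 * Returns 0 on success or -EAGAIN once the timeout has expired. */
static int get_frame_timeout(struct frame_pool *pool, int max_ticks,
			     wait_tick_fn wait_tick)
{
	int tick;

	for (tick = 0; ; tick++) {
		if (pool->free_frames > 0) {
			pool->free_frames--;
			return 0;
		}
		if (tick >= max_ticks)
			return -EAGAIN;	/* callers still see EAGAIN */
		wait_tick(pool);
	}
}

/* Demo hook: releases one frame when the countdown reaches zero. */
static int ticks_until_free;

static void tick_release(struct frame_pool *pool)
{
	if (--ticks_until_free == 0)
		pool->free_frames++;
}
```

Because the timeout path still returns -EAGAIN, callers that ignore the
specific error code behave as before, while callers that check for it see
it less often.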
* Re: [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates
2006-05-22 18:15 [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates Moore, Eric
@ 2006-05-22 18:40 ` Michael Reed
2006-05-24 18:48 ` Michael Reed
0 siblings, 1 reply; 12+ messages in thread
From: Michael Reed @ 2006-05-22 18:40 UTC (permalink / raw)
To: Moore, Eric; +Cc: James Bottomley, linux-scsi
Moore, Eric wrote:
> On Monday, May 22, 2006 12:06 PM, Michael Reed wrote:
>>
>> Changing mpt_config() to sleep changes the behavior in that EAGAIN might
>> no longer be returned. lan, ctl, sas, spi, and fc all make use of
>> mpt_config(). This may be non-trivial with regard to testing. Or it may
>> not. And, as James points out, we have to ensure that the caller is in a
>> context which can sleep.
>>
>
> The caller context of mpt_config() *MUST* be able to sleep.
> Towards the end of this function we call wait_event(). This is
> because we are waiting on the firmware reply. If the fw reply
> doesn't come, the watchdog timer kicks in and mpt_timer_expired()
> is called.
>
>
>> Can we leave the interface alone for the moment and accept the patch
>> as written? Then, look at changing mpt_config() and then evaluate the
>> testing burden that the change might impose?
>>
>
> James?
>
>> My vested interest is in getting the functionality into certain
>> distros of interest. I have no problem with rearchitecting the
>> patchset as described above. I'm just concerned with the timing.
>> I suspect that the testing required will push the patch's acceptance
>> beyond my potential window of opportunity. As written, the change
>> is confined to fibre channel so will not potentially introduce
>> regressions into the other drivers.
>>
>
> How?
>
> I doubt there would be regressions. We would need to implement a timeout
> on not receiving an mf, then return EAGAIN as we do today.
Then is there a reason to implement the change at all? If the caller
can still receive EAGAIN, the caller still has to handle it.
Or just view it as an opaque error as callers do today.
Mike
>
> Eric
>
>
* RE: [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates
@ 2006-05-22 23:09 Moore, Eric
0 siblings, 0 replies; 12+ messages in thread
From: Moore, Eric @ 2006-05-22 23:09 UTC (permalink / raw)
To: Michael Reed; +Cc: James Bottomley, linux-scsi
On Monday, May 22, 2006 12:41 PM, Michael Reed wrote:
> > I doubt there would be regressions. We would need to implement a
> > timeout on not receiving an mf, then return EAGAIN as we do today.
>
> Then is there a reason to implement the change at all? If the caller
> can still receive EAGAIN, the caller still has to handle it.
> Or just view it as an opaque error as callers do today.
>
Callers of mpt_config() don't care about EAGAIN. Some pass the return
value up the call stack, but that ends with a caller evaluating a
non-zero value as an error, not as "retry me later". So my suggestion
of returning EAGAIN after waiting for mf's to become available really
doesn't matter with regard to regressions.
I will not have time in the next three weeks to address this, as I will
be on the road. I prefer mpt_config() to be fixed; however, I understand
your concerns as far as testing and timing go.
Eric
* Re: [PATCH 1/6] mpt fusion - fibre channel target discovery prematurely terminates
2006-05-22 18:40 ` Michael Reed
@ 2006-05-24 18:48 ` Michael Reed
0 siblings, 0 replies; 12+ messages in thread
From: Michael Reed @ 2006-05-24 18:48 UTC (permalink / raw)
To: Moore, Eric, James Bottomley
Cc: Michael Reed, linux-scsi, Jeremy Higdon, Gary Hagensen
I've done some investigation into WHY mpt_config() is returning EAGAIN
during my board reset testing. It's not because there are no message
frames, it is because mpt_get_msg_frame() is checking ioc->active and
finding it zero. ioc->active is set to zero in mpt_do_ioc_recovery()
and it remains zero until the board reset is complete. This takes
some time.
With this new understanding, I've tested an alternate bit of code.
The fc routine in question examines ioc->active, and if zero, it exits.
The existing code in the driver already reschedules the work once
the ioc becomes active. So, I can get rid of the sleep. (Yeah.)
Thank you for making me re-examine this issue. A better fix was
available.
I'll repost the patches later today.
Mike
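A small sketch of the alternate approach described above, as a userspace
model rather than the reposted patch: the rescan worker returns early while
the adapter is inactive and relies on the reset-complete path to queue a
fresh rescan. struct adapter_model, rescan_worker() and reset_complete()
are hypothetical names; only the active flag is meant to mirror
ioc->active.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced model of an adapter. */
struct adapter_model {
	bool active;		/* false while a board reset is in progress */
	int rescans_queued;	/* pending rescan work items */
	int rescans_run;	/* rescans that actually scanned targets */
};

/* Rescan worker: consume one queued item, but skip the scan entirely
 * while the adapter is inactive. */
static void rescan_worker(struct adapter_model *a)
{
	if (a->rescans_queued > 0)
		a->rescans_queued--;
	if (!a->active)
		return;		/* reset in progress; requeued later */
	a->rescans_run++;
}

/* Reset-complete path: mark the adapter active and queue a rescan. */
static void reset_complete(struct adapter_model *a)
{
	a->active = true;
	a->rescans_queued++;
}
```

In this model no sleep or delayed requeue is needed in the worker itself,
since the rescan that matters is the one queued after the reset finishes.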
Michael Reed wrote:
>
> Moore, Eric wrote:
>> On Monday, May 22, 2006 12:06 PM, Michael Reed wrote:
>>> Changing mpt_config() to sleep changes the behavior in that EAGAIN might
>>> no longer be returned. lan, ctl, sas, spi, and fc all make use of
>>> mpt_config(). This may be non-trivial with regard to testing. Or it may
>>> not. And, as James points out, we have to ensure that the caller is in a
>>> context which can sleep.
>>>
>> The caller context of mpt_config() *MUST* be able to sleep.
>> Towards the end of this function we call wait_event(). This is
>> because we are waiting on the firmware reply. If the fw reply
>> doesn't come, the watchdog timer kicks in and mpt_timer_expired()
>> is called.
>>
>>
>>> Can we leave the interface alone for the moment and accept the patch
>>> as written? Then, look at changing mpt_config() and then evaluate the
>>> testing burden that the change might impose?
>>>
>> James?
>>
>>> My vested interest is in getting the functionality into certain
>>> distros of interest. I have no problem with rearchitecting the
>>> patchset as described above. I'm just concerned with the timing.
>>> I suspect that the testing required will push the patch's acceptance
>>> beyond my potential window of opportunity. As written, the change
>>> is confined to fibre channel so will not potentially introduce
>>> regressions into the other drivers.
>>>
>> How?
>>
>> I doubt there would be regressions. We would need to implement a timeout
>> on not receiving an mf, then return EAGAIN as we do today.
>
> Then is there a reason to implement the change at all? If the caller
> can still receive EAGAIN, the caller still has to handle it.
> Or just view it as an opaque error as callers do today.
>
> Mike
>
>> Eric
>>
>>