[patch 0/2] zfcp fixes

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [patch 0/2] zfcp fixes
@ 2008-03-10 15:18 Christof Schmitt
  2008-03-10 15:18 ` [patch 1/2] zfcp: Fix race during ERP thread shutdown Christof Schmitt
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Christof Schmitt @ 2008-03-10 15:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-s390

James,

here are two more fixes for zfcp. They apply on scsi-misc.

-- 
Christof Schmitt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch 1/2] zfcp: Fix race during ERP thread shutdown
  2008-03-10 15:18 [patch 0/2] zfcp fixes Christof Schmitt
@ 2008-03-10 15:18 ` Christof Schmitt
  2008-03-11 16:43   ` Heiko Carstens
  2008-03-10 15:18 ` [patch 2/2] zfcp: Fix handling for boxed port after physical close Christof Schmitt
  2008-03-10 15:33 ` [patch 0/2] zfcp fixes James Bottomley
  2 siblings, 1 reply; 10+ messages in thread
From: Christof Schmitt @ 2008-03-10 15:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-s390, Martin Peschke, Christof Schmitt

[-- Attachment #1: zfcp-fix-erp-new-action-vs-thread-kill-race.diff --]
[-- Type: text/plain, Size: 2476 bytes --]

From: Martin Peschke <mp3@de.ibm.com>

When shutting down an adapter, recovery actions might have been
started between zfcp_erp_wait() and zfcp_erp_thread_kill(), breaking
assumptions in the latter function.

Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---
 drivers/s390/scsi/zfcp_ccw.c |    1 -
 drivers/s390/scsi/zfcp_def.h |    1 +
 drivers/s390/scsi/zfcp_erp.c |    7 ++++---
 3 files changed, 5 insertions(+), 4 deletions(-)

--- a/drivers/s390/scsi/zfcp_def.h	2008-03-10 12:34:48.000000000 +0100
+++ b/drivers/s390/scsi/zfcp_def.h	2008-03-10 16:05:26.000000000 +0100
@@ -608,6 +608,7 @@ do { \
 #define ZFCP_STATUS_ADAPTER_XCONFIG_OK		0x00000008
 #define ZFCP_STATUS_ADAPTER_HOST_CON_INIT	0x00000010
 #define ZFCP_STATUS_ADAPTER_ERP_THREAD_UP	0x00000020
+#define ZFCP_STATUS_ADAPTER_ERP_ACCEPT		0x00000040
 #define ZFCP_STATUS_ADAPTER_ERP_THREAD_KILL	0x00000080
 #define ZFCP_STATUS_ADAPTER_ERP_PENDING		0x00000100
 #define ZFCP_STATUS_ADAPTER_LINK_UNPLUGGED	0x00000200
--- a/drivers/s390/scsi/zfcp_erp.c	2008-03-10 12:34:48.000000000 +0100
+++ b/drivers/s390/scsi/zfcp_erp.c	2008-03-10 16:05:26.000000000 +0100
@@ -1002,7 +1002,7 @@ zfcp_erp_thread_setup(struct zfcp_adapte
 					    &adapter->status));
 		debug_text_event(adapter->erp_dbf, 5, "a_thset_ok");
 	}
-
+	atomic_set_mask(ZFCP_STATUS_ADAPTER_ERP_ACCEPT, &adapter->status);
 	return (retval < 0);
 }
 
@@ -1025,6 +1025,8 @@ zfcp_erp_thread_kill(struct zfcp_adapter
 {
 	int retval = 0;
 
+	atomic_clear_mask(ZFCP_STATUS_ADAPTER_ERP_ACCEPT, &adapter->status);
+	zfcp_erp_wait(adapter);
 	atomic_set_mask(ZFCP_STATUS_ADAPTER_ERP_THREAD_KILL, &adapter->status);
 	up(&adapter->erp_ready_sem);
 
@@ -2940,8 +2942,7 @@ zfcp_erp_action_enqueue(int action,
 	 * efficient.
 	 */
 
-	if (!atomic_test_mask(ZFCP_STATUS_ADAPTER_ERP_THREAD_UP,
-			      &adapter->status))
+	if (!atomic_test_mask(ZFCP_STATUS_ADAPTER_ERP_ACCEPT, &adapter->status))
 		return -EIO;
 
 	debug_event(adapter->erp_dbf, 4, &action, sizeof (int));
--- a/drivers/s390/scsi/zfcp_ccw.c	2008-03-10 12:34:48.000000000 +0100
+++ b/drivers/s390/scsi/zfcp_ccw.c	2008-03-10 16:05:26.000000000 +0100
@@ -198,7 +198,6 @@ zfcp_ccw_set_offline(struct ccw_device *
 	down(&zfcp_data.config_sema);
 	adapter = dev_get_drvdata(&ccw_device->dev);
 	zfcp_erp_adapter_shutdown(adapter, 0);
-	zfcp_erp_wait(adapter);
 	zfcp_erp_thread_kill(adapter);
 	up(&zfcp_data.config_sema);
 	return 0;

-- 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch 2/2] zfcp: Fix handling for boxed port after physical close
  2008-03-10 15:18 [patch 0/2] zfcp fixes Christof Schmitt
  2008-03-10 15:18 ` [patch 1/2] zfcp: Fix race during ERP thread shutdown Christof Schmitt
@ 2008-03-10 15:18 ` Christof Schmitt
  2008-03-10 15:33 ` [patch 0/2] zfcp fixes James Bottomley
  2 siblings, 0 replies; 10+ messages in thread
From: Christof Schmitt @ 2008-03-10 15:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-s390, Christof Schmitt, Martin Peschke

[-- Attachment #1: zfcp-phys-close-boxed.diff --]
[-- Type: text/plain, Size: 1173 bytes --]

From: Christof Schmitt <christof.schmitt@de.ibm.com>

When a FSF physical close returns the status boxed, this means that
another system already closed the port. For our system this is the
same status as in the good path, we have to send the normal close. So,
set the status for the boxed response to the same as for the good
status.

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
---
 drivers/s390/scsi/zfcp_fsf.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- linux-2.6.git.orig/drivers/s390/scsi/zfcp_fsf.c
+++ linux-2.6.git/drivers/s390/scsi/zfcp_fsf.c
@@ -2968,6 +2968,13 @@ zfcp_fsf_close_physical_port_handler(str
 		zfcp_erp_port_boxed(port);
 		fsf_req->status |= ZFCP_STATUS_FSFREQ_ERROR |
 			ZFCP_STATUS_FSFREQ_RETRY;
+
+		/* can't use generic zfcp_erp_modify_port_status because
+		 * ZFCP_STATUS_COMMON_OPEN must not be reset for the port */
+		atomic_clear_mask(ZFCP_STATUS_PORT_PHYS_OPEN, &port->status);
+		list_for_each_entry(unit, &port->unit_list_head, list)
+			atomic_clear_mask(ZFCP_STATUS_COMMON_OPEN,
+					  &unit->status);
 		break;
 
 	case FSF_ADAPTER_STATUS_AVAILABLE:

-- 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 0/2] zfcp fixes
  2008-03-10 15:18 [patch 0/2] zfcp fixes Christof Schmitt
  2008-03-10 15:18 ` [patch 1/2] zfcp: Fix race during ERP thread shutdown Christof Schmitt
  2008-03-10 15:18 ` [patch 2/2] zfcp: Fix handling for boxed port after physical close Christof Schmitt
@ 2008-03-10 15:33 ` James Bottomley
  2008-03-10 16:26   ` Christof Schmitt
  2 siblings, 1 reply; 10+ messages in thread
From: James Bottomley @ 2008-03-10 15:33 UTC (permalink / raw)
  To: Christof Schmitt; +Cc: linux-scsi, linux-s390


On Mon, 2008-03-10 at 16:18 +0100, Christof Schmitt wrote:
> James,
> 
> here are two more fixes for zfcp. They apply on scsi-misc.

scsi-misc is accumulating paches for the next merge window.  If these
are fixes as they seem to be, shouldn't they be going into
scsi-rc-fixes, which is where bug fixes are accumulated for fast onward
transmission?

James



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 0/2] zfcp fixes
  2008-03-10 15:33 ` [patch 0/2] zfcp fixes James Bottomley
@ 2008-03-10 16:26   ` Christof Schmitt
  0 siblings, 0 replies; 10+ messages in thread
From: Christof Schmitt @ 2008-03-10 16:26 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-s390

On Mon, Mar 10, 2008 at 10:33:24AM -0500, James Bottomley wrote:
> scsi-misc is accumulating paches for the next merge window.  If these
> are fixes as they seem to be, shouldn't they be going into
> scsi-rc-fixes, which is where bug fixes are accumulated for fast onward
> transmission?

These fixes are not critical. The one for the ERP thread was found
during code cleanup, and i am not sure if it is possible to trigger
the bug. The other one improves the error recovery behaviour, so that
in this special case there are less actions necessary.

In short: They fit scsi-misc better than the important fixes in
scsi-rc. When there are critical zfcp fixes, i will send them for
scsi-rc.

Christof

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 1/2] zfcp: Fix race during ERP thread shutdown
  2008-03-10 15:18 ` [patch 1/2] zfcp: Fix race during ERP thread shutdown Christof Schmitt
@ 2008-03-11 16:43   ` Heiko Carstens
  2008-03-12 14:13     ` Martin Peschke
  0 siblings, 1 reply; 10+ messages in thread
From: Heiko Carstens @ 2008-03-11 16:43 UTC (permalink / raw)
  To: Christof Schmitt; +Cc: James Bottomley, linux-scsi, linux-s390, Martin Peschke

> --- a/drivers/s390/scsi/zfcp_erp.c	2008-03-10 12:34:48.000000000 +0100
> +++ b/drivers/s390/scsi/zfcp_erp.c	2008-03-10 16:05:26.000000000 +0100
> @@ -1002,7 +1002,7 @@ zfcp_erp_thread_setup(struct zfcp_adapte
>  					    &adapter->status));
>  		debug_text_event(adapter->erp_dbf, 5, "a_thset_ok");
>  	}
> -
> +	atomic_set_mask(ZFCP_STATUS_ADAPTER_ERP_ACCEPT, &adapter->status);
>  	return (retval < 0);

This way the flag will be set even if the creation of the erp thread failed.
It should be done somewhere within zfcp_erp_thread() instead.

Besides that setting the flag must be done while holding the erp_lock.
Otherwise zfcp_erp_action_enqueue might check the flag, other cpu clears
the flag right after that, and then enqueueing continues while the flag is
not set.

>  }
> 
> @@ -1025,6 +1025,8 @@ zfcp_erp_thread_kill(struct zfcp_adapter
>  {
>  	int retval = 0;
> 
> +	atomic_clear_mask(ZFCP_STATUS_ADAPTER_ERP_ACCEPT, &adapter->status);
> +	zfcp_erp_wait(adapter);
>  	atomic_set_mask(ZFCP_STATUS_ADAPTER_ERP_THREAD_KILL, &adapter->status);
>  	up(&adapter->erp_ready_sem);
> 
> @@ -2940,8 +2942,7 @@ zfcp_erp_action_enqueue(int action,
>  	 * efficient.
>  	 */
> 
> -	if (!atomic_test_mask(ZFCP_STATUS_ADAPTER_ERP_THREAD_UP,
> -			      &adapter->status))
> +	if (!atomic_test_mask(ZFCP_STATUS_ADAPTER_ERP_ACCEPT, &adapter->status))
>  		return -EIO;
> 
>  	debug_event(adapter->erp_dbf, 4, &action, sizeof (int));
> --- a/drivers/s390/scsi/zfcp_ccw.c	2008-03-10 12:34:48.000000000 +0100
> +++ b/drivers/s390/scsi/zfcp_ccw.c	2008-03-10 16:05:26.000000000 +0100
> @@ -198,7 +198,6 @@ zfcp_ccw_set_offline(struct ccw_device *
>  	down(&zfcp_data.config_sema);
>  	adapter = dev_get_drvdata(&ccw_device->dev);
>  	zfcp_erp_adapter_shutdown(adapter, 0);
> -	zfcp_erp_wait(adapter);
>  	zfcp_erp_thread_kill(adapter);

And as a sidenote, if the scenario this patch is supposed to fix can really
happen, how can you make sure the adapter is still down before the erp thread
gets killed? Some action could have been enqueued and finished which reopened
the adapter after zfcp_erp_adapter_shutdown and before zfcp_erp_thread_kill.
Just wondering... :)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 1/2] zfcp: Fix race during ERP thread shutdown
  2008-03-11 16:43   ` Heiko Carstens
@ 2008-03-12 14:13     ` Martin Peschke
  2008-03-17 14:19       ` Christof Schmitt
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Peschke @ 2008-03-12 14:13 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: Christof Schmitt, James Bottomley, linux-scsi, linux-s390

On Tue, 2008-03-11 at 17:43 +0100, Heiko Carstens wrote:
> > --- a/drivers/s390/scsi/zfcp_erp.c	2008-03-10 12:34:48.000000000 +0100
> > +++ b/drivers/s390/scsi/zfcp_erp.c	2008-03-10 16:05:26.000000000 +0100
> > @@ -1002,7 +1002,7 @@ zfcp_erp_thread_setup(struct zfcp_adapte
> >  					    &adapter->status));
> >  		debug_text_event(adapter->erp_dbf, 5, "a_thset_ok");
> >  	}
> > -
> > +	atomic_set_mask(ZFCP_STATUS_ADAPTER_ERP_ACCEPT, &adapter->status);
> >  	return (retval < 0);
> 
> This way the flag will be set even if the creation of the erp thread failed.

You are right.

This used to be correct with all the cleanups pending in our queue
applied. Looks like this flaw creeped in when the bug fix patch was
moved to the head of queue.

> It should be done somewhere within zfcp_erp_thread() instead.

Setting the flag in the setup function is fine, as long as we make sure
the thread is really up and running. I'd prefer to have it here because
of its symmetry with zfcp_erp_thread_kill().

> Besides that setting the flag must be done while holding the erp_lock.
> Otherwise zfcp_erp_action_enqueue might check the flag, other cpu clears
> the flag right after that, and then enqueueing continues while the flag is
> not set.

Correct.

zfcp_erp_wait() might fail to wait for that new action to finish,
because the PENDING flag is set later in the action enqueuing process.
But we want zfcp_erp_wait() to make sure that recovery is settled, so
that we can go on killing the thread.

> >  }
> > 
> > @@ -1025,6 +1025,8 @@ zfcp_erp_thread_kill(struct zfcp_adapter
> >  {
> >  	int retval = 0;
> > 
> > +	atomic_clear_mask(ZFCP_STATUS_ADAPTER_ERP_ACCEPT, &adapter->status);
> > +	zfcp_erp_wait(adapter);
> >  	atomic_set_mask(ZFCP_STATUS_ADAPTER_ERP_THREAD_KILL, &adapter->status);
> >  	up(&adapter->erp_ready_sem);
> > 
> > @@ -2940,8 +2942,7 @@ zfcp_erp_action_enqueue(int action,
> >  	 * efficient.
> >  	 */
> > 
> > -	if (!atomic_test_mask(ZFCP_STATUS_ADAPTER_ERP_THREAD_UP,
> > -			      &adapter->status))
> > +	if (!atomic_test_mask(ZFCP_STATUS_ADAPTER_ERP_ACCEPT, &adapter->status))
> >  		return -EIO;
> > 
> >  	debug_event(adapter->erp_dbf, 4, &action, sizeof (int));
> > --- a/drivers/s390/scsi/zfcp_ccw.c	2008-03-10 12:34:48.000000000 +0100
> > +++ b/drivers/s390/scsi/zfcp_ccw.c	2008-03-10 16:05:26.000000000 +0100
> > @@ -198,7 +198,6 @@ zfcp_ccw_set_offline(struct ccw_device *
> >  	down(&zfcp_data.config_sema);
> >  	adapter = dev_get_drvdata(&ccw_device->dev);
> >  	zfcp_erp_adapter_shutdown(adapter, 0);
> > -	zfcp_erp_wait(adapter);
> >  	zfcp_erp_thread_kill(adapter);
> 
> And as a sidenote, if the scenario this patch is supposed to fix can really
> happen, how can you make sure the adapter is still down before the erp thread
> gets killed? Some action could have been enqueued and finished which reopened
> the adapter after zfcp_erp_adapter_shutdown and before zfcp_erp_thread_kill.
> Just wondering... :)

You acting the innocent... :)

As to question one - Is there a way for recovery being triggered when we
are about to kill the recovery thread during the adapter offlining
procedure:
I am not sure. There are many potential triggers for error recovery.
Some code path might be blocked when an adapter has been shut down
(zfcp_fsf.c). I could review each one of them, and maybe add a comment
about the assumption in the zfcp_ccw.c. But I don't like the idea of
some one adding another recovery trigger somewhere in the driver in the
future, not taking heed of this assumption.

That is why, I'd rather have the error recovery shutdown procedure be
airtight and self-dependent in the first place and have it not rely on
shaky assumptions about the code which uses it.

As to question two - Do we know for sure the adapter won't be brought
back up again?
With or without my patch, the issue would be there.
Actually my patch is useless as long as it only concernes itself with
bringing recovery to a termination, while disregarding the fact that
recovery needs to be brought to a termination in a way that guarantees
that adapter operation has been stopped.

Thank you,
Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 1/2] zfcp: Fix race during ERP thread shutdown
  2008-03-12 14:13     ` Martin Peschke
@ 2008-03-17 14:19       ` Christof Schmitt
  2008-03-17 14:23         ` James Bottomley
  0 siblings, 1 reply; 10+ messages in thread
From: Christof Schmitt @ 2008-03-17 14:19 UTC (permalink / raw)
  To: James Bottomley; +Cc: Heiko Carstens, Martin Peschke, linux-scsi, linux-s390

James,

we discussed this in our team. Since the issue is not new, and nobody
was able to trigger this so far, we will drop the patch for now, until
there is a better solution.

But this does not affect the patch in "[patch 2/2] zfcp: Fix handling
for boxed port after physical close". Could you pull this one into
scsi-misc?

Christof

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 1/2] zfcp: Fix race during ERP thread shutdown
  2008-03-17 14:19       ` Christof Schmitt
@ 2008-03-17 14:23         ` James Bottomley
  2008-03-17 14:29           ` Christof Schmitt
  0 siblings, 1 reply; 10+ messages in thread
From: James Bottomley @ 2008-03-17 14:23 UTC (permalink / raw)
  To: Christof Schmitt; +Cc: Heiko Carstens, Martin Peschke, linux-scsi, linux-s390

On Mon, 2008-03-17 at 15:19 +0100, Christof Schmitt wrote:
> But this does not affect the patch in "[patch 2/2] zfcp: Fix handling
> for boxed port after physical close". Could you pull this one into
> scsi-misc?

Sure ... it's queued and I've dropped the other one.

James



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch 1/2] zfcp: Fix race during ERP thread shutdown
  2008-03-17 14:23         ` James Bottomley
@ 2008-03-17 14:29           ` Christof Schmitt
  0 siblings, 0 replies; 10+ messages in thread
From: Christof Schmitt @ 2008-03-17 14:29 UTC (permalink / raw)
  To: James Bottomley; +Cc: Heiko Carstens, Martin Peschke, linux-scsi, linux-s390

On Mon, Mar 17, 2008 at 09:23:57AM -0500, James Bottomley wrote:
> On Mon, 2008-03-17 at 15:19 +0100, Christof Schmitt wrote:
> > But this does not affect the patch in "[patch 2/2] zfcp: Fix handling
> > for boxed port after physical close". Could you pull this one into
> > scsi-misc?
> 
> Sure ... it's queued and I've dropped the other one.

Thanks.

Christof

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-03-17 14:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-03-10 15:18 [patch 0/2] zfcp fixes Christof Schmitt
2008-03-10 15:18 ` [patch 1/2] zfcp: Fix race during ERP thread shutdown Christof Schmitt
2008-03-11 16:43   ` Heiko Carstens
2008-03-12 14:13     ` Martin Peschke
2008-03-17 14:19       ` Christof Schmitt
2008-03-17 14:23         ` James Bottomley
2008-03-17 14:29           ` Christof Schmitt
2008-03-10 15:18 ` [patch 2/2] zfcp: Fix handling for boxed port after physical close Christof Schmitt
2008-03-10 15:33 ` [patch 0/2] zfcp fixes James Bottomley
2008-03-10 16:26   ` Christof Schmitt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox