public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
From: Martin Peschke <mp3@de.ibm.com>
To: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christof Schmitt <christof.schmitt@de.ibm.com>,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	linux-scsi@vger.kernel.org, linux-s390@vger.kernel.org
Subject: Re: [patch 1/2] zfcp: Fix race during ERP thread shutdown
Date: Wed, 12 Mar 2008 15:13:47 +0100	[thread overview]
Message-ID: <1205331227.4852.102.camel@kitka.ibm.com> (raw)
In-Reply-To: <20080311164341.GA6380@osiris.boeblingen.de.ibm.com>

On Tue, 2008-03-11 at 17:43 +0100, Heiko Carstens wrote:
> > --- a/drivers/s390/scsi/zfcp_erp.c	2008-03-10 12:34:48.000000000 +0100
> > +++ b/drivers/s390/scsi/zfcp_erp.c	2008-03-10 16:05:26.000000000 +0100
> > @@ -1002,7 +1002,7 @@ zfcp_erp_thread_setup(struct zfcp_adapte
> >  					    &adapter->status));
> >  		debug_text_event(adapter->erp_dbf, 5, "a_thset_ok");
> >  	}
> > -
> > +	atomic_set_mask(ZFCP_STATUS_ADAPTER_ERP_ACCEPT, &adapter->status);
> >  	return (retval < 0);
> 
> This way the flag will be set even if the creation of the erp thread failed.

You are right.

This used to be correct with all the cleanups pending in our queue
applied. Looks like this flaw creeped in when the bug fix patch was
moved to the head of queue.

> It should be done somewhere within zfcp_erp_thread() instead.

Setting the flag in the setup function is fine, as long as we make sure
the thread is really up and running. I'd prefer to have it here because
of its symmetry with zfcp_erp_thread_kill().

> Besides that setting the flag must be done while holding the erp_lock.
> Otherwise zfcp_erp_action_enqueue might check the flag, other cpu clears
> the flag right after that, and then enqueueing continues while the flag is
> not set.

Correct.

zfcp_erp_wait() might fail to wait for that new action to finish,
because the PENDING flag is set later in the action enqueuing process.
But we want zfcp_erp_wait() to make sure that recovery is settled, so
that we can go on killing the thread.

> >  }
> > 
> > @@ -1025,6 +1025,8 @@ zfcp_erp_thread_kill(struct zfcp_adapter
> >  {
> >  	int retval = 0;
> > 
> > +	atomic_clear_mask(ZFCP_STATUS_ADAPTER_ERP_ACCEPT, &adapter->status);
> > +	zfcp_erp_wait(adapter);
> >  	atomic_set_mask(ZFCP_STATUS_ADAPTER_ERP_THREAD_KILL, &adapter->status);
> >  	up(&adapter->erp_ready_sem);
> > 
> > @@ -2940,8 +2942,7 @@ zfcp_erp_action_enqueue(int action,
> >  	 * efficient.
> >  	 */
> > 
> > -	if (!atomic_test_mask(ZFCP_STATUS_ADAPTER_ERP_THREAD_UP,
> > -			      &adapter->status))
> > +	if (!atomic_test_mask(ZFCP_STATUS_ADAPTER_ERP_ACCEPT, &adapter->status))
> >  		return -EIO;
> > 
> >  	debug_event(adapter->erp_dbf, 4, &action, sizeof (int));
> > --- a/drivers/s390/scsi/zfcp_ccw.c	2008-03-10 12:34:48.000000000 +0100
> > +++ b/drivers/s390/scsi/zfcp_ccw.c	2008-03-10 16:05:26.000000000 +0100
> > @@ -198,7 +198,6 @@ zfcp_ccw_set_offline(struct ccw_device *
> >  	down(&zfcp_data.config_sema);
> >  	adapter = dev_get_drvdata(&ccw_device->dev);
> >  	zfcp_erp_adapter_shutdown(adapter, 0);
> > -	zfcp_erp_wait(adapter);
> >  	zfcp_erp_thread_kill(adapter);
> 
> And as a sidenote, if the scenario this patch is supposed to fix can really
> happen, how can you make sure the adapter is still down before the erp thread
> gets killed? Some action could have been enqueued and finished which reopened
> the adapter after zfcp_erp_adapter_shutdown and before zfcp_erp_thread_kill.
> Just wondering... :)

You acting the innocent... :)

As to question one - Is there a way for recovery being triggered when we
are about to kill the recovery thread during the adapter offlining
procedure:
I am not sure. There are many potential triggers for error recovery.
Some code path might be blocked when an adapter has been shut down
(zfcp_fsf.c). I could review each one of them, and maybe add a comment
about the assumption in the zfcp_ccw.c. But I don't like the idea of
some one adding another recovery trigger somewhere in the driver in the
future, not taking heed of this assumption.

That is why, I'd rather have the error recovery shutdown procedure be
airtight and self-dependent in the first place and have it not rely on
shaky assumptions about the code which uses it.

As to question two - Do we know for sure the adapter won't be brought
back up again?
With or without my patch, the issue would be there.
Actually my patch is useless as long as it only concernes itself with
bringing recovery to a termination, while disregarding the fact that
recovery needs to be brought to a termination in a way that guarantees
that adapter operation has been stopped.

Thank you,
Martin



  reply	other threads:[~2008-03-12 14:13 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-10 15:18 [patch 0/2] zfcp fixes Christof Schmitt
2008-03-10 15:18 ` [patch 1/2] zfcp: Fix race during ERP thread shutdown Christof Schmitt
2008-03-11 16:43   ` Heiko Carstens
2008-03-12 14:13     ` Martin Peschke [this message]
2008-03-17 14:19       ` Christof Schmitt
2008-03-17 14:23         ` James Bottomley
2008-03-17 14:29           ` Christof Schmitt
2008-03-10 15:18 ` [patch 2/2] zfcp: Fix handling for boxed port after physical close Christof Schmitt
2008-03-10 15:33 ` [patch 0/2] zfcp fixes James Bottomley
2008-03-10 16:26   ` Christof Schmitt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1205331227.4852.102.camel@kitka.ibm.com \
    --to=mp3@de.ibm.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=christof.schmitt@de.ibm.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox