linux-ide.vger.kernel.org archive mirror
From: James Bottomley <jejb@linux.vnet.ibm.com>
To: Christoph Hellwig <hch@infradead.org>, Wei Fang <fangwei1@huawei.com>
Cc: tj@kernel.org, martin.petersen@oracle.com,
	linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: [PATCH] scsi: fix race between simultaneous decrements of ->host_failed
Date: Sun, 29 May 2016 08:41:13 -0700
Message-ID: <1464536473.2287.6.camel@linux.vnet.ibm.com>
In-Reply-To: <20160529065452.GA21677@infradead.org>

On Sat, 2016-05-28 at 23:54 -0700, Christoph Hellwig wrote:
> On Sat, May 28, 2016 at 11:51:11AM +0800, Wei Fang wrote:
> > async_sas_ata_eh(), which calls scsi_eh_finish_cmd() in some
> > cases, may run simultaneously in sas_ata_strategy_handler(). When
> > that happens, ->host_failed may be decremented simultaneously in
> > scsi_eh_finish_cmd() on different CPUs and end up with a wrong
> > value.
> > 
> > This leaves ->host_failed permanently unequal to ->host_busy. The
> > SCSI error handler thread then never runs, and any later SCSI
> > errors are never handled.
> > 
> > Use atomic type for ->host_failed to fix this race.
> 
> Looks fine,

Actually, it doesn't look fine at all.  The same mechanism that's
supposed to protect the host_failed decrement is also supposed to
protect the list_move_tail().  If there's a problem with the former
then we're also in danger of corrupting the list.
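
To make that concrete, here is a minimal userspace sketch, not the
kernel code.  Everything named toy_* and the small list helpers are
hypothetical stand-ins: toy_host.failed plays the role of
->host_failed, and toy_finish_cmd() mimics the decrement plus
list_move_tail() pair.  The assumption being illustrated is that one
mechanism (here a pthread mutex) has to cover both updates, so making
only the counter atomic would leave the list move exposed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (hypothetical). */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

static void list_unlink(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

static void list_append(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static size_t list_len(const struct list_node *head)
{
    size_t len = 0;
    for (const struct list_node *n = head->next; n != head; n = n->next)
        len++;
    return len;
}

/* Hypothetical host: one lock guards the counter AND the lists. */
struct toy_host {
    pthread_mutex_t lock;
    int failed;                 /* plays the role of ->host_failed */
    struct list_node eh_queue;  /* commands waiting for error handling */
};

static void toy_host_init(struct toy_host *h)
{
    pthread_mutex_init(&h->lock, NULL);
    h->failed = 0;
    list_init(&h->eh_queue);
}

/* Mark a command as failed: bump the counter and queue it. */
static void toy_fail_cmd(struct toy_host *h, struct list_node *cmd)
{
    pthread_mutex_lock(&h->lock);
    h->failed++;
    list_append(cmd, &h->eh_queue);
    pthread_mutex_unlock(&h->lock);
}

/* Finish a command: the decrement and the list move share one lock. */
static void toy_finish_cmd(struct toy_host *h, struct list_node *cmd,
                           struct list_node *done_q)
{
    pthread_mutex_lock(&h->lock);
    assert(h->failed > 0);
    h->failed--;
    list_unlink(cmd);           /* together these two calls act like */
    list_append(cmd, done_q);   /* list_move_tail(cmd, done_q)       */
    pthread_mutex_unlock(&h->lock);
}
```

If only the failed counter were made atomic and the mutex dropped, two
concurrent toy_finish_cmd() callers could still interleave inside
list_unlink()/list_append() and leave done_q with dangling pointers.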

Can we go back to the theory of what the problem is, since it's not
spelled out very clearly in the change log?  Our usual reason for not
requiring locking in eh routines is that the eh is single threaded on
the eh thread per host, so any host manipulations can't have
concurrency problems.  In this case, the sas_ata routines are trying to
be clever and use asynchronous workqueues for the port error handler
and you theorise that these can execute concurrently on two CPUs, thus
causing the problem?
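
The theorised scenario can be sketched in userspace as follows.  This
is only an illustration under stated assumptions: pthreads stand in
for the asynchronous port error handlers, toy_failed stands in for
->host_failed, and port_eh_worker() and run_port_eh() are made-up
names.  It shows what an atomic counter does buy when several workers
decrement at once, and nothing more.

```c
#include <pthread.h>
#include <stdatomic.h>

#define MAX_WORKERS 8

/* Hypothetical stand-in for ->host_failed, shared by all workers. */
static atomic_int toy_failed;

struct worker_arg {
    int ncmds;  /* commands this worker finishes */
};

/* Stand-in for one asynchronous port error handler. */
static void *port_eh_worker(void *p)
{
    const struct worker_arg *arg = p;

    for (int i = 0; i < arg->ncmds; i++)
        atomic_fetch_sub(&toy_failed, 1);  /* indivisible decrement */
    return NULL;
}

/*
 * Start nworkers concurrent handlers, each finishing cmds_per_worker
 * commands.  Returns the remaining count, or -1 on error.
 */
static int run_port_eh(int nworkers, int cmds_per_worker)
{
    pthread_t tid[MAX_WORKERS];
    struct worker_arg arg = { .ncmds = cmds_per_worker };
    int started = 0;

    if (nworkers < 1 || nworkers > MAX_WORKERS || cmds_per_worker < 0)
        return -1;

    atomic_store(&toy_failed, nworkers * cmds_per_worker);

    for (; started < nworkers; started++)
        if (pthread_create(&tid[started], NULL, port_eh_worker, &arg) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return started == nworkers ? atomic_load(&toy_failed) : -1;
}
```

With the atomic type, run_port_eh(4, 10000) ends with a count of zero
no matter how the workers interleave.  The sketch deliberately has no
list in it: an atomic counter says nothing about a list that the same
workers would be modifying at the same time.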

James




Thread overview: 9+ messages
2016-05-28  3:51 [PATCH] scsi: fix race between simultaneous decrements of ->host_failed Wei Fang
2016-05-29  6:54 ` Christoph Hellwig
2016-05-29 15:41   ` James Bottomley [this message]
2016-05-29 18:06     ` Christoph Hellwig
2016-05-29 19:15       ` James Bottomley
2016-05-30  7:27     ` Wei Fang
2016-05-30 16:04       ` James Bottomley
2016-05-30  7:43   ` Wei Fang
2016-05-30 19:10     ` Christoph Hellwig
