linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <htejun@gmail.com>
To: Jeff Garzik <jgarzik@pobox.com>
Cc: linux-ide@vger.kernel.org, albertcc@tw.ibm.com
Subject: Re: [PATCH 09/13] libata: fix handling of race between timeout and completion
Date: Thu, 2 Feb 2006 00:36:58 +0900	[thread overview]
Message-ID: <20060201153658.GA4311@htj.dyndns.org> (raw)
In-Reply-To: <43D999CF.7020707@pobox.com>

[-- Attachment #1: Type: text/plain, Size: 5272 bytes --]

Hello, Jeff.

On Thu, Jan 26, 2006 at 10:55:59PM -0500, Jeff Garzik wrote:
> Tejun Heo wrote:
> >If a qc completes after SCSI timer expires but before libata EH kicks
> >in, the qc gets completed but the scsicmd still gets passed to libata
> >EH resulting in ->eng_timeout invocation with NULL qc.  Currently none
> >of ->eng_timeout callbacks handles this properly.  This patch makes
> >ata_scsi_error() bypass ->eng_timeout and handle this rare case.
> >
> >Signed-off-by: Tejun Heo <htejun@gmail.com>
> 
> Doesn't look right to me.  If the qc gets completed, then the SCSI 
> command should have also been completed.
> 
> AFAICS the race is this:  if the qc is completed, then there is no 
> active tag, so ata_qc_from_tag() correctly returns NULL.  This case is 
> really an "error has already been handled" case.  It looks really wrong 
> to even touch a scsicmd.  I would just clear the eh_cmd list and be done 
> with it.
> 

SCSI EH never invokes ->eh_strategy_handler without any failed
scsicmd.  The only case where qc <-> scmd mismatch occurs is the
following case.

1. SCSI issues command (timer activated)
2. libata interprets and issues
3. time passes
4. SCSI timeout occurs, scmd queued for EH
5. irq occurs, completes qc.  ata_qc_complete() calls qc->complete_fn
   which ends up calling scsi_done().  scsi_done() sees that SCSI
   timer has already expired, so it returns without doing anything
   expecting EH to clean up.
6. SCSI EH kicks in, libata ->eh_strategy_handler (ata_scsi_error)
   invoked.
7. in libata EH, we have no active qc but scmds to recover.

This can be reproduced by delaying completion of a qc by 31 secs and
EH of the command by 5 secs.

@30: timeout kicks in and invoked EH sleeps for 5secs
@31: delayed qc completion
@35: EH continues

With the current upstream branch, the following is the result of such
race.

  ata1: inducing race condition
  ata1: delaying completion of f774c818 by 31 secs
  ata1: EH invoked for delayed qc f774c818, sleeping 5secs
  ata1: delayed completion of f774c818
  ata1: woke up, proceeding with EH
  ata1: handling error/timeout
  ata1: BUG: timeout without command
  Assertion failed! host->host_failed == 0 && list_empty(&host->eh_cmd_q),drivers/
  scsi/libata-scsi.c,ata_scsi_error,line=751
  ata1: handling error/timeout
  ata1: BUG: timeout without command
  Assertion failed! host->host_failed == 0 && list_empty(&host->eh_cmd_q),drivers/
  scsi/libata-scsi.c,ata_scsi_error,line=751
  ata1: handling error/timeout
  ata1: BUG: timeout without command
  Assertion failed! host->host_failed == 0 && list_empty(&host->eh_cmd_q),drivers/
  scsi/libata-scsi.c,ata_scsi_error,line=751
  ata1: handling error/timeout
  ata1: BUG: timeout without command
  Assertion failed! host->host_failed == 0 && list_empty(&host->eh_cmd_q),drivers/
  scsi/libata-scsi.c,ata_scsi_error,line=751
  ... repeat ...

What happens is, eh_cmd_q and host_failed aren't cleared properly and
scsi_error_handler() ends up looping forever.

If host->eh_cmd_q and host->host_failed are cleared as you suggested,
the following happens.

ata1: inducing race condition
ata1: delaying completion of f774c818 by 31 secs
ata1: EH invoked for delayed qc f774c818, sleeping 5secs
ata1: delayed completion of f774c818
ata1: woke up, proceeding with EH
<< the drive is locked up >>

As eh_cmd_q and host_failed are cleared, scsi_error_handler() goes to
sleep but the failed scmd is not actually finished.  From SCSI's POV,
it's still out there (host_busy and device_busy haven't been decreased
yet).  As our queue depth is 1, no other command will ever get issued
again.

With the patch I submitted, the result looks like...

ata1: inducing race condition
ata1: delaying completion of f774b818 by 31 secs
ata1: EH invoked for delayed qc f774b818, sleeping 5secs
ata1: delayed completion of f774b818
ata1: woke up, proceeding with EH
ata1: interrupt and timer raced for scsicmd f774943c
ata1: inducing race condition
ata1: delaying completion of f774b818 by 31 secs
ata1: EH invoked for delayed qc f774b818, sleeping 5secs
ata1: delayed completion of f774b818
ata1: woke up, proceeding with EH
ata1: interrupt and timer raced for scsicmd f774919c
ata1: inducing race condition
ata1: delaying completion of f774b818 by 31 secs
ata1: EH invoked for delayed qc f774b818, sleeping 5secs
ata1: delayed completion of f774b818
ata1: woke up, proceeding with EH
ata1: interrupt and timer raced for scsicmd f774943c

It recovers and IO continues.

The problem is that there are race conditions which make scmd's
lifetime and qc's lifetime different.  qc ends up dying earlier than
its scmd and in the process we also lose completion status.  The only
thing that can/must be done is retrying or failing the scmd.

Note that altough this race condition should be very rare but with NCQ
the likelihood increases as EH waits until all commands are settled
after a qc times out.  This widens the window for such race condition
considerably.

I'm attaching four patches used for the above test cases.

* race-patch		: race inducing patch on top of upstream
* jeff-patch		: clear eh_cmd_q and host_failed only on race
* jeff-race-patch	: race inducing patch on top of jeff-patch
* tj-race-patch		: race inducing patch on top of this fix

-- 
tejun

[-- Attachment #2: race-patch --]
[-- Type: text/plain, Size: 2232 bytes --]

diff --git a/drivers/scsi/libata-core.c b/drivers/scsi/libata-core.c
index 249e67f..d3bc7c1 100644
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -3919,6 +3919,13 @@ void ata_qc_free(struct ata_queued_cmd *
 	}
 }
 
+static void delayed_qc_complete(unsigned long arg)
+{
+	struct ata_queued_cmd *qc = (void *)arg;
+	printk(KERN_ERR "ata%u: delayed completion of %p\n", qc->ap->id, qc);
+	ata_qc_complete(qc);
+}
+
 /**
  *	ata_qc_complete - Complete an active ATA command
  *	@qc: Command to complete
@@ -3936,6 +3943,17 @@ void ata_qc_complete(struct ata_queued_c
 	assert(qc != NULL);	/* ata_qc_from_tag _might_ return NULL */
 	assert(qc->flags & ATA_QCFLAG_ACTIVE);
 
+	if (qc->flags & (1 << 31) && !(qc->flags & (1 << 30))) {
+		static struct timer_list timer;
+		printk(KERN_ERR "ata%u: delaying completion of %p by 31 secs\n",
+		       qc->ap->id, qc);
+		qc->flags |= 1 << 30;
+		setup_timer(&timer, delayed_qc_complete, (unsigned long)qc);
+		timer.expires = jiffies + 31 * HZ;
+		add_timer(&timer);
+		return;
+	}
+
 	if (likely(qc->flags & ATA_QCFLAG_DMAMAP))
 		ata_sg_clean(qc);
 
@@ -4010,6 +4028,15 @@ unsigned int ata_qc_issue(struct ata_que
 	qc->ap->active_tag = qc->tag;
 	qc->flags |= ATA_QCFLAG_ACTIVE;
 
+	{
+		static int cnt = 0;
+		if (++cnt % 1000 == 0) {
+			printk(KERN_ERR "ata%u: inducing race condition\n",
+			       ap->id);
+			qc->flags |= (1 << 31);
+		}
+	}
+
 	return ap->ops->qc_issue(qc);
 
 sg_err:
diff --git a/drivers/scsi/libata-scsi.c b/drivers/scsi/libata-scsi.c
index 6df8293..53a7c84 100644
--- a/drivers/scsi/libata-scsi.c
+++ b/drivers/scsi/libata-scsi.c
@@ -732,10 +732,20 @@ int ata_scsi_slave_config(struct scsi_de
 int ata_scsi_error(struct Scsi_Host *host)
 {
 	struct ata_port *ap;
+	struct ata_queued_cmd *qc;
 
 	DPRINTK("ENTER\n");
 
 	ap = (struct ata_port *) &host->hostdata[0];
+
+	qc = ata_qc_from_tag(ap, ap->active_tag);
+	if (qc && (qc->flags & (1 << 31))) {
+		printk(KERN_ERR "ata%u: EH invoked for delayed qc %p, sleeping 5secs\n",
+		       ap->id, qc);
+		msleep(5000);
+		printk(KERN_ERR "ata%u: woke up, proceeding with EH\n", ap->id);
+	}
+
 	ap->ops->eng_timeout(ap);
 
 	assert(host->host_failed == 0 && list_empty(&host->eh_cmd_q));

[-- Attachment #3: jeff-patch --]
[-- Type: text/plain, Size: 657 bytes --]

diff --git a/drivers/scsi/libata-scsi.c b/drivers/scsi/libata-scsi.c
index 6df8293..17920d1 100644
--- a/drivers/scsi/libata-scsi.c
+++ b/drivers/scsi/libata-scsi.c
@@ -732,11 +732,19 @@ int ata_scsi_slave_config(struct scsi_de
 int ata_scsi_error(struct Scsi_Host *host)
 {
 	struct ata_port *ap;
+	struct ata_queued_cmd *qc;
 
 	DPRINTK("ENTER\n");
 
 	ap = (struct ata_port *) &host->hostdata[0];
-	ap->ops->eng_timeout(ap);
+
+	qc = ata_qc_from_tag(ap, ap->active_tag);
+	if (qc)
+		ap->ops->eng_timeout(ap);
+	else {
+		host->host_failed = 0;
+		INIT_LIST_HEAD(&host->eh_cmd_q);
+	}
 
 	assert(host->host_failed == 0 && list_empty(&host->eh_cmd_q));
 

[-- Attachment #4: jeff-race-patch --]
[-- Type: text/plain, Size: 1957 bytes --]

diff -u b/drivers/scsi/libata-scsi.c b/drivers/scsi/libata-scsi.c
--- b/drivers/scsi/libata-scsi.c
+++ b/drivers/scsi/libata-scsi.c
@@ -739,6 +739,14 @@
 	ap = (struct ata_port *) &host->hostdata[0];
 
 	qc = ata_qc_from_tag(ap, ap->active_tag);
+	if (qc && (qc->flags & (1 << 31))) {
+		printk(KERN_ERR "ata%u: EH invoked for delayed qc %p, sleeping 5secs\n",
+		       ap->id, qc);
+		msleep(5000);
+		printk(KERN_ERR "ata%u: woke up, proceeding with EH\n", ap->id);
+	}
+
+	qc = ata_qc_from_tag(ap, ap->active_tag);
 	if (qc)
 		ap->ops->eng_timeout(ap);
 	else {
only in patch2:
unchanged:
--- a/drivers/scsi/libata-core.c
+++ b/drivers/scsi/libata-core.c
@@ -3919,6 +3919,13 @@ void ata_qc_free(struct ata_queued_cmd *
 	}
 }
 
+static void delayed_qc_complete(unsigned long arg)
+{
+	struct ata_queued_cmd *qc = (void *)arg;
+	printk(KERN_ERR "ata%u: delayed completion of %p\n", qc->ap->id, qc);
+	ata_qc_complete(qc);
+}
+
 /**
  *	ata_qc_complete - Complete an active ATA command
  *	@qc: Command to complete
@@ -3936,6 +3943,17 @@ void ata_qc_complete(struct ata_queued_c
 	assert(qc != NULL);	/* ata_qc_from_tag _might_ return NULL */
 	assert(qc->flags & ATA_QCFLAG_ACTIVE);
 
+	if (qc->flags & (1 << 31) && !(qc->flags & (1 << 30))) {
+		static struct timer_list timer;
+		printk(KERN_ERR "ata%u: delaying completion of %p by 31 secs\n",
+		       qc->ap->id, qc);
+		qc->flags |= 1 << 30;
+		setup_timer(&timer, delayed_qc_complete, (unsigned long)qc);
+		timer.expires = jiffies + 31 * HZ;
+		add_timer(&timer);
+		return;
+	}
+
 	if (likely(qc->flags & ATA_QCFLAG_DMAMAP))
 		ata_sg_clean(qc);
 
@@ -4010,6 +4028,15 @@ unsigned int ata_qc_issue(struct ata_que
 	qc->ap->active_tag = qc->tag;
 	qc->flags |= ATA_QCFLAG_ACTIVE;
 
+	{
+		static int cnt = 0;
+		if (++cnt % 1000 == 0) {
+			printk(KERN_ERR "ata%u: inducing race condition\n",
+			       ap->id);
+			qc->flags |= (1 << 31);
+		}
+	}
+
 	return ap->ops->qc_issue(qc);
 
 sg_err:

[-- Attachment #5: tj-race-patch --]
[-- Type: text/plain, Size: 2310 bytes --]

Index: work1/drivers/scsi/libata-core.c
===================================================================
--- work1.orig/drivers/scsi/libata-core.c	2006-02-01 23:29:00.000000000 +0900
+++ work1/drivers/scsi/libata-core.c	2006-02-01 23:29:20.000000000 +0900
@@ -3959,6 +3959,13 @@ void ata_qc_free(struct ata_queued_cmd *
 	}
 }
 
+static void delayed_qc_complete(unsigned long arg)
+{
+	struct ata_queued_cmd *qc = (void *)arg;
+	printk(KERN_ERR "ata%u: delayed completion of %p\n", qc->ap->id, qc);
+	ata_qc_complete(qc);
+}
+
 /**
  *	ata_qc_complete - Complete an active ATA command
  *	@qc: Command to complete
@@ -3976,6 +3983,17 @@ void ata_qc_complete(struct ata_queued_c
 	assert(qc != NULL);	/* ata_qc_from_tag _might_ return NULL */
 	assert(qc->flags & ATA_QCFLAG_ACTIVE);
 
+	if (qc->flags & (1 << 31) && !(qc->flags & (1 << 30))) {
+		static struct timer_list timer;
+		printk(KERN_ERR "ata%u: delaying completion of %p by 31 secs\n",
+		       qc->ap->id, qc);
+		qc->flags |= 1 << 30;
+		setup_timer(&timer, delayed_qc_complete, (unsigned long)qc);
+		timer.expires = jiffies + 31 * HZ;
+		add_timer(&timer);
+		return;
+	}
+
 	if (likely(qc->flags & ATA_QCFLAG_DMAMAP))
 		ata_sg_clean(qc);
 
@@ -4050,6 +4068,15 @@ unsigned int ata_qc_issue(struct ata_que
 	qc->ap->active_tag = qc->tag;
 	qc->flags |= ATA_QCFLAG_ACTIVE;
 
+	{
+		static int cnt = 0;
+		if (++cnt % 1000 == 0) {
+			printk(KERN_ERR "ata%u: inducing race condition\n",
+			       ap->id);
+			qc->flags |= (1 << 31);
+		}
+	}
+
 	return ap->ops->qc_issue(qc);
 
 sg_err:
Index: work1/drivers/scsi/libata-scsi.c
===================================================================
--- work1.orig/drivers/scsi/libata-scsi.c	2006-02-01 23:29:00.000000000 +0900
+++ work1/drivers/scsi/libata-scsi.c	2006-02-01 23:29:50.000000000 +0900
@@ -737,6 +737,14 @@ int ata_scsi_error(struct Scsi_Host *hos
 
 	DPRINTK("ENTER\n");
 
+	qc = ata_qc_from_tag(ap, ap->active_tag);
+	if (qc && (qc->flags & (1 << 31))) {
+		printk(KERN_ERR "ata%u: EH invoked for delayed qc %p, sleeping 5secs\n",
+		       ap->id, qc);
+		msleep(5000);
+		printk(KERN_ERR "ata%u: woke up, proceeding with EH\n", ap->id);
+	}
+
 	spin_lock_irqsave(&ap->host_set->lock, flags);
 	qc = ata_qc_from_tag(ap, ap->active_tag);
 	assert(!(ap->flags & ATA_FLAG_IN_EH));

  parent reply	other threads:[~2006-02-01 15:37 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-01-23  4:09 [PATCHSET] libata: various fixes related to EH, take #2 Tejun Heo
2006-01-23  4:09 ` [PATCH 03/13] libata: fix ata_qc_issue() error handling Tejun Heo
2006-01-23  4:09 ` [PATCH 04/13] ahci: fix err_mask setting in ahci_host_intr Tejun Heo
2006-01-27  3:36   ` Jeff Garzik
2006-01-23  4:09 ` [PATCH 02/13] libata: make the owner of a qc responsible for freeing it Tejun Heo
2006-01-23  4:09 ` [PATCH 05/13] libata: add detailed AC_ERR_* flags Tejun Heo
2006-01-23  4:09 ` [PATCH 07/13] SCSI: export scsi_eh_finish_cmd() and scsi_eh_flush_done_q() Tejun Heo
2006-01-23  7:09   ` Jeff Garzik
2006-01-23  7:26   ` Arjan van de Ven
2006-01-23  8:20     ` Tejun Heo
2006-01-23  9:36       ` Christoph Hellwig
2006-01-23 10:05         ` Tejun Heo
2006-01-24 17:11           ` Luben Tuikov
2006-01-24 17:20             ` Arjan van de Ven
2006-01-24 18:25               ` Luben Tuikov
2006-01-24 17:30             ` Jeff Garzik
2006-01-24 18:53               ` Luben Tuikov
2006-01-23 14:52   ` Tejun Heo
2006-01-23  4:09 ` [PATCH 06/13] libata: return AC_ERR_* from issue functions Tejun Heo
2006-01-23  4:09 ` [PATCH 01/13] libata: fold __ata_qc_complete() into ata_qc_free() Tejun Heo
2006-01-27  3:34   ` Jeff Garzik
2006-01-23  4:09 ` [PATCH 09/13] libata: fix handling of race between timeout and completion Tejun Heo
2006-01-27  3:55   ` Jeff Garzik
2006-01-27  3:58     ` Jeff Garzik
2006-02-01 15:36     ` Tejun Heo [this message]
2006-02-09  6:21       ` Jeff Garzik
2006-01-23  4:09 ` [PATCH 08/13] libata: implement and apply ata_eh_qc_complete/retry() Tejun Heo
2006-01-23  4:09 ` [PATCH 12/13] libata: create pio/atapi task queueing wrappers Tejun Heo
2006-01-27  4:02   ` Jeff Garzik
2006-01-23  4:09 ` [PATCH 11/13] libata: implement ATA_FLAG_IN_EH port flag Tejun Heo
2006-01-27  4:00   ` Jeff Garzik
2006-01-23  4:09 ` [PATCH 10/13] libata: kill NULL qc handling from ->eng_timeout callbacks Tejun Heo
2006-01-23  4:09 ` [PATCH 13/13] libata: EH / pio tasks synchronization Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060201153658.GA4311@htj.dyndns.org \
    --to=htejun@gmail.com \
    --cc=albertcc@tw.ibm.com \
    --cc=jgarzik@pobox.com \
    --cc=linux-ide@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).