linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mark Lord <mlord@pobox.com>
To: Jeff Garzik <jgarzik@pobox.com>
Cc: IDE/ATA development list <linux-ide@vger.kernel.org>
Subject: libata total system lockup fix
Date: Mon, 25 Jul 2005 09:47:28 -0400	[thread overview]
Message-ID: <42E4ED70.1050501@pobox.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 1027 bytes --]

Jeff,

We corresponded about this bugfix ages ago,
but I find I'm still patching it into each new
kernel.org kernel to keep my system from locking
up hard in the libata error handling path.

(error handling is triggered by the KDE/Gnome desktop
polling the empty ATAPI drive every second or two,
gets an error when no disc inserted, thereby triggering
SCSI/libata error paths, which lock system hard once
in a few thousand tries -- about once every two hours).

This patch originally came to me from you, and I've now
forgotten where you got it from.  But it does fix the
problem here.  I have also circulated this patch among
many other users of "ata_piix" on modern laptops,
and it seems to cure random lockups for them as well.

This configuration (2.6 kernel, ata_piix driving hard disk
and DVD-RW drive in a Centrino laptop) is now very very
commonplace, and the lockups are driving users crazy.

Sure would be nice to see it in 2.6.13 or 2.6.14 by default.

Cheers

-- 
Mark Lord
Real-Time Remedies Inc.
mlord@pobox.com


[-- Attachment #2: libata_error_handling.patch --]
[-- Type: text/x-patch, Size: 3629 bytes --]

diff -u --recursive --new-file --exclude='.*' linux-2.6.12/drivers/scsi/libata-core.c linux/drivers/scsi/libata-core.c
--- linux-2.6.12/drivers/scsi/libata-core.c	2005-06-17 15:48:29.000000000 -0400
+++ linux/drivers/scsi/libata-core.c	2005-07-02 12:33:25.000000000 -0400
@@ -41,6 +41,7 @@
 #include <scsi/scsi.h>
 #include "scsi.h"
 #include "scsi_priv.h"
+#include "scsi_logging.h"
 #include <scsi/scsi_host.h>
 #include <linux/libata.h>
 #include <asm/io.h>
@@ -2803,6 +2804,11 @@
 	DPRINTK("EXIT\n");
 }
 
+void ata_qc_timeout_done(struct scsi_cmnd *scmd)
+{
+	return;
+}
+
 /**
  *	ata_qc_timeout - Handle timeout of queued command
  *	@qc: Command that timed out
@@ -2835,17 +2841,16 @@
 		struct scsi_cmnd *cmd = qc->scsicmd;
 
 		if (!scsi_eh_eflags_chk(cmd, SCSI_EH_CANCEL_CMD)) {
-
 			/* finish completing original command */
+			qc->scsidone = ata_qc_timeout_done;
+
 			__ata_qc_complete(qc);
 
 			atapi_request_sense(ap, dev, cmd);
 
 			cmd->result = (CHECK_CONDITION << 1) | (DID_OK << 16);
-			scsi_finish_command(cmd);
-
-			goto out;
 		}
+		goto out;
 	}
 
 	/* hack alert!  We cannot use the supplied completion
diff -u --recursive --new-file --exclude='.*' linux-2.6.12/drivers/scsi/libata-scsi.c linux/drivers/scsi/libata-scsi.c
--- linux-2.6.12/drivers/scsi/libata-scsi.c	2005-06-17 15:48:29.000000000 -0400
+++ linux/drivers/scsi/libata-scsi.c	2005-07-02 12:33:25.000000000 -0400
@@ -380,12 +380,6 @@
 	ap = (struct ata_port *) &host->hostdata[0];
 	ap->ops->eng_timeout(ap);
 
-	/* TODO: this is per-command; when queueing is supported
-	 * this code will either change or move to a more
-	 * appropriate place
-	 */
-	host->host_failed--;
-
 	DPRINTK("EXIT\n");
 	return 0;
 }
diff -u --recursive --new-file --exclude='.*' linux-2.6.12/drivers/scsi/scsi_error.c linux/drivers/scsi/scsi_error.c
--- linux-2.6.12/drivers/scsi/scsi_error.c	2005-06-17 15:48:29.000000000 -0400
+++ linux/drivers/scsi/scsi_error.c	2005-07-02 12:33:25.000000000 -0400
@@ -1613,6 +1613,40 @@
 	scsi_eh_flush_done_q(&eh_done_q);
 }
 
+static void scsi_invoke_strategy_handler(struct Scsi_Host *shost)
+{
+	int rtn;
+	struct list_head *lh, *lh_sf;
+	struct scsi_cmnd *scmd;
+	unsigned long flags;
+	LIST_HEAD(eh_work_q);
+	LIST_HEAD(eh_done_q);
+
+	rtn = shost->hostt->eh_strategy_handler(shost);
+
+	spin_lock_irqsave(shost->host_lock, flags);
+	list_splice_init(&shost->eh_cmd_q, &eh_work_q);
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
+	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
+
+	list_for_each_safe(lh, lh_sf, &eh_work_q) {
+		scmd = list_entry(lh, struct scsi_cmnd, eh_entry);
+
+		if (scsi_eh_eflags_chk(scmd, SCSI_EH_CANCEL_CMD) ||
+		    !SCSI_SENSE_VALID(scmd))
+			continue;
+		scmd->retries = scmd->allowed;
+		scsi_eh_finish_cmd(scmd, &eh_done_q);
+	}
+
+	if (!list_empty(&eh_work_q))
+		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
+			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
+
+	scsi_eh_flush_done_q(&eh_done_q);
+}
+
 /**
  * scsi_error_handler - Handle errors/timeouts of SCSI cmds.
  * @data:	Host for which we are running.
@@ -1627,7 +1661,6 @@
 int scsi_error_handler(void *data)
 {
 	struct Scsi_Host *shost = (struct Scsi_Host *) data;
-	int rtn;
 	DECLARE_MUTEX_LOCKED(sem);
 
 	/*
@@ -1683,8 +1716,8 @@
 		 * what we need to do to get it up and online again (if we can).
 		 * If we fail, we end up taking the thing offline.
 		 */
-		if (shost->hostt->eh_strategy_handler) 
-			rtn = shost->hostt->eh_strategy_handler(shost);
+		if (shost->hostt->eh_strategy_handler)
+			scsi_invoke_strategy_handler(shost);
 		else
 			scsi_unjam_host(shost);
 

             reply	other threads:[~2005-07-25 13:47 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-07-25 13:47 Mark Lord [this message]
2005-07-25 14:51 ` libata total system lockup fix Jeff Garzik
2005-07-25 15:53   ` Mark Lord
2005-08-05  3:52     ` Tejun Heo
2005-08-05  4:01       ` Tejun Heo
2005-08-11 20:13         ` Jeff Garzik
2005-08-12  0:49           ` Tejun Heo
2005-08-12  2:22             ` Mark Lord
2005-08-12  2:38               ` Tejun Heo
2005-08-12  2:41                 ` Tejun Heo
2005-08-09 15:16       ` Mark Lord
2005-08-09 23:54         ` Tejun Heo
2005-08-10 14:16           ` Mark Lord
2005-08-10 21:24       ` Jeff Garzik
2005-08-19  0:46         ` Mark Lord
2005-08-19  3:21           ` Tejun Heo
2005-08-19  3:36             ` Mark Lord
2005-08-19  3:45               ` Tejun Heo
2005-08-19  4:01                 ` Mark Lord
2005-08-19  9:37               ` Erik Slagter
2005-08-19  9:35           ` Erik Slagter
2005-08-19 13:07             ` Mark Lord
2005-08-19 13:32               ` Bartlomiej Zolnierkiewicz
2005-08-19 13:37                 ` Bartlomiej Zolnierkiewicz
2005-09-03 22:58           ` Mark Lord
2005-09-03 23:06             ` Jeff Garzik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=42E4ED70.1050501@pobox.com \
    --to=mlord@pobox.com \
    --cc=jgarzik@pobox.com \
    --cc=linux-ide@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).