From: James Bottomley <James.Bottomley@SteelEye.com>
To: Laurent Riffard <laurent.riffard@free.fr>
Cc: Hannes Reinecke <hare@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-ide@vger.kernel.org,
	linux-scsi@vger.kernel.org
Subject: Re: 2.6.24-rc3-mm1: I/O error, system hangs
Date: Sat, 24 Nov 2007 08:42:49 +0200
Message-ID: <1195886569.3195.2.camel@localhost.localdomain>
In-Reply-To: <4747135C.60205@free.fr>


On Fri, 2007-11-23 at 18:52 +0100, Laurent Riffard wrote:
> On 23.11.2007 12:38, Hannes Reinecke wrote:
> > Hannes Reinecke wrote:
> >> Laurent Riffard wrote:
> >>> On 21.11.2007 23:41, Andrew Morton wrote:
> >>>> On Wed, 21 Nov 2007 22:45:22 +0100
> >>>> Laurent Riffard <laurent.riffard@free.fr> wrote:
> >>>>
> >>>>> On 21.11.2007 05:45, Andrew Morton wrote:
> >>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc3/2.6.24-rc3-mm1/
> >>>>> Hello, 
> >>>>>
> >>>>> My system hangs shortly after I log in to the GNOME desktop. SysRq-W shows
> >>>>> that a bunch of tasks are blocked in "D" state; they seem to be waiting for
> >>>>> some I/O completion. I can try to hand-copy some data if requested.
> >>>>>
> >>>>> I found these messages in dmesg:
> >>>>>
> >>>>> ~$ grep -C2 end_request dmesg-2.6.24-rc3-mm1 
> >>>>> EXT3-fs: mounted filesystem with ordered data mode.
> >>>>> sd 0:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
> >>>>> end_request: I/O error, dev sda, sector 16460
> >>>>> ReiserFS: sda7: found reiserfs format "3.6" with standard journal
> >>>>> ReiserFS: sda7: using ordered data mode
> >>>>> --
> >>>>> ReiserFS: sda7: Using r5 hash to sort names
> >>>>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
> >>>>> end_request: I/O error, dev sdb, sector 19632
> >>>>> sd 0:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
> >>>>> end_request: I/O error, dev sdb, sector 40037363
> >>>>> Adding 1048568k swap on /dev/mapper/vglinux1-lvswap.  Priority:-1 extents:1 across:1048568k
> >>>>> lp0: using parport0 (interrupt-driven).
> >>>>>
> >>>>> These errors occur *only* with 2.6.24-rc3-mm1, and they are 100% reproducible.
> >>>>> 2.6.24-rc3 and 2.6.24-rc2-mm1 are fine.
> >>>>>
> >>>>> Maybe something is broken in the pata_via driver?
> >>>>>
> >>>> Could be - libata-reimplement-ata_acpi_cbl_80wire-using-ata_acpi_gtm_xfermask.patch
> >>>> and pata_amd-pata_via-de-couple-programming-of-pio-mwdma-and-udma-timings.patch
> >>>> touch pata_via.c.
> >>> None of the above...
> >>>
> >>> I did a bisection, and it pointed to git-scsi-misc.patch.
> >>> I just ran 2.6.24-rc3-mm1 + revert-git-scsi-misc.patch, and it works fine.
> >>>
> >>> I guess commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0 "[SCSI] Do not
> >>> requeue requests if REQ_FAILFAST is set" is the real culprit. The other
> >>> commits touch documentation or drivers I don't use. I'll try
> >>> reverting only this one this evening.
> 
> I can confirm: reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0
> does fix the problem.
> 
> >> Hmm. Weird. I'll have a look into it. Apparently I'm returning an error where
> >> I shouldn't. Checking ...
> >>
> > Ok, found it. We are blocking even special commands (i.e. requests with PREEMPT not set)
> > when FAILFAST is set, which is clearly wrong. The attached patch fixes this.
> 
> Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O errors.

I think the problem is the way we treat BLOCK and QUIESCE (the latter
is the state that domain validation uses, and in which we cannot kill
failfast requests).  It's definitely wrong to kill failfast requests when
the state is QUIESCE.

This patch (applied on top of Hannes' original) separates the
BLOCK and QUIESCE states correctly ... does this fix the problem?

James

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 13e7e09..a7cf23a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1279,18 +1279,21 @@ int scsi_prep_state_check(struct scsi_device *sdev, struct request *req)
 				    "rejecting I/O to dead device\n");
 			ret = BLKPREP_KILL;
 			break;
-		case SDEV_QUIESCE:
 		case SDEV_BLOCK:
 			/*
-			 * If the devices is blocked we defer normal commands.
-			 */
-			if (!(req->cmd_flags & REQ_PREEMPT))
-				ret = BLKPREP_DEFER;
-			/*
 			 * Return failfast requests immediately
 			 */
 			if (req->cmd_flags & REQ_FAILFAST)
 				ret = BLKPREP_KILL;
+
+			/* fall through */
+
+		case SDEV_QUIESCE:
+			/*
+			 * If the devices is blocked we defer normal commands.
+			 */
+			if (!(req->cmd_flags & REQ_PREEMPT))
+				ret = BLKPREP_DEFER;
 			break;
 		default:
 			/*
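
To make the control flow of the patched switch easier to follow, here is
a minimal user-space sketch in C.  It is not kernel code: the names
prep_result, dev_state, FLAG_PREEMPT, FLAG_FAILFAST and prep_state_check
are hypothetical stand-ins for BLKPREP_*, SDEV_*, REQ_PREEMPT,
REQ_FAILFAST and scsi_prep_state_check, and both the initial PREP_OK
value and the empty default branch are assumptions, since the diff does
not show them.

```c
/* Local stand-ins for the kernel constants; the values are illustrative. */
enum prep_result { PREP_OK, PREP_KILL, PREP_DEFER };
enum dev_state { STATE_RUNNING, STATE_BLOCK, STATE_QUIESCE };

#define FLAG_PREEMPT  (1u << 0)	/* stands in for REQ_PREEMPT */
#define FLAG_FAILFAST (1u << 1)	/* stands in for REQ_FAILFAST */

/*
 * Mirrors only the two cases touched by the patch.  Every other
 * device state is collapsed into the default branch (an assumption).
 */
enum prep_result prep_state_check(enum dev_state state, unsigned int flags)
{
	enum prep_result ret = PREP_OK;	/* assumed initial value */

	switch (state) {
	case STATE_BLOCK:
		/* Failfast requests are marked for immediate failure. */
		if (flags & FLAG_FAILFAST)
			ret = PREP_KILL;

		/* fall through */

	case STATE_QUIESCE:
		/* Anything without the preempt flag is deferred. */
		if (!(flags & FLAG_PREEMPT))
			ret = PREP_DEFER;
		break;
	default:
		break;
	}

	return ret;
}
```

Read literally, a request in the QUIESCE state is never killed: it is
deferred unless it carries the preempt flag.  In the BLOCK case the
fall-through means a failfast request without the preempt flag also ends
up deferred, because the second test overwrites the earlier kill; only a
failfast request that also has the preempt flag set comes back as
PREP_KILL.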


