From: Jens Axboe <axboe@suse.de>
To: Warren Togami <wtogami@redhat.com>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>,
	Arjan van de Ven <arjanv@redhat.com>,
	linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Alan Cox <alan@redhat.com>
Subject: Re: [PATCH] i2o_block Fix, possible CFQ elevator problem?
Date: Tue, 20 Apr 2004 10:03:25 +0200
Message-ID: <20040420080325.GD25806@suse.de>
In-Reply-To: <4084D83D.8060405@redhat.com>

On Mon, Apr 19 2004, Warren Togami wrote:
> Jens Axboe wrote:
> >>>http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie.txt
> >>
> >>Next we tested cfq with the following section of code commented out. 
> >>With this change the kernel no longer panics and seems to survive with 
> >>four simultaneous bonnie++'s on all four block devices.
> >>
> >>--- cfq-iosched.c       2004-04-20 13:52:55.000000000 -1000
> >>+++ /root/linux-2.6.5-1.326/drivers/block/cfq-iosched.c 2004-04-20 14:09:43.000000000 -1000
> >>@@ -401,10 +401,12 @@
> >>dispatch:
> >>               rq = list_entry_rq(cfqd->dispatch->next);
> >>
> >>+/*
> >>               BUG_ON(q->last_merge == rq);
> >>               crq = RQ_DATA(rq);
> >>               if (crq)
> >>                       BUG_ON(ON_MHASH(crq));
> >>+*/
> >>
> >>               return rq;
> >>       }
> >
> >
> >This is not safe, the BUG_ON is there for a reason. If the request is on
> >the merge hash when handed to the driver, you risk corrupting data. The
> >fix would be figuring out why this is happening. Maybe it's looking at
> >bad data, could you test with this patch applied and see if the oops
> >still triggers?
> >
> >===== drivers/block/cfq-iosched.c 1.1 vs edited =====
> >--- 1.1/drivers/block/cfq-iosched.c	Mon Apr 12 19:55:20 2004
> >+++ edited/drivers/block/cfq-iosched.c	Tue Apr 20 09:07:20 2004
> >@@ -403,7 +403,7 @@
> > 
> > 		BUG_ON(q->last_merge == rq);
> > 		crq = RQ_DATA(rq);
> >-		if (crq)
> >+		if (blk_fs_request(rq) && crq)
> > 			BUG_ON(ON_MHASH(crq));
> > 
> > 		return rq;
> >
> 
> We figured removing error handling was not safe, the previous post was 
> only reporting test results to ask for more suggestions.  I have now 
> tested your suggested patch above and it seems to crash in the same way 
> as originally.
> 
> http://togami.com/~warren/archive/2004/i2o_cfq_quad_bonnie2.txt

As a temporary, safe work-around, you can apply the patch at the end of
this mail.

> This makes me curious, the other elevators lacked this type of error 
> checking.  Did this mean they were possibly allowing data corruption to 
> happen with buggy drivers like this?  Kind of scary!  We were lucky to 
> test this now, because this was one of the first FC kernels that 
> included cfq by default.

Not necessarily, it's most likely a CFQ bug. Otherwise it would have
surfaced before :-)

> Do you have any advice regarding the atomic type removal problem that we 
> experienced from our previous post?

Just change the type to an unsigned integer instead. Double check that
all decrements/increments and reads of that integer are inside the
device lock, it looked like they were.
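
To illustrate the locking rule above, here is a minimal userspace sketch,
not the i2o_block code itself. Everything in it is an assumption made for
illustration: the names demo_dev, queue_depth and the demo_depth_*()
helpers are hypothetical, and a pthread mutex stands in for the device
lock. The point is only that a plain unsigned integer is fine as long as
every increment, decrement and read happens with the lock held.

```c
#include <pthread.h>

/* Hypothetical stand-in for a driver's per-device state. */
struct demo_dev {
	pthread_mutex_t lock;      /* plays the role of the device lock */
	unsigned int queue_depth;  /* plain integer replacing an atomic type */
};

/* Increment the counter; the lock is held across the update. */
static void demo_depth_inc(struct demo_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->queue_depth++;
	pthread_mutex_unlock(&dev->lock);
}

/* Decrement the counter; the lock is held across the update. */
static void demo_depth_dec(struct demo_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->queue_depth--;
	pthread_mutex_unlock(&dev->lock);
}

/* Reads take the lock too, so they never race with an update. */
static unsigned int demo_depth_read(struct demo_dev *dev)
{
	unsigned int depth;

	pthread_mutex_lock(&dev->lock);
	depth = dev->queue_depth;
	pthread_mutex_unlock(&dev->lock);
	return depth;
}
```

In a driver the counter is usually touched from paths that already hold
the lock, so the audit is mostly a matter of confirming that no access
happens outside those paths.
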

===== drivers/block/cfq-iosched.c 1.1 vs edited =====
--- 1.1/drivers/block/cfq-iosched.c	Mon Apr 12 19:55:20 2004
+++ edited/drivers/block/cfq-iosched.c	Tue Apr 20 10:02:01 2004
@@ -404,7 +404,7 @@
 		BUG_ON(q->last_merge == rq);
 		crq = RQ_DATA(rq);
 		if (crq)
-			BUG_ON(ON_MHASH(crq));
+			cfq_remove_merge_hints(q, crq);
 
 		return rq;
 	}
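
The effect of the changed line can be sketched with a toy userspace
model. This is a sketch under stated assumptions, not the kernel code:
the toy_* types and functions are hypothetical, a boolean stands in for
membership in the merge hash, and the assumption is that removing the
merge hints means taking the request off the hash and clearing the
queue's last_merge pointer if it still points at that request. The
separate BUG_ON on q->last_merge is left out of the model.

```c
#include <stddef.h>
#include <stdbool.h>

struct toy_rq;

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
struct toy_crq {
	struct toy_rq *request;  /* back-pointer to the owning request */
	bool on_merge_hash;      /* models ON_MHASH(crq) */
};

struct toy_rq {
	struct toy_crq *crq;     /* models RQ_DATA(rq) */
};

struct toy_queue {
	struct toy_rq *last_merge;  /* models q->last_merge */
};

/* Drop every way the scheduler could still find this request for merging. */
static void toy_remove_merge_hints(struct toy_queue *q, struct toy_crq *crq)
{
	crq->on_merge_hash = false;
	if (q->last_merge == crq->request)
		q->last_merge = NULL;
}

/* Models only the changed lines: clean up instead of asserting. */
static struct toy_rq *toy_dispatch(struct toy_queue *q, struct toy_rq *rq)
{
	if (rq->crq)
		toy_remove_merge_hints(q, rq->crq);
	return rq;
}
```

With this shape, a request that reaches dispatch while still on the hash
is cleaned up before the driver sees it, which is why it works as a
stop-gap while the underlying cause is still unknown.
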

-- 
Jens Axboe


