From: Jeff Lessem <Jeff@Lessem.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Bill Davidsen" <davidsen@tmr.com>,
"BERTRAND Joël" <joel.bertrand@systella.fr>,
"Justin Piszcz" <jpiszcz@lucidpixels.com>,
"Neil Brown" <neilb@suse.de>,
linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org
Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state
Date: Fri, 09 Nov 2007 13:36:11 -0700
Message-ID: <4734C4BB.1050000@Lessem.org>
In-Reply-To: <e9c3a7c20711081002k5c3be41doa700afce7c1a7a6e@mail.gmail.com>
Dan Williams wrote:
> On 11/8/07, Bill Davidsen <davidsen@tmr.com> wrote:
>> Jeff Lessem wrote:
>>> Dan Williams wrote:
>>>> The following patch, also attached, cleans up cases where the code
>>>> looks at sh->ops.pending when it should be looking at the consistent
>>>> stack-based snapshot of the operations flags.
>>> I tried this patch (against a stock 2.6.23), and it did not work for
>>> me. Not only did I/O to the affected RAID5 & XFS partition stop, but
>>> also I/O to all other disks. I was not able to capture any debugging
>>> information, but I should be able to do that tomorrow when I can hook
>>> a serial console to the machine.
>> That can't be good! This is worrisome because Joel is giddy with joy
>> because it fixes his iSCSI problems. I was going to try it with nbd, but
>> perhaps I'll wait a week or so and see if others have more information.
>> Applying patches before a holiday weekend is a good way to avoid time
>> off. :-(
>
> We need to see more information on the failure that Jeff is seeing,
> and whether it goes away with the two known patches applied. He
> applied this most recent patch against stock 2.6.23 which means that
> the platform was still open to the first biofill flags issue.

I applied both of the patches. The biofill one did not apply cleanly,
because it adds biofill to one section and removes it from another,
but it appears that biofill does not need to be removed from a stock
2.6.23 kernel. The second patch applied with a slight offset, but no
errors.

I can report success so far with both patches applied. I created an
1100GB RAID5, formatted it with XFS, and successfully copied 895GB of
data onto it with "tar c | tar x". I'm also in the process of
rsync-ing the 895GB of data from the (slightly changed) original. In
the past, I would always get a hang within 0-50GB of data transfer.
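
The "tar c | tar x" copy mentioned above can be sketched as a small shell helper. The function name copy_tree and its two directory arguments are hypothetical, and the exact tar options used in the original test are not stated in this message, so treat this as one plausible form of the pipeline rather than the command that was actually run.

```shell
#!/bin/sh
# copy_tree SRC DST  (hypothetical helper, not from the original message)
# Streams the contents of SRC into DST through a tar pipe.
copy_tree() {
    src="$1"
    dst="$2"
    mkdir -p "$dst" || return 1
    # The first tar writes an archive of SRC to stdout;
    # the second tar reads it from stdin and unpacks it inside DST,
    # preserving permissions (-p).
    tar -C "$src" -cf - . | tar -C "$dst" -xpf -
}
```

Called as copy_tree /path/to/original /path/to/raid5/mount, this produces sustained sequential writes on the target, which is the kind of load that triggered the hang within the first 50GB on an unpatched kernel.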

For each drive ("$i") in the RAID I also ran:

    echo 128 > /sys/block/"$i"/queue/max_sectors_kb
    echo 512 > /sys/block/"$i"/queue/nr_requests
    echo 1 > /sys/block/"$i"/device/queue_depth

and for the array itself:

    blockdev --setra 65536 /dev/md3
    echo 16384 > /sys/block/md3/md/stripe_cache_size
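
The settings above can be wrapped in a loop so they are applied consistently to every member drive. The following is a minimal sketch: the function names, the SYSFS_ROOT variable, and the drive names in the usage note are assumptions for illustration, since the message does not list the member drives. Only the sysfs paths and values come from the message.

```shell
#!/bin/sh
# SYSFS_ROOT is a hypothetical override that lets the functions be
# exercised against a scratch directory; it defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# apply_drive_tunables DRIVE...  (hypothetical helper)
# Writes the per-drive queue settings for each named block device.
apply_drive_tunables() {
    for i in "$@"; do
        echo 128 > "$SYSFS_ROOT/block/$i/queue/max_sectors_kb" || return 1
        echo 512 > "$SYSFS_ROOT/block/$i/queue/nr_requests"    || return 1
        echo 1   > "$SYSFS_ROOT/block/$i/device/queue_depth"   || return 1
    done
}

# apply_array_tunables MD_DEVICE  (hypothetical helper)
# Sets the read-ahead and the stripe cache size on the md array.
apply_array_tunables() {
    md="$1"
    blockdev --setra 65536 "/dev/$md" || return 1
    echo 16384 > "$SYSFS_ROOT/block/$md/md/stripe_cache_size"
}
```

Run as root, for example with apply_drive_tunables sda sdb sdc followed by apply_array_tunables md3, where sda, sdb and sdc stand in for the actual member drives. The values do not persist across reboots, so they would need to be reapplied from a boot script.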

These settings, along with a RAID5 chunk size of 1024k, appear to
improve performance, but on their own (without the patches) they do
not fix the problem.