Re: 2.6.23.1: mdadm/raid5 hung/d-state

From: "BERTRAND Joël" <joel.bertrand@systella.fr>
To: Chuck Ebbert <cebbert@redhat.com>
Cc: Neil Brown <neilb@suse.de>,
	Justin Piszcz <jpiszcz@lucidpixels.com>,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org
Subject: Re: 2.6.23.1: mdadm/raid5 hung/d-state
Date: Wed, 07 Nov 2007 17:48:36 +0100
Message-ID: <4731EC64.3050903@systella.fr>
In-Reply-To: <4731EA2B.5000806@redhat.com>

Chuck Ebbert wrote:
> On 11/05/2007 03:36 AM, BERTRAND Joël wrote:
>> Neil Brown wrote:
>>> On Sunday November 4, jpiszcz@lucidpixels.com wrote:
>>>> # ps auxww | grep D
>>>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>>>> root       273  0.0  0.0      0     0 ?        D    Oct21  14:40 [pdflush]
>>>> root       274  0.0  0.0      0     0 ?        D    Oct21  13:00 [pdflush]
>>>>
>>>> After several days/weeks, this is the second time this has happened,
>>>> while doing regular file I/O (decompressing a file), everything on
>>>> the device went into D-state.
>>> At a guess (I haven't looked closely) I'd say it is the bug that was
>>> meant to be fixed by
>>>
>>> commit 4ae3f847e49e3787eca91bced31f8fd328d50496
>>>
>>> except that patch applied badly and needed to be fixed with
>>> the following patch (not in git yet).
>>> These have been sent to stable@ and should be in the queue for 2.6.23.2
>>     My linux-2.6.23/drivers/md/raid5.c has contained your patch for a long
>> time:
>>
>> ...
>>         spin_lock(&sh->lock);
>>         clear_bit(STRIPE_HANDLE, &sh->state);
>>         clear_bit(STRIPE_DELAYED, &sh->state);
>>
>>         s.syncing = test_bit(STRIPE_SYNCING, &sh->state);
>>         s.expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
>>         s.expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
>>         /* Now to look around and see what can be done */
>>
>>         /* clean-up completed biofill operations */
>>         if (test_bit(STRIPE_OP_BIOFILL, &sh->ops.complete)) {
>>                 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);
>>                 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
>>                 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.complete);
>>         }
>>
>>         rcu_read_lock();
>>         for (i=disks; i--; ) {
>>                 mdk_rdev_t *rdev;
>>                 struct r5dev *dev = &sh->dev[i];
>> ...
>>
>> but it doesn't fix this bug.
>>
> 
> Did that chunk starting with "clean-up completed biofill operations" end
> up where it belongs? The patch with the big context moves it to a different
> place from where the original one puts it when applied to 2.6.23...
> 
> Lately I've seen several problems where the context isn't enough to make
> a patch apply properly when some offsets have changed. In some cases a
> patch won't apply at all because two nearly-identical areas are being
> changed and the first chunk gets applied where the second one should,
> leaving nowhere for the second chunk to apply.
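
The failure mode described above can be illustrated with a simplified model of
context matching. The sketch below only counts how many positions in a file
match a hunk's context lines exactly; the function name and the idea of passing
the file as an array of lines are assumptions made for illustration. The real
patch(1) also starts its search from the line number in the hunk header and may
relax the context with a fuzz factor, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the positions in `file` (an array of n_file lines) at which all
 * n_ctx context lines match consecutively. A result greater than 1 means
 * the context alone cannot tell the candidate locations apart.
 */
size_t count_context_matches(const char *const *file, size_t n_file,
                             const char *const *ctx, size_t n_ctx)
{
    size_t hits = 0;

    assert(n_ctx > 0);
    if (n_ctx > n_file)
        return 0;

    for (size_t start = 0; start + n_ctx <= n_file; start++) {
        size_t k = 0;

        /* Compare the context line by line from this starting position. */
        while (k < n_ctx && strcmp(file[start + k], ctx[k]) == 0)
            k++;
        if (k == n_ctx)
            hits++;
    }
    return hits;
}
```

When two nearly identical regions exist, a short context matches both of them,
while a longer context that includes a distinguishing line matches only one.
That is the situation in which a hunk can land in the wrong region once the
offsets have shifted.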

	I always apply this kind of patch by hand, not with the patch command.
The last patch sent here seems to fix this bug:

gershwin:[/usr/scripts] > cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
       1464725632 blocks [2/1] [U_]
       [=====>...............]  recovery = 27.1% (396992504/1464725632) finish=1040.3min speed=17104K/sec
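
For readers checking the figures in the recovery line, here is a minimal sketch
of the arithmetic. It assumes the block counts are in 1 KiB units and the speed
is in KiB per second; the function names are hypothetical and this is not the
kernel's own computation.

```c
#include <assert.h>

/* Percentage of the recovery already completed. */
double recovery_percent(unsigned long long done_blocks,
                        unsigned long long total_blocks)
{
    assert(total_blocks > 0);
    return 100.0 * (double)done_blocks / (double)total_blocks;
}

/*
 * Estimated minutes remaining, assuming 1 KiB blocks and a constant
 * speed given in KiB per second.
 */
double recovery_eta_minutes(unsigned long long done_blocks,
                            unsigned long long total_blocks,
                            unsigned long long speed_kib_per_sec)
{
    assert(speed_kib_per_sec > 0 && done_blocks <= total_blocks);
    return (double)(total_blocks - done_blocks)
           / (double)speed_kib_per_sec / 60.0;
}
```

With 396992504 of 1464725632 blocks done at 17104K/sec, this gives about 27.1%
and roughly 1040 minutes, in line with the values reported above. Any small
difference from finish=1040.3min is presumably due to rounding in the kernel's
own estimate.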

	Regards,

	JKB