* md/raid5: raid5d livelocks after drive failure during resync
@ 2013-07-18 10:59 Alexander Lyakas
2013-07-28 6:30 ` Alexander Lyakas
2013-07-29 5:25 ` NeilBrown
0 siblings, 2 replies; 3+ messages in thread
From: Alexander Lyakas @ 2013-07-18 10:59 UTC (permalink / raw)
To: NeilBrown, linux-raid; +Cc: yair, Shyam Kaushik, vladimir
Hello Neil,
we have a 3-drive raid5 that was resyncing when one drive failed. As a
result, raid5 is now livelocked at 100% CPU, and the failed drive is not
ejected from the array.
The kernel is ubuntu-precise 3.2.0-25.40 plus the following patches applied manually:
commit fab363b5ff502d1b39ddcfec04271f5858d9f26e
Author: Shaohua Li <shli@kernel.org>
Date: Tue Jul 3 15:57:19 2012 +1000
raid5: delayed stripe fix
and
commit a7854487cd7128a30a7f4f5259de9f67d5efb95f
Author: Alexander Lyakas <alex.bolshoy@gmail.com>
Date: Thu Oct 11 13:50:12 2012 +1100
md: When RAID5 is dirty, force reconstruct-write instead of
read-modify-write.
/proc/mdstat shows:
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 dm-5[0] dm-7[2](F) dm-6[1]
7809200128 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/2] [UU_]
resync=PENDING
bitmap: 29/30 pages [116KB], 65536KB chunk
Of the patches that went into the kernel after our version, the
following seems somewhat relevant:
cc1ceee md/raid5: In ops_run_io, inc nr_pending before calling
md_wait_for_blocked_rdev
but in our case badblocks are disabled.
(original conversation is in http://www.spinics.net/lists/raid/msg39191.html).
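For context, the pattern that commit describes (paraphrased only from its
one-line summary above, not from the actual upstream diff) is that
md_wait_for_blocked_rdev() drops a reference on the rdev when it returns,
so ops_run_io has to take the nr_pending reference before waiting:

    /* Paraphrased sketch of the fix named by commit cc1ceee above,
     * not the real diff: md_wait_for_blocked_rdev() calls
     * rdev_dec_pending() on return, so the reference must be taken
     * before the wait, otherwise nr_pending goes negative. */
    atomic_inc(&rdev->nr_pending);               /* take the ref first */
    md_wait_for_blocked_rdev(rdev, conf->mddev); /* drops it on return */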
Here are some stacks that we captured and the corresponding places in the code:
[] __cond_resched+0x2a/0x40
[] handle_stripe+0x400/0x1d80 [raid456]
[] raid5d+0x463/0x650 [raid456]
[] md_thread+0x10e/0x140
[] kthread+0x8c/0xa0
[] kernel_thread_helper+0x4/0x10
[] 0xffffffffffffffff
0x59e0 is in handle_stripe
(/mnt/work/alex/Ubuntu-3.2.0-25.40/drivers/md/raid5.c:495).
490 struct r5conf *conf = sh->raid_conf;
491 int i, disks = sh->disks;
492
493 might_sleep();
494
495 for (i = disks; i--; ) {
496 int rw;
497 struct bio *bi;
498 struct md_rdev *rdev;
499 if (test_and_clear_bit(R5_Wantwrite, &sh->dev[i].flags)) {
[] __cond_resched+0x2a/0x40
[] raid5d+0x470/0x650 [raid456]
[] md_thread+0x10e/0x140
[] kthread+0x8c/0xa0
[] kernel_thread_helper+0x4/0x10
[] 0xffffffffffffffff
0x8d80 is in raid5d (/mnt/work/alex/Ubuntu-3.2.0-25.40/drivers/md/raid5.c:4306).
4301 handled++;
4302 handle_stripe(sh);
4303 release_stripe(sh);
4304 cond_resched();
4305
4306 if (mddev->flags & ~(1<<MD_CHANGE_PENDING))
4307 md_check_recovery(mddev);
4308
4309 spin_lock_irq(&conf->device_lock);
4310 }
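As an aside, the check on line 4306 means raid5d only calls
md_check_recovery() when some flag other than MD_CHANGE_PENDING is set.
A minimal standalone sketch of that bit test (the MD_CHANGE_* bit numbers
below are assumptions for the demo, not copied from md.h):

    /* flagtest.c - illustration of the mask used at raid5.c:4306 */
    #include <stdio.h>

    #define MD_CHANGE_DEVS    0   /* assumed bit numbers, demo only */
    #define MD_CHANGE_PENDING 2

    int main(void)
    {
            unsigned long flags = 1UL << MD_CHANGE_PENDING;

            /* Only MD_CHANGE_PENDING set: masked value is 0, so
             * raid5d would skip md_check_recovery() here. */
            printf("pending only -> %#lx\n",
                   flags & ~(1UL << MD_CHANGE_PENDING));

            flags |= 1UL << MD_CHANGE_DEVS;

            /* Any other change bit set: masked value is nonzero,
             * so md_check_recovery() would be called. */
            printf("pending+devs -> %#lx\n",
                   flags & ~(1UL << MD_CHANGE_PENDING));
            return 0;
    }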
[] md_wakeup_thread+0x28/0x30
[] __release_stripe+0x101/0x1d0 [raid456]
[] release_stripe+0x4d/0x60 [raid456]
[] raid5d+0x46b/0x650 [raid456]
[] md_thread+0x10e/0x140
[] kthread+0x8c/0xa0
[] kernel_thread_helper+0x4/0x10
0x1be1 is in __release_stripe
(/mnt/work/alex/Ubuntu-3.2.0-25.40/drivers/md/raid5.c:227).
222 if (conf->retry_read_aligned)
223         md_wakeup_thread(conf->mddev->thread);
224 }
225 }
226 }
227 }
228
229 static void release_stripe(struct stripe_head *sh)
230 {
231 struct r5conf *conf = sh->raid_conf;
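The md_wakeup_thread() call shown in this excerpt is the
retry_read_aligned branch, and the thread it wakes is
conf->mddev->thread, i.e. raid5d itself. A rough sketch of the cycle
this would produce (an interpretation of the captured stacks, not
something we have confirmed; get_next_stripe() is a made-up name
standing in for the handle_list processing):

    /* Suspected self-wakeup cycle, illustration only */
    for (;;) {                          /* raid5d main loop, simplified     */
            sh = get_next_stripe(conf); /* same stripe keeps coming back    */
            handle_stripe(sh);          /* where several stacks were caught */
            release_stripe(sh);         /* __release_stripe() wakes
                                         * conf->mddev->thread == raid5d    */
            cond_resched();             /* so the thread never goes idle    */
    }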
[] __cond_resched+0x2a/0x40
[] handle_stripe+0x5dc/0x1d80 [raid456]
[] raid5d+0x463/0x650 [raid456]
[] md_thread+0x10e/0x140
[] kthread+0x8c/0xa0
[] kernel_thread_helper+0x4/0x10
[] 0xffffffffffffffff
0x5bbc is in handle_stripe
(/usr/src/linux-headers-3.2.0-25-generic/arch/x86/include/asm/bitops.h:121).
116 * clear_bit() is atomic and implies release semantics before the memory
117 * operation. It can be used for an unlock.
118 */
119 static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr)
120 {
121 barrier();
122 clear_bit(nr, addr);
123 }
124
125 static inline void __clear_bit(int nr, volatile unsigned long *addr)
[] __cond_resched+0x2a/0x40
[] handle_stripe+0xde/0x1d80 [raid456]
[] raid5d+0x463/0x650 [raid456]
[] md_thread+0x10e/0x140
[] kthread+0x8c/0xa0
[] kernel_thread_helper+0x4/0x10
0x56be is in handle_stripe (include/linux/spinlock.h:310).
305 raw_spin_lock_nest_lock(spinlock_check(lock), nest_lock); \
306 } while (0)
307
308 static inline void spin_lock_irq(spinlock_t *lock)
309 {
310 raw_spin_lock_irq(&lock->rlock);
311 }
312
313 #define spin_lock_irqsave(lock, flags) \
314 do {
[] md_wakeup_thread+0x28/0x30
[] __release_stripe+0x101/0x1d0 [raid456]
[] release_stripe+0x42/0x60 [raid456]
[] raid5d+0x46b/0x650 [raid456]
[] md_thread+0x10e/0x140
[] kthread+0x8c/0xa0
[] kernel_thread_helper+0x4/0x10
0x1cf2 is in release_stripe (include/linux/spinlock.h:340).
335 raw_spin_unlock_irq(&lock->rlock);
336 }
337
338 static inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)
339 {
340 raw_spin_unlock_irqrestore(&lock->rlock, flags);
341 }
342
343 static inline int spin_trylock_bh(spinlock_t *lock)
344 {
Can you please advise what might be the issue?
Thanks,
Alex.
* Re: md/raid5: raid5d livelocks after drive failure during resync
2013-07-18 10:59 md/raid5: raid5d livelocks after drive failure during resync Alexander Lyakas
@ 2013-07-28 6:30 ` Alexander Lyakas
2013-07-29 5:25 ` NeilBrown
1 sibling, 0 replies; 3+ messages in thread
From: Alexander Lyakas @ 2013-07-28 6:30 UTC (permalink / raw)
To: NeilBrown, linux-raid; +Cc: yair, Shyam Kaushik, vladimir
Ping?
On Thu, Jul 18, 2013 at 1:59 PM, Alexander Lyakas
<alex.bolshoy@gmail.com> wrote:
> [...]
* Re: md/raid5: raid5d livelocks after drive failure during resync
2013-07-18 10:59 md/raid5: raid5d livelocks after drive failure during resync Alexander Lyakas
2013-07-28 6:30 ` Alexander Lyakas
@ 2013-07-29 5:25 ` NeilBrown
1 sibling, 0 replies; 3+ messages in thread
From: NeilBrown @ 2013-07-29 5:25 UTC (permalink / raw)
To: Alexander Lyakas; +Cc: linux-raid, yair, Shyam Kaushik, vladimir
On Thu, 18 Jul 2013 13:59:29 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:
> [...]
Sorry, but nothing occurs to me that might be the cause.
NeilBrown