public inbox for linux-kernel@vger.kernel.org
* PROBLEM: md/raid5 bug by commit 415e72d03
@ 2011-07-08  5:58 Qin Dehua
  2011-07-14  6:52 ` NeilBrown
  0 siblings, 1 reply; 2+ messages in thread
From: Qin Dehua @ 2011-07-08  5:58 UTC (permalink / raw)
  To: neilb; +Cc: linux-kernel, Dan Williams

By bisecting, I found that commit 415e72d03 (md/raid5: Allow recovered
part of partially recovered devices to be in-sync) causes the following
problem:
	* The md1_raid5 or md1_raid6 process hangs (99% CPU usage) after
disks are repeatedly removed from and re-added to a raid5 or raid6 array
while a dd process writes to the array continuously.
	
The hardware platform is an IOP 341 XScale processor (I did not test on
other platforms). The problem can be reproduced with this script:

{ while true; do dd if=/dev/zero of=/dev/md1 bs=1M count=90000 > /dev/null 2>/dev/null; done; } &

while true;do
	mdadm /dev/md1 -f /dev/sda /dev/sdb
	sleep 1
	mdadm /dev/md1 -r /dev/sda /dev/sdb
	sleep 1
	mdadm /dev/md1 -a /dev/sda /dev/sdb
	sleep 6
done

Regards,
Qin Dehua


* Re: PROBLEM: md/raid5 bug by commit 415e72d03
  2011-07-08  5:58 PROBLEM: md/raid5 bug by commit 415e72d03 Qin Dehua
@ 2011-07-14  6:52 ` NeilBrown
  0 siblings, 0 replies; 2+ messages in thread
From: NeilBrown @ 2011-07-14  6:52 UTC (permalink / raw)
  To: Qin Dehua; +Cc: linux-kernel, Dan Williams

On Fri, 8 Jul 2011 13:58:39 +0800 Qin Dehua <qindehua@gmail.com> wrote:

> By bisecting, commit 415e72d03(md/raid5: Allow recovered part of
> partially recovered devices to be in-sync) was found as the cause of
> follow problem:
> 	* md1_raid5 or md1_raid6 process will hang(99% CPU usage) after
> repeatedly remove disk from then re-add disk to raid5 or raid6(there
> is also a dd process write to the raid continuously).
> 	
> Hardware platform is IOP 341 XScale processor(did not test on other
> platforms). The problem can be reproduced with this script:
> 
> { while true; do dd if=/dev/zero of=/dev/md1 bs=1M count=90000 >
> /dev/null 2>/dev/null;done; } &
> 
> while true;do
> 	mdadm /dev/md1 -f /dev/sda /dev/sdb
> 	sleep 1
> 	mdadm /dev/md1 -r /dev/sda /dev/sdb
> 	sleep 1
> 	mdadm /dev/md1 -a /dev/sda /dev/sdb
> 	sleep 6
> done

I cannot reproduce this on x86_64.

Can you find out more about the hanging process?  Maybe run
  cat /proc/PROCESS-ID/stack

a few times to see where it is spending its time?
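
A minimal sketch of that sampling, written as a small shell function. The
function name sample_stack, its arguments, and the PROC_ROOT override are
assumptions made for illustration and are not part of any existing tool;
the function only repeats the cat shown above with a header per sample.

```shell
#!/bin/sh
# Sketch: print the kernel stack of one process several times.
# Usage: sample_stack PID [COUNT] [INTERVAL_SECONDS]
# PROC_ROOT is a hypothetical override (defaults to /proc) so the
# function can be tried against a test directory.
sample_stack() {
    pid=$1
    count=${2:-5}
    interval=${3:-1}
    root=${PROC_ROOT:-/proc}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "--- sample $i of $count (pid $pid) ---"
        # Stop early if the stack file cannot be read.
        cat "$root/$pid/stack" || return 1
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        [ "$i" -le "$count" ] && sleep "$interval"
    done
    return 0
}
```

For example, sample_stack "$(pgrep -x md1_raid5)" 5 1 would take five
samples one second apart, assuming a single thread with that name exists.
Reading /proc/PID/stack normally requires root and a kernel built with
CONFIG_STACKTRACE.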

NeilBrown
