Date: Thu, 14 Jul 2011 16:52:31 +1000
From: NeilBrown
To: Qin Dehua
Cc: linux-kernel@vger.kernel.org, Dan Williams
Subject: Re: PROBLEM: md/raid5 bug by commit 415e72d03
Message-ID: <20110714165231.0cda7762@notabene.brown>

On Fri, 8 Jul 2011 13:58:39 +0800 Qin Dehua wrote:

> By bisecting, commit 415e72d03 ("md/raid5: Allow recovered part of
> partially recovered devices to be in-sync") was identified as the cause
> of the following problem:
>
> * The md1_raid5 or md1_raid6 process hangs (99% CPU usage) after disks
>   are repeatedly removed from and re-added to a raid5 or raid6 array
>   (while a dd process writes to the array continuously).
>
> The hardware platform is an IOP 341 XScale processor (not tested on
> other platforms). The problem can be reproduced with this script:
>
> { while true; do dd if=/dev/zero of=/dev/md1 bs=1M count=90000 > /dev/null 2>/dev/null; done; } &
>
> while true; do
>     mdadm /dev/md1 -f /dev/sda /dev/sdb
>     sleep 1
>     mdadm /dev/md1 -r /dev/sda /dev/sdb
>     sleep 1
>     mdadm /dev/md1 -a /dev/sda /dev/sdb
>     sleep 6
> done

I cannot reproduce this on x86_64.

Can you find out more about the hanging process?
Maybe "cat /proc/PROCESS-ID/stack" a few times and see where it is
spending its time?

NeilBrown
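[Editor's note: the stack-sampling Neil suggests can be scripted. The sketch below is not from the original mail; the `sample_stack` name, its defaults, and the fallback message are illustrative. Note that reading /proc/PID/stack generally requires root.]

```shell
# Minimal sketch: snapshot /proc/PID/stack a few times, one second apart,
# to see where a hung kernel thread (e.g. md1_raid5) is spending its time.
sample_stack() {
    pid=${1:-$$}    # PID of the process to inspect; defaults to this shell
    n=${2:-3}       # number of samples to take
    i=1
    while [ "$i" -le "$n" ]; do
        echo "=== sample $i of /proc/$pid/stack ==="
        # /proc/PID/stack is usually readable only by root; fall back cleanly.
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack unavailable)"
        sleep 1
        i=$((i + 1))
    done
}

# Example: sample the stack of PID given on the command line.
sample_stack "${1:-$$}" 3
```

If every sample shows the same kernel function near the top of the stack, that function is where the task is looping or stuck.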