From: Rolf Eike Beer
Date: Tue, 8 Mar 2011 14:49:44 +0100
To: linux-lvm@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: [linux-lvm] 2.6.37.2: LVM pvmove hangs system
References: <201103081038.38338.eike-kernel@sf-tec.de>
In-Reply-To: <201103081038.38338.eike-kernel@sf-tec.de>
Message-Id: <201103081449.44783.eike-kernel@sf-tec.de>

On Tuesday, 8 March 2011, 10:38:38, Rolf Eike Beer wrote:
> Hi all,
>
> For some days now I have been experiencing a very annoying system lockup.
> The setup is as follows:
>
> - two pairs of SATA disks, each pair bundled into a software RAID 1
> - each of the RAID devices is a physical volume
> - a volume group that includes both PVs
> - all mounted volumes (including root and swap) are in that VG
>
> The machine is a Xeon E5520 with 16G RAM that is otherwise idle, so swap
> shouldn't matter. From what I read in the documentation this all looks
> perfectly sane, but:
>
> Now I try to move the data from one PV to the other using pvmove. It
> prints the current state (currently 10.9%) and then starts doing
> something. Two minutes later the kernel will complain:

After some further testing I _think_ I have an idea of what's going on:
this is a deadlock somewhere in the I/O stack. I have recompiled the kernel
with all the lock debugging options enabled and will probably test with it,
but this is a production machine that should be back online sooner rather
than later, so the amount of testing I can do is pretty limited. Since the
machine is currently doing the move and actually working, I have not yet
booted into the debug kernel.

What I did was basically stop everything on the machine. The only userspace
programs currently running are init, my sshd, my screen, a shell, and of
course pvmove. And now it works. Whenever I try to do anything that causes
I/O in parallel, the machine stops working. So this box is effectively at
runlevel 1 now, moving all the data around instead of doing useful work
while the move runs in the background :(

(Rough sketches of the layout and of the lock debugging options are
appended below.)

Eike
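
P.S.: For reference, the layout described above boils down to something
like the following. Device names and the VG name are made up here, so take
it as a sketch of the configuration, not the exact commands used on this
box:

  # two pairs of SATA disks, each pair bundled into a software RAID 1
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1

  # each RAID device is a physical volume, both go into one volume group
  pvcreate /dev/md0 /dev/md1
  vgcreate vg0 /dev/md0 /dev/md1

  # all logical volumes (root, swap, ...) live in that VG; the command that
  # eventually hangs the box is simply the move from one PV to the other:
  pvmove /dev/md0 /dev/md1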
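
P.P.S.: "All the lock debugging" means roughly the usual kernel hacking
options; a .config fragment of what I have in mind (option names as of
2.6.37, the exact set in the debug kernel may differ slightly):

  # lockdep: prove locking correctness, report possible deadlock scenarios
  CONFIG_PROVE_LOCKING=y
  CONFIG_DEBUG_LOCK_ALLOC=y
  CONFIG_DEBUG_LOCKDEP=y
  # basic sanity checks on spinlocks and mutexes
  CONFIG_DEBUG_SPINLOCK=y
  CONFIG_DEBUG_MUTEXES=y
  # lock contention statistics, in case it is contention rather than a deadlock
  CONFIG_LOCK_STAT=y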