From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4EE8B7B3.900@canonical.com>
Date: Wed, 14 Dec 2011 09:50:27 -0500
From: "Peter M. Petrakis"
MIME-Version: 1.0
References: <20111213114558.7acbf2e9@bettercgi.com> <4EE79B02.5050709@canonical.com> <20111213141040.3b090df3@bettercgi.com> <4EE7D78A.2080704@canonical.com> <20111213173301.3d504b86@bettercgi.com>
In-Reply-To: <20111213173301.3d504b86@bettercgi.com>
Content-Transfer-Encoding: 7bit
Subject: Re: [linux-lvm] access through LVM causes D state lock up
Reply-To: LVM general discussion and development
List-Id: LVM general discussion and development
Content-Type: text/plain; charset="us-ascii"
To: linux-lvm@redhat.com

On 12/13/2011 06:33 PM, Ray Morris wrote:
>>> On Tue, 13 Dec 2011 13:35:46 -0500
>>> "Peter M. Petrakis" wrote
>
>> What distro and kernel are you on?
>
> 2.6.32-71.29.1.el6.x86_64 (CentOS 6)
>
>>> Copying the entire LVs sequentially saw no problems. Later when I
>>> tried to rsync to the LVs the problem showed itself.
>>
>> That's remarkable as it removes the fs from the equation. What
>> fs are you using?
>
> ext3
>
>> Not a bad idea. Returning to the backtrace:
> ...
>> raid5_quiesce should have been straightforward
>>
>> http://lxr.linux.no/linux+v3.1.5/drivers/md/raid5.c#L5422
>
> Interesting. Not that I speak kernel, but I may have to learn.
> Please note the other partial stack trace included refers to a
> different function.
>
>
> Dec 13 09:15:52 clonebox3 kernel: Call Trace:
> Dec 13 09:15:52 clonebox3 kernel: [] raid5_quiesce+0x125/0x1a0 [raid456]
> Dec 13 09:15:52 clonebox3 kernel: [] ? default_wake_function+0x0/0x20
> Dec 13 09:15:52 clonebox3 kernel: [] ? __wake_up+0x53/0x70
> --
> Dec 13 09:15:52 clonebox3 kernel: Call Trace:
> Dec 13 09:15:52 clonebox3 kernel: [] io_schedule+0x73/0xc0
> Dec 13 09:15:52 clonebox3 kernel: [] sync_io+0xe5/0x180 [dm_mod]
> Dec 13 09:15:52 clonebox3 kernel: [] ? generic_make_request+0x1b2/0x4f0
> --
> Dec 13 09:15:52 clonebox3 kernel: Call Trace:
> Dec 13 09:15:52 clonebox3 kernel: [] ? dm_table_unplug_all+0x5c/0xd0 [dm_mod]
> Dec 13 09:15:52 clonebox3 kernel: [] ? ktime_get_ts+0xa9/0xe0
> Dec 13 09:15:52 clonebox3 kernel: [] ? sync_buffer+0x0/0x50
>
> an earlier occurrence:
>
> Dec 5 23:31:34 clonebox3 kernel: Call Trace:
> Dec 5 23:31:34 clonebox3 kernel: [] ? scsi_setup_blk_pc_cmnd+0x13d/0x170
> Dec 5 23:31:34 clonebox3 kernel: [] raid5_quiesce+0x125/0x1a0 [raid456]
> Dec 5 23:31:34 clonebox3 kernel: [] ? default_wake_function+0x0/0x20

[snip]

Still in the RAID code, just a tiny bit further. I assume that when
you examine the output of lsscsi -l, all the disks are 'running' at
this point?

>
>
>> At this point I think you might have more of an MD issue than
>> anything else. If you could take MD out of the picture by using a
>> single disk or use a HW RAID, that would be a really useful data
>> point.
>
> I _THINK_ it was all hardware RAID when this happened before, but I
> can't be sure.

Then you're not at your wits' end, and you possess the HW to isolate
this issue. Please retry your experiment and keep us posted.

Peter
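
For anyone repeating the experiment, the two checks mentioned above (which
tasks are stuck in D state, and whether every disk still reports 'running')
can be scripted. What follows is only a rough sketch, not something from the
thread: it assumes a Linux box with /proc mounted and SCSI disks named sd*,
and it reads the sysfs device/state file on the assumption that this is the
same state lsscsi -l reports. The function names are made up for
illustration.

```python
import glob
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def d_state_tasks(proc_root="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep (state D)."""
    tasks = []
    for path in glob.glob(os.path.join(proc_root, "[0-9]*", "stat")):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the file was unparsable
        if state == "D":
            tasks.append((pid, comm))
    return sorted(tasks)


def scsi_disk_states(sys_block="/sys/block"):
    """Map each sd* disk to the contents of its sysfs device/state file."""
    states = {}
    for path in glob.glob(os.path.join(sys_block, "sd*", "device", "state")):
        disk = path.split(os.sep)[-3]  # .../<disk>/device/state
        try:
            with open(path) as f:
                states[disk] = f.read().strip()
        except OSError:
            states[disk] = "unreadable"
    return states


if __name__ == "__main__":
    for pid, comm in d_state_tasks():
        print(f"D state: pid={pid} comm={comm}")
    for disk, state in sorted(scsi_disk_states().items()):
        print(f"{disk}: {state}")
```

Running it while the hang is in progress, once with MD in the stack and once
without, gives a comparable snapshot for each configuration. Note that it
only looks at the main thread of each process, so a stuck worker thread
would need /proc/<pid>/task/ scanned as well.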