From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:52931 "EHLO
	mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752047AbaE0TY4 (ORCPT );
	Tue, 27 May 2014 15:24:56 -0400
Message-ID: <5384E728.3070403@fb.com>
Date: Tue, 27 May 2014 15:27:36 -0400
From: Chris Mason
MIME-Version: 1.0
To: Marc MERLIN
CC: Duncan <1i5t5.duncan@cox.net>, ,
Subject: Re: 3.15.0-rc5: now sync and mount are hung on call_rwsem_down_write_failed
References: <20140522090921.GA12037@merlins.org> <20140522131528.GB22952@merlins.org> <20140523002243.GE12312@merlins.org> <20140523141722.GA11991@merlins.org> <537FAE91.20308@fb.com> <20140523231337.GC12384@merlins.org>
In-Reply-To: <20140523231337.GC12384@merlins.org>
Content-Type: text/plain; charset="ISO-8859-1"
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 05/23/2014 07:13 PM, Marc MERLIN wrote:
> On Fri, May 23, 2014 at 04:24:49PM -0400, Chris Mason wrote:
>>> I was able to kill btrfs send and receive, but mencoder is very hung, and
>>> sync does not finish either:
>>> 10654 merlin sync sync_inodes_sb
>>> 17191 merlin sync call_rwsem_down_read_failed
>>>
>>> I'm not posting the sysrq-w every time, but I have it available if needed.
>>
>> Hi Marc,
>>
>> Can I have the sysrq-w from this one if it's still available?
>
> Argh, just found out that the bug caused none of the 2 copies to ever
> be committed to disk (including an ext4 partition), and the remote
> syslog lost too much for it to be useful.
>
> What's more weird is the previous one, where I was able to copy the
> syslog data that never got committed to disk but was still in the page
> cache to another machine, I just realized that this one is missing the
> beginning (it starts at cpu #4).
>
> So it looks like the only complete one I have right now is
> https://urldefense.proofpoint.com/v1/url?u=http://marc.merlins.org/tmp/btrfs-hang.txt&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=trVl686QjTewKFAeRvMI4%2BQqLBCr36hUPGAiCv6xEMk%3D%0A&s=8b775a694311d54d110d686f86531ca5ce2db479b2aa5966d6056ebf173825b8
>
> If you need more, please let me know, and I'll make sure that I save
> that very carefully next time.

It's not 100% clear what is going on here. You have a number of procs
waiting for page locks, one of which is trying to read in your free
space cache. Was this one of your machines with metadata corruption?

More traces definitely help.

-chris