From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:26834 "EHLO
    mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S932833AbaCQOe4 (ORCPT );
    Mon, 17 Mar 2014 10:34:56 -0400
Message-ID: <532707FB.3010000@fb.com>
Date: Mon, 17 Mar 2014 10:34:35 -0400
From: Josef Bacik
MIME-Version: 1.0
To: Rich Freeman
CC: Zach Brown ,
Subject: Re: [PATCH] Btrfs: fix deadlock with nested trans handles
References: <1394150467-5990-1-git-send-email-jbacik@fb.com>
    <20140307002549.GC16439@lenny.home.zabbo.net>
    <53207C4B.6020000@fb.com>
In-Reply-To:
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On 03/14/2014 06:40 PM, Rich Freeman wrote:
> On Wed, Mar 12, 2014 at 12:34 PM, Rich Freeman
> wrote:
>> On Wed, Mar 12, 2014 at 11:24 AM, Josef Bacik wrote:
>>> On 03/12/2014 08:56 AM, Rich Freeman wrote:
>>>>
>>>> After a number of reboots the system became stable, presumably
>>>> whatever race condition btrfs was hitting followed a favorable
>>>> path.
>>>>
>>>> I do have a 2GB btrfs-image pre-dating my application of this
>>>> patch that was causing the issue last week.
>>>>
>>>
>>> Uhm wow that's pretty epic. I will talk to chris and figure out how
>>> we want to deal with that and send you a patch shortly. Thanks,
>>
>> A tiny bit more background.
>
> And some more background. I had more reboots over the next two days
> at the same time each day, just after my crontab successfully
> completed. One of the last thing it does is runs the snapper cleanups
> which delete a bunch of snapshots. During a reboot I checked and
> there were a bunch of deleted snapshots, which disappeared over the
> next 30-60 seconds before the panic, and then they would re-appear on
> the next reboot.
>
> I disabled the snapper cron job and this morning had no issues at all.
> One day isn't much to establish a trend, but I suspect that this is
> the cause. Obviously getting rid of snapshots would be desirable at
> some point, but I can wait for a patch. Snapper would be deleting
> about 48 snapshots at the same time, since I create them hourly and
> the cleanup occurs daily on two different subvolumes on the same
> filesystem.

Ok that's helpful, I'm no longer positive I know what's causing this,
I'll try to reproduce once I've nailed down these backref problems and
balance corruption. Thanks,

Josef