From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from dkim1.fusionio.com ([66.114.96.53]:60988 "EHLO
    dkim1.fusionio.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1755575Ab3EQQz0 (ORCPT );
    Fri, 17 May 2013 12:55:26 -0400
Received: from mx1.fusionio.com (unknown [10.101.1.160])
    by dkim1.fusionio.com (Postfix) with ESMTP id 45AAB7C04F9
    for ; Fri, 17 May 2013 10:55:26 -0600 (MDT)
Date: Fri, 17 May 2013 12:54:56 -0400
From: Josef Bacik
To: Marc MERLIN
CC: "linux-btrfs@vger.kernel.org"
Subject: Re: kernel 3.8.8: btrfs still crashes on boot when it can't replay a log
Message-ID: <20130517165456.GH1765@localhost.localdomain>
References: <20130516150918.GB26762@merlins.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
In-Reply-To: <20130516150918.GB26762@merlins.org>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On Thu, May 16, 2013 at 09:09:18AM -0600, Marc MERLIN wrote:
> I've reported this bug a few times over different kernel versions over the
> last year now, and unfortunately it's still not fixed as of 3.8 (yes, I know
> 3.9 is out, I'm just about to switch).
>
> What happens as far as I know:
> I have btrfs on top of dmcrypt on an SSD.
>
> The SSD on occasion seems to just hang, so I have to power cycle my laptop.
> I can't say how much the SSD did and did not write before stopping to work.
>
> Then, maybe one time out of 2 or 3, btrfs crashes when I reboot and it tries
> to replay the log.
>
> I'm then forced to do this from emergency boot media:
>
> gandalfthegreat:~# btrfs-zero-log /dev/mapper/root
> Check tree block failed, want=64855564288, have=14954667565421255623
> Check tree block failed, want=64855564288, have=14954667565421255623
> Check tree block failed, want=64855564288, have=7474503720151340134
> Check tree block failed, want=64855564288, have=14954667565421255623
> Check tree block failed, want=64855564288, have=14954667565421255623
> read block failed check_tree_block
>
> The last bits of the crash before I zero the log:
> http://marc.merlins.org/tmp/btrfs-3.8.8.jpg
>
> Still issues with btrfs_numb_copies.
>
> This has been going on for over a year now, not very pleasant :)
>
> Is there no way you can corrupt logs in a test lab and reproduce this?
>
> Or is it still known to happen due to missing code that decides whether a log is corrupt
> and whether to discard it before the code reads it and crashes?
>
> If so, could you add this to the list of things to fix to make btrfs a bit
> less scary to others? :)
> (and of course more production ready, this repeated problem would kill any
> server it happens on)
>

This has been all fixed in 3.10. Thanks,

Josef