public inbox for linux-btrfs@vger.kernel.org
From: Chris Mason <chris.mason@oracle.com>
To: Diego Calleja <diegocg@gmail.com>
Cc: Bill Pemberton <wfp5p@viridian.itc.virginia.edu>,
	linux-btrfs@vger.kernel.org
Subject: Re: assertion failures
Date: Fri, 26 Feb 2010 16:08:53 -0500
Message-ID: <20100226210853.GM12841@think>
In-Reply-To: <201002262149.14718.diegocg@gmail.com>

On Fri, Feb 26, 2010 at 09:49:14PM +0100, Diego Calleja wrote:
> On Friday, 26 February 2010 20:09:15 Chris Mason wrote:
> > My guess would be the super block; it is updated more often and so more
> > likely to get stuck in the array's cache.
>
> IIRC, this is exactly the same problem that ZFS users have been
> hitting. Some users got cheap disks that don't honour barriers
> correctly, so their uberblock didn't have the correct data.

This isn't new; XFS and reiserfs v3 have had problems as well.  But
this is just my first suspect, and Bill might be hitting something
entirely different.

> They developed an app that tries to rollback transactions to
> get the pool into a sane state...I guess that fsck will be able
> to do that at some point?

Yes, this is something that fsck will need to fix.  This corruption is
hardest because it involves the tree that maps all the other trees
(ugh).

The ioctl I'm working on for snapshot/subvol listing will make it easier
to create a program to back up the chunk tree externally.

>
> Stupid question from someone who is not a fs dev...it's not possible
> to solve this issue by doing some sort of "superblock journaling"?
> Since there are several superblock copies you could:
>  -Modify a secondary superblock copy to point to the tree root block
>   that still has not been written to disk
>  -Write whatever tree root block has been COW'ed
>  -Modify the primary superblock
>
> So in case of these failures, mount code could look in the secondary
> superblock copy before failing. Since barriers are not being honoured,
> there's still a possibility that the tree root blocks would be written
> before the secondary superblock block that was submitted before, but
> that problem would be much harder to hit I guess. But maybe the fs code
> cannot know where the tree root blocks are going to be written before
> writing them, and hence it can't generate a valid superblock?
>
> Sorry if all this makes no sense at all, I'm just wondering if there's
> a way to solve these drive issues without any kind of recovery tools.

The problem is that with a writeback cache, any write is likely
to be missed on power failures.  Journalling in general requires some
notion of being able to wait for block A to be on disk before you write
block B, and that's difficult to do when the disk lies about what is
really there ;)
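
To make the ordering problem concrete, here is a minimal Python sketch.
It is a toy model, not Btrfs or block-layer code: the `ToyDisk` class,
the `commit` helper and the block names are all hypothetical, and the
cache is assumed to write the super block out first purely to show the
bad case.

```python
class ToyDisk:
    """Toy disk with a volatile write-back cache (illustrative only)."""

    def __init__(self, honour_flush):
        self.honour_flush = honour_flush
        self.platter = {}  # contents that survive a power failure
        self.cache = {}    # contents lost on a power failure

    def write(self, block, data):
        # Every write lands in the volatile cache first.
        self.cache[block] = data

    def flush(self):
        # An honest disk empties its cache to the platter.
        # A lying disk reports success and does nothing.
        if self.honour_flush:
            self.platter.update(self.cache)
            self.cache.clear()

    def destage(self, block):
        # The cache may write out any block whenever it likes.
        if block in self.cache:
            self.platter[block] = self.cache.pop(block)

    def power_fail(self):
        # Whatever is still in the cache is gone.
        self.cache.clear()
        return dict(self.platter)


def commit(disk):
    """Write block A, wait for it, then write block B that points at A."""
    disk.write("tree_root", "new root")
    disk.flush()                       # "wait for A to be on disk"
    disk.write("super", "points at new root")
    disk.destage("super")              # assumed: cache picks the super first
    return disk.power_fail()


honest = commit(ToyDisk(honour_flush=True))
lying = commit(ToyDisk(honour_flush=False))
print("honest disk after power failure:", honest)
print("lying disk after power failure: ", lying)
```

On the honest disk both blocks survive.  On the lying disk only the
super block survives, so it points at a tree root that never reached
the platter.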

To make things especially difficult, you can't really just roll back to
an older state.  Internally the filesystem does something like this:

allocate a bunch of blocks
free a bunch of blocks

commit

reuse blocks that were freed

Basically once that commit is on disk, we're allowed to (and likely to)
start writing over blocks that were freed in the earlier transaction.
If you try to roll back to the state at the start of that transaction,
many of those blocks won't have the same data they did before.
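
The allocate/free/commit/reuse sequence above can be sketched in a few
lines of Python.  This is only an illustration under simplifying
assumptions: the block numbers, the `simulate_rollback` name and the
dictionary standing in for the disk are hypothetical and do not reflect
the real on-disk format.

```python
def simulate_rollback():
    """Show why rolling back one transaction can read overwritten blocks."""
    # State at the start of transaction N: the tree lives in blocks 1 and 2.
    disk = {1: "tree node A (old)", 2: "tree node B"}
    old_tree = [1, 2]

    # Transaction N: copy-on-write node A into a newly allocated block 3
    # and free the old copy in block 1.
    disk[3] = "tree node A (new)"
    freed = [1]
    new_tree = [3, 2]

    # Commit: from here on the freed blocks may be reused.

    # Transaction N+1: reuse a block that transaction N freed.
    reused = freed.pop()
    disk[reused] = "unrelated data from a later transaction"

    # Roll back to the start of transaction N and read the old tree.
    rolled_back = [disk[b] for b in old_tree]
    current = [disk[b] for b in new_tree]
    return rolled_back, current


rolled_back, current = simulate_rollback()
print("rolled back tree:", rolled_back)
print("current tree:    ", current)
```

The rolled back tree still references block 1, but that block now holds
data written by the later transaction, so the old state is no longer
intact.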

Now, the size of the corruption might be smaller in the rolled back
transaction than in the main transaction or it might be much worse.

-chris

