Linux Btrfs filesystem development
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Filipe David Manana <fdmanana@gmail.com>
Cc: Filipe Manana <fdmanana@suse.com>,
	fstests@vger.kernel.org,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] fstests: add generic test to verify xattr replace operations are atomic
Date: Tue, 11 Nov 2014 06:47:48 +1100	[thread overview]
Message-ID: <20141110194748.GQ23575@dastard> (raw)
In-Reply-To: <CAL3q7H4=-xRRWXUcLGfH1NW8riX4k0sRB3JPdb71n_rKKNeOmQ@mail.gmail.com>

On Mon, Nov 10, 2014 at 11:53:00AM +0000, Filipe David Manana wrote:
> On Mon, Nov 10, 2014 at 1:31 AM, Dave Chinner <david@fromorbit.com> wrote:
> > On Mon, Nov 10, 2014 at 12:49:12AM +0000, Filipe David Manana wrote:
> >> On Sun, Nov 9, 2014 at 11:45 PM, Dave Chinner <david@fromorbit.com> wrote:
> >> > On Fri, Nov 07, 2014 at 08:40:26PM +0000, Filipe Manana wrote:
> >> >> This test verifies that replacing a xattr's value is an atomic
> >> >> operation. This is motivated by an issue in btrfs where replacing
> >> >> a xattr's value wasn't an atomic operation, it consisted of
> >> >> removing the old value and then inserting the new value in a
> >> >> btree. This made readers (getxattr and listxattrs) not getting
> >> >> neither the old nor the new value during a short time window.
> >> >
> >> > OK, seems like a good thing to test that the application can only
> >> > see the old or the new value.
> >> >
> >> > However, I can't help but wonder about whether the btrfs behaviour
> >> > is crash safe as it wasn't designed to be atomic from the ground up.
> >> > i.e. if the system crashes half way through a attribute overwrite,
> >> > what does btrfs end up with as a result? XFS is guaranteed at a
> >> > transactional level to return either the old or the new value,
> >> > depending on where in the operaiton the crash occurred, but I'd just
> >> > assumed that everyone did attribute replace atomically so it never
> >> > occurred to me that it might be an issue...
> >>
> >> It's crash safe. Both the delete and insert were done in the same
> >> transaction, so a crash in between both operations (or after both and
> >> before the transaction commit) would result in always seeing the old
> >> value (or better saying, the last persisted value by a transaction
> >> commit or fsync).
> >
> > Alright, so no crash issues because all the modifications are in a
> > single transaction. However, if both modifications are made in the
> > same transaction, then this bug implies that a user can read a
> > metadata object in the btree whilst somethign else is concurrently
> > modifying it, right? i.e. that there is no serialisation between
> > inode metadata reads and transactional modification operations?
> 
> Nop, not exactly the possibility to read while being modified. The
> replace steps were:
> 
> lock btree node(s), delete current value from leaf, unlock btree
> nodes, lock btree nodes, write new value to leaf, unlock btree nodes

You missed "transaction start" and "transaction commit", which
define the boundaries of a modification in progress.

> Readers (a getxattr or listxattrs syscall for e.g.), could just after
> the first btree node unlocking done by the writer: lock btree nodes,
> get current value (none found), unlock btree nodes, and then return
> -ENODATA.

Right, and that's a exact demonstration "read while modifying" when
"modifying" means an object has been modified in an uncommitted
transaction. This indicates the transaction failed both the
Atomicity and Isolation properties of an ACID transaction model.

I hope this isn't a common pattern in btrfs -  it worries me just to
know that the btrfs transaction subsytem does not, at minimum, throw
warnings when ACID attributes are violated...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2014-11-10 19:47 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-07 20:40 [PATCH] fstests: add generic test to verify xattr replace operations are atomic Filipe Manana
2014-11-09 23:45 ` Dave Chinner
2014-11-10  0:49   ` Filipe David Manana
2014-11-10  1:31     ` Dave Chinner
2014-11-10 11:53       ` Filipe David Manana
2014-11-10 19:47         ` Dave Chinner [this message]
2014-11-10  0:48 ` [PATCH v2] " Filipe Manana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141110194748.GQ23575@dastard \
    --to=david@fromorbit.com \
    --cc=fdmanana@gmail.com \
    --cc=fdmanana@suse.com \
    --cc=fstests@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox