public inbox for linux-btrfs@vger.kernel.org
From: Valerie Aurora <vaurora@redhat.com>
To: Sage Weil <sage@newdream.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: user transactions and ENOSPC...
Date: Wed, 7 Oct 2009 17:58:04 -0400
Message-ID: <20091007215804.GE17516@shell>
In-Reply-To: <Pine.LNX.4.64.0909251405560.9827@cobra.newdream.net>

On Fri, Sep 25, 2009 at 02:10:14PM -0700, Sage Weil wrote:
> Hi everyone,
> 
> So, the btrfs user transaction ioctls work like so:
> 
>  ioctl(fd, BTRFS_IOC_TRANS_START);
>  /* do many operations: write(), setxattr(), rmdir(), whatever. */
>  ioctl(fd, BTRFS_IOC_TRANS_END);    /* or close(fd); */
> 
> and allow an application to ensure some number of operations commit to 
> disk together.  Ceph's storage daemon uses this to avoid the overhead of 
> maintaining a write-ahead journal for complex updates.  I can see this 
> being useful for lots of other services too, since it can avoid all kinds 
> of (often slow) atomicity games.
> 
> But there are two problems with the user transaction ioctls as
> implemented.
> 
> The first is that we may get ENOSPC somewhere between START and END
> without any prior warning.  The patch below is intended to fix that by
> adding a new reservation category used only by a new TRANS_RESV_START
> ioctl.  It'll allow an application to specify the total amount of data
> it wants to write when the transaction starts, and get ENOSPC right
> away before it starts making changes.
> 
> This isn't a perfect solution: a mix of a transaction workload and a
> regular workload will violate the reservations, and we can't really
> fix that without knowing whether any given write() or whatever belongs
> to a user transaction or not.
> 
> The second problem is that the application may die between START and 
> END. The current ioctls are "safe" in that the transaction handle is 
> closed when the struct file is released, so the fs won't get wedged if 
> you, say, segfault.  On the other hand, they're "unsafe" in that a process 
> that is killed or segfaults will result in an incomplete transaction 
> making it to disk, which leaves the file system in an inconsistent state 
> (from the point of view of the application).
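
The fail-early idea behind the proposed TRANS_RESV_START (ask for all the space first, get ENOSPC before changing anything) can be illustrated for a single file with plain POSIX calls. This is only a sketch of the pattern, not the patch's interface: it swaps in posix_fallocate() as a stand-in for the reservation, and reserve_then_write() is a hypothetical helper name. It covers data blocks of one file only and says nothing about metadata or multi-operation transactions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of btrfs or the patch under discussion).
 * Reserves `len` bytes at the start of `fd` before writing anything, so
 * that a lack of space is reported before the file contents are touched.
 * Returns 0 on success or an errno value such as ENOSPC.
 * Note: posix_fallocate() rejects len == 0 with EINVAL.
 */
int reserve_then_write(int fd, const char *buf, size_t len)
{
    /* posix_fallocate() returns the error number directly; it does not
     * set errno. */
    int err = posix_fallocate(fd, 0, (off_t)len);
    if (err != 0)
        return err;

    /* The space is reserved; now perform the actual write. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, buf + done, len - done, (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted, retry */
            return errno;
        }
        done += (size_t)n;
    }
    return 0;
}
```

As with the reservation ioctl described above, the guarantee only holds if nothing else on the same file system consumes the space in the meantime.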

This is a pet peeve of mine - exporting file system transactions to
user space usually has these problems.

I would be quite interested in seeing the Featherstitch-style
patchgroups implemented on btrfs.  Do you think the ordering
guarantees they give would work for Ceph's storage daemon?

http://featherstitch.cs.ucla.edu/
http://lwn.net/Articles/354861/
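
For comparison, here is a minimal sketch of the kind of ordering an application has to force by hand today, without transactions or patchgroups: write a temporary file, fsync() it, then rename() it into place, so the new name never refers to partial contents. The helper names publish_file() and write_all() are hypothetical. The sketch handles one file only, and a fuller version would also fsync() the containing directory after the rename.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf, retrying on EINTR and short
 * writes. Returns 0 on success, -1 with errno set on failure. */
static int write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: make `buf` visible as `path` only after the data
 * has been flushed. The ordering is: write tmp file, fsync, then rename.
 * Returns 0 on success, -1 with errno set on failure.
 */
int publish_file(const char *tmp_path, const char *path,
                 const char *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Data must reach stable storage before the rename publishes it. */
    if (write_all(fd, buf, len) != 0 || fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        unlink(tmp_path);
        errno = saved;
        return -1;
    }
    if (close(fd) != 0) {
        int saved = errno;
        unlink(tmp_path);
        errno = saved;
        return -1;
    }

    /* rename() atomically replaces `path` with the complete file. */
    if (rename(tmp_path, path) != 0) {
        int saved = errno;
        unlink(tmp_path);
        errno = saved;
        return -1;
    }
    return 0;
}
```

The fsync() here is used purely to get ordering, which is the cost that an explicit write-before-write dependency interface aims to avoid.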

-VAL
