From: Valerie Aurora <vaurora@redhat.com>
To: Sage Weil <sage@newdream.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: user transactions and ENOSPC...
Date: Wed, 7 Oct 2009 17:58:04 -0400
Message-ID: <20091007215804.GE17516@shell>
In-Reply-To: <Pine.LNX.4.64.0909251405560.9827@cobra.newdream.net>

On Fri, Sep 25, 2009 at 02:10:14PM -0700, Sage Weil wrote:
> Hi everyone,
>
> So, the btrfs user transaction ioctls work like so:
>
> ioctl(fd, BTRFS_IOC_TRANS_START);
> /* do many operations: write(), setxattr(), rmdir(), whatever. */
> ioctl(fd, BTRFS_IOC_TRANS_END); /* or close(fd); */
>
> and allow an application to ensure some number of operations commit to
> disk together. Ceph's storage daemon uses this to avoid the overhead of
> maintaining a write-ahead journal for complex updates. I can see this
> being useful for lots of other services too, since it can avoid all kinds
> of (often slow) atomicity games.
>
> But there are two problems with the user transaction ioctls as
> implemented...
>
> The first is that we may get ENOSPC somewhere between START and END
> without any prior warning. The patch below is intended to fix that by
> adding a new reservation category used only by a new TRANS_RESV_START
> ioctl. It'll allow an application to specify the total amount of data
> it wants to write when the transaction starts, and get ENOSPC right
> away before it starts making changes.
>
> This isn't a perfect solution: a mix of a transaction workload and a regular
> workload will violate the reservations, and we can't really fix that
> without knowing whether any given write() or whatever belongs to a user
> transaction or not.
>
> The second problem is that the application may die between START and
> END. The current ioctls are "safe" in that the transaction handle is
> closed when the struct file is released, so the fs won't get wedged if
> you, say, segfault. On the other hand, they're "unsafe" in that a process
> that is killed or segfaults will result in an incomplete transaction
> making it to disk, which leaves the file system in an inconsistent state
> (from the point of view of the application).
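
For reference, the START/END pattern quoted above could be sketched in C
roughly as follows. This is only an illustration: the helper name
run_user_transaction() and the callback type are hypothetical, and the
ioctl numbers are assumed to match the btrfs ioctl header of that era
(check your own headers; later kernels may not provide these ioctls at
all). Note that there is no rollback: END is always issued, so a body
that fails halfway still leaves its partial changes in the transaction,
which is the second problem described above.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Assumed values, taken from the btrfs ioctl header of that period.
 * They are defined here only so the sketch is self-contained. */
#ifndef BTRFS_IOCTL_MAGIC
#define BTRFS_IOCTL_MAGIC 0x94
#endif
#ifndef BTRFS_IOC_TRANS_START
#define BTRFS_IOC_TRANS_START _IO(BTRFS_IOCTL_MAGIC, 6)
#endif
#ifndef BTRFS_IOC_TRANS_END
#define BTRFS_IOC_TRANS_END _IO(BTRFS_IOCTL_MAGIC, 7)
#endif

/* Hypothetical callback: performs the write()/setxattr()/rmdir() calls
 * that should commit together. Returns 0 on success, nonzero on error. */
typedef int (*txn_body_fn)(int fd, void *arg);

/* Hypothetical helper: run body() between TRANS_START and TRANS_END.
 * Returns 0 on success, -1 with errno set on failure. */
int run_user_transaction(int fd, txn_body_fn body, void *arg)
{
    if (ioctl(fd, BTRFS_IOC_TRANS_START) < 0)
        return -1; /* errno set by ioctl(); body is never called */

    int rc = body(fd, arg);
    int saved_errno = errno;

    /* Always end the transaction. Nothing is undone here: whatever
     * body() managed to do before failing stays in the transaction. */
    int end_rc = ioctl(fd, BTRFS_IOC_TRANS_END);

    if (rc != 0) {
        errno = saved_errno; /* report the body's error first */
        return -1;
    }
    return end_rc < 0 ? -1 : 0;
}
```

A caller would open a file or directory on the btrfs mount, pass the
descriptor and a callback that does the actual operations, and treat a
-1 return as "the transaction may be incomplete on disk". ENOSPC can
still surface from inside the callback, which is the first problem
described above.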

This is a pet peeve of mine - exporting file system transactions to
user space usually has these problems.

I would be quite interested in seeing the Featherstitch-style
patchgroups implemented on btrfs. Do you think the ordering
guarantees they give would work for Ceph's storage daemon?

http://featherstitch.cs.ucla.edu/
http://lwn.net/Articles/354861/

-VAL