From: Valerie Aurora <vaurora@redhat.com>
To: Sage Weil <sage@newdream.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: user transactions and ENOSPC...
Date: Wed, 7 Oct 2009 17:58:04 -0400
Message-ID: <20091007215804.GE17516@shell>
In-Reply-To: <Pine.LNX.4.64.0909251405560.9827@cobra.newdream.net>

On Fri, Sep 25, 2009 at 02:10:14PM -0700, Sage Weil wrote:
> Hi everyone,
>
> So, the btrfs user transaction ioctls work like so:
>
> ioctl(fd, BTRFS_IOC_TRANS_START);
> /* do many operations: write(), setxattr(), rmdir(), whatever. */
> ioctl(fd, BTRFS_IOC_TRANS_END); /* or close(fd); */
>
> and allow an application to ensure some number of operations commit to
> disk together. Ceph's storage daemon uses this to avoid the overhead of
> maintaining a write-ahead journal for complex updates. I can see this
> being useful for lots of other services too, since it can avoid all kinds
> of (often slow) atomicity games.
>
> But there are two problems with the user transaction ioctls as
> implemented...
>
> The first is that we may get ENOSPC somewhere between START and END
> without any prior warning. The patch below is intended to fix that by
> adding a new reservation category used only by a new TRANS_RESV_START
> ioctl. It'll allow an application to specify the total amount of data
> it wants to write when the transaction starts, and get ENOSPC right
> away before it starts making changes.
>
> This isn't a perfect solution: a mix of a transaction workload and a regular
> workload will violate the reservations, and we can't really fix that
> without knowing whether any given write() or whatever belongs to a user
> transaction or not.
>
> The second problem is that the application may die between START and
> END. The current ioctls are "safe" in that the transaction handle is
> closed when the struct file is released, so the fs won't get wedged if
> you, say, segfault. On the other hand, they're "unsafe" in that a process
> that is killed or segfaults will result in an incomplete transaction
> making it to disk, which leaves the file system in an inconsistent state
> (from the point of view of the application).

This is a pet peeve of mine - exporting file system transactions to
user space usually has these problems.
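To make the first problem concrete: without a kernel-side reservation,
one thing an application can do is ask the file system how much space it
reports before starting the batch. A minimal sketch in C, assuming POSIX
statvfs(); the helper name have_space_for() is made up for illustration
and is not part of btrfs or of the patch under discussion. The check is
advisory only, since another writer can consume the space right after it
returns, so it is no substitute for a real reservation.

```c
#include <stdint.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper (not a btrfs interface): returns 1 if the file
 * system holding `path` reports at least `bytes_needed` bytes available
 * to unprivileged users, 0 if it reports less, and -1 if statvfs() fails.
 * Advisory only: the answer can be stale as soon as it is returned.
 */
static int have_space_for(const char *path, uint64_t bytes_needed)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0 || sv.f_frsize == 0)
        return -1;

    /* f_bavail is counted in units of f_frsize; compare in blocks,
     * rounding the request up, so the arithmetic cannot overflow. */
    uint64_t frsize = (uint64_t)sv.f_frsize;
    uint64_t needed_blocks = bytes_needed / frsize
                           + (bytes_needed % frsize != 0);

    return (uint64_t)sv.f_bavail >= needed_blocks;
}
```

A caller would run this on the target directory and skip
BTRFS_IOC_TRANS_START if it returns 0 or -1. It does nothing for the
second problem (dying between START and END).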
I would be quite interested in seeing the Featherstitch-style
patchgroups implemented on btrfs. Do you think the ordering
guarantees they give would work for Ceph's storage daemon?

http://featherstitch.cs.ucla.edu/
http://lwn.net/Articles/354861/

-VAL