From: Jamie Lokier <jamie@shareable.org>
To: Ville Herva <vherva@vianova.fi>
Cc: Jan Hudec <bulb@ucw.cz>, John Stoffel <john@stoffel.org>,
	"Artem B. Bityuckiy" <dedekind@oktetlabs.ru>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: filesystem transactions API
Date: Wed, 27 Apr 2005 16:17:58 +0100
Message-ID: <20050427151758.GE1957@mail.shareable.org>
In-Reply-To: <20050427134331.GT5470@viasys.com>

Ville Herva wrote:
> > How do we specify which calls belong to a transaction? By some kind of
> > extra file handle?
> > 
> > I'd think having global per-process transaction is not the best way.
> > So I think we should have some kind of transaction handle (probably in
> > the file handle space) and a way to say that a syscall is done within
> > a transaction. To avoid duplicating all syscalls, we could have
> > set_active_transaction() operation.
> 
> That's more or less what NTFS does. See the example at
> http://blogs.msdn.com/because_we_can/

That's the obvious choice but it limits the usefulness quite a lot.

If we have transactions, then I'd like to be able to do this from a shell:

    transaction_open t

    tar xvpSfz blahblah.tar.gz
    cd blahblah
    patch -p1 -E < foo.patch
    # etc.

    transaction_close $t

I'd also like to write inside a single C program:

    transaction * t = transaction_open ();

    /* Ordinary complicated filesystem operations here... */
    link (a, b);
    rename (c, d);
    /* ...read, write, stat etc... */
    conf = open ("/etc/blahblah.conf", O_RDONLY);
    read (conf, ...);
    close (conf);
    /* If /etc/blahblah.conf is changed by another program during
       the transaction, the transaction is invalidated, because the
       dbm update below is dependent on what was read... */
    dbm_open (...);
    do_dbm_stuff (...);
    dbm_close (...);
    /* Whatever this command does, I'd like to include in the transaction. */
    system ("perl -pi -e 's/old_value/new_value/g' /etc/another.conf");

    transaction_close (t);

Fundamentally, if transactions are supported in the kernel then these
two usages are easy to offer:

    1. Ordinary file system calls as part of a transaction.

       This allows libraries which are not transaction-aware to be
       used, such as the dbm example above, and other things like XML
       parsers/writers.

    2. Subprocesses inherit a transaction, so a program can execute
       complex transactions by using other programs.

It's useful, and there is no good reason to disallow that.
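
Usage 2 would rest on the same mechanism that already lets a child
inherit its parent's open file descriptors.  A minimal sketch of that
mechanism, assuming a transaction were represented by a descriptor:
there is no transaction API here, `fd` is only a stand-in (a pipe end
will do), and `child_sees_fd` is a name made up for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child still has `fd` open, 0 if it does not,
   and -1 on error.  `fd` stands in for a hypothetical transaction
   descriptor; nothing transactional happens here. */
int child_sees_fd(int fd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: fcntl(F_GETFD) fails if fd is not an open descriptor. */
        _exit(fcntl(fd, F_GETFD) == -1 ? 1 : 0);
    }

    /* Parent: collect the child's verdict from its exit status. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

The descriptor also survives exec unless FD_CLOEXEC is set on it, which
is what would let something like the system() call above run inside
the caller's transaction.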

Nonetheless, there's a need for some kind of transaction handles.  A
file descriptor representing a transaction seems like a natural fit.

Complex programs will want to have multiple transactions at the same
time.  For example, any program structured using event-driven logic or
async I/O may have multiple independent state machines per thread,
each wanting its own transaction.

This suggests a few things:

  - Transactions have a file descriptor to represent them.

  - Each thread has a "current transaction" that applies to all filesystem
    operations.

  - Concurrent threads will need their own current transactions, even
    while keeping "current directory" global to the whole process for
    POSIX reasons.  A process wide "current transaction" is too coarse.

  - Transactions should be automatically nestable: a program or
    library which uses transactions should itself be callable from a
    program or library which is using a transaction.

  - Transactions should record when they cannot provide transactional
    semantics for an attempted operation (e.g. writing to a file on a
    remote filesystem), and abort in that case.

  - When a transaction aborts due to the actions of _another_ process
    (or thread) which is outside the transaction, that abort is an
    event which should be detectable synchronously (by polling the
    transaction fd) or asynchronously (by a signal - the SIGIO
    mechanism is fine for this).

  - An exclusive locking period should be optional, requested by a
    flag when opening the transaction.  Most usages will want the
    locking period with its default parameters.

  - Ideally, programs or mechanisms which provide alternative views of
    part of a filesystem, such as search results (Beagle), tarfs, or
    mailfs, should be able to update synchronously with transactions
    that affect whatever the view is watching, so that the view
    changes are effectively part of the transaction.  This does _not_
    mean that a transaction must wait for watchers to calculate
    anything.  It does mean a transaction must synchronously and
    simultaneously invalidate caches held by watchers during the
    atomic commit.
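
The synchronous abort detection could look roughly like this.  It is
only a sketch: it assumes the (hypothetical) transaction fd becomes
readable once the transaction has been aborted, which nothing above
specifies.  `txn_abort_pending` and `simulate_abort` are made-up names,
and a pipe stands in for the transaction fd.

```c
#include <poll.h>
#include <unistd.h>

/* Returns 1 if `txn_fd` reports an event within `timeout_ms`
   milliseconds, 0 on timeout, -1 on error or invalid descriptor.
   Assumption: readability of the fd means "transaction aborted". */
int txn_abort_pending(int txn_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = txn_fd, .events = POLLIN };

    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;               /* timed out, nothing happened */
    if (pfd.revents & POLLNVAL)
        return -1;              /* fd is not open */
    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) != 0;
}

/* Stand-in for the kernel marking the transaction as aborted: write
   one byte to the other end of the pipe.  Returns 0 on success. */
int simulate_abort(int write_end)
{
    return write(write_end, "x", 1) == 1 ? 0 : -1;
}
```

With a pipe p[2], txn_abort_pending (p[0], 0) returns 0 until
simulate_abort (p[1]) has been called, and 1 afterwards.  The
asynchronous variant would set O_ASYNC and an owner with fcntl() on the
same descriptor to get SIGIO instead of polling.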

-- Jamie
