From: Bron Gondwana <brong@fastmail.fm>
To: Kyle Moffett <mrmacman_g4@mac.com>
Cc: Bryan Henderson <hbryan@us.ibm.com>,
Jack Stone <jack@hawkeye.stone.uk.eu.org>,
Andrew Morton <akpm@linux-foundation.org>,
alan <alan@clueserver.org>, "H. Peter Anvin" <hpa@zytor.com>,
linux-fsdevel@vger.kernel.org,
LKML Kernel <linux-kernel@vger.kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>,
git@vger.kernel.org
Subject: Re: Versioning file system
Date: Tue, 19 Jun 2007 17:58:57 +1000
Message-ID: <20070619075857.GA2944@brong.net>
In-Reply-To: <6E9A6F9E-8948-40F2-9129-1F1491D49D83@mac.com>

On Mon, Jun 18, 2007 at 11:10:42PM -0400, Kyle Moffett wrote:
> On Jun 18, 2007, at 13:56:05, Bryan Henderson wrote:
>>> The question remains is where to implement versioning: directly in
>>> individual filesystems or in the vfs code so all filesystems can use it?
>>
>> Or not in the kernel at all. I've been doing versioning of the types I
>> described for years with user space code and I don't remember feeling that
>> I compromised in order not to involve the kernel.
>
> What I think would be particularly interesting in this domain is something
> similar in concept to GIT, except in a file-system:

I've written a couple of user-space things very much like this. One was
a purely database-backed system (blobs in the database, yeah, I know) for
managing medical data, where signatures and auditability were the most
important parts of the system. Performance really wasn't a
consideration.

The other one is at my current job, FastMail: we have a virtual
filesystem which uses files stored by sha1 on ordinary filesystems for
data storage and a database for metadata (filename-to-sha1 mappings,
mtime, mimetype, directory structure, etc.).
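
To make that layout concrete, here is a minimal Python sketch of a sha1-addressed blob store with a SQL metadata table. It is an illustration under stated assumptions, not FastMail's code: the `ShaStore` class, the two-hex-digit directory fan-out and the SQLite schema are all hypothetical choices.

```python
import hashlib
import os
import sqlite3
import tempfile
import time


class ShaStore:
    """Hypothetical sketch: file contents live on an ordinary filesystem,
    named by their sha1; names and other metadata live in a database."""

    def __init__(self, root: str, db_path: str = ":memory:") -> None:
        self.root = root
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            " path TEXT PRIMARY KEY,"
            " sha1 TEXT NOT NULL,"
            " mtime REAL NOT NULL,"
            " mimetype TEXT)"
        )

    def _blob_path(self, digest: str) -> str:
        # Assumed layout: fan out on the first two hex digits so that no
        # single directory holds every blob.
        return os.path.join(self.root, digest[:2], digest[2:])

    def put(self, path: str, data: bytes,
            mimetype: str = "application/octet-stream") -> str:
        digest = hashlib.sha1(data).hexdigest()
        blob = self._blob_path(digest)
        if not os.path.exists(blob):  # identical content is stored only once
            os.makedirs(os.path.dirname(blob), exist_ok=True)
            tmp = blob + ".tmp"
            with open(tmp, "wb") as f:
                f.write(data)
            os.replace(tmp, blob)  # rename into place so readers never see a partial blob
        with self.db:  # commits the metadata update as one transaction
            self.db.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (path, digest, time.time(), mimetype),
            )
        return digest

    def get(self, path: str) -> bytes:
        row = self.db.execute(
            "SELECT sha1 FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        with open(self._blob_path(row[0]), "rb") as f:
            return f.read()


if __name__ == "__main__":
    store = ShaStore(tempfile.mkdtemp())
    a = store.put("/inbox/a.txt", b"hello", "text/plain")
    b = store.put("/archive/a.txt", b"hello", "text/plain")
    print(a == b, store.get("/archive/a.txt"))  # prints: True b'hello'
```

Because the blob name is derived from the content, two filenames holding identical data share one blob, and the metadata table is the only place names live.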

Multi-machine distribution is handled by a daemon on each machine,
which can be asked to make sure a file gets sent out to every machine
that matches the prefix; it only returns success once the file has been
written to at least one other machine. Database replication is a
different beast.
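
The replication rule can be sketched roughly as below. The post doesn't say what "the prefix" refers to; this sketch assumes each machine is responsible for a set of sha1 hex prefixes. `Peer`, its `send` callback and the `backlog` retry queue are hypothetical names standing in for whatever transport the real daemon uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Peer:
    name: str
    # Assumption: each machine stores the blobs whose sha1 hex digest
    # starts with one of these prefixes.
    prefixes: List[str]
    # Hypothetical transport: returns True once the peer confirms the write.
    send: Callable[[str, bytes], bool]

    def wants(self, digest: str) -> bool:
        return any(digest.startswith(p) for p in self.prefixes)


def replicate(digest: str, data: bytes, peers: List[Peer],
              backlog: List[Tuple[str, str]]) -> bool:
    """Return True only after at least one other machine has the blob.

    Every matching peer must eventually get a copy; the ones not written
    synchronously are queued in `backlog` for a background retry loop."""
    targets = [p for p in peers if p.wants(digest)]
    for i, peer in enumerate(targets):
        try:
            ok = peer.send(digest, data)
        except OSError:
            ok = False
        if ok:
            # One confirmed copy is enough to report success; the rest
            # of the matching peers are delivered asynchronously.
            backlog.extend((p.name, digest) for p in targets[i + 1:])
            return True
        backlog.append((peer.name, digest))  # failed: retry later
    return False
```

The caller only sees success once a second machine holds the data, but it does not have to wait for every replica.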

It can work, but there's one big pain at the file level: no mmap.
If you don't want to support mmap it can work reasonably happily, though
you may want to keep your sha1 (or other digest) state as well as the
final digest, so you can cheaply calculate the digest after a small
append without walking the entire file. You may also want to keep state
checkpoints every so often along a big file so that truncates don't cost
too much to recalculate.
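
A minimal in-memory sketch of keeping the running digest state plus periodic checkpoints. `DigestedFile` and the 1 MiB checkpoint spacing are hypothetical. It relies on Python `hashlib` hash objects supporting `copy()`; `hashlib` offers no way to serialize that internal state, so persisting it on disk would need a digest implementation that exposes its state, which this sketch does not attempt.

```python
import hashlib


class DigestedFile:
    """Hypothetical sketch: an in-memory file that keeps its running sha1
    state, plus snapshots of that state every CHECKPOINT bytes."""

    CHECKPOINT = 1 << 20  # assumed spacing: one state snapshot per MiB

    def __init__(self) -> None:
        self.data = bytearray()
        self.state = hashlib.sha1()             # running state, not just a digest
        self.checkpoints = {0: hashlib.sha1()}  # offset -> state after that many bytes

    def append(self, chunk: bytes) -> None:
        # Cost is proportional to len(chunk), never to the existing file size.
        pos, step = len(self.data), self.CHECKPOINT
        while chunk:
            room = step - pos % step  # bytes until the next checkpoint boundary
            piece, chunk = chunk[:room], chunk[room:]
            self.state.update(piece)
            self.data += piece
            pos += len(piece)
            if pos % step == 0:
                self.checkpoints[pos] = self.state.copy()

    def truncate(self, length: int) -> None:
        if not 0 <= length <= len(self.data):
            raise ValueError("truncate can only shrink the file")
        # Restart from the last checkpoint at or before the new end, so at
        # most CHECKPOINT - 1 bytes are rehashed.
        base = length - length % self.CHECKPOINT
        self.checkpoints = {o: h for o, h in self.checkpoints.items() if o <= base}
        self.state = self.checkpoints[base].copy()
        self.state.update(self.data[base:length])
        del self.data[length:]

    def hexdigest(self) -> str:
        return self.state.hexdigest()
```

Smaller checkpoint spacing makes truncates cheaper at the cost of storing more saved states per file.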

Luckily, in a userspace VFS that's only accessed via FTP and DAV, we can
support a limited set of operations (basically create, append, read and
delete). You don't get that luxury for a general-purpose filesystem, and
that's the problem. There will always be particular usage patterns
(especially anything that mmaps or seeks and touches all over the place,
like a loopback-mounted filesystem or a database file) that just don't
work with file-level sha1s.
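
A rough cost model shows why those access patterns hurt. Assuming a saved running state at end of file and digest-state snapshots every `checkpoint` bytes (a hypothetical parameter), an append only feeds the new bytes through sha1, while a write inside the file must rehash from the snapshot before the write offset all the way to the end. `bytes_rehashed` is an illustrative helper, not something from the post.

```python
def bytes_rehashed(file_size: int, offset: int, length: int,
                   checkpoint: int) -> int:
    """Illustrative cost model (hypothetical helper): bytes that must be fed
    through sha1 to refresh a whole-file digest after writing `length` bytes
    at `offset`, assuming the running state is saved at end of file and a
    state snapshot exists every `checkpoint` bytes."""
    if not 0 <= offset <= file_size:
        raise ValueError("offset outside file")
    if offset == file_size:
        # Append: continue from the saved running state; only new bytes.
        return length
    # In-place write: every byte after the change alters the running state,
    # so rehash from the snapshot before `offset` to the (possibly grown) end.
    new_size = max(file_size, offset + length)
    base = offset - offset % checkpoint
    return new_size - base


MiB, GiB = 1 << 20, 1 << 30
print(bytes_rehashed(GiB, GiB, 4096, MiB))  # append at end: 4096
print(bytes_rehashed(GiB, 0, 4096, MiB))    # write at start: 1073741824
```

For a 1 GiB file with 1 MiB checkpoints, a 4 KiB append costs 4 KiB of hashing, but a 4 KiB write at offset 0 costs the full 1 GiB; a loopback filesystem or database file issuing such writes constantly would rehash most of the file on every change.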

It does have some lovely properties, though. I'd enjoy working in an
environment that didn't look much like POSIX but had the strong
guarantees and auditability that addressing by sha1 buys you.

Bron.