* Re: Versioning file system
From: Kyle Moffett @ 2007-06-19 3:10 UTC
To: Bryan Henderson
Cc: Jack Stone, Andrew Morton, alan, H. Peter Anvin, linux-fsdevel,
LKML Kernel, Al Viro, git
On Jun 18, 2007, at 13:56:05, Bryan Henderson wrote:
>> The question that remains is where to implement versioning:
>> directly in individual filesystems or in the VFS code so all
>> filesystems can use it?
>
> Or not in the kernel at all. I've been doing versioning of the
> types I described for years with user space code and I don't
> remember feeling that I compromised in order not to involve the
> kernel.
>
> Of course, if you want to do it with snapshots and COW, you'll have
> to ask where in the kernel to put that, but that's not a file
> versioning question; it's the larger snapshot question.
What I think would be particularly interesting in this domain is
something similar in concept to GIT, except in a file-system:
1) Redundancy is easy; you just ensure that you have at least "N"
distributed copies of each object, where "N" is some function of the
object itself.
2) Network replication is easy; you look up objects based on the
SHA-1 stored in the parent directory entry and cache them where
needed (i.e., make the "N" function above dynamic based on frequency
of access on a given computer).
3) Snapshots are easy and cheap; an RO snapshot is a tag and an RW
snapshot is a branch. Each can easily be converted to the other.
4) Compression is easy; you can compress objects based on any
arbitrary configurable criteria and the filesystem will record
whether or not an object is compressed. You can also compress
differently when archiving objects to secondary storage.
5) Streaming fsck-like verification is easy; ensure the hash name
field matches the actual hash of the object.
6) Fsck is easy since rollback is trivial; you can always revert
to an older tree to boot and start up services before attempting
resurrection of lost objects and trees in the background.
7) Multiple-drive or multiple-host storage pools are easy: think
the git "alternates" file.
8) Network filesystem load-balancing is easy; SHA-1s are
essentially random, so you can just assign SHA-1 prefixes to
different systems for data storage and your data is automatically
split up (see the sketch after this list).
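To make (8) concrete, here's a rough user-space sketch in Python (the
node pool and the four-way split are invented purely for illustration,
not part of any real design):

    import hashlib

    # Hypothetical four-node pool: the first hex digit of an object's
    # SHA-1 picks its home node.  SHA-1 output is effectively uniform,
    # so each node receives roughly a quarter of the objects with no
    # explicit balancing step.
    NODES = ["node-a", "node-b", "node-c", "node-d"]

    def object_id(data):
        """Content address: the SHA-1 of the object's bytes."""
        return hashlib.sha1(data).hexdigest()

    def node_for(sha1_hex):
        """Route by SHA-1 prefix: hex 0-3 -> node-a, 4-7 -> node-b, ..."""
        return NODES[int(sha1_hex[0], 16) // 4]

    oid = object_id(b"some file contents")
    print(oid, "->", node_for(oid))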
Other issues:
Q. How do you deal with block allocation?
A. Same way other filesystems deal with block allocation. Reference-
counting gets tricky, especially across a network, but it's easy to
play it safe with simple cross-network refcount-journalling. Since
the _only_ things that need journalling are the refcounts and the
free-block data, you need at most a megabyte or two of journal. If in
doubt, keep an extra refcount around for an in-the-background
consistency check later on. When networked-
gitfs systems crash, you just assume they still have all the
refcounts they had at the moment they died, and compare notes when
they start back up again. If a node has a cached copy of data on its
local disk then it can just nonatomically increment the refcount for
that object in its own RAM (ordered with respect to disk-flushes, of
course) and tell its peers at some point. A node should probably
cache most of its working set on local disk for efficiency; it's
trivially verified against updates from other nodes and provides an
easy way to keep refcounts for such data. If a node increments the
refcount on such data and dies before getting that info out to its
peers, then when it starts up again its peers will just be told about
a "new" object with insufficient replication, and they will clone it
out again properly. For networked refcount-increments you can do
one of two things: (1) tell at least X peers and wait for them to
sync the update out to disk, or (2) get the object from any peer (at
least one of whom hopefully has it in RAM) and save it to local disk
with an increased refcount.
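To pin down the "journal first, reconcile later" idea, here is a
minimal user-space sketch in Python (the class name, the record
format, and the JSON journal file are all invented for illustration;
a real gitfs would do this below the block layer):

    import json, os

    class RefcountJournal:
        """Tiny write-ahead journal for object refcounts.

        Following the play-it-safe rule above: increments are journalled
        (and flushed) *before* they are applied, so replay after a crash
        can only leave counts too high -- safe, because no live object
        is ever freed -- and the background consistency check reconciles
        the excess later.  A real implementation would batch the fsyncs.
        """

        def __init__(self, path):
            self.path = path
            self.counts = {}

        def incref(self, sha1_hex):
            with open(self.path, "a") as j:          # journal first...
                j.write(json.dumps({"op": "inc", "obj": sha1_hex}) + "\n")
                j.flush()
                os.fsync(j.fileno())
            self.counts[sha1_hex] = self.counts.get(sha1_hex, 0) + 1

        def replay(self):
            """After a crash: rebuild counts, erring on the high side."""
            self.counts.clear()
            try:
                with open(self.path) as j:
                    for line in j:
                        rec = json.loads(line)
                        if rec["op"] == "inc":
                            self.counts[rec["obj"]] = \
                                self.counts.get(rec["obj"], 0) + 1
            except FileNotFoundError:
                pass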
Q. How do you actually delete things?
A. Just replace all the to-be-erased tree and commit objects before a
specified point with "History erased" objects with their SHA-1s
magically set to those of the erased objects. If you want, you may
delete only the "tree" objects and leave the commits intact. If you
delete a whole linear segment of history then you can just use a
single "History erased" commit object with its parent pointed to the
object before the erased segment. Probably needs some form of back-
reference storage to make it efficient; not sure how expensive that
would be. This would allow making a bunch of snapshots and purging
them logarithmically based on the passage of time. For instance, you
might keep a snapshot every 5 minutes for the last hour, every 30
minutes for the last day, every 4 hours for the last week, every day
for the last month, once per week for the last year, once per month
for the last 5 years, and once per year beyond that.
That's pretty impressive data-recovery resolution, and it amounts to
only about 200 unique commits after it's been running for 10 years.
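In code, that purge schedule might look something like this rough
Python sketch (the bucket boundaries are the ones from the paragraph
above; the bucketing policy itself is just one plausible way to do
it, nothing prescriptive):

    from datetime import timedelta

    # (age limit, keep interval) pairs from the schedule above.
    SCHEDULE = [
        (timedelta(hours=1),      timedelta(minutes=5)),
        (timedelta(days=1),       timedelta(minutes=30)),
        (timedelta(weeks=1),      timedelta(hours=4)),
        (timedelta(days=30),      timedelta(days=1)),
        (timedelta(days=365),     timedelta(weeks=1)),
        (timedelta(days=5 * 365), timedelta(days=30)),
    ]

    def keep_interval(age):
        """Map a snapshot's age to the schedule's keep-interval."""
        for limit, interval in SCHEDULE:
            if age < limit:
                return interval
        return timedelta(days=365)          # yearly beyond five years

    def prune(snapshot_times, now):
        """Keep one snapshot per (interval, bucket) slot; drop the rest."""
        kept, seen = [], set()
        for ts in sorted(snapshot_times, reverse=True):   # newest first
            age = now - ts
            interval = keep_interval(age)
            slot = (interval, int(age / interval))        # which bucket
            if slot not in seen:
                seen.add(slot)
                kept.append(ts)
        return kept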
Q. How do you archive data?
A. Same as deleting, except instead of a "History erased" object you
would use a "History archived" object with a little bit of string
data to indicate which volume it's stored on (and where on the
volume). When you stick that volume into the system you could easily
tell the kernel to use it as an alternate for the given storage group.
Q. What enforces data integrity?
A. Ensure that a new tree object and its associated sub-objects are
on disk before you delete the old one. Doesn't need any actual full
syncs at all, just barriers. If you replace the tree object before
write-out is complete then just skip writing the old one and write
the new one in its place.
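That ordering rule is the familiar write-new-then-flip pattern. A
user-space approximation in Python (fsync() standing in for a
barrier; the object directory and ROOT pointer layout are invented
for the sketch):

    import os

    def publish_tree(objdir, new_tree_sha1, new_objects):
        """Write all new objects, barrier, then atomically flip the root.

        Nothing here syncs the whole filesystem: only the new objects
        and the root pointer are ordered, which is the point above
        ("no full syncs, just barriers").
        """
        for sha1_hex, data in new_objects.items():
            path = os.path.join(objdir, sha1_hex)
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())    # objects durable before the flip

        tmp = os.path.join(objdir, "ROOT.tmp")
        with open(tmp, "w") as f:
            f.write(new_tree_sha1 + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp, os.path.join(objdir, "ROOT"))  # atomic pointer swap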
Q. What constitutes a "commit"?
A. Anything the administrator wants to define it as. Useful
algorithms include: "Once per x Mbyte of page dirtying", "Once per 5
min", "Only when sync() or fsync() are called", "Only when gitfs-
commit is called". You could even combine them: "Every x Mbyte of
page dirtying or every 5 minutes, whichever is shorter (or longer,
depending on admin requirements)". There would also be syscalls to
trigger git-like behavior on demand. Network-
accessible gitfs would want to have mechanisms to trigger commits
based on activity on other systems (needs more thought).
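Those policies compose naturally. A hypothetical policy object
(names and default numbers invented) for "x MB of page dirtying or 5
minutes, whichever comes first":

    import time

    class CommitPolicy:
        """Fire a commit on N bytes dirtied or T seconds, whichever first."""

        def __init__(self, max_dirty_bytes=16 * 1024 * 1024, max_age_s=300):
            self.max_dirty_bytes = max_dirty_bytes
            self.max_age_s = max_age_s
            self.dirty = 0
            self.last_commit = time.monotonic()

        def note_dirty(self, nbytes):
            self.dirty += nbytes

        def should_commit(self):
            return (self.dirty >= self.max_dirty_bytes or
                    time.monotonic() - self.last_commit >= self.max_age_s)

        def committed(self):
            """Reset both triggers after a commit actually happens."""
            self.dirty = 0
            self.last_commit = time.monotonic()

Swapping "or" for "and" in should_commit() gives the "whichever is
longer" variant mentioned above.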
Q. How do you access old versions?
A. Mount another instance of the filesystem with an SHA-1 ID, a tag-
name, or a branch-name in a special mount option. Should be user
accessible with some restrictions (needs more thought).
Q. How do you deal with conflicts on networked filesystems?
A. Once again, however the administrator wants to deal with them.
Options (a sketch follows the list):
1) Forcibly create a new branch for the conflicted tree.
2) Attempt to merge changes using the standard git-merge semantics
3) Merge independent changes to different files and pick one for
changes to the same file
4) Your Algorithm Here(TM). GIT makes it easy to extend
conflict-resolution.
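In code, the pluggable part is just a three-way merge hook, much like
git's own merge drivers. A toy sketch (all names invented; the
trivial cases are the standard three-way merge rules):

    def branch_on_conflict(base, ours, theirs):
        """Option 1: give up and keep both sides; caller makes a branch."""
        return None

    def take_ours(base, ours, theirs):
        """Option 3 fallback: same-file conflict resolved by picking ours."""
        return ours

    RESOLVERS = {"branch": branch_on_conflict, "pick-ours": take_ours}

    def resolve(policy, base, ours, theirs):
        if ours == theirs:        # both sides made the same change
            return ours
        if ours == base:          # only the other side changed
            return theirs
        if theirs == base:        # only we changed
            return ours
        return RESOLVERS[policy](base, ours, theirs)   # real conflict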
Q. How do you deal with little scattered changes in big (or sparse)
files?
A. Two questions, two answers: For sparse files, git would need
extending to understand (and hash) the nature of the sparseness.
For big files, you should be able to introduce a "compound-file"
datatype and configure git to deal with specific X-Mbyte chunks of it
independently. This might not be a bad idea for native git as well.
Would need system-specific configuration.
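A sketch of the "compound-file" idea: hash a big file in fixed-size
chunks so a small scattered write only dirties the chunks it touches
(the 4MB figure and both helper names are illustrative only):

    import hashlib

    CHUNK = 4 * 1024 * 1024        # illustrative X-Mbyte chunk size

    def chunk_hashes(path):
        """Per-chunk SHA-1 list for a large file."""
        hashes = []
        with open(path, "rb") as f:
            while True:
                block = f.read(CHUNK)
                if not block:
                    break
                hashes.append(hashlib.sha1(block).hexdigest())
        return hashes

    def dirty_chunks(old, new):
        """Which chunk indices changed between two hash lists."""
        changed = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
        # A grown or truncated file dirties the tail chunks too.
        changed += list(range(min(len(old), len(new)),
                              max(len(old), len(new))))
        return changed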
Q. How do you prevent massive data consumption by spurious tiny changes?
A. You have a few options:
1) Configure your commit algorithm as above to not commit so often
2) Configure a stepped commit-discard algorithm as described
above in the "How do you delete things" question
3) Archive unused data to secondary storage more often
Q. What about all the unanswered questions?
A. These are all the ones I could think of off the top of my head but
there are at least a hundred more. I'm pretty sure these are some of
the most significant ones.
Q. That's a great idea and I'll implement it right away!
A. Yay! (but that's not a question :-D) Good luck and happy hacking.
Q. That's a stupid idea and would never ever work!
A. Thanks for your useful input! (but that's not a question either)
I'm sure anybody who takes up a project like this will consider such
opinions.
Q. *flamage*
A. I'm glad you have such strong opinions; feel free to continue
to spam my /dev/null device (and that's also not a question).
All opinions and comments welcomed.
Cheers,
Kyle Moffett
* Re: Versioning file system
From: Jack Stone @ 2007-06-19 7:49 UTC
To: Kyle Moffett
Cc: Bryan Henderson, akpm, alan, hpa, linux-fsdevel, linux-kernel,
viro, git
Kyle Moffett wrote:
> What I think would be particularly interesting in this domain is
> something similar in concept to GIT, except in a file-system:
> [...snip...]
It sounds brilliant and I'd love to have a go at implementing it, but
I don't know enough (yet :-D) about how git works; a little research
is called for, I think.
Jack
* Re: Versioning file system
From: Bron Gondwana @ 2007-06-19 7:58 UTC
To: Kyle Moffett
Cc: Bryan Henderson, Jack Stone, Andrew Morton, alan, H. Peter Anvin,
linux-fsdevel, LKML Kernel, Al Viro, git
On Mon, Jun 18, 2007 at 11:10:42PM -0400, Kyle Moffett wrote:
> On Jun 18, 2007, at 13:56:05, Bryan Henderson wrote:
>>> The question that remains is where to implement versioning: directly in
>>> individual filesystems or in the VFS code so all filesystems can use it?
>>
>> Or not in the kernel at all. I've been doing versioning of the types I
>> described for years with user space code and I don't remember feeling that
>> I compromised in order not to involve the kernel.
>
> What I think would be particularly interesting in this domain is something
> similar in concept to GIT, except in a file-system:
I've written a couple of user-space things very much like this - one
being a purely database-backed (blobs in a database, yeah I know)
system for managing medical data, where signatures and auditability
were the most important parts of the system. Performance really
wasn't a consideration.
The other one is my current job, FastMail - we have a virtual
filesystem which uses files stored by sha1 on ordinary filesystems
for data storage and a database for metadata (filename-to-sha1
mappings, mtime, mimetype, directory structure, etc.).
Multiple machine distribution is handled by a daemon on each machine
which can be asked to make sure the file gets sent out to every machine
that matches the prefix and will only return success once it's written
to at least one other machine. Database replication is a different
beast.
It can work, but there's one big pain at the file level: no mmap.
If you don't want to support mmap it can work reasonably happily, though
you may want to keep your sha1 (or other digest) state as well as the
final digest so you can cheaply calculate the digest for a small append
without walking the entire file. You may also want to keep state
checkpoints every so often along a big file so that truncates don't cost
too much to recalculate.
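In Python terms the digest-state trick looks roughly like this
(hashlib can copy() a hash object in memory but cannot persist its
state to disk, so an on-disk version would need a digest
implementation with serialisable state; the class and the checkpoint
spacing are invented for the sketch):

    import hashlib

    class AppendOnlyDigest:
        """Cheap digest maintenance for append-mostly files.

        Keeps the running SHA-1 object plus periodic in-memory copies
        ("checkpoints"), so an append only hashes the new bytes, and a
        truncate can re-hash from the nearest checkpoint at or before
        the cut instead of from byte zero.
        """

        CHECKPOINT_EVERY = 8 * 1024 * 1024   # bytes between checkpoints

        def __init__(self):
            self.h = hashlib.sha1()
            self.size = 0
            self.checkpoints = []            # [(offset, copied sha1 object)]

        def append(self, data):
            self.h.update(data)
            self.size += len(data)
            if (not self.checkpoints or
                    self.size - self.checkpoints[-1][0] >=
                    self.CHECKPOINT_EVERY):
                self.checkpoints.append((self.size, self.h.copy()))

        def digest_hex(self):
            return self.h.hexdigest()        # no full-file walk needed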
Luckily, in a userspace VFS that's only accessed via FTP and DAV we
can support a limited set of operations (basically create, append,
read, delete). You don't get that luxury for a general-purpose
filesystem, and that's the problem. There will always be particular
usage patterns (especially something that mmaps or seeks and touches
all over the place, like a loopback-mounted filesystem or a database
file) that just don't work for file-level sha1s.
It does have some lovely properties though. I'd enjoy working in an
environment that didn't look much like POSIX but had the strong
guarantees and auditability that addressing by sha1 buys you.
Bron.
* Re: Versioning file system
From: Martin Langhoff @ 2007-06-19 9:09 UTC
To: Kyle Moffett
Cc: Bryan Henderson, Jack Stone, Andrew Morton, alan, H. Peter Anvin,
linux-fsdevel, LKML Kernel, Al Viro, git
On 6/19/07, Kyle Moffett <mrmacman_g4@mac.com> wrote:
> What I think would be particularly interesting in this domain is
> something similar in concept to GIT, except in a file-system:
Perhaps stating the blindingly obvious, but there was an early
implementation of a FUSE-based gitfs --
http://www.sfgoth.com/~mitch/linux/gitfs/
cheers,
martin
* Re: Versioning file system
From: Jakub Narebski @ 2007-06-19 16:52 UTC
To: linux-kernel; +Cc: git, linux-fsdevel
Kyle Moffett wrote:
> On Jun 18, 2007, at 13:56:05, Bryan Henderson wrote:
>>> The question that remains is where to implement versioning:
>>> directly in individual filesystems or in the VFS code so all
>>> filesystems can use it?
>>
>> Or not in the kernel at all. I've been doing versioning of the
>> types I described for years with user space code and I don't
>> remember feeling that I compromised in order not to involve the
>> kernel.
>>
>> Of course, if you want to do it with snapshots and COW, you'll have
>> to ask where in the kernel to put that, but that's not a file
>> versioning question; it's the larger snapshot question.
>
> What I think would be particularly interesting in this domain is
> something similar in concept to GIT, except in a file-system
[cut]
How does this relate to the ext3cow versioning (snapshotting)
filesystem, for example? ext3cow assumes linear history, which
simplifies things a bit.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: Versioning file system
From: Kyle Moffett @ 2007-06-20 2:43 UTC
To: Bron Gondwana
Cc: Bryan Henderson, Jack Stone, Andrew Morton, alan, H. Peter Anvin,
linux-fsdevel, LKML Kernel, Al Viro, git
On Jun 19, 2007, at 03:58:57, Bron Gondwana wrote:
> On Mon, Jun 18, 2007 at 11:10:42PM -0400, Kyle Moffett wrote:
>> On Jun 18, 2007, at 13:56:05, Bryan Henderson wrote:
>>>> The question that remains is where to implement versioning:
>>>> directly in individual filesystems or in the VFS code so all
>>>> filesystems can use it?
>>>
>>> Or not in the kernel at all. I've been doing versioning of the
>>> types I described for years with user space code and I don't
>>> remember feeling that I compromised in order not to involve the
>>> kernel.
>>
>> What I think would be particularly interesting in this domain is
>> something similar in concept to GIT, except in a file-system:
>
> [...snip...]
>
> It can work, but there's one big pain at the file level: no mmap.
IMHO it's actually not that bad. The "gitfs" would divide larger
files up into manageable chunks (say 4MB) which could be quickly
SHA-1ed. When a file is mmapped and partially modified, the SHA-1
would be marked as locally invalid, but since mmap() loses most
consistency guarantees that's OK. A time- or writeout-based "commit"
scheme might still freeze, SHA-1, and write out the pages at regular
intervals without the program's knowledge, but since you only have to
SHA-1 the relatively small 4MB chunk (which is about to hit disk
anyway), it's not a significant time penalty. Even under memory
pressure, when swapping data out to disk, you don't have to update
the SHA-1 and create a new commit as long as you keep a reference to
the object stored in the volume header somewhere and maintain the
"SHA-1 out-of-date" bit.
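As a sketch of that bookkeeping (a toy in-memory Python model, every
name invented; in reality this would live in the page-writeout path):

    import hashlib

    CHUNK = 4 * 1024 * 1024

    class MmapChunkTable:
        """Track which 4MB chunks of an mmapped file have a stale SHA-1."""

        def __init__(self, nchunks):
            self.sha1 = [None] * nchunks
            self.stale = [True] * nchunks    # the "SHA-1 out-of-date" bits

        def mark_dirty(self, offset, length):
            if length <= 0:
                return
            first = offset // CHUNK
            last = (offset + length - 1) // CHUNK
            for i in range(first, last + 1):
                self.stale[i] = True

        def commit(self, read_chunk):
            """Freeze point: re-hash only the stale chunks.  Cheap,
            because that data was about to hit disk anyway."""
            for i, stale in enumerate(self.stale):
                if stale:
                    self.sha1[i] = hashlib.sha1(read_chunk(i)).hexdigest()
                    self.stale[i] = False
            return self.sha1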
A program which carefully uses msync() would be fine, of course (with
proper configuration), as that would create a new commit as
appropriate.
Since mmap() is poorly defined on network filesystems in the absence
of msync(), I don't see that such behavior would be a problem. And
it certainly would be fine on local filesystems as there you can just
stuff the "SHA-1 out-of-date" bit and a reference to the parent
commit and path in the object itself. Then you just need to keep a
useful reference to that object in a table somewhere in the volume
and you're set.
> If you don't want to support mmap it can work reasonably happily,
> though you may want to keep your sha1 (or other digest) state as
> well as the final digest so you can cheaply calculate the digest
> for a small append without walking the entire file. You may also
> want to keep state checkpoints every so often along a big file so
> that truncates don't cost too much to recalculate.
That may be worth it even if the file is divided into 4MB chunks (or
another configurable size), but it would need benchmarking.
> Luckily, in a userspace VFS that's only accessed via FTP and DAV we
> can support a limited set of operations (basically create, append,
> read, delete). You don't get that luxury for a general-purpose
> filesystem, and that's the problem. There will always be
> particular usage patterns (especially something that mmaps or seeks
> and touches all over the place, like a loopback-mounted filesystem
> or a database file) that just don't work for file-level sha1s.
I'd think that loopback-mounted filesystems wouldn't be that
difficult (a sketch follows this list):
1) Set the SHA-1 block size appropriately to divide the big file
into a bunch of little manageable files. Could conceivably be multi-
layered like directories, depending on the size of the file.
2) Mark the file as exempt from normal commits (i.e., without
special syscalls or fsync()/msync() on the file itself, it is never
updated in the tree objects).
3) Set up the loopback device to call the gitfs commit code when
it receives barriers or flushes from the parent filesystem.
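Here's point 3 as a toy shim (Python, everything invented; a real
implementation would hook the loop driver's flush path):

    class LoopbackShim:
        """Pass writes through; turn barrier/flush requests into commits."""

        def __init__(self, backing_file, gitfs_commit):
            self.f = backing_file
            self.gitfs_commit = gitfs_commit   # callback into the gitfs

        def write(self, offset, data):
            self.f.seek(offset)
            self.f.write(data)                 # exempt from periodic commits

        def flush_barrier(self):
            self.f.flush()
            self.gitfs_commit()                # commit exactly at barriers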
And database files aren't a big issue. I have yet to see a networked
filesystem on which you could stick a MySQL database from one node
and expect to get useful/recent read results from other nodes. If
you really wanted something like that for such a "gitfs", you could
just add code to MySQL to create a gitfs commit every N transactions
and not otherwise. The best part is: that would make online MySQL
backups from another node trivial! Just pick any arbitrary
appropriate commit object and mount that object, then "cp -a
mysql_db_dir mysql_backup_dir". That's not to say it wouldn't have a
performance penalty, but for some people the performance penalty
might be worth it.
Oh, and for those programs which want multi-master replication, this
makes it ten times easier:
1) Put each master-server on a different gitfs branch
2) Write your program as gitfs-aware. Make it create gitfs
commits at appropriate times (so the data is accessible from other
nodes).
3) Come up with a useful non-interactive database-file merge
algorithm. Useful examples of different kinds of merge engines may
be found in the git project. This should take $BASE_VERSION,
$NEWVERSION1, $NEWVERSION2, and produce a $MERGEDVERSION. A good
algorithm should probably pick a safe default and save a "conflict"
entry in the face of conflicting changes.
4) Hook your merge algorithm into the gitfs mechanics using some
to-be-defined API.
5) Whenever your software does a database-file commit it sends
out a little notification to the other nodes (maybe using a gitfs API?)
6) Run a periodic (as defined by the admin yet again) thread on
each node which does branch merging. When two or more branches have
different SHA-1 sums the servers will rotate the merging task between
them. The thus-selected server will merge changes from the other
server(s) into its current working copy. With two servers this means
that the maximum delay between one server making a change and the
other server seeing it will be twice the merge interval.
7) For small pools of servers a simple rotated-merge-master
algorithm (sketched below) would work. For larger pools you would
need to come up with some logarithmic rotating-merge-node algorithm
to evenly divide the work of propagating changes across all nodes.
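The rotation in steps 6 and 7 is simple to pin down. A hypothetical
selection rule (node list and epoch length invented): every node
derives whose turn it is from the shared clock, so no election
traffic is needed.

    import time

    NODES = ["node-a", "node-b", "node-c"]   # hypothetical pool
    MERGE_INTERVAL_S = 60                     # admin-defined epoch length

    def merge_master(now_s=None, nodes=NODES):
        """Deterministically pick the merging node for this epoch.

        Every node computes the same answer from shared inputs; with
        two servers this gives the twice-the-merge-interval
        propagation bound mentioned above.
        """
        if now_s is None:
            now_s = time.time()
        epoch = int(now_s // MERGE_INTERVAL_S)
        return nodes[epoch % len(nodes)]      # simple rotation

    print(merge_master())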
> It does have some lovely properties though. I'd enjoy working in
> an environment that didn't look much like POSIX but had the strong
> guarantees and auditability that addressing by sha1 buys you.
I'd like to think we can have our cake and eat it too :-D. POSIX
requirements should be doable on the local system and can be mimicked
well enough on networked filesystems (albeit with update latency)
that most programs won't care. If you're the only person modifying
files on gitfs, regardless of what node they are stored on, it should
have the same behavior as local files (since with gitfs caching they
would *become* local files too :-D). The few programs that do care
about POSIX atomicity across networked filesystems (which is already
mostly implementation defined) could probably be updated to map gitfs
commits and merges into their own internal transactions and do just
fine.
Cheers,
Kyle Moffett