* patches to support working without the object database
From: Bryan Larsen @ 2005-07-08 10:37 UTC
To: git

Sometimes you may wish to keep an audit trail of what changed, where,
and by whom. You do not need to know the exact details of the change,
and the files are so large that keeping an extra copy of the data in
the object database cache is prohibitively expensive.

Git is (almost) ideally suited for this. There's very little out there
that is faster than git-diff-cache.

The design of git also facilitates this. git-update-cache --cacheinfo
allows the index to be updated without an object in the database, and
operations can then be performed around the index.

However, there are some things that are inconvenient and one show
stopper. I will separately mail a series of patches. The first will
address the show stopper, the rest the inconveniences.

Once applied, cg-init, cg-commit and cg-add will all take the "-N"
option to update the index without moving the objects into the
database. Operations that don't require the database such as cg-status
and cg-log -f work fine and are very useful.

I don't expect the patches to be accepted as is. One was designed to be
minimally intrusive, but I suspect that there is a better way to do it:
suggestions are welcome.

The controversial one switches git-update-cache --refresh to rely on
the SHA1 being unique rather than doing a byte comparison against the
(possibly missing) object database. It could be made an option, but I
think that's ugly.

All patches are against cogito-0.12.

cheers,
Bryan
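The refresh change described above swaps a byte comparison for a hash comparison. The patch itself is not shown in this thread, so the following is only a minimal Python sketch of the idea, not Bryan's code: it recomputes a file's blob ID in chunks (so a very large file is never held in memory) and compares it with the ID recorded earlier. The function names `blob_id_streaming` and `content_unchanged` are hypothetical; the `blob <size>\0` header is the one Git hashes in front of blob contents.

```python
import hashlib
import os
import tempfile


def blob_id_streaming(path, chunk_size=1 << 20):
    """Return the Git blob ID of the file at `path`, reading it in chunks."""
    size = os.stat(path).st_size
    digest = hashlib.sha1()
    # Git hashes the header "<type> <size>\0" followed by the raw content.
    digest.update(b"blob %d\0" % size)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_unchanged(path, recorded_id):
    """Hypothetical refresh check: compare hashes, never the stored bytes."""
    return blob_id_streaming(path) == recorded_id


if __name__ == "__main__":
    # Demo on a small temporary file; a real audit trail would use big files.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"hello\n")
    try:
        recorded = blob_id_streaming(handle.name)
        print(recorded, content_unchanged(handle.name, recorded))
    finally:
        os.remove(handle.name)
```

The check needs only the recorded ID, which is why it still works when the object was never written to the database. The sketch assumes the file does not change between the `os.stat` call and the read.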
* Re: patches to support working without the object database
From: Junio C Hamano @ 2005-07-08 18:36 UTC
To: Bryan Larsen; +Cc: git

>>>>> "BL" == Bryan Larsen <bryan.larsen@gmail.com> writes:

BL> Sometimes you may wish to keep an audit trail of what changed, where,
BL> and by whom. You do not need to know the exact details of the change,
BL> and the files are so large that keeping an extra copy of the data in
BL> the object database cache is prohibitively expensive.

I am basically OK with this patch series, except I have one
minor problem about interface detail, and more seriously, that
the patch is whitespace mangled and would not apply. E.g.

    diff --git a/cache.h b/cache.h
    --- a/cache.h
    +++ b/cache.h
    @@ -139,7 +139,7 @@ extern int remove_cache_entry_at(int pos
      extern int remove_file_from_cache(char *path);
      extern int ce_same_name(struct cache_entry *a, struct cache_entry *b);
      extern int ce_match_stat(struct cache_entry *ce, struct stat *st);
    -extern int index_fd(unsigned char *sha1, int fd, struct stat *st);
    +extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int
    info_only);

Notice the "info_only" folded, and other unchanged lines
indented by two spaces instead of one? Please retry. I
especially like what [PATCH 4/7] does and do not want to see
this patch go to dustbin due to technicalities.

Also please make sure that core GIT part patch applies against
Linus tip (especially [PATCH 2/7]) as well. I think it does, but
please double check.

I would also suggest adding the same --info-only logic to
write-blob (perhaps give it a short and sweet name like "-n"),
in order to get the hash information out of it without actually
registering the blob. This would make things more useful in
general. One immediate benefit of it is that we would have a
standalone checksum program we can reuse, by just saying
"write-blob -n". Once you have it, you _could_ even drop
--info-only from git-update-cache and use normal --cacheinfo
instead.

While you are at it, you might also want to add an option to
write-blob to specify the type of the object you are hashing, so
that would make [*1*]:

    git-write-blob [-n] [-t <type>] <file>...

One way to do this would be to add "const char *type" argument
to index_fd(), which is usually "blob" in the traditional use.
Then, the change to index_fd() would become:

    -    ret = write_sha1_file(buf, size, "blob", sha1);
    +    if (info_only) {
    +        (void) write_sha1_file_prepare(buf, size, type, sha1, hdr, &hdrlen);
    +        ret = 0;
    +    } else ret = write_sha1_file(buf, size, type, sha1);

But first let's get the whitespace mangling fixed up ;-).

[Footnote]

*1* I considered this instead:

    git-write-blob [-n | -t <type>] <file>...

which means that if you specify type then -n is implied. But
making -t independent would let you have inverse of
git-cat-file; a silly example:

    $ git-cat-file -t $FOO
    tree
    $ git-cat-file tree $FOO >tmp1
    $ FOO1=$(git-write-blob -t tree tmp1)

If we go this route, we may also want to rename it to
write-object, but I would want to have it as a separate patch
after this series settles down.
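The "-n" and "-t <type>" suggestions above amount to computing an object ID from a type name and a buffer without touching the database. As a rough Python illustration (an assumption-laden sketch, not the C code under discussion), the hypothetical helper `object_id` below hashes the `<type> <size>\0` header followed by the data, which is how Git derives object names:

```python
import hashlib


def object_id(data, obj_type="blob"):
    """Hash `data` as a Git object of `obj_type`; nothing is stored."""
    # The type defaults to "blob", mirroring the traditional use.
    header = obj_type.encode("ascii") + b" " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    payload = b"hello\n"
    # The same bytes get a different ID under a different type name.
    print(object_id(payload))
    print(object_id(payload, "tree"))
```

Because the type is part of the hashed header, the same bytes hashed as "blob" and as "tree" give different IDs, which is why the type has to be passed down rather than hard-coded.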
* Re: patches to support working without the object database
From: Junio C Hamano @ 2005-07-08 19:34 UTC
To: Bryan Larsen; +Cc: git

Replying to myself...

JCH> While you are at it, you might also want to add an option to
JCH> write-blob to specify the type of the object you are hashing, so
JCH> that would make [*1*]:

JCH>     git-write-blob [-n] [-t <type>] <file>...

JCH> [Footnote]

JCH> *1* I considered this instead:

JCH>     git-write-blob [-n | -t <type>] <file>...

JCH> which means that if you specify type then -n is implied. But
JCH> making -t independent would let you have inverse of
JCH> git-cat-file; a silly example:

JCH>     $ git-cat-file -t $FOO
JCH>     tree
JCH>     $ git-cat-file tree $FOO >tmp1
JCH>     $ FOO1=$(git-write-blob -t tree tmp1)

JCH> If we go this route, we may also want to rename it to
JCH> write-object, but I would want to have it as a separate patch
JCH> after this series settles down.

Come to think of it, there is only one in-tree user of
write-blob remaining. Renaming it to hash-object, changing the
default behaviour to just hash without storing and instead give
it --write (or just -w) flag would make more sense. Without -t,
the type should default to "blob".

Then, the above stupid example would then become:

    $ git-cat-file -t $FOO
    tree
    $ git-cat-file tree $FOO >tmp1
    $ FOO1=$(git-hash-object -t tree tmp1)

And the only in-tree user git-cvsimport-script would be changed
to:

    --- a/git-cvsimport-script
    +++ b/git-cvsimport-script
    @@ -683,7 +683,7 @@ while(<CVS>) {
             $fn =~ s#^/+##;
             my ($tmpname, $size) = $cvs->file($fn,$rev);
             print "".($init ? "New" : "Update")." $fn: $size bytes.\n" if $opt_v;
    -        open my $F, '-|', "git-write-blob $tmpname"
    +        open my $F, '-|', "git-hash-object -w $tmpname"
                 or die "Cannot create object: $!\n";
             my $sha = <$F>;
             chomp $sha;
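The proposed default (hash only, store only when -w is given) can be sketched in Python as follows. This is an illustration under assumptions, not the eventual C implementation: `hash_object` and its `objects_dir` parameter are hypothetical names, and the on-disk layout used here (zlib-compressed header plus data under a two-character fan-out directory) is the loose-object layout of current Git.

```python
import hashlib
import os
import tempfile
import zlib


def hash_object(data, obj_type="blob", write=False, objects_dir="objects"):
    """Return the object ID; store a loose object only when `write` is set."""
    store = b"%s %d\0" % (obj_type.encode("ascii"), len(data)) + data
    object_id = hashlib.sha1(store).hexdigest()
    if write:
        # Loose objects live at <objects_dir>/<first 2 hex>/<remaining 38 hex>.
        directory = os.path.join(objects_dir, object_id[:2])
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, object_id[2:])
        if not os.path.exists(path):
            with open(path, "wb") as handle:
                handle.write(zlib.compress(store))
    return object_id


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # Default behaviour: hash only, nothing appears on disk.
        print(hash_object(b"hello\n", objects_dir=scratch), os.listdir(scratch))
        # With write=True the object is stored as well.
        print(hash_object(b"hello\n", write=True, objects_dir=scratch),
              os.listdir(scratch))
```

Both calls return the same ID; only the second one leaves a file behind, which matches the split between the checksum-only use and the git-cvsimport-script use that still needs the object stored.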
* Re: patches to support working without the object database
From: Bryan Larsen @ 2005-07-08 20:09 UTC
To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
>>>>>>"BL" == Bryan Larsen <bryan.larsen@gmail.com> writes:
>
>
> BL> Sometimes you may wish to keep an audit trail of what changed, where,
> BL> and by whom. You do not need to know the exact details of the change,
> BL> and the files are so large that keeping an extra copy of the data in
> BL> the object database cache is prohibitively expensive.
>
> I am basically OK with this patch series, except I have one
> minor problem about interface detail, and more seriously, that
> the patch is whitespace mangled and would not apply. E.g.
>

* SNIP *

>
> Also please make sure that core GIT part patch applies against
> Linus tip (especially [PATCH 2/7]) as well. I think it does, but
> please double check.
>
>

I had trouble getting tip. That may be because I'm on OS X: I want to
try it on a Linux box to narrow down the source of my problems. Given
that it's currently 4PM on Friday, I don't think the IT staff is going
to fix the firewall before Monday. So please excuse me while I scrounge
up another Linux box or two.

After that happens, I'll fix up my patches as suggested by you and
Linus, get myself a real mailer and resubmit.

thanks,
Bryan