* patches to support working without the object database
@ 2005-07-08 10:37 Bryan Larsen
2005-07-08 18:36 ` Junio C Hamano
0 siblings, 1 reply; 4+ messages in thread
From: Bryan Larsen @ 2005-07-08 10:37 UTC
To: git
Sometimes you may wish to keep an audit trail of what changed, where,
and by whom. You do not need to know the exact details of the change,
and the files are so large that keeping an extra copy of the data in the
object database cache is prohibitively expensive.
Git is (almost) ideally suited for this. There's very little out there
that is faster than git-diff-cache.
The design of git also facilitates this. git-update-cache --cacheinfo
allows the index to be updated without an object in the database, and
operations can then be performed around the index. However, there are
some things that are inconvenient and one show stopper.
I will separately mail a series of patches. The first will address the
show stopper, the rest the inconveniences. Once applied, cg-init,
cg-commit and cg-add will all take the "-N" option to update the index
without moving the objects into the database. Operations that don't
require the database such as cg-status and cg-log -f work fine and are
very useful.
I don't expect the patches to be accepted as is. One was designed to be
minimally intrusive, but I suspect that there is a better way to do it:
suggestions are welcome. The controversial one switches
git-update-cache --refresh to rely on the SHA1 being unique rather than
doing a byte comparison against the (possibly missing) object database.
It could be made an option, but I think that's ugly.
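As an illustration of the trade-off in the --refresh change, here is a minimal Python sketch (not the C code in the patches) of deciding whether a working file still matches its index entry by rehashing it, so no stored object is needed for a byte comparison. The helper names `blob_sha1` and `matches_index` are hypothetical; the only git fact assumed is that a blob id is the SHA-1 of the header "blob <size>", a NUL byte, and the file contents.

```python
import hashlib
import os
import tempfile


def blob_sha1(data: bytes) -> str:
    """Return the git blob id for data (SHA-1 over header, NUL, contents)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def matches_index(path: str, recorded_sha1: str) -> bool:
    """Hypothetical helper: trust the hash instead of comparing bytes."""
    with open(path, "rb") as f:
        return blob_sha1(f.read()) == recorded_sha1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "big.dat")
        with open(path, "wb") as f:
            f.write(b"hello\n")
        recorded = blob_sha1(b"hello\n")  # what the index entry would hold
        print(matches_index(path, recorded))
        with open(path, "ab") as f:
            f.write(b"changed")           # modify the working file
        print(matches_index(path, recorded))
```

The cost of this approach is a full read and hash of the file whenever the stat information alone cannot settle the question, which is the same work that registering the object would have required minus the compression and the write.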
All patches are against cogito-0.12.
cheers,
Bryan
* Re: patches to support working without the object database
2005-07-08 10:37 patches to support working without the object database Bryan Larsen
@ 2005-07-08 18:36 ` Junio C Hamano
2005-07-08 19:34 ` Junio C Hamano
2005-07-08 20:09 ` Bryan Larsen
0 siblings, 2 replies; 4+ messages in thread
From: Junio C Hamano @ 2005-07-08 18:36 UTC
To: Bryan Larsen; +Cc: git
>>>>> "BL" == Bryan Larsen <bryan.larsen@gmail.com> writes:
BL> Sometimes you may wish to keep an audit trail of what changed, where,
BL> and by whom. You do not need to know the exact details of the change,
BL> and the files are so large that keeping an extra copy of the data in
BL> the object database cache is prohibitively expensive.
I am basically OK with this patch series, except I have one
minor problem about interface detail, and more seriously, that
the patch is whitespace mangled and would not apply. E.g.
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -139,7 +139,7 @@ extern int remove_cache_entry_at(int pos
  extern int remove_file_from_cache(char *path);
  extern int ce_same_name(struct cache_entry *a, struct cache_entry *b);
  extern int ce_match_stat(struct cache_entry *ce, struct stat *st);
-extern int index_fd(unsigned char *sha1, int fd, struct stat *st);
+extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int
info_only);
Notice the "info_only" folded, and other unchanged lines
indented by two spaces instead of one?
Please retry. I especially like what [PATCH 4/7] does and do
not want to see this patch go to dustbin due to technicalities.
Also please make sure that core GIT part patch applies against
Linus tip (especially [PATCH 2/7]) as well. I think it does, but
please double check.
I would also suggest adding the same --info-only logic to
write-blob (perhaps give it a short and sweet name like "-n"),
in order to get the hash information out of it without actually
registering the blob.
This would make things more useful in general. One immediate
benefit of it is that we would have a standalone checksum
program we can reuse, by just saying "write-blob -n". Once you
have it, you _could_ even drop --info-only from git-update-cache
and use normal --cacheinfo instead.
While you are at it, you might also want to add an option to
write-blob to specify the type of the object you are hashing, so
that would make [*1*]:
git-write-blob [-n] [-t <type>] <file>...
One way to do this would be to add "const char *type" argument
to index_fd(), which is usually "blob" in the traditional use.
Then, the change to index_fd() would become:
-	ret = write_sha1_file(buf, size, "blob", sha1);
+	if (info_only) {
+		(void) write_sha1_file_prepare(buf, size, type, sha1, hdr, &hdrlen);
+		ret = 0;
+	} else
+		ret = write_sha1_file(buf, size, type, sha1);
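The control flow of that change can be sketched in Python as follows. This is an illustration under stated assumptions, not the C implementation: `index_data` and its `objdir` parameter are hypothetical names, and the on-disk layout shown (zlib-compressed header plus contents under a two-character fan-out directory) is the loose-object layout of current git, which may differ in detail from the 2005 code.

```python
import hashlib
import os
import tempfile
import zlib


def index_data(data: bytes, objdir: str, obj_type: str = "blob",
               info_only: bool = False) -> str:
    """Hash data as a git object; store it only when info_only is False."""
    header = ("%s %d" % (obj_type, len(data))).encode() + b"\x00"
    sha1 = hashlib.sha1(header + data).hexdigest()
    if not info_only:
        # Loose-object path: <objdir>/<first 2 hex digits>/<remaining 38>
        subdir = os.path.join(objdir, sha1[:2])
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, sha1[2:]), "wb") as f:
            f.write(zlib.compress(header + data))
    return sha1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as objdir:
        a = index_data(b"hello\n", objdir, info_only=True)
        print(a, os.listdir(objdir))       # id computed, nothing stored
        b = index_data(b"hello\n", objdir)
        print(a == b, os.listdir(objdir))  # same id, object now on disk
```

Both paths return the same id, which is what lets the index record an entry for data that was never written to the object database.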
But first let's get the whitespace mangling fixed up ;-).
[Footnote]
*1* I considered this instead:
git-write-blob [-n | -t <type>] <file>...
which means that if you specify type then -n is implied. But
making -t independent would let you have inverse of
git-cat-file; a silly example:
$ git-cat-file -t $FOO
tree
$ git-cat-file tree $FOO >tmp1
$ FOO1=$(git-write-blob -t tree tmp1)
If we go this route, we may also want to rename it to
write-object, but I would want to have it as a separate patch
after this series settles down.
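The round trip in the silly example can be sketched in Python to show why the type has to be given: the type name is part of the hashed header, so the same bytes hashed as "blob" and as "tree" get different ids. The function names `hash_object` and `cat_file` below are hypothetical stand-ins, not the real commands, and the stored form assumes the zlib-compressed loose-object layout of current git.

```python
import hashlib
import zlib


def hash_object(data: bytes, obj_type: str = "blob") -> str:
    """Object id: SHA-1 over '<type> <size>', a NUL byte, then the data."""
    header = ("%s %d" % (obj_type, len(data))).encode() + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


def cat_file(stored: bytes):
    """Split a zlib-compressed loose object into (type, payload)."""
    raw = zlib.decompress(stored)
    header, _, payload = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    assert int(size) == len(payload)
    return obj_type, payload


if __name__ == "__main__":
    # One-entry tree payload: '<mode> <name>', NUL, raw 20-byte blob id.
    blob_id = hash_object(b"hello\n")
    tree_payload = b"100644 hello.txt\x00" + bytes.fromhex(blob_id)
    foo = hash_object(tree_payload, "tree")
    stored = zlib.compress(b"tree %d\x00" % len(tree_payload) + tree_payload)

    obj_type, tmp1 = cat_file(stored)    # like: git-cat-file -t / tree $FOO
    foo1 = hash_object(tmp1, obj_type)   # like: git-write-blob -t tree tmp1
    print(obj_type, foo1 == foo)
    print(hash_object(tmp1, "blob") == foo)  # wrong type, different id
```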
* Re: patches to support working without the object database
2005-07-08 18:36 ` Junio C Hamano
@ 2005-07-08 19:34 ` Junio C Hamano
2005-07-08 20:09 ` Bryan Larsen
1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2005-07-08 19:34 UTC
To: Bryan Larsen; +Cc: git
Replying to myself...
JCH> While you are at it, you might also want to add an option to
JCH> write-blob to specify the type of the object you are hashing, so
JCH> that would make [*1*]:
JCH> git-write-blob [-n] [-t <type>] <file>...
JCH> [Footnote]
JCH> *1* I considered this instead:
JCH> git-write-blob [-n | -t <type>] <file>...
JCH> which means that if you specify type then -n is implied. But
JCH> making -t independent would let you have inverse of
JCH> git-cat-file; a silly example:
JCH> $ git-cat-file -t $FOO
JCH> tree
JCH> $ git-cat-file tree $FOO >tmp1
JCH> $ FOO1=$(git-write-blob -t tree tmp1)
JCH> If we go this route, we may also want to rename it to
JCH> write-object, but I would want to have it as a separate patch
JCH> after this series settles down.
Come to think of it, there is only one in-tree user of
write-blob remaining. It would make more sense to rename it to
hash-object, change the default behaviour to just hash without
storing, and give it a --write (or just -w) flag for storing.
Without -t, the type should default to "blob".
The above stupid example would then become:
$ git-cat-file -t $FOO
tree
$ git-cat-file tree $FOO >tmp1
$ FOO1=$(git-hash-object -t tree tmp1)
And the only in-tree user git-cvsimport-script would be changed to:
--- a/git-cvsimport-script
+++ b/git-cvsimport-script
@@ -683,7 +683,7 @@ while(<CVS>) {
$fn =~ s#^/+##;
my ($tmpname, $size) = $cvs->file($fn,$rev);
print "".($init ? "New" : "Update")." $fn: $size bytes.\n" if $opt_v;
- open my $F, '-|', "git-write-blob $tmpname"
+ open my $F, '-|', "git-hash-object -w $tmpname"
or die "Cannot create object: $!\n";
my $sha = <$F>;
chomp $sha;
* Re: patches to support working without the object database
2005-07-08 18:36 ` Junio C Hamano
2005-07-08 19:34 ` Junio C Hamano
@ 2005-07-08 20:09 ` Bryan Larsen
1 sibling, 0 replies; 4+ messages in thread
From: Bryan Larsen @ 2005-07-08 20:09 UTC
To: Junio C Hamano; +Cc: git
Junio C Hamano wrote:
> >>>>> "BL" == Bryan Larsen <bryan.larsen@gmail.com> writes:
>
>
> BL> Sometimes you may wish to keep an audit trail of what changed, where,
> BL> and by whom. You do not need to know the exact details of the change,
> BL> and the files are so large that keeping an extra copy of the data in
> BL> the object database cache is prohibitively expensive.
>
> I am basically OK with this patch series, except I have one
> minor problem about interface detail, and more seriously, that
> the patch is whitespace mangled and would not apply. E.g.
>
* SNIP *
>
> Also please make sure that core GIT part patch applies against
> Linus tip (especially [PATCH 2/7]) as well. I think it does, but
> please double check.
>
>
I had trouble getting tip. That may be because I'm on OS X: I want to
try it on a Linux box to narrow down the source of my problems. Given
that it's currently 4PM on Friday, I don't think the IT staff is going
to fix the firewall before Monday. So please excuse me while I scrounge
up another Linux box or two.
After that happens, I'll fix up my patches as suggested by you and
Linus, get myself a real mailer and resubmit.
thanks,
Bryan