* [ANNOUNCE] git-rev-size: calculate sizes of repository
@ 2006-08-20 10:54 Rutger Nijlunsing
2006-08-20 13:20 ` Johannes Schindelin
0 siblings, 1 reply; 13+ messages in thread
From: Rutger Nijlunsing @ 2006-08-20 10:54 UTC
To: git
[-- Attachment #1: Type: text/plain, Size: 1519 bytes --]
Hi,
Just created as answer to a request on IRC: a script to get the sizes
of a repository and various stages. It caches all sizes it finds, so
is quite fast once it has ramped up.
Example on git repo:
$ git-rev-size.rb HEAD~10..HEAD
ef75951ecabd53b5ed816eb596992f8d222d0fe3 21 694 3343495
a625daccb1750c56768481ec9a5dfd4f9053774e 21 694 3343492
55c3eb434ab6d489c632263239be15a1054df7f2 21 694 3343481
a89fccd28197fa179828c8596791ff16e2268d20 21 694 3343523
d4baf9eaf47ea1ba204f1ab5ecd22326913dd081 21 694 3343498
409d1d2053657f73a3222651111740606122aa80 21 694 3343423
076a10c7282a08f783a28c1b64d0e114a3fe3d39 21 694 3342501
8e3abd4c97b8e7e1128ad0cc44dcc267f478659a 21 694 3342485
500a99935dc157a6625b4decae0b97e896061c2c 21 692 3334754
6493cc09c2aa626ffbe6024dd705e1495c2d87e4 21 692 3334511
d78b0f3d6aa04510dd0c22c3853d3954c5f5b531 21 688 3322774
0fc82cff12a887c1e0e7e69937dbd8a82843c081 21 694 3343352
42f774063db1442fc3815f596d263f90dcd8380b 21 694 3344828
520cd3eca5743bebd217423e1fd0721f32613bb1 21 693 3344115
789a09b4874ae2616987794e0e739b8227957175 21 692 3335517
c35f4c371ac12f4d29b08e46c519ddc0a6494f6e 21 691 3330286
Numbers are SHA1 hash, number of trees, number of blobs and total
number of bytes in those blobs.
You can also find it on http://www.wingding.demon.nl/git-rev-size.rb
--
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------
[-- Attachment #2: git-rev-size.rb --]
[-- Type: text/plain, Size: 2075 bytes --]
#!/usr/bin/env ruby
# Calculates sizes of repository at different commits in git
#
# 20060819 Initial release
# 20060820 Pass arguments to git-rev-list
#
# (c)2006 R. Nijlunsing <git@tux.tmfweb.nl>
# License: LGPLv2
require 'set'
require 'enumerator'

if ARGV.empty?
  puts "Calculates sizes of repository at different commits"
  puts
  puts "Usage: #{$0} <arguments for git-rev-list>"
  puts "Example: #{$0} HEAD"
  exit 1
end

class Sizes
  attr_reader :trees, :blobs, :bytes
  def initialize(trees, blobs, bytes); @trees = trees; @blobs = blobs; @bytes = bytes; end
  def add(o); @trees += o.trees; @blobs += o.blobs; @bytes += o.bytes; end
end

def tree_size(tree)
  return $sha2size[tree] if $sha2size.include?(tree)
  size = Sizes.new(1, 0, 0)
  blobs = []   # Blobs with unknown sizes
  File.popen("git cat-file -p #{tree}", "r") { |io|
    while line = io.gets
      line =~ %r{^[0-9]{6} ([a-z]+) ([0-9a-f]+)}
      type, sha1 = $1, $2
      if $sha2size.include?(sha1)
        size.add($sha2size[sha1])
      elsif type == "tree"
        size.add(tree_size(sha1))
      elsif type == "blob"
        blobs << sha1
      else
        raise type
      end
    end
  }
  if blobs.size > 0
    # Do all _blobs_ at once. For this to help, git-cat-file should accept
    # more than one filename a time.
    blobs.each_slice(1) { |blobs_slice|
      File.popen("git cat-file -s #{blobs_slice.join(' ')}", "r") { |io|
        blobs_slice.each { |blob|
          blob_size = $sha2size[blob] = Sizes.new(0, 1, io.gets.to_i)
          size.add(blob_size)
        }
      }
    }
  end
  $sha2size[tree] = size
end

$sha2size = {}   # SHA1 -> Sizes
File.popen("git rev-list #{ARGV.join(' ')}", "r") do |cio|
  while commit = cio.gets
    tree = nil   # Root tree of this commit
    commit = commit.chomp
    File.popen("git cat-file -p #{commit}", "r") do |io|
      while (line = io.gets) && !tree
        tree = $1 if line =~ %r{^tree ([a-f0-9]+)}
      end
    end
    if tree
      sizes = tree_size(tree)
      puts "#{commit} #{sizes.trees} #{sizes.blobs} #{sizes.bytes}"
    end
  end
end
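
A minimal sketch, not part of the original mail, of how the four-column output shown in the example could be post-processed to see size growth between commits. The names Row, parse_rev_size and byte_deltas are hypothetical, and it assumes that consecutive output lines are child and parent, which need not hold across merges.

```ruby
# One output line of git-rev-size.rb: "sha1 trees blobs bytes".
Row = Struct.new(:sha1, :trees, :blobs, :bytes)

# Turn the script's text output into Row records, skipping blank lines.
def parse_rev_size(text)
  text.each_line.map(&:split).reject(&:empty?).map do |sha1, trees, blobs, bytes|
    Row.new(sha1, trees.to_i, blobs.to_i, bytes.to_i)
  end
end

# Byte difference between each line and the line after it
# (the older commit in the usual newest-first order).
def byte_deltas(rows)
  rows.each_cons(2).map { |newer, older| [newer.sha1, newer.bytes - older.bytes] }
end

# First three lines of the example run quoted in the announcement.
SAMPLE = <<~OUT
  ef75951ecabd53b5ed816eb596992f8d222d0fe3 21 694 3343495
  a625daccb1750c56768481ec9a5dfd4f9053774e 21 694 3343492
  55c3eb434ab6d489c632263239be15a1054df7f2 21 694 3343481
OUT

byte_deltas(parse_rev_size(SAMPLE)).each do |sha1, delta|
  puts format("%s %+d", sha1[0, 7], delta)
end
```

On the three sample lines this prints a growth of +3 and +11 bytes for the two newest commits.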

* Re: [ANNOUNCE] git-rev-size: calculate sizes of repository
2006-08-20 10:54 [ANNOUNCE] git-rev-size: calculate sizes of repository Rutger Nijlunsing
@ 2006-08-20 13:20 ` Johannes Schindelin
2006-08-20 15:24 ` Rutger Nijlunsing
0 siblings, 1 reply; 13+ messages in thread
From: Johannes Schindelin @ 2006-08-20 13:20 UTC
To: git; +Cc: git

Hi,

On Sun, 20 Aug 2006, Rutger Nijlunsing wrote:

> You can also find it on http://www.wingding.demon.nl/git-rev-size.rb

Ruby is _so_ mainstream. Could I have a Haskell version, pretty please?

Ciao,
Dscho

* Re: [ANNOUNCE] git-rev-size: calculate sizes of repository
2006-08-20 13:20 ` Johannes Schindelin
@ 2006-08-20 15:24 ` Rutger Nijlunsing
2006-08-20 16:09 ` Johannes Schindelin
0 siblings, 1 reply; 13+ messages in thread
From: Rutger Nijlunsing @ 2006-08-20 15:24 UTC
To: Johannes Schindelin; +Cc: git

On Sun, Aug 20, 2006 at 03:20:19PM +0200, Johannes Schindelin wrote:
> Hi,
>
> On Sun, 20 Aug 2006, Rutger Nijlunsing wrote:
>
> > You can also find it on http://www.wingding.demon.nl/git-rev-size.rb
>
> Ruby is _so_ mainstream. Could I have a Haskell version, pretty please?

I _knew_ it... Please go bug someone else. The only thing I did was
help someone, and for that I choose my own tools since I do it for
fun.

I don't ask for inclusion in the git archive. I don't ask you to
review it, download it, read it nor use it. Just ignore this post if
Ruby offends you and this problem wasn't your itch. Please.

--
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------

* Re: [ANNOUNCE] git-rev-size: calculate sizes of repository
2006-08-20 15:24 ` Rutger Nijlunsing
@ 2006-08-20 16:09 ` Johannes Schindelin
2006-08-20 16:37 ` Object hash (was: Re: [ANNOUNCE] git-rev-size: calculate sizes of repository) Josef Weidendorfer
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Johannes Schindelin @ 2006-08-20 16:09 UTC
To: Rutger Nijlunsing; +Cc: git

Hi,

On Sun, 20 Aug 2006, Rutger Nijlunsing wrote:

> On Sun, Aug 20, 2006 at 03:20:19PM +0200, Johannes Schindelin wrote:
> > Hi,
> >
> > On Sun, 20 Aug 2006, Rutger Nijlunsing wrote:
> >
> > > You can also find it on http://www.wingding.demon.nl/git-rev-size.rb
> >
> > Ruby is _so_ mainstream. Could I have a Haskell version, pretty please?
>
> I _knew_ it... Please go bug someone else. The only thing I did was
> help someone, and for that I choose my own tools since I do it for
> fun.

Fair enough.

-- 8< --
[PATCH] Add git-rev-size

This tool spits out the number of trees, the number of blobs, and the total
bytes of the blobs for a given rev range.

Most notably, it adds an object hash map structure to the library.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
 Makefile           |    4 ++
 builtin-rev-size.c |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 builtin.h          |    1 +
 git.c              |    1 +
 hash.c             |   50 ++++++++++++++++++++++++++++
 hash.h             |   12 +++++++
 6 files changed, 159 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index a86f289..06c8dd9 100644
--- a/Makefile
+++ b/Makefile
@@ -264,7 +264,8 @@ LIB_OBJS = \
 	server-info.o setup.o sha1_file.o sha1_name.o strbuf.o \
 	tag.o tree.o usage.o config.o environment.o ctype.o copy.o \
 	fetch-clone.o revision.o pager.o tree-walk.o xdiff-interface.o \
-	alloc.o merge-file.o path-list.o unpack-trees.o help.o $(DIFF_OBJS)
+	alloc.o merge-file.o path-list.o unpack-trees.o help.o \
+	hash.o $(DIFF_OBJS)
 
 BUILTIN_OBJS = \
 	builtin-add.o \
@@ -297,6 +298,7 @@ BUILTIN_OBJS = \
 	builtin-repo-config.o \
 	builtin-rev-list.o \
 	builtin-rev-parse.o \
+	builtin-rev-size.o \
 	builtin-rm.o \
 	builtin-show-branch.o \
 	builtin-stripspace.o \
diff --git a/builtin-rev-size.c b/builtin-rev-size.c
new file mode 100644
index 0000000..ad88e48
--- /dev/null
+++ b/builtin-rev-size.c
@@ -0,0 +1,92 @@
+/*
+ * "git rev-size" builtin command
+ *
+ * Copyright (C) 2006 Johannes Schindelin
+ */
+
+#include "cache.h"
+#include "builtin.h"
+#include "object.h"
+#include "tree.h"
+#include "tree-walk.h"
+#include "commit.h"
+#include "diff.h"
+#include "revision.h"
+#include "hash.h"
+
+static const char builtin_rev_size_usage[] =
+"git-rev-size <commit-id>...";
+
+struct rev_size {
+	struct object object;
+	size_t trees, blobs, bytes;
+};
+
+struct hash_map rev_size_hash = { 0, 0, NULL };
+
+static struct rev_size *get_rev_size(const char *sha1)
+{
+	struct rev_size *rev_size =
+		(struct rev_size *)hash_get(&rev_size_hash, sha1);
+
+	if (rev_size == NULL) {
+		char type[64];
+		unsigned long size;
+
+		rev_size = xcalloc(1, sizeof(struct rev_size));
+
+		if (sha1_object_info(sha1, type, &size))
+			die("Cannot get info for %s", sha1_to_hex(sha1));
+
+		if (!strcmp(type, "blob")) {
+			rev_size->blobs = 1;
+			rev_size->bytes = size;
+		} else if (!strcmp(type, "tree")) {
+			struct tree *tree = (struct tree *)parse_object(sha1);
+			struct tree_desc desc;
+			struct name_entry entry;
+
+			desc.buf = tree->buffer;
+			desc.size = tree->size;
+
+			while (tree_entry(&desc, &entry)) {
+				struct rev_size *r = get_rev_size(entry.sha1);
+
+				rev_size->trees += r->trees;
+				rev_size->blobs += r->blobs;
+				rev_size->bytes += r->bytes;
+			}
+
+			rev_size->trees++;
+		} else
+			die("Cannot calculate size for type %s", type);
+
+		memcpy(rev_size->object.sha1, sha1, 20);
+		hash_put(&rev_size_hash, &rev_size->object);
+	}
+
+	return rev_size;
+}
+
+int cmd_rev_size(int argc, const char **argv, const char *prefix)
+{
+	struct rev_info revs;
+	struct commit *commit;
+
+	init_revisions(&revs, prefix);
+	revs.abbrev = 0;
+	revs.commit_format = CMIT_FMT_UNSPECIFIED;
+	argc = setup_revisions(argc, argv, &revs, NULL);
+
+	prepare_revision_walk(&revs);
+
+	while ((commit = get_revision(&revs))) {
+		struct rev_size *rev_size =
+			get_rev_size(commit->tree->object.sha1);
+
+		printf("%s %d %d %d\n", sha1_to_hex(commit->object.sha1),
+			rev_size->trees, rev_size->blobs, rev_size->bytes);
+	}
+
+	return 0;
+}
diff --git a/builtin.h b/builtin.h
index ade58c4..9848a5e 100644
--- a/builtin.h
+++ b/builtin.h
@@ -46,6 +46,7 @@ extern int cmd_read_tree(int argc, const
 extern int cmd_repo_config(int argc, const char **argv, const char *prefix);
 extern int cmd_rev_list(int argc, const char **argv, const char *prefix);
 extern int cmd_rev_parse(int argc, const char **argv, const char *prefix);
+extern int cmd_rev_size(int argc, const char **argv, const char *prefix);
 extern int cmd_rm(int argc, const char **argv, const char *prefix);
 extern int cmd_show_branch(int argc, const char **argv, const char *prefix);
 extern int cmd_show(int argc, const char **argv, const char *prefix);
diff --git a/git.c b/git.c
index bf0fe0e..4cfa6cf 100644
--- a/git.c
+++ b/git.c
@@ -262,6 +262,7 @@ static void handle_internal_command(int
 		{ "repo-config", cmd_repo_config },
 		{ "rev-list", cmd_rev_list, RUN_SETUP },
 		{ "rev-parse", cmd_rev_parse, RUN_SETUP },
+		{ "rev-size", cmd_rev_size, RUN_SETUP },
 		{ "rm", cmd_rm, RUN_SETUP },
 		{ "show-branch", cmd_show_branch, RUN_SETUP },
 		{ "show", cmd_show, RUN_SETUP | USE_PAGER },
diff --git a/hash.c b/hash.c
new file mode 100644
index 0000000..12d1e65
--- /dev/null
+++ b/hash.c
@@ -0,0 +1,50 @@
+#include "cache.h"
+#include "object.h"
+#include "hash.h"
+
+static unsigned int hash_index(struct hash_map *hash, const char *sha1)
+{
+	unsigned int index = *(unsigned int *)sha1;
+	while (1) {
+		if (index >= hash->alloc)
+			index = index % hash->alloc;
+		if (hash->map[index] == NULL ||
+				!hashcmp(sha1, hash->map[index]->sha1))
+			return index;
+		index++;
+	}
+}
+
+static void grow_hash(struct hash_map *hash)
+{
+	int i;
+	int old_alloc = hash->alloc;
+	struct object **old_map = hash->map;
+
+	hash->alloc = hash->alloc < 32 ? 32 : 2 * hash->alloc;
+	hash->map = xcalloc(hash->alloc, sizeof(struct object *));
+	hash->nr = 0;
+
+	for (i = 0; i < old_alloc; i++) {
+		struct object *obj = old_map[i];
+		if (!obj)
+			continue;
+		hash_put(hash, obj);
+	}
+	free(old_map);
+}
+
+void hash_put(struct hash_map *hash, struct object *obj)
+{
+	if (++hash->nr > hash->alloc / 2)
+		grow_hash(hash);
+
+	hash->map[hash_index(hash, obj->sha1)] = obj;
+}
+
+struct object *hash_get(struct hash_map *hash, const char *sha1)
+{
+	if (hash->alloc == 0)
+		return NULL;
+	return hash->map[hash_index(hash, sha1)];
+}
diff --git a/hash.h b/hash.h
new file mode 100644
index 0000000..0e2b67c
--- /dev/null
+++ b/hash.h
@@ -0,0 +1,12 @@
+#ifndef HASH_H
+#define HASH_H
+
+struct hash_map {
+	unsigned long nr, alloc;
+	struct object **map;
+};
+
+extern struct object *hash_get(struct hash_map *hash, const char *sha1);
+extern void hash_put(struct hash_map *hash, struct object *obj);
+
+#endif
--
1.4.2.ga5e8f-dirty

* Object hash (was: Re: [ANNOUNCE] git-rev-size: calculate sizes of repository)
2006-08-20 16:09 ` Johannes Schindelin
@ 2006-08-20 16:37 ` Josef Weidendorfer
2006-08-20 16:51 ` Johannes Schindelin
2006-08-20 17:24 ` [ANNOUNCE] git-rev-size: calculate sizes of repository Rutger Nijlunsing
2006-08-20 21:38 ` Junio C Hamano
2 siblings, 1 reply; 13+ messages in thread
From: Josef Weidendorfer @ 2006-08-20 16:37 UTC
To: Johannes Schindelin; +Cc: Rutger Nijlunsing, git

On Sunday 20 August 2006 18:09, Johannes Schindelin wrote:

Hi,

> Most notably, it adds an object hash map structure to the library.

Aside from the given command of this thread, this is interesting
(even more interesting would be a persistent cache for arbitrary
object data).

As this could be used in other contexts, some general comments:

> +static unsigned int hash_index(struct hash_map *hash, const char *sha1)
> +{
> +	unsigned int index = *(unsigned int *)sha1;

If you have the same SHA1, stored at different addresses, you get different
indexes for the same SHA1. Index probably should be calculated from the
SHA1 string.

> +void hash_put(struct hash_map *hash, struct object *obj)
> +{
> +	if (++hash->nr > hash->alloc / 2)
> +		grow_hash(hash);

If you insert the same object multiple times, hash->nr will get too big.

Josef

* Re: Object hash (was: Re: [ANNOUNCE] git-rev-size: calculate sizes of repository)
2006-08-20 16:37 ` Object hash (was: Re: [ANNOUNCE] git-rev-size: calculate sizes of repository) Josef Weidendorfer
@ 2006-08-20 16:51 ` Johannes Schindelin
2006-08-20 17:40 ` Rutger Nijlunsing
2006-08-20 18:41 ` Josef Weidendorfer
0 siblings, 2 replies; 13+ messages in thread
From: Johannes Schindelin @ 2006-08-20 16:51 UTC
To: Josef Weidendorfer; +Cc: Rutger Nijlunsing, git

Hi,

On Sun, 20 Aug 2006, Josef Weidendorfer wrote:

> On Sunday 20 August 2006 18:09, Johannes Schindelin wrote:
>
> > +static unsigned int hash_index(struct hash_map *hash, const char *sha1)
> > +{
> > +	unsigned int index = *(unsigned int *)sha1;
>
> If you have the same SHA1, stored at different addresses, you get different
> indexes for the same SHA1. Index probably should be calculated from the
> SHA1 string.

Actually, it does! "*(unsigned int *)sha1" means that the first 4 bytes
of the sha1 are interpreted as a number.

> > +void hash_put(struct hash_map *hash, struct object *obj)
> > +{
> > +	if (++hash->nr > hash->alloc / 2)
> > +		grow_hash(hash);
>
> If you insert the same object multiple times, hash->nr will get too big.

First, you cannot put the same object multiple times. That is not what a
hash does (at least in this case): it stores unique objects (identified by
their sha1 in this case). If you put another object with the same sha1,
the first will be replaced.

Second, since you call hash_put() once per object, hash->nr cannot grow
too big, because grow_hash() doubles hash->alloc. And I call grow_hash()
once the hash map is half-full; Somebody once told me that would be the
optimal growing strategy.

Ciao,
Dscho
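
To illustrate the point about hash_index(), here is a small Ruby sketch (not from the thread) of an index derived from the first 4 bytes of a binary SHA-1, followed by linear probing. The helper name slot_for is hypothetical, and the sketch assumes the table always has at least one free slot, which the patch arranges by growing at half-full.

```ruby
# Slot index for a 20-byte binary SHA-1 in an open-addressing table.
# The start index comes from the first 4 bytes of the key's content
# (native byte order, like the C cast), not from where the key is stored.
def slot_for(slots, sha1)
  i = sha1.unpack("L").first % slots.size
  # Linear probing: step forward until a free slot or the same key is found.
  i = (i + 1) % slots.size until slots[i].nil? || slots[i] == sha1
  i
end

slots = Array.new(32)
a = ["ef75951ecabd53b5ed816eb596992f8d222d0fe3"].pack("H*")
b = a.dup                          # same SHA-1 held in a different String object
c = a.dup
c.setbyte(19, c.getbyte(19) ^ 1)   # same first 4 bytes, different last byte

slots[slot_for(slots, a)] = a
puts slot_for(slots, a) == slot_for(slots, b)
puts slot_for(slots, c) == (slot_for(slots, a) + 1) % slots.size
```

Both lines print true: two copies of one SHA-1 map to the same slot wherever they live in memory, while a different key that shares the first 4 bytes is pushed to the next slot by probing.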

* Re: Object hash (was: Re: [ANNOUNCE] git-rev-size: calculate sizes of repository)
2006-08-20 16:51 ` Johannes Schindelin
@ 2006-08-20 17:40 ` Rutger Nijlunsing
2006-08-20 18:41 ` Josef Weidendorfer
1 sibling, 0 replies; 13+ messages in thread
From: Rutger Nijlunsing @ 2006-08-20 17:40 UTC
To: Johannes Schindelin; +Cc: Josef Weidendorfer, Rutger Nijlunsing, git

> Second, since you call hash_put() once per object, hash->nr cannot grow
> too big, because grow_hash() doubles hash->alloc. And I call grow_hash()
> once the hash map is half-full; Somebody once told me that would be the
> optimal growing strategy.

Optimal growing mainly means to be O(n) (amortized) after n
inserts. That translates to at least _doubling_ (factor 2 or more) the
capacity once you're too full.

Assume doubling at a percentage full. Assume realloc(s) takes O(s)
(where s = number of bytes). Assume we start with 1 element. We
realloc() then when we've got 1 element, then at 2, 4, 8 etc. The size
of the realloc() at each point will also be 1, 2, 4, 8 etc. However,
this cost of O(s) can be amortized over the number of elements. So the
work done _per insert_ is still a constant (amortized again).

Ascilly:
x x x x x x x x x x ... (each insert)
R R   R             ... (each realloc)
1 2 0 4 0 0 0 8 0 0 ... (cost of those realloc())

This has also to do with the infinite series of the sum(k>0) of 2^-k
being a constant.

--
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------
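
The amortized argument can be checked numerically. The following sketch (an illustration only; copies_for is a hypothetical name) uses the same cost model as the mail: start with capacity 1, double when full, and charge each realloc one unit per element already stored.

```ruby
# Count the element copies caused by reallocation over a run of inserts.
def copies_for(inserts, growth: 2)
  capacity = 1
  size = 0
  copies = 0
  inserts.times do
    if size == capacity
      copies += size        # a realloc moves every element stored so far
      capacity *= growth
    end
    size += 1
  end
  copies
end

[8, 1024, 1_000_000].each do |n|
  puts "#{n} inserts: #{copies_for(n)} copies"
end
```

For 8 inserts the reallocs cost 1 + 2 + 4 = 7 copies, and in general the total stays below 2n, so the cost per insert is bounded by a constant.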

* Re: Object hash (was: Re: [ANNOUNCE] git-rev-size: calculate sizes of repository)
2006-08-20 16:51 ` Johannes Schindelin
2006-08-20 17:40 ` Rutger Nijlunsing
@ 2006-08-20 18:41 ` Josef Weidendorfer
2006-08-20 18:47 ` Johannes Schindelin
1 sibling, 1 reply; 13+ messages in thread
From: Josef Weidendorfer @ 2006-08-20 18:41 UTC
To: Johannes Schindelin; +Cc: git

On Sunday 20 August 2006 18:51, Johannes Schindelin wrote:
> > > +static unsigned int hash_index(struct hash_map *hash, const char *sha1)
> > > +{
> > > +	unsigned int index = *(unsigned int *)sha1;
> >
> > If you have the same SHA1, stored at different addresses, you get different
> > indexes for the same SHA1. Index probably should be calculated from the
> > SHA1 string.
>
> Actually, it does! "*(unsigned int *)sha1" means that the first 4 bytes
> of the sha1 are interpreted as a number.

Ah, yes. That's fine.

> > > +void hash_put(struct hash_map *hash, struct object *obj)
> > > +{
> > > +	if (++hash->nr > hash->alloc / 2)
> > > +		grow_hash(hash);
> >
> > If you insert the same object multiple times, hash->nr will get too big.
>
> First, you cannot put the same object multiple times. That is not what a
> hash does (at least in this case): it stores unique objects (identified by
> their sha1 in this case).

I put it the wrong way; I should have said "if you call hash_put() multiple
times with the same object". You get the same index, and nothing should
change. However, you still increment hash->nr, but this error is not really
important as you correct it in grow_hash().

So... sorry for the noise ;-)

Josef

* Re: Object hash (was: Re: [ANNOUNCE] git-rev-size: calculate sizes of repository)
2006-08-20 18:41 ` Josef Weidendorfer
@ 2006-08-20 18:47 ` Johannes Schindelin
0 siblings, 0 replies; 13+ messages in thread
From: Johannes Schindelin @ 2006-08-20 18:47 UTC
To: Josef Weidendorfer; +Cc: git

Hi,

On Sun, 20 Aug 2006, Josef Weidendorfer wrote:

> On Sunday 20 August 2006 18:51, Johannes Schindelin wrote:
>
> > > > +void hash_put(struct hash_map *hash, struct object *obj)
> > > > +{
> > > > +	if (++hash->nr > hash->alloc / 2)
> > > > +		grow_hash(hash);
> > >
> > > If you insert the same object multiple times, hash->nr will get too big.
> >
> > First, you cannot put the same object multiple times. That is not what a
> > hash does (at least in this case): it stores unique objects (identified by
> > their sha1 in this case).
>
> I put it the wrong way; I should have said "if you call hash_put() multiple
> times with the same object". You get the same index, and nothing should
> change. However, you still increment hash->nr, but this error is not really
> important as you correct it in grow_hash().

Talk about unintended side effects ;-)

Ciao,
Dscho
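
The counter behaviour discussed in this sub-thread can be reproduced with a rough Ruby transliteration of hash_put() and grow_hash() from the patch. This is a sketch for illustration only: the class name PatchLikeMap is hypothetical, keys are assumed to be 20-byte binary strings, and the map stores the key itself instead of a struct object.

```ruby
class PatchLikeMap
  attr_reader :nr, :alloc

  def initialize
    @nr = 0
    @alloc = 0
    @map = []
  end

  # Mirrors hash_put(): the counter is bumped before checking for the key.
  def put(key)
    @nr += 1
    grow if @nr > @alloc / 2
    @map[index(key)] = key
  end

  private

  # Mirrors hash_index(): first 4 bytes as a number, then linear probing.
  def index(key)
    i = key.unpack("L").first
    loop do
      i %= @alloc if i >= @alloc
      return i if @map[i].nil? || @map[i] == key
      i += 1
    end
  end

  # Mirrors grow_hash(): reset the counter and re-insert every stored key.
  def grow
    old = @map
    @alloc = @alloc < 32 ? 32 : 2 * @alloc
    @map = Array.new(@alloc)
    @nr = 0
    old.each { |k| put(k) unless k.nil? }
  end
end

map = PatchLikeMap.new
key = ["ef75951ecabd53b5ed816eb596992f8d222d0fe3"].pack("H*")
17.times { map.put(key) }
puts "after 17 puts: nr=#{map.nr} alloc=#{map.alloc}"
map.put(key)
puts "after 18 puts: nr=#{map.nr} alloc=#{map.alloc}"
```

With a single key, nr climbs to 16 after 17 calls although only one entry is stored. The 18th call triggers a grow to 64 slots, and the re-insertion brings nr back to 1, which is the correction Josef mentions; the cost is a table that grew without needing to.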

* Re: [ANNOUNCE] git-rev-size: calculate sizes of repository
2006-08-20 16:09 ` Johannes Schindelin
2006-08-20 16:37 ` Object hash (was: Re: [ANNOUNCE] git-rev-size: calculate sizes of repository) Josef Weidendorfer
@ 2006-08-20 17:24 ` Rutger Nijlunsing
2006-08-20 18:44 ` Johannes Schindelin
2006-08-20 21:38 ` Junio C Hamano
2 siblings, 1 reply; 13+ messages in thread
From: Rutger Nijlunsing @ 2006-08-20 17:24 UTC
To: Johannes Schindelin; +Cc: Rutger Nijlunsing, git

On Sun, Aug 20, 2006 at 06:09:34PM +0200, Johannes Schindelin wrote:
> Hi,
>
> On Sun, 20 Aug 2006, Rutger Nijlunsing wrote:
>
> > On Sun, Aug 20, 2006 at 03:20:19PM +0200, Johannes Schindelin wrote:
> > > Hi,
> > >
> > > On Sun, 20 Aug 2006, Rutger Nijlunsing wrote:
> > >
> > > > You can also find it on http://www.wingding.demon.nl/git-rev-size.rb
> > >
> > > Ruby is _so_ mainstream. Could I have a Haskell version, pretty please?
> >
> > I _knew_ it... Please go bug someone else. The only thing I did was
> > help someone, and for that I choose my own tools since I do it for
> > fun.
>
> Fair enough.
>
> -- 8< --
> [PATCH] Add git-rev-size
>
> This tool spits out the number of trees, the number of blobs, and the total
> bytes of the blobs for a given rev range.
>
> Most notably, it adds an object hash map structure to the library.
>
> Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>

[Hm, the itch seems to be contagious. Better watch out...]

Small comments:

The 'git-rev-size' name was chosen because originally it understood
the same arguments as git-rev-list. You might want to add this popen()
back, or have some other way to share those (might be simple in C). Or
is setup_revisions() enough to have the power of git-rev-list?

If separate commits have to be given on the command line instead of a
range, the command line limit is hit quite quickly (~780 commits). And
if you'll be using xargs, the hash / cache will be less of an advantage.

The original request was 'for each commit' to get an idea of the size
growth during a project.

'builtin_rev_size_usage' is not referred to in the patch, only defined.

Signed-off-by: Rutger Nijlunsing <git@tux.tmfweb.nl>

> ---
>  Makefile           |    4 ++
>  builtin-rev-size.c |   92 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  builtin.h          |    1 +
>  git.c              |    1 +
>  hash.c             |   50 ++++++++++++++++++++++++++++
>  hash.h             |   12 +++++++
>  6 files changed, 159 insertions(+), 1 deletions(-)

[snip patch]

--
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------

* Re: [ANNOUNCE] git-rev-size: calculate sizes of repository
2006-08-20 17:24 ` [ANNOUNCE] git-rev-size: calculate sizes of repository Rutger Nijlunsing
@ 2006-08-20 18:44 ` Johannes Schindelin
0 siblings, 0 replies; 13+ messages in thread
From: Johannes Schindelin @ 2006-08-20 18:44 UTC
To: Rutger Nijlunsing; +Cc: git

Hi,

On Sun, 20 Aug 2006, Rutger Nijlunsing wrote:

> On Sun, Aug 20, 2006 at 06:09:34PM +0200, Johannes Schindelin wrote:
> > Hi,
> >
> > On Sun, 20 Aug 2006, Rutger Nijlunsing wrote:
> >
> > > On Sun, Aug 20, 2006 at 03:20:19PM +0200, Johannes Schindelin wrote:
> > > > Hi,
> > > >
> > > > On Sun, 20 Aug 2006, Rutger Nijlunsing wrote:
> > > >
> > > > > You can also find it on http://www.wingding.demon.nl/git-rev-size.rb
> > > >
> > > > Ruby is _so_ mainstream. Could I have a Haskell version, pretty please?
> > >
> > > I _knew_ it... Please go bug someone else. The only thing I did was
> > > help someone, and for that I choose my own tools since I do it for
> > > fun.
> >
> > Fair enough.
> >
> > -- 8< --
> > [PATCH] Add git-rev-size
> >
> > This tool spits out the number of trees, the number of blobs, and the total
> > bytes of the blobs for a given rev range.
> >
> > Most notably, it adds an object hash map structure to the library.
> >
> > Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
>
>
> [Hm, the itch seems to be contagious. Better watch out...]
>
> Small comments:
>
> The 'git-rev-size' name was chosen because originally it understood
> the same arguments as git-rev-list. You might want to add this popen()
> back, or have some other way to share those (might be simple in C). Or
> is setup_revisions() enough to have the power of git-rev-list?

It is enough. That is the beauty of setup_revisions().

> If separate commits have to be given on the command line instead of a
> range, the command line limit is hit quite quickly (~780 commits). And
> if you'll be using xargs, the hash / cache will be less of an advantage.

Certainly. But I doubt that you'll use this command all that often.
However, it was a nice example of how easy it is to write a git builtin ;-)

> The original request was 'for each commit' to get an idea of the size
> growth during a project.

Since the arguments are the same as for git-rev-list, this is easy enough.

> 'builtin_rev_size_usage' is not referred to in the patch, only defined.

True.

-- 8< --
[PATCH] rev-size: actually show usage

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
 builtin-rev-size.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/builtin-rev-size.c b/builtin-rev-size.c
index ad88e48..184f926 100644
--- a/builtin-rev-size.c
+++ b/builtin-rev-size.c
@@ -78,6 +78,9 @@ int cmd_rev_size(int argc, const char **
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
 	argc = setup_revisions(argc, argv, &revs, NULL);
 
+	if (revs.pending.nr == 0)
+		usage(builtin_rev_size_usage);
+
 	prepare_revision_walk(&revs);
 
 	while ((commit = get_revision(&revs))) {
--
1.4.2.ga5e8f-dirty

* Re: [ANNOUNCE] git-rev-size: calculate sizes of repository
2006-08-20 16:09 ` Johannes Schindelin
2006-08-20 16:37 ` Object hash (was: Re: [ANNOUNCE] git-rev-size: calculate sizes of repository) Josef Weidendorfer
2006-08-20 17:24 ` [ANNOUNCE] git-rev-size: calculate sizes of repository Rutger Nijlunsing
@ 2006-08-20 21:38 ` Junio C Hamano
2006-08-20 23:36 ` Johannes Schindelin
2 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2006-08-20 21:38 UTC
To: Johannes Schindelin; +Cc: git, Rutger Nijlunsing

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Sun, 20 Aug 2006, Rutger Nijlunsing wrote:
>
>> I _knew_ it... Please go bug someone else. The only thing I did was
>> help someone, and for that I choose my own tools since I do it for
>> fun.
>
> Fair enough.
>
> -- 8< --
> [PATCH] Add git-rev-size
>
> This tool spits out the number of trees, the number of blobs, and the total
> bytes of the blobs for a given rev range.

I do not speak ruby (well I suspect I could read it if I wanted
to but I didn't try) so this may or may not be something
Johannes inherited from the original, but I think the code
overcounts blobs and trees for a top-level tree that happens to
have the same blob (or tree) twice. I am not sure if that is
intended.

Overcounting would give a closer estimate for how big a tar
archive would be, or how big a populated working tree would be,
so it could be considered a feature. It all depends on what
this tool is useful for, I guess.
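
The difference between counting per occurrence and counting per distinct object can be shown on a toy object store. This is a sketch only: STORE, its ids and sizes are made up for illustration, and neither function is taken from the Ruby script or the C patch, although occurrence_sizes follows the same cached, per-entry summing that both tools use.

```ruby
require 'set'

# Hypothetical object store: a blob maps to its size in bytes,
# a tree maps to the list of ids it contains.
STORE = {
  "blob-a"    => 100,
  "blob-b"    => 40,
  "tree-sub"  => ["blob-a"],
  "tree-root" => ["blob-a", "blob-b", "tree-sub"],
}

Sizes = Struct.new(:trees, :blobs, :bytes)

# Per-occurrence counting with a cache: an object that is reachable
# twice is added twice, which matches the size of a checked-out tree.
def occurrence_sizes(id, cache = {})
  return cache[id] if cache.key?(id)
  entry = STORE.fetch(id)
  if entry.is_a?(Integer)
    total = Sizes.new(0, 1, entry)
  else
    total = Sizes.new(1, 0, 0)
    entry.each do |child|
      s = occurrence_sizes(child, cache)
      total.trees += s.trees
      total.blobs += s.blobs
      total.bytes += s.bytes
    end
  end
  cache[id] = total
end

# Per-object counting: every distinct id contributes exactly once,
# which is closer to what the object database has to store.
def unique_sizes(id, seen = Set.new, total = Sizes.new(0, 0, 0))
  return total unless seen.add?(id)
  entry = STORE.fetch(id)
  if entry.is_a?(Integer)
    total.blobs += 1
    total.bytes += entry
  else
    total.trees += 1
    entry.each { |child| unique_sizes(child, seen, total) }
  end
  total
end

p occurrence_sizes("tree-root").to_a
p unique_sizes("tree-root").to_a
```

Here blob-a appears both at the top level and inside tree-sub, so the first call reports 2 trees, 3 blobs and 240 bytes, while the second reports 2 trees, 2 blobs and 140 bytes.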

* Re: [ANNOUNCE] git-rev-size: calculate sizes of repository
2006-08-20 21:38 ` Junio C Hamano
@ 2006-08-20 23:36 ` Johannes Schindelin
0 siblings, 0 replies; 13+ messages in thread
From: Johannes Schindelin @ 2006-08-20 23:36 UTC
To: Junio C Hamano; +Cc: git, Rutger Nijlunsing

Hi,

On Sun, 20 Aug 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > On Sun, 20 Aug 2006, Rutger Nijlunsing wrote:
> >
> >> I _knew_ it... Please go bug someone else. The only thing I did was
> >> help someone, and for that I choose my own tools since I do it for
> >> fun.
> >
> > Fair enough.
> >
> > -- 8< --
> > [PATCH] Add git-rev-size
> >
> > This tool spits out the number of trees, the number of blobs, and the total
> > bytes of the blobs for a given rev range.
>
> I do not speak ruby (well I suspect I could read it if I wanted
> to but I didn't try) so this may or may not be something
> Johannes inherited from the original,

No, it was no rewrite. But looking at the Ruby code again, it is not
really similar: the builtin uses the hash to cache the sizes even for a
blob. Further, it does not unpack the objects (except for the trees, and
for the revision walk if you limit by pathname).

However, it inherits this:

> but I think the code overcounts blobs and trees for a top-level tree
> that happens to have the same blob (or tree) twice. I am not sure if
> that is intended.
>
> Overcounting would give a closer estimate for how big a tar
> archive would be, or how big a populated working tree would be,
> so it could be considered a feature. It all depends on what
> this tool is useful for, I guess.

I dunno. No idea what the original requester wanted to do with it. For me,
it was a nice distraction from my work. And a nice occasion to finally
copy^H^H^H^Himplement the independent hash map code I always wanted to
refactor from object.c. And a nice demonstration how easy it actually is
these days to implement a builtin.

Ciao,
Dscho