* [PATCH 1/5] revision caching documentation: man page and technical discussion
@ 2009-08-06 9:55 Nick Edelen
2009-08-07 3:08 ` Sam Vilain
0 siblings, 1 reply; 5+ messages in thread
From: Nick Edelen @ 2009-08-06 9:55 UTC (permalink / raw)
To: Junio C Hamano, Johannes Schindelin, Jeff King, Sam Vilain,
Shawn O. Pearce
Before any code is introduced the full documentation is put forth. This
provides a man page for the porcelain, and a technical doc in technical/. The
latter describes the API, and discusses rev-cache's design, file format and
mechanics.
Signed-off-by: Nick Edelen <sirnot@gmail.com>
---
Documentation/rev-cache.txt | 51 +++++
Documentation/technical/rev-cache.txt | 336 +++++++++++++++++++++++++++++++++
2 files changed, 387 insertions(+), 0 deletions(-)
diff --git a/Documentation/rev-cache.txt b/Documentation/rev-cache.txt
new file mode 100755
index 0000000..64bd051
--- /dev/null
+++ b/Documentation/rev-cache.txt
@@ -0,0 +1,51 @@
+rev-cache porcelain
+===================
+
+A front end for the rev-cache API is provided with the builtin utility
+`rev-cache`. It is mainly intended for cache slice generation and maintenance,
+but can also walk commits within a slice. At the moment it is not particularly
+advanced, but is sufficient for repository administration.
+
+It's general syntax is:
+
+`git-rev-cache COMMAND [options] [<commit-id>...]`
+
+With the commands:
+
+`add`::
+ Add revisions to the cache. Reads commit ids from stdin, formatted as:
+ `END END ... \--not START START ...`
++
+Options:
+
+`\--all`:: Include all heads as ends.
+`\--fresh`:: Exclude everything already in a cache slice.
+`\--stdin`:: Also read commit ids from stdin (seperated by newline, `\--not`
+also valid).
+`\--legs`:: Ensure branch has no "dangling" starts (ie. is self-contained).
+`\--noobjects`:: Don't include non-commit objects.
+
+`walk`::
+ Walk a cache slice based on set of commits; formatted as add.
++
+Options:
+
+`\--objects`:: Include non-commit objects in traversal.
+
+`fuse`::
+ Coagulate several cache slices into a single large slice.
++
+Options:
+
+`\--all`:: Include all objects in repository. Will only traverse as of cache
+ends if this is not specified.
+`\--noobjects`:: Don't include non-commit objects.
+`\--ignore-size[=N]`:: Do not fuse slices of file size >= `N`. If `N` is not
+given the cutoff size defaults to ~5MB.
+
+`index`::
+ Regenerate the cache index.
+
+
+For an explanation of the API and its inner workings, see
+link:technical/rev-cache.txt[technical info on rev-cache]
diff --git a/Documentation/technical/rev-cache.txt b/Documentation/technical/rev-cache.txt
new file mode 100755
index 0000000..e95ec89
--- /dev/null
+++ b/Documentation/technical/rev-cache.txt
@@ -0,0 +1,336 @@
+rev-cache
+=========
+
+The revision cache API ('rev-cache') provides a method for efficiently storing
+and accessing commit branch sections. Such branch slices are defined by a
+series of top (interesting) and bottom (uninteresting) commits. Each slice
+contains, per commit:
+
+* All intra-slice topological relations, encoded into path "channels" (see
+ 'Mechanics' for full explanation).
+* Object meta-data: type, SHA-1, size, date (for commits).
+* Objects introduced in that commit, relative to slice (ie. only for non-start
+ commits).
+
+Storage data structures are not exported, in part to keep git's global scope
+clean, but largely because they're pretty much useless outside of rev-cache.
+
+The API
+-------
+
+The API for 'rev-cache' is quite simple. You can find the function prototypes
+in `revision.h`.
+
+Data Structures
+~~~~~~~~~~~~~~~
+
+The `rev_cache_info` struct holds all the options and flags for the API.
+
+----
+struct rev_cache_info {
+ /* generation flags */
+ unsigned objects : 1,
+ legs : 1,
+ make_index : 1;
+
+ /* traversal flags */
+ unsigned save_unique : 1,
+ add_to_pending : 1;
+
+ /* fuse options */
+ unsigned int ignore_size;
+};
+----
+
+The fields:
+
+`objects`:: Add non-commit objects to slice.
+`legs`:: Ensure bottoms have no childrens.
+`make_index`:: Integrate newly-made slice into index.
+`save_unique`:: Load unique non-commit objects into `unique` field of each
+`commit` object.
+`add_to_pending`:: Append unique non-commit objects to the `pending` object
+list in the passed `rev_info` instance.
+`ignore_size`:: If non-zero, ignore slices with size greater or equal to this.
+
+Functions
+~~~~~~~~~
+
+`init_rci`::
+
+ Initiate a `rev_cache_info` struct to default options.
+
+`make_cache_slice`::
+
+ Create a cache based on an a `rev_info` instance or `commit_list` s of
+ "tops" and "bottoms" (defaulting to latter if `rev_info` pointer is
+ NULL), copying the cache SHA-1 into a passed pointer if non-zero. A
+ `rev_cache_info` struct pointer can be passed to set options, while
+ passing NULL will set default options. A last parameter can
+ optionally recieve the final cache hash.
+
+`make_cache_index`::
+
+ Add a slice to the cache index. Requires a file descriptor, the cache
+ hash and the file size. Note that this is normally called by
+ `make_cache_slice` under the `make_index` option.
+
+`get_cache_slice`::
+
+ Given a commit SHA-1 `get_cache_slice` will search the slice index and
+ return, if found, the cache-identifying SHA-1.
+
+`traverse_cache_slice`::
+
+ Traverse a specified cache slice based on:
+
+ * `rev_cache_info` instance (optional)
+ * cache SHA-1 identifier
+ * `rev_info` instance
+ * a starting commit and commit work list
+ * date of oldest encountered interesting commit
+ * current `slop` (this and above mainly used in integration with
+ revision walker)
+
++
+The output is sent to a FILO `commit_list` "queue", while any bottom commits
+are passed back into the work list. If the walk is not contained within the
+slice, commit boundaries are also inserted into "work".
+
+`tops_from_slices`::
+
+ Will mark all top-commits in the specified cache slices with a given
+ flag, and add them to the rev pending list. Will include all if no
+ slices are specified.
+
+`coagulate_cache_slices`::
+
+ Generate a slice based on the passed `rev_info` instance, replacing all
+ encountered slices with one (larger) slice. The `ignore_size` field in
+ `rci`, if non-zero, will dictate which cache slice sizes to ignore in
+ both traversal and replacement.
+
+`regenerate_index`::
+
+ Remake cache index.
+
+Example Usage
+-------------
+
+A few examples to demonstrate usage:
+
+.Creating a slice
+----
+/* pretend you're a porcelain for rev-cache reading from the command line */
+struct rev_info revs;
+struct rev_cache_info rci;
+
+init_revisions(&revs, 0);
+init_rci(&rci);
+
+flags = 0;
+for (i = 1; i < argc; i++) {
+ if (!strcmp(argv[i], "--not"))
+ flags ^= UNINTERESTING;
+ else if(!strcmp(argv[i], "--fresh"))
+ starts_from_slices(&revs, UNINTERESTING, 0, 0);
+ else
+ handle_revision_arg(argv[i], &revs, flags, 1);
+}
+
+/* we want to explicitly set certain options */
+rci.objects = 0;
+
+if (!make_cache_slice(&rci, &revs, 0, 0, cache_sha1))
+ printf("made slice! it's called %s\n", sha1_to_hex(cache_sha1));
+----
+
+.Traversing a slice
+----
+/* let's say you're walking the tree with a 'work' list of current heads and a
+ * FILO output list 'out' */
+out = 0;
+outp = &out;
+
+while (work) {
+ struct commit *commit = pop_commit(&work);
+ struct object *object = &commit->object;
+ unsigned char *cache_sha1;
+
+ if (cache_sha1 = get_cache_slice(object->sha1)) {
+ /* note that this will instatiate any topo-relations
+ * as it goes */
+ if (traverse_cache_slice(&revs, cache_sha1,
+ commit, 0, 0, /* use defaults */
+ &outp, &work) < 0)
+ die("I'm overreacting to a non-fatal cache error");
+ } else {
+ struct commit_list *parents = commit->parents;
+
+ while (parents) {
+ struct commit *p = parents->item;
+ struct object *po = &p->object;
+
+ parents = parents->next;
+ if (po->flags & UNINTERESTING)
+ continue;
+
+ if (object->flags & UNINTERESTING)
+ po->flags |= UNINTERESTING;
+ else if (po->flags & SEEN)
+ continue;
+
+ if (!po->parsed)
+ parse_commit(p);
+ insert_by_date(p, &work);
+ }
+
+ if (object->flags & (SEEN | UNINTERESTING) == 0)
+ outp = &commit_list_insert(commit, outp);
+ object->flags |= SEEN;
+ }
+}
+----
+
+Some Internals
+--------------
+
+Although you really don't need to know anything about how rev-cache actually
+does its magic shizz, a bit of background may go a long way if you're wading
+through the source.
+
+File Formats
+~~~~~~~~~~~~
+
+A slice has a basic fixed-size header, followed by a certain number of object
+entries. Commits are sorted in topo-order, and each commit entry is followed
+by the objects added in that commit.
+
+----
+ -- +--------------------------------+
+header | object number, etc... |
+ -- +--------------------------------+
+commit | commit info |
+entry | path data |
+ +--------------------------------+
+ | tree/blob info |
+ +--------------------------------+
+ | tree/blob info |
+ +--------------------------------+
+ | ... |
+ -- +--------------------------------+
+commit | commit info |
+entry | path data |
+ +--------------------------------+
+ | tree/blob info |
+ +--------------------------------+
+ | ... |
+ -- +--------------------------------+
+... ...
+ +--------------------------------+
+----
+
+The index is somewhat similar to pack-file indexes, containing a fanout table
+and a list of index entries sorted by hash.
+
+----
+ -- +--------------------------------+
+header | object #, cache #, etc. |
+ -- +--------------------------------+
+cache | SHA-1 |
+sha1s | ... |
+ -- +--------------------------------+
+fanout | fanout[0x00] |
+table ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ | fanout[0xff] |
+ -- +--------------------------------+
+index | entry SHA-1 |
+entries | cache sha1 index |
+ +--------------------------------+
+ | |
+ ...
+ +--------------------------------+
+----
+
+All the relavant structures are readily accessible in `rev-cache.c`
+
+Mechanics
+~~~~~~~~~
+
+The most important part of rev-cache is its method of encoding topological
+relations. To ensure fluid traversal and reconstruction, commits are related
+through high-level "streams"/"channels" rather than individual
+interconnections. Intuitively, rev-cache stores history the way gitk shows it:
+commits strung up on lines, which interconnect at merges and branches.
+
+Each commit is associated to a given channel/path via a 'path id', and
+variable-length fields govern which paths (if any) are closed or opened at that
+object. This means that topo-data can be preserved in only a few bytes extra
+per object entry. Other information stored per entry is the sha-1 hash, type,
+date, size, and status in cache slice. Here is format of an object entry, both
+on-disk and in-memory:
+
+----
+struct object_entry {
+ unsigned type : 3;
+ unsigned is_end : 1;
+ unsigned is_start : 1;
+ unsigned uninteresting : 1;
+ unsigned include : 1;
+ unsigned flags : 1;
+ unsigned char sha1[20];
+
+ unsigned merge_nr : 6;
+ unsigned split_nr : 7;
+ unsigned size_size : 3;
+
+ uint32_t date;
+ uint16_t path;
+
+ /* merge paths */
+ /* split paths */
+ /* size */
+};
+----
+
+An explanation of each field:
+
+`type`:: Object type
+`is_end`:: The commit has some parents outside the cache slice (all if slice
+has legs)
+`is_start`:: The commit has no children in cache slice
+`uninteresting`:: Run-time flag, used in traversal
+`include`:: Run-time flag, used in traversal (initialization)
+`flags`:: Currently unused, extra bit
+`sha`:: Object SHA-1 hash
+
+`merge_nr`:: The number of paths the current channel diverges into; the current
+path ends upon any merge.
+`split_nr`:: The number of paths this commit ends; used on both merging and
+branching.
+`size_size`:: Number of bytes the object size takes up.
+
+merge paths:: The path IDs (16-bit) that are to be created.
+split paths:: The path IDs (16-bit) that are to be ended.
+size:: The size split into the minimum number of bytes.
+
+Each path ID refers to an index in a 'path array', which stores the current
+status (eg. active, interestingness) of each channel.
+
+Due to topo-relations and boundary tracking, all of a commit's parents must be
+encountered before the path is reallocated. This is achieved by using a
+counter system per merge: starting at the parent number, the counter is
+decremented as each parent is encountered (dictated by 'split paths'); at 0 the
+path is cleared.
+
+Boundary tracking is necessary because non-commits are stored relative to the
+commit in which they were introduced. If a series of commits is not included
+in the output, the last interesting commit must be parsed manually to ensure
+all objects are accounted for.
+
+To prevent list-objects from recursing into trees that we've already taken care
+of, the flag `FACE_VALUE` is introduced. An object with this flag is not
+explored (= "taken at face value"), significantly reducing I/O and processing
+time.
+
+(NSE)
--
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/5] revision caching documentation: man page and technical discussion
2009-08-06 9:55 [PATCH 1/5] revision caching documentation: man page and technical discussion Nick Edelen
@ 2009-08-07 3:08 ` Sam Vilain
2009-08-07 12:07 ` Rogan Dawes
0 siblings, 1 reply; 5+ messages in thread
From: Sam Vilain @ 2009-08-07 3:08 UTC (permalink / raw)
To: Nick Edelen
Cc: Junio C Hamano, Johannes Schindelin, Jeff King, Shawn O. Pearce,
Andreas Ericsson, Christian Couder, git@vger.kernel.org
Nick Edelen wrote:
> Before any code is introduced the full documentation is put forth. This
> provides a man page for the porcelain, and a technical doc in technical/. The
> latter describes the API, and discusses rev-cache's design, file format and
> mechanics.
>
> Signed-off-by: Nick Edelen <sirnot@gmail.com>
>
> ---
> Documentation/rev-cache.txt | 51 +++++
> Documentation/technical/rev-cache.txt | 336 +++++++++++++++++++++++++++++++++
> 2 files changed, 387 insertions(+), 0 deletions(-)
>
A couple of minor nits here:
1. Documentation/rev-cache.txt should be
Documentation/git-rev-cache.txt, because it documents that command.
2. there are many whitespace errors:
wilber:~/src/git$ git am \[PATCH\ 1_5\]\ revision\ caching\
documentation\:\ man\ page\ and\ technical\ discussion.eml
Applying: revision caching documentation: man page and technical discussion
/home/samv/src/git/.git/rebase-apply/patch:15: trailing whitespace.
A front end for the rev-cache API is provided with the builtin utility
/home/samv/src/git/.git/rebase-apply/patch:16: trailing whitespace.
`rev-cache`. It is mainly intended for cache slice generation and
maintenance,
/home/samv/src/git/.git/rebase-apply/patch:17: trailing whitespace.
but can also walk commits within a slice. At the moment it is not
particularly
/home/samv/src/git/.git/rebase-apply/patch:34: trailing whitespace.
`\--stdin`:: Also read commit ids from stdin (seperated by newline,
`\--not`
/home/samv/src/git/.git/rebase-apply/patch:51: trailing whitespace.
`\--all`:: Include all objects in repository. Will only traverse as of
cache
warning: squelched 166 whitespace errors
warning: 171 lines add whitespace errors.
Be sure to check the patch doesn't do that...
> diff --git a/Documentation/rev-cache.txt b/Documentation/rev-cache.txt
> new file mode 100755
> index 0000000..64bd051
> --- /dev/null
> +++ b/Documentation/rev-cache.txt
> @@ -0,0 +1,51 @@
> +rev-cache porcelain
> +===================
> +
> +A front end for the rev-cache API is provided with the builtin utility
> +`rev-cache`. It is mainly intended for cache slice generation and maintenance,
> +but can also walk commits within a slice. At the moment it is not particularly
> +advanced, but is sufficient for repository administration.
>
That last sentence is a bit unnecessarily self-doubting; "It currently
provides basic functionality" would be fine.
> +
> +It's general syntax is:
> +
> +`git-rev-cache COMMAND [options] [<commit-id>...]`
> +
> +With the commands:
>
You should use the same structure and headings of this file as the other
man pages which are found under Documentation/git-*.txt
> +
> +`add`::
> + Add revisions to the cache. Reads commit ids from stdin, formatted as:
> + `END END ... \--not START START ...`
>
Really, it's reading a revision list; might be better to call it that.
Can it accept the same options as git-rev-list here or just a list of
starting points/tips? And "END" and "START" are still backwards to the
rest of git core!
> ++
> +Options:
> +
> +`\--all`:: Include all heads as ends.
> +`\--fresh`:: Exclude everything already in a cache slice.
>
If you use the paragraph layout used by other commands you can explain a
little bit more about what you mean here. By "Cache Slice" do you mean
"Revision Cache"? Does that --fresh imply that all of the revisions
which were not indexed will automatically get indexed?
> +`\--stdin`:: Also read commit ids from stdin (seperated by newline, `\--not`
> +also valid).
> +`\--legs`:: Ensure branch has no "dangling" starts (ie. is self-contained).
>
The --legs term is a little too 'cute', is there a better way to
describe this? And "starts" is not a real word in that context; if you
are using a technical term, you should define it..
> +`\--noobjects`:: Don't include non-commit objects.
> +
> +`walk`::
> + Walk a cache slice based on set of commits; formatted as add.
>
So, this works like 'rev-list' ?
"Formatted" is wrong there I think; do you mean it 'accepts the same
arguments as add' ?
> ++
> +Options:
> +
> +`\--objects`:: Include non-commit objects in traversal.
> +
>
Which was the default? --objects or --no-objects?
> +`fuse`::
> + Coagulate several cache slices into a single large slice.
>
Coagulate? You mean, the revision caches will stop being liquid and go
gluggy, like a pool of blood clotting?
How about "combine" :-) - and the option might be better called
something simple like that, too.
> ++
> +Options:
> +
> +`\--all`:: Include all objects in repository. Will only traverse as of cache
> +ends if this is not specified.
>
That sentence doesn't quite read well to me.
> +`\--noobjects`:: Don't include non-commit objects.
> +`\--ignore-size[=N]`:: Do not fuse slices of file size >= `N`. If `N` is not
> +given the cutoff size defaults to ~5MB.
> +
> +`index`::
> + Regenerate the cache index.
> +
> +
> +For an explanation of the API and its inner workings, see
> +link:technical/rev-cache.txt[technical info on rev-cache]
> diff --git a/Documentation/technical/rev-cache.txt b/Documentation/technical/rev-cache.txt
> new file mode 100755
> index 0000000..e95ec89
> --- /dev/null
> +++ b/Documentation/technical/rev-cache.txt
> @@ -0,0 +1,336 @@
> +rev-cache
> +=========
> +
> +The revision cache API ('rev-cache') provides a method for efficiently storing
> +and accessing commit branch sections. Such branch slices are defined by a
> +series of top (interesting) and bottom (uninteresting) commits. Each slice
> +contains, per commit:
> +
> +* All intra-slice topological relations, encoded into path "channels" (see
> + 'Mechanics' for full explanation).
> +* Object meta-data: type, SHA-1, size, date (for commits).
> +* Objects introduced in that commit, relative to slice (ie. only for non-start
> + commits).
> +
> +Storage data structures are not exported, in part to keep git's global scope
> +clean, but largely because they're pretty much useless outside of rev-cache.
> +
> +The API
> +-------
> +
> +The API for 'rev-cache' is quite simple. You can find the function prototypes
> +in `revision.h`.
> +
> +Data Structures
> +~~~~~~~~~~~~~~~
> +
> +The `rev_cache_info` struct holds all the options and flags for the API.
> +
> +----
> +struct rev_cache_info {
> + /* generation flags */
> + unsigned objects : 1,
> + legs : 1,
> + make_index : 1;
> +
> + /* traversal flags */
> + unsigned save_unique : 1,
> + add_to_pending : 1;
> +
> + /* fuse options */
> + unsigned int ignore_size;
> +};
> +----
> +
> +The fields:
> +
> +`objects`:: Add non-commit objects to slice.
> +`legs`:: Ensure bottoms have no childrens.
>
"children" is already plural and needs no 's'
> +`make_index`:: Integrate newly-made slice into index.
> +`save_unique`:: Load unique non-commit objects into `unique` field of each
> +`commit` object.
>
Unique how? Do you mean, that they were introduced in this slice and
not reachable from any of the bottom/end commits?
> +`add_to_pending`:: Append unique non-commit objects to the `pending` object
> +list in the passed `rev_info` instance.
>
That doesn't make much sense to me either. What does that mean? Well,
I'll read on.
> +`ignore_size`:: If non-zero, ignore slices with size greater or equal to this.
>
What will this ignoring mean?
> +Functions
> +~~~~~~~~~
> +
> +`init_rci`::
> +
> + Initiate a `rev_cache_info` struct to default options.
> +
> +`make_cache_slice`::
> +
> + Create a cache based on an a `rev_info` instance or `commit_list` s of
> + "tops" and "bottoms" (defaulting to latter if `rev_info` pointer is
> + NULL), copying the cache SHA-1 into a passed pointer if non-zero. A
> + `rev_cache_info` struct pointer can be passed to set options, while
> + passing NULL will set default options. A last parameter can
> + optionally recieve the final cache hash.
> +
> +`make_cache_index`::
> +
> + Add a slice to the cache index. Requires a file descriptor, the cache
> + hash and the file size. Note that this is normally called by
> + `make_cache_slice` under the `make_index` option.
> +
> +`get_cache_slice`::
> +
> + Given a commit SHA-1 `get_cache_slice` will search the slice index and
> + return, if found, the cache-identifying SHA-1.
> +
> +`traverse_cache_slice`::
> +
> + Traverse a specified cache slice based on:
> +
> + * `rev_cache_info` instance (optional)
> + * cache SHA-1 identifier
> + * `rev_info` instance
> + * a starting commit and commit work list
> + * date of oldest encountered interesting commit
> + * current `slop` (this and above mainly used in integration with
> + revision walker)
>
Hmm what's a 'slop' ?
> +
> ++
> +The output is sent to a FILO `commit_list` "queue", while any bottom commits
> +are passed back into the work list. If the walk is not contained within the
> +slice, commit boundaries are also inserted into "work".
> +
> +`tops_from_slices`::
> +
> + Will mark all top-commits in the specified cache slices with a given
> + flag, and add them to the rev pending list. Will include all if no
> + slices are specified.
> +
> +`coagulate_cache_slices`::
> +
> + Generate a slice based on the passed `rev_info` instance, replacing all
> + encountered slices with one (larger) slice. The `ignore_size` field in
> + `rci`, if non-zero, will dictate which cache slice sizes to ignore in
> + both traversal and replacement.
> +
> +`regenerate_index`::
> +
> + Remake cache index.
> +
> +Example Usage
> +-------------
> +
> +A few examples to demonstrate usage:
> +
> +.Creating a slice
> +----
> +/* pretend you're a porcelain for rev-cache reading from the command line */
> +struct rev_info revs;
> +struct rev_cache_info rci;
> +
> +init_revisions(&revs, 0);
> +init_rci(&rci);
> +
> +flags = 0;
> +for (i = 1; i < argc; i++) {
> + if (!strcmp(argv[i], "--not"))
> + flags ^= UNINTERESTING;
> + else if(!strcmp(argv[i], "--fresh"))
> + starts_from_slices(&revs, UNINTERESTING, 0, 0);
> + else
> + handle_revision_arg(argv[i], &revs, flags, 1);
> +}
> +
> +/* we want to explicitly set certain options */
> +rci.objects = 0;
> +
> +if (!make_cache_slice(&rci, &revs, 0, 0, cache_sha1))
> + printf("made slice! it's called %s\n", sha1_to_hex(cache_sha1));
>
Heh, normally it's acceptable to let examples in documentation not be
complete working programs :)
> +----
> +
> +.Traversing a slice
> +----
> +/* let's say you're walking the tree with a 'work' list of current heads and a
> + * FILO output list 'out' */
> +out = 0;
> +outp = &out;
> +
> +while (work) {
> + struct commit *commit = pop_commit(&work);
> + struct object *object = &commit->object;
> + unsigned char *cache_sha1;
> +
> + if (cache_sha1 = get_cache_slice(object->sha1)) {
> + /* note that this will instatiate any topo-relations
> + * as it goes */
> + if (traverse_cache_slice(&revs, cache_sha1,
> + commit, 0, 0, /* use defaults */
> + &outp, &work) < 0)
> + die("I'm overreacting to a non-fatal cache error");
> + } else {
> + struct commit_list *parents = commit->parents;
> +
> + while (parents) {
> + struct commit *p = parents->item;
> + struct object *po = &p->object;
> +
> + parents = parents->next;
> + if (po->flags & UNINTERESTING)
> + continue;
> +
> + if (object->flags & UNINTERESTING)
> + po->flags |= UNINTERESTING;
> + else if (po->flags & SEEN)
> + continue;
> +
> + if (!po->parsed)
> + parse_commit(p);
> + insert_by_date(p, &work);
> + }
> +
> + if (object->flags & (SEEN | UNINTERESTING) == 0)
> + outp = &commit_list_insert(commit, outp);
> + object->flags |= SEEN;
> + }
> +}
> +----
>
Ok, good - perhaps needs a comment or two - not much, it's already long.
> +
> +Some Internals
> +--------------
> +
> +Although you really don't need to know anything about how rev-cache actually
> +does its magic shizz, a bit of background may go a long way if you're wading
> +through the source.
>
"does its work" will do nicely..
> +
> +File Formats
> +~~~~~~~~~~~~
> +
> +A slice has a basic fixed-size header, followed by a certain number of object
> +entries. Commits are sorted in topo-order, and each commit entry is followed
> +by the objects added in that commit.
>
Starting from the top or the bottom? And it might pay to clarify which
objects are included or not; ie objects reachable from "bottom" commits
or not.
> +
> +----
> + -- +--------------------------------+
> +header | object number, etc... |
> + -- +--------------------------------+
> +commit | commit info |
> +entry | path data |
> + +--------------------------------+
> + | tree/blob info |
> + +--------------------------------+
> + | tree/blob info |
> + +--------------------------------+
> + | ... |
> + -- +--------------------------------+
> +commit | commit info |
> +entry | path data |
> + +--------------------------------+
> + | tree/blob info |
> + +--------------------------------+
> + | ... |
> + -- +--------------------------------+
> +... ...
> + +--------------------------------+
> +----
> +
> +The index is somewhat similar to pack-file indexes, containing a fanout table
> +and a list of index entries sorted by hash.
> +
> +----
> + -- +--------------------------------+
> +header | object #, cache #, etc. |
> + -- +--------------------------------+
> +cache | SHA-1 |
> +sha1s | ... |
> + -- +--------------------------------+
> +fanout | fanout[0x00] |
> +table ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + | fanout[0xff] |
> + -- +--------------------------------+
> +index | entry SHA-1 |
> +entries | cache sha1 index |
> + +--------------------------------+
> + | |
> + ...
> + +--------------------------------+
> +----
> +
> +All the relavant structures are readily accessible in `rev-cache.c`
> +
>
Ok right so which is the on-disk container format? The index or the
slice? I'm a little confused..
> +Mechanics
> +~~~~~~~~~
> +
> +The most important part of rev-cache is its method of encoding topological
> +relations. To ensure fluid traversal and reconstruction,
Is it still fluid after coagulation? ;-)
> commits are related
> +through high-level "streams"/"channels" rather than individual
> +interconnections. Intuitively, rev-cache stores history the way gitk shows it:
> +commits strung up on lines, which interconnect at merges and branches.
> +
> +Each commit is associated to a given channel/path via a 'path id', and
> +variable-length fields govern which paths (if any) are closed or opened at that
> +object. This means that topo-data can be preserved in only a few bytes extra
> +per object entry. Other information stored per entry is the sha-1 hash, type,
> +date, size, and status in cache slice. Here is format of an object entry, both
> +on-disk and in-memory:
> +
> +----
> +struct object_entry {
> + unsigned type : 3;
> + unsigned is_end : 1;
> + unsigned is_start : 1;
> + unsigned uninteresting : 1;
> + unsigned include : 1;
> + unsigned flags : 1;
> + unsigned char sha1[20];
> +
> + unsigned merge_nr : 6;
> + unsigned split_nr : 7;
> + unsigned size_size : 3;
> +
> + uint32_t date;
> + uint16_t path;
> +
> + /* merge paths */
> + /* split paths */
> + /* size */
> +};
> +----
> +
> +An explanation of each field:
> +
> +`type`:: Object type
> +`is_end`:: The commit has some parents outside the cache slice (all if slice
> +has legs)
> +`is_start`:: The commit has no children in cache slice
> +`uninteresting`:: Run-time flag, used in traversal
> +`include`:: Run-time flag, used in traversal (initialization)
> +`flags`:: Currently unused, extra bit
> +`sha`:: Object SHA-1 hash
> +
> +`merge_nr`:: The number of paths the current channel diverges into; the current
> +path ends upon any merge.
> +`split_nr`:: The number of paths this commit ends; used on both merging and
> +branching.
> +`size_size`:: Number of bytes the object size takes up.
> +
> +merge paths:: The path IDs (16-bit) that are to be created.
> +split paths:: The path IDs (16-bit) that are to be ended.
> +size:: The size split into the minimum number of bytes.
> +
> +Each path ID refers to an index in a 'path array', which stores the current
> +status (eg. active, interestingness) of each channel.
> +
> +Due to topo-relations and boundary tracking, all of a commit's parents must be
> +encountered before the path is reallocated. This is achieved by using a
> +counter system per merge: starting at the parent number, the counter is
> +decremented as each parent is encountered (dictated by 'split paths'); at 0 the
> +path is cleared.
> +
> +Boundary tracking is necessary because non-commits are stored relative to the
> +commit in which they were introduced. If a series of commits is not included
> +in the output, the last interesting commit must be parsed manually to ensure
> +all objects are accounted for.
> +
> +To prevent list-objects from recursing into trees that we've already taken care
> +of, the flag `FACE_VALUE` is introduced. An object with this flag is not
> +explored (= "taken at face value"), significantly reducing I/O and processing
> +time.
> +
> +(NSE)
>
tl;dr but that's ok it's magic shizz.
Anyway looking nice ... see what I can say about the next patches.
sam
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/5] revision caching documentation: man page and technical discussion
2009-08-07 3:08 ` Sam Vilain
@ 2009-08-07 12:07 ` Rogan Dawes
2009-08-07 12:20 ` Johannes Schindelin
0 siblings, 1 reply; 5+ messages in thread
From: Rogan Dawes @ 2009-08-07 12:07 UTC (permalink / raw)
To: Sam Vilain
Cc: Nick Edelen, Junio C Hamano, Johannes Schindelin, Jeff King,
Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git@vger.kernel.org
Sam Vilain wrote:
>> +`fuse`::
>> + Coagulate several cache slices into a single large slice.
>>
>
> Coagulate? You mean, the revision caches will stop being liquid and go
> gluggy, like a pool of blood clotting?
>
> How about "combine" :-) - and the option might be better called
> something simple like that, too.
I think the word he had in mind was "coalesce".
Rogan
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/5] revision caching documentation: man page and technical discussion
2009-08-07 12:07 ` Rogan Dawes
@ 2009-08-07 12:20 ` Johannes Schindelin
2009-08-07 21:58 ` Sam Vilain
0 siblings, 1 reply; 5+ messages in thread
From: Johannes Schindelin @ 2009-08-07 12:20 UTC (permalink / raw)
To: Rogan Dawes
Cc: Sam Vilain, Nick Edelen, Junio C Hamano, Jeff King,
Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git@vger.kernel.org
Hi,
On Fri, 7 Aug 2009, Rogan Dawes wrote:
> Sam Vilain wrote:
>
> >> +`fuse`::
> >> + Coagulate several cache slices into a single large slice.
> >>
> >
> > Coagulate? You mean, the revision caches will stop being liquid and go
> > gluggy, like a pool of blood clotting?
> >
> > How about "combine" :-) - and the option might be better called
> > something simple like that, too.
>
> I think the word he had in mind was "coalesce".
As Git users typically have a quite good idea what a "merge" is, I'd
prefer that word anyway.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 1/5] revision caching documentation: man page and technical discussion
2009-08-07 12:20 ` Johannes Schindelin
@ 2009-08-07 21:58 ` Sam Vilain
0 siblings, 0 replies; 5+ messages in thread
From: Sam Vilain @ 2009-08-07 21:58 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Rogan Dawes, Nick Edelen, Junio C Hamano, Jeff King,
Shawn O. Pearce, Andreas Ericsson, Christian Couder,
git@vger.kernel.org
On Fri, 2009-08-07 at 14:20 +0200, Johannes Schindelin wrote:
> > I think the word he had in mind was "coalesce".
> As Git users typically have a quite good idea what a "merge" is, I'd
> prefer that word anyway.
I don't like that so much, given that "merge" has quite a specific
meaning. But not really that fussed. In this case it's more like a
'gc' or 'repack'; an internal reshuffle of an index to make it a single
one and not many.
Sam
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2009-08-07 21:56 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-06 9:55 [PATCH 1/5] revision caching documentation: man page and technical discussion Nick Edelen
2009-08-07 3:08 ` Sam Vilain
2009-08-07 12:07 ` Rogan Dawes
2009-08-07 12:20 ` Johannes Schindelin
2009-08-07 21:58 ` Sam Vilain
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).