* [PATCH] Add a birdview-on-the-source-code section to the user manual
@ 2007-05-08 15:10 Johannes Schindelin
2007-05-08 21:01 ` Karl Hasselström
2007-05-09 3:18 ` J. Bruce Fields
0 siblings, 2 replies; 33+ messages in thread
From: Johannes Schindelin @ 2007-05-08 15:10 UTC
To: bfields, junio, git
In http://thread.gmane.org/gmane.comp.version-control.git/42479,
a birdview on the source code was requested.
J. Bruce Fields suggested that my reply should be included in the
user manual, and there was nothing of an outcry, so here it is,
not even 2 months later.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Documentation/user-manual.txt | 196 +++++++++++++++++++++++++++++++++++++++++
1 files changed, 196 insertions(+), 0 deletions(-)
diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index 67f5b9b..3c67b68 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -3161,6 +3161,202 @@ confusing and scary messages, but it won't actually do anything bad. In
contrast, running "git prune" while somebody is actively changing the
repository is a *BAD* idea).
+[[birdview-on-the-source-code]]
+A birdview on Git's source code
+-----------------------------
+
+While Git's source code is quite elegant, it is not always easy for
+new developers to find their way through it. A good idea is to look
+at the contents of the initial commit:
+_e83c5163316f89bfbde7d9ab23ca2e25604af290_ (also known as _v0.99~954_).
+
+Tip: you can see what files are in there with
+
+----------------------------------------------------
+$ git show e83c5163316f89bfbde7d9ab23ca2e25604af290:
+----------------------------------------------------
+
+and look at those files with something like
+
+-----------------------------------------------------------
+$ git show e83c5163316f89bfbde7d9ab23ca2e25604af290:cache.h
+-----------------------------------------------------------
+
+Be sure to read the README in that revision _after_ you are familiar with
+the terminology (<<glossary>>), since the terminology has changed a little
+since then. For example, we call the things "commits" now, which are
+described in that README as "changesets".
+
+Actually a lot of the structure as it is now can be explained by that
+initial commit.
+
+For example, we do not call it "cache" any more, but "index"; however, the
+file is still called `cache.h`. Remark: Not much reason to change it now,
+especially since there is no good single name for it anyway, because it is
+basically _the_ header file which is included by _all_ of Git's C sources.
+
+If you grasp the ideas in that initial commit (it is really small and you
+can get into it really fast, and it will help you recognize things in the
+much larger code base we have now), you should go on skimming `cache.h`,
+`object.h` and `commit.h`.
+
+By now, you know what the index is (and find the corresponding data
+structures in `cache.h`), and that there are just a couple of object types
+(blobs, trees, commits and tags) which inherit their common structure from
+`struct object`, which is their first member (and thus, you can cast e.g.
+`(struct object *)commit` to achieve the _same_ as `&commit->object`, i.e.
+get at the object name and flags).
+
+Now is a good point to take a break to let this information sink in.
+
+Next step: get familiar with the object naming. Read <<naming-commits>>.
+There are quite a few ways to name an object (and not only revisions!).
+All of these are handled in `sha1_name.c`. Just have a quick look at
+the function `get_sha1()`. A lot of the special handling is done by
+functions like `get_sha1_basic()` or the likes.
+
+This is just to get you into the groove for the most libified part of Git:
+the revision walker.
+
+Basically, the initial version of `git log` was a shell script:
+
+----------------------------------------------------------------
+$ git-rev-list --pretty $(git-rev-parse --default HEAD "$@") | \
+ LESS=-S ${PAGER:-less}
+----------------------------------------------------------------
+
+What does this mean?
+
+`git-rev-list` is the original version of the revision walker, which
+_always_ printed a list of revisions to stdout. It is still functional,
+and needs to be, since most new Git programs start out as scripts using
+`git-rev-list`.
+
+`git-rev-parse` is not as important any more; it was only used to filter out
+options that were relevant for the different plumbing commands that were
+called by the script.
+
+Most of what `git-rev-list` did is contained in `revision.c` and
+`revision.h`. It wraps the options in a struct named rev_info, which
+controls how and what revisions are walked, and more.
+
+Nowadays, `git log` is a builtin, which means that it is _contained_ in the
+command `git`. The source side of a builtin is
+
+- a function called `cmd_<bla>`, typically defined in `builtin-<bla>.c`,
+ and declared in `builtin.h`,
+
+- an entry in the `commands[]` array in `git.c`, and
+
+- an entry in `BUILTIN_OBJECTS` in the `Makefile`.
+
+Sometimes, more than one builtin is contained in one source file. For
+example, `cmd_whatchanged()` and `cmd_log()` both reside in `builtin-log.c`,
+since they share quite a bit of code. In that case, the commands which are
+_not_ named like the `.c` file in which they live have to be listed in
+`BUILT_INS` in the `Makefile`.
+
+`git log` looks more complicated in C than it does in the original script,
+but that allows for much greater flexibility and performance.
+
+This is, again, a good point to take a pause.
+
+Lesson three is: study the code. Really, it is the best way to learn about
+the organization of Git (after you know the basic concepts).
+
+So, think about something which you are interested in, say, "how can I
+access a blob just knowing the object name of it?". The first step is to
+find a Git command with which you can do it. In this example, it is either
+`git show` or `git cat-file`.
+
+For the sake of clarity, let's stay with `git cat-file`, because it
+
+- is plumbing, and
+
+- was around even in the initial commit (it literally went only through
+ some 20 revisions as `cat-file.c`, was renamed to `builtin-cat-file.c`
+ when made a builtin, and then saw less than 10 versions).
+
+So, look into `builtin-cat-file.c`, search for `cmd_cat_file()` and look at
+what it does.
+
+------------------------------------------------------------------
+ git_config(git_default_config);
+ if (argc != 3)
+ usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
+ if (get_sha1(argv[2], sha1))
+ die("Not a valid object name %s", argv[2]);
+------------------------------------------------------------------
+
+Let's skip over the obvious details; the only really interesting part
+here is the call to `get_sha1()`. It tries to interpret `argv[2]` as an
+object name, and if it refers to an object which is present in the current
+repository, it writes the resulting SHA-1 into the variable `sha1`.
+
+Two things are interesting here:
+
+- `get_sha1()` returns 0 on _success_. This might surprise some new
+ Git hackers, but there is a long tradition in UNIX to return different
+ negative numbers in case of different errors -- and 0 on success.
+
+- the variable `sha1` in the function signature of `get_sha1()` is `unsigned
+ char *`, but is actually expected to be a pointer to `unsigned
+ char[20]`. This variable will contain the big endian version of the
+ 40-character hex string representation of the SHA-1.
+
+You will see both of these things throughout the code.
+
+Now, for the meat:
+
+-----------------------------------------------------------------------------
+ case 0:
+ buf = read_object_with_reference(sha1, argv[1], &size, NULL);
+-----------------------------------------------------------------------------
+
+This is how you read a blob (actually, not only a blob, but any type of
+object). To know how the function `read_object_with_reference()` actually
+works, find the source code for it (something like `git grep
+read_object_with | grep ":[a-z]"` in the git repository), and read
+the source.
+
+To find out how the result can be used, just read on in `cmd_cat_file()`:
+
+-----------------------------------
+ write_or_die(1, buf, size);
+-----------------------------------
+
+Sometimes, you do not know where to look for a feature. In many such cases,
+it helps to search through the output of `git log`, and then `git show` the
+corresponding commit.
+
+Example: If you know that there was some test case for `git bundle`, but
+do not remember where it was (yes, you _could_ `git grep bundle t/`, but that
+does not illustrate the point!):
+
+------------------------
+$ git log --no-merges t/
+------------------------
+
+In the pager (`less`), just search for "bundle", go a few lines back,
+and see that it is in commit 18449ab0... Now just copy this object name,
+and paste it into the command line
+
+-------------------
+$ git show 18449ab0
+-------------------
+
+Voila.
+
+Another example: Find out what to do in order to make some script a
+builtin:
+
+-------------------------------------------------
+$ git log --no-merges --diff-filter=A builtin-*.c
+-------------------------------------------------
+
+You see, Git is actually the best tool to find out about the source of Git
+itself!
+
[[glossary]]
include::glossary.txt[]
--
1.5.2.rc2.2469.gea95f
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-08 15:10 [PATCH] Add a birdview-on-the-source-code section to the user manual Johannes Schindelin
@ 2007-05-08 21:01 ` Karl Hasselström
2007-05-08 21:07 ` Johannes Schindelin
2007-05-09 3:18 ` J. Bruce Fields
1 sibling, 1 reply; 33+ messages in thread
From: Karl Hasselström @ 2007-05-08 21:01 UTC
To: Johannes Schindelin; +Cc: bfields, junio, git
On 2007-05-08 17:10:47 +0200, Johannes Schindelin wrote:
> + char *`, but is actually expected to be a pointer to `unsigned
> + char[20]`. This variable will contain the big endian version of the
> + 40-character hex string representation of the SHA-1.
Either it should be "unsigned char[40]" (or possibly 41 with a
terminating \0), or else you shouldn't be talking about hexadecimal
since it's just a 20-byte big-endian unsigned integer. (A third
possibility is that I'm totally confused.)
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-08 21:01 ` Karl Hasselström
@ 2007-05-08 21:07 ` Johannes Schindelin
2007-05-08 21:31 ` Karl Hasselström
0 siblings, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2007-05-08 21:07 UTC
To: Karl Hasselström; +Cc: bfields, junio, git
Hi,
On Tue, 8 May 2007, Karl Hasselström wrote:
> On 2007-05-08 17:10:47 +0200, Johannes Schindelin wrote:
>
> > + char *`, but is actually expected to be a pointer to `unsigned
> > + char[20]`. This variable will contain the big endian version of the
> > + 40-character hex string representation of the SHA-1.
>
> Either it should be "unsigned char[40]" (or possibly 41 with a
> terminating \0), or else you shouldn't be talking about hexadecimal
> since it's just a 20-byte big-endian unsigned integer. (A third
> possibility is that I'm totally confused.)
It is 40 hex-character, but 20 _byte_. If you have any ideas how to
formulate that better than I did...
Ciao,
Dscho
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-08 21:07 ` Johannes Schindelin
@ 2007-05-08 21:31 ` Karl Hasselström
2007-05-08 23:10 ` Johannes Schindelin
0 siblings, 1 reply; 33+ messages in thread
From: Karl Hasselström @ 2007-05-08 21:31 UTC
To: Johannes Schindelin; +Cc: bfields, junio, git
On 2007-05-08 23:07:04 +0200, Johannes Schindelin wrote:
> On Tue, 8 May 2007, Karl Hasselström wrote:
>
> > On 2007-05-08 17:10:47 +0200, Johannes Schindelin wrote:
> >
> > > + char *`, but is actually expected to be a pointer to `unsigned
> > > + char[20]`. This variable will contain the big endian version of the
> > > + 40-character hex string representation of the SHA-1.
> >
> > Either it should be "unsigned char[40]" (or possibly 41 with a
> > terminating \0), or else you shouldn't be talking about
> > hexadecimal since it's just a 20-byte big-endian unsigned integer.
> > (A third possibility is that I'm totally confused.)
>
> It is 40 hex-character, but 20 _byte_. If you have any ideas how to
> formulate that better than I did...
I think this is less confusing:
This variable will contain the 160-bit SHA-1.
It avoids talking of hex, since it's not really stored in hex format
any more than any other binary number with a number of bits divisible
by four. And it avoids saying big-endian, which is not relevant anyway
since we don't use hashes as integers.
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-08 21:31 ` Karl Hasselström
@ 2007-05-08 23:10 ` Johannes Schindelin
2007-05-08 23:22 ` Karl Hasselström
2007-05-09 4:54 ` Daniel Barkalow
0 siblings, 2 replies; 33+ messages in thread
From: Johannes Schindelin @ 2007-05-08 23:10 UTC
To: Karl Hasselström; +Cc: bfields, junio, git
Hi,
On Tue, 8 May 2007, Karl Hasselström wrote:
> On 2007-05-08 23:07:04 +0200, Johannes Schindelin wrote:
>
> > On Tue, 8 May 2007, Karl Hasselström wrote:
> >
> > > On 2007-05-08 17:10:47 +0200, Johannes Schindelin wrote:
> > >
> > > > + char *`, but is actually expected to be a pointer to `unsigned
> > > > + char[20]`. This variable will contain the big endian version of the
> > > > + 40-character hex string representation of the SHA-1.
> > >
> > > Either it should be "unsigned char[40]" (or possibly 41 with a
> > > terminating \0), or else you shouldn't be talking about
> > > hexadecimal since it's just a 20-byte big-endian unsigned integer.
> > > (A third possibility is that I'm totally confused.)
> >
> > It is 40 hex-character, but 20 _byte_. If you have any ideas how to
> > formulate that better than I did...
>
> I think this is less confusing:
>
> This variable will contain the 160-bit SHA-1.
>
> It avoids talking of hex, since it's not really stored in hex format
> any more than any other binary number with a number of bits divisible
> by four. And it avoids saying big-endian, which is not relevant anyway
> since we don't use hashes as integers.
Well, I do not buy into that. First, we _have_ to say that it is
big-endian. It was utterly confusing to _me_ that the hash was not little
endian, as I expected on an Intel processor.
And I'd rather mention the hex representation (what you see in git-log and
git-ls-tree). This helps debugging, believe me.
Ciao,
Dscho
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-08 23:10 ` Johannes Schindelin
@ 2007-05-08 23:22 ` Karl Hasselström
2007-05-09 4:54 ` Daniel Barkalow
1 sibling, 0 replies; 33+ messages in thread
From: Karl Hasselström @ 2007-05-08 23:22 UTC
To: Johannes Schindelin; +Cc: bfields, junio, git
On 2007-05-09 01:10:13 +0200, Johannes Schindelin wrote:
> On Tue, 8 May 2007, Karl Hasselström wrote:
>
> > I think this is less confusing:
> >
> > This variable will contain the 160-bit SHA-1.
> >
> > It avoids talking of hex, since it's not really stored in hex
> > format any more than any other binary number with a number of bits
> > divisible by four. And it avoids saying big-endian, which is not
> > relevant anyway since we don't use hashes as integers.
>
> Well, I do not buy into that. First, we _have_ to say that it is
> big-endian. It was utterly confusing to _me_ that the hash was not
> little endian, as I expected on an Intel processor.
If you think of it as an integer and not a byte array, then yes. But
fair enough, if it confused you, it'd probably confuse others as well.
> And I'd rather mention the hex representation (what you see in
> git-log and git-ls-tree). This helps debugging, believe me.
But that byte array doesn't store the hex representation!
There is a trivial transformation that will convert a 20-byte integer
to a 40-character hex string representation of that integer: translate
each nibble to one hex digit. But the code snippet you gave uses the
former representation, and that's the point I thought you were trying
to make in the first place.
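
For illustration, here is a minimal sketch of that nibble-by-nibble
translation. It is not taken from Git's sources; the name `bytes_to_hex`
and the output buffer convention are assumptions made up for this example.

```c
/*
 * Hypothetical helper (not Git's own code): render a 20-byte object
 * name as a 40-character, NUL-terminated lowercase hex string.
 */
void bytes_to_hex(const unsigned char sha1[20], char out[41])
{
	static const char digits[] = "0123456789abcdef";
	int i;

	for (i = 0; i < 20; i++) {
		out[2 * i]     = digits[sha1[i] >> 4];   /* high nibble first */
		out[2 * i + 1] = digits[sha1[i] & 0x0f]; /* then the low nibble */
	}
	out[40] = '\0';
}
```

A caller would pass the 20-byte array that something like `get_sha1()`
filled in, together with a 41-byte buffer. The array itself never holds
hex digits; they only appear once a conversion like this one has run.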
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-08 15:10 [PATCH] Add a birdview-on-the-source-code section to the user manual Johannes Schindelin
2007-05-08 21:01 ` Karl Hasselström
@ 2007-05-09 3:18 ` J. Bruce Fields
2007-05-09 4:06 ` Junio C Hamano
` (3 more replies)
1 sibling, 4 replies; 33+ messages in thread
From: J. Bruce Fields @ 2007-05-09 3:18 UTC
To: Johannes Schindelin; +Cc: junio, git
On Tue, May 08, 2007 at 05:10:47PM +0200, Johannes Schindelin wrote:
>
> In http://thread.gmane.org/gmane.comp.version-control.git/42479,
> a birdview on the source code was requested.
>
> J. Bruce Fields suggested that my reply should be included in the
> user manual, and there was nothing of an outcry, so here it is,
> not even 2 months later.
Looks helpful, concise, and to the point. Neat-o.
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Comments, nothing major:
> +If you grasp the ideas in that initial commit (it is really small and you
> +can get into it really fast, and it will help you recognize things in the
> +much larger code base we have now), you should go on skimming `cache.h`,
> +`object.h` and `commit.h`.
Might want to add "in a recent commit"?--it's not clear that you've
transitioned away from talking about the initial commit.
> +This is just to get you into the groove for the most libified part of Git:
> +the revision walker.
Unless the reader has already been hanging out on the mailing list a
while, "most libified" may not mean much to them yet at this point.
The organization of the next bit is slightly confusing: we're set up to
expect a longer lecture on the revision walker, but instead there's just
the historical note on git-rev-list, a mention of 'revision.c',
'revision.h', and 'struct rev_info', and then it rapidly digresses into
discussing builtins.
Which actually is fine, but just a few small markers of where we are in
the discussion might be reassuring--a section header or two, maybe a
little more emphasis on the pointers you're giving, like: "take a moment
to go read revision.h and revision.c now, paying special attention to
struct rev_info, which ....".
--b.
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 3:18 ` J. Bruce Fields
@ 2007-05-09 4:06 ` Junio C Hamano
2007-05-09 5:05 ` Junio C Hamano
2007-05-09 6:48 ` Karl Hasselström
` (2 subsequent siblings)
3 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2007-05-09 4:06 UTC
To: J. Bruce Fields; +Cc: Johannes Schindelin, git
"J. Bruce Fields" <bfields@fieldses.org> writes:
> The organization of the next bit is slightly confusing: we're set up to
> expect a longer lecture on the revision walker, but instead there's just
> the historical note on git-rev-list, a mention of 'revision.c',
> 'revision.h', and 'struct rev_info', and then it rapidly digresses into
> discussing builtins.
I had the same impression.
I was meaning to write a "code walkthru for git hackers and
wannabes" with target audience quite different from the
user-manual. My idea of which areas to cover in what order
seems to match with what Johannes started.
- sha1_name.c;
- read_sha1_file();
- revision.c::setup_revisions() to talk about parsing but not
about walking yet.
- start from builtin-merge-base.c into commit.c to talk about
revision traversal done by get_merge_bases(). This codepath
is much simpler than the revision.c machinery and is a good
primer to understand the latter.
- builtin-diff-tree.c to show one tree and two tree cases, go
into log-tree.c then tree-diff.c to show the use of
add_remove() and change() callbacks, and then finally talk
about diff_flush(), without talking about diffcore
transformations yet.
- start from builtin-log.c to review the setup_revisions(),
then talk about prepare_revision_walk() and get_revision()
machinery, first pass without talking about path limiting and
then with path limiting.
- fetch-pack.c and upload-pack.c to talk about the native
protocol over ssh and local forking, how revision traversal
machinery is used, the "objects pointed by refs are complete"
contract.
- daemon.c to see how upload-pack is invoked.
- read_cache(), active_cache[], active_nr and friends;
- update-index and write-tree, including how cache-tree
optimizes tree writing after small updates. Advanced students
can also look at git-apply here.
- unpack-trees.c and builtin-read-tree.c to talk about index stages.
- diffcore transformations, especially diffcore-rename.
- merge-recursive
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-08 23:10 ` Johannes Schindelin
2007-05-08 23:22 ` Karl Hasselström
@ 2007-05-09 4:54 ` Daniel Barkalow
2007-05-09 6:31 ` Karl Hasselström
2007-05-09 9:38 ` Johannes Schindelin
1 sibling, 2 replies; 33+ messages in thread
From: Daniel Barkalow @ 2007-05-09 4:54 UTC
To: Johannes Schindelin; +Cc: Karl Hasselström, bfields, junio, git
On Wed, 9 May 2007, Johannes Schindelin wrote:
> Hi,
>
> On Tue, 8 May 2007, Karl Hasselström wrote:
>
> > On 2007-05-08 23:07:04 +0200, Johannes Schindelin wrote:
> >
> > > On Tue, 8 May 2007, Karl Hasselström wrote:
> > >
> > > > On 2007-05-08 17:10:47 +0200, Johannes Schindelin wrote:
> > > >
> > > > > + char *`, but is actually expected to be a pointer to `unsigned
> > > > > + char[20]`. This variable will contain the big endian version of the
> > > > > + 40-character hex string representation of the SHA-1.
> > > >
> > > > Either it should be "unsigned char[40]" (or possibly 41 with a
> > > > terminating \0), or else you shouldn't be talking about
> > > > hexadecimal since it's just a 20-byte big-endian unsigned integer.
> > > > (A third possibility is that I'm totally confused.)
> > >
> > > It is 40 hex-character, but 20 _byte_. If you have any ideas how to
> > > formulate that better than I did...
> >
> > I think this is less confusing:
> >
> > This variable will contain the 160-bit SHA-1.
> >
> > It avoids talking of hex, since it's not really stored in hex format
> > any more than any other binary number with a number of bits divisible
> > by four. And it avoids saying big-endian, which is not relevant anyway
> > since we don't use hashes as integers.
>
> Well, I do not buy into that. First, we _have_ to say that it is
> big-endian. It was utterly confusing to _me_ that the hash was not little
> endian, as I expected on an Intel processor.
SHA-1 is defined as producing an octet sequence, and to have a canonical
hex digit sequence conversion with the high nibbles first. Internally, it
is canonically specified using big-endian math, but the same algorithm
could equally be specified with little-endian math and different rules for
input and output.
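
As a rough sketch of the "rules for output" part: the algorithm keeps five
32-bit words of state, and the digest is those words written out most
significant byte first, so the resulting 20 octets are the same on any
host. The helper names below (`store_be32`, `state_to_digest`) are
hypothetical, and this is not Git's implementation.

```c
#include <stdint.h>

/* Hypothetical helper: write a 32-bit word most significant byte first. */
void store_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/*
 * Hypothetical helper: serialize the five state words into the 20-octet
 * digest. Shifts operate on values, not on memory layout, so the result
 * does not depend on whether the CPU is little or big endian.
 */
void state_to_digest(const uint32_t h[5], unsigned char digest[20])
{
	int i;

	for (i = 0; i < 5; i++)
		store_be32(digest + 4 * i, h[i]);
}
```

This is why the first octet of the array lines up with the first two hex
digits you see in git-log, even on an Intel machine.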
> And I'd rather mention the hex representation (what you see in git-log and
> git-ls-tree). This helps debugging, believe me.
It's kind of important to distinguish between the hex representation and
the octet representation, because your code will not work at all if you
use the wrong one. And "unsigned char *" or "unsigned char[20]" is always
the octets; the hex is always "char *". Primarily mentioning the one that
is more intuitive but less frequently used doesn't help with understanding
the actual code.
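
To make the distinction concrete, here is a small sketch of going from the
hex form to the octet form. It is only an illustration under my own
assumptions: `hexval` and `hex_to_bytes` are made-up names, not Git
functions, and the sketch does not check what follows the 40th character.

```c
/* Hypothetical helper: value of one hex digit, or -1 if it is not one. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: parse 40 hex characters ("char *") into 20 octets
 * ("unsigned char[20]"). Returns 0 on success and -1 on error, following
 * the convention described in the patch.
 */
int hex_to_bytes(const char *hex, unsigned char out[20])
{
	int i;

	for (i = 0; i < 20; i++) {
		int hi = hexval(hex[2 * i]);
		int lo;

		if (hi < 0)
			return -1; /* also stops at a premature NUL */
		lo = hexval(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}
```

Passing the 40-character string where the 20 octets are expected (or the
other way round) still compiles after a cast, which is exactly why the two
need to be kept apart in the text.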
-Daniel
*This .sig left intentionally blank*
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 4:06 ` Junio C Hamano
@ 2007-05-09 5:05 ` Junio C Hamano
2007-05-09 9:33 ` Johannes Schindelin
2007-05-09 17:36 ` J. Bruce Fields
0 siblings, 2 replies; 33+ messages in thread
From: Junio C Hamano @ 2007-05-09 5:05 UTC
To: J. Bruce Fields; +Cc: Johannes Schindelin, git
Junio C Hamano <junkio@cox.net> writes:
> "J. Bruce Fields" <bfields@fieldses.org> writes:
>
>> The organization of the next bit is slightly confusing: we're set up to
>> expect a longer lecture on the revision walker, but instead there's just
>> the historical note on git-rev-list, a mention of 'revision.c',
>> 'revision.h', and 'struct rev_info', and then it rapidly digresses into
>> discussing builtins.
>
> I had the same impression.
>
> I was meaning to write a "code walkthru for git hackers and
> wannabes" with target audience quite different from the
> user-manual. My idea of which areas to cover in what order
> seems to match with what Johannes started.
Having said that, I do not think the patch belongs to the "git
USER'S manual". It is a very good introductory material for a
separate "git hackers manual", though.
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 4:54 ` Daniel Barkalow
@ 2007-05-09 6:31 ` Karl Hasselström
2007-05-09 9:38 ` Johannes Schindelin
1 sibling, 0 replies; 33+ messages in thread
From: Karl Hasselström @ 2007-05-09 6:31 UTC
To: Daniel Barkalow; +Cc: Johannes Schindelin, bfields, junio, git
On 2007-05-09 00:54:03 -0400, Daniel Barkalow wrote:
> And "unsigned char *" or "unsigned char[20]" is always the octets;
> the hex is always "char *".
uint8_t, anyone? :-)
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 3:18 ` J. Bruce Fields
2007-05-09 4:06 ` Junio C Hamano
@ 2007-05-09 6:48 ` Karl Hasselström
2007-05-09 9:27 ` Johannes Schindelin
2007-05-09 12:19 ` Johannes Schindelin
3 siblings, 0 replies; 33+ messages in thread
From: Karl Hasselström @ 2007-05-09 6:48 UTC
To: J. Bruce Fields; +Cc: Johannes Schindelin, junio, git
On 2007-05-08 23:18:04 -0400, J. Bruce Fields wrote:
> Looks helpful, concise, and to the point. Neat-o.
Yeah, I like it too. I forgot to say that, and got right to the
criticism instead. But better late than never, I hope. :-)
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 3:18 ` J. Bruce Fields
2007-05-09 4:06 ` Junio C Hamano
2007-05-09 6:48 ` Karl Hasselström
@ 2007-05-09 9:27 ` Johannes Schindelin
2007-05-09 12:19 ` Johannes Schindelin
3 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin @ 2007-05-09 9:27 UTC
To: J. Bruce Fields; +Cc: junio, git
Hi,
On Tue, 8 May 2007, J. Bruce Fields wrote:
> On Tue, May 08, 2007 at 05:10:47PM +0200, Johannes Schindelin wrote:
> >
> > +If you grasp the ideas in that initial commit (it is really small and you
> > +can get into it really fast, and it will help you recognize things in the
> > +much larger code base we have now), you should go on skimming `cache.h`,
> > +`object.h` and `commit.h`.
>
> Might want to add "in a recent commit"?--it's not clear that you've
> transitioned away from talking about the initial commit.
Yes, good idea.
> > +This is just to get you into the groove for the most libified part of Git:
> > +the revision walker.
>
> Unless the reader has already been hanging out on the mailing list a
> while, "most libified" may not mean much to them yet at this point.
How about a sentence way before that, when I talk about the initial
commit, like this:
In the early days, Git (in the tradition of UNIX) was a bunch of
programs which were extremely simple, and which you used in scripts,
piping the output of one into another. This turned out to be good
for initial development, since it was easier to test new things.
However, recently many of these parts have become builtins, and
some of the core has been "libified", i.e. put into libgit.a for
performance, portability reasons, and to avoid code duplication.
> The organization of the next bit is slightly confusing: we're set up to
> expect a longer lecture on the revision walker, but instead there's just
> the historical note on git-rev-list, a mention of 'revision.c',
> 'revision.h', and 'struct rev_info', and then it rapidly digresses into
> discussing builtins.
>
> Which actually is fine, but just a few small markers of where we are in
> the discussion might be reassuring--a section header or two, maybe a
> little more emphasis on the pointers you're giving, like: "take a moment
> to go read revision.h and revision.c now, paying special attention to
> struct rev_info, which ....".
Okay. I hope I will be able to make these changes by tomorrow (I will
be gone for a few days after that).
Ciao,
Dscho
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 5:05 ` Junio C Hamano
@ 2007-05-09 9:33 ` Johannes Schindelin
2007-05-09 17:36 ` J. Bruce Fields
1 sibling, 0 replies; 33+ messages in thread
From: Johannes Schindelin @ 2007-05-09 9:33 UTC
To: Junio C Hamano; +Cc: J. Bruce Fields, git
Hi,
On Tue, 8 May 2007, Junio C Hamano wrote:
> Junio C Hamano <junkio@cox.net> writes:
>
> > "J. Bruce Fields" <bfields@fieldses.org> writes:
> >
> >> The organization of the next bit is slightly confusing: we're set up
> >> to expect a longer lecture on the revision walker, but instead
> >> there's just the historical note on git-rev-list, a mention of
> >> 'revision.c', 'revision.h', and 'struct rev_info', and then it
> >> rapidly digresses into discussing builtins.
> >
> > I had the same impression.
> >
> > I was meaning to write a "code walkthru for git hackers and wannabes"
> > with target audience quite different from the user-manual. My idea of
> > which areas to cover in what order seems to match with what Johannes
> > started.
>
> Having said that, I do not think the patch belongs to the "git USER'S
> manual". It is a very good introductory material for a separate "git
> hackers manual", though.
That is what I was referring to when I mentioned "no outcry". Bruce said
that he liked the idea to have something like that in the USER's manual.
And I have to agree: There might be enough room to actually go and write a
Git hacker's manual, but IMHO that takes a lot of time, which has to be
found first.
And even if we actually have such a hacker's manual one day, this "sneak
preview" in the user's manual does not hurt, but could actually entice
people to read that manual, too.
Ciao,
Dscho
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 4:54 ` Daniel Barkalow
2007-05-09 6:31 ` Karl Hasselström
@ 2007-05-09 9:38 ` Johannes Schindelin
2007-05-09 10:43 ` Karl Hasselström
1 sibling, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2007-05-09 9:38 UTC
To: Daniel Barkalow; +Cc: Karl Hasselström, bfields, junio, git
Hi,
On Wed, 9 May 2007, Daniel Barkalow wrote:
> On Wed, 9 May 2007, Johannes Schindelin wrote:
>
> > On Tue, 8 May 2007, Karl Hasselström wrote:
> >
> > > On 2007-05-08 23:07:04 +0200, Johannes Schindelin wrote:
> > >
> > > > On Tue, 8 May 2007, Karl Hasselström wrote:
> > > >
> > > > > On 2007-05-08 17:10:47 +0200, Johannes Schindelin wrote:
> > > > >
> > > > > > + char *`, but is actually expected to be a pointer to `unsigned
> > > > > > + char[20]`. This variable will contain the big endian version of the
> > > > > > + 40-character hex string representation of the SHA-1.
> > > > >
> > > > > Either it should be "unsigned char[40]" (or possibly 41 with a
> > > > > terminating \0), or else you shouldn't be talking about
> > > > > hexadecimal since it's just a 20-byte big-endian unsigned integer.
> > > > > (A third possibility is that I'm totally confused.)
> > > >
> > > > It is 40 hex characters, but 20 _bytes_. If you have any ideas how to
> > > > formulate that better than I did...
> > >
> > > I think this is less confusing:
> > >
> > > This variable will contain the 160-bit SHA-1.
> > >
> > > It avoids talking of hex, since it's not really stored in hex format
> > > any more than any other binary number with a number of bits divisible
> > > by four. And it avoids saying big-endian, which is not relevant anyway
> > > since we don't use hashes as integers.
> >
> > Well, I do not buy into that. First, we _have_ to say that it is
> > big-endian. It was utterly confusing to _me_ that the hash was not little
> > endian, as I expected on an Intel processor.
>
> SHA-1 is defined as producing an octet sequence, and to have a canonical
> hex digit sequence conversion with the high nibbles first. Internally, it
> is canonically specified using big-endian math, but the same algorithm
> could equally be specified with little-endian math and different rules for
> input and output.
>
> > And I'd rather mention the hex representation (what you see in git-log and
> > git-ls-tree). This helps debugging, believe me.
>
> It's kind of important to distinguish between the hex representation and
> the octet representation, because your code will not work at all if you
> use the wrong one. And "unsigned char *" or "unsigned char[20]" is always
> the octets; the hex is always "char *". Primarily mentioning the one that
> is more intuitive but less frequently used doesn't help with understanding
> the actual code.
That's a really good idea, to point out that "unsigned char *" refers to
octets, while "char *" refers to the ASCII representation. I will add
this, together with a simple example (the initial commit).
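To make the distinction concrete, here is a minimal sketch in plain C,
assuming nothing beyond the standard library. The helper names
hex_to_octets() and octets_to_hex() are made up for illustration and are
not Git's own functions. It only shows how the 40-character "char *" form
and the 20-octet "unsigned char *" form relate to each other.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers for illustration; these are not part of Git. */

/* Map one hex digit to its value, or return -1 if it is not a hex digit. */
static int hex_digit_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Convert the ASCII form (40 hex characters, "char *") into the binary
 * form (20 octets, "unsigned char *").  Two hex digits make one octet,
 * high nibble first.  Returns 0 on success, -1 on malformed input.
 */
int hex_to_octets(const char *hex, unsigned char *sha1)
{
	size_t i;

	if (strlen(hex) != 40)
		return -1;
	for (i = 0; i < 20; i++) {
		int hi = hex_digit_value(hex[2 * i]);
		int lo = hex_digit_value(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/*
 * Convert 20 octets back into a NUL-terminated string of 40 lowercase
 * hex characters.  The caller must provide room for 41 bytes in out.
 */
void octets_to_hex(const unsigned char *sha1, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < 20; i++) {
		out[2 * i] = digits[sha1[i] >> 4];
		out[2 * i + 1] = digits[sha1[i] & 0x0f];
	}
	out[40] = '\0';
}
```

Feeding it the name of the initial commit,
"e83c5163316f89bfbde7d9ab23ca2e25604af290", should give octets that start
with 0xe8 0x3c 0x51 0x63 and end with 0x90, i.e. in the same order as the
hex digits.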
Ciao,
Dscho
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 9:38 ` Johannes Schindelin
@ 2007-05-09 10:43 ` Karl Hasselström
0 siblings, 0 replies; 33+ messages in thread
From: Karl Hasselström @ 2007-05-09 10:43 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Daniel Barkalow, bfields, junio, git
On 2007-05-09 11:38:34 +0200, Johannes Schindelin wrote:
> On Wed, 9 May 2007, Daniel Barkalow wrote:
>
> > It's kind of important to distinguish between the hex
> > representation and the octet representation, because your code
> > will not work at all if you use the wrong one. And "unsigned char
> > *" or "unsigned char[20]" is always the octets; the hex is always
> > "char *". Primarily mentioning the one that is more intuitive but
> > less frequently used doesn't help with understanding the actual
> > code.
>
> That's a really good idea, to point out that "unsigned char *"
> refers to octets, while "char *" refers to the ASCII representation.
> I will add this, together with a simple example (the initial
> commit).
That'll address my complaint nicely, I believe. It was the confusion
between these two formats that I was trying to get at.
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 3:18 ` J. Bruce Fields
` (2 preceding siblings ...)
2007-05-09 9:27 ` Johannes Schindelin
@ 2007-05-09 12:19 ` Johannes Schindelin
2007-05-09 12:32 ` Petr Baudis
2007-05-09 13:18 ` J. Bruce Fields
3 siblings, 2 replies; 33+ messages in thread
From: Johannes Schindelin @ 2007-05-09 12:19 UTC (permalink / raw)
To: J. Bruce Fields, kha, barkalow; +Cc: junio, git
Hi,
for your reviewing pleasure, I made a patch on top of the original one,
but I can easily provide a full patch for application.
--
[PATCH] user-manual: Touch ups on the birdview section
... as suggested by J. Bruce Fields, Karl Hasselström and Daniel Barkalow.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
Documentation/user-manual.txt | 31 +++++++++++++++++++++++++++----
1 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index 2d58bb0..55934db 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -3197,7 +3197,15 @@ basically _the_ header file which is included by _all_ of Git's C sources.
If you grasp the ideas in that initial commit (it is really small and you
can get into it really fast, and it will help you recognize things in the
much larger code base we have now), you should go on skimming `cache.h`,
-`object.h` and `commit.h`.
+`object.h` and `commit.h` in the current version.
+
+In the early days, Git (in the tradition of UNIX) was a bunch of programs
+which were extremely simple, and which you used in scripts, piping the
+output of one into another. This turned out to be good for initial
+development, since it was easier to test new things. However, recently
+many of these parts have become builtins, and some of the core has been
+"libified", i.e. put into libgit.a for performance, portability reasons,
+and to avoid code duplication.
By now, you know what the index is (and find the corresponding data
structures in `cache.h`), and that there are just a couple of object types
@@ -3236,9 +3244,22 @@ options that were relevant for the different plumbing commands that were
called by the script.
Most of what `git-rev-list` did is contained in `revision.c` and
-`revision.h`. It wraps the options in a struct named rev_info, which
+`revision.h`. It wraps the options in a struct named `rev_info`, which
controls how and what revisions are walked, and more.
+The original job of `git-rev-parse` is now taken by the function
+`setup_revisions()`, which parses the revisions and the common command line
+options for the revision walker. This information is stored in the struct
+`rev_info` for later consumption. You can do your own command line option
+parsing after calling `setup_revisions()`. After that, you have to call
+`prepare_revision_walk()` for initialization, and then you can get the
+commits one by one with the function `get_revision()`.
+
+If you are interested in more details of the revision walking process,
+just have a look at the first implementation of `cmd_log()`; call
+`git-show v1.3.0~155^2~4` and scroll down to that function (note that you
+no longer need to call `setup_pager()` directly).
+
Nowadays, `git log` is a builtin, which means that it is _contained_ in the
command `git`. The source side of a builtin is
@@ -3300,8 +3321,10 @@ Two things are interesting here:
- the variable `sha1` in the function signature of `get_sha1()` is `unsigned
char *`, but is actually expected to be a pointer to `unsigned
- char[20]`. This variable will contain the big endian version of the
- 40-character hex string representation of the SHA-1.
+ char[20]`. This variable will contain the 160-bit SHA-1 of the given
+ commit. Note that whenever a SHA-1 is passed as "unsigned char *", it
+ is the binary representation (big-endian), as opposed to the ASCII
+ representation in hex characters, which is passed as "char *".
You will see both of these things throughout the code.
--
1.5.2.rc2.2502.g46b5cb
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 12:19 ` Johannes Schindelin
@ 2007-05-09 12:32 ` Petr Baudis
2007-05-09 12:50 ` Johannes Schindelin
2007-05-09 13:18 ` J. Bruce Fields
1 sibling, 1 reply; 33+ messages in thread
From: Petr Baudis @ 2007-05-09 12:32 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: J. Bruce Fields, kha, barkalow, junio, git
On Wed, May 09, 2007 at 02:19:03PM CEST, Johannes Schindelin wrote:
> diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
> index 2d58bb0..55934db 100644
> --- a/Documentation/user-manual.txt
> +++ b/Documentation/user-manual.txt
> @@ -3197,7 +3197,15 @@ basically _the_ header file which is included by _all_ of Git's C sources.
> If you grasp the ideas in that initial commit (it is really small and you
> can get into it really fast, and it will help you recognize things in the
> much larger code base we have now), you should go on skimming `cache.h`,
> -`object.h` and `commit.h`.
> +`object.h` and `commit.h` in the current version.
> +
> +In the early days, Git (in the tradition of UNIX) was a bunch of programs
> +which were extremely simple, and which you used in scripts, piping the
> +output of one into another. This turned out to be good for initial
> +development, since it was easier to test new things. However, recently
> +many of these parts have become builtins, and some of the core has been
> +"libified", i.e. put into libgit.a for performance, portability reasons,
> +and to avoid code duplication.
>
> By now, you know what the index is (and find the corresponding data
> structures in `cache.h`), and that there are just a couple of object types
I disagree, especially with the past tense of the first half of the
paragraph. Git is _still_ a bunch of programs you use in scripts, piping
the output of one into another. A separate point is that,
implementation-wise, much of the code is currently shared in an internal
library, etc.
I'd be a bit careful about mentioning libgit.a so casually, since it might
give the reader the impression that there really _is_ "the git library",
with API and everything, that they can use externally. Of course you
need to mention libgit.a, but I'd also mention that it is so far meant
only for git's internal use and has no solidified API.
> @@ -3300,8 +3321,10 @@ Two things are interesting here:
>
> - the variable `sha1` in the function signature of `get_sha1()` is `unsigned
> char *`, but is actually expected to be a pointer to `unsigned
> - char[20]`. This variable will contain the big endian version of the
> - 40-character hex string representation of the SHA-1.
> + char[20]`. This variable will contain the 160-bit SHA-1 of the given
> + commit. Note that whenever a SHA-1 is passed as "unsigned char *", it
> + is the binary representation (big-endian), as opposed to the ASCII
> + representation in hex characters, which is passed as "char *".
>
> You will see both of these things throughout the code.
To be honest, I wouldn't even be *thinking* about the endianity of SHA-1
octet representation (you don't usually really deal with the hash as
with a number, so expecting to have it in native endianity is not very
natural; you just deal with it as with a data blob) and the
"(big-endian)" would only confuse me and get me thinking about "huh, do
they swap the bytes, or wait, they don't, ...?!".
But that's maybe just me.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
-- Samuel Beckett
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 12:32 ` Petr Baudis
@ 2007-05-09 12:50 ` Johannes Schindelin
2007-05-09 16:18 ` Daniel Barkalow
0 siblings, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2007-05-09 12:50 UTC (permalink / raw)
To: Petr Baudis; +Cc: J. Bruce Fields, kha, barkalow, junio, git
Hi,
On Wed, 9 May 2007, Petr Baudis wrote:
> On Wed, May 09, 2007 at 02:19:03PM CEST, Johannes Schindelin wrote:
> > diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
> > index 2d58bb0..55934db 100644
> > --- a/Documentation/user-manual.txt
> > +++ b/Documentation/user-manual.txt
> > @@ -3197,7 +3197,15 @@ basically _the_ header file which is included by _all_ of Git's C sources.
> > If you grasp the ideas in that initial commit (it is really small and you
> > can get into it really fast, and it will help you recognize things in the
> > much larger code base we have now), you should go on skimming `cache.h`,
> > -`object.h` and `commit.h`.
> > +`object.h` and `commit.h` in the current version.
> > +
> > +In the early days, Git (in the tradition of UNIX) was a bunch of programs
> > +which were extremely simple, and which you used in scripts, piping the
> > +output of one into another. This turned out to be good for initial
> > +development, since it was easier to test new things. However, recently
> > +many of these parts have become builtins, and some of the core has been
> > +"libified", i.e. put into libgit.a for performance, portability reasons,
> > +and to avoid code duplication.
> >
> > By now, you know what the index is (and find the corresponding data
> > structures in `cache.h`), and that there are just a couple of object types
>
> I disagree, especially with the past tense of the first half of the
> paragraph. Git is _still_ a bunch of programs you use in scripts, piping
> the output of one into another. Another point is that
> implementation-wise many of the code is currently shared in an internal
> library, etc.
No. Many parts are _not_ simple programs piped into each other. git-log,
git-show, git-mv come to mind. That is why I wrote "many" and not "all".
> I'd be a bit careful to talk about libgit.a so leisurely since it might
> give the reader an impression that there really _is_ "the git library",
> with API and everything, that they can use externally. Of course you
> need to mention libgit.a, but I'd also mention that it is so far meant
> only for internal git's use and has no solidified API.
Frankly, this is just a birdview thing. If you want to go and make a
hacker's manual, go ahead!
> > @@ -3300,8 +3321,10 @@ Two things are interesting here:
> >
> > - the variable `sha1` in the function signature of `get_sha1()` is `unsigned
> > char *`, but is actually expected to be a pointer to `unsigned
> > - char[20]`. This variable will contain the big endian version of the
> > - 40-character hex string representation of the SHA-1.
> > + char[20]`. This variable will contain the 160-bit SHA-1 of the given
> > + commit. Note that whenever a SHA-1 is passed as "unsigned char *", it
> > + is the binary representation (big-endian), as opposed to the ASCII
> > + representation in hex characters, which is passed as "char *".
> >
> > You will see both of these things throughout the code.
>
> To be honest, I wouldn't even be *thinking* about the endianity of SHA-1
> octet representation (you don't usually really deal with the hash as
> with a number, so expecting to have it in native endianity is not very
> natural; you just deal with it as with a data blob) and the
> "(big-endian)" would only confuse me and get me thinking about "huh, do
> they swap the bytes, or wait, they don't, ...?!".
>
> But that's maybe just me.
But then, maybe it is just me? I got it completely wrong the first time,
fully expecting the calculations to be carried out in host endianness for
performance reasons.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 12:19 ` Johannes Schindelin
2007-05-09 12:32 ` Petr Baudis
@ 2007-05-09 13:18 ` J. Bruce Fields
2007-05-10 4:15 ` Junio C Hamano
1 sibling, 1 reply; 33+ messages in thread
From: J. Bruce Fields @ 2007-05-09 13:18 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: kha, barkalow, junio, git
On Wed, May 09, 2007 at 02:19:03PM +0200, Johannes Schindelin wrote:
> [PATCH] user-manual: Touch ups on the birdview section
Those all look like sensible changes to me, thanks!
--b.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 12:50 ` Johannes Schindelin
@ 2007-05-09 16:18 ` Daniel Barkalow
2007-05-09 16:25 ` Johannes Schindelin
0 siblings, 1 reply; 33+ messages in thread
From: Daniel Barkalow @ 2007-05-09 16:18 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Petr Baudis, J. Bruce Fields, kha, junio, git
On Wed, 9 May 2007, Johannes Schindelin wrote:
> > To be honest, I wouldn't even be *thinking* about the endianity of SHA-1
> > octet representation (you don't usually really deal with the hash as
> > with a number, so expecting to have it in native endianity is not very
> > natural; you just deal with it as with a data blob) and the
> > "(big-endian)" would only confuse me and get me thinking about "huh, do
> > they swap the bytes, or wait, they don't, ...?!".
> >
> > But that's maybe just me.
>
> But then, maybe it is just me? I got it completely wrong the first time,
> fully expecting the calculations to be carried out in host endianness for
> performance reasons.
I think the Mozilla implementation carries out calculations in host
endianness, and transfers data from the input to the internal state and
from the internal state to the final hash with shifts and masks.
Which calculations are you seeing that involve byte order?
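The "shifts and masks" approach can be sketched roughly like this. It is
not the Mozilla code, only an illustration in plain C with made-up helper
names (load_be32() and store_be32()): the arithmetic happens on an ordinary
host-endian integer, and only the conversion to and from the octet
sequence fixes the byte order.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from any SHA-1 code. */

/*
 * Assemble a 32-bit value from four octets, most significant octet
 * first.  Because it uses shifts instead of a pointer cast, the result
 * is the same on little-endian and big-endian hosts.
 */
uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) |
	       (uint32_t)p[3];
}

/*
 * Write a 32-bit value as four octets, most significant octet first,
 * again independent of the host's byte order.
 */
void store_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)((v >> 24) & 0xff);
	p[1] = (unsigned char)((v >> 16) & 0xff);
	p[2] = (unsigned char)((v >> 8) & 0xff);
	p[3] = (unsigned char)(v & 0xff);
}
```

With helpers like these, the octets 0xe8 0x3c 0x51 0x63 load as the value
0xe83c5163 on any host, and storing that value writes the same four octets
back in the same order.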
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 16:18 ` Daniel Barkalow
@ 2007-05-09 16:25 ` Johannes Schindelin
2007-05-09 17:07 ` J. Bruce Fields
0 siblings, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2007-05-09 16:25 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Petr Baudis, J. Bruce Fields, kha, junio, git
Hi,
On Wed, 9 May 2007, Daniel Barkalow wrote:
> On Wed, 9 May 2007, Johannes Schindelin wrote:
>
> > > To be honest, I wouldn't even be *thinking* about the endianity of SHA-1
> > > octet representation (you don't usually really deal with the hash as
> > > with a number, so expecting to have it in native endianity is not very
> > > natural; you just deal with it as with a data blob) and the
> > > "(big-endian)" would only confuse me and get me thinking about "huh, do
> > > they swap the bytes, or wait, they don't, ...?!".
> > >
> > > But that's maybe just me.
> >
> > But then, maybe it is just me? I got it completely wrong the first time,
> > fully expecting the calculations to be carried out in host endianness for
> > performance reasons.
>
> I think the Mozilla implementation carries out calculations in host
> endianness, and transfers data from the input to the internal state and
> from the internal state to the final hash with shifts and masks.
>
> Which calculations are you seeing that involve byte order?
None. I only suspected them to be carried out in byte order. From what I
know, there are some shifts involved, which might or might not be helped
by 32-bit arithmetic.
I did not really look into it.
From my prior debugging experiences on Intel, though, I automatically
looked for the least significant bytes at the beginning of those "sha1"
variables, and came up empty.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 16:25 ` Johannes Schindelin
@ 2007-05-09 17:07 ` J. Bruce Fields
2007-05-09 20:15 ` Johannes Schindelin
0 siblings, 1 reply; 33+ messages in thread
From: J. Bruce Fields @ 2007-05-09 17:07 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Daniel Barkalow, Petr Baudis, kha, junio, git
On Wed, May 09, 2007 at 06:25:01PM +0200, Johannes Schindelin wrote:
> None. I only suspected them to be carried out in byte order. From what I
> know, there are some shifts involved, which might or might not be helped
> by 32-bit arithmetic.
>
> I did not really look into it.
>
> From my prior debugging experiences on Intel, though, I automatically
> looked for the least significant bytes at the beginning of those "sha1"
> variables, and came up empty.
So, I'm confused about what you actually mean by "big endian" here. I
originally assumed that you meant that SHA1's are defined as bit arrays,
and that the first bit of the SHA1 is in the high-order bit of the first
byte. But if you just meant that the first byte of the SHA1 is stored
in the first byte of the array... that kind of goes without saying,
doesn't it?
In any case, maybe this is a detail that's best left to the code itself.
--b.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 5:05 ` Junio C Hamano
2007-05-09 9:33 ` Johannes Schindelin
@ 2007-05-09 17:36 ` J. Bruce Fields
1 sibling, 0 replies; 33+ messages in thread
From: J. Bruce Fields @ 2007-05-09 17:36 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, git
On Tue, May 08, 2007 at 10:05:36PM -0700, Junio C Hamano wrote:
> Having said that, I do not think the patch belongs to the "git
> USER'S manual".
Well, we could remove the word "user" from the name. There's a pretty
good continuum between users and hackers, and that's as it should be.
(Where do you document "porcelain" level stuff?)
> It is a very good introductory material for a separate "git hackers
> manual", though.
But that would be OK too, as long as we have a clear idea how we're
going to decide what goes in which manual. No need to wait until the
whole thing's done--we could commit an initial version of it with
Johannes's work and your outline, and fill in the rest later.
--b.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 17:07 ` J. Bruce Fields
@ 2007-05-09 20:15 ` Johannes Schindelin
2007-05-09 20:32 ` J. Bruce Fields
2007-05-09 20:45 ` Daniel Barkalow
0 siblings, 2 replies; 33+ messages in thread
From: Johannes Schindelin @ 2007-05-09 20:15 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Daniel Barkalow, Petr Baudis, kha, junio, git
Hi,
On Wed, 9 May 2007, J. Bruce Fields wrote:
> On Wed, May 09, 2007 at 06:25:01PM +0200, Johannes Schindelin wrote:
> > None. I only suspected them to be carried out in byte order. From what I
> > know, there are some shifts involved, which might or might not be helped
> > by 32-bit arithmetic.
> >
> > I did not really look into it.
> >
> > From my prior debugging experiences on Intel, though, I automatically
> > looked for the least significant bytes at the beginning of those "sha1"
> > variables, and came up empty.
>
> So, I'm confused about what you actually mean by "big endian" here. I
> originally assumed that you meant that SHA1's are defined as bit arrays,
> and that the first bit of the SHA1 is in the high-order bit of the first
> byte. But if you just meant that the first byte of the SHA1 is stored
> in the first byte of the array... that kind of goes without saying,
> doesn't it?
Hm.
Let me explain it in this way:
If you parse a number, passed to a program, with strtol(argv[1], NULL, 0)
you would expect something like this on an Intel processor:
Input 0x1234 -> memory 0x34 0x12 0x00 0x00.
On a big endian machine, you'd expect 0x00 0x00 0x12 0x34.
That is what endianness means.
If you tell Git that it should look for commit e83c5163..., it will store
the sha1 as 0xe8 0x3c 0x51 0x63 ... in memory, no matter which
endianness the processor has.
Which was positively confusing for me, since I automatically searched for
the sequence 0x90 0xf2 0x4a 0x60 ... (which is the tail of that hash).
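The integer half of this can be checked with a small sketch in plain C.
The function names host_is_little_endian() and int_memory_layout() are
made up for illustration, and a 4-byte integer is assumed, as in the
example above. An array of unsigned char, by contrast, is always stored
element by element in index order, so there is nothing to probe there.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only. */

/*
 * Copy the in-memory representation of a 32-bit integer into out[].
 * The order of the four bytes depends on the host's endianness.
 */
void int_memory_layout(uint32_t v, unsigned char out[4])
{
	memcpy(out, &v, sizeof(v));
}

/*
 * Return 1 if the host stores the least significant byte first
 * (as Intel processors do), 0 otherwise.
 */
int host_is_little_endian(void)
{
	unsigned char bytes[4];

	int_memory_layout(0x1234, bytes);
	return bytes[0] == 0x34;
}
```

On a little-endian host, int_memory_layout(0x1234, out) fills out[] with
0x34 0x12 0x00 0x00; on a big-endian host it gives 0x00 0x00 0x12 0x34.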
But if all this sounds too confusing, I agree to delete the
"(big-endian)".
Ciao,
Dscho
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 20:15 ` Johannes Schindelin
@ 2007-05-09 20:32 ` J. Bruce Fields
2007-05-09 20:45 ` Daniel Barkalow
1 sibling, 0 replies; 33+ messages in thread
From: J. Bruce Fields @ 2007-05-09 20:32 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Daniel Barkalow, Petr Baudis, kha, junio, git
On Wed, May 09, 2007 at 10:15:19PM +0200, Johannes Schindelin wrote:
> If you parse a number, passed to a program, with strtol(argv[1], NULL, 0)
> you would expect something like this on an Intel processor:
>
> Input 0x1234 -> memory 0x34 0x12 0x00 0x00.
Right, but this is something special to integers. If it made sense for
some strange reason to define the structure carrying a sha1 as int[5]
instead of char[20] then I'd understand the confusion, but char[20] is
totally unambiguous.
> But if all this sounds too confusing, I agree to delete the
> "(big-endian)".
Yeah, I think that'd be best; thanks.
--b.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 20:15 ` Johannes Schindelin
2007-05-09 20:32 ` J. Bruce Fields
@ 2007-05-09 20:45 ` Daniel Barkalow
2007-05-09 22:23 ` Johannes Schindelin
1 sibling, 1 reply; 33+ messages in thread
From: Daniel Barkalow @ 2007-05-09 20:45 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: J. Bruce Fields, Petr Baudis, kha, junio, git
On Wed, 9 May 2007, Johannes Schindelin wrote:
> Let me explain it in this way:
>
> If you parse a number, passed to a program, with strtol(argv[1], NULL, 0)
> you would expect something like this on an Intel processor:
>
> Input 0x1234 -> memory 0x34 0x12 0x00 0x00.
>
> On a big endian machine, you'd expect 0x00 0x00 0x12 0x34.
>
> That is what endianness means.
>
> If you tell Git that it should look for commit e83c5163..., it will store
> the sha1 as 0xe8 0x3c 0x51 0x63 ... in memory, no matter which
> endianness the processor has.
But it would be really weird to get 0x90 0xf2 0x4a 0x60 ... 0x63 0x51 0x3c
0xe8 unless you've got a 160-bit little-endian processor. That would be as
strange as having "Test" stored as 0x74 0x73 0x65 0x54, I think.
> Which was positively confusing for me, since I automatically searched for
> the sequence 0x90 0xf2 0x4a 0x60 ... (which is the tail of that hash).
>
> But if all this sounds too confusing, I agree to delete the
> "(big-endian)".
If it confused you, there should be something there. Maybe "(in order)" or
something else implying that the underlying type is an octet sequence,
rather than a 160-bit integer?
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 20:45 ` Daniel Barkalow
@ 2007-05-09 22:23 ` Johannes Schindelin
2007-05-10 20:01 ` Karl Hasselström
0 siblings, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2007-05-09 22:23 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: J. Bruce Fields, Petr Baudis, kha, junio, git
Hi,
On Wed, 9 May 2007, Daniel Barkalow wrote:
> On Wed, 9 May 2007, Johannes Schindelin wrote:
>
> > If you tell Git that it should look for commit e83c5163..., it will
> > store the sha1 as 0xe8 0x3c 0x51 0x63 ... in memory, no matter which
> > endianness the processor has.
>
> But it would be really weird to get 0x90 0xf2 0x4a 0x60 ... 0x63 0x51
> 0x3c 0xe8 unless you've got a 160-bit little-endian processor. That
> would be as strange as having "Test" stored as 0x74 0x73 0x65 0x54, I
> think.
I was not aware originally, that no arithmetic is involved in SHA-1
computation.
If you store large integers, it makes tons of sense to follow the
endianness, especially if you do _both_ boolean and integer operations on
them.
> > Which was positively confusing for me, since I automatically searched
> > for the sequence 0x90 0xf2 0x4a 0x60 ... (which is the tail of that
> > hash).
> >
> > But if all this sounds too confusing, I agree to delete the
> > "(big-endian)".
>
> If it confused you, there should be something there. Maybe "(in order)"
> or something else implying that the underlying type is an octet
> sequence, rather than a 160-bit integer?
Well, I am convinced by now that nobody could be as stupid as me, so I
think it is good without such a hint :-)
Ciao,
Dscho
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 13:18 ` J. Bruce Fields
@ 2007-05-10 4:15 ` Junio C Hamano
2007-05-10 10:36 ` Johannes Schindelin
0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2007-05-10 4:15 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Johannes Schindelin, kha, barkalow, git
"J. Bruce Fields" <bfields@fieldses.org> writes:
> On Wed, May 09, 2007 at 02:19:03PM +0200, Johannes Schindelin wrote:
>> [PATCH] user-manual: Touch ups on the birdview section
>
> Those all look like sensible changes to me, thanks!
Likewise, except that big-endian bit I think everybody agrees on
just dropping.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-10 4:15 ` Junio C Hamano
@ 2007-05-10 10:36 ` Johannes Schindelin
2007-05-10 20:42 ` Junio C Hamano
0 siblings, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2007-05-10 10:36 UTC (permalink / raw)
To: Junio C Hamano; +Cc: J. Bruce Fields, kha, barkalow, git
In http://thread.gmane.org/gmane.comp.version-control.git/42479,
a birdview on the source code was requested.
J. Bruce Fields suggested that my reply should be included in the
user manual, and there was nothing of an outcry, so here it is,
not 2 months later.
It includes modifications as suggested by J. Bruce Fields, Karl
Hasselström and Daniel Barkalow.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
On Wed, 9 May 2007, Junio C Hamano wrote:
> "J. Bruce Fields" <bfields@fieldses.org> writes:
>
> > On Wed, May 09, 2007 at 02:19:03PM +0200, Johannes Schindelin wrote:
> >> [PATCH] user-manual: Touch ups on the birdview section
> >
> > Those all look like sensible changes to me, thanks!
>
> Likewise, except that big-endian bit I think everybody agrees on
> just dropping.
And here it is, in its full glory, prepared in a way which
appeals to the maintainers, hopefully...
Documentation/user-manual.txt | 219 +++++++++++++++++++++++++++++++++++++++++
1 files changed, 219 insertions(+), 0 deletions(-)
diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index 13db969..3c3f1b4 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -3160,6 +3160,225 @@ confusing and scary messages, but it won't actually do anything bad. In
contrast, running "git prune" while somebody is actively changing the
repository is a *BAD* idea).
+[[birdview-on-the-source-code]]
+A birdview on Git's source code
+-----------------------------
+
+While Git's source code is quite elegant, it is not always easy for
+new developers to find their way through it. A good idea is to look
+at the contents of the initial commit:
+_e83c5163316f89bfbde7d9ab23ca2e25604af290_ (also known as _v0.99~954_).
+
+Tip: you can see what files are in there with
+
+----------------------------------------------------
+$ git show e83c5163316f89bfbde7d9ab23ca2e25604af290:
+----------------------------------------------------
+
+and look at those files with something like
+
+-----------------------------------------------------------
+$ git show e83c5163316f89bfbde7d9ab23ca2e25604af290:cache.h
+-----------------------------------------------------------
+
+Be sure to read the README in that revision _after_ you are familiar with
+the terminology (<<glossary>>), since the terminology has changed a little
+since then. For example, we call the things "commits" now, which are
+described in that README as "changesets".
+
+Actually a lot of the structure as it is now can be explained by that
+initial commit.
+
+For example, we do not call it "cache" any more, but "index", however, the
+file is still called `cache.h`. Remark: Not much reason to change it now,
+especially since there is no good single name for it anyway, because it is
+basically _the_ header file which is included by _all_ of Git's C sources.
+
+If you grasp the ideas in that initial commit (it is really small and you
+can get into it really fast, and it will help you recognize things in the
+much larger code base we have now), you should go on skimming `cache.h`,
+`object.h` and `commit.h` in the current version.
+
+In the early days, Git (in the tradition of UNIX) was a bunch of programs
+which were extremely simple, and which you used in scripts, piping the
+output of one into another. This turned out to be good for initial
+development, since it was easier to test new things. However, recently
+many of these parts have become builtins, and some of the core has been
+"libified", i.e. put into libgit.a for performance and portability
+reasons, and to avoid code duplication.
+
+By now, you know what the index is (and find the corresponding data
+structures in `cache.h`), and that there are just a couple of object types
+(blobs, trees, commits and tags) which inherit their common structure from
+`struct object`, which is their first member (and thus, you can cast e.g.
+`(struct object *)commit` to achieve the _same_ as `&commit->object`, i.e.
+get at the object name and flags).
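The first-member layout described above can be sketched in plain C. The structs below are simplified, hypothetical stand-ins rather than Git's real definitions, and `DEMO_SEEN`, `demo_mark_seen()` and `demo_cast_matches_member()` are names made up for this illustration:

```c
#include <assert.h>
#include <string.h>

#define DEMO_SEEN 1u	/* hypothetical flag bit, not one of Git's */

/* Simplified stand-in for the common header shared by all object types. */
struct object {
	unsigned flags;
	unsigned char sha1[20];
};

/* Simplified stand-in for a commit; the real struct has more members. */
struct commit {
	struct object object;	/* must stay the first member */
	unsigned long date;
};

/* Operates on any object type through the shared header. */
void demo_mark_seen(struct object *obj)
{
	obj->flags |= DEMO_SEEN;
}

/* Returns 1 if casting the commit yields the address of its first member. */
int demo_cast_matches_member(struct commit *commit)
{
	return (struct object *)commit == &commit->object;
}
```

C guarantees that a pointer to a struct, suitably converted, points to its first member, so `demo_mark_seen((struct object *)commit)` sets the flag on the object embedded in the commit.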
+
+Now is a good point to take a break to let this information sink in.
+
+Next step: get familiar with the object naming. Read <<naming-commits>>.
+There are quite a few ways to name an object (and not only revisions!).
+All of these are handled in `sha1_name.c`. Just have a quick look at
+the function `get_sha1()`. A lot of the special handling is done by
+functions like `get_sha1_basic()` or the like.
+
+This is just to get you into the groove for the most libified part of Git:
+the revision walker.
+
+Basically, the initial version of `git log` was a shell script:
+
+----------------------------------------------------------------
+$ git-rev-list --pretty $(git-rev-parse --default HEAD "$@") | \
+ LESS=-S ${PAGER:-less}
+----------------------------------------------------------------
+
+What does this mean?
+
+`git-rev-list` is the original version of the revision walker, which
+_always_ printed a list of revisions to stdout. It is still functional,
+and needs to be, since most new Git programs start out as scripts using
+`git-rev-list`.
+
+`git-rev-parse` is not as important any more; it was only used to filter out
+options that were relevant for the different plumbing commands that were
+called by the script.
+
+Most of what `git-rev-list` did is contained in `revision.c` and
+`revision.h`. It wraps the options in a struct named `rev_info`, which
+controls how and what revisions are walked, and more.
+
+The original job of `git-rev-parse` is now taken by the function
+`setup_revisions()`, which parses the revisions and the common command line
+options for the revision walker. This information is stored in the struct
+`rev_info` for later consumption. You can do your own command line option
+parsing after calling `setup_revisions()`. After that, you have to call
+`prepare_revision_walk()` for initialization, and then you can get the
+commits one by one with the function `get_revision()`.
+
+If you are interested in more details of the revision walking process,
+just have a look at the first implementation of `cmd_log()`; call
+`git-show v1.3.0~155^2~4` and scroll down to that function (note that you
+no longer need to call `setup_pager()` directly).
+
+Nowadays, `git log` is a builtin, which means that it is _contained_ in the
+command `git`. The source side of a builtin is
+
+- a function called `cmd_<bla>`, typically defined in `builtin-<bla>.c`,
+ and declared in `builtin.h`,
+
+- an entry in the `commands[]` array in `git.c`, and
+
+- an entry in `BUILTIN_OBJECTS` in the `Makefile`.
+
+Sometimes, more than one builtin is contained in one source file. For
+example, `cmd_whatchanged()` and `cmd_log()` both reside in `builtin-log.c`,
+since they share quite a bit of code. In that case, the commands which are
+_not_ named like the `.c` file in which they live have to be listed in
+`BUILT_INS` in the `Makefile`.
+
+`git log` looks more complicated in C than it does in the original script,
+but that allows for a much greater flexibility and performance.
+
+This is again a good point to pause.
+
+Lesson three is: study the code. Really, it is the best way to learn about
+the organization of Git (after you know the basic concepts).
+
+So, think about something which you are interested in, say, "how can I
+access a blob just knowing the object name of it?". The first step is to
+find a Git command with which you can do it. In this example, it is either
+`git show` or `git cat-file`.
+
+For the sake of clarity, let's stay with `git cat-file`, because it
+
+- is plumbing, and
+
+- was around even in the initial commit (it literally went only through
+ some 20 revisions as `cat-file.c`, was renamed to `builtin-cat-file.c`
+ when made a builtin, and then saw less than 10 versions).
+
+So, look into `builtin-cat-file.c`, search for `cmd_cat_file()` and look at
+what it does.
+
+------------------------------------------------------------------
+ git_config(git_default_config);
+ if (argc != 3)
+ usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
+ if (get_sha1(argv[2], sha1))
+ die("Not a valid object name %s", argv[2]);
+------------------------------------------------------------------
+
+Let's skip over the obvious details; the only really interesting part
+here is the call to `get_sha1()`. It tries to interpret `argv[2]` as an
+object name, and if it refers to an object which is present in the current
+repository, it writes the resulting SHA-1 into the variable `sha1`.
+
+Two things are interesting here:
+
+- `get_sha1()` returns 0 on _success_. This might surprise some new
+ Git hackers, but there is a long tradition in UNIX to return different
+ negative numbers in case of different errors -- and 0 on success.
+
+- the variable `sha1` in the function signature of `get_sha1()` is `unsigned
+ char *`, but is actually expected to be a pointer to `unsigned
+ char[20]`. This variable will contain the 160-bit SHA-1 of the given
+ object. Note that whenever a SHA-1 is passed as "unsigned char *", it
+ is the binary representation, as opposed to the ASCII representation in
+ hex characters, which is passed as "char *".
+
+You will see both of these things throughout the code.
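Both conventions can be illustrated with a small stand-alone sketch. The helpers `demo_hexval()`, `demo_hex_to_sha1()` and `demo_sha1_to_hex()` are hypothetical names invented here, not Git's own functions. The sketch only assumes what the text states: 0 means success, and the binary form is 20 bytes while the ASCII form is 40 hex characters.

```c
#include <assert.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
int demo_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Convert the first 40 hex characters of "hex" into 20 binary bytes.
 * Returns 0 on success and -1 on malformed input; characters after
 * the 40th are not examined.
 */
int demo_hex_to_sha1(const char *hex, unsigned char *sha1)
{
	int i;

	for (i = 0; i < 20; i++) {
		int hi = demo_hexval(hex[2 * i]);
		int lo;

		if (hi < 0)
			return -1;	/* also catches a premature NUL */
		lo = demo_hexval(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/* Convert 20 binary bytes to 40 hex characters plus NUL; out holds 41 bytes. */
void demo_sha1_to_hex(const unsigned char *sha1, char *out)
{
	static const char digits[] = "0123456789abcdef";
	int i;

	for (i = 0; i < 20; i++) {
		out[2 * i] = digits[sha1[i] >> 4];
		out[2 * i + 1] = digits[sha1[i] & 0x0f];
	}
	out[40] = '\0';
}
```

A caller tests for failure the same way `cmd_cat_file()` does with `get_sha1()`: `if (demo_hex_to_sha1(arg, sha1))` takes the error branch, because any nonzero return signals a problem.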
+
+Now, for the meat:
+
+-----------------------------------------------------------------------------
+ case 0:
+ buf = read_object_with_reference(sha1, argv[1], &size, NULL);
+-----------------------------------------------------------------------------
+
+This is how you read a blob (actually, not only a blob, but any type of
+object). To know how the function `read_object_with_reference()` actually
+works, find the source code for it (something like `git grep
+read_object_with | grep ":[a-z]"` in the git repository), and read
+the source.
+
+To find out how the result can be used, just read on in `cmd_cat_file()`:
+
+-----------------------------------
+ write_or_die(1, buf, size);
+-----------------------------------
+
+Sometimes, you do not know where to look for a feature. In many such cases,
+it helps to search through the output of `git log`, and then `git show` the
+corresponding commit.
+
+Example: If you know that there was some test case for `git bundle`, but
+do not remember where it was (yes, you _could_ `git grep bundle t/`, but that
+does not illustrate the point!):
+
+------------------------
+$ git log --no-merges t/
+------------------------
+
+In the pager (`less`), just search for "bundle", go a few lines back,
+and see that it is in commit 18449ab0... Now just copy this object name,
+and paste it into the command line
+
+-------------------
+$ git show 18449ab0
+-------------------
+
+Voila.
+
+Another example: Find out what to do in order to make some script a
+builtin:
+
+-------------------------------------------------
+$ git log --no-merges --diff-filter=A builtin-*.c
+-------------------------------------------------
+
+You see, Git is actually the best tool to find out about the source of Git
+itself!
+
[[glossary]]
include::glossary.txt[]
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-09 22:23 ` Johannes Schindelin
@ 2007-05-10 20:01 ` Karl Hasselström
0 siblings, 0 replies; 33+ messages in thread
From: Karl Hasselström @ 2007-05-10 20:01 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Daniel Barkalow, J. Bruce Fields, Petr Baudis, junio, git
On 2007-05-10 00:23:50 +0200, Johannes Schindelin wrote:
> I was not aware originally, that no arithmetic is involved in SHA-1
> computation.
>
> If you store large integers, it makes tons of sense to follow the
> endianness, especially if you do _both_ boolean and integer
> operations on them.
Actually, if you take a look at
http://en.wikipedia.org/wiki/Sha1#SHA-1_algorithm
you'll see that in addition to an unholy mess of bitwise operations,
it does do some additions, on 32-bit big-endian words according to the
article.
But thinking of them as addition gives you the wrong mental picture;
they're simply one of many ways for a standard processor to mix bits
efficiently as far as SHA-1 is concerned. The algorithm is specified
as yielding a 160-bit binary blob, and can and should be thought of as
a black NSA-certified box with "warranty void if this seal is broken"
stickers. (Unless you're the one implementing it, of course. But then
you know what you're doing.)
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-10 10:36 ` Johannes Schindelin
@ 2007-05-10 20:42 ` Junio C Hamano
2007-05-10 21:14 ` Karl Hasselström
0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2007-05-10 20:42 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: J. Bruce Fields, kha, barkalow, git
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
> index 13db969..3c3f1b4 100644
> --- a/Documentation/user-manual.txt
> +++ b/Documentation/user-manual.txt
> @@ -3160,6 +3160,225 @@ confusing and scary messages, but it won't actually do anything bad. In
> contrast, running "git prune" while somebody is actively changing the
> repository is a *BAD* idea).
>
> +[[birdview-on-the-source-code]]
> +A birdview on Git's source code
> +-----------------------------
Perhaps two dashes too short here...
> +
> +While Git's source code is quite elegant, it is not always easy for
> +new developers to find their way through it. A good idea is to look
> +at the contents of the initial commit:
> +_e83c5163316f89bfbde7d9ab23ca2e25604af290_ (also known as _v0.99~954_).
I am not sure we would need to say "is quite elegant". Why
don't we be blunt and say "It is not always easy for ...". That
holds true for any project of nontrivial size. I would rewrite
the first part like this.
It is not always easy for new developers to find their
way through Git's source code. This section gives you some
gentle guidance on where to start.
A good place to start is to look at the contents of the
initial commit, with this command:
----------------------------------------------------------------
$ git checkout e83c516
----------------------------------------------------------------
and would not bore users with v0.99~954 or "git show" details.
"git show" to inspect one file at a time is not a good way to
get a feel for an unknown set of source files, even though it is
very handy once you know where things are. And then continue
on to this part...
> +Be sure to read the README in that revision _after_ you are familiar with
> +the terminology (<<glossary>>), since the terminology has changed a little
> +since then. For example, we call the things "commits" now, which are
> +described in that README as "changesets".
> +
> +Actually a lot of the structure as it is now can be explained by that
> +initial commit.
It is also worth pointing out that the initial revision, while
laying the foundation of almost every important aspect of git we
have today, is small enough to read in its entirety in one
sitting, and to say so upfront instead of making it a parenthesized
comment in a later paragraph. For somebody who wants to dive into
git development and take a source-code tour, it is not merely "it
will help you"; it is a small enough required investment of
time.
> +For example, we do not call it "cache" any more, but "index", however, the
> +file is still called `cache.h`. Remark: Not much reason to change it now,
> +especially since there is no good single name for it anyway, because it is
> +basically _the_ header file which is included by _all_ of Git's C sources.
> +
> +If you grasp the ideas in that initial commit (it is really small and you
> +can get into it really fast, and it will help you recognize things in the
> +much larger code base we have now), you should go on skimming `cache.h`,
> +`object.h` and `commit.h` in the current version.
Other than that, I think this is well written, and if everybody
thinks it should be in the user's manual, I am fine with it.
By the way, when I sent the outline of hacker's manual as a
follow-up to the discussion, I think I forgot to properly say
this, so here it is: Thanks for starting the bird's eye view.
* Re: [PATCH] Add a birdview-on-the-source-code section to the user manual
2007-05-10 20:42 ` Junio C Hamano
@ 2007-05-10 21:14 ` Karl Hasselström
0 siblings, 0 replies; 33+ messages in thread
From: Karl Hasselström @ 2007-05-10 21:14 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, J. Bruce Fields, barkalow, git
On 2007-05-10 13:42:38 -0700, Junio C Hamano wrote:
> Other than that, I think this is well written, and if everybody
> thinks it should be in the user's manual, I am fine with it.
One "hacking howto" chapter at the end of the user's manual seems
perfectly fine to me too. We'll just have to remember to split it out
into a manual of its own if it grows too large for one chapter.
It's the potted plant strategy. :-)
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
end of thread, 2007-05-10 21:14 UTC
Thread overview: 33+ messages
2007-05-08 15:10 [PATCH] Add a birdview-on-the-source-code section to the user manual Johannes Schindelin
2007-05-08 21:01 ` Karl Hasselström
2007-05-08 21:07 ` Johannes Schindelin
2007-05-08 21:31 ` Karl Hasselström
2007-05-08 23:10 ` Johannes Schindelin
2007-05-08 23:22 ` Karl Hasselström
2007-05-09 4:54 ` Daniel Barkalow
2007-05-09 6:31 ` Karl Hasselström
2007-05-09 9:38 ` Johannes Schindelin
2007-05-09 10:43 ` Karl Hasselström
2007-05-09 3:18 ` J. Bruce Fields
2007-05-09 4:06 ` Junio C Hamano
2007-05-09 5:05 ` Junio C Hamano
2007-05-09 9:33 ` Johannes Schindelin
2007-05-09 17:36 ` J. Bruce Fields
2007-05-09 6:48 ` Karl Hasselström
2007-05-09 9:27 ` Johannes Schindelin
2007-05-09 12:19 ` Johannes Schindelin
2007-05-09 12:32 ` Petr Baudis
2007-05-09 12:50 ` Johannes Schindelin
2007-05-09 16:18 ` Daniel Barkalow
2007-05-09 16:25 ` Johannes Schindelin
2007-05-09 17:07 ` J. Bruce Fields
2007-05-09 20:15 ` Johannes Schindelin
2007-05-09 20:32 ` J. Bruce Fields
2007-05-09 20:45 ` Daniel Barkalow
2007-05-09 22:23 ` Johannes Schindelin
2007-05-10 20:01 ` Karl Hasselström
2007-05-09 13:18 ` J. Bruce Fields
2007-05-10 4:15 ` Junio C Hamano
2007-05-10 10:36 ` Johannes Schindelin
2007-05-10 20:42 ` Junio C Hamano
2007-05-10 21:14 ` Karl Hasselström