* [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
@ 2007-05-14 15:21 J. Bruce Fields
0 siblings, 0 replies; 18+ messages in thread
From: J. Bruce Fields @ 2007-05-14 15:21 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Johannes Schindelin
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
In http://thread.gmane.org/gmane.comp.version-control.git/42479,
a birdview on the source code was requested.
J. Bruce Fields suggested that my reply should be included in the
user manual, and there was nothing of an outcry, so here it is,
not 2 months later.
It includes modifications as suggested by J. Bruce Fields, Karl
Hasselström and Daniel Barkalow.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
Documentation/user-manual.txt | 219 +++++++++++++++++++++++++++++++++++++++++
1 files changed, 219 insertions(+), 0 deletions(-)
diff --git a/Documentation/user-manual.txt b/Documentation/user-manual.txt
index 13db969..bac9660 100644
--- a/Documentation/user-manual.txt
+++ b/Documentation/user-manual.txt
@@ -3160,6 +3160,225 @@ confusing and scary messages, but it won't actually do anything bad. In
contrast, running "git prune" while somebody is actively changing the
repository is a *BAD* idea).
+[[birdview-on-the-source-code]]
+A birdview on Git's source code
+-----------------------------
+
+While Git's source code is quite elegant, it is not always easy for
+new developers to find their way through it. A good idea is to look
+at the contents of the initial commit:
+_e83c5163316f89bfbde7d9ab23ca2e25604af290_ (also known as _v0.99~954_).
+
+Tip: you can see what files are in there with
+
+----------------------------------------------------
+$ git show e83c5163316f89bfbde7d9ab23ca2e25604af290:
+----------------------------------------------------
+
+and look at those files with something like
+
+-----------------------------------------------------------
+$ git show e83c5163316f89bfbde7d9ab23ca2e25604af290:cache.h
+-----------------------------------------------------------
+
+Be sure to read the README in that revision _after_ you are familiar with
+the terminology (<<glossary>>), since the terminology has changed a little
+since then. For example, we call the things "commits" now, which are
+described in that README as "changesets".
+
+Actually a lot of the structure as it is now can be explained by that
+initial commit.
+
+For example, we do not call it "cache" any more, but "index", however, the
+file is still called `cache.h`. Remark: Not much reason to change it now,
+especially since there is no good single name for it anyway, because it is
+basically _the_ header file which is included by _all_ of Git's C sources.
+
+If you grasp the ideas in that initial commit (it is really small and you
+can get into it really fast, and it will help you recognize things in the
+much larger code base we have now), you should go on skimming `cache.h`,
+`object.h` and `commit.h` in the current version.
+
+In the early days, Git (in the tradition of UNIX) was a bunch of programs
+which were extremely simple, and which you used in scripts, piping the
+output of one into another. This turned out to be good for initial
+development, since it was easier to test new things. However, recently
+many of these parts have become builtins, and some of the core has been
+"libified", i.e. put into libgit.a for performance, portability reasons,
+and to avoid code duplication.
+
+By now, you know what the index is (and find the corresponding data
+structures in `cache.h`), and that there are just a couple of object types
+(blobs, trees, commits and tags) which inherit their common structure from
+`struct object`, which is their first member (and thus, you can cast e.g.
+`(struct object *)commit` to achieve the _same_ as `&commit->object`, i.e.
+get at the object name and flags).
+
+Now is a good point to take a break to let this information sink in.
+
+Next step: get familiar with the object naming. Read <<naming-commits>>.
+There are quite a few ways to name an object (and not only revisions!).
+All of these are handled in `sha1_name.c`. Just have a quick look at
+the function `get_sha1()`. A lot of the special handling is done by
+functions like `get_sha1_basic()` or the likes.
+
+This is just to get you into the groove for the most libified part of Git:
+the revision walker.
+
+Basically, the initial version of `git log` was a shell script:
+
+----------------------------------------------------------------
+$ git-rev-list --pretty $(git-rev-parse --default HEAD "$@") | \
+ LESS=-S ${PAGER:-less}
+----------------------------------------------------------------
+
+What does this mean?
+
+`git-rev-list` is the original version of the revision walker, which
+_always_ printed a list of revisions to stdout. It is still functional,
+and needs to, since most new Git programs start out as scripts using
+`git-rev-list`.
+
+`git-rev-parse` is not as important any more; it was only used to filter out
+options that were relevant for the different plumbing commands that were
+called by the script.
+
+Most of what `git-rev-list` did is contained in `revision.c` and
+`revision.h`. It wraps the options in a struct named `rev_info`, which
+controls how and what revisions are walked, and more.
+
+The original job of `git-rev-parse` is now taken by the function
+`setup_revisions()`, which parses the revisions and the common command line
+options for the revision walker. This information is stored in the struct
+`rev_info` for later consumption. You can do your own command line option
+parsing after calling `setup_revisions()`. After that, you have to call
+`prepare_revision_walk()` for initialization, and then you can get the
+commits one by one with the function `get_revision()`.
+
+If you are interested in more details of the revision walking process,
+just have a look at the first implementation of `cmd_log()`; call
+`git-show v1.3.0~155^2~4` and scroll down to that function (note that you
+no longer need to call `setup_pager()` directly).
+
+Nowadays, `git log` is a builtin, which means that it is _contained_ in the
+command `git`. The source side of a builtin is
+
+- a function called `cmd_<bla>`, typically defined in `builtin-<bla>.c`,
+ and declared in `builtin.h`,
+
+- an entry in the `commands[]` array in `git.c`, and
+
+- an entry in `BUILTIN_OBJECTS` in the `Makefile`.
+
+Sometimes, more than one builtin is contained in one source file. For
+example, `cmd_whatchanged()` and `cmd_log()` both reside in `builtin-log.c`,
+since they share quite a bit of code. In that case, the commands which are
+_not_ named like the `.c` file in which they live have to be listed in
+`BUILT_INS` in the `Makefile`.
+
+`git log` looks more complicated in C than it does in the original script,
+but that allows for a much greater flexibility and performance.
+
+Here again it is a good point to take a pause.
+
+Lesson three is: study the code. Really, it is the best way to learn about
+the organization of Git (after you know the basic concepts).
+
+So, think about something which you are interested in, say, "how can I
+access a blob just knowing the object name of it?". The first step is to
+find a Git command with which you can do it. In this example, it is either
+`git show` or `git cat-file`.
+
+For the sake of clarity, let's stay with `git cat-file`, because it
+
+- is plumbing, and
+
+- was around even in the initial commit (it literally went only through
+ some 20 revisions as `cat-file.c`, was renamed to `builtin-cat-file.c`
+ when made a builtin, and then saw less than 10 versions).
+
+So, look into `builtin-cat-file.c`, search for `cmd_cat_file()` and look what
+it does.
+
+------------------------------------------------------------------
+ git_config(git_default_config);
+ if (argc != 3)
+ usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
+ if (get_sha1(argv[2], sha1))
+ die("Not a valid object name %s", argv[2]);
+------------------------------------------------------------------
+
+Let's skip over the obvious details; the only really interesting part
+here is the call to `get_sha1()`. It tries to interpret `argv[2]` as an
+object name, and if it refers to an object which is present in the current
+repository, it writes the resulting SHA-1 into the variable `sha1`.
+
+Two things are interesting here:
+
+- `get_sha1()` returns 0 on _success_. This might surprise some new
+ Git hackers, but there is a long tradition in UNIX to return different
+ negative numbers in case of different errors -- and 0 on success.
+
+- the variable `sha1` in the function signature of `get_sha1()` is `unsigned
+ char *`, but is actually expected to be a pointer to `unsigned
+ char[20]`. This variable will contain the 160-bit SHA-1 of the given
+ commit. Note that whenever a SHA-1 is passed as "unsigned char *", it
+ is the binary representation, as opposed to the ASCII representation in
+ hex characters, which is passed as "char *".
+
+You will see both of these things throughout the code.
+
+Now, for the meat:
+
+-----------------------------------------------------------------------------
+ case 0:
+ buf = read_object_with_reference(sha1, argv[1], &size, NULL);
+-----------------------------------------------------------------------------
+
+This is how you read a blob (actually, not only a blob, but any type of
+object). To know how the function `read_object_with_reference()` actually
+works, find the source code for it (something like `git grep
+read_object_with | grep ":[a-z]"` in the git repository), and read
+the source.
+
+To find out how the result can be used, just read on in `cmd_cat_file()`:
+
+-----------------------------------
+ write_or_die(1, buf, size);
+-----------------------------------
+
+Sometimes, you do not know where to look for a feature. In many such cases,
+it helps to search through the output of `git log`, and then `git show` the
+corresponding commit.
+
+Example: If you know that there was some test case for `git bundle`, but
+do not remember where it was (yes, you _could_ `git grep bundle t/`, but that
+does not illustrate the point!):
+
+------------------------
+$ git log --no-merges t/
+------------------------
+
+In the pager (`less`), just search for "bundle", go a few lines back,
+and see that it is in commit 18449ab0... Now just copy this object name,
+and paste it into the command line
+
+-------------------
+$ git show 18449ab0
+-------------------
+
+Voila.
+
+Another example: Find out what to do in order to make some script a
+builtin:
+
+-------------------------------------------------
+$ git log --no-merges --diff-filter=A builtin-*.c
+-------------------------------------------------
+
+You see, Git is actually the best tool to find out about the source of Git
+itself!
+
[[glossary]]
include::glossary.txt[]
--
1.5.1.4.19.g69e2
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
@ 2007-05-14 18:19 Karl Hasselström
2007-05-14 18:39 ` J. Bruce Fields
0 siblings, 1 reply; 18+ messages in thread
From: Karl Hasselström @ 2007-05-14 18:19 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Junio C Hamano, git, Johannes Schindelin
On 2007-05-14 11:21:20 -0400, J. Bruce Fields wrote:
> It includes modifications as suggested by J. Bruce Fields, Karl
> Hasselström and Daniel Barkalow.
Agh! utf8/latin1 confusion! Your mail is in latin1, but you've used
the utf8 byte sequence for my name.
Hmm. Maybe I should keep quiet, so people won't start dropping my name
completely just to get rid of my complaints. :-)
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-14 18:19 [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual Karl Hasselström
@ 2007-05-14 18:39 ` J. Bruce Fields
2007-05-14 18:57 ` Matthieu Moy
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: J. Bruce Fields @ 2007-05-14 18:39 UTC (permalink / raw)
To: Karl Hasselström; +Cc: Junio C Hamano, git, Johannes Schindelin
On Mon, May 14, 2007 at 08:19:43PM +0200, Karl Hasselström wrote:
> On 2007-05-14 11:21:20 -0400, J. Bruce Fields wrote:
>
> > It includes modifications as suggested by J. Bruce Fields, Karl
> > Hasselström and Daniel Barkalow.
>
> Agh! utf8/latin1 confusion! Your mail is in latin1, but you've used
> the utf8 byte sequence for my name.
>
> Hmm. Maybe I should keep quiet, so people won't start dropping my name
> completely just to get rid of my complaints. :-)
No, I appreciate the complaint, I just don't know what to do about
it--as far as I can tell, I've chosen utf-8 everywhere I can: my commits
are in utf-8, and "locale" run from the shell reports everything as
"en_US.UTF-8". But I suspect the problem is on my end somewhere--do I
need to do something to make sure mail I send gets a header identifying
it as utf-8 and not iso-8859-1? I'll investigate some more tonight if I
get the chance; any advice welcomed.
--b.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-14 18:39 ` J. Bruce Fields
@ 2007-05-14 18:57 ` Matthieu Moy
2007-05-14 18:58 ` Karl Hasselström
[not found] ` <20070515042200.GA10884@coredump.intra.peff.net>
2 siblings, 0 replies; 18+ messages in thread
From: Matthieu Moy @ 2007-05-14 18:57 UTC (permalink / raw)
To: git
"J. Bruce Fields" <bfields@fieldses.org> writes:
> Content-Type: text/plain; charset=iso-8859-1
> Content-Disposition: inline
> Content-Transfer-Encoding: 8bit
[...]
> as far as I can tell, I've chosen utf-8 everywhere I can:
Probably except in your mailer's configuration then!
--
Matthieu
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-14 18:39 ` J. Bruce Fields
2007-05-14 18:57 ` Matthieu Moy
@ 2007-05-14 18:58 ` Karl Hasselström
[not found] ` <20070515042200.GA10884@coredump.intra.peff.net>
2 siblings, 0 replies; 18+ messages in thread
From: Karl Hasselström @ 2007-05-14 18:58 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Junio C Hamano, git, Johannes Schindelin
On 2007-05-14 14:39:31 -0400, J. Bruce Fields wrote:
> No, I appreciate the complaint, I just don't know what to do about
> it--as far as I can tell, I've chosen utf-8 everywhere I can: my
> commits are in utf-8, and "locale" run from the shell reports
> everything as "en_US.UTF-8". But I suspect the problem is on my end
> somewhere--do I need to do something to make sure mail I send gets a
> header identifying it as utf-8 and not iso-8859-1? I'll investigate
> some more tonight if I get the chance; any advice welcomed.
Your mail headers include this:
Content-Transfer-Encoding: QUOTED-PRINTABLE
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
but the mail body has this:
It includes modifications as suggested by J. Bruce Fields, Karl
Hasselstr=C3=B6m and Daniel Barkalow.
(That's a two-byte sequence for a single character, which indicates
utf8 and rules out latin1.)
I guess the program that generates the e-mail (git-format-patch?)
thinks it's getting latin1 input, when it's in fact getting utf8
input. This is the exact same error (or rather, the exact same
symptom) that's happened once or twice the last week or so.
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
[not found] ` <20070515042200.GA10884@coredump.intra.peff.net>
@ 2007-05-15 4:50 ` J. Bruce Fields
2007-05-15 5:08 ` Jeff King
0 siblings, 1 reply; 18+ messages in thread
From: J. Bruce Fields @ 2007-05-15 4:50 UTC (permalink / raw)
To: Jeff King; +Cc: Karl Hasselström, git
On Tue, May 15, 2007 at 12:22:00AM -0400, Jeff King wrote:
> Your original mail _does_ claim utf-8 for me. I wonder if Karl's mail is
> getting munged by something along the path (my path is straight from vger to a
> qmail server that I know is doing no munging). The headers I received, for
> reference:
Hm. Yes, so if I send that patch to myself with git-send-email, I see
the same thing as you:
...
> From: "J. Bruce Fields" <bfields@citi.umich.edu>
> To: Junio C Hamano <junkio@cox.net>
> Cc: git@vger.kernel.org,
> Johannes Schindelin <Johannes.Schindelin@gmx.de>
> Subject: [PATCH 01/10] Add a birdview-on-the-source-code section to the user man
> ual
> Date: Mon, 14 May 2007 11:21:20 -0400
> Message-Id: <11791560893572-git-send-email->
> X-Mailer: git-send-email 1.5.1.4.19.g69e2
> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: 8bit
...
But the mail I got through the git list yesterday has some odd stuff in
it:
>From git-owner@vger.kernel.org Mon May 14 11:22:01 2007
Received: from vger.kernel.org ([209.132.176.167])
by fieldses.org with esmtp (Exim 4.67)
(envelope-from <git-owner@vger.kernel.org>)
id 1HncN6-00051C-Mh
for bfields@fieldses.org; Mon, 14 May 2007 11:22:01 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
id S1755729AbXENPVe (ORCPT <rfc822;bfields@fieldses.org>);
Mon, 14 May 2007 11:21:34 -0400
X-Warning: Original message contained 8-bit characters, however during
the SMTP transport session the receiving system did not announce
capability of receiving 8-bit SMTP (RFC 1651-1653), and as this
message does not have MIME headers (RFC 2045-2049) to enable
encoding change, we had very little choice.
X-Warning: We ASSUME it is less harmful to add the MIME headers, and
convert the text to Quoted-Printable, than not to do so,
and to strip the message to 7-bits.. (RFC 1428 Appendix A)
X-Warning: We don't know what character set the user used, thus we had to
write these MIME-headers with our local system default value.
MIME-Version: 1.0
Content-Transfer-Encoding: QUOTED-PRINTABLE
Content-Type: TEXT/PLAIN; charset=ISO-8859-1
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756250AbXENPVc
(ORCPT <rfc822;git-outgoing>); Mon, 14 May 2007 11:21:32 -0400
Received: from mail.fieldses.org ([66.93.2.214]:54954 "EHLO fieldses.org"
rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
id S1755315AbXENPVb (ORCPT <rfc822;git@vger.kernel.org>);
Mon, 14 May 2007 11:21:31 -0400
Received: from bfields by fieldses.org with local (Exim 4.67)
(envelope-from <bfields@fieldses.org>)
id 1HncMb-0004z0-E7; Mon, 14 May 2007 11:21:29 -0400
From: "J. Bruce Fields" <bfields@citi.umich.edu>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org,
Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
Date: Mon, 14 May 2007 11:21:20 -0400
Message-Id: <11791560893572-git-send-email->
X-Mailer: git-send-email 1.5.1.4.19.g69e2
Sender: git-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
Status: RO
Any idea how that happened?
--b.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-15 4:50 ` J. Bruce Fields
@ 2007-05-15 5:08 ` Jeff King
2007-05-15 5:57 ` Jeffrey C. Ollie
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Jeff King @ 2007-05-15 5:08 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Karl Hasselström, git
On Tue, May 15, 2007 at 12:50:44AM -0400, J. Bruce Fields wrote:
> But the mail I got through the git list yesterday has some odd stuff in
> it:
>
> From git-owner@vger.kernel.org Mon May 14 11:22:01 2007
> Received: from vger.kernel.org ([209.132.176.167])
> by fieldses.org with esmtp (Exim 4.67)
> (envelope-from <git-owner@vger.kernel.org>)
> id 1HncN6-00051C-Mh
> for bfields@fieldses.org; Mon, 14 May 2007 11:22:01 -0400
> Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
> id S1755729AbXENPVe (ORCPT <rfc822;bfields@fieldses.org>);
> Mon, 14 May 2007 11:21:34 -0400
> X-Warning: Original message contained 8-bit characters, however during
> the SMTP transport session the receiving system did not announce
> capability of receiving 8-bit SMTP (RFC 1651-1653), and as this
> message does not have MIME headers (RFC 2045-2049) to enable
> encoding change, we had very little choice.
> X-Warning: We ASSUME it is less harmful to add the MIME headers, and
> convert the text to Quoted-Printable, than not to do so,
> and to strip the message to 7-bits.. (RFC 1428 Appendix A)
> X-Warning: We don't know what character set the user used, thus we had to
> write these MIME-headers with our local system default value.
> MIME-Version: 1.0
> Content-Transfer-Encoding: QUOTED-PRINTABLE
> Content-Type: TEXT/PLAIN; charset=ISO-8859-1
Interesting. vger is correct in translating, since your mail server does
_not_ advertise the 8BITMIME extension (even though exim is 8-bit clean,
and could handle it).
However, the content-type is already specified, so it shouldn't need to
rewrite. However, I notice that your original message is missing a
MIME-Version: 1.0 header. My guess is that vger's logic is that without
that header, it can't trust the Content-Type you have provided (and
indeed, not including MIME-Version violates the MIME RFCs, I believe).
I assumed this was a bug in git-send-email, but looking closer, it
doesn't put in any mime information at all! So your sending smtp server
is adding in the content-type header, but it's failing to add the
MIME-Version header, which I think is a bug (I can dig up the RFC
reference if you want).
Arguably, git should be generating the full MIME header-set, since it
knows what actual encoding the message is in.
-Peff
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-15 5:08 ` Jeff King
@ 2007-05-15 5:57 ` Jeffrey C. Ollie
2007-05-15 6:24 ` Jeff King
2007-05-15 8:24 ` Karl Hasselström
2007-05-15 15:24 ` J. Bruce Fields
2 siblings, 1 reply; 18+ messages in thread
From: Jeffrey C. Ollie @ 2007-05-15 5:57 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 399 bytes --]
On Tue, 2007-05-15 at 01:08 -0400, Jeff King wrote:
> Interesting. vger is correct in translating, since your mail server
> does
> _not_ advertise the 8BITMIME extension (even though exim is 8-bit
> clean,
> and could handle it).
Exim can advertise the 8BITMIME extension - it's turned off by default:
http://www.exim.org/exim-html-current/doc/html/spec_html/ch14.html#SECTalomo
Jeff
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-15 5:57 ` Jeffrey C. Ollie
@ 2007-05-15 6:24 ` Jeff King
0 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2007-05-15 6:24 UTC (permalink / raw)
To: Jeffrey C. Ollie; +Cc: git
On Tue, May 15, 2007 at 12:57:53AM -0500, Jeffrey C. Ollie wrote:
> Exim can advertise the 8BITMIME extension - it's turned off by default:
Yes, although turning it on would just paper over the actual problem,
which is that vger is rewritin the content-type header with the wrong
charset. It would fix the problem for Bruce, but not for other
receivers.
The real problem is (I believe) the lack of the MIME-Version header. I
will do a few test messages momentarily (which will unfortunately
require me spamming the list a bit).
-Peff
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-15 5:08 ` Jeff King
2007-05-15 5:57 ` Jeffrey C. Ollie
@ 2007-05-15 8:24 ` Karl Hasselström
2007-05-15 8:55 ` Junio C Hamano
2007-05-15 15:24 ` J. Bruce Fields
2 siblings, 1 reply; 18+ messages in thread
From: Karl Hasselström @ 2007-05-15 8:24 UTC (permalink / raw)
To: Jeff King; +Cc: J. Bruce Fields, git
On 2007-05-15 01:08:08 -0400, Jeff King wrote:
> However, the content-type is already specified, so it shouldn't need
> to rewrite. However, I notice that your original message is missing
> a MIME-Version: 1.0 header. My guess is that vger's logic is that
> without that header, it can't trust the Content-Type you have
> provided (and indeed, not including MIME-Version violates the MIME
> RFCs, I believe).
You know, this rings a bell. I've discovered that a "MIME-Version:
1.0" is needed before. :-)
"stg mail" used to have the same problem, until it was changed to use
the Python e-mail libraries for all that stuff. And since then I
haven't had problems with it.
> I assumed this was a bug in git-send-email, but looking closer, it
> doesn't put in any mime information at all! So your sending smtp
> server is adding in the content-type header, but it's failing to add
> the MIME-Version header, which I think is a bug (I can dig up the
> RFC reference if you want).
>
> Arguably, git should be generating the full MIME header-set, since
> it knows what actual encoding the message is in.
I very much agree.
--
Karl Hasselström, kha@treskal.com
www.treskal.com/kalle
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-15 8:24 ` Karl Hasselström
@ 2007-05-15 8:55 ` Junio C Hamano
2007-05-15 9:57 ` Jeff King
0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2007-05-15 8:55 UTC (permalink / raw)
To: Karl Hasselström; +Cc: Jeff King, J. Bruce Fields, git
Karl Hasselström <kha@treskal.com> writes:
> On 2007-05-15 01:08:08 -0400, Jeff King wrote:
>
>> However, the content-type is already specified, so it shouldn't need
>> to rewrite. However, I notice that your original message is missing
>> a MIME-Version: 1.0 header. My guess is that vger's logic is that
>> without that header, it can't trust the Content-Type you have
>> provided (and indeed, not including MIME-Version violates the MIME
>> RFCs, I believe).
>
> You know, this rings a bell. I've discovered that a "MIME-Version:
> 1.0" is needed before. :-)
>
> "stg mail" used to have the same problem, until it was changed to use
> the Python e-mail libraries for all that stuff. And since then I
> haven't had problems with it.
>
>> I assumed this was a bug in git-send-email, but looking closer, it
>> doesn't put in any mime information at all! So your sending smtp
>> server is adding in the content-type header, but it's failing to add
>> the MIME-Version header, which I think is a bug (I can dig up the
>> RFC reference if you want).
>>
>> Arguably, git should be generating the full MIME header-set, since
>> it knows what actual encoding the message is in.
>
> I very much agree.
If the above statement meand git-send-email by "git" I would
very much agree.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-15 8:55 ` Junio C Hamano
@ 2007-05-15 9:57 ` Jeff King
2007-05-15 18:41 ` Junio C Hamano
0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2007-05-15 9:57 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Karl Hasselström, J. Bruce Fields, git
On Tue, May 15, 2007 at 01:55:19AM -0700, Junio C Hamano wrote:
> >> Arguably, git should be generating the full MIME header-set, since
> >> it knows what actual encoding the message is in.
> > I very much agree.
> If the above statement meand git-send-email by "git" I would
> very much agree.
OK, the lack of a MIME-Version is clearly the problem, based on Karl's
view of the messages I sent. I agree that git-send-email is the right
place to implement this (though the weird partial mime headers are
actually an artifact of Bruce's MTA).
Unfortunately, I don't think we have the encoding information any more
at that point. We can infer how the patch was generated by looking at
the git-config, and that should be right 99% of the time (unless the
patches were generated with a different config, either from another repo
or before some settings were changed).
Junio, can you confirm my understanding that:
- if i18n.logOutputEncoding is set, then we are definitely in that
encoding
- otherwise, if i18n.commitEncoding is set, we should assume commits are
in that encoding (which is just a guess, since they may have been
generated on another config, but it's our best guess)
- otherwise, assume utf-8
If that is OK, I will work up a patch.
Also Junio, it looks like commit 7cbcf4d5 moved parsing of the
--encoding parameter into setup_revisions, but it's still being checked
for in cmd_log_init. Can you confirm that the latter is now superfluous
and can be removed?
-Peff
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-15 5:08 ` Jeff King
2007-05-15 5:57 ` Jeffrey C. Ollie
2007-05-15 8:24 ` Karl Hasselström
@ 2007-05-15 15:24 ` J. Bruce Fields
2007-05-15 15:35 ` Jeff King
2 siblings, 1 reply; 18+ messages in thread
From: J. Bruce Fields @ 2007-05-15 15:24 UTC (permalink / raw)
To: Jeff King; +Cc: Karl Hasselström, git
On Tue, May 15, 2007 at 01:08:08AM -0400, Jeff King wrote:
> However, the content-type is already specified, so it shouldn't need to
> rewrite. However, I notice that your original message is missing a
> MIME-Version: 1.0 header. My guess is that vger's logic is that without
> that header, it can't trust the Content-Type you have provided (and
> indeed, not including MIME-Version violates the MIME RFCs, I believe).
>
> I assumed this was a bug in git-send-email, but looking closer, it
> doesn't put in any mime information at all! So your sending smtp server
> is adding in the content-type header,
Nope...
> but it's failing to add the
> MIME-Version header, which I think is a bug (I can dig up the RFC
> reference if you want).
>
> Arguably, git should be generating the full MIME header-set, since it
> knows what actual encoding the message is in.
... Yes. But actually, the Content-Type header is from
git-format-patch:
$ git format-patch --stdout 12806b^..12806b |head
From 12806b65b0d1faec249002c51b871775dc344a47 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Date: Thu, 10 May 2007 12:36:15 +0200
Subject: [PATCH] Add a birdview-on-the-source-code section to the user
manual
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
In http://thread.gmane.org/gmane.comp.version-control.git/42479,
a birdview on the source code was requested.
So it's a git-format-patch bug?
--b.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-15 15:24 ` J. Bruce Fields
@ 2007-05-15 15:35 ` Jeff King
2007-05-15 18:42 ` Junio C Hamano
0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2007-05-15 15:35 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Karl Hasselström, Junio C Hamano, git
On Tue, May 15, 2007 at 11:24:58AM -0400, J. Bruce Fields wrote:
> ... Yes. But actually, the Content-Type header is from
> git-format-patch:
>
> $ git format-patch --stdout 12806b^..12806b |head
> From 12806b65b0d1faec249002c51b871775dc344a47 Mon Sep 17 00:00:00 2001
> From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> Date: Thu, 10 May 2007 12:36:15 +0200
> Subject: [PATCH] Add a birdview-on-the-source-code section to the user
> manual
> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: 8bit
Ah, interesting. I had checked that, but my test didn't produce those
headers. It seems we only produce them if there are non-ascii characters
in the commit message (and I just checked with an arbitrary commit).
So really, this (totally untested) one-liner should fix it:
diff --git a/commit.c b/commit.c
index 922437f..5669c2f 100644
--- a/commit.c
+++ b/commit.c
@@ -1065,6 +1065,7 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt,
int sz;
char header[512];
const char *header_fmt =
+ "MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=%s\n"
"Content-Transfer-Encoding: 8bit\n";
sz = snprintf(header, sizeof(header), header_fmt,
Providing that nobody objects to sticking that extra header in
format-patch's output (but of course only when we actually have
non-ascii data). It's technically required if we want the output to be a
valid MIME message, but most things are unlikely to care (except vger's
apparently picky MTA).
-Peff
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-15 9:57 ` Jeff King
@ 2007-05-15 18:41 ` Junio C Hamano
2007-05-16 11:15 ` Jeff King
0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2007-05-15 18:41 UTC (permalink / raw)
To: Jeff King; +Cc: Karl Hasselström, J. Bruce Fields, git
Jeff King <peff@peff.net> writes:
> Unfortunately, I don't think we have the encoding information any more
> at that point. We can infer how the patch was generated by looking at
> the git-config, and that should be right 99% of the time (unless the
> patches were generated with a different config, either from another repo
> or before some settings were changed).
>
> Junio, can you confirm my understanding that:
> - if i18n.logOutputEncoding is set, then we are definitely in that
> encoding
> - otherwise, if i18n.commitEncoding is set, we should assume commits are
> in that encoding (which is just a guess, since they may have been
> generated on another config, but it's our best guess)
> - otherwise, assume utf-8
I do not want to break projects whose members consistently use a
single non UTF-8 encoding, and I've been hoping that in such a
use case they should not have to set any of these encoding
configuration. So in that sense I would be somewhat reluctant
to agree with the last one. But I am getting a feeling that it
is a losing battle.
On the patch acceptance side, when we do _not_ have encoding
information and the input does not look like a valid UTF-8, we
assume that the input is latin-1 and convert it to UTF-8, if I
recall correctly. If somebody sent you a patch without encoding
header, and then you are forwarding that patch, not adding
anything ourselves (because we do not know) and let the
receiving end to do that conversion is certainly the best; but
if we _were_ to add anything I would suspect it would be a
better idea to use the same logic to default to latin-1 or
UTF-8. East Asian users may want to raise objections here.
I think it is a reasonable compromise to do it the way you
outlined. Doing it at patch generation time would fix the
ambiguity issues during the step 2, so it might turn out to be
necessary to add the encoding header to format-patch output
after all, but send-email needs to be able to handle messages
that do not have the header anyway, so probably the first step
is to do so in send-email.
When we update format-patch, the ambiguity at step 2 would
disappear. My gut feeling is that adding an extra header to
format-patch output would not break people's workflow nor
scripts (I do not think it would break mine, as I either suck in
only the body of the message to my MUA or use send-email), but I
am not sure.
> Also Junio, it looks like commit 7cbcf4d5 moved parsing of the
> --encoding parameter into setup_revisions, but it's still being checked
> for in cmd_log_init. Can you confirm that the latter is now superfluous
> and can be removed?
Thanks for noticing, and I think you are right. The code parses
the same input and sets the same global variable the same way.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-15 15:35 ` Jeff King
@ 2007-05-15 18:42 ` Junio C Hamano
2007-05-16 11:18 ` Jeff King
0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2007-05-15 18:42 UTC (permalink / raw)
To: Jeff King; +Cc: J. Bruce Fields, Karl Hasselström, git
Jeff King <peff@peff.net> writes:
> On Tue, May 15, 2007 at 11:24:58AM -0400, J. Bruce Fields wrote:
>
>> ... Yes. But actually, the Content-Type header is from
>> git-format-patch:
>>
>> $ git format-patch --stdout 12806b^..12806b |head
>> From 12806b65b0d1faec249002c51b871775dc344a47 Mon Sep 17 00:00:00 2001
>> From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
>> Date: Thu, 10 May 2007 12:36:15 +0200
>> Subject: [PATCH] Add a birdview-on-the-source-code section to the user
>> manual
>> Content-Type: text/plain; charset=utf-8
>> Content-Transfer-Encoding: 8bit
>
> Ah, interesting. I had checked that, but my test didn't produce those
> headers. It seems we only produce them if there are non-ascii characters
> in the commit message (and I just checked with an arbitrary commit).
>
> So really, this (totally untested) one-liner should fix it:
>
> diff --git a/commit.c b/commit.c
> index 922437f..5669c2f 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1065,6 +1065,7 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt,
> int sz;
> char header[512];
> const char *header_fmt =
> + "MIME-Version: 1.0\n"
> "Content-Type: text/plain; charset=%s\n"
> "Content-Transfer-Encoding: 8bit\n";
> sz = snprintf(header, sizeof(header), header_fmt,
>
>
> Providing that nobody objects to sticking that extra header in
> format-patch's output (but of course only when we actually have
> non-ascii data). It's technically required if we want the output to be a
> valid MIME message, but most things are unlikely to care (except vger's
> apparently picky MTA).
Thanks; I think this is a sane thing to do.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-15 18:41 ` Junio C Hamano
@ 2007-05-16 11:15 ` Jeff King
0 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2007-05-16 11:15 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Karl Hasselström, J. Bruce Fields, git
On Tue, May 15, 2007 at 11:41:01AM -0700, Junio C Hamano wrote:
> I do not want to break projects whose members consistently use a
> single non UTF-8 encoding, and I've been hoping that in such a
> use case they should not have to set any of these encoding
> configuration. So in that sense I would be somewhat reluctant
> to agree with the last one. But I am getting a feeling that it
> is a losing battle.
I think that is a good goal, but I think we have already failed, as
git-format-patch generates content-type headers with charset=utf-8
(unless the encoding variables are set up). This code was added last
year around this time (cdd406e38).
It looks like this is squelched in the presence of format.headers
configuration. However, that still means they have to do _something_ to
get it to work right (and I note that the fact that format.headers
squelches MIME headers doesn't seem to be documented anywhere...)
> I think it is a reasonable compromise to do it the way you
> outlined. Doing it at patch generation time would fix the
> ambiguity issues during the step 2, so it might turn out to be
> necessary to add the encoding header to format-patch output
> after all, but send-email needs to be able to handle messages
> that do not have the header anyway, so probably the first step
> is to do so in send-email.
As I noted in my other email, it actually _is_ there already. So the
MIME-Version fix just keeps the status quo, and we've been doing it this
way for a year.
Is it still worth making these guesses in send-email?
> > Also Junio, it looks like commit 7cbcf4d5 moved parsing of the
> > --encoding parameter into setup_revisions, but it's still being checked
> > for in cmd_log_init. Can you confirm that the latter is now superfluous
> > and can be removed?
> Thanks for noticing, and I think you are right. The code parses
> the same input and sets the same global variable the same way.
Well, I wouldn't have noticed it if you hadn't written git-log -S. :) In
case you haven't fixed it yet, here it is in patch form:
-- >8 --
cmd_log_init: remove parsing of --encoding command line parameter
This was moved to the setup_revisions parsing in 7cbcf4d5, so it was
never being triggered.
Signed-off-by: Jeff King <peff@peff.net>
---
diff --git a/builtin-log.c b/builtin-log.c
index 3744712..cebb958 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -60,13 +60,7 @@ static void cmd_log_init(int argc, const char **argv, const char *prefix,
rev->always_show_header = 0;
for (i = 1; i < argc; i++) {
const char *arg = argv[i];
- if (!prefixcmp(arg, "--encoding=")) {
- arg += 11;
- if (strcmp(arg, "none"))
- git_log_output_encoding = xstrdup(arg);
- else
- git_log_output_encoding = "";
- } else if (!strcmp(arg, "--decorate")) {
+ if (!strcmp(arg, "--decorate")) {
if (!decorate)
for_each_ref(add_ref_decoration, NULL);
decorate = 1;
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual
2007-05-15 18:42 ` Junio C Hamano
@ 2007-05-16 11:18 ` Jeff King
0 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2007-05-16 11:18 UTC (permalink / raw)
To: Junio C Hamano; +Cc: J. Bruce Fields, Karl Hasselström, git
On Tue, May 15, 2007 at 11:42:24AM -0700, Junio C Hamano wrote:
> > + "MIME-Version: 1.0\n"
> Thanks; I think this is a sane thing to do.
Do you want me to work up a commit message, or do you just want to
assemble it from my other discussion?
BTW, I also checked for other places where we generate a content-type.
The only other place I found was when we do multipart/mixed
(log-tree.c:209), but we correctly generate the MIME-Version header
there.
-Peff
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2007-05-16 11:19 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-14 18:19 [PATCH 01/10] Add a birdview-on-the-source-code section to the user manual Karl Hasselström
2007-05-14 18:39 ` J. Bruce Fields
2007-05-14 18:57 ` Matthieu Moy
2007-05-14 18:58 ` Karl Hasselström
[not found] ` <20070515042200.GA10884@coredump.intra.peff.net>
2007-05-15 4:50 ` J. Bruce Fields
2007-05-15 5:08 ` Jeff King
2007-05-15 5:57 ` Jeffrey C. Ollie
2007-05-15 6:24 ` Jeff King
2007-05-15 8:24 ` Karl Hasselström
2007-05-15 8:55 ` Junio C Hamano
2007-05-15 9:57 ` Jeff King
2007-05-15 18:41 ` Junio C Hamano
2007-05-16 11:15 ` Jeff King
2007-05-15 15:24 ` J. Bruce Fields
2007-05-15 15:35 ` Jeff King
2007-05-15 18:42 ` Junio C Hamano
2007-05-16 11:18 ` Jeff King
-- strict thread matches above, loose matches on Subject: below --
2007-05-14 15:21 J. Bruce Fields
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).