git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] git-mailinfo fixes for patch munging
@ 2007-03-30 16:18 Don Zickus
  2007-03-30 21:19 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Don Zickus @ 2007-03-30 16:18 UTC (permalink / raw)
  To: git

Don't translate the patch to UTF-8, instead preserve the data as is.  Also
allow overwriting the primary mail headers (addresses Linus's concern).  

I also revert a test case that was included in the original patch.  Now it
makes sense why it was the way it was. :)

Cheers,
Don


diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index d94578c..71b6457 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -294,14 +294,14 @@ static char *header[MAX_HDR_PARSED] = {
 	"From","Subject","Date",
 };
 
-static int check_header(char *line, char **hdr_data)
+static int check_header(char *line, char **hdr_data, int overwrite)
 {
 	int i;
 
 	/* search for the interesting parts */
 	for (i = 0; header[i]; i++) {
 		int len = strlen(header[i]);
-		if (!hdr_data[i] &&
+		if ((!hdr_data[i] || overwrite) &&
 		    !strncasecmp(line, header[i], len) &&
 		    line[len] == ':' && isspace(line[len + 1])) {
 			/* Unwrap inline B and Q encoding, and optionally
@@ -614,6 +614,7 @@ static int find_boundary(void)
 
 static int handle_boundary(void)
 {
+	char newline[]="\n";
 again:
 	if (!memcmp(line+content_top->boundary_len, "--", 2)) {
 		/* we hit an end boundary */
@@ -628,7 +629,7 @@ again:
 					"can't recover\n");
 			exit(1);
 		}
-		handle_filter("\n");
+		handle_filter(newline);
 
 		/* skip to the next boundary */
 		if (!find_boundary())
@@ -643,7 +644,7 @@ again:
 
 	/* slurp in this section's info */
 	while (read_one_header_line(line, sizeof(line), fin))
-		check_header(line, p_hdr_data);
+		check_header(line, p_hdr_data, 1);
 
 	/* eat the blank line after section info */
 	return (fgets(line, sizeof(line), fin) != NULL);
@@ -699,10 +700,14 @@ static int handle_commit_msg(char *line)
 			if (!*cp)
 				return 0;
 		}
-		if ((still_looking = check_header(cp, s_hdr_data)) != 0)
+		if ((still_looking = check_header(cp, s_hdr_data, 0)) != 0)
 			return 0;
 	}
 
+	/* normalize the log message to UTF-8. */
+	if (metainfo_charset)
+		convert_to_utf8(line, charset);
+
 	if (patchbreak(line)) {
 		fclose(cmitmsg);
 		cmitmsg = NULL;
@@ -767,12 +772,8 @@ static void handle_body(void)
 				return;
 		}
 
-		/* Unwrap transfer encoding and optionally
-		 * normalize the log message to UTF-8.
-		 */
+		/* Unwrap transfer encoding */
 		decode_transfer_encoding(line);
-		if (metainfo_charset)
-			convert_to_utf8(line, charset);
 
 		switch (transfer_encoding) {
 		case TE_BASE64:
@@ -875,7 +876,7 @@ int mailinfo(FILE *in, FILE *out, int ks, const char *encoding,
 
 	/* process the email header */
 	while (read_one_header_line(line, sizeof(line), fin))
-		check_header(line, p_hdr_data);
+		check_header(line, p_hdr_data, 1);
 
 	handle_body();
 	handle_info();
diff --git a/t/t5100/patch0005 b/t/t5100/patch0005
index e7d6f66..7d24b24 100644
--- a/t/t5100/patch0005
+++ b/t/t5100/patch0005
@@ -61,7 +61,7 @@ diff --git a/git-cvsimport-script b/git-cvsimport-script
  		push(@old,$fn);
 
 -- 
-David KÃ¥gedal
+David Kågedal
 -
 To unsubscribe from this list: send the line "unsubscribe git" in
 the body of a message to majordomo@vger.kernel.org

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] git-mailinfo fixes for patch munging
  2007-03-30 16:18 [PATCH] git-mailinfo fixes for patch munging Don Zickus
@ 2007-03-30 21:19 ` Junio C Hamano
  2007-03-30 21:32   ` Don Zickus
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2007-03-30 21:19 UTC (permalink / raw)
  To: Don Zickus; +Cc: git

Don Zickus <dzickus@redhat.com> writes:

> Don't translate the patch to UTF-8, instead preserve the data as is.  Also
> allow overwriting the primary mail headers (addresses Linus's concern).  
>
> I also revert a test case that was included in the original patch.  Now it
> makes sense why it was the way it was. :)
>
> Cheers,
> Don

Thanks.  Sign-off would have been nice.

> diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
> index d94578c..71b6457 100644
> --- a/builtin-mailinfo.c
> +++ b/builtin-mailinfo.c
> @@ -294,14 +294,14 @@ static char *header[MAX_HDR_PARSED] = {
>  	"From","Subject","Date",
>  };
>  
> -static int check_header(char *line, char **hdr_data)
> +static int check_header(char *line, char **hdr_data, int overwrite)
>  {
>  	int i;
>  
>  	/* search for the interesting parts */
>  	for (i = 0; header[i]; i++) {
>  		int len = strlen(header[i]);
> -		if (!hdr_data[i] &&
> +		if ((!hdr_data[i] || overwrite) &&
>  		    !strncasecmp(line, header[i], len) &&
>  		    line[len] == ':' && isspace(line[len + 1])) {
>  			/* Unwrap inline B and Q encoding, and optionally

This check_header is called from each multi-part boundary with
overwrite=1, so if you have two parts and you have From: or
Subject: in the multi-part header (not in-body), wouldn't they
overwrite what we already have?  That is not desired, I would
think.

For non multi-part case, what we traditionally have done is:

	* Take Subject:, Date:, and From: from RFC2822 headers
          to prime the title and authorship information.

	* The first lines of the body of the message (i.e. after
          the blank line that separates 2822 headers and the
          body) can look like the above header lines to
          override.

	* A line that does not look like an overriding in-body
          header line is the first line of the commit log
          message.  After that, nothing is taken as an
          overriding in-body header.

For a multi-part, I think we only processed the first part as
the commit log message, potentially starting with the overriding
in-body headers.  In other words, in-body headers are what the
user *types* to override what the MUA says in RFC2822 headers.
As the stuff that follow the multi-part boundary (like
content-type and transfer encoding) are of the MUA kind, I
suspect we do not want it to override what the sender said in
the earlier parts of the message.

> @@ -614,6 +614,7 @@ static int find_boundary(void)
>  
>  static int handle_boundary(void)
>  {
> +	char newline[]="\n";
>  again:
>  	if (!memcmp(line+content_top->boundary_len, "--", 2)) {
>  		/* we hit an end boundary */
> @@ -628,7 +629,7 @@ again:
>  					"can't recover\n");
>  			exit(1);
>  		}
> -		handle_filter("\n");
> +		handle_filter(newline);
>  
>  		/* skip to the next boundary */
>  		if (!find_boundary())

These two hunks certainly do not hurt, but why?  Is this about
the constness of the first parameter to handle_filter() and its
call chain?

Having said that, the result of the patch is much better.  

In fact, I couldn't "git am" this patch (the part that reverts
the test vector) with the current tip of 'master' because of the
breakage you are fixing with it ;-).

Now I can.  So I'd probably take this patch for now.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] git-mailinfo fixes for patch munging
  2007-03-30 21:19 ` Junio C Hamano
@ 2007-03-30 21:32   ` Don Zickus
  0 siblings, 0 replies; 3+ messages in thread
From: Don Zickus @ 2007-03-30 21:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Mar 30, 2007 at 02:19:42PM -0700, Junio C Hamano wrote:
> Don Zickus <dzickus@redhat.com> writes:
> 
> > Don't translate the patch to UTF-8, instead preserve the data as is.  Also
> > allow overwriting the primary mail headers (addresses Linus's concern).  
> >
> > I also revert a test case that was included in the original patch.  Now it
> > makes sense why it was the way it was. :)
> >
> > Cheers,
> > Don
> 
> Thanks.  Sign-off would have been nice.

Doh. Sorry.  Should I repost with fix below?

> This check_header is called from each multi-part boundary with
> overwrite=1, so if you have two parts and you have From: or
> Subject: in the multi-part header (not in-body), wouldn't they
> overwrite what we already have?  That is not desired, I would
> think.

Hmm.  I guess I never thought about that case.  You are right, that check
can be changed to a zero (because the rfc2822 are checked elsewhere).

> > @@ -614,6 +614,7 @@ static int find_boundary(void)
> >  
> >  static int handle_boundary(void)
> >  {
> > +	char newline[]="\n";
> >  again:
> >  	if (!memcmp(line+content_top->boundary_len, "--", 2)) {
> >  		/* we hit an end boundary */
> > @@ -628,7 +629,7 @@ again:
> >  					"can't recover\n");
> >  			exit(1);
> >  		}
> > -		handle_filter("\n");
> > +		handle_filter(newline);
> >  
> >  		/* skip to the next boundary */
> >  		if (!find_boundary())
> 
> These two hunks certainly do not hurt, but why?  Is this about
> the constness of the first parameter to handle_filter() and its
> call chain?

Yeah, I SEGFAULT'd when trying to convert_to_utf8() a fixed string. :-)

Cheers,
Don

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2007-03-30 21:34 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-30 16:18 [PATCH] git-mailinfo fixes for patch munging Don Zickus
2007-03-30 21:19 ` Junio C Hamano
2007-03-30 21:32   ` Don Zickus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).