[PATCH v2 0/6] clean up author parsing

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/6] clean up author parsing
@ 2014-08-27  7:55 Jeff King
  2014-08-27  7:56 ` [PATCH v2 1/6] commit: provide a function to find a header in a buffer Jeff King
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Jeff King @ 2014-08-27  7:55 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Erik Faye-Lund

This is a re-roll of my series from June:

  http://thread.gmane.org/gmane.comp.version-control.git/252055

The goal is to clean up some ad-hoc parsing code. Besides reusing code,
this fixes some memory leaks, fixes some bad pointer math in
determine_author_info, and protects us against getenv() implementations
that use a static buffer.

This iteration incorporates feedback from the Eri[ck]s (particularly the
last patch, which is much simplified). In addition, I dropped the patch
that converted split_ident's pairs of pointers into a struct. It caused
quite a bit of noise through the code base, and only saved us a few
lines in the end (actually no lines -- it just made a few lines
shorter).

The original was also based on another pending series in 'next', which
has since graduated to master. This one is based directly on master.

  [1/6]: commit: provide a function to find a header in a buffer
  [2/6]: record_author_info: fix memory leak on malformed commit
  [3/6]: record_author_info: use find_commit_header
  [4/6]: use strbufs in date functions
  [5/6]: determine_author_info: reuse parsing functions
  [6/6]: determine_author_info: copy getenv output

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/6] commit: provide a function to find a header in a buffer
  2014-08-27  7:55 [PATCH v2 0/6] clean up author parsing Jeff King
@ 2014-08-27  7:56 ` Jeff King
  2014-08-27 17:30   ` Junio C Hamano
  2014-08-27  7:56 ` [PATCH v2 2/6] record_author_info: fix memory leak on malformed commit Jeff King
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2014-08-27  7:56 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Erik Faye-Lund

Usually when we parse a commit, we read it line by line and
handle each individual line (e.g., parse_commit and
parse_commit_header).  Sometimes, however, we only care
about extracting a single header. Code in this situation is
stuck doing an ad-hoc parse of the commit buffer.

Let's provide a reusable function to locate a header within
the commit.  The code is modeled after pretty.c's
get_header, which is used to extract the encoding.

Since some callers may not have the "struct commit" to go
along with the buffer, we drop that parameter.  The only
thing lost is a warning for truncated commits, but that's
OK.  This shouldn't happen in practice, and even if it does,
there's no particular reason that this function needs to
complain about it. It either finds the header it was asked
for, or it doesn't (and in the latter case, the caller will
typically complain).

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c | 22 ++++++++++++++++++++++
 commit.h | 11 +++++++++++
 pretty.c | 33 ++++++---------------------------
 3 files changed, 39 insertions(+), 27 deletions(-)

diff --git a/commit.c b/commit.c
index ae7f2b1..b6ffd62 100644
--- a/commit.c
+++ b/commit.c
@@ -1660,3 +1660,25 @@ void print_commit_list(struct commit_list *list,
 		printf(format, sha1_to_hex(list->item->object.sha1));
 	}
 }
+
+const char *find_commit_header(const char *msg, const char *key, size_t *out_len)
+{
+	int key_len = strlen(key);
+	const char *line = msg;
+
+	while (line) {
+		const char *eol = strchrnul(line, '\n');
+
+		if (line == eol)
+			return NULL;
+
+		if (eol - line > key_len &&
+		    !strncmp(line, key, key_len) &&
+		    line[key_len] == ' ') {
+			*out_len = eol - line - key_len - 1;
+			return line + key_len + 1;
+		}
+		line = *eol ? eol + 1 : NULL;
+	}
+	return NULL;
+}
diff --git a/commit.h b/commit.h
index a8cbf52..12f4fe1 100644
--- a/commit.h
+++ b/commit.h
@@ -313,6 +313,17 @@ extern struct commit_extra_header *read_commit_extra_headers(struct commit *, co
 
 extern void free_commit_extra_headers(struct commit_extra_header *extra);
 
+/*
+ * Search the commit object contents given by "msg" for the header "key".
+ * Returns a pointer to the start of the header contents, or NULL. The length
+ * of the header, up to the first newline, is returned via out_len.
+ *
+ * Note that some headers (like mergetag) may be multi-line. It is the caller's
+ * responsibility to parse further in this case!
+ */
+extern const char *find_commit_header(const char *msg, const char *key,
+				      size_t *out_len);
+
 typedef void (*each_mergetag_fn)(struct commit *commit, struct commit_extra_header *extra,
 				 void *cb_data);
 
diff --git a/pretty.c b/pretty.c
index 3a1da6f..3a70557 100644
--- a/pretty.c
+++ b/pretty.c
@@ -547,31 +547,11 @@ static void add_merge_info(const struct pretty_print_context *pp,
 	strbuf_addch(sb, '\n');
 }
 
-static char *get_header(const struct commit *commit, const char *msg,
-			const char *key)
+static char *get_header(const char *msg, const char *key)
 {
-	int key_len = strlen(key);
-	const char *line = msg;
-
-	while (line) {
-		const char *eol = strchrnul(line, '\n'), *next;
-
-		if (line == eol)
-			return NULL;
-		if (!*eol) {
-			warning("malformed commit (header is missing newline): %s",
-				sha1_to_hex(commit->object.sha1));
-			next = NULL;
-		} else
-			next = eol + 1;
-		if (eol - line > key_len &&
-		    !strncmp(line, key, key_len) &&
-		    line[key_len] == ' ') {
-			return xmemdupz(line + key_len + 1, eol - line - key_len - 1);
-		}
-		line = next;
-	}
-	return NULL;
+	size_t len;
+	const char *v = find_commit_header(msg, key, &len);
+	return v ? xmemdupz(v, len) : NULL;
 }
 
 static char *replace_encoding_header(char *buf, const char *encoding)
@@ -617,11 +597,10 @@ const char *logmsg_reencode(const struct commit *commit,
 
 	if (!output_encoding || !*output_encoding) {
 		if (commit_encoding)
-			*commit_encoding =
-				get_header(commit, msg, "encoding");
+			*commit_encoding = get_header(msg, "encoding");
 		return msg;
 	}
-	encoding = get_header(commit, msg, "encoding");
+	encoding = get_header(msg, "encoding");
 	if (commit_encoding)
 		*commit_encoding = encoding;
 	use_encoding = encoding ? encoding : utf8;
-- 
2.1.0.346.ga0367b9

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/6] record_author_info: fix memory leak on malformed commit
  2014-08-27  7:55 [PATCH v2 0/6] clean up author parsing Jeff King
  2014-08-27  7:56 ` [PATCH v2 1/6] commit: provide a function to find a header in a buffer Jeff King
@ 2014-08-27  7:56 ` Jeff King
  2014-08-27  7:56 ` [PATCH v2 3/6] record_author_info: use find_commit_header Jeff King
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2014-08-27  7:56 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Erik Faye-Lund

If we hit the end-of-header without finding an "author"
line, we just return from the function. We should jump to
the fail_exit path to clean up the buffer that we may have
allocated.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index b6ffd62..4ff8c66 100644
--- a/commit.c
+++ b/commit.c
@@ -594,7 +594,7 @@ static void record_author_date(struct author_date_slab *author_date,
 		line_end = strchrnul(buf, '\n');
 		if (!skip_prefix(buf, "author ", &ident_line)) {
 			if (!line_end[0] || line_end[1] == '\n')
-				return; /* end of header */
+				goto fail_exit; /* end of header */
 			continue;
 		}
 		if (split_ident_line(&ident,
-- 
2.1.0.346.ga0367b9

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/6] record_author_info: use find_commit_header
  2014-08-27  7:55 [PATCH v2 0/6] clean up author parsing Jeff King
  2014-08-27  7:56 ` [PATCH v2 1/6] commit: provide a function to find a header in a buffer Jeff King
  2014-08-27  7:56 ` [PATCH v2 2/6] record_author_info: fix memory leak on malformed commit Jeff King
@ 2014-08-27  7:56 ` Jeff King
  2014-08-27  7:57 ` [PATCH v2 4/6] use strbufs in date functions Jeff King
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2014-08-27  7:56 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Erik Faye-Lund

This saves us some manual parsing and makes the code more
readable.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/commit.c b/commit.c
index 4ff8c66..9416d84 100644
--- a/commit.c
+++ b/commit.c
@@ -584,25 +584,19 @@ define_commit_slab(author_date_slab, unsigned long);
 static void record_author_date(struct author_date_slab *author_date,
 			       struct commit *commit)
 {
-	const char *buf, *line_end, *ident_line;
 	const char *buffer = get_commit_buffer(commit, NULL);
 	struct ident_split ident;
+	const char *ident_line;
+	size_t ident_len;
 	char *date_end;
 	unsigned long date;
 
-	for (buf = buffer; buf; buf = line_end + 1) {
-		line_end = strchrnul(buf, '\n');
-		if (!skip_prefix(buf, "author ", &ident_line)) {
-			if (!line_end[0] || line_end[1] == '\n')
-				goto fail_exit; /* end of header */
-			continue;
-		}
-		if (split_ident_line(&ident,
-				     ident_line, line_end - ident_line) ||
-		    !ident.date_begin || !ident.date_end)
-			goto fail_exit; /* malformed "author" line */
-		break;
-	}
+	ident_line = find_commit_header(buffer, "author", &ident_len);
+	if (!ident_line)
+		goto fail_exit; /* no author line */
+	if (split_ident_line(&ident, ident_line, ident_len) ||
+	    !ident.date_begin || !ident.date_end)
+		goto fail_exit; /* malformed "author" line */
 
 	date = strtoul(ident.date_begin, &date_end, 10);
 	if (date_end != ident.date_end)
-- 
2.1.0.346.ga0367b9

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/6] use strbufs in date functions
  2014-08-27  7:55 [PATCH v2 0/6] clean up author parsing Jeff King
                   ` (2 preceding siblings ...)
  2014-08-27  7:56 ` [PATCH v2 3/6] record_author_info: use find_commit_header Jeff King
@ 2014-08-27  7:57 ` Jeff King
  2014-08-27  7:57 ` [PATCH v2 5/6] determine_author_info: reuse parsing functions Jeff King
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2014-08-27  7:57 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Erik Faye-Lund

Many of the date functions write into fixed-size buffers.
This is a minor pain, as we have to take special
precautions, and frequently end up copying the result into a
strbuf or heap-allocated buffer anyway (for which we
sometimes use strcpy!).

Let's instead teach parse_date, datestamp, etc to write to a
strbuf. The obvious downside is that we might need to
perform a heap allocation where we otherwise would not need
to. However, it turns out that the only two new allocations
required are:

  1. In test-date.c, where we don't care about efficiency.

  2. In determine_author_info, which is not performance
     critical (and where the use of a strbuf will help later
     refactoring).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/commit.c | 20 ++++++++++----------
 cache.h          |  4 ++--
 date.c           | 13 +++++++------
 fast-import.c    | 20 +++++++++-----------
 ident.c          | 26 +++++++++++---------------
 test-date.c      | 10 ++++++----
 6 files changed, 45 insertions(+), 48 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 5ed6036..8da0a9f 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -520,19 +520,16 @@ static int sane_ident_split(struct ident_split *person)
 	return 1;
 }
 
-static int parse_force_date(const char *in, char *out, int len)
+static int parse_force_date(const char *in, struct strbuf *out)
 {
-	if (len < 1)
-		return -1;
-	*out++ = '@';
-	len--;
+	strbuf_addch(out, '@');
 
-	if (parse_date(in, out, len) < 0) {
+	if (parse_date(in, out) < 0) {
 		int errors = 0;
 		unsigned long t = approxidate_careful(in, &errors);
 		if (errors)
 			return -1;
-		snprintf(out, len, "%lu", t);
+		strbuf_addf(out, "%lu", t);
 	}
 
 	return 0;
@@ -542,7 +539,7 @@ static void determine_author_info(struct strbuf *author_ident)
 {
 	char *name, *email, *date;
 	struct ident_split author;
-	char date_buf[64];
+	struct strbuf date_buf = STRBUF_INIT;
 
 	name = getenv("GIT_AUTHOR_NAME");
 	email = getenv("GIT_AUTHOR_EMAIL");
@@ -588,9 +585,10 @@ static void determine_author_info(struct strbuf *author_ident)
 	}
 
 	if (force_date) {
-		if (parse_force_date(force_date, date_buf, sizeof(date_buf)))
+		strbuf_reset(&date_buf);
+		if (parse_force_date(force_date, &date_buf))
 			die(_("invalid date format: %s"), force_date);
-		date = date_buf;
+		date = date_buf.buf;
 	}
 
 	strbuf_addstr(author_ident, fmt_ident(name, email, date, IDENT_STRICT));
@@ -600,6 +598,8 @@ static void determine_author_info(struct strbuf *author_ident)
 		export_one("GIT_AUTHOR_EMAIL", author.mail_begin, author.mail_end, 0);
 		export_one("GIT_AUTHOR_DATE", author.date_begin, author.tz_end, '@');
 	}
+
+	strbuf_release(&date_buf);
 }
 
 static void split_ident_or_die(struct ident_split *id, const struct strbuf *buf)
diff --git a/cache.h b/cache.h
index fcb511d..96ecc1f 100644
--- a/cache.h
+++ b/cache.h
@@ -1044,10 +1044,10 @@ enum date_mode {
 const char *show_date(unsigned long time, int timezone, enum date_mode mode);
 void show_date_relative(unsigned long time, int tz, const struct timeval *now,
 			struct strbuf *timebuf);
-int parse_date(const char *date, char *buf, int bufsize);
+int parse_date(const char *date, struct strbuf *out);
 int parse_date_basic(const char *date, unsigned long *timestamp, int *offset);
 int parse_expiry_date(const char *date, unsigned long *timestamp);
-void datestamp(char *buf, int bufsize);
+void datestamp(struct strbuf *out);
 #define approxidate(s) approxidate_careful((s), NULL)
 unsigned long approxidate_careful(const char *, int *);
 unsigned long approxidate_relative(const char *date, const struct timeval *now);
diff --git a/date.c b/date.c
index 782de95..2c33468 100644
--- a/date.c
+++ b/date.c
@@ -605,7 +605,7 @@ static int match_tz(const char *date, int *offp)
 	return end - date;
 }
 
-static int date_string(unsigned long date, int offset, char *buf, int len)
+static void date_string(unsigned long date, int offset, struct strbuf *buf)
 {
 	int sign = '+';
 
@@ -613,7 +613,7 @@ static int date_string(unsigned long date, int offset, char *buf, int len)
 		offset = -offset;
 		sign = '-';
 	}
-	return snprintf(buf, len, "%lu %c%02d%02d", date, sign, offset / 60, offset % 60);
+	strbuf_addf(buf, "%lu %c%02d%02d", date, sign, offset / 60, offset % 60);
 }
 
 /*
@@ -735,13 +735,14 @@ int parse_expiry_date(const char *date, unsigned long *timestamp)
 	return errors;
 }
 
-int parse_date(const char *date, char *result, int maxlen)
+int parse_date(const char *date, struct strbuf *result)
 {
 	unsigned long timestamp;
 	int offset;
 	if (parse_date_basic(date, &timestamp, &offset))
 		return -1;
-	return date_string(timestamp, offset, result, maxlen);
+	date_string(timestamp, offset, result);
+	return 0;
 }
 
 enum date_mode parse_date_format(const char *format)
@@ -766,7 +767,7 @@ enum date_mode parse_date_format(const char *format)
 		die("unknown date format %s", format);
 }
 
-void datestamp(char *buf, int bufsize)
+void datestamp(struct strbuf *out)
 {
 	time_t now;
 	int offset;
@@ -776,7 +777,7 @@ void datestamp(char *buf, int bufsize)
 	offset = tm_to_time_t(localtime(&now)) - now;
 	offset /= 60;
 
-	date_string(now, offset, buf, bufsize);
+	date_string(now, offset, out);
 }
 
 /*
diff --git a/fast-import.c b/fast-import.c
index d73f58c..dc9f7a8 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1971,7 +1971,7 @@ static int parse_data(struct strbuf *sb, uintmax_t limit, uintmax_t *len_res)
 	return 1;
 }
 
-static int validate_raw_date(const char *src, char *result, int maxlen)
+static int validate_raw_date(const char *src, struct strbuf *result)
 {
 	const char *orig_src = src;
 	char *endp;
@@ -1989,11 +1989,10 @@ static int validate_raw_date(const char *src, char *result, int maxlen)
 		return -1;
 
 	num = strtoul(src + 1, &endp, 10);
-	if (errno || endp == src + 1 || *endp || (endp - orig_src) >= maxlen ||
-	    1400 < num)
+	if (errno || endp == src + 1 || *endp || 1400 < num)
 		return -1;
 
-	strcpy(result, orig_src);
+	strbuf_addstr(result, orig_src);
 	return 0;
 }
 
@@ -2001,7 +2000,7 @@ static char *parse_ident(const char *buf)
 {
 	const char *ltgt;
 	size_t name_len;
-	char *ident;
+	struct strbuf ident = STRBUF_INIT;
 
 	/* ensure there is a space delimiter even if there is no name */
 	if (*buf == '<')
@@ -2020,26 +2019,25 @@ static char *parse_ident(const char *buf)
 		die("Missing space after > in ident string: %s", buf);
 	ltgt++;
 	name_len = ltgt - buf;
-	ident = xmalloc(name_len + 24);
-	strncpy(ident, buf, name_len);
+	strbuf_add(&ident, buf, name_len);
 
 	switch (whenspec) {
 	case WHENSPEC_RAW:
-		if (validate_raw_date(ltgt, ident + name_len, 24) < 0)
+		if (validate_raw_date(ltgt, &ident) < 0)
 			die("Invalid raw date \"%s\" in ident: %s", ltgt, buf);
 		break;
 	case WHENSPEC_RFC2822:
-		if (parse_date(ltgt, ident + name_len, 24) < 0)
+		if (parse_date(ltgt, &ident) < 0)
 			die("Invalid rfc2822 date \"%s\" in ident: %s", ltgt, buf);
 		break;
 	case WHENSPEC_NOW:
 		if (strcmp("now", ltgt))
 			die("Date in ident must be 'now': %s", buf);
-		datestamp(ident + name_len, 24);
+		datestamp(&ident);
 		break;
 	}
 
-	return ident;
+	return strbuf_detach(&ident, NULL);
 }
 
 static void parse_and_store_blob(
diff --git a/ident.c b/ident.c
index 1d9b6e7..9bcc4e1 100644
--- a/ident.c
+++ b/ident.c
@@ -9,7 +9,7 @@
 
 static struct strbuf git_default_name = STRBUF_INIT;
 static struct strbuf git_default_email = STRBUF_INIT;
-static char git_default_date[50];
+static struct strbuf git_default_date = STRBUF_INIT;
 
 #define IDENT_NAME_GIVEN 01
 #define IDENT_MAIL_GIVEN 02
@@ -129,9 +129,9 @@ const char *ident_default_email(void)
 
 static const char *ident_default_date(void)
 {
-	if (!git_default_date[0])
-		datestamp(git_default_date, sizeof(git_default_date));
-	return git_default_date;
+	if (!git_default_date.len)
+		datestamp(&git_default_date);
+	return git_default_date.buf;
 }
 
 static int crud(unsigned char c)
@@ -292,7 +292,6 @@ const char *fmt_ident(const char *name, const char *email,
 		      const char *date_str, int flag)
 {
 	static struct strbuf ident = STRBUF_INIT;
-	char date[50];
 	int strict = (flag & IDENT_STRICT);
 	int want_date = !(flag & IDENT_NO_DATE);
 	int want_name = !(flag & IDENT_NO_NAME);
@@ -320,15 +319,6 @@ const char *fmt_ident(const char *name, const char *email,
 		die("unable to auto-detect email address (got '%s')", email);
 	}
 
-	if (want_date) {
-		if (date_str && date_str[0]) {
-			if (parse_date(date_str, date, sizeof(date)) < 0)
-				die("invalid date format: %s", date_str);
-		}
-		else
-			strcpy(date, ident_default_date());
-	}
-
 	strbuf_reset(&ident);
 	if (want_name) {
 		strbuf_addstr_without_crud(&ident, name);
@@ -339,8 +329,14 @@ const char *fmt_ident(const char *name, const char *email,
 			strbuf_addch(&ident, '>');
 	if (want_date) {
 		strbuf_addch(&ident, ' ');
-		strbuf_addstr_without_crud(&ident, date);
+		if (date_str && date_str[0]) {
+			if (parse_date(date_str, &ident) < 0)
+				die("invalid date format: %s", date_str);
+		}
+		else
+			strbuf_addstr(&ident, ident_default_date());
 	}
+
 	return ident.buf;
 }
 
diff --git a/test-date.c b/test-date.c
index 10afaab..94a6997 100644
--- a/test-date.c
+++ b/test-date.c
@@ -19,19 +19,21 @@ static void show_dates(char **argv, struct timeval *now)
 
 static void parse_dates(char **argv, struct timeval *now)
 {
+	struct strbuf result = STRBUF_INIT;
+
 	for (; *argv; argv++) {
-		char result[100];
 		unsigned long t;
 		int tz;
 
-		result[0] = 0;
-		parse_date(*argv, result, sizeof(result));
-		if (sscanf(result, "%lu %d", &t, &tz) == 2)
+		strbuf_reset(&result);
+		parse_date(*argv, &result);
+		if (sscanf(result.buf, "%lu %d", &t, &tz) == 2)
 			printf("%s -> %s\n",
 			       *argv, show_date(t, tz, DATE_ISO8601));
 		else
 			printf("%s -> bad\n", *argv);
 	}
+	strbuf_release(&result);
 }
 
 static void parse_approxidate(char **argv, struct timeval *now)
-- 
2.1.0.346.ga0367b9

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 5/6] determine_author_info: reuse parsing functions
  2014-08-27  7:55 [PATCH v2 0/6] clean up author parsing Jeff King
                   ` (3 preceding siblings ...)
  2014-08-27  7:57 ` [PATCH v2 4/6] use strbufs in date functions Jeff King
@ 2014-08-27  7:57 ` Jeff King
  2014-08-27  7:57 ` [PATCH v2 6/6] determine_author_info: copy getenv output Jeff King
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2014-08-27  7:57 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Erik Faye-Lund

Rather than parsing the header manually to find the "author"
field, and then parsing its sub-parts, let's use
find_commit_header and split_ident_line. This is shorter and
easier to read, and should do a more careful parsing job.

For example, the current parser could find the end-of-email
right-bracket across a newline (for a malformed commit), and
calculate a bogus gigantic length for the date (by using
"eol - rb").

As a bonus, this also plugs a memory leak when we pull the
date field from an existing commit (we still leak the name
and email buffers, which will be fixed in a later commit).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/commit.c | 51 ++++++++++++++++++++++-----------------------------
 1 file changed, 22 insertions(+), 29 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 8da0a9f..57b046b 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -546,42 +546,35 @@ static void determine_author_info(struct strbuf *author_ident)
 	date = getenv("GIT_AUTHOR_DATE");
 
 	if (author_message) {
-		const char *a, *lb, *rb, *eol;
-		size_t len;
+		struct ident_split ident;
+		unsigned long len;
+		const char *a;
 
-		a = strstr(author_message_buffer, "\nauthor ");
+		a = find_commit_header(author_message_buffer, "author", &len);
 		if (!a)
-			die(_("invalid commit: %s"), author_message);
-
-		lb = strchrnul(a + strlen("\nauthor "), '<');
-		rb = strchrnul(lb, '>');
-		eol = strchrnul(rb, '\n');
-		if (!*lb || !*rb || !*eol)
-			die(_("invalid commit: %s"), author_message);
-
-		if (lb == a + strlen("\nauthor "))
-			/* \nauthor <foo@example.com> */
-			name = xcalloc(1, 1);
-		else
-			name = xmemdupz(a + strlen("\nauthor "),
-					(lb - strlen(" ") -
-					 (a + strlen("\nauthor "))));
-		email = xmemdupz(lb + strlen("<"), rb - (lb + strlen("<")));
-		len = eol - (rb + strlen("> "));
-		date = xmalloc(len + 2);
-		*date = '@';
-		memcpy(date + 1, rb + strlen("> "), len);
-		date[len + 1] = '\0';
+			die(_("commit '%s' lacks author header"), author_message);
+		if (split_ident_line(&ident, a, len) < 0)
+			die(_("commit '%s' has malformed author line"), author_message);
+
+		name = xmemdupz(ident.name_begin, ident.name_end - ident.name_begin);
+		email = xmemdupz(ident.mail_begin, ident.mail_end - ident.mail_begin);
+		if (ident.date_begin) {
+			strbuf_reset(&date_buf);
+			strbuf_addch(&date_buf, '@');
+			strbuf_add(&date_buf, ident.date_begin, ident.date_end - ident.date_begin);
+			strbuf_addch(&date_buf, ' ');
+			strbuf_add(&date_buf, ident.tz_begin, ident.tz_end - ident.tz_begin);
+			date = date_buf.buf;
+		}
 	}
 
 	if (force_author) {
-		const char *lb = strstr(force_author, " <");
-		const char *rb = strchr(force_author, '>');
+		struct ident_split ident;
 
-		if (!lb || !rb)
+		if (split_ident_line(&ident, force_author, strlen(force_author)) < 0)
 			die(_("malformed --author parameter"));
-		name = xstrndup(force_author, lb - force_author);
-		email = xstrndup(lb + 2, rb - (lb + 2));
+		name = xmemdupz(ident.name_begin, ident.name_end - ident.name_begin);
+		email = xmemdupz(ident.mail_begin, ident.mail_end - ident.mail_begin);
 	}
 
 	if (force_date) {
-- 
2.1.0.346.ga0367b9

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 6/6] determine_author_info: copy getenv output
  2014-08-27  7:55 [PATCH v2 0/6] clean up author parsing Jeff King
                   ` (4 preceding siblings ...)
  2014-08-27  7:57 ` [PATCH v2 5/6] determine_author_info: reuse parsing functions Jeff King
@ 2014-08-27  7:57 ` Jeff King
  2014-08-27  9:06 ` [PATCH v2 0/6] clean up author parsing Christian Couder
  2014-08-27 17:36 ` Junio C Hamano
  7 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2014-08-27  7:57 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Erik Faye-Lund

When figuring out the author name for a commit, we may end
up either pointing to const storage from getenv("GIT_AUTHOR_*"),
or to newly allocated storage based on an existing commit or
the --author option.

Using const pointers to getenv's return has two problems:

  1. It is not guaranteed that the return value from getenv
     remains valid across multiple calls.

  2. We do not know whether to free the values at the end,
     so we just leak them.

We can solve both by duplicating the string returned by
getenv().

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/commit.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 57b046b..7f9f071 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -535,15 +535,26 @@ static int parse_force_date(const char *in, struct strbuf *out)
 	return 0;
 }
 
+static void set_ident_var(char **buf, char *val)
+{
+	free(*buf);
+	*buf = val;
+}
+
+static char *envdup(const char *var)
+{
+	const char *val = getenv(var);
+	return val ? xstrdup(val) : NULL;
+}
+
 static void determine_author_info(struct strbuf *author_ident)
 {
 	char *name, *email, *date;
 	struct ident_split author;
-	struct strbuf date_buf = STRBUF_INIT;
 
-	name = getenv("GIT_AUTHOR_NAME");
-	email = getenv("GIT_AUTHOR_EMAIL");
-	date = getenv("GIT_AUTHOR_DATE");
+	name = envdup("GIT_AUTHOR_NAME");
+	email = envdup("GIT_AUTHOR_EMAIL");
+	date = envdup("GIT_AUTHOR_DATE");
 
 	if (author_message) {
 		struct ident_split ident;
@@ -556,15 +567,16 @@ static void determine_author_info(struct strbuf *author_ident)
 		if (split_ident_line(&ident, a, len) < 0)
 			die(_("commit '%s' has malformed author line"), author_message);
 
-		name = xmemdupz(ident.name_begin, ident.name_end - ident.name_begin);
-		email = xmemdupz(ident.mail_begin, ident.mail_end - ident.mail_begin);
+		set_ident_var(&name, xmemdupz(ident.name_begin, ident.name_end - ident.name_begin));
+		set_ident_var(&email, xmemdupz(ident.mail_begin, ident.mail_end - ident.mail_begin));
+
 		if (ident.date_begin) {
-			strbuf_reset(&date_buf);
+			struct strbuf date_buf = STRBUF_INIT;
 			strbuf_addch(&date_buf, '@');
 			strbuf_add(&date_buf, ident.date_begin, ident.date_end - ident.date_begin);
 			strbuf_addch(&date_buf, ' ');
 			strbuf_add(&date_buf, ident.tz_begin, ident.tz_end - ident.tz_begin);
-			date = date_buf.buf;
+			set_ident_var(&date, strbuf_detach(&date_buf, NULL));
 		}
 	}
 
@@ -573,15 +585,15 @@ static void determine_author_info(struct strbuf *author_ident)
 
 		if (split_ident_line(&ident, force_author, strlen(force_author)) < 0)
 			die(_("malformed --author parameter"));
-		name = xmemdupz(ident.name_begin, ident.name_end - ident.name_begin);
-		email = xmemdupz(ident.mail_begin, ident.mail_end - ident.mail_begin);
+		set_ident_var(&name, xmemdupz(ident.name_begin, ident.name_end - ident.name_begin));
+		set_ident_var(&email, xmemdupz(ident.mail_begin, ident.mail_end - ident.mail_begin));
 	}
 
 	if (force_date) {
-		strbuf_reset(&date_buf);
+		struct strbuf date_buf = STRBUF_INIT;
 		if (parse_force_date(force_date, &date_buf))
 			die(_("invalid date format: %s"), force_date);
-		date = date_buf.buf;
+		set_ident_var(&date, strbuf_detach(&date_buf, NULL));
 	}
 
 	strbuf_addstr(author_ident, fmt_ident(name, email, date, IDENT_STRICT));
@@ -592,7 +604,9 @@ static void determine_author_info(struct strbuf *author_ident)
 		export_one("GIT_AUTHOR_DATE", author.date_begin, author.tz_end, '@');
 	}
 
-	strbuf_release(&date_buf);
+	free(name);
+	free(email);
+	free(date);
 }
 
 static void split_ident_or_die(struct ident_split *id, const struct strbuf *buf)
-- 
2.1.0.346.ga0367b9

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/6] clean up author parsing
  2014-08-27  7:55 [PATCH v2 0/6] clean up author parsing Jeff King
                   ` (5 preceding siblings ...)
  2014-08-27  7:57 ` [PATCH v2 6/6] determine_author_info: copy getenv output Jeff King
@ 2014-08-27  9:06 ` Christian Couder
  2014-08-27 14:18   ` Jeff King
  2014-08-27 17:36 ` Junio C Hamano
  7 siblings, 1 reply; 18+ messages in thread
From: Christian Couder @ 2014-08-27  9:06 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Erik Faye-Lund

On Wed, Aug 27, 2014 at 9:55 AM, Jeff King <peff@peff.net> wrote:
>
>   [2/6]: record_author_info: fix memory leak on malformed commit
>   [3/6]: record_author_info: use find_commit_header

s/record_author_info/record_author_date/

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/6] clean up author parsing
  2014-08-27  9:06 ` [PATCH v2 0/6] clean up author parsing Christian Couder
@ 2014-08-27 14:18   ` Jeff King
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2014-08-27 14:18 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Eric Sunshine, Erik Faye-Lund

On Wed, Aug 27, 2014 at 11:06:27AM +0200, Christian Couder wrote:

> On Wed, Aug 27, 2014 at 9:55 AM, Jeff King <peff@peff.net> wrote:
> >
> >   [2/6]: record_author_info: fix memory leak on malformed commit
> >   [3/6]: record_author_info: use find_commit_header
> 
> s/record_author_info/record_author_date/

Thanks. I think I just had determine_author_info on the mind when I
wrote the commit messages.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] commit: provide a function to find a header in a buffer
  2014-08-27  7:56 ` [PATCH v2 1/6] commit: provide a function to find a header in a buffer Jeff King
@ 2014-08-27 17:30   ` Junio C Hamano
  2014-08-27 18:00     ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2014-08-27 17:30 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Erik Faye-Lund

Jeff King <peff@peff.net> writes:

> +const char *find_commit_header(const char *msg, const char *key, size_t *out_len)
> +{
> +	int key_len = strlen(key);
> +	const char *line = msg;
> +
> +	while (line) {
> +		const char *eol = strchrnul(line, '\n');
> +
> +		if (line == eol)
> +			return NULL;
> +
> +		if (eol - line > key_len &&
> +		    !strncmp(line, key, key_len) &&
> +		    line[key_len] == ' ') {
> +			*out_len = eol - line - key_len - 1;
> +			return line + key_len + 1;

Hmph.  Does this have to worry about continuation lines in the
header part e.g. mergetag?  If the original in pretty.c was only
about the encoding, it may not have mattered, but now because it is
made public, it may matter more.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/6] clean up author parsing
  2014-08-27  7:55 [PATCH v2 0/6] clean up author parsing Jeff King
                   ` (6 preceding siblings ...)
  2014-08-27  9:06 ` [PATCH v2 0/6] clean up author parsing Christian Couder
@ 2014-08-27 17:36 ` Junio C Hamano
  7 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2014-08-27 17:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Erik Faye-Lund

Overall it looked sensible, modulo a minor nit that may not matter
in the context of this series.  Will queue.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] commit: provide a function to find a header in a buffer
  2014-08-27 17:30   ` Junio C Hamano
@ 2014-08-27 18:00     ` Jeff King
  2014-08-27 18:16       ` Jeff King
  2014-08-27 19:05       ` Junio C Hamano
  0 siblings, 2 replies; 18+ messages in thread
From: Jeff King @ 2014-08-27 18:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine, Erik Faye-Lund

On Wed, Aug 27, 2014 at 10:30:22AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > +const char *find_commit_header(const char *msg, const char *key, size_t *out_len)
> [...]
> 
> Hmph.  Does this have to worry about continuation lines in the
> header part e.g. mergetag?  If the original in pretty.c was only
> about the encoding, it may not have mattered, but now because it is
> made public, it may matter more.

If you mean parsing past continuation lines, then no, we do not need to
worry. We go line by line and look for the key at the beginning of a
line, so we would skip past any continuation lines.

If you mean including continuation lines in the output, I don't think
that's a good idea here. It would mean the function would have to copy
the value out (to get rid of the continuation whitespace) rather than
point directly into the msg buffer.

That may be something some callers want, but they should build it
separately around this find_commit_header, so that callers that want a
single line (like "encoding" or "author") do not have to pay the price.
I didn't bother building it out here since there are no callers which
want it yet (though I did not look at the mergetag code, which could
possibly be converted).

In the meantime, I hoped this comment would suffice for any callers to
know whether it was suitable:

+/*
+ * Search the commit object contents given by "msg" for the header "key".
+ * Returns a pointer to the start of the header contents, or NULL. The length
+ * of the header, up to the first newline, is returned via out_len.
+ *
+ * Note that some headers (like mergetag) may be multi-line. It is the caller's
+ * responsibility to parse further in this case!
+ */
+extern const char *find_commit_header(const char *msg, const char *key,
+                                     size_t *out_len);

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] commit: provide a function to find a header in a buffer
  2014-08-27 18:00     ` Jeff King
@ 2014-08-27 18:16       ` Jeff King
  2014-08-27 19:05       ` Junio C Hamano
  1 sibling, 0 replies; 18+ messages in thread
From: Jeff King @ 2014-08-27 18:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine, Erik Faye-Lund

On Wed, Aug 27, 2014 at 02:00:16PM -0400, Jeff King wrote:

> That may be something some callers want, but they should build it
> separately around this find_commit_header, so that callers that want a
> single line (like "encoding" or "author") do not have to pay the price.
> I didn't bother building it out here since there are no callers which
> want it yet (though I did not look at the mergetag code, which could
> possibly be converted).

I just peeked at the mergetag code. It is all built around
read_commit_extra_headers, which is a different approach (it is "copy
out non-standard things", not "find this one thing I am looking for").

The former is more efficient if we are looking for lots of things, since
we'd only have to parse once. But we don't use it that way (we parse the
whole thing and then see if we have any "mergetag" headers).

The most efficient and convenient thing IMHO would be a progressive
parser that keeps a partially-parsed state and advances the parser
on-demand. So if I ask it for header "foo", it would start at the
beginning and parse until it finds "foo", marking the location of
anything along the way. If I then ask for "bar", it would keep going
from the end of "bar", and so forth.

I do not know if that is even worth the effort, though. I do not think
commit-parsing is a major hotspot for most operations (it might be for a
traversal, but we already use a minimalistic parser there that only
grabs the items that "struct commit" cares about). And we already
zlib-inflate the whole commit object in the first place, so it's not
like we haven't touched all these bytes anyway[1].

-Peff

[1] A long time ago I experimented with having parse_commit do a partial
    inflate, just up to the empty-line delimiter. I don't have the
    numbers handy, but I recall that it did not make a measurable
    improvement in rev-list speeds.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] commit: provide a function to find a header in a buffer
  2014-08-27 18:00     ` Jeff King
  2014-08-27 18:16       ` Jeff King
@ 2014-08-27 19:05       ` Junio C Hamano
  2014-08-27 19:14         ` Jeff King
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2014-08-27 19:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Erik Faye-Lund

Jeff King <peff@peff.net> writes:

> On Wed, Aug 27, 2014 at 10:30:22AM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > +const char *find_commit_header(const char *msg, const char *key, size_t *out_len)
>> [...]
>> 
>> Hmph.  Does this have to worry about continuation lines in the
>> header part e.g. mergetag?  If the original in pretty.c was only
>> about the encoding, it may not have mattered, but now because it is
>> made public, it may matter more.
>
> If you mean parsing past continuation lines, then no, we do not need to
> worry. We go line by line and look for the key at the beginning of a
> line, so we would skip past any continuation lines.
>
> If you mean including continuation lines in the output, I don't think
> that's a good idea here. It would mean the function would have to copy
> the value out (to get rid of the continuation whitespace) rather than
> point directly into the msg buffer.

I meant the counting of out_len.  You do not copy the contents for
the caller for a single line case either, so I wouldn't expect it.

You locate where the stuff begins to make it easier for the caller
to read or copy, and the caller may choose to stop reading from the
location up to out_len bytes, or it may choose to ignore out_len and
stop at the first newline.  For the latter kind of caller, it does
not matter if out_len does not point at the end of the "field"
(i.e. for a continued-line case), but for the former, wouldn't it be
more useful if out_len told where the end of the "field" is?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] commit: provide a function to find a header in a buffer
  2014-08-27 19:05       ` Junio C Hamano
@ 2014-08-27 19:14         ` Jeff King
  2014-08-27 19:26           ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2014-08-27 19:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine, Erik Faye-Lund

On Wed, Aug 27, 2014 at 12:05:06PM -0700, Junio C Hamano wrote:

> > If you mean including continuation lines in the output, I don't think
> > that's a good idea here. It would mean the function would have to copy
> > the value out (to get rid of the continuation whitespace) rather than
> > point directly into the msg buffer.
> 
> I meant the counting of out_len.  You do not copy the contents for
> the caller for a single line case either, so I wouldn't expect it.
> 
> You locate where the stuff begins to make it easier for the caller
> to read or copy, and the caller may choose to stop reading from the
> location up to out_len bytes, or it may choose to ignore out_len and
> stop at the first newline.  For the latter kind of caller, it does
> not matter if out_len does not point at the end of the "field"
> (i.e. for a continued-line case), but for the former, wouldn't it be
> more useful if out_len told where the end of the "field" is?

I think they are a direct tradeoff. If you include only the first line,
then callers who want multiple lines have to keep parsing.  If you
include multiple lines, then callers who care only about the first line
will have to re-find the newline rather than just using "out_len"
directly.

I suppose you could argue that people who are only expecting one line
(e.g., "encoding") should just assume that out_len ends at the first
line. For correctly-formatted commits, that works the same under either
scheme. For a broken commit where "encoding" _is_ multi-line, one case
would ignore the continued bits and the other case would return an
unexpected encoding value with newlines in it. The choice probably
doesn't matter much in practice.

Mostly I just punted on it with a comment since I did not plan to add
any multi-line callers, and I figured we could sort it out then. If you
feel strongly, it should be pretty easy to check for continuation and
extend out_len if necessary.

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] commit: provide a function to find a header in a buffer
  2014-08-27 19:14         ` Jeff King
@ 2014-08-27 19:26           ` Junio C Hamano
  2014-08-27 19:38             ` Jeff King
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2014-08-27 19:26 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Erik Faye-Lund

Jeff King <peff@peff.net> writes:

> I think they are a direct tradeoff. If you include only the first line,
> then callers who want multiple lines have to keep parsing.  If you
> include multiple lines, then callers who care only about the first line
> will have to re-find the newline rather than just using "out_len"
> directly.

Fair enough.

> I suppose you could argue that people who are only expecting one line
> (e.g., "encoding") should just assume that out_len ends at the first
> line. For correctly-formatted commits, that works the same under either
> scheme. For a broken commit where "encoding" _is_ multi-line, one case
> would ignore the continued bits and the other case would return an
> unexpected encoding value with newlines in it. The choice probably
> doesn't matter much in practice.

True.

> Mostly I just punted on it with a comment since I did not plan to add
> any multi-line callers, and I figured we could sort it out then. If you
> feel strongly, it should be pretty easy to check for continuation and
> extend out_len if necessary.

I agree that we do not want to worry too much about the minority
multi-line callers.

I don't mind returning -1 in out_len and have the callers check.
That way will allow callers to easily diagnose this

	tree $T
        author $GIT_AUTHOR_IDENT
	committer $GIT_COMMITTER_IDENT
	encoding encoding
         unexpected continuation line

	log message

as an error; they would just make sure that out_len is not the "this
is continued; you need to parse the rest yourself" value.

Otherwise, the callers need to check value[out_len+1] to see if it
is an SP (value[out_len] must be '\n') to catch the error, I would
think.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] commit: provide a function to find a header in a buffer
  2014-08-27 19:26           ` Junio C Hamano
@ 2014-08-27 19:38             ` Jeff King
  2014-08-27 19:41               ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2014-08-27 19:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine, Erik Faye-Lund

On Wed, Aug 27, 2014 at 12:26:56PM -0700, Junio C Hamano wrote:

> I don't mind returning -1 in out_len and have the callers check.
> That way will allow callers to easily diagnose this
> 
> 	tree $T
>         author $GIT_AUTHOR_IDENT
> 	committer $GIT_COMMITTER_IDENT
> 	encoding encoding
>          unexpected continuation line
> 
> 	log message
> 
> as an error; they would just make sure that out_len is not the "this
> is continued; you need to parse the rest yourself" value.

Hmph. I was hoping that callers could just ignore this and fallback to
the "it's supposed to be one line, so assume that it is, and ignore
other lines" behavior. I guess that is the flip side of the "just use
the whole broken value" alternative that I mentioned earlier.

What I didn't want to do is deal with it in each callsite, like:

diff --git a/builtin/commit.c b/builtin/commit.c
index 7f9f071..10417bb 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -564,6 +564,8 @@ static void determine_author_info(struct strbuf *author_ident)
 		a = find_commit_header(author_message_buffer, "author", &len);
 		if (!a)
 			die(_("commit '%s' lacks author header"), author_message);
+		if (len == -1)
+			die(_("commit '%s' has a multi-line author"), author_message);
 		if (split_ident_line(&ident, a, len) < 0)
 			die(_("commit '%s' has malformed author line"), author_message);
 
diff --git a/commit.c b/commit.c
index 9416d84..6fec0c2 100644
--- a/commit.c
+++ b/commit.c
@@ -594,6 +594,8 @@ static void record_author_date(struct author_date_slab *author_date,
 	ident_line = find_commit_header(buffer, "author", &ident_len);
 	if (!ident_line)
 		goto fail_exit; /* no author line */
+	if (ident_len == -1)
+		goto fail_exit; /* multi-line author line */
 	if (split_ident_line(&ident, ident_line, ident_len) ||
 	    !ident.date_begin || !ident.date_end)
 		goto fail_exit; /* malformed "author" line */
@@ -1669,7 +1671,10 @@ const char *find_commit_header(const char *msg, const char *key, size_t *out_len
 		if (eol - line > key_len &&
 		    !strncmp(line, key, key_len) &&
 		    line[key_len] == ' ') {
-			*out_len = eol - line - key_len - 1;
+			if (eol[0] && eol[1] == ' ')
+				*out_len = -1;
+			else
+				*out_len = eol - line - key_len - 1;
 			return line + key_len + 1;
 		}
 		line = *eol ? eol + 1 : NULL;
diff --git a/pretty.c b/pretty.c
index 3a70557..ff2bf4a 100644
--- a/pretty.c
+++ b/pretty.c
@@ -551,7 +551,7 @@ static char *get_header(const char *msg, const char *key)
 {
 	size_t len;
 	const char *v = find_commit_header(msg, key, &len);
-	return v ? xmemdupz(v, len) : NULL;
+	return v && len != -1 ? xmemdupz(v, len) : NULL;
 }
 
 static char *replace_encoding_header(char *buf, const char *encoding)


which does not really add any value, IMHO.

-Peff

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 1/6] commit: provide a function to find a header in a buffer
  2014-08-27 19:38             ` Jeff King
@ 2014-08-27 19:41               ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2014-08-27 19:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Eric Sunshine, Erik Faye-Lund

Jeff King <peff@peff.net> writes:

> What I didn't want to do is deal with it in each callsite, like:

OK.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2014-08-27 19:42 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-27  7:55 [PATCH v2 0/6] clean up author parsing Jeff King
2014-08-27  7:56 ` [PATCH v2 1/6] commit: provide a function to find a header in a buffer Jeff King
2014-08-27 17:30   ` Junio C Hamano
2014-08-27 18:00     ` Jeff King
2014-08-27 18:16       ` Jeff King
2014-08-27 19:05       ` Junio C Hamano
2014-08-27 19:14         ` Jeff King
2014-08-27 19:26           ` Junio C Hamano
2014-08-27 19:38             ` Jeff King
2014-08-27 19:41               ` Junio C Hamano
2014-08-27  7:56 ` [PATCH v2 2/6] record_author_info: fix memory leak on malformed commit Jeff King
2014-08-27  7:56 ` [PATCH v2 3/6] record_author_info: use find_commit_header Jeff King
2014-08-27  7:57 ` [PATCH v2 4/6] use strbufs in date functions Jeff King
2014-08-27  7:57 ` [PATCH v2 5/6] determine_author_info: reuse parsing functions Jeff King
2014-08-27  7:57 ` [PATCH v2 6/6] determine_author_info: copy getenv output Jeff King
2014-08-27  9:06 ` [PATCH v2 0/6] clean up author parsing Christian Couder
2014-08-27 14:18   ` Jeff King
2014-08-27 17:36 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).