All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Thomas Bock <bockthom@cs.uni-saarland.de>,
	Derrick Stolee <derrickstolee@github.com>,
	git@vger.kernel.org
Subject: Re: [PATCH 2/3] parse_commit(): parse timestamp from end of line
Date: Mon, 24 Apr 2023 10:05:42 -0700	[thread overview]
Message-ID: <xmqqcz3tfbx5.fsf@gitster.g> (raw)
In-Reply-To: <20230422134703.GB3942326@coredump.intra.peff.net> (Jeff King's message of "Sat, 22 Apr 2023 09:47:03 -0400")

Jeff King <peff@peff.net> writes:

> To find the committer timestamp, we parse left-to-right looking for the
> closing ">" of the email, and then expect the timestamp right after
> that. But we've seen some broken cases in the wild where this fails, but
> we _could_ find the timestamp with a little extra work. E.g.:
>
>   Name <Name<email>> 123456789 -0500
> ...
> So let's use the same trick as 03818a4a94: find the end of the line, and
> parse back to the final ">".

This obviously assumes that even in a broken ident, it is very
likely that the second component (where <e-mail> usually sits) ends
with a '>'.  Given that we enclose whatever garbage the end user
gave us as their e-mail inside a pair of <> ourselves, it is a very
sensible assumption, I would think.  The original parser assumed
that the end user would not have '>' inside their e-mail part of the
ident, which turns out to be more problematic than the alternative
being proposed.  It is doubly good that we already parse from the
end elsewhere.

Nice.

> +	/*
> +	 * parse to end-of-line and then walk backwards, which
> +	 * handles some malformed cases.
> +	 */

I would say "parse to" -> "jump to", but technically moving forward
looking for a LF byte is still "parsing".  "some" malformed cases
being "most plausible" ones (due to how ident.c::fmt_ident() is what
writes '>' after the string end-user gave as e-mail) may be worth
mentioning.

> +	eol = memchr(buf, '\n', tail - buf);
> +	if (!eol)
>  		return 0;

OK.

> -	dateptr = buf;
> -	while (buf < tail && *buf++ != '\n')
> +	for (dateptr = eol; dateptr > buf && dateptr[-1] != '>'; dateptr--)
>  		/* nada */;

OK.  Just a style thing, but I found that "; /* nada */" is easier
to spot that there is an empty statement there.

> -	if (buf >= tail)
> +	if (dateptr == buf || dateptr == eol)
>  		return 0;

Curious when dateptr that wanted to scan back from eol is still at
eol after the loop.  It is when the ident line ends with ">" without
any timestamp/tz info.   And the reason why we need to check that
here is ...

> -	/* dateptr < buf && buf[-1] == '\n', so parsing will stop at buf-1 */
> +
> +	/* dateptr < eol && *eol == '\n', so parsing will stop at eol */
>  	return parse_timestamp(dateptr, NULL, 10);

... because parse_timestamp() is merely strtoumax() and would
happily skip over arbitrary number of leading "whitespace" without
stopping if (dateptr == eol && *eol == '\n').  OK, sad but correct.


  reply	other threads:[~2023-04-24 17:05 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-14 11:37 Weird behavior of 'git log --before' or 'git log --date-order': Commits from 2011 are treated to be before 1980 Thomas Bock
2023-04-15  8:52 ` Jeff King
2023-04-15  8:59   ` Jeff King
2023-04-15 14:10   ` Kristoffer Haugsbakk
2023-04-17  5:40     ` Jeff King
2023-04-17  6:20       ` Kristoffer Haugsbakk
2023-04-17  7:41         ` Jeff King
2023-04-27 22:32           ` Kristoffer Haugsbakk
2023-04-17  9:51   ` Junio C Hamano
2023-04-18  4:12     ` Jeff King
2023-04-18 14:02       ` Derrick Stolee
2023-04-21 14:51         ` Thomas Bock
2023-04-22 13:41           ` [PATCH 0/3] fixing some parse_commit() timestamp corner cases Jeff King
2023-04-22 13:42             ` [PATCH 1/3] t4212: avoid putting git on left-hand side of pipe Jeff King
2023-04-22 13:47             ` [PATCH 2/3] parse_commit(): parse timestamp from end of line Jeff King
2023-04-24 17:05               ` Junio C Hamano [this message]
2023-04-25  5:23                 ` Jeff King
2023-04-24 16:39             ` [PATCH 0/3] fixing some parse_commit() timestamp corner cases Junio C Hamano
2023-04-25  5:52             ` [PATCH v2 " Jeff King
2023-04-25  5:54               ` Jeff King
2023-04-25  5:54               ` [PATCH v2 1/4] t4212: avoid putting git on left-hand side of pipe Jeff King
2023-04-25  5:54               ` [PATCH v2 2/4] parse_commit(): parse timestamp from end of line Jeff King
2023-04-25  5:54               ` [PATCH v2 3/4] parse_commit(): handle broken whitespace-only timestamp Jeff King
2023-04-25 10:11                 ` Phillip Wood
2023-04-25 16:06                   ` Junio C Hamano
2023-04-26 11:36                     ` Jeff King
2023-04-26 15:32                       ` Junio C Hamano
2023-04-27  8:13                         ` [PATCH v3 0/4] fixing some parse_commit() timestamp corner cases Jeff King
2023-04-27  8:14                           ` [PATCH v3 1/4] t4212: avoid putting git on left-hand side of pipe Jeff King
2023-04-27  8:14                           ` [PATCH v3 2/4] parse_commit(): parse timestamp from end of line Jeff King
2023-04-27  8:17                           ` [PATCH v3 3/4] parse_commit(): handle broken whitespace-only timestamp Jeff King
2023-04-27 10:11                             ` Phillip Wood
2023-04-27 11:55                               ` Phillip Wood
2023-04-27 16:46                                 ` Jeff King
2023-04-27 16:20                               ` Junio C Hamano
2023-04-27 16:55                                 ` Jeff King
2023-04-27 16:25                             ` Junio C Hamano
2023-04-27 16:57                               ` Jeff King
2023-04-27  8:17                           ` [PATCH v3 4/4] parse_commit(): describe more date-parsing failure modes Jeff King
2023-04-27  8:18                           ` [PATCH v3 0/4] fixing some parse_commit() timestamp corner cases Jeff King
2023-04-27 16:32                           ` Junio C Hamano
2023-04-26 14:06                     ` [PATCH v2 3/4] parse_commit(): handle broken whitespace-only timestamp Phillip Wood
2023-04-26 14:31                       ` Andreas Schwab
2023-04-26 14:44                         ` Phillip Wood
2023-04-25  5:55               ` [PATCH v2 4/4] parse_commit(): describe more date-parsing failure modes Jeff King
2023-04-22 13:52         ` Weird behavior of 'git log --before' or 'git log --date-order': Commits from 2011 are treated to be before 1980 Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqcz3tfbx5.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=bockthom@cs.uni-saarland.de \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.