* [RFC/PATCH 1/2] parse_date(): allow ancient git-timestamp
2012-02-02 21:41 [RFC/PATCH 0/2] Commits with ancient timestamps Junio C Hamano
@ 2012-02-02 21:41 ` Junio C Hamano
2012-02-02 21:41 ` [RFC/PATCH 2/2] parse_date(): '@' prefix forces git-timestamp Junio C Hamano
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2012-02-02 21:41 UTC
To: git
The date-time parser parses a human-readable date string piece by
piece, so that it can even parse a string in a rather strange
notation like 'noon november 11, 2005'. However, it parses strings
in "<seconds since epoch> <timezone>" format only for reasonably new
timestamps (like 1974 or newer) with 10 or more digits. This is to
prevent a string like "20100917" from getting interpreted as seconds
since epoch (we want to treat it as September 17, 2010 instead).
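To make the ambiguity concrete, here is a minimal sketch (not code from
date.c). The helper names looks_like_raw_timestamp() and
epoch_string_to_ymd() are hypothetical, and the digit threshold is simply
taken from the description above. It shows that "20100917" is too short to
be accepted as a raw timestamp and that reading it as seconds since the
epoch would give a date in 1970.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: 1 if s is all digits and has at least min_digits. */
static int looks_like_raw_timestamp(const char *s, size_t min_digits)
{
	size_t n = 0;

	assert(s != NULL);
	for (; s[n]; n++)
		if (!isdigit((unsigned char)s[n]))
			return 0;
	return n >= min_digits;
}

/* Hypothetical helper: format s, read as epoch seconds, as YYYY-MM-DD (UTC). */
static int epoch_string_to_ymd(const char *s, char *buf, size_t len)
{
	time_t t = (time_t)strtoul(s, NULL, 10);
	struct tm *tm = gmtime(&t);

	if (!tm || !strftime(buf, len, "%Y-%m-%d", tm))
		return -1;
	return 0;
}
```

With a threshold of 10, looks_like_raw_timestamp("20100917", 10) is 0, and
epoch_string_to_ymd() turns the same string into "1970-08-21", which is
clearly not what a human typing that string meant.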
The same codepath is used to read back the timestamp that we have
already recorded in the headers of commit and tag objects; because
of this, such a commit with timestamp "0 +0000" cannot be rebased or
amended very easily.
Teach the parse_date() codepath to special-case a string of the form
"<digits> +<4-digits>" to work around this issue, but for safety
require that there is no other cruft around the string when parsing
a timestamp of this format.
Note that this has slight backward-incompatibility implications.
If somebody writes "git commit --date='20100917 +0900'" and wants it
to mean a timestamp in September 2010 in Japan, this change will
break such a use case.
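The special case can be tried outside of git with a standalone sketch like
the one below. It follows the logic of the helper in this patch, with two
assumptions of its own: the leading-digit check is written as
'9' < *date so that a string starting with '9' is accepted, and an assert()
guards the pointer arguments.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Standalone copy of the strict "<digits> <sign><4-digits>" matcher.
 * Returns 0 and fills *timestamp and *offset (in minutes) on a match,
 * -1 otherwise.
 */
static int match_object_header_date(const char *date,
				    unsigned long *timestamp, int *offset)
{
	char *end;
	unsigned long stamp;
	int ofs;

	assert(date != NULL && timestamp != NULL && offset != NULL);

	/* Must start with a digit ('9' included). */
	if (*date < '0' || '9' < *date)
		return -1;
	stamp = strtoul(date, &end, 10);
	if (*end != ' ' || stamp == ULONG_MAX ||
	    (end[1] != '+' && end[1] != '-'))
		return -1;

	/* The zone must end exactly four characters after the sign. */
	date = end + 2;
	ofs = (int)strtol(date, &end, 10);
	if ((*end != '\0' && *end != '\n') || end != date + 4)
		return -1;

	/* "0900" is hours and minutes, not a decimal number: 9 * 60 + 0. */
	ofs = (ofs / 100) * 60 + (ofs % 100);
	if (date[-1] == '-')
		ofs = -ofs;
	*timestamp = stamp;
	*offset = ofs;
	return 0;
}
```

Called with "0 +0000" this returns 0 with a zero timestamp and offset. Called
with "20100917 +0900" it also returns 0, with timestamp 20100917 and offset
540, which is the incompatibility described above.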
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
date.c | 29 +++++++++++++++++++++++++++++
1 files changed, 29 insertions(+), 0 deletions(-)
diff --git a/date.c b/date.c
index 896fbb4..c212946 100644
--- a/date.c
+++ b/date.c
@@ -585,6 +585,33 @@ static int date_string(unsigned long date, int offset, char *buf, int len)
return snprintf(buf, len, "%lu %c%02d%02d", date, sign, offset / 60, offset % 60);
}
+/*
+ * Parse a string like "0 +0000" as ancient timestamp near epoch, but
+ * only when it appears not as part of any other string.
+ */
+static int match_object_header_date(const char *date, unsigned long *timestamp, int *offset)
+{
+ char *end;
+ unsigned long stamp;
+ int ofs;
+
+ if (*date < '0' || '9' < *date)
+ return -1;
+ stamp = strtoul(date, &end, 10);
+ if (*end != ' ' || stamp == ULONG_MAX || (end[1] != '+' && end[1] != '-'))
+ return -1;
+ date = end + 2;
+ ofs = strtol(date, &end, 10);
+ if ((*end != '\0' && (*end != '\n')) || end != date + 4)
+ return -1;
+ ofs = (ofs / 100) * 60 + (ofs % 100);
+ if (date[-1] == '-')
+ ofs = -ofs;
+ *timestamp = stamp;
+ *offset = ofs;
+ return 0;
+}
+
/* Gr. strptime is crap for this; it doesn't have a way to require RFC2822
(i.e. English) day/month names, and it doesn't work correctly with %z. */
int parse_date_basic(const char *date, unsigned long *timestamp, int *offset)
@@ -610,6 +637,8 @@ int parse_date_basic(const char *date, unsigned long *timestamp, int *offset)
*offset = -1;
tm_gmt = 0;
+ if (!match_object_header_date(date, timestamp, offset))
+ return 0; /* success */
for (;;) {
int match = 0;
unsigned char c = *date;
--
1.7.9.172.ge26ae
* [RFC/PATCH 2/2] parse_date(): '@' prefix forces git-timestamp
2012-02-02 21:41 [RFC/PATCH 0/2] Commits with ancient timestamps Junio C Hamano
2012-02-02 21:41 ` [RFC/PATCH 1/2] parse_date(): allow ancient git-timestamp Junio C Hamano
@ 2012-02-02 21:41 ` Junio C Hamano
2012-02-03 10:44 ` [RFC/PATCH 0/2] Commits with ancient timestamps Thomas Rast
2012-02-03 14:53 ` Han-Wen Nienhuys
3 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2012-02-02 21:41 UTC
To: git
The only place where the issue this series addresses was observed is
where we read "cat-file commit" output and put it in GIT_AUTHOR_DATE
in order to replay a commit with an ancient timestamp.
With the previous patch alone, "git commit --date='20100917 +0900'"
can be misinterpreted to mean an ancient timestamp, not September in
year 2010. Guard this codepath by requiring an extra '@' in front of
the raw git timestamp on the parsing side. This of course needs to
be compensated for by updating get_author_ident_from_commit and the
code for "git commit --amend" to prepend '@' to the string read from
the existing commit in the GIT_AUTHOR_DATE environment variable.
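As a rough illustration of the two sides of this change, here is a small
sketch. Both helper names are hypothetical and plain malloc() stands in for
git's allocation wrappers. One helper marks a date copied out of a commit
header with '@'; the other is the kind of check a parser could make before
trying the raw-timestamp path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated "@<raw>" string, or NULL. */
static char *mark_as_raw_timestamp(const char *raw, size_t len)
{
	char *out;

	assert(raw != NULL);
	out = malloc(len + 2);	/* '@' + len bytes + NUL */
	if (!out)
		return NULL;
	out[0] = '@';
	memcpy(out + 1, raw, len);
	out[len + 1] = '\0';
	return out;
}

/* Hypothetical helper: 1 if the string opts in to raw-timestamp parsing. */
static int wants_raw_timestamp(const char *date)
{
	assert(date != NULL);
	return date[0] == '@' && '0' <= date[1] && date[1] <= '9';
}
```

A human-written "20100917 +0900" does not pass wants_raw_timestamp() and so
would go through the ordinary date parser, while "@12345 +0400" produced by
mark_as_raw_timestamp() does pass. The caller owns the returned buffer and
releases it with free().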
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
builtin/commit.c | 6 ++++++
date.c | 3 ++-
git-sh-setup.sh | 2 +-
t/t3400-rebase.sh | 23 +++++++++++++++++++++++
4 files changed, 32 insertions(+), 2 deletions(-)
diff --git a/builtin/commit.c b/builtin/commit.c
index cbc9613..bcb0db2 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -534,6 +534,7 @@ static void determine_author_info(struct strbuf *author_ident)
if (author_message) {
const char *a, *lb, *rb, *eol;
+ size_t len;
a = strstr(author_message_buffer, "\nauthor ");
if (!a)
@@ -554,6 +555,11 @@ static void determine_author_info(struct strbuf *author_ident)
(a + strlen("\nauthor "))));
email = xmemdupz(lb + strlen("<"), rb - (lb + strlen("<")));
date = xmemdupz(rb + strlen("> "), eol - (rb + strlen("> ")));
+ len = eol - (rb + strlen("> "));
+ date = xmalloc(len + 2);
+ *date = '@';
+ memcpy(date + 1, rb + strlen("> "), len);
+ date[len + 1] = '\0';
}
if (force_author) {
diff --git a/date.c b/date.c
index c212946..ca60767 100644
--- a/date.c
+++ b/date.c
@@ -637,7 +637,8 @@ int parse_date_basic(const char *date, unsigned long *timestamp, int *offset)
*offset = -1;
tm_gmt = 0;
- if (!match_object_header_date(date, timestamp, offset))
+ if (*date == '@' &&
+ !match_object_header_date(date + 1, timestamp, offset))
return 0; /* success */
for (;;) {
int match = 0;
diff --git a/git-sh-setup.sh b/git-sh-setup.sh
index 8e427da..015fe6e 100644
--- a/git-sh-setup.sh
+++ b/git-sh-setup.sh
@@ -200,7 +200,7 @@ get_author_ident_from_commit () {
s/.*/GIT_AUTHOR_EMAIL='\''&'\''/p
g
- s/^author [^<]* <[^>]*> \(.*\)$/\1/
+ s/^author [^<]* <[^>]*> \(.*\)$/@\1/
s/.*/GIT_AUTHOR_DATE='\''&'\''/p
q
diff --git a/t/t3400-rebase.sh b/t/t3400-rebase.sh
index 6eaecec..e26e14d 100755
--- a/t/t3400-rebase.sh
+++ b/t/t3400-rebase.sh
@@ -218,4 +218,27 @@ test_expect_success 'rebase -m can copy notes' '
test "a note" = "$(git notes show HEAD)"
'
+test_expect_success 'rebase commit with an ancient timestamp' '
+ git reset --hard &&
+
+ >old.one && git add old.one && test_tick &&
+ git commit --date="@12345 +0400" -m "Old one" &&
+ >old.two && git add old.two && test_tick &&
+ git commit --date="@23456 +0500" -m "Old two" &&
+ >old.three && git add old.three && test_tick &&
+ git commit --date="@34567 +0600" -m "Old three" &&
+
+ git cat-file commit HEAD^^ >actual &&
+ grep "author .* 12345 +0400$" actual &&
+ git cat-file commit HEAD^ >actual &&
+ grep "author .* 23456 +0500$" actual &&
+ git cat-file commit HEAD >actual &&
+ grep "author .* 34567 +0600$" actual &&
+
+ git rebase --onto HEAD^^ HEAD^ &&
+
+ git cat-file commit HEAD >actual &&
+ grep "author .* 34567 +0600$" actual
+'
+
test_done
--
1.7.9.172.ge26ae
* Re: [RFC/PATCH 0/2] Commits with ancient timestamps
2012-02-02 21:41 [RFC/PATCH 0/2] Commits with ancient timestamps Junio C Hamano
2012-02-02 21:41 ` [RFC/PATCH 1/2] parse_date(): allow ancient git-timestamp Junio C Hamano
2012-02-02 21:41 ` [RFC/PATCH 2/2] parse_date(): '@' prefix forces git-timestamp Junio C Hamano
@ 2012-02-03 10:44 ` Thomas Rast
2012-02-03 18:01 ` Junio C Hamano
2012-02-03 14:53 ` Han-Wen Nienhuys
3 siblings, 1 reply; 6+ messages in thread
From: Thomas Rast @ 2012-02-03 10:44 UTC
To: Junio C Hamano; +Cc: git
Junio C Hamano <gitster@pobox.com> writes:
> avoid misinterpreting human-written timestamp in other formats, and
> timestamps before 1975 do not have enough number of digits in them.
>
> Here is a two-patch series that may improve the situation.
Doing this just makes me wonder how important exactly the 1970-1975
range is. Is there a notable software history from that era that can be
recovered?
(Your [1/2] does not seem to parse negative offsets from the unix epoch,
so anything before 1970 is still out.)
--
Thomas Rast
trast@{inf,student}.ethz.ch
* Re: [RFC/PATCH 0/2] Commits with ancient timestamps
2012-02-03 10:44 ` [RFC/PATCH 0/2] Commits with ancient timestamps Thomas Rast
@ 2012-02-03 18:01 ` Junio C Hamano
0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2012-02-03 18:01 UTC
To: Thomas Rast; +Cc: git, Han-Wen Nienhuys
Thomas Rast <trast@inf.ethz.ch> writes:
> Doing this just makes me wonder how important exactly the 1970-1975
> range is. Is there a notable software history from that era that can be
> recovered?
That is not really a valid question. People who wrote private stuff in
that era deserve to be users of Git, too.
> (Your [1/2] does not seem to parse negative offsets from the unix epoch,
> so anything before 1970 is still out.)
Yes, pre-epoch timestamps are also useful for projects like the US
Constitution.
http://thread.gmane.org/gmane.comp.version-control.git/152433/focus=152725
For that, we would need to use and pass around time_t (or intmax_t if we
follow the reason why originally Linus chose to avoid time_t) throughout
the codebase. If you actually write commit objects that record pre-epoch
timestamps, however, they will become unreadable by the current versions
of Git (as they would not understand such a negative raw timestamp).
In any case, that is a goal for a much longer term.
But even after such a change happens, you still need a way for Git to
replay a raw timestamp stored in commit objects without regressing the
parse_date() interface too much. These two patches show one way to do so
with minimum disruption.
As an added bonus, with the second patch, the way to spell a raw timestamp
happens to become compatible with how GNU date accepts one, i.e.
$ date -d @1000000000
Even though we do not have to encourage the use of this notation by humans,
tools and script writers may find it useful.
* Re: [RFC/PATCH 0/2] Commits with ancient timestamps
2012-02-02 21:41 [RFC/PATCH 0/2] Commits with ancient timestamps Junio C Hamano
` (2 preceding siblings ...)
2012-02-03 10:44 ` [RFC/PATCH 0/2] Commits with ancient timestamps Thomas Rast
@ 2012-02-03 14:53 ` Han-Wen Nienhuys
3 siblings, 0 replies; 6+ messages in thread
From: Han-Wen Nienhuys @ 2012-02-03 14:53 UTC
To: Junio C Hamano; +Cc: git
On Thu, Feb 2, 2012 at 7:41 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Result of conversion of ancient history from other SCMs, and output from
> other third-party tools, can record timestamps that predates inception of
> Git. They can cause "git am", "git rebase" and "git commit --amend" to
> misbehave, because the raw git timestamp e.g.
>
> author <a.u.thor@example.com> 1328214896 -0800
>
> are read from the commit object and passed to parse_date() machinery,
As a bit of context: we have some internal tools at Google that create
administrative commits that should have no timestamp. I am using "0"
and "1" as deterministic timestamps in these cases (i.e. the start of
the epoch). While this works well in general, there are some git
subcommands that barf on this, causing user unhappiness.
This patch will hopefully resolve these breakages.
--
Han-Wen Nienhuys
Google Engineering Belo Horizonte
hanwen@google.com