git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RFH: unexpected reflog behavior with --since=
@ 2011-11-09  0:22 Eric Raible
  2011-11-09 22:01 ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Raible @ 2011-11-09  0:22 UTC (permalink / raw)
  To: git@vger.kernel.org

I'm trying to leverage the reflog to speed up our $dayjob build procedure
(it's complicated), and found unexpected behavior when limiting reflog
output with --since.

    git init reflog-test
    cd reflog-test

    touch a && git add a && git commit -m'add a'
    sleep 1
    touch b && git add b && git commit -m'add b'

    # add_b will be the time that b was added (email ends with '>')
    add_b=$(tail -1 .git/logs/HEAD | perl -e "print( <> =~ m/> (\S+)/ )")

    # It's reported correctly here:
    git log -g --oneline --since=$add_b

    # But after a reset no history isn't shown.
    git reset --hard HEAD^
    git log -g --oneline --since=$add_b

Is this a bug?  Of course everything is reported when --since isn't used,
but not so when limited with --since.

1.7.7.1.msysgit.0

Thanks - Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFH: unexpected reflog behavior with --since=
  2011-11-09  0:22 RFH: unexpected reflog behavior with --since= Eric Raible
@ 2011-11-09 22:01 ` Jeff King
  2011-11-09 22:20   ` Jeff King
  2011-11-10  7:48   ` Eric Raible
  0 siblings, 2 replies; 13+ messages in thread
From: Jeff King @ 2011-11-09 22:01 UTC (permalink / raw)
  To: Eric Raible; +Cc: git@vger.kernel.org

On Tue, Nov 08, 2011 at 04:22:41PM -0800, Eric Raible wrote:

>     # It's reported correctly here:
>     git log -g --oneline --since=$add_b
> 
>     # But after a reset no history isn't shown.
>     git reset --hard HEAD^
>     git log -g --oneline --since=$add_b
> 
> Is this a bug?  Of course everything is reported when --since isn't used,
> but not so when limited with --since.

It's sort of a bug. And sort of a missing feature.

In the normal revision walking case, git walks the history graph
backwards, hitting the parent of each commit (and when there are
multiple lines of history, we traverse them in commit timestamp order).

So "--since" works not just by omitting non-matching commits from the
output, but also by stopping the traversal when we go too far back in
time. In a sense, this is purely an optimization, as it shouldn't change
the output. But it's an important one, because it makes looking back in
time O(how far back) instead of O(size of all history).

This optimization breaks down badly, of course, in the face of clock
skew (i.e., a commit whose timestamp is further back than its parent).
There are a few tricks we do to avoid small runs of moderate skew, and
in practice it works well.

Now let's look at reflog walking. It's kind of bolted on to the side
of the revision traversal machinery. We walk through the reflog
backwards and pretend that entry N's parent is entry N-1 (you can see
this if you do "git log -g -p", for example; you see the patch versus
the last reflog entry, not the patch against the commit's true parent).

In the case of rewound history (like the reset you showed above), this
means that the history graph will appear to have bad clock skew. The
timestamp of HEAD@{0} is going to be much earlier than its pretend
parent, HEAD@{1}. And the "--since" optimization is going to cut off
traversal, even though there are more interesting commits to be shown.

So in that sense, I think it's a bug, and we should probably disable the
exit-early-from-traversal optimization when we're walking reflogs.

But it may also be a misfeature, because it's not clear what you're
actually trying to limit by. We have commit timestamps, of course, but
when we are walking reflogs, we also have reflog timestamps. Did you
actually want to say "show me all commits in the reflog, in reverse
reflog order, omitting commits that happened before time t"? Or did you
really mean "show me the reflog entries that happened before time t,
regardless of their commit timestamp"?

In the latter case, we would either need a new specifier (like
"--reflog-since"), or to rewrite the commit timestamp when we rewrite
the parent pointers.

The latter has a certain elegance to it (we are making a pretend linear
history graph out of the reflog, so faking the timestamps to be sensible
and in order is a logical thing to do) but I worry about lying too much
in the output. Something like "git log -g --format=%cd" would now have
the fake timestamp in the output. But then, we already show the fake
parents in the output, so I don't know that this is any worse.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFH: unexpected reflog behavior with --since=
  2011-11-09 22:01 ` Jeff King
@ 2011-11-09 22:20   ` Jeff King
  2011-11-09 22:26     ` Jeff King
                       ` (2 more replies)
  2011-11-10  7:48   ` Eric Raible
  1 sibling, 3 replies; 13+ messages in thread
From: Jeff King @ 2011-11-09 22:20 UTC (permalink / raw)
  To: Eric Raible; +Cc: git@vger.kernel.org

On Wed, Nov 09, 2011 at 05:01:28PM -0500, Jeff King wrote:

> In the latter case, we would either need a new specifier (like
> "--reflog-since"), or to rewrite the commit timestamp when we rewrite
> the parent pointers.
> 
> The latter has a certain elegance to it (we are making a pretend linear
> history graph out of the reflog, so faking the timestamps to be sensible
> and in order is a logical thing to do) but I worry about lying too much
> in the output. Something like "git log -g --format=%cd" would now have
> the fake timestamp in the output. But then, we already show the fake
> parents in the output, so I don't know that this is any worse.

This patch (which is below) turns out to be absurdly simple. And it
actually still prints the original commit timestamp, because we end up
reparsing it out of the commit object during the pretty-print phase.

So I think the only decision is whether "--since" should respect the
commit timestamps (and be used as a sort of "grep" filter for
timestamps), or whether it should be respecting the fake history we
create when doing a reflog walk.

I think I am leaning towards the latter. It seems to me to be the more
likely guess for what the user would want. And there is real benefit to
doing it in git, since we can stop the traversal early. In the
"grep-like" case, doing it inside git is not really any more efficient
than filtering in a pipeline, like:

  git log -g --format='%ct %H' |
  awk '{ print $2 if $1 < SOME_TIMESTAMP }'

Of course we could still offer both (with a "--reflog-since" type of
option). We'd also need to turn off the optimization for "--since", and
then check whether "--until" has a similar bug (and offer
"--reflog-until").

diff --git a/reflog-walk.c b/reflog-walk.c
index 5d81d39..2e5b270 100644
--- a/reflog-walk.c
+++ b/reflog-walk.c
@@ -231,6 +231,7 @@ void fake_reflog_parent(struct reflog_walk_info *info, struct commit *commit)
 	reflog = &commit_reflog->reflogs->items[commit_reflog->recno];
 	info->last_commit_reflog = commit_reflog;
 	commit_reflog->recno--;
+	commit->date = reflog->timestamp;
 	commit_info->commit = (struct commit *)parse_object(reflog->osha1);
 	if (!commit_info->commit) {
 		commit->parents = NULL;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: RFH: unexpected reflog behavior with --since=
  2011-11-09 22:20   ` Jeff King
@ 2011-11-09 22:26     ` Jeff King
  2011-11-10  8:04     ` Eric Raible
  2011-11-12  6:50     ` Junio C Hamano
  2 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2011-11-09 22:26 UTC (permalink / raw)
  To: Eric Raible; +Cc: git@vger.kernel.org

On Wed, Nov 09, 2011 at 05:20:32PM -0500, Jeff King wrote:

>   git log -g --format='%ct %H' |
>   awk '{ print $2 if $1 < SOME_TIMESTAMP }'

Hmm, that is obviously not valid awk syntax. My brain has been too fried
by perl. And the comparison goes the wrong way. A (closer to) working
example would be:

  git log -g --format='%ct %H' |
  perl -alne 'print $F[1] if $F[0] > SOME_TIMESTAMP'

But hopefully you get the point.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFH: unexpected reflog behavior with --since=
  2011-11-09 22:01 ` Jeff King
  2011-11-09 22:20   ` Jeff King
@ 2011-11-10  7:48   ` Eric Raible
  2011-11-10  7:59     ` Jeff King
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Raible @ 2011-11-10  7:48 UTC (permalink / raw)
  To: Jeff King; +Cc: git@vger.kernel.org

On 11/9/2011 2:01 PM, Jeff King wrote:
> On Tue, Nov 08, 2011 at 04:22:41PM -0800, Eric Raible wrote:
> 
> [explanation how --since is used to limits traversal omitted]

Yes, all that is as expected, and makes sense.

> Now let's look at reflog walking. It's kind of bolted on to the side
> of the revision traversal machinery. We walk through the reflog
> backwards and pretend that entry N's parent is entry N-1 (you can see
> this if you do "git log -g -p", for example; you see the patch versus
> the last reflog entry, not the patch against the commit's true parent).
> 
> In the case of rewound history (like the reset you showed above), this
> means that the history graph will appear to have bad clock skew. The
> timestamp of HEAD@{0} is going to be much earlier than its pretend
> parent, HEAD@{1}. And the "--since" optimization is going to cut off
> traversal, even though there are more interesting commits to be shown.
> 
> So in that sense, I think it's a bug, and we should probably disable the
> exit-early-from-traversal optimization when we're walking reflogs.

Indeed.  Seems like a case of an optimization leading to an incorrect result.

> But it may also be a misfeature, because it's not clear what you're
> actually trying to limit by. We have commit timestamps, of course, but
> when we are walking reflogs, we also have reflog timestamps. Did you
> actually want to say "show me all commits in the reflog, in reverse
> reflog order, omitting commits that happened before time t"? Or did you
> really mean "show me the reflog entries that happened before time t,
> regardless of their commit timestamp"?

I meant "show me the reflog entries that happened *since* time t,
regardless of their commit timestamp.

> In the latter case, we would either need a new specifier (like
> "--reflog-since"), or to rewrite the commit timestamp when we rewrite
> the parent pointers.
> 
> The latter has a certain elegance to it (we are making a pretend linear
> history graph out of the reflog, so faking the timestamps to be sensible
> and in order is a logical thing to do) but I worry about lying too much
> in the output. Something like "git log -g --format=%cd" would now have
> the fake timestamp in the output. But then, we already show the fake
> parents in the output, so I don't know that this is any worse.

Since -g is asking specifying for the reflog, and since the reflog has
its own timestamps, I would expect that those timestamps be used.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFH: unexpected reflog behavior with --since=
  2011-11-10  7:48   ` Eric Raible
@ 2011-11-10  7:59     ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2011-11-10  7:59 UTC (permalink / raw)
  To: Eric Raible; +Cc: git@vger.kernel.org

On Wed, Nov 09, 2011 at 11:48:58PM -0800, Eric Raible wrote:

> > But it may also be a misfeature, because it's not clear what you're
> > actually trying to limit by. We have commit timestamps, of course, but
> > when we are walking reflogs, we also have reflog timestamps. Did you
> > actually want to say "show me all commits in the reflog, in reverse
> > reflog order, omitting commits that happened before time t"? Or did you
> > really mean "show me the reflog entries that happened before time t,
> > regardless of their commit timestamp"?
> 
> I meant "show me the reflog entries that happened *since* time t,
> regardless of their commit timestamp.

Err, yeah, sorry. Somehow in the middle of writing the email I got
turned backwards about which direction we were interested in.

But I think you get the point.

> Since -g is asking specifying for the reflog, and since the reflog has
> its own timestamps, I would expect that those timestamps be used.

Then I think my one-liner patch should do what you want. And now it's
not just anecdotal evidence that I think it's the right behavior. There
are two of us; we're _data_.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFH: unexpected reflog behavior with --since=
  2011-11-09 22:20   ` Jeff King
  2011-11-09 22:26     ` Jeff King
@ 2011-11-10  8:04     ` Eric Raible
  2011-11-10  8:08       ` Jeff King
  2011-11-12  6:50     ` Junio C Hamano
  2 siblings, 1 reply; 13+ messages in thread
From: Eric Raible @ 2011-11-10  8:04 UTC (permalink / raw)
  To: Jeff King; +Cc: git@vger.kernel.org

On 11/9/2011 2:20 PM, Jeff King wrote:
> On Wed, Nov 09, 2011 at 05:01:28PM -0500, Jeff King wrote:
> 
> This patch (which is below) turns out to be absurdly simple. And it
> actually still prints the original commit timestamp, because we end up
> reparsing it out of the commit object during the pretty-print phase.

Sweet!

> So I think the only decision is whether "--since" should respect the
> commit timestamps (and be used as a sort of "grep" filter for
> timestamps), or whether it should be respecting the fake history we
> create when doing a reflog walk.

When -g is specified it seems less surprising for --since to respect
the reflog's fake history.  That's what *I* expected, anyway.

> I think I am leaning towards the latter. It seems to me to be the more
> likely guess for what the user would want. And there is real benefit to
> doing it in git, since we can stop the traversal early. In the
> "grep-like" case, doing it inside git is not really any more efficient
> than filtering in a pipeline, like:
> 
>   git log -g --format='%ct %H' |
>   awk '{ print $2 if $1 < SOME_TIMESTAMP }'

And then the sha would have to be fed back into git to be useful, eh?

> Of course we could still offer both (with a "--reflog-since" type of
> option). We'd also need to turn off the optimization for "--since", and
> then check whether "--until" has a similar bug (and offer
> "--reflog-until").

I don't see the point of --reflog-since.  If the user specifies 'reflog'
(either directly or with -g), then can't we just use the reflog's timestamp?
Note: there might be good reasons, as my use of the reflog (and --since, for
that matter), has been very simplistic so far.

> diff --git a/reflog-walk.c b/reflog-walk.c
> index 5d81d39..2e5b270 100644
> --- a/reflog-walk.c
> +++ b/reflog-walk.c
> @@ -231,6 +231,7 @@ void fake_reflog_parent(struct reflog_walk_info *info, struct commit *commit)
>  	reflog = &commit_reflog->reflogs->items[commit_reflog->recno];
>  	info->last_commit_reflog = commit_reflog;
>  	commit_reflog->recno--;
> +	commit->date = reflog->timestamp;
>  	commit_info->commit = (struct commit *)parse_object(reflog->osha1);
>  	if (!commit_info->commit) {
>  		commit->parents = NULL;

Is this something you'd be willing to turn into a real patch?
I'm certainly not qualified.

Thanks - Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFH: unexpected reflog behavior with --since=
  2011-11-10  8:04     ` Eric Raible
@ 2011-11-10  8:08       ` Jeff King
  2011-11-10  8:20         ` Eric Raible
                           ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Jeff King @ 2011-11-10  8:08 UTC (permalink / raw)
  To: Eric Raible; +Cc: git@vger.kernel.org

On Thu, Nov 10, 2011 at 12:04:38AM -0800, Eric Raible wrote:

> > I think I am leaning towards the latter. It seems to me to be the more
> > likely guess for what the user would want. And there is real benefit to
> > doing it in git, since we can stop the traversal early. In the
> > "grep-like" case, doing it inside git is not really any more efficient
> > than filtering in a pipeline, like:
> > 
> >   git log -g --format='%ct %H' |
> >   awk '{ print $2 if $1 < SOME_TIMESTAMP }'
> 
> And then the sha would have to be fed back into git to be useful, eh?

It's just illustrative. You could replace "%H" with the actual
information you're interested in.

> > Of course we could still offer both (with a "--reflog-since" type of
> > option). We'd also need to turn off the optimization for "--since", and
> > then check whether "--until" has a similar bug (and offer
> > "--reflog-until").
> 
> I don't see the point of --reflog-since.  If the user specifies 'reflog'
> (either directly or with -g), then can't we just use the reflog's timestamp?
> Note: there might be good reasons, as my use of the reflog (and --since, for
> that matter), has been very simplistic so far.

The only point would be to leave "--since" to act on the commit
timestamps, so that you don't have to resort to the external grepping I
mentioned above. However, I'm not convinced anybody even cares about
that use case.

I think the behavior you want is much more sensible.

> > diff --git a/reflog-walk.c b/reflog-walk.c
> > index 5d81d39..2e5b270 100644
> > --- a/reflog-walk.c
> > +++ b/reflog-walk.c
> > @@ -231,6 +231,7 @@ void fake_reflog_parent(struct reflog_walk_info *info, struct commit *commit)
> >  	reflog = &commit_reflog->reflogs->items[commit_reflog->recno];
> >  	info->last_commit_reflog = commit_reflog;
> >  	commit_reflog->recno--;
> > +	commit->date = reflog->timestamp;
> >  	commit_info->commit = (struct commit *)parse_object(reflog->osha1);
> >  	if (!commit_info->commit) {
> >  		commit->parents = NULL;
> 
> Is this something you'd be willing to turn into a real patch?
> I'm certainly not qualified.

Yes. We're in release freeze now, so I didn't even bother with sending
it to Junio. But also, I'd like to gather more opinions on whether the
design is the right thing (hopefully the implementation is Obviously
Correct. :) ).

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFH: unexpected reflog behavior with --since=
  2011-11-10  8:08       ` Jeff King
@ 2011-11-10  8:20         ` Eric Raible
  2011-11-10  8:31         ` Jay Soffian
  2011-11-10 11:06         ` Miles Bader
  2 siblings, 0 replies; 13+ messages in thread
From: Eric Raible @ 2011-11-10  8:20 UTC (permalink / raw)
  To: Jeff King; +Cc: git@vger.kernel.org

On 11/10/2011 12:08 AM, Jeff King wrote:
>>>   git log -g --format='%ct %H' |
>>>   awk '{ print $2 if $1 < SOME_TIMESTAMP }'
>>
>> And then the sha would have to be fed back into git to be useful, eh?
> 
> It's just illustrative. You could replace "%H" with the actual
> information you're interested in.

Of course, my thinko.

> The only point would be to leave "--since" to act on the commit
> timestamps, so that you don't have to resort to the external grepping I
> mentioned above. However, I'm not convinced anybody even cares about
> that use case.
> 
> I think the behavior you want is much more sensible.

Me too!

>> Is this something you'd be willing to turn into a real patch?
>> I'm certainly not qualified.
> 
> Yes. We're in release freeze now, so I didn't even bother with sending
> it to Junio. But also, I'd like to gather more opinions on whether the
> design is the right thing (hopefully the implementation is Obviously
> Correct. :) ).

I think it's hard to argue that the current behavior (as illustrated with
my original example) makes sense.  Or that your patch is overly complicated.
But giving people time to chime in it definitely TRTTD.

- Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFH: unexpected reflog behavior with --since=
  2011-11-10  8:08       ` Jeff King
  2011-11-10  8:20         ` Eric Raible
@ 2011-11-10  8:31         ` Jay Soffian
  2011-11-10 11:06         ` Miles Bader
  2 siblings, 0 replies; 13+ messages in thread
From: Jay Soffian @ 2011-11-10  8:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Raible, git@vger.kernel.org

On Thu, Nov 10, 2011 at 3:08 AM, Jeff King <peff@peff.net> wrote:
> it to Junio. But also, I'd like to gather more opinions on whether the
> design is the right thing (hopefully the implementation is Obviously

Makes sense to me, so you're at +3.

j.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFH: unexpected reflog behavior with --since=
  2011-11-10  8:08       ` Jeff King
  2011-11-10  8:20         ` Eric Raible
  2011-11-10  8:31         ` Jay Soffian
@ 2011-11-10 11:06         ` Miles Bader
  2011-11-10 18:18           ` Eric Raible
  2 siblings, 1 reply; 13+ messages in thread
From: Miles Bader @ 2011-11-10 11:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Raible, git@vger.kernel.org

Jeff King <peff@peff.net> writes:
> The only point would be to leave "--since" to act on the commit
> timestamps, so that you don't have to resort to the external grepping I
> mentioned above. However, I'm not convinced anybody even cares about
> that use case.
>
> I think the behavior you want is much more sensible.

I think there's already confusion in this area, e.g., with @{...} using
reflog dates, but "git log --since" using commit dates.  This can be an
easy trap to fall into because _often_ the two have similar granularity
(when you're mostly pushing changes), but not _always_ (when you pull a
big batch of changes).

Soooo, being really really explicit about using reflog dates vs. commit
dates -- and e.g., having option names like "--since" _always_ refer to
commit dates -- would be a good thing, I think...

-Miles

-- 
Future, n. That period of time in which our affairs prosper, our friends
are true and our happiness is assured.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFH: unexpected reflog behavior with --since=
  2011-11-10 11:06         ` Miles Bader
@ 2011-11-10 18:18           ` Eric Raible
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Raible @ 2011-11-10 18:18 UTC (permalink / raw)
  To: Miles Bader; +Cc: Jeff King, git@vger.kernel.org

On 11/10/2011 3:06 AM, Miles Bader wrote:
> I think there's already confusion in this area, e.g., with @{...} using
> reflog dates, but "git log --since" using commit dates.  This can be an
> easy trap to fall into because _often_ the two have similar granularity
> (when you're mostly pushing changes), but not _always_ (when you pull a
> big batch of changes).
> 
> Soooo, being really really explicit about using reflog dates vs. commit
> dates -- and e.g., having option names like "--since" _always_ refer to
> commit dates -- would be a good thing, I think...
> 
> -Miles

Surely you agree that my original example shows that the current behavior
is confusing, yes?

So you're advocating --reflog-since (or some such)?
Or to disable the early --since early exit when walking the reflog?
Or for something else?

- Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: RFH: unexpected reflog behavior with --since=
  2011-11-09 22:20   ` Jeff King
  2011-11-09 22:26     ` Jeff King
  2011-11-10  8:04     ` Eric Raible
@ 2011-11-12  6:50     ` Junio C Hamano
  2 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2011-11-12  6:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Raible, git@vger.kernel.org

Jeff King <peff@peff.net> writes:

> So I think the only decision is whether "--since" should respect the
> commit timestamps (and be used as a sort of "grep" filter for
> timestamps), or whether it should be respecting the fake history we
> create when doing a reflog walk.
>
> I think I am leaning towards the latter.

I tend to agree as far as the semantics go.

Also at least as a short term solution at the implementation level, I am
OK with the change. But in the longer term, I have this suspicion that we
should not be rewriting commit objects themselves with these phony data,
which makes things like "git log -g --stat" and "git log --parents -g"
totally useless.

That of course is not a fault of this patch; it does not make things any
worse than the original, and that is why I say I am OK with it.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-11-12  6:50 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-09  0:22 RFH: unexpected reflog behavior with --since= Eric Raible
2011-11-09 22:01 ` Jeff King
2011-11-09 22:20   ` Jeff King
2011-11-09 22:26     ` Jeff King
2011-11-10  8:04     ` Eric Raible
2011-11-10  8:08       ` Jeff King
2011-11-10  8:20         ` Eric Raible
2011-11-10  8:31         ` Jay Soffian
2011-11-10 11:06         ` Miles Bader
2011-11-10 18:18           ` Eric Raible
2011-11-12  6:50     ` Junio C Hamano
2011-11-10  7:48   ` Eric Raible
2011-11-10  7:59     ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).