* [PATCH 0/2] speed up reflog unreachability pruning @ 2009-03-31 17:03 Linus Torvalds 2009-03-31 17:09 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Linus Torvalds 0 siblings, 1 reply; 6+ messages in thread From: Linus Torvalds @ 2009-03-31 17:03 UTC (permalink / raw) To: Junio C Hamano, Git Mailing List Ok, this is a two-patch series that first tries to clean things up a bit, and then applies Junio's approach to speed up the reachability test. The reason I did it this way is that I think we can improve on the reachability logic a bit more, but in order to do that I refuse to work with the crazy duplicated complex logic inside 'expire_reflog_ent()', and I wanted to abstract it out. Then, the first cut at speedup is just Junio's approach. Which is fairly hacky, but works. I'd _like_ to do more of a "dynamically do 'mark_reachable()' only when necessary" thing, but that's a separate cleanup thing. As is, this improves the reflog expire quite enormously for me: - before: [torvalds@nehalem linux]$ time git reflog expire --all real 0m37.193s user 0m37.174s sys 0m0.020s - after: [torvalds@nehalem linux]$ time ~/git/git reflog expire --all real 0m1.693s user 0m1.672s sys 0m0.020s although I do suspect that the 'mark_reachable()' could slow things down in some less extreme cases. But probably never by a huge amount. Total diffstat: builtin-reflog.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 69 insertions(+), 6 deletions(-) with the two individual patches coming up next. Linus ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] Clean up reflog unreachability pruning decision 2009-03-31 17:03 [PATCH 0/2] speed up reflog unreachability pruning Linus Torvalds @ 2009-03-31 17:09 ` Linus Torvalds 2009-03-31 17:11 ` [PATCH 2/2] Speed up reflog pruning of unreachable commits Linus Torvalds 2009-03-31 17:44 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Junio C Hamano 0 siblings, 2 replies; 6+ messages in thread From: Linus Torvalds @ 2009-03-31 17:09 UTC (permalink / raw) To: Junio C Hamano, Git Mailing List From: Linus Torvalds <torvalds@linux-foundation.org> Date: Tue, 31 Mar 2009 09:45:22 -0700 This clarifies the pruning rules for unreachable commits by having a separate helpder function for the unreachability decision. It's preparation for actual bigger changes to come to speed up the decision when the reachability calculations become a bottleneck. In the process it also _does_ change behavior, although in a way that I think is much saner than the old behavior (which was in my opinion not designed, just a result of how the tests were written). It now will prune reflog entries that are older than that 'prune_unreacable' time _and_ that have commit references that can't be even looked up. Of course, "--stale-fix" also does that, and does it regardless of the age of the reflog entry is, but I really think this is the right thing to do. If we can't even look it up, we should consider it to be unreachable. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --- builtin-reflog.c | 32 ++++++++++++++++++++++++++------ 1 files changed, 26 insertions(+), 6 deletions(-) Note the behavioural change. I think it's sane and "ObviouslyCorrect(tm)", but maybe somebody disagrees. diff --git a/builtin-reflog.c b/builtin-reflog.c index d95f515..0355ce6 100644 --- a/builtin-reflog.c +++ b/builtin-reflog.c @@ -209,6 +209,31 @@ static int keep_entry(struct commit **it, unsigned char *sha1) return 1; } +static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsigned char *sha1) +{ + /* + * We may or may not have the commit yet - if not, look it + * up using the supplied sha1. + */ + if (!commit) { + if (is_null_sha1(sha1)) + return 0; + + commit = lookup_commit_reference_gently(sha1, 1); + + /* We can't even look it up - consider it unreachable */ + if (!commit) + return 1; + } + + /* Reachable from the current reflog top? Don't prune */ + if (in_merge_bases(commit, &cb->ref_commit, 1)) + return 0; + + /* We can't reach it - prune it. */ + return 1; +} + static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1, const char *email, unsigned long timestamp, int tz, const char *message, void *cb_data) @@ -230,12 +255,7 @@ static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1, if (timestamp < cb->cmd->expire_unreachable) { if (!cb->ref_commit) goto prune; - if (!old && !is_null_sha1(osha1)) - old = lookup_commit_reference_gently(osha1, 1); - if (!new && !is_null_sha1(nsha1)) - new = lookup_commit_reference_gently(nsha1, 1); - if ((old && !in_merge_bases(old, &cb->ref_commit, 1)) || - (new && !in_merge_bases(new, &cb->ref_commit, 1))) + if (unreachable(cb, old, osha1) || unreachable(cb, new, nsha1)) goto prune; } -- 1.6.2.1.404.gb0085.dirty ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] Speed up reflog pruning of unreachable commits 2009-03-31 17:09 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Linus Torvalds @ 2009-03-31 17:11 ` Linus Torvalds 2009-03-31 17:46 ` Junio C Hamano 2009-03-31 17:44 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Junio C Hamano 1 sibling, 1 reply; 6+ messages in thread From: Linus Torvalds @ 2009-03-31 17:11 UTC (permalink / raw) To: Junio C Hamano, Git Mailing List From: Junio Hamano <gitster@pobox.com> Date: Mon, 30 Mar 2009 21:34:14 -0700 Instead of doing the (potentially very expensive) "in_merge_base()" check for each commit that might be pruned if it is unreachable, do a preparatory reachability graph of the commit space, so that the common case of being reachable can be tested directly. [ Cleaned up a bit and tweaked to actually work. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> --- builtin-reflog.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 43 insertions(+), 0 deletions(-) diff --git a/builtin-reflog.c b/builtin-reflog.c index 0355ce6..f29ab2f 100644 --- a/builtin-reflog.c +++ b/builtin-reflog.c @@ -52,6 +52,7 @@ struct collect_reflog_cb { #define INCOMPLETE (1u<<10) #define STUDYING (1u<<11) +#define REACHABLE (1u<<12) static int tree_is_complete(const unsigned char *sha1) { @@ -209,6 +210,43 @@ static int keep_entry(struct commit **it, unsigned char *sha1) return 1; } +static void mark_reachable(struct commit *commit, unsigned long expire_limit) +{ + /* + * We need to compute if commit on either side of an reflog + * entry is reachable from the tip of the ref for all entries. + * Mark commits that are reachable from the tip down to the + * time threashold first; we know a commit marked thusly is + * reachable from the tip without running in_merge_bases() + * at all. + */ + struct commit_list *pending = NULL; + + commit_list_insert(commit, &pending); + while (pending) { + struct commit_list *entry = pending; + struct commit_list *parent; + pending = entry->next; + commit = entry->item; + free(entry); + if (commit->object.flags & REACHABLE) + continue; + if (parse_commit(commit)) + continue; + commit->object.flags |= REACHABLE; + if (commit->date < expire_limit) + continue; + parent = commit->parents; + while (parent) { + commit = parent->item; + parent = parent->next; + if (commit->object.flags & REACHABLE) + continue; + commit_list_insert(commit, &pending); + } + } +} + static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsigned char *sha1) { /* @@ -227,6 +265,8 @@ static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsig } /* Reachable from the current reflog top? Don't prune */ + if (commit->object.flags & REACHABLE) + return 0; if (in_merge_bases(commit, &cb->ref_commit, 1)) return 0; @@ -308,7 +348,10 @@ static int expire_reflog(const char *ref, const unsigned char *sha1, int unused, cb.ref_commit = lookup_commit_reference_gently(sha1, 1); cb.ref = ref; cb.cmd = cmd; + + mark_reachable(cb.ref_commit, cmd->expire_total); for_each_reflog_ent(ref, expire_reflog_ent, &cb); + clear_commit_marks(cb.ref_commit, REACHABLE); finish: if (cb.newlog) { if (fclose(cb.newlog)) { -- 1.6.2.1.404.gb0085.dirty ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] Speed up reflog pruning of unreachable commits 2009-03-31 17:11 ` [PATCH 2/2] Speed up reflog pruning of unreachable commits Linus Torvalds @ 2009-03-31 17:46 ` Junio C Hamano 2009-03-31 17:58 ` Linus Torvalds 0 siblings, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2009-03-31 17:46 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > From: Junio Hamano <gitster@pobox.com> > Date: Mon, 30 Mar 2009 21:34:14 -0700 > > Instead of doing the (potentially very expensive) "in_merge_base()" > check for each commit that might be pruned if it is unreachable, do a > preparatory reachability graph of the commit space, so that the common > case of being reachable can be tested directly. > > [ Cleaned up a bit and tweaked to actually work. - Linus ] > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > --- > builtin-reflog.c | 43 +++++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 43 insertions(+), 0 deletions(-) > > diff --git a/builtin-reflog.c b/builtin-reflog.c > index 0355ce6..f29ab2f 100644 > --- a/builtin-reflog.c > +++ b/builtin-reflog.c > @@ -52,6 +52,7 @@ struct collect_reflog_cb { > > #define INCOMPLETE (1u<<10) > #define STUDYING (1u<<11) > +#define REACHABLE (1u<<12) > > static int tree_is_complete(const unsigned char *sha1) > { > @@ -209,6 +210,43 @@ static int keep_entry(struct commit **it, unsigned char *sha1) > return 1; > } > > +static void mark_reachable(struct commit *commit, unsigned long expire_limit) > +{ > + /* > + * We need to compute if commit on either side of an reflog > + * entry is reachable from the tip of the ref for all entries. > + * Mark commits that are reachable from the tip down to the > + * time threashold first; we know a commit marked thusly is > + * reachable from the tip without running in_merge_bases() > + * at all. > + */ > + struct commit_list *pending = NULL; > + > + commit_list_insert(commit, &pending); > + while (pending) { > + struct commit_list *entry = pending; > + struct commit_list *parent; > + pending = entry->next; > + commit = entry->item; > + free(entry); > + if (commit->object.flags & REACHABLE) > + continue; > + if (parse_commit(commit)) > + continue; > + commit->object.flags |= REACHABLE; > + if (commit->date < expire_limit) > + continue; > + parent = commit->parents; > + while (parent) { > + commit = parent->item; > + parent = parent->next; > + if (commit->object.flags & REACHABLE) > + continue; > + commit_list_insert(commit, &pending); > + } > + } > +} > + > static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsigned char *sha1) > { > /* > @@ -227,6 +265,8 @@ static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsig > } > > /* Reachable from the current reflog top? Don't prune */ > + if (commit->object.flags & REACHABLE) > + return 0; > if (in_merge_bases(commit, &cb->ref_commit, 1)) > return 0; > > @@ -308,7 +348,10 @@ static int expire_reflog(const char *ref, const unsigned char *sha1, int unused, > cb.ref_commit = lookup_commit_reference_gently(sha1, 1); > cb.ref = ref; > cb.cmd = cmd; > + You seem to have lost "if (cb.ref_commit)" from the last round to protect mark_rechable(). It can be NULL. > + mark_reachable(cb.ref_commit, cmd->expire_total); > for_each_reflog_ent(ref, expire_reflog_ent, &cb); > + clear_commit_marks(cb.ref_commit, REACHABLE); > finish: > if (cb.newlog) { > if (fclose(cb.newlog)) { > -- > 1.6.2.1.404.gb0085.dirty ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] Speed up reflog pruning of unreachable commits 2009-03-31 17:46 ` Junio C Hamano @ 2009-03-31 17:58 ` Linus Torvalds 0 siblings, 0 replies; 6+ messages in thread From: Linus Torvalds @ 2009-03-31 17:58 UTC (permalink / raw) To: Junio C Hamano; +Cc: Git Mailing List On Tue, 31 Mar 2009, Junio C Hamano wrote: > > You seem to have lost "if (cb.ref_commit)" from the last round to protect > mark_rechable(). It can be NULL. I based it on your original patch, so yeah, it's missing all your updates since. Linus ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] Clean up reflog unreachability pruning decision 2009-03-31 17:09 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Linus Torvalds 2009-03-31 17:11 ` [PATCH 2/2] Speed up reflog pruning of unreachable commits Linus Torvalds @ 2009-03-31 17:44 ` Junio C Hamano 1 sibling, 0 replies; 6+ messages in thread From: Junio C Hamano @ 2009-03-31 17:44 UTC (permalink / raw) To: Linus Torvalds; +Cc: Git Mailing List Linus Torvalds <torvalds@linux-foundation.org> writes: > From: Linus Torvalds <torvalds@linux-foundation.org> > Date: Tue, 31 Mar 2009 09:45:22 -0700 > > This clarifies the pruning rules for unreachable commits by having a > separate helpder function for the unreachability decision. > > It's preparation for actual bigger changes to come to speed up the > decision when the reachability calculations become a bottleneck. > > In the process it also _does_ change behavior, although in a way that I > think is much saner than the old behavior (which was in my opinion not > designed, just a result of how the tests were written). It now will prune > reflog entries that are older than that 'prune_unreacable' time _and_ that > have commit references that can't be even looked up. > Of course, "--stale-fix" also does that, and does it regardless of the age > of the reflog entry is, but I really think this is the right thing to do. > If we can't even look it up, we should consider it to be unreachable. > > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > --- > builtin-reflog.c | 32 ++++++++++++++++++++++++++------ > 1 files changed, 26 insertions(+), 6 deletions(-) > > Note the behavioural change. I think it's sane and "ObviouslyCorrect(tm)", > but maybe somebody disagrees. I did not recall initially when I was discussing this with you last night but I think this "no commit? not prune" is intentional. The codepath is not about logs/heads but logs/*anything*, and we allow pointers to non commit objects (like v2.6.11-tree). Also even if we limit ourselves to commits, @{-N} notation can be used to see the history of branch switching, and that does not need any underlying objects. I do not think it is interesting to be able to see which branch you were on 30 days abo, though ;-) > diff --git a/builtin-reflog.c b/builtin-reflog.c > index d95f515..0355ce6 100644 > --- a/builtin-reflog.c > +++ b/builtin-reflog.c > @@ -209,6 +209,31 @@ static int keep_entry(struct commit **it, unsigned char *sha1) > return 1; > } > > +static int unreachable(struct expire_reflog_cb *cb, struct commit *commit, unsigned char *sha1) > +{ > + /* > + * We may or may not have the commit yet - if not, look it > + * up using the supplied sha1. > + */ > + if (!commit) { > + if (is_null_sha1(sha1)) > + return 0; > + > + commit = lookup_commit_reference_gently(sha1, 1); > + > + /* We can't even look it up - consider it unreachable */ > + if (!commit) > + return 1; /* If it is not a commit, keep it. */ if (!commit) return 0; > + } > + > + /* Reachable from the current reflog top? Don't prune */ That's "tip of the ref", not necessarily "reflog top". > + if (in_merge_bases(commit, &cb->ref_commit, 1)) > + return 0; > + > + /* We can't reach it - prune it. */ > + return 1; > +} > + > static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1, > const char *email, unsigned long timestamp, int tz, > const char *message, void *cb_data) > @@ -230,12 +255,7 @@ static int expire_reflog_ent(unsigned char *osha1, unsigned char *nsha1, > if (timestamp < cb->cmd->expire_unreachable) { > if (!cb->ref_commit) > goto prune; > - if (!old && !is_null_sha1(osha1)) > - old = lookup_commit_reference_gently(osha1, 1); > - if (!new && !is_null_sha1(nsha1)) > - new = lookup_commit_reference_gently(nsha1, 1); > - if ((old && !in_merge_bases(old, &cb->ref_commit, 1)) || > - (new && !in_merge_bases(new, &cb->ref_commit, 1))) > + if (unreachable(cb, old, osha1) || unreachable(cb, new, nsha1)) > goto prune; > } > > -- > 1.6.2.1.404.gb0085.dirty ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-03-31 18:01 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-03-31 17:03 [PATCH 0/2] speed up reflog unreachability pruning Linus Torvalds 2009-03-31 17:09 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Linus Torvalds 2009-03-31 17:11 ` [PATCH 2/2] Speed up reflog pruning of unreachable commits Linus Torvalds 2009-03-31 17:46 ` Junio C Hamano 2009-03-31 17:58 ` Linus Torvalds 2009-03-31 17:44 ` [PATCH 1/2] Clean up reflog unreachability pruning decision Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).