* [PATCH/RFH 0/3] stable priority-queue
@ 2014-07-14 5:40 Jeff King
2014-07-14 5:42 ` [PATCH 1/3] prio-queue: factor out compare and swap operations Jeff King
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Jeff King @ 2014-07-14 5:40 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
As Junio and I discussed earlier in [1], this series makes the
prio_queue struct stable with respect to object insertion (which in turn
means we can use it to replace commit_list in more places).
I think everything here is correct, but the second commit fails the
final test in t5539. I think the test is just flaky (hence the RFH and
cc to Duy).
That test creates some unrelated commits in two separate repositories,
and then fetches from one to the other. Since the commit creation
happens in a subshell, the first commit in each ends up with the same
test_tick value. When fetch-pack looks at the two root commits
"unrelated1" and "new-too", the exact sequence of ACKs is different
depending on which one it pulls out of the queue first.
With the current code, it happens to be "unrelated1" (though this is not
at all guaranteed by the prio_queue data structure, it is deterministic
for this particular sequence of input). We see the ready-ACK, and the
test succeeds.
With the stable queue, we reliably get "new-too" out (since it is our
local tip, it is added to the queue before we even talk to the remote).
We never see a ready-ACK, and the test fails due to the grep on the
TRACE_PACKET output at the end (the fetch itself succeeds as expected).
I'm really not quite clear on what's supposed to be going on in the
test. I can make it pass with:
diff --git a/t/t5539-fetch-http-shallow.sh b/t/t5539-fetch-http-shallow.sh
index 94553e1..b461188 100755
--- a/t/t5539-fetch-http-shallow.sh
+++ b/t/t5539-fetch-http-shallow.sh
@@ -54,6 +54,7 @@ EOF
test_expect_success 'no shallow lines after receiving ACK ready' '
(
cd shallow &&
+ test_tick &&
for i in $(test_seq 15)
do
git checkout --orphan unrelated$i &&
which just bumps the timestamp for the unrelated* commits (so that they
are always more recent than "new-too" and get picked first). I'm not
sure if that is too hacky, or if there's a more robust way to set up the
test.
Anyway, here are the patches.
[1/3]: prio-queue: factor out compare and swap operations
[2/3]: prio-queue: make output stable with respect to insertion
[3/3]: paint_down_to_common: use prio_queue
-Peff
[1] http://thread.gmane.org/gmane.comp.version-control.git/252472/focus=252475
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 1/3] prio-queue: factor out compare and swap operations
2014-07-14 5:40 [PATCH/RFH 0/3] stable priority-queue Jeff King
@ 2014-07-14 5:42 ` Jeff King
2014-07-14 5:51 ` [PATCH 2/3] prio-queue: make output stable with respect to insertion Jeff King
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2014-07-14 5:42 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
When manipulating the priority queue's heap, we frequently
have to compare and swap heap entries. As we are storing
only void pointers right now, this is quite easy to do
inline in a few lines. However, when we start using a more
complicated heap entry in a future patch, that will get
longer. Factoring out these operations lets us make future
changes in one place. It also makes the code a little
shorter and more readable.
Note that we actually accept indices into the queue array
instead of pointers. This is slightly less flexible than
passing pointers-to-pointers (we could not swap items from
unrelated arrays, but we would not want to), but will make
further refactoring simpler (and lets us avoid repeating
"queue->array" at each callsite, which led to some long
lines).
And finally, note that we are cleaning up an accidental use
of a "struct commit" pointer to hold a temporary entry
during swap. Even though we currently only use this code for
commits, it is supposed to be type-agnostic. In practice
this didn't matter anyway because we never dereferenced the
commit pointer (and on most systems, the pointer values
themselves are interchangeable between types).
Signed-off-by: Jeff King <peff@peff.net>
---
prio-queue.c | 49 +++++++++++++++++++++++++------------------------
1 file changed, 25 insertions(+), 24 deletions(-)
diff --git a/prio-queue.c b/prio-queue.c
index c9f8c6d..0f4fcf2 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -1,18 +1,28 @@
#include "cache.h"
-#include "commit.h"
#include "prio-queue.h"
+static inline int compare(struct prio_queue *queue, int i, int j)
+{
+ int cmp = queue->compare(queue->array[i], queue->array[j],
+ queue->cb_data);
+ return cmp;
+}
+
+static inline void swap(struct prio_queue *queue, int i, int j)
+{
+ void *tmp = queue->array[i];
+ queue->array[i] = queue->array[j];
+ queue->array[j] = tmp;
+}
+
void prio_queue_reverse(struct prio_queue *queue)
{
int i, j;
if (queue->compare != NULL)
die("BUG: prio_queue_reverse() on non-LIFO queue");
- for (i = 0; i <= (j = (queue->nr - 1) - i); i++) {
- struct commit *swap = queue->array[i];
- queue->array[i] = queue->array[j];
- queue->array[j] = swap;
- }
+ for (i = 0; i <= (j = (queue->nr - 1) - i); i++)
+ swap(queue, i, j);
}
void clear_prio_queue(struct prio_queue *queue)
@@ -25,37 +35,32 @@ void clear_prio_queue(struct prio_queue *queue)
void prio_queue_put(struct prio_queue *queue, void *thing)
{
- prio_queue_compare_fn compare = queue->compare;
int ix, parent;
/* Append at the end */
ALLOC_GROW(queue->array, queue->nr + 1, queue->alloc);
queue->array[queue->nr++] = thing;
- if (!compare)
+ if (!queue->compare)
return; /* LIFO */
/* Bubble up the new one */
for (ix = queue->nr - 1; ix; ix = parent) {
parent = (ix - 1) / 2;
- if (compare(queue->array[parent], queue->array[ix],
- queue->cb_data) <= 0)
+ if (compare(queue, parent, ix) <= 0)
break;
- thing = queue->array[parent];
- queue->array[parent] = queue->array[ix];
- queue->array[ix] = thing;
+ swap(queue, parent, ix);
}
}
void *prio_queue_get(struct prio_queue *queue)
{
- void *result, *swap;
+ void *result;
int ix, child;
- prio_queue_compare_fn compare = queue->compare;
if (!queue->nr)
return NULL;
- if (!compare)
+ if (!queue->compare)
return queue->array[--queue->nr]; /* LIFO */
result = queue->array[0];
@@ -67,18 +72,14 @@ void *prio_queue_get(struct prio_queue *queue)
/* Push down the one at the root */
for (ix = 0; ix * 2 + 1 < queue->nr; ix = child) {
child = ix * 2 + 1; /* left */
- if ((child + 1 < queue->nr) &&
- (compare(queue->array[child], queue->array[child + 1],
- queue->cb_data) >= 0))
+ if (child + 1 < queue->nr &&
+ compare(queue, child, child + 1) >= 0)
child++; /* use right child */
- if (compare(queue->array[ix], queue->array[child],
- queue->cb_data) <= 0)
+ if (compare(queue, ix, child) <= 0)
break;
- swap = queue->array[child];
- queue->array[child] = queue->array[ix];
- queue->array[ix] = swap;
+ swap(queue, child, ix);
}
return result;
}
--
2.0.0.566.gfe3e6b2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/3] prio-queue: make output stable with respect to insertion
2014-07-14 5:40 [PATCH/RFH 0/3] stable priority-queue Jeff King
2014-07-14 5:42 ` [PATCH 1/3] prio-queue: factor out compare and swap operations Jeff King
@ 2014-07-14 5:51 ` Jeff King
2014-07-14 5:53 ` [PATCH 3/3] paint_down_to_common: use prio_queue Jeff King
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2014-07-14 5:51 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
If two items are added to a prio_queue and compare equal,
they currently come out in an apparently random order (this
order is deterministic for a particular sequence of
insertions and removals, but does not necessarily match the
insertion order). This makes it unlike using a date-ordered
commit_list, which is one of the main types we would like to
replace with it (because prio_queue does not suffer from
O(n) insertions).
We can make the priority queue stable by keeping an
insertion counter for each element, and using it to break
ties. This does increase the memory usage of the structure
(one int per element), but in practice it does not seem to
affect runtime. A best-of-five "git rev-list --topo-order"
on linux.git showed less than 1% difference (well within the
run-to-run noise).
In an ideal world, we would offer both stable and unstable
priority queues (the latter to try to maximize performance).
However, given the lack of a measurable performance
difference, it is not worth the extra code.
Signed-off-by: Jeff King <peff@peff.net>
---
I actually tried implementing a stable queue on top of the existing
prio-queue. However, it was quite a mess. Because prio_queue only stores
one void pointer to a thing, the stable queue has to allocate its own its
(counter,thing) pair. Doing it in a separate array doesn't work (you do
not pop them in insertion order, so you end up with "holes"). So you end
up storing an extra pointer, _plus_ per-entry malloc overhead.
If we really want to offer a non-stable queue (and I do not think we
do), we should probably just do two totally separate queues, or
implement the whole thing with a run-time "element size" member (or even
do it in macros).
I use struct assignment in the swap() function below. Do we want to
replace that with a memcpy? Presumably decent compilers can turn that
into the same code anyway, but I find the assignment more readable, and
IIRC it has been around since C89.
prio-queue.c | 15 ++++++++++-----
prio-queue.h | 8 +++++++-
2 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/prio-queue.c b/prio-queue.c
index 0f4fcf2..e4365b0 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -3,14 +3,16 @@
static inline int compare(struct prio_queue *queue, int i, int j)
{
- int cmp = queue->compare(queue->array[i], queue->array[j],
+ int cmp = queue->compare(queue->array[i].data, queue->array[j].data,
queue->cb_data);
+ if (!cmp)
+ cmp = queue->array[i].ctr - queue->array[j].ctr;
return cmp;
}
static inline void swap(struct prio_queue *queue, int i, int j)
{
- void *tmp = queue->array[i];
+ struct prio_queue_entry tmp = queue->array[i];
queue->array[i] = queue->array[j];
queue->array[j] = tmp;
}
@@ -31,6 +33,7 @@ void clear_prio_queue(struct prio_queue *queue)
queue->nr = 0;
queue->alloc = 0;
queue->array = NULL;
+ queue->insertion_ctr = 0;
}
void prio_queue_put(struct prio_queue *queue, void *thing)
@@ -39,7 +42,9 @@ void prio_queue_put(struct prio_queue *queue, void *thing)
/* Append at the end */
ALLOC_GROW(queue->array, queue->nr + 1, queue->alloc);
- queue->array[queue->nr++] = thing;
+ queue->array[queue->nr].ctr = queue->insertion_ctr++;
+ queue->array[queue->nr].data = thing;
+ queue->nr++;
if (!queue->compare)
return; /* LIFO */
@@ -61,9 +66,9 @@ void *prio_queue_get(struct prio_queue *queue)
if (!queue->nr)
return NULL;
if (!queue->compare)
- return queue->array[--queue->nr]; /* LIFO */
+ return queue->array[--queue->nr].data; /* LIFO */
- result = queue->array[0];
+ result = queue->array[0].data;
if (!--queue->nr)
return result;
diff --git a/prio-queue.h b/prio-queue.h
index 9c3cd1f..d030ec9 100644
--- a/prio-queue.h
+++ b/prio-queue.h
@@ -21,11 +21,17 @@
*/
typedef int (*prio_queue_compare_fn)(const void *one, const void *two, void *cb_data);
+struct prio_queue_entry {
+ unsigned ctr;
+ void *data;
+};
+
struct prio_queue {
prio_queue_compare_fn compare;
+ unsigned insertion_ctr;
void *cb_data;
int alloc, nr;
- void **array;
+ struct prio_queue_entry *array;
};
/*
--
2.0.0.566.gfe3e6b2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/3] paint_down_to_common: use prio_queue
2014-07-14 5:40 [PATCH/RFH 0/3] stable priority-queue Jeff King
2014-07-14 5:42 ` [PATCH 1/3] prio-queue: factor out compare and swap operations Jeff King
2014-07-14 5:51 ` [PATCH 2/3] prio-queue: make output stable with respect to insertion Jeff King
@ 2014-07-14 5:53 ` Jeff King
2014-07-14 10:12 ` [PATCH/RFH 0/3] stable priority-queue Duy Nguyen
2014-07-14 11:02 ` David Kastrup
4 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2014-07-14 5:53 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy
When we are traversing to find merge bases, we keep our
usual commit_list of commits to process, sorted by their
commit timestamp. As we add each parent to the list, we have
to spend "O(width of history)" to do the insertion, where
the width of history is the number of simultaneous lines of
development.
If we instead use a heap-based priority queue, we can do
these insertions in "O(log width)" time. This provides minor
speedups to merge-base calculations (timings in linux.git,
warm cache, best-of-five):
[before]
$ git merge-base HEAD v2.6.12
real 0m3.251s
user 0m3.148s
sys 0m0.104s
[after]
$ git merge-base HEAD v2.6.12
real 0m3.234s
user 0m3.108s
sys 0m0.128s
That's only an 0.5% speedup, but it does help protect us
against pathological cases.
While we are munging the "interesting" function, we also
take the opportunity to give it a more descriptive name, and
convert the return value to an int (we returned the first
interesting commit, but nobody ever looked at it).
Signed-off-by: Jeff King <peff@peff.net>
---
Same as what I posted a few weeks ago, but now we do not have to tweak
t6024 to account for the lack of stability.
commit.c | 42 +++++++++++++++++++-----------------------
1 file changed, 19 insertions(+), 23 deletions(-)
diff --git a/commit.c b/commit.c
index acb74b5..1fc60c0 100644
--- a/commit.c
+++ b/commit.c
@@ -786,45 +786,41 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
-static struct commit *interesting(struct commit_list *list)
+static int queue_has_nonstale(struct prio_queue *queue)
{
- while (list) {
- struct commit *commit = list->item;
- list = list->next;
- if (commit->object.flags & STALE)
- continue;
- return commit;
+ int i;
+ for (i = 0; i < queue->nr; i++) {
+ struct commit *commit = queue->array[i].data;
+ if (!(commit->object.flags & STALE))
+ return 1;
}
- return NULL;
+ return 0;
}
/* all input commits in one and twos[] must have been parsed! */
static struct commit_list *paint_down_to_common(struct commit *one, int n, struct commit **twos)
{
- struct commit_list *list = NULL;
+ struct prio_queue queue = { compare_commits_by_commit_date };
struct commit_list *result = NULL;
int i;
one->object.flags |= PARENT1;
- commit_list_insert_by_date(one, &list);
- if (!n)
- return list;
+ if (!n) {
+ commit_list_append(one, &result);
+ return result;
+ }
+ prio_queue_put(&queue, one);
+
for (i = 0; i < n; i++) {
twos[i]->object.flags |= PARENT2;
- commit_list_insert_by_date(twos[i], &list);
+ prio_queue_put(&queue, twos[i]);
}
- while (interesting(list)) {
- struct commit *commit;
+ while (queue_has_nonstale(&queue)) {
+ struct commit *commit = prio_queue_get(&queue);
struct commit_list *parents;
- struct commit_list *next;
int flags;
- commit = list->item;
- next = list->next;
- free(list);
- list = next;
-
flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
if (flags == (PARENT1 | PARENT2)) {
if (!(commit->object.flags & RESULT)) {
@@ -843,11 +839,11 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n, struc
if (parse_commit(p))
return NULL;
p->object.flags |= flags;
- commit_list_insert_by_date(p, &list);
+ prio_queue_put(&queue, p);
}
}
- free_commit_list(list);
+ clear_prio_queue(&queue);
return result;
}
--
2.0.0.566.gfe3e6b2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH/RFH 0/3] stable priority-queue
2014-07-14 5:40 [PATCH/RFH 0/3] stable priority-queue Jeff King
` (2 preceding siblings ...)
2014-07-14 5:53 ` [PATCH 3/3] paint_down_to_common: use prio_queue Jeff King
@ 2014-07-14 10:12 ` Duy Nguyen
2014-07-14 11:02 ` David Kastrup
4 siblings, 0 replies; 7+ messages in thread
From: Duy Nguyen @ 2014-07-14 10:12 UTC (permalink / raw)
To: Jeff King; +Cc: Git Mailing List, Junio C Hamano
On Mon, Jul 14, 2014 at 12:40 PM, Jeff King <peff@peff.net> wrote:
> As Junio and I discussed earlier in [1], this series makes the
> prio_queue struct stable with respect to object insertion (which in turn
> means we can use it to replace commit_list in more places).
>
> I think everything here is correct, but the second commit fails the
> final test in t5539. I think the test is just flaky (hence the RFH and
> cc to Duy).
>
> That test creates some unrelated commits in two separate repositories,
> and then fetches from one to the other. Since the commit creation
> happens in a subshell, the first commit in each ends up with the same
> test_tick value. When fetch-pack looks at the two root commits
> "unrelated1" and "new-too", the exact sequence of ACKs is different
> depending on which one it pulls out of the queue first.
>
> With the current code, it happens to be "unrelated1" (though this is not
> at all guaranteed by the prio_queue data structure, it is deterministic
> for this particular sequence of input). We see the ready-ACK, and the
> test succeeds.
>
> With the stable queue, we reliably get "new-too" out (since it is our
> local tip, it is added to the queue before we even talk to the remote).
> We never see a ready-ACK, and the test fails due to the grep on the
> TRACE_PACKET output at the end (the fetch itself succeeds as expected).
>
> I'm really not quite clear on what's supposed to be going on in the
> test. I can make it pass with:
>
> diff --git a/t/t5539-fetch-http-shallow.sh b/t/t5539-fetch-http-shallow.sh
> index 94553e1..b461188 100755
> --- a/t/t5539-fetch-http-shallow.sh
> +++ b/t/t5539-fetch-http-shallow.sh
> @@ -54,6 +54,7 @@ EOF
> test_expect_success 'no shallow lines after receiving ACK ready' '
> (
> cd shallow &&
> + test_tick &&
> for i in $(test_seq 15)
> do
> git checkout --orphan unrelated$i &&
>
> which just bumps the timestamp for the unrelated* commits (so that they
> are always more recent than "new-too" and get picked first). I'm not
> sure if that is too hacky, or if there's a more robust way to set up the
> test.
The test needs the right amount of "have"s with pkt-flush inserted at
the right place. If test_tick makes it happy, I think we could go with
it for now. I'll try to study this code more and see if I can come up
with a better test. It'll be low priority in my backlog though.
--
Duy
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH/RFH 0/3] stable priority-queue
2014-07-14 5:40 [PATCH/RFH 0/3] stable priority-queue Jeff King
` (3 preceding siblings ...)
2014-07-14 10:12 ` [PATCH/RFH 0/3] stable priority-queue Duy Nguyen
@ 2014-07-14 11:02 ` David Kastrup
2014-07-21 9:07 ` Jeff King
4 siblings, 1 reply; 7+ messages in thread
From: David Kastrup @ 2014-07-14 11:02 UTC (permalink / raw)
To: Jeff King; +Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy
Jeff King <peff@peff.net> writes:
> As Junio and I discussed earlier in [1], this series makes the
> prio_queue struct stable with respect to object insertion (which in turn
> means we can use it to replace commit_list in more places).
I don't think that this makes sense in general since it assumes the
appropriate fallback behavior to be FIFO. Depending on the use case, it
might be the other way round, or something else (like topology-based)
entirely.
commit_list may be unsuitable as such for intermingled addition of
unsorted and extraction of sorted elements (the general use case for
priority queues).
I see that struct commit already contains an integer field called
"index", assigned sequentially, which might conceivably be used for
tie-breaking independent from the actual prio_queue use at no extra
cost.
The resulting order will of course be somewhat arbitrary in a different
way, but at least will correspond to the order the commits have been
discovered/generated, so they should still exhibit a more topological
relation than what prio_queue does without further help.
--
David Kastrup
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH/RFH 0/3] stable priority-queue
2014-07-14 11:02 ` David Kastrup
@ 2014-07-21 9:07 ` Jeff King
0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2014-07-21 9:07 UTC (permalink / raw)
To: David Kastrup; +Cc: git, Junio C Hamano, Nguyễn Thái Ngọc Duy
On Mon, Jul 14, 2014 at 01:02:56PM +0200, David Kastrup wrote:
> Jeff King <peff@peff.net> writes:
>
> > As Junio and I discussed earlier in [1], this series makes the
> > prio_queue struct stable with respect to object insertion (which in turn
> > means we can use it to replace commit_list in more places).
>
> I don't think that this makes sense in general since it assumes the
> appropriate fallback behavior to be FIFO. Depending on the use case, it
> might be the other way round, or something else (like topology-based)
> entirely.
Remember that this is just a tie-breaker for the regular comparison
function. If you want to represent some other ordering, you are free to
do the tie-breaking yourself in your comparison function. The one thing
we can easily provide but do not is LIFO ordering for the tie-breaker.
That would be trivial to add as a flag on the prio_queue, but I'd wait
until there is actually a caller who wants it.
Yes, it's a bit hacky to provide it for cases which _don't_ care about
order (or implement their own separate tie-breaker). But the worst case
there is that we are wasting some memory, and I wasn't able to measure a
real slow-down. I think it's a reasonable compromise given the lack of
generics in C.
But if you have a case that is measurably affected, please let me know,
and I can look into implementing it in a type-agnostic way (so that the
embedded insertion counter becomes just another type).
> I see that struct commit already contains an integer field called
> "index", assigned sequentially, which might conceivably be used for
> tie-breaking independent from the actual prio_queue use at no extra
> cost.
I don't think it's a good idea to rely on that index, as it is anything
but stable. It represents whatever order commits happened to be first
touched in the current program run. So:
1. Performing the same operation in-process versus in a sub-process
may produce different results.
2. Arguments to a command may have unexpected effects. E.g.,
specifying "--tags" to "rev-list" will look up the commit at
each tag ref, giving them much lower index numbers than they would
if we reached only through the normal traversal.
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-07-21 9:07 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-14 5:40 [PATCH/RFH 0/3] stable priority-queue Jeff King
2014-07-14 5:42 ` [PATCH 1/3] prio-queue: factor out compare and swap operations Jeff King
2014-07-14 5:51 ` [PATCH 2/3] prio-queue: make output stable with respect to insertion Jeff King
2014-07-14 5:53 ` [PATCH 3/3] paint_down_to_common: use prio_queue Jeff King
2014-07-14 10:12 ` [PATCH/RFH 0/3] stable priority-queue Duy Nguyen
2014-07-14 11:02 ` David Kastrup
2014-07-21 9:07 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).