* [GSOC][RFC PATCH 0/2] midx: implement progress reporting for QSORT operation @ 2025-02-10 7:46 Ayush Chandekar 2025-02-10 7:46 ` [PATCH 1/2] midx: show progress during " Ayush Chandekar 2025-02-10 7:46 ` [PATCH 2/2] t5319: add test for MIDX QSORT progress reporting Ayush Chandekar 0 siblings, 2 replies; 6+ messages in thread From: Ayush Chandekar @ 2025-02-10 7:46 UTC (permalink / raw) To: git Hi, This small patch series adds progress reporting during the QSORT operation in multi-pack-index verification. This was a TODO in the code which I decided to pickup because I found it interesting. Feedback is appreciated! Thanks, Ayush Ayush Chandekar (2): midx: show progress during QSORT operation t5319: add test for MIDX QSORT progress reporting midx.c | 43 +++++++++++++++++++++++++------------ t/t5319-multi-pack-index.sh | 14 ++++++++++++ 2 files changed, 43 insertions(+), 14 deletions(-) -- 2.48.GIT ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] midx: show progress during QSORT operation 2025-02-10 7:46 [GSOC][RFC PATCH 0/2] midx: implement progress reporting for QSORT operation Ayush Chandekar @ 2025-02-10 7:46 ` Ayush Chandekar 2025-02-10 16:55 ` Junio C Hamano 2025-02-10 7:46 ` [PATCH 2/2] t5319: add test for MIDX QSORT progress reporting Ayush Chandekar 1 sibling, 1 reply; 6+ messages in thread From: Ayush Chandekar @ 2025-02-10 7:46 UTC (permalink / raw) To: git Add progress reporting during the QSORT operation in multi-pack-index verification. This helps users track the progress of large sorting operations. In previous versions, the progress would jump directly from 0% to 100% without any intermediate updates. Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com> --- midx.c | 43 +++++++++++++++++++++++++++++-------------- 1 file changed, 29 insertions(+), 14 deletions(-) diff --git a/midx.c b/midx.c index d91088efb8..69937f5ca8 100644 --- a/midx.c +++ b/midx.c @@ -14,6 +14,7 @@ #include "pack-bitmap.h" #include "pack-revindex.h" + int midx_checksum_valid(struct multi_pack_index *m); void clear_midx_files_ext(const char *object_dir, const char *ext, const char *keep_hash); @@ -853,32 +854,43 @@ static void midx_report(const char *fmt, ...) va_end(ap); } +/* + * Limit calls to display_progress() for performance reasons. + * The interval here was arbitrarily chosen. + */ +#define SPARSE_PROGRESS_INTERVAL (1 << 12) +#define midx_display_sparse_progress(progress, n) \ + do { \ + uint64_t _n = (n); \ + if ((_n & (SPARSE_PROGRESS_INTERVAL - 1)) == 0) \ + display_progress(progress, _n); \ + } while (0) + struct pair_pos_vs_id { uint32_t pos; uint32_t pack_int_id; }; +static struct progress *sort_progress; +static uint64_t last_max_pos; + static int compare_pair_pos_vs_id(const void *_a, const void *_b) { struct pair_pos_vs_id *a = (struct pair_pos_vs_id *)_a; struct pair_pos_vs_id *b = (struct pair_pos_vs_id *)_b; + + if (sort_progress) { + uint64_t max_pos = (a->pos > b->pos) ? a->pos : b->pos; + if (max_pos > last_max_pos) { + last_max_pos = max_pos; + midx_display_sparse_progress(sort_progress, last_max_pos); + } + } return b->pack_int_id - a->pack_int_id; } -/* - * Limit calls to display_progress() for performance reasons. - * The interval here was arbitrarily chosen. - */ -#define SPARSE_PROGRESS_INTERVAL (1 << 12) -#define midx_display_sparse_progress(progress, n) \ - do { \ - uint64_t _n = (n); \ - if ((_n & (SPARSE_PROGRESS_INTERVAL - 1)) == 0) \ - display_progress(progress, _n); \ - } while (0) - int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags) { struct pair_pos_vs_id *pairs = NULL; @@ -960,12 +972,15 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag pairs[i].pack_int_id = nth_midxed_pack_int_id(m, i); } - if (flags & MIDX_PROGRESS) + if (flags & MIDX_PROGRESS) { progress = start_sparse_progress(r, _("Sorting objects by packfile"), m->num_objects); - display_progress(progress, 0); /* TODO: Measure QSORT() progress */ + last_max_pos = 0; + sort_progress = progress; + } QSORT(pairs, m->num_objects, compare_pair_pos_vs_id); + sort_progress = NULL; stop_progress(&progress); if (flags & MIDX_PROGRESS) -- 2.48.GIT ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] midx: show progress during QSORT operation 2025-02-10 7:46 ` [PATCH 1/2] midx: show progress during " Ayush Chandekar @ 2025-02-10 16:55 ` Junio C Hamano 2025-02-11 12:23 ` Ayush Chandekar 0 siblings, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2025-02-10 16:55 UTC (permalink / raw) To: Ayush Chandekar; +Cc: git Ayush Chandekar <ayu.chandekar@gmail.com> writes: > Add progress reporting during the QSORT operation in multi-pack-index > verification. This helps users track the progress of large sorting > operations. Hmph. If the implementation is correct (which I cannot tell), this needs to explain why it is a bit better than saying nothing. > +/* > + * Limit calls to display_progress() for performance reasons. > + * The interval here was arbitrarily chosen. > + */ > +#define SPARSE_PROGRESS_INTERVAL (1 << 12) > +#define midx_display_sparse_progress(progress, n) \ > + do { \ > + uint64_t _n = (n); \ > + if ((_n & (SPARSE_PROGRESS_INTERVAL - 1)) == 0) \ > + display_progress(progress, _n); \ > + } while (0) > + > struct pair_pos_vs_id > { > uint32_t pos; > uint32_t pack_int_id; > }; > > +static struct progress *sort_progress; > +static uint64_t last_max_pos; > + > static int compare_pair_pos_vs_id(const void *_a, const void *_b) > { > struct pair_pos_vs_id *a = (struct pair_pos_vs_id *)_a; > struct pair_pos_vs_id *b = (struct pair_pos_vs_id *)_b; This is a compar callback function used by the sorting machinery, which is called QSORT but system-provided qsort() implementations are not necessarily quick-sort [*]. > + > + if (sort_progress) { > + uint64_t max_pos = (a->pos > b->pos) ? a->pos : b->pos; > + if (max_pos > last_max_pos) { > + last_max_pos = max_pos; > + midx_display_sparse_progress(sort_progress, last_max_pos); > + } > + } So I do not quite understand the assumption this implementation of the progress meter makes. The assumption seems to be that the element in the array with the highest index MUST not be summoned for comparison until the very end of the sorting process, but what guarantees that? Even if we assume that the qsort() implementation supplied by the system implements the divide and conquer plain vanilla quicksort, it may divide the array into half, and then sort the top half first before it sorts the bottom half, and doing so recursively will give you the comparison between elements near the end of the array with the highest index fairly early in the process, no? And the standard does not even specify what algorithm should internally be used to implement qsort(3), which our QSORT() macro eventually calls, so making any assumption on the order the elements of the array is fed to the compar callback function sounds doubly a frigile deal. Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] midx: show progress during QSORT operation 2025-02-10 16:55 ` Junio C Hamano @ 2025-02-11 12:23 ` Ayush Chandekar 2025-02-11 16:29 ` Junio C Hamano 0 siblings, 1 reply; 6+ messages in thread From: Ayush Chandekar @ 2025-02-11 12:23 UTC (permalink / raw) To: Junio C Hamano; +Cc: git > Hmph. If the implementation is correct (which I cannot tell), this > needs to explain why it is a bit better than saying nothing. While going through the code, I noticed the TODO comment: "Measure QSORT() progress", and I thought it might be interesting to explore. For big codebases, being stuck at zero would make it feel like there's no progress happening and that is why putting a progress might be better. > > static int compare_pair_pos_vs_id(const void *_a, const void *_b) > > { > > struct pair_pos_vs_id *a = (struct pair_pos_vs_id *)_a; > > struct pair_pos_vs_id *b = (struct pair_pos_vs_id *)_b; > > This is a compar callback function used by the sorting machinery, > which is called QSORT but system-provided qsort() implementations > are not necessarily quick-sort [*]. Oh. Initially, I was unsure how to approach it, but I believed that tracking the highest pos value seen in comparisons could give a rough estimate of progress. However, as you pointed out, this assumes that qsort() processes elements in a structured way where the highest-indexed element isn't compared until later in the sort. I now see that this isn't a safe assumption Since there's no guarantee that progress would be reflected meaningfully, this approach isn't good. Let me know if you have any suggestions/comments:) Thanks, Ayush ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] midx: show progress during QSORT operation 2025-02-11 12:23 ` Ayush Chandekar @ 2025-02-11 16:29 ` Junio C Hamano 0 siblings, 0 replies; 6+ messages in thread From: Junio C Hamano @ 2025-02-11 16:29 UTC (permalink / raw) To: Ayush Chandekar; +Cc: git Ayush Chandekar <ayu.chandekar@gmail.com> writes: >> Hmph. If the implementation is correct (which I cannot tell), this >> needs to explain why it is a bit better than saying nothing. > While going through the code, I noticed the TODO comment: "Measure > QSORT() progress", and I thought it might be interesting to explore. > For big codebases, being stuck at zero would make it feel like there's > no progress happening and that is why putting a progress might be > better. That much I already know---otherwise we would not have that comment there ;-) What I meant was that the proposed log message did not give readers any hint how the implementation given in the patch is correct, the assumption it makes on the behaviour of sort(3), etc. ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/2] t5319: add test for MIDX QSORT progress reporting 2025-02-10 7:46 [GSOC][RFC PATCH 0/2] midx: implement progress reporting for QSORT operation Ayush Chandekar 2025-02-10 7:46 ` [PATCH 1/2] midx: show progress during " Ayush Chandekar @ 2025-02-10 7:46 ` Ayush Chandekar 1 sibling, 0 replies; 6+ messages in thread From: Ayush Chandekar @ 2025-02-10 7:46 UTC (permalink / raw) To: git Add a test to verify that the multi-pack-index verify command shows progress during the QSORT operation. Create 100 test objects, repack them, and verify the progress reaches 100% during sorting Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com> --- This test makes sure the progress reaches 100%, but I couldn't find a way which could verify that the progress went from 0% to 100% with intermediates. I would like if someone can suggest a method for this. Thanks, Ayush t/t5319-multi-pack-index.sh | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 0f215ad2e8..d368e22e3a 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -658,6 +658,20 @@ test_expect_success 'verify incorrect 64-bit offset' ' "incorrect object offset" ' +test_expect_success 'verify shows QSORT progress' ' + # Create test objects + for i in $(test_seq 1 100) + do + echo "content $i" | \ + git hash-object -w --stdin \ + || return 1 + done && + git repack -ad && + git multi-pack-index write && + GIT_PROGRESS_DELAY=0 git multi-pack-index verify --progress 2>actual && + grep "Sorting objects by packfile: *100%" actual +' + test_expect_success 'setup expire tests' ' mkdir dup && ( -- 2.48.GIT ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-02-11 16:29 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-02-10 7:46 [GSOC][RFC PATCH 0/2] midx: implement progress reporting for QSORT operation Ayush Chandekar 2025-02-10 7:46 ` [PATCH 1/2] midx: show progress during " Ayush Chandekar 2025-02-10 16:55 ` Junio C Hamano 2025-02-11 12:23 ` Ayush Chandekar 2025-02-11 16:29 ` Junio C Hamano 2025-02-10 7:46 ` [PATCH 2/2] t5319: add test for MIDX QSORT progress reporting Ayush Chandekar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).