From: Junio C Hamano <gitster@pobox.com>
To: Ayush Chandekar <ayu.chandekar@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/2] midx: show progress during QSORT operation
Date: Mon, 10 Feb 2025 08:55:50 -0800 [thread overview]
Message-ID: <xmqqzfitbuy1.fsf@gitster.g> (raw)
In-Reply-To: <20250210074623.136599-2-ayu.chandekar@gmail.com> (Ayush Chandekar's message of "Mon, 10 Feb 2025 13:16:22 +0530")
Ayush Chandekar <ayu.chandekar@gmail.com> writes:
> Add progress reporting during the QSORT operation in multi-pack-index
> verification. This helps users track the progress of large sorting
> operations.
Hmph. If the implementation is correct (which I cannot tell), this
needs to explain why it is a bit better than saying nothing.
> +/*
> + * Limit calls to display_progress() for performance reasons.
> + * The interval here was arbitrarily chosen.
> + */
> +#define SPARSE_PROGRESS_INTERVAL (1 << 12)
> +#define midx_display_sparse_progress(progress, n) \
> + do { \
> + uint64_t _n = (n); \
> + if ((_n & (SPARSE_PROGRESS_INTERVAL - 1)) == 0) \
> + display_progress(progress, _n); \
> + } while (0)
> +
> struct pair_pos_vs_id
> {
> uint32_t pos;
> uint32_t pack_int_id;
> };
>
> +static struct progress *sort_progress;
> +static uint64_t last_max_pos;
> +
> static int compare_pair_pos_vs_id(const void *_a, const void *_b)
> {
> struct pair_pos_vs_id *a = (struct pair_pos_vs_id *)_a;
> struct pair_pos_vs_id *b = (struct pair_pos_vs_id *)_b;
This is a compar callback function used by the sorting machinery,
which is called QSORT but system-provided qsort() implementations
are not necessarily quick-sort [*].
> +
> + if (sort_progress) {
> + uint64_t max_pos = (a->pos > b->pos) ? a->pos : b->pos;
> + if (max_pos > last_max_pos) {
> + last_max_pos = max_pos;
> + midx_display_sparse_progress(sort_progress, last_max_pos);
> + }
> + }
So I do not quite understand the assumption this implementation of
the progress meter makes.
The assumption seems to be that the element in the array with the
highest index MUST not be summoned for comparison until the very end
of the sorting process, but what guarantees that? Even if we assume
that the qsort() implementation supplied by the system implements
the divide and conquer plain vanilla quicksort, it may divide the
array into half, and then sort the top half first before it sorts
the bottom half, and doing so recursively will give you the
comparison between elements near the end of the array with the
highest index fairly early in the process, no?
And the standard does not even specify what algorithm should
internally be used to implement qsort(3), which our QSORT() macro
eventually calls, so making any assumption on the order the elements
of the array is fed to the compar callback function sounds doubly a
frigile deal.
Thanks.
next prev parent reply other threads:[~2025-02-10 16:55 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-10 7:46 [GSOC][RFC PATCH 0/2] midx: implement progress reporting for QSORT operation Ayush Chandekar
2025-02-10 7:46 ` [PATCH 1/2] midx: show progress during " Ayush Chandekar
2025-02-10 16:55 ` Junio C Hamano [this message]
2025-02-11 12:23 ` Ayush Chandekar
2025-02-11 16:29 ` Junio C Hamano
2025-02-10 7:46 ` [PATCH 2/2] t5319: add test for MIDX QSORT progress reporting Ayush Chandekar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqzfitbuy1.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=ayu.chandekar@gmail.com \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).