From: Junio C Hamano <gitster@pobox.com>
To: Kousik Sanagavarapu <five231003@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] ref-filter: sort numerically when ":size" is used
Date: Fri, 01 Sep 2023 09:43:07 -0700 [thread overview]
Message-ID: <xmqqa5u5rgis.fsf@gitster.g> (raw)
In-Reply-To: <20230901142624.12063-1-five231003@gmail.com> (Kousik Sanagavarapu's message of "Fri, 1 Sep 2023 19:54:54 +0530")
Kousik Sanagavarapu <five231003@gmail.com> writes:
> Atoms like "raw" and "contents" have a ":size" option which can be used
> to know the size of the data. Since these atoms have the cmp_type
> FIELD_STR, they are sorted alphabetically from 'a' to 'z' and '0' to
> '9'. Meaning, even when the ":size" option is used and what we
> ultimatlely have is numbers, we still sort alphabetically.
There are other cmp_types already defined, like ULONG and TIME. How
are they used and affect the comparison? Naively, :size sounds like
a good candidate to compare as ULONG, as it cannot be negative even
though 0 is a valid size.
I understand and agree with the motivation, but the implementation
looks puzzling.
> diff --git a/ref-filter.c b/ref-filter.c
> index 1bfaf20fbf..5d7bea5f23 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -932,7 +932,13 @@ struct atom_value {
> ssize_t s_size;
> int (*handler)(struct atom_value *atomv, struct ref_formatting_state *state,
> struct strbuf *err);
> - uintmax_t value; /* used for sorting when not FIELD_STR */
> +
> + /*
> + * Used for sorting when not FIELD_STR or when FIELD_STR but the
> + * sort should be numeric and not alphabetic.
> + */
> + uintmax_t value;
This does not explain why we cannot make <anything>:size FIELD_ULONG
for <anything> that is of FIELD_STR type, though. IOW, why such a
strange "when FIELD_STR but the sort should be numeric" is needed?
If you have a <size>, shouldn't it always be numeric?
IOW, when you notice that you need to set, say, u.contents.option of
an atom to C_LENGTH, shouldn't you set cmp_type of the atom to
FIELD_ULONG, somewhere in contents_atom_parser() and friends, and
everything should naturally follow, no?
It seems that support for other cmp_types are incomplete in the
current code. There are FIELD_ULONG and FIELD_TIME defined, but
they do not appear to be used in any way, so the cmp_ref_sorting()
would need to be updated to make it actually pay attention to the
cmp_type and perform numeric comparison.
> @@ -1883,8 +1890,10 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, struct exp
> v->s = strbuf_detach(&sb, NULL);
> } else if (atom->u.contents.option == C_BODY_DEP)
> v->s = xmemdupz(bodypos, bodylen);
> - else if (atom->u.contents.option == C_LENGTH)
> - v->s = xstrfmt("%"PRIuMAX, (uintmax_t)strlen(subpos));
> + else if (atom->u.contents.option == C_LENGTH) {
> + v->value = (uintmax_t)strlen(subpos);
> + v->s = xstrfmt("%"PRIuMAX, v->value);
> + }
We should take a note that all of these v->value are *per* *item*
that will be sorted.
> @@ -2986,7 +2996,7 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
> cmp_detached_head = 1;
> } else if (s->sort_flags & REF_SORTING_VERSION) {
> cmp = versioncmp(va->s, vb->s);
> - } else if (cmp_type == FIELD_STR) {
> + } else if (cmp_type == FIELD_STR && !va->value && !vb->value) {
Two refs may point at an empty object with zero length, i.e. for
them, !v->value is true, and another ref may point at a non-empty
object. The two empty refs are compared with an algorithm different
from the algorithm used to compare the empty ref and the non-empty
ref. Isn't this broken as a comparison function to be given to
QSORT(), which must be transitive (e.g. if A < B and B < C, then it
should be guaranteed that A < C and you do not have to compare A and
C)?
IOW, the choice of the comparison algorithm should not depend on an
attribute (like value or s) that is specific to the item being
compared. Things like cmp_type that is defined at the used_atom
lefvel to make the sorting function stable, I would think.
next prev parent reply other threads:[~2023-09-01 16:43 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-01 14:24 [PATCH] ref-filter: sort numerically when ":size" is used Kousik Sanagavarapu
2023-09-01 16:43 ` Junio C Hamano [this message]
2023-09-01 17:45 ` Jeff King
2023-09-01 17:59 ` Junio C Hamano
2023-09-01 18:32 ` Jeff King
2023-09-01 18:59 ` Kousik Sanagavarapu
2023-09-01 19:16 ` Jeff King
2023-09-01 20:21 ` Junio C Hamano
2023-09-01 20:51 ` Jeff King
2023-09-01 20:04 ` Junio C Hamano
2023-09-01 20:40 ` Jeff King
2023-09-02 9:00 ` [PATCH v2] " Kousik Sanagavarapu
2023-09-02 9:11 ` Kousik Sanagavarapu
2023-09-02 22:19 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqa5u5rgis.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=five231003@gmail.com \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).