git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Kousik Sanagavarapu <five231003@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] ref-filter: sort numerically when ":size" is used
Date: Fri, 01 Sep 2023 09:43:07 -0700	[thread overview]
Message-ID: <xmqqa5u5rgis.fsf@gitster.g> (raw)
In-Reply-To: <20230901142624.12063-1-five231003@gmail.com> (Kousik Sanagavarapu's message of "Fri, 1 Sep 2023 19:54:54 +0530")

Kousik Sanagavarapu <five231003@gmail.com> writes:

> Atoms like "raw" and "contents" have a ":size" option which can be used
> to know the size of the data. Since these atoms have the cmp_type
> FIELD_STR, they are sorted alphabetically from 'a' to 'z' and '0' to
> '9'. Meaning, even when the ":size" option is used and what we
> ultimatlely have is numbers, we still sort alphabetically.

There are other cmp_types already defined, like ULONG and TIME.  How
are they used and affect the comparison?  Naively, :size sounds like
a good candidate to compare as ULONG, as it cannot be negative even
though 0 is a valid size.

I understand and agree with the motivation, but the implementation
looks puzzling.

> diff --git a/ref-filter.c b/ref-filter.c
> index 1bfaf20fbf..5d7bea5f23 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -932,7 +932,13 @@ struct atom_value {
>  	ssize_t s_size;
>  	int (*handler)(struct atom_value *atomv, struct ref_formatting_state *state,
>  		       struct strbuf *err);
> -	uintmax_t value; /* used for sorting when not FIELD_STR */
> +
> +	/*
> +	 * Used for sorting when not FIELD_STR or when FIELD_STR but the
> +	 * sort should be numeric and not alphabetic.
> +	 */
> +	uintmax_t value;

This does not explain why we cannot make <anything>:size FIELD_ULONG
for <anything> that is of FIELD_STR type, though.  IOW, why such a
strange "when FIELD_STR but the sort should be numeric" is needed?
If you have a <size>, shouldn't it always be numeric?

IOW, when you notice that you need to set, say, u.contents.option of
an atom to C_LENGTH, shouldn't you set cmp_type of the atom to
FIELD_ULONG, somewhere in contents_atom_parser() and friends, and
everything should naturally follow, no?

It seems that support for other cmp_types are incomplete in the
current code.  There are FIELD_ULONG and FIELD_TIME defined, but
they do not appear to be used in any way, so the cmp_ref_sorting()
would need to be updated to make it actually pay attention to the
cmp_type and perform numeric comparison.

> @@ -1883,8 +1890,10 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, struct exp
>  			v->s = strbuf_detach(&sb, NULL);
>  		} else if (atom->u.contents.option == C_BODY_DEP)
>  			v->s = xmemdupz(bodypos, bodylen);
> -		else if (atom->u.contents.option == C_LENGTH)
> -			v->s = xstrfmt("%"PRIuMAX, (uintmax_t)strlen(subpos));
> +		else if (atom->u.contents.option == C_LENGTH) {
> +			v->value = (uintmax_t)strlen(subpos);
> +			v->s = xstrfmt("%"PRIuMAX, v->value);
> +		}

We should take a note that all of these v->value are *per* *item*
that will be sorted.

> @@ -2986,7 +2996,7 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
>  		cmp_detached_head = 1;
>  	} else if (s->sort_flags & REF_SORTING_VERSION) {
>  		cmp = versioncmp(va->s, vb->s);
> -	} else if (cmp_type == FIELD_STR) {
> +	} else if (cmp_type == FIELD_STR && !va->value && !vb->value) {

Two refs may point at an empty object with zero length, i.e.  for
them, !v->value is true, and another ref may point at a non-empty
object.  The two empty refs are compared with an algorithm different
from the algorithm used to compare the empty ref and the non-empty
ref.  Isn't this broken as a comparison function to be given to
QSORT(), which must be transitive (e.g. if A < B and B < C, then it
should be guaranteed that A < C and you do not have to compare A and
C)?

IOW, the choice of the comparison algorithm should not depend on an
attribute (like value or s) that is specific to the item being
compared.  Things like cmp_type that is defined at the used_atom
lefvel to make the sorting function stable, I would think.

  reply	other threads:[~2023-09-01 16:43 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-01 14:24 [PATCH] ref-filter: sort numerically when ":size" is used Kousik Sanagavarapu
2023-09-01 16:43 ` Junio C Hamano [this message]
2023-09-01 17:45   ` Jeff King
2023-09-01 17:59     ` Junio C Hamano
2023-09-01 18:32       ` Jeff King
2023-09-01 18:59         ` Kousik Sanagavarapu
2023-09-01 19:16           ` Jeff King
2023-09-01 20:21             ` Junio C Hamano
2023-09-01 20:51               ` Jeff King
2023-09-01 20:04           ` Junio C Hamano
2023-09-01 20:40             ` Jeff King
2023-09-02  9:00 ` [PATCH v2] " Kousik Sanagavarapu
2023-09-02  9:11   ` Kousik Sanagavarapu
2023-09-02 22:19   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqa5u5rgis.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=five231003@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).