All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Kousik Sanagavarapu <five231003@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] ref-filter: sort numerically when ":size" is used
Date: Fri, 01 Sep 2023 09:43:07 -0700	[thread overview]
Message-ID: <xmqqa5u5rgis.fsf@gitster.g> (raw)
In-Reply-To: <20230901142624.12063-1-five231003@gmail.com> (Kousik Sanagavarapu's message of "Fri, 1 Sep 2023 19:54:54 +0530")

Kousik Sanagavarapu <five231003@gmail.com> writes:

> Atoms like "raw" and "contents" have a ":size" option which can be used
> to know the size of the data. Since these atoms have the cmp_type
> FIELD_STR, they are sorted alphabetically from 'a' to 'z' and '0' to
> '9'. Meaning, even when the ":size" option is used and what we
> ultimatlely have is numbers, we still sort alphabetically.

There are other cmp_types already defined, like ULONG and TIME.  How
are they used and affect the comparison?  Naively, :size sounds like
a good candidate to compare as ULONG, as it cannot be negative even
though 0 is a valid size.

I understand and agree with the motivation, but the implementation
looks puzzling.

> diff --git a/ref-filter.c b/ref-filter.c
> index 1bfaf20fbf..5d7bea5f23 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -932,7 +932,13 @@ struct atom_value {
>  	ssize_t s_size;
>  	int (*handler)(struct atom_value *atomv, struct ref_formatting_state *state,
>  		       struct strbuf *err);
> -	uintmax_t value; /* used for sorting when not FIELD_STR */
> +
> +	/*
> +	 * Used for sorting when not FIELD_STR or when FIELD_STR but the
> +	 * sort should be numeric and not alphabetic.
> +	 */
> +	uintmax_t value;

This does not explain why we cannot make <anything>:size FIELD_ULONG
for <anything> that is of FIELD_STR type, though.  IOW, why such a
strange "when FIELD_STR but the sort should be numeric" is needed?
If you have a <size>, shouldn't it always be numeric?

IOW, when you notice that you need to set, say, u.contents.option of
an atom to C_LENGTH, shouldn't you set cmp_type of the atom to
FIELD_ULONG, somewhere in contents_atom_parser() and friends, and
everything should naturally follow, no?

It seems that support for other cmp_types are incomplete in the
current code.  There are FIELD_ULONG and FIELD_TIME defined, but
they do not appear to be used in any way, so the cmp_ref_sorting()
would need to be updated to make it actually pay attention to the
cmp_type and perform numeric comparison.

> @@ -1883,8 +1890,10 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, struct exp
>  			v->s = strbuf_detach(&sb, NULL);
>  		} else if (atom->u.contents.option == C_BODY_DEP)
>  			v->s = xmemdupz(bodypos, bodylen);
> -		else if (atom->u.contents.option == C_LENGTH)
> -			v->s = xstrfmt("%"PRIuMAX, (uintmax_t)strlen(subpos));
> +		else if (atom->u.contents.option == C_LENGTH) {
> +			v->value = (uintmax_t)strlen(subpos);
> +			v->s = xstrfmt("%"PRIuMAX, v->value);
> +		}

We should take a note that all of these v->value are *per* *item*
that will be sorted.

> @@ -2986,7 +2996,7 @@ static int cmp_ref_sorting(struct ref_sorting *s, struct ref_array_item *a, stru
>  		cmp_detached_head = 1;
>  	} else if (s->sort_flags & REF_SORTING_VERSION) {
>  		cmp = versioncmp(va->s, vb->s);
> -	} else if (cmp_type == FIELD_STR) {
> +	} else if (cmp_type == FIELD_STR && !va->value && !vb->value) {

Two refs may point at an empty object with zero length, i.e.  for
them, !v->value is true, and another ref may point at a non-empty
object.  The two empty refs are compared with an algorithm different
from the algorithm used to compare the empty ref and the non-empty
ref.  Isn't this broken as a comparison function to be given to
QSORT(), which must be transitive (e.g. if A < B and B < C, then it
should be guaranteed that A < C and you do not have to compare A and
C)?

IOW, the choice of the comparison algorithm should not depend on an
attribute (like value or s) that is specific to the item being
compared.  Things like cmp_type that is defined at the used_atom
lefvel to make the sorting function stable, I would think.

  reply	other threads:[~2023-09-01 16:43 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-01 14:24 [PATCH] ref-filter: sort numerically when ":size" is used Kousik Sanagavarapu
2023-09-01 16:43 ` Junio C Hamano [this message]
2023-09-01 17:45   ` Jeff King
2023-09-01 17:59     ` Junio C Hamano
2023-09-01 18:32       ` Jeff King
2023-09-01 18:59         ` Kousik Sanagavarapu
2023-09-01 19:16           ` Jeff King
2023-09-01 20:21             ` Junio C Hamano
2023-09-01 20:51               ` Jeff King
2023-09-01 20:04           ` Junio C Hamano
2023-09-01 20:40             ` Jeff King
2023-09-02  9:00 ` [PATCH v2] " Kousik Sanagavarapu
2023-09-02  9:11   ` Kousik Sanagavarapu
2023-09-02 22:19   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqa5u5rgis.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=five231003@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.