git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kousik Sanagavarapu <five231003@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>,
	git@vger.kernel.org,
	Christian Couder <christian.couder@gmail.com>
Subject: Re: [PATCH] t6300: values containing ')' are broken in ref formats
Date: Wed, 6 Nov 2024 09:24:08 +0530	[thread overview]
Message-ID: <ZyroYBwtQtgc6NoR@five231003> (raw)
In-Reply-To: <xmqq8qtxqcye.fsf@gitster.g>

On Tue, Nov 05, 2024 at 07:05:13PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > I am tempted to say the solution is to expand that "equals" value, and
> > possibly add some less-arcane version of the character (maybe "%)"?).
> > But it be a break in backwards compatibility if somebody is trying to
> > match literal %-chars in their "if" block.
> 
> If they were trying to write a literal %, wouldn't they be writing
> %% already, not because % followed by a byte without any special
> meaning happens to be passed intact by the implementation, but
> because that is _the_ right thing to do, when % is used as an
> introducer for escape sequences?  So I do agree it would be a change
> that breaks backward compatibility but I do not think we want to
> stay bug to bug compatible with the current behaviour here.  I am
> not sure with the wisdom of %) though.  Wouldn't "%(foo %)" look as
> if %( opens and %) closes a group in our language?
> 
> So I am very much in favor of this "if condition should be expanded
> before comparison" solution.

I had worked on this "if condition should be expanded before comparison"
solution but Christian and I agreed that it would be better to open up
discussion incase there would be other possible and better solutions
which would also be viable in the long term.

This solution relies on how we parse out literals in ref-filter, so '%%'
would work as intended in the comparision string - ie if we want to
compare against "some%ref", we would do "%(if:equals=some%%ref)...".

Also it is obscure that someone will use this in practice as I myself
had discovered this when working on a corner case of some other
implementation related to parsing but way lower in the call-chain of
ref-filter ;) but here goes

(this applies on top of the current patch)

------------------------ >8 ------------------------
Subject: [PATCH] ref-filter: parse parentheses correctly in %(if) atoms

Having a ')' in "<string>" in ":equals=<string>" or
":notequals=<string>" wouldn't parse correctly in ref-filter as
documented in the previous commit.

One way to fix this is refactoring the way in which we parse our format
string.  Although this would mean we would have to do a huge refactoring
as this step happens very high up in the call chain.

Therefore, support including parenthesis characters in "<string>" by
instead giving their hexcode equivalents - as a for-now hack.

Do this by further abstracting "append_literal()" to "parse_literal()"
where the output is no longer stored into a ref formatting stack's
strbuf but a standard standalone strbuf.  append_literal would then
hence wrap appropriately around "parse_literal()".

Also introduce "convert_hexcode()" which also wraps around
"parse_literal()" and must be used to convert hexcode in a given string
and be silent when such a string doesn't contain hexcode or doesn't
exist (ie is NULL).

Using "convert_hexcode()" would mean that we now have an alloced string -
hence free() it once we are done with it to prevent any memory leaks.

Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com>
---
 Documentation/git-for-each-ref.txt |  4 ++
 ref-filter.c                       | 76 +++++++++++++++++++++++-------
 t/t6300-for-each-ref.sh            |  6 ++-
 3 files changed, 66 insertions(+), 20 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt
index d3764401a2..ce12400040 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -221,6 +221,10 @@ if::
 	the value between the %(if:...) and %(then) atoms with the
 	given string.
 
+	Additionally, if `<string>` must contain parenthesis, then these
+	parentheses are spelled out as hexcode.  For e.g., `1)someref`
+	would need to be `1%29someref`.
+
 symref::
 	The ref which the given symbolic ref refers to. If not a
 	symbolic ref, nothing is printed. Respects the `:short`,
diff --git a/ref-filter.c b/ref-filter.c
index 84c6036107..ebdb2daeb7 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1234,13 +1234,64 @@ static void if_then_else_handler(struct ref_formatting_stack **stack)
 	*stack = cur;
 }
 
+/*
+ * Parse out literals in the string pointed to by "cp" and store them in
+ * a strbuf - this would go on until we hit NUL or "ep".
+ *
+ * While at it, if they're of the form "%xx", where xx represents the
+ * hexcode of some character, then convert them into their equivalent
+ * characters.
+ */
+static void parse_literal(const char *cp, const char *ep,
+			  struct strbuf *s)
+{
+	while (*cp && (!ep || cp < ep)) {
+		if (*cp == '%') {
+			if (cp[1] == '%')
+				cp++;
+			else {
+				int ch = hex2chr(cp + 1);
+				if (0 <= ch) {
+					strbuf_addch(s, ch);
+					cp += 3;
+					continue;
+				}
+			}
+		}
+		strbuf_addch(s, *cp);
+		cp++;
+	}
+}
+
+/*
+ * Convert the string, pointed to by "cp", which might or might not
+ * contain hexcode (in the format of "%xx" where xx is the hexcode) to
+ * its character-equivalent string and return it.
+ *
+ * If the string does not contain any hexcode - then it is returned as
+ * is.
+ */
+static const char *convert_hexcode(const char *cp)
+{
+	struct strbuf s = STRBUF_INIT;
+
+	if (!cp)
+		return NULL;
+	/*
+	 * This has the effect of an in-place translation but
+	 * implementation-wise it is not.
+	 */
+	parse_literal(cp, NULL, &s);
+	return strbuf_detach(&s, NULL);
+}
+
 static int if_atom_handler(struct atom_value *atomv, struct ref_formatting_state *state,
 			   struct strbuf *err UNUSED)
 {
 	struct ref_formatting_stack *new_stack;
 	struct if_then_else *if_then_else = xcalloc(1, sizeof(*if_then_else));
 
-	if_then_else->str = atomv->atom->u.if_then_else.str;
+	if_then_else->str = convert_hexcode(atomv->atom->u.if_then_else.str);
 	if_then_else->cmp_status = atomv->atom->u.if_then_else.cmp_status;
 
 	push_stack_element(&state->stack);
@@ -1296,6 +1347,9 @@ static int then_atom_handler(struct atom_value *atomv UNUSED,
 			if_then_else->condition_satisfied = 1;
 	} else if (cur->output.len && !is_empty(&cur->output))
 		if_then_else->condition_satisfied = 1;
+
+	if (if_then_else->str)
+		free((char *)if_then_else->str);
 	strbuf_reset(&cur->output);
 	return 0;
 }
@@ -3425,26 +3479,12 @@ void ref_array_sort(struct ref_sorting *sorting, struct ref_array *array)
 		QSORT_S(array->items, array->nr, compare_refs, sorting);
 }
 
-static void append_literal(const char *cp, const char *ep, struct ref_formatting_state *state)
+static void append_literal(const char *cp, const char *ep,
+			   struct ref_formatting_state *state)
 {
 	struct strbuf *s = &state->stack->output;
 
-	while (*cp && (!ep || cp < ep)) {
-		if (*cp == '%') {
-			if (cp[1] == '%')
-				cp++;
-			else {
-				int ch = hex2chr(cp + 1);
-				if (0 <= ch) {
-					strbuf_addch(s, ch);
-					cp += 3;
-					continue;
-				}
-			}
-		}
-		strbuf_addch(s, *cp);
-		cp++;
-	}
+	parse_literal(cp, ep, s);
 }
 
 int format_ref_array_item(struct ref_array_item *info,
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index ce5c607193..c9383d23a4 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -2141,7 +2141,7 @@ test_expect_success GPG 'show lack of signature with custom format' '
 	test_cmp expect actual
 '
 
-test_expect_failure 'format having values containing ) parse correctly' '
+test_expect_success 'format having values containing ) parse correctly' '
 	git branch "1)feat" &&
 	cat >expect <<-\EOF &&
 	refs/heads/1)feat
@@ -2151,7 +2151,9 @@ test_expect_failure 'format having values containing ) parse correctly' '
 	not equals
 	not equals
 	EOF
-	git for-each-ref --format="%(if:equals=1)feat)%(refname:short)%(then)%(refname)%(else)not equals%(end)" \
+
+	# 29 is the hexcode of )
+	git for-each-ref --format="%(if:equals=1%29feat)%(refname:short)%(then)%(refname)%(else)not equals%(end)" \
 		refs/heads/ >actual &&
 	test_cmp expect actual
 '
-- 
2.47.0.230.g0cf584699a


  reply	other threads:[~2024-11-06  3:54 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-05 18:41 [PATCH] t6300: values containing ')' are broken in ref formats Kousik Sanagavarapu
2024-11-06  1:18 ` Junio C Hamano
2024-11-06  2:25   ` Jeff King
2024-11-06  3:05     ` Junio C Hamano
2024-11-06  3:54       ` Kousik Sanagavarapu [this message]
2024-11-06 18:55         ` Jeff King
2024-11-07  2:34           ` Kousik Sanagavarapu
2024-11-06 18:51       ` Jeff King
2024-11-07  2:29         ` Kousik Sanagavarapu
2024-11-07  2:52         ` Junio C Hamano
2024-11-08  4:11           ` Kousik Sanagavarapu
2024-11-08 17:16             ` Jeff King
2024-11-08 18:12               ` Kousik Sanagavarapu
2024-11-06  2:40   ` Kousik Sanagavarapu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZyroYBwtQtgc6NoR@five231003 \
    --to=five231003@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).