Git development

Git development
 help / color / mirror / Atom feed

* [ANNOUNCE] Git Rev News edition 106
From: Christian Couder @ 2024-01-01 20:53 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jakub Narebski, Markus Jansen,
	Štěpán Němec, Kaartic Sivaraam, Taylor Blau,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason, lwn,
	Josh Soref, Eric Sunshine, Elijah Newren, VonC, Scott Chacon

Hi everyone,

The 106th edition of Git Rev News is now published:

  https://git.github.io/rev_news/2023/12/31/edition-106/

Thanks a lot to VonC, Josh Soref and Štěpán Němec who helped this month!

Enjoy,
Christian, Jakub, Markus and Kaartic.

PS: An issue for the next edition is already opened and contributions
are welcome:

  https://github.com/git/git.github.io/issues/679

^ permalink raw reply

* Re: [RFC PATCH] grep: default to posix digits with -P
From: Eric Sunshine @ 2024-01-01 18:07 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón; +Cc: git
In-Reply-To: <20240101150336.89098-1-carenas@gmail.com>

On Mon, Jan 1, 2024 at 10:04 AM Carlo Marcelo Arenas Belón
<carenas@gmail.com> wrote:
> Since acabd2048e (grep: correctly identify utf-8 characters with
> \{b,w} in -P, 2023-01-08), PCRE2's UCP mode has been enabled when
> UTF content was expected and therefore muktibyte characters are
> included as part of all character classes (when defined).

s/muktibyte/multibyte/

^ permalink raw reply

* Re: [RFC PATCH] grep: default to posix digits with -P
From: René Scharfe @ 2024-01-01 17:18 UTC (permalink / raw)
  To: Carlo Marcelo Arenas Belón, git
In-Reply-To: <20240101150336.89098-1-carenas@gmail.com>

Am 01.01.24 um 16:03 schrieb Carlo Marcelo Arenas Belón:
> Since acabd2048e (grep: correctly identify utf-8 characters with
> \{b,w} in -P, 2023-01-08), PCRE2's UCP mode has been enabled when
> UTF content was expected and therefore muktibyte characters are
> included as part of all character classes (when defined).
>
> note that if the locale used is not UTF enabled (specially: C, POSIX)
> or uses an extended charmap binary that is not unicode compatible, binary
> match will be used instead.
>
> It was argued that doing so, at least for \d, was not a good idea,
> as that might not be what the user expected based on its historical
> meaning and was also slower, and indeed a similar change that was done
> to GNU grep was reverted and required further tweaks.
>
> At that time, PCRE2 didn't have a way to disable UCP's character
> expansion selectively, but flags to do so, and that will be
> available in the next release (planned soon) were added, and
> one of them has been in use by GNU grep since their last release
> in May (only if built and linked against the prereleased PCRE2
> library though).
>
> Add flags to make sure that both \d and [:digit:] only include
> ASCII digits so that `git grep` will be closer to GNU grep and
> improve performance as a side effect, but add a configuration flag
> to allow keeping the current behaviour (which is closer to perl
> and ripgrep).
>
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>  Documentation/config/grep.txt       |  4 ++++
>  grep.c                              | 37 ++++++++++++++++++++++-------
>  grep.h                              |  5 ++++
>  t/perf/p7822-grep-perl-character.sh | 11 +++++++--
>  t/t7818-grep-digit.sh               | 32 +++++++++++++++++++++++++
>  5 files changed, 78 insertions(+), 11 deletions(-)
>  create mode 100755 t/t7818-grep-digit.sh
>
> diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt
> index e521f20390..4e405c8ad1 100644
> --- a/Documentation/config/grep.txt
> +++ b/Documentation/config/grep.txt
> @@ -26,3 +26,7 @@ grep.fullName::
>  grep.fallbackToNoIndex::
>  	If set to true, fall back to git grep --no-index if git grep
>  	is executed outside of a git repository.  Defaults to false.
> +
> +grep.perl.digit::
> +	If set to true, use the perl definitions for \d and [:digit:].
> +	Defaults to false.
> diff --git a/grep.c b/grep.c
> index fc2d0c837a..fec36ccb30 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -88,6 +88,11 @@ int grep_config(const char *var, const char *value,
>  		return 0;
>  	}
>
> +	if (!strcmp(var, "grep.perl.digit")) {
> +		opt->perl_digit = git_config_bool(var, value);
> +		return 0;
> +	}
> +
>  	if (!strcmp(var, "color.grep"))
>  		opt->color = git_config_colorbool(var, value);
>  	if (!strcmp(var, "color.grep.match")) {
> @@ -301,6 +306,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>  	int patinforet;
>  	size_t jitsizearg;
>  	int literal = !opt->ignore_case && (p->fixed || p->is_fixed);
> +	uint32_t xoptions = 0;
>
>  	/*
>  	 * Call pcre2_general_context_create() before calling any
> @@ -321,17 +327,22 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>  		}
>  		options |= PCRE2_CASELESS;
>  	}
> -	if (!opt->ignore_locale && is_utf8_locale() && !literal)
> -		options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF);
> +	if (!opt->ignore_locale && is_utf8_locale() && !literal) {
> +		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
>
> -#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER
> -	/*
> -	 * Work around a JIT bug related to invalid Unicode character handling
> -	 * fixed in 10.35:
> -	 * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
> -	 */
> -	options &= ~PCRE2_UCP;
> +#ifdef GIT_PCRE2_VERSION_10_35_OR_HIGHER
> +		/*
> +		 * Work around a JIT bug related to invalid Unicode character handling
> +		 * fixed in 10.35:
> +		 * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
> +		 */
> +		options |= PCRE2_UCP;
> +#ifdef GIT_PCRE2_VERSION_10_43_OR_HIGHER
> +		if (!opt->perl_digit)
> +			xoptions |= (PCRE2_EXTRA_ASCII_BSD | PCRE2_EXTRA_ASCII_DIGIT);
>  #endif
> +#endif

Why do we need that extra level of indentation?

The old code turned PCRE2_UCP on by default and turned it off for older
versions.  The new code enables PCRE2_UCP only for newer versions.  The
result should be the same, no?  So why change that part at all?

But the comment is now off, isn't it?  The workaround was turning
PCRE2_UCP off for older versions (because those were broken), not
turning it on for newer versions (because it would be required by some
unfixed regression).

Do we need to nest the checks for GIT_PCRE2_VERSION_10_35_OR_HIGHER and
GIT_PCRE2_VERSION_10_43_OR_HIGHER?  Keeping them separate would help
keep the #ifdef parts as short as possible and maintainable
independently.

> +	}
>
>  #ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER
>  	/* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */
> @@ -339,6 +350,14 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>  		options |= PCRE2_NO_START_OPTIMIZE;
>  #endif
>
> +	if (xoptions) {
> +		if (!p->pcre2_compile_context)
> +			p->pcre2_compile_context = pcre2_compile_context_create(p->pcre2_general_context);
> +
> +		pcre2_set_compile_extra_options(p->pcre2_compile_context,
> +						xoptions);
> +	}
> +
>  	p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
>  					 p->patternlen, options, &error, &erroffset,
>  					 p->pcre2_compile_context);
> diff --git a/grep.h b/grep.h
> index 926c0875c4..cd5c416a0a 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -4,6 +4,9 @@
>  #ifdef USE_LIBPCRE2
>  #define PCRE2_CODE_UNIT_WIDTH 8
>  #include <pcre2.h>
> +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 43) || PCRE2_MAJOR >= 11
> +#define GIT_PCRE2_VERSION_10_43_OR_HIGHER
> +#endif
>  #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11
>  #define GIT_PCRE2_VERSION_10_36_OR_HIGHER
>  #endif
> @@ -178,6 +181,8 @@ struct grep_opt {
>
>  	void (*output)(struct grep_opt *opt, const void *data, size_t size);
>  	void *output_priv;
> +
> +	unsigned perl_digit:1;
>  };
>
>  #define GREP_OPT_INIT { \
> diff --git a/t/perf/p7822-grep-perl-character.sh b/t/perf/p7822-grep-perl-character.sh
> index 87009c60df..cc88d5a695 100755
> --- a/t/perf/p7822-grep-perl-character.sh
> +++ b/t/perf/p7822-grep-perl-character.sh
> @@ -8,6 +8,13 @@ etc.) we will test the patterns under those numbers of threads.
>
>  . ./perf-lib.sh
>
> +# setting a LOCALE is needed, but not yet supported by :
> +#. "$TEST_DIRECTORY"/lib-gettext.sh
> +
> +# Invoke like:
> +#
> +# LC_ALL=is_IS.utf8 ./p7822-grep-perl-character.sh
> +
>  test_perf_large_repo
>  test_checkout_worktree
>
> @@ -27,13 +34,13 @@ do
>  	if ! test_have_prereq PERF_GREP_ENGINES_THREADS
>  	then
>  		test_perf "grep -P '$pattern'" --prereq PCRE "
> -			git -P grep -f pat || :
> +			git grep -P -f pat || :
>  		"
>  	else
>  		for threads in $GIT_PERF_GREP_THREADS
>  		do
>  			test_perf "grep -P '$pattern' with $threads threads" --prereq PTHREADS,PCRE "
> -				git -c grep.threads=$threads -P grep -f pat || :
> +				git -c grep.threads=$threads grep -P -f pat || :

What's going on here?  You remove the -P option of git (--no-pager) and
add the -P option of git grep (--perl-regexp).  So this perf test never
tested PCRE despite its name?  Seems worth a separate patch.

Do the performance numbers in acabd2048e (grep: correctly identify utf-8
characters with \{b,w} in -P, 2023-01-08) still hold up in that light?

>  			"
>  		done
>  	fi

René


^ permalink raw reply

* Re: [Outreachy][PATCH v3] Port helper/test-ctype.c to unit-tests/t-ctype.c
From: René Scharfe @ 2024-01-01 16:41 UTC (permalink / raw)
  To: Achu Luma, git
  Cc: chriscool, christian.couder, gitster, phillip.wood, steadmon
In-Reply-To: <20240101104017.9452-2-ach.lumap@gmail.com>

Am 01.01.24 um 11:40 schrieb Achu Luma:
> In the recent codebase update(8bf6fbd00d (Merge branch
> 'js/doc-unit-tests', 2023-12-09)), a new unit testing framework was
> merged, providing a standardized approach for testing C code. Prior to
> this update, some unit tests relied on the test helper mechanism,
> lacking a dedicated unit testing framework. It's more natural to perform
> these unit tests using the new unit test framework.
>
> This commit migrates the unit tests for C character classification
> functions (isdigit(), isspace(), etc) from the legacy approach
> using the test-tool command `test-tool ctype` in t/helper/test-ctype.c
> to the new unit testing framework (t/unit-tests/test-lib.h).
>
> The migration involves refactoring the tests to utilize the testing
> macros provided by the framework (TEST() and check_*()).
>
> Signed-off-by: Achu Luma <ach.lumap@gmail.com>

v1 and v2 had "Mentored-by: Christian Couder <chriscool@tuxfamily.org>".
It's gone now; intentionally?

> ---
>  Sorry for the poor description of the changes, maybe the following is better:
>  In the revised patch we added a call to test_msg() suggested by Phillip to
>  print the character code. This helps us pinpoint exactly the byte value that
>  is an issue as suggested by Junio. We keep using check_int() as it allows us
>  to still see the actual and expected return which might help us in case func()
>  returned a different non-zero value when we were expecting "1". It is useful
>  to have the return value printed on error in case we start returning "53"
>  instead of "1" for "true".

To illustrate, here are the changes between v1 and v2 in diff form:

--- >8 ---
diff --git a/t/unit-tests/t-ctype.c b/t/unit-tests/t-ctype.c
index 41189ba9f9..8a215d387a 100644
--- a/t/unit-tests/t-ctype.c
+++ b/t/unit-tests/t-ctype.c
@@ -14,12 +14,13 @@ static int is_in(const char *s, int ch)
 }

 /* Macro to test a character type */
-#define TEST_CTYPE_FUNC(func, string)			\
+#define TEST_CTYPE_FUNC(func, string)				\
 static void test_ctype_##func(void)				\
 {								\
-	int i;                                     	 	\
-	for (i = 0; i < 256; i++)                 		\
-		check_int(func(i), ==, is_in(string, i)); 	\
+	for (int i = 0; i < 256; i++) {				\
+        	if (!check_int(func(i), ==, is_in(string, i))) 	\
+        		test_msg("       i: %02x", i);		\
+        } 							\
 }

 #define DIGIT "0123456789"
--- >8 ---

v3 is the same as v2.

>
>  Makefile               |  2 +-
>  t/helper/test-ctype.c  | 70 --------------------------------------
>  t/helper/test-tool.c   |  1 -
>  t/helper/test-tool.h   |  1 -
>  t/t0070-fundamental.sh |  4 ---
>  t/unit-tests/t-ctype.c | 77 ++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 78 insertions(+), 77 deletions(-)
>  delete mode 100644 t/helper/test-ctype.c
>  create mode 100644 t/unit-tests/t-ctype.c
>
> diff --git a/Makefile b/Makefile
> index 88ba7a3c51..a4df48ba65 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -792,7 +792,6 @@ TEST_BUILTINS_OBJS += test-chmtime.o
>  TEST_BUILTINS_OBJS += test-config.o
>  TEST_BUILTINS_OBJS += test-crontab.o
>  TEST_BUILTINS_OBJS += test-csprng.o
> -TEST_BUILTINS_OBJS += test-ctype.o
>  TEST_BUILTINS_OBJS += test-date.o
>  TEST_BUILTINS_OBJS += test-delta.o
>  TEST_BUILTINS_OBJS += test-dir-iterator.o
> @@ -1341,6 +1340,7 @@ THIRD_PARTY_SOURCES += sha1dc/%
>
>  UNIT_TEST_PROGRAMS += t-basic
>  UNIT_TEST_PROGRAMS += t-strbuf
> +UNIT_TEST_PROGRAMS += t-ctype
>  UNIT_TEST_PROGS = $(patsubst %,$(UNIT_TEST_BIN)/%$X,$(UNIT_TEST_PROGRAMS))
>  UNIT_TEST_OBJS = $(patsubst %,$(UNIT_TEST_DIR)/%.o,$(UNIT_TEST_PROGRAMS))
>  UNIT_TEST_OBJS += $(UNIT_TEST_DIR)/test-lib.o
> diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
> deleted file mode 100644
> index e5659df40b..0000000000
> --- a/t/helper/test-ctype.c
> +++ /dev/null
> @@ -1,70 +0,0 @@
> -#include "test-tool.h"
> -
> -static int rc;
> -
> -static void report_error(const char *class, int ch)
> -{
> -	printf("%s classifies char %d (0x%02x) wrongly\n", class, ch, ch);
> -	rc = 1;
> -}
> -
> -static int is_in(const char *s, int ch)
> -{
> -	/*
> -	 * We can't find NUL using strchr. Accept it as the first
> -	 * character in the spec -- there are no empty classes.
> -	 */
> -	if (ch == '\0')
> -		return ch == *s;
> -	if (*s == '\0')
> -		s++;
> -	return !!strchr(s, ch);
> -}
> -
> -#define TEST_CLASS(t,s) {			\
> -	int i;					\
> -	for (i = 0; i < 256; i++) {		\
> -		if (is_in(s, i) != t(i))	\
> -			report_error(#t, i);	\
> -	}					\
> -	if (t(EOF))				\
> -		report_error(#t, EOF);		\
> -}
> -
> -#define DIGIT "0123456789"
> -#define LOWER "abcdefghijklmnopqrstuvwxyz"
> -#define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
> -#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
> -#define ASCII \
> -	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
> -	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
> -	"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
> -	"\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
> -	"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
> -	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
> -	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
> -	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
> -#define CNTRL \
> -	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
> -	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
> -	"\x7f"
> -
> -int cmd__ctype(int argc UNUSED, const char **argv UNUSED)
> -{
> -	TEST_CLASS(isdigit, DIGIT);
> -	TEST_CLASS(isspace, " \n\r\t");
> -	TEST_CLASS(isalpha, LOWER UPPER);
> -	TEST_CLASS(isalnum, LOWER UPPER DIGIT);
> -	TEST_CLASS(is_glob_special, "*?[\\");
> -	TEST_CLASS(is_regex_special, "$()*+.?[\\^{|");
> -	TEST_CLASS(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~");
> -	TEST_CLASS(isascii, ASCII);
> -	TEST_CLASS(islower, LOWER);
> -	TEST_CLASS(isupper, UPPER);
> -	TEST_CLASS(iscntrl, CNTRL);
> -	TEST_CLASS(ispunct, PUNCT);
> -	TEST_CLASS(isxdigit, DIGIT "abcdefABCDEF");
> -	TEST_CLASS(isprint, LOWER UPPER DIGIT PUNCT " ");
> -
> -	return rc;
> -}
> diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
> index 37ba996539..33b9501c21 100644
> --- a/t/helper/test-tool.c
> +++ b/t/helper/test-tool.c
> @@ -19,7 +19,6 @@ static struct test_cmd cmds[] = {
>  	{ "config", cmd__config },
>  	{ "crontab", cmd__crontab },
>  	{ "csprng", cmd__csprng },
> -	{ "ctype", cmd__ctype },
>  	{ "date", cmd__date },
>  	{ "delta", cmd__delta },
>  	{ "dir-iterator", cmd__dir_iterator },
> diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
> index 8a1a7c63da..b72f07ded9 100644
> --- a/t/helper/test-tool.h
> +++ b/t/helper/test-tool.h
> @@ -12,7 +12,6 @@ int cmd__chmtime(int argc, const char **argv);
>  int cmd__config(int argc, const char **argv);
>  int cmd__crontab(int argc, const char **argv);
>  int cmd__csprng(int argc, const char **argv);
> -int cmd__ctype(int argc, const char **argv);
>  int cmd__date(int argc, const char **argv);
>  int cmd__delta(int argc, const char **argv);
>  int cmd__dir_iterator(int argc, const char **argv);
> diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
> index 487bc8d905..a4756fbab9 100755
> --- a/t/t0070-fundamental.sh
> +++ b/t/t0070-fundamental.sh
> @@ -9,10 +9,6 @@ Verify wrappers and compatibility functions.
>  TEST_PASSES_SANITIZE_LEAK=true
>  . ./test-lib.sh
>
> -test_expect_success 'character classes (isspace, isalpha etc.)' '
> -	test-tool ctype
> -'
> -
>  test_expect_success 'mktemp to nonexistent directory prints filename' '
>  	test_must_fail test-tool mktemp doesnotexist/testXXXXXX 2>err &&
>  	grep "doesnotexist/test" err
> diff --git a/t/unit-tests/t-ctype.c b/t/unit-tests/t-ctype.c
> new file mode 100644
> index 0000000000..8a215d387a
> --- /dev/null
> +++ b/t/unit-tests/t-ctype.c
> @@ -0,0 +1,77 @@
> +#include "test-lib.h"
> +
> +static int is_in(const char *s, int ch)
> +{
> +	/*
> +	 * We can't find NUL using strchr. Accept it as the first
> +	 * character in the spec -- there are no empty classes.
> +	 */
> +	if (ch == '\0')
> +		return ch == *s;
> +	if (*s == '\0')
> +		s++;
> +	return !!strchr(s, ch);
> +}
> +
> +/* Macro to test a character type */
> +#define TEST_CTYPE_FUNC(func, string)				\
> +static void test_ctype_##func(void)				\
> +{								\
> +	for (int i = 0; i < 256; i++) {				\
> +        	if (!check_int(func(i), ==, is_in(string, i))) 	\
> +        		test_msg("       i: %02x", i);		\
> +        } 							\

This loop is indented with spaces followed by tabs.  The Git project
prefers indenting by tabs and in some cases tabs followed by spaces, but
not the other way array.  git am warns about such whitespace errors and
can actually fix them automatically, so I imagine this wouldn't be a
deal breaker.  But even if it seems picky, respecting the project's
preferences from the start reduces unnecessary friction.

The original test reported the number of a misclassified character
(basically its ASCII code) in both decimal and hexadecimal form.  This
code prints it only in hexadecimal, but without the prefix "0x".  A
casual reader could mistake hexadecimal numbers for decimal ones as a
result.  Printing only one suffices, but I think it's better to either
use decimal notation without any prefix or hexadecimal with the "0x"
prefix to avoid confusion.  There's no reason to be stingy with the
screen space here.

> +}
> +
> +#define DIGIT "0123456789"
> +#define LOWER "abcdefghijklmnopqrstuvwxyz"
> +#define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
> +#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
> +#define ASCII \
> +	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
> +	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
> +	"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
> +	"\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
> +	"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
> +	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
> +	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
> +	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
> +#define CNTRL \
> +	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
> +	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
> +	"\x7f"
> +
> +TEST_CTYPE_FUNC(isdigit, DIGIT)
> +TEST_CTYPE_FUNC(isspace, " \n\r\t")
> +TEST_CTYPE_FUNC(isalpha, LOWER UPPER)
> +TEST_CTYPE_FUNC(isalnum, LOWER UPPER DIGIT)
> +TEST_CTYPE_FUNC(is_glob_special, "*?[\\")
> +TEST_CTYPE_FUNC(is_regex_special, "$()*+.?[\\^{|")
> +TEST_CTYPE_FUNC(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~")
> +TEST_CTYPE_FUNC(isascii, ASCII)
> +TEST_CTYPE_FUNC(islower, LOWER)
> +TEST_CTYPE_FUNC(isupper, UPPER)
> +TEST_CTYPE_FUNC(iscntrl, CNTRL)
> +TEST_CTYPE_FUNC(ispunct, PUNCT)
> +TEST_CTYPE_FUNC(isxdigit, DIGIT "abcdefABCDEF")
> +TEST_CTYPE_FUNC(isprint, LOWER UPPER DIGIT PUNCT " ")
> +
> +int cmd_main(int argc, const char **argv) {
> +	/* Run all character type tests */
> +	TEST(test_ctype_isspace(), "isspace() works as we expect");
> +	TEST(test_ctype_isdigit(), "isdigit() works as we expect");
> +	TEST(test_ctype_isalpha(), "isalpha() works as we expect");
> +	TEST(test_ctype_isalnum(), "isalnum() works as we expect");
> +	TEST(test_ctype_is_glob_special(), "is_glob_special() works as we expect");
> +	TEST(test_ctype_is_regex_special(), "is_regex_special() works as we expect");
> +	TEST(test_ctype_is_pathspec_magic(), "is_pathspec_magic() works as we expect");
> +	TEST(test_ctype_isascii(), "isascii() works as we expect");
> +	TEST(test_ctype_islower(), "islower() works as we expect");
> +	TEST(test_ctype_isupper(), "isupper() works as we expect");
> +	TEST(test_ctype_iscntrl(), "iscntrl() works as we expect");
> +	TEST(test_ctype_ispunct(), "ispunct() works as we expect");
> +	TEST(test_ctype_isxdigit(), "isxdigit() works as we expect");
> +	TEST(test_ctype_isprint(), "isprint() works as we expect");

Each class (e.g. space or digit) is mentioned thrice here: When
declaring its function with TEST_CTYPE_FUNC, when calling said function
and again in the test description.  Adding a new class requires adding
two lines of code.  That's not too bad, but the original implementation
didn't require that repetition and adding a new class only required
adding a single line.

I mentioned this briefly in my review of v1 in
https://lore.kernel.org/git/f743b473-40f8-423d-bf5b-d42b92e5aa1b@web.de/
and showed a way to define character classes without repeating their
names.  You don't have to follow that suggestion, but it would be nice
if you could give some feedback about it.

> +
> +	return test_done();
> +}

René


^ permalink raw reply related

* Re: Concurrent fetch commands
From: Federico Kircheis @ 2024-01-01 15:47 UTC (permalink / raw)
  To: git
In-Reply-To: <55620f4f-80f9-4e9a-8947-dce5d2a5113d@haller-berlin.de>

On 01/01/2024 12.23, Stefan Haller wrote:
> On 31.12.23 14:50, Konstantin Tokarev wrote:
>> В Sun, 31 Dec 2023 14:30:05 +0100
>> Stefan Haller <lists@haller-berlin.de> пишет:
>>
>>> ... it is common for git clients to run git fetch
>>> periodically in the background.
>>
>> I think it's a really weird idea which makes a disservice both to human
>> git users and git servers. IMO clients doing this should be fixed to
>> avoid such behavior, at least by default.
> 
> I disagree pretty strongly. Users expect this, and I do find it very
> convenient myself.
> 

Especially when working with git worktree.

^ permalink raw reply

* [RFC PATCH] grep: default to posix digits with -P
From: Carlo Marcelo Arenas Belón @ 2024-01-01 15:03 UTC (permalink / raw)
  To: git; +Cc: Carlo Marcelo Arenas Belón

Since acabd2048e (grep: correctly identify utf-8 characters with
\{b,w} in -P, 2023-01-08), PCRE2's UCP mode has been enabled when
UTF content was expected and therefore muktibyte characters are
included as part of all character classes (when defined).

note that if the locale used is not UTF enabled (specially: C, POSIX)
or uses an extended charmap binary that is not unicode compatible, binary
match will be used instead.

It was argued that doing so, at least for \d, was not a good idea,
as that might not be what the user expected based on its historical
meaning and was also slower, and indeed a similar change that was done
to GNU grep was reverted and required further tweaks.

At that time, PCRE2 didn't have a way to disable UCP's character
expansion selectively, but flags to do so, and that will be
available in the next release (planned soon) were added, and
one of them has been in use by GNU grep since their last release
in May (only if built and linked against the prereleased PCRE2
library though).

Add flags to make sure that both \d and [:digit:] only include
ASCII digits so that `git grep` will be closer to GNU grep and
improve performance as a side effect, but add a configuration flag
to allow keeping the current behaviour (which is closer to perl
and ripgrep).

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 Documentation/config/grep.txt       |  4 ++++
 grep.c                              | 37 ++++++++++++++++++++++-------
 grep.h                              |  5 ++++
 t/perf/p7822-grep-perl-character.sh | 11 +++++++--
 t/t7818-grep-digit.sh               | 32 +++++++++++++++++++++++++
 5 files changed, 78 insertions(+), 11 deletions(-)
 create mode 100755 t/t7818-grep-digit.sh

diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt
index e521f20390..4e405c8ad1 100644
--- a/Documentation/config/grep.txt
+++ b/Documentation/config/grep.txt
@@ -26,3 +26,7 @@ grep.fullName::
 grep.fallbackToNoIndex::
 	If set to true, fall back to git grep --no-index if git grep
 	is executed outside of a git repository.  Defaults to false.
+
+grep.perl.digit::
+	If set to true, use the perl definitions for \d and [:digit:].
+	Defaults to false.
diff --git a/grep.c b/grep.c
index fc2d0c837a..fec36ccb30 100644
--- a/grep.c
+++ b/grep.c
@@ -88,6 +88,11 @@ int grep_config(const char *var, const char *value,
 		return 0;
 	}
 
+	if (!strcmp(var, "grep.perl.digit")) {
+		opt->perl_digit = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "color.grep"))
 		opt->color = git_config_colorbool(var, value);
 	if (!strcmp(var, "color.grep.match")) {
@@ -301,6 +306,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	int patinforet;
 	size_t jitsizearg;
 	int literal = !opt->ignore_case && (p->fixed || p->is_fixed);
+	uint32_t xoptions = 0;
 
 	/*
 	 * Call pcre2_general_context_create() before calling any
@@ -321,17 +327,22 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		}
 		options |= PCRE2_CASELESS;
 	}
-	if (!opt->ignore_locale && is_utf8_locale() && !literal)
-		options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF);
+	if (!opt->ignore_locale && is_utf8_locale() && !literal) {
+		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
 
-#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER
-	/*
-	 * Work around a JIT bug related to invalid Unicode character handling
-	 * fixed in 10.35:
-	 * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
-	 */
-	options &= ~PCRE2_UCP;
+#ifdef GIT_PCRE2_VERSION_10_35_OR_HIGHER
+		/*
+		 * Work around a JIT bug related to invalid Unicode character handling
+		 * fixed in 10.35:
+		 * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
+		 */
+		options |= PCRE2_UCP;
+#ifdef GIT_PCRE2_VERSION_10_43_OR_HIGHER
+		if (!opt->perl_digit)
+			xoptions |= (PCRE2_EXTRA_ASCII_BSD | PCRE2_EXTRA_ASCII_DIGIT);
 #endif
+#endif
+	}
 
 #ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER
 	/* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */
@@ -339,6 +350,14 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		options |= PCRE2_NO_START_OPTIMIZE;
 #endif
 
+	if (xoptions) {
+		if (!p->pcre2_compile_context)
+			p->pcre2_compile_context = pcre2_compile_context_create(p->pcre2_general_context);
+
+		pcre2_set_compile_extra_options(p->pcre2_compile_context,
+						xoptions);
+	}
+
 	p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
 					 p->patternlen, options, &error, &erroffset,
 					 p->pcre2_compile_context);
diff --git a/grep.h b/grep.h
index 926c0875c4..cd5c416a0a 100644
--- a/grep.h
+++ b/grep.h
@@ -4,6 +4,9 @@
 #ifdef USE_LIBPCRE2
 #define PCRE2_CODE_UNIT_WIDTH 8
 #include <pcre2.h>
+#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 43) || PCRE2_MAJOR >= 11
+#define GIT_PCRE2_VERSION_10_43_OR_HIGHER
+#endif
 #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11
 #define GIT_PCRE2_VERSION_10_36_OR_HIGHER
 #endif
@@ -178,6 +181,8 @@ struct grep_opt {
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
 	void *output_priv;
+
+	unsigned perl_digit:1;
 };
 
 #define GREP_OPT_INIT { \
diff --git a/t/perf/p7822-grep-perl-character.sh b/t/perf/p7822-grep-perl-character.sh
index 87009c60df..cc88d5a695 100755
--- a/t/perf/p7822-grep-perl-character.sh
+++ b/t/perf/p7822-grep-perl-character.sh
@@ -8,6 +8,13 @@ etc.) we will test the patterns under those numbers of threads.
 
 . ./perf-lib.sh
 
+# setting a LOCALE is needed, but not yet supported by :
+#. "$TEST_DIRECTORY"/lib-gettext.sh
+
+# Invoke like:
+#
+# LC_ALL=is_IS.utf8 ./p7822-grep-perl-character.sh
+
 test_perf_large_repo
 test_checkout_worktree
 
@@ -27,13 +34,13 @@ do
 	if ! test_have_prereq PERF_GREP_ENGINES_THREADS
 	then
 		test_perf "grep -P '$pattern'" --prereq PCRE "
-			git -P grep -f pat || :
+			git grep -P -f pat || :
 		"
 	else
 		for threads in $GIT_PERF_GREP_THREADS
 		do
 			test_perf "grep -P '$pattern' with $threads threads" --prereq PTHREADS,PCRE "
-				git -c grep.threads=$threads -P grep -f pat || :
+				git -c grep.threads=$threads grep -P -f pat || :
 			"
 		done
 	fi
diff --git a/t/t7818-grep-digit.sh b/t/t7818-grep-digit.sh
new file mode 100755
index 0000000000..44007e6be6
--- /dev/null
+++ b/t/t7818-grep-digit.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='git grep -P with digits'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./lib-gettext.sh
+
+test_expect_success 'setup' '
+	echo 2023 >ascii &&
+	printf "\357\274\222\357\274\220\357\274\222\357\274\223\n" >fullwidth &&
+	printf "\331\241\331\244\331\244\331\245\n" >multibyte &&
+	git add . &&
+	git commit -m. &&
+	LC_ALL="$is_IS_locale" &&
+	export LC_ALL
+'
+
+test_expect_success PCRE 'grep -P "\d"' '
+	echo "ascii:2023" >expected &&
+	git grep -P "\d{2}[[:digit:]]{2}" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success PCRE 'git -c grep.perl.digit' '
+	test_config grep.perl.digit true &&
+	git grep -P "\d{2}[[:digit:]]{2}" >actual &&
+	grep fullwidth actual &&
+	grep multibyte actual &&
+	test_line_count = 3 actual
+'
+
+test_done
-- 
2.39.3 (Apple Git-145)


^ permalink raw reply related

* Re: Concurrent fetch commands
From: Stefan Haller @ 2024-01-01 11:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <9b5ff583-f8b9-42dd-8829-2fea74bc2057@haller-berlin.de>

On 01.01.24 12:30, Stefan Haller wrote:
> But it does sound like --no-write-fetch-head will solve our problem,
> thanks!

Actually we saw another problem recently that I suspect might also be
caused by the concurrency between fetch and pull, but I'm not sure. I'll
explain it here in case anybody has any idea.

What happened was that a user tried to pull a branch that was behind its
upstream (not diverged). They got the error message from
show_advice_pull_non_ff ("Need to specify how to reconcile divergent
branches"); the log then showed that the background fetch was ongoing
for the remote of the current branch while they initiated the pull.

Trying to pull again after the background fetch had moved on to the next
remote then worked.

I read the code in pull.c a bit, but I can't see how it can become so
confused about being diverged in this scenario. Any ideas?

-Stefan

^ permalink raw reply

* Re: Concurrent fetch commands
From: Stefan Haller @ 2024-01-01 11:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqy1daffk8.fsf@gitster.g>

On 31.12.23 18:27, Junio C Hamano wrote:
> How this _ought_ to work is
> 
>    ... it should notice and
>    barf, saying "fatal: a 'git fetch' is already working" or
>    something.

Interesting, I had expected this to work somehow, e.g. by sequencing the
operations or whatever is necessary to make it work. Fixing the bug like
you suggest actually makes very little difference in practice, it just
gives a slightly less confusing error message.

> but it is a crime to run them without the "--no-fetch-head" option.

Ouch, I wasn't aware we are committing crimes. We'll accept the
punishment. :-)

But it does sound like --no-write-fetch-head will solve our problem,
thanks!

-Stefan

^ permalink raw reply

* Re: Concurrent fetch commands
From: Stefan Haller @ 2024-01-01 11:23 UTC (permalink / raw)
  To: Konstantin Tokarev; +Cc: git
In-Reply-To: <20231231165042.1d934927@RedEyes>

On 31.12.23 14:50, Konstantin Tokarev wrote:
> В Sun, 31 Dec 2023 14:30:05 +0100
> Stefan Haller <lists@haller-berlin.de> пишет:
> 
>> ... it is common for git clients to run git fetch
>> periodically in the background.
> 
> I think it's a really weird idea which makes a disservice both to human
> git users and git servers. IMO clients doing this should be fixed to
> avoid such behavior, at least by default.

I disagree pretty strongly. Users expect this, and I do find it very
convenient myself.

^ permalink raw reply

* [Outreachy][PATCH v3] Port helper/test-ctype.c to unit-tests/t-ctype.c
From: Achu Luma @ 2024-01-01 10:40 UTC (permalink / raw)
  To: git
  Cc: chriscool, christian.couder, gitster, l.s.r, phillip.wood,
	steadmon, Achu Luma
In-Reply-To: <20231230001102.9220-1-ach.lumap@gmail.com>

In the recent codebase update(8bf6fbd00d (Merge branch
'js/doc-unit-tests', 2023-12-09)), a new unit testing framework was
merged, providing a standardized approach for testing C code. Prior to
this update, some unit tests relied on the test helper mechanism,
lacking a dedicated unit testing framework. It's more natural to perform
these unit tests using the new unit test framework.

This commit migrates the unit tests for C character classification
functions (isdigit(), isspace(), etc) from the legacy approach
using the test-tool command `test-tool ctype` in t/helper/test-ctype.c
to the new unit testing framework (t/unit-tests/test-lib.h).

The migration involves refactoring the tests to utilize the testing
macros provided by the framework (TEST() and check_*()).

Signed-off-by: Achu Luma <ach.lumap@gmail.com>
---
 Sorry for the poor description of the changes, maybe the following is better:
 In the revised patch we added a call to test_msg() suggested by Phillip to 
 print the character code. This helps us pinpoint exactly the byte value that 
 is an issue as suggested by Junio. We keep using check_int() as it allows us 
 to still see the actual and expected return which might help us in case func() 
 returned a different non-zero value when we were expecting "1". It is useful 
 to have the return value printed on error in case we start returning "53" 
 instead of "1" for "true".

 Makefile               |  2 +-
 t/helper/test-ctype.c  | 70 --------------------------------------
 t/helper/test-tool.c   |  1 -
 t/helper/test-tool.h   |  1 -
 t/t0070-fundamental.sh |  4 ---
 t/unit-tests/t-ctype.c | 77 ++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 78 insertions(+), 77 deletions(-)
 delete mode 100644 t/helper/test-ctype.c
 create mode 100644 t/unit-tests/t-ctype.c

diff --git a/Makefile b/Makefile
index 88ba7a3c51..a4df48ba65 100644
--- a/Makefile
+++ b/Makefile
@@ -792,7 +792,6 @@ TEST_BUILTINS_OBJS += test-chmtime.o
 TEST_BUILTINS_OBJS += test-config.o
 TEST_BUILTINS_OBJS += test-crontab.o
 TEST_BUILTINS_OBJS += test-csprng.o
-TEST_BUILTINS_OBJS += test-ctype.o
 TEST_BUILTINS_OBJS += test-date.o
 TEST_BUILTINS_OBJS += test-delta.o
 TEST_BUILTINS_OBJS += test-dir-iterator.o
@@ -1341,6 +1340,7 @@ THIRD_PARTY_SOURCES += sha1dc/%
 
 UNIT_TEST_PROGRAMS += t-basic
 UNIT_TEST_PROGRAMS += t-strbuf
+UNIT_TEST_PROGRAMS += t-ctype
 UNIT_TEST_PROGS = $(patsubst %,$(UNIT_TEST_BIN)/%$X,$(UNIT_TEST_PROGRAMS))
 UNIT_TEST_OBJS = $(patsubst %,$(UNIT_TEST_DIR)/%.o,$(UNIT_TEST_PROGRAMS))
 UNIT_TEST_OBJS += $(UNIT_TEST_DIR)/test-lib.o
diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
deleted file mode 100644
index e5659df40b..0000000000
--- a/t/helper/test-ctype.c
+++ /dev/null
@@ -1,70 +0,0 @@
-#include "test-tool.h"
-
-static int rc;
-
-static void report_error(const char *class, int ch)
-{
-	printf("%s classifies char %d (0x%02x) wrongly\n", class, ch, ch);
-	rc = 1;
-}
-
-static int is_in(const char *s, int ch)
-{
-	/*
-	 * We can't find NUL using strchr. Accept it as the first
-	 * character in the spec -- there are no empty classes.
-	 */
-	if (ch == '\0')
-		return ch == *s;
-	if (*s == '\0')
-		s++;
-	return !!strchr(s, ch);
-}
-
-#define TEST_CLASS(t,s) {			\
-	int i;					\
-	for (i = 0; i < 256; i++) {		\
-		if (is_in(s, i) != t(i))	\
-			report_error(#t, i);	\
-	}					\
-	if (t(EOF))				\
-		report_error(#t, EOF);		\
-}
-
-#define DIGIT "0123456789"
-#define LOWER "abcdefghijklmnopqrstuvwxyz"
-#define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
-#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
-#define ASCII \
-	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
-	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
-	"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
-	"\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
-	"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
-	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
-	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
-	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
-#define CNTRL \
-	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
-	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
-	"\x7f"
-
-int cmd__ctype(int argc UNUSED, const char **argv UNUSED)
-{
-	TEST_CLASS(isdigit, DIGIT);
-	TEST_CLASS(isspace, " \n\r\t");
-	TEST_CLASS(isalpha, LOWER UPPER);
-	TEST_CLASS(isalnum, LOWER UPPER DIGIT);
-	TEST_CLASS(is_glob_special, "*?[\\");
-	TEST_CLASS(is_regex_special, "$()*+.?[\\^{|");
-	TEST_CLASS(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~");
-	TEST_CLASS(isascii, ASCII);
-	TEST_CLASS(islower, LOWER);
-	TEST_CLASS(isupper, UPPER);
-	TEST_CLASS(iscntrl, CNTRL);
-	TEST_CLASS(ispunct, PUNCT);
-	TEST_CLASS(isxdigit, DIGIT "abcdefABCDEF");
-	TEST_CLASS(isprint, LOWER UPPER DIGIT PUNCT " ");
-
-	return rc;
-}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 37ba996539..33b9501c21 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -19,7 +19,6 @@ static struct test_cmd cmds[] = {
 	{ "config", cmd__config },
 	{ "crontab", cmd__crontab },
 	{ "csprng", cmd__csprng },
-	{ "ctype", cmd__ctype },
 	{ "date", cmd__date },
 	{ "delta", cmd__delta },
 	{ "dir-iterator", cmd__dir_iterator },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 8a1a7c63da..b72f07ded9 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -12,7 +12,6 @@ int cmd__chmtime(int argc, const char **argv);
 int cmd__config(int argc, const char **argv);
 int cmd__crontab(int argc, const char **argv);
 int cmd__csprng(int argc, const char **argv);
-int cmd__ctype(int argc, const char **argv);
 int cmd__date(int argc, const char **argv);
 int cmd__delta(int argc, const char **argv);
 int cmd__dir_iterator(int argc, const char **argv);
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 487bc8d905..a4756fbab9 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -9,10 +9,6 @@ Verify wrappers and compatibility functions.
 TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
-test_expect_success 'character classes (isspace, isalpha etc.)' '
-	test-tool ctype
-'
-
 test_expect_success 'mktemp to nonexistent directory prints filename' '
 	test_must_fail test-tool mktemp doesnotexist/testXXXXXX 2>err &&
 	grep "doesnotexist/test" err
diff --git a/t/unit-tests/t-ctype.c b/t/unit-tests/t-ctype.c
new file mode 100644
index 0000000000..8a215d387a
--- /dev/null
+++ b/t/unit-tests/t-ctype.c
@@ -0,0 +1,77 @@
+#include "test-lib.h"
+
+static int is_in(const char *s, int ch)
+{
+	/*
+	 * We can't find NUL using strchr. Accept it as the first
+	 * character in the spec -- there are no empty classes.
+	 */
+	if (ch == '\0')
+		return ch == *s;
+	if (*s == '\0')
+		s++;
+	return !!strchr(s, ch);
+}
+
+/* Macro to test a character type */
+#define TEST_CTYPE_FUNC(func, string)				\
+static void test_ctype_##func(void)				\
+{								\
+	for (int i = 0; i < 256; i++) {				\
+        	if (!check_int(func(i), ==, is_in(string, i))) 	\
+        		test_msg("       i: %02x", i);		\
+        } 							\
+}
+
+#define DIGIT "0123456789"
+#define LOWER "abcdefghijklmnopqrstuvwxyz"
+#define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
+#define ASCII \
+	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
+	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
+	"\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
+	"\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
+	"\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
+	"\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
+	"\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
+	"\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+#define CNTRL \
+	"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
+	"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
+	"\x7f"
+
+TEST_CTYPE_FUNC(isdigit, DIGIT)
+TEST_CTYPE_FUNC(isspace, " \n\r\t")
+TEST_CTYPE_FUNC(isalpha, LOWER UPPER)
+TEST_CTYPE_FUNC(isalnum, LOWER UPPER DIGIT)
+TEST_CTYPE_FUNC(is_glob_special, "*?[\\")
+TEST_CTYPE_FUNC(is_regex_special, "$()*+.?[\\^{|")
+TEST_CTYPE_FUNC(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~")
+TEST_CTYPE_FUNC(isascii, ASCII)
+TEST_CTYPE_FUNC(islower, LOWER)
+TEST_CTYPE_FUNC(isupper, UPPER)
+TEST_CTYPE_FUNC(iscntrl, CNTRL)
+TEST_CTYPE_FUNC(ispunct, PUNCT)
+TEST_CTYPE_FUNC(isxdigit, DIGIT "abcdefABCDEF")
+TEST_CTYPE_FUNC(isprint, LOWER UPPER DIGIT PUNCT " ")
+
+int cmd_main(int argc, const char **argv) {
+	/* Run all character type tests */
+	TEST(test_ctype_isspace(), "isspace() works as we expect");
+	TEST(test_ctype_isdigit(), "isdigit() works as we expect");
+	TEST(test_ctype_isalpha(), "isalpha() works as we expect");
+	TEST(test_ctype_isalnum(), "isalnum() works as we expect");
+	TEST(test_ctype_is_glob_special(), "is_glob_special() works as we expect");
+	TEST(test_ctype_is_regex_special(), "is_regex_special() works as we expect");
+	TEST(test_ctype_is_pathspec_magic(), "is_pathspec_magic() works as we expect");
+	TEST(test_ctype_isascii(), "isascii() works as we expect");
+	TEST(test_ctype_islower(), "islower() works as we expect");
+	TEST(test_ctype_isupper(), "isupper() works as we expect");
+	TEST(test_ctype_iscntrl(), "iscntrl() works as we expect");
+	TEST(test_ctype_ispunct(), "ispunct() works as we expect");
+	TEST(test_ctype_isxdigit(), "isxdigit() works as we expect");
+	TEST(test_ctype_isprint(), "isprint() works as we expect");
+
+	return test_done();
+}
-- 
2.42.0.windows.2


^ permalink raw reply related

* Re: [PATCH] write-or-die: make GIT_FLUSH a Boolean environment variable
From: Torsten Bögershausen @ 2024-01-01  8:24 UTC (permalink / raw)
  To: Chandra Pratap via GitGitGadget; +Cc: git, Chandra Pratap, Chandra Pratap
In-Reply-To: <pull.1628.git.1703955246308.gitgitgadget@gmail.com>

On Sat, Dec 30, 2023 at 04:54:06PM +0000, Chandra Pratap via GitGitGadget wrote:
> From: Chandra Pratap <chandrapratap3519@gmail.com>
>
> Among Git's environment variable, the ones marked as "Boolean"
> accept values in a way similar to Boolean configuration variables,
> i.e. values like 'yes', 'on', 'true' and positive numbers are
> taken as "on" and values like 'no', 'off', 'false' are taken as
> "off".
> GIT_FLUSH can be used to force Git to use non-buffered I/O when
> writing to stdout. It can only accept two values, '1' which causes
> Git to flush more often and '0' which makes all output buffered.
> Make GIT_FLUSH accept more values besides '0' and '1' by turning it
> into a Boolean environment variable, modifying the required logic.
> Update the related documentation.
>
> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
> ---
>     write-or-die: make GIT_FLUSH a Boolean environment variable
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1628%2FChand-ra%2Fgit_flush-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1628/Chand-ra/git_flush-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1628
>
>  Documentation/git.txt | 16 +++++++---------
>  write-or-die.c        |  9 ++++-----
>  2 files changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 2535a30194f..83fd60f2d11 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -724,16 +724,14 @@ for further details.
>  	waiting for someone with sufficient permissions to fix it.
>
>  `GIT_FLUSH`::
> -// NEEDSWORK: make it into a usual Boolean environment variable
> -	If this environment variable is set to "1", then commands such
> +	If this Boolean environment variable is set to true, then commands such
>  	as 'git blame' (in incremental mode), 'git rev-list', 'git log',
> -	'git check-attr' and 'git check-ignore' will
> -	force a flush of the output stream after each record have been
> -	flushed. If this
> -	variable is set to "0", the output of these commands will be done
> -	using completely buffered I/O.   If this environment variable is
> -	not set, Git will choose buffered or record-oriented flushing
> -	based on whether stdout appears to be redirected to a file or not.
> +	'git check-attr' and 'git check-ignore' will force a flush of the output
> +	stream after each record have been flushed. If this variable is set to
> +	false, the output of these commands will be done using completely buffered
> +	I/O. If this environment variable is not set, Git will choose buffered or
> +	record-oriented flushing based on whether stdout appears to be redirected
> +	to a file or not.
>
>  `GIT_TRACE`::
>  	Enables general trace messages, e.g. alias expansion, built-in
> diff --git a/write-or-die.c b/write-or-die.c
> index 42a2dc73cd3..f501a6e382a 100644
> --- a/write-or-die.c
> +++ b/write-or-die.c
> @@ -20,14 +20,13 @@ void maybe_flush_or_die(FILE *f, const char *desc)
>  {
>  	static int skip_stdout_flush = -1;
>  	struct stat st;
> -	char *cp;
> +	int cp;
>
>  	if (f == stdout) {
>  		if (skip_stdout_flush < 0) {
> -			/* NEEDSWORK: make this a normal Boolean */
> -			cp = getenv("GIT_FLUSH");
> -			if (cp)
> -				skip_stdout_flush = (atoi(cp) == 0);
> +			cp = git_env_bool("GIT_FLUSH", -1);
> +			if (cp >= 0)
> +				skip_stdout_flush = (cp == 0);
>  			else if ((fstat(fileno(stdout), &st) == 0) &&
>  				 S_ISREG(st.st_mode))
>  				skip_stdout_flush = 1;

I think that the "cp" variable could be renamed,
cp is not a "char pointer" any more.
However, using the logic from git_env_bool(), it can go away anyway,
wouldn't it ?


  diff --git a/write-or-die.c b/write-or-die.c
  index 42a2dc73cd..01b042bf12 100644
  --- a/write-or-die.c
  +++ b/write-or-die.c
  @@ -13,21 +13,21 @@
    * more. So we just ignore that case instead (and hope we get
    * the right error code on the flush).
    *
  + * The flushing can be skipped like this:
  + * GIT_FLUSH=0 git-rev-list HEAD
  + *
    * If the file handle is stdout, and stdout is a file, then skip the
  - * flush entirely since it's not needed.
  + * flush as well since it's not needed.
    */
   void maybe_flush_or_die(FILE *f, const char *desc)
   {
          static int skip_stdout_flush = -1;
          struct stat st;
  -       char *cp;

          if (f == stdout) {
                  if (skip_stdout_flush < 0) {
  -                       /* NEEDSWORK: make this a normal Boolean */
  -                       cp = getenv("GIT_FLUSH");
  -                       if (cp)
  -                               skip_stdout_flush = (atoi(cp) == 0);
  +                       if (git_env_bool("GIT_FLUSH", -1) == 0)
  +                               skip_stdout_flush = 1;
                          else if ((fstat(fileno(stdout), &st) == 0) &&
                                   S_ISREG(st.st_mode))
                                  skip_stdout_flush = 1;

^ permalink raw reply

* Re: Concurrent fetch commands
From: Dragan Simic @ 2023-12-31 17:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Haller, git
In-Reply-To: <xmqqy1daffk8.fsf@gitster.g>

On 2023-12-31 18:27, Junio C Hamano wrote:
> Stefan Haller <lists@haller-berlin.de> writes:
> 
>> I can reliably reproduce this by doing
>> 
>>    $ git fetch&; sleep 0.1; git pull
>>    [1] 42160
>>    [1]  + done       git fetch
>>    fatal: Cannot rebase onto multiple branches.
> 
> I see a bug here.
> 
> How this _ought_ to work is
> 
>  - The first "git fetch" wants to report what it fetched by writing
>    into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
>    the fetch finishes can consume its contents).
> 
>  - The second "git pull" runs "git fetch" under the hood.  Because
>    it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
>    is already somebody writing to the file, it should notice and
>    barf, saying "fatal: a 'git fetch' is already working" or
>    something.
> 
> But because there is no "Do not overwrite FETCH_HEAD somebody else
> is using" protection, "git merge" or "git rebase" that is run as the
> second half of the "git pull" ends up working on the contents of
> FETCH_HEAD that is undefined, and GIGO result follows.
> 
> The "bug" that the second "git fetch" does not notice an already
> running one (who is in possession of FETCH_HEAD) and refrain from
> starting is not easy to design a fix for---we cannot just abort by
> opening it with O_CREAT|O_EXCL because it is a normal thing for
> $GIT_DIR/FETCH_HEAD to exist after the "last" fetch.  We truncate
> its contents before starting to avoid getting affected by contents
> leftover by the last fetch, but when there is a "git fetch" that is
> actively running, and it finishes _after_ the second one starts and
> truncates the file, the second one will end up seeing the contents
> the first one left.  We have the "--no-write-fetch-head" option for
> users to explicitly tell which invocation of "git fetch" should not
> write FETCH_HEAD.
> 
> Running "background/priming" fetches (the one before "sleep 0.1" you
> have) is not a crime by itself, but it is a crime to run them
> without the "--no-fetch-head" option.  Since you have *NO* intention
> of using its contents to feed a "git merge" (or equivalent)
> yourself, you are breaking your "git pull" step in your example
> reproduction yourself.

Thank you very much for this highly detailed explanation.

^ permalink raw reply

* Re: Concurrent fetch commands
From: Junio C Hamano @ 2023-12-31 17:27 UTC (permalink / raw)
  To: Stefan Haller; +Cc: git
In-Reply-To: <c11ca0b3-aaf4-4a8d-80a1-3832954aa7aa@haller-berlin.de>

Stefan Haller <lists@haller-berlin.de> writes:

> I can reliably reproduce this by doing
>
>    $ git fetch&; sleep 0.1; git pull
>    [1] 42160
>    [1]  + done       git fetch
>    fatal: Cannot rebase onto multiple branches.

I see a bug here.

How this _ought_ to work is

 - The first "git fetch" wants to report what it fetched by writing
   into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
   the fetch finishes can consume its contents).

 - The second "git pull" runs "git fetch" under the hood.  Because
   it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
   is already somebody writing to the file, it should notice and
   barf, saying "fatal: a 'git fetch' is already working" or
   something.

But because there is no "Do not overwrite FETCH_HEAD somebody else
is using" protection, "git merge" or "git rebase" that is run as the
second half of the "git pull" ends up working on the contents of
FETCH_HEAD that is undefined, and GIGO result follows.

The "bug" that the second "git fetch" does not notice an already
running one (who is in possession of FETCH_HEAD) and refrain from
starting is not easy to design a fix for---we cannot just abort by
opening it with O_CREAT|O_EXCL because it is a normal thing for
$GIT_DIR/FETCH_HEAD to exist after the "last" fetch.  We truncate
its contents before starting to avoid getting affected by contents
leftover by the last fetch, but when there is a "git fetch" that is
actively running, and it finishes _after_ the second one starts and
truncates the file, the second one will end up seeing the contents
the first one left.  We have the "--no-write-fetch-head" option for
users to explicitly tell which invocation of "git fetch" should not
write FETCH_HEAD.

Running "background/priming" fetches (the one before "sleep 0.1" you
have) is not a crime by itself, but it is a crime to run them
without the "--no-fetch-head" option.  Since you have *NO* intention
of using its contents to feed a "git merge" (or equivalent)
yourself, you are breaking your "git pull" step in your example
reproduction yourself.

^ permalink raw reply

* Re: Concurrent fetch commands
From: Dragan Simic @ 2023-12-31 14:01 UTC (permalink / raw)
  To: Konstantin Tokarev; +Cc: Stefan Haller, git
In-Reply-To: <20231231165042.1d934927@RedEyes>

On 2023-12-31 14:50, Konstantin Tokarev wrote:
> В Sun, 31 Dec 2023 14:30:05 +0100
> Stefan Haller <lists@haller-berlin.de> пишет:
> 
>> Currently, git doesn't seem to be very good at handling two concurrent
>> invocations of git fetch (or git fetch and git pull). This is a
>> problem because it is common for git clients to run git fetch
>> periodically in the background.
> 
> I think it's a really weird idea which makes a disservice both to human
> git users and git servers. IMO clients doing this should be fixed to
> avoid such behavior, at least by default.

Could you, please, elaborate a bit on the resulting disservice?  
Regarding fixing the clients, IDE users often expect such automatic 
updating to be performed in the background, so I'm afraid there's not 
much wiggle room there.

^ permalink raw reply

* Re: Concurrent fetch commands
From: Konstantin Tokarev @ 2023-12-31 13:50 UTC (permalink / raw)
  To: Stefan Haller; +Cc: git
In-Reply-To: <c11ca0b3-aaf4-4a8d-80a1-3832954aa7aa@haller-berlin.de>

В Sun, 31 Dec 2023 14:30:05 +0100
Stefan Haller <lists@haller-berlin.de> пишет:

> Currently, git doesn't seem to be very good at handling two concurrent
> invocations of git fetch (or git fetch and git pull). This is a
> problem because it is common for git clients to run git fetch
> periodically in the background.

I think it's a really weird idea which makes a disservice both to human
git users and git servers. IMO clients doing this should be fixed to
avoid such behavior, at least by default.

^ permalink raw reply

* Re: Concurrent fetch commands
From: Dragan Simic @ 2023-12-31 13:48 UTC (permalink / raw)
  To: Stefan Haller; +Cc: git
In-Reply-To: <c11ca0b3-aaf4-4a8d-80a1-3832954aa7aa@haller-berlin.de>

On 2023-12-31 14:30, Stefan Haller wrote:
> Currently, git doesn't seem to be very good at handling two concurrent
> invocations of git fetch (or git fetch and git pull). This is a problem
> because it is common for git clients to run git fetch periodically in
> the background. In that case, when you happen to invoke git pull while
> such a background fetch is running, an error occurs ("Cannot rebase 
> onto
> multiple branches").
> 
> I can reliably reproduce this by doing
> 
>    $ git fetch&; sleep 0.1; git pull
>    [1] 42160
>    [1]  + done       git fetch
>    fatal: Cannot rebase onto multiple branches.
> 
> The reason for this failure seems to be that both the first fetch and
> the fetch that runs as part of the pull append their information to
> .git/FETCH_HEAD, so that the information for the current branch ends up
> twice in the file.
> 
> Do you think git fetch should be made more robust against scenarios 
> like
> this?

I believe a similar issue has been already raised recently, so perhaps 
introducing some kind of file-based locking within git itself could be 
justified.  It would make the things a bit more robust, and would also 
improve the overall user experience.

> More context: the git client that I'm contributing to (lazygit) used to
> guard against this for its own background fetch with a global mutex 
> that
> allowed only one single fetch, pull, or push at a time. This solved the
> problem nicely for lazygit's own operations (at the expense of some 
> lag,
> occasionally); and I'm not aware of any reports about failures because
> some other git client's background fetch got in the way, so maybe we
> don't have to worry about that too much.
> 
> However, we now removed that mutex to allow certain parallel fetch
> operations to run at the same time, most notably fetching (and 
> updating)
> a branch that is not checked out (by doing "git fetch origin
> branch:branch"). It is useful to be able to trigger this for multiple
> branches concurrently, and actually this works fine.
> 
> But now we have the problem described above, where a pull of the
> checked-out branch runs at the same time as a background fetch; this is
> not so unlikely, because lazygit triggers the first background fetch at
> startup, so invoking the pull command right after starting lazygit is
> very likely to fail.
> 
> We could re-introduce a mutex and just make it a little less global;
> e.g. protect only pull and parameter-less fetch. But fixing it in git
> itself seems preferable to me.
> 
> Sorry for the wall of text, but I figured giving more context could be
> useful.

^ permalink raw reply

* Concurrent fetch commands
From: Stefan Haller @ 2023-12-31 13:30 UTC (permalink / raw)
  To: git

Currently, git doesn't seem to be very good at handling two concurrent
invocations of git fetch (or git fetch and git pull). This is a problem
because it is common for git clients to run git fetch periodically in
the background. In that case, when you happen to invoke git pull while
such a background fetch is running, an error occurs ("Cannot rebase onto
multiple branches").

I can reliably reproduce this by doing

   $ git fetch&; sleep 0.1; git pull
   [1] 42160
   [1]  + done       git fetch
   fatal: Cannot rebase onto multiple branches.

The reason for this failure seems to be that both the first fetch and
the fetch that runs as part of the pull append their information to
.git/FETCH_HEAD, so that the information for the current branch ends up
twice in the file.

Do you think git fetch should be made more robust against scenarios like
this?

More context: the git client that I'm contributing to (lazygit) used to
guard against this for its own background fetch with a global mutex that
allowed only one single fetch, pull, or push at a time. This solved the
problem nicely for lazygit's own operations (at the expense of some lag,
occasionally); and I'm not aware of any reports about failures because
some other git client's background fetch got in the way, so maybe we
don't have to worry about that too much.

However, we now removed that mutex to allow certain parallel fetch
operations to run at the same time, most notably fetching (and updating)
a branch that is not checked out (by doing "git fetch origin
branch:branch"). It is useful to be able to trigger this for multiple
branches concurrently, and actually this works fine.

But now we have the problem described above, where a pull of the
checked-out branch runs at the same time as a background fetch; this is
not so unlikely, because lazygit triggers the first background fetch at
startup, so invoking the pull command right after starting lazygit is
very likely to fail.

We could re-introduce a mutex and just make it a little less global;
e.g. protect only pull and parameter-less fetch. But fixing it in git
itself seems preferable to me.

Sorry for the wall of text, but I figured giving more context could be
useful.

Thanks,
Stefan

^ permalink raw reply

* Re: rebase invoking pre-commit
From: Sean Allred @ 2023-12-31 10:52 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Sean Allred, Git List, Junio C Hamano
In-Reply-To: <CABPp-BFbvRDCbMp9Gs9PuV7WfgoVNwyOOn1rB7fe_8UvEEdehA@mail.gmail.com>

Elijah Newren <newren@gmail.com> writes:

> On Thu, Dec 21, 2023 at 12:59 PM Sean Allred <allred.sean@gmail.com> wrote:
>> Is there a current reason why pre-commit shouldn't be invoked during
>> rebase, or is this just waiting for a reviewable patch?
>>
>> This was brought up before at [1] in 2015, but that thread so old at
>> this point that it seemed prudent to double-check before investing time
>> in a developing and testing a patch.
>>
>> [1]: https://lore.kernel.org/git/1m55i3m.1fum4zo1fpnhncM%25lists@haller-berlin.de/
>
> I'm very opinionated here.  I'm just one person, so definitely take
> this with a grain of salt, but in my view...
>
> Personally, I think implementing any per-commit hook in rebase by
> default is a mistake. It enforces a must-be-in-a-worktree-and-the-
> worktree-must-be-updated-with-every-replayed-commit mindset, which I
> find highly problematic[2], even if that's "what we always used to
> do".
>
> [2] https://lore.kernel.org/git/20231124111044.3426007-1-christian.couder@gmail.com/

I'm not hip with what most pre-commit hooks do, but I'll point out that
a hook like pre-commit assuming there is a worktree is the fault of the
hook implementation, not of the infrastructure that invokes the hook. I
imagine most folks on this list are aware that a worktree is not needed
to create a commit and update a branch to point at it.

FWIW, I would also find such a mindset to be highly problematic :-) I'll
take a moment here to thank you, Christian, and everyone else in that
effort for your interest in and work on git-replay; I've been trying to
watch its activity on-list closely in the hopes that we can adopt it
into our system once it's ready.

> Because of that, I would prefer to see this at most be a command line
> flag. However, we've already got a command line flag that while not
> identical, is essentially equivalent: "--exec $MY_SCRIPT" (it's not
> the same because it's a post-commit check, but you get notification of
> any problematic commits, and an immediate stop of the rebase for you
> to fix up the problematic commit; fixing up the commit shouldn't be
> problematic since you are, after all, already rebasing).

Indeed, and an

    --exec 'git hook run pre-commit || git reset --soft HEAD~'

would probably get you farther. I can certainly see an argument for
this, but from the perspective of designing a system for other
developers to use, such a rebase would have to be triggered
automatically (perhaps on pre-push).

> I see Phillip already responded and suggested not running the
> pre-commit hook with every commit, but only upon the first commit
> after a "git rebase --continue".  That seems far more reasonable to me
> than running on every commit...though even that worries me in regards
> to what assumptions that entails about what is present in the working
> tree.

It's worth noting the context here is to prevent developers from
committing conflict markers, so this would actually be exactly
sufficient.

Invoking pre-commit at this time would also be consistent with the
behaviors of prepare-commit-msg, commit-msg, and post-commit -- at least
when I reword a commit during a rebase.

However, post-commit is executed after each picked commit during a
rebase, so pre-commit there would also be consistent.

> (For example, what about folks with large repositories, so large that
> a branch switch or full checkout is extremely costly, and which could
> benefit from resolving conflicts in a separate sparse-checkout
> worktree, potentially much more sparse than their main checkout?

As it happens, a single checkout of our source runs upwards of 2GB, so
I'm exactly in the population you're describing :-) The main reason
we're moving to Git from SVN is that an SVN checkout can take upwards of
an hour for us today -- even with some real shenanigans to make them go
faster. On the Git side, we've also looked into (though I don't recall
if we had much success with) narrowing the sparsity patterns to just the
conflicts for conflict resolution workflows -- particularly when moving
feature code between separate trunks. So I guess I'm also glad we
weren't too far off in left field on that one! (As I recall, one of the
main challenges we faced there was ensuring there was enough stuff
'still around' so that both binary and project references could resolve
and folks could use that information to help resolve conflicts.
Hopefully git-replay can be smart enough to allow some customization on
that front. We found some success with feeding the list of conflicted
files into some arbitrary logic that spat out the sparsity pattern to
use.)

> And what if people like that really fast rebase resolution (namely,
> done in a separate very sparse checkout which also has the advantage
> of not polluting your current working tree) so much that they use it
> on smaller repositories as well? Can I not even experiment with this
> idea because of the historical per-commit-at-least-as-full-as-main
> -worktree-checkout assumptions we've baked into rebase?)

I'd be interested in reading more about this baked-in assumption. Are
these mostly laid out in replay-design-notes.txt[3]?

> While at it, I should also mention that I'm not a fan of the broken
> pre-rebase hook; its design ties us to doing a single branch at a
> time.  Maybe that hook is not quite as bad, though, since we already
> broke that hook and no one seemed to care (in particular, when
> --update-refs was implemented).  But if no one seems to care about
> broken hooks, I think the proper answer is to either get rid of the
> hook or fix it.

If I were to guess, this likely stems either from an inexact definition
of the hook in documentation (ultimately resulting in incomplete tests)
or folks incorrectly assuming what each hook should do based purely on
its name.

Which leads to an interesting point: pre-commit specifically states that
it is invoked by git-commit -- not that it's invoked whenever a commit
is created. So perhaps the correct thing to do here (if a hook is in
fact needed) would be to define a new hook -- but I worry about doing
that in the current state where there doesn't *seem* to be very rigid
coordination of when client hooks are invoked in terms of plumbing
rather than porcelain.

> Anyway, as I mentioned, I'm quite opinionated here.  To the point that
> I deemed git-rebase in its current form to possibly be unfixable
> (after first putting a lot of work into improving it over the past
> years) and eventually introduced a new "git-replay" command which I
> hope to make into a competent replacement for it.  Given that I do
> have another command to do my experiments, others on the list may
> think it's fine to send rebase further down this route, but I was
> hoping to avoid further drift so that there might be some way of
> re-implementing rebase on top of my "git-replay" ideas/design.

I appreciate your perspective; you've certainly thought a lot about this
space -- and I definitely share your goal of consolidating
implementations for obvious reasons.

So I suppose that leaves me with four possible paths forward:

1. Pursue invoking pre-commit before each commit in `git rebase` (likely
   generic in the sequencer) to be consistent with post-commit.

   It sounds like this isn't a popular option, but I'm curious to folks'
   thoughts on the noted behavior of post-commit here.

2. Pursue invoking pre-commit on `git rebase --continue` (likely on any
   --continue in the sequencer). This has the benefit of using existing
   configuration on developer machines to purportedly 'do the right
   thing' when its likely humans are touching code during conflict
   resolution. It's worth noting this isn't the only reason you might
   --continue, though, since the naive interpretation of this approach
   completely ignores sequencer commands like 'break', though it could
   probably just do what commit-msg does.

3. Define and implement a new hook that is called whenever a new commit
   is about to be (or has been?) written. Such a hook could be
   specifically designed to discourage assuming there's a working copy,
   though we're kidding nobody by thinking it won't be used downstream
   with that assumption. At least we could be explicit about
   expectations, though.

   This is *probably* a lot more design work than this little paragraph
   lets on, but I've not personally watched the introduction of a new
   hook so I don't have context for what to expect.

4. Trigger a rebase --exec in our pre-push. This is certainly the least
   work in git.git (i.e., no work at all), but it comes with the
   distinct disadvantage of playing whiplash with the developer's focus.
   During conflict resolution, they're thinking about conflicts. When
   you're ready to push, its likely that you're no longer thinking about
   conflicts.

Does the behavior of post-commit here change any minds?

[3]: https://github.com/newren/git/blob/2a621020863c0b867293e020fec0954b43818789/replay-design-notes.txt#L162

--
Sean Allred

^ permalink raw reply

* Re: [PATCH V3 0/1] Replace SID with domain/username on Windows
From: Eric Sunshine @ 2023-12-31  9:18 UTC (permalink / raw)
  To: Sören Krecker; +Cc: git
In-Reply-To: <20231231091245.2853-1-soekkle@freenet.de>

On Sun, Dec 31, 2023 at 4:12 AM Sören Krecker <soekkle@freenet.de> wrote:
> I improve the commit message with example of old and new error output.
>
> Thanks to Eric Sunshine.

Thanks for rerolling.

^ permalink raw reply

* [PATCH v3 1/1] Replace SID with domain/username
From: Sören Krecker @ 2023-12-31  9:12 UTC (permalink / raw)
  To: git; +Cc: sunshine, Sören Krecker
In-Reply-To: <20231231091245.2853-1-soekkle@freenet.de>

Replace SID with domain/username in error message, if owner of repository
and user are not equal on windows systems.

Old message:
'''
fatal: detected dubious ownership in repository at 'C:/Users/test/source/repos/git'
'C:/Users/test/source/repos/git' is owned by:
	'S-1-5-21-571067702-4104414259-3379520149-500'
but the current user is:
	'S-1-5-21-571067702-4104414259-3379520149-1001'
To add an exception for this directory, call:

	git config --global --add safe.directory C:/Users/test/source/repos/git
'''

New massage:
'''
fatal: detected dubious ownership in repository at 'C:/Users/test/source/repos/git'
'C:/Users/test/source/repos/git' is owned by:
        'DESKTOP-L78JVA6/Administrator'
but the current user is:
        'DESKTOP-L78JVA6/test'
To add an exception for this directory, call:

        git config --global --add safe.directory C:/Users/test/source/repos/git
'''

Signed-off-by: Sören Krecker <soekkle@freenet.de>
---
 compat/mingw.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 42053c1f65..05aeaaa9ad 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -2684,6 +2684,26 @@ static PSID get_current_user_sid(void)
 	return result;
 }
 
+static BOOL user_sid_to_user_name(PSID sid, LPSTR *str)
+{
+	SID_NAME_USE pe_use;
+	DWORD len_user = 0, len_domain = 0;
+	BOOL translate_sid_to_user;
+
+	/* returns only FALSE, because the string pointers are NULL*/
+	LookupAccountSidA(NULL, sid, NULL, &len_user, NULL, &len_domain,
+			  &pe_use); 
+	/*Alloc needed space of the strings*/
+	ALLOC_ARRAY((*str), (size_t)len_domain + (size_t)len_user); 
+	translate_sid_to_user = LookupAccountSidA(NULL, sid, (*str) + len_domain, &len_user,
+				   *str, &len_domain, &pe_use);
+	*(*str + len_domain) = '/';
+	if (translate_sid_to_user == FALSE) {
+		FREE_AND_NULL(*str);
+	}
+	return translate_sid_to_user;
+}
+
 static int acls_supported(const char *path)
 {
 	size_t offset = offset_1st_component(path);
@@ -2767,7 +2787,7 @@ int is_path_owned_by_current_sid(const char *path, struct strbuf *report)
 		} else if (report) {
 			LPSTR str1, str2, to_free1 = NULL, to_free2 = NULL;
 
-			if (ConvertSidToStringSidA(sid, &str1))
+			if (user_sid_to_user_name(sid, &str1))
 				to_free1 = str1;
 			else
 				str1 = "(inconvertible)";
@@ -2776,7 +2796,7 @@ int is_path_owned_by_current_sid(const char *path, struct strbuf *report)
 				str2 = "(none)";
 			else if (!IsValidSid(current_user_sid))
 				str2 = "(invalid)";
-			else if (ConvertSidToStringSidA(current_user_sid, &str2))
+			else if (user_sid_to_user_name(current_user_sid, &str2))
 				to_free2 = str2;
 			else
 				str2 = "(inconvertible)";
@@ -2784,8 +2804,8 @@ int is_path_owned_by_current_sid(const char *path, struct strbuf *report)
 				    "'%s' is owned by:\n"
 				    "\t'%s'\nbut the current user is:\n"
 				    "\t'%s'\n", path, str1, str2);
-			LocalFree(to_free1);
-			LocalFree(to_free2);
+			free(to_free1);
+			free(to_free2);
 		}
 	}
 
-- 
2.39.2


^ permalink raw reply related

* [PATCH V3 0/1] Replace SID with domain/username on Windows
From: Sören Krecker @ 2023-12-31  9:12 UTC (permalink / raw)
  To: git; +Cc: sunshine, Sören Krecker
In-Reply-To: <CAPig+cT4jy4MkyGxtSOZj6U3vUxLaRa-4wr7PON-EebAjT8pwQ@mail.gmail.com>

I improve the commit message with example of old and new error output.

Thanks to Eric Sunshine.

Sören Krecker (1):
  Replace SID with domain/username

 compat/mingw.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)


base-commit: e79552d19784ee7f4bbce278fe25f93fbda196fa
-- 
2.39.2


^ permalink raw reply

* Re: [PATCH 0/1 v2] Replace SID with domain/username on Windows
From: Eric Sunshine @ 2023-12-31  4:08 UTC (permalink / raw)
  To: Sören Krecker; +Cc: git
In-Reply-To: <20231229120319.3797-1-soekkle@freenet.de>

On Fri, Dec 29, 2023 at 7:03 AM Sören Krecker <soekkle@freenet.de> wrote:
> Improve error message on windows systems, if owner of reposotory and current user are not equal.
>
> Old Message:
> '''
> fatal: detected dubious ownership in repository at 'C:/Users/test/source/repos/git'
> 'C:/Users/test/source/repos/git' is owned by:
>         'S-1-5-21-571067702-4104414259-3379520149-500'
> but the current user is:
>         'S-1-5-21-571067702-4104414259-3379520149-1001'
> To add an exception for this directory, call:
>
>         git config --global --add safe.directory C:/Users/test/source/repos/git
> '''
>
> New Massage:
> '''
> fatal: detected dubious ownership in repository at 'C:/Users/test/source/repos/git'
> 'C:/Users/soren/source/repos/git' is owned by:
>         'DESKTOP-L78JVA6/Administrator'
> but the current user is:
>         'DESKTOP-L78JVA6/test'
> To add an exception for this directory, call:
>
>         git config --global --add safe.directory C:/Users/test/source/repos/git
> '''
>
> I hope that I have succeeded in addressing all the points raised.

Thanks, this explanation does an excellent job of helping reviewers
understand why the patch is desirable.

It's also important that people digging through the project history in
the future also understand the reason for this change, so it's very
valuable for the explanation to be part of the commit message of the
patch itself, not just in the cover letter (which doesn't become part
of the permanent project history). Therefore, can you reroll once
again, placing this explanation in the commit message of the patch
itself? Doing so should help the patch get accepted into the project.

That's as much as I can add. Hopefully, one or more Windows folks will
chime in regarding the actual patch content (whether it's acceptable
as-is or needs some tweaks).

Thanks.

^ permalink raw reply

* [PATCH] write-or-die: make GIT_FLUSH a Boolean environment variable
From: Chandra Pratap via GitGitGadget @ 2023-12-30 16:54 UTC (permalink / raw)
  To: git; +Cc: Chandra Pratap, Chandra Pratap

From: Chandra Pratap <chandrapratap3519@gmail.com>

Among Git's environment variable, the ones marked as "Boolean"
accept values in a way similar to Boolean configuration variables,
i.e. values like 'yes', 'on', 'true' and positive numbers are
taken as "on" and values like 'no', 'off', 'false' are taken as
"off".
GIT_FLUSH can be used to force Git to use non-buffered I/O when
writing to stdout. It can only accept two values, '1' which causes
Git to flush more often and '0' which makes all output buffered.
Make GIT_FLUSH accept more values besides '0' and '1' by turning it
into a Boolean environment variable, modifying the required logic.
Update the related documentation.

Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
---
    write-or-die: make GIT_FLUSH a Boolean environment variable

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1628%2FChand-ra%2Fgit_flush-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1628/Chand-ra/git_flush-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1628

 Documentation/git.txt | 16 +++++++---------
 write-or-die.c        |  9 ++++-----
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 2535a30194f..83fd60f2d11 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -724,16 +724,14 @@ for further details.
 	waiting for someone with sufficient permissions to fix it.
 
 `GIT_FLUSH`::
-// NEEDSWORK: make it into a usual Boolean environment variable
-	If this environment variable is set to "1", then commands such
+	If this Boolean environment variable is set to true, then commands such
 	as 'git blame' (in incremental mode), 'git rev-list', 'git log',
-	'git check-attr' and 'git check-ignore' will
-	force a flush of the output stream after each record have been
-	flushed. If this
-	variable is set to "0", the output of these commands will be done
-	using completely buffered I/O.   If this environment variable is
-	not set, Git will choose buffered or record-oriented flushing
-	based on whether stdout appears to be redirected to a file or not.
+	'git check-attr' and 'git check-ignore' will force a flush of the output
+	stream after each record have been flushed. If this variable is set to
+	false, the output of these commands will be done using completely buffered
+	I/O. If this environment variable is not set, Git will choose buffered or
+	record-oriented flushing based on whether stdout appears to be redirected
+	to a file or not.
 
 `GIT_TRACE`::
 	Enables general trace messages, e.g. alias expansion, built-in
diff --git a/write-or-die.c b/write-or-die.c
index 42a2dc73cd3..f501a6e382a 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -20,14 +20,13 @@ void maybe_flush_or_die(FILE *f, const char *desc)
 {
 	static int skip_stdout_flush = -1;
 	struct stat st;
-	char *cp;
+	int cp;
 
 	if (f == stdout) {
 		if (skip_stdout_flush < 0) {
-			/* NEEDSWORK: make this a normal Boolean */
-			cp = getenv("GIT_FLUSH");
-			if (cp)
-				skip_stdout_flush = (atoi(cp) == 0);
+			cp = git_env_bool("GIT_FLUSH", -1);
+			if (cp >= 0)
+				skip_stdout_flush = (cp == 0);
 			else if ((fstat(fileno(stdout), &st) == 0) &&
 				 S_ISREG(st.st_mode))
 				skip_stdout_flush = 1;

base-commit: 1a87c842ece327d03d08096395969aca5e0a6996
-- 
gitgitgadget

^ permalink raw reply related

* RE: You are marked as spam, and therefore cannot authorize a third party application.
From: rsbecker @ 2023-12-30 14:45 UTC (permalink / raw)
  To: 'Ramesh', git
In-Reply-To: <7aefb4c3-da1b-429b-a519-044084265994@infohubinnovations.com>

On Saturday, December 30, 2023 7:25 AM, Ramesh wrote:
>Hi GitHub Team,
>
>I am trying to clone my repository from local machine.But i am getting the follow
>exception that  "You are marked as spam, and therefore cannot authorize a third
>party application.". Can you please suggest the same.I have deleted all my accounts
>except this account.

This is not a GitHub mailing list. Please use support links at GitHub.com.

^ permalink raw reply

* Draft of Git Rev News edition 106
From: Christian Couder @ 2023-12-30 14:38 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jakub Narebski, Markus Jansen, Kaartic Sivaraam,
	Štěpán Němec, Taylor Blau,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason,
	Josh Soref, Eric Sunshine, Elijah Newren, VonC, Scott Chacon

Hi everyone!

A draft of a new Git Rev News edition is available here:

  https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-106.md

Everyone is welcome to contribute in any section either by editing the
above page on GitHub and sending a pull request, or by commenting on
this GitHub issue:

  https://github.com/git/git.github.io/issues/675

You can also reply to this email.

In general all kinds of contributions, for example proofreading,
suggestions for articles or links, help on the issues in GitHub,
volunteering for being interviewed and so on, are very much
appreciated.

I tried to Cc everyone who appears in this edition, but maybe I missed
some people, sorry about that.

Jakub, Markus, Kaartic and I plan to publish this edition on
Monday January 1st, 2024.

Thanks,
Christian.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox