* Re: Concurrent fetch commands
From: Dragan Simic @ 2023-12-31 17:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Stefan Haller, git
In-Reply-To: <xmqqy1daffk8.fsf@gitster.g>
On 2023-12-31 18:27, Junio C Hamano wrote:
> Stefan Haller <lists@haller-berlin.de> writes:
>
>> I can reliably reproduce this by doing
>>
>> $ git fetch&; sleep 0.1; git pull
>> [1] 42160
>> [1] + done git fetch
>> fatal: Cannot rebase onto multiple branches.
>
> I see a bug here.
>
> How this _ought_ to work is
>
> - The first "git fetch" wants to report what it fetched by writing
> into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
> the fetch finishes can consume its contents).
>
> - The second "git pull" runs "git fetch" under the hood. Because
> it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
> is already somebody writing to the file, it should notice and
> barf, saying "fatal: a 'git fetch' is already working" or
> something.
>
> But because there is no "Do not overwrite FETCH_HEAD somebody else
> is using" protection, "git merge" or "git rebase" that is run as the
> second half of the "git pull" ends up working on the contents of
> FETCH_HEAD that is undefined, and GIGO result follows.
>
> The "bug" that the second "git fetch" does not notice an already
> running one (who is in possession of FETCH_HEAD) and refrain from
> starting is not easy to design a fix for---we cannot just abort by
> opening it with O_CREAT|O_EXCL because it is a normal thing for
> $GIT_DIR/FETCH_HEAD to exist after the "last" fetch. We truncate
> its contents before starting to avoid getting affected by contents
> leftover by the last fetch, but when there is a "git fetch" that is
> actively running, and it finishes _after_ the second one starts and
> truncates the file, the second one will end up seeing the contents
> the first one left. We have the "--no-write-fetch-head" option for
> users to explicitly tell which invocation of "git fetch" should not
> write FETCH_HEAD.
>
> Running "background/priming" fetches (the one before "sleep 0.1" you
> have) is not a crime by itself, but it is a crime to run them
> without the "--no-fetch-head" option. Since you have *NO* intention
> of using its contents to feed a "git merge" (or equivalent)
> yourself, you are breaking your "git pull" step in your example
> reproduction yourself.
Thank you very much for this highly detailed explanation.
^ permalink raw reply
* Re: [PATCH] write-or-die: make GIT_FLUSH a Boolean environment variable
From: Torsten Bögershausen @ 2024-01-01 8:24 UTC (permalink / raw)
To: Chandra Pratap via GitGitGadget; +Cc: git, Chandra Pratap, Chandra Pratap
In-Reply-To: <pull.1628.git.1703955246308.gitgitgadget@gmail.com>
On Sat, Dec 30, 2023 at 04:54:06PM +0000, Chandra Pratap via GitGitGadget wrote:
> From: Chandra Pratap <chandrapratap3519@gmail.com>
>
> Among Git's environment variable, the ones marked as "Boolean"
> accept values in a way similar to Boolean configuration variables,
> i.e. values like 'yes', 'on', 'true' and positive numbers are
> taken as "on" and values like 'no', 'off', 'false' are taken as
> "off".
> GIT_FLUSH can be used to force Git to use non-buffered I/O when
> writing to stdout. It can only accept two values, '1' which causes
> Git to flush more often and '0' which makes all output buffered.
> Make GIT_FLUSH accept more values besides '0' and '1' by turning it
> into a Boolean environment variable, modifying the required logic.
> Update the related documentation.
>
> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
> ---
> write-or-die: make GIT_FLUSH a Boolean environment variable
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1628%2FChand-ra%2Fgit_flush-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1628/Chand-ra/git_flush-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1628
>
> Documentation/git.txt | 16 +++++++---------
> write-or-die.c | 9 ++++-----
> 2 files changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index 2535a30194f..83fd60f2d11 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -724,16 +724,14 @@ for further details.
> waiting for someone with sufficient permissions to fix it.
>
> `GIT_FLUSH`::
> -// NEEDSWORK: make it into a usual Boolean environment variable
> - If this environment variable is set to "1", then commands such
> + If this Boolean environment variable is set to true, then commands such
> as 'git blame' (in incremental mode), 'git rev-list', 'git log',
> - 'git check-attr' and 'git check-ignore' will
> - force a flush of the output stream after each record have been
> - flushed. If this
> - variable is set to "0", the output of these commands will be done
> - using completely buffered I/O. If this environment variable is
> - not set, Git will choose buffered or record-oriented flushing
> - based on whether stdout appears to be redirected to a file or not.
> + 'git check-attr' and 'git check-ignore' will force a flush of the output
> + stream after each record have been flushed. If this variable is set to
> + false, the output of these commands will be done using completely buffered
> + I/O. If this environment variable is not set, Git will choose buffered or
> + record-oriented flushing based on whether stdout appears to be redirected
> + to a file or not.
>
> `GIT_TRACE`::
> Enables general trace messages, e.g. alias expansion, built-in
> diff --git a/write-or-die.c b/write-or-die.c
> index 42a2dc73cd3..f501a6e382a 100644
> --- a/write-or-die.c
> +++ b/write-or-die.c
> @@ -20,14 +20,13 @@ void maybe_flush_or_die(FILE *f, const char *desc)
> {
> static int skip_stdout_flush = -1;
> struct stat st;
> - char *cp;
> + int cp;
>
> if (f == stdout) {
> if (skip_stdout_flush < 0) {
> - /* NEEDSWORK: make this a normal Boolean */
> - cp = getenv("GIT_FLUSH");
> - if (cp)
> - skip_stdout_flush = (atoi(cp) == 0);
> + cp = git_env_bool("GIT_FLUSH", -1);
> + if (cp >= 0)
> + skip_stdout_flush = (cp == 0);
> else if ((fstat(fileno(stdout), &st) == 0) &&
> S_ISREG(st.st_mode))
> skip_stdout_flush = 1;
I think that the "cp" variable could be renamed,
cp is not a "char pointer" any more.
However, using the logic from git_env_bool(), it can go away anyway,
wouldn't it ?
diff --git a/write-or-die.c b/write-or-die.c
index 42a2dc73cd..01b042bf12 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -13,21 +13,21 @@
* more. So we just ignore that case instead (and hope we get
* the right error code on the flush).
*
+ * The flushing can be skipped like this:
+ * GIT_FLUSH=0 git-rev-list HEAD
+ *
* If the file handle is stdout, and stdout is a file, then skip the
- * flush entirely since it's not needed.
+ * flush as well since it's not needed.
*/
void maybe_flush_or_die(FILE *f, const char *desc)
{
static int skip_stdout_flush = -1;
struct stat st;
- char *cp;
if (f == stdout) {
if (skip_stdout_flush < 0) {
- /* NEEDSWORK: make this a normal Boolean */
- cp = getenv("GIT_FLUSH");
- if (cp)
- skip_stdout_flush = (atoi(cp) == 0);
+ if (git_env_bool("GIT_FLUSH", -1) == 0)
+ skip_stdout_flush = 1;
else if ((fstat(fileno(stdout), &st) == 0) &&
S_ISREG(st.st_mode))
skip_stdout_flush = 1;
^ permalink raw reply
* [Outreachy][PATCH v3] Port helper/test-ctype.c to unit-tests/t-ctype.c
From: Achu Luma @ 2024-01-01 10:40 UTC (permalink / raw)
To: git
Cc: chriscool, christian.couder, gitster, l.s.r, phillip.wood,
steadmon, Achu Luma
In-Reply-To: <20231230001102.9220-1-ach.lumap@gmail.com>
In the recent codebase update(8bf6fbd00d (Merge branch
'js/doc-unit-tests', 2023-12-09)), a new unit testing framework was
merged, providing a standardized approach for testing C code. Prior to
this update, some unit tests relied on the test helper mechanism,
lacking a dedicated unit testing framework. It's more natural to perform
these unit tests using the new unit test framework.
This commit migrates the unit tests for C character classification
functions (isdigit(), isspace(), etc) from the legacy approach
using the test-tool command `test-tool ctype` in t/helper/test-ctype.c
to the new unit testing framework (t/unit-tests/test-lib.h).
The migration involves refactoring the tests to utilize the testing
macros provided by the framework (TEST() and check_*()).
Signed-off-by: Achu Luma <ach.lumap@gmail.com>
---
Sorry for the poor description of the changes, maybe the following is better:
In the revised patch we added a call to test_msg() suggested by Phillip to
print the character code. This helps us pinpoint exactly the byte value that
is an issue as suggested by Junio. We keep using check_int() as it allows us
to still see the actual and expected return which might help us in case func()
returned a different non-zero value when we were expecting "1". It is useful
to have the return value printed on error in case we start returning "53"
instead of "1" for "true".
Makefile | 2 +-
t/helper/test-ctype.c | 70 --------------------------------------
t/helper/test-tool.c | 1 -
t/helper/test-tool.h | 1 -
t/t0070-fundamental.sh | 4 ---
t/unit-tests/t-ctype.c | 77 ++++++++++++++++++++++++++++++++++++++++++
6 files changed, 78 insertions(+), 77 deletions(-)
delete mode 100644 t/helper/test-ctype.c
create mode 100644 t/unit-tests/t-ctype.c
diff --git a/Makefile b/Makefile
index 88ba7a3c51..a4df48ba65 100644
--- a/Makefile
+++ b/Makefile
@@ -792,7 +792,6 @@ TEST_BUILTINS_OBJS += test-chmtime.o
TEST_BUILTINS_OBJS += test-config.o
TEST_BUILTINS_OBJS += test-crontab.o
TEST_BUILTINS_OBJS += test-csprng.o
-TEST_BUILTINS_OBJS += test-ctype.o
TEST_BUILTINS_OBJS += test-date.o
TEST_BUILTINS_OBJS += test-delta.o
TEST_BUILTINS_OBJS += test-dir-iterator.o
@@ -1341,6 +1340,7 @@ THIRD_PARTY_SOURCES += sha1dc/%
UNIT_TEST_PROGRAMS += t-basic
UNIT_TEST_PROGRAMS += t-strbuf
+UNIT_TEST_PROGRAMS += t-ctype
UNIT_TEST_PROGS = $(patsubst %,$(UNIT_TEST_BIN)/%$X,$(UNIT_TEST_PROGRAMS))
UNIT_TEST_OBJS = $(patsubst %,$(UNIT_TEST_DIR)/%.o,$(UNIT_TEST_PROGRAMS))
UNIT_TEST_OBJS += $(UNIT_TEST_DIR)/test-lib.o
diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
deleted file mode 100644
index e5659df40b..0000000000
--- a/t/helper/test-ctype.c
+++ /dev/null
@@ -1,70 +0,0 @@
-#include "test-tool.h"
-
-static int rc;
-
-static void report_error(const char *class, int ch)
-{
- printf("%s classifies char %d (0x%02x) wrongly\n", class, ch, ch);
- rc = 1;
-}
-
-static int is_in(const char *s, int ch)
-{
- /*
- * We can't find NUL using strchr. Accept it as the first
- * character in the spec -- there are no empty classes.
- */
- if (ch == '\0')
- return ch == *s;
- if (*s == '\0')
- s++;
- return !!strchr(s, ch);
-}
-
-#define TEST_CLASS(t,s) { \
- int i; \
- for (i = 0; i < 256; i++) { \
- if (is_in(s, i) != t(i)) \
- report_error(#t, i); \
- } \
- if (t(EOF)) \
- report_error(#t, EOF); \
-}
-
-#define DIGIT "0123456789"
-#define LOWER "abcdefghijklmnopqrstuvwxyz"
-#define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
-#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
-#define ASCII \
- "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
- "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
- "\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
- "\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
- "\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
- "\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
- "\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
- "\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
-#define CNTRL \
- "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
- "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
- "\x7f"
-
-int cmd__ctype(int argc UNUSED, const char **argv UNUSED)
-{
- TEST_CLASS(isdigit, DIGIT);
- TEST_CLASS(isspace, " \n\r\t");
- TEST_CLASS(isalpha, LOWER UPPER);
- TEST_CLASS(isalnum, LOWER UPPER DIGIT);
- TEST_CLASS(is_glob_special, "*?[\\");
- TEST_CLASS(is_regex_special, "$()*+.?[\\^{|");
- TEST_CLASS(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~");
- TEST_CLASS(isascii, ASCII);
- TEST_CLASS(islower, LOWER);
- TEST_CLASS(isupper, UPPER);
- TEST_CLASS(iscntrl, CNTRL);
- TEST_CLASS(ispunct, PUNCT);
- TEST_CLASS(isxdigit, DIGIT "abcdefABCDEF");
- TEST_CLASS(isprint, LOWER UPPER DIGIT PUNCT " ");
-
- return rc;
-}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 37ba996539..33b9501c21 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -19,7 +19,6 @@ static struct test_cmd cmds[] = {
{ "config", cmd__config },
{ "crontab", cmd__crontab },
{ "csprng", cmd__csprng },
- { "ctype", cmd__ctype },
{ "date", cmd__date },
{ "delta", cmd__delta },
{ "dir-iterator", cmd__dir_iterator },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 8a1a7c63da..b72f07ded9 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -12,7 +12,6 @@ int cmd__chmtime(int argc, const char **argv);
int cmd__config(int argc, const char **argv);
int cmd__crontab(int argc, const char **argv);
int cmd__csprng(int argc, const char **argv);
-int cmd__ctype(int argc, const char **argv);
int cmd__date(int argc, const char **argv);
int cmd__delta(int argc, const char **argv);
int cmd__dir_iterator(int argc, const char **argv);
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 487bc8d905..a4756fbab9 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -9,10 +9,6 @@ Verify wrappers and compatibility functions.
TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
-test_expect_success 'character classes (isspace, isalpha etc.)' '
- test-tool ctype
-'
-
test_expect_success 'mktemp to nonexistent directory prints filename' '
test_must_fail test-tool mktemp doesnotexist/testXXXXXX 2>err &&
grep "doesnotexist/test" err
diff --git a/t/unit-tests/t-ctype.c b/t/unit-tests/t-ctype.c
new file mode 100644
index 0000000000..8a215d387a
--- /dev/null
+++ b/t/unit-tests/t-ctype.c
@@ -0,0 +1,77 @@
+#include "test-lib.h"
+
+static int is_in(const char *s, int ch)
+{
+ /*
+ * We can't find NUL using strchr. Accept it as the first
+ * character in the spec -- there are no empty classes.
+ */
+ if (ch == '\0')
+ return ch == *s;
+ if (*s == '\0')
+ s++;
+ return !!strchr(s, ch);
+}
+
+/* Macro to test a character type */
+#define TEST_CTYPE_FUNC(func, string) \
+static void test_ctype_##func(void) \
+{ \
+ for (int i = 0; i < 256; i++) { \
+ if (!check_int(func(i), ==, is_in(string, i))) \
+ test_msg(" i: %02x", i); \
+ } \
+}
+
+#define DIGIT "0123456789"
+#define LOWER "abcdefghijklmnopqrstuvwxyz"
+#define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
+#define ASCII \
+ "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
+ "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
+ "\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
+ "\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
+ "\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
+ "\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
+ "\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
+ "\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+#define CNTRL \
+ "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
+ "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
+ "\x7f"
+
+TEST_CTYPE_FUNC(isdigit, DIGIT)
+TEST_CTYPE_FUNC(isspace, " \n\r\t")
+TEST_CTYPE_FUNC(isalpha, LOWER UPPER)
+TEST_CTYPE_FUNC(isalnum, LOWER UPPER DIGIT)
+TEST_CTYPE_FUNC(is_glob_special, "*?[\\")
+TEST_CTYPE_FUNC(is_regex_special, "$()*+.?[\\^{|")
+TEST_CTYPE_FUNC(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~")
+TEST_CTYPE_FUNC(isascii, ASCII)
+TEST_CTYPE_FUNC(islower, LOWER)
+TEST_CTYPE_FUNC(isupper, UPPER)
+TEST_CTYPE_FUNC(iscntrl, CNTRL)
+TEST_CTYPE_FUNC(ispunct, PUNCT)
+TEST_CTYPE_FUNC(isxdigit, DIGIT "abcdefABCDEF")
+TEST_CTYPE_FUNC(isprint, LOWER UPPER DIGIT PUNCT " ")
+
+int cmd_main(int argc, const char **argv) {
+ /* Run all character type tests */
+ TEST(test_ctype_isspace(), "isspace() works as we expect");
+ TEST(test_ctype_isdigit(), "isdigit() works as we expect");
+ TEST(test_ctype_isalpha(), "isalpha() works as we expect");
+ TEST(test_ctype_isalnum(), "isalnum() works as we expect");
+ TEST(test_ctype_is_glob_special(), "is_glob_special() works as we expect");
+ TEST(test_ctype_is_regex_special(), "is_regex_special() works as we expect");
+ TEST(test_ctype_is_pathspec_magic(), "is_pathspec_magic() works as we expect");
+ TEST(test_ctype_isascii(), "isascii() works as we expect");
+ TEST(test_ctype_islower(), "islower() works as we expect");
+ TEST(test_ctype_isupper(), "isupper() works as we expect");
+ TEST(test_ctype_iscntrl(), "iscntrl() works as we expect");
+ TEST(test_ctype_ispunct(), "ispunct() works as we expect");
+ TEST(test_ctype_isxdigit(), "isxdigit() works as we expect");
+ TEST(test_ctype_isprint(), "isprint() works as we expect");
+
+ return test_done();
+}
--
2.42.0.windows.2
^ permalink raw reply related
* Re: Concurrent fetch commands
From: Stefan Haller @ 2024-01-01 11:23 UTC (permalink / raw)
To: Konstantin Tokarev; +Cc: git
In-Reply-To: <20231231165042.1d934927@RedEyes>
On 31.12.23 14:50, Konstantin Tokarev wrote:
> В Sun, 31 Dec 2023 14:30:05 +0100
> Stefan Haller <lists@haller-berlin.de> пишет:
>
>> ... it is common for git clients to run git fetch
>> periodically in the background.
>
> I think it's a really weird idea which makes a disservice both to human
> git users and git servers. IMO clients doing this should be fixed to
> avoid such behavior, at least by default.
I disagree pretty strongly. Users expect this, and I do find it very
convenient myself.
^ permalink raw reply
* Re: Concurrent fetch commands
From: Stefan Haller @ 2024-01-01 11:30 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqy1daffk8.fsf@gitster.g>
On 31.12.23 18:27, Junio C Hamano wrote:
> How this _ought_ to work is
>
> ... it should notice and
> barf, saying "fatal: a 'git fetch' is already working" or
> something.
Interesting, I had expected this to work somehow, e.g. by sequencing the
operations or whatever is necessary to make it work. Fixing the bug like
you suggest actually makes very little difference in practice, it just
gives a slightly less confusing error message.
> but it is a crime to run them without the "--no-fetch-head" option.
Ouch, I wasn't aware we are committing crimes. We'll accept the
punishment. :-)
But it does sound like --no-write-fetch-head will solve our problem,
thanks!
-Stefan
^ permalink raw reply
* Re: Concurrent fetch commands
From: Stefan Haller @ 2024-01-01 11:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <9b5ff583-f8b9-42dd-8829-2fea74bc2057@haller-berlin.de>
On 01.01.24 12:30, Stefan Haller wrote:
> But it does sound like --no-write-fetch-head will solve our problem,
> thanks!
Actually we saw another problem recently that I suspect might also be
caused by the concurrency between fetch and pull, but I'm not sure. I'll
explain it here in case anybody has any idea.
What happened was that a user tried to pull a branch that was behind its
upstream (not diverged). They got the error message from
show_advice_pull_non_ff ("Need to specify how to reconcile divergent
branches"); the log then showed that the background fetch was ongoing
for the remote of the current branch while they initiated the pull.
Trying to pull again after the background fetch had moved on to the next
remote then worked.
I read the code in pull.c a bit, but I can't see how it can become so
confused about being diverged in this scenario. Any ideas?
-Stefan
^ permalink raw reply
* [RFC PATCH] grep: default to posix digits with -P
From: Carlo Marcelo Arenas Belón @ 2024-01-01 15:03 UTC (permalink / raw)
To: git; +Cc: Carlo Marcelo Arenas Belón
Since acabd2048e (grep: correctly identify utf-8 characters with
\{b,w} in -P, 2023-01-08), PCRE2's UCP mode has been enabled when
UTF content was expected and therefore muktibyte characters are
included as part of all character classes (when defined).
note that if the locale used is not UTF enabled (specially: C, POSIX)
or uses an extended charmap binary that is not unicode compatible, binary
match will be used instead.
It was argued that doing so, at least for \d, was not a good idea,
as that might not be what the user expected based on its historical
meaning and was also slower, and indeed a similar change that was done
to GNU grep was reverted and required further tweaks.
At that time, PCRE2 didn't have a way to disable UCP's character
expansion selectively, but flags to do so, and that will be
available in the next release (planned soon) were added, and
one of them has been in use by GNU grep since their last release
in May (only if built and linked against the prereleased PCRE2
library though).
Add flags to make sure that both \d and [:digit:] only include
ASCII digits so that `git grep` will be closer to GNU grep and
improve performance as a side effect, but add a configuration flag
to allow keeping the current behaviour (which is closer to perl
and ripgrep).
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
Documentation/config/grep.txt | 4 ++++
grep.c | 37 ++++++++++++++++++++++-------
grep.h | 5 ++++
t/perf/p7822-grep-perl-character.sh | 11 +++++++--
t/t7818-grep-digit.sh | 32 +++++++++++++++++++++++++
5 files changed, 78 insertions(+), 11 deletions(-)
create mode 100755 t/t7818-grep-digit.sh
diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt
index e521f20390..4e405c8ad1 100644
--- a/Documentation/config/grep.txt
+++ b/Documentation/config/grep.txt
@@ -26,3 +26,7 @@ grep.fullName::
grep.fallbackToNoIndex::
If set to true, fall back to git grep --no-index if git grep
is executed outside of a git repository. Defaults to false.
+
+grep.perl.digit::
+ If set to true, use the perl definitions for \d and [:digit:].
+ Defaults to false.
diff --git a/grep.c b/grep.c
index fc2d0c837a..fec36ccb30 100644
--- a/grep.c
+++ b/grep.c
@@ -88,6 +88,11 @@ int grep_config(const char *var, const char *value,
return 0;
}
+ if (!strcmp(var, "grep.perl.digit")) {
+ opt->perl_digit = git_config_bool(var, value);
+ return 0;
+ }
+
if (!strcmp(var, "color.grep"))
opt->color = git_config_colorbool(var, value);
if (!strcmp(var, "color.grep.match")) {
@@ -301,6 +306,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
int patinforet;
size_t jitsizearg;
int literal = !opt->ignore_case && (p->fixed || p->is_fixed);
+ uint32_t xoptions = 0;
/*
* Call pcre2_general_context_create() before calling any
@@ -321,17 +327,22 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
}
options |= PCRE2_CASELESS;
}
- if (!opt->ignore_locale && is_utf8_locale() && !literal)
- options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF);
+ if (!opt->ignore_locale && is_utf8_locale() && !literal) {
+ options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
-#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER
- /*
- * Work around a JIT bug related to invalid Unicode character handling
- * fixed in 10.35:
- * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
- */
- options &= ~PCRE2_UCP;
+#ifdef GIT_PCRE2_VERSION_10_35_OR_HIGHER
+ /*
+ * Work around a JIT bug related to invalid Unicode character handling
+ * fixed in 10.35:
+ * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
+ */
+ options |= PCRE2_UCP;
+#ifdef GIT_PCRE2_VERSION_10_43_OR_HIGHER
+ if (!opt->perl_digit)
+ xoptions |= (PCRE2_EXTRA_ASCII_BSD | PCRE2_EXTRA_ASCII_DIGIT);
#endif
+#endif
+ }
#ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER
/* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */
@@ -339,6 +350,14 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
options |= PCRE2_NO_START_OPTIMIZE;
#endif
+ if (xoptions) {
+ if (!p->pcre2_compile_context)
+ p->pcre2_compile_context = pcre2_compile_context_create(p->pcre2_general_context);
+
+ pcre2_set_compile_extra_options(p->pcre2_compile_context,
+ xoptions);
+ }
+
p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
p->patternlen, options, &error, &erroffset,
p->pcre2_compile_context);
diff --git a/grep.h b/grep.h
index 926c0875c4..cd5c416a0a 100644
--- a/grep.h
+++ b/grep.h
@@ -4,6 +4,9 @@
#ifdef USE_LIBPCRE2
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
+#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 43) || PCRE2_MAJOR >= 11
+#define GIT_PCRE2_VERSION_10_43_OR_HIGHER
+#endif
#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11
#define GIT_PCRE2_VERSION_10_36_OR_HIGHER
#endif
@@ -178,6 +181,8 @@ struct grep_opt {
void (*output)(struct grep_opt *opt, const void *data, size_t size);
void *output_priv;
+
+ unsigned perl_digit:1;
};
#define GREP_OPT_INIT { \
diff --git a/t/perf/p7822-grep-perl-character.sh b/t/perf/p7822-grep-perl-character.sh
index 87009c60df..cc88d5a695 100755
--- a/t/perf/p7822-grep-perl-character.sh
+++ b/t/perf/p7822-grep-perl-character.sh
@@ -8,6 +8,13 @@ etc.) we will test the patterns under those numbers of threads.
. ./perf-lib.sh
+# setting a LOCALE is needed, but not yet supported by :
+#. "$TEST_DIRECTORY"/lib-gettext.sh
+
+# Invoke like:
+#
+# LC_ALL=is_IS.utf8 ./p7822-grep-perl-character.sh
+
test_perf_large_repo
test_checkout_worktree
@@ -27,13 +34,13 @@ do
if ! test_have_prereq PERF_GREP_ENGINES_THREADS
then
test_perf "grep -P '$pattern'" --prereq PCRE "
- git -P grep -f pat || :
+ git grep -P -f pat || :
"
else
for threads in $GIT_PERF_GREP_THREADS
do
test_perf "grep -P '$pattern' with $threads threads" --prereq PTHREADS,PCRE "
- git -c grep.threads=$threads -P grep -f pat || :
+ git -c grep.threads=$threads grep -P -f pat || :
"
done
fi
diff --git a/t/t7818-grep-digit.sh b/t/t7818-grep-digit.sh
new file mode 100755
index 0000000000..44007e6be6
--- /dev/null
+++ b/t/t7818-grep-digit.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='git grep -P with digits'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./lib-gettext.sh
+
+test_expect_success 'setup' '
+ echo 2023 >ascii &&
+ printf "\357\274\222\357\274\220\357\274\222\357\274\223\n" >fullwidth &&
+ printf "\331\241\331\244\331\244\331\245\n" >multibyte &&
+ git add . &&
+ git commit -m. &&
+ LC_ALL="$is_IS_locale" &&
+ export LC_ALL
+'
+
+test_expect_success PCRE 'grep -P "\d"' '
+ echo "ascii:2023" >expected &&
+ git grep -P "\d{2}[[:digit:]]{2}" >actual &&
+ test_cmp expected actual
+'
+
+test_expect_success PCRE 'git -c grep.perl.digit' '
+ test_config grep.perl.digit true &&
+ git grep -P "\d{2}[[:digit:]]{2}" >actual &&
+ grep fullwidth actual &&
+ grep multibyte actual &&
+ test_line_count = 3 actual
+'
+
+test_done
--
2.39.3 (Apple Git-145)
^ permalink raw reply related
* Re: Concurrent fetch commands
From: Federico Kircheis @ 2024-01-01 15:47 UTC (permalink / raw)
To: git
In-Reply-To: <55620f4f-80f9-4e9a-8947-dce5d2a5113d@haller-berlin.de>
On 01/01/2024 12.23, Stefan Haller wrote:
> On 31.12.23 14:50, Konstantin Tokarev wrote:
>> В Sun, 31 Dec 2023 14:30:05 +0100
>> Stefan Haller <lists@haller-berlin.de> пишет:
>>
>>> ... it is common for git clients to run git fetch
>>> periodically in the background.
>>
>> I think it's a really weird idea which makes a disservice both to human
>> git users and git servers. IMO clients doing this should be fixed to
>> avoid such behavior, at least by default.
>
> I disagree pretty strongly. Users expect this, and I do find it very
> convenient myself.
>
Especially when working with git worktree.
^ permalink raw reply
* Re: [Outreachy][PATCH v3] Port helper/test-ctype.c to unit-tests/t-ctype.c
From: René Scharfe @ 2024-01-01 16:41 UTC (permalink / raw)
To: Achu Luma, git
Cc: chriscool, christian.couder, gitster, phillip.wood, steadmon
In-Reply-To: <20240101104017.9452-2-ach.lumap@gmail.com>
Am 01.01.24 um 11:40 schrieb Achu Luma:
> In the recent codebase update(8bf6fbd00d (Merge branch
> 'js/doc-unit-tests', 2023-12-09)), a new unit testing framework was
> merged, providing a standardized approach for testing C code. Prior to
> this update, some unit tests relied on the test helper mechanism,
> lacking a dedicated unit testing framework. It's more natural to perform
> these unit tests using the new unit test framework.
>
> This commit migrates the unit tests for C character classification
> functions (isdigit(), isspace(), etc) from the legacy approach
> using the test-tool command `test-tool ctype` in t/helper/test-ctype.c
> to the new unit testing framework (t/unit-tests/test-lib.h).
>
> The migration involves refactoring the tests to utilize the testing
> macros provided by the framework (TEST() and check_*()).
>
> Signed-off-by: Achu Luma <ach.lumap@gmail.com>
v1 and v2 had "Mentored-by: Christian Couder <chriscool@tuxfamily.org>".
It's gone now; intentionally?
> ---
> Sorry for the poor description of the changes, maybe the following is better:
> In the revised patch we added a call to test_msg() suggested by Phillip to
> print the character code. This helps us pinpoint exactly the byte value that
> is an issue as suggested by Junio. We keep using check_int() as it allows us
> to still see the actual and expected return which might help us in case func()
> returned a different non-zero value when we were expecting "1". It is useful
> to have the return value printed on error in case we start returning "53"
> instead of "1" for "true".
To illustrate, here are the changes between v1 and v2 in diff form:
--- >8 ---
diff --git a/t/unit-tests/t-ctype.c b/t/unit-tests/t-ctype.c
index 41189ba9f9..8a215d387a 100644
--- a/t/unit-tests/t-ctype.c
+++ b/t/unit-tests/t-ctype.c
@@ -14,12 +14,13 @@ static int is_in(const char *s, int ch)
}
/* Macro to test a character type */
-#define TEST_CTYPE_FUNC(func, string) \
+#define TEST_CTYPE_FUNC(func, string) \
static void test_ctype_##func(void) \
{ \
- int i; \
- for (i = 0; i < 256; i++) \
- check_int(func(i), ==, is_in(string, i)); \
+ for (int i = 0; i < 256; i++) { \
+ if (!check_int(func(i), ==, is_in(string, i))) \
+ test_msg(" i: %02x", i); \
+ } \
}
#define DIGIT "0123456789"
--- >8 ---
v3 is the same as v2.
>
> Makefile | 2 +-
> t/helper/test-ctype.c | 70 --------------------------------------
> t/helper/test-tool.c | 1 -
> t/helper/test-tool.h | 1 -
> t/t0070-fundamental.sh | 4 ---
> t/unit-tests/t-ctype.c | 77 ++++++++++++++++++++++++++++++++++++++++++
> 6 files changed, 78 insertions(+), 77 deletions(-)
> delete mode 100644 t/helper/test-ctype.c
> create mode 100644 t/unit-tests/t-ctype.c
>
> diff --git a/Makefile b/Makefile
> index 88ba7a3c51..a4df48ba65 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -792,7 +792,6 @@ TEST_BUILTINS_OBJS += test-chmtime.o
> TEST_BUILTINS_OBJS += test-config.o
> TEST_BUILTINS_OBJS += test-crontab.o
> TEST_BUILTINS_OBJS += test-csprng.o
> -TEST_BUILTINS_OBJS += test-ctype.o
> TEST_BUILTINS_OBJS += test-date.o
> TEST_BUILTINS_OBJS += test-delta.o
> TEST_BUILTINS_OBJS += test-dir-iterator.o
> @@ -1341,6 +1340,7 @@ THIRD_PARTY_SOURCES += sha1dc/%
>
> UNIT_TEST_PROGRAMS += t-basic
> UNIT_TEST_PROGRAMS += t-strbuf
> +UNIT_TEST_PROGRAMS += t-ctype
> UNIT_TEST_PROGS = $(patsubst %,$(UNIT_TEST_BIN)/%$X,$(UNIT_TEST_PROGRAMS))
> UNIT_TEST_OBJS = $(patsubst %,$(UNIT_TEST_DIR)/%.o,$(UNIT_TEST_PROGRAMS))
> UNIT_TEST_OBJS += $(UNIT_TEST_DIR)/test-lib.o
> diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
> deleted file mode 100644
> index e5659df40b..0000000000
> --- a/t/helper/test-ctype.c
> +++ /dev/null
> @@ -1,70 +0,0 @@
> -#include "test-tool.h"
> -
> -static int rc;
> -
> -static void report_error(const char *class, int ch)
> -{
> - printf("%s classifies char %d (0x%02x) wrongly\n", class, ch, ch);
> - rc = 1;
> -}
> -
> -static int is_in(const char *s, int ch)
> -{
> - /*
> - * We can't find NUL using strchr. Accept it as the first
> - * character in the spec -- there are no empty classes.
> - */
> - if (ch == '\0')
> - return ch == *s;
> - if (*s == '\0')
> - s++;
> - return !!strchr(s, ch);
> -}
> -
> -#define TEST_CLASS(t,s) { \
> - int i; \
> - for (i = 0; i < 256; i++) { \
> - if (is_in(s, i) != t(i)) \
> - report_error(#t, i); \
> - } \
> - if (t(EOF)) \
> - report_error(#t, EOF); \
> -}
> -
> -#define DIGIT "0123456789"
> -#define LOWER "abcdefghijklmnopqrstuvwxyz"
> -#define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
> -#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
> -#define ASCII \
> - "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
> - "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
> - "\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
> - "\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
> - "\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
> - "\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
> - "\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
> - "\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
> -#define CNTRL \
> - "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
> - "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
> - "\x7f"
> -
> -int cmd__ctype(int argc UNUSED, const char **argv UNUSED)
> -{
> - TEST_CLASS(isdigit, DIGIT);
> - TEST_CLASS(isspace, " \n\r\t");
> - TEST_CLASS(isalpha, LOWER UPPER);
> - TEST_CLASS(isalnum, LOWER UPPER DIGIT);
> - TEST_CLASS(is_glob_special, "*?[\\");
> - TEST_CLASS(is_regex_special, "$()*+.?[\\^{|");
> - TEST_CLASS(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~");
> - TEST_CLASS(isascii, ASCII);
> - TEST_CLASS(islower, LOWER);
> - TEST_CLASS(isupper, UPPER);
> - TEST_CLASS(iscntrl, CNTRL);
> - TEST_CLASS(ispunct, PUNCT);
> - TEST_CLASS(isxdigit, DIGIT "abcdefABCDEF");
> - TEST_CLASS(isprint, LOWER UPPER DIGIT PUNCT " ");
> -
> - return rc;
> -}
> diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
> index 37ba996539..33b9501c21 100644
> --- a/t/helper/test-tool.c
> +++ b/t/helper/test-tool.c
> @@ -19,7 +19,6 @@ static struct test_cmd cmds[] = {
> { "config", cmd__config },
> { "crontab", cmd__crontab },
> { "csprng", cmd__csprng },
> - { "ctype", cmd__ctype },
> { "date", cmd__date },
> { "delta", cmd__delta },
> { "dir-iterator", cmd__dir_iterator },
> diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
> index 8a1a7c63da..b72f07ded9 100644
> --- a/t/helper/test-tool.h
> +++ b/t/helper/test-tool.h
> @@ -12,7 +12,6 @@ int cmd__chmtime(int argc, const char **argv);
> int cmd__config(int argc, const char **argv);
> int cmd__crontab(int argc, const char **argv);
> int cmd__csprng(int argc, const char **argv);
> -int cmd__ctype(int argc, const char **argv);
> int cmd__date(int argc, const char **argv);
> int cmd__delta(int argc, const char **argv);
> int cmd__dir_iterator(int argc, const char **argv);
> diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
> index 487bc8d905..a4756fbab9 100755
> --- a/t/t0070-fundamental.sh
> +++ b/t/t0070-fundamental.sh
> @@ -9,10 +9,6 @@ Verify wrappers and compatibility functions.
> TEST_PASSES_SANITIZE_LEAK=true
> . ./test-lib.sh
>
> -test_expect_success 'character classes (isspace, isalpha etc.)' '
> - test-tool ctype
> -'
> -
> test_expect_success 'mktemp to nonexistent directory prints filename' '
> test_must_fail test-tool mktemp doesnotexist/testXXXXXX 2>err &&
> grep "doesnotexist/test" err
> diff --git a/t/unit-tests/t-ctype.c b/t/unit-tests/t-ctype.c
> new file mode 100644
> index 0000000000..8a215d387a
> --- /dev/null
> +++ b/t/unit-tests/t-ctype.c
> @@ -0,0 +1,77 @@
> +#include "test-lib.h"
> +
> +static int is_in(const char *s, int ch)
> +{
> + /*
> + * We can't find NUL using strchr. Accept it as the first
> + * character in the spec -- there are no empty classes.
> + */
> + if (ch == '\0')
> + return ch == *s;
> + if (*s == '\0')
> + s++;
> + return !!strchr(s, ch);
> +}
> +
> +/* Macro to test a character type */
> +#define TEST_CTYPE_FUNC(func, string) \
> +static void test_ctype_##func(void) \
> +{ \
> + for (int i = 0; i < 256; i++) { \
> + if (!check_int(func(i), ==, is_in(string, i))) \
> + test_msg(" i: %02x", i); \
> + } \
This loop is indented with spaces followed by tabs. The Git project
prefers indenting by tabs and in some cases tabs followed by spaces, but
not the other way array. git am warns about such whitespace errors and
can actually fix them automatically, so I imagine this wouldn't be a
deal breaker. But even if it seems picky, respecting the project's
preferences from the start reduces unnecessary friction.
The original test reported the number of a misclassified character
(basically its ASCII code) in both decimal and hexadecimal form. This
code prints it only in hexadecimal, but without the prefix "0x". A
casual reader could mistake hexadecimal numbers for decimal ones as a
result. Printing only one suffices, but I think it's better to either
use decimal notation without any prefix or hexadecimal with the "0x"
prefix to avoid confusion. There's no reason to be stingy with the
screen space here.
> +}
> +
> +#define DIGIT "0123456789"
> +#define LOWER "abcdefghijklmnopqrstuvwxyz"
> +#define UPPER "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
> +#define PUNCT "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
> +#define ASCII \
> + "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
> + "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
> + "\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f" \
> + "\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f" \
> + "\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f" \
> + "\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f" \
> + "\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f" \
> + "\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
> +#define CNTRL \
> + "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f" \
> + "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f" \
> + "\x7f"
> +
> +TEST_CTYPE_FUNC(isdigit, DIGIT)
> +TEST_CTYPE_FUNC(isspace, " \n\r\t")
> +TEST_CTYPE_FUNC(isalpha, LOWER UPPER)
> +TEST_CTYPE_FUNC(isalnum, LOWER UPPER DIGIT)
> +TEST_CTYPE_FUNC(is_glob_special, "*?[\\")
> +TEST_CTYPE_FUNC(is_regex_special, "$()*+.?[\\^{|")
> +TEST_CTYPE_FUNC(is_pathspec_magic, "!\"#%&',-/:;<=>@_`~")
> +TEST_CTYPE_FUNC(isascii, ASCII)
> +TEST_CTYPE_FUNC(islower, LOWER)
> +TEST_CTYPE_FUNC(isupper, UPPER)
> +TEST_CTYPE_FUNC(iscntrl, CNTRL)
> +TEST_CTYPE_FUNC(ispunct, PUNCT)
> +TEST_CTYPE_FUNC(isxdigit, DIGIT "abcdefABCDEF")
> +TEST_CTYPE_FUNC(isprint, LOWER UPPER DIGIT PUNCT " ")
> +
> +int cmd_main(int argc, const char **argv) {
> + /* Run all character type tests */
> + TEST(test_ctype_isspace(), "isspace() works as we expect");
> + TEST(test_ctype_isdigit(), "isdigit() works as we expect");
> + TEST(test_ctype_isalpha(), "isalpha() works as we expect");
> + TEST(test_ctype_isalnum(), "isalnum() works as we expect");
> + TEST(test_ctype_is_glob_special(), "is_glob_special() works as we expect");
> + TEST(test_ctype_is_regex_special(), "is_regex_special() works as we expect");
> + TEST(test_ctype_is_pathspec_magic(), "is_pathspec_magic() works as we expect");
> + TEST(test_ctype_isascii(), "isascii() works as we expect");
> + TEST(test_ctype_islower(), "islower() works as we expect");
> + TEST(test_ctype_isupper(), "isupper() works as we expect");
> + TEST(test_ctype_iscntrl(), "iscntrl() works as we expect");
> + TEST(test_ctype_ispunct(), "ispunct() works as we expect");
> + TEST(test_ctype_isxdigit(), "isxdigit() works as we expect");
> + TEST(test_ctype_isprint(), "isprint() works as we expect");
Each class (e.g. space or digit) is mentioned thrice here: When
declaring its function with TEST_CTYPE_FUNC, when calling said function
and again in the test description. Adding a new class requires adding
two lines of code. That's not too bad, but the original implementation
didn't require that repetition and adding a new class only required
adding a single line.
I mentioned this briefly in my review of v1 in
https://lore.kernel.org/git/f743b473-40f8-423d-bf5b-d42b92e5aa1b@web.de/
and showed a way to define character classes without repeating their
names. You don't have to follow that suggestion, but it would be nice
if you could give some feedback about it.
> +
> + return test_done();
> +}
René
^ permalink raw reply related
* Re: [RFC PATCH] grep: default to posix digits with -P
From: René Scharfe @ 2024-01-01 17:18 UTC (permalink / raw)
To: Carlo Marcelo Arenas Belón, git
In-Reply-To: <20240101150336.89098-1-carenas@gmail.com>
Am 01.01.24 um 16:03 schrieb Carlo Marcelo Arenas Belón:
> Since acabd2048e (grep: correctly identify utf-8 characters with
> \{b,w} in -P, 2023-01-08), PCRE2's UCP mode has been enabled when
> UTF content was expected and therefore muktibyte characters are
> included as part of all character classes (when defined).
>
> note that if the locale used is not UTF enabled (specially: C, POSIX)
> or uses an extended charmap binary that is not unicode compatible, binary
> match will be used instead.
>
> It was argued that doing so, at least for \d, was not a good idea,
> as that might not be what the user expected based on its historical
> meaning and was also slower, and indeed a similar change that was done
> to GNU grep was reverted and required further tweaks.
>
> At that time, PCRE2 didn't have a way to disable UCP's character
> expansion selectively, but flags to do so, and that will be
> available in the next release (planned soon) were added, and
> one of them has been in use by GNU grep since their last release
> in May (only if built and linked against the prereleased PCRE2
> library though).
>
> Add flags to make sure that both \d and [:digit:] only include
> ASCII digits so that `git grep` will be closer to GNU grep and
> improve performance as a side effect, but add a configuration flag
> to allow keeping the current behaviour (which is closer to perl
> and ripgrep).
>
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
> Documentation/config/grep.txt | 4 ++++
> grep.c | 37 ++++++++++++++++++++++-------
> grep.h | 5 ++++
> t/perf/p7822-grep-perl-character.sh | 11 +++++++--
> t/t7818-grep-digit.sh | 32 +++++++++++++++++++++++++
> 5 files changed, 78 insertions(+), 11 deletions(-)
> create mode 100755 t/t7818-grep-digit.sh
>
> diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt
> index e521f20390..4e405c8ad1 100644
> --- a/Documentation/config/grep.txt
> +++ b/Documentation/config/grep.txt
> @@ -26,3 +26,7 @@ grep.fullName::
> grep.fallbackToNoIndex::
> If set to true, fall back to git grep --no-index if git grep
> is executed outside of a git repository. Defaults to false.
> +
> +grep.perl.digit::
> + If set to true, use the perl definitions for \d and [:digit:].
> + Defaults to false.
> diff --git a/grep.c b/grep.c
> index fc2d0c837a..fec36ccb30 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -88,6 +88,11 @@ int grep_config(const char *var, const char *value,
> return 0;
> }
>
> + if (!strcmp(var, "grep.perl.digit")) {
> + opt->perl_digit = git_config_bool(var, value);
> + return 0;
> + }
> +
> if (!strcmp(var, "color.grep"))
> opt->color = git_config_colorbool(var, value);
> if (!strcmp(var, "color.grep.match")) {
> @@ -301,6 +306,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
> int patinforet;
> size_t jitsizearg;
> int literal = !opt->ignore_case && (p->fixed || p->is_fixed);
> + uint32_t xoptions = 0;
>
> /*
> * Call pcre2_general_context_create() before calling any
> @@ -321,17 +327,22 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
> }
> options |= PCRE2_CASELESS;
> }
> - if (!opt->ignore_locale && is_utf8_locale() && !literal)
> - options |= (PCRE2_UTF | PCRE2_UCP | PCRE2_MATCH_INVALID_UTF);
> + if (!opt->ignore_locale && is_utf8_locale() && !literal) {
> + options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
>
> -#ifndef GIT_PCRE2_VERSION_10_35_OR_HIGHER
> - /*
> - * Work around a JIT bug related to invalid Unicode character handling
> - * fixed in 10.35:
> - * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
> - */
> - options &= ~PCRE2_UCP;
> +#ifdef GIT_PCRE2_VERSION_10_35_OR_HIGHER
> + /*
> + * Work around a JIT bug related to invalid Unicode character handling
> + * fixed in 10.35:
> + * https://github.com/PCRE2Project/pcre2/commit/c21bd977547d
> + */
> + options |= PCRE2_UCP;
> +#ifdef GIT_PCRE2_VERSION_10_43_OR_HIGHER
> + if (!opt->perl_digit)
> + xoptions |= (PCRE2_EXTRA_ASCII_BSD | PCRE2_EXTRA_ASCII_DIGIT);
> #endif
> +#endif
Why do we need that extra level of indentation?
The old code turned PCRE2_UCP on by default and turned it off for older
versions. The new code enables PCRE2_UCP only for newer versions. The
result should be the same, no? So why change that part at all?
But the comment is now off, isn't it? The workaround was turning
PCRE2_UCP off for older versions (because those were broken), not
turning it on for newer versions (because it would be required by some
unfixed regression).
Do we need to nest the checks for GIT_PCRE2_VERSION_10_35_OR_HIGHER and
GIT_PCRE2_VERSION_10_43_OR_HIGHER? Keeping them separate would help
keep the #ifdef parts as short as possible and maintainable
independently.
> + }
>
> #ifndef GIT_PCRE2_VERSION_10_36_OR_HIGHER
> /* Work around https://bugs.exim.org/show_bug.cgi?id=2642 fixed in 10.36 */
> @@ -339,6 +350,14 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
> options |= PCRE2_NO_START_OPTIMIZE;
> #endif
>
> + if (xoptions) {
> + if (!p->pcre2_compile_context)
> + p->pcre2_compile_context = pcre2_compile_context_create(p->pcre2_general_context);
> +
> + pcre2_set_compile_extra_options(p->pcre2_compile_context,
> + xoptions);
> + }
> +
> p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,
> p->patternlen, options, &error, &erroffset,
> p->pcre2_compile_context);
> diff --git a/grep.h b/grep.h
> index 926c0875c4..cd5c416a0a 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -4,6 +4,9 @@
> #ifdef USE_LIBPCRE2
> #define PCRE2_CODE_UNIT_WIDTH 8
> #include <pcre2.h>
> +#if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 43) || PCRE2_MAJOR >= 11
> +#define GIT_PCRE2_VERSION_10_43_OR_HIGHER
> +#endif
> #if (PCRE2_MAJOR >= 10 && PCRE2_MINOR >= 36) || PCRE2_MAJOR >= 11
> #define GIT_PCRE2_VERSION_10_36_OR_HIGHER
> #endif
> @@ -178,6 +181,8 @@ struct grep_opt {
>
> void (*output)(struct grep_opt *opt, const void *data, size_t size);
> void *output_priv;
> +
> + unsigned perl_digit:1;
> };
>
> #define GREP_OPT_INIT { \
> diff --git a/t/perf/p7822-grep-perl-character.sh b/t/perf/p7822-grep-perl-character.sh
> index 87009c60df..cc88d5a695 100755
> --- a/t/perf/p7822-grep-perl-character.sh
> +++ b/t/perf/p7822-grep-perl-character.sh
> @@ -8,6 +8,13 @@ etc.) we will test the patterns under those numbers of threads.
>
> . ./perf-lib.sh
>
> +# setting a LOCALE is needed, but not yet supported by :
> +#. "$TEST_DIRECTORY"/lib-gettext.sh
> +
> +# Invoke like:
> +#
> +# LC_ALL=is_IS.utf8 ./p7822-grep-perl-character.sh
> +
> test_perf_large_repo
> test_checkout_worktree
>
> @@ -27,13 +34,13 @@ do
> if ! test_have_prereq PERF_GREP_ENGINES_THREADS
> then
> test_perf "grep -P '$pattern'" --prereq PCRE "
> - git -P grep -f pat || :
> + git grep -P -f pat || :
> "
> else
> for threads in $GIT_PERF_GREP_THREADS
> do
> test_perf "grep -P '$pattern' with $threads threads" --prereq PTHREADS,PCRE "
> - git -c grep.threads=$threads -P grep -f pat || :
> + git -c grep.threads=$threads grep -P -f pat || :
What's going on here? You remove the -P option of git (--no-pager) and
add the -P option of git grep (--perl-regexp). So this perf test never
tested PCRE despite its name? Seems worth a separate patch.
Do the performance numbers in acabd2048e (grep: correctly identify utf-8
characters with \{b,w} in -P, 2023-01-08) still hold up in that light?
> "
> done
> fi
René
^ permalink raw reply
* Re: [RFC PATCH] grep: default to posix digits with -P
From: Eric Sunshine @ 2024-01-01 18:07 UTC (permalink / raw)
To: Carlo Marcelo Arenas Belón; +Cc: git
In-Reply-To: <20240101150336.89098-1-carenas@gmail.com>
On Mon, Jan 1, 2024 at 10:04 AM Carlo Marcelo Arenas Belón
<carenas@gmail.com> wrote:
> Since acabd2048e (grep: correctly identify utf-8 characters with
> \{b,w} in -P, 2023-01-08), PCRE2's UCP mode has been enabled when
> UTF content was expected and therefore muktibyte characters are
> included as part of all character classes (when defined).
s/muktibyte/multibyte/
^ permalink raw reply
* [ANNOUNCE] Git Rev News edition 106
From: Christian Couder @ 2024-01-01 20:53 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Jakub Narebski, Markus Jansen,
Štěpán Němec, Kaartic Sivaraam, Taylor Blau,
Johannes Schindelin, Ævar Arnfjörð Bjarmason, lwn,
Josh Soref, Eric Sunshine, Elijah Newren, VonC, Scott Chacon
Hi everyone,
The 106th edition of Git Rev News is now published:
https://git.github.io/rev_news/2023/12/31/edition-106/
Thanks a lot to VonC, Josh Soref and Štěpán Němec who helped this month!
Enjoy,
Christian, Jakub, Markus and Kaartic.
PS: An issue for the next edition is already opened and contributions
are welcome:
https://github.com/git/git.github.io/issues/679
^ permalink raw reply
* Re: [PATCH 2/6] setup: move creation of "refs/" into the files backend
From: Karthik Nayak @ 2024-01-02 13:23 UTC (permalink / raw)
To: Patrick Steinhardt, git
In-Reply-To: <ae013eaa4aba0d68172ff03dbe9f2c2bca596285.1703754513.git.ps@pks.im>
Patrick Steinhardt <ps@pks.im> writes:
> Move the code to create the directory into the files backend itself to
> make it so. This means that future ref backends will also need to have
> equivalent logic around to ensure that the directory exists, but it
> seems a lot more sensible to have it this way round than to require
> callers to create the directory themselves.
>
Why not move it to refs.c:refs_init_db() instead? this way each
implementation doesn't have to do it?
@@ -2020,14 +2024,30 @@ const char *refs_resolve_ref_unsafe(struct
ref_store *refs,
/* backend functions */
int refs_init_db(struct ref_store *refs, int flags, struct strbuf *err)
{
+ /*
+ * We need to create a "refs" dir in any case so that older versions of
+ * Git can tell that this is a repository. This serves two main
+ * purposes:
+ *
+ * - Clients will know to stop walking the parent-directory chain when
+ * detecting the Git repository. Otherwise they may end up detecting
+ * a Git repository in a parent directory instead.
+ *
+ * - Instead of failing to detect a repository with unknown reference
+ * format altogether, old clients will print an error saying that
+ * they do not understand the reference format extension.
+ */
+ safe_create_dir(git_path("refs"), 1);
+ adjust_shared_perm(git_path("refs"));
+
return refs->be->init_db(refs, flags, err);
}
^ permalink raw reply
* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Karthik Nayak @ 2024-01-02 15:18 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, ps, christian.couder
In-Reply-To: <xmqqsf3oj3u8.fsf@gitster.g>
[-- Attachment #1: Type: text/plain, Size: 2634 bytes --]
Junio C Hamano <gitster@pobox.com> writes:
>
> It would not begin with 40-hex, though. If I were doing this,
> perhaps I'd say we should first split is_pseudoref_syntax() that is
> overly loose into to classes (e.g. "caps with underscores that ends
> with HEAD" and everything else), silently reject false positives
> among the latter class. Then we rename those that are misnamed
> (there should be only few, like AUTO_MERGE that should be a
> pseudoref but named without _HEAD; I do not think of anything that
> ends with _HEAD that is not a ref) over time and drop the latter
> class.
>
I agree about checking the contents of the files to filter out false
positives around the filenames. Alright, this seems like a good way to
go.
Summarizing, we'll change `is_pseudoref_syntax()` to
1. Check for filename to be UPPERCASE including '-', '_'.
a. If the filename ends with _HEAD, we return as positive
b. Else check the file content for SHA1/SHA256 hex
This still leaves out HEAD ref, which is not a pseudo ref (since it can
be a symref at times). I'll figure out something there.
>
>> While you're here, I wonder if you have any thoughts on the last block
>> of my first mail.
>>
>>> Over this, I'm also curious on what the mailing list thinks about
>>> exposing this to the client side. Would an `--all` option for
>>> git-for-each-ref(1) be sufficient?
>
> "list pseudorefs in addition to things below refs/"? Sounds OK to
> me as a feature.
>
> However, "--all" does not mean that in the context of "git log"
> family of commands. Over there, it means "not just --branches,
> --tags, and --remotes, but everything" which is still limited below
> "refs/".
>
Good point, I agree, usage of "--all" would clash here.
> As "git for-each-ref" takes pattern that is prefix match, e.g.,
>
> $ git for-each-ref refs/remotes/
>
> shows everything like refs/remotes/origin/main that begins with
> refs/remotes/, I wonder if
>
> $ git for-each-ref ""
>
> should mean what you are asking for. After all, "git for-each-ref"
> does *not* take "--branches" and others like "git log" family to
> limit its operation to subhierarchy of "refs/" to begin with.
But I don't think using an empty pattern is the best way to go forward.
This would break the pattern matching feature. For instance, what if the
user wanted to print all refs, but pattern match "*_HEAD"?
Would that be
$ git for-each-ref "" "*_HEAD"
I think this would be confusing, since the first pattern is now acting
as an option, since its not really filtering rather its changing the
search space.
Maybe "--all-refs" or "--all-ref-types" instead?
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply
* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Karthik Nayak @ 2024-01-02 15:23 UTC (permalink / raw)
To: Patrick Steinhardt, Junio C Hamano; +Cc: git, christian.couder
In-Reply-To: <ZY1PLtPue4PgbhwU@tanuki>
[-- Attachment #1: Type: text/plain, Size: 1992 bytes --]
Patrick Steinhardt <ps@pks.im> writes:
> One interesting question is how we should treat files that look like a
> pseudoref, but which really aren't. I'm not aware of any such files
> written by Git itself, but it could certainly be that a user wrote such
> a file into the repository manually. But given that we're adding new
> behaviour that will be opt-in (e.g. via a new switch) I'd rather err on
> the side of caution and mark any such file as broken instead of silently
> ignoring them.
>
This is definitely tricky, especially since something like
`COMMIT_EDITMSG` passes the `is_pseudoref_syntax()` check and could
simply contain a commit-ish object ID. Here identifying that this is not
a pseudoref is hard when it satisfies both:
1. The general pseudoref syntax
2. Contains the expected file content type
>
> Yup, for the reftable we don't have the issue of "How do we detect refs
> dynamically" at all. So I would love for there to be a way to print all
> refs in the refdb, regardless of whether they start with `refs/` or look
> like a pseudoref or whatever else. Otherwise it wouldn't be possible for
> a user to delete anything stored in the refdb that may have a funny
> name, be it intentionally, by accident or due to a bug.
>
> In the reftable backend, the ref iterator's `_advance()` function has a
> hardcoded `starts_with(refname, "refs/")` check. If false, then we'd
> skip the ref in order to retain the same behaviour that the files
> backend has. So maybe what we should be doing is to introduce a new flag
> `DO_FOR_EACH_ALL_REFS` and expose it via git-for-each-ref(1) or
> git-show-ref(1). So:
>
> - For the reftable backend we'd skip the `starts_with()` check and
> simply return all refs.
>
> - For the files backend we'd also iterate through all files in
> $GIT_DIR and check whether they are pseudoref-like.
>
This makes sense to me and is along what I was considering for the
dynamic approach, thanks for writing it down, clarifies my thoughts.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]
^ permalink raw reply
* Re: [PATCH v2 1/1] Replace SID with domain/username
From: Junio C Hamano @ 2024-01-02 16:20 UTC (permalink / raw)
To: Sören Krecker; +Cc: git, sunshine
In-Reply-To: <20231229120319.3797-2-soekkle@freenet.de>
Sören Krecker <soekkle@freenet.de> writes:
> Replace SID with domain/username in erromessage, if owner of repository
> and user are not equal on windows systems.
"erromessage" -> "error messages" or something?
This may not be a question raised by anybody who know Windows, but
because I do not do Windows, it makes me wonder if this is losing
information. Can two SID for the same user be active at the same
time, which would cause user_sid_to_user_name() potentially yield
the same string for two different SID?
In any case, I am reasonably sure that Dscho will say yes or no to
this patch (the above "makes me wonder" does not need to be
resolved) and I can wait until then.
Thanks.
> Signed-off-by: Sören Krecker <soekkle@freenet.de>
> ---
> compat/mingw.c | 28 ++++++++++++++++++++++++----
> 1 file changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/compat/mingw.c b/compat/mingw.c
> index 42053c1f65..05aeaaa9ad 100644
> --- a/compat/mingw.c
> +++ b/compat/mingw.c
> @@ -2684,6 +2684,26 @@ static PSID get_current_user_sid(void)
> return result;
> }
>
> +static BOOL user_sid_to_user_name(PSID sid, LPSTR *str)
> +{
> + SID_NAME_USE pe_use;
> + DWORD len_user = 0, len_domain = 0;
> + BOOL translate_sid_to_user;
> +
> + /* returns only FALSE, because the string pointers are NULL*/
> + LookupAccountSidA(NULL, sid, NULL, &len_user, NULL, &len_domain,
> + &pe_use);
> + /*Alloc needed space of the strings*/
> + ALLOC_ARRAY((*str), (size_t)len_domain + (size_t)len_user);
> + translate_sid_to_user = LookupAccountSidA(NULL, sid, (*str) + len_domain, &len_user,
> + *str, &len_domain, &pe_use);
> + *(*str + len_domain) = '/';
> + if (translate_sid_to_user == FALSE) {
> + FREE_AND_NULL(*str);
> + }
> + return translate_sid_to_user;
> +}
> +
> static int acls_supported(const char *path)
> {
> size_t offset = offset_1st_component(path);
> @@ -2767,7 +2787,7 @@ int is_path_owned_by_current_sid(const char *path, struct strbuf *report)
> } else if (report) {
> LPSTR str1, str2, to_free1 = NULL, to_free2 = NULL;
>
> - if (ConvertSidToStringSidA(sid, &str1))
> + if (user_sid_to_user_name(sid, &str1))
> to_free1 = str1;
> else
> str1 = "(inconvertible)";
> @@ -2776,7 +2796,7 @@ int is_path_owned_by_current_sid(const char *path, struct strbuf *report)
> str2 = "(none)";
> else if (!IsValidSid(current_user_sid))
> str2 = "(invalid)";
> - else if (ConvertSidToStringSidA(current_user_sid, &str2))
> + else if (user_sid_to_user_name(current_user_sid, &str2))
> to_free2 = str2;
> else
> str2 = "(inconvertible)";
> @@ -2784,8 +2804,8 @@ int is_path_owned_by_current_sid(const char *path, struct strbuf *report)
> "'%s' is owned by:\n"
> "\t'%s'\nbut the current user is:\n"
> "\t'%s'\n", path, str1, str2);
> - LocalFree(to_free1);
> - LocalFree(to_free2);
> + free(to_free1);
> + free(to_free2);
> }
> }
^ permalink raw reply
* Re: [PATCH] write-or-die: make GIT_FLUSH a Boolean environment variable
From: Junio C Hamano @ 2024-01-02 16:29 UTC (permalink / raw)
To: Torsten Bögershausen
Cc: Chandra Pratap via GitGitGadget, git, Chandra Pratap,
Chandra Pratap
In-Reply-To: <20240101082446.GA21905@tb-raspi4>
Torsten Bögershausen <tboegi@web.de> writes:
>> - char *cp;
>> + int cp;
>>
>> if (f == stdout) {
>> if (skip_stdout_flush < 0) {
>> - /* NEEDSWORK: make this a normal Boolean */
>> - cp = getenv("GIT_FLUSH");
>> - if (cp)
>> - skip_stdout_flush = (atoi(cp) == 0);
>> + cp = git_env_bool("GIT_FLUSH", -1);
>> + if (cp >= 0)
>> + skip_stdout_flush = (cp == 0);
>> else if ((fstat(fileno(stdout), &st) == 0) &&
>> S_ISREG(st.st_mode))
>> skip_stdout_flush = 1;
>
> I think that the "cp" variable could be renamed,
> cp is not a "char pointer" any more.
Absolutely. Very good point.
> However, using the logic from git_env_bool(), it can go away anyway,
> wouldn't it ?
True.
If we are doing clean-ups in this area, we may want to also stop
doing "== 0" comparisons on lines the patch touches while at it.
> diff --git a/write-or-die.c b/write-or-die.c
> index 42a2dc73cd..01b042bf12 100644
> --- a/write-or-die.c
> +++ b/write-or-die.c
> @@ -13,21 +13,21 @@
> * more. So we just ignore that case instead (and hope we get
> * the right error code on the flush).
> *
> + * The flushing can be skipped like this:
> + * GIT_FLUSH=0 git-rev-list HEAD
> + *
> * If the file handle is stdout, and stdout is a file, then skip the
> - * flush entirely since it's not needed.
> + * flush as well since it's not needed.
> */
> void maybe_flush_or_die(FILE *f, const char *desc)
> {
> static int skip_stdout_flush = -1;
> struct stat st;
> - char *cp;
>
> if (f == stdout) {
> if (skip_stdout_flush < 0) {
> - /* NEEDSWORK: make this a normal Boolean */
> - cp = getenv("GIT_FLUSH");
> - if (cp)
> - skip_stdout_flush = (atoi(cp) == 0);
> + if (git_env_bool("GIT_FLUSH", -1) == 0)
> + skip_stdout_flush = 1;
> else if ((fstat(fileno(stdout), &st) == 0) &&
> S_ISREG(st.st_mode))
> skip_stdout_flush = 1;
^ permalink raw reply
* Re: [Outreachy][PATCH v3] Port helper/test-ctype.c to unit-tests/t-ctype.c
From: Junio C Hamano @ 2024-01-02 16:35 UTC (permalink / raw)
To: René Scharfe
Cc: Achu Luma, git, chriscool, christian.couder, phillip.wood,
steadmon
In-Reply-To: <435a03c7-dbf3-4ddb-b183-cac86ed0467d@web.de>
René Scharfe <l.s.r@web.de> writes:
>> ...
>> + for (int i = 0; i < 256; i++) { \
>> + if (!check_int(func(i), ==, is_in(string, i))) \
>> + test_msg(" i: %02x", i); \
>> + } \
>
> This loop is indented with spaces followed by tabs. The Git project
> prefers indenting by tabs and in some cases tabs followed by spaces, but
> not the other way array. git am warns about such whitespace errors and
> can actually fix them automatically, so I imagine this wouldn't be a
> deal breaker. But even if it seems picky, respecting the project's
> preferences from the start reduces unnecessary friction.
>
> The original test reported the number of a misclassified character
> (basically its ASCII code) in both decimal and hexadecimal form. This
> code prints it only in hexadecimal, but without the prefix "0x". A
> casual reader could mistake hexadecimal numbers for decimal ones as a
> result. Printing only one suffices, but I think it's better to either
> use decimal notation without any prefix or hexadecimal with the "0x"
> prefix to avoid confusion. There's no reason to be stingy with the
> screen space here.
> ...
> Each class (e.g. space or digit) is mentioned thrice here: When
> declaring its function with TEST_CTYPE_FUNC, when calling said function
> and again in the test description. Adding a new class requires adding
> two lines of code. That's not too bad, but the original implementation
> didn't require that repetition and adding a new class only required
> adding a single line.
Thanks for an excellent review.
>
> I mentioned this briefly in my review of v1 in
> https://lore.kernel.org/git/f743b473-40f8-423d-bf5b-d42b92e5aa1b@web.de/
> and showed a way to define character classes without repeating their
> names. You don't have to follow that suggestion, but it would be nice
> if you could give some feedback about it.
>
>> +
>> + return test_done();
>> +}
>
> René
^ permalink raw reply
* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Junio C Hamano @ 2024-01-02 16:49 UTC (permalink / raw)
To: Karthik Nayak; +Cc: git, ps, christian.couder
In-Reply-To: <CAOLa=ZTPxWXnZ8kpBB7=cybNfdEv6d6O37Em7Vpmcw=enpY1_w@mail.gmail.com>
Karthik Nayak <karthik.188@gmail.com> writes:
>> As "git for-each-ref" takes pattern that is prefix match, e.g.,
>>
>> $ git for-each-ref refs/remotes/
>>
>> shows everything like refs/remotes/origin/main that begins with
>> refs/remotes/, I wonder if
>>
>> $ git for-each-ref ""
>>
>> should mean what you are asking for. After all, "git for-each-ref"
>> does *not* take "--branches" and others like "git log" family to
>> limit its operation to subhierarchy of "refs/" to begin with.
>
> But I don't think using an empty pattern is the best way to go forward.
> This would break the pattern matching feature. For instance, what if the
> user wanted to print all refs, but pattern match "*_HEAD"?
>
> Would that be
>
> $ git for-each-ref "" "*_HEAD"
Because you do not omit the leading hierarchy when using globs:
$ git for-each-ref v2.4\?.\* ;# nothing
$ git for-each-ref tags/v2.4\?.\* ;# nothing
$ git for-each-ref refs/tags/v2.4\?.\* ;# gives tags in v2.40 and onwards
I would expect that it would be more like
$ git for-each-ref "*_HEAD"
And because you can have two or more patterns, e.g.,
$ git for-each-ref refs/tags/v2.4\?.\* refs/heads/m\*
to get those recent tags and branches whose name begins with 'm', I
would expect your
$ git for-each-ref "" "*_HEAD"
would probably be equivalent to
$ git for-each-ref ""
^ permalink raw reply
* Re: [PATCH v2 1/1] Replace SID with domain/username
From: Junio C Hamano @ 2024-01-02 17:33 UTC (permalink / raw)
To: Sören Krecker; +Cc: git, sunshine
In-Reply-To: <xmqqplyjg10l.fsf@gitster.g>
Junio C Hamano <gitster@pobox.com> writes:
> Sören Krecker <soekkle@freenet.de> writes:
>
>> Replace SID with domain/username in erromessage, if owner of repository
>> and user are not equal on windows systems.
>
> "erromessage" -> "error messages" or something?
>
> This may not be a question raised by anybody who know Windows, but
> because I do not do Windows, it makes me wonder if this is losing
> information. Can two SID for the same user be active at the same
> time, which would cause user_sid_to_user_name() potentially yield
> the same string for two different SID?
>
> In any case, I am reasonably sure that Dscho will say yes or no to
> this patch (the above "makes me wonder" does not need to be
> resolved) and I can wait until then.
>
> Thanks.
Another thing I forgot to mention (but did wonder). The new helper
function does allow LookupAccountSidA() to fail. Should it fall
back to ConvertSidToStringSidA() that the original has been using?
In any case, I do not think a failure to convert will result in an
attempt to format ("%s", NULL) thanks to the existing code that uses
the stringified SID, which is good.
^ permalink raw reply
* Re: [PATCH v3 1/1] Replace SID with domain/username
From: Junio C Hamano @ 2024-01-02 17:43 UTC (permalink / raw)
To: Sören Krecker; +Cc: git, sunshine
In-Reply-To: <20231231091245.2853-2-soekkle@freenet.de>
Sören Krecker <soekkle@freenet.de> writes:
> Replace SID with domain/username in error message, if owner of repository
> and user are not equal on windows systems.
>
> Old message:
> '''
> fatal: detected dubious ownership in repository at 'C:/Users/test/source/repos/git'
> 'C:/Users/test/source/repos/git' is owned by:
> 'S-1-5-21-571067702-4104414259-3379520149-500'
> but the current user is:
> 'S-1-5-21-571067702-4104414259-3379520149-1001'
> To add an exception for this directory, call:
>
> git config --global --add safe.directory C:/Users/test/source/repos/git
> '''
>
> New massage:
"massage"???
> '''
> fatal: detected dubious ownership in repository at 'C:/Users/test/source/repos/git'
> 'C:/Users/test/source/repos/git' is owned by:
> 'DESKTOP-L78JVA6/Administrator'
> but the current user is:
> 'DESKTOP-L78JVA6/test'
> To add an exception for this directory, call:
>
> git config --global --add safe.directory C:/Users/test/source/repos/git
> '''
The same "does this lose information?" comment applies to this one
as well, as the fundamental approach is unchanged.
> +static BOOL user_sid_to_user_name(PSID sid, LPSTR *str)
> +{
> + SID_NAME_USE pe_use;
> + DWORD len_user = 0, len_domain = 0;
> + BOOL translate_sid_to_user;
> +
> + /* returns only FALSE, because the string pointers are NULL*/
> + LookupAccountSidA(NULL, sid, NULL, &len_user, NULL, &len_domain,
> + &pe_use);
> + /*Alloc needed space of the strings*/
> + ALLOC_ARRAY((*str), (size_t)len_domain + (size_t)len_user);
> + translate_sid_to_user = LookupAccountSidA(NULL, sid, (*str) + len_domain, &len_user,
> + *str, &len_domain, &pe_use);
> + *(*str + len_domain) = '/';
> + if (translate_sid_to_user == FALSE) {
> + FREE_AND_NULL(*str);
> + }
Is this "FREE_AND_NULL" about clearing after you see an error? If
so, shouldn't "overwrite the byte after the domain part with a slash"
be done only when you have no error, i.e., perhaps
translate_sid_to_user = LookupAccountSidA(...);
if (!translate_sid_to_user)
FREE_AND_NULL(*str);
else
(*str)[len_domain] = '/';
or something along the line?
> + return translate_sid_to_user;
> +}
^ permalink raw reply
* Re: subtree split after deleting and re-running git-subtree add, fails with "fatal: cache for XXX already exists!"
From: Christian Couder @ 2024-01-02 18:23 UTC (permalink / raw)
To: Eli Schwartz; +Cc: git
In-Reply-To: <6de00946-9c5a-4854-9e49-069a22f8a782@gmail.com>
On Tue, Dec 26, 2023 at 1:58 AM Eli Schwartz <eschwartz93@gmail.com> wrote:
>
> Originally reported in https://github.com/eli-schwartz/aurpublish/issues/30
>
>
> Given a subtree that gets messed up, some users might naturally
> gravitate towards deleting the subtree, and recreating it again via
> `git subtree add`. This can result in a difficult to solve situation.
> Any attempt to split it seems to produce failure.
>
> Reproducer:
>
> git init testme && cd testme
> mkdir foo
> touch foo/bar
> git add foo/bar
> git commit -m ...
> split_commit=$(git subtree split -P foo --rejoin)
> # Added dir 'foo'
> echo "${split_commit}"
> # 42517e4b9fe310a64be2a777ef08c91bd582b385
>
> git rm -r foo
> git commit -m deleted
> git subtree add --prefix foo "${split_commit}"
> # Added dir 'foo'
> git subtree split -P foo --rejoin
> # fatal: cache for 42517e4b9fe310a64be2a777ef08c91bd582b385 already exists!
>
>
>
> The interesting thing here is that in git.git commit
> d2f0f819547de35ffc923fc963f806f1656eb2ca:
> "subtree: more consistent error propagation"
> the git-subtree program got a bit of a facelift w.r.t. proper error
> checking.
>
> In particular, in find_existing_splits, `cache_set $sub $sub` will fail
> here. But before that commit, the die did not propagate. It turns out
> that actually ignoring this was "fine" and resulted in successfully
> splitting (while also printing a "warning": back then, the word "fatal"
> did not appear anywhere in the message; now it does).
Thanks for reporting this issue! Unfortunately it looks like git
subtree is not very well maintained these days.
There is another thread where Zach FettersMoore proposed other fixes
in what look like a similar area:
https://lore.kernel.org/git/pull.1587.v6.git.1701442494319.gitgitgadget@gmail.com/
Maybe you could team up with Zach to review each other's fixes?
^ permalink raw reply
* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Taylor Blau @ 2024-01-02 18:47 UTC (permalink / raw)
To: Karthik Nayak; +Cc: Junio C Hamano, git, ps, christian.couder
In-Reply-To: <CAOLa=ZTPxWXnZ8kpBB7=cybNfdEv6d6O37Em7Vpmcw=enpY1_w@mail.gmail.com>
On Tue, Jan 02, 2024 at 07:18:48AM -0800, Karthik Nayak wrote:
> > As "git for-each-ref" takes pattern that is prefix match, e.g.,
> >
> > $ git for-each-ref refs/remotes/
> >
> > shows everything like refs/remotes/origin/main that begins with
> > refs/remotes/, I wonder if
> >
> > $ git for-each-ref ""
> >
> > should mean what you are asking for. After all, "git for-each-ref"
> > does *not* take "--branches" and others like "git log" family to
> > limit its operation to subhierarchy of "refs/" to begin with.
>
> But I don't think using an empty pattern is the best way to go forward.
> This would break the pattern matching feature. For instance, what if the
> user wanted to print all refs, but pattern match "*_HEAD"?
>
> Would that be
>
> $ git for-each-ref "" "*_HEAD"
>
> I think this would be confusing, since the first pattern is now acting
> as an option, since its not really filtering rather its changing the
> search space.
>
> Maybe "--all-refs" or "--all-ref-types" instead?
I tend to agree that the special empty pattern would be a good shorthand
for listing all references underneath refs/, including any top-level
psuedo-refs.
But I don't think that I quite follow what Karthik is saying here.
for-each-ref returns the union of references that match the given
pattern(s), not their intersection. So if you wanted to list just the
psudo-refs ending in '_HEAD', you'd do:
$ git for-each-ref "*_HEAD"
I think if you wanted to list all pseudo-refs, calling the option
`--pseudo-refs` seems reasonable. But if you want to list some subset of
psueod-refs matching a given pattern, you should specify that pattern
directly.
Thanks,
Taylor
^ permalink raw reply
* Re: [PATCH 2/2] ref-filter: support filtering of operational refs
From: Taylor Blau @ 2024-01-02 18:49 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Junio C Hamano, Karthik Nayak, git, christian.couder
In-Reply-To: <ZY1PLtPue4PgbhwU@tanuki>
On Thu, Dec 28, 2023 at 11:34:22AM +0100, Patrick Steinhardt wrote:
> One interesting question is how we should treat files that look like a
> pseudoref, but which really aren't. I'm not aware of any such files
> written by Git itself, but it could certainly be that a user wrote such
> a file into the repository manually. But given that we're adding new
> behaviour that will be opt-in (e.g. via a new switch) I'd rather err on
> the side of caution and mark any such file as broken instead of silently
> ignoring them.
I probably wouldn't spend a ton of time worrying about this personally.
Without additional information, I think it's impossible for us to
determine a-priori whether or not a file underneath $GIT_DIR should be
interpreted as a pseudo-ref or not.
I agree with your reasoning that since this is opt-in via a new
command-line flag, that we're probably OK here enumerating the files in
$GIT_DIR, and printing out the ones that look like pseudo-refs.
Thanks,
Taylor
^ permalink raw reply
* Re: [PATCH] Port helper/test-ctype.c to unit-tests/t-ctype.c
From: Taylor Blau @ 2024-01-02 18:55 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Achu Luma, christian.couder, git, Christian Couder
In-Reply-To: <xmqqsf3ohkew.fsf@gitster.g>
On Tue, Dec 26, 2023 at 10:45:59AM -0800, Junio C Hamano wrote:
> Achu Luma <ach.lumap@gmail.com> writes:
>
> > diff --git a/t/helper/test-ctype.c b/t/helper/test-ctype.c
> > deleted file mode 100644
> > index e5659df40b..0000000000
> > --- a/t/helper/test-ctype.c
> > +++ /dev/null
> > @@ -1,70 +0,0 @@
> > -#include "test-tool.h"
> > -
> > -static int rc;
> > -
> > -static void report_error(const char *class, int ch)
> > -{
> > - printf("%s classifies char %d (0x%02x) wrongly\n", class, ch, ch);
> > - rc = 1;
> > -}
>
> So, if we have a is_foo() that characterises a byte that ought to
> be "foo" but gets miscategorised not to be "foo", we used to
> pinpoint exactly the byte value that was an issue. We did not do
> any early return ...
>
> > ...
> > -#define TEST_CLASS(t,s) { \
> > - int i; \
> > - for (i = 0; i < 256; i++) { \
> > - if (is_in(s, i) != t(i)) \
> > - report_error(#t, i); \
> > - } \
> > - if (t(EOF)) \
> > - report_error(#t, EOF); \
> > -}
>
> ... and reported for all errors in the "class".
>
> > diff --git a/t/unit-tests/t-ctype.c b/t/unit-tests/t-ctype.c
> > new file mode 100644
> > index 0000000000..41189ba9f9
> > --- /dev/null
> > +++ b/t/unit-tests/t-ctype.c
> > @@ -0,0 +1,76 @@
> > +#include "test-lib.h"
> > +
> > +static int is_in(const char *s, int ch)
> > +{
> > + /*
> > + * We can't find NUL using strchr. Accept it as the first
> > + * character in the spec -- there are no empty classes.
> > + */
> > + if (ch == '\0')
> > + return ch == *s;
> > + if (*s == '\0')
> > + s++;
> > + return !!strchr(s, ch);
> > +}
> > +
> > +/* Macro to test a character type */
> > +#define TEST_CTYPE_FUNC(func, string) \
> > +static void test_ctype_##func(void) \
> > +{ \
> > + int i; \
> > + for (i = 0; i < 256; i++) \
> > + check_int(func(i), ==, is_in(string, i)); \
> > +}
>
> Now, we let check_int() to do the checking for each and every byte
> value for the class. check_int() uses different reporting and shows
> the problematic value in a way that is more verbose and at the same
> time is a less specific and harder to understand:
>
> test_msg(" left: %"PRIdMAX, a);
> test_msg(" right: %"PRIdMAX, b);
>
> But that is probably the price to pay to use a more generic
> framework, I guess.
Perhaps I'm missing something here, since I haven't followed the
unit-test effort very closely, but this check_int() macro feels like it
might be overkill for what we're trying to do.
We know that the expected value is the result of is_in(string, i), so I
wonder if we might benefit from having an "assert_equals()" that looks
like:
assert_equals(is_in(string, i), func(i));
Where we follow the usual convention of treating the first argument as
the expected value, and the second as the actual value. Then we could
format our error message to be more specific, like:
test_msg("expected %d, got %d", expected, actual);
I think that this would be a little more readable, and still seems
flexible enough to support the kind of thing that check_int(..., ==,
...) is after.
> > +int cmd_main(int argc, const char **argv) {
> > + /* Run all character type tests */
> > + TEST(test_ctype_isspace(), "isspace() works as we expect");
> > + TEST(test_ctype_isdigit(), "isdigit() works as we expect");
> > + TEST(test_ctype_isalpha(), "isalpha() works as we expect");
> > + TEST(test_ctype_isalnum(), "isalnum() works as we expect");
> > + TEST(test_ctype_is_glob_special(), "is_glob_special() works as we expect");
> > + TEST(test_ctype_is_regex_special(), "is_regex_special() works as we expect");
> > + TEST(test_ctype_is_pathspec_magic(), "is_pathspec_magic() works as we expect");
> > + TEST(test_ctype_isascii(), "isascii() works as we expect");
> > + TEST(test_ctype_islower(), "islower() works as we expect");
> > + TEST(test_ctype_isupper(), "isupper() works as we expect");
> > + TEST(test_ctype_iscntrl(), "iscntrl() works as we expect");
> > + TEST(test_ctype_ispunct(), "ispunct() works as we expect");
> > + TEST(test_ctype_isxdigit(), "isxdigit() works as we expect");
> > + TEST(test_ctype_isprint(), "isprint() works as we expect");
> > +
> > + return test_done();
> > +}
>
> As a practice to use the unit-tests framework, the patch looks OK.
> helper/test-ctype.c indeed is an oddball that runs once and checks
> everything it wants to check, for which the unit tests framework is
> much more suited.
As an aside, I don't think we need the "works as we expect" suffix in
each test description. I personally would be fine with something like:
TEST(test_ctype_isspace(), "isspace()");
TEST(test_ctype_isdigit(), "isdigit()");
...
But don't feel strongly about it.
Thanks,
Taylor
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox