From: Derrick Stolee <stolee@gmail.com>
To: "Junio C Hamano" <gitster@pobox.com>,
	"Torsten Bögershausen" <tboegi@web.de>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [Question] Unicode weirdness breaking tests on ZFS?
Date: Fri, 19 Nov 2021 13:30:33 -0500
Message-ID: <e56b0227-14fc-26cf-7b98-fbf01f3c5cd7@gmail.com>
In-Reply-To: <xmqqmtm0qdqp.fsf@gitster.g>

On 11/19/2021 12:03 PM, Junio C Hamano wrote:
> Torsten Bögershausen <tboegi@web.de> writes:
> 
>> Should we conclude that the underlying os/zfs is not stable ?
>> Things don't seem to be reproducible
>>
>> What Git needs here in t0050 is that stat("ä") behaves the same as stat("a¨"),
>> when either "ä" or "a¨" exist on disk.
>> The same for open() and all other file system functions.
> 
> We either need to see these two are treated as the same thing, or
> these two are treated as two distinct filesystem entities, just like
> stat("a") and stat("b") are.  What we absolutely need is that the
> unification either always happens or never happens consistently.
> 
> I wonder what readdir() is returning.  After creat("ä") in an empty
> directory, does readdir() in there return "ä" or "a¨"?  And vice
> versa?  Is this also inconsistent?

Following this suggestion, I added a test helper with this code:

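#include "git-compat-util.h"
#include "test-tool.h"

/*
 * Create "ä" in the current directory in NFC (c3 a4) or NFD
 * (61 cc 88) form, then list the directory's entries.
 */
int cmd__create_and_read(int argc, const char **argv)
{
	DIR *dir;
	struct dirent *de;
	const char *name;
	int fd;

	/* test-tool leaves the subcommand name in argv[0] */
	if (argc == 2 && !strcmp(argv[1], "--nfc"))
		name = "\303\244";	/* U+00E4 */
	else if (argc == 2 && !strcmp(argv[1], "--nfd"))
		name = "\141\314\210";	/* U+0061 U+0308 */
	else
		die("select --nfc or --nfd");

	fd = creat(name, 0666);
	if (fd < 0)
		die_errno("creat");
	close(fd);

	dir = opendir(".");
	if (!dir)
		die_errno("opendir");
	/* "." and ".." need not come first, so skip them by name */
	while ((de = readdir(dir)) != NULL) {
		if (strcmp(de->d_name, ".") && strcmp(de->d_name, ".."))
			printf("%s\n", de->d_name);
	}
	closedir(dir);
	return 0;
}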

And then added this test:

test_expect_success 'unicode stuff' '
	mkdir nfc &&
	(
		cd nfc &&
		test-tool create-and-read --nfc >../nfc.txt
	) &&

	mkdir nfd &&
	(
		cd nfd &&
		test-tool create-and-read --nfd >../nfd.txt
	) &&

	test_cmp nfc.txt nfd.txt
'

This test always passes for me. It is essentially the same check
that the prereq performs, except that it creates a file under each
name (in separate directories) and compares the readdir() output,
instead of creating one name and looking it up with the other.

After replacing the "$test_unicode" instances with
"test_expect_success", I ran t0050 under --stress and quickly
got a failure in the 'merge (silent unicode normalization)'
test:


expecting success of 0050.11 'merge (silent unicode normalization)': 
        git reset --hard initial &&
        git merge topic

+ git reset --hard initial
error: unable to unlink old 'ä': No such file or directory
fatal: Could not reset index file to revision 'initial'.
error: last command exited with $?=128
not ok 11 - merge (silent unicode normalization)


Deleting that test gave mostly consistent results, although I once
got a failure in the 'setup unicode normalization tests' test with
a similar error message:

+ git checkout -f main
error: unable to unlink old 'ä': No such file or directory
Switched to branch 'main'
error: last command exited with $?=1
not ok 8 - setup unicode normalization tests
 
>> ("ä" is the precomposed form, "a¨" is the decomposed form;
>>  typically both render to the same glyph on the screen,
>>  and a hex dump or xxd will show what we had.
>>  I just use this notation here for illustration)
>>
>> Should we contact the zfs developers ?

Hopefully someone knows a good way to reach them, so that I
can start a thread in the appropriate place. To make the best
use of their time, what are our minimal reproduction steps?

1. Build Git at the v2.34.0 tag.
2. cd to t/
3. ./t0050-filesystem.sh --stress

Given enough time, these steps should reproduce the failure in
test 8, 'setup unicode normalization tests'.

For a faster reproduction, follow the same steps but build the
'zfs-minimal' branch from my fork [1] instead. It changes the tests
to expect success, so the unpredictable results show up more quickly.

[1] https://github.com/derrickstolee/git/tree/zfs-minimal

Thanks,
-Stolee
