* [BUG] commit fails with 'bus error' when working directory is on an NFS share @ 2024-11-30 4:58 Dmitriy Panteleyev 2024-11-30 16:38 ` Jeff King 0 siblings, 1 reply; 14+ messages in thread From: Dmitriy Panteleyev @ 2024-11-30 4:58 UTC (permalink / raw) To: git I've recently upgraded my (Linux Mint) distribution version, which came with git v2.43.0 and I noticed that I can no longer `commit` on any working directory which resides on an NFS share mount. Git reports "Bus error (core dumped)" and dmesg shows multiple "NFS: server error: fileid changed. fsid 0:68: expected fileid 0xf8e3d8e80230ddb5, got 0xeeb48230d99ed0d4" messages. This does not happen if I move the working directory off the NFS share. I attempted to upgrade git to v2.47.1, with the same result. I then downgraded git to v2.34.1 (the version for the previous distribution release) and the error has resolved. This seems like a bug to me. ^ permalink raw reply [flat|nested] 14+ messages in thread
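The dmesg message quoted in the report carries two file IDs: the one the client expected for the file and the one the server returned. A minimal sketch for pulling those pairs out of a kernel log, assuming the exact message wording quoted above (other kernel versions may phrase it differently; the function name is hypothetical):

```shell
# Print "expected got" for one "NFS: server error: fileid changed" log line.
# Assumes the wording quoted in the report; prints nothing for other lines.
parse_fileid_changed() {
    printf '%s\n' "$1" |
        sed -n 's/.*fileid changed.*expected fileid \(0x[0-9a-f]*\), got \(0x[0-9a-f]*\).*/\1 \2/p'
}

# Example use against the live kernel log (may need root):
#   dmesg | grep 'fileid changed' | while IFS= read -r line; do
#       parse_fileid_changed "$line"
#   done
```

Comparing the pairs across several failures shows whether the same file keeps being reported or whether many different files are affected.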
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share 2024-11-30 4:58 [BUG] commit fails with 'bus error' when working directory is on an NFS share Dmitriy Panteleyev @ 2024-11-30 16:38 ` Jeff King 2024-12-01 17:17 ` Dmitriy Panteleyev 0 siblings, 1 reply; 14+ messages in thread From: Jeff King @ 2024-11-30 16:38 UTC (permalink / raw) To: Dmitriy Panteleyev; +Cc: git On Fri, Nov 29, 2024 at 09:58:51PM -0700, Dmitriy Panteleyev wrote: > I've recently upgraded my (Linux Mint) distribution version, which > came with git v2.43.0 and I noticed that I can no longer `commit` on > any working directory which resides on an NFS share mount. > > Git reports "Bus error (core dumped)" and dmesg shows multiple "NFS: > server error: fileid changed. fsid 0:68: expected fileid > 0xf8e3d8e80230ddb5, got 0xeeb48230d99ed0d4" messages. > > This does not happen if I move the working directory off the NFS share. I can't reproduce any problems here on a test NFS mount. But since the old version works here: > I attempted to upgrade git to v2.47.1, with the same result. > > I then downgraded git to v2.34.1 (the version for the previous > distribution release) and the error has resolved. Can you try bisecting between v2.34.1 and v2.43.0 to see which commit introduces the problem for you? -Peff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share 2024-11-30 16:38 ` Jeff King @ 2024-12-01 17:17 ` Dmitriy Panteleyev 2024-12-01 21:36 ` Jeff King 0 siblings, 1 reply; 14+ messages in thread From: Dmitriy Panteleyev @ 2024-12-01 17:17 UTC (permalink / raw) To: Jeff King; +Cc: git On Sat, Nov 30, 2024 at 9:44 AM Jeff King <peff@peff.net> wrote: > > On Fri, Nov 29, 2024 at 09:58:51PM -0700, Dmitriy Panteleyev wrote: > > > I've recently upgraded my (Linux Mint) distribution version, which > > came with git v2.43.0 and I noticed that I can no longer `commit` on > > any working directory which resides on an NFS share mount. > > > > Git reports "Bus error (core dumped)" and dmesg shows multiple "NFS: > > server error: fileid changed. fsid 0:68: expected fileid > > 0xf8e3d8e80230ddb5, got 0xeeb48230d99ed0d4" messages. > > > > This does not happen if I move the working directory off the NFS share. > > I can't reproduce any problems here on a test NFS mount. But since the > old version works here: > > > I attempted to upgrade git to v2.47.1, with the same result. > > > > I then downgraded git to v2.34.1 (the version for the previous > > distribution release) and the error has resolved. > > Can you try bisecting between v2.34.1 and v2.43.0 to see which commit > introduces the problem for you? > > -Peff Bisecting: 0 revisions left to test after this (roughly 0 steps) [04fb96219abc0cbe46ba084997dc9066de3ac889] parse_object(): drop extra "has" check before checking object type ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share 2024-12-01 17:17 ` Dmitriy Panteleyev @ 2024-12-01 21:36 ` Jeff King 2024-12-01 22:24 ` Dmitriy Panteleyev 0 siblings, 1 reply; 14+ messages in thread From: Jeff King @ 2024-12-01 21:36 UTC (permalink / raw) To: Dmitriy Panteleyev; +Cc: git On Sun, Dec 01, 2024 at 10:17:44AM -0700, Dmitriy Panteleyev wrote: > > > I attempted to upgrade git to v2.47.1, with the same result. > > > > > > I then downgraded git to v2.34.1 (the version for the previous > > > distribution release) and the error has resolved. > > > > Can you try bisecting between v2.34.1 and v2.43.0 to see which commit > > introduces the problem for you? > > > > -Peff > > Bisecting: 0 revisions left to test after this (roughly 0 steps) > [04fb96219abc0cbe46ba084997dc9066de3ac889] parse_object(): drop extra > "has" check before checking object type That seems like an unlikely commit to introduce the problem you're seeing. And how did we end up with 0 revisions left to check, but no final outcome? Did you need to do one more test and "git bisect good/bad" on this commit? Or alternatively, can you share what you're doing to test the bisection? That might help us reproduce. I kind of wonder if the results might not be deterministic, to end up at an apparently unrelated commit like that. -Peff ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share 2024-12-01 21:36 ` Jeff King @ 2024-12-01 22:24 ` Dmitriy Panteleyev 2024-12-02 20:34 ` Jeff King 0 siblings, 1 reply; 14+ messages in thread From: Dmitriy Panteleyev @ 2024-12-01 22:24 UTC (permalink / raw) To: Jeff King; +Cc: git On Sun, Dec 1, 2024 at 2:36 PM Jeff King <peff@peff.net> wrote: > > On Sun, Dec 01, 2024 at 10:17:44AM -0700, Dmitriy Panteleyev wrote: > > > > > I attempted to upgrade git to v2.47.1, with the same result. > > > > > > > > I then downgraded git to v2.34.1 (the version for the previous > > > > distribution release) and the error has resolved. > > > > > > Can you try bisecting between v2.34.1 and v2.43.0 to see which commit > > > introduces the problem for you? > > > > > > -Peff > > > > Bisecting: 0 revisions left to test after this (roughly 0 steps) > > [04fb96219abc0cbe46ba084997dc9066de3ac889] parse_object(): drop extra > > "has" check before checking object type > > That seems like an unlikely commit to introduce the problem you're > seeing. And how did we end up with 0 revisions left to check, but no > final outcome? Did you need to do one more test and "git bisect > good/bad" on this commit? > You are right, Jeff, I needed to run one more bisect. But it does point to the commit I linked above. The bisect result is: 04fb96219abc0cbe46ba084997dc9066de3ac889 is the first bad commit commit 04fb96219abc0cbe46ba084997dc9066de3ac889 Author: Jeff King <peff@peff.net> Date: Thu Nov 17 17:37:58 2022 -0500 parse_object(): drop extra "has" check before checking object type When parsing an object of unknown type, we check to see if it's a blob, so we can use our streaming code path. This uses oid_object_info() to check the type, but before doing so we call repo_has_object_file(). This latter is pointless, as oid_object_info() will already fail if the object is missing. Checking it ahead of time just complicates the code and is a waste of resources (albeit small). 
Let's drop the redundant check. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> object.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) > Or alternatively, can you share what you're doing to test the bisection? > That might help us reproduce. I kind of wonder if the results might not > be deterministic, to end up at an apparently unrelated commit like that. > > -Peff I am not at all familiar with the standard process for this, but the way I ran the test is: (0. cloned test project into /nfs/proj/ and made a change) 1. cloned git repo (from github) into /tmp/git/ 2. ran bisect in /tmp/git/, starting with v2.34.1 (good) and v2.43.1 (bad) 3. ran `make all` in /tmp/git/ 4. in /nfs/proj/ ran `/tmp/git/bin-wrappers/git commit -m 'test'` 5. repeated 2-4 ^ permalink raw reply [flat|nested] 14+ messages in thread
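The manual build-and-test loop in the steps above could also be handed to `git bisect run`. The sketch below is not what was run in this thread. It assumes the same paths (/tmp/git for the git sources, /nfs/proj for the test repository) and uses `--allow-empty` so that every step has something to commit, which is an assumption about what is enough to trigger the crash. One detail matters for this bug in particular: a process killed by SIGBUS is reported by the shell with a status above 128 (typically 135 on Linux), and `git bisect run` aborts on any status outside 0 to 127, so the script maps every failure to 1.

```shell
#!/bin/sh
# Sketch of a test script for `git bisect run`; paths and --allow-empty are assumptions.

# Translate a raw exit status into one `git bisect run` accepts:
# 0 means good, 1 means bad. Death by signal (status > 128) is folded into "bad"
# because bisect would otherwise abort on it.
to_bisect_status() {
    if [ "$1" -eq 0 ]; then
        echo 0
    else
        echo 1
    fi
}

# Build the checked-out revision, then try a commit on the NFS share.
bisect_step() {
    make -C /tmp/git all >/dev/null 2>&1 || return 125   # unbuildable revision: skip it
    (cd /nfs/proj && /tmp/git/bin-wrappers/git commit --allow-empty -m test) >/dev/null 2>&1
    status=$?
    return "$(to_bisect_status "$status")"
}

# When saved as the bisect script, end the file with:
#   bisect_step; exit $?
```

With the script saved outside the source tree (for example as /tmp/bisect-step.sh, a hypothetical name), the run would look like `cd /tmp/git && git bisect start v2.43.0 v2.34.1 && git bisect run sh /tmp/bisect-step.sh`. If the failure is not deterministic, as suspected later in the thread, a single passing run can mislabel a bad commit as good, so repeating the commit a few times per step may be worthwhile.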
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share 2024-12-01 22:24 ` Dmitriy Panteleyev @ 2024-12-02 20:34 ` Jeff King 2024-12-03 2:48 ` Dmitriy Panteleyev 0 siblings, 1 reply; 14+ messages in thread From: Jeff King @ 2024-12-02 20:34 UTC (permalink / raw) To: Dmitriy Panteleyev; +Cc: git On Sun, Dec 01, 2024 at 03:24:35PM -0700, Dmitriy Panteleyev wrote: > You are right, Jeff, I needed to run one more bisect. But it does point to > the commit I linked above. The bisect result is: Thanks for checking. I'm still puzzled how this commit: > 04fb96219abc0cbe46ba084997dc9066de3ac889 is the first bad commit > commit 04fb96219abc0cbe46ba084997dc9066de3ac889 > Author: Jeff King <peff@peff.net> > Date: Thu Nov 17 17:37:58 2022 -0500 > > parse_object(): drop extra "has" check before checking object type > > When parsing an object of unknown type, we check to see if it's a blob, > so we can use our streaming code path. This uses oid_object_info() to > check the type, but before doing so we call repo_has_object_file(). This > latter is pointless, as oid_object_info() will already fail if the > object is missing. Checking it ahead of time just complicates the code > and is a waste of resources (albeit small). > > Let's drop the redundant check. could be the culprit, though. The diff is just diff --git a/object.c b/object.c index 8a74eb85e9..16eb944e98 100644 --- a/object.c +++ b/object.c @@ -287,8 +287,7 @@ struct object *parse_object_with_flags(struct repository *r, } if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || - (!obj && repo_has_object_file(r, oid) && - oid_object_info(r, oid, NULL) == OBJ_BLOB)) { + (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { if (!skip_hash && stream_object_signature(r, repl) < 0) { error(_("hash mismatch %s"), oid_to_hex(oid)); return NULL; So it is actually doing _less_, though what it is removing is going to just be a pack .idx lookup (or maybe a stat() call if the object is loose). 
> I am not at all familiar with the standard process for this, but the way I ran > the test is: > > (0. cloned test project into /nfs/proj/ and made a change) > 1. cloned git repo (from github) into /tmp/git/ > 2. ran bisect in /tmp/git/, starting with v2.34.1 (good) and v2.43.1 (bad) > 3. ran `make all` in /tmp/git/ > 4. in /nfs/proj/ ran `/tmp/git/bin-wrappers/git commit -m 'test'` > 5. repeated 2-4 That sounds reasonable. I'm still not sure what's going on. It's always possible that commit introduced a problem, but I just don't see it. So I still have a suspicion (especially given that your symptom is a bus error) that the problem might not be deterministic. I wonder if building git with: make SANITIZE=address,undefined and running the same test might yield anything useful. -Peff ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share 2024-12-02 20:34 ` Jeff King @ 2024-12-03 2:48 ` Dmitriy Panteleyev 2024-12-03 21:18 ` Jeff King 0 siblings, 1 reply; 14+ messages in thread From: Dmitriy Panteleyev @ 2024-12-03 2:48 UTC (permalink / raw) To: Jeff King; +Cc: git On Mon, Dec 2, 2024 at 1:41 PM Jeff King <peff@peff.net> wrote: > > On Sun, Dec 01, 2024 at 03:24:35PM -0700, Dmitriy Panteleyev wrote: > > > You are right, Jeff, I needed to run one more bisect. But it does point to > > the commit I linked above. The bisect result is: > > Thanks for checking. I'm still puzzled how this commit: > > > 04fb96219abc0cbe46ba084997dc9066de3ac889 is the first bad commit > > commit 04fb96219abc0cbe46ba084997dc9066de3ac889 > > Author: Jeff King <peff@peff.net> > > Date: Thu Nov 17 17:37:58 2022 -0500 > > > > parse_object(): drop extra "has" check before checking object type > > > > When parsing an object of unknown type, we check to see if it's a blob, > > so we can use our streaming code path. This uses oid_object_info() to > > check the type, but before doing so we call repo_has_object_file(). This > > latter is pointless, as oid_object_info() will already fail if the > > object is missing. Checking it ahead of time just complicates the code > > and is a waste of resources (albeit small). > > > > Let's drop the redundant check. > > could be the culprit, though. 
The diff is just > > diff --git a/object.c b/object.c > index 8a74eb85e9..16eb944e98 100644 > --- a/object.c > +++ b/object.c > @@ -287,8 +287,7 @@ struct object *parse_object_with_flags(struct repository *r, > } > > if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || > - (!obj && repo_has_object_file(r, oid) && > - oid_object_info(r, oid, NULL) == OBJ_BLOB)) { > + (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { > if (!skip_hash && stream_object_signature(r, repl) < 0) { > error(_("hash mismatch %s"), oid_to_hex(oid)); > return NULL; > > So it is actually doing _less_, though what it is removing is going to > just be a pack .idx lookup (or maybe a stat() call if the object is > loose). > > > I am not at all familiar with the standard process for this, but the way I ran > > the test is: > > > > (0. cloned test project into /nfs/proj/ and made a change) > > 1. cloned git repo (from github) into /tmp/git/ > > 2. ran bisect in /tmp/git/, starting with v2.34.1 (good) and v2.43.1 (bad) > > 3. ran `make all` in /tmp/git/ > > 4. in /nfs/proj/ ran `/tmp/git/bin-wrappers/git commit -m 'test'` > > 5. repeated 2-4 > > That sounds reasonable. I'm still not sure what's going on. It's always > possible that commit introduced a problem, but I just don't see it. So I > still have a suspicion (especially given that your symptom is a bus > error) that the problem might not be deterministic. > > I wonder if building git with: > > make SANITIZE=address,undefined > > and running the same test might yield anything useful. > > -Peff Not sure if this is useful, but this is what I got: AddressSanitizer:DEADLYSIGNAL ================================================================= ==155141==ERROR: AddressSanitizer: BUS on unknown address (pc 0x78811e863aed bp 0x7ffe9d5ac800 sp 0x7ffe9d5ac770 T0) ==155141==The signal is caused by a READ memory access. ==155141==Hint: this fault was caused by a dereference of a high value address (see register values below). 
Disassemble the provided pc to learn which register was used. #0 0x78811e863aed in inflate (/lib/x86_64-linux-gnu/libz.so.1+0xfaed) (BuildId: bbefe2bbdc367b0c3cfbfcf80c579930496fb963) #1 0x563e32ec7e5f in git_inflate /tmp/git_tests/git/zlib.c:118 #2 0x563e32bde431 in unpack_loose_header /tmp/git_tests/git/object-file.c:1271 #3 0x563e32be429c in loose_object_info /tmp/git_tests/git/object-file.c:1474 #4 0x563e32be5348 in do_oid_object_info_extended /tmp/git_tests/git/object-file.c:1582 #5 0x563e32be5dac in oid_object_info_extended /tmp/git_tests/git/object-file.c:1640 #6 0x563e32be5dac in oid_object_info /tmp/git_tests/git/object-file.c:1656 #7 0x563e32bf8b57 in parse_object_with_flags /tmp/git_tests/git/object.c:290 #8 0x563e32cfbd19 in write_ref_to_lockfile refs/files-backend.c:1772 #9 0x563e32d0196e in lock_ref_for_update refs/files-backend.c:2582 #10 0x563e32d0196e in files_transaction_prepare refs/files-backend.c:2755 #11 0x563e32ce6800 in ref_transaction_prepare /tmp/git_tests/git/refs.c:2266 #12 0x563e32ce6a5a in ref_transaction_commit /tmp/git_tests/git/refs.c:2315 #13 0x563e32d8c44e in update_head_with_reflog /tmp/git_tests/git/sequencer.c:1197 #14 0x563e326b2f51 in cmd_commit builtin/commit.c:1834 #15 0x563e3263002a in run_builtin /tmp/git_tests/git/git.c:466 #16 0x563e3263002a in handle_builtin /tmp/git_tests/git/git.c:721 #17 0x563e32633ff8 in run_argv /tmp/git_tests/git/git.c:788 #18 0x563e32633ff8 in cmd_main /tmp/git_tests/git/git.c:926 #19 0x563e3262c6a4 in main /tmp/git_tests/git/common-main.c:57 #20 0x78811d42a1c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 #21 0x78811d42a28a in __libc_start_main_impl ../csu/libc-start.c:360 #22 0x563e3262f6d4 in _start (/tmp/git_tests/git/git+0xa726d4) (BuildId: 197ee6cc3c63db9e10cfed4585ab78b52790454a) AddressSanitizer can not provide additional info. 
SUMMARY: AddressSanitizer: BUS (/lib/x86_64-linux-gnu/libz.so.1+0xfaed) (BuildId: bbefe2bbdc367b0c3cfbfcf80c579930496fb963) in inflate ==155141==ABORTING ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share 2024-12-03 2:48 ` Dmitriy Panteleyev @ 2024-12-03 21:18 ` Jeff King 2024-12-05 2:21 ` Dmitriy Panteleyev 0 siblings, 1 reply; 14+ messages in thread From: Jeff King @ 2024-12-03 21:18 UTC (permalink / raw) To: Dmitriy Panteleyev; +Cc: git On Mon, Dec 02, 2024 at 07:48:05PM -0700, Dmitriy Panteleyev wrote: > > I wonder if building git with: > > > > make SANITIZE=address,undefined > > > > and running the same test might yield anything useful. > > Not sure if this is useful, but this is what I got: Thanks. If you bisect with that command, does it end up on the same commit? > AddressSanitizer:DEADLYSIGNAL > ================================================================= > ==155141==ERROR: AddressSanitizer: BUS on unknown address (pc > 0x78811e863aed bp 0x7ffe9d5ac800 sp 0x7ffe9d5ac770 T0) > ==155141==The signal is caused by a READ memory access. > ==155141==Hint: this fault was caused by a dereference of a high value > address (see register values below). Disassemble the provided pc to > learn which register was used. > #0 0x78811e863aed in inflate > (/lib/x86_64-linux-gnu/libz.so.1+0xfaed) (BuildId: > bbefe2bbdc367b0c3cfbfcf80c579930496fb963) > #1 0x563e32ec7e5f in git_inflate /tmp/git_tests/git/zlib.c:118 > #2 0x563e32bde431 in unpack_loose_header > /tmp/git_tests/git/object-file.c:1271 > #3 0x563e32be429c in loose_object_info /tmp/git_tests/git/object-file.c:1474 Hmm. So we are inflating a loose object. It's mmap()-ed, so presumably that is why you get the bus error (the underlying nfs system for whatever reason is not able to provide the bytes). I'm still super puzzled about why this would start happening, or how it could be related to that commit. 
The rest of the stack here: > #4 0x563e32be5348 in do_oid_object_info_extended > /tmp/git_tests/git/object-file.c:1582 > #5 0x563e32be5dac in oid_object_info_extended > /tmp/git_tests/git/object-file.c:1640 > #6 0x563e32be5dac in oid_object_info /tmp/git_tests/git/object-file.c:1656 > #7 0x563e32bf8b57 in parse_object_with_flags /tmp/git_tests/git/object.c:290 shows that we are coming from parse_object_with_flags(). Is it possible that calling stat() somehow primes the nfs system to be better able to serve the mmap'd data? That seems kind of weird. Maybe one other thing to try. Build with: make NO_MMAP=1 (optionally with SANITIZE also). That should replace the mmap calls with a compat wrapper that just reads into an internal buffer. I suspect that will make your problem go away, though I'm not sure it gets us any closer to understanding what's going wrong. What's the nfs server in your setup? Is it another Linux machine, or is it some other implementation? Do you know which nfs version? -Peff ^ permalink raw reply [flat|nested] 14+ messages in thread
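The question about the NFS version can be answered from the client side by reading the mount options. A small sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, options) and that the negotiated version appears as a `vers=` option; the function name is hypothetical:

```shell
# Print "mountpoint version" for every NFS mount listed on stdin
# in /proc/mounts format. Prints "unknown" if no vers= option is present.
nfs_versions() {
    awk '$3 ~ /^nfs4?$/ {
        vers = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^vers=/) vers = substr(opts[i], 6)
        print $2, vers
    }'
}

# Example use on a Linux client:
#   nfs_versions < /proc/mounts
```

`nfsstat -m` prints similar information where the NFS client utilities are installed.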
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
  2024-12-03 21:18 ` Jeff King
@ 2024-12-05 2:21 ` Dmitriy Panteleyev
  2024-12-05 3:22 ` Jeff King
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitriy Panteleyev @ 2024-12-05 2:21 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Tue, Dec 3, 2024 at 2:18 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Dec 02, 2024 at 07:48:05PM -0700, Dmitriy Panteleyev wrote:
>
> > > I wonder if building git with:
> > >
> > >   make SANITIZE=address,undefined
> > >
> > > and running the same test might yield anything useful.
> >
> > Not sure if this is useful, but this is what I got:
>
> Thanks. If you bisect with that command, does it end up on the same
> commit?

Yes. The immediate parent commit works just fine.

>
> > AddressSanitizer:DEADLYSIGNAL
> > =================================================================
> > ==155141==ERROR: AddressSanitizer: BUS on unknown address (pc
> > 0x78811e863aed bp 0x7ffe9d5ac800 sp 0x7ffe9d5ac770 T0)
> > ==155141==The signal is caused by a READ memory access.
> > ==155141==Hint: this fault was caused by a dereference of a high value
> > address (see register values below). Disassemble the provided pc to
> > learn which register was used.
> > #0 0x78811e863aed in inflate
> > (/lib/x86_64-linux-gnu/libz.so.1+0xfaed) (BuildId:
> > bbefe2bbdc367b0c3cfbfcf80c579930496fb963)
> > #1 0x563e32ec7e5f in git_inflate /tmp/git_tests/git/zlib.c:118
> > #2 0x563e32bde431 in unpack_loose_header
> > /tmp/git_tests/git/object-file.c:1271
> > #3 0x563e32be429c in loose_object_info /tmp/git_tests/git/object-file.c:1474
>
> Hmm. So we are inflating a loose object. It's mmap()-ed, so presumably
> that is why you get the bus error (the underlying nfs system for
> whatever reason is not able to provide the bytes).
>
> I'm still super puzzled about why this would start happening, or how it
> could be related to that commit. The rest of the stack here:
>
> > #4 0x563e32be5348 in do_oid_object_info_extended
> > /tmp/git_tests/git/object-file.c:1582
> > #5 0x563e32be5dac in oid_object_info_extended
> > /tmp/git_tests/git/object-file.c:1640
> > #6 0x563e32be5dac in oid_object_info /tmp/git_tests/git/object-file.c:1656
> > #7 0x563e32bf8b57 in parse_object_with_flags /tmp/git_tests/git/object.c:290
>
> shows that we are coming from parse_object_with_flags(). Is it possible
> that calling stat() somehow primes the nfs system to be better able to
> serve the mmap'd data? That seems kind of weird.
>
> Maybe one other thing to try. Build with:
>
>   make NO_MMAP=1
>
> (optionally with SANITIZE also). That should replace the mmap calls with
> a compat wrapper that just reads into an internal buffer. I suspect that
> will make your problem go away, though I'm not sure it gets us any
> closer to understanding what's going wrong.
>
> What's the nfs server in your setup? Is it another Linux machine, or is
> it some other implementation? Do you know which nfs version?
>
> -Peff

NFS server is on a Linux box on LAN. nfs-kernel-server 2.6.1. Client
mounts shares as vers=3.

After trying NO_MMAP=1 with and without SANITIZE, I get:
"fatal: mmap failed: Permission denied"

~D

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
  2024-12-05 2:21 ` Dmitriy Panteleyev
@ 2024-12-05 3:22 ` Jeff King
  2024-12-05 3:59 ` Dmitriy Panteleyev
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff King @ 2024-12-05 3:22 UTC (permalink / raw)
  To: Dmitriy Panteleyev; +Cc: git

On Wed, Dec 04, 2024 at 07:21:16PM -0700, Dmitriy Panteleyev wrote:

> NFS server is on a Linux box on LAN. nfs-kernel-server 2.6.1. Client
> mounts shares as vers=3.

My setup was a little different, but I tried the same thing doing an
actual cross-network mount of an older box with 2.6.2, and making sure
to use vers=3. Still can't reproduce.

> After trying NO_MMAP=1 with and without SANITIZE, I get:
> "fatal: mmap failed: Permission denied"

Hmm, that's odd. If you run it under strace, which syscall fails? That
message should be reporting errno from mmap(), which in NO_MMAP mode
should be a pread() call. I'm not sure why that would get EACCES if the
open() call succeeded, but that might explain why the mmap'd version
gets SIGBUS (I don't know much about NFS, but I imagine that under the
hood the client is probably issuing reads for individual pages to
fault in the map).

Does your system have AppArmor enabled? This issue sounds similar to
yours:

  https://unix.stackexchange.com/questions/633389/man-cannot-read-manpage-from-nfs-although-the-file-is-readable

especially the bit where reading the metadata once makes it magically
work for a brief period (which is the only thing I'd expect the commit
you found via bisection to have an effect on).

-Peff

^ permalink raw reply [flat|nested] 14+ messages in thread
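One way to answer the "which syscall fails" question is to save the strace output to a file and list only the calls that returned -1. The filter below is a sketch that assumes strace's default line format with no PID prefix (that is, without `-f`); the function name and the log path in the usage comment are hypothetical.

```shell
# Print "syscall ERRNO" for every failed call in strace output read from stdin.
# Assumes lines of the form:  name(args...) = -1 ERRNO (description)
failed_syscalls() {
    sed -n 's/^\([a-z_0-9]*\)(.*= -1 \([A-Z0-9]*\) .*/\1 \2/p'
}

# Example use, with the paths mentioned earlier in the thread:
#   cd /nfs/proj
#   strace -o /tmp/git-commit.strace /tmp/git/bin-wrappers/git commit -m test
#   failed_syscalls < /tmp/git-commit.strace
```

Expect a fair amount of noise, since failed lookups such as ENOENT on optional files are normal during a commit. The interesting entries are failures on files that were opened successfully just before.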
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
  2024-12-05 3:22 ` Jeff King
@ 2024-12-05 3:59 ` Dmitriy Panteleyev
  2024-12-05 4:58 ` Dmitriy Panteleyev
  2024-12-05 19:13 ` Jeff King
  0 siblings, 2 replies; 14+ messages in thread
From: Dmitriy Panteleyev @ 2024-12-05 3:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Wed, Dec 4, 2024 at 8:22 PM Jeff King <peff@peff.net> wrote:
>
> On Wed, Dec 04, 2024 at 07:21:16PM -0700, Dmitriy Panteleyev wrote:
>
> > After trying NO_MMAP=1 with and without SANITIZE, I get:
> > "fatal: mmap failed: Permission denied"
>
> Hmm, that's odd. If you run it under strace, which syscall fails? That
> message should be reporting errno from mmap(), which in NO_MMAP mode
> should be a pread() call. I'm not sure why that would get EACCES if the
> open() call succeeded, but that might explain why the mmap'd version
> gets SIGBUS (I don't know much about NFS, but I imagine that under the
> hood the client is probably issuing reads for individual pages to
> fault in the map).

Strace with NO_MMAP=1 gives:

openat(AT_FDCWD, ".git/objects/34/5819b235838e219d66420b536a54ce4cf0624c", O_RDONLY|O_CLOEXEC) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=154, ...}) = 0
pread64(4, 0x61a0292e15d0, 154, 0) = -1 ESTALE (Stale file handle)
write(2, "fatal: mmap failed: Permission d"..., 38) = 38

Weirdly, it's throwing ESTALE, not EACCES...

Without NO_MMAP, I get:

openat(AT_FDCWD, ".git/objects/51/da8e85661b60d7378b8ac0d896cfc955405fdf", O_RDONLY|O_CLOEXEC) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=154, ...}) = 0
mmap(NULL, 154, PROT_READ, MAP_PRIVATE, 4, 0) = 0x73ceb860e000
close(4) = 0
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x73ceb860e000} ---
+++ killed by SIGBUS (core dumped) +++

Also, it's odd that the same set of calls -- openat(), fstat(), and
pread64() / mmap() -- succeeds multiple times before an error is
encountered.

>
> Does your system have AppArmor enabled?

Yes, but I don't see any profiles related to git. And I can't imagine
AppArmor would be version-dependent.

>
> This issue sounds similar to yours:
>
> https://unix.stackexchange.com/questions/633389/man-cannot-read-manpage-from-nfs-although-the-file-is-readable
>
> especially the bit where reading the metadata once makes it magically
> work for a brief period (which is the only thing I'd expect the commit
> you found via bisection to have an effect on).
>
> -Peff

^ permalink raw reply [flat|nested] 14+ messages in thread
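For the AppArmor question, a quick client-side check is to read the kernel's module parameter and then look for a profile that names git. This is a sketch only: it assumes the parameter file is at /sys/module/apparmor/parameters/enabled and contains "Y" when AppArmor is on, and the optional path argument exists purely to make the function easy to exercise.

```shell
# Succeed if AppArmor reports itself enabled.
# $1 (optional) overrides the parameter file path; the default is an assumption
# about where the kernel exposes it.
apparmor_enabled() {
    f=${1:-/sys/module/apparmor/parameters/enabled}
    [ -r "$f" ] && [ "$(cat "$f")" = "Y" ]
}

# Example use (aa-status usually needs root):
#   if apparmor_enabled; then
#       aa-status | grep -i git || echo "no loaded profile mentions git"
#   fi
```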
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share 2024-12-05 3:59 ` Dmitriy Panteleyev @ 2024-12-05 4:58 ` Dmitriy Panteleyev 2024-12-05 19:13 ` Jeff King 1 sibling, 0 replies; 14+ messages in thread From: Dmitriy Panteleyev @ 2024-12-05 4:58 UTC (permalink / raw) To: Jeff King; +Cc: git Hrm. I just spun up a couple of different VMs on my server with old and new NFS versions, and git works fine from those shares. I think we should put a pin in it, since I can't reproduce the problem outside of my specific server instance. Thanks for all the troubleshooting, Peff. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
  2024-12-05 3:59 ` Dmitriy Panteleyev
  2024-12-05 4:58 ` Dmitriy Panteleyev
@ 2024-12-05 19:13 ` Jeff King
  1 sibling, 0 replies; 14+ messages in thread
From: Jeff King @ 2024-12-05 19:13 UTC (permalink / raw)
  To: Dmitriy Panteleyev; +Cc: git

On Wed, Dec 04, 2024 at 08:59:03PM -0700, Dmitriy Panteleyev wrote:

> Strace with NO_MMAP=1 gives:
>
> openat(AT_FDCWD,
> ".git/objects/34/5819b235838e219d66420b536a54ce4cf0624c",
> O_RDONLY|O_CLOEXEC) = 4
> fstat(4, {st_mode=S_IFREG|0444, st_size=154, ...}) = 0
> pread64(4, 0x61a0292e15d0, 154, 0) = -1 ESTALE (Stale file handle)
> write(2, "fatal: mmap failed: Permission d"..., 38) = 38
>
> Weirdly, it's throwing ESTALE, not EACCES...

Ah, interesting. So yeah, it seems like there is some configuration
issue or other problem that is causing your NFS handles to time out, and
we get unexpected failures while reading.

I _think_ that exonerates the commit you found, as the code it removed
was helping only by chance, by creating slightly different filesystem
access patterns.

> > Does your system have AppArmor enabled?
>
> Yes, but I don't see any profiles related to git. And I can't imagine
> AppArmor would be version-dependent.

I think this was probably a long shot anyway. In the link I found it was
"man", which sensibly would have AppArmor profiles that disallow network
access. But clearly "git" would not have the same ones, since we expect
it to hit the network (not "git commit", but it is all one binary, so
AppArmor doesn't distinguish).

> Hrm. I just spun up a couple of different VMs on my server with old
> and new NFS versions, and git works fine from those shares.
>
> I think we should put a pin in it, since I can't reproduce the problem
> outside of my specific server instance.

Yeah, that makes sense. You might find something interesting in the
server-side logs that explains the stale NFS handles.

Thanks for going through all the back-and-forth. :)

-Peff

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share @ 2025-05-18 22:56 Evaldas Svidras 0 siblings, 0 replies; 14+ messages in thread From: Evaldas Svidras @ 2025-05-18 22:56 UTC (permalink / raw) To: dpantel; +Cc: git, peff Efka ^ permalink raw reply [flat|nested] 14+ messages in thread