git.vger.kernel.org archive mirror
* [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Dmitriy Panteleyev @ 2024-11-30  4:58 UTC
  To: git

I've recently upgraded my (Linux Mint) distribution version, which
came with git v2.43.0 and I noticed that I can no longer `commit` on
any working directory which resides on an NFS share mount.

Git reports "Bus error (core dumped)" and dmesg shows multiple "NFS:
server error: fileid changed. fsid 0:68: expected fileid
0xf8e3d8e80230ddb5, got 0xeeb48230d99ed0d4" messages.

This does not happen if I move the working directory off the NFS share.

I attempted to upgrade git to v2.47.1, with the same result.

I then downgraded git to v2.34.1 (the version for the previous
distribution release) and the error has resolved.

This seems like a bug to me.


* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Jeff King @ 2024-11-30 16:38 UTC
  To: Dmitriy Panteleyev; +Cc: git

On Fri, Nov 29, 2024 at 09:58:51PM -0700, Dmitriy Panteleyev wrote:

> I've recently upgraded my (Linux Mint) distribution version, which
> came with git v2.43.0 and I noticed that I can no longer `commit` on
> any working directory which resides on an NFS share mount.
> 
> Git reports "Bus error (core dumped)" and dmesg shows multiple "NFS:
> server error: fileid changed. fsid 0:68: expected fileid
> 0xf8e3d8e80230ddb5, got 0xeeb48230d99ed0d4" messages.
> 
> This does not happen if I move the working directory off the NFS share.

I can't reproduce any problems here on a test NFS mount. But since the
old version works here:

> I attempted to upgrade git to v2.47.1, with the same result.
> 
> I then downgraded git to v2.34.1 (the version for the previous
> distribution release) and the error has resolved.

Can you try bisecting between v2.34.1 and v2.43.0 to see which commit
introduces the problem for you?

-Peff


* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Dmitriy Panteleyev @ 2024-12-01 17:17 UTC
  To: Jeff King; +Cc: git

On Sat, Nov 30, 2024 at 9:44 AM Jeff King <peff@peff.net> wrote:
>
> Can you try bisecting between v2.34.1 and v2.43.0 to see which commit
> introduces the problem for you?

Bisecting: 0 revisions left to test after this (roughly 0 steps)
[04fb96219abc0cbe46ba084997dc9066de3ac889] parse_object(): drop extra
"has" check before checking object type


* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Jeff King @ 2024-12-01 21:36 UTC
  To: Dmitriy Panteleyev; +Cc: git

On Sun, Dec 01, 2024 at 10:17:44AM -0700, Dmitriy Panteleyev wrote:

> > > I attempted to upgrade git to v2.47.1, with the same result.
> > >
> > > I then downgraded git to v2.34.1 (the version for the previous
> > > distribution release) and the error has resolved.
> >
> > Can you try bisecting between v2.34.1 and v2.43.0 to see which commit
> > introduces the problem for you?
> >
> > -Peff
> 
> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> [04fb96219abc0cbe46ba084997dc9066de3ac889] parse_object(): drop extra
> "has" check before checking object type

That seems like an unlikely commit to introduce the problem you're
seeing. And how did we end up with 0 revisions left to check, but no
final outcome? Did you need to do one more test and "git bisect
good/bad" on this commit?

Or alternatively, can you share what you're doing to test the bisection?
That might help us reproduce. I kind of wonder if the results might not
be deterministic, to end up at an apparently unrelated commit like that.

-Peff


* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Dmitriy Panteleyev @ 2024-12-01 22:24 UTC
  To: Jeff King; +Cc: git

On Sun, Dec 1, 2024 at 2:36 PM Jeff King <peff@peff.net> wrote:
>
> That seems like an unlikely commit to introduce the problem you're
> seeing. And how did we end up with 0 revisions left to check, but no
> final outcome? Did you need to do one more test and "git bisect
> good/bad" on this commit?
>

You are right, Jeff, I needed to run one more bisect. But it does point to
the commit I linked above. The bisect result is:

04fb96219abc0cbe46ba084997dc9066de3ac889 is the first bad commit
commit 04fb96219abc0cbe46ba084997dc9066de3ac889
Author: Jeff King <peff@peff.net>
Date:   Thu Nov 17 17:37:58 2022 -0500

    parse_object(): drop extra "has" check before checking object type

    When parsing an object of unknown type, we check to see if it's a blob,
    so we can use our streaming code path. This uses oid_object_info() to
    check the type, but before doing so we call repo_has_object_file(). This
    latter is pointless, as oid_object_info() will already fail if the
    object is missing. Checking it ahead of time just complicates the code
    and is a waste of resources (albeit small).

    Let's drop the redundant check.

    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Taylor Blau <me@ttaylorr.com>

 object.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

> Or alternatively, can you share what you're doing to test the bisection?
> That might help us reproduce. I kind of wonder if the results might not
> be deterministic, to end up at an apparently unrelated commit like that.
>
> -Peff

I am not at all familiar with the standard process for this, but the way I ran
the test is:

(0. cloned test project into /nfs/proj/ and made a change)
1. cloned git repo (from github) into /tmp/git/
2. ran bisect in /tmp/git/, starting with v2.34.1 (good) and v2.43.1 (bad)
3. ran `make all` in /tmp/git/
4. in /nfs/proj/ ran `/tmp/git/bin-wrappers/git commit -m 'test'`
5. repeated 2-4
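The manual loop above could be handed to `git bisect run`, which treats exit code 0 as good, 125 as "skip", any other code from 1 to 127 as bad, and aborts on anything else. A command killed by SIGBUS is usually reported with a status of 128 plus the signal number, which would stop the run instead of marking the revision bad. A minimal sketch of a hypothetical C helper, `bisect_classify()` (our own name, assuming a POSIX system), that runs a command and converts a crash into an ordinary "bad" verdict:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Exit codes that `git bisect run` understands. */
enum { BISECT_GOOD = 0, BISECT_BAD = 1, BISECT_SKIP = 125 };

/*
 * Hypothetical helper: run argv (argv[0] is looked up in PATH) and turn the
 * outcome into a bisect verdict. A crash such as SIGBUS becomes BISECT_BAD
 * rather than a 128+signo status, which would abort the bisect run.
 */
static int bisect_classify(char *const argv[])
{
	int status;
	pid_t pid;

	assert(argv != NULL && argv[0] != NULL);
	pid = fork();
	if (pid < 0)
		return BISECT_SKIP;          /* could not test this revision */
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(BISECT_SKIP);          /* exec failed: nothing was tested */
	}
	if (waitpid(pid, &status, 0) < 0)
		return BISECT_SKIP;
	if (WIFSIGNALED(status))
		return BISECT_BAD;           /* killed by SIGBUS, SIGSEGV, ... */
	if (WEXITSTATUS(status) == 0)
		return BISECT_GOOD;
	if (WEXITSTATUS(status) == BISECT_SKIP)
		return BISECT_SKIP;
	return BISECT_BAD;                   /* any other failure counts as bad */
}
```

Wrapped in a `main()` that returns `bisect_classify(argv + 1)`, it could be called from a small test script that rebuilds git (exiting 125 if `make` fails), changes into /nfs/proj, and runs the commit. Any non-zero exit of the tested command also counts as bad here, so the script has to make sure there is always something to commit.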


* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Jeff King @ 2024-12-02 20:34 UTC
  To: Dmitriy Panteleyev; +Cc: git

On Sun, Dec 01, 2024 at 03:24:35PM -0700, Dmitriy Panteleyev wrote:

> You are right, Jeff, I needed to run one more bisect. But it does point to
> the commit I linked above. The bisect result is:

Thanks for checking. I'm still puzzled how this commit:

> 04fb96219abc0cbe46ba084997dc9066de3ac889 is the first bad commit
> commit 04fb96219abc0cbe46ba084997dc9066de3ac889
> Author: Jeff King <peff@peff.net>
> Date:   Thu Nov 17 17:37:58 2022 -0500
> 
>     parse_object(): drop extra "has" check before checking object type
> 
>     When parsing an object of unknown type, we check to see if it's a blob,
>     so we can use our streaming code path. This uses oid_object_info() to
>     check the type, but before doing so we call repo_has_object_file(). This
>     latter is pointless, as oid_object_info() will already fail if the
>     object is missing. Checking it ahead of time just complicates the code
>     and is a waste of resources (albeit small).
> 
>     Let's drop the redundant check.

could be the culprit, though. The diff is just

diff --git a/object.c b/object.c
index 8a74eb85e9..16eb944e98 100644
--- a/object.c
+++ b/object.c
@@ -287,8 +287,7 @@ struct object *parse_object_with_flags(struct repository *r,
 	}
 
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
-	    (!obj && repo_has_object_file(r, oid) &&
-	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
+	    (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
 		if (!skip_hash && stream_object_signature(r, repl) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;

So it is actually doing _less_, though what it is removing is going to
just be a pack .idx lookup (or maybe a stat() call if the object is
loose).
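A toy model of the two versions of the `!obj` condition makes the "doing less" point concrete. The stubs `toy_has_object()` and `toy_object_info()` are hypothetical stand-ins for `repo_has_object_file()` and `oid_object_info()`; each one just counts a "probe", on the assumption that every real call costs one lookup in the object store:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object types; -1 plays the role of "object not found". */
enum toy_type { TOY_MISSING = -1, TOY_COMMIT = 1, TOY_BLOB = 3 };

static int probes;  /* simulated object-store lookups (stat, .idx search) */

/* Hypothetical stand-in for repo_has_object_file(). */
static bool toy_has_object(enum toy_type stored)
{
	probes++;
	return stored != TOY_MISSING;
}

/* Hypothetical stand-in for oid_object_info(): negative if missing. */
static int toy_object_info(enum toy_type stored)
{
	probes++;
	return (int)stored;
}

/* The !obj branch before 04fb96219a: existence check, then type lookup. */
static bool old_wants_stream(enum toy_type stored)
{
	return toy_has_object(stored) && toy_object_info(stored) == TOY_BLOB;
}

/* The !obj branch after 04fb96219a: the type lookup alone. */
static bool new_wants_stream(enum toy_type stored)
{
	return toy_object_info(stored) == TOY_BLOB;
}
```

In this model both versions make the same decision for every input; the old one simply touches the object store one extra time.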

> I am not at all familiar with the standard process for this, but the way I ran
> the test is:
> 
> (0. cloned test project into /nfs/proj/ and made a change)
> 1. cloned git repo (from github) into /tmp/git/
> 2. ran bisect in /tmp/git/, starting with v2.34.1 (good) and v2.43.1 (bad)
> 3. ran `make all` in /tmp/git/
> 4. in /nfs/proj/ ran `/tmp/git/bin-wrappers/git commit -m 'test'`
> 5. repeated 2-4

That sounds reasonable. I'm still not sure what's going on. It's always
possible that commit introduced a problem, but I just don't see it. So I
still have a suspicion (especially given that your symptom is a bus
error) that the problem might not be deterministic.

I wonder if building git with:

  make SANITIZE=address,undefined

and running the same test might yield anything useful.

-Peff


* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Dmitriy Panteleyev @ 2024-12-03  2:48 UTC
  To: Jeff King; +Cc: git

On Mon, Dec 2, 2024 at 1:41 PM Jeff King <peff@peff.net> wrote:
>
> I wonder if building git with:
>
>   make SANITIZE=address,undefined
>
> and running the same test might yield anything useful.

Not sure if this is useful, but this is what I got:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==155141==ERROR: AddressSanitizer: BUS on unknown address (pc 0x78811e863aed bp 0x7ffe9d5ac800 sp 0x7ffe9d5ac770 T0)
==155141==The signal is caused by a READ memory access.
==155141==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
    #0 0x78811e863aed in inflate (/lib/x86_64-linux-gnu/libz.so.1+0xfaed) (BuildId: bbefe2bbdc367b0c3cfbfcf80c579930496fb963)
    #1 0x563e32ec7e5f in git_inflate /tmp/git_tests/git/zlib.c:118
    #2 0x563e32bde431 in unpack_loose_header /tmp/git_tests/git/object-file.c:1271
    #3 0x563e32be429c in loose_object_info /tmp/git_tests/git/object-file.c:1474
    #4 0x563e32be5348 in do_oid_object_info_extended /tmp/git_tests/git/object-file.c:1582
    #5 0x563e32be5dac in oid_object_info_extended /tmp/git_tests/git/object-file.c:1640
    #6 0x563e32be5dac in oid_object_info /tmp/git_tests/git/object-file.c:1656
    #7 0x563e32bf8b57 in parse_object_with_flags /tmp/git_tests/git/object.c:290
    #8 0x563e32cfbd19 in write_ref_to_lockfile refs/files-backend.c:1772
    #9 0x563e32d0196e in lock_ref_for_update refs/files-backend.c:2582
    #10 0x563e32d0196e in files_transaction_prepare refs/files-backend.c:2755
    #11 0x563e32ce6800 in ref_transaction_prepare /tmp/git_tests/git/refs.c:2266
    #12 0x563e32ce6a5a in ref_transaction_commit /tmp/git_tests/git/refs.c:2315
    #13 0x563e32d8c44e in update_head_with_reflog /tmp/git_tests/git/sequencer.c:1197
    #14 0x563e326b2f51 in cmd_commit builtin/commit.c:1834
    #15 0x563e3263002a in run_builtin /tmp/git_tests/git/git.c:466
    #16 0x563e3263002a in handle_builtin /tmp/git_tests/git/git.c:721
    #17 0x563e32633ff8 in run_argv /tmp/git_tests/git/git.c:788
    #18 0x563e32633ff8 in cmd_main /tmp/git_tests/git/git.c:926
    #19 0x563e3262c6a4 in main /tmp/git_tests/git/common-main.c:57
    #20 0x78811d42a1c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #21 0x78811d42a28a in __libc_start_main_impl ../csu/libc-start.c:360
    #22 0x563e3262f6d4 in _start (/tmp/git_tests/git/git+0xa726d4) (BuildId: 197ee6cc3c63db9e10cfed4585ab78b52790454a)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: BUS (/lib/x86_64-linux-gnu/libz.so.1+0xfaed) (BuildId: bbefe2bbdc367b0c3cfbfcf80c579930496fb963) in inflate
==155141==ABORTING


* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Jeff King @ 2024-12-03 21:18 UTC
  To: Dmitriy Panteleyev; +Cc: git

On Mon, Dec 02, 2024 at 07:48:05PM -0700, Dmitriy Panteleyev wrote:

> > I wonder if building git with:
> >
> >   make SANITIZE=address,undefined
> >
> > and running the same test might yield anything useful.
> 
> Not sure if this is useful, but this is what I got:

Thanks. If you bisect with that command, does it end up on the same
commit?

> AddressSanitizer:DEADLYSIGNAL
> =================================================================
> ==155141==ERROR: AddressSanitizer: BUS on unknown address (pc 0x78811e863aed bp 0x7ffe9d5ac800 sp 0x7ffe9d5ac770 T0)
> ==155141==The signal is caused by a READ memory access.
> ==155141==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
>     #0 0x78811e863aed in inflate (/lib/x86_64-linux-gnu/libz.so.1+0xfaed) (BuildId: bbefe2bbdc367b0c3cfbfcf80c579930496fb963)
>     #1 0x563e32ec7e5f in git_inflate /tmp/git_tests/git/zlib.c:118
>     #2 0x563e32bde431 in unpack_loose_header /tmp/git_tests/git/object-file.c:1271
>     #3 0x563e32be429c in loose_object_info /tmp/git_tests/git/object-file.c:1474

Hmm. So we are inflating a loose object. It's mmap()-ed, so presumably
that is why you get the bus error (the underlying nfs system for
whatever reason is not able to provide the bytes).
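The mechanism described here (the mmap() call itself succeeds, and the failure only shows up as SIGBUS when a page is first touched) can be demonstrated locally without NFS. This is a sketch assuming Linux and a writable /tmp; it uses truncation to take the bytes away instead of a stale NFS handle, so it is an analogy rather than a reproduction of this bug, and `touch_mapping()` is a made-up name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define DEMO_LEN 4096

static sigjmp_buf recover;

static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(recover, 1);  /* abandon the faulting read */
}

/*
 * Hypothetical demo: map a temporary file read-only, optionally truncate it
 * so the mapped page has no backing bytes, then read through the mapping.
 * Returns 1 if the read raised SIGBUS, 0 if it did not, -1 on setup error.
 */
static int touch_mapping(int truncate_after_map)
{
	char path[] = "/tmp/sigbus-demo-XXXXXX";
	struct sigaction sa, old;
	volatile int faulted = 0;
	char *map;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);                       /* the open fd keeps the file alive */
	if (ftruncate(fd, DEMO_LEN) < 0) {
		close(fd);
		return -1;
	}
	map = mmap(NULL, DEMO_LEN, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}
	if (truncate_after_map && ftruncate(fd, 0) < 0) {
		munmap(map, DEMO_LEN);
		close(fd);
		return -1;
	}
	close(fd);                          /* the mapping outlives the fd */

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGBUS, &sa, &old);

	if (sigsetjmp(recover, 1) == 0)
		(void)*(volatile char *)map; /* mmap() succeeded; any fault is here */
	else
		faulted = 1;

	sigaction(SIGBUS, &old, NULL);
	munmap(map, DEMO_LEN);
	return faulted;
}
```

`touch_mapping(0)` reads a zero byte normally, while `touch_mapping(1)` faults on the very same read, even though every system call before it succeeded.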

I'm still super puzzled about why this would start happening, or how it
could be related to that commit. The rest of the stack here:

>     #4 0x563e32be5348 in do_oid_object_info_extended /tmp/git_tests/git/object-file.c:1582
>     #5 0x563e32be5dac in oid_object_info_extended /tmp/git_tests/git/object-file.c:1640
>     #6 0x563e32be5dac in oid_object_info /tmp/git_tests/git/object-file.c:1656
>     #7 0x563e32bf8b57 in parse_object_with_flags /tmp/git_tests/git/object.c:290

shows that we are coming from parse_object_with_flags(). Is it possible
that calling stat() somehow primes the nfs system to be better able to
serve the mmap'd data? That seems kind of weird.

Maybe one other thing to try. Build with:

  make NO_MMAP=1

(optionally with SANITIZE also). That should replace the mmap calls with
a compat wrapper that just reads into an internal buffer. I suspect that
will make your problem go away, though I'm not sure it gets us any
closer to understanding what's going wrong.
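A minimal sketch of such a compat wrapper, assuming only read-only private mappings are needed: allocate a buffer and fill it with pread(), so an I/O error is returned to the caller (with errno set) instead of surfacing later as SIGBUS. `read_mapping()` is a hypothetical name, not git's actual compat code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical NO_MMAP-style fallback: emulate a PROT_READ, MAP_PRIVATE
 * mapping of `length` bytes at `offset` by reading into a heap buffer.
 * Returns NULL with errno set on failure; the caller frees the buffer.
 */
static void *read_mapping(int fd, size_t length, off_t offset)
{
	char *buf;
	size_t n = 0;

	assert(length > 0);
	buf = malloc(length);
	if (!buf)
		return NULL;                    /* malloc sets errno (ENOMEM) */
	while (n < length) {
		ssize_t got = pread(fd, buf + n, length - n, offset + (off_t)n);

		if (got < 0) {
			int saved = errno;      /* e.g. ESTALE from a stale NFS handle */

			if (saved == EINTR)
				continue;       /* interrupted: just retry */
			free(buf);
			errno = saved;          /* report the real cause, not a fixed code */
			return NULL;
		}
		if (got == 0) {                 /* EOF: zero-fill, as a mapping would */
			memset(buf + n, 0, length - n);
			break;
		}
		n += (size_t)got;
	}
	return buf;
}
```

This sketch passes the real errno through. A wrapper that replaced it with a fixed value would make the final error message name that value rather than the underlying cause; that is one possible (unverified) explanation for seeing "Permission denied" when the read actually failed with a different error.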

What's the nfs server in your setup? Is it another Linux machine, or is
it some other implementation? Do you know which nfs version?

-Peff


* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Dmitriy Panteleyev @ 2024-12-05  2:21 UTC
  To: Jeff King; +Cc: git

On Tue, Dec 3, 2024 at 2:18 PM Jeff King <peff@peff.net> wrote:
>
> On Mon, Dec 02, 2024 at 07:48:05PM -0700, Dmitriy Panteleyev wrote:
>
> > > I wonder if building git with:
> > >
> > >   make SANITIZE=address,undefined
> > >
> > > and running the same test might yield anything useful.
> >
> > Not sure if this is useful, but this is what I got:
>
> Thanks. If you bisect with that command, does it end up on the same
> commit?

Yes. The immediate parent commit works just fine.

>
> Maybe one other thing to try. Build with:
>
>   make NO_MMAP=1
>
> (optionally with SANITIZE also). That should replace the mmap calls with
> a compat wrapper that just reads into an internal buffer. I suspect that
> will make your problem go away, though I'm not sure it gets us any
> closer to understanding what's going wrong.
>
> What's the nfs server in your setup? Is it another Linux machine, or is
> it some other implementation? Do you know which nfs version?
>
> -Peff

NFS server is on a Linux box on the LAN: nfs-kernel-server 2.6.1. The client
mounts the shares with vers=3.

After trying NO_MMAP=1 with and without SANITIZE, I get:
"fatal: mmap failed: Permission denied"

~D


* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Jeff King @ 2024-12-05  3:22 UTC
  To: Dmitriy Panteleyev; +Cc: git

On Wed, Dec 04, 2024 at 07:21:16PM -0700, Dmitriy Panteleyev wrote:

> NFS server is on a Linux box on the LAN: nfs-kernel-server 2.6.1. The client
> mounts the shares with vers=3.

My setup was a little different, but I tried the same thing doing an
actual cross-network mount of an older box with 2.6.2, and making sure
to use vers=3. Still can't reproduce.

> After trying NO_MMAP=1 with and without SANITIZE, I get:
> "fatal: mmap failed: Permission denied"

Hmm, that's odd. If you run it under strace, which syscall fails? That
message should be reporting errno from mmap(), which in NO_MMAP mode
should be a pread() call. I'm not sure why that would get EACCES if the
open() call succeeded, but that might explain why the mmap'd version
gets SIGBUS (I don't know much about NFS, but I imagine that under the
hood the client is probably issuing reads for individual pages to
fault in the map).

Does your system have AppArmor enabled?

This issue sounds similar to yours:

  https://unix.stackexchange.com/questions/633389/man-cannot-read-manpage-from-nfs-although-the-file-is-readable

especially the bit where reading the metadata once makes it magically
work for a brief period (which is the only thing I'd expect the commit
you found via bisection to have an effect on).

-Peff


* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Dmitriy Panteleyev @ 2024-12-05  3:59 UTC
  To: Jeff King; +Cc: git

On Wed, Dec 4, 2024 at 8:22 PM Jeff King <peff@peff.net> wrote:
>
> On Wed, Dec 04, 2024 at 07:21:16PM -0700, Dmitriy Panteleyev wrote:
>
> > After trying NO_MMAP=1 with and without SANITIZE, I get:
> > "fatal: mmap failed: Permission denied"
>
> Hmm, that's odd. If you run it under strace, which syscall fails? That
> message should be reporting errno from mmap(), which in NO_MMAP mode
> should be a pread() call. I'm not sure why that would get EACCES if the
> open() call succeeded, but that might explain why the mmap'd version
> gets SIGBUS (I don't know much about NFS, but I imagine that under the
> hood the client is probably issuing reads for individual pages to
> fault in the map).

Running it under strace with NO_MMAP=1 gives:

openat(AT_FDCWD, ".git/objects/34/5819b235838e219d66420b536a54ce4cf0624c", O_RDONLY|O_CLOEXEC) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=154, ...}) = 0
pread64(4, 0x61a0292e15d0, 154, 0)      = -1 ESTALE (Stale file handle)
write(2, "fatal: mmap failed: Permission d"..., 38) = 38

Weirdly, it's throwing ESTALE, not EACCES...

Without NO_MMAP, I get:

openat(AT_FDCWD, ".git/objects/51/da8e85661b60d7378b8ac0d896cfc955405fdf", O_RDONLY|O_CLOEXEC) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=154, ...}) = 0
mmap(NULL, 154, PROT_READ, MAP_PRIVATE, 4, 0) = 0x73ceb860e000
close(4)                                = 0
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x73ceb860e000} ---
+++ killed by SIGBUS (core dumped) +++


Also, it's odd that the same set of commands -- openat(), fstat(), and
pread64() / mmap() -- succeed multiple times before an error is
encountered.


>
> Does your system have AppArmor enabled?

Yes, but I don't see any profiles related to git. And I can't imagine
AppArmor would be version-dependent.



* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Dmitriy Panteleyev @ 2024-12-05  4:58 UTC
  To: Jeff King; +Cc: git

Hrm. I just spun up a couple of different VMs on my server with old
and new NFS versions, and git works fine from those shares.

I think we should put a pin in it, since I can't reproduce the problem
outside of my specific server instance.

Thanks for all the troubleshooting, Peff.


* Re: [BUG] commit fails with 'bus error' when working directory is on an NFS share
From: Jeff King @ 2024-12-05 19:13 UTC
  To: Dmitriy Panteleyev; +Cc: git

On Wed, Dec 04, 2024 at 08:59:03PM -0700, Dmitriy Panteleyev wrote:

> Running it under strace with NO_MMAP=1 gives:
> 
> openat(AT_FDCWD, ".git/objects/34/5819b235838e219d66420b536a54ce4cf0624c", O_RDONLY|O_CLOEXEC) = 4
> fstat(4, {st_mode=S_IFREG|0444, st_size=154, ...}) = 0
> pread64(4, 0x61a0292e15d0, 154, 0)      = -1 ESTALE (Stale file handle)
> write(2, "fatal: mmap failed: Permission d"..., 38) = 38
> 
> Weirdly, it's throwing ESTALE, not EACCES...

Ah, interesting. So yeah, it seems like there is some configuration
issue or other problem that is causing your NFS handles to time out, and
we get unexpected failures while reading. I _think_ that exonerates the
commit you found, as the code it removed was helping only by chance, by
creating slightly different filesystem access patterns.

> > Does your system have AppArmor enabled?
> 
> Yes, but I don't see any profiles related to git. And I can't imagine
> AppArmor would be version-dependent.

I think this was probably a long shot anyway. In the link I found it was
"man", which sensibly would have AppArmor profiles that disallow network
access. But clearly "git" would not have the same ones, since we expect
it to hit the network (not "git commit", but it is all one binary, so
AppArmor doesn't distinguish).

> Hrm. I just spun up a couple of different VMs on my server with old
> and new NFS versions, and git works fine from those shares.
> 
> I think we should put a pin in it, since I can't reproduce the problem
> outside of my specific server instance.

Yeah, that makes sense. You might find something interesting in the
server-side logs that explains the stale NFS handles.

Thanks for going through all the back-and-forth. :)

-Peff


