* [BUG] git diff --no-index segfaults on large files (NULL object database) @ 2026-04-04 10:39 Luca Stefani 2026-04-04 16:45 ` Tian Yuchen 0 siblings, 1 reply; 9+ messages in thread From: Luca Stefani @ 2026-04-04 10:39 UTC (permalink / raw) To: git Thank you for filling out a Git bug report! Please answer the following questions to help us understand your issue. What did you do before the bug happened? (Steps to reproduce your issue) `git diff --color -- file1 file2` SIGSEGV in case file1 and file2 are "big enough", that is when the file size is bigger than repo_settings_get_big_file_threshold() What did you expect to happen? (Expected behavior) Not a crash, it to say "binary files are different" What happened instead? (Actual behavior) Program received signal SIGSEGV, Segmentation fault. index_fd (istate=istate@entry=0x5555559ccb40, oid=oid@entry=0x5555559d3c80, fd=4, st=st@entry=0x7fffffffc5c0, type=type@entry=OBJ_BLOB, path=path@entry=0x5555559d3ce0 "4/root_part", flags=0) at /usr/src/debug/git/git/object-file.c:1634 1634 transaction = odb_transaction_begin(the_repository->objects); (gdb) bt #0 index_fd (istate=istate@entry=0x5555559ccb40, oid=oid@entry=0x5555559d3c80, fd=4, st=st@entry=0x7fffffffc5c0, type=type@entry=OBJ_BLOB, path=path@entry=0x5555559d3ce0 "4/root_part", flags=0) at /usr/src/debug/git/git/object-file.c:1634 #1 0x000055555579ab75 in index_path (istate=0x5555559ccb40, oid=0x5555559d3c80, path=0x5555559d3ce0 "4/root_part", st=0x7fffffffc5c0, flags=0) at /usr/src/debug/git/git/object-file.c:1658 #2 0x0000555555720f83 in diff_fill_oid_info (one=0x5555559d3c80, istate=0x5555559ccb40) at /usr/src/debug/git/git/diff.c:4690 #3 diff_fill_oid_info (one=one@entry=0x5555559d3c80, istate=0x5555559ccb40) at /usr/src/debug/git/git/diff.c:4679 #4 0x0000555555724aba in run_diff (p=0x5555559ce900, o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:4738 #5 diff_flush_patch (p=0x5555559ce900, o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:6247 #6 0x0000555555729850 in diff_flush_patch_all_file_pairs (o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:6802 #7 diff_flush (options=<optimized out>) at /usr/src/debug/git/git/diff.c:6942 #8 0x00005555555b0d34 in diff_no_index (revs=0x7fffffffcbd0, algop=0x55555598dbd0 <hash_algos+112>, implicit_no_index=<optimized out>, argc=2, argv=0x5555559ce2a0) at /usr/src/debug/git/git/diff-no-index.c:427 #9 cmd_diff (argc=<optimized out>, argv=0x5555559ce2a0, prefix=<optimized out>, repo=<optimized out>) at builtin/diff.c:516 #10 0x000055555555fe05 in run_builtin (p=0x555555994ec8 <commands.lto_priv+840>, argc=<optimized out>, argv=<optimized out>, repo=0x5555559a3c00 <the_repo.lto_priv>) at /usr/src/debug/git/git/git.c:506 #11 handle_builtin (args=args@entry=0x7fffffffd980) at /usr/src/debug/git/git/git.c:779 #12 0x00005555555612cc in run_argv (args=0x7fffffffd980) at /usr/src/debug/git/git/git.c:862 #13 cmd_main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/git/git/git.c:984 #14 0x000055555555d794 in main (argc=6, argv=0x7fffffffdc58) at /usr/src/debug/git/git/common-main.c:9 (gdb) p the_repository->objects $1 = (struct object_database *) 0x0 (gdb) p istate->repo->objects $2 = (struct object_database *) 0x0 (gdb) p istate->repo $3 = (struct repository *) 0x5555559a3c00 <the_repo.lto_priv> (gdb) p the_repository $4 = (struct repository *) 0x5555559a3c00 <the_repo.lto_priv> The same issue happens in master, the same path is taken and when we finally get into `odb_transaction_begin` where odb is NULL, and reading ->transaction causes the illegal segment access once again. Program received signal SIGSEGV, Segmentation fault. 0x00005555557c6283 in odb_transaction_begin (odb=0x0) at odb.c:1075 1075 if (odb->transaction) (gdb) bt #0 0x00005555557c6283 in odb_transaction_begin (odb=0x0) at odb.c:1075 #1 0x00005555557b94f9 in index_fd (istate=0x555555a88b60, oid=0x555555a8fcb0, fd=4, st=0x7fffffffc730, type=OBJ_BLOB, path=0x555555a8fd10 "4/root_part", flags=0) at object-file.c:1665 #2 0x00005555557b9651 in index_path (istate=0x555555a88b60, oid=0x555555a8fcb0, path=0x555555a8fd10 "4/root_part", st=0x7fffffffc730, flags=0) at object-file.c:1691 #3 0x0000555555730937 in diff_fill_oid_info (one=0x555555a8fcb0, istate=0x555555a88b60) at diff.c:4699 #4 0x0000555555730b81 in run_diff (p=0x555555a8a9a0, o=0x7fffffffd1a8) at diff.c:4747 #5 0x00005555557361d5 in diff_flush_patch (p=0x555555a8a9a0, o=0x7fffffffd1a8) at diff.c:6258 #6 0x0000555555737cae in diff_flush_patch_all_file_pairs (o=0x7fffffffd1a8) at diff.c:6813 #7 0x00005555557382b2 in diff_flush (options=0x7fffffffd1a8) at diff.c:6953 #8 0x00005555557228cc in diff_no_index (revs=0x7fffffffcbf0, algop=0x555555a49b50 <hash_algos+112>, implicit_no_index=1, argc=2, argv=0x555555a8a2f0) at diff-no-index.c:427 #9 0x00005555555c04b1 in cmd_diff (argc=5, argv=0x555555a8a2f0, prefix=0x0, repo=0x0) at builtin/diff.c:516 #10 0x0000555555574f47 in run_builtin (p=0x555555a4aaa8 <commands+840>, argc=5, argv=0x555555a8a2f0, repo=0x555555a7e620 <the_repo>) at git.c:506 #11 0x000055555557544e in handle_builtin (args=0x7fffffffdab0) at git.c:780 #12 0x0000555555575778 in run_argv (args=0x7fffffffdab0) at git.c:863 #13 0x0000555555575c0f in cmd_main (argc=5, argv=0x7fffffffdc50) at git.c:984 #14 0x00005555556a7414 in main (argc=6, argv=0x7fffffffdc48) at common-main.c:9 What's different between what you expected and what actually happened? Anything else you want to add: Please review the rest of the bug report below. You can delete any lines you don't wish to share. [System Info] git version: git version 2.53.0 cpu: x86_64 built from commit: 67ad42147a7acc2af6074753ebd03d904476118f sizeof-long: 8 sizeof-size_t: 8 shell-path: /bin/sh rust: enabled gettext: enabled libcurl: 8.18.0 OpenSSL: OpenSSL 3.6.1 27 Jan 2026 zlib-ng: 2.3.2 SHA-1: SHA1_DC SHA-256: SHA256_BLK default-ref-format: files default-hash: sha1 uname: Linux 6.19.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 19 Mar 2026 16:33:17 +0000 x86_64 compiler info: gnuc: 15.2 libc info: glibc: 2.43 $SHELL (typically, interactive shell): /usr/bin/fish [Enabled Hooks] not run from a git repository - no hooks to show ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database) 2026-04-04 10:39 [BUG] git diff --no-index segfaults on large files (NULL object database) Luca Stefani @ 2026-04-04 16:45 ` Tian Yuchen 2026-04-04 16:53 ` Luca Stefani 0 siblings, 1 reply; 9+ messages in thread From: Tian Yuchen @ 2026-04-04 16:45 UTC (permalink / raw) To: Luca Stefani, git On 4/4/26 18:39, Luca Stefani wrote: > Thank you for filling out a Git bug report! > Please answer the following questions to help us understand your issue. > > What did you do before the bug happened? (Steps to reproduce your issue) > > `git diff --color -- file1 file2` SIGSEGV in case file1 and file2 > are "big enough", that is when the file size is bigger than > repo_settings_get_big_file_threshold() > > What did you expect to happen? (Expected behavior) > > Not a crash, it to say "binary files are different" > > What happened instead? (Actual behavior) > > Program received signal SIGSEGV, Segmentation fault. > index_fd (istate=istate@entry=0x5555559ccb40, > oid=oid@entry=0x5555559d3c80, fd=4, st=st@entry=0x7fffffffc5c0, > type=type@entry=OBJ_BLOB, path=path@entry=0x5555559d3ce0 > "4/root_part", flags=0) at /usr/src/debug/git/git/object-file.c:1634 > 1634 transaction = > odb_transaction_begin(the_repository->objects); > (gdb) bt > #0 index_fd (istate=istate@entry=0x5555559ccb40, > oid=oid@entry=0x5555559d3c80, fd=4, st=st@entry=0x7fffffffc5c0, > type=type@entry=OBJ_BLOB, path=path@entry=0x5555559d3ce0 > "4/root_part", flags=0) > at /usr/src/debug/git/git/object-file.c:1634 > #1 0x000055555579ab75 in index_path (istate=0x5555559ccb40, > oid=0x5555559d3c80, path=0x5555559d3ce0 "4/root_part", > st=0x7fffffffc5c0, flags=0) at > /usr/src/debug/git/git/object-file.c:1658 > #2 0x0000555555720f83 in diff_fill_oid_info (one=0x5555559d3c80, > istate=0x5555559ccb40) at /usr/src/debug/git/git/diff.c:4690 > #3 diff_fill_oid_info (one=one@entry=0x5555559d3c80, > istate=0x5555559ccb40) at /usr/src/debug/git/git/diff.c:4679 > #4 0x0000555555724aba in run_diff (p=0x5555559ce900, > o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:4738 > #5 diff_flush_patch (p=0x5555559ce900, o=0x7fffffffd188) at > /usr/src/debug/git/git/diff.c:6247 > #6 0x0000555555729850 in diff_flush_patch_all_file_pairs > (o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:6802 > #7 diff_flush (options=<optimized out>) at /usr/src/debug/git/git/diff.c:6942 > #8 0x00005555555b0d34 in diff_no_index (revs=0x7fffffffcbd0, > algop=0x55555598dbd0 <hash_algos+112>, implicit_no_index=<optimized > out>, argc=2, argv=0x5555559ce2a0) at > /usr/src/debug/git/git/diff-no-index.c:427 > #9 cmd_diff (argc=<optimized out>, argv=0x5555559ce2a0, > prefix=<optimized out>, repo=<optimized out>) at builtin/diff.c:516 > #10 0x000055555555fe05 in run_builtin (p=0x555555994ec8 > <commands.lto_priv+840>, argc=<optimized out>, argv=<optimized out>, > repo=0x5555559a3c00 <the_repo.lto_priv>) at > /usr/src/debug/git/git/git.c:506 > #11 handle_builtin (args=args@entry=0x7fffffffd980) at > /usr/src/debug/git/git/git.c:779 > #12 0x00005555555612cc in run_argv (args=0x7fffffffd980) at > /usr/src/debug/git/git/git.c:862 > #13 cmd_main (argc=<optimized out>, argv=<optimized out>) at > /usr/src/debug/git/git/git.c:984 > #14 0x000055555555d794 in main (argc=6, argv=0x7fffffffdc58) at > /usr/src/debug/git/git/common-main.c:9 > (gdb) p the_repository->objects > $1 = (struct object_database *) 0x0 > (gdb) p istate->repo->objects > $2 = (struct object_database *) 0x0 > (gdb) p istate->repo > $3 = (struct repository *) 0x5555559a3c00 <the_repo.lto_priv> > (gdb) p the_repository > $4 = (struct repository *) 0x5555559a3c00 <the_repo.lto_priv> > > The same issue happens in master, the same path is taken and when we > finally get into > `odb_transaction_begin` where odb is NULL, and reading ->transaction > causes the illegal > segment access once again. > > Program received signal SIGSEGV, Segmentation fault. > 0x00005555557c6283 in odb_transaction_begin (odb=0x0) at odb.c:1075 > 1075 if (odb->transaction) > (gdb) bt > #0 0x00005555557c6283 in odb_transaction_begin (odb=0x0) at odb.c:1075 > #1 0x00005555557b94f9 in index_fd (istate=0x555555a88b60, > oid=0x555555a8fcb0, fd=4, st=0x7fffffffc730, type=OBJ_BLOB, > path=0x555555a8fd10 "4/root_part", flags=0) at object-file.c:1665 > #2 0x00005555557b9651 in index_path (istate=0x555555a88b60, > oid=0x555555a8fcb0, path=0x555555a8fd10 "4/root_part", > st=0x7fffffffc730, flags=0) at object-file.c:1691 > #3 0x0000555555730937 in diff_fill_oid_info (one=0x555555a8fcb0, > istate=0x555555a88b60) at diff.c:4699 > #4 0x0000555555730b81 in run_diff (p=0x555555a8a9a0, > o=0x7fffffffd1a8) at diff.c:4747 > #5 0x00005555557361d5 in diff_flush_patch (p=0x555555a8a9a0, > o=0x7fffffffd1a8) at diff.c:6258 > #6 0x0000555555737cae in diff_flush_patch_all_file_pairs > (o=0x7fffffffd1a8) at diff.c:6813 > #7 0x00005555557382b2 in diff_flush (options=0x7fffffffd1a8) at diff.c:6953 > #8 0x00005555557228cc in diff_no_index (revs=0x7fffffffcbf0, > algop=0x555555a49b50 <hash_algos+112>, implicit_no_index=1, argc=2, > argv=0x555555a8a2f0) at diff-no-index.c:427 > #9 0x00005555555c04b1 in cmd_diff (argc=5, argv=0x555555a8a2f0, > prefix=0x0, repo=0x0) at builtin/diff.c:516 > #10 0x0000555555574f47 in run_builtin (p=0x555555a4aaa8 > <commands+840>, argc=5, argv=0x555555a8a2f0, repo=0x555555a7e620 > <the_repo>) at git.c:506 > #11 0x000055555557544e in handle_builtin (args=0x7fffffffdab0) at git.c:780 > #12 0x0000555555575778 in run_argv (args=0x7fffffffdab0) at git.c:863 > #13 0x0000555555575c0f in cmd_main (argc=5, argv=0x7fffffffdc50) at git.c:984 > #14 0x00005555556a7414 in main (argc=6, argv=0x7fffffffdc48) at common-main.c:9 The problem is most likely to occur in object-file.c: ... int index_fd(struct index_state *istate, struct object_id *oid, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags) { int ret; /* * Call xsize_t() only when needed to avoid potentially unnecessary * die() for large files. */ if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, path)) { ret = index_stream_convert_blob(istate, oid, fd, path, flags); } else if (!S_ISREG(st->st_mode)) { ret = index_pipe(istate, oid, fd, type, path, flags); } else if ((st->st_size >= 0 && (size_t)st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) || type != OBJ_BLOB || (path && would_convert_to_git(istate, path))) { ret = index_core(istate, oid, fd, xsize_t(st->st_size), type, path, flags); } else { struct object_database *odb = the_repository->objects; struct odb_transaction_files *files_transaction; struct odb_transaction *transaction; transaction = odb_transaction_begin(odb); files_transaction = container_of(odb->transaction, struct odb_transaction_files, base); ret = index_blob_packfile_transaction(files_transaction, oid, fd, xsize_t(st->st_size), path, flags); odb_transaction_commit(transaction); } close(fd); return ret; } ... Note that in 'if (type == OBJ_BLOB && st->st_size > big_file_threshold)' block, there is no check to see if 'the_repository->objects' is NULL. It assumes that an object database is available, which is not true in no-repository scenarios. One possible solution would be as follows: only when 'flags & HASH_WRITE_OBJECT' (or something like that) is true does this indicate that the file needs to be written in; if it is false only the hash value is required and the file should not be written in. Will send a patch to fix it, soon. Thanks, Yuchen ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database) 2026-04-04 16:45 ` Tian Yuchen @ 2026-04-04 16:53 ` Luca Stefani 2026-04-04 17:07 ` Tian Yuchen 0 siblings, 1 reply; 9+ messages in thread From: Luca Stefani @ 2026-04-04 16:53 UTC (permalink / raw) To: Tian Yuchen; +Cc: git Thanks for looking into it. Locally, I simply check against null storage and it works just fine, flags is always 0 in my experiments so a check against INDEX_WRITE_OBJECT also worked. diff --git a/object-file.c b/object-file.c index f0b029ff0b..68303aa99c 100644 --- a/object-file.c +++ b/object-file.c @@ -1654,7 +1654,8 @@ int index_fd(struct index_state *istate, struct object_id *oid, } else if ((st->st_size >= 0 && (size_t)st->st_size <= repo_settings_get_big_file_threshold(istate->repo)) || type != OBJ_BLOB || - (path && would_convert_to_git(istate, path))) { + (path && would_convert_to_git(istate, path)) || + !(flags & INDEX_WRITE_OBJECT)) { ret = index_core(istate, oid, fd, xsize_t(st->st_size), type, path, flags); } else { Luca. On Sat, 4 Apr 2026 at 18:45, Tian Yuchen <cat@malon.dev> wrote: > > On 4/4/26 18:39, Luca Stefani wrote: > > Thank you for filling out a Git bug report! > > Please answer the following questions to help us understand your issue. > > > > What did you do before the bug happened? (Steps to reproduce your issue) > > > > `git diff --color -- file1 file2` SIGSEGV in case file1 and file2 > > are "big enough", that is when the file size is bigger than > > repo_settings_get_big_file_threshold() > > > > What did you expect to happen? (Expected behavior) > > > > Not a crash, it to say "binary files are different" > > > > What happened instead? (Actual behavior) > > > > Program received signal SIGSEGV, Segmentation fault. > > index_fd (istate=istate@entry=0x5555559ccb40, > > oid=oid@entry=0x5555559d3c80, fd=4, st=st@entry=0x7fffffffc5c0, > > type=type@entry=OBJ_BLOB, path=path@entry=0x5555559d3ce0 > > "4/root_part", flags=0) at /usr/src/debug/git/git/object-file.c:1634 > > 1634 transaction = > > odb_transaction_begin(the_repository->objects); > > (gdb) bt > > #0 index_fd (istate=istate@entry=0x5555559ccb40, > > oid=oid@entry=0x5555559d3c80, fd=4, st=st@entry=0x7fffffffc5c0, > > type=type@entry=OBJ_BLOB, path=path@entry=0x5555559d3ce0 > > "4/root_part", flags=0) > > at /usr/src/debug/git/git/object-file.c:1634 > > #1 0x000055555579ab75 in index_path (istate=0x5555559ccb40, > > oid=0x5555559d3c80, path=0x5555559d3ce0 "4/root_part", > > st=0x7fffffffc5c0, flags=0) at > > /usr/src/debug/git/git/object-file.c:1658 > > #2 0x0000555555720f83 in diff_fill_oid_info (one=0x5555559d3c80, > > istate=0x5555559ccb40) at /usr/src/debug/git/git/diff.c:4690 > > #3 diff_fill_oid_info (one=one@entry=0x5555559d3c80, > > istate=0x5555559ccb40) at /usr/src/debug/git/git/diff.c:4679 > > #4 0x0000555555724aba in run_diff (p=0x5555559ce900, > > o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:4738 > > #5 diff_flush_patch (p=0x5555559ce900, o=0x7fffffffd188) at > > /usr/src/debug/git/git/diff.c:6247 > > #6 0x0000555555729850 in diff_flush_patch_all_file_pairs > > (o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:6802 > > #7 diff_flush (options=<optimized out>) at /usr/src/debug/git/git/diff.c:6942 > > #8 0x00005555555b0d34 in diff_no_index (revs=0x7fffffffcbd0, > > algop=0x55555598dbd0 <hash_algos+112>, implicit_no_index=<optimized > > out>, argc=2, argv=0x5555559ce2a0) at > > /usr/src/debug/git/git/diff-no-index.c:427 > > #9 cmd_diff (argc=<optimized out>, argv=0x5555559ce2a0, > > prefix=<optimized out>, repo=<optimized out>) at builtin/diff.c:516 > > #10 0x000055555555fe05 in run_builtin (p=0x555555994ec8 > > <commands.lto_priv+840>, argc=<optimized out>, argv=<optimized out>, > > repo=0x5555559a3c00 <the_repo.lto_priv>) at > > /usr/src/debug/git/git/git.c:506 > > #11 handle_builtin (args=args@entry=0x7fffffffd980) at > > /usr/src/debug/git/git/git.c:779 > > #12 0x00005555555612cc in run_argv (args=0x7fffffffd980) at > > /usr/src/debug/git/git/git.c:862 > > #13 cmd_main (argc=<optimized out>, argv=<optimized out>) at > > /usr/src/debug/git/git/git.c:984 > > #14 0x000055555555d794 in main (argc=6, argv=0x7fffffffdc58) at > > /usr/src/debug/git/git/common-main.c:9 > > (gdb) p the_repository->objects > > $1 = (struct object_database *) 0x0 > > (gdb) p istate->repo->objects > > $2 = (struct object_database *) 0x0 > > (gdb) p istate->repo > > $3 = (struct repository *) 0x5555559a3c00 <the_repo.lto_priv> > > (gdb) p the_repository > > $4 = (struct repository *) 0x5555559a3c00 <the_repo.lto_priv> > > > > The same issue happens in master, the same path is taken and when we > > finally get into > > `odb_transaction_begin` where odb is NULL, and reading ->transaction > > causes the illegal > > segment access once again. > > > > Program received signal SIGSEGV, Segmentation fault. > > 0x00005555557c6283 in odb_transaction_begin (odb=0x0) at odb.c:1075 > > 1075 if (odb->transaction) > > (gdb) bt > > #0 0x00005555557c6283 in odb_transaction_begin (odb=0x0) at odb.c:1075 > > #1 0x00005555557b94f9 in index_fd (istate=0x555555a88b60, > > oid=0x555555a8fcb0, fd=4, st=0x7fffffffc730, type=OBJ_BLOB, > > path=0x555555a8fd10 "4/root_part", flags=0) at object-file.c:1665 > > #2 0x00005555557b9651 in index_path (istate=0x555555a88b60, > > oid=0x555555a8fcb0, path=0x555555a8fd10 "4/root_part", > > st=0x7fffffffc730, flags=0) at object-file.c:1691 > > #3 0x0000555555730937 in diff_fill_oid_info (one=0x555555a8fcb0, > > istate=0x555555a88b60) at diff.c:4699 > > #4 0x0000555555730b81 in run_diff (p=0x555555a8a9a0, > > o=0x7fffffffd1a8) at diff.c:4747 > > #5 0x00005555557361d5 in diff_flush_patch (p=0x555555a8a9a0, > > o=0x7fffffffd1a8) at diff.c:6258 > > #6 0x0000555555737cae in diff_flush_patch_all_file_pairs > > (o=0x7fffffffd1a8) at diff.c:6813 > > #7 0x00005555557382b2 in diff_flush (options=0x7fffffffd1a8) at diff.c:6953 > > #8 0x00005555557228cc in diff_no_index (revs=0x7fffffffcbf0, > > algop=0x555555a49b50 <hash_algos+112>, implicit_no_index=1, argc=2, > > argv=0x555555a8a2f0) at diff-no-index.c:427 > > #9 0x00005555555c04b1 in cmd_diff (argc=5, argv=0x555555a8a2f0, > > prefix=0x0, repo=0x0) at builtin/diff.c:516 > > #10 0x0000555555574f47 in run_builtin (p=0x555555a4aaa8 > > <commands+840>, argc=5, argv=0x555555a8a2f0, repo=0x555555a7e620 > > <the_repo>) at git.c:506 > > #11 0x000055555557544e in handle_builtin (args=0x7fffffffdab0) at git.c:780 > > #12 0x0000555555575778 in run_argv (args=0x7fffffffdab0) at git.c:863 > > #13 0x0000555555575c0f in cmd_main (argc=5, argv=0x7fffffffdc50) at git.c:984 > > #14 0x00005555556a7414 in main (argc=6, argv=0x7fffffffdc48) at common-main.c:9 > > The problem is most likely to occur in object-file.c: > > ... > int index_fd(struct index_state *istate, struct object_id *oid, > int fd, struct stat *st, > enum object_type type, const char *path, unsigned flags) > { > int ret; > > /* > * Call xsize_t() only when needed to avoid potentially unnecessary > * die() for large files. > */ > if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate, > path)) { > ret = index_stream_convert_blob(istate, oid, fd, path, flags); > } else if (!S_ISREG(st->st_mode)) { > ret = index_pipe(istate, oid, fd, type, path, flags); > } else if ((st->st_size >= 0 && > (size_t)st->st_size <= > repo_settings_get_big_file_threshold(istate->repo)) || > type != OBJ_BLOB || > (path && would_convert_to_git(istate, path))) { > ret = index_core(istate, oid, fd, xsize_t(st->st_size), > type, path, flags); > } else { > struct object_database *odb = the_repository->objects; > struct odb_transaction_files *files_transaction; > struct odb_transaction *transaction; > > transaction = odb_transaction_begin(odb); > files_transaction = container_of(odb->transaction, > struct odb_transaction_files, > base); > ret = index_blob_packfile_transaction(files_transaction, oid, fd, > xsize_t(st->st_size), > path, flags); > odb_transaction_commit(transaction); > } > > close(fd); > return ret; > } > ... > > Note that in 'if (type == OBJ_BLOB && st->st_size > big_file_threshold)' > block, there is no check to see if 'the_repository->objects' is NULL. It > assumes that an object database is available, which is not true in > no-repository scenarios. > > One possible solution would be as follows: only when 'flags & > HASH_WRITE_OBJECT' (or something like that) is true does this indicate > that the file needs to be written in; if it is false only the hash value > is required and the file should not be written in. > > Will send a patch to fix it, soon. > > Thanks, Yuchen ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database) 2026-04-04 16:53 ` Luca Stefani @ 2026-04-04 17:07 ` Tian Yuchen 2026-04-04 23:09 ` Jeff King 0 siblings, 1 reply; 9+ messages in thread From: Tian Yuchen @ 2026-04-04 17:07 UTC (permalink / raw) To: Luca Stefani; +Cc: git On 4/5/26 00:53, Luca Stefani wrote: > Thanks for looking into it. > Locally, I simply check against null storage and it works just fine, > flags is always 0 in my experiments so a check against > INDEX_WRITE_OBJECT also worked. > > diff --git a/object-file.c b/object-file.c > index f0b029ff0b..68303aa99c 100644 > --- a/object-file.c > +++ b/object-file.c > @@ -1654,7 +1654,8 @@ int index_fd(struct index_state *istate, struct > object_id *oid, > } else if ((st->st_size >= 0 && > (size_t)st->st_size <= > repo_settings_get_big_file_threshold(istate->repo)) || > type != OBJ_BLOB || > - (path && would_convert_to_git(istate, path))) { > + (path && would_convert_to_git(istate, path)) || > + !(flags & INDEX_WRITE_OBJECT)) { > ret = index_core(istate, oid, fd, xsize_t(st->st_size), > type, path, flags); > } else { > > Luca. That looks good, almost exactly what I was about to send. I was mistaken— there isn’t a hash_write_object flag after all ;-) It looks like this is your first time posting on the Git mailing list. Would you consider contributing this (as a patch)? Thanks! Yuchen ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database) 2026-04-04 17:07 ` Tian Yuchen @ 2026-04-04 23:09 ` Jeff King 2026-04-05 2:48 ` Tian Yuchen 2026-04-06 17:57 ` Justin Tobler 0 siblings, 2 replies; 9+ messages in thread From: Jeff King @ 2026-04-04 23:09 UTC (permalink / raw) To: Tian Yuchen; +Cc: Justin Tobler, Luca Stefani, git On Sun, Apr 05, 2026 at 01:07:27AM +0800, Tian Yuchen wrote: > On 4/5/26 00:53, Luca Stefani wrote: > > Thanks for looking into it. > > Locally, I simply check against null storage and it works just fine, > > flags is always 0 in my experiments so a check against > > INDEX_WRITE_OBJECT also worked. > > > > diff --git a/object-file.c b/object-file.c > > index f0b029ff0b..68303aa99c 100644 > > --- a/object-file.c > > +++ b/object-file.c > > @@ -1654,7 +1654,8 @@ int index_fd(struct index_state *istate, struct > > object_id *oid, > > } else if ((st->st_size >= 0 && > > (size_t)st->st_size <= > > repo_settings_get_big_file_threshold(istate->repo)) || > > type != OBJ_BLOB || > > - (path && would_convert_to_git(istate, path))) { > > + (path && would_convert_to_git(istate, path)) || > > + !(flags & INDEX_WRITE_OBJECT)) { > > ret = index_core(istate, oid, fd, xsize_t(st->st_size), > > type, path, flags); > > } else { > > > > Luca. > > That looks good, almost exactly what I was about to send. I was mistaken— > there isn’t a hash_write_object flag after all ;-) > > It looks like this is your first time posting on the Git mailing list. Would > you consider contributing this (as a patch)?s Alternatively, should the odb transaction system be more forgiving here, and act as a noop when there is no odb? Bisecting the segfault yields ce1661f9da (odb: add transaction interface, 2025-09-16). Before then, we passed around the object_database itself, saw that its transaction field was NULL, and returned immediately. After that commit, we pass the object_databse to odb_transaction_begin(), which narrows it to odb->sources (which is NULL) while passing to object_file_transaction_begin(). And then that function looks at source->odb to go back to the object_database! But the source being NULL, it segfaults. Immediately after that commit, the switch from taking an odb to a source is not helpful, though I think eventually it is used to set transaction->base.source. But should the whole thing check for a NULL source and return early? Or otherwise establish some kind of noop transaction? I haven't thought about the implications (nor even really looked at odb transaction code before). But doing it that way would fix not only this bug, but also other potential bugs throughout the code base when callers start a noop transaction. +cc Justin (author of ce1661f9da) for any thoughts. -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database) 2026-04-04 23:09 ` Jeff King @ 2026-04-05 2:48 ` Tian Yuchen 2026-04-05 6:14 ` Jeff King 2026-04-06 17:57 ` Justin Tobler 1 sibling, 1 reply; 9+ messages in thread From: Tian Yuchen @ 2026-04-05 2:48 UTC (permalink / raw) To: Jeff King; +Cc: Justin Tobler, Luca Stefani, git On 4/5/26 07:09, Jeff King wrote: > Alternatively, should the odb transaction system be more forgiving here, > and act as a noop when there is no odb? > > Bisecting the segfault yields ce1661f9da (odb: add transaction > interface, 2025-09-16). Before then, we passed around the > object_database itself, saw that its transaction field was NULL, and > returned immediately. After that commit, we pass the object_databse to > odb_transaction_begin(), which narrows it to odb->sources (which is > NULL) while passing to object_file_transaction_begin(). And then that > function looks at source->odb to go back to the object_database! But > the source being NULL, it segfaults. > > Immediately after that commit, the switch from taking an odb to a source > is not helpful, though I think eventually it is used to set > transaction->base.source. But should the whole thing check for a NULL > source and return early? Or otherwise establish some kind of noop > transaction? > In a nutshell, are you suggesting that odb_transaction_begin() should return silently when source is NULL? I understand and agree with your intention, but in practical terms, if the odb has not been initialised correctly, data will be lost silently here and the caller will not receive an error message. I’m concerned that doing this might lead to more bugs in certain situations Would it make more sense to treat this as a bug instead, e.g. by triggering a BUG() when source is NULL? > I haven't thought about the implications (nor even really looked at odb > transaction code before). But doing it that way would fix not only this > bug, but also other potential bugs throughout the code base when callers > start a noop transaction. > > +cc Justin (author of ce1661f9da) for any thoughts. > > -Peff Thanks! Yuchen ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database) 2026-04-05 2:48 ` Tian Yuchen @ 2026-04-05 6:14 ` Jeff King 0 siblings, 0 replies; 9+ messages in thread From: Jeff King @ 2026-04-05 6:14 UTC (permalink / raw) To: Tian Yuchen; +Cc: Justin Tobler, Luca Stefani, git On Sun, Apr 05, 2026 at 10:48:16AM +0800, Tian Yuchen wrote: > > Immediately after that commit, the switch from taking an odb to a source > > is not helpful, though I think eventually it is used to set > > transaction->base.source. But should the whole thing check for a NULL > > source and return early? Or otherwise establish some kind of noop > > transaction? > > > > In a nutshell, are you suggesting that odb_transaction_begin() should return > silently when source is NULL? > > I understand and agree with your intention, but in practical terms, if the > odb has not been initialised correctly, data will be lost silently here and > the caller will not receive an error message. I’m concerned that doing this > might lead to more bugs in certain situations The source of data loss is not the transaction begin/end, though, but the actual write operation inside it. If I do this: t = odb_transaction_begin(the_repository->objects); if (startup_info->have_repository) odb_write_object(the_repository->objects, ...); odb_transaction_commit(t); should it be an error? We chose not to write because we knew we did not have an odb, so there is no data to be lost. Starting or ending a transaction without an odb _is_ a noop, because there is nothing that can or will be written. It is asking to write that is the error. In the example above you could obviously push the transaction begin/end into the conditional. But that's harder when the "if" part is happening inside a helper function, which is exactly what's going on in index_fd() here. It passes the flags (which would omit INDEX_WRITE_OBJECT) down to index_blob_packfile_transaction(), which then goes through all its regular motions without touching the odb. This used to work because begin_odb_transaction() did quietly return NULL, and index_fd() did not have to care about the flag at all. > Would it make more sense to treat this as a bug instead, e.g. by > triggering a BUG() when source is NULL? I don't think so, if the point is to be kind to callers. We should still BUG() on an actual write when there is no odb or repo, of course. I looked over the other callers of odb_transaction_begin() and I think this is probably the only affected caller. But I don't think it's purely academic. If you teach index_fd() to use index_core() when the INDEX_WRITE_OBJECT is not set (as proposed earlier in this thread), I think you'll fall afoul of the comment at the top of the function: /* * Call xsize_t() only when needed to avoid potentially unnecessary * die() for large files. */ E.g., on a 32-bit system we'd die() on a 4GB file now (because we can't fit its size in a size_t anymore), and probably even smaller (because we'd try to mmap() it and there wouldn't be a large enough contiguous chunk of the address space). So we really do want to follow the streaming path, even if we're not writing out the object. -Peff ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database) 2026-04-04 23:09 ` Jeff King 2026-04-05 2:48 ` Tian Yuchen @ 2026-04-06 17:57 ` Justin Tobler 2026-04-06 20:45 ` Junio C Hamano 1 sibling, 1 reply; 9+ messages in thread From: Justin Tobler @ 2026-04-06 17:57 UTC (permalink / raw) To: Jeff King; +Cc: Tian Yuchen, Luca Stefani, git On 26/04/04 07:09PM, Jeff King wrote: > Alternatively, should the odb transaction system be more forgiving here, > and act as a noop when there is no odb? > > Bisecting the segfault yields ce1661f9da (odb: add transaction > interface, 2025-09-16). Before then, we passed around the > object_database itself, saw that its transaction field was NULL, and > returned immediately. After that commit, we pass the object_databse to > odb_transaction_begin(), which narrows it to odb->sources (which is > NULL) while passing to object_file_transaction_begin(). And then that > function looks at source->odb to go back to the object_database! But > the source being NULL, it segfaults. > > Immediately after that commit, the switch from taking an odb to a source > is not helpful, though I think eventually it is used to set > transaction->base.source. But should the whole thing check for a NULL > source and return early? Or otherwise establish some kind of noop > transaction? > > I haven't thought about the implications (nor even really looked at odb > transaction code before). But doing it that way would fix not only this > bug, but also other potential bugs throughout the code base when callers > start a noop transaction. > > +cc Justin (author of ce1661f9da) for any thoughts. Thanks Peff for the cc! :) Ok so the segfault in question can be reproduced with the following: echo foo >foo echo bar >bar git -c core.bigFileThreshold=1 diff -- foo bar And indeed the problematic code path is the `odb_transaction_begin()` invoked via `object-file.c:index_fd()`. In a scenario where git-diff(1) is being executed outside of a repository and involves files that exceed "core.bigFileThreshold", `index_blob_packfile_transaction()` is invoked without the `INDEX_WRITE_OBJECT` flag to stream the "large" object object and generate its hash only. I unfortunately didn't consider this use case. IMO it is already questionable as to why we would want to start an ODB transaction if it is already known that the object won't be written. IOW, if we are only interested in streaming the object to get its hash we shouldn't have to start a transaction. As part of a patch series I'm currently working on [1], a separate `hash_blob_stream()` helper is introduced to just hash the blob and is used to bypass the ODB transaction entirely. Building/testing with this patch does appear to fix the issue. This series is only in "seen" right now, so I doubt it would make this next release. I don't mind extracting this patch from its current series and submitting it as a fix if this the route we want to go though. The other option would be as Peff suggested and to make `odb_transaction_begin()` just return early with NULL or some type of no-op transaction if there is no ODB set up. I do think the former approach would be preferable though as I'm not sure there is really a good use case for supporting ODB transactions when there isn't an ODB set up. -Justin [1]: https://lore.kernel.org/git/20260402213220.2651523-5-jltobler@gmail.com/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database) 2026-04-06 17:57 ` Justin Tobler @ 2026-04-06 20:45 ` Junio C Hamano 0 siblings, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2026-04-06 20:45 UTC (permalink / raw) To: Justin Tobler; +Cc: Jeff King, Tian Yuchen, Luca Stefani, git Justin Tobler <jltobler@gmail.com> writes: > IMO it is already questionable as to why we would want to start an ODB > transaction if it is already known that the object won't be written. > IOW, if we are only interested in streaming the object to get its hash > we shouldn't have to start a transaction. Very well. > The other option would be as Peff suggested and to make > `odb_transaction_begin()` just return early with NULL or some type of > no-op transaction if there is no ODB set up. I do think the former > approach would be preferable though as I'm not sure there is really a > good use case for supporting ODB transactions when there isn't an ODB > set up. Let's not go there. I agree that odb transactions without odb does not make any sense. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-04-06 20:45 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-04 10:39 [BUG] git diff --no-index segfaults on large files (NULL object database) Luca Stefani 2026-04-04 16:45 ` Tian Yuchen 2026-04-04 16:53 ` Luca Stefani 2026-04-04 17:07 ` Tian Yuchen 2026-04-04 23:09 ` Jeff King 2026-04-05 2:48 ` Tian Yuchen 2026-04-05 6:14 ` Jeff King 2026-04-06 17:57 ` Justin Tobler 2026-04-06 20:45 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox