* [BUG] git diff --no-index segfaults on large files (NULL object database)
@ 2026-04-04 10:39 Luca Stefani
2026-04-04 16:45 ` Tian Yuchen
0 siblings, 1 reply; 9+ messages in thread
From: Luca Stefani @ 2026-04-04 10:39 UTC (permalink / raw)
To: git
Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.
What did you do before the bug happened? (Steps to reproduce your issue)
`git diff --color -- file1 file2` SIGSEGV in case file1 and file2
are "big enough", that is when the file size is bigger than
repo_settings_get_big_file_threshold()
What did you expect to happen? (Expected behavior)
Not a crash, it to say "binary files are different"
What happened instead? (Actual behavior)
Program received signal SIGSEGV, Segmentation fault.
index_fd (istate=istate@entry=0x5555559ccb40,
oid=oid@entry=0x5555559d3c80, fd=4, st=st@entry=0x7fffffffc5c0,
type=type@entry=OBJ_BLOB, path=path@entry=0x5555559d3ce0
"4/root_part", flags=0) at /usr/src/debug/git/git/object-file.c:1634
1634 transaction =
odb_transaction_begin(the_repository->objects);
(gdb) bt
#0 index_fd (istate=istate@entry=0x5555559ccb40,
oid=oid@entry=0x5555559d3c80, fd=4, st=st@entry=0x7fffffffc5c0,
type=type@entry=OBJ_BLOB, path=path@entry=0x5555559d3ce0
"4/root_part", flags=0)
at /usr/src/debug/git/git/object-file.c:1634
#1 0x000055555579ab75 in index_path (istate=0x5555559ccb40,
oid=0x5555559d3c80, path=0x5555559d3ce0 "4/root_part",
st=0x7fffffffc5c0, flags=0) at
/usr/src/debug/git/git/object-file.c:1658
#2 0x0000555555720f83 in diff_fill_oid_info (one=0x5555559d3c80,
istate=0x5555559ccb40) at /usr/src/debug/git/git/diff.c:4690
#3 diff_fill_oid_info (one=one@entry=0x5555559d3c80,
istate=0x5555559ccb40) at /usr/src/debug/git/git/diff.c:4679
#4 0x0000555555724aba in run_diff (p=0x5555559ce900,
o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:4738
#5 diff_flush_patch (p=0x5555559ce900, o=0x7fffffffd188) at
/usr/src/debug/git/git/diff.c:6247
#6 0x0000555555729850 in diff_flush_patch_all_file_pairs
(o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:6802
#7 diff_flush (options=<optimized out>) at /usr/src/debug/git/git/diff.c:6942
#8 0x00005555555b0d34 in diff_no_index (revs=0x7fffffffcbd0,
algop=0x55555598dbd0 <hash_algos+112>, implicit_no_index=<optimized
out>, argc=2, argv=0x5555559ce2a0) at
/usr/src/debug/git/git/diff-no-index.c:427
#9 cmd_diff (argc=<optimized out>, argv=0x5555559ce2a0,
prefix=<optimized out>, repo=<optimized out>) at builtin/diff.c:516
#10 0x000055555555fe05 in run_builtin (p=0x555555994ec8
<commands.lto_priv+840>, argc=<optimized out>, argv=<optimized out>,
repo=0x5555559a3c00 <the_repo.lto_priv>) at
/usr/src/debug/git/git/git.c:506
#11 handle_builtin (args=args@entry=0x7fffffffd980) at
/usr/src/debug/git/git/git.c:779
#12 0x00005555555612cc in run_argv (args=0x7fffffffd980) at
/usr/src/debug/git/git/git.c:862
#13 cmd_main (argc=<optimized out>, argv=<optimized out>) at
/usr/src/debug/git/git/git.c:984
#14 0x000055555555d794 in main (argc=6, argv=0x7fffffffdc58) at
/usr/src/debug/git/git/common-main.c:9
(gdb) p the_repository->objects
$1 = (struct object_database *) 0x0
(gdb) p istate->repo->objects
$2 = (struct object_database *) 0x0
(gdb) p istate->repo
$3 = (struct repository *) 0x5555559a3c00 <the_repo.lto_priv>
(gdb) p the_repository
$4 = (struct repository *) 0x5555559a3c00 <the_repo.lto_priv>
The same issue happens in master, the same path is taken and when we
finally get into
`odb_transaction_begin` where odb is NULL, and reading ->transaction
causes the illegal
segment access once again.
Program received signal SIGSEGV, Segmentation fault.
0x00005555557c6283 in odb_transaction_begin (odb=0x0) at odb.c:1075
1075 if (odb->transaction)
(gdb) bt
#0 0x00005555557c6283 in odb_transaction_begin (odb=0x0) at odb.c:1075
#1 0x00005555557b94f9 in index_fd (istate=0x555555a88b60,
oid=0x555555a8fcb0, fd=4, st=0x7fffffffc730, type=OBJ_BLOB,
path=0x555555a8fd10 "4/root_part", flags=0) at object-file.c:1665
#2 0x00005555557b9651 in index_path (istate=0x555555a88b60,
oid=0x555555a8fcb0, path=0x555555a8fd10 "4/root_part",
st=0x7fffffffc730, flags=0) at object-file.c:1691
#3 0x0000555555730937 in diff_fill_oid_info (one=0x555555a8fcb0,
istate=0x555555a88b60) at diff.c:4699
#4 0x0000555555730b81 in run_diff (p=0x555555a8a9a0,
o=0x7fffffffd1a8) at diff.c:4747
#5 0x00005555557361d5 in diff_flush_patch (p=0x555555a8a9a0,
o=0x7fffffffd1a8) at diff.c:6258
#6 0x0000555555737cae in diff_flush_patch_all_file_pairs
(o=0x7fffffffd1a8) at diff.c:6813
#7 0x00005555557382b2 in diff_flush (options=0x7fffffffd1a8) at diff.c:6953
#8 0x00005555557228cc in diff_no_index (revs=0x7fffffffcbf0,
algop=0x555555a49b50 <hash_algos+112>, implicit_no_index=1, argc=2,
argv=0x555555a8a2f0) at diff-no-index.c:427
#9 0x00005555555c04b1 in cmd_diff (argc=5, argv=0x555555a8a2f0,
prefix=0x0, repo=0x0) at builtin/diff.c:516
#10 0x0000555555574f47 in run_builtin (p=0x555555a4aaa8
<commands+840>, argc=5, argv=0x555555a8a2f0, repo=0x555555a7e620
<the_repo>) at git.c:506
#11 0x000055555557544e in handle_builtin (args=0x7fffffffdab0) at git.c:780
#12 0x0000555555575778 in run_argv (args=0x7fffffffdab0) at git.c:863
#13 0x0000555555575c0f in cmd_main (argc=5, argv=0x7fffffffdc50) at git.c:984
#14 0x00005555556a7414 in main (argc=6, argv=0x7fffffffdc48) at common-main.c:9
What's different between what you expected and what actually happened?
Anything else you want to add:
Please review the rest of the bug report below.
You can delete any lines you don't wish to share.
[System Info]
git version:
git version 2.53.0
cpu: x86_64
built from commit: 67ad42147a7acc2af6074753ebd03d904476118f
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
rust: enabled
gettext: enabled
libcurl: 8.18.0
OpenSSL: OpenSSL 3.6.1 27 Jan 2026
zlib-ng: 2.3.2
SHA-1: SHA1_DC
SHA-256: SHA256_BLK
default-ref-format: files
default-hash: sha1
uname: Linux 6.19.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 19 Mar 2026
16:33:17 +0000 x86_64
compiler info: gnuc: 15.2
libc info: glibc: 2.43
$SHELL (typically, interactive shell): /usr/bin/fish
[Enabled Hooks]
not run from a git repository - no hooks to show
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database)
2026-04-04 10:39 [BUG] git diff --no-index segfaults on large files (NULL object database) Luca Stefani
@ 2026-04-04 16:45 ` Tian Yuchen
2026-04-04 16:53 ` Luca Stefani
0 siblings, 1 reply; 9+ messages in thread
From: Tian Yuchen @ 2026-04-04 16:45 UTC (permalink / raw)
To: Luca Stefani, git
On 4/4/26 18:39, Luca Stefani wrote:
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)
>
> `git diff --color -- file1 file2` SIGSEGV in case file1 and file2
> are "big enough", that is when the file size is bigger than
> repo_settings_get_big_file_threshold()
>
> What did you expect to happen? (Expected behavior)
>
> Not a crash, it to say "binary files are different"
>
> What happened instead? (Actual behavior)
>
> Program received signal SIGSEGV, Segmentation fault.
> index_fd (istate=istate@entry=0x5555559ccb40,
> oid=oid@entry=0x5555559d3c80, fd=4, st=st@entry=0x7fffffffc5c0,
> type=type@entry=OBJ_BLOB, path=path@entry=0x5555559d3ce0
> "4/root_part", flags=0) at /usr/src/debug/git/git/object-file.c:1634
> 1634 transaction =
> odb_transaction_begin(the_repository->objects);
> (gdb) bt
> #0 index_fd (istate=istate@entry=0x5555559ccb40,
> oid=oid@entry=0x5555559d3c80, fd=4, st=st@entry=0x7fffffffc5c0,
> type=type@entry=OBJ_BLOB, path=path@entry=0x5555559d3ce0
> "4/root_part", flags=0)
> at /usr/src/debug/git/git/object-file.c:1634
> #1 0x000055555579ab75 in index_path (istate=0x5555559ccb40,
> oid=0x5555559d3c80, path=0x5555559d3ce0 "4/root_part",
> st=0x7fffffffc5c0, flags=0) at
> /usr/src/debug/git/git/object-file.c:1658
> #2 0x0000555555720f83 in diff_fill_oid_info (one=0x5555559d3c80,
> istate=0x5555559ccb40) at /usr/src/debug/git/git/diff.c:4690
> #3 diff_fill_oid_info (one=one@entry=0x5555559d3c80,
> istate=0x5555559ccb40) at /usr/src/debug/git/git/diff.c:4679
> #4 0x0000555555724aba in run_diff (p=0x5555559ce900,
> o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:4738
> #5 diff_flush_patch (p=0x5555559ce900, o=0x7fffffffd188) at
> /usr/src/debug/git/git/diff.c:6247
> #6 0x0000555555729850 in diff_flush_patch_all_file_pairs
> (o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:6802
> #7 diff_flush (options=<optimized out>) at /usr/src/debug/git/git/diff.c:6942
> #8 0x00005555555b0d34 in diff_no_index (revs=0x7fffffffcbd0,
> algop=0x55555598dbd0 <hash_algos+112>, implicit_no_index=<optimized
> out>, argc=2, argv=0x5555559ce2a0) at
> /usr/src/debug/git/git/diff-no-index.c:427
> #9 cmd_diff (argc=<optimized out>, argv=0x5555559ce2a0,
> prefix=<optimized out>, repo=<optimized out>) at builtin/diff.c:516
> #10 0x000055555555fe05 in run_builtin (p=0x555555994ec8
> <commands.lto_priv+840>, argc=<optimized out>, argv=<optimized out>,
> repo=0x5555559a3c00 <the_repo.lto_priv>) at
> /usr/src/debug/git/git/git.c:506
> #11 handle_builtin (args=args@entry=0x7fffffffd980) at
> /usr/src/debug/git/git/git.c:779
> #12 0x00005555555612cc in run_argv (args=0x7fffffffd980) at
> /usr/src/debug/git/git/git.c:862
> #13 cmd_main (argc=<optimized out>, argv=<optimized out>) at
> /usr/src/debug/git/git/git.c:984
> #14 0x000055555555d794 in main (argc=6, argv=0x7fffffffdc58) at
> /usr/src/debug/git/git/common-main.c:9
> (gdb) p the_repository->objects
> $1 = (struct object_database *) 0x0
> (gdb) p istate->repo->objects
> $2 = (struct object_database *) 0x0
> (gdb) p istate->repo
> $3 = (struct repository *) 0x5555559a3c00 <the_repo.lto_priv>
> (gdb) p the_repository
> $4 = (struct repository *) 0x5555559a3c00 <the_repo.lto_priv>
>
> The same issue happens in master, the same path is taken and when we
> finally get into
> `odb_transaction_begin` where odb is NULL, and reading ->transaction
> causes the illegal
> segment access once again.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00005555557c6283 in odb_transaction_begin (odb=0x0) at odb.c:1075
> 1075 if (odb->transaction)
> (gdb) bt
> #0 0x00005555557c6283 in odb_transaction_begin (odb=0x0) at odb.c:1075
> #1 0x00005555557b94f9 in index_fd (istate=0x555555a88b60,
> oid=0x555555a8fcb0, fd=4, st=0x7fffffffc730, type=OBJ_BLOB,
> path=0x555555a8fd10 "4/root_part", flags=0) at object-file.c:1665
> #2 0x00005555557b9651 in index_path (istate=0x555555a88b60,
> oid=0x555555a8fcb0, path=0x555555a8fd10 "4/root_part",
> st=0x7fffffffc730, flags=0) at object-file.c:1691
> #3 0x0000555555730937 in diff_fill_oid_info (one=0x555555a8fcb0,
> istate=0x555555a88b60) at diff.c:4699
> #4 0x0000555555730b81 in run_diff (p=0x555555a8a9a0,
> o=0x7fffffffd1a8) at diff.c:4747
> #5 0x00005555557361d5 in diff_flush_patch (p=0x555555a8a9a0,
> o=0x7fffffffd1a8) at diff.c:6258
> #6 0x0000555555737cae in diff_flush_patch_all_file_pairs
> (o=0x7fffffffd1a8) at diff.c:6813
> #7 0x00005555557382b2 in diff_flush (options=0x7fffffffd1a8) at diff.c:6953
> #8 0x00005555557228cc in diff_no_index (revs=0x7fffffffcbf0,
> algop=0x555555a49b50 <hash_algos+112>, implicit_no_index=1, argc=2,
> argv=0x555555a8a2f0) at diff-no-index.c:427
> #9 0x00005555555c04b1 in cmd_diff (argc=5, argv=0x555555a8a2f0,
> prefix=0x0, repo=0x0) at builtin/diff.c:516
> #10 0x0000555555574f47 in run_builtin (p=0x555555a4aaa8
> <commands+840>, argc=5, argv=0x555555a8a2f0, repo=0x555555a7e620
> <the_repo>) at git.c:506
> #11 0x000055555557544e in handle_builtin (args=0x7fffffffdab0) at git.c:780
> #12 0x0000555555575778 in run_argv (args=0x7fffffffdab0) at git.c:863
> #13 0x0000555555575c0f in cmd_main (argc=5, argv=0x7fffffffdc50) at git.c:984
> #14 0x00005555556a7414 in main (argc=6, argv=0x7fffffffdc48) at common-main.c:9
The problem is most likely to occur in object-file.c:
...
int index_fd(struct index_state *istate, struct object_id *oid,
int fd, struct stat *st,
enum object_type type, const char *path, unsigned flags)
{
int ret;
/*
* Call xsize_t() only when needed to avoid potentially unnecessary
* die() for large files.
*/
if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate,
path)) {
ret = index_stream_convert_blob(istate, oid, fd, path, flags);
} else if (!S_ISREG(st->st_mode)) {
ret = index_pipe(istate, oid, fd, type, path, flags);
} else if ((st->st_size >= 0 &&
(size_t)st->st_size <=
repo_settings_get_big_file_threshold(istate->repo)) ||
type != OBJ_BLOB ||
(path && would_convert_to_git(istate, path))) {
ret = index_core(istate, oid, fd, xsize_t(st->st_size),
type, path, flags);
} else {
struct object_database *odb = the_repository->objects;
struct odb_transaction_files *files_transaction;
struct odb_transaction *transaction;
transaction = odb_transaction_begin(odb);
files_transaction = container_of(odb->transaction,
struct odb_transaction_files,
base);
ret = index_blob_packfile_transaction(files_transaction, oid, fd,
xsize_t(st->st_size),
path, flags);
odb_transaction_commit(transaction);
}
close(fd);
return ret;
}
...
Note that in 'if (type == OBJ_BLOB && st->st_size > big_file_threshold)'
block, there is no check to see if 'the_repository->objects' is NULL. It
assumes that an object database is available, which is not true in
no-repository scenarios.
One possible solution would be as follows: only when 'flags &
HASH_WRITE_OBJECT' (or something like that) is true does this indicate
that the file needs to be written in; if it is false only the hash value
is required and the file should not be written in.
Will send a patch to fix it, soon.
Thanks, Yuchen
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database)
2026-04-04 16:45 ` Tian Yuchen
@ 2026-04-04 16:53 ` Luca Stefani
2026-04-04 17:07 ` Tian Yuchen
0 siblings, 1 reply; 9+ messages in thread
From: Luca Stefani @ 2026-04-04 16:53 UTC (permalink / raw)
To: Tian Yuchen; +Cc: git
Thanks for looking into it.
Locally, I simply check against null storage and it works just fine,
flags is always 0 in my experiments so a check against
INDEX_WRITE_OBJECT also worked.
diff --git a/object-file.c b/object-file.c
index f0b029ff0b..68303aa99c 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1654,7 +1654,8 @@ int index_fd(struct index_state *istate, struct
object_id *oid,
} else if ((st->st_size >= 0 &&
(size_t)st->st_size <=
repo_settings_get_big_file_threshold(istate->repo)) ||
type != OBJ_BLOB ||
- (path && would_convert_to_git(istate, path))) {
+ (path && would_convert_to_git(istate, path)) ||
+ !(flags & INDEX_WRITE_OBJECT)) {
ret = index_core(istate, oid, fd, xsize_t(st->st_size),
type, path, flags);
} else {
Luca.
On Sat, 4 Apr 2026 at 18:45, Tian Yuchen <cat@malon.dev> wrote:
>
> On 4/4/26 18:39, Luca Stefani wrote:
> > Thank you for filling out a Git bug report!
> > Please answer the following questions to help us understand your issue.
> >
> > What did you do before the bug happened? (Steps to reproduce your issue)
> >
> > `git diff --color -- file1 file2` SIGSEGV in case file1 and file2
> > are "big enough", that is when the file size is bigger than
> > repo_settings_get_big_file_threshold()
> >
> > What did you expect to happen? (Expected behavior)
> >
> > Not a crash, it to say "binary files are different"
> >
> > What happened instead? (Actual behavior)
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > index_fd (istate=istate@entry=0x5555559ccb40,
> > oid=oid@entry=0x5555559d3c80, fd=4, st=st@entry=0x7fffffffc5c0,
> > type=type@entry=OBJ_BLOB, path=path@entry=0x5555559d3ce0
> > "4/root_part", flags=0) at /usr/src/debug/git/git/object-file.c:1634
> > 1634 transaction =
> > odb_transaction_begin(the_repository->objects);
> > (gdb) bt
> > #0 index_fd (istate=istate@entry=0x5555559ccb40,
> > oid=oid@entry=0x5555559d3c80, fd=4, st=st@entry=0x7fffffffc5c0,
> > type=type@entry=OBJ_BLOB, path=path@entry=0x5555559d3ce0
> > "4/root_part", flags=0)
> > at /usr/src/debug/git/git/object-file.c:1634
> > #1 0x000055555579ab75 in index_path (istate=0x5555559ccb40,
> > oid=0x5555559d3c80, path=0x5555559d3ce0 "4/root_part",
> > st=0x7fffffffc5c0, flags=0) at
> > /usr/src/debug/git/git/object-file.c:1658
> > #2 0x0000555555720f83 in diff_fill_oid_info (one=0x5555559d3c80,
> > istate=0x5555559ccb40) at /usr/src/debug/git/git/diff.c:4690
> > #3 diff_fill_oid_info (one=one@entry=0x5555559d3c80,
> > istate=0x5555559ccb40) at /usr/src/debug/git/git/diff.c:4679
> > #4 0x0000555555724aba in run_diff (p=0x5555559ce900,
> > o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:4738
> > #5 diff_flush_patch (p=0x5555559ce900, o=0x7fffffffd188) at
> > /usr/src/debug/git/git/diff.c:6247
> > #6 0x0000555555729850 in diff_flush_patch_all_file_pairs
> > (o=0x7fffffffd188) at /usr/src/debug/git/git/diff.c:6802
> > #7 diff_flush (options=<optimized out>) at /usr/src/debug/git/git/diff.c:6942
> > #8 0x00005555555b0d34 in diff_no_index (revs=0x7fffffffcbd0,
> > algop=0x55555598dbd0 <hash_algos+112>, implicit_no_index=<optimized
> > out>, argc=2, argv=0x5555559ce2a0) at
> > /usr/src/debug/git/git/diff-no-index.c:427
> > #9 cmd_diff (argc=<optimized out>, argv=0x5555559ce2a0,
> > prefix=<optimized out>, repo=<optimized out>) at builtin/diff.c:516
> > #10 0x000055555555fe05 in run_builtin (p=0x555555994ec8
> > <commands.lto_priv+840>, argc=<optimized out>, argv=<optimized out>,
> > repo=0x5555559a3c00 <the_repo.lto_priv>) at
> > /usr/src/debug/git/git/git.c:506
> > #11 handle_builtin (args=args@entry=0x7fffffffd980) at
> > /usr/src/debug/git/git/git.c:779
> > #12 0x00005555555612cc in run_argv (args=0x7fffffffd980) at
> > /usr/src/debug/git/git/git.c:862
> > #13 cmd_main (argc=<optimized out>, argv=<optimized out>) at
> > /usr/src/debug/git/git/git.c:984
> > #14 0x000055555555d794 in main (argc=6, argv=0x7fffffffdc58) at
> > /usr/src/debug/git/git/common-main.c:9
> > (gdb) p the_repository->objects
> > $1 = (struct object_database *) 0x0
> > (gdb) p istate->repo->objects
> > $2 = (struct object_database *) 0x0
> > (gdb) p istate->repo
> > $3 = (struct repository *) 0x5555559a3c00 <the_repo.lto_priv>
> > (gdb) p the_repository
> > $4 = (struct repository *) 0x5555559a3c00 <the_repo.lto_priv>
> >
> > The same issue happens in master, the same path is taken and when we
> > finally get into
> > `odb_transaction_begin` where odb is NULL, and reading ->transaction
> > causes the illegal
> > segment access once again.
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x00005555557c6283 in odb_transaction_begin (odb=0x0) at odb.c:1075
> > 1075 if (odb->transaction)
> > (gdb) bt
> > #0 0x00005555557c6283 in odb_transaction_begin (odb=0x0) at odb.c:1075
> > #1 0x00005555557b94f9 in index_fd (istate=0x555555a88b60,
> > oid=0x555555a8fcb0, fd=4, st=0x7fffffffc730, type=OBJ_BLOB,
> > path=0x555555a8fd10 "4/root_part", flags=0) at object-file.c:1665
> > #2 0x00005555557b9651 in index_path (istate=0x555555a88b60,
> > oid=0x555555a8fcb0, path=0x555555a8fd10 "4/root_part",
> > st=0x7fffffffc730, flags=0) at object-file.c:1691
> > #3 0x0000555555730937 in diff_fill_oid_info (one=0x555555a8fcb0,
> > istate=0x555555a88b60) at diff.c:4699
> > #4 0x0000555555730b81 in run_diff (p=0x555555a8a9a0,
> > o=0x7fffffffd1a8) at diff.c:4747
> > #5 0x00005555557361d5 in diff_flush_patch (p=0x555555a8a9a0,
> > o=0x7fffffffd1a8) at diff.c:6258
> > #6 0x0000555555737cae in diff_flush_patch_all_file_pairs
> > (o=0x7fffffffd1a8) at diff.c:6813
> > #7 0x00005555557382b2 in diff_flush (options=0x7fffffffd1a8) at diff.c:6953
> > #8 0x00005555557228cc in diff_no_index (revs=0x7fffffffcbf0,
> > algop=0x555555a49b50 <hash_algos+112>, implicit_no_index=1, argc=2,
> > argv=0x555555a8a2f0) at diff-no-index.c:427
> > #9 0x00005555555c04b1 in cmd_diff (argc=5, argv=0x555555a8a2f0,
> > prefix=0x0, repo=0x0) at builtin/diff.c:516
> > #10 0x0000555555574f47 in run_builtin (p=0x555555a4aaa8
> > <commands+840>, argc=5, argv=0x555555a8a2f0, repo=0x555555a7e620
> > <the_repo>) at git.c:506
> > #11 0x000055555557544e in handle_builtin (args=0x7fffffffdab0) at git.c:780
> > #12 0x0000555555575778 in run_argv (args=0x7fffffffdab0) at git.c:863
> > #13 0x0000555555575c0f in cmd_main (argc=5, argv=0x7fffffffdc50) at git.c:984
> > #14 0x00005555556a7414 in main (argc=6, argv=0x7fffffffdc48) at common-main.c:9
>
> The problem is most likely to occur in object-file.c:
>
> ...
> int index_fd(struct index_state *istate, struct object_id *oid,
> int fd, struct stat *st,
> enum object_type type, const char *path, unsigned flags)
> {
> int ret;
>
> /*
> * Call xsize_t() only when needed to avoid potentially unnecessary
> * die() for large files.
> */
> if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(istate,
> path)) {
> ret = index_stream_convert_blob(istate, oid, fd, path, flags);
> } else if (!S_ISREG(st->st_mode)) {
> ret = index_pipe(istate, oid, fd, type, path, flags);
> } else if ((st->st_size >= 0 &&
> (size_t)st->st_size <=
> repo_settings_get_big_file_threshold(istate->repo)) ||
> type != OBJ_BLOB ||
> (path && would_convert_to_git(istate, path))) {
> ret = index_core(istate, oid, fd, xsize_t(st->st_size),
> type, path, flags);
> } else {
> struct object_database *odb = the_repository->objects;
> struct odb_transaction_files *files_transaction;
> struct odb_transaction *transaction;
>
> transaction = odb_transaction_begin(odb);
> files_transaction = container_of(odb->transaction,
> struct odb_transaction_files,
> base);
> ret = index_blob_packfile_transaction(files_transaction, oid, fd,
> xsize_t(st->st_size),
> path, flags);
> odb_transaction_commit(transaction);
> }
>
> close(fd);
> return ret;
> }
> ...
>
> Note that in 'if (type == OBJ_BLOB && st->st_size > big_file_threshold)'
> block, there is no check to see if 'the_repository->objects' is NULL. It
> assumes that an object database is available, which is not true in
> no-repository scenarios.
>
> One possible solution would be as follows: only when 'flags &
> HASH_WRITE_OBJECT' (or something like that) is true does this indicate
> that the file needs to be written in; if it is false only the hash value
> is required and the file should not be written in.
>
> Will send a patch to fix it, soon.
>
> Thanks, Yuchen
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database)
2026-04-04 16:53 ` Luca Stefani
@ 2026-04-04 17:07 ` Tian Yuchen
2026-04-04 23:09 ` Jeff King
0 siblings, 1 reply; 9+ messages in thread
From: Tian Yuchen @ 2026-04-04 17:07 UTC (permalink / raw)
To: Luca Stefani; +Cc: git
On 4/5/26 00:53, Luca Stefani wrote:
> Thanks for looking into it.
> Locally, I simply check against null storage and it works just fine,
> flags is always 0 in my experiments so a check against
> INDEX_WRITE_OBJECT also worked.
>
> diff --git a/object-file.c b/object-file.c
> index f0b029ff0b..68303aa99c 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1654,7 +1654,8 @@ int index_fd(struct index_state *istate, struct
> object_id *oid,
> } else if ((st->st_size >= 0 &&
> (size_t)st->st_size <=
> repo_settings_get_big_file_threshold(istate->repo)) ||
> type != OBJ_BLOB ||
> - (path && would_convert_to_git(istate, path))) {
> + (path && would_convert_to_git(istate, path)) ||
> + !(flags & INDEX_WRITE_OBJECT)) {
> ret = index_core(istate, oid, fd, xsize_t(st->st_size),
> type, path, flags);
> } else {
>
> Luca.
That looks good, almost exactly what I was about to send. I was
mistaken— there isn’t a hash_write_object flag after all ;-)
It looks like this is your first time posting on the Git mailing list.
Would you consider contributing this (as a patch)?
Thanks! Yuchen
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database)
2026-04-04 17:07 ` Tian Yuchen
@ 2026-04-04 23:09 ` Jeff King
2026-04-05 2:48 ` Tian Yuchen
2026-04-06 17:57 ` Justin Tobler
0 siblings, 2 replies; 9+ messages in thread
From: Jeff King @ 2026-04-04 23:09 UTC (permalink / raw)
To: Tian Yuchen; +Cc: Justin Tobler, Luca Stefani, git
On Sun, Apr 05, 2026 at 01:07:27AM +0800, Tian Yuchen wrote:
> On 4/5/26 00:53, Luca Stefani wrote:
> > Thanks for looking into it.
> > Locally, I simply check against null storage and it works just fine,
> > flags is always 0 in my experiments so a check against
> > INDEX_WRITE_OBJECT also worked.
> >
> > diff --git a/object-file.c b/object-file.c
> > index f0b029ff0b..68303aa99c 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -1654,7 +1654,8 @@ int index_fd(struct index_state *istate, struct
> > object_id *oid,
> > } else if ((st->st_size >= 0 &&
> > (size_t)st->st_size <=
> > repo_settings_get_big_file_threshold(istate->repo)) ||
> > type != OBJ_BLOB ||
> > - (path && would_convert_to_git(istate, path))) {
> > + (path && would_convert_to_git(istate, path)) ||
> > + !(flags & INDEX_WRITE_OBJECT)) {
> > ret = index_core(istate, oid, fd, xsize_t(st->st_size),
> > type, path, flags);
> > } else {
> >
> > Luca.
>
> That looks good, almost exactly what I was about to send. I was mistaken—
> there isn’t a hash_write_object flag after all ;-)
>
> It looks like this is your first time posting on the Git mailing list. Would
> you consider contributing this (as a patch)?s
Alternatively, should the odb transaction system be more forgiving here,
and act as a noop when there is no odb?
Bisecting the segfault yields ce1661f9da (odb: add transaction
interface, 2025-09-16). Before then, we passed around the
object_database itself, saw that its transaction field was NULL, and
returned immediately. After that commit, we pass the object_databse to
odb_transaction_begin(), which narrows it to odb->sources (which is
NULL) while passing to object_file_transaction_begin(). And then that
function looks at source->odb to go back to the object_database! But
the source being NULL, it segfaults.
Immediately after that commit, the switch from taking an odb to a source
is not helpful, though I think eventually it is used to set
transaction->base.source. But should the whole thing check for a NULL
source and return early? Or otherwise establish some kind of noop
transaction?
I haven't thought about the implications (nor even really looked at odb
transaction code before). But doing it that way would fix not only this
bug, but also other potential bugs throughout the code base when callers
start a noop transaction.
+cc Justin (author of ce1661f9da) for any thoughts.
-Peff
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database)
2026-04-04 23:09 ` Jeff King
@ 2026-04-05 2:48 ` Tian Yuchen
2026-04-05 6:14 ` Jeff King
2026-04-06 17:57 ` Justin Tobler
1 sibling, 1 reply; 9+ messages in thread
From: Tian Yuchen @ 2026-04-05 2:48 UTC (permalink / raw)
To: Jeff King; +Cc: Justin Tobler, Luca Stefani, git
On 4/5/26 07:09, Jeff King wrote:
> Alternatively, should the odb transaction system be more forgiving here,
> and act as a noop when there is no odb?
>
> Bisecting the segfault yields ce1661f9da (odb: add transaction
> interface, 2025-09-16). Before then, we passed around the
> object_database itself, saw that its transaction field was NULL, and
> returned immediately. After that commit, we pass the object_databse to
> odb_transaction_begin(), which narrows it to odb->sources (which is
> NULL) while passing to object_file_transaction_begin(). And then that
> function looks at source->odb to go back to the object_database! But
> the source being NULL, it segfaults.
>
> Immediately after that commit, the switch from taking an odb to a source
> is not helpful, though I think eventually it is used to set
> transaction->base.source. But should the whole thing check for a NULL
> source and return early? Or otherwise establish some kind of noop
> transaction?
>
In a nutshell, are you suggesting that odb_transaction_begin() should
return silently when source is NULL?
I understand and agree with your intention, but in practical terms, if
the odb has not been initialised correctly, data will be lost silently
here and the caller will not receive an error message. I’m concerned
that doing this might lead to more bugs in certain situations
Would it make more sense to treat this as a bug instead, e.g. by
triggering a BUG() when source is NULL?
> I haven't thought about the implications (nor even really looked at odb
> transaction code before). But doing it that way would fix not only this
> bug, but also other potential bugs throughout the code base when callers
> start a noop transaction.
>
> +cc Justin (author of ce1661f9da) for any thoughts.
>
> -Peff
Thanks! Yuchen
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database)
2026-04-05 2:48 ` Tian Yuchen
@ 2026-04-05 6:14 ` Jeff King
0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2026-04-05 6:14 UTC (permalink / raw)
To: Tian Yuchen; +Cc: Justin Tobler, Luca Stefani, git
On Sun, Apr 05, 2026 at 10:48:16AM +0800, Tian Yuchen wrote:
> > Immediately after that commit, the switch from taking an odb to a source
> > is not helpful, though I think eventually it is used to set
> > transaction->base.source. But should the whole thing check for a NULL
> > source and return early? Or otherwise establish some kind of noop
> > transaction?
> >
>
> In a nutshell, are you suggesting that odb_transaction_begin() should return
> silently when source is NULL?
>
> I understand and agree with your intention, but in practical terms, if the
> odb has not been initialised correctly, data will be lost silently here and
> the caller will not receive an error message. I’m concerned that doing this
> might lead to more bugs in certain situations
The source of data loss is not the transaction begin/end, though, but
the actual write operation inside it. If I do this:
t = odb_transaction_begin(the_repository->objects);
if (startup_info->have_repository)
odb_write_object(the_repository->objects, ...);
odb_transaction_commit(t);
should it be an error? We chose not to write because we knew we did not
have an odb, so there is no data to be lost. Starting or ending a
transaction without an odb _is_ a noop, because there is nothing that
can or will be written. It is asking to write that is the error.
In the example above you could obviously push the transaction begin/end
into the conditional. But that's harder when the "if" part is happening
inside a helper function, which is exactly what's going on in index_fd()
here. It passes the flags (which would omit INDEX_WRITE_OBJECT) down to
index_blob_packfile_transaction(), which then goes through all its
regular motions without touching the odb.
This used to work because begin_odb_transaction() did quietly return
NULL, and index_fd() did not have to care about the flag at all.
> Would it make more sense to treat this as a bug instead, e.g. by
> triggering a BUG() when source is NULL?
I don't think so, if the point is to be kind to callers. We should still
BUG() on an actual write when there is no odb or repo, of course.
I looked over the other callers of odb_transaction_begin() and I think
this is probably the only affected caller. But I don't think it's purely
academic. If you teach index_fd() to use index_core() when the
INDEX_WRITE_OBJECT is not set (as proposed earlier in this thread), I
think you'll fall afoul of the comment at the top of the function:
/*
* Call xsize_t() only when needed to avoid potentially unnecessary
* die() for large files.
*/
E.g., on a 32-bit system we'd die() on a 4GB file now (because we can't
fit its size in a size_t anymore), and probably even smaller (because
we'd try to mmap() it and there wouldn't be a large enough contiguous
chunk of the address space).
So we really do want to follow the streaming path, even if we're not
writing out the object.
-Peff
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database)
2026-04-04 23:09 ` Jeff King
2026-04-05 2:48 ` Tian Yuchen
@ 2026-04-06 17:57 ` Justin Tobler
2026-04-06 20:45 ` Junio C Hamano
1 sibling, 1 reply; 9+ messages in thread
From: Justin Tobler @ 2026-04-06 17:57 UTC (permalink / raw)
To: Jeff King; +Cc: Tian Yuchen, Luca Stefani, git
On 26/04/04 07:09PM, Jeff King wrote:
> Alternatively, should the odb transaction system be more forgiving here,
> and act as a noop when there is no odb?
>
> Bisecting the segfault yields ce1661f9da (odb: add transaction
> interface, 2025-09-16). Before then, we passed around the
> object_database itself, saw that its transaction field was NULL, and
> returned immediately. After that commit, we pass the object_databse to
> odb_transaction_begin(), which narrows it to odb->sources (which is
> NULL) while passing to object_file_transaction_begin(). And then that
> function looks at source->odb to go back to the object_database! But
> the source being NULL, it segfaults.
>
> Immediately after that commit, the switch from taking an odb to a source
> is not helpful, though I think eventually it is used to set
> transaction->base.source. But should the whole thing check for a NULL
> source and return early? Or otherwise establish some kind of noop
> transaction?
>
> I haven't thought about the implications (nor even really looked at odb
> transaction code before). But doing it that way would fix not only this
> bug, but also other potential bugs throughout the code base when callers
> start a noop transaction.
>
> +cc Justin (author of ce1661f9da) for any thoughts.
Thanks Peff for the cc! :)
Ok so the segfault in question can be reproduced with the following:
echo foo >foo
echo bar >bar
git -c core.bigFileThreshold=1 diff -- foo bar
And indeed the problematic code path is the `odb_transaction_begin()`
invoked via `object-file.c:index_fd()`. In a scenario where git-diff(1)
is being executed outside of a repository and involves files that exceed
"core.bigFileThreshold", `index_blob_packfile_transaction()` is invoked
without the `INDEX_WRITE_OBJECT` flag to stream the "large" object
object and generate its hash only. I unfortunately didn't consider this
use case.
IMO it is already questionable as to why we would want to start an ODB
transaction if it is already known that the object won't be written.
IOW, if we are only interested in streaming the object to get its hash
we shouldn't have to start a transaction. As part of a patch series I'm
currently working on [1], a separate `hash_blob_stream()` helper is
introduced to just hash the blob and is used to bypass the ODB
transaction entirely. Building/testing with this patch does appear to
fix the issue.
This series is only in "seen" right now, so I doubt it would make this
next release. I don't mind extracting this patch from its current series
and submitting it as a fix if this the route we want to go though.
The other option would be as Peff suggested and to make
`odb_transaction_begin()` just return early with NULL or some type of
no-op transaction if there is no ODB set up. I do think the former
approach would be preferable though as I'm not sure there is really a
good use case for supporting ODB transactions when there isn't an ODB
set up.
-Justin
[1]: https://lore.kernel.org/git/20260402213220.2651523-5-jltobler@gmail.com/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] git diff --no-index segfaults on large files (NULL object database)
2026-04-06 17:57 ` Justin Tobler
@ 2026-04-06 20:45 ` Junio C Hamano
0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2026-04-06 20:45 UTC (permalink / raw)
To: Justin Tobler; +Cc: Jeff King, Tian Yuchen, Luca Stefani, git
Justin Tobler <jltobler@gmail.com> writes:
> IMO it is already questionable as to why we would want to start an ODB
> transaction if it is already known that the object won't be written.
> IOW, if we are only interested in streaming the object to get its hash
> we shouldn't have to start a transaction.
Very well.
> The other option would be as Peff suggested and to make
> `odb_transaction_begin()` just return early with NULL or some type of
> no-op transaction if there is no ODB set up. I do think the former
> approach would be preferable though as I'm not sure there is really a
> good use case for supporting ODB transactions when there isn't an ODB
> set up.
Let's not go there. I agree that odb transactions without odb does
not make any sense.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-04-06 20:45 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-04 10:39 [BUG] git diff --no-index segfaults on large files (NULL object database) Luca Stefani
2026-04-04 16:45 ` Tian Yuchen
2026-04-04 16:53 ` Luca Stefani
2026-04-04 17:07 ` Tian Yuchen
2026-04-04 23:09 ` Jeff King
2026-04-05 2:48 ` Tian Yuchen
2026-04-05 6:14 ` Jeff King
2026-04-06 17:57 ` Justin Tobler
2026-04-06 20:45 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox