From: Jeff King <peff@peff.net>
To: Luca Stefani <luca.stefani.ge1@gmail.com>
Cc: Justin Tobler <jltobler@gmail.com>, git@vger.kernel.org, cat@malon.dev
Subject: Re: [PATCH] object-file: don't use object database without a repository
Date: Sun, 5 Apr 2026 15:17:50 -0400
Message-ID: <20260405191750.GA1525850@coredump.intra.peff.net>
In-Reply-To: <145b6c7f-c037-4a87-b561-d2b4d8c5a0cd@gmail.com>

On Sun, Apr 05, 2026 at 06:10:33PM +0200, Luca Stefani wrote:

> > > Enforce the use of index_core() in this case.
> > I don't think we want to use index_core() for a large file, though. A
> > test like this:
> 
> I don't know what would be the right approach, index_core sure is slow, but
> maybe that's expected for those sizes.

It's always going to be slow because we're hashing a lot of data. But
the point is that we should be streaming it, and not trying to allocate
a huge buffer (even via mmap). So it's not that it's slow, it's that
some cases which used to work will not do so any longer.

We want to keep going into the streaming code path, and not
index_core(). But the streaming code path has been broken outside of a
repo, by ce1661f9da (odb: add transaction interface, 2025-09-16) and its
follow-on patches. And we should fix that instead of avoiding it.
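
To illustrate what "streaming" buys us, here is a minimal shell sketch
of hashing a blob without ever holding the whole file in one buffer. It
is not Git's C implementation; blob_id_streaming is a hypothetical
helper, and it assumes SHA-1 object IDs and an installed sha1sum(1). The
object header needs the size up front, but the contents themselves only
pass through the pipe in chunks.

```shell
# Sketch only: blob_id_streaming is a hypothetical helper, not part of Git.
# Assumes SHA-1 object IDs and that sha1sum(1) is installed.
blob_id_streaming () {
	# Arithmetic expansion strips any padding that wc may emit.
	size=$(( $(wc -c <"$1") )) &&
	{
		# Object header first: "blob <size>" followed by a NUL byte.
		printf 'blob %d\0' "$size" &&
		# Then the contents, fed through the pipe chunk by chunk.
		cat "$1"
	} | sha1sum | cut -d' ' -f1
}

printf 'hello\n' >hello.txt &&
blob_id_streaming hello.txt
```

With SHA-1 object IDs, the printed value should match what
"git hash-object hello.txt" reports for the same file.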

> This fix by itself simply avoids entering into the broken case, and it
> still gives me a working diff.

For some files, yes, but it breaks other cases (like the one I
demonstrated).

> > diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh
> > index 15076dfe0d..7ef5604430 100755
> > --- a/t/t4053-diff-no-index.sh
> > +++ b/t/t4053-diff-no-index.sh
> > @@ -413,4 +413,10 @@ test_expect_success 'diff --no-index with pathspec glob and exclude' '
> >   	test_cmp expect actual
> >   '
> > +test_expect_success 'diff --no-index on a huge file' '
> > +	dd if=/dev/zero bs=1M count=4000 >big.file &&
> > +	echo whatever >small.file &&
> > +	test_expect_code 1 git diff --no-index big.file small.file
> > +'
> > +
> >   test_done
> 
> If you want I can send a V2 with that, but given it's your test suite I'd
> rather you handle it.

If you sent a v2 with this, the tests would not pass. ;)

But like I said, I don't think we want that in the test suite because it
is too expensive to run. What we probably do want is a cheap
demonstration of the segfault, which is this:

diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index c824c1a25c..31965908a6 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -149,4 +149,9 @@ test_expect_success 'fmt-merge-msg does not crash with -h' '
 	test_grep "[Uu]sage: git fmt-merge-msg " usage
 '
 
+test_expect_success 'indexing large file outside repo' '
+	nongit dd if=/dev/zero of=big.file bs=10k count=1 &&
+	nongit git -c core.bigfilethreshold=5k hash-object big.file
+'
+
 test_done
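
To try the same thing by hand, without the test harness, something like
the sketch below should work. It assumes that the test suite's nongit
helper amounts to running the command in a scratch directory with
GIT_CEILING_DIRECTORIES set so that no enclosing repository is found;
the directory names here are hypothetical. The exit status is printed
rather than asserted, since the outcome depends on which Git build is
being run.

```shell
# Sketch only: approximates what the "nongit" test helper is assumed to do.
# Directory names are hypothetical; adjust as needed.
dir=$(mktemp -d) &&
mkdir "$dir/non-repo" &&
(
	# Stop repository discovery from walking up past $dir.
	GIT_CEILING_DIRECTORIES=$dir &&
	export GIT_CEILING_DIRECTORIES &&
	cd "$dir/non-repo" &&
	# A 10k file with a 5k threshold counts as a "big" file.
	dd if=/dev/zero of=big.file bs=10k count=1 2>/dev/null &&
	git -c core.bigfilethreshold=5k hash-object big.file
	echo "hash-object exit status: $?"
)
```

A status of 0 along with a printed object ID would mean the large-file
path works outside a repository in that build; a crash shows up as a
non-zero status.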

But I think the actual code change in your patch is the wrong thing, so
I also don't think we'd want to just squash that test in. I'm hoping
Justin has some insights on how to do a more complete fix.

-Peff
