Git development
 help / color / mirror / Atom feed
From: Luca Stefani <luca.stefani.ge1@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, cat@malon.dev
Subject: Re: [PATCH] object-file: don't use object database without a repository
Date: Sun, 5 Apr 2026 18:10:33 +0200	[thread overview]
Message-ID: <145b6c7f-c037-4a87-b561-d2b4d8c5a0cd@gmail.com> (raw)
In-Reply-To: <20260405064651.GA1452907@coredump.intra.peff.net>


On 05/04/2026 08:46, Jeff King wrote:
> On Sat, Apr 04, 2026 at 07:28:17PM +0200, Luca Stefani wrote:
>
>> When running `git diff -- $file1 $file2' on large enough files,
>> index_fd() attempts to use 'the_repository->objects', assuming it
>> is initialized, but that's not the case for non-repository usecases.
>>
>> When git diff is invoked without a backing repository,
>> INDEX_WRITE_OBJECT is never set in flags, meaning only the hash is
>> needed and nothing should be written to the object store.
>>
>> Enforce the use of index_core() in this case.
> I don't think we want to use index_core() for a large file, though. A
> test like this:

I don't know what would be the right approach, index_core sure is slow, 
but maybe that's expected for those sizes.

This fix by itself simply avoids entering into the broken case, and it 
still gives me a working diff.

> diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh
> index 15076dfe0d..7ef5604430 100755
> --- a/t/t4053-diff-no-index.sh
> +++ b/t/t4053-diff-no-index.sh
> @@ -413,4 +413,10 @@ test_expect_success 'diff --no-index with pathspec glob and exclude' '
>   	test_cmp expect actual
>   '
>   
> +test_expect_success 'diff --no-index on a huge file' '
> +	dd if=/dev/zero bs=1M count=4000 >big.file &&
> +	echo whatever >small.file &&
> +	test_expect_code 1 git diff --no-index big.file small.file
> +'
> +
>   test_done

If you want  I can send a V2 with that, but given it's your test suit 
I'd rather you handle it.

Especially when it comes to multi-arch, as I only really care about amd64

>
> will now fail on a 32-bit system, because we try to mmap the whole file,
> which will fail.  We really do want to follow the streaming code path
> (which knows to respect the lack of a WRITE_OBJECT flag and works
> without an odb in that case).
>
> It's kind of an expensive test, though, so we probably don't want to
> actually include it in the test suite.
>
> -Peff
>
> PS I'd expect a 4GB+ file to work, too, but it looks like the diff code
>     barfs when trying to stuff the file into a diff_filespec. A simpler
>     example is:
>
>       dd if=/dev/zero bs=1G count=5 >big.file
>       git hash-object big.file
>
>     but that dies, too! It looks like the streaming helper uses a size_t
>     to take the size, which is wrong. It really should be an off_t. So I
>     dunno, maybe nobody cares about ever working with 4GB files on 32-bit
>     systems these days. It still feels like we should avoid a large mmap,
>     though.

Ah I I just happened to have a 3G file and I threw it at 'git diff' :)

With `#define _FILE_OFFSET_BITS 64` off_t is properly sized, but if 
other places downcast it then there's little hope on 32bit.

Now even with that in mind not sure if it's worth fixing...

  reply	other threads:[~2026-04-05 16:10 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-04 17:28 [PATCH] object-file: don't use object database without a repository Luca Stefani
2026-04-05  6:03 ` Pushkar Singh
2026-04-05  6:46 ` Jeff King
2026-04-05 16:10   ` Luca Stefani [this message]
2026-04-05 19:17     ` Jeff King
2026-04-06 18:17       ` Justin Tobler
2026-04-06 19:31         ` Luca Stefani
2026-04-06 20:31           ` Justin Tobler
2026-04-06 20:06         ` Jeff King
2026-04-06 20:38           ` Justin Tobler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=145b6c7f-c037-4a87-b561-d2b4d8c5a0cd@gmail.com \
    --to=luca.stefani.ge1@gmail.com \
    --cc=cat@malon.dev \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox