Git development
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Tian Yuchen <cat@malon.dev>
Cc: Justin Tobler <jltobler@gmail.com>,
	Luca Stefani <luca.stefani.ge1@gmail.com>,
	git@vger.kernel.org
Subject: Re: [BUG] git diff --no-index segfaults on large files (NULL object database)
Date: Sun, 5 Apr 2026 02:14:30 -0400	[thread overview]
Message-ID: <20260405061430.GA1451922@coredump.intra.peff.net> (raw)
In-Reply-To: <7841c013-1f6e-49e3-9f47-f25044764a11@malon.dev>

On Sun, Apr 05, 2026 at 10:48:16AM +0800, Tian Yuchen wrote:

> > Immediately after that commit, the switch from taking an odb to a source
> > is not helpful, though I think eventually it is used to set
> > transaction->base.source. But should the whole thing check for a NULL
> > source and return early? Or otherwise establish some kind of noop
> > transaction?
> > 
> 
> In a nutshell, are you suggesting that odb_transaction_begin() should return
> silently when source is NULL?
> 
> I understand and agree with your intention, but in practical terms, if the
> odb has not been initialised correctly, data will be lost silently here and
> the caller will not receive an error message. I’m concerned that doing this
> might lead to more bugs in certain situations

The source of data loss is not the transaction begin/end, though, but
the actual write operation inside it. If I do this:

  t = odb_transaction_begin(the_repository->objects);
  if (startup_info->have_repository)
	odb_write_object(the_repository->objects, ...);
  odb_transaction_commit(t);

should it be an error? We chose not to write because we knew we did not
have an odb, so there is no data to be lost. Starting or ending a
transaction without an odb _is_ a noop, because there is nothing that
can or will be written. It is asking to write that is the error.

In the example above you could obviously push the transaction begin/end
into the conditional. But that's harder when the "if" part is happening
inside a helper function, which is exactly what's going on in index_fd()
here. It passes the flags (which would omit INDEX_WRITE_OBJECT) down to
index_blob_packfile_transaction(), which then goes through all its
regular motions without touching the odb.

This used to work because begin_odb_transaction() did quietly return
NULL, and index_fd() did not have to care about the flag at all.

> Would it make more sense to treat this as a bug instead, e.g. by
> triggering a BUG() when source is NULL?

I don't think so, if the point is to be kind to callers. We should still
BUG() on an actual write when there is no odb or repo, of course.


I looked over the other callers of odb_transaction_begin() and I think
this is probably the only affected caller. But I don't think it's purely
academic. If you teach index_fd() to use index_core() when the
INDEX_WRITE_OBJECT is not set (as proposed earlier in this thread), I
think you'll fall afoul of the comment at the top of the function:

          /*
           * Call xsize_t() only when needed to avoid potentially unnecessary
           * die() for large files.
           */

E.g., on a 32-bit system we'd die() on a 4GB file now (because we can't
fit its size in a size_t anymore), and probably even smaller (because
we'd try to mmap() it and there wouldn't be a large enough contiguous
chunk of the address space).

So we really do want to follow the streaming path, even if we're not
writing out the object.

-Peff

  reply	other threads:[~2026-04-05  6:14 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-04 10:39 [BUG] git diff --no-index segfaults on large files (NULL object database) Luca Stefani
2026-04-04 16:45 ` Tian Yuchen
2026-04-04 16:53   ` Luca Stefani
2026-04-04 17:07     ` Tian Yuchen
2026-04-04 23:09       ` Jeff King
2026-04-05  2:48         ` Tian Yuchen
2026-04-05  6:14           ` Jeff King [this message]
2026-04-06 17:57         ` Justin Tobler
2026-04-06 20:45           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260405061430.GA1451922@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=cat@malon.dev \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    --cc=luca.stefani.ge1@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox