From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: Elijah Newren <newren@gmail.com>,
git@vger.kernel.org, "Eric W. Biederman" <ebiederm@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack`
Date: Mon, 13 Nov 2023 17:34:42 -0500 [thread overview]
Message-ID: <ZVKkgpiFaOwwDcdw@nand.local> (raw)
In-Reply-To: <20231113220254.GA2065691@coredump.intra.peff.net>
On Mon, Nov 13, 2023 at 05:02:54PM -0500, Jeff King wrote:
> On Fri, Nov 10, 2023 at 03:51:18PM -0800, Elijah Newren wrote:
>
> > This is unsafe; the object may need to be read later within the same
> > merge. [...]
> >
> > I think (untested) that you could fix this by creating two packs
> > instead of just one. In particular, add a call to
> > flush_odb_transcation() after the "redo_after_renames" block in
> > merge_ort_nonrecursive_internal().
>
> It might not be too hard to just let in-process callers access the
> objects we've written. A quick and dirty patch is below, which works
> with the test you suggested (the test still fails because it finds a
> conflict, but it gets past the "woah, I can't find that sha1" part).
That's a very slick idea, and I think that this series has some legs to
stand on now as a result.
There are a few of interesting conclusions we can draw here:
1. This solves the recursive merge problem that Elijah mentioned
earlier where we could generate up to 2^N packs (where N is the
maximum depth of the recursive merge).
2. This also solves the case where the merge-ort code needs to read an
object that it wrote earlier on in the same process without having
to flush out intermediate packs. So in the best case we need only a
single pack (instead of two).
3. This also solves the 'replay' issue, *and* allows us to avoid the
tmp-objdir thing there entirely, since we can simulate object reads
in the bulk-checkin pack.
So I think that this is a direction worth pursuing.
In terms of making those lookups faster, I think that what you'd want is
an oidmap. The overhead is slightly unfortunate, but I think that any
other solution which requires keeping the written array in sorted order
would in practice be more expensive as you have to constantly reallocate
and copy portions of the array around to maintain its invariant.
> I don't know if that is sufficient, though. Would other spawned
> processes (hooks, external merge drivers, and so on) need to be able to
> see these objects, too?
Interesting point. In theory those processes could ask about newly
created objects, and if they were spawned before the bulk-checkin pack
was flushed, those lookups would fail.
But that requires that, e.g. for hooks, that we know a-priori the object
ID of some newly-written objects. If you wanted to make those lookups
succeed, I think there are a couple of options:
- teach child-processes about the bulk-checkin pack, and let them
perform the same fake lookup
- flush (but do not close) the bulk-checkin transaction
In any event, I think that this is a sufficiently rare and niche case
that we'd be OK to declare that you should not expect the above
scenarios to work when using `--write-pack`. If someone does ask for
that feature in the future, we could implement it relatively painlessly
using one of the above options.
Thanks,
Taylor
next prev parent reply other threads:[~2023-11-13 22:34 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-19 17:28 [PATCH v4 0/7] merge-ort: implement support for packing objects together Taylor Blau
2023-10-19 17:28 ` [PATCH v4 1/7] bulk-checkin: extract abstract `bulk_checkin_source` Taylor Blau
2023-10-20 7:35 ` Jeff King
2023-10-20 16:55 ` Junio C Hamano
2023-10-19 17:28 ` [PATCH v4 2/7] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Taylor Blau
2023-10-19 17:28 ` [PATCH v4 3/7] bulk-checkin: refactor deflate routine to accept a `bulk_checkin_source` Taylor Blau
2023-10-19 17:28 ` [PATCH v4 4/7] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source` Taylor Blau
2023-10-23 9:19 ` Patrick Steinhardt
2023-10-23 18:58 ` Jeff King
2023-10-24 6:34 ` Patrick Steinhardt
2023-10-24 17:08 ` Junio C Hamano
2023-10-19 17:28 ` [PATCH v4 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-19 17:28 ` [PATCH v4 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-19 17:29 ` [PATCH v4 7/7] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-19 21:47 ` [PATCH v4 0/7] merge-ort: implement support for packing objects together Junio C Hamano
2023-10-20 7:29 ` Jeff King
2023-10-20 16:53 ` Junio C Hamano
2023-10-23 9:19 ` Patrick Steinhardt
2023-10-23 22:44 ` [PATCH v5 0/5] " Taylor Blau
2023-10-23 22:44 ` [PATCH v5 1/5] bulk-checkin: extract abstract `bulk_checkin_source` Taylor Blau
2023-10-25 7:37 ` Jeff King
2023-10-25 15:39 ` Taylor Blau
2023-10-27 23:12 ` Junio C Hamano
2023-10-23 22:44 ` [PATCH v5 2/5] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Taylor Blau
2023-10-23 22:45 ` [PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-25 7:58 ` Patrick Steinhardt
2023-10-25 15:44 ` Taylor Blau
2023-10-25 17:21 ` Eric Sunshine
2023-10-26 8:16 ` Patrick Steinhardt
2023-11-11 0:17 ` Elijah Newren
2023-10-23 22:45 ` [PATCH v5 4/5] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-23 22:45 ` [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-25 7:58 ` Patrick Steinhardt
2023-10-25 15:46 ` Taylor Blau
2023-11-10 23:51 ` Elijah Newren
2023-11-11 0:27 ` Junio C Hamano
2023-11-11 1:34 ` Taylor Blau
2023-11-11 1:24 ` Taylor Blau
2023-11-13 22:05 ` Jeff King
2023-11-14 1:40 ` Junio C Hamano
2023-11-14 2:54 ` Elijah Newren
2023-11-14 21:55 ` Jeff King
2023-11-14 3:08 ` Elijah Newren
2023-11-13 22:02 ` Jeff King
2023-11-13 22:34 ` Taylor Blau [this message]
2023-11-14 2:50 ` Elijah Newren
2023-11-14 21:53 ` Jeff King
2023-11-14 22:04 ` Jeff King
2023-10-23 23:31 ` [PATCH v5 0/5] merge-ort: implement support for packing objects together Junio C Hamano
2023-11-06 15:46 ` Johannes Schindelin
2023-11-06 23:19 ` Junio C Hamano
2023-11-07 3:42 ` Jeff King
2023-11-07 15:58 ` Taylor Blau
2023-11-07 18:22 ` [RFC PATCH 0/3] replay: implement support for writing new objects to a pack Taylor Blau
2023-11-07 18:22 ` [RFC PATCH 1/3] merge-ort.c: finalize ODB transactions after each step Taylor Blau
2023-11-11 3:45 ` Elijah Newren
2023-11-07 18:22 ` [RFC PATCH 2/3] tmp-objdir: introduce `tmp_objdir_repack()` Taylor Blau
2023-11-08 7:05 ` Patrick Steinhardt
2023-11-09 19:26 ` Taylor Blau
2023-11-07 18:23 ` [RFC PATCH 3/3] builtin/replay.c: introduce `--write-pack` Taylor Blau
2023-11-11 3:42 ` [RFC PATCH 0/3] replay: implement support for writing new objects to a pack Elijah Newren
2023-11-11 4:04 ` Elijah Newren
2023-10-25 7:58 ` [PATCH v5 0/5] merge-ort: implement support for packing objects together Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZVKkgpiFaOwwDcdw@nand.local \
--to=me@ttaylorr.com \
--cc=ebiederm@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).