git.vger.kernel.org archive mirror
From: Taylor Blau <me@ttaylorr.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Elijah Newren <newren@gmail.com>,
	"Eric W. Biederman" <ebiederm@gmail.com>,
	Jeff King <peff@peff.net>, Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH v5 0/5] merge-ort: implement support for packing objects together
Date: Tue, 7 Nov 2023 10:58:30 -0500
Message-ID: <ZUpepnSCSxL8i96b@nand.local>
In-Reply-To: <0ac32374-7d52-8f0c-8583-110de678291e@gmx.de>

On Mon, Nov 06, 2023 at 04:46:32PM +0100, Johannes Schindelin wrote:
> Hi,
>
> On Mon, 23 Oct 2023, Junio C Hamano wrote:
>
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > > But I think that this approach ended up being less heavy-weight than I
> > > had originally imagined, so I think that this version is a worthwhile
> > > improvement over v4.
> >
> > ;-).
> >
> > This version is a good place to stop, a bit short of going full OO.
> > Nicely done.
>
> I wonder whether a more generic approach would be more desirable, an
> approach that would work for `git replay`, too, for example (where
> streaming objects does not work because they need to be made available
> immediately because subsequent `merge_incore_nonrecursive()` might expect
> the created objects to be present)?

The goal of this series is to bound the number of inodes consumed by a
single merge-tree invocation down from arbitrarily many (in the case of
storing each new object loose) to a constant (by storing everything in a
single pack).

I'd think that we would want a similar approach for 'replay', but as
you note we have some additional requirements, too:

  - each replayed commit is computed in a single step, which will result
    in a new pack
  - we must be able to see objects from previous steps

I think one feasible approach here for replay is to combine the two
ideas and have a separate objdir that stores N packs (one for each step
of the replay), but then repacks them down into a single pack before
migrating back to the main object store.
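
To make the shape of that idea concrete, here is a rough Python model
of it. This is only a sketch: `TmpObjdir`, `replay()`, and the plain
list standing in for the main object store are hypothetical names made
up for illustration, not Git's actual API, and object IDs are plain
strings. It shows one pack being written per step, earlier packs staying
visible to later steps, and a single pack being migrated at the end.

```python
from dataclasses import dataclass, field


@dataclass
class TmpObjdir:
    """Toy stand-in for a temporary object directory (hypothetical)."""
    packs: list = field(default_factory=list)

    def write_pack(self, object_ids):
        # One pack per replay step.
        self.packs.append(frozenset(object_ids))

    def has_object(self, oid):
        # Objects written by any earlier step remain visible.
        return any(oid in pack for pack in self.packs)

    def repack(self):
        # Collapse the N per-step packs into a single pack.
        merged = frozenset().union(*self.packs)
        self.packs = [merged] if merged else []


def replay(steps, main_store):
    """steps: (needed_oids, new_oids) pairs, one per replayed commit.

    Simplification: only objects created during the replay are looked
    up; objects already in the main store are ignored by this model.
    """
    tmp = TmpObjdir()
    for needed, created in steps:
        missing = [oid for oid in needed if not tmp.has_object(oid)]
        if missing:
            raise LookupError(f"objects not visible: {missing}")
        tmp.write_pack(created)
    tmp.repack()
    main_store.extend(tmp.packs)  # "migrate" the single resulting pack
    return main_store
```

However many steps the replay has, the main store only ever receives
one pack in this model; the N intermediate packs exist only inside the
temporary directory.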

That would ensure that we have some isolation between replay-created
objects and the rest of the repository in the intermediate state. Even
though we'd have as many packs as there are commits, we'd consume far
fewer inodes in the process, since each commit can introduce arbitrarily
many new objects, each of which would require at least a single inode
if stored loose (potentially more with sharding).
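
As a back-of-the-envelope illustration of that accounting, here is a
small Python model. It rests on two assumptions that are mine rather
than anything measured: a loose object costs one file plus, at most
once per prefix, a fan-out directory named after the first two hex
digits of its ID, and a pack costs two files (a .pack and an .idx). The
function names are hypothetical.

```python
FILES_PER_PACK = 2  # assumption: one .pack file and one .idx file


def loose_inodes(object_ids):
    """Inodes used if every object is stored loose.

    One file per distinct object, plus one directory per distinct
    two-hex-digit fan-out prefix (the "sharding" mentioned above).
    """
    unique = set(object_ids)
    fanout_dirs = {oid[:2] for oid in unique}
    return len(unique) + len(fanout_dirs)


def packed_inodes(num_packs):
    """Inodes used if the same objects live in `num_packs` packs."""
    return num_packs * FILES_PER_PACK
```

Under those assumptions, a replay of three commits that each create
1000 objects needs at least 3000 inodes when stored loose, six while
the three per-step packs exist, and two after repacking down to a
single pack.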

We'd have to be mindful of having a large number of packs, but I think
that this should mostly be a non-issue, since we'd only be living with N
packs for the lifetime of the replay command (before repacking them down
to a single pack and migrating them back to the main object store).

Thanks,
Taylor

Thread overview: 63+ messages
2023-10-19 17:28 [PATCH v4 0/7] merge-ort: implement support for packing objects together Taylor Blau
2023-10-19 17:28 ` [PATCH v4 1/7] bulk-checkin: extract abstract `bulk_checkin_source` Taylor Blau
2023-10-20  7:35   ` Jeff King
2023-10-20 16:55     ` Junio C Hamano
2023-10-19 17:28 ` [PATCH v4 2/7] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Taylor Blau
2023-10-19 17:28 ` [PATCH v4 3/7] bulk-checkin: refactor deflate routine to accept a `bulk_checkin_source` Taylor Blau
2023-10-19 17:28 ` [PATCH v4 4/7] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source` Taylor Blau
2023-10-23  9:19   ` Patrick Steinhardt
2023-10-23 18:58     ` Jeff King
2023-10-24  6:34       ` Patrick Steinhardt
2023-10-24 17:08         ` Junio C Hamano
2023-10-19 17:28 ` [PATCH v4 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-19 17:28 ` [PATCH v4 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-19 17:29 ` [PATCH v4 7/7] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-19 21:47 ` [PATCH v4 0/7] merge-ort: implement support for packing objects together Junio C Hamano
2023-10-20  7:29 ` Jeff King
2023-10-20 16:53   ` Junio C Hamano
2023-10-23  9:19 ` Patrick Steinhardt
2023-10-23 22:44 ` [PATCH v5 0/5] " Taylor Blau
2023-10-23 22:44   ` [PATCH v5 1/5] bulk-checkin: extract abstract `bulk_checkin_source` Taylor Blau
2023-10-25  7:37     ` Jeff King
2023-10-25 15:39       ` Taylor Blau
2023-10-27 23:12       ` Junio C Hamano
2023-10-23 22:44   ` [PATCH v5 2/5] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Taylor Blau
2023-10-23 22:45   ` [PATCH v5 3/5] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-25  7:58     ` Patrick Steinhardt
2023-10-25 15:44       ` Taylor Blau
2023-10-25 17:21         ` Eric Sunshine
2023-10-26  8:16           ` Patrick Steinhardt
2023-11-11  0:17           ` Elijah Newren
2023-10-23 22:45   ` [PATCH v5 4/5] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-23 22:45   ` [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-25  7:58     ` Patrick Steinhardt
2023-10-25 15:46       ` Taylor Blau
2023-11-10 23:51     ` Elijah Newren
2023-11-11  0:27       ` Junio C Hamano
2023-11-11  1:34         ` Taylor Blau
2023-11-11  1:24       ` Taylor Blau
2023-11-13 22:05         ` Jeff King
2023-11-14  1:40           ` Junio C Hamano
2023-11-14  2:54             ` Elijah Newren
2023-11-14 21:55             ` Jeff King
2023-11-14  3:08           ` Elijah Newren
2023-11-13 22:02       ` Jeff King
2023-11-13 22:34         ` Taylor Blau
2023-11-14  2:50           ` Elijah Newren
2023-11-14 21:53             ` Jeff King
2023-11-14 22:04           ` Jeff King
2023-10-23 23:31   ` [PATCH v5 0/5] merge-ort: implement support for packing objects together Junio C Hamano
2023-11-06 15:46     ` Johannes Schindelin
2023-11-06 23:19       ` Junio C Hamano
2023-11-07  3:42       ` Jeff King
2023-11-07 15:58       ` Taylor Blau [this message]
2023-11-07 18:22         ` [RFC PATCH 0/3] replay: implement support for writing new objects to a pack Taylor Blau
2023-11-07 18:22           ` [RFC PATCH 1/3] merge-ort.c: finalize ODB transactions after each step Taylor Blau
2023-11-11  3:45             ` Elijah Newren
2023-11-07 18:22           ` [RFC PATCH 2/3] tmp-objdir: introduce `tmp_objdir_repack()` Taylor Blau
2023-11-08  7:05             ` Patrick Steinhardt
2023-11-09 19:26               ` Taylor Blau
2023-11-07 18:23           ` [RFC PATCH 3/3] builtin/replay.c: introduce `--write-pack` Taylor Blau
2023-11-11  3:42           ` [RFC PATCH 0/3] replay: implement support for writing new objects to a pack Elijah Newren
2023-11-11  4:04           ` Elijah Newren
2023-10-25  7:58   ` [PATCH v5 0/5] merge-ort: implement support for packing objects together Patrick Steinhardt
