git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Elijah Newren <newren@gmail.com>,
	"Eric W. Biederman" <ebiederm@gmail.com>,
	Jeff King <peff@peff.net>
Subject: Re: [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack`
Date: Mon, 9 Oct 2023 12:54:12 +0200	[thread overview]
Message-ID: <ZSPb1OYRrQSUugtg@tanuki> (raw)
In-Reply-To: <ZSCR7e6KKqFv8mZk@nand.local>

[-- Attachment #1: Type: text/plain, Size: 3831 bytes --]

On Fri, Oct 06, 2023 at 07:02:05PM -0400, Taylor Blau wrote:
> On Fri, Oct 06, 2023 at 03:35:25PM -0700, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> > > When using merge-tree often within a repository[^1], it is possible to
> > > generate a relatively large number of loose objects, which can result in
> > > degraded performance, and inode exhaustion in extreme cases.
> >
> > Well, be it "git merge-tree" or "git merge", new loose objects tend
> > to accumulate until "gc" kicks in, so it is not a new problem for
> > mere mortals, is it?
> 
> Yeah, I would definitely suspect that this is more of an issue for
> forges than individual Git users.
> 
> > As one "interesting" use case of "merge-tree" is for a Git hosting
> > site with bare repositories to offer trial merges, without which
> > majority of the object their repositories acquire would have been in
> > packs pushed by their users, "Gee, loose objects consume many inodes
> > in exchange for easier selective pruning" becomes an issue, right?
> 
> Right.
> 
> > Just like it hurts performance to have too many loose object files,
> > presumably it would also hurt performance to keep too many packs,
> > each came from such a trial merge.  Do we have a "gc" story offered
> > for these packs created by the new feature?  E.g., "once merge-tree
> > is done creating a trial merge, we can discard the objects created
> > in the pack, because we never expose new objects in the pack to the
> > outside, processes running simultaneously, so instead closing the
> > new packfile by calling flush_bulk_checkin_packfile(), we can safely
> > unlink the temporary pack.  We do not even need to spend cycles to
> > run a gc that requires cycles to enumerate what is still reachable",
> > or something like that?
> 
> I know Johannes worked on something like this recently. IIRC, it
> effectively does something like:
> 
>     struct tmp_objdir *tmp_objdir = tmp_objdir_create(...);
>     tmp_objdir_replace_primary_odb(tmp_objdir, 1);
> 
> at the beginning of a merge operation, and:
> 
>     tmp_objdir_discard_objects(tmp_objdir);
> 
> at the end. I haven't followed that work off-list very closely, but it
> is only possible for GitHub to discard certain niche kinds of
> merges/rebases, since in general we make the objects created during test
> merges available via refs/pull/N/{merge,rebase}.
> 
> I think that like anything, this is a trade-off. Having lots of packs
> can be a performance hindrance just like having lots of loose objects.
> But since we can represent more objects with fewer inodes when packed,
> storing those objects together in a pack is preferable when (a) you're
> doing lots of test-merges, and (b) you want to keep those objects
> around, e.g., because they are reachable.

In Gitaly, we usually set up quarantine directories for all operations
that create objects. This allows us to discard any newly written objects
in case either the RPC call gets cancelled or in case our access checks
determine that the change should not be allowed. The logic is rather
simple:

    1. Create a new temporary directory.

    2. Set up the new temporary directory as main object database via
       the `GIT_OBJECT_DIRECTORY` environment variable.

    3. Set up the main repository's object database via the
       `GIT_ALTERNATE_OBJECT_DIRECTORIES` environment variable.

    4. Execute Git commands that write objects with these environment
       variables set up. The new objects will end up neatly contained in
       the temporary directory.

    5. Once done, either discard the temporary object database or
       migrate objects into the main object daatabase.

I wonder whether this would be a viable approach for you, as well.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2023-10-09 11:00 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-06 22:01 [PATCH 0/7] merge-ort: implement support for packing objects together Taylor Blau
2023-10-06 22:01 ` [PATCH 1/7] bulk-checkin: factor out `format_object_header_hash()` Taylor Blau
2023-10-06 22:01 ` [PATCH 2/7] bulk-checkin: factor out `prepare_checkpoint()` Taylor Blau
2023-10-06 22:01 ` [PATCH 3/7] bulk-checkin: factor out `truncate_checkpoint()` Taylor Blau
2023-10-06 22:01 ` [PATCH 4/7] bulk-checkin: factor our `finalize_checkpoint()` Taylor Blau
2023-10-06 22:02 ` [PATCH 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-06 22:02 ` [PATCH 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-07  3:07   ` Eric Biederman
2023-10-09  1:31     ` Taylor Blau
2023-10-06 22:02 ` [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-06 22:35   ` Junio C Hamano
2023-10-06 23:02     ` Taylor Blau
2023-10-08  7:02       ` Elijah Newren
2023-10-08 16:04         ` Taylor Blau
2023-10-08 17:33           ` Jeff King
2023-10-09  1:37             ` Taylor Blau
2023-10-09 20:21               ` Jeff King
2023-10-09 17:24             ` Junio C Hamano
2023-10-09 10:54       ` Patrick Steinhardt [this message]
2023-10-09 16:08         ` Taylor Blau
2023-10-10  6:36           ` Patrick Steinhardt
2023-10-17 16:31 ` [PATCH v2 0/7] merge-ort: implement support for packing objects together Taylor Blau
2023-10-17 16:31   ` [PATCH v2 1/7] bulk-checkin: factor out `format_object_header_hash()` Taylor Blau
2023-10-17 16:31   ` [PATCH v2 2/7] bulk-checkin: factor out `prepare_checkpoint()` Taylor Blau
2023-10-17 16:31   ` [PATCH v2 3/7] bulk-checkin: factor out `truncate_checkpoint()` Taylor Blau
2023-10-17 16:31   ` [PATCH v2 4/7] bulk-checkin: factor our `finalize_checkpoint()` Taylor Blau
2023-10-17 16:31   ` [PATCH v2 5/7] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-18  2:18     ` Junio C Hamano
2023-10-18 16:34       ` Taylor Blau
2023-10-17 16:31   ` [PATCH v2 6/7] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-17 16:31   ` [PATCH v2 7/7] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-18 17:07 ` [PATCH v3 00/10] merge-ort: implement support for packing objects together Taylor Blau
2023-10-18 17:07   ` [PATCH v3 01/10] bulk-checkin: factor out `format_object_header_hash()` Taylor Blau
2023-10-18 17:07   ` [PATCH v3 02/10] bulk-checkin: factor out `prepare_checkpoint()` Taylor Blau
2023-10-18 17:07   ` [PATCH v3 03/10] bulk-checkin: factor out `truncate_checkpoint()` Taylor Blau
2023-10-18 17:07   ` [PATCH v3 04/10] bulk-checkin: factor out `finalize_checkpoint()` Taylor Blau
2023-10-18 17:08   ` [PATCH v3 05/10] bulk-checkin: extract abstract `bulk_checkin_source` Taylor Blau
2023-10-18 23:10     ` Junio C Hamano
2023-10-19 15:19       ` Taylor Blau
2023-10-19 17:55         ` Junio C Hamano
2023-10-18 17:08   ` [PATCH v3 06/10] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source` Taylor Blau
2023-10-18 17:08   ` [PATCH v3 07/10] bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types Taylor Blau
2023-10-18 17:08   ` [PATCH v3 08/10] bulk-checkin: introduce `index_blob_bulk_checkin_incore()` Taylor Blau
2023-10-18 23:18     ` Junio C Hamano
2023-10-19 15:30       ` Taylor Blau
2023-10-18 17:08   ` [PATCH v3 09/10] bulk-checkin: introduce `index_tree_bulk_checkin_incore()` Taylor Blau
2023-10-18 17:08   ` [PATCH v3 10/10] builtin/merge-tree.c: implement support for `--write-pack` Taylor Blau
2023-10-18 18:32 ` [PATCH v4 00/17] bloom: changed-path Bloom filters v2 (& sundries) Taylor Blau
2023-10-18 18:32   ` [PATCH v4 01/17] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-10-18 18:32   ` [PATCH v4 02/17] revision.c: consult Bloom filters for root commits Taylor Blau
2023-10-18 18:32   ` [PATCH v4 03/17] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2023-10-18 18:32   ` [PATCH v4 04/17] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2023-10-18 18:32   ` [PATCH v4 05/17] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2023-10-18 18:32   ` [PATCH v4 06/17] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2023-10-18 18:32   ` [PATCH v4 07/17] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2023-10-18 18:32   ` [PATCH v4 08/17] t4216: test changed path filters with high bit paths Taylor Blau
2023-10-18 18:32   ` [PATCH v4 09/17] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2023-10-18 18:32   ` [PATCH v4 10/17] commit-graph: new filter ver. that fixes murmur3 Taylor Blau
2023-10-18 18:33   ` [PATCH v4 11/17] bloom: annotate filters with hash version Taylor Blau
2023-10-18 18:33   ` [PATCH v4 12/17] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-10-18 18:33   ` [PATCH v4 13/17] commit-graph.c: unconditionally load " Taylor Blau
2023-10-18 18:33   ` [PATCH v4 14/17] commit-graph: drop unnecessary `graph_read_bloom_data_context` Taylor Blau
2023-10-18 18:33   ` [PATCH v4 15/17] object.h: fix mis-aligned flag bits table Taylor Blau
2023-10-18 18:33   ` [PATCH v4 16/17] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-10-18 18:33   ` [PATCH v4 17/17] bloom: introduce `deinit_bloom_filters()` Taylor Blau
2023-10-18 23:26   ` [PATCH v4 00/17] bloom: changed-path Bloom filters v2 (& sundries) Junio C Hamano
2023-10-20 17:27     ` Taylor Blau
2023-10-23 20:22       ` SZEDER Gábor
2023-10-30 20:24         ` Taylor Blau
2024-01-16 22:08   ` [PATCH v5 " Taylor Blau
2024-01-16 22:09     ` [PATCH v5 01/17] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-01-16 22:09     ` [PATCH v5 02/17] revision.c: consult Bloom filters for root commits Taylor Blau
2024-01-16 22:09     ` [PATCH v5 03/17] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-01-16 22:09     ` [PATCH v5 04/17] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-01-16 22:09     ` [PATCH v5 05/17] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-01-16 22:09     ` [PATCH v5 06/17] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-01-16 22:09     ` [PATCH v5 07/17] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-01-16 22:09     ` [PATCH v5 08/17] t4216: test changed path filters with high bit paths Taylor Blau
2024-01-16 22:09     ` [PATCH v5 09/17] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-01-29 21:26       ` SZEDER Gábor
2024-01-29 23:58         ` Taylor Blau
2024-01-16 22:09     ` [PATCH v5 10/17] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-01-16 22:09     ` [PATCH v5 11/17] bloom: annotate filters with hash version Taylor Blau
2024-01-16 22:09     ` [PATCH v5 12/17] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2024-01-16 22:09     ` [PATCH v5 13/17] commit-graph.c: unconditionally load " Taylor Blau
2024-01-16 22:09     ` [PATCH v5 14/17] commit-graph: drop unnecessary `graph_read_bloom_data_context` Taylor Blau
2024-01-16 22:09     ` [PATCH v5 15/17] object.h: fix mis-aligned flag bits table Taylor Blau
2024-01-16 22:09     ` [PATCH v5 16/17] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-01-16 22:09     ` [PATCH v5 17/17] bloom: introduce `deinit_bloom_filters()` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZSPb1OYRrQSUugtg@tanuki \
    --to=ps@pks.im \
    --cc=ebiederm@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).