From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Elijah Newren <newren@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [RFC PATCH 2/3] tmp-objdir: introduce `tmp_objdir_repack()`
Date: Wed, 8 Nov 2023 08:05:46 +0100
Message-ID: <ZUszSs0CYoFV9YJ0@tanuki>
In-Reply-To: <0f19c139ba9bb5105747f545038825d0c89f2e42.1699381371.git.me@ttaylorr.com>


On Tue, Nov 07, 2023 at 01:22:58PM -0500, Taylor Blau wrote:
> In the following commit, we will teach `git replay` how to write a pack
> containing the set of new objects created as a result of the `replay`
> operation.
> 
> Since `replay` needs to be able to see the object(s) written
> from previous steps in order to replay each commit, the ODB transaction
> may have multiple pending packs. Migrating multiple packs back into the
> main object store has a couple of downsides:
> 
>   - It is error-prone to do so: each pack must be migrated in the
>     correct order (with the ".idx" file staged last), and the set of
>     packs themselves must be moved over in the correct order to avoid
>     racy behavior.
> 
>   - It is a (potentially significant) performance degradation to migrate
>     a large number of packs back into the main object store.
> 
> Introduce a new function that combines the set of all packs in the
> temporary object store to produce a single pack which is the logical
> concatenation of all packs created during that level of the ODB
> transaction.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  tmp-objdir.c | 13 +++++++++++++
>  tmp-objdir.h |  6 ++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/tmp-objdir.c b/tmp-objdir.c
> index 5f9074ad1c..ef53180b47 100644
> --- a/tmp-objdir.c
> +++ b/tmp-objdir.c
> @@ -12,6 +12,7 @@
>  #include "strvec.h"
>  #include "quote.h"
>  #include "object-store-ll.h"
> +#include "run-command.h"
>  
>  struct tmp_objdir {
>  	struct strbuf path;
> @@ -277,6 +278,18 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
>  	return ret;
>  }
>  
> +int tmp_objdir_repack(struct tmp_objdir *t)
> +{
> +	struct child_process cmd = CHILD_PROCESS_INIT;
> +
> +	cmd.git_cmd = 1;
> +
> +	strvec_pushl(&cmd.args, "repack", "-a", "-d", "-k", "-l", NULL);
> +	strvec_pushv(&cmd.env, tmp_objdir_env(t));

I wonder what the performance of this repack would be like in a large
repository with many refs. Ideally, I would expect the cost of the
repack to scale only with the number of objects we have written into
the temporary object directory. But in practice, the repack will need
to compute reachability and thus also scales with the size of the repo
itself, doesn't it?
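
To make the concern concrete, here is a toy model of the two cost
measures. It is not Git code: `struct toy_object`, `count_reachable()`
and `count_new()` are names made up for illustration, and the model
assumes a history where each object has at most one parent and every
parent index is valid. It only shows how a walk from all ref tips can
visit far more objects than were newly written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy object with at most one parent (index into the array, -1 for none). */
struct toy_object {
	int parent;
	int is_new;	/* 1 if it was written into the temporary object directory */
};

/*
 * Walk from every ref tip towards the root and count the distinct
 * objects visited. This stands in for a reachability computation.
 */
static size_t count_reachable(const struct toy_object *objs, size_t nr,
			      const int *tips, size_t nr_tips)
{
	unsigned char *seen;
	size_t visited = 0;

	if (!nr)
		return 0;
	seen = calloc(nr, 1);
	assert(seen);

	for (size_t i = 0; i < nr_tips; i++) {
		assert(tips[i] >= 0 && (size_t)tips[i] < nr);
		/* Stop as soon as we hit an object an earlier walk has seen. */
		for (int cur = tips[i]; cur >= 0 && !seen[cur]; cur = objs[cur].parent) {
			seen[cur] = 1;
			visited++;
		}
	}

	free(seen);
	return visited;
}

/* Count the objects that the "ideal" repack would have to look at. */
static size_t count_new(const struct toy_object *objs, size_t nr)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++)
		n += objs[i].is_new ? 1 : 0;
	return n;
}
```

With a linear history of six objects where only the top two are new,
and refs pointing at the tip and at an older object, the walk visits
all six objects while only two of them are new. The gap grows with the
amount of pre-existing history, which is the scaling I am worried about.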

Patrick

> +	return run_command(&cmd);
> +}
> +
>  const char **tmp_objdir_env(const struct tmp_objdir *t)
>  {
>  	if (!t)
> diff --git a/tmp-objdir.h b/tmp-objdir.h
> index 237d96b660..d00e3b3e27 100644
> --- a/tmp-objdir.h
> +++ b/tmp-objdir.h
> @@ -36,6 +36,12 @@ struct tmp_objdir *tmp_objdir_create(const char *prefix);
>   */
>  const char **tmp_objdir_env(const struct tmp_objdir *);
>  
> +/*
> + * Combines all packs in the tmp_objdir into a single pack before migrating.
> + * Removes original pack(s) after installing the combined pack into place.
> + */
> +int tmp_objdir_repack(struct tmp_objdir *);
> +
>  /*
>   * Finalize a temporary object directory by migrating its objects into the main
>   * object database, removing the temporary directory, and freeing any
> -- 
> 2.42.0.446.g0b9ef90488
> 


