From: "Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Son Luong Ngoc <sluongng@gmail.com>
Subject: [PATCH v4 0/2] midx: apply gitconfig to midx repack
Date: Sun, 10 May 2020 16:07:32 +0000
Message-ID: <pull.626.v4.git.1589126855.gitgitgadget@gmail.com>
In-Reply-To: <pull.626.v3.git.1589034270.gitgitgadget@gmail.com>

Midx repack has largely been used in Microsoft Scalar on the client side to
optimize the state of a repository that has multiple packs. However, when I
tried to apply it on the server side, I realized that certain features were
lacking compared to git repack. Most of these features are highly desirable
on the server side to create the most optimized pack possible.

One example is delta_base_offset: comparing a midx repack with and without
delta_base_offset, we can observe a significant size difference.

> du objects/pack/*pack
14536   objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack
9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack

Latest 2.26.2 (without delta_base_offset)
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack

With delta_base_offset
> git version
git version 2.26.2.672.g232c24e857.dirty
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack
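
To repeat the comparison on a single build, the setting can be forced per
invocation with "git -c". The following is a minimal sketch and not part of
the series: the function name midx_repack_size is hypothetical, and it
assumes a git with this series applied (so that repack.useDeltaBaseOffset is
consulted by midx repack) and a repository that already has packfiles. It
rewrites packs, so run it on separate copies of the repository when
comparing "true" against "false".

```shell
#!/bin/sh
# Sketch (hypothetical helper): run the write/repack/expire sequence from the
# transcript above with repack.useDeltaBaseOffset forced to a given value,
# then print the total size of the remaining packfiles as reported by du.
# Usage: midx_repack_size <path-to-git-dir> <true|false>
midx_repack_size () {
	git_dir=$1
	use_offset=$2

	git --git-dir="$git_dir" multi-pack-index write &&
	git --git-dir="$git_dir" -c repack.useDeltaBaseOffset="$use_offset" \
		multi-pack-index repack &&
	git --git-dir="$git_dir" multi-pack-index expire &&
	# "du -c" appends a grand-total line; keep only its size column.
	du -c "$git_dir"/objects/pack/*.pack | tail -n 1 | cut -f 1
}
```

For example, "midx_repack_size /path/to/copy-a/.git true" and
"midx_repack_size /path/to/copy-b/.git false" print the two sizes to compare.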

Note that repack.writeBitmaps configuration is ignored, as the pack bitmap
facility is useful only with a single packfile.

The second patch, by Derrick Stolee, adds support for
repack.packKeptObjects.
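
As a rough illustration of the scenario that patch covers, packs can be
marked as kept by creating a .keep file next to each packfile, which is also
what the new test in t5319 does. The helper name keep_all_packs below is
hypothetical, and the expectation that kept packs are then left alone by
midx repack assumes the second patch is applied.

```shell
#!/bin/sh
# Sketch (hypothetical helper): create an empty .keep file next to every
# packfile under <git-dir>/objects/pack.
# Usage: keep_all_packs <path-to-git-dir>
keep_all_packs () {
	for pack in "$1"/objects/pack/*.pack
	do
		# An unmatched glob stays literal; skip it.
		[ -e "$pack" ] || continue
		: >"${pack%.pack}.keep" || return 1
	done
}
```

After "keep_all_packs .git", running
"git -c repack.packKeptObjects=false multi-pack-index repack" should leave
those packs untouched.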

Derrick Stolee (1):
  multi-pack-index: respect repack.packKeptObjects=false

Son Luong Ngoc (1):
  midx: teach "git multi-pack-index repack" honor "git repack"
    configurations

 Documentation/git-multi-pack-index.txt |  3 ++
 midx.c                                 | 42 +++++++++++++++++++++++---
 t/t5319-multi-pack-index.sh            | 27 +++++++++++++++++
 3 files changed, 67 insertions(+), 5 deletions(-)


base-commit: b994622632154fc3b17fb40a38819ad954a5fb88
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/626

Range-diff vs v3:

 1:  a925307d4c5 ! 1:  a8f75e34e5b midx: teach "git multi-pack-index repack" honor "git repack" configurations
     @@ Metadata
       ## Commit message ##
          midx: teach "git multi-pack-index repack" honor "git repack" configurations
      
     -    Previously, when the "repack" subcommand of "git multi-pack-index" command
     -    creates new packfile(s), it does not call the "git repack" command but
     -    instead directly calls the "git pack-objects" command, and the
     -    configuration variables meant for the "git repack" command, like
     -    "repack.usedaeltabaseoffset", are ignored.
     +    When the "repack" subcommand of "git multi-pack-index" command
     +    creates new packfile(s), it does not call the "git repack"
     +    command but instead directly calls the "git pack-objects"
     +    command, and the configuration variables meant for the "git
     +    repack" command, like "repack.usedaeltabaseoffset", are ignored.
      
     -    This patch ensured "git multi-pack-index" checks the configuration
     -    variables used by "git repack" and passes the corresponding options to
     -    the underlying "git pack-objects" command.
     +    Check the configuration variables used by "git repack" ourselves
     +    in "git multi-index-pack" and pass the corresponding options to
     +    underlying "git pack-objects".
      
          Note that `repack.writeBitmaps` configuration is ignored, as the
          pack bitmap facility is useful only with a single packfile.
     @@ Commit message
      
       ## midx.c ##
      @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
     - 	struct child_process cmd = CHILD_PROCESS_INIT;
       	struct strbuf base_name = STRBUF_INIT;
       	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
     + 
     ++	/*
     ++	 * When updating the default for these configuration
     ++	 * variables in builtin/repack.c, these must be adjusted
     ++	 * to match.
     ++	 */
      +	int delta_base_offset = 1;
      +	int use_delta_islands = 0;
     - 
     ++
       	if (!m)
       		return 0;
     + 
      @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
       	} else if (fill_included_packs_all(m, include_pack))
       		goto cleanup;
 2:  988697dd512 ! 2:  192fc785382 multi-pack-index: respect repack.packKeptObjects=false
     @@ t/t5319-multi-pack-index.sh: test_expect_success 'repack with minimum size does
      +		ls .git/objects/pack/*idx >idx-list &&
      +		test_line_count = 5 idx-list &&
      +		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
     ++		test_line_count = 5 keep-list &&
      +		for keep in $(cat keep-list)
      +		do
      +			touch $keep || return 1
     @@ t/t5319-multi-pack-index.sh: test_expect_success 'repack with minimum size does
      +		test_line_count = 5 idx-list &&
      +		test-tool read-midx .git/objects | grep idx >midx-list &&
      +		test_line_count = 5 midx-list &&
     -+		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
     -+		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
     ++		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | sed -n 3p) &&
     ++		BATCH_SIZE=$((THIRD_SMALLEST_SIZE + 1)) &&
      +		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
      +		ls .git/objects/pack/*idx >idx-list &&
      +		test_line_count = 5 idx-list &&
 3:  efeb3d7d132 < -:  ----------- Ensured t5319 follows arith expansion guideline

-- 
gitgitgadget
