From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6DFE9C47247 for ; Tue, 5 May 2020 13:50:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 369DA206A5 for ; Tue, 5 May 2020 13:50:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="dQeJn1DZ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729093AbgEENul (ORCPT ); Tue, 5 May 2020 09:50:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44832 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1728512AbgEENul (ORCPT ); Tue, 5 May 2020 09:50:41 -0400 Received: from mail-qk1-x741.google.com (mail-qk1-x741.google.com [IPv6:2607:f8b0:4864:20::741]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E3275C061A0F for ; Tue, 5 May 2020 06:50:40 -0700 (PDT) Received: by mail-qk1-x741.google.com with SMTP id f83so2229949qke.13 for ; Tue, 05 May 2020 06:50:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=cjTxnwtj0O7js2C076vkXBRB85rpd1110d5+NvuYwPk=; b=dQeJn1DZserwJvPknWLB5VS5agTy8Zm+qfumTQDsYNc2se6i/TLDEINZWIA+mBlJKK lCigjZbHz1OA761pO0zVRyGjhhDGOogExD3NkBG/FcQoXdr77e80W8duUnTVvLPXH0Bi Rqg0YWLdbeuKhI7+0hblJpB4kpPuY6sy8LNhkvXSZ39AVjdP/0leWDk8V2CJ9yZBHIfq tQApyTEbRgTDrBF5XInidjMDBsDniJZ0yd5ftIyTb1kdmKXmULcSp2ApDCLNYFWYyBLA 1WtLaFE6RoLxbxzd2ff0KatT4JAJb3WKcrFjw5pZlArN171+0t3MiWZQPza5331HtLWO ktPw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=cjTxnwtj0O7js2C076vkXBRB85rpd1110d5+NvuYwPk=; b=hl2NnAokK2vyUjxESCG+nrfMz40a5Ms9epgIwq0uGOSqf520HYX/8KdV8FAZDY6sHM j45VmgNEZs8CcLdT1ecbLXTK6NMWV2245GacfArR+0VMepcDk8HeiowaAxHNoT027Iqv li0EIECn3iMpy0FrerE1BaY6ISvie+z3EgrVRlI+x1oJKE4t3xkmuHg4kjAQB8Es4PBp FpIJw+V7lIUVnEsImg+EJB8mZH5TglzoR4lkCRWD1a01FLy3CvhNOBGNn2UGlYL/D1hg EQ2JJlm0v/ouUzbK9jgWyA0ZgwNtUiA8IIQyX92lTJzZB4xfwmRcv/KJ112YlFFeBElG mGZw== X-Gm-Message-State: AGi0Pub+4cXrNP9Yh/YYB8bQCM15agLx2dURV45K/14/DruglIfnuO56 LY6gwXyYHrUkmJgXvIAMKjA= X-Google-Smtp-Source: APiQypIlYYgvhC8iFZjCcQbxJ+QmdyUrZOt2226YGtsa4xkP5pm03FRLTMTnDNr2QpElVMvKIqwBtg== X-Received: by 2002:a37:6843:: with SMTP id d64mr3548664qkc.24.1588686639771; Tue, 05 May 2020 06:50:39 -0700 (PDT) Received: from [192.168.1.110] ([99.85.27.166]) by smtp.gmail.com with ESMTPSA id u35sm1672011qtd.88.2020.05.05.06.50.38 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 05 May 2020 06:50:38 -0700 (PDT) Subject: Re: [PATCH] midx: apply gitconfig to midx repack To: Son Luong Ngoc via GitGitGadget , git@vger.kernel.org Cc: Son Luong Ngoc References: From: Derrick Stolee Message-ID: <8bd91a14-75dc-76e2-31b4-54eff5bea8dd@gmail.com> Date: Tue, 5 May 2020 09:50:37 -0400 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Thunderbird/76.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On 5/5/2020 9:06 AM, Son Luong Ngoc via GitGitGadget wrote: > From: Son Luong Ngoc > > Multi-Pack-Index repack is an incremental, repack solutions > that allows user to consolidate multiple packfiles in a non-disruptive > way. However the new packfile could be created without some of the > capabilities of a packfile that is created by calling `git repack`. > > This is because with `git repack`, there are configuration that would > enable different flags to be passed down to `git pack-objects` plumbing. > > In this patch, I applies those flags into `git multi-pack-index repack` > so that it respect the `repack.*` config series. This is a good idea! The fact that these are specified by 'git repack' and not 'git pack-objects' makes intervention here necessary. However, I don't think that all of these will apply properly. > Note: I left out `repack.packKeptObjects` intentionally as I dont think > its relevant to midx repack use case. I think it would be good to add this, but in a different way. > Signed-off-by: Son Luong Ngoc > --- > midx: apply gitconfig to midx repack > > Midx repack has largely been used in Microsoft Scalar on the client side > to optimize the repository multiple packs state. However when I tried to > apply this onto the server-side, I realized that there are certain > features that were lacking compare to git repack. Most of these features > are highly desirable on the server-side to create the most optimized > pack possible. > > One of the example is delta_base_offset, comparing an midx repack > with/without delta_base_offset, we can observe significant size > differences. > > > du objects/pack/*pack > 14536 objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack > 9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack > > Latest 2.26.2 (without delta_base_offset) > > git multi-pack-index write > > git multi-pack-index repack > > git multi-pack-index expire > > du objects/pack/*pack > 9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack > > With delta_base_offset > > git version > git version 2.26.2.672.g232c24e857.dirty > > git multi-pack-index write > > git multi-pack-index repack > > git multi-pack-index expire > > du objects/pack/*pack > 9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack > > In this patch, I intentionally leaving out repack.packKeptObjects as I > don't think its very relevant to midx repack use case: > > * One could always exclude biggest packs with --batch-size option > > > * For non-biggest-packs exclusion use case, its rather rare (unless you > want to have a special pack with only commits and trees being > excluded from repack to serve partial clone better?) > > > > Please let me know if anyone think that we should include that option > for the sake of completions. In the scenario where there is a .keep pack _and_ it is small enough to get picked up by the batch size, the 'git multi-pack-index repack' command will create a new pack containing its objects (and objects from other packs) but the 'git multi-pack-index expire' command will not delete the pack with .keep. The good news is that after the first repack, the objects in the pack are in a newer pack, so the multi-pack-index will not repack those objects from that pack multiple times. However, this may be unintended behavior for the user that specified the .keep pack. I think the right thing to do to respect "repack.packKeptObjects = false" is to ignore the packs when selecting the batch of objects. Instead of asking you to do this, I added a patch below. Please take it into your v2, if you don't mind. > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/626 > > midx.c | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > diff --git a/midx.c b/midx.c > index 9a61d3b37d9..88f16594268 100644 > --- a/midx.c > +++ b/midx.c > @@ -1361,6 +1361,10 @@ static int fill_included_packs_batch(struct repository *r, > return 0; > } > > +static int delta_base_offset = 1; > +static int write_bitmaps = -1; > +static int use_delta_islands; > + Why not make these local to the midx_repack method? > int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags) > { > int result = 0; > @@ -1381,12 +1385,25 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, > } else if (fill_included_packs_all(m, include_pack)) > goto cleanup; > > + git_config_get_bool("repack.usedeltabaseoffset", &delta_base_offset); > + git_config_get_bool("repack.writebitmaps", &write_bitmaps); > + git_config_get_bool("repack.usedeltaislands", &use_delta_islands); > + It looks like you have some spacing issues here. Perhaps use tabs? > argv_array_push(&cmd.args, "pack-objects"); > > strbuf_addstr(&base_name, object_dir); > strbuf_addstr(&base_name, "/pack/pack"); > argv_array_push(&cmd.args, base_name.buf); > > + if (delta_base_offset) > + argv_array_push(&cmd.args, "--delta-base-offset"); > + if (use_delta_islands) > + argv_array_push(&cmd.args, "--delta-islands"); These two probably make sense. > + if (write_bitmaps > 0) > + argv_array_push(&cmd.args, "--write-bitmap-index"); > + else if (write_bitmaps < 0) > + argv_array_push(&cmd.args, "--write-bitmap-index-quiet"); These make less sense. Unless --batch-size=0 and there are no .keep packs (with the patch below) I'm not sure we _can_ write bitmap indexes here. The pack-file is not necessarily closed under reachability. Or, will supplying these arguments to 'git pack-objects' actually do that closure? I would be happy to special-case these options to the "--batch-size=0" situation and otherwise ignore them. This then gets into enough complication that we should update the documentation as in the patch below. At minimum, it would be good to have some tests that exercise these code paths so we know they are behaving correctly. Thanks, -Stolee -- >8 -- >From 8a115191cbf21c553675a235c8c678affbca609b Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 5 May 2020 09:37:50 -0400 Subject: [PATCH] multi-pack-index: respect repack.packKeptObjects=false When selecting a batch of pack-files to repack in the "git multi-pack-index repack" command, Git should respect the repack.packKeptObjects config option. When false, this option says that the pack-files with an associated ".keep" file should not be repacked. This config value is "false" by default. There are two cases for selecting a batch of objects. The first is the case where the input batch-size is zero, which specifies "repack everything". The second is with a non-zero batch size, which selects pack-files using a greedy selection criteria. Both of these cases are updated and tested. Reported-by: Son Luong Ngoc Signed-off-by: Derrick Stolee --- Documentation/git-multi-pack-index.txt | 3 +++ midx.c | 26 +++++++++++++++++++++----- t/t5319-multi-pack-index.sh | 26 ++++++++++++++++++++++++++ 3 files changed, 50 insertions(+), 5 deletions(-) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index 642d9ac5b7..0c6619493c 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -56,6 +56,9 @@ repack:: file is created, rewrite the multi-pack-index to reference the new pack-file. A later run of 'git multi-pack-index expire' will delete the pack-files that were part of this batch. ++ +If `repack.packKeptObjects` is `false`, then any pack-files with an +associated `.keep` file will not be selected for the batch to repack. EXAMPLES diff --git a/midx.c b/midx.c index 1527e464a7..d055bf3cd3 100644 --- a/midx.c +++ b/midx.c @@ -1280,15 +1280,26 @@ static int compare_by_mtime(const void *a_, const void *b_) return 0; } -static int fill_included_packs_all(struct multi_pack_index *m, +static int fill_included_packs_all(struct repository *r, + struct multi_pack_index *m, unsigned char *include_pack) { - uint32_t i; + uint32_t i, count = 0; + int pack_kept_objects = 0; + + repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects); + + for (i = 0; i < m->num_packs; i++) { + if (prepare_midx_pack(r, m, i)) + continue; + if (!pack_kept_objects && m->packs[i]->pack_keep) + continue; - for (i = 0; i < m->num_packs; i++) include_pack[i] = 1; + count++; + } - return m->num_packs < 2; + return count < 2; } static int fill_included_packs_batch(struct repository *r, @@ -1299,6 +1310,9 @@ static int fill_included_packs_batch(struct repository *r, uint32_t i, packs_to_repack; size_t total_size; struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info)); + int pack_kept_objects = 0; + + repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects); for (i = 0; i < m->num_packs; i++) { pack_info[i].pack_int_id = i; @@ -1325,6 +1339,8 @@ static int fill_included_packs_batch(struct repository *r, if (!p) continue; + if (!pack_kept_objects && p->pack_keep) + continue; if (open_pack_index(p) || !p->num_objects) continue; @@ -1365,7 +1381,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, if (batch_size) { if (fill_included_packs_batch(r, m, include_pack, batch_size)) goto cleanup; - } else if (fill_included_packs_all(m, include_pack)) + } else if (fill_included_packs_all(r, m, include_pack)) goto cleanup; argv_array_push(&cmd.args, "pack-objects"); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 43a7a66c9d..b2fece5d3d 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -533,6 +533,32 @@ test_expect_success 'repack with minimum size does not alter existing packs' ' ) ' +test_expect_success 'repack respects repack.packKeptObjects=false' ' + test_when_finished rm -f dup/.git/objects/pack/*keep && + ( + cd dup && + ls .git/objects/pack/*idx >idx-list && + test_line_count = 5 idx-list && + ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list && + for keep in $(cat keep-list) + do + touch $keep || return 1 + done && + git multi-pack-index repack --batch-size=0 && + ls .git/objects/pack/*idx >idx-list && + test_line_count = 5 idx-list && + test-tool read-midx .git/objects | grep idx >midx-list && + test_line_count = 5 midx-list && + THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) && + BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) && + git multi-pack-index repack --batch-size=$BATCH_SIZE && + ls .git/objects/pack/*idx >idx-list && + test_line_count = 5 idx-list && + test-tool read-midx .git/objects | grep idx >midx-list && + test_line_count = 5 midx-list + ) +' + test_expect_success 'repack creates a new pack' ' ( cd dup && -- 2.26.2.vfs.1.2