From: "ZheNing Hu via GitGitGadget"
Date: Sat, 22 Oct 2022 15:46:04 +0000
Subject: [PATCH] pack-objects: introduce --exclude-delta= option
To: git@vger.kernel.org
Cc: Christian Couder, Ævar Arnfjörð Bjarmason, Junio C Hamano, Derrick Stolee, Johannes Schindelin, Elijah Newren, James Ramsay, ZheNing Hu

From: ZheNing Hu

The server uses delta compression during git clone to reduce the
amount of data transferred over the network, but delta compression
of large binary blobs often does not reduce the size significantly
and wastes a lot of CPU.

Git currently disables delta compression for objects that meet
either of these conditions:

1. files that have -delta set in .gitattributes
2. files whose size exceeds big_file_threshold

However, for 1, .gitattributes has to be set manually by the user;
in most cases users do not set it, and it cannot be adjusted on the
server side. For 2, big_file_threshold defaults to 512MB, so many
binary files smaller than that are uselessly delta-compressed, and
this gets worse if the server raises big_file_threshold.

Therefore, we need a way to skip delta compression of some files on
the server.

Introduce the `--exclude-delta=` option, which can be used to
disable delta compression for objects that match the pattern.

Signed-off-by: ZheNing Hu
---
    pack-objects: introduce --exclude-delta= option

    While analyzing some repositories using git filter-repo --analyze,
    I noticed that many huge binaries in the repositories were
    delta-compressed without much reduction in size.

    $ cat .git/filter-repo/analysis/path-all-sizes.txt | more
    === All paths by reverse accumulated size ===
    Format: unpacked size, packed size, date deleted, path name
       23816778   23765921 2022-08-22 managed/src/universal/ybc/ybc-1.0.0-b1-linux-x86_64.tar.gz
       22504398   22445676 2022-08-22 managed/src/universal/ybc/ybc-1.0.0-b1-el8-aarch64.tar.gz
       11726471    6424233 2022-08-09 managed/yba-installer/yba-installer_linux_amd64
      294644800    5794201            src/yb/master/catalog_manager.cc
        2912780    2872186            docs/static/images/yp/tables-view-ycql.png
        2992192    2634232            docs/static/images/yb-cloud/cloud-clusters-backups.png
        2757095    2501915            docs/static/images/deploy/aws/aws-cf-configure-options.png
    ...

    The current ways to avoid delta compression are not very suitable
    for git servers. First, files that exceed big_file_threshold are
    not delta-compressed, but the above analysis indicates that many
    big binary files do not exceed big_file_threshold (512MB by
    default).
    Second, there is no .gitattributes entry that disables delta
    compression for them, and we cannot realistically have repository
    administrators add one manually.

    But we can also see that the large files in these repositories
    often share a common characteristic: their names end in ".tar.gz"
    or ".png". So perhaps we can take advantage of this and disable
    delta compression on the server for some common types of binary
    files.

    This is currently implemented by the command-line parameter
    --exclude-delta=. But maybe we could also try passing it through
    git config.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1392%2Fadlternative%2Fadl%2Fpack-object-no-try-delta-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1392/adlternative/adl/pack-object-no-try-delta-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1392

 Documentation/git-pack-objects.txt |  6 +++++-
 builtin/pack-objects.c             | 28 +++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index a9995a932ca..92cfee83df5 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -13,7 +13,7 @@ SYNOPSIS
 	[--no-reuse-delta] [--delta-base-offset] [--non-empty]
 	[--local] [--incremental] [--window=] [--depth=]
 	[--revs [--unpacked | --all]] [--keep-pack=]
-	[--cruft] [--cruft-expiration=