From: Jeff Hostetler
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, jonathantanmy@google.com, Jeff Hostetler
Subject: [PATCH 06/14] pack-objects: test support for blob filtering
Date: Thu, 2 Nov 2017 20:31:21 +0000
Message-Id: <20171102203129.59417-7-git@jeffhostetler.com>
X-Mailer: git-send-email 2.9.3
In-Reply-To: <20171102203129.59417-1-git@jeffhostetler.com>
References: <20171102203129.59417-1-git@jeffhostetler.com>
X-Mailing-List: git@vger.kernel.org

From: Jonathan Tan

As part of an effort to improve Git support for very large repositories
in which clients typically have only a subset of all version-controlled
blobs, test pack-objects support for --filter=blobs:limit=<n>, packing
only blobs not exceeding that size unless the blob corresponds to a file
whose name starts with ".git". upload-pack will eventually be taught to
use this new parameter if needed to exclude certain blobs during a fetch
or clone, potentially drastically reducing network consumption when
serving these very large repositories.

Signed-off-by: Jonathan Tan
Signed-off-by: Jeff Hostetler
---
 t/t5300-pack-object.sh  | 45 +++++++++++++++++++++++++++++++++++++++++++++
 t/test-lib-functions.sh | 12 ++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 9c68b99..0739a07 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -457,6 +457,51 @@ test_expect_success !PTHREADS,C_LOCALE_OUTPUT 'pack-objects --threads=N or pack.
 	grep -F "no threads support, ignoring pack.threads" err
 '
 
+lcut () {
+	perl -e '$/ = undef; $_ = <>; s/^.{'$1'}//s; print $_'
+}
+
+test_expect_success 'filtering by size works with multiple excluded' '
+	rm -rf server &&
+	git init server &&
+	printf a > server/a &&
+	printf b > server/b &&
+	printf c-very-long-file > server/c &&
+	printf d-very-long-file > server/d &&
+	git -C server add a b c d &&
+	git -C server commit -m x &&
+
+	git -C server rev-parse HEAD >objects &&
+	git -C server pack-objects --revs --stdout --filter=blobs:limit=10 <objects >my.pack &&
+
+	# Ensure that only the small blobs are in the packfile
+	git index-pack my.pack &&
+	git verify-pack -v my.idx >objectlist &&
+	grep $(git hash-object server/a) objectlist &&
+	grep $(git hash-object server/b) objectlist &&
+	! grep $(git hash-object server/c) objectlist &&
+	! grep $(git hash-object server/d) objectlist
+'
+
+test_expect_success 'filtering by size never excludes special files' '
+	rm -rf server &&
+	git init server &&
+	printf a-very-long-file > server/a &&
+	printf a-very-long-file > server/.git-a &&
+	printf b-very-long-file > server/b &&
+	git -C server add a .git-a b &&
+	git -C server commit -m x &&
+
+	git -C server rev-parse HEAD >objects &&
+	git -C server pack-objects --revs --stdout --filter=blobs:limit=10 <objects >my.pack &&
+
+	# Ensure that the .git-a blob is in the packfile, despite also
+	# appearing as a non-.git file
+	git index-pack my.pack &&
+	git verify-pack -v my.idx >objectlist &&
+	grep $(git hash-object server/a) objectlist
+'
+
 #
 # WARNING!
 #
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 1701fe2..07b79c7 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1020,3 +1020,15 @@ nongit () {
 		"$@"
 	)
 }
+
+# Converts big-endian pairs of hexadecimal digits into bytes. For example,
+# "printf 61620d0a | hex_pack" results in "ab\r\n".
+hex_pack () {
+	perl -e '$/ = undef; $input = <>; print pack("H*", $input)'
+}
+
+# Converts bytes into big-endian pairs of hexadecimal digits. For example,
+# "printf 'ab\r\n' | hex_unpack" results in "61620d0a".
+hex_unpack () {
+	perl -e '$/ = undef; $input = <>; print unpack("H2" x length($input), $input)'
+}
-- 
2.9.3