From: Ævar Arnfjörð Bjarmason
To: Jeff King
Cc: Git List, Michael Haggerty
Subject: Re: Is there some script to find un-delta-able objects?
References: <87d0soh3v8.fsf@evledraar.gmail.com> <20181005161943.GA8816@sigill.intra.peff.net>
In-reply-to: <20181005161943.GA8816@sigill.intra.peff.net>
Date: Fri, 05 Oct 2018 18:44:25 +0200
Message-ID: <87bm88gx7a.fsf@evledraar.gmail.com>
X-Mailing-List: git@vger.kernel.org

On Fri, Oct 05 2018, Jeff King wrote:

> On Fri, Oct 05, 2018 at 04:20:27PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I.e. something to generate the .gitattributes file using this format:
>>
>> https://git-scm.com/docs/gitattributes#_packing_objects
>>
>> Some stuff is obvious, like "*.gpg binary -delta", but I'm wondering if
>> there's some repo scanner utility to spew this out for a given repo.
>
> I'm not sure what you mean by "un-delta-able" objects. Do you mean ones
> where we're not likely to find a delta? Or ones where Git will not try
> to look for a delta?
>
> If the latter, I think the only rules are the "-delta" attribute and the
> object size. You should be able to use git-check-attr and "git-cat-file"
> to get that info.
>
> If the former, I don't know how you would know. We can only report on
> what isn't a delta _yet_.

Some version of the former. Ones where we haven't found any (or much of)
useful deltas yet. E.g. say I had a repository with a lot of files
generated by this command at various points in the history:

    dd if=/dev/urandom of=file.binary count=1024 bs=1024

Some script similar to git-sizer which could report that the
packed+compressed+delta'd version of the 10 *.binary files I had in my
history had a 1:1 ratio of how large they were in .git, vs. how large
the sum of each file retrieved by "git show" was (i.e. uncompressed,
un-delta'd).

That doesn't mean that tomorrow I won't commit 10 new objects which
would have a really good delta ratio to those 10 existing files,
bringing the ratio to ~1:2, but if I had some report like:

For a given repo that could be fed into .gitattributes to say we
shouldn't bother to delta files of certain extensions.
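
To make the kind of report I mean concrete, here is a rough sketch in
Python. It is not an existing tool, and the function names
(ratio_by_extension, collect_batch_lines, print_report) are made up for
illustration. It assumes that piping "git rev-list --objects --all" into
"git cat-file --batch-check" with the %(objectsize), %(objectsize:disk)
and %(rest) format atoms is good enough to compare on-disk size with
uncompressed size per path. One known caveat: for a delta'd object,
%(objectsize:disk) is the size of the delta only, so which object ends
up as the base skews the per-object numbers.

```python
import os
import subprocess
from collections import defaultdict

# Fields per line: object type, uncompressed size, on-disk size, and the
# path that rev-list printed after the object name (echoed via %(rest)).
BATCH_FORMAT = "%(objecttype) %(objectsize) %(objectsize:disk) %(rest)"


def ratio_by_extension(batch_lines):
    """Aggregate 'type size disk_size path' lines into {ext: (disk, raw, ratio)}."""
    totals = defaultdict(lambda: [0, 0])  # ext -> [disk bytes, raw bytes]
    for line in batch_lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob":
            continue  # skip commits, trees, tags and malformed lines
        raw, disk, path = int(parts[1]), int(parts[2]), parts[3]
        ext = os.path.splitext(path)[1] or "(none)"
        totals[ext][0] += disk
        totals[ext][1] += raw
    return {
        ext: (disk, raw, disk / raw if raw else 0.0)
        for ext, (disk, raw) in totals.items()
    }


def collect_batch_lines(repo="."):
    """Run the rev-list | cat-file pipeline in `repo` and return its output lines."""
    common = dict(check=True, capture_output=True, text=True,
                  encoding="utf-8", errors="replace")
    rev_list = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"], **common)
    batch = subprocess.run(
        ["git", "-C", repo, "cat-file", f"--batch-check={BATCH_FORMAT}"],
        input=rev_list.stdout, **common)
    return batch.stdout.splitlines()


def print_report(repo="."):
    """Print 'ratio disk raw extension', worst (closest to 1:1) first."""
    report = ratio_by_extension(collect_batch_lines(repo))
    ranked = sorted(report.items(), key=lambda item: item[1][2], reverse=True)
    for ext, (disk, raw, ratio) in ranked:
        print(f"{ratio:6.2f} {disk:>12} {raw:>12} {ext}")
```

Calling print_report() inside a repository would list each extension
with its on-disk to uncompressed ratio. Extensions sitting near 1.00
after a full repack are the candidates for a hand-written "-delta" line
in .gitattributes. The sketch only reports on deltas found so far, so it
cannot say whether a future commit would delta well.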