Date: Fri, 5 Oct 2018 12:56:45 -0400
From: Jeff King
To: Ævar Arnfjörð Bjarmason
Cc: Git List, Michael Haggerty
Subject: Re: Is there some script to find un-delta-able objects?
Message-ID: <20181005165644.GD11254@sigill.intra.peff.net>
In-Reply-To: <87bm88gx7a.fsf@evledraar.gmail.com>
References: <87d0soh3v8.fsf@evledraar.gmail.com> <20181005161943.GA8816@sigill.intra.peff.net> <87bm88gx7a.fsf@evledraar.gmail.com>
X-Mailing-List: git@vger.kernel.org

On Fri, Oct 05, 2018 at 06:44:25PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Some version of the former. Ones where we haven't found any (or much of)
> useful deltas yet. E.g. say I had a repository with a lot of files
> generated by this command at various points in the history:
>
>     dd if=/dev/urandom of=file.binary count=1024 bs=1024
>
> Some script similar to git-sizer which could report that the
> packed+compressed+delta'd version of the 10 *.binary files I had in my
> history had a 1:1 ratio of how large they were in .git, v.s. how large
> the sum of each file retrieved by "git show" was (i.e. uncompressed,
> un-delta'd).

You can get the uncompressed and on-disk sizes with:

  git cat-file --batch-all-objects \
    --batch-check='%(objectname) %(objectsize) %(objectsize:disk)'

and then compare the sizes/ratios however you like. If you want just a
subset of the blobs, drop the "--batch-all-objects" and just feed the
object names (or even "HEAD:filename") on stdin.

> That doesn't mean that tomorrow I won't commit 10 new objects which
> would have a really good delta ratio to those 10 existing files,
> bringing the ratio to ~1:2, but if I had some report like:
>
>
>
> For a given repo that could be fed into .gitattributes to say we
> shouldn't bother to delta files of certain extensions.

I don't know of a tool that does that, but I think a modest application
of perl to the cat-file output would produce it.

-Peff
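
As a rough illustration of the post-processing suggested above, here is a
minimal sketch in Python rather than perl. It is not an existing tool. It
assumes the object list comes from "git rev-list --objects --all", so that
each line carries a path which cat-file echoes back through %(rest). The
function names (summarize, worst_ratios, report_for_repo) and the
"(none)" bucket for extensionless paths are made up for this example.

```python
import os
import subprocess
from collections import defaultdict

# Assumed format for this sketch; %(rest) carries the path that
# "git rev-list --objects" prints after each object name.
BATCH_FORMAT = "%(objectname) %(objecttype) %(objectsize) %(objectsize:disk) %(rest)"


def summarize(lines):
    """Sum uncompressed and on-disk blob sizes per file extension."""
    totals = defaultdict(lambda: [0, 0, 0])  # ext -> [count, size, disk]
    for line in lines:
        fields = line.rstrip("\n").split(" ", 4)
        if len(fields) < 5 or fields[1] != "blob":
            continue  # skip commits, trees, tags, and "missing" lines
        size, disk, path = int(fields[2]), int(fields[3]), fields[4]
        ext = os.path.splitext(path)[1] or "(none)"
        entry = totals[ext]
        entry[0] += 1
        entry[1] += size
        entry[2] += disk
    return dict(totals)


def worst_ratios(totals):
    """Return (ext, count, size, disk, disk/size) rows, least compressible first."""
    rows = [
        (ext, n, size, disk, disk / size if size else 0.0)
        for ext, (n, size, disk) in totals.items()
    ]
    return sorted(rows, key=lambda row: row[4], reverse=True)


def report_for_repo(repo="."):
    """Run the git pipeline in `repo` and return the sorted per-extension rows."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True,
        encoding="utf-8", errors="surrogateescape",
    ).stdout
    checked = subprocess.run(
        ["git", "-C", repo, "cat-file", "--batch-check=" + BATCH_FORMAT],
        input=objects, check=True, capture_output=True,
        encoding="utf-8", errors="surrogateescape",
    ).stdout
    return worst_ratios(summarize(checked.splitlines()))
```

Calling report_for_repo("/path/to/repo") returns one row per extension,
with ratios near 1.0 at the top; those are the candidates for a
"-delta" entry in .gitattributes. Two caveats: rev-list prints each
object once, under the first path it was reached by, so a blob that
lives under several names is only counted for one of them, and the
on-disk size of a delta depends on which base the current pack happened
to pick, so the numbers can shift after a repack.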