From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.1 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RCVD_IN_SORBS_SPAM, RP_MATCHES_RCVD shortcircuit=no autolearn=no autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 028471FAFB for ; Thu, 6 Apr 2017 21:20:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752194AbdDFVUS (ORCPT ); Thu, 6 Apr 2017 17:20:18 -0400 Received: from mail-wr0-f195.google.com ([209.85.128.195]:36100 "EHLO mail-wr0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751593AbdDFVUQ (ORCPT ); Thu, 6 Apr 2017 17:20:16 -0400 Received: by mail-wr0-f195.google.com with SMTP id o21so9725154wrb.3 for ; Thu, 06 Apr 2017 14:20:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=286/tetAVynT5E4IRLIjg7hJMi3P490xYGbUWfYNniQ=; b=H7lADujY+69ClwOOBoaAoynDAQ8SkhV9vXIdo/IGzkM8Vht36yfYRrMnqARiu7c5RY YdLrMpSiXxv9U5Z4CFNSOZLLJwN1LLQbC4WOPHzbCBw/yyiwfeGfOvIjoLpdoMBWkrqM qJqPN7hSl02itX6p4VcZXJznvxjE+afAwPvQ3kAj8UGkAAmTW5d0vfnFy8H9z2I1AK6E N4a+BRi4QKTxk9T+we1uHUbMOi2JxvfnKJQUs6jHMFY2qLM7LYxTDgkEsc9DGGJNhkyM 16s2h+SLUm/A3TUwQ33ALWqgVdaPLiPBDeuNb7rp2+E2BwdKHTYS8R6oQQ07Tc2a13hJ y0rA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=286/tetAVynT5E4IRLIjg7hJMi3P490xYGbUWfYNniQ=; b=cG80H9TBGLfSgq+NgKdgLwWI3JP0xqAPX2EPZUUnOqoGjOj1gIbkbA7ew3GzZcTgDZ sm2I19GewW2FZNdLI/2sf8Vtu/O7xGIkcKOcjGL1u+HlYCg0Ce9Dexz4TbN2U07TsycA viz1YE6MY5jalxpJYmfFBWh+jJaP9w4F/JP2XwQTfsnkSRgL2IgBaWxiMrtYe7O9U/C1 XgJF9iN6BKTYiTdoz97NPMMcsQPy1jNtUaiF62PXw02q7RgqlTSIp1B2TjMGxY8BLEkq 48+snRdAH+c+dO/jRUwSCfZKXdZ06ycmEfLZ/lsm+u59Mou7RIsR5L6hRZ0E1CsthxCK HElA== X-Gm-Message-State: AFeK/H39+pgi6JjNVHH8pKjACBm+uhqD4JSyMfCsOgDi8Kmmq+DM5OvLoIWsu+Ekj8iFdg== X-Received: by 10.223.172.133 with SMTP id o5mr27865366wrc.23.1491513615109; Thu, 06 Apr 2017 14:20:15 -0700 (PDT) Received: from localhost ([2a02:c7f:c42b:f900:5e51:4fff:fee9:57af]) by smtp.gmail.com with ESMTPSA id 60sm3493656wrg.60.2017.04.06.14.20.14 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 06 Apr 2017 14:20:14 -0700 (PDT) Date: Fri, 7 Apr 2017 00:26:39 +0100 From: Thomas Gummerer To: Jeff Hostetler Cc: git@vger.kernel.org, gitster@pobox.com, peff@peff.net, Jeff Hostetler Subject: Re: [PATCH v3 2/2] p0005-status: time status on very large repo Message-ID: <20170406232634.GB32223@hank> References: <20170406134508.31711-1-git@jeffhostetler.com> <20170406134508.31711-3-git@jeffhostetler.com> <20170406221453.GA32223@hank> <590f6863-801b-58e9-3700-962168f8315e@jeffhostetler.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <590f6863-801b-58e9-3700-962168f8315e@jeffhostetler.com> User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12) Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On 04/06, Jeff Hostetler wrote: > > > On 4/6/2017 6:14 PM, Thomas Gummerer wrote: > >On 04/06, git@jeffhostetler.com wrote: > >>From: Jeff Hostetler > >> > >>Signed-off-by: Jeff Hostetler > >>--- > >> t/perf/p0005-status.sh | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 61 insertions(+) > >> create mode 100755 t/perf/p0005-status.sh > >> > >>diff --git a/t/perf/p0005-status.sh b/t/perf/p0005-status.sh > >>new file mode 100755 > >>index 0000000..704cebc > >>--- /dev/null > >>+++ b/t/perf/p0005-status.sh > >>@@ -0,0 +1,61 @@ > >>+#!/bin/sh > >>+ > >>+test_description="Tests performance of read-tree" > >>+ > >>+. ./perf-lib.sh > >>+ > >>+test_perf_default_repo > >>+test_checkout_worktree > >>+ > >>+## usage: dir depth width files > >>+make_paths () { > >>+ for f in $(seq $4) > >>+ do > >>+ echo $1/file$f > >>+ done; > >>+ if test $2 -gt 0; > >>+ then > >>+ for w in $(seq $3) > >>+ do > >>+ make_paths $1/dir$w $(($2 - 1)) $3 $4 > >>+ done > >>+ fi > >>+ return 0 > >>+} > >>+ > >>+fill_index () { > >>+ make_paths $1 $2 $3 $4 | > >>+ sed "s/^/100644 $EMPTY_BLOB /" | > >>+ git update-index --index-info > >>+ return 0 > >>+} > >>+ > >>+br_work1=xxx_work1_xxx > >>+dir_new=xxx_dir_xxx > >>+ > >>+## (5, 10, 9) will create 999,999 files. > >>+## (4, 10, 9) will create 99,999 files. > >>+depth=5 > >>+width=10 > >>+files=9 > >>+ > >>+## Inflate the index with thousands of empty files and commit it. > >>+## Use reset to actually populate the worktree. > >>+test_expect_success 'inflate the index' ' > >>+ git reset --hard && > >>+ git branch $br_work1 && > >>+ git checkout $br_work1 && > >>+ fill_index $dir_new $depth $width $files && > >>+ git commit -m $br_work1 && > >>+ git reset --hard > >>+' > >>+ > >>+## The number of files in the branch. > >>+nr_work1=$(git ls-files | wc -l) > > > >The above seems to be repeated (or at least very similar to what you > >have in your other series [1]. Especially in this perf test wouldn't > >it be better just use test_perf_large_repo, and let whoever runs the > >test decide what constitutes a large repository for them? > > > >The other advantage of that would be that it is more of a real-world > >scenario, instead of a synthetic distribution of the files, which > >would give us some better results I think. > > > >Is there anything I'm missing that would make using > >test_perf_large_repo not a good option here? > > Yes, it is copied from the other series. I make the same change > that Rene just suggested on it to use awk to create the list. > > I did this because I need very large repos. From what I can tell > the common usage is to set test_perf_large_repo to linux.git, but > that only has 58K files. And it defaults to git.git which only > has 3K files. Yeah true. Back when I worked on "index v5" for my GSoC project, I used to use the webkit repository, which at the time had 300-something K files. Nowadays the better test might be the chromium repository, but I'm not sure (cloning that takes a while on my connection :) ). > Internally, I test against the Windows source tree with 3.1M files, > but I can't share that :-) Heh. I'd love to see the performance numbers for that though! > So I created this test to generate artificial, but large and > reproducible repos for evaluation. > > I could change the default depth to 4 (giving a 100K tree), but > I'm really interested in 1M+ repos. For small-ish values of n > the difference between O(n) and O(n log n) operations can hide > in system and I/O noise; not so for very large n.... Makes sense to me. Thanks for the explanation! > > > >[1]: http://public-inbox.org/git/20170406163442.36463-3-git@jeffhostetler.com/ > > > >>+test_perf "read-tree status work1 ($nr_work1)" ' > >>+ git read-tree HEAD && > >>+ git status > >>+' > >>+ > >>+test_done > >>-- > >>2.9.3 > >>