From: Ævar Arnfjörð Bjarmason
To: Jeff King
Cc: Duy Nguyen, Martin Fick, Git Mailing List
Subject: Re: Resolving deltas dominates clone time
Date: Tue, 30 Apr 2019 20:48:08 +0200
Message-ID: <87sgtzqqhj.fsf@evledraar.gmail.com>
In-Reply-To: <20190430175048.GB16729@sigill.intra.peff.net>
References: <259296914.jpyqiltySj@mfick-lnx> <16052712.dFCfNLlQnN@mfick-lnx>
 <20190422205653.GA30286@sigill.intra.peff.net> <19221376.OlD5LWjr85@mfick-lnx>
 <20190423015538.GA16369@sigill.intra.peff.net>
 <20190423042109.GA19183@sigill.intra.peff.net>
 <20190430175048.GB16729@sigill.intra.peff.net>

On Tue, Apr 30 2019, Jeff King wrote:

> On Tue, Apr 23, 2019 at 05:08:40PM +0700, Duy Nguyen wrote:
>
>> On Tue, Apr 23, 2019 at 11:45 AM Jeff King wrote:
>> >
>> > On Mon, Apr 22, 2019 at 09:55:38PM -0400, Jeff King wrote:
>> >
>> > > Here are my p5302 numbers on linux.git, by the way.
>> > >
>> > >   Test                                          jk/p5302-repeat-fix
>> > >   ------------------------------------------------------------------
>> > >   5302.2: index-pack 0 threads                  307.04(303.74+3.30)
>> > >   5302.3: index-pack 1 thread                   309.74(306.13+3.56)
>> > >   5302.4: index-pack 2 threads                  177.89(313.73+3.60)
>> > >   5302.5: index-pack 4 threads                  117.14(344.07+4.29)
>> > >   5302.6: index-pack 8 threads                  112.40(607.12+5.80)
>> > >   5302.7: index-pack default number of threads  135.00(322.03+3.74)
>> > >
>> > > which still imply that "4" is a win over "3" ("8" is slightly better
>> > > still in wall-clock time, but the total CPU rises dramatically;
>> > > that's probably because this is a quad-core with hyperthreading, so
>> > > by that point we're just throttling down the CPUs).
>> >
>> > And here's a similar test run on a 20-core Xeon w/ hyperthreading (I
>> > tweaked the test to keep going after eight threads):
>> >
>> >   Test                             HEAD
>> >   ----------------------------------------------------
>> >   5302.2: index-pack 1 threads     376.88(364.50+11.52)
>> >   5302.3: index-pack 2 threads     228.13(371.21+17.86)
>> >   5302.4: index-pack 4 threads     151.41(387.06+21.12)
>> >   5302.5: index-pack 8 threads     113.68(413.40+25.80)
>> >   5302.6: index-pack 16 threads    100.60(511.85+37.53)
>> >   5302.7: index-pack 32 threads    94.43(623.82+45.70)
>> >   5302.8: index-pack 40 threads    93.64(702.88+47.61)
>> >
>> > I don't think any of this is _particularly_ relevant to your case, but
>> > it really seems to me that the default of capping at 3 threads is too
>> > low.
>>
>> Looking back at the multithread commit, I think the trend was the same
>> and I capped it because the gain was not proportional to the number of
>> cores we threw at index-pack anymore. I would not be opposed to
>> raising the cap though (or maybe just remove it)
>
> I'm not sure what the right cap would be. I don't think it's static;
> we'd want ~4 threads on the top case, and 10-20 on the bottom one.
>
> It does seem like there's an inflection point in the graph at N/2
> threads. But then maybe that's just because these are hyper-threaded
> machines, so "N/2" is the actual number of physical cores, and the
> inflated CPU times above that are just because we can't turbo-boost
> then, so we're actually clocking slower. Multi-threaded profiling and
> measurement is such a mess. :)
>
> So I'd say the right answer is probably either online_cpus() or half
> that. The latter would be more appropriate for the machines I have, but
> I'd worry that it would leave performance on the table for non-intel
> machines.
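Something like this, as an untested sketch against builtin/index-pack.c
(the current code there hardcodes the cap at 3; "half of online CPUs" is
just one of the policies being discussed, not a measured recommendation):

	if (!nr_threads) {
		nr_threads = online_cpus();
		/*
		 * Sketch: instead of the historical hard cap of 3,
		 * assume the inflection point is at the physical core
		 * count, i.e. half the logical CPUs on a hyper-threaded
		 * machine.
		 */
		if (nr_threads > 2)
			nr_threads /= 2;
	}

But as noted the right divisor probably differs across machines.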
It would be a nice #leftoverbits project to do this dynamically at
runtime instead, i.e. hook up the throughput code in progress.c to some
new utility functions, where the current pthreads code would
occasionally stop and try to find some (local) maximum throughput given
N threads.

You could then save that optimum for next time, or keep adjusting the
thread count at runtime every X seconds. E.g. on a server with N=24
cores you might want 24 threads if one index-pack is running, but if you
have 24 index-packs you probably don't want each of them running 24
threads, for a total of 576.
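As a rough sketch of what such a utility function might look like (the
struct and tune_sample() names are made up for illustration; only
getnanotime() and online_cpus() exist in git.git today), a simple
hill-climb over the worker count:

	struct thread_tuner {
		int nr_threads;      /* current worker count */
		int max_threads;     /* e.g. online_cpus() */
		int direction;       /* +1 growing, -1 shrinking */
		uint64_t last_count; /* objects done at last sample */
		uint64_t last_ns;    /* getnanotime() at last sample */
		double last_rate;    /* objects/sec in previous window */
	};

	static void tune_sample(struct thread_tuner *t, uint64_t done_now)
	{
		uint64_t now = getnanotime();
		double rate;

		if (now == t->last_ns)
			return; /* window too short to measure */
		rate = (done_now - t->last_count) * 1e9 /
		       (double)(now - t->last_ns);

		/* Throughput got worse since the last step: turn around. */
		if (rate < t->last_rate)
			t->direction = -t->direction;

		t->nr_threads += t->direction;
		if (t->nr_threads < 1)
			t->nr_threads = 1;
		if (t->nr_threads > t->max_threads)
			t->nr_threads = t->max_threads;

		t->last_rate = rate;
		t->last_count = done_now;
		t->last_ns = now;
	}

The work loop would then park or spawn workers to match t->nr_threads.
That would also handle the 24-index-packs case above for free: each
process would see its own throughput drop and back off.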