From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-12.7 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8EC82C4338F for ; Mon, 2 Aug 2021 12:58:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 70C9860E09 for ; Mon, 2 Aug 2021 12:58:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233823AbhHBM62 (ORCPT ); Mon, 2 Aug 2021 08:58:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54126 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232629AbhHBM61 (ORCPT ); Mon, 2 Aug 2021 08:58:27 -0400 Received: from mail-ej1-x634.google.com (mail-ej1-x634.google.com [IPv6:2a00:1450:4864:20::634]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 566A5C0613D5 for ; Mon, 2 Aug 2021 05:58:18 -0700 (PDT) Received: by mail-ej1-x634.google.com with SMTP id hw6so16843301ejc.10 for ; Mon, 02 Aug 2021 05:58:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:references:user-agent:in-reply-to :message-id:mime-version:content-transfer-encoding; bh=opFw02COXnigJkAV5eAlBY19aT8H4q7E3sY8nsELu5o=; b=r1HGEqB8oPpjcEPEsrM0la1Q3SiDDzK1vQfFw+3656NyC4bZcv/1bCCkrAxXEP4+yc vbcjH3QwHRi7yALPT9jn4abpcqvY8LDzF5xzQzaMNiKGXSBcOsKdpEPizzBwFR5SG6dN ttqaFbaD1RuE5YYyxm1qtNTDzQSSIhY4N8i5PT+cSZ4jvOfB7VfIdyYjpIrvxFQYtdt/ kOobDMkBG/EAgm+SvMmhCLzVx0XuSN1hVYkrB+2NZrYh/WeXx9NN5CkqEGVNbvVp40Dy PAg0yVj66/rg4290zVnv2RSRpcY0Xo+/RZKjHgzpUq8i53sQdeFXKUK3xm1X68NN3c9Y w0ig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:references:user-agent :in-reply-to:message-id:mime-version:content-transfer-encoding; bh=opFw02COXnigJkAV5eAlBY19aT8H4q7E3sY8nsELu5o=; b=MgUlwtBOsUwhimOdON5hbKV+ADM/PHbx0+LQ1NQ+jrwHMc3wdQ90vLq5GAqXU/Rb9r PtJ+YdZziJVNBuKaPrKgJfZMy4P3zfu4NjEMY0eDX8ChYd/7auiWxDOAXLGhblPR8OBT 75y7xiObzx3b0S6chlrkIGjjBEo/n+gz0IParA1Y7ha1fWio/qnZdSWyNqoHusbDdBnR nODKhqrCrR/fGr0ivRaE4apuVEBxo7/Gdumo1hEJSan/4xPfnSi4CSE1yRlmujuyJkzR 0P2FqRo9243GuqPU0PRUWfTr/4+xxPSd6nkZcLLP6RxADRgD5zWmgZU6th1O9i+vBn/d J8EQ== X-Gm-Message-State: AOAM53116FBhYOKvb3cLBedCFXelThK7P4H1wyHrdpfJ9wznOgEfNdm5 wxpOAEJMOCFgbR8kUoUdBxA= X-Google-Smtp-Source: ABdhPJykuDKjRTtNpqoCTAR2EkTkdfWPr+Dw5A2Zzwl1k9WbWuY2KKfeyJAfe7m95NkvWsM2TBA2Wg== X-Received: by 2002:a17:906:4ad9:: with SMTP id u25mr15296353ejt.106.1627909096825; Mon, 02 Aug 2021 05:58:16 -0700 (PDT) Received: from evledraar (j57224.upc-j.chello.nl. [24.132.57.224]) by smtp.gmail.com with ESMTPSA id u4sm4548317eje.81.2021.08.02.05.58.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 02 Aug 2021 05:58:16 -0700 (PDT) From: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason To: Patrick Steinhardt Cc: git@vger.kernel.org, Jeff King , Felipe Contreras , SZEDER =?utf-8?Q?G=C3=A1bor?= , Chris Torek , Junio C Hamano , Taylor Blau Subject: Re: [PATCH v3 3/4] revision: avoid loading object headers multiple times Date: Mon, 02 Aug 2021 14:55:09 +0200 References: User-agent: Debian GNU/Linux 11 (bullseye); Emacs 27.1; mu4e 1.5.13 In-reply-to: Message-ID: <87v94onguw.fsf@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Mon, Aug 02 2021, Patrick Steinhardt wrote: > [[PGP Signed Part:Undecided]] > When loading references, we try to optimize loading of commits by using > the commit graph. To do so, we first need to determine whether the > object actually is a commit or not, which is why we always execute > `oid_object_info()` first. Like this, we'll unpack the object header of > each object first. > > This pattern can be quite inefficient in case many references point to > the same commit: if the object didn't end up in the cached objects, then > we'll repeatedly unpack the same object header, even if we've already > seen the object before. > > Optimize this pattern by using `lookup_unknown_object()` first in order > to determine whether we've seen the object before. If so, then we don't > need to re-parse the header but can directly use its object information > and thus gain a modest performance improvement. Executed in a real-world > repository with around 2.2 million references: > > Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --no= t --all --not $newrev > Time (mean =C2=B1 =CF=83): 4.771 s =C2=B1 0.238 s [User: 4= .440 s, System: 0.330 s] > Range (min =E2=80=A6 max): 4.539 s =E2=80=A6 5.219 s 10 runs > > Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not= --all --not $newrev > Time (mean =C2=B1 =CF=83): 4.454 s =C2=B1 0.037 s [User: 4= .122 s, System: 0.332 s] > Range (min =E2=80=A6 max): 4.375 s =E2=80=A6 4.496 s 10 runs > > Summary > 'HEAD: rev-list --unsorted-input --objects --quiet --not --all --no= t $newrev' ran > 1.07 =C2=B1 0.05 times faster than 'HEAD~: rev-list --unsorted-in= put --objects --quiet --not --all --not $newrev' > > The downside is that `lookup_unknown_object()` is forced to always > allocate an object such that it's big enough to host all object types' > structs and thus we may waste memory here. This tradeoff is probably > worth it though considering the following struct sizes: > > - commit: 72 bytes > - tree: 56 bytes > - blob: 40 bytes > - tag: 64 bytes > > Assuming that in almost all repositories, most references will point to > either a tag or a commit, we'd have a modest increase in memory > consumption of about 12.5% here. > > Signed-off-by: Patrick Steinhardt > --- > revision.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/revision.c b/revision.c > index f06a5d63a3..671b6d6513 100644 > --- a/revision.c > +++ b/revision.c > @@ -359,14 +359,22 @@ static struct object *get_reference(struct rev_info= *revs, const char *name, > const struct object_id *oid, > unsigned int flags) > { > - struct object *object; > + struct object *object =3D lookup_unknown_object(revs->repo, oid); > + > + if (object->type =3D=3D OBJ_NONE) { > + int type =3D oid_object_info(revs->repo, oid, NULL); > + if (type < 0 || !object_as_type(object, type, 1)) { Let's s/int type/enum object_type, personally I think we should never do "type < 0" either, and check OBJ_BAD explicitly, but I've seemingly lost that discussion on-list before. But I think the consensus is that we should not do !type, but rather type =3D=3D OBJ_NONE.