From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.0 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,T_RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id D563020954 for ; Fri, 1 Dec 2017 19:50:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751691AbdLATuJ (ORCPT ); Fri, 1 Dec 2017 14:50:09 -0500 Received: from mail-qt0-f172.google.com ([209.85.216.172]:37578 "EHLO mail-qt0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751163AbdLATuI (ORCPT ); Fri, 1 Dec 2017 14:50:08 -0500 Received: by mail-qt0-f172.google.com with SMTP id d15so14450563qte.4 for ; Fri, 01 Dec 2017 11:50:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding:content-language; bh=lCcVaIbjh44IGGvaHpu32QJkL32YHdz7SO0PPlcl4qU=; b=k+CuXFsc87EnwEbZ4WQagvKoVH4WBCzqMGu1Q/yeYbHLIzjhZA3AqsQdogrg0SyedG g6qPxl3K62D4AGdudZsbrcNa67BupvWUEc05od+HtuPdth0NaQ5MjBmj9FPrmaKj0VwU dcJPKD9kC1PNyfIqYtQGrpdC0X1fGPt+B8eBaprBAXHIUwDs6E7Wvyz42Xz1wIEjjjj9 azkGZYkJ2LZhJucv3ahMmvMAqkSgCwP+ExhcQj5f3j/ly0SA5qdLx27OfWqr7m0uETxJ BTrYSuHddX4JqJ2rNy2XLQcGDhKBGR+GtL04U+yhm7gubdLS4UlXuYJSapgZoD3JB0Ma MwYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=lCcVaIbjh44IGGvaHpu32QJkL32YHdz7SO0PPlcl4qU=; b=AcWtJe36fjW5BoMz22S25Jv51Fp21OHf5oJwlqy3ciUQi/8ACGwYR1w54+B5ehooeN eULc1lITdzA13gIvY95KVovbgeYHR4Ye86kN8g8/Va8Y7YU41gtyWt1eD0CMxgT0HDoR a2II4axvdUrhMGsYyGAzrLeI1Qi7dCdy/B/g0IMieFxD3VD+HaWXPc9F7CKcJNYg3HwK wODYnxc3/RDnS0R8kXGCYbb8CuqgFIph4LgExXwZjyevRT7Gwjl7WQp4PKJyoX/Q4PVM 3aPYZvaVhdPQDpykO+0blgeTD65r+NNX8MoJOqk6Cs/bliBt+WtJZ8pHfrczlN0QPLWA 08Ag== X-Gm-Message-State: AKGB3mIiEI4UTqHRec+fjhJRdo0ypuSzCLCMXExcg5/5EXv7UmrA9I59 3BhwUtG1nzHb2npbv42fB24= X-Google-Smtp-Source: AGs4zMYLlBhX6V9lmtSmWghI4KXeZzXB/+BNyq49yfe9VZf3ZBI5B5DDLrlpWULossj9gcrrtXa8pA== X-Received: by 10.237.59.247 with SMTP id s52mr9736767qte.139.1512157807361; Fri, 01 Dec 2017 11:50:07 -0800 (PST) Received: from ?IPv6:2001:4898:6808:13e:c4e6:7a22:56f1:df04? ([2001:4898:8010:1::7cc]) by smtp.gmail.com with ESMTPSA id o26sm5414937qto.16.2017.12.01.11.50.06 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 01 Dec 2017 11:50:06 -0800 (PST) Subject: Re: [PATCH] sha1_file: use strbuf_add() instead of strbuf_addf() To: Jeff King , Derrick Stolee Cc: git@vger.kernel.org, git@jeffhostetler.com References: <20171201174956.143245-1-dstolee@microsoft.com> <20171201182238.GA27688@sigill.intra.peff.net> From: Derrick Stolee Message-ID: <54da89a5-bf20-cb20-9884-6285033f0d95@gmail.com> Date: Fri, 1 Dec 2017 14:50:05 -0500 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: <20171201182238.GA27688@sigill.intra.peff.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On 12/1/2017 1:22 PM, Jeff King wrote: > On Fri, Dec 01, 2017 at 12:49:56PM -0500, Derrick Stolee wrote: [snip] >> diff --git a/sha1_file.c b/sha1_file.c >> index 8ae6cb6285..2160323c4a 100644 > This overall looks good, but I noticed one bug and a few cosmetic > improvements. Thanks for finding quality improvements to this patch. I'll let it sit over the weekend for more feedback before sending a v2. > >> --- a/sha1_file.c >> +++ b/sha1_file.c >> @@ -1914,17 +1914,18 @@ int for_each_file_in_obj_subdir(unsigned int subdir_nr, >> } >> >> oid.hash[0] = subdir_nr; >> + strbuf_add(path, "/", 1); > Maybe: > > strbuf_addch(path, '/'); > > would be a bit more readable (it's also a tiny bit faster, but this > isn't part of the tight loop). > >> + baselen = path->len; > We set this here so that the '/' is included as part of the base. Makes > sense, but can we now drop the earlier setting of baselen before the > opendir() call? Yeah, probably. I had briefly considered just adding the '/' before the first assignment of "baselen", but didn't want to change the error output. I also don't know if there are side effects for other platforms by calling opendir() with a '/'-terminated path. >> while ((de = readdir(dir))) { >> if (is_dot_or_dotdot(de->d_name)) >> continue; >> >> - strbuf_setlen(path, baselen); >> - strbuf_addf(path, "/%s", de->d_name); >> - >> if (strlen(de->d_name) == GIT_SHA1_HEXSZ - 2 && >> !hex_to_bytes(oid.hash + 1, de->d_name, >> GIT_SHA1_RAWSZ - 1)) { >> + strbuf_setlen(path, baselen); >> + strbuf_add(path, de->d_name, GIT_SHA1_HEXSZ - 2); > Moving this code into the conditional makes sense here, since that's > where we know the actual size. But what about the case when we _don't_ > hit this conditional. We still do: > > if (cruft_cb) > cruft_cb(path->buf); > > which is now broken (it will just get the directory name without the > entry appended). Good catch! A big reason to pull it inside and use strbuf_add over strbuf_addstr is to avoid a duplicate strlen() calculation. However, I can store the length before the conditional. > I think the optimized versions probably needs to be something more like: > > while (de = readdir(...)) { > strbuf_setlen(path, baselen); > if (size is HEXSZ - 2) { > strbuf_add(path, de->d_name, HEXSZ - 2); > obj_cb(path->buf); > } else if (cruft_cb) { > strbuf_addstr(path, de->d_name); > cruft_cb(path->buf); > } > } Small change by storing the length in advance of the conditional: while (de = readdir(...)) {     int namelen = strlen(de->d_name);     strbuf_setlen(path, baselen);     strbuf_add(path, de->d_name, namelen);     if (namelen == HEXSZ - 2)         obj_cb(path->buf)     else         cruft_cb(path->buf); } > Two other thoughts: > > - from past discussions, I suspect most of your performance > improvement actually comes from avoiding addf(), and the difference > between addstr() and add() may be negligible here. It might be worth > timing strbuf_addstr(). If that's similarly fast, that means we > could keep the logic out of the conditional entirely. addstr() duplicates the strlen(), which isn't much but we might as well save cycles where we can if it isn't too hard. > - there's an extra micro-optimization there, which is that if there's > no obj_cb, we have no need to assemble the full path at all. I doubt > it makes much of a difference, as most callers would pass an object > callback (I'd be surprised if we even have one that doesn't). After doing a few 'git grep' commands, I found several that include a NULL cruft_cb but none that have a NULL obj_cb. Thanks, -Stolee