From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-3.6 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, FROM_EXCESS_BASE64,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI,SPF_HELO_NONE,SPF_NONE shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id CC1961F461 for ; Thu, 27 Jun 2019 12:18:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726695AbfF0MST (ORCPT ); Thu, 27 Jun 2019 08:18:19 -0400 Received: from mail-ed1-f50.google.com ([209.85.208.50]:39418 "EHLO mail-ed1-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726308AbfF0MST (ORCPT ); Thu, 27 Jun 2019 08:18:19 -0400 Received: by mail-ed1-f50.google.com with SMTP id m10so6862988edv.6 for ; Thu, 27 Jun 2019 05:18:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:references:user-agent:in-reply-to:date :message-id:mime-version; bh=fXVkk+V7dn7jnV3199IK4YfH9i0Zinqn6bcMhrtrM5A=; b=RMM8Nj4bgAUjKQVpWxXFP1fj1PL/qu2MsbX6X9yF0q2QQlfEMVE7xeCFbTMrWv+cb7 TvItwCpO2bBtmaOcpEiffEPdk+NwlnMoJ1tK9yYhbfPhXk3go4lWJuJ11vI8Jnv1bJAG cN11aTyxMhnDnd65HV48Y+R5pLkZ9Xr3XqS7G1wkLM+qqElHOVrxiSP0pOzt0fDJMGeX Qu+kr+dEfsg4ZEMOWqa2UivRV1054nQ2wqOcqX0lR0bdVV1QUR5orGFWi1l2fpKIb5W8 G+jwnEpaDLmabOU06h/h3TwyzgVpOGqaJUgxqjkQWw7UjdwTAZJFPlmUFbCE1mLCvT0t ZloA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:references:user-agent :in-reply-to:date:message-id:mime-version; bh=fXVkk+V7dn7jnV3199IK4YfH9i0Zinqn6bcMhrtrM5A=; b=NunxrG3cokYF51AmFXRpJ2UWWSh+KlzuIqmAfRfCrBQ68cFOCMkWssJbpyd1N0/VnS EgODvniGU9DUFc+udW9JwfpYsv7fXbcd05laNB+iyWvwEbxFKT9neFxCfngG5K1AQtkJ smyJYhwyaf7rDqm20LeesZmGX1BtrcmQ4YxQvMrmT1sX2sRiPakKTjY9UOkthl6UVbJI AMVfPhnMnilGCyTnMDRL/IEAdF1EaUg6mTqYheYENoGjw4iJni9ZFlCSsLd3lggiz5eq 46pdTeb0QWQlTvE4emDLRxYrnlPIHff+Lca3Y/c56uo85RqlhxrydyyNNpxyH+0NRKur VfHQ== X-Gm-Message-State: APjAAAVOcPeOEtQy46kwSqUBh7xl636HFDkFpvvGR/p4xFfOvAeOx009 FLCCIFgH9aQxFbG7uYmci34= X-Google-Smtp-Source: APXvYqwahuh4551cacgBXsAIAPf5WX/qvP1T7kjAZKXmlhdSYsN6fEUpb/SWts5KvQSUY25MWEHL0A== X-Received: by 2002:a50:8dcb:: with SMTP id s11mr3748310edh.144.1561637897385; Thu, 27 Jun 2019 05:18:17 -0700 (PDT) Received: from evledraar (i237193.upc-i.chello.nl. [62.195.237.193]) by smtp.gmail.com with ESMTPSA id k20sm705104ede.66.2019.06.27.05.18.16 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Thu, 27 Jun 2019 05:18:16 -0700 (PDT) From: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason To: Duy Nguyen Cc: Jeff King , Git Mailing List , Taylor Blau Subject: Re: fprintf_ln() is slow References: <20190627052515.GA21207@sigill.intra.peff.net> <20190627055739.GA9322@sigill.intra.peff.net> User-agent: Debian GNU/Linux 10 (buster); Emacs 26.1; mu4e 1.1.0 In-reply-to: Date: Thu, 27 Jun 2019 14:18:15 +0200 Message-ID: <8736jv8bnc.fsf@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Thu, Jun 27 2019, Duy Nguyen wrote: > On Thu, Jun 27, 2019 at 1:00 PM Jeff King wrote: >> >> On Thu, Jun 27, 2019 at 01:25:15AM -0400, Jeff King wrote: >> >> > Taylor and I noticed a slowdown in p1451 between v2.20.1 and v2.21.0. I >> > was surprised to find that it bisects to bbb15c5193 (fsck: reduce word >> > legos to help i18n, 2018-11-10). >> > >> > The important part, as it turns out, is the switch to using fprintf_ln() >> > instead of a regular fprintf() with a "\n" in it. Doing this: >> > [...] >> > on top of the current tip of master yields this result: >> > >> > Test HEAD^ HEAD >> > ----------------------------------------------------------------------------------------- >> > 1451.3: fsck with 0 skipped bad commits 9.78(7.46+2.32) 8.74(7.38+1.36) -10.6% >> > 1451.5: fsck with 1 skipped bad commits 9.78(7.66+2.11) 8.49(7.04+1.44) -13.2% >> > 1451.7: fsck with 10 skipped bad commits 9.83(7.45+2.37) 8.53(7.26+1.24) -13.2% >> > 1451.9: fsck with 100 skipped bad commits 9.87(7.47+2.40) 8.54(7.24+1.30) -13.5% >> > 1451.11: fsck with 1000 skipped bad commits 9.79(7.67+2.12) 8.48(7.25+1.23) -13.4% >> > 1451.13: fsck with 10000 skipped bad commits 9.86(7.58+2.26) 8.38(7.09+1.28) -15.0% >> > 1451.15: fsck with 100000 skipped bad commits 9.58(7.39+2.19) 8.41(7.21+1.19) -12.2% >> > 1451.17: fsck with 1000000 skipped bad commits 6.38(6.31+0.07) 6.35(6.26+0.07) -0.5% >> >> Ah, I think I see it. >> >> See how the system times for HEAD^ (with fprintf_ln) are higher? We're >> flushing stderr more frequently (twice as much, since it's unbuffered, >> and we now have an fprintf followed by a putc). >> >> I can get similar speedups by formatting into a buffer: >> >> diff --git a/strbuf.c b/strbuf.c >> index 0e18b259ce..07ce9b9178 100644 >> --- a/strbuf.c >> +++ b/strbuf.c >> @@ -880,8 +880,22 @@ int printf_ln(const char *fmt, ...) >> >> int fprintf_ln(FILE *fp, const char *fmt, ...) >> { >> + char buf[1024]; >> int ret; >> va_list ap; >> + >> + /* Fast path: format it ourselves and dump it via fwrite. */ >> + va_start(ap, fmt); >> + ret = vsnprintf(buf, sizeof(buf), fmt, ap); >> + va_end(ap); >> + if (ret < sizeof(buf)) { >> + buf[ret++] = '\n'; >> + if (fwrite(buf, 1, ret, fp) != ret) >> + return -1; >> + return ret; >> + } >> + >> + /* Slow path: a normal fprintf/putc combo */ >> va_start(ap, fmt); >> ret = vfprintf(fp, fmt, ap); >> va_end(ap); >> >> But we shouldn't have to resort to that. We can use setvbuf() to toggle >> buffering back and forth, but I'm not sure if there's a way to query the >> current buffering scheme for a stdio stream. We'd need that to be able >> to switch back correctly (and to avoid switching for things that are >> already buffered). >> >> I suppose it would be enough to check for "fp == stderr", since that is >> the only unbuffered thing we'd generally see. >> >> And it may be that the code above is really not much different anyway. >> For an unbuffered stream, I'd guess it dumps an fwrite() directly to >> write() anyway (since by definition it does not need to hold onto it, >> and nor is there anything in the buffer ahead of it). >> >> Something like: >> >> char buf[1024]; >> if (fp == stderr) >> setvbuf(stream, buf, _IOLBF, sizeof(buf)); >> >> ... do fprintf and putc ... >> >> if (fp == stderr) >> setvbuf(stream, NULL, _IONBF, 0); >> >> feels less horrible, but it's making the assumption that we were >> unbuffered coming into the function. I dunno. > > How about do all the formatting in strbuf and only fwrite last minute? > A bit more overhead with malloc(), so I don't know if it's an > improvement or not. Why shouldn't we just move back to plain fprintf() with "\n"? Your 9a0a30aa4b ("strbuf: convenience format functions with \n automatically appended", 2012-04-23) doesn't explain why this is a convenience for translators. When I'm translating things tend to like knowing that something ends in a newline explicitly, why do we need to hide that from translators? They also need to deal with trailing \n in other messages, so these *_ln() functions make things inconsistent. It's also not possible for translators to do this by mistake without being caught, because msgfmt will catch this (and other common issues): po/de.po:23: 'msgid' and 'msgstr' entries do not both end with '\n'