From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-2.7 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RCVD_IN_SORBS_SPAM, RP_MATCHES_RCVD shortcircuit=no autolearn=no autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id A7AAA208CD for ; Tue, 15 Aug 2017 19:06:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753457AbdHOTGP (ORCPT ); Tue, 15 Aug 2017 15:06:15 -0400 Received: from mail-pg0-f66.google.com ([74.125.83.66]:34395 "EHLO mail-pg0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752465AbdHOTGO (ORCPT ); Tue, 15 Aug 2017 15:06:14 -0400 Received: by mail-pg0-f66.google.com with SMTP id y192so2749149pgd.1 for ; Tue, 15 Aug 2017 12:06:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=revFXtdzAy/RIukkzPjLQahiU3txPf6E8WTvmEi4fdc=; b=AnBhYMviSvFWp3usLuPZc1bXe7Bxj7V87mQ2TF/NJAVdmBmpePpMFmBSem890uVTwU QC7Gqec/K2qTW5fpZKG3uC4JP7L/MfTzAgznC3oGkuuhH5eGTGD/mWFBtR9xXnTCK98z yzrCkfSCNxYclAB6gPlRCAvxHowKUoKjU4CZLCPkfGG+q6RA80gVaC6akpdq1dvQIU9n DQNjpwtP1GbwL4dyO8NCaa4IhIZxJFSZbkufq2l7356xb2IxwoqgjuSCn6dR5wWqQuDc VIzDWU7WGbiPGVRMvc+47paLn3wobr/i0Fk4F28upI09HtpPQr7xoBYpTJXCgDvQsa1M G36g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=revFXtdzAy/RIukkzPjLQahiU3txPf6E8WTvmEi4fdc=; b=VBD9ezOJdVtgbkScfbwzn2CF1Qd/EW/zjFMClcdACvaye8fV7oeiBuHfwWOqJgMyQH S7UscZ82KUjgWUOzo1s3U+06BbKjTUszmJlQbM5CS0fd7cE1NEU0dcJZ+Xut5HJOb2jz j9dAtkCkMxFPehxIC7NwqgxrufYQic2YP4st0jEsS5aQxMGu2tY22e/wRbA76SeQxw0w VdoU4XIjs4AMqDXxfgjnbT4mvzmBaJz6Z3Uh8oSCwpBojf2BMcxdI5wfWGijc9EMJZRM P4scSMmOWdhwUc9Q00gV+rB1/b9mZz6xOleRwpjjOC+cWknZloQ9ak/cbY5chrWqdYNk eanA== X-Gm-Message-State: AHYfb5iZtRh7foG1OHNoEZ2dLska7Wia0DMVpz+qAqbMlxt5jHVp55xk /mkLv0FwjNL2T4QDdxI2libdmB93Yw== X-Received: by 10.99.112.16 with SMTP id l16mr28090801pgc.143.1502823973582; Tue, 15 Aug 2017 12:06:13 -0700 (PDT) MIME-Version: 1.0 Received: by 10.100.162.37 with HTTP; Tue, 15 Aug 2017 12:06:13 -0700 (PDT) In-Reply-To: References: <931ffb00319f40e3c9e099f17eeae6a0c1de41ea.1502780344.git.martin.agren@gmail.com> From: =?UTF-8?Q?Martin_=C3=85gren?= Date: Tue, 15 Aug 2017 21:06:13 +0200 Message-ID: Subject: Re: [PATCH 4/5] strbuf_reset: don't write to slopbuf with ThreadSanitizer To: Junio C Hamano Cc: Git Mailing List Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On 15 August 2017 at 20:43, Junio C Hamano wrote: > Martin =C3=85gren writes: > >> If two threads have one freshly initialized string buffer each and call >> strbuf_reset on them at roughly the same time, both threads will be >> writing a '\0' to strbuf_slopbuf. That is not a problem in practice >> since it doesn't matter in which order the writes happen. But >> ThreadSanitizer will consider this a race. >> >> When compiling with GIT_THREAD_SANITIZER, avoid writing to >> strbuf_slopbuf. Let's instead assert on the first byte of strbuf_slopbuf >> being '\0', since it ensures the promised invariant of "buf[len] =3D=3D >> '\0'". (Writing to strbuf_slopbuf is normally bad, but could become even >> more bad if we stop covering it up in strbuf_reset.) >> >> Signed-off-by: Martin =C3=85gren >> --- >> strbuf.h | 12 ++++++++++++ >> 1 file changed, 12 insertions(+) >> >> diff --git a/strbuf.h b/strbuf.h >> index e705b94db..295654d39 100644 >> --- a/strbuf.h >> +++ b/strbuf.h >> @@ -153,7 +153,19 @@ static inline void strbuf_setlen(struct strbuf *sb,= size_t len) >> /** >> * Empty the buffer by setting the size of it to zero. >> */ >> +#ifdef GIT_THREAD_SANITIZER >> +#define strbuf_reset(sb) \ >> + do { \ >> + struct strbuf *_sb =3D sb; = \ >> + _sb->len =3D 0; = \ >> + if (_sb->buf =3D=3D strbuf_slopbuf) = \ >> + assert(!strbuf_slopbuf[0]); \ >> + else \ >> + _sb->buf[0] =3D '\0'; = \ >> + } while (0) >> +#else >> #define strbuf_reset(sb) strbuf_setlen(sb, 0) >> +#endif >> >> >> /** > > The strbuf_slopbuf[] is a shared resource that is expected by > everybody to stay a holder of a NUL. Even though it is defined as > "char [1]", it in spirit ought to be considered const. And from > that point of view, your new definition that is conditionally used > only when sanitizer is in use _is_ the more correct one than the > current "we do not care if it is slopbuf, we are writing \0 so it > will be no-op anyway" code. > > I wonder if we excessively call strbuf_reset() in the real code to > make your version unacceptably expensive? If not, I somehow feel > that using this version unconditionally may be a better approach. > > What happens when a caller calls "strbuf_setlen(&sb, 0)" on a strbuf > that happens to have nothing and whose buffer still points at the > slopbuf (instead of calling _reset())? Shouldn't your patch fix > that function instead, i.e. something like the following without the > above? Is that make things noticeably and measurably too expensive? Good thinking. There are about 300 users of strbuf_reset and 10 users of strbuf_setlen(., 0) with a literal zero. Obviously, there might be more users which end up setting the length to 0 for some reason or other. So your idea seems the better one. I would assume that whoever resets a buffer is about to add something to it, which should be more expensive, but that's obviously just hand-waving. I'll see if I can find some interesting caller and/or performance numbers. > strbuf.h | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/strbuf.h b/strbuf.h > index 2075384e0b..1a77fe146a 100644 > --- a/strbuf.h > +++ b/strbuf.h > @@ -147,7 +147,10 @@ static inline void strbuf_setlen(struct strbuf *sb, = size_t len) > if (len > (sb->alloc ? sb->alloc - 1 : 0)) > die("BUG: strbuf_setlen() beyond buffer"); > sb->len =3D len; > - sb->buf[len] =3D '\0'; > + if (sb->buf !=3D strbuf_slopbuf) > + sb->buf[len] =3D '\0'; > + else > + assert(!strbuf_slopbuf[0]); > } > > /** When writing my patch, I used assert() and figured that with tsan, we're in some sort of "debug"-mode anyway. If we decide to always do the check, would it make sense to do "else if (strbuf_slopbuf[0]) BUG(..);" instead of the assert? Or, if we do prefer the assert, would the performance-worry be moot? Thanks for the feedback.