git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/6] getenv() timing fixes
Date: Sat, 12 Jan 2019 12:31:21 +0100	[thread overview]
Message-ID: <87va2u3yeu.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20190111221414.GA31335@sigill.intra.peff.net>


On Fri, Jan 11 2019, Jeff King wrote:

> Similar to the recent:
>
>   https://public-inbox.org/git/20190109221007.21624-1-kgybels@infogroep.be/
>
> there are some other places where we do not follow the POSIX rule that
> getenv()'s return value may be invalidated by other calls to getenv() or
> setenv().
>
> For the most part we haven't noticed because:
>
>   - on many platforms, you can call getenv() as many times as you want.
>     This changed recently in our mingw_getenv() helper, which is why
>     people are noticing now.
>
>   - calling setenv() in between _often_ works, but it depends on whether
>     libc feels like it needs to reallocate memory. Which is itself
>     platform specific, and even on a single platform may depend on
>     things like how many environment variables you have set.
>
> The first patch here is a problem somebody actually found in the wild.
> That led me to start looking through the results of:
>
>   git grep '= getenv('
>
> There are a ton of hits. I poked at the first 20 or so. A lot of them
> are fine, as they do something like this:
>
>   rla = getenv("GIT_REFLOG_ACTION");
>   strbuf_addstr("blah blah %s", rla);
>
> That's not _strictly_ correct, because strbuf_addstr() may actually look
> at the environment. But it works for our mingw_getenv() case, because
> there we use a rotating series of buffers. So as long as it doesn't look at
> 30 environment variables, we're fine. And many calls fall into that
> bucket (a more complicated one is get_ssh_command(), which runs a fair
> bit of code while holding the pointer, but ultimately probably has a
> small fixed number of opportunities to call getenv(). What is more
> worrisome is code that holds a pointer across an arbitrary number of
> calls (like once per diff'd file, or once per submodule, etc).
>
> Of course it's possible for some platform libc to use a single buffer.
> But in that case, I'd argue that the path of least resistance is
> wrapping getenv, like I described in:
>
>   https://public-inbox.org/git/20181025062037.GC11460@sigill.intra.peff.net/
>
> So anyway. Here are a handful of what seem like pretty low-hanging
> fruit. Beyond the first one, I'm not sure if they're triggerable, but
> they're easy to fix. There are 100+ grep matches that I _didn't_ audit,
> so this is by no means a complete fix. I was mostly trying to get a
> sense of how painful these fixes would be.

I wonder, and not as "you should do this" feedback on this series, just
on future development, whether we shouldn't just make our own getenv()
wrapper for the majority of the GIT_* variables. The semantics would be
fetch value X, and if it's ever requested again return the value we
found the first time.

For some things we rely on getenv(X) -> setenv(X) -> getenv(X) returning
different values of X, e.g. in passing along config, but for
e.g. GIT_TEST_* variables we just want to check them once, and have our
own ad-hoc caches (via static variables) in a couple of places.

Maybe such an API would just loop over "environ" on startup, looking for
any GIT_* variables, i.e. called from the setup.c code.

  parent reply	other threads:[~2019-01-12 11:39 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-11 22:14 [PATCH 0/6] getenv() timing fixes Jeff King
2019-01-11 22:15 ` [PATCH 1/6] get_super_prefix(): copy getenv() result Jeff King
2019-01-12  3:02   ` Junio C Hamano
2019-01-11 22:15 ` [PATCH 2/6] commit: copy saved " Jeff King
2019-01-12  3:07   ` Junio C Hamano
2019-01-12 10:26     ` Jeff King
2019-01-15 14:05       ` Johannes Schindelin
2019-01-15 19:17         ` Jeff King
2019-01-15 19:25           ` Stefan Beller
2019-01-15 19:32             ` Jeff King
2019-01-16 14:06               ` Johannes Schindelin
2019-01-11 22:15 ` [PATCH 3/6] config: make a copy of $GIT_CONFIG string Jeff King
2019-01-11 22:16 ` [PATCH 4/6] init: make a copy of $GIT_DIR string Jeff King
2019-01-12  3:08   ` Junio C Hamano
2019-01-11 22:16 ` [PATCH 5/6] merge-recursive: copy $GITHEAD strings Jeff King
2019-01-12  3:10   ` Junio C Hamano
2019-01-11 22:17 ` [PATCH 6/6] builtin_diff(): read $GIT_DIFF_OPTS closer to use Jeff King
2019-01-12 11:31 ` Ævar Arnfjörð Bjarmason [this message]
2019-01-12 18:51   ` [PATCH 0/6] getenv() timing fixes Stefan Beller
2019-01-15 19:13     ` Jeff King
2019-01-15 19:32       ` Junio C Hamano
2019-01-15 19:38         ` Stefan Beller
2019-01-15 19:41         ` Jeff King
2019-01-15 19:47           ` Jeff King
2019-01-15 20:49             ` Junio C Hamano
2019-01-15 19:12   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87va2u3yeu.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).