From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/6] getenv() timing fixes
Date: Sat, 12 Jan 2019 12:31:21 +0100 [thread overview]
Message-ID: <87va2u3yeu.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20190111221414.GA31335@sigill.intra.peff.net>
On Fri, Jan 11 2019, Jeff King wrote:
> Similar to the recent:
>
> https://public-inbox.org/git/20190109221007.21624-1-kgybels@infogroep.be/
>
> there are some other places where we do not follow the POSIX rule that
> getenv()'s return value may be invalidated by other calls to getenv() or
> setenv().
>
> For the most part we haven't noticed because:
>
> - on many platforms, you can call getenv() as many times as you want.
> This changed recently in our mingw_getenv() helper, which is why
> people are noticing now.
>
> - calling setenv() in between _often_ works, but it depends on whether
> libc feels like it needs to reallocate memory. Which is itself
> platform specific, and even on a single platform may depend on
> things like how many environment variables you have set.
>
> The first patch here is a problem somebody actually found in the wild.
> That led me to start looking through the results of:
>
> git grep '= getenv('
>
> There are a ton of hits. I poked at the first 20 or so. A lot of them
> are fine, as they do something like this:
>
> rla = getenv("GIT_REFLOG_ACTION");
> strbuf_addstr("blah blah %s", rla);
>
> That's not _strictly_ correct, because strbuf_addstr() may actually look
> at the environment. But it works for our mingw_getenv() case, because
> there we use a rotating series of buffers. So as long as it doesn't look at
> 30 environment variables, we're fine. And many calls fall into that
> bucket (a more complicated one is get_ssh_command(), which runs a fair
> bit of code while holding the pointer, but ultimately probably has a
> small fixed number of opportunities to call getenv(). What is more
> worrisome is code that holds a pointer across an arbitrary number of
> calls (like once per diff'd file, or once per submodule, etc).
>
> Of course it's possible for some platform libc to use a single buffer.
> But in that case, I'd argue that the path of least resistance is
> wrapping getenv, like I described in:
>
> https://public-inbox.org/git/20181025062037.GC11460@sigill.intra.peff.net/
>
> So anyway. Here are a handful of what seem like pretty low-hanging
> fruit. Beyond the first one, I'm not sure if they're triggerable, but
> they're easy to fix. There are 100+ grep matches that I _didn't_ audit,
> so this is by no means a complete fix. I was mostly trying to get a
> sense of how painful these fixes would be.
I wonder, and not as "you should do this" feedback on this series, just
on future development, whether we shouldn't just make our own getenv()
wrapper for the majority of the GIT_* variables. The semantics would be
fetch value X, and if it's ever requested again return the value we
found the first time.
For some things we rely on getenv(X) -> setenv(X) -> getenv(X) returning
different values of X, e.g. in passing along config, but for
e.g. GIT_TEST_* variables we just want to check them once, and have our
own ad-hoc caches (via static variables) in a couple of places.
Maybe such an API would just loop over "environ" on startup, looking for
any GIT_* variables, i.e. called from the setup.c code.
next prev parent reply other threads:[~2019-01-12 11:39 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-11 22:14 [PATCH 0/6] getenv() timing fixes Jeff King
2019-01-11 22:15 ` [PATCH 1/6] get_super_prefix(): copy getenv() result Jeff King
2019-01-12 3:02 ` Junio C Hamano
2019-01-11 22:15 ` [PATCH 2/6] commit: copy saved " Jeff King
2019-01-12 3:07 ` Junio C Hamano
2019-01-12 10:26 ` Jeff King
2019-01-15 14:05 ` Johannes Schindelin
2019-01-15 19:17 ` Jeff King
2019-01-15 19:25 ` Stefan Beller
2019-01-15 19:32 ` Jeff King
2019-01-16 14:06 ` Johannes Schindelin
2019-01-11 22:15 ` [PATCH 3/6] config: make a copy of $GIT_CONFIG string Jeff King
2019-01-11 22:16 ` [PATCH 4/6] init: make a copy of $GIT_DIR string Jeff King
2019-01-12 3:08 ` Junio C Hamano
2019-01-11 22:16 ` [PATCH 5/6] merge-recursive: copy $GITHEAD strings Jeff King
2019-01-12 3:10 ` Junio C Hamano
2019-01-11 22:17 ` [PATCH 6/6] builtin_diff(): read $GIT_DIFF_OPTS closer to use Jeff King
2019-01-12 11:31 ` Ævar Arnfjörð Bjarmason [this message]
2019-01-12 18:51 ` [PATCH 0/6] getenv() timing fixes Stefan Beller
2019-01-15 19:13 ` Jeff King
2019-01-15 19:32 ` Junio C Hamano
2019-01-15 19:38 ` Stefan Beller
2019-01-15 19:41 ` Jeff King
2019-01-15 19:47 ` Jeff King
2019-01-15 20:49 ` Junio C Hamano
2019-01-15 19:12 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87va2u3yeu.fsf@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).