* [PATCH] read-cache: avoid git_path() race in freshen_shared_index()
@ 2017-03-29 8:08 Christian Couder
2017-03-29 17:06 ` Junio C Hamano
0 siblings, 1 reply; 8+ messages in thread
From: Christian Couder @ 2017-03-29 8:08 UTC (permalink / raw)
To: git
Cc: Junio C Hamano, Nguyen Thai Ngoc Duy,
Ævar Arnfjörð Bjarmason, Ramsay Jones, Jeff King,
Christian Couder
When performing an interactive rebase in split-index mode,
the commit message that one should rework when squashing commits
can contain some garbage instead of the usual concatenation of
both of the commit messages.
When bisecting it appears that 94c9b5af70 (Merge branch
'cc/split-index-config', 2017-03-17) is the first bad commit.
But when rebasing cc/split-index-config on top of the commit it
was merged with, the first bad commit is then c3a0082502
(read-cache: use freshen_shared_index() in read_index_from(),
2017-03-06).
This shows that we should be careful not to use git_path() in
freshen_shared_index(). It is using a shared buffer that can
too easily lead to races.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
read-cache.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/read-cache.c b/read-cache.c
index e447751823..2f10242c24 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1682,9 +1682,10 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
*/
static void freshen_shared_index(char *base_sha1_hex, int warn)
{
- const char *shared_index = git_path("sharedindex.%s", base_sha1_hex);
+ char *shared_index = git_pathdup("sharedindex.%s", base_sha1_hex);
if (!check_and_freshen_file(shared_index, 1) && warn)
warning("could not freshen shared index '%s'", shared_index);
+ free(shared_index);
}
int read_index_from(struct index_state *istate, const char *path)
--
2.12.0.339.g3df8399f7e.dirty
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] read-cache: avoid git_path() race in freshen_shared_index()
2017-03-29 8:08 [PATCH] read-cache: avoid git_path() race in freshen_shared_index() Christian Couder
@ 2017-03-29 17:06 ` Junio C Hamano
2017-03-29 17:56 ` Jeff King
0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2017-03-29 17:06 UTC (permalink / raw)
To: Christian Couder
Cc: git, Nguyen Thai Ngoc Duy, Ævar Arnfjörð Bjarmason,
Ramsay Jones, Jeff King, Christian Couder
Christian Couder <christian.couder@gmail.com> writes:
> When performing an interactive rebase in split-index mode,
> the commit message that one should rework when squashing commits
> can contain some garbage instead of the usual concatenation of
> both of the commit messages.
OK, that is an understandable explanation of what problem you are
trying to fix.
>
> When bisecting it appears that 94c9b5af70 (Merge branch
> 'cc/split-index-config', 2017-03-17) is the first bad commit.
>
> But when rebasing cc/split-index-config on top of the commit it
> was merged with, the first bad commit is then c3a0082502
> (read-cache: use freshen_shared_index() in read_index_from(),
> 2017-03-06).
This part however doesn't help understanding the issue. "When X but
when Y" sounds as if you found a botched merge, but that does not
seem to be the case. The resulting tree after rebasing (with
conflict resolution) is the same as the recorded merge result. It
could be saying that "git bisect" is buggy and does not pinpoint the
broken commit, but this is not a commit to fix "bisect".
That leaves the reader confused.
> This shows that we should be careful not to use git_path() in
> freshen_shared_index(). It is using a shared buffer that can
> too easily lead to races.
The impression I get from the symptom is that after git_path() is
called here, before check_and_freshen_file() uses that result, it
(or functions it calls) uses git_path(), and the number of times it
does so has changed since cc/split-index-config was written on the
mainline, and the rotating 4-element buffer get_pathname() gives is
now exhausted, leading to the failure you observed. By the way,
that does not sound a race to me.
In any case, that explains why bisect says the merge is the first
bad one, and cures the confused reader ;-) The use of git_path() on
the topic was still safe; it was a timebomb waiting to go off. The
mainline started using more calls and the merge result was unsafe.
If you meant to summarise the whole two paragraphs above that I
needed to think it through with "This shows that", I'd have to say
that you are expecting too much from your readers. Please be a bit
more gentle to them.
Thanks.
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
> read-cache.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index e447751823..2f10242c24 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1682,9 +1682,10 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
> */
> static void freshen_shared_index(char *base_sha1_hex, int warn)
> {
> - const char *shared_index = git_path("sharedindex.%s", base_sha1_hex);
> + char *shared_index = git_pathdup("sharedindex.%s", base_sha1_hex);
> if (!check_and_freshen_file(shared_index, 1) && warn)
> warning("could not freshen shared index '%s'", shared_index);
> + free(shared_index);
> }
>
> int read_index_from(struct index_state *istate, const char *path)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] read-cache: avoid git_path() race in freshen_shared_index()
2017-03-29 17:06 ` Junio C Hamano
@ 2017-03-29 17:56 ` Jeff King
2017-03-30 8:40 ` Christian Couder
2017-03-30 9:24 ` Duy Nguyen
0 siblings, 2 replies; 8+ messages in thread
From: Jeff King @ 2017-03-29 17:56 UTC (permalink / raw)
To: Junio C Hamano
Cc: Christian Couder, git, Nguyen Thai Ngoc Duy,
Ævar Arnfjörð Bjarmason, Ramsay Jones,
Christian Couder
On Wed, Mar 29, 2017 at 10:06:52AM -0700, Junio C Hamano wrote:
> > This shows that we should be careful not to use git_path() in
> > freshen_shared_index(). It is using a shared buffer that can
> > too easily lead to races.
>
> The impression I get from the symptom is that after git_path() is
> called here, before check_and_freshen_file() uses that result, it
> (or functions it calls) uses git_path(), and the number of times it
> does so has changed since cc/split-index-config was written on the
> mainline, and the rotating 4-element buffer get_pathname() gives is
> now exhausted, leading to the failure you observed. By the way,
> that does not sound a race to me.
>
> In any case, that explains why bisect says the merge is the first
> bad one, and cures the confused reader ;-) The use of git_path() on
> the topic was still safe; it was a timebomb waiting to go off. The
> mainline started using more calls and the merge result was unsafe.
Yeah, it looks like that is what happened. I see that Christian bisected
the rebase to find the commit in the series that introduces the problem.
I'm mildly curious which commit upstream created the problem[1].
There's a reasonable chance it's some innocent-looking cleanup (possibly
one of my recent "stop using a fixed buffer" ones).
But in the end it doesn't really matter. I think code like:
const char *filename = git_path(...);
or
nontrivial_function(git_path(...));
is an anti-pattern. It _might_ be safe, but it's really hard to tell
without following the complete lifetime of the return value. I've been
tempted to suggest we should abolish git_path() entirely. But it's so
darn useful for things like unlink(git_path(...)), or other direct
system calls.
As an aside, this kind of static-buffer reuse _used_ to mean you might
see somebody else's buffer. Which is bad enough. But since the move to
use strbufs underneath the hood of git_path(), it may produce that
effect or it may be a use-after-free (if the strbuf had to reallocate to
grow in the meantime).
Anyway. The fix in the patch is obviously the right thing.
-Peff
[1] I think we could pinpoint the upstream change that caused the bad
interaction by bisecting between the merge-base and the first-parent
of the broken merge. For each commit, cherry-pick the complete
series on top of it, and test the result.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] read-cache: avoid git_path() race in freshen_shared_index()
2017-03-29 17:56 ` Jeff King
@ 2017-03-30 8:40 ` Christian Couder
2017-03-30 16:00 ` Junio C Hamano
2017-03-30 9:24 ` Duy Nguyen
1 sibling, 1 reply; 8+ messages in thread
From: Christian Couder @ 2017-03-30 8:40 UTC (permalink / raw)
To: Jeff King
Cc: Junio C Hamano, git, Nguyen Thai Ngoc Duy,
Ævar Arnfjörð Bjarmason, Ramsay Jones,
Christian Couder
On Wed, Mar 29, 2017 at 7:56 PM, Jeff King <peff@peff.net> wrote:
> On Wed, Mar 29, 2017 at 10:06:52AM -0700, Junio C Hamano wrote:
>
>> > This shows that we should be careful not to use git_path() in
>> > freshen_shared_index(). It is using a shared buffer that can
>> > too easily lead to races.
>>
>> The impression I get from the symptom is that after git_path() is
>> called here, before check_and_freshen_file() uses that result, it
>> (or functions it calls) uses git_path(), and the number of times it
>> does so has changed since cc/split-index-config was written on the
>> mainline, and the rotating 4-element buffer get_pathname() gives is
>> now exhausted, leading to the failure you observed. By the way,
>> that does not sound a race to me.
>>
>> In any case, that explains why bisect says the merge is the first
>> bad one, and cures the confused reader ;-) The use of git_path() on
>> the topic was still safe; it was a timebomb waiting to go off. The
>> mainline started using more calls and the merge result was unsafe.
>
> Yeah, it looks like that is what happened. I see that Christian bisected
> the rebase to find the commit in the series that introduces the problem.
> I'm mildly curious which commit upstream created the problem[1].
I bisected it to 18633e1a22 (rebase -i: use the rebase--helper
builtin, 2017-02-09).
This commit is indeed changing how the interactive rebase works, but
it is not easy to see how it impact git_path() usage.
> There's a reasonable chance it's some innocent-looking cleanup (possibly
> one of my recent "stop using a fixed buffer" ones).
>
> But in the end it doesn't really matter. I think code like:
>
> const char *filename = git_path(...);
>
> or
>
> nontrivial_function(git_path(...));
>
> is an anti-pattern. It _might_ be safe, but it's really hard to tell
> without following the complete lifetime of the return value. I've been
> tempted to suggest we should abolish git_path() entirely. But it's so
> darn useful for things like unlink(git_path(...)), or other direct
> system calls.
Yeah, I am very tempted to just rewrite the commit message like this:
------------
When performing an interactive rebase in split-index mode,
the commit message that one should rework when squashing commits
can contain some garbage instead of the usual concatenation of
both of the commit messages.
Bisecting shows that c3a0082502 (read-cache: use
freshen_shared_index() in read_index_from(), 2017-03-06) is involved,
which points to the unsafe use of git_path() in
freshen_shared_index().
------------
and change the subject to "read-cache: avoid using git_path() in
freshen_shared_index()".
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] read-cache: avoid git_path() race in freshen_shared_index()
2017-03-29 17:56 ` Jeff King
2017-03-30 8:40 ` Christian Couder
@ 2017-03-30 9:24 ` Duy Nguyen
2017-04-01 8:20 ` Jeff King
1 sibling, 1 reply; 8+ messages in thread
From: Duy Nguyen @ 2017-03-30 9:24 UTC (permalink / raw)
To: Jeff King
Cc: Junio C Hamano, Christian Couder, Git Mailing List,
Ævar Arnfjörð Bjarmason, Ramsay Jones,
Christian Couder
On Thu, Mar 30, 2017 at 12:56 AM, Jeff King <peff@peff.net> wrote:
> But in the end it doesn't really matter. I think code like:
>
> const char *filename = git_path(...);
>
> or
>
> nontrivial_function(git_path(...));
>
> is an anti-pattern. It _might_ be safe, but it's really hard to tell
> without following the complete lifetime of the return value. I've been
> tempted to suggest we should abolish git_path() entirely. But it's so
> darn useful for things like unlink(git_path(...)), or other direct
> system calls.
Yeah. I thought we killed most of those (was it your patches?).
I had a quick look at "git grep -w git_path" again. The ones in
builtin/am.c, builtin/grep.c and submodule.c look very much like that
anti-pattern. The one in read_index_from() probably should be replaced
with git_pathdup() as well. Sorry no patches (I'm very slow these
days).
--
Duy
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] read-cache: avoid git_path() race in freshen_shared_index()
2017-03-30 8:40 ` Christian Couder
@ 2017-03-30 16:00 ` Junio C Hamano
0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2017-03-30 16:00 UTC (permalink / raw)
To: Christian Couder
Cc: Jeff King, git, Nguyen Thai Ngoc Duy,
Ævar Arnfjörð Bjarmason, Ramsay Jones,
Christian Couder
Christian Couder <christian.couder@gmail.com> writes:
> On Wed, Mar 29, 2017 at 7:56 PM, Jeff King <peff@peff.net> wrote:
> ...
>> Yeah, it looks like that is what happened. I see that Christian bisected
>> the rebase to find the commit in the series that introduces the problem.
>> I'm mildly curious which commit upstream created the problem[1].
>
> I bisected it to 18633e1a22 (rebase -i: use the rebase--helper
> builtin, 2017-02-09).
> This commit is indeed changing how the interactive rebase works, but
> it is not easy to see how it impact git_path() usage.
That small series activates a large body of code that hasn't been exercised
so it is not surprising.
> Yeah, I am very tempted to just rewrite the commit message like this:
>
> ------------
>
> When performing an interactive rebase in split-index mode,
> the commit message that one should rework when squashing commits
> can contain some garbage instead of the usual concatenation of
> both of the commit messages.
>
> Bisecting shows that c3a0082502 (read-cache: use
> freshen_shared_index() in read_index_from(), 2017-03-06) is involved,
> which points to the unsafe use of git_path() in
> freshen_shared_index().
>
> ------------
>
> and change the subject to "read-cache: avoid using git_path() in
> freshen_shared_index()".
Sure. It may not be even worth mentioning a not-so-useful bisect
result, and the potential riskiness of the original code (iow why
this fix is the right thing) can be seen from the patch alone.
Thanks for following it through.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] read-cache: avoid git_path() race in freshen_shared_index()
2017-03-30 9:24 ` Duy Nguyen
@ 2017-04-01 8:20 ` Jeff King
0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2017-04-01 8:20 UTC (permalink / raw)
To: Duy Nguyen
Cc: Junio C Hamano, Christian Couder, Git Mailing List,
Ævar Arnfjörð Bjarmason, Ramsay Jones,
Christian Couder
On Thu, Mar 30, 2017 at 04:24:40PM +0700, Duy Nguyen wrote:
> On Thu, Mar 30, 2017 at 12:56 AM, Jeff King <peff@peff.net> wrote:
> > But in the end it doesn't really matter. I think code like:
> >
> > const char *filename = git_path(...);
> >
> > or
> >
> > nontrivial_function(git_path(...));
> >
> > is an anti-pattern. It _might_ be safe, but it's really hard to tell
> > without following the complete lifetime of the return value. I've been
> > tempted to suggest we should abolish git_path() entirely. But it's so
> > darn useful for things like unlink(git_path(...)), or other direct
> > system calls.
>
> Yeah. I thought we killed most of those (was it your patches?).
Yes, after fixing a bug where static buffer reuse caused git to randomly
delete a ref, I rage-converted most of the dangerous looking cases.
> I had a quick look at "git grep -w git_path" again. The ones in
> builtin/am.c, builtin/grep.c and submodule.c look very much like that
> anti-pattern. The one in read_index_from() probably should be replaced
> with git_pathdup() as well. Sorry no patches (I'm very slow these
> days).
Yeah, I think a number of them are actually OK if you dig (e.g., passing
it to am_state_init() immediately duplicates the result), but it's a bad
pattern if you have to dig to see if it's right. It's hard to tell when
a sub-function may reuse the buffer. For instance, git-init passes the
result to adjust_shared_perm(), which might lazily load the config from
disk. I don't know if that calls git_path() or not, but it's an awful
lot of code to run.
A lot of the cases look like they could be fixed by using git_path_foo()
instead of git_path("FOO"). (And in many cases we even have
git_path_foo() defined already!).
My favorite is the one in add_worktree(), which calls strbuf_addstr() on
the result of git_path(0. That one's _not_ dangerous, but surely it
would be simpler to just write directly into the strbuf. :)
-Peff
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] read-cache: avoid git_path() race in freshen_shared_index()
@ 2017-04-11 12:53 Devan Lucas
0 siblings, 0 replies; 8+ messages in thread
From: Devan Lucas @ 2017-04-11 12:53 UTC (permalink / raw)
To: gitster; +Cc: avarab, chriscool, christian.couder, git, pclouds, peff, ramsay
Sent from my iPhone
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-04-11 12:54 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-03-29 8:08 [PATCH] read-cache: avoid git_path() race in freshen_shared_index() Christian Couder
2017-03-29 17:06 ` Junio C Hamano
2017-03-29 17:56 ` Jeff King
2017-03-30 8:40 ` Christian Couder
2017-03-30 16:00 ` Junio C Hamano
2017-03-30 9:24 ` Duy Nguyen
2017-04-01 8:20 ` Jeff King
-- strict thread matches above, loose matches on Subject: below --
2017-04-11 12:53 Devan Lucas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).