From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Kirill Smelkov <kirr@mns.spb.ru>
Subject: Re: [PATCH] tree-diff: avoid alloca for large allocations
Date: Tue, 7 Jun 2016 21:36:56 -0400 [thread overview]
Message-ID: <20160608013656.GA7277@sigill.intra.peff.net> (raw)
In-Reply-To: <xmqq8tygo6s4.fsf@gitster.mtv.corp.google.com>
On Tue, Jun 07, 2016 at 05:36:59PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > An alternative to this would be implement something like:
> >
> > struct tree *tp, tp_fallback[2];
> > if (nparent <= ARRAY_SIZE(tp_fallback))
> > tp = tp_fallback;
> > else
> > ALLOC_ARRAY(tp, nparent);
> > ...
> > if (tp != tp_fallback)
> > free(tp);
> >
> > That would let us drop our xalloca() portability code
> > entirely. But in my measurements, this seemed to perform
> > slightly worse than the xalloca solution.
>
> It indeed is curious why this "obvious" alternative performed
> worse.
Yeah. I'd be happy if somebody wanted to prove me wrong here. It's
entirely possible I just did something stupid. I had wrapped up the
above in the FAST_ARRAY_ALLOC macro. It would waste the stack space when
we _did_ have to call malloc, but that should almost never happen in my
benchmark. It also allocates an extra slot on the stack for non-merge
commits. So I guess the wasted 24 bytes or whatever could have in
impact. It's also possible that the extra variable simply tickles the
optimizer in some way; I didn't look at the generated asm.
FWIW, here are the timings I had come up with for running "git log --raw
--no-abbrev --no-renames" on linux.git (the same benchmark from the
commit that originally added the alloca). The "time" output at the end
is best-of-five, with the "attempt" lines showing the wall-clock time
for each:
[original, with xalloca]
Attempt 1: 30.669
Attempt 2: 30.782
Attempt 3: 31.807
Attempt 4: 31.152
Attempt 5: 30.243
real 0m30.243s
user 0m30.112s
sys 0m0.128s
[xmalloc/free instead of xalloca]
Attempt 1: 31.306
Attempt 2: 31.814
Attempt 3: 31.138
Attempt 4: 31.787
Attempt 5: 32.23
real 0m31.138s
user 0m30.976s
sys 0m0.160s
[local var with fallback to xmalloc]
Attempt 1: 30.926
Attempt 2: 31.466
Attempt 3: 31.865
Attempt 4: 31.345
Attempt 5: 33.159
real 0m30.926s
user 0m30.804s
sys 0m0.120s
So the improvement here over even a naive malloc/free is _really_ small.
You'll note that the best-of-five for xmalloc is actually smaller than
the worst-case with xalloca. But the mean is still 2% better. The mean
for the "fallback" case aren't really any better (which makes me wonder
if I was somehow accidentally calling malloc each time).
So I dunno. For 2%, I was tempted to simply say "screw it, let's just
forget this micro-optimization and call xmalloc". This patch is the
conservative choice in that it has the same performance profile as the
original. Or so I thought. I didn't record the old timings, but they
looked similar to the original. I just recreated them while writing this
email and got:
[xalloca with fallback to xmalloc]
Attempt 1: 31.356
Attempt 2: 31.725
Attempt 3: 31.454
Attempt 4: 30.898
Attempt 5: 31.937
real 0m30.898s
user 0m30.752s
sys 0m0.144s
which really isn't much better than the local-var case. I wonder if just
having the conditional stack/heap is what kills us (either itself, or
because it affects the optimizer).
I dunno.
> > +#define FAST_ARRAY_ALLOC(x, nr) do { \
> > + if ((nr) <= 2) \
> > + (x) = xalloca((nr) * sizeof(*(x))); \
> > + else \
> > + ALLOC_ARRAY((x), nr); \
> > +} while(0)
> > +#define FAST_ARRAY_FREE(x, nr) do { \
> > + if ((nr) > 2) \
> > + free((x)); \
> > +} while(0)
>
> An obvious and clean implementation of the idea.
>
> The only slight worry I have is that nr must stay constant until the
> time the caller calls FREE(), but for the only three callsite pairs
> it is clear nparent will stay constant and this is local to the
> file.
Yep, I had the same worry. I think it's OK because the damage is limited
to tree-diff.c. I'm not sure I would promote this macro for general use.
-Peff
prev parent reply other threads:[~2016-06-08 1:37 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-07 22:53 [PATCH] tree-diff: avoid alloca for large allocations Jeff King
2016-06-08 0:36 ` Junio C Hamano
2016-06-08 1:36 ` Jeff King [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160608013656.GA7277@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kirr@mns.spb.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).