From: Ingo Molnar <mingo@elte.hu>
To: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
tglx@linutronix.de, Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, hpa@zytor.com
Subject: Re: [PATCH] x86-64: fix memset() to support sizes of 4Gb and above
Date: Thu, 19 Jan 2012 13:18:59 +0100
Message-ID: <20120119121859.GA3936@elte.hu>
In-Reply-To: <4F17D8EB020000780006D943@nat28.tlf.novell.com>
* Jan Beulich <JBeulich@suse.com> wrote:
> >>> On 18.01.12 at 19:16, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > On Wed, Jan 18, 2012 at 2:40 AM, Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >>> For example the kernel's memcpy routine is slightly faster than
> >>> glibc's:
> >>
> >> This is an illusion - since the kernel's memcpy_64.S also defines a
> >> "memcpy" (not just "__memcpy"), the static linker resolves the
> >> reference from mem-memcpy.c against this one. Apparent
> >> performance differences rather point at effects like (guessing)
> >> branch prediction (using the second vs the first entry of
> >> routines[]). After fixing this, on my Westmere box glibc's is quite
> >> a bit slower than the unrolled kernel variant (4% fewer
> >> instructions, but about 15% more cycles).
> >
> > Please don't bother doing memcpy performance analysis using
> > hot-cache cases (or entirely cold-cache for that matter)
> > and/or big memory copies.
>
> I realize that - I just was asked to do this analysis, to
> (hopefully) turn down arguments against the $subject patch.

The other problem with such repeated measurements, beyond their
very isolated and artificially sterile nature, is what i
mentioned: the inter-test variability is not enough to signal
the real variance that occurs in a live system. That too can be
deceiving.

Note that your patch is a special case which makes measurement
easier: from the nature of your changes i expected *at most*
some minimal micro-performance impact, not any larger access
pattern related changes.

But Linus is right that this cannot be generalized to the
typical patch.

So i realize all those limitations and fully agree with being
aware of them, but compared to measuring *nothing* (which is the
current status quo) we have to start *somewhere*.

> > The *normal* memory copy size tends to be in the 10-30 byte
> > range, and the cache issues (both code *and* data) are
> > unclear. Running microbenchmarks is almost always
> > counter-productive, since it actually shows numbers for
> > something that has absolutely *nothing* to do with the
> > actual patterns.
>
> This is why I added a way to do meaningful measurement on
> small size operations (albeit still cache-hot) with perf.
We could add test points for 10 and 30 bytes, and the two
corner cases: one measurement with an I$ that is thrashing and
one measurement where the D$ is thrashing in a non-trivial way.
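As a rough illustration of the 10 and 30 byte test points, with and
without a disturbed D$, here is a minimal user-space sketch. It is not
code from this thread or from perf bench: the names thrash_dcache(),
time_small_memcpy() and now_ns(), the 64 MB eviction buffer and the
64-byte line size are all assumptions, and a single timed call is
dominated by timer overhead, so real use would need repetition and
hardware counters.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Assumed sizes: the buffer should exceed the last-level cache. */
#define EVICT_BYTES (64u * 1024u * 1024u)
#define CACHE_LINE  64u

static unsigned char *evict_buf;

/* Write one byte per assumed cache line so that previously cached
 * data is likely pushed out of the D$. Returns a checksum so the
 * compiler cannot drop the loop. */
unsigned long thrash_dcache(void)
{
    unsigned long sum = 0;

    if (!evict_buf) {
        evict_buf = malloc(EVICT_BYTES);
        assert(evict_buf != NULL);
        memset(evict_buf, 1, EVICT_BYTES);
    }
    for (size_t i = 0; i < EVICT_BYTES; i += CACHE_LINE) {
        evict_buf[i]++;
        sum += evict_buf[i];
    }
    return sum;
}

uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time a single memcpy() of len bytes (len <= 64). If cold is
 * non-zero the D$ is disturbed first. The result is in nanoseconds
 * and includes the cost of reading the clock. */
uint64_t time_small_memcpy(size_t len, int cold)
{
    static unsigned char src[64], dst[64];
    volatile unsigned long sink = 0;
    uint64_t t0, t1;

    assert(len <= sizeof(src));
    if (cold)
        sink = thrash_dcache();
    t0 = now_ns();
    memcpy(dst, src, len);
    t1 = now_ns();
    sink = dst[0];          /* keep the copy observable */
    (void)sink;
    return t1 - t0;
}
```

Calling time_small_memcpy(10, 0) and time_small_memcpy(30, 1) gives one
cache-hot and one D$-disturbed sample; the absolute numbers only become
meaningful relative to each other over many runs.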
( I have used test-code before to achieve high I$ thrashing: a
function with a million NOPs. )
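A function of that kind could look roughly like the sketch below. This
is only a guess at the shape of such test-code, not the original: it
assumes a GNU-compatible toolchain (the .rept/.endr assembler
directives and GCC-style inline asm), and the name thrash_icache() is
made up. On x86 a NOP is one byte, so this emits about 1 MB of code,
well beyond a typical L1 I$.

```c
#include <assert.h>

#define NOP_COUNT 1000000
#define STR_(x) #x
#define STR(x) STR_(x)

/* Execute NOP_COUNT consecutive NOP instructions. Streaming through
 * this much straight-line code is expected to evict most other code
 * from the instruction cache. Returns the number of NOPs executed. */
__attribute__((noinline)) unsigned long thrash_icache(void)
{
    __asm__ volatile(".rept " STR(NOP_COUNT) "\n\t"
                     "nop\n\t"
                     ".endr\n" ::: "memory");
    return NOP_COUNT;
}
```

Calling thrash_icache() right before the routine under test gives the
I$-cold data point; whether the code is then refetched from L2 or from
memory depends on the CPU model.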

Once we have the typical sizes and the edge cases covered we can
at least hope that reality is a healthy mix of all those
"eigen-vectors".

Once we have that in place we can at least have one meaningful
result: if a patch improves *all* these edge cases on the CPU
models that matter, then it's typically true that it will
improve the generic 'mixed' workload as well.
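The acceptance rule above can be written down as a small check. This is
a sketch under assumptions: struct bench_case, its fields and
improves_all_cases() are hypothetical names, and "improves" is taken
here to mean strictly fewer cycles, ignoring measurement noise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One test point (e.g. 10 bytes hot, 30 bytes with a cold I$),
 * measured before and after the patch on one CPU model. */
struct bench_case {
    const char *name;
    double before_cycles;
    double after_cycles;
};

/* True only if every test point got strictly faster. An empty set
 * of measurements proves nothing, so it yields false. */
bool improves_all_cases(const struct bench_case *c, size_t n)
{
    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (c[i].after_cycles >= c[i].before_cycles)
            return false;
    return n > 0;
}
```

A false result does not mean the patch is bad, only that it is not
clear-cut and needs to be measured with real loads.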

If a patch is not so clear-cut then it has to be measured with
real loads as well, etc.

Anyway, i'll apply your current patches and play with them a
bit.

Thanks,

	Ingo