Re: [PATCH] x86-64: fix memset() to support sizes of 4Gb and above


From: Ingo Molnar <mingo@elte.hu>
To: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	tglx@linutronix.de, Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, hpa@zytor.com
Subject: Re: [PATCH] x86-64: fix memset() to support sizes of 4Gb and above
Date: Thu, 19 Jan 2012 13:18:59 +0100
Message-ID: <20120119121859.GA3936@elte.hu>
In-Reply-To: <4F17D8EB020000780006D943@nat28.tlf.novell.com>

* Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 18.01.12 at 19:16, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > On Wed, Jan 18, 2012 at 2:40 AM, Jan Beulich <JBeulich@suse.com> wrote:
> >>
> >>> For example the kernel's memcpy routine is slightly faster than
> >>> glibc's:
> >>
> >> This is an illusion - since the kernel's memcpy_64.S also defines a
> >> "memcpy" (not just "__memcpy"), the static linker resolves the
> >> reference from mem-memcpy.c against this one. Apparent
> >> performance differences rather point at effects like (guessing)
> >> branch prediction (using the second vs the first entry of
> >> routines[]). After fixing this, on my Westmere box glibc's is quite
> >> a bit slower than the unrolled kernel variant (4% fewer
> >> instructions, but about 15% more cycles).
> > 
> > Please don't bother doing memcpy performance analysis using 
> > hot-cache cases (or entirely cold-cache for that matter) 
> > and/or big memory copies.
> 
> I realize that - I was just asked to do this analysis, to
> (hopefully) turn down arguments against the $subject patch.

The other problem with such repeated measurements, beyond their
very isolated and artificially sterile nature, is what I
mentioned: the inter-test variability does not reflect the real
variance that occurs in a live system. That too can be
deceiving.
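A minimal C sketch of how the inter-test variability of repeated runs is usually summarized (mean, min, max and the relative spread between them). The struct and function names are hypothetical and not taken from perf or the kernel tree. As noted above, a small spread here only describes the sterile test setup, not the variance seen on a live system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of repeated runs of one benchmark (e.g. ns/copy). */
struct run_stats {
	double mean;
	double min;
	double max;
	double rel_spread;	/* (max - min) / mean */
};

static struct run_stats summarize_runs(const double *samples, size_t n)
{
	struct run_stats s;
	double sum = 0.0;

	assert(n > 0);
	s.min = s.max = samples[0];
	for (size_t i = 0; i < n; i++) {
		sum += samples[i];
		if (samples[i] < s.min)
			s.min = samples[i];
		if (samples[i] > s.max)
			s.max = samples[i];
	}
	s.mean = sum / (double)n;
	s.rel_spread = s.mean > 0.0 ? (s.max - s.min) / s.mean : 0.0;
	return s;
}
```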

Note that your patch is a special case which makes measurement
easier: from the nature of your changes I expected *at most*
some minimal micro-performance impact, not any larger changes
related to access patterns.

But Linus is right that this cannot be generalized to the 
typical patch.

So I realize all those limitations and fully agree with being
aware of them, but compared to measuring *nothing* (which is the
current status quo) we have to start *somewhere*.

> > The *normal* memory copy size tends to be in the 10-30 byte 
> > range, and the cache issues (both code *and* data) are 
> > unclear. Running microbenchmarks is almost always 
> > counter-productive, since it actually shows numbers for 
> > something that has absolutely *nothing* to do with the 
> > actual patterns.
> 
> This is why I added a way to do meaningful measurement on 
> small size operations (albeit still cache-hot) with perf.

We could add test points for 10 and 30 bytes, plus the two
corner cases: one measurement where the I$ (instruction cache) is
thrashing and one where the D$ (data cache) is thrashing in a
non-trivial way.

( I have used test code before to achieve heavy I$ thrashing: a
  function with a million NOPs. )
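A rough C sketch of the test points described above: small copies (10 and 30 bytes) timed cache-hot, with the D$ thrashing (each copy hops across a buffer assumed to be much larger than the last-level cache), and with the I$ thrashing (a million NOPs executed before each copy, following the trick in the note above). All names (bench_copy, icache_thrash, the page-plus-cache-line stride) are hypothetical choices, and the inline asm assumes GCC or Clang with the GNU assembler. Timing each call with clock_gettime() adds overhead of the same order as a 10-byte copy, so the numbers are only meaningful when comparing two routines under the same harness.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Any memcpy()-compatible routine under test (baseline or patched). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

enum cache_mode { CACHE_HOT, DCACHE_THRASH, ICACHE_THRASH };

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/*
 * About a million NOP instructions (~1 MB of code on x86), far larger than
 * a typical L1 I-cache, so calling it evicts the copy routine's code.
 * Assumes GCC or Clang with the GNU assembler.
 */
static void __attribute__((noinline)) icache_thrash(void)
{
	__asm__ volatile(".rept 1000000\n\tnop\n\t.endr");
}

/*
 * Average wall-clock ns per call of fn(dst, src, len), including the
 * clock_gettime() overhead. pool is the size of each buffer; for
 * DCACHE_THRASH it should be much larger than the last-level cache.
 * Returns -1.0 if the buffers cannot be allocated.
 */
static double bench_copy(copy_fn fn, size_t len, size_t iters,
			 size_t pool, enum cache_mode mode)
{
	char *src = malloc(pool), *dst = malloc(pool);
	uint64_t total = 0;
	size_t off = 0;

	assert(len > 0 && len < pool && iters > 0);
	if (!src || !dst) {
		free(src);
		free(dst);
		return -1.0;
	}
	memset(src, 0x5a, pool);
	memset(dst, 0, pool);

	for (size_t i = 0; i < iters; i++) {
		if (mode == DCACHE_THRASH)
			/* hop one page plus one cache line per copy */
			off = (off + 4096 + 64) % (pool - len);
		else if (mode == ICACHE_THRASH)
			icache_thrash();

		uint64_t t0 = now_ns();
		fn(dst + off, src + off, len);
		total += now_ns() - t0;
		/* keep the compiler from discarding the copies */
		__asm__ volatile("" : : "r"(dst) : "memory");
	}
	/* sanity check that the last copy really happened */
	assert(memcmp(dst + off, src + off, len) == 0);
	free(src);
	free(dst);
	return (double)total / (double)iters;
}
```

For example, bench_copy(memcpy, 10, 100000, 64, CACHE_HOT) can be compared against the same call with a patched routine; for DCACHE_THRASH, a pool in the tens of MiB is an assumed, not measured, starting point.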

Once we have the typical sizes and the edge cases covered, we can
at least hope that reality is a healthy mix of all those
"eigen-vectors".

Once we have that in place we can at least have one meaningful 
result: if a patch improves *all* these edge cases on the CPU 
models that matter, then it's typically true that it will 
improve the generic 'mixed' workload as well.

If a patch is not so clear-cut then it has to be measured with 
real loads as well, etc.
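The rule above can be written down as a small predicate, sketched here in C under the assumption that each test point yields one cost figure (lower is better) for both the baseline and the patched routine, and that an improvement only counts if it beats a chosen noise margin. The function name and the noise parameter are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: baseline[i] and patched[i] are costs (lower is
 * better) for test point i, e.g. 10-byte hot, 30-byte hot, I$ thrashing
 * and D$ thrashing, repeated for each CPU model of interest. noise is the
 * fractional margin a win must exceed (0.02 = 2%) to count.
 */
static bool improves_all_points(const double *baseline,
				const double *patched, size_t n, double noise)
{
	if (n == 0)
		return false;
	for (size_t i = 0; i < n; i++) {
		/* one point that is flat or worse makes the result not clear-cut */
		if (!(patched[i] < baseline[i] * (1.0 - noise)))
			return false;
	}
	return true;
}
```

A false result does not mean the patch is bad, only that it is not clear-cut and has to be measured under real loads.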

Anyway, I'll apply your current patches and play with them a
bit.

Thanks,

	Ingo


Thread overview: 11+ messages
2012-01-05 16:10 [PATCH] x86-64: fix memset() to support sizes of 4Gb and above Jan Beulich
2012-01-06 11:05 ` Ingo Molnar
2012-01-06 12:31   ` Jan Beulich
2012-01-06 19:01     ` Ingo Molnar
2012-01-18 10:40   ` Jan Beulich
2012-01-18 11:14     ` Ingo Molnar
2012-01-18 13:33       ` Jan Beulich
2012-01-18 18:16     ` Linus Torvalds
2012-01-19  7:48       ` Jan Beulich
2012-01-19 12:18         ` Ingo Molnar [this message]
2012-01-26 13:40 ` [tip:x86/asm] x86-64: Fix " tip-bot for Jan Beulich
