public inbox for linux-kernel@vger.kernel.org
From: Ingo Molnar <mingo@elte.hu>
To: Jan Beulich <JBeulich@suse.com>
Cc: tglx@linutronix.de, hpa@zytor.com, linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH] x86-64: fix memset() to support sizes of 4Gb and above
Date: Fri, 6 Jan 2012 12:05:20 +0100
Message-ID: <20120106110519.GA32673@elte.hu>
In-Reply-To: <4F05D992020000780006AA09@nat28.tlf.novell.com>


* Jan Beulich <JBeulich@suse.com> wrote:

> While currently there doesn't appear to be any reachable 
> in-tree case where such large memory blocks may be passed to 
> memset() (alloc_bootmem() being the primary non-reachable one, 
> as it gets called with suitably large sizes in FLATMEM 
> configurations), we have recently hit the problem a second 
> time in our Xen kernels. Rather than working around it once 
> more, prevent others from falling into the same trap by 
> fixing this long-standing limitation.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Have you checked the before/after size of the hotpath?

The patch suggests that it got shorter by 3 instructions:

> -	movl %edx,%r8d
> -	andl $7,%r8d
> -	movl %edx,%ecx
> -	shrl $3,%ecx
> +	movq %rdx,%rcx
> +	andl $7,%edx
> +	shrq $3,%rcx

[...]

>  	movq %rdi,%r10
> -	movq %rdx,%r11
>  

[...]

> -	movl	%r11d,%ecx
> -	andl	$7,%ecx
> +	andl	$7,%edx

Is that quick impression correct? I have not tried building or 
measuring it.

Also, note that we have some cool instrumentation tech upstream: 
we recently added a way to measure the *kernel*'s memcpy 
routine's performance in user-space, using perf bench:

  $ perf bench mem memcpy -r x86
  # Running mem/memcpy benchmark...
  Unknown routine:x86
  Available routines...
	default ... Default memcpy() provided by glibc
	x86-64-unrolled ... unrolled memcpy() in arch/x86/lib/memcpy_64.S

  $ perf bench mem memcpy -r x86-64-unrolled
  # Running mem/memcpy benchmark...
  # Copying 1MB Bytes ...

       1.888902 GB/Sec
      15.024038 GB/Sec (with prefault)

This builds the routine from arch/x86/lib/memcpy_64.S and measures 
its bandwidth in both the cache-cold and the cache-hot case.

It would be nice to add support for arch/x86/lib/memset_64.S as 
well and look at its before/after performance. In user-space 
we can do much more accurate measurements of this kind:

 $ perf stat --repeat 1000 -e instructions -e cycles perf bench mem memcpy -r x86-64-unrolled >/dev/null

 Performance counter stats for 'perf bench mem memcpy -r x86-64-unrolled' (1000 runs):

         4,924,378 instructions              #    0.84  insns per cycle          ( +-  0.03% )
         5,892,603 cycles                    #    0.000 GHz                      ( +-  0.06% )

       0.002208000 seconds time elapsed                                          ( +-  0.10% )

Note how the confidence interval of the measurement is in the 
0.03% range, so even a single-cycle change in overhead caused 
by your patch should be measurable. Note that you'll need to 
rebuild perf with the different memset routines.

For example, the kernel's memcpy routine is slightly faster than 
glibc's:

  $ perf stat --repeat 1000 -e instructions -e cycles perf bench mem memcpy -r default >/dev/null

 Performance counter stats for 'perf bench mem memcpy -r default' (1000 runs):

         4,927,173 instructions              #    0.83  insns per cycle          ( +-  0.03% )
         5,928,168 cycles                    #    0.000 GHz                      ( +-  0.06% )

       0.002157349 seconds time elapsed                                          ( +-  0.10% )

If such measurements all suggest equal or better performance, 
and if there's no erratum in current CPUs that would make 4G 
string copies dangerous [which your research suggests should be 
fine], I have no objection in principle to this patch.

I'd not be surprised (at all) if other OSes did larger-than-4GB 
memsets in certain circumstances.

Thanks,

	Ingo

Thread overview: 11+ messages
2012-01-05 16:10 [PATCH] x86-64: fix memset() to support sizes of 4Gb and above Jan Beulich
2012-01-06 11:05 ` Ingo Molnar [this message]
2012-01-06 12:31   ` Jan Beulich
2012-01-06 19:01     ` Ingo Molnar
2012-01-18 10:40   ` Jan Beulich
2012-01-18 11:14     ` Ingo Molnar
2012-01-18 13:33       ` Jan Beulich
2012-01-18 18:16     ` Linus Torvalds
2012-01-19  7:48       ` Jan Beulich
2012-01-19 12:18         ` Ingo Molnar
2012-01-26 13:40 ` [tip:x86/asm] x86-64: Fix " tip-bot for Jan Beulich
