public inbox for linux-kernel@vger.kernel.org
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Zou Nan hai <nanhai.zou@intel.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: possible performance issue in 4-level page tables
Date: Tue, 01 Feb 2005 18:46:45 +1100	[thread overview]
Message-ID: <41FF33E5.4070107@yahoo.com.au> (raw)
In-Reply-To: <1107231570.2555.19.camel@linux-znh>

Zou Nan hai wrote:
> There is a performance regression in the lmbench
> lat_proc fork result on ia64.
> 
> In 2.6.10 I got:
> Process fork+exit: 164.8438 microseconds.
> 
> In 2.6.11-rc2:
> Process fork+exit: 183.8621 microseconds.
> 
> I believe this regression was caused by 
> the 4-level page tables change.
> 
> Most of the kernel time spent in lat_proc fork is in copy_page_range
> on the fork path and clear_page_range on the exit path, and they are
> now one level deeper.
> 
> Though pud and pgd are the same on ia64, I think there is still some
> overhead introduced.
>  
> Are any other architectures seeing the same sort of results?
> 

I didn't think the i386 numbers were down that much, but I can't recall
the exact figures I got. Just some rambling thoughts:

There will be a little more overhead in copy_page_range, though I am
surprised it is that much. Another likely place to look is clear_page_range
in the exit path - that has had some bigger changes, and I would be less
*unhappy* if the slowdown is mostly coming from there.

I was thinking about this, and I believe it may be possible to implement
macros for walking page table ranges that optimise out the folded levels
without the additional function call. I didn't want to get too carried away
yet though, because I didn't want to deviate too much from Andi's work, and
everybody seems to do it slightly differently and it might not be possible
to sanely reconcile them all.

Maybe the best option is inlining. If these walkers are inline, the compiler
might be smart enough to optimise the folded levels away itself. I definitely
saw a small but non-trivial improvement in lmbench fork+exit when making the
page table walkers all inline.

This was objected to for debuggability (stack trace) reasons, which is fair
enough. I'm told the gcc option -funit-at-a-time will do the job nicely, and
even generate proper backtrace info... but gcc's inlining can bloat stack
usage, and Linus won't enable this until that's fixed.

So we may be a bit stuck, for the moment. You could probably try some
preliminary tests with the inline keyword and/or -funit-at-a-time.

Maybe for now we could just enable -funit-at-a-time for some select important
files in mm/, if there is a large gain to be had?

Thanks,
Nick


Thread overview: 2+ messages
2005-02-01  4:19 possible performance issue in 4-level page tables Zou Nan hai
2005-02-01  7:46 ` Nick Piggin [this message]
