From: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
To: lkp@lists.01.org
Subject: Re: [mm] 5c0a85fad9: unixbench.score -6.3% regression
Date: Mon, 13 Jun 2016 15:52:48 +0300
Message-ID: <20160613125248.GA30109@black.fi.intel.com>
In-Reply-To: <CA+55aFy4oYis6HTu7o4YwiFawRtDOPO=87v8oHZdTFS+BjnA8g@mail.gmail.com>

On Sat, Jun 11, 2016 at 06:02:57PM -0700, Linus Torvalds wrote:
> On Sat, Jun 11, 2016 at 5:49 PM, Huang, Ying <ying.huang@intel.com> wrote:
> >
> > From the perf profile, the time spent in page_fault and its child
> > functions is almost the same (7.85% vs 7.81%).  So the time spent in page
> > faults and page table operations themselves hasn't changed much.  So, you
> > mean the CPU may be slower to load the page table entry into the TLB if
> > the accessed bit is not set?
> 
> So the CPU does take a microfault internally when it needs to set the
> accessed/dirty bit. It's not architecturally visible, but you can see
> it when you do timing loops.
> 
> I've timed it at over a thousand cycles on at least some CPU's, but
> that's still peanuts compared to a real page fault. It shouldn't be
> *that* noticeable, ie no way it's a 6% regression on its own.

Looks like setting the accessed bit is the problem.

Without mkold:

Score: 1952.9

  Performance counter stats for './Run shell8 -c 1' (3 runs):

    468,562,316,621      cycles:u                            ( +-  0.02% )
      4,596,299,472      dtlb_load_misses_walk_duration:u    ( +-  0.07% )
      5,245,488,559      itlb_misses_walk_duration:u         ( +-  0.10% )

      189.336404566 seconds time elapsed                     ( +-  0.01% )

With mkold:

Score: 1885.5

  Performance counter stats for './Run shell8 -c 1' (3 runs):

    503,185,676,256      cycles:u                            ( +-  0.06% )
      8,137,007,894      dtlb_load_misses_walk_duration:u    ( +-  0.85% )
      7,220,632,283      itlb_misses_walk_duration:u         ( +-  1.40% )

      189.363223499 seconds time elapsed                     ( +-  0.01% )

We spend about 56% more time in page walks alone (15.36G vs. 9.84G cycles);
the extra walk time is about 1% of total userspace time. Combining this with
the page walk footprint on caches, I guess we can get to the 3.5% score
difference I see.
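
As a rough cross-check of the arithmetic, here is a minimal Python sketch.
The counter values and scores are copied from the two perf stat runs above;
the dictionary keys and the helper names (walk_cycles, summarize) are made up
for illustration. Note that the size of the page-walk gap depends on the
baseline: it is about 56% when measured against the run without mkold, and
about 36% when measured against the run with mkold.

```python
# Values copied from the two perf stat runs above (user-space counters only).
# Key names are illustrative, not perf event names.
without_mkold = {
    "cycles": 468_562_316_621,
    "dtlb_walk": 4_596_299_472,
    "itlb_walk": 5_245_488_559,
    "score": 1952.9,
}
with_mkold = {
    "cycles": 503_185_676_256,
    "dtlb_walk": 8_137_007_894,
    "itlb_walk": 7_220_632_283,
    "score": 1885.5,
}


def walk_cycles(run):
    """Total page-walk cycles: data-TLB walks plus instruction-TLB walks."""
    return run["dtlb_walk"] + run["itlb_walk"]


def summarize(base, test):
    """Compare a test run against a baseline run; all results in percent."""
    extra = walk_cycles(test) - walk_cycles(base)
    return {
        # Extra walk cycles relative to the baseline run.
        "walk_increase_vs_base": 100 * extra / walk_cycles(base),
        # The same gap expressed relative to the test run.
        "walk_gap_vs_test": 100 * extra / walk_cycles(test),
        # Extra walk cycles as a share of all user-space cycles in the test run.
        "extra_walk_share_of_cycles": 100 * extra / test["cycles"],
        # Benchmark score drop relative to the baseline run.
        "score_drop": 100 * (base["score"] - test["score"]) / base["score"],
    }


if __name__ == "__main__":
    for name, value in summarize(without_mkold, with_mkold).items():
        print(f"{name}: {value:.2f}%")
```

The extra walk cycles only account for about 1% of user-space cycles, so the
rest of the score difference would have to come from indirect effects such as
the cache footprint of the walks.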

I'm not sure if there's anything we can do to solve the issue without
screwing up the reclaim logic again. :(

-- 
 Kirill A. Shutemov

