From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: "Huang, Ying" <ying.huang@intel.com>,
	Rik van Riel <riel@redhat.com>, Michal Hocko <mhocko@suse.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	Minchan Kim <minchan@kernel.org>,
	Vinayak Menon <vinmenon@codeaurora.org>,
	Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>, LKP <lkp@01.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	linux-s390 <linux-s390@vger.kernel.org>
Subject: Re: [LKP] [lkp] [mm] 5c0a85fad9: unixbench.score -6.3% regression
Date: Tue, 14 Jun 2016 16:03:23 +0200
Message-ID: <57600EAB.9030000@de.ibm.com>
In-Reply-To: <CA+55aFx2TdqHW5VvirF-fAe4rRtSKK6BH06LyN4Ma3Q7ifJkxA@mail.gmail.com>

On 06/14/2016 08:11 AM, Linus Torvalds wrote:
> On Mon, Jun 13, 2016 at 5:52 AM, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
>> On Sat, Jun 11, 2016 at 06:02:57PM -0700, Linus Torvalds wrote:
>>>
>>> I've timed it at over a thousand cycles on at least some CPUs, but
>>> that's still peanuts compared to a real page fault. It shouldn't be
>>> *that* noticeable, i.e. there's no way it's a 6% regression on its own.
>>
>> Looks like setting the accessed bit is the problem.
> 
> Ok. I've definitely seen it as an issue, but never to the point of
> several percent on a real benchmark that wasn't explicitly testing
> that cost.
> 
> I reported the excessive dirty/accessed bit cost to Intel back in the
> P4 days, but it's apparently not been high enough for anybody to care.
> 
>> We spend 36% more time in page walks alone, about 1% of total userspace time.
>> Combining this with the page walk footprint in the caches, I guess we can get to
>> the 3.5% score difference I see.
>>
>> I'm not sure if there's anything we can do to solve the issue without
>> screwing up the reclaim logic again. :(
> 
> I think we should say "screw the reclaim logic" for now, and revert
> commit 5c0a85fad949.
> 
> Considering how much trouble the accessed bit is on some other
> architectures too, I wonder if we should strive to simply not care
> about it and always leave it set. And then rely entirely on just
> unmapping the pages and making the "we took a page fault after
> unmapping" be the real activity tester.
> 
> So get rid of the "if the page is young, mark it old but leave it in
> the page tables" logic entirely. When we unmap a page, it will always
> either be in the swap cache or the page cache anyway, so faulting it
> in again should be just a minor fault with no actual IO happening.
> 
> That might be less of an impact in the end - yes, the unmap and
> re-fault is much more expensive, but it presumably happens to far
> fewer pages.
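
A minimal C sketch of the scheme proposed above, under a toy model: every page is assumed to stay resident in a stand-in for the page or swap cache, a reclaim scan unmaps pages instead of clearing an accessed bit, and a page counts as active only if it was faulted back in (a minor fault, no I/O) since the previous scan. All names (`unmap_page`, `unmap_access`, `unmap_reclaim_scan`, ...) are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; all names are hypothetical, not kernel APIs. */
#define UNMAP_NPAGES 4

struct unmap_page {
	bool in_cache;	/* assumption: page never leaves the page/swap cache */
	bool mapped;	/* stands in for a present PTE */
};

static struct unmap_page unmap_pages[UNMAP_NPAGES];
static unsigned int unmap_minor_faults;

/* Start with every page cached and mapped. */
static void unmap_init(void)
{
	for (int i = 0; i < UNMAP_NPAGES; i++) {
		unmap_pages[i].in_cache = true;
		unmap_pages[i].mapped = true;
	}
	unmap_minor_faults = 0;
}

/*
 * A user access. A mapped page needs no accessed-bit update at all;
 * an unmapped one takes a minor fault and is remapped from the cache.
 */
static void unmap_access(int i)
{
	assert(i >= 0 && i < UNMAP_NPAGES);
	if (!unmap_pages[i].mapped) {
		assert(unmap_pages[i].in_cache);	/* no I/O needed */
		unmap_minor_faults++;
		unmap_pages[i].mapped = true;
	}
}

/*
 * Reclaim scan: a page that is mapped now must have been faulted back
 * in since the last scan, so that is the activity test. Every page is
 * then unmapped (instead of "marked old but left in the page tables").
 * Returns the number of active pages and fills active[].
 */
static int unmap_reclaim_scan(bool active[UNMAP_NPAGES])
{
	int nr_active = 0;

	for (int i = 0; i < UNMAP_NPAGES; i++) {
		active[i] = unmap_pages[i].mapped;
		if (active[i])
			nr_active++;
		unmap_pages[i].mapped = false;
	}
	return nr_active;
}
```

For example, right after unmap_init() a first scan reports all four pages active (they were just mapped). If only pages 1 and 3 are touched afterwards, the second scan reports exactly those two, and each costs one minor fault no matter how often it is accessed in between.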

FWIW, Martin did something like that for s390 three years ago.
We now use invalidation and page faults to implement the *young
functions in pgtable.h (basically a software young bit). This
allowed us to get rid of the storage keys (which contain the
hardware reference bit), and performance did not seem to suffer.

See commit 0944fe3f4a323f436180d39402cae7f9c46ead17
("s390/mm: implement software referenced bits").

> 
> What do you think?

So your proposal would be to make the software tracking via
invalidation/faults part of the generic mm code instead of hiding
it in the architecture backend. Correct?

> 
>              Linus
> 
