From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752495AbcFNQHh (ORCPT );
        Tue, 14 Jun 2016 12:07:37 -0400
Received: from mx1.redhat.com ([209.132.183.28]:54329 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751329AbcFNQHf (ORCPT );
        Tue, 14 Jun 2016 12:07:35 -0400
Message-ID: <1465920449.2756.12.camel@redhat.com>
Subject: Re: [LKP] [lkp] [mm] 5c0a85fad9: unixbench.score -6.3% regression
From: Rik van Riel
To: "Kirill A. Shutemov", Linus Torvalds, Mel Gorman
Cc: "Kirill A. Shutemov", "Huang, Ying", Michal Hocko, LKML,
        Michal Hocko, Minchan Kim, Vinayak Menon, Andrew Morton, LKP,
        Dave Hansen, Vladimir Davydov
Date: Tue, 14 Jun 2016 12:07:29 -0400
In-Reply-To: <20160614082613.GA1066@node.shutemov.name>
References: <20160606022724.GA26227@yexl-desktop>
        <20160606095136.GA79951@black.fi.intel.com>
        <87a8iw5enf.fsf@yhuang-dev.intel.com>
        <8760tk5aym.fsf@yhuang-dev.intel.com>
        <20160608085811.GB12655@black.fi.intel.com>
        <87porn44fm.fsf@yhuang-dev.intel.com>
        <20160613125248.GA30109@black.fi.intel.com>
        <20160614082613.GA1066@node.shutemov.name>
Content-Type: multipart/signed; micalg="pgp-sha256";
        protocol="application/pgp-signature"; boundary="=-UBw4qoouh4jYvCDkFyGk"
Mime-Version: 1.0
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
        (mx1.redhat.com [10.5.110.39]); Tue, 14 Jun 2016 16:07:35 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

--=-UBw4qoouh4jYvCDkFyGk
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Tue, 2016-06-14 at 11:26 +0300, Kirill A. Shutemov wrote:
> On Mon, Jun 13, 2016 at 11:11:05PM -0700, Linus Torvalds wrote:
> >
> > On Mon, Jun 13, 2016 at 5:52 AM, Kirill A. Shutemov wrote:
> > >
> > > On Sat, Jun 11, 2016 at 06:02:57PM -0700, Linus Torvalds wrote:
> > > >
> > > > I've timed it at over a thousand cycles on at least some CPUs,
> > > > but that's still peanuts compared to a real page fault. It
> > > > shouldn't be *that* noticeable, i.e. no way it's a 6% regression
> > > > on its own.
> > > Looks like setting the accessed bit is the problem.
> > Ok. I've definitely seen it as an issue, but never to the point of
> > several percent on a real benchmark that wasn't explicitly testing
> > that cost.
> >
> > I reported the excessive dirty/accessed bit cost to Intel back in
> > the P4 days, but it's apparently not been high enough for anybody
> > to care.
> >
> > >
> > > We spend 36% more time in page walk only, about 1% of total
> > > userspace time.
> > > Combining this with page walk footprint on caches, I guess we can
> > > get to this 3.5% score difference I see.
> > >
> > > I'm not sure if there's anything we can do to solve the issue
> > > without screwing reclaim logic again. :(
> > I think we should say "screw the reclaim logic" and revert
> > commit 5c0a85fad949 for now.
> Okay. I'll prepare the patch.
>
> >
> > Considering how much trouble the accessed bit is on some other
> > architectures too, I wonder if we should strive to simply not care
> > about it, and always leave it set. And then rely entirely on just
> > unmapping the pages and making the "we took a page fault after
> > unmapping" be the real activity tester.
> >
> > So get rid of the "if the page is young, mark it old but leave it
> > in the page tables" logic entirely. When we unmap a page, it will
> > always either be in the swap cache or the page cache anyway, so
> > faulting it in again should be just a minor fault with no actual
> > IO happening.
> >
> > That might be less of an impact in the end - yes, the unmap and
> > re-fault is much more expensive, but it presumably happens to far
> > fewer pages.
> >
> > What do you think?
> Well, we cannot do this for anonymous memory. No swap -- no swap
> cache, if I read the code correctly.
>
> I guess it's doable for file mappings. Although I would expect
> regressions in other benchmarks. IIUC, it would require page
> unmapping to propagate the page to the active list, which is
> suboptimal.
>
> And the implications for page_idle are not clear to me.
>
> Rik, Mel, any comments?

We can clear the accessed/young bit when anon pages are moved
from the active to the inactive list.

Reclaim does not care about the young bit on active anon pages
at all. For anon pages it uses a two-handed clock algorithm, and
only pages on the inactive list are considered.

For file pages, I believe we do look at the young bit on mapped
pages when they reach the end of the inactive list.

Again, we only care about the young bit on inactive pages.

One option may be to count on actively used file pages
actually being on the active list, and always set the young
bit on ptes when the page is already active.

Then we can let reclaim do its thing with the smaller number
of pages that are on the inactive list, while doing the
faster thing for pages that are on the active list.

Does that make sense?

-- 
All Rights Reversed.
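
To make the proposal above concrete, here is a rough userspace sketch
of the policy: ptes for pages already on the active list are created
young, the young bit is cleared when a page is deactivated, and reclaim
only consults the bit for inactive pages. This is a toy model, not
kernel code. Every name in it (toy_page, toy_pte, toy_map_page and so
on) is hypothetical, and mapping inactive pages as "old" is an
assumption of the sketch rather than something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins only; none of these are real kernel types or helpers. */
struct toy_page {
    bool active;    /* true: on the active list, false: on the inactive list */
};

struct toy_pte {
    bool present;
    bool young;     /* models the hardware accessed bit */
};

/*
 * Map a page. Active pages get a pte that is already young, so the CPU
 * never pays for setting the accessed bit during a page walk. Inactive
 * pages start old (assumption of this sketch), so a later access is
 * visible to reclaim.
 */
static struct toy_pte toy_map_page(const struct toy_page *page)
{
    struct toy_pte pte = { .present = true, .young = page->active };
    return pte;
}

/* Models the CPU touching the page through this pte. */
static void toy_access(struct toy_pte *pte)
{
    assert(pte->present);
    pte->young = true;
}

/*
 * Move a page from the active to the inactive list and clear young,
 * so that only accesses made after deactivation count.
 */
static void toy_deactivate(struct toy_page *page, struct toy_pte *pte)
{
    page->active = false;
    pte->young = false;
}

/* Reclaim ignores the young bit on active pages entirely. */
static bool toy_should_reclaim(const struct toy_page *page,
                               const struct toy_pte *pte)
{
    if (page->active)
        return false;
    return !pte->young;
}
```

In this model an active page is never a reclaim candidate, a freshly
deactivated page is one until it is touched again, and the accessed-bit
cost is only paid for the smaller set of inactive pages.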
--=-UBw4qoouh4jYvCDkFyGk Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJXYCvBAAoJEM553pKExN6D940IALwnx4irFQwgtQNxQs17Ixje qrkka5ej4EXVG5A0PmNO68NNDhqTOekhiOx0yGtF89xkXKFhULml8GkPMfWegb7V BqyHyyS4u0mW/SAgdXDn+TV6nLjcAp1L93K4a5SpefckYXzFlfbXOdpP+1fvNA4I iNidqtLwYTw6bK138l3a9hmBf15gNWNQfe0Oo7Qt6HTvcg9Eb9bUoaF0x5JaIwv8 FSp4ayUOZIgLktsmZIddLRSKUvkW44ndB7OB5VdL50h2+O38gA7CCz6qRj45ONBF /UxB6VlVE2yWA4b4FffngWwBkZdo2QFS9hgMM7bDE4Xk4UY0BhkBCXraHENb9nk= =RAk8 -----END PGP SIGNATURE----- --=-UBw4qoouh4jYvCDkFyGk--