From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751539AbcFNI0W (ORCPT ); Tue, 14 Jun 2016 04:26:22 -0400
Received: from mail-lf0-f67.google.com ([209.85.215.67]:34697 "EHLO
	mail-lf0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751186AbcFNI0S (ORCPT ); Tue, 14 Jun 2016 04:26:18 -0400
Date: Tue, 14 Jun 2016 11:26:13 +0300
From: "Kirill A. Shutemov"
To: Linus Torvalds , Rik van Riel , Mel Gorman
Cc: "Kirill A. Shutemov" , "Huang, Ying" , Michal Hocko , LKML ,
	Michal Hocko , Minchan Kim , Vinayak Menon , Andrew Morton , LKP ,
	Dave Hansen , Vladimir Davydov
Subject: Re: [LKP] [lkp] [mm] 5c0a85fad9: unixbench.score -6.3% regression
Message-ID: <20160614082613.GA1066@node.shutemov.name>
References: <20160606022724.GA26227@yexl-desktop>
	<20160606095136.GA79951@black.fi.intel.com>
	<87a8iw5enf.fsf@yhuang-dev.intel.com>
	<8760tk5aym.fsf@yhuang-dev.intel.com>
	<20160608085811.GB12655@black.fi.intel.com>
	<87porn44fm.fsf@yhuang-dev.intel.com>
	<20160613125248.GA30109@black.fi.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 13, 2016 at 11:11:05PM -0700, Linus Torvalds wrote:
> On Mon, Jun 13, 2016 at 5:52 AM, Kirill A. Shutemov wrote:
> > On Sat, Jun 11, 2016 at 06:02:57PM -0700, Linus Torvalds wrote:
> >>
> >> I've timed it at over a thousand cycles on at least some CPU's, but
> >> that's still peanuts compared to a real page fault. It shouldn't be
> >> *that* noticeable, ie no way it's a 6% regression on its own.
> >
> > Looks like setting accessed bit is the problem.
>
> Ok. I've definitely seen it as an issue, but never to the point of
> several percent on a real benchmark that wasn't explicitly testing
> that cost.
>
> I reported the excessive dirty/accessed bit cost to Intel back in the
> P4 days, but it's apparently not been high enough for anybody to care.
>
> > We spend 36% more time in page walk only, about 1% of total userspace time.
> > Combining this with page walk footprint on caches, I guess we can get to
> > this 3.5% score difference I see.
> >
> > I'm not sure if there's anything we can do to solve the issue without
> > screwing reclaim logic again. :(
>
> I think we should say "screw the reclaim logic" for now, and revert
> commit 5c0a85fad949 for now.

Okay. I'll prepare the patch.

> Considering how much trouble the accessed bit is on some other
> architectures too, I wonder if we should strive to simply not care
> about it, and always leaving it set. And then rely entirely on just
> unmapping the pages and making the "we took a page fault after
> unmapping" be the real activity tester.
>
> So get rid of the "if the page is young, mark it old but leave it in
> the page tables" logic entirely. When we unmap a page, it will always
> either be in the swap cache or the page cache anyway, so faulting it
> in again should be just a minor fault with no actual IO happening.
>
> That might be less of an impact in the end - yes, the unmap and
> re-fault is much more expensive, but it presumably happens to much
> fewer pages.
>
> What do you think?

Well, we cannot do this for anonymous memory. No swap -- no swap cache, if
I read the code correctly.

I guess it's doable for file mappings, although I would expect regressions
in other benchmarks. IIUC, it would require page unmapping to propagate
the page to the active list, which is suboptimal.

And the implications for page_idle are not clear to me.

Rik, Mel, any comments?

-- 
 Kirill A. Shutemov