From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Kirill A. Shutemov" Date: Thu, 16 Aug 2012 16:43:56 +0000 Subject: Re: [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address Message-Id: <20120816164356.GA30106@shutemov.name> List-Id: References: <1345130154-9602-1-git-send-email-kirill.shutemov@linux.intel.com> <1345130154-9602-7-git-send-email-kirill.shutemov@linux.intel.com> <20120816161647.GM11188@redhat.com> In-Reply-To: <20120816161647.GM11188@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Andrea Arcangeli Cc: "Kirill A. Shutemov" , linux-mm@kvack.org, Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, Andi Kleen , Tim Chen , Alex Shi , Jan Beulich , Robert Richter , Andy Lutomirski , Andrew Morton , Johannes Weiner , Hugh Dickins , KAMEZAWA Hiroyuki , Mel Gorman , linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-mips@linux-mips.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org On Thu, Aug 16, 2012 at 06:16:47PM +0200, Andrea Arcangeli wrote: > Hi Kirill, > > On Thu, Aug 16, 2012 at 06:15:53PM +0300, Kirill A. Shutemov wrote: > > for (i = 0; i < pages_per_huge_page; > > i++, p = mem_map_next(p, page, i)) { > > It may be more optimal to avoid a multiplication/shiftleft before the > add, and to do: > > for (i = 0, vaddr = haddr; i < pages_per_huge_page; > i++, p = mem_map_next(p, page, i), vaddr += PAGE_SIZE) { > Makes sense. I'll update it. > > cond_resched(); > > - clear_user_highpage(p, addr + i * PAGE_SIZE); > > + vaddr = haddr + i*PAGE_SIZE; > > Not sure if gcc can optimize it away because of the external calls. > > > + if (!ARCH_HAS_USER_NOCACHE || i = target) > > + clear_user_highpage(page + i, vaddr); > > + else > > + clear_user_highpage_nocache(page + i, vaddr); > > } > > > My only worry overall is if there can be some workload where this may > actually slow down userland if the CPU cache is very large and > userland would access most of the faulted in memory after the first > fault. > > So I wouldn't mind to add one more check in addition of > !ARCH_HAS_USER_NOCACHE above to check a runtime sysctl variable. It'll > waste a cacheline yes but I doubt it's measurable compared to the time > it takes to do a >=2M hugepage copy. Hm.. I think with static_key we can avoid cache overhead here. I'll try. > Furthermore it would allow people to benchmark its effect without > having to rebuild the kernel themself. > > All other patches looks fine to me. Thanks, for review. Could you take a look at huge zero page patchset? ;) -- Kirill A. Shutemov