From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S933035AbcBPPqy (ORCPT ); Tue, 16 Feb 2016 10:46:54 -0500
Received: from mga03.intel.com ([134.134.136.65]:24513 "EHLO mga03.intel.com"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S932300AbcBPPqv (ORCPT ); Tue, 16 Feb 2016 10:46:51 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.22,455,1449561600"; d="scan'208";a="886094758"
Subject: Re: [PATCHv2 17/28] thp: skip file huge pmd on copy_huge_pmd()
To: "Kirill A. Shutemov"
References: <1455200516-132137-1-git-send-email-kirill.shutemov@linux.intel.com>
    <1455200516-132137-18-git-send-email-kirill.shutemov@linux.intel.com>
    <56BE2781.7060808@intel.com>
    <20160216101450.GE46557@black.fi.intel.com>
Cc: Hugh Dickins , Andrea Arcangeli , Andrew Morton , Vlastimil Babka ,
    Christoph Lameter , Naoya Horiguchi , Jerome Marchand , Yang Shi ,
    Sasha Levin , linux-kernel@vger.kernel.org, linux-mm@kvack.org
From: Dave Hansen
Message-ID: <56C3445D.3040305@intel.com>
Date: Tue, 16 Feb 2016 07:46:37 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1
MIME-Version: 1.0
In-Reply-To: <20160216101450.GE46557@black.fi.intel.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/16/2016 02:14 AM, Kirill A. Shutemov wrote:
> On Fri, Feb 12, 2016 at 10:42:09AM -0800, Dave Hansen wrote:
>> On 02/11/2016 06:21 AM, Kirill A. Shutemov wrote:
>>> File pmds can be safely skip on copy_huge_pmd(), we can re-fault them
>>> later. COW for file mappings handled on pte level.
>>
>> Is this different from 4k pages? I figured we might skip copying
>> file-backed ptes on fork, but I couldn't find the code.
>
> Currently, we only filter out on per-VMA basis. See first comment in
> copy_page_range().
>
> Here we handle PMD mapped file pages in COW mapping. File THP can be
> mapped into COW mapping as result of read page fault.

OK... So, copy_page_range() has a check for "Don't copy ptes where a
page fault will fill them correctly." Seems sane enough, but the check
is implemented using a check for the VMA having !vma->anon_vma, which is
a head-scratcher for a moment. Why does that apply to huge tmpfs?

Ahh, MAP_PRIVATE. MAP_PRIVATE vmas have ->anon_vma because they have
essentially-anonymous pages for when they do a COW, so they don't hit
that check and they go through the copy_*() functions, including
copy_huge_pmd().

We don't handle 2M COW operations yet so we simply decline to copy these
pages. Might cost us page faults down the road, but it makes things
easier to implement for now.

Did I get that right?

Any chance we could get a bit of that into the patch descriptions so
that the next hapless reviewer can spend their time looking at your code
instead of relearning the fork() handling for MAP_PRIVATE?
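
For the record, here is roughly how I read the decision at fork time,
written as a userspace toy model rather than kernel code. Every name
below (toy_vma, toy_pmd_kind, toy_fork_huge_pmd, and so on) is made up
for illustration; the only things taken from this thread are the
!vma->anon_vma test and the "skip file huge pmds" behavior.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in; NOT the kernel's struct vm_area_struct. */
struct toy_vma {
    const void *anon_vma;   /* non-NULL when the VMA can hold private COW pages */
};

/* What backs the huge pmd we found while walking the parent. */
enum toy_pmd_kind { TOY_PMD_ANON, TOY_PMD_FILE };

enum toy_fork_action {
    TOY_SKIP_VMA,   /* copy nothing for this VMA; page faults refill it */
    TOY_SKIP_PMD,   /* VMA is walked, but this huge pmd is not copied */
    TOY_COPY_PMD    /* huge pmd is copied into the child */
};

/* VMA-level filter, modelled on the "!vma->anon_vma" check. */
bool toy_vma_needs_copy(const struct toy_vma *vma)
{
    return vma->anon_vma != NULL;
}

/* Per-pmd decision for a huge pmd in a VMA being duplicated by fork(). */
enum toy_fork_action toy_fork_huge_pmd(const struct toy_vma *vma,
                                       enum toy_pmd_kind kind)
{
    if (!toy_vma_needs_copy(vma))
        return TOY_SKIP_VMA;
    if (kind == TOY_PMD_FILE)
        return TOY_SKIP_PMD;   /* no 2M COW yet, so re-fault later */
    return TOY_COPY_PMD;
}
```

In this model, a MAP_PRIVATE VMA with ->anon_vma set and a file huge pmd
gives TOY_SKIP_PMD, which is the case this patch adds. A VMA without
->anon_vma never gets as far as the per-pmd decision.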