From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751822AbbJTByZ (ORCPT ); Mon, 19 Oct 2015 21:54:25 -0400
Received: from aserp1040.oracle.com ([141.146.126.69]:36109 "EHLO
        aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751195AbbJTByY (ORCPT );
        Mon, 19 Oct 2015 21:54:24 -0400
Subject: Re: [PATCH 0/3] hugetlbfs fallocate hole punch race with page faults
To: Andrew Morton
References: <1445033310-13155-1-git-send-email-mike.kravetz@oracle.com>
 <20151019161840.63e6afaa73aceec23e351905@linux-foundation.org>
From: Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dave Hansen,
 Naoya Horiguchi, Hugh Dickins, Davidlohr Bueso
Message-ID: <56259EC4.9010207@oracle.com>
Date: Mon, 19 Oct 2015 18:54:12 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <20151019161840.63e6afaa73aceec23e351905@linux-foundation.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Source-IP: userv0022.oracle.com [156.151.31.74]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/19/2015 04:18 PM, Andrew Morton wrote:
> On Fri, 16 Oct 2015 15:08:27 -0700 Mike Kravetz wrote:
>
>> The hugetlbfs fallocate hole punch code can race with page faults. The
>> result is that after a hole punch operation, pages may remain within the
>> hole. No other side effects of this race were observed.
>>
>> In preparation for adding userfaultfd support to hugetlbfs, it is desirable
>> to plug or significantly shrink this hole. This patch set uses the same
>> mechanism employed in shmem (see commit f00cdc6df7).
>>
>
> "still buggy but not as bad as before" isn't what we strive for ;) What
> would it take to fix this for real? An exhaustive description of the
> bug would be a good starting point, thanks.

Thanks for asking, it made me look closer at ways to resolve this.

The current code in remove_inode_hugepages() does nothing with a page if
it is still mapped. The only way it can be mapped is if we race and take
a page fault after unmapping, but before the page is removed. This patch
set makes that window much smaller, but it still exists.

Instead of "giving up" on a mapped page, remove_inode_hugepages() can go
back and unmap it. I'll code this up tomorrow. Fortunately, it is pretty
easy to hit these races and verify proper behavior. I'll create a new
patch set with this combined code for a complete fix.

-- 
Mike Kravetz
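
The race and the proposed change can be illustrated with a minimal, single-threaded toy model in Python. This is only a sketch, not kernel code: the classes `HugePage` and `Inode` and all function names are hypothetical and stand in for the page cache, the unmap step, and the two removal behaviours discussed in this mail. The "racing" fault is injected deterministically between the unmap and the removal, and the model ignores the locking a real fix would need.

```python
# Toy model only: every name here is hypothetical and does not correspond
# to a real kernel structure or function.
from dataclasses import dataclass, field


@dataclass
class HugePage:
    index: int
    mapped: bool = False  # True if the page is currently mapped


@dataclass
class Inode:
    # Page-cache model: page index -> HugePage
    pages: dict = field(default_factory=dict)

    def fault_in(self, index):
        """Model a page fault: allocate the page if needed and map it."""
        page = self.pages.setdefault(index, HugePage(index))
        page.mapped = True
        return page


def unmap_range(inode, start, end):
    """Model the unmap step that precedes page removal."""
    for index in range(start, end):
        page = inode.pages.get(index)
        if page is not None:
            page.mapped = False


def remove_pages_skip_mapped(inode, start, end):
    """Behaviour described in the mail: a still-mapped page is left alone."""
    for index in range(start, end):
        page = inode.pages.get(index)
        if page is not None and not page.mapped:
            del inode.pages[index]


def remove_pages_unmap_and_retry(inode, start, end):
    """Proposed behaviour: go back and unmap a mapped page, then remove it."""
    for index in range(start, end):
        page = inode.pages.get(index)
        if page is None:
            continue
        if page.mapped:
            page.mapped = False  # stands in for unmapping this one page
        del inode.pages[index]


def punch_hole(inode, start, end, remove, racing_fault_index=None):
    """Unmap [start, end), optionally let a fault slip in, then remove pages."""
    unmap_range(inode, start, end)
    if racing_fault_index is not None:
        inode.fault_in(racing_fault_index)  # fault lands inside the window
    remove(inode, start, end)
    return sorted(inode.pages)  # indices still present in the file
```

With four pages faulted in and a hole punched over indices 1 and 2 while a fault on index 2 slips into the window, the skip-mapped variant leaves page 2 behind inside the hole, while the unmap-and-retry variant leaves only pages 0 and 3.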