From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757242AbcFADJy (ORCPT ); Tue, 31 May 2016 23:09:54 -0400 Received: from out4133-66.mail.aliyun.com ([42.120.133.66]:37633 "EHLO out4133-66.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750927AbcFADJx (ORCPT ); Tue, 31 May 2016 23:09:53 -0400 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R181e4;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e02c03303;MF=hillf.zj@alibaba-inc.com;NM=1;PH=DS;RN=9;SR=0;TI=SMTPD_----4sJAmUp_1464750563; Reply-To: "Hillf Danton" From: "Hillf Danton" To: "'Mike Kravetz'" , , Cc: "'Dave Hansen'" , "'Kirill Shutemov'" , "'Michal Hocko'" , "'Naoya Horiguchi'" , "'Aneesh Kumar'" , "'Andrew Morton'" References: <1464720957-15698-1-git-send-email-mike.kravetz@oracle.com> In-Reply-To: <1464720957-15698-1-git-send-email-mike.kravetz@oracle.com> Subject: Re: [PATCH] mm/hugetlb: fix huge page reserve accounting for private mappings Date: Wed, 01 Jun 2016 11:09:23 +0800 Message-ID: <00a801d1bbb3$00980d40$01c827c0$@alibaba-inc.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQFk43r3iyY8Jw3BQpb3Ss01GOZZYaCtnmsQ Content-Language: zh-cn Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > > When creating a private mapping of a hugetlbfs file, it is possible to > unmap pages via ftruncate or fallocate hole punch. If subsequent faults > repopulate these mappings, the reserve counts will go negative. This > is because the code currently assumes all faults to private mappings will > consume reserves. The problem can be recreated as follows: > - mmap(MAP_PRIVATE) a file in hugetlbfs filesystem > - write fault in pages in the mapping > - fallocate(FALLOC_FL_PUNCH_HOLE) some pages in the mapping > - write fault in pages in the hole > This will result in negative huge page reserve counts and negative subpool > usage counts for the hugetlbfs. Note that this can also be recreated with > ftruncate, but fallocate is more straight forward. > > This patch modifies the routines vma_needs_reserves and vma_has_reserves > to examine the reserve map associated with private mappings similar to that > for shared mappings. However, the reserve map semantics for private and > shared mappings are very different. This results in subtly different code > that is explained in the comments. > > Signed-off-by: Mike Kravetz > --- Acked-by: Hillf Danton > mm/hugetlb.c | 42 ++++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 40 insertions(+), 2 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 949d806..0949d0d 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -831,8 +831,27 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg) > * Only the process that called mmap() has reserves for > * private mappings. > */ > - if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) > - return true; > + if (is_vma_resv_set(vma, HPAGE_RESV_OWNER)) { > + /* > + * Like the shared case above, a hole punch or truncate > + * could have been performed on the private mapping. > + * Examine the value of chg to determine if reserves > + * actually exist or were previously consumed. > + * Very Subtle - The value of chg comes from a previous > + * call to vma_needs_reserves(). The reserve map for > + * private mappings has different (opposite) semantics > + * than that of shared mappings. vma_needs_reserves() > + * has already taken this difference in semantics into > + * account. Therefore, the meaning of chg is the same > + * as in the shared case above. Code could easily be > + * combined, but keeping it separate draws attention to > + * subtle differences. > + */ > + if (chg) > + return false; > + else > + return true; > + } > > return false; > } > @@ -1815,6 +1834,25 @@ static long __vma_reservation_common(struct hstate *h, > > if (vma->vm_flags & VM_MAYSHARE) > return ret; > + else if (is_vma_resv_set(vma, HPAGE_RESV_OWNER) && ret >= 0) { > + /* > + * In most cases, reserves always exist for private mappings. > + * However, a file associated with mapping could have been > + * hole punched or truncated after reserves were consumed. > + * As subsequent fault on such a range will not use reserves. > + * Subtle - The reserve map for private mappings has the > + * opposite meaning than that of shared mappings. If NO > + * entry is in the reserve map, it means a reservation exists. > + * If an entry exists in the reserve map, it means the > + * reservation has already been consumed. As a result, the > + * return value of this routine is the opposite of the > + * value returned from reserve map manipulation routines above. > + */ > + if (ret) > + return 0; > + else > + return 1; > + } > else > return ret < 0 ? ret : 0; > } > -- > 2.4.11