From: Gerald Schaefer
Date: Fri, 23 Sep 2016 12:36:22 +0200
To: Dave Hansen
Cc: Andrew Morton, Michal Hocko, Naoya Horiguchi, Hillf Danton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A . Shutemov", Vlastimil Babka, Mike Kravetz, "Aneesh Kumar K . V", Martin Schwidefsky, Heiko Carstens, Rui Teng
Subject: Re: [PATCH v3] mm/hugetlb: fix memory offline with hugepage size > memory block size
In-Reply-To: <57E41EF6.1010903@linux.intel.com>
Message-Id: <20160923123622.00289d21@thinkpad>

On Thu, 22 Sep 2016 11:12:06 -0700
Dave Hansen wrote:

> On 09/22/2016 09:29 AM, Gerald Schaefer wrote:
> >  static void dissolve_free_huge_page(struct page *page)
> >  {
> > +	struct page *head = compound_head(page);
> > +	struct hstate *h = page_hstate(head);
> > +	int nid = page_to_nid(head);
> > +
> >  	spin_lock(&hugetlb_lock);
> > -	if (PageHuge(page) && !page_count(page)) {
> > -		struct hstate *h = page_hstate(page);
> > -		int nid = page_to_nid(page);
> > -		list_del(&page->lru);
> > -		h->free_huge_pages--;
> > -		h->free_huge_pages_node[nid]--;
> > -		h->max_huge_pages--;
> > -		update_and_free_page(h, page);
> > -	}
> > +	list_del(&head->lru);
> > +	h->free_huge_pages--;
> > +	h->free_huge_pages_node[nid]--;
> > +	h->max_huge_pages--;
> > +	update_and_free_page(h, head);
> >  	spin_unlock(&hugetlb_lock);
> >  }
>
> Do you need to revalidate anything once you acquire the lock?  Can this,
> for instance, race with another thread doing vm.nr_hugepages=0?  Or a
> thread faulting in and allocating the large page that's being dissolved?
>

Yes, good point. I was relying on the range being isolated, but that only
seems to be checked in dequeue_huge_page_node(), as introduced with the
original commit. So this would only protect against anyone allocating the
hugepage at this point. This is also somewhat expected, since we are
already beyond the "point of no return" in offline_pages().

vm.nr_hugepages=0 seems to be an issue though, as set_max_huge_pages()
does not care about isolation, so I guess we could have a race here
and double-free the hugepage. Revalidating at least PageHuge() after
taking the lock should protect against that. I am not sure about
page_count(), but I think I'll just check both, which gives the same
behaviour as before.

Will send v4, after thinking a bit more about the page reservation point
brought up by Mike.
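
For illustration only, the revalidate-under-lock idea discussed above can be modelled in plain userspace C. This is a rough sketch, not the v4 patch and not kernel code: `model_page`, `model_hstate` and `model_dissolve_free_huge_page()` are hypothetical stand-ins, the `is_huge` flag and `refcount` field merely imitate PageHuge() and page_count(), and a pthread mutex plays the role of hugetlb_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not a kernel type. */
struct model_page {
	bool is_huge;	/* imitates PageHuge(page) */
	int refcount;	/* imitates page_count(page) */
};

/* Hypothetical stand-in for struct hstate plus hugetlb_lock. */
struct model_hstate {
	pthread_mutex_t lock;
	long free_huge_pages;
	long max_huge_pages;
};

/*
 * Dissolve a free huge page, but only if it is still a free huge page
 * once the lock is held. Returns true if this call dissolved the page,
 * false if another path (e.g. shrinking the pool) got there first or
 * the page is in use.
 */
static bool model_dissolve_free_huge_page(struct model_hstate *h,
					  struct model_page *page)
{
	bool dissolved = false;

	pthread_mutex_lock(&h->lock);
	/* Revalidate under the lock: state may have changed before we got it. */
	if (page->is_huge && page->refcount == 0) {
		/* Sanity check of the model's own accounting. */
		assert(h->free_huge_pages > 0);
		h->free_huge_pages--;
		h->max_huge_pages--;
		page->is_huge = false;	/* imitates update_and_free_page() */
		dissolved = true;
	}
	pthread_mutex_unlock(&h->lock);

	return dissolved;
}
```

In this model, two callers targeting the same page are serialized by the lock, and the second one sees `is_huge == false` and backs off instead of decrementing the counters a second time. That is the double-free protection the check is meant to provide.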