Date: Wed, 21 Sep 2016 14:35:34 +0200
From: Gerald Schaefer
To: Andrew Morton, Naoya Horiguchi
Cc: Hillf Danton, Michal Hocko, "Kirill A . Shutemov", Vlastimil Babka,
 Mike Kravetz, "Aneesh Kumar K . V", Martin Schwidefsky, Heiko Carstens,
 Dave Hansen, Rui Teng, Gerald Schaefer
Subject: [PATCH v2 1/1] mm/hugetlb: fix memory offline with hugepage size >
 memory block size
In-Reply-To: <05d701d213d1$7fb70880$7f251980$@alibaba-inc.com>
References: <20160920155354.54403-1-gerald.schaefer@de.ibm.com>
 <20160920155354.54403-2-gerald.schaefer@de.ibm.com>
 <05d701d213d1$7fb70880$7f251980$@alibaba-inc.com>
Organization: IBM Deutschland Research & Development GmbH /
 Vorsitzende des Aufsichtsrats: Martina Koederitz /
 Geschaeftsfuehrung: Dirk Wittkopp / Sitz der Gesellschaft: Boeblingen /
 Registergericht: Amtsgericht Stuttgart, HRB 243294
Message-Id: <20160921143534.0dd95fe7@thinkpad>
X-Mailing-List: linux-kernel@vger.kernel.org

dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
list corruption and addressing exception when trying to set a memory
block offline that is part (but not the first part) of a hugetlb page
with a size > memory block size.

When no other smaller hugetlb page sizes are present, the VM_BUG_ON()
will trigger directly. In the other case we will run into an addressing
exception later, because dissolve_free_huge_page() does not operate on
the head page of the compound hugetlb page, which will result in a NULL
hstate from page_hstate().

To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page().

Also change locking in dissolve_free_huge_page(), so that it only takes
the lock when actually removing a hugepage.

Signed-off-by: Gerald Schaefer
---
Changes in v2:
- Update comment in dissolve_free_huge_pages()
- Change locking in dissolve_free_huge_page()

 mm/hugetlb.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 87e11d8..1522af8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1441,23 +1441,30 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
  */
 static void dissolve_free_huge_page(struct page *page)
 {
+	struct page *head = compound_head(page);
+	struct hstate *h;
+	int nid;
+
+	if (page_count(head))
+		return;
+
+	h = page_hstate(head);
+	nid = page_to_nid(head);
+
 	spin_lock(&hugetlb_lock);
-	if (PageHuge(page) && !page_count(page)) {
-		struct hstate *h = page_hstate(page);
-		int nid = page_to_nid(page);
-		list_del(&page->lru);
-		h->free_huge_pages--;
-		h->free_huge_pages_node[nid]--;
-		h->max_huge_pages--;
-		update_and_free_page(h, page);
-	}
+	list_del(&head->lru);
+	h->free_huge_pages--;
+	h->free_huge_pages_node[nid]--;
+	h->max_huge_pages--;
+	update_and_free_page(h, head);
 	spin_unlock(&hugetlb_lock);
 }
 
 /*
  * Dissolve free hugepages in a given pfn range. Used by memory hotplug to
  * make specified memory blocks removable from the system.
- * Note that start_pfn should aligned with (minimum) hugepage size.
+ * Note that this will dissolve a free gigantic hugepage completely, if any
+ * part of it lies within the given range.
  */
 void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 {
@@ -1466,9 +1473,9 @@ void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
 	if (!hugepages_supported())
 		return;
 
-	VM_BUG_ON(!IS_ALIGNED(start_pfn, 1 << minimum_order));
 	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order)
-		dissolve_free_huge_page(pfn_to_page(pfn));
+		if (PageHuge(pfn_to_page(pfn)))
+			dissolve_free_huge_page(pfn_to_page(pfn));
 }
 
 /*
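
For readers who want to follow the pfn walk without a kernel tree, the
behaviour described in the changelog can be modelled in userspace. The
following is only a rough sketch under stated assumptions, not kernel
code: every toy_* name is hypothetical, a flat array stands in for
struct page, the "step" argument stands in for 1 << minimum_order, and
locking is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 16

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct toy_page {
	bool huge;	/* page belongs to a (toy) hugetlb page */
	size_t head;	/* index of the compound head page */
	size_t nr;	/* pages in the compound page, valid on the head */
	int refcount;	/* use count, only meaningful on the head */
};

static struct toy_page toy_mem[TOY_NR_PAGES];
static int toy_free_huge_pages;

/* Register a free toy hugepage covering [start, start + nr_pages). */
static void toy_add_free_hugepage(size_t start, size_t nr_pages)
{
	assert(start + nr_pages <= TOY_NR_PAGES);
	for (size_t i = start; i < start + nr_pages; i++) {
		toy_mem[i].huge = true;
		toy_mem[i].head = start;
		toy_mem[i].refcount = 0;
	}
	toy_mem[start].nr = nr_pages;
	toy_free_huge_pages++;
}

/*
 * Same idea as the patched dissolve_free_huge_page(): resolve the head
 * first, bail out if the page is in use, then release the whole
 * compound page even if pfn pointed at a tail page.
 */
static void toy_dissolve_free_huge_page(size_t pfn)
{
	size_t head = toy_mem[pfn].head;
	size_t nr = toy_mem[head].nr;

	if (toy_mem[head].refcount)
		return;

	for (size_t i = head; i < head + nr; i++)
		toy_mem[i] = (struct toy_page){0};
	toy_free_huge_pages--;
}

/*
 * Same idea as the patched dissolve_free_huge_pages(): no alignment
 * assertion on start, and non-huge pfns are skipped.
 */
static void toy_dissolve_free_huge_pages(size_t start, size_t end, size_t step)
{
	for (size_t pfn = start; pfn < end; pfn += step)
		if (toy_mem[pfn].huge)
			toy_dissolve_free_huge_page(pfn);
}
```

In this model, take a free 8-page "gigantic" page at pfn 0 and a free
2-page hugepage at pfn 8, then call toy_dissolve_free_huge_pages(4, 8, 2)
to offline the second half of the large page. The walk hits pfn 4,
resolves it to head 0 and dissolves all of pfns 0..7; pfn 6 is then no
longer marked huge and is skipped, and the small hugepage at pfn 8 is
left alone.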