Date: Mon, 18 Oct 2021 09:58:52 +0200
From: Oscar Salvador <osalvador@suse.de>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	David Hildenbrand, Michal Hocko, Zi Yan, Muchun Song,
	Naoya Horiguchi, David Rientjes, "Aneesh Kumar K.V",
	Nghia Le, Andrew Morton
Subject: Re: [PATCH v4 4/5] hugetlb: add demote bool to gigantic page routines
Message-ID: <20211018075851.GB11960@linux>
References: <20211007181918.136982-1-mike.kravetz@oracle.com>
 <20211007181918.136982-5-mike.kravetz@oracle.com>
In-Reply-To: <20211007181918.136982-5-mike.kravetz@oracle.com>

On Thu, Oct 07, 2021 at 11:19:17AM -0700, Mike Kravetz wrote:
> The routines remove_hugetlb_page and destroy_compound_gigantic_page
> will remove a gigantic page and make the set of base pages ready to be
> returned to a lower level allocator.  In the process of doing this,
> they make all base pages reference counted.
> 
> The routine prep_compound_gigantic_page creates a gigantic page from a
> set of base pages.  It assumes that all these base pages are reference
> counted.
> 
> During demotion, a gigantic page will be split into huge pages of a
> smaller size.  This logically involves use of the routines,
> remove_hugetlb_page, and destroy_compound_gigantic_page followed by
> prep_compound*_page for each smaller huge page.
> 
> When pages are reference counted (ref count >= 0), additional
> speculative ref counts could be taken.  This could result in errors

It would be great to learn about those cases involving speculative ref
counts.

> while demoting a huge page.  Quite a bit of code would need to be
> created to handle all possible issues.
> 
> Instead of dealing with the possibility of speculative ref counts, avoid
> the possibility by keeping ref counts at zero during the demote process.
> Add a boolean 'demote' to the routines remove_hugetlb_page,
> destroy_compound_gigantic_page and prep_compound_gigantic_page.  If the
> boolean is set, the remove and destroy routines will not reference count
> pages and the prep routine will not expect reference counted pages.
> 
> '*_for_demote' wrappers of the routines will be added in a subsequent
> patch where this functionality is used.
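To expand on my question above: the scenario I picture is the usual
speculative-pinning pattern, roughly sketched below. This is only an
illustration I put together, not code from this series;
get_page_unless_zero() and put_page() are the real helpers, while
sketch_speculative_ref() is a made-up name.

#include <linux/mm.h>

/*
 * Illustration only (not from this series): the pattern used by
 * speculative walkers such as pfn scanners.  A reference is taken
 * only if the refcount is already non-zero, so a page parked at
 * refcount zero during demote cannot gain unexpected references.
 */
static bool sketch_speculative_ref(struct page *page)
{
	if (!get_page_unless_zero(page))
		return false;	/* refcount was zero: no ref taken */
	/* ... examine the page under the temporary reference ... */
	put_page(page);
	return true;
}

Since get_page_unless_zero() refuses to take a reference on a zero
refcount, keeping the pages at zero for the whole demote sequence makes
such walkers fail cleanly instead of racing with the split.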
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>

> ---
>  mm/hugetlb.c | 54 +++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 43 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 563338f4dbc4..794e0c4c1b3c 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1271,8 +1271,8 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
>  		nr_nodes--)
>  
>  #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> -static void destroy_compound_gigantic_page(struct page *page,
> -					unsigned int order)
> +static void __destroy_compound_gigantic_page(struct page *page,
> +					unsigned int order, bool demote)
>  {
>  	int i;
>  	int nr_pages = 1 << order;
> @@ -1284,7 +1284,8 @@ static void destroy_compound_gigantic_page(struct page *page,
>  	for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) {
>  		p->mapping = NULL;
>  		clear_compound_head(p);
> -		set_page_refcounted(p);
> +		if (!demote)
> +			set_page_refcounted(p);
>  	}
>  
>  	set_compound_order(page, 0);
> @@ -1292,6 +1293,12 @@ static void destroy_compound_gigantic_page(struct page *page,
>  	__ClearPageHead(page);
>  }
>  
> +static void destroy_compound_gigantic_page(struct page *page,
> +					unsigned int order)
> +{
> +	__destroy_compound_gigantic_page(page, order, false);
> +}
> +
>  static void free_gigantic_page(struct page *page, unsigned int order)
>  {
>  	/*
> @@ -1364,12 +1371,15 @@ static inline void destroy_compound_gigantic_page(struct page *page,
>  
>  /*
>   * Remove hugetlb page from lists, and update dtor so that page appears
> - * as just a compound page.  A reference is held on the page.
> + * as just a compound page.
> + *
> + * A reference is held on the page, except in the case of demote.
>   *
>   * Must be called with hugetlb lock held.
>   */
> -static void remove_hugetlb_page(struct hstate *h, struct page *page,
> -							bool adjust_surplus)
> +static void __remove_hugetlb_page(struct hstate *h, struct page *page,
> +							bool adjust_surplus,
> +							bool demote)
>  {
>  	int nid = page_to_nid(page);
>  
> @@ -1407,8 +1417,12 @@ static void remove_hugetlb_page(struct hstate *h, struct page *page,
>  	 *
>  	 * This handles the case where more than one ref is held when and
>  	 * after update_and_free_page is called.
> +	 *
> +	 * In the case of demote we do not ref count the page as it will soon
> +	 * be turned into a page of smaller size.
>  	 */
> -	set_page_refcounted(page);
> +	if (!demote)
> +		set_page_refcounted(page);
>  	if (hstate_is_gigantic(h))
>  		set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
>  	else
> @@ -1418,6 +1432,12 @@ static void remove_hugetlb_page(struct hstate *h, struct page *page,
>  	h->nr_huge_pages_node[nid]--;
>  }
>  
> +static void remove_hugetlb_page(struct hstate *h, struct page *page,
> +							bool adjust_surplus)
> +{
> +	__remove_hugetlb_page(h, page, adjust_surplus, false);
> +}
> +
>  static void add_hugetlb_page(struct hstate *h, struct page *page,
>  						bool adjust_surplus)
>  {
> @@ -1681,7 +1701,8 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  	spin_unlock_irq(&hugetlb_lock);
>  }
>  
> -static bool prep_compound_gigantic_page(struct page *page, unsigned int order)
> +static bool __prep_compound_gigantic_page(struct page *page, unsigned int order,
> +								bool demote)
>  {
>  	int i, j;
>  	int nr_pages = 1 << order;
> @@ -1719,10 +1740,16 @@ static bool prep_compound_gigantic_page(struct page *page, unsigned int order)
>  		 * the set of pages can not be converted to a gigantic page.
>  		 * The caller who allocated the pages should then discard the
>  		 * pages using the appropriate free interface.
> +		 *
> +		 * In the case of demote, the ref count will be zero.
>  		 */
> -		if (!page_ref_freeze(p, 1)) {
> -			pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
> -			goto out_error;
> +		if (!demote) {
> +			if (!page_ref_freeze(p, 1)) {
> +				pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
> +				goto out_error;
> +			}
> +		} else {
> +			VM_BUG_ON_PAGE(page_count(p), p);
>  		}
>  		set_page_count(p, 0);
>  		set_compound_head(p, page);
> @@ -1747,6 +1774,11 @@ static bool prep_compound_gigantic_page(struct page *page, unsigned int order)
>  	return false;
>  }
>  
> +static bool prep_compound_gigantic_page(struct page *page, unsigned int order)
> +{
> +	return __prep_compound_gigantic_page(page, order, false);
> +}
> +
>  /*
>   * PageHuge() only returns true for hugetlbfs pages, but not for normal or
>   * transparent huge pages.  See the PageTransHuge() documentation for more
> -- 
> 2.31.1
> 
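One aside for readers following the non-demote path: the freeze above
relies on page_ref_freeze(). A simplified sketch of its semantics is
below (the real helper lives in include/linux/page_ref.h; the sketch_
prefix marks this as my rendering, not kernel code):

#include <linux/page_ref.h>

/*
 * Simplified sketch of page_ref_freeze(): atomically replace the
 * refcount with 0, but only if it still equals @count.  A concurrent
 * speculative reference makes the cmpxchg observe a different value,
 * so the freeze fails and the caller can back out.
 */
static inline int sketch_page_ref_freeze(struct page *page, int count)
{
	return atomic_cmpxchg(&page->_refcount, count, 0) == count;
}

So if a speculative reference sneaks in between allocation and prep,
the cmpxchg sees a value other than 1, the freeze fails, and we take
the pr_warn()/out_error path rather than corrupting the count.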
-- 
Oscar Salvador
SUSE Labs