Date: Mon, 25 Oct 2021 09:24:22 +0200
From: Oscar Salvador
To: Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Hildenbrand, Michal Hocko, Zi Yan, Muchun Song, Naoya Horiguchi, David Rientjes, "Aneesh Kumar K .
 V", Nghia Le, Andrew Morton
Subject: Re: [PATCH v4 1/5] hugetlb: add demote hugetlb page sysfs interfaces
Message-ID: <20211025072421.GB6338@linux>
References: <20211007181918.136982-1-mike.kravetz@oracle.com> <20211007181918.136982-2-mike.kravetz@oracle.com> <47e53389-638a-1af1-e94f-b3c7e5e7459e@oracle.com> <20211018073552.GA11960@linux> <0530e4ef-2492-5186-f919-5db68edea654@oracle.com>
In-Reply-To: <0530e4ef-2492-5186-f919-5db68edea654@oracle.com>

On Fri, Oct 22, 2021 at 11:58:42AM -0700, Mike Kravetz wrote:
> From f9c401323fee234667787a118c74d93aa185fcf6 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz
> Date: Fri, 22 Oct 2021 11:40:57 -0700
> Subject: [PATCH v4 1/5] hugetlb: add demote hugetlb page sysfs interfaces
>
> Two new sysfs files are added to demote hugetlb pages.  These files are
> both per-hugetlb page size and per node.  Files are:
>   demote_size - The size in Kb that pages are demoted to. (read-write)
>   demote - The number of huge pages to demote. (write-only)
>
> By default, demote_size is the next smallest huge page size.  Valid huge
> page sizes less than huge page size may be written to this file.  When
> huge pages are demoted, they are demoted to this size.
>
> Writing a value to demote will result in an attempt to demote that
> number of hugetlb pages to an appropriate number of demote_size pages.
>
> NOTE: Demote interfaces are only provided for huge page sizes if there
> is a smaller target demote huge page size.  For example, on x86 1GB huge
> pages will have demote interfaces.  2MB huge pages will not have demote
> interfaces.
>
> This patch does not provide full demote functionality.  It only provides
> the sysfs interfaces.
>
> It also provides documentation for the new interfaces.
>
> Signed-off-by: Mike Kravetz

Reviewed-by: Oscar Salvador

> ---
>  Documentation/admin-guide/mm/hugetlbpage.rst |  30 +++-
>  include/linux/hugetlb.h                      |   1 +
>  mm/hugetlb.c                                 | 155 ++++++++++++++++++-
>  3 files changed, 183 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> index 8abaeb144e44..bb90de3885d1 100644
> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> @@ -234,8 +234,12 @@ will exist, of the form::
>
>  	hugepages-${size}kB
>
> -Inside each of these directories, the same set of files will exist::
> +Inside each of these directories, the set of files contained in ``/proc``
> +will exist.  In addition, two additional interfaces for demoting huge
> +pages may exist::
>
> +	demote
> +	demote_size
>  	nr_hugepages
>  	nr_hugepages_mempolicy
>  	nr_overcommit_hugepages
> @@ -243,7 +247,29 @@ Inside each of these directories, the same set of files will exist::
>  	resv_hugepages
>  	surplus_hugepages
>
> -which function as described above for the default huge page-sized case.
> +The demote interfaces provide the ability to split a huge page into
> +smaller huge pages.  For example, the x86 architecture supports both
> +1GB and 2MB huge page sizes.  A 1GB huge page can be split into 512
> +2MB huge pages.  Demote interfaces are not available for the smallest
> +huge page size.  The demote interfaces are:
> +
> +demote_size
> +	is the size of demoted pages.  When a page is demoted a corresponding
> +	number of huge pages of demote_size will be created.  By default,
> +	demote_size is set to the next smaller huge page size.  If there are
> +	multiple smaller huge page sizes, demote_size can be set to any of
> +	these smaller sizes.  Only huge page sizes less than the current huge
> +	page size are allowed.
> +
> +demote
> +	is used to demote a number of huge pages.  A user with root privileges
> +	can write to this file.  It may not be possible to demote the
> +	requested number of huge pages.  To determine how many pages were
> +	actually demoted, compare the value of nr_hugepages before and after
> +	writing to the demote interface.  demote is a write only interface.
> +
> +The interfaces which are the same as in ``/proc`` (all except demote and
> +demote_size) function as described above for the default huge page-sized case.
>
>  .. _mem_policy_and_hp_alloc:
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 1faebe1cd0ed..f2c3979efd69 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -596,6 +596,7 @@ struct hstate {
>  	int next_nid_to_alloc;
>  	int next_nid_to_free;
>  	unsigned int order;
> +	unsigned int demote_order;
>  	unsigned long mask;
>  	unsigned long max_huge_pages;
>  	unsigned long nr_huge_pages;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 95dc7b83381f..d2262ad4b3ed 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2986,7 +2986,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
>
>  static void __init hugetlb_init_hstates(void)
>  {
> -	struct hstate *h;
> +	struct hstate *h, *h2;
>
>  	for_each_hstate(h) {
>  		if (minimum_order > huge_page_order(h))
> @@ -2995,6 +2995,22 @@ static void __init hugetlb_init_hstates(void)
>  		/* oversize hugepages were init'ed in early boot */
>  		if (!hstate_is_gigantic(h))
>  			hugetlb_hstate_alloc_pages(h);
> +
> +		/*
> +		 * Set demote order for each hstate.  Note that
> +		 * h->demote_order is initially 0.
> +		 * - We can not demote gigantic pages if runtime freeing
> +		 *   is not supported, so skip this.
> +		 */
> +		if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> +			continue;
> +		for_each_hstate(h2) {
> +			if (h2 == h)
> +				continue;
> +			if (h2->order < h->order &&
> +			    h2->order > h->demote_order)
> +				h->demote_order = h2->order;
> +		}
>  	}
>  	VM_BUG_ON(minimum_order == UINT_MAX);
>  }
> @@ -3235,9 +3251,31 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
>  	return 0;
>  }
>
> +static int demote_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
> +	__must_hold(&hugetlb_lock)
> +{
> +	int rc = 0;
> +
> +	lockdep_assert_held(&hugetlb_lock);
> +
> +	/* We should never get here if no demote order */
> +	if (!h->demote_order) {
> +		pr_warn("HugeTLB: NULL demote order passed to demote_pool_huge_page.\n");
> +		return -EINVAL;		/* internal error */
> +	}
> +
> +	/*
> +	 * TODO - demote functionality will be added in subsequent patch
> +	 */
> +	return rc;
> +}
> +
>  #define HSTATE_ATTR_RO(_name) \
>  	static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
>
> +#define HSTATE_ATTR_WO(_name) \
> +	static struct kobj_attribute _name##_attr = __ATTR_WO(_name)
> +
>  #define HSTATE_ATTR(_name) \
>  	static struct kobj_attribute _name##_attr = \
>  		__ATTR(_name, 0644, _name##_show, _name##_store)
> @@ -3433,6 +3471,105 @@ static ssize_t surplus_hugepages_show(struct kobject *kobj,
>  }
>  HSTATE_ATTR_RO(surplus_hugepages);
>
> +static ssize_t demote_store(struct kobject *kobj,
> +	       struct kobj_attribute *attr, const char *buf, size_t len)
> +{
> +	unsigned long nr_demote;
> +	unsigned long nr_available;
> +	nodemask_t nodes_allowed, *n_mask;
> +	struct hstate *h;
> +	int err = 0;
> +	int nid;
> +
> +	err = kstrtoul(buf, 10, &nr_demote);
> +	if (err)
> +		return err;
> +	h = kobj_to_hstate(kobj, &nid);
> +
> +	if (nid != NUMA_NO_NODE) {
> +		init_nodemask_of_node(&nodes_allowed, nid);
> +		n_mask = &nodes_allowed;
> +	} else {
> +		n_mask = &node_states[N_MEMORY];
> +	}
> +
> +	/* Synchronize with other sysfs operations modifying huge pages */
> +	mutex_lock(&h->resize_lock);
> +	spin_lock_irq(&hugetlb_lock);
> +
> +	while (nr_demote) {
> +		/*
> +		 * Check for available pages to demote each time through the
> +		 * loop as demote_pool_huge_page will drop hugetlb_lock.
> +		 *
> +		 * NOTE: demote_pool_huge_page does not yet drop hugetlb_lock
> +		 * but will when full demote functionality is added in a later
> +		 * patch.
> +		 */
> +		if (nid != NUMA_NO_NODE)
> +			nr_available = h->free_huge_pages_node[nid];
> +		else
> +			nr_available = h->free_huge_pages;
> +		nr_available -= h->resv_huge_pages;
> +		if (!nr_available)
> +			break;
> +
> +		err = demote_pool_huge_page(h, n_mask);
> +		if (err)
> +			break;
> +
> +		nr_demote--;
> +	}
> +
> +	spin_unlock_irq(&hugetlb_lock);
> +	mutex_unlock(&h->resize_lock);
> +
> +	if (err)
> +		return err;
> +	return len;
> +}
> +HSTATE_ATTR_WO(demote);
> +
> +static ssize_t demote_size_show(struct kobject *kobj,
> +					struct kobj_attribute *attr, char *buf)
> +{
> +	int nid;
> +	struct hstate *h = kobj_to_hstate(kobj, &nid);
> +	unsigned long demote_size = (PAGE_SIZE << h->demote_order) / SZ_1K;
> +
> +	return sysfs_emit(buf, "%lukB\n", demote_size);
> +}
> +
> +static ssize_t demote_size_store(struct kobject *kobj,
> +					struct kobj_attribute *attr,
> +					const char *buf, size_t count)
> +{
> +	struct hstate *h, *demote_hstate;
> +	unsigned long demote_size;
> +	unsigned int demote_order;
> +	int nid;
> +
> +	demote_size = (unsigned long)memparse(buf, NULL);
> +
> +	demote_hstate = size_to_hstate(demote_size);
> +	if (!demote_hstate)
> +		return -EINVAL;
> +	demote_order = demote_hstate->order;
> +
> +	/* demote order must be smaller than hstate order */
> +	h = kobj_to_hstate(kobj, &nid);
> +	if (demote_order >= h->order)
> +		return -EINVAL;
> +
> +	/* resize_lock synchronizes access to demote size and writes */
> +	mutex_lock(&h->resize_lock);
> +	h->demote_order = demote_order;
> +	mutex_unlock(&h->resize_lock);
> +
> +	return count;
> +}
> +HSTATE_ATTR(demote_size);
> +
>  static struct attribute *hstate_attrs[] = {
>  	&nr_hugepages_attr.attr,
>  	&nr_overcommit_hugepages_attr.attr,
> @@ -3449,6 +3586,16 @@ static const struct attribute_group hstate_attr_group = {
>  	.attrs = hstate_attrs,
>  };
>
> +static struct attribute *hstate_demote_attrs[] = {
> +	&demote_size_attr.attr,
> +	&demote_attr.attr,
> +	NULL,
> +};
> +
> +static const struct attribute_group hstate_demote_attr_group = {
> +	.attrs = hstate_demote_attrs,
> +};
> +
>  static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
>  				    struct kobject **hstate_kobjs,
>  				    const struct attribute_group *hstate_attr_group)
> @@ -3466,6 +3613,12 @@ static int hugetlb_sysfs_add_hstate(struct hstate *h, struct kobject *parent,
>  		hstate_kobjs[hi] = NULL;
>  	}
>
> +	if (h->demote_order) {
> +		if (sysfs_create_group(hstate_kobjs[hi],
> +					&hstate_demote_attr_group))
> +			pr_warn("HugeTLB unable to create demote interfaces for %s\n", h->name);
> +	}
> +
>  	return retval;
>  }
>
> --
> 2.31.1
>

-- 
Oscar Salvador
SUSE Labs
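
For anyone following the thread, the default demote_order selection done by
the nested for_each_hstate() loop in the quoted hugetlb_init_hstates() hunk,
and the "1GB splits into 512 2MB pages" arithmetic from the documentation
hunk, can be checked in userspace. What follows is only a sketch under stated
assumptions: struct hstate_sketch and the three helper names are made up for
illustration and are not kernel interfaces, and the orders used in the usage
note (9 for 2MB, 18 for 1GB, with 4kB base pages) assume x86-64.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's struct hstate.  Only the two
 * fields read by the selection loop are modelled here.
 */
struct hstate_sketch {
	unsigned int order;		/* log2(base pages per huge page) */
	unsigned int demote_order;	/* 0 means "no demote target" */
};

/*
 * Mirror of the loop in the patch: pick the largest order that is still
 * strictly smaller than the order of hs[idx].
 */
static void set_default_demote_order(struct hstate_sketch *hs, size_t n,
				     size_t idx)
{
	struct hstate_sketch *h = &hs[idx];
	size_t i;

	for (i = 0; i < n; i++) {
		if (i == idx)
			continue;
		if (hs[i].order < h->order &&
		    hs[i].order > h->demote_order)
			h->demote_order = hs[i].order;
	}
}

/* Number of demote_size pages that one huge page of this hstate yields. */
static unsigned long pages_per_demote(const struct hstate_sketch *h)
{
	assert(h->demote_order < h->order);
	return 1UL << (h->order - h->demote_order);
}

/* Same computation as demote_size_show(): demote size in kB. */
static unsigned long demote_size_kb(const struct hstate_sketch *h,
				    unsigned long page_size)
{
	return (page_size << h->demote_order) / 1024;
}
```

With two entries of order 9 and 18, calling set_default_demote_order() on
each leaves the order-9 entry at demote_order 0 (so no demote files would be
created for it) and sets the order-18 entry to 9; pages_per_demote() then
gives 512 and demote_size_kb() with a 4096-byte page gives 2048.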