Date: Thu, 3 Apr 2025 16:26:36 +0200
From: Oscar Salvador
To: Jinjiang Tu
Cc: muchun.song@linux.dev, akpm@linux-foundation.org, david@redhat.com,
	linux-mm@kvack.org, wangkefeng.wang@huawei.com
Subject: Re: [PATCH v2] mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
References: <20250401082339.676723-1-tujinjiang@huawei.com>
In-Reply-To: <20250401082339.676723-1-tujinjiang@huawei.com>

On Tue, Apr 01, 2025 at 04:23:39PM +0800, Jinjiang Tu wrote:
> In set_max_huge_pages(), min_count should mean the acquired persistent
> huge pages, but it contains surplus huge pages. It will lead to failing
> to free the free huge pages for a Node.
>
> Steps to reproduce:
> 1) create 5 hugetlb folios in Node0
> 2) run a program to use all the hugetlb folios
> 3) echo 0 > nr_hugepages for Node0 to free the hugetlb folios. Thus the 5
> hugetlb folios in Node0 are accounted as surplus.
> 4) create 5 hugetlb folios in Node1
> 5) echo 0 > nr_hugepages for Node1 to free the hugetlb folios
>
> The result:
>         Node0   Node1
> Total   5       5
> Free    0       5
> Surp    5       5
>
> We couldn't subtract surplus_huge_pages from min_count, since free hugetlb
> folios may be surplus due to HVO. In __update_and_free_hugetlb_folio(),
> hugetlb_vmemmap_restore_folio() may fail, add the folio back to pool and
> treat it as surplus. If we directly subtract surplus_huge_pages from
> min_count, some free folios will be subtracted twice.
>
> To fix it, check if count is less than the num of free huge pages that
> could be destroyed (i.e., available_huge_pages(h)), and remove hugetlb
> folios if so.
>
> Since there may exist free surplus hugetlb folios, we should remove
> surplus folios first to make surplus count correct.
>
> The result with this patch:
>         Node0   Node1
> Total   5       0
> Free    0       0
> Surp    5       0

Unfortunately, this will not fly.
Assume this:

 # echo 3 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
 # numactl -m 0 ./hugetlb_use_2_pages &
 # echo 2 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

Because now you're checking for available_huge_pages in the
remove_pool_hugetlb_folio block, you will fail to free that page, and it
will be accounted as surplus, thus failing to reduce the pool.
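To make the arithmetic concrete, here is a toy userspace model of the shrink path for the three commands above. It is only a sketch, not the kernel code: the Pool class, shrink() and scenario() are hypothetical names, the model looks at a single node and ignores HVO, and the "patched" condition (free only while count < available_huge_pages) is my reading of the patch description rather than the actual diff.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Toy stand-in for the hstate counters; not the kernel structure."""
    nr: int           # nr_huge_pages: every huge page in the pool
    free: int         # free_huge_pages: pages not currently in use
    resv: int = 0     # resv_huge_pages: free pages promised to mappings
    surplus: int = 0  # surplus_huge_pages: pages beyond the persistent pool

    def persistent(self) -> int:
        return self.nr - self.surplus

    def available(self) -> int:
        return self.free - self.resv


def shrink(pool: Pool, count: int, patched: bool) -> int:
    """Model the 'decrease the pool size' half of set_max_huge_pages().

    Returns how many free pages went back to the buddy allocator.
    """
    if patched:
        # ASSUMPTION: the patch frees only while count < available pages.
        def may_free() -> bool:
            return count < pool.available()
    else:
        # Unpatched: never shrink below the number of pages in use.
        min_count = max(count, pool.resv + pool.nr - pool.free)

        def may_free() -> bool:
            return min_count < pool.persistent()

    freed = 0
    while pool.free > 0 and may_free():
        pool.nr -= 1
        pool.free -= 1
        freed += 1

    # Pages that could not be freed are accounted as surplus instead.
    while pool.surplus < pool.nr and count < pool.persistent():
        pool.surplus += 1
    return freed


def scenario(patched: bool):
    # nr_hugepages=3, a program holds 2 pages, then nr_hugepages=2.
    pool = Pool(nr=3, free=1)
    freed = shrink(pool, count=2, patched=patched)
    return pool, freed


if __name__ == "__main__":
    for patched in (False, True):
        pool, freed = scenario(patched)
        print(f"patched={patched}: total={pool.nr} free={pool.free} "
              f"surplus={pool.surplus} freed={freed}")
```

Under these assumptions the unpatched path frees the one free page (total 2, surplus 0), while the patched path frees nothing and ends with total 3, free 1, surplus 1, i.e. the free page is kept and accounted as surplus.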
Of course, once the program terminates, all will be cleaned up, but in the
meantime that page (or more, depending on your setup) will not be freed
into the buddy allocator, which is unexpected.

And there are other scenarios where the same will happen.

So, it seems to me that we have to find a more deterministic approach.
The whole operation is almost entirely covered by the lock, other than
when we free the pages we collected, so I am sure we can find a way that
does not need to special-case things.

-- 
Oscar Salvador
SUSE Labs