References: <20230306191944.GA15773@monkey>
In-Reply-To: <20230306191944.GA15773@monkey>
From: James Houghton
Date: Tue, 14 Mar 2023 08:37:58 -0700
Subject: Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs
To: Mike Kravetz
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, Peter Xu

On Mon, Mar 6, 2023 at 11:19 AM Mike Kravetz wrote:
>
> This is past the deadline, so feel free to ignore.  However, ...
>
> James Houghton has been working on the concept of HugeTLB High Granularity
> Mapping (HGM) as discussed here:
> https://lore.kernel.org/linux-mm/20230218002819.1486479-1-jthoughton@google.com/
>
> The primary motivation for this work is post-copy live migration of VMs backed
> by hugetlb pages via userfaultfd.
> A followup use case is more gracefully
> handling memory errors/poison on hugetlb pages.
>
> As can be seen by the size of James's patch set, the required changes for
> HGM are a bit complex and involved.  This is also complicated by the need
> to choose a 'mapcount strategy' as the previous scheme used by hugetlb
> will no longer work.
>
> A HGM for hugetlbfs session would present the current approach and challenges.
> While much of the work is confined to hugetlb, there is a bit of spillover into
> other mm areas: specifically page table walking.  A discussion on ways to
> move forward with this effort would be appreciated.

Thanks for proposing this, Mike.

To hopefully get more interest in this topic, I want to lay out the
reasons that Google uses HugeTLB for VMs today. They are:
- Guaranteed availability of hugepages
- Guaranteed NUMA alignment
- Availability of 1G pages
- HugeTLB vmemmap optimization to save page struct overhead

Until generic mm supports all this, HugeTLB will remain a very
important piece of Linux for us. :)

The main limitation of HugeTLB that I care about is that it can only
map an entire hugepage at once; it can never partially map a hugepage
(for example, there is no such thing as a PTE-mapped HugeTLB page). As
Mike said, this makes the following applications impossible:
1. With userfaultfd-based live migration, fetching and installing
   memory at PAGE_SIZE granularity.
2. Memory poison at PAGE_SIZE granularity.

HugeTLB high-granularity mapping (HGM) is an effort to make #1 and #2
possible with HugeTLB.

#1 and #2 are already possible with generic mm, so this also raises the
question: Can we merge HugeTLB with generic mm? This would certainly
be much more work than HGM, but it removes all those pesky HugeTLB
special cases (though we still want all those features that HugeTLB
has).

Coming up with a plan to merge HugeTLB with generic mm would be
challenging, and LSFMM might be a good place to have such a
discussion.

Not all of HugeTLB would need to be merged.
I think some of the main special cases that should be removed are:
1. hugetlb_fault (fault/GUP special case)
2. page_vma_mapped_walk's special case
3. hugetlb_entry in pagewalk
4. HugeTLB's rmap/mapcount special cases (already working on this!)

As part of this merge/unification, architectures would need to merge
their hugetlb implementations with their generic mm implementations
(for example, moving any special logic from set_huge_pte_at to
set_pte_at).

These are just some initial thoughts; I'm sure many of you have your
own ideas for this. A discussion about HGM might serve as a
jumping-off point for ideas for how to enhance the generic mm
implementation to make the unification possible.

- James Houghton
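
To put rough numbers on the granularity gap behind use cases #1 and #2 above, here is a small arithmetic sketch. It is only an illustration and not part of the HGM patch set. The sizes are assumptions (x86-64 with a 4 KiB base page and 2 MiB / 1 GiB hugepages), and the helper name base_pages_per_hugepage is made up for this example.

```python
# Assumed x86-64 sizes; other architectures use different base and huge page sizes.
PAGE_SIZE = 4 * 1024             # 4 KiB base page (assumption)
HUGE_2M = 2 * 1024 * 1024        # 2 MiB hugepage
HUGE_1G = 1024 * 1024 * 1024     # 1 GiB hugepage


def base_pages_per_hugepage(hugepage_size, page_size=PAGE_SIZE):
    """Return how many PAGE_SIZE pieces one hugepage covers.

    Without partial mapping, all of these pieces have to be fetched
    (post-copy) or given up (poison) together, even if only one of
    them is actually needed or actually bad.
    """
    if hugepage_size % page_size:
        raise ValueError("hugepage size must be a multiple of the base page size")
    return hugepage_size // page_size


if __name__ == "__main__":
    for name, size in (("2M", HUGE_2M), ("1G", HUGE_1G)):
        print(f"{name} hugepage = {base_pages_per_hugepage(size)} base pages")
```

Under these assumptions, a single fault on a 1G page that cannot be partially mapped means moving 262144 base pages' worth of data before the mapping can be installed, which is the cost that mapping at PAGE_SIZE is meant to avoid.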