From: Jiaqi Yan
Date: Tue, 15 Oct 2024 16:45:49 -0700
Subject: Re: [RFC PATCH v1 1/2] mm/memory-failure: introduce global MFR policy
To: jane.chu@oracle.com
Cc: nao.horiguchi@gmail.com, linmiaohe@huawei.com, tony.luck@intel.com, wangkefeng.wang@huawei.com, akpm@linux-foundation.org, osalvador@suse.de, rientjes@google.com, duenwen@google.com, jthoughton@google.com, jgg@nvidia.com, ankita@nvidia.com, peterx@redhat.com, linux-mm@kvack.org
References: <20240924043924.3562257-1-jiaqiyan@google.com> <20240924043924.3562257-2-jiaqiyan@google.com> <7658ca1f-1b3b-4352-93d9-66f8dfd28408@oracle.com>

On Fri, Oct 11, 2024 at 11:28 AM wrote:
>
> On 10/10/2024 4:21 PM, Jiaqi Yan wrote:
>
> > On Mon, Oct 7, 2024 at 10:24 AM wrote:
> >> On 10/3/2024 4:51 PM, Jiaqi Yan wrote:
> >>> soned page (sub- or huge-) will eventually be isolated, because,
> >>> The code here is "global policy". The "per-VMA policy", proposed in
> >>> 0/2 but code not sent, should be able to support isolation + offline
> >>> at some point (all VMAs are gone and page becomes free).
> >> "per-VMA policy" sounds interesting.
> >>>> Another thing I'm curious at is whether you have tested with real
> >>>> hardware UE - the one that triggers MCE. When a real UE is consumed by
> >>> Yes, with our workload. Can you share more about what is the "training
> >>> process"? Is it something to train memory or screen memory errors?
> >> The cover letter mentioned "Machine Learning (ML) workloads", so I used
> >> it as an example.
> > Got you. In that case, if the ML workload (running in a VM) wants to
> > do what you described, wouldn't losing 1G hugetlb page due to kernel
> > offline make the VM/workload even harder to execute recover logic?
>
> Indeed.
>
> As the user application got more sophisticated on recovering from
> poison, what about making the kernel to do the heavy lifting?

I think there are two things.

First, if userspace claims it has sufficient or sophisticated recovery
ability (assume we trust it), can it take full control of what happens
to the hardware poisoned memory page it **owns**? My answer to this
question is yes. The reason is that I believe the kernel has limited
ability to do memory failure recovery (MFR) optimally for all of
userspace. The current hard offline support in the kernel has also made
userspace recovery hard, so userspace deserves a say in MFR.

Second, what is the granularity of the control? This patch makes the
control apply to every process. So what about making it controllable
only by the userspace process that owns the memory page? The kernel can
still do the heavy lifting (hard offline, set HWPoison) **after** the
owning userspace process relinquishes control, or exits.

Another way to "disable hard offline but still set HWPoison" that I can
think of is to make the HWPOISON flag apply at page_size level, instead
of always setting it at the compound head. At least from hugetlb's
perspective, is that a good idea?
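
To make the two ideas above concrete, here is a toy userspace model, not
kernel code and not what the patch implements. It assumes a 1 GiB hugetlb
page made of 4 KiB base pages, tracks poison per base page, and defers
hard offline while the owning process holds control. The names
`HugePage`, `owner_in_control`, `memory_failure` and `release_control`
are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field

BASE_PAGE = 4096        # assumed base page size: 4 KiB
HUGE_PAGE = 1 << 30     # assumed hugetlb page size: 1 GiB


@dataclass
class HugePage:
    """Toy stand-in for one hugetlb page; not a kernel structure."""
    size: int = HUGE_PAGE
    owner_in_control: bool = False                # hypothetical: owner claimed MFR control
    poisoned: set = field(default_factory=set)    # indices of poisoned base pages
    offlined: bool = False

    def memory_failure(self, subpage: int) -> None:
        """Record poison on one base page; offline only if nobody claimed control."""
        self.poisoned.add(subpage)
        if not self.owner_in_control:
            self.offlined = True

    def release_control(self) -> None:
        """Owner gives up control or exits: the deferred hard offline happens now."""
        self.owner_in_control = False
        if self.poisoned:
            self.offlined = True

    def usable_bytes(self) -> int:
        """Bytes the owner can still use under this model."""
        if self.offlined:
            return 0
        return self.size - len(self.poisoned) * BASE_PAGE


if __name__ == "__main__":
    page = HugePage(owner_in_control=True)
    page.memory_failure(7)
    print(page.usable_bytes())   # everything except one 4 KiB base page
    page.release_control()
    print(page.usable_bytes())   # page is offlined once control is given up
```

In this model a single error under owner control costs one base page
rather than the whole hugetlb page, and the kernel-side handling is only
postponed, not skipped.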
>
> Something like by way of userfaultfd, kernel provides a new/clean
> hugetlb page, copied over good data from the clean subpages and then
> present the clean hugetlb page to user process with indication that
> subpage x is a substitute of the poisoned old subpage x, hence its data
> might need a refill? I am not sure how exactly to pull this through as
> the even is not a page-fault, but just wondering whether something like
> this is possible.
>
> thanks,
>
> -jane
>
> >
> >> -jane
> >>