Message-ID: <1211fd9a-93e6-4ebe-a80d-083601138b70@linux.alibaba.com>
Date: Thu, 18 Sep 2025 16:34:03 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] mm/memory-failure: Support disabling soft offline for HugeTLB pages
To: Kyle Meyer, "Fan, Shawn"
Cc: "Luck, Tony", Andrew Morton, "corbet@lwn.net", "david@redhat.com",
 "linmiaohe@huawei.com", "shuah@kernel.org", "jane.chu@oracle.com",
 "jiaqiyan@google.com", "Liam.Howlett@oracle.com", "bp@alien8.de",
 "hannes@cmpxchg.org", "jack@suse.cz", "joel.granados@kernel.org",
 "laoar.shao@gmail.com", "lorenzo.stoakes@oracle.com",
 "mclapinski@google.com", "mhocko@suse.com", "nao.horiguchi@gmail.com",
 "osalvador@suse.de", "Wysocki, Rafael J", "rppt@kernel.org",
 "Anderson, Russ", "surenb@google.com", "vbabka@suse.cz",
 "linux-acpi@vger.kernel.org", "linux-doc@vger.kernel.org",
 "linux-kernel@vger.kernel.org", "linux-kselftest@vger.kernel.org",
 "linux-mm@kvack.org"
References: <20250915201618.7d9d294a6b22e0f71540884b@linux-foundation.org>
From: Shuai Xue

On 2025/9/18 02:59, Kyle Meyer wrote:
> On Wed, Sep 17, 2025 at 06:35:14AM +0000, Fan, Shawn wrote:
>>>> My original patch for this just skipped the GHES->offline process
>>>> for huge pages. But I wasn't aware of the sysctl control. That provides
>>>> a better solution.
>>>
>>> Tony, does that mean you're OK with using the existing sysctl interface?
>>> If so, I'll just send a separate patch to update the sysfs-memory-page-offline
>>> documentation and drop the rest.
>>
>> Kyle,
>>
>> It depends on which camp the external customer that reported this
>> falls into:
>>
>> 1) "I'm OK disabling all soft offline requests".
>>
>> or the:
>>
>> 2) "I'd like 4K pages to still go offline if the BIOS asks, just not any huge pages".
>>
>> Shawn: Can you please find out?
>>
>>
>> -> Prefer the 2nd option, "4K pages still go offline if the BIOS asks, just not any huge pages."
>
> OK, thank you.
>
> Does that mean they want to avoid offlining transparent huge pages as well?
>
> Thanks,
> Kyle Meyer

Hi, Shawn,

Memory access is typically interleaved between channels. When the
per-rank threshold is exceeded, soft-offlining the last accessed address
seems unreasonable, regardless of whether it's a 4KB page or a huge page.
The error accumulation happens at the rank level, but the action is taken
on a specific page that happened to trigger the threshold, which doesn't
address the underlying issue.

I prefer the first option: disabling all soft offline requests from the
GHES driver.

Thanks.
Shuai