From: Jiaqi Yan
Date: Fri, 25 Mar 2022 14:11:17 -0700
Subject: Re: [RFC v1 0/2] Memory poison recovery in khugepaged
To: Yang Shi
Cc: Tony Luck, HORIGUCHI NAOYA(堀口 直也), "Kirill A. Shutemov", Miaohe Lin, Jue Wang, Linux MM
References: <20220323232929.3035443-1-jiaqiyan@google.com>

On Thu, Mar 24, 2022 at 7:51 PM Yang Shi wrote:
>
> On Wed, Mar 23, 2022 at 4:29 PM Jiaqi Yan wrote:
> >
> > Problem
> > =======
> > Memory DIMMs are subject to multi-bit flips, i.e. memory errors.
> > As memory size and density increase, both the chance and the number
> > of memory errors increase. The growing size and density of server
> > RAM in data centers and the cloud have been accompanied by more
> > uncorrectable memory errors. There are already mechanisms in the
> > kernel to recover from uncorrectable memory errors. This patch series
> > provides the recovery mechanism for one particular kernel agent,
> > khugepaged.
> >
> > Impact
> > ======
> > The main reason we chose to make khugepaged tolerant of memory failures
> > was its high likelihood of accessing poisoned memory while performing
> > functionally optional compaction actions. Standard applications
> > typically don't have strict requirements on the size of their pages,
> > so they are given 4K pages by the kernel. The kernel can improve
> > application performance by either 1) giving applications 2M pages
> > to begin with, or 2) collapsing 4K pages into 2M pages when possible.
> > This collapsing operation is done by khugepaged, a kernel agent that
> > is constantly scanning memory. When collapsing 4K pages into a 2M page,
> > it must copy the data from the 4K pages into a physically contiguous
> > 2M page. Therefore, as long as there exists one poisoned cache line in
> > collapsible 4K pages, khugepaged will eventually access it. The current
> > impact on users is a kernel panic triggered by a machine check exception.
> > However, khugepaged's compaction operations are not functionally required
> > kernel actions. Therefore, making khugepaged tolerant of poisoned memory
> > will greatly improve user experience.
> >
> > Solution
> > ========
> > As stated before, it is undesirable to crash the system only because
> > khugepaged accesses poisoned pages while it is collapsing 4K pages.
> > The high-level idea of this patch series is to skip the group of pages
> > (usually 512 4K-size pages) once khugepaged finds one of them is poisoned,
> > as these pages have become ineligible to be collapsed.
> >
> > We are also careful to unwind operations khugepaged has performed before
> > it detects memory failures. For example, before copying and collapsing
> > a group of anonymous pages into a huge page, the source pages will be
> > isolated and their page table is unlinked from their PMD. These operations
> > need to be undone in order to ensure these pages are not changed/lost from
> > the perspective of other threads (both user and kernel space). As for
> > file-backed memory pages, there already exists a rollback case. This
> > patch just extends it so that khugepaged also correctly rolls back when
> > it fails to copy poisoned 4K pages.
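A minimal user-space sketch in C of the skip-and-unwind idea from the Solution section, assuming a simulated poison bit in place of real hardware errors. Everything here is hypothetical: struct fake_page, mc_safe_copy(), unwind_group() and collapse_group() are illustrative names, not code from the patch series or kernel APIs. mc_safe_copy() only imitates the contract of the kernel's machine-check-safe copy helpers such as copy_mc_to_kernel() (report how many bytes were not copied instead of crashing), and the isolated flag stands in for the isolation and PMD unlinking that the anonymous-memory path has to undo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SUBPAGE_SIZE 4096                        /* 4K base page */
#define HPAGE_SIZE   (2 * 1024 * 1024)           /* 2M huge page */
#define NR_SUBPAGES  (HPAGE_SIZE / SUBPAGE_SIZE) /* 2M / 4K = 512 */

/* Hypothetical model of one 4K source page; not a kernel structure. */
struct fake_page {
	unsigned char data[SUBPAGE_SIZE];
	bool poisoned; /* simulates an uncorrectable error somewhere in the page */
	bool isolated; /* simulates "isolated and unlinked from the PMD" */
};

/* Static storage for the 512 source pages and the 2M destination. */
static struct fake_page pages[NR_SUBPAGES];
static unsigned char huge[HPAGE_SIZE];

/*
 * Hypothetical machine-check-safe copy: returns 0 on success, otherwise
 * the number of bytes left uncopied, instead of taking the machine down.
 */
static size_t mc_safe_copy(unsigned char *dst, const struct fake_page *src)
{
	assert(dst != NULL && src != NULL);
	if (src->poisoned)
		return SUBPAGE_SIZE;
	memcpy(dst, src->data, SUBPAGE_SIZE);
	return 0;
}

/* Undo the isolation so other threads see the original 4K pages again. */
static void unwind_group(struct fake_page *src)
{
	for (int i = 0; i < NR_SUBPAGES; i++)
		src[i].isolated = false;
}

/* Returns true if the group was collapsed, false if it was skipped. */
static bool collapse_group(struct fake_page *src, unsigned char *dst)
{
	for (int i = 0; i < NR_SUBPAGES; i++)
		src[i].isolated = true; /* isolate and unlink before copying */

	for (int i = 0; i < NR_SUBPAGES; i++) {
		if (mc_safe_copy(dst + (size_t)i * SUBPAGE_SIZE, &src[i]) != 0) {
			/* One poisoned subpage makes the whole group ineligible. */
			unwind_group(src);
			return false;
		}
	}
	return true;
}
```

Skipping the whole group, rather than collapsing around the bad subpage, follows the cover letter's point that one poisoned 4K page makes all 512 ineligible; the unwind step is what keeps the source pages intact for other threads.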
>
> Actually I should have asked this question in the first place before diving
> into the implementation details: if an uncorrectable memory error
> happens, the kernel will pin the poisoned page and set the hwpoison flag,
> and the bumped page refcount would prevent the page from being collapsed
> IIUC.

This patch series is for cases where khugepaged is the first one to detect
the memory errors on these poisoned pages. IOW, the pages are not known to
have memory errors when khugepaged's collapsing gets to them. In our
observation, this happens frequently when the huge page ratio of the system
is relatively low, which is fairly common in cloud VMs.

> So I'm wondering why we need this?
>
> >
> > Jiaqi Yan (2):
> >   mm: khugepaged: recover from poisoned anonymous memory
> >   mm: khugepaged: recover from poisoned file-backed memory
> >
> >  include/linux/highmem.h |  37 +++++++
> >  mm/khugepaged.c         | 211 +++++++++++++++++++++++++++++-----------
> >  2 files changed, 189 insertions(+), 59 deletions(-)
> >
> > --
> > 2.35.1.894.gb6a874cedc-goog
> >