References: <20220323232929.3035443-1-jiaqiyan@google.com>
From: Jiaqi Yan
Date: Fri, 25 Mar 2022 16:07:05 -0700
Subject: Re: [RFC v1 0/2] Memory poison recovery in khugepaged
To: Yang Shi
Cc: Tony Luck, HORIGUCHI NAOYA(堀口 直也), "Kirill A.
Shutemov", Miaohe Lin, Jue Wang, Linux MM

On Fri, Mar 25, 2022 at 2:42 PM Yang Shi wrote:
>
> On Fri, Mar 25, 2022 at 2:11 PM Jiaqi Yan wrote:
> >
> > On Thu, Mar 24, 2022 at 7:51 PM Yang Shi wrote:
> > >
> > > On Wed, Mar 23, 2022 at 4:29 PM Jiaqi Yan wrote:
> > > >
> > > > Problem
> > > > =======
> > > > Memory DIMMs are subject to multi-bit flips, i.e. memory errors.
> > > > As memory size and density increase, the chances of and number of
> > > > memory errors increase. The increasing size and density of server
> > > > RAM in the data center and cloud have shown increased uncorrectable
> > > > memory errors. There are already mechanisms in the kernel to recover
> > > > from uncorrectable memory errors. This series of patches provides
> > > > the recovery mechanism for the particular kernel agent khugepaged.
> > > >
> > > > Impact
> > > > ======
> > > > The main reason we chose to make khugepaged tolerant of memory failures
> > > > was its high possibility of accessing poisoned memory while performing
> > > > functionally optional compaction actions. Standard applications
> > > > typically don't have strict requirements on the size of its pages.
> > > > So they are given 4K pages by the kernel.
> > > > The kernel is able to improve
> > > > application performance by either 1) giving application 2M pages
> > > > to begin with, or 2) collapsing 4K pages into 2M pages when possible.
> > > > This collapsing operation is done by khugepaged, a kernel agent that
> > > > is constantly scanning memory. When collapsing 4K pages into a 2M page,
> > > > it must copy the data from the 4K pages into a physically contiguous
> > > > 2M page. Therefore, as long as there exists one poisoned cache line in
> > > > collapsible 4K pages, khugepaged will eventually access it. The current
> > > > impact to users is a machine check exception triggered kernel panic.
> > > > However, khugepaged’s compaction operations are not functionally required
> > > > kernel actions. Therefore making khugepaged tolerant to poisoned memory
> > > > will greatly improve user experience.
> > > >
> > > > Solution
> > > > ========
> > > > As stated before, it is less desirable to crash the system only because
> > > > khugepaged accesses poisoned pages while it is collapsing 4K pages.
> > > > The high level idea of this patch series is to skip the group of pages
> > > > (usually 512 4K-size pages) once khugepaged finds one of them is poisoned,
> > > > as these pages have become ineligible to be collapsed.
> > > >
> > > > We are also careful to unwind operations khugepaged has performed before
> > > > it detects memory failures. For example, before copying and collapsing
> > > > a group of anonymous pages into a huge page, the source pages will be
> > > > isolated and their page table is unlinked from their PMD. These operations
> > > > need to be undone in order to ensure these pages are not changed/lost from
> > > > the perspective of other threads (both user and kernel space). As for
> > > > file backed memory pages, there already exists a rollback case.
> > > > This patch just extends it so that khugepaged also correctly rolls
> > > > back when it fails to copy poisoned 4K pages.
> > >
> > > Actually I should have asked the question in the first place before diving
> > > into the implementation details, if uncorrectable memory error
> > > happens, kernel will pin the poisoned page and set hwpoison flag, the
> > > bumped page refcount would prevent the page from being collapsed IIUC.
> >
> > This patch series is for cases where khugepaged is the first guy that detects
> > the memory errors on these poisoned pages. IOW, the pages are not known to
> > have memory errors when khugepaged collapsing gets to them.
> > In our observation, this happens frequently when the huge page ratio of
> > the system is relatively low, which is fairly common in cloud VMs.
>
> Thanks, this is the very important information that needs to be caught
> in the 1st patch's commit log.

Thanks for this valuable feedback. I will add this in the commit msg of v2,
but I will wait for your comments on patch 2/2 before sending out v2.

>
> >
> > >
> > > So I'm wondering why we need this?
> > >
> > > >
> > > > Jiaqi Yan (2):
> > > >   mm: khugepaged: recover from poisoned anonymous memory
> > > >   mm: khugepaged: recover from poisoned file-backed memory
> > > >
> > > >  include/linux/highmem.h |  37 +++++++
> > > >  mm/khugepaged.c         | 211 +++++++++++++++++++++++++++++-----------
> > > >  2 files changed, 189 insertions(+), 59 deletions(-)
> > > >
> > > > --
> > > > 2.35.1.894.gb6a874cedc-goog
> > > >
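
For readers following the thread, the skip-and-unwind idea from the quoted
cover letter can be illustrated with a small userspace model. This is only a
sketch under stated assumptions, not code from the patches: struct small_page,
copy_page_checked() and try_collapse() are hypothetical names, the poisoned
flag stands in for a hardware-reported uncorrectable error, and the isolated
flag stands in for the isolation and page-table unlinking that would have to
be undone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SMALL_PAGE_SIZE 4096u
#define PAGES_PER_HUGE  512u    /* 2M / 4K, the usual group size */

/* Hypothetical stand-in for a 4K source page; not a kernel structure. */
struct small_page {
	unsigned char data[SMALL_PAGE_SIZE];
	bool poisoned;  /* models an uncorrectable error inside this page */
	bool isolated;  /* models the page being taken away for collapsing */
};

/* Models a checked copy: reports failure instead of crashing on poison. */
static int copy_page_checked(unsigned char *dst, const struct small_page *src)
{
	if (src->poisoned)
		return -1;
	memcpy(dst, src->data, SMALL_PAGE_SIZE);
	return 0;
}

/*
 * Try to collapse n small pages into one contiguous buffer of
 * n * SMALL_PAGE_SIZE bytes. Returns 0 on success. On the first poisoned
 * page it returns -1 and undoes the isolation of every source page, so the
 * whole group is skipped and the pages look untouched to other users.
 */
static int try_collapse(struct small_page *pages, size_t n, unsigned char *huge)
{
	size_t i;
	int ret = 0;

	assert(pages != NULL && huge != NULL);

	/* Step 1: isolate all source pages before copying. */
	for (i = 0; i < n; i++)
		pages[i].isolated = true;

	/* Step 2: copy page by page, stopping at the first failure. */
	for (i = 0; i < n; i++) {
		if (copy_page_checked(huge + i * SMALL_PAGE_SIZE, &pages[i]) != 0) {
			ret = -1;
			break;
		}
	}

	/* Step 3: on failure, unwind step 1 for the whole group. */
	if (ret != 0) {
		for (i = 0; i < n; i++)
			pages[i].isolated = false;
	}

	return ret;
}
```

A caller would pass PAGES_PER_HUGE source pages and a 2M destination buffer;
a return value of -1 means the group was skipped and the source pages were
left as they were.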