From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY, URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 986E6C433DB for ; Wed, 13 Jan 2021 10:04:54 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E3FCF23359 for ; Wed, 13 Jan 2021 10:04:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E3FCF23359 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 26E406B016B; Wed, 13 Jan 2021 05:04:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 245528D002E; Wed, 13 Jan 2021 05:04:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 10C738D0019; Wed, 13 Jan 2021 05:04:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0170.hostedemail.com [216.40.44.170]) by kanga.kvack.org (Postfix) with ESMTP id F00896B016B for ; Wed, 13 Jan 2021 05:04:52 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id B7E6B180AD801 for ; Wed, 13 Jan 2021 10:04:52 +0000 (UTC) X-FDA: 77700318024.13.nerve22_39014532751d Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id 9928418140B74 for ; Wed, 13 Jan 2021 10:04:52 +0000 (UTC) X-HE-Tag: nerve22_39014532751d X-Filterd-Recvd-Size: 7575 Received: from out30-54.freemail.mail.aliyun.com (out30-54.freemail.mail.aliyun.com [115.124.30.54]) by imf32.hostedemail.com (Postfix) with ESMTP for ; Wed, 13 Jan 2021 10:04:51 +0000 (UTC) X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R811e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04420;MF=zhongjiang-ali@linux.alibaba.com;NM=1;PH=DS;RN=16;SR=0;TI=SMTPD_---0ULc.JMJ_1610532286; Received: from L-X1DSLVDL-1420.local(mailfrom:zhongjiang-ali@linux.alibaba.com fp:SMTPD_---0ULc.JMJ_1610532286) by smtp.aliyun-inc.com(127.0.0.1); Wed, 13 Jan 2021 18:04:47 +0800 Subject: Re: [PATCH 04/10] mm, fsdax: Refactor memory-failure handler for dax mapping To: Ruan Shiyang , Jan Kara Cc: linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-nvdimm@lists.01.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-raid@vger.kernel.org, darrick.wong@oracle.com, dan.j.williams@intel.com, david@fromorbit.com, hch@lst.de, song@kernel.org, rgoldwyn@suse.de, qi.fuli@fujitsu.com, y-goto@fujitsu.com References: <20201230165601.845024-1-ruansy.fnst@cn.fujitsu.com> <20201230165601.845024-5-ruansy.fnst@cn.fujitsu.com> <20210106154132.GC29271@quack2.suse.cz> <75164044-bfdf-b2d6-dff0-d6a8d56d1f62@cn.fujitsu.com> From: zhong jiang Message-ID: <781f276b-afdd-091c-3dba-048e415431ab@linux.alibaba.com> Date: Wed, 13 Jan 2021 18:04:46 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:85.0) Gecko/20100101 Thunderbird/85.0 MIME-Version: 1.0 In-Reply-To: <75164044-bfdf-b2d6-dff0-d6a8d56d1f62@cn.fujitsu.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 2021/1/12 10:55 =E4=B8=8A=E5=8D=88, Ruan Shiyang wrote: > > > On 2021/1/6 =E4=B8=8B=E5=8D=8811:41, Jan Kara wrote: >> On Thu 31-12-20 00:55:55, Shiyang Ruan wrote: >>> The current memory_failure_dev_pagemap() can only handle single-mappe= d >>> dax page for fsdax mode.=C2=A0 The dax page could be mapped by multip= le=20 >>> files >>> and offsets if we let reflink feature & fsdax mode work together.=C2=A0= So, >>> we refactor current implementation to support handle memory failure o= n >>> each file and offset. >>> >>> Signed-off-by: Shiyang Ruan >> >> Overall this looks OK to me, a few comments below. >> >>> --- >>> =C2=A0 fs/dax.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 | 21 +++++++++++ >>> =C2=A0 include/linux/dax.h |=C2=A0 1 + >>> =C2=A0 include/linux/mm.h=C2=A0 |=C2=A0 9 +++++ >>> =C2=A0 mm/memory-failure.c | 91=20 >>> ++++++++++++++++++++++++++++++++++----------- >>> =C2=A0 4 files changed, 100 insertions(+), 22 deletions(-) > > ... > >>> =C2=A0 @@ -345,9 +348,12 @@ static void add_to_kill(struct task_struc= t=20 >>> *tsk, struct page *p, >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >>> =C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 tk->addr =3D page_address_in_vm= a(p, vma); >>> -=C2=A0=C2=A0=C2=A0 if (is_zone_device_page(p)) >>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 tk->size_shift =3D dev_pa= gemap_mapping_shift(p, vma); >>> -=C2=A0=C2=A0=C2=A0 else >>> +=C2=A0=C2=A0=C2=A0 if (is_zone_device_page(p)) { >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (is_device_fsdax_page(= p)) >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 t= k->addr =3D vma->vm_start + >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ((pgoff - vma->vm_pgoff) <<= PAGE_SHIFT); >> >> It seems strange to use 'pgoff' for dax pages and not for any other=20 >> page. >> Why? I'd rather pass correct pgoff from all callers of add_to_kill() a= nd >> avoid this special casing... > > Because one fsdax page can be shared by multiple pgoffs.=C2=A0 I have t= o=20 > pass each pgoff in each iteration to calculate the address in vma (for=20 > tk->addr).=C2=A0 Other kinds of pages don't need this. They can get the= ir=20 > unique address by calling "page_address_in_vma()". > IMO,=C2=A0=C2=A0 an fsdax page can be shared by multiple files rather tha= n=20 multiple pgoffs if fs query support reflink.=C2=A0=C2=A0 Because an page = only=20 located in an mapping(page->mapping is exclusive),=C2=A0 hence it=C2=A0 o= nly has=20 an pgoff or index pointing at the node. =C2=A0or=C2=A0 I miss something for the feature ?=C2=A0 thanks, > So, I added this fsdax case here.=C2=A0 This patchset only implemented = the=20 > fsdax case, other cases also need to be added here if to be implemented= . > > > --=20 > Thanks, > Ruan Shiyang. > >> >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 tk->size_shift =3D dev_pa= gemap_mapping_shift(p, vma, tk->addr); >>> +=C2=A0=C2=A0=C2=A0 } else >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 tk->size_shift= =3D page_shift(compound_head(p)); >>> =C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 /* >>> @@ -495,7 +501,7 @@ static void collect_procs_anon(struct page=20 >>> *page, struct list_head *to_kill, >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 if (!page_mapped_in_vma(page, vma)) >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 continue; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 if (vma->vm_mm =3D=3D t->mm) >>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 add_to_kill(t, page, vma, to_kill); >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 add_to_kill(t, page, NULL, 0, vma, to_kill); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 read_unlock(&tasklist_lock); >>> @@ -505,24 +511,19 @@ static void collect_procs_anon(struct page=20 >>> *page, struct list_head *to_kill, >>> =C2=A0 /* >>> =C2=A0=C2=A0 * Collect processes when the error hit a file mapped pag= e. >>> =C2=A0=C2=A0 */ >>> -static void collect_procs_file(struct page *page, struct list_head=20 >>> *to_kill, >>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 int force_early) >>> +static void collect_procs_file(struct page *page, struct=20 >>> address_space *mapping, >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pgoff_t pgoff, struct lis= t_head *to_kill, int force_early) >>> =C2=A0 { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct vm_area_struct *vma; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct task_struct *tsk; >>> -=C2=A0=C2=A0=C2=A0 struct address_space *mapping =3D page->mapping; >>> -=C2=A0=C2=A0=C2=A0 pgoff_t pgoff; >>> =C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 i_mmap_lock_read(mapping); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 read_lock(&tasklist_lock); >>> -=C2=A0=C2=A0=C2=A0 pgoff =3D page_to_pgoff(page); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 for_each_process(tsk) { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct task_st= ruct *t =3D task_early_kill(tsk, force_early); >>> - >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (!t) >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 continue; >>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 vma_interval_tree_foreach= (vma, &mapping->i_mmap, pgoff, >>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pgoff) { >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 vma_interval_tree_foreach= (vma, &mapping->i_mmap, pgoff,=20 >>> pgoff) { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 /* >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 * Send early kill signal to tasks where a vma covers >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 * the page but the corrupted page is not necessarily >>> @@ -531,7 +532,7 @@ static void collect_procs_file(struct page=20 >>> *page, struct list_head *to_kill, >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 * to be informed of all such data corruptions. >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 */ >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 if (vma->vm_mm =3D=3D t->mm) >>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 add_to_kill(t, page, vma, to_kill); >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 add_to_kill(t, page, mapping, pgoff, vma, to_kill); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 read_unlock(&tasklist_lock); >>> @@ -550,7 +551,8 @@ static void collect_procs(struct page *page,=20 >>> struct list_head *tokill, >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (PageAnon(page)) >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 collect_procs_= anon(page, tokill, force_early); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 else >>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 collect_procs_file(page, = tokill, force_early); >>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 collect_procs_file(page, = page->mapping, page_to_pgoff(page), >> >> Why not use page_mapping() helper here? It would be safer for THPs if=20 >> they >> ever get here... >> >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Honza >> >