From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 28 Apr 2020 16:43:18 +1000
From: Dave Chinner
To: "Ruan, Shiyang"
Cc: Matthew Wilcox, "linux-kernel@vger.kernel.org",
 "linux-xfs@vger.kernel.org", "linux-nvdimm@lists.01.org",
 "linux-mm@kvack.org", "linux-fsdevel@vger.kernel.org",
 "darrick.wong@oracle.com", "dan.j.williams@intel.com", "hch@lst.de",
 "rgoldwyn@suse.de", "Qi, Fuli", "Gotou, Yasunori"
Subject: Re: 回复: Re: [RFC PATCH 0/8]
 dax: Add a dax-rmap tree to support reflink
Message-ID: <20200428064318.GG2040@dread.disaster.area>
References: <20200427084750.136031-1-ruansy.fnst@cn.fujitsu.com>
 <20200427122836.GD29705@bombadil.infradead.org>
X-Mailing-List: linux-xfs@vger.kernel.org

On Tue, Apr 28, 2020 at 06:09:47AM +0000, Ruan, Shiyang wrote:
>
> On 2020/4/27 20:28:36, "Matthew Wilcox" <willy@infradead.org> wrote:
>
> >On Mon, Apr 27, 2020 at 04:47:42PM +0800, Shiyang Ruan wrote:
> >>  This patchset is a try to resolve the shared 'page cache' problem for
> >>  fsdax.
> >>
> >>  In order to track multiple mappings and indexes on one page, I
> >>  introduced a dax-rmap rb-tree to manage the relationship.  A dax entry
> >>  will be associated more than once if is shared.  At the second time we
> >>  associate this entry, we create this rb-tree and store its root in
> >>  page->private(not used in fsdax).  Insert (->mapping, ->index) when
> >>  dax_associate_entry() and delete it when dax_disassociate_entry().
> >
> >Do we really want to track all of this on a per-page basis?  I would
> >have thought a per-extent basis was more useful.  Essentially, create
> >a new address_space for each shared extent.  Per page just seems like
> >a huge overhead.
> >
> Per-extent tracking is a nice idea for me.  I haven't thought of it
> yet...
>
> But the extent info is maintained by filesystem.  I think we need a way
> to obtain this info from FS when associating a page.  May be a bit
> complicated.  Let me think about it...

That's why I want the -user of this association- to do a filesystem
callout instead of keeping its own naive tracking infrastructure.
The filesystem can do an efficient, on-demand reverse mapping lookup
from its own extent tracking infrastructure, and there's zero
runtime overhead when there are no errors present.

At the moment, this "dax association" is used to "report" a storage
media error directly to userspace. I say "report" because what it
does is kill userspace processes dead. The storage media error
actually needs to be reported to the owner of the storage media,
which in the case of FS-DAX is the filesystem.

That way the filesystem can then look up all the owners of that bad
media range (i.e. the filesystem block it corresponds to) and take
appropriate action. e.g.

- if it falls in filesystem metadata, shut down the filesystem
- if it falls in user data, call the "kill userspace dead" routines
  for each mapping/index tuple the filesystem finds for the given
  LBA address at which the media error occurred.

Right now if the media error is in filesystem metadata, the
filesystem isn't even told about it. The filesystem can't even shut
down - the error is just dropped on the floor and it won't be until
the filesystem next tries to reference that metadata that we notice
there is an issue.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
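
For illustration only, the callout flow described above can be modelled as a
toy in Python. This is a sketch under stated assumptions, not kernel code and
not anything posted in this thread: ToyFilesystem, RmapRecord, Owner,
owners_of() and notify_media_error() are all hypothetical names, the reverse
mapping is a plain list rather than a real on-disk structure, and "killing"
a process is reduced to recording the (inode, index) tuple that would be
acted on.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional, Tuple


class Owner(Enum):
    """Who owns a physical extent in this toy model (hypothetical)."""
    METADATA = auto()
    USER_DATA = auto()


@dataclass(frozen=True)
class RmapRecord:
    """One hypothetical reverse-mapping record: physical extent -> owner."""
    start_block: int              # first filesystem block of the extent
    length: int                   # extent length in blocks
    owner: Owner
    inode: Optional[int] = None   # owning file, for user data only
    file_offset: int = 0          # file block index where the extent starts


class ToyFilesystem:
    """Stand-in for a filesystem that receives media-error callouts."""

    def __init__(self, rmap: List[RmapRecord]) -> None:
        self.rmap = list(rmap)
        self.is_shut_down = False
        # (inode, index) tuples that the "kill userspace" path would act on.
        self.killed: List[Tuple[Optional[int], int]] = []

    def owners_of(self, block: int) -> List[RmapRecord]:
        """On-demand reverse lookup; nothing is tracked until an error hits."""
        return [r for r in self.rmap
                if r.start_block <= block < r.start_block + r.length]

    def notify_media_error(self, block: int) -> None:
        """Callout made by whoever detected the bad media range."""
        owners = self.owners_of(block)
        if any(r.owner is Owner.METADATA for r in owners):
            # Bad metadata: the filesystem shuts itself down.
            self.is_shut_down = True
            return
        for r in owners:
            # Shared (reflinked) extents yield one tuple per owning file.
            index = r.file_offset + (block - r.start_block)
            self.killed.append((r.inode, index))
```

With two files sharing the extent at blocks 100..109, an error at block 103
produces one tuple per file, while an error inside a metadata extent only
flips the shutdown flag. A linear scan is used purely to keep the sketch
short; a real filesystem would query its own indexed extent records.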