From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by ml01.01.org (Postfix) with ESMTPS id 80B782034C080
	for ; Thu, 26 Oct 2017 16:47:20 -0700 (PDT)
From: "Williams, Dan J"
Subject: Re: [PATCH v3 00/13] dax: fix dma vs truncate and remove 'page-less' support
Date: Thu, 26 Oct 2017 23:51:04 +0000
Message-ID: <1509061831.25213.2.camel@intel.com>
References: <150846713528.24336.4459262264611579791.stgit@dwillia2-desk3.amr.corp.intel.com>
	<20171020074750.GA13568@lst.de>
	<20171020093148.GA20304@lst.de>
	<20171026105850.GA31161@quack2.suse.cz>
In-Reply-To: <20171026105850.GA31161@quack2.suse.cz>
Content-Language: en-US
Content-ID: <6B1E8E88FC7C9E4D9754113A024144AD@intel.com>
MIME-Version: 1.0
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Content-Type: text/plain; charset="utf-7"
Content-Transfer-Encoding: base64
Errors-To: linux-nvdimm-bounces@lists.01.org
Sender: "Linux-nvdimm"
To: "hch@lst.de" , "jack@suse.cz"
Cc: "mhocko@suse.com" , "benh@kernel.crashing.org" , "dave.hansen@linux.intel.com" , "heiko.carstens@de.ibm.com" , "bfields@fieldses.org" , "linux-mm@kvack.org" , "paulus@samba.org" , "Hefty, Sean" , "jlayton@poochiereds.net" , "mawilcox@microsoft.com" , "linux-rdma@vger.kernel.org" , "mpe@ellerman.id.au" , "dledford@redhat.com" , "jgunthorpe@obsidianresearch.com" , "hal.rosenstock@gmail.com" , "david@fromorbit.com" , "schwidefsky@de.ibm.com" , "viro@zeniv.linux.org.uk" , "gerald.schaefer@de.ibm.com" , "linux-nvdimm@lists.01.org" , "darrick.wong@oracle.com" , "linux-kernel@vger.kernel.org" , "linux-xfs@vger.kernel.org" , "linux-fsdevel@vger.kernel.org" , "akpm@linux-foundation.org" , "kirill.shutemov@linux.intel.com"
List-ID:

T24gVGh1LCAyMDE3LTEwLTI2IGF0IDEyOjU4ICstMDIwMCwgSmFuIEthcmEgd3JvdGU6Cj4gT24g 
RnJpIDIwLTEwLTE3IDExOjMxOjQ4LCBDaHJpc3RvcGggSGVsbHdpZyB3cm90ZToKPiA+IE9uIEZy aSwgT2N0IDIwLCAyMDE3IGF0IDA5OjQ3OjUwQU0gKy0wMjAwLCBDaHJpc3RvcGggSGVsbHdpZyB3 cm90ZToKPiA+ID4gSSdkIGxpa2UgdG8gYnJhaW5zdG9ybSBob3cgd2UgY2FuIGRvIHNvbWV0aGlu ZyBiZXR0ZXIuCj4gPiA+IAo+ID4gPiBIb3cgYWJvdXQ6Cj4gPiA+IAo+ID4gPiBJZiB3ZSBoaXQg YSBwYWdlIHdpdGggYW4gZWxldmF0ZWQgcmVmY291bnQgaW4gdHJ1bmNhdGUgLyBob2xlIHB1Y2gK PiA+ID4gZXRjIGZvciBhIERBWCBmaWxlIHN5c3RlbSB3ZSBkbyBub3QgZnJlZSB0aGUgYmxvY2tz IGluIHRoZSBmaWxlIHN5c3RlbSwKPiA+ID4gYnV0IGFkZCBpdCB0byB0aGUgZXh0ZW50IGJ1c3kg bGlzdC4rQUtBQW9BLVdlIG1hcmsgdGhlIHBhZ2UgYXMgZGVsYXllZAo+ID4gPiBmcmVlIChlLmcu IHBhZ2UgZmxhZz8pIHNvIHRoYXQgd2hlbiBpdCBmaW5hbGx5IGhpdHMgcmVmY291bnQgemVybyB3 ZQo+ID4gPiBjYWxsIGJhY2sgaW50byB0aGUgZmlsZSBzeXN0ZW0gdG8gcmVtb3ZlIGl0IGZyb20g dGhlIGJ1c3kgbGlzdC4KPiA+IAo+ID4gQnJhaW5zdG9ybWluZyBzb21lIG1vcmU6Cj4gPiAKPiA+ IEdpdmVuIHRoYXQgb24gYSBEQVggZmlsZSB0aGVyZSBzaG91bGRuJ3QgYmUgYW55IGxvbmctdGVy bSBwYWdlCj4gPiByZWZlcmVuY2VzIGFmdGVyIHdlIHVubWFwIGl0IGZyb20gdGhlIHBhZ2UgdGFi bGUgYW5kIGRvbid0IGFsbG93Cj4gPiBnZXRfdXNlcl9wYWdlcyBjYWxscyB3aHkgbm90IHdhaXQg Zm9yIHRoZSByZWZlcmVuY2VzIGZvciBhbGwKPiA+IERBWCBwYWdlcyB0byBnbyBhd2F5IGZpcnN0 PytBS0FBb0EtRS5nLiBpZiB3ZSBmaW5kIGEgREFYIHBhZ2UgaW4KPiA+IHRydW5jYXRlX2lub2Rl X3BhZ2VzX3JhbmdlIHRoYXQgaGFzIGFuIGVsZXZhdGVkIHJlZmNvdW50IHdlIHNldAo+ID4gYSBu ZXcgZmxhZyB0byBwcmV2ZW50IG5ldyByZWZlcmVuY2VzIGZyb20gc2hvd2luZyB1cCwgYW5kIHRo ZW4KPiA+IHNpbXBseSB3YWl0IGZvciBpdCB0byBnbyBhd2F5LitBS0FBb0EtSW5zdGVhZCBvZiBh IGJ1c3kgd2F5IHdlIGNhbgo+ID4gZG8gdGhpcyB0aHJvdWdoIGEgZmV3IGhhc2hlZCB3YWl0cXVl dWVkIGluIGRldl9wYWdlbWFwLitBS0FBb0EtQW5kIGluCj4gPiBmYWN0IHB1dF96b25lX2Rldmlj ZV9wYWdlIGFscmVhZHkgZ2V0cyBjYWxsZWQgd2hlbiBwdXR0aW5nIHRoZQo+ID4gbGFzdCBwYWdl IHNvIHdlIGNhbiBoYW5kbGUgdGhlIHdha2V1cCBmcm9tIHRoZXJlLgo+ID4gCj4gPiBJbiBmYWN0 IGlmIHdlIGNhbid0IGZpbmQgYSBwYWdlIGZsYWcgZm9yIHRoZSBzdG9wIG5ldyBjYWxsZXJzCj4g PiB0aGluZ3Mgd2UgY291bGQgcHJvYmFibHkgY29tZSB1cCB3aXRoIGEgd2F5IHRvIGRvIHRoYXQg 
dGhyb3VnaAo+ID4gZGV2X3BhZ2VtYXAgc29tZWhvdywgYnV0IEknbSBub3Qgc3VyZSBob3cgZWZm aWNpZW50IHRoYXQgd291bGQKPiA+IGJlLgo+IAo+IFdlIHdlcmUgdGFsa2luZyBhYm91dCB0aGlz IHllc3RlcmRheSB3aXRoIERhbiBzbyBzb21lIG1vcmUgYnJhaW5zdG9ybWluZwo+IGZyb20gdXMu IFdlIGNhbiBpbXBsZW1lbnQgdGhlIHNvbHV0aW9uIHdpdGggZXh0ZW50IGJ1c3kgbGlzdCBpbiBl eHQ0Cj4gcmVsYXRpdmVseSBlYXNpbHkgLSB3ZSBhbHJlYWR5IGhhdmUgc3VjaCBsaXN0IGN1cnJl bnRseSBzaW1pbGFybHkgdG8gWEZTLgo+IFRoZXJlIHdvdWxkIGJlIHNvbWUgbW9kaWZpY2F0aW9u cyBuZWVkZWQgYnV0IG5vdGhpbmcgdG9vIGNvbXBsZXguIFRoZQo+IGJpZ2dlc3QgZG93bnNpZGUg b2YgdGhpcyBzb2x1dGlvbiBJIHNlZSBpcyB0aGF0IGl0IHJlcXVpcmVzIHBlci1maWxlc3lzdGVt Cj4gc29sdXRpb24gZm9yIGJ1c3kgZXh0ZW50cyAtIGV4dDQgYW5kIFhGUyBhcmUgcmVhc29uYWJs eSBmaW5lLCBob3dldmVyIGJ0cmZzCj4gbWF5IGhhdmUgcHJvYmxlbXMgYW5kIGV4dDIgZGVmaW5p dGVseSB3aWxsIG5lZWQgc29tZSBtb2RpZmljYXRpb25zLgo+IEludmlzaWJsZSB1c2VkIGJsb2Nr cyBtYXkgYmUgc3VycHJpc2luZyB0byB1c2VycyBhdCB0aW1lcyBhbHRob3VnaCBnaXZlbgo+IHBh Z2UgcmVmcyBzaG91bGQgYmUgcmVsYXRpdmVseSBzaG9ydCB0ZXJtLCB0aGF0IHNob3VsZCBub3Qg YmUgYSBiaWcgaXNzdWUuCj4gQnV0IGFyZSB3ZSBndWFyYW50ZWVkIHBhZ2UgcmVmcyBhcmUgc2hv cnQgdGVybT8gRS5nLiBpZiBzb21lb25lIGNyZWF0ZXMKPiB2NGwyIHZpZGVvYnVmIGluIE1BUF9T SEFSRUQgbWFwcGluZyBvZiBhIGZpbGUgb24gREFYIGZpbGVzeXN0ZW0sIHBhZ2UgcmVmcwo+IGNh biBiZSByYXRoZXIgbG9uZy10ZXJtIHNpbWlsYXJseSBhcyBpbiBSRE1BIGNhc2UuIEFsc28gZnJl ZWluZyBvZiBibG9ja3MKPiBvbiBwYWdlIHJlZmVyZW5jZSBkcm9wIGlzIGFub3RoZXIgYXN5bmMg ZW50cnkgcG9pbnQgaW50byB0aGUgZmlsZXN5c3RlbQo+IHdoaWNoIGNvdWxkIHVucGxlYXNhbnRs eSBzdXJwcmlzZSB1cyBidXQgSSBndWVzcyB3b3JrcXVldWVzIHdvdWxkIHNvbHZlCj4gdGhhdCBy ZWFzb25hYmx5IGZpbmUuCj4gCj4gV1JUIHdhaXRpbmcgZm9yIHBhZ2UgcmVmcyB0byBiZSBkcm9w cGVkIGJlZm9yZSBwcm9jZWVkaW5nIHdpdGggdHJ1bmNhdGUgKG9yCj4gcHVuY2ggaG9sZSBmb3Ig dGhhdCBtYXR0ZXIgLSB0aGF0IGNhc2UgaXMgZXZlbiBuYXN0aWVyIHNpbmNlIHdlIGRvbid0IGhh dmUKPiBpX3NpemUgdG8gZ3VhcmQgdXMpLiBXaGF0IEkgbGlrZSBhYm91dCB0aGlzIHNvbHV0aW9u IGlzIHRoYXQgaXQgaXMgdmVyeQo+IHZpc2libGUgdGhlcmUncyBzb21ldGhpbmcgdW51c3VhbCBn 
b2luZyBvbiB3aXRoIHRoZSBmaWxlIGJlaW5nIHRydW5jYXRlZCAvCj4gcHVuY2hlZCBhbmQgc28g cHJvYmxlbXMgYXJlIGVhc2llciB0byBkaWFnbm9zZSAvIGZpeCBmcm9tIHRoZSBhZG1pbiBzaWRl Lgo+IFNvIGZhciB3ZSBoYXZlIGd1YXJkZWQgaG9sZSBwdW5jaGluZyBmcm9tIGNvbmN1cnJlbnQg ZmF1bHRzIChhbmQKPiBnZXRfdXNlcl9wYWdlcygpIGRvZXMgZmF1bHQgb25jZSB5b3UgZG8gdW5t YXBfbWFwcGluZ19yYW5nZSgpKSB3aXRoCj4gSV9NTUFQX0xPQ0sgKG9yIGl0cyBlcXVpdmFsZW50 IGluIGV4dDQpLiBXZSBjYW5ub3QgZWFzaWx5IHdhaXQgZm9yIHBhZ2UKPiByZWZzIHRvIGJlIGRy b3BwZWQgdW5kZXIgSV9NTUFQX0xPQ0sgYXMgdGhhdCBjb3VsZCBkZWFkbG9jayAtIHRoZSBtb3N0 Cj4gb2J2aW91cyBjYXNlIERhbiBjYW1lIHVwIHdpdGggaXMgd2hlbiBHVVAgb2J0YWlucyByZWYg dG8gcGFnZSBBLCB0aGVuIGhvbGUKPiBwdW5jaCBjb21lcyBncmFiYmluZyBJX01NQVBfTE9DSyBh bmQgd2FpdGluZyBmb3IgcGFnZSByZWYgb24gQSB0byBiZQo+IGRyb3BwZWQsIGFuZCB0aGVuIEdV UCBibG9ja3Mgb24gdHJ5aW5nIHRvIGZhdWx0IGluIGFub3RoZXIgcGFnZS4KPiAKPiBJIHRoaW5r IHdlIGNhbm5vdCBlYXNpbHkgcHJldmVudCBuZXcgcGFnZSByZWZlcmVuY2VzIHRvIGJlIGdyYWJi ZWQgYXMgeW91Cj4gd3JpdGUgYWJvdmUgc2luY2Ugbm9ib2R5IGV4cGVjdHMgc3R1ZmYgbGlrZSBn ZXRfcGFnZSgpIHRvIGZhaWwuIEJ1dCBJK0FLQQo+IHRoaW5rIHRoYXQgdW5tYXBwaW5nIHJlbGV2 YW50IHBhZ2VzIGFuZCB0aGVuIHByZXZlbnRpbmcgdGhlbSB0byBiZSBmYXVsdGVkCj4gaW4gYWdh aW4gaXMgd29ya2FibGUgYW5kIHN0b3BzIEdVUCBhcyB3ZWxsLiBUaGUgcHJvYmxlbSB3aXRoIHRo YXQgaXMgdGhvdWdoCj4gd2hhdCB0byBkbyB3aXRoIHBhZ2UgZmF1bHRzIHRvIHN1Y2ggcGFnZXMg LSB5b3UgY2Fubm90IGp1c3QgZmFpbCB0aGVtIGZvcgo+IGhvbGUgcHVuY2gsIGFuZCB5b3UgY2Fu bm90IGVhc2lseSBhbGxvY2F0ZSBuZXcgYmxvY2tzIGVpdGhlci4gU28gd2UgYXJlCj4gYmFjayBh dCBhIHNpdHVhdGlvbiB3aGVyZSB3ZSBuZWVkIHRvIGRldGFjaCBibG9ja3MgZnJvbSB0aGUgaW5v ZGUgYW5kIHRoZW4KPiB3YWl0IGZvciBwYWdlIHJlZnMgdG8gYmUgZHJvcHBlZCAtIHNvIHNvbWUg Zm9ybSBvZiBidXN5IGV4dGVudHMuIEFtIEkKPiBtaXNzaW5nIHNvbWV0aGluZz8KPiAKCk5vLCB0 aGF0J3MgYSBnb29kIHN1bW1hcnkgb2Ygd2hhdCB3ZSB0YWxrZWQgYWJvdXQuIEhvd2V2ZXIsIEkg ZGlkIGdvCmJhY2sgYW5kIGdpdmUgdGhlIG5ldyBsb2NrIGFwcHJvYWNoIGEgdHJ5IGFuZCB3YXMg YWJsZSB0byBnZXQgbXkgdGVzdAp0byBwYXNzLiBUaGUgbmV3IGxvY2tpbmcgaXMgbm90IHByZXR0 
eSBlc3BlY2lhbGx5IHNpbmNlIHlvdSBuZWVkIHRvCmRyb3AgYW5kIHJlYWNxdWlyZSB0aGUgbG9j ayBzbyB0aGF0IGdldF91c2VyX3BhZ2VzKCkgY2FuIGZpbmlzaApncmFiYmluZyBhbGwgdGhlIHBh Z2VzIGl0IG5lZWRzLiBIZXJlIGFyZSB0aGUgdHdvIHByaW1hcnkgcGF0Y2hlcyBpbgp0aGUgc2Vy aWVzLCBkbyB5b3UgdGhpbmsgdGhlIGV4dGVudC1idXN5IGFwcHJvYWNoIHdvdWxkIGJlIGNsZWFu ZXI/CgotLS0KCmNvbW1pdCA1MDIzZDIwYTBhYTc5NWRkYWZkNDM2NTViZTFiZmIyY2JjN2Y0NDQ1 CkF1dGhvcjogRGFuIFdpbGxpYW1zIDxkYW4uai53aWxsaWFtc0BpbnRlbC5jb20+CkRhdGU6ICAg V2VkIE9jdCAyNSAwNToxNDo1NCAyMDE3IC0wNzAwCgogICAgbW0sIGRheDogaGFuZGxlIHRydW5j YXRlIG9mIGRtYS1idXN5IHBhZ2VzCiAgICAKICAgIGdldF91c2VyX3BhZ2VzKCkgcGlucyBmaWxl IGJhY2tlZCBtZW1vcnkgcGFnZXMgZm9yIGFjY2VzcyBieSBkbWEKICAgIGRldmljZXMuIEhvd2V2 ZXIsIGl0IG9ubHkgcGlucyB0aGUgbWVtb3J5IHBhZ2VzIG5vdCB0aGUgcGFnZS10by1maWxlCiAg ICBvZmZzZXQgYXNzb2NpYXRpb24uIElmIGEgZmlsZSBpcyB0cnVuY2F0ZWQgdGhlIHBhZ2VzIGFy ZSBtYXBwZWQgb3V0IG9mCiAgICB0aGUgZmlsZSBhbmQgZG1hIG1heSBjb250aW51ZSBpbmRlZmlu aXRlbHkgaW50byBhIHBhZ2UgdGhhdCBpcyBvd25lZCBieQogICAgYSBkZXZpY2UgZHJpdmVyLiBU aGlzIGJyZWFrcyBjb2hlcmVuY3kgb2YgdGhlIGZpbGUgdnMgZG1hLCBidXQgdGhlCiAgICBhc3N1 bXB0aW9uIGlzIHRoYXQgaWYgdXNlcnNwYWNlIHdhbnRzIHRoZSBmaWxlLXNwYWNlIHRydW5jYXRl ZCBpdCBkb2VzCiAgICBub3QgbWF0dGVyIHdoYXQgZGF0YSBpcyBpbmJvdW5kIGZyb20gdGhlIGRl dmljZSwgaXQgaXMgbm90IHJlbGV2YW50CiAgICBhbnltb3JlLgogICAgCiAgICBUaGUgYXNzdW1w dGlvbnMgb2YgdGhlIHRydW5jYXRlLXBhZ2UtY2FjaGUgbW9kZWwgYXJlIGJyb2tlbiBieSBEQVgg d2hlcmUKICAgIHRoZSB0YXJnZXQgRE1BIHBhZ2UgKmlzKiB0aGUgZmlsZXN5c3RlbSBibG9jay4g TGVhdmluZyB0aGUgcGFnZSBwaW5uZWQKICAgIGZvciBETUEsIGJ1dCB0cnVuY2F0aW5nIHRoZSBm aWxlIGJsb2NrIG91dCBvZiB0aGUgZmlsZSwgbWVhbnMgdGhhdCB0aGUKICAgIGZpbGVzeXRlbSBp cyBmcmVlIHRvIHJlYWxsb2NhdGUgYSBibG9jayB1bmRlciBhY3RpdmUgRE1BIHRvIGFub3RoZXIK ICAgIGZpbGUhCiAgICAKICAgIEhlcmUgYXJlIHNvbWUgcG9zc2libGUgb3B0aW9ucyBmb3IgZml4 aW5nIHRoaXMgc2l0dWF0aW9uICgndHJ1bmNhdGUnIGFuZAogICAgJ2ZhbGxvY2F0ZShwdW5jaCBo b2xlKScgYXJlIHN5bm9ueW1vdXMgYmVsb3cpOgogICAgCiAgICAgICAgMS8gRmFpbCB0cnVuY2F0 
ZSB3aGlsZSBhbnkgZmlsZSBibG9ja3MgbWlnaHQgYmUgdW5kZXIgZG1hCiAgICAKICAgICAgICAy LyBCbG9jayAoc2xlZXAtd2FpdCkgdHJ1bmNhdGUgd2hpbGUgYW55IGZpbGUgYmxvY2tzIG1pZ2h0 IGJlIHVuZGVyCiAgICAgICAgICAgZG1hCiAgICAKICAgICAgICAzLyBSZW1hcCBmaWxlIGJsb2Nr cyB0byBhICJsb3N0Ky1mb3VuZCItbGlrZSBmaWxlLWlub2RlIHdoZXJlCiAgICAgICAgICAgZG1h IGNhbiBjb250aW51ZSBhbmQgd2UgbWlnaHQgc2VlIHdoYXQgaW5ib3VuZCBkYXRhIGZyb20gRE1B IHdhcwogICAgICAgICAgIG1hcHBlZCBvdXQgb2YgdGhlIG9yaWdpbmFsIGZpbGUuIEJsb2NrcyBp biB0aGlzIGZpbGUgY291bGQgYmUKICAgICAgICAgICBmcmVlZCBiYWNrIHRvIHRoZSBmaWxlc3lz dGVtIHdoZW4gZG1hIGV2ZW50dWFsbHkgZW5kcy4KICAgIAogICAgICAgIDQvIExpc3QgdGhlIGJs b2NrcyB1bmRlciBETUEgaW4gdGhlIGV4dGVudCBidXN5IGxpc3QgYW5kIGVpdGhlciBob2xkCiAg ICAgICAgICAgb2ZmIGNvbW1pdCBvZiB0aGUgdHJ1bmNhdGUgdHJhbnNhY3Rpb24gdW50aWwgY29t bWl0LCBvciBvdGhlcndpc2UKICAgICAgICAgICBrZWVwIHRoZSBibG9ja3MgbWFya2VkIGJ1c3kg c28gdGhlIGFsbG9jYXRvciBkb2VzIG5vdCByZXVzZSB0aGVtCiAgICAgICAgICAgdW50aWwgRE1B IGNvbXBsZXRlcy4KICAgIAogICAgICAgIDUvIERpc2FibGUgZGF4IHVudGlsIG9wdGlvbiAzIG9y IGFub3RoZXIgbG9uZyB0ZXJtIHNvbHV0aW9uIGhhcyBiZWVuCiAgICAgICAgICAgaW1wbGVtZW50 ZWQuIEhvd2V2ZXIsIGZpbGVzeXN0ZW0tZGF4IGlzIHN0aWxsIG1hcmtlZCBleHBlcmltZW50YWwK ICAgICAgICAgICBmb3IgY29uY2VybnMgbGlrZSB0aGlzLgogICAgCiAgICBPcHRpb24gMSB3aWxs IHRocm93IGZhaWx1cmVzIHdoZXJlIHVzZXJzcGFjZSBoYXMgbmV2ZXIgZXhwZWN0ZWQgdGhlbQog ICAgYmVmb3JlLCBvcHRpb24gMiBtaWdodCBoYW5nIHRoZSB0cnVuY2F0aW5nIHByb2Nlc3MgaW5k ZWZpbml0ZWx5LCBhbmQKICAgIG9wdGlvbiAzIHJlcXVpcmVzIHBlciBmaWxlc3lzdGVtIGVuYWJs aW5nIHRvIHJlbWFwIGJsb2NrcyBmcm9tIG9uZSBpbm9kZQogICAgdG8gYW5vdGhlci4gIE9wdGlv biAyIGlzIGltcGxlbWVudGVkIGluIHRoaXMgcGF0Y2ggZm9yIHRoZSBEQVggcGF0aCB3aXRoCiAg ICB0aGUgZXhwZWN0YXRpb24gdGhhdCBub24tdHJhbnNpZW50IHVzZXJzIG9mIGdldF91c2VyX3Bh Z2VzKCkgKFJETUEpIGFyZQogICAgZGlzYWxsb3dlZCBmcm9tIHNldHRpbmcgdXAgZGF4IG1hcHBp bmdzIGFuZCB0aGF0IHRoZSBwb3RlbnRpYWwgZGVsYXkKICAgIGludHJvZHVjZWQgdG8gdGhlIHRy dW5jYXRlIHBhdGggaXMgYWNjZXB0YWJsZSBjb21wYXJlZCB0byB0aGUgcmVzcG9uc2UKICAgIHRp 
bWUgb2YgdGhlIHBhZ2UgY2FjaGUgY2FzZS4gVGhpcyBjYW4gb25seSBiZSBzZWVuIGFzIGEgc3Rv cC1nYXAgdW50aWwKICAgIHdlIGNhbiBzb2x2ZSB0aGUgcHJvYmxlbSBvZiBzYWZlbHkgc2VxdWVz dGVyaW5nIHVuYWxsb2NhdGVkIGZpbGVzeXN0ZW0KICAgIGJsb2NrcyB1bmRlciBhY3RpdmUgZG1h LgogICAgCiAgICBUaGUgc29sdXRpb24gaW50cm9kdWNlcyBhIG5ldyBpbm9kZSBzZW1hcGhvcmUg dGhhdCB0aGF0IGlzIGhlbGQKICAgIGV4Y2x1c2l2ZWx5IGZvciBnZXRfdXNlcl9wYWdlcygpIGFu ZCBoZWxkIGZvciByZWFkIGF0IHRydW5jYXRlIHdoaWxlCiAgICBzbGVlcC13YWl0aW5nIG9uIGEg aGFzaGVkIHdhaXRxdWV1ZS4KICAgIAogICAgQ3JlZGl0IGZvciBvcHRpb24gMyBnb2VzIHRvIERh dmUgSGFuc2VuLCB3aG8gcHJvcG9zZWQgc29tZXRoaW5nIHNpbWlsYXIKICAgIGFzIGFuIGFsdGVy bmF0aXZlIHdheSB0byBzb2x2ZSB0aGUgcHJvYmxlbSB0aGF0IE1BUF9ESVJFQ1Qgd2FzIHRyeWlu ZyB0bwogICAgc29sdmUuIENyZWRpdCBmb3Igb3B0aW9uIDQgZ29lcyB0byBDaHJpc3RvcGggSGVs bHdpZy4KICAgIAogICAgQ2M6IEphbiBLYXJhIDxqYWNrQHN1c2UuY3o+CiAgICBDYzogSmVmZiBN b3llciA8am1veWVyQHJlZGhhdC5jb20+CiAgICBDYzogRGF2ZSBDaGlubmVyIDxkYXZpZEBmcm9t b3JiaXQuY29tPgogICAgQ2M6IE1hdHRoZXcgV2lsY294IDxtYXdpbGNveEBtaWNyb3NvZnQuY29t PgogICAgQ2M6IEFsZXhhbmRlciBWaXJvIDx2aXJvQHplbml2LmxpbnV4Lm9yZy51az4KICAgIENj OiAiRGFycmljayBKLiBXb25nIiA8ZGFycmljay53b25nQG9yYWNsZS5jb20+CiAgICBDYzogUm9z cyBad2lzbGVyIDxyb3NzLnp3aXNsZXJAbGludXguaW50ZWwuY29tPgogICAgQ2M6IERhdmUgSGFu c2VuIDxkYXZlLmhhbnNlbkBsaW51eC5pbnRlbC5jb20+CiAgICBDYzogQW5kcmV3IE1vcnRvbiA8 YWtwbUBsaW51eC1mb3VuZGF0aW9uLm9yZz4KICAgIFJlcG9ydGVkLWJ5OiBDaHJpc3RvcGggSGVs bHdpZyA8aGNoQGxzdC5kZT4KICAgIFNpZ25lZC1vZmYtYnk6IERhbiBXaWxsaWFtcyA8ZGFuLmou d2lsbGlhbXNAaW50ZWwuY29tPgoKZGlmZiAtLWdpdCBhL2RyaXZlcnMvZGF4L3N1cGVyLmMgYi9k cml2ZXJzL2RheC9zdXBlci5jCmluZGV4IDRhYzM1OWUxNDc3Ny4uYTVhNGI5NWZmZGFmIDEwMDY0 NAotLS0gYS9kcml2ZXJzL2RheC9zdXBlci5jCistKy0rLSBiL2RyaXZlcnMvZGF4L3N1cGVyLmMK QEAgLTE2Nyw2ICstMTY3LDcgQEAgc3RydWN0IGRheF9kZXZpY2UgewogI2lmIElTX0VOQUJMRUQo Q09ORklHX0ZTX0RBWCkKIHN0YXRpYyB2b2lkIGdlbmVyaWNfZGF4X3BhZ2VmcmVlKHN0cnVjdCBw YWdlICpwYWdlLCB2b2lkICpkYXRhKQogeworLQl3YWtlX3VwX2Rldm1hcF9pZGxlKCZwYWdlLT5f 
cmVmY291bnQpOwogfQogCiBzdHJ1Y3QgZGF4X2RldmljZSAqZnNfZGF4X2NsYWltX2JkZXYoc3Ry dWN0IGJsb2NrX2RldmljZSAqYmRldiwgdm9pZCAqb3duZXIpCmRpZmYgLS1naXQgYS9mcy9kYXgu YyBiL2ZzL2RheC5jCmluZGV4IGZkNWQzODU5ODhkMS4uZjJjOThmOWNiODMzIDEwMDY0NAotLS0g YS9mcy9kYXguYworLSstKy0gYi9mcy9kYXguYwpAQCAtMzQ2LDYgKy0zNDYsMTkgQEAgc3RhdGlj IHZvaWQgZGF4X2Rpc2Fzc29jaWF0ZV9lbnRyeSh2b2lkICplbnRyeSwgc3RydWN0IGlub2RlICpp bm9kZSwgYm9vbCB0cnVuYykKIAl9CiB9CiAKKy1zdGF0aWMgc3RydWN0IHBhZ2UgKmRtYV9idXN5 X3BhZ2Uodm9pZCAqZW50cnkpCisteworLQl1bnNpZ25lZCBsb25nIHBmbiwgZW5kX3BmbjsKKy0K Ky0JZm9yX2VhY2hfZW50cnlfcGZuKGVudHJ5LCBwZm4sIGVuZF9wZm4pIHsKKy0JCXN0cnVjdCBw YWdlICpwYWdlID0gcGZuX3RvX3BhZ2UocGZuKTsKKy0KKy0JCWlmIChwYWdlX3JlZl9jb3VudChw YWdlKSA+IDEpCistCQkJcmV0dXJuIHBhZ2U7CistCX0KKy0JcmV0dXJuIE5VTEw7CistfQorLQog LyoKICAqIEZpbmQgcmFkaXggdHJlZSBlbnRyeSBhdCBnaXZlbiBpbmRleC4gSWYgaXQgcG9pbnRz IHRvIGFuIGV4Y2VwdGlvbmFsIGVudHJ5LAogICogcmV0dXJuIGl0IHdpdGggdGhlIHJhZGl4IHRy ZWUgZW50cnkgbG9ja2VkLiBJZiB0aGUgcmFkaXggdHJlZSBkb2Vzbid0CkBAIC00ODcsNiArLTUw MCw5NyBAQCBzdGF0aWMgdm9pZCAqZ3JhYl9tYXBwaW5nX2VudHJ5KHN0cnVjdCBhZGRyZXNzX3Nw YWNlICptYXBwaW5nLCBwZ29mZl90IGluZGV4LAogCXJldHVybiBlbnRyeTsKIH0KIAorLXN0YXRp YyBpbnQgd2FpdF9wYWdlKGF0b21pY190ICpfcmVmY291bnQpCisteworLQlzdHJ1Y3QgcGFnZSAq cGFnZSA9IGNvbnRhaW5lcl9vZihfcmVmY291bnQsIHN0cnVjdCBwYWdlLCBfcmVmY291bnQpOwor LQlzdHJ1Y3QgaW5vZGUgKmlub2RlID0gcGFnZS0+aW5vZGU7CistCistCWlmIChwYWdlX3JlZl9j b3VudChwYWdlKSA9PSAxKQorLQkJcmV0dXJuIDA7CistCistCWlfZGF4ZG1hX3VubG9ja19zaGFy ZWQoaW5vZGUpOworLQlzY2hlZHVsZSgpOworLQlpX2RheGRtYV9sb2NrX3NoYXJlZChpbm9kZSk7 CistCistCS8qCistCSAqIGlmIHdlIGJvdW5jZWQgdGhlIGRheGRtYV9sb2NrIHRoZW4gd2UgbmVl ZCB0byByZXNjYW4gdGhlCistCSAqIHRydW5jYXRlIGFyZWEuCistCSAqLworLQlyZXR1cm4gMTsK Ky19CistCistdm9pZCBkYXhfd2FpdF9kbWEoc3RydWN0IGFkZHJlc3Nfc3BhY2UgKm1hcHBpbmcs IGxvZmZfdCBsc3RhcnQsIGxvZmZfdCBsZW4pCisteworLQlzdHJ1Y3QgaW5vZGUgKmlub2RlID0g bWFwcGluZy0+aG9zdDsKKy0JcGdvZmZfdAlpbmRpY2VzW1BBR0VWRUNfU0laRV07CistCXBnb2Zm 
X3QJc3RhcnQsIGVuZCwgaW5kZXg7CistCXN0cnVjdCBwYWdldmVjIHB2ZWM7CistCXVuc2lnbmVk IGk7CistCistCWxvY2tkZXBfYXNzZXJ0X2hlbGQoJmlub2RlLT5pX2RheF9kbWFzZW0pOworLQor LQlpZiAobHN0YXJ0IDwgMCB8fCBsZW4gPCAtMSkKKy0JCXJldHVybjsKKy0KKy0JLyogaW4gdGhl IGxpbWl0ZWQgY2FzZSBnZXRfdXNlcl9wYWdlcyBmb3IgZGF4IGlzIGRpc2FibGVkICovCistCWlm IChJU19FTkFCTEVEKENPTkZJR19GU19EQVhfTElNSVRFRCkpCistCQlyZXR1cm47CistCistCWlm ICghZGF4X21hcHBpbmcobWFwcGluZykpCistCQlyZXR1cm47CistCistCWlmIChtYXBwaW5nLT5u cmV4Y2VwdGlvbmFsID09IDApCistCQlyZXR1cm47CistCistCWlmIChsZW4gPT0gLTEpCistCQll bmQgPSAtMTsKKy0JZWxzZQorLQkJZW5kID0gKGxzdGFydCArLSBsZW4pID4+IFBBR0VfU0hJRlQ7 CistCXN0YXJ0ID0gbHN0YXJ0ID4+IFBBR0VfU0hJRlQ7CistCistcmV0cnk6CistCXBhZ2V2ZWNf aW5pdCgmcHZlYywgMCk7CistCWluZGV4ID0gc3RhcnQ7CistCXdoaWxlIChpbmRleCA8IGVuZCAm JiBwYWdldmVjX2xvb2t1cF9lbnRyaWVzKCZwdmVjLCBtYXBwaW5nLCBpbmRleCwKKy0JCQkJbWlu KGVuZCAtIGluZGV4LCAocGdvZmZfdClQQUdFVkVDX1NJWkUpLAorLQkJCQlpbmRpY2VzKSkgewor LQkJZm9yIChpID0gMDsgaSA8IHBhZ2V2ZWNfY291bnQoJnB2ZWMpOyBpKy0rLSkgeworLQkJCXN0 cnVjdCBwYWdlICpwdmVjX2VudCA9IHB2ZWMucGFnZXNbaV07CistCQkJc3RydWN0IHBhZ2UgKnBh Z2UgPSBOVUxMOworLQkJCXZvaWQgKmVudHJ5OworLQorLQkJCWluZGV4ID0gaW5kaWNlc1tpXTsK Ky0JCQlpZiAoaW5kZXggPj0gZW5kKQorLQkJCQlicmVhazsKKy0KKy0JCQlpZiAoIXJhZGl4X3Ry ZWVfZXhjZXB0aW9uYWxfZW50cnkocHZlY19lbnQpKQorLQkJCQljb250aW51ZTsKKy0KKy0JCQlz cGluX2xvY2tfaXJxKCZtYXBwaW5nLT50cmVlX2xvY2spOworLQkJCWVudHJ5ID0gZ2V0X3VubG9j a2VkX21hcHBpbmdfZW50cnkobWFwcGluZywgaW5kZXgsIE5VTEwpOworLQkJCWlmIChlbnRyeSkK Ky0JCQkJcGFnZSA9IGRtYV9idXN5X3BhZ2UoZW50cnkpOworLQkJCXB1dF91bmxvY2tlZF9tYXBw aW5nX2VudHJ5KG1hcHBpbmcsIGluZGV4LCBlbnRyeSk7CistCQkJc3Bpbl91bmxvY2tfaXJxKCZt YXBwaW5nLT50cmVlX2xvY2spOworLQorLQkJCWlmIChwYWdlICYmIHdhaXRfb25fZGV2bWFwX2lk bGUoJnBhZ2UtPl9yZWZjb3VudCwKKy0JCQkJCQl3YWl0X3BhZ2UsCistCQkJCQkJVEFTS19VTklO VEVSUlVQVElCTEUpICE9IDApIHsKKy0JCQkJLyoKKy0JCQkJICogV2UgZHJvcHBlZCB0aGUgZG1h IGxvY2ssIHNvIHdlIG5lZWQKKy0JCQkJICogdG8gcmV2YWxpZGF0ZSB0aGF0IHByZXZpb3VzbHkg 
c2VlbgorLQkJCQkgKiBpZGxlIHBhZ2VzIGFyZSBzdGlsbCBpZGxlLgorLQkJCQkgKi8KKy0JCQkJ Z290byByZXRyeTsKKy0JCQl9CistCQl9CistCQlwYWdldmVjX3JlbW92ZV9leGNlcHRpb25hbHMo JnB2ZWMpOworLQkJcGFnZXZlY19yZWxlYXNlKCZwdmVjKTsKKy0JCWluZGV4Ky0rLTsKKy0JfQor LX0KKy1FWFBPUlRfU1lNQk9MX0dQTChkYXhfd2FpdF9kbWEpOworLQogc3RhdGljIGludCBfX2Rh eF9pbnZhbGlkYXRlX21hcHBpbmdfZW50cnkoc3RydWN0IGFkZHJlc3Nfc3BhY2UgKm1hcHBpbmcs CiAJCQkJCSAgcGdvZmZfdCBpbmRleCwgYm9vbCB0cnVuYykKIHsKQEAgLTUwOSw4ICstNjEzLDEw IEBAIHN0YXRpYyBpbnQgX19kYXhfaW52YWxpZGF0ZV9tYXBwaW5nX2VudHJ5KHN0cnVjdCBhZGRy ZXNzX3NwYWNlICptYXBwaW5nLAogb3V0OgogCXB1dF91bmxvY2tlZF9tYXBwaW5nX2VudHJ5KG1h cHBpbmcsIGluZGV4LCBlbnRyeSk7CiAJc3Bpbl91bmxvY2tfaXJxKCZtYXBwaW5nLT50cmVlX2xv Y2spOworLQogCXJldHVybiByZXQ7CiB9CistCiAvKgogICogRGVsZXRlIGV4Y2VwdGlvbmFsIERB WCBlbnRyeSBhdCBAaW5kZXggZnJvbSBAbWFwcGluZy4gV2FpdCBmb3IgcmFkaXggdHJlZQogICog ZW50cnkgdG8gZ2V0IHVubG9ja2VkIGJlZm9yZSBkZWxldGluZyBpdC4KZGlmZiAtLWdpdCBhL2Zz L2lub2RlLmMgYi9mcy9pbm9kZS5jCmluZGV4IGQxZTM1YjUzYmIyMy4uOTU0MDhlODdhOTZjIDEw MDY0NAotLS0gYS9mcy9pbm9kZS5jCistKy0rLSBiL2ZzL2lub2RlLmMKQEAgLTE5Miw2ICstMTky LDcgQEAgaW50IGlub2RlX2luaXRfYWx3YXlzKHN0cnVjdCBzdXBlcl9ibG9jayAqc2IsIHN0cnVj dCBpbm9kZSAqaW5vZGUpCiAJaW5vZGUtPmlfZnNub3RpZnlfbWFzayA9IDA7CiAjZW5kaWYKIAlp bm9kZS0+aV9mbGN0eCA9IE5VTEw7CistCWlfZGF4ZG1hX2luaXQoaW5vZGUpOwogCXRoaXNfY3B1 X2luYyhucl9pbm9kZXMpOwogCiAJcmV0dXJuIDA7CmRpZmYgLS1naXQgYS9pbmNsdWRlL2xpbnV4 L2RheC5oIGIvaW5jbHVkZS9saW51eC9kYXguaAppbmRleCBlYTIxZWJmZDE4ODkuLjZjZTFjNTA1 MTllNyAxMDA2NDQKLS0tIGEvaW5jbHVkZS9saW51eC9kYXguaAorLSstKy0gYi9pbmNsdWRlL2xp bnV4L2RheC5oCkBAIC0xMDAsMTAgKy0xMDAsMTUgQEAgaW50IGRheF9pbnZhbGlkYXRlX21hcHBp bmdfZW50cnlfc3luYyhzdHJ1Y3QgYWRkcmVzc19zcGFjZSAqbWFwcGluZywKIAkJCQkgICAgICBw Z29mZl90IGluZGV4KTsKIAogI2lmZGVmIENPTkZJR19GU19EQVgKKy12b2lkIGRheF93YWl0X2Rt YShzdHJ1Y3QgYWRkcmVzc19zcGFjZSAqbWFwcGluZywgbG9mZl90IGxzdGFydCwgbG9mZl90IGxl bik7CiBpbnQgX19kYXhfemVyb19wYWdlX3JhbmdlKHN0cnVjdCBibG9ja19kZXZpY2UgKmJkZXYs 
CiAJCXN0cnVjdCBkYXhfZGV2aWNlICpkYXhfZGV2LCBzZWN0b3JfdCBzZWN0b3IsCiAJCXVuc2ln bmVkIGludCBvZmZzZXQsIHVuc2lnbmVkIGludCBsZW5ndGgpOwogI2Vsc2UKKy1zdGF0aWMgaW5s aW5lIHZvaWQgZGF4X3dhaXRfZG1hKHN0cnVjdCBhZGRyZXNzX3NwYWNlICptYXBwaW5nLCBsb2Zm X3QgbHN0YXJ0LAorLQkJbG9mZl90IGxlbikKKy17CistfQogc3RhdGljIGlubGluZSBpbnQgX19k YXhfemVyb19wYWdlX3JhbmdlKHN0cnVjdCBibG9ja19kZXZpY2UgKmJkZXYsCiAJCXN0cnVjdCBk YXhfZGV2aWNlICpkYXhfZGV2LCBzZWN0b3JfdCBzZWN0b3IsCiAJCXVuc2lnbmVkIGludCBvZmZz ZXQsIHVuc2lnbmVkIGludCBsZW5ndGgpCmRpZmYgLS1naXQgYS9pbmNsdWRlL2xpbnV4L2ZzLmgg Yi9pbmNsdWRlL2xpbnV4L2ZzLmgKaW5kZXggMTNkYWIxOTFhMjNlLi5jZDViNGEwOTJkMWMgMTAw NjQ0Ci0tLSBhL2luY2x1ZGUvbGludXgvZnMuaAorLSstKy0gYi9pbmNsdWRlL2xpbnV4L2ZzLmgK QEAgLTY0NSw2ICstNjQ1LDkgQEAgc3RydWN0IGlub2RlIHsKICNpZmRlZiBDT05GSUdfSU1BCiAJ YXRvbWljX3QJCWlfcmVhZGNvdW50OyAvKiBzdHJ1Y3QgZmlsZXMgb3BlbiBSTyAqLwogI2VuZGlm CistI2lmZGVmIENPTkZJR19GU19EQVgKKy0Jc3RydWN0IHJ3X3NlbWFwaG9yZQlpX2RheF9kbWFz ZW07CistI2VuZGlmCiAJY29uc3Qgc3RydWN0IGZpbGVfb3BlcmF0aW9ucwkqaV9mb3A7CS8qIGZv cm1lciAtPmlfb3AtPmRlZmF1bHRfZmlsZV9vcHMgKi8KIAlzdHJ1Y3QgZmlsZV9sb2NrX2NvbnRl eHQJKmlfZmxjdHg7CiAJc3RydWN0IGFkZHJlc3Nfc3BhY2UJaV9kYXRhOwpAQCAtNzQ3LDYgKy03 NTAsNTkgQEAgc3RhdGljIGlubGluZSB2b2lkIGlub2RlX2xvY2tfbmVzdGVkKHN0cnVjdCBpbm9k ZSAqaW5vZGUsIHVuc2lnbmVkIHN1YmNsYXNzKQogCWRvd25fd3JpdGVfbmVzdGVkKCZpbm9kZS0+ aV9yd3NlbSwgc3ViY2xhc3MpOwogfQogCistI2lmZGVmIENPTkZJR19GU19EQVgKKy1zdGF0aWMg aW5saW5lIHZvaWQgaV9kYXhkbWFfaW5pdChzdHJ1Y3QgaW5vZGUgKmlub2RlKQorLXsKKy0JaW5p dF9yd3NlbSgmaW5vZGUtPmlfZGF4X2RtYXNlbSk7CistfQorLQorLXN0YXRpYyBpbmxpbmUgdm9p ZCBpX2RheGRtYV9sb2NrKHN0cnVjdCBpbm9kZSAqaW5vZGUpCisteworLQlkb3duX3dyaXRlKCZp bm9kZS0+aV9kYXhfZG1hc2VtKTsKKy19CistCistc3RhdGljIGlubGluZSB2b2lkIGlfZGF4ZG1h X3VubG9jayhzdHJ1Y3QgaW5vZGUgKmlub2RlKQorLXsKKy0JdXBfd3JpdGUoJmlub2RlLT5pX2Rh eF9kbWFzZW0pOworLX0KKy0KKy1zdGF0aWMgaW5saW5lIHZvaWQgaV9kYXhkbWFfbG9ja19zaGFy ZWQoc3RydWN0IGlub2RlICppbm9kZSkKKy17CistCS8qCistCSAqIFRoZSB3cml0ZSBsb2NrIGlz 
IHRha2VuIHVuZGVyIG1tYXBfc2VtIGluIHRoZQorLQkgKiBnZXRfdXNlcl9wYWdlcygpIHBhdGgg dGhlIHJlYWQgbG9jayBuZXN0cyBpbiB0aGUgdHJ1bmNhdGUKKy0JICogcGF0aC4KKy0JICovCist I2RlZmluZSBEQVhETUFfVFJVTkNBVEVfQ0xBU1MgMQorLQlkb3duX3JlYWRfbmVzdGVkKCZpbm9k ZS0+aV9kYXhfZG1hc2VtLCBEQVhETUFfVFJVTkNBVEVfQ0xBU1MpOworLX0KKy0KKy1zdGF0aWMg aW5saW5lIHZvaWQgaV9kYXhkbWFfdW5sb2NrX3NoYXJlZChzdHJ1Y3QgaW5vZGUgKmlub2RlKQor LXsKKy0JdXBfcmVhZCgmaW5vZGUtPmlfZGF4X2RtYXNlbSk7CistfQorLSNlbHNlIC8qIENPTkZJ R19GU19EQVggKi8KKy1zdGF0aWMgaW5saW5lIHZvaWQgaV9kYXhkbWFfaW5pdChzdHJ1Y3QgaW5v ZGUgKmlub2RlKQorLXsKKy19CistCistc3RhdGljIGlubGluZSB2b2lkIGlfZGF4ZG1hX2xvY2so c3RydWN0IGlub2RlICppbm9kZSkKKy17CistfQorLQorLXN0YXRpYyBpbmxpbmUgdm9pZCBpX2Rh eGRtYV91bmxvY2soc3RydWN0IGlub2RlICppbm9kZSkKKy17CistfQorLQorLXN0YXRpYyBpbmxp bmUgdm9pZCBpX2RheGRtYV9sb2NrX3NoYXJlZChzdHJ1Y3QgaW5vZGUgKmlub2RlKQorLXsKKy19 CistCistc3RhdGljIGlubGluZSB2b2lkIGlfZGF4ZG1hX3VubG9ja19zaGFyZWQoc3RydWN0IGlu b2RlICppbm9kZSkKKy17CistfQorLSNlbmRpZiAvKiBDT05GSUdfRlNfREFYICovCistCiB2b2lk IGxvY2tfdHdvX25vbmRpcmVjdG9yaWVzKHN0cnVjdCBpbm9kZSAqLCBzdHJ1Y3QgaW5vZGUqKTsK IHZvaWQgdW5sb2NrX3R3b19ub25kaXJlY3RvcmllcyhzdHJ1Y3QgaW5vZGUgKiwgc3RydWN0IGlu b2RlKik7CiAKZGlmZiAtLWdpdCBhL2luY2x1ZGUvbGludXgvd2FpdF9iaXQuaCBiL2luY2x1ZGUv bGludXgvd2FpdF9iaXQuaAppbmRleCAxMmIyNjY2MGQ3ZTkuLjYxODZlY2RiOWRmNyAxMDA2NDQK LS0tIGEvaW5jbHVkZS9saW51eC93YWl0X2JpdC5oCistKy0rLSBiL2luY2x1ZGUvbGludXgvd2Fp dF9iaXQuaApAQCAtMzAsMTAgKy0zMCwxMiBAQCBpbnQgX193YWl0X29uX2JpdChzdHJ1Y3Qgd2Fp dF9xdWV1ZV9oZWFkICp3cV9oZWFkLCBzdHJ1Y3Qgd2FpdF9iaXRfcXVldWVfZW50cnkgKgogaW50 IF9fd2FpdF9vbl9iaXRfbG9jayhzdHJ1Y3Qgd2FpdF9xdWV1ZV9oZWFkICp3cV9oZWFkLCBzdHJ1 Y3Qgd2FpdF9iaXRfcXVldWVfZW50cnkgKndicV9lbnRyeSwgd2FpdF9iaXRfYWN0aW9uX2YgKmFj dGlvbiwgdW5zaWduZWQgaW50IG1vZGUpOwogdm9pZCB3YWtlX3VwX2JpdCh2b2lkICp3b3JkLCBp bnQgYml0KTsKIHZvaWQgd2FrZV91cF9hdG9taWNfdChhdG9taWNfdCAqcCk7Cistdm9pZCB3YWtl X3VwX2Rldm1hcF9pZGxlKGF0b21pY190ICpwKTsKIGludCBvdXRfb2ZfbGluZV93YWl0X29uX2Jp 
dCh2b2lkICp3b3JkLCBpbnQsIHdhaXRfYml0X2FjdGlvbl9mICphY3Rpb24sIHVuc2lnbmVkIGlu dCBtb2RlKTsKIGludCBvdXRfb2ZfbGluZV93YWl0X29uX2JpdF90aW1lb3V0KHZvaWQgKndvcmQs IGludCwgd2FpdF9iaXRfYWN0aW9uX2YgKmFjdGlvbiwgdW5zaWduZWQgaW50IG1vZGUsIHVuc2ln bmVkIGxvbmcgdGltZW91dCk7CiBpbnQgb3V0X29mX2xpbmVfd2FpdF9vbl9iaXRfbG9jayh2b2lk ICp3b3JkLCBpbnQsIHdhaXRfYml0X2FjdGlvbl9mICphY3Rpb24sIHVuc2lnbmVkIGludCBtb2Rl KTsKIGludCBvdXRfb2ZfbGluZV93YWl0X29uX2F0b21pY190KGF0b21pY190ICpwLCBpbnQgKCop KGF0b21pY190ICopLCB1bnNpZ25lZCBpbnQgbW9kZSk7CistaW50IG91dF9vZl9saW5lX3dhaXRf b25fZGV2bWFwX2lkbGUoYXRvbWljX3QgKnAsIGludCAoKikoYXRvbWljX3QgKiksIHVuc2lnbmVk IGludCBtb2RlKTsKIHN0cnVjdCB3YWl0X3F1ZXVlX2hlYWQgKmJpdF93YWl0cXVldWUodm9pZCAq d29yZCwgaW50IGJpdCk7CiBleHRlcm4gdm9pZCBfX2luaXQgd2FpdF9iaXRfaW5pdCh2b2lkKTsK IApAQCAtMjU4LDQgKy0yNjAsMTIgQEAgaW50IHdhaXRfb25fYXRvbWljX3QoYXRvbWljX3QgKnZh bCwgaW50ICgqYWN0aW9uKShhdG9taWNfdCAqKSwgdW5zaWduZWQgbW9kZSkKIAlyZXR1cm4gb3V0 X29mX2xpbmVfd2FpdF9vbl9hdG9taWNfdCh2YWwsIGFjdGlvbiwgbW9kZSk7CiB9CiAKKy1zdGF0 aWMgaW5saW5lCistaW50IHdhaXRfb25fZGV2bWFwX2lkbGUoYXRvbWljX3QgKnZhbCwgaW50ICgq YWN0aW9uKShhdG9taWNfdCAqKSwgdW5zaWduZWQgbW9kZSkKKy17CistCW1pZ2h0X3NsZWVwKCk7 CistCWlmIChhdG9taWNfcmVhZCh2YWwpID09IDEpCistCQlyZXR1cm4gMDsKKy0JcmV0dXJuIG91 dF9vZl9saW5lX3dhaXRfb25fZGV2bWFwX2lkbGUodmFsLCBhY3Rpb24sIG1vZGUpOworLX0KICNl bmRpZiAvKiBfTElOVVhfV0FJVF9CSVRfSCAqLwpkaWZmIC0tZ2l0IGEva2VybmVsL3NjaGVkL3dh aXRfYml0LmMgYi9rZXJuZWwvc2NoZWQvd2FpdF9iaXQuYwppbmRleCBmODE1OTY5OGFhNGQuLjZl YTkzMTQ5NjE0YSAxMDA2NDQKLS0tIGEva2VybmVsL3NjaGVkL3dhaXRfYml0LmMKKy0rLSstIGIv a2VybmVsL3NjaGVkL3dhaXRfYml0LmMKQEAgLTE2MiwxMSArLTE2MiwxNyBAQCBzdGF0aWMgaW5s aW5lIHdhaXRfcXVldWVfaGVhZF90ICphdG9taWNfdF93YWl0cXVldWUoYXRvbWljX3QgKnApCiAJ cmV0dXJuIGJpdF93YWl0cXVldWUocCwgMCk7CiB9CiAKLXN0YXRpYyBpbnQgd2FrZV9hdG9taWNf dF9mdW5jdGlvbihzdHJ1Y3Qgd2FpdF9xdWV1ZV9lbnRyeSAqd3FfZW50cnksIHVuc2lnbmVkIG1v ZGUsIGludCBzeW5jLAotCQkJCSAgdm9pZCAqYXJnKQorLXN0YXRpYyBpbmxpbmUgc3RydWN0IHdh 
aXRfYml0X3F1ZXVlX2VudHJ5ICp0b193YWl0X2JpdF9xKAorLQkJc3RydWN0IHdhaXRfcXVldWVf ZW50cnkgKndxX2VudHJ5KQorLXsKKy0JcmV0dXJuIGNvbnRhaW5lcl9vZih3cV9lbnRyeSwgc3Ry dWN0IHdhaXRfYml0X3F1ZXVlX2VudHJ5LCB3cV9lbnRyeSk7CistfQorLQorLXN0YXRpYyBpbnQg d2FrZV9hdG9taWNfdF9mdW5jdGlvbihzdHJ1Y3Qgd2FpdF9xdWV1ZV9lbnRyeSAqd3FfZW50cnks CistCQl1bnNpZ25lZCBtb2RlLCBpbnQgc3luYywgdm9pZCAqYXJnKQogewogCXN0cnVjdCB3YWl0 X2JpdF9rZXkgKmtleSA9IGFyZzsKLQlzdHJ1Y3Qgd2FpdF9iaXRfcXVldWVfZW50cnkgKndhaXRf Yml0ID0gY29udGFpbmVyX29mKHdxX2VudHJ5LCBzdHJ1Y3Qgd2FpdF9iaXRfcXVldWVfZW50cnks IHdxX2VudHJ5KTsKKy0Jc3RydWN0IHdhaXRfYml0X3F1ZXVlX2VudHJ5ICp3YWl0X2JpdCA9IHRv X3dhaXRfYml0X3Eod3FfZW50cnkpOwogCWF0b21pY190ICp2YWwgPSBrZXktPmZsYWdzOwogCiAJ aWYgKHdhaXRfYml0LT5rZXkuZmxhZ3MgIT0ga2V5LT5mbGFncyB8fApAQCAtMTc2LDE0ICstMTgy LDI5IEBAIHN0YXRpYyBpbnQgd2FrZV9hdG9taWNfdF9mdW5jdGlvbihzdHJ1Y3Qgd2FpdF9xdWV1 ZV9lbnRyeSAqd3FfZW50cnksIHVuc2lnbmVkIG1vCiAJcmV0dXJuIGF1dG9yZW1vdmVfd2FrZV9m dW5jdGlvbih3cV9lbnRyeSwgbW9kZSwgc3luYywga2V5KTsKIH0KIAorLXN0YXRpYyBpbnQgd2Fr ZV9kZXZtYXBfaWRsZV9mdW5jdGlvbihzdHJ1Y3Qgd2FpdF9xdWV1ZV9lbnRyeSAqd3FfZW50cnks CistCQl1bnNpZ25lZCBtb2RlLCBpbnQgc3luYywgdm9pZCAqYXJnKQorLXsKKy0Jc3RydWN0IHdh aXRfYml0X2tleSAqa2V5ID0gYXJnOworLQlzdHJ1Y3Qgd2FpdF9iaXRfcXVldWVfZW50cnkgKndh aXRfYml0ID0gdG9fd2FpdF9iaXRfcSh3cV9lbnRyeSk7CistCWF0b21pY190ICp2YWwgPSBrZXkt PmZsYWdzOworLQorLQlpZiAod2FpdF9iaXQtPmtleS5mbGFncyAhPSBrZXktPmZsYWdzIHx8Cist CSAgICB3YWl0X2JpdC0+a2V5LmJpdF9uciAhPSBrZXktPmJpdF9uciB8fAorLQkgICAgYXRvbWlj X3JlYWQodmFsKSAhPSAxKQorLQkJcmV0dXJuIDA7CistCXJldHVybiBhdXRvcmVtb3ZlX3dha2Vf ZnVuY3Rpb24od3FfZW50cnksIG1vZGUsIHN5bmMsIGtleSk7CistfQorLQogLyoKICAqIFRvIGFs bG93IGludGVycnVwdGlibGUgd2FpdGluZyBhbmQgYXN5bmNocm9ub3VzIChpLmUuIG5vbmJsb2Nr aW5nKSB3YWl0aW5nLAogICogdGhlIGFjdGlvbnMgb2YgX193YWl0X29uX2F0b21pY190KCkgYXJl IHBlcm1pdHRlZCByZXR1cm4gY29kZXMuICBOb256ZXJvCiAgKiByZXR1cm4gY29kZXMgaGFsdCB3 YWl0aW5nIGFuZCByZXR1cm4uCiAgKi8KIHN0YXRpYyBfX3NjaGVkCi1pbnQgX193YWl0X29uX2F0 
b21pY190KHN0cnVjdCB3YWl0X3F1ZXVlX2hlYWQgKndxX2hlYWQsIHN0cnVjdCB3YWl0X2JpdF9x dWV1ZV9lbnRyeSAqd2JxX2VudHJ5LAotCQkgICAgICAgaW50ICgqYWN0aW9uKShhdG9taWNfdCAq KSwgdW5zaWduZWQgbW9kZSkKKy1pbnQgX193YWl0X29uX2F0b21pY190KHN0cnVjdCB3YWl0X3F1 ZXVlX2hlYWQgKndxX2hlYWQsCistCQlzdHJ1Y3Qgd2FpdF9iaXRfcXVldWVfZW50cnkgKndicV9l bnRyeSwKKy0JCWludCAoKmFjdGlvbikoYXRvbWljX3QgKiksIHVuc2lnbmVkIG1vZGUsIGludCB0 YXJnZXQpCiB7CiAJYXRvbWljX3QgKnZhbDsKIAlpbnQgcmV0ID0gMDsKQEAgLTE5MSwxMCArLTIx MiwxMCBAQCBpbnQgX193YWl0X29uX2F0b21pY190KHN0cnVjdCB3YWl0X3F1ZXVlX2hlYWQgKndx X2hlYWQsIHN0cnVjdCB3YWl0X2JpdF9xdWV1ZV9lbgogCWRvIHsKIAkJcHJlcGFyZV90b193YWl0 KHdxX2hlYWQsICZ3YnFfZW50cnktPndxX2VudHJ5LCBtb2RlKTsKIAkJdmFsID0gd2JxX2VudHJ5 LT5rZXkuZmxhZ3M7Ci0JCWlmIChhdG9taWNfcmVhZCh2YWwpID09IDApCistCQlpZiAoYXRvbWlj X3JlYWQodmFsKSA9PSB0YXJnZXQpCiAJCQlicmVhazsKIAkJcmV0ID0gKCphY3Rpb24pKHZhbCk7 Ci0JfSB3aGlsZSAoIXJldCAmJiBhdG9taWNfcmVhZCh2YWwpICE9IDApOworLQl9IHdoaWxlICgh cmV0ICYmIGF0b21pY19yZWFkKHZhbCkgIT0gdGFyZ2V0KTsKIAlmaW5pc2hfd2FpdCh3cV9oZWFk LCAmd2JxX2VudHJ5LT53cV9lbnRyeSk7CiAJcmV0dXJuIHJldDsKIH0KQEAgLTIxMCwxNiArLTIz MSwzNyBAQCBpbnQgX193YWl0X29uX2F0b21pY190KHN0cnVjdCB3YWl0X3F1ZXVlX2hlYWQgKndx X2hlYWQsIHN0cnVjdCB3YWl0X2JpdF9xdWV1ZV9lbgogCQl9LAkJCQkJCQkrQUZ3CiAJfQogCist I2RlZmluZSBERUZJTkVfV0FJVF9ERVZNQVBfSURMRShuYW1lLCBwKQkJCQkJK0FGdworLQlzdHJ1 Y3Qgd2FpdF9iaXRfcXVldWVfZW50cnkgbmFtZSA9IHsJCQkJK0FGdworLQkJLmtleSA9IF9fV0FJ VF9BVE9NSUNfVF9LRVlfSU5JVElBTElaRVIocCksCQkrQUZ3CistCQkud3FfZW50cnkgPSB7CQkJ CQkJK0FGdworLQkJCS5wcml2YXRlCT0gY3VycmVudCwJCQkrQUZ3CistCQkJLmZ1bmMJCT0gd2Fr ZV9kZXZtYXBfaWRsZV9mdW5jdGlvbiwJK0FGdworLQkJCS5lbnRyeQkJPQkJCQkrQUZ3CistCQkJ CUxJU1RfSEVBRF9JTklUKChuYW1lKS53cV9lbnRyeS5lbnRyeSksCStBRncKKy0JCX0sCQkJCQkJ CStBRncKKy0JfQorLQogX19zY2hlZCBpbnQgb3V0X29mX2xpbmVfd2FpdF9vbl9hdG9taWNfdChh dG9taWNfdCAqcCwgaW50ICgqYWN0aW9uKShhdG9taWNfdCAqKSwKIAkJCQkJIHVuc2lnbmVkIG1v ZGUpCiB7CiAJc3RydWN0IHdhaXRfcXVldWVfaGVhZCAqd3FfaGVhZCA9IGF0b21pY190X3dhaXRx 
dWV1ZShwKTsKIAlERUZJTkVfV0FJVF9BVE9NSUNfVCh3cV9lbnRyeSwgcCk7CiAKLQlyZXR1cm4g X193YWl0X29uX2F0b21pY190KHdxX2hlYWQsICZ3cV9lbnRyeSwgYWN0aW9uLCBtb2RlKTsKKy0J cmV0dXJuIF9fd2FpdF9vbl9hdG9taWNfdCh3cV9oZWFkLCAmd3FfZW50cnksIGFjdGlvbiwgbW9k ZSwgMCk7CiB9CiBFWFBPUlRfU1lNQk9MKG91dF9vZl9saW5lX3dhaXRfb25fYXRvbWljX3QpOwog CistX19zY2hlZCBpbnQgb3V0X29mX2xpbmVfd2FpdF9vbl9kZXZtYXBfaWRsZShhdG9taWNfdCAq cCwgaW50ICgqYWN0aW9uKShhdG9taWNfdCAqKSwKKy0JCQkJCSB1bnNpZ25lZCBtb2RlKQorLXsK Ky0Jc3RydWN0IHdhaXRfcXVldWVfaGVhZCAqd3FfaGVhZCA9IGF0b21pY190X3dhaXRxdWV1ZShw KTsKKy0JREVGSU5FX1dBSVRfREVWTUFQX0lETEUod3FfZW50cnksIHApOworLQorLQlyZXR1cm4g X193YWl0X29uX2F0b21pY190KHdxX2hlYWQsICZ3cV9lbnRyeSwgYWN0aW9uLCBtb2RlLCAxKTsK Ky19CistRVhQT1JUX1NZTUJPTChvdXRfb2ZfbGluZV93YWl0X29uX2Rldm1hcF9pZGxlKTsKKy0K IC8qKgogICogd2FrZV91cF9hdG9taWNfdCAtIFdha2UgdXAgYSB3YWl0ZXIgb24gYSBhdG9taWNf dAogICogQHA6IFRoZSBhdG9taWNfdCBiZWluZyB3YWl0ZWQgb24sIGEga2VybmVsIHZpcnR1YWwg YWRkcmVzcwpAQCAtMjM1LDYgKy0yNzcsMTIgQEAgdm9pZCB3YWtlX3VwX2F0b21pY190KGF0b21p Y190ICpwKQogfQogRVhQT1JUX1NZTUJPTCh3YWtlX3VwX2F0b21pY190KTsKIAorLXZvaWQgd2Fr ZV91cF9kZXZtYXBfaWRsZShhdG9taWNfdCAqcCkKKy17CistCV9fd2FrZV91cF9iaXQoYXRvbWlj X3Rfd2FpdHF1ZXVlKHApLCBwLCBXQUlUX0FUT01JQ19UX0JJVF9OUik7CistfQorLUVYUE9SVF9T WU1CT0wod2FrZV91cF9kZXZtYXBfaWRsZSk7CistCiBfX3NjaGVkIGludCBiaXRfd2FpdChzdHJ1 Y3Qgd2FpdF9iaXRfa2V5ICp3b3JkLCBpbnQgbW9kZSkKIHsKIAlzY2hlZHVsZSgpOwpkaWZmIC0t Z2l0IGEvbW0vZ3VwLmMgYi9tbS9ndXAuYwppbmRleCAzMDhiZTg5N2QyMmEuLmZkN2IyYTJlMmQx OSAxMDA2NDQKLS0tIGEvbW0vZ3VwLmMKKy0rLSstIGIvbW0vZ3VwLmMKQEAgLTU3OSw2ICstNTc5 LDQxIEBAIHN0YXRpYyBpbnQgY2hlY2tfdm1hX2ZsYWdzKHN0cnVjdCB2bV9hcmVhX3N0cnVjdCAq dm1hLCB1bnNpZ25lZCBsb25nIGd1cF9mbGFncykKIAlyZXR1cm4gMDsKIH0KIAorLXN0YXRpYyBz dHJ1Y3QgaW5vZGUgKmRvX2RheF9sb2NrKHN0cnVjdCB2bV9hcmVhX3N0cnVjdCAqdm1hLAorLQkJ dW5zaWduZWQgaW50IGZvbGxfZmxhZ3MpCisteworLQlzdHJ1Y3QgZmlsZSAqZmlsZTsKKy0Jc3Ry dWN0IGlub2RlICppbm9kZTsKKy0KKy0JaWYgKCEoZm9sbF9mbGFncyAmIEZPTExfR0VUKSkKKy0J 
CXJldHVybiBOVUxMOworLQlpZiAoIXZtYV9pc19kYXgodm1hKSkKKy0JCXJldHVybiBOVUxMOwor LQlmaWxlID0gdm1hLT52bV9maWxlOworLQlpbm9kZSA9IGZpbGVfaW5vZGUoZmlsZSk7CistCWlm IChpbm9kZS0+aV9tb2RlID09IFNfSUZDSFIpCistCQlyZXR1cm4gTlVMTDsKKy0JcmV0dXJuIGlu b2RlOworLX0KKy0KKy1zdGF0aWMgc3RydWN0IGlub2RlICpkYXhfdHJ1bmNhdGVfbG9jayhzdHJ1 Y3Qgdm1fYXJlYV9zdHJ1Y3QgKnZtYSwKKy0JCXVuc2lnbmVkIGludCBmb2xsX2ZsYWdzKQorLXsK Ky0Jc3RydWN0IGlub2RlICppbm9kZSA9IGRvX2RheF9sb2NrKHZtYSwgZm9sbF9mbGFncyk7Cist CistCWlmICghaW5vZGUpCistCQlyZXR1cm4gTlVMTDsKKy0JaV9kYXhkbWFfbG9jayhpbm9kZSk7 CistCXJldHVybiBpbm9kZTsKKy19CistCistc3RhdGljIHZvaWQgZGF4X3RydW5jYXRlX3VubG9j ayhzdHJ1Y3QgaW5vZGUgKmlub2RlKQorLXsKKy0JaWYgKCFpbm9kZSkKKy0JCXJldHVybjsKKy0J aV9kYXhkbWFfdW5sb2NrKGlub2RlKTsKKy19CistCiAvKioKICAqIF9fZ2V0X3VzZXJfcGFnZXMo KSAtIHBpbiB1c2VyIHBhZ2VzIGluIG1lbW9yeQogICogQHRzazoJdGFza19zdHJ1Y3Qgb2YgdGFy Z2V0IHRhc2sKQEAgLTY1OSw2ICstNjk0LDcgQEAgc3RhdGljIGxvbmcgX19nZXRfdXNlcl9wYWdl cyhzdHJ1Y3QgdGFza19zdHJ1Y3QgKnRzaywgc3RydWN0IG1tX3N0cnVjdCAqbW0sCiAKIAlkbyB7 CiAJCXN0cnVjdCBwYWdlICpwYWdlOworLQkJc3RydWN0IGlub2RlICppbm9kZTsKIAkJdW5zaWdu ZWQgaW50IGZvbGxfZmxhZ3MgPSBndXBfZmxhZ3M7CiAJCXVuc2lnbmVkIGludCBwYWdlX2luY3Jl bTsKIApAQCAtNjkzLDcgKy03MjksOSBAQCBzdGF0aWMgbG9uZyBfX2dldF91c2VyX3BhZ2VzKHN0 cnVjdCB0YXNrX3N0cnVjdCAqdHNrLCBzdHJ1Y3QgbW1fc3RydWN0ICptbSwKIAkJaWYgKHVubGlr ZWx5KGZhdGFsX3NpZ25hbF9wZW5kaW5nKGN1cnJlbnQpKSkKIAkJCXJldHVybiBpID8gaSA6IC1F UkVTVEFSVFNZUzsKIAkJY29uZF9yZXNjaGVkKCk7CistCQlpbm9kZSA9IGRheF90cnVuY2F0ZV9s b2NrKHZtYSwgZm9sbF9mbGFncyk7CiAJCXBhZ2UgPSBmb2xsb3dfcGFnZV9tYXNrKHZtYSwgc3Rh cnQsIGZvbGxfZmxhZ3MsICZwYWdlX21hc2spOworLQkJZGF4X3RydW5jYXRlX3VubG9jayhpbm9k ZSk7CiAJCWlmICghcGFnZSkgewogCQkJaW50IHJldDsKIAkJCXJldCA9IGZhdWx0aW5fcGFnZSh0 c2ssIHZtYSwgc3RhcnQsICZmb2xsX2ZsYWdzLAoKY29tbWl0IDY3ZDk1MjMxNGU5OTg5YjNiMTk0 NWM1MDQ4OGY0YTBmNzYwMjY0YzMKQXV0aG9yOiBEYW4gV2lsbGlhbXMgPGRhbi5qLndpbGxpYW1z QGludGVsLmNvbT4KRGF0ZTogICBUdWUgT2N0IDI0IDEzOjQxOjIyIDIwMTcgLTA3MDAKCiAgICB4 
ZnM6IHdpcmUgdXAgZGF4IGRtYSB3YWl0aW5nCiAgICAKICAgIFRoZSBkYXgtZG1hIHZzIHRydW5j YXRlIGNvbGxpc2lvbiBhdm9pZGFuY2UgaW52b2x2ZXMgYWNxdWlyaW5nIHRoZSBuZXcKICAgIGlf ZGF4X2RtYXNlbSBhbmQgdmFsaWRhdGluZyB0aGUgbm8gcmFuZ2VzIHRoYXQgYXJlIHRvIGJlIG1h cHBlZCBvdXQgb2YKICAgIHRoZSBmaWxlIGFyZSBhY3RpdmUgZm9yIGRtYS4gSWYgYW55IGFyZSBm b3VuZCB3ZSB3YWl0IGZvciBwYWdlIGlkbGUKICAgIGFuZCByZXRyeSB0aGUgc2Nhbi4gVGhlIGxv Y2F0aW9ucyB3aGVyZSB3ZSBpbXBsZW1lbnQgdGhpcyB3YWl0IGxpbmUgdXAKICAgIHdpdGggd2hl cmUgd2UgY3VycmVudGx5IHdhaXQgZm9yIHBuZnMgbGF5b3V0IGxlYXNlcyB0byBleHBpcmUuCiAg ICAKICAgIFNpbmNlIHdlIG5lZWQgYm90aCBkbWEgdG8gYmUgaWRsZSBhbmQgbGVhc2VzIHRvIGJl IGJyb2tlbiwgYW5kIHNpbmNlCiAgICB4ZnNfYnJlYWtfbGF5b3V0cyBkcm9wcyBsb2Nrcywgd2Ug bmVlZCB0byByZXRyeSB0aGUgZG1hIGJ1c3kgc2NhbiB1bnRpbAogICAgd2UgY2FuIGNvbXBsZXRl IG9uZSB0aGF0IGZpbmRzIG5vIGJ1c3kgcGFnZXMuCiAgICAKICAgIENjOiBKYW4gS2FyYSA8amFj a0BzdXNlLmN6PgogICAgQ2M6IERhdmUgQ2hpbm5lciA8ZGF2aWRAZnJvbW9yYml0LmNvbT4KICAg IENjOiAiRGFycmljayBKLiBXb25nIiA8ZGFycmljay53b25nQG9yYWNsZS5jb20+CiAgICBDYzog Um9zcyBad2lzbGVyIDxyb3NzLnp3aXNsZXJAbGludXguaW50ZWwuY29tPgogICAgQ2M6IENocmlz dG9waCBIZWxsd2lnIDxoY2hAbHN0LmRlPgogICAgU2lnbmVkLW9mZi1ieTogRGFuIFdpbGxpYW1z IDxkYW4uai53aWxsaWFtc0BpbnRlbC5jb20+CgpkaWZmIC0tZ2l0IGEvZnMveGZzL3hmc19maWxl LmMgYi9mcy94ZnMveGZzX2ZpbGUuYwppbmRleCBjNjc4MDc0M2Y4ZWMuLmUzZWM0NmMyOGM2MCAx MDA2NDQKLS0tIGEvZnMveGZzL3hmc19maWxlLmMKKy0rLSstIGIvZnMveGZzL3hmc19maWxlLmMK QEAgLTM0Nyw3ICstMzQ3LDcgQEAgeGZzX2ZpbGVfYWlvX3dyaXRlX2NoZWNrcygKIAkJcmV0dXJu IGVycm9yOwogCiAJZXJyb3IgPSB4ZnNfYnJlYWtfbGF5b3V0cyhpbm9kZSwgaW9sb2NrKTsKLQlp ZiAoZXJyb3IpCistCWlmIChlcnJvciA8IDApCiAJCXJldHVybiBlcnJvcjsKIAogCS8qCkBAIC03 NjIsNyArLTc2Miw3IEBAIHhmc19maWxlX2ZhbGxvY2F0ZSgKIAlzdHJ1Y3QgeGZzX2lub2RlCSpp cCA9IFhGU19JKGlub2RlKTsKIAlsb25nCQkJZXJyb3I7CiAJZW51bSB4ZnNfcHJlYWxsb2NfZmxh Z3MJZmxhZ3MgPSAwOwotCXVpbnQJCQlpb2xvY2sgPSBYRlNfSU9MT0NLX0VYQ0w7CistCXVpbnQJ CQlpb2xvY2sgPSBYRlNfREFYRE1BX0xPQ0tfU0hBUkVEOwogCWxvZmZfdAkJCW5ld19zaXplID0g 
MDsKIAlib29sCQkJZG9fZmlsZV9pbnNlcnQgPSAwOwogCkBAIC03NzEsMTAgKy03NzEsMjAgQEAg eGZzX2ZpbGVfZmFsbG9jYXRlKAogCWlmIChtb2RlICYgK0FINC1YRlNfRkFMTE9DX0ZMX1NVUFBP UlRFRCkKIAkJcmV0dXJuIC1FT1BOT1RTVVBQOwogCistcmV0cnk6CiAJeGZzX2lsb2NrKGlwLCBp b2xvY2spOworLQlkYXhfd2FpdF9kbWEoaW5vZGUtPmlfbWFwcGluZywgb2Zmc2V0LCBsZW4pOwor LQorLQl4ZnNfaWxvY2soaXAsIFhGU19JT0xPQ0tfRVhDTCk7CistCWlvbG9jayB8PSBYRlNfSU9M T0NLX0VYQ0w7CiAJZXJyb3IgPSB4ZnNfYnJlYWtfbGF5b3V0cyhpbm9kZSwgJmlvbG9jayk7Ci0J aWYgKGVycm9yKQorLQlpZiAoZXJyb3IgPCAwKQogCQlnb3RvIG91dF91bmxvY2s7CistCWVsc2Ug aWYgKGVycm9yID4gMCAmJiBJU19FTkFCTEVEKENPTkZJR19GU19EQVgpKSB7CistCQl4ZnNfaXVu bG9jayhpcCwgaW9sb2NrKTsKKy0JCWlvbG9jayA9IFhGU19EQVhETUFfTE9DS19TSEFSRUQ7Cist CQlnb3RvIHJldHJ5OworLQl9CiAKIAl4ZnNfaWxvY2soaXAsIFhGU19NTUFQTE9DS19FWENMKTsK IAlpb2xvY2sgfD0gWEZTX01NQVBMT0NLX0VYQ0w7CmRpZmYgLS1naXQgYS9mcy94ZnMveGZzX2lu b2RlLmMgYi9mcy94ZnMveGZzX2lub2RlLmMKaW5kZXggNGVjNWI3ZjQ1NDAxLi43ODNmMTU4OTRi N2IgMTAwNjQ0Ci0tLSBhL2ZzL3hmcy94ZnNfaW5vZGUuYworLSstKy0gYi9mcy94ZnMveGZzX2lu b2RlLmMKQEAgLTE3MSw3ICstMTcxLDE0IEBAIHhmc19pbG9ja19hdHRyX21hcF9zaGFyZWQoCiAg KiB0YWtlbiBpbiBwbGFjZXMgd2hlcmUgd2UgbmVlZCB0byBpbnZhbGlkYXRlIHRoZSBwYWdlIGNh Y2hlIGluIGEgcmFjZQogICogZnJlZSBtYW5uZXIgKGUuZy4gdHJ1bmNhdGUsIGhvbGUgcHVuY2gg YW5kIG90aGVyIGV4dGVudCBtYW5pcHVsYXRpb24KICAqIGZ1bmN0aW9ucykuCi0gKi8KKy0gKgor LSAqIFRoZSBYRlNfREFYRE1BX0xPQ0tfU0hBUkVEIGxvY2sgaXMgYSBDT05GSUdfRlNfREFYIHNw ZWNpYWwgY2FzZSBsb2NrCistICogZm9yIHN5bmNocm9uaXppbmcgdHJ1bmNhdGUgdnMgb25nb2lu ZyBETUEuIFRoZSBnZXRfdXNlcl9wYWdlcygpIHBhdGgKKy0gKiB3aWxsIGhvbGQgdGhpcyBsb2Nr IGV4Y2x1c2l2ZWx5IHdoZW4gaW5jcmVtZW50aW5nIHBhZ2UgcmVmZXJlbmNlCistICogY291bnRz IGZvciBETUEuIEJlZm9yZSBhbiBleHRlbnQgY2FuIGJlIHRydW5jYXRlZCB3ZSBuZWVkIHRvIGNv bXBsZXRlCistICogYSB2YWxpZGF0ZS1pZGxlIHN3ZWVwIG9mIGFsbCBwYWdlcyBpbiB0aGUgcmFu Z2Ugd2hpbGUgaG9sZGluZyB0aGlzCistICogbG9jayBpbiBzaGFyZWQgbW9kZS4KKy0qLwogdm9p ZAogeGZzX2lsb2NrKAogCXhmc19pbm9kZV90CQkqaXAsCkBAIC0xOTIsNiArLTE5OSw5IEBAIHhm 
c19pbG9jaygKIAkgICAgICAgKFhGU19JTE9DS19TSEFSRUQgfCBYRlNfSUxPQ0tfRVhDTCkpOwog CUFTU0VSVCgobG9ja19mbGFncyAmICtBSDQoWEZTX0xPQ0tfTUFTSyB8IFhGU19MT0NLX1NVQkNM QVNTX01BU0spKSA9PSAwKTsKIAorLQlpZiAobG9ja19mbGFncyAmIFhGU19EQVhETUFfTE9DS19T SEFSRUQpCistCQlpX2RheGRtYV9sb2NrX3NoYXJlZChWRlNfSShpcCkpOworLQogCWlmIChsb2Nr X2ZsYWdzICYgWEZTX0lPTE9DS19FWENMKSB7CiAJCWRvd25fd3JpdGVfbmVzdGVkKCZWRlNfSShp cCktPmlfcndzZW0sCiAJCQkJICBYRlNfSU9MT0NLX0RFUChsb2NrX2ZsYWdzKSk7CkBAIC0zMjgs NiArLTMzOCw5IEBAIHhmc19pdW5sb2NrKAogCWVsc2UgaWYgKGxvY2tfZmxhZ3MgJiBYRlNfSUxP Q0tfU0hBUkVEKQogCQltcnVubG9ja19zaGFyZWQoJmlwLT5pX2xvY2spOwogCistCWlmIChsb2Nr X2ZsYWdzICYgWEZTX0RBWERNQV9MT0NLX1NIQVJFRCkKKy0JCWlfZGF4ZG1hX3VubG9ja19zaGFy ZWQoVkZTX0koaXApKTsKKy0KIAl0cmFjZV94ZnNfaXVubG9jayhpcCwgbG9ja19mbGFncywgX1JF VF9JUF8pOwogfQogCmRpZmYgLS1naXQgYS9mcy94ZnMveGZzX2lub2RlLmggYi9mcy94ZnMveGZz X2lub2RlLmgKaW5kZXggMGVlNDUzZGUyMzlhLi4wNjYyZWRmMDA1MjkgMTAwNjQ0Ci0tLSBhL2Zz L3hmcy94ZnNfaW5vZGUuaAorLSstKy0gYi9mcy94ZnMveGZzX2lub2RlLmgKQEAgLTI4MywxMCAr LTI4MywxMiBAQCBzdGF0aWMgaW5saW5lIHZvaWQgeGZzX2lmdW5sb2NrKHN0cnVjdCB4ZnNfaW5v ZGUgKmlwKQogI2RlZmluZQlYRlNfSUxPQ0tfU0hBUkVECSgxPDwzKQogI2RlZmluZQlYRlNfTU1B UExPQ0tfRVhDTAkoMTw8NCkKICNkZWZpbmUJWEZTX01NQVBMT0NLX1NIQVJFRAkoMTw8NSkKKy0j ZGVmaW5lCVhGU19EQVhETUFfTE9DS19TSEFSRUQJKDE8PDYpCiAKICNkZWZpbmUgWEZTX0xPQ0tf TUFTSwkJKFhGU19JT0xPQ0tfRVhDTCB8IFhGU19JT0xPQ0tfU0hBUkVEICtBRncKIAkJCQl8IFhG U19JTE9DS19FWENMIHwgWEZTX0lMT0NLX1NIQVJFRCArQUZ3Ci0JCQkJfCBYRlNfTU1BUExPQ0tf RVhDTCB8IFhGU19NTUFQTE9DS19TSEFSRUQpCistCQkJCXwgWEZTX01NQVBMT0NLX0VYQ0wgfCBY RlNfTU1BUExPQ0tfU0hBUkVEICtBRncKKy0JCQkJfCBYRlNfREFYRE1BX0xPQ0tfU0hBUkVEKQog CiAjZGVmaW5lIFhGU19MT0NLX0ZMQUdTICtBRncKIAl7IFhGU19JT0xPQ0tfRVhDTCwJIklPTE9D S19FWENMIiB9LCArQUZ3CkBAIC0yOTQsNyArLTI5Niw4IEBAIHN0YXRpYyBpbmxpbmUgdm9pZCB4 ZnNfaWZ1bmxvY2soc3RydWN0IHhmc19pbm9kZSAqaXApCiAJeyBYRlNfSUxPQ0tfRVhDTCwJIklM T0NLX0VYQ0wiIH0sICtBRncKIAl7IFhGU19JTE9DS19TSEFSRUQsCSJJTE9DS19TSEFSRUQiIH0s 
ICtBRncKIAl7IFhGU19NTUFQTE9DS19FWENMLAkiTU1BUExPQ0tfRVhDTCIgfSwgK0FGdwotCXsg WEZTX01NQVBMT0NLX1NIQVJFRCwJIk1NQVBMT0NLX1NIQVJFRCIgfQorLQl7IFhGU19NTUFQTE9D S19TSEFSRUQsCSJNTUFQTE9DS19TSEFSRUQiIH0sICtBRncKKy0JeyBYRlNfREFYRE1BX0xPQ0tf U0hBUkVELCAiWEZTX0RBWERNQV9MT0NLX1NIQVJFRCIgfQogCiAKIC8qCmRpZmYgLS1naXQgYS9m cy94ZnMveGZzX2lvY3RsLmMgYi9mcy94ZnMveGZzX2lvY3RsLmMKaW5kZXggYWE3NTM4OWJlOGNm Li5mZDM4NGVhMDBlZGUgMTAwNjQ0Ci0tLSBhL2ZzL3hmcy94ZnNfaW9jdGwuYworLSstKy0gYi9m cy94ZnMveGZzX2lvY3RsLmMKQEAgLTYxMiw3ICstNjEyLDcgQEAgeGZzX2lvY19zcGFjZSgKIAlz dHJ1Y3QgeGZzX2lub2RlCSppcCA9IFhGU19JKGlub2RlKTsKIAlzdHJ1Y3QgaWF0dHIJCWlhdHRy OwogCWVudW0geGZzX3ByZWFsbG9jX2ZsYWdzCWZsYWdzID0gMDsKLQl1aW50CQkJaW9sb2NrID0g WEZTX0lPTE9DS19FWENMOworLQl1aW50CQkJaW9sb2NrID0gWEZTX0RBWERNQV9MT0NLX1NIQVJF RDsKIAlpbnQJCQllcnJvcjsKIAogCS8qCkBAIC02MzcsMTggKy02MzcsNiBAQCB4ZnNfaW9jX3Nw YWNlKAogCWlmIChmaWxwLT5mX21vZGUgJiBGTU9ERV9OT0NNVElNRSkKIAkJZmxhZ3MgfD0gWEZT X1BSRUFMTE9DX0lOVklTSUJMRTsKIAotCWVycm9yID0gbW50X3dhbnRfd3JpdGVfZmlsZShmaWxw KTsKLQlpZiAoZXJyb3IpCi0JCXJldHVybiBlcnJvcjsKLQotCXhmc19pbG9jayhpcCwgaW9sb2Nr KTsKLQllcnJvciA9IHhmc19icmVha19sYXlvdXRzKGlub2RlLCAmaW9sb2NrKTsKLQlpZiAoZXJy b3IpCi0JCWdvdG8gb3V0X3VubG9jazsKLQotCXhmc19pbG9jayhpcCwgWEZTX01NQVBMT0NLX0VY Q0wpOwotCWlvbG9jayB8PSBYRlNfTU1BUExPQ0tfRVhDTDsKLQogCXN3aXRjaCAoYmYtPmxfd2hl bmNlKSB7CiAJY2FzZSAwOiAvKlNFRUtfU0VUKi8KIAkJYnJlYWs7CkBAIC02NTksMTAgKy02NDcs MzEgQEAgeGZzX2lvY19zcGFjZSgKIAkJYmYtPmxfc3RhcnQgKy09IFhGU19JU0laRShpcCk7CiAJ CWJyZWFrOwogCWRlZmF1bHQ6Ci0JCWVycm9yID0gLUVJTlZBTDsKKy0JCXJldHVybiAtRUlOVkFM OworLQl9CistCistCWVycm9yID0gbW50X3dhbnRfd3JpdGVfZmlsZShmaWxwKTsKKy0JaWYgKGVy cm9yKQorLQkJcmV0dXJuIGVycm9yOworLQorLXJldHJ5OgorLQl4ZnNfaWxvY2soaXAsIGlvbG9j ayk7CistCWRheF93YWl0X2RtYShpbm9kZS0+aV9tYXBwaW5nLCBiZi0+bF9zdGFydCwgYmYtPmxf bGVuKTsKKy0KKy0JeGZzX2lsb2NrKGlwLCBYRlNfSU9MT0NLX0VYQ0wpOworLQlpb2xvY2sgfD0g WEZTX0lPTE9DS19FWENMOworLQllcnJvciA9IHhmc19icmVha19sYXlvdXRzKGlub2RlLCAmaW9s 
b2NrKTsKKy0JaWYgKGVycm9yIDwgMCkKIAkJZ290byBvdXRfdW5sb2NrOworLQllbHNlIGlmIChl cnJvciA+IDAgJiYgSVNfRU5BQkxFRChDT05GSUdfRlNfREFYKSkgeworLQkJeGZzX2l1bmxvY2so aXAsIGlvbG9jayk7CistCQlpb2xvY2sgPSBYRlNfREFYRE1BX0xPQ0tfU0hBUkVEOworLQkJZ290 byByZXRyeTsKIAl9CiAKKy0JeGZzX2lsb2NrKGlwLCBYRlNfTU1BUExPQ0tfRVhDTCk7CistCWlv bG9jayB8PSBYRlNfTU1BUExPQ0tfRVhDTDsKKy0KIAkvKgogCSAqIGxlbmd0aCBvZiA8PSAwIGZv ciByZXN2L3VucmVzdi96ZXJvIGlzIGludmFsaWQuICBsZW5ndGggZm9yCiAJICogYWxsb2MvZnJl ZSBpcyBpZ25vcmVkIGNvbXBsZXRlbHkgYW5kIHdlIGhhdmUgbm8gaWRlYSB3aGF0IHVzZXJzcGFj ZQpkaWZmIC0tZ2l0IGEvZnMveGZzL3hmc19wbmZzLmMgYi9mcy94ZnMveGZzX3BuZnMuYwppbmRl eCA0MjQ2ODc2ZGY3YjcuLjVmNGQ0NmIzY2Q3ZiAxMDA2NDQKLS0tIGEvZnMveGZzL3hmc19wbmZz LmMKKy0rLSstIGIvZnMveGZzL3hmc19wbmZzLmMKQEAgLTM1LDE4ICstMzUsMTkgQEAgeGZzX2Jy ZWFrX2xheW91dHMoCiAJdWludAkJCSppb2xvY2spCiB7CiAJc3RydWN0IHhmc19pbm9kZQkqaXAg PSBYRlNfSShpbm9kZSk7Ci0JaW50CQkJZXJyb3I7CistCWludAkJCWVycm9yLCBkaWRfdW5sb2Nr ID0gMDsKIAogCUFTU0VSVCh4ZnNfaXNpbG9ja2VkKGlwLCBYRlNfSU9MT0NLX1NIQVJFRHxYRlNf SU9MT0NLX0VYQ0wpKTsKIAogCXdoaWxlICgoZXJyb3IgPSBicmVha19sYXlvdXQoaW5vZGUsIGZh bHNlKSA9PSAtRVdPVUxEQkxPQ0spKSB7CiAJCXhmc19pdW5sb2NrKGlwLCAqaW9sb2NrKTsKKy0J CWRpZF91bmxvY2sgPSAxOwogCQllcnJvciA9IGJyZWFrX2xheW91dChpbm9kZSwgdHJ1ZSk7CiAJ CSppb2xvY2sgPSBYRlNfSU9MT0NLX0VYQ0w7CiAJCXhmc19pbG9jayhpcCwgKmlvbG9jayk7CiAJ fQogCi0JcmV0dXJuIGVycm9yOworLQlyZXR1cm4gZXJyb3IgPCAwID8gZXJyb3IgOiBkaWRfdW5s b2NrOwogfQogCiAvKgoKX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX18KTGludXgtbnZkaW1tIG1haWxpbmcgbGlzdApMaW51eC1udmRpbW1AbGlzdHMuMDEub3Jn Cmh0dHBzOi8vbGlzdHMuMDEub3JnL21haWxtYW4vbGlzdGluZm8vbGludXgtbnZkaW1tCg== From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Williams, Dan J" Subject: Re: [PATCH v3 00/13] dax: fix dma vs truncate and remove 'page-less' support Date: Thu, 26 Oct 2017 23:51:04 +0000 Message-ID: <1509061831.25213.2.camel@intel.com> References: <150846713528.24336.4459262264611579791.stgit@dwillia2-desk3.amr.corp.intel.com> <20171020074750.GA13568@lst.de> 
<20171020093148.GA20304@lst.de> <20171026105850.GA31161@quack2.suse.cz> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-7" Content-Transfer-Encoding: 8BIT Return-path: In-Reply-To: <20171026105850.GA31161@quack2.suse.cz> Content-Language: en-US Content-ID: <6B1E8E88FC7C9E4D9754113A024144AD@intel.com> Sender: linux-kernel-owner@vger.kernel.org To: "hch@lst.de" , "jack@suse.cz" Cc: "schwidefsky@de.ibm.com" , "darrick.wong@oracle.com" , "dledford@redhat.com" , "linux-rdma@vger.kernel.org" , "linux-fsdevel@vger.kernel.org" , "bfields@fieldses.org" , "linux-mm@kvack.org" , "heiko.carstens@de.ibm.com" , "dave.hansen@linux.intel.com" , "linux-xfs@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "jmoyer@redhat.com" , "viro@zeniv.linux.org.uk" , "kirill.shutemov@linux.intel.com" , akpm@linux-foundati List-Id: linux-rdma@vger.kernel.org

On Thu, 2017-10-26 at 12:58 +0200, Jan Kara wrote:
> On Fri 20-10-17 11:31:48, Christoph Hellwig wrote:
> > On Fri, Oct 20, 2017 at 09:47:50AM +0200, Christoph Hellwig wrote:
> > > I'd like to brainstorm how we can do something better.
> > >
> > > How about:
> > >
> > > If we hit a page with an elevated refcount in truncate / hole punch
> > > etc for a DAX file system we do not free the blocks in the file system,
> > > but add it to the extent busy list.  We mark the page as delayed
> > > free (e.g. page flag?) so that when it finally hits refcount zero we
> > > call back into the file system to remove it from the busy list.
> >
> > Brainstorming some more:
> >
> > Given that on a DAX file there shouldn't be any long-term page
> > references after we unmap it from the page table and don't allow
> > get_user_pages calls why not wait for the references for all
> > DAX pages to go away first?  E.g. if we find a DAX page in
> > truncate_inode_pages_range that has an elevated refcount we set
> > a new flag to prevent new references from showing up, and then
> > simply wait for it to go away.  Instead of a busy wait we can
> > do this through a few hashed waitqueues in dev_pagemap.  And in
> > fact put_zone_device_page already gets called when putting the
> > last page so we can handle the wakeup from there.
> >
> > In fact if we can't find a page flag for the stop new callers
> > things we could probably come up with a way to do that through
> > dev_pagemap somehow, but I'm not sure how efficient that would
> > be.
>
> We were talking about this yesterday with Dan so some more brainstorming
> from us. We can implement the solution with extent busy list in ext4
> relatively easily - we already have such list currently similarly to XFS.
> There would be some modifications needed but nothing too complex. The
> biggest downside of this solution I see is that it requires per-filesystem
> solution for busy extents - ext4 and XFS are reasonably fine, however btrfs
> may have problems and ext2 definitely will need some modifications.
> Invisible used blocks may be surprising to users at times although given
> page refs should be relatively short term, that should not be a big issue.
> But are we guaranteed page refs are short term? E.g. if someone creates
> v4l2 videobuf in MAP_SHARED mapping of a file on DAX filesystem, page refs
> can be rather long-term similarly as in RDMA case. Also freeing of blocks
> on page reference drop is another async entry point into the filesystem
> which could unpleasantly surprise us but I guess workqueues would solve
> that reasonably fine.
>
> WRT waiting for page refs to be dropped before proceeding with truncate (or
> punch hole for that matter - that case is even nastier since we don't have
> i_size to guard us). What I like about this solution is that it is very
> visible there's something unusual going on with the file being truncated /
> punched and so problems are easier to diagnose / fix from the admin side.
> So far we have guarded hole punching from concurrent faults (and
> get_user_pages() does fault once you do unmap_mapping_range()) with
> I_MMAP_LOCK (or its equivalent in ext4). We cannot easily wait for page
> refs to be dropped under I_MMAP_LOCK as that could deadlock - the most
> obvious case Dan came up with is when GUP obtains ref to page A, then hole
> punch comes grabbing I_MMAP_LOCK and waiting for page ref on A to be
> dropped, and then GUP blocks on trying to fault in another page.
>
> I think we cannot easily prevent new page references to be grabbed as you
> write above since nobody expects stuff like get_page() to fail. But I
> think that unmapping relevant pages and then preventing them to be faulted
> in again is workable and stops GUP as well. The problem with that is though
> what to do with page faults to such pages - you cannot just fail them for
> hole punch, and you cannot easily allocate new blocks either. So we are
> back at a situation where we need to detach blocks from the inode and then
> wait for page refs to be dropped - so some form of busy extents. Am I
> missing something?
>

No, that's a good summary of what we talked about. However, I did go
back and give the new lock approach a try and was able to get my test
to pass. The new locking is not pretty especially since you need to
drop and reacquire the lock so that get_user_pages() can finish
grabbing all the pages it needs. Here are the two primary patches in
the series, do you think the extent-busy approach would be cleaner?

---

commit 5023d20a0aa795ddafd43655be1bfb2cbc7f4445
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Wed Oct 25 05:14:54 2017 -0700

    mm, dax: handle truncate of dma-busy pages

    get_user_pages() pins file backed memory pages for access by dma
    devices. However, it only pins the memory pages not the page-to-file
    offset association. If a file is truncated the pages are mapped out of
    the file and dma may continue indefinitely into a page that is owned by
    a device driver. This breaks coherency of the file vs dma, but the
    assumption is that if userspace wants the file-space truncated it does
    not matter what data is inbound from the device, it is not relevant
    anymore.

    The assumptions of the truncate-page-cache model are broken by DAX where
    the target DMA page *is* the filesystem block. Leaving the page pinned
    for DMA, but truncating the file block out of the file, means that the
    filesystem is free to reallocate a block under active DMA to another
    file!

    Here are some possible options for fixing this situation ('truncate' and
    'fallocate(punch hole)' are synonymous below):

        1/ Fail truncate while any file blocks might be under dma

        2/ Block (sleep-wait) truncate while any file blocks might be under
           dma

        3/ Remap file blocks to a "lost+found"-like file-inode where
           dma can continue and we might see what inbound data from DMA was
           mapped out of the original file. Blocks in this file could be
           freed back to the filesystem when dma eventually ends.

        4/ List the blocks under DMA in the extent busy list and either hold
           off commit of the truncate transaction until commit, or otherwise
           keep the blocks marked busy so the allocator does not reuse them
           until DMA completes.

        5/ Disable dax until option 3 or another long term solution has been
           implemented. However, filesystem-dax is still marked experimental
           for concerns like this.

    Option 1 will throw failures where userspace has never expected them
    before, option 2 might hang the truncating process indefinitely, and
    option 3 requires per filesystem enabling to remap blocks from one inode
    to another. Option 2 is implemented in this patch for the DAX path with
    the expectation that non-transient users of get_user_pages() (RDMA) are
    disallowed from setting up dax mappings and that the potential delay
    introduced to the truncate path is acceptable compared to the response
    time of the page cache case. This can only be seen as a stop-gap until
    we can solve the problem of safely sequestering unallocated filesystem
    blocks under active dma.

    The solution introduces a new inode semaphore that is held
    exclusively for get_user_pages() and held for read at truncate while
    sleep-waiting on a hashed waitqueue.

    Credit for option 3 goes to Dave Hansen, who proposed something similar
    as an alternative way to solve the problem that MAP_DIRECT was trying to
    solve. Credit for option 4 goes to Christoph Hellwig.

    Cc: Jan Kara <jack@suse.cz>
    Cc: Jeff Moyer <jmoyer@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Matthew Wilcox <mawilcox@microsoft.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
    Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Reported-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 4ac359e14777..a5a4b95ffdaf 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -167,6 +167,7 @@ struct dax_device {
 #if IS_ENABLED(CONFIG_FS_DAX)
 static void generic_dax_pagefree(struct page *page, void *data)
 {
+	wake_up_devmap_idle(&page->_refcount);
 }
 
 struct dax_device *fs_dax_claim_bdev(struct block_device *bdev, void *owner)
diff --git a/fs/dax.c b/fs/dax.c
index fd5d385988d1..f2c98f9cb833 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -346,6 +346,19 @@ static void dax_disassociate_entry(void *entry, struct inode *inode, bool trunc)
 	}
 }
 
+static struct page *dma_busy_page(void *entry)
+{
+	unsigned long pfn, end_pfn;
+
+	for_each_entry_pfn(entry, pfn, end_pfn) {
+		struct page *page = pfn_to_page(pfn);
+
+		if (page_ref_count(page) > 1)
+			return page;
+	}
+	return NULL;
+}
+
 /*
  * Find radix tree entry at given index. If it points to an exceptional entry,
  * return it with the radix tree entry locked. If the radix tree doesn't
@@ -487,6 +500,97 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 	return entry;
 }
 
+static int wait_page(atomic_t *_refcount)
+{
+	struct page *page = container_of(_refcount, struct page, _refcount);
+	struct inode *inode = page->inode;
+
+	if (page_ref_count(page) == 1)
+		return 0;
+
+	i_daxdma_unlock_shared(inode);
+	schedule();
+	i_daxdma_lock_shared(inode);
+
+	/*
+	 * if we bounced the daxdma_lock then we need to rescan the
+	 * truncate area.
+	 */
+	return 1;
+}
+
+void dax_wait_dma(struct address_space *mapping, loff_t lstart, loff_t len)
+{
+	struct inode *inode = mapping->host;
+	pgoff_t indices[PAGEVEC_SIZE];
+	pgoff_t start, end, index;
+	struct pagevec pvec;
+	unsigned i;
+
+	lockdep_assert_held(&inode->i_dax_dmasem);
+
+	if (lstart < 0 || len < -1)
+		return;
+
+	/* in the limited case get_user_pages for dax is disabled */
+	if (IS_ENABLED(CONFIG_FS_DAX_LIMITED))
+		return;
+
+	if (!dax_mapping(mapping))
+		return;
+
+	if (mapping->nrexceptional == 0)
+		return;
+
+	if (len == -1)
+		end = -1;
+	else
+		end = (lstart + len) >> PAGE_SHIFT;
+	start = lstart >> PAGE_SHIFT;
+
+retry:
+	pagevec_init(&pvec, 0);
+	index = start;
+	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
+				min(end - index, (pgoff_t)PAGEVEC_SIZE),
+				indices)) {
+		for (i = 0; i < pagevec_count(&pvec); i++) {
+			struct page *pvec_ent = pvec.pages[i];
+			struct page *page = NULL;
+			void *entry;
+
+			index = indices[i];
+			if (index >= end)
+				break;
+
+			if (!radix_tree_exceptional_entry(pvec_ent))
+				continue;
+
+			spin_lock_irq(&mapping->tree_lock);
+			entry = get_unlocked_mapping_entry(mapping, index, NULL);
+			if (entry)
+				page = dma_busy_page(entry);
+			put_unlocked_mapping_entry(mapping, index, entry);
+			spin_unlock_irq(&mapping->tree_lock);
+
+			if (page && wait_on_devmap_idle(&page->_refcount,
+						wait_page,
+						TASK_UNINTERRUPTIBLE) != 0) {
+				/*
+				 * We dropped the dma lock, so we need
+				 * to revalidate that previously seen
+				 * idle pages are still idle.
+				 */
+				goto retry;
+			}
+		}
+		pagevec_remove_exceptionals(&pvec);
+		pagevec_release(&pvec);
+		index++;
+	}
+}
+EXPORT_SYMBOL_GPL(dax_wait_dma);
+
 static int __dax_invalidate_mapping_entry(struct address_space *mapping,
 					  pgoff_t index, bool trunc)
 {
@@ -509,8 +613,10 @@ static int __dax_invalidate_mapping_entry(struct address_space *mapping,
 out:
 	put_unlocked_mapping_entry(mapping, index, entry);
 	spin_unlock_irq(&mapping->tree_lock);
+
 	return ret;
 }
+
 /*
  * Delete exceptional DAX entry at @index from @mapping. Wait for radix tree
  * entry to get unlocked before deleting it.
diff --git a/fs/inode.c b/fs/inode.c
index d1e35b53bb23..95408e87a96c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -192,6 +192,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	inode->i_fsnotify_mask = 0;
 #endif
 	inode->i_flctx = NULL;
+	i_daxdma_init(inode);
 	this_cpu_inc(nr_inodes);
 
 	return 0;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index ea21ebfd1889..6ce1c50519e7 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -100,10 +100,15 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
 
 #ifdef CONFIG_FS_DAX
+void dax_wait_dma(struct address_space *mapping, loff_t lstart, loff_t len);
 int __dax_zero_page_range(struct block_device *bdev,
 		struct dax_device *dax_dev, sector_t sector,
 		unsigned int offset, unsigned int length);
 #else
+static inline void dax_wait_dma(struct address_space *mapping, loff_t lstart,
+		loff_t len)
+{
+}
 static inline int __dax_zero_page_range(struct block_device *bdev,
 		struct dax_device *dax_dev, sector_t sector,
 		unsigned int offset, unsigned int length)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 13dab191a23e..cd5b4a092d1c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -645,6 +645,9 @@ struct inode {
 #ifdef CONFIG_IMA
 	atomic_t		i_readcount; /* struct files open RO */
 #endif
+#ifdef CONFIG_FS_DAX
+	struct rw_semaphore	i_dax_dmasem;
+#endif
 	const struct file_operations	*i_fop;	/* former ->i_op->default_file_ops */
 	struct file_lock_context	*i_flctx;
 	struct address_space	i_data;
@@ -747,6 +750,59 @@ static inline void inode_lock_nested(struct inode *inode, unsigned subclass)
 	down_write_nested(&inode->i_rwsem, subclass);
 }
 
+#ifdef CONFIG_FS_DAX
+static inline void i_daxdma_init(struct inode *inode)
+{
+	init_rwsem(&inode->i_dax_dmasem);
+}
+
+static inline void i_daxdma_lock(struct inode *inode)
+{
+	down_write(&inode->i_dax_dmasem);
+}
+
+static inline void i_daxdma_unlock(struct inode *inode)
+{
+	up_write(&inode->i_dax_dmasem);
+}
+
+static inline void i_daxdma_lock_shared(struct inode *inode)
+{
+	/*
+	 * The write lock is taken under mmap_sem in the
+	 * get_user_pages() path the read lock nests in the truncate
+	 * path.
+	 */
+#define DAXDMA_TRUNCATE_CLASS 1
+	down_read_nested(&inode->i_dax_dmasem, DAXDMA_TRUNCATE_CLASS);
+}
+
+static inline void i_daxdma_unlock_shared(struct inode *inode)
+{
+	up_read(&inode->i_dax_dmasem);
+}
+#else /* CONFIG_FS_DAX */
+static inline void i_daxdma_init(struct inode *inode)
+{
+}
+
+static inline void i_daxdma_lock(struct inode *inode)
+{
+}
+
+static inline void i_daxdma_unlock(struct inode *inode)
+{
+}
+
+static inline void i_daxdma_lock_shared(struct inode *inode)
+{
+}
+
+static inline void i_daxdma_unlock_shared(struct inode *inode)
+{
+}
+#endif /* CONFIG_FS_DAX */
+
 void lock_two_nondirectories(struct inode *, struct inode*);
 void unlock_two_nondirectories(struct inode *, struct inode*);
 
diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
index 12b26660d7e9..6186ecdb9df7 100644
--- a/include/linux/wait_bit.h
+++ b/include/linux/wait_bit.h
@@ -30,10 +30,12 @@ int __wait_on_bit(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *
 int __wait_on_bit_lock(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry, wait_bit_action_f *action, unsigned int mode);
 void wake_up_bit(void *word, int bit);
 void wake_up_atomic_t(atomic_t *p);
+void wake_up_devmap_idle(atomic_t *p);
 int out_of_line_wait_on_bit(void *word, int, wait_bit_action_f *action, unsigned int mode);
 int out_of_line_wait_on_bit_timeout(void *word, int, wait_bit_action_f *action, unsigned int mode, unsigned long timeout);
 int out_of_line_wait_on_bit_lock(void *word, int, wait_bit_action_f *action, unsigned int mode);
 int out_of_line_wait_on_atomic_t(atomic_t *p, int (*)(atomic_t *), unsigned int mode);
+int out_of_line_wait_on_devmap_idle(atomic_t *p, int (*)(atomic_t *), unsigned int mode);
 struct wait_queue_head *bit_waitqueue(void *word, int bit);
 extern void __init wait_bit_init(void);
 
@@ -258,4 +260,12 @@ int wait_on_atomic_t(atomic_t *val, int (*action)(atomic_t *), unsigned mode)
 	return out_of_line_wait_on_atomic_t(val, action, mode);
 }
 
+static inline
+int wait_on_devmap_idle(atomic_t *val, int (*action)(atomic_t *), unsigned mode)
+{
+	might_sleep();
+	if (atomic_read(val) == 1)
+		return 0;
+	return out_of_line_wait_on_devmap_idle(val, action, mode);
+}
 #endif /* _LINUX_WAIT_BIT_H */
diff --git a/kernel/sched/wait_bit.c b/kernel/sched/wait_bit.c
index f8159698aa4d..6ea93149614a 100644
--- a/kernel/sched/wait_bit.c
+++ b/kernel/sched/wait_bit.c
@@ -162,11 +162,17 @@ static inline wait_queue_head_t *atomic_t_waitqueue(atomic_t *p)
 	return bit_waitqueue(p, 0);
 }
 
-static int wake_atomic_t_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync,
-				  void *arg)
+static inline struct wait_bit_queue_entry *to_wait_bit_q(
+		struct wait_queue_entry *wq_entry)
+{
+	return container_of(wq_entry, struct wait_bit_queue_entry, wq_entry);
+}
+
+static int wake_atomic_t_function(struct wait_queue_entry *wq_entry,
+		unsigned mode, int sync, void *arg)
 {
 	struct wait_bit_key *key = arg;
-	struct wait_bit_queue_entry *wait_bit = container_of(wq_entry, struct wait_bit_queue_entry, wq_entry);
+	struct wait_bit_queue_entry *wait_bit = to_wait_bit_q(wq_entry);
 	atomic_t *val = key->flags;
 
 	if (wait_bit->key.flags != key->flags ||
@@ -176,14 +182,29 @@ static int wake_atomic_t_function(struct wait_queue_entry *wq_entry, unsigned mo
 	return autoremove_wake_function(wq_entry, mode, sync, key);
 }
 
+static int wake_devmap_idle_function(struct wait_queue_entry *wq_entry,
+		unsigned mode, int sync, void *arg)
+{
+	struct wait_bit_key *key = arg;
+	struct wait_bit_queue_entry *wait_bit = to_wait_bit_q(wq_entry);
+	atomic_t *val = key->flags;
+
+	if (wait_bit->key.flags != key->flags ||
+	    wait_bit->key.bit_nr != key->bit_nr ||
+	    atomic_read(val) != 1)
+		return 0;
+	return autoremove_wake_function(wq_entry, mode, sync, key);
+}
+
 /*
  * To allow interruptible waiting and asynchronous (i.e. nonblocking) waiting,
  * the actions of __wait_on_atomic_t() are permitted return codes. Nonzero
  * return codes halt waiting and return.
  */
 static __sched
-int __wait_on_atomic_t(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry,
-		       int (*action)(atomic_t *), unsigned mode)
+int __wait_on_atomic_t(struct wait_queue_head *wq_head,
+		struct wait_bit_queue_entry *wbq_entry,
+		int (*action)(atomic_t *), unsigned mode, int target)
 {
 	atomic_t *val;
 	int ret = 0;
@@ -191,10 +212,10 @@ int __wait_on_atomic_t(struct wait_queue_head *wq_head, struct wait_bit_queue_en
 	do {
 		prepare_to_wait(wq_head, &wbq_entry->wq_entry, mode);
 		val = wbq_entry->key.flags;
-		if (atomic_read(val) == 0)
+		if (atomic_read(val) == target)
 			break;
 		ret = (*action)(val);
-	} while (!ret && atomic_read(val) != 0);
+	} while (!ret && atomic_read(val) != target);
 	finish_wait(wq_head, &wbq_entry->wq_entry);
 	return ret;
 }
@@ -210,16 +231,37 @@ int __wait_on_atomic_t(struct wait_queue_head *wq_head, struct wait_bit_queue_en
 		},							\
 	}
 
+#define DEFINE_WAIT_DEVMAP_IDLE(name, p)				\
+	struct wait_bit_queue_entry name = {				\
+		.key = __WAIT_ATOMIC_T_KEY_INITIALIZER(p),		\
+		.wq_entry = {						\
+			.private	= current,			\
+			.func		= wake_devmap_idle_function,	\
+			.entry		=				\
+				LIST_HEAD_INIT((name).wq_entry.entry),	\
+		},							\
+	}
+
 __sched int out_of_line_wait_on_atomic_t(atomic_t *p, int (*action)(atomic_t *),
 					 unsigned mode)
 {
 	struct wait_queue_head *wq_head = atomic_t_waitqueue(p);
 	DEFINE_WAIT_ATOMIC_T(wq_entry, p);
 
-	return __wait_on_atomic_t(wq_head, &wq_entry, action, mode);
+	return __wait_on_atomic_t(wq_head, &wq_entry, action, mode, 0);
 }
 EXPORT_SYMBOL(out_of_line_wait_on_atomic_t);
 
+__sched int out_of_line_wait_on_devmap_idle(atomic_t *p, int (*action)(atomic_t *),
+					 unsigned mode)
+{
+	struct wait_queue_head *wq_head = atomic_t_waitqueue(p);
+	DEFINE_WAIT_DEVMAP_IDLE(wq_entry, p);
+
+	return __wait_on_atomic_t(wq_head, &wq_entry, action, mode, 1);
+}
+EXPORT_SYMBOL(out_of_line_wait_on_devmap_idle);
+
 /**
  * wake_up_atomic_t - Wake up a waiter on a atomic_t
  * @p: The atomic_t being waited on, a kernel virtual address
@@ -235,6 +277,12 @@ void wake_up_atomic_t(atomic_t *p)
 }
 EXPORT_SYMBOL(wake_up_atomic_t);
 
+void wake_up_devmap_idle(atomic_t *p)
+{
+
+AF8AXw-wake+AF8-up+AF8-bit(atomic+AF8-t+AF8-waitqueue(p), p, WAIT+AF8-ATOMIC+AF8-T+AF8-BIT+AF8-NR)+ADs- +-+AH0- +-EXPORT+AF8-SYMBOL(wake+AF8-up+AF8-devmap+AF8-idle)+ADs- +- +AF8AXw-sched int bit+AF8-wait(struct wait+AF8-bit+AF8-key +ACo-word, int mode) +AHs- schedule()+ADs- diff --git a/mm/gup.c b/mm/gup.c index 308be897d22a..fd7b2a2e2d19 100644 --- a/mm/gup.c +-+-+- b/mm/gup.c +AEAAQA- -579,6 +-579,41 +AEAAQA- static int check+AF8-vma+AF8-flags(struct vm+AF8-area+AF8-struct +ACo-vma, unsigned long gup+AF8-flags) return 0+ADs- +AH0- +-static struct inode +ACo-do+AF8-dax+AF8-lock(struct vm+AF8-area+AF8-struct +ACo-vma, +- unsigned int foll+AF8-flags) +-+AHs- +- struct file +ACo-file+ADs- +- struct inode +ACo-inode+ADs- +- +- if (+ACE-(foll+AF8-flags +ACY- FOLL+AF8-GET)) +- return NULL+ADs- +- if (+ACE-vma+AF8-is+AF8-dax(vma)) +- return NULL+ADs- +- file +AD0- vma-+AD4-vm+AF8-file+ADs- +- inode +AD0- file+AF8-inode(file)+ADs- +- if (inode-+AD4-i+AF8-mode +AD0APQ- S+AF8-IFCHR) +- return NULL+ADs- +- return inode+ADs- +-+AH0- +- +-static struct inode +ACo-dax+AF8-truncate+AF8-lock(struct vm+AF8-area+AF8-struct +ACo-vma, +- unsigned int foll+AF8-flags) +-+AHs- +- struct inode +ACo-inode +AD0- do+AF8-dax+AF8-lock(vma, foll+AF8-flags)+ADs- +- +- if (+ACE-inode) +- return NULL+ADs- +- i+AF8-daxdma+AF8-lock(inode)+ADs- +- return inode+ADs- +-+AH0- +- +-static void dax+AF8-truncate+AF8-unlock(struct inode +ACo-inode) +-+AHs- +- if (+ACE-inode) +- return+ADs- +- i+AF8-daxdma+AF8-unlock(inode)+ADs- +-+AH0- +- /+ACoAKg- +ACo- +AF8AXw-get+AF8-user+AF8-pages() - pin user pages in memory +ACo- +AEA-tsk: task+AF8-struct of target task +AEAAQA- -659,6 +-694,7 +AEAAQA- static long +AF8AXw-get+AF8-user+AF8-pages(struct task+AF8-struct +ACo-tsk, struct mm+AF8-struct +ACo-mm, do +AHs- struct page +ACo-page+ADs- +- struct inode +ACo-inode+ADs- unsigned int foll+AF8-flags +AD0- gup+AF8-flags+ADs- unsigned int page+AF8-increm+ADs- +AEAAQA- -693,7 +-729,9 +AEAAQA- static long 
+AF8AXw-get+AF8-user+AF8-pages(struct task+AF8-struct +ACo-tsk, struct mm+AF8-struct +ACo-mm, if (unlikely(fatal+AF8-signal+AF8-pending(current))) return i ? i : -ERESTARTSYS+ADs- cond+AF8-resched()+ADs- +- inode +AD0- dax+AF8-truncate+AF8-lock(vma, foll+AF8-flags)+ADs- page +AD0- follow+AF8-page+AF8-mask(vma, start, foll+AF8-flags, +ACY-page+AF8-mask)+ADs- +- dax+AF8-truncate+AF8-unlock(inode)+ADs- if (+ACE-page) +AHs- int ret+ADs- ret +AD0- faultin+AF8-page(tsk, vma, start, +ACY-foll+AF8-flags, commit 67d952314e9989b3b1945c50488f4a0f760264c3 Author: Dan Williams +ADw-dan.j.williams+AEA-intel.com+AD4- Date: Tue Oct 24 13:41:22 2017 -0700 xfs: wire up dax dma waiting The dax-dma vs truncate collision avoidance involves acquiring the new i+AF8-dax+AF8-dmasem and validating the no ranges that are to be mapped out of the file are active for dma. If any are found we wait for page idle and retry the scan. The locations where we implement this wait line up with where we currently wait for pnfs layout leases to expire. Since we need both dma to be idle and leases to be broken, and since xfs+AF8-break+AF8-layouts drops locks, we need to retry the dma busy scan until we can complete one that finds no busy pages. Cc: Jan Kara +ADw-jack+AEA-suse.cz+AD4- Cc: Dave Chinner +ADw-david+AEA-fromorbit.com+AD4- Cc: +ACI-Darrick J. 
Wong+ACI- +ADw-darrick.wong+AEA-oracle.com+AD4- Cc: Ross Zwisler +ADw-ross.zwisler+AEA-linux.intel.com+AD4- Cc: Christoph Hellwig +ADw-hch+AEA-lst.de+AD4- Signed-off-by: Dan Williams +ADw-dan.j.williams+AEA-intel.com+AD4- diff --git a/fs/xfs/xfs+AF8-file.c b/fs/xfs/xfs+AF8-file.c index c6780743f8ec..e3ec46c28c60 100644 --- a/fs/xfs/xfs+AF8-file.c +-+-+- b/fs/xfs/xfs+AF8-file.c +AEAAQA- -347,7 +-347,7 +AEAAQA- xfs+AF8-file+AF8-aio+AF8-write+AF8-checks( return error+ADs- error +AD0- xfs+AF8-break+AF8-layouts(inode, iolock)+ADs- - if (error) +- if (error +ADw- 0) return error+ADs- /+ACo- +AEAAQA- -762,7 +-762,7 +AEAAQA- xfs+AF8-file+AF8-fallocate( struct xfs+AF8-inode +ACo-ip +AD0- XFS+AF8-I(inode)+ADs- long error+ADs- enum xfs+AF8-prealloc+AF8-flags flags +AD0- 0+ADs- - uint iolock +AD0- XFS+AF8-IOLOCK+AF8-EXCL+ADs- +- uint iolock +AD0- XFS+AF8-DAXDMA+AF8-LOCK+AF8-SHARED+ADs- loff+AF8-t new+AF8-size +AD0- 0+ADs- bool do+AF8-file+AF8-insert +AD0- 0+ADs- +AEAAQA- -771,10 +-771,20 +AEAAQA- xfs+AF8-file+AF8-fallocate( if (mode +ACY- +AH4-XFS+AF8-FALLOC+AF8-FL+AF8-SUPPORTED) return -EOPNOTSUPP+ADs- +-retry: xfs+AF8-ilock(ip, iolock)+ADs- +- dax+AF8-wait+AF8-dma(inode-+AD4-i+AF8-mapping, offset, len)+ADs- +- +- xfs+AF8-ilock(ip, XFS+AF8-IOLOCK+AF8-EXCL)+ADs- +- iolock +AHwAPQ- XFS+AF8-IOLOCK+AF8-EXCL+ADs- error +AD0- xfs+AF8-break+AF8-layouts(inode, +ACY-iolock)+ADs- - if (error) +- if (error +ADw- 0) goto out+AF8-unlock+ADs- +- else if (error +AD4- 0 +ACYAJg- IS+AF8-ENABLED(CONFIG+AF8-FS+AF8-DAX)) +AHs- +- xfs+AF8-iunlock(ip, iolock)+ADs- +- iolock +AD0- XFS+AF8-DAXDMA+AF8-LOCK+AF8-SHARED+ADs- +- goto retry+ADs- +- +AH0- xfs+AF8-ilock(ip, XFS+AF8-MMAPLOCK+AF8-EXCL)+ADs- iolock +AHwAPQ- XFS+AF8-MMAPLOCK+AF8-EXCL+ADs- diff --git a/fs/xfs/xfs+AF8-inode.c b/fs/xfs/xfs+AF8-inode.c index 4ec5b7f45401..783f15894b7b 100644 --- a/fs/xfs/xfs+AF8-inode.c +-+-+- b/fs/xfs/xfs+AF8-inode.c +AEAAQA- -171,7 +-171,14 +AEAAQA- xfs+AF8-ilock+AF8-attr+AF8-map+AF8-shared( +ACo- taken in places 
where we need to invalidate the page cache in a race +ACo- free manner (e.g. truncate, hole punch and other extent manipulation +ACo- functions). - +ACo-/ +- +ACo- +- +ACo- The XFS+AF8-DAXDMA+AF8-LOCK+AF8-SHARED lock is a CONFIG+AF8-FS+AF8-DAX special case lock +- +ACo- for synchronizing truncate vs ongoing DMA. The get+AF8-user+AF8-pages() path +- +ACo- will hold this lock exclusively when incrementing page reference +- +ACo- counts for DMA. Before an extent can be truncated we need to complete +- +ACo- a validate-idle sweep of all pages in the range while holding this +- +ACo- lock in shared mode. +-+ACo-/ void xfs+AF8-ilock( xfs+AF8-inode+AF8-t +ACo-ip, +AEAAQA- -192,6 +-199,9 +AEAAQA- xfs+AF8-ilock( (XFS+AF8-ILOCK+AF8-SHARED +AHw- XFS+AF8-ILOCK+AF8-EXCL))+ADs- ASSERT((lock+AF8-flags +ACY- +AH4-(XFS+AF8-LOCK+AF8-MASK +AHw- XFS+AF8-LOCK+AF8-SUBCLASS+AF8-MASK)) +AD0APQ- 0)+ADs- +- if (lock+AF8-flags +ACY- XFS+AF8-DAXDMA+AF8-LOCK+AF8-SHARED) +- i+AF8-daxdma+AF8-lock+AF8-shared(VFS+AF8-I(ip))+ADs- +- if (lock+AF8-flags +ACY- XFS+AF8-IOLOCK+AF8-EXCL) +AHs- down+AF8-write+AF8-nested(+ACY-VFS+AF8-I(ip)-+AD4-i+AF8-rwsem, XFS+AF8-IOLOCK+AF8-DEP(lock+AF8-flags))+ADs- +AEAAQA- -328,6 +-338,9 +AEAAQA- xfs+AF8-iunlock( else if (lock+AF8-flags +ACY- XFS+AF8-ILOCK+AF8-SHARED) mrunlock+AF8-shared(+ACY-ip-+AD4-i+AF8-lock)+ADs- +- if (lock+AF8-flags +ACY- XFS+AF8-DAXDMA+AF8-LOCK+AF8-SHARED) +- i+AF8-daxdma+AF8-unlock+AF8-shared(VFS+AF8-I(ip))+ADs- +- trace+AF8-xfs+AF8-iunlock(ip, lock+AF8-flags, +AF8-RET+AF8-IP+AF8-)+ADs- +AH0- diff --git a/fs/xfs/xfs+AF8-inode.h b/fs/xfs/xfs+AF8-inode.h index 0ee453de239a..0662edf00529 100644 --- a/fs/xfs/xfs+AF8-inode.h +-+-+- b/fs/xfs/xfs+AF8-inode.h +AEAAQA- -283,10 +-283,12 +AEAAQA- static inline void xfs+AF8-ifunlock(struct xfs+AF8-inode +ACo-ip) +ACM-define XFS+AF8-ILOCK+AF8-SHARED (1+ADwAPA-3) +ACM-define XFS+AF8-MMAPLOCK+AF8-EXCL (1+ADwAPA-4) +ACM-define XFS+AF8-MMAPLOCK+AF8-SHARED (1+ADwAPA-5) +-+ACM-define 
XFS+AF8-DAXDMA+AF8-LOCK+AF8-SHARED (1+ADwAPA-6) +ACM-define XFS+AF8-LOCK+AF8-MASK (XFS+AF8-IOLOCK+AF8-EXCL +AHw- XFS+AF8-IOLOCK+AF8-SHARED +AFw- +AHw- XFS+AF8-ILOCK+AF8-EXCL +AHw- XFS+AF8-ILOCK+AF8-SHARED +AFw- - +AHw- XFS+AF8-MMAPLOCK+AF8-EXCL +AHw- XFS+AF8-MMAPLOCK+AF8-SHARED) +- +AHw- XFS+AF8-MMAPLOCK+AF8-EXCL +AHw- XFS+AF8-MMAPLOCK+AF8-SHARED +AFw- +- +AHw- XFS+AF8-DAXDMA+AF8-LOCK+AF8-SHARED) +ACM-define XFS+AF8-LOCK+AF8-FLAGS +AFw- +AHs- XFS+AF8-IOLOCK+AF8-EXCL, +ACI-IOLOCK+AF8-EXCL+ACI- +AH0-, +AFw- +AEAAQA- -294,7 +-296,8 +AEAAQA- static inline void xfs+AF8-ifunlock(struct xfs+AF8-inode +ACo-ip) +AHs- XFS+AF8-ILOCK+AF8-EXCL, +ACI-ILOCK+AF8-EXCL+ACI- +AH0-, +AFw- +AHs- XFS+AF8-ILOCK+AF8-SHARED, +ACI-ILOCK+AF8-SHARED+ACI- +AH0-, +AFw- +AHs- XFS+AF8-MMAPLOCK+AF8-EXCL, +ACI-MMAPLOCK+AF8-EXCL+ACI- +AH0-, +AFw- - +AHs- XFS+AF8-MMAPLOCK+AF8-SHARED, +ACI-MMAPLOCK+AF8-SHARED+ACI- +AH0- +- +AHs- XFS+AF8-MMAPLOCK+AF8-SHARED, +ACI-MMAPLOCK+AF8-SHARED+ACI- +AH0-, +AFw- +- +AHs- XFS+AF8-DAXDMA+AF8-LOCK+AF8-SHARED, +ACI-XFS+AF8-DAXDMA+AF8-LOCK+AF8-SHARED+ACI- +AH0- /+ACo- diff --git a/fs/xfs/xfs+AF8-ioctl.c b/fs/xfs/xfs+AF8-ioctl.c index aa75389be8cf..fd384ea00ede 100644 --- a/fs/xfs/xfs+AF8-ioctl.c +-+-+- b/fs/xfs/xfs+AF8-ioctl.c +AEAAQA- -612,7 +-612,7 +AEAAQA- xfs+AF8-ioc+AF8-space( struct xfs+AF8-inode +ACo-ip +AD0- XFS+AF8-I(inode)+ADs- struct iattr iattr+ADs- enum xfs+AF8-prealloc+AF8-flags flags +AD0- 0+ADs- - uint iolock +AD0- XFS+AF8-IOLOCK+AF8-EXCL+ADs- +- uint iolock +AD0- XFS+AF8-DAXDMA+AF8-LOCK+AF8-SHARED+ADs- int error+ADs- /+ACo- +AEAAQA- -637,18 +-637,6 +AEAAQA- xfs+AF8-ioc+AF8-space( if (filp-+AD4-f+AF8-mode +ACY- FMODE+AF8-NOCMTIME) flags +AHwAPQ- XFS+AF8-PREALLOC+AF8-INVISIBLE+ADs- - error +AD0- mnt+AF8-want+AF8-write+AF8-file(filp)+ADs- - if (error) - return error+ADs- - - xfs+AF8-ilock(ip, iolock)+ADs- - error +AD0- xfs+AF8-break+AF8-layouts(inode, +ACY-iolock)+ADs- - if (error) - goto out+AF8-unlock+ADs- - - xfs+AF8-ilock(ip, 
XFS+AF8-MMAPLOCK+AF8-EXCL)+ADs- - iolock +AHwAPQ- XFS+AF8-MMAPLOCK+AF8-EXCL+ADs- - switch (bf-+AD4-l+AF8-whence) +AHs- case 0: /+ACo-SEEK+AF8-SET+ACo-/ break+ADs- +AEAAQA- -659,10 +-647,31 +AEAAQA- xfs+AF8-ioc+AF8-space( bf-+AD4-l+AF8-start +-+AD0- XFS+AF8-ISIZE(ip)+ADs- break+ADs- default: - error +AD0- -EINVAL+ADs- +- return -EINVAL+ADs- +- +AH0- +- +- error +AD0- mnt+AF8-want+AF8-write+AF8-file(filp)+ADs- +- if (error) +- return error+ADs- +- +-retry: +- xfs+AF8-ilock(ip, iolock)+ADs- +- dax+AF8-wait+AF8-dma(inode-+AD4-i+AF8-mapping, bf-+AD4-l+AF8-start, bf-+AD4-l+AF8-len)+ADs- +- +- xfs+AF8-ilock(ip, XFS+AF8-IOLOCK+AF8-EXCL)+ADs- +- iolock +AHwAPQ- XFS+AF8-IOLOCK+AF8-EXCL+ADs- +- error +AD0- xfs+AF8-break+AF8-layouts(inode, +ACY-iolock)+ADs- +- if (error +ADw- 0) goto out+AF8-unlock+ADs- +- else if (error +AD4- 0 +ACYAJg- IS+AF8-ENABLED(CONFIG+AF8-FS+AF8-DAX)) +AHs- +- xfs+AF8-iunlock(ip, iolock)+ADs- +- iolock +AD0- XFS+AF8-DAXDMA+AF8-LOCK+AF8-SHARED+ADs- +- goto retry+ADs- +AH0- +- xfs+AF8-ilock(ip, XFS+AF8-MMAPLOCK+AF8-EXCL)+ADs- +- iolock +AHwAPQ- XFS+AF8-MMAPLOCK+AF8-EXCL+ADs- +- /+ACo- +ACo- length of +ADwAPQ- 0 for resv/unresv/zero is invalid. 
length for +ACo- alloc/free is ignored completely and we have no idea what userspace diff --git a/fs/xfs/xfs+AF8-pnfs.c b/fs/xfs/xfs+AF8-pnfs.c index 4246876df7b7..5f4d46b3cd7f 100644 --- a/fs/xfs/xfs+AF8-pnfs.c +-+-+- b/fs/xfs/xfs+AF8-pnfs.c +AEAAQA- -35,18 +-35,19 +AEAAQA- xfs+AF8-break+AF8-layouts( uint +ACo-iolock) +AHs- struct xfs+AF8-inode +ACo-ip +AD0- XFS+AF8-I(inode)+ADs- - int error+ADs- +- int error, did+AF8-unlock +AD0- 0+ADs- ASSERT(xfs+AF8-isilocked(ip, XFS+AF8-IOLOCK+AF8-SHARED+AHw-XFS+AF8-IOLOCK+AF8-EXCL))+ADs- while ((error +AD0- break+AF8-layout(inode, false) +AD0APQ- -EWOULDBLOCK)) +AHs- xfs+AF8-iunlock(ip, +ACo-iolock)+ADs- +- did+AF8-unlock +AD0- 1+ADs- error +AD0- break+AF8-layout(inode, true)+ADs- +ACo-iolock +AD0- XFS+AF8-IOLOCK+AF8-EXCL+ADs- xfs+AF8-ilock(ip, +ACo-iolock)+ADs- +AH0- - return error+ADs- +- return error +ADw- 0 ? error : did+AF8-unlock+ADs- +AH0- /+ACo- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com ([134.134.136.20]:15465 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932395AbdJZXvJ (ORCPT ); Thu, 26 Oct 2017 19:51:09 -0400 From: "Williams, Dan J" Subject: Re: [PATCH v3 00/13] dax: fix dma vs truncate and remove 'page-less' support Date: Thu, 26 Oct 2017 23:51:04 +0000 Message-ID: <1509061831.25213.2.camel@intel.com> References: <150846713528.24336.4459262264611579791.stgit@dwillia2-desk3.amr.corp.intel.com> <20171020074750.GA13568@lst.de> <20171020093148.GA20304@lst.de> <20171026105850.GA31161@quack2.suse.cz> In-Reply-To: <20171026105850.GA31161@quack2.suse.cz> Content-Language: en-US Content-Type: text/plain; charset="utf-7" Content-ID: <6B1E8E88FC7C9E4D9754113A024144AD@intel.com> Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-xfs-owner@vger.kernel.org List-ID: List-Id: xfs To: "hch@lst.de" , "jack@suse.cz" Cc: "schwidefsky@de.ibm.com" , "darrick.wong@oracle.com" , "dledford@redhat.com" , "linux-rdma@vger.kernel.org" , 
"linux-fsdevel@vger.kernel.org" , "bfields@fieldses.org" , "linux-mm@kvack.org" , "heiko.carstens@de.ibm.com" , "dave.hansen@linux.intel.com" , "linux-xfs@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "jmoyer@redhat.com" , "viro@zeniv.linux.org.uk" , "kirill.shutemov@linux.intel.com" , "akpm@linux-foundation.org" , "Hefty, Sean" , "linux-nvdimm@lists.01.org" , "jlayton@poochiereds.net" , "mawilcox@microsoft.com" , "mhocko@suse.com" , "ross.zwisler@linux.intel.com" , "gerald.schaefer@de.ibm.com" , "jgunthorpe@obsidianresearch.com" , "hal.rosenstock@gmail.com" , "benh@kernel.crashing.org" , "david@fromorbit.com" , "mpe@ellerman.id.au" , "paulus@samba.org" On Thu, 2017-10-26 at 12:58 +-0200, Jan Kara wrote: +AD4- On Fri 20-10-17 11:31:48, Christoph Hellwig wrote: +AD4- +AD4- On Fri, Oct 20, 2017 at 09:47:50AM +-0200, Christoph Hellwig wrote: +AD4- +AD4- +AD4- I'd like to brainstorm how we can do something better. +AD4- +AD4- +AD4- +AD4- +AD4- +AD4- How about: +AD4- +AD4- +AD4- +AD4- +AD4- +AD4- If we hit a page with an elevated refcount in truncate / hole puch +AD4- +AD4- +AD4- etc for a DAX file system we do not free the blocks in the file system, +AD4- +AD4- +AD4- but add it to the extent busy list.+AKAAoA-We mark the page as delayed +AD4- +AD4- +AD4- free (e.g. page flag?) so that when it finally hits refcount zero we +AD4- +AD4- +AD4- call back into the file system to remove it from the busy list. +AD4- +AD4- +AD4- +AD4- Brainstorming some more: +AD4- +AD4- +AD4- +AD4- Given that on a DAX file there shouldn't be any long-term page +AD4- +AD4- references after we unmap it from the page table and don't allow +AD4- +AD4- get+AF8-user+AF8-pages calls why not wait for the references for all +AD4- +AD4- DAX pages to go away first?+AKAAoA-E.g. 
if we find a DAX page in +AD4- +AD4- truncate+AF8-inode+AF8-pages+AF8-range that has an elevated refcount we set +AD4- +AD4- a new flag to prevent new references from showing up, and then +AD4- +AD4- simply wait for it to go away.+AKAAoA-Instead of a busy way we can +AD4- +AD4- do this through a few hashed waitqueued in dev+AF8-pagemap.+AKAAoA-And in +AD4- +AD4- fact put+AF8-zone+AF8-device+AF8-page already gets called when putting the +AD4- +AD4- last page so we can handle the wakeup from there. +AD4- +AD4- +AD4- +AD4- In fact if we can't find a page flag for the stop new callers +AD4- +AD4- things we could probably come up with a way to do that through +AD4- +AD4- dev+AF8-pagemap somehow, but I'm not sure how efficient that would +AD4- +AD4- be. +AD4- +AD4- We were talking about this yesterday with Dan so some more brainstorming +AD4- from us. We can implement the solution with extent busy list in ext4 +AD4- relatively easily - we already have such list currently similarly to XFS. +AD4- There would be some modifications needed but nothing too complex. The +AD4- biggest downside of this solution I see is that it requires per-filesystem +AD4- solution for busy extents - ext4 and XFS are reasonably fine, however btrfs +AD4- may have problems and ext2 definitely will need some modifications. +AD4- Invisible used blocks may be surprising to users at times although given +AD4- page refs should be relatively short term, that should not be a big issue. +AD4- But are we guaranteed page refs are short term? E.g. if someone creates +AD4- v4l2 videobuf in MAP+AF8-SHARED mapping of a file on DAX filesystem, page refs +AD4- can be rather long-term similarly as in RDMA case. Also freeing of blocks +AD4- on page reference drop is another async entry point into the filesystem +AD4- which could unpleasantly surprise us but I guess workqueues would solve +AD4- that reasonably fine. 
>
> WRT waiting for page refs to be dropped before proceeding with truncate (or
> punch hole for that matter - that case is even nastier since we don't have
> i_size to guard us). What I like about this solution is that it is very
> visible there's something unusual going on with the file being truncated /
> punched and so problems are easier to diagnose / fix from the admin side.
> So far we have guarded hole punching from concurrent faults (and
> get_user_pages() does fault once you do unmap_mapping_range()) with
> I_MMAP_LOCK (or its equivalent in ext4). We cannot easily wait for page
> refs to be dropped under I_MMAP_LOCK as that could deadlock - the most
> obvious case Dan came up with is when GUP obtains ref to page A, then hole
> punch comes grabbing I_MMAP_LOCK and waiting for page ref on A to be
> dropped, and then GUP blocks on trying to fault in another page.
>
> I think we cannot easily prevent new page references to be grabbed as you
> write above since nobody expects stuff like get_page() to fail. But I
> think that unmapping relevant pages and then preventing them to be faulted
> in again is workable and stops GUP as well. The problem with that is though
> what to do with page faults to such pages - you cannot just fail them for
> hole punch, and you cannot easily allocate new blocks either. So we are
> back at a situation where we need to detach blocks from the inode and then
> wait for page refs to be dropped - so some form of busy extents. Am I
> missing something?
>

No, that's a good summary of what we talked about. However, I did go
back and give the new lock approach a try and was able to get my test
to pass. The new locking is not pretty especially since you need to
drop and reacquire the lock so that get_user_pages() can finish
grabbing all the pages it needs. Here are the two primary patches in
the series, do you think the extent-busy approach would be cleaner?

---

commit 5023d20a0aa795ddafd43655be1bfb2cbc7f4445
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Wed Oct 25 05:14:54 2017 -0700

    mm, dax: handle truncate of dma-busy pages
    
    get_user_pages() pins file backed memory pages for access by dma
    devices. However, it only pins the memory pages not the page-to-file
    offset association. If a file is truncated the pages are mapped out of
    the file and dma may continue indefinitely into a page that is owned by
    a device driver. This breaks coherency of the file vs dma, but the
    assumption is that if userspace wants the file-space truncated it does
    not matter what data is inbound from the device, it is not relevant
    anymore.
    
    The assumptions of the truncate-page-cache model are broken by DAX where
    the target DMA page *is* the filesystem block. Leaving the page pinned
    for DMA, but truncating the file block out of the file, means that the
    filesystem is free to reallocate a block under active DMA to another
    file!
    
    Here are some possible options for fixing this situation ('truncate' and
    'fallocate(punch hole)' are synonymous below):
    
        1/ Fail truncate while any file blocks might be under dma
    
        2/ Block (sleep-wait) truncate while any file blocks might be under
           dma
    
        3/ Remap file blocks to a "lost+found"-like file-inode where
           dma can continue and we might see what inbound data from DMA was
           mapped out of the original file. Blocks in this file could be
           freed back to the filesystem when dma eventually ends.
    
        4/ List the blocks under DMA in the extent busy list and either hold
           off commit of the truncate transaction until commit, or otherwise
           keep the blocks marked busy so the allocator does not reuse them
           until DMA completes.
    
        5/ Disable dax until option 3 or another long term solution has been
           implemented. However, filesystem-dax is still marked experimental
           for concerns like this.
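
For readers who want option 2 in miniature, here is a rough userspace sketch; it is not code from this series. It assumes a page is idle when its reference count is 1, matching the `page_ref_count(page) > 1` busy test in the patch, and it replaces the real sleep on a hashed waitqueue with a caller-supplied callback. The names `first_busy_page`, `wait_until_range_idle`, `wait_fn` and `dma_completes` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: 1 == idle (only the device-map reference),
 * anything above 1 == pinned for DMA. */
#define PAGE_IDLE_REFCOUNT 1

/* Return the index of the first DMA-busy page in [start, end), or -1. */
static long first_busy_page(const int *refcount, size_t start, size_t end)
{
	for (size_t i = start; i < end; i++)
		if (refcount[i] > PAGE_IDLE_REFCOUNT)
			return (long)i;
	return -1;
}

/* Stand-in for "drop the lock, sleep until woken, retake the lock". */
typedef void (*wait_fn)(int *refcount, size_t busy_index);

/*
 * Option 2: do not let truncate proceed until one complete scan of the
 * range finds every page idle. Each wait may let new pins appear, so the
 * scan always restarts from the beginning of the range.
 * Returns how many times the scan had to be restarted.
 */
static int wait_until_range_idle(int *refcount, size_t start, size_t end,
				 wait_fn wait)
{
	int rescans = 0;
	long busy;

	assert(start <= end);
	while ((busy = first_busy_page(refcount, start, end)) >= 0) {
		wait(refcount, (size_t)busy);
		rescans++;
	}
	return rescans;
}

/* Hypothetical helper: models DMA completing and releasing one pin. */
static void dma_completes(int *refcount, size_t busy_index)
{
	refcount[busy_index]--;
}
```

Calling `wait_until_range_idle(refs, 0, n, dma_completes)` returns only once a full pass sees no page above the idle count. If the callback never releases a pin, the loop never ends, which is the "might hang the truncating process indefinitely" cost of this option.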
    Option 1 will throw failures where userspace has never expected them
    before, option 2 might hang the truncating process indefinitely, and
    option 3 requires per filesystem enabling to remap blocks from one inode
    to another.
    
    Option 2 is implemented in this patch for the DAX path with the
    expectation that non-transient users of get_user_pages() (RDMA) are
    disallowed from setting up dax mappings and that the potential delay
    introduced to the truncate path is acceptable compared to the response
    time of the page cache case. This can only be seen as a stop-gap until
    we can solve the problem of safely sequestering unallocated filesystem
    blocks under active dma.
    
    The solution introduces a new inode semaphore that is held exclusively
    for get_user_pages() and held for read at truncate while sleep-waiting
    on a hashed waitqueue.
    
    Credit for option 3 goes to Dave Hansen, who proposed something similar
    as an alternative way to solve the problem that MAP_DIRECT was trying to
    solve. Credit for option 4 goes to Christoph Hellwig.
    
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jeff Moyer <jmoyer@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Matthew Wilcox <mawilcox@microsoft.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
    Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Reported-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 4ac359e14777..a5a4b95ffdaf 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -167,6 +167,7 @@ struct dax_device {
 #if IS_ENABLED(CONFIG_FS_DAX)
 static void generic_dax_pagefree(struct page *page, void *data)
 {
+	wake_up_devmap_idle(&page->_refcount);
 }
 
 struct dax_device *fs_dax_claim_bdev(struct block_device *bdev, void *owner)
diff --git a/fs/dax.c b/fs/dax.c
index fd5d385988d1..f2c98f9cb833 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -346,6 +346,19 @@ static void dax_disassociate_entry(void *entry, struct inode *inode, bool trunc)
 	}
 }
 
+static struct page *dma_busy_page(void *entry)
+{
+	unsigned long pfn, end_pfn;
+
+	for_each_entry_pfn(entry, pfn, end_pfn) {
+		struct page *page = pfn_to_page(pfn);
+
+		if (page_ref_count(page) > 1)
+			return page;
+	}
+	return NULL;
+}
+
 /*
  * Find radix tree entry at given index. If it points to an exceptional entry,
  * return it with the radix tree entry locked. If the radix tree doesn't
@@ -487,6 +500,97 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 	return entry;
 }
 
+static int wait_page(atomic_t *_refcount)
+{
+	struct page *page = container_of(_refcount, struct page, _refcount);
+	struct inode *inode = page->inode;
+
+	if (page_ref_count(page) == 1)
+		return 0;
+
+	i_daxdma_unlock_shared(inode);
+	schedule();
+	i_daxdma_lock_shared(inode);
+
+	/*
+	 * if we bounced the daxdma_lock then we need to rescan the
+	 * truncate area.
+	 */
+	return 1;
+}
+
+void dax_wait_dma(struct address_space *mapping, loff_t lstart, loff_t len)
+{
+	struct inode *inode = mapping->host;
+	pgoff_t indices[PAGEVEC_SIZE];
+	pgoff_t start, end, index;
+	struct pagevec pvec;
+	unsigned i;
+
+	lockdep_assert_held(&inode->i_dax_dmasem);
+
+	if (lstart < 0 || len < -1)
+		return;
+
+	/* in the limited case get_user_pages for dax is disabled */
+	if (IS_ENABLED(CONFIG_FS_DAX_LIMITED))
+		return;
+
+	if (!dax_mapping(mapping))
+		return;
+
+	if (mapping->nrexceptional == 0)
+		return;
+
+	if (len == -1)
+		end = -1;
+	else
+		end = (lstart + len) >> PAGE_SHIFT;
+	start = lstart >> PAGE_SHIFT;
+
+retry:
+	pagevec_init(&pvec, 0);
+	index = start;
+	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
+				min(end - index, (pgoff_t)PAGEVEC_SIZE),
+				indices)) {
+		for (i = 0; i < pagevec_count(&pvec); i++) {
+			struct page *pvec_ent = pvec.pages[i];
+			struct page *page = NULL;
+			void *entry;
+
+			index = indices[i];
+			if (index >= end)
+				break;
+
+			if (!radix_tree_exceptional_entry(pvec_ent))
+				continue;
+
+			spin_lock_irq(&mapping->tree_lock);
+			entry = get_unlocked_mapping_entry(mapping, index, NULL);
+			if (entry)
+				page = dma_busy_page(entry);
+			put_unlocked_mapping_entry(mapping, index, entry);
+			spin_unlock_irq(&mapping->tree_lock);
+
+			if (page && wait_on_devmap_idle(&page->_refcount,
+						wait_page,
+						TASK_UNINTERRUPTIBLE) != 0) {
+				/*
+				 * We dropped the dma lock, so we need
+				 * to revalidate that previously seen
+				 * idle pages are still idle.
+				 */
+				goto retry;
+			}
+		}
+		pagevec_remove_exceptionals(&pvec);
+		pagevec_release(&pvec);
+		index++;
+	}
+}
+EXPORT_SYMBOL_GPL(dax_wait_dma);
+
 static int __dax_invalidate_mapping_entry(struct address_space *mapping,
 					  pgoff_t index, bool trunc)
 {
@@ -509,8 +613,10 @@ static int __dax_invalidate_mapping_entry(struct address_space *mapping,
 out:
 	put_unlocked_mapping_entry(mapping, index, entry);
 	spin_unlock_irq(&mapping->tree_lock);
+
 	return ret;
 }
+
 /*
  * Delete exceptional DAX entry at @index from @mapping. Wait for radix tree
  * entry to get unlocked before deleting it.
diff --git a/fs/inode.c b/fs/inode.c
index d1e35b53bb23..95408e87a96c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -192,6 +192,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	inode->i_fsnotify_mask = 0;
 #endif
 	inode->i_flctx = NULL;
+	i_daxdma_init(inode);
 	this_cpu_inc(nr_inodes);
 
 	return 0;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index ea21ebfd1889..6ce1c50519e7 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -100,10 +100,15 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
 
 #ifdef CONFIG_FS_DAX
+void dax_wait_dma(struct address_space *mapping, loff_t lstart, loff_t len);
 int __dax_zero_page_range(struct block_device *bdev,
 		struct dax_device *dax_dev, sector_t sector,
 		unsigned int offset, unsigned int length);
 #else
+static inline void dax_wait_dma(struct address_space *mapping, loff_t lstart,
+		loff_t len)
+{
+}
 static inline int __dax_zero_page_range(struct block_device *bdev,
 		struct dax_device *dax_dev, sector_t sector,
 		unsigned int offset, unsigned int length)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 13dab191a23e..cd5b4a092d1c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -645,6 +645,9 @@ struct inode {
 #ifdef CONFIG_IMA
 	atomic_t i_readcount; /* struct files open RO */
 #endif
+#ifdef CONFIG_FS_DAX
+	struct rw_semaphore i_dax_dmasem;
+#endif
 	const struct file_operations *i_fop; /* former ->i_op->default_file_ops */
 	struct file_lock_context *i_flctx;
 	struct address_space i_data;
@@ -747,6 +750,59 @@ static inline void inode_lock_nested(struct inode *inode, unsigned subclass)
 	down_write_nested(&inode->i_rwsem, subclass);
 }
 
+#ifdef CONFIG_FS_DAX
+static inline void i_daxdma_init(struct inode *inode)
+{
+	init_rwsem(&inode->i_dax_dmasem);
+}
+
+static inline void i_daxdma_lock(struct inode *inode)
+{
+	down_write(&inode->i_dax_dmasem);
+}
+
+static inline void i_daxdma_unlock(struct inode *inode)
+{
+	up_write(&inode->i_dax_dmasem);
+}
+
+static inline void i_daxdma_lock_shared(struct inode *inode)
+{
+	/*
+	 * The write lock is taken under mmap_sem in the
+	 * get_user_pages() path the read lock nests in the truncate
+	 * path.
+	 */
+#define DAXDMA_TRUNCATE_CLASS 1
+	down_read_nested(&inode->i_dax_dmasem, DAXDMA_TRUNCATE_CLASS);
+}
+
+static inline void i_daxdma_unlock_shared(struct inode *inode)
+{
+	up_read(&inode->i_dax_dmasem);
+}
+#else /* CONFIG_FS_DAX */
+static inline void i_daxdma_init(struct inode *inode)
+{
+}
+
+static inline void i_daxdma_lock(struct inode *inode)
+{
+}
+
+static inline void i_daxdma_unlock(struct inode *inode)
+{
+}
+
+static inline void i_daxdma_lock_shared(struct inode *inode)
+{
+}
+
+static inline void i_daxdma_unlock_shared(struct inode *inode)
+{
+}
+#endif /* CONFIG_FS_DAX */
+
 void lock_two_nondirectories(struct inode *, struct inode*);
 void unlock_two_nondirectories(struct inode *, struct inode*);
 
diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
index 12b26660d7e9..6186ecdb9df7 100644
--- a/include/linux/wait_bit.h
+++ b/include/linux/wait_bit.h
@@ -30,10 +30,12 @@ int __wait_on_bit(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *
 int __wait_on_bit_lock(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry, wait_bit_action_f *action, unsigned int mode);
 void wake_up_bit(void *word, int bit);
 void wake_up_atomic_t(atomic_t *p);
+void wake_up_devmap_idle(atomic_t *p);
 int out_of_line_wait_on_bit(void *word, int, wait_bit_action_f *action, unsigned int mode);
 int out_of_line_wait_on_bit_timeout(void *word, int, wait_bit_action_f *action, unsigned int mode, unsigned long timeout);
 int out_of_line_wait_on_bit_lock(void *word, int, wait_bit_action_f *action, unsigned int mode);
 int out_of_line_wait_on_atomic_t(atomic_t *p, int (*)(atomic_t *), unsigned int mode);
+int out_of_line_wait_on_devmap_idle(atomic_t *p, int (*)(atomic_t *), unsigned int mode);
 struct wait_queue_head *bit_waitqueue(void *word, int bit);
 extern void __init wait_bit_init(void);
 
@@ -258,4 +260,12 @@ int wait_on_atomic_t(atomic_t *val, int (*action)(atomic_t *), unsigned mode)
 	return out_of_line_wait_on_atomic_t(val, action, mode);
 }
 
+static inline
+int wait_on_devmap_idle(atomic_t *val, int (*action)(atomic_t *), unsigned mode)
+{
+	might_sleep();
+	if (atomic_read(val) == 1)
+		return 0;
+	return out_of_line_wait_on_devmap_idle(val, action, mode);
+}
 #endif /* _LINUX_WAIT_BIT_H */
diff --git a/kernel/sched/wait_bit.c b/kernel/sched/wait_bit.c
index f8159698aa4d..6ea93149614a 100644
--- a/kernel/sched/wait_bit.c
+++ b/kernel/sched/wait_bit.c
@@ -162,11 +162,17 @@ static inline wait_queue_head_t *atomic_t_waitqueue(atomic_t *p)
 	return bit_waitqueue(p, 0);
 }
-static int wake_atomic_t_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync,
-		void *arg)
+static inline struct wait_bit_queue_entry *to_wait_bit_q(
+		struct wait_queue_entry *wq_entry)
+{
+	return container_of(wq_entry, struct wait_bit_queue_entry, wq_entry);
+}
+
+static int wake_atomic_t_function(struct wait_queue_entry *wq_entry,
+		unsigned mode, int sync, void *arg)
 {
 	struct wait_bit_key *key = arg;
-	struct wait_bit_queue_entry *wait_bit = container_of(wq_entry, struct wait_bit_queue_entry, wq_entry);
+	struct wait_bit_queue_entry *wait_bit = to_wait_bit_q(wq_entry);
 	atomic_t *val = key->flags;
 	if (wait_bit->key.flags != key->flags ||
@@ -176,14 +182,29 @@ static int wake_atomic_t_function(struct wait_queue_entry *wq_entry, unsigned mo
 	return autoremove_wake_function(wq_entry, mode, sync, key);
 }
+static int wake_devmap_idle_function(struct wait_queue_entry *wq_entry,
+		unsigned mode, int sync, void *arg)
+{
+	struct wait_bit_key *key = arg;
+	struct wait_bit_queue_entry *wait_bit = to_wait_bit_q(wq_entry);
+	atomic_t *val = key->flags;
+
+	if (wait_bit->key.flags != key->flags ||
+	    wait_bit->key.bit_nr != key->bit_nr ||
+	    atomic_read(val) != 1)
+		return 0;
+	return autoremove_wake_function(wq_entry, mode, sync, key);
+}
+
 /*
  * To allow interruptible waiting and asynchronous (i.e. nonblocking) waiting,
  * the actions of __wait_on_atomic_t() are permitted return codes. Nonzero
  * return codes halt waiting and return.
  */
 static __sched
-int __wait_on_atomic_t(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry,
-		int (*action)(atomic_t *), unsigned mode)
+int __wait_on_atomic_t(struct wait_queue_head *wq_head,
+		struct wait_bit_queue_entry *wbq_entry,
+		int (*action)(atomic_t *), unsigned mode, int target)
 {
 	atomic_t *val;
 	int ret = 0;
@@ -191,10 +212,10 @@ int __wait_on_atomic_t(struct wait_queue_head *wq_head, struct wait_bit_queue_en
 	do {
 		prepare_to_wait(wq_head, &wbq_entry->wq_entry, mode);
 		val = wbq_entry->key.flags;
-		if (atomic_read(val) == 0)
+		if (atomic_read(val) == target)
 			break;
 		ret = (*action)(val);
-	} while (!ret && atomic_read(val) != 0);
+	} while (!ret && atomic_read(val) != target);
 	finish_wait(wq_head, &wbq_entry->wq_entry);
 	return ret;
 }
@@ -210,16 +231,37 @@ int __wait_on_atomic_t(struct wait_queue_head *wq_head, struct wait_bit_queue_en
 		}, \
 	}
+#define DEFINE_WAIT_DEVMAP_IDLE(name, p) \
+	struct wait_bit_queue_entry name = { \
+		.key = __WAIT_ATOMIC_T_KEY_INITIALIZER(p), \
+		.wq_entry = { \
+			.private = current, \
+			.func = wake_devmap_idle_function, \
+			.entry = \
+				LIST_HEAD_INIT((name).wq_entry.entry), \
+		}, \
+	}
+
 __sched int out_of_line_wait_on_atomic_t(atomic_t *p, int (*action)(atomic_t *), unsigned mode)
 {
 	struct wait_queue_head *wq_head = atomic_t_waitqueue(p);
 	DEFINE_WAIT_ATOMIC_T(wq_entry, p);
-	return __wait_on_atomic_t(wq_head, &wq_entry, action, mode);
+	return __wait_on_atomic_t(wq_head, &wq_entry, action, mode, 0);
 }
 EXPORT_SYMBOL(out_of_line_wait_on_atomic_t);
+__sched int out_of_line_wait_on_devmap_idle(atomic_t *p, int (*action)(atomic_t *),
+		unsigned mode)
+{
+	struct wait_queue_head *wq_head = atomic_t_waitqueue(p);
+	DEFINE_WAIT_DEVMAP_IDLE(wq_entry, p);
+
+	return __wait_on_atomic_t(wq_head, &wq_entry, action, mode, 1);
+}
+EXPORT_SYMBOL(out_of_line_wait_on_devmap_idle);
+
 /**
  * wake_up_atomic_t - Wake up a waiter on a atomic_t
  * @p: The atomic_t being waited on, a kernel virtual address
@@ -235,6 +277,12 @@ void wake_up_atomic_t(atomic_t *p)
 }
 EXPORT_SYMBOL(wake_up_atomic_t);
+void wake_up_devmap_idle(atomic_t *p)
+{
+	__wake_up_bit(atomic_t_waitqueue(p), p, WAIT_ATOMIC_T_BIT_NR);
+}
+EXPORT_SYMBOL(wake_up_devmap_idle);
+
 __sched int bit_wait(struct wait_bit_key *word, int mode)
 {
 	schedule();
diff --git a/mm/gup.c b/mm/gup.c
index 308be897d22a..fd7b2a2e2d19 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -579,6 +579,41 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	return 0;
 }
+static struct inode *do_dax_lock(struct vm_area_struct *vma,
+		unsigned int foll_flags)
+{
+	struct file *file;
+	struct inode *inode;
+
+	if (!(foll_flags & FOLL_GET))
+		return NULL;
+	if (!vma_is_dax(vma))
+		return NULL;
+	file = vma->vm_file;
+	inode = file_inode(file);
+	if (inode->i_mode == S_IFCHR)
+		return NULL;
+	return inode;
+}
+
+static struct inode *dax_truncate_lock(struct vm_area_struct *vma,
+		unsigned int foll_flags)
+{
+	struct inode *inode = do_dax_lock(vma, foll_flags);
+
+	if (!inode)
+		return NULL;
+	i_daxdma_lock(inode);
+	return inode;
+}
+
+static void dax_truncate_unlock(struct inode *inode)
+{
+	if (!inode)
+		return;
+	i_daxdma_unlock(inode);
+}
+
 /**
  * __get_user_pages() - pin user pages in memory
  * @tsk: task_struct of target task
@@ -659,6 +694,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 	do {
 		struct page *page;
+		struct inode *inode;
 		unsigned int foll_flags = gup_flags;
 		unsigned int page_increm;
@@ -693,7 +729,9 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		if (unlikely(fatal_signal_pending(current)))
 			return i ? i : -ERESTARTSYS;
 		cond_resched();
+		inode = dax_truncate_lock(vma, foll_flags);
 		page = follow_page_mask(vma, start, foll_flags, &page_mask);
+		dax_truncate_unlock(inode);
 		if (!page) {
 			int ret;
 			ret = faultin_page(tsk, vma, start, &foll_flags,

commit 67d952314e9989b3b1945c50488f4a0f760264c3
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Tue Oct 24 13:41:22 2017 -0700

    xfs: wire up dax dma waiting

    The dax-dma vs truncate collision avoidance involves acquiring the new
    i_dax_dmasem and validating that no ranges that are to be mapped out of
    the file are active for dma. If any are found we wait for page idle
    and retry the scan. The locations where we implement this wait line up
    with where we currently wait for pnfs layout leases to expire. Since we
    need both dma to be idle and leases to be broken, and since
    xfs_break_layouts drops locks, we need to retry the dma busy scan until
    we can complete one that finds no busy pages.

    Cc: Jan Kara <jack@suse.cz>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
    Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index c6780743f8ec..e3ec46c28c60 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -347,7 +347,7 @@ xfs_file_aio_write_checks(
 		return error;
 	error = xfs_break_layouts(inode, iolock);
-	if (error)
+	if (error < 0)
 		return error;
 	/*
@@ -762,7 +762,7 @@ xfs_file_fallocate(
 	struct xfs_inode *ip = XFS_I(inode);
 	long error;
 	enum xfs_prealloc_flags flags = 0;
-	uint iolock = XFS_IOLOCK_EXCL;
+	uint iolock = XFS_DAXDMA_LOCK_SHARED;
 	loff_t new_size = 0;
 	bool do_file_insert = 0;
@@ -771,10 +771,20 @@ xfs_file_fallocate(
 	if (mode & ~XFS_FALLOC_FL_SUPPORTED)
 		return -EOPNOTSUPP;
+retry:
 	xfs_ilock(ip, iolock);
+	dax_wait_dma(inode->i_mapping, offset, len);
+
+	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+	iolock |= XFS_IOLOCK_EXCL;
 	error = xfs_break_layouts(inode, &iolock);
-	if (error)
+	if (error < 0)
 		goto out_unlock;
+	else if (error > 0 && IS_ENABLED(CONFIG_FS_DAX)) {
+		xfs_iunlock(ip, iolock);
+		iolock = XFS_DAXDMA_LOCK_SHARED;
+		goto retry;
+	}
 	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
 	iolock |= XFS_MMAPLOCK_EXCL;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 4ec5b7f45401..783f15894b7b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -171,7 +171,14 @@ xfs_ilock_attr_map_shared(
  * taken in places where we need to invalidate the page cache in a race
  * free manner (e.g. truncate, hole punch and other extent manipulation
  * functions).
- */
+ *
+ * The XFS_DAXDMA_LOCK_SHARED lock is a CONFIG_FS_DAX special case lock
+ * for synchronizing truncate vs ongoing DMA. The get_user_pages() path
+ * will hold this lock exclusively when incrementing page reference
+ * counts for DMA. Before an extent can be truncated we need to complete
+ * a validate-idle sweep of all pages in the range while holding this
+ * lock in shared mode.
+*/
 void
 xfs_ilock(
 	xfs_inode_t *ip,
@@ -192,6 +199,9 @@ xfs_ilock(
 	       (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
 	ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_SUBCLASS_MASK)) == 0);
+	if (lock_flags & XFS_DAXDMA_LOCK_SHARED)
+		i_daxdma_lock_shared(VFS_I(ip));
+
 	if (lock_flags & XFS_IOLOCK_EXCL) {
 		down_write_nested(&VFS_I(ip)->i_rwsem, XFS_IOLOCK_DEP(lock_flags));
@@ -328,6 +338,9 @@ xfs_iunlock(
 	else if (lock_flags & XFS_ILOCK_SHARED)
 		mrunlock_shared(&ip->i_lock);
+	if (lock_flags & XFS_DAXDMA_LOCK_SHARED)
+		i_daxdma_unlock_shared(VFS_I(ip));
+
 	trace_xfs_iunlock(ip, lock_flags, _RET_IP_);
 }
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 0ee453de239a..0662edf00529 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -283,10 +283,12 @@ static inline void xfs_ifunlock(struct xfs_inode *ip)
 #define XFS_ILOCK_SHARED (1<<3)
 #define XFS_MMAPLOCK_EXCL (1<<4)
 #define XFS_MMAPLOCK_SHARED (1<<5)
+#define XFS_DAXDMA_LOCK_SHARED (1<<6)
 #define XFS_LOCK_MASK (XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED \
 		| XFS_ILOCK_EXCL | XFS_ILOCK_SHARED \
-		| XFS_MMAPLOCK_EXCL | XFS_MMAPLOCK_SHARED)
+		| XFS_MMAPLOCK_EXCL | XFS_MMAPLOCK_SHARED \
+		| XFS_DAXDMA_LOCK_SHARED)
 #define XFS_LOCK_FLAGS \
 	{ XFS_IOLOCK_EXCL, "IOLOCK_EXCL" }, \
@@ -294,7 +296,8 @@ static inline void xfs_ifunlock(struct xfs_inode *ip)
 	{ XFS_ILOCK_EXCL, "ILOCK_EXCL" }, \
 	{ XFS_ILOCK_SHARED, "ILOCK_SHARED" }, \
 	{ XFS_MMAPLOCK_EXCL, "MMAPLOCK_EXCL" }, \
-	{ XFS_MMAPLOCK_SHARED, "MMAPLOCK_SHARED" }
+	{ XFS_MMAPLOCK_SHARED, "MMAPLOCK_SHARED" }, \
+	{ XFS_DAXDMA_LOCK_SHARED, "XFS_DAXDMA_LOCK_SHARED" }
 /*
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index aa75389be8cf..fd384ea00ede 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -612,7 +612,7 @@ xfs_ioc_space(
 	struct xfs_inode *ip = XFS_I(inode);
 	struct iattr iattr;
 	enum xfs_prealloc_flags flags = 0;
-	uint iolock = XFS_IOLOCK_EXCL;
+	uint iolock = XFS_DAXDMA_LOCK_SHARED;
 	int error;
 	/*
@@ -637,18 +637,6 @@ xfs_ioc_space(
 	if (filp->f_mode & FMODE_NOCMTIME)
 		flags |= XFS_PREALLOC_INVISIBLE;
-	error = mnt_want_write_file(filp);
-	if (error)
-		return error;
-
-	xfs_ilock(ip, iolock);
-	error = xfs_break_layouts(inode, &iolock);
-	if (error)
-		goto out_unlock;
-
-	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
-	iolock |= XFS_MMAPLOCK_EXCL;
-
 	switch (bf->l_whence) {
 	case 0: /*SEEK_SET*/
 		break;
@@ -659,10 +647,31 @@ xfs_ioc_space(
 		bf->l_start += XFS_ISIZE(ip);
 		break;
 	default:
-		error = -EINVAL;
+		return -EINVAL;
+	}
+
+	error = mnt_want_write_file(filp);
+	if (error)
+		return error;
+
+retry:
+	xfs_ilock(ip, iolock);
+	dax_wait_dma(inode->i_mapping, bf->l_start, bf->l_len);
+
+	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+	iolock |= XFS_IOLOCK_EXCL;
+	error = xfs_break_layouts(inode, &iolock);
+	if (error < 0)
 		goto out_unlock;
+	else if (error > 0 && IS_ENABLED(CONFIG_FS_DAX)) {
+		xfs_iunlock(ip, iolock);
+		iolock = XFS_DAXDMA_LOCK_SHARED;
+		goto retry;
 	}
+	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
+	iolock |= XFS_MMAPLOCK_EXCL;
+
 	/*
 	 * length of <= 0 for resv/unresv/zero is invalid. length for
 	 * alloc/free is ignored completely and we have no idea what userspace
diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index 4246876df7b7..5f4d46b3cd7f 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -35,18 +35,19 @@ xfs_break_layouts(
 	uint *iolock)
 {
 	struct xfs_inode *ip = XFS_I(inode);
-	int error;
+	int error, did_unlock = 0;
 	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL));
 	while ((error = break_layout(inode, false) == -EWOULDBLOCK)) {
 		xfs_iunlock(ip, *iolock);
+		did_unlock = 1;
 		error = break_layout(inode, true);
 		*iolock = XFS_IOLOCK_EXCL;
 		xfs_ilock(ip, *iolock);
 	}
-	return error;
+	return error < 0 ? error : did_unlock;
 }
 /*

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
From: "Williams, Dan J"
To: "hch@lst.de", "jack@suse.cz"
CC: "schwidefsky@de.ibm.com", "darrick.wong@oracle.com",
 "dledford@redhat.com", "linux-rdma@vger.kernel.org",
 "linux-fsdevel@vger.kernel.org", "bfields@fieldses.org",
 "linux-mm@kvack.org", "heiko.carstens@de.ibm.com",
 "dave.hansen@linux.intel.com", "linux-xfs@vger.kernel.org",
 "linux-kernel@vger.kernel.org", "jmoyer@redhat.com",
 "viro@zeniv.linux.org.uk", "kirill.shutemov@linux.intel.com",
 "akpm@linux-foundation.org", "Hefty, Sean",
 "linux-nvdimm@lists.01.org", "jlayton@poochiereds.net",
 "mawilcox@microsoft.com", "mhocko@suse.com",
 "ross.zwisler@linux.intel.com", "gerald.schaefer@de.ibm.com",
 "jgunthorpe@obsidianresearch.com", "hal.rosenstock@gmail.com",
 "benh@kernel.crashing.org", "david@fromorbit.com",
 "mpe@ellerman.id.au", "paulus@samba.org"
Subject: Re: [PATCH v3 00/13] dax: fix dma vs truncate and remove 'page-less' support
Date: Thu, 26 Oct 2017 23:51:04 +0000
Message-ID: <1509061831.25213.2.camel@intel.com>
References: <150846713528.24336.4459262264611579791.stgit@dwillia2-desk3.amr.corp.intel.com>
 <20171020074750.GA13568@lst.de>
 <20171020093148.GA20304@lst.de>
 <20171026105850.GA31161@quack2.suse.cz>
In-Reply-To: <20171026105850.GA31161@quack2.suse.cz>
Content-Language: en-US
Content-Type: text/plain; charset="utf-7"
Content-ID: <6B1E8E88FC7C9E4D9754113A024144AD@intel.com>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Sender: owner-linux-mm@kvack.org
List-ID:

On Thu, 2017-10-26 at 12:58 +0200, Jan Kara wrote:
> On Fri 20-10-17 11:31:48, Christoph Hellwig wrote:
> > On Fri, Oct 20, 2017 at 09:47:50AM +0200, Christoph Hellwig wrote:
> > > I'd like to brainstorm how we can do something better.
> > >
> > > How about:
> > >
> > > If we hit a page with an elevated refcount in truncate / hole punch
> > > etc for a DAX file system we do not free the blocks in the file system,
> > > but add it to the extent busy list.  We mark the page as delayed
> > > free (e.g. page flag?) so that when it finally hits refcount zero we
> > > call back into the file system to remove it from the busy list.
> >
> > Brainstorming some more:
> >
> > Given that on a DAX file there shouldn't be any long-term page
> > references after we unmap it from the page table and don't allow
> > get_user_pages calls why not wait for the references for all
> > DAX pages to go away first?  E.g. if we find a DAX page in
> > truncate_inode_pages_range that has an elevated refcount we set
> > a new flag to prevent new references from showing up, and then
> > simply wait for it to go away.  Instead of a busy wait we can
> > do this through a few hashed waitqueues in dev_pagemap.  And in
> > fact put_zone_device_page already gets called when putting the
> > last page so we can handle the wakeup from there.
> >
> > In fact if we can't find a page flag for the stop new callers
> > things we could probably come up with a way to do that through
> > dev_pagemap somehow, but I'm not sure how efficient that would
> > be.
>
> We were talking about this yesterday with Dan so some more brainstorming
> from us. We can implement the solution with extent busy list in ext4
> relatively easily - we already have such list currently similarly to XFS.
> There would be some modifications needed but nothing too complex. The
> biggest downside of this solution I see is that it requires per-filesystem
> solution for busy extents - ext4 and XFS are reasonably fine, however btrfs
> may have problems and ext2 definitely will need some modifications.
> Invisible used blocks may be surprising to users at times although given
> page refs should be relatively short term, that should not be a big issue.
> But are we guaranteed page refs are short term? E.g. if someone creates
> v4l2 videobuf in MAP_SHARED mapping of a file on DAX filesystem, page refs
> can be rather long-term similarly as in RDMA case. Also freeing of blocks
> on page reference drop is another async entry point into the filesystem
> which could unpleasantly surprise us but I guess workqueues would solve
> that reasonably fine.
>
> WRT waiting for page refs to be dropped before proceeding with truncate (or
> punch hole for that matter - that case is even nastier since we don't have
> i_size to guard us). What I like about this solution is that it is very
> visible there's something unusual going on with the file being truncated /
> punched and so problems are easier to diagnose / fix from the admin side.
> So far we have guarded hole punching from concurrent faults (and
> get_user_pages() does fault once you do unmap_mapping_range()) with
> I_MMAP_LOCK (or its equivalent in ext4). We cannot easily wait for page
> refs to be dropped under I_MMAP_LOCK as that could deadlock - the most
> obvious case Dan came up with is when GUP obtains ref to page A, then hole
> punch comes grabbing I_MMAP_LOCK and waiting for page ref on A to be
> dropped, and then GUP blocks on trying to fault in another page.
>
> I think we cannot easily prevent new page references to be grabbed as you
> write above since nobody expects stuff like get_page() to fail. But I
> think that unmapping relevant pages and then preventing them to be faulted
> in again is workable and stops GUP as well. The problem with that is though
> what to do with page faults to such pages - you cannot just fail them for
> hole punch, and you cannot easily allocate new blocks either. So we are
> back at a situation where we need to detach blocks from the inode and then
> wait for page refs to be dropped - so some form of busy extents. Am I
> missing something?
>

No, that's a good summary of what we talked about. However, I did go
back and give the new lock approach a try and was able to get my test
to pass.
The new locking is not pretty especially since you need to drop and
reacquire the lock so that get_user_pages() can finish grabbing all the
pages it needs. Here are the two primary patches in the series, do you
think the extent-busy approach would be cleaner?

---

commit 5023d20a0aa795ddafd43655be1bfb2cbc7f4445
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Wed Oct 25 05:14:54 2017 -0700

    mm, dax: handle truncate of dma-busy pages

    get_user_pages() pins file backed memory pages for access by dma
    devices. However, it only pins the memory pages not the page-to-file
    offset association. If a file is truncated the pages are mapped out of
    the file and dma may continue indefinitely into a page that is owned by
    a device driver. This breaks coherency of the file vs dma, but the
    assumption is that if userspace wants the file-space truncated it does
    not matter what data is inbound from the device, it is not relevant
    anymore.

    The assumptions of the truncate-page-cache model are broken by DAX where
    the target DMA page *is* the filesystem block. Leaving the page pinned
    for DMA, but truncating the file block out of the file, means that the
    filesystem is free to reallocate a block under active DMA to another
    file!

    Here are some possible options for fixing this situation ('truncate' and
    'fallocate(punch hole)' are synonymous below):

        1/ Fail truncate while any file blocks might be under dma

        2/ Block (sleep-wait) truncate while any file blocks might be under
           dma

        3/ Remap file blocks to a "lost+found"-like file-inode where
           dma can continue and we might see what inbound data from DMA was
           mapped out of the original file. Blocks in this file could be
           freed back to the filesystem when dma eventually ends.

        4/ List the blocks under DMA in the extent busy list and either hold
           off commit of the truncate transaction until commit, or otherwise
           keep the blocks marked busy so the allocator does not reuse them
           until DMA completes.

        5/ Disable dax until option 3 or another long term solution has been
           implemented. However, filesystem-dax is still marked experimental
           for concerns like this.

    Option 1 will throw failures where userspace has never expected them
    before, option 2 might hang the truncating process indefinitely, and
    option 3 requires per filesystem enabling to remap blocks from one inode
    to another. Option 2 is implemented in this patch for the DAX path with
    the expectation that non-transient users of get_user_pages() (RDMA) are
    disallowed from setting up dax mappings and that the potential delay
    introduced to the truncate path is acceptable compared to the response
    time of the page cache case. This can only be seen as a stop-gap until
    we can solve the problem of safely sequestering unallocated filesystem
    blocks under active dma.

    The solution introduces a new inode semaphore that is held
    exclusively for get_user_pages() and held for read at truncate while
    sleep-waiting on a hashed waitqueue.

    Credit for option 3 goes to Dave Hansen, who proposed something similar
    as an alternative way to solve the problem that MAP_DIRECT was trying to
    solve. Credit for option 4 goes to Christoph Hellwig.

    Cc: Jan Kara <jack@suse.cz>
    Cc: Jeff Moyer <jmoyer@redhat.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Matthew Wilcox <mawilcox@microsoft.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
    Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Reported-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 4ac359e14777..a5a4b95ffdaf 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -167,6 +167,7 @@ struct dax_device {
 #if IS_ENABLED(CONFIG_FS_DAX)
 static void generic_dax_pagefree(struct page *page, void *data)
 {
+	wake_up_devmap_idle(&page->_refcount);
 }
 
 struct dax_device *fs_dax_claim_bdev(struct block_device *bdev, void *owner)
diff --git a/fs/dax.c b/fs/dax.c
index fd5d385988d1..f2c98f9cb833 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -346,6 +346,19 @@ static void dax_disassociate_entry(void *entry, struct inode *inode, bool trunc)
 	}
 }
 
+static struct page *dma_busy_page(void *entry)
+{
+	unsigned long pfn, end_pfn;
+
+	for_each_entry_pfn(entry, pfn, end_pfn) {
+		struct page *page = pfn_to_page(pfn);
+
+		if (page_ref_count(page) > 1)
+			return page;
+	}
+	return NULL;
+}
+
 /*
  * Find radix tree entry at given index. If it points to an exceptional entry,
  * return it with the radix tree entry locked. If the radix tree doesn't
@@ -487,6 +500,97 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
 	return entry;
 }
 
+static int wait_page(atomic_t *_refcount)
+{
+	struct page *page = container_of(_refcount, struct page, _refcount);
+	struct inode *inode = page->inode;
+
+	if (page_ref_count(page) == 1)
+		return 0;
+
+	i_daxdma_unlock_shared(inode);
+	schedule();
+	i_daxdma_lock_shared(inode);
+
+	/*
+	 * if we bounced the daxdma_lock then we need to rescan the
+	 * truncate area.
+	 */
+	return 1;
+}
+
+void dax_wait_dma(struct address_space *mapping, loff_t lstart, loff_t len)
+{
+	struct inode *inode = mapping->host;
+	pgoff_t indices[PAGEVEC_SIZE];
+	pgoff_t start, end, index;
+	struct pagevec pvec;
+	unsigned i;
+
+	lockdep_assert_held(&inode->i_dax_dmasem);
+
+	if (lstart < 0 || len < -1)
+		return;
+
+	/* in the limited case get_user_pages for dax is disabled */
+	if (IS_ENABLED(CONFIG_FS_DAX_LIMITED))
+		return;
+
+	if (!dax_mapping(mapping))
+		return;
+
+	if (mapping->nrexceptional == 0)
+		return;
+
+	if (len == -1)
+		end = -1;
+	else
+		end = (lstart + len) >> PAGE_SHIFT;
+	start = lstart >> PAGE_SHIFT;
+
+retry:
+	pagevec_init(&pvec, 0);
+	index = start;
+	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
+				min(end - index, (pgoff_t)PAGEVEC_SIZE),
+				indices)) {
+		for (i = 0; i < pagevec_count(&pvec); i++) {
+			struct page *pvec_ent = pvec.pages[i];
+			struct page *page = NULL;
+			void *entry;
+
+			index = indices[i];
+			if (index >= end)
+				break;
+
+			if (!radix_tree_exceptional_entry(pvec_ent))
+				continue;
+
+			spin_lock_irq(&mapping->tree_lock);
+			entry = get_unlocked_mapping_entry(mapping, index, NULL);
+			if (entry)
+				page = dma_busy_page(entry);
+			put_unlocked_mapping_entry(mapping, index, entry);
+			spin_unlock_irq(&mapping->tree_lock);
+
+			if (page && wait_on_devmap_idle(&page->_refcount,
+						wait_page,
+						TASK_UNINTERRUPTIBLE) != 0) {
+				/*
+				 * We dropped the dma lock, so we need
+				 * to revalidate that previously seen
+				 * idle pages are still idle.
+				 */
+				goto retry;
+			}
+		}
+		pagevec_remove_exceptionals(&pvec);
+		pagevec_release(&pvec);
+		index++;
+	}
+}
+EXPORT_SYMBOL_GPL(dax_wait_dma);
+
 static int __dax_invalidate_mapping_entry(struct address_space *mapping, pgoff_t index, bool trunc)
 {
@@ -509,8 +613,10 @@ static int __dax_invalidate_mapping_entry(struct address_space *mapping,
 out:
 	put_unlocked_mapping_entry(mapping, index, entry);
 	spin_unlock_irq(&mapping->tree_lock);
+
 	return ret;
 }
+
 /*
  * Delete exceptional DAX entry at @index from @mapping. Wait for radix tree
  * entry to get unlocked before deleting it.
diff --git a/fs/inode.c b/fs/inode.c
index d1e35b53bb23..95408e87a96c 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -192,6 +192,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 	inode->i_fsnotify_mask = 0;
 #endif
 	inode->i_flctx = NULL;
+	i_daxdma_init(inode);
 	this_cpu_inc(nr_inodes);
 
 	return 0;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index ea21ebfd1889..6ce1c50519e7 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -100,10 +100,15 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 				      pgoff_t index);
 
 #ifdef CONFIG_FS_DAX
+void dax_wait_dma(struct address_space *mapping, loff_t lstart, loff_t len);
 int __dax_zero_page_range(struct block_device *bdev,
 		struct dax_device *dax_dev, sector_t sector,
 		unsigned int offset, unsigned int length);
 #else
+static inline void dax_wait_dma(struct address_space *mapping, loff_t lstart,
+		loff_t len)
+{
+}
 static inline int __dax_zero_page_range(struct block_device *bdev,
 		struct dax_device *dax_dev, sector_t sector,
 		unsigned int offset, unsigned int length)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 13dab191a23e..cd5b4a092d1c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -645,6 +645,9 @@ struct inode {
 #ifdef CONFIG_IMA
 	atomic_t		i_readcount; /* struct files open RO */
 #endif
+#ifdef CONFIG_FS_DAX
+	struct rw_semaphore	i_dax_dmasem;
+#endif
 	const struct file_operations	*i_fop;	/* former ->i_op->default_file_ops */
 	struct file_lock_context	*i_flctx;
 	struct address_space	i_data;
@@ -747,6 +750,59 @@ static inline void inode_lock_nested(struct inode *inode, unsigned subclass)
 	down_write_nested(&inode->i_rwsem, subclass);
 }
 
+#ifdef CONFIG_FS_DAX
+static inline void i_daxdma_init(struct inode *inode)
+{
+	init_rwsem(&inode->i_dax_dmasem);
+}
+
+static inline void i_daxdma_lock(struct inode *inode)
+{
+	down_write(&inode->i_dax_dmasem);
+}
+
+static inline void i_daxdma_unlock(struct inode *inode)
+{
+	up_write(&inode->i_dax_dmasem);
+}
+
+static inline void i_daxdma_lock_shared(struct inode *inode)
+{
+	/*
+	 * The write lock is taken under mmap_sem in the
+	 * get_user_pages() path the read lock nests in the truncate
+	 * path.
+	 */
+#define DAXDMA_TRUNCATE_CLASS 1
+	down_read_nested(&inode->i_dax_dmasem, DAXDMA_TRUNCATE_CLASS);
+}
+
+static inline void i_daxdma_unlock_shared(struct inode *inode)
+{
+	up_read(&inode->i_dax_dmasem);
+}
+#else /* CONFIG_FS_DAX */
+static inline void i_daxdma_init(struct inode *inode)
+{
+}
+
+static inline void i_daxdma_lock(struct inode *inode)
+{
+}
+
+static inline void i_daxdma_unlock(struct inode *inode)
+{
+}
+
+static inline void i_daxdma_lock_shared(struct inode *inode)
+{
+}
+
+static inline void i_daxdma_unlock_shared(struct inode *inode)
+{
+}
+#endif /* CONFIG_FS_DAX */
+
 void lock_two_nondirectories(struct inode *, struct inode*);
 void unlock_two_nondirectories(struct inode *, struct inode*);
 
diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
index 12b26660d7e9..6186ecdb9df7 100644
--- a/include/linux/wait_bit.h
+++ b/include/linux/wait_bit.h
@@ -30,10 +30,12 @@ int __wait_on_bit(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *
 int __wait_on_bit_lock(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry, wait_bit_action_f *action, unsigned int mode);
 void wake_up_bit(void *word, int bit);
 void wake_up_atomic_t(atomic_t *p);
+void wake_up_devmap_idle(atomic_t *p);
 int out_of_line_wait_on_bit(void *word, int, wait_bit_action_f *action, unsigned int mode);
 int out_of_line_wait_on_bit_timeout(void *word, int, wait_bit_action_f *action, unsigned int mode, unsigned long timeout);
 int out_of_line_wait_on_bit_lock(void *word, int, wait_bit_action_f *action, unsigned int mode);
 int out_of_line_wait_on_atomic_t(atomic_t *p, int (*)(atomic_t *), unsigned int mode);
+int out_of_line_wait_on_devmap_idle(atomic_t *p, int (*)(atomic_t *), unsigned int mode);
 struct wait_queue_head *bit_waitqueue(void *word, int bit);
 extern void __init wait_bit_init(void);
 
@@ -258,4 +260,12 @@ int wait_on_atomic_t(atomic_t *val, int (*action)(atomic_t *), unsigned mode)
 	return out_of_line_wait_on_atomic_t(val, action, mode);
 }
 
+static inline
+int wait_on_devmap_idle(atomic_t *val, int (*action)(atomic_t *), unsigned mode)
+{
+	might_sleep();
+	if (atomic_read(val) == 1)
+		return 0;
+	return out_of_line_wait_on_devmap_idle(val, action, mode);
+}
 #endif /* _LINUX_WAIT_BIT_H */
diff --git a/kernel/sched/wait_bit.c b/kernel/sched/wait_bit.c
index f8159698aa4d..6ea93149614a 100644
--- a/kernel/sched/wait_bit.c
+++ b/kernel/sched/wait_bit.c
@@ -162,11 +162,17 @@ static inline wait_queue_head_t *atomic_t_waitqueue(atomic_t *p)
 	return bit_waitqueue(p, 0);
 }
 
-static int wake_atomic_t_function(struct wait_queue_entry *wq_entry, unsigned mode, int sync,
-				  void *arg)
+static inline struct wait_bit_queue_entry *to_wait_bit_q(
+		struct wait_queue_entry *wq_entry)
+{
+	return container_of(wq_entry, struct wait_bit_queue_entry, wq_entry);
+}
+
+static int wake_atomic_t_function(struct wait_queue_entry *wq_entry,
+		unsigned mode, int sync, void *arg)
 {
 	struct wait_bit_key *key = arg;
-	struct wait_bit_queue_entry *wait_bit = container_of(wq_entry, struct wait_bit_queue_entry, wq_entry);
+	struct wait_bit_queue_entry *wait_bit = to_wait_bit_q(wq_entry);
 	atomic_t *val = key->flags;
 
 	if (wait_bit->key.flags != key->flags ||
@@ -176,14 +182,29 @@ static int wake_atomic_t_function(struct wait_queue_entry *wq_entry, unsigned mo
 	return autoremove_wake_function(wq_entry, mode, sync, key);
 }
 
+static int wake_devmap_idle_function(struct wait_queue_entry *wq_entry,
+		unsigned mode, int sync, void *arg)
+{
+	struct wait_bit_key *key = arg;
+	struct wait_bit_queue_entry *wait_bit = to_wait_bit_q(wq_entry);
+	atomic_t *val = key->flags;
+
+	if (wait_bit->key.flags != key->flags ||
+	    wait_bit->key.bit_nr != key->bit_nr ||
+	    atomic_read(val) != 1)
+		return 0;
+	return autoremove_wake_function(wq_entry, mode, sync, key);
+}
+
 /*
  * To allow interruptible waiting and asynchronous (i.e. nonblocking) waiting,
  * the actions of __wait_on_atomic_t() are permitted return codes. Nonzero
  * return codes halt waiting and return.
  */
 static __sched
-int __wait_on_atomic_t(struct wait_queue_head *wq_head, struct wait_bit_queue_entry *wbq_entry,
-		       int (*action)(atomic_t *), unsigned mode)
+int __wait_on_atomic_t(struct wait_queue_head *wq_head,
+		struct wait_bit_queue_entry *wbq_entry,
+		int (*action)(atomic_t *), unsigned mode, int target)
 {
 	atomic_t *val;
 	int ret = 0;
@@ -191,10 +212,10 @@ int __wait_on_atomic_t(struct wait_queue_head *wq_head, struct wait_bit_queue_en
 	do {
 		prepare_to_wait(wq_head, &wbq_entry->wq_entry, mode);
 		val = wbq_entry->key.flags;
-		if (atomic_read(val) == 0)
+		if (atomic_read(val) == target)
 			break;
 		ret = (*action)(val);
-	} while (!ret && atomic_read(val) != 0);
+	} while (!ret && atomic_read(val) != target);
 	finish_wait(wq_head, &wbq_entry->wq_entry);
 	return ret;
 }
@@ -210,16 +231,37 @@ int __wait_on_atomic_t(struct wait_queue_head *wq_head, struct wait_bit_queue_en
 		},							\
 	}
 
+#define DEFINE_WAIT_DEVMAP_IDLE(name, p)				\
+	struct wait_bit_queue_entry name = {				\
+		.key = __WAIT_ATOMIC_T_KEY_INITIALIZER(p),		\
+		.wq_entry = {						\
+			.private	= current,			\
+			.func		= wake_devmap_idle_function,	\
+			.entry		=				\
+				LIST_HEAD_INIT((name).wq_entry.entry),	\
+		},							\
+	}
+
 __sched int out_of_line_wait_on_atomic_t(atomic_t *p, int (*action)(atomic_t *),
 					 unsigned mode)
 {
 	struct wait_queue_head *wq_head = atomic_t_waitqueue(p);
 	DEFINE_WAIT_ATOMIC_T(wq_entry, p);
 
-	return __wait_on_atomic_t(wq_head, &wq_entry, action, mode);
+	return __wait_on_atomic_t(wq_head, &wq_entry, action, mode, 0);
 }
 EXPORT_SYMBOL(out_of_line_wait_on_atomic_t);
 
+__sched int out_of_line_wait_on_devmap_idle(atomic_t *p, int (*action)(atomic_t *),
+		unsigned mode)
+{
+	struct wait_queue_head *wq_head = atomic_t_waitqueue(p);
+	DEFINE_WAIT_DEVMAP_IDLE(wq_entry, p);
+
+	return __wait_on_atomic_t(wq_head, &wq_entry, action, mode, 1);
+}
+EXPORT_SYMBOL(out_of_line_wait_on_devmap_idle);
+
 /**
  * wake_up_atomic_t - Wake up a waiter on a atomic_t
  * @p: The atomic_t being waited on, a kernel virtual address
@@ -235,6 +277,12 @@ void wake_up_atomic_t(atomic_t *p)
 }
 EXPORT_SYMBOL(wake_up_atomic_t);
 
+void wake_up_devmap_idle(atomic_t *p)
+{
+	__wake_up_bit(atomic_t_waitqueue(p), p, WAIT_ATOMIC_T_BIT_NR);
+}
+EXPORT_SYMBOL(wake_up_devmap_idle);
+
 __sched int bit_wait(struct wait_bit_key *word, int mode)
 {
 	schedule();
diff --git a/mm/gup.c b/mm/gup.c
index 308be897d22a..fd7b2a2e2d19 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -579,6 +579,41 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	return 0;
 }
 
+static struct inode *do_dax_lock(struct vm_area_struct *vma,
+		unsigned int foll_flags)
+{
+	struct file *file;
+	struct inode *inode;
+
+	if (!(foll_flags & FOLL_GET))
+		return NULL;
+	if (!vma_is_dax(vma))
+		return NULL;
+	file = vma->vm_file;
+	inode = file_inode(file);
+	if (inode->i_mode == S_IFCHR)
+		return NULL;
+	return inode;
+}
+
+static struct inode *dax_truncate_lock(struct vm_area_struct *vma,
+		unsigned int foll_flags)
+{
+	struct inode *inode = do_dax_lock(vma, foll_flags);
+
+	if (!inode)
+		return NULL;
+	i_daxdma_lock(inode);
+	return inode;
+}
+
+static void dax_truncate_unlock(struct inode *inode)
+{
+	if (!inode)
+		return;
+	i_daxdma_unlock(inode);
+}
+
 /**
  * __get_user_pages() - pin user pages in memory
  * @tsk:	task_struct of target task
@@ -659,6 +694,7 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 
 	do {
 		struct page *page;
+		struct inode *inode;
 		unsigned int foll_flags = gup_flags;
 		unsigned int page_increm;
 
@@ -693,7 +729,9 @@ static long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 		if (unlikely(fatal_signal_pending(current)))
 			return i ? i : -ERESTARTSYS;
 		cond_resched();
+		inode = dax_truncate_lock(vma, foll_flags);
 		page = follow_page_mask(vma, start, foll_flags, &page_mask);
+		dax_truncate_unlock(inode);
 		if (!page) {
 			int ret;
 			ret = faultin_page(tsk, vma, start, &foll_flags,

commit 67d952314e9989b3b1945c50488f4a0f760264c3
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Tue Oct 24 13:41:22 2017 -0700

    xfs: wire up dax dma waiting

    The dax-dma vs truncate collision avoidance involves acquiring the new
    i_dax_dmasem and validating that no ranges that are to be mapped out
    of the file are active for dma. If any are found we wait for page idle
    and retry the scan. The locations where we implement this wait line up
    with where we currently wait for pnfs layout leases to expire.

    Since we need both dma to be idle and leases to be broken, and since
    xfs_break_layouts drops locks, we need to retry the dma busy scan until
    we can complete one that finds no busy pages.

    Cc: Jan Kara <jack@suse.cz>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
    Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index c6780743f8ec..e3ec46c28c60 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -347,7 +347,7 @@ xfs_file_aio_write_checks(
 		return error;
 
 	error = xfs_break_layouts(inode, iolock);
-	if (error)
+	if (error < 0)
 		return error;
 
 	/*
@@ -762,7 +762,7 @@ xfs_file_fallocate(
 	struct xfs_inode	*ip = XFS_I(inode);
 	long			error;
 	enum xfs_prealloc_flags	flags = 0;
-	uint			iolock = XFS_IOLOCK_EXCL;
+	uint			iolock = XFS_DAXDMA_LOCK_SHARED;
 	loff_t			new_size = 0;
 	bool			do_file_insert = 0;
 
@@ -771,10 +771,20 @@ xfs_file_fallocate(
 	if (mode & ~XFS_FALLOC_FL_SUPPORTED)
 		return -EOPNOTSUPP;
 
+retry:
 	xfs_ilock(ip, iolock);
+	dax_wait_dma(inode->i_mapping, offset, len);
+
+	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+	iolock |= XFS_IOLOCK_EXCL;
 	error = xfs_break_layouts(inode, &iolock);
-	if (error)
+	if (error < 0)
 		goto out_unlock;
+	else if (error > 0 && IS_ENABLED(CONFIG_FS_DAX)) {
+		xfs_iunlock(ip, iolock);
+		iolock = XFS_DAXDMA_LOCK_SHARED;
+		goto retry;
+	}
 
 	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
 	iolock |= XFS_MMAPLOCK_EXCL;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 4ec5b7f45401..783f15894b7b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -171,7 +171,14 @@ xfs_ilock_attr_map_shared(
  * taken in places where we need to invalidate the page cache in a race
  * free manner (e.g. truncate, hole punch and other extent manipulation
  * functions).
- */
+ *
+ * The XFS_DAXDMA_LOCK_SHARED lock is a CONFIG_FS_DAX special case lock
+ * for synchronizing truncate vs ongoing DMA. The get_user_pages() path
+ * will hold this lock exclusively when incrementing page reference
+ * counts for DMA. Before an extent can be truncated we need to complete
+ * a validate-idle sweep of all pages in the range while holding this
+ * lock in shared mode.
+*/
 void
 xfs_ilock(
 	xfs_inode_t		*ip,
@@ -192,6 +199,9 @@ xfs_ilock(
 	       (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
 	ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_SUBCLASS_MASK)) == 0);
 
+	if (lock_flags & XFS_DAXDMA_LOCK_SHARED)
+		i_daxdma_lock_shared(VFS_I(ip));
+
 	if (lock_flags & XFS_IOLOCK_EXCL) {
 		down_write_nested(&VFS_I(ip)->i_rwsem,
 				  XFS_IOLOCK_DEP(lock_flags));
@@ -328,6 +338,9 @@ xfs_iunlock(
 	else if (lock_flags & XFS_ILOCK_SHARED)
 		mrunlock_shared(&ip->i_lock);
 
+	if (lock_flags & XFS_DAXDMA_LOCK_SHARED)
+		i_daxdma_unlock_shared(VFS_I(ip));
+
 	trace_xfs_iunlock(ip, lock_flags, _RET_IP_);
 }
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 0ee453de239a..0662edf00529 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -283,10 +283,12 @@ static inline void xfs_ifunlock(struct xfs_inode *ip)
 #define	XFS_ILOCK_SHARED	(1<<3)
 #define	XFS_MMAPLOCK_EXCL	(1<<4)
 #define	XFS_MMAPLOCK_SHARED	(1<<5)
+#define	XFS_DAXDMA_LOCK_SHARED	(1<<6)
 
 #define XFS_LOCK_MASK		(XFS_IOLOCK_EXCL | XFS_IOLOCK_SHARED \
 				| XFS_ILOCK_EXCL | XFS_ILOCK_SHARED \
-				| XFS_MMAPLOCK_EXCL | XFS_MMAPLOCK_SHARED)
+				| XFS_MMAPLOCK_EXCL | XFS_MMAPLOCK_SHARED \
+				| XFS_DAXDMA_LOCK_SHARED)
 
 #define XFS_LOCK_FLAGS \
 	{ XFS_IOLOCK_EXCL,	"IOLOCK_EXCL" }, \
@@ -294,7 +296,8 @@ static inline void xfs_ifunlock(struct xfs_inode *ip)
 	{ XFS_ILOCK_EXCL,	"ILOCK_EXCL" }, \
 	{ XFS_ILOCK_SHARED,	"ILOCK_SHARED" }, \
 	{ XFS_MMAPLOCK_EXCL,	"MMAPLOCK_EXCL" }, \
-	{ XFS_MMAPLOCK_SHARED,	"MMAPLOCK_SHARED" }
+	{ XFS_MMAPLOCK_SHARED,	"MMAPLOCK_SHARED" }, \
+	{ XFS_DAXDMA_LOCK_SHARED, "XFS_DAXDMA_LOCK_SHARED" }
 
 
 /*
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index aa75389be8cf..fd384ea00ede 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -612,7 +612,7 @@ xfs_ioc_space(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct iattr		iattr;
 	enum xfs_prealloc_flags	flags = 0;
-	uint			iolock = XFS_IOLOCK_EXCL;
+	uint			iolock = XFS_DAXDMA_LOCK_SHARED;
 	int			error;
 
 	/*
@@ -637,18 +637,6 @@ xfs_ioc_space(
 	if (filp->f_mode & FMODE_NOCMTIME)
 		flags |= XFS_PREALLOC_INVISIBLE;
 
-	error = mnt_want_write_file(filp);
-	if (error)
-		return error;
-
-	xfs_ilock(ip, iolock);
-	error = xfs_break_layouts(inode, &iolock);
-	if (error)
-		goto out_unlock;
-
-	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
-	iolock |= XFS_MMAPLOCK_EXCL;
-
 	switch (bf->l_whence) {
 	case 0: /*SEEK_SET*/
 		break;
@@ -659,10 +647,31 @@ xfs_ioc_space(
 		bf->l_start += XFS_ISIZE(ip);
 		break;
 	default:
-		error = -EINVAL;
+		return -EINVAL;
+	}
+
+	error = mnt_want_write_file(filp);
+	if (error)
+		return error;
+
+retry:
+	xfs_ilock(ip, iolock);
+	dax_wait_dma(inode->i_mapping, bf->l_start, bf->l_len);
+
+	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+	iolock |= XFS_IOLOCK_EXCL;
+	error = xfs_break_layouts(inode, &iolock);
+	if (error < 0)
 		goto out_unlock;
+	else if (error > 0 && IS_ENABLED(CONFIG_FS_DAX)) {
+		xfs_iunlock(ip, iolock);
+		iolock = XFS_DAXDMA_LOCK_SHARED;
+		goto retry;
 	}
 
+	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
+	iolock |= XFS_MMAPLOCK_EXCL;
+
 	/*
 	 * length of <= 0 for resv/unresv/zero is invalid. length for
 	 * alloc/free is ignored completely and we have no idea what userspace
diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index 4246876df7b7..5f4d46b3cd7f 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -35,18 +35,19 @@ xfs_break_layouts(
 	uint			*iolock)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
-	int			error;
+	int			error, did_unlock = 0;
 
 	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL));
 
 	while ((error = break_layout(inode, false) == -EWOULDBLOCK)) {
 		xfs_iunlock(ip, *iolock);
+		did_unlock = 1;
 		error = break_layout(inode, true);
 		*iolock = XFS_IOLOCK_EXCL;
 		xfs_ilock(ip, *iolock);
 	}
 
-	return error;
+	return error < 0 ? error : did_unlock;
 }
 
 /*