From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-1.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,SUBJ_ALL_CAPS,URIBL_BLOCKED,
	USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 40938C3A589
	for ; Thu, 15 Aug 2019 20:16:38 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 175082086C
	for ; Thu, 15 Aug 2019 20:16:38 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1730723AbfHOUQh (ORCPT ); Thu, 15 Aug 2019 16:16:37 -0400
Received: from mx1.redhat.com ([209.132.183.28]:58069 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1726370AbfHOUQh (ORCPT ); Thu, 15 Aug 2019 16:16:37 -0400
Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 5257BC057EC6;
	Thu, 15 Aug 2019 20:16:36 +0000 (UTC)
Received: from redhat.com (unknown [10.20.6.178])
	by smtp.corp.redhat.com
	(Postfix) with ESMTPS id 2ED628CF81; Thu, 15 Aug 2019 20:16:32 +0000 (UTC)
Date: Thu, 15 Aug 2019 16:16:30 -0400
From: Jerome Glisse
To: Adalbert Lazăr
Cc: Matthew Wilcox, kvm@vger.kernel.org, linux-mm@kvack.org,
	virtualization@lists.linux-foundation.org, Paolo Bonzini,
	Radim Krčmář, Konrad Rzeszutek Wilk, Tamas K Lengyel,
	Mathieu Tarral, Samuel Laurén, Patrick Colp, Jan Kiszka,
	Stefan Hajnoczi, Weijiang Yang, Yu C, Mihai Donțu,
	Mircea Cîrjaliu
Subject: Re: DANGER WILL ROBINSON, DANGER
Message-ID: <20190815201630.GA25517@redhat.com>
References: <20190809160047.8319-1-alazar@bitdefender.com>
	<20190809160047.8319-72-alazar@bitdefender.com>
	<20190809162444.GP5482@bombadil.infradead.org>
	<1565694095.D172a51.28640.@15f23d3a749365d981e968181cce585d2dcb3ffa>
	<20190815191929.GA9253@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20190815191929.GA9253@redhat.com>
User-Agent: Mutt/1.11.3 (2019-02-01)
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.32]); Thu, 15 Aug 2019 20:16:36 +0000 (UTC)
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

On Thu, Aug 15, 2019 at 03:19:29PM -0400, Jerome Glisse wrote:
> On Tue, Aug 13, 2019 at 02:01:35PM +0300, Adalbert Lazăr wrote:
> > On Fri, 9 Aug 2019 09:24:44 -0700, Matthew Wilcox wrote:
> > > On Fri, Aug 09, 2019 at 07:00:26PM +0300, Adalbert Lazăr wrote:
> > > > +++ b/include/linux/page-flags.h
> > > > @@ -417,8 +417,10 @@ PAGEFLAG(Idle, idle, PF_ANY)
> > > >   */
> > > >  #define PAGE_MAPPING_ANON	0x1
> > > >  #define PAGE_MAPPING_MOVABLE	0x2
> > > > +#define PAGE_MAPPING_REMOTE	0x4
> > >
> > > Uh.  How do you know page->mapping would otherwise have bit 2 clear?
> > > Who's guaranteeing that?
> > >
> > > This is an awfully big patch to the memory management code, buried in
> > > the middle of a gigantic series which almost guarantees nobody would
> > > look at it.  I call shenanigans.
> > >
> > > > @@ -1021,7 +1022,7 @@ void page_move_anon_rmap(struct page *page, struct vm_area_struct *vma)
> > > >   * __page_set_anon_rmap - set up new anonymous rmap
> > > >   * @page:	Page or Hugepage to add to rmap
> > > >   * @vma:	VM area to add page to.
> > > > - * @address:	User virtual address of the mapping
> > > > + * @address:	User virtual address of the mapping
> > >
> > > And mixing in fluff changes like this is a real no-no.  Try again.
> > >
> >
> > No bad intentions, just overzealous.
> > I didn't want to hide anything from our patches.
> > Once we advance with the introspection patches related to KVM we'll be
> > back with the remote mapping patch, split and cleaned.
>
> They are not bit left in struct page ! Looking at the patch it seems
> you want to have your own pin count just for KVM. This is bad, we are
> already trying to solve the GUP thing (see all various patchset about
> GUP posted recently).
>
> You need to rethink how you want to achieve this. Why not simply a
> remote read()/write() into the process memory ie KVMI would call
> an ioctl that allow to read or write into a remote process memory
> like ptrace() but on steroid ...
>
> Adding this whole big complex infrastructure without justification
> of why we need to avoid round trip is just too much really.

Thinking a bit more about this, you can achieve the same thing without
adding a single line to any mm code. Instead of an mmap with
PROT_NONE | MAP_LOCKED, have userspace mmap some kvm device file
(I am assuming this is something you already have, and that you
control its mmap callback).
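
A rough userspace sketch of that first step, to show how little the
inspector side has to change. The function name and the device node are
hypothetical (I do not know what the KVMI series exposes), and the
protection and flags are my assumption of what an inspector would want:

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `len` bytes of the file at `path`, read-only and shared.
 *
 * `path` stands in for a hypothetical kvm introspection device node.
 * Because the mapping is file backed, the driver's mmap callback sets
 * up the vma, unlike an anonymous PROT_NONE | MAP_LOCKED mapping.
 *
 * Returns the mapping, or MAP_FAILED on error.
 */
void *map_inspector_window(const char *path, size_t len)
{
	int fd;
	void *win;

	assert(path != NULL);
	assert(len > 0);

	fd = open(path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return MAP_FAILED;

	win = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

	/* The mapping holds its own reference to the file. */
	close(fd);
	return win;
}
```

Any regular file works for trying the call; with a real device node the
first read through the returned pointer would fault into the driver.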

So now, kernel side, you have a vma with a vm_operations_struct under
your control. This means that everything you want to block mm-wise
from within the inspector process can be blocked through those
callbacks (find_special_page() specifically, for which you have to
return NULL all the time).

To mirror the target process memory you can use hmm_mirror; when you
populate the inspector process page table, use insert_pfn()
(the mmap of the kvm device file must mark this vma as PFNMAP).

By following the hmm_mirror API, anytime the target process has
a change in its page table (i.e. virtual address -> page) you will
get a callback, and all you have to do is clear the page table
within the inspector process and flush the TLB (use zap_page_range).

On a page fault within the inspector process, the fault callback of
vm_ops will get called, and from there you call hmm_mirror following
its API.

Oh, also mark the vma with VM_WIPEONFORK to avoid any issue if the
inspector process uses fork() (you could support fork, but then you
would need to mark the vma as SHARED and use unmap_mapping_pages
instead of zap_page_range).


There, everything you want to do, with already-upstream mm code.

Cheers,
Jérôme