From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.1 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C86AC4332D for ; Fri, 20 Mar 2020 00:03:50 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id EAA0720772 for ; Fri, 20 Mar 2020 00:03:49 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=ziepe.ca header.i=@ziepe.ca header.b="niE0QFwA" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EAA0720772 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=ziepe.ca Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 4E0146B0003; Thu, 19 Mar 2020 20:03:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4912F6B0006; Thu, 19 Mar 2020 20:03:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 37F536B0007; Thu, 19 Mar 2020 20:03:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0120.hostedemail.com [216.40.44.120]) by kanga.kvack.org (Postfix) with ESMTP id 20D446B0003 for ; Thu, 19 Mar 2020 20:03:49 -0400 (EDT) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id CD97A824556B for ; Fri, 20 Mar 2020 00:03:48 +0000 (UTC) X-FDA: 76613792136.08.waves72_5b7681db0bd11 X-HE-Tag: waves72_5b7681db0bd11 X-Filterd-Recvd-Size: 10411 Received: from mail-qk1-f194.google.com (mail-qk1-f194.google.com [209.85.222.194]) by imf37.hostedemail.com (Postfix) with ESMTP for ; Fri, 20 Mar 2020 00:03:48 +0000 (UTC) Received: by mail-qk1-f194.google.com with SMTP id f28so5139494qkk.13 for ; Thu, 19 Mar 2020 17:03:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ziepe.ca; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=1pJxpq8bFhAdtFvjymRkDlT+zzgC3t93bLWt4+LdA0o=; b=niE0QFwAScsJNksiL1RD0EoZyWLTkmIFeNZTwtNcFLkMjDayi0bxMD56+ydBpxjFer ccfFIEFWOAqX7S6DF8dy2B90sjOJd11kjQNv+zzxmWUfQ+NuUvGWeapza5y7m3aL1wJG j3hvcJFjjm2RHXKkVcIUMT/Unai34kkTifnPtPSjJsdNLmUD0nhajqJhX+Agn9Kz3t8d MseyS+KCBRWIY/Clnpsqfu2H+lIZb7sticYp/HJnFGSWY7gUq7c/Zqf45npz2Z8DGQ78 mXMSeVQdS57T+ekPAXfTCqZyqZmCir8LMmtQu7NDGFXY1YDgmRsH7Uq3cAhojnHv5w1y bUgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=1pJxpq8bFhAdtFvjymRkDlT+zzgC3t93bLWt4+LdA0o=; b=ef+OySw2/D8gkAGJmbraHYjQLV3lu78xcrMqSAIp7iKFlQ8G2gQYq9h+58IR9YTsNF C16uL4BjrXcezvVUrTRG2+Cdhlo+KBB7ZZj1lzhdZWzg9qXuDW+AwLnTM5cuBo3c/qCa jQMV9nHEmdtTrJL/KWbNAvCwyTl9RjuxegJv0xeyrP41WmDUAmIioCWqIYJ41HR2dV19 hXWw8XJnQ99t5znSW/Fh3ev9JFsv+5awBY6hXzzfDkG5hGfS3HzYVffbYFhib/LsfgYv B05OeX/79mOrynPP8oyazPPEGULFrUHYHPoNWNs2xTRTtR5KM4gWIbXdXPjKd9MDNbKe wjug== X-Gm-Message-State: ANhLgQ24g1QfgZ6oTuBLrnV5HOzH8t3xE7lZmrlvuVq0Z1HkfG/ycolY b4sy8suh3GyN/j00HZrVnOLwrA== X-Google-Smtp-Source: ADFU+vvmojx05al3/dBJDAgC1QulIXZbUAIDZwSQ+speDrKVTJZR9ajSBjpmE5SDOXsMKVejlONOTg== X-Received: by 2002:a37:b44:: with SMTP id 65mr5064085qkl.201.1584662627565; Thu, 19 Mar 2020 17:03:47 -0700 (PDT) Received: from ziepe.ca (hlfxns017vw-142-68-57-212.dhcp-dynamic.fibreop.ns.bellaliant.net. [142.68.57.212]) by smtp.gmail.com with ESMTPSA id h11sm139725qtr.38.2020.03.19.17.03.46 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 19 Mar 2020 17:03:47 -0700 (PDT) Received: from jgg by mlx.ziepe.ca with local (Exim 4.90_1) (envelope-from ) id 1jF58b-00029b-D5; Thu, 19 Mar 2020 21:03:45 -0300 Date: Thu, 19 Mar 2020 21:03:45 -0300 From: Jason Gunthorpe To: Ralph Campbell Cc: Christoph Hellwig , Dan Williams , Bharata B Rao , Christian =?utf-8?B?S8O2bmln?= , Ben Skeggs , Jerome Glisse , kvm-ppc@vger.kernel.org, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org Subject: Re: [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault Message-ID: <20200320000345.GO20941@ziepe.ca> References: <20200316193216.920734-4-hch@lst.de> <7256f88d-809e-4aba-3c46-a223bd8cc521@nvidia.com> <20200317121536.GQ20941@ziepe.ca> <20200317122445.GA11662@lst.de> <20200317122813.GA11866@lst.de> <20200317124755.GR20941@ziepe.ca> <20200317125955.GA12847@lst.de> <24fca825-3b0f-188f-bcf2-fadcf3a9f05a@nvidia.com> <20200319181716.GK20941@ziepe.ca> <89e33770-a0ab-e1ec-d5e5-535edefd3fd3@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <89e33770-a0ab-e1ec-d5e5-535edefd3fd3@nvidia.com> User-Agent: Mutt/1.9.4 (2018-02-28) Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Mar 19, 2020 at 03:56:50PM -0700, Ralph Campbell wrote: > Adding linux-kselftest@vger.kernel.org for the test config question. >=20 > On 3/19/20 11:17 AM, Jason Gunthorpe wrote: > > On Tue, Mar 17, 2020 at 04:14:31PM -0700, Ralph Campbell wrote: > > >=20 > > > On 3/17/20 5:59 AM, Christoph Hellwig wrote: > > > > On Tue, Mar 17, 2020 at 09:47:55AM -0300, Jason Gunthorpe wrote: > > > > > I've been using v7 of Ralph's tester and it is working well - i= t has > > > > > DEVICE_PRIVATE support so I think it can test this flow too. Ra= lph are > > > > > you able? > > > > >=20 > > > > > This hunk seems trivial enough to me, can we include it now? > > > >=20 > > > > I can send a separate patch for it once the tester covers it. I = don't > > > > want to add it to the original patch as it is a significant behav= ior > > > > change compared to the existing code. > > > >=20 > > >=20 > > > Attached is an updated version of my HMM tests based on linux-5.6.0= -rc6. > > > I ran this OK with Jason's 8+1 HMM patches, Christoph's 1-5 misc HM= M clean ups, > > > and Christoph's 1-4 device private page changes applied. > >=20 > > I'd like to get this to mergable, it looks pretty good now, but I hav= e > > no idea about selftests - and I'm struggling to even compile the tool= s > > dir > >=20 > > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug > > > index 69def4a9df00..4d22ce7879a7 100644 > > > +++ b/lib/Kconfig.debug > > > @@ -2162,6 +2162,18 @@ config TEST_MEMINIT > > > If unsure, say N. > > > +config TEST_HMM > > > + tristate "Test HMM (Heterogeneous Memory Management)" > > > + depends on DEVICE_PRIVATE > > > + select HMM_MIRROR > > > + select MMU_NOTIFIER > >=20 > > extra spaces >=20 > Will fix in v8. >=20 > > In general I wonder if it even makes sense that DEVICE_PRIVATE is use= r > > selectable? >=20 > Should tests enable the feature or the feature enable the test? > IMHO, if the feature is being compiled into the kernel, that should > enable the menu item for the test. If the feature isn't selected, > no need to test it :-) I ment if DEVICE_PRIVATE should be a user selectable option at all, or should it be turned on when a driver like nouveau is selected. Is there some downside to enabling DEVICE_PRIVATE? > > The notifier holds a mmgrab, no need for another one >=20 > OK. I'll replace dmirror->mm with dmirror->notifier.mm. Right that is good too > > > + filp->private_data =3D dmirror; > >=20 > > Not sure what this comment means >=20 > I'll change the comment to: > /* > * The first open of the device character file registers the ad= dress > * space of the process doing the open() system call with the d= evice. > * Subsequent file opens by other processes will have access to= the > * first process' address space. > */ How does this happen? The function looks like it always does the same thi= ng > > > +static bool dmirror_interval_invalidate(struct mmu_interval_notifi= er *mni, > > > + const struct mmu_notifier_range *range, > > > + unsigned long cur_seq) > > > +{ > > > + struct dmirror *dmirror =3D container_of(mni, struct dmirror, not= ifier); > > > + struct mm_struct *mm =3D dmirror->mm; > > > + > > > + /* > > > + * If the process doesn't exist, we don't need to invalidate the > > > + * device page table since the address space will be torn down. > > > + */ > > > + if (!mmget_not_zero(mm)) > > > + return true; > >=20 > > Why? Don't the notifiers provide for this already. > >=20 > > mmget_not_zero() is required before calling hmm_range_fault() though Oh... This is the invalidate_all path during invalidation IMHO you should test the invalidation reason in the range to exclude this. But xa_erase looks totally safe so there should be no reason to do that. > This is a workaround for a problem I don't quite understand. > If you change tools/testing/selftests/vm/hmm-tests.c line 868 to > ASSERT_EQ(ret, -1); > Then the test will abort, core dump, and cause two problems, > 1) the migrated page will be faulted back to system memory in order to = write > it to the core dump. This triggers lockdep_assert_held(&walk.mm->mma= p_sem) > in walk_page_range(). Has the migration stuff become entangled with the xarray? > [ 137.980718] Code: 80 2f 1a 83 c6 05 e9 8d 7b 01 01 e8 3e b1 b1 fe e9= 05 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 56 41 55 41 54 = 55 <48> 89 fd 53 4c 8d 6d 10 e8 3c fc ff ff 49 89 c4 4c 89 e0 83 e0 03 > [ 137.999461] RSP: 0018:ffffc900015e77c8 EFLAGS: 00000246 ORIG_RAX: ff= ffffffffffff13 > [ 138.007028] RAX: ffff8886e508c408 RBX: 0000000000000000 RCX: fffffff= f82626c89 > [ 138.014159] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffffc90= 0015e78a0 > [ 138.021293] RBP: ffffc900015e78a0 R08: ffffffff811461c4 R09: fffff52= 0002bcf17 > [ 138.028426] R10: fffff520002bcf16 R11: 0000000000000003 R12: 0000000= 002606d10 > [ 138.035557] R13: ffff8886e508c448 R14: 0000000000000031 R15: fffffff= fa06546a0 > [ 138.042701] ? do_raw_spin_lock+0x104/0x1d0 > [ 138.046888] ? xas_store+0x19/0xa60 > [ 138.050390] xas_store+0x5b3/0xa60 > [ 138.053806] ? register_lock_class+0x860/0x860 > [ 138.058267] __xa_erase+0x96/0x110 > [ 138.061673] ? xas_store+0xa60/0xa60 > [ 138.065267] xa_erase+0x19/0x30 oh, it is doing this: static void mn_itree_release(struct mmu_notifier_subscriptions *subscript= ions, struct mm_struct *mm) { struct mmu_notifier_range range =3D { .flags =3D MMU_NOTIFIER_RANGE_BLOCKABLE, .event =3D MMU_NOTIFY_RELEASE, .mm =3D mm, .start =3D 0, .end =3D ULONG_MAX, }; ie it is sitting doing a huge number of xa_erases, I suppose. Probably in normal exit the notifier is removed before the mm is destroyed. The xa_erase needs to be a bit smarter to jump over gaps in the tree perhaps some xa_for_each() xa_erase() pattern? > > Also I get this: > >=20 > > lib/test_hmm.c: In function =E2=80=98dmirror_devmem_fault_alloc_and_c= opy=E2=80=99: > > lib/test_hmm.c:1041:25: warning: unused variable =E2=80=98vma=E2=80=99= [-Wunused-variable] > > 1041 | struct vm_area_struct *vma =3D args->vma; > >=20 > > But this is a kernel bug, due to alloc_page_vma being a #define not a > > static inline and me having CONFIG_NUMA off in this .config >=20 > Fixed. in gfp.h? Jason