From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id CD7F9C433F5 for ; Wed, 2 Mar 2022 07:03:54 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4119F10ED99; Wed, 2 Mar 2022 07:03:54 +0000 (UTC) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3D54A10ED9B for ; Wed, 2 Mar 2022 07:03:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1646204633; x=1677740633; h=message-id:subject:from:to:cc:date:in-reply-to: references:mime-version:content-transfer-encoding; bh=AJsWiuHt+5a2alWYfcuOVqSTwjn017fRmVmPeONbo24=; b=INOUdGNukItvogGCvtU3h0hp0TNNPr/PzGnhs+nTztxHoNgw9l/VBCzW 9jLYoGfGjUqsUHYsItIVVYLCCthwsf5jyy0wy9Q9YkQ3xUp91N+2whuwj Ra+har5rCYOW6yLAN2TGjGSxk+S2lO7nOObpCX0tPOjC0KXza2CVGJpKe 0sXXk9ZZ2RhYG9nkyg8Fqnls7Z6YD8/T+y3MjZd6wSykpvpgraWTkNamR lKrnZgKq3smiV1qHTMO3LbkOxsBx0WyJcqMmOXGn1gkQ/bNzD9jdSSdJe 5/99DylyX1mTZ4aFVZsrndz6ZS461/Xu8quldjBl184yyH5xweGXov6bB g==; X-IronPort-AV: E=McAfee;i="6200,9189,10273"; a="250902900" X-IronPort-AV: E=Sophos;i="5.90,148,1643702400"; d="scan'208";a="250902900" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Mar 2022 23:03:52 -0800 X-IronPort-AV: E=Sophos;i="5.90,148,1643702400"; d="scan'208";a="593909401" Received: from ccrisan-mobl3.ger.corp.intel.com (HELO [10.249.254.224]) ([10.249.254.224]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Mar 2022 23:03:51 -0800 Message-ID: From: Thomas =?ISO-8859-1?Q?Hellstr=F6m?= To: Niranjana Vishwanathapura Date: Wed, 02 Mar 2022 08:03:20 +0100 In-Reply-To: <20220302031326.GD25117@nvishwa1-DESK> References: <20220222171030.690214-1-thomas.hellstrom@linux.intel.com> <20220302031326.GD25117@nvishwa1-DESK> Content-Type: text/plain; charset="UTF-8" User-Agent: Evolution 3.40.4 (3.40.4-2.fc34) MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: Re: [Intel-gfx] [PATCH 1/2] HAX: drm/i915: Clarify vma lifetime X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: intel-gfx@lists.freedesktop.org, matthew.auld@intel.com Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" On Tue, 2022-03-01 at 19:13 -0800, Niranjana Vishwanathapura wrote: > On Tue, Feb 22, 2022 at 06:10:29PM +0100, Thomas Hellström wrote: > > It's unclear what reference the initial vma kref reference refers > > to. > > A vma can have multiple weak references, the object vma list, > > the vm's bound list and the GT's closed_list, and the initial vma > > reference can be put from lookups of all these lists. > > > > With the current implementation this means > > that any holder of yet another vma refcount (currently only > > i915_gem_object_unbind()) needs to be holding two of either > > *) An object refcount, > > *) A vm open count > > *) A vma open count > > > > in order for us to not risk leaking a reference by having the > > initial vma reference being put twice. > > > > Address this by re-introducing i915_vma_destroy() which removes all > > weak references of the vma and *then* puts the initial vma > > refcount. > > This makes a strong vma reference hold on to the vma > > unconditionally. > > > > Perhaps a better name would be i915_vma_revoke() or > > i915_vma_zombify(), > > since other callers may still hold a refcount, but with the > > prospect of > > being able to replace the vma refcount with the object lock in the > > near > > future, let's stick with i915_vma_destroy(). > > > > Finally this commit fixes a race in that previously > > i915_vma_release() and > > now i915_vma_destroy() could destroy a vma without taking the vm- > > >mutex > > after an advisory check that the vma mm_node was not allocated. > > This would race with the ungrab_vma() function creating a trace > > similar > > to the below one. This was fixed in one of the __i915_vma_put() > > callsites > > in > > commit bc1922e5d349 ("drm/i915: Fix a race between vma / object > > destruction and unbinding") > > but although not seemingly triggered by CI, that > > is not sufficient. This patch is needed to fix that properly. > > > > [823.012188] Console: switching to colour dummy device 80x25 > > [823.012422] [IGT] gem_ppgtt: executing > > [823.016667] [IGT] gem_ppgtt: starting subtest blt-vs-render-ctx0 > > [852.436465] stack segment: 0000 [#1] PREEMPT SMP NOPTI > > [852.436480] CPU: 0 PID: 3200 Comm: gem_ppgtt Not tainted 5.16.0- > > CI-CI_DRM_11115+ #1 > > [852.436489] Hardware name: Intel Corporation Alder Lake Client > > Platform/AlderLake-P DDR5 RVP, BIOS > > ADLPFWI1.R00.2422.A00.2110131104 10/13/2021 > > [852.436499] RIP: 0010:ungrab_vma+0x9/0x80 [i915] > > [852.436711] Code: ef e8 4b 85 cf e0 e8 36 a3 d6 e0 8b 83 f8 9c 00 > > 00 85 c0 75 e1 5b 5d 41 5c 41 5d c3 e9 d6 fd 14 00 55 53 48 8b af > > c0 00 00 00 <8b> 45 00 85 c0 75 03 5b 5d c3 48 8b 85 a0 02 00 00 48 > > 89 fb 48 8b > > [852.436727] RSP: 0018:ffffc90006db7880 EFLAGS: 00010246 > > [852.436734] RAX: 0000000000000000 RBX: ffffc90006db7598 RCX: > > 0000000000000000 > > [852.436742] RDX: ffff88815349e898 RSI: ffff88815349e858 RDI: > > ffff88810a284140 > > [852.436748] RBP: 6b6b6b6b6b6b6b6b R08: ffff88815349e898 R09: > > ffff88815349e8e8 > > [852.436754] R10: 0000000000000001 R11: 0000000051ef1141 R12: > > ffff88810a284140 > > [852.436762] R13: 0000000000000000 R14: ffff88815349e868 R15: > > ffff88810a284458 > > [852.436770] FS:  00007f5c04b04e40(0000) GS:ffff88849f000000(0000) > > knlGS:0000000000000000 > > [852.436781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [852.436788] CR2: 00007f5c04b38fe0 CR3: 000000010a6e8001 CR4: > > 0000000000770ef0 > > [852.436797] PKRU: 55555554 > > [852.436801] Call Trace: > > [852.436806]  > > [852.436811]  i915_gem_evict_for_node+0x33c/0x3c0 [i915] > > [852.437014]  i915_gem_gtt_reserve+0x106/0x130 [i915] > > [852.437211]  i915_vma_pin_ww+0x8f4/0xb60 [i915] > > [852.437412]  eb_validate_vmas+0x688/0x860 [i915] > > [852.437596]  i915_gem_do_execbuffer+0xc0e/0x25b0 [i915] > > [852.437770]  ? deactivate_slab+0x5f2/0x7d0 > > [852.437778]  ? _raw_spin_unlock_irqrestore+0x50/0x60 > > [852.437789]  ? i915_gem_execbuffer2_ioctl+0xc6/0x2c0 [i915] > > [852.437944]  ? init_object+0x49/0x80 > > [852.437950]  ? __lock_acquire+0x5e6/0x2580 > > [852.437963]  i915_gem_execbuffer2_ioctl+0x116/0x2c0 [i915] > > [852.438129]  ? i915_gem_do_execbuffer+0x25b0/0x25b0 [i915] > > [852.438300]  drm_ioctl_kernel+0xac/0x140 > > [852.438310]  drm_ioctl+0x201/0x3d0 > > [852.438316]  ? i915_gem_do_execbuffer+0x25b0/0x25b0 [i915] > > [852.438490]  __x64_sys_ioctl+0x6a/0xa0 > > [852.438498]  do_syscall_64+0x37/0xb0 > > [852.438507]  entry_SYSCALL_64_after_hwframe+0x44/0xae > > [852.438515] RIP: 0033:0x7f5c0415b317 > > [852.438523] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 > > 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 > > 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 > > 64 89 01 48 > > [852.438542] RSP: 002b:00007ffd765039a8 EFLAGS: 00000246 ORIG_RAX: > > 0000000000000010 > > [852.438553] RAX: ffffffffffffffda RBX: 000055e4d7829dd0 RCX: > > 00007f5c0415b317 > > [852.438562] RDX: 00007ffd76503a00 RSI: 00000000c0406469 RDI: > > 0000000000000017 > > [852.438571] RBP: 00007ffd76503a00 R08: 0000000000000000 R09: > > 0000000000000081 > > [852.438579] R10: 00000000ffffff7f R11: 0000000000000246 R12: > > 00000000c0406469 > > [852.438587] R13: 0000000000000017 R14: 00007ffd76503a00 R15: > > 0000000000000000 > > [852.438598]  > > [852.438602] Modules linked in: snd_hda_codec_hdmi i915 mei_hdcp > > x86_pkg_temp_thermal snd_hda_intel snd_intel_dspcfg drm_buddy > > coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec ttm > > ghash_clmulni_intel snd_hwdep snd_hda_core e1000e drm_dp_helper ptp > > snd_pcm mei_me drm_kms_helper pps_core mei syscopyarea sysfillrect > > sysimgblt fb_sys_fops prime_numbers intel_lpss_pci smsc75xx usbnet > > mii > > [852.440310] ---[ end trace e52cdd2fe4fd911c ]--- > > > > v2: Fix typos in the commit message. > > > > Fixes: 7e00897be8bf ("drm/i915: Add object locking to > > i915_gem_evict_for_node and i915_gem_evict_something, v2.") > > Fixes: bc1922e5d349 ("drm/i915: Fix a race between vma / object > > destruction and unbinding") > > Cc: Maarten Lankhorst > > Signed-off-by: Thomas Hellström > > Reviewed-by: Matthew Auld > > --- > > drivers/gpu/drm/i915/gem/i915_gem_object.c    | 14 +--- > > .../drm/i915/gem/selftests/i915_gem_mman.c    |  4 +- > > drivers/gpu/drm/i915/gt/intel_gtt.c           | 17 +++-- > > drivers/gpu/drm/i915/i915_vma.c               | 74 > > ++++++++++++++++--- > > drivers/gpu/drm/i915/i915_vma.h               |  3 + > > 5 files changed, 79 insertions(+), 33 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c > > b/drivers/gpu/drm/i915/gem/i915_gem_object.c > > index e03e362d320b..78c4cbe82031 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c > > @@ -267,12 +267,6 @@ void __i915_gem_object_pages_fini(struct > > drm_i915_gem_object *obj) > >         if (!list_empty(&obj->vma.list)) { > >                 struct i915_vma *vma; > > > > -               /* > > -                * Note that the vma keeps an object reference > > while > > -                * it is active, so it *should* not sleep while we > > -                * destroy it. Our debug code errs insits it > > *might*. > > -                * For the moment, play along. > > -                */ > >                 spin_lock(&obj->vma.lock); > >                 while ((vma = list_first_entry_or_null(&obj- > > >vma.list, > >                                                        struct > > i915_vma, > > @@ -280,13 +274,7 @@ void __i915_gem_object_pages_fini(struct > > drm_i915_gem_object *obj) > >                         GEM_BUG_ON(vma->obj != obj); > >                         spin_unlock(&obj->vma.lock); > > > > -                       /* Verify that the vma is unbound under the > > vm mutex. */ > > -                       mutex_lock(&vma->vm->mutex); > > -                       atomic_and(~I915_VMA_PIN_MASK, &vma- > > >flags); > > -                       __i915_vma_unbind(vma); > > -                       mutex_unlock(&vma->vm->mutex); > > - > > -                       __i915_vma_put(vma); > > +                       i915_vma_destroy(vma); > > > >                         spin_lock(&obj->vma.lock); > >                 } > > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c > > b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c > > index ba29767348be..af36bffd064b 100644 > > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c > > +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c > > @@ -167,7 +167,7 @@ static int check_partial_mapping(struct > > drm_i915_gem_object *obj, > > > > out: > >         i915_gem_object_lock(obj, NULL); > > -       __i915_vma_put(vma); > > +       i915_vma_destroy(vma); > >         i915_gem_object_unlock(obj); > >         return err; > > } > > @@ -264,7 +264,7 @@ static int check_partial_mappings(struct > > drm_i915_gem_object *obj, > >                         return err; > > > >                 i915_gem_object_lock(obj, NULL); > > -               __i915_vma_put(vma); > > +               i915_vma_destroy(vma); > >                 i915_gem_object_unlock(obj); > > > >                 if (igt_timeout(end_time, > > diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c > > b/drivers/gpu/drm/i915/gt/intel_gtt.c > > index df23ebdfc994..4363848f7411 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_gtt.c > > +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c > > @@ -105,14 +105,19 @@ void __i915_vm_close(struct > > i915_address_space *vm) > >         list_for_each_entry_safe(vma, vn, &vm->bound_list, vm_link) > > { > >                 struct drm_i915_gem_object *obj = vma->obj; > > > > -               /* Keep the obj (and hence the vma) alive as _we_ > > destroy it */ > > -               if (!kref_get_unless_zero(&obj->base.refcount)) > > +               if (!kref_get_unless_zero(&obj->base.refcount)) { > > +                       /* > > +                        * Unbind the dying vma to ensure the > > bound_list > > +                        * is completely drained. We leave the > > destruction to > > +                        * the object destructor. > > +                        */ > > +                       atomic_and(~I915_VMA_PIN_MASK, &vma- > > >flags); > > +                       WARN_ON(__i915_vma_unbind(vma)); > >                         continue; > > +               } > > > > -               atomic_and(~I915_VMA_PIN_MASK, &vma->flags); > > -               WARN_ON(__i915_vma_unbind(vma)); > > -               __i915_vma_put(vma); > > - > > +               /* Keep the obj (and hence the vma) alive as _we_ > > destroy it */ > > +               i915_vma_destroy_locked(vma); > >                 i915_gem_object_put(obj); > >         } > >         GEM_BUG_ON(!list_empty(&vm->bound_list)); > > diff --git a/drivers/gpu/drm/i915/i915_vma.c > > b/drivers/gpu/drm/i915/i915_vma.c > > index 52f43a465440..9c1582a473c6 100644 > > --- a/drivers/gpu/drm/i915/i915_vma.c > > +++ b/drivers/gpu/drm/i915/i915_vma.c > > @@ -1562,15 +1562,27 @@ void i915_vma_reopen(struct i915_vma *vma) > > void i915_vma_release(struct kref *ref) > > { > >         struct i915_vma *vma = container_of(ref, typeof(*vma), > > ref); > > + > > +       i915_vm_put(vma->vm); > > +       i915_active_fini(&vma->active); > > +       GEM_WARN_ON(vma->resource); > > +       i915_vma_free(vma); > > +} > > + > > +static void force_unbind(struct i915_vma *vma) > > +{ > > +       if (!drm_mm_node_allocated(&vma->node)) > > +               return; > > + > > +       atomic_and(~I915_VMA_PIN_MASK, &vma->flags); > > +       WARN_ON(__i915_vma_unbind(vma)); > > +       GEM_BUG_ON(drm_mm_node_allocated(&vma->node)); > > +} > > + > > +static void release_references(struct i915_vma *vma) > > +{ > >         struct drm_i915_gem_object *obj = vma->obj; > > > > -       if (drm_mm_node_allocated(&vma->node)) { > > -               mutex_lock(&vma->vm->mutex); > > -               atomic_and(~I915_VMA_PIN_MASK, &vma->flags); > > -               WARN_ON(__i915_vma_unbind(vma)); > > -               mutex_unlock(&vma->vm->mutex); > > -               GEM_BUG_ON(drm_mm_node_allocated(&vma->node)); > > -       } > >         GEM_BUG_ON(i915_vma_is_active(vma)); > > > >         spin_lock(&obj->vma.lock); > > @@ -1580,11 +1592,49 @@ void i915_vma_release(struct kref *ref) > >         spin_unlock(&obj->vma.lock); > > > >         __i915_vma_remove_closed(vma); > > -       i915_vm_put(vma->vm); > > > > -       i915_active_fini(&vma->active); > > -       GEM_WARN_ON(vma->resource); > > -       i915_vma_free(vma); > > +       __i915_vma_put(vma); > > +} > > + > > +/** > > + * i915_vma_destroy_locked - Remove all weak reference to the vma > > and put > > + * the initial reference. > > + * > > + * This function should be called when it's decided the vma isn't > > needed > > + * anymore. The caller must assure that it doesn't race with > > another lookup > > + * plus destroy, typically by taking an appropriate reference. > > + * > > + * Current callsites are > > + * - __i915_gem_object_pages_fini() > > + * - __i915_vm_close() - Blocks the above function by taking a > > reference on > > + * the object. > > + * - __i915_vma_parked() - Blocks the above functions by taking an > > open-count on > > + * the vm and a reference on the object. > > + * > > + * Because of locks taken during destruction, a vma is also > > guaranteed to > > + * stay alive while the following locks are held if it was looked > > up while > > + * holding one of the locks: > > + * - vm->mutex > > + * - obj->vma.lock > > + * - gt->closed_lock > > + * > > + * A vma user can also temporarily keep the vma alive while > > holding a vma > > + * reference. > > + */ > > +void i915_vma_destroy_locked(struct i915_vma *vma) > > +{ > > +       lockdep_assert_held(&vma->vm->mutex); > > + > > +       force_unbind(vma); > > +       release_references(vma); > > +} > > + > > +void i915_vma_destroy(struct i915_vma *vma) > > +{ > > +       mutex_lock(&vma->vm->mutex); > > +       force_unbind(vma); > > +       mutex_unlock(&vma->vm->mutex); > > +       release_references(vma); > > } > > > > void i915_vma_parked(struct intel_gt *gt) > > @@ -1618,7 +1668,7 @@ void i915_vma_parked(struct intel_gt *gt) > > > >                 if (i915_gem_object_trylock(obj, NULL)) { > >                         INIT_LIST_HEAD(&vma->closed_link); > > -                       __i915_vma_put(vma); > > +                       i915_vma_destroy(vma); > >                         i915_gem_object_unlock(obj); > >                 } else { > >                         /* back you go.. */ > > There is one more __i915_vma_put() in i915_gem_object_unbind > (i915_gem.c). > I am not seeing that being replaced by i915_vma_destroy(). > > Other than that the patch looks fine to me. Thanks for reviewing, Niranjana. That __i915_vma_put() is for a later patch to remove the refcount completely. since it's actually a get() put() pair that uses the refcount in a proper way. But since Maarten has introduced an object lock in i915_vma_parked() we don't need it anymore. /Thomas > > Niranjana > > > > diff --git a/drivers/gpu/drm/i915/i915_vma.h > > b/drivers/gpu/drm/i915/i915_vma.h > > index 011af044ad4f..67ae7341c7e0 100644 > > --- a/drivers/gpu/drm/i915/i915_vma.h > > +++ b/drivers/gpu/drm/i915/i915_vma.h > > @@ -236,6 +236,9 @@ static inline void __i915_vma_put(struct > > i915_vma *vma) > >         kref_put(&vma->ref, i915_vma_release); > > } > > > > +void i915_vma_destroy_locked(struct i915_vma *vma); > > +void i915_vma_destroy(struct i915_vma *vma); > > + > > #define assert_vma_held(vma) dma_resv_assert_held((vma)->obj- > > >base.resv) > > > > static inline void i915_vma_lock(struct i915_vma *vma) > > -- > > 2.34.1 > >