From: Boris Brezillon <boris.brezillon@collabora.com>
To: Rob Herring <robh+dt@kernel.org>
Cc: stable <stable@vger.kernel.org>,
dri-devel <dri-devel@lists.freedesktop.org>,
Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>,
Steven Price <steven.price@arm.com>
Subject: Re: [PATCH 2/8] drm/panfrost: Fix a race in panfrost_ioctl_madvise()
Date: Fri, 6 Dec 2019 09:08:09 +0100
Message-ID: <20191206090809.0832f4aa@collabora.com>
In-Reply-To: <20191206085327.66a8c479@collabora.com>

On Fri, 6 Dec 2019 08:53:27 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:
> On Thu, 5 Dec 2019 17:08:02 -0600
> Rob Herring <robh+dt@kernel.org> wrote:
>
> > On Fri, Nov 29, 2019 at 8:33 AM Boris Brezillon
> > <boris.brezillon@collabora.com> wrote:
> > >
> > > On Fri, 29 Nov 2019 14:24:48 +0000
> > > Steven Price <steven.price@arm.com> wrote:
> > >
> > > > On 29/11/2019 13:59, Boris Brezillon wrote:
> > > > > If 2 threads change the MADVISE property of the same BO in parallel we
> > > > > might end up with an shmem->madv value that's inconsistent with the
> > > > > presence of the BO in the shrinker list.
> > > >
> > > > I'm a bit worried from the point of view of user space sanity that you
> > > > observed this - but clearly the kernel should be robust!
> > >
> > > It's not something I observed, just found the race by inspecting the
> > > code, and I thought it was worth fixing it.
> >
> > I'm not so sure there's a race.
>
> I'm pretty sure there's one:
>
>       T0                              T1
>
>       lock(pages)
>       madv = 1
>       unlock(pages)
>
>                                       lock(pages)
>                                       madv = 0
>                                       unlock(pages)
>
>                                       lock(shrinker)
>                                       remove_from_list(bo)
>                                       unlock(shrinker)
>
>       lock(shrinker)
>       add_to_list(bo)
>       unlock(shrinker)
>
> You end up with madv = 0 and the BO is added to the list.
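
For readers who want to replay the interleaving, here is a minimal single-threaded sketch. It is only a model: `struct bo_model` and every function below are hypothetical names, not the panfrost or drm_gem_shmem code, and it assumes that each caller updates the list according to the madv value it requested (1 adds the BO, 0 removes it), as the T0/T1 diagram suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified BO state; not the kernel structures. */
struct bo_model {
	int madv;              /* 1 = purgeable, 0 = needed */
	bool in_shrinker_list; /* true if the BO sits in the shrinker list */
};

/* Step 1 of the ioctl: done under the pages lock in the real code. */
void model_set_madv(struct bo_model *bo, int madv)
{
	bo->madv = madv;
}

/* Step 2: done under the shrinker lock, driven by the requested value. */
void model_update_list(struct bo_model *bo, int requested_madv)
{
	bo->in_shrinker_list = (requested_madv == 1);
}

/* The invariant the patch wants: in the list if and only if purgeable. */
bool model_is_consistent(const struct bo_model *bo)
{
	return (bo->madv == 1) == bo->in_shrinker_list;
}

/* Two separate critical sections per caller, interleaved as in the diagram. */
struct bo_model replay_unlocked(void)
{
	struct bo_model bo = { .madv = 0, .in_shrinker_list = false };

	model_set_madv(&bo, 1);    /* T0: madv = 1 */
	model_set_madv(&bo, 0);    /* T1: madv = 0 */
	model_update_list(&bo, 0); /* T1: remove_from_list(bo) */
	model_update_list(&bo, 1); /* T0: add_to_list(bo) */
	return bo;
}

/* Both steps under one lock: each caller runs to completion in turn. */
struct bo_model replay_serialized(void)
{
	struct bo_model bo = { .madv = 0, .in_shrinker_list = false };

	model_set_madv(&bo, 1);    /* T0, whole critical section */
	model_update_list(&bo, 1);
	model_set_madv(&bo, 0);    /* T1, whole critical section */
	model_update_list(&bo, 0);
	return bo;
}

/* Usage: the racy replay breaks the invariant, the serialized one keeps it. */
void check_madvise_replays(void)
{
	struct bo_model racy = replay_unlocked();
	struct bo_model fixed = replay_serialized();

	assert(racy.madv == 0 && racy.in_shrinker_list);
	assert(!model_is_consistent(&racy));
	assert(model_is_consistent(&fixed));
}
```

The serialized replay corresponds to taking the shrinker lock around both steps, which is what the patch proposes. Whichever caller goes last, the final madv value and the list membership then come from the same caller.
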
>
> > If there is, we still check madv value
> > when purging, so it would be harmless even if the state is
> > inconsistent.
>
> Indeed. Note that you could also have this other situation where the BO
> is marked purgeable but not present in the list. In that case it will
> never be purged, but it's kinda user space fault anyway. I agree, none
> of these problems are critical, and I'm fine leaving them unfixed as long
> as it's documented somewhere that the race exists and is harmless.
>
> >
> > > > > The easiest solution to fix that is to protect the
> > > > > drm_gem_shmem_madvise() call with the shrinker lock.
> > > > >
> > > > > Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
> > > > > Cc: <stable@vger.kernel.org>
> > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > > >
> > > > Reviewed-by: Steven Price <steven.price@arm.com>
> > >
> > > Thanks.
> > >
> > > >
> > > > > ---
> > > > > drivers/gpu/drm/panfrost/panfrost_drv.c | 9 ++++-----
> > > > > 1 file changed, 4 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > > > index f21bc8a7ee3a..efc0a24d1f4c 100644
> > > > > --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > > > +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > > > @@ -347,20 +347,19 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
> > > > > return -ENOENT;
> > > > > }
> > > > >
> > > > > + mutex_lock(&pfdev->shrinker_lock);
> > > > > args->retained = drm_gem_shmem_madvise(gem_obj, args->madv);
> >
> > This means we now hold the shrinker_lock while we take the pages_lock.
> > Is lockdep happy with this change? I suspect not given all the fun I
> > had getting lockdep happy.
>
> I have tested with lockdep enabled and it's all good from lockdep PoV
> because the locks are taken in the same order in the madvise() and
> shrinker_scan() path (first the shrinker lock, then the pages lock).
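
The ordering argument can be illustrated with a toy pairwise order tracker. This is a rough sketch only: lockdep itself is far more general (it tracks lock classes and whole dependency chains), and every name below is hypothetical. The one thing taken from the thread is that both paths take the shrinker lock first and the pages lock second.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers for the two locks discussed here. */
enum model_lock { LOCK_SHRINKER, LOCK_PAGES, LOCK_COUNT };

/* seen[a][b] is true once some path took lock b while holding lock a. */
struct order_tracker {
	bool seen[LOCK_COUNT][LOCK_COUNT];
};

void tracker_init(struct order_tracker *t)
{
	memset(t, 0, sizeof(*t));
}

/*
 * Record that `inner` is acquired while `outer` is held.
 * Returns false if the reverse order was recorded earlier, which is the
 * kind of inversion a lock-order checker would complain about.
 */
bool record_nested(struct order_tracker *t, enum model_lock outer,
		   enum model_lock inner)
{
	if (t->seen[inner][outer])
		return false;
	t->seen[outer][inner] = true;
	return true;
}

/* madvise path after the patch: shrinker lock, then pages lock. */
bool model_madvise_path(struct order_tracker *t)
{
	return record_nested(t, LOCK_SHRINKER, LOCK_PAGES);
}

/* Shrinker scan path: same order, shrinker lock, then pages lock. */
bool model_shrinker_scan_path(struct order_tracker *t)
{
	return record_nested(t, LOCK_SHRINKER, LOCK_PAGES);
}

/* Usage: same order in both paths is fine, a reversed path is flagged. */
void check_lock_order(void)
{
	struct order_tracker t;

	tracker_init(&t);
	assert(model_madvise_path(&t));
	assert(model_shrinker_scan_path(&t));
	assert(!record_nested(&t, LOCK_PAGES, LOCK_SHRINKER));
}
```
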
>
> Note that patch 7 introduces a deadlock in the shrinker path, but this
> is unrelated to this shrinker lock being taken earlier in madvise
> (drm_gem_put_pages() is called while the pages lock is already held).

My bad, there's no deadlock in this version, because we don't use
->pages_use_count to retain the page table (we just use a gpu_usecount
in patch 8 to prevent the purge). But I started working on a version
that uses ->pages_use_count instead of introducing yet another
refcount, and in this version I take/release a ref on the page table in
the mmu_map()/mmu_unmap() path. This causes a deadlock when GEM mappings
are torn down by the shrinker logic (because the pages lock is already
taken in panfrost_gem_purge())...
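
To make that last scenario concrete, here is a small single-threaded sketch of the self-deadlock. All names are hypothetical stand-ins, and the assumption that dropping the page reference needs the pages lock is mine, for illustration. A real non-recursive mutex would simply block forever at the point where `model_lock()` returns false.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a non-recursive mutex. Single-threaded model
 * only: it does not track which thread owns the lock.
 */
struct model_mutex {
	bool held;
};

/* Returns false where a real non-recursive mutex would hang on itself. */
bool model_lock(struct model_mutex *m)
{
	if (m->held)
		return false;
	m->held = true;
	return true;
}

void model_unlock(struct model_mutex *m)
{
	m->held = false;
}

/* Stand-in for dropping a pages reference, assumed to need the pages lock. */
bool model_put_pages(struct model_mutex *pages_lock)
{
	if (!model_lock(pages_lock))
		return false;
	model_unlock(pages_lock);
	return true;
}

/*
 * Stand-in for the purge path: it takes the pages lock and then tears down
 * the mappings. If unmapping drops a pages reference, the lock is requested
 * a second time while already held.
 */
bool model_purge(struct model_mutex *pages_lock, bool unmap_drops_pages_ref)
{
	bool ok = true;

	if (!model_lock(pages_lock))
		return false;
	if (unmap_drops_pages_ref)
		ok = model_put_pages(pages_lock);
	model_unlock(pages_lock);
	return ok;
}

/* Usage: a separate refcount is fine, a pages ref in unmap self-deadlocks. */
void check_purge_paths(void)
{
	struct model_mutex pages_lock = { false };

	assert(model_purge(&pages_lock, false));
	assert(!model_purge(&pages_lock, true));
}
```
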