From: Daniel Vetter <daniel@ffwll.ch>
To: Matthew Wilcox <willy@infradead.org>
Cc: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	mhocko@kernel.org, "Linux MM" <linux-mm@kvack.org>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"Dave Chinner" <dchinner@redhat.com>, "Leo Liu" <Leo.Liu@amd.com>
Subject: Re: [PATCH] drm/ttm: stop warning on TT shrinker failure
Date: Mon, 22 Mar 2021 15:22:02 +0100
Message-ID: <YFioChrLPkjMBTP3@phenom.ffwll.local>
In-Reply-To: <20210322140548.GN1719932@casper.infradead.org>

On Mon, Mar 22, 2021 at 02:05:48PM +0000, Matthew Wilcox wrote:
> On Mon, Mar 22, 2021 at 02:49:27PM +0100, Daniel Vetter wrote:
> > On Sun, Mar 21, 2021 at 03:18:28PM +0100, Christian König wrote:
> > > On 20.03.21 at 14:17, Daniel Vetter wrote:
> > > > On Sat, Mar 20, 2021 at 10:04 AM Christian König
> > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > > On 19.03.21 at 20:06, Daniel Vetter wrote:
> > > > > > On Fri, Mar 19, 2021 at 07:53:48PM +0100, Christian König wrote:
> > > > > > > On 19.03.21 at 18:52, Daniel Vetter wrote:
> > > > > > > > On Fri, Mar 19, 2021 at 03:08:57PM +0100, Christian König wrote:
> > > > > > > > > Don't print a warning when we fail to allocate a page for swapping things out.
> > > > > > > > > 
> > > > > > > > > Also rely on memalloc_nofs_save/memalloc_nofs_restore instead of GFP_NOFS.
> > > > > > > > Uh this part doesn't make sense. Especially since you only do it for the
> > > > > > > > debugfs file, not in general. Which means you've just completely broken
> > > > > > > > the shrinker.
> > > > > > > Are you sure? My impression is that GFP_NOFS should now work much more out
> > > > > > > of the box with the memalloc_nofs_save()/memalloc_nofs_restore().
> > > > > > Yeah, if you'd put it in the right place :-)
> > > > > > 
> > > > > > But also -mm folks are very clear that the memalloc_no*() family is for dire
> > > > > > situations where there's really no other way out. For anything where you
> > > > > > know what you're doing, you really should use explicit gfp flags.
> > > > > My impression is just the other way around. You should try to avoid the
> > > > > NOFS/NOIO flags and use the memalloc_no* approach instead.
> > > > Where did you get that idea?
> > > 
> > > Well from the kernel comment on GFP_NOFS:
> > > 
> > >  * %GFP_NOFS will use direct reclaim but will not use any filesystem interfaces.
> > >  * Please try to avoid using this flag directly and instead use
> > >  * memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't
> > >  * recurse into the FS layer with a short explanation why. All allocation
> > >  * requests will inherit GFP_NOFS implicitly.
> > 
> > Huh that's interesting, since iirc Willy or Dave told me the opposite, and
> > the memalloc_no* stuff is for e.g. nfs calling into network layer (needs
> > GFP_NOFS) or swap on top of a filesystem (even needs GFP_NOIO I think).
> > 
> > Adding them, maybe I got confused.
> 
> My impression is that the scoped API is preferred these days.
> 
> https://www.kernel.org/doc/html/latest/core-api/gfp_mask-from-fs-io.html
> 
> I'd probably need to spend a few months learning the DRM subsystem to
> have a more detailed opinion on whether passing GFP flags around explicitly
> or using the scope API is the better approach for your situation.

Atm it's a single allocation in the ttm shrinker that's already explicitly
using GFP_NOFS that we're talking about here.
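
To make the difference between the two styles concrete, here is a small
userspace model. It is only a sketch: the MODEL_* flag values and the
model_* helpers are made up for illustration and are not the kernel's
definitions. As far as I understand it, the real
memalloc_nofs_save()/memalloc_nofs_restore() toggle PF_MEMALLOC_NOFS in
current->flags and the allocator applies that restriction later, which is
what the model imitates.

```c
#include <assert.h>

/* Illustrative flag bits only; these are NOT the kernel's real values. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_IO)

/* Stand-in for the per-task NOFS bit (single-threaded model). */
static unsigned int model_task_nofs;

/* Enter a NOFS scope; returns the previous state so scopes can nest. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_nofs;

	model_task_nofs = 1;
	return old;
}

/* Leave the scope by restoring the state the matching save observed. */
static void model_nofs_restore(unsigned int old)
{
	model_task_nofs = old;
}

/* What an allocation actually gets: an active scope strips the FS bit. */
static unsigned int model_effective_gfp(unsigned int requested)
{
	if (model_task_nofs)
		requested &= ~MODEL_GFP_FS;
	return requested;
}

/* Explicit style: the restriction is visible at the call site. */
static unsigned int explicit_style(void)
{
	return model_effective_gfp(MODEL_GFP_NOFS);
}

/* Scoped style: the call site asks for GFP_KERNEL but does not get it. */
static unsigned int scoped_style(void)
{
	unsigned int old = model_nofs_save();
	unsigned int got = model_effective_gfp(MODEL_GFP_KERNEL);

	model_nofs_restore(old);
	return got;
}
```

Both helpers end up with the same effective mask, but only explicit_style()
shows that at the call site. In scoped_style() the weaker guarantee is a
non-local effect of whoever opened the scope.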

The scoped API might make sense for the gpu scheduler, where we really
operate under GFP_NOWAIT for somewhat awkward reasons. But I also thought
that at least for GFP_NOIO you generally need a mempool and have to think
about how you guarantee forward progress anyway. Is that thinking also a bit
outdated, and nowadays we could operate under the assumption that this Just
Works? Given that GFP_NOFS already seems to fall over for us I'm not super
sure about that ...
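
The forward progress idea behind a mempool can be sketched in userspace
roughly like this. It is a toy model, not the kernel's mempool API: struct
model_pool, POOL_MIN and the model_* functions are hypothetical names, and
unlike a real mempool it returns NULL instead of waiting for an element to
be returned once the reserve is empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MIN 2	/* hypothetical reserve size */

struct model_pool {
	void *reserve[POOL_MIN];	/* elements set aside up front */
	int nr_free;			/* number of filled reserve slots */
	size_t elem_size;
};

/* Test hook: when nonzero, the normal allocator pretends to be out of memory. */
static int model_fail_normal_alloc;

static void *model_normal_alloc(size_t size)
{
	return model_fail_normal_alloc ? NULL : malloc(size);
}

/* Fill the reserve while memory is still plentiful. Returns 0 or -1. */
static int model_pool_init(struct model_pool *p, size_t elem_size)
{
	p->elem_size = elem_size;
	for (p->nr_free = 0; p->nr_free < POOL_MIN; p->nr_free++) {
		p->reserve[p->nr_free] = malloc(elem_size);
		if (!p->reserve[p->nr_free]) {
			/* Unwind the slots filled so far. */
			while (p->nr_free > 0)
				free(p->reserve[--p->nr_free]);
			return -1;
		}
	}
	return 0;
}

/* Try the normal allocator first, then dip into the reserve. */
static void *model_pool_alloc(struct model_pool *p)
{
	void *elem = model_normal_alloc(p->elem_size);

	if (!elem && p->nr_free > 0)
		elem = p->reserve[--p->nr_free];
	return elem;	/* NULL only when the reserve is exhausted too */
}

/* Returned elements refill the reserve before going back to the system. */
static void model_pool_free(struct model_pool *p, void *elem)
{
	if (p->nr_free < POOL_MIN)
		p->reserve[p->nr_free++] = elem;
	else
		free(elem);
}

/* Release whatever is still held in reserve. */
static void model_pool_destroy(struct model_pool *p)
{
	while (p->nr_free > 0)
		free(p->reserve[--p->nr_free]);
}
```

The point of the reserve is that progress depends only on earlier users
returning their elements, not on the allocator being able to reclaim memory
at that moment.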

> I usually defer to Michal on these kinds of questions.
> 
> > > > The kernel is full of explicit gfp_t flag
> > > > passing to make this as explicit as possible. The memalloc_no* stuff
> > > > is just for when you go through entire subsystems and really can't
> > > > wire it through. I can't find the discussion anymore, but that was the
> > > > advice I got from mm/fs people.
> > > > 
> > > > One reason is that generally a small GFP_KERNEL allocation never
> > > > fails. But it absolutely can fail if it's in a memalloc_no* section,
> > > > and these kinds of non-obvious non-local effects are a real pain in
> > > > testing and review. Hence explicit gfp_flag passing as much as
> > > > possible.
> 
> I agree with this; it's definitely a problem with the scope API.  I wanted
> to extend it to include GFP_NOWAIT, but if you do that, your chances of
> memory allocation failure go way up, so you really want to set __GFP_NOWARN
> too, but now you need to audit all the places that you're calling to be
> sure they really handle errors correctly.
> 
> So I think I'm giving up on that patch set.

Yeah the auditing is what scares me, and why at least personally I prefer
explicit gfp flags. It's much easier to debug a lockdep splat involving
fs_reclaim than memory allocation failures leading to very strange bugs
because we're not handling the allocation failure properly (or maybe not
even at all).
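
As a sketch of what handling the failure properly looks like at a single
call site, here is a userspace example. Everything in it is hypothetical:
model_alloc(), struct model_job and the failure-injection counter only
stand in for an allocator under memory pressure and are not taken from any
driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Test hook: allocations allowed to succeed before failing; -1 = never fail. */
static int model_allocs_left = -1;

static void *model_alloc(size_t size)
{
	if (model_allocs_left == 0)
		return NULL;
	if (model_allocs_left > 0)
		model_allocs_left--;
	return malloc(size);
}

struct model_job {
	void *cmds;
	void *fence;
};

/*
 * Build a job from two allocations. Every failure is checked, partial
 * state is unwound, and the caller gets -ENOMEM instead of a half-built job.
 */
static int model_job_create(struct model_job *job)
{
	job->cmds = model_alloc(128);
	if (!job->cmds)
		return -ENOMEM;

	job->fence = model_alloc(32);
	if (!job->fence) {
		free(job->cmds);
		job->cmds = NULL;
		return -ENOMEM;
	}
	return 0;
}

/* Tear down a fully built job and clear the stale pointers. */
static void model_job_destroy(struct model_job *job)
{
	free(job->fence);
	free(job->cmds);
	job->fence = NULL;
	job->cmds = NULL;
}
```

With a scoped restriction, every function called inside the scope needs
this kind of checking and unwinding, including ones that were written
assuming a small GFP_KERNEL allocation cannot fail.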
-Daniel

> 
> > > > > > > > If this is just to paper over the seq_printf doing the wrong allocations,
> > > > > > > > then just move that out from under the fs_reclaim_acquire/release part.
> > > > > > > No, that wasn't the problem.
> > > > > > > 
> > > > > > > We have just seen too many failures to allocate pages for swapout and I think
> > > > > > > that would improve this because in a lot of cases we can then immediately
> > > > > > > swap things out instead of having to rely on upper layers.
> > > > > > Yeah, you broke it. Now the real shrinker is running with GFP_KERNEL,
> > > > > > because your memalloc_no is only around the debugfs function. And ofc it's
> > > > > > much easier to allocate with GFP_KERNEL, right until you deadlock :-)
> > > > > The problem here is that for example kswapd calls the shrinker without
> > > > > holding a FS lock as far as I can see.
> > > > > 
> > > > > And it is rather sad that we can't optimize this case directly.
> > > > I'm still not clear what you want to optimize? You can check for "is
> > > > this kswapd" in pf flags, but that sounds very hairy and fragile.
> > > 
> > > Well we only need the NOFS flag when the shrinker callback really comes from
> > > a memory shortage in the FS subsystem, and that is rather unlikely.
> > > 
> > > If we allowed all other cases to directly IO the freed up pages to swap,
> > > it would certainly help.
> > 
> > tbh I'm not sure. i915-gem code has played tricks with special casing the
> > kswapd path, and they do kinda scare me at least. I'm not sure whether
> > there aren't some hidden dependencies there that would make this a bad
> > idea. Like afaik direct reclaim can sometimes stall for kswapd to catch up
> > a bit, or at least did in the past (I think, really not much clue about
> > this)
> > 
> > The other thing is that the fs_reclaim_acquire/release annotation really
> > only works well if you use it outside of the direct reclaim path too.
> > Otherwise it's not much better than just lots of testing. That pretty much
> > means you have to annotate the kswapd path.
> > -Daniel
> > 
> > 
> > 
> > > 
> > > Christian.
> > > 
> > > > -Daniel
> > > > 
> > > > > Anyway you are right if some caller doesn't use the memalloc_no*()
> > > > > approach we are busted.
> > > > > 
> > > > > Going to change the patch to only not warn for the moment.
> > > > > 
> > > > > Regards,
> > > > > Christian.
> > > > > 
> > > > > > Shrinking is hard, there's no easy way out here.
> > > > > > 
> > > > > > Cheers, Daniel
> > > > > > 
> > > > > > > Regards,
> > > > > > > Christian.
> > > > > > > 
> > > > > > > 
> > > > > > > > __GFP_NOWARN should be there indeed I think.
> > > > > > > > -Daniel
> > > > > > > > 
> > > > > > > > > Signed-off-by: Christian König <christian.koenig@amd.com>
> > > > > > > > > ---
> > > > > > > > >     drivers/gpu/drm/ttm/ttm_tt.c | 5 ++++-
> > > > > > > > >     1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > > > index 2f0833c98d2c..86fa3e82dacc 100644
> > > > > > > > > --- a/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> > > > > > > > > @@ -369,7 +369,7 @@ static unsigned long ttm_tt_shrinker_scan(struct shrinker *shrink,
> > > > > > > > >             };
> > > > > > > > >             int ret;
> > > > > > > > > -  ret = ttm_bo_swapout(&ctx, GFP_NOFS);
> > > > > > > > > +  ret = ttm_bo_swapout(&ctx, GFP_KERNEL | __GFP_NOWARN);
> > > > > > > > >             return ret < 0 ? SHRINK_EMPTY : ret;
> > > > > > > > >     }
> > > > > > > > > @@ -389,10 +389,13 @@ static unsigned long ttm_tt_shrinker_count(struct shrinker *shrink,
> > > > > > > > >     static int ttm_tt_debugfs_shrink_show(struct seq_file *m, void *data)
> > > > > > > > >     {
> > > > > > > > >             struct shrink_control sc = { .gfp_mask = GFP_KERNEL };
> > > > > > > > > +  unsigned int flags;
> > > > > > > > >             fs_reclaim_acquire(GFP_KERNEL);
> > > > > > > > > +  flags = memalloc_nofs_save();
> > > > > > > > >             seq_printf(m, "%lu/%lu\n", ttm_tt_shrinker_count(&mm_shrinker, &sc),
> > > > > > > > >                        ttm_tt_shrinker_scan(&mm_shrinker, &sc));
> > > > > > > > > +  memalloc_nofs_restore(flags);
> > > > > > > > >             fs_reclaim_release(GFP_KERNEL);
> > > > > > > > >             return 0;
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > > > 
> > > > > > > > > _______________________________________________
> > > > > > > > > dri-devel mailing list
> > > > > > > > > dri-devel@lists.freedesktop.org
> > > > > > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Thread overview: 103+ messages
2021-03-19 14:08 [PATCH] drm/ttm: stop warning on TT shrinker failure Christian König
2021-03-19 17:19 ` Deucher, Alexander
2021-03-19 17:52 ` Daniel Vetter
2021-03-19 18:53   ` Christian König
2021-03-19 19:06     ` Daniel Vetter
2021-03-20  9:04       ` Christian König
2021-03-20 13:17         ` Daniel Vetter
2021-03-21 14:18           ` Christian König
2021-03-22 13:49             ` Daniel Vetter
2021-03-22 14:05               ` Matthew Wilcox
2021-03-22 14:22                 ` Daniel Vetter [this message]
2021-03-22 15:57                 ` Michal Hocko
2021-03-22 17:02                   ` Daniel Vetter
2021-03-22 19:34                     ` Christian König
2021-03-23  7:38                       ` Michal Hocko
2021-03-23 11:28                         ` Daniel Vetter
2021-03-23 11:46                           ` Michal Hocko
2021-03-23 11:51                             ` Christian König
2021-03-23 12:00                               ` Daniel Vetter
2021-03-23 12:05                               ` Michal Hocko
2021-03-23 11:48                           ` Christian König
2021-03-23 12:04                             ` Michal Hocko
2021-03-23 12:21                               ` Christian König
2021-03-23 12:37                                 ` Michal Hocko
2021-03-23 13:06                                   ` Christian König
2021-03-23 13:41                                     ` Michal Hocko
2021-03-23 13:56                                       ` Christian König
2021-03-23 15:13                                         ` Michal Hocko
2021-03-23 15:45                                           ` Christian König
2021-03-24 10:19                                             ` Thomas Hellström (Intel)
2021-03-24 11:55                                               ` Daniel Vetter
2021-03-24 12:00                                                 ` Christian König
2021-03-24 12:01                                                   ` Daniel Vetter
2021-03-24 12:07                                                     ` Christian König
2021-03-24 19:20                                                       ` Daniel Vetter
2021-03-23 13:15                               ` Daniel Vetter
2021-03-23 13:48                                 ` Michal Hocko
