From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michal Hocko Subject: Re: [PATCH] mm: fix hanging shrinker management on long do_shrink_slab Date: Wed, 4 Dec 2019 09:35:14 +0100 Message-ID: <20191204083514.GC25242@dhcp22.suse.cz> References: <20191129214541.3110-1-ptikhomirov@virtuozzo.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <20191129214541.3110-1-ptikhomirov@virtuozzo.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Pavel Tikhomirov Cc: Andrew Morton , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, Johannes Weiner , Vladimir Davydov , Roman Gushchin , Shakeel Butt , Chris Down , Yang Shi , Tejun Heo , Thomas Gleixner , "Kirill A . Shutemov" , Konstantin Khorenko , Kirill Tkhai , Andrey Ryabinin On Sat 30-11-19 00:45:41, Pavel Tikhomirov wrote: > We have a problem that shrinker_rwsem can be held for a long time for > read in shrink_slab, at the same time any process which is trying to > manage shrinkers hangs. > > The shrinker_rwsem is taken in shrink_slab while traversing shrinker_list. > It tries to shrink something on nfs (hard) but nfs server is dead at > these moment already and rpc will never succeed. Generally any shrinker > can take significant time to do_shrink_slab, so it's a bad idea to hold > the list lock here. Yes, this is a known problem and people have already tried to address it in the past. Have you checked previous attempts? SRCU based one http://lkml.kernel.org/r/153365347929.19074.12509495712735843805.stgit@localhost.localdomain but I believe there were others (I only had this one in my notes). Please make sure to Cc Dave Chinner when posting a next version because he had some concerns about the change of the behavior. > We have a similar problem in shrink_slab_memcg, except that we are > traversing shrinker_map+shrinker_idr there. > > The idea of the patch is to inc a refcount to the chosen shrinker so it > won't disappear and release shrinker_rwsem while we are in > do_shrink_slab, after that we will reacquire shrinker_rwsem, dec > the refcount and continue the traversal. The reference count part makes sense to me. RCU role needs a better explanation. Also do you have any reason to not use completion for the final step? Openconding essentially the same concept sounds a bit awkward to me. > We also need a wait_queue so that unregister_shrinker can wait for the > refcnt to become zero. Only after these we can safely remove the > shrinker from list and idr, and free the shrinker. [...] > crash> bt ... > PID: 18739 TASK: ... CPU: 3 COMMAND: "bash" > #0 [...] __schedule at ... > #1 [...] schedule at ... > #2 [...] rpc_wait_bit_killable at ... [sunrpc] > #3 [...] __wait_on_bit at ... > #4 [...] out_of_line_wait_on_bit at ... > #5 [...] _nfs4_proc_delegreturn at ... [nfsv4] > #6 [...] nfs4_proc_delegreturn at ... [nfsv4] > #7 [...] nfs_do_return_delegation at ... [nfsv4] > #8 [...] nfs4_evict_inode at ... [nfsv4] > #9 [...] evict at ... > #10 [...] dispose_list at ... > #11 [...] prune_icache_sb at ... > #12 [...] super_cache_scan at ... > #13 [...] do_shrink_slab at ... Are NFS people aware of this? Because this is simply not acceptable behavior. Memory reclaim cannot be block indefinitely or for a long time. There must be a way to simply give up if the underlying inode cannot be reclaimed. I still have to think about the proposed solution. It sounds a bit over complicated to me. -- Michal Hocko SUSE Labs