From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Sun, 8 Oct 2017 18:47:46 +0300
From: Vladimir Davydov
To: Al Viro
Cc: Michal Hocko, Jia-Ju Bai, torbjorn.lindh@gopta.se,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [BUG] fs/super: a possible sleep-in-atomic bug in put_super
Message-ID: <20171008154746.bgtkxir2pytftef3@esperanza>
References: <06badf5e-292d-ef63-7499-6888dec1b9b0@163.com>
 <20171006090604.m5oxcyb2xtllpmpu@dhcp22.suse.cz>
 <20171007115640.w3m6vxxrglcbeutl@esperanza>
 <20171007170651.GR21978@ZenIV.linux.org.uk>
 <20171007211444.GS21978@ZenIV.linux.org.uk>
 <20171008005602.GT21978@ZenIV.linux.org.uk>
 <20171008020327.GU21978@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20171008020327.GU21978@ZenIV.linux.org.uk>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:

On Sun, Oct 08, 2017 at 03:03:32AM +0100, Al Viro wrote:
> On Sun, Oct 08, 2017 at 01:56:08AM +0100, Al Viro wrote:
>
> > What's more, we need to be careful about resize vs. drain. Right now it's
> > on list_lrus_mutex, but if we drop that around actual resize of an
> > individual list_lru, we'll need something else. Would there be any
> > problem if we took memcg_cache_ids_sem shared in memcg_offline_kmem()?
> >
> > The first problem is not fatal - we can e.g. use the sign of the field
> > used to store the number of ->memcg_lrus elements (i.e. stashed value of
> > memcg_nr_cache_ids at allocation or last resize) to indicate that actual
> > freeing is left for resizer...
>
> Ugh. That spinlock would have to be held over too much work, or bounced
> back and forth a lot on memcg shutdowns ;-/ Gets especially nasty if we
> want list_lru_destroy() callable from rcu callbacks. Oh, well...
>
> I still suspect that locking there is too heavy, but it looks like I
> don't have a better replacement.
>
> What are the realistic numbers of memcg on a big system?

Several thousand.
I guess we could turn list_lrus_mutex into a spin lock by making
resize/drain procedures handle list_lru destruction as you suggested
above, but list_lru_destroy() would still have to iterate over all
elements of the list_lru_node->memcg_lrus array to free per-memcg
objects, which is too heavy to be performed under sb_lock IMHO.

Thanks,
Vladimir