Date: Wed, 16 Oct 2019 11:18:40 +0200
From: Jan Kara
To: Roman Gushchin
Cc: Jan Kara, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Kernel Team, tj@kernel.org, Dennis Zhou
Subject: Re: [PATCH v2] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups
Message-ID: <20191016091840.GC30337@quack2.suse.cz>
References: <20191010234036.2860655-1-guro@fb.com> <20191015090933.GA21104@quack2.suse.cz> <20191015214041.GA24736@tower.DHCP.thefacebook.com>
In-Reply-To: <20191015214041.GA24736@tower.DHCP.thefacebook.com>

On Tue 15-10-19 21:40:45, Roman Gushchin wrote:
> On Tue, Oct 15, 2019 at 11:09:33AM +0200, Jan Kara wrote:
> > On Thu 10-10-19 16:40:36, Roman Gushchin wrote:
> >
> > > @@ -426,7 +431,7 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
> > >  	if (!list_empty(&inode->i_io_list)) {
> > >  		struct inode *pos;
> > >
> > > -		inode_io_list_del_locked(inode, old_wb);
> > > +		inode_io_list_del_locked(inode, old_wb, false);
> > >  		inode->i_wb = new_wb;
> > >  		list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
> > >  			if (time_after_eq(inode->dirtied_when,
> >
> > This bit looks wrong. Not the change you made as such but the fact that you
> > can now move inode from b_attached list of old wb to the dirty list of new
> > wb.
>
> Hm, can you, please, elaborate a bit more why it's wrong?
> The reference to the old_wb will be dropped by the switching code.

My point is that the code in full looks like:

	if (!list_empty(&inode->i_io_list)) {
		struct inode *pos;

		inode_io_list_del_locked(inode, old_wb);
		inode->i_wb = new_wb;
		list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
			if (time_after_eq(inode->dirtied_when,
					  pos->dirtied_when))
				break;
		inode_io_list_move_locked(inode, new_wb, pos->i_io_list.prev);
	} else {

So the inode is always moved from whichever io list it is on in old_wb to
the b_dirty list of new_wb. That is fine as long as the inode can only be on
the b_dirty, b_io, or b_more_io lists of old_wb. But once you add the
b_attached list to the game, it is no longer correct: you should not add a
clean inode to the b_dirty list of new_wb.

> > > +
> > > +	list_for_each_entry_safe(inode, tmp, &wb->b_attached, i_io_list) {
> > > +		if (!spin_trylock(&inode->i_lock))
> > > +			continue;
> > > +		xa_lock_irq(&inode->i_mapping->i_pages);
> > > +		if (!(inode->i_state &
> > > +		      (I_FREEING | I_CLEAR | I_SYNC | I_DIRTY | I_WB_SWITCH))) {
> > > +			WARN_ON_ONCE(inode->i_wb != wb);
> > > +			inode->i_wb = NULL;
> > > +			wb_put(wb);
> >
> > Hum, currently the code assumes that once i_wb is set, it never becomes
> > NULL again. In particular the inode e.g. in
> > fs/fs-writeback.c:inode_congested() or generally unlocked_inode_to_wb_begin()
> > users could get broken by this. The i_wb switching code is so complex
> > exactly because of these interactions.
> >
> > Maybe you thought through the interactions and things are actually fine but
> > if nothing else you'd need a big fat comment here explaining why this is
> > fine and update inode_congested() comments etc.
>
> Yeah, I thought that once inode is clean and not switching it's safe to clear
> the i_wb pointer, but seems that it's not completely true.
>
> One idea I have is to always release wbs using rcu delayed work, so that
> it will be safe to dereference i_wb pointer under rcu, if only it's not NULL
> (the check has to be added).
> I'll try to implement this scheme, but if you
> know in advance that it's not gonna work, please, let me know.

I think I'd just drop inode_to_wb_is_valid(), because once i_wb can change
to NULL, that function is pointless at its single callsite. We also have to
account for the fact that unlocked_inode_to_wb_begin() can return NULL, and
all of its callers have to gracefully do as much as possible in that case.
And I agree that the occurrences in mm/page-writeback.c should be ruled out
by the inode being clean and by you holding all those locks, so I guess you
can warn if that happens.

								Honza
-- 
Jan Kara
SUSE Labs, CR
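
A rough userspace model may help make the two review points concrete: when an inode is switched to a new wb, only a dirty inode belongs on new_wb's b_dirty list (kept newest-first, like the time_after_eq() loop quoted above), while a clean inode coming from b_attached should stay off b_dirty; and once i_wb may be NULL, callers need a fallback. This is a sketch under heavy assumptions: `struct model_wb`, `struct model_inode`, `MODEL_I_DIRTY`, `switch_inode_wb()` and `inode_wb_or_default()` are hypothetical simplifications, not the kernel's `struct bdi_writeback`, `struct inode` or list API, and all locking, RCU and wb reference counting are omitted. The fallback-to-a-default-wb policy is likewise an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Unlink @n and leave it self-linked (harmless on an unlinked node). */
static void list_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Insert @n immediately after @pos. */
static void list_add_after(struct list_node *n, struct list_node *pos)
{
	n->next = pos->next;
	n->prev = pos;
	pos->next->prev = n;
	pos->next = n;
}

#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins, NOT the kernel's bdi_writeback / inode. */
struct model_wb {
	struct list_node b_dirty;    /* dirty inodes, newest dirtied_when first */
	struct list_node b_attached; /* clean inodes still pointing at this wb */
};

#define MODEL_I_DIRTY 0x1u /* hypothetical stand-in for the I_DIRTY bits */

struct model_inode {
	unsigned int state;
	unsigned long dirtied_when; /* plain counter: no jiffies wraparound */
	struct model_wb *wb;        /* may be NULL once detached from its wb */
	struct list_node io_list;
};

static void model_wb_init(struct model_wb *wb)
{
	list_init(&wb->b_dirty);
	list_init(&wb->b_attached);
}

/*
 * Move @inode (assumed to sit on one of its old wb's lists) to @new_wb.
 * Only a dirty inode goes to b_dirty, sorted newest first; a clean one
 * goes to new_wb->b_attached instead of b_dirty.
 */
static void switch_inode_wb(struct model_inode *inode, struct model_wb *new_wb)
{
	struct list_node *pos;

	assert(inode != NULL && new_wb != NULL);
	list_del_init(&inode->io_list);
	inode->wb = new_wb;

	if (!(inode->state & MODEL_I_DIRTY)) {
		list_add_after(&inode->io_list, &new_wb->b_attached);
		return;
	}

	/* Stop at the first entry that is not newer than @inode ... */
	for (pos = new_wb->b_dirty.next; pos != &new_wb->b_dirty; pos = pos->next) {
		struct model_inode *p =
			model_container_of(pos, struct model_inode, io_list);
		if (inode->dirtied_when >= p->dirtied_when)
			break;
	}
	/* ... and insert in front of it (at the tail if none was found). */
	list_add_after(&inode->io_list, pos->prev);
}

/* A caller that tolerates a NULL wb by using a caller-supplied default. */
static struct model_wb *inode_wb_or_default(const struct model_inode *inode,
					    struct model_wb *fallback)
{
	return inode->wb ? inode->wb : fallback;
}
```

In this model a clean inode switched away from the old wb ends up on new_wb's b_attached list, and dirty inodes keep their newest-first order on b_dirty. The real code would additionally need i_lock and the wb list locks, time_after_eq() to cope with jiffies wraparound, and a safe way to dereference i_wb (such as the RCU-delayed wb release proposed above).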