linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Roman Gushchin <guro@fb.com>
Cc: Jan Kara <jack@suse.cz>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>, "tj@kernel.org" <tj@kernel.org>,
	Dennis Zhou <dennis@kernel.org>
Subject: Re: [PATCH v2] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups
Date: Wed, 16 Oct 2019 11:18:40 +0200	[thread overview]
Message-ID: <20191016091840.GC30337@quack2.suse.cz> (raw)
In-Reply-To: <20191015214041.GA24736@tower.DHCP.thefacebook.com>

On Tue 15-10-19 21:40:45, Roman Gushchin wrote:
> On Tue, Oct 15, 2019 at 11:09:33AM +0200, Jan Kara wrote:
> > On Thu 10-10-19 16:40:36, Roman Gushchin wrote:
> > 
> > > @@ -426,7 +431,7 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
> > >  	if (!list_empty(&inode->i_io_list)) {
> > >  		struct inode *pos;
> > >  
> > > -		inode_io_list_del_locked(inode, old_wb);
> > > +		inode_io_list_del_locked(inode, old_wb, false);
> > >  		inode->i_wb = new_wb;
> > >  		list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
> > >  			if (time_after_eq(inode->dirtied_when,
> > 
> > This bit looks wrong. Not the change you made as such but the fact that you
> > can now move inode from b_attached list of old wb to the dirty list of new
> > wb.
> 
> Hm, can you, please, elaborate a bit more why it's wrong?
> The reference to the old_wb will be dropped by the switching code.

My point is that the code in full looks like:

        if (!list_empty(&inode->i_io_list)) {
                struct inode *pos;

                inode_io_list_del_locked(inode, old_wb);
                inode->i_wb = new_wb;
                list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
                        if (time_after_eq(inode->dirtied_when,
                                          pos->dirtied_when))
                                break;
                inode_io_list_move_locked(inode, new_wb, pos->i_io_list.prev);
        } else {

So inode is always moved from some io list in old_wb to b_dirty list of
new_wb. This is fine when it could be only on b_dirty, b_io, b_more_io lists
of old_wb. But once you add b_attached list to the game, it is not correct
anymore. You should not add clean inode to b_dirty list of new_wb.

> > > +
> > > +	list_for_each_entry_safe(inode, tmp, &wb->b_attached, i_io_list) {
> > > +		if (!spin_trylock(&inode->i_lock))
> > > +			continue;
> > > +		xa_lock_irq(&inode->i_mapping->i_pages);
> > > +		if (!(inode->i_state &
> > > +		      (I_FREEING | I_CLEAR | I_SYNC | I_DIRTY | I_WB_SWITCH))) {
> > > +			WARN_ON_ONCE(inode->i_wb != wb);
> > > +			inode->i_wb = NULL;
> > > +			wb_put(wb);
> > 
> > Hum, currently the code assumes that once i_wb is set, it never becomes
> > NULL again. In particular the inode e.g. in
> > fs/fs-writeback.c:inode_congested() or generally unlocked_inode_to_wb_begin()
> > users could get broken by this. The i_wb switching code is so complex
> > exactly because of these interactions.
> > 
> > Maybe you thought through the interactions and things are actually fine but
> > if nothing else you'd need a big fat comment here explaining why this is
> > fine and update inode_congested() comments etc.
> 
> Yeah, I thought that once inode is clean and not switching it's safe to clear
> the i_wb pointer, but seems that it's not completely true.
>
> One idea I have is to always release wbs using rcu delayed work, so that
> it will be save to dereference i_wb pointer under rcu, if only it's not NULL
> (the check has to be added). I'll try to implement this scheme, but if you
> know in advance that it's not gonna work, please, let me know.

I think I'd just drop inode_to_wb_is_valid() because once i_wb can change
to NULL, that function is just pointless in that single callsite. Also we
have to count with the fact that unlocked_inode_to_wb_begin() can return
NULL and gracefully do as much as possible in that case for all the
callers. And I agree that those occurences in mm/page-writeback.c should be
blocked by inode being clean and you holding all those locks so you can
warn if that happens I guess.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2019-10-16  9:19 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-10 23:40 [PATCH v2] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups Roman Gushchin
2019-10-15  9:09 ` Jan Kara
2019-10-15 21:40   ` Roman Gushchin
2019-10-16  9:18     ` Jan Kara [this message]
2019-10-21 23:49       ` Roman Gushchin
2019-10-22  8:15         ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191016091840.GC30337@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=Kernel-team@fb.com \
    --cc=dennis@kernel.org \
    --cc=guro@fb.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).