All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Roman Gushchin <guro@fb.com>
Cc: Jan Kara <jack@suse.cz>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>, "tj@kernel.org" <tj@kernel.org>,
	Dennis Zhou <dennis@kernel.org>
Subject: Re: [PATCH v2] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups
Date: Wed, 16 Oct 2019 11:18:40 +0200	[thread overview]
Message-ID: <20191016091840.GC30337@quack2.suse.cz> (raw)
In-Reply-To: <20191015214041.GA24736@tower.DHCP.thefacebook.com>

On Tue 15-10-19 21:40:45, Roman Gushchin wrote:
> On Tue, Oct 15, 2019 at 11:09:33AM +0200, Jan Kara wrote:
> > On Thu 10-10-19 16:40:36, Roman Gushchin wrote:
> > 
> > > @@ -426,7 +431,7 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
> > >  	if (!list_empty(&inode->i_io_list)) {
> > >  		struct inode *pos;
> > >  
> > > -		inode_io_list_del_locked(inode, old_wb);
> > > +		inode_io_list_del_locked(inode, old_wb, false);
> > >  		inode->i_wb = new_wb;
> > >  		list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
> > >  			if (time_after_eq(inode->dirtied_when,
> > 
> > This bit looks wrong. Not the change you made as such but the fact that you
> > can now move inode from b_attached list of old wb to the dirty list of new
> > wb.
> 
> Hm, can you, please, elaborate a bit more why it's wrong?
> The reference to the old_wb will be dropped by the switching code.

My point is that the code in full looks like:

        if (!list_empty(&inode->i_io_list)) {
                struct inode *pos;

                inode_io_list_del_locked(inode, old_wb);
                inode->i_wb = new_wb;
                list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
                        if (time_after_eq(inode->dirtied_when,
                                          pos->dirtied_when))
                                break;
                inode_io_list_move_locked(inode, new_wb, pos->i_io_list.prev);
        } else {

So inode is always moved from some io list in old_wb to b_dirty list of
new_wb. This is fine when it could be only on b_dirty, b_io, b_more_io lists
of old_wb. But once you add b_attached list to the game, it is not correct
anymore. You should not add clean inode to b_dirty list of new_wb.

> > > +
> > > +	list_for_each_entry_safe(inode, tmp, &wb->b_attached, i_io_list) {
> > > +		if (!spin_trylock(&inode->i_lock))
> > > +			continue;
> > > +		xa_lock_irq(&inode->i_mapping->i_pages);
> > > +		if (!(inode->i_state &
> > > +		      (I_FREEING | I_CLEAR | I_SYNC | I_DIRTY | I_WB_SWITCH))) {
> > > +			WARN_ON_ONCE(inode->i_wb != wb);
> > > +			inode->i_wb = NULL;
> > > +			wb_put(wb);
> > 
> > Hum, currently the code assumes that once i_wb is set, it never becomes
> > NULL again. In particular the inode e.g. in
> > fs/fs-writeback.c:inode_congested() or generally unlocked_inode_to_wb_begin()
> > users could get broken by this. The i_wb switching code is so complex
> > exactly because of these interactions.
> > 
> > Maybe you thought through the interactions and things are actually fine but
> > if nothing else you'd need a big fat comment here explaining why this is
> > fine and update inode_congested() comments etc.
> 
> Yeah, I thought that once inode is clean and not switching it's safe to clear
> the i_wb pointer, but seems that it's not completely true.
>
> One idea I have is to always release wbs using rcu delayed work, so that
> it will be save to dereference i_wb pointer under rcu, if only it's not NULL
> (the check has to be added). I'll try to implement this scheme, but if you
> know in advance that it's not gonna work, please, let me know.

I think I'd just drop inode_to_wb_is_valid() because once i_wb can change
to NULL, that function is just pointless in that single callsite. Also we
have to count with the fact that unlocked_inode_to_wb_begin() can return
NULL and gracefully do as much as possible in that case for all the
callers. And I agree that those occurences in mm/page-writeback.c should be
blocked by inode being clean and you holding all those locks so you can
warn if that happens I guess.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2019-10-16  9:19 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-10 23:40 [PATCH v2] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups Roman Gushchin
2019-10-15  9:09 ` Jan Kara
2019-10-15 21:40   ` Roman Gushchin
2019-10-16  9:18     ` Jan Kara [this message]
2019-10-21 23:49       ` Roman Gushchin
2019-10-22  8:15         ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191016091840.GC30337@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=Kernel-team@fb.com \
    --cc=dennis@kernel.org \
    --cc=guro@fb.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.