From: Roman Gushchin <guro@fb.com>
To: Jan Kara <jack@suse.cz>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Kernel Team <Kernel-team@fb.com>, "tj@kernel.org" <tj@kernel.org>,
Dennis Zhou <dennis@kernel.org>
Subject: Re: [PATCH v2] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups
Date: Mon, 21 Oct 2019 23:49:04 +0000 [thread overview]
Message-ID: <20191021234858.GA16251@castle> (raw)
In-Reply-To: <20191016091840.GC30337@quack2.suse.cz>
On Wed, Oct 16, 2019 at 11:18:40AM +0200, Jan Kara wrote:
> On Tue 15-10-19 21:40:45, Roman Gushchin wrote:
> > On Tue, Oct 15, 2019 at 11:09:33AM +0200, Jan Kara wrote:
> > > On Thu 10-10-19 16:40:36, Roman Gushchin wrote:
> > >
> > > > @@ -426,7 +431,7 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
> > > > if (!list_empty(&inode->i_io_list)) {
> > > > struct inode *pos;
> > > >
> > > > - inode_io_list_del_locked(inode, old_wb);
> > > > + inode_io_list_del_locked(inode, old_wb, false);
> > > > inode->i_wb = new_wb;
> > > > list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
> > > > if (time_after_eq(inode->dirtied_when,
> > >
> > > This bit looks wrong. Not the change you made as such but the fact that you
> > > can now move inode from b_attached list of old wb to the dirty list of new
> > > wb.
> >
> > Hm, can you, please, elaborate a bit more why it's wrong?
> > The reference to the old_wb will be dropped by the switching code.
>
> My point is that the code in full looks like:
>
> if (!list_empty(&inode->i_io_list)) {
> struct inode *pos;
>
> inode_io_list_del_locked(inode, old_wb);
> inode->i_wb = new_wb;
> list_for_each_entry(pos, &new_wb->b_dirty, i_io_list)
> if (time_after_eq(inode->dirtied_when,
> pos->dirtied_when))
> break;
> inode_io_list_move_locked(inode, new_wb, pos->i_io_list.prev);
> } else {
>
> So inode is always moved from some io list in old_wb to b_dirty list of
> new_wb. This is fine when it could be only on b_dirty, b_io, b_more_io lists
> of old_wb. But once you add b_attached list to the game, it is not correct
> anymore. You should not add clean inode to b_dirty list of new_wb.
I see...
Hm, will checking of i_state for not containing I_DIRTY_ALL bits be enough here?
Alternatively, I can introduce a new bit which will explicitly point at the
inode being on the b_attached list, but I'd prefer not to do it.
>
> > > > +
> > > > + list_for_each_entry_safe(inode, tmp, &wb->b_attached, i_io_list) {
> > > > + if (!spin_trylock(&inode->i_lock))
> > > > + continue;
> > > > + xa_lock_irq(&inode->i_mapping->i_pages);
> > > > + if (!(inode->i_state &
> > > > + (I_FREEING | I_CLEAR | I_SYNC | I_DIRTY | I_WB_SWITCH))) {
> > > > + WARN_ON_ONCE(inode->i_wb != wb);
> > > > + inode->i_wb = NULL;
> > > > + wb_put(wb);
> > >
> > > Hum, currently the code assumes that once i_wb is set, it never becomes
> > > NULL again. In particular the inode e.g. in
> > > fs/fs-writeback.c:inode_congested() or generally unlocked_inode_to_wb_begin()
> > > users could get broken by this. The i_wb switching code is so complex
> > > exactly because of these interactions.
> > >
> > > Maybe you thought through the interactions and things are actually fine but
> > > if nothing else you'd need a big fat comment here explaining why this is
> > > fine and update inode_congested() comments etc.
> >
> > Yeah, I thought that once inode is clean and not switching it's safe to clear
> > the i_wb pointer, but seems that it's not completely true.
> >
> > One idea I have is to always release wbs using rcu delayed work, so that
> > it will be save to dereference i_wb pointer under rcu, if only it's not NULL
> > (the check has to be added). I'll try to implement this scheme, but if you
> > know in advance that it's not gonna work, please, let me know.
>
> I think I'd just drop inode_to_wb_is_valid() because once i_wb can change
> to NULL, that function is just pointless in that single callsite. Also we
> have to count with the fact that unlocked_inode_to_wb_begin() can return
> NULL and gracefully do as much as possible in that case for all the
> callers. And I agree that those occurences in mm/page-writeback.c should be
> blocked by inode being clean and you holding all those locks so you can
> warn if that happens I guess.
Yeah, it sounds good to me. I actually have a patch, which I'll post after
some more extensive testing (want to make sure I do not hit these warns).
Thank you!
next prev parent reply other threads:[~2019-10-21 23:49 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-10 23:40 [PATCH v2] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups Roman Gushchin
2019-10-15 9:09 ` Jan Kara
2019-10-15 21:40 ` Roman Gushchin
2019-10-16 9:18 ` Jan Kara
2019-10-21 23:49 ` Roman Gushchin [this message]
2019-10-22 8:15 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191021234858.GA16251@castle \
--to=guro@fb.com \
--cc=Kernel-team@fb.com \
--cc=dennis@kernel.org \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.