From mboxrd@z Thu Jan 1 00:00:00 1970
From: Roman Gushchin
Subject: Re: [PATCH v9 3/8] writeback, cgroup: increment isw_nr_in_flight before grabbing an inode
Date: Wed, 9 Jun 2021 17:21:14 -0700
References: <20210608230225.2078447-1-guro@fb.com>
 <20210608230225.2078447-4-guro@fb.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
To: Ming Lei
Cc: Andrew Morton, Tejun Heo,
 linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
 Alexander Viro, Jan Kara, Dennis Zhou, Dave Chinner,
 cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Jan Kara

On Wed, Jun 09, 2021 at 11:32:44AM +0800, Ming Lei wrote:
> On Tue, Jun 08, 2021 at 04:02:20PM -0700, Roman Gushchin wrote:
> > isw_nr_in_flight is used to determine whether the inode switch queue
> > should be flushed from the umount path. Currently it's increased
> > after grabbing an inode and even scheduling the switch work. It means
> > the umount path can be walked past cleanup_offline_cgwb() with active
> > inode references, which can result in a "Busy inodes after unmount."
> > message and use-after-free issues (with inode->i_sb which gets freed).
> >
> > Fix it by incrementing isw_nr_in_flight before doing anything with
> > the inode and decrementing in the case when switching wasn't scheduled.
> >
> > The problem hasn't yet been seen in real life and was discovered
> > by Jan Kara by looking into the code.
> >
> > Suggested-by: Jan Kara
> > Signed-off-by: Roman Gushchin
> > Reviewed-by: Jan Kara
> > ---
> >  fs/fs-writeback.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index b6fc13a4962d..4413e005c28c 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -505,6 +505,8 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
> >  	if (!isw)
> >  		return;
> >
> > +	atomic_inc(&isw_nr_in_flight);
>
> smp_mb() may be required to order the WRITE in 'atomic_inc(&isw_nr_in_flight)'
> against the following READ of 'inode->i_sb->s_flags & SB_ACTIVE'. Otherwise,
> cgroup_writeback_umount() may observe 'isw_nr_in_flight' as zero because the
> two operations were reordered, and then miss the flush_workqueue().
>
> Also, this barrier should pair with the one added in cgroup_writeback_umount(),
> so maybe this patch should be merged with 2/8.

Hi Ming!

Good point, I agree. How about the patch below?

Thanks!

--

>From 282861286074c47907759d80c01419f0d0630dae Mon Sep 17 00:00:00 2001
From: Roman Gushchin
Date: Wed, 9 Jun 2021 14:14:26 -0700
Subject: [PATCH] cgroup, writeback: add smp_mb() to inode_prepare_wbs_switch()

Add a memory barrier between incrementing isw_nr_in_flight and checking
the sb's SB_ACTIVE flag and grabbing an inode in
inode_prepare_wbs_switch(). It's required to prevent grabbing an inode
before incrementing isw_nr_in_flight, otherwise 0 can be obtained as
isw_nr_in_flight in cgroup_writeback_umount() and isw_wq will not be
flushed, potentially leading to memory corruption.

The added smp_mb() pairs with the smp_mb() in cgroup_writeback_umount().

Suggested-by: Ming Lei
Signed-off-by: Roman Gushchin
---
 fs/fs-writeback.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 545fce68e919..6332b86ca4ed 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -513,6 +513,14 @@ static void inode_switch_wbs_work_fn(struct work_struct *work)
 static bool inode_prepare_wbs_switch(struct inode *inode,
 				     struct bdi_writeback *new_wb)
 {
+	/*
+	 * Paired with smp_mb() in cgroup_writeback_umount().
+	 * isw_nr_in_flight must be increased before checking SB_ACTIVE and
+	 * grabbing an inode, otherwise isw_nr_in_flight can be observed as 0
+	 * in cgroup_writeback_umount() and the isw_wq will not be flushed.
+	 */
+	smp_mb();
+
 	/* while holding I_WB_SWITCH, no one else can update the association */
 	spin_lock(&inode->i_lock);
 	if (!(inode->i_sb->s_flags & SB_ACTIVE) ||
-- 
2.31.1
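
For readers following the barrier pairing, here is a rough userspace sketch of the pattern discussed above, not the kernel code itself. It assumes C11 atomics, with atomic_thread_fence(memory_order_seq_cst) standing in for smp_mb(). All names (nr_in_flight, sb_active, switcher_prepare, umount_needs_flush, demo_switch_then_umount) are hypothetical, and clearing the flag inside the umount-side helper is a simplification made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel's variables. */
static atomic_int nr_in_flight;        /* models isw_nr_in_flight */
static atomic_bool sb_active = true;   /* models SB_ACTIVE in sb->s_flags */

/* Switcher side: publish the in-flight count, then look at the flag. */
static bool switcher_prepare(void)
{
	atomic_fetch_add_explicit(&nr_in_flight, 1, memory_order_relaxed);
	/* Stand-in for smp_mb(): keep the increment before the flag read. */
	atomic_thread_fence(memory_order_seq_cst);
	if (!atomic_load_explicit(&sb_active, memory_order_relaxed)) {
		/* Switch not scheduled: undo the increment. */
		atomic_fetch_sub_explicit(&nr_in_flight, 1, memory_order_relaxed);
		return false;
	}
	return true;
}

/* Umount side (simplified): clear the flag, then look at the count. */
static bool umount_needs_flush(void)
{
	atomic_store_explicit(&sb_active, false, memory_order_relaxed);
	/* Stand-in for the paired smp_mb(): keep the store before the read. */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&nr_in_flight, memory_order_relaxed) != 0;
}

/* Sequential walk through one interleaving; a real race needs two threads. */
static void demo_switch_then_umount(void)
{
	assert(switcher_prepare());    /* count raised while the flag is set */
	assert(umount_needs_flush());  /* umount sees the in-flight switch */
	assert(!switcher_prepare());   /* a late switcher backs off */
}
```

With both fences in place, two concurrent callers cannot both miss each other: either the switcher sees the cleared flag and backs off, or the umount side sees a non-zero count and flushes. Dropping either fence allows the outcome where both reads return the old value, which is the case the patch closes.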