linux-fsdevel.vger.kernel.org archive mirror
* [PATCH] fsnotify: Fix oops in fsnotify_clear_marks_by_group_flags()
@ 2015-06-26 14:50 Jan Kara
  2015-06-27 18:54 ` Lino Sanfilippo
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2015-06-26 14:50 UTC (permalink / raw)
  To: Ashish Sangwan; +Cc: Lino Sanfilippo, linux-fsdevel, Jan Kara

From: Jan Kara <jack@suse.cz>

fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks(), so when fsnotify_destroy_mark_locked() drops
mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and we end up
dereferencing freed memory in the loop there.

Fix the problem by keeping mark_mutex held in
fsnotify_destroy_mark_locked(). The reason we drop that mutex is
that we need to call the ->freeing_mark() callback, which may acquire
mark_mutex again. To avoid this and similar lock inversion issues, we
move the call to the ->freeing_mark() callback to the kthread destroying
the mark.

Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/notify/mark.c | 38 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 22 deletions(-)

Ashish, can you verify whether this patch fixes the problem you have hit?
Thanks!

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 92e48c70f0f0..11e433c2bf1c 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -152,31 +152,15 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
 		BUG();
 
 	list_del_init(&mark->g_list);
-
 	spin_unlock(&mark->lock);
 
-	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
-		iput(inode);
-	/* release lock temporarily */
-	mutex_unlock(&group->mark_mutex);
-
 	spin_lock(&destroy_lock);
 	list_add(&mark->g_list, &destroy_list);
 	spin_unlock(&destroy_lock);
 	wake_up(&destroy_waitq);
-	/*
-	 * We don't necessarily have a ref on mark from caller so the above destroy
-	 * may have actually freed it, unless this group provides a 'freeing_mark'
-	 * function which must be holding a reference.
-	 */
 
-	/*
-	 * Some groups like to know that marks are being freed.  This is a
-	 * callback to the group function to let it know that this mark
-	 * is being freed.
-	 */
-	if (group->ops->freeing_mark)
-		group->ops->freeing_mark(mark, group);
+	if (inode && (mark->flags & FSNOTIFY_MARK_FLAG_OBJECT_PINNED))
+		iput(inode);
 
 	/*
 	 * __fsnotify_update_child_dentry_flags(inode);
@@ -191,8 +175,6 @@ void fsnotify_destroy_mark_locked(struct fsnotify_mark *mark,
 	 */
 
 	atomic_dec(&group->num_marks);
-
-	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
 }
 
 void fsnotify_destroy_mark(struct fsnotify_mark *mark,
@@ -205,7 +187,10 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 
 /*
  * Destroy all marks in the given list. The marks must be already detached from
- * the original inode / vfsmount.
+ * the original inode / vfsmount. Note that we can race with
+ * fsnotify_clear_marks_by_group_flags(). However we hold a reference to each
+ * mark so they won't get freed from under us and nobody else touches our
+ * free_list list_head.
  */
 void fsnotify_destroy_marks(struct list_head *to_free)
 {
@@ -406,7 +391,7 @@ struct fsnotify_mark *fsnotify_find_mark(struct hlist_head *head,
 }
 
 /*
- * clear any marks in a group in which mark->flags & flags is true
+ * Clear any marks in a group in which mark->flags & flags is true.
  */
 void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
 					 unsigned int flags)
@@ -460,6 +445,7 @@ static int fsnotify_mark_destroy(void *ignored)
 {
 	struct fsnotify_mark *mark, *next;
 	struct list_head private_destroy_list;
+	struct fsnotify_group *group;
 
 	for (;;) {
 		spin_lock(&destroy_lock);
@@ -471,6 +457,14 @@ static int fsnotify_mark_destroy(void *ignored)
 
 		list_for_each_entry_safe(mark, next, &private_destroy_list, g_list) {
 			list_del_init(&mark->g_list);
+			group = mark->group;
+			/*
+			 * Some groups like to know that marks are being freed.
+			 * This is a callback to the group function to let it
+			 * know that this mark is being freed.
+			 */
+			if (group && group->ops->freeing_mark)
+				group->ops->freeing_mark(mark, group);
 			fsnotify_put_mark(mark);
 		}
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] fsnotify: Fix oops in fsnotify_clear_marks_by_group_flags()
  2015-06-26 14:50 Jan Kara
@ 2015-06-27 18:54 ` Lino Sanfilippo
  0 siblings, 0 replies; 5+ messages in thread
From: Lino Sanfilippo @ 2015-06-27 18:54 UTC (permalink / raw)
  To: Jan Kara, Ashish Sangwan; +Cc: linux-fsdevel, Jan Kara

Hi,

On 26.06.2015 16:50, Jan Kara wrote:
> From: Jan Kara <jack@suse.cz>
> 
> fsnotify_clear_marks_by_group_flags() can race with
> fsnotify_destroy_marks(), so when fsnotify_destroy_mark_locked() drops
> mark_mutex, a mark from the list iterated by
> fsnotify_clear_marks_by_group_flags() can be freed and we end up
> dereferencing freed memory in the loop there.
> 
> Fix the problem by keeping mark_mutex held in
> fsnotify_destroy_mark_locked(). The reason we drop that mutex is
> that we need to call the ->freeing_mark() callback, which may acquire
> mark_mutex again. To avoid this and similar lock inversion issues, we
> move the call to the ->freeing_mark() callback to the kthread destroying
> the mark.
> 
> [...]

Looks good, thanks for fixing this, Jan!

Regards,
Lino



* [PATCH] fsnotify: Fix oops in fsnotify_clear_marks_by_group_flags()
@ 2015-07-23 13:54 Jan Kara
  2015-07-24  5:52 ` Ashish Sangwan
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2015-07-23 13:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-fsdevel, Ashish Sangwan, Lino Sanfilippo, Jan Kara, stable

fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks(), so when fsnotify_destroy_mark_locked() drops
mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed, and thus the next
entry pointer we have cached may become stale and we end up
dereferencing freed memory.

Fix the problem by first moving the marks to be freed to a special
private list and then always freeing the first entry in that list. This
method is safe even when entries can disappear from the list once we
drop the lock.

CC: stable@vger.kernel.org
Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Jan Kara <jack@suse.com>
---
 fs/notify/mark.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

Andrew, this is the new version of the fsnotify oops fix. It has survived
the LTP tests and also a reproducer I wrote for triggering the oops. I'll
work on integrating the reproducer into the LTP inotify tests.

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 92e48c70f0f0..39ddcaf0918f 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -412,16 +412,36 @@ void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
 					 unsigned int flags)
 {
 	struct fsnotify_mark *lmark, *mark;
+	LIST_HEAD(to_free);
 
+	/*
+	 * We have to be really careful here. Anytime we drop mark_mutex, e.g.
+	 * fsnotify_clear_marks_by_inode() can come and free marks. Even in our
+	 * to_free list so we have to use mark_mutex even when accessing that
+	 * list. And freeing mark requires us to drop mark_mutex. So we can
+	 * reliably free only the first mark in the list. That's why we first
+	 * move marks to free to to_free list in one go and then free marks in
+	 * to_free list one by one.
+	 */
 	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
 	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
-		if (mark->flags & flags) {
-			fsnotify_get_mark(mark);
-			fsnotify_destroy_mark_locked(mark, group);
-			fsnotify_put_mark(mark);
-		}
+		if (mark->flags & flags)
+			list_move(&mark->g_list, &to_free);
 	}
 	mutex_unlock(&group->mark_mutex);
+
+	while (1) {
+		mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
+		if (list_empty(&to_free)) {
+			mutex_unlock(&group->mark_mutex);
+			break;
+		}
+		mark = list_first_entry(&to_free, struct fsnotify_mark, g_list);
+		fsnotify_get_mark(mark);
+		fsnotify_destroy_mark_locked(mark, group);
+		mutex_unlock(&group->mark_mutex);
+		fsnotify_put_mark(mark);
+	}
 }
 
 /*
-- 
2.1.4



* Re: [PATCH] fsnotify: Fix oops in fsnotify_clear_marks_by_group_flags()
  2015-07-23 13:54 [PATCH] fsnotify: Fix oops in fsnotify_clear_marks_by_group_flags() Jan Kara
@ 2015-07-24  5:52 ` Ashish Sangwan
  2015-07-27  9:43   ` Jan Kara
  0 siblings, 1 reply; 5+ messages in thread
From: Ashish Sangwan @ 2015-07-24  5:52 UTC (permalink / raw)
  To: Jan Kara
  Cc: Andrew Morton, linux-fsdevel, Ashish Sangwan, Lino Sanfilippo,
	stable

On Thu, Jul 23, 2015 at 7:24 PM, Jan Kara <jack@suse.com> wrote:
> fsnotify_clear_marks_by_group_flags() can race with
> fsnotify_destroy_marks(), so when fsnotify_destroy_mark_locked() drops
> mark_mutex, a mark from the list iterated by
> fsnotify_clear_marks_by_group_flags() can be freed, and thus the next
> entry pointer we have cached may become stale and we end up
> dereferencing freed memory.
>
> Fix the problem by first moving the marks to be freed to a special
> private list and then always freeing the first entry in that list. This
> method is safe even when entries can disappear from the list once we
> drop the lock.
>
> CC: stable@vger.kernel.org
> Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
> Signed-off-by: Jan Kara <jack@suse.com>
> ---
>  fs/notify/mark.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
>
> Andrew, this is the new version of the fsnotify oops fix. It has survived
> LTP tests and also a reproducer I wrote for triggering the oops. I'll work
> on integrating the reproducer in LTP inotify tests.
>
> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index 92e48c70f0f0..39ddcaf0918f 100644
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -412,16 +412,36 @@ void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group,
>                                          unsigned int flags)
>  {
>         struct fsnotify_mark *lmark, *mark;
> +       LIST_HEAD(to_free);
>
> +       /*
> +        * We have to be really careful here. Anytime we drop mark_mutex, e.g.
> +        * fsnotify_clear_marks_by_inode() can come and free marks. Even in our
> +        * to_free list so we have to use mark_mutex even when accessing that
> +        * list. And freeing mark requires us to drop mark_mutex. So we can
> +        * reliably free only the first mark in the list. That's why we first
> +        * move marks to free to to_free list in one go and then free marks in
> +        * to_free list one by one.
> +        */
>         mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
>         list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
> -               if (mark->flags & flags) {
> -                       fsnotify_get_mark(mark);
> -                       fsnotify_destroy_mark_locked(mark, group);
> -                       fsnotify_put_mark(mark);
> -               }
> +               if (mark->flags & flags)
> +                       list_move(&mark->g_list, &to_free);
>         }
>         mutex_unlock(&group->mark_mutex);
> +
> +       while (1) {
> +               mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
Just a nitpick: instead of locking/unlocking the mutex multiple times in
the while loop, can't we just keep the entire while loop inside the
mutex_lock?
Overall, the patch seems OK to me.
Reviewed-by: Ashish Sangwan <a.sangwan@samsung.com>

> +               if (list_empty(&to_free)) {
> +                       mutex_unlock(&group->mark_mutex);
> +                       break;
> +               }
> +               mark = list_first_entry(&to_free, struct fsnotify_mark, g_list);
> +               fsnotify_get_mark(mark);
> +               fsnotify_destroy_mark_locked(mark, group);
> +               mutex_unlock(&group->mark_mutex);
> +               fsnotify_put_mark(mark);
> +       }
>  }
>
>  /*
> --
> 2.1.4
>
> --


* Re: [PATCH] fsnotify: Fix oops in fsnotify_clear_marks_by_group_flags()
  2015-07-24  5:52 ` Ashish Sangwan
@ 2015-07-27  9:43   ` Jan Kara
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Kara @ 2015-07-27  9:43 UTC (permalink / raw)
  To: Ashish Sangwan
  Cc: Jan Kara, Andrew Morton, linux-fsdevel, Ashish Sangwan,
	Lino Sanfilippo, stable

On Fri 24-07-15 11:22:49, Ashish Sangwan wrote:
> On Thu, Jul 23, 2015 at 7:24 PM, Jan Kara <jack@suse.com> wrote:
> > fsnotify_clear_marks_by_group_flags() can race with
> > fsnotify_destroy_marks(), so when fsnotify_destroy_mark_locked() drops
> > mark_mutex, a mark from the list iterated by
> > fsnotify_clear_marks_by_group_flags() can be freed, and thus the next
> > entry pointer we have cached may become stale and we end up
> > dereferencing freed memory.
> >
> > Fix the problem by first moving the marks to be freed to a special
> > private list and then always freeing the first entry in that list. This
> > method is safe even when entries can disappear from the list once we
> > drop the lock.
> >
> > [...]
> > +
> > +       while (1) {
> > +               mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
> Just a nitpick: instead of locking/unlocking the mutex multiple times in
> the while loop, can't we just keep the entire while loop inside the
> mutex_lock?
> Overall, the patch seems OK to me.
> Reviewed-by: Ashish Sangwan <a.sangwan@samsung.com>

Thanks for the review! We cannot, because fsnotify_destroy_mark_locked()
drops the mutex anyway. I have some cleanup patches prepared which split
fsnotify_destroy_mark_locked() into two functions: one which needs to be
called under mark_mutex and one which has to be called outside of it. And
for those patches the current code is easier to convert...

									Honza
> 
> > +               if (list_empty(&to_free)) {
> > +                       mutex_unlock(&group->mark_mutex);
> > +                       break;
> > +               }
> > +               mark = list_first_entry(&to_free, struct fsnotify_mark, g_list);
> > +               fsnotify_get_mark(mark);
> > +               fsnotify_destroy_mark_locked(mark, group);
> > +               mutex_unlock(&group->mark_mutex);
> > +               fsnotify_put_mark(mark);
> > +       }
> >  }
> >
> >  /*
> > --
> > 2.1.4
> >
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

