public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>, Jan Kara <jack@suse.cz>,
	Eric Paris <eparis@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: fsnotify_mark_srcu wtf?
Date: Sat, 5 Nov 2016 22:43:45 +0100	[thread overview]
Message-ID: <20161105214345.GB32353@quack2.suse.cz> (raw)
In-Reply-To: <CAOQ4uxhMuKT78c0fif0+wZcyPV=h78Vr_4yZ8vUERkKc3+zKQg@mail.gmail.com>

On Thu 03-11-16 12:25:13, Amir Goldstein wrote:
> On Thu, Nov 3, 2016 at 12:09 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > We've got a report where a fanotify daemon that implements permission checks
> > screws up and doesn't send a reply.  This then causes widespread hangs due to
> > fsnotify_mark_srcu read side lock being held and thus causing synchronize_srcu()
> > called from e.g. inotify_release()-> fsnotify_destroy_group()->
> > fsnotify_mark_destroy_list() to block.
> >
> > Below program demonstrates the issue.  It should output a single line:
> >
> > close(inotify_fd): success
> >
> > Instead it outputs nothing, which means that close(inotify_fd) got blocked by
> > the waiting permission event.
> >
> > Wouldn't making the srcu per-group fix this?  Would that be too expensive?
> >
> 
> IIUC, the purpose of fsnotify_mark_srcu is to protect the inode and vfsmount
> mark lists, which are used to iterate the groups to send events to.
> So you cannot make it per group as far as I can tell.
> 
> OTOH, specifically, for fanotify, once you can passed
> fanotify_should_send_event(),
> there is no need to keep a reference to the mark, so it may be safely destroyed,
> you only need a reference to the group.
> 
> I tested this patch on top of my fanotify super block watch development branch
> and it seems to fix the problem you reported, but I am not savvy enough with
> srcu to say that this is correct.
> Jan, what do you think?

So the way you've written it is definitely buggy. The inode_node and
vfsmount_node pointers may become invalid once you drop SRCU lock and so
you cannot dereference them even after regaining that lock. Also frankly
your solution looks a bit ugly. I'll think whether we can somehow fix the
problem...

								Honza

> 
> From 28da34cdf9bf71fe9bbac13ded11a19da3b7a48e Mon Sep 17 00:00:00 2001
> From: Amir Goldstein <amir73il@gmail.com>
> Date: Thu, 3 Nov 2016 11:55:46 +0200
> Subject: [PATCH] fanotify: handle permission events without holding
>  fsnotify_mark_srcu
> 
> Handling fanotify events does not entail dereferencing fsnotify_mask
> beyond the point of fanotify_should_send_event().
> 
> For the case of permission events, which may block indefinetely,
> return -EAGAIN and then fsnotify() will call handle_event() again
> without a reference to the mark.
> 
> Without a reference to the mark, there is no need to call
> handle_event() under fsnotify_mark_srcu read side lock
> which is protecting the inode/vfsmount mark lists.
> We just need to hold a reference to the group
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/notify/fanotify/fanotify.c | 14 +++++++++++---
>  fs/notify/fsnotify.c          | 26 ++++++++++++++++++++++----
>  2 files changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 604e307..0ffa9ed 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -298,9 +298,17 @@ static int fanotify_handle_event(struct
> fsnotify_group *group,
>         BUILD_BUG_ON(FAN_ACCESS_PERM != FS_ACCESS_PERM);
>         BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
> 
> -       if (!fanotify_should_send_event(inode_mark, vfsmnt_mark, mask, data
> -                                       data_type))
> -               return 0;
> +       if (inode_mark || vfsmnt_mark) {
> +               if (!fanotify_should_send_event(inode_mark, vfsmnt_mark, mask,
> +                                               data, data_type))
> +                       return 0;
> +
> +#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> +               /* Ask to be called again without a reference to mark */
> +               if (mask & FAN_ALL_PERM_EVENTS)
> +                       return -EAGAIN;
> +#endif
> +       }
> 
>         pr_debug("%s: group=%p inode=%p mask=%x\n", __func__, group, inode,
>                  mask);
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 34bccbe..dc0c86e 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -199,7 +199,7 @@ int fsnotify(struct inode *to_tell, __u32 mask,
> void *data, int data_is,
>  {
>         struct hlist_node *inode_node = NULL, *vfsmount_node = NULL;
>         struct fsnotify_mark *inode_mark = NULL, *vfsmount_mark = NULL;
> -       struct fsnotify_group *inode_group, *vfsmount_group;
> +       struct fsnotify_group *inode_group, *vfsmount_group, *group;
>         struct mount *mnt;
>         int idx, ret = 0;
>         /* global tests shouldn't care about events on child only the
> specific event */
> @@ -282,15 +282,33 @@ int fsnotify(struct inode *to_tell, __u32 mask,
> void *data, int data_is,
>                 ret = send_to_group(to_tell, inode_mark, vfsmount_mark, mask,
>                                     data, data_is, cookie, file_name);
> 
> -               if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS))
> -                       goto out;
> -
>                 if (inode_group)
>                         inode_node = srcu_dereference(inode_node->next,
>                                                       &fsnotify_mark_srcu);
>                 if (vfsmount_group)
>                         vfsmount_node = srcu_dereference(vfsmount_node->next,
>                                                          &fsnotify_mark_srcu);
> +
> +               if (ret != -EAGAIN)
> +                       continue;
> +
> +               group = inode_group ? : vfsmount_group;
> +               fsnotify_get_group(group);
> +               srcu_read_unlock(&fsnotify_mark_srcu, idx);
> +               /*
> +                * Calling handle_event() again without reference to mark,
> +                * so we do not need to call it under fsnotify_mark_srcu
> +                * which is protecting the inode/vfsmount mark lists.
> +                * We just need to hold a reference to the group
> +                */
> +               return group->ops->handle_event(group, to_tell, NULL, NULL,
> +                                               mask, data, data_is,
> +                                               file_name, cookie);
> +               idx = srcu_read_lock(&fsnotify_mark_srcu);
> +               fsnotify_put_group(group);
> +
> +               if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS))
> +                       goto out;
>         }
>         ret = 0;
>  out:
> -- 
> 2.7.4
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2016-11-05 21:43 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-02 22:09 fsnotify_mark_srcu wtf? Miklos Szeredi
2016-11-03 10:25 ` Amir Goldstein
2016-11-05 21:43   ` Jan Kara [this message]
2016-11-05 21:34 ` Jan Kara
2016-11-06  6:45   ` Amir Goldstein
2016-11-09 11:10     ` Jan Kara
2016-11-09 18:26       ` Amir Goldstein
2016-11-10 19:46         ` Jan Kara
2016-11-10 20:02           ` Amir Goldstein
2016-11-10 20:44           ` Miklos Szeredi
2016-11-10 22:41             ` Amir Goldstein
2016-11-13 18:43           ` Amir Goldstein
2016-11-14 11:59             ` Amir Goldstein
2016-12-02  8:26           ` Miklos Szeredi
2016-12-02 10:48             ` Jan Kara
2016-12-02 11:02               ` Miklos Szeredi
2016-12-06 17:07                 ` Jan Kara
2016-12-02 11:41               ` Amir Goldstein
2016-12-02 11:57                 ` Amir Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161105214345.GB32353@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=amir73il@gmail.com \
    --cc=eparis@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox