From: Jan Kara <jack@suse.cz>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>, Jan Kara <jack@suse.cz>,
Eric Paris <eparis@redhat.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: fsnotify_mark_srcu wtf?
Date: Sat, 5 Nov 2016 22:43:45 +0100 [thread overview]
Message-ID: <20161105214345.GB32353@quack2.suse.cz> (raw)
In-Reply-To: <CAOQ4uxhMuKT78c0fif0+wZcyPV=h78Vr_4yZ8vUERkKc3+zKQg@mail.gmail.com>
On Thu 03-11-16 12:25:13, Amir Goldstein wrote:
> On Thu, Nov 3, 2016 at 12:09 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > We've got a report where a fanotify daemon that implements permission checks
> > screws up and doesn't send a reply. This then causes widespread hangs due to
> > fsnotify_mark_srcu read side lock being held and thus causing synchronize_srcu()
> > called from e.g. inotify_release()-> fsnotify_destroy_group()->
> > fsnotify_mark_destroy_list() to block.
> >
> > Below program demonstrates the issue. It should output a single line:
> >
> > close(inotify_fd): success
> >
> > Instead it outputs nothing, which means that close(inotify_fd) got blocked by
> > the waiting permission event.
> >
> > Wouldn't making the srcu per-group fix this? Would that be too expensive?
> >
>
> IIUC, the purpose of fsnotify_mark_srcu is to protect the inode and vfsmount
> mark lists, which are used to iterate the groups to send events to.
> So you cannot make it per group as far as I can tell.
>
> OTOH, specifically, for fanotify, once you can passed
> fanotify_should_send_event(),
> there is no need to keep a reference to the mark, so it may be safely destroyed,
> you only need a reference to the group.
>
> I tested this patch on top of my fanotify super block watch development branch
> and it seems to fix the problem you reported, but I am not savvy enough with
> srcu to say that this is correct.
> Jan, what do you think?
So the way you've written it is definitely buggy. The inode_node and
vfsmount_node pointers may become invalid once you drop SRCU lock and so
you cannot dereference them even after regaining that lock. Also frankly
your solution looks a bit ugly. I'll think whether we can somehow fix the
problem...
Honza
>
> From 28da34cdf9bf71fe9bbac13ded11a19da3b7a48e Mon Sep 17 00:00:00 2001
> From: Amir Goldstein <amir73il@gmail.com>
> Date: Thu, 3 Nov 2016 11:55:46 +0200
> Subject: [PATCH] fanotify: handle permission events without holding
> fsnotify_mark_srcu
>
> Handling fanotify events does not entail dereferencing fsnotify_mask
> beyond the point of fanotify_should_send_event().
>
> For the case of permission events, which may block indefinetely,
> return -EAGAIN and then fsnotify() will call handle_event() again
> without a reference to the mark.
>
> Without a reference to the mark, there is no need to call
> handle_event() under fsnotify_mark_srcu read side lock
> which is protecting the inode/vfsmount mark lists.
> We just need to hold a reference to the group
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/notify/fanotify/fanotify.c | 14 +++++++++++---
> fs/notify/fsnotify.c | 26 ++++++++++++++++++++++----
> 2 files changed, 33 insertions(+), 7 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> index 604e307..0ffa9ed 100644
> --- a/fs/notify/fanotify/fanotify.c
> +++ b/fs/notify/fanotify/fanotify.c
> @@ -298,9 +298,17 @@ static int fanotify_handle_event(struct
> fsnotify_group *group,
> BUILD_BUG_ON(FAN_ACCESS_PERM != FS_ACCESS_PERM);
> BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR);
>
> - if (!fanotify_should_send_event(inode_mark, vfsmnt_mark, mask, data
> - data_type))
> - return 0;
> + if (inode_mark || vfsmnt_mark) {
> + if (!fanotify_should_send_event(inode_mark, vfsmnt_mark, mask,
> + data, data_type))
> + return 0;
> +
> +#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS
> + /* Ask to be called again without a reference to mark */
> + if (mask & FAN_ALL_PERM_EVENTS)
> + return -EAGAIN;
> +#endif
> + }
>
> pr_debug("%s: group=%p inode=%p mask=%x\n", __func__, group, inode,
> mask);
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 34bccbe..dc0c86e 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -199,7 +199,7 @@ int fsnotify(struct inode *to_tell, __u32 mask,
> void *data, int data_is,
> {
> struct hlist_node *inode_node = NULL, *vfsmount_node = NULL;
> struct fsnotify_mark *inode_mark = NULL, *vfsmount_mark = NULL;
> - struct fsnotify_group *inode_group, *vfsmount_group;
> + struct fsnotify_group *inode_group, *vfsmount_group, *group;
> struct mount *mnt;
> int idx, ret = 0;
> /* global tests shouldn't care about events on child only the
> specific event */
> @@ -282,15 +282,33 @@ int fsnotify(struct inode *to_tell, __u32 mask,
> void *data, int data_is,
> ret = send_to_group(to_tell, inode_mark, vfsmount_mark, mask,
> data, data_is, cookie, file_name);
>
> - if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS))
> - goto out;
> -
> if (inode_group)
> inode_node = srcu_dereference(inode_node->next,
> &fsnotify_mark_srcu);
> if (vfsmount_group)
> vfsmount_node = srcu_dereference(vfsmount_node->next,
> &fsnotify_mark_srcu);
> +
> + if (ret != -EAGAIN)
> + continue;
> +
> + group = inode_group ? : vfsmount_group;
> + fsnotify_get_group(group);
> + srcu_read_unlock(&fsnotify_mark_srcu, idx);
> + /*
> + * Calling handle_event() again without reference to mark,
> + * so we do not need to call it under fsnotify_mark_srcu
> + * which is protecting the inode/vfsmount mark lists.
> + * We just need to hold a reference to the group
> + */
> + return group->ops->handle_event(group, to_tell, NULL, NULL,
> + mask, data, data_is,
> + file_name, cookie);
> + idx = srcu_read_lock(&fsnotify_mark_srcu);
> + fsnotify_put_group(group);
> +
> + if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS))
> + goto out;
> }
> ret = 0;
> out:
> --
> 2.7.4
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2016-11-05 21:43 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-11-02 22:09 fsnotify_mark_srcu wtf? Miklos Szeredi
2016-11-03 10:25 ` Amir Goldstein
2016-11-05 21:43 ` Jan Kara [this message]
2016-11-05 21:34 ` Jan Kara
2016-11-06 6:45 ` Amir Goldstein
2016-11-09 11:10 ` Jan Kara
2016-11-09 18:26 ` Amir Goldstein
2016-11-10 19:46 ` Jan Kara
2016-11-10 20:02 ` Amir Goldstein
2016-11-10 20:44 ` Miklos Szeredi
2016-11-10 22:41 ` Amir Goldstein
2016-11-13 18:43 ` Amir Goldstein
2016-11-14 11:59 ` Amir Goldstein
2016-12-02 8:26 ` Miklos Szeredi
2016-12-02 10:48 ` Jan Kara
2016-12-02 11:02 ` Miklos Szeredi
2016-12-06 17:07 ` Jan Kara
2016-12-02 11:41 ` Amir Goldstein
2016-12-02 11:57 ` Amir Goldstein
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161105214345.GB32353@quack2.suse.cz \
--to=jack@suse.cz \
--cc=amir73il@gmail.com \
--cc=eparis@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox