From: Al Viro <viro@zeniv.linux.org.uk>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
criu@openvz.org, bpf@vger.kernel.org,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"Christian Brauner" <christian.brauner@ubuntu.com>,
"Oleg Nesterov" <oleg@redhat.com>,
"Cyrill Gorcunov" <gorcunov@gmail.com>,
"Jann Horn" <jann@thejh.net>, "Kees Cook" <keescook@chromium.org>,
"Daniel P . Berrangé" <berrange@redhat.com>,
"Jeff Layton" <jlayton@redhat.com>,
"Miklos Szeredi" <miklos@szeredi.hu>,
"Matthew Wilcox" <willy@infradead.org>,
"J. Bruce Fields" <bfields@fieldses.org>,
"Trond Myklebust" <trond.myklebust@hammerspace.com>,
"Chris Wright" <chrisw@redhat.com>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Martin KaFai Lau" <kafai@fb.com>,
"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
"Andrii Nakryiko" <andriin@fb.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"KP Singh" <kpsingh@chromium.org>,
"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH v2 09/24] file: Replace fcheck_files with files_lookup_fd_rcu
Date: Wed, 9 Dec 2020 16:54:11 +0000 [thread overview]
Message-ID: <20201209165411.GA16743@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20201207224903.GA4117703@ZenIV.linux.org.uk>
[paulmck Cc'd]
On Mon, Dec 07, 2020 at 10:49:04PM +0000, Al Viro wrote:
> On Mon, Dec 07, 2020 at 10:46:57PM +0000, Al Viro wrote:
> > On Fri, Nov 20, 2020 at 05:14:26PM -0600, Eric W. Biederman wrote:
> >
> > > /*
> > > * Check whether the specified fd has an open file.
> > > */
> > > -#define fcheck(fd) fcheck_files(current->files, fd)
> > > +#define fcheck(fd) files_lookup_fd_rcu(current->files, fd)
> >
> > Huh?
> > fs/file.c:1113: file = fcheck(oldfd);
> > dup3(), under ->file_lock, no rcu_read_lock() in sight
> >
> > fs/locks.c:2548: f = fcheck(fd);
> > fcntl_setlk(), ditto
> >
> > fs/locks.c:2679: f = fcheck(fd);
> > fcntl_setlk64(), ditto
> >
> > fs/notify/dnotify/dnotify.c:330: f = fcheck(fd);
> > fcntl_dirnotify(); this one _is_ under rcu_read_lock().
> >
> >
> > IOW, unless I've missed something earlier in the series, this is wrong.
>
> I have missed something, all right. Ignore that comment...
Actually, I take that back - the use of fcheck() in dnotify _is_ interesting.
At the very least it needs to be commented upon; what that code is trying
to prevent is a race between fcntl_dirnotify() and close(2)/dup2(2) closing
the descriptor in question. The problem is, dnotify marks are removed
when we detach from descriptor table; that's done by filp_close() calling
dnotify_flush().
Suppose fcntl(fd, F_NOTIFY, 0) in one thread races with close(fd) in another
(both sharing the same descriptor table). If the former had created and
inserted a mark *after* the latter has gotten past dnotify_flush(), there
would be nothing to evict that mark.
That's the reason that fcheck() is there. rcu_read_lock() used to be
sufficient, but the locking has changed since then and even if it is
still enough, that's not at all obvious.
Exclusion is not an issue; barriers, OTOH... Back then we had
->i_lock taken both by dnotify_flush() before any checks and
by fcntl_dirnotify() around the fcheck+insertion. So on close
side we had
store NULL into descriptor table
acquire ->i_lock
fetch ->i_dnotify
...
release ->i_lock
while on fcntl() side we had
acquire ->i_lock
fetch from descriptor table, sod off if not our file
...
store ->i_dnotify
...
release ->i_lock
Storing NULL into descriptor table could get reordered into
->i_lock-protected area in dnotify_flush(), but it could not
get reordered past releasing ->i_lock. So fcntl_dirnotify()
either grabbed ->i_lock before dnotify_flush() (in which case
missing the store of NULL into descriptor table wouldn't
matter, since dnotify_flush() would've observed the mark
we'd inserted) or it would've seen that store to descriptor
table.
Nowadays it's nowhere near as straightforward; in fcntl_dirnotify()
we have
/* this is needed to prevent the fcntl/close race described below */
mutex_lock(&dnotify_group->mark_mutex);
and it would appear to be similar to the original situation, with
->mark_mutex serving in place of ->i_lock. However, dnotify_flush()
might not take that mutex at all - it has
fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, dnotify_group);
if (!fsn_mark)
return;
before grabbing that thing. So the things are trickier - we actually
rely upon the barriers in fsnotify_find_mark(). And those are complicated.
The case when it returns non-NULL is not a problem - there we have that
mutex providing the barriers we need. NULL can be returned in two cases:
a) ->i_fsnotify_marks is not empty, but it contains no
dnotify marks. In that case we have ->i_fsnotify_marks.lock acquired
and released. By the time it gets around to fcheck(), fcntl_dirnotify() has
either found or created and inserted a dnotify mark into that list, with
->i_fsnotify_marks.lock acquired/released around the insertion, so we
are fine - either fcntl_dirnotify() gets there first (in which case
dnotify_flush() will observe it), or release of that lock by
fsnotify_find_mark() called by dnotify_flush() will serve as a barrier,
making sure that store to descriptor table will be observed.
b) fsnotify_find_mark() (fsnotify_grab_connector() it calls,
actually) finds ->i_fsnotify_marks empty. That's where the things
get interesting; we have
idx = srcu_read_lock(&fsnotify_mark_srcu);
conn = srcu_dereference(*connp, &fsnotify_mark_srcu);
if (!conn)
goto out;
on dnotify_flush() side. The matching store of fcntl_dirnotify()
side would be in fsnotify_attach_connector_to_object(), where
we have
/*
* cmpxchg() provides the barrier so that readers of *connp can see
* only initialized structure
*/
if (cmpxchg(connp, NULL, conn)) {
/* Someone else created list structure for us */
So we have
A:
store NULL to descriptor table
srcu_read_lock
srcu_dereference fetches NULL from ->i_fsnotify_marks
vs.
B:
cmpxchg replaces NULL with non-NULL in ->i_fsnotify_marks
fetch from descriptor table, can't miss the store done by A
Which might be safe, but the whole thing *RELLY* needs to be discussed
in fcntl_dirnotify() in more details. fs/notify/* guts are convoluted
enough to confuse anyone unfamiliar with them.
next prev parent reply other threads:[~2020-12-09 16:55 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <87r1on1v62.fsf@x220.int.ebiederm.org>
2020-11-20 23:14 ` [PATCH v2 01/24] exec: Move unshare_files to fix posix file locking during exec Eric W. Biederman
2020-11-20 23:58 ` Linus Torvalds
2020-12-07 22:22 ` Al Viro
[not found] ` <87zh2pusdw.fsf@x220.int.ebiederm.org>
2020-12-07 23:35 ` Al Viro
2020-11-20 23:14 ` [PATCH v2 02/24] exec: Simplify unshare_files Eric W. Biederman
2020-11-23 17:50 ` Oleg Nesterov
2020-11-23 18:25 ` Linus Torvalds
[not found] ` <87im9vx08i.fsf@x220.int.ebiederm.org>
[not found] ` <87pn42r0n7.fsf@x220.int.ebiederm.org>
2020-11-24 19:58 ` Linus Torvalds
2020-11-24 20:14 ` Arnd Bergmann
2020-11-24 23:44 ` Geoff Levand
2020-11-25 1:16 ` Eric W. Biederman
2020-11-27 20:29 ` Arnd Bergmann
2020-11-30 21:37 ` Eric W. Biederman
2020-12-01 9:46 ` Michael Ellerman
2020-11-25 21:51 ` [RFC][PATCH] coredump: Document coredump code exclusively used by cell spufs Eric W. Biederman
2020-12-02 15:20 ` Eric W. Biederman
2020-12-02 15:58 ` Arnd Bergmann
2020-12-07 22:32 ` [PATCH v2 02/24] exec: Simplify unshare_files Al Viro
2020-11-20 23:14 ` [PATCH v2 03/24] exec: Remove reset_files_struct Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 04/24] kcmp: In kcmp_epoll_target use fget_task Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 05/24] bpf: In bpf_task_fd_query " Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 06/24] proc/fd: In proc_fd_link " Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 07/24] file: Rename __fcheck_files to files_lookup_fd_raw Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 08/24] file: Factor files_lookup_fd_locked out of fcheck_files Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 09/24] file: Replace fcheck_files with files_lookup_fd_rcu Eric W. Biederman
2020-12-07 22:46 ` Al Viro
2020-12-07 22:49 ` Al Viro
2020-12-09 16:54 ` Al Viro [this message]
2020-12-09 17:44 ` Paul E. McKenney
2020-11-20 23:14 ` [PATCH v2 10/24] file: Rename fcheck lookup_fd_rcu Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 11/24] file: Implement task_lookup_fd_rcu Eric W. Biederman
2020-11-21 18:19 ` Cyrill Gorcunov
[not found] ` <87blfp1r8b.fsf@x220.int.ebiederm.org>
2020-11-22 13:52 ` Cyrill Gorcunov
2020-11-20 23:14 ` [PATCH v2 12/24] proc/fd: In tid_fd_mode use task_lookup_fd_rcu Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 13/24] kcmp: In get_file_raw_ptr " Eric W. Biederman
2020-11-23 21:05 ` Cyrill Gorcunov
2020-11-20 23:14 ` [PATCH v2 14/24] file: Implement task_lookup_next_fd_rcu Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 15/24] proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu Eric W. Biederman
2020-12-07 23:29 ` Al Viro
[not found] ` <877dprvs8e.fsf@x220.int.ebiederm.org>
2020-12-09 4:07 ` Al Viro
[not found] ` <877dprtxly.fsf@x220.int.ebiederm.org>
2020-12-09 14:23 ` Al Viro
2020-12-09 18:04 ` [PATCH] files: rcu free files_struct Eric W. Biederman
2020-12-09 19:13 ` Linus Torvalds
2020-12-09 19:50 ` Al Viro
2020-12-09 21:32 ` Eric W. Biederman
2020-12-10 6:13 ` Al Viro
2020-12-10 19:29 ` Eric W. Biederman
2020-12-10 21:36 ` Al Viro
2020-12-10 22:30 ` Christian Brauner
2020-12-10 22:54 ` Al Viro
2020-12-10 23:10 ` Al Viro
2020-12-09 21:42 ` [PATCH -1/24] exec: Don't open code get_close_on_exec Eric W. Biederman
2020-12-09 22:04 ` [PATCH] files: rcu free files_struct Eric W. Biederman
2020-12-09 19:49 ` Matthew Wilcox
2020-12-09 22:06 ` Eric W. Biederman
2020-12-09 22:58 ` Al Viro
2020-12-09 23:01 ` Linus Torvalds
2020-12-09 23:04 ` Linus Torvalds
2020-12-09 23:07 ` Matthew Wilcox
2020-12-09 23:26 ` Paul E. McKenney
2020-12-09 23:29 ` Linus Torvalds
2020-12-10 0:39 ` Paul E. McKenney
2020-12-10 0:41 ` Linus Torvalds
2020-12-10 0:57 ` Paul E. McKenney
2020-11-20 23:14 ` [PATCH v2 16/24] bpf/task_iter: In task_file_seq_get_next use task_lookup_next_fd_rcu Eric W. Biederman
2020-11-23 17:06 ` Yonghong Song
2020-11-20 23:14 ` [PATCH v2 17/24] proc/fd: In fdinfo seq_show don't use get_files_struct Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 18/24] file: Merge __fd_install into fd_install Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 19/24] file: In f_dupfd read RLIMIT_NOFILE once Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 20/24] file: Merge __alloc_fd into alloc_fd Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 21/24] file: Rename __close_fd to close_fd and remove the files parameter Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 22/24] file: Replace ksys_close with close_fd Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 23/24] file: Rename __close_fd_get_file close_fd_get_file Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 24/24] file: Remove get_files_struct Eric W. Biederman
2020-11-21 0:05 ` [PATCH v2 00/24] exec: Move unshare_files and guarantee files_struct.count is correct Linus Torvalds
2020-11-28 5:12 ` Al Viro
2020-12-07 23:56 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201209165411.GA16743@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=andriin@fb.com \
--cc=ast@kernel.org \
--cc=berrange@redhat.com \
--cc=bfields@fieldses.org \
--cc=bpf@vger.kernel.org \
--cc=christian.brauner@ubuntu.com \
--cc=chrisw@redhat.com \
--cc=criu@openvz.org \
--cc=daniel@iogearbox.net \
--cc=ebiederm@xmission.com \
--cc=gorcunov@gmail.com \
--cc=jann@thejh.net \
--cc=jlayton@redhat.com \
--cc=john.fastabend@gmail.com \
--cc=kafai@fb.com \
--cc=keescook@chromium.org \
--cc=kpsingh@chromium.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=oleg@redhat.com \
--cc=paulmck@kernel.org \
--cc=songliubraving@fb.com \
--cc=torvalds@linux-foundation.org \
--cc=trond.myklebust@hammerspace.com \
--cc=willy@infradead.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.