From: Al Viro <viro@zeniv.linux.org.uk>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
criu@openvz.org, bpf@vger.kernel.org,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"Christian Brauner" <christian.brauner@ubuntu.com>,
"Oleg Nesterov" <oleg@redhat.com>,
"Cyrill Gorcunov" <gorcunov@gmail.com>,
"Jann Horn" <jann@thejh.net>, "Kees Cook" <keescook@chromium.org>,
"Daniel P . Berrangé" <berrange@redhat.com>,
"Jeff Layton" <jlayton@redhat.com>,
"Miklos Szeredi" <miklos@szeredi.hu>,
"Matthew Wilcox" <willy@infradead.org>,
"J. Bruce Fields" <bfields@fieldses.org>,
"Trond Myklebust" <trond.myklebust@hammerspace.com>,
"Chris Wright" <chrisw@redhat.com>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Martin KaFai Lau" <kafai@fb.com>,
"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
"Andrii Nakryiko" <andriin@fb.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"KP Singh" <kpsingh@chromium.org>,
"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH v2 09/24] file: Replace fcheck_files with files_lookup_fd_rcu
Date: Wed, 9 Dec 2020 16:54:11 +0000 [thread overview]
Message-ID: <20201209165411.GA16743@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20201207224903.GA4117703@ZenIV.linux.org.uk>
[paulmck Cc'd]
On Mon, Dec 07, 2020 at 10:49:04PM +0000, Al Viro wrote:
> On Mon, Dec 07, 2020 at 10:46:57PM +0000, Al Viro wrote:
> > On Fri, Nov 20, 2020 at 05:14:26PM -0600, Eric W. Biederman wrote:
> >
> > > /*
> > > * Check whether the specified fd has an open file.
> > > */
> > > -#define fcheck(fd) fcheck_files(current->files, fd)
> > > +#define fcheck(fd) files_lookup_fd_rcu(current->files, fd)
> >
> > Huh?
> > fs/file.c:1113: file = fcheck(oldfd);
> > dup3(), under ->file_lock, no rcu_read_lock() in sight
> >
> > fs/locks.c:2548: f = fcheck(fd);
> > fcntl_setlk(), ditto
> >
> > fs/locks.c:2679: f = fcheck(fd);
> > fcntl_setlk64(), ditto
> >
> > fs/notify/dnotify/dnotify.c:330: f = fcheck(fd);
> > fcntl_dirnotify(); this one _is_ under rcu_read_lock().
> >
> >
> > IOW, unless I've missed something earlier in the series, this is wrong.
>
> I have missed something, all right. Ignore that comment...
Actually, I take that back - the use of fcheck() in dnotify _is_ interesting.
At the very least it needs to be commented upon; what that code is trying
to prevent is a race between fcntl_dirnotify() and close(2)/dup2(2) closing
the descriptor in question. The problem is, dnotify marks are removed
when we detach from descriptor table; that's done by filp_close() calling
dnotify_flush().
Suppose fcntl(fd, F_NOTIFY, 0) in one thread races with close(fd) in another
(both sharing the same descriptor table). If the former had created and
inserted a mark *after* the latter has gotten past dnotify_flush(), there
would be nothing to evict that mark.
That's the reason that fcheck() is there. rcu_read_lock() used to be
sufficient, but the locking has changed since then and even if it is
still enough, that's not at all obvious.
Exclusion is not an issue; barriers, OTOH... Back then we had
->i_lock taken both by dnotify_flush() before any checks and
by fcntl_dirnotify() around the fcheck+insertion. So on close
side we had
store NULL into descriptor table
acquire ->i_lock
fetch ->i_dnotify
...
release ->i_lock
while on fcntl() side we had
acquire ->i_lock
fetch from descriptor table, sod off if not our file
...
store ->i_dnotify
...
release ->i_lock
Storing NULL into descriptor table could get reordered into
->i_lock-protected area in dnotify_flush(), but it could not
get reordered past releasing ->i_lock. So fcntl_dirnotify()
either grabbed ->i_lock before dnotify_flush() (in which case
missing the store of NULL into descriptor table wouldn't
matter, since dnotify_flush() would've observed the mark
we'd inserted) or it would've seen that store to descriptor
table.
Nowadays it's nowhere near as straightforward; in fcntl_dirnotify()
we have
/* this is needed to prevent the fcntl/close race described below */
mutex_lock(&dnotify_group->mark_mutex);
and it would appear to be similar to the original situation, with
->mark_mutex serving in place of ->i_lock. However, dnotify_flush()
might not take that mutex at all - it has
fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, dnotify_group);
if (!fsn_mark)
return;
before grabbing that thing. So the things are trickier - we actually
rely upon the barriers in fsnotify_find_mark(). And those are complicated.
The case when it returns non-NULL is not a problem - there we have that
mutex providing the barriers we need. NULL can be returned in two cases:
a) ->i_fsnotify_marks is not empty, but it contains no
dnotify marks. In that case we have ->i_fsnotify_marks.lock acquired
and released. By the time it gets around to fcheck(), fcntl_dirnotify() has
either found or created and inserted a dnotify mark into that list, with
->i_fsnotify_marks.lock acquired/released around the insertion, so we
are fine - either fcntl_dirnotify() gets there first (in which case
dnotify_flush() will observe it), or release of that lock by
fsnotify_find_mark() called by dnotify_flush() will serve as a barrier,
making sure that store to descriptor table will be observed.
b) fsnotify_find_mark() (fsnotify_grab_connector() it calls,
actually) finds ->i_fsnotify_marks empty. That's where the things
get interesting; we have
idx = srcu_read_lock(&fsnotify_mark_srcu);
conn = srcu_dereference(*connp, &fsnotify_mark_srcu);
if (!conn)
goto out;
on dnotify_flush() side. The matching store of fcntl_dirnotify()
side would be in fsnotify_attach_connector_to_object(), where
we have
/*
* cmpxchg() provides the barrier so that readers of *connp can see
* only initialized structure
*/
if (cmpxchg(connp, NULL, conn)) {
/* Someone else created list structure for us */
So we have
A:
store NULL to descriptor table
srcu_read_lock
srcu_dereference fetches NULL from ->i_fsnotify_marks
vs.
B:
cmpxchg replaces NULL with non-NULL in ->i_fsnotify_marks
fetch from descriptor table, can't miss the store done by A
Which might be safe, but the whole thing *RELLY* needs to be discussed
in fcntl_dirnotify() in more details. fs/notify/* guts are convoluted
enough to confuse anyone unfamiliar with them.
next prev parent reply other threads:[~2020-12-09 16:55 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <87r1on1v62.fsf@x220.int.ebiederm.org>
2020-11-20 23:14 ` [PATCH v2 01/24] exec: Move unshare_files to fix posix file locking during exec Eric W. Biederman
2020-11-20 23:58 ` Linus Torvalds
2020-12-07 22:22 ` Al Viro
[not found] ` <87zh2pusdw.fsf@x220.int.ebiederm.org>
2020-12-07 23:35 ` Al Viro
2020-11-20 23:14 ` [PATCH v2 02/24] exec: Simplify unshare_files Eric W. Biederman
2020-11-23 17:50 ` Oleg Nesterov
2020-11-23 18:25 ` Linus Torvalds
[not found] ` <87im9vx08i.fsf@x220.int.ebiederm.org>
[not found] ` <87pn42r0n7.fsf@x220.int.ebiederm.org>
2020-11-24 19:58 ` Linus Torvalds
2020-11-24 20:14 ` Arnd Bergmann
2020-11-24 23:44 ` Geoff Levand
2020-11-25 1:16 ` Eric W. Biederman
2020-11-27 20:29 ` Arnd Bergmann
2020-11-30 21:37 ` Eric W. Biederman
2020-12-01 9:46 ` Michael Ellerman
2020-11-25 21:51 ` [RFC][PATCH] coredump: Document coredump code exclusively used by cell spufs Eric W. Biederman
2020-12-02 15:20 ` Eric W. Biederman
2020-12-02 15:58 ` Arnd Bergmann
2020-12-07 22:32 ` [PATCH v2 02/24] exec: Simplify unshare_files Al Viro
2020-11-20 23:14 ` [PATCH v2 03/24] exec: Remove reset_files_struct Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 04/24] kcmp: In kcmp_epoll_target use fget_task Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 05/24] bpf: In bpf_task_fd_query " Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 06/24] proc/fd: In proc_fd_link " Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 07/24] file: Rename __fcheck_files to files_lookup_fd_raw Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 08/24] file: Factor files_lookup_fd_locked out of fcheck_files Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 09/24] file: Replace fcheck_files with files_lookup_fd_rcu Eric W. Biederman
2020-12-07 22:46 ` Al Viro
2020-12-07 22:49 ` Al Viro
2020-12-09 16:54 ` Al Viro [this message]
2020-12-09 17:44 ` Paul E. McKenney
2020-11-20 23:14 ` [PATCH v2 10/24] file: Rename fcheck lookup_fd_rcu Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 11/24] file: Implement task_lookup_fd_rcu Eric W. Biederman
2020-11-21 18:19 ` Cyrill Gorcunov
[not found] ` <87blfp1r8b.fsf@x220.int.ebiederm.org>
2020-11-22 13:52 ` Cyrill Gorcunov
2020-11-20 23:14 ` [PATCH v2 12/24] proc/fd: In tid_fd_mode use task_lookup_fd_rcu Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 13/24] kcmp: In get_file_raw_ptr " Eric W. Biederman
2020-11-23 21:05 ` Cyrill Gorcunov
2020-11-20 23:14 ` [PATCH v2 14/24] file: Implement task_lookup_next_fd_rcu Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 15/24] proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu Eric W. Biederman
2020-12-07 23:29 ` Al Viro
[not found] ` <877dprvs8e.fsf@x220.int.ebiederm.org>
2020-12-09 4:07 ` Al Viro
[not found] ` <877dprtxly.fsf@x220.int.ebiederm.org>
2020-12-09 14:23 ` Al Viro
2020-12-09 18:04 ` [PATCH] files: rcu free files_struct Eric W. Biederman
2020-12-09 19:13 ` Linus Torvalds
2020-12-09 19:50 ` Al Viro
2020-12-09 21:32 ` Eric W. Biederman
2020-12-10 6:13 ` Al Viro
2020-12-10 19:29 ` Eric W. Biederman
2020-12-10 21:36 ` Al Viro
2020-12-10 22:30 ` Christian Brauner
2020-12-10 22:54 ` Al Viro
2020-12-10 23:10 ` Al Viro
2020-12-09 21:42 ` [PATCH -1/24] exec: Don't open code get_close_on_exec Eric W. Biederman
2020-12-09 22:04 ` [PATCH] files: rcu free files_struct Eric W. Biederman
2020-12-09 19:49 ` Matthew Wilcox
2020-12-09 22:06 ` Eric W. Biederman
2020-12-09 22:58 ` Al Viro
2020-12-09 23:01 ` Linus Torvalds
2020-12-09 23:04 ` Linus Torvalds
2020-12-09 23:07 ` Matthew Wilcox
2020-12-09 23:26 ` Paul E. McKenney
2020-12-09 23:29 ` Linus Torvalds
2020-12-10 0:39 ` Paul E. McKenney
2020-12-10 0:41 ` Linus Torvalds
2020-12-10 0:57 ` Paul E. McKenney
2020-11-20 23:14 ` [PATCH v2 16/24] bpf/task_iter: In task_file_seq_get_next use task_lookup_next_fd_rcu Eric W. Biederman
2020-11-23 17:06 ` Yonghong Song
2020-11-20 23:14 ` [PATCH v2 17/24] proc/fd: In fdinfo seq_show don't use get_files_struct Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 18/24] file: Merge __fd_install into fd_install Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 19/24] file: In f_dupfd read RLIMIT_NOFILE once Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 20/24] file: Merge __alloc_fd into alloc_fd Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 21/24] file: Rename __close_fd to close_fd and remove the files parameter Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 22/24] file: Replace ksys_close with close_fd Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 23/24] file: Rename __close_fd_get_file close_fd_get_file Eric W. Biederman
2020-11-20 23:14 ` [PATCH v2 24/24] file: Remove get_files_struct Eric W. Biederman
2020-11-21 0:05 ` [PATCH v2 00/24] exec: Move unshare_files and guarantee files_struct.count is correct Linus Torvalds
2020-11-28 5:12 ` Al Viro
2020-12-07 23:56 ` Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201209165411.GA16743@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=andriin@fb.com \
--cc=ast@kernel.org \
--cc=berrange@redhat.com \
--cc=bfields@fieldses.org \
--cc=bpf@vger.kernel.org \
--cc=christian.brauner@ubuntu.com \
--cc=chrisw@redhat.com \
--cc=criu@openvz.org \
--cc=daniel@iogearbox.net \
--cc=ebiederm@xmission.com \
--cc=gorcunov@gmail.com \
--cc=jann@thejh.net \
--cc=jlayton@redhat.com \
--cc=john.fastabend@gmail.com \
--cc=kafai@fb.com \
--cc=keescook@chromium.org \
--cc=kpsingh@chromium.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=oleg@redhat.com \
--cc=paulmck@kernel.org \
--cc=songliubraving@fb.com \
--cc=torvalds@linux-foundation.org \
--cc=trond.myklebust@hammerspace.com \
--cc=willy@infradead.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).