All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jens Axboe <axboe@kernel.dk>, Jann Horn <jannh@google.com>,
	linux-aio@kvack.org, linux-block@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	hch@lst.de, jmoyer@redhat.com, avi@scylladb.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 13/18] io_uring: add file set registration
Date: Thu, 7 Feb 2019 13:31:35 +0000	[thread overview]
Message-ID: <20190207133135.GZ2217@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20190207092253.GD19821@veci.piliscsaba.redhat.com>

On Thu, Feb 07, 2019 at 10:22:53AM +0100, Miklos Szeredi wrote:
> On Thu, Feb 07, 2019 at 04:00:59AM +0000, Al Viro wrote:
> 
> > So in theory it would be possible to have
> > 	* thread A: sendmsg() has SCM_RIGHTS created and populated,
> > complete with file refcount and ->inflight increments implied,
> > at which point it gets preempted and loses the timeslice.
> > 	* thread B: gets to run and removes all references
> > from descriptor table it shares with thread A.
> > 	* on another CPU we have garbage collector triggered;
> > it determines the set of potentially unreachable unix_sock and
> > everything in our SCM_RIGHTS _is_ in that set, now that no
> > other references remain.
> > 	* on the first CPU, thread A regains the timeslice
> > and inserts its SCM_RIGHTS into queue.  And it does contain
> > references to sockets from the candidate set of running
> > garbage collector, confusing the hell out of it.
> 
> Reminds me: long time ago there was a bug report, and based on that I found a
> bug in MSG_PEEK handling (not confirmed to have fixed the reported bug).  This
> fix, although pretty simple, got lost somehow.  While unix gc code is in your
> head, can you please review and I'll resend through davem?

Umm...  I think the bug is real (something that looks like an eviction
candidate, but actually is referenced from the reachable queue might
get peeked via that queue, then have _its_ queue modified via new
external reference, all between two passes over that queue, confusing the
fuck out of unix_gc()), but I think the fix is an overkill...

Am I right assuming that this queue-modifying operation is accept(), removing
an embryo unix_sock from the queue of listener and thus hiding SCM_RIGHTS in
_its_ queue from scan_children()?

Let me think of it a bit, OK?  While we are at it, some questions from digging
through the current net/unix/garbage.c:
	1) is there any need for ->inflight to be atomic?  All accesses are under
unix_gc_lock, after all...
	2) pumping unix_gc on each sodding reference in SCM_RIGHTS (within
unix_notinflight()/unix_inflight()) looks atrocious... wouldn't it be better to
hold it over that loop?
	3) unix_get_socket() probably ought to be static nowadays...
	4) I wonder if in scan_inflight()/scan_children() we would be better
off with explicit switch (by enum argument) instead of an indirect call.
	5) do we really need UNIX_GC_MAYBE_CYCLE?  Looks like the only
real use is in inc_inflight_move_tail(), and AFAICS it could bloody well
have been
	u->inflight++;
	if (u->inflight == 1)	// just got from zero to non-zero
		list_move_tail(&u->link, &gc_candidates);
The logics there is "we'd found a reference to something that still was
a candidate for eviction in a reachable SCM_RIGHTS, so it's actually
reachable and needs to be scanned (and removed from the set of candidates);
move to the end of list, so that the main loop gets around to it".
If it *was* past the cursor in the list, there's no need to move it; if
we got past it, it must've had zero ->inflight (or we would've removed
it from the set back when we got past it).  Note that it's only called if
UNIX_GC_CANDIDATE is set (i.e. if it's in the initial candidate set),
so for this one ->inflight is guaranteed to mean the number of SCM_RIGHTS
refs from outside of the current candidate set...
	6) unix_get_socket() looks like it might benefit from another
FMODE bit; not sure where it's best set, though - the obvious way would
be SOCK_GC_CARES in sock->flags, set by e.g. unix_create1(), with
sock_alloc_file() propagating it into file->f_mode.  Then unix_get_socket()
would be able to bugger off with NULL for most of the references in
SCM_RIGHTS, without looking into inode...

Comments?  IIRC, you'd done the last serious round of rewriting unix_gc()
and friends...

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

WARNING: multiple messages have this Message-ID (diff)
From: Al Viro <viro@zeniv.linux.org.uk>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Jens Axboe <axboe@kernel.dk>, Jann Horn <jannh@google.com>,
	linux-aio@kvack.org, linux-block@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	hch@lst.de, jmoyer@redhat.com, avi@scylladb.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 13/18] io_uring: add file set registration
Date: Thu, 7 Feb 2019 13:31:35 +0000	[thread overview]
Message-ID: <20190207133135.GZ2217@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20190207092253.GD19821@veci.piliscsaba.redhat.com>

On Thu, Feb 07, 2019 at 10:22:53AM +0100, Miklos Szeredi wrote:
> On Thu, Feb 07, 2019 at 04:00:59AM +0000, Al Viro wrote:
> 
> > So in theory it would be possible to have
> > 	* thread A: sendmsg() has SCM_RIGHTS created and populated,
> > complete with file refcount and ->inflight increments implied,
> > at which point it gets preempted and loses the timeslice.
> > 	* thread B: gets to run and removes all references
> > from descriptor table it shares with thread A.
> > 	* on another CPU we have garbage collector triggered;
> > it determines the set of potentially unreachable unix_sock and
> > everything in our SCM_RIGHTS _is_ in that set, now that no
> > other references remain.
> > 	* on the first CPU, thread A regains the timeslice
> > and inserts its SCM_RIGHTS into queue.  And it does contain
> > references to sockets from the candidate set of running
> > garbage collector, confusing the hell out of it.
> 
> Reminds me: long time ago there was a bug report, and based on that I found a
> bug in MSG_PEEK handling (not confirmed to have fixed the reported bug).  This
> fix, although pretty simple, got lost somehow.  While unix gc code is in your
> head, can you please review and I'll resend through davem?

Umm...  I think the bug is real (something that looks like an eviction
candidate, but actually is referenced from the reachable queue might
get peeked via that queue, then have _its_ queue modified via new
external reference, all between two passes over that queue, confusing the
fuck out of unix_gc()), but I think the fix is an overkill...

Am I right assuming that this queue-modifying operation is accept(), removing
an embryo unix_sock from the queue of listener and thus hiding SCM_RIGHTS in
_its_ queue from scan_children()?

Let me think of it a bit, OK?  While we are at it, some questions from digging
through the current net/unix/garbage.c:
	1) is there any need for ->inflight to be atomic?  All accesses are under
unix_gc_lock, after all...
	2) pumping unix_gc on each sodding reference in SCM_RIGHTS (within
unix_notinflight()/unix_inflight()) looks atrocious... wouldn't it be better to
hold it over that loop?
	3) unix_get_socket() probably ought to be static nowadays...
	4) I wonder if in scan_inflight()/scan_children() we would be better
off with explicit switch (by enum argument) instead of an indirect call.
	5) do we really need UNIX_GC_MAYBE_CYCLE?  Looks like the only
real use is in inc_inflight_move_tail(), and AFAICS it could bloody well
have been
	u->inflight++;
	if (u->inflight == 1)	// just got from zero to non-zero
		list_move_tail(&u->link, &gc_candidates);
The logics there is "we'd found a reference to something that still was
a candidate for eviction in a reachable SCM_RIGHTS, so it's actually
reachable and needs to be scanned (and removed from the set of candidates);
move to the end of list, so that the main loop gets around to it".
If it *was* past the cursor in the list, there's no need to move it; if
we got past it, it must've had zero ->inflight (or we would've removed
it from the set back when we got past it).  Note that it's only called if
UNIX_GC_CANDIDATE is set (i.e. if it's in the initial candidate set),
so for this one ->inflight is guaranteed to mean the number of SCM_RIGHTS
refs from outside of the current candidate set...
	6) unix_get_socket() looks like it might benefit from another
FMODE bit; not sure where it's best set, though - the obvious way would
be SOCK_GC_CARES in sock->flags, set by e.g. unix_create1(), with
sock_alloc_file() propagating it into file->f_mode.  Then unix_get_socket()
would be able to bugger off with NULL for most of the references in
SCM_RIGHTS, without looking into inode...

Comments?  IIRC, you'd done the last serious round of rewriting unix_gc()
and friends...

  reply	other threads:[~2019-02-07 13:31 UTC|newest]

Thread overview: 158+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-29 19:26 [PATCHSET v9] io_uring IO interface Jens Axboe
2019-01-29 19:26 ` Jens Axboe
2019-01-29 19:26 ` [PATCH 01/18] fs: add an iopoll method to struct file_operations Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 19:26 ` [PATCH 02/18] block: wire up block device iopoll method Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 19:26 ` [PATCH 03/18] block: add bio_set_polled() helper Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 19:26 ` [PATCH 04/18] iomap: wire up the iopoll method Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 19:26 ` [PATCH 05/18] Add io_uring IO interface Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 19:26 ` [PATCH 06/18] io_uring: add fsync support Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 19:26 ` [PATCH 07/18] io_uring: support for IO polling Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 20:47   ` Jann Horn
2019-01-29 20:47     ` Jann Horn
2019-01-29 20:56     ` Jens Axboe
2019-01-29 20:56       ` Jens Axboe
2019-01-29 21:10       ` Jann Horn
2019-01-29 21:10         ` Jann Horn
2019-01-29 21:33         ` Jens Axboe
2019-01-29 21:33           ` Jens Axboe
2019-01-29 19:26 ` [PATCH 08/18] fs: add fget_many() and fput_many() Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 19:26 ` [PATCH 09/18] io_uring: use fget/fput_many() for file references Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 23:31   ` Jann Horn
2019-01-29 23:31     ` Jann Horn
2019-01-29 23:44     ` Jens Axboe
2019-01-29 23:44       ` Jens Axboe
2019-01-30 15:33       ` Jens Axboe
2019-01-30 15:33         ` Jens Axboe
2019-01-29 19:26 ` [PATCH 10/18] io_uring: batch io_kiocb allocation Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 19:26 ` [PATCH 11/18] block: implement bio helper to add iter bvec pages to bio Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 19:26 ` [PATCH 12/18] io_uring: add support for pre-mapped user IO buffers Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 22:44   ` Jann Horn
2019-01-29 22:44     ` Jann Horn
2019-01-29 22:56     ` Jens Axboe
2019-01-29 22:56       ` Jens Axboe
2019-01-29 23:03       ` Jann Horn
2019-01-29 23:03         ` Jann Horn
2019-01-29 23:06         ` Jens Axboe
2019-01-29 23:06           ` Jens Axboe
2019-01-29 23:08           ` Jann Horn
2019-01-29 23:08             ` Jann Horn
2019-01-29 23:14             ` Jens Axboe
2019-01-29 23:14               ` Jens Axboe
2019-01-29 23:42               ` Jann Horn
2019-01-29 23:42                 ` Jann Horn
2019-01-29 23:51                 ` Jens Axboe
2019-01-29 23:51                   ` Jens Axboe
2019-01-29 19:26 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-30  1:29   ` Jann Horn
2019-01-30  1:29     ` Jann Horn
2019-01-30 15:35     ` Jens Axboe
2019-01-30 15:35       ` Jens Axboe
2019-02-04  2:56     ` Al Viro
2019-02-04  2:56       ` Al Viro
2019-02-05  2:19       ` Jens Axboe
2019-02-05  2:19         ` Jens Axboe
2019-02-05 17:57         ` Jens Axboe
2019-02-05 17:57           ` Jens Axboe
2019-02-05 19:08           ` Jens Axboe
2019-02-05 19:08             ` Jens Axboe
2019-02-06  0:27             ` Jens Axboe
2019-02-06  0:27               ` Jens Axboe
2019-02-06  1:01               ` Al Viro
2019-02-06  1:01                 ` Al Viro
2019-02-06 17:56                 ` Jens Axboe
2019-02-06 17:56                   ` Jens Axboe
2019-02-07  4:05                   ` Al Viro
2019-02-07  4:05                     ` Al Viro
2019-02-07 16:14                     ` Jens Axboe
2019-02-07 16:30                       ` Al Viro
2019-02-07 16:30                         ` Al Viro
2019-02-07 16:35                         ` Jens Axboe
2019-02-07 16:35                           ` Jens Axboe
2019-02-07 16:51                         ` Al Viro
2019-02-07 16:51                           ` Al Viro
2019-02-06  0:56             ` Al Viro
2019-02-06  0:56               ` Al Viro
2019-02-06 13:41               ` Jens Axboe
2019-02-06 13:41                 ` Jens Axboe
2019-02-07  4:00                 ` Al Viro
2019-02-07  4:00                   ` Al Viro
2019-02-07  9:22                   ` Miklos Szeredi
2019-02-07  9:22                     ` Miklos Szeredi
2019-02-07 13:31                     ` Al Viro [this message]
2019-02-07 13:31                       ` Al Viro
2019-02-07 14:20                       ` Miklos Szeredi
2019-02-07 14:20                         ` Miklos Szeredi
2019-02-07 15:20                         ` Al Viro
2019-02-07 15:20                           ` Al Viro
2019-02-07 15:27                           ` Miklos Szeredi
2019-02-07 15:27                             ` Miklos Szeredi
2019-02-07 16:26                             ` Al Viro
2019-02-07 16:26                               ` Al Viro
2019-02-07 19:08                               ` Miklos Szeredi
2019-02-07 19:08                                 ` Miklos Szeredi
2019-02-07 18:45                   ` Jens Axboe
2019-02-07 18:45                     ` Jens Axboe
2019-02-07 18:58                     ` Jens Axboe
2019-02-07 18:58                       ` Jens Axboe
2019-02-11 15:55                     ` Jonathan Corbet
2019-02-11 15:55                       ` Jonathan Corbet
2019-02-11 17:35                       ` Al Viro
2019-02-11 17:35                         ` Al Viro
2019-02-11 20:33                         ` Jonathan Corbet
2019-02-11 20:33                           ` Jonathan Corbet
2019-01-29 19:26 ` [PATCH 14/18] io_uring: add submission polling Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 19:26 ` [PATCH 15/18] io_uring: add io_kiocb ref count Jens Axboe
2019-01-29 19:26   ` Jens Axboe
2019-01-29 19:27 ` [PATCH 16/18] io_uring: add support for IORING_OP_POLL Jens Axboe
2019-01-29 19:27   ` Jens Axboe
2019-01-29 19:27 ` [PATCH 17/18] io_uring: allow workqueue item to handle multiple buffered requests Jens Axboe
2019-01-29 19:27   ` Jens Axboe
2019-01-29 19:27 ` [PATCH 18/18] io_uring: add io_uring_event cache hit information Jens Axboe
2019-01-29 19:27   ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2019-02-07 19:55 [PATCHSET v12] io_uring IO interface Jens Axboe
2019-02-07 19:55 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-02-07 19:55   ` Jens Axboe
2019-02-08 12:17   ` Alan Jenkins
2019-02-08 12:17     ` Alan Jenkins
2019-02-08 12:57     ` Jens Axboe
2019-02-08 12:57       ` Jens Axboe
2019-02-08 14:02       ` Alan Jenkins
2019-02-08 14:02         ` Alan Jenkins
2019-02-08 15:13         ` Jens Axboe
2019-02-08 15:13           ` Jens Axboe
2019-02-12 12:29           ` Alan Jenkins
2019-02-12 12:29             ` Alan Jenkins
2019-02-12 15:17             ` Jens Axboe
2019-02-12 15:17               ` Jens Axboe
2019-02-12 17:21               ` Alan Jenkins
2019-02-12 17:21                 ` Alan Jenkins
2019-02-12 17:33                 ` Jens Axboe
2019-02-12 17:33                   ` Jens Axboe
2019-02-12 20:23                   ` Alan Jenkins
2019-02-12 20:23                     ` Alan Jenkins
2019-02-12 21:10                     ` Jens Axboe
2019-02-12 21:10                       ` Jens Axboe
2019-02-01 15:23 [PATCHSET v11] io_uring IO interface Jens Axboe
2019-02-01 15:24 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-02-01 15:24   ` Jens Axboe
2019-01-30 21:55 [PATCHSET v10] io_uring IO interface Jens Axboe
2019-01-30 21:55 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-30 21:55   ` Jens Axboe
2019-01-28 21:35 [PATCHSET v8] io_uring IO interface Jens Axboe
2019-01-28 21:35 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe
2019-01-28 21:35   ` Jens Axboe
2019-01-29 16:36   ` Jann Horn
2019-01-29 16:36     ` Jann Horn
2019-01-29 18:13     ` Jens Axboe
2019-01-29 18:13       ` Jens Axboe
2019-01-23 15:35 [PATCHSET v7] io_uring IO interface Jens Axboe
2019-01-23 15:35 ` [PATCH 13/18] io_uring: add file set registration Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190207133135.GZ2217@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=avi@scylladb.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=jannh@google.com \
    --cc=jmoyer@redhat.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.