From: Al Viro <viro@zeniv.linux.org.uk>
To: Jens Axboe <axboe@kernel.dk>
Cc: Jann Horn <jannh@google.com>,
	linux-aio@kvack.org, linux-block@vger.kernel.org,
	Linux API <linux-api@vger.kernel.org>,
	hch@lst.de, jmoyer@redhat.com, avi@scylladb.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 13/18] io_uring: add file set registration
Date: Wed, 6 Feb 2019 00:56:38 +0000
Message-ID: <20190206005638.GU2217@ZenIV.linux.org.uk>
In-Reply-To: <40b27e78-9ee8-1395-feb3-a73aac87c9a7@kernel.dk>

On Tue, Feb 05, 2019 at 12:08:25PM -0700, Jens Axboe wrote:
> Proof is in the pudding, here's the main commit introducing io_uring
> and now wiring it up to the AF_UNIX garbage collection:
> 
> http://git.kernel.dk/cgit/linux-block/commit/?h=io_uring&id=158e6f42b67d0abe9ee84886b96ca8c4b3d3dfd5
> 
> How does that look?

In a word - wrong.  Some theory: the garbage collector assumes that there is
a subset of file references such that
	* for all files with such references there's an associated unix_sock.
	* all such references are stored in SCM_RIGHTS datagrams that can be
	  found by the garbage collector (currently: for data-bearing AF_UNIX
	  sockets - queued SCM_RIGHTS datagrams, for listeners - SCM_RIGHTS
	  datagrams sent via yet-to-be-accepted connections).
	* there is an efficient way to count those references for a given file
	  (->inflight of the corresponding unix_sock).
	* removal of those references would render the graph acyclic.
	* a file can _NOT_ be subject to syscalls unless there are references
	  to it outside of that subset.

unix_inflight() moves a reference into the subset.
unix_notinflight() moves a reference out of the subset.
Any activity that might add such references ought to call wait_for_unix_gc()
first (basically, to stall massive insertions while gc is running).
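
As a rough illustration of that bookkeeping, here is a minimal Python model, not kernel code. It assumes a single global lock standing in for unix_gc_lock and a per-file counter standing in for ->inflight; the `inflight` table and the string file names are hypothetical, and only the two function names are borrowed from the kernel.

```python
import threading
from collections import Counter

# Assumption: one global lock stands in for the kernel's unix_gc_lock.
unix_gc_lock = threading.Lock()
# Assumption: per-file count of references currently inside the subset,
# playing the role of ->inflight.
inflight = Counter()

def unix_inflight(f):
    """Move one reference to f into the subset."""
    with unix_gc_lock:
        inflight[f] += 1

def unix_notinflight(f):
    """Move one reference to f out of the subset."""
    with unix_gc_lock:
        assert inflight[f] > 0, "no in-flight reference to remove"
        inflight[f] -= 1
        if inflight[f] == 0:
            del inflight[f]   # file no longer has references in the subset
```

Because both helpers take the same lock the collector holds while it runs, the counters cannot change under a collection in this model.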

Note that unix_gc() does *NOT* work in terms of dropping file references -
the primary effect is locating the SCM_RIGHTS datagrams that can be disposed
of and taking them out.  It simply won't do anything to your file references,
no matter what.  Add a printk into your ->release() and try to register an
io_uring descriptor into itself, then close it.  And observe ->release() not
being called for that object.  Ever.

PS: The algorithm used by unix_gc() is basically this -

	grab unix_gc_lock (giving exclusion with unix_inflight/unix_notinflight
			   and stabilizing ->inflight counters)

	Candidates = {}
	for all unix_sock u such that u->inflight > 0
		if file corresponding to u has no other references
			Candidates += u

	/* everything else already is reachable; due to unix_gc_lock these
	   can't die or get syscall-visible references under us */
	Might_Die = Candidates

	/* invariant to maintain: for u in Candidates u->inflight will be equal
	   to the number of references from SCM_RIGHTS datagrams *except*
	   those immediately reachable from elements of Might_Die */

	for all u in Candidates
		for each file reference v in SCM_RIGHTS datagrams
					immediately reachable from u
			if v in Candidates
				v->inflight--

	To_Scan = ()	// stuff reachable from those must live
	for all u in Might_Die
		if u->inflight > 0
			queue u into To_Scan

	while To_Scan is non-empty
		u = dequeue(To_Scan)
		Might_Die -= u
		for each file reference v in SCM_RIGHTS datagrams
					immediately reachable from u
			if v in Candidates
				v->inflight++	// maintain the invariant
				if v in Might_Die
					queue v into To_Scan

	/* at that point nothing in Might_Die is reachable from the outside */

	/* restore the original values of ->inflight */
	for all u in Might_Die
		for each file reference v in SCM_RIGHTS datagrams
					immediately reachable from u
			if v in Candidates
				v->inflight++

	hitlist = ()
	for all u in Might_Die
		for each SCM_RIGHTS datagram D immediately reachable from u
			if D contains references to something in Candidates
				move D to hitlist
	/* all those datagrams would've never become reachable */

	drop unix_gc_lock

	discard all datagrams in hitlist.
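
The pseudocode above can be exercised on a toy model. What follows is a minimal Python sketch, not kernel code: `Sock`, `send`, `children` and `discard` are hypothetical names invented for the illustration, a datagram is modeled as a plain list of sockets, and `refs` stands in for the file's reference count. Locking is omitted. One deliberate deviation from the pseudocode: a socket is taken out of Might_Die when it is queued rather than when it is dequeued, so that nothing gets scanned twice.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass(eq=False)              # identity hashing, so sockets can live in sets
class Sock:
    name: str
    refs: int = 0                 # all references to the file
    inflight: int = 0             # references held in queued SCM_RIGHTS datagrams
    queue: list = field(default_factory=list)  # datagrams; each is a list of Sock

def send(dst, *files):
    """Queue one datagram on dst carrying references to files."""
    for f in files:
        f.refs += 1
        f.inflight += 1
    dst.queue.append(list(files))

def children(u):
    """File references in datagrams immediately reachable from u."""
    return [v for d in u.queue for v in d]

def discard(datagram, released):
    """Drop the references in a datagram; release files that reach zero."""
    for f in datagram:
        f.refs -= 1
        f.inflight -= 1
        if f.refs == 0:
            released.append(f.name)
            pending, f.queue = f.queue, []
            for d in pending:     # a released socket drops its own queue
                discard(d, released)

def unix_gc(socks):
    candidates = {u for u in socks if u.inflight > 0 and u.refs == u.inflight}
    # Subtract references that come from inside the candidate set.
    for u in candidates:
        for v in children(u):
            if v in candidates:
                v.inflight -= 1
    # Anything still counted is referenced from outside and must live.
    to_scan = deque(u for u in candidates if u.inflight > 0)
    might_die = candidates - set(to_scan)
    while to_scan:
        u = to_scan.popleft()
        for v in children(u):
            if v in candidates:
                v.inflight += 1   # maintain the invariant
                if v in might_die:
                    might_die.remove(v)
                    to_scan.append(v)
    # Restore the original ->inflight values.
    for u in might_die:
        for v in children(u):
            if v in candidates:
                v.inflight += 1
    hitlist = []
    for u in might_die:
        keep = []
        for d in u.queue:
            (hitlist if any(v in candidates for v in d) else keep).append(d)
        u.queue = keep
    released = []
    for d in hitlist:
        discard(d, released)
    return sorted(released)

a, b = Sock("a"), Sock("b")
send(a, b)                # a's queue now holds the only reference to b
send(b, a)                # and vice versa: an unreachable cycle
print(unix_gc([a, b]))    # ['a', 'b']
```

In this model the only way a file is ever released is through a datagram landing on the hitlist. A hypothetical file created as `Sock("ring", refs=1, inflight=1)`, whose sole reference is counted as in flight but is not stored in any datagram, comes back from `unix_gc()` untouched.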

