From: Jens Axboe <axboe@kernel.dk>
To: Christian Brauner <brauner@kernel.org>,
	Florian Weimer <fweimer@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Tycho Andersen <tycho@tycho.pizza>,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [RFC 1/3] pidfd: allow pidfd_open() on non-thread-group leaders
Date: Thu, 7 Dec 2023 20:16:55 -0700
Message-ID: <c86faa98-937f-42e6-8c05-60112fd95966@kernel.dk>
In-Reply-To: <20231207-entdecken-selektiert-d5ce6dca6a80@brauner>

On 12/7/23 3:58 PM, Christian Brauner wrote:
> [adjusting Cc as that's really a separate topic]
> 
> On Thu, Nov 30, 2023 at 08:43:18PM +0100, Florian Weimer wrote:
>> * Mathieu Desnoyers:
>>
>>>>> I'd like to offer a userspace API which allows safe stashing of
>>>>> unreachable file descriptors on a service thread.
> 
> Fwiw, systemd has a concept called the fdstore:
> 
> https://systemd.io/FILE_DESCRIPTOR_STORE
> 
> "The file descriptor store [...] allows services to upload during
> runtime additional fds to the service manager that it shall keep on its
> behalf. File descriptors are passed back to the service on subsequent
> activations, the same way as any socket activation fds are passed.
> 
> [...]
> 
> The primary use-case of this logic is to permit services to restart
> seamlessly (for example to update them to a newer version), without
> losing execution context, dropping pinned resources, terminating
> established connections or even just momentarily losing connectivity. In
> fact, as the file descriptors can be uploaded freely at any time during
> the service runtime, this can even be used to implement services that
> robustly handle abnormal termination and can recover from that without
> losing pinned resources."
> 
>>
>>>> By "safe" here do you mean not accessible via pidfd_getfd()?
>>
>> No, unreachable by close/close_range/dup2/dup3.  I expect we can do an
>> intra-process transfer using /proc, but I'm hoping for something nicer.
> 
> File descriptors are reachable for all processes/threads that share a
> file descriptor table. Changing that means breaking core userspace
> assumptions about how file descriptors work. That's not going to happen
> as far as I'm concerned.
> 
> We may consider additional security_* hooks in close*() and dup*(). That
> would allow you to utilize Landlock or BPF LSM to prevent file
> descriptors from being closed or duplicated. pidfd_getfd() is already
> blockable via security_file_receive().
> 
> In general, messing with fds in that way is really not a good idea.
> 
> If you need something that awkward, then you should go all the way and
> look at io_uring which basically has a separate fd-like handle called
> "fixed files".
> 
> Fixed file indexes are separate file-descriptor-like handles that can
> only be used from io_uring calls, not with the regular system call
> interface.
> 
> IOW, you can refer to a file using an io_uring fixed index. The index to
> use can be chosen by userspace and can't be used with any regular
> fd-based system calls.
> 
> The io_uring fd itself can also be made a fixed file.
> 
> The only thing missing would be to turn an io_uring fixed file back into
> a regular file descriptor. That could probably be done by using
> receive_fd() and then installing that fd back into the caller's file
> descriptor table. But that would require an io_uring patch.

FWIW, since it was trivial, I posted an RFC/test patch for just that,
along with a test case. It's here:

https://lore.kernel.org/io-uring/df0e24ff-f3a0-4818-8282-2a4e03b7b5a6@kernel.dk/

-- 
Jens Axboe
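
The point quoted above, that a file descriptor is reachable by
everything sharing the file descriptor table, can be illustrated with
a minimal sketch. It assumes a Linux/POSIX system where /dev/null is a
character device; the function name demo_fd_clobber() and the
"stashed" variable are hypothetical and do not come from the thread or
from any patch. It shows only the hazard behind the request for fds
that are unreachable by close/dup2, not a fix for it.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a "stashed" pipe fd was silently
 * replaced by an unrelated dup2() call, 0 if it survived, and -1 on a
 * setup error.
 */
int demo_fd_clobber(void)
{
    int pipefd[2];
    struct stat st;
    int result = -1;

    if (pipe(pipefd) < 0)
        return -1;

    /* The descriptor a library would like to keep to itself. */
    int stashed = pipefd[0];

    /* Unrelated code elsewhere in the process opens another file ... */
    int devnull = open("/dev/null", O_RDONLY);
    if (devnull < 0)
        goto out;

    /* ... and happens to target the same descriptor number. */
    if (dup2(devnull, stashed) < 0)
        goto out_devnull;

    /* The number is still valid, but it no longer names the pipe. */
    if (fstat(stashed, &st) == 0)
        result = S_ISCHR(st.st_mode) ? 1 : 0;

out_devnull:
    close(devnull);
out:
    close(stashed);
    close(pipefd[1]);
    return result;
}
```

Nothing in the regular fd API stops the dup2() call here, which is why
the quoted reply points at LSM hooks or io_uring fixed files rather
than at a change to how the descriptor table behaves.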

