From: Ming Lei <ming.lei@redhat.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Jeff Moyer <jmoyer@redhat.com>, Keith Busch <kbusch@meta.com>,
linux-nvme@lists.infradead.org, io-uring@vger.kernel.org,
axboe@kernel.dk, hch@lst.de, sagi@grimberg.me,
asml.silence@gmail.com, linux-security-module@vger.kernel.org,
Kanchan Joshi <joshi.k@samsung.com>
Subject: Re: [PATCH 1/2] iouring: one capable call per iouring instance
Date: Tue, 5 Dec 2023 13:25:44 +0800
Message-ID: <ZW60WPf/hmAUoxPv@fedora>
In-Reply-To: <ZW6nmR2ytIBApXE0@kbusch-mbp>

On Mon, Dec 04, 2023 at 09:31:21PM -0700, Keith Busch wrote:
> On Tue, Dec 05, 2023 at 12:14:22PM +0800, Ming Lei wrote:
> > On Mon, Dec 04, 2023 at 11:57:55AM -0700, Keith Busch wrote:
> > > On Mon, Dec 04, 2023 at 01:40:58PM -0500, Jeff Moyer wrote:
> > > > I added a CC: linux-security-module@vger
> > > > Keith Busch <kbusch@meta.com> writes:
> > > > > From: Keith Busch <kbusch@kernel.org>
> > > > >
> > > > > The uring_cmd operation is often used for privileged actions, so drivers
> > > > > subscribing to this interface check capable() for each command. The
> > > > > capable() function is not fast path friendly for many kernel configs,
> > > > > and this can really harm performance. Stash the capable sys admin
> > > > > attribute in the io_uring context and set a new issue_flag for the
> > > > > uring_cmd interface.
> > > >
> > > > I have a few questions. What privileged actions are performance
> > > > sensitive? I would hope that anything requiring privileges would not
> > > > be in a fast path (but clearly that's not the case).
> > >
> > > Protocol specifics that don't have a generic equivalent. For example,
> > > NVMe FDP is reachable only through the uring_cmd and ioctl interfaces,
> > > but you use it like normal reads and writes so has to be as fast as the
> > > generic interfaces.
> >
> > But normal read/write pt command doesn't require ADMIN any more since
> > commit 855b7717f44b ("nvme: fine-granular CAP_SYS_ADMIN for nvme io commands"),
> > why do you have to pay the cost of checking capable(CAP_SYS_ADMIN)?
>
> Good question. The "capable" check had always been first so even with
> the relaxed permissions, it was still paying the price. I have changed
> that order in commit staged here (not yet upstream):
>
> http://git.infradead.org/nvme.git/commitdiff/7be866b1cf0bf1dfa74480fe8097daeceda68622

With this change, I guess you should no longer see the following big gap, right?

> Before: 970k IOPs
> After: 1750k IOPs
>
> Note that only prevents the costly capable() check if the inexpensive
> checks could make a determination. That's still not solving the problem
> long term since we aim for forward compatibility where we have no idea
> which opcodes, admin identifications, or vendor specifics could be
> deemed "safe" for non-root users in the future, so those conditions
> would always fall back to the more expensive check that this patch was
> trying to mitigate for admin processes.

I'm not sure I follow. This comes down to nvme's permission model for
user passthrough (pt) commands, and:

1) it should always be checked at the entry of an nvme user pt command
2) only the following two types of commands require ADMIN, per commit
   855b7717f44b ("nvme: fine-granular CAP_SYS_ADMIN for nvme io commands"):
   - admin commands are not allowed without it
   - vendor-specific and fabrics commands are not allowed without it

Can you provide more details on why the expensive check can't be avoided
for fast read/write user IO commands?
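
To make the question concrete, here is a minimal user-space sketch of the
ordering being discussed: decide from the command itself first, and only
fall back to the expensive capability check for the restricted classes.
It is not the kernel code. Every sketch_* name is hypothetical, the opcode
values are assumptions taken from my reading of the NVMe spec, and
sketch_capable() merely stands in for capable(CAP_SYS_ADMIN) by counting
how often it is called. The real check also considers things such as the
file open mode, which are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed opcode values, for illustration only. */
enum {
	SKETCH_OP_WRITE        = 0x01,
	SKETCH_OP_READ         = 0x02,
	SKETCH_OP_FABRICS      = 0x7f,
	SKETCH_OP_VENDOR_START = 0x80,
};

/* Hypothetical, reduced view of a passthrough command. */
struct sketch_cmd {
	uint8_t opcode;
	bool is_admin;	/* true if sent on the admin queue */
};

/* Stand-in for capable(CAP_SYS_ADMIN); counts calls to show the cost. */
unsigned int sketch_capable_calls;
bool sketch_task_is_admin;

bool sketch_capable(void)
{
	sketch_capable_calls++;
	return sketch_task_is_admin;
}

/* Cheap checks first; the expensive check runs only for restricted classes. */
bool sketch_cmd_allowed(const struct sketch_cmd *c)
{
	bool restricted = c->is_admin ||
			  c->opcode == SKETCH_OP_FABRICS ||
			  c->opcode >= SKETCH_OP_VENDOR_START;

	if (!restricted)
		return true;	/* plain read/write: no capability lookup */

	return sketch_capable();
}
```

Under these assumptions a plain read or write never reaches
sketch_capable(), which is why I would expect the fast path to avoid the
cost once the checks are ordered this way.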
Thanks,
Ming