public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* userfaultfd: two-step UFFDIO_API always gives -EINVAL
@ 2024-11-23 15:13 stsp
  2024-11-25  9:05 ` stsp
  2024-11-25 15:59 ` Peter Xu
  0 siblings, 2 replies; 19+ messages in thread
From: stsp @ 2024-11-23 15:13 UTC (permalink / raw)
  To: Linux kernel; +Cc: Muhammad Usama Anjum, Peter Xu

Hello.

I tried to use userfaultfd and got
that strange result: when I first do
UFFDIO_API ioctl with features = 0,
it succeeds. I check the needed
features, and they are all in place.
But on the second step, where I
request the needed features,
UFFDIO_API gives -EINVAL no matter
what features I requested (or even
set features to 0 again).

A quick look into the kernel code
suggests that the problem is that
uffd_ctx_features() doesn't check
user_features for being 0, and just
sets UFFD_FEATURE_INITIALIZED
with no features at all. After that,
userfaultfd_api() should always
fail with -EINVAL when doing this:

ctx_features = uffd_ctx_features(features);
ret = -EINVAL;
if (cmpxchg(&ctx->features, 0, ctx_features) != 0)
         goto err_out;

But I haven't checked my finding
by rebuilding the kernel.
So is this broken or am I missing
something?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-23 15:13 userfaultfd: two-step UFFDIO_API always gives -EINVAL stsp
@ 2024-11-25  9:05 ` stsp
  2024-11-25 15:59 ` Peter Xu
  1 sibling, 0 replies; 19+ messages in thread
From: stsp @ 2024-11-25  9:05 UTC (permalink / raw)
  To: Linux kernel; +Cc: Muhammad Usama Anjum, Peter Xu

23.11.2024 18:13, stsp пишет:
> Hello.
>
> I tried to use userfaultfd and got
> that strange result: when I first do
> UFFDIO_API ioctl with features = 0,
> it succeeds. I check the needed
> features, and they are all in place.
> But on the second step, where I
> request the needed features,
> UFFDIO_API gives -EINVAL no matter
> what features I requested (or even
> set features to 0 again).
With the test patch below, the
problem can be reproduced. All
the code in selftests suggest
that UFFDIO_API should not be
called twice, whereas man page
says this:

```
        After the userfaultfd object is created with userfaultfd(), the 
applica‐
        tion must enable it using the UFFDIO_API ioctl(2) operation. 
This oper‐
        ation allows a two-step handshake between the kernel and user  
space  to
        determine what API version and features the kernel supports, and 
then to
        enable  those  features  user  space wants.
```
But the second part doesn't work and
is never being tested in selftests.
So is this a documentation problem?

This patch can be used to make
sure the second call doesn't work:

--- a/tools/testing/selftests/mm/uffd-unit-tests.c
+++ b/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -171,6 +171,14 @@ static int test_uffd_api(bool use_dev)
                 goto out;
         }

+       /* Request valid feature via UFFDIO_API */
+       uffdio_api.api = UFFD_API;
+       uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
+       if (ioctl(uffd, UFFDIO_API, &uffdio_api)) {
+               uffd_test_fail("UFFDIO_API should succeed but failed");
+               exit(1);
+       }
+
         uffd_test_pass();
  out:
         close(uffd);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-23 15:13 userfaultfd: two-step UFFDIO_API always gives -EINVAL stsp
  2024-11-25  9:05 ` stsp
@ 2024-11-25 15:59 ` Peter Xu
  2024-11-25 16:15   ` stsp
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Xu @ 2024-11-25 15:59 UTC (permalink / raw)
  To: stsp; +Cc: Linux kernel, Muhammad Usama Anjum

On Sat, Nov 23, 2024 at 06:13:01PM +0300, stsp wrote:
> Hello.
> 
> I tried to use userfaultfd and got
> that strange result: when I first do
> UFFDIO_API ioctl with features = 0,
> it succeeds. I check the needed
> features, and they are all in place.
> But on the second step, where I
> request the needed features,
> UFFDIO_API gives -EINVAL no matter
> what features I requested (or even
> set features to 0 again).
> 
> A quick look into the kernel code
> suggests that the problem is that
> uffd_ctx_features() doesn't check
> user_features for being 0, and just
> sets UFFD_FEATURE_INITIALIZED
> with no features at all. After that,
> userfaultfd_api() should always
> fail with -EINVAL when doing this:
> 
> ctx_features = uffd_ctx_features(features);
> ret = -EINVAL;
> if (cmpxchg(&ctx->features, 0, ctx_features) != 0)
>         goto err_out;
> 
> But I haven't checked my finding
> by rebuilding the kernel.
> So is this broken or am I missing
> something?

I agree it's slightly confusing but it's intended.  It's like that since
the start, so I think we should still keep the behavior.

The userapp needs to create one extra userfaultfd to detect supported
features in the kernel.  To use it after a probe request, you'll need to
close() the fd, redo the userfaultfd syscall to create another fd.

The kernel cannot assume features==0 to be a pure query, because not all
userfaultfd features requires setting of a feature bit. E.g., the default
anonymous missing traps doesn't require any feature bit to set.  So the
initial UFFDIO_API(features=0) is the enablement of such feature.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-25 15:59 ` Peter Xu
@ 2024-11-25 16:15   ` stsp
  2024-11-25 16:58     ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: stsp @ 2024-11-25 16:15 UTC (permalink / raw)
  To: Peter Xu; +Cc: Linux kernel, Muhammad Usama Anjum

25.11.2024 18:59, Peter Xu пишет:
> I agree it's slightly confusing but it's intended.  It's like that since
> the start, so I think we should still keep the behavior.
>
> The userapp needs to create one extra userfaultfd to detect supported
> features in the kernel.  To use it after a probe request, you'll need to
> close() the fd, redo the userfaultfd syscall to create another fd.
Hi Peter, thanks for info.
Unfortunately man page doesn't
say that. In fact if it did, I won't be
using the second userfaultfd just
for that, anyway. :)

Man page clearly talks about
"the userfaultfd object" (one object)
when mandating the "two-step handshake".
I spent hours of head-scratching
before went looking into the sources,
and even then I was confident the man
page is right: people should always assume
documentation is correct, code is buggy.

Would it be possible to re-document
this part? As all test-cases in kernel
do not use 2-steps - how about just
removing that part from man page?
Suggesting another fd would be strange. :)


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-25 16:15   ` stsp
@ 2024-11-25 16:58     ` Peter Xu
  2024-11-25 17:07       ` stsp
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2024-11-25 16:58 UTC (permalink / raw)
  To: stsp; +Cc: Linux kernel, Muhammad Usama Anjum, Axel Rasmussen, Mike Rapoport

On Mon, Nov 25, 2024 at 07:15:10PM +0300, stsp wrote:
> 25.11.2024 18:59, Peter Xu пишет:
> > I agree it's slightly confusing but it's intended.  It's like that since
> > the start, so I think we should still keep the behavior.
> > 
> > The userapp needs to create one extra userfaultfd to detect supported
> > features in the kernel.  To use it after a probe request, you'll need to
> > close() the fd, redo the userfaultfd syscall to create another fd.
> Hi Peter, thanks for info.
> Unfortunately man page doesn't
> say that. In fact if it did, I won't be
> using the second userfaultfd just
> for that, anyway. :)

But AFAIU that's the only way to probe kernel userfaultfd features.. so if
we need a probe we need to have two fds.

> 
> Man page clearly talks about
> "the userfaultfd object" (one object)
> when mandating the "two-step handshake".
> I spent hours of head-scratching
> before went looking into the sources,
> and even then I was confident the man
> page is right: people should always assume
> documentation is correct, code is buggy.

Hmm yes.  I didn't pay much attention initially, but then after I read the
latest man-pages/, especially "UFFDIO_API(2const)" I found it looks indeed
wrong in the doc.

In this case we can't change the code because we need to keep it working
like before to not break ABI.  We may still update the doc.

IIUC the two-step was mentioned since this patch:

https://lore.kernel.org/lkml/20230919190206.388896-6-axelrasmussen@google.com/#t

So I also copied Axel and Mike, just to make sure I didn't miss something.

> 
> Would it be possible to re-document
> this part? As all test-cases in kernel
> do not use 2-steps - how about just
> removing that part from man page?
> Suggesting another fd would be strange. :)

I would actually suggest mention another fd is needed for probing features.
But you can wait for some comment from either Axel or Mike to double check
that should be either removed or proposed.

For man-pages contribution in case you're interested, see:

https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/CONTRIBUTING

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-25 16:58     ` Peter Xu
@ 2024-11-25 17:07       ` stsp
  2024-11-25 17:13         ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: stsp @ 2024-11-25 17:07 UTC (permalink / raw)
  To: Peter Xu
  Cc: Linux kernel, Muhammad Usama Anjum, Axel Rasmussen, Mike Rapoport

25.11.2024 19:58, Peter Xu пишет:
> On Mon, Nov 25, 2024 at 07:15:10PM +0300, stsp wrote:
>> Man page clearly talks about
>> "the userfaultfd object" (one object)
>> when mandating the "two-step handshake".
>> I spent hours of head-scratching
>> before went looking into the sources,
>> and even then I was confident the man
>> page is right: people should always assume
>> documentation is correct, code is buggy.
> Hmm yes.  I didn't pay much attention initially, but then after I read the
> latest man-pages/, especially "UFFDIO_API(2const)" I found it looks indeed
> wrong in the doc.
>
> In this case we can't change the code because we need to keep it working
> like before to not break ABI.  We may still update the doc.
I wonder if some non-ABI-breaker
is possible, like eg keep the current
behavior of "features=0", but allow
to (optionally) override that by a
non-0 request? Yes, I've seen kselftests
are trying to double-register after 0,
but IIRC they tried to register wrong
options, which would fail anyway.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-25 17:07       ` stsp
@ 2024-11-25 17:13         ` Peter Xu
  2024-11-25 17:32           ` stsp
  2024-11-25 22:42           ` Axel Rasmussen
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Xu @ 2024-11-25 17:13 UTC (permalink / raw)
  To: stsp; +Cc: Linux kernel, Muhammad Usama Anjum, Axel Rasmussen, Mike Rapoport

On Mon, Nov 25, 2024 at 08:07:34PM +0300, stsp wrote:
> 25.11.2024 19:58, Peter Xu пишет:
> > On Mon, Nov 25, 2024 at 07:15:10PM +0300, stsp wrote:
> > > Man page clearly talks about
> > > "the userfaultfd object" (one object)
> > > when mandating the "two-step handshake".
> > > I spent hours of head-scratching
> > > before went looking into the sources,
> > > and even then I was confident the man
> > > page is right: people should always assume
> > > documentation is correct, code is buggy.
> > Hmm yes.  I didn't pay much attention initially, but then after I read the
> > latest man-pages/, especially "UFFDIO_API(2const)" I found it looks indeed
> > wrong in the doc.
> > 
> > In this case we can't change the code because we need to keep it working
> > like before to not break ABI.  We may still update the doc.
> I wonder if some non-ABI-breaker
> is possible, like eg keep the current
> behavior of "features=0", but allow
> to (optionally) override that by a
> non-0 request? Yes, I've seen kselftests
> are trying to double-register after 0,
> but IIRC they tried to register wrong
> options, which would fail anyway.

Old kernels will fail with -EINVAL, new will succeed.  It's already an ABI
violation, IMHO.

Not to mention I'm not sure what could happen if uffd feature flags can
change on the fly.  Your proposal means it can happen when anon missing
trap is enabled at least.  That's probably unwanted, and unnecessary
complexity to maintain to the kernel.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-25 17:13         ` Peter Xu
@ 2024-11-25 17:32           ` stsp
  2024-11-25 17:44             ` Peter Xu
  2024-11-25 22:42           ` Axel Rasmussen
  1 sibling, 1 reply; 19+ messages in thread
From: stsp @ 2024-11-25 17:32 UTC (permalink / raw)
  To: Peter Xu; +Cc: Linux kernel, Muhammad Usama Anjum

25.11.2024 20:13, Peter Xu пишет:
> Old kernels will fail with -EINVAL, new will succeed.  It's already an ABI
> violation, IMHO.
>
> Not to mention I'm not sure what could happen if uffd feature flags can
> change on the fly.  Your proposal means it can happen when anon missing
> trap is enabled at least.  That's probably unwanted, and unnecessary
> complexity to maintain to the kernel.
OK, thanks for considering.

By the way, as we are at it, I have
this usage question. I initially intended
to use UFFD_FEATURE_WP_ASYNC, but
it appears (and is documented so) to not
deliver any notification.
Why not?
I am currently using UFFD_FEATURE_PAGEFAULT_FLAG_WP,
but I only want to monitor the fact that
the page was written to. With
UFFD_FEATURE_WP_ASYNC it would be
much faster, as the kernel resolves the
fault for me. Yes, I've seen the mentioning
of /proc/pages in docs (I don't even have
/proc/pages - perhaps it was ment to be
/proc/<pid>/pages?), but why such a
complexity if all I need is the notification
similar to what I get from
UFFD_FEATURE_PAGEFAULT_FLAG_WP?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-25 17:32           ` stsp
@ 2024-11-25 17:44             ` Peter Xu
  2024-11-25 18:01               ` stsp
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2024-11-25 17:44 UTC (permalink / raw)
  To: stsp; +Cc: Linux kernel, Muhammad Usama Anjum

On Mon, Nov 25, 2024 at 08:32:54PM +0300, stsp wrote:
> 25.11.2024 20:13, Peter Xu пишет:
> > Old kernels will fail with -EINVAL, new will succeed.  It's already an ABI
> > violation, IMHO.
> > 
> > Not to mention I'm not sure what could happen if uffd feature flags can
> > change on the fly.  Your proposal means it can happen when anon missing
> > trap is enabled at least.  That's probably unwanted, and unnecessary
> > complexity to maintain to the kernel.
> OK, thanks for considering.
> 
> By the way, as we are at it, I have
> this usage question. I initially intended
> to use UFFD_FEATURE_WP_ASYNC, but
> it appears (and is documented so) to not
> deliver any notification.
> Why not?
> I am currently using UFFD_FEATURE_PAGEFAULT_FLAG_WP,
> but I only want to monitor the fact that
> the page was written to. With
> UFFD_FEATURE_WP_ASYNC it would be
> much faster, as the kernel resolves the
> fault for me. Yes, I've seen the mentioning
> of /proc/pages in docs (I don't even have
> /proc/pages - perhaps it was ment to be
> /proc/<pid>/pages?), but why such a
> complexity if all I need is the notification
> similar to what I get from
> UFFD_FEATURE_PAGEFAULT_FLAG_WP?

Apps who tracks snapshots needs the unmodified pages before being written.
Those cannot rely on kernel resolution because it needs more than "if the
page is written" - it also needs the page data before being written.

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-25 17:44             ` Peter Xu
@ 2024-11-25 18:01               ` stsp
  2024-11-25 18:44                 ` Muhammad Usama Anjum
  0 siblings, 1 reply; 19+ messages in thread
From: stsp @ 2024-11-25 18:01 UTC (permalink / raw)
  To: Peter Xu; +Cc: Linux kernel, Muhammad Usama Anjum

25.11.2024 20:44, Peter Xu пишет:
> Apps who tracks snapshots needs the unmodified pages before being written.
> Those cannot rely on kernel resolution because it needs more than "if the
> page is written" - it also needs the page data before being written.
Say I am writing a frame grabber
(not exactly, but very close to).
I monitor the video buffer of another
process, and "snapshot" it with some
frequency. I only need to know what
pages were modified, to reduce the
bandwidth to an absolute minimum,
and if the process is not playing - then
to not grab anything until it resumes.
UFFD_FEATURE_PAGEFAULT_FLAG_WP
works quite well for me already, but
I envision a huge boost with
UFFD_FEATURE_WP_ASYNC.
What would you suggest for that usage
scenario?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-25 18:01               ` stsp
@ 2024-11-25 18:44                 ` Muhammad Usama Anjum
  2024-11-26  7:32                   ` stsp
  2024-11-26  9:41                   ` stsp
  0 siblings, 2 replies; 19+ messages in thread
From: Muhammad Usama Anjum @ 2024-11-25 18:44 UTC (permalink / raw)
  To: stsp, Peter Xu; +Cc: Usama.Anjum, Linux kernel

On 11/25/24 11:01 PM, stsp wrote:
> 25.11.2024 20:44, Peter Xu пишет:
>> Apps who tracks snapshots needs the unmodified pages before being written.
>> Those cannot rely on kernel resolution because it needs more than "if the
>> page is written" - it also needs the page data before being written.
> Say I am writing a frame grabber
> (not exactly, but very close to).
> I monitor the video buffer of another
> process, and "snapshot" it with some
> frequency. I only need to know what
> pages were modified, to reduce the
> bandwidth to an absolute minimum,
> and if the process is not playing - then
> to not grab anything until it resumes.
> UFFD_FEATURE_PAGEFAULT_FLAG_WP
> works quite well for me already, but
> I envision a huge boost with
> UFFD_FEATURE_WP_ASYNC.
> What would you suggest for that usage
> scenario?
The UFFD_FEATURE_WP_ASYNC was designed for exactly this case.
The IOCTL will return you the modified pages. An example of usage
can be found in selftest/mm/pagemap_ioctl.c.

-- 
BR,
Muhammad Usama Anjum


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-25 17:13         ` Peter Xu
  2024-11-25 17:32           ` stsp
@ 2024-11-25 22:42           ` Axel Rasmussen
  2024-11-26  7:39             ` stsp
  1 sibling, 1 reply; 19+ messages in thread
From: Axel Rasmussen @ 2024-11-25 22:42 UTC (permalink / raw)
  To: Peter Xu; +Cc: stsp, Linux kernel, Muhammad Usama Anjum, Mike Rapoport

On Mon, Nov 25, 2024 at 9:13 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Mon, Nov 25, 2024 at 08:07:34PM +0300, stsp wrote:
> > 25.11.2024 19:58, Peter Xu пишет:
> > > On Mon, Nov 25, 2024 at 07:15:10PM +0300, stsp wrote:
> > > > Man page clearly talks about
> > > > "the userfaultfd object" (one object)
> > > > when mandating the "two-step handshake".
> > > > I spent hours of head-scratching
> > > > before went looking into the sources,
> > > > and even then I was confident the man
> > > > page is right: people should always assume
> > > > documentation is correct, code is buggy.
> > > Hmm yes.  I didn't pay much attention initially, but then after I read the
> > > latest man-pages/, especially "UFFDIO_API(2const)" I found it looks indeed
> > > wrong in the doc.
> > >
> > > In this case we can't change the code because we need to keep it working
> > > like before to not break ABI.  We may still update the doc.
> > I wonder if some non-ABI-breaker
> > is possible, like eg keep the current
> > behavior of "features=0", but allow
> > to (optionally) override that by a
> > non-0 request? Yes, I've seen kselftests
> > are trying to double-register after 0,
> > but IIRC they tried to register wrong
> > options, which would fail anyway.
>
> Old kernels will fail with -EINVAL, new will succeed.  It's already an ABI
> violation, IMHO.
>
> Not to mention I'm not sure what could happen if uffd feature flags can
> change on the fly.  Your proposal means it can happen when anon missing
> trap is enabled at least.  That's probably unwanted, and unnecessary
> complexity to maintain to the kernel.
>
> Thanks,
>
> --
> Peter Xu

I agree with Peter, we should just update the man page to mention
UFFDIO_API can only be called once, so probing requires a second
userfaultfd. I think not mentioning that was just an oversight from
the last time I updated the man page.

For what it's worth, I still don't like the two-step handshake design,
my preference is still an API like this:

1. userspace asks for the features it wants
2. kernel responds with the (possibly subset of) features it actually supports
3. userspace is free to carry on with perhaps limited features, or
exit with error, or ...

But, I think at that point the ship has already sailed. I think to
maintain compatibility with existing programs there isn't much we can
do at this point. Also from the last time we discussed this I think I
am somewhat alone preferring that design. :)

Also I agree that allowing > 1 UFFDIO_API calls is potentially tricky.
Calling it with features = 0 isn't just for probing, it also gives you
a fully initialized userfaultfd which could be used out of the box (if
you don't need any additional features).

If nobody else is interested in doing it I can send a patch to fixup
the man page at some point in the near future.

>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-25 18:44                 ` Muhammad Usama Anjum
@ 2024-11-26  7:32                   ` stsp
  2024-11-26 15:56                     ` Peter Xu
  2024-11-26  9:41                   ` stsp
  1 sibling, 1 reply; 19+ messages in thread
From: stsp @ 2024-11-26  7:32 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Peter Xu; +Cc: Linux kernel

25.11.2024 21:44, Muhammad Usama Anjum пишет:
> The UFFD_FEATURE_WP_ASYNC was designed for exactly this case.
> The IOCTL will return you the modified pages. An example of usage
> can be found in selftest/mm/pagemap_ioctl.c.
Thank you!
I studied the examples.
They are quite clear and good
for copy/pasting purposes.

I had yet another "problem":
I tried to create the usefaultfd,
then register the API and fork()
the process. In child I do UFFDIO_REGISTER,
but the parent can't see that.
So instead of copy by fork, I
had to use SCM_RIGHTS.
Is this expected, or should it
work fine with forked fd? That
would be a bit simpler than to
apply to SCM_RIGHTS tricks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-25 22:42           ` Axel Rasmussen
@ 2024-11-26  7:39             ` stsp
  2024-11-26 15:50               ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: stsp @ 2024-11-26  7:39 UTC (permalink / raw)
  To: Axel Rasmussen, Peter Xu
  Cc: Linux kernel, Muhammad Usama Anjum, Mike Rapoport

26.11.2024 01:42, Axel Rasmussen пишет:
> For what it's worth, I still don't like the two-step handshake design,
> my preference is still an API like this:
>
> 1. userspace asks for the features it wants
> 2. kernel responds with the (possibly subset of) features it actually supports
> 3. userspace is free to carry on with perhaps limited features, or
> exit with error, or ...

4. pass the needed features to
UFFDIO_REGISTER, correct?

> But, I think at that point the ship has already sailed. I think to
> maintain compatibility with existing programs there isn't much we can
> do at this point.
Please, just why do you have that UFFD_API
const? Only to call every screw-up like this
one, a sailed ship? :)
Why not to add UFFD_API_v2?
Then UFFD_API_v3?
Full binary and source compatibility is
therefore preserved, you only need to
update the man page to document the
latest one.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-25 18:44                 ` Muhammad Usama Anjum
  2024-11-26  7:32                   ` stsp
@ 2024-11-26  9:41                   ` stsp
  1 sibling, 0 replies; 19+ messages in thread
From: stsp @ 2024-11-26  9:41 UTC (permalink / raw)
  To: Muhammad Usama Anjum, Peter Xu; +Cc: Linux kernel

25.11.2024 21:44, Muhammad Usama Anjum пишет:
> The UFFD_FEATURE_WP_ASYNC was designed for exactly this case.
> The IOCTL will return you the modified pages. An example of usage
> can be found in selftest/mm/pagemap_ioctl.c.
Am I right that in this case there
are no more async notifications?
I.e. I need to run PAGEMAP_SCAN
ioctl() periodically, and there is
nothing to "ping" me on that?
Or have I missed that part in the
pagemap_ioctl.c?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-26  7:39             ` stsp
@ 2024-11-26 15:50               ` Peter Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2024-11-26 15:50 UTC (permalink / raw)
  To: stsp; +Cc: Axel Rasmussen, Linux kernel, Muhammad Usama Anjum, Mike Rapoport

On Tue, Nov 26, 2024 at 10:39:19AM +0300, stsp wrote:
> Please, just why do you have that UFFD_API
> const? Only to call every screw-up like this
> one, a sailed ship? :)
> Why not to add UFFD_API_v2?
> Then UFFD_API_v3?
> Full binary and source compatibility is
> therefore preserved, you only need to
> update the man page to document the
> latest one.

We could.  I'd say we only need UFFDIO_FEATURE to fetch the features, if we
want the interface clean.  I just never feel strongly to add it.

"Creating two fds" is indeed awkward, but that isn't too bad either when
the userapp can easily wrap a function to do that, open+close the fd
within.  Actually that's what our unit test was doing.

int uffd_get_features(uint64_t *features)
{
	struct uffdio_api uffdio_api = { .api = UFFD_API, .features = 0 };
	/*
	 * This should by default work in most kernels; the feature list
	 * will be the same no matter what we pass in here.
	 */
	int fd = uffd_open(UFFD_USER_MODE_ONLY);

	if (fd < 0)
		/* Maybe the kernel is older than user-only mode? */
		fd = uffd_open(0);

	if (fd < 0)
		return fd;

	if (ioctl(fd, UFFDIO_API, &uffdio_api)) {
		close(fd);
		return -errno;
	}

	*features = uffdio_api.features;
	close(fd);

	return 0;
}

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-26  7:32                   ` stsp
@ 2024-11-26 15:56                     ` Peter Xu
  2024-11-26 16:16                       ` stsp
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Xu @ 2024-11-26 15:56 UTC (permalink / raw)
  To: stsp; +Cc: Muhammad Usama Anjum, Linux kernel

On Tue, Nov 26, 2024 at 10:32:28AM +0300, stsp wrote:
> 25.11.2024 21:44, Muhammad Usama Anjum пишет:
> > The UFFD_FEATURE_WP_ASYNC was designed for exactly this case.
> > The IOCTL will return you the modified pages. An example of usage
> > can be found in selftest/mm/pagemap_ioctl.c.
> Thank you!
> I studied the examples.
> They are quite clear and good
> for copy/pasting purposes.
> 
> I had yet another "problem":
> I tried to create the usefaultfd,
> then register the API and fork()
> the process. In child I do UFFDIO_REGISTER,

This doesn't sound like the right thing to do.. as the fd (returned from
syscall(userfaultfd)) should be linked to a specific mm.  If the parent
invoked that syscall, it's linked to the parent address space, not child.
You may want to do syscall(userfalut) in child process, then pass it over
with scm rights.  Otherwise IIUC the trap will be armed on parent virtual
address space.

> but the parent can't see that.
> So instead of copy by fork, I
> had to use SCM_RIGHTS.
> Is this expected, or should it
> work fine with forked fd? That
> would be a bit simpler than to
> apply to SCM_RIGHTS tricks.

If child is the process you'd like to monitor, I think scm rights is
exactly the right approach.  Otherwise you can have a look at
UFFD_FEATURE_EVENT_FORK.  However that's not designed for child-only traps,
IIUC.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-26 15:56                     ` Peter Xu
@ 2024-11-26 16:16                       ` stsp
  2024-11-26 17:41                         ` Peter Xu
  0 siblings, 1 reply; 19+ messages in thread
From: stsp @ 2024-11-26 16:16 UTC (permalink / raw)
  To: Peter Xu; +Cc: Muhammad Usama Anjum, Linux kernel

26.11.2024 18:56, Peter Xu пишет:
> This doesn't sound like the right thing to do.. as the fd (returned from
> syscall(userfaultfd)) should be linked to a specific mm.  If the parent
> invoked that syscall, it's linked to the parent address space, not child.
> You may want to do syscall(userfalut) in child process, then pass it over
> with scm rights.  Otherwise IIUC the trap will be armed on parent virtual
> address space.
Ok, thanks for info.
man page doesn't seem to describe
the multi-process case, so both fork()
and SCM_RIGHTS were just a guesses
on my side, one of which worked.
Probably something to add to the doc.

The last problem I had (last one, I promise! :)
is that if I remove O_NONBLOCK, then
the entire app hangs. It turns out, w/o
O_NONBLOCK, userfaultfd's fd awakes
the select() call with the ready-to-read
descriptor at the very beginning, long
before any fault is detected. Then it
goes to read() and blocks forever. My
code is not prepared for read() blocking
after select().
I then checked and double-checked
and re-checked that with O_NONBLOCK
nothing like that happens at all: select()
is not awaken until the faults are coming.
It could be that select awakes anyway
but read() doesn block, but no, its not
the case. In nonblock mode select()
awakes only when it should. And in
blocking mode - it awakes immediately,
leading to a hang.
Is this a bug?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL
  2024-11-26 16:16                       ` stsp
@ 2024-11-26 17:41                         ` Peter Xu
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Xu @ 2024-11-26 17:41 UTC (permalink / raw)
  To: stsp; +Cc: Muhammad Usama Anjum, Linux kernel

On Tue, Nov 26, 2024 at 07:16:19PM +0300, stsp wrote:
> 26.11.2024 18:56, Peter Xu пишет:
> > This doesn't sound like the right thing to do.. as the fd (returned from
> > syscall(userfaultfd)) should be linked to a specific mm.  If the parent
> > invoked that syscall, it's linked to the parent address space, not child.
> > You may want to do syscall(userfalut) in child process, then pass it over
> > with scm rights.  Otherwise IIUC the trap will be armed on parent virtual
> > address space.
> Ok, thanks for info.
> man page doesn't seem to describe
> the multi-process case, so both fork()
> and SCM_RIGHTS were just a guesses
> on my side, one of which worked.
> Probably something to add to the doc.
> 
> The last problem I had (last one, I promise! :)
> is that if I remove O_NONBLOCK, then
> the entire app hangs. It turns out, w/o
> O_NONBLOCK, userfaultfd's fd awakes
> the select() call with the ready-to-read
> descriptor at the very beginning, long

I highly suspect it's not a real POLLIN, but POLLERR.  See:

userfaultfd_poll():
	/*
	 * poll() never guarantees that read won't block.
	 * userfaults can be waken before they're read().
	 */
	if (unlikely(!(file->f_flags & O_NONBLOCK)))
		return EPOLLERR;

I suppose select() will report that in readfds[].

> before any fault is detected. Then it
> goes to read() and blocks forever. My
> code is not prepared for read() blocking
> after select().
> I then checked and double-checked
> and re-checked that with O_NONBLOCK
> nothing like that happens at all: select()
> is not awaken until the faults are coming.
> It could be that select awakes anyway
> but read() doesn block, but no, its not
> the case. In nonblock mode select()
> awakes only when it should. And in
> blocking mode - it awakes immediately,
> leading to a hang.
> Is this a bug?

Not a bug, but, AFAIU, a design decision.  If you're interested, you can
read commit ba85c702e4b.

Userfaultfd is a special kind of fd, and poll()/select() doesn't always
mean that the next read() is not going to block.  Fundamentally it's
because data-ready event is based on waitqueue, while waitqueue can change
between a select() v.s. a read() later, so the waited entry can be removed
within the short period.

In short, please stick with NONBLOCK on userfaultfd.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-11-26 17:41 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-23 15:13 userfaultfd: two-step UFFDIO_API always gives -EINVAL stsp
2024-11-25  9:05 ` stsp
2024-11-25 15:59 ` Peter Xu
2024-11-25 16:15   ` stsp
2024-11-25 16:58     ` Peter Xu
2024-11-25 17:07       ` stsp
2024-11-25 17:13         ` Peter Xu
2024-11-25 17:32           ` stsp
2024-11-25 17:44             ` Peter Xu
2024-11-25 18:01               ` stsp
2024-11-25 18:44                 ` Muhammad Usama Anjum
2024-11-26  7:32                   ` stsp
2024-11-26 15:56                     ` Peter Xu
2024-11-26 16:16                       ` stsp
2024-11-26 17:41                         ` Peter Xu
2024-11-26  9:41                   ` stsp
2024-11-25 22:42           ` Axel Rasmussen
2024-11-26  7:39             ` stsp
2024-11-26 15:50               ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox