* userfaultfd: two-step UFFDIO_API always gives -EINVAL @ 2024-11-23 15:13 stsp 2024-11-25 9:05 ` stsp 2024-11-25 15:59 ` Peter Xu 0 siblings, 2 replies; 19+ messages in thread From: stsp @ 2024-11-23 15:13 UTC (permalink / raw) To: Linux kernel; +Cc: Muhammad Usama Anjum, Peter Xu Hello. I tried to use userfaultfd and got that strange result: when I first do UFFDIO_API ioctl with features = 0, it succeeds. I check the needed features, and they are all in place. But on the second step, where I request the needed features, UFFDIO_API gives -EINVAL no matter what features I requested (or even set features to 0 again). A quick look into the kernel code suggests that the problem is that uffd_ctx_features() doesn't check user_features for being 0, and just sets UFFD_FEATURE_INITIALIZED with no features at all. After that, userfaultfd_api() should always fail with -EINVAL when doing this: ctx_features = uffd_ctx_features(features); ret = -EINVAL; if (cmpxchg(&ctx->features, 0, ctx_features) != 0) goto err_out; But I haven't checked my finding by rebuilding the kernel. So is this broken or am I missing something? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-23 15:13 userfaultfd: two-step UFFDIO_API always gives -EINVAL stsp @ 2024-11-25 9:05 ` stsp 2024-11-25 15:59 ` Peter Xu 1 sibling, 0 replies; 19+ messages in thread From: stsp @ 2024-11-25 9:05 UTC (permalink / raw) To: Linux kernel; +Cc: Muhammad Usama Anjum, Peter Xu 23.11.2024 18:13, stsp пишет: > Hello. > > I tried to use userfaultfd and got > that strange result: when I first do > UFFDIO_API ioctl with features = 0, > it succeeds. I check the needed > features, and they are all in place. > But on the second step, where I > request the needed features, > UFFDIO_API gives -EINVAL no matter > what features I requested (or even > set features to 0 again). With the test patch below, the problem can be reproduced. All the code in selftests suggest that UFFDIO_API should not be called twice, whereas man page says this: ``` After the userfaultfd object is created with userfaultfd(), the applica‐ tion must enable it using the UFFDIO_API ioctl(2) operation. This oper‐ ation allows a two-step handshake between the kernel and user space to determine what API version and features the kernel supports, and then to enable those features user space wants. ``` But the second part doesn't work and is never being tested in selftests. So is this a documentation problem? This patch can be used to make sure the second call doesn't work: --- a/tools/testing/selftests/mm/uffd-unit-tests.c +++ b/tools/testing/selftests/mm/uffd-unit-tests.c @@ -171,6 +171,14 @@ static int test_uffd_api(bool use_dev) goto out; } + /* Request valid feature via UFFDIO_API */ + uffdio_api.api = UFFD_API; + uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP; + if (ioctl(uffd, UFFDIO_API, &uffdio_api)) { + uffd_test_fail("UFFDIO_API should succeed but failed"); + exit(1); + } + uffd_test_pass(); out: close(uffd); ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-23 15:13 userfaultfd: two-step UFFDIO_API always gives -EINVAL stsp 2024-11-25 9:05 ` stsp @ 2024-11-25 15:59 ` Peter Xu 2024-11-25 16:15 ` stsp 1 sibling, 1 reply; 19+ messages in thread From: Peter Xu @ 2024-11-25 15:59 UTC (permalink / raw) To: stsp; +Cc: Linux kernel, Muhammad Usama Anjum On Sat, Nov 23, 2024 at 06:13:01PM +0300, stsp wrote: > Hello. > > I tried to use userfaultfd and got > that strange result: when I first do > UFFDIO_API ioctl with features = 0, > it succeeds. I check the needed > features, and they are all in place. > But on the second step, where I > request the needed features, > UFFDIO_API gives -EINVAL no matter > what features I requested (or even > set features to 0 again). > > A quick look into the kernel code > suggests that the problem is that > uffd_ctx_features() doesn't check > user_features for being 0, and just > sets UFFD_FEATURE_INITIALIZED > with no features at all. After that, > userfaultfd_api() should always > fail with -EINVAL when doing this: > > ctx_features = uffd_ctx_features(features); > ret = -EINVAL; > if (cmpxchg(&ctx->features, 0, ctx_features) != 0) > goto err_out; > > But I haven't checked my finding > by rebuilding the kernel. > So is this broken or am I missing > something? I agree it's slightly confusing but it's intended. It's like that since the start, so I think we should still keep the behavior. The userapp needs to create one extra userfaultfd to detect supported features in the kernel. To use it after a probe request, you'll need to close() the fd, redo the userfaultfd syscall to create another fd. The kernel cannot assume features==0 to be a pure query, because not all userfaultfd features requires setting of a feature bit. E.g., the default anonymous missing traps doesn't require any feature bit to set. So the initial UFFDIO_API(features=0) is the enablement of such feature. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-25 15:59 ` Peter Xu @ 2024-11-25 16:15 ` stsp 2024-11-25 16:58 ` Peter Xu 0 siblings, 1 reply; 19+ messages in thread From: stsp @ 2024-11-25 16:15 UTC (permalink / raw) To: Peter Xu; +Cc: Linux kernel, Muhammad Usama Anjum 25.11.2024 18:59, Peter Xu пишет: > I agree it's slightly confusing but it's intended. It's like that since > the start, so I think we should still keep the behavior. > > The userapp needs to create one extra userfaultfd to detect supported > features in the kernel. To use it after a probe request, you'll need to > close() the fd, redo the userfaultfd syscall to create another fd. Hi Peter, thanks for info. Unfortunately man page doesn't say that. In fact if it did, I won't be using the second userfaultfd just for that, anyway. :) Man page clearly talks about "the userfaultfd object" (one object) when mandating the "two-step handshake". I spent hours of head-scratching before went looking into the sources, and even then I was confident the man page is right: people should always assume documentation is correct, code is buggy. Would it be possible to re-document this part? As all test-cases in kernel do not use 2-steps - how about just removing that part from man page? Suggesting another fd would be strange. :) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-25 16:15 ` stsp @ 2024-11-25 16:58 ` Peter Xu 2024-11-25 17:07 ` stsp 0 siblings, 1 reply; 19+ messages in thread From: Peter Xu @ 2024-11-25 16:58 UTC (permalink / raw) To: stsp; +Cc: Linux kernel, Muhammad Usama Anjum, Axel Rasmussen, Mike Rapoport On Mon, Nov 25, 2024 at 07:15:10PM +0300, stsp wrote: > 25.11.2024 18:59, Peter Xu пишет: > > I agree it's slightly confusing but it's intended. It's like that since > > the start, so I think we should still keep the behavior. > > > > The userapp needs to create one extra userfaultfd to detect supported > > features in the kernel. To use it after a probe request, you'll need to > > close() the fd, redo the userfaultfd syscall to create another fd. > Hi Peter, thanks for info. > Unfortunately man page doesn't > say that. In fact if it did, I won't be > using the second userfaultfd just > for that, anyway. :) But AFAIU that's the only way to probe kernel userfaultfd features.. so if we need a probe we need to have two fds. > > Man page clearly talks about > "the userfaultfd object" (one object) > when mandating the "two-step handshake". > I spent hours of head-scratching > before went looking into the sources, > and even then I was confident the man > page is right: people should always assume > documentation is correct, code is buggy. Hmm yes. I didn't pay much attention initially, but then after I read the latest man-pages/, especially "UFFDIO_API(2const)" I found it looks indeed wrong in the doc. In this case we can't change the code because we need to keep it working like before to not break ABI. We may still update the doc. IIUC the two-step was mentioned since this patch: https://lore.kernel.org/lkml/20230919190206.388896-6-axelrasmussen@google.com/#t So I also copied Axel and Mike, just to make sure I didn't miss something. > > Would it be possible to re-document > this part? As all test-cases in kernel > do not use 2-steps - how about just > removing that part from man page? > Suggesting another fd would be strange. :) I would actually suggest mention another fd is needed for probing features. But you can wait for some comment from either Axel or Mike to double check that should be either removed or proposed. For man-pages contribution in case you're interested, see: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/CONTRIBUTING Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-25 16:58 ` Peter Xu @ 2024-11-25 17:07 ` stsp 2024-11-25 17:13 ` Peter Xu 0 siblings, 1 reply; 19+ messages in thread From: stsp @ 2024-11-25 17:07 UTC (permalink / raw) To: Peter Xu Cc: Linux kernel, Muhammad Usama Anjum, Axel Rasmussen, Mike Rapoport 25.11.2024 19:58, Peter Xu пишет: > On Mon, Nov 25, 2024 at 07:15:10PM +0300, stsp wrote: >> Man page clearly talks about >> "the userfaultfd object" (one object) >> when mandating the "two-step handshake". >> I spent hours of head-scratching >> before went looking into the sources, >> and even then I was confident the man >> page is right: people should always assume >> documentation is correct, code is buggy. > Hmm yes. I didn't pay much attention initially, but then after I read the > latest man-pages/, especially "UFFDIO_API(2const)" I found it looks indeed > wrong in the doc. > > In this case we can't change the code because we need to keep it working > like before to not break ABI. We may still update the doc. I wonder if some non-ABI-breaker is possible, like eg keep the current behavior of "features=0", but allow to (optionally) override that by a non-0 request? Yes, I've seen kselftests are trying to double-register after 0, but IIRC they tried to register wrong options, which would fail anyway. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-25 17:07 ` stsp @ 2024-11-25 17:13 ` Peter Xu 2024-11-25 17:32 ` stsp 2024-11-25 22:42 ` Axel Rasmussen 0 siblings, 2 replies; 19+ messages in thread From: Peter Xu @ 2024-11-25 17:13 UTC (permalink / raw) To: stsp; +Cc: Linux kernel, Muhammad Usama Anjum, Axel Rasmussen, Mike Rapoport On Mon, Nov 25, 2024 at 08:07:34PM +0300, stsp wrote: > 25.11.2024 19:58, Peter Xu пишет: > > On Mon, Nov 25, 2024 at 07:15:10PM +0300, stsp wrote: > > > Man page clearly talks about > > > "the userfaultfd object" (one object) > > > when mandating the "two-step handshake". > > > I spent hours of head-scratching > > > before went looking into the sources, > > > and even then I was confident the man > > > page is right: people should always assume > > > documentation is correct, code is buggy. > > Hmm yes. I didn't pay much attention initially, but then after I read the > > latest man-pages/, especially "UFFDIO_API(2const)" I found it looks indeed > > wrong in the doc. > > > > In this case we can't change the code because we need to keep it working > > like before to not break ABI. We may still update the doc. > I wonder if some non-ABI-breaker > is possible, like eg keep the current > behavior of "features=0", but allow > to (optionally) override that by a > non-0 request? Yes, I've seen kselftests > are trying to double-register after 0, > but IIRC they tried to register wrong > options, which would fail anyway. Old kernels will fail with -EINVAL, new will succeed. It's already an ABI violation, IMHO. Not to mention I'm not sure what could happen if uffd feature flags can change on the fly. Your proposal means it can happen when anon missing trap is enabled at least. That's probably unwanted, and unnecessary complexity to maintain to the kernel. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-25 17:13 ` Peter Xu @ 2024-11-25 17:32 ` stsp 2024-11-25 17:44 ` Peter Xu 2024-11-25 22:42 ` Axel Rasmussen 1 sibling, 1 reply; 19+ messages in thread From: stsp @ 2024-11-25 17:32 UTC (permalink / raw) To: Peter Xu; +Cc: Linux kernel, Muhammad Usama Anjum 25.11.2024 20:13, Peter Xu пишет: > Old kernels will fail with -EINVAL, new will succeed. It's already an ABI > violation, IMHO. > > Not to mention I'm not sure what could happen if uffd feature flags can > change on the fly. Your proposal means it can happen when anon missing > trap is enabled at least. That's probably unwanted, and unnecessary > complexity to maintain to the kernel. OK, thanks for considering. By the way, as we are at it, I have this usage question. I initially intended to use UFFD_FEATURE_WP_ASYNC, but it appears (and is documented so) to not deliver any notification. Why not? I am currently using UFFD_FEATURE_PAGEFAULT_FLAG_WP, but I only want to monitor the fact that the page was written to. With UFFD_FEATURE_WP_ASYNC it would be much faster, as the kernel resolves the fault for me. Yes, I've seen the mentioning of /proc/pages in docs (I don't even have /proc/pages - perhaps it was ment to be /proc/<pid>/pages?), but why such a complexity if all I need is the notification similar to what I get from UFFD_FEATURE_PAGEFAULT_FLAG_WP? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-25 17:32 ` stsp @ 2024-11-25 17:44 ` Peter Xu 2024-11-25 18:01 ` stsp 0 siblings, 1 reply; 19+ messages in thread From: Peter Xu @ 2024-11-25 17:44 UTC (permalink / raw) To: stsp; +Cc: Linux kernel, Muhammad Usama Anjum On Mon, Nov 25, 2024 at 08:32:54PM +0300, stsp wrote: > 25.11.2024 20:13, Peter Xu пишет: > > Old kernels will fail with -EINVAL, new will succeed. It's already an ABI > > violation, IMHO. > > > > Not to mention I'm not sure what could happen if uffd feature flags can > > change on the fly. Your proposal means it can happen when anon missing > > trap is enabled at least. That's probably unwanted, and unnecessary > > complexity to maintain to the kernel. > OK, thanks for considering. > > By the way, as we are at it, I have > this usage question. I initially intended > to use UFFD_FEATURE_WP_ASYNC, but > it appears (and is documented so) to not > deliver any notification. > Why not? > I am currently using UFFD_FEATURE_PAGEFAULT_FLAG_WP, > but I only want to monitor the fact that > the page was written to. With > UFFD_FEATURE_WP_ASYNC it would be > much faster, as the kernel resolves the > fault for me. Yes, I've seen the mentioning > of /proc/pages in docs (I don't even have > /proc/pages - perhaps it was ment to be > /proc/<pid>/pages?), but why such a > complexity if all I need is the notification > similar to what I get from > UFFD_FEATURE_PAGEFAULT_FLAG_WP? Apps who tracks snapshots needs the unmodified pages before being written. Those cannot rely on kernel resolution because it needs more than "if the page is written" - it also needs the page data before being written. -- Peter Xu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-25 17:44 ` Peter Xu @ 2024-11-25 18:01 ` stsp 2024-11-25 18:44 ` Muhammad Usama Anjum 0 siblings, 1 reply; 19+ messages in thread From: stsp @ 2024-11-25 18:01 UTC (permalink / raw) To: Peter Xu; +Cc: Linux kernel, Muhammad Usama Anjum 25.11.2024 20:44, Peter Xu пишет: > Apps who tracks snapshots needs the unmodified pages before being written. > Those cannot rely on kernel resolution because it needs more than "if the > page is written" - it also needs the page data before being written. Say I am writing a frame grabber (not exactly, but very close to). I monitor the video buffer of another process, and "snapshot" it with some frequency. I only need to know what pages were modified, to reduce the bandwidth to an absolute minimum, and if the process is not playing - then to not grab anything until it resumes. UFFD_FEATURE_PAGEFAULT_FLAG_WP works quite well for me already, but I envision a huge boost with UFFD_FEATURE_WP_ASYNC. What would you suggest for that usage scenario? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-25 18:01 ` stsp @ 2024-11-25 18:44 ` Muhammad Usama Anjum 2024-11-26 7:32 ` stsp 2024-11-26 9:41 ` stsp 0 siblings, 2 replies; 19+ messages in thread From: Muhammad Usama Anjum @ 2024-11-25 18:44 UTC (permalink / raw) To: stsp, Peter Xu; +Cc: Usama.Anjum, Linux kernel On 11/25/24 11:01 PM, stsp wrote: > 25.11.2024 20:44, Peter Xu пишет: >> Apps who tracks snapshots needs the unmodified pages before being written. >> Those cannot rely on kernel resolution because it needs more than "if the >> page is written" - it also needs the page data before being written. > Say I am writing a frame grabber > (not exactly, but very close to). > I monitor the video buffer of another > process, and "snapshot" it with some > frequency. I only need to know what > pages were modified, to reduce the > bandwidth to an absolute minimum, > and if the process is not playing - then > to not grab anything until it resumes. > UFFD_FEATURE_PAGEFAULT_FLAG_WP > works quite well for me already, but > I envision a huge boost with > UFFD_FEATURE_WP_ASYNC. > What would you suggest for that usage > scenario? The UFFD_FEATURE_WP_ASYNC was designed for exactly this case. The IOCTL will return you the modified pages. An example of usage can be found in selftest/mm/pagemap_ioctl.c. -- BR, Muhammad Usama Anjum ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-25 18:44 ` Muhammad Usama Anjum @ 2024-11-26 7:32 ` stsp 2024-11-26 15:56 ` Peter Xu 2024-11-26 9:41 ` stsp 1 sibling, 1 reply; 19+ messages in thread From: stsp @ 2024-11-26 7:32 UTC (permalink / raw) To: Muhammad Usama Anjum, Peter Xu; +Cc: Linux kernel 25.11.2024 21:44, Muhammad Usama Anjum пишет: > The UFFD_FEATURE_WP_ASYNC was designed for exactly this case. > The IOCTL will return you the modified pages. An example of usage > can be found in selftest/mm/pagemap_ioctl.c. Thank you! I studied the examples. They are quite clear and good for copy/pasting purposes. I had yet another "problem": I tried to create the usefaultfd, then register the API and fork() the process. In child I do UFFDIO_REGISTER, but the parent can't see that. So instead of copy by fork, I had to use SCM_RIGHTS. Is this expected, or should it work fine with forked fd? That would be a bit simpler than to apply to SCM_RIGHTS tricks. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-26 7:32 ` stsp @ 2024-11-26 15:56 ` Peter Xu 2024-11-26 16:16 ` stsp 0 siblings, 1 reply; 19+ messages in thread From: Peter Xu @ 2024-11-26 15:56 UTC (permalink / raw) To: stsp; +Cc: Muhammad Usama Anjum, Linux kernel On Tue, Nov 26, 2024 at 10:32:28AM +0300, stsp wrote: > 25.11.2024 21:44, Muhammad Usama Anjum пишет: > > The UFFD_FEATURE_WP_ASYNC was designed for exactly this case. > > The IOCTL will return you the modified pages. An example of usage > > can be found in selftest/mm/pagemap_ioctl.c. > Thank you! > I studied the examples. > They are quite clear and good > for copy/pasting purposes. > > I had yet another "problem": > I tried to create the usefaultfd, > then register the API and fork() > the process. In child I do UFFDIO_REGISTER, This doesn't sound like the right thing to do.. as the fd (returned from syscall(userfaultfd)) should be linked to a specific mm. If the parent invoked that syscall, it's linked to the parent address space, not child. You may want to do syscall(userfalut) in child process, then pass it over with scm rights. Otherwise IIUC the trap will be armed on parent virtual address space. > but the parent can't see that. > So instead of copy by fork, I > had to use SCM_RIGHTS. > Is this expected, or should it > work fine with forked fd? That > would be a bit simpler than to > apply to SCM_RIGHTS tricks. If child is the process you'd like to monitor, I think scm rights is exactly the right approach. Otherwise you can have a look at UFFD_FEATURE_EVENT_FORK. However that's not designed for child-only traps, IIUC. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-26 15:56 ` Peter Xu @ 2024-11-26 16:16 ` stsp 2024-11-26 17:41 ` Peter Xu 0 siblings, 1 reply; 19+ messages in thread From: stsp @ 2024-11-26 16:16 UTC (permalink / raw) To: Peter Xu; +Cc: Muhammad Usama Anjum, Linux kernel 26.11.2024 18:56, Peter Xu пишет: > This doesn't sound like the right thing to do.. as the fd (returned from > syscall(userfaultfd)) should be linked to a specific mm. If the parent > invoked that syscall, it's linked to the parent address space, not child. > You may want to do syscall(userfalut) in child process, then pass it over > with scm rights. Otherwise IIUC the trap will be armed on parent virtual > address space. Ok, thanks for info. man page doesn't seem to describe the multi-process case, so both fork() and SCM_RIGHTS were just a guesses on my side, one of which worked. Probably something to add to the doc. The last problem I had (last one, I promise! :) is that if I remove O_NONBLOCK, then the entire app hangs. It turns out, w/o O_NONBLOCK, userfaultfd's fd awakes the select() call with the ready-to-read descriptor at the very beginning, long before any fault is detected. Then it goes to read() and blocks forever. My code is not prepared for read() blocking after select(). I then checked and double-checked and re-checked that with O_NONBLOCK nothing like that happens at all: select() is not awaken until the faults are coming. It could be that select awakes anyway but read() doesn block, but no, its not the case. In nonblock mode select() awakes only when it should. And in blocking mode - it awakes immediately, leading to a hang. Is this a bug? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-26 16:16 ` stsp @ 2024-11-26 17:41 ` Peter Xu 0 siblings, 0 replies; 19+ messages in thread From: Peter Xu @ 2024-11-26 17:41 UTC (permalink / raw) To: stsp; +Cc: Muhammad Usama Anjum, Linux kernel On Tue, Nov 26, 2024 at 07:16:19PM +0300, stsp wrote: > 26.11.2024 18:56, Peter Xu пишет: > > This doesn't sound like the right thing to do.. as the fd (returned from > > syscall(userfaultfd)) should be linked to a specific mm. If the parent > > invoked that syscall, it's linked to the parent address space, not child. > > You may want to do syscall(userfalut) in child process, then pass it over > > with scm rights. Otherwise IIUC the trap will be armed on parent virtual > > address space. > Ok, thanks for info. > man page doesn't seem to describe > the multi-process case, so both fork() > and SCM_RIGHTS were just a guesses > on my side, one of which worked. > Probably something to add to the doc. > > The last problem I had (last one, I promise! :) > is that if I remove O_NONBLOCK, then > the entire app hangs. It turns out, w/o > O_NONBLOCK, userfaultfd's fd awakes > the select() call with the ready-to-read > descriptor at the very beginning, long I highly suspect it's not a real POLLIN, but POLLERR. See: userfaultfd_poll(): /* * poll() never guarantees that read won't block. * userfaults can be waken before they're read(). */ if (unlikely(!(file->f_flags & O_NONBLOCK))) return EPOLLERR; I suppose select() will report that in readfds[]. > before any fault is detected. Then it > goes to read() and blocks forever. My > code is not prepared for read() blocking > after select(). > I then checked and double-checked > and re-checked that with O_NONBLOCK > nothing like that happens at all: select() > is not awaken until the faults are coming. > It could be that select awakes anyway > but read() doesn block, but no, its not > the case. In nonblock mode select() > awakes only when it should. And in > blocking mode - it awakes immediately, > leading to a hang. > Is this a bug? Not a bug, but, AFAIU, a design decision. If you're interested, you can read commit ba85c702e4b. Userfaultfd is a special kind of fd, and poll()/select() doesn't always mean that the next read() is not going to block. Fundamentally it's because data-ready event is based on waitqueue, while waitqueue can change between a select() v.s. a read() later, so the waited entry can be removed within the short period. In short, please stick with NONBLOCK on userfaultfd. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-25 18:44 ` Muhammad Usama Anjum 2024-11-26 7:32 ` stsp @ 2024-11-26 9:41 ` stsp 1 sibling, 0 replies; 19+ messages in thread From: stsp @ 2024-11-26 9:41 UTC (permalink / raw) To: Muhammad Usama Anjum, Peter Xu; +Cc: Linux kernel 25.11.2024 21:44, Muhammad Usama Anjum пишет: > The UFFD_FEATURE_WP_ASYNC was designed for exactly this case. > The IOCTL will return you the modified pages. An example of usage > can be found in selftest/mm/pagemap_ioctl.c. Am I right that in this case there are no more async notifications? I.e. I need to run PAGEMAP_SCAN ioctl() periodically, and there is nothing to "ping" me on that? Or have I missed that part in the pagemap_ioctl.c? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-25 17:13 ` Peter Xu 2024-11-25 17:32 ` stsp @ 2024-11-25 22:42 ` Axel Rasmussen 2024-11-26 7:39 ` stsp 1 sibling, 1 reply; 19+ messages in thread From: Axel Rasmussen @ 2024-11-25 22:42 UTC (permalink / raw) To: Peter Xu; +Cc: stsp, Linux kernel, Muhammad Usama Anjum, Mike Rapoport On Mon, Nov 25, 2024 at 9:13 AM Peter Xu <peterx@redhat.com> wrote: > > On Mon, Nov 25, 2024 at 08:07:34PM +0300, stsp wrote: > > 25.11.2024 19:58, Peter Xu пишет: > > > On Mon, Nov 25, 2024 at 07:15:10PM +0300, stsp wrote: > > > > Man page clearly talks about > > > > "the userfaultfd object" (one object) > > > > when mandating the "two-step handshake". > > > > I spent hours of head-scratching > > > > before went looking into the sources, > > > > and even then I was confident the man > > > > page is right: people should always assume > > > > documentation is correct, code is buggy. > > > Hmm yes. I didn't pay much attention initially, but then after I read the > > > latest man-pages/, especially "UFFDIO_API(2const)" I found it looks indeed > > > wrong in the doc. > > > > > > In this case we can't change the code because we need to keep it working > > > like before to not break ABI. We may still update the doc. > > I wonder if some non-ABI-breaker > > is possible, like eg keep the current > > behavior of "features=0", but allow > > to (optionally) override that by a > > non-0 request? Yes, I've seen kselftests > > are trying to double-register after 0, > > but IIRC they tried to register wrong > > options, which would fail anyway. > > Old kernels will fail with -EINVAL, new will succeed. It's already an ABI > violation, IMHO. > > Not to mention I'm not sure what could happen if uffd feature flags can > change on the fly. Your proposal means it can happen when anon missing > trap is enabled at least. That's probably unwanted, and unnecessary > complexity to maintain to the kernel. > > Thanks, > > -- > Peter Xu I agree with Peter, we should just update the man page to mention UFFDIO_API can only be called once, so probing requires a second userfaultfd. I think not mentioning that was just an oversight from the last time I updated the man page. For what it's worth, I still don't like the two-step handshake design, my preference is still an API like this: 1. userspace asks for the features it wants 2. kernel responds with the (possibly subset of) features it actually supports 3. userspace is free to carry on with perhaps limited features, or exit with error, or ... But, I think at that point the ship has already sailed. I think to maintain compatibility with existing programs there isn't much we can do at this point. Also from the last time we discussed this I think I am somewhat alone preferring that design. :) Also I agree that allowing > 1 UFFDIO_API calls is potentially tricky. Calling it with features = 0 isn't just for probing, it also gives you a fully initialized userfaultfd which could be used out of the box (if you don't need any additional features). If nobody else is interested in doing it I can send a patch to fixup the man page at some point in the near future. > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-25 22:42 ` Axel Rasmussen @ 2024-11-26 7:39 ` stsp 2024-11-26 15:50 ` Peter Xu 0 siblings, 1 reply; 19+ messages in thread From: stsp @ 2024-11-26 7:39 UTC (permalink / raw) To: Axel Rasmussen, Peter Xu Cc: Linux kernel, Muhammad Usama Anjum, Mike Rapoport 26.11.2024 01:42, Axel Rasmussen пишет: > For what it's worth, I still don't like the two-step handshake design, > my preference is still an API like this: > > 1. userspace asks for the features it wants > 2. kernel responds with the (possibly subset of) features it actually supports > 3. userspace is free to carry on with perhaps limited features, or > exit with error, or ... 4. pass the needed features to UFFDIO_REGISTER, correct? > But, I think at that point the ship has already sailed. I think to > maintain compatibility with existing programs there isn't much we can > do at this point. Please, just why do you have that UFFD_API const? Only to call every screw-up like this one, a sailed ship? :) Why not to add UFFD_API_v2? Then UFFD_API_v3? Full binary and source compatibility is therefore preserved, you only need to update the man page to document the latest one. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: userfaultfd: two-step UFFDIO_API always gives -EINVAL 2024-11-26 7:39 ` stsp @ 2024-11-26 15:50 ` Peter Xu 0 siblings, 0 replies; 19+ messages in thread From: Peter Xu @ 2024-11-26 15:50 UTC (permalink / raw) To: stsp; +Cc: Axel Rasmussen, Linux kernel, Muhammad Usama Anjum, Mike Rapoport On Tue, Nov 26, 2024 at 10:39:19AM +0300, stsp wrote: > Please, just why do you have that UFFD_API > const? Only to call every screw-up like this > one, a sailed ship? :) > Why not to add UFFD_API_v2? > Then UFFD_API_v3? > Full binary and source compatibility is > therefore preserved, you only need to > update the man page to document the > latest one. We could. I'd say we only need UFFDIO_FEATURE to fetch the features, if we want the interface clean. I just never feel strongly to add it. "Creating two fds" is indeed awkward, but that isn't too bad either when the userapp can easily wrap a function to do that, open+close the fd within. Actually that's what our unit test was doing. int uffd_get_features(uint64_t *features) { struct uffdio_api uffdio_api = { .api = UFFD_API, .features = 0 }; /* * This should by default work in most kernels; the feature list * will be the same no matter what we pass in here. */ int fd = uffd_open(UFFD_USER_MODE_ONLY); if (fd < 0) /* Maybe the kernel is older than user-only mode? */ fd = uffd_open(0); if (fd < 0) return fd; if (ioctl(fd, UFFDIO_API, &uffdio_api)) { close(fd); return -errno; } *features = uffdio_api.features; close(fd); return 0; } -- Peter Xu ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2024-11-26 17:41 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-23 15:13 userfaultfd: two-step UFFDIO_API always gives -EINVAL stsp 2024-11-25 9:05 ` stsp 2024-11-25 15:59 ` Peter Xu 2024-11-25 16:15 ` stsp 2024-11-25 16:58 ` Peter Xu 2024-11-25 17:07 ` stsp 2024-11-25 17:13 ` Peter Xu 2024-11-25 17:32 ` stsp 2024-11-25 17:44 ` Peter Xu 2024-11-25 18:01 ` stsp 2024-11-25 18:44 ` Muhammad Usama Anjum 2024-11-26 7:32 ` stsp 2024-11-26 15:56 ` Peter Xu 2024-11-26 16:16 ` stsp 2024-11-26 17:41 ` Peter Xu 2024-11-26 9:41 ` stsp 2024-11-25 22:42 ` Axel Rasmussen 2024-11-26 7:39 ` stsp 2024-11-26 15:50 ` Peter Xu
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox