* [RFC PATCH 0/2] watch_queue: Fix pipe allocation and accounting
@ 2025-02-14 15:34 Eric Sandeen
2025-02-14 15:37 ` [RFC PATCH 1/2] watch_queue: Fix pipe buffer allocation Eric Sandeen
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Eric Sandeen @ 2025-02-14 15:34 UTC (permalink / raw)
To: linux-fsdevel; +Cc: David Howells, Lukas Schauer
(Giant disclaimer: I had not even heard of watch_queue.c until this week.
So this might be garbage, but I think at least the bug is real and I think
the analysis is correct.)
We got a bug report stating that doing this in a loop:
pipe2(pipefd, O_NOTIFICATION_PIPE);
ioctl(pipefd[0], IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
close(pipefd[0]);
close(pipefd[1]);
as an unprivileged user would eventually yield -EPERM. The bug actually
turned up when running
https://github.com/SELinuxProject/selinux-testsuite/tree/main/tests/watchkey
Analysis:
The -EPERM happens because the too_many_pipe_buffers_soft() test in
watch_queue_set_size() fails. That fails because user->pipe_bufs has
underflowed. user->pipe_bufs has underflowed because (abbreviated) ...
sys_pipe2
__do_pipe_flags
create_pipe_files
get_pipe_inode
alloc_pipe_info
pipe_bufs = PIPE_DEF_BUFFERS; // (16)
// charge 16 bufs to user
account_pipe_buffers(user, 0, pipe_bufs);
pipe->nr_accounted = pipe_bufs; // (16)
so now the pipe has nr_accounted set, normally to PIPE_DEF_BUFFERS (16)
Then, the ioctl:
IOC_WATCH_QUEUE_SET_SIZE
watch_queue_set_size(nr_notes)
nr_pages = nr_notes / WATCH_QUEUE_NOTES_PER_PAGE; // (8)
// reduce bufs charged to user for this pipe from 16 to 8
account_pipe_buffers(pipe->user, pipe->nr_accounted, nr_pages);
nr_notes = nr_pages * WATCH_QUEUE_NOTES_PER_PAGE;
pipe_resize_ring(pipe, roundup_pow_of_two(nr_slots == nr_notes));
if (!pipe_has_watch_queue(pipe))
pipe->nr_accounted = nr_slots;
Two things to note here:
Because pipe_has_watch_queue() is true, pipe->nr_accounted is not changed.
Also, pipe_resize_ring() resized to the number of notifications, not the
number of pages, though pipe_resize_ring() seems to expect a page count?
At this point wve hae charged 8 pages to user->pipe_bufs, but
pipe->nr_accounted is still 16. And, it seems that we have actually
allocated 8*32 pages to the pipe by virtue of calling pipe_resize_ring()
with nr_notes not nr_pages.
Finally, we close the pipes:
pipe_release
put_pipe_info
free_pipe_info
account_pipe_buffers(pipe->user, pipe->nr_accounted, 0);
This subtracts pipe->nr_accounted (16) from user->pipe_bufs, even though
per above we had only actually charged 8 to user->pipe_bufs, so it's a
net reduction. Do that in a loop, and eventually user->pipe_bufs underflows.
Maybe all that should be in the commit messages, but I'll try to reduce it for
now, any comments are appreciated.
(This does pass the handful of watch queue tests in LTP.)
thanks,
-Eric
^ permalink raw reply [flat|nested] 7+ messages in thread* [RFC PATCH 1/2] watch_queue: Fix pipe buffer allocation
2025-02-14 15:34 [RFC PATCH 0/2] watch_queue: Fix pipe allocation and accounting Eric Sandeen
@ 2025-02-14 15:37 ` Eric Sandeen
2025-02-14 15:38 ` [RFC PATCH 2/2] watch_queue: Fix pipe accounting Eric Sandeen
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2025-02-14 15:37 UTC (permalink / raw)
To: linux-fsdevel; +Cc: David Howells, Lukas Schauer
It seems that watch_queue_set_size() calls pipe_resize_ring() with the
number of notifications in play, not the number of pages required to
hold those notifications, as pipe_resize_ring() expects. Change from
nr_notes to nr_pages to fix this.
Fixes: c73be61cede58 ("pipe: Add general notification queue support")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index 5267adeaa403..10b4d311e930 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -254,6 +254,8 @@ long watch_queue_set_size(struct pipe_inode_info *pipe, unsigned int nr_notes)
nr_pages = (nr_notes + WATCH_QUEUE_NOTES_PER_PAGE - 1);
nr_pages /= WATCH_QUEUE_NOTES_PER_PAGE;
+ /* Round nr_notes up to fill the pages */
+ nr_notes = nr_pages * WATCH_QUEUE_NOTES_PER_PAGE;
user_bufs = account_pipe_buffers(pipe->user, pipe->nr_accounted, nr_pages);
if (nr_pages > pipe->max_usage &&
@@ -264,8 +266,7 @@ long watch_queue_set_size(struct pipe_inode_info *pipe, unsigned int nr_notes)
goto error;
}
- nr_notes = nr_pages * WATCH_QUEUE_NOTES_PER_PAGE;
- ret = pipe_resize_ring(pipe, roundup_pow_of_two(nr_notes));
+ ret = pipe_resize_ring(pipe, nr_pages);
if (ret < 0)
goto error;
^ permalink raw reply related [flat|nested] 7+ messages in thread* [RFC PATCH 2/2] watch_queue: Fix pipe accounting
2025-02-14 15:34 [RFC PATCH 0/2] watch_queue: Fix pipe allocation and accounting Eric Sandeen
2025-02-14 15:37 ` [RFC PATCH 1/2] watch_queue: Fix pipe buffer allocation Eric Sandeen
@ 2025-02-14 15:38 ` Eric Sandeen
2025-02-14 16:28 ` [RFC PATCH 0/2] watch_queue: Fix pipe allocation and accounting Eric Sandeen
2025-02-14 16:49 ` [RFC PATCH 2/2] watch_queue: Fix pipe accounting David Howells
3 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2025-02-14 15:38 UTC (permalink / raw)
To: linux-fsdevel; +Cc: David Howells, Lukas Schauer
Currently, watch_queue_set_size() modifies the pipe buffers charged to
user->pipe_bufs without updating the pipe->nr_accounted on the pipe
itself, due to the if (!pipe_has_watch_queue()) test in
pipe_resize_ring(). This means that when the pipe is ultimately freed,
we decrement user->pipe_bufs by something other than what than we had
charged to it, potentially leading to an underflow. This in turn can
cause subsequent too_many_pipe_buffers_soft() tests to fail with -EPERM.
Fixes: e95aada4cb93d ("pipe: wakeup wr_wait after setting max_usage")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
diff --git a/fs/pipe.c b/fs/pipe.c
index 94b59045ab44..072e2e003165 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1317,10 +1317,8 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
pipe->tail = tail;
pipe->head = head;
- if (!pipe_has_watch_queue(pipe)) {
- pipe->max_usage = nr_slots;
- pipe->nr_accounted = nr_slots;
- }
+ pipe->max_usage = nr_slots;
+ pipe->nr_accounted = nr_slots;
spin_unlock_irq(&pipe->rd_wait.lock);
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [RFC PATCH 0/2] watch_queue: Fix pipe allocation and accounting
2025-02-14 15:34 [RFC PATCH 0/2] watch_queue: Fix pipe allocation and accounting Eric Sandeen
2025-02-14 15:37 ` [RFC PATCH 1/2] watch_queue: Fix pipe buffer allocation Eric Sandeen
2025-02-14 15:38 ` [RFC PATCH 2/2] watch_queue: Fix pipe accounting Eric Sandeen
@ 2025-02-14 16:28 ` Eric Sandeen
2025-02-14 16:49 ` [RFC PATCH 2/2] watch_queue: Fix pipe accounting David Howells
3 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2025-02-14 16:28 UTC (permalink / raw)
To: linux-fsdevel; +Cc: David Howells, Lukas Schauer
On 2/14/25 9:34 AM, Eric Sandeen wrote:
> (Giant disclaimer: I had not even heard of watch_queue.c until this week.
> So this might be garbage, but I think at least the bug is real and I think
> the analysis is correct.)
>
> We got a bug report stating that doing this in a loop:
>
> pipe2(pipefd, O_NOTIFICATION_PIPE);
> ioctl(pipefd[0], IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
I should have specified that BUF_SIZE is 256, which is why
watch_queue_set_size adjusts the accounting to
(256 / WATCH_QUEUE_NOTES_PER_PAGE) == 8 in my analysis.
The maximum allowed in IOC_WATCH_QUEUE_SET_SIZE is 512, which won't
expose the bug, but slightly smaller numbers (~480, which leads
to < 16 pages), will.
Thanks,
-Eric
> close(pipefd[0]);
> close(pipefd[1]);
>
> as an unprivileged user would eventually yield -EPERM. The bug actually
> turned up when running
> https://github.com/SELinuxProject/selinux-testsuite/tree/main/tests/watchkey
>
> Analysis:
>
> The -EPERM happens because the too_many_pipe_buffers_soft() test in
> watch_queue_set_size() fails. That fails because user->pipe_bufs has
> underflowed. user->pipe_bufs has underflowed because (abbreviated) ...
>
> sys_pipe2
> __do_pipe_flags
> create_pipe_files
> get_pipe_inode
> alloc_pipe_info
> pipe_bufs = PIPE_DEF_BUFFERS; // (16)
> // charge 16 bufs to user
> account_pipe_buffers(user, 0, pipe_bufs);
> pipe->nr_accounted = pipe_bufs; // (16)
>
> so now the pipe has nr_accounted set, normally to PIPE_DEF_BUFFERS (16)
>
> Then, the ioctl:
>
> IOC_WATCH_QUEUE_SET_SIZE
> watch_queue_set_size(nr_notes)
> nr_pages = nr_notes / WATCH_QUEUE_NOTES_PER_PAGE; // (8)
> // reduce bufs charged to user for this pipe from 16 to 8
> account_pipe_buffers(pipe->user, pipe->nr_accounted, nr_pages);
> nr_notes = nr_pages * WATCH_QUEUE_NOTES_PER_PAGE;
> pipe_resize_ring(pipe, roundup_pow_of_two(nr_slots == nr_notes));
> if (!pipe_has_watch_queue(pipe))
> pipe->nr_accounted = nr_slots;
>
> Two things to note here:
>
> Because pipe_has_watch_queue() is true, pipe->nr_accounted is not changed.
> Also, pipe_resize_ring() resized to the number of notifications, not the
> number of pages, though pipe_resize_ring() seems to expect a page count?
>
> At this point wve hae charged 8 pages to user->pipe_bufs, but
> pipe->nr_accounted is still 16. And, it seems that we have actually
> allocated 8*32 pages to the pipe by virtue of calling pipe_resize_ring()
> with nr_notes not nr_pages.
>
> Finally, we close the pipes:
>
> pipe_release
> put_pipe_info
> free_pipe_info
> account_pipe_buffers(pipe->user, pipe->nr_accounted, 0);
>
> This subtracts pipe->nr_accounted (16) from user->pipe_bufs, even though
> per above we had only actually charged 8 to user->pipe_bufs, so it's a
> net reduction. Do that in a loop, and eventually user->pipe_bufs underflows.
>
> Maybe all that should be in the commit messages, but I'll try to reduce it for
> now, any comments are appreciated.
>
> (This does pass the handful of watch queue tests in LTP.)
>
> thanks,
> -Eric
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH 2/2] watch_queue: Fix pipe accounting
2025-02-14 15:34 [RFC PATCH 0/2] watch_queue: Fix pipe allocation and accounting Eric Sandeen
` (2 preceding siblings ...)
2025-02-14 16:28 ` [RFC PATCH 0/2] watch_queue: Fix pipe allocation and accounting Eric Sandeen
@ 2025-02-14 16:49 ` David Howells
2025-02-14 16:58 ` Eric Sandeen
3 siblings, 1 reply; 7+ messages in thread
From: David Howells @ 2025-02-14 16:49 UTC (permalink / raw)
To: Eric Sandeen; +Cc: dhowells, linux-fsdevel, Lukas Schauer
Eric Sandeen <sandeen@redhat.com> wrote:
> - if (!pipe_has_watch_queue(pipe)) {
> - pipe->max_usage = nr_slots;
> - pipe->nr_accounted = nr_slots;
> - }
> + pipe->max_usage = nr_slots;
> + pipe->nr_accounted = nr_slots;
Hmmm... The pipe ring is supposed to have some spare capacity when used as a
watch queue so that you can bung at least a few messages into it.
David
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [RFC PATCH 2/2] watch_queue: Fix pipe accounting
2025-02-14 16:49 ` [RFC PATCH 2/2] watch_queue: Fix pipe accounting David Howells
@ 2025-02-14 16:58 ` Eric Sandeen
2025-02-14 17:36 ` Eric Sandeen
0 siblings, 1 reply; 7+ messages in thread
From: Eric Sandeen @ 2025-02-14 16:58 UTC (permalink / raw)
To: David Howells, Eric Sandeen; +Cc: linux-fsdevel, Lukas Schauer
On 2/14/25 10:49 AM, David Howells wrote:
> Eric Sandeen <sandeen@redhat.com> wrote:
>
>> - if (!pipe_has_watch_queue(pipe)) {
>> - pipe->max_usage = nr_slots;
>> - pipe->nr_accounted = nr_slots;
>> - }
>> + pipe->max_usage = nr_slots;
>> + pipe->nr_accounted = nr_slots;
>
> Hmmm... The pipe ring is supposed to have some spare capacity when used as a
> watch queue so that you can bung at least a few messages into it.
If the pipe has an original number of buffers on it, and
then watch_queue_set_size adjusts that number - where does
the spare capacity come from? Who adds it / where?
Thanks,
-Eric
> David
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [RFC PATCH 2/2] watch_queue: Fix pipe accounting
2025-02-14 16:58 ` Eric Sandeen
@ 2025-02-14 17:36 ` Eric Sandeen
0 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2025-02-14 17:36 UTC (permalink / raw)
To: David Howells, Eric Sandeen; +Cc: linux-fsdevel, Lukas Schauer
On 2/14/25 10:58 AM, Eric Sandeen wrote:
> On 2/14/25 10:49 AM, David Howells wrote:
>> Eric Sandeen <sandeen@redhat.com> wrote:
>>
>>> - if (!pipe_has_watch_queue(pipe)) {
>>> - pipe->max_usage = nr_slots;
>>> - pipe->nr_accounted = nr_slots;
>>> - }
>>> + pipe->max_usage = nr_slots;
>>> + pipe->nr_accounted = nr_slots;
>>
>> Hmmm... The pipe ring is supposed to have some spare capacity when used as a
>> watch queue so that you can bung at least a few messages into it.
>
> If the pipe has an original number of buffers on it, and
> then watch_queue_set_size adjusts that number - where does
> the spare capacity come from? Who adds it / where?
If that was resizing the ring to nr_notes not nr_pages, then ... I guess we
could keep the pipe_has_watch_queue() test as above, skip updating nr_accounted
there, and set it manually to the user-charged number in watch_queue_set_size()?
Something like this? I'm not sure about max_usage, maybe it should be set
as well.
diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index 5267adeaa403..07a161ecc8a8 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -269,6 +269,14 @@ long watch_queue_set_size(struct pipe_inode_info *pipe, unsigned int nr_notes)
if (ret < 0)
goto error;
+ /*
+ * pipe_resize_ring does not update nr_accounted for watch_queue pipes,
+ * because the above vastly overprovisions. Set nr_accounted on
+ * this pipe to the number that was charged to the user above
+ * here manually.
+ */
+ pipe->nr_accounted = nr_pages;
+
ret = -ENOMEM;
pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);
if (!pages)
> Thanks,
> -Eric
>
>> David
>>
>>
>
>
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-02-14 17:36 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-14 15:34 [RFC PATCH 0/2] watch_queue: Fix pipe allocation and accounting Eric Sandeen
2025-02-14 15:37 ` [RFC PATCH 1/2] watch_queue: Fix pipe buffer allocation Eric Sandeen
2025-02-14 15:38 ` [RFC PATCH 2/2] watch_queue: Fix pipe accounting Eric Sandeen
2025-02-14 16:28 ` [RFC PATCH 0/2] watch_queue: Fix pipe allocation and accounting Eric Sandeen
2025-02-14 16:49 ` [RFC PATCH 2/2] watch_queue: Fix pipe accounting David Howells
2025-02-14 16:58 ` Eric Sandeen
2025-02-14 17:36 ` Eric Sandeen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).