linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Sandeen <sandeen@redhat.com>
To: linux-fsdevel <linux-fsdevel@vger.kernel.org>
Cc: David Howells <dhowells@redhat.com>, Lukas Schauer <lukas@schauer.dev>
Subject: Re: [RFC PATCH 0/2] watch_queue: Fix pipe allocation and accounting
Date: Fri, 14 Feb 2025 10:28:18 -0600	[thread overview]
Message-ID: <97f7f028-f0ef-45a9-bce5-bc1c263a343f@redhat.com> (raw)
In-Reply-To: <b34d5d5f-f936-4781-82d3-6a69fdec9b61@redhat.com>

On 2/14/25 9:34 AM, Eric Sandeen wrote:
> (Giant disclaimer: I had not even heard of watch_queue.c until this week.
> So this might be garbage, but I think at least the bug is real and I think
> the analysis is correct.)
> 
> We got a bug report stating that doing this in a loop:
> 
>         pipe2(pipefd, O_NOTIFICATION_PIPE);
>         ioctl(pipefd[0], IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);

I should have specified that BUF_SIZE is 256, which is why
watch_queue_set_size adjusts the accounting to
(256 / WATCH_QUEUE_NOTES_PER_PAGE) == 8 in my analysis.

The maximum allowed in IOC_WATCH_QUEUE_SET_SIZE is 512, which won't
expose the bug, but slightly smaller numbers (~480, which leads
to < 16 pages), will.

Thanks,
-Eric

>         close(pipefd[0]);
>         close(pipefd[1]);
> 
> as an unprivileged user would eventually yield -EPERM. The bug actually
> turned up when running
> https://github.com/SELinuxProject/selinux-testsuite/tree/main/tests/watchkey
> 
> Analysis:
> 
> The -EPERM happens because the too_many_pipe_buffers_soft() test in
> watch_queue_set_size() fails. That fails because user->pipe_bufs has
> underflowed. user->pipe_bufs has underflowed because (abbreviated) ...
> 
> sys_pipe2
>         __do_pipe_flags
>                 create_pipe_files
>                         get_pipe_inode
>                                 alloc_pipe_info
>                                         pipe_bufs = PIPE_DEF_BUFFERS; // (16)
>                                         // charge 16 bufs to user
>                                         account_pipe_buffers(user, 0, pipe_bufs);
>                                         pipe->nr_accounted = pipe_bufs; // (16)
> 
> so now the pipe has nr_accounted set, normally to PIPE_DEF_BUFFERS (16)
> 
> Then, the ioctl:
> 
> IOC_WATCH_QUEUE_SET_SIZE
>         watch_queue_set_size(nr_notes)
>                 nr_pages = nr_notes / WATCH_QUEUE_NOTES_PER_PAGE; // (8)
>                 // reduce bufs charged to user for this pipe from 16 to 8
>                 account_pipe_buffers(pipe->user, pipe->nr_accounted, nr_pages);
>                 nr_notes = nr_pages * WATCH_QUEUE_NOTES_PER_PAGE;
>                 pipe_resize_ring(pipe, roundup_pow_of_two(nr_slots == nr_notes));
>                         if (!pipe_has_watch_queue(pipe))
>                                 pipe->nr_accounted = nr_slots;
> 
> Two things to note here:
> 
> Because pipe_has_watch_queue() is true, pipe->nr_accounted is not changed.
> Also, pipe_resize_ring() resized to the number of notifications, not the
> number of pages, though pipe_resize_ring() seems to expect a page count?
> 
> At this point wve hae charged 8 pages to user->pipe_bufs, but
> pipe->nr_accounted is still 16. And, it seems that we have actually
> allocated 8*32 pages to the pipe by virtue of calling pipe_resize_ring()
> with nr_notes not nr_pages.
> 
> Finally, we close the pipes:
> 
> pipe_release
>         put_pipe_info
>                 free_pipe_info
>                         account_pipe_buffers(pipe->user, pipe->nr_accounted, 0);
> 
> This subtracts pipe->nr_accounted (16) from user->pipe_bufs, even though
> per above we had only actually charged 8 to user->pipe_bufs, so it's a
> net reduction. Do that in a loop, and eventually user->pipe_bufs underflows.
> 
> Maybe all that should be in the commit messages, but I'll try to reduce it for
> now, any comments are appreciated.
> 
> (This does pass the handful of watch queue tests in LTP.)
> 
> thanks,
> -Eric
> 
> 


  parent reply	other threads:[~2025-02-14 16:28 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-14 15:34 [RFC PATCH 0/2] watch_queue: Fix pipe allocation and accounting Eric Sandeen
2025-02-14 15:37 ` [RFC PATCH 1/2] watch_queue: Fix pipe buffer allocation Eric Sandeen
2025-02-14 15:38 ` [RFC PATCH 2/2] watch_queue: Fix pipe accounting Eric Sandeen
2025-02-14 16:28 ` Eric Sandeen [this message]
2025-02-14 16:49 ` David Howells
2025-02-14 16:58   ` Eric Sandeen
2025-02-14 17:36     ` Eric Sandeen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=97f7f028-f0ef-45a9-bce5-bc1c263a343f@redhat.com \
    --to=sandeen@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=lukas@schauer.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).