Re: [PATCH v2] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass by removing cross-buffer accounting

From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>,
	Yuhao Jiang <danisjiang@gmail.com>
Cc: io-uring@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] io_uring/rsrc: fix RLIMIT_MEMLOCK bypass by removing cross-buffer accounting
Date: Fri, 23 Jan 2026 07:50:53 -0700
Message-ID: <fc8664bb-7769-48a2-b470-71fb81828e26@kernel.dk>
In-Reply-To: <596bc7ac-3d24-43a7-9e7e-e59189525ebc@gmail.com>

On 1/23/26 7:26 AM, Pavel Begunkov wrote:
> On 1/22/26 21:51, Pavel Begunkov wrote:
> ...
>>>>> I already briefly touched on that earlier, for sure not going to be of
>>>>> any practical concern.
>>>>
>>>> A modest 16 GB can give 1M entries. Assuming 50ns-100ns per entry for the
>>>> xarray business, that's 50-100ms. It's all serialised, so multiply by
>>>> the number of CPUs/threads, e.g. 10-100, and that's 0.5-10s. Account for
>>>> sky-high spinlock contention and it jumps again, and there can be more
>>>> memory / CPUs / NUMA nodes. Not saying that it's worse than the
>>>> current O(n^2); I have a test program that borderline hangs the
>>>> system.
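The estimate above can be written out step by step. This assumes one entry per 16 KiB compound page (the hugepages-16kB setting that comes up later in the thread) and treats "GB" as GiB; the 50-100 ns per-entry cost and the 10-100 thread multiplier are the quoted assumptions, not measurements.

```latex
\begin{align*}
\frac{16\,\mathrm{GiB}}{16\,\mathrm{KiB}} &= \frac{2^{34}}{2^{14}} = 2^{20} \approx 10^{6}\ \text{entries} \\
10^{6} \times (50\ \text{to}\ 100)\,\mathrm{ns} &= 50\ \text{to}\ 100\,\mathrm{ms}\ \text{per thread} \\
T_{\text{serialised}} &\in \left[\,50\,\mathrm{ms} \times 10,\ 100\,\mathrm{ms} \times 100\,\right] = \left[\,0.5\,\mathrm{s},\ 10\,\mathrm{s}\,\right]
\end{align*}
```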
>>>
>>> It's definitely not worse than the existing system, which is why I don't
>>> think it's a big deal. Nobody has ever complained about the time to
>>> register buffers. It's inherently a slow path, and quite slow at that
>>> depending on the use case. Out of curiosity, I ran some silly testing on
>>> registering 16GB of memory, with 1..32 threads. Each will do 16GB, so
>>> 512GB registered in total for the 32 case. Before is the current kernel,
>>> after is with per-user xarray accounting:
>>>
>>> before
>>>
>>> nthreads 1:      646 msec
>>> nthreads 2:      888 msec
>>> nthreads 4:      864 msec
>>> nthreads 8:     1450 msec
>>> nthreads 16:    2890 msec
>>> nthreads 32:    4410 msec
>>>
>>> after
>>>
>>> nthreads 1:      650 msec
>>> nthreads 2:      888 msec
>>> nthreads 4:      892 msec
>>> nthreads 8:     1270 msec
>>> nthreads 16:    2430 msec
>>> nthreads 32:    4160 msec
>>>
>>> This includes both registering buffers, cloning all of them to another
>>> ring, and unregistering times, and nowhere is locking scalability an
>>> issue for the xarray manipulation. The box has 32 nodes and 512 CPUs. So
>>> no, I strongly believe this isn't an issue.
>>>
>>> IOW, accurate accounting is cheaper than the stuff we have now. None of
>>> them are super cheap. Does it matter? I really don't think so, or people
>>> would've complained already. The only complaint I got on these kinds of
>>> things was for cloning, which did get fixed up some releases ago.
>>
>> You need compound pages:
>>
>> echo always > /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
>>
>> And use update() instead of register(), as accounting dedup for
>> registration is broken (effectively disabled). For the current kernel:
>>
>> Single threaded:
>> 1x1G: 7.5s
>> 2x1G: 45s
>> 4x1G: 190s
>>
>> 16x should be ~3000s; not going to run it. It's uninterruptible with no
>> cond_resched(), so spawn NR_CPUS threads and the system becomes completely
>> unresponsive (I guess it depends on the preemption mode).
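The ~3000s figure follows from assuming the single-threaded cost grows quadratically in the number n of 1G buffers. That is what a dedup check comparing every new compound head page against all previously registered ones would give (about n^2/2 comparisons). Under that assumption, using the quoted timings:

```latex
\begin{align*}
T(n) &\approx c\,n^{2} \\
\frac{T(2)}{T(1)} &= \frac{45}{7.5} = 6, \qquad \frac{T(4)}{T(2)} = \frac{190}{45} \approx 4.2 \quad (\text{quadratic predicts } 4) \\
T(16) &\approx T(4)\left(\tfrac{16}{4}\right)^{2} = 190 \times 16 = 3040\,\mathrm{s} \approx 3000\,\mathrm{s}
\end{align*}
```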
> The program is below for reference, but it's trivial. THP setting
> is done inside for convenience. There are ways to make the runtime
> even worse, but that should be enough.

Thanks for sending that. I ran it on the same box, on current -git and
with user_struct xarray accounting. I modified it so that the 2nd arg is
the number of threads, for easy running:

current -git

axboe@r7625 ~> cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
[always] inherit madvise never
axboe@r7625 ~> for i in 1 2 4 8 16; time ./ppage $i $i; end
register 1 GB, num threads 1

________________________________________________________
Executed in  178.91 millis    fish           external
   usr time    9.82 millis  313.00 micros    9.51 millis
   sys time  161.83 millis  149.00 micros  161.68 millis

register 2 GB, num threads 2

________________________________________________________
Executed in  638.49 millis    fish           external
   usr time    0.03 secs    285.00 micros    0.03 secs
   sys time    1.14 secs    135.00 micros    1.14 secs

register 4 GB, num threads 4

________________________________________________________
Executed in    2.17 secs    fish           external
   usr time    0.05 secs  314.00 micros    0.05 secs
   sys time    6.31 secs  150.00 micros    6.31 secs

register 8 GB, num threads 8

________________________________________________________
Executed in    4.97 secs    fish           external
   usr time    0.12 secs  299.00 micros    0.12 secs
   sys time   28.97 secs  142.00 micros   28.97 secs

register 16 GB, num threads 16

________________________________________________________
Executed in   10.34 secs    fish           external
   usr time    0.20 secs  294.00 micros    0.20 secs
   sys time  126.42 secs  140.00 micros  126.42 secs


-git + user_struct xarray for accounting

axboe@r7625 ~> cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled
[always] inherit madvise never
axboe@r7625 ~> for i in 1 2 4 8 16; time ./ppage $i $i; end
register 1 GB, num threads 1

________________________________________________________
Executed in   54.05 millis    fish           external
   usr time   10.66 millis  327.00 micros   10.34 millis
   sys time   41.60 millis  259.00 micros   41.34 millis

register 2 GB, num threads 2

________________________________________________________
Executed in  105.70 millis    fish           external
   usr time   34.38 millis  206.00 micros   34.17 millis
   sys time   68.55 millis  206.00 micros   68.35 millis

register 4 GB, num threads 4

________________________________________________________
Executed in  214.72 millis    fish           external
   usr time   48.10 millis  193.00 micros   47.91 millis
   sys time  182.25 millis  193.00 micros  182.06 millis

register 8 GB, num threads 8

________________________________________________________
Executed in  441.96 millis    fish           external
   usr time  123.26 millis  195.00 micros  123.07 millis
   sys time  568.20 millis  195.00 micros  568.00 millis

register 16 GB, num threads 16

________________________________________________________
Executed in  917.70 millis    fish           external
   usr time    0.17 secs    202.00 micros    0.17 secs
   sys time    2.48 secs    202.00 micros    2.48 secs


-- 
Jens Axboe
