netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: Mina Almasry <almasrymina@google.com>, David Wei <dw@davidwei.uk>
Cc: io-uring@vger.kernel.org, netdev@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>, Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	David Ahern <dsahern@kernel.org>
Subject: Re: [PATCH v1 06/15] net: page_pool: add ->scrub mem provider callback
Date: Wed, 16 Oct 2024 18:42:26 +0100	[thread overview]
Message-ID: <6acf95a6-2ddc-4eee-a6e1-257ac8d41285@gmail.com> (raw)
In-Reply-To: <CAHS8izNt8pfBwGnRNWphN4vJJ=1yJX=++-RmGVHrVOvy59=13Q@mail.gmail.com>

On 10/14/24 23:58, Mina Almasry wrote:
> On Sun, Oct 13, 2024 at 8:25 PM David Wei <dw@davidwei.uk> wrote:
>>
>> On 2024-10-10 10:54, Mina Almasry wrote:
>>> On Wed, Oct 9, 2024 at 2:58 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>
>>>> On 10/9/24 22:00, Mina Almasry wrote:
>>>>> On Mon, Oct 7, 2024 at 3:16 PM David Wei <dw@davidwei.uk> wrote:
>>>>>>
>>>>>> From: Pavel Begunkov <asml.silence@gmail.com>
>>>>>>
>>>>>> page pool is now waiting for all ppiovs to return before destroying
>>>>>> itself, and for that to happen the memory provider might need to push
>>>>>> some buffers, flush caches and so on.
>>>>>>
>>>>>> todo: we'll try to get by without it before the final release
>>>>>>
>>>>>
>>>>> Is the intention to drop this todo and stick with this patch, or to
>>>>> move ahead with this patch?
>>>>
>>>> Heh, I overlooked this todo. The plan is to actually leave it
>>>> as is, it's by far the simplest way and doesn't really gets
>>>> into anyone's way as it's a slow path.
>>>>
>>>>> To be honest, I think I read in a follow up patch that you want to
>>>>> unref all the memory on page_pool_destory, which is not how the
>>>>> page_pool is used today. Tdoay page_pool_destroy does not reclaim
>>>>> memory. Changing that may be OK.
>>>>
>>>> It doesn't because it can't (not breaking anything), which is a
>>>> problem as the page pool might never get destroyed. io_uring
>>>> doesn't change that, a buffer can't be reclaimed while anything
>>>> in the kernel stack holds it. It's only when it's given to the
>>>> user we can force it back out of there.
>>
>> The page pool will definitely be destroyed, the call to
>> netdev_rx_queue_restart() with mp_ops/mp_priv set to null and netdev
>> core will ensure that.
>>
>>>>
>>>> And it has to happen one way or another, we can't trust the
>>>> user to put buffers back, it's just devmem does that by temporarily
>>>> attaching the lifetime of such buffers to a socket.
>>>>
>>>
>>> (noob question) does io_uring not have a socket equivalent that you
>>> can tie the lifetime of the buffers to? I'm thinking there must be

You can say it is bound to io_uring / io_uring's object
representing the queue.

>>> one, because in your patches IIRC you have the fill queues and the
>>> memory you bind from the userspace, there should be something that
>>> tells you that the userspace has exited/crashed and it's time to now
>>> destroy the fill queue and unbind the memory, right?
>>>
>>> I'm thinking you may want to bind the lifetime of the buffers to that,
>>> instead of the lifetime of the pool. The pool will not be destroyed
>>> until the next driver/reset reconfiguration happens, right? That could
>>> be long long after the userspace has stopped using the memory.

io_uring will reset the queue if it dies / requested to release
the queue.

>> Yes, there are io_uring objects e.g. interface queue that hold
>> everything together. IIRC page pool destroy doesn't unref but it waits
>> for all pages that are handed out to skbs to be returned. So for us,
>> below might work:
>>
>> 1. Call netdev_rx_queue_restart() which allocates a new pp for the rx
>>     queue and tries to free the old pp
>> 2. At this point we're guaranteed that any packets hitting this rx queue
>>     will not go to user pages from our memory provider

It's reasonable to assume that the driver will start destroying
the page pool, but I wouldn't rely on it when it comes to the
kernel state correctness, i.e. not crashing the kernel. It's a bit
fragile, drivers always tend to do all kinds of interesting stuff,
I'd rather deal with a loud io_uring / page pool leak in case of
some weirdness. And that means we can't really guarantee the above
and need to care about not racing with allocations.

>> 3. Assume userspace is gone (either crash or gracefully terminating),
>>     unref the uref for all pages, same as what scrub() is doing today
>> 4. Any pages that are still in skb frags will get freed when the sockets
>>     etc are closed

And we need to prevent from requests receiving netmem that are
already pushed to sockets.

>> 5. Rely on the pp delay release to eventually terminate and clean up
>>
>> Let me know what you think Pavel.

I think it's reasonable to leave it as is for now, I don't believe
anyone cares much about a simple slow path memory provider-only
callback. And we can always kill it later on if we find a good way
to synchronise pieces, which will be more apparent when we add some
more registration dynamism on top, when/if this patchset is merged.

In short, let's resend the series with the callback, see if
maintainers have a strong opinion, and otherwise I'd say it
should be fine as is.

> Something roughly along those lines sounds more reasonable to me.
> 
> The critical point is as I said above, if you free the memory only
> when the pp is destroyed, then the memory lives from 1 io_uring ZC
> instance to the next. The next instance will see a reduced address
> space because the previously destroyed io_uring ZC connection did not
> free the memory. You could have users in production opening thousands
> of io_uring ZC connections between rxq resets, and not cleaning up
> those connections. In that case I think eventually they'll run out of
> memory as the memory leaks until it's cleaned up with a pp destroy
> (driver reset?).

Not sure what giving memory from one io_uring zc instance to
another means. And it's perfectly valid to receive a buffer, close
the socket and only after use the data, it logically belongs to
the user, not the socket. It's only bound to io_uring zcrx/queue
object for clean up purposes if io_uring goes down, it's different
from devmem TCP.

-- 
Pavel Begunkov

  reply	other threads:[~2024-10-16 17:42 UTC|newest]

Thread overview: 126+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-07 22:15 [PATCH v1 00/15] io_uring zero copy rx David Wei
2024-10-07 22:15 ` [PATCH v1 01/15] net: devmem: pull struct definitions out of ifdef David Wei
2024-10-09 20:17   ` Mina Almasry
2024-10-09 23:16     ` Pavel Begunkov
2024-10-10 18:01       ` Mina Almasry
2024-10-10 18:57         ` Pavel Begunkov
2024-10-13 22:38           ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 02/15] net: prefix devmem specific helpers David Wei
2024-10-09 20:19   ` Mina Almasry
2024-10-07 22:15 ` [PATCH v1 03/15] net: generalise net_iov chunk owners David Wei
2024-10-08 15:46   ` Stanislav Fomichev
2024-10-08 16:34     ` Pavel Begunkov
2024-10-09 16:28       ` Stanislav Fomichev
2024-10-11 18:44         ` David Wei
2024-10-11 22:02           ` Pavel Begunkov
2024-10-11 22:25             ` Mina Almasry
2024-10-11 23:12               ` Pavel Begunkov
2024-10-09 20:44   ` Mina Almasry
2024-10-09 22:13     ` Pavel Begunkov
2024-10-09 22:19     ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 04/15] net: page_pool: create hooks for custom page providers David Wei
2024-10-09 20:49   ` Mina Almasry
2024-10-09 22:02     ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 05/15] net: prepare for non devmem TCP memory providers David Wei
2024-10-09 20:56   ` Mina Almasry
2024-10-09 21:45     ` Pavel Begunkov
2024-10-13 22:33       ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 06/15] net: page_pool: add ->scrub mem provider callback David Wei
2024-10-09 21:00   ` Mina Almasry
2024-10-09 21:59     ` Pavel Begunkov
2024-10-10 17:54       ` Mina Almasry
2024-10-13 17:25         ` David Wei
2024-10-14 13:37           ` Pavel Begunkov
2024-10-14 22:58           ` Mina Almasry
2024-10-16 17:42             ` Pavel Begunkov [this message]
2024-11-01 17:18               ` Mina Almasry
2024-11-01 18:35                 ` Pavel Begunkov
2024-11-01 19:24                   ` Mina Almasry
2024-11-01 21:38                     ` Pavel Begunkov
2024-11-04 20:42                       ` Mina Almasry
2024-11-04 21:27                         ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 07/15] net: page pool: add helper creating area from pages David Wei
2024-10-09 21:11   ` Mina Almasry
2024-10-09 21:34     ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 08/15] net: add helper executing custom callback from napi David Wei
2024-10-08 22:25   ` Joe Damato
2024-10-09 15:09     ` Pavel Begunkov
2024-10-09 16:13       ` Joe Damato
2024-10-09 19:12         ` Pavel Begunkov
2024-10-07 22:15 ` [PATCH v1 09/15] io_uring/zcrx: add interface queue and refill queue David Wei
2024-10-09 17:50   ` Jens Axboe
2024-10-09 18:09     ` Jens Axboe
2024-10-09 19:08     ` Pavel Begunkov
2024-10-11 22:11     ` Pavel Begunkov
2024-10-13 17:32     ` David Wei
2024-10-07 22:15 ` [PATCH v1 10/15] io_uring/zcrx: add io_zcrx_area David Wei
2024-10-09 18:02   ` Jens Axboe
2024-10-09 19:05     ` Pavel Begunkov
2024-10-09 19:06       ` Jens Axboe
2024-10-09 21:29   ` Mina Almasry
2024-10-07 22:15 ` [PATCH v1 11/15] io_uring/zcrx: implement zerocopy receive pp memory provider David Wei
2024-10-09 18:10   ` Jens Axboe
2024-10-09 22:01   ` Mina Almasry
2024-10-09 22:58     ` Pavel Begunkov
2024-10-10 18:19       ` Mina Almasry
2024-10-10 20:26         ` Pavel Begunkov
2024-10-10 20:53           ` Mina Almasry
2024-10-10 20:58             ` Mina Almasry
2024-10-10 21:22             ` Pavel Begunkov
2024-10-11  0:32               ` Mina Almasry
2024-10-11  1:49                 ` Pavel Begunkov
2024-10-07 22:16 ` [PATCH v1 12/15] io_uring/zcrx: add io_recvzc request David Wei
2024-10-09 18:28   ` Jens Axboe
2024-10-09 18:51     ` Pavel Begunkov
2024-10-09 19:01       ` Jens Axboe
2024-10-09 19:27         ` Pavel Begunkov
2024-10-09 19:42           ` Jens Axboe
2024-10-09 19:47             ` Pavel Begunkov
2024-10-09 19:50               ` Jens Axboe
2024-10-07 22:16 ` [PATCH v1 13/15] io_uring/zcrx: add copy fallback David Wei
2024-10-08 15:58   ` Stanislav Fomichev
2024-10-08 16:39     ` Pavel Begunkov
2024-10-08 16:40     ` David Wei
2024-10-09 16:30       ` Stanislav Fomichev
2024-10-09 23:05         ` Pavel Begunkov
2024-10-11  6:22           ` David Wei
2024-10-11 14:43             ` Stanislav Fomichev
2024-10-09 18:38   ` Jens Axboe
2024-10-07 22:16 ` [PATCH v1 14/15] io_uring/zcrx: set pp memory provider for an rx queue David Wei
2024-10-09 18:42   ` Jens Axboe
2024-10-10 13:09     ` Pavel Begunkov
2024-10-10 13:19       ` Jens Axboe
2024-10-07 22:16 ` [PATCH v1 15/15] io_uring/zcrx: throttle receive requests David Wei
2024-10-09 18:43   ` Jens Axboe
2024-10-07 22:20 ` [PATCH v1 00/15] io_uring zero copy rx David Wei
2024-10-08 23:10 ` Joe Damato
2024-10-09 15:07   ` Pavel Begunkov
2024-10-09 16:10     ` Joe Damato
2024-10-09 16:12     ` Jens Axboe
2024-10-11  6:15       ` David Wei
2024-10-09 15:27 ` Jens Axboe
2024-10-09 15:38   ` David Ahern
2024-10-09 15:43     ` Jens Axboe
2024-10-09 15:49       ` Pavel Begunkov
2024-10-09 15:50         ` Jens Axboe
2024-10-09 16:35       ` David Ahern
2024-10-09 16:50         ` Jens Axboe
2024-10-09 16:53           ` Jens Axboe
2024-10-09 17:12             ` Jens Axboe
2024-10-10 14:21               ` Jens Axboe
2024-10-10 15:03                 ` David Ahern
2024-10-10 15:15                   ` Jens Axboe
2024-10-10 18:11                 ` Jens Axboe
2024-10-14  8:42                   ` David Laight
2024-10-09 16:55 ` Mina Almasry
2024-10-09 16:57   ` Jens Axboe
2024-10-09 19:32     ` Mina Almasry
2024-10-09 19:43       ` Pavel Begunkov
2024-10-09 19:47       ` Jens Axboe
2024-10-09 17:19   ` David Ahern
2024-10-09 18:21   ` Pedro Tammela
2024-10-10 13:19     ` Pavel Begunkov
2024-10-11  0:35     ` David Wei
2024-10-11 14:28       ` Pedro Tammela
2024-10-11  0:29   ` David Wei
2024-10-11 19:43     ` Mina Almasry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6acf95a6-2ddc-4eee-a6e1-257ac8d41285@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=almasrymina@google.com \
    --cc=axboe@kernel.dk \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=dw@davidwei.uk \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=io-uring@vger.kernel.org \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).