From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Cc: "Tariq Toukan" <tariqt@mellanox.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
"Toke Høiland-Jørgensen" <toke@toke.dk>,
"toshiaki.makita1@gmail.com" <toshiaki.makita1@gmail.com>,
"grygorii.strashko@ti.com" <grygorii.strashko@ti.com>,
"mcroce@redhat.com" <mcroce@redhat.com>,
brouer@redhat.com
Subject: Re: [PATCH net-next v1 08/11] xdp: tracking page_pool resources and safe removal
Date: Tue, 18 Jun 2019 17:19:51 +0200
Message-ID: <20190618171951.17128ed8@carbon>
In-Reply-To: <20190618125431.GA5307@khorivan>
On Tue, 18 Jun 2019 15:54:33 +0300 Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
> On Sun, Jun 16, 2019 at 10:56:25AM +0000, Tariq Toukan wrote:
> >
> >On 6/15/2019 12:33 PM, Ivan Khoronzhuk wrote:
> >> On Thu, Jun 13, 2019 at 08:28:42PM +0200, Jesper Dangaard Brouer wrote:
[...]
> >>
> >> What would you recommend for the following situation:
> >>
> >> The same receive queue is shared between 2 network devices. The receive
> >> ring is filled with pages from a page_pool, but you don't know the actual
> >> port (ndev) while filling this ring, because the device is only recognized
> >> after a packet is received.
> >>
> >> The API is such that an xdp rxq is bound to a network device and each
> >> frame holds a reference to it, so the rxq ndev must be static. That means
> >> each netdev has its own rxq instance even when there is no need for it.
> >> Thus, after your changes, a page must be returned to the pool it was taken
> >> from, or released from the old pool and recycled into the new one somehow.
> >>
> >> And that is an inconvenience at least. It's hard to move pages between
> >> pools w/o a performance penalty. There is no way to use a common pool
> >> either, as unreg_rxq now drops the pool and 2 rxqs can't reference the
> >> same pool.
> >
> >Within a single netdev, separate page_pool instances are created anyway
> >for different RX rings, working under different NAPIs.
>
> The circumstances are such that the same RX ring is shared between 2
> netdevs... and the netdev can be known only after a descriptor/packet is
> received. Thus, while filling the RX ring, there is no actual device,
> but when a packet is received it has to be recycled to the appropriate
> net device's pool. Before this change it made no difference which pool
> the page was allocated from to fill the RX ring, as there was no owner.
> After this change there is an owner - the netdev page pool.
It is not really a dependency added in this patchset. A page_pool is
strictly bound to a single RX-queue, for performance, as this allows us
a NAPI fast-path return used for early drop (XDP_DROP).
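
For reference, the intended 1:1 setup looks roughly like the sketch below.
It is driver-agnostic pseudocode against the API as of this series; the
struct my_rxq container, my_drv_setup_rxq() and the parameter values are
made up, and error unwinding is omitted:

#include <linux/netdevice.h>
#include <linux/dma-mapping.h>
#include <net/page_pool.h>
#include <net/xdp.h>

/* Hypothetical per-RX-queue container, not from any real driver */
struct my_rxq {
        struct page_pool *page_pool;
        struct xdp_rxq_info xdp_rxq;
};

static int my_drv_setup_rxq(struct net_device *dev, struct my_rxq *rxq,
                            u32 queue_index)
{
        struct page_pool_params pp_params = {
                .order     = 0,
                .flags     = PP_FLAG_DMA_MAP,   /* pool handles DMA mapping */
                .pool_size = 1024,              /* >= RX ring size */
                .nid       = NUMA_NO_NODE,
                .dev       = dev->dev.parent,
                .dma_dir   = DMA_FROM_DEVICE,
        };
        int err;

        rxq->page_pool = page_pool_create(&pp_params);
        if (IS_ERR(rxq->page_pool))
                return PTR_ERR(rxq->page_pool);

        err = xdp_rxq_info_reg(&rxq->xdp_rxq, dev, queue_index);
        if (err)
                return err;     /* (cleanup omitted in this sketch) */

        /* One pool, one rxq, registered exactly once */
        return xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq,
                                          MEM_TYPE_PAGE_POOL,
                                          rxq->page_pool);
}
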
I can see that the API xdp_rxq_info_reg_mem_model() makes it possible to
call it on different xdp_rxq_info structs with the same page_pool pointer.
But it was never intended to be used like that, and I consider it an API
usage violation. I originally wanted to add the allocator pointer to the
xdp_rxq_info_reg() call, but the API was extended in different versions, so
I didn't want to break users. I've actually tried hard to catch drivers
misusing the API, via WARN(), but I guess you found a loophole.

Besides, we already have a dependency from the RX-queue to the netdev in
the xdp_rxq_info structure. E.g. xdp_rxq_info->dev is fairly central: it is
dereferenced by BPF code to read xdp_md->ingress_ifindex, and it is also
used by cpumap when creating SKBs.
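
To illustrate why that pointer has to be right: any XDP program can read the
ifindex straight from the context. A minimal sketch (the header paths depend
on your libbpf/kernel tree, and the compared ifindex value is arbitrary):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* ctx->ingress_ifindex is rewritten by the verifier's ctx-access
 * conversion into a load of xdp_rxq_info->dev->ifindex, so rxq->dev
 * must point at the netdev the packet actually arrived on. */
SEC("xdp")
int read_ifindex(struct xdp_md *ctx)
{
        if (ctx->ingress_ifindex == 42) /* arbitrary example ifindex */
                return XDP_DROP;
        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
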
> For cpsw the DMA unmap is common to both netdevs and it makes no
> difference who frees the page, but it does matter which pool it is
> freed to.
>
> So, while filling the RX ring the page is taken from the page pool of
> ndev1, but if the packet is received for ndev2 it has to be later
> returned/recycled to the page pool of ndev1, while the xdp_rxq_info
> handed to the xdp prog along with the xdp buffer references ndev2 ...
>
> And there is no way to predict the final ndev before the packet is
> received, so no way to choose the appropriate page pool, which has now
> become the page owner.
>
> So, page/dma recycling is still needed while filling the RX ring, but
> there should be some way to identify the page owner only after the
> packet has been received.
>
> Roughly speaking, something like:
>
> pool->pages_state_hold_cnt++;
>
> outside of the page allocation API, after the packet is received.

Don't EVER manipulate the internal state outside of the page allocation
API. That defeats the purpose of defining an API.
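
To be concrete, the hold/release counters are only supposed to move via the
API calls themselves, roughly as in this sketch (my_refill_one() and
my_drop_in_napi() are made-up driver helpers):

#include <net/page_pool.h>

static struct page *my_refill_one(struct page_pool *pool)
{
        /* hold side: accounted inside the pool by the allocator */
        return page_pool_dev_alloc_pages(pool);
}

static void my_drop_in_napi(struct page_pool *pool, struct page *page)
{
        /* release side: recycle into the same pool, from NAPI/softirq
         * context, again accounted inside the pool */
        page_pool_recycle_direct(pool, page);
}
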
> and a free/decrement of the counter at allocation time (w/o owning the page).

Your use-case of two netdevs sharing the same RX-queue sounds dubious, and
very hardware specific. I'm not sure why we want to bend the APIs to
support this.

If we had to allow a page_pool to be registered twice via
xdp_rxq_info_reg_mem_model(), then I guess we could extend page_pool with a
usage/users reference count, and only really free the page_pool when the
refcnt reaches zero. But it just seems and looks wrong (in the code), as
the whole trick to get performance is to have only one user.
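
For completeness, the users-count idea would look something like the sketch
below. To be clear, this is purely hypothetical: the user_cnt field and the
two helpers do not exist in page_pool today.

#include <linux/refcount.h>
#include <net/page_pool.h>

/* Hypothetical: assumes a new 'user_cnt' member in struct page_pool,
 * taken on each xdp_rxq_info_reg_mem_model() registration and dropped
 * on unregister; only the last user really tears the pool down. */
static void page_pool_hold_user(struct page_pool *pool)
{
        refcount_inc(&pool->user_cnt);
}

static bool page_pool_release_user(struct page_pool *pool)
{
        /* true means the caller was the last user and may free the pool */
        return refcount_dec_and_test(&pool->user_cnt);
}
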
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
Thread overview: 24+ messages in thread
2019-06-13 18:28 [PATCH net-next v1 00/11] xdp: page_pool fixes and in-flight accounting Jesper Dangaard Brouer
2019-06-13 18:28 ` [PATCH net-next v1 01/11] net: page_pool: add helper function to retrieve dma addresses Jesper Dangaard Brouer
2019-06-13 18:28 ` [PATCH net-next v1 02/11] net: page_pool: add helper function to unmap " Jesper Dangaard Brouer
2019-06-13 18:28 ` [PATCH net-next v1 03/11] xdp: fix leak of IDA cyclic id if rhashtable_insert_slow fails Jesper Dangaard Brouer
2019-06-13 18:28 ` [PATCH net-next v1 04/11] xdp: page_pool related fix to cpumap Jesper Dangaard Brouer
2019-06-13 18:28 ` [PATCH net-next v1 05/11] veth: use xdp_release_frame for XDP_PASS Jesper Dangaard Brouer
2019-06-13 18:28 ` [PATCH net-next v1 06/11] page_pool: introduce page_pool_free and use in mlx5 Jesper Dangaard Brouer
2019-06-13 18:28 ` [PATCH net-next v1 07/11] mlx5: more strict use of page_pool API Jesper Dangaard Brouer
2019-06-13 18:28 ` [PATCH net-next v1 08/11] xdp: tracking page_pool resources and safe removal Jesper Dangaard Brouer
2019-06-14 11:01 ` Ilias Apalodimas
2019-06-15 9:33 ` Ivan Khoronzhuk
2019-06-16 10:56 ` Tariq Toukan
2019-06-18 12:54 ` Ivan Khoronzhuk
2019-06-18 13:30 ` Ilias Apalodimas
2019-06-18 13:48 ` Ivan Khoronzhuk
2019-06-18 15:19 ` Jesper Dangaard Brouer [this message]
2019-06-18 17:54 ` Ivan Khoronzhuk
2019-06-19 11:12 ` Ivan Khoronzhuk
2019-06-13 18:28 ` [PATCH net-next v1 09/11] xdp: force mem allocator removal and periodic warning Jesper Dangaard Brouer
2019-06-15 8:59 ` Ivan Khoronzhuk
2019-06-18 8:37 ` Jesper Dangaard Brouer
2019-06-13 18:28 ` [PATCH net-next v1 10/11] xdp: add tracepoints for XDP mem Jesper Dangaard Brouer
2019-06-13 18:28 ` [PATCH net-next v1 11/11] page_pool: add tracepoints for page_pool with details need by XDP Jesper Dangaard Brouer
2019-06-15 2:41 ` [PATCH net-next v1 00/11] xdp: page_pool fixes and in-flight accounting David Miller