From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: "Li,Rongqing" <lirongqing@baidu.com>,
"Saeed Mahameed" <saeedm@mellanox.com>,
"ilias.apalodimas@linaro.org" <ilias.apalodimas@linaro.org>,
"jonathan.lemon@gmail.com" <jonathan.lemon@gmail.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"mhocko@kernel.org" <mhocko@kernel.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
"bhelgaas@google.com" <bhelgaas@google.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Björn Töpel" <bjorn.topel@intel.com>,
brouer@redhat.com
Subject: Re: [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE condition
Date: Fri, 13 Dec 2019 09:48:45 +0100
Message-ID: <20191213094845.56fb42a4@carbon>
In-Reply-To: <079a0315-efea-9221-8538-47decf263684@huawei.com>
On Fri, 13 Dec 2019 14:53:37 +0800
Yunsheng Lin <linyunsheng@huawei.com> wrote:
> On 2019/12/13 14:27, Li,Rongqing wrote:
> >>
> >> It is good to allocate the rx page close to both the cpu and the
> >> device, but if both goals cannot be reached, maybe we should
> >> choose to allocate the page close to the cpu?
> >>
> > I think that is true.
> >
> > If it is true, we can remove pool->p.nid, replace
> > alloc_pages_node with alloc_pages in __page_pool_alloc_pages_slow,
> > and change pool_page_reusable so that page_to_nid(page) is
> > checked against numa_mem_id().
No, as I explained before, you cannot use numa_mem_id() in
pool_page_reusable(), because the recycle call can happen from/on a
remote CPU (and numa_mem_id() uses the local CPU to determine the
NUMA id).
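To make the problem concrete, here is a rough sketch (illustration
only, not a proposed patch) of what a numa_mem_id() based check would
look like on the recycle path:

/* Illustration: why a numa_mem_id() check is broken on the recycle
 * path.  The free/recycle call can run on a remote CPU (e.g. after
 * an XDP cpumap redirect), so numa_mem_id() returns the node of the
 * CPU doing the free, not the node of the RX-CPU that allocated the
 * page and will reuse it.
 */
static bool pool_page_reusable(struct page_pool *pool, struct page *page)
{
	return !page_is_pfmemalloc(page) &&
	       page_to_nid(page) == numa_mem_id(); /* freeing CPU's node */
}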
Today we already have XDP cpumap redirect, and we are working on
extending this to SKBs, which will increase the likelihood of
remote-CPU recycling even more. I don't think we want to skip
recycling just because a CPU on a remote NUMA node happened to touch
the packet?
(This is based on the optimizations done for Facebook, which were the
reason we added this.) What seems to matter is the NUMA node of the
CPU that runs the RX NAPI-polling, as this is the first CPU that
touches the packet memory and struct-page memory. For these drivers,
it is also the same "RX-CPU" that does the allocation of new pages
(to refill the RX-ring), and these "new" pages can come from the
page_pool.
With this in mind, it does seem strange that we are doing the check
on the "free"/recycle code path, which can run on remote CPUs...
> > since alloc_pages hints to use pages from the current node, and
> > __page_pool_alloc_pages_slow will often be called in NAPI polling
> > if recycling failed, so after some cycles the pages will come
> > from the local memory node.
You are basically saying that the NUMA check should be moved to
allocation time, as that runs on the RX-CPU (in NAPI context), and
that eventually, after some time, the pages will come from the
correct NUMA node. I think we can do that, and only affect the
semi-fast-path.
We just need to handle that recycled pages in the ptr_ring can be
from the wrong NUMA node. In __page_pool_get_cached(), when consuming
pages from the ptr_ring (__ptr_ring_consume_batched), we can then
evict pages from the wrong NUMA node.
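Something along these lines (an untested sketch against the current
page_pool internals, reusing the existing __page_pool_return_page()
helper):

/* Untested sketch: refill pool->alloc.cache from the ptr_ring, but
 * release pages whose NUMA node does not match the node preferred
 * by the RX-CPU.  Runs in NAPI/softirq context, so numa_mem_id() is
 * stable here.
 */
static struct page *pool_refill_alloc_cache(struct page_pool *pool)
{
	struct ptr_ring *r = &pool->ring;
	struct page *page;
	int pref_nid; /* preferred NUMA node */

	pref_nid = (pool->p.nid == NUMA_NO_NODE) ?
			numa_mem_id() : pool->p.nid;

	spin_lock(&r->consumer_lock);
	do {
		page = __ptr_ring_consume(r);
		if (!page)
			break;

		if (likely(page_to_nid(page) == pref_nid)) {
			pool->alloc.cache[pool->alloc.count++] = page;
		} else {
			/* Wrong node: hand page back to page allocator */
			__page_pool_return_page(pool, page);
			page = NULL;
			break;
		}
	} while (pool->alloc.count < PP_ALLOC_CACHE_REFILL);

	/* Return last refilled page, if any */
	if (pool->alloc.count > 0)
		page = pool->alloc.cache[--pool->alloc.count];

	spin_unlock(&r->consumer_lock);
	return page;
}

Breaking out on the first mismatch (instead of draining the whole
ring) keeps the extra pressure on the page allocator bounded per
refill cycle.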
For the pool->alloc.cache, we either accept that it will eventually
be emptied after some time (it is only in a 100% XDP_DROP workload
that it will continue to reuse the same pages), or we simply clear
the pool->alloc.cache when calling page_pool_update_nid().
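E.g. roughly (again untested; today page_pool_update_nid() only
updates pool->p.nid):

/* Sketch: flush the lockless alloc.cache when the pool's NUMA hint
 * changes, so pages from the old node stop circulating via the fast
 * path.  Assumes it is invoked from the RX/NAPI context that owns
 * alloc.cache; it is not safe to call from other contexts.
 */
void page_pool_update_nid(struct page_pool *pool, int new_nid)
{
	struct page *page;

	WRITE_ONCE(pool->p.nid, new_nid);

	while (pool->alloc.count) {
		page = pool->alloc.cache[--pool->alloc.count];
		__page_pool_return_page(pool, page);
	}
}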
> Yes if allocation and recycling are in the same NAPI polling context.
Which is a false assumption.
> As pointed out by Saeed and Ilias, the allocation and recycling may
> not happen in the same NAPI polling context, see:
>
> "In the current code base if they are only called under NAPI this
> might be true. On the page_pool skb recycling patches though (yes
> we'll eventually send those :)) this is called from kfree_skb()."
>
> So this may need some additional attention.
Yes, as explained above.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
Thread overview: 44+ messages
2019-12-06 9:32 [PATCH][v2] page_pool: handle page recycle for NUMA_NO_NODE condition Li RongQing
2019-12-07 3:52 ` Saeed Mahameed
2019-12-09 1:31 ` Yunsheng Lin
2019-12-09 3:47 ` Reply: " Li,Rongqing
2019-12-09 9:30 ` Ilias Apalodimas
2019-12-09 10:37 ` Reply: " Li,Rongqing
2019-12-09 12:14 ` Jesper Dangaard Brouer
2019-12-09 23:34 ` Saeed Mahameed
2019-12-10 1:31 ` Yunsheng Lin
2019-12-10 9:39 ` Reply: " Li,Rongqing
2019-12-10 14:52 ` Ilias Apalodimas
2019-12-10 19:56 ` Saeed Mahameed
2019-12-10 19:45 ` Saeed Mahameed
2019-12-11 3:01 ` Yunsheng Lin
2019-12-11 3:06 ` Yunsheng Lin
2019-12-11 20:57 ` Saeed Mahameed
2019-12-12 1:04 ` Yunsheng Lin
2019-12-10 15:02 ` Ilias Apalodimas
2019-12-10 20:02 ` Saeed Mahameed
2019-12-10 20:10 ` Ilias Apalodimas
2019-12-11 18:49 ` Jesper Dangaard Brouer
2019-12-11 21:24 ` Saeed Mahameed
2019-12-12 1:34 ` Yunsheng Lin
2019-12-12 10:18 ` Jesper Dangaard Brouer
2019-12-13 3:40 ` Yunsheng Lin
2019-12-13 6:27 ` Reply: " Li,Rongqing
2019-12-13 6:53 ` Yunsheng Lin
2019-12-13 8:48 ` Jesper Dangaard Brouer [this message]
2019-12-16 1:51 ` Yunsheng Lin
2019-12-16 4:02 ` Reply: " Li,Rongqing
2019-12-16 10:13 ` Ilias Apalodimas
2019-12-16 10:16 ` Ilias Apalodimas
2019-12-16 10:57 ` Reply: " Li,Rongqing
2019-12-17 19:38 ` Saeed Mahameed
2019-12-17 19:35 ` Saeed Mahameed
2019-12-17 19:27 ` Saeed Mahameed
2019-12-16 12:15 ` Michal Hocko
2019-12-16 12:34 ` Ilias Apalodimas
2019-12-16 13:08 ` Michal Hocko
2019-12-16 13:21 ` Ilias Apalodimas
2019-12-17 2:11 ` Yunsheng Lin
2019-12-17 9:11 ` Michal Hocko
2019-12-19 2:09 ` Yunsheng Lin
2019-12-19 11:53 ` Michal Hocko