From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Aaron Lu <aaron.lu@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org,
"Andrew Morton" <akpm@linux-foundation.org>,
"Paweł Staszewski" <pstaszewski@itcare.pl>,
"Eric Dumazet" <eric.dumazet@gmail.com>,
"Tariq Toukan" <tariqt@mellanox.com>,
"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
"Yoel Caspersen" <yoel@kviknet.dk>,
"Mel Gorman" <mgorman@techsingularity.net>,
"Saeed Mahameed" <saeedm@mellanox.com>,
"Michal Hocko" <mhocko@suse.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
brouer@redhat.com
Subject: Re: [PATCH 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free()
Date: Mon, 5 Nov 2018 10:55:25 +0100
Message-ID: <20181105105525.1f78c661@redhat.com>
In-Reply-To: <20181105085820.6341-1-aaron.lu@intel.com>
On Mon, 5 Nov 2018 16:58:19 +0800
Aaron Lu <aaron.lu@intel.com> wrote:
> page_frag_free() calls __free_pages_ok() to free the page back to
> Buddy. This is fine for high-order pages, but for order-0 pages it
> misses the opportunity to use Per-Cpu-Pages (PCP) and can cause zone
> lock contention when called frequently.
>
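To make the contention argument concrete, here is a rough userspace model
that only counts lock acquisitions. It is a sketch under stated assumptions,
not kernel code: locks_direct() and locks_via_pcp() are hypothetical helpers,
and the high/batch parameters merely imitate the idea of a per-CPU list that
is drained in batches once it reaches a high watermark.

```c
#include <assert.h>

/* Direct-to-buddy model: every freed page takes the zone lock once. */
unsigned long locks_direct(unsigned long nr_frees)
{
    return nr_frees;
}

/*
 * Per-CPU-list model (hypothetical simplification): a freed page is
 * parked on a local list without any lock. Only when the list reaches
 * 'high' is one batch of 'batch' pages handed back under a single
 * lock acquisition.
 */
unsigned long locks_via_pcp(unsigned long nr_frees,
                            unsigned int high, unsigned int batch)
{
    unsigned long locks = 0;
    unsigned int count = 0;

    assert(batch > 0 && batch <= high);

    for (unsigned long i = 0; i < nr_frees; i++) {
        count++;                /* park the page locally, lock-free */
        if (count >= high) {
            locks++;            /* one lock hold drains a whole batch */
            count -= batch;
        }
    }
    return locks;
}
```

In this model the direct path takes the lock once per page, while the
batched path takes it roughly once per 'batch' pages. The values passed
for high and batch are arbitrary; the model says nothing about real
hold times or cross-CPU effects.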
> Paweł Staszewski recently shared his results on 'how Linux kernel
> handles normal traffic'[1], and from the perf data Jesper Dangaard
> Brouer found that the lock contention comes from the page allocator:
>
>   mlx5e_poll_tx_cq
>   |
>    --16.34%--napi_consume_skb
>              |
>              |--12.65%--__free_pages_ok
>              |          |
>              |           --11.86%--free_one_page
>              |                     |
>              |                     |--10.10%--queued_spin_lock_slowpath
>              |                     |
>              |                      --0.65%--_raw_spin_lock
>              |
>              |--1.55%--page_frag_free
>              |
>               --1.44%--skb_release_data
>
> Jesper explained how it happened: the mlx5 driver's RX-page recycle
> mechanism is not effective in this workload, so pages have to go
> through the page allocator. The lock contention happens during the
> mlx5 DMA TX completion cycle, and the page allocator cannot keep
> up at these speeds.[2]
>
> I thought that __free_pages_ok() was mostly freeing high-order
> pages and that this was lock contention on high-order pages,
> but Jesper explained in detail that __free_pages_ok() here is
> actually freeing order-0 pages, because mlx5 uses order-0 pages
> to satisfy its page pool allocation requests.[3]
>
> The free path as pointed out by Jesper is:
> skb_free_head()
> -> skb_free_frag()
> -> skb_free_frag()

Nitpick: you listed skb_free_frag() twice; otherwise this is correct.
(All of this gets inlined by the compiler, which makes it hard to
spot in perf report.)

> -> page_frag_free()
> And the pages being freed on this path are order-0 pages.
>
> Fix this by doing the same thing as __page_frag_cache_drain():
> send the page being freed to PCP if it is an order-0 page, or
> directly to Buddy if it is a high-order page.
>
> With this change, Paweł has not seen lock contention in his
> workload so far, and Jesper measured a 7% performance improvement
> in a micro benchmark, with the lock contention gone.
>
> [1]: https://www.spinics.net/lists/netdev/msg531362.html
> [2]: https://www.spinics.net/lists/netdev/msg531421.html
> [3]: https://www.spinics.net/lists/netdev/msg531556.html
> Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
> Analysed-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> ---

It is REALLY great that Aaron spotted this (based on my analysis)!
This has likely been causing scalability issues with real-life network
traffic, but has been hidden behind the driver-level recycle tricks
used for micro-benchmarking.

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> mm/page_alloc.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ae31839874b8..91a9a6af41a2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4555,8 +4555,14 @@ void page_frag_free(void *addr)
>  {
>  	struct page *page = virt_to_head_page(addr);
>
> -	if (unlikely(put_page_testzero(page)))
> -		__free_pages_ok(page, compound_order(page));
> +	if (unlikely(put_page_testzero(page))) {
> +		unsigned int order = compound_order(page);
> +
> +		if (order == 0)
> +			free_unref_page(page);
> +		else
> +			__free_pages_ok(page, order);
> +	}
>  }
>  EXPORT_SYMBOL(page_frag_free);
>
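
For anyone reading the diff without a kernel tree at hand, the new control
flow can be modelled in plain userspace C. This is only a sketch: struct
page_model, put_ref_testzero(), enum free_path and page_frag_free_model()
are made-up stand-ins, and the function reports which free path the patched
page_frag_free() would choose rather than freeing anything.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields of struct page that matter here. */
struct page_model {
    int refcount;           /* remaining references to the page */
    unsigned int order;     /* 0 for a single page, >0 for compound */
};

enum free_path {
    FREE_NONE,              /* references remain, nothing is freed */
    FREE_PCP,               /* order-0: would call free_unref_page() */
    FREE_BUDDY              /* high order: would call __free_pages_ok() */
};

/* Models put_page_testzero(): drop one reference, true if it hit zero. */
static bool put_ref_testzero(struct page_model *page)
{
    assert(page->refcount > 0);
    return --page->refcount == 0;
}

/* Models the patched page_frag_free(): pick the free path by order. */
enum free_path page_frag_free_model(struct page_model *page)
{
    if (put_ref_testzero(page)) {
        if (page->order == 0)
            return FREE_PCP;
        return FREE_BUDDY;
    }
    return FREE_NONE;
}
```

A page with two references and order 0 yields FREE_NONE on the first call
and FREE_PCP on the second; a last-reference drop on an order-3 page yields
FREE_BUDDY, which is the behaviour the old code applied to every page.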
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer