From: Vlastimil Babka <vbabka@suse.cz>
To: Debabrata Banerjee <dbavatar@gmail.com>,
Eric Dumazet <eric.dumazet@gmail.com>
Cc: Shaohua Li <shli@fb.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"davem@davemloft.net" <davem@davemloft.net>,
Kernel-team@fb.com, Eric Dumazet <edumazet@google.com>,
David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, "Banerjee, Debabrata" <dbanerje@akamai.com>,
Joshua Hunt <johunt@akamai.com>
Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation
Date: Fri, 12 Jun 2015 11:34:17 +0200
Message-ID: <557AA799.8000306@suse.cz>
In-Reply-To: <CAATkVEwg-0=nBrcb2N_ZtEJdCwJbzbSyMK-3SpBj_BgfjKucHg@mail.gmail.com>
On 06/11/2015 11:28 PM, Debabrata Banerjee wrote:
> Resend in plaintext, thanks gmail:
>
> It's a somewhat intractable problem to know if compaction will succeed
> without trying it,
There are heuristics, but those cannot be perfect by definition. I think
the worse problem here, though, is the extra latency, even when
compaction does succeed.
> and you can certainly end up in a state where memory is
> heavily fragmented, even with compaction running. You can't compact kernel
> pages for example, so you can end up in a state where compaction does
> nothing through no fault of its own.
Correct.
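To illustrate how little compaction can do in that state, here is a toy
userspace model (not kernel code). The names count_free_order3() and
pin_one_per_block() are made up for this sketch; it assumes 8 pages per
order-3 block and only looks at naturally aligned blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_BLOCK 8   /* order-3 = 2^3 contiguous pages */

/* Count naturally aligned order-3 blocks whose pages are all free. */
static size_t count_free_order3(const bool *page_used, size_t npages)
{
    size_t blocks = 0;

    for (size_t base = 0; base + PAGES_PER_BLOCK <= npages;
         base += PAGES_PER_BLOCK) {
        bool all_free = true;

        for (size_t i = 0; i < PAGES_PER_BLOCK; i++) {
            if (page_used[base + i]) {
                all_free = false;
                break;
            }
        }
        if (all_free)
            blocks++;
    }
    return blocks;
}

/*
 * Model one unmovable (e.g. kernel) page sitting in every order-3 block.
 * Compaction cannot migrate such pages, so no block can become free.
 */
static void pin_one_per_block(bool *page_used, size_t npages)
{
    assert(npages % PAGES_PER_BLOCK == 0);

    for (size_t base = 0; base < npages; base += PAGES_PER_BLOCK)
        page_used[base] = true;
}
```

With 64 pages and one pinned page per block, 56 of the 64 pages are free,
yet count_free_order3() finds no usable order-3 block.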
> In this case you waste time in compaction routines, then end up reclaiming
> precious page cache pages or swapping out for whatever it is your machine
> was doing, trying to satisfy these order-3 allocations, after which all
> those pages need to be restored from disk almost immediately. This is not a
> happy server.
That sounds like an overloaded server to me.
> Any mm fix may be years away.
Well, what kind of "fix"? There's no way to always avoid fragmentation
without some kind of an oracle that will tell you which unmovable
allocations (e.g. kernel pages) to put side by side because they will be
freed at the same time.
> The only simple solution I can
> think of is specifically caching these allocations, in any other case under
> memory pressure they will be split by other smaller allocations.
In this case the allocations have a simple fallback to order-0, so caching
them would make sense only if someone shows that the benefits of having
order-3 instead of order-0 pages are worth it.
> We've been forcing these allocations to order-0 internally until we can
> think of something else.
I think the proposed patch is better than forcing everything to order-0.
It makes the attempt to allocate order-3 cheap.
The VM should generally serve you better if it's told your requirements.
Communicating that the order-3 allocation is just an opportunistic
attempt with a simple fallback is the right way.
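As a rough illustration of that pattern, here is a minimal userspace
sketch. It is not kernel code: struct mock_zone, mock_alloc_pages(),
frag_refill() and the MOCK_GFP_* flags are hypothetical stand-ins for the
page allocator, skb_page_frag_refill() and the real gfp flags, and the
slow path is simply assumed to fail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for gfp flags; not the kernel's values. */
#define MOCK_GFP_WAIT    0x1u   /* caller may sleep: slow path allowed */
#define MOCK_GFP_NORETRY 0x2u   /* do not retry hard on failure */

#define FRAG_ORDER 3

struct mock_zone {
    unsigned int free_order3;   /* free contiguous order-3 blocks */
    unsigned int free_order0;   /* free single pages */
    unsigned int compactions;   /* times the expensive slow path ran */
};

/* Returns the order that was allocated, or -1 on failure. */
static int mock_alloc_pages(struct mock_zone *z, unsigned int gfp,
                            unsigned int order)
{
    assert(order == 0 || order == FRAG_ORDER);

    if (order == 0) {
        if (z->free_order0 == 0)
            return -1;
        z->free_order0--;
        return 0;
    }

    if (z->free_order3 > 0) {
        z->free_order3--;
        return FRAG_ORDER;
    }

    /* Only a caller that may wait pays for compaction/reclaim. */
    if (gfp & MOCK_GFP_WAIT)
        z->compactions++;   /* assumed to fail in this model */

    return -1;
}

/* Cheap opportunistic order-3 attempt, then fall back to order-0. */
static int frag_refill(struct mock_zone *z, unsigned int gfp)
{
    int got = mock_alloc_pages(z, (gfp & ~MOCK_GFP_WAIT) | MOCK_GFP_NORETRY,
                               FRAG_ORDER);

    if (got >= 0)
        return got;

    /* The fallback keeps the caller's original flags. */
    return mock_alloc_pages(z, gfp, 0);
}
```

The point of the sketch is only that the high-order attempt never enters
the slow path, while the order-0 fallback still gets the caller's full
flags.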
> -Deb
>
>
>> On Thu, Jun 11, 2015 at 4:48 PM, Eric Dumazet <eric.dumazet@gmail.com>
>> wrote:
>>>
>>> On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote:
>>>> We saw excessive memory compaction triggered by skb_page_frag_refill.
>>>> This causes performance issues. Commit 5640f7685831e0 introduces the
>>>> order-3 allocation to improve performance. But memory compaction has
>>>> high overhead. The benefit of order-3 allocation can't compensate for
>>>> the overhead of memory compaction.
>>>>
>>>> This patch makes the order-3 page allocation atomic. If there is no
>>>> memory pressure and memory isn't fragmented, the allocation will still
>>>> succeed, so we don't sacrifice the order-3 benefit here. If the atomic
>>>> allocation fails, compaction will not be triggered and we will fall back
>>>> to order-0 immediately.
>>>>
>>>> The mellanox driver does a similar thing; if this is accepted, we must
>>>> fix the driver too.
>>>>
>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>> Signed-off-by: Shaohua Li <shli@fb.com>
>>>> ---
>>>> net/core/sock.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>>> index 292f422..e9855a4 100644
>>>> --- a/net/core/sock.c
>>>> +++ b/net/core/sock.c
>>>> @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
>>>>
>>>> pfrag->offset = 0;
>>>> if (SKB_FRAG_PAGE_ORDER) {
>>>> - pfrag->page = alloc_pages(gfp | __GFP_COMP |
>>>> + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP |
>>>> __GFP_NOWARN | __GFP_NORETRY,
>>>> SKB_FRAG_PAGE_ORDER);
>>>> if (likely(pfrag->page)) {
>>>
>>> This is not a specific networking issue, but mm one.
>>>
>>> You really need to start a discussion with mm experts.
>>>
>>> Your changelog does not exactly explain what _is_ the problem.
>>>
>>> If the problem lies in the mm layer, it might be time to fix it, instead
>>> of working around the bug by never triggering it from this particular
>>> point, which is a safe point where a process is willing to wait a bit.
>>>
>>> Memory compaction is either working as intended, or not.
>>>
>>> If we enable it but never run it because it hurts, what is the point
>>> of enabling it?
>>>
>>>
>>>
>>> --
>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> the body to majordomo@kvack.org. For more info on Linux MM,
>>> see: http://www.linux-mm.org/ .
>>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>>
>>
>