From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ie0-f170.google.com (mail-ie0-f170.google.com [209.85.223.170]) by kanga.kvack.org (Postfix) with ESMTP id 53E076B0032 for ; Thu, 11 Jun 2015 16:48:10 -0400 (EDT) Received: by iebgx4 with SMTP id gx4so12364846ieb.0 for ; Thu, 11 Jun 2015 13:48:10 -0700 (PDT) Received: from mail-ie0-x244.google.com (mail-ie0-x244.google.com. [2607:f8b0:4001:c03::244]) by mx.google.com with ESMTPS id g63si522836ioj.58.2015.06.11.13.48.09 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 11 Jun 2015 13:48:09 -0700 (PDT) Received: by ierx19 with SMTP id x19so5122539ier.0 for ; Thu, 11 Jun 2015 13:48:09 -0700 (PDT) Message-ID: <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation From: Eric Dumazet Date: Thu, 11 Jun 2015 13:48:07 -0700 In-Reply-To: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Shaohua Li Cc: netdev@vger.kernel.org, davem@davemloft.net, Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote: > We saw excessive memory compaction triggered by skb_page_frag_refill. > This causes performance issues. Commit 5640f7685831e0 introduces the > order-3 allocation to improve performance. But memory compaction has > high overhead. The benefit of order-3 allocation can't compensate the > overhead of memory compaction. > > This patch makes the order-3 page allocation atomic. If there is no > memory pressure and memory isn't fragmented, the alloction will still > success, so we don't sacrifice the order-3 benefit here. 
If the atomic > allocation fails, compaction will not be triggered and we will fallback > to order-0 immediately. > > The mellanox driver does similar thing, if this is accepted, we must fix > the driver too. > > Cc: Eric Dumazet > Signed-off-by: Shaohua Li > --- > net/core/sock.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/core/sock.c b/net/core/sock.c > index 292f422..e9855a4 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) > > pfrag->offset = 0; > if (SKB_FRAG_PAGE_ORDER) { > - pfrag->page = alloc_pages(gfp | __GFP_COMP | > + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP | > __GFP_NOWARN | __GFP_NORETRY, > SKB_FRAG_PAGE_ORDER); > if (likely(pfrag->page)) { This is not a specific networking issue, but mm one. You really need to start a discussion with mm experts. Your changelog does not exactly explains what _is_ the problem. If the problem lies in mm layer, it might be time to fix it, instead of work around the bug by never triggering it from this particular point, which is a safe point where a process is willing to wait a bit. Memory compaction is either working as intending, or not. If we enabled it but never run it because it hurts, what is the point enabling it ? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pd0-f181.google.com (mail-pd0-f181.google.com [209.85.192.181]) by kanga.kvack.org (Postfix) with ESMTP id 1A5246B0032 for ; Thu, 11 Jun 2015 17:16:56 -0400 (EDT) Received: by pdbnf5 with SMTP id nf5so10439248pdb.2 for ; Thu, 11 Jun 2015 14:16:55 -0700 (PDT) Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com. 
[67.231.145.42]) by mx.google.com with ESMTPS id s7si2485995pdl.14.2015.06.11.14.16.54 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 11 Jun 2015 14:16:55 -0700 (PDT) Message-ID: <5579FABE.4050505@fb.com> Date: Thu, 11 Jun 2015 17:16:46 -0400 From: Chris Mason MIME-Version: 1.0 Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> In-Reply-To: <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Eric Dumazet , Shaohua Li Cc: netdev@vger.kernel.org, davem@davemloft.net, Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org On 06/11/2015 04:48 PM, Eric Dumazet wrote: > On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote: >> We saw excessive memory compaction triggered by skb_page_frag_refill. >> This causes performance issues. Commit 5640f7685831e0 introduces the >> order-3 allocation to improve performance. But memory compaction has >> high overhead. The benefit of order-3 allocation can't compensate the >> overhead of memory compaction. >> >> This patch makes the order-3 page allocation atomic. If there is no >> memory pressure and memory isn't fragmented, the alloction will still >> success, so we don't sacrifice the order-3 benefit here. If the atomic >> allocation fails, compaction will not be triggered and we will fallback >> to order-0 immediately. >> >> The mellanox driver does similar thing, if this is accepted, we must fix >> the driver too. 
>> >> Cc: Eric Dumazet >> Signed-off-by: Shaohua Li >> --- >> net/core/sock.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/net/core/sock.c b/net/core/sock.c >> index 292f422..e9855a4 100644 >> --- a/net/core/sock.c >> +++ b/net/core/sock.c >> @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) >> >> pfrag->offset = 0; >> if (SKB_FRAG_PAGE_ORDER) { >> - pfrag->page = alloc_pages(gfp | __GFP_COMP | >> + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP | >> __GFP_NOWARN | __GFP_NORETRY, >> SKB_FRAG_PAGE_ORDER); >> if (likely(pfrag->page)) { > > This is not a specific networking issue, but mm one. > > You really need to start a discussion with mm experts. > > Your changelog does not exactly explains what _is_ the problem. > > If the problem lies in mm layer, it might be time to fix it, instead of > work around the bug by never triggering it from this particular point, > which is a safe point where a process is willing to wait a bit. > > Memory compaction is either working as intending, or not. > > If we enabled it but never run it because it hurts, what is the point > enabling it ? networking is asking for 32KB, and the MM layer is doing what it can to provide it. Are the gains from getting 32KB contig bigger than the cost of moving pages around if the MM has to actually go into compaction? Should we start disk IO to give back 32KB contig? I think we want to tell the MM to compact in the background and give networking 32KB if it happens to have it available. If not, fall back to smaller allocations without doing anything expensive. -chris -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f180.google.com (mail-ig0-f180.google.com [209.85.213.180]) by kanga.kvack.org (Postfix) with ESMTP id 489BA900015 for ; Thu, 11 Jun 2015 17:22:16 -0400 (EDT) Received: by igbsb11 with SMTP id sb11so12927281igb.0 for ; Thu, 11 Jun 2015 14:22:16 -0700 (PDT) Received: from mail-ig0-x241.google.com (mail-ig0-x241.google.com. [2607:f8b0:4001:c05::241]) by mx.google.com with ESMTPS id ik8si217783igb.1.2015.06.11.14.22.15 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 11 Jun 2015 14:22:15 -0700 (PDT) Received: by igdh15 with SMTP id h15so4865997igd.3 for ; Thu, 11 Jun 2015 14:22:15 -0700 (PDT) Message-ID: <1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com> Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation From: Eric Dumazet Date: Thu, 11 Jun 2015 14:22:13 -0700 In-Reply-To: <5579FABE.4050505@fb.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> <5579FABE.4050505@fb.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Chris Mason Cc: Shaohua Li , netdev@vger.kernel.org, davem@davemloft.net, Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org On Thu, 2015-06-11 at 17:16 -0400, Chris Mason wrote: > On 06/11/2015 04:48 PM, Eric Dumazet wrote: > > On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote: > >> We saw excessive memory compaction triggered by skb_page_frag_refill. > >> This causes performance issues. Commit 5640f7685831e0 introduces the > >> order-3 allocation to improve performance. But memory compaction has > >> high overhead. The benefit of order-3 allocation can't compensate the > >> overhead of memory compaction. > >> > >> This patch makes the order-3 page allocation atomic. 
If there is no > >> memory pressure and memory isn't fragmented, the alloction will still > >> success, so we don't sacrifice the order-3 benefit here. If the atomic > >> allocation fails, compaction will not be triggered and we will fallback > >> to order-0 immediately. > >> > >> The mellanox driver does similar thing, if this is accepted, we must fix > >> the driver too. > >> > >> Cc: Eric Dumazet > >> Signed-off-by: Shaohua Li > >> --- > >> net/core/sock.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/net/core/sock.c b/net/core/sock.c > >> index 292f422..e9855a4 100644 > >> --- a/net/core/sock.c > >> +++ b/net/core/sock.c > >> @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) > >> > >> pfrag->offset = 0; > >> if (SKB_FRAG_PAGE_ORDER) { > >> - pfrag->page = alloc_pages(gfp | __GFP_COMP | > >> + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP | > >> __GFP_NOWARN | __GFP_NORETRY, > >> SKB_FRAG_PAGE_ORDER); > >> if (likely(pfrag->page)) { > > > > This is not a specific networking issue, but mm one. > > > > You really need to start a discussion with mm experts. > > > > Your changelog does not exactly explains what _is_ the problem. > > > > If the problem lies in mm layer, it might be time to fix it, instead of > > work around the bug by never triggering it from this particular point, > > which is a safe point where a process is willing to wait a bit. > > > > Memory compaction is either working as intending, or not. > > > > If we enabled it but never run it because it hurts, what is the point > > enabling it ? > > networking is asking for 32KB, and the MM layer is doing what it can to > provide it. Are the gains from getting 32KB contig bigger than the cost > of moving pages around if the MM has to actually go into compaction? > Should we start disk IO to give back 32KB contig? 
> > I think we want to tell the MM to compact in the background and give > networking 32KB if it happens to have it available. If not, fall back > to smaller allocations without doing anything expensive. Exactly my point. (And I mentioned this about 4 months ago) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f176.google.com (mail-wi0-f176.google.com [209.85.212.176]) by kanga.kvack.org (Postfix) with ESMTP id 895AE6B0032 for ; Thu, 11 Jun 2015 17:25:51 -0400 (EDT) Received: by wifx6 with SMTP id x6so18669238wif.0 for ; Thu, 11 Jun 2015 14:25:51 -0700 (PDT) Received: from mail-wi0-x235.google.com (mail-wi0-x235.google.com. [2a00:1450:400c:c05::235]) by mx.google.com with ESMTPS id oo2si3392917wjc.190.2015.06.11.14.25.49 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 11 Jun 2015 14:25:50 -0700 (PDT) Received: by wibut5 with SMTP id ut5so1109990wib.1 for ; Thu, 11 Jun 2015 14:25:49 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> Date: Thu, 11 Jun 2015 17:25:49 -0400 Message-ID: Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation From: Debabrata Banerjee Content-Type: multipart/alternative; boundary=f46d043c80eeab2780051844a2b7 Sender: owner-linux-mm@kvack.org List-ID: To: Eric Dumazet Cc: Shaohua Li , "netdev@vger.kernel.org" , "davem@davemloft.net" , Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org, "Banerjee, Debabrata" , Joshua Hunt --f46d043c80eeab2780051844a2b7 Content-Type: text/plain; charset=UTF-8 It's somewhat an intractable problem to know if compaction will succeed 
without trying it, and you can certainly end up in a state where memory is heavily fragmented, even with compaction running. You can't compact kernel pages for example, so you can end up in a state where compaction does nothing through no fault of it's own. In this case you waste time in compaction routines, then end up reclaiming precious page cache pages or swapping out for whatever it is your machine was doing trying to do to satisfy these order-3 allocations, after which all those pages need to be restored from disk almost immediately. This is not a happy server. Any mm fix may be years away. The only simple solution I can think of is specifically caching these allocations, in any other case under memory pressure they will be split by other smaller allocations. We've been forcing these allocations to order-0 internally until we can think of something else. -Deb On Thu, Jun 11, 2015 at 4:48 PM, Eric Dumazet wrote: > On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote: > > We saw excessive memory compaction triggered by skb_page_frag_refill. > > This causes performance issues. Commit 5640f7685831e0 introduces the > > order-3 allocation to improve performance. But memory compaction has > > high overhead. The benefit of order-3 allocation can't compensate the > > overhead of memory compaction. > > > > This patch makes the order-3 page allocation atomic. If there is no > > memory pressure and memory isn't fragmented, the alloction will still > > success, so we don't sacrifice the order-3 benefit here. If the atomic > > allocation fails, compaction will not be triggered and we will fallback > > to order-0 immediately. > > > > The mellanox driver does similar thing, if this is accepted, we must fix > > the driver too. 
> > > > Cc: Eric Dumazet > > Signed-off-by: Shaohua Li > > --- > > net/core/sock.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > index 292f422..e9855a4 100644 > > --- a/net/core/sock.c > > +++ b/net/core/sock.c > > @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct > page_frag *pfrag, gfp_t gfp) > > > > pfrag->offset = 0; > > if (SKB_FRAG_PAGE_ORDER) { > > - pfrag->page = alloc_pages(gfp | __GFP_COMP | > > + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP > | > > __GFP_NOWARN | __GFP_NORETRY, > > SKB_FRAG_PAGE_ORDER); > > if (likely(pfrag->page)) { > > This is not a specific networking issue, but mm one. > > You really need to start a discussion with mm experts. > > Your changelog does not exactly explains what _is_ the problem. > > If the problem lies in mm layer, it might be time to fix it, instead of > work around the bug by never triggering it from this particular point, > which is a safe point where a process is willing to wait a bit. > > Memory compaction is either working as intending, or not. > > If we enabled it but never run it because it hurts, what is the point > enabling it ? > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > --f46d043c80eeab2780051844a2b7 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
--f46d043c80eeab2780051844a2b7-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wi0-f176.google.com (mail-wi0-f176.google.com [209.85.212.176]) by kanga.kvack.org (Postfix) with ESMTP id 9ABC86B0038 for ; Thu, 11 Jun 2015 17:28:07 -0400 (EDT) Received: by wiwd19 with SMTP id d19so1169729wiw.0 for ; Thu, 11 Jun 2015 14:28:07 -0700 (PDT) Received: from mail-wi0-x22c.google.com (mail-wi0-x22c.google.com. [2a00:1450:400c:c05::22c]) by mx.google.com with ESMTPS id y11si4076892wiv.114.2015.06.11.14.28.05 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 11 Jun 2015 14:28:06 -0700 (PDT) Received: by wibut5 with SMTP id ut5so1139553wib.1 for ; Thu, 11 Jun 2015 14:28:05 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> Date: Thu, 11 Jun 2015 17:28:05 -0400 Message-ID: Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation From: Debabrata Banerjee Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Eric Dumazet Cc: Shaohua Li , "netdev@vger.kernel.org" , "davem@davemloft.net" , Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org, "Banerjee, Debabrata" , Joshua Hunt Resend in plaintext, thanks gmail: It's somewhat an intractable problem to know if compaction will succeed without trying it, and you can certainly end up in a state where memory is heavily fragmented, even with compaction running. You can't compact kernel pages for example, so you can end up in a state where compaction does nothing through no fault of it's own. 
In this case you waste time in compaction routines, then end up reclaiming precious page cache pages or swapping out for whatever it is your machine was doing trying to do to satisfy these order-3 allocations, after which all those pages need to be restored from disk almost immediately. This is not a happy server. Any mm fix may be years away. The only simple solution I can think of is specifically caching these allocations, in any other case under memory pressure they will be split by other smaller allocations. We've been forcing these allocations to order-0 internally until we can think of something else. -Deb > On Thu, Jun 11, 2015 at 4:48 PM, Eric Dumazet > wrote: >> >> On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote: >> > We saw excessive memory compaction triggered by skb_page_frag_refill. >> > This causes performance issues. Commit 5640f7685831e0 introduces the >> > order-3 allocation to improve performance. But memory compaction has >> > high overhead. The benefit of order-3 allocation can't compensate the >> > overhead of memory compaction. >> > >> > This patch makes the order-3 page allocation atomic. If there is no >> > memory pressure and memory isn't fragmented, the alloction will still >> > success, so we don't sacrifice the order-3 benefit here. If the atomic >> > allocation fails, compaction will not be triggered and we will fallback >> > to order-0 immediately. >> > >> > The mellanox driver does similar thing, if this is accepted, we must fix >> > the driver too. 
>> > >> > Cc: Eric Dumazet >> > Signed-off-by: Shaohua Li >> > --- >> > net/core/sock.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> > diff --git a/net/core/sock.c b/net/core/sock.c >> > index 292f422..e9855a4 100644 >> > --- a/net/core/sock.c >> > +++ b/net/core/sock.c >> > @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct >> > page_frag *pfrag, gfp_t gfp) >> > >> > pfrag->offset = 0; >> > if (SKB_FRAG_PAGE_ORDER) { >> > - pfrag->page = alloc_pages(gfp | __GFP_COMP | >> > + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP >> > | >> > __GFP_NOWARN | __GFP_NORETRY, >> > SKB_FRAG_PAGE_ORDER); >> > if (likely(pfrag->page)) { >> >> This is not a specific networking issue, but mm one. >> >> You really need to start a discussion with mm experts. >> >> Your changelog does not exactly explains what _is_ the problem. >> >> If the problem lies in mm layer, it might be time to fix it, instead of >> work around the bug by never triggering it from this particular point, >> which is a safe point where a process is willing to wait a bit. >> >> Memory compaction is either working as intending, or not. >> >> If we enabled it but never run it because it hurts, what is the point >> enabling it ? >> >> >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@kvack.org. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: email@kvack.org > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wg0-f41.google.com (mail-wg0-f41.google.com [74.125.82.41]) by kanga.kvack.org (Postfix) with ESMTP id CE5286B0038 for ; Thu, 11 Jun 2015 17:35:02 -0400 (EDT) Received: by wgme6 with SMTP id e6so12014294wgm.2 for ; Thu, 11 Jun 2015 14:35:02 -0700 (PDT) Received: from mail-wg0-x235.google.com (mail-wg0-x235.google.com. [2a00:1450:400c:c00::235]) by mx.google.com with ESMTPS id k2si18433295wia.122.2015.06.11.14.35.01 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 11 Jun 2015 14:35:01 -0700 (PDT) Received: by wgbgq6 with SMTP id gq6so11989346wgb.3 for ; Thu, 11 Jun 2015 14:35:01 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <5579FABE.4050505@fb.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> <5579FABE.4050505@fb.com> Date: Thu, 11 Jun 2015 17:35:00 -0400 Message-ID: Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation From: Debabrata Banerjee Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Chris Mason Cc: Eric Dumazet , Shaohua Li , "netdev@vger.kernel.org" , "davem@davemloft.net" , Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org, Joshua Hunt , "Banerjee, Debabrata" There is no "background" it doesn't matter if this activity happens synchronously or asynchronously, unless you're sensitive to the latency on that single operation. If you're driving all your cpu's and memory hard then this is work that still takes resources. If there's a kernel thread with compaction running, then obviously your process is not. Your patch should help in that not every atomic allocation failure should mean yet another run at compaction/reclaim. 
-Deb On Thu, Jun 11, 2015 at 5:16 PM, Chris Mason wrote: > networking is asking for 32KB, and the MM layer is doing what it can to > provide it. Are the gains from getting 32KB contig bigger than the cost > of moving pages around if the MM has to actually go into compaction? > Should we start disk IO to give back 32KB contig? > > I think we want to tell the MM to compact in the background and give > networking 32KB if it happens to have it available. If not, fall back > to smaller allocations without doing anything expensive. > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qg0-f45.google.com (mail-qg0-f45.google.com [209.85.192.45]) by kanga.kvack.org (Postfix) with ESMTP id E85B76B0038 for ; Thu, 11 Jun 2015 17:45:38 -0400 (EDT) Received: by qgfa66 with SMTP id a66so6004460qgf.0 for ; Thu, 11 Jun 2015 14:45:38 -0700 (PDT) Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com. 
[67.231.153.30]) by mx.google.com with ESMTPS id e207si1907872qhc.3.2015.06.11.14.45.36 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 11 Jun 2015 14:45:37 -0700 (PDT) Date: Thu, 11 Jun 2015 14:45:25 -0700 From: Shaohua Li Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation Message-ID: <20150611214525.GA406740@devbig257.prn2.facebook.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> <5579FABE.4050505@fb.com> <1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com> Sender: owner-linux-mm@kvack.org List-ID: To: Eric Dumazet Cc: Chris Mason , netdev@vger.kernel.org, davem@davemloft.net, Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org On Thu, Jun 11, 2015 at 02:22:13PM -0700, Eric Dumazet wrote: > On Thu, 2015-06-11 at 17:16 -0400, Chris Mason wrote: > > On 06/11/2015 04:48 PM, Eric Dumazet wrote: > > > On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote: > > >> We saw excessive memory compaction triggered by skb_page_frag_refill. > > >> This causes performance issues. Commit 5640f7685831e0 introduces the > > >> order-3 allocation to improve performance. But memory compaction has > > >> high overhead. The benefit of order-3 allocation can't compensate the > > >> overhead of memory compaction. > > >> > > >> This patch makes the order-3 page allocation atomic. If there is no > > >> memory pressure and memory isn't fragmented, the alloction will still > > >> success, so we don't sacrifice the order-3 benefit here. If the atomic > > >> allocation fails, compaction will not be triggered and we will fallback > > >> to order-0 immediately. 
> > >> > > >> The mellanox driver does similar thing, if this is accepted, we must fix > > >> the driver too. > > >> > > >> Cc: Eric Dumazet > > >> Signed-off-by: Shaohua Li > > >> --- > > >> net/core/sock.c | 2 +- > > >> 1 file changed, 1 insertion(+), 1 deletion(-) > > >> > > >> diff --git a/net/core/sock.c b/net/core/sock.c > > >> index 292f422..e9855a4 100644 > > >> --- a/net/core/sock.c > > >> +++ b/net/core/sock.c > > >> @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) > > >> > > >> pfrag->offset = 0; > > >> if (SKB_FRAG_PAGE_ORDER) { > > >> - pfrag->page = alloc_pages(gfp | __GFP_COMP | > > >> + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP | > > >> __GFP_NOWARN | __GFP_NORETRY, > > >> SKB_FRAG_PAGE_ORDER); > > >> if (likely(pfrag->page)) { > > > > > > This is not a specific networking issue, but mm one. > > > > > > You really need to start a discussion with mm experts. > > > > > > Your changelog does not exactly explains what _is_ the problem. > > > > > > If the problem lies in mm layer, it might be time to fix it, instead of > > > work around the bug by never triggering it from this particular point, > > > which is a safe point where a process is willing to wait a bit. > > > > > > Memory compaction is either working as intending, or not. > > > > > > If we enabled it but never run it because it hurts, what is the point > > > enabling it ? > > > > networking is asking for 32KB, and the MM layer is doing what it can to > > provide it. Are the gains from getting 32KB contig bigger than the cost > > of moving pages around if the MM has to actually go into compaction? > > Should we start disk IO to give back 32KB contig? > > > > I think we want to tell the MM to compact in the background and give > > networking 32KB if it happens to have it available. If not, fall back > > to smaller allocations without doing anything expensive. > > Exactly my point. 
(And I mentioned this about 4 months ago) This is exactly what the patch try to do. Atomic 32k allocation will fail with memory pressure, kswapd is waken up to do compaction and we fallback to 4k. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ie0-f169.google.com (mail-ie0-f169.google.com [209.85.223.169]) by kanga.kvack.org (Postfix) with ESMTP id EC9056B0032 for ; Thu, 11 Jun 2015 17:56:28 -0400 (EDT) Received: by iebmu5 with SMTP id mu5so13145500ieb.1 for ; Thu, 11 Jun 2015 14:56:28 -0700 (PDT) Received: from mail-ig0-x244.google.com (mail-ig0-x244.google.com. [2607:f8b0:4001:c05::244]) by mx.google.com with ESMTPS id p65si1512542iop.13.2015.06.11.14.56.28 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 11 Jun 2015 14:56:28 -0700 (PDT) Received: by igdj8 with SMTP id j8so5084054igd.2 for ; Thu, 11 Jun 2015 14:56:28 -0700 (PDT) Message-ID: <1434059786.27504.58.camel@edumazet-glaptop2.roam.corp.google.com> Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation From: Eric Dumazet Date: Thu, 11 Jun 2015 14:56:26 -0700 In-Reply-To: <20150611214525.GA406740@devbig257.prn2.facebook.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> <5579FABE.4050505@fb.com> <1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com> <20150611214525.GA406740@devbig257.prn2.facebook.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Shaohua Li Cc: Chris Mason , netdev@vger.kernel.org, davem@davemloft.net, Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org On Thu, 2015-06-11 at 14:45 -0700, Shaohua Li wrote: > This is 
exactly what the patch try to do. Atomic 32k allocation will
> fail with memory pressure, kswapd is waken up to do compaction and we
> fallback to 4k.

Read your changelog, then read what you just wrote.

Your changelog said :

'compaction will not be triggered and we will fallback to order-0
immediately.'

Now you tell me that compaction is started.

What is the truth ?

Please make sure changelog is precise, this would avoid many mails.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-qg0-f43.google.com (mail-qg0-f43.google.com [209.85.192.43])
	by kanga.kvack.org (Postfix) with ESMTP id C02546B006C
	for ; Thu, 11 Jun 2015 18:01:26 -0400 (EDT)
Received: by qgf75 with SMTP id 75so6064933qgf.1
	for ; Thu, 11 Jun 2015 15:01:26 -0700 (PDT)
Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com.
[67.231.153.30]) by mx.google.com with ESMTPS id w53si498154qge.37.2015.06.11.15.01.25 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 11 Jun 2015 15:01:26 -0700 (PDT) Date: Thu, 11 Jun 2015 15:01:15 -0700 From: Shaohua Li Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation Message-ID: <20150611220115.GA448912@devbig257.prn2.facebook.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> <5579FABE.4050505@fb.com> <1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com> <20150611214525.GA406740@devbig257.prn2.facebook.com> <1434059786.27504.58.camel@edumazet-glaptop2.roam.corp.google.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <1434059786.27504.58.camel@edumazet-glaptop2.roam.corp.google.com> Sender: owner-linux-mm@kvack.org List-ID: To: Eric Dumazet Cc: Chris Mason , netdev@vger.kernel.org, davem@davemloft.net, Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org On Thu, Jun 11, 2015 at 02:56:26PM -0700, Eric Dumazet wrote: > On Thu, 2015-06-11 at 14:45 -0700, Shaohua Li wrote: > > > This is exactly what the patch try to do. Atomic 32k allocation will > > fail with memory pressure, kswapd is waken up to do compaction and we > > fallback to 4k. > > Read your changelog, then read what you just wrote. > > Your changelog said : > > 'compaction will not be triggered and we will fallback to order-0 > immediately.' > > Now you tell me that compaction is started. > > What is the truth ? > > Please make sure changelog is precise, this would avoid many mails. Ah, ok. I mean direct compaction isn't triggered, kswapd is still waken up to do compaction. I'll update the changelog. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f176.google.com (mail-qk0-f176.google.com [209.85.220.176]) by kanga.kvack.org (Postfix) with ESMTP id A0BE86B006C for ; Thu, 11 Jun 2015 18:18:05 -0400 (EDT) Received: by qkhg32 with SMTP id g32so9157378qkh.0 for ; Thu, 11 Jun 2015 15:18:05 -0700 (PDT) Received: from shards.monkeyblade.net (shards.monkeyblade.net. [2001:4f8:3:36:211:85ff:fe63:a549]) by mx.google.com with ESMTP id 63si157699qhw.101.2015.06.11.15.18.04 for ; Thu, 11 Jun 2015 15:18:04 -0700 (PDT) Date: Thu, 11 Jun 2015 15:18:01 -0700 (PDT) Message-Id: <20150611.151801.1297394068071005900.davem@davemloft.net> Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation From: David Miller In-Reply-To: References: <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> <5579FABE.4050505@fb.com> Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: dbavatar@gmail.com Cc: clm@fb.com, eric.dumazet@gmail.com, shli@fb.com, netdev@vger.kernel.org, Kernel-team@fb.com, edumazet@google.com, rientjes@google.com, linux-mm@kvack.org, johunt@akamai.com, dbanerje@akamai.com Please stop top-posting. Quote the relevant material you are replying to first, the add your response commentary afterwards rather than beforehand. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f176.google.com (mail-qk0-f176.google.com [209.85.220.176]) by kanga.kvack.org (Postfix) with ESMTP id 2D0AD6B006E for ; Thu, 11 Jun 2015 18:19:00 -0400 (EDT) Received: by qkhg32 with SMTP id g32so9165041qkh.0 for ; Thu, 11 Jun 2015 15:19:00 -0700 (PDT) Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com. 
[67.231.153.30]) by mx.google.com with ESMTPS id x104si1971922qgx.59.2015.06.11.15.18.58 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 11 Jun 2015 15:18:59 -0700 (PDT) Message-ID: <557A0949.3020705@fb.com> Date: Thu, 11 Jun 2015 18:18:49 -0400 From: Chris Mason MIME-Version: 1.0 Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> <5579FABE.4050505@fb.com> <1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com> In-Reply-To: <1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com> Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Eric Dumazet Cc: Shaohua Li , netdev@vger.kernel.org, davem@davemloft.net, Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org On 06/11/2015 05:22 PM, Eric Dumazet wrote: > On Thu, 2015-06-11 at 17:16 -0400, Chris Mason wrote: >> On 06/11/2015 04:48 PM, Eric Dumazet wrote: >> >> networking is asking for 32KB, and the MM layer is doing what it can to >> provide it. Are the gains from getting 32KB contig bigger than the cost >> of moving pages around if the MM has to actually go into compaction? >> Should we start disk IO to give back 32KB contig? >> >> I think we want to tell the MM to compact in the background and give >> networking 32KB if it happens to have it available. If not, fall back >> to smaller allocations without doing anything expensive. > > Exactly my point. (And I mentioned this about 4 months ago) Sorry, reading this again I wasn't very clear. I agree with Shaohua's patch because it is telling the allocator that we don't want to wait for reclaim or compaction to find contiguous pages. But, is there any fallback to a single page allocation somewhere else? 
If this is the only way to get memory, we might want to add a single alloc_page path that won't trigger compaction but is at least able to wait for kswapd to make progress. -chris -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f49.google.com (mail-pa0-f49.google.com [209.85.220.49]) by kanga.kvack.org (Postfix) with ESMTP id 5F1E86B0038 for ; Thu, 11 Jun 2015 18:27:21 -0400 (EDT) Received: by padev16 with SMTP id ev16so10380000pad.0 for ; Thu, 11 Jun 2015 15:27:21 -0700 (PDT) Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com. [67.231.145.42]) by mx.google.com with ESMTPS id pm1si2656462pbc.4.2015.06.11.15.27.20 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 11 Jun 2015 15:27:20 -0700 (PDT) Received: from pps.filterd (m0044010 [127.0.0.1]) by mx0a-00082601.pphosted.com (8.14.5/8.14.5) with SMTP id t5BMOs35002770 for ; Thu, 11 Jun 2015 15:27:19 -0700 Received: from mail.thefacebook.com ([199.201.64.23]) by mx0a-00082601.pphosted.com with ESMTP id 1uy7af17qd-1 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT) for ; Thu, 11 Jun 2015 15:27:19 -0700 Received: from facebook.com (2401:db00:20:7003:face:0:4d:0) by mx-out.facebook.com (10.212.236.87) with ESMTP id 047b2ac6108911e584310002c9521c9e-3d1dc2a0 for ; Thu, 11 Jun 2015 15:27:17 -0700 From: Shaohua Li Subject: [RFC v2] net: use atomic allocation for order-3 page allocation Date: Thu, 11 Jun 2015 15:27:16 -0700 Message-ID: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> MIME-Version: 1.0 Content-Type: text/plain Sender: owner-linux-mm@kvack.org List-ID: To: netdev@vger.kernel.org Cc: davem@davemloft.net, Kernel-team@fb.com, clm@fb.com, linux-mm@kvack.org, dbavatar@gmail.com, Eric Dumazet We saw excessive direct memory compaction triggered by 
skb_page_frag_refill.
This causes performance issues and add latency. Commit 5640f7685831e0
introduces the order-3 allocation. According to the changelog, the order-3
allocation isn't a must-have but to improve performance. But direct memory
compaction has high overhead. The benefit of order-3 allocation can't
compensate the overhead of direct memory compaction.

This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the alloction will still success, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered, skb_page_frag_refill will
fallback to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is waken up and doing
compaction, so chances are allocation could success next time.

The mellanox driver does similar thing, if this is accepted, we must fix
the driver too.

V2: make the changelog clearer

Cc: Eric Dumazet
Signed-off-by: Shaohua Li
---
 net/core/sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 292f422..e9855a4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
 
 	pfrag->offset = 0;
 	if (SKB_FRAG_PAGE_ORDER) {
-		pfrag->page = alloc_pages(gfp | __GFP_COMP |
+		pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP |
 					  __GFP_NOWARN | __GFP_NORETRY,
 					  SKB_FRAG_PAGE_ORDER);
 		if (likely(pfrag->page)) {
--
1.8.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ie0-f181.google.com (mail-ie0-f181.google.com [209.85.223.181]) by kanga.kvack.org (Postfix) with ESMTP id 2D15D6B0032 for ; Thu, 11 Jun 2015 18:53:07 -0400 (EDT) Received: by iebps5 with SMTP id ps5so13704498ieb.3 for ; Thu, 11 Jun 2015 15:53:07 -0700 (PDT) Received: from mail-ig0-x244.google.com (mail-ig0-x244.google.com. [2607:f8b0:4001:c05::244]) by mx.google.com with ESMTPS id a3si1838138icv.24.2015.06.11.15.53.06 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 11 Jun 2015 15:53:06 -0700 (PDT) Received: by igdj8 with SMTP id j8so260047igd.0 for ; Thu, 11 Jun 2015 15:53:06 -0700 (PDT) Message-ID: <1434063184.27504.60.camel@edumazet-glaptop2.roam.corp.google.com> Subject: Re: [RFC v2] net: use atomic allocation for order-3 page allocation From: Eric Dumazet Date: Thu, 11 Jun 2015 15:53:04 -0700 In-Reply-To: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Shaohua Li Cc: netdev@vger.kernel.org, davem@davemloft.net, Kernel-team@fb.com, clm@fb.com, linux-mm@kvack.org, dbavatar@gmail.com, Eric Dumazet On Thu, 2015-06-11 at 15:27 -0700, Shaohua Li wrote: > We saw excessive direct memory compaction triggered by skb_page_frag_refill. > This causes performance issues and add latency. Commit 5640f7685831e0 > introduces the order-3 allocation. According to the changelog, the order-3 > allocation isn't a must-have but to improve performance. But direct memory > compaction has high overhead. The benefit of order-3 allocation can't > compensate the overhead of direct memory compaction. > > This patch makes the order-3 page allocation atomic. 
If there is no memory > pressure and memory isn't fragmented, the alloction will still success, so we > don't sacrifice the order-3 benefit here. If the atomic allocation fails, > direct memory compaction will not be triggered, skb_page_frag_refill will > fallback to order-0 immediately, hence the direct memory compaction overhead is > avoided. In the allocation failure case, kswapd is waken up and doing > compaction, so chances are allocation could success next time. > > The mellanox driver does similar thing, if this is accepted, we must fix > the driver too. > > V2: make the changelog clearer > > Cc: Eric Dumazet > Signed-off-by: Shaohua Li > --- > net/core/sock.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/core/sock.c b/net/core/sock.c > index 292f422..e9855a4 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) > > pfrag->offset = 0; > if (SKB_FRAG_PAGE_ORDER) { > - pfrag->page = alloc_pages(gfp | __GFP_COMP | > + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP | > __GFP_NOWARN | __GFP_NORETRY, > SKB_FRAG_PAGE_ORDER); > if (likely(pfrag->page)) { OK, now what about alloc_skb_with_frags() ? This should have same problem right ? Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f172.google.com (mail-ig0-f172.google.com [209.85.213.172]) by kanga.kvack.org (Postfix) with ESMTP id B9EDF6B0038 for ; Thu, 11 Jun 2015 18:55:57 -0400 (EDT) Received: by igbhj9 with SMTP id hj9so1852594igb.1 for ; Thu, 11 Jun 2015 15:55:57 -0700 (PDT) Received: from mail-ig0-x241.google.com (mail-ig0-x241.google.com. 
	[2607:f8b0:4001:c05::241])
	by mx.google.com with ESMTPS id 15si1577580ioo.98.2015.06.11.15.55.57
	for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Thu, 11 Jun 2015 15:55:57 -0700 (PDT)
Received: by igdj8 with SMTP id j8so256586igd.2
	for ; Thu, 11 Jun 2015 15:55:57 -0700 (PDT)
Message-ID: <1434063355.27504.62.camel@edumazet-glaptop2.roam.corp.google.com>
Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation
From: Eric Dumazet
Date: Thu, 11 Jun 2015 15:55:55 -0700
In-Reply-To: <557A0949.3020705@fb.com>
References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com>
	<1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com>
	<5579FABE.4050505@fb.com>
	<1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com>
	<557A0949.3020705@fb.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Chris Mason
Cc: Shaohua Li , netdev@vger.kernel.org, davem@davemloft.net,
	Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org

On Thu, 2015-06-11 at 18:18 -0400, Chris Mason wrote:

> But, is there any fallback to a single page allocation somewhere else?
> If this is the only way to get memory, we might want to add a single
> alloc_page path that won't trigger compaction but is at least able to
> wait for kswapd to make progress.

Sure, there is a fallback to order-0 in both skb_page_frag_refill() and
alloc_skb_with_frags()

They also use __GFP_NOWARN | __GFP_NORETRY

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f53.google.com (mail-pa0-f53.google.com [209.85.220.53]) by kanga.kvack.org (Postfix) with ESMTP id 2DE656B0032 for ; Thu, 11 Jun 2015 19:32:47 -0400 (EDT) Received: by pacyx8 with SMTP id yx8so10919836pac.2 for ; Thu, 11 Jun 2015 16:32:46 -0700 (PDT) Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com. [67.231.145.42]) by mx.google.com with ESMTPS id 13si2781267pdb.141.2015.06.11.16.32.46 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Thu, 11 Jun 2015 16:32:46 -0700 (PDT) Date: Thu, 11 Jun 2015 16:32:35 -0700 From: Shaohua Li Subject: Re: [RFC v2] net: use atomic allocation for order-3 page allocation Message-ID: <20150611233235.GA667489@devbig257.prn2.facebook.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434063184.27504.60.camel@edumazet-glaptop2.roam.corp.google.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <1434063184.27504.60.camel@edumazet-glaptop2.roam.corp.google.com> Sender: owner-linux-mm@kvack.org List-ID: To: Eric Dumazet Cc: netdev@vger.kernel.org, davem@davemloft.net, Kernel-team@fb.com, clm@fb.com, linux-mm@kvack.org, dbavatar@gmail.com, Eric Dumazet On Thu, Jun 11, 2015 at 03:53:04PM -0700, Eric Dumazet wrote: > On Thu, 2015-06-11 at 15:27 -0700, Shaohua Li wrote: > > We saw excessive direct memory compaction triggered by skb_page_frag_refill. > > This causes performance issues and add latency. Commit 5640f7685831e0 > > introduces the order-3 allocation. According to the changelog, the order-3 > > allocation isn't a must-have but to improve performance. But direct memory > > compaction has high overhead. The benefit of order-3 allocation can't > > compensate the overhead of direct memory compaction. > > > > This patch makes the order-3 page allocation atomic. 
If there is no memory > > pressure and memory isn't fragmented, the alloction will still success, so we > > don't sacrifice the order-3 benefit here. If the atomic allocation fails, > > direct memory compaction will not be triggered, skb_page_frag_refill will > > fallback to order-0 immediately, hence the direct memory compaction overhead is > > avoided. In the allocation failure case, kswapd is waken up and doing > > compaction, so chances are allocation could success next time. > > > > The mellanox driver does similar thing, if this is accepted, we must fix > > the driver too. > > > > V2: make the changelog clearer > > > > Cc: Eric Dumazet > > Signed-off-by: Shaohua Li > > --- > > net/core/sock.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > index 292f422..e9855a4 100644 > > --- a/net/core/sock.c > > +++ b/net/core/sock.c > > @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) > > > > pfrag->offset = 0; > > if (SKB_FRAG_PAGE_ORDER) { > > - pfrag->page = alloc_pages(gfp | __GFP_COMP | > > + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP | > > __GFP_NOWARN | __GFP_NORETRY, > > SKB_FRAG_PAGE_ORDER); > > if (likely(pfrag->page)) { > > > OK, now what about alloc_skb_with_frags() ? > > This should have same problem right ? Ok, looks similar, added. Didn't trigger this one though. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ig0-f169.google.com (mail-ig0-f169.google.com [209.85.213.169]) by kanga.kvack.org (Postfix) with ESMTP id 59D266B0032 for ; Thu, 11 Jun 2015 19:38:25 -0400 (EDT) Received: by igbpi8 with SMTP id pi8so2229951igb.0 for ; Thu, 11 Jun 2015 16:38:25 -0700 (PDT) Received: from mail-ie0-x243.google.com (mail-ie0-x243.google.com. 
	[2607:f8b0:4001:c03::243])
	by mx.google.com with ESMTPS id pg9si1906584icb.5.2015.06.11.16.38.24
	for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Thu, 11 Jun 2015 16:38:24 -0700 (PDT)
Received: by iebtr6 with SMTP id tr6so6082561ieb.1
	for ; Thu, 11 Jun 2015 16:38:24 -0700 (PDT)
Message-ID: <1434065902.27504.64.camel@edumazet-glaptop2.roam.corp.google.com>
Subject: Re: [RFC v2] net: use atomic allocation for order-3 page allocation
From: Eric Dumazet
Date: Thu, 11 Jun 2015 16:38:22 -0700
In-Reply-To: <20150611233235.GA667489@devbig257.prn2.facebook.com>
References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com>
	<1434063184.27504.60.camel@edumazet-glaptop2.roam.corp.google.com>
	<20150611233235.GA667489@devbig257.prn2.facebook.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Shaohua Li
Cc: netdev@vger.kernel.org, davem@davemloft.net, Kernel-team@fb.com,
	clm@fb.com, linux-mm@kvack.org, dbavatar@gmail.com, Eric Dumazet

On Thu, 2015-06-11 at 16:32 -0700, Shaohua Li wrote:
>
> Ok, looks similar, added. Didn't trigger this one though.

Probably because you do not use af_unix with big enough messages.

> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 3cfff2a..9856c7a 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4398,7 +4398,9 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
>
>  	while (order) {
>  		if (npages >= 1 << order) {
> -			page = alloc_pages(gfp_mask |

Here, order is > 0 (Look at while (order) right above)

> +			gfp_t gfp = order > 0 ?
> +				gfp_mask & ~__GFP_WAIT : gfp_mask;
> +			page = alloc_pages(gfp |
>  					   __GFP_COMP |
>  					   __GFP_NOWARN |

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
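
The review comment above (that `order > 0` always holds inside `while (order)`) can be checked with a small user-space sketch. This is not kernel code: SKETCH_GFP_WAIT and its bit value, and all sketch_* helpers, are hypothetical names introduced only for this illustration. It compares the ternary mask selection from the quoted hunk with the unconditional form for every order the loop can visit.

```c
#include <assert.h>

#define SKETCH_GFP_WAIT 0x1u  /* hypothetical bit value, not the kernel's */

/* Mask selection as written in the quoted hunk. */
static unsigned int sketch_mask_ternary(unsigned int gfp_mask, int order)
{
	return order > 0 ? (gfp_mask & ~SKETCH_GFP_WAIT) : gfp_mask;
}

/* Simplified form, valid only where order is known to be non-zero. */
static unsigned int sketch_mask_plain(unsigned int gfp_mask)
{
	return gfp_mask & ~SKETCH_GFP_WAIT;
}

/*
 * Walks order from max_order down to 1, as a `while (order)` loop does,
 * and counts the iterations in which the two forms disagree.
 */
static int sketch_count_mismatches(unsigned int gfp_mask, int max_order)
{
	int order = max_order;
	int mismatches = 0;

	assert(max_order >= 0);
	while (order) {
		if (sketch_mask_ternary(gfp_mask, order) !=
		    sketch_mask_plain(gfp_mask))
			mismatches++;
		order--;
	}
	return mismatches;
}
```

Because the loop body never runs with order equal to zero, the mismatch count stays at zero for any starting order, which is why the conditional in the hunk could be dropped in favour of always clearing the wait bit at that call site.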
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wi0-f181.google.com (mail-wi0-f181.google.com [209.85.212.181])
	by kanga.kvack.org (Postfix) with ESMTP id 0A5D56B0032
	for ; Fri, 12 Jun 2015 05:25:36 -0400 (EDT)
Received: by wiwd19 with SMTP id d19so12624249wiw.0
	for ; Fri, 12 Jun 2015 02:25:35 -0700 (PDT)
Received: from mx2.suse.de (cantor2.suse.de. [195.135.220.15])
	by mx.google.com with ESMTPS id y11si2316033wiv.114.2015.06.12.02.25.33
	for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
	Fri, 12 Jun 2015 02:25:33 -0700 (PDT)
Message-ID: <557AA58A.2060207@suse.cz>
Date: Fri, 12 Jun 2015 11:25:30 +0200
From: Vlastimil Babka
MIME-Version: 1.0
Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation
References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com>
	<1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com>
	<5579FABE.4050505@fb.com>
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Debabrata Banerjee , Chris Mason
Cc: Eric Dumazet , Shaohua Li , "netdev@vger.kernel.org" ,
	"davem@davemloft.net" , Kernel-team@fb.com, Eric Dumazet ,
	David Rientjes , linux-mm@kvack.org, Joshua Hunt ,
	"Banerjee, Debabrata"

On 06/11/2015 11:35 PM, Debabrata Banerjee wrote:
> There is no "background" it doesn't matter if this activity happens
> synchronously or asynchronously, unless you're sensitive to the
> latency on that single operation. If you're driving all your cpu's and
> memory hard then this is work that still takes resources. If there's a
> kernel thread with compaction running, then obviously your process is
> not.

Well that of course depends on the CPU utilization of "your process".

> Your patch should help in that not every atomic allocation failure
> should mean yet another run at compaction/reclaim.

If you don't want to wake up kswapd, add also __GFP_NO_KSWAPD flag.
Additionally, gfp_to_alloc_flags() will stop treating such allocation as
atomic - it allows atomic allocations to bypass cpusets and lowers the
watermark by 1/4 (unless there's also __GFP_NOMEMALLOC). It might
actually make sense to add __GFP_NO_KSWAPD for an allocation like this
one that has a simple order-0 fallback.

Vlastimil

> -Deb
>
> On Thu, Jun 11, 2015 at 5:16 PM, Chris Mason wrote:
>
>> networking is asking for 32KB, and the MM layer is doing what it can to
>> provide it. Are the gains from getting 32KB contig bigger than the cost
>> of moving pages around if the MM has to actually go into compaction?
>> Should we start disk IO to give back 32KB contig?
>>
>> I think we want to tell the MM to compact in the background and give
>> networking 32KB if it happens to have it available. If not, fall back
>> to smaller allocations without doing anything expensive.
>>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wi0-f179.google.com (mail-wi0-f179.google.com [209.85.212.179])
	by kanga.kvack.org (Postfix) with ESMTP id 0867C6B0032
	for ; Fri, 12 Jun 2015 05:34:21 -0400 (EDT)
Received: by wibut5 with SMTP id ut5so12815419wib.1
	for ; Fri, 12 Jun 2015 02:34:20 -0700 (PDT)
Received: from mx2.suse.de (cantor2.suse.de.
[195.135.220.15]) by mx.google.com with ESMTPS id ga2si6110399wjb.135.2015.06.12.02.34.18 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 12 Jun 2015 02:34:19 -0700 (PDT) Message-ID: <557AA799.8000306@suse.cz> Date: Fri, 12 Jun 2015 11:34:17 +0200 From: Vlastimil Babka MIME-Version: 1.0 Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Debabrata Banerjee , Eric Dumazet Cc: Shaohua Li , "netdev@vger.kernel.org" , "davem@davemloft.net" , Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org, "Banerjee, Debabrata" , Joshua Hunt On 06/11/2015 11:28 PM, Debabrata Banerjee wrote: > Resend in plaintext, thanks gmail: > > It's somewhat an intractable problem to know if compaction will succeed > without trying it, There are heuristics, but those cannot be perfect by definition. I think the worse problem here is the extra latency, even if it does succeed, though. > and you can certainly end up in a state where memory is > heavily fragmented, even with compaction running. You can't compact kernel > pages for example, so you can end up in a state where compaction does > nothing through no fault of it's own. Correct. > In this case you waste time in compaction routines, then end up reclaiming > precious page cache pages or swapping out for whatever it is your machine > was doing trying to do to satisfy these order-3 allocations, after which all > those pages need to be restored from disk almost immediately. This is not a > happy server. That sounds like an overloaded server to me. > Any mm fix may be years away. Well, what kind of "fix"? 
There's no way to always avoid fragmentation without some kind of an oracle that will tell you which unmovable allocations (e.g. kernel pages) to put side by side because they will be freed at the same time. > The only simple solution I can > think of is specifically caching these allocations, in any other case under > memory pressure they will be split by other smaller allocations. In this case the allocations have simple fallback to order-0, so caching them would make sense only if someone shows that the benefits of having order-3 instead of order-0 them are worth it. > We've been forcing these allocations to order-0 internally until we can > think of something else. I think the proposed patch is better than forcing everything to order-0. It makes the attempt to allocate order-3 cheap. The VM should generally serve you better if it's told your requirements. Communicating that the order-3 allocation is just an opportunistic attempt with simple fallback is the right way. > -Deb > > >> On Thu, Jun 11, 2015 at 4:48 PM, Eric Dumazet >> wrote: >>> >>> On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote: >>>> We saw excessive memory compaction triggered by skb_page_frag_refill. >>>> This causes performance issues. Commit 5640f7685831e0 introduces the >>>> order-3 allocation to improve performance. But memory compaction has >>>> high overhead. The benefit of order-3 allocation can't compensate the >>>> overhead of memory compaction. >>>> >>>> This patch makes the order-3 page allocation atomic. If there is no >>>> memory pressure and memory isn't fragmented, the alloction will still >>>> success, so we don't sacrifice the order-3 benefit here. If the atomic >>>> allocation fails, compaction will not be triggered and we will fallback >>>> to order-0 immediately. >>>> >>>> The mellanox driver does similar thing, if this is accepted, we must fix >>>> the driver too. 
>>>> >>>> Cc: Eric Dumazet >>>> Signed-off-by: Shaohua Li >>>> --- >>>> net/core/sock.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/net/core/sock.c b/net/core/sock.c >>>> index 292f422..e9855a4 100644 >>>> --- a/net/core/sock.c >>>> +++ b/net/core/sock.c >>>> @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct >>>> page_frag *pfrag, gfp_t gfp) >>>> >>>> pfrag->offset = 0; >>>> if (SKB_FRAG_PAGE_ORDER) { >>>> - pfrag->page = alloc_pages(gfp | __GFP_COMP | >>>> + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP >>>> | >>>> __GFP_NOWARN | __GFP_NORETRY, >>>> SKB_FRAG_PAGE_ORDER); >>>> if (likely(pfrag->page)) { >>> >>> This is not a specific networking issue, but mm one. >>> >>> You really need to start a discussion with mm experts. >>> >>> Your changelog does not exactly explains what _is_ the problem. >>> >>> If the problem lies in mm layer, it might be time to fix it, instead of >>> work around the bug by never triggering it from this particular point, >>> which is a safe point where a process is willing to wait a bit. >>> >>> Memory compaction is either working as intending, or not. >>> >>> If we enabled it but never run it because it hurts, what is the point >>> enabling it ? >>> >>> >>> >>> -- >>> To unsubscribe, send a message with 'unsubscribe linux-mm' in >>> the body to majordomo@kvack.org. For more info on Linux MM, >>> see: http://www.linux-mm.org/ . >>> Don't email: email@kvack.org >> >> > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
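
The pattern debated in this thread (try the order-3 page without blocking, then fall back to order-0) can be sketched in user space as follows. This is a toy model under stated assumptions, not the kernel allocator: every SKETCH_* flag and its bit value is hypothetical, sketch_alloc_pages() is a stand-in that assumes order-0 always succeeds and that blocking compaction always yields a contiguous block, and background work by kswapd is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; they are NOT the kernel's values. */
#define SKETCH_GFP_WAIT    0x1u  /* caller may block in direct compaction */
#define SKETCH_GFP_COMP    0x2u
#define SKETCH_GFP_NOWARN  0x4u
#define SKETCH_GFP_NORETRY 0x8u

#define SKETCH_FRAG_ORDER 3      /* 8 pages, i.e. 32KB with 4KB pages */

struct sketch_outcome {
	int order;               /* order of the page that was obtained */
	bool direct_compaction;  /* did the caller pay for compaction itself? */
};

/*
 * Toy allocator. Assumptions: order-0 always succeeds, and a caller that
 * is allowed to wait always gets its block after direct compaction.
 */
static bool sketch_alloc_pages(unsigned int flags, int order,
			       bool contig_free, bool *direct_compaction)
{
	assert(order >= 0);
	if (order == 0 || contig_free)
		return true;
	if (flags & SKETCH_GFP_WAIT) {
		*direct_compaction = true;  /* expensive slow path */
		return true;
	}
	return false;  /* non-blocking caller: fail fast */
}

/* clear_wait selects the behaviour proposed by the patch. */
static struct sketch_outcome sketch_frag_refill(unsigned int gfp,
						bool contig_free,
						bool clear_wait)
{
	struct sketch_outcome out = { 0, false };
	unsigned int high = clear_wait ? (gfp & ~SKETCH_GFP_WAIT) : gfp;

	high |= SKETCH_GFP_COMP | SKETCH_GFP_NOWARN | SKETCH_GFP_NORETRY;
	if (sketch_alloc_pages(high, SKETCH_FRAG_ORDER, contig_free,
			       &out.direct_compaction)) {
		out.order = SKETCH_FRAG_ORDER;
		return out;
	}
	/* The fallback keeps the caller's original flags, so it may wait. */
	sketch_alloc_pages(gfp, 0, contig_free, &out.direct_compaction);
	out.order = 0;
	return out;
}
```

With fragmented memory (contig_free false) and the wait bit cleared, the sketch returns an order-0 page and never enters the compaction path; with the wait bit kept it returns order-3 but records that the caller paid for compaction. When a contiguous block is already free, both variants return order-3 at no extra cost, which is the case the changelog describes as not sacrificing the order-3 benefit.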
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Shaohua Li Subject: [RFC] net: use atomic allocation for order-3 page allocation Date: Thu, 11 Jun 2015 13:24:31 -0700 Message-ID: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> Mime-Version: 1.0 Content-Type: text/plain Cc: , , Eric Dumazet To: Return-path: Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:56037 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752046AbbFKUYf (ORCPT ); Thu, 11 Jun 2015 16:24:35 -0400 Received: from pps.filterd (m0004077 [127.0.0.1]) by mx0b-00082601.pphosted.com (8.14.5/8.14.5) with SMTP id t5BKMeXC010897 for ; Thu, 11 Jun 2015 13:24:34 -0700 Received: from mail.thefacebook.com ([199.201.64.23]) by mx0b-00082601.pphosted.com with ESMTP id 1uyddk0vq0-1 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NOT) for ; Thu, 11 Jun 2015 13:24:34 -0700 Received: from facebook.com (2401:db00:20:7003:face:0:4d:0) by mx-out.facebook.com (10.212.232.63) with ESMTP id de3d87f2107711e5b6730002c992ebde-30ff92a0 for ; Thu, 11 Jun 2015 13:24:32 -0700 Sender: netdev-owner@vger.kernel.org List-ID: We saw excessive memory compaction triggered by skb_page_frag_refill. This causes performance issues. Commit 5640f7685831e0 introduces the order-3 allocation to improve performance. But memory compaction has high overhead. The benefit of order-3 allocation can't compensate the overhead of memory compaction. This patch makes the order-3 page allocation atomic. If there is no memory pressure and memory isn't fragmented, the alloction will still success, so we don't sacrifice the order-3 benefit here. If the atomic allocation fails, compaction will not be triggered and we will fallback to order-0 immediately. The mellanox driver does similar thing, if this is accepted, we must fix the driver too. 
Cc: Eric Dumazet Signed-off-by: Shaohua Li --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/sock.c b/net/core/sock.c index 292f422..e9855a4 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) pfrag->offset = 0; if (SKB_FRAG_PAGE_ORDER) { - pfrag->page = alloc_pages(gfp | __GFP_COMP | + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY, SKB_FRAG_PAGE_ORDER); if (likely(pfrag->page)) { -- 1.8.1 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Dumazet Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation Date: Thu, 11 Jun 2015 13:48:07 -0700 Message-ID: <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit Cc: netdev@vger.kernel.org, davem@davemloft.net, Kernel-team@fb.com, Eric Dumazet , David Rientjes , linux-mm@kvack.org To: Shaohua Li Return-path: Received: from mail-ie0-f193.google.com ([209.85.223.193]:36001 "EHLO mail-ie0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755009AbbFKUsK (ORCPT ); Thu, 11 Jun 2015 16:48:10 -0400 Received: by ierx19 with SMTP id x19so5101042ier.3 for ; Thu, 11 Jun 2015 13:48:09 -0700 (PDT) In-Reply-To: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> Sender: netdev-owner@vger.kernel.org List-ID: On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote: > We saw excessive memory compaction triggered by skb_page_frag_refill. > This causes performance issues. Commit 5640f7685831e0 introduces the > order-3 allocation to improve performance. But memory compaction has > high overhead. The benefit of order-3 allocation can't compensate the > overhead of memory compaction. 
> > This patch makes the order-3 page allocation atomic. If there is no > memory pressure and memory isn't fragmented, the alloction will still > success, so we don't sacrifice the order-3 benefit here. If the atomic > allocation fails, compaction will not be triggered and we will fallback > to order-0 immediately. > > The mellanox driver does similar thing, if this is accepted, we must fix > the driver too. > > Cc: Eric Dumazet > Signed-off-by: Shaohua Li > --- > net/core/sock.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/core/sock.c b/net/core/sock.c > index 292f422..e9855a4 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) > > pfrag->offset = 0; > if (SKB_FRAG_PAGE_ORDER) { > - pfrag->page = alloc_pages(gfp | __GFP_COMP | > + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP | > __GFP_NOWARN | __GFP_NORETRY, > SKB_FRAG_PAGE_ORDER); > if (likely(pfrag->page)) { This is not a specific networking issue, but mm one. You really need to start a discussion with mm experts. Your changelog does not exactly explains what _is_ the problem. If the problem lies in mm layer, it might be time to fix it, instead of work around the bug by never triggering it from this particular point, which is a safe point where a process is willing to wait a bit. Memory compaction is either working as intending, or not. If we enabled it but never run it because it hurts, what is the point enabling it ? 
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chris Mason Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation Date: Thu, 11 Jun 2015 17:16:46 -0400 Message-ID: <5579FABE.4050505@fb.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Cc: , , , Eric Dumazet , David Rientjes , To: Eric Dumazet , Shaohua Li Return-path: Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:8195 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751385AbbFKVQ4 (ORCPT ); Thu, 11 Jun 2015 17:16:56 -0400 In-Reply-To: <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> Sender: netdev-owner@vger.kernel.org List-ID: On 06/11/2015 04:48 PM, Eric Dumazet wrote: > On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote: >> We saw excessive memory compaction triggered by skb_page_frag_refill. >> This causes performance issues. Commit 5640f7685831e0 introduces the >> order-3 allocation to improve performance. But memory compaction has >> high overhead. The benefit of order-3 allocation can't compensate the >> overhead of memory compaction. >> >> This patch makes the order-3 page allocation atomic. If there is no >> memory pressure and memory isn't fragmented, the alloction will still >> success, so we don't sacrifice the order-3 benefit here. If the atomic >> allocation fails, compaction will not be triggered and we will fallback >> to order-0 immediately. >> >> The mellanox driver does similar thing, if this is accepted, we must fix >> the driver too. 
>> >> Cc: Eric Dumazet >> Signed-off-by: Shaohua Li >> --- >> net/core/sock.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/net/core/sock.c b/net/core/sock.c >> index 292f422..e9855a4 100644 >> --- a/net/core/sock.c >> +++ b/net/core/sock.c >> @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) >> >> pfrag->offset = 0; >> if (SKB_FRAG_PAGE_ORDER) { >> - pfrag->page = alloc_pages(gfp | __GFP_COMP | >> + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP | >> __GFP_NOWARN | __GFP_NORETRY, >> SKB_FRAG_PAGE_ORDER); >> if (likely(pfrag->page)) { > > This is not a specific networking issue, but mm one. > > You really need to start a discussion with mm experts. > > Your changelog does not exactly explains what _is_ the problem. > > If the problem lies in mm layer, it might be time to fix it, instead of > work around the bug by never triggering it from this particular point, > which is a safe point where a process is willing to wait a bit. > > Memory compaction is either working as intending, or not. > > If we enabled it but never run it because it hurts, what is the point > enabling it ? networking is asking for 32KB, and the MM layer is doing what it can to provide it. Are the gains from getting 32KB contig bigger than the cost of moving pages around if the MM has to actually go into compaction? Should we start disk IO to give back 32KB contig? I think we want to tell the MM to compact in the background and give networking 32KB if it happens to have it available. If not, fall back to smaller allocations without doing anything expensive. 
-chris From mboxrd@z Thu Jan 1 00:00:00 1970 From: Shaohua Li Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation Date: Thu, 11 Jun 2015 14:45:25 -0700 Message-ID: <20150611214525.GA406740@devbig257.prn2.facebook.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> <5579FABE.4050505@fb.com> <1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Cc: Chris Mason , , , , Eric Dumazet , David Rientjes , To: Eric Dumazet Return-path: Content-Disposition: inline In-Reply-To: <1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com> Sender: owner-linux-mm@kvack.org List-Id: netdev.vger.kernel.org On Thu, Jun 11, 2015 at 02:22:13PM -0700, Eric Dumazet wrote: > On Thu, 2015-06-11 at 17:16 -0400, Chris Mason wrote: > > On 06/11/2015 04:48 PM, Eric Dumazet wrote: > > > On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote: > > >> We saw excessive memory compaction triggered by skb_page_frag_refill. > > >> This causes performance issues. Commit 5640f7685831e0 introduces the > > >> order-3 allocation to improve performance. But memory compaction has > > >> high overhead. The benefit of order-3 allocation can't compensate the > > >> overhead of memory compaction. > > >> > > >> This patch makes the order-3 page allocation atomic. If there is no > > >> memory pressure and memory isn't fragmented, the alloction will still > > >> success, so we don't sacrifice the order-3 benefit here. If the atomic > > >> allocation fails, compaction will not be triggered and we will fallback > > >> to order-0 immediately. > > >> > > >> The mellanox driver does similar thing, if this is accepted, we must fix > > >> the driver too. 
> > >> > > >> Cc: Eric Dumazet > > >> Signed-off-by: Shaohua Li > > >> --- > > >> net/core/sock.c | 2 +- > > >> 1 file changed, 1 insertion(+), 1 deletion(-) > > >> > > >> diff --git a/net/core/sock.c b/net/core/sock.c > > >> index 292f422..e9855a4 100644 > > >> --- a/net/core/sock.c > > >> +++ b/net/core/sock.c > > >> @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) > > >> > > >> pfrag->offset = 0; > > >> if (SKB_FRAG_PAGE_ORDER) { > > >> - pfrag->page = alloc_pages(gfp | __GFP_COMP | > > >> + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP | > > >> __GFP_NOWARN | __GFP_NORETRY, > > >> SKB_FRAG_PAGE_ORDER); > > >> if (likely(pfrag->page)) { > > > > > > This is not a specific networking issue, but mm one. > > > > > > You really need to start a discussion with mm experts. > > > > > > Your changelog does not exactly explains what _is_ the problem. > > > > > > If the problem lies in mm layer, it might be time to fix it, instead of > > > work around the bug by never triggering it from this particular point, > > > which is a safe point where a process is willing to wait a bit. > > > > > > Memory compaction is either working as intending, or not. > > > > > > If we enabled it but never run it because it hurts, what is the point > > > enabling it ? > > > > networking is asking for 32KB, and the MM layer is doing what it can to > > provide it. Are the gains from getting 32KB contig bigger than the cost > > of moving pages around if the MM has to actually go into compaction? > > Should we start disk IO to give back 32KB contig? > > > > I think we want to tell the MM to compact in the background and give > > networking 32KB if it happens to have it available. If not, fall back > > to smaller allocations without doing anything expensive. > > Exactly my point. (And I mentioned this about 4 months ago) This is exactly what the patch try to do. 
An atomic 32k allocation will fail under memory pressure; kswapd is woken up to do compaction and we fall back to 4k. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Shaohua Li Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation Date: Thu, 11 Jun 2015 15:01:15 -0700 Message-ID: <20150611220115.GA448912@devbig257.prn2.facebook.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> <5579FABE.4050505@fb.com> <1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com> <20150611214525.GA406740@devbig257.prn2.facebook.com> <1434059786.27504.58.camel@edumazet-glaptop2.roam.corp.google.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Cc: Chris Mason , , , , Eric Dumazet , David Rientjes , To: Eric Dumazet Return-path: Content-Disposition: inline In-Reply-To: <1434059786.27504.58.camel@edumazet-glaptop2.roam.corp.google.com> Sender: owner-linux-mm@kvack.org List-Id: netdev.vger.kernel.org On Thu, Jun 11, 2015 at 02:56:26PM -0700, Eric Dumazet wrote: > On Thu, 2015-06-11 at 14:45 -0700, Shaohua Li wrote: > > > This is exactly what the patch try to do. Atomic 32k allocation will > > fail with memory pressure, kswapd is waken up to do compaction and we > > fallback to 4k. > > Read your changelog, then read what you just wrote. > > Your changelog said : > > 'compaction will not be triggered and we will fallback to order-0 > immediately.' > > Now you tell me that compaction is started. > > What is the truth ? > > Please make sure changelog is precise, this would avoid many mails. Ah, ok. I mean direct compaction isn't triggered; kswapd is still woken up to do compaction. I'll update the changelog.
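The distinction drawn in the reply above (no direct compaction by the caller, but kswapd still woken) can be sketched as a toy userspace model. This is only an illustration under stated assumptions, not kernel code: SKETCH_GFP_WAIT, struct sketch_mm, sketch_alloc_order3() and sketch_refill_order() are hypothetical names, the flag value is made up, and the model simply assumes that synchronous compaction always succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for __GFP_WAIT; the value is illustrative only. */
#define SKETCH_GFP_WAIT 0x1u

/* Toy allocator state, just enough to count who did the expensive work. */
struct sketch_mm {
    bool order3_free;                /* is a contiguous 32KB block free right now? */
    unsigned int direct_compactions; /* compaction run synchronously by the caller */
    unsigned int kswapd_wakeups;     /* background work requested */
};

/* Try to get one order-3 block; returns true on success. */
static bool sketch_alloc_order3(struct sketch_mm *mm, unsigned int gfp)
{
    assert(mm != NULL);

    if (mm->order3_free) {           /* fast path: nothing expensive happens */
        mm->order3_free = false;
        return true;
    }

    mm->kswapd_wakeups++;            /* slow path: background thread is woken */

    if (gfp & SKETCH_GFP_WAIT) {     /* caller may sleep: compact synchronously */
        mm->direct_compactions++;
        return true;                 /* toy assumption: compaction succeeded */
    }

    return false;                    /* non-blocking caller gets a quick failure */
}

/*
 * Models the patched refill path: the order-3 attempt never waits, and a
 * failure means the caller uses order-0. Returns the order obtained.
 */
static unsigned int sketch_refill_order(struct sketch_mm *mm, unsigned int gfp)
{
    return sketch_alloc_order3(mm, gfp & ~SKETCH_GFP_WAIT) ? 3u : 0u;
}
```

In this model a fragmented system makes sketch_refill_order() return 0 with one kswapd wakeup and zero direct compactions, which is the behaviour the corrected changelog is meant to describe.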
From mboxrd@z Thu Jan 1 00:00:00 1970 From: Chris Mason Subject: Re: [RFC] net: use atomic allocation for order-3 page allocation Date: Thu, 11 Jun 2015 18:18:49 -0400 Message-ID: <557A0949.3020705@fb.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434055687.27504.51.camel@edumazet-glaptop2.roam.corp.google.com> <5579FABE.4050505@fb.com> <1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Cc: Shaohua Li , , , , Eric Dumazet , David Rientjes , To: Eric Dumazet Return-path: In-Reply-To: <1434057733.27504.52.camel@edumazet-glaptop2.roam.corp.google.com> Sender: owner-linux-mm@kvack.org List-Id: netdev.vger.kernel.org On 06/11/2015 05:22 PM, Eric Dumazet wrote: > On Thu, 2015-06-11 at 17:16 -0400, Chris Mason wrote: >> On 06/11/2015 04:48 PM, Eric Dumazet wrote: >> >> networking is asking for 32KB, and the MM layer is doing what it can to >> provide it. Are the gains from getting 32KB contig bigger than the cost >> of moving pages around if the MM has to actually go into compaction? >> Should we start disk IO to give back 32KB contig? >> >> I think we want to tell the MM to compact in the background and give >> networking 32KB if it happens to have it available. If not, fall back >> to smaller allocations without doing anything expensive. > > Exactly my point. (And I mentioned this about 4 months ago) Sorry, reading this again I wasn't very clear. I agree with Shaohua's patch because it is telling the allocator that we don't want to wait for reclaim or compaction to find contiguous pages. But, is there any fallback to a single page allocation somewhere else?
If this is the only way to get memory, we might want to add a single alloc_page path that won't trigger compaction but is at least able to wait for kswapd to make progress. -chris From mboxrd@z Thu Jan 1 00:00:00 1970 From: Shaohua Li Subject: Re: [RFC v2] net: use atomic allocation for order-3 page allocation Date: Thu, 11 Jun 2015 16:32:35 -0700 Message-ID: <20150611233235.GA667489@devbig257.prn2.facebook.com> References: <71a20cf185c485fa23d9347bd846a6f4e9753405.1434053941.git.shli@fb.com> <1434063184.27504.60.camel@edumazet-glaptop2.roam.corp.google.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Cc: , , , , , , Eric Dumazet To: Eric Dumazet Return-path: Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:46634 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752257AbbFKXcr (ORCPT ); Thu, 11 Jun 2015 19:32:47 -0400 Content-Disposition: inline In-Reply-To: <1434063184.27504.60.camel@edumazet-glaptop2.roam.corp.google.com> Sender: netdev-owner@vger.kernel.org List-ID: On Thu, Jun 11, 2015 at 03:53:04PM -0700, Eric Dumazet wrote: > On Thu, 2015-06-11 at 15:27 -0700, Shaohua Li wrote: > > We saw excessive direct memory compaction triggered by skb_page_frag_refill. > > This causes performance issues and add latency. Commit 5640f7685831e0 > > introduces the order-3 allocation. According to the changelog, the order-3 > > allocation isn't a must-have but to improve performance. But direct memory > > compaction has high overhead. The benefit of order-3 allocation can't > > compensate the overhead of direct memory compaction. > > > > This patch makes the order-3 page allocation atomic.
If there is no memory > > pressure and memory isn't fragmented, the alloction will still success, so we > > don't sacrifice the order-3 benefit here. If the atomic allocation fails, > > direct memory compaction will not be triggered, skb_page_frag_refill will > > fallback to order-0 immediately, hence the direct memory compaction overhead is > > avoided. In the allocation failure case, kswapd is waken up and doing > > compaction, so chances are allocation could success next time. > > > > The mellanox driver does similar thing, if this is accepted, we must fix > > the driver too. > > > > V2: make the changelog clearer > > > > Cc: Eric Dumazet > > Signed-off-by: Shaohua Li > > --- > > net/core/sock.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > index 292f422..e9855a4 100644 > > --- a/net/core/sock.c > > +++ b/net/core/sock.c > > @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) > > > > pfrag->offset = 0; > > if (SKB_FRAG_PAGE_ORDER) { > > - pfrag->page = alloc_pages(gfp | __GFP_COMP | > > + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP | > > __GFP_NOWARN | __GFP_NORETRY, > > SKB_FRAG_PAGE_ORDER); > > if (likely(pfrag->page)) { > > > OK, now what about alloc_skb_with_frags() ? > > This should have same problem right ? Ok, looks similar, added. Didn't trigger this one though. From 940dde18f7f655377a4c30d5de54c9eff15ab5a5 Mon Sep 17 00:00:00 2001 Message-Id: <940dde18f7f655377a4c30d5de54c9eff15ab5a5.1434065353.git.shli@fb.com> From: Shaohua Li Date: Thu, 11 Jun 2015 16:16:21 -0700 Subject: [RFC] net: use atomic allocation for order-3 page allocation We saw excessive direct memory compaction triggered by skb_page_frag_refill. This causes performance issues and adds latency. Commit 5640f7685831e0 introduces the order-3 allocation. According to the changelog, the order-3 allocation isn't a must-have but is meant to improve performance.
But direct memory compaction has high overhead. The benefit of order-3 allocation can't compensate for the overhead of direct memory compaction. This patch makes the order-3 page allocation atomic. If there is no memory pressure and memory isn't fragmented, the allocation will still succeed, so we don't sacrifice the order-3 benefit here. If the atomic allocation fails, direct memory compaction will not be triggered and skb_page_frag_refill will fall back to order-0 immediately, hence the direct memory compaction overhead is avoided. In the allocation failure case, kswapd is woken up to do compaction, so chances are the allocation could succeed next time. alloc_skb_with_frags is the same. The mellanox driver does a similar thing; if this is accepted, we must fix the driver too. V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric V2: make the changelog clearer Cc: Eric Dumazet Cc: Chris Mason Cc: Debabrata Banerjee Signed-off-by: Shaohua Li --- net/core/skbuff.c | 4 +++- net/core/sock.c | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 3cfff2a..9856c7a 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4398,7 +4398,9 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len, while (order) { if (npages >= 1 << order) { - page = alloc_pages(gfp_mask | + gfp_t gfp = order > 0 ? + gfp_mask & ~__GFP_WAIT : gfp_mask; + page = alloc_pages(gfp | __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY, diff --git a/net/core/sock.c b/net/core/sock.c index 292f422..e9855a4 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) pfrag->offset = 0; if (SKB_FRAG_PAGE_ORDER) { - pfrag->page = alloc_pages(gfp | __GFP_COMP | + pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY, SKB_FRAG_PAGE_ORDER); if (likely(pfrag->page)) { -- 1.8.1
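The flag arithmetic in the two hunks above can be tried outside the kernel with a small standalone sketch. Everything named SKETCH_* or sketch_* below is hypothetical, the bit values are invented for illustration (the real definitions live in the kernel's gfp header), and the 4KB base page size is an assumption taken from the "32KB" and "4k" figures used in this thread.

```c
#include <assert.h>

typedef unsigned int sketch_gfp_t;

/* Made-up bit values standing in for the kernel's gfp flags. */
#define SKETCH_GFP_WAIT    0x01u  /* caller may sleep (direct reclaim/compaction) */
#define SKETCH_GFP_IO      0x02u  /* caller may start I/O */
#define SKETCH_GFP_COMP    0x04u  /* compound page */
#define SKETCH_GFP_NOWARN  0x08u  /* no failure warning */
#define SKETCH_GFP_NORETRY 0x10u  /* give up early instead of retrying */
#define SKETCH_GFP_KERNEL  (SKETCH_GFP_WAIT | SKETCH_GFP_IO)

/* Assumed base page size of 4KB, so order 3 is 32KB. */
#define SKETCH_PAGE_SIZE 4096u

/* Size in bytes of one block of the given order. */
static unsigned int sketch_block_bytes(unsigned int order)
{
    return SKETCH_PAGE_SIZE << order;
}

/*
 * Mask used for an allocation attempt of the given order, mirroring the
 * patch: high-order attempts drop the "may wait" bit and add the
 * opportunistic flags; the order-0 fallback keeps the caller's mask.
 */
static sketch_gfp_t sketch_gfp_for_order(sketch_gfp_t gfp_mask, unsigned int order)
{
    if (order == 0)
        return gfp_mask;

    return (gfp_mask & ~SKETCH_GFP_WAIT) |
           SKETCH_GFP_COMP | SKETCH_GFP_NOWARN | SKETCH_GFP_NORETRY;
}
```

Only the wait bit is cleared for the high-order attempt; the caller's other bits (the I/O bit in this sketch) pass through unchanged, and the order-0 fallback still sees the original mask, so it may block as before.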