* [PATCH] skb: Propagate pfmemalloc on skb from head page only
@ 2013-03-14 13:29 Pavel Emelyanov
2013-03-14 14:16 ` Eric Dumazet
2013-03-14 14:28 ` Mel Gorman
0 siblings, 2 replies; 7+ messages in thread
From: Pavel Emelyanov @ 2013-03-14 13:29 UTC (permalink / raw)
To: David Miller, Mel Gorman, Eric Dumazet, Linux Netdev List, stable
Cc: Alexey Kuznetsov
Hi.
I'm trying to send big chunks of memory from the application's address space
via a TCP socket using vmsplice + splice, like this:
mem = mmap(128Mb);
vmsplice(pipe[1], mem); /* splice memory into pipe */
splice(pipe[0], tcp_socket); /* send it into network */
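A fuller version of the sequence above might look roughly like the sketch
below. The helper name send_via_splice() and its error handling are
assumptions made for illustration, not the original test program; out_fd is
assumed to be a connected TCP socket (any descriptor that splice() can write
to will do).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: push len bytes starting at mem into out_fd by way
 * of a pipe. vmsplice() maps the user pages into the pipe and splice()
 * moves them on to out_fd. Returns 0 on success, -1 on error.
 */
static int send_via_splice(int out_fd, void *mem, size_t len)
{
	int pfd[2];
	char *pos = mem;
	int ret = -1;

	assert(mem != NULL);

	if (pipe(pfd) < 0)
		return -1;

	while (len > 0) {
		struct iovec iov = { .iov_base = pos, .iov_len = len };

		/* Queues as many pages as currently fit into the pipe. */
		ssize_t in_pipe = vmsplice(pfd[1], &iov, 1, 0);
		if (in_pipe <= 0)
			goto out;

		pos += in_pipe;
		len -= (size_t)in_pipe;

		/* Drain everything just queued before refilling the pipe. */
		while (in_pipe > 0) {
			ssize_t sent = splice(pfd[0], NULL, out_fd, NULL,
					      (size_t)in_pipe, SPLICE_F_MOVE);
			if (sent <= 0)
				goto out;
			in_pipe -= sent;
		}
	}
	ret = 0;
out:
	close(pfd[0]);
	close(pfd[1]);
	return ret;
}
```

A caller would mmap() an anonymous region, fill it, and pass it together with
the socket descriptor.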
When I'm lucky and a huge page gets spliced into the pipe and then into the
socket _and_ the client and server ends of the TCP connection are on the same
host, communicating via lo, the whole connection gets stuck! The send queue
fills up and the app stops writing/splicing more into it, but the receive
queue remains empty. Here's why.
__skb_fill_page_desc() sees a tail page of a huge page and erroneously
propagates its page->pfmemalloc value onto the skb (pfmemalloc on tail pages
contains garbage). This skb->pfmemalloc then leaks through lo, and due to the
  tcp_v4_rcv
    sk_filter
      if (skb->pfmemalloc && !sock_flag(sk, SOCK_MEMALLOC)) /* true */
        return -ENOMEM
    goto release_and_discard;
no packets reach the socket. Even TCP re-transmits are dropped by this, as skb
cloning clones the pfmemalloc flag as well.
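The drop can be modelled in userspace with a few lines of C. This is a
simplified sketch, not kernel code: struct model_skb, struct model_sock and
both functions are hypothetical stand-ins that keep only the two flags
involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for struct sk_buff and struct sock (assumptions). */
struct model_skb {
	bool pfmemalloc;	/* the flag propagated from the page */
};

struct model_sock {
	bool sock_memalloc;	/* models sock_flag(sk, SOCK_MEMALLOC) */
};

/* Cloning copies the flag, so a retransmitted clone is marked as well. */
static struct model_skb model_skb_clone(const struct model_skb *skb)
{
	struct model_skb clone = { .pfmemalloc = skb->pfmemalloc };

	return clone;
}

/* Models the check in the call chain: ordinary sockets reject such skbs. */
static int model_sk_filter(const struct model_sock *sk,
			   const struct model_skb *skb)
{
	assert(sk != NULL && skb != NULL);

	if (skb->pfmemalloc && !sk->sock_memalloc)
		return -ENOMEM;
	return 0;
}
```

With a wrongly set flag and an ordinary socket, both the original skb and
every clone made for retransmission are rejected with -ENOMEM.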
That said, here's the proper page->pfmemalloc propagation onto the skb: we
must check the huge page's head page only, because the pfmemalloc and mapping
values of the other pages do not contain what is expected in this place.
However, I'm not sure whether this fix is _complete_, since pfmemalloc
propagation via lo also doesn't look great.
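The head-page rule can be illustrated with a small userspace model. The
struct layout and all names below are hypothetical and much simpler than the
real struct page; the only point is that the check goes through the head page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct page (an assumption, not the kernel layout). */
struct model_page {
	struct model_page *first_page;	/* head page; NULL on a head or order-0 page */
	bool pfmemalloc;		/* only meaningful on the head page */
	void *mapping;			/* only meaningful on the head page */
};

/* Resolve any page of a compound page to its head page. */
static struct model_page *model_compound_head(struct model_page *page)
{
	assert(page != NULL);
	return page->first_page ? page->first_page : page;
}

/* The corrected check: consult the head page, never a tail page. */
static bool model_page_marks_skb_pfmemalloc(struct model_page *page)
{
	struct model_page *head = model_compound_head(page);

	return head->pfmemalloc && !head->mapping;
}
```

A tail page whose own pfmemalloc field holds garbage no longer marks the skb
unless the head page really is a pfmemalloc page.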
Both the bit propagation from page to skb and this check in sk_filter were
introduced by c48a11c7 (netvm: propagate page->pfmemalloc to skb) in v3.5, so
Mel and stable@ are in Cc.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
---
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index eb2106f..4e525eb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1292,11 +1292,13 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
 	 * do not lose pfmemalloc information as the pages would not be
 	 * allocated using __GFP_MEMALLOC.
 	 */
-	if (page->pfmemalloc && !page->mapping)
-		skb->pfmemalloc = true;
 	frag->page.p = page;
 	frag->page_offset = off;
 	skb_frag_size_set(frag, size);
+
+	page = compound_head(page);
+	if (page->pfmemalloc && !page->mapping)
+		skb->pfmemalloc = true;
 }
 
 /**
* Re: [PATCH] skb: Propagate pfmemalloc on skb from head page only
2013-03-14 13:29 [PATCH] skb: Propagate pfmemalloc on skb from head page only Pavel Emelyanov
@ 2013-03-14 14:16 ` Eric Dumazet
2013-03-14 14:23 ` Pavel Emelyanov
2013-03-14 15:54 ` David Miller
2013-03-14 14:28 ` Mel Gorman
1 sibling, 2 replies; 7+ messages in thread
From: Eric Dumazet @ 2013-03-14 14:16 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: David Miller, Mel Gorman, Linux Netdev List, stable,
Alexey Kuznetsov
On Thu, 2013-03-14 at 17:29 +0400, Pavel Emelyanov wrote:
> Hi.
>
> I'm trying to send big chunks of memory from application address space via
> TCP socket using vmsplice + splice like this
>
> mem = mmap(128Mb);
> vmsplice(pipe[1], mem); /* splice memory into pipe */
> splice(pipe[0], tcp_socket); /* send it into network */
>
> When I'm lucky and a huge page splices into the pipe and then into the socket
> _and_ client and server ends of the TCP connection are on the same host,
> communicating via lo, the whole connection gets stuck! The sending queue
> becomes full and app stops writing/splicing more into it, but the receiving
> queue remains empty, and that's why.
>
> The __skb_fill_page_desc observes a tail page of a huge page and erroneously
> propagates its page->pfmemalloc value onto socket (the pfmemalloc on tail pages
> contain garbage). Then this skb->pfmemalloc leaks through lo and due to the
>
> tcp_v4_rcv
> sk_filter
> if (skb->pfmemalloc && !sock_flag(sk, SOCK_MEMALLOC)) /* true */
> return -ENOMEM
> goto release_and_discard;
>
> no packets reach the socket. Even TCP re-transmits are dropped by this, as skb
> cloning clones the pfmemalloc flag as well.
>
> That said, here's the proper page->pfmemalloc propagation onto socket: we
> must check the huge-page's head page only, other pages' pfmemalloc and mapping
> values do not contain what is expected in this place. However, I'm not sure
> whether this fix is _complete_, since pfmemalloc propagation via lo also
> doesn't look great.
>
> Both, bit propagation from page to skb and this check in sk_filter, were
> introduced by c48a11c7 (netvm: propagate page->pfmemalloc to skb), in v3.5 so
> Mel and stable@ are in Cc.
>
> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
>
> ---
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index eb2106f..4e525eb 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1292,11 +1292,13 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
> * do not lose pfmemalloc information as the pages would not be
> * allocated using __GFP_MEMALLOC.
> */
> - if (page->pfmemalloc && !page->mapping)
> - skb->pfmemalloc = true;
> frag->page.p = page;
> frag->page_offset = off;
> skb_frag_size_set(frag, size);
> +
> + page = compound_head(page);
> + if (page->pfmemalloc && !page->mapping)
> + skb->pfmemalloc = true;
> }
>
> /**
> --
This looks like a nice finding.
Note this can trigger even without vmsplice(), on a regular network
receive.
Acked-by: Eric Dumazet <edumazet@google.com>
When I discussed this issue with David, I said that one possibility
would be to accept a pfmemalloc skb on a regular socket if no other packet
is in its receive queue, to get a chance to make progress (and limit memory
consumption to no more than one skb per TCP socket).
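That idea can be sketched as a small userspace model. It only illustrates
the suggestion and is not code from any patch; struct model_rx_sock and
model_accept_skb() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical receive-side view of a socket. */
struct model_rx_sock {
	bool sock_memalloc;		/* models sock_flag(sk, SOCK_MEMALLOC) */
	unsigned int rcv_queue_len;	/* packets already queued for the reader */
};

/* Decide whether an incoming skb may be queued on this socket. */
static bool model_accept_skb(const struct model_rx_sock *sk, bool skb_pfmemalloc)
{
	assert(sk != NULL);

	/* Normal skbs and SOCK_MEMALLOC sockets are unaffected. */
	if (!skb_pfmemalloc || sk->sock_memalloc)
		return true;

	/* Let a single pfmemalloc skb through so the socket can progress. */
	return sk->rcv_queue_len == 0;
}
```

Under this rule a regular socket holds at most one pfmemalloc skb at a time.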
* Re: [PATCH] skb: Propagate pfmemalloc on skb from head page only
2013-03-14 14:16 ` Eric Dumazet
@ 2013-03-14 14:23 ` Pavel Emelyanov
2013-03-14 14:34 ` Eric Dumazet
2013-03-14 15:54 ` David Miller
1 sibling, 1 reply; 7+ messages in thread
From: Pavel Emelyanov @ 2013-03-14 14:23 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, Mel Gorman, Linux Netdev List, stable,
Alexey Kuznetsov
>> That said, here's the proper page->pfmemalloc propagation onto socket: we
>> must check the huge-page's head page only, other pages' pfmemalloc and mapping
>> values do not contain what is expected in this place. However, I'm not sure
>> whether this fix is _complete_, since pfmemalloc propagation via lo also
>> doesn't look great.
>>
>> Both, bit propagation from page to skb and this check in sk_filter, were
>> introduced by c48a11c7 (netvm: propagate page->pfmemalloc to skb), in v3.5 so
>> Mel and stable@ are in Cc.
>>
>> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
>>
>> ---
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index eb2106f..4e525eb 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -1292,11 +1292,13 @@ static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
>> * do not lose pfmemalloc information as the pages would not be
>> * allocated using __GFP_MEMALLOC.
>> */
>> - if (page->pfmemalloc && !page->mapping)
>> - skb->pfmemalloc = true;
>> frag->page.p = page;
>> frag->page_offset = off;
>> skb_frag_size_set(frag, size);
>> +
>> + page = compound_head(page);
>> + if (page->pfmemalloc && !page->mapping)
>> + skb->pfmemalloc = true;
>> }
>>
>> /**
>> --
>
> This looks a nice finding.
>
> Note this can trigger even without vmsplice() use but regular network
> receive.
Presumably you're right, but I don't understand how :( In order to trigger
this, we need a huge page that gets linked to an skb _before_ it enters
the TCP receive path. How can this happen when doing sendmsg/recvmsg?
> Acked-by: Eric Dumazet <edumazet@google.com>
>
> When I discussed with David on this issue, I said that one possibility
> would be to accept a pfmemalloc skb on regular skb if no other packet is
> in a receive queue, to get a chance to make progress (and limit memory
> consumption to no more than one skb per TCP socket)
Thanks,
Pavel
* Re: [PATCH] skb: Propagate pfmemalloc on skb from head page only
2013-03-14 14:23 ` Pavel Emelyanov
@ 2013-03-14 14:34 ` Eric Dumazet
2013-03-14 14:36 ` Pavel Emelyanov
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2013-03-14 14:34 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: David Miller, Mel Gorman, Linux Netdev List, stable,
Alexey Kuznetsov
On Thu, 2013-03-14 at 18:23 +0400, Pavel Emelyanov wrote:
> Presumably you're right, but I don't understand how :( In order to trigger
> this, we should have a huge page, that gets linked to an skb _before_ it
> enters the TCP receive path. How can this happen when doing sendmsg/recvmsg?
Not only huge pages.
The network stack now uses order-3 pages in both the transmit and receive
paths.
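For scale, a quick calculation, assuming 4 KiB base pages and that such an
allocation is a compound page (both are assumptions here, and the helper
names are made up): an order-3 block has one head page and seven tail pages,
so most fragments carved out of it sit on tail pages.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed base page size in bytes */

/* Number of base pages in an order-N allocation: 2^N. */
static unsigned long pages_in_order(unsigned int order)
{
	assert(order < 32);
	return 1UL << order;
}

/* Every page except the first one is a tail page. */
static unsigned long tail_pages_in_order(unsigned int order)
{
	return pages_in_order(order) - 1;
}

/* Total size of the allocation in bytes. */
static unsigned long bytes_in_order(unsigned int order)
{
	return MODEL_PAGE_SIZE * pages_in_order(order);
}
```

For order 3 this gives 8 pages, 7 of them tail pages, and 32768 bytes in
total.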
* Re: [PATCH] skb: Propagate pfmemalloc on skb from head page only
2013-03-14 14:34 ` Eric Dumazet
@ 2013-03-14 14:36 ` Pavel Emelyanov
0 siblings, 0 replies; 7+ messages in thread
From: Pavel Emelyanov @ 2013-03-14 14:36 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Miller, Mel Gorman, Linux Netdev List, stable,
Alexey Kuznetsov
On 03/14/2013 06:34 PM, Eric Dumazet wrote:
> On Thu, 2013-03-14 at 18:23 +0400, Pavel Emelyanov wrote:
>
>> Presumably you're right, but I don't understand how :( In order to trigger
>> this, we should have a huge page, that gets linked to an skb _before_ it
>> enters the TCP receive path. How can this happen when doing sendmsg/recvmsg?
>
> Not only huge pages.
>
> network now uses order-3 pages in both transmit and receive paths.
Ah, indeed! Thanks, Eric.
* Re: [PATCH] skb: Propagate pfmemalloc on skb from head page only
2013-03-14 14:16 ` Eric Dumazet
2013-03-14 14:23 ` Pavel Emelyanov
@ 2013-03-14 15:54 ` David Miller
1 sibling, 0 replies; 7+ messages in thread
From: David Miller @ 2013-03-14 15:54 UTC (permalink / raw)
To: eric.dumazet; +Cc: xemul, mgorman, netdev, stable, kuznet
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 14 Mar 2013 15:16:09 +0100
> On Thu, 2013-03-14 at 17:29 +0400, Pavel Emelyanov wrote:
>> Hi.
>>
>> I'm trying to send big chunks of memory from application address space via
>> TCP socket using vmsplice + splice like this
>>
>> mem = mmap(128Mb);
>> vmsplice(pipe[1], mem); /* splice memory into pipe */
>> splice(pipe[0], tcp_socket); /* send it into network */
>>
>> When I'm lucky and a huge page splices into the pipe and then into the socket
>> _and_ client and server ends of the TCP connection are on the same host,
>> communicating via lo, the whole connection gets stuck! The sending queue
>> becomes full and app stops writing/splicing more into it, but the receiving
>> queue remains empty, and that's why.
>>
>> The __skb_fill_page_desc observes a tail page of a huge page and erroneously
>> propagates its page->pfmemalloc value onto socket (the pfmemalloc on tail pages
>> contain garbage). Then this skb->pfmemalloc leaks through lo and due to the
>>
>> tcp_v4_rcv
>> sk_filter
>> if (skb->pfmemalloc && !sock_flag(sk, SOCK_MEMALLOC)) /* true */
>> return -ENOMEM
>> goto release_and_discard;
>>
>> no packets reach the socket. Even TCP re-transmits are dropped by this, as skb
>> cloning clones the pfmemalloc flag as well.
>>
>> That said, here's the proper page->pfmemalloc propagation onto socket: we
>> must check the huge-page's head page only, other pages' pfmemalloc and mapping
>> values do not contain what is expected in this place. However, I'm not sure
>> whether this fix is _complete_, since pfmemalloc propagation via lo also
>> doesn't look great.
>>
>> Both, bit propagation from page to skb and this check in sk_filter, were
>> introduced by c48a11c7 (netvm: propagate page->pfmemalloc to skb), in v3.5 so
>> Mel and stable@ are in Cc.
>>
>> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
...
> Acked-by: Eric Dumazet <edumazet@google.com>
Applied and queued up for -stable, thanks.
* Re: [PATCH] skb: Propagate pfmemalloc on skb from head page only
2013-03-14 13:29 [PATCH] skb: Propagate pfmemalloc on skb from head page only Pavel Emelyanov
2013-03-14 14:16 ` Eric Dumazet
@ 2013-03-14 14:28 ` Mel Gorman
1 sibling, 0 replies; 7+ messages in thread
From: Mel Gorman @ 2013-03-14 14:28 UTC (permalink / raw)
To: Pavel Emelyanov
Cc: David Miller, Eric Dumazet, Linux Netdev List, stable,
Alexey Kuznetsov
On Thu, Mar 14, 2013 at 05:29:40PM +0400, Pavel Emelyanov wrote:
> Hi.
>
> I'm trying to send big chunks of memory from application address space via
> TCP socket using vmsplice + splice like this
>
> mem = mmap(128Mb);
> vmsplice(pipe[1], mem); /* splice memory into pipe */
> splice(pipe[0], tcp_socket); /* send it into network */
>
> When I'm lucky and a huge page splices into the pipe and then into the socket
> _and_ client and server ends of the TCP connection are on the same host,
> communicating via lo, the whole connection gets stuck! The sending queue
> becomes full and app stops writing/splicing more into it, but the receiving
> queue remains empty, and that's why.
>
> The __skb_fill_page_desc observes a tail page of a huge page and erroneously
> propagates its page->pfmemalloc value onto socket (the pfmemalloc on tail pages
> contain garbage). Then this skb->pfmemalloc leaks through lo and due to the
>
> tcp_v4_rcv
> sk_filter
> if (skb->pfmemalloc && !sock_flag(sk, SOCK_MEMALLOC)) /* true */
> return -ENOMEM
> goto release_and_discard;
>
> no packets reach the socket. Even TCP re-transmits are dropped by this, as skb
> cloning clones the pfmemalloc flag as well.
>
> That said, here's the proper page->pfmemalloc propagation onto socket: we
> must check the huge-page's head page only, other pages' pfmemalloc and mapping
> values do not contain what is expected in this place. However, I'm not sure
> whether this fix is _complete_, since pfmemalloc propagation via lo also
> doesn't look great.
>
> Both, bit propagation from page to skb and this check in sk_filter, were
> introduced by c48a11c7 (netvm: propagate page->pfmemalloc to skb), in v3.5 so
> Mel and stable@ are in Cc.
>
> Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
>
Acked-by: Mel Gorman <mgorman@suse.de>
--
Mel Gorman
SUSE Labs