* [PATCH] phonet: Check input from user before allocating @ 2012-04-02 20:31 Sasha Levin 2012-04-02 19:00 ` Rémi Denis-Courmont 2012-04-02 19:01 ` Rémi Denis-Courmont 0 siblings, 2 replies; 19+ messages in thread From: Sasha Levin @ 2012-04-02 20:31 UTC (permalink / raw) To: remi.denis-courmont, davem; +Cc: davej, netdev, linux-kernel, Sasha Levin A phonet packet is limited to USHRT_MAX bytes, this is never checked during tx which means that the user can specify any size he wishes, and the kernel will attempt to allocate that size. In the good case, it'll lead to the following warning, but it may also cause the kernel to kick in the OOM and kill a random task on the server. [ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730() [ 8921.749770] Pid: 5081, comm: trinity Tainted: G W 3.4.0-rc1-next-20120402-sasha #46 [ 8921.756672] Call Trace: [ 8921.758185] [<ffffffff810b2ba7>] warn_slowpath_common+0x87/0xb0 [ 8921.762868] [<ffffffff810b2be5>] warn_slowpath_null+0x15/0x20 [ 8921.765399] [<ffffffff8117eae5>] __alloc_pages_slowpath+0x65/0x730 [ 8921.769226] [<ffffffff81179c8a>] ? zone_watermark_ok+0x1a/0x20 [ 8921.771686] [<ffffffff8117d045>] ? get_page_from_freelist+0x625/0x660 [ 8921.773919] [<ffffffff8117f3a8>] __alloc_pages_nodemask+0x1f8/0x240 [ 8921.776248] [<ffffffff811c03e0>] kmalloc_large_node+0x70/0xc0 [ 8921.778294] [<ffffffff811c4bd4>] __kmalloc_node_track_caller+0x34/0x1c0 [ 8921.780847] [<ffffffff821b0e3c>] ? sock_alloc_send_pskb+0xbc/0x260 [ 8921.783179] [<ffffffff821b3c65>] __alloc_skb+0x75/0x170 [ 8921.784971] [<ffffffff821b0e3c>] sock_alloc_send_pskb+0xbc/0x260 [ 8921.787111] [<ffffffff821b002e>] ? release_sock+0x7e/0x90 [ 8921.788973] [<ffffffff821b0ff0>] sock_alloc_send_skb+0x10/0x20 [ 8921.791052] [<ffffffff824cfc20>] pep_sendmsg+0x60/0x380 [ 8921.792931] [<ffffffff824cb4a6>] ? pn_socket_bind+0x156/0x180 [ 8921.794917] [<ffffffff824cb50f>] ? pn_socket_autobind+0x3f/0x90 [ 8921.797053] [<ffffffff824cb63f>] pn_socket_sendmsg+0x4f/0x70 [ 8921.798992] [<ffffffff821ab8e7>] sock_aio_write+0x187/0x1b0 [ 8921.801395] [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0 [ 8921.803501] [<ffffffff8111842c>] ? __lock_acquire+0x42c/0x4b0 [ 8921.805505] [<ffffffff821ab760>] ? __sock_recv_ts_and_drops+0x140/0x140 [ 8921.807860] [<ffffffff811e07cc>] do_sync_readv_writev+0xbc/0x110 [ 8921.809986] [<ffffffff811958e7>] ? might_fault+0x97/0xa0 [ 8921.811998] [<ffffffff817bd99e>] ? security_file_permission+0x1e/0x90 [ 8921.814595] [<ffffffff811e17e2>] do_readv_writev+0xe2/0x1e0 [ 8921.816702] [<ffffffff810b8dac>] ? do_setitimer+0x1ac/0x200 [ 8921.818819] [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50 [ 8921.820863] [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0 [ 8921.823318] [<ffffffff811e1926>] vfs_writev+0x46/0x60 [ 8921.825219] [<ffffffff811e1a3f>] sys_writev+0x4f/0xb0 [ 8921.827127] [<ffffffff82658039>] system_call_fastpath+0x16/0x1b [ 8921.829384] ---[ end trace dffe390f30db9eb7 ]--- Signed-off-by: Sasha Levin <levinsasha928@gmail.com> --- net/phonet/pep.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/net/phonet/pep.c b/net/phonet/pep.c index 9f60008..caee99e 100644 --- a/net/phonet/pep.c +++ b/net/phonet/pep.c @@ -1130,6 +1130,9 @@ static int pep_sendmsg(struct kiocb *iocb, struct sock *sk, int flags = msg->msg_flags; int err, done; + if (len > USHRT_MAX) + return -E2BIG; + if ((msg->msg_flags & ~(MSG_DONTWAIT|MSG_EOR|MSG_NOSIGNAL| MSG_CMSG_COMPAT)) || !(msg->msg_flags & MSG_EOR)) -- 1.7.8.5 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-02 20:31 [PATCH] phonet: Check input from user before allocating Sasha Levin @ 2012-04-02 19:00 ` Rémi Denis-Courmont 2012-04-02 21:38 ` David Miller 2012-04-02 19:01 ` Rémi Denis-Courmont 1 sibling, 1 reply; 19+ messages in thread From: Rémi Denis-Courmont @ 2012-04-02 19:00 UTC (permalink / raw) To: Sasha Levin; +Cc: remi.denis-courmont, davem, davej, netdev, linux-kernel Le lundi 2 avril 2012 23:31:00 Sasha Levin, vous avez écrit : > A phonet packet is limited to USHRT_MAX bytes, this is never checked during > tx which means that the user can specify any size he wishes, and the kernel > will attempt to allocate that size. > > In the good case, it'll lead to the following warning, but it may also > cause the kernel to kick in the OOM and kill a random task on the server. > > [ 8921.744094] WARNING: at mm/page_alloc.c:2255 > __alloc_pages_slowpath+0x65/0x730() [ 8921.749770] Pid: 5081, comm: > trinity Tainted: G W 3.4.0-rc1-next-20120402-sasha #46 [ > 8921.756672] Call Trace: > [ 8921.758185] [<ffffffff810b2ba7>] warn_slowpath_common+0x87/0xb0 > [ 8921.762868] [<ffffffff810b2be5>] warn_slowpath_null+0x15/0x20 > [ 8921.765399] [<ffffffff8117eae5>] __alloc_pages_slowpath+0x65/0x730 > [ 8921.769226] [<ffffffff81179c8a>] ? zone_watermark_ok+0x1a/0x20 > [ 8921.771686] [<ffffffff8117d045>] ? get_page_from_freelist+0x625/0x660 > [ 8921.773919] [<ffffffff8117f3a8>] __alloc_pages_nodemask+0x1f8/0x240 > [ 8921.776248] [<ffffffff811c03e0>] kmalloc_large_node+0x70/0xc0 > [ 8921.778294] [<ffffffff811c4bd4>] __kmalloc_node_track_caller+0x34/0x1c0 > [ 8921.780847] [<ffffffff821b0e3c>] ? sock_alloc_send_pskb+0xbc/0x260 > [ 8921.783179] [<ffffffff821b3c65>] __alloc_skb+0x75/0x170 > [ 8921.784971] [<ffffffff821b0e3c>] sock_alloc_send_pskb+0xbc/0x260 > [ 8921.787111] [<ffffffff821b002e>] ? release_sock+0x7e/0x90 > [ 8921.788973] [<ffffffff821b0ff0>] sock_alloc_send_skb+0x10/0x20 > [ 8921.791052] [<ffffffff824cfc20>] pep_sendmsg+0x60/0x380 > [ 8921.792931] [<ffffffff824cb4a6>] ? pn_socket_bind+0x156/0x180 > [ 8921.794917] [<ffffffff824cb50f>] ? pn_socket_autobind+0x3f/0x90 > [ 8921.797053] [<ffffffff824cb63f>] pn_socket_sendmsg+0x4f/0x70 > [ 8921.798992] [<ffffffff821ab8e7>] sock_aio_write+0x187/0x1b0 > [ 8921.801395] [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0 > [ 8921.803501] [<ffffffff8111842c>] ? __lock_acquire+0x42c/0x4b0 > [ 8921.805505] [<ffffffff821ab760>] ? __sock_recv_ts_and_drops+0x140/0x140 > [ 8921.807860] [<ffffffff811e07cc>] do_sync_readv_writev+0xbc/0x110 > [ 8921.809986] [<ffffffff811958e7>] ? might_fault+0x97/0xa0 > [ 8921.811998] [<ffffffff817bd99e>] ? security_file_permission+0x1e/0x90 > [ 8921.814595] [<ffffffff811e17e2>] do_readv_writev+0xe2/0x1e0 > [ 8921.816702] [<ffffffff810b8dac>] ? do_setitimer+0x1ac/0x200 > [ 8921.818819] [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50 > [ 8921.820863] [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0 > [ 8921.823318] [<ffffffff811e1926>] vfs_writev+0x46/0x60 > [ 8921.825219] [<ffffffff811e1a3f>] sys_writev+0x4f/0xb0 > [ 8921.827127] [<ffffffff82658039>] system_call_fastpath+0x16/0x1b > [ 8921.829384] ---[ end trace dffe390f30db9eb7 ]--- > > Signed-off-by: Sasha Levin <levinsasha928@gmail.com> > --- > net/phonet/pep.c | 3 +++ > 1 files changed, 3 insertions(+), 0 deletions(-) > > diff --git a/net/phonet/pep.c b/net/phonet/pep.c > index 9f60008..caee99e 100644 > --- a/net/phonet/pep.c > +++ b/net/phonet/pep.c > @@ -1130,6 +1130,9 @@ static int pep_sendmsg(struct kiocb *iocb, struct > sock *sk, int flags = msg->msg_flags; > int err, done; > > + if (len > USHRT_MAX) > + return -E2BIG; I think EMSGSIZE is specified in that case. > + > if ((msg->msg_flags & ~(MSG_DONTWAIT|MSG_EOR|MSG_NOSIGNAL| > MSG_CMSG_COMPAT)) || > !(msg->msg_flags & MSG_EOR)) -- Rémi Denis-Courmont http://www.remlab.net/ http://fi.linkedin.com/in/remidenis ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-02 19:00 ` Rémi Denis-Courmont @ 2012-04-02 21:38 ` David Miller 0 siblings, 0 replies; 19+ messages in thread From: David Miller @ 2012-04-02 21:38 UTC (permalink / raw) To: remi; +Cc: levinsasha928, remi.denis-courmont, davej, netdev, linux-kernel From: "Rémi Denis-Courmont" <remi@remlab.net> Date: Mon, 2 Apr 2012 22:00:40 +0300 > Le lundi 2 avril 2012 23:31:00 Sasha Levin, vous avez écrit : >> + if (len > USHRT_MAX) >> + return -E2BIG; > > I think EMSGSIZE is specified in that case. Agreed. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-02 20:31 [PATCH] phonet: Check input from user before allocating Sasha Levin 2012-04-02 19:00 ` Rémi Denis-Courmont @ 2012-04-02 19:01 ` Rémi Denis-Courmont 2012-04-02 21:40 ` David Miller 1 sibling, 1 reply; 19+ messages in thread From: Rémi Denis-Courmont @ 2012-04-02 19:01 UTC (permalink / raw) To: Sasha Levin; +Cc: remi.denis-courmont, davem, davej, netdev, linux-kernel Le lundi 2 avril 2012 23:31:00 Sasha Levin, vous avez écrit : > A phonet packet is limited to USHRT_MAX bytes, this is never checked during > tx which means that the user can specify any size he wishes, and the kernel > will attempt to allocate that size. Does this really solve the problem? I guess 128kb is still possible with USHRT_MAX plus skbuff overhead, which might still trigger OOM relatively easily once the memory gets sufficiently fragmented. How does UDP deal with this? -- Rémi Denis-Courmont http://www.remlab.net/ http://fi.linkedin.com/in/remidenis ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-02 19:01 ` Rémi Denis-Courmont @ 2012-04-02 21:40 ` David Miller 2012-04-03 1:53 ` Eric Dumazet 2012-04-03 6:36 ` [PATCH] phonet: Check input from user before allocating Rémi Denis-Courmont 0 siblings, 2 replies; 19+ messages in thread From: David Miller @ 2012-04-02 21:40 UTC (permalink / raw) To: remi; +Cc: levinsasha928, remi.denis-courmont, davej, netdev, linux-kernel From: "Rémi Denis-Courmont" <remi@remlab.net> Date: Mon, 2 Apr 2012 22:01:40 +0300 > Le lundi 2 avril 2012 23:31:00 Sasha Levin, vous avez écrit : >> A phonet packet is limited to USHRT_MAX bytes, this is never checked during >> tx which means that the user can specify any size he wishes, and the kernel >> will attempt to allocate that size. > > Does this really solve the problem? I guess 128kb is still possible with > USHRT_MAX plus skbuff overhead, which might still trigger OOM relatively > easily once the memory gets sufficiently fragmented. > > How does UDP deal with this? UDP generates a fragment list of MTU sized SKBs. Phonet could avoid the large allocations by building page based SKBs. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-02 21:40 ` David Miller @ 2012-04-03 1:53 ` Eric Dumazet 2012-04-03 1:59 ` David Miller 2012-04-03 6:36 ` [PATCH] phonet: Check input from user before allocating Rémi Denis-Courmont 1 sibling, 1 reply; 19+ messages in thread From: Eric Dumazet @ 2012-04-03 1:53 UTC (permalink / raw) To: David Miller Cc: remi, levinsasha928, remi.denis-courmont, davej, netdev, linux-kernel On Mon, 2012-04-02 at 17:40 -0400, David Miller wrote: > From: "Rémi Denis-Courmont" <remi@remlab.net> > Date: Mon, 2 Apr 2012 22:01:40 +0300 > > > Le lundi 2 avril 2012 23:31:00 Sasha Levin, vous avez écrit : > >> A phonet packet is limited to USHRT_MAX bytes, this is never checked during > >> tx which means that the user can specify any size he wishes, and the kernel > >> will attempt to allocate that size. > > > > Does this really solve the problem? I guess 128kb is still possible with > > USHRT_MAX plus skbuff overhead, which might still trigger OOM relatively > > easily once the memory gets sufficiently fragmented. > > > > How does UDP deal with this? > > UDP generates a fragment list of MTU sized SKBs. > > Phonet could avoid the large allocations by building page based > SKBs. Not that AF_UNIX does nothing in this respect, it can use order-XX pages for large datagrams. (I beleve I sent a patch some time ago to address this point) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-03 1:53 ` Eric Dumazet @ 2012-04-03 1:59 ` David Miller 2012-04-03 2:15 ` Eric Dumazet 0 siblings, 1 reply; 19+ messages in thread From: David Miller @ 2012-04-03 1:59 UTC (permalink / raw) To: eric.dumazet Cc: remi, levinsasha928, remi.denis-courmont, davej, netdev, linux-kernel From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 03 Apr 2012 03:53:17 +0200 > Not that AF_UNIX does nothing in this respect, it can use order-XX pages > for large datagrams. > > (I beleve I sent a patch some time ago to address this point) Yes, on the datagram side it's a problem. For stream AF_UNIX sockets the allocation is capped at SKB_MAX_ALLOC which evaluates to an order 2 page. Overall, AF_UNIX ought to be easy to deal with since all of the routines that copy data between userspace and SKBs can handle segmented SKBs and thus most of the work is converting over to sock_alloc_send_pskb() and setting data_len how we set the normal length of sock_alloc_skb_skb() currently. Anyways, feel free to resubmit your patch. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-03 1:59 ` David Miller @ 2012-04-03 2:15 ` Eric Dumazet 2012-04-03 2:23 ` David Miller 2012-04-03 15:28 ` [PATCH v2 net-next] af_unix: reduce high order page allocations Eric Dumazet 0 siblings, 2 replies; 19+ messages in thread From: Eric Dumazet @ 2012-04-03 2:15 UTC (permalink / raw) To: David Miller Cc: remi, levinsasha928, remi.denis-courmont, davej, netdev, linux-kernel On Mon, 2012-04-02 at 21:59 -0400, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Tue, 03 Apr 2012 03:53:17 +0200 > > > Not that AF_UNIX does nothing in this respect, it can use order-XX pages > > for large datagrams. > > > > (I beleve I sent a patch some time ago to address this point) > > Yes, on the datagram side it's a problem. > > For stream AF_UNIX sockets the allocation is capped at SKB_MAX_ALLOC > which evaluates to an order 2 page. > > Overall, AF_UNIX ought to be easy to deal with since all of the > routines that copy data between userspace and SKBs can handle > segmented SKBs and thus most of the work is converting over to > sock_alloc_send_pskb() and setting data_len how we set the normal > length of sock_alloc_skb_skb() currently. > > Anyways, feel free to resubmit your patch. This was indeed a basic patch, but it probably can lower raw performance on some apps, (if memory frag is not an issue) so I need to bench it. Any idea of a representative benchmark in dgram af_unix ? http://patchwork.ozlabs.org/patch/114103/ I'll respin it with proper performance resuts. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-03 2:15 ` Eric Dumazet @ 2012-04-03 2:23 ` David Miller 2012-04-03 2:29 ` Eric Dumazet 2012-04-03 2:29 ` Rick Jones 2012-04-03 15:28 ` [PATCH v2 net-next] af_unix: reduce high order page allocations Eric Dumazet 1 sibling, 2 replies; 19+ messages in thread From: David Miller @ 2012-04-03 2:23 UTC (permalink / raw) To: eric.dumazet Cc: remi, levinsasha928, remi.denis-courmont, davej, netdev, linux-kernel From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 03 Apr 2012 04:15:04 +0200 > Any idea of a representative benchmark in dgram af_unix ? Maybe you could try to make bw_unix from lmbench use SOCK_DGRAM? :-) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-03 2:23 ` David Miller @ 2012-04-03 2:29 ` Eric Dumazet 2012-04-03 2:29 ` Rick Jones 1 sibling, 0 replies; 19+ messages in thread From: Eric Dumazet @ 2012-04-03 2:29 UTC (permalink / raw) To: David Miller Cc: remi, levinsasha928, remi.denis-courmont, davej, netdev, linux-kernel On Mon, 2012-04-02 at 22:23 -0400, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Tue, 03 Apr 2012 04:15:04 +0200 > > > Any idea of a representative benchmark in dgram af_unix ? > > Maybe you could try to make bw_unix from lmbench use > SOCK_DGRAM? :-) > Yes, I also converted hackbench to SOCK_DGRAM and change its DATASIZE from 100 to 10000 bytes per message. Thanks ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-03 2:23 ` David Miller 2012-04-03 2:29 ` Eric Dumazet @ 2012-04-03 2:29 ` Rick Jones 2012-04-03 2:34 ` Eric Dumazet 1 sibling, 1 reply; 19+ messages in thread From: Rick Jones @ 2012-04-03 2:29 UTC (permalink / raw) To: David Miller Cc: eric.dumazet, remi, levinsasha928, remi.denis-courmont, davej, netdev, linux-kernel On 04/02/2012 07:23 PM, David Miller wrote: > From: Eric Dumazet<eric.dumazet@gmail.com> > Date: Tue, 03 Apr 2012 04:15:04 +0200 > >> Any idea of a representative benchmark in dgram af_unix ? > > Maybe you could try to make bw_unix from lmbench use > SOCK_DGRAM? :-) I don't know it isn't entirely bitrotted, but there are streaming and datagram AF_UNIX tests in netperf - they require conditional inclusion via ./configure --enable-unixdomain: http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#DG_005fSTREAM happy benchmarking, rick jones ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-03 2:29 ` Rick Jones @ 2012-04-03 2:34 ` Eric Dumazet 2012-04-03 2:39 ` Rick Jones 0 siblings, 1 reply; 19+ messages in thread From: Eric Dumazet @ 2012-04-03 2:34 UTC (permalink / raw) To: Rick Jones Cc: David Miller, remi, levinsasha928, remi.denis-courmont, davej, netdev, linux-kernel On Mon, 2012-04-02 at 19:29 -0700, Rick Jones wrote: > On 04/02/2012 07:23 PM, David Miller wrote: > > From: Eric Dumazet<eric.dumazet@gmail.com> > > Date: Tue, 03 Apr 2012 04:15:04 +0200 > > > >> Any idea of a representative benchmark in dgram af_unix ? > > > > Maybe you could try to make bw_unix from lmbench use > > SOCK_DGRAM? :-) > > I don't know it isn't entirely bitrotted, but there are streaming and > datagram AF_UNIX tests in netperf - they require conditional inclusion > via ./configure --enable-unixdomain: > > http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#DG_005fSTREAM > Ah yes of course, I'll try that. BTW, do you have plans to support vmsplice()/splice() as a way to provide 0-copy to TCP_STREAM ? I ask this because I am currently working on fixing splice(pipe -> socket) performance problem and need a standard benchmark to publish performance results. Thanks ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-03 2:34 ` Eric Dumazet @ 2012-04-03 2:39 ` Rick Jones 2012-04-03 3:14 ` Eric Dumazet 0 siblings, 1 reply; 19+ messages in thread From: Rick Jones @ 2012-04-03 2:39 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, remi, levinsasha928, remi.denis-courmont, davej, netdev, linux-kernel On 04/02/2012 07:34 PM, Eric Dumazet wrote: > On Mon, 2012-04-02 at 19:29 -0700, Rick Jones wrote: >> I don't know it isn't entirely bitrotted, but there are streaming and >> datagram AF_UNIX tests in netperf - they require conditional inclusion >> via ./configure --enable-unixdomain: >> >> http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#DG_005fSTREAM >> > > Ah yes of course, I'll try that. > > BTW, do you have plans to support vmsplice()/splice() as a way to > provide 0-copy to TCP_STREAM ? I'd probably plead "too platform specific" but I've already got some platform-specific stuff in there like TCP_INFO so I probably cannot hide behind that. Particularly if I ever do make "native" WSA calls for Windows... Until now I'd not thought of vmsplice/splice though - only sendfile for zero copy, which should be there for TCP in the form of the TCP_SENDFILE test (unmigrated, so none of the omni output selection is available) > I ask this because I am currently working on fixing splice(pipe -> > socket) performance problem and need a standard benchmark to publish > performance results. I'll start to ponder, without promises. rick ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-03 2:39 ` Rick Jones @ 2012-04-03 3:14 ` Eric Dumazet 2012-04-03 18:18 ` Rick Jones 0 siblings, 1 reply; 19+ messages in thread From: Eric Dumazet @ 2012-04-03 3:14 UTC (permalink / raw) To: Rick Jones Cc: David Miller, remi, levinsasha928, remi.denis-courmont, davej, netdev, linux-kernel On Mon, 2012-04-02 at 19:39 -0700, Rick Jones wrote: > On 04/02/2012 07:34 PM, Eric Dumazet wrote: > > On Mon, 2012-04-02 at 19:29 -0700, Rick Jones wrote: > >> I don't know it isn't entirely bitrotted, but there are streaming and > >> datagram AF_UNIX tests in netperf - they require conditional inclusion > >> via ./configure --enable-unixdomain: > >> > >> http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#DG_005fSTREAM > >> > > > > Ah yes of course, I'll try that. > > It seems netperf has some problems zith AF_UNIX dgram : socket(PF_FILE, SOCK_DGRAM, 0) = 4 setsockopt(4, SOL_SOCKET, SO_SNDBUF, [0], 4) = 0 getsockopt(4, SOL_SOCKET, SO_SNDBUF, [2048], [4]) = 0 setsockopt(4, SOL_SOCKET, SO_RCVBUF, [0], 4) = 0 getsockopt(4, SOL_SOCKET, SO_RCVBUF, [2288], [4]) = 0 sendto(3, "\0\0\0+\377\377\377\377\0\0\0\0\0\0\10\0\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0"..., 256, 0, NULL, 0) = 256 select(1024, [3], NULL, NULL, {120, 0}) = 1 (in [3], left {119, 979635}) recvfrom(3, "\0\0\0,\0\0\0\0\0\0\10\360\0\0\10\0\0\0\0\0\0\0\0\n\0\0\0\0\0\0\0\0"..., 256, 0, NULL, NULL) = 256 connect(4, {sa_family=AF_FILE, path="/tmp/netpe62HGNM"}, 110) = 0 rt_sigaction(SIGALRM, {0x403171, [ALRM], SA_RESTORER|SA_INTERRUPT, 0x7f691f026af0}, NULL, 8) = 0 alarm(10) = 0 sendto(4, "netperf\0netperf\0netperf\0netperf\0"..., 2048, 0, NULL, 0) = -1 EMSGSIZE (Message too long) dup(2) = 5 fcntl(5, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE) fstat(5, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 3), ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f691fa18000 lseek(5, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) write(5, "dg_send: data send error: Messag"..., 43dg_send: data send error: Message too long ) = 43 close(5) = 0 munmap(0x7f691fa18000, 4096) = 0 exit_group(1) = ? I guess the SO_SNDBUF/SO_RCVBUF limits are a bit too low ? Tried this on an old kernel as well. Linux edumazet-glaptop 2.6.38-13-generic #57~lucid1-Ubuntu SMP Tue Mar 6 20:05:46 UTC 2012 x86_64 GNU/Linux ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-03 3:14 ` Eric Dumazet @ 2012-04-03 18:18 ` Rick Jones 0 siblings, 0 replies; 19+ messages in thread From: Rick Jones @ 2012-04-03 18:18 UTC (permalink / raw) To: Eric Dumazet Cc: David Miller, remi, levinsasha928, remi.denis-courmont, davej, netdev, linux-kernel On 04/02/2012 08:14 PM, Eric Dumazet wrote: > It seems netperf has some problems zith AF_UNIX dgram : > > socket(PF_FILE, SOCK_DGRAM, 0) = 4 > setsockopt(4, SOL_SOCKET, SO_SNDBUF, [0], 4) = 0 > getsockopt(4, SOL_SOCKET, SO_SNDBUF, [2048], [4]) = 0 > setsockopt(4, SOL_SOCKET, SO_RCVBUF, [0], 4) = 0 > getsockopt(4, SOL_SOCKET, SO_RCVBUF, [2288], [4]) = 0 > sendto(3, "\0\0\0+\377\377\377\377\0\0\0\0\0\0\10\0\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0"..., 256, 0, NULL, 0) = 256 > select(1024, [3], NULL, NULL, {120, 0}) = 1 (in [3], left {119, 979635}) > recvfrom(3, "\0\0\0,\0\0\0\0\0\0\10\360\0\0\10\0\0\0\0\0\0\0\0\n\0\0\0\0\0\0\0\0"..., 256, 0, NULL, NULL) = 256 > connect(4, {sa_family=AF_FILE, path="/tmp/netpe62HGNM"}, 110) = 0 > rt_sigaction(SIGALRM, {0x403171, [ALRM], SA_RESTORER|SA_INTERRUPT, 0x7f691f026af0}, NULL, 8) = 0 > alarm(10) = 0 > sendto(4, "netperf\0netperf\0netperf\0netperf\0"..., 2048, 0, NULL, 0) = -1 EMSGSIZE (Message too long) > dup(2) = 5 > fcntl(5, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE) > fstat(5, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 3), ...}) = 0 > mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f691fa18000 > lseek(5, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) > write(5, "dg_send: data send error: Messag"..., 43dg_send: data send error: Message too long > ) = 43 > close(5) = 0 > munmap(0x7f691fa18000, 4096) = 0 > exit_group(1) = ? > > I guess the SO_SNDBUF/SO_RCVBUF limits are a bit too low ? The AF_UNIX support in netperf is from a time when the code was very simplistic - if no socket buffer size specified on the command line, take the system default. If no send size specified, use the SO_SNDBUF size, though some limits were applied. Of course, along the way, there was a change in the value for socket buffer size that meant "take the default" - it used to be 0 but then became -1. Something to do with when Windows would decide to do copy avoidance - tell Windows the socket buffer size is 0 and it will do copy avoidance. The code in src/nettest_unix.c was still initializing to 0 rather than -1. I've fixed that in the top-of-trunk: raj@tardy:~/netperf2_trunk$ src/netperf -t DG_STREAM DG UNIDIRECTIONAL SEND TEST Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 126976 65507 10.00 1195647 0 811.07 2288 10.00 1195647 811.07 BTW, the local CPU utilization seems sane, but "remote" (which is redundant anyway) seems to be fubar somehow. More bitrot I suspect. happy benchmarking, rick jones ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2 net-next] af_unix: reduce high order page allocations 2012-04-03 2:15 ` Eric Dumazet 2012-04-03 2:23 ` David Miller @ 2012-04-03 15:28 ` Eric Dumazet 2012-04-03 20:43 ` David Miller 1 sibling, 1 reply; 19+ messages in thread From: Eric Dumazet @ 2012-04-03 15:28 UTC (permalink / raw) To: David Miller; +Cc: netdev unix_dgram_sendmsg() currently builds linear skbs, and this can stress page allocator with high order page allocations. When memory gets fragmented, this can eventually fail. We can try to use order-2 allocations for skb head (SKB_MAX_ALLOC) plus up to 16 page fragments to lower pressure on buddy allocator. This patch has no effect on messages of less than 16064 bytes. (on 64bit arches with PAGE_SIZE=4096) For bigger messages (from 16065 to 81600 bytes), this patch brings reliability at the expense of performance penalty because of extra pages allocations. netperf -t DG_STREAM -T 0,2 -- -m 16064 -s 200000 ->4086040 Messages / 10s netperf -t DG_STREAM -T 0,2 -- -m 16068 -s 200000 ->3901747 Messages / 10s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- v2: use SKB_MAX_ALLOC instead of SKB_MAX_ORDER(0, 0) to not slow down applications using up to 16000 bytes messages. net/unix/af_unix.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index d510353..eadb902 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1442,6 +1442,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock, long timeo; struct scm_cookie tmp_scm; int max_level; + int data_len = 0; if (NULL == siocb->scm) siocb->scm = &tmp_scm; @@ -1475,7 +1476,13 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock, if (len > sk->sk_sndbuf - 32) goto out; - skb = sock_alloc_send_skb(sk, len, msg->msg_flags&MSG_DONTWAIT, &err); + if (len > SKB_MAX_ALLOC) + data_len = min_t(size_t, + len - SKB_MAX_ALLOC, + MAX_SKB_FRAGS * PAGE_SIZE); + + skb = sock_alloc_send_pskb(sk, len - data_len, data_len, + msg->msg_flags & MSG_DONTWAIT, &err); if (skb == NULL) goto out; @@ -1485,8 +1492,10 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock, max_level = err + 1; unix_get_secdata(siocb->scm, skb); - skb_reset_transport_header(skb); - err = memcpy_fromiovec(skb_put(skb, len), msg->msg_iov, len); + skb_put(skb, len - data_len); + skb->data_len = data_len; + skb->len = len; + err = skb_copy_datagram_from_iovec(skb, 0, msg->msg_iov, 0, len); if (err) goto out_free; ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v2 net-next] af_unix: reduce high order page allocations 2012-04-03 15:28 ` [PATCH v2 net-next] af_unix: reduce high order page allocations Eric Dumazet @ 2012-04-03 20:43 ` David Miller 0 siblings, 0 replies; 19+ messages in thread From: David Miller @ 2012-04-03 20:43 UTC (permalink / raw) To: eric.dumazet; +Cc: netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 03 Apr 2012 17:28:28 +0200 > unix_dgram_sendmsg() currently builds linear skbs, and this can stress > page allocator with high order page allocations. When memory gets > fragmented, this can eventually fail. > > We can try to use order-2 allocations for skb head (SKB_MAX_ALLOC) plus > up to 16 page fragments to lower pressure on buddy allocator. > > This patch has no effect on messages of less than 16064 bytes. > (on 64bit arches with PAGE_SIZE=4096) > > For bigger messages (from 16065 to 81600 bytes), this patch brings > reliability at the expense of performance penalty because of extra pages > allocations. > > netperf -t DG_STREAM -T 0,2 -- -m 16064 -s 200000 > ->4086040 Messages / 10s > > netperf -t DG_STREAM -T 0,2 -- -m 16068 -s 200000 > ->3901747 Messages / 10s > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > v2: use SKB_MAX_ALLOC instead of SKB_MAX_ORDER(0, 0) to not slow down > applications using up to 16000 bytes messages. Looks good, applied to net-next, thanks Eric! ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-02 21:40 ` David Miller 2012-04-03 1:53 ` Eric Dumazet @ 2012-04-03 6:36 ` Rémi Denis-Courmont 2012-04-03 6:38 ` David Miller 1 sibling, 1 reply; 19+ messages in thread From: Rémi Denis-Courmont @ 2012-04-03 6:36 UTC (permalink / raw) To: David Miller; +Cc: levinsasha928, davej, netdev, linux-kernel On Mon, 02 Apr 2012 17:40:06 -0400 (EDT), David Miller <davem@davemloft.net> wrote: > UDP generates a fragment list of MTU sized SKBs. > > Phonet could avoid the large allocations by building page based > SKBs. Oh right. And Phonet devices don't support scatter/gather, so that I guess that would merely delay the problem. Also sendmsg() code would need to be reectored to look up the output device and then the MTU before allocating the socket buffer. This will only work if the default MTU is reduced first :/ -- Rémi Denis-Courmont Sent from my collocated server ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] phonet: Check input from user before allocating 2012-04-03 6:36 ` [PATCH] phonet: Check input from user before allocating Rémi Denis-Courmont @ 2012-04-03 6:38 ` David Miller 0 siblings, 0 replies; 19+ messages in thread From: David Miller @ 2012-04-03 6:38 UTC (permalink / raw) To: remi; +Cc: levinsasha928, davej, netdev, linux-kernel From: Rémi Denis-Courmont <remi@remlab.net> Date: Tue, 03 Apr 2012 08:36:20 +0200 > Oh right. And Phonet devices don't support scatter/gather, so that I > guess that would merely delay the problem. And this lack of SG support in the drivers never materialized largely because the phonet stack never generated such things. So yes indeed both sides of this equation will need to be addressed. ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2012-04-03 20:43 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-04-02 20:31 [PATCH] phonet: Check input from user before allocating Sasha Levin 2012-04-02 19:00 ` Rémi Denis-Courmont 2012-04-02 21:38 ` David Miller 2012-04-02 19:01 ` Rémi Denis-Courmont 2012-04-02 21:40 ` David Miller 2012-04-03 1:53 ` Eric Dumazet 2012-04-03 1:59 ` David Miller 2012-04-03 2:15 ` Eric Dumazet 2012-04-03 2:23 ` David Miller 2012-04-03 2:29 ` Eric Dumazet 2012-04-03 2:29 ` Rick Jones 2012-04-03 2:34 ` Eric Dumazet 2012-04-03 2:39 ` Rick Jones 2012-04-03 3:14 ` Eric Dumazet 2012-04-03 18:18 ` Rick Jones 2012-04-03 15:28 ` [PATCH v2 net-next] af_unix: reduce high order page allocations Eric Dumazet 2012-04-03 20:43 ` David Miller 2012-04-03 6:36 ` [PATCH] phonet: Check input from user before allocating Rémi Denis-Courmont 2012-04-03 6:38 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).