* [RFC/PATCH 0/3] UDP memory usage accounting
@ 2007-09-21 12:18 Satoshi OSHIMA
  2007-09-21 12:58 ` Evgeniy Polyakov
  0 siblings, 1 reply; 11+ messages in thread
From: Satoshi OSHIMA @ 2007-09-21 12:18 UTC
To: netdev; +Cc: haoki, 吉藤 英明

This patch set tries to introduce memory usage accounting for
UDP (currently IPv4 only).

Currently, memory usage of UDP can be observed as the sum of
usage of tx_queue and rx_queue. But I believe that system-wide
accounting is useful under heavily loaded conditions.

In the next step, I would like to add a memory usage quota
for UDP to avoid the unlimited memory consumption problem
under a DDoS attack.

This patch set is for 2.6.23-rc7.

Unfortunately, I don't have any NIC with UFO, so this patch set
has not been tested with a UFO-capable device.

I would appreciate your comments, testing, and feedback.

Satoshi Oshima

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-21 12:18 [RFC/PATCH 0/3] UDP memory usage accounting Satoshi OSHIMA
@ 2007-09-21 12:58 ` Evgeniy Polyakov
  2007-09-27 18:51   ` Hideo AOKI
  2007-09-28 13:26   ` Satoshi OSHIMA
  0 siblings, 2 replies; 11+ messages in thread
From: Evgeniy Polyakov @ 2007-09-21 12:58 UTC
To: Satoshi OSHIMA; +Cc: netdev, haoki, 吉藤 英明

Hi.

On Fri, Sep 21, 2007 at 09:18:07PM +0900, Satoshi OSHIMA (satoshi.oshima.fk@hitachi.com) wrote:
> This patch set tries to introduce memory usage accounting for
> UDP (currently IPv4 only).
>
> Currently, memory usage of UDP can be observed as the sum of
> usage of tx_queue and rx_queue. But I believe that system-wide
> accounting is useful under heavily loaded conditions.
>
> In the next step, I would like to add a memory usage quota
> for UDP to avoid the unlimited memory consumption problem
> under a DDoS attack.

Could you please describe such an attack in more detail?
Each UDP socket has its queue length, which cannot be exceeded
(roughly), and no new sockets are created when the remote side sends
a packet (as happens after special steps in TCP), so where is the
possibility to eat all the mem?

> This patch set is for 2.6.23-rc7.

I seriously doubt you want to put UDP-specific hacks and zillions of
atomic ops all around the code just to know the exact number of bytes
eaten for UDP.
Please use UDP-specific code (like udp_sendmsg()) for proper accounting
if you need that, but not hacks in generic IP code.

--
	Evgeniy Polyakov

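The per-socket bound mentioned above can be observed from userspace. The following is only a rough sketch, assuming a Linux host with a working loopback interface: a receiver with a small SO_RCVBUF never reads while a sender pushes datagrams at it, and afterwards we count how many datagrams the kernel actually kept. The function name `count_queued_datagrams` and the sizes are illustrative choices, not anything from the patch set, and the exact count depends on how the kernel accounts buffer overhead.

```python
import socket


def count_queued_datagrams(sent=500, payload=1024, rcvbuf=4096):
    """Send `sent` datagrams to a receiver that does not read while they
    arrive, then drain the receiver and return how many were queued."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Ask for a small receive buffer; the kernel may round this value.
        rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        rx.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        addr = rx.getsockname()

        for _ in range(sent):
            tx.sendto(b"x" * payload, addr)

        # Drain whatever the kernel kept; stop when nothing arrives in time.
        rx.settimeout(0.2)
        queued = 0
        while True:
            try:
                rx.recvfrom(payload)
            except socket.timeout:
                break
            queued += 1
        return queued
    finally:
        rx.close()
        tx.close()


if __name__ == "__main__":
    print("datagrams queued out of 500:", count_queued_datagrams())
```

Datagrams that do not fit in the receive queue are dropped, so the count stays well below the number sent. The bound is per socket, which is why the rest of the discussion turns on how many sockets a user can hold open.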
* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-21 12:58 ` Evgeniy Polyakov
@ 2007-09-27 18:51   ` Hideo AOKI
  2007-09-28 13:26   ` Satoshi OSHIMA
  1 sibling, 0 replies; 11+ messages in thread
From: Hideo AOKI @ 2007-09-27 18:51 UTC
To: Evgeniy Polyakov; +Cc: Satoshi OSHIMA, netdev, yoshfuji

Hello,

Apologies for the late response.

Evgeniy Polyakov wrote:
> Hi.
>
> On Fri, Sep 21, 2007 at 09:18:07PM +0900, Satoshi OSHIMA (satoshi.oshima.fk@hitachi.com) wrote:
>> This patch set tries to introduce memory usage accounting for
>> UDP (currently IPv4 only).
>>
>> Currently, memory usage of UDP can be observed as the sum of
>> usage of tx_queue and rx_queue. But I believe that system-wide
>> accounting is useful under heavily loaded conditions.
>>
>> In the next step, I would like to add a memory usage quota
>> for UDP to avoid the unlimited memory consumption problem
>> under a DDoS attack.
>
> Could you please describe such an attack in more detail?
> Each UDP socket has its queue length, which cannot be exceeded
> (roughly), and no new sockets are created when the remote side sends
> a packet (as happens after special steps in TCP), so where is the
> possibility to eat all the mem?

I think Satoshi will answer this question soon.

>> This patch set is for 2.6.23-rc7.
>
> I seriously doubt you want to put UDP-specific hacks and zillions of
> atomic ops all around the code just to know the exact number of bytes
> eaten for UDP.

I'll revise the patch to reduce the number of atomic operations.

> Please use UDP-specific code (like udp_sendmsg()) for proper accounting
> if you need that, but not hacks in generic IP code.

As far as I know, Satoshi is improving this part right now.
Please wait for his response.

Many thanks for your comments.

Best regards,
Hideo Aoki

--
Hitachi Computer Products (America) Inc.

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-21 12:58 ` Evgeniy Polyakov
  2007-09-27 18:51   ` Hideo AOKI
@ 2007-09-28 13:26   ` Satoshi OSHIMA
  2007-09-29  3:21     ` Herbert Xu
  1 sibling, 1 reply; 11+ messages in thread
From: Satoshi OSHIMA @ 2007-09-28 13:26 UTC
To: Evgeniy Polyakov; +Cc: netdev, haoki, 吉藤 英明, Yumiko SUGITA

Hi,

Thank you for your comment.

Evgeniy Polyakov wrote:
> Hi.
>
> On Fri, Sep 21, 2007 at 09:18:07PM +0900, Satoshi OSHIMA (satoshi.oshima.fk@hitachi.com) wrote:
>> This patch set tries to introduce memory usage accounting for
>> UDP (currently IPv4 only).
>>
>> Currently, memory usage of UDP can be observed as the sum of
>> usage of tx_queue and rx_queue. But I believe that system-wide
>> accounting is useful under heavily loaded conditions.
>>
>> In the next step, I would like to add a memory usage quota
>> for UDP to avoid the unlimited memory consumption problem
>> under a DDoS attack.
>
> Could you please describe such an attack in more detail?
> Each UDP socket has its queue length, which cannot be exceeded
> (roughly), and no new sockets are created when the remote side sends
> a packet (as happens after special steps in TCP), so where is the
> possibility to eat all the mem?

For example, sk_buffs are allocated from the slab, and on i386
slab memory can be obtained only from ZONE_NORMAL.

In such a case, memory consumption of 300 to 500 MB will
be fatal. Users can easily open 1000 sockets per process
under the default ulimit. If such sockets hold messages but
the user processes don't receive them, almost all of the slab
will be occupied by sk_buffs.

>> This patch set is for 2.6.23-rc7.
>
> I seriously doubt you want to put UDP-specific hacks and zillions of
> atomic ops all around the code just to know the exact number of bytes
> eaten for UDP.
> Please use UDP-specific code (like udp_sendmsg()) for proper accounting
> if you need that, but not hacks in generic IP code.

I couldn't find a way to account for UDP memory consumption
in the UDP layer.

In the receive path, accounting can be done in the UDP layer because
the sk_buff is marked for UDP in the UDP layer and is released
in the UDP layer.

In the send path, the sk_buff is acquired in the IP layer and also
released in the IP layer. In particular, there is a possibility of
"appending data" to the preceding sk_buff in the send queue.

On the other hand, I agree that UDP-specific code in the IP layer is
not preferable, so I generalized the UDP-specific code in the IP layer
in take 2.

Could you take a look at my take 2 patch set?

Satoshi Oshima

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-28 13:26   ` Satoshi OSHIMA
@ 2007-09-29  3:21     ` Herbert Xu
  2007-09-29  4:47       ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2007-09-29 3:21 UTC
To: Satoshi OSHIMA; +Cc: johnpol, netdev, haoki, yoshfuji, yumiko.sugita.yf

Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com> wrote:
>
> In such a case, memory consumption of 300 to 500 MB will
> be fatal. Users can easily open 1000 sockets per process
> under the default ulimit. If such sockets hold messages but
> the user processes don't receive them, almost all of the slab
> will be occupied by sk_buffs.

Well the solution to that is to have a per-user limit rather
than a system-wide limit. Otherwise any user can stop system
daemons from using UDP.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-29  3:21     ` Herbert Xu
@ 2007-09-29  4:47       ` David Miller
  2007-09-29  4:51         ` David Miller
  2007-09-29  5:55         ` Herbert Xu
  0 siblings, 2 replies; 11+ messages in thread
From: David Miller @ 2007-09-29 4:47 UTC
To: herbert
Cc: satoshi.oshima.fk, johnpol, netdev, haoki, yoshfuji, yumiko.sugita.yf

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 29 Sep 2007 11:21:05 +0800

> Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com> wrote:
> >
> > In such a case, memory consumption of 300 to 500 MB will
> > be fatal. Users can easily open 1000 sockets per process
> > under the default ulimit. If such sockets hold messages but
> > the user processes don't receive them, almost all of the slab
> > will be occupied by sk_buffs.
>
> Well the solution to that is to have a per-user limit rather
> than a system-wide limit. Otherwise any user can stop system
> daemons from using UDP.

Per-user limits are not necessarily the answer.

There are two things we (might) need to guard against, one local and
one remote.

Originally the TCP global memory accounting was added to handle remote
issues. You could really make apache do stupid things without it.
Open up a ton of connections to a web server, request a ton of data,
don't read any of it.

When we get into the red zone, we purge out of order queues and other
packet allocations that are expendable. Legitimate active connections
can thus make progress and allocate packets. More importantly the
amount of memory usable by TCP sockets is bounded by some limit.

But this limit is arbitrary and easily wrong. If my system is just
sending one static file to hundreds of thousands of clients, well then
using 99% of RAM for socket buffer memory is just fine.

That is not how the global accounting works, unfortunately. It
doesn't know what's happening, it doesn't "respond" to any stimulus to
control memory use. It just understands its local state and its
local limits. It's a very poor way to handle the problem.

If you do a per-user limit, apache would basically just stop at that
redzone point. In some sense making the attack more effective because
then it's trivial to shut down an entire web server this way.

In my opinion this stuff needs to be reinvestigated more deeply. In
fact I think the stuff we have for TCP is insufficient and/or
ineffective. Furthermore, the fact that there is even the slightest
urge to duplicate this for UDP should be a big red flag that we need
a better solution.

The fact is that the networking does not participate with the rest of
the system wrt. memory pressure callbacks. That's the problem.

I've mentioned before that things like the routing cache should
register trimming callbacks just like the dcache and inode cache
already do.

I see no valid argument against doing something similar for sockets.
Such a register_shrinker() handler for TCP could, for example, look
for TCP flows which haven't made forward progress in more than a
certain amount of time and attempt to trim SKB memory from them.

A shrinker callback could also be used to adjust any global socket
memory limit scheme we might have. Set the limit real high initially,
but then scale it back if we get a lot of shrinker calls.

UDP and other datagram sockets are troublesome because the memory
gets wholly tied up immediately during the send call and it's not
easy to liberate anything. The nice part about datagram sockets,
however, is that they make forward progress quickly and their
memory is liberated as soon as the device transmits the packet.
They don't have to wait for ACKs, windows opening up, or anything
like that to happen.

To be honest I don't even think UDP is much of a real problem for this
reason.

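The shrinker idea above can be modelled in userspace to make the two behaviours concrete: trimming flows that have stalled, and scaling a global limit back each time memory pressure is signalled. This is only a toy model under stated assumptions, not kernel code and not the register_shrinker() interface itself; the names `Flow`, `SocketMemoryModel`, `idle_after` and `scale` are hypothetical, and real trimming could free only the expendable part of a flow's buffers rather than everything.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    queued_bytes: int      # memory currently held by this flow's buffers
    last_progress: float   # time (seconds) the flow last made forward progress


@dataclass
class SocketMemoryModel:
    limit: int             # current global limit in bytes (starts high)
    floor: int             # the limit is never scaled below this value
    flows: dict = field(default_factory=dict)

    def used(self):
        """Total bytes currently held across all flows."""
        return sum(f.queued_bytes for f in self.flows.values())

    def shrink(self, now, idle_after, scale=0.5):
        """Model of a memory-pressure callback: trim flows that have been
        idle longer than `idle_after` and scale the global limit back.
        Returns the number of bytes freed."""
        freed = 0
        for flow in self.flows.values():
            if now - flow.last_progress > idle_after:
                freed += flow.queued_bytes  # simplification: free everything
                flow.queued_bytes = 0
        self.limit = max(self.floor, int(self.limit * scale))
        return freed


if __name__ == "__main__":
    model = SocketMemoryModel(limit=1 << 30, floor=1 << 26)
    model.flows["stalled"] = Flow(queued_bytes=1 << 20, last_progress=0.0)
    model.flows["active"] = Flow(queued_bytes=2 << 20, last_progress=95.0)
    freed = model.shrink(now=100.0, idle_after=30.0)
    print("freed:", freed, "new limit:", model.limit, "still used:", model.used())
```

The point of the model is that the limit reacts to pressure instead of being a fixed guess: with no shrink calls, the high initial limit stays in place, and repeated calls converge on the floor.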
* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-29  4:47       ` David Miller
@ 2007-09-29  4:51         ` David Miller
  2007-09-29  5:20           ` Herbert Xu
  2007-09-29  5:55         ` Herbert Xu
  1 sibling, 1 reply; 11+ messages in thread
From: David Miller @ 2007-09-29 4:51 UTC
To: herbert
Cc: satoshi.oshima.fk, johnpol, netdev, haoki, yoshfuji, yumiko.sugita.yf

From: David Miller <davem@davemloft.net>
Date: Fri, 28 Sep 2007 21:47:37 -0700 (PDT)

> Per-user limits are not necessarily the answer.

Seeing myself say this reminds me that it seems to have been missed in
all of this that we do have a limit per user already.

There is a per-socket send buffer limit, and there is a per-user open
file descriptor limit. Multiply the two to determine how much system
memory the user can consume using sockets.

This is why I view a lot of this work as quite pointless. We already
have limits, but nobody wants to use them for performance reasons :-)

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-29  4:51         ` David Miller
@ 2007-09-29  5:20           ` Herbert Xu
  2007-10-01 13:56             ` Satoshi OSHIMA
  0 siblings, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2007-09-29 5:20 UTC
To: David Miller
Cc: satoshi.oshima.fk, johnpol, netdev, haoki, yoshfuji, yumiko.sugita.yf

On Fri, Sep 28, 2007 at 09:51:59PM -0700, David Miller wrote:
>
> There is a per-socket send buffer limit, and there is a per-user open
> file descriptor limit. Multiply the two to determine how much system
> memory the user can consume using sockets.

We do have these limits but they're per-process, not per-user.
Unless you lock down the number of processes each user can have
to no more than a handful then this is basically useless.

For example, let's say each socket can lock down 64K of kernel
memory (which is quite easy to do BTW, just open a TCP/UDP socket,
send data to it from another socket but keep the data in the
socket by not calling recvmsg), and that each process can have
1024 file descriptors (the default), then each process can pin

	64K x 1024 = 64M

of memory. So if the user can have 10 processes, then that's
640M of kernel memory that can be pinned down. Usually the
process limit is at least 10 times higher.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

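The arithmetic in this message can be written out as a short sketch. It assumes the figures given above (64K pinned per socket, 1024 descriptors per process, 10 processes per user); the helper names `pinned_bytes` and `current_fd_limit` are illustrative only. The second helper reads the calling process's own descriptor limit so the numbers can be compared against a real system.

```python
import resource


def pinned_bytes(per_socket_bytes, fds_per_process, processes):
    """Upper bound on kernel memory pinned if every descriptor is a socket
    holding `per_socket_bytes` of unread data."""
    return per_socket_bytes * fds_per_process * processes


def current_fd_limit():
    """Soft per-process open file descriptor limit (RLIMIT_NOFILE)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    KIB, MIB = 1024, 1024 * 1024
    print(pinned_bytes(64 * KIB, 1024, 1) // MIB, "MiB per process")
    print(pinned_bytes(64 * KIB, 1024, 10) // MIB, "MiB for 10 processes")
    print("this process's fd limit:", current_fd_limit())
```

Because the bound is a plain product, raising any one factor (the buffer size, the descriptor limit, or the process count) raises the total proportionally, which is why a per-process limit alone does not cap what one user can pin.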
* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-29  5:20           ` Herbert Xu
@ 2007-10-01 13:56             ` Satoshi OSHIMA
  0 siblings, 0 replies; 11+ messages in thread
From: Satoshi OSHIMA @ 2007-10-01 13:56 UTC
To: Herbert Xu
Cc: David Miller, johnpol, netdev, haoki, yoshfuji, yumiko.sugita.yf,
	Evgeniy Polyakov

Herbert Xu wrote:
> On Fri, Sep 28, 2007 at 09:51:59PM -0700, David Miller wrote:
>> There is a per-socket send buffer limit, and there is a per-user open
>> file descriptor limit. Multiply the two to determine how much system
>> memory the user can consume using sockets.
>
> We do have these limits but they're per-process, not per-user.
> Unless you lock down the number of processes each user can have
> to no more than a handful then this is basically useless.
>
> For example, let's say each socket can lock down 64K of kernel
> memory (which is quite easy to do BTW, just open a TCP/UDP socket,
> send data to it from another socket but keep the data in the
> socket by not calling recvmsg), and that each process can have
> 1024 file descriptors (the default), then each process can pin
>
> 	64K x 1024 = 64M
>
> of memory. So if the user can have 10 processes, then that's
> 640M of kernel memory that can be pinned down. Usually the
> process limit is at least 10 times higher.

Thank you very much for your comment.

What you pointed out is my motivation for making this patch.
I think that per-process limits won't help to solve this
problem.

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-29  4:47       ` David Miller
  2007-09-29  4:51         ` David Miller
@ 2007-09-29  5:55         ` Herbert Xu
  2007-10-01 13:57           ` Satoshi OSHIMA
  1 sibling, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2007-09-29 5:55 UTC
To: David Miller
Cc: satoshi.oshima.fk, johnpol, netdev, haoki, yoshfuji, yumiko.sugita.yf

On Fri, Sep 28, 2007 at 09:47:37PM -0700, David Miller wrote:
>
> There are two things we (might) need to guard against, one local and
> one remote.

Right I was focusing on the local threat.

> If you do a per-user limit, apache would basically just stop at that
> redzone point. In some sense making the attack more effective because
> then it's trivial to shut down an entire web server this way.

Having a per-user limit doesn't necessarily mean that we have
to apply the limit differently to how we apply the system-wide
limits. We could keep exactly the same code as we have now but
check against a per-user limit instead of a system-wide one.

In other words your apache scenario will continue to work as is
even with a per-user limit.

Now where it does become useful is when we have a rogue local
user. As it is that user can chew up all of the budgeted TCP
memory by simply not calling recvmsg. As I've stated in the
other email, the existing rlimits don't help because they're
per-process rather than per-user.

BTW, this is not fatal for TCP because TCP provides a minimum
amount of memory for each socket even when we are over the
limit. However, if this were implemented for UDP without
a minimum guarantee then it'd be quite useless.

> I see no valid argument against doing something similar for sockets.
> Such a register_shrinker() handler for TCP could, for example, look
> for TCP flows which haven't made forward progress in more than a
> certain amount of time and attempt to trim SKB memory from them.

Yes I agree this would be quite useful for sending. However, it'll
be tough to shrink skbs that we've already acked but that the app
for some reason has decided to leave in the socket by not calling
recvmsg.

> UDP and other datagram sockets are troublesome because the memory
> gets wholly tied up immediately during the send call and it's not
> easy to liberate anything. The nice part about datagram sockets,
> however, is that they make forward progress quickly and their
> memory is liberated as soon as the device transmits the packet.
> They don't have to wait for ACKs, windows opening up, or anything
> like that to happen.

Agreed. Also the recvmsg case I've described above is much
simpler for UDP as we can just go through all the sockets and
free skbs at random :)

> To be honest I don't even think UDP is much of a real problem for this
> reason.

It's not a hard problem but we do need to have some code for it.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

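The point about a per-socket minimum can be illustrated with a toy accounting model. This is a deliberately simplified sketch and not the TCP code path: the class `AccountedPool` and its parameters are hypothetical. A charge is admitted if the global budget still has room, or if the socket would stay within its guaranteed minimum.

```python
class AccountedPool:
    """Toy model: a global byte budget plus a per-socket guaranteed minimum."""

    def __init__(self, global_limit, per_socket_min):
        self.global_limit = global_limit
        self.per_socket_min = per_socket_min
        self.total = 0
        self.per_socket = {}

    def charge(self, sock, nbytes):
        """Try to account `nbytes` to `sock`; return True if admitted."""
        held = self.per_socket.get(sock, 0)
        under_global = self.total + nbytes <= self.global_limit
        under_minimum = held + nbytes <= self.per_socket_min
        if not (under_global or under_minimum):
            return False
        self.per_socket[sock] = held + nbytes
        self.total += nbytes
        return True

    def uncharge(self, sock, nbytes):
        """Release up to `nbytes` previously charged to `sock`."""
        released = min(nbytes, self.per_socket.get(sock, 0))
        self.per_socket[sock] = self.per_socket.get(sock, 0) - released
        self.total -= released


if __name__ == "__main__":
    pool = AccountedPool(global_limit=100, per_socket_min=10)
    print(pool.charge("hog", 100))     # fills the whole global budget
    print(pool.charge("hog", 1))       # over the limit and over its minimum
    print(pool.charge("daemon", 10))   # still admitted via the minimum

    strict = AccountedPool(global_limit=100, per_socket_min=0)
    strict.charge("hog", 100)
    print(strict.charge("daemon", 1))  # no minimum: the daemon is starved
```

With the minimum set to zero, one socket that fills the budget blocks every other socket, which is the failure mode described above for a UDP limit without a guarantee.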
* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-29  5:55         ` Herbert Xu
@ 2007-10-01 13:57           ` Satoshi OSHIMA
  0 siblings, 0 replies; 11+ messages in thread
From: Satoshi OSHIMA @ 2007-10-01 13:57 UTC
To: Herbert Xu
Cc: David Miller, johnpol, netdev, haoki, yoshfuji, yumiko.sugita.yf

> On Fri, Sep 28, 2007 at 09:47:37PM -0700, David Miller wrote:
>> There are two things we (might) need to guard against, one local and
>> one remote.
>
> Right I was focusing on the local threat.
>
>> If you do a per-user limit, apache would basically just stop at that
>> redzone point. In some sense making the attack more effective because
>> then it's trivial to shut down an entire web server this way.
>
> Having a per-user limit doesn't necessarily mean that we have
> to apply the limit differently to how we apply the system-wide
> limits. We could keep exactly the same code as we have now but
> check against a per-user limit instead of a system-wide one.
>
> In other words your apache scenario will continue to work as is
> even with a per-user limit.

I'm afraid that a per-user limit won't work for the system
administrator, because he can't know in advance (before such an
attack is made) who the rogue user is. And once the attack is
made, the system will not respond because of the lack of memory
for slab.

So if he only has a per-user limit, he needs to split the UDP
memory budget among the users. The limit per user will be very
small if the number of users in the system is large.

> Now where it does become useful is when we have a rogue local
> user. As it is that user can chew up all of the budgeted TCP
> memory by simply not calling recvmsg. As I've stated in the
> other email, the existing rlimits don't help because they're
> per-process rather than per-user.
>
> BTW, this is not fatal for TCP because TCP provides a minimum
> amount of memory for each socket even when we are over the
> limit. However, if this were implemented for UDP without
> a minimum guarantee then it'd be quite useless.

Hmm, I didn't realize that. Thank you for your good suggestion.
I will think about it.

>> I see no valid argument against doing something similar for sockets.
>> Such a register_shrinker() handler for TCP could, for example, look
>> for TCP flows which haven't made forward progress in more than a
>> certain amount of time and attempt to trim SKB memory from them.
>
> Yes I agree this would be quite useful for sending. However, it'll
> be tough to shrink skbs that we've already acked but that the app
> for some reason has decided to leave in the socket by not calling
> recvmsg.
>
>> UDP and other datagram sockets are troublesome because the memory
>> gets wholly tied up immediately during the send call and it's not
>> easy to liberate anything. The nice part about datagram sockets,
>> however, is that they make forward progress quickly and their
>> memory is liberated as soon as the device transmits the packet.
>> They don't have to wait for ACKs, windows opening up, or anything
>> like that to happen.
>
> Agreed. Also the recvmsg case I've described above is much
> simpler for UDP as we can just go through all the sockets and
> free skbs at random :)
>
>> To be honest I don't even think UDP is much of a real problem for this
>> reason.
>
> It's not a hard problem but we do need to have some code for it.

I believe so. Currently, a nasty user can easily stop the
system without root privilege. This may not be a serious
problem, but it is a problem that should be fixed.
