[RFC/PATCH 0/3] UDP memory usage accounting

netdev.vger.kernel.org archive mirror

* [RFC/PATCH 0/3] UDP memory usage accounting
@ 2007-09-21 12:18 Satoshi OSHIMA
  2007-09-21 12:58 ` Evgeniy Polyakov
  0 siblings, 1 reply; 11+ messages in thread
From: Satoshi OSHIMA @ 2007-09-21 12:18 UTC (permalink / raw)
  To: netdev; +Cc: haoki, 吉藤 英明

This patch set tries to introduce memory usage accounting for
UDP (currently IPv4 only).

Currently, the memory usage of UDP can be observed as the sum of
the tx_queue and rx_queue usage. But I believe that system-wide
accounting is useful under heavily loaded conditions.

In the next step, I would like to add a memory usage quota
for UDP to avoid the problem of unlimited memory consumption
under a DDoS attack.

This patch set is for 2.6.23-rc7.

Unfortunately, I don't have any NIC with UFO, so this patch
set has not been tested with a UFO-capable device.

I would appreciate your comments, testing, and feedback.

Satoshi Oshima

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-21 12:18 [RFC/PATCH 0/3] UDP memory usage accounting Satoshi OSHIMA
@ 2007-09-21 12:58 ` Evgeniy Polyakov
  2007-09-27 18:51   ` Hideo AOKI
  2007-09-28 13:26   ` Satoshi OSHIMA
  0 siblings, 2 replies; 11+ messages in thread
From: Evgeniy Polyakov @ 2007-09-21 12:58 UTC (permalink / raw)
  To: Satoshi OSHIMA; +Cc: netdev, haoki, 吉藤 英明

Hi.

On Fri, Sep 21, 2007 at 09:18:07PM +0900, Satoshi OSHIMA (satoshi.oshima.fk@hitachi.com) wrote:
> This patch set tries to introduce memory usage accounting for
> UDP (currently IPv4 only).
> 
> Currently, the memory usage of UDP can be observed as the sum of
> the tx_queue and rx_queue usage. But I believe that system-wide
> accounting is useful under heavily loaded conditions.
> 
> In the next step, I would like to add a memory usage quota
> for UDP to avoid the problem of unlimited memory consumption
> under a DDoS attack.

Could you please describe such an attack in more detail?
Each UDP socket has a queue length limit which cannot be exceeded
(roughly), and no new sockets are created when the remote side sends
a packet (as happens in TCP after connection setup), so where is the
possibility to eat all the memory?

> This patch set is for 2.6.23-rc7.

I seriously doubt you want to put UDP-specific hacks and zillions of
atomic ops all around the code just to know the exact number of bytes eaten
for UDP.

Please use UDP-specific code (like udp_sendmsg()) for proper accounting
if you need that, but not hacks in generic IP code.

-- 
	Evgeniy Polyakov

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-21 12:58 ` Evgeniy Polyakov
@ 2007-09-27 18:51   ` Hideo AOKI
  2007-09-28 13:26   ` Satoshi OSHIMA
  1 sibling, 0 replies; 11+ messages in thread
From: Hideo AOKI @ 2007-09-27 18:51 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Satoshi OSHIMA, netdev, yoshfuji

Hello,

Apologies for the late response.

Evgeniy Polyakov wrote:
> Hi.
> 
> On Fri, Sep 21, 2007 at 09:18:07PM +0900, Satoshi OSHIMA (satoshi.oshima.fk@hitachi.com) wrote:
>> This patch set tries to introduce memory usage accounting for
>> UDP (currently IPv4 only).
>>
>> Currently, the memory usage of UDP can be observed as the sum of
>> the tx_queue and rx_queue usage. But I believe that system-wide
>> accounting is useful under heavily loaded conditions.
>>
>> In the next step, I would like to add a memory usage quota
>> for UDP to avoid the problem of unlimited memory consumption
>> under a DDoS attack.
> 
> Could you please describe such an attack in more detail?
> Each UDP socket has a queue length limit which cannot be exceeded
> (roughly), and no new sockets are created when the remote side sends
> a packet (as happens in TCP after connection setup), so where is the
> possibility to eat all the memory?

I think Satoshi will answer this question soon.

>> This patch set is for 2.6.23-rc7.
> 
> I seriously doubt you want to put UDP-specific hacks and zillions of
> atomic ops all around the code just to know the exact number of bytes eaten
> for UDP.

I'll revise the patch to reduce the number of atomic operations.

> Please use UDP-specific code (like udp_sendmsg()) for proper accounting
> if you need that, but not hacks in generic IP code.

As far as I know, Satoshi is improving this part right now. Please wait for
his response.

Many thanks for your comments.

Best regards,
Hideo Aoki

--
Hitachi Computer Products (America) Inc.

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-21 12:58 ` Evgeniy Polyakov
  2007-09-27 18:51   ` Hideo AOKI
@ 2007-09-28 13:26   ` Satoshi OSHIMA
  2007-09-29  3:21     ` Herbert Xu
  1 sibling, 1 reply; 11+ messages in thread
From: Satoshi OSHIMA @ 2007-09-28 13:26 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: netdev, haoki, 吉藤 英明, Yumiko SUGITA

Hi,

Thank you for your comment.

Evgeniy Polyakov wrote:
> Hi.
>
> On Fri, Sep 21, 2007 at 09:18:07PM +0900, Satoshi OSHIMA (satoshi.oshima.fk@hitachi.com) wrote:
>> This patch set tries to introduce memory usage accounting for
>> UDP (currently IPv4 only).
>>
>> Currently, the memory usage of UDP can be observed as the sum of
>> the tx_queue and rx_queue usage. But I believe that system-wide
>> accounting is useful under heavily loaded conditions.
>>
>> In the next step, I would like to add a memory usage quota
>> for UDP to avoid the problem of unlimited memory consumption
>> under a DDoS attack.
>
> Could you please describe such an attack in more detail?
> Each UDP socket has a queue length limit which cannot be exceeded
> (roughly), and no new sockets are created when the remote side sends
> a packet (as happens in TCP after connection setup), so where is the
> possibility to eat all the memory?

For example, sk_buffs are allocated from the slab, and on
i386 the slab can only use ZONE_NORMAL memory.

In that case, 300 to 500MB of memory consumption will
be fatal. Users can easily open 1000 sockets per process
under the default ulimit. If such sockets hold messages that
the user processes never receive, almost all of the slab will
be occupied by sk_buffs.

>> This patch set is for 2.6.23-rc7.
>
> I seriously doubt you want to put UDP-specific hacks and zillions of
> atomic ops all around the code just to know the exact number of bytes eaten
> for UDP.
> Please use UDP-specific code (like udp_sendmsg()) for proper accounting
> if you need that, but not hacks in generic IP code.

I couldn't find a way to account for UDP memory consumption
in the UDP layer.

In the receive path, accounting can be done in the UDP layer
because the sk_buff is marked as UDP in the UDP layer and it is
released in the UDP layer.

In the send path, the sk_buff is acquired in the IP layer and
also released in the IP layer. In particular, data may be
"appended" to the preceding sk_buff in the send queue.

On the other hand, I agree that UDP-specific code in the IP
layer is not desirable, so in take 2 I generalized the
UDP-specific code in the IP layer.

Could you take a look at my take 2 patch set?

Satoshi Oshima

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-28 13:26   ` Satoshi OSHIMA
@ 2007-09-29  3:21     ` Herbert Xu
  2007-09-29  4:47       ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2007-09-29  3:21 UTC (permalink / raw)
  To: Satoshi OSHIMA; +Cc: johnpol, netdev, haoki, yoshfuji, yumiko.sugita.yf

Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com> wrote:
>
> In that case, 300 to 500MB of memory consumption will
> be fatal. Users can easily open 1000 sockets per process
> under the default ulimit. If such sockets hold messages that
> the user processes never receive, almost all of the slab will
> be occupied by sk_buffs.

Well the solution to that is to have a per-user limit rather
than a system-wide limit.  Otherwise any user can stop system
daemons from using UDP.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-29  3:21     ` Herbert Xu
@ 2007-09-29  4:47       ` David Miller
  2007-09-29  4:51         ` David Miller
  2007-09-29  5:55         ` Herbert Xu
  0 siblings, 2 replies; 11+ messages in thread
From: David Miller @ 2007-09-29  4:47 UTC (permalink / raw)
  To: herbert
  Cc: satoshi.oshima.fk, johnpol, netdev, haoki, yoshfuji,
	yumiko.sugita.yf

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 29 Sep 2007 11:21:05 +0800

> Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com> wrote:
> >
> > In that case, 300 to 500MB of memory consumption will
> > be fatal. Users can easily open 1000 sockets per process
> > under the default ulimit. If such sockets hold messages that
> > the user processes never receive, almost all of the slab will
> > be occupied by sk_buffs.
> 
> Well the solution to that is to have a per-user limit rather
> than a system-wide limit.  Otherwise any user can stop system
> daemons from using UDP.

Per-user limits are not necessarily the answer.

There are two things we (might) need to guard against, one local and
one remote.

Originally the TCP global memory accounting was added to handle remote
issues.  You could really make apache do stupid things without it.
Open up a ton of connections to a web server, request a ton of
data, don't read any of it.

When we get into the red zone, we purge out of order queues and other
packet allocations that are expendable.  Legitimate active connections
can thus make progress and allocate packets.  More importantly the
amount of memory usable by TCP sockets is bounded by some limit.

But this limit is arbitrary and easily wrong.  If my system is just
sending one static file to hundreds of thousands of clients, well
then using 99% of RAM for socket buffer memory is just fine.  That
is not how the global accounting works, unfortunately.  It doesn't
know what's happening, it doesn't "respond" to any stimulus to
control memory use.  It just understands its local state and its
local limits.  It's a very poor way to handle the problem.

If you do a per-user limit, apache would basically just stop at that
redzone point.  In some sense making the attack more effective because
then it's trivial to shut down an entire web server this way.

In my opinion this stuff needs to be reinvestigated more deeply.  In
fact I think the stuff we have for TCP is insufficient and/or
ineffective.

Furthermore, the fact that there is even the slightest urge to
duplicate this for UDP should be a big red flag that we need a better
solution.

The fact is that networking does not participate with the rest
of the system wrt. memory pressure callbacks.  That's the problem.

I've mentioned before that things like the routing cache should
register trimming callbacks just like the dcache and inode cache
already do.

I see no valid argument against doing something similar for sockets.
Such a register_shrinker() handler for TCP could, for example, look
for TCP flows which haven't made forward progress in more than a
certain amount of time and attempt to trim SKB memory from them.

A shrinker callback could also be used to adjust any global socket
memory limit scheme we might have.  Set the limit real high initially,
but then scale it back if we get a lot of shrinker calls.

UDP and other datagram sockets are troublesome because the memory
gets wholly tied up immediately during the send call and it's not
easy to liberate anything.  The nice part about datagram sockets,
however, is that they make forward progress quickly and their
memory is liberated as soon as the device transmits the packet.
They don't have to wait for ACKs, windows opening up, or anything
like that to happen.

To be honest I don't even think UDP is much of a real problem for this
reason.

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-29  4:47       ` David Miller
@ 2007-09-29  4:51         ` David Miller
  2007-09-29  5:20           ` Herbert Xu
  2007-09-29  5:55         ` Herbert Xu
  1 sibling, 1 reply; 11+ messages in thread
From: David Miller @ 2007-09-29  4:51 UTC (permalink / raw)
  To: herbert
  Cc: satoshi.oshima.fk, johnpol, netdev, haoki, yoshfuji,
	yumiko.sugita.yf

From: David Miller <davem@davemloft.net>
Date: Fri, 28 Sep 2007 21:47:37 -0700 (PDT)

> Per-user limits are not necessarily the answer.

Seeing myself say this reminds me that it seems to have been missed in
all of this that we do have a limit per user already.

There is a per-socket send buffer limit, and there is a per-user open
file descriptor limit.  Multiply the two to determine how much system
memory the user can consume using sockets.

This is why I view a lot of this work as quite pointless.  We already
have limits, but nobody wants to use them for performance reasons :-)

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-29  4:51         ` David Miller
@ 2007-09-29  5:20           ` Herbert Xu
  2007-10-01 13:56             ` Satoshi OSHIMA
  0 siblings, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2007-09-29  5:20 UTC (permalink / raw)
  To: David Miller
  Cc: satoshi.oshima.fk, johnpol, netdev, haoki, yoshfuji,
	yumiko.sugita.yf

On Fri, Sep 28, 2007 at 09:51:59PM -0700, David Miller wrote:
> 
> There is a per-socket send buffer limit, and there is a per-user open
> file descriptor limit.  Multiply the two to determine how much system
> memory the user can consume using sockets.

We do have these limits but they're per-process, not per-user.
Unless you lock down the number of processes each user can have
to no more than a handful then this is basically useless.

For example, let's say each socket can lock down 64K of kernel
memory (which is quite easy to do BTW, just open a TCP/UDP socket,
send data to it from another socket but keep the data in the
socket by not calling recvmsg), and that each process can have
1024 file descriptors (the default), then each process can pin

64K x 1024 = 64M

of memory.  So if the user can have 10 processes, then that's
640M of kernel memory that can be pinned down.  Usually the
process limit is at least 10 times higher.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-29  4:47       ` David Miller
  2007-09-29  4:51         ` David Miller
@ 2007-09-29  5:55         ` Herbert Xu
  2007-10-01 13:57           ` Satoshi OSHIMA
  1 sibling, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2007-09-29  5:55 UTC (permalink / raw)
  To: David Miller
  Cc: satoshi.oshima.fk, johnpol, netdev, haoki, yoshfuji,
	yumiko.sugita.yf

On Fri, Sep 28, 2007 at 09:47:37PM -0700, David Miller wrote:
>
> There are two things we (might) need to guard against, one local and
> one remote.

Right, I was focusing on the local threat.

> If you do a per-user limit, apache would basically just stop at that
> redzone point.  In some sense making the attack more effective because
> then it's trivial to shut down an entire web server this way.

Having a per-user limit doesn't necessarily mean that we have
to apply the limit differently to how we apply the system-wide
limits.  We could keep exactly the same code as we have now but
check against a per-user limit instead of a system-wide one.

In other words your apache scenario will continue to work as is
even with a per-user limit.

Now where it does become useful is when we have a rogue local
user.  As it is that user can chew up all of the budgeted TCP
memory by simply not calling recvmsg.  As I've stated in the
other email, the existing rlimits don't help because they're
per-process rather than per-user.

BTW, this is not fatal for TCP because TCP provides a minimum
amount of memory for each socket even when we are over the
limit.  However, if this was implemented for UDP without
a minimum guarantee then it'd be quite useless.

> I see no valid argument against doing something similar for sockets.
> Such a register_shrinker() handler for TCP could, for example, look
> for TCP flows which haven't made forward progress in more than a
> certain amount of time and attempt to trim SKB memory from them.

Yes, I agree this would be quite useful for sending.  However, it'll
be tough to shrink skbs that we've already acked but that the app
has, for some reason, decided to leave in the socket by not calling
recvmsg.

> UDP and other datagram sockets are troublesome because the memory
> gets wholly tied up immediately during the send call and it's not
> easy to liberate anything.  The nice part about datagram sockets,
> however, is that they make forward progress quickly and their
> memory is liberated as soon as the device transmits the packet.
> They don't have to wait for ACKs, windows opening up, or anything
> like that to happen.

Agreed.  Also the recvmsg case I've described above is much
simpler for UDP as we can just go through all the sockets and
free skbs at random :)

> To be honest I don't even think UDP is much of a real problem for this
> reason.

It's not a hard problem but we do need to have some code for it.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-29  5:20           ` Herbert Xu
@ 2007-10-01 13:56             ` Satoshi OSHIMA
  0 siblings, 0 replies; 11+ messages in thread
From: Satoshi OSHIMA @ 2007-10-01 13:56 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, johnpol, netdev, haoki, yoshfuji, yumiko.sugita.yf,
	Evgeniy Polyakov

Herbert Xu wrote:
> On Fri, Sep 28, 2007 at 09:51:59PM -0700, David Miller wrote:
>> There is a per-socket send buffer limit, and there is a per-user open
>> file descriptor limit.  Multiply the two to determine how much system
>> memory the user can consume using sockets.
>
> We do have these limits but they're per-process, not per-user.
> Unless you lock down the number of processes each user can have
> to no more than a handful then this is basically useless.
>
> For example, let's say each socket can lock down 64K of kernel
> memory (which is quite easy to do BTW, just open a TCP/UDP socket,
> send data to it from another socket but keep the data in the
> socket by not calling recvmsg), and that each process can have
> 1024 file descriptors (the default), then each process can pin
>
> 64K x 1024 = 64M
>
> of memory.  So if the user can have 10 processes, then that's
> 640M of kernel memory that can be pinned down.  Usually the
> process limit is at least 10 times higher.

Thank you very much for your comment.

What you pointed out is my motivation for this patch.
I think that per-process limits won't help solve this
problem.

* Re: [RFC/PATCH 0/3] UDP memory usage accounting
  2007-09-29  5:55         ` Herbert Xu
@ 2007-10-01 13:57           ` Satoshi OSHIMA
  0 siblings, 0 replies; 11+ messages in thread
From: Satoshi OSHIMA @ 2007-10-01 13:57 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David Miller, johnpol, netdev, haoki, yoshfuji, yumiko.sugita.yf

> On Fri, Sep 28, 2007 at 09:47:37PM -0700, David Miller wrote:
>> There are two things we (might) need to guard against, one local and
>> one remote.
>
> Right I was focusing on the local threat.
>
>> If you do a per-user limit, apache would basically just stop at that
>> redzone point.  In some sense making the attack more effective because
>> then it's trivial to shut down an entire web server this way.
>
> Having a per-user limit doesn't necessarily mean that we have
> to apply the limit differently to how we apply the system-wide
> limits.  We could keep exactly the same code as we have now but
> check against a per-user limit instead of a system-wide one.
>
> In other words your apache scenario will continue to work as is
> even with a per-user limit.

I'm afraid that a per-user limit won't work for a system administrator,
because he can't know in advance who the rogue user is (before
such an attack is made). And once the attack is made, the system will
not respond because of the lack of memory for the slab.

So if he only has a per-user limit, he needs to split the UDP memory
budget among all users. The per-user limit will be very small
if the number of users on the system is large.


> Now where it does become useful is when we have a rogue local
> user.  As it is that user can chew up all of the budgeted TCP
> memory by simply not calling recvmsg.  As I've stated in the
> other email, the existing rlimits don't help because they're
> per-process rather than per-user.
>
> BTW, this is not fatal for TCP because TCP provides a minimum
> amount of memory for each socket even when we are over the
> limit.  However, if we this was implemented for UDP without
> a minimum guarantee then it'd be quite useless.

Hmm, I didn't realize that. Thank you for your good
suggestion. I will think about it.


>> I see no valid argument against doing something similar for sockets.
>> Such a register_shrinker() handler for TCP could, for example, look
>> for TCP flows which haven't made forward progress in more than a
>> certain amount of time and attempt to trim SKB memory from them.
>
> Yes, I agree this would be quite useful for sending.  However, it'll
> be tough to shrink skbs that we've already acked but that the app
> has, for some reason, decided to leave in the socket by not calling
> recvmsg.
>
>> UDP and other datagram sockets are troublesome because the memory
>> gets wholly tied up immediately during the send call and it's not
>> easy to liberate anything.  The nice part about datagram sockets,
>> however, is that they make forward progress quickly and their
>> memory is liberated as soon as the device transmits the packet.
>> They don't have to wait for ACKs, windows opening up, or anything
>> like that to happen.
>
> Agreed.  Also the recvmsg case I've described above is much
> simpler for UDP as we can just go through all the sockets and
> free skbs at random :)
>
>> To be honest I don't even think UDP is much of a real problem for this
>> reason.
>
> It's not a hard problem but we do need to have some code for it.

I believe so. Currently, a nasty user can easily stop the system
without root privileges. This may not be a serious problem, but
it is a problem that should be fixed.


Thread overview: 11+ messages
2007-09-21 12:18 [RFC/PATCH 0/3] UDP memory usage accounting Satoshi OSHIMA
2007-09-21 12:58 ` Evgeniy Polyakov
2007-09-27 18:51   ` Hideo AOKI
2007-09-28 13:26   ` Satoshi OSHIMA
2007-09-29  3:21     ` Herbert Xu
2007-09-29  4:47       ` David Miller
2007-09-29  4:51         ` David Miller
2007-09-29  5:20           ` Herbert Xu
2007-10-01 13:56             ` Satoshi OSHIMA
2007-09-29  5:55         ` Herbert Xu
2007-10-01 13:57           ` Satoshi OSHIMA
