Re: getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behaviour

* Re: getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behaviour
       [not found] ` <5006DD6B.3030300@pu-pm.univ-fcomte.fr>
@ 2012-07-18 16:11   ` Eric Dumazet
  2012-07-18 17:32     ` Rick Jones
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Dumazet @ 2012-07-18 16:11 UTC (permalink / raw)
  To: Eugen Dedu; +Cc: linux-kernel, netdev

On Wed, 2012-07-18 at 17:59 +0200, Eugen Dedu wrote:
> Any idea?
> 
> On 17/07/12 11:27, Eugen Dedu wrote:
> > Hi all,
> >
> > I looked on the Internet and at the old thread
> > http://lkml.indiana.edu/hypermail/linux/kernel/0108.0/0275.html, but the
> > issue is still not settled as far as I can see.
> >
> > I need the largest possible send/receive buffers, and I need to
> > know/confirm how much memory was actually allocated for my process
> > (how much I can use).
> >
> > So with Linux we need to do something like:
> > setsockopt (..., SO_RCVBUF, 256000, ...)
> > getsockopt (..., SO_RCVBUF, &i, ...)
> > i /= 2;
> >
> > where i is the size I am looking for.
> >
> > Now, to make this code work for other OSes it should be changed to:
> > setsockopt (..., SO_RCVBUF, 256000, ...)
> > getsockopt (..., SO_RCVBUF, &i, ...)
> > #ifdef LINUX
> > i /= 2;
> > #endif
> >
> > First question: is this code correct? If not, what code gives the amount
> > of memory usable by my process?
> >
> > Second, it seems to me that Linux is definitely "non-standard" here.
> > Saying that Linux uses twice as much memory is beside the point,
> > since getsockopt should return what the application can count on, not
> > what is used internally. It is as if a hypothetical malloc (10)
> > returned not 10, but 20 (including meta-information). Is that right?
> >
> > Cheers,

That's the way it has been done on Linux since day 0.

You can probably find a lot of pages on the web explaining the
rationale.

If your application handles UDP frames, what should SO_RCVBUF count?

If it's the amount of payload bytes, you could have a pathological
situation where an attacker sends 1-byte UDP frames fast enough to
consume a lot of kernel memory.

Each frame consumes a fair amount of kernel memory (between 512 bytes
and 8 Kbytes depending on the driver).

So Linux says: if the user expects to receive XXXX bytes, set a limit on
the _kernel_ memory used to store these bytes, and assume an estimated
100% overhead. That is: allow 2*XXXX bytes to be allocated for socket
receive buffers.
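
The arithmetic behind this can be sketched as follows. This is a simplified
model, not the kernel's actual accounting: it assumes every queued frame is
charged a fixed per-frame cost against a budget of twice the requested size.
The function name frames_that_fit is hypothetical, and the per-frame costs
are just the 512-byte and 8-Kbyte figures quoted above.

```c
/*
 * Simplified model (an assumption, not kernel code): the socket may hold
 * at most 2 * requested bytes of kernel memory, and each queued frame is
 * charged a fixed per_frame_cost regardless of its payload size.
 * Returns how many frames fit before the limit is reached.
 */
long frames_that_fit(long requested, long per_frame_cost)
{
    if (requested <= 0 || per_frame_cost <= 0)
        return 0;
    return (2 * requested) / per_frame_cost;
}
```

Under this model, a 256000-byte request gives frames_that_fit(256000, 512)
= 1000 frames and frames_that_fit(256000, 8192) = 62 frames. With 1-byte
payloads that is only about 62 to 1000 bytes of application data queued,
while kernel memory use stays bounded by the doubled limit.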


* Re: getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behaviour
  2012-07-18 16:11   ` getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behaviour Eric Dumazet
@ 2012-07-18 17:32     ` Rick Jones
  2012-07-19 16:14       ` Eugen Dedu
  0 siblings, 1 reply; 3+ messages in thread
From: Rick Jones @ 2012-07-18 17:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eugen Dedu, linux-kernel@vger.kernel.org, netdev

On 07/18/2012 09:11 AM, Eric Dumazet wrote:
>
> That's the way it has been done on Linux since day 0.
>
> You can probably find a lot of pages on the web explaining the
> rationale.
>
> If your application handles UDP frames, what should SO_RCVBUF count?
>
> If it's the amount of payload bytes, you could have a pathological
> situation where an attacker sends 1-byte UDP frames fast enough to
> consume a lot of kernel memory.
>
> Each frame consumes a fair amount of kernel memory (between 512 bytes
> and 8 Kbytes depending on the driver).
>
> So Linux says: if the user expects to receive XXXX bytes, set a limit on
> the _kernel_ memory used to store these bytes, and assume an estimated
> 100% overhead. That is: allow 2*XXXX bytes to be allocated for socket
> receive buffers.

Expanding on/rewording that: in a setsockopt() call, SO_RCVBUF specifies
the data bytes, and that value gets doubled to become the kernel/overhead
byte limit, unless the doubling would be greater than net.core.rmem_max,
in which case the limit becomes net.core.rmem_max.

But on getsockopt() SO_RCVBUF is always the kernel/overhead byte limit.

In one call it is fish.  In the other it is fowl.
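
A minimal sketch of the two calls, assuming an already-created socket fd.
The helper names set_and_get_rcvbuf and rcvbuf_data_estimate are
hypothetical. The halving on Linux follows the 100% overhead estimate
described above and is only an estimate of usable data bytes; how the
request is clamped against net.core.rmem_max is left to the kernel and is
not modelled here.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Ask for `requested` bytes of receive buffer on `fd`, then read back
 * what the stack reports. Returns the reported value, or -1 on error.
 */
int set_and_get_rcvbuf(int fd, int requested)
{
    int reported = 0;
    socklen_t len = sizeof(reported);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &reported, &len) < 0)
        return -1;
    return reported;
}

/*
 * Rough estimate of the data bytes behind a reported SO_RCVBUF value.
 * Assumption: on Linux the reported value includes the overhead
 * allowance, so it is halved; on other stacks it is taken as-is.
 */
int rcvbuf_data_estimate(int reported)
{
#ifdef __linux__
    return reported / 2;
#else
    return reported;
#endif
}
```

The compiler-defined __linux__ macro is used here rather than a
project-defined LINUX symbol, so the check works without extra build flags.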

Other stacks appear to keep their kernel/overhead limit quiet, keeping
SO_RCVBUF an expression of a data limit in both setsockopt() and
getsockopt(). With those stacks there is, I suppose, a possible source
of confusion when/if someone tests the queuing to a socket, sends "high
overhead" packets, and doesn't get to SO_RCVBUF worth of data, though I
don't recall encountering that in my "pre-Linux" time.

The sometimes fish, sometimes fowl version (along with the auto-tuning
when one doesn't make setsockopt() calls) gave me fits in netperf for
years until I finally relented and split the socket buffer size
variables into three: what netperf's user requested via the command
line, what it was right after the socket was created, and what it was at
the end of the data phase of the test.

rick jones
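
The three-way split described above might look like the sketch below. This
is not netperf's code: the struct rcvbuf_sizes and the helpers
query_rcvbuf, rcvbuf_begin and rcvbuf_end are hypothetical names, and the
"initial" value is assumed to be read after any requested setsockopt() has
been applied.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Three separate views of one socket buffer size. */
struct rcvbuf_sizes {
    int requested; /* what the user asked for; <= 0 means "no request" */
    int initial;   /* reported by getsockopt() right after setup */
    int final;     /* reported by getsockopt() after the data phase */
};

/* Read the currently reported SO_RCVBUF value, or -1 on error. */
int query_rcvbuf(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &value, &len) < 0)
        return -1;
    return value;
}

/* Apply the request (if any) and record the initial reported size. */
int rcvbuf_begin(int fd, int requested, struct rcvbuf_sizes *s)
{
    s->requested = requested;
    s->initial = -1;
    s->final = -1;

    if (requested > 0 &&
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) < 0)
        return -1;

    s->initial = query_rcvbuf(fd);
    return s->initial < 0 ? -1 : 0;
}

/* Record the reported size once the data phase is over. */
int rcvbuf_end(int fd, struct rcvbuf_sizes *s)
{
    s->final = query_rcvbuf(fd);
    return s->final < 0 ? -1 : 0;
}
```

Keeping the three values apart avoids comparing a requested data size with
a reported kernel limit, and lets any change made by auto-tuning show up as
a difference between initial and final.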


* Re: getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behaviour
  2012-07-18 17:32     ` Rick Jones
@ 2012-07-19 16:14       ` Eugen Dedu
  0 siblings, 0 replies; 3+ messages in thread
From: Eugen Dedu @ 2012-07-19 16:14 UTC (permalink / raw)
  To: Rick Jones; +Cc: Eric Dumazet, linux-kernel@vger.kernel.org, netdev

On 18/07/12 19:32, Rick Jones wrote:
> On 07/18/2012 09:11 AM, Eric Dumazet wrote:
>>
>> That's the way it has been done on Linux since day 0.
>>
>> You can probably find a lot of pages on the web explaining the
>> rationale.
>>
>> If your application handles UDP frames, what should SO_RCVBUF count?
>>
>> If it's the amount of payload bytes, you could have a pathological
>> situation where an attacker sends 1-byte UDP frames fast enough to
>> consume a lot of kernel memory.
>>
>> Each frame consumes a fair amount of kernel memory (between 512 bytes
>> and 8 Kbytes depending on the driver).
>>
>> So Linux says: if the user expects to receive XXXX bytes, set a limit on
>> the _kernel_ memory used to store these bytes, and assume an estimated
>> 100% overhead. That is: allow 2*XXXX bytes to be allocated for socket
>> receive buffers.
>
> Expanding on/rewording that: in a setsockopt() call, SO_RCVBUF specifies
> the data bytes, and that value gets doubled to become the kernel/overhead
> byte limit, unless the doubling would be greater than net.core.rmem_max,
> in which case the limit becomes net.core.rmem_max.
>
> But on getsockopt() SO_RCVBUF is always the kernel/overhead byte limit.
>
> In one call it is fish. In the other it is fowl.
>
> Other stacks appear to keep their kernel/overhead limit quiet, keeping
> SO_RCVBUF an expression of a data limit in both setsockopt() and
> getsockopt(). With those stacks there is, I suppose, a possible source
> of confusion when/if someone tests the queuing to a socket, sends "high
> overhead" packets, and doesn't get to SO_RCVBUF worth of data, though I
> don't recall encountering that in my "pre-Linux" time.

Thank you both for the answers. As I understand it, it is sometimes
impossible (or impractical) to fulfill the user's requirement on buffer
size: if only 1-byte UDP packets arrive and are not consumed by the
application, the memory needed by Linux is, say, 1000 times greater, which
of course is not available. Other OSes have the same problem (see above,
"doesn't get to SO_RCVBUF worth of data"), except that they return the
same value in getsockopt as was passed to setsockopt. However, note that
with Linux the confusion is still possible, even if it appears more rarely.

> The sometimes fish, sometimes fowl version (along with the auto-tuning
> when one doesn't make setsockopt() calls) gave me fits in netperf for
> years until I finally relented and split the socket buffer size
> variables into three: what netperf's user requested via the command
> line, what it was right after the socket was created, and what it was at
> the end of the data phase of the test.

-- 
Eugen

