From: Jason Baron <jbaron@akamai.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [RFC PATCH net-next] tcp: reduce cpu usage under tcp memory pressure when SO_SNDBUF is set
Date: Mon, 10 Aug 2015 13:29:15 -0400
Message-ID: <55C8DF6B.9080509@akamai.com>
In-Reply-To: <1439218022.1084.3.camel@edumazet-glaptop2.roam.corp.google.com>

On 08/10/2015 10:47 AM, Eric Dumazet wrote:
> On Fri, 2015-08-07 at 18:31 +0000, Jason Baron wrote:
>> From: Jason Baron <jbaron@akamai.com>
>>
>> When SO_SNDBUF is set and we are under tcp memory pressure, the effective write
>> buffer space can be much lower than what was set using SO_SNDBUF. For example,
>> we may have set the buffer to 100kb, but we may only be able to write 10kb. In
>> this scenario poll()/select()/epoll() are going to continuously return POLLOUT,
>> followed by -EAGAIN from write() in a very tight loop.
>>
>> Introduce sk->sk_effective_sndbuf, such that we can track the 'effective' size
>> of the sndbuf, when we have a short write due to memory pressure. By using the
>> sk->sk_effective_sndbuf instead of the sk->sk_sndbuf when we are under memory
>> pressure, we can delay the POLLOUT until 1/3 of the buffer clears as we normally
>> do. There is no issue here when SO_SNDBUF is not set, since the tcp layer will
>> auto-tune sk->sk_sndbuf.
>>
>> In my testing, this brought a single thread's cpu usage down from 100% to 1%
>> while maintaining the same level of throughput when under memory pressure.
>>
> 
> I am not sure we need to grow socket for something that looks like a
> flag ?
>


I added a new field because I needed to store the new 'effective'
sndbuf somewhere and then restore the original value that was set via
SO_SNDBUF. So it's really because of SO_SNDBUF. We could perhaps use the
fact that we are in memory pressure to signal wakeups differently, but
I'm not sure exactly how.
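
To make the numbers in the quoted description concrete, here is a minimal
userspace sketch of the writeability check. It is a model, not kernel code:
struct model_sock and the model_* helpers are hypothetical names, and the
"queued / 2" threshold is assumed to mirror sk_stream_min_wspace(), which is
what yields the "1/3 of the buffer clears" behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the few struct sock fields involved. */
struct model_sock {
	int sndbuf;           /* value requested via SO_SNDBUF */
	int effective_sndbuf; /* 0 when unset, else the size usable under pressure */
	int wmem_queued;      /* bytes currently queued for sending */
};

/* Free write space, preferring the effective limit when one is recorded. */
static int model_wspace(const struct model_sock *sk)
{
	int effective = sk->effective_sndbuf;
	int limit = effective ? effective : sk->sndbuf;

	return limit - sk->wmem_queued;
}

/* Assumed wakeup threshold: half of what is queued. */
static int model_min_wspace(const struct model_sock *sk)
{
	return sk->wmem_queued >> 1;
}

/* Writeable once free space >= queued/2, i.e. about 1/3 of the limit is free. */
static bool model_is_writeable(const struct model_sock *sk)
{
	return model_wspace(sk) >= model_min_wspace(sk);
}

/* The 100kb requested / 10kb usable case from the description. */
static void model_demo(void)
{
	struct model_sock sk = {
		.sndbuf = 100 * 1024,
		.effective_sndbuf = 0,
		.wmem_queued = 10 * 1024,
	};

	/* Judged against sk_sndbuf alone, the socket looks writeable. */
	assert(model_is_writeable(&sk));

	/* With the effective size recorded, POLLOUT is held back... */
	sk.effective_sndbuf = 10 * 1024;
	assert(!model_is_writeable(&sk));

	/* ...until enough of the effective buffer has drained. */
	sk.wmem_queued = 6 * 1024;
	assert(model_is_writeable(&sk));
}
```

Calling model_demo() from a test program exercises all three states; the
first assertion corresponds to the tight POLLOUT/-EAGAIN loop.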


> Also you add a race in sk_stream_wspace() as sk_effective_sndbuf value
> can change under us.
> 
> +       if (sk->sk_effective_sndbuf)
> +               return sk->sk_effective_sndbuf - sk->sk_wmem_queued;
> +
> 

Thanks. Is this better?

--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -798,8 +798,10 @@ static inline int sk_stream_min_wspace(const struct sock *sk)

 static inline int sk_stream_wspace(const struct sock *sk)
 {
-       if (sk->sk_effective_sndbuf)
-               return sk->sk_effective_sndbuf - sk->sk_wmem_queued;
+       int effective_sndbuf = sk->sk_effective_sndbuf;
+
+       if (effective_sndbuf)
+               return effective_sndbuf - sk->sk_wmem_queued;

        return sk->sk_sndbuf - sk->sk_wmem_queued;
 }
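
The point of the local copy is that the field is tested and used from a
single read, so a concurrent reset to 0 between the test and the subtraction
cannot produce "0 - sk_wmem_queued". A plain C load does not strictly forbid
the compiler from re-reading the field, though; in kernel code READ_ONCE() is
the usual way to pin it to one load. Below is a userspace sketch of the same
idea using C11 atomics. It is an illustration under those assumptions, not
part of the patch, and struct snap_sock and the snap_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in; only effective_sndbuf is written concurrently. */
struct snap_sock {
	atomic_int effective_sndbuf; /* 0 when unset */
	int sndbuf;
	int wmem_queued;
};

/* Load the field exactly once, then test and use the same snapshot. */
static int snap_wspace(struct snap_sock *sk)
{
	int effective = atomic_load_explicit(&sk->effective_sndbuf,
					     memory_order_relaxed);

	if (effective)
		return effective - sk->wmem_queued;

	return sk->sndbuf - sk->wmem_queued;
}

/* Single-threaded check of both branches. */
static void snap_demo(void)
{
	struct snap_sock sk = { .sndbuf = 100 * 1024, .wmem_queued = 10 * 1024 };

	atomic_init(&sk.effective_sndbuf, 0);
	assert(snap_wspace(&sk) == 90 * 1024);

	atomic_store(&sk.effective_sndbuf, 10 * 1024);
	assert(snap_wspace(&sk) == 0);
}
```

The relaxed ordering is enough here because only the single value matters,
not its ordering against other memory accesses.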


Thanks,

-Jason
