From: Jon Maloy <maloy@donjonn.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Paul Gortmaker <paul.gortmaker@windriver.com>,
"tipc-discussion@lists.sourceforge.net"
<tipc-discussion@lists.sourceforge.net>,
"davem@davemloft.net" <davem@davemloft.net>
Subject: Re: [PATCH net-next v2 2/8] tipc: compensate for double accounting in socket rcv buffer
Date: Thu, 15 May 2014 09:25:24 -0400 [thread overview]
Message-ID: <5374C044.6040802@donjonn.com> (raw)
In-Reply-To: <1400096531.7973.123.camel@edumazet-glaptop2.roam.corp.google.com>
On 05/14/2014 03:42 PM, Eric Dumazet wrote:
> On Wed, 2014-05-14 at 15:00 -0400, Jon Maloy wrote:
>
>> This is where I don't get it.
>>
>> sk_add_backlog(limit) is (via sk_rcvqueues_full) testing for
>>
>> (sk_backlog.len + sk_rmem_alloc) > limit
>>
>> But, if the receiving user is slow, sk_rmem_alloc will run full eventually, even if we
>> reduce sk_backlog.len with truesize of each transferred buffer, and sk_add_backlog
>> should then start throwing away packets. Why doesn't this happen?
> It definitely can happen if sender tries to send small packets, that
> have a high truesize/len ratio.
>
> $ nstat -a | egrep "TcpExtTCPBacklogDrop|IpInReceives|TcpExtTCPRcvCoalesce"
> IpInReceives 8357544624 0.0
> TcpExtTCPBacklogDrop 13 0.0
> TcpExtTCPRcvCoalesce 437826621 0.0
>
> You claim that "that sk_backlog.len does not
> give correct information about the buffer situation.", but really it
> does.
Backlog Q Recv Q Total: Reported
------------- --------- ------ -----------
bef. release_sock: 2 kB 0 kB 2 kB 2 kB
during rel_lock: 0 kB 2 kB 2 kB 4 kB
after rel_lock: 0 kB 2 kB 2 kB 2 kB
But I guess you understand this, so you must mean something else
when you say it gives a "correct information".
To me it looks wrong.
My perception of those queues is that they really are part of the
same logical queue, whose combined size must not exceed the
configured upper socket limit, otherwise we violate the rules we
are supposed to obey to regarding memory consumption per socket.
>
> Your problem seems that you do not use the appropriate 'limit',
> or assume very tight scheduling constraints (An incoming packet has to
> be immediately consumed by receiver, otherwise following packet might be
> dropped)
Relating to the above, the 'limit' we use in TIPC is the same both for
sk_add_backlog() and in tipc_filter_rcv(), so we can have a somewhat
deterministic behavior. But, given that the 'reported' value (the
one used by sk_add_backlog() ) is not correct at all moments, we
end up with seeing messages unnecessarily rejected. (Note that I
consistently say 'messages' here. In TIPC the units arriving in a socket
are complete, reassembled messages, not packets.)
>
> If rcvbuf_limit(sk, buf) is the limit for normal packets (sk_rmem_alloc)
> in receive queue, then you need something bigger to allow bursts.
>
>
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 3f9912f87d0d..fe4f37d8029a 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1457,7 +1457,7 @@ u32 tipc_sk_rcv(struct sock *sk, struct sk_buff *buf)
> if (!sock_owned_by_user(sk)) {
> res = filter_rcv(sk, buf);
> } else {
> - if (sk_add_backlog(sk, buf, rcvbuf_limit(sk, buf)))
> + if (sk_add_backlog(sk, buf, 2 * rcvbuf_limit(sk, buf)))
This would work, but is in reality just a cruder version of what I have done.
It would temporarily violate the configured memory constraint, while my
method doesn't.
I should also say that my first idea when I realized this problem was just
to do like this. But given that our (fix) socket buffer limit is already at a crazy
level, I ddn't find it a good idea to double it once more.
Now, since we are planning to post a patch series with byte-based flow control
control in a few months, I would like to avoid ending up in the same discussions
again, and maybe have to redo things. If you are are interested, I could give you
a more thorough background to the TIPC two-level flow control, and involve you
in our discussions. But I am not sure this mailing list is the right forum for that.
Should I cc you when we start discussing this internally? Only if you have time
and are interested, of course.
Regards
///jon
> res = TIPC_ERR_OVERLOAD;
> else
> res = TIPC_OK;
>
>
>
------------------------------------------------------------------------------
"Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
Instantly run your Selenium tests across 300+ browser/OS combos.
Get unparalleled scalability from the best Selenium testing platform available
Simple to use. Nothing to install. Get started now for free."
http://p.sf.net/sfu/SauceLabs
next prev parent reply other threads:[~2014-05-15 13:25 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-14 9:39 [PATCH net-next v2 0/8] tipc: bug fixes and improvements Jon Maloy
2014-05-14 9:39 ` [PATCH net-next v2 1/8] tipc: decrease connection flow control window Jon Maloy
2014-05-14 11:50 ` Eric Dumazet
2014-05-14 12:12 ` Jon Maloy
2014-05-14 9:39 ` [PATCH net-next v2 2/8] tipc: compensate for double accounting in socket rcv buffer Jon Maloy
2014-05-14 11:47 ` Eric Dumazet
2014-05-14 12:53 ` Jon Maloy
2014-05-14 17:37 ` David Miller
2014-05-14 17:52 ` Jon Maloy
2014-05-14 17:45 ` Eric Dumazet
2014-05-14 19:00 ` Jon Maloy
2014-05-14 19:42 ` Eric Dumazet
2014-05-15 13:25 ` Jon Maloy [this message]
2014-05-14 9:39 ` [PATCH net-next v2 3/8] tipc: don't record link RESET or ACTIVATE messages as traffic Jon Maloy
2014-05-14 9:39 ` [PATCH net-next v2 4/8] tipc: mark head of reassembly buffer as non-linear Jon Maloy
2014-05-14 9:39 ` [PATCH net-next v2 5/8] tipc: rename and move message reassembly function Jon Maloy
2014-05-14 9:39 ` [PATCH net-next v2 6/8] tipc: improve and extend media address conversion functions Jon Maloy
2014-05-14 9:39 ` [PATCH net-next v2 7/8] tipc: clean up neigbor discovery message reception Jon Maloy
2014-05-14 9:39 ` [PATCH net-next v2 8/8] tipc: merge port message reception into socket reception function Jon Maloy
2014-05-14 19:20 ` [PATCH net-next v2 0/8] tipc: bug fixes and improvements David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5374C044.6040802@donjonn.com \
--to=maloy@donjonn.com \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=jon.maloy@ericsson.com \
--cc=netdev@vger.kernel.org \
--cc=paul.gortmaker@windriver.com \
--cc=tipc-discussion@lists.sourceforge.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).