From: Ying Xue <ying.xue@windriver.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>,
David Miller <davem@davemloft.net>, <netdev@vger.kernel.org>,
Jon Maloy <jon.maloy@ericsson.com>,
Erik Hugne <erik.hugne@ericsson.com>
Subject: Re: [PATCH net-next 01/12] tipc: change socket buffer overflow control to respect sk_rcvbuf
Date: Tue, 4 Jun 2013 09:37:54 +0800 [thread overview]
Message-ID: <51AD44F2.5060108@windriver.com> (raw)
In-Reply-To: <20130603131624.GA7860@hmsreliant.think-freely.org>
On 06/03/2013 09:16 PM, Neil Horman wrote:
> On Mon, Jun 03, 2013 at 05:55:06PM +0800, Ying Xue wrote:
>> On 05/31/2013 09:36 PM, Neil Horman wrote:
>>> On Thu, May 30, 2013 at 03:36:06PM -0400, Paul Gortmaker wrote:
>>>> From: Jon Maloy <jon.maloy@ericsson.com>
>>>>
>>>> As per feedback from the netdev community, we change the buffer
>>>> overflow protection algorithm in receiving sockets so that it
>>>> always respects the nominal upper limit set in sk_rcvbuf.
>>>>
>>>> Instead of scaling up from a small sk_rcvbuf value, which leads to
>>>> violation of the configured sk_rcvbuf limit, we now calculate the
>>>> weighted per-message limit by scaling down from a much bigger value,
>>>> still in the same field, according to the importance priority of the
>>>> received message.
>>>>
>>>> Cc: Neil Horman <nhorman@tuxdriver.com>
>>>> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
>>>> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
>>>> ---
>>>> net/tipc/socket.c | 13 +++++++------
>>>> 1 file changed, 7 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
>>>> index 515ce38..2dfabc7 100644
>>>> --- a/net/tipc/socket.c
>>>> +++ b/net/tipc/socket.c
>>>> @@ -1,7 +1,7 @@
>>>> /*
>>>> * net/tipc/socket.c: TIPC socket API
>>>> *
>>>> - * Copyright (c) 2001-2007, 2012 Ericsson AB
>>>> + * Copyright (c) 2001-2007, 2012-2013, Ericsson AB
>>>> * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
>>>> * All rights reserved.
>>>> *
>>>> @@ -203,6 +203,7 @@ static int tipc_create(struct net *net, struct socket *sock, int protocol,
>>>>
>>>> sock_init_data(sock, sk);
>>>> sk->sk_backlog_rcv = backlog_rcv;
>>>> + sk->sk_rcvbuf = CONN_OVERLOAD_LIMIT;
>>> The last time Jon and I discussed this, I thought the consensus was to export
>>> sk_rcvbuf via its own sysctl, or tie it to sysctl_rmem (while requiring a
>>> protocol specific minimum on top of that), so administrators on memory
>>> constrained systems didn't wonder why their sysctl changes weren't being
>>> honored.
>>
>> Yes, your suggestion is reasonable, and I prefer to involve
>> net.tipc.sysctl_rmem. But I have one question about it:
>>
>> As you suggested as belows, the default value of sk->sk_rcvbuf is set to
>> sk->sk_rcvbuf >> 4 << msg_importance(TIPC_CRITICAL_IMPORTANCE), that is,
>> sk->sk_rcvbuf is about 32MB.
>>
>> However, please see below code:
>>
>> int sock_setsockopt()
>> {
>> ...
>> case SO_RCVBUF:
>> /* Don't error on this BSD doesn't and if you think
>> * about it this is right. Otherwise apps have to
>> * play 'guess the biggest size' games. RCVBUF/SNDBUF
>> * are treated in BSD as hints
>> */
>> val = min_t(u32, val, sysctl_rmem_max);
>> set_rcvbuf:
>> sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
>> /*
>> * We double it on the way in to account for
>> * "struct sk_buff" etc. overhead. Applications
>> * assume that the SO_RCVBUF setting they make will
>> * allow that much actual data to be received on that
>> * socket.
>> *
>> * Applications are unaware that "struct sk_buff" and
>> * other overheads allocate from the receive buffer
>> * during socket buffer allocation.
>> *
>> * And after considering the possible alternatives,
>> * returning the value we actually used in getsockopt
>> * is the most desirable behavior.
>> */
>> sk->sk_rcvbuf = max_t(u32, val * 2, SOCK_MIN_RCVBUF);
>> break;
>> ...
>> }
>>
>> From above logic of setting sk->sk_rcvbuf with SO_RCVBUF, it only
>> permits the maximum value of sk->sk_rcvbuf to sysctl_rmem_max * 2(ie,
>> about 400KB normally).
>>
>> So, even if the default value of sk->sk_rcvbuf is set to 32MB with
>> net.tipc.sysctl_rmem, a bit smaller value than the default value can
>> never be set to sk->sk_rcvbuf successfully with SO_RCVBUF option.
>>
>> How can we avoid the limit?
>>
> By administratively adjusting sysctl_rmem_max to be a sufficiently large value
> such that using SO_RCVBUF won't be clamed to a lower limit.
>
> If you don't want to force users to have to manually adjust the sysctl, there
> might be support for you to automatically update sysctl_rmem_max in your
> tipc_init routine, and print an informational message indicating that tipc
> requires the additional space (although I still maintain its not strictly
> needed, but thats another argument).
>
Thanks for your clear clarification.
I also have the same concern. If we override sysctl_rmem_max in
tipc_init() with a larger value, I am afraid that other guys will oppose
the behaviour.
The truth is that little TIPC user adjusts the sk->sk_rcvbuf with
SO_RCVBUF option in practice. If he really wants to do, he should follow
your suggestion he manually enlarges the sysctl.
OK, I will rewrite the patch with your suggestion.
Regards,
Ying
> Neil
>
>
>
next prev parent reply other threads:[~2013-06-04 1:38 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-05-30 19:36 [PATCH net-next 00/12] tipc: make use of kernel threads to simplify things Paul Gortmaker
2013-05-30 19:36 ` [PATCH net-next 01/12] tipc: change socket buffer overflow control to respect sk_rcvbuf Paul Gortmaker
2013-05-31 13:36 ` Neil Horman
2013-06-03 9:55 ` Ying Xue
2013-06-03 13:16 ` Neil Horman
2013-06-04 1:37 ` Ying Xue [this message]
2013-06-04 13:40 ` Neil Horman
2013-05-30 19:36 ` [PATCH net-next 02/12] tipc: Add "max_ports" configuration parameter Paul Gortmaker
2013-05-30 22:49 ` David Miller
2013-05-31 8:25 ` Erik Hugne
2013-05-31 8:29 ` David Miller
2013-05-31 8:34 ` Erik Hugne
2013-05-31 8:40 ` David Miller
2013-05-31 9:23 ` Erik Hugne
2013-05-31 9:25 ` David Laight
2013-05-31 9:26 ` David Miller
2013-05-31 9:06 ` David Laight
2013-05-31 17:48 ` Paul Gortmaker
2013-05-30 19:36 ` [PATCH net-next 03/12] tipc: allow implicit connect for stream sockets Paul Gortmaker
2013-05-30 19:36 ` [PATCH net-next 04/12] tipc: introduce new TIPC server infrastructure Paul Gortmaker
2013-05-30 19:36 ` [PATCH net-next 05/12] tipc: convert topology server to use new server facility Paul Gortmaker
2013-05-30 19:36 ` [PATCH net-next 06/12] tipc: convert configuration " Paul Gortmaker
2013-05-30 19:36 ` [PATCH net-next 07/12] tipc: delete code orphaned by new server infrastructure Paul Gortmaker
2013-05-30 19:36 ` [PATCH net-next 08/12] tipc: remove user_port instance from tipc_port structure Paul Gortmaker
2013-05-30 19:36 ` [PATCH net-next 09/12] tipc: rename tipc_createport_raw to tipc_createport Paul Gortmaker
2013-05-30 19:36 ` [PATCH net-next 10/12] tipc: convert config_lock from spinlock to mutex Paul Gortmaker
2013-05-30 19:36 ` [PATCH net-next 11/12] tipc: save sock structure pointer instead of void pointer to tipc_port Paul Gortmaker
2013-05-30 19:36 ` [PATCH net-next 12/12] tipc: cosmetic realignment of function arguments Paul Gortmaker
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51AD44F2.5060108@windriver.com \
--to=ying.xue@windriver.com \
--cc=davem@davemloft.net \
--cc=erik.hugne@ericsson.com \
--cc=jon.maloy@ericsson.com \
--cc=netdev@vger.kernel.org \
--cc=nhorman@tuxdriver.com \
--cc=paul.gortmaker@windriver.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.