Re: [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Ying Xue <ying.xue@windriver.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>,
	Paul Gortmaker <paul.gortmaker@windriver.com>,
	David Miller <davem@davemloft.net>, <netdev@vger.kernel.org>,
	Ying Xue <ying.xue@windriver.com>
Subject: Re: [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets
Date: Mon, 10 Dec 2012 14:27:50 +0800	[thread overview]
Message-ID: <50C580E6.7030905@windriver.com> (raw)
In-Reply-To: <20121209165020.GA4362@neilslaptop.think-freely.org>

Neil Horman wrote:
> On Fri, Dec 07, 2012 at 05:30:11PM -0500, Jon Maloy wrote:
>   
>> On 12/07/2012 02:20 PM, Neil Horman wrote:
>>     
>>> On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
>>>       
>>>> From: Ying Xue <ying.xue@windriver.com>
>>>>
>>>> The sk_receive_queue limit control is currently performed for
>>>> all arriving messages, disregarding socket and message type.
>>>> But for connected sockets this check is redundant, since the protocol
>>>> flow control already makes queue overflow impossible.
>>>>
>>>>         
>>> Can you explain where that occurs?  
>>>       
>> It happens in the functions port_dispatcher_sigh() and tipc_send(), 
>> among other places. Both are to be found in the file port.c, which 
>> was supposed to contain the 'generic' (i.e., API independent) part 
>> of the send/receive code.
>> Now that we have only one API left, the socket API, we are 
>> planning to merge the code in socket.c and port.c, and get rid of 
>> some code overhead.
>>
>> The flow control in TIPC is message based, where the sender
>> requires to receive an explicit acknowledge message for each 
>> 512 message the receiver reads to user space.
>> If the sender has more than 1024 messages outstanding without having
>> received an acknowledge he will be suspended or receive EAGAIN until 
>> he does.
>> The plan going forward is to replace this mechanism with a more 
>> standard looking byte based flow control, while maintaining 
>> backwards compatibility.
>>
>>     
> Ok, That makes more sense, thank you.  Although I still don't think this is
> safe (but the problem may not be solely introduced by this patch).  Using a
> global limit that assumes the sender will block when the congestion window is
> reached just doesn't seem sane to me.  It clearly works with the Linux
> implementation, as it conforms to your expectations, but an alternate
> implementation could create a DOS situation by simply ignoring the window limit,
> and continuing to send.  I see that we drop frames over the global limit in
> filter_rcv, but the check in rx_queue_full bumps up that limit based on the
> value of msg_importance(msg), but that threshold is ignored if the value of
> msg_importance is invalid.  All a sender needs to do is flood a receiver with
> frames containing an invalid set of message importance bits, and you will queue
> frames indefinately.  In fact that will also happen if you send message of
> CRITICAL importance as well, so you don't even need to supply an invalid value
> here.
>
>   

You are absolutely right. I will correct these drawbacks in next version.

>>> I see where the tipc dispatch function calls
>>> sk_add_backlog, which checks the per socket recieve queue (regardless of weather
>>> the receiving socket is connection oriented or connectionless), but if the
>>> receiver doesn't call receive very often, This just adds a check against your
>>> global limit, doing nothing for your per-socket limits. 
>>>       
>> OVERLOAD_LIMIT_BASE is tested against a per-socket message counter, so it _is_
>> our per-socket limit. In fact, TIPC connectionless overflow control currently 
>> is a kind of a hybrid, based on a message counter when the socket is not locked, 
>> and based on sk_rcv_queue's byte limit when a message has to be added to the 
>> backlog.
>> We are planning to fix this inconsistency too.
>>     
> Good, thank you,  that was seeming quite wrong to me.
>
>   
>>  In fact it seems to
>>     
>>> repeat the same check twice, as in the worst case of the incomming message being
>>> TIPC_LOW_IMPORTANCE, its just going to check that the global limit is exactly
>>> OVERLOAD_LIMIT_BASE/2 again.
>>>       
>> Yes, you are right. The intention is that only the first test, 
>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2)){..}
>> will be run for the vast majority of messages, since we must assume
>> that there is no overload most of the time.
>> An inelegant optimization, perhaps, but not logically wrong.
>>     
> No, not logically wrong, but not an optimization either.  With this change,
> your only use of rx_queue_full passes OVERLOAD_LIMIT_BASE/2 as the base value to
> rx_queue_full, and then you do some multiplication based on that.  If you really
> want to optimize this, leave OVERLOAD_LIMIT_BASE where it is (rather than
> doubling it like this patch series does), mark rx_queue_full as inline, and just
> pass OVERLOAD_LIMIT_BASE as the argument, it will save you a division opration,
> the conditional branch and a call instruction.  If you add a multiplication
> factor table, you can eliminate the if/else clauses in rx_queue_full as well.
>
>   

Good suggestion with a factor table. Maybe it's unnecessary to 
explicitly mark rx_queue_full as inline. Currently it sounds like we let 
complier decide whether a function is defined as inline or not.

Regards,
Ying

> Neil
>
>   
>> ///jon
>>
>>     
>>> Neil
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>       
>>     
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

next prev parent reply	other threads:[~2012-12-10  6:27 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-07 14:28 [PATCH net-next 00/10] tipc: more updates for the v3.8 content Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 01/10] tipc: remove obsolete flush of stale reassembly buffer Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit Paul Gortmaker
2012-12-07 16:07   ` Neil Horman
2012-12-07 17:36     ` David Miller
2012-12-07 19:54       ` Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets Paul Gortmaker
2012-12-07 19:20   ` Neil Horman
2012-12-07 22:30     ` Jon Maloy
2012-12-09 16:50       ` Neil Horman
2012-12-10  6:27         ` Ying Xue [this message]
2012-12-10  8:46           ` Jon Maloy
2012-12-10 14:22           ` Neil Horman
2012-12-07 14:28 ` [PATCH net-next 04/10] tipc: change sk_receive_queue upper limit Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 05/10] tipc: standardize across connect/disconnect function naming Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 06/10] tipc: consolidate connection-oriented message reception in one function Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 07/10] tipc: introduce non-blocking socket connect Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 08/10] tipc: eliminate connection setup for implied connect in recv_msg() Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 09/10] tipc: add lock nesting notation to quiet lockdep warning Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 10/10] tipc: refactor accept() code for improved readability Paul Gortmaker
2012-12-07 19:42   ` Neil Horman
2012-12-07 19:56     ` Paul Gortmaker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50C580E6.7030905@windriver.com \
    --to=ying.xue@windriver.com \
    --cc=davem@davemloft.net \
    --cc=jon.maloy@ericsson.com \
    --cc=netdev@vger.kernel.org \
    --cc=nhorman@tuxdriver.com \
    --cc=paul.gortmaker@windriver.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).