netdev.vger.kernel.org archive mirror
From: Eric Dumazet <dada1@cosmosbay.com>
To: David Miller <davem@davemloft.net>
Cc: khc@pm.waw.pl, netdev@vger.kernel.org, satoru.satoh@gmail.com
Subject: Re: [PATCH] net: reduce number of reference taken on sk_refcnt
Date: Thu, 21 May 2009 11:07:19 +0200
Message-ID: <4A1519C7.3090901@cosmosbay.com>
In-Reply-To: <20090518.215823.98238538.davem@davemloft.net>

David Miller wrote:
> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Sun, 10 May 2009 12:45:56 +0200
> 
>> Patch follows for RFC only (not Signed-of...), and based on net-next-2.6 
> 
> Thanks for the analysis.
> 
>> @@ -922,10 +922,13 @@ static inline int tcp_prequeue(struct sock *sk, struct sk_buff *skb)
>>  	} else if (skb_queue_len(&tp->ucopy.prequeue) == 1) {
>>  		wake_up_interruptible_poll(sk->sk_sleep,
>>  					   POLLIN | POLLRDNORM | POLLRDBAND);
>> -		if (!inet_csk_ack_scheduled(sk))
>> +		if (!inet_csk_ack_scheduled(sk)) {
>> +			unsigned int delay = (3 * tcp_rto_min(sk)) / 4;
>> +
>> +			delay = min(inet_csk(sk)->icsk_ack.ato, delay);
>>  			inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
>> -						  (3 * tcp_rto_min(sk)) / 4,
>> -						  TCP_RTO_MAX);
>> +						  delay, TCP_RTO_MAX);
>> +		}
>>  	}
>>  	return 1;
> 
> I think this code is trying to aggressively stretch the ACK when
> prequeueing.  In order to make sure there is enough time to get
> the process on the CPU and send a response, and thus piggyback
> the ACK.
> 
> If that turns out not to really matter, or matter less than your
> problem, then we can make your change and I'm all for it.

This change gave me about a 15% increase in bandwidth in a multiflow
TCP benchmark. But this optimization worked because tasks could be
woken up and send their answer within the same jiffy, thus 'rearming'
the xmit timer with the same value...

(135,000 messages received/sent per second in my benchmark, with 60 flows)
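
As a rough illustration of the delay computation in the patch quoted above, here is a minimal userspace sketch. The SIM_* constants and sim_prequeue_delack_delay() are hypothetical names; the values assume HZ=1000 and the usual defaults of HZ/5 for the minimum RTO and HZ/25 for the minimum ATO. In the kernel both inputs are per-socket and can differ from these.

```c
/*
 * Assumed values for HZ=1000, hard-coded for illustration only.
 * They mirror the usual TCP_RTO_MIN (HZ/5) and TCP_ATO_MIN (HZ/25)
 * defaults; real sockets may use other values.
 */
#define SIM_HZ       1000u
#define SIM_RTO_MIN  (SIM_HZ / 5)   /* 200 jiffies */
#define SIM_ATO_MIN  (SIM_HZ / 25)  /* 40 jiffies */

/*
 * Hypothetical helper: the delayed-ACK timeout the patched tcp_prequeue()
 * would pass to inet_csk_reset_xmit_timer(), i.e. the smaller of the
 * current ATO and 3/4 of the minimum RTO.
 */
static unsigned int sim_prequeue_delack_delay(unsigned int rto_min,
                                              unsigned int ato)
{
        unsigned int delay = (3 * rto_min) / 4;

        return ato < delay ? ato : delay;
}
```

Under these assumptions, sim_prequeue_delack_delay(SIM_RTO_MIN, SIM_ATO_MIN) yields 40 jiffies instead of the 150 jiffies the unpatched code would use, so the delayed-ACK timer is armed with the value the regular delayed-ACK path would also pick.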

mod_timer() has a special heuristic to avoid calling __mod_timer():

int mod_timer(struct timer_list *timer, unsigned long expires)
{
        /*
         * This is a common optimization triggered by the
         * networking code - if the timer is re-modified
         * to be the same thing then just return:
         */
        if (timer->expires == expires && timer_pending(timer))
                return 1;

        return __mod_timer(timer, expires, false);
}

With HZ=1000 and real applications (which need more than 1 msec to process a request),
I suppose this fast path is unlikely to trigger,
so we might extend the mod_timer() heuristic to avoid changing timer->expires
when the new value is almost the same as the previous one, not only when it is exactly the same:

int mod_timer_unexact(struct timer_list *timer, unsigned long expires, long maxerror)
{
        /*
         * This is a common optimization triggered by the
         * networking code - if the timer is re-modified
         * to be about the same thing then just return:
         */
        if (timer_pending(timer)) {
                long delta = expires - timer->expires;

                if (delta <= maxerror && delta >= -maxerror)
                        return 1;
        }
        return __mod_timer(timer, expires, false);
}
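
To see how much the tolerance could save, here is a minimal userspace sketch that counts how often the slow path runs when a timer is re-armed once per jiffy. It is not kernel code: struct sim_timer, sim_requeue() and the other sim_* names are hypothetical stand-ins for struct timer_list and __mod_timer(), and the model ignores timer expiry.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct timer_list. */
struct sim_timer {
        unsigned long expires;  /* absolute expiry, in jiffies */
        bool pending;           /* true while the timer is queued */
        unsigned int requeues;  /* how many times the slow path ran */
};

/* Stand-in for __mod_timer(): the expensive path that requeues the timer. */
static int sim_requeue(struct sim_timer *t, unsigned long expires)
{
        int was_pending = t->pending;

        t->expires = expires;
        t->pending = true;
        t->requeues++;
        return was_pending;
}

/* Exact-match fast path, as in the mod_timer() quoted above. */
static int sim_mod_timer(struct sim_timer *t, unsigned long expires)
{
        if (t->pending && t->expires == expires)
                return 1;
        return sim_requeue(t, expires);
}

/* Tolerant fast path, following the mod_timer_unexact() idea. */
static int sim_mod_timer_unexact(struct sim_timer *t, unsigned long expires,
                                 long maxerror)
{
        if (t->pending) {
                /* Signed view of the unsigned difference survives wraparound. */
                long delta = (long)(expires - t->expires);

                if (delta <= maxerror && delta >= -maxerror)
                        return 1;
        }
        return sim_requeue(t, expires);
}

/*
 * Re-arm the timer once per jiffy for `rearms` consecutive jiffies and
 * return how many times the slow path ran. Expiry is not modelled, so
 * `delay` should be larger than `rearms`.
 */
static unsigned int sim_count_requeues(bool unexact, unsigned int rearms,
                                       unsigned long delay, long maxerror)
{
        struct sim_timer t = { 0, false, 0 };
        unsigned long jiffies;

        for (jiffies = 0; jiffies < rearms; jiffies++) {
                if (unexact)
                        sim_mod_timer_unexact(&t, jiffies + delay, maxerror);
                else
                        sim_mod_timer(&t, jiffies + delay);
        }
        return t.requeues;
}
```

With a 40-jiffy delay re-armed over 10 consecutive jiffies, the exact check takes the slow path every time, while a maxerror of 4 takes it only twice. The price is that the timer may fire up to maxerror jiffies earlier or later than the last requested expiry.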



But to be effective, prequeue needs a blocked task
for each flow; modern daemons prefer to use poll/epoll, so
prequeue is not used with them.

Another possibility would be to use a separate timer exclusively
for prequeue instead of sharing xmit_timer.




