From: Eric Dumazet <dada1@cosmosbay.com>
To: David Miller <davem@davemloft.net>
Cc: khc@pm.waw.pl, netdev@vger.kernel.org
Subject: Re: [PATCH] net: reduce number of reference taken on sk_refcnt
Date: Sun, 10 May 2009 09:09:18 +0200
Message-ID: <4A067D9E.7050706@cosmosbay.com>
In-Reply-To: <20090509.134002.258408495.davem@davemloft.net>
David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Sat, 09 May 2009 13:34:54 -0700 (PDT)
>
>> Consider the case where we always send some message on CPU A and
>> then process the ACK on CPU B. We'll always be cancelling the
>> timer on a foreign cpu.
>
> I should also mention that TCP has a peculiar optimization of timers
> that is likely being thwarted by your workload. It never deletes
> timers under normal operation, it simply lets them still expire
> and the handler notices that there is "nothing to do" and returns.
Yes, you are referring to the INET_CSK_CLEAR_TIMERS condition, which is never set.
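The "never delete, let it expire" scheme described in the quoted paragraph can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct lazy_timer` and the `lazy_*` helpers are hypothetical names invented here, not kernel types or functions, and the sketch does not model how the TCP stack actually decides that there is "nothing to do".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a socket timer; not a kernel type. */
struct lazy_timer {
    unsigned long expires;  /* tick at which the handler is due to run */
    bool work_pending;      /* the reason the timer was armed */
    int fired_idle;         /* handler runs that found nothing to do */
    int fired_work;         /* handler runs that did real work */
};

/* Arm (or re-arm) the timer and record that there is work to do. */
static void lazy_arm(struct lazy_timer *t, unsigned long expires)
{
    assert(t != NULL);
    t->expires = expires;
    t->work_pending = true;
}

/* "Cancel" lazily: the timer stays queued, only the reason for it is dropped. */
static void lazy_cancel(struct lazy_timer *t)
{
    assert(t != NULL);
    t->work_pending = false;
}

/* Expiry handler: returns early when the work was already cancelled. */
static void lazy_expire(struct lazy_timer *t)
{
    assert(t != NULL);
    if (!t->work_pending) {
        t->fired_idle++;
        return;
    }
    t->work_pending = false;
    t->fired_work++;
}
```

The trade-off in this sketch is that cancelling costs one store and no timer-wheel operation, at the price of an occasional handler run that does nothing.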
>
> But when the connection does shut down, we have to purge all of
> these timers.
>
> That could be another part of why you see timers in your profile.
>
>
Well, in my workload they should never expire, since the applications exchange
enough data in both directions and there are no losses (Gigabit LAN context).

On the machine acting as a server (the one I am focusing on, of course),
each incoming frame:
- Contains the ACK for the previously sent frame
- Contains data provided by the client
- Starts a timer for the delayed ACK

Then the server application reacts and sends a new payload, and the TCP stack
sends a frame that:
- Includes the ACK for the previously received frame
- Contains data provided by the server application
- Starts a timer to retransmit this frame if no ACK is received later

So yes, each incoming and each outgoing frame is going to call mod_timer().

The problem is that incoming processing is done by CPU 0 (the one dedicated
to NAPI processing because of the stress situation, cpu 100% in softirq land),
while outgoing processing is done by the other cpus in the machine.
offsetof(struct inet_connection_sock, icsk_retransmit_timer)=0x208
offsetof(struct inet_connection_sock, icsk_delack_timer)=0x238
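To see why these two offsets matter, one can compute which cache line each timer starts in. The sketch below assumes 64-byte cache lines and a socket allocated on a cache-line boundary; neither is stated in this message, and the helper names are made up for illustration. Under those assumptions 0x208 and 0x238 both fall in line 8 of the structure (bytes 0x200 to 0x23f), so the two timers start in the same line.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Index of the cache line that holds byte `offset`, counted from the
 * start of the object. Assumes the object itself is line-aligned.
 */
static size_t cache_line_index(size_t offset, size_t line_size)
{
    assert(line_size != 0);
    return offset / line_size;
}

/* Non-zero when both offsets land in the same cache line. */
static int same_cache_line(size_t a, size_t b, size_t line_size)
{
    return cache_line_index(a, line_size) == cache_line_index(b, line_size);
}
```

For example, `same_cache_line(0x208, 0x238, 64)` is true, while with a hypothetical 32-byte line size the two offsets would land in different lines (16 and 17).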
So there are cache line ping-pongs, but oprofile seems to point
to spinlock contention in lock_timer_base(), and I don't know why...
Shouldn't (in my workload) the delack timers all belong to cpu 0, and
the retransmit timers to the other cpus?

Or does mod_timer() never migrate an already established timer?
That would explain the lock contention on timer_base; we should
take care of it if possible.
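The migration question can be made concrete with a toy model. Everything here is hypothetical: `toy_base`, `toy_timer` and `toy_mod_timer()` are invented for illustration and say nothing about what the kernel's mod_timer() really does. The model only counts how often a CPU has to take another CPU's base lock when a timer is first armed on one CPU and then always re-armed from a different one, under a "never migrate" policy and under a "migrate to the caller" policy.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Hypothetical per-CPU timer base; only counts lock acquisitions. */
struct toy_base {
    unsigned long local_locks;   /* lock taken by the owning CPU */
    unsigned long foreign_locks; /* lock taken by some other CPU */
};

struct toy_timer {
    int base_cpu;                /* CPU whose base holds the timer, -1 if none */
    unsigned long expires;
};

static struct toy_base toy_bases[TOY_NR_CPUS];

static void toy_lock_base(int base_cpu, int caller_cpu)
{
    if (base_cpu == caller_cpu)
        toy_bases[base_cpu].local_locks++;
    else
        toy_bases[base_cpu].foreign_locks++;
}

/* Re-arm a timer from caller_cpu; `migrate` selects the policy under test. */
static void toy_mod_timer(struct toy_timer *t, int caller_cpu,
                          unsigned long expires, bool migrate)
{
    assert(caller_cpu >= 0 && caller_cpu < TOY_NR_CPUS);

    if (t->base_cpu < 0) {
        /* First arming: queue on the caller's own base. */
        t->base_cpu = caller_cpu;
        toy_lock_base(caller_cpu, caller_cpu);
    } else {
        /* The base currently holding the timer must be locked first. */
        toy_lock_base(t->base_cpu, caller_cpu);
        if (migrate && t->base_cpu != caller_cpu) {
            /* Move the timer, then lock the caller's base to requeue it. */
            t->base_cpu = caller_cpu;
            toy_lock_base(caller_cpu, caller_cpu);
        }
    }
    t->expires = expires;
}

/*
 * Arm once on first_cpu, then re-arm `rounds` times from later_cpu.
 * Returns the total number of foreign lock acquisitions on all bases.
 */
static unsigned long toy_run(int first_cpu, int later_cpu,
                             unsigned long rounds, bool migrate)
{
    struct toy_timer t = { .base_cpu = -1, .expires = 0 };
    unsigned long i, foreign = 0;
    int cpu;

    for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        toy_bases[cpu] = (struct toy_base){ 0, 0 };

    toy_mod_timer(&t, first_cpu, 1, migrate);
    for (i = 0; i < rounds; i++)
        toy_mod_timer(&t, later_cpu, 2 + i, migrate);

    for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        foreign += toy_bases[cpu].foreign_locks;
    return foreign;
}
```

In this model, a timer armed on cpu 1 and then re-armed 1000 times from cpu 0 costs 1000 foreign lock acquisitions without migration and a single one with migration, which is the kind of difference that would show up as contention in a profile.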
Thanks David