From: Eric Dumazet <eric.dumazet@gmail.com>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: Richard Cochran <richardcochran@gmail.com>,
David Miller <davem@davemloft.net>,
netdev@vger.kernel.org
Subject: Re: [RFC] net: remove erroneous sk null assignment in timestamping
Date: Sat, 08 Oct 2011 10:57:18 +0200 [thread overview]
Message-ID: <1318064238.5276.2.camel@edumazet-laptop> (raw)
In-Reply-To: <1318061808.3991.12.camel@jlt3.sipsolutions.net>
Le samedi 08 octobre 2011 à 10:16 +0200, Johannes Berg a écrit :
> I'm not terribly familiar with struct sock. Looking at it, I'm a bit
> confused by skb_orphan() -- it doesn't put the sock reference. So are
> sockets not refcounted for skbs in this way? They seem to use
> sock_wfree() which does a bit more than this it seems, and I don't see
> it using sk_refcnt anywhere so I'm a bit confused now.
Check following commit changelog to get some information on this.
commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu Jun 11 02:55:43 2009 -0700
net: No more expensive sock_hold()/sock_put() on each tx
One of the problem with sock memory accounting is it uses
a pair of sock_hold()/sock_put() for each transmitted packet.
This slows down bidirectional flows because the receive path
also needs to take a refcount on socket and might use a different
cpu than transmit path or transmit completion path. So these
two atomic operations also trigger cache line bounces.
We can see this in tx or tx/rx workloads (media gateways for example),
where sock_wfree() can be in top five functions in profiles.
We use this sock_hold()/sock_put() so that sock freeing
is delayed until all tx packets are completed.
As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
by one unit at init time, until sk_free() is called.
Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
to decrement initial offset and atomicaly check if any packets
are in flight.
skb_set_owner_w() doesnt call sock_hold() anymore
sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc
reached 0 to perform the final freeing.
Drawback is that a skb->truesize error could lead to unfreeable sockets, or
even worse, prematurely calling __sk_free() on a live socket.
Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
contention point. 5 % speedup on a UDP transmit workload (depends
on number of flows), lowering TX completion cpu usage.
next prev parent reply other threads:[~2011-10-08 8:57 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-10-07 17:11 [RFC] net: remove erroneous sk null assignment in timestamping Johannes Berg
2011-10-07 17:33 ` David Miller
2011-10-07 17:40 ` Johannes Berg
2011-10-07 17:47 ` Johannes Berg
2011-10-07 17:53 ` Johannes Berg
2011-10-07 18:42 ` Johannes Berg
2011-10-08 7:59 ` Richard Cochran
2011-10-08 7:57 ` Richard Cochran
2011-10-08 8:16 ` Johannes Berg
2011-10-08 8:57 ` Eric Dumazet [this message]
2011-10-08 10:32 ` Johannes Berg
2011-10-11 13:34 ` Richard Cochran
2011-10-08 10:35 ` Richard Cochran
2011-10-12 18:36 ` [PATCH 1/1] net: hold sock reference while processing tx timestamps Richard Cochran
2011-10-12 19:25 ` Eric Dumazet
2011-10-12 19:27 ` Johannes Berg
2011-10-12 19:52 ` Eric Dumazet
2011-10-13 8:54 ` Johannes Berg
2011-10-13 4:51 ` Richard Cochran
2011-10-13 9:46 ` [PATCH 0/3] net: time stamping fixes Richard Cochran
2011-10-13 9:46 ` [PATCH 1/3] net: hold sock reference while processing tx timestamps Richard Cochran
2011-10-19 4:42 ` Eric Dumazet
2011-10-13 9:46 ` [PATCH 2/3] dp83640: use proper function to free transmit time stamping packets Richard Cochran
2011-10-19 4:47 ` Eric Dumazet
2011-10-13 9:46 ` [PATCH 3/3] dp83640: free packet queues on remove Richard Cochran
2011-10-19 4:48 ` Eric Dumazet
2011-10-19 4:16 ` [PATCH 0/3] net: time stamping fixes David Miller
2011-10-19 5:15 ` Johannes Berg
2011-10-19 11:50 ` Richard Cochran
2011-10-19 12:33 ` Eric Dumazet
2011-10-19 12:38 ` Eric Dumazet
2011-10-19 12:58 ` Johannes Berg
2011-10-19 13:09 ` Johannes Berg
2011-10-19 13:25 ` Eric Dumazet
2011-10-19 13:35 ` Johannes Berg
2011-10-19 13:44 ` Eric Dumazet
2011-10-19 13:57 ` Johannes Berg
2011-10-19 14:08 ` Eric Dumazet
2011-10-19 14:24 ` Johannes Berg
2011-10-19 14:27 ` Richard Cochran
2011-10-19 14:33 ` Eric Dumazet
2011-10-19 13:21 ` Eric Dumazet
2011-10-19 13:25 ` Johannes Berg
2011-10-19 13:27 ` Eric Dumazet
2011-10-19 13:32 ` Johannes Berg
2011-10-19 14:25 ` Richard Cochran
2011-10-21 10:49 ` [PATCH v2 " Richard Cochran
2011-10-21 10:49 ` [PATCH v2 1/3] net: hold sock reference while processing tx timestamps Richard Cochran
2011-10-21 11:31 ` Eric Dumazet
2011-10-24 6:55 ` David Miller
2011-10-21 11:44 ` Johannes Berg
2011-10-21 10:49 ` [PATCH v2 2/3] dp83640: use proper function to free transmit time stamping packets Richard Cochran
2011-10-24 6:55 ` David Miller
2011-10-24 17:47 ` Richard Cochran
2011-10-24 23:16 ` David Miller
2011-10-21 10:49 ` [PATCH v2 3/3] dp83640: free packet queues on remove Richard Cochran
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1318064238.5276.2.camel@edumazet-laptop \
--to=eric.dumazet@gmail.com \
--cc=davem@davemloft.net \
--cc=johannes@sipsolutions.net \
--cc=netdev@vger.kernel.org \
--cc=richardcochran@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox