netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: netdev@vger.kernel.org, roopa@cumulusnetworks.com, davem@davemloft.net
Subject: Re: [PATCH RFC net-next 0/4] bridge: improve cache utilization
Date: Tue, 31 Jan 2017 08:39:19 -0800	[thread overview]
Message-ID: <20170131083919.51a3ac9f@xeon-e3> (raw)
In-Reply-To: <1485876718-18091-1-git-send-email-nikolay@cumulusnetworks.com>

On Tue, 31 Jan 2017 16:31:54 +0100
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> Hi all,
> This is the first set which begins to deal with the bad bridge cache
> access patterns. The first patch rearranges the bridge and port structs
> a little so the frequently (and closely) accessed members are in the same
> cache line. The second patch then moves the garbage collection to a
> workqueue trying to improve system responsiveness under load (many fdbs)
> and more importantly removes the need to check if the matched entry is
> expired in __br_fdb_get which was a source of false-sharing.
> The third patch is a preparation for the final one which adds a new option
> that allows to turn off "used" updates on transmit to a fdb which is the
> other source of false-sharing in the bridge. If properly configured, i.e.
> ports bound to CPUs (thus updating "updated" locally) and "used_enabled"
> set to 0 then the bridge's HitM goes from 100% to 0%, but even with
> "used_enabled" = 1 we still get a win because lookups which iterated over
> the hash chain caused false-sharing due to the first cache line being
> used for both mac/vid and used/updated fields.
> I'm not a fan of adding new options to the bridge, so I'm open to
> suggestions for this one. Things I've tried before that:
> - dynamically allocate a pool of percpu memory for used/updated fields
>   (works but it's more complicated as we need dynamic resizing, too)
> - dynamically allocate percpu memory for each fdb separately (I'm not a fan
>   of this one, but it's much simpler to implement)
> 
> Of these two obviously the first approach worked best, but the complexity
> it brings is considerable, while with this patchset we can achieve the same
> result with proper configuration. Any feedback on this one would be greatly
> appreciated.
> 
> Some results from tests I've run:
> (note that these were run in good conditions for the baseline, everything
>  ran on a single NUMA node and there were only 3 fdbs)
> 
> 1. baseline
> 100% Load HitM on the fdbs (between everyone who has done lookups and hit
>                             one of the 3 hash chains of the communicating
>                             src/dst fdbs)
> Overall 5.06% Load HitM for the bridge, first place in the list
> 
> bridge udp rr between 3 cores/ports/fdbs test:
>    0.35%  ksoftirqd/0  [k] br_fdb_update
>    0.30%  ksoftirqd/0  [k] __br_fdb_get
>    0.17%  ksoftirqd/0  [k] br_handle_frame_finish
> 
> 2. used = 1
> 0% Local Load HitM
> bridge udp rr between 3 cores/ports/fdbs test:
>    0.24%  ksoftirqd/0    [k] br_fdb_update
>    0.23%  ksoftirqd/0    [k] __br_fdb_get
>    0.12%  ksoftirqd/0    [k] br_handle_frame_finish
> 
> 3. used = 0
> 0% Local Load HitM
> bridge udp rr between 3 cores/ports/fdbs test:
>    0.23%  ksoftirqd/0    [k] __br_fdb_get
>    0.22%  ksoftirqd/0    [k] br_fdb_update
>    0.10%  ksoftirqd/0    [k] br_handle_frame_finish
> 
> 
> Thanks,
>  Nik
> 
> Nikolay Aleksandrov (4):
>   bridge: modify bridge and port to have often accessed fields in one
>     cache line
>   bridge: move to workqueue gc
>   bridge: move write-heavy fdb members in their own cache line
>   bridge: add ability to turn off fdb used updates
> 
>  include/uapi/linux/if_link.h |  1 +
>  net/bridge/br_device.c       |  2 ++
>  net/bridge/br_fdb.c          | 31 ++++++++++++++---------
>  net/bridge/br_if.c           |  2 +-
>  net/bridge/br_input.c        |  3 ++-
>  net/bridge/br_ioctl.c        |  2 +-
>  net/bridge/br_netlink.c      | 12 +++++++--
>  net/bridge/br_private.h      | 58 ++++++++++++++++++++++----------------------
>  net/bridge/br_stp.c          |  2 +-
>  net/bridge/br_stp_if.c       |  4 +--
>  net/bridge/br_stp_timer.c    |  2 --
>  net/bridge/br_sysfs_br.c     | 25 ++++++++++++++++++-
>  12 files changed, 92 insertions(+), 52 deletions(-)
> 

I agree with the first 3 patches, but not the last one.
Changing the API just for a performance hack is not necessary. Instead make
the algorithm smarter and use per-cpu values.

  parent reply	other threads:[~2017-01-31 16:39 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-31 15:31 [PATCH RFC net-next 0/4] bridge: improve cache utilization Nikolay Aleksandrov
2017-01-31 15:31 ` [PATCH RFC net-next 1/4] bridge: modify bridge and port to have often accessed fields in one cache line Nikolay Aleksandrov
2017-01-31 15:31 ` [PATCH RFC net-next 2/4] bridge: move to workqueue gc Nikolay Aleksandrov
2017-01-31 15:31 ` [PATCH RFC net-next 3/4] bridge: move write-heavy fdb members in their own cache line Nikolay Aleksandrov
2017-01-31 16:37   ` Stephen Hemminger
2017-01-31 16:39     ` Nikolay Aleksandrov
2017-01-31 15:31 ` [PATCH RFC net-next 4/4] bridge: add ability to turn off fdb used updates Nikolay Aleksandrov
2017-02-03  2:47   ` David Miller
2017-02-03  8:30     ` Nikolay Aleksandrov
2017-02-03 18:28       ` Stephen Hemminger
2017-02-03 18:34         ` Nikolay Aleksandrov
2017-02-03 22:24           ` Stephen Hemminger
2017-02-03 22:27             ` Nikolay Aleksandrov
2017-02-04 16:45     ` Nikolay Aleksandrov
2017-01-31 16:39 ` Stephen Hemminger [this message]
2017-01-31 16:41   ` [PATCH RFC net-next 0/4] bridge: improve cache utilization Nikolay Aleksandrov
2017-01-31 18:09     ` Nikolay Aleksandrov
2017-01-31 18:21       ` Stephen Hemminger
2017-01-31 18:45         ` Nikolay Aleksandrov
2017-01-31 18:51           ` Nikolay Aleksandrov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170131083919.51a3ac9f@xeon-e3 \
    --to=stephen@networkplumber.org \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@cumulusnetworks.com \
    --cc=roopa@cumulusnetworks.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).