All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: netdev@vger.kernel.org, roopa@cumulusnetworks.com, davem@davemloft.net
Subject: Re: [PATCH RFC net-next 0/4] bridge: improve cache utilization
Date: Tue, 31 Jan 2017 08:39:19 -0800	[thread overview]
Message-ID: <20170131083919.51a3ac9f@xeon-e3> (raw)
In-Reply-To: <1485876718-18091-1-git-send-email-nikolay@cumulusnetworks.com>

On Tue, 31 Jan 2017 16:31:54 +0100
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> Hi all,
> This is the first set which begins to deal with the bad bridge cache
> access patterns. The first patch rearranges the bridge and port structs
> a little so the frequently (and closely) accessed members are in the same
> cache line. The second patch then moves the garbage collection to a
> workqueue trying to improve system responsiveness under load (many fdbs)
> and more importantly removes the need to check if the matched entry is
> expired in __br_fdb_get which was a source of false-sharing.
> The third patch is a preparation for the final one which adds a new option
> that allows to turn off "used" updates on transmit to a fdb which is the
> other source of false-sharing in the bridge. If properly configured, i.e.
> ports bound to CPUs (thus updating "updated" locally) and "used_enabled"
> set to 0 then the bridge's HitM goes from 100% to 0%, but even with
> "used_enabled" = 1 we still get a win because lookups which iterated over
> the hash chain caused false-sharing due to the first cache line being
> used for both mac/vid and used/updated fields.
> I'm not a fan of adding new options to the bridge, so I'm open to
> suggestions for this one. Things I've tried before that:
> - dynamically allocate a pool of percpu memory for used/updated fields
>   (works but it's more complicated as we need dynamic resizing, too)
> - dynamically allocate percpu memory for each fdb separately (I'm not a fan
>   of this one, but it's much simpler to implement)
> 
> Of these two obviously the first approach worked best, but the complexity
> it brings is considerable, while with this patchset we can achieve the same
> result with proper configuration. Any feedback on this one would be greatly
> appreciated.
> 
> Some results from tests I've run:
> (note that these were run in good conditions for the baseline, everything
>  ran on a single NUMA node and there were only 3 fdbs)
> 
> 1. baseline
> 100% Load HitM on the fdbs (between everyone who has done lookups and hit
>                             one of the 3 hash chains of the communicating
>                             src/dst fdbs)
> Overall 5.06% Load HitM for the bridge, first place in the list
> 
> bridge udp rr between 3 cores/ports/fdbs test:
>    0.35%  ksoftirqd/0  [k] br_fdb_update
>    0.30%  ksoftirqd/0  [k] __br_fdb_get
>    0.17%  ksoftirqd/0  [k] br_handle_frame_finish
> 
> 2. used = 1
> 0% Local Load HitM
> bridge udp rr between 3 cores/ports/fdbs test:
>    0.24%  ksoftirqd/0    [k] br_fdb_update
>    0.23%  ksoftirqd/0    [k] __br_fdb_get
>    0.12%  ksoftirqd/0    [k] br_handle_frame_finish
> 
> 3. used = 0
> 0% Local Load HitM
> bridge udp rr between 3 cores/ports/fdbs test:
>    0.23%  ksoftirqd/0    [k] __br_fdb_get
>    0.22%  ksoftirqd/0    [k] br_fdb_update
>    0.10%  ksoftirqd/0    [k] br_handle_frame_finish
> 
> 
> Thanks,
>  Nik
> 
> Nikolay Aleksandrov (4):
>   bridge: modify bridge and port to have often accessed fields in one
>     cache line
>   bridge: move to workqueue gc
>   bridge: move write-heavy fdb members in their own cache line
>   bridge: add ability to turn off fdb used updates
> 
>  include/uapi/linux/if_link.h |  1 +
>  net/bridge/br_device.c       |  2 ++
>  net/bridge/br_fdb.c          | 31 ++++++++++++++---------
>  net/bridge/br_if.c           |  2 +-
>  net/bridge/br_input.c        |  3 ++-
>  net/bridge/br_ioctl.c        |  2 +-
>  net/bridge/br_netlink.c      | 12 +++++++--
>  net/bridge/br_private.h      | 58 ++++++++++++++++++++++----------------------
>  net/bridge/br_stp.c          |  2 +-
>  net/bridge/br_stp_if.c       |  4 +--
>  net/bridge/br_stp_timer.c    |  2 --
>  net/bridge/br_sysfs_br.c     | 25 ++++++++++++++++++-
>  12 files changed, 92 insertions(+), 52 deletions(-)
> 

I agree with the first 3 patches, but not the last one.
Changing the API just for a performance hack is not necessary. Instead make
the algorithm smarter and use per-cpu values.

  parent reply	other threads:[~2017-01-31 16:39 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-31 15:31 [PATCH RFC net-next 0/4] bridge: improve cache utilization Nikolay Aleksandrov
2017-01-31 15:31 ` [PATCH RFC net-next 1/4] bridge: modify bridge and port to have often accessed fields in one cache line Nikolay Aleksandrov
2017-01-31 15:31 ` [PATCH RFC net-next 2/4] bridge: move to workqueue gc Nikolay Aleksandrov
2017-01-31 15:31 ` [PATCH RFC net-next 3/4] bridge: move write-heavy fdb members in their own cache line Nikolay Aleksandrov
2017-01-31 16:37   ` Stephen Hemminger
2017-01-31 16:39     ` Nikolay Aleksandrov
2017-01-31 15:31 ` [PATCH RFC net-next 4/4] bridge: add ability to turn off fdb used updates Nikolay Aleksandrov
2017-02-03  2:47   ` David Miller
2017-02-03  8:30     ` Nikolay Aleksandrov
2017-02-03 18:28       ` Stephen Hemminger
2017-02-03 18:34         ` Nikolay Aleksandrov
2017-02-03 22:24           ` Stephen Hemminger
2017-02-03 22:27             ` Nikolay Aleksandrov
2017-02-04 16:45     ` Nikolay Aleksandrov
2017-01-31 16:39 ` Stephen Hemminger [this message]
2017-01-31 16:41   ` [PATCH RFC net-next 0/4] bridge: improve cache utilization Nikolay Aleksandrov
2017-01-31 18:09     ` Nikolay Aleksandrov
2017-01-31 18:21       ` Stephen Hemminger
2017-01-31 18:45         ` Nikolay Aleksandrov
2017-01-31 18:51           ` Nikolay Aleksandrov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170131083919.51a3ac9f@xeon-e3 \
    --to=stephen@networkplumber.org \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=nikolay@cumulusnetworks.com \
    --cc=roopa@cumulusnetworks.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.