From: Benjamin Poirier <bpoirier@suse.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Manish Chopra <manishc@marvell.com>,
	GR-Linux-NIC-Dev <GR-Linux-NIC-Dev@marvell.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [EXT] [PATCH net-next 03/16] qlge: Deduplicate lbq_buf_size
Date: Fri, 28 Jun 2019 17:57:13 +0900
Message-ID: <20190628085713.GB14978@f1>
In-Reply-To: <CA+FuTSfKw6aaXk0hA0p_AUp9Oa_D+5Bwst8HUz7mJM-wO5Obow@mail.gmail.com>

On 2019/06/26 11:42, Willem de Bruijn wrote:
> On Wed, Jun 26, 2019 at 7:37 AM Benjamin Poirier <bpoirier@suse.com> wrote:
> >
> > On 2019/06/26 09:24, Manish Chopra wrote:
> > > > -----Original Message-----
> > > > From: Benjamin Poirier <bpoirier@suse.com>
> > > > Sent: Monday, June 17, 2019 1:19 PM
> > > > To: Manish Chopra <manishc@marvell.com>; GR-Linux-NIC-Dev <GR-Linux-
> > > > NIC-Dev@marvell.com>; netdev@vger.kernel.org
> > > > Subject: [EXT] [PATCH net-next 03/16] qlge: Deduplicate lbq_buf_size
> > > >
> > > > External Email
> > > >
> > > > ----------------------------------------------------------------------
> > > > lbq_buf_size is duplicated to every rx_ring structure whereas lbq_buf_order is
> > > > present once in the ql_adapter structure. All rings use the same buf size, keep
> > > > only one copy of it. Also factor out the calculation of lbq_buf_size instead of
> > > > having two copies.
> > > >
> > > > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> > > > ---
> > [...]
> > >
> > > Not sure if this change is really required, I think fields relevant to rx_ring should be present in the rx_ring structure.
> > > There are various other fields like "lbq_len" and "lbq_size" which would be same for all rx rings but still under the relevant rx_ring structure.
> 
> The one argument against deduplicating might be if the original fields
> are in a hot cacheline and the new location adds a cacheline access to
> a hot path. Not sure if that is relevant here. But maybe something to
> double check.
> 

Thanks for the hint. I didn't check before because my hunch was that
this driver is not near that level of optimization, but I checked now
and got the following results.

The other side of the link is doing pktgen of 257B frames (the
smallest size that exercises the lbq path). Over 10 runs of 100s, the
number of frames received by the stack before/after applying this
patch:
before: 1487000±2000pps
after: 1538000±2000pps
(3.4% improvement)
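
For reference, the sender setup is along these lines (a rough sketch
of the in-kernel pktgen interface; the device name, destination
address and MAC below are placeholders, not the actual test
configuration):

modprobe pktgen
echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
# 257 is the smallest frame size that hits the lbq path
echo "pkt_size 257" > /proc/net/pktgen/eth0
echo "count 0" > /proc/net/pktgen/eth0    # 0 = transmit until stopped
echo "dst 192.0.2.2" > /proc/net/pktgen/eth0
echo "dst_mac 00:11:22:33:44:55" > /proc/net/pktgen/eth0
echo "start" > /proc/net/pktgen/pgctrl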

Looking with perf stat and boiling down the numbers to per-packet
values, the most interesting differences are:
cycles -4.32%
instructions -0.97%
L1-dcache-loads +6.30%
L1-dcache-load-misses -1.26%
LLC-loads +2.93%
LLC-load-misses -1.16%

So, fewer cycles/packet, fewer cache misses. Note, however, that
L1-dcache-loads had high variability between runs (~12%).
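
As a rough cross-check, summing the ten rx counts below gives
~1.4867e9 packets before and ~1.5386e9 after, so for example:

# back-of-the-envelope cycles/packet from the run totals; the small
# gap vs the -4.32% above is expected given counter multiplexing
echo "362221639977/1486726776" | bc -l   # ~243.6 cycles/packet before
echo "358454546598/1538603416" | bc -l   # ~233.0 cycles/packet after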

Looking with pahole, struct rx_ring is reduced by 8 bytes, but it's a
784-byte structure full of holes to begin with.
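
(The layout check can be reproduced with something like the following;
pahole also annotates cacheline boundaries, which speaks to the
hot-cacheline question above. The object path assumes an in-tree build
with debug info:)

pahole -C rx_ring drivers/net/ethernet/qlogic/qlge/qlge.o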

Numbers are from an old Xeon E3-1275.

Full output follows:
root@dtest:~# cat /tmp/mini.sh
#!/bin/bash

# Sample the NIC fifo drop counter and the interface RX packet count,
# wait 100s, then print the deltas.
da=$(ethtool -S ql1 | awk '/rx_nic_fifo_drop/ {print $2}')
ra=$(ip -s link show dev ql1 | awk 'NR == 4 {print $2}')
sleep 100
db=$(ethtool -S ql1 | awk '/rx_nic_fifo_drop/ {print $2}')
rb=$(ip -s link show dev ql1 | awk 'NR == 4 {print $2}')
echo "rx $((rb - ra))"
echo "dropped $((db - da))"

root@dtest:~# perf stat -a -r10 -d /tmp/mini.sh
rx 148763519
dropped 296128797
rx 148784865
dropped 296107878
rx 148675349
dropped 296217234
rx 148634271
dropped 296260358
rx 148813581
dropped 296089785
rx 148755713
dropped 296136071
rx 148434116
dropped 296459743
rx 148390638
dropped 296501405
rx 148500812
dropped 296391286
rx 148973912
dropped 295920014

 Performance counter stats for 'system wide' (10 runs):

        800,245.24 msec cpu-clock                 #    8.000 CPUs utilized            ( +-  0.00% )
            11,559      context-switches          #   14.445 M/sec                    ( +-  0.13% )
                36      cpu-migrations            #    0.044 M/sec                    ( +-  7.11% )
            62,100      page-faults               #   77.601 M/sec                    ( +-  0.51% )
   362,221,639,977      cycles                    # 452638.486 GHz                    ( +-  0.77% )  (49.92%)
   515,318,132,327      instructions              #    1.42  insn per cycle           ( +-  0.64% )  (62.42%)
    91,744,969,260      branches                  # 114646115.533 M/sec               ( +-  0.04% )  (62.50%)
       549,681,601      branch-misses             #    0.60% of all branches          ( +-  0.25% )  (62.50%)
   115,207,692,963      L1-dcache-loads           # 143965544.751 M/sec               ( +- 12.09% )  (40.59%)
     2,629,809,584      L1-dcache-load-misses     #    2.28% of all L1-dcache hits    ( +-  3.24% )  (29.90%)
       837,806,984      LLC-loads                 # 1046938.235 M/sec                 ( +-  0.73% )  (25.31%)
       569,343,079      LLC-load-misses           #   67.96% of all LL-cache hits     ( +-  0.07% )  (37.50%)

         100.03177 +- 0.00104 seconds time elapsed  ( +-  0.00% )

#
# changed the driver now to include this patch
#

root@dtest:~# perf stat -a -r10 -d /tmp/mini.sh
rx 154094001
dropped 290799944
rx 153798301
dropped 291094799
rx 153942166
dropped 290950792
rx 154150506
dropped 290743194
rx 154179791
dropped 290712400
rx 153907213
dropped 290985227
rx 153813515
dropped 291078475
rx 153603554
dropped 291289250
rx 153521414
dropped 291372145
rx 153592955
dropped 291299734

 Performance counter stats for 'system wide' (10 runs):

        800,234.85 msec cpu-clock                 #    8.000 CPUs utilized            ( +-  0.00% )
            12,061      context-switches          #   15.072 M/sec                    ( +-  0.61% )
                32      cpu-migrations            #    0.040 M/sec                    ( +-  8.40% )
            62,210      page-faults               #   77.740 M/sec                    ( +-  0.60% )
   358,454,546,598      cycles                    # 447936.882 GHz                    ( +-  1.34% )  (49.84%)
   527,804,259,267      instructions              #    1.47  insn per cycle           ( +-  1.11% )  (62.35%)
    94,904,030,904      branches                  # 118595275.389 M/sec               ( +-  0.05% )  (62.50%)
       568,214,180      branch-misses             #    0.60% of all branches          ( +-  0.12% )  (62.50%)
   126,667,377,528      L1-dcache-loads           # 158287823.791 M/sec               ( +- 12.48% )  (41.21%)
     2,685,865,791      L1-dcache-load-misses     #    2.12% of all L1-dcache hits    ( +-  4.01% )  (29.42%)
       891,939,954      LLC-loads                 # 1114598.225 M/sec                 ( +-  0.73% )  (25.16%)
       582,037,612      LLC-load-misses           #   65.26% of all LL-cache hits     ( +-  0.09% )  (37.50%)

       100.0307117 +- 0.0000810 seconds time elapsed  ( +-  0.00% )
