From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Eric Dumazet <edumazet@google.com>
Cc: "Russell King (Oracle)" <linux@armlinux.org.uk>,
Wei Fang <wei.fang@nxp.com>, Shenwei Wang <shenwei.wang@nxp.com>,
Clark Wang <xiaoning.wang@nxp.com>,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
linux-imx@nxp.com, netdev@vger.kernel.org,
Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
Alexandre Belloni <alexandre.belloni@bootlin.com>,
Maxime Chevallier <maxime.chevallier@bootlin.com>,
Andrew Lunn <andrew@lunn.ch>,
Stephen Hemminger <stephen@networkplumber.org>,
Alexander Stein <alexander.stein@ew.tq-group.com>
Subject: Re: Ethernet issue on imx6
Date: Tue, 17 Oct 2023 13:19:01 +0200
Message-ID: <20231017131901.5ae65e4d@xps-13>
In-Reply-To: <CANn89iLxKQOY5ZA5o3d1y=v4MEAsAQnzmVDjmLY0_bJPG93tKQ@mail.gmail.com>
Hi Eric,
edumazet@google.com wrote on Mon, 16 Oct 2023 21:37:58 +0200:
> On Mon, Oct 16, 2023 at 5:37 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hello again,
> >
> > > > > # iperf3 -c 192.168.1.1
> > > > > Connecting to host 192.168.1.1, port 5201
> > > > > [ 5] local 192.168.1.2 port 37948 connected to 192.168.1.1 port 5201
> > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > [ 5] 0.00-1.00 sec 11.3 MBytes 94.5 Mbits/sec 43 32.5 KBytes
> > > > > [ 5] 1.00-2.00 sec 3.29 MBytes 27.6 Mbits/sec 26 1.41 KBytes
> > > > > [ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
> > > > > [ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > > > > [ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 5 1.41 KBytes
> > > > > [ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
> > > > > [ 5] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
> > > > > [ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
> > > > > [ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > > > > [ 5] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > > > >
> > > > > Thanks,
> > > > > Miquèl
> > > >
> > > > Can you experiment with :
> > > >
> > > > - Disabling TSO on your NIC (ethtool -K eth0 tso off)
> > > > - Reducing max GSO size (ip link set dev eth0 gso_max_size 16384)
> > > >
> > > > I suspect some kind of issues with fec TX completion, vs TSO emulation.
> > >
> > > Wow, that appears to have a significant effect. I am using Busybox's iproute
> > > implementation which does not know gso_max_size, but I hacked directly
> > > into netdevice.h just to see if it would have an effect. I'm adding
> > > iproute2 to the image for further testing.
> > >
> > > Here is the diff:
> > >
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -2364,7 +2364,7 @@ struct net_device {
> > > /* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE),
> > > * and shinfo->gso_segs is a 16bit field.
> > > */
> > > -#define GSO_MAX_SIZE (8 * GSO_MAX_SEGS)
> > > +#define GSO_MAX_SIZE 16384u
> > >
> > > unsigned int gso_max_size;
> > > #define TSO_LEGACY_MAX_SIZE 65536
> > >
> > > And here are the results:
> > >
> > > # ethtool -K eth0 tso off
> > > # iperf3 -c 192.168.1.1 -u -b1M
> > > Connecting to host 192.168.1.1, port 5201
> > > [ 5] local 192.168.1.2 port 50490 connected to 192.168.1.1 port 5201
> > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > [ 5] 0.00-1.00 sec 123 KBytes 1.01 Mbits/sec 87
> > > [ 5] 1.00-2.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 2.00-3.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 3.00-4.00 sec 123 KBytes 1.01 Mbits/sec 87
> > > [ 5] 4.00-5.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 5.00-6.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 6.00-7.00 sec 123 KBytes 1.01 Mbits/sec 87
> > > [ 5] 7.00-8.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 8.00-9.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 9.00-10.00 sec 123 KBytes 1.01 Mbits/sec 87
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
> > > [ 5] 0.00-10.00 sec 1.19 MBytes 1.00 Mbits/sec 0.000 ms 0/864 (0%) sender
> > > [ 5] 0.00-10.05 sec 1.11 MBytes 925 Kbits/sec 0.045 ms 62/864 (7.2%) receiver
> > > iperf Done.
> > > # iperf3 -c 192.168.1.1
> > > Connecting to host 192.168.1.1, port 5201
> > > [ 5] local 192.168.1.2 port 34792 connected to 192.168.1.1 port 5201
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 1.63 MBytes 13.7 Mbits/sec 30 1.41 KBytes
> > > [ 5] 1.00-2.00 sec 7.40 MBytes 62.1 Mbits/sec 65 14.1 KBytes
> > > [ 5] 2.00-3.00 sec 7.83 MBytes 65.7 Mbits/sec 109 2.83 KBytes
> > > [ 5] 3.00-4.00 sec 2.49 MBytes 20.9 Mbits/sec 46 19.8 KBytes
> > > [ 5] 4.00-5.00 sec 7.89 MBytes 66.2 Mbits/sec 109 2.83 KBytes
> > > [ 5] 5.00-6.00 sec 255 KBytes 2.09 Mbits/sec 22 2.83 KBytes
> > > [ 5] 6.00-7.00 sec 4.35 MBytes 36.5 Mbits/sec 74 41.0 KBytes
> > > [ 5] 7.00-8.00 sec 10.9 MBytes 91.8 Mbits/sec 34 45.2 KBytes
> > > [ 5] 8.00-9.00 sec 5.35 MBytes 44.9 Mbits/sec 82 1.41 KBytes
> > > [ 5] 9.00-10.00 sec 1.37 MBytes 11.5 Mbits/sec 73 1.41 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 49.5 MBytes 41.5 Mbits/sec 644 sender
> > > [ 5] 0.00-10.05 sec 49.3 MBytes 41.1 Mbits/sec receiver
> > > iperf Done.
> > >
> > > There is still a noticeable amount of drops/retries, but overall the
> > > results are significantly better. What is the rationale behind the
> > > choice of 16384 in particular? Could this be further improved?
> >
> > Apparently I've been too enthusiastic. After sending this e-mail I've
> > re-generated an image with iproute2 and dd'ed the whole image into an
> > SD card, while until now I was just updating the kernel/DT manually, and
> > got the same performance as above without the gso size trick. I need
> > to clarify this further.
> >
>
> Looking a bit at fec, I think fec_enet_txq_put_hdr_tso() is bogus...
>
> txq->tso_hdrs should be properly aligned by definition.
>
> If FEC_QUIRK_SWAP_FRAME is requested, better copy the right thing, not
> original skb->data ???
I've clarified the situation after looking at the build artifacts and
going through (way) longer testing sessions, because successive 10-second
tests can give very different results.
On a 4.14.322 kernel (still maintained), I get really crappy
throughput.
On a mainline 6.5 kernel, I thought I had a similar issue, but it was
due to wrong RGMII-ID timings (I made a mistake when porting the board
from 4.14 to 6.5). With the right timings, throughput is much better
but still significantly lower than I would expect.
So I tested Eric's fixes:
- TCP fix:
https://lore.kernel.org/netdev/CANn89iJUBujG2AOBYsr0V7qyC5WTgzx0GucO=2ES69tTDJRziw@mail.gmail.com/
- FEC fix:
https://lore.kernel.org/netdev/CANn89iLxKQOY5ZA5o3d1y=v4MEAsAQnzmVDjmLY0_bJPG93tKQ@mail.gmail.com/
I also tested different CPUfreq/CPUidle parameters, as suggested by
Alexander:
https://lore.kernel.org/netdev/2245614.iZASKD2KPV@steina-w/
Here are the results of 100-second iperf3 uplink TCP tests, as reported
by the receiver. Each line gives the mean, followed by the individual
runs in parentheses. Unit: Mbps.

Default setup:
  CPUidle yes, CPUfreq yes, TCP fix no, FEC fix no: 30.2 (23.8, 28.4, 38.4)

CPU power management tests (with TCP fix and FEC fix):
  CPUidle yes, CPUfreq yes: 26.5 (24.5, 28.5)
  CPUidle no,  CPUfreq yes: 50.3 (44.8, 55.7)
  CPUidle yes, CPUfreq no:  80.2 (75.8, 79.5, 80.8, 81.8, 83.1)
  CPUidle no,  CPUfreq no:  85.4 (80.6, 81.1, 86.2, 87.5, 91.8)

Eric's fixes tests (no CPUidle, no CPUfreq):
  TCP fix yes, FEC fix yes: 85.4 (80.6, 81.1, 86.2, 87.5, 91.8) (same as above)
  TCP fix no,  FEC fix yes: 82.0 (74.5, 75.9, 82.2, 87.5, 90.2)
  TCP fix yes, FEC fix no:  81.4 (77.5, 77.7, 82.8, 83.7, 85.4)
  TCP fix no,  FEC fix no:  79.6 (68.2, 77.6, 78.9, 86.4, 87.1)
So the TCP and FEC patches do not seem to have a real impact (or only a
small one; it is hard to tell given how scattered the results are).
However, there is definitely something wrong with the low-power
settings, and I believe the erratum pointed out by Alexander may have a
real impact there (ERR006687 ENET: Only the ENET wake-up interrupt
request can wake the system from Wait mode [i.MX 6Dual/6Quad Only]).
My hardware probably lacks the hardware workaround.
I believe the remaining fluctuations are due to the RGMII-ID timings
not being fully optimal: I think I would need to extend them slightly
more in the Tx path, but they are already set to the maximum value.
Anyhow, I no longer see any difference in the drop rate between -b1M
and -b0 (<1%), so I believe it is acceptable as is.
Next, I might try to track down what is missing in 4.14.322 and, if
relevant, ask for a backport.
Thanks a lot for all your feedback,
Miquèl