From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Eric Dumazet <edumazet@google.com>
Cc: "Russell King (Oracle)" <linux@armlinux.org.uk>,
Wei Fang <wei.fang@nxp.com>, Shenwei Wang <shenwei.wang@nxp.com>,
Clark Wang <xiaoning.wang@nxp.com>,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
linux-imx@nxp.com, netdev@vger.kernel.org,
Thomas Petazzoni <thomas.petazzoni@bootlin.com>,
Alexandre Belloni <alexandre.belloni@bootlin.com>,
Maxime Chevallier <maxime.chevallier@bootlin.com>,
Andrew Lunn <andrew@lunn.ch>,
Stephen Hemminger <stephen@networkplumber.org>,
Alexander Stein <alexander.stein@ew.tq-group.com>
Subject: Re: Ethernet issue on imx6
Date: Tue, 17 Oct 2023 13:19:01 +0200 [thread overview]
Message-ID: <20231017131901.5ae65e4d@xps-13> (raw)
In-Reply-To: <CANn89iLxKQOY5ZA5o3d1y=v4MEAsAQnzmVDjmLY0_bJPG93tKQ@mail.gmail.com>
Hi Eric,
edumazet@google.com wrote on Mon, 16 Oct 2023 21:37:58 +0200:
> On Mon, Oct 16, 2023 at 5:37 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hello again,
> >
> > > > > # iperf3 -c 192.168.1.1
> > > > > Connecting to host 192.168.1.1, port 5201
> > > > > [ 5] local 192.168.1.2 port 37948 connected to 192.168.1.1 port 5201
> > > > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > > > [ 5] 0.00-1.00 sec 11.3 MBytes 94.5 Mbits/sec 43 32.5 KBytes
> > > > > [ 5] 1.00-2.00 sec 3.29 MBytes 27.6 Mbits/sec 26 1.41 KBytes
> > > > > [ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
> > > > > [ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > > > > [ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 5 1.41 KBytes
> > > > > [ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
> > > > > [ 5] 6.00-7.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
> > > > > [ 5] 7.00-8.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
> > > > > [ 5] 8.00-9.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > > > > [ 5] 9.00-10.00 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
> > > > >
> > > > > Thanks,
> > > > > Miquèl
> > > >
> > > > Can you experiment with :
> > > >
> > > > - Disabling TSO on your NIC (ethtool -K eth0 tso off)
> > > > - Reducing max GSO size (ip link set dev eth0 gso_max_size 16384)
> > > >
> > > > I suspect some kind of issues with fec TX completion, vs TSO emulation.
> > >
> > > Wow, appears to have a significant effect. I am using Busybox's iproute
> > > implementation which does not know gso_max_size, but I hacked directly
> > > into netdevice.h just to see if it would have an effect. I'm adding
> > > iproute2 to the image for further testing.
> > >
> > > Here is the diff:
> > >
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -2364,7 +2364,7 @@ struct net_device {
> > > /* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE),
> > > * and shinfo->gso_segs is a 16bit field.
> > > */
> > > -#define GSO_MAX_SIZE (8 * GSO_MAX_SEGS)
> > > +#define GSO_MAX_SIZE 16384u
> > >
> > > unsigned int gso_max_size;
> > > #define TSO_LEGACY_MAX_SIZE 65536
> > >
> > > And here are the results:
> > >
> > > # ethtool -K eth0 tso off
> > > # iperf3 -c 192.168.1.1 -u -b1M
> > > Connecting to host 192.168.1.1, port 5201
> > > [ 5] local 192.168.1.2 port 50490 connected to 192.168.1.1 port 5201
> > > [ ID] Interval Transfer Bitrate Total Datagrams
> > > [ 5] 0.00-1.00 sec 123 KBytes 1.01 Mbits/sec 87
> > > [ 5] 1.00-2.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 2.00-3.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 3.00-4.00 sec 123 KBytes 1.01 Mbits/sec 87
> > > [ 5] 4.00-5.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 5.00-6.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 6.00-7.00 sec 123 KBytes 1.01 Mbits/sec 87
> > > [ 5] 7.00-8.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 8.00-9.00 sec 122 KBytes 996 Kbits/sec 86
> > > [ 5] 9.00-10.00 sec 123 KBytes 1.01 Mbits/sec 87
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
> > > [ 5] 0.00-10.00 sec 1.19 MBytes 1.00 Mbits/sec 0.000 ms 0/864 (0%) sender
> > > [ 5] 0.00-10.05 sec 1.11 MBytes 925 Kbits/sec 0.045 ms 62/864 (7.2%) receiver
> > > iperf Done.
> > > # iperf3 -c 192.168.1.1
> > > Connecting to host 192.168.1.1, port 5201
> > > [ 5] local 192.168.1.2 port 34792 connected to 192.168.1.1 port 5201
> > > [ ID] Interval Transfer Bitrate Retr Cwnd
> > > [ 5] 0.00-1.00 sec 1.63 MBytes 13.7 Mbits/sec 30 1.41 KBytes
> > > [ 5] 1.00-2.00 sec 7.40 MBytes 62.1 Mbits/sec 65 14.1 KBytes
> > > [ 5] 2.00-3.00 sec 7.83 MBytes 65.7 Mbits/sec 109 2.83 KBytes
> > > [ 5] 3.00-4.00 sec 2.49 MBytes 20.9 Mbits/sec 46 19.8 KBytes
> > > [ 5] 4.00-5.00 sec 7.89 MBytes 66.2 Mbits/sec 109 2.83 KBytes
> > > [ 5] 5.00-6.00 sec 255 KBytes 2.09 Mbits/sec 22 2.83 KBytes
> > > [ 5] 6.00-7.00 sec 4.35 MBytes 36.5 Mbits/sec 74 41.0 KBytes
> > > [ 5] 7.00-8.00 sec 10.9 MBytes 91.8 Mbits/sec 34 45.2 KBytes
> > > [ 5] 8.00-9.00 sec 5.35 MBytes 44.9 Mbits/sec 82 1.41 KBytes
> > > [ 5] 9.00-10.00 sec 1.37 MBytes 11.5 Mbits/sec 73 1.41 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 49.5 MBytes 41.5 Mbits/sec 644 sender
> > > [ 5] 0.00-10.05 sec 49.3 MBytes 41.1 Mbits/sec receiver
> > > iperf Done.
> > >
> > > There is still a noticeable amount of drop/retries, but overall the
> > > results are significantly better. What is the rationale behind the
> > > choice of 16384 in particular? Could this be further improved?
> >
> > Apparently I've been too enthusiastic. After sending this e-mail I've
> > re-generated an image with iproute2 and dd'ed the whole image into an
> > SD card, while until now I was just updating the kernel/DT manually and
> > got the same performances as above without the gro size trick. I need
> > to clarify this further.
> >
>
> Looking a bit at fec, I think fec_enet_txq_put_hdr_tso() is bogus...
>
> txq->tso_hdrs should be properly aligned by definition.
>
> If FEC_QUIRK_SWAP_FRAME is requested, better copy the right thing, not
> original skb->data ???
I've clarified the situation after looking at the build artifacts and
going through (way) longer testing sessions, as successive 10-second
tests can lead to really different results.
On a 4.14.322 kernel (still maintained) I really get extremely crappy
throughput.
On a mainline 6.5 kernel I thought I had a similar issue but this was
due to wrong RGMII-ID timings being used (I ported the board from 4.14
to 6.5 and made a mistake). So with the right timings, I get
much better throughput but still significantly low compared to what I
would expect.
So I tested Eric's fixes:
- TCP fix:
https://lore.kernel.org/netdev/CANn89iJUBujG2AOBYsr0V7qyC5WTgzx0GucO=2ES69tTDJRziw@mail.gmail.com/
- FEC fix:
https://lore.kernel.org/netdev/CANn89iLxKQOY5ZA5o3d1y=v4MEAsAQnzmVDjmLY0_bJPG93tKQ@mail.gmail.com/
As well as different CPUfreq/CPUidle parameters, as pointed out by
Alexander:
https://lore.kernel.org/netdev/2245614.iZASKD2KPV@steina-w/
Here are the results of 100 seconds iperf uplink TCP tests, as reported
by the receiver. First value is the mean, the raw results are in the '(' ')'.
Unit: Mbps
Default setup:
CPUidle yes, CPUfreq yes, TCP fix no, FEC fix no: 30.2 (23.8, 28.4, 38.4)
CPU power management tests (with TCP fix and FEC fix):
CPUidle yes, CPUfreq yes: 26.5 (24.5, 28.5)
CPUidle no, CPUfreq yes: 50.3 (44.8, 55.7)
CPUidle yes, CPUfreq no: 80.2 (75.8, 79.5, 80.8, 81.8, 83.1)
CPUidle no, CPUfreq no: 85.4 (80.6, 81.1, 86.2, 87.5, 91.8)
Eric's fixes tests (No CPUidle, no CPUfreq):
TCP fix yes, FEC fix yes: 85.4 (80.6, 81.1, 86.2, 87.5, 91.8) (same as above)
TCP fix no, FEC fix yes: 82.0 (74.5, 75.9, 82.2, 87.5, 90.2)
TCP fix yes, FEC fix no: 81.4 (77.5, 77.7, 82.8, 83.7, 85.4)
TCP fix no, FEC fix no: 79.6 (68.2, 77.6, 78.9, 86.4, 87.1)
So indeed the TCP and FEC patches don't seem to have a real impact (or
a small one, I don't know given how scattered are the results). However
there is definitely something wrong with the low power settings and I
believe the Errata pointed by Alexander may have a real impact there
(ERR006687 ENET: Only the ENET wake-up interrupt request can wake the
system from Wait mode [i.MX 6Dual/6Quad Only]), probably that my
hardware lacks the hardware workaround.
I believe the remaining fluctuations are due to the RGMII-ID timings
not being totally optimal, I think I would need to extend them slightly
more in the Tx path but they are already set to the maximum value.
Anyhow, I no longer see any difference in the drop rate between -b1M
and -b0 (<1%) so I believe it is acceptable like that.
Now I might try to track what is missing in 4.14.322 and perhaps ask
for a backport if it's relevant.
Thanks a lot for all your feedback,
Miquèl
next prev parent reply other threads:[~2023-10-17 11:19 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-12 17:34 Ethernet issue on imx6 Miquel Raynal
2023-10-12 19:39 ` Russell King (Oracle)
2023-10-13 8:40 ` Miquel Raynal
2023-10-13 10:16 ` Wei Fang
2023-10-16 11:49 ` Eric Dumazet
2023-10-16 13:58 ` Miquel Raynal
2023-10-16 15:06 ` Eric Dumazet
2023-10-16 15:36 ` Miquel Raynal
2023-10-16 19:37 ` Eric Dumazet
2023-10-16 21:47 ` Russell King (Oracle)
2023-10-17 11:19 ` Miquel Raynal [this message]
2023-10-12 20:46 ` Andrew Lunn
2023-10-12 22:58 ` Stephen Hemminger
2023-10-13 8:27 ` Miquel Raynal
2023-10-13 15:51 ` Andrew Lunn
2023-10-27 20:58 ` Miquel Raynal
2023-11-17 15:09 ` Miquel Raynal
2023-10-16 8:48 ` Alexander Stein
2023-10-16 13:31 ` Miquel Raynal
2023-10-16 14:41 ` Alexander Stein
2023-10-17 10:49 ` Miquel Raynal
2023-10-18 9:08 ` Alexander Stein
2023-10-27 20:58 ` Miquel Raynal
2023-10-13 8:50 ` James Chapman
2023-10-13 10:37 ` Miquel Raynal
2023-10-13 11:54 ` James Chapman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231017131901.5ae65e4d@xps-13 \
--to=miquel.raynal@bootlin.com \
--cc=alexander.stein@ew.tq-group.com \
--cc=alexandre.belloni@bootlin.com \
--cc=andrew@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=linux-imx@nxp.com \
--cc=linux@armlinux.org.uk \
--cc=maxime.chevallier@bootlin.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=shenwei.wang@nxp.com \
--cc=stephen@networkplumber.org \
--cc=thomas.petazzoni@bootlin.com \
--cc=wei.fang@nxp.com \
--cc=xiaoning.wang@nxp.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox