BUG: i.MX6-FEC: broken TCP tx checksumming

linux-arm-kernel.lists.infradead.org archive mirror

From: Holger Schurig @ 2014-07-16  8:27 UTC
  To: linux-arm-kernel

Hi,

On my target I use the i.MX6 FEC with kernel 3.16-rc5. I was quickly
able to ping, but TCP (e.g. ssh) didn't work.

I first suspected autonegotiation, because "mii-tool eth0" shows
(wrongly) eth0: negotiated "1000baseT-HD flow-control, link ok",
even though my switch is not a gigabit one. Then I found out that
"ethtool eth0" worked better: it displayed "Speed: 100Mb/s", "Duplex:
Full". And with ping working, it couldn't be gigabit Ethernet against
a non-gigabit switch anyway.

I then looked at linux-next and applied the FEC related patches from
it. To no avail.

Finally I started Wireshark (on the desktop), and it said:

  Header checksum: 0x0000 [incorrect, should be 0xe65e (may be caused by "IP checksum offload"?)]

So I tried "ethtool -K eth0 tx off" (on the i.MX6 board) and suddenly
ssh worked.
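
For reference, the value Wireshark expects can be reproduced by hand. The
following is a minimal Python sketch of the standard IPv4 header checksum
(RFC 791). The function name and the sample header are illustrative
assumptions, not data captured in this thread.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """Return the RFC 791 checksum for a raw IPv4 header.

    The checksum field (bytes 10-11) is treated as zero while summing,
    so the result is the value that should appear in that field.
    """
    if len(header) < 20 or len(header) % 2:
        raise ValueError("IPv4 header must be at least 20 bytes and even in length")
    # Zero the checksum field before summing.
    data = header[:10] + b"\x00\x00" + header[12:]
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    # Fold carries back into the low 16 bits (one's-complement addition).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical 20-byte header with the checksum field left at 0x0000,
# which is what a frame looks like when the offload did not fill it in.
hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ipv4_header_checksum(hdr)))  # 0xb861
```

A header that still carries 0x0000 in the checksum field on the wire, as in
the capture above, is consistent with the stack leaving the field for the
hardware to fill in and the hardware not doing so.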




I think it is unrelated, but on top of 3.16-rc5 I'm using these
linux-next patches:

net: fec: iMX6 FEC does not support half-duplex gigabit
net: fec: fix ethtool set_pauseparam duplex bug
net: fec: fix interrupt handling races
net: fec: use netif_tx_disable() rather than netif_stop_queue()
net: fec: remove checking for NULL phy_dev in fec_enet_close()
net: fec: ensure that a disconnected phy isn't configured
net: fec: stop the phy before shutting down the MAC
net: fec: remove useless fep->opened
net: fec: make rx skb handling more robust
net: fec: clean up transmit descriptor setup
net: fec: ensure fec_enet_free_buffers() properly cleans the rings
net: fec: fix missing kmalloc() failure check in fec_enet_alloc_buffers()
net: fec: improve safety of suspend/resume/transmit timeout paths
net: fec: ensure fec_enet_close() copes with resume failure
net: fec: only restart or stop the device if it is present and running
net: fec: move calls to quiesce/resume packet processing out of fec_restart()
net: fec: remove inappropriate calls around fec_restart()
net: fec: quiesce packet processing before stopping device in fec_suspend()
net: fec: quiesce packet processing before stopping device in fec_set_features()
net: fec: quiesce packet processing before changing features
net: fec: quiesce packet processing when taking link down in fec_enet_adjust_link()
net: fec: clean up duplex mode handling
net: fec: better implementation of iMX6 ERR006358 quirk
net: fec: replace delayed work with standard work
net: fec: clear receive interrupts before processing a packet
net: fec: reorder ethtool ops to match order in struct declaration
net: fec: add support for dumping transmit ring on timeout
net: fec: remove useless status check in tx reap path
net: fec: consolidate hwtstamp implementation

From: Russell King - ARM Linux @ 2014-07-16  9:58 UTC
  To: linux-arm-kernel

On Wed, Jul 16, 2014 at 10:27:45AM +0200, Holger Schurig wrote:
> On my target I use the i.MX6 FEC with kernel 3.16-rc5. I was quickly
> able to ping, but TCP (e.g. ssh) didn't work.
> 
> I first suspected autonegotiation, because "mii-tool eth0" shows
> (wrongly) eth0: negotiated "1000baseT-HD flow-control, link ok",
> even though my switch is not a gigabit one.

That's rather weird - that line is printed by the generic phy layer,
so it suggests a bug in either the generic phy support, or the phy
driver itself.  It shouldn't report HD mode for gigabit with my patch
series applied - the FEC hardware doesn't support HD at gigabit speeds.

> Then I found out that
> "ethtool eth0" worked better, it displayed "Speed: 100Mb/s", "Duplex:
> Full". And with ping working, it couldn't be gigabit-ethernet against
> a non-gigabit switch anyway ...

That's again read via the generic phy layer, and that suggests that
the phy finally reported the correct speed... again, I wonder if
something in the phy layer is buggy.  I've never seen this behaviour
here with Atheros AR8035.

> I then looked at linux-next and applied the FEC related patches from
> it. To no avail.

In some ways, no change is good.

> Finally I started Wireshark (on the desktop), and it said:
> 
>   Header checksum: 0x0000 [incorrect, should be 0xe65e (may be caused by "IP checksum offload"?)]
> 
> So I tried "ethtool -K eth0 tx off" (on the i.MX6 board) and suddenly
> ssh worked.

That sounds like the hardware IP header checksumming isn't working.  Is
there anything specific to your setup?  VLAN maybe?

Which gcc version are you using?

It could be that the hardware isn't seeing the update for cbd_esc before
cbd_sc.  To test that out, can you try putting a wmb() before the writes
to bdp->cbd_sc in:

fec_enet_txq_submit_frag_skb
fec_enet_txq_submit_skb
fec_enet_txq_put_data_tso
fec_enet_txq_put_hdr_tso

to ensure that the previous transmit descriptor updates are pushed out
before the descriptor is handed over to the hardware.  That's a little
heavy-weight, but let's use the sledge hammer first...
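
The ordering concern described above can be illustrated with a toy model.
This is a Python sketch of the hypothesis only, not driver code: the bit
values, the Descriptor class and the transmit() helper are hypothetical,
and only the field names cbd_sc and cbd_esc come from the message.

```python
from dataclasses import dataclass

TX_READY = 0x8000       # hypothetical "owned by hardware" bit in cbd_sc
TX_CSUM_OFFLOAD = 0x1   # hypothetical "insert checksum" bit in cbd_esc


@dataclass
class Descriptor:
    cbd_sc: int = 0     # status/control word; hands the buffer to hardware
    cbd_esc: int = 0    # extended control word; carries the offload request


def transmit(stores):
    """Apply (field, value) stores in the order given.

    The modelled hardware latches the descriptor at the moment it first
    sees TX_READY in cbd_sc. Returns True if the checksum offload request
    was already visible at that moment.
    """
    desc = Descriptor()
    for field, value in stores:
        setattr(desc, field, value)
        if desc.cbd_sc & TX_READY:
            return bool(desc.cbd_esc & TX_CSUM_OFFLOAD)
    raise RuntimeError("descriptor was never handed to the hardware")


intended = [("cbd_esc", TX_CSUM_OFFLOAD), ("cbd_sc", TX_READY)]
reordered = list(reversed(intended))

print(transmit(intended))   # True: offload request visible before hand-over
print(transmit(reordered))  # False: hardware latched a stale cbd_esc
```

In the model, a write barrier between the two stores corresponds to
forbidding the "reordered" sequence, so the hardware can only ever observe
the "intended" one.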

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

From: Holger Schurig @ 2014-07-16 14:03 UTC
  To: linux-arm-kernel

>> So I tried "ethtool -K eth0 tx off" (on the i.MX6 board) and suddenly
>> ssh worked.
>
> That sounds like the hardware IP header checksumming isn't working.  Is
> there anything specific to your setup?  VLAN maybe?

No, no games, I just did a normal "ifconfig eth0 192.168.200.199".

> Which gcc version are you using?

Currently http://releases.linaro.org/14.04/components/toolchain/binaries/gcc-linaro-arm-linux-gnueabihf-4.8-2014.04_linux.tar.xz
 But I could try a different one.

> That's a little heavy-weight, but let's use the sledge hammer first...

The sledge hammer worked!

Do you want me to find out which of the four wmb() calls is actually needed?

From: Holger Schurig @ 2014-07-16 14:09 UTC
  To: linux-arm-kernel

> That's again read via the generic phy layer, and that suggests that
> the phy finally reported the correct speed... again, I wonder if
> something in the phy layer is buggy.

It's just mii-tool that is broken. When the link is up and ready, this happens:

root@mde:~# ethtool eth0
Settings for eth0:
    Supported ports: [ TP MII ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
    Supported pause frame use: Symmetric
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
    Advertised pause frame use: Symmetric
    Advertised auto-negotiation: Yes
    Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                         100baseT/Half 100baseT/Full
    Link partner advertised pause frame use: Symmetric
    Link partner advertised auto-negotiation: Yes
    Speed: 100Mb/s
    Duplex: Full
    Port: MII
    PHYAD: 0
    Transceiver: external
    Auto-negotiation: on
    Link detected: yes
root@mde:~# mii-tool eth0
eth0: negotiated 1000baseT-HD flow-control, link ok

Seems also that things in /sys/class/net are ok:

root@mde:~# cat /sys/class/net/eth0/speed
100
root@mde:~# cat /sys/class/net/eth0/duplex
full

mii-tool calls (according to strace) SIOCGMIIPHY, whereas ethtool uses
SIOCETHTOOL.
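
The same sysfs attributes can be read from a script when neither tool is
trusted. A minimal Python sketch follows; the link_state() helper and its
sysfs_root parameter are hypothetical names chosen for illustration, and
only the speed and duplex files shown above are assumed to exist.

```python
from pathlib import Path


def link_state(iface: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Read the negotiated speed (Mb/s) and duplex of an interface from sysfs."""
    base = Path(sysfs_root) / iface
    return {
        "speed_mbps": int((base / "speed").read_text().strip()),
        "duplex": (base / "duplex").read_text().strip(),
    }


if __name__ == "__main__":
    # Reading these files may fail or report an unknown value while the
    # link is down, so a real script should handle OSError here.
    print(link_state("eth0"))
```

On the board above this would be expected to report a speed of 100 and a
duplex of "full", matching the cat output rather than mii-tool.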

From: Russell King - ARM Linux @ 2014-07-16 18:39 UTC
  To: linux-arm-kernel

On Wed, Jul 16, 2014 at 04:03:21PM +0200, Holger Schurig wrote:
> >> So I tried "ethtool -K eth0 tx off" (on the i.MX6 board) and suddenly
> >> ssh worked.
> >
> > That sounds like the hardware IP header checksumming isn't working.  Is
> > there anything specific to your setup?  VLAN maybe?
> 
> No, no games, I just did a normal "ifconfig eth0 192.168.200.199".
> 
> > Which gcc version are you using?
> 
> Currently http://releases.linaro.org/14.04/components/toolchain/binaries/gcc-linaro-arm-linux-gnueabihf-4.8-2014.04_linux.tar.xz
>  But I could try a different one.

I'd consider asking whether you'd send me fec_main.o with and without the
wmb() in it, but I'm not sure if I have the time to look at it right now.

> > That's a little heavy-weight, but let's use the sledge hammer first...
> 
> The sledge hammer worked!
> 
> Do you want me to find out which of the four wmb() calls is actually needed?

It may be worth checking whether it needs to be wmb(), or whether
barrier() will do - in other words, is it the compiler re-ordering the
stores, or is it the hardware re-ordering them.

From: Holger Schurig @ 2014-07-17  8:22 UTC
  To: linux-arm-kernel

It's good that you didn't invest too much time, because I found the
real culprit.

I wanted to trigger the "clock from phy" code path in
imx6q_1588_init() (arch/arm/mach-imx/mach-imx6q.c). This is done in a
weird way: the code checks the 3rd clock in the DTS somehow.

So I removed the 3rd clock in my DTS, i.e. making it:

&fec {
    status = "okay";
    pinctrl-names = "default";
    pinctrl-0 = <&pinctrl_fec>;
    phy-mode = "rmii";
    phy-reset-gpios = <&gpio2 4 GPIO_ACTIVE_LOW>;
    clocks = <&clks 117>, <&clks 117>;
    clock-names = "ipg", "ahb";
};

And as soon as I do that, TCP checksumming stops working. When I
remove that manual clock assignment (so that the default becomes
active), TCP checksumming works. My earlier report that it works with
wmb() was wrong, because I had changed several things at once (grave
debugging error ... shame on me!).

Now if the TCP checksumming logic in the chip depends on the ptp clock,
then no code should check whether it is set to something else. So I
guess imx6q_1588_init() is actually buggy in checking for dtsclock[2]
!= ptp_clk.
