* [PATCH 0/1] mv643xx_eth: Disable TSO by default
@ 2014-11-01 15:30 Ezequiel Garcia
2014-11-01 15:30 ` [PATCH 1/1] net: mv643xx_eth: Make TSO disabled " Ezequiel Garcia
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Ezequiel Garcia @ 2014-11-01 15:30 UTC (permalink / raw)
To: netdev, David Miller
Cc: Thomas Petazzoni, Gregory Clement, Tawfik Bayouk, Lior Amsalem,
Nadav Haklai, Ezequiel Garcia
Several users ([1], [2]) have been reporting data corruption with TSO on
Kirkwood platforms (i.e. using the mv643xx_eth driver).
Until we manage to find what's causing this, this simple patch will make
the TSO path disabled by default. This patch should be queued for stable,
fixing the TSO feature introduced in v3.16.
The corruption itself is very easy to reproduce: checking md5sum on a mounted
NFS directory gives a different result each time. Same tests using the mvneta
driver (Armada 370/38x/XP SoC) pass with no issues.
Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints
are well received.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=764162
[2] http://archlinuxarm.org/forum/viewtopic.php?f=9&t=7692
Ezequiel Garcia (1):
net: mv643xx_eth: Make TSO disabled by default
drivers/net/ethernet/marvell/mv643xx_eth.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--
2.1.0
^ permalink raw reply [flat|nested] 18+ messages in thread* [PATCH 1/1] net: mv643xx_eth: Make TSO disabled by default 2014-11-01 15:30 [PATCH 0/1] mv643xx_eth: Disable TSO by default Ezequiel Garcia @ 2014-11-01 15:30 ` Ezequiel Garcia 2014-11-01 17:00 ` Thomas Petazzoni 2014-11-01 17:26 ` [PATCH 0/1] mv643xx_eth: Disable TSO " Eric Dumazet 2014-11-04 14:20 ` Karl Beldan 2 siblings, 1 reply; 18+ messages in thread From: Ezequiel Garcia @ 2014-11-01 15:30 UTC (permalink / raw) To: netdev, David Miller Cc: Thomas Petazzoni, Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai, Ezequiel Garcia Data corruption has been observed to be produced by TSO. For instance, accessing files on a NFS-server with TSO enabled results in different data transferred each time. This has been observed only on Kirkwood platforms, i.e. with the mv643xx_eth driver. Same tests on platforms using the mvneta ethernet driver have passed without errors. Make TSO disabled by default for now, until we can found a proper fix for the regression. Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO') Reported-by: Slawomir Gajzner <slawomir.gajzner@gmail.com> Reported-by: Julien D'Ascenzio <jdascenzio@yahoo.fr> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> --- drivers/net/ethernet/marvell/mv643xx_eth.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c index b151a94..8b72780 100644 --- a/drivers/net/ethernet/marvell/mv643xx_eth.c +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c @@ -3110,11 +3110,11 @@ static int mv643xx_eth_probe(struct platform_device *pdev) dev->watchdog_timeo = 2 * HZ; dev->base_addr = 0; - dev->features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO; + dev->features = NETIF_F_SG | NETIF_F_IP_CSUM; dev->vlan_features = dev->features; dev->features |= NETIF_F_RXCSUM; - dev->hw_features = dev->features; + dev->hw_features = dev->features | NETIF_F_TSO; dev->priv_flags |= IFF_UNICAST_FLT; dev->gso_max_segs = MV643XX_MAX_TSO_SEGS; -- 2.1.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/1] net: mv643xx_eth: Make TSO disabled by default 2014-11-01 15:30 ` [PATCH 1/1] net: mv643xx_eth: Make TSO disabled " Ezequiel Garcia @ 2014-11-01 17:00 ` Thomas Petazzoni 2014-11-01 17:40 ` David Miller 0 siblings, 1 reply; 18+ messages in thread From: Thomas Petazzoni @ 2014-11-01 17:00 UTC (permalink / raw) To: Ezequiel Garcia Cc: netdev, David Miller, Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai Dear Ezequiel Garcia, On Sat, 1 Nov 2014 12:30:20 -0300, Ezequiel Garcia wrote: > Data corruption has been observed to be produced by TSO. For instance, > accessing files on a NFS-server with TSO enabled results in different data > transferred each time. > > This has been observed only on Kirkwood platforms, i.e. with the mv643xx_eth > driver. Same tests on platforms using the mvneta ethernet driver have > passed without errors. > > Make TSO disabled by default for now, until we can found a proper fix > for the regression. > > Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO') > Reported-by: Slawomir Gajzner <slawomir.gajzner@gmail.com> > Reported-by: Julien D'Ascenzio <jdascenzio@yahoo.fr> > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Shouldn't you have a: Cc: <stable@vger.kernel.org> # v3.16+ here ? Thanks, Thomas -- Thomas Petazzoni, CTO, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/1] net: mv643xx_eth: Make TSO disabled by default 2014-11-01 17:00 ` Thomas Petazzoni @ 2014-11-01 17:40 ` David Miller 0 siblings, 0 replies; 18+ messages in thread From: David Miller @ 2014-11-01 17:40 UTC (permalink / raw) To: thomas.petazzoni Cc: ezequiel.garcia, netdev, gregory.clement, tawfik, alior, nadavh From: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Date: Sat, 1 Nov 2014 18:00:37 +0100 > Dear Ezequiel Garcia, > > On Sat, 1 Nov 2014 12:30:20 -0300, Ezequiel Garcia wrote: >> Data corruption has been observed to be produced by TSO. For instance, >> accessing files on a NFS-server with TSO enabled results in different data >> transferred each time. >> >> This has been observed only on Kirkwood platforms, i.e. with the mv643xx_eth >> driver. Same tests on platforms using the mvneta ethernet driver have >> passed without errors. >> >> Make TSO disabled by default for now, until we can found a proper fix >> for the regression. >> >> Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO') >> Reported-by: Slawomir Gajzner <slawomir.gajzner@gmail.com> >> Reported-by: Julien D'Ascenzio <jdascenzio@yahoo.fr> >> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> > > Shouldn't you have a: > > Cc: <stable@vger.kernel.org> # v3.16+ Networking patches do not get sent to -stable that way. Instead, people specifically request that I queue up the change for -stable and I submit it by hand at the appropriate time. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/1] mv643xx_eth: Disable TSO by default 2014-11-01 15:30 [PATCH 0/1] mv643xx_eth: Disable TSO by default Ezequiel Garcia 2014-11-01 15:30 ` [PATCH 1/1] net: mv643xx_eth: Make TSO disabled " Ezequiel Garcia @ 2014-11-01 17:26 ` Eric Dumazet 2014-11-01 17:33 ` Thomas Petazzoni ` (3 more replies) 2014-11-04 14:20 ` Karl Beldan 2 siblings, 4 replies; 18+ messages in thread From: Eric Dumazet @ 2014-11-01 17:26 UTC (permalink / raw) To: Ezequiel Garcia Cc: netdev, David Miller, Thomas Petazzoni, Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai On Sat, 2014-11-01 at 12:30 -0300, Ezequiel Garcia wrote: > Several users ([1], [2]) have been reporting data corruption with TSO on > Kirkwood platforms (i.e. using the mv643xx_eth driver). > > Until we manage to find what's causing this, this simple patch will make > the TSO path disabled by default. This patch should be queued for stable, > fixing the TSO feature introduced in v3.16. > > The corruption itself is very easy to reproduce: checking md5sum on a mounted > NFS directory gives a different result each time. Same tests using the mvneta > driver (Armada 370/38x/XP SoC) pass with no issues. > > Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints > are well received. lack of barriers maybe ? It seems you might need to populate all TX descriptors but delay the first, like doing the populate in descending order. If you take a look at txq_submit_skb(), you'll see the final desc->cmd_sts = cmd_sts (line 959) is done _after_ frags were cooked by txq_submit_frag_skb() You should kick the nick only when all TX descriptors are ready and committed to memory. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/1] mv643xx_eth: Disable TSO by default 2014-11-01 17:26 ` [PATCH 0/1] mv643xx_eth: Disable TSO " Eric Dumazet @ 2014-11-01 17:33 ` Thomas Petazzoni 2014-11-01 18:01 ` Eric Dumazet 2014-11-01 17:37 ` Eric Dumazet ` (2 subsequent siblings) 3 siblings, 1 reply; 18+ messages in thread From: Thomas Petazzoni @ 2014-11-01 17:33 UTC (permalink / raw) To: Eric Dumazet Cc: Ezequiel Garcia, netdev, David Miller, Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai Dear Eric Dumazet, On Sat, 01 Nov 2014 10:26:06 -0700, Eric Dumazet wrote: > On Sat, 2014-11-01 at 12:30 -0300, Ezequiel Garcia wrote: > > Several users ([1], [2]) have been reporting data corruption with TSO on > > Kirkwood platforms (i.e. using the mv643xx_eth driver). > > > > Until we manage to find what's causing this, this simple patch will make > > the TSO path disabled by default. This patch should be queued for stable, > > fixing the TSO feature introduced in v3.16. > > > > The corruption itself is very easy to reproduce: checking md5sum on a mounted > > NFS directory gives a different result each time. Same tests using the mvneta > > driver (Armada 370/38x/XP SoC) pass with no issues. > > > > Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints > > are well received. > > lack of barriers maybe ? > > It seems you might need to populate all TX descriptors but delay the > first, like doing the populate in descending order. > > If you take a look at txq_submit_skb(), you'll see the final > desc->cmd_sts = cmd_sts (line 959) is done _after_ frags were cooked by > txq_submit_frag_skb() > > You should kick the nick only when all TX descriptors are ready and > committed to memory. As far as I know, ARMv5 does not do out-of-order execution, and so on ARMv5, mb() == rmb() == wmb() == barrier(). But might be a missing compiler barrier, indeed, as that's not specific to the architecture. Thanks for the hint! Thomas -- Thomas Petazzoni, CTO, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/1] mv643xx_eth: Disable TSO by default 2014-11-01 17:33 ` Thomas Petazzoni @ 2014-11-01 18:01 ` Eric Dumazet 0 siblings, 0 replies; 18+ messages in thread From: Eric Dumazet @ 2014-11-01 18:01 UTC (permalink / raw) To: Thomas Petazzoni Cc: Ezequiel Garcia, netdev, David Miller, Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai On Sat, 2014-11-01 at 18:33 +0100, Thomas Petazzoni wrote: > Dear Eric Dumazet, You can call me Eric ;) > As far as I know, ARMv5 does not do out-of-order execution, and so on > ARMv5, mb() == rmb() == wmb() == barrier(). But might be a missing > compiler barrier, indeed, as that's not specific to the architecture. > > Thanks for the hint! I think the wmb() is needed, but the more worrying problem is the NIC can start transmitting a segment while the data payload is not yet placed in TX descriptor. Normally, the patch I cooked should avoid this. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/1] mv643xx_eth: Disable TSO by default 2014-11-01 17:26 ` [PATCH 0/1] mv643xx_eth: Disable TSO " Eric Dumazet 2014-11-01 17:33 ` Thomas Petazzoni @ 2014-11-01 17:37 ` Eric Dumazet 2014-11-01 19:05 ` Ezequiel Garcia 2014-11-01 17:42 ` David Miller 2014-11-03 14:51 ` David Laight 3 siblings, 1 reply; 18+ messages in thread From: Eric Dumazet @ 2014-11-01 17:37 UTC (permalink / raw) To: Ezequiel Garcia Cc: netdev, David Miller, Thomas Petazzoni, Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai On Sat, 2014-11-01 at 10:26 -0700, Eric Dumazet wrote: > On Sat, 2014-11-01 at 12:30 -0300, Ezequiel Garcia wrote: > > Several users ([1], [2]) have been reporting data corruption with TSO on > > Kirkwood platforms (i.e. using the mv643xx_eth driver). > > > > Until we manage to find what's causing this, this simple patch will make > > the TSO path disabled by default. This patch should be queued for stable, > > fixing the TSO feature introduced in v3.16. > > > > The corruption itself is very easy to reproduce: checking md5sum on a mounted > > NFS directory gives a different result each time. Same tests using the mvneta > > driver (Armada 370/38x/XP SoC) pass with no issues. > > > > Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints > > are well received. > > lack of barriers maybe ? > > It seems you might need to populate all TX descriptors but delay the > first, like doing the populate in descending order. > > If you take a look at txq_submit_skb(), you'll see the final > desc->cmd_sts = cmd_sts (line 959) is done _after_ frags were cooked by > txq_submit_frag_skb() > > You should kick the nick only when all TX descriptors are ready and > committed to memory. > Untested patch would be : diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c index b151a949f352a20ec8e74b4f3a7b6bb194ce841c..44789cc9a263992f91e46006d7e12703a2824cb4 100644 --- a/drivers/net/ethernet/marvell/mv643xx_eth.c +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c @@ -773,7 +773,8 @@ txq_put_data_tso(struct net_device *dev, struct tx_queue *txq, } static inline void -txq_put_hdr_tso(struct sk_buff *skb, struct tx_queue *txq, int length) +txq_put_hdr_tso(struct sk_buff *skb, struct tx_queue *txq, int length, + struct tx_desc **pdesc, u32 *cmd) { struct mv643xx_eth_private *mp = txq_to_mp(txq); int hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb); @@ -797,9 +798,13 @@ txq_put_hdr_tso(struct sk_buff *skb, struct tx_queue *txq, int length) desc->byte_cnt = hdr_len; desc->buf_ptr = txq->tso_hdrs_dma + txq->tx_curr_desc * TSO_HEADER_SIZE; - desc->cmd_sts = cmd_csum | BUFFER_OWNED_BY_DMA | TX_FIRST_DESC | - GEN_CRC; - + cmd_csum |= BUFFER_OWNED_BY_DMA | TX_FIRST_DESC | GEN_CRC; + if (*pdesc == NULL) { + *pdesc = desc; + *cmd = cmd_csum; + } else { + desc->cmd_sts = cmd_csum; + } txq->tx_curr_desc++; if (txq->tx_curr_desc == txq->tx_ring_size) txq->tx_curr_desc = 0; @@ -813,6 +818,8 @@ static int txq_submit_tso(struct tx_queue *txq, struct sk_buff *skb, int desc_count = 0; struct tso_t tso; int hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb); + struct tx_desc *desc = NULL; + u32 cmd_sts = 0; /* Count needed descriptors */ if ((txq->tx_desc_count + tso_count_descs(skb)) >= txq->tx_ring_size) { @@ -834,7 +841,7 @@ static int txq_submit_tso(struct tx_queue *txq, struct sk_buff *skb, /* prepare packet headers: MAC + IP + TCP */ hdr = txq->tso_hdrs + txq->tx_curr_desc * TSO_HEADER_SIZE; tso_build_hdr(skb, hdr, &tso, data_left, total_len == 0); - txq_put_hdr_tso(skb, txq, data_left); + txq_put_hdr_tso(skb, txq, data_left, &desc, &cmd_sts); while (data_left > 0) { int size; @@ -854,6 +861,10 @@ static int txq_submit_tso(struct tx_queue *txq, struct sk_buff *skb, __skb_queue_tail(&txq->tx_skb, skb); skb_tx_timestamp(skb); + /* ensure all other descriptors are written before first cmd_sts */ + wmb(); + desc->cmd_sts = cmd_sts; + /* clear TX_END status */ mp->work_tx_end &= ~(1 << txq->index); ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 0/1] mv643xx_eth: Disable TSO by default 2014-11-01 17:37 ` Eric Dumazet @ 2014-11-01 19:05 ` Ezequiel Garcia 0 siblings, 0 replies; 18+ messages in thread From: Ezequiel Garcia @ 2014-11-01 19:05 UTC (permalink / raw) To: Eric Dumazet Cc: netdev, David Miller, Thomas Petazzoni, Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai [-- Attachment #1: Type: text/plain, Size: 1716 bytes --] On 11/01/2014 02:37 PM, Eric Dumazet wrote: > On Sat, 2014-11-01 at 10:26 -0700, Eric Dumazet wrote: >> On Sat, 2014-11-01 at 12:30 -0300, Ezequiel Garcia wrote: >>> Several users ([1], [2]) have been reporting data corruption with TSO on >>> Kirkwood platforms (i.e. using the mv643xx_eth driver). >>> >>> Until we manage to find what's causing this, this simple patch will make >>> the TSO path disabled by default. This patch should be queued for stable, >>> fixing the TSO feature introduced in v3.16. >>> >>> The corruption itself is very easy to reproduce: checking md5sum on a mounted >>> NFS directory gives a different result each time. Same tests using the mvneta >>> driver (Armada 370/38x/XP SoC) pass with no issues. >>> >>> Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints >>> are well received. >> >> lack of barriers maybe ? >> Yup, that was my initial thought as well... >> It seems you might need to populate all TX descriptors but delay the >> first, like doing the populate in descending order. >> >> If you take a look at txq_submit_skb(), you'll see the final >> desc->cmd_sts = cmd_sts (line 959) is done _after_ frags were cooked by >> txq_submit_frag_skb() >> >> You should kick the nick only when all TX descriptors are ready and >> committed to memory. >> > > Untested patch would be : > Yeah, it makes sense. I'm still seeing the corruption after applying your patch. However, maybe we are onto something. I'll see about taking a closer look and give this some more thought. Thanks for the hint! -- Ezequiel García, Free Electrons Embedded Linux, Kernel and Android Engineering http://free-electrons.com [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/1] mv643xx_eth: Disable TSO by default 2014-11-01 17:26 ` [PATCH 0/1] mv643xx_eth: Disable TSO " Eric Dumazet 2014-11-01 17:33 ` Thomas Petazzoni 2014-11-01 17:37 ` Eric Dumazet @ 2014-11-01 17:42 ` David Miller 2014-11-03 14:51 ` David Laight 3 siblings, 0 replies; 18+ messages in thread From: David Miller @ 2014-11-01 17:42 UTC (permalink / raw) To: eric.dumazet Cc: ezequiel.garcia, netdev, thomas.petazzoni, gregory.clement, tawfik, alior, nadavh From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sat, 01 Nov 2014 10:26:06 -0700 > On Sat, 2014-11-01 at 12:30 -0300, Ezequiel Garcia wrote: >> Several users ([1], [2]) have been reporting data corruption with TSO on >> Kirkwood platforms (i.e. using the mv643xx_eth driver). >> >> Until we manage to find what's causing this, this simple patch will make >> the TSO path disabled by default. This patch should be queued for stable, >> fixing the TSO feature introduced in v3.16. >> >> The corruption itself is very easy to reproduce: checking md5sum on a mounted >> NFS directory gives a different result each time. Same tests using the mvneta >> driver (Armada 370/38x/XP SoC) pass with no issues. >> >> Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints >> are well received. > > lack of barriers maybe ? > > It seems you might need to populate all TX descriptors but delay the > first, like doing the populate in descending order. > > If you take a look at txq_submit_skb(), you'll see the final > desc->cmd_sts = cmd_sts (line 959) is done _after_ frags were cooked by > txq_submit_frag_skb() > > You should kick the nick only when all TX descriptors are ready and > committed to memory. Yes, please look into whether doing something like this fixes the problem. ^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [PATCH 0/1] mv643xx_eth: Disable TSO by default 2014-11-01 17:26 ` [PATCH 0/1] mv643xx_eth: Disable TSO " Eric Dumazet ` (2 preceding siblings ...) 2014-11-01 17:42 ` David Miller @ 2014-11-03 14:51 ` David Laight 2014-11-03 19:04 ` Eric Dumazet 3 siblings, 1 reply; 18+ messages in thread From: David Laight @ 2014-11-03 14:51 UTC (permalink / raw) To: 'Eric Dumazet', Ezequiel Garcia Cc: netdev@vger.kernel.org, David Miller, Thomas Petazzoni, Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai From: Eric Dumazet > On Sat, 2014-11-01 at 12:30 -0300, Ezequiel Garcia wrote: > > Several users ([1], [2]) have been reporting data corruption with TSO on > > Kirkwood platforms (i.e. using the mv643xx_eth driver). > > > > Until we manage to find what's causing this, this simple patch will make > > the TSO path disabled by default. This patch should be queued for stable, > > fixing the TSO feature introduced in v3.16. > > > > The corruption itself is very easy to reproduce: checking md5sum on a mounted > > NFS directory gives a different result each time. Same tests using the mvneta > > driver (Armada 370/38x/XP SoC) pass with no issues. > > > > Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints > > are well received. > > lack of barriers maybe ? > > It seems you might need to populate all TX descriptors but delay the > first, like doing the populate in descending order. > > If you take a look at txq_submit_skb(), you'll see the final > desc->cmd_sts = cmd_sts (line 959) is done _after_ frags were cooked by > txq_submit_frag_skb() > > You should kick the nick only when all TX descriptors are ready and > committed to memory. Don't forget that the nick might process the first descriptor without being given a 'kick' - it will read it when it finishes processing the previous frame. This also means that you have to be careful about the order of the writes to the first descriptor. David ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/1] mv643xx_eth: Disable TSO by default 2014-11-03 14:51 ` David Laight @ 2014-11-03 19:04 ` Eric Dumazet 0 siblings, 0 replies; 18+ messages in thread From: Eric Dumazet @ 2014-11-03 19:04 UTC (permalink / raw) To: David Laight Cc: Ezequiel Garcia, netdev@vger.kernel.org, David Miller, Thomas Petazzoni, Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai On Mon, 2014-11-03 at 14:51 +0000, David Laight wrote: > From: Eric Dumazet > > On Sat, 2014-11-01 at 12:30 -0300, Ezequiel Garcia wrote: > > > Several users ([1], [2]) have been reporting data corruption with TSO on > > > Kirkwood platforms (i.e. using the mv643xx_eth driver). > > > > > > Until we manage to find what's causing this, this simple patch will make > > > the TSO path disabled by default. This patch should be queued for stable, > > > fixing the TSO feature introduced in v3.16. > > > > > > The corruption itself is very easy to reproduce: checking md5sum on a mounted > > > NFS directory gives a different result each time. Same tests using the mvneta > > > driver (Armada 370/38x/XP SoC) pass with no issues. > > > > > > Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints > > > are well received. > > > > lack of barriers maybe ? > > > > It seems you might need to populate all TX descriptors but delay the > > first, like doing the populate in descending order. > > > > If you take a look at txq_submit_skb(), you'll see the final > > desc->cmd_sts = cmd_sts (line 959) is done _after_ frags were cooked by > > txq_submit_frag_skb() > > > > You should kick the nick only when all TX descriptors are ready and > > committed to memory. > > Don't forget that the nick might process the first descriptor without > being given a 'kick' - it will read it when it finishes processing the > previous frame. > This also means that you have to be careful about the order of the writes > to the first descriptor. This is what I implied and implemented in the patch ;) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/1] mv643xx_eth: Disable TSO by default 2014-11-01 15:30 [PATCH 0/1] mv643xx_eth: Disable TSO by default Ezequiel Garcia 2014-11-01 15:30 ` [PATCH 1/1] net: mv643xx_eth: Make TSO disabled " Ezequiel Garcia 2014-11-01 17:26 ` [PATCH 0/1] mv643xx_eth: Disable TSO " Eric Dumazet @ 2014-11-04 14:20 ` Karl Beldan 2014-11-05 8:39 ` Bug#764162: " Ian Campbell 2 siblings, 1 reply; 18+ messages in thread From: Karl Beldan @ 2014-11-04 14:20 UTC (permalink / raw) To: Ezequiel Garcia Cc: netdev, David Miller, Thomas Petazzoni, Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai On Sat, Nov 01, 2014 at 12:30:19PM -0300, Ezequiel Garcia wrote: > Several users ([1], [2]) have been reporting data corruption with TSO on > Kirkwood platforms (i.e. using the mv643xx_eth driver). > > Until we manage to find what's causing this, this simple patch will make > the TSO path disabled by default. This patch should be queued for stable, > fixing the TSO feature introduced in v3.16. > > The corruption itself is very easy to reproduce: checking md5sum on a mounted > NFS directory gives a different result each time. Same tests using the mvneta > driver (Armada 370/38x/XP SoC) pass with no issues. > > Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints > are well received. > Hi, Can you try this : @@ -1067,7 +1082,8 @@ static int txq_reclaim(struct tx_queue *txq, int budget, int force) txq->tx_desc_count--; skb = NULL; - if (cmd_sts & TX_LAST_DESC) + if ((cmd_sts & (TX_LAST_DESC | TX_ENABLE_INTERRUPT)) == + (TX_LAST_DESC | TX_ENABLE_INTERRUPT)) skb = __skb_dequeue(&txq->tx_skb); if (cmd_sts & ERROR_SUMMARY) { -- Karl ^ permalink raw reply [flat|nested] 18+ messages in thread
* Bug#764162: [PATCH 0/1] mv643xx_eth: Disable TSO by default 2014-11-04 14:20 ` Karl Beldan @ 2014-11-05 8:39 ` Ian Campbell 2014-11-05 10:09 ` Karl Beldan 0 siblings, 1 reply; 18+ messages in thread From: Ian Campbell @ 2014-11-05 8:39 UTC (permalink / raw) To: Karl Beldan Cc: Ezequiel Garcia, netdev, David Miller, Thomas Petazzoni, Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai, 764162 On Tue, 2014-11-04 at 15:20 +0100, Karl Beldan wrote: > On Sat, Nov 01, 2014 at 12:30:19PM -0300, Ezequiel Garcia wrote: > > Several users ([1], [2]) have been reporting data corruption with TSO on > > Kirkwood platforms (i.e. using the mv643xx_eth driver). > > > > Until we manage to find what's causing this, this simple patch will make > > the TSO path disabled by default. This patch should be queued for stable, > > fixing the TSO feature introduced in v3.16. > > > > The corruption itself is very easy to reproduce: checking md5sum on a mounted > > NFS directory gives a different result each time. Same tests using the mvneta > > driver (Armada 370/38x/XP SoC) pass with no issues. > > > > Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints > > are well received. > > > > Hi, > > Can you try this : It fixes things for me, thanks! Tested-by: Ian Campbell <ijc@hellion.org.uk> > @@ -1067,7 +1082,8 @@ static int txq_reclaim(struct tx_queue *txq, int budget, int force) > txq->tx_desc_count--; > > skb = NULL; > - if (cmd_sts & TX_LAST_DESC) > + if ((cmd_sts & (TX_LAST_DESC | TX_ENABLE_INTERRUPT)) == > + (TX_LAST_DESC | TX_ENABLE_INTERRUPT)) > skb = __skb_dequeue(&txq->tx_skb); > > if (cmd_sts & ERROR_SUMMARY) { > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/1] mv643xx_eth: Disable TSO by default 2014-11-05 8:39 ` Bug#764162: " Ian Campbell @ 2014-11-05 10:09 ` Karl Beldan 0 siblings, 0 replies; 18+ messages in thread From: Karl Beldan @ 2014-11-05 10:09 UTC (permalink / raw) To: Ian Campbell Cc: Ezequiel Garcia, netdev, David Miller, Thomas Petazzoni, Gregory Clement, Tawfik Bayouk, Lior Amsalem, Nadav Haklai, 764162 On Wed, Nov 05, 2014 at 08:39:26AM +0000, Ian Campbell wrote: > On Tue, 2014-11-04 at 15:20 +0100, Karl Beldan wrote: > > On Sat, Nov 01, 2014 at 12:30:19PM -0300, Ezequiel Garcia wrote: > > > Several users ([1], [2]) have been reporting data corruption with TSO on > > > Kirkwood platforms (i.e. using the mv643xx_eth driver). > > > > > > Until we manage to find what's causing this, this simple patch will make > > > the TSO path disabled by default. This patch should be queued for stable, > > > fixing the TSO feature introduced in v3.16. > > > > > > The corruption itself is very easy to reproduce: checking md5sum on a mounted > > > NFS directory gives a different result each time. Same tests using the mvneta > > > driver (Armada 370/38x/XP SoC) pass with no issues. > > > > > > Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints > > > are well received. > > > > > > > Hi, > > > > Can you try this : > > It fixes things for me, thanks! > > Tested-by: Ian Campbell <ijc@hellion.org.uk> > Good thing, thanks for your feedbak Ian ! Karl ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 0/1] mv643xx_eth: Disable TSO by default @ 2015-05-29 6:16 Dr. Uwe Meyer-Gruhl 2015-05-29 13:07 ` Ezequiel Garcia 0 siblings, 1 reply; 18+ messages in thread From: Dr. Uwe Meyer-Gruhl @ 2015-05-29 6:16 UTC (permalink / raw) To: linux-arm-kernel > On Wed, Nov 05, 2014 at 08:39:26AM +0000, Ian Campbell wrote: >> On Tue, 2014-11-04 at 15:20 +0100, Karl Beldan wrote: >>> On Sat, Nov 01, 2014 at 12:30:19PM -0300, Ezequiel Garcia wrote: >>>> Several users ([1], [2]) have been reporting data corruption >>>> with TSO on Kirkwood platforms (i.e. using the mv643xx_eth >>>> driver). >>>> >>>> Until we manage to find what's causing this, this simple patch will make >>>> the TSO path disabled by default. This patch should be queued for stable, >>>> fixing the TSO feature introduced in v3.16. >>>> >>>> The corruption itself is very easy to reproduce: checkingmd5sum on a mounted >>>> NFS directory gives a different result each time. Same tests using the mvneta >>>> driver (Armada 370/38x/XP SoC) pass with no issues. >>>> >>>> Frankly, I'm a bit puzzled about this, and so any ideas ordebugging hints >>>> are well received. >>>> >>> >>> Hi, >>> >>> Can you try this : >> >> It fixes things for me, thanks! >> >> Tested-by: Ian Campbell <ijc@hellion.org.uk> >> > > Good thing, thanks for your feedbak Ian ! > > Karl -- That would be a good thing - although: Neither the patch to disable TSO altogether nor the one that fixes the underlying problem actually made it to the official kernel source tree, so it is still present in all kernels > 3.16 - I just stumbled over this in the current 4.0.4 version. The fixes in the thread http://marc.info/?l=linux-netdev&m=141517941900547&w=2 are not applicable any more to the current driver from the 4.0 kernel, as the whole respective logic seems to have been changed meanwhile, sadly without fixing the problem. Disabling TSO completely still works, though. Can someone in the know please suggest a working fix to the kernel maintainers, preferably one that does not resort to disable TSO? Uwe -- -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3740 bytes Desc: S/MIME Cryptographic Signature URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20150529/a14a7b4f/attachment.p7s> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 0/1] mv643xx_eth: Disable TSO by default 2015-05-29 6:16 Dr. Uwe Meyer-Gruhl @ 2015-05-29 13:07 ` Ezequiel Garcia 2015-05-29 19:53 ` Dr. Uwe Meyer-Gruhl 0 siblings, 1 reply; 18+ messages in thread From: Ezequiel Garcia @ 2015-05-29 13:07 UTC (permalink / raw) To: linux-arm-kernel On 05/29/2015 03:16 AM, Dr. Uwe Meyer-Gruhl wrote: >> On Wed, Nov 05, 2014 at 08:39:26AM +0000, Ian Campbell wrote: >>> On Tue, 2014-11-04 at 15:20 +0100, Karl Beldan wrote: >>>> On Sat, Nov 01, 2014 at 12:30:19PM -0300, Ezequiel Garcia wrote: >>>>> Several users ([1], [2]) have been reporting data corruption >>>>> with TSO on Kirkwood platforms (i.e. using the mv643xx_eth >>>>> driver). >>>>> >>>>> Until we manage to find what's causing this, this simple patch >>>>> will make >>>>> the TSO path disabled by default. This patch should be queued > for stable, >>>>> fixing the TSO feature introduced in v3.16. >>>>> >>>>> The corruption itself is very easy to reproduce: checkingmd5sum on >>>>> a mounted >>>>> NFS directory gives a different result each time. Same tests using >>>>> the mvneta >>>>> driver (Armada 370/38x/XP SoC) pass with no issues. >>>>> >>>>> Frankly, I'm a bit puzzled about this, and so any ideas ordebugging >>>>> hints >>>>> are well received. >>>>> >>>> >>>> Hi, >>>> >>>> Can you try this : >>> >>> It fixes things for me, thanks! >>> >>> Tested-by: Ian Campbell <ijc@hellion.org.uk> >>> >> >> Good thing, thanks for your feedbak Ian ! >> >> Karl -- > > That would be a good thing - although: Neither the patch to disable TSO > altogether nor the one that fixes the underlying problem actually made > it to the official kernel source tree, so it is still present in all > kernels > 3.16 - I just stumbled over this in the current 4.0.4 version. > > The fixes in the thread > http://marc.info/?l=linux-netdev&m=141517941900547&w=2 are not > applicable any more to the current driver from the 4.0 kernel, as the > whole respective logic seems to have been changed meanwhile, sadly > without fixing the problem. Disabling TSO completely still works, though. > > Can someone in the know please suggest a working fix to the kernel > maintainers, preferably one that does not resort to disable TSO? > Hello Uwe, Thanks for reporting. There wasn't any patch left behind. The above fix was merged (and applied to the stable tree) as this commit: commit 2c2a9cbd64387d6b70ac5db013e9bfe9412c7354 Author: Karl Beldan <karl.beldan@rivierawaves.com> Date: Wed Nov 5 15:32:59 2014 +0100 net: mv643xx_eth: reclaim TX skbs only when released by the HW Can you explain in detail what sort of problem are you running into, which kernel version are you using, etc.? -- Ezequiel Garcia, VanguardiaSur www.vanguardiasur.com.ar ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 0/1] mv643xx_eth: Disable TSO by default 2015-05-29 13:07 ` Ezequiel Garcia @ 2015-05-29 19:53 ` Dr. Uwe Meyer-Gruhl 0 siblings, 0 replies; 18+ messages in thread From: Dr. Uwe Meyer-Gruhl @ 2015-05-29 19:53 UTC (permalink / raw) To: linux-arm-kernel Am 29.05.2015 um 15:07 schrieb Ezequiel Garcia: > On 05/29/2015 03:16 AM, Dr. Uwe Meyer-Gruhl wrote: >>> On Wed, Nov 05, 2014 at 08:39:26AM +0000, Ian Campbell wrote: >>>> On Tue, 2014-11-04 at 15:20 +0100, Karl Beldan wrote: >>>>> On Sat, Nov 01, 2014 at 12:30:19PM -0300, Ezequiel Garcia wrote: >>>>>> Several users ([1], [2]) have been reporting data corruption >>>>>> with TSO on Kirkwood platforms (i.e. using the mv643xx_eth >>>>>> driver). >>>>>> >>>>>> Until we manage to find what's causing this, this simple patch >>>>>> will make >>>>>> the TSO path disabled by default. This patch should be queued >> for stable, >>>>>> fixing the TSO feature introduced in v3.16. >>>>>> >>>>>> The corruption itself is very easy to reproduce: checkingmd5sum on >>>>>> a mounted >>>>>> NFS directory gives a different result each time. Same tests using >>>>>> the mvneta >>>>>> driver (Armada 370/38x/XP SoC) pass with no issues. >>>>>> >>>>>> Frankly, I'm a bit puzzled about this, and so any ideas ordebugging >>>>>> hints >>>>>> are well received. >>>>>> >>>>> Hi, >>>>> >>>>> Can you try this : >>>> It fixes things for me, thanks! >>>> >>>> Tested-by: Ian Campbell <ijc@hellion.org.uk> >>>> >>> Good thing, thanks for your feedbak Ian ! >>> >>> Karl -- >> That would be a good thing - although: Neither the patch to disable TSO >> altogether nor the one that fixes the underlying problem actually made >> it to the official kernel source tree, so it is still present in all >> kernels > 3.16 - I just stumbled over this in the current 4.0.4 version. >> >> The fixes in the thread >> http://marc.info/?l=linux-netdev&m=141517941900547&w=2 are not >> applicable any more to the current driver from the 4.0 kernel, as the >> whole respective logic seems to have been changed meanwhile, sadly >> without fixing the problem. Disabling TSO completely still works, though. >> >> Can someone in the know please suggest a working fix to the kernel >> maintainers, preferably one that does not resort to disable TSO? >> > Hello Uwe, > > Thanks for reporting. There wasn't any patch left behind. The above fix > was merged (and applied to the stable tree) as this commit: > > commit 2c2a9cbd64387d6b70ac5db013e9bfe9412c7354 > Author: Karl Beldan <karl.beldan@rivierawaves.com> > Date: Wed Nov 5 15:32:59 2014 +0100 > > net: mv643xx_eth: reclaim TX skbs only when released by the HW > > Can you explain in detail what sort of problem are you running into, > which kernel version are you using, etc.? Sure, the same kind as had been reported initially. Background: I have done a Debian port running on an Iomega iConnect (a Kirkwood variant). One of my users reported corrupted data when he used the machine as a file and media server over Samba and DLNA. He ruled out any hardware or other faults and pointed to my kernel(s). He actually tried versions 3.18.1 and 4.0.4. However, my kernels are mostly plain vanilla from kernel.org without any relevant patches to the code itself. The user also found that a kernel 3.14 ran O.K. in the same context. When I tried, I could just use md5sum on the /usr subtree repeatedly without problems locally, but over NFS with the iConnect as a server, I had differences on every try in different files each time. After ruling out network issues as well, it was clear that the NIC driver must be the culprit and after a bit of searching which one it actually was, I quickly found the messages about the broken TSO feature in mv643xx_eth. Then, I tried the suggested fix of "ethtool -K eth0 tso off" and voila - the errors are gone. For now, I use a patch to change the default to TSO off, but an upstream patch to handle TSO correctly would be much better. Uwe -- -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3740 bytes Desc: S/MIME Cryptographic Signature URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20150529/12a23844/attachment.p7s> ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2015-05-29 19:53 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-11-01 15:30 [PATCH 0/1] mv643xx_eth: Disable TSO by default Ezequiel Garcia 2014-11-01 15:30 ` [PATCH 1/1] net: mv643xx_eth: Make TSO disabled " Ezequiel Garcia 2014-11-01 17:00 ` Thomas Petazzoni 2014-11-01 17:40 ` David Miller 2014-11-01 17:26 ` [PATCH 0/1] mv643xx_eth: Disable TSO " Eric Dumazet 2014-11-01 17:33 ` Thomas Petazzoni 2014-11-01 18:01 ` Eric Dumazet 2014-11-01 17:37 ` Eric Dumazet 2014-11-01 19:05 ` Ezequiel Garcia 2014-11-01 17:42 ` David Miller 2014-11-03 14:51 ` David Laight 2014-11-03 19:04 ` Eric Dumazet 2014-11-04 14:20 ` Karl Beldan 2014-11-05 8:39 ` Bug#764162: " Ian Campbell 2014-11-05 10:09 ` Karl Beldan -- strict thread matches above, loose matches on Subject: below -- 2015-05-29 6:16 Dr. Uwe Meyer-Gruhl 2015-05-29 13:07 ` Ezequiel Garcia 2015-05-29 19:53 ` Dr. Uwe Meyer-Gruhl
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.