* [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down
From: Nithin Nayak Sujir @ 2013-05-13 21:04 UTC
To: davem; +Cc: netdev, Nithin Nayak Sujir

v2:
 - Modify tg3_phy_power_bug() function to use a switch instead of a
   complicated if statement. Suggested by Joe Perches.

Michael Chan (1):
  tg3: Fix data corruption on 5725 with TSO

Nithin Nayak Sujir (1):
  tg3: Skip powering down function 0 on certain serdes devices

 drivers/net/ethernet/broadcom/tg3.c | 49 ++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 6 deletions(-)

--
1.8.1.4

* [PATCH v2 net 1/2] tg3: Skip powering down function 0 on certain serdes devices
From: Nithin Nayak Sujir @ 2013-05-13 21:04 UTC
To: davem; +Cc: netdev, Nithin Nayak Sujir, stable, Michael Chan

On the 5718, 5719 and 5720 serdes devices, powering down function 0
results in all the other ports being powered down. Add code to skip
function 0 power down.

v2:
 - Modify tg3_phy_power_bug() function to use a switch instead of a
   complicated if statement. Suggested by Joe Perches.

Cc: <stable@vger.kernel.org>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 728d42a..781be76 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -2957,6 +2957,31 @@ static int tg3_5700_link_polarity(struct tg3 *tp, u32 speed)
 	return 0;
 }
 
+static bool tg3_phy_power_bug(struct tg3 *tp)
+{
+	switch (tg3_asic_rev(tp)) {
+	case ASIC_REV_5700:
+	case ASIC_REV_5704:
+		return true;
+	case ASIC_REV_5780:
+		if (tp->phy_flags & TG3_PHYFLG_MII_SERDES)
+			return true;
+		return false;
+	case ASIC_REV_5717:
+		if (!tp->pci_fn)
+			return true;
+		return false;
+	case ASIC_REV_5719:
+	case ASIC_REV_5720:
+		if ((tp->phy_flags & TG3_PHYFLG_PHY_SERDES) &&
+		    !tp->pci_fn)
+			return true;
+		return false;
+	}
+
+	return false;
+}
+
 static void tg3_power_down_phy(struct tg3 *tp, bool do_low_power)
 {
 	u32 val;
@@ -3016,12 +3041,7 @@ static void tg3_power_down_phy(struct tg3 *tp, bool do_low_power)
 	/* The PHY should not be powered down on some chips because
 	 * of bugs.
 	 */
-	if (tg3_asic_rev(tp) == ASIC_REV_5700 ||
-	    tg3_asic_rev(tp) == ASIC_REV_5704 ||
-	    (tg3_asic_rev(tp) == ASIC_REV_5780 &&
-	     (tp->phy_flags & TG3_PHYFLG_MII_SERDES)) ||
-	    (tg3_asic_rev(tp) == ASIC_REV_5717 &&
-	     !tp->pci_fn))
+	if (tg3_phy_power_bug(tp))
 		return;
 
 	if (tg3_chip_rev(tp) == CHIPREV_5784_AX ||
-- 
1.8.1.4

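
The predicate added by patch 1/2 can be exercised outside the driver. What follows is only a minimal sketch: the enum values, the flag bits and `struct nic` are hypothetical stand-ins for the driver's `ASIC_REV_*` constants, `TG3_PHYFLG_*` bits and `struct tg3`, and they do not reproduce the real definitions in tg3.h. The case structure mirrors the switch in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real ASIC_REV_* values are NOT reproduced. */
enum asic_rev {
	REV_5700, REV_5704, REV_5780, REV_5717, REV_5719, REV_5720, REV_OTHER
};

/* Hypothetical flag bits standing in for TG3_PHYFLG_*_SERDES. */
#define PHYFLG_PHY_SERDES 0x1u
#define PHYFLG_MII_SERDES 0x2u

/* Hypothetical, reduced stand-in for struct tg3. */
struct nic {
	enum asic_rev rev;
	uint32_t phy_flags;
	unsigned int pci_fn;	/* PCI function number, 0 is the first port */
};

/* Returns true when the PHY must not be powered down. */
bool phy_power_bug(const struct nic *n)
{
	switch (n->rev) {
	case REV_5700:
	case REV_5704:
		return true;
	case REV_5780:
		return (n->phy_flags & PHYFLG_MII_SERDES) != 0;
	case REV_5717:
		return n->pci_fn == 0;
	case REV_5719:
	case REV_5720:
		/* New in this patch: serdes devices skip function 0 only. */
		return (n->phy_flags & PHYFLG_PHY_SERDES) && n->pci_fn == 0;
	default:
		return false;
	}
}

/* Small self-check that a caller can run from its own main(). */
void phy_power_bug_demo(void)
{
	struct nic fn0 = { REV_5720, PHYFLG_PHY_SERDES, 0 };
	struct nic fn1 = { REV_5720, PHYFLG_PHY_SERDES, 1 };

	assert(phy_power_bug(&fn0));	/* function 0 stays powered */
	assert(!phy_power_bug(&fn1));	/* other functions may power down */
}
```

In this sketch a 5719/5720 device without the serdes flag is never held back, which matches the `&&` in the patch. The 5717 case ignores the PHY type and looks only at the function number.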

* Re: [PATCH v2 net 1/2] tg3: Skip powering down function 0 on certain serdes devices
From: Joe Perches @ 2013-05-14 18:08 UTC
To: Nithin Nayak Sujir; +Cc: davem, netdev, stable, Michael Chan

On Mon, 2013-05-13 at 14:04 -0700, Nithin Nayak Sujir wrote:
> On the 5718, 5719 and 5720 serdes devices, powering down function 0
> results in all the other ports being powered down. Add code to skip
> function 0 power down.

Hi Nithin.

5718?  I'm confused a bit by the commit message.

> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
[]
> +static bool tg3_phy_power_bug(struct tg3 *tp)
> +{
> +	switch (tg3_asic_rev(tp)) {
> +	case ASIC_REV_5700:
> +	case ASIC_REV_5704:
> +		return true;
> +	case ASIC_REV_5780:
> +		if (tp->phy_flags & TG3_PHYFLG_MII_SERDES)
> +			return true;
> +		return false;
> +	case ASIC_REV_5717:
> +		if (!tp->pci_fn)
> +			return true;
> +		return false;
> +	case ASIC_REV_5719:
> +	case ASIC_REV_5720:
> +		if ((tp->phy_flags & TG3_PHYFLG_PHY_SERDES) &&
> +		    !tp->pci_fn)
> +			return true;
> +		return false;
> +	}
> +

Where is the 5718 in this?
What is the 5718?
There is no #define for it.


* Re: [PATCH v2 net 1/2] tg3: Skip powering down function 0 on certain serdes devices
From: Nithin Nayak Sujir @ 2013-05-14 18:17 UTC
To: Joe Perches; +Cc: davem, netdev, stable, Michael Chan

On 05/14/2013 11:08 AM, Joe Perches wrote:
> On Mon, 2013-05-13 at 14:04 -0700, Nithin Nayak Sujir wrote:
>> On the 5718, 5719 and 5720 serdes devices, powering down function 0
>> results in all the other ports being powered down. Add code to skip
>> function 0 power down.
>
> Hi Nithin.
>
> 5718?  I'm confused a bit by the commit message.
>
>> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> []
>> +static bool tg3_phy_power_bug(struct tg3 *tp)
>> +{
>> +	switch (tg3_asic_rev(tp)) {
>> +	case ASIC_REV_5700:
>> +	case ASIC_REV_5704:
>> +		return true;
>> +	case ASIC_REV_5780:
>> +		if (tp->phy_flags & TG3_PHYFLG_MII_SERDES)
>> +			return true;
>> +		return false;
>> +	case ASIC_REV_5717:
>> +		if (!tp->pci_fn)
>> +			return true;
>> +		return false;
>> +	case ASIC_REV_5719:
>> +	case ASIC_REV_5720:
>> +		if ((tp->phy_flags & TG3_PHYFLG_PHY_SERDES) &&
>> +		    !tp->pci_fn)
>> +			return true;
>> +		return false;
>> +	}
>> +
>
> Where is the 5718 in this?
> What is the 5718?
> There is no #define for it.
>

The 5718 is another device in the same family as the 5719 and 5720. There is no
case or define for the ASIC_REV because it has the same asic revision as the
5719. However, it is a separate device and you can find it in the pci table and
in the code as TG3PCI_DEVICE_TIGON3_5718.


* [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: Nithin Nayak Sujir @ 2013-05-13 21:04 UTC
To: davem; +Cc: netdev, Michael Chan, stable, Nithin Nayak Sujir

From: Michael Chan <mchan@broadcom.com>

The 5725 family of devices (asic rev 5762), corrupts TSO packets where
the buffer is within MSS bytes of a 4G boundary (4G, 8G etc.). Detect
this condition and trigger the workaround path.

Cc: <stable@vger.kernel.org>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 781be76..e285d76 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7448,6 +7448,20 @@ static inline int tg3_4g_overflow_test(dma_addr_t mapping, int len)
 	return (base > 0xffffdcc0) && (base + len + 8 < base);
 }
 
+/* Test for TSO DMA buffers that cross into regions which are within MSS bytes
+ * of any 4GB boundaries: 4G, 8G, etc
+ */
+static inline int tg3_4g_tso_overflow_test(struct tg3 *tp, dma_addr_t mapping,
+					   u32 len, u32 mss)
+{
+	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
+		u32 base = (u32) mapping & 0xffffffff;
+
+		return ((base + len + (mss & 0x3fff)) < base);
+	}
+	return 0;
+}
+
 /* Test for DMA addresses > 40-bit */
 static inline int tg3_40bit_overflow_test(struct tg3 *tp, dma_addr_t mapping,
 					  int len)
@@ -7484,6 +7498,9 @@ static bool tg3_tx_frag_set(struct tg3_napi *tnapi, u32 *entry, u32 *budget,
 	if (tg3_4g_overflow_test(map, len))
 		hwbug = true;
 
+	if (tg3_4g_tso_overflow_test(tp, map, len, mss))
+		hwbug = true;
+
 	if (tg3_40bit_overflow_test(tp, map, len))
 		hwbug = true;
 
-- 
1.8.1.4


* Re: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: Eric Dumazet @ 2013-05-13 21:14 UTC
To: Nithin Nayak Sujir; +Cc: davem, netdev, Michael Chan, stable

On Mon, 2013-05-13 at 14:04 -0700, Nithin Nayak Sujir wrote:
> From: Michael Chan <mchan@broadcom.com>
>
> The 5725 family of devices (asic rev 5762), corrupts TSO packets where
> the buffer is within MSS bytes of a 4G boundary (4G, 8G etc.). Detect
> this condition and trigger the workaround path.
>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Michael Chan <mchan@broadcom.com>
> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/tg3.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> index 781be76..e285d76 100644
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
> @@ -7448,6 +7448,20 @@ static inline int tg3_4g_overflow_test(dma_addr_t mapping, int len)
>  	return (base > 0xffffdcc0) && (base + len + 8 < base);
>  }
>
> +/* Test for TSO DMA buffers that cross into regions which are within MSS bytes
> + * of any 4GB boundaries: 4G, 8G, etc
> + */
> +static inline int tg3_4g_tso_overflow_test(struct tg3 *tp, dma_addr_t mapping,
> +					   u32 len, u32 mss)
> +{
> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
> +		u32 base = (u32) mapping & 0xffffffff;
> +
> +		return ((base + len + (mss & 0x3fff)) < base);
> +	}
> +	return 0;
> +}
> +

I am curious: does this condition even trigger?


* Re: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: Nithin Nayak Sujir @ 2013-05-13 21:34 UTC
To: Eric Dumazet; +Cc: davem, netdev, Michael Chan, stable

On 05/13/2013 02:14 PM, Eric Dumazet wrote:
> On Mon, 2013-05-13 at 14:04 -0700, Nithin Nayak Sujir wrote:
>> From: Michael Chan <mchan@broadcom.com>
>>
>> The 5725 family of devices (asic rev 5762), corrupts TSO packets where
>> the buffer is within MSS bytes of a 4G boundary (4G, 8G etc.). Detect
>> this condition and trigger the workaround path.
>>
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Michael Chan <mchan@broadcom.com>
>> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
>> ---
>>  drivers/net/ethernet/broadcom/tg3.c | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
>> index 781be76..e285d76 100644
>> --- a/drivers/net/ethernet/broadcom/tg3.c
>> +++ b/drivers/net/ethernet/broadcom/tg3.c
>> @@ -7448,6 +7448,20 @@ static inline int tg3_4g_overflow_test(dma_addr_t mapping, int len)
>>  	return (base > 0xffffdcc0) && (base + len + 8 < base);
>>  }
>>
>> +/* Test for TSO DMA buffers that cross into regions which are within MSS bytes
>> + * of any 4GB boundaries: 4G, 8G, etc
>> + */
>> +static inline int tg3_4g_tso_overflow_test(struct tg3 *tp, dma_addr_t mapping,
>> +					   u32 len, u32 mss)
>> +{
>> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
>> +		u32 base = (u32) mapping & 0xffffffff;
>> +
>> +		return ((base + len + (mss & 0x3fff)) < base);
>> +	}
>> +	return 0;
>> +}
>> +
>
> I am curious: does this condition even trigger?
>

Yes, it's a rare problem to occur and was reported in our lab. After we
implemented this fix, the problem didn't happen again.


* Re: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: Eric Dumazet @ 2013-05-13 21:40 UTC
To: Nithin Nayak Sujir; +Cc: davem, netdev, Michael Chan, stable

On Mon, 2013-05-13 at 14:34 -0700, Nithin Nayak Sujir wrote:
>
> On 05/13/2013 02:14 PM, Eric Dumazet wrote:

> >> +/* Test for TSO DMA buffers that cross into regions which are within MSS bytes
> >> + * of any 4GB boundaries: 4G, 8G, etc
> >> + */
> >> +static inline int tg3_4g_tso_overflow_test(struct tg3 *tp, dma_addr_t mapping,
> >> +					   u32 len, u32 mss)
> >> +{
> >> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
> >> +		u32 base = (u32) mapping & 0xffffffff;
> >> +
> >> +		return ((base + len + (mss & 0x3fff)) < base);
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >
> > I am curious: does this condition even trigger?
> >
>
> Yes, it's a rare problem to occur and was reported in our lab. After we
> implemented this fix, the problem didn't happen again.
>

I just can't figure out which part of the kernel could allocate a
fragment spanning a 4G region.


* Re: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: Nithin Nayak Sujir @ 2013-05-13 21:47 UTC
To: Eric Dumazet; +Cc: davem, netdev, Michael Chan, stable

On 05/13/2013 02:40 PM, Eric Dumazet wrote:
> On Mon, 2013-05-13 at 14:34 -0700, Nithin Nayak Sujir wrote:
>>
>> On 05/13/2013 02:14 PM, Eric Dumazet wrote:
>
>>>> +/* Test for TSO DMA buffers that cross into regions which are within MSS bytes
>>>> + * of any 4GB boundaries: 4G, 8G, etc
>>>> + */
>>>> +static inline int tg3_4g_tso_overflow_test(struct tg3 *tp, dma_addr_t mapping,
>>>> +					   u32 len, u32 mss)
>>>> +{
>>>> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
>>>> +		u32 base = (u32) mapping & 0xffffffff;
>>>> +
>>>> +		return ((base + len + (mss & 0x3fff)) < base);
>>>> +	}
>>>> +	return 0;
>>>> +}
>>>> +
>>>
>>> I am curious: does this condition even trigger?
>>>
>>
>> Yes, it's a rare problem to occur and was reported in our lab. After we
>> implemented this fix, the problem didn't happen again.
>>
>
> I just can't figure out which part of the kernel could allocate a
> fragment spanning a 4G region.
>

For the bug to occur, the fragment does not have to span a 4G boundary. If it is
within MSS bytes (9.6k) of a 4G boundary, it triggers the failure.

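
The distinction drawn here, a buffer that ends short of the boundary yet still trips the test, follows from 32-bit wraparound. The sketch below is a standalone copy of the patch's arithmetic with the ASIC check removed. The function name `tso_near_4g` and the sample addresses are made up for illustration. Because only the low 32 bits of the DMA address are used, 4G, 8G and every later multiple behave identically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical standalone version of the patch's test (no ASIC check). */
bool tso_near_4g(uint64_t mapping, uint32_t len, uint32_t mss)
{
	uint32_t base = (uint32_t)mapping;	/* low 32 bits only */

	if (!mss)
		return false;			/* not a TSO packet */

	/* Unsigned 32-bit addition wraps modulo 2^32, so the sum is
	 * smaller than base exactly when base + len + mss reaches or
	 * passes the next 4G boundary.
	 */
	return (uint32_t)(base + len + (mss & 0x3fff)) < base;
}

/* Small self-check that a caller can run from its own main(). */
void tso_near_4g_demo(void)
{
	/* Buffer starts 4096 bytes below 4G and is 1500 bytes long, so it
	 * ends well short of the boundary, yet it lies within MSS bytes.
	 */
	assert(tso_near_4g(UINT64_C(0xFFFFF000), 1500, 9600));

	/* Same offset below 8G: identical low 32 bits, identical result. */
	assert(tso_near_4g(UINT64_C(0x1FFFFF000), 1500, 9600));

	/* 64k below the boundary with a small MSS: no wrap, no workaround. */
	assert(!tso_near_4g(UINT64_C(0xFFFF0000), 1500, 1448));
}
```

The first case is the one discussed above: the fragment itself never crosses 4G, but base + len + mss does, so the driver would take the workaround path.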

* Re: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: Eric Dumazet @ 2013-05-13 22:10 UTC
To: Nithin Nayak Sujir; +Cc: davem, netdev, Michael Chan, stable

On Mon, 2013-05-13 at 14:47 -0700, Nithin Nayak Sujir wrote:

> For the bug to occur, the fragment does not have to span a 4G boundary. If it is
> within MSS bytes (9.6k) of a 4G boundary, it triggers the failure.
>

Ah, that indeed can happen.

Thanks


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: David Laight @ 2013-05-14 8:40 UTC
To: Nithin Nayak Sujir, Eric Dumazet; +Cc: davem, netdev, Michael Chan, stable

> >>>> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
> >>>> +		u32 base = (u32) mapping & 0xffffffff;
> >>>> +
> >>>> +		return ((base + len + (mss & 0x3fff)) < base);
...
> For the bug to occur, the fragment does not have to span a 4G boundary. If it is
> within MSS bytes (9.6k) of a 4G boundary, it triggers the failure.

Would it be worth simplifying the test to assume that 'len'
is 64k and 'mss' 9.6k?
(commenting on the actual condition.)
The number of false positives would be small, but the test
a lot quicker.
The '(u32)mapping + (0x10000 + 9600) < (u32)mapping' test might
even be faster than the ' tg3_asic_rev(tp) == ASIC_REV_5762' one.

	David

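
The trade-off in this proposal can be made concrete. The sketch below places the fixed-constant test next to a copy of the exact one. The names `coarse_near_4g`, `exact_near_4g` and `coarse_one_in` are hypothetical, and the estimate assumes DMA addresses whose low 32 bits are uniformly distributed, which real allocations need not be. Under that assumption the fixed window of 0x10000 + 9600 bytes trips on roughly one address in 57,000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed worst case from the proposal: 64k of len plus 9600 of mss. */
#define COARSE_WINDOW (0x10000u + 9600u)

/* Proposed simplified test: ignores the actual len and mss. */
bool coarse_near_4g(uint64_t mapping)
{
	uint32_t base = (uint32_t)mapping;

	return (uint32_t)(base + COARSE_WINDOW) < base;
}

/* Exact test, copied from the patch's arithmetic (no ASIC check). */
bool exact_near_4g(uint64_t mapping, uint32_t len, uint32_t mss)
{
	uint32_t base = (uint32_t)mapping;

	return mss && (uint32_t)(base + len + (mss & 0x3fff)) < base;
}

/* One in how many uniformly distributed 32-bit bases trips the coarse
 * test: 2^32 divided by the window size, rounded down.
 */
uint64_t coarse_one_in(void)
{
	return (UINT64_C(1) << 32) / COARSE_WINDOW;
}

/* Small self-check that a caller can run from its own main(). */
void coarse_demo(void)
{
	/* 64k below 4G with a small packet: coarse fires, exact does not. */
	assert(coarse_near_4g(UINT64_C(0xFFFF0000)));
	assert(!exact_near_4g(UINT64_C(0xFFFF0000), 1500, 1448));
}
```

As long as len stays at or below 64k and mss at or below 9600, every address that fails the exact test also fails the coarse one. The coarse test therefore only adds false positives, such as the one in `coarse_demo()`.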

* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: Michael Chan @ 2013-05-14 15:04 UTC
To: David Laight; +Cc: Nithin Nayak Sujir, Eric Dumazet, davem, netdev, stable

On Tue, 2013-05-14 at 09:40 +0100, David Laight wrote:
> > >>>> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
> > >>>> +		u32 base = (u32) mapping & 0xffffffff;
> > >>>> +
> > >>>> +		return ((base + len + (mss & 0x3fff)) < base);
> ...
> > For the bug to occur, the fragment does not have to span a 4G boundary. If it is
> > within MSS bytes (9.6k) of a 4G boundary, it triggers the failure.
>
> Would it be worth simplifying the test to assume that 'len'
> is 64k and 'mss' 9.6k?
> (commenting on the actual condition.)
> The number of false positives would be small, but the test
> a lot quicker.
> The '(u32)mapping + (0x10000 + 9600) < (u32)mapping' test might
> even be faster than the ' tg3_asic_rev(tp) == ASIC_REV_5762' one.

I think that if we do this and detect a false positive, it may be very
far from the 4G boundary.  The new skb that we allocate to workaround
the condition may be even closer to 4G and may hit the real bug
condition.

The mss and len values are accessed many times in this immediate code
path just before setting the TX BD, gcc should be able to optimize this
quite nicely.


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: David Laight @ 2013-05-14 15:20 UTC
To: Michael Chan; +Cc: Nithin Nayak Sujir, Eric Dumazet, davem, netdev, stable

> On Tue, 2013-05-14 at 09:40 +0100, David Laight wrote:
> > > >>>> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
> > > >>>> +		u32 base = (u32) mapping & 0xffffffff;
> > > >>>> +
> > > >>>> +		return ((base + len + (mss & 0x3fff)) < base);
> > ...
> > > For the bug to occur, the fragment does not have to span a 4G boundary. If it is
> > > within MSS bytes (9.6k) of a 4G boundary, it triggers the failure.
> >
> > Would it be worth simplifying the test to assume that 'len'
> > is 64k and 'mss' 9.6k?
> > (commenting on the actual condition.)
> > The number of false positives would be small, but the test
> > a lot quicker.
> > The '(u32)mapping + (0x10000 + 9600) < (u32)mapping' test might
> > even be faster than the ' tg3_asic_rev(tp) == ASIC_REV_5762' one.
>
> I think that if we do this and detect a false positive, it may be very
> far from the 4G boundary.

It can't be very far away, approx 1 in 65k checks would fail.
You could do the finer test afterwards.

> The new skb that we allocate to workaround the condition may be
> even closer to 4G and may hit the real bug condition.

If the 'fix' is to relocate the skb you are doomed to lose regardless
of the check - unless you are willing to reallocate a lot of times,
and without freeing the old skb.
I'd assumed the 'fix' was to disable the relevant offload.

> The mss and len values are accessed many times in this immediate code
> path just before setting the TX BD, gcc should be able to optimize this
> quite nicely.

I was looking at the number of branches in the hot path, not whether
the values were already in registers.

	David


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: Michael Chan @ 2013-05-14 16:19 UTC
To: David Laight; +Cc: Nithin Nayak Sujir, Eric Dumazet, davem, netdev, stable

On Tue, 2013-05-14 at 16:20 +0100, David Laight wrote:
> > On Tue, 2013-05-14 at 09:40 +0100, David Laight wrote:
> > > > >>>> +	if (tg3_asic_rev(tp) == ASIC_REV_5762 && mss) {
> > > > >>>> +		u32 base = (u32) mapping & 0xffffffff;
> > > > >>>> +
> > > > >>>> +		return ((base + len + (mss & 0x3fff)) < base);
> > > ...
> > > > For the bug to occur, the fragment does not have to span a 4G boundary. If it is
> > > > within MSS bytes (9.6k) of a 4G boundary, it triggers the failure.
> > >
> > > Would it be worth simplifying the test to assume that 'len'
> > > is 64k and 'mss' 9.6k?
> > > (commenting on the actual condition.)
> > > The number of false positives would be small, but the test
> > > a lot quicker.
> > > The '(u32)mapping + (0x10000 + 9600) < (u32)mapping' test might
> > > even be faster than the ' tg3_asic_rev(tp) == ASIC_REV_5762' one.
> >
> > I think that if we do this and detect a false positive, it may be very
> > far from the 4G boundary.
>
> It can't be very far away, approx 1 in 65k checks would fail.
> You could do the finer test afterwards.

If we do a 2nd level test, it will be ok.  But I'm not sure if it is
worth the complexity.

>
> > The new skb that we allocate to workaround the condition may be
> > even closer to 4G and may hit the real bug condition.
>
> If the 'fix' is to relocate the skb you are doomed to lose regardless
> of the check - unless you are willing to reallocate a lot of times,
> and without freeing the old skb.
> I'd assumed the 'fix' was to disable the relevant offload.

We relocate once and then drop the packet if we encounter additional
errors, including OOM, DMA mapping error, 4G boundary, etc.  The new
linear skb should not hit the 4G boundary again.  The room between the
end of this current buffer and 4G isn't big enough for the new linear
skb.

>
> > The mss and len values are accessed many times in this immediate code
> > path just before setting the TX BD, gcc should be able to optimize this
> > quite nicely.
>
> I was looking at the number of branches in the hot path, not whether
> the values were already in registers.
>

Isn't the number of branches the same whether we use actual values in
registers or fixed values?


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: Eric Dumazet @ 2013-05-14 16:46 UTC
To: Michael Chan; +Cc: David Laight, Nithin Nayak Sujir, davem, netdev, stable

On Tue, 2013-05-14 at 09:19 -0700, Michael Chan wrote:

> We relocate once and then drop the packet if we encounter additional
> errors, including OOM, DMA mapping error, 4G boundary, etc.  The new
> linear skb should not hit the 4G boundary again.  The room between the
> end of this current buffer and 4G isn't big enough for the new linear
> skb.

This reminds me of an issue on bnx2x:

bnx2x FW has a limitation on GSO packets: a single mss cannot span
more than 10 fragments.

After the "net: use a per task frag allocator" patch, it's possible for an
application interleaving small write() calls on several sockets to build
pathological skbs using 16 fragments (aka MAX_SKB_FRAGS) but a small
amount of payload.

The fast path should build skbs with 2 or 3 fragments, as fragments can be
order-3 pages.

The bnx2x driver performs an expensive skb_linearize() call, and this can
fail if memory is fragmented: skb->len can be around 64K, and including
the skb_shared_info overhead, we might need order-5 pages.


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: David Laight @ 2013-05-15 8:56 UTC
To: Michael Chan; +Cc: Nithin Nayak Sujir, Eric Dumazet, davem, netdev, stable

> > If the 'fix' is to relocate the skb you are doomed to lose regardless
> > of the check - unless you are willing to reallocate a lot of times,
> > and without freeing the old skb.
> > I'd assumed the 'fix' was to disable the relevant offload.
>
> We relocate once and then drop the packet if we encounter additional
> errors, including OOM, DMA mapping error, 4G boundary, etc.  The new
> linear skb should not hit the 4G boundary again.  The room between the
> end of this current buffer and 4G isn't big enough for the new linear
> skb.

The first skb might be just below the 4G boundary and the
second just below the 8G one.

	David


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: Michael Chan @ 2013-05-15 15:12 UTC
To: David Laight; +Cc: Nithin Nayak Sujir, Eric Dumazet, davem, netdev, stable

On Wed, 2013-05-15 at 09:56 +0100, David Laight wrote:
> > > If the 'fix' is to relocate the skb you are doomed to lose regardless
> > > of the check - unless you are willing to reallocate a lot of times,
> > > and without freeing the old skb.
> > > I'd assumed the 'fix' was to disable the relevant offload.
> >
> > We relocate once and then drop the packet if we encounter additional
> > errors, including OOM, DMA mapping error, 4G boundary, etc.  The new
> > linear skb should not hit the 4G boundary again.  The room between the
> > end of this current buffer and 4G isn't big enough for the new linear
> > skb.
>
> The first skb might be just below the 4G boundary and the
> second just below the 8G one.
>

We will discard the packet if that happens.  I think the probability is
very small.


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: Eric Dumazet @ 2013-05-15 15:23 UTC
To: Michael Chan; +Cc: David Laight, Nithin Nayak Sujir, davem, netdev, stable

On Wed, 2013-05-15 at 08:12 -0700, Michael Chan wrote:

>
> We will discard the packet if that happens.  I think the probability is
> very small.

Data corruption means the content of the packet was mangled on the wire,
or was it a more serious issue, like a tx queue hang?


* RE: [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO
From: Michael Chan @ 2013-05-15 15:51 UTC
To: Eric Dumazet; +Cc: David Laight, Nithin Nayak Sujir, davem, netdev, stable

On Wed, 2013-05-15 at 08:23 -0700, Eric Dumazet wrote:
> On Wed, 2013-05-15 at 08:12 -0700, Michael Chan wrote:
>
> >
> > We will discard the packet if that happens.  I think the probability is
> > very small.
>
> Data corruption means the content of the packet was mangled on the wire,
> or was it a more serious issue, like a tx queue hang?
>
>

I think the DMA engine is getting the wrong data under this condition so
we end up with bad data but correct header checksums on the wire.


* Re: [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down
From: David Miller @ 2013-05-14 18:32 UTC
To: nsujir; +Cc: netdev

From: "Nithin Nayak Sujir" <nsujir@broadcom.com>
Date: Mon, 13 May 2013 14:04:14 -0700

> v2:
>  - Modify tg3_phy_power_bug() function to use a switch instead of a
>    complicated if statement. Suggested by Joe Perches.
>
> Michael Chan (1):
>   tg3: Fix data corruption on 5725 with TSO
>
> Nithin Nayak Sujir (1):
>   tg3: Skip powering down function 0 on certain serdes devices

All applied, thanks.


Thread overview: 20+ messages (last message 2013-05-15 15:51 UTC)
2013-05-13 21:04 [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down Nithin Nayak Sujir
2013-05-13 21:04 ` [PATCH v2 net 1/2] tg3: Skip powering down function 0 on certain serdes devices Nithin Nayak Sujir
2013-05-14 18:08   ` Joe Perches
2013-05-14 18:17     ` Nithin Nayak Sujir
2013-05-13 21:04 ` [PATCH v2 net 2/2] tg3: Fix data corruption on 5725 with TSO Nithin Nayak Sujir
2013-05-13 21:14   ` Eric Dumazet
2013-05-13 21:34     ` Nithin Nayak Sujir
2013-05-13 21:40       ` Eric Dumazet
2013-05-13 21:47         ` Nithin Nayak Sujir
2013-05-13 22:10           ` Eric Dumazet
2013-05-14  8:40           ` David Laight
2013-05-14 15:04             ` Michael Chan
2013-05-14 15:20               ` David Laight
2013-05-14 16:19                 ` Michael Chan
2013-05-14 16:46                   ` Eric Dumazet
2013-05-15  8:56                   ` David Laight
2013-05-15 15:12                     ` Michael Chan
2013-05-15 15:23                       ` Eric Dumazet
2013-05-15 15:51                         ` Michael Chan
2013-05-14 18:32 ` [PATCH v2 net 0/2] tg3: 2 bugfixes - TSO data corruption and phy power down David Miller