* [REGRESSION] r8169: jumbo fixes caused jumbo regressions!
From: Kirill Smelkov @ 2012-11-13 17:06 UTC (permalink / raw)
To: Francois Romieu
Cc: Realtek linux nic maintainers, Hayes Wang, David S. Miller,
Greg Kroah-Hartman, netdev
Short description:
I run net-next on my netbook with a yukon2 ethernet controller, and stable-3.0
at work on several hosts with PCIe Realtek network chips. Upgrading those hosts
from 3.0.45 to 3.0.46 revealed a jumbo-related regression caused by
r8169: jumbo fixes.
which is
cc669c37ba4a9c5c54c7842d0c9428aab64d62d7 at stable-3.0, and
d58d46b5d85139d18eb939aa7279c160bab70484 upstream
The problem is that it is no longer possible to use a 7200 mtu and tx checksum
offload. Both features used to work without problems.
Details
-------
I have two machines with realtek chips in them. They are
eth0: RTL8168cp/8111cp at 0xdffb8000, 00:18:7d:11:83:2b, XID 1cb00080 IRQ 16
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
and
eth0: RTL8168c/8111c at 0xf8062000, 00:22:15:90:7e:c6, XID 1c4000c0 IRQ 17
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
Looking at the chips themselves, I can confirm that they are labelled RTL8111CP
and RTL8111C respectively.
I used to set mtu=7200 and turn tx checksum offload on for both, and
transmit/receive almost gigabit traffic from/to either of them without a
problem. This worked fine until the upgrade from 3.0.45 to 3.0.46, where
things broke. Now the r8169 driver says for both devices:
eth0: jumbo features [frames: 6128 bytes, tx checksumming: ko]
i.e. only 6128 max mtu and no support for tx checksum offload.
Indeed, for one thing the patch says tx checksumming cannot work together with
jumbo frames:
commit cc669c37ba4a9c5c54c7842d0c9428aab64d62d7
Author: Francois Romieu <romieu@fr.zoreil.com>
AuthorDate: Fri Oct 5 23:29:11 2012 +0200
Commit: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CommitDate: Sat Oct 13 05:28:12 2012 +0900
r8169: jumbo fixes.
commit d58d46b5d85139d18eb939aa7279c160bab70484 upstream.
- fix features : jumbo frames and checksumming can not be used at the
same time.
- introduce hw_jumbo_{enable / disable} helpers. Their content has been
creatively extracted from Realtek's own drivers. As an illustration,
it would be nice to know how/if the MaxTxPacketSize register operates
when the device can work with a 9k jumbo frame as its documentation
(8168c) can not be applied beyond ~7k.
- rtl_tx_performance_tweak is moved forward. No change.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
but again, I say that up till now I've used ~7K jumbos with tx checksum offload
just fine on those chips:
My test is to stream raw video from 8 PAL cameras to net - 4 for 720x576@25 and
4 for 360x288@25 which for YUYV format occupies ~ 860 Mbps of bandwidth. The
program to transmit/receive video is here: http://repo.or.cz/w/rawv.git
For video sources, the vivi.ko video driver is used with fps set to 25. The
streams are generated with
$ rawv -d /dev/video$X,720x576 -t 239.255.17.$X:1200$X # X=1..4, 5834 eth framelen
$ rawv -d /dev/video$X,360x288 -t 239.255.17.$X:1200$X # X=5..8, 6554 eth framelen
(the second case already exceeds 6K jumbos), and also, to come
close to the 7K limit, with
$ rawv -d /dev/video$X,708x576 -t 239.255.17.$X:1200$X # X=1..4, 7154 eth framelen
$ rawv -d /dev/video$X,352x288 -t 239.255.17.$X:1200$X # X=5..8, 7114 eth framelen
This used to work fine with mtu set to 7200 or 7152 (7152 + 14 + 2 = 7168 = 1024*7,
the max eth framelen) and tx csum offload turned on via `ethtool -K eth0 tx on`.
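As a side note, the frame-size arithmetic above can be sketched in a few lines of C. This is only an illustration, not driver code: the helper names are made up, and it assumes that the "eth framelen" figures include the 14-byte Ethernet header and that the extra 2 bytes are simply the ones added in the sum above.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed constants: a 14-byte Ethernet header, plus the 2 extra bytes
 * that the message adds on top of the MTU. */
enum { ETH_HDR_LEN = 14, EXTRA_LEN = 2 };

/* Largest MTU for a frame limit of `kib` KiB, following the
 * mtu + 14 + 2 = 1024 * kib arithmetic from the message. */
static int max_mtu_for_kib(int kib)
{
    return kib * 1024 - ETH_HDR_LEN - EXTRA_LEN;
}

/* A frame of `framelen` bytes (Ethernet header included, by assumption)
 * carries framelen - 14 bytes, which must not exceed the MTU. */
static bool frame_fits_mtu(int framelen, int mtu)
{
    return framelen - ETH_HDR_LEN <= mtu;
}

/* Runtime check of the two limits that appear in this thread. */
static void check_quoted_limits(void)
{
    assert(max_mtu_for_kib(6) == 6128); /* value printed by the driver */
    assert(max_mtu_for_kib(7) == 7152); /* value used in the tests */
}
```

Under these assumptions, the 5834-byte frames fit within a 6128 mtu, while the 6554-, 7114- and 7154-byte frames need the 7152 mtu.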
Patching the driver to know "true xid"
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index f7a56f4..247a238 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1773,6 +1778,7 @@ static void rtl8169_get_mac_version(struct rtl8169_private *tp,
 	reg = RTL_R32(TxConfig);
 	while ((reg & p->mask) != p->val)
 		p++;
+	dprintk("mac_version for 0x%08x (0x%08x): %i\n", reg, reg & 0x9cf0f8ff,p->mac_version);
 	tp->mac_version = p->mac_version;
 	if (tp->mac_version == RTL_GIGA_MAC_NONE) {
I've found that RTL_R32(TxConfig) is 0x3fb00080 and 0x3f4006c0 for my chips.
Judging by the table in rtl8169_get_mac_version(), this gives
RTL_GIGA_MAC_VER_24 and RTL_GIGA_MAC_VER_22 respectively.
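For readers following along, here is a minimal standalone sketch of that lookup. The 0x9cf0f8ff mask and the two TxConfig values come from this message; the three table rows are illustrative assumptions chosen so that the two values resolve to versions 24 and 22, and are not claimed to be the driver's actual table. The names xid_of and lookup_mac_version are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask taken from the dprintk() line in the debug patch. */
#define XID_MASK 0x9cf0f8ffu

struct mac_id {
    uint32_t mask;
    uint32_t val;
    int mac_version; /* 0 stands in for RTL_GIGA_MAC_NONE here */
};

/* Illustrative rows only (an assumption, not copied from the driver).
 * The last row matches everything, which is what terminates the loop. */
static const struct mac_id mac_ids[] = {
    { 0x7cf00000u, 0x3cb00000u, 24 },
    { 0x7cf00000u, 0x3c400000u, 22 },
    { 0x00000000u, 0x00000000u, 0 },
};

/* The masked value that the driver log reports as "XID". */
static uint32_t xid_of(uint32_t txconfig)
{
    return txconfig & XID_MASK;
}

/* Same first-match loop shape as in the patched function. */
static int lookup_mac_version(uint32_t txconfig)
{
    const struct mac_id *p = mac_ids;

    while ((txconfig & p->mask) != p->val)
        p++;
    return p->mac_version;
}

/* The masked register values match the XIDs in the log lines quoted
 * at the top of this message. */
static void check_xids(void)
{
    assert(xid_of(0x3fb00080u) == 0x1cb00080u);
    assert(xid_of(0x3f4006c0u) == 0x1c4000c0u);
}
```

The check at the end ties the register reads back to the "XID 1cb00080" and "XID 1c4000c0" strings printed at probe time.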
Then I'm now running 3.0.46 kernel with the following patch applied
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index f7a56f4..247a238 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -210,11 +212,11 @@ static const struct {
 	[RTL_GIGA_MAC_VER_21] =
 		_R("RTL8168c/8111c", RTL_TD_1, NULL, JUMBO_6K, false),
 	[RTL_GIGA_MAC_VER_22] =
-		_R("RTL8168c/8111c", RTL_TD_1, NULL, JUMBO_6K, false),
+		_R("RTL8168c/8111c", RTL_TD_1, NULL, JUMBO_7K, true),
 	[RTL_GIGA_MAC_VER_23] =
 		_R("RTL8168cp/8111cp", RTL_TD_1, NULL, JUMBO_6K, false),
 	[RTL_GIGA_MAC_VER_24] =
-		_R("RTL8168cp/8111cp", RTL_TD_1, NULL, JUMBO_6K, false),
+		_R("RTL8168cp/8111cp", RTL_TD_1, NULL, JUMBO_7K, true),
 	[RTL_GIGA_MAC_VER_25] =
 		_R("RTL8168d/8111d", RTL_TD_1, FIRMWARE_8168D_1,
 		   JUMBO_9K, false),
and ~7K jumbos and tx csum offload work again.
(by the way, on atom system, without tx csum offload, half of cpu time
is spent only to calculate checksums...)
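To make the effect of the table patch above concrete, here is a rough standalone sketch of how the two per-chip fields could gate the MTU and tx checksum offload. It is an assumption-laden illustration rather than the driver's code: struct chip_info and both helpers are hypothetical, the JUMBO_* definitions are written so that they reproduce the 6128 and 7152 figures from this message, and the 1500-byte threshold follows the "MTU > 1500" convention mentioned for this hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed definitions, chosen to reproduce the numbers in this thread. */
#define ETH_HLEN 14
#define JUMBO_1K 1500
#define JUMBO_6K (6 * 1024 - ETH_HLEN - 2)
#define JUMBO_7K (7 * 1024 - ETH_HLEN - 2)

/* Hypothetical, trimmed-down stand-in for one row of the chip table. */
struct chip_info {
    const char *name;
    int jumbo_max;      /* largest MTU accepted for this chip */
    bool jumbo_tx_csum; /* may tx checksumming stay on with jumbos? */
};

/* An MTU is accepted only up to the per-chip limit. */
static bool mtu_allowed(const struct chip_info *ci, int mtu)
{
    return mtu <= ci->jumbo_max;
}

/* Tx checksum offload stays available for standard-size frames, and
 * for jumbo frames only when the chip row allows it. */
static bool tx_csum_allowed(const struct chip_info *ci, int mtu)
{
    return mtu <= JUMBO_1K || ci->jumbo_tx_csum;
}

/* The limits match the driver message and the MTU used in the tests. */
static void check_jumbo_constants(void)
{
    assert(JUMBO_6K == 6128);
    assert(JUMBO_7K == 7152);
}
```

With a row of { JUMBO_6K, false }, an mtu of 7152 is rejected and offload is dropped for any jumbo mtu; with { JUMBO_7K, true }, both are accepted again, which is the behaviour the patch restores.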
Now I wonder, where that 6K limit came from and why they say it is now
not possible to use jumbos together with tx csum offload? Is my testing
enough to justify raising the limits and allowing tx offload? If yes,
then how do we handle this regression?
Thanks,
Kirill
P.S. Just for info: I've also tried 9K jumbos, but they do not work on
either of my chips.
* Re: [REGRESSION] r8169: jumbo fixes caused jumbo regressions!
From: Francois Romieu @ 2012-11-13 22:35 UTC (permalink / raw)
To: Kirill Smelkov
Cc: Realtek linux nic maintainers, Hayes Wang, David S. Miller,
Greg Kroah-Hartman, netdev
Kirill Smelkov <kirr@mns.spb.ru> :
[...]
> My test is to stream raw video from 8 PAL cameras to net - 4 for 720x576@25 and
> 4 for 360x288@25 which for YUYV format occupies ~ 860 Mbps of bandwidth. The
> program to transmit/receive video is here: http://repo.or.cz/w/rawv.git
$ git clone http://repo.or.cz/w/rawv.git
Cloning into 'rawv'...
fatal: http://repo.or.cz/w/rawv.git/info/refs not valid: is this a git repository?
[...]
> (by the way, on atom system, without tx csum offload, half of cpu time
> is spent only to calculate checksums...)
:o(
> Now I wonder, where that 6K limit came from and why they say it is now
> not possible to use jumbos together with tx csum offload ?
Here is an excerpt from a mail where Hayes explained the rules of
engagement back in may 2011 (John Lumby and Chris Friesen were Cced then):
! The Max tx sizes for 8168 series are as following:
!
! 8168B is 4K bytes.
! 8168C and 8168CP are 6K bytes.
! 8168D and later are 9K bytes.
!
! Note that these sizes all include head size. That is, the mtu must less than
! these values.
! You have to enable Jumbo frame feature when the tx size is large, otherwise the
! packet would not be sent. Because the hw doesn't provide the threshold, the
! checking for MTU > 1500 is just for convenience for sw.
!
! The TSO couldn't work with some feature which need to disable hw checksum, such
! as Jumbo frame. The hw checksum have to be disabled in certain situations, so
! the TSO feature should be checked in these situations, too.
> Is my testing enough to justify raising the limits and allowing tx offload ?
I don't oppose knobs to go off-limits but I'll need some rather good reason
before changing the manufacturer's suggested defaults.
--
Ueimor
* Re: [REGRESSION] r8169: jumbo fixes caused jumbo regressions!
From: Kirill Smelkov @ 2012-11-14 9:25 UTC (permalink / raw)
To: Francois Romieu
Cc: Realtek linux nic maintainers, Hayes Wang, David S. Miller,
Greg Kroah-Hartman, netdev
On Tue, Nov 13, 2012 at 11:35:12PM +0100, Francois Romieu wrote:
> Kirill Smelkov <kirr@mns.spb.ru> :
> [...]
> > My test is to stream raw video from 8 PAL cameras to net - 4 for 720x576@25 and
> > 4 for 360x288@25 which for YUYV format occupies ~ 860 Mbps of bandwidth. The
> > program to transmit/receive video is here: http://repo.or.cz/w/rawv.git
>
> $ git clone http://repo.or.cz/w/rawv.git
> Cloning into 'rawv'...
> fatal: http://repo.or.cz/w/rawv.git/info/refs not valid: is this a git repository?
That's a gitweb view. The actual git repo is here:
git://repo.or.cz/rawv.git
sorry for confusion.
(If you want to test with vivi on a slow system, you'll need my
patches, currently staged in the media-tree patchwork, or available here:
git://repo.or.cz/linux-2.6/kirr.git vivi-speedup-and-fps.over-net-next )
> [...]
> > (by the way, on atom system, without tx csum offload, half of cpu time
> > is spent only to calculate checksums...)
>
> :o(
yes, that large. In top, my workload is
                            %sy   %id   %si
default driver load         ~25   ~45   ~27
(ethtool -k shows
 tx-checksumming: off)
after                        ~8   ~81   ~11
`ethtool -K eth0 tx on`
that's why the issue is important.
> > Now I wonder, where that 6K limit came from and why they say it is now
> > not possible to use jumbos together with tx csum offload ?
>
> Here is an excerpt from a mail where Hayes explained the rules of
> engagement back in may 2011 (John Lumby and Chris Friesen were Cced then):
Can't find that mail in gmane netdev archive and on google, to restore
full context. Was that in private?
> ! The Max tx sizes for 8168 series are as following:
> !
> ! 8168B is 4K bytes.
> ! 8168C and 8168CP are 6K bytes.
> ! 8168D and later are 9K bytes.
> !
> ! Note that these sizes all include head size. That is, the mtu must less than
> ! these values.
> ! You have to enable Jumbo frame feature when the tx size is large, otherwise the
> ! packet would not be sent. Because the hw doesn't provide the threshold, the
> ! checking for MTU > 1500 is just for convenience for sw.
This part is clear.
> ! The TSO couldn't work with some feature which need to disable hw checksum, such
> ! as Jumbo frame. The hw checksum have to be disabled in certain situations, so
> ! the TSO feature should be checked in these situations, too.
I don't enable TSO nor I need it. The text indirectly says that hw
checksum should be disabled when jumbo frames are used.
~~~~
Hayes, Realtek linux nic maintainers,
could you please confirm that for all 8168C and 8168CP jumbo_max is
6K and that when jumbos are used, tx checksumming should be off?
If so, how come my two chips work stable with ~7K jumbos and tx checksum
offload on (tested this night again for ~16 hours without any problem).
thanks beforehand.
> > Is my testing enough to justify raising the limits and allowing tx offload ?
>
> I don't oppose knobs to go off-limits but I'll need some rather good reason
> before changing the manufacturer's suggested defaults.
Thanks. Let's see what Realtek people say.
Kirill
* Re: [REGRESSION] r8169: jumbo fixes caused jumbo regressions!
From: Kirill Smelkov @ 2012-11-26 16:19 UTC (permalink / raw)
To: Hayes Wang, Realtek linux nic maintainers
Cc: Francois Romieu, David S. Miller, Greg Kroah-Hartman, netdev
On Wed, Nov 14, 2012 at 01:25:30PM +0400, Kirill Smelkov wrote:
> On Tue, Nov 13, 2012 at 11:35:12PM +0100, Francois Romieu wrote:
> > Kirill Smelkov <kirr@mns.spb.ru> :
> > [...]
> > > My test is to stream raw video from 8 PAL cameras to net - 4 for 720x576@25 and
> > > 4 for 360x288@25 which for YUYV format occupies ~ 860 Mbps of bandwidth. The
> > > program to transmit/receive video is here: http://repo.or.cz/w/rawv.git
[...]
> > > (by the way, on atom system, without tx csum offload, half of cpu time
> > > is spent only to calculate checksums...)
> >
> > :o(
>
> yes, that large. In top, my workload is
>
>                             %sy   %id   %si
>
> default driver load         ~25   ~45   ~27
> (ethtool -k shows
>  tx-checksumming: off)
>
> after                        ~8   ~81   ~11
> `ethtool -K eth0 tx on`
>
>
> that's why the issue is important.
>
>
> > > Now I wonder, where that 6K limit came from and why they say it is now
> > > not possible to use jumbos together with tx csum offload ?
> >
> > Here is an excerpt from a mail where Hayes explained the rules of
> > engagement back in may 2011 (John Lumby and Chris Friesen were Cced then):
>
> Can't find that mail in gmane netdev archive and on google, to restore
> full context. Was that in private?
>
>
> > ! The Max tx sizes for 8168 series are as following:
> > !
> > ! 8168B is 4K bytes.
> > ! 8168C and 8168CP are 6K bytes.
> > ! 8168D and later are 9K bytes.
> > !
> > ! Note that these sizes all include head size. That is, the mtu must less than
> > ! these values.
> > ! You have to enable Jumbo frame feature when the tx size is large, otherwise the
> > ! packet would not be sent. Because the hw doesn't provide the threshold, the
> > ! checking for MTU > 1500 is just for convenience for sw.
>
> This part is clear.
>
>
> > ! The TSO couldn't work with some feature which need to disable hw checksum, such
> > ! as Jumbo frame. The hw checksum have to be disabled in certain situations, so
> > ! the TSO feature should be checked in these situations, too.
>
> I don't enable TSO nor I need it. The text indirectly says that hw
> checksum should be disabled when jumbo frames are used.
[...]
> ~~~~
>
> Hayes, Realtek linux nic maintainers,
>
> could you please confirm that for all 8168C and 8168CP jumbo_max is
> 6K and that when jumbos are used, tx checksumming should be off?
>
> If so, how come my two chips work stable with ~7K jumbos and tx checksum
> offload on (tested this night again for ~16 hours without any problem).
>
> thanks beforehand.
Dear Hayes, Realtek linux nic maintainers,
Two years ago I specifically chose a motherboard with an RTL8111CP for our
current products, because the Linux driver supported large-enough jumbo
frames and tx/rx offload.
Now they say that jumbo frames should be lowered in length and tx
offload is gone, but my nics still work without problems with the old ~7K
jumbos and tx checksum offload. To keep current systems working I either
have to choose other hardware, or patch the driver against what
people say was the info from the manufacturer.
I like neither applying risky patches nor replacing already proven hardware
with something else without a good reason. So, as Realtek
representatives,
could you please confirm that for all 8168C and 8168CP jumbo_max is
6K and that when jumbos are used, tx checksumming should be off?
Thanks beforehand,
Kirill
P.S. If so, how come my two chips work stable with ~7K jumbos and tx
checksum offload on (last time tested for ~16 hours without any problem)?