* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-11 9:32 ` Vladimir Oltean
0 siblings, 0 replies; 51+ messages in thread
From: Vladimir Oltean @ 2022-05-11 9:32 UTC (permalink / raw)
To: Felix Fietkau
Cc: Sean Wang, Landen Chao, DENG Qingfang, Andrew Lunn,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
On Wed, May 11, 2022 at 10:50:17AM +0200, Felix Fietkau wrote:
> Hi Vladimir,
>
> On 11.05.22 00:21, Vladimir Oltean wrote:
> > It sounds as if this is masking a problem on the receiver end, because
> > not only does my enetc port receive the packet, it also replies to the
> > ARP request.
> >
> > pc # sudo tcpreplay -i eth1 arp-broken.pcap
> > root@debian:~# ip addr add 192.168.42.1/24 dev eno0
> > root@debian:~# tcpdump -i eno0 -e -n --no-promiscuous-mode arp
> > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> > listening on eno0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> > 22:18:58.846753 f4:d4:88:5e:6f:d2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.42.1 tell 192.168.42.173, length 46
> > 22:18:58.846806 00:04:9f:05:f4:ab > f4:d4:88:5e:6f:d2, ethertype ARP (0x0806), length 42: Reply 192.168.42.1 is-at 00:04:9f:05:f4:ab, length 28
> > ^C
> > 2 packets captured
> > 2 packets received by filter
> > 0 packets dropped by kernel
> >
> > What MAC/driver has trouble with these packets? Is there anything wrong
> > in ethtool stats? Do they even reach software? You can also use
> > "dropwatch -l kas" for some hints if they do.
>
> For some reason I can't reproduce the issue of ARPs not getting replies
> anymore.
> The garbage data is still present in the ARP packets without my patch
> though. So regardless of whether ARP packets are processed correctly or if
> they just trip up on some receivers under specific conditions, I believe my
> patch is valid and should be applied.
I don't have a very strong opinion regarding whether to apply the patch or not.
I think we've removed it from bug fix territory now, until proven otherwise.
I do care about the justification (commit message, comments) being
correct though. If you cannot reproduce now, someone one year from now
surely cannot reproduce it either, and won't know why the code is there.
FYI, the reason why you call __skb_put_padto() is not the reason why
others call __skb_put_padto().
> Who knows, maybe the garbage padding even leaks some data from previous
> packets, or some other information from within the switch.
I mean, the padding has to come from somewhere, no? Although I'd
probably imagine non-scrubbed buffer cells rather than data structures...
Let's see what others have to say. I've been wanting to make the policy
of whether to call __skb_put_padto() standardized for all tagging protocol
drivers (similar to what is done in dsa_realloc_skb() and below it).
We pad for tail taggers, maybe we can always pad and this removes a
conditional, and simplifies taggers. Side note, I already dislike that
the comment in tag_brcm.c is out of sync with the code. It says that
padding up to ETH_ZLEN is necessary, but proceeds to pad up until
ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
It would be nice if we could use the simple eth_skb_pad().
But there will be a small performance degradation for small packets due
to the memset in __skb_pad(), which I'm not sure is worth the change.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-11 9:32 ` Vladimir Oltean
0 siblings, 0 replies; 51+ messages in thread
From: Vladimir Oltean @ 2022-05-11 9:32 UTC (permalink / raw)
To: Felix Fietkau
Cc: Sean Wang, Landen Chao, DENG Qingfang, Andrew Lunn,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
On Wed, May 11, 2022 at 10:50:17AM +0200, Felix Fietkau wrote:
> Hi Vladimir,
>
> On 11.05.22 00:21, Vladimir Oltean wrote:
> > It sounds as if this is masking a problem on the receiver end, because
> > not only does my enetc port receive the packet, it also replies to the
> > ARP request.
> >
> > pc # sudo tcpreplay -i eth1 arp-broken.pcap
> > root@debian:~# ip addr add 192.168.42.1/24 dev eno0
> > root@debian:~# tcpdump -i eno0 -e -n --no-promiscuous-mode arp
> > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> > listening on eno0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> > 22:18:58.846753 f4:d4:88:5e:6f:d2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.42.1 tell 192.168.42.173, length 46
> > 22:18:58.846806 00:04:9f:05:f4:ab > f4:d4:88:5e:6f:d2, ethertype ARP (0x0806), length 42: Reply 192.168.42.1 is-at 00:04:9f:05:f4:ab, length 28
> > ^C
> > 2 packets captured
> > 2 packets received by filter
> > 0 packets dropped by kernel
> >
> > What MAC/driver has trouble with these packets? Is there anything wrong
> > in ethtool stats? Do they even reach software? You can also use
> > "dropwatch -l kas" for some hints if they do.
>
> For some reason I can't reproduce the issue of ARPs not getting replies
> anymore.
> The garbage data is still present in the ARP packets without my patch
> though. So regardless of whether ARP packets are processed correctly or if
> they just trip up on some receivers under specific conditions, I believe my
> patch is valid and should be applied.
I don't have a very strong opinion regarding whether to apply the patch or not.
I think we've removed it from bug fix territory now, until proven otherwise.
I do care about the justification (commit message, comments) being
correct though. If you cannot reproduce now, someone one year from now
surely cannot reproduce it either, and won't know why the code is there.
FYI, the reason why you call __skb_put_padto() is not the reason why
others call __skb_put_padto().
> Who knows, maybe the garbage padding even leaks some data from previous
> packets, or some other information from within the switch.
I mean, the padding has to come from somewhere, no? Although I'd
probably imagine non-scrubbed buffer cells rather than data structures...
Let's see what others have to say. I've been wanting to make the policy
of whether to call __skb_put_padto() standardized for all tagging protocol
drivers (similar to what is done in dsa_realloc_skb() and below it).
We pad for tail taggers, maybe we can always pad and this removes a
conditional, and simplifies taggers. Side note, I already dislike that
the comment in tag_brcm.c is out of sync with the code. It says that
padding up to ETH_ZLEN is necessary, but proceeds to pad up until
ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
It would be nice if we could use the simple eth_skb_pad().
But there will be a small performance degradation for small packets due
to the memset in __skb_pad(), which I'm not sure is worth the change.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
2022-05-11 9:32 ` Vladimir Oltean
(?)
@ 2022-05-11 12:24 ` Felix Fietkau
-1 siblings, 0 replies; 51+ messages in thread
From: Felix Fietkau @ 2022-05-11 12:24 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Sean Wang, Landen Chao, DENG Qingfang, Andrew Lunn,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
On 11.05.22 11:32, Vladimir Oltean wrote:
> On Wed, May 11, 2022 at 10:50:17AM +0200, Felix Fietkau wrote:
>> Hi Vladimir,
>>
>> On 11.05.22 00:21, Vladimir Oltean wrote:
>> > It sounds as if this is masking a problem on the receiver end, because
>> > not only does my enetc port receive the packet, it also replies to the
>> > ARP request.
>> >
>> > pc # sudo tcpreplay -i eth1 arp-broken.pcap
>> > root@debian:~# ip addr add 192.168.42.1/24 dev eno0
>> > root@debian:~# tcpdump -i eno0 -e -n --no-promiscuous-mode arp
>> > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
>> > listening on eno0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
>> > 22:18:58.846753 f4:d4:88:5e:6f:d2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.42.1 tell 192.168.42.173, length 46
>> > 22:18:58.846806 00:04:9f:05:f4:ab > f4:d4:88:5e:6f:d2, ethertype ARP (0x0806), length 42: Reply 192.168.42.1 is-at 00:04:9f:05:f4:ab, length 28
>> > ^C
>> > 2 packets captured
>> > 2 packets received by filter
>> > 0 packets dropped by kernel
>> >
>> > What MAC/driver has trouble with these packets? Is there anything wrong
>> > in ethtool stats? Do they even reach software? You can also use
>> > "dropwatch -l kas" for some hints if they do.
>>
>> For some reason I can't reproduce the issue of ARPs not getting replies
>> anymore.
>> The garbage data is still present in the ARP packets without my patch
>> though. So regardless of whether ARP packets are processed correctly or if
>> they just trip up on some receivers under specific conditions, I believe my
>> patch is valid and should be applied.
>
> I don't have a very strong opinion regarding whether to apply the patch or not.
> I think we've removed it from bug fix territory now, until proven otherwise.
I strongly disagree. Without my fix we're relying on undefined behavior
of the hardware, since the switch requires padding that accounts for the
special tag.
> I do care about the justification (commit message, comments) being
> correct though. If you cannot reproduce now, someone one year from now
> surely cannot reproduce it either, and won't know why the code is there.
I think there is some misunderstanding here. I absolutely can reproduce
the corrupted padding reliably, and it matches what I put into commit
message and comments.
The issue that I can't reproduce reliably at the moment (ARP reception
failure) is something that I only pointed out in a reply to this thread.
This is what prompted me to look into the padding issue in the first
place, and it also matches reports about connectivity issues that I got
from other people.
> FYI, the reason why you call __skb_put_padto() is not the reason why
> others call __skb_put_padto().
It matches the call in tag_brcm.c (because I copied it from there), it's
just that the symptoms that I'm fixing are different (undefined behavior
instead of hard packet drop in the switch logic).
>> Who knows, maybe the garbage padding even leaks some data from previous
>> packets, or some other information from within the switch.
>
> I mean, the padding has to come from somewhere, no? Although I'd
> probably imagine non-scrubbed buffer cells rather than data structures...
>
> Let's see what others have to say. I've been wanting to make the policy
> of whether to call __skb_put_padto() standardized for all tagging protocol
> drivers (similar to what is done in dsa_realloc_skb() and below it).
> We pad for tail taggers, maybe we can always pad and this removes a
> conditional, and simplifies taggers. Side note, I already dislike that
> the comment in tag_brcm.c is out of sync with the code. It says that
> padding up to ETH_ZLEN is necessary, but proceeds to pad up until
> ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
> It would be nice if we could use the simple eth_skb_pad().
>
> But there will be a small performance degradation for small packets due
> to the memset in __skb_pad(), which I'm not sure is worth the change.
I guess we have different views on this. In my opinion, correctness
matters more in this case than the tiny performance degradation.
- Felix
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-11 12:24 ` Felix Fietkau
0 siblings, 0 replies; 51+ messages in thread
From: Felix Fietkau @ 2022-05-11 12:24 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Sean Wang, Landen Chao, DENG Qingfang, Andrew Lunn,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
On 11.05.22 11:32, Vladimir Oltean wrote:
> On Wed, May 11, 2022 at 10:50:17AM +0200, Felix Fietkau wrote:
>> Hi Vladimir,
>>
>> On 11.05.22 00:21, Vladimir Oltean wrote:
>> > It sounds as if this is masking a problem on the receiver end, because
>> > not only does my enetc port receive the packet, it also replies to the
>> > ARP request.
>> >
>> > pc # sudo tcpreplay -i eth1 arp-broken.pcap
>> > root@debian:~# ip addr add 192.168.42.1/24 dev eno0
>> > root@debian:~# tcpdump -i eno0 -e -n --no-promiscuous-mode arp
>> > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
>> > listening on eno0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
>> > 22:18:58.846753 f4:d4:88:5e:6f:d2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.42.1 tell 192.168.42.173, length 46
>> > 22:18:58.846806 00:04:9f:05:f4:ab > f4:d4:88:5e:6f:d2, ethertype ARP (0x0806), length 42: Reply 192.168.42.1 is-at 00:04:9f:05:f4:ab, length 28
>> > ^C
>> > 2 packets captured
>> > 2 packets received by filter
>> > 0 packets dropped by kernel
>> >
>> > What MAC/driver has trouble with these packets? Is there anything wrong
>> > in ethtool stats? Do they even reach software? You can also use
>> > "dropwatch -l kas" for some hints if they do.
>>
>> For some reason I can't reproduce the issue of ARPs not getting replies
>> anymore.
>> The garbage data is still present in the ARP packets without my patch
>> though. So regardless of whether ARP packets are processed correctly or if
>> they just trip up on some receivers under specific conditions, I believe my
>> patch is valid and should be applied.
>
> I don't have a very strong opinion regarding whether to apply the patch or not.
> I think we've removed it from bug fix territory now, until proven otherwise.
I strongly disagree. Without my fix we're relying on undefined behavior
of the hardware, since the switch requires padding that accounts for the
special tag.
> I do care about the justification (commit message, comments) being
> correct though. If you cannot reproduce now, someone one year from now
> surely cannot reproduce it either, and won't know why the code is there.
I think there is some misunderstanding here. I absolutely can reproduce
the corrupted padding reliably, and it matches what I put into commit
message and comments.
The issue that I can't reproduce reliably at the moment (ARP reception
failure) is something that I only pointed out in a reply to this thread.
This is what prompted me to look into the padding issue in the first
place, and it also matches reports about connectivity issues that I got
from other people.
> FYI, the reason why you call __skb_put_padto() is not the reason why
> others call __skb_put_padto().
It matches the call in tag_brcm.c (because I copied it from there), it's
just that the symptoms that I'm fixing are different (undefined behavior
instead of hard packet drop in the switch logic).
>> Who knows, maybe the garbage padding even leaks some data from previous
>> packets, or some other information from within the switch.
>
> I mean, the padding has to come from somewhere, no? Although I'd
> probably imagine non-scrubbed buffer cells rather than data structures...
>
> Let's see what others have to say. I've been wanting to make the policy
> of whether to call __skb_put_padto() standardized for all tagging protocol
> drivers (similar to what is done in dsa_realloc_skb() and below it).
> We pad for tail taggers, maybe we can always pad and this removes a
> conditional, and simplifies taggers. Side note, I already dislike that
> the comment in tag_brcm.c is out of sync with the code. It says that
> padding up to ETH_ZLEN is necessary, but proceeds to pad up until
> ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
> It would be nice if we could use the simple eth_skb_pad().
>
> But there will be a small performance degradation for small packets due
> to the memset in __skb_pad(), which I'm not sure is worth the change.
I guess we have different views on this. In my opinion, correctness
matters more in this case than the tiny performance degradation.
- Felix
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-11 12:24 ` Felix Fietkau
0 siblings, 0 replies; 51+ messages in thread
From: Felix Fietkau @ 2022-05-11 12:24 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Sean Wang, Landen Chao, DENG Qingfang, Andrew Lunn,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
On 11.05.22 11:32, Vladimir Oltean wrote:
> On Wed, May 11, 2022 at 10:50:17AM +0200, Felix Fietkau wrote:
>> Hi Vladimir,
>>
>> On 11.05.22 00:21, Vladimir Oltean wrote:
>> > It sounds as if this is masking a problem on the receiver end, because
>> > not only does my enetc port receive the packet, it also replies to the
>> > ARP request.
>> >
>> > pc # sudo tcpreplay -i eth1 arp-broken.pcap
>> > root@debian:~# ip addr add 192.168.42.1/24 dev eno0
>> > root@debian:~# tcpdump -i eno0 -e -n --no-promiscuous-mode arp
>> > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
>> > listening on eno0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
>> > 22:18:58.846753 f4:d4:88:5e:6f:d2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.42.1 tell 192.168.42.173, length 46
>> > 22:18:58.846806 00:04:9f:05:f4:ab > f4:d4:88:5e:6f:d2, ethertype ARP (0x0806), length 42: Reply 192.168.42.1 is-at 00:04:9f:05:f4:ab, length 28
>> > ^C
>> > 2 packets captured
>> > 2 packets received by filter
>> > 0 packets dropped by kernel
>> >
>> > What MAC/driver has trouble with these packets? Is there anything wrong
>> > in ethtool stats? Do they even reach software? You can also use
>> > "dropwatch -l kas" for some hints if they do.
>>
>> For some reason I can't reproduce the issue of ARPs not getting replies
>> anymore.
>> The garbage data is still present in the ARP packets without my patch
>> though. So regardless of whether ARP packets are processed correctly or if
>> they just trip up on some receivers under specific conditions, I believe my
>> patch is valid and should be applied.
>
> I don't have a very strong opinion regarding whether to apply the patch or not.
> I think we've removed it from bug fix territory now, until proven otherwise.
I strongly disagree. Without my fix we're relying on undefined behavior
of the hardware, since the switch requires padding that accounts for the
special tag.
> I do care about the justification (commit message, comments) being
> correct though. If you cannot reproduce now, someone one year from now
> surely cannot reproduce it either, and won't know why the code is there.
I think there is some misunderstanding here. I absolutely can reproduce
the corrupted padding reliably, and it matches what I put into commit
message and comments.
The issue that I can't reproduce reliably at the moment (ARP reception
failure) is something that I only pointed out in a reply to this thread.
This is what prompted me to look into the padding issue in the first
place, and it also matches reports about connectivity issues that I got
from other people.
> FYI, the reason why you call __skb_put_padto() is not the reason why
> others call __skb_put_padto().
It matches the call in tag_brcm.c (because I copied it from there), it's
just that the symptoms that I'm fixing are different (undefined behavior
instead of hard packet drop in the switch logic).
>> Who knows, maybe the garbage padding even leaks some data from previous
>> packets, or some other information from within the switch.
>
> I mean, the padding has to come from somewhere, no? Although I'd
> probably imagine non-scrubbed buffer cells rather than data structures...
>
> Let's see what others have to say. I've been wanting to make the policy
> of whether to call __skb_put_padto() standardized for all tagging protocol
> drivers (similar to what is done in dsa_realloc_skb() and below it).
> We pad for tail taggers, maybe we can always pad and this removes a
> conditional, and simplifies taggers. Side note, I already dislike that
> the comment in tag_brcm.c is out of sync with the code. It says that
> padding up to ETH_ZLEN is necessary, but proceeds to pad up until
> ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
> It would be nice if we could use the simple eth_skb_pad().
>
> But there will be a small performance degradation for small packets due
> to the memset in __skb_pad(), which I'm not sure is worth the change.
I guess we have different views on this. In my opinion, correctness
matters more in this case than the tiny performance degradation.
- Felix
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
2022-05-11 12:24 ` Felix Fietkau
(?)
@ 2022-05-11 13:22 ` Vladimir Oltean
-1 siblings, 0 replies; 51+ messages in thread
From: Vladimir Oltean @ 2022-05-11 13:22 UTC (permalink / raw)
To: Felix Fietkau
Cc: Sean Wang, Landen Chao, DENG Qingfang, Andrew Lunn,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
On Wed, May 11, 2022 at 02:24:19PM +0200, Felix Fietkau wrote:
>
> On 11.05.22 11:32, Vladimir Oltean wrote:
> > On Wed, May 11, 2022 at 10:50:17AM +0200, Felix Fietkau wrote:
> > > Hi Vladimir,
> > >
> > > On 11.05.22 00:21, Vladimir Oltean wrote:
> > > > It sounds as if this is masking a problem on the receiver end, because
> > > > not only does my enetc port receive the packet, it also replies to the
> > > > ARP request.
> > > > > pc # sudo tcpreplay -i eth1 arp-broken.pcap
> > > > root@debian:~# ip addr add 192.168.42.1/24 dev eno0
> > > > root@debian:~# tcpdump -i eno0 -e -n --no-promiscuous-mode arp
> > > > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> > > > listening on eno0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> > > > 22:18:58.846753 f4:d4:88:5e:6f:d2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.42.1 tell 192.168.42.173, length 46
> > > > 22:18:58.846806 00:04:9f:05:f4:ab > f4:d4:88:5e:6f:d2, ethertype ARP (0x0806), length 42: Reply 192.168.42.1 is-at 00:04:9f:05:f4:ab, length 28
> > > > ^C
> > > > 2 packets captured
> > > > 2 packets received by filter
> > > > 0 packets dropped by kernel
> > > > > What MAC/driver has trouble with these packets? Is there
> > > anything wrong
> > > > in ethtool stats? Do they even reach software? You can also use
> > > > "dropwatch -l kas" for some hints if they do.
> > >
> > > For some reason I can't reproduce the issue of ARPs not getting replies
> > > anymore.
> > > The garbage data is still present in the ARP packets without my patch
> > > though. So regardless of whether ARP packets are processed correctly or if
> > > they just trip up on some receivers under specific conditions, I believe my
> > > patch is valid and should be applied.
> >
> > I don't have a very strong opinion regarding whether to apply the patch or not.
> > I think we've removed it from bug fix territory now, until proven otherwise.
>
> I strongly disagree. Without my fix we're relying on undefined behavior of
> the hardware, since the switch requires padding that accounts for the
> special tag.
>
> > I do care about the justification (commit message, comments) being
> > correct though. If you cannot reproduce now, someone one year from now
> > surely cannot reproduce it either, and won't know why the code is there.
>
> I think there is some misunderstanding here. I absolutely can reproduce the
> corrupted padding reliably, and it matches what I put into commit message
> and comments.
>
> The issue that I can't reproduce reliably at the moment (ARP reception
> failure) is something that I only pointed out in a reply to this thread.
> This is what prompted me to look into the padding issue in the first place,
> and it also matches reports about connectivity issues that I got from other
> people.
>
> > FYI, the reason why you call __skb_put_padto() is not the reason why
> > others call __skb_put_padto().
>
> It matches the call in tag_brcm.c (because I copied it from there), it's
> just that the symptoms that I'm fixing are different (undefined behavior
> instead of hard packet drop in the switch logic).
>
> > > Who knows, maybe the garbage padding even leaks some data from previous
> > > packets, or some other information from within the switch.
> >
> > I mean, the padding has to come from somewhere, no? Although I'd
> > probably imagine non-scrubbed buffer cells rather than data structures...
> >
> > Let's see what others have to say. I've been wanting to make the policy
> > of whether to call __skb_put_padto() standardized for all tagging protocol
> > drivers (similar to what is done in dsa_realloc_skb() and below it).
> > We pad for tail taggers, maybe we can always pad and this removes a
> > conditional, and simplifies taggers. Side note, I already dislike that
> > the comment in tag_brcm.c is out of sync with the code. It says that
> > padding up to ETH_ZLEN is necessary, but proceeds to pad up until
> > ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
> > It would be nice if we could use the simple eth_skb_pad().
> >
> > But there will be a small performance degradation for small packets due
> > to the memset in __skb_pad(), which I'm not sure is worth the change.
>
> I guess we have different views on this. In my opinion, correctness matters
> more in this case than the tiny performance degradation.
It seems that we are in disagreement about what it is that I am disputing.
I am not disputing that the switch inserts non-zero padding octets, I am
disputing the claim that this somehow violates any standard.
IEEE 802.3 clause 4.2.8 Frame transmission details this beyond any doubt.
It says:
ComputePad appends an array of arbitrary bits to the MAC client data to
pad the frame to the minimum frame size:
function ComputePad(var dataParam: DataValue): DataValue;
begin
ComputePad := {Append an array of size padSize of arbitrary bits to the MAC client dataField}
end; {ComputePad}
Clause 4.2.9 Frame reception then proceeds to say (too long to copy it,
sorry) that the RemovePad function truncates the dataParam when possible
(which has to do with whether the FCS is passed up to software or not)
to the value represented by the lengthOrTypeParam (in octets), therefore
*not* looking at the contents of the padding (other than to validate the
FCS).
So, what correctness are we talking about?
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply [flat|nested] 51+ messages in thread* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-11 13:22 ` Vladimir Oltean
0 siblings, 0 replies; 51+ messages in thread
From: Vladimir Oltean @ 2022-05-11 13:22 UTC (permalink / raw)
To: Felix Fietkau
Cc: Sean Wang, Landen Chao, DENG Qingfang, Andrew Lunn,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
On Wed, May 11, 2022 at 02:24:19PM +0200, Felix Fietkau wrote:
>
> On 11.05.22 11:32, Vladimir Oltean wrote:
> > On Wed, May 11, 2022 at 10:50:17AM +0200, Felix Fietkau wrote:
> > > Hi Vladimir,
> > >
> > > On 11.05.22 00:21, Vladimir Oltean wrote:
> > > > It sounds as if this is masking a problem on the receiver end, because
> > > > not only does my enetc port receive the packet, it also replies to the
> > > > ARP request.
> > > > > pc # sudo tcpreplay -i eth1 arp-broken.pcap
> > > > root@debian:~# ip addr add 192.168.42.1/24 dev eno0
> > > > root@debian:~# tcpdump -i eno0 -e -n --no-promiscuous-mode arp
> > > > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> > > > listening on eno0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> > > > 22:18:58.846753 f4:d4:88:5e:6f:d2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.42.1 tell 192.168.42.173, length 46
> > > > 22:18:58.846806 00:04:9f:05:f4:ab > f4:d4:88:5e:6f:d2, ethertype ARP (0x0806), length 42: Reply 192.168.42.1 is-at 00:04:9f:05:f4:ab, length 28
> > > > ^C
> > > > 2 packets captured
> > > > 2 packets received by filter
> > > > 0 packets dropped by kernel
> > > > > What MAC/driver has trouble with these packets? Is there
> > > anything wrong
> > > > in ethtool stats? Do they even reach software? You can also use
> > > > "dropwatch -l kas" for some hints if they do.
> > >
> > > For some reason I can't reproduce the issue of ARPs not getting replies
> > > anymore.
> > > The garbage data is still present in the ARP packets without my patch
> > > though. So regardless of whether ARP packets are processed correctly or if
> > > they just trip up on some receivers under specific conditions, I believe my
> > > patch is valid and should be applied.
> >
> > I don't have a very strong opinion regarding whether to apply the patch or not.
> > I think we've removed it from bug fix territory now, until proven otherwise.
>
> I strongly disagree. Without my fix we're relying on undefined behavior of
> the hardware, since the switch requires padding that accounts for the
> special tag.
>
> > I do care about the justification (commit message, comments) being
> > correct though. If you cannot reproduce now, someone one year from now
> > surely cannot reproduce it either, and won't know why the code is there.
>
> I think there is some misunderstanding here. I absolutely can reproduce the
> corrupted padding reliably, and it matches what I put into commit message
> and comments.
>
> The issue that I can't reproduce reliably at the moment (ARP reception
> failure) is something that I only pointed out in a reply to this thread.
> This is what prompted me to look into the padding issue in the first place,
> and it also matches reports about connectivity issues that I got from other
> people.
>
> > FYI, the reason why you call __skb_put_padto() is not the reason why
> > others call __skb_put_padto().
>
> It matches the call in tag_brcm.c (because I copied it from there), it's
> just that the symptoms that I'm fixing are different (undefined behavior
> instead of hard packet drop in the switch logic).
>
> > > Who knows, maybe the garbage padding even leaks some data from previous
> > > packets, or some other information from within the switch.
> >
> > I mean, the padding has to come from somewhere, no? Although I'd
> > probably imagine non-scrubbed buffer cells rather than data structures...
> >
> > Let's see what others have to say. I've been wanting to make the policy
> > of whether to call __skb_put_padto() standardized for all tagging protocol
> > drivers (similar to what is done in dsa_realloc_skb() and below it).
> > We pad for tail taggers, maybe we can always pad and this removes a
> > conditional, and simplifies taggers. Side note, I already dislike that
> > the comment in tag_brcm.c is out of sync with the code. It says that
> > padding up to ETH_ZLEN is necessary, but proceeds to pad up until
> > ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
> > It would be nice if we could use the simple eth_skb_pad().
> >
> > But there will be a small performance degradation for small packets due
> > to the memset in __skb_pad(), which I'm not sure is worth the change.
>
> I guess we have different views on this. In my opinion, correctness matters
> more in this case than the tiny performance degradation.
It seems that we are in disagreement about what it is that I am disputing.
I am not disputing that the switch inserts non-zero padding octets, I am
disputing the claim that this somehow violates any standard.
IEEE 802.3 clause 4.2.8 Frame transmission details this beyond any doubt.
It says:
ComputePad appends an array of arbitrary bits to the MAC client data to
pad the frame to the minimum frame size:
function ComputePad(var dataParam: DataValue): DataValue;
begin
ComputePad := {Append an array of size padSize of arbitrary bits to the MAC client dataField}
end; {ComputePad}
Clause 4.2.9 Frame reception then proceeds to say (too long to copy it,
sorry) that the RemovePad function truncates the dataParam when possible
(which has to do with whether the FCS is passed up to software or not)
to the value represented by the lengthOrTypeParam (in octets), therefore
*not* looking at the contents of the padding (other than to validate the
FCS).
So, what correctness are we talking about?
^ permalink raw reply [flat|nested] 51+ messages in thread* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-11 13:22 ` Vladimir Oltean
0 siblings, 0 replies; 51+ messages in thread
From: Vladimir Oltean @ 2022-05-11 13:22 UTC (permalink / raw)
To: Felix Fietkau
Cc: Sean Wang, Landen Chao, DENG Qingfang, Andrew Lunn,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
On Wed, May 11, 2022 at 02:24:19PM +0200, Felix Fietkau wrote:
>
> On 11.05.22 11:32, Vladimir Oltean wrote:
> > On Wed, May 11, 2022 at 10:50:17AM +0200, Felix Fietkau wrote:
> > > Hi Vladimir,
> > >
> > > On 11.05.22 00:21, Vladimir Oltean wrote:
> > > > It sounds as if this is masking a problem on the receiver end, because
> > > > not only does my enetc port receive the packet, it also replies to the
> > > > ARP request.
> > > > > pc # sudo tcpreplay -i eth1 arp-broken.pcap
> > > > root@debian:~# ip addr add 192.168.42.1/24 dev eno0
> > > > root@debian:~# tcpdump -i eno0 -e -n --no-promiscuous-mode arp
> > > > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> > > > listening on eno0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> > > > 22:18:58.846753 f4:d4:88:5e:6f:d2 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Request who-has 192.168.42.1 tell 192.168.42.173, length 46
> > > > 22:18:58.846806 00:04:9f:05:f4:ab > f4:d4:88:5e:6f:d2, ethertype ARP (0x0806), length 42: Reply 192.168.42.1 is-at 00:04:9f:05:f4:ab, length 28
> > > > ^C
> > > > 2 packets captured
> > > > 2 packets received by filter
> > > > 0 packets dropped by kernel
> > > > > What MAC/driver has trouble with these packets? Is there
> > > anything wrong
> > > > in ethtool stats? Do they even reach software? You can also use
> > > > "dropwatch -l kas" for some hints if they do.
> > >
> > > For some reason I can't reproduce the issue of ARPs not getting replies
> > > anymore.
> > > The garbage data is still present in the ARP packets without my patch
> > > though. So regardless of whether ARP packets are processed correctly or if
> > > they just trip up on some receivers under specific conditions, I believe my
> > > patch is valid and should be applied.
> >
> > I don't have a very strong opinion regarding whether to apply the patch or not.
> > I think we've removed it from bug fix territory now, until proven otherwise.
>
> I strongly disagree. Without my fix we're relying on undefined behavior of
> the hardware, since the switch requires padding that accounts for the
> special tag.
>
> > I do care about the justification (commit message, comments) being
> > correct though. If you cannot reproduce now, someone one year from now
> > surely cannot reproduce it either, and won't know why the code is there.
>
> I think there is some misunderstanding here. I absolutely can reproduce the
> corrupted padding reliably, and it matches what I put into commit message
> and comments.
>
> The issue that I can't reproduce reliably at the moment (ARP reception
> failure) is something that I only pointed out in a reply to this thread.
> This is what prompted me to look into the padding issue in the first place,
> and it also matches reports about connectivity issues that I got from other
> people.
>
> > FYI, the reason why you call __skb_put_padto() is not the reason why
> > others call __skb_put_padto().
>
> It matches the call in tag_brcm.c (because I copied it from there), it's
> just that the symptoms that I'm fixing are different (undefined behavior
> instead of hard packet drop in the switch logic).
>
> > > Who knows, maybe the garbage padding even leaks some data from previous
> > > packets, or some other information from within the switch.
> >
> > I mean, the padding has to come from somewhere, no? Although I'd
> > probably imagine non-scrubbed buffer cells rather than data structures...
> >
> > Let's see what others have to say. I've been wanting to make the policy
> > of whether to call __skb_put_padto() standardized for all tagging protocol
> > drivers (similar to what is done in dsa_realloc_skb() and below it).
> > We pad for tail taggers, maybe we can always pad and this removes a
> > conditional, and simplifies taggers. Side note, I already dislike that
> > the comment in tag_brcm.c is out of sync with the code. It says that
> > padding up to ETH_ZLEN is necessary, but proceeds to pad up until
> > ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
> > It would be nice if we could use the simple eth_skb_pad().
> >
> > But there will be a small performance degradation for small packets due
> > to the memset in __skb_pad(), which I'm not sure is worth the change.
>
> I guess we have different views on this. In my opinion, correctness matters
> more in this case than the tiny performance degradation.
It seems that we are in disagreement about what it is that I am disputing.
I am not disputing that the switch inserts non-zero padding octets, I am
disputing the claim that this somehow violates any standard.
IEEE 802.3 clause 4.2.8 Frame transmission details this beyond any doubt.
It says:
ComputePad appends an array of arbitrary bits to the MAC client data to
pad the frame to the minimum frame size:
function ComputePad(var dataParam: DataValue): DataValue;
begin
ComputePad := {Append an array of size padSize of arbitrary bits to the MAC client dataField}
end; {ComputePad}
Clause 4.2.9 Frame reception then proceeds to say (too long to copy it,
sorry) that the RemovePad function truncates the dataParam when possible
(which has to do with whether the FCS is passed up to software or not)
to the value represented by the lengthOrTypeParam (in octets), therefore
*not* looking at the contents of the padding (other than to validate the
FCS).
So, what correctness are we talking about?
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
2022-05-11 9:32 ` Vladimir Oltean
(?)
@ 2022-05-11 14:32 ` Andrew Lunn
-1 siblings, 0 replies; 51+ messages in thread
From: Andrew Lunn @ 2022-05-11 14:32 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Felix Fietkau, Sean Wang, Landen Chao, DENG Qingfang,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
> Let's see what others have to say. I've been wanting to make the policy
> of whether to call __skb_put_padto() standardized for all tagging protocol
> drivers (similar to what is done in dsa_realloc_skb() and below it).
> We pad for tail taggers, maybe we can always pad and this removes a
> conditional, and simplifies taggers. Side note, I already dislike that
> the comment in tag_brcm.c is out of sync with the code. It says that
> padding up to ETH_ZLEN is necessary, but proceeds to pad up until
> ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
> It would be nice if we could use the simple eth_skb_pad().
There are some master devices which will perform padding on their own,
in hardware. So for taggers which insert the header at the head,
forcing such padding would be a waste of CPU time.
For tail taggers, padding short packets by default does however make
sense. The master device is probably going to pad in the wrong way if
it does padding.
Andrew
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-11 14:32 ` Andrew Lunn
0 siblings, 0 replies; 51+ messages in thread
From: Andrew Lunn @ 2022-05-11 14:32 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Felix Fietkau, Sean Wang, Landen Chao, DENG Qingfang,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
> Let's see what others have to say. I've been wanting to make the policy
> of whether to call __skb_put_padto() standardized for all tagging protocol
> drivers (similar to what is done in dsa_realloc_skb() and below it).
> We pad for tail taggers, maybe we can always pad and this removes a
> conditional, and simplifies taggers. Side note, I already dislike that
> the comment in tag_brcm.c is out of sync with the code. It says that
> padding up to ETH_ZLEN is necessary, but proceeds to pad up until
> ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
> It would be nice if we could use the simple eth_skb_pad().
There are some master devices which will perform padding on their own,
in hardware. So for taggers which insert the header at the head,
forcing such padding would be a waste of CPU time.
For tail taggers, padding short packets by default does however make
sense. The master device is probably going to pad in the wrong way if
it does padding.
Andrew
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-11 14:32 ` Andrew Lunn
0 siblings, 0 replies; 51+ messages in thread
From: Andrew Lunn @ 2022-05-11 14:32 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Felix Fietkau, Sean Wang, Landen Chao, DENG Qingfang,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
> Let's see what others have to say. I've been wanting to make the policy
> of whether to call __skb_put_padto() standardized for all tagging protocol
> drivers (similar to what is done in dsa_realloc_skb() and below it).
> We pad for tail taggers, maybe we can always pad and this removes a
> conditional, and simplifies taggers. Side note, I already dislike that
> the comment in tag_brcm.c is out of sync with the code. It says that
> padding up to ETH_ZLEN is necessary, but proceeds to pad up until
> ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
> It would be nice if we could use the simple eth_skb_pad().
There are some master devices which will perform padding on their own,
in hardware. So for taggers which insert the header at the head,
forcing such padding would be a waste of CPU time.
For tail taggers, padding short packets by default does however make
sense. The master device is probably going to pad in the wrong way if
it does padding.
Andrew
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
2022-05-11 14:32 ` Andrew Lunn
(?)
@ 2022-05-12 8:51 ` Felix Fietkau
-1 siblings, 0 replies; 51+ messages in thread
From: Felix Fietkau @ 2022-05-12 8:51 UTC (permalink / raw)
To: Andrew Lunn, Vladimir Oltean
Cc: Sean Wang, Landen Chao, DENG Qingfang, Vivien Didelot,
Florian Fainelli, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Matthias Brugger, netdev, linux-arm-kernel,
linux-mediatek, linux-kernel
On 11.05.22 16:32, Andrew Lunn wrote:
>> Let's see what others have to say. I've been wanting to make the policy
>> of whether to call __skb_put_padto() standardized for all tagging protocol
>> drivers (similar to what is done in dsa_realloc_skb() and below it).
>> We pad for tail taggers, maybe we can always pad and this removes a
>> conditional, and simplifies taggers. Side note, I already dislike that
>> the comment in tag_brcm.c is out of sync with the code. It says that
>> padding up to ETH_ZLEN is necessary, but proceeds to pad up until
>> ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
>> It would be nice if we could use the simple eth_skb_pad().
>
> There are some master devices which will perform padding on their own,
> in hardware. So for taggers which insert the header at the head,
> forcing such padding would be a waste of CPU time.
>
> For tail taggers, padding short packets by default does however make
> sense. The master device is probably going to pad in the wrong way if
> it does padding.
I just ran some more tests, here's what I found:
The switch automatically pads all forwarded packets to 64 bytes.
When packets are forwarded from one external port to another, the
padding is all zero.
Only when packets are sent from a CPU port to an external port, the last
4 bytes contain garbage. The garbage bytes are different for every
packet, and I can't tell if it's leaking contents of previous packets or
what else is in there.
Based on that, I'm pretty sure that the hardware simply has a quirk
where it does not account for the special tag when generating its own
padding internally.
I found that replacing my __skb_put_padto call with eth_skb_pad also
works, so I'm going to send v3 with that and an updated comment.
- Felix
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-12 8:51 ` Felix Fietkau
0 siblings, 0 replies; 51+ messages in thread
From: Felix Fietkau @ 2022-05-12 8:51 UTC (permalink / raw)
To: Andrew Lunn, Vladimir Oltean
Cc: Sean Wang, Landen Chao, DENG Qingfang, Vivien Didelot,
Florian Fainelli, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Matthias Brugger, netdev, linux-arm-kernel,
linux-mediatek, linux-kernel
On 11.05.22 16:32, Andrew Lunn wrote:
>> Let's see what others have to say. I've been wanting to make the policy
>> of whether to call __skb_put_padto() standardized for all tagging protocol
>> drivers (similar to what is done in dsa_realloc_skb() and below it).
>> We pad for tail taggers, maybe we can always pad and this removes a
>> conditional, and simplifies taggers. Side note, I already dislike that
>> the comment in tag_brcm.c is out of sync with the code. It says that
>> padding up to ETH_ZLEN is necessary, but proceeds to pad up until
>> ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
>> It would be nice if we could use the simple eth_skb_pad().
>
> There are some master devices which will perform padding on their own,
> in hardware. So for taggers which insert the header at the head,
> forcing such padding would be a waste of CPU time.
>
> For tail taggers, padding short packets by default does however make
> sense. The master device is probably going to pad in the wrong way if
> it does padding.
I just ran some more tests, here's what I found:
The switch automatically pads all forwarded packets to 64 bytes.
When packets are forwarded from one external port to another, the
padding is all zero.
Only when packets are sent from a CPU port to an external port, the last
4 bytes contain garbage. The garbage bytes are different for every
packet, and I can't tell if it's leaking contents of previous packets or
what else is in there.
Based on that, I'm pretty sure that the hardware simply has a quirk
where it does not account for the special tag when generating its own
padding internally.
I found that replacing my __skb_put_padto call with eth_skb_pad also
works, so I'm going to send v3 with that and an updated comment.
- Felix
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-12 8:51 ` Felix Fietkau
0 siblings, 0 replies; 51+ messages in thread
From: Felix Fietkau @ 2022-05-12 8:51 UTC (permalink / raw)
To: Andrew Lunn, Vladimir Oltean
Cc: Sean Wang, Landen Chao, DENG Qingfang, Vivien Didelot,
Florian Fainelli, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Matthias Brugger, netdev, linux-arm-kernel,
linux-mediatek, linux-kernel
On 11.05.22 16:32, Andrew Lunn wrote:
>> Let's see what others have to say. I've been wanting to make the policy
>> of whether to call __skb_put_padto() standardized for all tagging protocol
>> drivers (similar to what is done in dsa_realloc_skb() and below it).
>> We pad for tail taggers, maybe we can always pad and this removes a
>> conditional, and simplifies taggers. Side note, I already dislike that
>> the comment in tag_brcm.c is out of sync with the code. It says that
>> padding up to ETH_ZLEN is necessary, but proceeds to pad up until
>> ETH_ZLEN + tag len, only to add the tag len once more below via skb_push().
>> It would be nice if we could use the simple eth_skb_pad().
>
> There are some master devices which will perform padding on their own,
> in hardware. So for taggers which insert the header at the head,
> forcing such padding would be a waste of CPU time.
>
> For tail taggers, padding short packets by default does however make
> sense. The master device is probably going to pad in the wrong way if
> it does padding.
I just ran some more tests, here's what I found:
The switch automatically pads all forwarded packets to 64 bytes.
When packets are forwarded from one external port to another, the
padding is all zero.
Only when packets are sent from a CPU port to an external port, the last
4 bytes contain garbage. The garbage bytes are different for every
packet, and I can't tell if it's leaking contents of previous packets or
what else is in there.
Based on that, I'm pretty sure that the hardware simply has a quirk
where it does not account for the special tag when generating its own
padding internally.
I found that replacing my __skb_put_padto call with eth_skb_pad also
works, so I'm going to send v3 with that and an updated comment.
- Felix
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
2022-05-12 8:51 ` Felix Fietkau
(?)
@ 2022-05-12 12:39 ` Andrew Lunn
-1 siblings, 0 replies; 51+ messages in thread
From: Andrew Lunn @ 2022-05-12 12:39 UTC (permalink / raw)
To: Felix Fietkau
Cc: Vladimir Oltean, Sean Wang, Landen Chao, DENG Qingfang,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
Hi Felix
Thanks for the additional testing.
> I just ran some more tests, here's what I found:
> The switch automatically pads all forwarded packets to 64 bytes.
> When packets are forwarded from one external port to another, the padding is
> all zero.
> Only when packets are sent from a CPU port to an external port, the last 4
> bytes contain garbage. The garbage bytes are different for every packet, and
> I can't tell if it's leaking contents of previous packets or what else is in
> there.
> Based on that, I'm pretty sure that the hardware simply has a quirk where it
> does not account for the special tag when generating its own padding
> internally.
This does not yet explain why your receiver is dropping the frame. As
Vladimir pointed out, the contents of the pad should not matter.
Is it also getting the FCS wrong when it pads? That would cause the
receiver to drop the frame.
Or do we have an issue in the receiver where it is looking at the
contents of the pad?
Andrew
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-12 12:39 ` Andrew Lunn
0 siblings, 0 replies; 51+ messages in thread
From: Andrew Lunn @ 2022-05-12 12:39 UTC (permalink / raw)
To: Felix Fietkau
Cc: Vladimir Oltean, Sean Wang, Landen Chao, DENG Qingfang,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
Hi Felix
Thanks for the additional testing.
> I just ran some more tests, here's what I found:
> The switch automatically pads all forwarded packets to 64 bytes.
> When packets are forwarded from one external port to another, the padding is
> all zero.
> Only when packets are sent from a CPU port to an external port, the last 4
> bytes contain garbage. The garbage bytes are different for every packet, and
> I can't tell if it's leaking contents of previous packets or what else is in
> there.
> Based on that, I'm pretty sure that the hardware simply has a quirk where it
> does not account for the special tag when generating its own padding
> internally.
This does not yet explain why your receiver is dropping the frame. As
Vladimir pointed out, the contents of the pad should not matter.
Is it also getting the FCS wrong when it pads? That would cause the
receiver to drop the frame.
Or do we have an issue in the receiver where it is looking at the
contents of the pad?
Andrew
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-12 12:39 ` Andrew Lunn
0 siblings, 0 replies; 51+ messages in thread
From: Andrew Lunn @ 2022-05-12 12:39 UTC (permalink / raw)
To: Felix Fietkau
Cc: Vladimir Oltean, Sean Wang, Landen Chao, DENG Qingfang,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
Hi Felix
Thanks for the additional testing.
> I just ran some more tests, here's what I found:
> The switch automatically pads all forwarded packets to 64 bytes.
> When packets are forwarded from one external port to another, the padding is
> all zero.
> Only when packets are sent from a CPU port to an external port, the last 4
> bytes contain garbage. The garbage bytes are different for every packet, and
> I can't tell if it's leaking contents of previous packets or what else is in
> there.
> Based on that, I'm pretty sure that the hardware simply has a quirk where it
> does not account for the special tag when generating its own padding
> internally.
This does not yet explain why your receiver is dropping the frame. As
Vladimir pointed out, the contents of the pad should not matter.
Is it also getting the FCS wrong when it pads? That would cause the
receiver to drop the frame.
Or do we have an issue in the receiver where it is looking at the
contents of the pad?
Andrew
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
2022-05-12 12:39 ` Andrew Lunn
(?)
@ 2022-05-12 13:08 ` Felix Fietkau
-1 siblings, 0 replies; 51+ messages in thread
From: Felix Fietkau @ 2022-05-12 13:08 UTC (permalink / raw)
To: Andrew Lunn
Cc: Vladimir Oltean, Sean Wang, Landen Chao, DENG Qingfang,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
Hi Andrew,
On 12.05.22 14:39, Andrew Lunn wrote:
>> I just ran some more tests, here's what I found:
>> The switch automatically pads all forwarded packets to 64 bytes.
>> When packets are forwarded from one external port to another, the padding is
>> all zero.
>> Only when packets are sent from a CPU port to an external port, the last 4
>> bytes contain garbage. The garbage bytes are different for every packet, and
>> I can't tell if it's leaking contents of previous packets or what else is in
>> there.
>> Based on that, I'm pretty sure that the hardware simply has a quirk where it
>> does not account for the special tag when generating its own padding
>> internally.
>
> This does not yet explain why your receiver is dropping the frame. As
> Vladimir pointed out, the contents of the pad should not matter.
>
> Is it also getting the FCS wrong when it pads? That would cause the
> receiver to drop the frame.
>
> Or do we have an issue in the receiver where it is looking at the
> contents of the pad?
On the devices that I used for testing before, FCS wasn't reported in my
captures. Since I can't reproduce the issue of the receiver dropping
frames anymore, I currently have no way of figuring out what went wrong.
When I was able to reproduce the issue, I'm sure that I switched between
patched and unpatched builds a few times to make sure that my change
actually made a difference, which it did.
I do agree that having the garbage bytes in there is technically
compliant with the spec. On the other hand, based on my observations I
believe that the hardware's behavior of filling the last 4 bytes with
seemingly random values only in the case of small frames being sent with
a CPU special tag is clearly not intentional nor by design.
The issue is also clearly limited to processing packets with the tag
(which can only come from the CPU), so in my opinion the tag driver is
the right place to deal with it.
So I guess it comes down to whether you guys think that this is worth
fixing.
I still consider it worth fixing, because:
- It looks like a hardware bug to me with potentially unknown consequences.
- If it caused issues in my setup, it might do so in other people's
setups as well.
- I can't rule out potential information leakage from those 4 bytes
I guess if you guys don't think the issue is worth the price of a very
small performance hit from padding the packets, I will just have to keep
this as an out-of-tree patch.
- Felix
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-12 13:08 ` Felix Fietkau
0 siblings, 0 replies; 51+ messages in thread
From: Felix Fietkau @ 2022-05-12 13:08 UTC (permalink / raw)
To: Andrew Lunn
Cc: Vladimir Oltean, Sean Wang, Landen Chao, DENG Qingfang,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
Hi Andrew,
On 12.05.22 14:39, Andrew Lunn wrote:
>> I just ran some more tests, here's what I found:
>> The switch automatically pads all forwarded packets to 64 bytes.
>> When packets are forwarded from one external port to another, the padding is
>> all zero.
>> Only when packets are sent from a CPU port to an external port, the last 4
>> bytes contain garbage. The garbage bytes are different for every packet, and
>> I can't tell if it's leaking contents of previous packets or what else is in
>> there.
>> Based on that, I'm pretty sure that the hardware simply has a quirk where it
>> does not account for the special tag when generating its own padding
>> internally.
>
> This does not yet explain why your receiver is dropping the frame. As
> Vladimir pointed out, the contents of the pad should not matter.
>
> Is it also getting the FCS wrong when it pads? That would cause the
> receiver to drop the frame.
>
> Or do we have an issue in the receiver where it is looking at the
> contents of the pad?
On the devices that I used for testing before, FCS wasn't reported in my
captures. Since I can't reproduce the issue of the receiver dropping
frames anymore, I currently have no way of figuring out what went wrong.
When I was able to reproduce the issue, I'm sure that I switched between
patched and unpatched builds a few times to make sure that my change
actually made a difference, which it did.
I do agree that having the garbage bytes in there is technically
compliant with the spec. On the other hand, based on my observations I
believe that the hardware's behavior of filling the last 4 bytes with
seemingly random values only in the case of small frames being sent with
a CPU special tag is clearly not intentional nor by design.
The issue is also clearly limited to processing packets with the tag
(which can only come from the CPU), so in my opinion the tag driver is
the right place to deal with it.
So I guess it comes down to whether you guys think that this is worth
fixing.
I still consider it worth fixing, because:
- It looks like a hardware bug to me with potentially unknown consequences.
- If it caused issues in my setup, it might do so in other people's
setups as well.
- I can't rule out potential information leakage from those 4 bytes
I guess if you guys don't think the issue is worth the price of a very
small performance hit from padding the packets, I will just have to keep
this as an out-of-tree patch.
- Felix
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-12 13:08 ` Felix Fietkau
0 siblings, 0 replies; 51+ messages in thread
From: Felix Fietkau @ 2022-05-12 13:08 UTC (permalink / raw)
To: Andrew Lunn
Cc: Vladimir Oltean, Sean Wang, Landen Chao, DENG Qingfang,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
Hi Andrew,
On 12.05.22 14:39, Andrew Lunn wrote:
>> I just ran some more tests, here's what I found:
>> The switch automatically pads all forwarded packets to 64 bytes.
>> When packets are forwarded from one external port to another, the padding is
>> all zero.
>> Only when packets are sent from a CPU port to an external port, the last 4
>> bytes contain garbage. The garbage bytes are different for every packet, and
>> I can't tell if it's leaking contents of previous packets or what else is in
>> there.
>> Based on that, I'm pretty sure that the hardware simply has a quirk where it
>> does not account for the special tag when generating its own padding
>> internally.
>
> This does not yet explain why your receiver is dropping the frame. As
> Vladimir pointed out, the contents of the pad should not matter.
>
> Is it also getting the FCS wrong when it pads? That would cause the
> receiver to drop the frame.
>
> Or do we have an issue in the receiver where it is looking at the
> contents of the pad?
On the devices that I used for testing before, FCS wasn't reported in my
captures. Since I can't reproduce the issue of the receiver dropping
frames anymore, I currently have no way of figuring out what went wrong.
When I was able to reproduce the issue, I'm sure that I switched between
patched and unpatched builds a few times to make sure that my change
actually made a difference, which it did.
I do agree that having the garbage bytes in there is technically
compliant with the spec. On the other hand, based on my observations I
believe that the hardware's behavior of filling the last 4 bytes with
seemingly random values only in the case of small frames being sent with
a CPU special tag is clearly not intentional nor by design.
The issue is also clearly limited to processing packets with the tag
(which can only come from the CPU), so in my opinion the tag driver is
the right place to deal with it.
So I guess it comes down to whether you guys think that this is worth
fixing.
I still consider it worth fixing, because:
- It looks like a hardware bug to me with potentially unknown consequences.
- If it caused issues in my setup, it might do so in other people's
setups as well.
- I can't rule out potential information leakage from those 4 bytes
I guess if you guys don't think the issue is worth the price of a very
small performance hit from padding the packets, I will just have to keep
this as an out-of-tree patch.
- Felix
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
2022-05-12 13:08 ` Felix Fietkau
(?)
@ 2022-05-12 15:32 ` Andrew Lunn
-1 siblings, 0 replies; 51+ messages in thread
From: Andrew Lunn @ 2022-05-12 15:32 UTC (permalink / raw)
To: Felix Fietkau
Cc: Vladimir Oltean, Sean Wang, Landen Chao, DENG Qingfang,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
> I guess if you guys don't think the issue is worth the price of a very small
> performance hit from padding the packets, I will just have to keep this as
> an out-of-tree patch.
I'm thinking it is worth fixing, but i also wonder if your receiver
has a bug. And if it does have a bug, it will probably come back to
bite you sometime in the future.
Andrew
_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek
^ permalink raw reply [flat|nested] 51+ messages in thread* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-12 15:32 ` Andrew Lunn
0 siblings, 0 replies; 51+ messages in thread
From: Andrew Lunn @ 2022-05-12 15:32 UTC (permalink / raw)
To: Felix Fietkau
Cc: Vladimir Oltean, Sean Wang, Landen Chao, DENG Qingfang,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
> I guess if you guys don't think the issue is worth the price of a very small
> performance hit from padding the packets, I will just have to keep this as
> an out-of-tree patch.
I'm thinking it is worth fixing, but i also wonder if your receiver
has a bug. And if it does have a bug, it will probably come back to
bite you sometime in the future.
Andrew
^ permalink raw reply [flat|nested] 51+ messages in thread* Re: [PATCH v2] net: dsa: tag_mtk: add padding for tx packets
@ 2022-05-12 15:32 ` Andrew Lunn
0 siblings, 0 replies; 51+ messages in thread
From: Andrew Lunn @ 2022-05-12 15:32 UTC (permalink / raw)
To: Felix Fietkau
Cc: Vladimir Oltean, Sean Wang, Landen Chao, DENG Qingfang,
Vivien Didelot, Florian Fainelli, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Matthias Brugger, netdev,
linux-arm-kernel, linux-mediatek, linux-kernel
> I guess if you guys don't think the issue is worth the price of a very small
> performance hit from padding the packets, I will just have to keep this as
> an out-of-tree patch.
I'm thinking it is worth fixing, but i also wonder if your receiver
has a bug. And if it does have a bug, it will probably come back to
bite you sometime in the future.
Andrew
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 51+ messages in thread