* Re: igc: missing HW timestamps at TX
From: Ferenc Fejes @ 2022-08-12 14:13 UTC (permalink / raw)
To: vinicius.gomes@intel.com, netdev@vger.kernel.org
Cc: marton12050@gmail.com, peti.antal99@gmail.com,
vladimir.oltean@nxp.com
In-Reply-To: <87tu6i6h1k.fsf@intel.com>
Hi Vinicius!
On Thu, 2022-08-11 at 10:33 -0300, Vinicius Costa Gomes wrote:
> Hi Ferenc,
>
> >
> > With iperf TCP test line-rate achiveable just like without the
> > patch.
> >
>
> That's very good to know.
>
> > > >
> > > > If you are feeling adventurous and feel like helping test it,
> > > > here
> > > > is
> > > > the link:
> > > >
> > > > https%3A%2F%2Fgithub.com%2Fvcgomes%2Fnet-next%2Ftree%2Figc-
> > > > multiple-tstamp-timers-lock-new
> > > >
> >
> > Is there any test in partucular you interested in? My testbed is
> > configured so I can do some.
> >
>
> The only thing I am worried about is, if in the "dropped" HW
> timestamps
> case, if all the timestamp slots are indeed full, or if there's any
> bug
> and we missed one timestamp.
>
> Can you verify that for for every dropped HW timestamp in your
> application, can you see that 'tx_hwtstamp_skipped' (from 'ethtool -
> S')
> increases everytime the drop happens? Seeing if
> 'tx_hwtstamp_timeouts'
> also increases would be useful as well.
Yes, its increasing. Let me illustrate it:
Ethtool before the measurement:
$ ethtool -S enp3s0 | grep hwtstamp
tx_hwtstamp_timeouts: 1
tx_hwtstamp_skipped: 409
rx_hwtstamp_cleared: 0
Measurement:
$ sudo isochron send -i enp3s0 -s 64 -c 0.0000005 --client 10.0.0.20 --
num-frames 10000000 -F isochron.dat --sync-threshold 2000 -M $((1 <<
2)) --sched-fifo --sched-priority 99
(note: isochron would try to send a packet in every 500ns, but the rate
actually limited by the sleep/syscall latency so its sending packets in
about every 15-20us)
Output:
isochron[1660315948.335677744]: local ptpmon -7 sysmon -
25 receiver ptpmon 0 sysmon 4
Timed out waiting for TX timestamps, 10 timestamps unacknowledged
seqid 3441 missing timestamps: hw,
seqid 3442 missing timestamps: hw,
seqid 3443 missing timestamps: hw,
seqid 3449 missing timestamps: hw,
seqid 5530 missing timestamps: hw,
seqid 5531 missing timestamps: hw,
seqid 7597 missing timestamps: hw,
seqid 7598 missing timestamps: hw,
seqid 7599 missing timestamps: hw,
seqid 7605 missing timestamps: hw,
Ethtool after the measurement:
ethtool -S enp3s0 | grep hwtstamp
tx_hwtstamp_timeouts: 1
tx_hwtstamp_skipped: 419
rx_hwtstamp_cleared: 0
Which is inline with what the isochron see.
But thats only happens if I forcingly put the affinity of the sender
different CPU core than the ptp worker of the igc. If those running on
the same core I doesnt lost any HW timestam even for 10 million
packets. Worth to mention actually I see many lost timestamp which
confused me a little bit but those are lost because of the small
MSG_ERRQUEUE. When I increased that from few kbytes to 20 mbytes I got
every timestamp successfully.
>
> If for every drop there's one 'tx_hwtstamp_skipped' increment, then
> it
> means that the driver is doing its best, and the workload is
> requesting
> more timestamps than the system is able to handle.
>
> If only 'tx_hwtstamp_timeouts' increases then it's possible that
> there
> could be a bug hiding still.
On the other hand I'm little bit confused with the ETF behavior.
Without HW offload, I lost almost every timestamp even with large (one
packet in every 500 us) sending rate and with HW offload I still lost a
lot. But that migh be beyond the igc, and some config issue on my setup
(I have to apply mqprio and do the PTP sync on default priority and
data packets with SO_TXTIME cmsg sent to ETF at prio 2). Does the
tx_queue affect the timestamping?
CC Vladimir, the author of isochron.
>
> > > >
> > > > Cheers,
> > >
> > > Best,
> > > Ferenc
Best,
Ferenc
^ permalink raw reply
* Re: [PATCH 4/4] dt-bindings: net: dsa: mediatek,mt7530: update json-schema
From: Arınç ÜNAL @ 2022-08-12 14:06 UTC (permalink / raw)
To: Krzysztof Kozlowski, Andrew Lunn, Vivien Didelot,
Florian Fainelli, Vladimir Oltean, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
Matthias Brugger, Sean Wang, Landen Chao, DENG Qingfang,
Frank Wunderlich, Luiz Angelo Daros de Luca, Sander Vanheule,
René van Dorst, Daniel Golle, erkin.bozoglu,
Sergio Paracuellos
Cc: netdev, devicetree, linux-arm-kernel, linux-mediatek,
linux-kernel
In-Reply-To: <3731cd56-f7e8-6807-06b5-b8b176b078b6@linaro.org>
On 12.08.2022 16:48, Krzysztof Kozlowski wrote:
> On 12/08/2022 16:41, Arınç ÜNAL wrote:
>> On 12.08.2022 10:01, Krzysztof Kozlowski wrote:
>>> On 12/08/2022 09:57, Krzysztof Kozlowski wrote:
>>>> On 12/08/2022 01:09, Arınç ÜNAL wrote:
>>>>>>> -patternProperties:
>>>>>>> - "^(ethernet-)?ports$":
>>>>>>> - type: object
>>>>>>
>>>>>> Actually four patches...
>>>>>>
>>>>>> I don't find this change explained in commit msg. What is more, it looks
>>>>>> incorrect. All properties and patternProperties should be explained in
>>>>>> top-level part.
>>>>>>
>>>>>> Defining such properties (with big piece of YAML) in each if:then: is no
>>>>>> readable.
>>>>>
>>>>> I can't figure out another way. I need to require certain properties for
>>>>> a compatible string AND certain enum/const for certain properties which
>>>>> are inside patternProperties for "^(ethernet-)?port@[0-9]+$" by reading
>>>>> the compatible string.
>>>>
>>>> requiring properties is not equal to defining them and nothing stops you
>>>> from defining all properties top-level and requiring them in
>>>> allOf:if:then:patternProperties.
>>>>
>>>>
>>>>> If I put allOf:if:then under patternProperties, I can't do the latter.
>>>>
>>>> You can.
>>
>> Am I supposed to do something like this:
>>
>> patternProperties:
>> "^(ethernet-)?ports$":
>> type: object
>>
>> patternProperties:
>> "^(ethernet-)?port@[0-9]+$":
>> type: object
>> description: Ethernet switch ports
>>
>> unevaluatedProperties: false
>>
>> properties:
>> reg:
>> description:
>> Port address described must be 5 or 6 for CPU port and
>> from 0 to 5 for user ports.
>>
>> allOf:
>> - $ref: dsa-port.yaml#
>> - if:
>> properties:
>> label:
>> items:
>> - const: cpu
>> then:
>> allOf:
>> - if:
>> properties:
>
> Not really, this is absolutely unreadable.
>
> Usually the way it is handled is:
>
> patternProperties:
> "^(ethernet-)?ports$":
> type: object
>
> patternProperties:
> "^(ethernet-)?port@[0-9]+$":
> type: object
> description: Ethernet switch ports
> unevaluatedProperties: false
> ... regular stuff follows
>
> allOf:
> - if:
> properties:
> compatible:
> .....
> then:
> patternProperties:
> "^(ethernet-)?ports$":
> patternProperties:
> "^(ethernet-)?port@[0-9]+$":
> properties:
> reg:
> const: 5
>
>
> I admit that it is still difficult to parse, which could justify
> splitting to separate schema. Anyway the point of my comment was to
> define all properties in top level, not in allOf.
>
> allOf should be used to constrain these properties.
The problem is:
- only specific values of reg are allowed if label is cpu.
- only specific values of phy-mode are allowed if reg is 5 or 6.
This forces me to define properties under allOf:if:then. Splitting to
separate schema (per compatible string?) wouldn't help in this case.
I can split patternProperties to two sections, but I can't directly
define the reg property like you put above.
I can at least split mediatek,mt7531 to a separate schema to have less
patternProperties on a single binding.
What do you think?
Arınç
^ permalink raw reply
* Re: [PATCH 4/4] dt-bindings: net: dsa: mediatek,mt7530: update json-schema
From: Krzysztof Kozlowski @ 2022-08-12 13:48 UTC (permalink / raw)
To: Arınç ÜNAL, Andrew Lunn, Vivien Didelot,
Florian Fainelli, Vladimir Oltean, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
Matthias Brugger, Sean Wang, Landen Chao, DENG Qingfang,
Frank Wunderlich, Luiz Angelo Daros de Luca, Sander Vanheule,
René van Dorst, Daniel Golle, erkin.bozoglu,
Sergio Paracuellos
Cc: netdev, devicetree, linux-arm-kernel, linux-mediatek,
linux-kernel
In-Reply-To: <70e246af-c336-0896-95b5-9e42a17a239d@arinc9.com>
On 12/08/2022 16:41, Arınç ÜNAL wrote:
> On 12.08.2022 10:01, Krzysztof Kozlowski wrote:
>> On 12/08/2022 09:57, Krzysztof Kozlowski wrote:
>>> On 12/08/2022 01:09, Arınç ÜNAL wrote:
>>>>>> -patternProperties:
>>>>>> - "^(ethernet-)?ports$":
>>>>>> - type: object
>>>>>
>>>>> Actually four patches...
>>>>>
>>>>> I don't find this change explained in commit msg. What is more, it looks
>>>>> incorrect. All properties and patternProperties should be explained in
>>>>> top-level part.
>>>>>
>>>>> Defining such properties (with big piece of YAML) in each if:then: is no
>>>>> readable.
>>>>
>>>> I can't figure out another way. I need to require certain properties for
>>>> a compatible string AND certain enum/const for certain properties which
>>>> are inside patternProperties for "^(ethernet-)?port@[0-9]+$" by reading
>>>> the compatible string.
>>>
>>> requiring properties is not equal to defining them and nothing stops you
>>> from defining all properties top-level and requiring them in
>>> allOf:if:then:patternProperties.
>>>
>>>
>>>> If I put allOf:if:then under patternProperties, I can't do the latter.
>>>
>>> You can.
>
> Am I supposed to do something like this:
>
> patternProperties:
> "^(ethernet-)?ports$":
> type: object
>
> patternProperties:
> "^(ethernet-)?port@[0-9]+$":
> type: object
> description: Ethernet switch ports
>
> unevaluatedProperties: false
>
> properties:
> reg:
> description:
> Port address described must be 5 or 6 for CPU port and
> from 0 to 5 for user ports.
>
> allOf:
> - $ref: dsa-port.yaml#
> - if:
> properties:
> label:
> items:
> - const: cpu
> then:
> allOf:
> - if:
> properties:
Not really, this is absolutely unreadable.
Usually the way it is handled is:
patternProperties:
"^(ethernet-)?ports$":
type: object
patternProperties:
"^(ethernet-)?port@[0-9]+$":
type: object
description: Ethernet switch ports
unevaluatedProperties: false
... regular stuff follows
allOf:
- if:
properties:
compatible:
.....
then:
patternProperties:
"^(ethernet-)?ports$":
patternProperties:
"^(ethernet-)?port@[0-9]+$":
properties:
reg:
const: 5
I admit that it is still difficult to parse, which could justify
splitting to separate schema. Anyway the point of my comment was to
define all properties in top level, not in allOf.
allOf should be used to constrain these properties.
Best regards,
Krzysztof
^ permalink raw reply
* Re: [PATCH 4/4] dt-bindings: net: dsa: mediatek,mt7530: update json-schema
From: Arınç ÜNAL @ 2022-08-12 13:41 UTC (permalink / raw)
To: Krzysztof Kozlowski, Andrew Lunn, Vivien Didelot,
Florian Fainelli, Vladimir Oltean, David S . Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
Matthias Brugger, Sean Wang, Landen Chao, DENG Qingfang,
Frank Wunderlich, Luiz Angelo Daros de Luca, Sander Vanheule,
René van Dorst, Daniel Golle, erkin.bozoglu,
Sergio Paracuellos
Cc: netdev, devicetree, linux-arm-kernel, linux-mediatek,
linux-kernel
In-Reply-To: <40130c63-1e36-bb43-43b4-444a8f287226@linaro.org>
On 12.08.2022 10:01, Krzysztof Kozlowski wrote:
> On 12/08/2022 09:57, Krzysztof Kozlowski wrote:
>> On 12/08/2022 01:09, Arınç ÜNAL wrote:
>>>>> -patternProperties:
>>>>> - "^(ethernet-)?ports$":
>>>>> - type: object
>>>>
>>>> Actually four patches...
>>>>
>>>> I don't find this change explained in commit msg. What is more, it looks
>>>> incorrect. All properties and patternProperties should be explained in
>>>> top-level part.
>>>>
>>>> Defining such properties (with big piece of YAML) in each if:then: is no
>>>> readable.
>>>
>>> I can't figure out another way. I need to require certain properties for
>>> a compatible string AND certain enum/const for certain properties which
>>> are inside patternProperties for "^(ethernet-)?port@[0-9]+$" by reading
>>> the compatible string.
>>
>> requiring properties is not equal to defining them and nothing stops you
>> from defining all properties top-level and requiring them in
>> allOf:if:then:patternProperties.
>>
>>
>>> If I put allOf:if:then under patternProperties, I can't do the latter.
>>
>> You can.
Am I supposed to do something like this:
patternProperties:
"^(ethernet-)?ports$":
type: object
patternProperties:
"^(ethernet-)?port@[0-9]+$":
type: object
description: Ethernet switch ports
unevaluatedProperties: false
properties:
reg:
description:
Port address described must be 5 or 6 for CPU port and
from 0 to 5 for user ports.
allOf:
- $ref: dsa-port.yaml#
- if:
properties:
label:
items:
- const: cpu
then:
allOf:
- if:
properties:
compatible:
items:
- const: mediatek,mt7530
- const: mediatek,mt7621
then:
allOf:
- if:
properties:
reg:
const: 5
then:
properties:
phy-mode:
enum:
- gmii
- mii
- rgmii
- if:
properties:
reg:
const: 6
then:
properties:
phy-mode:
enum:
- rgmii
- trgmii
- if:
properties:
compatible:
items:
- const: mediatek,mt7531
then:
allOf:
- if:
properties:
reg:
const: 5
then:
properties:
phy-mode:
enum:
- 1000base-x
- 2500base-x
- rgmii
- sgmii
- if:
properties:
reg:
const: 6
then:
properties:
phy-mode:
enum:
- 1000base-x
- 2500base-x
- sgmii
properties:
reg:
enum:
- 5
- 6
required:
- phy-mode
>>
>>>
>>> Other than readability to human eyes, binding check works as intended,
>>> in case there's no other way to do it.
>>
>> I don't see the problem in doing it and readability is one of main
>> factors of code admission to Linux kernel.
>
> One more thought - if your schema around allOf:if:then grows too much,
> it is actually a sign that it might benefit from splitting. Either into
> two separate schemas or into common+two separate.
>
> Best regards,
> Krzysztof
Arınç
^ permalink raw reply
* Re: [PATCH net 1/3] netfilter: nf_conntrack_tcp: re-init for syn packets only
From: Neal Cardwell @ 2022-08-12 13:34 UTC (permalink / raw)
To: patchwork-bot+netdevbpf
Cc: Pablo Neira Ayuso, netfilter-devel, davem, netdev, kuba,
Yuchung Cheng, Eric Dumazet
In-Reply-To: <165116521266.24173.17359123747982099697.git-patchwork-notify@kernel.org>
On Thu, Apr 28, 2022 at 1:00 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
>
> Hello:
>
> This series was applied to netdev/net.git (master)
> by Pablo Neira Ayuso <pablo@netfilter.org>:
>
> On Thu, 28 Apr 2022 16:21:07 +0200 you wrote:
> > From: Florian Westphal <fw@strlen.de>
> >
> > Jaco Kroon reported tcp problems that Eric Dumazet and Neal Cardwell
> > pinpointed to nf_conntrack tcp_in_window() bug.
> >
> > tcp trace shows following sequence:
> >
> > [...]
>
> Here is the summary with links:
> - [net,1/3] netfilter: nf_conntrack_tcp: re-init for syn packets only
> https://git.kernel.org/netdev/net/c/c7aab4f17021
> - [net,2/3] netfilter: conntrack: fix udp offload timeout sysctl
> https://git.kernel.org/netdev/net/c/626873c446f7
> - [net,3/3] netfilter: nft_socket: only do sk lookups when indev is available
> https://git.kernel.org/netdev/net/c/743b83f15d40
>
> You are awesome, thank you!
> --
> Deet-doot-dot, I am a bot.
> https://korg.docs.kernel.org/patchwork/pwbot.html
This first commit is an important bug fix for a serious bug that causes
TCP connection hangs for users of TCP fast open and nf_conntrack:
c7aab4f17021b netfilter: nf_conntrack_tcp: re-init for syn packets only
We are continuing to get reports about the bug that this commit fixes.
It seems this fix was only backported to v5.17 stable release, and not further,
due to a cherry-pick conflict, because this fix implicitly depends on a
slightly earlier v5.17 fix in the same spot:
82b72cb94666 netfilter: conntrack: re-init state for retransmitted syn-ack
I manually verified that the fix c7aab4f17021b can be cleanly cherry-picked
into the oldest (v4.9.325) and newest (v5.15.60) longterm release kernels as
long as we first cherry-pick that related fix that it implicitly depends on:
82b72cb94666b3dbd7152bb9f441b068af7a921b
netfilter: conntrack: re-init state for retransmitted syn-ack
c7aab4f17021b636a0ee75bcf28e06fb7c94ab48
netfilter: nf_conntrack_tcp: re-init for syn packets only
So would it be possible to backport both of those fixes with the following
cherry-picks, to all LTS stable releases?
git cherry-pick 82b72cb94666b3dbd7152bb9f441b068af7a921b
git cherry-pick c7aab4f17021b636a0ee75bcf28e06fb7c94ab48
Thanks!
Best Regards,
neal
^ permalink raw reply
* Re: [PATCH bpf] xsk: fix corrupted packets for XDP_SHARED_UMEM
From: Maciej Fijalkowski @ 2022-08-12 13:06 UTC (permalink / raw)
To: Magnus Karlsson
Cc: magnus.karlsson, bjorn, ast, daniel, netdev, jonathan.lemon, bpf,
Alasdair McWilliam, Intrusion Shield Team
In-Reply-To: <20220812113259.531-1-magnus.karlsson@gmail.com>
On Fri, Aug 12, 2022 at 01:32:59PM +0200, Magnus Karlsson wrote:
> From: Magnus Karlsson <magnus.karlsson@intel.com>
>
> Fix an issue in XDP_SHARED_UMEM mode together with aligned mode were
s/were/where
> packets are corrupted for the second and any further sockets bound to
> the same umem. In other words, this does not affect the first socket
> bound to the umem. The culprit for this bug is that the initialization
> of the DMA addresses for the pre-populated xsk buffer pool entries was
> not performed for any socket but the first one bound to the umem. Only
> the linear array of DMA addresses was populated. Fix this by
> populating the DMA addresses in the xsk buffer pool for every socket
> bound to the same umem.
>
> Fixes: 94033cd8e73b8 ("xsk: Optimize for aligned case")
> Reported-by: Alasdair McWilliam <alasdair.mcwilliam@outlook.com>
> Reported-by: Intrusion Shield Team <dnevil@intrusion.com>
> Tested-by: Alasdair McWilliam <alasdair.mcwilliam@outlook.com>
> Link: https://lore.kernel.org/xdp-newbies/6205E10C-292E-4995-9D10-409649354226@outlook.com/
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
> net/xdp/xsk_buff_pool.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
> index f70112176b7c..a71a8c6edf55 100644
> --- a/net/xdp/xsk_buff_pool.c
> +++ b/net/xdp/xsk_buff_pool.c
> @@ -379,6 +379,16 @@ static void xp_check_dma_contiguity(struct xsk_dma_map *dma_map)
>
> static int xp_init_dma_info(struct xsk_buff_pool *pool, struct xsk_dma_map *dma_map)
> {
> + if (!pool->unaligned) {
> + u32 i;
> +
> + for (i = 0; i < pool->heads_cnt; i++) {
> + struct xdp_buff_xsk *xskb = &pool->heads[i];
> +
> + xp_init_xskb_dma(xskb, pool, dma_map->dma_pages, xskb->orig_addr);
I wondered if it would be better to move it to the end of func and use
pool->dma_pages, but it probably doesn't matter in the end.
Great catch!
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> + }
> + }
> +
> pool->dma_pages = kvcalloc(dma_map->dma_pages_cnt, sizeof(*pool->dma_pages), GFP_KERNEL);
> if (!pool->dma_pages)
> return -ENOMEM;
> @@ -428,12 +438,6 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
>
> if (pool->unaligned)
> xp_check_dma_contiguity(dma_map);
> - else
> - for (i = 0; i < pool->heads_cnt; i++) {
> - struct xdp_buff_xsk *xskb = &pool->heads[i];
> -
> - xp_init_xskb_dma(xskb, pool, dma_map->dma_pages, xskb->orig_addr);
> - }
>
> err = xp_init_dma_info(pool, dma_map);
> if (err) {
>
> base-commit: 4e4588f1c4d2e67c993208f0550ef3fae33abce4
> --
> 2.34.1
>
^ permalink raw reply
* [PATCH] ice: avoid improper tc config for queue map
From: Ding Hui @ 2022-08-12 12:39 UTC (permalink / raw)
To: jesse.brandeburg, anthony.l.nguyen, davem, edumazet, kuba, pabeni,
intel-wired-lan
Cc: netdev, linux-kernel, anatolii.gerasymenko, Ding Hui
There are problems if allocated queues less than Traffic Classes.
Commit a632b2a4c920 ("ice: ethtool: Prohibit improper channel config
for DCB") already disallow setting less queues than TCs.
Another case is if we first set less queues, and later update more TCs
config due to LLDP, ice_vsi_cfg_tc() will failed but left wrong
num_txq/rxq and tc_cfg in vsi, that will cause invalid porinter access.
[ 95.968089] ice 0000:3b:00.1: More TCs defined than queues/rings allocated.
[ 95.968092] ice 0000:3b:00.1: Trying to use more Rx queues (8), than were allocated (1)!
[ 95.968093] ice 0000:3b:00.1: Failed to config TC for VSI index: 0
[ 95.969621] general protection fault: 0000 [#1] SMP NOPTI
[ 95.969705] CPU: 1 PID: 58405 Comm: lldpad Kdump: loaded Tainted: G U W O --------- -t - 4.18.0 #1
[ 95.969867] Hardware name: O.E.M/BC11SPSCB10, BIOS 8.23 12/30/2021
[ 95.969992] RIP: 0010:devm_kmalloc+0xa/0x60
[ 95.970052] Code: 5c ff ff ff 31 c0 5b 5d 41 5c c3 b8 f4 ff ff ff eb f4 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 89 d1 <8b> 97 60 02 00 00 48 8d 7e 18 48 39 f7 72 3f 55 89 ce 53 48 8b 4c
[ 95.970344] RSP: 0018:ffffc9003f553888 EFLAGS: 00010206
[ 95.970425] RAX: dead000000000200 RBX: ffffea003c425b00 RCX: 00000000006080c0
[ 95.970536] RDX: 00000000006080c0 RSI: 0000000000000200 RDI: dead000000000200
[ 95.970648] RBP: dead000000000200 R08: 00000000000463c0 R09: ffff888ffa900000
[ 95.970760] R10: 0000000000000000 R11: 0000000000000002 R12: ffff888ff6b40100
[ 95.970870] R13: ffff888ff6a55018 R14: 0000000000000000 R15: ffff888ff6a55460
[ 95.970981] FS: 00007f51b7d24700(0000) GS:ffff88903ee80000(0000) knlGS:0000000000000000
[ 95.971108] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 95.971197] CR2: 00007fac5410d710 CR3: 0000000f2c1de002 CR4: 00000000007606e0
[ 95.971309] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 95.971419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 95.971530] PKRU: 55555554
[ 95.971573] Call Trace:
[ 95.971622] ice_setup_rx_ring+0x39/0x110 [ice]
[ 95.971695] ice_vsi_setup_rx_rings+0x54/0x90 [ice]
[ 95.971774] ice_vsi_open+0x25/0x120 [ice]
[ 95.971843] ice_open_internal+0xb8/0x1f0 [ice]
[ 95.971919] ice_ena_vsi+0x4f/0xd0 [ice]
[ 95.971987] ice_dcb_ena_dis_vsi.constprop.5+0x29/0x90 [ice]
[ 95.972082] ice_pf_dcb_cfg+0x29a/0x380 [ice]
[ 95.972154] ice_dcbnl_setets+0x174/0x1b0 [ice]
[ 95.972220] dcbnl_ieee_set+0x89/0x230
[ 95.972279] ? dcbnl_ieee_del+0x150/0x150
[ 95.972341] dcb_doit+0x124/0x1b0
[ 95.972392] rtnetlink_rcv_msg+0x243/0x2f0
[ 95.972457] ? dcb_doit+0x14d/0x1b0
[ 95.972510] ? __kmalloc_node_track_caller+0x1d3/0x280
[ 95.972591] ? rtnl_calcit.isra.31+0x100/0x100
[ 95.972661] netlink_rcv_skb+0xcf/0xf0
[ 95.972720] netlink_unicast+0x16d/0x220
[ 95.972781] netlink_sendmsg+0x2ba/0x3a0
[ 95.975891] sock_sendmsg+0x4c/0x50
[ 95.979032] ___sys_sendmsg+0x2e4/0x300
[ 95.982147] ? kmem_cache_alloc+0x13e/0x190
[ 95.985242] ? __wake_up_common_lock+0x79/0x90
[ 95.988338] ? __check_object_size+0xac/0x1b0
[ 95.991440] ? _copy_to_user+0x22/0x30
[ 95.994539] ? move_addr_to_user+0xbb/0xd0
[ 95.997619] ? __sys_sendmsg+0x53/0x80
[ 96.000664] __sys_sendmsg+0x53/0x80
[ 96.003747] do_syscall_64+0x5b/0x1d0
[ 96.006862] entry_SYSCALL_64_after_hwframe+0x65/0xca
Only update num_txq/rxq when passed check, and restore tc_cfg if setup
queue map failed.
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
---
drivers/net/ethernet/intel/ice/ice_lib.c | 42 +++++++++++++++---------
1 file changed, 26 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index a830f7f9aed0..6e64cca30351 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -914,7 +914,7 @@ static void ice_set_dflt_vsi_ctx(struct ice_hw *hw, struct ice_vsi_ctx *ctxt)
*/
static int ice_vsi_setup_q_map(struct ice_vsi *vsi, struct ice_vsi_ctx *ctxt)
{
- u16 offset = 0, qmap = 0, tx_count = 0, pow = 0;
+ u16 offset = 0, qmap = 0, tx_count = 0, rx_count = 0, pow = 0;
u16 num_txq_per_tc, num_rxq_per_tc;
u16 qcount_tx = vsi->alloc_txq;
u16 qcount_rx = vsi->alloc_rxq;
@@ -981,23 +981,25 @@ static int ice_vsi_setup_q_map(struct ice_vsi *vsi, struct ice_vsi_ctx *ctxt)
* at least 1)
*/
if (offset)
- vsi->num_rxq = offset;
+ rx_count = offset;
else
- vsi->num_rxq = num_rxq_per_tc;
+ rx_count = num_rxq_per_tc;
- if (vsi->num_rxq > vsi->alloc_rxq) {
+ if (rx_count > vsi->alloc_rxq) {
dev_err(ice_pf_to_dev(vsi->back), "Trying to use more Rx queues (%u), than were allocated (%u)!\n",
- vsi->num_rxq, vsi->alloc_rxq);
+ rx_count, vsi->alloc_rxq);
return -EINVAL;
}
- vsi->num_txq = tx_count;
- if (vsi->num_txq > vsi->alloc_txq) {
+ if (tx_count > vsi->alloc_txq) {
dev_err(ice_pf_to_dev(vsi->back), "Trying to use more Tx queues (%u), than were allocated (%u)!\n",
- vsi->num_txq, vsi->alloc_txq);
+ tx_count, vsi->alloc_txq);
return -EINVAL;
}
+ vsi->num_txq = tx_count;
+ vsi->num_rxq = rx_count;
+
if (vsi->type == ICE_VSI_VF && vsi->num_txq != vsi->num_rxq) {
dev_dbg(ice_pf_to_dev(vsi->back), "VF VSI should have same number of Tx and Rx queues. Hence making them equal\n");
/* since there is a chance that num_rxq could have been changed
@@ -3492,6 +3494,7 @@ ice_vsi_setup_q_map_mqprio(struct ice_vsi *vsi, struct ice_vsi_ctx *ctxt,
int tc0_qcount = vsi->mqprio_qopt.qopt.count[0];
u8 netdev_tc = 0;
int i;
+ u16 new_txq, new_rxq;
vsi->tc_cfg.ena_tc = ena_tc ? ena_tc : 1;
@@ -3530,21 +3533,24 @@ ice_vsi_setup_q_map_mqprio(struct ice_vsi *vsi, struct ice_vsi_ctx *ctxt,
}
}
- /* Set actual Tx/Rx queue pairs */
- vsi->num_txq = offset + qcount_tx;
- if (vsi->num_txq > vsi->alloc_txq) {
+ new_txq = offset + qcount_tx;
+ if (new_txq > vsi->alloc_txq) {
dev_err(ice_pf_to_dev(vsi->back), "Trying to use more Tx queues (%u), than were allocated (%u)!\n",
- vsi->num_txq, vsi->alloc_txq);
+ new_txq, vsi->alloc_txq);
return -EINVAL;
}
- vsi->num_rxq = offset + qcount_rx;
- if (vsi->num_rxq > vsi->alloc_rxq) {
+ new_rxq = offset + qcount_rx;
+ if (new_rxq > vsi->alloc_rxq) {
dev_err(ice_pf_to_dev(vsi->back), "Trying to use more Rx queues (%u), than were allocated (%u)!\n",
- vsi->num_rxq, vsi->alloc_rxq);
+ new_rxq, vsi->alloc_rxq);
return -EINVAL;
}
+ /* Set actual Tx/Rx queue pairs */
+ vsi->num_txq = new_txq;
+ vsi->num_rxq = new_rxq;
+
/* Setup queue TC[0].qmap for given VSI context */
ctxt->info.tc_mapping[0] = cpu_to_le16(qmap);
ctxt->info.q_mapping[0] = cpu_to_le16(vsi->rxq_map[0]);
@@ -3580,6 +3586,7 @@ int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc)
struct device *dev;
int i, ret = 0;
u8 num_tc = 0;
+ struct ice_tc_cfg old_tc_cfg;
dev = ice_pf_to_dev(pf);
if (vsi->tc_cfg.ena_tc == ena_tc &&
@@ -3600,6 +3607,7 @@ int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc)
max_txqs[i] = vsi->num_txq;
}
+ memcpy(&old_tc_cfg, &vsi->tc_cfg, sizeof(old_tc_cfg));
vsi->tc_cfg.ena_tc = ena_tc;
vsi->tc_cfg.numtc = num_tc;
@@ -3616,8 +3624,10 @@ int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc)
else
ret = ice_vsi_setup_q_map(vsi, ctx);
- if (ret)
+ if (ret) {
+ memcpy(&vsi->tc_cfg, &old_tc_cfg, sizeof(vsi->tc_cfg));
goto out;
+ }
/* must to indicate which section of VSI context are being modified */
ctx->info.valid_sections = cpu_to_le16(ICE_AQ_VSI_PROP_RXQ_MAP_VALID);
--
2.17.1
^ permalink raw reply related
* [PATCH] net: sfp: reset fault retry counter on successful reinitialisation
From: Stefan Mahr @ 2022-08-12 13:04 UTC (permalink / raw)
To: Russell King, netdev; +Cc: linux-kernel, Stefan Mahr
This patch resets the fault retry counter to the default value, if the
module reinitialisation was successful. Otherwise without resetting
the counter, five (N_FAULT/N_FAULT_INIT) single TX_FAULT events will
deactivate the module persistently.
In case the reinitialisation was not successful after five retries,
the module is still being deactivated.
Signed-off-by: Stefan Mahr <dac922@gmx.de>
---
drivers/net/phy/sfp.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 63f90fe9a4d2..a8d7a713222a 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -2263,6 +2263,9 @@ static void sfp_sm_main(struct sfp *sfp, unsigned int event)
} else if (event == SFP_E_TIMEOUT || event == SFP_E_TX_CLEAR) {
dev_info(sfp->dev, "module transmit fault recovered\n");
sfp_sm_link_check_los(sfp);
+
+ /* Reset the fault retry count */
+ sfp->sm_fault_retries = N_FAULT;
}
break;
--
2.25.1
^ permalink raw reply related
* Re: [PATCH] net:ipvs: add rcu read lock in some parts
From: Julian Anastasov @ 2022-08-12 12:17 UTC (permalink / raw)
To: sunsuwan
Cc: Simon Horman, pablo, netdev, lvs-devel, chenzhen126, yanan,
liaichun, caowangbao
In-Reply-To: <20220812093412.808351-1-sunsuwan3@huawei.com>
Hello,
On Fri, 12 Aug 2022, sunsuwan wrote:
> We founf a possible UAF if rmmod pe_sid or schedule,
> when packages in hook and get pe or sched.
>
> Signed-off-by: sunsuwan <sunsuwan3@huawei.com>
> Signed-off-by: chenzhen <chenzhen126@huawei.com>
> ---
> net/netfilter/ipvs/ip_vs_core.c | 6 ++++++
> net/netfilter/ipvs/ip_vs_ctl.c | 3 +++
> net/netfilter/ipvs/ip_vs_dh.c | 2 ++
> 3 files changed, 11 insertions(+)
>
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index 51ad557a525b..d289f184d5c1 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -235,7 +235,9 @@ ip_vs_conn_fill_param_persist(const struct ip_vs_service *svc,
> {
> ip_vs_conn_fill_param(svc->ipvs, svc->af, protocol, caddr, cport, vaddr,
> vport, p);
> + rcu_read_lock();
> p->pe = rcu_dereference(svc->pe);
> + rcu_read_unlock();
Hm, in theory, here we are under rcu_read_lock, see
nf_hook() in include/linux/netfilter.h. IPVS processes packets
by using netfilter hooks. So, IPVS scheduling is under this lock
too and ip_vs_conn_fill_param_persist() is part of the scheduling
call flow.
> if (p->pe && p->pe->fill_param)
> return p->pe->fill_param(p, skb);
>
> @@ -346,7 +348,9 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
> * template is not available.
> * return *ignored=0 i.e. ICMP and NF_DROP
> */
> + rcu_read_lock();
> sched = rcu_dereference(svc->scheduler);
> + rcu_read_unlock();
Scheduling from hook...
> if (sched) {
> /* read svc->sched_data after svc->scheduler */
> smp_rmb();
> @@ -521,7 +525,9 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
> return NULL;
> }
>
> + rcu_read_lock();
> sched = rcu_dereference(svc->scheduler);
> + rcu_read_unlock();
Scheduling from hook...
> if (sched) {
> /* read svc->sched_data after svc->scheduler */
> smp_rmb();
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index efab2b06d373..91e568028001 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -580,6 +580,7 @@ bool ip_vs_has_real_service(struct netns_ipvs *ipvs, int af, __u16 protocol,
> /* Check for "full" addressed entries */
> hash = ip_vs_rs_hashkey(af, daddr, dport);
>
> + rcu_read_lock();
ip_vs_has_real_service() is called by ip_vs_out_hook()
> hlist_for_each_entry_rcu(dest, &ipvs->rs_table[hash], d_list) {
> if (dest->port == dport &&
> dest->af == af &&
> @@ -587,9 +588,11 @@ bool ip_vs_has_real_service(struct netns_ipvs *ipvs, int af, __u16 protocol,
> (dest->protocol == protocol || dest->vfwmark) &&
> IP_VS_DFWD_METHOD(dest) == IP_VS_CONN_F_MASQ) {
> /* HIT */
> + rcu_read_unlock();
> return true;
> }
> }
> + rcu_read_unlock();
>
> return false;
> }
> diff --git a/net/netfilter/ipvs/ip_vs_dh.c b/net/netfilter/ipvs/ip_vs_dh.c
> index 5e6ec32aff2b..3e4b9607172b 100644
> --- a/net/netfilter/ipvs/ip_vs_dh.c
> +++ b/net/netfilter/ipvs/ip_vs_dh.c
> @@ -219,7 +219,9 @@ ip_vs_dh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb,
> IP_VS_DBG(6, "%s(): Scheduling...\n", __func__);
>
> s = (struct ip_vs_dh_state *) svc->sched_data;
> + rcu_read_lock();
> dest = ip_vs_dh_get(svc->af, s, &iph->daddr);
> + rcu_read_unlock();
Scheduling from hook...
> if (!dest
> || !(dest->flags & IP_VS_DEST_F_AVAILABLE)
> || atomic_read(&dest->weight) <= 0
> --
> 2.30.0
So, all above places are already under rcu_read_lock.
If you see some real problem, we should track it somehow.
As for the PEs, they are protected as follows:
- svc holds 1 pe_sip module refcnt (svc->pe), from ip_vs_pe_getbyname()
- every conn can get 1 pe_sip module refcnt (cp->pe): ip_vs_pe_get()
- when last conn releases pe_sip with ip_vs_pe_put() there
can be flying packets, so we have synchronize_rcu() in
ip_vs_sip_cleanup()
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply
* Re: [PATCH bpf-next v2 2/3] bpf: Add xdp dynptrs
From: kernel test robot @ 2022-08-12 12:39 UTC (permalink / raw)
To: Joanne Koong, bpf
Cc: kbuild-all, kafai, kuba, andrii, daniel, ast, netdev, kernel-team,
Joanne Koong
In-Reply-To: <20220811230501.2632393-3-joannelkoong@gmail.com>
Hi Joanne,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Joanne-Koong/Add-skb-xdp-dynptrs/20220812-070634
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: riscv-randconfig-r042-20220812 (https://download.01.org/0day-ci/archive/20220812/202208122032.0CZYkZY8-lkp@intel.com/config)
compiler: riscv32-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/20947f78c1a22c16604514fe2b7c222b77f8939b
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Joanne-Koong/Add-skb-xdp-dynptrs/20220812-070634
git checkout 20947f78c1a22c16604514fe2b7c222b77f8939b
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
riscv32-linux-ld: net/core/filter.o: in function `.L1284':
filter.c:(.text+0x5606): undefined reference to `bpf_dynptr_init'
riscv32-linux-ld: net/core/filter.o: in function `.L1279':
filter.c:(.text+0x561e): undefined reference to `bpf_dynptr_set_null'
riscv32-linux-ld: net/core/filter.o: in function `.L1298':
filter.c:(.text+0x566c): undefined reference to `bpf_dynptr_init'
riscv32-linux-ld: net/core/filter.o: in function `.L1304':
filter.c:(.text+0x568e): undefined reference to `bpf_dynptr_set_rdonly'
>> riscv32-linux-ld: filter.c:(.text+0x569a): undefined reference to `bpf_dynptr_set_null'
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply
* Re: [PATCH net 1/2] dt: ar803x: Document disable-hibernation property
From: Krzysztof Kozlowski @ 2022-08-12 12:36 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Wei Fang, andrew@lunn.ch, hkallweit1@gmail.com,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, robh+dt@kernel.org,
krzysztof.kozlowski+dt@linaro.org, f.fainelli@gmail.com,
netdev@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <YvZH9avGaZ3z5B5H@shell.armlinux.org.uk>
On 12/08/2022 15:30, Russell King (Oracle) wrote:
> On Fri, Aug 12, 2022 at 03:04:41PM +0300, Krzysztof Kozlowski wrote:
>> I did not propose a property to enable hibernation. The property must
>> describe hardware, so this piece is missing, regardless whether the
>> logic in the driver is "enable" or "disable".
>>
>> The hardware property for example is: "broken foo, so hibernation should
>> be disabled" or "engineer forgot to wire cables, so hibernation won't
>> work"...
>
> From the problem description, the PHY itself isn't broken. The stmmac
> hardware doesn't reset properly when the clock from the PHY is stopped.
There is nothing like that in the property name or property description.
Again - DT is not for describing driver behavior or driver policy.
> That could hardly be described as "broken" - it's quite common for
> hardware specifications to state that clocks must be running for the
> hardware to operate correctly.
>
> This is a matter of configuring the hardware to inter-operate correctly.
> Isn't that the whole purpose of DT?
>
> So, nothing is broken. Nothing has forgotten to be wired. It's a matter
> of properly configuring the hardware. Just the same as selecting the
> correct interface mode to connect two devices together.
I just gave you two examples what could be written, don't need to stick
them. You can use some real one...
Best regards,
Krzysztof
^ permalink raw reply
* Re: [PATCH net 1/2] dt: ar803x: Document disable-hibernation property
From: Russell King (Oracle) @ 2022-08-12 12:30 UTC (permalink / raw)
To: Krzysztof Kozlowski
Cc: Wei Fang, andrew@lunn.ch, hkallweit1@gmail.com,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, robh+dt@kernel.org,
krzysztof.kozlowski+dt@linaro.org, f.fainelli@gmail.com,
netdev@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <4cf8d73e-9f14-fe8d-d6e2-551920c1f29e@linaro.org>
On Fri, Aug 12, 2022 at 03:04:41PM +0300, Krzysztof Kozlowski wrote:
> I did not propose a property to enable hibernation. The property must
> describe hardware, so this piece is missing, regardless whether the
> logic in the driver is "enable" or "disable".
>
> The hardware property for example is: "broken foo, so hibernation should
> be disabled" or "engineer forgot to wire cables, so hibernation won't
> work"...
From the problem description, the PHY itself isn't broken. The stmmac
hardware doesn't reset properly when the clock from the PHY is stopped.
That could hardly be described as "broken" - it's quite common for
hardware specifications to state that clocks must be running for the
hardware to operate correctly.
This is a matter of configuring the hardware to inter-operate correctly.
Isn't that the whole purpose of DT?
So, nothing is broken. Nothing has forgotten to be wired. It's a matter
of properly configuring the hardware. Just the same as selecting the
correct interface mode to connect two devices together.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply
* Re: [PATCH v4 net-next 3/6] drivers: net: dsa: add locked fdb entry flag to drivers
From: netdev @ 2022-08-12 12:29 UTC (permalink / raw)
To: Ido Schimmel
Cc: Vladimir Oltean, davem, kuba, netdev, Andrew Lunn, Vivien Didelot,
Florian Fainelli, Eric Dumazet, Paolo Abeni, Jiri Pirko,
Ivan Vecera, Roopa Prabhu, Nikolay Aleksandrov, Shuah Khan,
Daniel Borkmann, linux-kernel, bridge, linux-kselftest
On 2022-08-11 13:28, Ido Schimmel wrote:
>> > I'm talking about roaming, not forwarding. Let's say you have a locked
>> > entry with MAC X pointing to port Y. Now you get a packet with SMAC X
>> > from port Z which is unlocked. Will the FDB entry roam to port Z? I
>> > think it should, but at least in current implementation it seems that
>> > the "locked" flag will not be reset and having locked entries pointing
>> > to an unlocked port looks like a bug.
>> >
>>
In general I have been thinking that the said setup is a network
configuration error as I was arguing in an earlier conversation with
Vladimir. In this setup we must remember that SMAC X becomes DMAC X in
the return traffic on the open port. But the question arises to me why
MAC X would be behind the locked port without getting authed while being
behind an open port too?
In a real life setup, I don't think you would want random hosts behind a
locked port in the MAB case, but only the hosts you will let through.
Other hosts should be regarded as intruders.
If we are talking about a station move, then the locked entry will age
out and MAC X will function normally on the open port after the timeout,
which was a case that was taken up in earlier discussions.
But I will anyhow do some testing with this 'edge case' (of being behind
both a locked and an unlocked port) if I may call it so, and see to that
the offloaded and non-offloaded cases correspond to each other, and will
work satisfactory.
I think it will be good to have a flag to enable the mac-auth/MAB
feature, and I suggest just calling the flag 'mab', as it is short.
Otherwise I don't see any major issues with the whole feature as it is.
^ permalink raw reply
* Re: [PATCH v2] wifi: cfg80211: Fix UAF in ieee80211_scan_rx()
From: Greg KH @ 2022-08-12 12:15 UTC (permalink / raw)
To: Siddh Raman Pant
Cc: Johannes Berg, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, syzbot+6cb476b7c69916a0caca, linux-wireless,
linux-kernel, syzbot+f9acff9bf08a845f225d,
syzbot+9250865a55539d384347, linux-kernel-mentees
In-Reply-To: <18291779771.584fa6ab156295.3990923778713440655@siddh.me>
On Fri, Aug 12, 2022 at 03:21:50PM +0530, Siddh Raman Pant via Linux-kernel-mentees wrote:
> On Tue, 26 Jul 2022 18:09:21 +0530 Siddh Raman Pant wrote:
> > ieee80211_scan_rx() tries to access scan_req->flags after a null check
> > (see line 303 of mac80211/scan.c), but ___cfg80211_scan_done() uses
> > kfree() on the scan_req (see line 991 of wireless/scan.c).
> >
> > This results in a UAF.
> >
> > ieee80211_scan_rx() is called inside a RCU read-critical section
> > initiated by ieee80211_rx_napi() (see line 5044 of mac80211/rx.c).
> >
> > Thus, add an rcu_head to the scan_req struct, so that we can use
> > kfree_rcu() instead of kfree() and thus not free during the critical
> > section.
> >
> > We can clear the pointer before freeing here, since scan_req is
> > accessed using rcu_dereference().
> >
> > Bug report (3): https://syzkaller.appspot.com/bug?extid=f9acff9bf08a845f225d
> > Reported-by: syzbot+f9acff9bf08a845f225d@syzkaller.appspotmail.com
> > Reported-by: syzbot+6cb476b7c69916a0caca@syzkaller.appspotmail.com
> > Reported-by: syzbot+9250865a55539d384347@syzkaller.appspotmail.com
> >
> > Signed-off-by: Siddh Raman Pant code@siddh.me>
> > ---
> > Changes since v1 as requested:
> > - Fixed commit heading and better commit message.
> > - Clear pointer before freeing.
> >
> > include/net/cfg80211.h | 2 ++
> > net/wireless/scan.c | 2 +-
> > 2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
> > index 80f41446b1f0..7e0b448c4cdb 100644
> > --- a/include/net/cfg80211.h
> > +++ b/include/net/cfg80211.h
> > @@ -2368,6 +2368,7 @@ struct cfg80211_scan_6ghz_params {
> > * @n_6ghz_params: number of 6 GHz params
> > * @scan_6ghz_params: 6 GHz params
> > * @bssid: BSSID to scan for (most commonly, the wildcard BSSID)
> > + * @rcu_head: (internal) RCU head to use for freeing
> > */
> > struct cfg80211_scan_request {
> > struct cfg80211_ssid *ssids;
> > @@ -2397,6 +2398,7 @@ struct cfg80211_scan_request {
> > bool scan_6ghz;
> > u32 n_6ghz_params;
> > struct cfg80211_scan_6ghz_params *scan_6ghz_params;
> > + struct rcu_head rcu_head;
> >
> > /* keep last */
> > struct ieee80211_channel *channels[];
> > diff --git a/net/wireless/scan.c b/net/wireless/scan.c
> > index 6d82bd9eaf8c..6cf58fe6dea0 100644
> > --- a/net/wireless/scan.c
> > +++ b/net/wireless/scan.c
> > @@ -988,8 +988,8 @@ void ___cfg80211_scan_done(struct cfg80211_registered_device *rdev,
> > kfree(rdev->int_scan_req);
> > rdev->int_scan_req = NULL;
> >
> > - kfree(rdev->scan_req);
> > rdev->scan_req = NULL;
> > + kfree_rcu(rdev_req, rcu_head);
> >
> > if (!send_message)
> > rdev->scan_msg = msg;
> > --
> > 2.35.1
> >
>
> Hello,
>
> Probably the above quoted patch was missed, which can be found on
> https://lore.kernel.org/linux-wireless/20220726123921.29664-1-code@siddh.me/
>
> This patch was posted more than 2 weeks ago, with changes as requested.
>
> With the merge window almost ending, may I request for another look at
> this patch?
The merge window is for new features to be added, bugfixes can be merged
at any point in time, but most maintainers close their trees until after
the merge window is closed before accepting new fixes, like this one.
So just relax, wait another week or so, and if there's no response,
resend it then.
Personally, this patch seems very incorrect, but hey, I'm not the wifi
subsystem maintainer :)
thanks,
greg k-h
^ permalink raw reply
* Query on reads being flagged as direct writes...
From: Maciej Żenczykowski @ 2022-08-12 12:06 UTC (permalink / raw)
To: Lina Wang, Linux NetDev, BPF Mailing List, Jesper Dangaard Brouer,
Stanislav Fomichev, Thomas Graf, Alexei Starovoitov
From kernel/bpf/verifier.c with some simplifications (removed some of
the cases to make this shorter):
static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
const struct bpf_call_arg_meta *meta, enum bpf_access_type t)
{
enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
switch (prog_type) {
/* Program types only with direct read access go here! */
case BPF_PROG_TYPE_CGROUP_SKB: (and some others)
if (t == BPF_WRITE) return false;
fallthrough;
/* Program types with direct read + write access go here! */
case BPF_PROG_TYPE_SCHED_CLS: (and some others)
if (meta) return meta->pkt_access;
env->seen_direct_write = true;
return true;
case BPF_PROG_TYPE_CGROUP_SOCKOPT:
if (t == BPF_WRITE) env->seen_direct_write = true;
return true;
}
}
why does the above set env->seen_direct_write to true even when t !=
BPF_WRITE, even for programs that only allow (per the comment) direct
read access.
Is this working correctly? Is there some gotcha this is papering over?
Should 'env->seen_direct_write = true; return true;' be changed into
'fallthrough' so that write is only set if t == BPF_WRITE?
This matters because 'env->seen_direct_write = true' then triggers an
unconditional unclone in the bpf prologue, which I'd like to avoid
unless I actually need to modify the packet (with
bpf_skb_store_bytes)...
may_access_direct_pkt_data() has two call sites, in one it only gets
called with BPF_WRITE so it's ok, but the other one is in
check_func_arg():
if (type_is_pkt_pointer(type) && !may_access_direct_pkt_data(env,
meta, BPF_READ)) { verbose(env, "helper access to the packet is not
allowed\n"); return -EACCES; }
and I'm not really following what this does, but it seems like bpf
helper read access to the packet triggers unclone?
(side note: all packets ingressing from the rndis gadget driver are
clones due to how it deals with usb packet deaggregation [not to be
mistaken with lro/tso])
Confused...
^ permalink raw reply
* Re: [PATCH net 1/2] dt: ar803x: Document disable-hibernation property
From: Krzysztof Kozlowski @ 2022-08-12 12:04 UTC (permalink / raw)
To: Russell King (Oracle)
Cc: Wei Fang, andrew@lunn.ch, hkallweit1@gmail.com,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, robh+dt@kernel.org,
krzysztof.kozlowski+dt@linaro.org, f.fainelli@gmail.com,
netdev@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <YvY7Vjtj+WV3BI59@shell.armlinux.org.uk>
On 12/08/2022 14:36, Russell King (Oracle) wrote:
> On Fri, Aug 12, 2022 at 02:25:42PM +0300, Krzysztof Kozlowski wrote:
>> hibernation is a feature, but 'disable-hibernation' is not. DTS
>> describes the hardware, not policy or driver bejhvior. Why disabling
>> hibernation is a property of hardware? How you described, it's not,
>> therefore either property is not for DT or it has to be phrased
>> correctly to describe the hardware.
>
> However, older DT descriptions need to be compatible with later kernels,
> and as existing setups have hibernation enabled, introducing a property
> to _enable_ hibernation (which implies if the property is not present,
> hibernation is disabled) changes the behaviour with older DT, thereby
> breaking backwards compatibility.
>
> Yes, DT needs to describe hardware, but there are also other
> constraints too.
I did not propose a property to enable hibernation. The property must
describe hardware, so this piece is missing, regardless whether the
logic in the driver is "enable" or "disable".
The hardware property for example is: "broken foo, so hibernation should
be disabled" or "engineer forgot to wire cables, so hibernation won't
work"...
Best regards,
Krzysztof
^ permalink raw reply
* Re: mainline build failure due to 332f1795ca20 ("Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression")
From: Sudip Mukherjee @ 2022-08-12 12:03 UTC (permalink / raw)
To: Paul Menzel
Cc: Linus Torvalds, Jakub Kicinski, Marcel Holtmann, Johan Hedberg,
Luiz Augusto von Dentz, David S. Miller, Eric Dumazet,
Paolo Abeni, linux-bluetooth, netdev, linux-kernel,
Palmer Dabbelt
In-Reply-To: <f0a6f8cc-e8a5-ff72-b8f0-ed25fcf03b47@molgen.mpg.de>
Hi Paul,
On Fri, Aug 12, 2022 at 12:44 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Sudip,
>
>
> Am 12.08.22 um 13:25 schrieb Sudip Mukherjee (Codethink):
>
> > The latest mainline kernel branch fails to build csky and mips allmodconfig
> > with gcc-12.
> >
<snip>
>
> Does *[PATCH] Bluetooth: L2CAP: Elide a string overflow warning* [1] fix it?
Yes, this patch fixes the failure.
>
>
> > --
> > Regards
> > Sudip
>
> Only if you care, your signature delimiter is missing a trailing space [2].
Thanks. Should be fixed now.
--
Regards
Sudip
^ permalink raw reply
* [PATCH] net: fec: Restart PPS after link state change
From: Csókás Bence @ 2022-08-12 11:54 UTC (permalink / raw)
To: netdev
Cc: Richard Cochran, David S. Miller, Jakub Kicinski, qiangqing.zhang,
Csókás Bence
On link state change, the controller gets reset,
causing PPS to drop out and the PHC to lose its
time and calibration. So we restart it if needed,
restoring calibration and time registers.
Changes since v2:
* Add `fec_ptp_save_state()`/`fec_ptp_restore_state()`
* Use `ktime_get_real_ns()`
* Use `BIT()` macro
Changes since v1:
* More ECR #define's
* Stop PPS in `fec_ptp_stop()`
Signed-off-by: Csókás Bence <csokas.bence@prolan.hu>
---
drivers/net/ethernet/freescale/fec.h | 10 ++++++
drivers/net/ethernet/freescale/fec_main.c | 42 ++++++++++++++++++++---
drivers/net/ethernet/freescale/fec_ptp.c | 31 ++++++++++++++++-
3 files changed, 78 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 0a343e68a87d..b656cda75c92 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -596,6 +596,13 @@ struct fec_enet_private {
int pps_enable;
unsigned int next_counter;
+ struct {
+ struct timespec64 ts_phc;
+ u64 ns_sys;
+ u32 at_corr;
+ u8 at_inc_corr;
+ } ptp_saved_state;
+
u64 ethtool_stats[];
};
@@ -605,5 +612,8 @@ void fec_ptp_start_cyclecounter(struct net_device *ndev);
int fec_ptp_set(struct net_device *ndev, struct ifreq *ifr);
int fec_ptp_get(struct net_device *ndev, struct ifreq *ifr);
+void fec_ptp_save_state(struct fec_enet_private *fep);
+int fec_ptp_restore_state(struct fec_enet_private *fep);
+
/****************************************************************************/
#endif /* FEC_H */
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 366c52b62d4b..b414574d7501 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -257,8 +257,11 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
#define FEC_MMFR_TA (2 << 16)
#define FEC_MMFR_DATA(v) (v & 0xffff)
/* FEC ECR bits definition */
-#define FEC_ECR_MAGICEN (1 << 2)
-#define FEC_ECR_SLEEP (1 << 3)
+#define FEC_ECR_RESET BIT(0)
+#define FEC_ECR_ETHEREN BIT(1)
+#define FEC_ECR_MAGICEN BIT(2)
+#define FEC_ECR_SLEEP BIT(3)
+#define FEC_ECR_EN1588 BIT(4)
#define FEC_MII_TIMEOUT 30000 /* us */
@@ -955,6 +958,9 @@ fec_restart(struct net_device *ndev)
u32 temp_mac[2];
u32 rcntl = OPT_FRAME_SIZE | 0x04;
u32 ecntl = 0x2; /* ETHEREN */
+ struct ptp_clock_request ptp_rq = { .type = PTP_CLK_REQ_PPS };
+
+ fec_ptp_save_state(fep);
/* Whack a reset. We should wait for this.
* For i.MX6SX SOC, enet use AXI bus, we use disable MAC
@@ -1106,7 +1112,7 @@ fec_restart(struct net_device *ndev)
}
if (fep->bufdesc_ex)
- ecntl |= (1 << 4);
+ ecntl |= FEC_ECR_EN1588;
#ifndef CONFIG_M5272
/* Enable the MIB statistic event counters */
@@ -1120,6 +1126,14 @@ fec_restart(struct net_device *ndev)
if (fep->bufdesc_ex)
fec_ptp_start_cyclecounter(ndev);
+ /* Restart PPS if needed */
+ if (fep->pps_enable) {
+ /* Clear flag so fec_ptp_enable_pps() doesn't return immediately */
+ fep->pps_enable = 0;
+ fec_ptp_restore_state(fep);
+ fep->ptp_caps.enable(&fep->ptp_caps, &ptp_rq, 1);
+ }
+
/* Enable interrupts we wish to service */
if (fep->link)
writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
@@ -1155,6 +1169,8 @@ fec_stop(struct net_device *ndev)
struct fec_enet_private *fep = netdev_priv(ndev);
u32 rmii_mode = readl(fep->hwp + FEC_R_CNTRL) & (1 << 8);
u32 val;
+ struct ptp_clock_request ptp_rq = { .type = PTP_CLK_REQ_PPS };
+ u32 ecntl = 0;
/* We cannot expect a graceful transmit stop without link !!! */
if (fep->link) {
@@ -1164,6 +1180,8 @@ fec_stop(struct net_device *ndev)
netdev_err(ndev, "Graceful transmit stop did not complete!\n");
}
+ fec_ptp_save_state(fep);
+
/* Whack a reset. We should wait for this.
* For i.MX6SX SOC, enet use AXI bus, we use disable MAC
* instead of reset MAC itself.
@@ -1185,12 +1203,28 @@ fec_stop(struct net_device *ndev)
}
writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
+ if (fep->bufdesc_ex)
+ ecntl |= FEC_ECR_EN1588;
+
/* We have to keep ENET enabled to have MII interrupt stay working */
if (fep->quirks & FEC_QUIRK_ENET_MAC &&
!(fep->wol_flag & FEC_WOL_FLAG_SLEEP_ON)) {
- writel(2, fep->hwp + FEC_ECNTRL);
+ ecntl |= FEC_ECR_ETHEREN;
writel(rmii_mode, fep->hwp + FEC_R_CNTRL);
}
+
+ writel(ecntl, fep->hwp + FEC_ECNTRL);
+
+ if (fep->bufdesc_ex)
+ fec_ptp_start_cyclecounter(ndev);
+
+ /* Restart PPS if needed */
+ if (fep->pps_enable) {
+ /* Clear flag so fec_ptp_enable_pps() doesn't return immediately */
+ fep->pps_enable = 0;
+ fec_ptp_restore_state(fep);
+ fep->ptp_caps.enable(&fep->ptp_caps, &ptp_rq, 1);
+ }
}
diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
index a5077eff305b..60b3195a4ab4 100644
--- a/drivers/net/ethernet/freescale/fec_ptp.c
+++ b/drivers/net/ethernet/freescale/fec_ptp.c
@@ -269,7 +269,7 @@ void fec_ptp_start_cyclecounter(struct net_device *ndev)
fep->cc.mult = FEC_CC_MULT;
/* reset the ns time counter */
- timecounter_init(&fep->tc, &fep->cc, ktime_to_ns(ktime_get_real()));
+ timecounter_init(&fep->tc, &fep->cc, ktime_get_real_ns());
spin_unlock_irqrestore(&fep->tmreg_lock, flags);
}
@@ -628,7 +628,36 @@ void fec_ptp_stop(struct platform_device *pdev)
struct net_device *ndev = platform_get_drvdata(pdev);
struct fec_enet_private *fep = netdev_priv(ndev);
+ if (fep->pps_enable)
+ fec_ptp_enable_pps(fep, 0);
+
cancel_delayed_work_sync(&fep->time_keep);
if (fep->ptp_clock)
ptp_clock_unregister(fep->ptp_clock);
}
+
+void fec_ptp_save_state(struct fec_enet_private *fep)
+{
+ u32 atime_inc_corr;
+
+ fec_ptp_gettime(&fep->ptp_caps, &fep->ptp_saved_state.ts_phc);
+ fep->ptp_saved_state.ns_sys = ktime_get_ns();
+
+ fep->ptp_saved_state.at_corr = readl(fep->hwp + FEC_ATIME_CORR);
+ atime_inc_corr = readl(fep->hwp + FEC_ATIME_INC) & FEC_T_INC_CORR_MASK;
+ fep->ptp_saved_state.at_inc_corr = (u8)(atime_inc_corr >> FEC_T_INC_CORR_OFFSET);
+}
+
+int fec_ptp_restore_state(struct fec_enet_private *fep)
+{
+ u32 atime_inc = readl(fep->hwp + FEC_ATIME_INC) & FEC_T_INC_MASK;
+ u64 ns_sys;
+
+ writel(fep->ptp_saved_state.at_corr, fep->hwp + FEC_ATIME_CORR);
+ atime_inc |= ((u32)fep->ptp_saved_state.at_inc_corr) << FEC_T_INC_CORR_OFFSET;
+ writel(atime_inc, fep->hwp + FEC_ATIME_INC);
+
+ ns_sys = ktime_get_ns() - fep->ptp_saved_state.ns_sys;
+ timespec64_add_ns(&fep->ptp_saved_state.ts_phc, ns_sys);
+ return fec_ptp_settime(&fep->ptp_caps, &fep->ptp_saved_state.ts_phc);
+}
--
2.25.1
^ permalink raw reply related
* Re: mainline build failure due to 332f1795ca20 ("Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression")
From: Paul Menzel @ 2022-08-12 11:44 UTC (permalink / raw)
To: Sudip Mukherjee
Cc: torvalds, Jakub Kicinski, Marcel Holtmann, Johan Hedberg,
Luiz Augusto von Dentz, David S. Miller, Eric Dumazet,
Paolo Abeni, linux-bluetooth, netdev, linux-kernel,
Palmer Dabbelt
In-Reply-To: <YvY4xdZEWAPosFdJ@debian>
Dear Sudip,
Am 12.08.22 um 13:25 schrieb Sudip Mukherjee (Codethink):
> The latest mainline kernel branch fails to build csky and mips allmodconfig
> with gcc-12.
>
> mips error is:
>
> In function 'memcmp',
> inlined from 'bacmp' at ./include/net/bluetooth/bluetooth.h:347:9,
> inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2003:15:
> ./include/linux/fortify-string.h:44:33: error: '__builtin_memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread]
> 44 | #define __underlying_memcmp __builtin_memcmp
> | ^
> ./include/linux/fortify-string.h:420:16: note: in expansion of macro '__underlying_memcmp'
> 420 | return __underlying_memcmp(p, q, size);
> | ^~~~~~~~~~~~~~~~~~~
> In function 'memcmp',
> inlined from 'bacmp' at ./include/net/bluetooth/bluetooth.h:347:9,
> inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2004:15:
> ./include/linux/fortify-string.h:44:33: error: '__builtin_memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread]
> 44 | #define __underlying_memcmp __builtin_memcmp
> | ^
> ./include/linux/fortify-string.h:420:16: note: in expansion of macro '__underlying_memcmp'
> 420 | return __underlying_memcmp(p, q, size);
> | ^~~~~~~~~~~~~~~~~~~
>
>
> csky error is:
>
> In file included from net/bluetooth/l2cap_core.c:37:
> In function 'bacmp',
> inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2003:15:
> ./include/net/bluetooth/bluetooth.h:347:16: error: 'memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread]
> 347 | return memcmp(ba1, ba2, sizeof(bdaddr_t));
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> In function 'bacmp',
> inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2004:15:
> ./include/net/bluetooth/bluetooth.h:347:16: error: 'memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread]
> 347 | return memcmp(ba1, ba2, sizeof(bdaddr_t));
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
> git bisect pointed to 332f1795ca20 ("Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression").
> And, reverting that commit has fixed the build failure.
>
> Already reported at https://lore.kernel.org/lkml/YvVQEDs75pxSgxjM@debian/
> and Jacub is looking at a fix, but this is just my usual build failure
> mail of mainline branch for Linus's information.
Does *[PATCH] Bluetooth: L2CAP: Elide a string overflow warning* [1] fix it?
Kind regards,
Paul
PS:
> --
> Regards
> Sudip
Only if you care, your signature delimiter is missing a trailing space [2].
[1]:
https://lore.kernel.org/linux-bluetooth/20220812055249.8037-1-palmer@rivosinc.com/T/#t
[2]: https://en.wikipedia.org/wiki/Signature_block#Standard_delimiter
^ permalink raw reply
* Re: [RFCv7 PATCH net-next 36/36] net: redefine the prototype of netdev_features_t
From: shenjian (K) @ 2022-08-12 11:43 UTC (permalink / raw)
To: Jakub Kicinski, Alexander Lobakin
Cc: davem, andrew, ecree.xilinx, hkallweit1, saeed, leon, netdev,
linuxarm
In-Reply-To: <20220811081311.0f549b39@kernel.org>
在 2022/8/11 23:13, Jakub Kicinski 写道:
> On Thu, 11 Aug 2022 15:07:57 +0200 Alexander Lobakin wrote:
>
>>> Yes, Jakub also mentioned this.
>>>
>>> But there are many existed features interfaces(e.g. ndo_fix_features,
>>> ndo_features_check), use netdev_features_t as return value. Then we
>>> have to change their prototype.
>> We have to do 12k lines of changes already :D
>> You know, 16 bytes is probably fine to return directly and it will
>> be enough for up to 128 features (+64 more comparing to the
>> mainline). OTOH, using pointers removes that "what if/when", so
>> it's more flexible in that term. So that's why I asked for other
>> folks' opinions -- 2 PoVs doesn't seem enough here.
> >From a quick grep it seems like the and() is mostly used in some form
> of:
>
> features = and(features, mask);
>
> and we already have netdev_features_clear() which modifies its first
> argument. I'd also have and() update its first arg rather than return
> the result as a value.
ok, I will follow the behaviour of bitmap_and().
> It will require changing the prototype of
> ndo_features_check() :( But yeah, I reckon we shouldn't be putting of
> refactoring, best if we make all the changes at once than have to
> revisit this once the flags grow again and return by value starts to
> be a problem.
> .
>
Thanks,
Jian
^ permalink raw reply
* Re: [PATCH V5 0/6] ifcvf/vDPA: support query device config space through netlink
From: Zhu, Lingshan @ 2022-08-12 11:41 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: jasowang, virtualization, netdev, kvm, parav, xieyongji,
gautam.dawar
In-Reply-To: <20220812071638-mutt-send-email-mst@kernel.org>
On 8/12/2022 7:17 PM, Michael S. Tsirkin wrote:
> On Fri, Aug 12, 2022 at 07:14:39AM -0400, Michael S. Tsirkin wrote:
>> On Fri, Aug 12, 2022 at 06:44:54PM +0800, Zhu Lingshan wrote:
>>> This series allows userspace to query device config space of vDPA
>>> devices and the management devices through netlink,
>>> to get multi-queue, feature bits and etc.
>>>
>>> This series has introduced a new netlink attr
>>> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES, this should be used to query
>>> features of vDPA devices than the management device.
>>>
>>> Please help review.
>> I can't merge this for this merge window.
>> Am I right when I say that the new thing here is patch 5/6 + new
>> comments?
>> If yes I can queue it out of the window, on top.
> So at this point, can you please send patches on top of the vhost
> tree? I think these are just patches 3 and 5 but please confirm.
I will rebase them on vhost tree and resend them soon, main changes are
in patch 5,
we have made MTU, MAC, MQ conditional there. And there are some new
comments as
you suggested.
Thanks,
Zhu Lingshan
>
>
>>> Thanks!
>>> Zhu Lingshan
>>>
>>> Changes rom V4:
>>> (1) Read MAC, MTU, MQ conditionally (Michael)
>>> (2) If VIRTIO_NET_F_MAC not set, don't report MAC to userspace
>>> (3) If VIRTIO_NET_F_MTU not set, report 1500 to userspace
>>> (4) Add comments to the new attr
>>> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES(Michael)
>>> (5) Add comments for reporting the device status as LE(Michael)
>>>
>>> Changes from V3:
>>> (1)drop the fixes tags(Parva)
>>> (2)better commit log for patch 1/6(Michael)
>>> (3)assign num_queues to max_supported_vqs than max_vq_pairs(Jason)
>>> (4)initialize virtio pci capabilities in the probe() function.
>>>
>>> Changes from V2:
>>> Add fixes tags(Parva)
>>>
>>> Changes from V1:
>>> (1) Use __virito16_to_cpu(true, xxx) for the le16 casting(Jason)
>>> (2) Add a comment in ifcvf_get_config_size(), to explain
>>> why we should return the minimum value of
>>> sizeof(struct virtio_net_config) and the onboard
>>> cap size(Jason)
>>> (3) Introduced a new attr VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES
>>> (4) Show the changes of iproute2 output before and after 5/6 patch(Jason)
>>> (5) Fix cast warning in vdpa_fill_stats_rec()
>>>
>>> Zhu Lingshan (6):
>>> vDPA/ifcvf: get_config_size should return a value no greater than dev
>>> implementation
>>> vDPA/ifcvf: support userspace to query features and MQ of a management
>>> device
>>> vDPA: allow userspace to query features of a vDPA device
>>> vDPA: !FEATURES_OK should not block querying device config space
>>> vDPA: Conditionally read fields in virtio-net dev config space
>>> fix 'cast to restricted le16' warnings in vdpa.c
>>>
>>> drivers/vdpa/ifcvf/ifcvf_base.c | 13 ++-
>>> drivers/vdpa/ifcvf/ifcvf_base.h | 2 +
>>> drivers/vdpa/ifcvf/ifcvf_main.c | 142 +++++++++++++++++---------------
>>> drivers/vdpa/vdpa.c | 82 ++++++++++++------
>>> include/uapi/linux/vdpa.h | 3 +
>>> 5 files changed, 149 insertions(+), 93 deletions(-)
>>>
>>> --
>>> 2.31.1
^ permalink raw reply
* Re: [PATCH net 1/2] dt: ar803x: Document disable-hibernation property
From: Russell King (Oracle) @ 2022-08-12 11:36 UTC (permalink / raw)
To: Krzysztof Kozlowski
Cc: Wei Fang, andrew@lunn.ch, hkallweit1@gmail.com,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, robh+dt@kernel.org,
krzysztof.kozlowski+dt@linaro.org, f.fainelli@gmail.com,
netdev@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <14cf568e-d7ee-886e-5122-69b2e58b8717@linaro.org>
On Fri, Aug 12, 2022 at 02:25:42PM +0300, Krzysztof Kozlowski wrote:
> hibernation is a feature, but 'disable-hibernation' is not. DTS
> describes the hardware, not policy or driver bejhvior. Why disabling
> hibernation is a property of hardware? How you described, it's not,
> therefore either property is not for DT or it has to be phrased
> correctly to describe the hardware.
However, older DT descriptions need to be compatible with later kernels,
and as existing setups have hibernation enabled, introducing a property
to _enable_ hibernation (which implies if the property is not present,
hibernation is disabled) changes the behaviour with older DT, thereby
breaking backwards compatibility.
Yes, DT needs to describe hardware, but there are also other
constraints too.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
^ permalink raw reply
* [PATCH bpf] xsk: fix corrupted packets for XDP_SHARED_UMEM
From: Magnus Karlsson @ 2022-08-12 11:32 UTC (permalink / raw)
To: magnus.karlsson, bjorn, ast, daniel, netdev, maciej.fijalkowski
Cc: jonathan.lemon, bpf, Alasdair McWilliam, Intrusion Shield Team
From: Magnus Karlsson <magnus.karlsson@intel.com>
Fix an issue in XDP_SHARED_UMEM mode together with aligned mode were
packets are corrupted for the second and any further sockets bound to
the same umem. In other words, this does not affect the first socket
bound to the umem. The culprit for this bug is that the initialization
of the DMA addresses for the pre-populated xsk buffer pool entries was
not performed for any socket but the first one bound to the umem. Only
the linear array of DMA addresses was populated. Fix this by
populating the DMA addresses in the xsk buffer pool for every socket
bound to the same umem.
Fixes: 94033cd8e73b8 ("xsk: Optimize for aligned case")
Reported-by: Alasdair McWilliam <alasdair.mcwilliam@outlook.com>
Reported-by: Intrusion Shield Team <dnevil@intrusion.com>
Tested-by: Alasdair McWilliam <alasdair.mcwilliam@outlook.com>
Link: https://lore.kernel.org/xdp-newbies/6205E10C-292E-4995-9D10-409649354226@outlook.com/
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
net/xdp/xsk_buff_pool.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index f70112176b7c..a71a8c6edf55 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -379,6 +379,16 @@ static void xp_check_dma_contiguity(struct xsk_dma_map *dma_map)
static int xp_init_dma_info(struct xsk_buff_pool *pool, struct xsk_dma_map *dma_map)
{
+ if (!pool->unaligned) {
+ u32 i;
+
+ for (i = 0; i < pool->heads_cnt; i++) {
+ struct xdp_buff_xsk *xskb = &pool->heads[i];
+
+ xp_init_xskb_dma(xskb, pool, dma_map->dma_pages, xskb->orig_addr);
+ }
+ }
+
pool->dma_pages = kvcalloc(dma_map->dma_pages_cnt, sizeof(*pool->dma_pages), GFP_KERNEL);
if (!pool->dma_pages)
return -ENOMEM;
@@ -428,12 +438,6 @@ int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
if (pool->unaligned)
xp_check_dma_contiguity(dma_map);
- else
- for (i = 0; i < pool->heads_cnt; i++) {
- struct xdp_buff_xsk *xskb = &pool->heads[i];
-
- xp_init_xskb_dma(xskb, pool, dma_map->dma_pages, xskb->orig_addr);
- }
err = xp_init_dma_info(pool, dma_map);
if (err) {
base-commit: 4e4588f1c4d2e67c993208f0550ef3fae33abce4
--
2.34.1
^ permalink raw reply related
* Re: [RFCv7 PATCH net-next 36/36] net: redefine the prototype of netdev_features_t
From: shenjian (K) @ 2022-08-12 11:30 UTC (permalink / raw)
To: Alexander Lobakin
Cc: davem, kuba, andrew, ecree.xilinx, hkallweit1, saeed, leon,
netdev, linuxarm
In-Reply-To: <20220811130757.9904-1-alexandr.lobakin@intel.com>
在 2022/8/11 21:07, Alexander Lobakin 写道:
> From: "shenjian (K)" <shenjian15@huawei.com>
> Date: Wed, 10 Aug 2022 21:34:43 +0800
>
> BTW, you replied in HTML instead of plain text and korg mail servers
> rejected it. So non-Ccs can't see it. Just be aware that LKML
> accepts plain text only :)
>
>> 在 2022/8/10 19:35, Alexander Lobakin 写道:
>> > From: Jian Shen <shenjian15@huawei.com>
>> > Date: Wed, 10 Aug 2022 11:06:24 +0800
>> >
>> >> For the prototype of netdev_features_t is u64, and the number
>> >> of netdevice feature bits is 64 now. So there is no space to
>> >> introduce new feature bit. Change the prototype of netdev_features_t
>> >> from u64 to structure below:
>> >> typedef struct {
>> >> DECLARE_BITMAP(bits, NETDEV_FEATURE_COUNT);
>> >> } netdev_features_t;
>> >>
>> >> Rewrite the netdev_features helpers to adapt with new prototype.
>> >>
>> >> To avoid mistake using NETIF_F_XXX as NETIF_F_XXX_BIT as
>> >> input macroes for above helpers, remove all the macroes
>> >> of NETIF_F_XXX for single feature bit. Serveal macroes remained
>> >> temporarily, by some precompile dependency.
>> >>
>> >> With the prototype is no longer u64, the implementation of print
>> >> interface for netdev features(%pNF) is changed to bitmap. So
>> >> does the implementation of net/ethtool/.
>> >>
>> >> Signed-off-by: Jian Shen <shenjian15@huawei.com>
>> >> ---
>> >> drivers/net/ethernet/amazon/ena/ena_netdev.c | 12 +-
>> >> .../net/ethernet/intel/i40e/i40e_debugfs.c | 12 +-
>> >> .../ethernet/netronome/nfp/nfp_net_common.c | 4 +-
>> >> .../net/ethernet/pensando/ionic/ionic_lif.c | 4 +-
>> >> include/linux/netdev_features.h | 101 ++----------
>> >> include/linux/netdev_features_helper.h | 149
>> +++++++++++-------
>> >> include/linux/netdevice.h | 7 +-
>> >> include/linux/skbuff.h | 4 +-
>> >> include/net/ip_tunnels.h | 2 +-
>> >> lib/vsprintf.c | 11 +-
>> >> net/ethtool/features.c | 96 ++++-------
>> >> net/ethtool/ioctl.c | 46 ++++--
>> >> net/mac80211/main.c | 3 +-
>> >> 13 files changed, 201 insertions(+), 250 deletions(-)
>> > [...]
>> >
>> >> -static inline int find_next_netdev_feature(u64 feature, unsigned
>> long start)
>> >> -{
>> >> - /* like BITMAP_LAST_WORD_MASK() for u64
>> >> - * this sets the most significant 64 - start to 0.
>> >> - */
>> >> - feature &= ~0ULL >> (-start & ((sizeof(feature) * 8) - 1));
>> >> -
>> >> - return fls64(feature) - 1;
>> >> -}
>> >> +#define NETIF_F_HW_VLAN_CTAG_TX
>> >> +#define NETIF_F_IPV6_CSUM
>> >> +#define NETIF_F_TSO
>> >> +#define NETIF_F_GSO
>> > Uhm, what are those empty definitions for? They look confusing.
>> I kept them temporary for some drivers use them like below:
>> for example in drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
>> #ifdef NETIF_F_HW_VLAN_CTAG_TX
>> #include <linux/if_vlan.h>
>> #endif
>>
>> So far I haven't got a good way to replace it.
>
> I believe such constructs sneaked in from some development/draft
> versions of the code, as those definitions are always here, so
> this is just redundant/pointless.
> Just remove those ifdefs and always include the file.
> The empty definitions you left in netdev_features.h are confusing,
> I'd not keep them.
>
OK, I will add a new patch to remove them.
>>
>> >> >> /* This goes for the MSB to the LSB through the set feature
>> bits,
>> >> * mask_addr should be a u64 and bit an int
>> >> */
>
> [...]
>
>> >> +#define GSO_ENCAP_FEATURES (((u64)1 << NETIF_F_GSO_GRE_BIT)
>> | \
>> >> + ((u64)1 << NETIF_F_GSO_GRE_CSUM_BIT) | \
>> >> + ((u64)1 << NETIF_F_GSO_IPXIP4_BIT) | \
>> >> + ((u64)1 << NETIF_F_GSO_IPXIP6_BIT) | \
>> >> + (((u64)1 << NETIF_F_GSO_UDP_TUNNEL_BIT) | \
>> >> + ((u64)1 << NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT)))
>> > 1) 1ULL;
>> ok,will fix it
>>
>> > 2) what if we get a new GSO encap type which's bit will be higher
>> > than 64?
>> So far I prefer to use this. It's used to assgned to
>> skb_shinfo(skb)->gso_type, which prototype
>> is 'unsigned int'. Once new gso encap type introduced, we should
>> extend the gso_type first.
>
> But ::gso_type accepts flags like %SKB_GSO_DODGY and so on, not
> netdev_features, doesn't it?
>
>>
>>
>> >> +
>> >> #endif /* _LINUX_NETDEV_FEATURES_H */
>
> [...]
>
>> >> static inline netdev_features_t
>> >> netdev_features_and(const netdev_features_t a, const
>> netdev_features_t b)
>> >> {
>> >> - return a & b;
>> >> + netdev_features_t dst;
>> >> +
>> >> + bitmap_and(dst.bits, a.bits, b.bits, NETDEV_FEATURE_COUNT);
>> >> + return dst;
>> > Yeah, so as I wrote previously, not a good idea to return a whole
>> > bitmap/structure.
>> >
>> > netdev_features_and(*dst, const *a, const *b)
>> > {
>> > return bitmap_and(); // bitmap_and() actually returns useful value
>> > }
>> >
>> > I mean, 16 bytes (currently 8, but some new features will come
>> > pretty shortly, I'm sure) are probably okayish, but... let's see
>> > what other folks think, but even Linus wrote about this recently
>> > BTW.
>> Yes, Jakub also mentioned this.
>>
>> But there are many existed features interfaces(e.g. ndo_fix_features,
>> ndo_features_check), use netdev_features_t as return value. Then we
>> have to change their prototype.
>
> We have to do 12k lines of changes already :D
> You know, 16 bytes is probably fine to return directly and it will
> be enough for up to 128 features (+64 more comparing to the
> mainline). OTOH, using pointers removes that "what if/when", so
> it's more flexible in that term. So that's why I asked for other
> folks' opinions -- 2 PoVs doesn't seem enough here.
>
>>
>> second problem is for the helpers' definition. For example:
>> When we introduce helper like netdev_features_zero(netdev_features_t
>> *features)
>> without change prototype of netdev_features_t.
>> once covert netdev_features_t from u64 to unsigned long *, then it
>> becomes
>> netdev_features_zero(unsigned long **features), result in much
>> redundant work
>> to adjust it to netdev_features_zero(unsigned long *features).
>
>>
>>
>> >> }
>
> [...]
>
>> >> static noinline_for_stack
>> >> -char *netdev_bits(char *buf, char *end, const void *addr,
>> >> +char *netdev_bits(char *buf, char *end, void *addr,
>> >> struct printf_spec spec, const char *fmt)
>> >> {
>> >> - unsigned long long num;
>> >> - int size;
>> >> + netdev_features_t *features;
>> > const? We're printing.
>> It will cause compile warning for bitmap_string use features->bits
>> as input param without "const" definition in its prototype.
>
> Oof, that's weird. I checked the function you mentioned and don't
> see any reason why it would require non-RO access to the bitmap it
> prints.
> Could you maybe please change its proto to take const bitmap, so
> that it won't complain on your code? As a separate patch going right
> before this one in your series.
>
OK, will do it.
>> >> >> if (check_pointer(&buf, end, addr, spec))
>> >> return buf;
>> >> >> switch (fmt[1]) {
>> >> case 'F':
>> >> - num = *(const netdev_features_t *)addr;
>> >> - size = sizeof(netdev_features_t);
>> >> + features = (netdev_features_t *)addr;
>> > Casts are not needed when assigning from `void *`.
>> ok, will fix it
>> >> + spec.field_width = NETDEV_FEATURE_COUNT;
>> >> break;
>> >> default:
>> >> return error_string(buf, end, "(%pN?)", spec);
>> >> }
>> >> >> - return special_hex_number(buf, end, num, size);
>> >> + return bitmap_string(buf, end, features->bits, spec, fmt);
>> >> }
>
> [...]
>
>> >> -- >> 2.33.0
>> > That's my last review email for now. Insane amount of work, I'm glad
>> > someone did it finally. Thanks a lot!
>> >
>> > Olek
>> > .
>> Hi Olek,
>> Grateful for your review. You made a lot of valuable suggestions. I
>> will
>> check and continue refine the patchset.
>>
>> Thanks again!
>>
>> Jian
>
> Thanks!
> Olek
> .
>
Thanks,
Jian
^ permalink raw reply
* Re: [PATCH bpf-next 05/15] bpf: Fix incorrect mem_cgroup_put
From: Yafang Shao @ 2022-08-12 11:25 UTC (permalink / raw)
To: Muchun Song
Cc: Shakeel Butt, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin Lau, Song Liu, Yonghong Song,
john fastabend, KP Singh, Stanislav Fomichev, Hao Luo, jolsa,
Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Andrew Morton, netdev, bpf, Linux MM
In-Reply-To: <870C70CA-C760-40A5-8A04-F0962EFDF507@linux.dev>
On Fri, Aug 12, 2022 at 1:34 PM Muchun Song <muchun.song@linux.dev> wrote:
>
>
>
> > On Aug 12, 2022, at 08:27, Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Thu, Aug 11, 2022 at 11:47 PM Shakeel Butt <shakeelb@google.com> wrote:
> >>
> >> On Thu, Aug 11, 2022 at 10:49:13AM +0800, Yafang Shao wrote:
> >>> On Thu, Aug 11, 2022 at 1:07 AM Shakeel Butt <shakeelb@google.com> wrote:
> >>>>
> >>>> On Wed, Aug 10, 2022 at 03:18:30PM +0000, Yafang Shao wrote:
> >>>>> The memcg may be the root_mem_cgroup, in which case we shouldn't put it.
> >>>>
> >>>> No, it is ok to put root_mem_cgroup. css_put already handles the root
> >>>> cgroups.
> >>>>
> >>>
> >>> Ah, this commit log doesn't describe the issue clearly. I should improve it.
> >>> The issue is that in bpf_map_get_memcg() it doesn't get the objcg if
> >>> map->objcg is NULL (that can happen if the map belongs to the root
> >>> memcg), so we shouldn't put the objcg if map->objcg is NULL neither in
> >>> bpf_map_put_memcg().
> >>
> >> Sorry I am still not understanding. We are not 'getting' objcg in
> >> bpf_map_get_memcg(). We are 'getting' memcg from map->objcg and if that
> >> is NULL the function is returning root memcg and putting root memcg is
> >> totally fine.
> >
> > When the map belongs to root_mem_cgroup, the map->objcg is NULL, right ?
> > See also bpf_map_save_memcg() and it describes clearly in the comment -
> >
> > static void bpf_map_save_memcg(struct bpf_map *map)
> > {
> > /* Currently if a map is created by a process belonging to the root
> > * memory cgroup, get_obj_cgroup_from_current() will return NULL.
> > * So we have to check map->objcg for being NULL each time it's
> > * being used.
> > */
> > map->objcg = get_obj_cgroup_from_current();
> > }
> >
> > So for the root_mem_cgroup case, bpf_map_get_memcg() will return
> > root_mem_cgroup directly without getting its css, right ? See below,
> >
> > static struct mem_cgroup *bpf_map_get_memcg(const struct bpf_map *map)
> > {
> >
> > if (map->objcg)
> > return get_mem_cgroup_from_objcg(map->objcg);
> >
> > return root_mem_cgroup; // without css_get(&memcg->css);
> > }
> >
> > But it will put the css unconditionally. See below,
> >
> > memcg = bpf_map_get_memcg(map);
> > ...
> > mem_cgroup_put(memcg);
> >
> > So we should put it *conditionally* as well.
>
> Hi,
>
> No. We could put root_mem_cgroup unconditionally since the root css
> is treated as no reference css. See css_put().
>
> static inline void css_put(struct cgroup_subsys_state *css)
> {
> // The root memcg’s css has been set with CSS_NO_REF.
> if (!(css->flags & CSS_NO_REF))
> percpu_ref_put(&css->refcnt);
> }
Indeed. I missed that.
The call stack is so deep that I didn't look into it :(
Thanks for the information.
--
Regards
Yafang
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox