* Re: [PATCH net-next v2 2/2] net: stmmac: Support enhanced addressing mode for DWMAC 4.10
From: Thierry Reding @ 2019-09-09 19:13 UTC (permalink / raw)
To: Jose Abreu
Cc: David S . Miller, Giuseppe Cavallaro, Alexandre Torgue,
Jon Hunter, Bitan Biswas, netdev@vger.kernel.org,
linux-tegra@vger.kernel.org
In-Reply-To: <BN8PR12MB3266AAC6FF4819EC25CB087BD3B70@BN8PR12MB3266.namprd12.prod.outlook.com>
[-- Attachment #1: Type: text/plain, Size: 1400 bytes --]
On Mon, Sep 09, 2019 at 04:05:52PM +0000, Jose Abreu wrote:
> From: Thierry Reding <thierry.reding@gmail.com>
> Date: Sep/09/2019, 16:25:46 (UTC+00:00)
>
> > @@ -79,6 +79,10 @@ static void dwmac4_dma_init_rx_chan(void __iomem *ioaddr,
> > value = value | (rxpbl << DMA_BUS_MODE_RPBL_SHIFT);
> > writel(value, ioaddr + DMA_CHAN_RX_CONTROL(chan));
> >
> > + if (dma_cfg->eame)
>
> There is no need for this check. If EAME is not enabled then upper 32
> bits will be zero.
The idea here was to potentially guard against this register not being
available on some revisions. Having the check here would avoid access to
the register if the device doesn't support enhanced addressing.
>
> > + writel(upper_32_bits(dma_rx_phy),
> > + ioaddr + DMA_CHAN_RX_BASE_ADDR_HI(chan));
> > +
> > writel(lower_32_bits(dma_rx_phy), ioaddr + DMA_CHAN_RX_BASE_ADDR(chan));
> > }
>
> > @@ -97,6 +101,10 @@ static void dwmac4_dma_init_tx_chan(void __iomem *ioaddr,
> >
> > writel(value, ioaddr + DMA_CHAN_TX_CONTROL(chan));
> >
> > + if (dma_cfg->eame)
>
> Same here.
>
> > + writel(upper_32_bits(dma_tx_phy),
> > + ioaddr + DMA_CHAN_TX_BASE_ADDR_HI(chan));
> > +
> > writel(lower_32_bits(dma_tx_phy), ioaddr + DMA_CHAN_TX_BASE_ADDR(chan));
> > }
>
> Also, please provide a cover letter in next submission.
Alright, will do.
Thierry
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH] net/mlx5: reduce stack usage in FW tracer
From: Saeed Mahameed @ 2019-09-09 19:39 UTC (permalink / raw)
To: davem@davemloft.net, arnd@arndb.de, leon@kernel.org
Cc: cai@lca.pw, linux-rdma@vger.kernel.org, Moshe Shemesh,
Feras Daoud, linux-kernel@vger.kernel.org, Eran Ben Elisha,
netdev@vger.kernel.org, Erez Shitrit
In-Reply-To: <20190906151123.1088455-1-arnd@arndb.de>
On Fri, 2019-09-06 at 17:11 +0200, Arnd Bergmann wrote:
> It's generally not ok to put a 512 byte buffer on the stack, as
> kernel
> stack is a scarce resource:
>
> drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:660:13:
> error: stack frame size of 1032 bytes in function
> 'mlx5_fw_tracer_handle_traces' [-Werror,-Wframe-larger-than=]
>
> This is done in a context that is allowed to sleep, so using
> dynamic allocation is ok as well. I'm not too worried about
> runtime overhead, as this already contains an snprintf() and
> other expensive functions.
>
> Fixes: 70dd6fdb8987 ("net/mlx5: FW tracer, parse traces and kernel
> tracing support")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> .../mellanox/mlx5/core/diag/fw_tracer.c | 21 ++++++++++-------
> --
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
> b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
> index 2011eaf15cc5..d81e78060f9f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
> @@ -557,16 +557,16 @@ static void mlx5_tracer_print_trace(struct
> tracer_string_format *str_frmt,
> struct mlx5_core_dev *dev,
> u64 trace_timestamp)
> {
> - char tmp[512];
> -
Hi Arnd, thanks for the patch,
this function is very perfomance critical when fw traces are activated
to pull some fw content on error situations, using kmalloc here might
become a problem and stall the system further more if the problem was
initially due to lack of memory.
since this function only needs 512 bytes maybe we should mark it as
noinline to avoid any extra stack usages on the caller function
mlx5_fw_tracer_handle_traces ?
> - snprintf(tmp, sizeof(tmp), str_frmt->string,
> - str_frmt->params[0],
> - str_frmt->params[1],
> - str_frmt->params[2],
> - str_frmt->params[3],
> - str_frmt->params[4],
> - str_frmt->params[5],
> - str_frmt->params[6]);
> + char *tmp = kasprintf(GFP_KERNEL, str_frmt->string,
> + str_frmt->params[0],
> + str_frmt->params[1],
> + str_frmt->params[2],
> + str_frmt->params[3],
> + str_frmt->params[4],
> + str_frmt->params[5],
> + str_frmt->params[6]);
> + if (!tmp)
> + return;
>
> trace_mlx5_fw(dev->tracer, trace_timestamp, str_frmt->lost,
> str_frmt->event_id, tmp);
> @@ -576,6 +576,7 @@ static void mlx5_tracer_print_trace(struct
> tracer_string_format *str_frmt,
>
> /* remove it from hash */
> mlx5_tracer_clean_message(str_frmt);
> + kfree(tmp);
> }
>
> static int mlx5_tracer_handle_string_trace(struct mlx5_fw_tracer
> *tracer,
^ permalink raw reply
* RE: VRF Issue Since kernel 5
From: Gowen @ 2019-09-09 19:43 UTC (permalink / raw)
To: Alexis Bauvin; +Cc: netdev@vger.kernel.org
In-Reply-To: <7CAF2F23-5D88-4BE7-B703-06B71D1EDD11@online.net>
Hi alexis,
I did this earlier today and no change.
I’ll look at trying to see if the return traffic is hitting the INPUT table tomorrow with some conntrack rules and see if it hits any of those rules. If not then do you have any hints/techniques I can use to find the source of the issue?
Gareth
-----Original Message-----
From: Alexis Bauvin <abauvin@online.net>
Sent: 09 September 2019 13:02
To: Gowen <gowen@potatocomputing.co.uk>
Cc: netdev@vger.kernel.org
Subject: Re: VRF Issue Since kernel 5
Hi,
I guess all routing from the management VRF itself is working correctly (i.e. cURLing an IP from this VRF or digging any DNS), and it is your route leakage that’s at fault.
Could you try swapping the local and l3mdev rules?
`ip rule del pref 0; ip rule add from all lookup local pref 1001`
I faced security issues and behavioral weirdnesses from the default kernel rule ordering regarding the default vrf.
Alexis
> Le 9 sept. 2019 à 12:53, Gowen <gowen@potatocomputing.co.uk> a écrit :
>
> Hi Alexis,
>
> Admin@NETM06:~$ sysctl net.ipv4.tcp_l3mdev_accept
> net.ipv4.tcp_l3mdev_accept = 1
>
> Admin@NETM06:~$ sudo ip vrf exec mgmt-vrf curl kernel.org
> curl: (6) Could not resolve host: kernel.org
>
> the failure to resolve is the same with all DNS lookups from any
> process I've run
>
> The route is there from the guide I originally used, I can't remember
> the purpose but I know I don't need it - I've removed it now and no
> change
>
> Admin@NETM06:~$ ip rule show
> 0: from all lookup local
> 1000: from all lookup [l3mdev-table]
> 32766: from all lookup main
> 32767: from all lookup default
>
> I could switch the VRFs over, but this is a test-box and i have prod boxes on this as well so not so keen on that if I can avoid it.
>
> From what I can speculate, because the TCP return traffic is met with an RST, it looks like it may be something to do with iptables - but even if I set the policy to ACCEPT and flush all the rules, the behaviour remains the same.
>
> Is it possible that the TCP stack isn't aware of the session (as is mapped to wrong VRF internally or something to that effect) and is therefore sending the RST?
>
> Gareth
> From: Alexis Bauvin <abauvin@online.net>
> Sent: 09 September 2019 10:28
> To: Gowen <gowen@potatocomputing.co.uk>
> Cc: netdev@vger.kernel.org <netdev@vger.kernel.org>
> Subject: Re: VRF Issue Since kernel 5
>
> Hi,
>
> There has been some changes regarding VRF isolation in Linux 5 IIRC,
> namely proper isolation of the default VRF.
>
> Some things you may try:
>
> - looking at the l3mdev_accept sysctls (e.g.
> `net.ipv4.tcp_l3mdev_accept`)
> - querying stuff from the management vrf through `ip vrf exec vrf-mgmt <stuff>`
> e.g. `ip vrf exec vrf-mgmt curl kernel.org`
> `ip vrf exec vrf-mgmt dig @1.1.1.1 kernel.org`
> - reversing your logic: default VRF is your management one, the other one is for your
> other boxes
>
> Also, your `unreachable default metric 4278198272` route looks odd to me.
>
> What are your routing rules? (`ip rule`)
>
> Alexis
>
> > Le 9 sept. 2019 à 09:46, Gowen <gowen@potatocomputing.co.uk> a écrit :
> >
> > Hi there,
> >
> > Dave A said this was the mailer to send this to:
> >
> >
> > I’ve been using my management interface in a VRF for several months now and it’s worked perfectly – I’ve been able to update/upgrade the packages just fine and iptables works excellently with it – exactly as I needed.
> >
> >
> > Since Kernel 5 though I am no longer able to update – but the issue
> > is quite a curious one as some traffic appears to be fine (DNS
> > lookups use VRF correctly) but others don’t (updating/upgrading the
> > packages)
> >
> >
> > I have on this device 2 interfaces:
> > Eth0 for management – inbound SSH, DNS, updates/upgrades
> > Eth1 for managing other boxes (ansible using SSH)
> >
> >
> > Link and addr info shown below:
> >
> >
> > Admin@NETM06:~$ ip link show
> > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
> > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master mgmt-vrf state UP mode DEFAULT group default qlen 1000
> > link/ether 00:22:48:07:cc:ad brd ff:ff:ff:ff:ff:ff
> > 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
> > link/ether 00:22:48:07:c9:6c brd ff:ff:ff:ff:ff:ff
> > 4: mgmt-vrf: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc noqueue state UP mode DEFAULT group default qlen 1000
> > link/ether 8a:f6:26:65:02:5a brd ff:ff:ff:ff:ff:ff
> >
> >
> > Admin@NETM06:~$ ip addr
> > 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
> > link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> > inet 127.0.0.1/8 scope host lo
> > valid_lft forever preferred_lft forever
> > inet6 ::1/128 scope host
> > valid_lft forever preferred_lft forever
> > 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master mgmt-vrf state UP group default qlen 1000
> > link/ether 00:22:48:07:cc:ad brd ff:ff:ff:ff:ff:ff
> > inet 10.24.12.10/24 brd 10.24.12.255 scope global eth0
> > valid_lft forever preferred_lft forever
> > inet6 fe80::222:48ff:fe07:ccad/64 scope link
> > valid_lft forever preferred_lft forever
> > 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
> > link/ether 00:22:48:07:c9:6c brd ff:ff:ff:ff:ff:ff
> > inet 10.24.12.9/24 brd 10.24.12.255 scope global eth1
> > valid_lft forever preferred_lft forever
> > inet6 fe80::222:48ff:fe07:c96c/64 scope link
> > valid_lft forever preferred_lft forever
> > 4: mgmt-vrf: <NOARP,MASTER,UP,LOWER_UP> mtu 65536 qdisc noqueue state UP group default qlen 1000
> > link/ether 8a:f6:26:65:02:5a brd ff:ff:ff:ff:ff:ff
> >
> >
> >
> > the production traffic is all in the 10.0.0.0/8 network (eth1 global
> > VRF) except for a few subnets (DNS) which are routed out eth0
> > (mgmt-vrf)
> >
> >
> > Admin@NETM06:~$ ip route show
> > default via 10.24.12.1 dev eth0
> > 10.0.0.0/8 via 10.24.12.1 dev eth1
> > 10.24.12.0/24 dev eth1 proto kernel scope link src 10.24.12.9
> > 10.24.65.0/24 via 10.24.12.1 dev eth0
> > 10.25.65.0/24 via 10.24.12.1 dev eth0
> > 10.26.0.0/21 via 10.24.12.1 dev eth0
> > 10.26.64.0/21 via 10.24.12.1 dev eth0
> >
> >
> > Admin@NETM06:~$ ip route show vrf mgmt-vrf default via 10.24.12.1
> > dev eth0 unreachable default metric 4278198272
> > 10.24.12.0/24 dev eth0 proto kernel scope link src 10.24.12.10
> > 10.24.65.0/24 via 10.24.12.1 dev eth0
> > 10.25.65.0/24 via 10.24.12.1 dev eth0
> > 10.26.0.0/21 via 10.24.12.1 dev eth0
> > 10.26.64.0/21 via 10.24.12.1 dev eth0
> >
> >
> >
> > The strange activity occurs when I enter the command “sudo apt update” as I can resolve the DNS request (10.24.65.203 or 10.24.64.203, verified with tcpdump) out eth0 but for the actual update traffic there is no activity:
> >
> >
> > sudo tcpdump -i eth0 '(host 10.24.65.203 or host 10.25.65.203) and
> > port 53' -n <OUTPUT OMITTED FOR BREVITY>
> > 10:06:05.268735 IP 10.24.12.10.39963 > 10.24.65.203.53: 48798+ [1au]
> > A? security.ubuntu.com. (48) <OUTPUT OMITTED FOR BREVITY>
> > 10:06:05.284403 IP 10.24.65.203.53 > 10.24.12.10.39963: 48798 13/0/1
> > A 91.189.91.23, A 91.189.88.24, A 91.189.91.26, A 91.189.88.162, A
> > 91.189.88.149, A 91.189.91.24, A 91.189.88.173, A 91.189.88.177, A
> > 91.189.88.31, A 91.189.91.14, A 91.189.88.176, A 91.189.88.175, A
> > 91.189.88.174 (256)
> >
> >
> >
> > You can see that the update traffic is returned but is not accepted
> > by the stack and a RST is sent
> >
> >
> > Admin@NETM06:~$ sudo tcpdump -i eth0 '(not host 168.63.129.16 and
> > port 80)' -n
> > tcpdump: verbose output suppressed, use -v or -vv for full protocol
> > decode listening on eth0, link-type EN10MB (Ethernet), capture size
> > 262144 bytes
> > 10:17:12.690658 IP 10.24.12.10.40216 > 91.189.88.175.80: Flags [S],
> > seq 2279624826, win 64240, options [mss 1460,sackOK,TS val
> > 2029365856 ecr 0,nop,wscale 7], length 0
> > 10:17:12.691929 IP 10.24.12.10.52362 > 91.189.95.83.80: Flags [S], seq 1465797256, win 64240, options [mss 1460,sackOK,TS val 3833463674 ecr 0,nop,wscale 7], length 0
> > 10:17:12.696270 IP 91.189.88.175.80 > 10.24.12.10.40216: Flags [S.], seq 968450722, ack 2279624827, win 28960, options [mss 1418,sackOK,TS val 81957103 ecr 2029365856,nop,wscale 7], length 0
> > 10:17:12.696301 IP 10.24.12.10.40216 > 91.189.88.175.80: Flags [R], seq 2279624827, win 0, length 0
> > 10:17:12.697884 IP 91.189.95.83.80 > 10.24.12.10.52362: Flags [S.], seq 4148330738, ack 1465797257, win 28960, options [mss 1418,sackOK,TS val 2257624414 ecr 3833463674,nop,wscale 8], length 0
> > 10:17:12.697909 IP 10.24.12.10.52362 > 91.189.95.83.80: Flags [R],
> > seq 1465797257, win 0, length 0
> >
> >
> >
> >
> > I can emulate the DNS lookup using netcat in the vrf:
> >
> >
> > sudo ip vrf exec mgmt-vrf nc -u 10.24.65.203 53
> >
> >
> > then interactively enter the binary for a www.google.co.uk request:
> >
> >
> > 0035624be394010000010000000000010377777706676f6f676c6502636f02756b00
> > 000100010000290200000000000000
> >
> >
> > This returns as expected:
> >
> >
> > 00624be394010000010000000000010377777706676f6f676c6502636f02756b0000
> > 0100010000290200000000000000
> >
> >
> > I can run:
> >
> >
> > Admin@NETM06:~$ host www.google.co.uk www.google.co.uk has address
> > 172.217.169.3 www.google.co.uk has IPv6 address
> > 2a00:1450:4009:80d::2003
> >
> >
> > but I get a timeout for:
> >
> >
> > sudo ip vrf exec mgmt-vrf host www.google.co.uk ;; connection timed
> > out; no servers could be reached
> >
> >
> >
> > However I can take a repo address and vrf exec to it on port 80:
> >
> >
> > Admin@NETM06:~$ sudo ip vrf exec mgmt-vrf nc 91.189.91.23 80 hello
> > HTTP/1.1 400 Bad Request
> > <OUTPUT OMITTED>
> >
> > My iptables rule:
> >
> >
> > sudo iptables -Z
> > Admin@NETM06:~$ sudo iptables -L -v
> > Chain INPUT (policy DROP 16 packets, 3592 bytes)
> > pkts bytes target prot opt in out source destination
> > 44 2360 ACCEPT tcp -- any any anywhere anywhere tcp spt:http ctstate RELATED,ESTABLISHED
> > 83 10243 ACCEPT udp -- any any anywhere anywhere udp spt:domain ctstate RELATED,ESTABLISHED
> >
> >
> >
> > I cannot find out why the update isn’t working. Any help greatly
> > appreciated
> >
> >
> > Kind Regards,
> >
> >
> > Gareth
^ permalink raw reply
* [PATCH net-next 1/2] mlx5: steering: use correct enum type
From: Arnd Bergmann @ 2019-09-09 19:50 UTC (permalink / raw)
To: Saeed Mahameed, Leon Romanovsky, David S. Miller
Cc: Arnd Bergmann, Alex Vesker, Erez Shitrit, Mark Bloch, netdev,
linux-rdma, linux-kernel, clang-built-linux
The newly added code triggers a harmless warning with
clang:
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1080:9: error: implicit conversion from enumeration type 'enum mlx5_reformat_ctx_type' to different enumeration type 'enum mlx5dr_action_type' [-Werror,-Wenum-conversion]
rt = MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL;
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1084:51: error: implicit conversion from enumeration type 'enum mlx5dr_action_type' to different enumeration type 'enum mlx5_reformat_ctx_type' [-Werror,-Wenum-conversion]
ret = mlx5dr_cmd_create_reformat_ctx(dmn->mdev, rt, data_sz, data,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~
Change it to use mlx5_reformat_ctx_type instead of mlx5dr_action_type.
Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
index a02f87f85c17..7d81a7735de5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
@@ -1074,7 +1074,7 @@ dr_action_create_reformat_action(struct mlx5dr_domain *dmn,
case DR_ACTION_TYP_L2_TO_TNL_L2:
case DR_ACTION_TYP_L2_TO_TNL_L3:
{
- enum mlx5dr_action_type rt;
+ enum mlx5_reformat_ctx_type rt;
if (action->action_type == DR_ACTION_TYP_L2_TO_TNL_L2)
rt = MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL;
--
2.20.0
^ permalink raw reply related
* [PATCH net-next 2/2] mlx5: fix type mismatch
From: Arnd Bergmann @ 2019-09-09 19:50 UTC (permalink / raw)
To: Leon Romanovsky, Doug Ledford, Jason Gunthorpe, Saeed Mahameed,
David S. Miller
Cc: Arnd Bergmann, Alex Vesker, Erez Shitrit, Mark Bloch,
Yishai Hadas, Ariel Levkovich, linux-rdma, linux-kernel, netdev
In-Reply-To: <20190909195024.3268499-1-arnd@arndb.de>
In mlx5, pointers to 'phys_addr_t' and 'u64' are mixed since the addition
of the pool memory allocator, leading to incorrect behavior on 32-bit
architectures and this compiler warning:
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_icm_pool.c:121:8: error: incompatible pointer types passing 'u64 *' (aka 'unsigned long long *') to parameter of type 'phys_addr_t *' (aka 'unsigned int *') [-Werror,-Wincompatible-pointer-types]
&icm_mr->dm.addr, &icm_mr->dm.obj_id);
^~~~~~~~~~~~~~~~
include/linux/mlx5/driver.h:1092:39: note: passing argument to parameter 'addr' here
u64 length, u16 uid, phys_addr_t *addr, u32 *obj_id);
Change the code to use 'u64' consistently in place of 'phys_addr_t' to
fix this. The alternative of using phys_addr_t more would require a larger
rework.
Fixes: 29cf8febd185 ("net/mlx5: DR, ICM pool memory allocator")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/infiniband/hw/mlx5/cmd.c | 4 ++--
drivers/infiniband/hw/mlx5/cmd.h | 4 ++--
drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c | 4 ++--
include/linux/mlx5/driver.h | 4 ++--
5 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/cmd.c b/drivers/infiniband/hw/mlx5/cmd.c
index 4937947400cd..fbcad70ef6dc 100644
--- a/drivers/infiniband/hw/mlx5/cmd.c
+++ b/drivers/infiniband/hw/mlx5/cmd.c
@@ -82,7 +82,7 @@ int mlx5_cmd_modify_cong_params(struct mlx5_core_dev *dev,
return mlx5_cmd_exec(dev, in, in_size, out, sizeof(out));
}
-int mlx5_cmd_alloc_memic(struct mlx5_dm *dm, phys_addr_t *addr,
+int mlx5_cmd_alloc_memic(struct mlx5_dm *dm, u64 *addr,
u64 length, u32 alignment)
{
struct mlx5_core_dev *dev = dm->dev;
@@ -157,7 +157,7 @@ int mlx5_cmd_alloc_memic(struct mlx5_dm *dm, phys_addr_t *addr,
return -ENOMEM;
}
-int mlx5_cmd_dealloc_memic(struct mlx5_dm *dm, phys_addr_t addr, u64 length)
+int mlx5_cmd_dealloc_memic(struct mlx5_dm *dm, u64 addr, u64 length)
{
struct mlx5_core_dev *dev = dm->dev;
u64 hw_start_addr = MLX5_CAP64_DEV_MEM(dev, memic_bar_start_addr);
diff --git a/drivers/infiniband/hw/mlx5/cmd.h b/drivers/infiniband/hw/mlx5/cmd.h
index 169cab4915e3..2ea7a45a6abb 100644
--- a/drivers/infiniband/hw/mlx5/cmd.h
+++ b/drivers/infiniband/hw/mlx5/cmd.h
@@ -44,9 +44,9 @@ int mlx5_cmd_query_cong_params(struct mlx5_core_dev *dev, int cong_point,
int mlx5_cmd_query_ext_ppcnt_counters(struct mlx5_core_dev *dev, void *out);
int mlx5_cmd_modify_cong_params(struct mlx5_core_dev *mdev,
void *in, int in_size);
-int mlx5_cmd_alloc_memic(struct mlx5_dm *dm, phys_addr_t *addr,
+int mlx5_cmd_alloc_memic(struct mlx5_dm *dm, u64 *addr,
u64 length, u32 alignment);
-int mlx5_cmd_dealloc_memic(struct mlx5_dm *dm, phys_addr_t addr, u64 length);
+int mlx5_cmd_dealloc_memic(struct mlx5_dm *dm, u64 addr, u64 length);
void mlx5_cmd_dealloc_pd(struct mlx5_core_dev *dev, u32 pdn, u16 uid);
void mlx5_cmd_destroy_tir(struct mlx5_core_dev *dev, u32 tirn, u16 uid);
void mlx5_cmd_destroy_tis(struct mlx5_core_dev *dev, u32 tisn, u16 uid);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 2ceaef3ea3fb..476d4447f901 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -561,7 +561,7 @@ enum mlx5_ib_mtt_access_flags {
struct mlx5_ib_dm {
struct ib_dm ibdm;
- phys_addr_t dev_addr;
+ u64 dev_addr;
u32 type;
size_t size;
union {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c
index e065c2f68f5a..ad4d7484fa63 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/dm.c
@@ -90,7 +90,7 @@ void mlx5_dm_cleanup(struct mlx5_core_dev *dev)
}
int mlx5_dm_sw_icm_alloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type,
- u64 length, u16 uid, phys_addr_t *addr, u32 *obj_id)
+ u64 length, u16 uid, u64 *addr, u32 *obj_id)
{
u32 num_blocks = DIV_ROUND_UP_ULL(length, MLX5_SW_ICM_BLOCK_SIZE(dev));
u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {};
@@ -175,7 +175,7 @@ int mlx5_dm_sw_icm_alloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type,
EXPORT_SYMBOL_GPL(mlx5_dm_sw_icm_alloc);
int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type,
- u64 length, u16 uid, phys_addr_t addr, u32 obj_id)
+ u64 length, u16 uid, u64 addr, u32 obj_id)
{
u32 num_blocks = DIV_ROUND_UP_ULL(length, MLX5_SW_ICM_BLOCK_SIZE(dev));
u32 out[MLX5_ST_SZ_DW(general_obj_out_cmd_hdr)] = {};
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 3e80f03a387f..e07f9daf7d42 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1089,9 +1089,9 @@ int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev,
struct mlx5_uars_page *mlx5_get_uars_page(struct mlx5_core_dev *mdev);
void mlx5_put_uars_page(struct mlx5_core_dev *mdev, struct mlx5_uars_page *up);
int mlx5_dm_sw_icm_alloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type,
- u64 length, u16 uid, phys_addr_t *addr, u32 *obj_id);
+ u64 length, u16 uid, u64 *addr, u32 *obj_id);
int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type,
- u64 length, u16 uid, phys_addr_t addr, u32 obj_id);
+ u64 length, u16 uid, u64 addr, u32 obj_id);
#ifdef CONFIG_MLX5_CORE_IPOIB
struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
--
2.20.0
^ permalink raw reply related
* Re: [PATCH v2] net: enable wireless core features with LEGACY_WEXT_ALLCONFIG
From: Jeff Johnson @ 2019-09-09 19:54 UTC (permalink / raw)
To: Johannes Berg
Cc: Mark Salyzyn, Greg KH, linux-kernel, kernel-team, David S. Miller,
Marcel Holtmann, linux-wireless, netdev, stable,
linux-wireless-owner
In-Reply-To: <6f3487136e71afbd4d2b621551ee14e68c4cc1ab.camel@sipsolutions.net>
On 2019-09-09 08:44, Johannes Berg wrote:
> Also, you probably know this, but in this particular case you really
> should just get rid of your wext dependencies
This.
Particularly for one out-of-tree driver with which I'm intimately
familiar there has been considerable recent work to make all WEXT code
correctly conditional, and nothing in the Android support should be
reliant upon WEXT.
^ permalink raw reply
* Re: [PATCH net-next 1/2] mlx5: steering: use correct enum type
From: Nathan Chancellor @ 2019-09-09 19:55 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller, Alex Vesker,
Erez Shitrit, Mark Bloch, netdev, linux-rdma, linux-kernel,
clang-built-linux
In-Reply-To: <20190909195024.3268499-1-arnd@arndb.de>
On Mon, Sep 09, 2019 at 09:50:08PM +0200, Arnd Bergmann wrote:
> The newly added code triggers a harmless warning with
> clang:
>
> drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1080:9: error: implicit conversion from enumeration type 'enum mlx5_reformat_ctx_type' to different enumeration type 'enum mlx5dr_action_type' [-Werror,-Wenum-conversion]
> rt = MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL;
> ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1084:51: error: implicit conversion from enumeration type 'enum mlx5dr_action_type' to different enumeration type 'enum mlx5_reformat_ctx_type' [-Werror,-Wenum-conversion]
> ret = mlx5dr_cmd_create_reformat_ctx(dmn->mdev, rt, data_sz, data,
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~
>
> Change it to use mlx5_reformat_ctx_type instead of mlx5dr_action_type.
>
> Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
I sent the same fix a couple of days ago:
https://lore.kernel.org/netdev/20190905014733.17564-1-natechancellor@gmail.com/
I don't care which patch goes in since they are the same thing so:
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
^ permalink raw reply
* Re: [PATCH] net/mlx5: reduce stack usage in FW tracer
From: Arnd Bergmann @ 2019-09-09 20:18 UTC (permalink / raw)
To: Saeed Mahameed
Cc: davem@davemloft.net, leon@kernel.org, cai@lca.pw,
linux-rdma@vger.kernel.org, Moshe Shemesh, Feras Daoud,
linux-kernel@vger.kernel.org, Eran Ben Elisha,
netdev@vger.kernel.org, Erez Shitrit
In-Reply-To: <383db08b6001503ac45c2e12ac514208dc5a4bba.camel@mellanox.com>
On Mon, Sep 9, 2019 at 9:39 PM Saeed Mahameed <saeedm@mellanox.com> wrote:
> On Fri, 2019-09-06 at 17:11 +0200, Arnd Bergmann wrote:
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
> > @@ -557,16 +557,16 @@ static void mlx5_tracer_print_trace(struct
> > tracer_string_format *str_frmt,
> > struct mlx5_core_dev *dev,
> > u64 trace_timestamp)
> > {
> > - char tmp[512];
> > -
>
> Hi Arnd, thanks for the patch,
> this function is very perfomance critical when fw traces are activated
> to pull some fw content on error situations, using kmalloc here might
> become a problem and stall the system further more if the problem was
> initially due to lack of memory.
>
> since this function only needs 512 bytes maybe we should mark it as
> noinline to avoid any extra stack usages on the caller function
> mlx5_fw_tracer_handle_traces ?
That would shut up the warning, but doesn't sound right either.
If it's performance critical indeed, maybe the best solution would
be to also avoid the snprintf(), as that is also a rather heavyweight
function?
I could not find an easy solution for this, but I did notice the unusual way
this deals with a variable format string passed into mlx5_tracer_print_trace
along with a set of parameters, which opens up a set of possible
format string vulnerabilities as well as making mlx5_tracer_print_trace()
a bit expensive. You also take a mutex and free memory in there,
which obviously then also got allocated in the fast path.
To do this right, a better approach may be to just rely on ftrace, storing
the (pointer to the) format string and the arguments in the buffer without
creating a string. Would that be an option here?
A more minimal approach might be to move what is now the on-stack
buffer into the mlx5_fw_tracer function. I see that you already store
a copy of the string in there from mlx5_fw_tracer_save_trace(),
which conveniently also holds a mutex already that protects
it from concurrent access.
Arnd
^ permalink raw reply
* RE: [PATCH v3 0/5] Introduce variable length mdev alias
From: Parav Pandit @ 2019-09-09 20:42 UTC (permalink / raw)
To: Parav Pandit, alex.williamson@redhat.com, Jiri Pirko,
kwankhede@nvidia.com, cohuck@redhat.com, davem@davemloft.net
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
In-Reply-To: <20190902042436.23294-1-parav@mellanox.com>
Hi Alex,
> -----Original Message-----
> From: Parav Pandit <parav@mellanox.com>
> Sent: Sunday, September 1, 2019 11:25 PM
> To: alex.williamson@redhat.com; Jiri Pirko <jiri@mellanox.com>;
> kwankhede@nvidia.com; cohuck@redhat.com; davem@davemloft.net
> Cc: kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> netdev@vger.kernel.org; Parav Pandit <parav@mellanox.com>
> Subject: [PATCH v3 0/5] Introduce variable length mdev alias
>
> To have consistent naming for the netdevice of a mdev and to have consistent
> naming of the devlink port [1] of a mdev, which is formed using
> phys_port_name of the devlink port, current UUID is not usable because UUID
> is too long.
>
> UUID in string format is 36-characters long and in binary 128-bit.
> Both formats are not able to fit within 15 characters limit of netdev name.
>
> It is desired to have mdev device naming consistent using UUID.
> So that widely used user space framework such as ovs [2] can make use of
> mdev representor in similar way as PCIe SR-IOV VF and PF representors.
>
> Hence,
> (a) mdev alias is created which is derived using sha1 from the mdev name.
> (b) Vendor driver describes how long an alias should be for the child mdev
> created for a given parent.
> (c) Mdev aliases are unique at system level.
> (d) alias is created optionally whenever parent requested.
> This ensures that non networking mdev parents can function without alias
> creation overhead.
>
> This design is discussed at [3].
>
> An example systemd/udev extension will have,
>
> 1. netdev name created using mdev alias available in sysfs.
>
> mdev UUID=83b8f4f2-509f-382f-3c1e-e6bfe0fa1001
> mdev 12 character alias=cd5b146a80a5
>
> netdev name of this mdev = enmcd5b146a80a5 Here en = Ethernet link m =
> mediated device
>
> 2. devlink port phys_port_name created using mdev alias.
> devlink phys_port_name=pcd5b146a80a5
>
> This patchset enables mdev core to maintain unique alias for a mdev.
>
> Patch-1 Introduces mdev alias using sha1.
> Patch-2 Ensures that mdev alias is unique in a system.
> Patch-3 Exposes mdev alias in a sysfs hirerchy, update Documentation
> Patch-4 Introduces mdev_alias() API.
> Patch-5 Extends mtty driver to optionally provide alias generation.
> This also enables to test UUID based sha1 collision and trigger error handling
> for duplicate sha1 results.
>
> [1] http://man7.org/linux/man-pages/man8/devlink-port.8.html
> [2] https://docs.openstack.org/os-vif/latest/user/plugins/ovs.html
> [3] https://patchwork.kernel.org/cover/11084231/
>
> ---
> Changelog:
> v2->v3:
> - Addressed comment from Yunsheng Lin
> - Changed strcmp() ==0 to !strcmp()
> - Addressed comment from Cornelia Hunk
> - Merged sysfs Documentation patch with syfs patch
> - Added more description for alias return value
Did you get a chance review this updated series?
I addressed Cornelia's and yours comment.
I do not think allocating alias memory twice, once for comparison and once for storing is good idea or moving alias generation logic inside the mdev_list_lock(). So I didn't address that suggestion of Cornelia.
> v1->v2:
> - Corrected a typo from 'and' to 'an'
> - Addressed comments from Alex Williamson
> - Kept mdev_device naturally aligned
> - Added error checking for crypt_*() calls
> - Moved alias NULL check at beginning
> - Added mdev_alias() API
> - Updated mtty driver to show example mdev_alias() usage
> - Changed return type of generate_alias() from int to char*
> v0->v1:
> - Addressed comments from Alex Williamson, Cornelia Hunk and Mark Bloch
> - Moved alias length check outside of the parent lock
> - Moved alias and digest allocation from kvzalloc to kzalloc
> - &alias[0] changed to alias
> - alias_length check is nested under get_alias_length callback check
> - Changed comments to start with an empty line
> - Added comment where alias memory ownership is handed over to mdev
> device
> - Fixed cleaunup of hash if mdev_bus_register() fails
> - Updated documentation for new sysfs alias file
> - Improved commit logs to make description more clear
> - Fixed inclusiong of alias for NULL check
> - Added ratelimited debug print for sha1 hash collision error
>
> Parav Pandit (5):
> mdev: Introduce sha1 based mdev alias
> mdev: Make mdev alias unique among all mdevs
> mdev: Expose mdev alias in sysfs tree
> mdev: Introduce an API mdev_alias
> mtty: Optionally support mtty alias
>
> .../driver-api/vfio-mediated-device.rst | 9 ++
> drivers/vfio/mdev/mdev_core.c | 142 +++++++++++++++++-
> drivers/vfio/mdev/mdev_private.h | 5 +-
> drivers/vfio/mdev/mdev_sysfs.c | 26 +++-
> include/linux/mdev.h | 5 +
> samples/vfio-mdev/mtty.c | 13 ++
> 6 files changed, 190 insertions(+), 10 deletions(-)
>
> --
> 2.19.2
^ permalink raw reply
* [PATCH net-next] ipv6: Don't use dst gateway directly in ip6_confirm_neigh()
From: Stefano Brivio @ 2019-09-09 20:44 UTC (permalink / raw)
To: David Miller
Cc: Guillaume Nault, Julian Anastasov, Nicolas Dichtel, David Ahern,
netdev
This is the equivalent of commit 2c6b55f45d53 ("ipv6: fix neighbour
resolution with raw socket") for ip6_confirm_neigh(): we can send a
packet with MSG_CONFIRM on a raw socket for a connected route, so the
gateway would be :: here, and we should pick the next hop using
rt6_nexthop() instead.
This was found by code review and, to the best of my knowledge, doesn't
actually fix a practical issue: the destination address from the packet
is not considered while confirming a neighbour, as ip6_confirm_neigh()
calls choose_neigh_daddr() without passing the packet, so there are no
similar issues as the one fixed by said commit.
A possible source of issues with the existing implementation might come
from the fact that, if we have a cached dst, we won't consider it,
while rt6_nexthop() takes care of that. I might just not be creative
enough to find a practical problem here: the only way to affect this
with cached routes is to have one coming from an ICMPv6 redirect, but
if the next hop is a directly connected host, there should be no
topology for which a redirect applies here, and tests with redirected
routes show no differences for MSG_CONFIRM (and MSG_PROBE) packets on
raw sockets destined to a directly connected host.
However, directly using the dst gateway here is not consistent anymore
with neighbour resolution, and, in general, as we want the next hop,
using rt6_nexthop() looks like the only sane way to fetch it.
Reported-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
net/ipv6/route.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 7a5d331cdefa..874641d4d2a1 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -227,7 +227,7 @@ static void ip6_confirm_neigh(const struct dst_entry *dst, const void *daddr)
struct net_device *dev = dst->dev;
struct rt6_info *rt = (struct rt6_info *)dst;
- daddr = choose_neigh_daddr(&rt->rt6i_gateway, NULL, daddr);
+ daddr = choose_neigh_daddr(rt6_nexthop(rt, &in6addr_any), NULL, daddr);
if (!daddr)
return;
if (dev->flags & (IFF_NOARP | IFF_LOOPBACK))
--
2.20.1
^ permalink raw reply related
* [PATCH] net/ibmvnic: Fix missing { in __ibmvnic_reset
From: Michal Suchanek @ 2019-09-09 20:44 UTC (permalink / raw)
To: netdev, David S. Miller, Juliet Kim
Cc: Michal Suchanek, Benjamin Herrenschmidt, Paul Mackerras,
Michael Ellerman, Thomas Falcon, John Allen, linuxppc-dev,
linux-kernel
In-Reply-To: <20190907.173714.1400426487600521308.davem@davemloft.net>
Commit 1c2977c09499 ("net/ibmvnic: free reset work of removed device from queue")
adds a } without corresponding { causing build break.
Fixes: 1c2977c09499 ("net/ibmvnic: free reset work of removed device from queue")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
---
drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 6644cabc8e75..5cb55ea671e3 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1984,7 +1984,7 @@ static void __ibmvnic_reset(struct work_struct *work)
rwi = get_next_rwi(adapter);
while (rwi) {
if (adapter->state == VNIC_REMOVING ||
- adapter->state == VNIC_REMOVED)
+ adapter->state == VNIC_REMOVED) {
kfree(rwi);
rc = EBUSY;
break;
--
2.22.0
^ permalink raw reply related
* [PATCH net] tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR
From: Neal Cardwell @ 2019-09-09 20:56 UTC (permalink / raw)
To: David Miller
Cc: netdev, Neal Cardwell, Yuchung Cheng, Soheil Hassas Yeganeh,
Eric Dumazet
Fix tcp_ecn_withdraw_cwr() to clear the correct bit:
TCP_ECN_QUEUE_CWR.
Rationale: basically, TCP_ECN_DEMAND_CWR is a bit that is purely about
the behavior of data receivers, and deciding whether to reflect
incoming IP ECN CE marks as outgoing TCP th->ece marks. The
TCP_ECN_QUEUE_CWR bit is purely about the behavior of data senders,
and deciding whether to send CWR. The tcp_ecn_withdraw_cwr() function
is only called from tcp_undo_cwnd_reduction() by data senders during
an undo, so it should zero the sender-side state,
TCP_ECN_QUEUE_CWR. It does not make sense to stop the reflection of
incoming CE bits on incoming data packets just because outgoing
packets were spuriously retransmitted.
The bug has been reproduced with packetdrill to manifest in a scenario
with RFC3168 ECN, with an incoming data packet with CE bit set and
carrying a TCP timestamp value that causes cwnd undo. Before this fix,
the IP CE bit was ignored and not reflected in the TCP ECE header bit,
and sender sent a TCP CWR ('W') bit on the next outgoing data packet,
even though the cwnd reduction had been undone. After this fix, the
sender properly reflects the CE bit and does not set the W bit.
Note: the bug actually predates 2005 git history; this Fixes footer is
chosen to be the oldest SHA1 I have tested (from Sep 2007) for which
the patch applies cleanly (since before this commit the code was in a
.h file).
Fixes: bdf1ee5d3bd3 ("[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it")
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Eric Dumazet <edumazet@google.com>
---
net/ipv4/tcp_input.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c21e8a22fb3b..8a1cd93dbb09 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -266,7 +266,7 @@ static void tcp_ecn_accept_cwr(struct sock *sk, const struct sk_buff *skb)
static void tcp_ecn_withdraw_cwr(struct tcp_sock *tp)
{
- tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
+ tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR;
}
static void __tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb)
--
2.23.0.162.g0b9fbb3734-goog
^ permalink raw reply related
* RE: [Intel-wired-lan] [PATCH] i40e: clear __I40E_VIRTCHNL_OP_PENDING on invalid min tx rate
From: Bowers, AndrewX @ 2019-09-09 21:00 UTC (permalink / raw)
To: intel-wired-lan@lists.osuosl.org; +Cc: netdev@vger.kernel.org
In-Reply-To: <20190903060810.30775-1-sassmann@kpanic.de>
> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Stefan Assmann
> Sent: Monday, September 2, 2019 11:08 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; davem@davemloft.net; sassmann@kpanic.de
> Subject: [Intel-wired-lan] [PATCH] i40e: clear
> __I40E_VIRTCHNL_OP_PENDING on invalid min tx rate
>
> In the case of an invalid min tx rate being requested
> i40e_ndo_set_vf_bw() immediately returns -EINVAL instead of releasing
> __I40E_VIRTCHNL_OP_PENDING first.
>
> Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
> ---
> drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
^ permalink raw reply
* RE: [Intel-wired-lan] [PATCH] net/ixgbevf: make array api static const, makes object smaller
From: Bowers, AndrewX @ 2019-09-09 21:03 UTC (permalink / raw)
To: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org
Cc: kernel-janitors@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20190906113356.9985-1-colin.king@canonical.com>
> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Colin King
> Sent: Friday, September 6, 2019 4:34 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; David S . Miller
> <davem@davemloft.net>; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org
> Cc: kernel-janitors@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH] net/ixgbevf: make array api static const,
> makes object smaller
>
> From: Colin Ian King <colin.king@canonical.com>
>
> Don't populate the array api on the stack but instead make it static const.
> Makes the object code smaller by 58 bytes.
>
> Before:
> text data bss dec hex filename
> 82969 9763 256 92988 16b3c ixgbevf/ixgbevf_main.o
>
> After:
> text data bss dec hex filename
> 82815 9859 256 92930 16b02 ixgbevf/ixgbevf_main.o
>
> (gcc version 9.2.1, amd64)
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
^ permalink raw reply
* Re: [PATCH net-next 1/2] mlx5: steering: use correct enum type
From: Nick Desaulniers @ 2019-09-09 21:14 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Arnd Bergmann, Saeed Mahameed, Leon Romanovsky, David S. Miller,
Alex Vesker, Erez Shitrit, Mark Bloch, Network Development,
linux-rdma, LKML, clang-built-linux
In-Reply-To: <20190909195513.GA94662@archlinux-threadripper>
On Mon, Sep 9, 2019 at 12:55 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> On Mon, Sep 09, 2019 at 09:50:08PM +0200, Arnd Bergmann wrote:
> > The newly added code triggers a harmless warning with
> > clang:
> >
> > drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1080:9: error: implicit conversion from enumeration type 'enum mlx5_reformat_ctx_type' to different enumeration type 'enum mlx5dr_action_type' [-Werror,-Wenum-conversion]
> > rt = MLX5_REFORMAT_TYPE_L2_TO_L2_TUNNEL;
> > ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c:1084:51: error: implicit conversion from enumeration type 'enum mlx5dr_action_type' to different enumeration type 'enum mlx5_reformat_ctx_type' [-Werror,-Wenum-conversion]
> > ret = mlx5dr_cmd_create_reformat_ctx(dmn->mdev, rt, data_sz, data,
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~
> >
> > Change it to use mlx5_reformat_ctx_type instead of mlx5dr_action_type.
> >
> > Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality")
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> I sent the same fix a couple of days ago:
>
> https://lore.kernel.org/netdev/20190905014733.17564-1-natechancellor@gmail.com/
>
> I don't care which patch goes in since they are the same thing so:
>
> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
GCC recently gained support (via me scanning the commit logs for an
unrelated feature) for -Wenum-warnings (though I think it's off by
default) so hopefully these kinds of issues will taper off over time.
--
Thanks,
~Nick Desaulniers
^ permalink raw reply
* Re: [PATCH] net/ibmvnic: Fix missing { in __ibmvnic_reset
From: Tyrel Datwyler @ 2019-09-09 21:21 UTC (permalink / raw)
To: Michal Suchanek, netdev, David S. Miller, Juliet Kim
Cc: linux-kernel, Thomas Falcon, Paul Mackerras, John Allen,
linuxppc-dev
In-Reply-To: <20190909204451.7929-1-msuchanek@suse.de>
On 9/9/19 1:44 PM, Michal Suchanek wrote:
> Commit 1c2977c09499 ("net/ibmvnic: free reset work of removed device from queue")
> adds a } without corresponding { causing build break.
>
> Fixes: 1c2977c09499 ("net/ibmvnic: free reset work of removed device from queue")
> Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> ---
> drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 6644cabc8e75..5cb55ea671e3 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1984,7 +1984,7 @@ static void __ibmvnic_reset(struct work_struct *work)
> rwi = get_next_rwi(adapter);
> while (rwi) {
> if (adapter->state == VNIC_REMOVING ||
> - adapter->state == VNIC_REMOVED)
> + adapter->state == VNIC_REMOVED) {
> kfree(rwi);
> rc = EBUSY;
> break;
>
^ permalink raw reply
* Re: [PATCH net] tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR
From: Eric Dumazet @ 2019-09-09 21:26 UTC (permalink / raw)
To: Neal Cardwell; +Cc: David Miller, netdev, Yuchung Cheng, Soheil Hassas Yeganeh
In-Reply-To: <20190909205602.248472-1-ncardwell@google.com>
On Mon, Sep 9, 2019 at 10:56 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> Fix tcp_ecn_withdraw_cwr() to clear the correct bit:
> TCP_ECN_QUEUE_CWR.
>
> Rationale: basically, TCP_ECN_DEMAND_CWR is a bit that is purely about
> the behavior of data receivers, and deciding whether to reflect
> incoming IP ECN CE marks as outgoing TCP th->ece marks. The
> TCP_ECN_QUEUE_CWR bit is purely about the behavior of data senders,
> and deciding whether to send CWR. The tcp_ecn_withdraw_cwr() function
> is only called from tcp_undo_cwnd_reduction() by data senders during
> an undo, so it should zero the sender-side state,
> TCP_ECN_QUEUE_CWR. It does not make sense to stop the reflection of
> incoming CE bits on incoming data packets just because outgoing
> packets were spuriously retransmitted.
>
> The bug has been reproduced with packetdrill to manifest in a scenario
> with RFC3168 ECN, with an incoming data packet with CE bit set and
> carrying a TCP timestamp value that causes cwnd undo. Before this fix,
> the IP CE bit was ignored and not reflected in the TCP ECE header bit,
> and sender sent a TCP CWR ('W') bit on the next outgoing data packet,
> even though the cwnd reduction had been undone. After this fix, the
> sender properly reflects the CE bit and does not set the W bit.
>
> Note: the bug actually predates 2005 git history; this Fixes footer is
> chosen to be the oldest SHA1 I have tested (from Sep 2007) for which
> the patch applies cleanly (since before this commit the code was in a
> .h file).
>
> Fixes: bdf1ee5d3bd3 ("[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it")
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Thanks Neal
^ permalink raw reply
* Re: [PATCH] net/mlx5: reduce stack usage in FW tracer
From: Saeed Mahameed @ 2019-09-09 21:53 UTC (permalink / raw)
To: arnd@arndb.de
Cc: cai@lca.pw, linux-rdma@vger.kernel.org, davem@davemloft.net,
Moshe Shemesh, Feras Daoud, linux-kernel@vger.kernel.org,
Eran Ben Elisha, netdev@vger.kernel.org, leon@kernel.org,
Erez Shitrit
In-Reply-To: <CAK8P3a0_VhZ9hYmc6P3Qx+Z6WSHh3PVZ7JZh7Tr=R1CAKvqWmA@mail.gmail.com>
On Mon, 2019-09-09 at 22:18 +0200, Arnd Bergmann wrote:
> On Mon, Sep 9, 2019 at 9:39 PM Saeed Mahameed <saeedm@mellanox.com>
> wrote:
> > On Fri, 2019-09-06 at 17:11 +0200, Arnd Bergmann wrote:
> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
> > > @@ -557,16 +557,16 @@ static void mlx5_tracer_print_trace(struct
> > > tracer_string_format *str_frmt,
> > > struct mlx5_core_dev *dev,
> > > u64 trace_timestamp)
> > > {
> > > - char tmp[512];
> > > -
> >
> > Hi Arnd, thanks for the patch,
> > this function is very perfomance critical when fw traces are
> > activated
> > to pull some fw content on error situations, using kmalloc here
> > might
> > become a problem and stall the system further more if the problem
> > was
> > initially due to lack of memory.
> >
> > since this function only needs 512 bytes maybe we should mark it as
> > noinline to avoid any extra stack usages on the caller function
> > mlx5_fw_tracer_handle_traces ?
>
> That would shut up the warning, but doesn't sound right either.
>
> If it's performance critical indeed, maybe the best solution would
> be to also avoid the snprintf(), as that is also a rather heavyweight
> function?
>
> I could not find an easy solution for this, but I did notice the
> unusual way
> this deals with a variable format string passed into
> mlx5_tracer_print_trace
> along with a set of parameters, which opens up a set of possible
> format string vulnerabilities as well as making
> mlx5_tracer_print_trace()
> a bit expensive. You also take a mutex and free memory in there,
> which obviously then also got allocated in the fast path.
>
> To do this right, a better approach may be to just rely on ftrace,
> storing
> the (pointer to the) format string and the arguments in the buffer
> without
> creating a string. Would that be an option here?
I am not sure how this would work, since the format parameters can
changes depending on the FW string and the specific traces.
>
> A more minimal approach might be to move what is now the on-stack
> buffer into the mlx5_fw_tracer function. I see that you already store
> a copy of the string in there from mlx5_fw_tracer_save_trace(),
> which conveniently also holds a mutex already that protects
> it from concurrent access.
>
This sounds plausible.
So for now let's do this or the noinline approach, Please let me know
which one do you prefer, if it is the mutex protected buffer, i can do
it myself.
I will open an internal task and discussion then address your valuable
points in a future submission, since we already in rc8 I don't want to
take the risk now.
Thanks for your feedback !
Saeed.
> Arnd
^ permalink raw reply
* [PATCH] lib/Kconfig: fix OBJAGG in lib/ menu structure
From: Randy Dunlap @ 2019-09-09 21:54 UTC (permalink / raw)
To: netdev@vger.kernel.org, LKML, David Miller; +Cc: Jiri Pirko, Ido Schimmel
From: Randy Dunlap <rdunlap@infradead.org>
Keep the "Library routines" menu intact by moving OBJAGG into it.
Otherwise OBJAGG is displayed/presented as an orphan in the
various config menus.
Fixes: 0a020d416d0a ("lib: introduce initial implementation of object aggregation manager")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: David S. Miller <davem@davemloft.net>
---
lib/Kconfig | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- linux-next-20190904.orig/lib/Kconfig
+++ linux-next-20190904/lib/Kconfig
@@ -631,6 +631,9 @@ config SBITMAP
config PARMAN
tristate "parman" if COMPILE_TEST
+config OBJAGG
+ tristate "objagg" if COMPILE_TEST
+
config STRING_SELFTEST
tristate "Test string functions"
@@ -653,6 +656,3 @@ config GENERIC_LIB_CMPDI2
config GENERIC_LIB_UCMPDI2
bool
-
-config OBJAGG
- tristate "objagg" if COMPILE_TEST
^ permalink raw reply
* Re: [PATCH net-next v8 2/3] net: phy: add support for clause 37 auto-negotiation
From: Tao Ren @ 2019-09-09 21:56 UTC (permalink / raw)
To: Andrew Lunn, Florian Fainelli, Heiner Kallweit, David S . Miller,
Vladimir Oltean, Arun Parameswaran, Justin Chen,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
openbmc@lists.ozlabs.org
In-Reply-To: <20190909204906.2191290-1-taoren@fb.com>
Hi Andrew / Florian,
On 9/9/19 1:49 PM, Tao Ren wrote:
> From: Heiner Kallweit <hkallweit1@gmail.com>
>
> This patch adds support for clause 37 1000Base-X auto-negotiation.
>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> Signed-off-by: Tao Ren <taoren@fb.com>
> Tested-by: René van Dorst <opensource@vdorst.com>
Just want to check if I missed your comments (I noticed some wired problems on my email client)?
If yes (sorry I missed something), could you please re-send your comments? And I will address them as soon as possible.
Thanks,
Tao
> ---
> Changes in v8:
> - Rebased the patch on top of net-next HEAD.
> Changes in v7:
> - Update "if (AUTONEG_ENABLE != phydev->autoneg)" to
> "if (phydev->autoneg != AUTONEG_ENABLE)" so checkpatch.pl is happy.
> Changes in v6:
> - add "Signed-off-by: Tao Ren <taoren@fb.com>"
> Changes in v1-v5:
> - nothing changed. It's given v5 just to align with the version of
> patch series.
>
> drivers/net/phy/phy_device.c | 139 +++++++++++++++++++++++++++++++++++
> include/linux/phy.h | 4 +
> 2 files changed, 143 insertions(+)
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 8933f07d39e9..dd05f750bb3f 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1607,6 +1607,40 @@ static int genphy_config_advert(struct phy_device *phydev)
> return changed;
> }
>
> +/**
> + * genphy_c37_config_advert - sanitize and advertise auto-negotiation parameters
> + * @phydev: target phy_device struct
> + *
> + * Description: Writes MII_ADVERTISE with the appropriate values,
> + * after sanitizing the values to make sure we only advertise
> + * what is supported. Returns < 0 on error, 0 if the PHY's advertisement
> + * hasn't changed, and > 0 if it has changed. This function is intended
> + * for Clause 37 1000Base-X mode.
> + */
> +static int genphy_c37_config_advert(struct phy_device *phydev)
> +{
> + u16 adv = 0;
> +
> + /* Only allow advertising what this PHY supports */
> + linkmode_and(phydev->advertising, phydev->advertising,
> + phydev->supported);
> +
> + if (linkmode_test_bit(ETHTOOL_LINK_MODE_1000baseX_Full_BIT,
> + phydev->advertising))
> + adv |= ADVERTISE_1000XFULL;
> + if (linkmode_test_bit(ETHTOOL_LINK_MODE_Pause_BIT,
> + phydev->advertising))
> + adv |= ADVERTISE_1000XPAUSE;
> + if (linkmode_test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
> + phydev->advertising))
> + adv |= ADVERTISE_1000XPSE_ASYM;
> +
> + return phy_modify_changed(phydev, MII_ADVERTISE,
> + ADVERTISE_1000XFULL | ADVERTISE_1000XPAUSE |
> + ADVERTISE_1000XHALF | ADVERTISE_1000XPSE_ASYM,
> + adv);
> +}
> +
> /**
> * genphy_config_eee_advert - disable unwanted eee mode advertisement
> * @phydev: target phy_device struct
> @@ -1715,6 +1749,54 @@ int __genphy_config_aneg(struct phy_device *phydev, bool changed)
> }
> EXPORT_SYMBOL(__genphy_config_aneg);
>
> +/**
> + * genphy_c37_config_aneg - restart auto-negotiation or write BMCR
> + * @phydev: target phy_device struct
> + *
> + * Description: If auto-negotiation is enabled, we configure the
> + * advertising, and then restart auto-negotiation. If it is not
> + * enabled, then we write the BMCR. This function is intended
> + * for use with Clause 37 1000Base-X mode.
> + */
> +int genphy_c37_config_aneg(struct phy_device *phydev)
> +{
> + int err, changed;
> +
> + if (phydev->autoneg != AUTONEG_ENABLE)
> + return genphy_setup_forced(phydev);
> +
> + err = phy_modify(phydev, MII_BMCR, BMCR_SPEED1000 | BMCR_SPEED100,
> + BMCR_SPEED1000);
> + if (err)
> + return err;
> +
> + changed = genphy_c37_config_advert(phydev);
> + if (changed < 0) /* error */
> + return changed;
> +
> + if (!changed) {
> + /* Advertisement hasn't changed, but maybe aneg was never on to
> + * begin with? Or maybe phy was isolated?
> + */
> + int ctl = phy_read(phydev, MII_BMCR);
> +
> + if (ctl < 0)
> + return ctl;
> +
> + if (!(ctl & BMCR_ANENABLE) || (ctl & BMCR_ISOLATE))
> + changed = 1; /* do restart aneg */
> + }
> +
> + /* Only restart aneg if we are advertising something different
> + * than we were before.
> + */
> + if (changed > 0)
> + return genphy_restart_aneg(phydev);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(genphy_c37_config_aneg);
> +
> /**
> * genphy_aneg_done - return auto-negotiation status
> * @phydev: target phy_device struct
> @@ -1862,6 +1944,63 @@ int genphy_read_status(struct phy_device *phydev)
> }
> EXPORT_SYMBOL(genphy_read_status);
>
> +/**
> + * genphy_c37_read_status - check the link status and update current link state
> + * @phydev: target phy_device struct
> + *
> + * Description: Check the link, then figure out the current state
> + * by comparing what we advertise with what the link partner
> + * advertises. This function is for Clause 37 1000Base-X mode.
> + */
> +int genphy_c37_read_status(struct phy_device *phydev)
> +{
> + int lpa, err, old_link = phydev->link;
> +
> + /* Update the link, but return if there was an error */
> + err = genphy_update_link(phydev);
> + if (err)
> + return err;
> +
> + /* why bother the PHY if nothing can have changed */
> + if (phydev->autoneg == AUTONEG_ENABLE && old_link && phydev->link)
> + return 0;
> +
> + phydev->duplex = DUPLEX_UNKNOWN;
> + phydev->pause = 0;
> + phydev->asym_pause = 0;
> +
> + if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
> + lpa = phy_read(phydev, MII_LPA);
> + if (lpa < 0)
> + return lpa;
> +
> + linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
> + phydev->lp_advertising, lpa & LPA_LPACK);
> + linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseX_Full_BIT,
> + phydev->lp_advertising, lpa & LPA_1000XFULL);
> + linkmode_mod_bit(ETHTOOL_LINK_MODE_Pause_BIT,
> + phydev->lp_advertising, lpa & LPA_1000XPAUSE);
> + linkmode_mod_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
> + phydev->lp_advertising,
> + lpa & LPA_1000XPAUSE_ASYM);
> +
> + phy_resolve_aneg_linkmode(phydev);
> + } else if (phydev->autoneg == AUTONEG_DISABLE) {
> + int bmcr = phy_read(phydev, MII_BMCR);
> +
> + if (bmcr < 0)
> + return bmcr;
> +
> + if (bmcr & BMCR_FULLDPLX)
> + phydev->duplex = DUPLEX_FULL;
> + else
> + phydev->duplex = DUPLEX_HALF;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(genphy_c37_read_status);
> +
> /**
> * genphy_soft_reset - software reset the PHY via BMCR_RESET bit
> * @phydev: target phy_device struct
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index a7ecbe0e55aa..cd9786ccf630 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -1104,6 +1104,10 @@ int genphy_read_mmd_unsupported(struct phy_device *phdev, int devad,
> int genphy_write_mmd_unsupported(struct phy_device *phdev, int devnum,
> u16 regnum, u16 val);
>
> +/* Clause 37 */
> +int genphy_c37_config_aneg(struct phy_device *phydev);
> +int genphy_c37_read_status(struct phy_device *phydev);
> +
> /* Clause 45 PHY */
> int genphy_c45_restart_aneg(struct phy_device *phydev);
> int genphy_c45_check_and_restart_aneg(struct phy_device *phydev, bool restart);
>
^ permalink raw reply
* Re: general protection fault in cbs_destroy
From: Cong Wang @ 2019-09-09 22:23 UTC (permalink / raw)
To: Hillf Danton
Cc: syzbot, David Miller, Jamal Hadi Salim, Jiri Pirko, LKML,
Linux Kernel Network Developers, syzkaller-bugs,
Vinicius Costa Gomes
In-Reply-To: <20190908072907.15220-1-hdanton@sina.com>
On Sun, Sep 8, 2019 at 12:29 AM Hillf Danton <hdanton@sina.com> wrote:
>
>
> > syzbot found the following crash on Sat, 07 Sep 2019 23:08:08 -0700
> >
> > HEAD commit: 3b47fd5c Merge tag 'nfs-for-5.3-4' of git://git.linux-nfs...
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14854e71600000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=144488c6c6c6d2b6
> > dashboard link: https://syzkaller.appspot.com/bug?extid=3a8d6a998cbb73bcf337
> > compiler: clang version 9.0.0 (/home/glider/llvm/clang
> > 80fee25776c2fb61e74c1ecb1a523375c2500b69)
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17998f9e600000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10421efa600000
> >
> > general protection fault: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 0 PID: 9249 Comm: syz-executor457 Not tainted 5.3.0-rc7+ #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:__list_del_entry_valid+0x6b/0x100 lib/list_debug.c:51
> > Code: 4c 89 f7 e8 97 d0 58 fe 48 ba 00 01 00 00 00 00 ad de 49 8b 1e 48 39
> > d3 74 54 48 83 c2 22 49 39 d7 74 5e 4c 89 f8 48 c1 e8 03 <42> 80 3c 20 00
> > 74 08 4c 89 ff e8 66 d0 58 fe 49 8b 17 4c 39 f2 75
> > RSP: 0018:ffff88809898f568 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
> > RDX: dead000000000122 RSI: 0000000000000004 RDI: ffff88809fb5a7e8
> > RBP: ffff88809898f588 R08: dffffc0000000000 R09: ffffed1013131ea8
> > R10: ffffed1013131ea8 R11: 0000000000000000 R12: dffffc0000000000
> > R13: ffff88809fb5a480 R14: ffff88809fb5a7e0 R15: 0000000000000000
> > FS: 00005555568cb880(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020000610 CR3: 00000000a3968000 CR4: 00000000001406f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > __list_del_entry include/linux/list.h:131 [inline]
> > list_del include/linux/list.h:139 [inline]
> > cbs_destroy+0x85/0x3e0 net/sched/sch_cbs.c:435
> > qdisc_create+0xff8/0x13e0 net/sched/sch_api.c:1302
> > tc_modify_qdisc+0x989/0x1ea0 net/sched/sch_api.c:1652
> > rtnetlink_rcv_msg+0x889/0xd40 net/core/rtnetlink.c:5223
> > netlink_rcv_skb+0x19e/0x3d0 net/netlink/af_netlink.c:2477
> > rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:5241
> > netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
> > netlink_unicast+0x787/0x900 net/netlink/af_netlink.c:1328
> > netlink_sendmsg+0x993/0xc50 net/netlink/af_netlink.c:1917
> > sock_sendmsg_nosec net/socket.c:637 [inline]
> > sock_sendmsg net/socket.c:657 [inline]
> > ___sys_sendmsg+0x60d/0x910 net/socket.c:2311
> > __sys_sendmsg net/socket.c:2356 [inline]
> > __do_sys_sendmsg net/socket.c:2365 [inline]
> > __se_sys_sendmsg net/socket.c:2363 [inline]
> > __x64_sys_sendmsg+0x17c/0x200 net/socket.c:2363
> > do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:296
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> --- a/net/sched/sch_cbs.c
> +++ b/net/sched/sch_cbs.c
> @@ -401,6 +401,7 @@ static int cbs_init(struct Qdisc *sch, s
> if (!q->qdisc)
> return -ENOMEM;
>
> + INIT_LIST_HEAD(&q->cbs_list);
> qdisc_hash_add(q->qdisc, false);
>
> q->queue = sch->dev_queue - netdev_get_tx_queue(dev, 0);
There is already a patch for the same bug:
http://patchwork.ozlabs.org/patch/1154697/
Vinicius, can you send V2 as David requested?
Thanks.
^ permalink raw reply
* Re: [RFC bpf-next 2/7] bpf: extend bpf_pcap support to tracing programs
From: Alan Maguire @ 2019-09-09 22:25 UTC (permalink / raw)
To: Yonghong Song
Cc: Alan Maguire, ast@kernel.org, daniel@iogearbox.net, Martin Lau,
Song Liu, davem@davemloft.net, jakub.kicinski@netronome.com,
hawk@kernel.org, john.fastabend@gmail.com, rostedt@goodmis.org,
mingo@redhat.com, quentin.monnet@netronome.com, Andrey Ignatov,
joe@wand.net.nz, acme@redhat.com, jolsa@kernel.org,
alexey.budankov@linux.intel.com, gregkh@linuxfoundation.org,
namhyung@kernel.org, sdf@google.com, f.fainelli@gmail.com,
shuah@kernel.org, peter@lekensteyn.nl, ivan@cloudflare.com,
Andrii Nakryiko, bhole_prashant_q7@lab.ntt.co.jp,
david.calavera@gmail.com, danieltimlee@gmail.com,
Takshak Chahande, netdev@vger.kernel.org, bpf@vger.kernel.org,
linux-kselftest@vger.kernel.org, toke, jbenc, acme
In-Reply-To: <89305ec8-7e03-3cd0-4e39-c3760dd3477b@fb.com>
On Sun, 8 Sep 2019, Yonghong Song wrote:
> For net side bpf_perf_event_output, we have
> static unsigned long bpf_skb_copy(void *dst_buff, const void *skb,
> unsigned long off, unsigned long len)
> {
> void *ptr = skb_header_pointer(skb, off, len, dst_buff);
>
> if (unlikely(!ptr))
> return len;
> if (ptr != dst_buff)
> memcpy(dst_buff, ptr, len);
>
> return 0;
> }
>
> BPF_CALL_5(bpf_skb_event_output, struct sk_buff *, skb, struct bpf_map
> *, map,
> u64, flags, void *, meta, u64, meta_size)
> {
> u64 skb_size = (flags & BPF_F_CTXLEN_MASK) >> 32;
>
> if (unlikely(flags & ~(BPF_F_CTXLEN_MASK | BPF_F_INDEX_MASK)))
> return -EINVAL;
> if (unlikely(skb_size > skb->len))
> return -EFAULT;
>
> return bpf_event_output(map, flags, meta, meta_size, skb, skb_size,
> bpf_skb_copy);
> }
>
> It does not really consider output all the frags.
> I understand that to get truly all packet data, frags should be
> considered, but seems we did not do it before? I am wondering
> whether we need to do here.
Thanks for the feedback! In experimenting with packet capture,
my original hope was to keep things simple and avoid fragment parsing
if possible. However if scatter-gather is enabled for the networking
device, or indeed if it's running in a VM it turns out a lot of the
interesting packet data ends up in the fragments on transmit (ssh
headers, http headers etc). So I think it would be worth considering
adding support for fragment traversal. It's not needed as much
in the skb program case - we can always pullup the skb - but in
the tracing situation we probably wouldn't want to do something
that invasive in tracing context.
Fragment traversal might be worth breaking out as a separate patchset,
perhaps triggered by a specific flag to bpf_skb_event_output?
Feedback from folks at Linux Plumbers (I hope I'm summarizing correctly)
seemed to agree with what you mentioned WRT the first patch in this
series. The gist was we probably don't want to force the metadata to be a
specific packet capture type; we'd rather use the existing perf event
mechanisms and if we are indeed doing packet capture, simply specify that
data in the program as metadata.
I'd be happy with that approach myself if I could capture skb
fragments in tracing programs - being able to do that would give
equivalent functionality to what I proposed but without having a packet
capture-specific helper.
>
> If we indeed do not need to handle frags here, I think maybe
> bpf_probe_read() in existing bpf kprobe function should be
> enough, we do not need this helper?
>
Certainly for many use cases, that will get you most of what you need -
particularly if you're just looking at L2 to L4 data. For full packet
capture however I think we may need to think about fragment traversal.
> > +
> > +/* Derive protocol for some of the easier cases. For tracing, a probe point
> > + * may be dealing with packets in various states. Common cases are IP
> > + * packets prior to adding MAC header (_PCAP_TYPE_IP) and a full packet
> > + * (_PCAP_TYPE_ETH). For other cases the caller must specify the
> > + * protocol they expect. Other heuristics for packet identification
> > + * should be added here as needed, since determining the packet type
> > + * ensures we do not capture packets that fail to match the desired
> > + * pcap type in BPF_F_PCAP_STRICT_TYPE mode.
> > + */
> > +static inline int bpf_skb_protocol_get(struct sk_buff *skb)
> > +{
> > + switch (htons(skb->protocol)) {
> > + case ETH_P_IP:
> > + case ETH_P_IPV6:
> > + if (skb_network_header(skb) == skb->data)
> > + return BPF_PCAP_TYPE_IP;
> > + else
> > + return BPF_PCAP_TYPE_ETH;
> > + default:
> > + return BPF_PCAP_TYPE_UNSET;
> > + }
> > +}
> > +
> > +BPF_CALL_5(bpf_trace_pcap, void *, data, u32, size, struct bpf_map *, map,
> > + int, protocol_wanted, u64, flags)
>
> Up to now, for helpers, verifier has a way to verifier it is used
> properly regarding to the context. For example, for xdp version
> perf_event_output, the help prototype,
> BPF_CALL_5(bpf_xdp_event_output, struct xdp_buff *, xdp, struct
> bpf_map *, map,
> u64, flags, void *, meta, u64, meta_size)
> the verifier is able to guarantee that the first parameter
> has correct type xdp_buff, not something from type cast.
> .arg1_type = ARG_PTR_TO_CTX,
>
> This helper, in the below we have
> .arg1_type = ARG_ANYTHING,
>
> So it is not really enforced. Bringing BTF can help, but type
> name matching typically bad.
>
>
One thing we were discussing - and I think this is similar to what
you're suggesting - is to investigate if there might be a way to
leverage BTF to provide additional guarantees that the tracing
data we are handling is indeed an skb. Specifically if we
trace a kprobe function argument or a tracepoint function, and
if we had that guarantee, we could perhaps invoke the skb-style
perf event output function (trace both the skb data and the metadata).
The challenge would be how to do that type-based matching; we'd
need the function argument information from BTF _and_ need to
somehow associate it at probe attach time.
Thanks again for looking at the code!
Alan
^ permalink raw reply
* Re: WARNING in cbs_dequeue_soft
From: Cong Wang @ 2019-09-09 22:28 UTC (permalink / raw)
To: syzbot
Cc: David Miller, Jamal Hadi Salim, Jiri Pirko, Leandro Dorileo, LKML,
Linux Kernel Network Developers, syzkaller-bugs, vedang.patel
In-Reply-To: <000000000000e3a76c0592047ea7@google.com>
#syz fix: net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate
^ permalink raw reply
* [PATCH] bpf: validate bpf_func when BPF_JIT is enabled
From: Sami Tolvanen @ 2019-09-09 22:32 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann
Cc: Kees Cook, Martin KaFai Lau, Song Liu, Yonghong Song, netdev, bpf,
linux-kernel, Sami Tolvanen
With CONFIG_BPF_JIT, the kernel makes indirect calls to dynamically
generated code. This change adds basic sanity checking to ensure
we are jumping to a valid location, which narrows down the attack
surface on the stored pointer. This also prepares the code for future
Control-Flow Integrity (CFI) checking, which adds indirect call
validation to call targets that can be determined at compile-time, but
cannot validate calls to jited functions.
In addition, this change adds a weak arch_bpf_jit_check_func function,
which architectures that implement BPF JIT can override to perform
additional validation, such as verifying that the pointer points to
the correct memory region.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
---
include/linux/filter.h | 26 ++++++++++++++++++++++++--
kernel/bpf/core.c | 25 +++++++++++++++++++++++++
2 files changed, 49 insertions(+), 2 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 92c6e31fb008..abfb0e1b21a8 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -511,7 +511,10 @@ struct sock_fprog_kern {
struct sock_filter *filter;
};
+#define BPF_BINARY_HEADER_MAGIC 0x05de0e82
+
struct bpf_binary_header {
+ u32 magic;
u32 pages;
/* Some arches need word alignment for their instructions */
u8 image[] __aligned(4);
@@ -553,20 +556,39 @@ struct sk_filter {
DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
+#ifdef CONFIG_BPF_JIT
+/*
+ * With JIT, the kernel makes an indirect call to dynamically generated
+ * code. Use bpf_call_func to perform additional validation of the call
+ * target to narrow down attack surface. Architectures implementing BPF
+ * JIT can override arch_bpf_jit_check_func for arch-specific checking.
+ */
+extern unsigned int bpf_call_func(const struct bpf_prog *prog,
+ const void *ctx);
+
+extern bool arch_bpf_jit_check_func(const struct bpf_prog *prog);
+#else
+static inline unsigned int bpf_call_func(const struct bpf_prog *prog,
+ const void *ctx)
+{
+ return prog->bpf_func(ctx, prog->insnsi);
+}
+#endif
+
#define BPF_PROG_RUN(prog, ctx) ({ \
u32 ret; \
cant_sleep(); \
if (static_branch_unlikely(&bpf_stats_enabled_key)) { \
struct bpf_prog_stats *stats; \
u64 start = sched_clock(); \
- ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \
+ ret = bpf_call_func(prog, ctx); \
stats = this_cpu_ptr(prog->aux->stats); \
u64_stats_update_begin(&stats->syncp); \
stats->cnt++; \
stats->nsecs += sched_clock() - start; \
u64_stats_update_end(&stats->syncp); \
} else { \
- ret = (*(prog)->bpf_func)(ctx, (prog)->insnsi); \
+ ret = bpf_call_func(prog, ctx); \
} \
ret; })
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 66088a9e9b9e..7aad58f67105 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -792,6 +792,30 @@ void __weak bpf_jit_free_exec(void *addr)
module_memfree(addr);
}
+#ifdef CONFIG_BPF_JIT
+bool __weak arch_bpf_jit_check_func(const struct bpf_prog *prog)
+{
+ return true;
+}
+
+unsigned int bpf_call_func(const struct bpf_prog *prog, const void *ctx)
+{
+ const struct bpf_binary_header *hdr = bpf_jit_binary_hdr(prog);
+
+ if (!IS_ENABLED(CONFIG_BPF_JIT_ALWAYS_ON) && !prog->jited)
+ return prog->bpf_func(ctx, prog->insnsi);
+
+ if (unlikely(hdr->magic != BPF_BINARY_HEADER_MAGIC ||
+ !arch_bpf_jit_check_func(prog))) {
+ WARN(1, "attempt to jump to an invalid address");
+ return 0;
+ }
+
+ return prog->bpf_func(ctx, prog->insnsi);
+}
+EXPORT_SYMBOL_GPL(bpf_call_func);
+#endif
+
struct bpf_binary_header *
bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
unsigned int alignment,
@@ -818,6 +842,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
/* Fill space with illegal/arch-dep instructions. */
bpf_fill_ill_insns(hdr, size);
+ hdr->magic = BPF_BINARY_HEADER_MAGIC;
hdr->pages = pages;
hole = min_t(unsigned int, size - (proglen + sizeof(*hdr)),
PAGE_SIZE - sizeof(*hdr));
--
2.23.0.162.g0b9fbb3734-goog
^ permalink raw reply related
* [RFC PATCH net-next 0/2] more IPv4 unicast extensions
From: Dave Taht @ 2019-09-09 22:37 UTC (permalink / raw)
To: netdev; +Cc: Dave Taht
Adding support in linux for the 240/4 and 0/8 ipv4 address ranges were
easy, no brainer removals of obsolete support for obsolete
specifications. Has anyone noticed yet?
The following two patches are intended as discussion points fot
my https://linuxplumbersconf.org/event/4/contributions/457/
talk today.
Dave Taht (2):
Allow 225/8-231/8 as unicast
Reduce localhost to a /16
include/linux/in.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--
2.17.1
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox