* Re: [PATCH] ixgbe: Fix secpath usage for IPsec TX offload.
From: David Miller @ 2019-09-12 11:43 UTC (permalink / raw)
To: steffen.klassert
Cc: jeffrey.t.kirsher, intel-wired-lan, michael, snelson, netdev
In-Reply-To: <20190912110144.GS2879@gauss3.secunet.de>
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 12 Sep 2019 13:01:44 +0200
> The ixgbe driver currently does IPsec TX offloading
> based on an existing secpath. However, the secpath
> can also come from the RX side, in this case it is
> misinterpreted for TX offload and the packets are
> dropped with a "bad sa_idx" error. Fix this by using
> the xfrm_offload() function to test for TX offload.
>
> Fixes: 592594704761 ("ixgbe: process the Tx ipsec offload")
> Reported-by: Michael Marley <michael@michaelmarley.com>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
I'll apply this directly and queue it up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH] cxgb4: Fix spelling typos
From: David Miller @ 2019-09-12 11:51 UTC (permalink / raw)
To: arkadiusz; +Cc: vishal, netdev, linux-kernel
In-Reply-To: <20190910204901.11741-1-arkadiusz@drabczyk.org>
From: Arkadiusz Drabczyk <arkadiusz@drabczyk.org>
Date: Tue, 10 Sep 2019 22:49:01 +0200
> Fix several spelling typos in comments in t4_hw.c.
>
> Signed-off-by: Arkadiusz Drabczyk <arkadiusz@drabczyk.org>
Applied to net-next.
^ permalink raw reply
* Re: [PATCH net v2 01/11] net: core: limit nested device depth
From: Taehee Yoo @ 2019-09-12 11:54 UTC (permalink / raw)
To: David Miller
Cc: Netdev, j.vosburgh, vfalico, Andy Gospodarek,
Jiří Pírko, sd, Roopa Prabhu, saeedm, manishc,
rahulv, kys, haiyangz, sthemmin, sashal, hare, varun, ubraun,
kgraul, Jay Vosburgh
In-Reply-To: <20190912.133717.257813019167130934.davem@davemloft.net>
On Thu, 12 Sep 2019 at 20:37, David Miller <davem@davemloft.net> wrote:
>
> From: Taehee Yoo <ap420073@gmail.com>
> Date: Thu, 12 Sep 2019 19:14:37 +0900
>
> > On Thu, 12 Sep 2019 at 18:38, David Miller <davem@davemloft.net> wrote:
> >>
> >> From: Taehee Yoo <ap420073@gmail.com>
> >> Date: Thu, 12 Sep 2019 12:56:19 +0900
> >>
> >> > I tested with this reproducer commands without lockdep.
> >> >
> >> > ip link add dummy0 type dummy
> >> > ip link add link dummy0 name vlan1 type vlan id 1
> >> > ip link set vlan1 up
> >> >
> >> > for i in {2..200}
> >> > do
> >> > let A=$i-1
> >> >
> >> > ip link add name vlan$i link vlan$A type vlan id $i
> >> > done
> >> > ip link del vlan1 <-- this command is added.
> >>
> >> Is there any other device type which allows arbitrary nesting depth
> >> in this manner other than VLAN? Perhaps it is the VLAN nesting
> >> depth that we should limit instead of all of this extra code.
> >
> > Below device types have the same problem.
> > VLAN, BONDING, TEAM, VXLAN, MACVLAN, and MACSEC.
> > All the below test commands reproduce a panic.
>
> I think then we need to move the traversals over to a iterative
> rather than recursive algorithm.
I agree with that.
So I will check it out then send a v3 patchset.
Thank you!
^ permalink raw reply
* Re: [PATCH] sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
From: David Miller @ 2019-09-12 11:56 UTC (permalink / raw)
To: christophe.jaillet
Cc: vyasevich, nhorman, marcelo.leitner, linux-sctp, netdev,
linux-kernel, kernel-janitors
In-Reply-To: <20190911160239.10734-1-christophe.jaillet@wanadoo.fr>
From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date: Wed, 11 Sep 2019 18:02:39 +0200
> The '.exit' functions from 'pernet_operations' structure should be marked
> as __net_exit, not __net_init.
>
> Fixes: 8e2d61e0aed2 ("sctp: fix race on protocol/netns initialization")
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Applied and queued up for -stable.
^ permalink raw reply
* Re: [PATCH v2 net-next 3/7] net: dsa: sja1105: Switch to hardware operations for PTP
From: Vladimir Oltean @ 2019-09-12 11:58 UTC (permalink / raw)
To: David Miller; +Cc: f.fainelli, vivien.didelot, andrew, richardcochran, netdev
In-Reply-To: <20190912.133753.1473374980190418320.davem@davemloft.net>
On 12/09/2019, David Miller <davem@davemloft.net> wrote:
> From: Vladimir Oltean <olteanv@gmail.com>
> Date: Thu, 12 Sep 2019 11:17:11 +0100
>
>> Hi Dave,
>>
>> On 12/09/2019, David Miller <davem@davemloft.net> wrote:
>>> From: Vladimir Oltean <olteanv@gmail.com>
>>> Date: Tue, 10 Sep 2019 04:34:57 +0300
>>>
>>>> static int sja1105_ptp_adjfine(struct ptp_clock_info *ptp, long
>>>> scaled_ppm)
>>>> {
>>>> struct sja1105_private *priv = ptp_to_sja1105(ptp);
>>>> + const struct sja1105_regs *regs = priv->info->regs;
>>>> s64 clkrate;
>>>> + int rc;
>>> ..
>>>> -static int sja1105_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
>>>> -{
>>>> - struct sja1105_private *priv = ptp_to_sja1105(ptp);
>>>> + rc = sja1105_spi_send_int(priv, SPI_WRITE, regs->ptpclkrate,
>>>> + &clkrate, 4);
>>>
>>> You're sending an arbitrary 4 bytes of a 64-bit value. This works on
>>> little
>>> endian
>>> but will not on big endian.
>>>
>>> Please properly copy this clkrate into a "u32" variable and pass that
>>> into
>>> sja1105_spi_send_int().
>>>
>>> It also seems to suggest that you want to use abs() to perform that
>>> weird
>>> centering around 1 << 31 calculation.
>>>
>>> Thank you.
>>>
>>
>> It looks 'wrong' but it isn't. The driver uses the 'packing' framework
>> (lib/packing.c) which is endian-agnostic (converts between CPU and
>> peripheral endianness) and operates on u64 as the CPU word size. On
>> the contrary, u32 would not work with the 'packing' API in its current
>> form, but I don't see yet any reasons to extend it (packing64,
>> packing32 etc).
>
> That's extremely unintuitive and makes auditing patches next to impossible.
>
Through static analysis maybe you're right - I don't yet grasp what it
takes to prove an API is used correctly at build time, but I can look
at how others are doing it.
At runtime, there is sanity checking throughout and all the bugs I've
had while calling packing() incorrectly I caught them right away.
spi_send_int in particular is just a wrapper for packing an N byte
sized word (which fits in u64) in bits [8*N-1, 0] of the buffer, as
per peripheral memory ordering quirks. This is perhaps the trivial
case that can be handled through other APIs as well, but there are
times when I need to pack an u64 into a bit field that crosses even
64-bit boundaries. Combine that with weird byte ordering, and the
sja1105 driver would have simply not existed if it had to open-code
all of that. The API's high-level goal is for readers to be able to
follow along smoothly with the register manual.
All I'm saying is that I'm willing to make the packing API more
sane/checkable, but at the moment I just don't see the better
alternative.
Thanks,
-Vladimir
^ permalink raw reply
* Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames
From: Jiri Pirko @ 2019-09-12 11:59 UTC (permalink / raw)
To: Roopa Prabhu
Cc: Michal Kubecek, netdev, David Miller, Jakub Kicinski, David Ahern,
Stephen Hemminger, dcbw, Andrew Lunn, parav, Saeed Mahameed,
mlxsw
In-Reply-To: <20190830170342.GR2312@nanopsycho>
Fri, Aug 30, 2019 at 07:03:42PM CEST, jiri@resnulli.us wrote:
>Fri, Aug 30, 2019 at 04:35:23PM CEST, roopa@cumulusnetworks.com wrote:
[...]
>>
>>so to summarize, i think we have discussed the following options to
>>update a netlink list attribute so far:
>>(a) encode an optional attribute/flag in the list attribute in
>>RTM_SETLINK to indicate if it is a add or del
>>(b) Use a flag in RTM_SETLINK and RTM_DELINK to indicate add/del
>>(close to bridge vlan add/del)
>
>Nope, bridge vlan add/del is done according to the cmd, not any flag.
>
>
>>(c) introduce a separate generic msg type to add/del to a list
>>attribute (IIUC this does need a separate msg type per subsystem or
>>netlink API)
Getting back to this, sorry.
Thinking about it for some time, a,b,c have all their issues. Why can't
we have another separate cmd as I originally proposed in this RFC? Does
anyone have any argument against it? Could you please describe?
Because otherwise, I don't feel comfortable going to any of a,b,c :(
Thanks!
^ permalink raw reply
* Re: [PATCH net v2 01/11] net: core: limit nested device depth
From: Taehee Yoo @ 2019-09-12 12:49 UTC (permalink / raw)
To: David Miller
Cc: Netdev, j.vosburgh, vfalico, Andy Gospodarek,
Jiří Pírko, sd, Roopa Prabhu, saeedm, manishc,
rahulv, kys, haiyangz, sthemmin, sashal, hare, varun, ubraun,
kgraul, Jay Vosburgh
In-Reply-To: <20190912.133717.257813019167130934.davem@davemloft.net>
On Thu, 12 Sep 2019 at 20:37, David Miller <davem@davemloft.net> wrote:
>
> From: Taehee Yoo <ap420073@gmail.com>
> Date: Thu, 12 Sep 2019 19:14:37 +0900
>
> > On Thu, 12 Sep 2019 at 18:38, David Miller <davem@davemloft.net> wrote:
> >>
> >> From: Taehee Yoo <ap420073@gmail.com>
> >> Date: Thu, 12 Sep 2019 12:56:19 +0900
> >>
> >> > I tested with this reproducer commands without lockdep.
> >> >
> >> > ip link add dummy0 type dummy
> >> > ip link add link dummy0 name vlan1 type vlan id 1
> >> > ip link set vlan1 up
> >> >
> >> > for i in {2..200}
> >> > do
> >> > let A=$i-1
> >> >
> >> > ip link add name vlan$i link vlan$A type vlan id $i
> >> > done
> >> > ip link del vlan1 <-- this command is added.
> >>
> >> Is there any other device type which allows arbitrary nesting depth
> >> in this manner other than VLAN? Perhaps it is the VLAN nesting
> >> depth that we should limit instead of all of this extra code.
> >
> > Below device types have the same problem.
> > VLAN, BONDING, TEAM, VXLAN, MACVLAN, and MACSEC.
> > All the below test commands reproduce a panic.
>
> I think then we need to move the traversals over to a iterative
> rather than recursive algorithm.
Just to clarify, I have a question.
There are two recursive routines in the code.
a) netdev_walk_all_{lower/upper}_dev() that are used to traversal
all of their lower or upper devices.
b) VLAN, BONDING, TEAM, VXLAN, MACVLAN, and MACSEC
modules internally handle their lower/upper devices recursively
when an event is received such as unregistering.
what is the routine that you mentioned?
^ permalink raw reply
* [PATCH net-next 0/2] mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer
From: Ido Schimmel @ 2019-09-12 13:07 UTC (permalink / raw)
To: netdev; +Cc: davem, jiri, shalomt, mlxsw, Ido Schimmel
From: Ido Schimmel <idosch@mellanox.com>
Shalom says:
While debugging packet loss towards the CPU, it is useful to be able to
query the CPU port's shared buffer quotas and occupancy.
Patch #1 registers the CPU port with devlink.
Patch #2 adds the ability to query the CPU port's shared buffer quotas and
occupancy.
Shalom Toledo (2):
mlxsw: spectrum: Register CPU port with devlink
mlxsw: spectrum_buffers: Add the ability to query the CPU port's
shared buffer
drivers/net/ethernet/mellanox/mlxsw/core.c | 33 ++++++++++++
drivers/net/ethernet/mellanox/mlxsw/core.h | 5 ++
.../net/ethernet/mellanox/mlxsw/spectrum.c | 47 +++++++++++++++++
.../mellanox/mlxsw/spectrum_buffers.c | 51 ++++++++++++++++---
4 files changed, 128 insertions(+), 8 deletions(-)
--
2.21.0
^ permalink raw reply
* [PATCH net-next 1/2] mlxsw: spectrum: Register CPU port with devlink
From: Ido Schimmel @ 2019-09-12 13:07 UTC (permalink / raw)
To: netdev; +Cc: davem, jiri, shalomt, mlxsw, Ido Schimmel
In-Reply-To: <20190912130740.22061-1-idosch@idosch.org>
From: Shalom Toledo <shalomt@mellanox.com>
Register CPU port with devlink.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
drivers/net/ethernet/mellanox/mlxsw/core.c | 33 +++++++++++++
drivers/net/ethernet/mellanox/mlxsw/core.h | 5 ++
.../net/ethernet/mellanox/mlxsw/spectrum.c | 47 +++++++++++++++++++
3 files changed, 85 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 963a2b4b61b1..94f83d2be17e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1892,6 +1892,39 @@ void mlxsw_core_port_fini(struct mlxsw_core *mlxsw_core, u8 local_port)
}
EXPORT_SYMBOL(mlxsw_core_port_fini);
+int mlxsw_core_cpu_port_init(struct mlxsw_core *mlxsw_core,
+ void *port_driver_priv,
+ const unsigned char *switch_id,
+ unsigned char switch_id_len)
+{
+ struct devlink *devlink = priv_to_devlink(mlxsw_core);
+ struct mlxsw_core_port *mlxsw_core_port =
+ &mlxsw_core->ports[MLXSW_PORT_CPU_PORT];
+ struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
+ int err;
+
+ mlxsw_core_port->local_port = MLXSW_PORT_CPU_PORT;
+ mlxsw_core_port->port_driver_priv = port_driver_priv;
+ devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_CPU,
+ 0, false, 0, switch_id, switch_id_len);
+ err = devlink_port_register(devlink, devlink_port, MLXSW_PORT_CPU_PORT);
+ if (err)
+ memset(mlxsw_core_port, 0, sizeof(*mlxsw_core_port));
+ return err;
+}
+EXPORT_SYMBOL(mlxsw_core_cpu_port_init);
+
+void mlxsw_core_cpu_port_fini(struct mlxsw_core *mlxsw_core)
+{
+ struct mlxsw_core_port *mlxsw_core_port =
+ &mlxsw_core->ports[MLXSW_PORT_CPU_PORT];
+ struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
+
+ devlink_port_unregister(devlink_port);
+ memset(mlxsw_core_port, 0, sizeof(*mlxsw_core_port));
+}
+EXPORT_SYMBOL(mlxsw_core_cpu_port_fini);
+
void mlxsw_core_port_eth_set(struct mlxsw_core *mlxsw_core, u8 local_port,
void *port_driver_priv, struct net_device *dev)
{
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index b65a17d49e43..5d7d2ab6d155 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -177,6 +177,11 @@ int mlxsw_core_port_init(struct mlxsw_core *mlxsw_core, u8 local_port,
const unsigned char *switch_id,
unsigned char switch_id_len);
void mlxsw_core_port_fini(struct mlxsw_core *mlxsw_core, u8 local_port);
+int mlxsw_core_cpu_port_init(struct mlxsw_core *mlxsw_core,
+ void *port_driver_priv,
+ const unsigned char *switch_id,
+ unsigned char switch_id_len);
+void mlxsw_core_cpu_port_fini(struct mlxsw_core *mlxsw_core);
void mlxsw_core_port_eth_set(struct mlxsw_core *mlxsw_core, u8 local_port,
void *port_driver_priv, struct net_device *dev);
void mlxsw_core_port_ib_set(struct mlxsw_core *mlxsw_core, u8 local_port,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 91e4792bb7e7..1fc73a9ad84d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -3872,6 +3872,46 @@ static void mlxsw_sp_port_remove(struct mlxsw_sp *mlxsw_sp, u8 local_port)
mlxsw_core_port_fini(mlxsw_sp->core, local_port);
}
+static int mlxsw_sp_cpu_port_create(struct mlxsw_sp *mlxsw_sp)
+{
+ struct mlxsw_sp_port *mlxsw_sp_port;
+ int err;
+
+ mlxsw_sp_port = kzalloc(sizeof(*mlxsw_sp_port), GFP_KERNEL);
+ if (!mlxsw_sp_port)
+ return -ENOMEM;
+
+ mlxsw_sp_port->mlxsw_sp = mlxsw_sp;
+ mlxsw_sp_port->local_port = MLXSW_PORT_CPU_PORT;
+
+ mlxsw_sp->ports[MLXSW_PORT_CPU_PORT] = mlxsw_sp_port;
+
+ err = mlxsw_core_cpu_port_init(mlxsw_sp->core,
+ mlxsw_sp_port,
+ mlxsw_sp->base_mac,
+ sizeof(mlxsw_sp->base_mac));
+ if (err) {
+ dev_err(mlxsw_sp->bus_info->dev, "Failed to initialize core CPU port\n");
+ goto err_core_cpu_port_init;
+ }
+
+ return err;
+
+err_core_cpu_port_init:
+ mlxsw_sp->ports[MLXSW_PORT_CPU_PORT] = NULL;
+ kfree(mlxsw_sp_port);
+ return err;
+}
+
+static void mlxsw_sp_cpu_port_remove(struct mlxsw_sp *mlxsw_sp)
+{
+ struct mlxsw_sp_port *mlxsw_sp_port = mlxsw_sp->ports[0];
+
+ mlxsw_core_cpu_port_fini(mlxsw_sp->core);
+ mlxsw_sp->ports[MLXSW_PORT_CPU_PORT] = NULL;
+ kfree(mlxsw_sp_port);
+}
+
static bool mlxsw_sp_port_created(struct mlxsw_sp *mlxsw_sp, u8 local_port)
{
return mlxsw_sp->ports[local_port] != NULL;
@@ -3884,6 +3924,7 @@ static void mlxsw_sp_ports_remove(struct mlxsw_sp *mlxsw_sp)
for (i = 1; i < mlxsw_core_max_ports(mlxsw_sp->core); i++)
if (mlxsw_sp_port_created(mlxsw_sp, i))
mlxsw_sp_port_remove(mlxsw_sp, i);
+ mlxsw_sp_cpu_port_remove(mlxsw_sp);
kfree(mlxsw_sp->port_to_module);
kfree(mlxsw_sp->ports);
}
@@ -3908,6 +3949,10 @@ static int mlxsw_sp_ports_create(struct mlxsw_sp *mlxsw_sp)
goto err_port_to_module_alloc;
}
+ err = mlxsw_sp_cpu_port_create(mlxsw_sp);
+ if (err)
+ goto err_cpu_port_create;
+
for (i = 1; i < max_ports; i++) {
/* Mark as invalid */
mlxsw_sp->port_to_module[i] = -1;
@@ -3931,6 +3976,8 @@ static int mlxsw_sp_ports_create(struct mlxsw_sp *mlxsw_sp)
for (i--; i >= 1; i--)
if (mlxsw_sp_port_created(mlxsw_sp, i))
mlxsw_sp_port_remove(mlxsw_sp, i);
+ mlxsw_sp_cpu_port_remove(mlxsw_sp);
+err_cpu_port_create:
kfree(mlxsw_sp->port_to_module);
err_port_to_module_alloc:
kfree(mlxsw_sp->ports);
--
2.21.0
^ permalink raw reply related
* [PATCH net-next 2/2] mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer
From: Ido Schimmel @ 2019-09-12 13:07 UTC (permalink / raw)
To: netdev; +Cc: davem, jiri, shalomt, mlxsw, Ido Schimmel
In-Reply-To: <20190912130740.22061-1-idosch@idosch.org>
From: Shalom Toledo <shalomt@mellanox.com>
While debugging packet loss towards the CPU, it is useful to be able to
query the CPU port's shared buffer quotas and occupancy.
Since the CPU port has no ingress buffers, all the shared buffers ingress
information will be cleared.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
.../mellanox/mlxsw/spectrum_buffers.c | 51 ++++++++++++++++---
1 file changed, 43 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
index 888ba4300bcc..b9eeae37a4dc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
@@ -250,6 +250,10 @@ static int mlxsw_sp_sb_pm_occ_clear(struct mlxsw_sp *mlxsw_sp, u8 local_port,
&mlxsw_sp->sb_vals->pool_dess[pool_index];
char sbpm_pl[MLXSW_REG_SBPM_LEN];
+ if (local_port == MLXSW_PORT_CPU_PORT &&
+ des->dir == MLXSW_REG_SBXX_DIR_INGRESS)
+ return 0;
+
mlxsw_reg_sbpm_pack(sbpm_pl, local_port, des->pool, des->dir,
true, 0, 0);
return mlxsw_reg_trans_query(mlxsw_sp->core, MLXSW_REG(sbpm), sbpm_pl,
@@ -273,6 +277,10 @@ static int mlxsw_sp_sb_pm_occ_query(struct mlxsw_sp *mlxsw_sp, u8 local_port,
char sbpm_pl[MLXSW_REG_SBPM_LEN];
struct mlxsw_sp_sb_pm *pm;
+ if (local_port == MLXSW_PORT_CPU_PORT &&
+ des->dir == MLXSW_REG_SBXX_DIR_INGRESS)
+ return 0;
+
pm = mlxsw_sp_sb_pm_get(mlxsw_sp, local_port, pool_index);
mlxsw_reg_sbpm_pack(sbpm_pl, local_port, des->pool, des->dir,
false, 0, 0);
@@ -1085,6 +1093,11 @@ int mlxsw_sp_sb_port_pool_set(struct mlxsw_core_port *mlxsw_core_port,
u32 max_buff;
int err;
+ if (local_port == MLXSW_PORT_CPU_PORT) {
+ NL_SET_ERR_MSG_MOD(extack, "Changing CPU port's threshold is forbidden");
+ return -EINVAL;
+ }
+
err = mlxsw_sp_sb_threshold_in(mlxsw_sp, pool_index,
threshold, &max_buff, extack);
if (err)
@@ -1130,6 +1143,11 @@ int mlxsw_sp_sb_tc_pool_bind_set(struct mlxsw_core_port *mlxsw_core_port,
u32 max_buff;
int err;
+ if (local_port == MLXSW_PORT_CPU_PORT) {
+ NL_SET_ERR_MSG_MOD(extack, "Changing CPU port's binding is forbidden");
+ return -EINVAL;
+ }
+
if (dir != mlxsw_sp->sb_vals->pool_dess[pool_index].dir) {
NL_SET_ERR_MSG_MOD(extack, "Binding egress TC to ingress pool and vice versa is forbidden");
return -EINVAL;
@@ -1187,6 +1205,11 @@ static void mlxsw_sp_sb_sr_occ_query_cb(struct mlxsw_core *mlxsw_core,
local_port < mlxsw_core_max_ports(mlxsw_core); local_port++) {
if (!mlxsw_sp->ports[local_port])
continue;
+ if (local_port == MLXSW_PORT_CPU_PORT) {
+ /* Ingress quotas are not supported for the CPU port */
+ masked_count++;
+ continue;
+ }
for (i = 0; i < MLXSW_SP_SB_ING_TC_COUNT; i++) {
cm = mlxsw_sp_sb_cm_get(mlxsw_sp, local_port, i,
MLXSW_REG_SBXX_DIR_INGRESS);
@@ -1222,7 +1245,7 @@ int mlxsw_sp_sb_occ_snapshot(struct mlxsw_core *mlxsw_core,
char *sbsr_pl;
u8 masked_count;
u8 local_port_1;
- u8 local_port = 0;
+ u8 local_port;
int i;
int err;
int err2;
@@ -1231,8 +1254,8 @@ int mlxsw_sp_sb_occ_snapshot(struct mlxsw_core *mlxsw_core,
if (!sbsr_pl)
return -ENOMEM;
+ local_port = MLXSW_PORT_CPU_PORT;
next_batch:
- local_port++;
local_port_1 = local_port;
masked_count = 0;
mlxsw_reg_sbsr_pack(sbsr_pl, false);
@@ -1243,7 +1266,11 @@ int mlxsw_sp_sb_occ_snapshot(struct mlxsw_core *mlxsw_core,
for (; local_port < mlxsw_core_max_ports(mlxsw_core); local_port++) {
if (!mlxsw_sp->ports[local_port])
continue;
- mlxsw_reg_sbsr_ingress_port_mask_set(sbsr_pl, local_port, 1);
+ if (local_port != MLXSW_PORT_CPU_PORT) {
+ /* Ingress quotas are not supported for the CPU port */
+ mlxsw_reg_sbsr_ingress_port_mask_set(sbsr_pl,
+ local_port, 1);
+ }
mlxsw_reg_sbsr_egress_port_mask_set(sbsr_pl, local_port, 1);
for (i = 0; i < mlxsw_sp->sb_vals->pool_count; i++) {
err = mlxsw_sp_sb_pm_occ_query(mlxsw_sp, local_port, i,
@@ -1264,8 +1291,10 @@ int mlxsw_sp_sb_occ_snapshot(struct mlxsw_core *mlxsw_core,
cb_priv);
if (err)
goto out;
- if (local_port < mlxsw_core_max_ports(mlxsw_core))
+ if (local_port < mlxsw_core_max_ports(mlxsw_core)) {
+ local_port++;
goto next_batch;
+ }
out:
err2 = mlxsw_reg_trans_bulk_wait(&bulk_list);
@@ -1282,7 +1311,7 @@ int mlxsw_sp_sb_occ_max_clear(struct mlxsw_core *mlxsw_core,
LIST_HEAD(bulk_list);
char *sbsr_pl;
unsigned int masked_count;
- u8 local_port = 0;
+ u8 local_port;
int i;
int err;
int err2;
@@ -1291,8 +1320,8 @@ int mlxsw_sp_sb_occ_max_clear(struct mlxsw_core *mlxsw_core,
if (!sbsr_pl)
return -ENOMEM;
+ local_port = MLXSW_PORT_CPU_PORT;
next_batch:
- local_port++;
masked_count = 0;
mlxsw_reg_sbsr_pack(sbsr_pl, true);
for (i = 0; i < MLXSW_SP_SB_ING_TC_COUNT; i++)
@@ -1302,7 +1331,11 @@ int mlxsw_sp_sb_occ_max_clear(struct mlxsw_core *mlxsw_core,
for (; local_port < mlxsw_core_max_ports(mlxsw_core); local_port++) {
if (!mlxsw_sp->ports[local_port])
continue;
- mlxsw_reg_sbsr_ingress_port_mask_set(sbsr_pl, local_port, 1);
+ if (local_port != MLXSW_PORT_CPU_PORT) {
+ /* Ingress quotas are not supported for the CPU port */
+ mlxsw_reg_sbsr_ingress_port_mask_set(sbsr_pl,
+ local_port, 1);
+ }
mlxsw_reg_sbsr_egress_port_mask_set(sbsr_pl, local_port, 1);
for (i = 0; i < mlxsw_sp->sb_vals->pool_count; i++) {
err = mlxsw_sp_sb_pm_occ_clear(mlxsw_sp, local_port, i,
@@ -1319,8 +1352,10 @@ int mlxsw_sp_sb_occ_max_clear(struct mlxsw_core *mlxsw_core,
&bulk_list, NULL, 0);
if (err)
goto out;
- if (local_port < mlxsw_core_max_ports(mlxsw_core))
+ if (local_port < mlxsw_core_max_ports(mlxsw_core)) {
+ local_port++;
goto next_batch;
+ }
out:
err2 = mlxsw_reg_trans_bulk_wait(&bulk_list);
--
2.21.0
^ permalink raw reply related
* [PATCH net] net: dsa: Fix load order between DSA drivers and taggers
From: Andrew Lunn @ 2019-09-12 13:16 UTC (permalink / raw)
To: David Miller; +Cc: Florian Fainelli, Vivien Didelot, netdev, Andrew Lunn
The DSA core, DSA taggers and DSA drivers all make use of
module_init(). Hence they get initialised at device_initcall() time.
The ordering is non-deterministic. It can be a DSA driver is bound to
a device before the needed tag driver has been initialised, resulting
in the message:
No tagger for this switch
Rather than have this be fatal, return -EPROBE_DEFER so that it is
tried again later once all the needed drivers have been loaded.
Fixes: d3b8c04988ca ("dsa: Add boilerplate helper to register DSA tag driver modules")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
I did wonder if we should play with the core and tag drivers and make
them use subsystem_initcall(), but EPROBE_DEFER seems to be the more
preferred solution nowadays.
---
net/dsa/dsa2.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 3abd173ebacb..96f787cf9b6e 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -623,6 +623,8 @@ static int dsa_port_parse_cpu(struct dsa_port *dp, struct net_device *master)
tag_protocol = ds->ops->get_tag_protocol(ds, dp->index);
tag_ops = dsa_tag_driver_get(tag_protocol);
if (IS_ERR(tag_ops)) {
+ if (PTR_ERR(tag_ops) == -ENOPROTOOPT)
+ return -EPROBE_DEFER;
dev_warn(ds->dev, "No tagger for this switch\n");
return PTR_ERR(tag_ops);
}
--
2.23.0
^ permalink raw reply related
* [PATCH v4 1/2] ethtool: implement Energy Detect Powerdown support via phy-tunable
From: Alexandru Ardelean @ 2019-09-12 16:28 UTC (permalink / raw)
To: netdev, devicetree, linux-kernel
Cc: davem, robh+dt, mark.rutland, f.fainelli, hkallweit1, andrew,
mkubecek, Alexandru Ardelean
In-Reply-To: <20190912162812.402-1-alexandru.ardelean@analog.com>
The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
this feature is common across other PHYs (like EEE), and defining
`ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.
The way EDPD works, is that the RX block is put to a lower power mode,
except for link-pulse detection circuits. The TX block is also put to low
power mode, but the PHY wakes-up periodically to send link pulses, to avoid
lock-ups in case the other side is also in EDPD mode.
Currently, there are 2 PHY drivers that look like they could use this new
PHY tunable feature: the `adin` && `micrel` PHYs.
The ADIN's datasheet mentions that TX pulses are at intervals of 1 second
default each, and they can be disabled. For the Micrel KSZ9031 PHY, the
datasheet does not mention whether they can be disabled, but mentions that
they can modified.
The way this change is structured, is similar to the PHY tunable downshift
control:
* a `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` value is exposed to cover a default
TX interval; some PHYs could specify a certain value that makes sense
* `ETHTOOL_PHY_EDPD_NO_TX` would disable TX when EDPD is enabled
* `ETHTOOL_PHY_EDPD_DISABLE` will disable EDPD
As noted by the `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` the interval unit is 1
millisecond, which should cover a reasonable range of intervals:
- from 1 millisecond, which does not sound like much of a power-saver
- to ~65 seconds which is quite a lot to wait for a link to come up when
plugging a cable
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
---
include/uapi/linux/ethtool.h | 22 ++++++++++++++++++++++
net/core/ethtool.c | 6 ++++++
2 files changed, 28 insertions(+)
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index dd06302aa93e..8938b76c4ee3 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -259,10 +259,32 @@ struct ethtool_tunable {
#define ETHTOOL_PHY_FAST_LINK_DOWN_ON 0
#define ETHTOOL_PHY_FAST_LINK_DOWN_OFF 0xff
+/* Energy Detect Power Down (EDPD) is a feature supported by some PHYs, where
+ * the PHY's RX & TX blocks are put into a low-power mode when there is no
+ * link detected (typically cable is un-plugged). For RX, only a minimal
+ * link-detection is available, and for TX the PHY wakes up to send link pulses
+ * to avoid any lock-ups in case the peer PHY may also be running in EDPD mode.
+ *
+ * Some PHYs may support configuration of the wake-up interval for TX pulses,
+ * and some PHYs may support only disabling TX pulses entirely. For the latter
+ * a special value is required (ETHTOOL_PHY_EDPD_NO_TX) so that this can be
+ * configured from userspace (should the user want it).
+ *
+ * The interval units for TX wake-up are in milliseconds, since this should
+ * cover a reasonable range of intervals:
+ * - from 1 millisecond, which does not sound like much of a power-saver
+ * - to ~65 seconds which is quite a lot to wait for a link to come up when
+ * plugging a cable
+ */
+#define ETHTOOL_PHY_EDPD_DFLT_TX_MSECS 0xffff
+#define ETHTOOL_PHY_EDPD_NO_TX 0xfffe
+#define ETHTOOL_PHY_EDPD_DISABLE 0
+
enum phy_tunable_id {
ETHTOOL_PHY_ID_UNSPEC,
ETHTOOL_PHY_DOWNSHIFT,
ETHTOOL_PHY_FAST_LINK_DOWN,
+ ETHTOOL_PHY_EDPD,
/*
* Add your fresh new phy tunable attribute above and remember to update
* phy_tunable_strings[] in net/core/ethtool.c
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 6288e69e94fc..c763106c73fc 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -133,6 +133,7 @@ phy_tunable_strings[__ETHTOOL_PHY_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
[ETHTOOL_ID_UNSPEC] = "Unspec",
[ETHTOOL_PHY_DOWNSHIFT] = "phy-downshift",
[ETHTOOL_PHY_FAST_LINK_DOWN] = "phy-fast-link-down",
+ [ETHTOOL_PHY_EDPD] = "phy-energy-detect-power-down",
};
static int ethtool_get_features(struct net_device *dev, void __user *useraddr)
@@ -2451,6 +2452,11 @@ static int ethtool_phy_tunable_valid(const struct ethtool_tunable *tuna)
tuna->type_id != ETHTOOL_TUNABLE_U8)
return -EINVAL;
break;
+ case ETHTOOL_PHY_EDPD:
+ if (tuna->len != sizeof(u16) ||
+ tuna->type_id != ETHTOOL_TUNABLE_U16)
+ return -EINVAL;
+ break;
default:
return -EINVAL;
}
--
2.20.1
^ permalink raw reply related
* [PATCH v4 2/2] net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable
From: Alexandru Ardelean @ 2019-09-12 16:28 UTC (permalink / raw)
To: netdev, devicetree, linux-kernel
Cc: davem, robh+dt, mark.rutland, f.fainelli, hkallweit1, andrew,
mkubecek, Alexandru Ardelean
In-Reply-To: <20190912162812.402-1-alexandru.ardelean@analog.com>
This driver becomes the first user of the kernel's `ETHTOOL_PHY_EDPD`
phy-tunable feature.
EDPD is also enabled by default on PHY config_init, but can be disabled via
the phy-tunable control.
When enabling EDPD, it's also a good idea (for the ADIN PHYs) to enable TX
periodic pulses, so that in case the other PHY is also on EDPD mode, there
is no lock-up situation where both sides are waiting for the other to
transmit.
Via the phy-tunable control, TX pulses can be disabled if specifying 0
`tx-interval` via ethtool.
The ADIN PHY supports only fixed 1 second intervals; they cannot be
configured. That is why the acceptable values are 1,
ETHTOOL_PHY_EDPD_DFLT_TX_MSECS and ETHTOOL_PHY_EDPD_NO_TX (which disables
TX pulses).
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
---
drivers/net/phy/adin.c | 61 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
diff --git a/drivers/net/phy/adin.c b/drivers/net/phy/adin.c
index 4dec83df048d..9afeee67675b 100644
--- a/drivers/net/phy/adin.c
+++ b/drivers/net/phy/adin.c
@@ -26,6 +26,11 @@
#define ADIN1300_RX_ERR_CNT 0x0014
+#define ADIN1300_PHY_CTRL_STATUS2 0x0015
+#define ADIN1300_NRG_PD_EN BIT(3)
+#define ADIN1300_NRG_PD_TX_EN BIT(2)
+#define ADIN1300_NRG_PD_STATUS BIT(1)
+
#define ADIN1300_PHY_CTRL2 0x0016
#define ADIN1300_DOWNSPEED_AN_100_EN BIT(11)
#define ADIN1300_DOWNSPEED_AN_10_EN BIT(10)
@@ -328,12 +333,62 @@ static int adin_set_downshift(struct phy_device *phydev, u8 cnt)
ADIN1300_DOWNSPEEDS_EN);
}
+static int adin_get_edpd(struct phy_device *phydev, u16 *tx_interval)
+{
+ int val;
+
+ val = phy_read(phydev, ADIN1300_PHY_CTRL_STATUS2);
+ if (val < 0)
+ return val;
+
+ if (ADIN1300_NRG_PD_EN & val) {
+ if (val & ADIN1300_NRG_PD_TX_EN)
+ /* default is 1 second */
+ *tx_interval = ETHTOOL_PHY_EDPD_DFLT_TX_MSECS;
+ else
+ *tx_interval = ETHTOOL_PHY_EDPD_NO_TX;
+ } else {
+ *tx_interval = ETHTOOL_PHY_EDPD_DISABLE;
+ }
+
+ return 0;
+}
+
+static int adin_set_edpd(struct phy_device *phydev, u16 tx_interval)
+{
+ u16 val;
+
+ if (tx_interval == ETHTOOL_PHY_EDPD_DISABLE)
+ return phy_clear_bits(phydev, ADIN1300_PHY_CTRL_STATUS2,
+ (ADIN1300_NRG_PD_EN | ADIN1300_NRG_PD_TX_EN));
+
+ val = ADIN1300_NRG_PD_EN;
+
+ switch (tx_interval) {
+ case 1000: /* 1 second */
+ /* fallthrough */
+ case ETHTOOL_PHY_EDPD_DFLT_TX_MSECS:
+ val |= ADIN1300_NRG_PD_TX_EN;
+ /* fallthrough */
+ case ETHTOOL_PHY_EDPD_NO_TX:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return phy_modify(phydev, ADIN1300_PHY_CTRL_STATUS2,
+ (ADIN1300_NRG_PD_EN | ADIN1300_NRG_PD_TX_EN),
+ val);
+}
+
static int adin_get_tunable(struct phy_device *phydev,
struct ethtool_tunable *tuna, void *data)
{
switch (tuna->id) {
case ETHTOOL_PHY_DOWNSHIFT:
return adin_get_downshift(phydev, data);
+ case ETHTOOL_PHY_EDPD:
+ return adin_get_edpd(phydev, data);
default:
return -EOPNOTSUPP;
}
@@ -345,6 +400,8 @@ static int adin_set_tunable(struct phy_device *phydev,
switch (tuna->id) {
case ETHTOOL_PHY_DOWNSHIFT:
return adin_set_downshift(phydev, *(const u8 *)data);
+ case ETHTOOL_PHY_EDPD:
+ return adin_set_edpd(phydev, *(const u16 *)data);
default:
return -EOPNOTSUPP;
}
@@ -368,6 +425,10 @@ static int adin_config_init(struct phy_device *phydev)
if (rc < 0)
return rc;
+ rc = adin_set_edpd(phydev, 1);
+ if (rc < 0)
+ return rc;
+
phydev_dbg(phydev, "PHY is using mode '%s'\n",
phy_modes(phydev->interface));
--
2.20.1
^ permalink raw reply related
* [PATCH v4 0/2] ethtool: implement Energy Detect Powerdown support via phy-tunable
From: Alexandru Ardelean @ 2019-09-12 16:28 UTC (permalink / raw)
To: netdev, devicetree, linux-kernel
Cc: davem, robh+dt, mark.rutland, f.fainelli, hkallweit1, andrew,
mkubecek, Alexandru Ardelean
This changeset proposes a new control for PHY tunable to control Energy
Detect Power Down.
The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
this feature is common across other PHYs (like EEE), and defining
`ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.
The way EDPD works, is that the RX block is put to a lower power mode,
except for link-pulse detection circuits. The TX block is also put to low
power mode, but the PHY wakes-up periodically to send link pulses, to avoid
lock-ups in case the other side is also in EDPD mode.
Currently, there are 2 PHY drivers that look like they could use this new
PHY tunable feature: the `adin` && `micrel` PHYs.
This series updates only the `adin` PHY driver to support this new feature,
as this chip has been tested. A change for `micrel` can be proposed after a
discussion of the PHY-tunable API is resolved.
Alexandru Ardelean (2):
ethtool: implement Energy Detect Powerdown support via phy-tunable
net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable
drivers/net/phy/adin.c | 61 ++++++++++++++++++++++++++++++++++++
include/uapi/linux/ethtool.h | 22 +++++++++++++
net/core/ethtool.c | 6 ++++
3 files changed, 89 insertions(+)
--
Changelog v3 -> v4:
* impose the TX interval unit for EDPD to be milliseconds; the point was
raised by Michal; this should allow for intervals:
- as small as 1 millisecond, which does not sound like a power-saver
- as large as 65 seconds, which sounds like a lot to wait for a link to
come up
Changelog v2 -> v3:
* implement Andrew's review comments:
1. for patch `ethtool: implement Energy Detect Powerdown support via
phy-tunable`
- ETHTOOL_PHY_EDPD_DFLT_TX_INTERVAL == 0xffff
- ETHTOOL_PHY_EDPD_NO_TX == 0xfffe
- added comment in include/uapi/linux/ethtool.h
2. for patch `net: phy: adin: implement Energy Detect Powerdown mode via
phy-tunable`
- added comments about interval & units for the ADIN PHY
- in `adin_set_edpd()`: add a switch statement of all the valid values
- in `adin_get_edpd()`: return `ETHTOOL_PHY_EDPD_DFLT_TX_INTERVAL`
since the PHY only supports a single TX-interval value (1 second)
Changelog v1 -> v2:
* initial series was made up of 2 sub-series: 1 for kernel & 1 for ethtool
in userspace; v2 contains only the kernel series
2.20.1
^ permalink raw reply
* Re: memory leak in ppp_write
From: Takeshi Misawa @ 2019-09-12 13:34 UTC (permalink / raw)
To: syzbot
Cc: ast, bpf, daniel, davem, kafai, linux-kernel, linux-ppp, netdev,
paulus, songliubraving, syzkaller-bugs, yhs
In-Reply-To: <000000000000594c700591433550@google.com>
[-- Attachment #1: Type: text/plain, Size: 57 bytes --]
#syz test: https://github.com/google/kasan.git 6525771f
[-- Attachment #2: 0001-ppp-Fix-memory-leak-in-ppp_write.patch --]
[-- Type: text/x-diff, Size: 2454 bytes --]
From e4ff0d04a4b8dd6da3dfb9135235ae5360ce86e6 Mon Sep 17 00:00:00 2001
From: Takeshi Misawa <jeliantsurux@gmail.com>
Date: Wed, 11 Sep 2019 22:18:43 +0900
Subject: [PATCH] ppp: Fix memory leak in ppp_write
When ppp is closing, __ppp_xmit_process() failed to enqueue skb
and skb allocated in ppp_write() is leaked.
syzbot reported :
BUG: memory leak
unreferenced object 0xffff88812a17bc00 (size 224):
comm "syz-executor673", pid 6952, jiffies 4294942888 (age 13.040s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000d110fff9>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
[<00000000d110fff9>] slab_post_alloc_hook mm/slab.h:522 [inline]
[<00000000d110fff9>] slab_alloc_node mm/slab.c:3262 [inline]
[<00000000d110fff9>] kmem_cache_alloc_node+0x163/0x2f0 mm/slab.c:3574
[<000000002d616113>] __alloc_skb+0x6e/0x210 net/core/skbuff.c:197
[<000000000167fc45>] alloc_skb include/linux/skbuff.h:1055 [inline]
[<000000000167fc45>] ppp_write+0x48/0x120 drivers/net/ppp/ppp_generic.c:502
[<000000009ab42c0b>] __vfs_write+0x43/0xa0 fs/read_write.c:494
[<00000000086b2e22>] vfs_write fs/read_write.c:558 [inline]
[<00000000086b2e22>] vfs_write+0xee/0x210 fs/read_write.c:542
[<00000000a2b70ef9>] ksys_write+0x7c/0x130 fs/read_write.c:611
[<00000000ce5e0fdd>] __do_sys_write fs/read_write.c:623 [inline]
[<00000000ce5e0fdd>] __se_sys_write fs/read_write.c:620 [inline]
[<00000000ce5e0fdd>] __x64_sys_write+0x1e/0x30 fs/read_write.c:620
[<00000000d9d7b370>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:296
[<0000000006e6d506>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by freeing skb, if ppp is closing.
Reported-by: syzbot+d9c8bf24e56416d7ce2c@syzkaller.appspotmail.com
Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com>
---
drivers/net/ppp/ppp_generic.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index a30e41a56085..9a1b006904a7 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -1415,6 +1415,8 @@ static void __ppp_xmit_process(struct ppp *ppp, struct sk_buff *skb)
netif_wake_queue(ppp->dev);
else
netif_stop_queue(ppp->dev);
+ } else {
+ kfree_skb(skb);
}
ppp_xmit_unlock(ppp);
}
--
2.17.1
^ permalink raw reply related
* Re: [PATCH 04/11] net: phylink: switch to using fwnode_gpiod_get_index()
From: Andy Shevchenko @ 2019-09-12 13:44 UTC (permalink / raw)
To: Linus Walleij
Cc: Dmitry Torokhov, Russell King - ARM Linux admin, Mika Westerberg,
linux-kernel@vger.kernel.org, open list:GPIO SUBSYSTEM,
Andrew Lunn, David S. Miller, Florian Fainelli, Heiner Kallweit,
netdev
In-Reply-To: <CACRpkdbTErKxFBr__tj391FHwUTxC7ZF_m94tC8-VHzaynBsnw@mail.gmail.com>
On Thu, Sep 12, 2019 at 10:41:43AM +0100, Linus Walleij wrote:
> On Wed, Sep 11, 2019 at 10:51 AM Dmitry Torokhov
> <dmitry.torokhov@gmail.com> wrote:
>
> > If we are willing to sacrifice the custom label for the GPIO that
> > fwnode_gpiod_get_index() allows us to set, then there are several
> > drivers that could actually use gpiod_get() API.
>
> We have:
> gpiod_set_consumer_name(gpiod, "name");
> to deal with that so no sacrifice is needed.
Thank for this hint!
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* RE: [PATCH][PATCH net-next] hv_netvsc: Add the support of hibernation
From: Haiyang Zhang @ 2019-09-12 13:50 UTC (permalink / raw)
To: Dexuan Cui, KY Srinivasan, Stephen Hemminger, sashal@kernel.org,
davem@davemloft.net, linux-hyperv@vger.kernel.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Michael Kelley
In-Reply-To: <1568245063-69693-1-git-send-email-decui@microsoft.com>
> -----Original Message-----
> From: Dexuan Cui <decui@microsoft.com>
> Sent: Wednesday, September 11, 2019 7:38 PM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; sashal@kernel.org; davem@davemloft.net;
> linux-hyperv@vger.kernel.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Michael Kelley <mikelley@microsoft.com>
> Cc: Dexuan Cui <decui@microsoft.com>
> Subject: [PATCH][PATCH net-next] hv_netvsc: Add the support of hibernation
>
> The existing netvsc_detach() and netvsc_attach() APIs make it easy to
> implement the suspend/resume callbacks.
>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>
> This patch is basically a pure Hyper-V specific change and it has a
> build dependency on the commit 271b2224d42f ("Drivers: hv: vmbus:
> Implement
> suspend/resume for VSC drivers for hibernation"), which is on Sasha Levin's
> Hyper-V tree's hyperv-next branch:
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.k
> ernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Fhyperv%2Flinux.git%2
> Flog%2F%3Fh%3Dhyperv-
> next&data=02%7C01%7Chaiyangz%40microsoft.com%7C0b84e40c446
> 648cc35fb08d737110e28%7C72f988bf86f141af91ab2d7cd011db47%7C1%7
> C0%7C637038418703112019&sdata=qd7DGFCJZ%2BDTix0VGcCe1JucV
> O97E0gILpVpcxlA6EE%3D&reserved=0
>
> I request this patch should go through Sasha's tree rather than the
> net-next tree.
>
> drivers/net/hyperv/hyperv_net.h | 3 +++
> drivers/net/hyperv/netvsc_drv.c | 59
> +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 62 insertions(+)
>
> diff --git a/drivers/net/hyperv/hyperv_net.h
> b/drivers/net/hyperv/hyperv_net.h
> index ecc9af0..b8763ee 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -952,6 +952,9 @@ struct net_device_context {
> u32 vf_alloc;
> /* Serial number of the VF to team with */
> u32 vf_serial;
> +
> + /* Used to temporarily save the config info across hibernation */
> + struct netvsc_device_info *saved_netvsc_dev_info;
> };
>
> /* Per channel data */
> diff --git a/drivers/net/hyperv/netvsc_drv.c
> b/drivers/net/hyperv/netvsc_drv.c
> index afdcc56..f920959 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -2392,6 +2392,63 @@ static int netvsc_remove(struct hv_device *dev)
> return 0;
> }
>
> +static int netvsc_suspend(struct hv_device *dev)
> +{
> + struct net_device_context *ndev_ctx;
> + struct net_device *vf_netdev, *net;
> + struct netvsc_device *nvdev;
> + int ret;
> +
> + net = hv_get_drvdata(dev);
> +
> + ndev_ctx = netdev_priv(net);
> + cancel_delayed_work_sync(&ndev_ctx->dwork);
> +
> + rtnl_lock();
> +
> + nvdev = rtnl_dereference(ndev_ctx->nvdev);
> + if (nvdev == NULL) {
> + ret = -ENODEV;
> + goto out;
> + }
> +
> + cancel_work_sync(&nvdev->subchan_work);
This looks redundant because netvsc_detach() cancels subchan_work.
Thanks,
- Haiyang
^ permalink raw reply
* Re: [PATCH 04/11] net: phylink: switch to using fwnode_gpiod_get_index()
From: Russell King - ARM Linux admin @ 2019-09-12 13:52 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Linus Walleij, Dmitry Torokhov, Mika Westerberg,
linux-kernel@vger.kernel.org, open list:GPIO SUBSYSTEM,
Andrew Lunn, David S. Miller, Florian Fainelli, Heiner Kallweit,
netdev
In-Reply-To: <20190912134429.GZ2680@smile.fi.intel.com>
On Thu, Sep 12, 2019 at 04:44:29PM +0300, Andy Shevchenko wrote:
> On Thu, Sep 12, 2019 at 10:41:43AM +0100, Linus Walleij wrote:
> > On Wed, Sep 11, 2019 at 10:51 AM Dmitry Torokhov
> > <dmitry.torokhov@gmail.com> wrote:
> >
> > > If we are willing to sacrifice the custom label for the GPIO that
> > > fwnode_gpiod_get_index() allows us to set, then there are several
> > > drivers that could actually use gpiod_get() API.
> >
> > We have:
> > gpiod_set_consumer_name(gpiod, "name");
> > to deal with that so no sacrifice is needed.
>
> Thank for this hint!
Would it be possible to improve your email etiquette, and move this
discussion to a more appropriate subject line, so I don't have to keep
checking these emails, in case you _do_ talk about something relevent
to the original patch that the subject line refers to?
Thanks.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
^ permalink raw reply
* Re: [PATCH v2 net 1/3] sctp: change return type of sctp_get_port_local
From: Marcelo Ricardo Leitner @ 2019-09-12 14:51 UTC (permalink / raw)
To: Mao Wenan
Cc: vyasevich, nhorman, davem, linux-sctp, netdev, linux-kernel,
kernel-janitors
In-Reply-To: <20190912040219.67517-2-maowenan@huawei.com>
On Thu, Sep 12, 2019 at 12:02:17PM +0800, Mao Wenan wrote:
> Currently sctp_get_port_local() returns a long
> which is either 0,1 or a pointer casted to long.
> It's neither of the callers use the return value since
> commit 62208f12451f ("net: sctp: simplify sctp_get_port").
> Now two callers are sctp_get_port and sctp_do_bind,
> they actually assumend a casted to an int was the same as
> a pointer casted to a long, and they don't save the return
> value just check whether it is zero or non-zero, so
> it would better change return type from long to int for
> sctp_get_port_local.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
This Fixes tag is not needed. It's just a cleanup.
Maybe Dave can remove it for us.
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Thanks.
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
> net/sctp/socket.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 9d1f83b10c0a..5e1934c48709 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -309,7 +309,7 @@ static int sctp_bind(struct sock *sk, struct sockaddr *addr, int addr_len)
> return retval;
> }
>
> -static long sctp_get_port_local(struct sock *, union sctp_addr *);
> +static int sctp_get_port_local(struct sock *, union sctp_addr *);
>
> /* Verify this is a valid sockaddr. */
> static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
> @@ -7998,7 +7998,7 @@ static void sctp_unhash(struct sock *sk)
> static struct sctp_bind_bucket *sctp_bucket_create(
> struct sctp_bind_hashbucket *head, struct net *, unsigned short snum);
>
> -static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
> +static int sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
> {
> struct sctp_sock *sp = sctp_sk(sk);
> bool reuse = (sk->sk_reuse || sp->reuse);
> @@ -8108,7 +8108,7 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
>
> if (sctp_bind_addr_conflict(&ep2->base.bind_addr,
> addr, sp2, sp)) {
> - ret = (long)sk2;
> + ret = 1;
> goto fail_unlock;
> }
> }
> @@ -8180,7 +8180,7 @@ static int sctp_get_port(struct sock *sk, unsigned short snum)
> addr.v4.sin_port = htons(snum);
>
> /* Note: sk->sk_num gets filled in if ephemeral port request. */
> - return !!sctp_get_port_local(sk, &addr);
> + return sctp_get_port_local(sk, &addr);
> }
>
> /*
> --
> 2.20.1
>
^ permalink raw reply
* Re: [PATCH v2 net 2/3] sctp: remove redundant assignment when call sctp_get_port_local
From: Marcelo Ricardo Leitner @ 2019-09-12 14:52 UTC (permalink / raw)
To: Mao Wenan
Cc: vyasevich, nhorman, davem, linux-sctp, netdev, linux-kernel,
kernel-janitors
In-Reply-To: <20190912040219.67517-3-maowenan@huawei.com>
On Thu, Sep 12, 2019 at 12:02:18PM +0800, Mao Wenan wrote:
> There are more parentheses in if clause when call sctp_get_port_local
> in sctp_do_bind, and redundant assignment to 'ret'. This patch is to
> do cleanup.
>
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
^ permalink raw reply
* Re: [PATCH v2 net 3/3] sctp: destroy bucket if failed to bind addr
From: Marcelo Ricardo Leitner @ 2019-09-12 14:52 UTC (permalink / raw)
To: Mao Wenan
Cc: vyasevich, nhorman, davem, linux-sctp, netdev, linux-kernel,
kernel-janitors, Hulk Robot
In-Reply-To: <20190912040219.67517-4-maowenan@huawei.com>
On Thu, Sep 12, 2019 at 12:02:19PM +0800, Mao Wenan wrote:
> There is one memory leak bug report:
> BUG: memory leak
> unreferenced object 0xffff8881dc4c5ec0 (size 40):
> comm "syz-executor.0", pid 5673, jiffies 4298198457 (age 27.578s)
> hex dump (first 32 bytes):
> 02 00 00 00 81 88 ff ff 00 00 00 00 00 00 00 00 ................
> f8 63 3d c1 81 88 ff ff 00 00 00 00 00 00 00 00 .c=.............
> backtrace:
> [<0000000072006339>] sctp_get_port_local+0x2a1/0xa00 [sctp]
> [<00000000c7b379ec>] sctp_do_bind+0x176/0x2c0 [sctp]
> [<000000005be274a2>] sctp_bind+0x5a/0x80 [sctp]
> [<00000000b66b4044>] inet6_bind+0x59/0xd0 [ipv6]
> [<00000000c68c7f42>] __sys_bind+0x120/0x1f0 net/socket.c:1647
> [<000000004513635b>] __do_sys_bind net/socket.c:1658 [inline]
> [<000000004513635b>] __se_sys_bind net/socket.c:1656 [inline]
> [<000000004513635b>] __x64_sys_bind+0x3e/0x50 net/socket.c:1656
> [<0000000061f2501e>] do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
> [<0000000003d1e05e>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> This is because in sctp_do_bind, if sctp_get_port_local is to
> create hash bucket successfully, and sctp_add_bind_addr failed
> to bind address, e.g return -ENOMEM, so memory leak found, it
> needs to destroy allocated bucket.
>
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
^ permalink raw reply
* [bisected] UDP / xfrm: NAT-T packets with bad UDP checksum get dropped
From: Thomas Jarosch @ 2019-09-12 15:13 UTC (permalink / raw)
To: netdev; +Cc: Tom Herbert, Steffen Klassert
Hello together,
after updating many machines already from 3.14 to 4.19.67,
one site showed a non-working IPSec VPN connection with 4.19.x.
This IPSec connection is using UDP NAT traversal on port 4500.
The tunnel gets established fine, but no data flows.
Output of "ip xfrm state" looked sane.
The TRACE iptables firewall target showed the UDP packets
on port 4500 get accepted as expected. Still they didn't appear
in "ip xfrm monitor" and vanish somewhere in the kernel,
no error counter in /proc/net/xfrm_state increased at all.
After a few hours of bisecting with a test VM,
this commit was identified to cause the packet drop:
*******************
commit 0a80966b1043c3e2dc684140f155a3fded308660
Author: Tom Herbert <therbert@google.com>
Date: Wed May 7 16:52:39 2014 -0700
net: Verify UDP checksum before handoff to encap
Moving validation of UDP checksum to be done in UDP not encap layer.
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index f2d05d7be743..54ea0a3a48f1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1495,6 +1495,10 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
if (skb->len > sizeof(struct udphdr) && encap_rcv != NULL) {
int ret;
+ /* Verify checksum before giving to encap */
+ if (udp_lib_checksum_complete(skb))
+ goto csum_error;
+
ret = encap_rcv(sk, skb);
if (ret <= 0) {
UDP_INC_STATS_BH(sock_net(sk),
..
*******************
This commit is part of kernel 3.16. Reverting the commit
brings back the VPN connection using kernel 4.19.67.
I didn't spot the checksum error initially since wireshark and tcpdump
need to be explicitly told to verify checksums and there's a warning
about checksum offloading messing with it.
Further tracing showed the UDP packets leave the source
site with a zero UDP checksum and arrive, after passing
an unknown home router and a few other routers,
with a bogus UDP checksum on the destination side.
I've disabled rx and tx checksum offloading on the target test VM
and also on a router in between, but the packet dumps just
contain a fixed value as UDP checksum (f.e. 0x91c4).
RFC 3948 specifies how ESP packets should be encapsulated
using UDP for NAT traveral:
https://tools.ietf.org/html/rfc3948
*******************
2.1. UDP-Encapsulated ESP Header Format
the IPv4 UDP Checksum SHOULD be transmitted as a zero value, and
receivers MUST NOT depend on the UDP checksum being a zero value.
3.5. Tunnel Mode ESP Decapsulation
1. The UDP header is removed from the packet.
*******************
I'm wondering how the RFC should be interpreted here
regarding the UDP checksumming?
As far as I can tell it doesn't mention the UDP checksum
should be verified before decapsulation, the ESP packets
will provide proper data authentication anyway.
Cheers,
Thomas
^ permalink raw reply related
* Re: memory leak in ppp_write
From: syzbot @ 2019-09-12 15:17 UTC (permalink / raw)
To: ast, bpf, daniel, davem, jeliantsurux, kafai, linux-kernel,
linux-ppp, netdev, paulus, songliubraving, syzkaller-bugs, yhs
In-Reply-To: <CAKK_rchVQCYmjPSxk9MszV9BtF8y04-j2dpjV0Jg3c+nrRNEWQ@mail.gmail.com>
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger
crash:
Reported-and-tested-by:
syzbot+d9c8bf24e56416d7ce2c@syzkaller.appspotmail.com
Tested on:
commit: 6525771f Merge tag 'arc-5.3-rc7' of git://git.kernel.org/p..
git tree: https://github.com/google/kasan.git
kernel config: https://syzkaller.appspot.com/x/.config?x=e6131eafb9408877
dashboard link: https://syzkaller.appspot.com/bug?extid=d9c8bf24e56416d7ce2c
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=12610ef1600000
Note: testing is done by a robot and is best-effort only.
^ permalink raw reply
* [PATCH bpf-next] libbpf: add xsk_umem__adjust_offset
From: Kevin Laatz @ 2019-09-12 7:28 UTC (permalink / raw)
To: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jonathan.lemon
Cc: bruce.richardson, ciara.loftus, bpf, Kevin Laatz
Currently, xsk_umem_adjust_offset exists as a kernel internal function.
This patch adds xsk_umem__adjust_offset to libbpf so that it can be used
from userspace. This will take the responsibility of properly storing the
offset away from the application, making it less error prone.
Since xsk_umem__adjust_offset is called on a per-packet basis, we need to
inline the function to avoid any performance regressions. In order to
inline xsk_umem__adjust_offset, we need to add it to xsk.h. Unfortunately
this means that we can't dereference the xsk_umem_config struct directly
since it is defined only in xsk.c. We therefore add an extra API to return
the flags field to the user from the structure, and have the inline
function use this flags field directly.
Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
tools/lib/bpf/libbpf.map | 1 +
tools/lib/bpf/xsk.c | 5 +++++
tools/lib/bpf/xsk.h | 14 ++++++++++++++
3 files changed, 20 insertions(+)
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index d04c7cb623ed..760350c9b81c 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -189,4 +189,5 @@ LIBBPF_0.0.4 {
LIBBPF_0.0.5 {
global:
bpf_btf_get_next_id;
+ xsk_umem__get_flags;
} LIBBPF_0.0.4;
diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 842c4fd55859..a4250a721ea6 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -84,6 +84,11 @@ int xsk_socket__fd(const struct xsk_socket *xsk)
return xsk ? xsk->fd : -EINVAL;
}
+__u32 xsk_umem__get_flags(struct xsk_umem *umem)
+{
+ return umem->config.flags;
+}
+
static bool xsk_page_aligned(void *buffer)
{
unsigned long addr = (unsigned long)buffer;
diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
index 584f6820a639..bf782facb274 100644
--- a/tools/lib/bpf/xsk.h
+++ b/tools/lib/bpf/xsk.h
@@ -183,8 +183,22 @@ static inline __u64 xsk_umem__add_offset_to_addr(__u64 addr)
return xsk_umem__extract_addr(addr) + xsk_umem__extract_offset(addr);
}
+/* Handle the offset appropriately depending on aligned or unaligned mode.
+ * For unaligned mode, we store the offset in the upper 16-bits of the address.
+ * For aligned mode, we simply add the offset to the address.
+ */
+static inline __u64 xsk_umem__adjust_offset(__u32 umem_flags, __u64 addr,
+ __u64 offset)
+{
+ if (umem_flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
+ return addr + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
+ else
+ return addr + offset;
+}
+
LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem);
LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
+LIBBPF_API __u32 xsk_umem__get_flags(struct xsk_umem *umem);
#define XSK_RING_CONS__DEFAULT_NUM_DESCS 2048
#define XSK_RING_PROD__DEFAULT_NUM_DESCS 2048
--
2.17.1
^ permalink raw reply related
* Re: [PATCH] ixgbe: Fix secpath usage for IPsec TX offload.
From: Jeff Kirsher @ 2019-09-12 15:57 UTC (permalink / raw)
To: David Miller, steffen.klassert; +Cc: intel-wired-lan, michael, snelson, netdev
In-Reply-To: <20190912.134359.345289288863944180.davem@davemloft.net>
[-- Attachment #1: Type: text/plain, Size: 782 bytes --]
On Thu, 2019-09-12 at 13:43 +0200, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Thu, 12 Sep 2019 13:01:44 +0200
>
> > The ixgbe driver currently does IPsec TX offloading
> > based on an existing secpath. However, the secpath
> > can also come from the RX side, in this case it is
> > misinterpreted for TX offload and the packets are
> > dropped with a "bad sa_idx" error. Fix this by using
> > the xfrm_offload() function to test for TX offload.
> >
> > Fixes: 592594704761 ("ixgbe: process the Tx ipsec offload")
> > Reported-by: Michael Marley <michael@michaelmarley.com>
> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
>
> I'll apply this directly and queue it up for -stable, thanks.
Thanks Dave!
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox