Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 16/16] dt-bindings: net: add bindings for ADIN PHY driver
From: Ardelean, Alexandru @ 2019-08-06  7:03 UTC (permalink / raw)
  To: andrew@lunn.ch
  Cc: davem@davemloft.net, hkallweit1@gmail.com,
	devicetree@vger.kernel.org, mark.rutland@arm.com,
	linux-kernel@vger.kernel.org, f.fainelli@gmail.com,
	netdev@vger.kernel.org, robh+dt@kernel.org
In-Reply-To: <20190805141100.GG24275@lunn.ch>

On Mon, 2019-08-05 at 16:11 +0200, Andrew Lunn wrote:
> [External]
> 
> > +  adi,rx-internal-delay:
> > +    $ref: /schemas/types.yaml#/definitions/uint32
> > +    description: |
> > +      RGMII RX Clock Delay used only when PHY operates in RGMII mode (phy-mode
> > +      is "rgmii-id", "rgmii-rxid", "rgmii-txid") see `dt-bindings/net/adin.h`
> > +      default value is 0 (which represents 2 ns)
> > +    enum: [ 0, 1, 2, 6, 7 ]
> 
> We want these numbers to be in ns. So the default value would actually
> be 2. The driver needs to convert the number in DT to a value to poke
> into a PHY register. Please rename the property adi,rx-internal-delay-ns.

ack;
also, good point about ns units and PHY driver to convert it to reg values;

> 
> > +
> > +  adi,tx-internal-delay:
> > +    $ref: /schemas/types.yaml#/definitions/uint32
> > +    description: |
> > +      RGMII TX Clock Delay used only when PHY operates in RGMII mode (phy-mode
> > +      is "rgmii-id", "rgmii-rxid", "rgmii-txid") see `dt-bindings/net/adin.h`
> > +      default value is 0 (which represents 2 ns)
> > +    enum: [ 0, 1, 2, 6, 7 ]
> 
> Same here.
> 
> > +
> > +  adi,fifo-depth:
> > +    $ref: /schemas/types.yaml#/definitions/uint32
> > +    description: |
> > +      When operating in RMII mode, this option configures the FIFO depth.
> > +      See `dt-bindings/net/adin.h`.
> > +    enum: [ 0, 1, 2, 3, 4, 5 ]
> 
> Units? You should probably rename this adi,fifo-depth-bits and list
> the valid values in bits.

units are bits;
will adapt this

> 
> > +
> > +  adi,eee-enabled:
> > +    description: |
> > +      Advertise EEE capabilities on power-up/init (default disabled)
> > +    type: boolean
> 
> It is not the PHY which decides this. The MAC indicates if it is EEE
> capable to phylib. phylib looks into the PHY registers to determine if
> the PHY supports EEE. phylib will then enable EEE
> advertisement. Please remove this, and ensure EEE is disabled by
> default.

ack;
will remove

> 
> 	Andrew

^ permalink raw reply

* Re: [PATCH 15/16] net: phy: adin: add ethtool get_stats support
From: Ardelean, Alexandru @ 2019-08-06  7:11 UTC (permalink / raw)
  To: andrew@lunn.ch
  Cc: davem@davemloft.net, hkallweit1@gmail.com,
	devicetree@vger.kernel.org, mark.rutland@arm.com,
	linux-kernel@vger.kernel.org, f.fainelli@gmail.com,
	netdev@vger.kernel.org, robh+dt@kernel.org
In-Reply-To: <20190805152832.GT24275@lunn.ch>

On Mon, 2019-08-05 at 17:28 +0200, Andrew Lunn wrote:
> [External]
> 
> > +struct adin_hw_stat {
> > +	const char *string;
> > +static void adin_get_strings(struct phy_device *phydev, u8 *data)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(adin_hw_stats); i++) {
> > +		memcpy(data + i * ETH_GSTRING_LEN,
> > +		       adin_hw_stats[i].string, ETH_GSTRING_LEN);
> 
> You define string as a char *. So it will be only as long as it should
> be. However memcpy always copies ETH_GSTRING_LEN bytes, doing off the
> end of the string and into whatever follows.
> 

hmm, will use strlcpy()
i blindedly copied memcpy() from some other driver

> 
> > +	}
> > +}
> > +
> > +static int adin_read_mmd_stat_regs(struct phy_device *phydev,
> > +				   struct adin_hw_stat *stat,
> > +				   u32 *val)
> > +{
> > +	int ret;
> > +
> > +	ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, stat->reg1);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	*val = (ret & 0xffff);
> > +
> > +	if (stat->reg2 == 0)
> > +		return 0;
> > +
> > +	ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, stat->reg2);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	*val <<= 16;
> > +	*val |= (ret & 0xffff);
> 
> Does the hardware have a snapshot feature? Is there a danger that
> between the two reads stat->reg1 rolls over and you end up with too
> big a value?

i'm afraid i don't understand about the snapshot feature you are mentioning;
i.e. i don't remember seeing it in other chips;

regarding the danger that stat->reg1 rolls over, i guess that is possible, but it's a bit hard to guard against;
i guess if it ends up in that scenario, [for many counters] things would be horribly bad, and the chip, or cabling would
be unusable;

not sure if this answer is sufficient/satisfactory;

thanks

> 
>     Andrew

^ permalink raw reply

* Re: [PATCH 15/16] net: phy: adin: add ethtool get_stats support
From: Ardelean, Alexandru @ 2019-08-06  7:18 UTC (permalink / raw)
  To: andrew@lunn.ch
  Cc: davem@davemloft.net, hkallweit1@gmail.com,
	devicetree@vger.kernel.org, mark.rutland@arm.com,
	linux-kernel@vger.kernel.org, f.fainelli@gmail.com,
	netdev@vger.kernel.org, robh+dt@kernel.org
In-Reply-To: <20190805153058.GU24275@lunn.ch>

On Mon, 2019-08-05 at 17:30 +0200, Andrew Lunn wrote:
> [External]
> 
> On Mon, Aug 05, 2019 at 07:54:52PM +0300, Alexandru Ardelean wrote:
> > This change implements retrieving all the error counters from the PHY.
> > The PHY supports several error counters/stats. The `Mean Square Errors`
> > status values are only valie when a link is established, and shouldn't be
> > incremented. These values characterize the quality of a signal.
> 
> I think you mean accumulated, not incremented?

accumulated sounds better;


> > The rest of the error counters are self-clearing on read.
> > Most of them are reports from the Frame Checker engine that the PHY has.
> > 
> > Not retrieving the `LPI Wake Error Count Register` here, since that is used
> > by the PHY framework to check for any EEE errors. And that register is
> > self-clearing when read (as per IEEE spec).
> > 
> > Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
> > ---
> >  drivers/net/phy/adin.c | 108 +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 108 insertions(+)
> > 
> > diff --git a/drivers/net/phy/adin.c b/drivers/net/phy/adin.c
> > index a1f3456a8504..04896547dac8 100644
> > --- a/drivers/net/phy/adin.c
> > +++ b/drivers/net/phy/adin.c
> > @@ -103,6 +103,32 @@ static struct clause22_mmd_map clause22_mmd_map[] = {
> >  	{ MDIO_MMD_PCS, MDIO_PCS_EEE_WK_ERR,	ADIN1300_LPI_WAKE_ERR_CNT_REG },
> >  };
> >  
> > +struct adin_hw_stat {
> > +	const char *string;
> > +	u16 reg1;
> > +	u16 reg2;
> > +	bool do_not_inc;
> 
> do_not_accumulate? or reverse its meaning, clear_on_read?

do_not_accumulate works;
there are only 4 regs that need this property set to true

> 
>    Andrew

^ permalink raw reply

* [PATCH net] net: ethernet: sun4i-emac: Support phy-handle property for finding PHYs
From: Chen-Yu Tsai @ 2019-08-06  7:35 UTC (permalink / raw)
  To: David S. Miller, Maxime Ripard
  Cc: Chen-Yu Tsai, netdev, linux-arm-kernel, linux-kernel

From: Chen-Yu Tsai <wens@csie.org>

The sun4i-emac uses the "phy" property to find the PHY it's supposed to
use. This property was deprecated in favor of "phy-handle" in commit
8c5b09447625 ("dt-bindings: net: sun4i-emac: Convert the binding to a
schemas").

Add support for this new property name, and fall back to the old one in
case the device tree hasn't been updated.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
---

The aforementioned commit is in v5.3-rc1. It would be nice to have the
driver fix in the same release. In addition, an update for the device
tree has been queued up for v5.4, which made us realize the driver needs
an update.

---
 drivers/net/ethernet/allwinner/sun4i-emac.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/allwinner/sun4i-emac.c b/drivers/net/ethernet/allwinner/sun4i-emac.c
index 3434730a7699..0537df06a9b5 100644
--- a/drivers/net/ethernet/allwinner/sun4i-emac.c
+++ b/drivers/net/ethernet/allwinner/sun4i-emac.c
@@ -860,7 +860,9 @@ static int emac_probe(struct platform_device *pdev)
 		goto out_clk_disable_unprepare;
 	}
 
-	db->phy_node = of_parse_phandle(np, "phy", 0);
+	db->phy_node = of_parse_phandle(np, "phy-handle", 0);
+	if (!db->phy_node)
+		db->phy_node = of_parse_phandle(np, "phy", 0);
 	if (!db->phy_node) {
 		dev_err(&pdev->dev, "no associated PHY\n");
 		ret = -ENODEV;
-- 
2.20.1


^ permalink raw reply related

* [PATCH rdma-next v2 0/4] ODP support for mlx5 DC QPs
From: Leon Romanovsky @ 2019-08-06  7:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Michael Guralnik, Moni Shoua,
	Saeed Mahameed, linux-netdev

From: Leon Romanovsky <leonro@mellanox.com>

Changelog
 v2:
 * Fixed reserved_* field wrong name (Saeed M.)
 * Split first patch to two patches, one for mlx5-next and one for rdma-next. (Saeed M.)
 v1:
 * Fixed alignment to u64 in mlx5-abi.h (Gal P.)
 * https://lore.kernel.org/linux-rdma/20190804100048.32671-1-leon@kernel.org
 v0:
 * https://lore.kernel.org/linux-rdma/20190801122139.25224-1-leon@kernel.org

---------------------------------------------------------------------------------
From Michael,

The series adds support for on-demand paging for DC transport.
Adding handling of DC WQE parsing upon page faults and exposing
capabilities.

As DC is mlx-only transport, the capabilities are exposed to the user
using the direct-verbs mechanism. Namely through the
mlx5dv_query_device.

Thanks

Michael Guralnik (4):
  net/mlx5: Set ODP capabilities for DC transport
  IB/mlx5: Query ODP capabilities for DC
  IB/mlx5: Expose ODP for DC capabilities to user
  IB/mlx5: Add page fault handler for DC initiator WQE

 drivers/infiniband/hw/mlx5/main.c             |  6 +++++
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |  1 +
 drivers/infiniband/hw/mlx5/odp.c              | 27 ++++++++++++++++++-
 .../net/ethernet/mellanox/mlx5/core/main.c    |  6 +++++
 include/linux/mlx5/mlx5_ifc.h                 |  4 ++-
 include/uapi/rdma/mlx5-abi.h                  |  3 +++
 6 files changed, 45 insertions(+), 2 deletions(-)

--
2.20.1


^ permalink raw reply

* [PATCH mlx5-next v2 1/4] net/mlx5: Set ODP capabilities for DC transport
From: Leon Romanovsky @ 2019-08-06  7:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Michael Guralnik, Moni Shoua,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20190806074807.9111-1-leon@kernel.org>

From: Michael Guralnik <michaelgur@mellanox.com>

Query ODP capabilities for DC transport from FW.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 6 ++++++
 include/linux/mlx5/mlx5_ifc.h                  | 4 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index b15b27a497fc..3995fc6d4d34 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -495,6 +495,12 @@ static int handle_hca_cap_odp(struct mlx5_core_dev *dev)
 	ODP_CAP_SET_MAX(dev, xrc_odp_caps.write);
 	ODP_CAP_SET_MAX(dev, xrc_odp_caps.read);
 	ODP_CAP_SET_MAX(dev, xrc_odp_caps.atomic);
+	ODP_CAP_SET_MAX(dev, dc_odp_caps.srq_receive);
+	ODP_CAP_SET_MAX(dev, dc_odp_caps.send);
+	ODP_CAP_SET_MAX(dev, dc_odp_caps.receive);
+	ODP_CAP_SET_MAX(dev, dc_odp_caps.write);
+	ODP_CAP_SET_MAX(dev, dc_odp_caps.read);
+	ODP_CAP_SET_MAX(dev, dc_odp_caps.atomic);
 
 	if (do_set)
 		err = set_caps(dev, set_ctx, set_sz,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index ec571fd7fcf8..b96d5c37f70f 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -944,7 +944,9 @@ struct mlx5_ifc_odp_cap_bits {
 
 	struct mlx5_ifc_odp_per_transport_service_cap_bits xrc_odp_caps;
 
-	u8         reserved_at_100[0x700];
+	struct mlx5_ifc_odp_per_transport_service_cap_bits dc_odp_caps;
+
+	u8         reserved_at_120[0x6E0];
 };
 
 struct mlx5_ifc_calc_op {
-- 
2.20.1


^ permalink raw reply related

* [PATCH rdma-next v2 2/4] IB/mlx5: Query ODP capabilities for DC
From: Leon Romanovsky @ 2019-08-06  7:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Michael Guralnik, Moni Shoua,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20190806074807.9111-1-leon@kernel.org>

From: Michael Guralnik <michaelgur@mellanox.com>

Cache current ODP capabilities for DC in mlx5_ib device.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
 drivers/infiniband/hw/mlx5/odp.c     | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index cb41a7e6255a..f99c71b3c876 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -967,6 +967,7 @@ struct mlx5_ib_dev {
 	struct mutex			slow_path_mutex;
 	int				fill_delay;
 	struct ib_odp_caps	odp_caps;
+	uint32_t		dc_odp_caps;
 	u64			odp_max_size;
 	struct mlx5_ib_pf_eq	odp_pf_eq;
 
diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index b0c5de39d186..5e87a5e25574 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -353,6 +353,24 @@ void mlx5_ib_internal_fill_odp_caps(struct mlx5_ib_dev *dev)
 	if (MLX5_CAP_ODP(dev->mdev, xrc_odp_caps.srq_receive))
 		caps->per_transport_caps.xrc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 
+	if (MLX5_CAP_ODP(dev->mdev, dc_odp_caps.send))
+		dev->dc_odp_caps |= IB_ODP_SUPPORT_SEND;
+
+	if (MLX5_CAP_ODP(dev->mdev, dc_odp_caps.receive))
+		dev->dc_odp_caps |= IB_ODP_SUPPORT_RECV;
+
+	if (MLX5_CAP_ODP(dev->mdev, dc_odp_caps.write))
+		dev->dc_odp_caps |= IB_ODP_SUPPORT_WRITE;
+
+	if (MLX5_CAP_ODP(dev->mdev, dc_odp_caps.read))
+		dev->dc_odp_caps |= IB_ODP_SUPPORT_READ;
+
+	if (MLX5_CAP_ODP(dev->mdev, dc_odp_caps.atomic))
+		dev->dc_odp_caps |= IB_ODP_SUPPORT_ATOMIC;
+
+	if (MLX5_CAP_ODP(dev->mdev, dc_odp_caps.srq_receive))
+		dev->dc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
+
 	if (MLX5_CAP_GEN(dev->mdev, fixed_buffer_size) &&
 	    MLX5_CAP_GEN(dev->mdev, null_mkey) &&
 	    MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset))
-- 
2.20.1


^ permalink raw reply related

* [PATCH rdma-next v2 4/4] IB/mlx5: Add page fault handler for DC initiator WQE
From: Leon Romanovsky @ 2019-08-06  7:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Michael Guralnik, Moni Shoua,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20190806074807.9111-1-leon@kernel.org>

From: Michael Guralnik <michaelgur@mellanox.com>

Parsing DC initiator WQEs upon page fault requires skipping an address
vector segment, as in UD WQEs.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/odp.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index 5e87a5e25574..6f1de5edbe8e 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -1065,6 +1065,12 @@ static int mlx5_ib_mr_initiator_pfault_handler(
 	case IB_QPT_UD:
 		transport_caps = dev->odp_caps.per_transport_caps.ud_odp_caps;
 		break;
+	case IB_QPT_DRIVER:
+		if (qp->qp_sub_type == MLX5_IB_QPT_DCI) {
+			transport_caps = dev->dc_odp_caps;
+			break;
+		}
+		/* fall through */
 	default:
 		mlx5_ib_err(dev, "ODP fault on QP of an unsupported transport 0x%x\n",
 			    qp->ibqp.qp_type);
@@ -1078,7 +1084,8 @@ static int mlx5_ib_mr_initiator_pfault_handler(
 		return -EFAULT;
 	}
 
-	if (qp->ibqp.qp_type == IB_QPT_UD) {
+	if (qp->ibqp.qp_type == IB_QPT_UD ||
+	    qp->qp_sub_type == MLX5_IB_QPT_DCI) {
 		av = *wqe;
 		if (av->dqp_dct & cpu_to_be32(MLX5_EXTENDED_UD_AV))
 			*wqe += sizeof(struct mlx5_av);
-- 
2.20.1


^ permalink raw reply related

* [PATCH rdma-next v2 3/4] IB/mlx5: Expose ODP for DC capabilities to user
From: Leon Romanovsky @ 2019-08-06  7:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Michael Guralnik, Moni Shoua,
	Saeed Mahameed, linux-netdev
In-Reply-To: <20190806074807.9111-1-leon@kernel.org>

From: Michael Guralnik <michaelgur@mellanox.com>

Return ODP capabilities for DC to user in alloc_context.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 6 ++++++
 include/uapi/rdma/mlx5-abi.h      | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 4a3d700cd783..a53e0dc7c17f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1954,6 +1954,12 @@ static int mlx5_ib_alloc_ucontext(struct ib_ucontext *uctx,
 		resp.response_length += sizeof(resp.dump_fill_mkey);
 	}
 
+	if (field_avail(typeof(resp), dc_odp_caps, udata->outlen)) {
+		resp.dc_odp_caps = dev->dc_odp_caps;
+		resp.comp_mask |= MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_DC_ODP_CAPS;
+		resp.response_length += sizeof(resp.dc_odp_caps);
+	}
+
 	err = ib_copy_to_udata(udata, &resp, resp.response_length);
 	if (err)
 		goto out_mdev;
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index 624f5b53eb1f..7cab806d7fa7 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -98,6 +98,7 @@ struct mlx5_ib_alloc_ucontext_req_v2 {
 enum mlx5_ib_alloc_ucontext_resp_mask {
 	MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET = 1UL << 0,
 	MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_DUMP_FILL_MKEY    = 1UL << 1,
+	MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_DC_ODP_CAPS	   = 1UL << 2,
 };
 
 enum mlx5_user_cmds_supp_uhw {
@@ -147,6 +148,8 @@ struct mlx5_ib_alloc_ucontext_resp {
 	__u32	num_uars_per_page;
 	__u32	num_dyn_bfregs;
 	__u32	dump_fill_mkey;
+	__u32	dc_odp_caps;
+	__u32   reserved;
 };
 
 struct mlx5_ib_alloc_pd_resp {
-- 
2.20.1


^ permalink raw reply related

* [PATCH net v2] net: dsa: Check existence of .port_mdb_add callback before calling it
From: Chen-Yu Tsai @ 2019-08-06  7:53 UTC (permalink / raw)
  To: Andrew Lunn, Vivien Didelot, Florian Fainelli, David S. Miller
  Cc: Chen-Yu Tsai, netdev, linux-kernel

From: Chen-Yu Tsai <wens@csie.org>

With the recent addition of commit 75dad2520fc3 ("net: dsa: b53: Disable
all ports on setup"), users of b53 (BCM53125 on Lamobo R1 in my case)
are forced to use the dsa subsystem to enable the switch, instead of
having it in the default transparent "forward-to-all" mode.

The b53 driver does not support mdb bitmap functions. However the dsa
layer does not check for the existence of the .port_mdb_add callback
before actually using it. This results in a NULL pointer dereference,
as shown in the kernel oops below.

The other functions seem to be properly guarded. Do the same for
.port_mdb_add in dsa_switch_mdb_add_bitmap() as well.

b53 is not the only driver that doesn't support mdb bitmap functions.
Others include bcm_sf2, dsa_loop, lantiq_gswip, mt7530, mv88e6060,
qca8k, realtek-smi, and vitesse-vsc73xx.

    8<--- cut here ---
    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    pgd = (ptrval)
    [00000000] *pgd=00000000
    Internal error: Oops: 80000005 [#1] SMP ARM
    Modules linked in: rtl8xxxu rtl8192cu rtl_usb rtl8192c_common rtlwifi mac80211 cfg80211
    CPU: 1 PID: 134 Comm: kworker/1:2 Not tainted 5.3.0-rc1-00247-gd3519030752a #1
    Hardware name: Allwinner sun7i (A20) Family
    Workqueue: events switchdev_deferred_process_work
    PC is at 0x0
    LR is at dsa_switch_event+0x570/0x620
    pc : [<00000000>]    lr : [<c08533ec>]    psr: 80070013
    sp : ee871db8  ip : 00000000  fp : ee98d0a4
    r10: 0000000c  r9 : 00000008  r8 : ee89f710
    r7 : ee98d040  r6 : ee98d088  r5 : c0f04c48  r4 : ee98d04c
    r3 : 00000000  r2 : ee89f710  r1 : 00000008  r0 : ee98d040
    Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
    Control: 10c5387d  Table: 6deb406a  DAC: 00000051
    Process kworker/1:2 (pid: 134, stack limit = 0x(ptrval))
    Stack: (0xee871db8 to 0xee872000)
    1da0:                                                       ee871e14 103ace2d
    1dc0: 00000000 ffffffff 00000000 ee871e14 00000005 00000000 c08524a0 00000000
    1de0: ffffe000 c014bdfc c0f04c48 ee871e98 c0f04c48 ee9e5000 c0851120 c014bef0
    1e00: 00000000 b643aea2 ee9b4068 c08509a8 ee2bf940 ee89f710 ee871ecb 00000000
    1e20: 00000008 103ace2d 00000000 c087e248 ee29c868 103ace2d 00000001 ffffffff
    1e40: 00000000 ee871e98 00000006 00000000 c0fb2a50 c087e2d0 ffffffff c08523c4
    1e60: ffffffff c014bdfc 00000006 c0fad2d0 ee871e98 ee89f710 00000000 c014c500
    1e80: 00000000 ee89f3c0 c0f04c48 00000000 ee9e5000 c087dfb4 ee9e5000 00000000
    1ea0: ee89f710 ee871ecb 00000001 103ace2d 00000000 c0f04c48 00000000 c087e0a8
    1ec0: 00000000 efd9a3e0 0089f3c0 103ace2d ee89f700 ee89f710 ee9e5000 00000122
    1ee0: 00000100 c087e130 ee89f700 c0fad2c8 c1003ef0 c087de4c 2e928000 c0fad2ec
    1f00: c0fad2ec ee839580 ef7a62c0 ef7a9400 00000000 c087def8 c0fad2ec c01447dc
    1f20: ef315640 ef7a62c0 00000008 ee839580 ee839594 ef7a62c0 00000008 c0f03d00
    1f40: ef7a62d8 ef7a62c0 ffffe000 c0145b84 ffffe000 c0fb2420 c0bfaa8c 00000000
    1f60: ffffe000 ee84b600 ee84b5c0 00000000 ee870000 ee839580 c0145b40 ef0e5ea4
    1f80: ee84b61c c014a6f8 00000001 ee84b5c0 c014a5b0 00000000 00000000 00000000
    1fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
    1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
    [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
    [<c014bdfc>] (notifier_call_chain) from [<c014bef0>] (raw_notifier_call_chain+0x18/0x20)
    [<c014bef0>] (raw_notifier_call_chain) from [<c08509a8>] (dsa_port_mdb_add+0x48/0x74)
    [<c08509a8>] (dsa_port_mdb_add) from [<c087e248>] (__switchdev_handle_port_obj_add+0x54/0xd4)
    [<c087e248>] (__switchdev_handle_port_obj_add) from [<c087e2d0>] (switchdev_handle_port_obj_add+0x8/0x14)
    [<c087e2d0>] (switchdev_handle_port_obj_add) from [<c08523c4>] (dsa_slave_switchdev_blocking_event+0x94/0xa4)
    [<c08523c4>] (dsa_slave_switchdev_blocking_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
    [<c014bdfc>] (notifier_call_chain) from [<c014c500>] (blocking_notifier_call_chain+0x50/0x68)
    [<c014c500>] (blocking_notifier_call_chain) from [<c087dfb4>] (switchdev_port_obj_notify+0x44/0xa8)
    [<c087dfb4>] (switchdev_port_obj_notify) from [<c087e0a8>] (switchdev_port_obj_add_now+0x90/0x104)
    [<c087e0a8>] (switchdev_port_obj_add_now) from [<c087e130>] (switchdev_port_obj_add_deferred+0x14/0x5c)
    [<c087e130>] (switchdev_port_obj_add_deferred) from [<c087de4c>] (switchdev_deferred_process+0x64/0x104)
    [<c087de4c>] (switchdev_deferred_process) from [<c087def8>] (switchdev_deferred_process_work+0xc/0x14)
    [<c087def8>] (switchdev_deferred_process_work) from [<c01447dc>] (process_one_work+0x218/0x50c)
    [<c01447dc>] (process_one_work) from [<c0145b84>] (worker_thread+0x44/0x5bc)
    [<c0145b84>] (worker_thread) from [<c014a6f8>] (kthread+0x148/0x150)
    [<c014a6f8>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
    Exception stack(0xee871fb0 to 0xee871ff8)
    1fa0:                                     00000000 00000000 00000000 00000000
    1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
    Code: bad PC value
    ---[ end trace 1292c61abd17b130 ]---

    [<c08533ec>] (dsa_switch_event) from [<c014bdfc>] (notifier_call_chain+0x48/0x84)
    corresponds to

	$ arm-linux-gnueabihf-addr2line -C -i -e vmlinux c08533ec

	linux/net/dsa/switch.c:156
	linux/net/dsa/switch.c:178
	linux/net/dsa/switch.c:328

Fixes: e6db98db8a95 ("net: dsa: add switch mdb bitmap functions")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
---
Changes since v1:

  - Moved the check to the beginning of dsa_switch_mdb_add()

Looks like we could also move the ops check out of
dsa_switch_mdb_prepare_bitmap(), though I suppose keeping the code the
way it is now is clearer.

---
 net/dsa/switch.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 4ec5b7f85d51..231af5268656 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -164,6 +164,9 @@ static int dsa_switch_mdb_add(struct dsa_switch *ds,
 	struct switchdev_trans *trans = info->trans;
 	int port;
 
+	if (!ds->ops->port_mdb_add)
+		return -EOPNOTSUPP;
+
 	/* Build a mask of Multicast group members */
 	bitmap_zero(ds->bitmap, ds->num_ports);
 	if (ds->index == info->sw_index)
-- 
2.20.1


^ permalink raw reply related

* [PATCH] net: cxgb3_main: Fix a resource leak in a error path in 'init_one()'
From: Christophe JAILLET @ 2019-08-06  8:55 UTC (permalink / raw)
  To: vishal, davem; +Cc: netdev, linux-kernel, kernel-janitors, Christophe JAILLET

A call to 'kfree_skb()' is missing in the error handling path of
'init_one()'.
This is already present in 'remove_one()' but is missing here.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
index 1e82b9efe447..58f89f6a040f 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c
@@ -3269,7 +3269,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (!adapter->regs) {
 		dev_err(&pdev->dev, "cannot map device registers\n");
 		err = -ENOMEM;
-		goto out_free_adapter;
+		goto out_free_adapter_nofail;
 	}
 
 	adapter->pdev = pdev;
@@ -3397,6 +3397,9 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		if (adapter->port[i])
 			free_netdev(adapter->port[i]);
 
+out_free_adapter_nofail:
+	kfree_skb(adapter->nofail_skb);
+
 out_free_adapter:
 	kfree(adapter);
 
-- 
2.20.1


^ permalink raw reply related

* memory leak in internal_dev_create
From: syzbot @ 2019-08-06  8:58 UTC (permalink / raw)
  To: davem, dev, linux-kernel, netdev, pshelar, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    1e78030e Merge tag 'mmc-v5.3-rc1' of git://git.kernel.org/..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=148d3d1a600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=30cef20daf3e9977
dashboard link: https://syzkaller.appspot.com/bug?extid=13210896153522fe1ee5
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=136aa8c4600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=109ba792600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+13210896153522fe1ee5@syzkaller.appspotmail.com

BUG: memory leak
unreferenced object 0xffff8881207e4100 (size 128):
   comm "syz-executor032", pid 7014, jiffies 4294944027 (age 13.830s)
   hex dump (first 32 bytes):
     00 70 16 18 81 88 ff ff 80 af 8c 22 81 88 ff ff  .p........."....
     00 b6 23 17 81 88 ff ff 00 00 00 00 00 00 00 00  ..#.............
   backtrace:
     [<000000000eb78212>] kmemleak_alloc_recursive  
include/linux/kmemleak.h:43 [inline]
     [<000000000eb78212>] slab_post_alloc_hook mm/slab.h:522 [inline]
     [<000000000eb78212>] slab_alloc mm/slab.c:3319 [inline]
     [<000000000eb78212>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548
     [<00000000006ea6c6>] kmalloc include/linux/slab.h:552 [inline]
     [<00000000006ea6c6>] kzalloc include/linux/slab.h:748 [inline]
     [<00000000006ea6c6>] ovs_vport_alloc+0x37/0xf0  
net/openvswitch/vport.c:130
     [<00000000f9a04a7d>] internal_dev_create+0x24/0x1d0  
net/openvswitch/vport-internal_dev.c:164
     [<0000000056ee7c13>] ovs_vport_add+0x81/0x190  
net/openvswitch/vport.c:199
     [<000000005434efc7>] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
     [<00000000b7b253f1>] ovs_dp_cmd_new+0x22f/0x410  
net/openvswitch/datapath.c:1614
     [<00000000e0988518>] genl_family_rcv_msg+0x2ab/0x5b0  
net/netlink/genetlink.c:629
     [<00000000d0cc9347>] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
     [<000000006694b647>] netlink_rcv_skb+0x61/0x170  
net/netlink/af_netlink.c:2477
     [<0000000088381f37>] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
     [<00000000dad42a47>] netlink_unicast_kernel  
net/netlink/af_netlink.c:1302 [inline]
     [<00000000dad42a47>] netlink_unicast+0x1ec/0x2d0  
net/netlink/af_netlink.c:1328
     [<0000000067e6b079>] netlink_sendmsg+0x270/0x480  
net/netlink/af_netlink.c:1917
     [<00000000aab08a47>] sock_sendmsg_nosec net/socket.c:637 [inline]
     [<00000000aab08a47>] sock_sendmsg+0x54/0x70 net/socket.c:657
     [<000000004cb7c11d>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2311
     [<00000000c4901c63>] __sys_sendmsg+0x80/0xf0 net/socket.c:2356
     [<00000000c10abb2d>] __do_sys_sendmsg net/socket.c:2365 [inline]
     [<00000000c10abb2d>] __se_sys_sendmsg net/socket.c:2363 [inline]
     [<00000000c10abb2d>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2363

BUG: memory leak
unreferenced object 0xffff88811723b600 (size 64):
   comm "syz-executor032", pid 7014, jiffies 4294944027 (age 13.830s)
   hex dump (first 32 bytes):
     01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
     00 00 00 00 00 00 00 00 02 00 00 00 05 35 82 c1  .............5..
   backtrace:
     [<00000000352f46d8>] kmemleak_alloc_recursive  
include/linux/kmemleak.h:43 [inline]
     [<00000000352f46d8>] slab_post_alloc_hook mm/slab.h:522 [inline]
     [<00000000352f46d8>] slab_alloc mm/slab.c:3319 [inline]
     [<00000000352f46d8>] __do_kmalloc mm/slab.c:3653 [inline]
     [<00000000352f46d8>] __kmalloc+0x169/0x300 mm/slab.c:3664
     [<000000008e48f3d1>] kmalloc include/linux/slab.h:557 [inline]
     [<000000008e48f3d1>] ovs_vport_set_upcall_portids+0x54/0xd0  
net/openvswitch/vport.c:343
     [<00000000541e4f4a>] ovs_vport_alloc+0x7f/0xf0  
net/openvswitch/vport.c:139
     [<00000000f9a04a7d>] internal_dev_create+0x24/0x1d0  
net/openvswitch/vport-internal_dev.c:164
     [<0000000056ee7c13>] ovs_vport_add+0x81/0x190  
net/openvswitch/vport.c:199
     [<000000005434efc7>] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
     [<00000000b7b253f1>] ovs_dp_cmd_new+0x22f/0x410  
net/openvswitch/datapath.c:1614
     [<00000000e0988518>] genl_family_rcv_msg+0x2ab/0x5b0  
net/netlink/genetlink.c:629
     [<00000000d0cc9347>] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
     [<000000006694b647>] netlink_rcv_skb+0x61/0x170  
net/netlink/af_netlink.c:2477
     [<0000000088381f37>] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
     [<00000000dad42a47>] netlink_unicast_kernel  
net/netlink/af_netlink.c:1302 [inline]
     [<00000000dad42a47>] netlink_unicast+0x1ec/0x2d0  
net/netlink/af_netlink.c:1328
     [<0000000067e6b079>] netlink_sendmsg+0x270/0x480  
net/netlink/af_netlink.c:1917
     [<00000000aab08a47>] sock_sendmsg_nosec net/socket.c:637 [inline]
     [<00000000aab08a47>] sock_sendmsg+0x54/0x70 net/socket.c:657
     [<000000004cb7c11d>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2311
     [<00000000c4901c63>] __sys_sendmsg+0x80/0xf0 net/socket.c:2356

BUG: memory leak
unreferenced object 0xffff8881228ca500 (size 128):
   comm "syz-executor032", pid 7015, jiffies 4294944622 (age 7.880s)
   hex dump (first 32 bytes):
     00 f0 27 18 81 88 ff ff 80 ac 8c 22 81 88 ff ff  ..'........"....
     40 b7 23 17 81 88 ff ff 00 00 00 00 00 00 00 00  @.#.............
   backtrace:
     [<000000000eb78212>] kmemleak_alloc_recursive  
include/linux/kmemleak.h:43 [inline]
     [<000000000eb78212>] slab_post_alloc_hook mm/slab.h:522 [inline]
     [<000000000eb78212>] slab_alloc mm/slab.c:3319 [inline]
     [<000000000eb78212>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3548
     [<00000000006ea6c6>] kmalloc include/linux/slab.h:552 [inline]
     [<00000000006ea6c6>] kzalloc include/linux/slab.h:748 [inline]
     [<00000000006ea6c6>] ovs_vport_alloc+0x37/0xf0  
net/openvswitch/vport.c:130
     [<00000000f9a04a7d>] internal_dev_create+0x24/0x1d0  
net/openvswitch/vport-internal_dev.c:164
     [<0000000056ee7c13>] ovs_vport_add+0x81/0x190  
net/openvswitch/vport.c:199
     [<000000005434efc7>] new_vport+0x19/0x80 net/openvswitch/datapath.c:194
     [<00000000b7b253f1>] ovs_dp_cmd_new+0x22f/0x410  
net/openvswitch/datapath.c:1614
     [<00000000e0988518>] genl_family_rcv_msg+0x2ab/0x5b0  
net/netlink/genetlink.c:629
     [<00000000d0cc9347>] genl_rcv_msg+0x54/0x9c net/netlink/genetlink.c:654
     [<000000006694b647>] netlink_rcv_skb+0x61/0x170  
net/netlink/af_netlink.c:2477
     [<0000000088381f37>] genl_rcv+0x29/0x40 net/netlink/genetlink.c:665
     [<00000000dad42a47>] netlink_unicast_kernel  
net/netlink/af_netlink.c:1302 [inline]
     [<00000000dad42a47>] netlink_unicast+0x1ec/0x2d0  
net/netlink/af_netlink.c:1328
     [<0000000067e6b079>] netlink_sendmsg+0x270/0x480  
net/netlink/af_netlink.c:1917
     [<00000000aab08a47>] sock_sendmsg_nosec net/socket.c:637 [inline]
     [<00000000aab08a47>] sock_sendmsg+0x54/0x70 net/socket.c:657
     [<000000004cb7c11d>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2311
     [<00000000c4901c63>] __sys_sendmsg+0x80/0xf0 net/socket.c:2356
     [<00000000c10abb2d>] __do_sys_sendmsg net/socket.c:2365 [inline]
     [<00000000c10abb2d>] __se_sys_sendmsg net/socket.c:2363 [inline]
     [<00000000c10abb2d>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2363



---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [RFC PATCH] kbuild: re-implement detection of CONFIG options leaked to user-space
From: Arnd Bergmann @ 2019-08-06  9:00 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Linux Kbuild mailing list, Sam Ravnborg, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song, bpf,
	Linux Kernel Mailing List, Networking, Amit Pundir
In-Reply-To: <20190806043729.5562-1-yamada.masahiro@socionext.com>

On Tue, Aug 6, 2019 at 6:38 AM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>
> I was playing with sed yesterday, but the resulted code might be unreadable.
>
> Sed scripts tend to be somewhat unreadable.
> I just wondered which language is appropriate for this?
> Maybe perl, or what else? I am not good at perl, though.

I like the sed version, in particular as it seems to do the job and
I'm not volunteering to write it in anything else.

> Maybe, it will be better to fix existing warnings
> before enabling this check.

Yes, absolutely.

> If somebody takes a closer look at them, that would be great.

Let's see:

> warning: include/uapi/linux/elfcore.h: leaks CONFIG_BINFMT_ELF_FDPIC to user-space

This one is nontrivial, since it defines two incompatible layouts for
this structure,
and the fdpic version is currently not usable at all from user space. Also,
the definition breaks configurations that have both CONFIG_BINFMT_ELF
and CONFIG_BINFMT_ELF_FDPIC enabled, which has become possible
with commit 382e67aec6a7 ("ARM: enable elf_fdpic on systems with an MMU").

The best way forward I see is to duplicate the structure definition, adding
a new 'struct elf_fdpic_prstatus', and using that in fs/binfmt_elf_fdpic.c.
The same change is required in include/linux/elfcore-compat.h.

> warning: include/uapi/linux/atmdev.h: leaks CONFIG_COMPAT to user-space

The "#define COMPAT_ATM_ADDPARTY" can be moved to include/linux/atmdev.h,
it's not needed in the uapi header

> warning: include/uapi/linux/raw.h: leaks CONFIG_MAX_RAW_DEVS to user-space

This has never been usable, I'd just remove MAX_RAW_MINORS and change
drivers/char/raw.c to use CONFIG_MAX_RAW_DEVS

> warning: include/uapi/linux/pktcdvd.h: leaks CONFIG_CDROM_PKTCDVD_WCACHE to user-space

USE_WCACHING can be moved to drivers/block/pktcdvd.c

> warning: include/uapi/linux/eventpoll.h: leaks CONFIG_PM_SLEEP to user-space

ep_take_care_of_epollwakeup() should not be in the header at all I think.
Commit 95f19f658ce1 ("epoll: drop EPOLLWAKEUP if PM_SLEEP is disabled")
was wrong to move it out of fs/eventpoll.c, and I'd just move it back
as an inline function. (Added Amit to Cc for clarification).

> warning: include/uapi/linux/hw_breakpoint.h: leaks CONFIG_HAVE_MIXED_BREAKPOINTS_REGS to user-space

enum bp_type_idx started out in kernel/events/hw_breakpoint.c
and was later moved to a header which then became public. I
don't think it was ever meant to be public though. We either want
to move it back, or change the CONFIG_HAVE_MIXED_BREAKPOINTS_REGS
macro to an __ARCH_HAVE_MIXED_BREAKPOINTS_REGS that
is explicitly set in a header file by x86 and superh.

> warning: include/uapi/asm-generic/fcntl.h: leaks CONFIG_64BIT to user-space

The #ifdef could just be changed to
#if __BITS_PER_LONG == 32

We could also do this differently, given that most 64-bit architectures define
the same macros in their arch/*/include/asm/compat.h files (parisc and mips
use different values).

> warning: arch/x86/include/uapi/asm/mman.h: leaks CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS to user-space

I think arch_vm_get_page_prot should not be in the uapi header,
other architectures have it in arch/powerpc/include/asm/mman.h

> warning: arch/x86/include/uapi/asm/auxvec.h: leaks CONFIG_IA32_EMULATION to user-space
> warning: arch/x86/include/uapi/asm/auxvec.h: leaks CONFIG_X86_64 to user-space

It looks like this definition has always been wrong, across several
changes that made it wrong in different ways.

AT_VECTOR_SIZE_ARCH is supposed to define the size of the extra
aux vectors, which is meant to be '2' for i386 tasks, and '1' for
x86_64 tasks, regardless of how the kernel is configured. I looked at
this for a bit but it's hard to tell how to fix this without introducing
possible regressions. Note how 'mm->saved_auxv' uses this
size and gets copied between kernel and user space.

       Arnd

^ permalink raw reply

* Re: [RFC PATCH] kbuild: re-implement detection of CONFIG options leaked to user-space
From: Masahiro Yamada @ 2019-08-06  9:35 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux Kbuild mailing list, Sam Ravnborg, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song, bpf,
	Linux Kernel Mailing List, Networking, Amit Pundir
In-Reply-To: <CAK8P3a2POcb+AReLKib513i_RTN9kLM_Tun7+G5LOacDuy7gjQ@mail.gmail.com>

Hi Arnd,

On Tue, Aug 6, 2019 at 6:00 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Aug 6, 2019 at 6:38 AM Masahiro Yamada
> <yamada.masahiro@socionext.com> wrote:
> >
> > I was playing with sed yesterday, but the resulted code might be unreadable.
> >
> > Sed scripts tend to be somewhat unreadable.
> > I just wondered which language is appropriate for this?
> > Maybe perl, or what else? I am not good at perl, though.
>
> I like the sed version, in particular as it seems to do the job and
> I'm not volunteering to write it in anything else.
>
> > Maybe, it will be better to fix existing warnings
> > before enabling this check.
>
> Yes, absolutely.
>
> > If somebody takes a closer look at them, that would be great.
>
> Let's see:

Thanks for looking into these!

If possible, could you please send patches to fix them?
We can start with low-hanging fruits to reduce the number of warnings.

Thank you.

-- 
Best Regards
Masahiro Yamada

^ permalink raw reply

* Re: [PATCH] tools: bpftool: fix reading from /proc/config.gz
From: Quentin Monnet @ 2019-08-06  9:36 UTC (permalink / raw)
  To: Peter Wu
  Cc: Jakub Kicinski, Stanislav Fomichev, Alexei Starovoitov,
	Daniel Borkmann, netdev, Stanislav Fomichev
In-Reply-To: <20190805235449.GA8088@al>

Hi Peter,

2019-08-06 00:54 UTC+0100 ~ Peter Wu <peter@lekensteyn.nl>
> Hi all,
> 
> Thank you for your quick feedback, I will address them in the next
> revision.
> 
> On Mon, Aug 05, 2019 at 11:41:09AM +0100, Quentin Monnet wrote:
> 
>> As far as I understood (from examining Cilium [0]), /proc/config _is_
>> used by some distributions, such as CoreOS. This is why we look at that
>> location in bpftool.
>>
>> [0] https://github.com/cilium/cilium/blob/master/bpf/run_probes.sh#L42
> 
> This comment[1] says "CoreOS uses /proc/config", but I think that is a
> typo and is supposed to say "/proc/config.gz". The original feature
> request[2] uses "/boot/config" as example.
> 
>  [1]: https://github.com/cilium/cilium/pull/1065
>  [2]: https://github.com/cilium/cilium/issues/891
> 
> Given that "/proc/config.gz" is the standard since at least v2.6.12-rc2,
> and the official kernel has no mention of "/proc/config", I would like
> to skip the latter. If someone has a need for this and it is not covered
> by either /boot/config-$(uname -r) or /proc/config.gz, they could submit
> a patch for it with links to documentation. How about that?

Ok, did a bit of research on my side as well, and I couldn't find a
solid reference to /proc/config either, so it seems you are correct.
Let's drop /proc/config for now. Thanks for investigating that!

> 
>>> -static char *get_kernel_config_option(FILE *fd, const char *option)
>>> +static bool get_kernel_config_option(FILE *fd, char **buf_p, size_t *n_p,
>>> +				     char **value)
>>
>> Maybe we could rename this function, and have "next" appear in it
>> somewhere? After your changes, it does not return the value for a
>> specific option anymore.
> 
> I have changed it to "read_next_kernel_config_option", let me know if
> you prefer an alternative.
> 

Sounds good to me.

>>>  {
>>> -	size_t line_n = 0, optlen = strlen(option);
>>> -	char *res, *strval, *line = NULL;
>>> -	ssize_t n;
>>> +	char *sep;
>>> +	ssize_t linelen;
>>
>> Please order the declarations in reverse-Christmas tree style.
> 
> Does this refer to the type, name, or full line length? I did not find
> this in CodingStyle, the closest I could get is:
> https://lore.kernel.org/patchwork/patch/732076/
> 
> I will assume the line length for now.

I am unsure this is in the CodingStyle, but fairly certain that this is
a convention for at least network-related code. And yes, as I understand
it refers to the length of the line.

> 
>>>  static void probe_kernel_image_config(void)
>>> @@ -386,31 +386,34 @@ static void probe_kernel_image_config(void)
>>>  		/* test_bpf module for BPF tests */
>>>  		"CONFIG_TEST_BPF",
>>>  	};
>>> +	char *values[ARRAY_SIZE(options)] = { };
>>>  	char *value, *buf = NULL;
>>>  	struct utsname utsn;
>>>  	char path[PATH_MAX];
>>>  	size_t i, n;
>>>  	ssize_t ret;
>>> -	FILE *fd;
>>> +	FILE *fd = NULL;
>>> +	bool is_pipe = false;
>>
>> Reverse-Christmas-tree style please.
> 
> Even if that means moving lines? Something like this?
> 
>         char path[PATH_MAX];
>    +    bool is_pipe = false;
>    +    FILE *fd = NULL;
>         size_t i, n;
>         ssize_t ret;
>    -    FILE *fd;

Yes, that's the idea (although "is_pipe" should be at the top in that
case, above "path" -- and this applies to your v2 patch, by the way).

> 
>>>  	if (uname(&utsn))
>>> -		goto no_config;
>>> +		goto end_parse;
>>
>> Just thinking, maybe if uname() fails we can skip /boot/config-$(uname
>> -r) but still attempt to parse /proc/config{,.gz} instead of printing
>> only NULL options?
> 
> Good idea, I'll try a bit harder if uname falls for whatever reason.

Thanks!

> 
>> Because some distributions do use /proc/config, we should keep this. You
>> can probably add /proc/config.gz as another attempt below (or even
>> above) the current case?
> 
> I doubt it is actually in use, it looks like a typo in the original PR.
> This post only lists /proc/config.gz, /boot/config and
> /boot/config-$(uname -r): https://superuser.com/questions/287371
> 
>>> +	while (get_kernel_config_option(fd, &buf, &n, &value))
>>> +		for (i = 0; i < ARRAY_SIZE(options); i++) {
>>> +			if (values[i] || strcmp(buf, options[i]))
>>
>> Can we have an option set multiple times in the config file? If so,
>> maybe have a p_info() if values are different to warn users that
>> conflicting values were found?
> 
> scripts/kconfig/merge_config.sh seems to apply a merge strategy,
> overwriting earlier values and warning about it. However this should be
> rare given that it ended up at /proc/config.gz. For now I will favor
> simplicity over complexity and keep the old situation. Let me know if
> you prefer otherwise.

Understood, let's keep it that way for now.

Thanks,
Quentin

^ permalink raw reply

* [PATCH 1/1] ixgbe: sync the first fragment unconditionally
From: Firo Yang @ 2019-08-06  9:29 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: jeffrey.t.kirsher@intel.com, davem@davemloft.net,
	intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org,
	Firo Yang

In Xen environment, if Xen-swiotlb is enabled, ixgbe driver
could possibly allocate a page, DMA memory buffer, for the first
fragment which is not suitable for Xen-swiotlb to do DMA operations.
Xen-swiotlb will internally allocate another page for doing DMA
operations. It requires syncing between those two pages. Otherwise,
we may get an incomplete skb. To fix this problem, sync the first
fragment no matter the first fargment is makred as "page_released"
or not.

Signed-off-by: Firo Yang <firo.yang@suse.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cbaf712d6529..200de9838096 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1825,13 +1825,7 @@ static void ixgbe_pull_tail(struct ixgbe_ring *rx_ring,
 static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
 				struct sk_buff *skb)
 {
-	/* if the page was released unmap it, else just sync our portion */
-	if (unlikely(IXGBE_CB(skb)->page_released)) {
-		dma_unmap_page_attrs(rx_ring->dev, IXGBE_CB(skb)->dma,
-				     ixgbe_rx_pg_size(rx_ring),
-				     DMA_FROM_DEVICE,
-				     IXGBE_RX_DMA_ATTR);
-	} else if (ring_uses_build_skb(rx_ring)) {
+	if (ring_uses_build_skb(rx_ring)) {
 		unsigned long offset = (unsigned long)(skb->data) & ~PAGE_MASK;
 
 		dma_sync_single_range_for_cpu(rx_ring->dev,
@@ -1848,6 +1842,14 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
 					      skb_frag_size(frag),
 					      DMA_FROM_DEVICE);
 	}
+
+	/* If the page was released, just unmap it. */
+	if (unlikely(IXGBE_CB(skb)->page_released)) {
+		dma_unmap_page_attrs(rx_ring->dev, IXGBE_CB(skb)->dma,
+				     ixgbe_rx_pg_size(rx_ring),
+				     DMA_FROM_DEVICE,
+				     IXGBE_RX_DMA_ATTR);
+	}
 }
 
 /**
-- 
2.16.4


^ permalink raw reply related

* Re: [PATCH net 0/2] flow_offload hardware priority fixes
From: Pablo Neira Ayuso @ 2019-08-06  9:42 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netfilter-devel, davem, netdev, marcelo.leitner, jiri, wenxu,
	saeedm, paulb, gerlitz.or
In-Reply-To: <20190805120439.40d70cee@cakuba.netronome.com>

On Mon, Aug 05, 2019 at 12:04:39PM -0700, Jakub Kicinski wrote:
> On Sat, 3 Aug 2019 09:08:54 +0200, Pablo Neira Ayuso wrote:
> > The idea is that every subsystem (ethtool, tc, nf) sets up/binds its
> > own flow_block object. And each flow_block object has its own priority
> > range space. So whatever priority the user specifies only applies to
> > the specific subsystem.
> 
> Right, okay so that part is pretty obvious but thanks for spelling it
> out. 
> 
> Are you also agreeing that priorities of blocks, not rules within 
> a block are dictated by the order of processing within the kernel?
> IOW TC blocks are _always_ before nft blocks?

yes.

^ permalink raw reply

* [PATCH v2 0/3] arm/arm64: Add support for function error injection
From: Leo Yan @ 2019-08-06 10:00 UTC (permalink / raw)
  To: Russell King, Oleg Nesterov, Catalin Marinas, Will Deacon,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Arnd Bergmann, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Naveen N. Rao,
	linux-arm-kernel, linux-kernel, linuxppc-dev, linux-arch, netdev,
	bpf, clang-built-linux, Masami Hiramatsu
  Cc: Leo Yan

This small patch set is to add support for function error injection;
this can be used to eanble more advanced debugging feature, e.g.
CONFIG_BPF_KPROBE_OVERRIDE.

The patch 01/03 is to consolidate the function definition which can be
suared cross architectures, patches 02,03/03 are used for enabling
function error injection on arm64 and arm architecture respectively.

I tested on arm64 platform Juno-r2 and one of my laptop with x86
architecture with below steps; I don't test for Arm architecture so
only pass compilation.

- Enable kernel configuration:
  CONFIG_BPF_KPROBE_OVERRIDE
  CONFIG_BTRFS_FS
  CONFIG_BPF_EVENTS=y
  CONFIG_KPROBES=y
  CONFIG_KPROBE_EVENTS=y
  CONFIG_BPF_KPROBE_OVERRIDE=y

- Build samples/bpf on with Debian rootFS:
  # cd $kernel
  # make headers_install
  # make samples/bpf/ LLC=llc-7 CLANG=clang-7

- Run the sample tracex7:
  # dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
  # DEVICE=$(losetup --show -f testfile.img)
  # mkfs.btrfs -f $DEVICE
  # ./tracex7 testfile.img
  [ 1975.211781] BTRFS error (device (efault)): open_ctree failed
  mount: /mnt/linux-kernel/linux-cs-dev/samples/bpf/tmpmnt: mount(2) system call failed: Cannot allocate memory.

Changes from v1:
* Consolidated the function definition into asm-generic header (Will);
* Used APIs to access pt_regs elements (Will);
* Fixed typos in the comments (Will).


Leo Yan (3):
  error-injection: Consolidate override function definition
  arm64: Add support for function error injection
  arm: Add support for function error injection

 arch/arm/Kconfig                           |  1 +
 arch/arm/include/asm/ptrace.h              |  5 +++++
 arch/arm/lib/Makefile                      |  2 ++
 arch/arm/lib/error-inject.c                | 19 +++++++++++++++++++
 arch/arm64/Kconfig                         |  1 +
 arch/arm64/include/asm/ptrace.h            |  5 +++++
 arch/arm64/lib/Makefile                    |  2 ++
 arch/arm64/lib/error-inject.c              | 18 ++++++++++++++++++
 arch/powerpc/include/asm/error-injection.h | 13 -------------
 arch/x86/include/asm/error-injection.h     | 13 -------------
 include/asm-generic/error-injection.h      |  6 ++++++
 include/linux/error-injection.h            |  6 +++---
 12 files changed, 62 insertions(+), 29 deletions(-)
 create mode 100644 arch/arm/lib/error-inject.c
 create mode 100644 arch/arm64/lib/error-inject.c
 delete mode 100644 arch/powerpc/include/asm/error-injection.h
 delete mode 100644 arch/x86/include/asm/error-injection.h

-- 
2.17.1


^ permalink raw reply

* [PATCH v2 1/3] error-injection: Consolidate override function definition
From: Leo Yan @ 2019-08-06 10:00 UTC (permalink / raw)
  To: Russell King, Oleg Nesterov, Catalin Marinas, Will Deacon,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Arnd Bergmann, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Naveen N. Rao,
	linux-arm-kernel, linux-kernel, linuxppc-dev, linux-arch, netdev,
	bpf, clang-built-linux, Masami Hiramatsu
  Cc: Leo Yan
In-Reply-To: <20190806100015.11256-1-leo.yan@linaro.org>

The function override_function_with_return() is defined separately for
each architecture and every architecture's definition is almost same
with each other.  E.g. x86 and powerpc both define function in its own
asm/error-injection.h header and override_function_with_return() has
the same definition, the only difference is that x86 defines an extra
function just_return_func() but it is specific for x86 and is only used
by x86's override_function_with_return(), so don't need to export this
function.

This patch consolidates override_function_with_return() definition into
asm-generic/error-injection.h header, thus all architectures can use the
common definition.  As result, the architecture specific headers are
removed; the include/linux/error-injection.h header also changes to
include asm-generic/error-injection.h header rather than architecture
header, furthermore, it includes linux/compiler.h for successful
compilation.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 arch/powerpc/include/asm/error-injection.h | 13 -------------
 arch/x86/include/asm/error-injection.h     | 13 -------------
 include/asm-generic/error-injection.h      |  6 ++++++
 include/linux/error-injection.h            |  6 +++---
 4 files changed, 9 insertions(+), 29 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/error-injection.h
 delete mode 100644 arch/x86/include/asm/error-injection.h

diff --git a/arch/powerpc/include/asm/error-injection.h b/arch/powerpc/include/asm/error-injection.h
deleted file mode 100644
index 62fd24739852..000000000000
--- a/arch/powerpc/include/asm/error-injection.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0+ */
-
-#ifndef _ASM_ERROR_INJECTION_H
-#define _ASM_ERROR_INJECTION_H
-
-#include <linux/compiler.h>
-#include <linux/linkage.h>
-#include <asm/ptrace.h>
-#include <asm-generic/error-injection.h>
-
-void override_function_with_return(struct pt_regs *regs);
-
-#endif /* _ASM_ERROR_INJECTION_H */
diff --git a/arch/x86/include/asm/error-injection.h b/arch/x86/include/asm/error-injection.h
deleted file mode 100644
index 47b7a1296245..000000000000
--- a/arch/x86/include/asm/error-injection.h
+++ /dev/null
@@ -1,13 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_ERROR_INJECTION_H
-#define _ASM_ERROR_INJECTION_H
-
-#include <linux/compiler.h>
-#include <linux/linkage.h>
-#include <asm/ptrace.h>
-#include <asm-generic/error-injection.h>
-
-asmlinkage void just_return_func(void);
-void override_function_with_return(struct pt_regs *regs);
-
-#endif /* _ASM_ERROR_INJECTION_H */
diff --git a/include/asm-generic/error-injection.h b/include/asm-generic/error-injection.h
index 95a159a4137f..80ca61058dd2 100644
--- a/include/asm-generic/error-injection.h
+++ b/include/asm-generic/error-injection.h
@@ -16,6 +16,8 @@ struct error_injection_entry {
 	int		etype;
 };
 
+struct pt_regs;
+
 #ifdef CONFIG_FUNCTION_ERROR_INJECTION
 /*
  * Whitelist ganerating macro. Specify functions which can be
@@ -28,8 +30,12 @@ static struct error_injection_entry __used				\
 		.addr = (unsigned long)fname,				\
 		.etype = EI_ETYPE_##_etype,				\
 	};
+
+void override_function_with_return(struct pt_regs *regs);
 #else
 #define ALLOW_ERROR_INJECTION(fname, _etype)
+
+static inline void override_function_with_return(struct pt_regs *regs) { }
 #endif
 #endif
 
diff --git a/include/linux/error-injection.h b/include/linux/error-injection.h
index 280c61ecbf20..635a95caf29f 100644
--- a/include/linux/error-injection.h
+++ b/include/linux/error-injection.h
@@ -2,16 +2,16 @@
 #ifndef _LINUX_ERROR_INJECTION_H
 #define _LINUX_ERROR_INJECTION_H
 
-#ifdef CONFIG_FUNCTION_ERROR_INJECTION
+#include <linux/compiler.h>
+#include <asm-generic/error-injection.h>
 
-#include <asm/error-injection.h>
+#ifdef CONFIG_FUNCTION_ERROR_INJECTION
 
 extern bool within_error_injection_list(unsigned long addr);
 extern int get_injectable_error_type(unsigned long addr);
 
 #else /* !CONFIG_FUNCTION_ERROR_INJECTION */
 
-#include <asm-generic/error-injection.h>
 static inline bool within_error_injection_list(unsigned long addr)
 {
 	return false;
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 2/3] arm64: Add support for function error injection
From: Leo Yan @ 2019-08-06 10:00 UTC (permalink / raw)
  To: Russell King, Oleg Nesterov, Catalin Marinas, Will Deacon,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Arnd Bergmann, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Naveen N. Rao,
	linux-arm-kernel, linux-kernel, linuxppc-dev, linux-arch, netdev,
	bpf, clang-built-linux, Masami Hiramatsu
  Cc: Leo Yan
In-Reply-To: <20190806100015.11256-1-leo.yan@linaro.org>

Inspired by the commit 7cd01b08d35f ("powerpc: Add support for function
error injection"), this patch supports function error injection for
Arm64.

This patch mainly support two functions: one is regs_set_return_value()
which is used to overwrite the return value; the another function is
override_function_with_return() which is to override the probed
function returning and jump to its caller.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 arch/arm64/Kconfig              |  1 +
 arch/arm64/include/asm/ptrace.h |  5 +++++
 arch/arm64/lib/Makefile         |  2 ++
 arch/arm64/lib/error-inject.c   | 18 ++++++++++++++++++
 4 files changed, 26 insertions(+)
 create mode 100644 arch/arm64/lib/error-inject.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3adcec05b1f6..b15803afb2a0 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -148,6 +148,7 @@ config ARM64
 	select HAVE_FAST_GUP
 	select HAVE_FTRACE_MCOUNT_RECORD
 	select HAVE_FUNCTION_TRACER
+	select HAVE_FUNCTION_ERROR_INJECTION
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_GCC_PLUGINS
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index b1dd039023ef..891b9995cb4b 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -301,6 +301,11 @@ static inline unsigned long regs_return_value(struct pt_regs *regs)
 	return regs->regs[0];
 }
 
+static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
+{
+	regs->regs[0] = rc;
+}
+
 /**
  * regs_get_kernel_argument() - get Nth function argument in kernel
  * @regs:	pt_regs of that context
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 33c2a4abda04..f182ccb0438e 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -33,3 +33,5 @@ UBSAN_SANITIZE_atomic_ll_sc.o	:= n
 lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
 
 obj-$(CONFIG_CRC32) += crc32.o
+
+obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
diff --git a/arch/arm64/lib/error-inject.c b/arch/arm64/lib/error-inject.c
new file mode 100644
index 000000000000..ed15021da3ed
--- /dev/null
+++ b/arch/arm64/lib/error-inject.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/error-injection.h>
+#include <linux/kprobes.h>
+
+void override_function_with_return(struct pt_regs *regs)
+{
+	/*
+	 * 'regs' represents the state on entry of a predefined function in
+	 * the kernel/module and which is captured on a kprobe.
+	 *
+	 * When kprobe returns back from exception it will override the end
+	 * of probed function and directly return to the predefined
+	 * function's caller.
+	 */
+	instruction_pointer_set(regs, procedure_link_pointer(regs));
+}
+NOKPROBE_SYMBOL(override_function_with_return);
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 3/3] arm: Add support for function error injection
From: Leo Yan @ 2019-08-06 10:00 UTC (permalink / raw)
  To: Russell King, Oleg Nesterov, Catalin Marinas, Will Deacon,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86, Arnd Bergmann, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Naveen N. Rao,
	linux-arm-kernel, linux-kernel, linuxppc-dev, linux-arch, netdev,
	bpf, clang-built-linux, Masami Hiramatsu
  Cc: Leo Yan
In-Reply-To: <20190806100015.11256-1-leo.yan@linaro.org>

This patch implements arm specific functions regs_set_return_value() and
override_function_with_return() to support function error injection.

In the exception flow, it updates pt_regs::ARM_pc with pt_regs::ARM_lr
so can override the probed function return.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 arch/arm/Kconfig              |  1 +
 arch/arm/include/asm/ptrace.h |  5 +++++
 arch/arm/lib/Makefile         |  2 ++
 arch/arm/lib/error-inject.c   | 19 +++++++++++++++++++
 4 files changed, 27 insertions(+)
 create mode 100644 arch/arm/lib/error-inject.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 33b00579beff..2d3d44a037f6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -77,6 +77,7 @@ config ARM
 	select HAVE_EXIT_THREAD
 	select HAVE_FAST_GUP if ARM_LPAE
 	select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
+	select HAVE_FUNCTION_ERROR_INJECTION if !THUMB2_KERNEL
 	select HAVE_FUNCTION_GRAPH_TRACER if !THUMB2_KERNEL && !CC_IS_CLANG
 	select HAVE_FUNCTION_TRACER if !XIP_KERNEL
 	select HAVE_GCC_PLUGINS
diff --git a/arch/arm/include/asm/ptrace.h b/arch/arm/include/asm/ptrace.h
index 91d6b7856be4..3b41f37b361a 100644
--- a/arch/arm/include/asm/ptrace.h
+++ b/arch/arm/include/asm/ptrace.h
@@ -89,6 +89,11 @@ static inline long regs_return_value(struct pt_regs *regs)
 	return regs->ARM_r0;
 }
 
+static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
+{
+	regs->ARM_r0 = rc;
+}
+
 #define instruction_pointer(regs)	(regs)->ARM_pc
 
 #ifdef CONFIG_THUMB2_KERNEL
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index b25c54585048..8f56484a7156 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -42,3 +42,5 @@ ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
   CFLAGS_xor-neon.o		+= $(NEON_FLAGS)
   obj-$(CONFIG_XOR_BLOCKS)	+= xor-neon.o
 endif
+
+obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
diff --git a/arch/arm/lib/error-inject.c b/arch/arm/lib/error-inject.c
new file mode 100644
index 000000000000..2d696dc94893
--- /dev/null
+++ b/arch/arm/lib/error-inject.c
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/error-injection.h>
+#include <linux/kprobes.h>
+
+void override_function_with_return(struct pt_regs *regs)
+{
+	/*
+	 * 'regs' represents the state on entry of a predefined function in
+	 * the kernel/module and which is captured on a kprobe.
+	 *
+	 * 'regs->ARM_lr' contains the the link register for the probed
+	 * function, when kprobe returns back from exception it will override
+	 * the end of probed function and directly return to the predefined
+	 * function's caller.
+	 */
+	instruction_pointer_set(regs, regs->ARM_lr);
+}
+NOKPROBE_SYMBOL(override_function_with_return);
-- 
2.17.1


^ permalink raw reply related

* [PATCH] net: sched: sch_taprio: fix memleak in error path for sched list parse
From: Ivan Khoronzhuk @ 2019-08-06 10:04 UTC (permalink / raw)
  To: vinicius.gomes, davem
  Cc: jhs, xiyou.wangcong, jiri, netdev, linux-kernel, Ivan Khoronzhuk

In case off error, all entries should be freed from the sched list
before deleting it. For simplicity use rcu way.

Fixes: 5a781ccbd19e46 ("tc: Add support for configuring the taprio scheduler")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---

Based on net/master

 net/sched/sch_taprio.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index b55a82c1e1bc..4f6333035841 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1451,7 +1451,8 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
 	spin_unlock_bh(qdisc_lock(sch));
 
 free_sched:
-	kfree(new_admin);
+	if (new_admin)
+		call_rcu(&new_admin->rcu, taprio_free_sched_cb);
 
 	return err;
 }
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH net-next] ibmveth: Allow users to update reported speed and duplex
From: Michael Ellerman @ 2019-08-06 10:25 UTC (permalink / raw)
  To: Thomas Falcon, netdev; +Cc: Thomas Falcon, linuxppc-dev
In-Reply-To: <1565033086-21778-1-git-send-email-tlfalcon@linux.ibm.com>

Thomas Falcon <tlfalcon@linux.ibm.com> writes:
> Reported ethtool link settings for the ibmveth driver are currently
> hardcoded and no longer reflect the actual capabilities of supported
> hardware. There is no interface designed for retrieving this information
> from device firmware nor is there any way to update current settings
> to reflect observed or expected link speeds.
>
> To avoid confusion, initially define speed and duplex as unknown and

Doesn't that risk break existing setups?

> allow the user to alter these settings to match the expected
> capabilities of underlying hardware if needed. This update would allow
> the use of configurations that rely on certain link speed settings,
> such as LACP. This patch is based on the implementation in virtio_net.

Wouldn't it be safer to keep the current values as the default, and then
also allow them to be overridden by a motivated user.

cheers

^ permalink raw reply

* Re: [PATCH net 0/2] Fix batched event generation for vlan action
From: Jamal Hadi Salim @ 2019-08-06 10:44 UTC (permalink / raw)
  To: Roman Mashak, davem; +Cc: netdev, kernel, xiyou.wangcong, jiri
In-Reply-To: <1564773407-26209-1-git-send-email-mrv@mojatatu.com>

On 2019-08-02 3:16 p.m., Roman Mashak wrote:
> When adding or deleting a batch of entries, the kernel sends up to
> TCA_ACT_MAX_PRIO (defined to 32 in kernel) entries in an event to user
> space. However it does not consider that the action sizes may vary and
> require different skb sizes.
> 
> For example, consider the following script adding 32 entries with all
> supported vlan parameters (in order to maximize netlink messages size):
> 
> % cat tc-batch.sh
> TC="sudo /mnt/iproute2.git/tc/tc"
> 
> $TC actions flush action vlan
> for i in `seq 1 $1`;
> do
>     cmd="action vlan push protocol 802.1q id 4094 priority 7 pipe \
>                 index $i cookie aabbccddeeff112233445566778800a1 "
>     args=$args$cmd
> done
> $TC actions add $args
> %
> % ./tc-batch.sh 32
> Error: Failed to fill netlink attributes while adding TC action.
> We have an error talking to the kernel
> %
> 
> patch 1 adds callback in tc_action_ops of vlan action, which calculates
> the action size, and passes size to tcf_add_notify()/tcf_del_notify().
> 
> patch 2 updates the TDC test suite with relevant vlan test cases.

For the  series:
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* [PATCH] net/ipv4: reset mac head before call ip_tunnel_rcv()
From: Chen Minqiang @ 2019-08-06 10:47 UTC (permalink / raw)
  Cc: davem, Chen Minqiang, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	linux-kernel

Signed-off-by: Chen Minqiang <ptpt52@gmail.com>
---
 net/ipv4/ipip.c | 1 +
 net/ipv6/sit.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 43adfc1641ba..ba2b5fc8910f 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -242,6 +242,7 @@ static int ipip_tunnel_rcv(struct sk_buff *skb, u8 ipproto)
 			if (!tun_dst)
 				return 0;
 		}
+		skb_reset_mac_header(skb);
 		return ip_tunnel_rcv(tunnel, skb, tpi, tun_dst, log_ecn_error);
 	}
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 80610899a323..44a9674d06a6 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -739,6 +739,7 @@ static int sit_tunnel_rcv(struct sk_buff *skb, u8 ipproto)
 			tpi = &ipip_tpi;
 		if (iptunnel_pull_header(skb, 0, tpi->proto, false))
 			goto drop;
+		skb_reset_mac_header(skb);
 		return ip_tunnel_rcv(tunnel, skb, tpi, NULL, log_ecn_error);
 	}
 
-- 
2.17.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox