Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH resend net] net: ethtool: add missing NETIF_F_GSO_FRAGLIST feature string
From: Michal Kubecek @ 2020-06-17 21:18 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Jakub Kicinski, Andrew Lunn, Florian Fainelli,
	Richard Cochran, Antoine Tenart, Aya Levin, Steffen Klassert,
	Willem de Bruijn, netdev, linux-kernel
In-Reply-To: <9oPfKdiVuoDf251VBJXgNs-Hv-HWPnIJk52x-SQc1frfg8QSf9z3rCL-CBSafkp9SO0CjNzU8QvUv9Abe4SvoUpejeob9OImDPbflzRC-0Y=@pm.me>

[-- Attachment #1: Type: text/plain, Size: 1358 bytes --]

On Wed, Jun 17, 2020 at 08:42:47PM +0000, Alexander Lobakin wrote:
> Commit 3b33583265ed ("net: Add fraglist GRO/GSO feature flags") missed
> an entry for NETIF_F_GSO_FRAGLIST in netdev_features_strings array. As
> a result, fraglist GSO feature is not shown in 'ethtool -k' output and
> can't be toggled on/off.
> The fix is trivial.
> 
> Fixes: 3b33583265ed ("net: Add fraglist GRO/GSO feature flags")
> Signed-off-by: Alexander Lobakin <alobakin@pm.me>
> ---
>  net/ethtool/common.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> index 423e640e3876..47f63526818e 100644
> --- a/net/ethtool/common.c
> +++ b/net/ethtool/common.c
> @@ -43,6 +43,7 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
>  	[NETIF_F_GSO_SCTP_BIT] =	 "tx-sctp-segmentation",
>  	[NETIF_F_GSO_ESP_BIT] =		 "tx-esp-segmentation",
>  	[NETIF_F_GSO_UDP_L4_BIT] =	 "tx-udp-segmentation",
> +	[NETIF_F_GSO_FRAGLIST_BIT] =	 "tx-gso-list",
>  
>  	[NETIF_F_FCOE_CRC_BIT] =         "tx-checksum-fcoe-crc",
>  	[NETIF_F_SCTP_CRC_BIT] =        "tx-checksum-sctp",

Reviewed-by: Michal Kubecek <mkubecek@suse.cz>

AFAICS the name for NETIF_F_GSO_TUNNEL_REMCSUM_BIT is also missing but
IMHO it will be better to fix that by a separate patch with its own
Fixes tag.

Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: net/dsa/microchip: correct placement of dt property phy-mode?
From: Andrew Lunn @ 2020-06-17 21:18 UTC (permalink / raw)
  To: Helmut Grohne
  Cc: Woojung Huh, Microchip Linux Driver Support, Vivien Didelot,
	Florian Fainelli, netdev
In-Reply-To: <20200617082235.GA1523@laureti-dev>

> If nothing else, it makes the device tree unintuitive to use.
> 
> Is this placement of the phy-mode on the switch intentional?

That i cannot answer.
> 
> If yes: I think this should be prominently documented in
> Documentation/devicetree/bindings/net/dsa/ksz.txt.

Yes, it needs to be documented.

> If no: The microchip driver should follow the documented dsa convention
> and place the phy-mode on the relevant port nodes.
>
> If no: Do we have to support old device trees that have the phy-mode
> property on the switch?

We should not break existing DT blobs. So the driver should be
extended to first look in the port node. If it does not find it there,
look in the switch node. And maybe give a warning if it is found in
the switch node, saying the DT should be updated.

    Andrew

^ permalink raw reply

* Re: [PATCH ghak90 V8 07/16] audit: add contid support for signalling the audit daemon
From: Paul Moore @ 2020-06-17 21:33 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: Eric W. Biederman, nhorman, linux-api, containers, LKML, dhowells,
	linux-audit, netfilter-devel, simo, netdev, linux-fsdevel,
	Eric Paris, mpatel, Serge Hallyn
In-Reply-To: <20200608180330.z23hohfa2nclhxf5@madcap2.tricolour.ca>

On Mon, Jun 8, 2020 at 2:04 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2020-04-22 13:24, Paul Moore wrote:
> > On Fri, Apr 17, 2020 at 6:26 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > Paul Moore <paul@paul-moore.com> writes:
> > > > On Thu, Apr 16, 2020 at 4:36 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > >> Paul Moore <paul@paul-moore.com> writes:
> > > >> > On Mon, Mar 30, 2020 at 1:49 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > >> >> On 2020-03-30 13:34, Paul Moore wrote:
> > > >> >> > On Mon, Mar 30, 2020 at 12:22 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > >> >> > > On 2020-03-30 10:26, Paul Moore wrote:
> > > >> >> > > > On Mon, Mar 30, 2020 at 9:47 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > >> >> > > > > On 2020-03-28 23:11, Paul Moore wrote:
> > > >> >> > > > > > On Tue, Mar 24, 2020 at 5:02 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > >> >> > > > > > > On 2020-03-23 20:16, Paul Moore wrote:
> > > >> >> > > > > > > > On Thu, Mar 19, 2020 at 6:03 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > >> >> > > > > > > > > On 2020-03-18 18:06, Paul Moore wrote:
> > > >> >
> > > >> > ...
> > > >> >
> > > >> >> > > Well, every time a record gets generated, *any* record gets generated,
> > > >> >> > > we'll need to check for which audit daemons this record is in scope and
> > > >> >> > > generate a different one for each depending on the content and whether
> > > >> >> > > or not the content is influenced by the scope.
> > > >> >> >
> > > >> >> > That's the problem right there - we don't want to have to generate a
> > > >> >> > unique record for *each* auditd on *every* record.  That is a recipe
> > > >> >> > for disaster.
> > > >> >> >
> > > >> >> > Solving this for all of the known audit records is not something we
> > > >> >> > need to worry about in depth at the moment (although giving it some
> > > >> >> > casual thought is not a bad thing), but solving this for the audit
> > > >> >> > container ID information *is* something we need to worry about right
> > > >> >> > now.
> > > >> >>
> > > >> >> If you think that a different nested contid value string per daemon is
> > > >> >> not acceptable, then we are back to issuing a record that has only *one*
> > > >> >> contid listed without any nesting information.  This brings us back to
> > > >> >> the original problem of keeping *all* audit log history since the boot
> > > >> >> of the machine to be able to track the nesting of any particular contid.
> > > >> >
> > > >> > I'm not ruling anything out, except for the "let's just completely
> > > >> > regenerate every record for each auditd instance".
> > > >>
> > > >> Paul I am a bit confused about what you are referring to when you say
> > > >> regenerate every record.
> > > >>
> > > >> Are you saying that you don't want to repeat the sequence:
> > > >>         audit_log_start(...);
> > > >>         audit_log_format(...);
> > > >>         audit_log_end(...);
> > > >> for every nested audit daemon?
> > > >
> > > > If it can be avoided yes.  Audit performance is already not-awesome,
> > > > this would make it even worse.
> > >
> > > As far as I can see not repeating sequences like that is fundamental
> > > for making this work at all.  Just because only the audit subsystem
> > > should know about one or multiple audit daemons.  Nothing else should
> > > care.
> >
> > Yes, exactly, this has been mentioned in the past.  Both the
> > performance hit and the code complication in the caller are things we
> > must avoid.
> >
> > > >> Or are you saying that you would like to literraly want to send the same
> > > >> skb to each of the nested audit daemons?
> > > >
> > > > Ideally we would reuse the generated audit messages as much as
> > > > possible.  Less work is better.  That's really my main concern here,
> > > > let's make sure we aren't going to totally tank performance when we
> > > > have a bunch of nested audit daemons.
> > >
> > > So I think there are two parts of this answer.  Assuming we are talking
> > > about nesting audit daemons in containers we will have different
> > > rulesets and I expect most of the events for a nested audit daemon won't
> > > be of interest to the outer audit daemon.
> >
> > Yes, this is another thing that Richard and I have discussed in the
> > past.  We will basically need to create per-daemon queues, rules,
> > tracking state, etc.; that is easy enough.  What will be slightly more
> > tricky is the part where we apply the filters to the individual
> > records and decide if that record is valid/desired for a given daemon.
> > I think it can be done without too much pain, and any changes to the
> > callers, but it will require a bit of work to make sure it is done
> > well and that records are needlessly duplicated in the kernel.
> >
> > > Beyond that it should be very straight forward to keep a pointer and
> > > leave the buffer as a scatter gather list until audit_log_end
> > > and translate pids, and rewrite ACIDs attributes in audit_log_end
> > > when we build the final packet.  Either through collaboration with
> > > audit_log_format or a special audit_log command that carefully sets
> > > up the handful of things that need that information.
> >
> > In order to maximize record re-use I think we will want to hold off on
> > assembling the final packet until it is sent to the daemons in the
> > kauditd thread.  We'll also likely need to create special
> > audit_log_XXX functions to capture fields which we know will need
> > translation, e.g. ACID information.  (the reason for the new
> > audit_log_XXX functions would be to mark the new sg element and ensure
> > the buffer is handled correctly)
> >
> > Regardless of the details, I think the scatter gather approach is the
> > key here - that seems like the best design idea I've seen thus far.
> > It enables us to replace portions of the record as needed ... and
> > possibly use the existing skb cow stuff ... it has been a while, but
> > does the skb cow functions handle scatter gather skbs or do they need
> > to be linear?
>
> How does the selection of this data management technique affect our
> choice of field format?

I'm not sure it affects the record string, but it might affect the
in-kernel API as we would likely want to have a special function for
logging the audit container ID that does the scatter-gather management
for the record.  There might also need to be some changes to how we
allocate the records.

However, since you're the one working on these patches I would expect
you to be the one to look into how this would work and what the
impacts might be to the code, record format, etc.

> Does this lock the field value to a fixed length?

I wouldn't think so.  In fact if it did it wouldn't really be a good solution.

Once again, this is something I would expect you to look into.

> Does the use of scatter/gather techniques or structures allow
> the use of different lengths of data for each destination (auditd)?

This is related to the above ... but yes, the reason why Eric and I
were discussing a scatter/gather approach is that it would presumably
allow one to break the single record string into pieces which could be
managed and manipulated much easier than the monolithic record string.

> I could see different target audit daemons triggering or switching to a
> different chunk of data and length.  This does raise a concern related
> to the previous sig_info2 discussion that the struct contobj that exists
> at the time of audit_log_exit called could have been reaped by the time
> the buffer is pulled from the queue for transmission to auditd, but we
> could hold a reference to it as is done for sig_info2.

Yes.

> Looking through the kernel scatter/gather possibilities, I see struct
> iovec which is used by the readv/writev/preadv/pwritev syscalls, but I'm
> understanding that this is a kernel implementation that will be not
> visible to user space.  So would the struct scatterlist be the right
> choice?

It has been so long since I've looked at the scatter-gather code that
I can't really say with any confidence at this point.  All I can say
is that the scatter-gather code really should just be an
implementation detail in the kernel and should not be visible to
userspace; userspace should get the same awful, improperly generated
netlink message it always has received from the kernel ;)

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* [PATCH v4 2/3] dt-bindings: net: ethernet-phy: add enet-phy-clock-out-frequency
From: Heiko Stuebner @ 2020-06-17 21:33 UTC (permalink / raw)
  To: davem, kuba
  Cc: robh+dt, andrew, f.fainelli, hkallweit1, linux, netdev,
	devicetree, linux-kernel, heiko, christoph.muellner,
	Heiko Stuebner
In-Reply-To: <20200617213326.1532365-1-heiko@sntech.de>

From: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>

Some ethernet phys have a configurable clock output, so add a generic
property to describe its target rate.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
---
 Documentation/devicetree/bindings/net/ethernet-phy.yaml | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/ethernet-phy.yaml b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
index 9b1f1147ca36..4dcf93f1c555 100644
--- a/Documentation/devicetree/bindings/net/ethernet-phy.yaml
+++ b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
@@ -84,6 +84,11 @@ properties:
       the turn around line low at end of the control phase of the
       MDIO transaction.
 
+  enet-phy-clock-out-frequency:
+    $ref: /schemas/types.yaml#definitions/uint32
+    description:
+      Frequency in Hz to set an available clock output to.
+
   enet-phy-lane-swap:
     $ref: /schemas/types.yaml#definitions/flag
     description:
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 0/3] add clkout support to mscc phys
From: Heiko Stuebner @ 2020-06-17 21:33 UTC (permalink / raw)
  To: davem, kuba
  Cc: robh+dt, andrew, f.fainelli, hkallweit1, linux, netdev,
	devicetree, linux-kernel, heiko, christoph.muellner

The main part of this series is adding handling of the clkout
controls some of the mscc phys have and while at it Andrew
asked for some of the duplicated probe functionality to be
factored out into a common function.

changes in v4:
- fix missing variable initialization in one probe function
changes in v3:
- adapt to 5.8 merge-window results
- introduce a more generic enet-phy-property instead of
  using a vsc8531,* one - suggested by Andrew
changes in v2:
- new probe factoring patch as suggested by Andrew


Heiko Stuebner (3):
  net: phy: mscc: move shared probe code into a helper
  dt-bindings: net: ethernet-phy: add enet-phy-clock-out-frequency
  net: phy: mscc: handle the clkout control on some phy variants

 .../devicetree/bindings/net/ethernet-phy.yaml |   5 +
 drivers/net/phy/mscc/mscc.h                   |   9 +
 drivers/net/phy/mscc/mscc_main.c              | 211 ++++++++++++------
 3 files changed, 155 insertions(+), 70 deletions(-)

-- 
2.26.2


^ permalink raw reply

* [PATCH v4 3/3] net: phy: mscc: handle the clkout control on some phy variants
From: Heiko Stuebner @ 2020-06-17 21:33 UTC (permalink / raw)
  To: davem, kuba
  Cc: robh+dt, andrew, f.fainelli, hkallweit1, linux, netdev,
	devicetree, linux-kernel, heiko, christoph.muellner,
	Heiko Stuebner
In-Reply-To: <20200617213326.1532365-1-heiko@sntech.de>

From: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>

At least VSC8530/8531/8540/8541 contain a clock output that can emit
a predefined rate of 25, 50 or 125MHz.

This may then feed back into the network interface as source clock.
So follow the example the at803x already set and introduce a
vsc8531,clk-out-frequency property to set that output.

Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
---
 drivers/net/phy/mscc/mscc.h      |  9 ++++
 drivers/net/phy/mscc/mscc_main.c | 87 +++++++++++++++++++++++++++++---
 2 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/drivers/net/phy/mscc/mscc.h b/drivers/net/phy/mscc/mscc.h
index fbcee5fce7b2..a3afc35c3eab 100644
--- a/drivers/net/phy/mscc/mscc.h
+++ b/drivers/net/phy/mscc/mscc.h
@@ -218,6 +218,13 @@ enum rgmii_clock_delay {
 #define INT_MEM_DATA_M			  0x00ff
 #define INT_MEM_DATA(x)			  (INT_MEM_DATA_M & (x))
 
+#define MSCC_CLKOUT_CNTL		  13
+#define CLKOUT_ENABLE			  BIT(15)
+#define CLKOUT_FREQ_MASK		  GENMASK(14, 13)
+#define CLKOUT_FREQ_25M			  (0x0 << 13)
+#define CLKOUT_FREQ_50M			  (0x1 << 13)
+#define CLKOUT_FREQ_125M		  (0x2 << 13)
+
 #define MSCC_PHY_PROC_CMD		  18
 #define PROC_CMD_NCOMPLETED		  0x8000
 #define PROC_CMD_FAILED			  0x4000
@@ -360,6 +367,8 @@ struct vsc8531_private {
 	 */
 	unsigned int base_addr;
 
+	u32 clkout_rate;
+
 #if IS_ENABLED(CONFIG_MACSEC)
 	/* MACsec fields:
 	 * - One SecY per device (enforced at the s/w implementation level)
diff --git a/drivers/net/phy/mscc/mscc_main.c b/drivers/net/phy/mscc/mscc_main.c
index 5d2777522fb4..2f06cfa2f539 100644
--- a/drivers/net/phy/mscc/mscc_main.c
+++ b/drivers/net/phy/mscc/mscc_main.c
@@ -432,6 +432,18 @@ static int vsc85xx_dt_led_mode_get(struct phy_device *phydev,
 	return led_mode;
 }
 
+static void vsc8531_dt_clkout_rate_get(struct phy_device *phydev)
+{
+	struct vsc8531_private *priv = phydev->priv;
+	struct device *dev = &phydev->mdio.dev;
+	struct device_node *of_node = dev->of_node;
+
+	if (!of_node)
+		return;
+
+	of_property_read_u32(of_node, "enet-phy-clock-out-frequency",
+			     &priv->clkout_rate);
+}
 #else
 static int vsc85xx_edge_rate_magic_get(struct phy_device *phydev)
 {
@@ -444,6 +456,10 @@ static int vsc85xx_dt_led_mode_get(struct phy_device *phydev,
 {
 	return default_mode;
 }
+
+static void vsc8531_dt_clkout_rate_get(struct phy_device *phydev)
+{
+}
 #endif /* CONFIG_OF_MDIO */
 
 static int vsc85xx_dt_led_modes_get(struct phy_device *phydev,
@@ -1508,6 +1524,37 @@ static int vsc85xx_config_init(struct phy_device *phydev)
 	return 0;
 }
 
+static int vsc8531_config_init(struct phy_device *phydev)
+{
+	struct vsc8531_private *vsc8531 = phydev->priv;
+	u16 val;
+	int rc;
+
+	rc = vsc85xx_config_init(phydev);
+	if (rc)
+		return rc;
+
+	switch (vsc8531->clkout_rate) {
+	case 0:
+		val = 0;
+		break;
+	case 25000000:
+		val = CLKOUT_FREQ_25M | CLKOUT_ENABLE;
+		break;
+	case 50000000:
+		val = CLKOUT_FREQ_50M | CLKOUT_ENABLE;
+		break;
+	case 125000000:
+		val = CLKOUT_FREQ_125M | CLKOUT_ENABLE;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return phy_write_paged(phydev, MSCC_PHY_PAGE_EXTENDED_GPIO,
+			       MSCC_CLKOUT_CNTL, val);
+}
+
 static int vsc8584_did_interrupt(struct phy_device *phydev)
 {
 	int rc = 0;
@@ -1981,6 +2028,32 @@ static int vsc8514_probe(struct phy_device *phydev)
 				     vsc8531->base_addr, 0);
 }
 
+static int vsc8531_probe(struct phy_device *phydev)
+{
+	struct vsc8531_private *vsc8531;
+	int rate_magic, rc;
+	u32 default_mode[2] = {VSC8531_LINK_1000_ACTIVITY,
+	   VSC8531_LINK_100_ACTIVITY};
+
+	rate_magic = vsc85xx_edge_rate_magic_get(phydev);
+	if (rate_magic < 0)
+		return rate_magic;
+
+	rc = vsc85xx_probe_helper(phydev, default_mode,
+				  ARRAY_SIZE(default_mode),
+				  VSC85XX_SUPP_LED_MODES,
+				  vsc85xx_hw_stats,
+				  ARRAY_SIZE(vsc85xx_hw_stats));
+	if (rc < 0)
+		return rc;
+
+	vsc8531 = phydev->priv;
+	vsc8531->rate_magic = rate_magic;
+	vsc8531_dt_clkout_rate_get(phydev);
+
+	return 0;
+}
+
 static int vsc8574_probe(struct phy_device *phydev)
 {
 	struct vsc8531_private *vsc8531;
@@ -2136,14 +2209,14 @@ static struct phy_driver vsc85xx_driver[] = {
 	.phy_id_mask	= 0xfffffff0,
 	/* PHY_BASIC_FEATURES */
 	.soft_reset	= &genphy_soft_reset,
-	.config_init	= &vsc85xx_config_init,
+	.config_init	= &vsc8531_config_init,
 	.config_aneg    = &vsc85xx_config_aneg,
 	.read_status	= &vsc85xx_read_status,
 	.ack_interrupt	= &vsc85xx_ack_interrupt,
 	.config_intr	= &vsc85xx_config_intr,
 	.suspend	= &genphy_suspend,
 	.resume		= &genphy_resume,
-	.probe		= &vsc85xx_probe,
+	.probe		= &vsc8531_probe,
 	.set_wol	= &vsc85xx_wol_set,
 	.get_wol	= &vsc85xx_wol_get,
 	.get_tunable	= &vsc85xx_get_tunable,
@@ -2160,14 +2233,14 @@ static struct phy_driver vsc85xx_driver[] = {
 	.phy_id_mask    = 0xfffffff0,
 	/* PHY_GBIT_FEATURES */
 	.soft_reset	= &genphy_soft_reset,
-	.config_init    = &vsc85xx_config_init,
+	.config_init    = &vsc8531_config_init,
 	.config_aneg    = &vsc85xx_config_aneg,
 	.read_status	= &vsc85xx_read_status,
 	.ack_interrupt  = &vsc85xx_ack_interrupt,
 	.config_intr    = &vsc85xx_config_intr,
 	.suspend	= &genphy_suspend,
 	.resume		= &genphy_resume,
-	.probe		= &vsc85xx_probe,
+	.probe		= &vsc8531_probe,
 	.set_wol	= &vsc85xx_wol_set,
 	.get_wol	= &vsc85xx_wol_get,
 	.get_tunable	= &vsc85xx_get_tunable,
@@ -2184,14 +2257,14 @@ static struct phy_driver vsc85xx_driver[] = {
 	.phy_id_mask	= 0xfffffff0,
 	/* PHY_BASIC_FEATURES */
 	.soft_reset	= &genphy_soft_reset,
-	.config_init	= &vsc85xx_config_init,
+	.config_init	= &vsc8531_config_init,
 	.config_aneg	= &vsc85xx_config_aneg,
 	.read_status	= &vsc85xx_read_status,
 	.ack_interrupt	= &vsc85xx_ack_interrupt,
 	.config_intr	= &vsc85xx_config_intr,
 	.suspend	= &genphy_suspend,
 	.resume		= &genphy_resume,
-	.probe		= &vsc85xx_probe,
+	.probe		= &vsc8531_probe,
 	.set_wol	= &vsc85xx_wol_set,
 	.get_wol	= &vsc85xx_wol_get,
 	.get_tunable	= &vsc85xx_get_tunable,
@@ -2208,7 +2281,7 @@ static struct phy_driver vsc85xx_driver[] = {
 	.phy_id_mask    = 0xfffffff0,
 	/* PHY_GBIT_FEATURES */
 	.soft_reset	= &genphy_soft_reset,
-	.config_init    = &vsc85xx_config_init,
+	.config_init    = &vsc8531_config_init,
 	.config_aneg    = &vsc85xx_config_aneg,
 	.read_status	= &vsc85xx_read_status,
 	.ack_interrupt  = &vsc85xx_ack_interrupt,
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 1/3] net: phy: mscc: move shared probe code into a helper
From: Heiko Stuebner @ 2020-06-17 21:33 UTC (permalink / raw)
  To: davem, kuba
  Cc: robh+dt, andrew, f.fainelli, hkallweit1, linux, netdev,
	devicetree, linux-kernel, heiko, christoph.muellner,
	Heiko Stuebner
In-Reply-To: <20200617213326.1532365-1-heiko@sntech.de>

From: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>

The different probe functions share a lot of code, so move the
common parts into a helper to reduce duplication.

This moves the devm_phy_package_join below the general allocation
but as all components just allocate things, this should be ok.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
---
 drivers/net/phy/mscc/mscc_main.c | 124 +++++++++++++++----------------
 1 file changed, 61 insertions(+), 63 deletions(-)

diff --git a/drivers/net/phy/mscc/mscc_main.c b/drivers/net/phy/mscc/mscc_main.c
index 5ddc44f87eaf..5d2777522fb4 100644
--- a/drivers/net/phy/mscc/mscc_main.c
+++ b/drivers/net/phy/mscc/mscc_main.c
@@ -1935,12 +1935,11 @@ static int vsc85xx_read_status(struct phy_device *phydev)
 	return genphy_read_status(phydev);
 }
 
-static int vsc8514_probe(struct phy_device *phydev)
+static int vsc85xx_probe_helper(struct phy_device *phydev,
+				u32 *leds, int num_leds, u16 led_modes,
+				const struct vsc85xx_hw_stat *stats, int nstats)
 {
 	struct vsc8531_private *vsc8531;
-	u32 default_mode[4] = {VSC8531_LINK_1000_ACTIVITY,
-	   VSC8531_LINK_100_ACTIVITY, VSC8531_LINK_ACTIVITY,
-	   VSC8531_DUPLEX_COLLISION};
 
 	vsc8531 = devm_kzalloc(&phydev->mdio.dev, sizeof(*vsc8531), GFP_KERNEL);
 	if (!vsc8531)
@@ -1948,54 +1947,66 @@ static int vsc8514_probe(struct phy_device *phydev)
 
 	phydev->priv = vsc8531;
 
-	vsc8584_get_base_addr(phydev);
-	devm_phy_package_join(&phydev->mdio.dev, phydev,
-			      vsc8531->base_addr, 0);
-
-	vsc8531->nleds = 4;
-	vsc8531->supp_led_modes = VSC85XX_SUPP_LED_MODES;
-	vsc8531->hw_stats = vsc85xx_hw_stats;
-	vsc8531->nstats = ARRAY_SIZE(vsc85xx_hw_stats);
+	vsc8531->nleds = num_leds;
+	vsc8531->supp_led_modes = led_modes;
+	vsc8531->hw_stats = stats;
+	vsc8531->nstats = nstats;
 	vsc8531->stats = devm_kcalloc(&phydev->mdio.dev, vsc8531->nstats,
 				      sizeof(u64), GFP_KERNEL);
 	if (!vsc8531->stats)
 		return -ENOMEM;
 
-	return vsc85xx_dt_led_modes_get(phydev, default_mode);
+	return vsc85xx_dt_led_modes_get(phydev, leds);
 }
 
-static int vsc8574_probe(struct phy_device *phydev)
+static int vsc8514_probe(struct phy_device *phydev)
 {
 	struct vsc8531_private *vsc8531;
+	int rc;
 	u32 default_mode[4] = {VSC8531_LINK_1000_ACTIVITY,
 	   VSC8531_LINK_100_ACTIVITY, VSC8531_LINK_ACTIVITY,
 	   VSC8531_DUPLEX_COLLISION};
 
-	vsc8531 = devm_kzalloc(&phydev->mdio.dev, sizeof(*vsc8531), GFP_KERNEL);
-	if (!vsc8531)
-		return -ENOMEM;
-
-	phydev->priv = vsc8531;
+	rc = vsc85xx_probe_helper(phydev, default_mode,
+				  ARRAY_SIZE(default_mode),
+				  VSC85XX_SUPP_LED_MODES,
+				  vsc85xx_hw_stats,
+				  ARRAY_SIZE(vsc85xx_hw_stats));
+	if (rc < 0)
+		return rc;
 
+	vsc8531 = phydev->priv;
 	vsc8584_get_base_addr(phydev);
-	devm_phy_package_join(&phydev->mdio.dev, phydev,
-			      vsc8531->base_addr, 0);
+	return devm_phy_package_join(&phydev->mdio.dev, phydev,
+				     vsc8531->base_addr, 0);
+}
 
-	vsc8531->nleds = 4;
-	vsc8531->supp_led_modes = VSC8584_SUPP_LED_MODES;
-	vsc8531->hw_stats = vsc8584_hw_stats;
-	vsc8531->nstats = ARRAY_SIZE(vsc8584_hw_stats);
-	vsc8531->stats = devm_kcalloc(&phydev->mdio.dev, vsc8531->nstats,
-				      sizeof(u64), GFP_KERNEL);
-	if (!vsc8531->stats)
-		return -ENOMEM;
+static int vsc8574_probe(struct phy_device *phydev)
+{
+	struct vsc8531_private *vsc8531;
+	int rc;
+	u32 default_mode[4] = {VSC8531_LINK_1000_ACTIVITY,
+	   VSC8531_LINK_100_ACTIVITY, VSC8531_LINK_ACTIVITY,
+	   VSC8531_DUPLEX_COLLISION};
+
+	rc = vsc85xx_probe_helper(phydev, default_mode,
+				  ARRAY_SIZE(default_mode),
+				  VSC8584_SUPP_LED_MODES,
+				  vsc8584_hw_stats,
+				  ARRAY_SIZE(vsc8584_hw_stats));
+	if (rc < 0)
+		return rc;
 
-	return vsc85xx_dt_led_modes_get(phydev, default_mode);
+	vsc8531 = phydev->priv;
+	vsc8584_get_base_addr(phydev);
+	return devm_phy_package_join(&phydev->mdio.dev, phydev,
+				     vsc8531->base_addr, 0);
 }
 
 static int vsc8584_probe(struct phy_device *phydev)
 {
 	struct vsc8531_private *vsc8531;
+	int rc;
 	u32 default_mode[4] = {VSC8531_LINK_1000_ACTIVITY,
 	   VSC8531_LINK_100_ACTIVITY, VSC8531_LINK_ACTIVITY,
 	   VSC8531_DUPLEX_COLLISION};
@@ -2005,32 +2016,24 @@ static int vsc8584_probe(struct phy_device *phydev)
 		return -ENOTSUPP;
 	}
 
-	vsc8531 = devm_kzalloc(&phydev->mdio.dev, sizeof(*vsc8531), GFP_KERNEL);
-	if (!vsc8531)
-		return -ENOMEM;
-
-	phydev->priv = vsc8531;
+	rc = vsc85xx_probe_helper(phydev, default_mode,
+				  ARRAY_SIZE(default_mode),
+				  VSC8584_SUPP_LED_MODES,
+				  vsc8584_hw_stats,
+				  ARRAY_SIZE(vsc8584_hw_stats));
+	if (rc < 0)
+		return rc;
 
+	vsc8531 = phydev->priv;
 	vsc8584_get_base_addr(phydev);
-	devm_phy_package_join(&phydev->mdio.dev, phydev,
-			      vsc8531->base_addr, 0);
-
-	vsc8531->nleds = 4;
-	vsc8531->supp_led_modes = VSC8584_SUPP_LED_MODES;
-	vsc8531->hw_stats = vsc8584_hw_stats;
-	vsc8531->nstats = ARRAY_SIZE(vsc8584_hw_stats);
-	vsc8531->stats = devm_kcalloc(&phydev->mdio.dev, vsc8531->nstats,
-				      sizeof(u64), GFP_KERNEL);
-	if (!vsc8531->stats)
-		return -ENOMEM;
-
-	return vsc85xx_dt_led_modes_get(phydev, default_mode);
+	return devm_phy_package_join(&phydev->mdio.dev, phydev,
+				     vsc8531->base_addr, 0);
 }
 
 static int vsc85xx_probe(struct phy_device *phydev)
 {
 	struct vsc8531_private *vsc8531;
-	int rate_magic;
+	int rate_magic, rc;
 	u32 default_mode[2] = {VSC8531_LINK_1000_ACTIVITY,
 	   VSC8531_LINK_100_ACTIVITY};
 
@@ -2038,23 +2041,18 @@ static int vsc85xx_probe(struct phy_device *phydev)
 	if (rate_magic < 0)
 		return rate_magic;
 
-	vsc8531 = devm_kzalloc(&phydev->mdio.dev, sizeof(*vsc8531), GFP_KERNEL);
-	if (!vsc8531)
-		return -ENOMEM;
-
-	phydev->priv = vsc8531;
+	rc = vsc85xx_probe_helper(phydev, default_mode,
+				  ARRAY_SIZE(default_mode),
+				  VSC85XX_SUPP_LED_MODES,
+				  vsc85xx_hw_stats,
+				  ARRAY_SIZE(vsc85xx_hw_stats));
+	if (rc < 0)
+		return rc;
 
+	vsc8531 = phydev->priv;
 	vsc8531->rate_magic = rate_magic;
-	vsc8531->nleds = 2;
-	vsc8531->supp_led_modes = VSC85XX_SUPP_LED_MODES;
-	vsc8531->hw_stats = vsc85xx_hw_stats;
-	vsc8531->nstats = ARRAY_SIZE(vsc85xx_hw_stats);
-	vsc8531->stats = devm_kcalloc(&phydev->mdio.dev, vsc8531->nstats,
-				      sizeof(u64), GFP_KERNEL);
-	if (!vsc8531->stats)
-		return -ENOMEM;
 
-	return vsc85xx_dt_led_modes_get(phydev, default_mode);
+	return 0;
 }
 
 /* Microsemi VSC85xx PHYs */
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH net v1] net: phy: smsc: fix printing too many logs
From: Andrew Lunn @ 2020-06-17 21:36 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Dejin Zheng, f.fainelli, hkallweit1, davem, kuba, netdev,
	linux-kernel, Kevin Groeneveld
In-Reply-To: <20200617202450.GX1551@shell.armlinux.org.uk>

On Wed, Jun 17, 2020 at 09:24:50PM +0100, Russell King - ARM Linux admin wrote:
> On Wed, Jun 17, 2020 at 08:43:34PM +0200, Andrew Lunn wrote:
> > You have explained what the change does. But not why it is
> > needed. What exactly is happening. To me, the key thing is
> > understanding why we get -110, and why it is not an actual error we
> > should be reporting as an error. That is what needs explaining.
> 
> The patch author really ought to be explaining this... but let me
> have a go.  It's worth pointing out that the comments in the file
> aren't good English either, so don't really describe what is going
> on.
> 
> When this PHY is in EDPD mode, it doesn't always detect a connected
> cable.  The workaround for it involves, when the link is down, and
> at each read_status() call:
> 
> - disable EDPD mode, forcing the PHY out of low-power mode
> - waiting 640ms to see if we have any energy detected from the media
> - re-enable entry to EDPD mode
> 
> This is presumably enough to allow the PHY to notice that a cable is
> connected, and resume normal operations to negotiate with the partner.
> 
> The problem is that when no media is detected, the 640ms wait times
> out (as it should, we don't want to wait forever) and the kernel
> prints a warning.

Hi Russell

Yes, that is what i was thinking. 

There probably should be a comment added just to prevent somebody
swapping it back to phy_read_poll_timeout(). It is not clear that
-ETIMEOUT is expected under some conditions.

	  Andrew

^ permalink raw reply

* Re: [PATCH v4 0/3] mm, treewide: Rename kzfree() to kfree_sensitive()
From: Denis Efremov @ 2020-06-17 21:31 UTC (permalink / raw)
  To: Joe Perches, Waiman Long, Andrew Morton, David Howells,
	Jarkko Sakkinen, James Morris, Serge E. Hallyn, Linus Torvalds,
	Matthew Wilcox, David Rientjes
  Cc: Michal Hocko, Johannes Weiner, Dan Carpenter, David Sterba,
	Jason A . Donenfeld, linux-mm, keyrings, linux-kernel,
	linux-crypto, linux-pm, linux-stm32, linux-amlogic,
	linux-mediatek, linuxppc-dev, virtualization, netdev, linux-ppp,
	wireguard, linux-wireless, devel, linux-scsi, target-devel,
	linux-btrfs, linux-cifs, linux-fscrypt, ecryptfs, kasan-dev,
	linux-bluetooth, linux-wpan, linux-sctp, linux-nfs,
	tipc-discussion, linux-security-module, linux-integrity
In-Reply-To: <fe3b9a437be4aeab3bac68f04193cb6daaa5bee4.camel@perches.com>



On 6/16/20 9:53 PM, Joe Perches wrote:
> On Mon, 2020-06-15 at 21:57 -0400, Waiman Long wrote:
>>  v4:
>>   - Break out the memzero_explicit() change as suggested by Dan Carpenter
>>     so that it can be backported to stable.
>>   - Drop the "crypto: Remove unnecessary memzero_explicit()" patch for
>>     now as there can be a bit more discussion on what is best. It will be
>>     introduced as a separate patch later on after this one is merged.
> 
> To this larger audience and last week without reply:
> https://lore.kernel.org/lkml/573b3fbd5927c643920e1364230c296b23e7584d.camel@perches.com/
> 
> Are there _any_ fastpath uses of kfree or vfree?
> 
> Many patches have been posted recently to fix mispairings
> of specific types of alloc and free functions.

I've prepared a coccinelle script to highlight these mispairings in a function
a couple of days ago: https://lkml.org/lkml/2020/6/5/953
I've listed all the fixes in the commit message. 

Not so many mispairings actually, and most of them are harmless like:
kmalloc(E) -> kvfree(E)

However, coccinelle script can't detect cross-functions mispairings, i.e.
allocation in one function, free in another funtion.

Thanks,
Denis

^ permalink raw reply

* Re: [PATCH resend net] net: ethtool: add missing NETIF_F_GSO_FRAGLIST feature string
From: Alexander Lobakin @ 2020-06-17 21:38 UTC (permalink / raw)
  To: Michal Kubecek
  Cc: David S. Miller, Jakub Kicinski, Andrew Lunn, Florian Fainelli,
	Richard Cochran, Antoine Tenart, Aya Levin, Steffen Klassert,
	Willem de Bruijn, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20200617211844.kupsyijuurjpb5kd@lion.mk-sys.cz>

Hi Michal,

On Thursday, 18 June 2020, 0:18, Michal Kubecek <mkubecek@suse.cz> wrote:

> On Wed, Jun 17, 2020 at 08:42:47PM +0000, Alexander Lobakin wrote:
>
> > Commit 3b33583265ed ("net: Add fraglist GRO/GSO feature flags") missed
> > an entry for NETIF_F_GSO_FRAGLIST in netdev_features_strings array. As
> > a result, fraglist GSO feature is not shown in 'ethtool -k' output and
> > can't be toggled on/off.
> > The fix is trivial.
> >
> > Fixes: 3b33583265ed ("net: Add fraglist GRO/GSO feature flags")
> > Signed-off-by: Alexander Lobakin alobakin@pm.me
> >
> > ----------------------------------------------------------------------------------------------------------------
> >
> > net/ethtool/common.c | 1 +
> > 1 file changed, 1 insertion(+)
> > diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> > index 423e640e3876..47f63526818e 100644
> > --- a/net/ethtool/common.c
> > +++ b/net/ethtool/common.c
> > @@ -43,6 +43,7 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
> > [NETIF_F_GSO_SCTP_BIT] = "tx-sctp-segmentation",
> > [NETIF_F_GSO_ESP_BIT] = "tx-esp-segmentation",
> > [NETIF_F_GSO_UDP_L4_BIT] = "tx-udp-segmentation",
> >
> > -   [NETIF_F_GSO_FRAGLIST_BIT] = "tx-gso-list",
> >     [NETIF_F_FCOE_CRC_BIT] = "tx-checksum-fcoe-crc",
> >     [NETIF_F_SCTP_CRC_BIT] = "tx-checksum-sctp",
> >
>
> Reviewed-by: Michal Kubecekmkubecek@suse.cz

Thanks!

> AFAICS the name for NETIF_F_GSO_TUNNEL_REMCSUM_BIT is also missing but
> IMHO it will be better to fix that by a separate patch with its own
> Fixes tag.

Oh, nice catch! I'll make a separate for this one.
I also wanted to add any sort of static_assert() / BUILD_BUG_ON() to
prevent such misses, but don't see any easy pattern to check for now,
as netdev_features_strings[] is always NETDEV_FEATURE_COUNT-sized.

> Michal

^ permalink raw reply

* Re: [PATCH v2 2/2] net: phy: mscc: handle the clkout control on some phy variants
From: Rob Herring @ 2020-06-17 21:45 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: davem, kuba, andrew, f.fainelli, hkallweit1, linux, netdev,
	devicetree, linux-kernel, christoph.muellner, Heiko Stuebner
In-Reply-To: <20200609133140.1421109-2-heiko@sntech.de>

On Tue, Jun 09, 2020 at 03:31:40PM +0200, Heiko Stuebner wrote:
> From: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
> 
> At least VSC8530/8531/8540/8541 contain a clock output that can emit
> a predefined rate of 25, 50 or 125MHz.
> 
> This may then feed back into the network interface as source clock.
> So follow the example the at803x already set and introduce a
> vsc8531,clk-out-frequency property to set that output.
> 
> Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
> ---
> Hi Andrew,
> 
> I didn't change the property yet, do you have a suggestion on
> how to name it though? Going by the other examples in the
> ethernet-phy.yamls, something like enet-phy-clock-out-frequency ?

The correct thing to do here is make the phy a clock provider and then 
the client side use 'assigned-clock-rate' to set the rate. That has the 
advantage that it also describes the connection of the clock signal. You 
might not need that for a simple case, but I could imagine needing that 
in a more complex case.

Rob

^ permalink raw reply

* Re: [PATCH resend net] net: ethtool: add missing NETIF_F_GSO_FRAGLIST feature string
From: Michal Kubecek @ 2020-06-17 21:54 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Jakub Kicinski, Andrew Lunn, Florian Fainelli,
	Richard Cochran, Antoine Tenart, Aya Levin, Steffen Klassert,
	Willem de Bruijn, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <5u_BsHq6Jk0q09sGWsVKfvv0NRa5kBHjxdOr9VpNknRnxHpbRO_80JSHl_AWS-C_wBS7B15KetRleLaluF8tOysTNYyLVRUG4BRFQZxnhXw=@pm.me>

On Wed, Jun 17, 2020 at 09:38:47PM +0000, Alexander Lobakin wrote:
> Hi Michal,
> 
> On Thursday, 18 June 2020, 0:18, Michal Kubecek <mkubecek@suse.cz> wrote:
> 
> > On Wed, Jun 17, 2020 at 08:42:47PM +0000, Alexander Lobakin wrote:
> >
> > > Commit 3b33583265ed ("net: Add fraglist GRO/GSO feature flags") missed
> > > an entry for NETIF_F_GSO_FRAGLIST in netdev_features_strings array. As
> > > a result, fraglist GSO feature is not shown in 'ethtool -k' output and
> > > can't be toggled on/off.
> > > The fix is trivial.
> > >
> > > Fixes: 3b33583265ed ("net: Add fraglist GRO/GSO feature flags")
> > > Signed-off-by: Alexander Lobakin alobakin@pm.me
> > >
> > > ----------------------------------------------------------------------------------------------------------------
> > >
> > > net/ethtool/common.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > > diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> > > index 423e640e3876..47f63526818e 100644
> > > --- a/net/ethtool/common.c
> > > +++ b/net/ethtool/common.c
> > > @@ -43,6 +43,7 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
> > > [NETIF_F_GSO_SCTP_BIT] = "tx-sctp-segmentation",
> > > [NETIF_F_GSO_ESP_BIT] = "tx-esp-segmentation",
> > > [NETIF_F_GSO_UDP_L4_BIT] = "tx-udp-segmentation",
> > >
> > > -   [NETIF_F_GSO_FRAGLIST_BIT] = "tx-gso-list",
> > >     [NETIF_F_FCOE_CRC_BIT] = "tx-checksum-fcoe-crc",
> > >     [NETIF_F_SCTP_CRC_BIT] = "tx-checksum-sctp",
> > >
> >
> > Reviewed-by: Michal Kubecekmkubecek@suse.cz
> 
> Thanks!
> 
> > AFAICS the name for NETIF_F_GSO_TUNNEL_REMCSUM_BIT is also missing but
> > IMHO it will be better to fix that by a separate patch with its own
> > Fixes tag.
> 
> Oh, nice catch! I'll make a separate for this one.
> I also wanted to add any sort of static_assert() / BUILD_BUG_ON() to
> prevent such misses, but don't see any easy pattern to check for now,
> as netdev_features_strings[] is always NETDEV_FEATURE_COUNT-sized.

The key problem is that unlike e.g. link modes, new netdev features are
not always added at the end. So even if we changed
netdev_features_strings[] array to have size determined automatically
from initializers, adding new netdev feature somewhere in the middle
(e.g. GSO one) without updating netdev_features_strings[] would not be
caught as simple array size check would not notice the hole (actually,
there are also two intended holes in the array).

Michal

^ permalink raw reply

* Re: [PATCH v4 2/3] dt-bindings: net: ethernet-phy: add enet-phy-clock-out-frequency
From: Heiko Stübner @ 2020-06-17 21:59 UTC (permalink / raw)
  To: davem
  Cc: kuba, robh+dt, andrew, f.fainelli, hkallweit1, linux, netdev,
	devicetree, linux-kernel, christoph.muellner
In-Reply-To: <20200617213326.1532365-3-heiko@sntech.de>

Am Mittwoch, 17. Juni 2020, 23:33:25 CEST schrieb Heiko Stuebner:
> From: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>
> 
> Some ethernet phys have a configurable clock output, so add a generic
> property to describe its target rate.
> 
> Suggested-by: Andrew Lunn <andrew@lunn.ch>
> Signed-off-by: Heiko Stuebner <heiko.stuebner@theobroma-systems.com>

just now Rob wrote for v3:

----- 8< ------
The correct thing to do here is make the phy a clock provider and then 
the client side use 'assigned-clock-rate' to set the rate. That has the 
advantage that it also describes the connection of the clock signal. You 
might not need that for a simple case, but I could imagine needing that 
in a more complex case.

Rob
----- 8< ------



>  Documentation/devicetree/bindings/net/ethernet-phy.yaml | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/ethernet-phy.yaml b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
> index 9b1f1147ca36..4dcf93f1c555 100644
> --- a/Documentation/devicetree/bindings/net/ethernet-phy.yaml
> +++ b/Documentation/devicetree/bindings/net/ethernet-phy.yaml
> @@ -84,6 +84,11 @@ properties:
>        the turn around line low at end of the control phase of the
>        MDIO transaction.
>  
> +  enet-phy-clock-out-frequency:
> +    $ref: /schemas/types.yaml#definitions/uint32
> +    description:
> +      Frequency in Hz to set an available clock output to.
> +
>    enet-phy-lane-swap:
>      $ref: /schemas/types.yaml#definitions/flag
>      description:
> 





^ permalink raw reply

* Re: [PATCH] net: usb: ax88179_178a: fix packet alignment padding
From: David Miller @ 2020-06-17 21:59 UTC (permalink / raw)
  To: jk; +Cc: netdev, allan, freddy, pfink, linux-usb
In-Reply-To: <20200615025456.30219-1-jk@ozlabs.org>

From: Jeremy Kerr <jk@ozlabs.org>
Date: Mon, 15 Jun 2020 10:54:56 +0800

> Using a AX88179 device (0b95:1790), I see two bytes of appended data on
> every RX packet. For example, this 48-byte ping, using 0xff as a
> payload byte:
> 
>   04:20:22.528472 IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 2447, seq 1, length 64
> 	0x0000:  000a cd35 ea50 000a cd35 ea4f 0800 4500
> 	0x0010:  0054 c116 4000 4001 f63e c0a8 0101 c0a8
> 	0x0020:  0102 0800 b633 098f 0001 87ea cd5e 0000
> 	0x0030:  0000 dcf2 0600 0000 0000 ffff ffff ffff
> 	0x0040:  ffff ffff ffff ffff ffff ffff ffff ffff
> 	0x0050:  ffff ffff ffff ffff ffff ffff ffff ffff
> 	0x0060:  ffff 961f
> 
> Those last two bytes - 96 1f - aren't part of the original packet.
> 
> In the ax88179 RX path, the usbnet rx_fixup function trims a 2-byte
> 'alignment pseudo header' from the start of the packet, and sets the
> length from a per-packet field populated by hardware. It looks like that
> length field *includes* the 2-byte header; the current driver assumes
> that it's excluded.
> 
> This change trims the 2-byte alignment header after we've set the packet
> length, so the resulting packet length is correct. While we're moving
> the comment around, this also fixes the spelling of 'pseudo'.
> 
> Signed-off-by: Jeremy Kerr <jk@ozlabs.org>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH] net: usb: ax88179_178a: fix packet alignment padding
From: David Miller @ 2020-06-17 22:00 UTC (permalink / raw)
  To: jk; +Cc: netdev, allan, freddy, pfink, linux-usb, louis
In-Reply-To: <ac9ac673933f0e8383c6ab538302058ba2469192.camel@ozlabs.org>

From: Jeremy Kerr <jk@ozlabs.org>
Date: Wed, 17 Jun 2020 08:56:39 +0800

> [If you have any hints for forcing clustered packets, I'll see if I can
> probe the behaviour a little better to confirm]

Probably it would involve having packets arrive back to back faster
than some interval that either depends upon real time or some multiple
of a USB bus cycle.

^ permalink raw reply

* [PATCH v5 4/7] pidfd: Replace open-coded partial fd_install_received()
From: Kees Cook @ 2020-06-17 22:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano, Greg Kroah-Hartman,
	Andy Lutomirski, Will Drewry, Shuah Khan, netdev, containers,
	linux-api, linux-fsdevel, linux-kselftest
In-Reply-To: <20200617220327.3731559-1-keescook@chromium.org>

The sock counting (sock_update_netprioidx() and sock_update_classid()) was
missing from pidfd's implementation of received fd installation. Replace
the open-coded version with a call to the new fd_install_received()
helper.

Fixes: 8649c322f75c ("pid: Implement pidfd_getfd syscall")
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 kernel/pid.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index f1496b757162..24924ec5df0e 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -635,18 +635,9 @@ static int pidfd_getfd(struct pid *pid, int fd)
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
-	ret = security_file_receive(file);
-	if (ret) {
-		fput(file);
-		return ret;
-	}
-
-	ret = get_unused_fd_flags(O_CLOEXEC);
+	ret = fd_install_received(file, O_CLOEXEC);
 	if (ret < 0)
 		fput(file);
-	else
-		fd_install(ret, file);
-
 	return ret;
 }
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH v5 5/7] fs: Expand __fd_install_received() to accept fd
From: Kees Cook @ 2020-06-17 22:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano, Greg Kroah-Hartman,
	Andy Lutomirski, Will Drewry, Shuah Khan, netdev, containers,
	linux-api, linux-fsdevel, linux-kselftest
In-Reply-To: <20200617220327.3731559-1-keescook@chromium.org>

Expand __fd_install_received() with support for replace_fd() for the
coming seccomp "addfd" ioctl(). Add new wrapper fd_replace_received()
for the new mode and update existing wrappers to retain old mode.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/file.c            | 22 +++++++++++++++++-----
 include/linux/file.h | 10 +++++++---
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index de85a42defe2..9568bcfd1f44 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -937,6 +937,7 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
 /**
  * __fd_install_received() - Install received file into file descriptor table
  *
+ * @fd: fd to install into (if negative, a new fd will be allocated)
  * @file: struct file that was received from another process
  * @ufd: __user pointer to write new fd number to
  * @o_flags: the O_* flags to apply to the new fd entry
@@ -947,7 +948,8 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
  *
  * Returns newly install fd or -ve on error.
  */
-int __fd_install_received(struct file *file, int __user *ufd, unsigned int o_flags)
+int __fd_install_received(int fd, struct file *file, int __user *ufd,
+			  unsigned int o_flags)
 {
 	struct socket *sock;
 	int new_fd;
@@ -957,9 +959,11 @@ int __fd_install_received(struct file *file, int __user *ufd, unsigned int o_fla
 	if (error)
 		return error;
 
-	new_fd = get_unused_fd_flags(o_flags);
-	if (new_fd < 0)
-		return new_fd;
+	if (fd < 0) {
+		new_fd = get_unused_fd_flags(o_flags);
+		if (new_fd < 0)
+			return new_fd;
+	}
 
 	if (ufd) {
 		error = put_user(new_fd, ufd);
@@ -969,6 +973,15 @@ int __fd_install_received(struct file *file, int __user *ufd, unsigned int o_fla
 		}
 	}
 
+	if (fd < 0)
+		fd_install(new_fd, get_file(file));
+	else {
+		new_fd = fd;
+		error = replace_fd(new_fd, file, o_flags);
+		if (error)
+			return error;
+	}
+
 	/* Bump the usage count and install the file. The resulting value of
 	 * "error" is ignored here since we only need to take action when
 	 * the file is a socket and testing "sock" for NULL is sufficient.
@@ -978,7 +991,6 @@ int __fd_install_received(struct file *file, int __user *ufd, unsigned int o_fla
 		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
 		sock_update_classid(&sock->sk->sk_cgrp_data);
 	}
-	fd_install(new_fd, get_file(file));
 	return new_fd;
 }
 
diff --git a/include/linux/file.h b/include/linux/file.h
index e19974ed9322..04389b0da11b 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -92,18 +92,22 @@ extern void put_unused_fd(unsigned int fd);
 
 extern void fd_install(unsigned int fd, struct file *file);
 
-extern int __fd_install_received(struct file *file, int __user *ufd,
+extern int __fd_install_received(int fd, struct file *file, int __user *ufd,
 				 unsigned int o_flags);
 static inline int fd_install_received_user(struct file *file, int __user *ufd,
 					   unsigned int o_flags)
 {
 	if (ufd == NULL)
 		return -EFAULT;
-	return __fd_install_received(file, ufd, o_flags);
+	return __fd_install_received(-1, file, ufd, o_flags);
 }
 static inline int fd_install_received(struct file *file, unsigned int o_flags)
 {
-	return __fd_install_received(file, NULL, o_flags);
+	return __fd_install_received(-1, file, NULL, o_flags);
+}
+static inline int fd_replace_received(int fd, struct file *file, unsigned int o_flags)
+{
+	return __fd_install_received(fd, file, NULL, o_flags);
 }
 
 extern void flush_delayed_fput(void);
-- 
2.25.1


^ permalink raw reply related

* [PATCH v5 7/7] selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
From: Kees Cook @ 2020-06-17 22:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano, Greg Kroah-Hartman,
	Andy Lutomirski, Will Drewry, Shuah Khan, netdev, containers,
	linux-api, linux-fsdevel, linux-kselftest
In-Reply-To: <20200617220327.3731559-1-keescook@chromium.org>

From: Sargun Dhillon <sargun@sargun.me>

Test whether we can add file descriptors in response to notifications.
This injects the file descriptors via notifications, and then uses kcmp
to determine whether or not it has been successful.

It also includes some basic sanity checking for arguments.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Link: https://lore.kernel.org/r/20200603011044.7972-5-sargun@sargun.me
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 229 ++++++++++++++++++
 1 file changed, 229 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 4e1891f8a0cd..143eafdc4fdc 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -45,6 +45,7 @@
 #include <sys/socket.h>
 #include <sys/ioctl.h>
 #include <linux/kcmp.h>
+#include <sys/resource.h>
 
 #include <unistd.h>
 #include <sys/syscall.h>
@@ -168,7 +169,9 @@ struct seccomp_metadata {
 
 #ifndef SECCOMP_FILTER_FLAG_NEW_LISTENER
 #define SECCOMP_FILTER_FLAG_NEW_LISTENER	(1UL << 3)
+#endif
 
+#ifndef SECCOMP_RET_USER_NOTIF
 #define SECCOMP_RET_USER_NOTIF 0x7fc00000U
 
 #define SECCOMP_IOC_MAGIC		'!'
@@ -204,6 +207,39 @@ struct seccomp_notif_sizes {
 };
 #endif
 
+#ifndef SECCOMP_IOCTL_NOTIF_ADDFD
+/* On success, the return value is the remote process's added fd number */
+#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOW(3,	\
+						struct seccomp_notif_addfd)
+
+/* valid flags for seccomp_notif_addfd */
+#define SECCOMP_ADDFD_FLAG_SETFD	(1UL << 0) /* Specify remote fd */
+
+struct seccomp_notif_addfd {
+	__u64 id;
+	__u32 flags;
+	__u32 srcfd;
+	__u32 newfd;
+	__u32 newfd_flags;
+};
+#endif
+
+struct seccomp_notif_addfd_small {
+	__u64 id;
+	char weird[4];
+};
+#define SECCOMP_IOCTL_NOTIF_ADDFD_SMALL	\
+	SECCOMP_IOW(3, struct seccomp_notif_addfd_small)
+
+struct seccomp_notif_addfd_big {
+	union {
+		struct seccomp_notif_addfd addfd;
+		char buf[sizeof(struct seccomp_notif_addfd) + 8];
+	};
+};
+#define SECCOMP_IOCTL_NOTIF_ADDFD_BIG	\
+	SECCOMP_IOWR(3, struct seccomp_notif_addfd_big)
+
 #ifndef PTRACE_EVENTMSG_SYSCALL_ENTRY
 #define PTRACE_EVENTMSG_SYSCALL_ENTRY	1
 #define PTRACE_EVENTMSG_SYSCALL_EXIT	2
@@ -3833,6 +3869,199 @@ TEST(user_notification_filter_empty_threaded)
 	EXPECT_GT((pollfd.revents & POLLHUP) ?: 0, 0);
 }
 
+TEST(user_notification_addfd)
+{
+	pid_t pid;
+	long ret;
+	int status, listener, memfd, fd;
+	struct seccomp_notif_addfd addfd = {};
+	struct seccomp_notif_addfd_small small = {};
+	struct seccomp_notif_addfd_big big = {};
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+	/* 100 ms */
+	struct timespec delay = { .tv_nsec = 100000000 };
+
+	memfd = memfd_create("test", 0);
+	ASSERT_GE(memfd, 0);
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+
+	/* Check that the basic notification machinery works */
+	listener = user_notif_syscall(__NR_getppid,
+				      SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		if (syscall(__NR_getppid) != USER_NOTIF_MAGIC)
+			exit(1);
+		exit(syscall(__NR_getppid) != USER_NOTIF_MAGIC);
+	}
+
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+	addfd.srcfd = memfd;
+	addfd.newfd = 0;
+	addfd.id = req.id;
+	addfd.flags = 0x0;
+
+	/* Verify bad newfd_flags cannot be set */
+	addfd.newfd_flags = ~O_CLOEXEC;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+	EXPECT_EQ(errno, EINVAL);
+	addfd.newfd_flags = O_CLOEXEC;
+
+	/* Verify bad flags cannot be set */
+	addfd.flags = 0xff;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+	EXPECT_EQ(errno, EINVAL);
+	addfd.flags = 0;
+
+	/* Verify that remote_fd cannot be set without setting flags */
+	addfd.newfd = 1;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+	EXPECT_EQ(errno, EINVAL);
+	addfd.newfd = 0;
+
+	/* Verify small size cannot be set */
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD_SMALL, &small), -1);
+	EXPECT_EQ(errno, EINVAL);
+
+	/* Verify we can't send bits filled in unknown buffer area */
+	memset(&big, 0xAA, sizeof(big));
+	big.addfd = addfd;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD_BIG, &big), -1);
+	EXPECT_EQ(errno, E2BIG);
+
+
+	/* Verify we can set an arbitrary remote fd */
+	fd = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
+	/*
+	 * The child has fds 0(stdin), 1(stdout), 2(stderr), 3(memfd),
+	 * 4(listener), so the newly allocated fd should be 5.
+	 */
+	EXPECT_EQ(fd, 5);
+	EXPECT_EQ(filecmp(getpid(), pid, memfd, fd), 0);
+
+	/* Verify we can set an arbitrary remote fd with large size */
+	memset(&big, 0x0, sizeof(big));
+	big.addfd = addfd;
+	fd = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD_BIG, &big);
+	EXPECT_EQ(fd, 6);
+
+	/* Verify we can set a specific remote fd */
+	addfd.newfd = 42;
+	addfd.flags = SECCOMP_ADDFD_FLAG_SETFD;
+	fd = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
+	EXPECT_EQ(fd, 42);
+	EXPECT_EQ(filecmp(getpid(), pid, memfd, fd), 0);
+
+	/* Resume syscall */
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	/*
+	 * This sets the ID of the ADD FD to the last request plus 1. The
+	 * notification ID increments 1 per notification.
+	 */
+	addfd.id = req.id + 1;
+
+	/* This spins until the underlying notification is generated */
+	while (ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd) != -1 &&
+	       errno != -EINPROGRESS)
+		nanosleep(&delay, NULL);
+
+	memset(&req, 0, sizeof(req));
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+	ASSERT_EQ(addfd.id, req.id);
+
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	/* Wait for child to finish. */
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	close(memfd);
+}
+
+TEST(user_notification_addfd_rlimit)
+{
+	pid_t pid;
+	long ret;
+	int status, listener, memfd;
+	struct seccomp_notif_addfd addfd = {};
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+	const struct rlimit lim = {
+		.rlim_cur	= 0,
+		.rlim_max	= 0,
+	};
+
+	memfd = memfd_create("test", 0);
+	ASSERT_GE(memfd, 0);
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret) {
+		TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!");
+	}
+
+	/* Check that the basic notification machinery works */
+	listener = user_notif_syscall(__NR_getppid,
+				      SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	ASSERT_GE(listener, 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0)
+		exit(syscall(__NR_getppid) != USER_NOTIF_MAGIC);
+
+
+	ASSERT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, &req), 0);
+
+	ASSERT_EQ(prlimit(pid, RLIMIT_NOFILE, &lim, NULL), 0);
+
+	addfd.srcfd = memfd;
+	addfd.newfd_flags = O_CLOEXEC;
+	addfd.newfd = 0;
+	addfd.id = req.id;
+	addfd.flags = 0;
+
+	/* Should probably spot check /proc/sys/fs/file-nr */
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+	EXPECT_EQ(errno, EMFILE);
+
+	addfd.newfd = 100;
+	addfd.flags = SECCOMP_ADDFD_FLAG_SETFD;
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+	EXPECT_EQ(errno, EBADF);
+
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_SEND, &resp), 0);
+
+	/* Wait for child to finish. */
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	close(memfd);
+}
+
 /*
  * TODO:
  * - expand NNP testing
-- 
2.25.1


^ permalink raw reply related

* [PATCH v5 6/7] seccomp: Introduce addfd ioctl to seccomp user notifier
From: Kees Cook @ 2020-06-17 22:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Matt Denton, Christian Brauner,
	Tycho Andersen, David Laight, Christoph Hellwig, David S. Miller,
	Jakub Kicinski, Alexander Viro, Aleksa Sarai, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano, Greg Kroah-Hartman,
	Andy Lutomirski, Will Drewry, Shuah Khan, netdev, containers,
	linux-api, linux-fsdevel, linux-kselftest
In-Reply-To: <20200617220327.3731559-1-keescook@chromium.org>

From: Sargun Dhillon <sargun@sargun.me>

This adds a seccomp notifier ioctl which allows for the listener to
"add" file descriptors to a process which originated a seccomp user
notification. This allows calls like mount, and mknod to be "implemented",
as the return value, and the arguments are data in memory. On the other
hand, calls like connect can be "implemented" using pidfd_getfd.

Unfortunately, there are calls which return file descriptors, like
open, which are vulnerable to ToCToU attacks, and require that the more
privileged supervisor can inspect the argument, and perform the syscall
on behalf of the process generating the notification. This allows the
file descriptor generated from that open call to be returned to the
calling process.

In addition, there is functionality to allow for replacement of specific
file descriptors, following dup2-like semantics.

The ioctl handling is based on the discussions[1] of how Extensible
Arguments should interact with ioctls. Instead of building size into
the addfd structure, make it a function of the ioctl command (which
is how sizes are normally passed to ioctls). To support forward and
backward compatibility, just mask out the direction and size, and match
everything. The size (and any future direction) checks are done along
with copy_struct_from_user() logic.

As a note, the seccomp_notif_addfd structure is laid out based on 8-byte
alignment without requiring packing as there have been packing issues
with uapi highlighted before[1][2]. Although we could overload the
newfd field and use -1 to indicate that it is not to be used, doing
so requires changing the size of the fd field, and introduces struct
packing complexity.

[1]: https://lore.kernel.org/lkml/87o8w9bcaf.fsf@mid.deneb.enyo.de/
[2]: https://lore.kernel.org/lkml/a328b91d-fd8f-4f27-b3c2-91a9c45f18c0@rasmusvillemoes.dk/
[3]: https://lore.kernel.org/lkml/20200612104629.GA15814@ircssh-2.c.rugged-nimbus-611.internal

Suggested-by: Matt Denton <mpdenton@google.com>
Link: https://lore.kernel.org/r/20200603011044.7972-4-sargun@sargun.me
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/uapi/linux/seccomp.h |  22 +++++
 kernel/seccomp.c             | 172 ++++++++++++++++++++++++++++++++++-
 2 files changed, 193 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 965290f7dcc2..6ba18b82a02e 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -113,6 +113,25 @@ struct seccomp_notif_resp {
 	__u32 flags;
 };
 
+/* valid flags for seccomp_notif_addfd */
+#define SECCOMP_ADDFD_FLAG_SETFD	(1UL << 0) /* Specify remote fd */
+
+/**
+ * struct seccomp_notif_addfd
+ * @id: The ID of the seccomp notification
+ * @flags: SECCOMP_ADDFD_FLAG_*
+ * @srcfd: The local fd number
+ * @newfd: Optional remote FD number if SETFD option is set, otherwise 0.
+ * @newfd_flags: The O_* flags the remote FD should have applied
+ */
+struct seccomp_notif_addfd {
+	__u64 id;
+	__u32 flags;
+	__u32 srcfd;
+	__u32 newfd;
+	__u32 newfd_flags;
+};
+
 #define SECCOMP_IOC_MAGIC		'!'
 #define SECCOMP_IO(nr)			_IO(SECCOMP_IOC_MAGIC, nr)
 #define SECCOMP_IOR(nr, type)		_IOR(SECCOMP_IOC_MAGIC, nr, type)
@@ -124,5 +143,8 @@ struct seccomp_notif_resp {
 #define SECCOMP_IOCTL_NOTIF_SEND	SECCOMP_IOWR(1,	\
 						struct seccomp_notif_resp)
 #define SECCOMP_IOCTL_NOTIF_ID_VALID	SECCOMP_IOW(2, __u64)
+/* On success, the return value is the remote process's added fd number */
+#define SECCOMP_IOCTL_NOTIF_ADDFD	SECCOMP_IOW(3, \
+						struct seccomp_notif_addfd)
 
 #endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 0ed57e8c49d0..a7c20c408b83 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -87,10 +87,42 @@ struct seccomp_knotif {
 	long val;
 	u32 flags;
 
-	/* Signals when this has entered SECCOMP_NOTIFY_REPLIED */
+	/*
+	 * Signals when this has changed states, such as the listener
+	 * dying, a new seccomp addfd message, or changing to REPLIED
+	 */
 	struct completion ready;
 
 	struct list_head list;
+
+	/* outstanding addfd requests */
+	struct list_head addfd;
+};
+
+/**
+ * struct seccomp_kaddfd - container for seccomp_addfd ioctl messages
+ *
+ * @file: A reference to the file to install in the other task
+ * @fd: The fd number to install it at. If the fd number is -1, it means the
+ *      installing process should allocate the fd as normal.
+ * @flags: The flags for the new file descriptor. At the moment, only O_CLOEXEC
+ *         is allowed.
+ * @ret: The return value of the installing process. It is set to the fd num
+ *       upon success (>= 0).
+ * @completion: Indicates that the installing process has completed fd
+ *              installation, or gone away (either due to successful
+ *              reply, or signal)
+ *
+ */
+struct seccomp_kaddfd {
+	struct file *file;
+	int fd;
+	unsigned int flags;
+
+	/* To only be set on reply */
+	int ret;
+	struct completion completion;
+	struct list_head list;
 };
 
 /**
@@ -793,6 +825,17 @@ static u64 seccomp_next_notify_id(struct seccomp_filter *filter)
 	return filter->notif->next_id++;
 }
 
+static void seccomp_handle_addfd(struct seccomp_kaddfd *addfd)
+{
+	/*
+	 * Remove the notification, and reset the list pointers, indicating
+	 * that it has been handled.
+	 */
+	list_del_init(&addfd->list);
+	addfd->ret = fd_replace_received(addfd->fd, addfd->file, addfd->flags);
+	complete(&addfd->completion);
+}
+
 static int seccomp_do_user_notification(int this_syscall,
 					struct seccomp_filter *match,
 					const struct seccomp_data *sd)
@@ -801,6 +844,7 @@ static int seccomp_do_user_notification(int this_syscall,
 	u32 flags = 0;
 	long ret = 0;
 	struct seccomp_knotif n = {};
+	struct seccomp_kaddfd *addfd, *tmp;
 
 	mutex_lock(&match->notify_lock);
 	err = -ENOSYS;
@@ -813,6 +857,7 @@ static int seccomp_do_user_notification(int this_syscall,
 	n.id = seccomp_next_notify_id(match);
 	init_completion(&n.ready);
 	list_add(&n.list, &match->notif->notifications);
+	INIT_LIST_HEAD(&n.addfd);
 
 	up(&match->notif->request);
 	wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
@@ -821,14 +866,31 @@ static int seccomp_do_user_notification(int this_syscall,
 	/*
 	 * This is where we wait for a reply from userspace.
 	 */
+wait:
 	err = wait_for_completion_interruptible(&n.ready);
 	mutex_lock(&match->notify_lock);
 	if (err == 0) {
+		/* Check if we were woken up by a addfd message */
+		addfd = list_first_entry_or_null(&n.addfd,
+						 struct seccomp_kaddfd, list);
+		if (addfd && n.state != SECCOMP_NOTIFY_REPLIED) {
+			seccomp_handle_addfd(addfd);
+			mutex_unlock(&match->notify_lock);
+			goto wait;
+		}
 		ret = n.val;
 		err = n.error;
 		flags = n.flags;
 	}
 
+	/* If there were any pending addfd calls, clear them out */
+	list_for_each_entry_safe(addfd, tmp, &n.addfd, list) {
+		/* The process went away before we got a chance to handle it */
+		addfd->ret = -ESRCH;
+		list_del_init(&addfd->list);
+		complete(&addfd->completion);
+	}
+
 	/*
 	 * Note that it's possible the listener died in between the time when
 	 * we were notified of a respons (or a signal) and when we were able to
@@ -1069,6 +1131,11 @@ static int seccomp_notify_release(struct inode *inode, struct file *file)
 		knotif->error = -ENOSYS;
 		knotif->val = 0;
 
+		/*
+		 * We do not need to wake up any pending addfd messages, as
+		 * the notifier will do that for us, as this just looks
+		 * like a standard reply.
+		 */
 		complete(&knotif->ready);
 	}
 
@@ -1233,12 +1300,107 @@ static long seccomp_notify_id_valid(struct seccomp_filter *filter,
 	return ret;
 }
 
+static long seccomp_notify_addfd(struct seccomp_filter *filter,
+				 struct seccomp_notif_addfd __user *uaddfd,
+				 unsigned int size)
+{
+	struct seccomp_notif_addfd addfd;
+	struct seccomp_knotif *knotif;
+	struct seccomp_kaddfd kaddfd;
+	int ret;
+
+	/* 24 is original sizeof(struct seccomp_notif_addfd) */
+	if (size < 24 || size >= PAGE_SIZE)
+		return -EINVAL;
+
+	ret = copy_struct_from_user(&addfd, sizeof(addfd), uaddfd, size);
+	if (ret)
+		return ret;
+
+	if (addfd.newfd_flags & ~O_CLOEXEC)
+		return -EINVAL;
+
+	if (addfd.flags & ~SECCOMP_ADDFD_FLAG_SETFD)
+		return -EINVAL;
+
+	if (addfd.newfd && !(addfd.flags & SECCOMP_ADDFD_FLAG_SETFD))
+		return -EINVAL;
+
+	kaddfd.file = fget(addfd.srcfd);
+	if (!kaddfd.file)
+		return -EBADF;
+
+	kaddfd.flags = addfd.newfd_flags;
+	kaddfd.fd = (addfd.flags & SECCOMP_ADDFD_FLAG_SETFD) ?
+		    addfd.newfd : -1;
+	init_completion(&kaddfd.completion);
+
+	ret = mutex_lock_interruptible(&filter->notify_lock);
+	if (ret < 0)
+		goto out;
+
+	knotif = find_notification(filter, addfd.id);
+	if (!knotif) {
+		ret = -ENOENT;
+		goto out_unlock;
+	}
+
+	/*
+	 * We do not want to allow for FD injection to occur before the
+	 * notification has been picked up by a userspace handler, or after
+	 * the notification has been replied to.
+	 */
+	if (knotif->state != SECCOMP_NOTIFY_SENT) {
+		ret = -EINPROGRESS;
+		goto out_unlock;
+	}
+
+	list_add(&kaddfd.list, &knotif->addfd);
+	complete(&knotif->ready);
+	mutex_unlock(&filter->notify_lock);
+
+	/* Now we wait for it to be processed or be interrupted */
+	ret = wait_for_completion_interruptible(&kaddfd.completion);
+	if (ret == 0) {
+		/*
+		 * We had a successful completion. The other side has already
+		 * removed us from the addfd queue, and
+		 * wait_for_completion_interruptible has a memory barrier upon
+		 * success that lets us read this value directly without
+		 * locking.
+		 */
+		ret = kaddfd.ret;
+		goto out;
+	}
+
+	mutex_lock(&filter->notify_lock);
+	/*
+	 * Even though we were woken up by a signal and not a successful
+	 * completion, a completion may have happened in the mean time.
+	 *
+	 * We need to check again if the addfd request has been handled,
+	 * and if not, we will remove it from the queue.
+	 */
+	if (list_empty(&kaddfd.list))
+		ret = kaddfd.ret;
+	else
+		list_del(&kaddfd.list);
+
+out_unlock:
+	mutex_unlock(&filter->notify_lock);
+out:
+	fput(kaddfd.file);
+
+	return ret;
+}
+
 static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
 				 unsigned long arg)
 {
 	struct seccomp_filter *filter = file->private_data;
 	void __user *buf = (void __user *)arg;
 
+	/* Fixed-size ioctls */
 	switch (cmd) {
 	case SECCOMP_IOCTL_NOTIF_RECV:
 		return seccomp_notify_recv(filter, buf);
@@ -1247,9 +1409,17 @@ static long seccomp_notify_ioctl(struct file *file, unsigned int cmd,
 	case SECCOMP_IOCTL_NOTIF_ID_VALID_WRONG_DIR:
 	case SECCOMP_IOCTL_NOTIF_ID_VALID:
 		return seccomp_notify_id_valid(filter, buf);
+	}
+
+	/* Extensible Argument ioctls */
+#define EA_IOCTL(cmd)	((cmd) & ~(IOC_INOUT | IOCSIZE_MASK))
+	switch (EA_IOCTL(cmd)) {
+	case EA_IOCTL(SECCOMP_IOCTL_NOTIF_ADDFD):
+		return seccomp_notify_addfd(filter, buf, _IOC_SIZE(cmd));
 	default:
 		return -EINVAL;
 	}
+#undef EA_IOCTL
 }
 
 static __poll_t seccomp_notify_poll(struct file *file,
-- 
2.25.1


^ permalink raw reply related

* [PATCH v5 3/7] fs: Add fd_install_received() wrapper for __fd_install_received()
From: Kees Cook @ 2020-06-17 22:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano, Greg Kroah-Hartman,
	Andy Lutomirski, Will Drewry, Shuah Khan, netdev, containers,
	linux-api, linux-fsdevel, linux-kselftest
In-Reply-To: <20200617220327.3731559-1-keescook@chromium.org>

For both pidfd and seccomp, the __user pointer is not used. Update
__fd_install_received() to make writing to ufd optional via a NULL check.
However, for the fd_install_received_user() wrapper, ufd is NULL checked
so an -EFAULT can be returned to avoid changing the SCM_RIGHTS interface
behavior. Add new wrapper fd_install_received() for pidfd and seccomp
that does not use the ufd argument. For the new helper, the new fd needs
to be returned on success. Update the existing callers to handle it.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/file.c            | 22 ++++++++++++++--------
 include/linux/file.h |  7 +++++++
 net/compat.c         |  2 +-
 net/core/scm.c       |  2 +-
 4 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index f2167d6feec6..de85a42defe2 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -942,9 +942,10 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
  * @o_flags: the O_* flags to apply to the new fd entry
  *
  * Installs a received file into the file descriptor table, with appropriate
- * checks and count updates. Writes the fd number to userspace.
+ * checks and count updates. Optionally writes the fd number to userspace, if
+ * @ufd is non-NULL.
  *
- * Returns -ve on error.
+ * Returns newly install fd or -ve on error.
  */
 int __fd_install_received(struct file *file, int __user *ufd, unsigned int o_flags)
 {
@@ -960,20 +961,25 @@ int __fd_install_received(struct file *file, int __user *ufd, unsigned int o_fla
 	if (new_fd < 0)
 		return new_fd;
 
-	error = put_user(new_fd, ufd);
-	if (error) {
-		put_unused_fd(new_fd);
-		return error;
+	if (ufd) {
+		error = put_user(new_fd, ufd);
+		if (error) {
+			put_unused_fd(new_fd);
+			return error;
+		}
 	}
 
-	/* Bump the usage count and install the file. */
+	/* Bump the usage count and install the file. The resulting value of
+	 * "error" is ignored here since we only need to take action when
+	 * the file is a socket and testing "sock" for NULL is sufficient.
+	 */
 	sock = sock_from_file(file, &error);
 	if (sock) {
 		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
 		sock_update_classid(&sock->sk->sk_cgrp_data);
 	}
 	fd_install(new_fd, get_file(file));
-	return 0;
+	return new_fd;
 }
 
 static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
diff --git a/include/linux/file.h b/include/linux/file.h
index fe18a1a0d555..e19974ed9322 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -9,6 +9,7 @@
 #include <linux/compiler.h>
 #include <linux/types.h>
 #include <linux/posix_types.h>
+#include <linux/errno.h>
 
 struct file;
 
@@ -96,8 +97,14 @@ extern int __fd_install_received(struct file *file, int __user *ufd,
 static inline int fd_install_received_user(struct file *file, int __user *ufd,
 					   unsigned int o_flags)
 {
+	if (ufd == NULL)
+		return -EFAULT;
 	return __fd_install_received(file, ufd, o_flags);
 }
+static inline int fd_install_received(struct file *file, unsigned int o_flags)
+{
+	return __fd_install_received(file, NULL, o_flags);
+}
 
 extern void flush_delayed_fput(void);
 extern void __fput_sync(struct file *);
diff --git a/net/compat.c b/net/compat.c
index 94f288e8dac5..71494337cca7 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -299,7 +299,7 @@ void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm)
 
 	for (i = 0; i < fdmax; i++) {
 		err = fd_install_received_user(scm->fp->fp[i], cmsg_data + i, o_flags);
-		if (err)
+		if (err < 0)
 			break;
 	}
 
diff --git a/net/core/scm.c b/net/core/scm.c
index df190f1fdd28..b9a0442ebd26 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -307,7 +307,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
 
 	for (i = 0; i < fdmax; i++) {
 		err = fd_install_received_user(scm->fp->fp[i], cmsg_data + i, o_flags);
-		if (err)
+		if (err < 0)
 			break;
 	}
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH v5 2/7] fs: Move __scm_install_fd() to __fd_install_received()
From: Kees Cook @ 2020-06-17 22:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano, Greg Kroah-Hartman,
	Andy Lutomirski, Will Drewry, Shuah Khan, netdev, containers,
	linux-api, linux-fsdevel, linux-kselftest
In-Reply-To: <20200617220327.3731559-1-keescook@chromium.org>

In preparation for users of the "install a received file" logic outside
of net/ (pidfd and seccomp), relocate and rename __scm_install_fd() from
net/core/scm.c to __fd_install_received() in fs/file.c, and provide a
wrapper named fd_install_received_user(), as future patches will change
the interface to __fd_install_received().

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/file.c            | 45 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/file.h |  8 ++++++++
 include/net/scm.h    |  1 -
 net/compat.c         |  2 +-
 net/core/scm.c       | 32 +------------------------------
 5 files changed, 55 insertions(+), 33 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index abb8b7081d7a..f2167d6feec6 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -11,6 +11,7 @@
 #include <linux/export.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
+#include <linux/net.h>
 #include <linux/sched/signal.h>
 #include <linux/slab.h>
 #include <linux/file.h>
@@ -18,6 +19,8 @@
 #include <linux/bitops.h>
 #include <linux/spinlock.h>
 #include <linux/rcupdate.h>
+#include <net/cls_cgroup.h>
+#include <net/netprio_cgroup.h>
 
 unsigned int sysctl_nr_open __read_mostly = 1024*1024;
 unsigned int sysctl_nr_open_min = BITS_PER_LONG;
@@ -931,6 +934,48 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags)
 	return err;
 }
 
+/**
+ * __fd_install_received() - Install received file into file descriptor table
+ *
+ * @file: struct file that was received from another process
+ * @ufd: __user pointer to write new fd number to
+ * @o_flags: the O_* flags to apply to the new fd entry
+ *
+ * Installs a received file into the file descriptor table, with appropriate
+ * checks and count updates. Writes the fd number to userspace.
+ *
+ * Returns -ve on error.
+ */
+int __fd_install_received(struct file *file, int __user *ufd, unsigned int o_flags)
+{
+	struct socket *sock;
+	int new_fd;
+	int error;
+
+	error = security_file_receive(file);
+	if (error)
+		return error;
+
+	new_fd = get_unused_fd_flags(o_flags);
+	if (new_fd < 0)
+		return new_fd;
+
+	error = put_user(new_fd, ufd);
+	if (error) {
+		put_unused_fd(new_fd);
+		return error;
+	}
+
+	/* Bump the usage count and install the file. */
+	sock = sock_from_file(file, &error);
+	if (sock) {
+		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
+		sock_update_classid(&sock->sk->sk_cgrp_data);
+	}
+	fd_install(new_fd, get_file(file));
+	return 0;
+}
+
 static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
 {
 	int err = -EBADF;
diff --git a/include/linux/file.h b/include/linux/file.h
index 122f80084a3e..fe18a1a0d555 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -91,6 +91,14 @@ extern void put_unused_fd(unsigned int fd);
 
 extern void fd_install(unsigned int fd, struct file *file);
 
+extern int __fd_install_received(struct file *file, int __user *ufd,
+				 unsigned int o_flags);
+static inline int fd_install_received_user(struct file *file, int __user *ufd,
+					   unsigned int o_flags)
+{
+	return __fd_install_received(file, ufd, o_flags);
+}
+
 extern void flush_delayed_fput(void);
 extern void __fput_sync(struct file *);
 
diff --git a/include/net/scm.h b/include/net/scm.h
index 581a94d6c613..1ce365f4c256 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -37,7 +37,6 @@ struct scm_cookie {
 #endif
 };
 
-int __scm_install_fd(struct file *file, int __user *ufd, unsigned int o_flags);
 void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm);
 void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm);
 int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm);
diff --git a/net/compat.c b/net/compat.c
index 27d477fdcaa0..94f288e8dac5 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -298,7 +298,7 @@ void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm)
 	int err = 0, i;
 
 	for (i = 0; i < fdmax; i++) {
-		err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
+		err = fd_install_received_user(scm->fp->fp[i], cmsg_data + i, o_flags);
 		if (err)
 			break;
 	}
diff --git a/net/core/scm.c b/net/core/scm.c
index 6151678c73ed..df190f1fdd28 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -280,36 +280,6 @@ void put_cmsg_scm_timestamping(struct msghdr *msg, struct scm_timestamping_inter
 }
 EXPORT_SYMBOL(put_cmsg_scm_timestamping);
 
-int __scm_install_fd(struct file *file, int __user *ufd, unsigned int o_flags)
-{
-	struct socket *sock;
-	int new_fd;
-	int error;
-
-	error = security_file_receive(file);
-	if (error)
-		return error;
-
-	new_fd = get_unused_fd_flags(o_flags);
-	if (new_fd < 0)
-		return new_fd;
-
-	error = put_user(new_fd, ufd);
-	if (error) {
-		put_unused_fd(new_fd);
-		return error;
-	}
-
-	/* Bump the usage count and install the file. */
-	sock = sock_from_file(file, &error);
-	if (sock) {
-		sock_update_netprioidx(&sock->sk->sk_cgrp_data);
-		sock_update_classid(&sock->sk->sk_cgrp_data);
-	}
-	fd_install(new_fd, get_file(file));
-	return 0;
-}
-
 static int scm_max_fds(struct msghdr *msg)
 {
 	if (msg->msg_controllen <= sizeof(struct cmsghdr))
@@ -336,7 +306,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
 	}
 
 	for (i = 0; i < fdmax; i++) {
-		err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
+		err = fd_install_received_user(scm->fp->fp[i], cmsg_data + i, o_flags);
 		if (err)
 			break;
 	}
-- 
2.25.1


^ permalink raw reply related

* [PATCH v5 0/7] Add seccomp notifier ioctl that enables adding fds
From: Kees Cook @ 2020-06-17 22:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano, Greg Kroah-Hartman,
	Andy Lutomirski, Will Drewry, Shuah Khan, netdev, containers,
	linux-api, linux-fsdevel, linux-kselftest

Hello!

v5:
- merge ioctl fixes into Sargun's patches directly
- adjust new API to avoid "ufd_required" argument
- drop general clean up patches now present in for-next/seccomp
v4: https://lore.kernel.org/lkml/20200616032524.460144-1-keescook@chromium.org/

This continues the thread-merge between [1] and [2]. tl;dr: add a way for
a seccomp user_notif process manager to inject files into the managed
process in order to handle emulation of various fd-returning syscalls
across security boundaries. Containers folks and Chrome are in need
of the feature, and investigating this solution uncovered (and fixed)
implementation issues with existing file sending routines.

I intend to carry this in the seccomp tree, unless someone has objections.
:) Please review and test!

-Kees

[1] https://lore.kernel.org/lkml/20200603011044.7972-1-sargun@sargun.me/
[2] https://lore.kernel.org/lkml/20200610045214.1175600-1-keescook@chromium.org/


Kees Cook (5):
  net/scm: Regularize compat handling of scm_detach_fds()
  fs: Move __scm_install_fd() to __fd_install_received()
  fs: Add fd_install_received() wrapper for __fd_install_received()
  pidfd: Replace open-coded partial fd_install_received()
  fs: Expand __fd_install_received() to accept fd

Sargun Dhillon (2):
  seccomp: Introduce addfd ioctl to seccomp user notifier
  selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD

 fs/file.c                                     |  63 +++++
 include/linux/file.h                          |  19 ++
 include/uapi/linux/seccomp.h                  |  22 ++
 kernel/pid.c                                  |  11 +-
 kernel/seccomp.c                              | 172 ++++++++++++-
 net/compat.c                                  |  55 ++---
 net/core/scm.c                                |  50 +---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 229 ++++++++++++++++++
 8 files changed, 540 insertions(+), 81 deletions(-)

-- 
2.25.1


^ permalink raw reply

* [PATCH v5 1/7] net/scm: Regularize compat handling of scm_detach_fds()
From: Kees Cook @ 2020-06-17 22:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Sargun Dhillon, Christian Brauner, Tycho Andersen,
	David Laight, Christoph Hellwig, David S. Miller, Jakub Kicinski,
	Alexander Viro, Aleksa Sarai, Matt Denton, Jann Horn,
	Chris Palmer, Robert Sesek, Giuseppe Scrivano, Greg Kroah-Hartman,
	Andy Lutomirski, Will Drewry, Shuah Khan, netdev, containers,
	linux-api, linux-fsdevel, linux-kselftest
In-Reply-To: <20200617220327.3731559-1-keescook@chromium.org>

Duplicate the cleanups from commit 2618d530dd8b ("net/scm: cleanup
scm_detach_fds") into the compat code.

Move the check added in commit 1f466e1f15cf ("net: cleanly handle kernel
vs user buffers for ->msg_control") to before the compat call, even
though it should be impossible for an in-kernel call to also be compat.

Correct the int "flags" argument to unsigned int to match fd_install()
and similar APIs.

Regularize any remaining differences, including a whitespace issue,
a checkpatch warning, and add the check from commit 6900317f5eff ("net,
scm: fix PaX detected msg_controllen overflow in scm_detach_fds") which
fixed an overflow unique to 64-bit. To avoid confusion when comparing
the compat handler to the native handler, just include the same check
in the compat handler.

Fixes: 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
Fixes: d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/net/scm.h |  1 +
 net/compat.c      | 55 +++++++++++++++++++++--------------------------
 net/core/scm.c    | 18 ++++++++--------
 3 files changed, 35 insertions(+), 39 deletions(-)

diff --git a/include/net/scm.h b/include/net/scm.h
index 1ce365f4c256..581a94d6c613 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -37,6 +37,7 @@ struct scm_cookie {
 #endif
 };
 
+int __scm_install_fd(struct file *file, int __user *ufd, unsigned int o_flags);
 void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm);
 void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm);
 int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm);
diff --git a/net/compat.c b/net/compat.c
index 5e3041a2c37d..27d477fdcaa0 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -281,39 +281,31 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
 	return 0;
 }
 
-void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
+static int scm_max_fds_compat(struct msghdr *msg)
 {
-	struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control;
-	int fdmax = (kmsg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
-	int fdnum = scm->fp->count;
-	struct file **fp = scm->fp->fp;
-	int __user *cmfptr;
-	int err = 0, i;
+	if (msg->msg_controllen <= sizeof(struct compat_cmsghdr))
+		return 0;
+	return (msg->msg_controllen - sizeof(struct compat_cmsghdr)) / sizeof(int);
+}
 
-	if (fdnum < fdmax)
-		fdmax = fdnum;
+void scm_detach_fds_compat(struct msghdr *msg, struct scm_cookie *scm)
+{
+	struct compat_cmsghdr __user *cm =
+		(struct compat_cmsghdr __user *)msg->msg_control;
+	unsigned int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0;
+	int fdmax = min_t(int, scm_max_fds_compat(msg), scm->fp->count);
+	int __user *cmsg_data = CMSG_USER_DATA(cm);
+	int err = 0, i;
 
-	for (i = 0, cmfptr = (int __user *) CMSG_COMPAT_DATA(cm); i < fdmax; i++, cmfptr++) {
-		int new_fd;
-		err = security_file_receive(fp[i]);
+	for (i = 0; i < fdmax; i++) {
+		err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
 		if (err)
 			break;
-		err = get_unused_fd_flags(MSG_CMSG_CLOEXEC & kmsg->msg_flags
-					  ? O_CLOEXEC : 0);
-		if (err < 0)
-			break;
-		new_fd = err;
-		err = put_user(new_fd, cmfptr);
-		if (err) {
-			put_unused_fd(new_fd);
-			break;
-		}
-		/* Bump the usage count and install the file. */
-		fd_install(new_fd, get_file(fp[i]));
 	}
 
 	if (i > 0) {
 		int cmlen = CMSG_COMPAT_LEN(i * sizeof(int));
+
 		err = put_user(SOL_SOCKET, &cm->cmsg_level);
 		if (!err)
 			err = put_user(SCM_RIGHTS, &cm->cmsg_type);
@@ -321,16 +313,19 @@ void scm_detach_fds_compat(struct msghdr *kmsg, struct scm_cookie *scm)
 			err = put_user(cmlen, &cm->cmsg_len);
 		if (!err) {
 			cmlen = CMSG_COMPAT_SPACE(i * sizeof(int));
-			kmsg->msg_control += cmlen;
-			kmsg->msg_controllen -= cmlen;
+			if (msg->msg_controllen < cmlen)
+				cmlen = msg->msg_controllen;
+			msg->msg_control += cmlen;
+			msg->msg_controllen -= cmlen;
 		}
 	}
-	if (i < fdnum)
-		kmsg->msg_flags |= MSG_CTRUNC;
+
+	if (i < scm->fp->count || (scm->fp->count && fdmax <= 0))
+		msg->msg_flags |= MSG_CTRUNC;
 
 	/*
-	 * All of the files that fit in the message have had their
-	 * usage counts incremented, so we just free the list.
+	 * All of the files that fit in the message have had their usage counts
+	 * incremented, so we just free the list.
 	 */
 	__scm_destroy(scm);
 }
diff --git a/net/core/scm.c b/net/core/scm.c
index 875df1c2989d..6151678c73ed 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -280,7 +280,7 @@ void put_cmsg_scm_timestamping(struct msghdr *msg, struct scm_timestamping_inter
 }
 EXPORT_SYMBOL(put_cmsg_scm_timestamping);
 
-static int __scm_install_fd(struct file *file, int __user *ufd, int o_flags)
+int __scm_install_fd(struct file *file, int __user *ufd, unsigned int o_flags)
 {
 	struct socket *sock;
 	int new_fd;
@@ -319,29 +319,29 @@ static int scm_max_fds(struct msghdr *msg)
 
 void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
 {
-	struct cmsghdr __user *cm
-		= (__force struct cmsghdr __user*)msg->msg_control;
-	int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0;
+	struct cmsghdr __user *cm =
+		(__force struct cmsghdr __user *)msg->msg_control;
+	unsigned int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0;
 	int fdmax = min_t(int, scm_max_fds(msg), scm->fp->count);
 	int __user *cmsg_data = CMSG_USER_DATA(cm);
 	int err = 0, i;
 
+	/* no use for FD passing from kernel space callers */
+	if (WARN_ON_ONCE(!msg->msg_control_is_user))
+		return;
+
 	if (msg->msg_flags & MSG_CMSG_COMPAT) {
 		scm_detach_fds_compat(msg, scm);
 		return;
 	}
 
-	/* no use for FD passing from kernel space callers */
-	if (WARN_ON_ONCE(!msg->msg_control_is_user))
-		return;
-
 	for (i = 0; i < fdmax; i++) {
 		err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
 		if (err)
 			break;
 	}
 
-	if (i > 0)  {
+	if (i > 0) {
 		int cmlen = CMSG_LEN(i * sizeof(int));
 
 		err = put_user(SOL_SOCKET, &cm->cmsg_level);
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH] liquidio: Replace vmalloc_node + memset with vzalloc_node and use array_size
From: David Miller @ 2020-06-17 22:04 UTC (permalink / raw)
  To: gustavoars
  Cc: dchickles, sburla, fmanlunas, kuba, netdev, linux-kernel, gustavo
In-Reply-To: <20200615211855.GA32663@embeddedor>

From: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Date: Mon, 15 Jun 2020 16:18:55 -0500

> Use vzalloc/vzalloc_node instead of the vmalloc/vzalloc_node and memset.
> 
> Also, notice that vzalloc_node() function has no 2-factor argument form
> to calculate the size for the allocation, so multiplication factors need
> to be wrapped in array_size().
> 
> This issue was found with the help of Coccinelle and, audited and fixed
> manually.
> 
> Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83
> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

Applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH][next] ethtool: ioctl: Use array_size() in copy_to_user()
From: David Miller @ 2020-06-17 22:05 UTC (permalink / raw)
  To: gustavoars; +Cc: kuba, netdev, linux-kernel, gustavo, keescook
In-Reply-To: <20200615230107.GA16907@embeddedor>

From: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Date: Mon, 15 Jun 2020 18:01:07 -0500

> Use array_size() helper instead of the open-coded version in
> copy_to_user(). These sorts of multiplication factors need to
> be wrapped in array_size().
> 
> This issue was found with the help of Coccinelle and, audited and fixed
> manually.
> 
> Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83
> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

Applied to net-next.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox