Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 1/7] lwt: Add net to build_state argument
From: kbuild test robot @ 2016-09-15  0:17 UTC (permalink / raw)
  To: Tom Herbert; +Cc: kbuild-all, davem, netdev, tgraf, roopa, kernel-team
In-Reply-To: <1473895376-347096-2-git-send-email-tom@herbertland.com>

[-- Attachment #1: Type: text/plain, Size: 1466 bytes --]

Hi Tom,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Tom-Herbert/net-ILA-resolver-and-generic-resolver-backend/20160915-073357
config: i386-randconfig-x003-201637 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

Note: the linux-review/Tom-Herbert/net-ILA-resolver-and-generic-resolver-backend/20160915-073357 HEAD ef54da9d3e09732810a23f283e0f8039fb18eede builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   net/core/lwtunnel.c: In function 'lwtunnel_encap_str':
>> net/core/lwtunnel.c:42:7: error: 'LWTUNNEL_ENCAP_ILA_NOTIFY' undeclared (first use in this function)
     case LWTUNNEL_ENCAP_ILA_NOTIFY:
          ^~~~~~~~~~~~~~~~~~~~~~~~~
   net/core/lwtunnel.c:42:7: note: each undeclared identifier is reported only once for each function it appears in

vim +/LWTUNNEL_ENCAP_ILA_NOTIFY +42 net/core/lwtunnel.c

    36		 */
    37		switch (encap_type) {
    38		case LWTUNNEL_ENCAP_MPLS:
    39			return "MPLS";
    40		case LWTUNNEL_ENCAP_ILA:
    41			return "ILA";
  > 42		case LWTUNNEL_ENCAP_ILA_NOTIFY:
    43			return "ILA_NOTIFY";
    44		case LWTUNNEL_ENCAP_IP6:
    45		case LWTUNNEL_ENCAP_IP:

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 23976 bytes --]

^ permalink raw reply

* Re: [PATCH V2 1/3] Documentation: devicetree: add qca8k binding
From: Andrew Lunn @ 2016-09-15  0:22 UTC (permalink / raw)
  To: John Crispin
  Cc: David S. Miller, Florian Fainelli, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	qsdk-review-A+ZNKFmMK5xy9aJCnZT0Uw,
	devicetree-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1473849542-3298-2-git-send-email-john-Pj+rj9U5foFAfugRpC6u6w@public.gmane.org>

On Wed, Sep 14, 2016 at 12:39:00PM +0200, John Crispin wrote:
> Add device-tree binding for ar8xxx switch families.
> 
> Cc: devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Signed-off-by: John Crispin <john-Pj+rj9U5foFAfugRpC6u6w@public.gmane.org>
> ---
> Changes in V2
> * fixup ecample to include phy nodes and corresponding phandles
> * add a note explaining why we need to phy nodes
> 
>  .../devicetree/bindings/net/dsa/qca8k.txt          |   88 ++++++++++++++++++++
>  1 file changed, 88 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/dsa/qca8k.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
> new file mode 100644
> index 0000000..2c1582a
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
> @@ -0,0 +1,88 @@
> +* Qualcomm Atheros QCA8xxx switch family
> +
> +Required properties:
> +
> +- compatible: should be "qca,qca8337"
> +- #size-cells: must be 0
> +- #address-cells: must be 1
> +
> +Subnodes:
> +
> +The integrated switch subnode should be specified according to the binding
> +described in dsa/dsa.txt. As the QCA8K switches do not have a N:N mapping of
> +port and PHY id, each subnode describing a port needs to have a valid phandle
> +referencing the internal PHY connected to it.

Hi John

I've not looked at the driver yet, but you said yesterday the CPU port
has to be port 0. I think it would be good to document that here.

Otherwise, this is looking good.

	   Andrew
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH V2 2/3] net-next: dsa: add Qualcomm tag RX/TX handler
From: Andrew Lunn @ 2016-09-15  0:24 UTC (permalink / raw)
  To: John Crispin
  Cc: David S. Miller, Florian Fainelli, netdev, linux-kernel,
	qsdk-review
In-Reply-To: <1473849542-3298-3-git-send-email-john@phrozen.org>

On Wed, Sep 14, 2016 at 12:39:01PM +0200, John Crispin wrote:
> Add support for the 2-bytes Qualcomm tag that gigabit switches such as
> the QCA8337/N might insert when receiving packets, or that we need
> to insert while targeting specific switch ports. The tag is inserted
> directly behind the ethernet header.
> 
> Signed-off-by: John Crispin <john@phrozen.org>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH net-next 1/7] lwt: Add net to build_state argument
From: kbuild test robot @ 2016-09-15  0:42 UTC (permalink / raw)
  To: Tom Herbert; +Cc: kbuild-all, davem, netdev, tgraf, roopa, kernel-team
In-Reply-To: <1473895376-347096-2-git-send-email-tom@herbertland.com>

[-- Attachment #1: Type: text/plain, Size: 8155 bytes --]

Hi Tom,

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Tom-Herbert/net-ILA-resolver-and-generic-resolver-backend/20160915-073357
config: microblaze-mmu_defconfig (attached as .config)
compiler: microblaze-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=microblaze 

All warnings (new ones prefixed by >>):

   net/ipv4/fib_semantics.c: In function 'fib_encap_match':
>> net/ipv4/fib_semantics.c:614:8: warning: passing argument 1 of 'lwtunnel_build_state' from incompatible pointer type
     ret = lwtunnel_build_state(net, dev, encap_type, encap,
           ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: expected 'struct net_device *' but argument is of type 'struct net *'
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c:614:8: warning: passing argument 2 of 'lwtunnel_build_state' makes integer from pointer without a cast
     ret = lwtunnel_build_state(net, dev, encap_type, encap,
           ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: expected 'u16' but argument is of type 'struct net_device *'
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c:614:8: warning: passing argument 3 of 'lwtunnel_build_state' makes pointer from integer without a cast
     ret = lwtunnel_build_state(net, dev, encap_type, encap,
           ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: expected 'struct nlattr *' but argument is of type 'u16'
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c:614:8: warning: passing argument 4 of 'lwtunnel_build_state' makes integer from pointer without a cast
     ret = lwtunnel_build_state(net, dev, encap_type, encap,
           ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: expected 'unsigned int' but argument is of type 'struct nlattr *'
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c:614:8: warning: passing argument 5 of 'lwtunnel_build_state' makes pointer from integer without a cast
     ret = lwtunnel_build_state(net, dev, encap_type, encap,
           ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: expected 'const void *' but argument is of type 'int'
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c:614:8: warning: passing argument 6 of 'lwtunnel_build_state' from incompatible pointer type
     ret = lwtunnel_build_state(net, dev, encap_type, encap,
           ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: expected 'struct lwtunnel_state **' but argument is of type 'const struct fib_config *'
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c:614:8: error: too many arguments to function 'lwtunnel_build_state'
     ret = lwtunnel_build_state(net, dev, encap_type, encap,
           ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: declared here
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c: In function 'fib_create_info':
   net/ipv4/fib_semantics.c:1102:10: warning: passing argument 1 of 'lwtunnel_build_state' from incompatible pointer type
       err = lwtunnel_build_state(net, dev, cfg->fc_encap_type,
             ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: expected 'struct net_device *' but argument is of type 'struct net *'
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c:1102:10: warning: passing argument 2 of 'lwtunnel_build_state' makes integer from pointer without a cast
       err = lwtunnel_build_state(net, dev, cfg->fc_encap_type,
             ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: expected 'u16' but argument is of type 'struct net_device *'
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c:1102:10: warning: passing argument 3 of 'lwtunnel_build_state' makes pointer from integer without a cast
       err = lwtunnel_build_state(net, dev, cfg->fc_encap_type,
             ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: expected 'struct nlattr *' but argument is of type 'u16'
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c:1102:10: warning: passing argument 4 of 'lwtunnel_build_state' makes integer from pointer without a cast
       err = lwtunnel_build_state(net, dev, cfg->fc_encap_type,
             ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: expected 'unsigned int' but argument is of type 'struct nlattr *'
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c:1102:10: warning: passing argument 5 of 'lwtunnel_build_state' makes pointer from integer without a cast
       err = lwtunnel_build_state(net, dev, cfg->fc_encap_type,
             ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: expected 'const void *' but argument is of type 'int'
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c:1102:10: warning: passing argument 6 of 'lwtunnel_build_state' from incompatible pointer type
       err = lwtunnel_build_state(net, dev, cfg->fc_encap_type,
             ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: expected 'struct lwtunnel_state **' but argument is of type 'struct fib_config *'
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^
   net/ipv4/fib_semantics.c:1102:10: error: too many arguments to function 'lwtunnel_build_state'
       err = lwtunnel_build_state(net, dev, cfg->fc_encap_type,
             ^
   In file included from net/ipv4/fib_semantics.c:45:0:
   include/net/lwtunnel.h:172:19: note: declared here
    static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
                      ^

vim +/lwtunnel_build_state +614 net/ipv4/fib_semantics.c

   598	#endif /* CONFIG_IP_ROUTE_MULTIPATH */
   599	
   600	static int fib_encap_match(struct net *net, u16 encap_type,
   601				   struct nlattr *encap,
   602				   int oif, const struct fib_nh *nh,
   603				   const struct fib_config *cfg)
   604	{
   605		struct lwtunnel_state *lwtstate;
   606		struct net_device *dev = NULL;
   607		int ret, result = 0;
   608	
   609		if (encap_type == LWTUNNEL_ENCAP_NONE)
   610			return 0;
   611	
   612		if (oif)
   613			dev = __dev_get_by_index(net, oif);
 > 614		ret = lwtunnel_build_state(net, dev, encap_type, encap,
   615					   AF_INET, cfg, &lwtstate);
   616		if (!ret) {
   617			result = lwtunnel_cmp_encap(lwtstate, nh->nh_lwtstate);
   618			lwtstate_free(lwtstate);
   619		}
   620	
   621		return result;
   622	}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 12345 bytes --]

^ permalink raw reply

* Re: [Intel-wired-lan] [net-next PATCH v3 1/3] e1000: track BQL bytes regardless of skb or not
From: Alexei Starovoitov @ 2016-09-15  0:43 UTC (permalink / raw)
  To: Brown, Aaron F
  Cc: John Fastabend, bblanco@plumgrid.com, Kirsher, Jeffrey T,
	brouer@redhat.com, davem@davemloft.net, xiyou.wangcong@gmail.com,
	intel-wired-lan@lists.osuosl.org, u9012063@gmail.com,
	netdev@vger.kernel.org
In-Reply-To: <309B89C4C689E141A5FF6A0C5FB2118B81F90268@ORSMSX101.amr.corp.intel.com>

On Wed, Sep 14, 2016 at 11:57:24PM +0000, Brown, Aaron F wrote:
> > From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On
> > Behalf Of John Fastabend
> > Sent: Monday, September 12, 2016 3:13 PM
> > To: bblanco@plumgrid.com; john.fastabend@gmail.com;
> > alexei.starovoitov@gmail.com; Kirsher, Jeffrey T
> > <jeffrey.t.kirsher@intel.com>; brouer@redhat.com; davem@davemloft.net
> > Cc: xiyou.wangcong@gmail.com; intel-wired-lan@lists.osuosl.org;
> > u9012063@gmail.com; netdev@vger.kernel.org
> > Subject: [Intel-wired-lan] [net-next PATCH v3 1/3] e1000: track BQL bytes
> > regardless of skb or not
> > 
> > The BQL API does not reference the sk_buff nor does the driver need to
> > reference the sk_buff to calculate the length of a transmitted frame.
> > This patch removes an sk_buff reference from the xmit irq path and
> > also allows packets sent from XDP to use BQL.
> > 
> > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> > ---
> >  drivers/net/ethernet/intel/e1000/e1000_main.c |    7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> This patch is causing all my e1000 adapters to fail a simple ftp session with really slow response (hashing on) followed by adapter resets.  I have seen this on 4 different e1000 nics now, an 82543GC, 82544GC, 82546EB and an 82545GM.  On a few occasions I get a splat captured to dmesg.  Here is an example:
> --------------------------------
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x1c2/0x1d0
> NETDEV WATCHDOG: eth1 (e1000): transmit queue 0 timed out

Thanks a lot for the tests! Really appreciate it.

^ permalink raw reply

* Re: [PATCH V2 3/3] net-next: dsa: add new driver for qca8xxx family
From: Andrew Lunn @ 2016-09-15  1:12 UTC (permalink / raw)
  To: John Crispin
  Cc: David S. Miller, Florian Fainelli, netdev, linux-kernel,
	qsdk-review
In-Reply-To: <1473849542-3298-4-git-send-email-john@phrozen.org>

On Wed, Sep 14, 2016 at 12:39:02PM +0200, John Crispin wrote:
> This patch contains initial support for the QCA8337 switch. It
> will detect a QCA8337 switch, if present and declared in the DT.
> 
> Each port will be represented through a standalone net_device interface,
> as for other DSA switches. CPU can communicate with any of the ports by
> setting an IP@ on ethN interface. Most of the extra callbacks of the DSA
> subsystem are already supported, such as bridge offloading, stp, fdb.
> 
> Signed-off-by: John Crispin <john@phrozen.org>
> ---
> Changes in V2
> * add proper locking for the FDB table
> * remove udelay when changing the page. neither datasheet nore SDK code
>   requires a sleep
> * add a cond_resched to the busy wait loop
> * use nested locking when accessing the mdio bus
> * remove the phy_to_port() wrappers
> * remove mmd access function and use existing phy helpers
> * fix a copy/paste bug breaking the eee callbacks
> * use default vid 1 when fdb entries are added fro vid 0
> * remove the phy id check and add a switch id check instead
> * add error handling to the mdio read/write functions
> * remove inline usage

Hi John

Thanks for the changes, it looks a lot better.

> +static u16 qca8k_current_page = 0xffff;
> +
> +static struct
> +qca8k_priv *qca8k_to_priv(struct dsa_switch *ds)
> +{
> +	struct qca8k_priv *priv = ds->priv;
> +
> +	return priv;
> +}

https://lkml.org/lkml/2016/8/31/936

In this patch, Vivien removed all the callers to ds_to_priv(). Which
is what this function is. Please just use ds->priv.

> +
> +static void
> +qca8k_fdb_flush(struct qca8k_priv *priv)
> +{
> +	mutex_lock(&priv->fdb_mutex);
> +	qca8k_fdb_access(priv, QCA8K_FDB_FLUSH, -1);
> +	mutex_unlock(&priv->fdb_mutex);
> +}

So this protects the fdb. But should this mutex be more general. Take for example:

> +qca8k_eee_enable_set(struct dsa_switch *ds, int port, bool enable)
> +{
> +	struct qca8k_priv *priv = qca8k_to_priv(ds);
> +	u32 lpi_en = QCA8K_REG_EEE_CTRL_LPI_EN(port);
> +	u32 reg;
> +
> +	reg = qca8k_read(priv, QCA8K_REG_EEE_CTRL);
> +	if (enable)
> +		reg |= lpi_en;
> +	else
> +		reg &= ~lpi_en;
> +	qca8k_write(priv, QCA8K_REG_EEE_CTRL, reg);
> +}
> +

Here you do a read/modify/write. Could this be called on two
interfaces at the same time? I think you might want this protected as
well by a mutex. So maybe rename fdb_mutex to something more generic
and use it in places like this as well?

> +static int
> +qca8k_setup(struct dsa_switch *ds)
> +{
> +	struct qca8k_priv *priv = qca8k_to_priv(ds);
> +	int ret, i, phy_mode = -1;
> +
> +	/* Start by setting up the register mapping */
> +	priv->regmap = devm_regmap_init(ds->dev, NULL, priv,
> +					&qca8k_regmap_config);
> +
> +	if (IS_ERR(priv->regmap))
> +		pr_warn("regmap initialization failed");
> +
> +	/* Initialize CPU port pad mode (xMII type, delays...) */
> +	phy_mode = of_get_phy_mode(ds->ports[ds->dst->cpu_port].dn);
> +	if (phy_mode < 0) {
> +		pr_err("Can't find phy-mode for master device\n");
> +		return phy_mode;
> +	}
> +	ret = qca8k_set_pad_ctrl(priv, QCA8K_CPU_PORT, phy_mode);
> +	if (ret < 0)
> +		return ret;

Maybe add a check here that dsa_is_cpu_port(ds, 0) is true?  If you
say the CPU port has to be 0, it should be checked for.

> +	/* Enable CPU Port */
> +	qca8k_reg_set(priv, QCA8K_REG_GLOBAL_FW_CTRL0,
> +		      QCA8K_GLOBAL_FW_CTRL0_CPU_PORT_EN);
> +
> +	/* Enable MIB counters */
> +	qca8k_reg_set(priv, QCA8K_REG_MIB, QCA8K_MIB_CPU_KEEP);
> +	qca8k_write(priv, QCA8K_REG_MODULE_EN, QCA8K_MODULE_EN_MIB);

Will this implicitly clear the MIB counters? They should be set to
zero, otherwise they can survive a reboot of the host.

> +
> +	/* Setup connection between CPU ports & PHYs */

You missed a few "PHY". We tend to call such ports user ports, or
external ports.

> +	for (i = 0; i < DSA_MAX_PORTS; i++) {
> +		/* CPU port gets connected to all PHYs in the switch */
> +		if (dsa_is_cpu_port(ds, i)) {
> +			qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(QCA8K_CPU_PORT),
> +				  QCA8K_PORT_LOOKUP_MEMBER,
> +				  ds->enabled_port_mask);
> +		}
> +
> +		/* Invividual PHYs gets connected to CPU port only */
> +		if (ds->enabled_port_mask & BIT(i)) {
> +			int shift = 16 * (i % 2);
> +
> +			qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(i),
> +				  QCA8K_PORT_LOOKUP_MEMBER,
> +				  BIT(QCA8K_CPU_PORT));
> +
> +			/* Enable ARP Auto-learning by default */
> +			qca8k_reg_set(priv, QCA8K_PORT_LOOKUP_CTRL(i),
> +				      QCA8K_PORT_LOOKUP_LEARN);
> +
> +			/* For port based vlans to work we need to set the
> +			 * default egress vid
> +			 */
> +			qca8k_rmw(priv, QCA8K_EGRESS_VLAN(i),
> +				  0xffff << shift, 1 << shift);
> +			qca8k_write(priv, QCA8K_REG_PORT_VLAN_CTRL0(i),
> +				    QCA8K_PORT_VLAN_CVID(1) |
> +				    QCA8K_PORT_VLAN_SVID(1));
> +		}
> +	}
> +
> +	/* Flush the FDB table */
> +	qca8k_fdb_flush(priv);
> +
> +	return 0;
> +}
> +
> +static int
> +qca8k_set_addr(struct dsa_switch *ds, u8 *addr)
> +{
> +	/* The subsystem always calls this function so add an empty stub */
> +	return 0;
> +}

The b53/bcm_sf2 driver also has a NOP for set_addr(). Maybe it is time
to make it optional. But that is a separate patchset. This is O.K.

> +static void
> +qca8k_get_strings(struct dsa_switch *ds, int phy, uint8_t *data)

In general, please can you not use int phy, when the prototype is int
port. This is a bad example, since phy/port is not actually used,
but... In fact, it does not happen so often, so maybe you just missed
this one?

> +static int
> +qca8k_fdb_prepare(struct dsa_switch *ds, int port,
> +		  const struct switchdev_obj_port_fdb *fdb,
> +		  struct switchdev_trans *trans)
> +{
> +	/* We do not need to do anything specific here yet */
> +	return 0;
> +}

Does this mean you can store an infinite number of fdb entries?

> +static void
> +qca8k_sw_remove(struct mdio_device *mdiodev)
> +{
> +	struct qca8k_priv *priv = dev_get_drvdata(&mdiodev->dev);
> +
> +	dsa_unregister_switch(priv->ds);
> +}

This has the same problem as the mv88e6xxx driver. You should disable
all the ports when removing the driver. Something on my TODO list...

> +
> +#ifdef CONFIG_PM_SLEEP
> +static int qca8k_suspend(struct device *dev)
> +{
> +	struct platform_device *pdev = to_platform_device(dev);
> +	struct qca8k_priv *priv = platform_get_drvdata(pdev);
> +
> +	return dsa_switch_suspend(priv->ds);
> +}
> +
> +static int qca8k_resume(struct device *dev)
> +{
> +	struct platform_device *pdev = to_platform_device(dev);
> +	struct qca8k_priv *priv = platform_get_drvdata(pdev);
> +
> +	return dsa_switch_resume(priv->ds);
> +}
> +#endif /* CONFIG_PM_SLEEP */

The bcm_sf2 driver disables the port on suspend, and re-enables them
on resume. You should probably do the same.

Thanks

   Andrew

^ permalink raw reply

* Re: [RFC v3 19/22] landlock: Add interrupted origin
From: Andy Lutomirski @ 2016-09-15  1:19 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry,
	kernel-hardening-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8@public.gmane.org,
	Linux API <linu
In-Reply-To: <57D9CBD3.7030100-WFhQfpSGs3bR7s880joybQ@public.gmane.org>

On Wed, Sep 14, 2016 at 3:14 PM, Mickaël Salaün <mic-WFhQfpSGs3bR7s880joybQ@public.gmane.org> wrote:
>
> On 14/09/2016 20:29, Andy Lutomirski wrote:
>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>> This third origin of hook call should cover all possible trigger paths
>>> (e.g. page fault). Landlock eBPF programs can then take decisions
>>> accordingly.
>>>
>>> Signed-off-by: Mickaël Salaün <mic-WFhQfpSGs3bR7s880joybQ@public.gmane.org>
>>> Cc: Alexei Starovoitov <ast-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>>> Cc: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
>>> Cc: Daniel Borkmann <daniel-FeC+5ew28dpmcu3hnIyYJQ@public.gmane.org>
>>> Cc: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>>> ---
>>
>>
>>>
>>> +       if (unlikely(in_interrupt())) {
>>
>> IMO security hooks have no business being called from interrupts.
>> Aren't they all synchronous things done by tasks?  Interrupts are
>> driver things.
>>
>> Are you trying to check for page faults and such?
>
> Yes, that was the idea you did put in my mind. Not sure how to deal with
> this.
>

It's not so easy, unfortunately.  The easiest reliable way might be to
set a TS_ flag on all syscall entries when TIF_SECCOMP or similar is
set.

--Andy

^ permalink raw reply

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
From: Andy Lutomirski @ 2016-09-15  1:25 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry,
	kernel-hardening-ZwoEplunGu1jrUoiu81ncdBPR1lH4CV8@public.gmane.org,
	Linux API <linu
In-Reply-To: <57D9CB25.1010103-WFhQfpSGs3bR7s880joybQ@public.gmane.org>

On Wed, Sep 14, 2016 at 3:11 PM, Mickaël Salaün <mic-WFhQfpSGs3bR7s880joybQ@public.gmane.org> wrote:
>
> On 14/09/2016 20:27, Andy Lutomirski wrote:
>> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>>> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
>>> set for all cgroup except the root. The flag is clear when a new process
>>> without the no_new_privs flags is attached to the cgroup.
>>>
>>> If a cgroup is landlocked, then any new attempt, from an unprivileged
>>> process, to attach a process without no_new_privs to this cgroup will
>>> be denied.
>>
>> Until and unless everyone can agree on a way to properly namespace,
>> delegate, etc cgroups, I think that trying to add unprivileged
>> semantics to cgroups is nuts.  Given the big thread about cgroup v2,
>> no-internal-tasks, etc, I just don't see how this approach can be
>> viable.
>
> As far as I can tell, the no_new_privs flag of at task is not related to
> namespaces. The CGRP_NO_NEW_PRIVS flag is only a cache to quickly access
> the no_new_privs property of *tasks* in a cgroup. The semantic is unchanged.
>
> Using cgroup is optional, any task could use the seccomp-based
> landlocking instead. However, for those that want/need to manage a
> security policy in a more dynamic way, using cgroups may make sense.
>
> I though cgroup delegation was OK in the v2, isn't it the case? Do you
> have some links?
>
>>
>> Can we try to make landlock work completely independently of cgroups
>> so that it doesn't get stuck and so that programs can use it without
>> worrying about cgroup v1 vs v2, interactions with cgroup managers,
>> cgroup managers that (supposedly?) will start migrating processes
>> around piecemeal and almost certainly blowing up landlock in the
>> process, etc?
>
> This RFC handle both cgroup and seccomp approaches in a similar way. I
> don't see why building on top of cgroup v2 is a problem. Is there
> security issues with delegation?

What I mean is: cgroup v2 delegation has a functionality problem.
Tejun says [1]:

We haven't had to face this decision because cgroup has never properly
supported delegating to applications and the in-use setups where this
happens are custom configurations where there is no boundary between
system and applications and adhoc trial-and-error is good enough a way
to find a working solution.  That wiggle room goes away once we
officially open this up to individual applications.

Unless and until that changes, I think that landlock should stay away
from cgroups.  Others could reasonably disagree with me.


[1] https://lkml.kernel.org/g/<20160909225747.GA30105-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org

^ permalink raw reply

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
From: Alexei Starovoitov @ 2016-09-15  2:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, linux-kernel@vger.kernel.org,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry, kernel-hardening@lists.openwall.com, Linux API,
	LSM List
In-Reply-To: <CALCETrVjyLaL-0H1AFsfYUtDGA8NSn4R8LkvBMQT7Gpmxeswgg@mail.gmail.com>

On Wed, Sep 14, 2016 at 06:25:07PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 3:11 PM, Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On 14/09/2016 20:27, Andy Lutomirski wrote:
> >> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
> >>> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
> >>> set for all cgroup except the root. The flag is clear when a new process
> >>> without the no_new_privs flags is attached to the cgroup.
> >>>
> >>> If a cgroup is landlocked, then any new attempt, from an unprivileged
> >>> process, to attach a process without no_new_privs to this cgroup will
> >>> be denied.
> >>
> >> Until and unless everyone can agree on a way to properly namespace,
> >> delegate, etc cgroups, I think that trying to add unprivileged
> >> semantics to cgroups is nuts.  Given the big thread about cgroup v2,
> >> no-internal-tasks, etc, I just don't see how this approach can be
> >> viable.
> >
> > As far as I can tell, the no_new_privs flag of at task is not related to
> > namespaces. The CGRP_NO_NEW_PRIVS flag is only a cache to quickly access
> > the no_new_privs property of *tasks* in a cgroup. The semantic is unchanged.
> >
> > Using cgroup is optional, any task could use the seccomp-based
> > landlocking instead. However, for those that want/need to manage a
> > security policy in a more dynamic way, using cgroups may make sense.
> >
> > I though cgroup delegation was OK in the v2, isn't it the case? Do you
> > have some links?
> >
> >>
> >> Can we try to make landlock work completely independently of cgroups
> >> so that it doesn't get stuck and so that programs can use it without
> >> worrying about cgroup v1 vs v2, interactions with cgroup managers,
> >> cgroup managers that (supposedly?) will start migrating processes
> >> around piecemeal and almost certainly blowing up landlock in the
> >> process, etc?
> >
> > This RFC handle both cgroup and seccomp approaches in a similar way. I
> > don't see why building on top of cgroup v2 is a problem. Is there
> > security issues with delegation?
> 
> What I mean is: cgroup v2 delegation has a functionality problem.
> Tejun says [1]:
> 
> We haven't had to face this decision because cgroup has never properly
> supported delegating to applications and the in-use setups where this
> happens are custom configurations where there is no boundary between
> system and applications and adhoc trial-and-error is good enough a way
> to find a working solution.  That wiggle room goes away once we
> officially open this up to individual applications.
> 
> Unless and until that changes, I think that landlock should stay away
> from cgroups.  Others could reasonably disagree with me.

Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
and not for sandboxing. So the above doesn't matter in such contexts.
lsm hooks + cgroups provide convenient scope and existing entry points.
Please see checmate examples how it's used.

^ permalink raw reply

* [PATCH -next] net: dsa: bcm_sf2: Fix non static symbol warning
From: Wei Yongjun @ 2016-09-15  2:24 UTC (permalink / raw)
  To: David S . Miller, Florian Fainelli, Vivien Didelot, Andrew Lunn,
	Jiri Pirko
  Cc: Wei Yongjun, netdev

From: Wei Yongjun <weiyongjun1@huawei.com>

Fixes the following sparse warning:

drivers/net/dsa/bcm_sf2.c:963:19: warning:
 symbol 'bcm_sf2_io_ops' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/dsa/bcm_sf2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 51f1fc0..f8bb5a5 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -960,7 +960,7 @@ static int bcm_sf2_core_write64(struct b53_device *dev, u8 page, u8 reg,
 	return 0;
 }
 
-struct b53_io_ops bcm_sf2_io_ops = {
+static struct b53_io_ops bcm_sf2_io_ops = {
 	.read8	= bcm_sf2_core_read8,
 	.read16	= bcm_sf2_core_read16,
 	.read32	= bcm_sf2_core_read32,

^ permalink raw reply related

* [PATCH -next] net: dsa: b53: Remove unused including <linux/version.h>
From: Wei Yongjun @ 2016-09-15  2:24 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Wei Yongjun, netdev

From: Wei Yongjun <weiyongjun1@huawei.com>

Remove including <linux/version.h> that don't need it.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/dsa/b53/b53_priv.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index 76672da..f192a67 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -372,7 +372,6 @@ static inline void b53_arl_from_entry(u64 *mac_vid, u32 *fwd_entry,
 
 #ifdef CONFIG_BCM47XX
 
-#include <linux/version.h>
 #include <linux/bcm47xx_nvram.h>
 #include <bcm47xx_board.h>
 static inline int b53_switch_get_reset_gpio(struct b53_device *dev)

^ permalink raw reply related

* [PATCH -next] net: emac: remove unnecessary dev_set_drvdata()
From: Wei Yongjun @ 2016-09-15  2:25 UTC (permalink / raw)
  To: Timur Tabi; +Cc: Wei Yongjun, netdev

From: Wei Yongjun <weiyongjun1@huawei.com>

The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/ethernet/qualcomm/emac/emac.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
index 56e0a9f..42d2d233 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac.c
@@ -722,7 +722,6 @@ static int emac_remove(struct platform_device *pdev)
 
 	mdiobus_unregister(adpt->mii_bus);
 	free_netdev(netdev);
-	dev_set_drvdata(&pdev->dev, NULL);
 
 	return 0;
 }

^ permalink raw reply related

* [PATCH -next] net: emac: remove .owner field for driver
From: Wei Yongjun @ 2016-09-15  2:26 UTC (permalink / raw)
  To: Timur Tabi; +Cc: Wei Yongjun, netdev

From: Wei Yongjun <weiyongjun1@huawei.com>

Remove .owner field if calls are used which set it automatically.

Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/ethernet/qualcomm/emac/emac.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
index 56e0a9f..764e7b4b 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac.c
@@ -731,7 +731,6 @@ static struct platform_driver emac_platform_driver = {
 	.probe	= emac_probe,
 	.remove	= emac_remove,
 	.driver = {
-		.owner		= THIS_MODULE,
 		.name		= "qcom-emac",
 		.of_match_table = emac_dt_match,
 	},

^ permalink raw reply related

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
From: Andy Lutomirski @ 2016-09-15  2:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mickaël Salaün, linux-kernel@vger.kernel.org,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry
In-Reply-To: <20160915021940.GA65119@ast-mbp.thefacebook.com>

On Wed, Sep 14, 2016 at 7:19 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Wed, Sep 14, 2016 at 06:25:07PM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 14, 2016 at 3:11 PM, Mickaël Salaün <mic@digikod.net> wrote:
>> >
>> > On 14/09/2016 20:27, Andy Lutomirski wrote:
>> >> On Wed, Sep 14, 2016 at 12:24 AM, Mickaël Salaün <mic@digikod.net> wrote:
>> >>> Add a new flag CGRP_NO_NEW_PRIVS for each cgroup. This flag is initially
>> >>> set for all cgroup except the root. The flag is clear when a new process
>> >>> without the no_new_privs flags is attached to the cgroup.
>> >>>
>> >>> If a cgroup is landlocked, then any new attempt, from an unprivileged
>> >>> process, to attach a process without no_new_privs to this cgroup will
>> >>> be denied.
>> >>
>> >> Until and unless everyone can agree on a way to properly namespace,
>> >> delegate, etc cgroups, I think that trying to add unprivileged
>> >> semantics to cgroups is nuts.  Given the big thread about cgroup v2,
>> >> no-internal-tasks, etc, I just don't see how this approach can be
>> >> viable.
>> >
>> > As far as I can tell, the no_new_privs flag of at task is not related to
>> > namespaces. The CGRP_NO_NEW_PRIVS flag is only a cache to quickly access
>> > the no_new_privs property of *tasks* in a cgroup. The semantic is unchanged.
>> >
>> > Using cgroup is optional, any task could use the seccomp-based
>> > landlocking instead. However, for those that want/need to manage a
>> > security policy in a more dynamic way, using cgroups may make sense.
>> >
>> > I though cgroup delegation was OK in the v2, isn't it the case? Do you
>> > have some links?
>> >
>> >>
>> >> Can we try to make landlock work completely independently of cgroups
>> >> so that it doesn't get stuck and so that programs can use it without
>> >> worrying about cgroup v1 vs v2, interactions with cgroup managers,
>> >> cgroup managers that (supposedly?) will start migrating processes
>> >> around piecemeal and almost certainly blowing up landlock in the
>> >> process, etc?
>> >
>> > This RFC handle both cgroup and seccomp approaches in a similar way. I
>> > don't see why building on top of cgroup v2 is a problem. Is there
>> > security issues with delegation?
>>
>> What I mean is: cgroup v2 delegation has a functionality problem.
>> Tejun says [1]:
>>
>> We haven't had to face this decision because cgroup has never properly
>> supported delegating to applications and the in-use setups where this
>> happens are custom configurations where there is no boundary between
>> system and applications and adhoc trial-and-error is good enough a way
>> to find a working solution.  That wiggle room goes away once we
>> officially open this up to individual applications.
>>
>> Unless and until that changes, I think that landlock should stay away
>> from cgroups.  Others could reasonably disagree with me.
>
> Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
> and not for sandboxing. So the above doesn't matter in such contexts.
> lsm hooks + cgroups provide convenient scope and existing entry points.
> Please see checmate examples how it's used.
>

To be clear: I'm not arguing at all that there shouldn't be
bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
landlock interface shouldn't expose any cgroup integration, at least
until the cgroup situation settles down a lot.

--Andy

^ permalink raw reply

* Re: [PATCH -next] net: dsa: bcm_sf2: Fix non static symbol warning
From: Florian Fainelli @ 2016-09-15  2:46 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: David S . Miller, Vivien Didelot, Andrew Lunn, Jiri Pirko,
	Wei Yongjun, netdev
In-Reply-To: <1473906253-1424-1-git-send-email-weiyj.lk@gmail.com>

2016-09-14 19:24 GMT-07:00 Wei Yongjun <weiyj.lk@gmail.com>:
> From: Wei Yongjun <weiyongjun1@huawei.com>
>
> Fixes the following sparse warning:
>
> drivers/net/dsa/bcm_sf2.c:963:19: warning:
>  symbol 'bcm_sf2_io_ops' was not declared. Should it be static?
>
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* Re: [PATCH -next] net: dsa: b53: Remove unused including <linux/version.h>
From: Florian Fainelli @ 2016-09-15  2:46 UTC (permalink / raw)
  To: Wei Yongjun; +Cc: Wei Yongjun, netdev
In-Reply-To: <1473906277-1828-1-git-send-email-weiyj.lk@gmail.com>

2016-09-14 19:24 GMT-07:00 Wei Yongjun <weiyj.lk@gmail.com>:
> From: Wei Yongjun <weiyongjun1@huawei.com>
>
> Remove including <linux/version.h> that don't need it.
>
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>

^ permalink raw reply

* Re: [PATCH 0/5] Make /sys/class/net per net namespace objects belong to container
From: Dmitry Torokhov @ 2016-09-15  3:24 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: David Miller, Tejun Heo, lkml, netdev
In-Reply-To: <87zinvpx5s.fsf@x220.int.ebiederm.org>

On Mon, Aug 29, 2016 at 5:38 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> David Miller <davem@davemloft.net> writes:
>
>> From: Dmitry Torokhov <dmitry.torokhov@gmail.com>
>> Date: Tue, 16 Aug 2016 15:33:10 -0700
>>
>>> There are objects in /sys hierarchy (/sys/class/net/) that logically belong
>>> to a namespace/container. Unfortunately all sysfs objects start their life
>>> belonging to global root, and while we could change ownership manually,
>>> keeping tracks of all objects that come and go is cumbersome. It would
>>> be better if kernel created them using correct uid/gid from the beginning.
>>>
>>> This series changes kernfs to allow creating object's with arbitrary
>>> uid/gid, adds get_ownership() callback to ktype structure so subsystems
>>> could supply their own logic (likely tied to namespace support) for
>>> determining ownership of kobjects, and adjusts sysfs code to make use of
>>> this information. Lastly net-sysfs is adjusted to make sure that objects in
>>> net namespace are owned by the root user from the owning user namespace.
>>>
>>> Note that we do not adjust ownership of objects moved into a new namespace
>>> (as when moving a network device into a container) as userspace can easily
>>> do it.
>>
>> I need some domain experts to review this series please.
>
> I just came back from vacation and I will aim to take a look shortly.
>
> The big picture idea seems sensible.  Having a better ownship of sysfs
> files that are part of a network namespace.  I will have to look at the
> details to see if the implementation is similarly sensible.

Eric,

Did you find anything objectionable in the series or should I fix up
the !CONFIG_SYSFS error in networking patch and resubmit?

Thanks.

-- 
Dmitry

^ permalink raw reply

* [PATCH] nfp: fix error return code in nfp_net_netdev_open()
From: Wei Yongjun @ 2016-09-15  3:45 UTC (permalink / raw)
  To: Jakub Kicinski, Rolf Neugebauer, Dinan Gunawardena, Simon Horman
  Cc: Wei Yongjun, oss-drivers, netdev

From: Wei Yongjun <weiyongjun1@huawei.com>

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: 73725d9dfd99 ("nfp: allocate ring SW structs dynamically")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 252e492..39dadfc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2044,12 +2044,16 @@ static int nfp_net_netdev_open(struct net_device *netdev)
 
 	nn->rx_rings = kcalloc(nn->num_rx_rings, sizeof(*nn->rx_rings),
 			       GFP_KERNEL);
-	if (!nn->rx_rings)
+	if (!nn->rx_rings) {
+		err = -ENOMEM;
 		goto err_free_lsc;
+	}
 	nn->tx_rings = kcalloc(nn->num_tx_rings, sizeof(*nn->tx_rings),
 			       GFP_KERNEL);
-	if (!nn->tx_rings)
+	if (!nn->tx_rings) {
+		err = -ENOMEM;
 		goto err_free_rx_rings;
+	}
 
 	for (r = 0; r < nn->num_r_vecs; r++) {
 		err = nfp_net_prepare_vector(nn, &nn->r_vecs[r], r);

^ permalink raw reply related

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
From: Alexei Starovoitov @ 2016-09-15  4:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, linux-kernel@vger.kernel.org,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry
In-Reply-To: <CALCETrWXjJZZRj5XvDQ+-Grue+b4MW2TFKsfgYYFYoFBFVH71g@mail.gmail.com>

On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
> >> >
> >> > This RFC handle both cgroup and seccomp approaches in a similar way. I
> >> > don't see why building on top of cgroup v2 is a problem. Is there
> >> > security issues with delegation?
> >>
> >> What I mean is: cgroup v2 delegation has a functionality problem.
> >> Tejun says [1]:
> >>
> >> We haven't had to face this decision because cgroup has never properly
> >> supported delegating to applications and the in-use setups where this
> >> happens are custom configurations where there is no boundary between
> >> system and applications and adhoc trial-and-error is good enough a way
> >> to find a working solution.  That wiggle room goes away once we
> >> officially open this up to individual applications.
> >>
> >> Unless and until that changes, I think that landlock should stay away
> >> from cgroups.  Others could reasonably disagree with me.
> >
> > Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
> > and not for sandboxing. So the above doesn't matter in such contexts.
> > lsm hooks + cgroups provide convenient scope and existing entry points.
> > Please see checmate examples how it's used.
> >
> 
> To be clear: I'm not arguing at all that there shouldn't be
> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
> landlock interface shouldn't expose any cgroup integration, at least
> until the cgroup situation settles down a lot.

ahh. yes. we're perfectly in agreement here.
I'm suggesting that the next RFC shouldn't include unpriv
and seccomp at all. Once bpf+lsm+cgroup is merged, we can
argue about unpriv with cgroups and even unpriv as a whole,
since it's not a given. Seccomp integration is also questionable.
I'd rather not have seccomp as a gate keeper for this lsm.
lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
don't have one to one relationship, so mixing them up is only
asking for trouble further down the road.
If we really need to carry some information from seccomp to lsm+bpf,
it's easier to add eBPF support to seccomp and let bpf side deal
with passing whatever information. 

^ permalink raw reply

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
From: Andy Lutomirski @ 2016-09-15  4:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mickaël Salaün,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry
In-Reply-To: <20160915040054.GA65308-+o4/htvd0TDFYCXBM6kdu7fOX0fSgVTm@public.gmane.org>

On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
<alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
>> >> >
>> >> > This RFC handle both cgroup and seccomp approaches in a similar way. I
>> >> > don't see why building on top of cgroup v2 is a problem. Is there
>> >> > security issues with delegation?
>> >>
>> >> What I mean is: cgroup v2 delegation has a functionality problem.
>> >> Tejun says [1]:
>> >>
>> >> We haven't had to face this decision because cgroup has never properly
>> >> supported delegating to applications and the in-use setups where this
>> >> happens are custom configurations where there is no boundary between
>> >> system and applications and adhoc trial-and-error is good enough a way
>> >> to find a working solution.  That wiggle room goes away once we
>> >> officially open this up to individual applications.
>> >>
>> >> Unless and until that changes, I think that landlock should stay away
>> >> from cgroups.  Others could reasonably disagree with me.
>> >
>> > Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
>> > and not for sandboxing. So the above doesn't matter in such contexts.
>> > lsm hooks + cgroups provide convenient scope and existing entry points.
>> > Please see checmate examples how it's used.
>> >
>>
>> To be clear: I'm not arguing at all that there shouldn't be
>> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
>> landlock interface shouldn't expose any cgroup integration, at least
>> until the cgroup situation settles down a lot.
>
> ahh. yes. we're perfectly in agreement here.
> I'm suggesting that the next RFC shouldn't include unpriv
> and seccomp at all. Once bpf+lsm+cgroup is merged, we can
> argue about unpriv with cgroups and even unpriv as a whole,
> since it's not a given. Seccomp integration is also questionable.
> I'd rather not have seccomp as a gate keeper for this lsm.
> lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
> don't have one to one relationship, so mixing them up is only
> asking for trouble further down the road.
> If we really need to carry some information from seccomp to lsm+bpf,
> it's easier to add eBPF support to seccomp and let bpf side deal
> with passing whatever information.
>

As an argument for keeping seccomp (or an extended seccomp) as the
interface for an unprivileged bpf+lsm: seccomp already checks off most
of the boxes for safely letting unprivileged programs sandbox
themselves.  Furthermore, to the extent that there are use cases for
unprivileged bpf+lsm that *aren't* expressible within the seccomp
hierarchy, I suspect that syscall filters have exactly the same
problem and that we should fix seccomp to cover it.

If I ever add a "seccomp monitor", which is something I want to do
eventually, I think it should work for lsm+bpf as well, which is
another argument for keeping it in seccomp.

--Andy

^ permalink raw reply

* Re: [Intel-wired-lan] [net-next PATCH v3 1/3] e1000: track BQL bytes regardless of skb or not
From: John Fastabend @ 2016-09-15  4:22 UTC (permalink / raw)
  To: Alexei Starovoitov, Brown, Aaron F
  Cc: bblanco@plumgrid.com, Kirsher, Jeffrey T, brouer@redhat.com,
	davem@davemloft.net, xiyou.wangcong@gmail.com,
	intel-wired-lan@lists.osuosl.org, u9012063@gmail.com,
	netdev@vger.kernel.org
In-Reply-To: <20160915004353.GA63116@ast-mbp.thefacebook.com>

On 16-09-14 05:43 PM, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 11:57:24PM +0000, Brown, Aaron F wrote:
>>> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On
>>> Behalf Of John Fastabend
>>> Sent: Monday, September 12, 2016 3:13 PM
>>> To: bblanco@plumgrid.com; john.fastabend@gmail.com;
>>> alexei.starovoitov@gmail.com; Kirsher, Jeffrey T
>>> <jeffrey.t.kirsher@intel.com>; brouer@redhat.com; davem@davemloft.net
>>> Cc: xiyou.wangcong@gmail.com; intel-wired-lan@lists.osuosl.org;
>>> u9012063@gmail.com; netdev@vger.kernel.org
>>> Subject: [Intel-wired-lan] [net-next PATCH v3 1/3] e1000: track BQL bytes
>>> regardless of skb or not
>>>
>>> The BQL API does not reference the sk_buff nor does the driver need to
>>> reference the sk_buff to calculate the length of a transmitted frame.
>>> This patch removes an sk_buff reference from the xmit irq path and
>>> also allows packets sent from XDP to use BQL.
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>> ---
>>>  drivers/net/ethernet/intel/e1000/e1000_main.c |    7 ++-----
>>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>
>> This patch is causing all my e1000 adapters to fail a simple ftp session with really slow response (hashing on) followed by adapter resets.  I have seen this on 4 different e1000 nics now, an 82543GC, 82544GC, 82546EB and an 82545GM.  On a few occasions I get a splat captured to dmesg.  Here is an example:
>> --------------------------------
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x1c2/0x1d0
>> NETDEV WATCHDOG: eth1 (e1000): transmit queue 0 timed out
> 
> Thanks a lot for the tests! Really appreciate it.
> 

Thanks.

Jeff, please drop the series for now obviously this wont work. It needs
some work.

^ permalink raw reply

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
From: Alexei Starovoitov @ 2016-09-15  4:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry
In-Reply-To: <CALCETrXTS8R1E2b9mmWzpOO6QOW5nWYW_RQRJYU1CGRsbNy+Yw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
> <alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
> >> >> >
> >> >> > This RFC handle both cgroup and seccomp approaches in a similar way. I
> >> >> > don't see why building on top of cgroup v2 is a problem. Is there
> >> >> > security issues with delegation?
> >> >>
> >> >> What I mean is: cgroup v2 delegation has a functionality problem.
> >> >> Tejun says [1]:
> >> >>
> >> >> We haven't had to face this decision because cgroup has never properly
> >> >> supported delegating to applications and the in-use setups where this
> >> >> happens are custom configurations where there is no boundary between
> >> >> system and applications and adhoc trial-and-error is good enough a way
> >> >> to find a working solution.  That wiggle room goes away once we
> >> >> officially open this up to individual applications.
> >> >>
> >> >> Unless and until that changes, I think that landlock should stay away
> >> >> from cgroups.  Others could reasonably disagree with me.
> >> >
> >> > Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
> >> > and not for sandboxing. So the above doesn't matter in such contexts.
> >> > lsm hooks + cgroups provide convenient scope and existing entry points.
> >> > Please see checmate examples how it's used.
> >> >
> >>
> >> To be clear: I'm not arguing at all that there shouldn't be
> >> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
> >> landlock interface shouldn't expose any cgroup integration, at least
> >> until the cgroup situation settles down a lot.
> >
> > ahh. yes. we're perfectly in agreement here.
> > I'm suggesting that the next RFC shouldn't include unpriv
> > and seccomp at all. Once bpf+lsm+cgroup is merged, we can
> > argue about unpriv with cgroups and even unpriv as a whole,
> > since it's not a given. Seccomp integration is also questionable.
> > I'd rather not have seccomp as a gate keeper for this lsm.
> > lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
> > don't have one to one relationship, so mixing them up is only
> > asking for trouble further down the road.
> > If we really need to carry some information from seccomp to lsm+bpf,
> > it's easier to add eBPF support to seccomp and let bpf side deal
> > with passing whatever information.
> >
> 
> As an argument for keeping seccomp (or an extended seccomp) as the
> interface for an unprivileged bpf+lsm: seccomp already checks off most
> of the boxes for safely letting unprivileged programs sandbox
> themselves.  

you mean the attach part of seccomp syscall that deals with no_new_priv?
sure, that's reusable.

> Furthermore, to the extent that there are use cases for
> unprivileged bpf+lsm that *aren't* expressible within the seccomp
> hierarchy, I suspect that syscall filters have exactly the same
> problem and that we should fix seccomp to cover it.

not sure what you mean by 'seccomp hierarchy'. The normal process
hierarchy ?
imo the main deficiency of secccomp is inability to look into arguments.
One can argue that it's a blessing, since composite args
are not yet copied into the kernel memory.
But in a lot of cases the seccomp arguments are FDs pointing
to kernel objects and if programs could examine those objects
the sandboxing scope would be more precise.
lsm+bpf solves that part and I'd still argue that it's
orthogonal to seccomp's pass/reject flow.
I mean if seccomp says 'ok' the syscall should continue executing
as normal and whatever LSM hooks were triggered by it may have
their own lsm+bpf verdicts.
Furthermore in the process hierarchy different children
should be able to set their own lsm+bpf filters that are not
related to parallel seccomp+bpf hierarchy of programs.
seccomp syscall can be an interface to attach programs
to lsm hooks, but nothing more than that.

^ permalink raw reply

* Re: [RFC 02/11] Add RoCE driver framework
From: Leon Romanovsky @ 2016-09-15  4:37 UTC (permalink / raw)
  To: Mintz, Yuval
  Cc: Yuval Mintz, Mark Bloch, Ram Amrani,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, David Miller,
	Ariel Elior, Michal Kalderon, Rajesh Borundia,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, netdev
In-Reply-To: <BL2PR07MB23066BBC4A019365A632E2FA8DF10-I6Fv6QFlT9L2NWYWB7JgfuFPX92sqiQdvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 4543 bytes --]

On Wed, Sep 14, 2016 at 06:25:38PM +0000, Mintz, Yuval wrote:
> > > > > >> >> +uint debug;
> > > > > >> >> +module_param(debug, uint, 0);
> > > > > > >>> +MODULE_PARM_DESC(debug, "Default debug msglevel");
> > > > > >>
> > > > > >> >Why are you adding this as a module parameter?
> > > > > >>
> > > > > >>  I believe this is mostly to follow same line as qede which also defines
> > > > > > > 'debug' module parameter for allowing easy user control of debug
> > > > > > > prints [& specifically for probe prints, which can't be controlled
> > > > > > > otherwise].
> > > > >
> > > > > > Can you give us an example where dynamic debug and tracing infrastructures
> > > > > > are not enough?
> > > > >
> > > > > > AFAIK, most of these debug module parameters are legacy copy/paste
> > > > > > code which is useless in real life scenarios.
> > > > >
> > > > > Define 'enough'; Using dynamic debug you can provide all the necessary
> > > > > information and at an even better granularity that's achieved by suggested
> > >  > > infrastructure,  but is harder for an end-user to use. Same goes for tracing.
> > > > >
> > > > > The 'debug' option provides an easy grouping for prints related to a specific
> > > > > area in the driver.
> > > >
> > > > It is hard to agree with you that user which knows how-to load modules
> > > > with parameters won't success to enable debug prints.
> > >
> > > I think you're giving too much credit to the end-user. :-D
> > >
> > > > In addition, global increase in debug level for whole driver will create
> > > > printk storm in dmesg and give nothing to debuggability.
> > >
> > > So basically, what you're claiming is that ethtool 'msglvl' setting for devices
> > > is completely obselete. While this *might* be true, we use it extensively
> > > in our qede and qed drivers; The debug module parameter merely provides
> > > a manner of setting the debug value prior to initial probe for all interfaces.
> > > qedr follows the same practice.
>
> > Thanks for this excellent example. Ethtool 'msglvl' adds this
> > dynamically, while your DEBUG argument works for loading module
> > only.
>
> > If you want dynamic prints, you have two options:
> > 1. Add support of ethtool to whole RDMA stack.
> > 2. Use dynamic tracing infrastructure.
>
> > Which option do you prefer?
> Option 3 - continuing this discussion. :-)

Sorry,
I was under impression that you want this driver to be merged, but it
looks like It was incorrect assumption. Let's continue discussion.

>
> Perhaps I misread your intentions - I thought that by dynamic debug
> you meant that all debug in RDMA should be pr_debug() based, and
> therefore my objection regarding the ease with which users can
> configure it.

It is not for all RDMA, but in your proposed driver. You are adding this
"debug" module argument to your module.

> If all you meant was 'dynamically set' as opposed to 'statically set'
> then I agree that having that sort of configurability is preferable
> [Even though end-user would still probably prefer a module
> parameter for reproductions; As the name implies, 'debug' isn't
> meant to be used in other situations].

We are not adding code just for fun, but for a real reason, and
especially interfaces which will be visible to user.

The overall expectation from the driver's authors that they are
submitting driver which doesn't have bringup issues. For real
life scenarios, where the bugs will be reveled after some time of
usage, this global debug is useless.

>
> The other thing to consider are the probe-time prints.
> Problem is, you wouldn't have a control node between probe
> and until after the probing would be over, so it would be a bit
> hard to configure that.
> You can always think of some generic method of fixing that as well
> [sysfs node for the entire system for probe-time prints, perhaps?]

/sys/kernel/debug/tracing
/sys/kernel/debug/dynamic_debug

>
> Do notice you would be harming user-experience of reproductions
> though - as it would have to follow different mechanisms to open
> debug prints of various qed* components.

I don't understand this point at all. Do you think that it is normal to
ask user to debug your driver? Is this called "user-experience"?

And regarding the second point, the old code is not an excuse to
copy/paste bad practices.

As a summary, I didn't see in your responses any real life example where
you will need global debug level for your driver.

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
From: Andy Lutomirski @ 2016-09-15  4:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Mickaël Salaün,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry
In-Reply-To: <20160915043120.GA65819-+o4/htvd0TDFYCXBM6kdu7fOX0fSgVTm@public.gmane.org>

On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov
<alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
>> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
>> <alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> > On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
>> >> >> >
>> >> >> > This RFC handle both cgroup and seccomp approaches in a similar way. I
>> >> >> > don't see why building on top of cgroup v2 is a problem. Is there
>> >> >> > security issues with delegation?
>> >> >>
>> >> >> What I mean is: cgroup v2 delegation has a functionality problem.
>> >> >> Tejun says [1]:
>> >> >>
>> >> >> We haven't had to face this decision because cgroup has never properly
>> >> >> supported delegating to applications and the in-use setups where this
>> >> >> happens are custom configurations where there is no boundary between
>> >> >> system and applications and adhoc trial-and-error is good enough a way
>> >> >> to find a working solution.  That wiggle room goes away once we
>> >> >> officially open this up to individual applications.
>> >> >>
>> >> >> Unless and until that changes, I think that landlock should stay away
>> >> >> from cgroups.  Others could reasonably disagree with me.
>> >> >
>> >> > Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
>> >> > and not for sandboxing. So the above doesn't matter in such contexts.
>> >> > lsm hooks + cgroups provide convenient scope and existing entry points.
>> >> > Please see checmate examples how it's used.
>> >> >
>> >>
>> >> To be clear: I'm not arguing at all that there shouldn't be
>> >> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
>> >> landlock interface shouldn't expose any cgroup integration, at least
>> >> until the cgroup situation settles down a lot.
>> >
>> > ahh. yes. we're perfectly in agreement here.
>> > I'm suggesting that the next RFC shouldn't include unpriv
>> > and seccomp at all. Once bpf+lsm+cgroup is merged, we can
>> > argue about unpriv with cgroups and even unpriv as a whole,
>> > since it's not a given. Seccomp integration is also questionable.
>> > I'd rather not have seccomp as a gate keeper for this lsm.
>> > lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
>> > don't have one to one relationship, so mixing them up is only
>> > asking for trouble further down the road.
>> > If we really need to carry some information from seccomp to lsm+bpf,
>> > it's easier to add eBPF support to seccomp and let bpf side deal
>> > with passing whatever information.
>> >
>>
>> As an argument for keeping seccomp (or an extended seccomp) as the
>> interface for an unprivileged bpf+lsm: seccomp already checks off most
>> of the boxes for safely letting unprivileged programs sandbox
>> themselves.
>
> you mean the attach part of seccomp syscall that deals with no_new_priv?
> sure, that's reusable.
>
>> Furthermore, to the extent that there are use cases for
>> unprivileged bpf+lsm that *aren't* expressible within the seccomp
>> hierarchy, I suspect that syscall filters have exactly the same
>> problem and that we should fix seccomp to cover it.
>
> not sure what you mean by 'seccomp hierarchy'. The normal process
> hierarchy ?

Kind of.  I mean the filter layers that are inherited across fork(),
the TSYNC mechanism, etc.

> imo the main deficiency of secccomp is inability to look into arguments.
> One can argue that it's a blessing, since composite args
> are not yet copied into the kernel memory.
> But in a lot of cases the seccomp arguments are FDs pointing
> to kernel objects and if programs could examine those objects
> the sandboxing scope would be more precise.
> lsm+bpf solves that part and I'd still argue that it's
> orthogonal to seccomp's pass/reject flow.
> I mean if seccomp says 'ok' the syscall should continue executing
> as normal and whatever LSM hooks were triggered by it may have
> their own lsm+bpf verdicts.

I agree with all of this...

> Furthermore in the process hierarchy different children
> should be able to set their own lsm+bpf filters that are not
> related to parallel seccomp+bpf hierarchy of programs.
> seccomp syscall can be an interface to attach programs
> to lsm hooks, but nothing more than that.

I'm not sure what you mean.  I mean that, logically, I think we should
be able to do:

seccomp(attach a syscall filter);
fork();
child does seccomp(attach some lsm filters);

I think that they *should* be related to the seccomp+bpf hierarchy of
programs in that they are entries in the same logical list of filter
layers installed.  Some of those layers can be syscall filters and
some of the layers can be lsm filters.  If we subsequently add a way
to attach a removable seccomp filter or a way to attach a seccomp
filter that logs failures to some fd watched by an outside monitor, I
think that should work for lsm, too, with more or less the same
interface.

If we need a way for a sandbox manager to opt different children into
different subsets of fancy filters, then I think that syscall filters
and lsm filters should use the same mechanism.

I think we might be on the same page here and just saying it different ways.

^ permalink raw reply

* Re: [RFC v3 18/22] cgroup,landlock: Add CGRP_NO_NEW_PRIVS to handle unprivileged hooks
From: Alexei Starovoitov @ 2016-09-15  4:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mickaël Salaün, linux-kernel@vger.kernel.org,
	Alexei Starovoitov, Arnd Bergmann, Casey Schaufler,
	Daniel Borkmann, Daniel Mack, David Drysdale, David S . Miller,
	Elena Reshetova, Eric W . Biederman, James Morris, Kees Cook,
	Paul Moore, Sargun Dhillon, Serge E . Hallyn, Tejun Heo,
	Will Drewry
In-Reply-To: <CALCETrU=tGLx8s_eqji6SfXRi=3W8FkGC7wA6VMfD-_wAVb66w@mail.gmail.com>

On Wed, Sep 14, 2016 at 09:38:16PM -0700, Andy Lutomirski wrote:
> On Wed, Sep 14, 2016 at 9:31 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Wed, Sep 14, 2016 at 09:08:57PM -0700, Andy Lutomirski wrote:
> >> On Wed, Sep 14, 2016 at 9:00 PM, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:
> >> > On Wed, Sep 14, 2016 at 07:27:08PM -0700, Andy Lutomirski wrote:
> >> >> >> >
> >> >> >> > This RFC handle both cgroup and seccomp approaches in a similar way. I
> >> >> >> > don't see why building on top of cgroup v2 is a problem. Is there
> >> >> >> > security issues with delegation?
> >> >> >>
> >> >> >> What I mean is: cgroup v2 delegation has a functionality problem.
> >> >> >> Tejun says [1]:
> >> >> >>
> >> >> >> We haven't had to face this decision because cgroup has never properly
> >> >> >> supported delegating to applications and the in-use setups where this
> >> >> >> happens are custom configurations where there is no boundary between
> >> >> >> system and applications and adhoc trial-and-error is good enough a way
> >> >> >> to find a working solution.  That wiggle room goes away once we
> >> >> >> officially open this up to individual applications.
> >> >> >>
> >> >> >> Unless and until that changes, I think that landlock should stay away
> >> >> >> from cgroups.  Others could reasonably disagree with me.
> >> >> >
> >> >> > Ours and Sargun's use cases for cgroup+lsm+bpf is not for security
> >> >> > and not for sandboxing. So the above doesn't matter in such contexts.
> >> >> > lsm hooks + cgroups provide convenient scope and existing entry points.
> >> >> > Please see checmate examples how it's used.
> >> >> >
> >> >>
> >> >> To be clear: I'm not arguing at all that there shouldn't be
> >> >> bpf+lsm+cgroup integration.  I'm arguing that the unprivileged
> >> >> landlock interface shouldn't expose any cgroup integration, at least
> >> >> until the cgroup situation settles down a lot.
> >> >
> >> > ahh. yes. we're perfectly in agreement here.
> >> > I'm suggesting that the next RFC shouldn't include unpriv
> >> > and seccomp at all. Once bpf+lsm+cgroup is merged, we can
> >> > argue about unpriv with cgroups and even unpriv as a whole,
> >> > since it's not a given. Seccomp integration is also questionable.
> >> > I'd rather not have seccomp as a gate keeper for this lsm.
> >> > lsm and seccomp are orthogonal hook points. Syscalls and lsm hooks
> >> > don't have one to one relationship, so mixing them up is only
> >> > asking for trouble further down the road.
> >> > If we really need to carry some information from seccomp to lsm+bpf,
> >> > it's easier to add eBPF support to seccomp and let bpf side deal
> >> > with passing whatever information.
> >> >
> >>
> >> As an argument for keeping seccomp (or an extended seccomp) as the
> >> interface for an unprivileged bpf+lsm: seccomp already checks off most
> >> of the boxes for safely letting unprivileged programs sandbox
> >> themselves.
> >
> > you mean the attach part of seccomp syscall that deals with no_new_priv?
> > sure, that's reusable.
> >
> >> Furthermore, to the extent that there are use cases for
> >> unprivileged bpf+lsm that *aren't* expressible within the seccomp
> >> hierarchy, I suspect that syscall filters have exactly the same
> >> problem and that we should fix seccomp to cover it.
> >
> > not sure what you mean by 'seccomp hierarchy'. The normal process
> > hierarchy ?
> 
> Kind of.  I mean the filter layers that are inherited across fork(),
> the TSYNC mechanism, etc.
> 
> > imo the main deficiency of secccomp is inability to look into arguments.
> > One can argue that it's a blessing, since composite args
> > are not yet copied into the kernel memory.
> > But in a lot of cases the seccomp arguments are FDs pointing
> > to kernel objects and if programs could examine those objects
> > the sandboxing scope would be more precise.
> > lsm+bpf solves that part and I'd still argue that it's
> > orthogonal to seccomp's pass/reject flow.
> > I mean if seccomp says 'ok' the syscall should continue executing
> > as normal and whatever LSM hooks were triggered by it may have
> > their own lsm+bpf verdicts.
> 
> I agree with all of this...
> 
> > Furthermore in the process hierarchy different children
> > should be able to set their own lsm+bpf filters that are not
> > related to parallel seccomp+bpf hierarchy of programs.
> > seccomp syscall can be an interface to attach programs
> > to lsm hooks, but nothing more than that.
> 
> I'm not sure what you mean.  I mean that, logically, I think we should
> be able to do:
> 
> seccomp(attach a syscall filter);
> fork();
> child does seccomp(attach some lsm filters);
> 
> I think that they *should* be related to the seccomp+bpf hierarchy of
> programs in that they are entries in the same logical list of filter
> layers installed.  Some of those layers can be syscall filters and
> some of the layers can be lsm filters.  If we subsequently add a way
> to attach a removable seccomp filter or a way to attach a seccomp
> filter that logs failures to some fd watched by an outside monitor, I
> think that should work for lsm, too, with more or less the same
> interface.
> 
> If we need a way for a sandbox manager to opt different children into
> different subsets of fancy filters, then I think that syscall filters
> and lsm filters should use the same mechanism.
> 
> I think we might be on the same page here and just saying it different ways.

Sounds like it :)
All of the above makes sense to me.
The 'orthogonal' part is that the user should be able to use
this seccomp-managed hierarchy without actually enabling
TIF_SECCOMP for the task and syscalls should still go through
fast path and all the way till lsm hooks as normal.
I don't want to pay _any_ performance penalty for this feature
for lsm hooks (and all syscalls) that don't have bpf programs attached.


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox