Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Jason Wang @ 2018-10-19  2:19 UTC (permalink / raw)
  To: David Miller, bigeasy
  Cc: stephen, netdev, virtualization, tglx, makita.toshiaki, mst
In-Reply-To: <20181018.162308.2295937118791060714.davem@davemloft.net>


On 2018/10/19 上午7:23, David Miller wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Date: Thu, 18 Oct 2018 10:43:13 +0200
>
>> on 32bit, lockdep notices that virtnet_open() and refill_work() invoke
>> try_fill_recv() from process context while virtnet_receive() invokes the
>> same function from BH context. The problem that the seqcounter within
>> u64_stats_update_begin() may deadlock if it is interrupted by BH and
>> then acquired again.
>>
>> Introduce u64_stats_update_begin_bh() which disables BH on 32bit
>> architectures. Since the BH might interrupt the worker, this new
>> function should not limited to SMP like the others which are expected
>> to be used in softirq.
>>
>> With this change we might lose increments but this is okay. The
>> important part that the two 32bit parts of the 64bit counter are not
>> corrupted.
>>
>> Fixes: 461f03dc99cf6 ("virtio_net: Add kick stats").
>> Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Trying to get down to the bottom of this:
>
> 1) virtnet_receive() runs from softirq but only if NAPI is active and
>     enabled.  It is in this context that it invokes try_fill_recv().
>
> 2) refill_work() runs from process context, but disables NAPI (and
>     thus invocation of virtnet_receive()) before calling
>     try_fill_recv().
>
> 3) virtnet_open() invokes from process context as well, but before the
>     NAPI instances are enabled, it is same as case #2.
>
> 4) virtnet_restore_up() is the same situations as #3.
>
> Therefore I agree that this is a false positive, and simply lockdep
> cannot see the NAPI synchronization done by case #2.
>
> I think we shouldn't add unnecessary BH disabling here, and instead
> find some way to annotate this for lockdep's sake.
>
> Thank you.
>

+1

Thanks

^ permalink raw reply

* Re: [PATCH 0/3] Bugfix for the netsec driver
From: Florian Fainelli @ 2018-10-19  2:56 UTC (permalink / raw)
  To: masahisa.kojima, netdev
  Cc: ilias.apalodimas, jaswinder.singh, ard.biesheuvel,
	osaki.yoshitoyo
In-Reply-To: <20181019010843.3605-1-masahisa.kojima@linaro.org>



On 10/18/2018 6:08 PM, masahisa.kojima@linaro.org wrote:
> From: Masahisa Kojima <masahisa.kojima@linaro.org>
> 
> This patch series include bugfix for the netsec ethernet
> controller driver, fix the problem in interface down/up.

Since all of these are bugfixes you would want to provide a Fixes: tag
for the offending commits, that way you can get automated backporting to
stable trees, also patches should be targeted at the "net" tree, which
is indicated with a subject start with [PATCH net], more comments in the
patches.

> 
> Masahisa Kojima (3):
>   net: socionext: stop PHY before resetting netsec
>   net: socionext: Add dummy PHY register read in phy_write()
>   net: socionext: reset tx queue in ndo_stop
> 
>  drivers/net/ethernet/socionext/netsec.c | 36 +++++++++++++++++++++++++++------
>  1 file changed, 30 insertions(+), 6 deletions(-)
> 

-- 
Florian

^ permalink raw reply

* Re: [PATCH 1/3] net: socionext: Stop PHY before resetting netsec
From: Florian Fainelli @ 2018-10-19  2:59 UTC (permalink / raw)
  To: masahisa.kojima, netdev
  Cc: ilias.apalodimas, jaswinder.singh, ard.biesheuvel,
	osaki.yoshitoyo
In-Reply-To: <20181019010843.3605-2-masahisa.kojima@linaro.org>



On 10/18/2018 6:08 PM, masahisa.kojima@linaro.org wrote:
> From: Masahisa Kojima <masahisa.kojima@linaro.org>
> 
> After resetting netsec IP, driver have to wait until
> netsec mode turns to NRM mode.
> But sometimes mode transition to NRM will not complete
> if the PHY is in normal operation state.
> To avoid this situation, stop PHY before resetting netsec.
> 
> Signed-off-by: Masahisa Kojima <masahisa.kojima@linaro.org>
> Signed-off-by: Yoshitoyo Osaki <osaki.yoshitoyo@socionext.com>
> ---
>  drivers/net/ethernet/socionext/netsec.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index 4289ccb26e4e..273cc5fc07e0 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -1343,11 +1343,11 @@ static int netsec_netdev_stop(struct net_device *ndev)
>  	netsec_uninit_pkt_dring(priv, NETSEC_RING_TX);
>  	netsec_uninit_pkt_dring(priv, NETSEC_RING_RX);
>  
> -	ret = netsec_reset_hardware(priv, false);
> -
>  	phy_stop(ndev->phydev);
>  	phy_disconnect(ndev->phydev);
>  
> +	ret = netsec_reset_hardware(priv, false);
> +
>  	pm_runtime_put_sync(priv->dev);
>  
>  	return ret;
> @@ -1415,7 +1415,7 @@ static const struct net_device_ops netsec_netdev_ops = {
>  };
>  
>  static int netsec_of_probe(struct platform_device *pdev,
> -			   struct netsec_priv *priv)
> +			   struct netsec_priv *priv, u32 *phy_addr)
>  {
>  	priv->phy_np = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0);
>  	if (!priv->phy_np) {
> @@ -1423,6 +1423,8 @@ static int netsec_of_probe(struct platform_device *pdev,
>  		return -EINVAL;
>  	}
>  
> +	*phy_addr = of_mdio_parse_addr(&pdev->dev, priv->phy_np);
> +
>  	priv->clk = devm_clk_get(&pdev->dev, NULL); /* get by 'phy_ref_clk' */
>  	if (IS_ERR(priv->clk)) {
>  		dev_err(&pdev->dev, "phy_ref_clk not found\n");
> @@ -1473,6 +1475,7 @@ static int netsec_register_mdio(struct netsec_priv *priv, u32 phy_addr)
>  {
>  	struct mii_bus *bus;
>  	int ret;
> +	u16 data;
>  
>  	bus = devm_mdiobus_alloc(priv->dev);
>  	if (!bus)
> @@ -1486,6 +1489,10 @@ static int netsec_register_mdio(struct netsec_priv *priv, u32 phy_addr)
>  	bus->parent = priv->dev;
>  	priv->mii_bus = bus;
>  
> +	/* set phy power down */
> +	data = netsec_phy_read(bus, phy_addr, 0) | 0x800;
> +	netsec_phy_write(bus, phy_addr, 0, data);

This is not explained in your commit message, and this is using open
coded values instead of the symbolic names from include/uapi/linux/mii.h.

Note that depending on the type of PHY you are interfaced with if the
PHY is in power down, the only register it is allowed to accept writes
for is MII_BMCR (per 802.3 clause 22 spec), any other read or write to a
different register can be discarded by its MDIO snooping logic. Since
probing the MDIO bus involves reading MII_PHYSID1/ID2, what is this
trying to achieve?

> +
>  	if (dev_of_node(priv->dev)) {
>  		struct device_node *mdio_node, *parent = dev_of_node(priv->dev);
>  
> @@ -1623,7 +1630,7 @@ static int netsec_probe(struct platform_device *pdev)
>  	}
>  
>  	if (dev_of_node(&pdev->dev))
> -		ret = netsec_of_probe(pdev, priv);
> +		ret = netsec_of_probe(pdev, priv, &phy_addr);
>  	else
>  		ret = netsec_acpi_probe(pdev, priv, &phy_addr);
>  	if (ret)
> 

-- 
Florian

^ permalink raw reply

* Re: [PATCH 2/3] net: socionext: Add dummy PHY register read in phy_write()
From: Florian Fainelli @ 2018-10-19  3:01 UTC (permalink / raw)
  To: masahisa.kojima, netdev
  Cc: ilias.apalodimas, jaswinder.singh, ard.biesheuvel,
	osaki.yoshitoyo
In-Reply-To: <20181019010843.3605-3-masahisa.kojima@linaro.org>



On 10/18/2018 6:08 PM, masahisa.kojima@linaro.org wrote:
> From: Masahisa Kojima <masahisa.kojima@linaro.org>
> 
> There is a compatibility issue between RTL8211E implemented
> in Developerbox and netsec network controller IP(F_GMAC4).
> 
> RTL8211E expects MDC clock must be kept toggling for several
> clock cycle with MDIO high before entering the IDLE state.
> To meet this requirement, netsec driver needs to issue dummy
> read(e.g. read PHYID1(offset 0x2) register) right after write.
> 
> Signed-off-by: Masahisa Kojima <masahisa.kojima@linaro.org>
> Signed-off-by: Yoshitoyo Osaki <osaki.yoshitoyo@socionext.com>
> ---
>  drivers/net/ethernet/socionext/netsec.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index 273cc5fc07e0..e7faaf8be99e 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -431,9 +431,12 @@ static int netsec_mac_update_to_phy_state(struct netsec_priv *priv)
>  	return 0;
>  }
>  
> +static int netsec_phy_read(struct mii_bus *bus, int phy_addr, int reg_addr);
> +
>  static int netsec_phy_write(struct mii_bus *bus,
>  			    int phy_addr, int reg, u16 val)
>  {
> +	int status;
>  	struct netsec_priv *priv = bus->priv;
>  
>  	if (netsec_mac_write(priv, GMAC_REG_GDR, val))
> @@ -446,8 +449,19 @@ static int netsec_phy_write(struct mii_bus *bus,
>  			      GMAC_REG_SHIFT_CR_GAR)))
>  		return -ETIMEDOUT;
>  
> -	return netsec_mac_wait_while_busy(priv, GMAC_REG_GAR,
> -					  NETSEC_GMAC_GAR_REG_GB);
> +	status = netsec_mac_wait_while_busy(priv, GMAC_REG_GAR,
> +					    NETSEC_GMAC_GAR_REG_GB);
> +
> +	/* Developerbox implements RTL8211E PHY and there is
> +	 * a compatibility problem with F_GMAC4.
> +	 * RTL8211E expects MDC clock must be kept toggling for several
> +	 * clock cycle with MDIO high before entering the IDLE state.
> +	 * To meet this requirement, netsec driver needs to issue dummy
> +	 * read(e.g. read PHYID1(offset 0x2) register) right after write.
> +	 */
> +	netsec_phy_read(bus, phy_addr, 2);

MII_PHYSID1 instead of 0x2 would be preferable. It is not clear to me
from your commit message if this is a problem specific to your MDIO
controller implementation and the RTL8211E PHY or if this is a general
problem of the PHY irrespective of the MDIO controller it is interface
with. If the latter, then we should seek a solution at a different layer
such that other systems don't have that same problem.

Thank you!
-- 
Florian

^ permalink raw reply

* Re: [RFC PATCH net-next 2/4] net: 8021q: vlan_core: allow use list of vlans for real device
From: Bjørn Mork @ 2018-10-19 11:22 UTC (permalink / raw)
  To: Ivan Khoronzhuk
  Cc: grygorii.strashko, davem, linux-omap, netdev, linux-kernel,
	alexander.h.duyck
In-Reply-To: <20181016182035.18234-3-ivan.khoronzhuk@linaro.org>

Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> writes:

> @@ -236,6 +239,13 @@ __vlan_find_dev_deep_rcu(struct net_device *real_dev,
>  	return NULL;
>  }
>  
> +static inline int
> +vlan_for_each(struct net_device *dev,
> +	      int (*action)(struct net_device *dev, int vid, void *arg),
> +	      void *arg)
> +{
> +}
> +


This stub should return 0, shouldn't it?


Bjørn

^ permalink raw reply

* [PATCH] tipc: use destination length for copy string
From: Guoqing Jiang @ 2018-10-19  4:08 UTC (permalink / raw)
  To: jon.maloy, ying.xue, davem; +Cc: netdev, tipc-discussion, Guoqing Jiang

Got below warning with gcc 8.2 compiler.

net/tipc/topsrv.c: In function ‘tipc_topsrv_start’:
net/tipc/topsrv.c:660:2: warning: ‘strncpy’ specified bound depends on the length of the source argument [-Wstringop-overflow=]
  strncpy(srv->name, name, strlen(name) + 1);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
net/tipc/topsrv.c:660:27: note: length computed here
  strncpy(srv->name, name, strlen(name) + 1);
                           ^~~~~~~~~~~~
So change it to correct length and use strscpy.

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 net/tipc/topsrv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/topsrv.c b/net/tipc/topsrv.c
index 2627b5d812e9..b84c0059214f 100644
--- a/net/tipc/topsrv.c
+++ b/net/tipc/topsrv.c
@@ -657,7 +657,7 @@ int tipc_topsrv_start(struct net *net)
 	srv->max_rcvbuf_size = sizeof(struct tipc_subscr);
 	INIT_WORK(&srv->awork, tipc_topsrv_accept);
 
-	strncpy(srv->name, name, strlen(name) + 1);
+	strscpy(srv->name, name, sizeof(srv->name));
 	tn->topsrv = srv;
 	atomic_set(&tn->subscription_count, 0);
 
-- 
2.12.3

^ permalink raw reply related

* Re: [PATCH bpf-next 2/3] tools, perf: use smp_{rmb,mb} barriers instead of {rmb,mb}
From: Alexei Starovoitov @ 2018-10-19  3:53 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Peter Zijlstra, paulmck, will.deacon, acme, yhs, john.fastabend,
	netdev
In-Reply-To: <1e3eea86-f708-f48b-4db4-911a84b94e06@iogearbox.net>

On Thu, Oct 18, 2018 at 09:00:46PM +0200, Daniel Borkmann wrote:
> On 10/18/2018 05:33 PM, Alexei Starovoitov wrote:
> > On Thu, Oct 18, 2018 at 05:04:34PM +0200, Daniel Borkmann wrote:
> >>  #endif /* _TOOLS_LINUX_ASM_IA64_BARRIER_H */
> >> diff --git a/tools/arch/powerpc/include/asm/barrier.h b/tools/arch/powerpc/include/asm/barrier.h
> >> index a634da0..905a2c6 100644
> >> --- a/tools/arch/powerpc/include/asm/barrier.h
> >> +++ b/tools/arch/powerpc/include/asm/barrier.h
> >> @@ -27,4 +27,20 @@
> >>  #define rmb()  __asm__ __volatile__ ("sync" : : : "memory")
> >>  #define wmb()  __asm__ __volatile__ ("sync" : : : "memory")
> >>
> >> +#if defined(__powerpc64__)
> >> +#define smp_lwsync()	__asm__ __volatile__ ("lwsync" : : : "memory")
> >> +
> >> +#define smp_store_release(p, v)			\
> >> +do {						\
> >> +	smp_lwsync();				\
> >> +	WRITE_ONCE(*p, v);			\
> >> +} while (0)
> >> +
> >> +#define smp_load_acquire(p)			\
> >> +({						\
> >> +	typeof(*p) ___p1 = READ_ONCE(*p);	\
> >> +	smp_lwsync();				\
> >> +	___p1;					\
> > 
> > I don't like this proliferation of asm.
> > Why do we think that we can do better job than compiler?
> > can we please use gcc builtins instead?
> > https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
> > __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
> > __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
> > are done specifically for this use case if I'm not mistaken.
> > I think it pays to learn what compiler provides.
> 
> But are you sure the C11 memory model matches exact same model as kernel?
> Seems like last time Will looked into it [0] it wasn't the case ...

I'm only suggesting equivalence of __atomic_load_n(ptr, __ATOMIC_ACQUIRE)
with kernel's smp_load_acquire().
I've seen a bunch of user space ring buffer implementations implemented
with __atomic_load_n() primitives.
But let's ask experts who live in both worlds.

Paul,
what would you recommend?
Should we copy paste smp_store_release() from kernel to be used
in user space library/tools
or use __atomic_load_n() builtins instead?


> The above was pulled in and slightly adapted from kernel side of arch
> asm barriers. Hm, it would probably be safest if an arch decides to adapt
> C11 barriers first from kernel side and user space could then use the
> exact same matching builtin functions for scenarios like these as well.
> 
>   [0] https://lore.kernel.org/lkml/20170308174300.GL20400@arm.com/

^ permalink raw reply

* Re: [PATCH net-next] net: ethernet: ti: cpsw: don't flush mcast entries while switch promisc mode
From: Ivan Khoronzhuk @ 2018-10-19 12:04 UTC (permalink / raw)
  To: Grygorii Strashko; +Cc: davem, linux-omap, netdev, linux-kernel
In-Reply-To: <6c34a3ce-dbee-538e-bda7-8dd485315267@ti.com>

On Thu, Oct 18, 2018 at 07:03:06PM -0500, Grygorii Strashko wrote:
>
>
>On 10/18/18 1:00 PM, Ivan Khoronzhuk wrote:
>>No need now to flush mcast entries in switch mode while toggling to
>>promiscuous mode. It's not needed as vlan reg_mcast = ALL_PORTS
>>and mcast/vlan ports = ALL_PORTS, the same happening for vlan
>>unreg_mcast, it's set to ALL_PORT_MASK just after calling promisc
>>mode routine by calling set allmulti. I suppose main reason to flush
>>them is to use unreg_mcast to receive all to host port. Thus, now, all
>>mcast packets are received anyway and no reason to flush mcast entries
>>unsafely, as they were synced with __dev_mc_sync() previously and are
>>not restored. Another way is to _dev_mc_unsync() them, but no need.
>
>User have possibility to add additional mcast entries or edit existing 
>in switch-mode, which is now done using custom tool. So, Host in 
>promisc
>mode will not receive packets for mcast address X if port mask for this
>addr set to (ALL_PORTS - HOST_PORT). Am I missing smth?

I didn't take into account the custom tool changing entries directly,
but even in this case there is at least a couple of interesting
questions:

1) Before the patch applied only several days ago -
   5da1948969bc1991920987ce4361ea56046e5a98
   "ti: cpsw: fix lost of mcast packets while rx_mode update"
   It was impossible to do correctly anyway, as all mcast entries not
   in the mc list were flushed (after rx_mode cb), by:
   cpsw_ale_flush_multicast(cpsw->ale, ALE_ALL_PORTS, vid);
   and those in mc, rewritten by adding them back in corrected form.
   ... or this cb was not supposed to be called at all ...

2) What is the reason to add mcast switch entires
   (ALL_PORTS - HOST_PORT) if its function is added anyway by
   unreq_mcast & (ALL_PORTS - HOST_PORT) ?
   So, doesn't matter it's added or not - it will work :-|.

3) Even so, toggling promisc mode will clear all these changes anyway,
   even I will call _dev_mc_unsync() after flushing them.

4) If user can tune ALE table by hand, what stops him do it after moving
   to promisc mode, seems he knows what he's doing?

5) It could be possible only for not default vlan entries, but mcast
   vlan support is not supported yet. Who is gona restore those
   entries after promisc off?

This behaviour is arguable, and flushing mcast entries can bring more
issues then leaving. For me it doesn't matter, I can archive the same
by adding after flush one line, it's even shorter:
__dev_mc_unsync(priv->ndev, NULL);

>
>>
>>Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
>>---
>>
>>Based on net-next/master
>>Tasted on am572x EVM and BBB
>>
>>  drivers/net/ethernet/ti/cpsw.c | 3 ---
>>  1 file changed, 3 deletions(-)
>>
>>diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
>>index 226be2a56c1f..0e475020a674 100644
>>--- a/drivers/net/ethernet/ti/cpsw.c
>>+++ b/drivers/net/ethernet/ti/cpsw.c
>>@@ -638,9 +638,6 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
>>  			} while (time_after(timeout, jiffies));
>>  			cpsw_ale_control_set(ale, 0, ALE_AGEOUT, 1);
>>-			/* Clear all mcast from ALE */
>>-			cpsw_ale_flush_multicast(ale, ALE_ALL_PORTS, -1);
>>-
>>  			/* Flood All Unicast Packets to Host port */
>>  			cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 1);
>>  			dev_dbg(&ndev->dev, "promiscuity enabled\n");
>>
>
>-- 
>regards,
>-grygorii

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply

* [PATCH net-next] octeontx2-af: Remove set but not used variables 'devnum, is_pf'
From: YueHaibing @ 2018-10-19 13:03 UTC (permalink / raw)
  To: Sunil Goutham, Linu Cherian, Geetha sowjanya, Jerin Jacob,
	David S. Miller
  Cc: YueHaibing, netdev, linux-kernel, kernel-janitors

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/marvell/octeontx2/af/rvu.c: In function 'rvu_detach_rsrcs':
drivers/net/ethernet/marvell/octeontx2/af/rvu.c:855:6: warning:
 variable 'devnum' set but not used [-Wunused-but-set-variable]
 
drivers/net/ethernet/marvell/octeontx2/af/rvu.c:853:7: warning:
 variable 'is_pf' set but not used [-Wunused-but-set-variable]

drivers/net/ethernet/marvell/octeontx2/af/rvu.c: In function 'rvu_mbox_handler_ATTACH_RESOURCES':
drivers/net/ethernet/marvell/octeontx2/af/rvu.c:1054:7: warning:
 variable 'is_pf' set but not used [-Wunused-but-set-variable]

drivers/net/ethernet/marvell/octeontx2/af/rvu.c:1053:6: warning:
 variable 'devnum' set but not used [-Wunused-but-set-variable]

It never used since introduction in commit
746ea74241fa ("octeontx2-af: Add RVU block LF provisioning support")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 25 +++----------------------
 1 file changed, 3 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index c06cca9..60b3623 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -914,18 +914,9 @@ static int rvu_detach_rsrcs(struct rvu *rvu, struct rsrc_detach *detach,
 			    u16 pcifunc)
 {
 	struct rvu_hwinfo *hw = rvu->hw;
-	bool is_pf, detach_all = true;
+	bool detach_all = true;
 	struct rvu_block *block;
-	int devnum, blkid;
-
-	/* Check if this is for a RVU PF or VF */
-	if (pcifunc & RVU_PFVF_FUNC_MASK) {
-		is_pf = false;
-		devnum = rvu_get_hwvf(rvu, pcifunc);
-	} else {
-		is_pf = true;
-		devnum = rvu_get_pf(pcifunc);
-	}
+	int blkid;
 
 	spin_lock(&rvu->rsrc_lock);
 
@@ -1114,22 +1105,12 @@ static int rvu_mbox_handler_ATTACH_RESOURCES(struct rvu *rvu,
 					     struct msg_rsp *rsp)
 {
 	u16 pcifunc = attach->hdr.pcifunc;
-	int devnum, err;
-	bool is_pf;
+	int err;
 
 	/* If first request, detach all existing attached resources */
 	if (!attach->modify)
 		rvu_detach_rsrcs(rvu, NULL, pcifunc);
 
-	/* Check if this is for a RVU PF or VF */
-	if (pcifunc & RVU_PFVF_FUNC_MASK) {
-		is_pf = false;
-		devnum = rvu_get_hwvf(rvu, pcifunc);
-	} else {
-		is_pf = true;
-		devnum = rvu_get_pf(pcifunc);
-	}
-
 	spin_lock(&rvu->rsrc_lock);
 
 	/* Check if the request can be accommodated */

^ permalink raw reply related

* [PATCH v5 bpf-next 1/2] bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
From: Song Liu @ 2018-10-19  5:01 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, Song Liu
In-Reply-To: <20181019050107.3684711-1-songliubraving@fb.com>

BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
skb. This patch enables direct access of skb for these programs.

Two helper functions bpf_compute_and_save_data_end() and
bpf_restore_data_end() are introduced. There are used in
__cgroup_bpf_run_filter_skb(), to compute proper data_end for the
BPF program, and restore original data afterwards.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/filter.h | 21 +++++++++++++++++++++
 kernel/bpf/cgroup.c    |  6 ++++++
 net/core/filter.c      | 36 +++++++++++++++++++++++++++++++++++-
 3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 5771874bc01e..91b4c934f02e 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -548,6 +548,27 @@ static inline void bpf_compute_data_pointers(struct sk_buff *skb)
 	cb->data_end  = skb->data + skb_headlen(skb);
 }
 
+/* Similar to bpf_compute_data_pointers(), except that save orginal
+ * data in cb->data and cb->meta_data for restore.
+ */
+static inline void bpf_compute_and_save_data_end(
+	struct sk_buff *skb, void **saved_data_end)
+{
+	struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+
+	*saved_data_end = cb->data_end;
+	cb->data_end  = skb->data + skb_headlen(skb);
+}
+
+/* Restore data saved by bpf_compute_data_pointers(). */
+static inline void bpf_restore_data_end(
+	struct sk_buff *skb, void *saved_data_end)
+{
+	struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+
+	cb->data_end = saved_data_end;
+}
+
 static inline u8 *bpf_skb_cb(struct sk_buff *skb)
 {
 	/* eBPF programs may read/write skb->cb[] area to transfer meta
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 00f6ed2e4f9a..9425c2fb872f 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -553,6 +553,7 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 {
 	unsigned int offset = skb->data - skb_network_header(skb);
 	struct sock *save_sk;
+	void *saved_data_end;
 	struct cgroup *cgrp;
 	int ret;
 
@@ -566,8 +567,13 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 	save_sk = skb->sk;
 	skb->sk = sk;
 	__skb_push(skb, offset);
+
+	/* compute pointers for the bpf prog */
+	bpf_compute_and_save_data_end(skb, &saved_data_end);
+
 	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
 				 bpf_prog_run_save_cb);
+	bpf_restore_data_end(skb, saved_data_end);
 	__skb_pull(skb, offset);
 	skb->sk = save_sk;
 	return ret == 1 ? 0 : -EPERM;
diff --git a/net/core/filter.c b/net/core/filter.c
index 1a3ac6c46873..e3ca30bd6840 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5346,6 +5346,40 @@ static bool sk_filter_is_valid_access(int off, int size,
 	return bpf_skb_is_valid_access(off, size, type, prog, info);
 }
 
+static bool cg_skb_is_valid_access(int off, int size,
+				   enum bpf_access_type type,
+				   const struct bpf_prog *prog,
+				   struct bpf_insn_access_aux *info)
+{
+	switch (off) {
+	case bpf_ctx_range(struct __sk_buff, tc_classid):
+	case bpf_ctx_range(struct __sk_buff, data_meta):
+	case bpf_ctx_range(struct __sk_buff, flow_keys):
+		return false;
+	}
+	if (type == BPF_WRITE) {
+		switch (off) {
+		case bpf_ctx_range(struct __sk_buff, mark):
+		case bpf_ctx_range(struct __sk_buff, priority):
+		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
+			break;
+		default:
+			return false;
+		}
+	}
+
+	switch (off) {
+	case bpf_ctx_range(struct __sk_buff, data):
+		info->reg_type = PTR_TO_PACKET;
+		break;
+	case bpf_ctx_range(struct __sk_buff, data_end):
+		info->reg_type = PTR_TO_PACKET_END;
+		break;
+	}
+
+	return bpf_skb_is_valid_access(off, size, type, prog, info);
+}
+
 static bool lwt_is_valid_access(int off, int size,
 				enum bpf_access_type type,
 				const struct bpf_prog *prog,
@@ -7038,7 +7072,7 @@ const struct bpf_prog_ops xdp_prog_ops = {
 
 const struct bpf_verifier_ops cg_skb_verifier_ops = {
 	.get_func_proto		= cg_skb_func_proto,
-	.is_valid_access	= sk_filter_is_valid_access,
+	.is_valid_access	= cg_skb_is_valid_access,
 	.convert_ctx_access	= bpf_convert_ctx_access,
 };
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH v5 bpf-next 0/2] bpf: add cg_skb_is_valid_access
From: Song Liu @ 2018-10-19  5:01 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, Song Liu

Changes v4 -> v5:
1. Replaced bpf_compute_and_save_data_pointers() with
   bpf_compute_and_save_data_end();
   Replaced bpf_restore_data_pointers() with bpf_restore_data_end().
2. Fixed indentation in test_verifier.c

Changes v3 -> v4:
1. Fixed crash issue reported by Alexei.

Changes v2 -> v3:
1. Added helper function bpf_compute_and_save_data_pointers() and
   bpf_restore_data_pointers().

Changes v1 -> v2:
1. Updated the list of read-only fields, and read-write fields.
2. Added dummy sk to bpf_prog_test_run_skb().

This set enables BPF program of type BPF_PROG_TYPE_CGROUP_SKB to access
some __skb_buff data directly.

Song Liu (2):
  bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
  bpf: add tests for direct packet access from CGROUP_SKB

 include/linux/filter.h                      |  21 +++
 kernel/bpf/cgroup.c                         |   6 +
 net/bpf/test_run.c                          |   7 +
 net/core/filter.c                           |  36 ++++-
 tools/testing/selftests/bpf/test_verifier.c | 171 ++++++++++++++++++++
 5 files changed, 240 insertions(+), 1 deletion(-)

^ permalink raw reply

* [PATCH v5 bpf-next 2/2] bpf: add tests for direct packet access from CGROUP_SKB
From: Song Liu @ 2018-10-19  5:01 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, Song Liu
In-Reply-To: <20181019050107.3684711-1-songliubraving@fb.com>

Tests are added to make sure CGROUP_SKB cannot access:
  tc_classid, data_meta, flow_keys

and can read and write:
  mark, prority, and cb[0-4]

and can read other fields.

To make selftest with skb->sk work, a dummy sk is added in
bpf_prog_test_run_skb().

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 net/bpf/test_run.c                          |   7 +
 tools/testing/selftests/bpf/test_verifier.c | 171 ++++++++++++++++++++
 2 files changed, 178 insertions(+)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 0c423b8cd75c..8dccac305268 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -10,6 +10,8 @@
 #include <linux/etherdevice.h>
 #include <linux/filter.h>
 #include <linux/sched/signal.h>
+#include <net/sock.h>
+#include <net/tcp.h>
 
 static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
 		struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
@@ -115,6 +117,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	u32 retval, duration;
 	int hh_len = ETH_HLEN;
 	struct sk_buff *skb;
+	struct sock sk = {0};
 	void *data;
 	int ret;
 
@@ -137,11 +140,15 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 		break;
 	}
 
+	sock_net_set(&sk, &init_net);
+	sock_init_data(NULL, &sk);
+
 	skb = build_skb(data, 0);
 	if (!skb) {
 		kfree(data);
 		return -ENOMEM;
 	}
+	skb->sk = &sk;
 
 	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
 	__skb_put(skb, size);
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index cf4cd32b6772..f1ae8d09770f 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -4862,6 +4862,177 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
+	{
+		"direct packet read test#1 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, len)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, pkt_type)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, queue_mapping)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, protocol)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_present)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
+			BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#2 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_tci)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_proto)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff,
+					     ingress_ifindex)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, tc_index)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, hash)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#3 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[4])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_4,
+				    offsetof(struct __sk_buff, cb[0])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_5,
+				    offsetof(struct __sk_buff, cb[1])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, cb[2])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_7,
+				    offsetof(struct __sk_buff, cb[3])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_8,
+				    offsetof(struct __sk_buff, cb[4])),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#4 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, family)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip4)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip4)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_port)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_port)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of tc_classid for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, tc_classid)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of data_meta for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_meta)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of flow_keys for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, flow_keys)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid write access to napi_id for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_9,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
 	{
 		"valid cgroup storage access",
 		.insns = {
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH bpf-next] bpf: Extend the sk_lookup() helper to XDP hookpoint.
From: Martin Lau @ 2018-10-19  5:06 UTC (permalink / raw)
  To: Joe Stringer
  Cc: daniel@iogearbox.net, Nitin Hande, netdev, ast@kernel.org,
	Jesper Brouer, john fastabend
In-Reply-To: <CAOftzPimZHf1qVSRo2gwifaOcy-_b4hYmA=HCYzNAm9m=mxmZw@mail.gmail.com>

On Thu, Oct 18, 2018 at 04:52:40PM -0700, Joe Stringer wrote:
> On Thu, 18 Oct 2018 at 14:20, Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 10/18/2018 11:06 PM, Joe Stringer wrote:
> > > On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.hande@gmail.com> wrote:
> > [...]
> > >> Open Issue
> > >> * The underlying code relies on presence of an skb to find out the
> > >> right sk for the case of REUSEPORT socket option. Since there is
> > >> no skb available at XDP hookpoint, the helper function will return
> > >> the first available sk based off the 5 tuple hash. If the desire
> > >> is to return a particular sk matching reuseport_cb function, please
> > >> suggest way to tackle it, which can be addressed in a future commit.
> >
> > >> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
> > >
> > > Thanks Nitin, LGTM overall.
> > >
> > > The REUSEPORT thing suggests that the usage of this helper from XDP
> > > layer may lead to a different socket being selected vs. the equivalent
> > > call at TC hook, or other places where the selection may occur. This
> > > could be a bit counter-intuitive.
> > >
> > > One thought I had to work around this was to introduce a flag,
> > > something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
> > > effectively communicate in the API that the bpf_sk_lookup_xxx()
> > > functions will only select a REUSEPORT socket based on the hash and
> > > not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence
> > > of the flag would support finding REUSEPORT sockets by other
> > > mechanisms (which would be allowed for now from TC hooks but would be
> > > disallowed from XDP, since there's no specific plan to support this).
> >
> > Hmm, given skb is NULL here the only way to lookup the socket in such
> > scenario is based on hash, that is, inet_ehashfn() / inet6_ehashfn(),
> > perhaps alternative is to pass this hash in from XDP itself to the
> > helper so it could be custom selector. Do you have a specific use case
> > on this for XDP (just curious)?
> 
> I don't have a use case for SO_REUSEPORT introspection from XDP, so
> I'm primarily thinking from the perspective of making the behaviour
> clear in the API in a way that leaves open the possibility for a
> reasonable implementation in future. From that perspective, my main
> concern is that it may surprise some BPF writers that the same
> "bpf_sk_lookup_tcp()" call (with identical parameters) may have
> different behaviour at TC vs. XDP layers, as the BPF selection of
> sockets is respected at TC but not at XDP.
> 
> FWIW we're already out of parameters for the actual call, so if we
> wanted to allow passing a hash in, we'd need to either dedicate half
> the 'flags' field for this configurable hash, or consider adding the
> new hash parameter to 'struct bpf_sock_tuple'.
> 
> +Martin for any thoughts on SO_REUSEPORT and XDP here.
The XDP/TC prog has read access to the sk fields through
'struct bpf_sock'?

A quick thought...
Considering all sk in the same reuse->socks[] share
many things (e.g. family,type,protocol,ip,port..etc are the same),
I wonder returning which particular sk from reuse->socks[] will
matter too much since most of the fields from 'struct bpf_sock' will
be the same.  Some of fields in 'struct bpf_sock' could be different
though, like priority?  Hence, another possibility is to limit the
accessible fields for the XDP prog.  Only allow accessing the fields
that must be the same among the sk in the same reuse->socks[].

^ permalink raw reply

* Re: [PATCH v5 bpf-next 2/2] bpf: add tests for direct packet access from CGROUP_SKB
From: Eric Dumazet @ 2018-10-19  5:22 UTC (permalink / raw)
  To: Song Liu, netdev; +Cc: ast, daniel, kernel-team
In-Reply-To: <20181019050107.3684711-3-songliubraving@fb.com>



On 10/18/2018 10:01 PM, Song Liu wrote:
> Tests are added to make sure CGROUP_SKB cannot access:
>   tc_classid, data_meta, flow_keys
> 
> and can read and write:
>   mark, prority, and cb[0-4]
> 
> and can read other fields.
> 
> To make selftest with skb->sk work, a dummy sk is added in
> bpf_prog_test_run_skb().
> 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  net/bpf/test_run.c                          |   7 +
>  tools/testing/selftests/bpf/test_verifier.c | 171 ++++++++++++++++++++
>  2 files changed, 178 insertions(+)
> 
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 0c423b8cd75c..8dccac305268 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -10,6 +10,8 @@
>  #include <linux/etherdevice.h>
>  #include <linux/filter.h>
>  #include <linux/sched/signal.h>
> +#include <net/sock.h>
> +#include <net/tcp.h>
>  
>  static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
>  		struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
> @@ -115,6 +117,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>  	u32 retval, duration;
>  	int hh_len = ETH_HLEN;
>  	struct sk_buff *skb;
> +	struct sock sk = {0};

Arg another dummy :/

>  	void *data;
>  	int ret;
>  
> @@ -137,11 +140,15 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>  		break;
>  	}
>  
> +	sock_net_set(&sk, &init_net);

A few lines later we use :

skb->protocol = eth_type_trans(skb, current->nsproxy->net_ns->loopback_dev);

So it looks like you want instead for consistency :

sock_net_set(&sk, current->nsproxy->net_ns);

	

> +	sock_init_data(NULL, &sk);
> +
>  	skb = build_skb(data, 0);
>  	if (!skb) {
>  		kfree(data);
>  		return -ENOMEM;
>  	}
> +	skb->sk = &sk;
>  

Normally this would need a skb->destructor, but I guess nothing will call skb_orphan()
from this point.

^ permalink raw reply

* Re: [PATCH bpf-next 1/2] samples: bpf: improve xdp1 example
From: Y Song @ 2018-10-19  5:32 UTC (permalink / raw)
  To: mcroce; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <20181018204709.29547-2-mcroce@redhat.com>

On Thu, Oct 18, 2018 at 1:48 PM Matteo Croce <mcroce@redhat.com> wrote:
>
> Store only the total packet count for every protocol, instead of the
> whole per-cpu array.
> Use bpf_map_get_next_key() to iterate the map, instead of looking up
> all the protocols.
>
> Signed-off-by: Matteo Croce <mcroce@redhat.com>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply

* Re: [PATCH bpf-next 2/2] samples: bpf: get ifindex from ifname
From: Y Song @ 2018-10-19  5:34 UTC (permalink / raw)
  To: mcroce; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <20181018204709.29547-3-mcroce@redhat.com>

On Thu, Oct 18, 2018 at 1:48 PM Matteo Croce <mcroce@redhat.com> wrote:
>
> Find the ifindex via ioctl(SIOCGIFINDEX) instead of requiring the
> numeric ifindex.

Maybe use if_nametoindex which is simpler?

>
> Signed-off-by: Matteo Croce <mcroce@redhat.com>
> ---
>  samples/bpf/xdp1_user.c | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
> index 4f3d824fc044..a1d0c5dcee9c 100644
> --- a/samples/bpf/xdp1_user.c
> +++ b/samples/bpf/xdp1_user.c
> @@ -15,6 +15,9 @@
>  #include <unistd.h>
>  #include <libgen.h>
>  #include <sys/resource.h>
> +#include <sys/ioctl.h>
> +#include <sys/socket.h>
> +#include <linux/if.h>
>
>  #include "bpf_util.h"
>  #include "bpf/bpf.h"
> @@ -59,7 +62,7 @@ static void poll_stats(int map_fd, int interval)
>  static void usage(const char *prog)
>  {
>         fprintf(stderr,
> -               "usage: %s [OPTS] IFINDEX\n\n"
> +               "usage: %s [OPTS] IFACE\n\n"
>                 "OPTS:\n"
>                 "    -S    use skb-mode\n"
>                 "    -N    enforce native mode\n",
> @@ -74,9 +77,11 @@ int main(int argc, char **argv)
>         };
>         const char *optstr = "SN";
>         int prog_fd, map_fd, opt;
> +       struct ifreq ifr = { 0 };
>         struct bpf_object *obj;
>         struct bpf_map *map;
>         char filename[256];
> +       int sock;
>
>         while ((opt = getopt(argc, argv, optstr)) != -1) {
>                 switch (opt) {
> @@ -102,7 +107,24 @@ int main(int argc, char **argv)
>                 return 1;
>         }
>
> -       ifindex = strtoul(argv[optind], NULL, 0);
> +       sock = socket(AF_UNIX, SOCK_DGRAM, 0);
> +       if (sock == -1) {
> +               perror("socket");
> +               return 1;
> +       }
> +
> +       if (strlen(argv[optind]) >= IFNAMSIZ) {
> +               printf("invalid ifname '%s'\n", argv[optind]);
> +               return 1;
> +       }
> +
> +       strcpy(ifr.ifr_name, argv[optind]);
> +       if (ioctl(sock, SIOCGIFINDEX, &ifr) < 0) {
> +               perror("SIOCGIFINDEX");
> +               return 1;
> +       }
> +       close(sock);
> +       ifindex = ifr.ifr_ifindex;
>
>         snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
>         prog_load_attr.file = filename;
> --
> 2.19.1
>

^ permalink raw reply

* Re: netconsole warning in 4.19.0-rc7
From: Dave Jones @ 2018-10-19 13:47 UTC (permalink / raw)
  To: Meelis Roos; +Cc: Cong Wang, LKML, Linux Kernel Network Developers, davem
In-Reply-To: <ddc5fea8-6c66-d096-57bb-c86a3771fec4@linux.ee>

On Fri, Oct 19, 2018 at 12:51:38PM +0300, Meelis Roos wrote:
 > >   > > I took another look at that error path. Turns out this is all we need I
 > >   > > think..
 > >   >
 > >   > With this patch applied on top of 4.19-rc8, I stll get the warning:
 > >   >
 > >   > [   13.722919] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:168 __local_bh_enable_ip+0x2e/0x44
 > > 
 > > It's going to be a couple days before I get chance to get back to this.
 > > Did the previous patch in this thread (without the _bh) fare any better?
 > 
 > Tried it with rcu_read_unlock() instead, still the same.

Ok, this is going to take more time than I have.  DaveM, do you want
to revert 6fe9487892b32cb1c8b8b0d552ed7222a527fe30, or do you want a
patch doing the same ?

That'll bring back the rcu warning, but fewer people were complaining
about that than this new issue..

	Dave

^ permalink raw reply

* Re: [PATCH v5 bpf-next 2/2] bpf: add tests for direct packet access from CGROUP_SKB
From: Song Liu @ 2018-10-19  5:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	Kernel Team
In-Reply-To: <0b32fa1d-a607-34a8-bea0-cd2bf4ef7512@gmail.com>



> On Oct 18, 2018, at 10:22 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> 
> 
> On 10/18/2018 10:01 PM, Song Liu wrote:
>> Tests are added to make sure CGROUP_SKB cannot access:
>>  tc_classid, data_meta, flow_keys
>> 
>> and can read and write:
>>  mark, prority, and cb[0-4]
>> 
>> and can read other fields.
>> 
>> To make selftest with skb->sk work, a dummy sk is added in
>> bpf_prog_test_run_skb().
>> 
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> ---
>> net/bpf/test_run.c                          |   7 +
>> tools/testing/selftests/bpf/test_verifier.c | 171 ++++++++++++++++++++
>> 2 files changed, 178 insertions(+)
>> 
>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
>> index 0c423b8cd75c..8dccac305268 100644
>> --- a/net/bpf/test_run.c
>> +++ b/net/bpf/test_run.c
>> @@ -10,6 +10,8 @@
>> #include <linux/etherdevice.h>
>> #include <linux/filter.h>
>> #include <linux/sched/signal.h>
>> +#include <net/sock.h>
>> +#include <net/tcp.h>
>> 
>> static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
>> 		struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
>> @@ -115,6 +117,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>> 	u32 retval, duration;
>> 	int hh_len = ETH_HLEN;
>> 	struct sk_buff *skb;
>> +	struct sock sk = {0};
> 
> Arg another dummy :/
> 
>> 	void *data;
>> 	int ret;
>> 
>> @@ -137,11 +140,15 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>> 		break;
>> 	}
>> 
>> +	sock_net_set(&sk, &init_net);
> 
> A few lines later we use :
> 
> skb->protocol = eth_type_trans(skb, current->nsproxy->net_ns->loopback_dev);
> 
> So it looks like you want instead for consistency :
> 
> sock_net_set(&sk, current->nsproxy->net_ns);

Thanks! Will fix this in v6.

> 	
> 
>> +	sock_init_data(NULL, &sk);
>> +
>> 	skb = build_skb(data, 0);
>> 	if (!skb) {
>> 		kfree(data);
>> 		return -ENOMEM;
>> 	}
>> +	skb->sk = &sk;
>> 
> 
> Normally this would need a skb->destructor, but I guess nothing will call skb_orphan()
> from this point.

Yeah, I double checked, this should be OK. 

Song

^ permalink raw reply

* [PATCH v6 bpf-next 1/2] bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
From: Song Liu @ 2018-10-19  5:53 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, edumazet, Song Liu
In-Reply-To: <20181019055333.3829195-1-songliubraving@fb.com>

BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
skb. This patch enables direct access of skb for these programs.

Two helper functions bpf_compute_and_save_data_end() and
bpf_restore_data_end() are introduced. There are used in
__cgroup_bpf_run_filter_skb(), to compute proper data_end for the
BPF program, and restore original data afterwards.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/filter.h | 21 +++++++++++++++++++++
 kernel/bpf/cgroup.c    |  6 ++++++
 net/core/filter.c      | 36 +++++++++++++++++++++++++++++++++++-
 3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 5771874bc01e..91b4c934f02e 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -548,6 +548,27 @@ static inline void bpf_compute_data_pointers(struct sk_buff *skb)
 	cb->data_end  = skb->data + skb_headlen(skb);
 }
 
+/* Similar to bpf_compute_data_pointers(), except that save orginal
+ * data in cb->data and cb->meta_data for restore.
+ */
+static inline void bpf_compute_and_save_data_end(
+	struct sk_buff *skb, void **saved_data_end)
+{
+	struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+
+	*saved_data_end = cb->data_end;
+	cb->data_end  = skb->data + skb_headlen(skb);
+}
+
+/* Restore data saved by bpf_compute_data_pointers(). */
+static inline void bpf_restore_data_end(
+	struct sk_buff *skb, void *saved_data_end)
+{
+	struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+
+	cb->data_end = saved_data_end;
+}
+
 static inline u8 *bpf_skb_cb(struct sk_buff *skb)
 {
 	/* eBPF programs may read/write skb->cb[] area to transfer meta
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 00f6ed2e4f9a..9425c2fb872f 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -553,6 +553,7 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 {
 	unsigned int offset = skb->data - skb_network_header(skb);
 	struct sock *save_sk;
+	void *saved_data_end;
 	struct cgroup *cgrp;
 	int ret;
 
@@ -566,8 +567,13 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 	save_sk = skb->sk;
 	skb->sk = sk;
 	__skb_push(skb, offset);
+
+	/* compute pointers for the bpf prog */
+	bpf_compute_and_save_data_end(skb, &saved_data_end);
+
 	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
 				 bpf_prog_run_save_cb);
+	bpf_restore_data_end(skb, saved_data_end);
 	__skb_pull(skb, offset);
 	skb->sk = save_sk;
 	return ret == 1 ? 0 : -EPERM;
diff --git a/net/core/filter.c b/net/core/filter.c
index 1a3ac6c46873..e3ca30bd6840 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5346,6 +5346,40 @@ static bool sk_filter_is_valid_access(int off, int size,
 	return bpf_skb_is_valid_access(off, size, type, prog, info);
 }
 
+static bool cg_skb_is_valid_access(int off, int size,
+				   enum bpf_access_type type,
+				   const struct bpf_prog *prog,
+				   struct bpf_insn_access_aux *info)
+{
+	switch (off) {
+	case bpf_ctx_range(struct __sk_buff, tc_classid):
+	case bpf_ctx_range(struct __sk_buff, data_meta):
+	case bpf_ctx_range(struct __sk_buff, flow_keys):
+		return false;
+	}
+	if (type == BPF_WRITE) {
+		switch (off) {
+		case bpf_ctx_range(struct __sk_buff, mark):
+		case bpf_ctx_range(struct __sk_buff, priority):
+		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
+			break;
+		default:
+			return false;
+		}
+	}
+
+	switch (off) {
+	case bpf_ctx_range(struct __sk_buff, data):
+		info->reg_type = PTR_TO_PACKET;
+		break;
+	case bpf_ctx_range(struct __sk_buff, data_end):
+		info->reg_type = PTR_TO_PACKET_END;
+		break;
+	}
+
+	return bpf_skb_is_valid_access(off, size, type, prog, info);
+}
+
 static bool lwt_is_valid_access(int off, int size,
 				enum bpf_access_type type,
 				const struct bpf_prog *prog,
@@ -7038,7 +7072,7 @@ const struct bpf_prog_ops xdp_prog_ops = {
 
 const struct bpf_verifier_ops cg_skb_verifier_ops = {
 	.get_func_proto		= cg_skb_func_proto,
-	.is_valid_access	= sk_filter_is_valid_access,
+	.is_valid_access	= cg_skb_is_valid_access,
 	.convert_ctx_access	= bpf_convert_ctx_access,
 };
 
-- 
2.17.1

^ permalink raw reply related

* [PATCH v6 bpf-next 0/2] bpf: add cg_skb_is_valid_access
From: Song Liu @ 2018-10-19  5:53 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, edumazet, Song Liu

Changes v5 -> v6:
1. Fixed dummy sk in bpf_prog_test_run_skb() as suggested by Eric Dumazet.

Changes v4 -> v5:
1. Replaced bpf_compute_and_save_data_pointers() with
   bpf_compute_and_save_data_end();
   Replaced bpf_restore_data_pointers() with bpf_restore_data_end().
2. Fixed indentation in test_verifier.c

Changes v3 -> v4:
1. Fixed crash issue reported by Alexei.

Changes v2 -> v3:
1. Added helper function bpf_compute_and_save_data_pointers() and
   bpf_restore_data_pointers().

Changes v1 -> v2:
1. Updated the list of read-only fields, and read-write fields.
2. Added dummy sk to bpf_prog_test_run_skb().

This set enables BPF program of type BPF_PROG_TYPE_CGROUP_SKB to access
some __skb_buff data directly.

Song Liu (2):
  bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
  bpf: add tests for direct packet access from CGROUP_SKB

 include/linux/filter.h                      |  21 +++
 kernel/bpf/cgroup.c                         |   6 +
 net/bpf/test_run.c                          |   7 +
 net/core/filter.c                           |  36 ++++-
 tools/testing/selftests/bpf/test_verifier.c | 171 ++++++++++++++++++++
 5 files changed, 240 insertions(+), 1 deletion(-)

^ permalink raw reply

* [PATCH v6 bpf-next 2/2] bpf: add tests for direct packet access from CGROUP_SKB
From: Song Liu @ 2018-10-19  5:53 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, kernel-team, edumazet, Song Liu
In-Reply-To: <20181019055333.3829195-1-songliubraving@fb.com>

Tests are added to make sure CGROUP_SKB cannot access:
  tc_classid, data_meta, flow_keys

and can read and write:
  mark, prority, and cb[0-4]

and can read other fields.

To make selftest with skb->sk work, a dummy sk is added in
bpf_prog_test_run_skb().

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 net/bpf/test_run.c                          |   7 +
 tools/testing/selftests/bpf/test_verifier.c | 171 ++++++++++++++++++++
 2 files changed, 178 insertions(+)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 0c423b8cd75c..87ea279cb095 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -10,6 +10,8 @@
 #include <linux/etherdevice.h>
 #include <linux/filter.h>
 #include <linux/sched/signal.h>
+#include <net/sock.h>
+#include <net/tcp.h>
 
 static __always_inline u32 bpf_test_run_one(struct bpf_prog *prog, void *ctx,
 		struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE])
@@ -115,6 +117,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	u32 retval, duration;
 	int hh_len = ETH_HLEN;
 	struct sk_buff *skb;
+	struct sock sk = {0};
 	void *data;
 	int ret;
 
@@ -137,11 +140,15 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 		break;
 	}
 
+	sock_net_set(&sk, current->nsproxy->net_ns);
+	sock_init_data(NULL, &sk);
+
 	skb = build_skb(data, 0);
 	if (!skb) {
 		kfree(data);
 		return -ENOMEM;
 	}
+	skb->sk = &sk;
 
 	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
 	__skb_put(skb, size);
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index cf4cd32b6772..f1ae8d09770f 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -4862,6 +4862,177 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 		.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 	},
+	{
+		"direct packet read test#1 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, len)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, pkt_type)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, mark)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, queue_mapping)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, protocol)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_present)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
+			BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#2 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_tci)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, vlan_proto)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, priority)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff,
+					     ingress_ifindex)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, tc_index)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, hash)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#3 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, cb[4])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_4,
+				    offsetof(struct __sk_buff, cb[0])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_5,
+				    offsetof(struct __sk_buff, cb[1])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
+				    offsetof(struct __sk_buff, cb[2])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_7,
+				    offsetof(struct __sk_buff, cb[3])),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_8,
+				    offsetof(struct __sk_buff, cb[4])),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"direct packet read test#4 for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, family)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip4)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_4, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip4)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_5, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_ip6[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[0])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[1])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[2])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_ip6[3])),
+			BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_1,
+				    offsetof(struct __sk_buff, remote_port)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_8, BPF_REG_1,
+				    offsetof(struct __sk_buff, local_port)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of tc_classid for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, tc_classid)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of data_meta for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_meta)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid access of flow_keys for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+				    offsetof(struct __sk_buff, flow_keys)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
+	{
+		"invalid write access to napi_id for CGROUP_SKB",
+		.insns = {
+			BPF_LDX_MEM(BPF_W, BPF_REG_9, BPF_REG_1,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_9,
+				    offsetof(struct __sk_buff, napi_id)),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = REJECT,
+		.errstr = "invalid bpf_context access",
+		.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+	},
 	{
 		"valid cgroup storage access",
 		.insns = {
-- 
2.17.1

^ permalink raw reply related

* Re: [RFC PATCH net-next 2/4] net: 8021q: vlan_core: allow use list of vlans for real device
From: Ivan Khoronzhuk @ 2018-10-19 14:02 UTC (permalink / raw)
  To: Bjørn Mork
  Cc: grygorii.strashko, davem, linux-omap, netdev, linux-kernel,
	alexander.h.duyck
In-Reply-To: <878t2up4er.fsf@miraculix.mork.no>

On Fri, Oct 19, 2018 at 01:22:20PM +0200, Bjørn Mork wrote:
>Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> writes:
>
>> @@ -236,6 +239,13 @@ __vlan_find_dev_deep_rcu(struct net_device *real_dev,
>>  	return NULL;
>>  }
>>
>> +static inline int
>> +vlan_for_each(struct net_device *dev,
>> +	      int (*action)(struct net_device *dev, int vid, void *arg),
>> +	      void *arg)
>> +{
>> +}
>> +
>
>
>This stub should return 0, shouldn't it?
yes, it has.

>
>
>Bjørn

-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply

* [PATCH] selftests/bpf: add missing executables to .gitignore
From: Anders Roxell @ 2018-10-19 14:24 UTC (permalink / raw)
  To: ast, daniel, shuah; +Cc: netdev, linux-kernel, linux-kselftest, Anders Roxell

Fixes: 371e4fcc9d96 ("selftests/bpf: cgroup local storage-based network counters")
Fixes: 370920c47b26 ("selftests/bpf: Test libbpf_{prog,attach}_type_by_name")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
---
 tools/testing/selftests/bpf/.gitignore | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 8a60c9b9892d..1b799e30c06d 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -25,3 +25,5 @@ test_cgroup_storage
 test_select_reuseport
 test_flow_dissector
 flow_dissector_load
+test_netcnt
+test_section_names
-- 
2.19.1

^ permalink raw reply related

* Re: [PATCH 1/3] net: socionext: Stop PHY before resetting netsec
From: Masahisa Kojima @ 2018-10-19  6:24 UTC (permalink / raw)
  To: f.fainelli
  Cc: netdev, Ilias Apalodimas, Jassi Brar, Ard Biesheuvel,
	Yoshitoyo Osaki
In-Reply-To: <a9434d99-a757-c14f-faac-dfd46f6b1a53@gmail.com>

Hi,

Thank you very much for your comment.
I have not realized that reading MII_PHYSID1/ID2 are ignored
if the PHY is in power down, this patch has a side effect
in ACPI case.

My intention is that netsec_reset_hardware() should be called
in PHY power down state, but current place to add PHY power down
is not proper.
I will consider the right place to do.

On Fri, 19 Oct 2018 at 11:59, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 10/18/2018 6:08 PM, masahisa.kojima@linaro.org wrote:
> > From: Masahisa Kojima <masahisa.kojima@linaro.org>
> >
> > After resetting netsec IP, driver have to wait until
> > netsec mode turns to NRM mode.
> > But sometimes mode transition to NRM will not complete
> > if the PHY is in normal operation state.
> > To avoid this situation, stop PHY before resetting netsec.
> >
> > Signed-off-by: Masahisa Kojima <masahisa.kojima@linaro.org>
> > Signed-off-by: Yoshitoyo Osaki <osaki.yoshitoyo@socionext.com>
> > ---
> >  drivers/net/ethernet/socionext/netsec.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> > index 4289ccb26e4e..273cc5fc07e0 100644
> > --- a/drivers/net/ethernet/socionext/netsec.c
> > +++ b/drivers/net/ethernet/socionext/netsec.c
> > @@ -1343,11 +1343,11 @@ static int netsec_netdev_stop(struct net_device *ndev)
> >       netsec_uninit_pkt_dring(priv, NETSEC_RING_TX);
> >       netsec_uninit_pkt_dring(priv, NETSEC_RING_RX);
> >
> > -     ret = netsec_reset_hardware(priv, false);
> > -
> >       phy_stop(ndev->phydev);
> >       phy_disconnect(ndev->phydev);
> >
> > +     ret = netsec_reset_hardware(priv, false);
> > +
> >       pm_runtime_put_sync(priv->dev);
> >
> >       return ret;
> > @@ -1415,7 +1415,7 @@ static const struct net_device_ops netsec_netdev_ops = {
> >  };
> >
> >  static int netsec_of_probe(struct platform_device *pdev,
> > -                        struct netsec_priv *priv)
> > +                        struct netsec_priv *priv, u32 *phy_addr)
> >  {
> >       priv->phy_np = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0);
> >       if (!priv->phy_np) {
> > @@ -1423,6 +1423,8 @@ static int netsec_of_probe(struct platform_device *pdev,
> >               return -EINVAL;
> >       }
> >
> > +     *phy_addr = of_mdio_parse_addr(&pdev->dev, priv->phy_np);
> > +
> >       priv->clk = devm_clk_get(&pdev->dev, NULL); /* get by 'phy_ref_clk' */
> >       if (IS_ERR(priv->clk)) {
> >               dev_err(&pdev->dev, "phy_ref_clk not found\n");
> > @@ -1473,6 +1475,7 @@ static int netsec_register_mdio(struct netsec_priv *priv, u32 phy_addr)
> >  {
> >       struct mii_bus *bus;
> >       int ret;
> > +     u16 data;
> >
> >       bus = devm_mdiobus_alloc(priv->dev);
> >       if (!bus)
> > @@ -1486,6 +1489,10 @@ static int netsec_register_mdio(struct netsec_priv *priv, u32 phy_addr)
> >       bus->parent = priv->dev;
> >       priv->mii_bus = bus;
> >
> > +     /* set phy power down */
> > +     data = netsec_phy_read(bus, phy_addr, 0) | 0x800;
> > +     netsec_phy_write(bus, phy_addr, 0, data);
>
> This is not explained in your commit message, and this is using open
> coded values instead of the symbolic names from include/uapi/linux/mii.h.
>
> Note that depending on the type of PHY you are interfaced with if the
> PHY is in power down, the only register it is allowed to accept writes
> for is MII_BMCR (per 802.3 clause 22 spec), any other read or write to a
> different register can be discarded by its MDIO snooping logic. Since
> probing the MDIO bus involves reading MII_PHYSID1/ID2, what is this
> trying to achieve?
>
> > +
> >       if (dev_of_node(priv->dev)) {
> >               struct device_node *mdio_node, *parent = dev_of_node(priv->dev);
> >
> > @@ -1623,7 +1630,7 @@ static int netsec_probe(struct platform_device *pdev)
> >       }
> >
> >       if (dev_of_node(&pdev->dev))
> > -             ret = netsec_of_probe(pdev, priv);
> > +             ret = netsec_of_probe(pdev, priv, &phy_addr);
> >       else
> >               ret = netsec_acpi_probe(pdev, priv, &phy_addr);
> >       if (ret)
> >
>
> --
> Florian

^ permalink raw reply

* Re: [PATCH 2/3] net: socionext: Add dummy PHY register read in phy_write()
From: Masahisa Kojima @ 2018-10-19  6:44 UTC (permalink / raw)
  To: f.fainelli
  Cc: netdev, Ilias Apalodimas, Jassi Brar, Ard Biesheuvel,
	Yoshitoyo Osaki
In-Reply-To: <6b6328ad-a1a8-f3de-6618-40a703db178f@gmail.com>

On Fri, 19 Oct 2018 at 12:01, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 10/18/2018 6:08 PM, masahisa.kojima@linaro.org wrote:
> > From: Masahisa Kojima <masahisa.kojima@linaro.org>
> >
> > There is a compatibility issue between RTL8211E implemented
> > in Developerbox and netsec network controller IP(F_GMAC4).
> >
> > RTL8211E expects MDC clock must be kept toggling for several
> > clock cycle with MDIO high before entering the IDLE state.
> > To meet this requirement, netsec driver needs to issue dummy
> > read(e.g. read PHYID1(offset 0x2) register) right after write.
> >
> > Signed-off-by: Masahisa Kojima <masahisa.kojima@linaro.org>
> > Signed-off-by: Yoshitoyo Osaki <osaki.yoshitoyo@socionext.com>
> > ---
> >  drivers/net/ethernet/socionext/netsec.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> > index 273cc5fc07e0..e7faaf8be99e 100644
> > --- a/drivers/net/ethernet/socionext/netsec.c
> > +++ b/drivers/net/ethernet/socionext/netsec.c
> > @@ -431,9 +431,12 @@ static int netsec_mac_update_to_phy_state(struct netsec_priv *priv)
> >       return 0;
> >  }
> >
> > +static int netsec_phy_read(struct mii_bus *bus, int phy_addr, int reg_addr);
> > +
> >  static int netsec_phy_write(struct mii_bus *bus,
> >                           int phy_addr, int reg, u16 val)
> >  {
> > +     int status;
> >       struct netsec_priv *priv = bus->priv;
> >
> >       if (netsec_mac_write(priv, GMAC_REG_GDR, val))
> > @@ -446,8 +449,19 @@ static int netsec_phy_write(struct mii_bus *bus,
> >                             GMAC_REG_SHIFT_CR_GAR)))
> >               return -ETIMEDOUT;
> >
> > -     return netsec_mac_wait_while_busy(priv, GMAC_REG_GAR,
> > -                                       NETSEC_GMAC_GAR_REG_GB);
> > +     status = netsec_mac_wait_while_busy(priv, GMAC_REG_GAR,
> > +                                         NETSEC_GMAC_GAR_REG_GB);
> > +
> > +     /* Developerbox implements RTL8211E PHY and there is
> > +      * a compatibility problem with F_GMAC4.
> > +      * RTL8211E expects MDC clock must be kept toggling for several
> > +      * clock cycle with MDIO high before entering the IDLE state.
> > +      * To meet this requirement, netsec driver needs to issue dummy
> > +      * read(e.g. read PHYID1(offset 0x2) register) right after write.
> > +      */
> > +     netsec_phy_read(bus, phy_addr, 2);
>
> MII_PHYSID1 instead of 0x2 would be preferable. It is not clear to me
> from your commit message if this is a problem specific to your MDIO
> controller implementation and the RTL8211E PHY or if this is a general
> problem of the PHY irrespective of the MDIO controller it is interface
> with. If the latter, then we should seek a solution at a different layer
> such that other systems don't have that same problem.
>
> Thank you!
> --
> Florian

I will replace direct value with MACROs, also for other instances.

Our MDIO controller stops MDC clock right after the write access,
but RTL8211E PHY expects to keep several clock cycle.
So it is depending on our MDIO controller implementation.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox