Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] fix noderef.cocci warnings
From: kbuild test robot @ 2019-08-08 15:10 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: kbuild-all, Martin Weinelt, bridge, Roopa Prabhu, netdev
In-Reply-To: <908e9e90-70cc-7bbe-f83f-0810c9ef3925@cumulusnetworks.com>

From: kbuild test robot <lkp@intel.com>

net/bridge/br_multicast.c:999:8-14: ERROR: application of sizeof to pointer

 sizeof when applied to a pointer typed expression gives the size of
 the pointer

Generated by: scripts/coccinelle/misc/noderef.cocci

Fixes: 17c91348ed8b ("Use-after-free in br_multicast_rcv")
CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
---

url:    https://github.com/0day-ci/linux/commits/Nikolay-Aleksandrov/net-bridge-mcast-fix-possible-uses-of-stale-pointers/20190702-083354

 br_multicast.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -996,7 +996,7 @@ static int br_ip6_multicast_mld2_report(
 			return -EINVAL;
 
 		_nsrcs = skb_header_pointer(skb, nsrcs_offset,
-					   sizeof(_nsrcs), &__nsrcs);
+					   sizeof(*_nsrcs), &__nsrcs);
 		if (!_nsrcs)
 			return -EINVAL;
 

^ permalink raw reply

* Re: [PATCH TEST] net: bridge: mcast: fix possible uses of stale pointers
From: kbuild test robot @ 2019-08-08 15:10 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: kbuild-all, Martin Weinelt, bridge, Roopa Prabhu, netdev
In-Reply-To: <908e9e90-70cc-7bbe-f83f-0810c9ef3925@cumulusnetworks.com>

Hi Nikolay,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net/master]
[cannot apply to v5.3-rc3 next-20190808]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Nikolay-Aleksandrov/net-bridge-mcast-fix-possible-uses-of-stale-pointers/20190702-083354

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


coccinelle warnings: (new ones prefixed by >>)

>> net/bridge/br_multicast.c:999:8-14: ERROR: application of sizeof to pointer

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [PATCH v2 01/15] net: phy: adin: add support for Analog Devices PHYs
From: Andrew Lunn @ 2019-08-08 15:13 UTC (permalink / raw)
  To: Alexandru Ardelean
  Cc: netdev, devicetree, linux-kernel, davem, robh+dt, mark.rutland,
	f.fainelli, hkallweit1
In-Reply-To: <20190808123026.17382-2-alexandru.ardelean@analog.com>

On Thu, Aug 08, 2019 at 03:30:12PM +0300, Alexandru Ardelean wrote:
> This change adds support for Analog Devices Industrial Ethernet PHYs.
> Particularly the PHYs this driver adds support for:
>  * ADIN1200 - Robust, Industrial, Low Power 10/100 Ethernet PHY
>  * ADIN1300 - Robust, Industrial, Low Latency 10/100/1000 Gigabit
>    Ethernet PHY
> 
> The 2 chips are pin & register compatible with one another. The main
> difference being that ADIN1200 doesn't operate in gigabit mode.
> 
> The chips can be operated by the Generic PHY driver as well via the
> standard IEEE PHY registers (0x0000 - 0x000F) which are supported by the
> kernel as well. This assumes that configuration of the PHY has been done
> completely in HW, according to spec.
> 
> Configuration can also be done via registers, which will be supported by
> this driver.
> 
> Datasheets:
>   https://www.analog.com/media/en/technical-documentation/data-sheets/ADIN1300.pdf
>   https://www.analog.com/media/en/technical-documentation/data-sheets/ADIN1200.pdf
> 
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH v2 1/2] tcp: add new tcp_mtu_probe_floor sysctl
From: Josh Hunt @ 2019-08-08 15:14 UTC (permalink / raw)
  To: Eric Dumazet, netdev; +Cc: davem, edumazet, ncardwell
In-Reply-To: <a3a69a9d-5e30-77c4-02e2-c644bfdab820@gmail.com>

On 8/7/19 11:12 PM, Eric Dumazet wrote:
> 
> 
> On 8/8/19 1:52 AM, Josh Hunt wrote:
>> The current implementation of TCP MTU probing can considerably
>> underestimate the MTU on lossy connections allowing the MSS to get down to
>> 48. We have found that in almost all of these cases on our networks these
>> paths can handle much larger MTUs meaning the connections are being
>> artificially limited. Even though TCP MTU probing can raise the MSS back up
>> we have seen this not to be the case causing connections to be "stuck" with
>> an MSS of 48 when heavy loss is present.
>>
>> Prior to pushing out this change we could not keep TCP MTU probing enabled
>> b/c of the above reasons. Now with a reasonble floor set we've had it
>> enabled for the past 6 months.
> 
> I am still sad to see you do not share what is a reasonable value and let
> everybody guess.
> 
> It seems to be a top-secret value.

Haha, no sorry I didn't mean for it to come across like that.

We are currently setting tcp_base_mss to 1348 and tcp_mtu_probe_floor to 
1208. I thought I mentioned it in our earlier mails, but I guess I did 
not put the exact #s. 1348 was derived after analyzing common MTU we see 
across our networks and noticing that an MTU of around 1400 would cover 
a very large % (sorry I don't have the #s handy) of those paths. 1400 - 
20 - 20 - 12 = 1348. For the floor we based it off the v6 min MTU of 
1280 and subtracted out headers, etc, so 1280 - 40 - 20 - 12 = 1208. 
Using a floor of 1280 MTU matches what the RFC suggests in section 7.7 
for v6 connections, we're just applying that to v4 right now as well. I 
guess that brings up a good point, would per-IP proto floor sysctls be 
an option here for upstream (the RFC suggests different floors for v4 
and v6)? For now I'm not sure it makes sense b/c of the problems we see 
with lossy connections, but in the future if that can be resolved it 
seems like it would give some more flexibility.

I'd like to investigate this all further at some point to see if we can 
make it work better for lossy connections. It looks like one of the 
problems is there are a # of conditions which cause us to not probe in 
the upward direction. I'm not sure if any of those can be 
relaxed/changed and if so would help these connections out.

Thanks
Josh

^ permalink raw reply

* Re: [PATCH v2 02/15] net: phy: adin: hook genphy_read_abilities() to get_features
From: Andrew Lunn @ 2019-08-08 15:24 UTC (permalink / raw)
  To: Alexandru Ardelean
  Cc: netdev, devicetree, linux-kernel, davem, robh+dt, mark.rutland,
	f.fainelli, hkallweit1
In-Reply-To: <20190808123026.17382-3-alexandru.ardelean@analog.com>

On Thu, Aug 08, 2019 at 03:30:13PM +0300, Alexandru Ardelean wrote:
> The ADIN PHYs can operate with Clause 45, however they are not typical for
> how phylib considers Clause 45 PHYs.
> 
> If the `features` field & the `get_features` hook are unspecified, and the
> device wants to operate via Clause 45, it would also try to read features
> via the `genphy_c45_pma_read_abilities()`, which will try to read PMA regs
> that are unsupported.
> 
> Hooking the `genphy_read_abilities()` function to the `get_features` hook
> will ensure that this does not happen and the PHY features are read
> correctly regardless of Clause 22 or Clause 45 operation.

I think we need to stop and think about a PHY which supports both C22
and C45.

How does bus enumeration work? Is it discovered twice?  I've always
considered phydev->is_c45 means everything is c45, not that some
registers can be accessed via c45. But the driver is mixing c22 and
c45. Does the driver actually require c45? Are some features which are
only accessibly via C45? What does C45 actually bring us for this
device?

     Andrew

^ permalink raw reply

* Re: [PATCH v2 03/15] net: phy: adin: hook genphy_{suspend,resume} into the driver
From: Andrew Lunn @ 2019-08-08 15:24 UTC (permalink / raw)
  To: Alexandru Ardelean
  Cc: netdev, devicetree, linux-kernel, davem, robh+dt, mark.rutland,
	f.fainelli, hkallweit1
In-Reply-To: <20190808123026.17382-4-alexandru.ardelean@analog.com>

On Thu, Aug 08, 2019 at 03:30:14PM +0300, Alexandru Ardelean wrote:
> The chip supports standard suspend/resume via BMCR reg.
> Hook these functions into the `adin` driver.
> 
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH v2 04/15] net: phy: adin: add support for interrupts
From: Andrew Lunn @ 2019-08-08 15:25 UTC (permalink / raw)
  To: Alexandru Ardelean
  Cc: netdev, devicetree, linux-kernel, davem, robh+dt, mark.rutland,
	f.fainelli, hkallweit1
In-Reply-To: <20190808123026.17382-5-alexandru.ardelean@analog.com>

On Thu, Aug 08, 2019 at 03:30:15PM +0300, Alexandru Ardelean wrote:
> This change adds support for enabling PHY interrupts that can be used by
> the PHY framework to get signal for link/speed/auto-negotiation changes.
> 
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH bpf-next 1/3] bpf: support cloning sk storage on accept()
From: Stanislav Fomichev @ 2019-08-08 15:28 UTC (permalink / raw)
  To: Martin Lau
  Cc: Stanislav Fomichev, netdev@vger.kernel.org, bpf@vger.kernel.org,
	davem@davemloft.net, ast@kernel.org, daniel@iogearbox.net
In-Reply-To: <20190808063936.3p4ahtdkw35rrzqu@kafai-mbp>

On 08/08, Martin Lau wrote:
> On Wed, Aug 07, 2019 at 08:47:18AM -0700, Stanislav Fomichev wrote:
> > Add new helper bpf_sk_storage_clone which optionally clones sk storage
> > and call it from bpf_sk_storage_clone. Reuse the gap in
> > bpf_sk_storage_elem to store clone/non-clone flag.
> > 
> > Cc: Martin KaFai Lau <kafai@fb.com>
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  include/net/bpf_sk_storage.h |  10 ++++
> >  include/uapi/linux/bpf.h     |   1 +
> >  net/core/bpf_sk_storage.c    | 102 +++++++++++++++++++++++++++++++++--
> >  net/core/sock.c              |   9 ++--
> >  4 files changed, 115 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
> > index b9dcb02e756b..8e4f831d2e52 100644
> > --- a/include/net/bpf_sk_storage.h
> > +++ b/include/net/bpf_sk_storage.h
> > @@ -10,4 +10,14 @@ void bpf_sk_storage_free(struct sock *sk);
> >  extern const struct bpf_func_proto bpf_sk_storage_get_proto;
> >  extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
> >  
> > +#ifdef CONFIG_BPF_SYSCALL
> > +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
> > +#else
> > +static inline int bpf_sk_storage_clone(const struct sock *sk,
> > +				       struct sock *newsk)
> > +{
> > +	return 0;
> > +}
> > +#endif
> > +
> >  #endif /* _BPF_SK_STORAGE_H */
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4393bd4b2419..00459ca4c8cf 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2931,6 +2931,7 @@ enum bpf_func_id {
> >  
> >  /* BPF_FUNC_sk_storage_get flags */
> >  #define BPF_SK_STORAGE_GET_F_CREATE	(1ULL << 0)
> > +#define BPF_SK_STORAGE_GET_F_CLONE	(1ULL << 1)
> It is only used in bpf_sk_storage_get().
> What if the elem is created from bpf_fd_sk_storage_update_elem()
> i.e. from the syscall API ?
> 
> What may be the use case for a map to have both CLONE and non-CLONE
> elements?  If it is not the case, would it be better to add
> BPF_F_CLONE to bpf_attr->map_flags?
I didn't think about putting it on the map itself since the API
is on a per-element, but it does make sense. I can't come up
with a use-case for a per-element selective clone/non-clone.
Thanks, will move to the map itself.

> >  
> >  /* Mode for BPF_FUNC_skb_adjust_room helper. */
> >  enum bpf_adj_room_mode {
> > diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
> > index 94c7f77ecb6b..b6dea67965bc 100644
> > --- a/net/core/bpf_sk_storage.c
> > +++ b/net/core/bpf_sk_storage.c
> > @@ -12,6 +12,9 @@
> >  
> >  static atomic_t cache_idx;
> >  
> > +#define BPF_SK_STORAGE_GET_F_MASK	(BPF_SK_STORAGE_GET_F_CREATE | \
> > +					 BPF_SK_STORAGE_GET_F_CLONE)
> > +
> >  struct bucket {
> >  	struct hlist_head list;
> >  	raw_spinlock_t lock;
> > @@ -66,7 +69,8 @@ struct bpf_sk_storage_elem {
> >  	struct hlist_node snode;	/* Linked to bpf_sk_storage */
> >  	struct bpf_sk_storage __rcu *sk_storage;
> >  	struct rcu_head rcu;
> > -	/* 8 bytes hole */
> > +	u8 clone:1;
> > +	/* 7 bytes hole */
> >  	/* The data is stored in aother cacheline to minimize
> >  	 * the number of cachelines access during a cache hit.
> >  	 */
> > @@ -509,7 +513,7 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
> >  	return 0;
> >  }
> >  
> > -/* Called by __sk_destruct() */
> > +/* Called by __sk_destruct() & bpf_sk_storage_clone() */
> >  void bpf_sk_storage_free(struct sock *sk)
> >  {
> >  	struct bpf_sk_storage_elem *selem;
> > @@ -739,19 +743,106 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
> >  	return err;
> >  }
> >  
> > +static struct bpf_sk_storage_elem *
> > +bpf_sk_storage_clone_elem(struct sock *newsk,
> > +			  struct bpf_sk_storage_map *smap,
> > +			  struct bpf_sk_storage_elem *selem)
> > +{
> > +	struct bpf_sk_storage_elem *copy_selem;
> > +
> > +	copy_selem = selem_alloc(smap, newsk, NULL, true);
> > +	if (!copy_selem)
> > +		return ERR_PTR(-ENOMEM);
> nit.
> may be just return NULL as selem_alloc() does.
Sounds good.

> > +
> > +	if (map_value_has_spin_lock(&smap->map))
> > +		copy_map_value_locked(&smap->map, SDATA(copy_selem)->data,
> > +				      SDATA(selem)->data, true);
> > +	else
> > +		copy_map_value(&smap->map, SDATA(copy_selem)->data,
> > +			       SDATA(selem)->data);
> > +
> > +	return copy_selem;
> > +}
> > +
> > +int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
> > +{
> > +	struct bpf_sk_storage *new_sk_storage = NULL;
> > +	struct bpf_sk_storage *sk_storage;
> > +	struct bpf_sk_storage_elem *selem;
> > +	int ret;
> > +
> > +	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > +
> > +	rcu_read_lock();
> > +	sk_storage = rcu_dereference(sk->sk_bpf_storage);
> > +
> > +	if (!sk_storage || hlist_empty(&sk_storage->list))
> > +		goto out;
> > +
> > +	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
> > +		struct bpf_sk_storage_map *smap;
> > +		struct bpf_sk_storage_elem *copy_selem;
> > +
> > +		if (!selem->clone)
> > +			continue;
> > +
> > +		smap = rcu_dereference(SDATA(selem)->smap);
> > +		if (!smap)
> smap should not be NULL.
I see; you never set it back to NULL and we are guaranteed that the
map is still around due to rcu. Removed.

> > +			continue;
> > +
> > +		copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
> > +		if (IS_ERR(copy_selem)) {
> > +			ret = PTR_ERR(copy_selem);
> > +			goto err;
> > +		}
> > +
> > +		if (!new_sk_storage) {
> > +			ret = sk_storage_alloc(newsk, smap, copy_selem);
> > +			if (ret) {
> > +				kfree(copy_selem);
> > +				atomic_sub(smap->elem_size,
> > +					   &newsk->sk_omem_alloc);
> > +				goto err;
> > +			}
> > +
> > +			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
> > +			continue;
> > +		}
> > +
> > +		raw_spin_lock_bh(&new_sk_storage->lock);
> > +		selem_link_map(smap, copy_selem);
> Unlike the existing selem-update use-cases in bpf_sk_storage.c,
> the smap->map.refcnt has not been held here.  Reading the smap
> is fine.  However, adding a new selem to a deleting smap is an issue.
> Hence, I think bpf_map_inc_not_zero() should be done first.
In this case, I should probably do it after smap = rcu_deref()?

> > +		__selem_link_sk(new_sk_storage, copy_selem);
> > +		raw_spin_unlock_bh(&new_sk_storage->lock);
> > +	}
> > +
> > +out:
> > +	rcu_read_unlock();
> > +	return 0;
> > +
> > +err:
> > +	rcu_read_unlock();
> > +
> > +	bpf_sk_storage_free(newsk);
> > +	return ret;
> > +}
> > +
> >  BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
> >  	   void *, value, u64, flags)
> >  {
> >  	struct bpf_sk_storage_data *sdata;
> >  
> > -	if (flags > BPF_SK_STORAGE_GET_F_CREATE)
> > +	if (flags & ~BPF_SK_STORAGE_GET_F_MASK)
> > +		return (unsigned long)NULL;
> > +
> > +	if ((flags & BPF_SK_STORAGE_GET_F_CLONE) &&
> > +	    !(flags & BPF_SK_STORAGE_GET_F_CREATE))
> >  		return (unsigned long)NULL;
> >  
> >  	sdata = sk_storage_lookup(sk, map, true);
> >  	if (sdata)
> >  		return (unsigned long)sdata->data;
> >  
> > -	if (flags == BPF_SK_STORAGE_GET_F_CREATE &&
> > +	if ((flags & BPF_SK_STORAGE_GET_F_CREATE) &&
> >  	    /* Cannot add new elem to a going away sk.
> >  	     * Otherwise, the new elem may become a leak
> >  	     * (and also other memory issues during map
> > @@ -762,6 +853,9 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
> >  		/* sk must be a fullsock (guaranteed by verifier),
> >  		 * so sock_gen_put() is unnecessary.
> >  		 */
> > +		if (!IS_ERR(sdata))
> > +			SELEM(sdata)->clone =
> > +				!!(flags & BPF_SK_STORAGE_GET_F_CLONE);
> >  		sock_put(sk);
> >  		return IS_ERR(sdata) ?
> >  			(unsigned long)NULL : (unsigned long)sdata->data;
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index d57b0cc995a0..f5e801a9cea4 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1851,9 +1851,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
> >  			goto out;
> >  		}
> >  		RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL);
> > -#ifdef CONFIG_BPF_SYSCALL
> > -		RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
> > -#endif
> > +
> > +		if (bpf_sk_storage_clone(sk, newsk)) {
> > +			sk_free_unlock_clone(newsk);
> > +			newsk = NULL;
> > +			goto out;
> > +		}
> >  
> >  		newsk->sk_err	   = 0;
> >  		newsk->sk_err_soft = 0;
> > -- 
> > 2.22.0.770.g0f2c4a37fd-goog
> > 

^ permalink raw reply

* Re: [PATCH v2 05/15] net: phy: adin: add {write,read}_mmd hooks
From: Andrew Lunn @ 2019-08-08 15:35 UTC (permalink / raw)
  To: Alexandru Ardelean
  Cc: netdev, devicetree, linux-kernel, davem, robh+dt, mark.rutland,
	f.fainelli, hkallweit1
In-Reply-To: <20190808123026.17382-6-alexandru.ardelean@analog.com>

On Thu, Aug 08, 2019 at 03:30:16PM +0300, Alexandru Ardelean wrote:
> Both ADIN1200 & ADIN1300 support Clause 45 access.
> The Extended Management Interface (EMI) registers are accessible via both
> Clause 45 (at register MDIO_MMD_VEND1) and using Clause 22.
> 
> However, the Clause 22 MMD access operations differ from the implementation
> in the kernel, in the sense that it uses registers ExtRegPtr (0x10) &
> ExtRegData (0x11) to access Clause 45 & EMI registers.

It is not that they differ from what the kernel supports. Its that
they differ from what the Standard says they should use. These
registers are defined in 802.3, part of C22, and this hardware
implements the standard incorrectly.

	   Andrew

^ permalink raw reply

* Re: [PATCH v2 06/15] net: phy: adin: configure RGMII/RMII/MII modes on config
From: Andrew Lunn @ 2019-08-08 15:38 UTC (permalink / raw)
  To: Alexandru Ardelean
  Cc: netdev, devicetree, linux-kernel, davem, robh+dt, mark.rutland,
	f.fainelli, hkallweit1
In-Reply-To: <20190808123026.17382-7-alexandru.ardelean@analog.com>

On Thu, Aug 08, 2019 at 03:30:17PM +0300, Alexandru Ardelean wrote:
> The ADIN1300 chip supports RGMII, RMII & MII modes. Default (if
> unconfigured) is RGMII.
> This change adds support for configuring these modes via the device
> registers.
> 
> For RGMII with internal delays (modes RGMII_ID,RGMII_TXID, RGMII_RXID),
> the default delay is 2 ns. This can be configurable and will be done in
> a subsequent change.
> 
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH v2 07/15] net: phy: adin: make RGMII internal delays configurable
From: Andrew Lunn @ 2019-08-08 15:40 UTC (permalink / raw)
  To: Alexandru Ardelean
  Cc: netdev, devicetree, linux-kernel, davem, robh+dt, mark.rutland,
	f.fainelli, hkallweit1
In-Reply-To: <20190808123026.17382-8-alexandru.ardelean@analog.com>

On Thu, Aug 08, 2019 at 03:30:18PM +0300, Alexandru Ardelean wrote:
> The internal delays for the RGMII are configurable for both RX & TX. This
> change adds support for configuring them via device-tree (or ACPI).
> 
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH v2 08/15] net: phy: adin: make RMII fifo depth configurable
From: Andrew Lunn @ 2019-08-08 15:42 UTC (permalink / raw)
  To: Alexandru Ardelean
  Cc: netdev, devicetree, linux-kernel, davem, robh+dt, mark.rutland,
	f.fainelli, hkallweit1
In-Reply-To: <20190808123026.17382-9-alexandru.ardelean@analog.com>

On Thu, Aug 08, 2019 at 03:30:19PM +0300, Alexandru Ardelean wrote:
> The FIFO depth can be configured for the RMII mode. This change adds
> support for doing this via device-tree (or ACPI).
> 
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [bpf-next v2 PATCH 0/3] bpf: improvements to xdp_fwd sample
From: Jesper Dangaard Brouer @ 2019-08-08 15:46 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov
  Cc: a.s.protopopov, dsahern, Toke Høiland-Jørgensen,
	ys114321, Jesper Dangaard Brouer

V2: Addressed issues point out by Yonghong Song
 - Please ACK patch 2/3 again
 - Added ACKs and reviewed-by to other patches

This patchset is focused on improvements for XDP forwarding sample
named xdp_fwd, which leverage the existing FIB routing tables as
described in LPC2018[1] talk by David Ahern.

The primary motivation is to illustrate how Toke's recent work
improves usability of XDP_REDIRECT via lookups in devmap. The other
patches are to help users understand the sample.

I have more improvements to xdp_fwd, but those might requires changes
to libbpf.  Thus, sending these patches first as they are isolated.

[1] http://vger.kernel.org/lpc-networking2018.html#session-1

---

Jesper Dangaard Brouer (3):
      samples/bpf: xdp_fwd rename devmap name to be xdp_tx_ports
      samples/bpf: make xdp_fwd more practically usable via devmap lookup
      samples/bpf: xdp_fwd explain bpf_fib_lookup return codes

 samples/bpf/xdp_fwd_kern.c |   39 ++++++++++++++++++++++++++++++---------
 samples/bpf/xdp_fwd_user.c |   38 ++++++++++++++++++++++++++------------
 2 files changed, 56 insertions(+), 21 deletions(-)

--

^ permalink raw reply

* [bpf-next v2 PATCH 1/3] samples/bpf: xdp_fwd rename devmap name to be xdp_tx_ports
From: Jesper Dangaard Brouer @ 2019-08-08 15:46 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov
  Cc: a.s.protopopov, dsahern, Toke Høiland-Jørgensen,
	ys114321, Jesper Dangaard Brouer
In-Reply-To: <156527914510.20297.12225832190052744019.stgit@firesoul>

The devmap name 'tx_port' came from a copy-paste from xdp_redirect_map
which only have a single TX port. Change name to xdp_tx_ports
to make it more descriptive.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/xdp_fwd_kern.c |    5 +++--
 samples/bpf/xdp_fwd_user.c |    2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index a7e94e7ff87d..e6ffc4ea06f4 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -23,7 +23,8 @@
 
 #define IPV6_FLOWINFO_MASK              cpu_to_be32(0x0FFFFFFF)
 
-struct bpf_map_def SEC("maps") tx_port = {
+/* For TX-traffic redirect requires net_device ifindex to be in this devmap */
+struct bpf_map_def SEC("maps") xdp_tx_ports = {
 	.type = BPF_MAP_TYPE_DEVMAP,
 	.key_size = sizeof(int),
 	.value_size = sizeof(int),
@@ -117,7 +118,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 
 		memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
 		memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
-		return bpf_redirect_map(&tx_port, fib_params.ifindex, 0);
+		return bpf_redirect_map(&xdp_tx_ports, fib_params.ifindex, 0);
 	}
 
 	return XDP_PASS;
diff --git a/samples/bpf/xdp_fwd_user.c b/samples/bpf/xdp_fwd_user.c
index 5b46ee12c696..ba012d9f93dd 100644
--- a/samples/bpf/xdp_fwd_user.c
+++ b/samples/bpf/xdp_fwd_user.c
@@ -113,7 +113,7 @@ int main(int argc, char **argv)
 			return 1;
 		}
 		map_fd = bpf_map__fd(bpf_object__find_map_by_name(obj,
-								  "tx_port"));
+							"xdp_tx_ports"));
 		if (map_fd < 0) {
 			printf("map not found: %s\n", strerror(map_fd));
 			return 1;


^ permalink raw reply related

* [bpf-next v2 PATCH 2/3] samples/bpf: make xdp_fwd more practically usable via devmap lookup
From: Jesper Dangaard Brouer @ 2019-08-08 15:46 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov
  Cc: a.s.protopopov, dsahern, Toke Høiland-Jørgensen,
	ys114321, Jesper Dangaard Brouer
In-Reply-To: <156527914510.20297.12225832190052744019.stgit@firesoul>

This address the TODO in samples/bpf/xdp_fwd_kern.c, which points out
that the chosen egress index should be checked for existence in the
devmap. This can now be done via taking advantage of Toke's work in
commit 0cdbb4b09a06 ("devmap: Allow map lookups from eBPF").

This change makes xdp_fwd more practically usable, as this allows for
a mixed environment, where IP-forwarding fallback to network stack, if
the egress device isn't configured to use XDP.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
---
 samples/bpf/xdp_fwd_kern.c |   17 +++++++++++------
 samples/bpf/xdp_fwd_user.c |   36 +++++++++++++++++++++++++-----------
 2 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index e6ffc4ea06f4..a43d6953c054 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -104,13 +104,18 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 
 	rc = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
 
-	/* verify egress index has xdp support
-	 * TO-DO bpf_map_lookup_elem(&tx_port, &key) fails with
-	 *       cannot pass map_type 14 into func bpf_map_lookup_elem#1:
-	 * NOTE: without verification that egress index supports XDP
-	 *       forwarding packets are dropped.
-	 */
 	if (rc == 0) {
+		/* Verify egress index has been configured as TX-port.
+		 * (Note: User can still have inserted an egress ifindex that
+		 * doesn't support XDP xmit, which will result in packet drops).
+		 *
+		 * Note: lookup in devmap supported since 0cdbb4b09a0.
+		 * If not supported will fail with:
+		 *  cannot pass map_type 14 into func bpf_map_lookup_elem#1:
+		 */
+		if (!bpf_map_lookup_elem(&xdp_tx_ports, &fib_params.ifindex))
+			return XDP_PASS;
+
 		if (h_proto == htons(ETH_P_IP))
 			ip_decrease_ttl(iph);
 		else if (h_proto == htons(ETH_P_IPV6))
diff --git a/samples/bpf/xdp_fwd_user.c b/samples/bpf/xdp_fwd_user.c
index ba012d9f93dd..20951bc27477 100644
--- a/samples/bpf/xdp_fwd_user.c
+++ b/samples/bpf/xdp_fwd_user.c
@@ -27,14 +27,20 @@
 #include "libbpf.h"
 #include <bpf/bpf.h>
 
-
-static int do_attach(int idx, int fd, const char *name)
+static int do_attach(int idx, int prog_fd, int map_fd, const char *name)
 {
 	int err;
 
-	err = bpf_set_link_xdp_fd(idx, fd, 0);
-	if (err < 0)
+	err = bpf_set_link_xdp_fd(idx, prog_fd, 0);
+	if (err < 0) {
 		printf("ERROR: failed to attach program to %s\n", name);
+		return err;
+	}
+
+	/* Adding ifindex as a possible egress TX port */
+	err = bpf_map_update_elem(map_fd, &idx, &idx, 0);
+	if (err)
+		printf("ERROR: failed using device %s as TX-port\n", name);
 
 	return err;
 }
@@ -47,6 +53,9 @@ static int do_detach(int idx, const char *name)
 	if (err < 0)
 		printf("ERROR: failed to detach program from %s\n", name);
 
+	/* TODO: Remember to cleanup map, when adding use of shared map
+	 *  bpf_map_delete_elem((map_fd, &idx);
+	 */
 	return err;
 }
 
@@ -67,10 +76,10 @@ int main(int argc, char **argv)
 	};
 	const char *prog_name = "xdp_fwd";
 	struct bpf_program *prog;
+	int prog_fd, map_fd = -1;
 	char filename[PATH_MAX];
 	struct bpf_object *obj;
 	int opt, i, idx, err;
-	int prog_fd, map_fd;
 	int attach = 1;
 	int ret = 0;
 
@@ -103,8 +112,17 @@ int main(int argc, char **argv)
 			return 1;
 		}
 
-		if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+		err = bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd);
+		if (err) {
+			if (err == -22) {
+				printf("Does kernel support devmap lookup?\n");
+				/* If not, the error message will be:
+				 * "cannot pass map_type 14 into func
+				 * bpf_map_lookup_elem#1"
+				 */
+			}
 			return 1;
+		}
 
 		prog = bpf_object__find_program_by_title(obj, prog_name);
 		prog_fd = bpf_program__fd(prog);
@@ -119,10 +137,6 @@ int main(int argc, char **argv)
 			return 1;
 		}
 	}
-	if (attach) {
-		for (i = 1; i < 64; ++i)
-			bpf_map_update_elem(map_fd, &i, &i, 0);
-	}
 
 	for (i = optind; i < argc; ++i) {
 		idx = if_nametoindex(argv[i]);
@@ -138,7 +152,7 @@ int main(int argc, char **argv)
 			if (err)
 				ret = err;
 		} else {
-			err = do_attach(idx, prog_fd, argv[i]);
+			err = do_attach(idx, prog_fd, map_fd, argv[i]);
 			if (err)
 				ret = err;
 		}


^ permalink raw reply related

* [bpf-next v2 PATCH 3/3] samples/bpf: xdp_fwd explain bpf_fib_lookup return codes
From: Jesper Dangaard Brouer @ 2019-08-08 15:46 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov
  Cc: a.s.protopopov, dsahern, Toke Høiland-Jørgensen,
	ys114321, Jesper Dangaard Brouer
In-Reply-To: <156527914510.20297.12225832190052744019.stgit@firesoul>

Make it clear that this XDP program depend on the network
stack to do the ARP resolution.  This is connected with the
BPF_FIB_LKUP_RET_NO_NEIGH return code from bpf_fib_lookup().

Another common mistake (seen via XDP-tutorial) is that users
don't realize that sysctl net.ipv{4,6}.conf.all.forwarding
setting is honored by bpf_fib_lookup.

Reported-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/xdp_fwd_kern.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index a43d6953c054..701a30f258b1 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -103,8 +103,23 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 	fib_params.ifindex = ctx->ingress_ifindex;
 
 	rc = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
-
-	if (rc == 0) {
+	/*
+	 * Some rc (return codes) from bpf_fib_lookup() are important,
+	 * to understand how this XDP-prog interacts with network stack.
+	 *
+	 * BPF_FIB_LKUP_RET_NO_NEIGH:
+	 *  Even if route lookup was a success, then the MAC-addresses are also
+	 *  needed.  This is obtained from arp/neighbour table, but if table is
+	 *  (still) empty then BPF_FIB_LKUP_RET_NO_NEIGH is returned.  To avoid
+	 *  doing ARP lookup directly from XDP, then send packet to normal
+	 *  network stack via XDP_PASS and expect it will do ARP resolution.
+	 *
+	 * BPF_FIB_LKUP_RET_FWD_DISABLED:
+	 *  The bpf_fib_lookup respect sysctl net.ipv{4,6}.conf.all.forwarding
+	 *  setting, and will return BPF_FIB_LKUP_RET_FWD_DISABLED if not
+	 *  enabled this on ingress device.
+	 */
+	if (rc == BPF_FIB_LKUP_RET_SUCCESS) {
 		/* Verify egress index has been configured as TX-port.
 		 * (Note: User can still have inserted an egress ifindex that
 		 * doesn't support XDP xmit, which will result in packet drops).


^ permalink raw reply related

* Re: [PATCH net-next] r8169: make use of xmit_more
From: Holger Hoffstätte @ 2019-08-08 15:53 UTC (permalink / raw)
  To: Heiner Kallweit, Realtek linux nic maintainers, David Miller
  Cc: netdev@vger.kernel.org, Sander Eikelenboom, Eric Dumazet
In-Reply-To: <acd65426-0c7e-8c5f-a002-a36286f09122@applied-asynchrony.com>

On 8/8/19 4:37 PM, Holger Hoffstätte wrote:
> 
> Hello Heiner -
> 
> On 7/28/19 11:25 AM, Heiner Kallweit wrote:
>> There was a previous attempt to use xmit_more, but the change had to be
>> reverted because under load sometimes a transmit timeout occurred [0].
>> Maybe this was caused by a missing memory barrier, the new attempt
>> keeps the memory barrier before the call to netif_stop_queue like it
>> is used by the driver as of today. The new attempt also changes the
>> order of some calls as suggested by Eric.
>>
>> [0] https://lkml.org/lkml/2019/2/10/39
>>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> 
> I decided to take one for the team and merged this into my 5.2.x tree (just
> fixing up the path) and it has been working fine for the last 2 weeks in two
> machines..until today, when for the first time in forever some random NFS traffic
> made this old friend come out from under the couch:
> 
> [Aug 8 14:13] ------------[ cut here ]------------
> [  +0.000006] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> [  +0.000021] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x21f/0x230
> [  +0.000001] Modules linked in: lz4 lz4_compress lz4_decompress nfsd auth_rpcgss oid_registry lockd grace sunrpc sch_fq_codel btrfs xor zstd_compress raid6_pq zstd_decompress bfq jitterentropy_rng nct6775 hwmon_vid coretemp hwmon x86_pkg_temp_thermal aesni_intel aes_x86_64 i915 glue_helper crypto_simd cryptd i2c_i801 intel_gtt i2c_algo_bit iosf_mbi drm_kms_helper syscopyarea usbhid sysfillrect r8169 sysimgblt fb_sys_fops realtek drm libphy drm_panel_orientation_quirks i2c_core video backlight mq_deadline
> [  +0.000026] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.2.7 #1
> [  +0.000001] Hardware name: System manufacturer System Product Name/P8Z68-V LX, BIOS 4105 07/01/2013
> [  +0.000004] RIP: 0010:dev_watchdog+0x21f/0x230
> [  +0.000002] Code: 3b 00 75 ea eb ad 4c 89 ef c6 05 1c 45 bd 00 01 e8 66 35 fc ff 44 89 e1 4c 89 ee 48 c7 c7 e8 5e fc 81 48 89 c2 e8 90 df 92 ff <0f> 0b eb 8e 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 66 66 66 66 90
> [  +0.000002] RSP: 0018:ffffc90000118e68 EFLAGS: 00010286
> [  +0.000002] RAX: 0000000000000000 RBX: ffff8887f7837600 RCX: 0000000000000303
> [  +0.000001] RDX: 0000000000000001 RSI: 0000000000000092 RDI: ffffffff827a488c
> [  +0.000001] RBP: ffff8887f9fbc440 R08: 0000000000000303 R09: 0000000000000003
> [  +0.000001] R10: 000000000001004c R11: 0000000000000001 R12: 0000000000000000
> [  +0.000009] R13: ffff8887f9fbc000 R14: ffffffff8173aa20 R15: dead000000000200
> [  +0.000001] FS:  0000000000000000(0000) GS:ffff8887ff580000(0000) knlGS:0000000000000000
> [  +0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  +0.000001] CR2: 00007f8d1c04d000 CR3: 0000000002209001 CR4: 00000000000606e0
> [  +0.000000] Call Trace:
> [  +0.000002]  <IRQ>
> [  +0.000005]  call_timer_fn+0x2b/0x120
> [  +0.000002]  expire_timers+0xa4/0x100
> [  +0.000001]  run_timer_softirq+0x8c/0x170
> [  +0.000002]  ? __hrtimer_run_queues+0x13a/0x290
> [  +0.000003]  ? sched_clock_cpu+0xe/0x130
> [  +0.000003]  __do_softirq+0xeb/0x2de
> [  +0.000003]  irq_exit+0x9d/0xe0
> [  +0.000002]  smp_apic_timer_interrupt+0x60/0x110
> [  +0.000003]  apic_timer_interrupt+0xf/0x20
> [  +0.000001]  </IRQ>
> [  +0.000003] RIP: 0010:cpuidle_enter_state+0xad/0x930
> [  +0.000001] Code: c5 66 66 66 66 90 31 ff e8 90 99 9e ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 39 08 00 00 31 ff e8 e7 26 a2 ff fb 45 85 e4 <0f> 88 34 02 00 00 49 63 cc 4c 2b 2c 24 48 8d 04 49 48 c1 e0 05 8b
> [  +0.000000] RSP: 0018:ffffc9000008be50 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
> [  +0.000001] RAX: ffff8887ff5a9180 RBX: ffffffff822b6c40 RCX: 000000000000001f
> [  +0.000001] RDX: 0000000000000000 RSI: 0000000033087154 RDI: 0000000000000000
> [  +0.000001] RBP: ffff8887ff5b1310 R08: 000030d021fae397 R09: ffff8887ff59c8c0
> [  +0.000000] R10: ffff8887ff59c8c0 R11: 0000000000000006 R12: 0000000000000004
> [  +0.000001] R13: 000030d021fae397 R14: 0000000000000004 R15: ffff8887fc281600
> [  +0.000001]  cpuidle_enter+0x29/0x40
> [  +0.000002]  do_idle+0x1e5/0x280
> [  +0.000001]  cpu_startup_entry+0x19/0x20
> [  +0.000002]  start_secondary+0x186/0x1c0
> [  +0.000001]  secondary_startup_64+0xa4/0xb0
> [  +0.000001] ---[ end trace 99493c768580f4fd ]---
> 
> The device is:
> 
> Aug  7 23:19:09 tux kernel: libphy: r8169: probed
> Aug  7 23:19:09 tux kernel: r8169 0000:04:00.0 eth0: RTL8168evl/8111evl, c8:60:00:68:33:cc, XID 2c9, IRQ 36
> Aug  7 23:19:09 tux kernel: r8169 0000:04:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> Aug  7 23:19:12 tux kernel: RTL8211E Gigabit Ethernet r8169-400:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-400:00, irq=IGNORE)
> Aug  7 23:19:13 tux kernel: r8169 0000:04:00.0 eth0: No native access to PCI extended config space, falling back to CSI
> 
> and using fq_codel, of course.
> 
> This cpuidle hiccup used to be completely gone without xmit_more and this was
> the first (and so far only) time since merging it (regardless of load).
> Also, while I'm using BMQ as CPU scheduler, that hasn't made a difference for
> this particular problem in the past (with MuQSS/PDS) either; way back when I had
> Eric's previous attempt(s) it also hiccupped with CFS.
> 
> Revert or wait for more reports when -next is merged in 5.4?

Another question/data point: I've had the whole basket of offloads activated:

   ethtool --offload eth0 rx on tx on gro on gso on sg on tso on

and this caused zero problems without the xmit_more patch. However I just saw
that net-next has a patch where TSO is disabled due to a known HW defect in
RTL8168evl, which is of course what I have. Could this be the reason for the
stall/hiccup when xmit_more has its fingers in the pie? I kind of know what
xmit_more does, just not how it could interact with a possibly broken TSO that
nevertheless seems to work fine otherwise..

thanks
Holger

^ permalink raw reply

* Re: [bpf-next v2 PATCH 2/3] samples/bpf: make xdp_fwd more practically usable via devmap lookup
From: Jesper Dangaard Brouer @ 2019-08-08 15:58 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov
  Cc: a.s.protopopov, dsahern, Toke Høiland-Jørgensen,
	ys114321, brouer
In-Reply-To: <156527920658.20297.5634629362034006298.stgit@firesoul>

On Thu, 08 Aug 2019 17:46:46 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> @@ -103,8 +112,17 @@ int main(int argc, char **argv)
>  			return 1;
>  		}
>  
> -		if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
> +		err = bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd);
> +		if (err) {
> +			if (err == -22) {

Darn - I forgot this, need to be changed.
I'll send a V3.

> +				printf("Does kernel support devmap lookup?\n");
> +				/* If not, the error message will be:
> +				 * "cannot pass map_type 14 into func
> +				 * bpf_map_lookup_elem#1"
> +				 */
> +			}
>  			return 1;



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net v3] net/tls: prevent skb_orphan() from leaking TLS plain text with offload
From: Willem de Bruijn @ 2019-08-08 15:59 UTC (permalink / raw)
  To: Jakub Kicinski, John Fastabend
  Cc: David Miller, Network Development, davejwatson, borisp, aviadye,
	Daniel Borkmann, Eric Dumazet, Alexei Starovoitov, oss-drivers
In-Reply-To: <20190808000359.20785-1-jakub.kicinski@netronome.com>

On Wed, Aug 7, 2019 at 8:04 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> sk_validate_xmit_skb() and drivers depend on the sk member of
> struct sk_buff to identify segments requiring encryption.
> Any operation which removes or does not preserve the original TLS
> socket such as skb_orphan() or skb_clone() will cause clear text
> leaks.
>
> Make the TCP socket underlying an offloaded TLS connection
> mark all skbs as decrypted, if TLS TX is in offload mode.
> Then in sk_validate_xmit_skb() catch skbs which have no socket
> (or a socket with no validation) and decrypted flag set.
>
> Note that CONFIG_SOCK_VALIDATE_XMIT, CONFIG_TLS_DEVICE and
> sk->sk_validate_xmit_skb are slightly interchangeable right now,
> they all imply TLS offload. The new checks are guarded by
> CONFIG_TLS_DEVICE because that's the option guarding the
> sk_buff->decrypted member.
>
> Second, smaller issue with orphaning is that it breaks
> the guarantee that packets will be delivered to device
> queues in-order. All TLS offload drivers depend on that
> scheduling property. This means skb_orphan_partial()'s
> trick of preserving partial socket references will cause
> issues in the drivers. We need a full orphan, and as a
> result netem delay/throttling will cause all TLS offload
> skbs to be dropped.
>
> Reusing the sk_buff->decrypted flag also protects from
> leaking clear text when incoming, decrypted skb is redirected
> (e.g. by TC).
>
> See commit 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect
> through ULP") for justification why the internal flag is safe.
> The only location which could leak the flag in is tcp_bpf_sendmsg(),
> which is taken care of by clearing the previously unused bit.
>
> v2:
>  - remove superfluous decrypted mark copy (Willem);
>  - remove the stale doc entry (Boris);
>  - rely entirely on EOR marking to prevent coalescing (Boris);
>  - use an internal sendpages flag instead of marking the socket
>    (Boris).
> v3 (Willem):
>  - reorganize the can_skb_orphan_partial() condition;
>  - fix the flag leak-in through tcp_bpf_sendmsg.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---


> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 3d1e15401384..8a56e09cfb0e 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -398,10 +398,14 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
>  static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>  {
>         struct sk_msg tmp, *msg_tx = NULL;
> -       int flags = msg->msg_flags | MSG_NO_SHARED_FRAGS;
>         int copied = 0, err = 0;
>         struct sk_psock *psock;
>         long timeo;
> +       int flags;
> +
> +       /* Don't let internal do_tcp_sendpages() flags through */
> +       flags = (msg->msg_flags & ~MSG_SENDPAGE_DECRYPTED);
> +       flags |= MSG_NO_SHARED_FRAGS;

Not for this patch, but for tcp_bpf itself: should this more
aggressively filter flags?

Both those that are valid in sendmsg, but not expected in sendpage,
and other internal flags passed to sendpage, but should never be
passable from userspace.

> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index 7c0b2b778703..43922d86e510 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -373,9 +373,9 @@ static int tls_push_data(struct sock *sk,
>         struct tls_context *tls_ctx = tls_get_ctx(sk);
>         struct tls_prot_info *prot = &tls_ctx->prot_info;
>         struct tls_offload_context_tx *ctx = tls_offload_ctx_tx(tls_ctx);
> -       int tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST;
>         int more = flags & (MSG_SENDPAGE_NOTLAST | MSG_MORE);
>         struct tls_record_info *record = ctx->open_record;
> +       int tls_push_record_flags;
>         struct page_frag *pfrag;
>         size_t orig_size = size;
>         u32 max_open_record_len;
> @@ -390,6 +390,9 @@ static int tls_push_data(struct sock *sk,
>         if (sk->sk_err)
>                 return -sk->sk_err;
>
> +       flags |= MSG_SENDPAGE_DECRYPTED;
> +       tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST;
> +

Without being too familiar with this code: can this plaintext flag be
set once, closer to the call to do_tcp_sendpages, in tls_push_sg?

Instead of two locations with multiple non-trivial codepaths between
them and do_tcp_sendpages.

Or are there paths where the flag is not set? Which I imagine would
imply already passing s/w encrypted ciphertext.

>         timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
>         if (tls_is_partially_sent_record(tls_ctx)) {
>                 rc = tls_push_partial_record(sk, tls_ctx, flags);
> @@ -576,7 +579,9 @@ void tls_device_write_space(struct sock *sk, struct tls_context *ctx)
>                 gfp_t sk_allocation = sk->sk_allocation;
>
>                 sk->sk_allocation = GFP_ATOMIC;
> -               tls_push_partial_record(sk, ctx, MSG_DONTWAIT | MSG_NOSIGNAL);
> +               tls_push_partial_record(sk, ctx,
> +                                       MSG_DONTWAIT | MSG_NOSIGNAL |
> +                                       MSG_SENDPAGE_DECRYPTED);
>                 sk->sk_allocation = sk_allocation;
>         }
>  }
> --
> 2.21.0
>

^ permalink raw reply

* [bpf-next v3 PATCH 0/3] bpf: improvements to xdp_fwd sample
From: Jesper Dangaard Brouer @ 2019-08-08 16:17 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov
  Cc: a.s.protopopov, dsahern, Toke Høiland-Jørgensen,
	ys114321, Jesper Dangaard Brouer

V3: Hopefully fixed all issues point out by Yonghong Song

V2: Addressed issues point out by Yonghong Song
 - Please ACK patch 2/3 again
 - Added ACKs and reviewed-by to other patches

This patchset is focused on improvements for XDP forwarding sample
named xdp_fwd, which leverage the existing FIB routing tables as
described in LPC2018[1] talk by David Ahern.

The primary motivation is to illustrate how Toke's recent work
improves usability of XDP_REDIRECT via lookups in devmap. The other
patches are to help users understand the sample.

I have more improvements to xdp_fwd, but those might requires changes
to libbpf.  Thus, sending these patches first as they are isolated.

[1] http://vger.kernel.org/lpc-networking2018.html#session-1

---

Jesper Dangaard Brouer (3):
      samples/bpf: xdp_fwd rename devmap name to be xdp_tx_ports
      samples/bpf: make xdp_fwd more practically usable via devmap lookup
      samples/bpf: xdp_fwd explain bpf_fib_lookup return codes

 samples/bpf/xdp_fwd_kern.c |   39 ++++++++++++++++++++++++++++++---------
 samples/bpf/xdp_fwd_user.c |   35 +++++++++++++++++++++++------------
 2 files changed, 53 insertions(+), 21 deletions(-)

--

^ permalink raw reply

* [bpf-next v3 PATCH 1/3] samples/bpf: xdp_fwd rename devmap name to be xdp_tx_ports
From: Jesper Dangaard Brouer @ 2019-08-08 16:17 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov
  Cc: a.s.protopopov, dsahern, Toke Høiland-Jørgensen,
	ys114321, Jesper Dangaard Brouer
In-Reply-To: <156528102557.22124.261409336813472418.stgit@firesoul>

The devmap name 'tx_port' came from a copy-paste from xdp_redirect_map
which only have a single TX port. Change name to xdp_tx_ports
to make it more descriptive.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/xdp_fwd_kern.c |    5 +++--
 samples/bpf/xdp_fwd_user.c |    2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index a7e94e7ff87d..e6ffc4ea06f4 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -23,7 +23,8 @@
 
 #define IPV6_FLOWINFO_MASK              cpu_to_be32(0x0FFFFFFF)
 
-struct bpf_map_def SEC("maps") tx_port = {
+/* For TX-traffic redirect requires net_device ifindex to be in this devmap */
+struct bpf_map_def SEC("maps") xdp_tx_ports = {
 	.type = BPF_MAP_TYPE_DEVMAP,
 	.key_size = sizeof(int),
 	.value_size = sizeof(int),
@@ -117,7 +118,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 
 		memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
 		memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
-		return bpf_redirect_map(&tx_port, fib_params.ifindex, 0);
+		return bpf_redirect_map(&xdp_tx_ports, fib_params.ifindex, 0);
 	}
 
 	return XDP_PASS;
diff --git a/samples/bpf/xdp_fwd_user.c b/samples/bpf/xdp_fwd_user.c
index 5b46ee12c696..ba012d9f93dd 100644
--- a/samples/bpf/xdp_fwd_user.c
+++ b/samples/bpf/xdp_fwd_user.c
@@ -113,7 +113,7 @@ int main(int argc, char **argv)
 			return 1;
 		}
 		map_fd = bpf_map__fd(bpf_object__find_map_by_name(obj,
-								  "tx_port"));
+							"xdp_tx_ports"));
 		if (map_fd < 0) {
 			printf("map not found: %s\n", strerror(map_fd));
 			return 1;


^ permalink raw reply related

* [bpf-next v3 PATCH 2/3] samples/bpf: make xdp_fwd more practically usable via devmap lookup
From: Jesper Dangaard Brouer @ 2019-08-08 16:17 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov
  Cc: a.s.protopopov, dsahern, Toke Høiland-Jørgensen,
	ys114321, Jesper Dangaard Brouer
In-Reply-To: <156528102557.22124.261409336813472418.stgit@firesoul>

This address the TODO in samples/bpf/xdp_fwd_kern.c, which points out
that the chosen egress index should be checked for existence in the
devmap. This can now be done via taking advantage of Toke's work in
commit 0cdbb4b09a06 ("devmap: Allow map lookups from eBPF").

This change makes xdp_fwd more practically usable, as this allows for
a mixed environment, where IP-forwarding fallback to network stack, if
the egress device isn't configured to use XDP.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
---
 samples/bpf/xdp_fwd_kern.c |   17 +++++++++++------
 samples/bpf/xdp_fwd_user.c |   33 ++++++++++++++++++++++-----------
 2 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index e6ffc4ea06f4..a43d6953c054 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -104,13 +104,18 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 
 	rc = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
 
-	/* verify egress index has xdp support
-	 * TO-DO bpf_map_lookup_elem(&tx_port, &key) fails with
-	 *       cannot pass map_type 14 into func bpf_map_lookup_elem#1:
-	 * NOTE: without verification that egress index supports XDP
-	 *       forwarding packets are dropped.
-	 */
 	if (rc == 0) {
+		/* Verify egress index has been configured as TX-port.
+		 * (Note: User can still have inserted an egress ifindex that
+		 * doesn't support XDP xmit, which will result in packet drops).
+		 *
+		 * Note: lookup in devmap supported since 0cdbb4b09a0.
+		 * If not supported will fail with:
+		 *  cannot pass map_type 14 into func bpf_map_lookup_elem#1:
+		 */
+		if (!bpf_map_lookup_elem(&xdp_tx_ports, &fib_params.ifindex))
+			return XDP_PASS;
+
 		if (h_proto == htons(ETH_P_IP))
 			ip_decrease_ttl(iph);
 		else if (h_proto == htons(ETH_P_IPV6))
diff --git a/samples/bpf/xdp_fwd_user.c b/samples/bpf/xdp_fwd_user.c
index ba012d9f93dd..97ff1dad7669 100644
--- a/samples/bpf/xdp_fwd_user.c
+++ b/samples/bpf/xdp_fwd_user.c
@@ -27,14 +27,20 @@
 #include "libbpf.h"
 #include <bpf/bpf.h>
 
-
-static int do_attach(int idx, int fd, const char *name)
+static int do_attach(int idx, int prog_fd, int map_fd, const char *name)
 {
 	int err;
 
-	err = bpf_set_link_xdp_fd(idx, fd, 0);
-	if (err < 0)
+	err = bpf_set_link_xdp_fd(idx, prog_fd, 0);
+	if (err < 0) {
 		printf("ERROR: failed to attach program to %s\n", name);
+		return err;
+	}
+
+	/* Adding ifindex as a possible egress TX port */
+	err = bpf_map_update_elem(map_fd, &idx, &idx, 0);
+	if (err)
+		printf("ERROR: failed using device %s as TX-port\n", name);
 
 	return err;
 }
@@ -47,6 +53,9 @@ static int do_detach(int idx, const char *name)
 	if (err < 0)
 		printf("ERROR: failed to detach program from %s\n", name);
 
+	/* TODO: Remember to cleanup map, when adding use of shared map
+	 *  bpf_map_delete_elem((map_fd, &idx);
+	 */
 	return err;
 }
 
@@ -67,10 +76,10 @@ int main(int argc, char **argv)
 	};
 	const char *prog_name = "xdp_fwd";
 	struct bpf_program *prog;
+	int prog_fd, map_fd = -1;
 	char filename[PATH_MAX];
 	struct bpf_object *obj;
 	int opt, i, idx, err;
-	int prog_fd, map_fd;
 	int attach = 1;
 	int ret = 0;
 
@@ -103,8 +112,14 @@ int main(int argc, char **argv)
 			return 1;
 		}
 
-		if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
+		err = bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd);
+		if (err) {
+			printf("Does kernel support devmap lookup?\n");
+			/* If not, the error message will be:
+			 *  "cannot pass map_type 14 into func bpf_map_lookup_elem#1"
+			 */
 			return 1;
+		}
 
 		prog = bpf_object__find_program_by_title(obj, prog_name);
 		prog_fd = bpf_program__fd(prog);
@@ -119,10 +134,6 @@ int main(int argc, char **argv)
 			return 1;
 		}
 	}
-	if (attach) {
-		for (i = 1; i < 64; ++i)
-			bpf_map_update_elem(map_fd, &i, &i, 0);
-	}
 
 	for (i = optind; i < argc; ++i) {
 		idx = if_nametoindex(argv[i]);
@@ -138,7 +149,7 @@ int main(int argc, char **argv)
 			if (err)
 				ret = err;
 		} else {
-			err = do_attach(idx, prog_fd, argv[i]);
+			err = do_attach(idx, prog_fd, map_fd, argv[i]);
 			if (err)
 				ret = err;
 		}


^ permalink raw reply related

* [bpf-next v3 PATCH 3/3] samples/bpf: xdp_fwd explain bpf_fib_lookup return codes
From: Jesper Dangaard Brouer @ 2019-08-08 16:17 UTC (permalink / raw)
  To: netdev, Daniel Borkmann, Alexei Starovoitov
  Cc: a.s.protopopov, dsahern, Toke Høiland-Jørgensen,
	ys114321, Jesper Dangaard Brouer
In-Reply-To: <156528102557.22124.261409336813472418.stgit@firesoul>

Make it clear that this XDP program depend on the network
stack to do the ARP resolution.  This is connected with the
BPF_FIB_LKUP_RET_NO_NEIGH return code from bpf_fib_lookup().

Another common mistake (seen via XDP-tutorial) is that users
don't realize that sysctl net.ipv{4,6}.conf.all.forwarding
setting is honored by bpf_fib_lookup.

Reported-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/xdp_fwd_kern.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index a43d6953c054..701a30f258b1 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -103,8 +103,23 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 	fib_params.ifindex = ctx->ingress_ifindex;
 
 	rc = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
-
-	if (rc == 0) {
+	/*
+	 * Some rc (return codes) from bpf_fib_lookup() are important,
+	 * to understand how this XDP-prog interacts with network stack.
+	 *
+	 * BPF_FIB_LKUP_RET_NO_NEIGH:
+	 *  Even if route lookup was a success, then the MAC-addresses are also
+	 *  needed.  This is obtained from arp/neighbour table, but if table is
+	 *  (still) empty then BPF_FIB_LKUP_RET_NO_NEIGH is returned.  To avoid
+	 *  doing ARP lookup directly from XDP, then send packet to normal
+	 *  network stack via XDP_PASS and expect it will do ARP resolution.
+	 *
+	 * BPF_FIB_LKUP_RET_FWD_DISABLED:
+	 *  The bpf_fib_lookup respect sysctl net.ipv{4,6}.conf.all.forwarding
+	 *  setting, and will return BPF_FIB_LKUP_RET_FWD_DISABLED if not
+	 *  enabled this on ingress device.
+	 */
+	if (rc == BPF_FIB_LKUP_RET_SUCCESS) {
 		/* Verify egress index has been configured as TX-port.
 		 * (Note: User can still have inserted an egress ifindex that
 		 * doesn't support XDP xmit, which will result in packet drops).


^ permalink raw reply related

* RE: [PATCH 00/34] put_user_pages(): miscellaneous call sites
From: Weiny, Ira @ 2019-08-08 16:25 UTC (permalink / raw)
  To: John Hubbard, Michal Hocko
  Cc: Jan Kara, Matthew Wilcox, Andrew Morton, Christoph Hellwig,
	Williams, Dan J, Dave Chinner, Dave Hansen, Jason Gunthorpe,
	Jérôme Glisse, LKML, amd-gfx@lists.freedesktop.org,
	ceph-devel@vger.kernel.org, devel@driverdev.osuosl.org,
	devel@lists.orangefs.org, dri-devel@lists.freedesktop.org,
	intel-gfx@lists.freedesktop.org, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-block@vger.kernel.org,
	linux-crypto@vger.kernel.org, linux-fbdev@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-media@vger.kernel.org,
	linux-mm@kvack.org, linux-nfs@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-rpi-kernel@lists.infradead.org,
	linux-xfs@vger.kernel.org, netdev@vger.kernel.org,
	rds-devel@oss.oracle.com, sparclinux@vger.kernel.org,
	x86@kernel.org, xen-devel@lists.xenproject.org
In-Reply-To: <e648a7f3-6a1b-c9ea-1121-7ab69b6b173d@nvidia.com>

> 
> On 8/7/19 7:36 PM, Ira Weiny wrote:
> > On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
> >> On Wed 07-08-19 10:37:26, Jan Kara wrote:
> >>> On Fri 02-08-19 12:14:09, John Hubbard wrote:
> >>>> On 8/2/19 7:52 AM, Jan Kara wrote:
> >>>>> On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
> >>>>>> On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
> >>>>>>> On Fri 02-08-19 11:12:44, Michal Hocko wrote:
> >>>>>>>> On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
>   [...]
> > Before I go on, I would like to say that the "imbalance" of
> > get_user_pages() and put_page() bothers me from a purist standpoint...
> > However, since this discussion cropped up I went ahead and ported my
> > work to Linus' current master
> > (5.3-rc3+) and in doing so I only had to steal a bit of Johns code...
> > Sorry John...  :-(
> >
> > I don't have the commit messages all cleaned up and I know there may
> > be some discussion on these new interfaces but I wanted to throw this
> > series out there because I think it may be what Jan and Michal are
> > driving at (or at least in that direction.
> >
> > Right now only RDMA and DAX FS's are supported.  Other users of GUP
> > will still fail on a DAX file and regular files will still be at
> > risk.[2]
> >
> > I've pushed this work (based 5.3-rc3+ (33920f1ec5bf)) here[3]:
> >
> > https://github.com/weiny2/linux-kernel/tree/linus-rdmafsdax-b0-v3
> >
> > I think the most relevant patch to this conversation is:
> >
> > https://github.com/weiny2/linux-
> kernel/commit/5d377653ba5cf11c3b716f90
> > 4b057bee6641aaf6
> >
> 
> ohhh...can you please avoid using the old __put_user_pages_dirty()
> function? 

Agreed... I did not like that.  Part of the reason I did not post this is I'm still trying to figure out what has landed and what I can and can't depend on.

For example, Christoph H. was proposing changes to some of the GUP calls which may conflict.  But I'm not sure his changes are moving forward.  So rather than waiting for the dust to settle I decided to see how hard it would be to get this rebased against mainline and working.  Turns out it was not too hard.

I think that is because, as time has moved on it seems that, for some users such as RDMA, a simple put_user_page() is not going to be sufficient.  We need something else to allow GUP to keep track of the file pins as we discussed.  So I'm starting to think some of this could go in at the same time.

> I thought I'd caught things early enough to get away with the
> rename and deletion of that. You could either:
> 
> a) open code an implementation of vaddr_put_pages_dirty_lock() that
> doesn't call any of the *put_user_pages_dirty*() variants, or
> 
> b) include my first patch ("") are part of your series, or
> 
> c) base this on Andrews's tree, which already has merged in my first patch.
> 

Yep I can do this.  I did not realize that Andrew had accepted any of this work.  I'll check out his tree.  But I don't think he is going to accept this series through his tree.  So what is the ETA on that landing in Linus' tree?

To that point I'm still not sure who would take all this as I am now touching mm, procfs, rdma, ext4, and xfs.

I just thought I would chime in with my progress because I'm to a point where things are working and so I can submit the code but I'm not sure what I can/should depend on landing...  Also, now that 0day has run overnight it has found issues with this rebase so I need to clean those up...  Perhaps I will base on Andrew's tree prior to doing that...

Thanks,
Ira

> 
> thanks,
> --
> John Hubbard
> NVIDIA

^ permalink raw reply

* Re: [net] ixgbe: fix possible deadlock in ixgbe_service_task()
From: Jeff Kirsher @ 2019-08-08 16:36 UTC (permalink / raw)
  To: Taehee Yoo, David Miller; +Cc: Netdev, nhorman, sassmann, andrewx.bowers
In-Reply-To: <CAMArcTXWBHUpy+p18UJ6RZm2W=vhnLRezste=kHSSv=dyd0kBA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1206 bytes --]

On Wed, 2019-08-07 at 15:08 +0900, Taehee Yoo wrote:
> On Wed, 7 Aug 2019 at 08:36, David Miller <davem@davemloft.net>
> wrote:
> 
> Hi David
> Thank you for the review!
> 
> > From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > Date: Mon,  5 Aug 2019 13:04:03 -0700
> > 
> > > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > > b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > > index cbaf712d6529..3386e752e458 100644
> > > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > > @@ -7898,9 +7898,7 @@ static void ixgbe_service_task(struct
> > > work_struct *work)
> > >        }
> > >        if (ixgbe_check_fw_error(adapter)) {
> > >                if (!test_bit(__IXGBE_DOWN, &adapter->state)) {
> > > -                     rtnl_lock();
> > >                        unregister_netdev(adapter->netdev);
> > > -                     rtnl_unlock();
> > >                }
> > 
> > Please remove the (now unnecessary) curly braces for this basic
> > block.
> > 
> 
> I will send a v2 patch.
> Thank you!

I have already created a v2 on your behalf Taechee and will submit to
Dave here shortly.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox