Netdev List
* Re: [PATCH net-next] net/smc: fix error return code in smc_setsockopt()
From: David Miller @ 2018-06-03 14:39 UTC (permalink / raw)
  To: weiyongjun1; +Cc: ubraun, linux-s390, netdev, kernel-janitors
In-Reply-To: <1527733882-149144-1-git-send-email-weiyongjun1@huawei.com>

From: Wei Yongjun <weiyongjun1@huawei.com>
Date: Thu, 31 May 2018 02:31:22 +0000

> Fix to return error code -EINVAL instead of 0 if optlen is invalid.
> 
> Fixes: 01d2f7e2cdd3 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Although the TCP code should be checking this in the previous lines,
it's not good practice to depend so tightly upon that.

And it makes this code easier to audit if the check exists here
explicitly too.

So I'll apply this, thanks.
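The point about keeping the length check local can be illustrated with a small userspace sketch. This is not the SMC code: demo_setsockopt() and its parameters are hypothetical names, and the only thing it is meant to show is that a too-short optlen yields -EINVAL at this layer instead of relying on a check made elsewhere.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a protocol's setsockopt handler; not the SMC code. */
static int demo_setsockopt(const void *optval, size_t optlen, int *out)
{
	int val;

	/* Reject short buffers explicitly instead of relying on a lower layer. */
	if (optval == NULL || optlen < sizeof(int))
		return -EINVAL;

	memcpy(&val, optval, sizeof(val));
	*out = val;
	return 0;
}
```

With the check in place, an auditor can read this one function and see that an int-sized read never happens on a shorter buffer.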

* Re: [PATCH net-next] net: netcp: ethss: remove unnecessary pointer set to NULL
From: David Miller @ 2018-06-03 14:40 UTC (permalink / raw)
  To: yuehaibing; +Cc: w-kwok2, m-karicheri2, netdev, linux-kernel
In-Reply-To: <20180531034848.23080-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Thu, 31 May 2018 11:48:48 +0800

> The if statement has already made sure that 'slave->phy' is NULL
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Looks good, applied.

* Re: [PATCH net] net: ipv6: prevent use after free in ip6_route_mpath_notify()
From: David Ahern @ 2018-06-03 14:40 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller; +Cc: netdev, Eric Dumazet
In-Reply-To: <4b46d531-904b-6e5f-67ce-a275f0826d47@cumulusnetworks.com>

On 6/3/18 8:01 AM, David Ahern wrote:
> Is there a reproducer for the syzbot case?

One reproducer is to insert a route and then add a multipath route that
has a duplicate nexthop, e.g.:

ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2

ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 \
	nexthop via 2001:db8:1::2

Current net and net-next generate the trace; with the fix I proposed I
don't see it on either branch and I do see the expected notifications to
userspace.

* Re: [PATCH net-next] net/ncsi: Avoid GFP_KERNEL in response handler
From: David Miller @ 2018-06-03 14:42 UTC (permalink / raw)
  To: sam; +Cc: netdev, linux-kernel, openbmc
In-Reply-To: <20180531070254.28878-1-sam@mendozajonas.com>

From: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Date: Thu, 31 May 2018 17:02:54 +1000

> ncsi_rsp_handler_gc() allocates the filter arrays using GFP_KERNEL in
> softirq context, causing the below backtrace. This allocation is only a
> few dozen bytes during probing so allocate with GFP_ATOMIC instead.
 ...
> Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>

Applied with Fixes: tag added, thanks.

* Re: [PATCH net] net: ipv6: prevent use after free in ip6_route_mpath_notify()
From: David Ahern @ 2018-06-03 14:46 UTC (permalink / raw)
  To: Eric Dumazet, Eric Dumazet, David S . Miller; +Cc: netdev
In-Reply-To: <4dfbdd4b-947b-bbf7-27f3-abbd48a817b4@gmail.com>

On 6/3/18 8:31 AM, Eric Dumazet wrote:
> 
> 
> On 06/03/2018 07:01 AM, David Ahern wrote:
>> On 6/3/18 7:35 AM, Eric Dumazet wrote:
>>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>>> index f4d61736c41abe8cd7f439c4a37100e90c1eacca..830eefdbdb6734eb81ea0322fb6077ee20be1889 100644
>>> --- a/net/ipv6/route.c
>>> +++ b/net/ipv6/route.c
>>> @@ -4263,7 +4263,9 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
>>>  
>>>  	err_nh = NULL;
>>>  	list_for_each_entry(nh, &rt6_nh_list, next) {
>>> +		dst_release(&rt_last->dst);
>>>  		rt_last = nh->rt6_info;
>>> +		dst_hold(&rt_last->dst);
>>>  		err = __ip6_ins_rt(nh->rt6_info, info, &nh->mxc, extack);
>>>  		/* save reference to first route for notification */
>>>  		if (!rt_notif && !err)
>>> @@ -4317,7 +4319,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
>>>  		list_del(&nh->next);
>>>  		kfree(nh);
>>>  	}
>>> -
>>> +	dst_release(&rt_last->dst);
>>>  	return err;
>>>  }
>>
>> Since the rtnl lock is held, a successfully inserted route can not be
>> removed until ip6_route_multipath_add finishes. This is a simpler change
>> that works with net-next as well:
> 
> Your patch changes the intent of your original commit.
> 
> It seems you wanted rt_last to point to the last attempted insertion,
> not the last successful one ?

The note in ip6_route_mpath_notify explains it:

        /* if this is an APPEND route, then rt points to the first route
         * inserted and rt_last points to last route inserted. Userspace

> 
> Or have I misunderstood, and not only we had a use-after-free, but also
> a semantic error ?

It was a mistake to set rt_last before checking err. So the
use-after-free exposed the semantic error.
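The ordering issue can be shown with a miniature model of the insert loop. Everything here is hypothetical (demo_route, demo_result and demo_multipath_add are illustration names, and a stored error code stands in for the real insert call); the sketch only demonstrates recording the "last" route after the error check, so that it always refers to a route that was actually inserted.

```c
#include <stddef.h>

/* Hypothetical miniature of the multipath insert loop; names are illustrative. */
struct demo_route { int id; int insert_err; };

struct demo_result {
	const struct demo_route *first; /* first route inserted successfully */
	const struct demo_route *last;  /* last route inserted successfully */
	int err;
};

static struct demo_result demo_multipath_add(const struct demo_route *rts, size_t n)
{
	struct demo_result res = { NULL, NULL, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		res.err = rts[i].insert_err;	/* stands in for the insert call */
		if (res.err)
			break;			/* failed route is not recorded */
		if (!res.first)
			res.first = &rts[i];
		res.last = &rts[i];		/* updated only after success */
	}
	return res;
}
```

In this model a failed third insertion leaves "last" pointing at the second route, which matches the comment quoted from ip6_route_mpath_notify.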

* Re: [PATCH net-next] net: axienet: remove stale comment of axienet_open
From: David Miller @ 2018-06-03 14:59 UTC (permalink / raw)
  To: yuehaibing; +Cc: anirudh, John.Linn, netdev, linux-kernel, michal.simek
In-Reply-To: <20180531115115.11920-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Thu, 31 May 2018 19:51:15 +0800

> axienet_open no longer returns -ENODEV when the PHY cannot be connected to
> since commit d7cc3163e026 ("net: axienet: Support phy-less mode of operation")
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

* Re: [PATCH net v2] ipv6: omit traffic class when calculating flow hash
From: David Ahern @ 2018-06-03 15:00 UTC (permalink / raw)
  To: Michal Kubecek, David S. Miller
  Cc: netdev, linux-kernel, Nicolas Dichtel, Tom Herbert, Ido Schimmel
In-Reply-To: <20180602080528.54B27A0C48@unicorn.suse.cz>

On 6/2/18 1:40 AM, Michal Kubecek wrote:
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 836f31af1369..7fbdc3e9e25d 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -906,6 +906,11 @@ static inline __be32 ip6_make_flowinfo(unsigned int tclass, __be32 flowlabel)
>  	return htonl(tclass << IPV6_TCLASS_SHIFT) | flowlabel;
>  }
>  
> +static inline u32 flowi6_get_flowlabel(const struct flowi6 *fl6)
> +{
> +	return (__force u32)(fl6->flowlabel & IPV6_FLOWLABEL_MASK);
> +}
> +
>  /*
>   *	Prototypes exported by ipv6
>   */

While discussing the fix for net-next and making label vs. info consistent,
Michal noted a few places where this helper is needed as a __be32, so
the typecast should be outside of this helper.
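A rough userspace sketch of that suggestion, under stated assumptions: demo_be32 stands in for __be32, DEMO_FLOWLABEL_MASK mirrors the 20-bit flow label mask in network byte order, and both function names are made up for illustration. The helper returns the masked label unconverted, and the one caller that needs a plain integer for hashing does the cast itself.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Userspace stand-ins; the kernel uses __be32 and cpu_to_be32() constants. */
typedef uint32_t demo_be32;

#define DEMO_FLOWLABEL_MASK htonl(0x000FFFFF)

/* Returns the label still in network byte order, so __be32 users can share it. */
static demo_be32 demo_get_flowlabel(demo_be32 flowinfo)
{
	return flowinfo & DEMO_FLOWLABEL_MASK;
}

/* A hashing caller does the cast itself; stands in for (__force u32). */
static uint32_t demo_hash_input(demo_be32 flowinfo)
{
	return (uint32_t)demo_get_flowlabel(flowinfo);
}
```

Two flowinfo values that differ only in traffic class give the same hash input here, which is the behaviour the patch title asks for.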

* Re: [PATCH] vlan: use non-archaic spelling of failes
From: David Miller @ 2018-06-03 15:02 UTC (permalink / raw)
  To: cascardo; +Cc: netdev
In-Reply-To: <20180531122020.9225-1-cascardo@canonical.com>

From: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Date: Thu, 31 May 2018 09:20:20 -0300

> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

Applied.

* Re: [PATCH v2 net] mlx4_core: restore optimal ICM memory allocation
From: David Miller @ 2018-06-03 15:02 UTC (permalink / raw)
  To: edumazet
  Cc: netdev, eric.dumazet, jsperbeck, tarick, qing.huang, danielj,
	yanjun.zhu
In-Reply-To: <20180531125224.97098-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Thu, 31 May 2018 05:52:24 -0700

> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> brought two regressions caught in our regression suite.
> 
> The big one is an additional cost of 256 bytes of overhead per 4096 bytes,
> or 6.25 % which is unacceptable since ICM can be pretty large.
> 
> This comes from having to allocate one struct mlx4_icm_chunk (256 bytes)
> per MLX4_TABLE_CHUNK, which the buggy commit shrank to 4KB
> (instead of prior 256KB)
> 
> Note that mlx4_alloc_icm() is already able to try high order allocations
> and fallback to low-order allocations under high memory pressure.
> 
> Most of these allocations happen right after boot time, when we get
> plenty of non fragmented memory, there is really no point being so
> pessimistic and break huge pages into order-0 ones just for fun.
> 
> We only have to tweak gfp_mask a bit, to help falling back faster,
> without risking OOM killings.
> 
> Second regression is a KASAN fault that will need further investigation.
> 
> Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Tariq Toukan <tariqt@mellanox.com>

Applied, thanks Eric.
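The overhead figure in the commit message is plain arithmetic and can be checked with a short helper. The function name is hypothetical; the sizes (256 bytes of bookkeeping per chunk, 4KB versus 256KB chunks) are the ones quoted above.

```c
#include <assert.h>

/* Per-chunk bookkeeping overhead in percent; sizes come from the commit message. */
static double demo_icm_overhead_pct(unsigned long meta_bytes, unsigned long chunk_bytes)
{
	assert(chunk_bytes > 0);
	return 100.0 * (double)meta_bytes / (double)chunk_bytes;
}
```

demo_icm_overhead_pct(256, 4096) gives 6.25, while demo_icm_overhead_pct(256, 256 * 1024) gives roughly 0.1, which is why the smaller chunk size was considered a regression.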

* Re: pull-request: wireless-drivers-next 2018-05-31
From: David Miller @ 2018-06-03 15:03 UTC (permalink / raw)
  To: kvalo; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <877enj29x4.fsf@kamboji.qca.qualcomm.com>

From: Kalle Valo <kvalo@codeaurora.org>
Date: Thu, 31 May 2018 17:10:15 +0300

> here's a pull request to net-next tree for 4.18. More info below and
> please let me know if there are any problems.

Pulled, thanks Kalle.

* Re: [PATCH bpf-next v3 00/11] Misc BPF improvements
From: Alexei Starovoitov @ 2018-06-03 15:08 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: netdev
In-Reply-To: <20180602210641.6163-1-daniel@iogearbox.net>

On Sat, Jun 02, 2018 at 11:06:30PM +0200, Daniel Borkmann wrote:
> This set adds various patches I still had in my queue, first two
> are test cases to provide coverage for the recent two fixes that
> went to bpf tree, then a small improvement on the error message
> for gpl helpers. Next, we expose prog and map id into fdinfo in
> order to allow for inspection of these objects currently used
> in applications. Patch after that removes a retpoline call for
> map lookup/update/delete helpers. A new helper is added in the
> subsequent patch to lookup the skb's socket's cgroup v2 id which
> can be used in an efficient way for e.g. lookups on egress side.
> Next one is a fix to fully clear state info in tunnel/xfrm helpers.
> Given this is full cap_sys_admin from init ns and has same priv
> requirements like tracing, bpf-next should be okay. A small bug
> fix for bpf_asm follows, and next a fix for context access in
> tracing which was recently reported. Lastly, a small update in
> the maintainer's file to add patchwork url and missing files.
> 
> Thanks!
> 
> v2 -> v3:
>   - Noticed a merge artefact inside uapi header comment, sigh,
>     fixed now.
> v1 -> v2:
>   - minor fix in getting context access work on 32 bit for tracing
>   - add paragraph to uapi helper doc to better describe kernel
>     build deps for cgroup helper

Applied, Thanks Daniel.
fixed up commit log s/bpftool p d x i/bpftool prog dump xlated id/
while applying, since it was indeed a bit cryptic.

* [PATCH bpf-next] bpf: flowlabel in bpf_fib_lookup should be flowinfo
From: dsahern @ 2018-06-03 15:15 UTC (permalink / raw)
  To: netdev, borkmann, ast; +Cc: David Ahern, Michal Kubecek

From: David Ahern <dsahern@gmail.com>

As Michal noted, the flow struct takes both the flow label and priority.
Update the bpf_fib_lookup API to note that it is flowinfo and not just
the flow label.

Cc: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/uapi/linux/bpf.h   | 2 +-
 net/core/filter.c          | 2 +-
 samples/bpf/xdp_fwd_kern.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f0b6608b1f1c..5ef032bc4746 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2623,7 +2623,7 @@ struct bpf_fib_lookup {
 	union {
 		/* inputs to lookup */
 		__u8	tos;		/* AF_INET  */
-		__be32	flowlabel;	/* AF_INET6 */
+		__be32	flowinfo;	/* AF_INET6, flow_label + priority */
 
 		/* output: metric of fib result (IPv4/IPv6 only) */
 		__u32	rt_metric;
diff --git a/net/core/filter.c b/net/core/filter.c
index 28e864777c0f..704d515de2df 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4222,7 +4222,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 		fl6.flowi6_oif = 0;
 		strict = RT6_LOOKUP_F_HAS_SADDR;
 	}
-	fl6.flowlabel = params->flowlabel;
+	fl6.flowlabel = params->flowinfo;
 	fl6.flowi6_scope = 0;
 	fl6.flowi6_flags = 0;
 	fl6.mp_hash = 0;
diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
index 4a6be0f87505..6673cdb9f55c 100644
--- a/samples/bpf/xdp_fwd_kern.c
+++ b/samples/bpf/xdp_fwd_kern.c
@@ -88,7 +88,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
 			return XDP_PASS;
 
 		fib_params.family	= AF_INET6;
-		fib_params.flowlabel	= *(__be32 *)ip6h & IPV6_FLOWINFO_MASK;
+		fib_params.flowinfo	= *(__be32 *)ip6h & IPV6_FLOWINFO_MASK;
 		fib_params.l4_protocol	= ip6h->nexthdr;
 		fib_params.sport	= 0;
 		fib_params.dport	= 0;
-- 
2.11.0
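For readers unfamiliar with the distinction, here is a userspace sketch of what "flowinfo" covers. The DEMO_ masks mirror the IPv6 flowinfo (28 bits) and flow label (20 bits) masks in network byte order, and demo_flowinfo() is a hypothetical helper; memcpy() is used in place of the pointer cast from the sample only to keep the sketch free of alignment assumptions.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FLOWINFO_MASK  htonl(0x0FFFFFFF)
#define DEMO_FLOWLABEL_MASK htonl(0x000FFFFF)

/* Reads the first 32-bit word of an IPv6 header and drops the version nibble. */
static uint32_t demo_flowinfo(const unsigned char *ip6_hdr)
{
	uint32_t word;

	memcpy(&word, ip6_hdr, sizeof(word));	/* avoids an unaligned cast */
	return word & DEMO_FLOWINFO_MASK;	/* still network byte order */
}
```

For a header starting with the bytes 6a b1 23 45, the result keeps both the traffic class (0xab) and the flow label (0x12345); masking it again with DEMO_FLOWLABEL_MASK leaves only the label.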

* Re: [bpf-next V2 PATCH 0/8] bpf/xdp: add flags argument to ndo_xdp_xmit and flag flush operation
From: Alexei Starovoitov @ 2018-06-03 15:17 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Daniel Borkmann, liu.song.a23, songliubraving,
	John Fastabend
In-Reply-To: <152775714013.24817.5067576840614810786.stgit@firesoul>

On Thu, May 31, 2018 at 10:59:42AM +0200, Jesper Dangaard Brouer wrote:
> As I mentioned in merge commit 10f678683e4 ("Merge branch 'xdp_xmit-bulking'")
> I plan to change the API for ndo_xdp_xmit once more, by adding a flags
> argument, which is done in this patchset.
> 
> I know it is late in the cycle (currently at rc7), but it would be
> nice to avoid changing NDOs over several kernel releases, as it is
> annoying to vendors and distro backporters, but it is not strictly
> UAPI so it is allowed (according to Alexei).
> 
> The end-goal is getting rid of the ndo_xdp_flush operation, as it will
> make it possible for drivers to implement a TXQ synchronization mechanism
> that is not necessarily derived from the CPU id (smp_processor_id).
> 
> This patchset removes all callers of the ndo_xdp_flush operation, but
> it doesn't take the last step of removing it from all drivers.  This
> can be done later, or I can update the patchset on request.
> 
> Micro-benchmarks only show a very small performance improvement, for
> map-redirect around ~2 ns, and for non-map redirect ~7 ns.  I've not
> benchmarked this with CONFIG_RETPOLINE, but the performance benefit
> should be more visible given we end-up removing an indirect call.
> 
> ---
> V2: Updated based on feedback from Song Liu <songliubraving@fb.com>

Applied, but please send a follow-up patch to remove ndo_xdp_flush().
Otherwise this patch set is just code churn that does the opposite
of what you're trying to achieve and creates more backport pain.
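The idea of folding the flush into the transmit call via a flags argument can be modelled in a few lines. This is a mock, not a driver: demo_txq, demo_xdp_xmit() and DEMO_XMIT_FLUSH are hypothetical names, and a counter stands in for ringing the hardware doorbell.

```c
#include <stdint.h>

#define DEMO_XMIT_FLUSH (1U << 0)	/* hypothetical flag bit */

struct demo_txq { int queued; int doorbells; };

/* Queues n frames; rings the doorbell only when the caller asks for a flush. */
static int demo_xdp_xmit(struct demo_txq *q, int n, uint32_t flags)
{
	if (flags & ~DEMO_XMIT_FLUSH)
		return -1;		/* unknown flag bits are rejected */

	q->queued += n;
	if (flags & DEMO_XMIT_FLUSH) {
		q->doorbells++;
		q->queued = 0;
	}
	return n;
}
```

Once every caller passes the flush request this way, a separate flush callback has no remaining users, which is the reason for asking that it be removed in a follow-up.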

* Re: [PATCH net] vrf: check the original netdevice for generating redirect
From: David Ahern @ 2018-06-03 15:31 UTC (permalink / raw)
  To: Stephen Suryaputra, netdev
In-Reply-To: <1527825921-17677-1-git-send-email-ssuryaextr@gmail.com>

On 5/31/18 10:05 PM, Stephen Suryaputra wrote:
> Use the right device to determine if redirect should be sent especially
> when using vrf. The same applies when sending the redirect.
> 
> Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
> ---
>  net/ipv6/ip6_output.c | 3 ++-
>  net/ipv6/ndisc.c      | 6 ++++++
>  2 files changed, 8 insertions(+), 1 deletion(-)

skb->dev in this path is set to the vrf device if applicable, so yes the
change is needed. Thanks for the fix.

Acked-by: David Ahern <dsahern@gmail.com>

* Re: [RFC V5 PATCH 8/8] vhost: event suppression for packed ring
From: Wei Xu @ 2018-06-03 15:40 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, kvm, virtualization, netdev, linux-kernel, jfreimann,
	tiwei.bie
In-Reply-To: <12f2c455-5868-3b07-0eba-d49dcafd10f2@redhat.com>

On Thu, May 31, 2018 at 11:09:07AM +0800, Jason Wang wrote:
> 
> 
> On 2018年05月30日 19:42, Wei Xu wrote:
> >>  /* This actually signals the guest, using eventfd. */
> >>  void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> >>  {
> >>@@ -2802,10 +2930,34 @@ static bool vhost_enable_notify_packed(struct vhost_dev *dev,
> >>  				       struct vhost_virtqueue *vq)
> >>  {
> >>  	struct vring_desc_packed *d = vq->desc_packed + vq->avail_idx;
> >>-	__virtio16 flags;
> >>+	__virtio16 flags = RING_EVENT_FLAGS_ENABLE;
> >>  	int ret;
> >>-	/* FIXME: disable notification through device area */
> >>+	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
> >>+		return false;
> >>+	vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
> >'used_flags' was originally designed for 1.0, why should we pay attention to it here?
> >
> >Wei
> 
> It was used to record whether or not we've disabled notification. Then we
> can avoid unnecessary userspace writes or memory barriers.

OK, thanks.

> 
> Thanks
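Jason's explanation amounts to caching the notification state locally so that a redundant enable does not touch shared memory. A minimal model, with hypothetical names throughout (demo_vq, DEMO_NO_NOTIFY, demo_enable_notify) and a counter standing in for the guest-visible write and its barrier:

```c
#include <stdbool.h>

#define DEMO_NO_NOTIFY 0x1	/* hypothetical cached "notifications off" bit */

struct demo_vq { unsigned int cached_flags; int device_writes; };

/* Returns true only when a write to the shared area was actually needed. */
static bool demo_enable_notify(struct demo_vq *vq)
{
	if (!(vq->cached_flags & DEMO_NO_NOTIFY))
		return false;		/* already enabled: skip write and barrier */

	vq->cached_flags &= ~DEMO_NO_NOTIFY;
	vq->device_writes++;		/* stands in for the guest-visible update */
	return true;
}
```

Calling demo_enable_notify() twice in a row performs one write, not two.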

* Re: [PATCH 3/6] ravb: remove custom .set_link_ksettings from ethtool ops
From: Sergei Shtylyov @ 2018-06-03 15:42 UTC (permalink / raw)
  To: Vladimir Zapolskiy, David S. Miller; +Cc: netdev, linux-renesas-soc
In-Reply-To: <6f908ff0-254b-4378-27d3-5ff973328d88@mentor.com>

Hello!

   Sorry for the delay replying, the management keeps me busy... :-(

On 05/28/2018 12:51 PM, Vladimir Zapolskiy wrote:

>>> The change replaces a custom implementation of .set_link_ksettings
>>> callback with a shared phy_ethtool_set_link_ksettings(), this fixes
>>> sleep in atomic context bug, which is encountered every time when link
>>> settings are changed by ethtool.
>>
>>    Seeing it now...

   And to say that this is *fixed* by removing the custom method is err...
simply misleading. The sleep in atomic context is fixed solely by the removal
of the spinlock grabbing before the phylib call.

>>> Now duplex mode setting is enforced in ravb_adjust_link() only, also
>>> now TX/RX is disabled when link is put down or modifications to E-MAC
>>> registers ECMR and GECMR are expected for both cases of checked and
>>> ignored link status pin state from E-MAC interrupt handler.
>>>
>>> Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
>>> ---
>>>  drivers/net/ethernet/renesas/ravb_main.c | 58 +++++++++-----------------------
>>>  1 file changed, 15 insertions(+), 43 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
>>> index 3d91caa44176..0d811c02ff34 100644
>>> --- a/drivers/net/ethernet/renesas/ravb_main.c
>>> +++ b/drivers/net/ethernet/renesas/ravb_main.c
>>> @@ -980,6 +980,13 @@ static void ravb_adjust_link(struct net_device *ndev)
>>>  	struct ravb_private *priv = netdev_priv(ndev);
>>>  	struct phy_device *phydev = ndev->phydev;
>>>  	bool new_state = false;
>>> +	unsigned long flags;
>>> +
>>> +	spin_lock_irqsave(&priv->lock, flags);
>>> +
>>> +	/* Disable TX and RX right over here, if E-MAC change is ignored */
>>> +	if (priv->no_avb_link)
>>> +		ravb_rcv_snd_disable(ndev);
>>>  
>>>  	if (phydev->link) {
>>>  		if (phydev->duplex != priv->duplex) {
>>> @@ -997,18 +1004,21 @@ static void ravb_adjust_link(struct net_device *ndev)
>>>  			ravb_modify(ndev, ECMR, ECMR_TXF, 0);
>>>  			new_state = true;
>>>  			priv->link = phydev->link;
>>> -			if (priv->no_avb_link)
>>> -				ravb_rcv_snd_enable(ndev);
>>>  		}
>>>  	} else if (priv->link) {
>>>  		new_state = true;
>>>  		priv->link = 0;
>>>  		priv->speed = 0;
>>>  		priv->duplex = -1;
>>> -		if (priv->no_avb_link)
>>> -			ravb_rcv_snd_disable(ndev);
>>>  	}
>>>  
>>> +	/* Enable TX and RX right over here, if E-MAC change is ignored */
>>> +	if (priv->no_avb_link && phydev->link)
>>> +		ravb_rcv_snd_enable(ndev);
>>> +
>>> +	mmiowb();
>>> +	spin_unlock_irqrestore(&priv->lock, flags);
>>> +
>>
>>    I like this part. :-)
>>
> 
> A weight off my mind :) And I hope that this change will remain the less
> questionable one, other ones from the series are trivial.
> 
> Anyway I hope it is understandable that this part of the change can not
> be simply extracted from the rest one below, otherwise there'll be bugs of
> another type introduced.

   I never said I'd like to apply this part alone, my idea was more like removing
the spinlock grabbing and the duplex handling down below.

[...]
>>> @@ -1096,44 +1106,6 @@ static int ravb_phy_start(struct net_device *ndev)
>>>  	return 0;
>>>  }
>>>  
>>> -static int ravb_set_link_ksettings(struct net_device *ndev,
>>> -				   const struct ethtool_link_ksettings *cmd)
>>> -{
>>> -	struct ravb_private *priv = netdev_priv(ndev);
>>> -	unsigned long flags;
>>> -	int error;
>>> -
>>> -	if (!ndev->phydev)
>>> -		return -ENODEV;
>>> -
>>> -	spin_lock_irqsave(&priv->lock, flags);
>>> -
>>> -	/* Disable TX and RX */
>>> -	ravb_rcv_snd_disable(ndev);
>>> -
>>> -	error = phy_ethtool_ksettings_set(ndev->phydev, cmd);
>>> -	if (error)
>>> -		goto error_exit;
>>> -
>>> -	if (cmd->base.duplex == DUPLEX_FULL)
>>> -		priv->duplex = 1;
>>> -	else
>>> -		priv->duplex = 0;
>>> -
>>> -	ravb_set_duplex(ndev);
>>> -
>>> -error_exit:
>>> -	mdelay(1);
>>> -
>>> -	/* Enable TX and RX */
>>> -	ravb_rcv_snd_enable(ndev);
>>> -
>>> -	mmiowb();
>>> -	spin_unlock_irqrestore(&priv->lock, flags);
>>> -
>>> -	return error;
>>> -}
>>> -
>>
>>    But this part is clearly lumping it all together... 
> 
> Please elaborate.

   My point is still that complete removal of the custom method was somewhat
premature and completely unnecessary for fixing the issues we have.

>> [...]
>>> @@ -1357,7 +1329,7 @@ static const struct ethtool_ops ravb_ethtool_ops = {
>>>  	.set_ringparam		= ravb_set_ringparam,
>>>  	.get_ts_info		= ravb_get_ts_info,
>>>  	.get_link_ksettings	= phy_ethtool_get_link_ksettings,
>>> -	.set_link_ksettings	= ravb_set_link_ksettings,
>>> +	.set_link_ksettings	= phy_ethtool_set_link_ksettings,
>>
>>    Should have been a part of the final patch in the fix/enhancement chain...
> 
> Please elaborate.
> 
> Do you mean that firstly I have to make erroneous ravb_set_link_ksettings()
> to look similar to phy_ethtool_set_link_ksettings() and then remove it?

   Yes.

> As I see it in the current context (removal of ravb_set_duplex() call and
> so on), the problem with this approach is that the actual fix change will
> be done on top of a number of enhancement changes, thus it contradicts

   Now I have to ask you to elaborate. I have no idea what you mean. :-(

   And of course, sometimes things are broken in so subtle a way that
only a pile of "cleanups" fixes them; we had that situation in e.g. the
R-Car I2C driver -- *none* of AFAIR 9 patches was good as a -stable patch...

> the accepted development/maintenance model "fixes first", and most probably
> it won't be possible to backport the real fix, however this sole change can
> be backported.

   My idea was to move the [G]ECMR writes to the adjust_link() callback and
to stop grabbing the spinlock where it *was* grabbed in the same fix patch.
Then just a single clean up, to start using the new phylib method.

[...]
> --
> With best wishes,
> Vladimir

MBR, Sergei

* RE: [PATCH net-next] qed: Add srq core support for RoCE and iWARP
From: Bason, Yuval @ 2018-06-03 16:10 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: davem@davemloft.net, netdev@vger.kernel.org, jgg@mellanox.com,
	dledford@redhat.com, linux-rdma@vger.kernel.org, Kalderon, Michal,
	Elior, Ariel
In-Reply-To: <20180531173301.GV3697@mtr-leonro.mtl.com>

From: Leon Romanovsky [mailto:leon@kernel.org]
Sent: Thursday, May 31, 2018 8:33 PM
> On Wed, May 30, 2018 at 04:11:37PM +0300, Yuval Bason wrote:
> > This patch adds support for configuring SRQ and provides the necessary
> > APIs for rdma upper layer driver (qedr) to enable the SRQ feature.
> >
> > Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
> > Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
> > Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
> > ---
> >  drivers/net/ethernet/qlogic/qed/qed_cxt.c   |   5 +-
> >  drivers/net/ethernet/qlogic/qed/qed_cxt.h   |   1 +
> >  drivers/net/ethernet/qlogic/qed/qed_hsi.h   |   2 +
> >  drivers/net/ethernet/qlogic/qed/qed_iwarp.c |  23 ++++
> >  drivers/net/ethernet/qlogic/qed/qed_main.c  |   2 +
> >  drivers/net/ethernet/qlogic/qed/qed_rdma.c  | 179 +++++++++++++++++++++++++++-
> >  drivers/net/ethernet/qlogic/qed/qed_rdma.h  |   2 +
> >  drivers/net/ethernet/qlogic/qed/qed_roce.c  |  17 ++-
> >  include/linux/qed/qed_rdma_if.h             |  12 +-
> >  9 files changed, 235 insertions(+), 8 deletions(-)
> >
> 
> ...
> 
> > +	struct qed_sp_init_data init_data;
> 
> ...
> 
> > +	memset(&init_data, 0, sizeof(init_data));
> 
> This pattern is so common in this patch, why?
> 
> "struct qed_sp_init_data init_data = {};" will do the trick.
> 
Thanks for pointing this out; it will be fixed in v2.

> Thanks
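The two zeroing styles under discussion can be compared side by side. The struct below is a hypothetical stand-in, not qed_sp_init_data. The sketch writes the initializer as { 0 } because that form is valid in every C standard; the empty-brace spelling suggested in the review is accepted by GCC and Clang and is the form commonly seen in kernel code.

```c
#include <string.h>

/* Hypothetical struct; stands in for something like an init-data blob. */
struct demo_init_data { int cid; int opaque_fid; void *comp; };

/* Style used in v1: declare, then clear with memset(). */
static struct demo_init_data demo_with_memset(void)
{
	struct demo_init_data d;

	memset(&d, 0, sizeof(d));
	return d;
}

/* Style suggested in review: zero-initialize at the declaration. */
static struct demo_init_data demo_with_initializer(void)
{
	struct demo_init_data d = { 0 };	/* kernel code would write = {} */

	return d;
}
```

Both functions return a struct whose members all read as zero; the initializer form simply removes a line and cannot be forgotten when a new code path is added.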

* Re: [PATCH bpf-next v3 05/11] bpf: avoid retpoline for lookup/update/delete calls on maps
From: Daniel Borkmann @ 2018-06-03 16:11 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: alexei.starovoitov, netdev
In-Reply-To: <20180603085651.73c76704@redhat.com>

On 06/03/2018 08:56 AM, Jesper Dangaard Brouer wrote:
> On Sat,  2 Jun 2018 23:06:35 +0200
> Daniel Borkmann <daniel@iogearbox.net> wrote:
> 
>> Before:
>>
>>   # bpftool p d x i 1
> 
> Could this please be changed to:
> 
>  # bpftool prog dump xlated id 1
> 
> I requested this before, but you seem to have missed my feedback...
> This makes the command "self-documenting" and searchable by Google.

I recently wrote a howto here, but there's also excellent documentation
in terms of man pages for bpftool.

http://cilium.readthedocs.io/en/latest/bpf/#bpftool

My original thinking was that it might be okay to also show usage of
short option matching: in iproute2, probably only a few people write
'ip address' while the majority uses 'ip a' instead. But I'm fine either
way if there are strong opinions ... thanks Alexei for fixing up!
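Short option matching of this kind is usually a prefix comparison against the full keyword. The helper below is a generic sketch with a made-up name and is not taken from bpftool or iproute2; it only shows why "p d x i" and "prog dump xlated id" can be treated as the same command.

```c
#include <stdbool.h>
#include <string.h>

/* True when pfx is a non-empty leading substring of word. */
static bool demo_is_prefix(const char *pfx, const char *word)
{
	size_t n = strlen(pfx);

	return n > 0 && n <= strlen(word) && strncmp(pfx, word, n) == 0;
}
```

Under this rule "p" matches "prog" and "x" matches "xlated", which keeps typing short but, as Jesper points out, makes commit logs harder to search.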

* [PATCH net-next v2] qed: Add srq core support for RoCE and iWARP
From: Yuval Bason @ 2018-06-03 16:13 UTC (permalink / raw)
  To: yuval.bason, davem
  Cc: netdev, jgg, dledford, linux-rdma, Michal Kalderon, Ariel Elior

This patch adds support for configuring SRQ and provides the necessary
APIs for rdma upper layer driver (qedr) to enable the SRQ feature.

Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
---
Changes from v1:
	- sparse warnings
	- replace memset with ={}
---
 drivers/net/ethernet/qlogic/qed/qed_cxt.c   |   5 +-
 drivers/net/ethernet/qlogic/qed/qed_cxt.h   |   1 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h   |   2 +
 drivers/net/ethernet/qlogic/qed/qed_iwarp.c |  23 ++++
 drivers/net/ethernet/qlogic/qed/qed_main.c  |   2 +
 drivers/net/ethernet/qlogic/qed/qed_rdma.c  | 178 +++++++++++++++++++++++++++-
 drivers/net/ethernet/qlogic/qed/qed_rdma.h  |   2 +
 drivers/net/ethernet/qlogic/qed/qed_roce.c  |  17 ++-
 include/linux/qed/qed_rdma_if.h             |  12 +-
 9 files changed, 234 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
index 820b226..7ed6aa0 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
@@ -47,6 +47,7 @@
 #include "qed_hsi.h"
 #include "qed_hw.h"
 #include "qed_init_ops.h"
+#include "qed_rdma.h"
 #include "qed_reg_addr.h"
 #include "qed_sriov.h"
 
@@ -426,7 +427,7 @@ static void qed_cxt_set_srq_count(struct qed_hwfn *p_hwfn, u32 num_srqs)
 	p_mgr->srq_count = num_srqs;
 }
 
-static u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn)
+u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn)
 {
 	struct qed_cxt_mngr *p_mgr = p_hwfn->p_cxt_mngr;
 
@@ -2071,7 +2072,7 @@ static void qed_rdma_set_pf_params(struct qed_hwfn *p_hwfn,
 	u32 num_cons, num_qps, num_srqs;
 	enum protocol_type proto;
 
-	num_srqs = min_t(u32, 32 * 1024, p_params->num_srqs);
+	num_srqs = min_t(u32, QED_RDMA_MAX_SRQS, p_params->num_srqs);
 
 	if (p_hwfn->mcp_info->func_info.protocol == QED_PCI_ETH_RDMA) {
 		DP_NOTICE(p_hwfn,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.h b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
index a4e9586..758a8b4 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
@@ -235,6 +235,7 @@ u32 qed_cxt_get_proto_tid_count(struct qed_hwfn *p_hwfn,
 				enum protocol_type type);
 u32 qed_cxt_get_proto_cid_start(struct qed_hwfn *p_hwfn,
 				enum protocol_type type);
+u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn);
 int qed_cxt_free_proto_ilt(struct qed_hwfn *p_hwfn, enum protocol_type proto);
 
 #define QED_CTX_WORKING_MEM 0
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index 8e1e6e1..82ce401 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -9725,6 +9725,8 @@ enum iwarp_eqe_async_opcode {
 	IWARP_EVENT_TYPE_ASYNC_EXCEPTION_DETECTED,
 	IWARP_EVENT_TYPE_ASYNC_QP_IN_ERROR_STATE,
 	IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW,
+	IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY,
+	IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT,
 	MAX_IWARP_EQE_ASYNC_OPCODE
 };
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
index 2a2b101..474e6cf 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
@@ -271,6 +271,8 @@ int qed_iwarp_create_qp(struct qed_hwfn *p_hwfn,
 	p_ramrod->sq_num_pages = qp->sq_num_pages;
 	p_ramrod->rq_num_pages = qp->rq_num_pages;
 
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(qp->srq_id);
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(p_hwfn->hw_info.opaque_fid);
 	p_ramrod->qp_handle_for_cqe.hi = cpu_to_le32(qp->qp_handle.hi);
 	p_ramrod->qp_handle_for_cqe.lo = cpu_to_le32(qp->qp_handle.lo);
 
@@ -3004,8 +3006,11 @@ static int qed_iwarp_async_event(struct qed_hwfn *p_hwfn,
 				 union event_ring_data *data,
 				 u8 fw_return_code)
 {
+	struct qed_rdma_events events = p_hwfn->p_rdma_info->events;
 	struct regpair *fw_handle = &data->rdma_data.async_handle;
 	struct qed_iwarp_ep *ep = NULL;
+	u16 srq_offset;
+	u16 srq_id;
 	u16 cid;
 
 	ep = (struct qed_iwarp_ep *)(uintptr_t)HILO_64(fw_handle->hi,
@@ -3067,6 +3072,24 @@ static int qed_iwarp_async_event(struct qed_hwfn *p_hwfn,
 		qed_iwarp_cid_cleaned(p_hwfn, cid);
 
 		break;
+	case IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY:
+		DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY\n");
+		srq_offset = p_hwfn->p_rdma_info->srq_id_offset;
+		/* FW assigns value that is no greater than u16 */
+		srq_id = ((u16)le32_to_cpu(fw_handle->lo)) - srq_offset;
+		events.affiliated_event(events.context,
+					QED_IWARP_EVENT_SRQ_EMPTY,
+					&srq_id);
+		break;
+	case IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT:
+		DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT\n");
+		srq_offset = p_hwfn->p_rdma_info->srq_id_offset;
+		/* FW assigns value that is no greater than u16 */
+		srq_id = ((u16)le32_to_cpu(fw_handle->lo)) - srq_offset;
+		events.affiliated_event(events.context,
+					QED_IWARP_EVENT_SRQ_LIMIT,
+					&srq_id);
+		break;
 	case IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW:
 		DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW\n");
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 68c4399..b04d57c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -64,6 +64,7 @@
 
 #define QED_ROCE_QPS			(8192)
 #define QED_ROCE_DPIS			(8)
+#define QED_RDMA_SRQS                   QED_ROCE_QPS
 
 static char version[] =
 	"QLogic FastLinQ 4xxxx Core Module qed " DRV_MODULE_VERSION "\n";
@@ -922,6 +923,7 @@ static void qed_update_pf_params(struct qed_dev *cdev,
 	if (IS_ENABLED(CONFIG_QED_RDMA)) {
 		params->rdma_pf_params.num_qps = QED_ROCE_QPS;
 		params->rdma_pf_params.min_dpis = QED_ROCE_DPIS;
+		params->rdma_pf_params.num_srqs = QED_RDMA_SRQS;
 		/* divide by 3 the MRs to avoid MF ILT overflow */
 		params->rdma_pf_params.gl_pi = QED_ROCE_PROTOCOL_INDEX;
 	}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index a411f9c..b870510 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -259,15 +259,29 @@ static int qed_rdma_alloc(struct qed_hwfn *p_hwfn,
 		goto free_cid_map;
 	}
 
+	/* Allocate bitmap for srqs */
+	p_rdma_info->num_srqs = qed_cxt_get_srq_count(p_hwfn);
+	rc = qed_rdma_bmap_alloc(p_hwfn, &p_rdma_info->srq_map,
+				 p_rdma_info->num_srqs, "SRQ");
+	if (rc) {
+		DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+			   "Failed to allocate srq bitmap, rc = %d\n", rc);
+		goto free_real_cid_map;
+	}
+
 	if (QED_IS_IWARP_PERSONALITY(p_hwfn))
 		rc = qed_iwarp_alloc(p_hwfn);
 
 	if (rc)
-		goto free_cid_map;
+		goto free_srq_map;
 
 	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "Allocation successful\n");
 	return 0;
 
+free_srq_map:
+	kfree(p_rdma_info->srq_map.bitmap);
+free_real_cid_map:
+	kfree(p_rdma_info->real_cid_map.bitmap);
 free_cid_map:
 	kfree(p_rdma_info->cid_map.bitmap);
 free_tid_map:
@@ -351,6 +365,8 @@ static void qed_rdma_resc_free(struct qed_hwfn *p_hwfn)
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->cq_map, 1);
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->toggle_bits, 0);
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->tid_map, 1);
+	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->srq_map, 1);
+	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->real_cid_map, 1);
 
 	kfree(p_rdma_info->port);
 	kfree(p_rdma_info->dev);
@@ -431,6 +447,12 @@ static void qed_rdma_init_devinfo(struct qed_hwfn *p_hwfn,
 	if (cdev->rdma_max_sge)
 		dev->max_sge = min_t(u32, cdev->rdma_max_sge, dev->max_sge);
 
+	dev->max_srq_sge = QED_RDMA_MAX_SGE_PER_SRQ_WQE;
+	if (p_hwfn->cdev->rdma_max_srq_sge) {
+		dev->max_srq_sge = min_t(u32,
+					 p_hwfn->cdev->rdma_max_srq_sge,
+					 dev->max_srq_sge);
+	}
 	dev->max_inline = ROCE_REQ_MAX_INLINE_DATA_SIZE;
 
 	dev->max_inline = (cdev->rdma_max_inline) ?
@@ -474,6 +496,8 @@ static void qed_rdma_init_devinfo(struct qed_hwfn *p_hwfn,
 	dev->max_mr_mw_fmr_size = dev->max_mr_mw_fmr_pbl * PAGE_SIZE;
 	dev->max_pkey = QED_RDMA_MAX_P_KEY;
 
+	dev->max_srq = p_hwfn->p_rdma_info->num_srqs;
+	dev->max_srq_wr = QED_RDMA_MAX_SRQ_WQE_ELEM;
 	dev->max_qp_resp_rd_atomic_resc = RDMA_RING_PAGE_SIZE /
 					  (RDMA_RESP_RD_ATOMIC_ELM_SIZE * 2);
 	dev->max_qp_req_rd_atomic_resc = RDMA_RING_PAGE_SIZE /
@@ -1628,6 +1652,155 @@ static void *qed_rdma_get_rdma_ctx(struct qed_dev *cdev)
 	return QED_LEADING_HWFN(cdev);
 }
 
+static int qed_rdma_modify_srq(void *rdma_cxt,
+			       struct qed_rdma_modify_srq_in_params *in_params)
+{
+	struct rdma_srq_modify_ramrod_data *p_ramrod;
+	struct qed_sp_init_data init_data = {};
+	struct qed_hwfn *p_hwfn = rdma_cxt;
+	struct qed_spq_entry *p_ent;
+	u16 opaque_fid;
+	int rc;
+
+	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 RDMA_RAMROD_MODIFY_SRQ,
+				 p_hwfn->p_rdma_info->proto, &init_data);
+	if (rc)
+		return rc;
+
+	p_ramrod = &p_ent->ramrod.rdma_modify_srq;
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(in_params->srq_id);
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+	p_ramrod->wqe_limit = cpu_to_le32(in_params->wqe_limit);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		return rc;
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "modified SRQ id = %x\n",
+		   in_params->srq_id);
+
+	return rc;
+}
+
+static int
+qed_rdma_destroy_srq(void *rdma_cxt,
+		     struct qed_rdma_destroy_srq_in_params *in_params)
+{
+	struct rdma_srq_destroy_ramrod_data *p_ramrod;
+	struct qed_sp_init_data init_data = {};
+	struct qed_hwfn *p_hwfn = rdma_cxt;
+	struct qed_spq_entry *p_ent;
+	struct qed_bmap *bmap;
+	u16 opaque_fid;
+	int rc;
+
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+
+	init_data.opaque_fid = opaque_fid;
+	init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 RDMA_RAMROD_DESTROY_SRQ,
+				 p_hwfn->p_rdma_info->proto, &init_data);
+	if (rc)
+		return rc;
+
+	p_ramrod = &p_ent->ramrod.rdma_destroy_srq;
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(in_params->srq_id);
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		return rc;
+
+	bmap = &p_hwfn->p_rdma_info->srq_map;
+
+	spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+	qed_bmap_release_id(p_hwfn, bmap, in_params->srq_id);
+	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "SRQ destroyed Id = %x\n",
+		   in_params->srq_id);
+
+	return rc;
+}
+
+static int
+qed_rdma_create_srq(void *rdma_cxt,
+		    struct qed_rdma_create_srq_in_params *in_params,
+		    struct qed_rdma_create_srq_out_params *out_params)
+{
+	struct rdma_srq_create_ramrod_data *p_ramrod;
+	struct qed_sp_init_data init_data = {};
+	struct qed_hwfn *p_hwfn = rdma_cxt;
+	enum qed_cxt_elem_type elem_type;
+	struct qed_spq_entry *p_ent;
+	u16 opaque_fid, srq_id;
+	struct qed_bmap *bmap;
+	u32 returned_id;
+	int rc;
+
+	bmap = &p_hwfn->p_rdma_info->srq_map;
+	spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+	rc = qed_rdma_bmap_alloc_id(p_hwfn, bmap, &returned_id);
+	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+	if (rc) {
+		DP_NOTICE(p_hwfn, "failed to allocate srq id\n");
+		return rc;
+	}
+
+	elem_type = QED_ELEM_SRQ;
+	rc = qed_cxt_dynamic_ilt_alloc(p_hwfn, elem_type, returned_id);
+	if (rc)
+		goto err;
+	/* returned id is no greater than u16 */
+	srq_id = (u16)returned_id;
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.opaque_fid = opaque_fid;
+	init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 RDMA_RAMROD_CREATE_SRQ,
+				 p_hwfn->p_rdma_info->proto, &init_data);
+	if (rc)
+		goto err;
+
+	p_ramrod = &p_ent->ramrod.rdma_create_srq;
+	DMA_REGPAIR_LE(p_ramrod->pbl_base_addr, in_params->pbl_base_addr);
+	p_ramrod->pages_in_srq_pbl = cpu_to_le16(in_params->num_pages);
+	p_ramrod->pd_id = cpu_to_le16(in_params->pd_id);
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(srq_id);
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+	p_ramrod->page_size = cpu_to_le16(in_params->page_size);
+	DMA_REGPAIR_LE(p_ramrod->producers_addr, in_params->prod_pair_addr);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		goto err;
+
+	out_params->srq_id = srq_id;
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+		   "SRQ created Id = %x\n", out_params->srq_id);
+
+	return rc;
+
+err:
+	spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+	qed_bmap_release_id(p_hwfn, bmap, returned_id);
+	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+	return rc;
+}
+
 bool qed_rdma_allocated_qps(struct qed_hwfn *p_hwfn)
 {
 	bool result;
@@ -1773,6 +1946,9 @@ static int qed_roce_ll2_set_mac_filter(struct qed_dev *cdev,
 	.rdma_free_tid = &qed_rdma_free_tid,
 	.rdma_register_tid = &qed_rdma_register_tid,
 	.rdma_deregister_tid = &qed_rdma_deregister_tid,
+	.rdma_create_srq = &qed_rdma_create_srq,
+	.rdma_modify_srq = &qed_rdma_modify_srq,
+	.rdma_destroy_srq = &qed_rdma_destroy_srq,
 	.ll2_acquire_connection = &qed_ll2_acquire_connection,
 	.ll2_establish_connection = &qed_ll2_establish_connection,
 	.ll2_terminate_connection = &qed_ll2_terminate_connection,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.h b/drivers/net/ethernet/qlogic/qed/qed_rdma.h
index 18ec9cb..6f722ee 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.h
@@ -96,6 +96,8 @@ struct qed_rdma_info {
 	u8 num_cnqs;
 	u32 num_qps;
 	u32 num_mrs;
+	u32 num_srqs;
+	u16 srq_id_offset;
 	u16 queue_zone_base;
 	u16 max_queue_zones;
 	enum protocol_type proto;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index 6acfd43..ee57fcd 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -65,6 +65,8 @@
 		     u8 fw_event_code,
 		     u16 echo, union event_ring_data *data, u8 fw_return_code)
 {
+	struct qed_rdma_events events = p_hwfn->p_rdma_info->events;
+
 	if (fw_event_code == ROCE_ASYNC_EVENT_DESTROY_QP_DONE) {
 		u16 icid =
 		    (u16)le32_to_cpu(data->rdma_data.rdma_destroy_qp_data.cid);
@@ -75,11 +77,18 @@
 		 */
 		qed_roce_free_real_icid(p_hwfn, icid);
 	} else {
-		struct qed_rdma_events *events = &p_hwfn->p_rdma_info->events;
+		if (fw_event_code == ROCE_ASYNC_EVENT_SRQ_EMPTY ||
+		    fw_event_code == ROCE_ASYNC_EVENT_SRQ_LIMIT) {
+			u16 srq_id = (u16)data->rdma_data.async_handle.lo;
+
+			events.affiliated_event(events.context, fw_event_code,
+						&srq_id);
+		} else {
+			union rdma_eqe_data rdata = data->rdma_data;
 
-		events->affiliated_event(p_hwfn->p_rdma_info->events.context,
-					 fw_event_code,
-				     (void *)&data->rdma_data.async_handle);
+			events.affiliated_event(events.context, fw_event_code,
+						(void *)&rdata.async_handle);
+		}
 	}
 
 	return 0;
diff --git a/include/linux/qed/qed_rdma_if.h b/include/linux/qed/qed_rdma_if.h
index 4dd72ba..e05e320 100644
--- a/include/linux/qed/qed_rdma_if.h
+++ b/include/linux/qed/qed_rdma_if.h
@@ -485,7 +485,9 @@ enum qed_iwarp_event_type {
 	QED_IWARP_EVENT_ACTIVE_MPA_REPLY,
 	QED_IWARP_EVENT_LOCAL_ACCESS_ERROR,
 	QED_IWARP_EVENT_REMOTE_OPERATION_ERROR,
-	QED_IWARP_EVENT_TERMINATE_RECEIVED
+	QED_IWARP_EVENT_TERMINATE_RECEIVED,
+	QED_IWARP_EVENT_SRQ_LIMIT,
+	QED_IWARP_EVENT_SRQ_EMPTY,
 };
 
 enum qed_tcp_ip_version {
@@ -646,6 +648,14 @@ struct qed_rdma_ops {
 	int (*rdma_alloc_tid)(void *rdma_cxt, u32 *itid);
 	void (*rdma_free_tid)(void *rdma_cxt, u32 itid);
 
+	int (*rdma_create_srq)(void *rdma_cxt,
+			       struct qed_rdma_create_srq_in_params *iparams,
+			       struct qed_rdma_create_srq_out_params *oparams);
+	int (*rdma_destroy_srq)(void *rdma_cxt,
+				struct qed_rdma_destroy_srq_in_params *iparams);
+	int (*rdma_modify_srq)(void *rdma_cxt,
+			       struct qed_rdma_modify_srq_in_params *iparams);
+
 	int (*ll2_acquire_connection)(void *rdma_cxt,
 				      struct qed_ll2_acquire_data *data);
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH bpf-next v3 05/11] bpf: avoid retpoline for lookup/update/delete calls on maps
From: Jesper Dangaard Brouer @ 2018-06-03 17:08 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: brouer, alexei.starovoitov, netdev, Phil Sutter, Jakub Kicinski,
	Jakub Kicinski, Quentin Monnet
In-Reply-To: <d05e733b-7f54-9fd9-e80a-67e704197d14@iogearbox.net>

On Sun, 3 Jun 2018 18:11:45 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 06/03/2018 08:56 AM, Jesper Dangaard Brouer wrote:
> > On Sat,  2 Jun 2018 23:06:35 +0200
> > Daniel Borkmann <daniel@iogearbox.net> wrote:
> >   
> >> Before:
> >>
> >>   # bpftool p d x i 1  
> > 
> > Could this please be changed to:
> > 
> >  # bpftool prog dump xlated id 1
> > 
> > I requested this before, but you seem to have missed my feedback...
> > This makes the command "self-documenting" and searchable by Google.  
> 
> I recently wrote a howto here, but there's also excellent documentation
> in terms of man pages for bpftool.
> 
> http://cilium.readthedocs.io/en/latest/bpf/#bpftool
> 
> My original thinking was that it might be okay to also show usage of
> short option matching, like in iproute2 probably few people only write
> 'ip address' but majority uses 'ip a' instead. But I'm fine either way
> if there are strong opinions ... thanks Alexei for fixing up!

First of all I love your documentation effort.

Secondly I personally *hate* how 'ip' does its short options
parsing, especially the order/precedence ambiguity.  Phil Sutter
(Fedora/RHEL iproute2 maintainer) has a funny quiz illustrating the
ambiguity issues.

Quiz: https://youtu.be/cymH9pcFGa0?t=7m10s
Code problem: https://youtu.be/cymH9pcFGa0?t=9m8s

I hope the maintainers and developers of bpftool make sure we don't end
up in an ambiguity mess like we have with 'ip', pretty please.
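
[Editor's illustration, not part of the original mail] The ambiguity
complained about above can be sketched as a first-match prefix lookup.
This is only a minimal model: the subcommand table, resolve_first_match()
and resolve_unambiguous() are hypothetical names made up for the example
and are not taken from iproute2 or bpftool. It contrasts "first table
entry whose prefix matches wins" with a stricter resolver that rejects
abbreviations matching more than one entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subcommand table; the order matters for first-match lookup. */
static const char *const subcommands[] = {
	"address", "addrlabel", "link", "route", "rule"
};
#define N_SUBCOMMANDS (sizeof(subcommands) / sizeof(subcommands[0]))

/* Nonzero if 'abbrev' is a prefix of 'full'. */
static int is_prefix(const char *abbrev, const char *full)
{
	return strncmp(abbrev, full, strlen(abbrev)) == 0;
}

/* First-match resolution: table order silently decides between candidates. */
static const char *resolve_first_match(const char *abbrev)
{
	size_t i;

	assert(abbrev != NULL);
	for (i = 0; i < N_SUBCOMMANDS; i++)
		if (is_prefix(abbrev, subcommands[i]))
			return subcommands[i];
	return NULL;
}

/* Strict resolution: an exact match wins, otherwise the abbreviation must
 * match exactly one entry. Sets *ambiguous when several entries match. */
static const char *resolve_unambiguous(const char *abbrev, int *ambiguous)
{
	const char *found = NULL;
	size_t i;

	assert(abbrev != NULL && ambiguous != NULL);
	*ambiguous = 0;
	for (i = 0; i < N_SUBCOMMANDS; i++) {
		if (strcmp(abbrev, subcommands[i]) == 0) {
			*ambiguous = 0;
			return subcommands[i];
		}
		if (is_prefix(abbrev, subcommands[i])) {
			if (found)
				*ambiguous = 1;
			else
				found = subcommands[i];
		}
	}
	return *ambiguous ? NULL : found;
}
```

With this table, "a" resolves to "address" under first-match purely
because of table order, while the strict resolver reports it as ambiguous.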
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH] net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization
From: Maciej Żenczykowski @ 2018-06-03 17:41 UTC (permalink / raw)
  To: Maciej Żenczykowski, David S . Miller, Eric Dumazet
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Wei Wang

From: Maciej Żenczykowski <maze@google.com>

This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean
to an integer.

It now takes the values 0, 1 and 2, where 0 and 1 behave as before,
while 2 enables timewait socket reuse only for sockets that we can
prove are loopback connections:
  i.e. bound to 'lo' interface or where one of source or destination
  IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1.
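
[Editor's illustration, not part of the original patch] The address
classes listed above can be sketched in userspace C. This is a rough
model under stated assumptions: addr_is_loopback() and
effective_tw_reuse() are hypothetical helper names, the code works on
textual addresses rather than on timewait sockets, and it does not
model binding to the 'lo' interface.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>

/* True for any IPv4 address in 127.0.0.0/8 (address in network byte order). */
static bool v4_is_loopback(struct in_addr a)
{
	return (ntohl(a.s_addr) >> 24) == 127;
}

/* True for ::1 or for a v4-mapped address in ::ffff:127.0.0.0/104. */
static bool v6_is_loopback(const struct in6_addr *a)
{
	assert(a != NULL);
	if (IN6_IS_ADDR_LOOPBACK(a))
		return true;
	return IN6_IS_ADDR_V4MAPPED(a) && a->s6_addr[12] == 127;
}

/* Hypothetical helper: parse a textual address and classify it. */
static bool addr_is_loopback(const char *text)
{
	struct in_addr v4;
	struct in6_addr v6;

	if (inet_pton(AF_INET, text, &v4) == 1)
		return v4_is_loopback(v4);
	if (inet_pton(AF_INET6, text, &v6) == 1)
		return v6_is_loopback(&v6);
	return false;
}

/* Model of the sysctl semantics: mode 2 falls back to 0 (no reuse)
 * unless the connection was classified as loopback. */
static int effective_tw_reuse(int mode, bool loopback)
{
	if (mode == 2 && !loopback)
		return 0;
	return mode;
}
```

A nonzero result from effective_tw_reuse() stands for "reuse may be
considered"; the timestamp conditions checked afterwards in the kernel
are outside this sketch.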

This enables quicker reuse of ephemeral ports for loopback connections
- where tcp_tw_reuse is 100% safe from a protocol perspective
(this assumes no artificially induced packet loss on 'lo').

This also makes establishing many loopback connections *much* faster
(allocating ports out of the first half of the ephemeral port range
is significantly faster than allocating from the second half).

Without this change in a 32K ephemeral port space my sample program
(it just establishes and closes [::1]:ephemeral -> [::1]:server_port
connections in a tight loop) fails after 32765 connections in 24 seconds.
With it enabled 50000 connections only take 4.7 seconds.

This is particularly problematic for IPv6 where we only have one local
address and cannot play tricks with varying source IP from 127.0.0.0/8
pool.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Wei Wang <weiwan@google.com>

Change-Id: I0377961749979d0301b7b62871a32a4b34b654e1
---
 Documentation/networking/ip-sysctl.txt | 10 +++++---
 net/ipv4/sysctl_net_ipv4.c             |  5 +++-
 net/ipv4/tcp_ipv4.c                    | 35 +++++++++++++++++++++++---
 3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 924bd51327b7..6841c74eac00 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -667,11 +667,15 @@ tcp_tso_win_divisor - INTEGER
 	building larger TSO frames.
 	Default: 3
 
-tcp_tw_reuse - BOOLEAN
-	Allow to reuse TIME-WAIT sockets for new connections when it is
-	safe from protocol viewpoint. Default value is 0.
+tcp_tw_reuse - INTEGER
+	Enable reuse of TIME-WAIT sockets for new connections when it is
+	safe from protocol viewpoint.
+	0 - disable
+	1 - global enable
+	2 - enable for loopback traffic only
 	It should not be changed without advice/request of technical
 	experts.
+	Default: 2
 
 tcp_window_scaling - BOOLEAN
 	Enable window scaling as defined in RFC1323.
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index d2eed3ddcb0a..d06247ba08b2 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -30,6 +30,7 @@
 
 static int zero;
 static int one = 1;
+static int two = 2;
 static int four = 4;
 static int thousand = 1000;
 static int gso_max_segs = GSO_MAX_SEGS;
@@ -845,7 +846,9 @@ static struct ctl_table ipv4_net_table[] = {
 		.data		= &init_net.ipv4.sysctl_tcp_tw_reuse,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &two,
 	},
 	{
 		.procname	= "tcp_max_tw_buckets",
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index adbdb503db0c..29f922d5e55d 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -110,8 +110,38 @@ static u32 tcp_v4_init_ts_off(const struct net *net, const struct sk_buff *skb)
 
 int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
 {
+	const struct inet_timewait_sock *tw = inet_twsk(sktw);
 	const struct tcp_timewait_sock *tcptw = tcp_twsk(sktw);
 	struct tcp_sock *tp = tcp_sk(sk);
+	int reuse = sock_net(sk)->ipv4.sysctl_tcp_tw_reuse;
+
+	if (reuse == 2) {
+		/* Still does not detect *everything* that goes through
+		 * lo, since we require a loopback src or dst address
+		 * or direct binding to 'lo' interface.
+		 */
+		bool loopback = false;
+		if (tw->tw_bound_dev_if == LOOPBACK_IFINDEX)
+			loopback = true;
+#if IS_ENABLED(CONFIG_IPV6)
+		if (tw->tw_family == AF_INET6) {
+			if (ipv6_addr_loopback(&tw->tw_v6_daddr) ||
+			    (ipv6_addr_v4mapped(&tw->tw_v6_daddr) &&
+			     (tw->tw_v6_daddr.s6_addr[12] == 127)) ||
+			    ipv6_addr_loopback(&tw->tw_v6_rcv_saddr) ||
+			    (ipv6_addr_v4mapped(&tw->tw_v6_rcv_saddr) &&
+			     (tw->tw_v6_rcv_saddr.s6_addr[12] == 127)))
+				loopback = true;
+		} else
+#endif
+		{
+			if (ipv4_is_loopback(tw->tw_daddr) ||
+			    ipv4_is_loopback(tw->tw_rcv_saddr))
+				loopback = true;
+		}
+		if (!loopback)
+			reuse = 0;
+	}
 
 	/* With PAWS, it is safe from the viewpoint
 	   of data integrity. Even without PAWS it is safe provided sequence
@@ -125,8 +155,7 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
 	   and use initial timestamp retrieved from peer table.
 	 */
 	if (tcptw->tw_ts_recent_stamp &&
-	    (!twp || (sock_net(sk)->ipv4.sysctl_tcp_tw_reuse &&
-			     get_seconds() - tcptw->tw_ts_recent_stamp > 1))) {
+	    (!twp || (reuse && get_seconds() - tcptw->tw_ts_recent_stamp > 1))) {
 		tp->write_seq = tcptw->tw_snd_nxt + 65535 + 2;
 		if (tp->write_seq == 0)
 			tp->write_seq = 1;
@@ -2529,7 +2558,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	net->ipv4.sysctl_tcp_orphan_retries = 0;
 	net->ipv4.sysctl_tcp_fin_timeout = TCP_FIN_TIMEOUT;
 	net->ipv4.sysctl_tcp_notsent_lowat = UINT_MAX;
-	net->ipv4.sysctl_tcp_tw_reuse = 0;
+	net->ipv4.sysctl_tcp_tw_reuse = 2;
 
 	cnt = tcp_hashinfo.ehash_mask + 1;
 	net->ipv4.tcp_death_row.sysctl_max_tw_buckets = (cnt + 1) / 2;
-- 
2.17.1.1185.g55be947832-goog

^ permalink raw reply related

* [PATCH] net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets
From: Maciej Żenczykowski @ 2018-06-03 17:47 UTC (permalink / raw)
  To: Maciej Żenczykowski, David S . Miller; +Cc: Eric Dumazet, netdev

From: Maciej Żenczykowski <maze@google.com>

It is not safe to do so because such sockets are already in the
hash tables and changing these options can result in invalidating
the tb->fastreuse(port) caching.

This can have later far reaching consequences wrt. bind conflict checks
which rely on these caches (for optimization purposes).

Not to mention that you can currently end up with two identical
non-reuseport listening sockets bound to the same local ip:port
by clearing reuseport on them after they've already both been bound.

There is unfortunately no EISBOUND error or anything similar,
and EISCONN seems to be misleading for a bound-but-not-connected
socket, so use EUCLEAN 'Structure needs cleaning' which AFAICT
is the closest you can get to meaning 'socket in bad state'.
(although perhaps EINVAL wouldn't be a bad choice either?)

This does unfortunately run the risk of breaking buggy
userspace programs...
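
[Editor's illustration, not part of the original patch] The rule
described above can be modelled as a small pure function. This is a
simplified sketch, not the kernel code: struct sock_model and
set_reuse_model() are hypothetical names, and the model only keeps the
fields the check reads (address family, bound port, connection state,
current option value).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

#ifndef EUCLEAN
#define EUCLEAN 117	/* Linux value; assumption for non-Linux builds */
#endif

/* Hypothetical, simplified view of the socket state the check looks at. */
struct sock_model {
	int family;			/* AF_INET, AF_INET6, ... */
	unsigned short bound_port;	/* 0 while the socket is unbound */
	bool established;		/* connected TCP socket */
	bool reuse;			/* current value of the option */
};

/* Refuse to change the option on a bound inet socket; setting it to the
 * value it already has is still accepted. */
static int set_reuse_model(struct sock_model *sk, bool requested)
{
	bool inet;

	assert(sk != NULL);
	inet = sk->family == AF_INET || sk->family == AF_INET6;
	if (inet && sk->bound_port && sk->reuse != requested)
		return sk->established ? -EISCONN : -EUCLEAN;
	sk->reuse = requested;
	return 0;
}
```

In this model an unbound socket can still toggle the option freely, so
the usual "setsockopt() before bind()" pattern is unaffected.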

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>

Change-Id: I77c2b3429b2fdf42671eee0fa7a8ba721c94963b
---
 net/core/sock.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 435a0ba85e52..feca4c98f8a0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -728,9 +728,22 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 			sock_valbool_flag(sk, SOCK_DBG, valbool);
 		break;
 	case SO_REUSEADDR:
-		sk->sk_reuse = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
+		val = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
+		if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) &&
+		    inet_sk(sk)->inet_num &&
+		    (sk->sk_reuse != val)) {
+			ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN;
+			break;
+		}
+		sk->sk_reuse = val;
 		break;
 	case SO_REUSEPORT:
+		if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) &&
+		    inet_sk(sk)->inet_num &&
+		    (sk->sk_reuseport != valbool)) {
+			ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN;
+			break;
+		}
 		sk->sk_reuseport = valbool;
 		break;
 	case SO_TYPE:
-- 
2.17.1.1185.g55be947832-goog

^ permalink raw reply related

* Re: [PATCH net-next 0/2] cls_flower: Various fixes
From: Cong Wang @ 2018-06-03 18:33 UTC (permalink / raw)
  To: Paul Blakey
  Cc: Jiri Pirko, Jamal Hadi Salim, David Miller,
	Linux Kernel Network Developers, Yevgeny Kliteynik, Roi Dayan,
	Shahar Klein, Mark Bloch, Or Gerlitz
In-Reply-To: <1527668258-27174-1-git-send-email-paulb@mellanox.com>

On Wed, May 30, 2018 at 1:17 AM, Paul Blakey <paulb@mellanox.com> wrote:
> Two of the fixes are for my multiple mask patch
>
> Paul Blakey (2):
>   cls_flower: Fix missing free of rhashtable
>   cls_flower: Fix comparing of old filter mask with new filter

Both are bug fixes and one-line fixes, so definitely should go
to -net tree and -stable tree.

I don't understand why you decide to rebase on net-next.

^ permalink raw reply

* Re: [PATCH net-next 0/2] cls_flower: Various fixes
From: Jiri Pirko @ 2018-06-03 19:39 UTC (permalink / raw)
  To: Cong Wang
  Cc: Paul Blakey, Jiri Pirko, Jamal Hadi Salim, David Miller,
	Linux Kernel Network Developers, Yevgeny Kliteynik, Roi Dayan,
	Shahar Klein, Mark Bloch, Or Gerlitz
In-Reply-To: <CAM_iQpWQAwD8kfV4B9EK81TWtY6ZwEUZ_DbdCnC-iF22Ch8mxQ@mail.gmail.com>

Sun, Jun 03, 2018 at 08:33:25PM CEST, xiyou.wangcong@gmail.com wrote:
>On Wed, May 30, 2018 at 1:17 AM, Paul Blakey <paulb@mellanox.com> wrote:
>> Two of the fixes are for my multiple mask patch
>>
>> Paul Blakey (2):
>>   cls_flower: Fix missing free of rhashtable
>>   cls_flower: Fix comparing of old filter mask with new filter
>
>Both are bug fixes and one-line fixes, so definitely should go
>to -net tree and -stable tree.

I agree.

^ permalink raw reply

* Re: [PATCH] net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets
From: Christoph Paasch @ 2018-06-03 19:54 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Maciej Żenczykowski, David S . Miller, Eric Dumazet, netdev
In-Reply-To: <20180603174705.51802-1-zenczykowski@gmail.com>

Hello,

On Sun, Jun 3, 2018 at 10:47 AM, Maciej Żenczykowski
<zenczykowski@gmail.com> wrote:
> From: Maciej Żenczykowski <maze@google.com>
>
> It is not safe to do so because such sockets are already in the
> hash tables and changing these options can result in invalidating
> the tb->fastreuse(port) caching.
>
> This can have later far reaching consequences wrt. bind conflict checks
> which rely on these caches (for optimization purposes).
>
> Not to mention that you can currently end up with two identical
> non-reuseport listening sockets bound to the same local ip:port
> by clearing reuseport on them after they've already both been bound.

As a side note: some time back I realized that one can also - on the
active opener side - create two TCP connections with the same 5-tuple
going out over the same interface.

One simply needs to first create a connection with a socket that has
SO_BINDTODEVICE set, specifying the same interface as the default
route. The second socket (which doesn't use SO_BINDTODEVICE) can then
end up using the same source port, if the range of available ports has
been exhausted.
This makes for some interesting packet traces! :)

This is because INET_MATCH in __inet_check_established only checks for
!(sk->sk_bound_dev_if). inet_hash_connect() probably would need info
of the route's outgoing interface (of the new socket) to decide
whether or not there is a match.

But even that wouldn't be failsafe as the routing could change later
on... So, I dropped the ball on that.

Not sure if it's a big deal or not...
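
[Editor's illustration, not part of the original mail] The device
check described above can be modelled as follows. This is a simplified
sketch under assumptions: dev_match_model() and conflict_detected() are
hypothetical names, addresses and ports are taken to be identical, and
an interface index of 0 stands for "not bound to a device".

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified device test: an existing socket is treated as a match if it
 * is not bound to a device, or is bound to the device the lookup names. */
static bool dev_match_model(int existing_bound_dev_if, int lookup_dif)
{
	assert(existing_bound_dev_if >= 0 && lookup_dif >= 0);
	return !existing_bound_dev_if || existing_bound_dev_if == lookup_dif;
}

/* With identical addresses and ports, a 5-tuple conflict is noticed only
 * when the device test matches; the lookup uses the new socket's own
 * bound device, which is 0 for a socket without SO_BINDTODEVICE. */
static bool conflict_detected(int existing_bound_dev_if, int new_bound_dev_if)
{
	return dev_match_model(existing_bound_dev_if, new_bound_dev_if);
}
```

In this model, an existing connection bound to interface index 2 is not
seen as a conflict by a later unbound socket, even if routing sends both
out of the same interface.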


Cheers,
Christoph



>
> There is unfortunately no EISBOUND error or anything similar,
> and EISCONN seems to be misleading for a bound-but-not-connected
> socket, so use EUCLEAN 'Structure needs cleaning' which AFAICT
> is the closest you can get to meaning 'socket in bad state'.
> (although perhaps EINVAL wouldn't be a bad choice either?)
>
> This does unfortunately run the risk of breaking buggy
> userspace programs...
>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
>
> Change-Id: I77c2b3429b2fdf42671eee0fa7a8ba721c94963b
> ---
>  net/core/sock.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 435a0ba85e52..feca4c98f8a0 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -728,9 +728,22 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>                         sock_valbool_flag(sk, SOCK_DBG, valbool);
>                 break;
>         case SO_REUSEADDR:
> -               sk->sk_reuse = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
> +               val = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
> +               if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) &&
> +                   inet_sk(sk)->inet_num &&
> +                   (sk->sk_reuse != val)) {
> +                       ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN;
> +                       break;
> +               }
> +               sk->sk_reuse = val;
>                 break;
>         case SO_REUSEPORT:
> +               if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) &&
> +                   inet_sk(sk)->inet_num &&
> +                   (sk->sk_reuseport != valbool)) {
> +                       ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN;
> +                       break;
> +               }
>                 sk->sk_reuseport = valbool;
>                 break;
>         case SO_TYPE:
> --
> 2.17.1.1185.g55be947832-goog
>

^ permalink raw reply

