Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH iproute2-next] ip tunnel: add json output
From: Andrea Claudi @ 2019-08-02 11:14 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: linux-netdev, David Ahern
In-Reply-To: <20190801081620.6b25d23c@hermes.lan>

On Thu, Aug 1, 2019 at 5:16 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Thu,  1 Aug 2019 12:12:58 +0200
> Andrea Claudi <aclaudi@redhat.com> wrote:
>
> > Add json support on iptunnel and ip6tunnel.
> > The plain text output format should remain the same.
> >
> > Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
> > ---
> >  ip/ip6tunnel.c | 82 +++++++++++++++++++++++++++++++--------------
> >  ip/iptunnel.c  | 90 +++++++++++++++++++++++++++++++++-----------------
> >  ip/tunnel.c    | 42 +++++++++++++++++------
> >  3 files changed, 148 insertions(+), 66 deletions(-)
> >
> > diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
> > index d7684a673fdc4..f2b9710c1320f 100644
> > --- a/ip/ip6tunnel.c
> > +++ b/ip/ip6tunnel.c
> > @@ -71,57 +71,90 @@ static void usage(void)
> >  static void print_tunnel(const void *t)
> >  {
> >       const struct ip6_tnl_parm2 *p = t;
> > -     char s1[1024];
> > -     char s2[1024];
> > +     SPRINT_BUF(b1);
> >
> >       /* Do not use format_host() for local addr,
> >        * symbolic name will not be useful.
> >        */
> > -     printf("%s: %s/ipv6 remote %s local %s",
> > -            p->name,
> > -            tnl_strproto(p->proto),
> > -            format_host_r(AF_INET6, 16, &p->raddr, s1, sizeof(s1)),
> > -            rt_addr_n2a_r(AF_INET6, 16, &p->laddr, s2, sizeof(s2)));
> > +     open_json_object(NULL);
> > +     print_string(PRINT_ANY, "ifname", "%s: ", p->name);
>
> Print this using color for interface name?

Thanks for the suggestion, I can do the same also for remote and local
addresses?

>
>
> > +     snprintf(b1, sizeof(b1), "%s/ipv6", tnl_strproto(p->proto));
> > +     print_string(PRINT_ANY, "mode", "%s ", b1);
> > +     print_string(PRINT_ANY,
> > +                  "remote",
> > +                  "remote %s ",
> > +                  format_host_r(AF_INET6, 16, &p->raddr, b1, sizeof(b1)));
> > +     print_string(PRINT_ANY,
> > +                  "local",
> > +                  "local %s",
> > +                  rt_addr_n2a_r(AF_INET6, 16, &p->laddr, b1, sizeof(b1)));
> > +
> >       if (p->link) {
> >               const char *n = ll_index_to_name(p->link);
> >
> >               if (n)
> > -                     printf(" dev %s", n);
> > +                     print_string(PRINT_ANY, "link", " dev %s", n);
> >       }
> >
> >       if (p->flags & IP6_TNL_F_IGN_ENCAP_LIMIT)
> > -             printf(" encaplimit none");
> > +             print_bool(PRINT_ANY,
> > +                        "ip6_tnl_f_ign_encap_limit",
> > +                        " encaplimit none",
> > +                        true);
>
> For flags like this, print_null is more typical JSON than a boolean
> value. Null is better for presence flag. Bool is better if both true and
> false are printed.

Using print_null json JSON output becomes:

  {
    "ifname": "gre2",
    "mode": "gre/ipv6",
    "remote": "fd::1",
    "local": "::",
    "ip6_tnl_f_ign_encap_limit": null,
    "hoplimit": 64,
    "tclass": "0x00",
    "flowlabel": "0x00000",
    "flowinfo": "0x00000000",
    "flags": []
  }

Which seems a bit confusing to me (what does the "null" value means?).
Using print_null we also introduce an inconsistency with the JSON
output of ip/link_gre6.c and ip/link_ip6tnl.c.
I would prefer to use print_bool and print out
ip6_tnl_f_ign_encap_limit both when true and false, patching both
files above to do the same. WDYT?

^ permalink raw reply

* Re: [PATCH net 0/2] flow_offload hardware priority fixes
From: Pablo Neira Ayuso @ 2019-08-02 11:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netfilter-devel, davem, netdev, marcelo.leitner, jiri, wenxu,
	saeedm, paulb, gerlitz.or
In-Reply-To: <20190801172014.314a9d01@cakuba.netronome.com>

Hi Jakub,

If the user specifies 'pref' in the new rule, then tc checks if there
is a tcf_proto object that matches this priority. If the tcf_proto
object does not exist, tc creates a tcf_proto object and it adds the
new rule to this tcf_proto.

In cls_flower, each tcf_proto only stores one single rule, so if the
user tries to add another rule with the same 'pref', cls_flower
returns EEXIST.

I'll prepare a new patchset not to map the priority to the netfilter
basechain priority, instead the rule priority will be internally
allocated for each new rule.

Thanks for your feedback.

^ permalink raw reply

* [PATCH net v4] net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
From: Nikolay Aleksandrov @ 2019-08-02 10:57 UTC (permalink / raw)
  To: netdev; +Cc: roopa, davem, bridge, Nikolay Aleksandrov, michael-dev
In-Reply-To: <0a015a21-c1ae-e275-e675-431f08bece86@cumulusnetworks.com>

Most of the bridge device's vlan init bugs come from the fact that its
default pvid is created at the wrong time, way too early in ndo_init()
before the device is even assigned an ifindex. It introduces a bug when the
bridge's dev_addr is added as fdb during the initial default pvid creation
the notification has ifindex/NDA_MASTER both equal to 0 (see example below)
which really makes no sense for user-space[0] and is wrong.
Usually user-space software would ignore such entries, but they are
actually valid and will eventually have all necessary attributes.
It makes much more sense to send a notification *after* the device has
registered and has a proper ifindex allocated rather than before when
there's a chance that the registration might still fail or to receive
it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush
from br_vlan_flush() since that case can no longer happen. At
NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by
br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything
depending why it was called (if called due to NETDEV_REGISTER error
it'll still be == 1, otherwise it could be any value changed during the
device life time).

For the demonstration below a small change to iproute2 for printing all fdb
notifications is added, because it contained a workaround not to show
entries with ifindex == 0.
Command executed while monitoring: $ ip l add br0 type bridge
Before (both ifindex and master == 0):
$ bridge monitor fdb
36:7e:8a:b3:56:ba dev * vlan 1 master * permanent

After (proper br0 ifindex):
$ bridge monitor fdb
e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent

v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
v3: send the correct v2 patch with all changes (stub should return 0)
v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in
    the br_vlan_bridge_event stub when bridge vlans are disabled

[0] https://bugzilla.kernel.org/show_bug.cgi?id=204389

Reported-by: michael-dev <michael-dev@fami-braun.de>
Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
To test the patch I've added manual error conditions all over the bridge
init process to exercise all of the error handling paths.

This change is equivalent to v3 w.r.t. init ordering fixes, the core
of the problem was the default pvid init/deinit sequences being done
at wrong times, so we delay them until NETDEV_REGISTER/UNREGISTER.

 net/bridge/br.c         |  5 ++++-
 net/bridge/br_private.h |  9 +++++----
 net/bridge/br_vlan.c    | 34 ++++++++++++++++------------------
 3 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/net/bridge/br.c b/net/bridge/br.c
index d164f63a4345..8a8f9e5f264f 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -37,12 +37,15 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
 	int err;
 
 	if (dev->priv_flags & IFF_EBRIDGE) {
+		err = br_vlan_bridge_event(dev, event, ptr);
+		if (err)
+			return notifier_from_errno(err);
+
 		if (event == NETDEV_REGISTER) {
 			/* register of bridge completed, add sysfs entries */
 			br_sysfs_addbr(dev);
 			return NOTIFY_DONE;
 		}
-		br_vlan_bridge_event(dev, event, ptr);
 	}
 
 	/* not a port of a bridge */
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index e8cf03b43b7d..646504db0220 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -894,8 +894,8 @@ int nbp_get_num_vlan_infos(struct net_bridge_port *p, u32 filter_mask);
 void br_vlan_get_stats(const struct net_bridge_vlan *v,
 		       struct br_vlan_stats *stats);
 void br_vlan_port_event(struct net_bridge_port *p, unsigned long event);
-void br_vlan_bridge_event(struct net_device *dev, unsigned long event,
-			  void *ptr);
+int br_vlan_bridge_event(struct net_device *dev, unsigned long event,
+			 void *ptr);
 
 static inline struct net_bridge_vlan_group *br_vlan_group(
 					const struct net_bridge *br)
@@ -1085,9 +1085,10 @@ static inline void br_vlan_port_event(struct net_bridge_port *p,
 {
 }
 
-static inline void br_vlan_bridge_event(struct net_device *dev,
-					unsigned long event, void *ptr)
+static inline int br_vlan_bridge_event(struct net_device *dev,
+				       unsigned long event, void *ptr)
 {
+	return 0;
 }
 #endif
 
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index a544e161c7fa..f5b2aeebbfe9 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -715,11 +715,6 @@ void br_vlan_flush(struct net_bridge *br)
 
 	ASSERT_RTNL();
 
-	/* delete auto-added default pvid local fdb before flushing vlans
-	 * otherwise it will be leaked on bridge device init failure
-	 */
-	br_fdb_delete_by_port(br, NULL, 0, 1);
-
 	vg = br_vlan_group(br);
 	__vlan_flush(vg);
 	RCU_INIT_POINTER(br->vlgrp, NULL);
@@ -1058,7 +1053,6 @@ int br_vlan_init(struct net_bridge *br)
 {
 	struct net_bridge_vlan_group *vg;
 	int ret = -ENOMEM;
-	bool changed;
 
 	vg = kzalloc(sizeof(*vg), GFP_KERNEL);
 	if (!vg)
@@ -1073,17 +1067,10 @@ int br_vlan_init(struct net_bridge *br)
 	br->vlan_proto = htons(ETH_P_8021Q);
 	br->default_pvid = 1;
 	rcu_assign_pointer(br->vlgrp, vg);
-	ret = br_vlan_add(br, 1,
-			  BRIDGE_VLAN_INFO_PVID | BRIDGE_VLAN_INFO_UNTAGGED |
-			  BRIDGE_VLAN_INFO_BRENTRY, &changed, NULL);
-	if (ret)
-		goto err_vlan_add;
 
 out:
 	return ret;
 
-err_vlan_add:
-	vlan_tunnel_deinit(vg);
 err_tunnel_init:
 	rhashtable_destroy(&vg->vlan_hash);
 err_rhtbl:
@@ -1469,13 +1456,23 @@ static void nbp_vlan_set_vlan_dev_state(struct net_bridge_port *p, u16 vid)
 }
 
 /* Must be protected by RTNL. */
-void br_vlan_bridge_event(struct net_device *dev, unsigned long event,
-			  void *ptr)
+int br_vlan_bridge_event(struct net_device *dev, unsigned long event, void *ptr)
 {
 	struct netdev_notifier_changeupper_info *info;
-	struct net_bridge *br;
+	struct net_bridge *br = netdev_priv(dev);
+	bool changed;
+	int ret = 0;
 
 	switch (event) {
+	case NETDEV_REGISTER:
+		ret = br_vlan_add(br, br->default_pvid,
+				  BRIDGE_VLAN_INFO_PVID |
+				  BRIDGE_VLAN_INFO_UNTAGGED |
+				  BRIDGE_VLAN_INFO_BRENTRY, &changed, NULL);
+		break;
+	case NETDEV_UNREGISTER:
+		br_vlan_delete(br, br->default_pvid);
+		break;
 	case NETDEV_CHANGEUPPER:
 		info = ptr;
 		br_vlan_upper_change(dev, info->upper_dev, info->linking);
@@ -1483,12 +1480,13 @@ void br_vlan_bridge_event(struct net_device *dev, unsigned long event,
 
 	case NETDEV_CHANGE:
 	case NETDEV_UP:
-		br = netdev_priv(dev);
 		if (!br_opt_get(br, BROPT_VLAN_BRIDGE_BINDING))
-			return;
+			break;
 		br_vlan_link_state_change(dev, br);
 		break;
 	}
+
+	return ret;
 }
 
 /* Must be protected by RTNL. */
-- 
2.21.0


^ permalink raw reply related

* [PATCH 2/2] ixgbe: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02 10:55 UTC (permalink / raw)
  Cc: Jeff Kirsher, David S . Miller, intel-wired-lan, netdev,
	linux-kernel, Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Also convert refcount from 0-based to 1-based.

This patch depends on PATCH 1/2.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c | 6 +++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
index 00710f43cfd2..d313d00065cd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
@@ -773,7 +773,7 @@ int ixgbe_setup_fcoe_ddp_resources(struct ixgbe_adapter *adapter)
 
 	fcoe->extra_ddp_buffer = buffer;
 	fcoe->extra_ddp_buffer_dma = dma;
-	atomic_set(&fcoe->refcnt, 0);
+	refcount_set(&fcoe->refcnt, 1);
 
 	/* allocate pci pool for each cpu */
 	for_each_possible_cpu(cpu) {
@@ -837,7 +837,7 @@ int ixgbe_fcoe_enable(struct net_device *netdev)
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_fcoe *fcoe = &adapter->fcoe;
 
-	atomic_inc(&fcoe->refcnt);
+	refcount_inc(&fcoe->refcnt);
 
 	if (!(adapter->flags & IXGBE_FLAG_FCOE_CAPABLE))
 		return -EINVAL;
@@ -883,7 +883,7 @@ int ixgbe_fcoe_disable(struct net_device *netdev)
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
-	if (!atomic_dec_and_test(&adapter->fcoe.refcnt))
+	if (!refcount_dec_and_test(&adapter->fcoe.refcnt))
 		return -EINVAL;
 
 	if (!(adapter->flags & IXGBE_FLAG_FCOE_ENABLED))
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.h
index 724f5382329f..7ace5fee6ede 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.h
@@ -51,7 +51,7 @@ struct ixgbe_fcoe_ddp_pool {
 
 struct ixgbe_fcoe {
 	struct ixgbe_fcoe_ddp_pool __percpu *ddp_pool;
-	atomic_t refcnt;
+	refcount_t refcnt;
 	spinlock_t lock;
 	struct ixgbe_fcoe_ddp ddp[IXGBE_FCOE_DDP_MAX_X550];
 	void *extra_ddp_buffer;
-- 
2.20.1


^ permalink raw reply related

* [PATCH 1/2] ixgbe: Explicitly initialize reference count to 0
From: Chuhong Yuan @ 2019-08-02 10:54 UTC (permalink / raw)
  Cc: Jeff Kirsher, David S . Miller, intel-wired-lan, netdev,
	linux-kernel, Chuhong Yuan

The driver does not explicitly call atomic_set to initialize
refcount to 0.
Add the call so that it will be more straight forward to
convert refcount from atomic_t to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
index ccd852ad62a4..00710f43cfd2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
@@ -773,6 +773,7 @@ int ixgbe_setup_fcoe_ddp_resources(struct ixgbe_adapter *adapter)
 
 	fcoe->extra_ddp_buffer = buffer;
 	fcoe->extra_ddp_buffer_dma = dma;
+	atomic_set(&fcoe->refcnt, 0);
 
 	/* allocate pci pool for each cpu */
 	for_each_possible_cpu(cpu) {
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next v5 5/6] flow_offload: support get flow_block immediately
From: wenxu @ 2019-08-02 10:45 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: jiri, pablo, fw, netfilter-devel, netdev, John Hurley
In-Reply-To: <20190801161129.25fee619@cakuba.netronome.com>


On 8/2/2019 7:11 AM, Jakub Kicinski wrote:
> On Thu,  1 Aug 2019 11:03:46 +0800, wenxu@ucloud.cn wrote:
>> From: wenxu <wenxu@ucloud.cn>
>>
>> The new flow-indr-block can't get the tcf_block
>> directly. It provide a callback list to find the flow_block immediately
>> when the device register and contain a ingress block.
>>
>> Signed-off-by: wenxu <wenxu@ucloud.cn>
> First of all thanks for splitting the series up into more patches, 
> it is easier to follow the logic now!
>
>> @@ -328,6 +348,7 @@ struct flow_indr_block_dev {
>>  
>>  	INIT_LIST_HEAD(&indr_dev->cb_list);
>>  	indr_dev->dev = dev;
>> +	flow_get_default_block(indr_dev);
>>  	if (rhashtable_insert_fast(&indr_setup_block_ht, &indr_dev->ht_node,
>>  				   flow_indr_setup_block_ht_params)) {
>>  		kfree(indr_dev);
> I wonder if it's still practical to keep the block information in the
> indr_dev structure at all. The way this used to work was:
>
>
> [hash table of devices]     --------------
>                  |         |    netdev    |
>                  |         |    refcnt    |
>   indir_dev[tun0]|  ------ | cached block | ---- [ TC block ]
>                  |         |   callbacks  | .
>                  |          --------------   \__ [cb, cb_priv, cb_ident]
>                  |                               [cb, cb_priv, cb_ident]
>                  |          --------------
>                  |         |    netdev    |
>                  |         |    refcnt    |
>   indir_dev[tun1]|  ------ | cached block | ---- [ TC block ]
>                  |         |   callbacks  |.
> -----------------           --------------   \__ [cb, cb_priv, cb_ident]
>                                                  [cb, cb_priv, cb_ident]
>
>
> In the example above we have two tunnels tun0 and tun1, each one has a
> indr_dev structure allocated, and for each one of them two drivers
> registered for callbacks (hence the callbacks list has two entries).
>
> We used to cache the TC block in the indr_dev structure, but now that
> there are multiple subsytems using the indr_dev we either have to have
> a list of cached blocks (with entries for each subsystem) or just always
> iterate over the subsystems :(
>
> After all the same device may have both a TC block and a NFT block.
>
> I think always iterating would be easier:
>
> The indr_dev struct would no longer have the block pointer, instead
> when new driver registers for the callback instead of:
>
> 	if (indr_dev->ing_cmd_cb)
> 		indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block,
> 				     indr_block_cb->cb, indr_block_cb->cb_priv,
> 				     FLOW_BLOCK_BIND);
>
> We'd have something like the loop in flow_get_default_block():
>
> 	for each (subsystem)
> 		subsystem->handle_new_indir_cb(indr_dev, cb);
>
> And then per-subsystem logic would actually call the cb. Or:
>
> 	for each (subsystem)
> 		block = get_default_block(indir_dev)
> 		indr_dev->ing_cmd_cb(...)

            nft dev chian is also based on register_netdevice_notifier, So for unregister case,

the basechian(block) of nft maybe delete before the __tc_indr_block_cb_unregister. is right?

So maybe we can cache the block as a list of all the subsystem in  indr_dev ?

> I hope this makes sense.
>
>
> Also please double check nft unload logic has an RCU synchronization
> point, I'm not 100% confident rcu_barrier() implies synchronize_rcu().
> Perhaps someone more knowledgeable can chime in :)
>

^ permalink raw reply

* [PATCH] dpaa_eth: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02 10:29 UTC (permalink / raw)
  Cc: Madalin Bucur, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 6 +++---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index f38c3fa7d705..2df6e745cb3f 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -485,7 +485,7 @@ static struct dpaa_bp *dpaa_bpid2pool(int bpid)
 static bool dpaa_bpid2pool_use(int bpid)
 {
 	if (dpaa_bpid2pool(bpid)) {
-		atomic_inc(&dpaa_bp_array[bpid]->refs);
+		refcount_inc(&dpaa_bp_array[bpid]->refs);
 		return true;
 	}
 
@@ -496,7 +496,7 @@ static bool dpaa_bpid2pool_use(int bpid)
 static void dpaa_bpid2pool_map(int bpid, struct dpaa_bp *dpaa_bp)
 {
 	dpaa_bp_array[bpid] = dpaa_bp;
-	atomic_set(&dpaa_bp->refs, 1);
+	refcount_set(&dpaa_bp->refs, 1);
 }
 
 static int dpaa_bp_alloc_pool(struct dpaa_bp *dpaa_bp)
@@ -584,7 +584,7 @@ static void dpaa_bp_free(struct dpaa_bp *dpaa_bp)
 	if (!bp)
 		return;
 
-	if (!atomic_dec_and_test(&bp->refs))
+	if (!refcount_dec_and_test(&bp->refs))
 		return;
 
 	if (bp->free_buf_cb)
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index af320f83c742..acc3fcdf730a 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -99,7 +99,7 @@ struct dpaa_bp {
 	int (*seed_cb)(struct dpaa_bp *);
 	/* bpool can be emptied before freeing by this cb */
 	void (*free_buf_cb)(const struct dpaa_bp *, struct bm_buffer *);
-	atomic_t refs;
+	refcount_t refs;
 };
 
 struct dpaa_rx_errors {
-- 
2.20.1


^ permalink raw reply related

* [PATCH][net-next] net/mlx5: remove self-assignment on esw->dev
From: Colin King @ 2019-08-02 10:22 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, David S . Miller, netdev,
	linux-rdma
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

There is a self assignment of esw->dev to itself, clean this up by
removing it.

Addresses-Coverity: ("Self assignment")
Fixes: 6cedde451399 ("net/mlx5: E-Switch, Verify support QoS element type")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index f4ace5f8e884..de0894b695e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1413,7 +1413,7 @@ static int esw_vport_egress_config(struct mlx5_eswitch *esw,
 
 static bool element_type_supported(struct mlx5_eswitch *esw, int type)
 {
-	struct mlx5_core_dev *dev = esw->dev = esw->dev;
+	struct mlx5_core_dev *dev = esw->dev;
 
 	switch (type) {
 	case SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR:
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH] b43legacy: Remove pointless cond_resched() wrapper
From: Kalle Valo @ 2019-08-02 10:08 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: netdev, b43-dev, Larry Finger, linux-wireless
In-Reply-To: <alpine.DEB.2.21.1907262157500.1791@nanos.tec.linutronix.de>

+ linux-wireless

Thomas Gleixner <tglx@linutronix.de> writes:

> cond_resched() can be used unconditionally. If CONFIG_PREEMPT is set, it
> becomes a NOP scheduler wise.
>
> Also the B43_BUG_ON() in that wrapper is a homebrewn variant of
> __might_sleep() which is part of cond_resched() already.
>
> Remove the wrapper and invoke cond_resched() directly.
>
> Found while looking for CONFIG_PREEMPT dependent code treewide.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: netdev@vger.kernel.org
> Cc: b43-dev@lists.infradead.org
> Cc: Kalle Valo <kvalo@codeaurora.org>
> Cc: Larry Finger <Larry.Finger@lwfinger.net>

I use patchwork and this doesn't show there as our patchwork follows
only linux-wireless linux. Can you resend and Cc also
linux-wireless@vger.kernel.org, please?

-- 
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH net] ibmveth: use net_err_ratelimited when set_multicast_list
From: Hangbin Liu @ 2019-08-02  9:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, tlfalcon
In-Reply-To: <20190801.125114.1466241781699884892.davem@davemloft.net>

On Thu, Aug 01, 2019 at 12:51:14PM -0400, David Miller wrote:
> From: Hangbin Liu <liuhangbin@gmail.com>
> Date: Thu,  1 Aug 2019 17:03:47 +0800
> 
> > When setting lots of multicast list on ibmveth, e.g. add 3000 membership on a
> > multicast group, the following error message flushes our log file
> > 
> > 8507    [  901.478251] ibmveth 30000003 env3: h_multicast_ctrl rc=4 when adding an entry to the filter table
> > ...
> > 1718386 [ 5636.808658] ibmveth 30000003 env3: h_multicast_ctrl rc=4 when adding an entry to the filter table
> > 
> > We got 1.5 million lines of messages in 1.3h. Let's replace netdev_err() by
> > net_err_ratelimited() to avoid this issue. I don't use netdev_err_once() in
> > case h_multicast_ctrl() return different lpar_rc types.
> > 
> > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> 
> Let's work on fixing what causes this problem, or adding a retry
> mechanism, rather than making the message less painful.

Yes, the multicast issue should also be fixed. It looks more like a
driver issue as I haven't seen it on other drivers.

I think these should be two different fixes.

Thanks
Hangbin
> 
> What is worse is that these failures are not in any way communicated
> back up the callchain to show that the operation didn't complete
> sucessfully.
> 
> This is a real mess in behavior and error handling, don't make it
> worse please.

^ permalink raw reply

* Re: [PATCH V2 7/9] vhost: do not use RCU to synchronize MMU notifier with worker
From: Jason Wang @ 2019-08-02  9:40 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm
In-Reply-To: <20190801141512.GB23899@ziepe.ca>


On 2019/8/1 下午10:15, Jason Gunthorpe wrote:
> On Thu, Aug 01, 2019 at 01:02:18PM +0800, Jason Wang wrote:
>> On 2019/8/1 上午3:30, Jason Gunthorpe wrote:
>>> On Wed, Jul 31, 2019 at 09:28:20PM +0800, Jason Wang wrote:
>>>> On 2019/7/31 下午8:39, Jason Gunthorpe wrote:
>>>>> On Wed, Jul 31, 2019 at 04:46:53AM -0400, Jason Wang wrote:
>>>>>> We used to use RCU to synchronize MMU notifier with worker. This leads
>>>>>> calling synchronize_rcu() in invalidate_range_start(). But on a busy
>>>>>> system, there would be many factors that may slow down the
>>>>>> synchronize_rcu() which makes it unsuitable to be called in MMU
>>>>>> notifier.
>>>>>>
>>>>>> A solution is SRCU but its overhead is obvious with the expensive full
>>>>>> memory barrier. Another choice is to use seqlock, but it doesn't
>>>>>> provide a synchronization method between readers and writers. The last
>>>>>> choice is to use vq mutex, but it need to deal with the worst case
>>>>>> that MMU notifier must be blocked and wait for the finish of swap in.
>>>>>>
>>>>>> So this patch switches use a counter to track whether or not the map
>>>>>> was used. The counter was increased when vq try to start or finish
>>>>>> uses the map. This means, when it was even, we're sure there's no
>>>>>> readers and MMU notifier is synchronized. When it was odd, it means
>>>>>> there's a reader we need to wait it to be even again then we are
>>>>>> synchronized.
>>>>> You just described a seqlock.
>>>> Kind of, see my explanation below.
>>>>
>>>>
>>>>> We've been talking about providing this as some core service from mmu
>>>>> notifiers because nearly every use of this API needs it.
>>>> That would be very helpful.
>>>>
>>>>
>>>>> IMHO this gets the whole thing backwards, the common pattern is to
>>>>> protect the 'shadow pte' data with a seqlock (usually open coded),
>>>>> such that the mmu notififer side has the write side of that lock and
>>>>> the read side is consumed by the thread accessing or updating the SPTE.
>>>> Yes, I've considered something like that. But the problem is, mmu notifier
>>>> (writer) need to wait for the vhost worker to finish the read before it can
>>>> do things like setting dirty pages and unmapping page.  It looks to me
>>>> seqlock doesn't provide things like this.
>>> The seqlock is usually used to prevent a 2nd thread from accessing the
>>> VA while it is being changed by the mm. ie you use something seqlocky
>>> instead of the ugly mmu_notifier_unregister/register cycle.
>>
>> Yes, so we have two mappings:
>>
>> [1] vring address to VA
>> [2] VA to PA
>>
>> And have several readers and writers
>>
>> 1) set_vring_num_addr(): writer of both [1] and [2]
>> 2) MMU notifier: reader of [1] writer of [2]
>> 3) GUP: reader of [1] writer of [2]
>> 4) memory accessors: reader of [1] and [2]
>>
>> Fortunately, 1) 3) and 4) have already synchronized through vq->mutex. We
>> only need to deal with synchronization between 2) and each of the reset:
>> Sync between 1) and 2): For mapping [1], I do
>> mmu_notifier_unregister/register. This help to avoid holding any lock to do
>> overlap check.
> I suspect you could have done this with a RCU technique instead of
> register/unregister.


Probably. But the issue to be addressed by this patch is the 
synchronization between MMU notifier and vhost worker.


>
>> Sync between 2) and 4): For mapping [1], both are readers, no need any
>> synchronization. For mapping [2], synchronize through RCU (or something
>> simliar to seqlock).
> You can't really use a seqlock, seqlocks are collision-retry locks,
> and the semantic here is that invalidate_range_start *MUST* not
> continue until thread doing #4 above is guarenteed no longer touching
> the memory.


Yes, that's the tricky part. For hardware like CPU, kicking through IPI 
is sufficient for synchronization. But for vhost kthread, it requires a 
low overhead synchronization.


>
> This must be a proper barrier, like a spinlock, mutex, or
> synchronize_rcu.


I start with synchronize_rcu() but both you and Michael raise some 
concern. Then I try spinlock and mutex:

1) spinlock: add lots of overhead on datapath, this leads 0 performance 
improvement.

2) SRCU: full memory barrier requires on srcu_read_lock(), which still 
leads little performance improvement

3) mutex: a possible issue is need to wait for the page to be swapped in 
(is this unacceptable ?), another issue is that we need hold vq lock 
during range overlap check.

4) using vhost_flush_work() instead of synchronize_rcu(): still need to 
wait for swap. But can do overlap checking without the lock


>
> And, again, you can't re-invent a spinlock with open coding and get
> something better.


So the question is if waiting for swap is considered to be unsuitable 
for MMU notifiers. If not, it would simplify codes. If not, we still 
need to figure out a possible solution.

Btw, I come up another idea, that is to disable preemption when vhost 
thread need to access the memory. Then register preempt notifier and if 
vhost thread is preempted, we're sure no one will access the memory and 
can do the cleanup.

Thanks


>
> Jason

^ permalink raw reply

* Re: [PATCH 06/34] drm/i915: convert put_page() to put_user_page*()
From: Joonas Lahtinen @ 2019-08-02  9:19 UTC (permalink / raw)
  To: Andrew Morton, john.hubbard
  Cc: Christoph Hellwig, Dan Williams, Dave Chinner, Dave Hansen,
	Ira Weiny, Jan Kara, Jason Gunthorpe, Jérôme Glisse,
	LKML, amd-gfx, ceph-devel, devel, devel, dri-devel, intel-gfx,
	kvm, linux-arm-kernel, linux-block, linux-crypto, linux-fbdev,
	linux-fsdevel, linux-media, linux-mm, linux-nfs, linux-rdma,
	linux-rpi-kernel, linux-xfs, netdev, rds-devel, sparclinux, x86,
	xen-devel, John Hubbard, Jani Nikula, Rodrigo Vivi, David Airlie
In-Reply-To: <20190802022005.5117-7-jhubbard@nvidia.com>

Quoting john.hubbard@gmail.com (2019-08-02 05:19:37)
> From: John Hubbard <jhubbard@nvidia.com>
> 
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
> 
> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> Note that this effectively changes the code's behavior in
> i915_gem_userptr_put_pages(): it now calls set_page_dirty_lock(),
> instead of set_page_dirty(). This is probably more accurate.

We've already fixed this in drm-tip where the current code uses
set_page_dirty_lock().

This would conflict with our tree. Rodrigo is handling
drm-intel-next for 5.4, so you guys want to coordinate how
to merge.

Regards, Joonas

^ permalink raw reply

* Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites
From: Michal Hocko @ 2019-08-02  9:12 UTC (permalink / raw)
  To: john.hubbard
  Cc: Andrew Morton, Christoph Hellwig, Dan Williams, Dave Chinner,
	Dave Hansen, Ira Weiny, Jan Kara, Jason Gunthorpe,
	Jérôme Glisse, LKML, amd-gfx, ceph-devel, devel, devel,
	dri-devel, intel-gfx, kvm, linux-arm-kernel, linux-block,
	linux-crypto, linux-fbdev, linux-fsdevel, linux-media, linux-mm,
	linux-nfs, linux-rdma, linux-rpi-kernel, linux-xfs, netdev,
	rds-devel, sparclinux, x86, xen-devel, John Hubbard
In-Reply-To: <20190802022005.5117-1-jhubbard@nvidia.com>

On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
[...]
> 2) Convert all of the call sites for get_user_pages*(), to
> invoke put_user_page*(), instead of put_page(). This involves dozens of
> call sites, and will take some time.

How do we make sure this is the case and it will remain the case in the
future? There must be some automagic to enforce/check that. It is simply
not manageable to do it every now and then because then 3) will simply
be never safe.

Have you considered coccinele or some other scripted way to do the
transition? I have no idea how to deal with future changes that would
break the balance though.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply

* Re: [PATCH net-next] net: can: Fix compiling warning
From: Sergei Shtylyov @ 2019-08-02  8:59 UTC (permalink / raw)
  To: Mao Wenan, socketcan, davem, netdev; +Cc: linux-kernel, kernel-janitors
In-Reply-To: <20190802033643.84243-1-maowenan@huawei.com>

Hello!

On 02.08.2019 6:36, Mao Wenan wrote:

> There are two warings in net/can, fix them by setting bcm_sock_no_ioctlcmd

    Warnings. :-)

> and raw_sock_no_ioctlcmd as static.
> 
> net/can/bcm.c:1683:5: warning: symbol 'bcm_sock_no_ioctlcmd' was not declared. Should it be static?
> net/can/raw.c:840:5: warning: symbol 'raw_sock_no_ioctlcmd' was not declared. Should it be static?
> 
> Fixes: 473d924d7d46 ("can: fix ioctl function removal")
> 
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
[...]

MBR, Sergei

^ permalink raw reply

* Re: [v2,1/2] tools: bpftool: add net attach command to attach XDP on interface
From: Daniel T. Lee @ 2019-08-02  8:49 UTC (permalink / raw)
  To: Y Song; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <CAH3MdRXkr5oD=yTr8qevFMLBuLyv1v-E7BLme8n2FA8+uPe6sg@mail.gmail.com>

On Fri, Aug 2, 2019 at 3:23 PM Y Song <ys114321@gmail.com> wrote:
>
> On Thu, Aug 1, 2019 at 2:04 AM Daniel T. Lee <danieltimlee@gmail.com> wrote:
> >
> > By this commit, using `bpftool net attach`, user can attach XDP prog on
> > interface. New type of enum 'net_attach_type' has been made, as stated at
> > cover-letter, the meaning of 'attach' is, prog will be attached on interface.
> >
> > BPF prog will be attached through libbpf 'bpf_set_link_xdp_fd'.
> >
> > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > ---
> > Changes in v2:
> >   - command 'load' changed to 'attach' for the consistency
> >   - 'NET_ATTACH_TYPE_XDP_DRIVE' changed to 'NET_ATTACH_TYPE_XDP_DRIVER'
> >
> >  tools/bpf/bpftool/net.c | 107 +++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 106 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> > index 67e99c56bc88..f3b57660b303 100644
> > --- a/tools/bpf/bpftool/net.c
> > +++ b/tools/bpf/bpftool/net.c
> > @@ -55,6 +55,35 @@ struct bpf_attach_info {
> >         __u32 flow_dissector_id;
> >  };
> >
> > +enum net_attach_type {
> > +       NET_ATTACH_TYPE_XDP,
> > +       NET_ATTACH_TYPE_XDP_GENERIC,
> > +       NET_ATTACH_TYPE_XDP_DRIVER,
> > +       NET_ATTACH_TYPE_XDP_OFFLOAD,
> > +       __MAX_NET_ATTACH_TYPE
> > +};
> > +
> > +static const char * const attach_type_strings[] = {
> > +       [NET_ATTACH_TYPE_XDP] = "xdp",
> > +       [NET_ATTACH_TYPE_XDP_GENERIC] = "xdpgeneric",
> > +       [NET_ATTACH_TYPE_XDP_DRIVER] = "xdpdrv",
> > +       [NET_ATTACH_TYPE_XDP_OFFLOAD] = "xdpoffload",
> > +       [__MAX_NET_ATTACH_TYPE] = NULL,
> > +};
> > +
> > +static enum net_attach_type parse_attach_type(const char *str)
> > +{
> > +       enum net_attach_type type;
> > +
> > +       for (type = 0; type < __MAX_NET_ATTACH_TYPE; type++) {
> > +               if (attach_type_strings[type] &&
> > +                  is_prefix(str, attach_type_strings[type]))
> > +                       return type;
> > +       }
> > +
> > +       return __MAX_NET_ATTACH_TYPE;
> > +}
> > +
> >  static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
> >  {
> >         struct bpf_netdev_t *netinfo = cookie;
> > @@ -223,6 +252,77 @@ static int query_flow_dissector(struct bpf_attach_info *attach_info)
> >         return 0;
> >  }
> >
> > +static int parse_attach_args(int argc, char **argv, int *progfd,
> > +                            enum net_attach_type *attach_type, int *ifindex)
> > +{
> > +       if (!REQ_ARGS(3))
> > +               return -EINVAL;
> > +
> > +       *progfd = prog_parse_fd(&argc, &argv);
> > +       if (*progfd < 0)
> > +               return *progfd;
> > +
> > +       *attach_type = parse_attach_type(*argv);
> > +       if (*attach_type == __MAX_NET_ATTACH_TYPE) {
> > +               p_err("invalid net attach/detach type");
> > +               return -EINVAL;
> > +       }
> > +
> > +       NEXT_ARG();
> > +       if (!REQ_ARGS(1))
> > +               return -EINVAL;
> > +
> > +       *ifindex = if_nametoindex(*argv);
> > +       if (!*ifindex) {
> > +               p_err("Invalid ifname");
>
> Do you want to use the full function name "invalid if_nametoindex" here?
>

No. I was trying to fix the message as "Invalid devname", since the
word "devanme"
is mentioned in 'bpftool net help'.

> > +               return -EINVAL;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> > +static int do_attach_detach_xdp(int *progfd, enum net_attach_type *attach_type,
> > +                               int *ifindex)
>
> You can just use plain int as the argument type here.
>

I will change the parameter as pass by value.

> > +{
> > +       __u32 flags;
> > +       int err;
> > +
> > +       flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> > +       if (*attach_type == NET_ATTACH_TYPE_XDP_GENERIC)
> > +               flags |= XDP_FLAGS_SKB_MODE;
> > +       if (*attach_type == NET_ATTACH_TYPE_XDP_DRIVER)
> > +               flags |= XDP_FLAGS_DRV_MODE;
> > +       if (*attach_type == NET_ATTACH_TYPE_XDP_OFFLOAD)
> > +               flags |= XDP_FLAGS_HW_MODE;
> > +
> > +       err = bpf_set_link_xdp_fd(*ifindex, *progfd, flags);
> > +
> > +       return err;
>
> Just do "return bpf_set_link_xdp_fd(...)" and you do not need variable err.
>

Oh. I've misunderstood why variable err won't be needed.
I'll remove the variable err and update to it.

> > +}
> > +
> > +static int do_attach(int argc, char **argv)
> > +{
> > +       enum net_attach_type attach_type;
> > +       int err, progfd, ifindex;
> > +
> > +       err = parse_attach_args(argc, argv, &progfd, &attach_type, &ifindex);
> > +       if (err)
> > +               return err;
> > +
> > +       if (is_prefix("xdp", attach_type_strings[attach_type]))
> > +               err = do_attach_detach_xdp(&progfd, &attach_type, &ifindex);
> > +
> > +       if (err < 0) {
> > +               p_err("link set %s failed", attach_type_strings[attach_type]);
> > +               return -1;
> > +       }
>
> The above "if (err < 0)" can be true only under the above "if (is_prefix(..))"
> conditions. But compiler may optimize this. So I think current form is okay.
> But could you change the return value to "return err" instead of "return -1"?
>

Okay. I'll update the return value to "return err".

> > +
> > +       if (json_output)
> > +               jsonw_null(json_wtr);
> > +
> > +       return 0;
> > +}
> > +
> >  static int do_show(int argc, char **argv)
> >  {
> >         struct bpf_attach_info attach_info = {};
> > @@ -305,13 +405,17 @@ static int do_help(int argc, char **argv)
> >
> >         fprintf(stderr,
> >                 "Usage: %s %s { show | list } [dev <devname>]\n"
> > +               "       %s %s attach PROG LOAD_TYPE <devname>\n"
> >                 "       %s %s help\n"
> > +               "\n"
> > +               "       " HELP_SPEC_PROGRAM "\n"
> > +               "       LOAD_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
> >                 "Note: Only xdp and tc attachments are supported now.\n"
> >                 "      For progs attached to cgroups, use \"bpftool cgroup\"\n"
> >                 "      to dump program attachments. For program types\n"
> >                 "      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
> >                 "      consult iproute2.\n",
> > -               bin_name, argv[-2], bin_name, argv[-2]);
> > +               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
> >
> >         return 0;
> >  }
> > @@ -319,6 +423,7 @@ static int do_help(int argc, char **argv)
> >  static const struct cmd cmds[] = {
> >         { "show",       do_show },
> >         { "list",       do_show },
> > +       { "attach",     do_attach },
> >         { "help",       do_help },
> >         { 0 }
> >  };
> > --
> > 2.20.1
> >

Thank you for the review :)

^ permalink raw reply

* [patch 1/1] drivers/net/ethernet/marvell/mvmdio.c: Fix non OF case
From: Arnaud Patard @ 2019-08-02  8:32 UTC (permalink / raw)
  To: netdev; +Cc: Andrew Lunn, Arnaud Patard

Orion5.x systems are still using machine files and not device-tree.
Commit 96cb4342382290c9 ("net: mvmdio: allow up to three clocks to be
specified for orion-mdio") has replaced devm_clk_get() with of_clk_get(),
leading to a oops at boot and not working network, as reported in 
https://lists.debian.org/debian-arm/2019/07/msg00088.html and possibly in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908712.

Link: https://lists.debian.org/debian-arm/2019/07/msg00088.html
Fixes: 96cb4342382290c9 ("net: mvmdio: allow up to three clocks to be specified for orion-mdio")
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Index: linux-next/drivers/net/ethernet/marvell/mvmdio.c
===================================================================
--- linux-next.orig/drivers/net/ethernet/marvell/mvmdio.c
+++ linux-next/drivers/net/ethernet/marvell/mvmdio.c
@@ -319,20 +319,33 @@ static int orion_mdio_probe(struct platf
 
 	init_waitqueue_head(&dev->smi_busy_wait);
 
-	for (i = 0; i < ARRAY_SIZE(dev->clk); i++) {
-		dev->clk[i] = of_clk_get(pdev->dev.of_node, i);
-		if (PTR_ERR(dev->clk[i]) == -EPROBE_DEFER) {
+	if (pdev->dev.of_node) {
+		for (i = 0; i < ARRAY_SIZE(dev->clk); i++) {
+			dev->clk[i] = of_clk_get(pdev->dev.of_node, i);
+			if (PTR_ERR(dev->clk[i]) == -EPROBE_DEFER) {
+				ret = -EPROBE_DEFER;
+				goto out_clk;
+			}
+			if (IS_ERR(dev->clk[i]))
+				break;
+			clk_prepare_enable(dev->clk[i]);
+		}
+
+		if (!IS_ERR(of_clk_get(pdev->dev.of_node,
+				       ARRAY_SIZE(dev->clk))))
+			dev_warn(&pdev->dev,
+				 "unsupported number of clocks, limiting to the first "
+				 __stringify(ARRAY_SIZE(dev->clk)) "\n");
+	} else {
+		dev->clk[0] = clk_get(&pdev->dev, NULL);
+		if (PTR_ERR(dev->clk[0]) == -EPROBE_DEFER) {
 			ret = -EPROBE_DEFER;
 			goto out_clk;
 		}
-		if (IS_ERR(dev->clk[i]))
-			break;
-		clk_prepare_enable(dev->clk[i]);
+		if (!IS_ERR(dev->clk[0]))
+			clk_prepare_enable(dev->clk[0]);
 	}
 
-	if (!IS_ERR(of_clk_get(pdev->dev.of_node, ARRAY_SIZE(dev->clk))))
-		dev_warn(&pdev->dev, "unsupported number of clocks, limiting to the first "
-			 __stringify(ARRAY_SIZE(dev->clk)) "\n");
 
 	dev->err_interrupt = platform_get_irq(pdev, 0);
 	if (dev->err_interrupt > 0 &&



^ permalink raw reply

* [PATCH net] net/smc: avoid fallback in case of non-blocking connect
From: Karsten Graul @ 2019-08-02  8:47 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, gor, heiko.carstens, raspl, ubraun

From: Ursula Braun <ubraun@linux.ibm.com>

FASTOPEN is not possible with SMC. sendmsg() with msg_flag MSG_FASTOPEN
triggers a fallback to TCP if the socket is in state SMC_INIT.
But if a nonblocking connect is already started, fallback to TCP
is no longer possible, even though the socket may still be in state
SMC_INIT.
And if a nonblocking connect is already started, a listen() call
does not make sense.

Reported-by: syzbot+bd8cc73d665590a1fcad@syzkaller.appspotmail.com
Fixes: 50717a37db032 ("net/smc: nonblocking connect rework")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
---
 net/smc/af_smc.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index f5ea09258ab0..5b932583e407 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -263,7 +263,7 @@ static int smc_bind(struct socket *sock, struct sockaddr *uaddr,
 
 	/* Check if socket is already active */
 	rc = -EINVAL;
-	if (sk->sk_state != SMC_INIT)
+	if (sk->sk_state != SMC_INIT || smc->connect_nonblock)
 		goto out_rel;
 
 	smc->clcsock->sk->sk_reuse = sk->sk_reuse;
@@ -1390,7 +1390,8 @@ static int smc_listen(struct socket *sock, int backlog)
 	lock_sock(sk);
 
 	rc = -EINVAL;
-	if ((sk->sk_state != SMC_INIT) && (sk->sk_state != SMC_LISTEN))
+	if ((sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) ||
+	    smc->connect_nonblock)
 		goto out;
 
 	rc = 0;
@@ -1518,7 +1519,7 @@ static int smc_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
 		goto out;
 
 	if (msg->msg_flags & MSG_FASTOPEN) {
-		if (sk->sk_state == SMC_INIT) {
+		if (sk->sk_state == SMC_INIT && !smc->connect_nonblock) {
 			smc_switch_to_fallback(smc);
 			smc->fallback_rsn = SMC_CLC_DECL_OPTUNSUPP;
 		} else {
-- 
2.21.0


^ permalink raw reply related

* [PATCH v2 2/2] cxgb4: smt: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02  8:41 UTC (permalink / raw)
  Cc: Vishal Kulkarni, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
Changes in v2:
  - Convert refcount from 0-base to 1-base.

 drivers/net/ethernet/chelsio/cxgb4/smt.c | 14 +++++++-------
 drivers/net/ethernet/chelsio/cxgb4/smt.h |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/smt.c b/drivers/net/ethernet/chelsio/cxgb4/smt.c
index eaf1fb74689c..343887fa52aa 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/smt.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/smt.c
@@ -57,7 +57,7 @@ struct smt_data *t4_init_smt(void)
 		s->smtab[i].state = SMT_STATE_UNUSED;
 		memset(&s->smtab[i].src_mac, 0, ETH_ALEN);
 		spin_lock_init(&s->smtab[i].lock);
-		atomic_set(&s->smtab[i].refcnt, 0);
+		refcount_set(&s->smtab[i].refcnt, 1);
 	}
 	return s;
 }
@@ -68,7 +68,7 @@ static struct smt_entry *find_or_alloc_smte(struct smt_data *s, u8 *smac)
 	struct smt_entry *e, *end;
 
 	for (e = &s->smtab[0], end = &s->smtab[s->smt_size]; e != end; ++e) {
-		if (atomic_read(&e->refcnt) == 0) {
+		if (refcount_read(&e->refcnt) == 1) {
 			if (!first_free)
 				first_free = e;
 		} else {
@@ -98,7 +98,7 @@ static struct smt_entry *find_or_alloc_smte(struct smt_data *s, u8 *smac)
 static void t4_smte_free(struct smt_entry *e)
 {
 	spin_lock_bh(&e->lock);
-	if (atomic_read(&e->refcnt) == 0) {  /* hasn't been recycled */
+	if (refcount_read(&e->refcnt) == 1) {  /* hasn't been recycled */
 		e->state = SMT_STATE_UNUSED;
 	}
 	spin_unlock_bh(&e->lock);
@@ -111,7 +111,7 @@ static void t4_smte_free(struct smt_entry *e)
  */
 void cxgb4_smt_release(struct smt_entry *e)
 {
-	if (atomic_dec_and_test(&e->refcnt))
+	if (refcount_dec_and_test(&e->refcnt))
 		t4_smte_free(e);
 }
 EXPORT_SYMBOL(cxgb4_smt_release);
@@ -215,14 +215,14 @@ static struct smt_entry *t4_smt_alloc_switching(struct adapter *adap, u16 pfvf,
 	e = find_or_alloc_smte(s, smac);
 	if (e) {
 		spin_lock(&e->lock);
-		if (!atomic_read(&e->refcnt)) {
-			atomic_set(&e->refcnt, 1);
+		if (refcount_read(&e->refcnt) == 1) {
+			refcount_set(&e->refcnt, 2);
 			e->state = SMT_STATE_SWITCHING;
 			e->pfvf = pfvf;
 			memcpy(e->src_mac, smac, ETH_ALEN);
 			write_smt_entry(adap, e);
 		} else {
-			atomic_inc(&e->refcnt);
+			refcount_inc(&e->refcnt);
 		}
 		spin_unlock(&e->lock);
 	}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/smt.h b/drivers/net/ethernet/chelsio/cxgb4/smt.h
index d6c2cc271398..4774606a0101 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/smt.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/smt.h
@@ -59,7 +59,7 @@ struct smt_entry {
 	u16 idx;
 	u16 pfvf;
 	u8 src_mac[ETH_ALEN];
-	atomic_t refcnt;
+	refcount_t refcnt;
 	spinlock_t lock;	/* protect smt entry add,removal */
 };
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH v2 1/2] cxgb4: sched: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02  8:41 UTC (permalink / raw)
  Cc: Vishal Kulkarni, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
Changes in v2:
  - Convert refcount from 0-base to 1-base.

 drivers/net/ethernet/chelsio/cxgb4/sched.c | 8 ++++----
 drivers/net/ethernet/chelsio/cxgb4/sched.h | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sched.c b/drivers/net/ethernet/chelsio/cxgb4/sched.c
index 60218dc676a8..0d902d125be4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sched.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sched.c
@@ -173,7 +173,7 @@ static int t4_sched_queue_unbind(struct port_info *pi, struct ch_sched_queue *p)
 
 		list_del(&qe->list);
 		kvfree(qe);
-		if (atomic_dec_and_test(&e->refcnt)) {
+		if (refcount_dec_and_test(&e->refcnt)) {
 			e->state = SCHED_STATE_UNUSED;
 			memset(&e->info, 0, sizeof(e->info));
 		}
@@ -216,7 +216,7 @@ static int t4_sched_queue_bind(struct port_info *pi, struct ch_sched_queue *p)
 		goto out_err;
 
 	list_add_tail(&qe->list, &e->queue_list);
-	atomic_inc(&e->refcnt);
+	refcount_inc(&e->refcnt);
 	return err;
 
 out_err:
@@ -434,7 +434,7 @@ static struct sched_class *t4_sched_class_alloc(struct port_info *pi,
 		if (err)
 			return NULL;
 		memcpy(&e->info, &np, sizeof(e->info));
-		atomic_set(&e->refcnt, 0);
+		refcount_set(&e->refcnt, 1);
 		e->state = SCHED_STATE_ACTIVE;
 	}
 
@@ -488,7 +488,7 @@ struct sched_table *t4_init_sched(unsigned int sched_size)
 		s->tab[i].idx = i;
 		s->tab[i].state = SCHED_STATE_UNUSED;
 		INIT_LIST_HEAD(&s->tab[i].queue_list);
-		atomic_set(&s->tab[i].refcnt, 0);
+		refcount_set(&s->tab[i].refcnt, 1);
 	}
 	return s;
 }
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sched.h b/drivers/net/ethernet/chelsio/cxgb4/sched.h
index 168fb4ce3759..23a6ca1e6d3e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sched.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/sched.h
@@ -69,7 +69,7 @@ struct sched_class {
 	u8 idx;
 	struct ch_sched_params info;
 	struct list_head queue_list;
-	atomic_t refcnt;
+	refcount_t refcnt;
 };
 
 struct sched_table {      /* per port scheduling table */
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next] net: can: Fix compiling warning
From: maowenan @ 2019-08-02  8:39 UTC (permalink / raw)
  To: Oliver Hartkopp, davem, netdev; +Cc: linux-kernel, kernel-janitors
In-Reply-To: <0050efdb-af9f-49b9-8d83-f574b3d46a2e@hartkopp.net>



On 2019/8/2 16:10, Oliver Hartkopp wrote:
> On 02/08/2019 05.36, Mao Wenan wrote:
>> There are two warings in net/can, fix them by setting bcm_sock_no_ioctlcmd
>> and raw_sock_no_ioctlcmd as static.
>>
>> net/can/bcm.c:1683:5: warning: symbol 'bcm_sock_no_ioctlcmd' was not declared. Should it be static?
>> net/can/raw.c:840:5: warning: symbol 'raw_sock_no_ioctlcmd' was not declared. Should it be static?
>>
>> Fixes: 473d924d7d46 ("can: fix ioctl function removal")
>>
>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> 
> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
> 
> Thanks Mao!
> 
> Btw. what kind of compiler/make switches are you using so that I can see these warnings myself the next time?
> 

I use ARCH=mips CROSS_COMPILE=mips-linux-gnu- .

> Best regards,
> Oliver
> 
>> ---
>>   net/can/bcm.c | 2 +-
>>   net/can/raw.c | 2 +-
>>   2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/can/bcm.c b/net/can/bcm.c
>> index bf1d0bbecec8..b8a32b4ac368 100644
>> --- a/net/can/bcm.c
>> +++ b/net/can/bcm.c
>> @@ -1680,7 +1680,7 @@ static int bcm_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
>>       return size;
>>   }
>>   -int bcm_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>> +static int bcm_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>>                unsigned long arg)
>>   {
>>       /* no ioctls for socket layer -> hand it down to NIC layer */
>> diff --git a/net/can/raw.c b/net/can/raw.c
>> index da386f1fa815..a01848ff9b12 100644
>> --- a/net/can/raw.c
>> +++ b/net/can/raw.c
>> @@ -837,7 +837,7 @@ static int raw_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
>>       return size;
>>   }
>>   -int raw_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>> +static int raw_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd,
>>                unsigned long arg)
>>   {
>>       /* no ioctls for socket layer -> hand it down to NIC layer */
>>
> 
> .
> 


^ permalink raw reply

* [PATCH v2 2/2] cxgb4: smt: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02  8:35 UTC (permalink / raw)
  Cc: Vishal Kulkarni, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
Changes in v2:
  - Convert refcount from 0-base to 1-base.

 drivers/net/ethernet/chelsio/cxgb4/smt.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/smt.c b/drivers/net/ethernet/chelsio/cxgb4/smt.c
index eaf1fb74689c..343887fa52aa 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/smt.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/smt.c
@@ -57,7 +57,7 @@ struct smt_data *t4_init_smt(void)
 		s->smtab[i].state = SMT_STATE_UNUSED;
 		memset(&s->smtab[i].src_mac, 0, ETH_ALEN);
 		spin_lock_init(&s->smtab[i].lock);
-		atomic_set(&s->smtab[i].refcnt, 0);
+		refcount_set(&s->smtab[i].refcnt, 1);
 	}
 	return s;
 }
@@ -68,7 +68,7 @@ static struct smt_entry *find_or_alloc_smte(struct smt_data *s, u8 *smac)
 	struct smt_entry *e, *end;
 
 	for (e = &s->smtab[0], end = &s->smtab[s->smt_size]; e != end; ++e) {
-		if (atomic_read(&e->refcnt) == 0) {
+		if (refcount_read(&e->refcnt) == 1) {
 			if (!first_free)
 				first_free = e;
 		} else {
@@ -98,7 +98,7 @@ static struct smt_entry *find_or_alloc_smte(struct smt_data *s, u8 *smac)
 static void t4_smte_free(struct smt_entry *e)
 {
 	spin_lock_bh(&e->lock);
-	if (atomic_read(&e->refcnt) == 0) {  /* hasn't been recycled */
+	if (refcount_read(&e->refcnt) == 1) {  /* hasn't been recycled */
 		e->state = SMT_STATE_UNUSED;
 	}
 	spin_unlock_bh(&e->lock);
@@ -111,7 +111,7 @@ static void t4_smte_free(struct smt_entry *e)
  */
 void cxgb4_smt_release(struct smt_entry *e)
 {
-	if (atomic_dec_and_test(&e->refcnt))
+	if (refcount_dec_and_test(&e->refcnt))
 		t4_smte_free(e);
 }
 EXPORT_SYMBOL(cxgb4_smt_release);
@@ -215,14 +215,14 @@ static struct smt_entry *t4_smt_alloc_switching(struct adapter *adap, u16 pfvf,
 	e = find_or_alloc_smte(s, smac);
 	if (e) {
 		spin_lock(&e->lock);
-		if (!atomic_read(&e->refcnt)) {
-			atomic_set(&e->refcnt, 1);
+		if (refcount_read(&e->refcnt) == 1) {
+			refcount_set(&e->refcnt, 2);
 			e->state = SMT_STATE_SWITCHING;
 			e->pfvf = pfvf;
 			memcpy(e->src_mac, smac, ETH_ALEN);
 			write_smt_entry(adap, e);
 		} else {
-			atomic_inc(&e->refcnt);
+			refcount_inc(&e->refcnt);
 		}
 		spin_unlock(&e->lock);
 	}
-- 
2.20.1


^ permalink raw reply related

* [PATCH v2 1/2] cxgb4: sched: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-08-02  8:35 UTC (permalink / raw)
  Cc: Vishal Kulkarni, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
Changes in v2:
  - Convert refcount from 0-base to 1-base.

 drivers/net/ethernet/chelsio/cxgb4/sched.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sched.c b/drivers/net/ethernet/chelsio/cxgb4/sched.c
index 60218dc676a8..0d902d125be4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sched.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sched.c
@@ -173,7 +173,7 @@ static int t4_sched_queue_unbind(struct port_info *pi, struct ch_sched_queue *p)
 
 		list_del(&qe->list);
 		kvfree(qe);
-		if (atomic_dec_and_test(&e->refcnt)) {
+		if (refcount_dec_and_test(&e->refcnt)) {
 			e->state = SCHED_STATE_UNUSED;
 			memset(&e->info, 0, sizeof(e->info));
 		}
@@ -216,7 +216,7 @@ static int t4_sched_queue_bind(struct port_info *pi, struct ch_sched_queue *p)
 		goto out_err;
 
 	list_add_tail(&qe->list, &e->queue_list);
-	atomic_inc(&e->refcnt);
+	refcount_inc(&e->refcnt);
 	return err;
 
 out_err:
@@ -434,7 +434,7 @@ static struct sched_class *t4_sched_class_alloc(struct port_info *pi,
 		if (err)
 			return NULL;
 		memcpy(&e->info, &np, sizeof(e->info));
-		atomic_set(&e->refcnt, 0);
+		refcount_set(&e->refcnt, 1);
 		e->state = SCHED_STATE_ACTIVE;
 	}
 
@@ -488,7 +488,7 @@ struct sched_table *t4_init_sched(unsigned int sched_size)
 		s->tab[i].idx = i;
 		s->tab[i].state = SCHED_STATE_UNUSED;
 		INIT_LIST_HEAD(&s->tab[i].queue_list);
-		atomic_set(&s->tab[i].refcnt, 0);
+		refcount_set(&s->tab[i].refcnt, 1);
 	}
 	return s;
 }
-- 
2.20.1


^ permalink raw reply related

* Re: [v2,2/2] tools: bpftool: add net detach command to detach XDP on interface
From: Daniel T. Lee @ 2019-08-02  8:33 UTC (permalink / raw)
  To: Y Song; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <CAH3MdRX_OCnN82GESHBD+-wZvZqo7fba0ExDyqTh_3_tfRR1Nw@mail.gmail.com>

On Fri, Aug 2, 2019 at 3:26 PM Y Song <ys114321@gmail.com> wrote:
>
> On Thu, Aug 1, 2019 at 2:04 AM Daniel T. Lee <danieltimlee@gmail.com> wrote:
> >
> > By this commit, using `bpftool net detach`, the attached XDP prog can
> > be detached. Detaching the BPF prog will be done through libbpf
> > 'bpf_set_link_xdp_fd' with the progfd set to -1.
> >
> > Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> > ---
> > Changes in v2:
> >   - command 'unload' changed to 'detach' for the consistency
> >
> >  tools/bpf/bpftool/net.c | 55 ++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 54 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
> > index f3b57660b303..2ae9a613b05c 100644
> > --- a/tools/bpf/bpftool/net.c
> > +++ b/tools/bpf/bpftool/net.c
> > @@ -281,6 +281,31 @@ static int parse_attach_args(int argc, char **argv, int *progfd,
> >         return 0;
> >  }
> >
> > +static int parse_detach_args(int argc, char **argv,
> > +                            enum net_attach_type *attach_type, int *ifindex)
> > +{
> > +       if (!REQ_ARGS(2))
> > +               return -EINVAL;
> > +
> > +       *attach_type = parse_attach_type(*argv);
> > +       if (*attach_type == __MAX_NET_ATTACH_TYPE) {
> > +               p_err("invalid net attach/detach type");
> > +               return -EINVAL;
> > +       }
> > +
> > +       NEXT_ARG();
> > +       if (!REQ_ARGS(1))
> > +               return -EINVAL;
> > +
> > +       *ifindex = if_nametoindex(*argv);
> > +       if (!*ifindex) {
> > +               p_err("Invalid ifname");
> > +               return -EINVAL;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  static int do_attach_detach_xdp(int *progfd, enum net_attach_type *attach_type,
> >                                 int *ifindex)
> >  {
> > @@ -323,6 +348,31 @@ static int do_attach(int argc, char **argv)
> >         return 0;
> >  }
> >
> > +static int do_detach(int argc, char **argv)
> > +{
> > +       enum net_attach_type attach_type;
> > +       int err, progfd, ifindex;
> > +
> > +       err = parse_detach_args(argc, argv, &attach_type, &ifindex);
> > +       if (err)
> > +               return err;
> > +
> > +       /* to detach xdp prog */
> > +       progfd = -1;
> > +       if (is_prefix("xdp", attach_type_strings[attach_type]))
> > +               err = do_attach_detach_xdp(&progfd, &attach_type, &ifindex);
>
> Similar to previous patch, parameters no need to be pointer.
>

I will change the parameter as pass by value.

> > +
> > +       if (err < 0) {
> > +               p_err("link set %s failed", attach_type_strings[attach_type]);
> > +               return -1;
>
> Maybe "return err"?
>

Hadn't thought of that.
I will change to it!

> > +       }
> > +
> > +       if (json_output)
> > +               jsonw_null(json_wtr);
> > +
> > +       return 0;
> > +}
> > +
> >  static int do_show(int argc, char **argv)
> >  {
> >         struct bpf_attach_info attach_info = {};
> > @@ -406,6 +456,7 @@ static int do_help(int argc, char **argv)
> >         fprintf(stderr,
> >                 "Usage: %s %s { show | list } [dev <devname>]\n"
> >                 "       %s %s attach PROG LOAD_TYPE <devname>\n"
> > +               "       %s %s detach LOAD_TYPE <devname>\n"
> >                 "       %s %s help\n"
> >                 "\n"
> >                 "       " HELP_SPEC_PROGRAM "\n"
> > @@ -415,7 +466,8 @@ static int do_help(int argc, char **argv)
> >                 "      to dump program attachments. For program types\n"
> >                 "      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
> >                 "      consult iproute2.\n",
> > -               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
> > +               bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
> > +               bin_name, argv[-2]);
> >
> >         return 0;
> >  }
> > @@ -424,6 +476,7 @@ static const struct cmd cmds[] = {
> >         { "show",       do_show },
> >         { "list",       do_show },
> >         { "attach",     do_attach },
> > +       { "detach",     do_detach },
> >         { "help",       do_help },
> >         { 0 }
> >  };
> > --
> > 2.20.1
> >

Thanks for the review!

^ permalink raw reply

* [PATCH net] net/smc: do not schedule tx_work in SMC_CLOSED state
From: Karsten Graul @ 2019-08-02  8:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, gor, heiko.carstens, raspl, ubraun

From: Ursula Braun <ubraun@linux.ibm.com>

The setsockopts options TCP_NODELAY and TCP_CORK may schedule the
tx worker. Make sure the socket is not yet moved into SMC_CLOSED
state (for instance by a shutdown SHUT_RDWR call).

Reported-by: syzbot+92209502e7aab127c75f@syzkaller.appspotmail.com
Reported-by: syzbot+b972214bb803a343f4fe@syzkaller.appspotmail.com
Fixes: 01d2f7e2cdd31 ("net/smc: sockopts TCP_NODELAY and TCP_CORK")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
---
 net/smc/af_smc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 302e355f2ebc..f5ea09258ab0 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1732,14 +1732,18 @@ static int smc_setsockopt(struct socket *sock, int level, int optname,
 		}
 		break;
 	case TCP_NODELAY:
-		if (sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) {
+		if (sk->sk_state != SMC_INIT &&
+		    sk->sk_state != SMC_LISTEN &&
+		    sk->sk_state != SMC_CLOSED) {
 			if (val && !smc->use_fallback)
 				mod_delayed_work(system_wq, &smc->conn.tx_work,
 						 0);
 		}
 		break;
 	case TCP_CORK:
-		if (sk->sk_state != SMC_INIT && sk->sk_state != SMC_LISTEN) {
+		if (sk->sk_state != SMC_INIT &&
+		    sk->sk_state != SMC_LISTEN &&
+		    sk->sk_state != SMC_CLOSED) {
 			if (!val && !smc->use_fallback)
 				mod_delayed_work(system_wq, &smc->conn.tx_work,
 						 0);
-- 
2.21.0


^ permalink raw reply related

* RE: [net-next 2/9] i40e: make visible changed vf mac on host
From: Loktionov, Aleksandr @ 2019-08-02  8:14 UTC (permalink / raw)
  To: Shannon Nelson, Kirsher, Jeffrey T, davem@davemloft.net
  Cc: netdev@vger.kernel.org, nhorman@redhat.com, sassmann@redhat.com,
	Bowers, AndrewX
In-Reply-To: <9a3a4675-b031-7666-f259-978d18b6db19@pensando.io>

Good day Nelson

In 99% cases VF has _only one_ unicast mac anyway, and the last MAC has been chosen because of VF mac address change algo -  it marks unicast filter for deletion and appends a new unicast filter to the list.
The implementation has been chosen because of simplicity /* Just 3 more lines to solve the issue */, from one point it may look wasteful for some 1% of VF corner cases.
But from another point of view, more complicated code will affect 99% normal cases. Modern cpu are sensitive to cache thrash by code size and pipeline stalls by conditionals.

Alex

-----Original Message-----
From: Shannon Nelson [mailto:snelson@pensando.io] 
Sent: Friday, August 2, 2019 2:11 AM
To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; davem@davemloft.net
Cc: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; netdev@vger.kernel.org; nhorman@redhat.com; sassmann@redhat.com; Bowers, AndrewX <andrewx.bowers@intel.com>
Subject: Re: [net-next 2/9] i40e: make visible changed vf mac on host

On 8/1/19 1:51 PM, Jeff Kirsher wrote:
> From: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>
> This patch makes changed VM mac address visible on host via ip link 
> show command. This problem is fixed by copying last unicast mac filter 
> to vf->default_lan_addr.addr. Without this patch if VF MAC was not set 
> from host side and if you run ip link show $pf, on host side you'd 
> always see a zero MAC, not the real VF MAC that VF assigned to itself.
>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>   drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
> b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> index 02b09a8ad54c..21f7ac514d1f 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
> @@ -2629,6 +2629,9 @@ static int i40e_vc_add_mac_addr_msg(struct i40e_vf *vf, u8 *msg)
>   			} else {
>   				vf->num_mac++;
>   			}
> +			if (is_valid_ether_addr(al->list[i].addr))
> +				ether_addr_copy(vf->default_lan_addr.addr,
> +						al->list[i].addr);
>   		}
>   	}
>   	spin_unlock_bh(&vsi->mac_filter_hash_lock);

Since this copy is done inside the for-loop, it looks like you are copying every address in the list, not just the last one.  This seems wasteful and unnecessary.

Since it is possible, altho' unlikely, that the filter sync that happens a little later could fail, might it be better to do the copy after you know that the sync has succeeded?

Why is the last mac chosen for display rather than the first?  Is there anything special about the last mac as opposed to the first mac?

sln

--------------------------------------------------------------------

Intel Technology Poland sp. z o.o.
ul. Slowackiego 173 | 80-298 Gdansk | Sad Rejonowy Gdansk Polnoc | VII Wydzial Gospodarczy Krajowego Rejestru Sadowego - KRS 101882 | NIP 957-07-52-316 | Kapital zakladowy 200.000 PLN.

Ta wiadomosc wraz z zalacznikami jest przeznaczona dla okreslonego adresata i moze zawierac informacje poufne. W razie przypadkowego otrzymania tej wiadomosci, prosimy o powiadomienie nadawcy oraz trwale jej usuniecie; jakiekolwiek
przegladanie lub rozpowszechnianie jest zabronione.
This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). If you are not the intended recipient, please contact the sender and delete all copies; any review or distribution by
others is strictly prohibited.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox