Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [EXT] [PATCH net-next 16/16] qlge: Refill empty buffer queues from wq
From: Benjamin Poirier @ 2019-07-10  1:18 UTC (permalink / raw)
  To: Manish Chopra; +Cc: GR-Linux-NIC-Dev, netdev@vger.kernel.org
In-Reply-To: <DM6PR18MB2697EC53399F214EC3DFC4ABABFD0@DM6PR18MB2697.namprd18.prod.outlook.com>

On 2019/06/27 14:18, Manish Chopra wrote:
> > -----Original Message-----
> > From: Benjamin Poirier <bpoirier@suse.com>
> > Sent: Monday, June 17, 2019 1:19 PM
> > To: Manish Chopra <manishc@marvell.com>; GR-Linux-NIC-Dev <GR-Linux-
> > NIC-Dev@marvell.com>; netdev@vger.kernel.org
> > Subject: [EXT] [PATCH net-next 16/16] qlge: Refill empty buffer queues from
> > wq
> > 
> > External Email
> > 
> > ----------------------------------------------------------------------
> > When operating at mtu 9000, qlge does order-1 allocations for rx buffers in
> > atomic context. This is especially unreliable when free memory is low or
> > fragmented. Add an approach similar to commit 3161e453e496 ("virtio: net
> > refill on out-of-memory") to qlge so that the device doesn't lock up if there
> > are allocation failures.
> > 
[...]
> > +
> > +static void ql_update_buffer_queues(struct rx_ring *rx_ring, gfp_t gfp,
> > +				    unsigned long delay)
> > +{
> > +	bool sbq_fail, lbq_fail;
> > +
> > +	sbq_fail = !!qlge_refill_bq(&rx_ring->sbq, gfp);
> > +	lbq_fail = !!qlge_refill_bq(&rx_ring->lbq, gfp);
> > +
> > +	/* Minimum number of buffers needed to be able to receive at least
> > one
> > +	 * frame of any format:
> > +	 * sbq: 1 for header + 1 for data
> > +	 * lbq: mtu 9000 / lb size
> > +	 * Below this, the queue might stall.
> > +	 */
> > +	if ((sbq_fail && QLGE_BQ_HW_OWNED(&rx_ring->sbq) < 2) ||
> > +	    (lbq_fail && QLGE_BQ_HW_OWNED(&rx_ring->lbq) <
> > +	     DIV_ROUND_UP(9000, LARGE_BUFFER_MAX_SIZE)))
> > +		/* Allocations can take a long time in certain cases (ex.
> > +		 * reclaim). Therefore, use a workqueue for long-running
> > +		 * work items.
> > +		 */
> > +		queue_delayed_work_on(smp_processor_id(),
> > system_long_wq,
> > +				      &rx_ring->refill_work, delay);
> >  }
> > 
> 
> This is probably going to mess up when at the interface load time (qlge_open()) allocation failure occurs, in such cases we don't really want to re-try allocations
> using refill_work but rather simply fail the interface load.

Why would you want to turn a recoverable failure into a fatal failure?

In case of allocation failure at ndo_open time, allocations are retried
later from a workqueue. Meanwhile, the device can use the available rx
buffers (if any could be allocated at all).

> Just to make sure here in such cases it shouldn't lead to kernel panic etc. while completing qlge_open() and
> leaving refill_work executing in background. Or probably handle such allocation failures from the napi context and schedule refill_work from there.
> 

I've just tested allocation failures at open time and didn't find
problems; with mtu 9000, using bcc, for example:
tools/inject.py -P 0.5 -c 100 alloc_page "should_fail_alloc_page(gfp_t gfp_mask, unsigned int order) (order == 1) => qlge_refill_bq()"

What exact scenario do you have in mind that's going to lead to
problems? Please try it out and describe it precisely.

^ permalink raw reply

* RE: [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: Haiyang Zhang @ 2019-07-10  1:04 UTC (permalink / raw)
  To: David Miller
  Cc: sashal@kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
	olaf@aepfle.de, vkuznets, linux-kernel@vger.kernel.org
In-Reply-To: <20190709.172936.1666884223446806217.davem@davemloft.net>



> -----Original Message-----
> From: linux-hyperv-owner@vger.kernel.org <linux-hyperv-
> owner@vger.kernel.org> On Behalf Of David Miller
> Sent: Tuesday, July 9, 2019 8:30 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stephen
> Hemminger <sthemmin@microsoft.com>; olaf@aepfle.de; vkuznets
> <vkuznets@redhat.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net-next] Name NICs based on vmbus offer and enable
> async probe by default
> 
> 
> The net-next tree, if you are reading netdev today, has been closed.
I will re-submit when the tree re-opened. 
Thanks,
- Haiyang

^ permalink raw reply

* RE: [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: Haiyang Zhang @ 2019-07-10  1:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: sashal@kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
	olaf@aepfle.de, vkuznets, davem@davemloft.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190709164714.70774c92@hermes.lan>



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, July 9, 2019 7:47 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stephen
> Hemminger <sthemmin@microsoft.com>; olaf@aepfle.de; vkuznets
> <vkuznets@redhat.com>; davem@davemloft.net; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH net-next] Name NICs based on vmbus offer and enable
> async probe by default
> 
> On Tue, 9 Jul 2019 22:56:30 +0000
> Haiyang Zhang <haiyangz@microsoft.com> wrote:
> 
> > -				VRSS_CHANNEL_MAX);
> > +	if (probe_type == PROBE_PREFER_ASYNCHRONOUS) {
> > +		snprintf(name, IFNAMSIZ, "eth%d", dev->channel->dev_num);
> 
> What about PCI passthrough or VF devices that are also being probed and
> consuming the ethN names.  Won't there be a collision?

VF usually shows up a few seconds later than the synthetic NIC. Faster probing
will reduce the probability of collision.
Even if a collision happens, the code below will re-register the NIC with "eth%d":
+	if (ret == -EEXIST) {
+		pr_info("NIC name %s exists, request another name.\n",
+			net->name);
+		strlcpy(net->name, "eth%d", IFNAMSIZ);
+		ret = register_netdevice(net);
+	}

Thanks,
- Haiyang

^ permalink raw reply

* Re: [PATCH iproute2 v2 1/2] tc: added mask parameter in skbedit action
From: Stephen Hemminger @ 2019-07-10  0:32 UTC (permalink / raw)
  To: Roman Mashak; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri
In-Reply-To: <1562601978-3611-1-git-send-email-mrv@mojatatu.com>

On Mon,  8 Jul 2019 12:06:17 -0400
Roman Mashak <mrv@mojatatu.com> wrote:

> Add 32-bit missing mask attribute in iproute2/tc, which has been long
> supported by the kernel side.
> 
> v2: print value in hex with print_hex() as suggested by Stephen Hemminger.
> 
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>

Both patches applied

^ permalink raw reply

* Re: [PATCH iproute2] ip-route: fix json formatting for metrics
From: Stephen Hemminger @ 2019-07-10  0:29 UTC (permalink / raw)
  To: Andrea Claudi; +Cc: netdev, dsahern
In-Reply-To: <f1535e547aa6da8216ca2a0da7c06b645a132929.1562578533.git.aclaudi@redhat.com>

On Mon,  8 Jul 2019 11:36:42 +0200
Andrea Claudi <aclaudi@redhat.com> wrote:

> Setting metrics for routes currently lead to non-parsable
> json output. For example:
> 
> $ ip link add type dummy
> $ ip route add 192.168.2.0 dev dummy0 metric 100 mtu 1000 rto_min 3
> $ ip -j route | jq
> parse error: ':' not as part of an object at line 1, column 319
> 
> Fixing this opening a json object in the metrics array and using
> print_string() instead of fprintf().
> 
> This is the output for the above commands applying this patch:
> 
> $ ip -j route | jq
> [
>   {
>     "dst": "192.168.2.0",
>     "dev": "dummy0",
>     "scope": "link",
>     "metric": 100,
>     "flags": [],
>     "metrics": [
>       {
>         "mtu": 1000,
>         "rto_min": 3
>       }
>     ]
>   }
> ]
> 
> Fixes: 663c3cb23103f ("iproute: implement JSON and color output")
> Fixes: 968272e791710 ("iproute: refactor metrics print")
> Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
> ---

Applied, thanks

^ permalink raw reply

* Re: [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: David Miller @ 2019-07-10  0:29 UTC (permalink / raw)
  To: haiyangz
  Cc: sashal, linux-hyperv, netdev, kys, sthemmin, olaf, vkuznets,
	linux-kernel
In-Reply-To: <1562712932-79936-1-git-send-email-haiyangz@microsoft.com>


The net-next tree, if you are reading netdev today, has been closed.

^ permalink raw reply

* Re: [bpf PATCH v2 0/6] bpf: sockmap/tls fixes
From: Jakub Kicinski @ 2019-07-10  0:04 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf
In-Reply-To: <5d24b55e8b868_3b162ae67af425b43e@john-XPS-13-9370.notmuch>

On Tue, 09 Jul 2019 08:40:14 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > Looks like strparser is not done'd for offload?  
> 
> Right so if rx_conf != TLS_SW then the hardware needs to do
> the strparser functionality.

Can I just take a stab at fixing the HW part?

Can I rebase this onto net-next?  There are a few patches from net
missing in the bpf tree.


^ permalink raw reply

* Re: [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: Stephen Hemminger @ 2019-07-09 23:47 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: sashal@kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
	olaf@aepfle.de, vkuznets, davem@davemloft.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <1562712932-79936-1-git-send-email-haiyangz@microsoft.com>

On Tue, 9 Jul 2019 22:56:30 +0000
Haiyang Zhang <haiyangz@microsoft.com> wrote:

> -				VRSS_CHANNEL_MAX);
> +	if (probe_type == PROBE_PREFER_ASYNCHRONOUS) {
> +		snprintf(name, IFNAMSIZ, "eth%d", dev->channel->dev_num);

What about PCI passthrough or VF devices that are also being probed and
consuming the ethN names.  Won't there be a collision?

^ permalink raw reply

* Re: [PATCH iproute2] utils: don't match empty strings as prefixes
From: Matteo Croce @ 2019-07-09 23:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, David Ahern
In-Reply-To: <20190709143758.695a65bc@hermes.lan>

On Tue, Jul 9, 2019 at 11:38 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue,  9 Jul 2019 22:40:40 +0200
> Matteo Croce <mcroce@redhat.com> wrote:
>
> > iproute has an utility function which checks if a string is a prefix for
> > another one, to allow use of abbreviated commands, e.g. 'addr' or 'a'
> > instead of 'address'.
> >
> > This routine unfortunately considers an empty string as prefix
> > of any pattern, leading to undefined behaviour when an empty
> > argument is passed to ip:
> >
> >     # ip ''
> >     1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
> >         link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
> >         inet 127.0.0.1/8 scope host lo
> >            valid_lft forever preferred_lft forever
> >         inet6 ::1/128 scope host
> >            valid_lft forever preferred_lft forever
> >
> >     # tc ''
> >     qdisc noqueue 0: dev lo root refcnt 2
> >
> >     # ip address add 192.0.2.0/24 '' 198.51.100.1 dev dummy0
> >     # ip addr show dev dummy0
> >     6: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
> >         link/ether 02:9d:5e:e9:3f:c0 brd ff:ff:ff:ff:ff:ff
> >         inet 192.0.2.0/24 brd 198.51.100.1 scope global dummy0
> >            valid_lft forever preferred_lft forever
> >
> > Rewrite matches() so it takes care of an empty input, and doesn't
> > scan the input strings three times: the actual implementation
> > does 2 strlen and a memcpy to accomplish the same task.
> >
> > Signed-off-by: Matteo Croce <mcroce@redhat.com>
> > ---
> >  include/utils.h |  2 +-
> >  lib/utils.c     | 14 +++++++++-----
> >  2 files changed, 10 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/utils.h b/include/utils.h
> > index 927fdc17..f4d12abb 100644
> > --- a/include/utils.h
> > +++ b/include/utils.h
> > @@ -198,7 +198,7 @@ int nodev(const char *dev);
> >  int check_ifname(const char *);
> >  int get_ifname(char *, const char *);
> >  const char *get_ifname_rta(int ifindex, const struct rtattr *rta);
> > -int matches(const char *arg, const char *pattern);
> > +int matches(const char *prefix, const char *string);
> >  int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits);
> >  int inet_addr_match_rta(const inet_prefix *m, const struct rtattr *rta);
> >
> > diff --git a/lib/utils.c b/lib/utils.c
> > index be0f11b0..73ce19bb 100644
> > --- a/lib/utils.c
> > +++ b/lib/utils.c
> > @@ -887,13 +887,17 @@ const char *get_ifname_rta(int ifindex, const struct rtattr *rta)
> >       return name;
> >  }
> >
> > -int matches(const char *cmd, const char *pattern)
> > +/* Check if 'prefix' is a non empty prefix of 'string' */
> > +int matches(const char *prefix, const char *string)
> >  {
> > -     int len = strlen(cmd);
> > +     if (!*prefix)
> > +             return 1;
> > +     while(*string && *prefix == *string) {
> > +             prefix++;
> > +             string++;
> > +     }
> >
> > -     if (len > strlen(pattern))
> > -             return -1;
> > -     return memcmp(pattern, cmd, len);
> > +     return *prefix;
> >  }
> >
> >  int inet_addr_match(const inet_prefix *a, const inet_prefix *b, int bits)
>
> ERROR: space required before the open parenthesis '('
> #134: FILE: lib/utils.c:895:
> +       while(*string && *prefix == *string) {
>
> total: 1 errors, 1 warnings, 30 lines checked
>
> The empty prefix string is a bug and should not be allowed.
> Also return value should be same as old code (yours isn't).
>
>
>

The old return value was the difference between the first pair of
bytes, according to the memcmp manpage.
All calls only checks if the matches() return value is 0 or not 0:

iproute2$ git grep 'matches(' |grep -v -e '== 0' -e '= 0' -e '!matches('
include/utils.h:int matches(const char *prefix, const char *string);
include/xtables.h:extern void xtables_register_matches(struct
xtables_match *, unsigned int);
lib/color.c:    if (matches(dup, "-color"))
lib/utils.c:int matches(const char *prefix, const char *string)
tc/tc.c:                if (matches(argv[0], iter->c))

Is it a problem if it returns a non negative value for non matching strings?

Regards,


--
Matteo Croce
per aspera ad upstream

^ permalink raw reply

* Re: [PATCH] net/mlx5e: Return in default case statement in tx_post_resync_params
From: Nathan Chancellor @ 2019-07-09 23:10 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller, Boris Pismenny,
	netdev, linux-rdma, LKML, clang-built-linux
In-Reply-To: <CAKwvOdkYdNiKorJAKHZ7LTfk9eOpMqe6F4QSmJWQ=-YNuPAyrw@mail.gmail.com>

On Tue, Jul 09, 2019 at 03:44:59PM -0700, Nick Desaulniers wrote:
> On Mon, Jul 8, 2019 at 4:13 PM Nathan Chancellor
> <natechancellor@gmail.com> wrote:
> >
> > clang warns:
> >
> > drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:251:2:
> > warning: variable 'rec_seq_sz' is used uninitialized whenever switch
> > default is taken [-Wsometimes-uninitialized]
> >         default:
> >         ^~~~~~~
> > drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:255:46: note:
> > uninitialized use occurs here
> >         skip_static_post = !memcmp(rec_seq, &rn_be, rec_seq_sz);
> >                                                     ^~~~~~~~~~
> > drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:239:16: note:
> > initialize the variable 'rec_seq_sz' to silence this warning
> >         u16 rec_seq_sz;
> >                       ^
> >                        = 0
> > 1 warning generated.
> >
> > This case statement was clearly designed to be one that should not be
> > hit during runtime because of the WARN_ON statement so just return early
> > to prevent copying uninitialized memory up into rn_be.
> >
> > Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
> > Link: https://github.com/ClangBuiltLinux/linux/issues/590
> > Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> > index 3f5f4317a22b..5c08891806f0 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> > @@ -250,6 +250,7 @@ tx_post_resync_params(struct mlx5e_txqsq *sq,
> >         }
> >         default:
> >                 WARN_ON(1);
> > +               return;
> >         }
> 
> hmm...a switch statement with a single case is a code smell.  How
> about a single conditional with early return?  Then the "meat" of the
> happy path doesn't need an additional level of indentation.
> -- 
> Thanks,
> ~Nick Desaulniers

I assume that the reason for this is there may be other cipher types
added in the future? I suppose the maintainers can give more clarity to
that.

Furthermore, if they want the switch statements to remain, it looks like
fill_static_params_ctx also returns in the default statement so it seems
like this is the right fix.

Cheers,
Nathan

^ permalink raw reply

* [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: Haiyang Zhang @ 2019-07-09 22:56 UTC (permalink / raw)
  To: sashal@kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org
  Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
	vkuznets, davem@davemloft.net, linux-kernel@vger.kernel.org

Previously the async probing caused NIC naming in random order.

The patch adds a dev_num field in vmbus channel structure. It’s assigned
to the first available number when the channel is offered. So netvsc can
use it for NIC naming based on channel offer sequence. Now we re-enable
the async probing mode by default for faster probing.

Also added a modules parameter, probe_type, to set sync probing mode if
a user wants to.

Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv drivers")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/hv/channel_mgmt.c       | 46 +++++++++++++++++++++++++++++++++++++++--
 drivers/net/hyperv/netvsc_drv.c | 33 ++++++++++++++++++++++++++---
 include/linux/hyperv.h          |  4 ++++
 3 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index addcef5..ab7c05b 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -304,6 +304,8 @@ bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp,
 
 EXPORT_SYMBOL_GPL(vmbus_prep_negotiate_resp);
 
+#define HV_DEV_NUM_INVALID (-1)
+
 /*
  * alloc_channel - Allocate and initialize a vmbus channel object
  */
@@ -315,6 +317,8 @@ static struct vmbus_channel *alloc_channel(void)
 	if (!channel)
 		return NULL;
 
+	channel->dev_num = HV_DEV_NUM_INVALID;
+
 	spin_lock_init(&channel->lock);
 	init_completion(&channel->rescind_event);
 
@@ -533,6 +537,42 @@ static void vmbus_add_channel_work(struct work_struct *work)
 }
 
 /*
+ * Get the first available device number of its type, then
+ * record it in the channel structure.
+ */
+static void hv_set_devnum(struct vmbus_channel *newchannel)
+{
+	struct vmbus_channel *channel;
+	unsigned int i = 0;
+	bool found;
+
+	BUG_ON(!mutex_is_locked(&vmbus_connection.channel_mutex));
+
+	/* Only HV_NIC uses this number for now */
+	if (hv_get_dev_type(newchannel) != HV_NIC)
+		return;
+
+next:
+	found = false;
+
+	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+		if (i == channel->dev_num &&
+		    guid_equal(&channel->offermsg.offer.if_type,
+			       &newchannel->offermsg.offer.if_type)) {
+			found = true;
+			break;
+		}
+	}
+
+	if (found) {
+		i++;
+		goto next;
+	}
+
+	newchannel->dev_num = i;
+}
+
+/*
  * vmbus_process_offer - Process the offer by creating a channel/device
  * associated with this offer
  */
@@ -561,10 +601,12 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 		}
 	}
 
-	if (fnew)
+	if (fnew) {
+		hv_set_devnum(newchannel);
+
 		list_add_tail(&newchannel->listentry,
 			      &vmbus_connection.chn_list);
-	else {
+	} else {
 		/*
 		 * Check to see if this is a valid sub-channel.
 		 */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index afdcc56..af53690 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -57,6 +57,10 @@
 module_param(debug, int, 0444);
 MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
 
+static unsigned int probe_type __ro_after_init = PROBE_PREFER_ASYNCHRONOUS;
+module_param(probe_type, uint, 0444);
+MODULE_PARM_DESC(probe_type, "Probe type: 1=async(default), 2=sync");
+
 static LIST_HEAD(netvsc_dev_list);
 
 static void netvsc_change_rx_flags(struct net_device *net, int change)
@@ -2233,10 +2237,19 @@ static int netvsc_probe(struct hv_device *dev,
 	struct net_device_context *net_device_ctx;
 	struct netvsc_device_info *device_info = NULL;
 	struct netvsc_device *nvdev;
+	char name[IFNAMSIZ];
 	int ret = -ENOMEM;
 
-	net = alloc_etherdev_mq(sizeof(struct net_device_context),
-				VRSS_CHANNEL_MAX);
+	if (probe_type == PROBE_PREFER_ASYNCHRONOUS) {
+		snprintf(name, IFNAMSIZ, "eth%d", dev->channel->dev_num);
+		net = alloc_netdev_mqs(sizeof(struct net_device_context), name,
+				       NET_NAME_ENUM, ether_setup,
+				       VRSS_CHANNEL_MAX, VRSS_CHANNEL_MAX);
+	} else {
+		net = alloc_etherdev_mq(sizeof(struct net_device_context),
+					VRSS_CHANNEL_MAX);
+	}
+
 	if (!net)
 		goto no_net;
 
@@ -2323,6 +2336,14 @@ static int netvsc_probe(struct hv_device *dev,
 		net->max_mtu = ETH_DATA_LEN;
 
 	ret = register_netdevice(net);
+
+	if (ret == -EEXIST) {
+		pr_info("NIC name %s exists, request another name.\n",
+			net->name);
+		strlcpy(net->name, "eth%d", IFNAMSIZ);
+		ret = register_netdevice(net);
+	}
+
 	if (ret != 0) {
 		pr_err("Unable to register netdev.\n");
 		goto register_failed;
@@ -2407,7 +2428,7 @@ static int netvsc_remove(struct hv_device *dev)
 	.probe = netvsc_probe,
 	.remove = netvsc_remove,
 	.driver = {
-		.probe_type = PROBE_FORCE_SYNCHRONOUS,
+		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
 	},
 };
 
@@ -2473,6 +2494,12 @@ static int __init netvsc_drv_init(void)
 	}
 	netvsc_ring_bytes = ring_size * PAGE_SIZE;
 
+	if (probe_type != PROBE_PREFER_ASYNCHRONOUS)
+		probe_type = PROBE_FORCE_SYNCHRONOUS;
+
+	netvsc_drv.driver.probe_type = probe_type;
+	pr_info("probe_type: %u\n", probe_type);
+
 	ret = vmbus_driver_register(&netvsc_drv);
 	if (ret)
 		return ret;
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 6256cc3..12fc5ea 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -841,6 +841,10 @@ struct vmbus_channel {
 	 */
 	struct vmbus_channel *primary_channel;
 	/*
+	 * Used for device naming based on channel offer sequence.
+	 */
+	int dev_num;
+	/*
 	 * Support per-channel state for use by vmbus drivers.
 	 */
 	void *per_channel_state;
-- 
1.8.3.1


^ permalink raw reply related

* Re: [RFC PATCH v4 net-next 05/11] dt-bindings: net: ti: add new cpsw switch driver bindings
From: Rob Herring @ 2019-07-09 22:48 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: netdev, Ilias Apalodimas, Andrew Lunn, David S . Miller,
	Ivan Khoronzhuk, Jiri Pirko, Florian Fainelli, Sekhar Nori,
	linux-kernel, linux-omap, Murali Karicheri, Ivan Vecera,
	devicetree
In-Reply-To: <20190621181314.20778-6-grygorii.strashko@ti.com>

On Fri, Jun 21, 2019 at 09:13:08PM +0300, Grygorii Strashko wrote:
> Add bindings for the new TI CPSW switch driver. Comparing to the legacy
> bindings (net/cpsw.txt):
> - ports definition follows DSA bindings (net/dsa/dsa.txt) and ports can be
> marked as "disabled" if not physically wired.
> - ports definition follows DSA bindings (net/dsa/dsa.txt) and ports can be
> marked as "disabled" if not physically wired.
> - all deprecated properties dropped;
> - all legacy propertiies dropped which represents constant HW cpapbilities
> (cpdma_channels, ale_entries, bd_ram_size, mac_control, slaves,
> active_slave)
> - cpts properties grouped in "cpts" sub-node
> 
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
>  .../bindings/net/ti,cpsw-switch.txt           | 147 ++++++++++++++++++
>  1 file changed, 147 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/ti,cpsw-switch.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/ti,cpsw-switch.txt b/Documentation/devicetree/bindings/net/ti,cpsw-switch.txt
> new file mode 100644
> index 000000000000..787219cddccd
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/ti,cpsw-switch.txt
> @@ -0,0 +1,147 @@
> +TI SoC Ethernet Switch Controller Device Tree Bindings (new)
> +------------------------------------------------------
> +
> +The 3-port switch gigabit ethernet subsystem provides ethernet packet
> +communication and can be configured as an ethernet switch. It provides the
> +gigabit media independent interface (GMII),reduced gigabit media
> +independent interface (RGMII), reduced media independent interface (RMII),
> +the management data input output (MDIO) for physical layer device (PHY)
> +management.
> +
> +Required properties:
> +- compatible : be one of the below:
> +	  "ti,cpsw-switch" for backward compatible
> +	  "ti,am335x-cpsw-switch" for AM335x controllers
> +	  "ti,am4372-cpsw-switch" for AM437x controllers
> +	  "ti,dra7-cpsw-switch" for DRA7x controllers
> +- reg : physical base address and size of the CPSW module IO range
> +- ranges : shall contain the CPSW module IO range available for child devices
> +- clocks : should contain the CPSW functional clock
> +- clock-names : should be "fck"
> +	See bindings/clock/clock-bindings.txt
> +- interrupts : should contain CPSW RX, TX, MISC, RX_THRESH interrupts
> +- interrupt-names : should contain "rx_thresh", "rx", "tx", "misc"

What's the defined order because it's not consistent here.

> +	See bindings/interrupt-controller/interrupts.txt
> +
> +Optional properties:
> +- syscon : phandle to the system control device node which provides access to
> +	efuse IO range with MAC addresses
> +
> +Required Sub-nodes:
> +- ports	: contains CPSW external ports descriptions

Use ethernet-ports to avoid 'ports' from the graph binding.

> +	Required properties:
> +	- #address-cells : Must be 1
> +	- #size-cells : Must be 0
> +	- reg : CPSW port number. Should be 1 or 2
> +	- phys : phandle on phy-gmii-sel PHY (see phy/ti-phy-gmii-sel.txt)
> +	- phy-mode : operation mode of the PHY interface [1]
> +	- phy-handle : phandle to a PHY on an MDIO bus [1]
> +
> +	Optional properties:
> +	- ti,label : Describes the label associated with this port

What's wrong with standard 'label' property.

> +	- ti,dual_emac_pvid : Specifies default PORT VID to be used to segregate

s/_/-/

> +		ports. Default value - CPSW port number.
> +	- mac-address : array of 6 bytes, specifies the MAC address. Always
> +		accounted first if present [1]

No need to re-define this here. 

> +	- local-mac-address : See [1]
> +
> +- mdio : CPSW MDIO bus block description
> +	- bus_freq : MDIO Bus frequency
> +	See bindings/net/mdio.txt and davinci-mdio.txt

Standard properties clock-frequency or bus-frequency would have been 
better...

> +
> +- cpts : The Common Platform Time Sync (CPTS) module description
> +	- clocks : should contain the CPTS reference clock
> +	- clock-names : should be "cpts"
> +	See bindings/clock/clock-bindings.txt
> +
> +	Optional properties - all ports:
> +	- cpts_clock_mult : Numerator to convert input clock ticks into ns
> +	- cpts_clock_shift : Denominator to convert input clock ticks into ns
> +			  Mult and shift will be calculated basing on CPTS
> +			  rftclk frequency if both cpts_clock_shift and
> +			  cpts_clock_mult properties are not provided.

Should have 'ti' prefix and use '-' rather than '_'. However, these are 
already defined somewhere else, right? I can't tell that from reading 
this.

> +
> +[1] See Documentation/devicetree/bindings/net/ethernet.txt
> +
> +Examples - SOC:

Please don't split example into SoC and board. Just show the flat 
example.

> +mac_sw: ethernet_switch@0 {

switch@0

> +	compatible = "ti,dra7-cpsw-switch","ti,cpsw-switch";
> +	reg = <0x0 0x4000>;
> +	ranges = <0 0 0x4000>;
> +	clocks = <&gmac_main_clk>;
> +	clock-names = "fck";
> +	#address-cells = <1>;
> +	#size-cells = <1>;
> +	syscon = <&scm_conf>;
> +	status = "disabled";
> +
> +	interrupts = <GIC_SPI 334 IRQ_TYPE_LEVEL_HIGH>,
> +		     <GIC_SPI 335 IRQ_TYPE_LEVEL_HIGH>,
> +		     <GIC_SPI 336 IRQ_TYPE_LEVEL_HIGH>,
> +		     <GIC_SPI 337 IRQ_TYPE_LEVEL_HIGH>;
> +	interrupt-names = "rx_thresh", "rx", "tx", "misc"
> +
> +	ports {
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +
> +		cpsw_port1: port@1 {
> +			reg = <1>;
> +			ti,label = "port1";

Not really any better than the node name. Do you really even need this 
property?

> +			/* Filled in by U-Boot */
> +			mac-address = [ 00 00 00 00 00 00 ];
> +			phys = <&phy_gmii_sel 1>;
> +		};
> +
> +		cpsw_port2: port@2 {
> +			reg = <2>;
> +			ti,label = "port2";
> +			/* Filled in by U-Boot */
> +			mac-address = [ 00 00 00 00 00 00 ];
> +			phys = <&phy_gmii_sel 2>;
> +		};
> +	};
> +
> +	davinci_mdio_sw: mdio@1000 {
> +		compatible = "ti,cpsw-mdio","ti,davinci_mdio";
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +		ti,hwmods = "davinci_mdio";
> +		bus_freq = <1000000>;
> +		reg = <0x1000 0x100>;
> +	};
> +
> +	cpts {
> +		clocks = <&gmac_clkctrl DRA7_GMAC_GMAC_CLKCTRL 25>;
> +		clock-names = "cpts";
> +	};
> +};
> +
> +Examples - platform/board:
> +
> +&mac_sw {
> +	pinctrl-names = "default", "sleep";
> +	status = "okay";
> +};
> +
> +&cpsw_port1 {
> +	phy-handle = <&ethphy0_sw>;
> +	phy-mode = "rgmii";
> +	ti,dual_emac_pvid = <1>;
> +};
> +
> +&cpsw_port2 {
> +	phy-handle = <&ethphy1_sw>;
> +	phy-mode = "rgmii";
> +	ti,dual_emac_pvid = <2>;
> +};
> +
> +&davinci_mdio_sw {
> +	ethphy0_sw: ethernet-phy@0 {
> +		reg = <0>;
> +	};
> +
> +	ethphy1_sw: ethernet-phy@1 {
> +		reg = <1>;
> +	};
> +};
> -- 
> 2.17.1
> 

^ permalink raw reply

* Re: [PATCH] net/mlx5e: Return in default case statement in tx_post_resync_params
From: Nick Desaulniers @ 2019-07-09 22:44 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller, Boris Pismenny,
	netdev, linux-rdma, LKML, clang-built-linux
In-Reply-To: <20190708231154.89969-1-natechancellor@gmail.com>

On Mon, Jul 8, 2019 at 4:13 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> clang warns:
>
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:251:2:
> warning: variable 'rec_seq_sz' is used uninitialized whenever switch
> default is taken [-Wsometimes-uninitialized]
>         default:
>         ^~~~~~~
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:255:46: note:
> uninitialized use occurs here
>         skip_static_post = !memcmp(rec_seq, &rn_be, rec_seq_sz);
>                                                     ^~~~~~~~~~
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:239:16: note:
> initialize the variable 'rec_seq_sz' to silence this warning
>         u16 rec_seq_sz;
>                       ^
>                        = 0
> 1 warning generated.
>
> This case statement was clearly designed to be one that should not be
> hit during runtime because of the WARN_ON statement so just return early
> to prevent copying uninitialized memory up into rn_be.
>
> Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
> Link: https://github.com/ClangBuiltLinux/linux/issues/590
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> index 3f5f4317a22b..5c08891806f0 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> @@ -250,6 +250,7 @@ tx_post_resync_params(struct mlx5e_txqsq *sq,
>         }
>         default:
>                 WARN_ON(1);
> +               return;
>         }

hmm...a switch statement with a single case is a code smell.  How
about a single conditional with early return?  Then the "meat" of the
happy path doesn't need an additional level of indentation.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply

* Re: [PATCH v3 net-next 13/19] ionic: Add initial ethtool support
From: Shannon Nelson @ 2019-07-09 22:42 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20190709022703.GB5835@lunn.ch>

On 7/8/19 7:27 PM, Andrew Lunn wrote:
>> +static int ionic_get_module_eeprom(struct net_device *netdev,
>> +				   struct ethtool_eeprom *ee,
>> +				   u8 *data)
>> +{
>> +	struct lif *lif = netdev_priv(netdev);
>> +	struct ionic_dev *idev = &lif->ionic->idev;
>> +	struct xcvr_status *xcvr;
>> +	u32 len;
>> +
>> +	/* copy the module bytes into data */
>> +	xcvr = &idev->port_info->status.xcvr;
>> +	len = min_t(u32, sizeof(xcvr->sprom), ee->len);
>> +	memcpy(data, xcvr->sprom, len);
> Hi Shannon
>
> This also looks odd. Where is the call into the firmware to get the
> eeprom contents? Even though it is called 'eeprom', the data is not
> static. It contains real time diagnostic values, temperature, transmit
> power, receiver power, voltages etc.

idev->port_info is a memory mapped space that the device keeps up-to-date.

>
>> +
>> +	dev_dbg(&lif->netdev->dev, "notifyblock eid=0x%llx link_status=0x%x link_speed=0x%x link_down_cnt=0x%x\n",
>> +		lif->info->status.eid,
>> +		lif->info->status.link_status,
>> +		lif->info->status.link_speed,
>> +		lif->info->status.link_down_count);
>> +	dev_dbg(&lif->netdev->dev, "  port_status id=0x%x status=0x%x speed=0x%x\n",
>> +		idev->port_info->status.id,
>> +		idev->port_info->status.status,
>> +		idev->port_info->status.speed);
>> +	dev_dbg(&lif->netdev->dev, "    xcvr status state=0x%x phy=0x%x pid=0x%x\n",
>> +		xcvr->state, xcvr->phy, xcvr->pid);
>> +	dev_dbg(&lif->netdev->dev, "  port_config state=0x%x speed=0x%x mtu=0x%x an_enable=0x%x fec_type=0x%x pause_type=0x%x loopback_mode=0x%x\n",
>> +		idev->port_info->config.state,
>> +		idev->port_info->config.speed,
>> +		idev->port_info->config.mtu,
>> +		idev->port_info->config.an_enable,
>> +		idev->port_info->config.fec_type,
>> +		idev->port_info->config.pause_type,
>> +		idev->port_info->config.loopback_mode);
> It is a funny place to have all the debug code.

Someone wanted a hook for getting some link info on the fly on request.

sln

>
>     Andrew


^ permalink raw reply

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
From: Jakub Kicinski @ 2019-07-09 22:34 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, pieter.jansenvanvuuren, andrew, f.fainelli, vivien.didelot,
	idosch, Alexei Starovoitov
In-Reply-To: <20190709123844.GA27309@splinter>

On Tue, 9 Jul 2019 15:38:44 +0300, Ido Schimmel wrote:
> On Mon, Jul 08, 2019 at 03:51:58PM -0700, Jakub Kicinski wrote:
> > On Mon, 8 Jul 2019 16:19:08 +0300, Ido Schimmel wrote:  
> > > On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:  
> > > > From: Ido Schimmel <idosch@idosch.org>
> > > > Date: Sun,  7 Jul 2019 10:58:17 +0300
> > > >     
> > > > > Users have several ways to debug the kernel and understand why a packet
> > > > > was dropped. For example, using "drop monitor" and "perf". Both
> > > > > utilities trace kfree_skb(), which is the function called when a packet
> > > > > is freed as part of a failure. The information provided by these tools
> > > > > is invaluable when trying to understand the cause of a packet loss.
> > > > > 
> > > > > In recent years, large portions of the kernel data path were offloaded
> > > > > to capable devices. Today, it is possible to perform L2 and L3
> > > > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> > > > > Different TC classifiers and actions are also offloaded to capable
> > > > > devices, at both ingress and egress.
> > > > > 
> > > > > However, when the data path is offloaded it is not possible to achieve
> > > > > the same level of introspection as tools such "perf" and "drop monitor"
> > > > > become irrelevant.
> > > > > 
> > > > > This patchset aims to solve this by allowing users to monitor packets
> > > > > that the underlying device decided to drop along with relevant metadata
> > > > > such as the drop reason and ingress port.    
> > > > 
> > > > We are now going to have 5 or so ways to capture packets passing through
> > > > the system, this is nonsense.
> > > > 
> > > > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> > > > devlink thing.
> > > > 
> > > > This is insanity, too many ways to do the same thing and therefore the
> > > > worst possible user experience.
> > > > 
> > > > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> > > > XDP perf events, and these taps there too.
> > > > 
> > > > I mean really, think about it from the average user's perspective.  To
> > > > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> > > > listen on devlink but configure a special tap thing beforehand and then
> > > > if someone is using XDP I gotta setup another perf event buffer capture
> > > > thing too.    
> > > 
> > > Let me try to explain again because I probably wasn't clear enough. The
> > > devlink-trap mechanism is not doing the same thing as other solutions.
> > > 
> > > The packets we are capturing in this patchset are packets that the
> > > kernel (the CPU) never saw up until now - they were silently dropped by
> > > the underlying device performing the packet forwarding instead of the
> > > CPU.  
> 
> Jakub,
> 
> It seems to me that most of the criticism is about consolidation of
> interfaces because you believe I'm doing something you can already do
> today, but this is not the case.

To be clear I'm not opposed to the patches, I'm just trying to
facilitate a discussion.

> Switch ASICs have dedicated traps for specific packets. Usually, these
> packets are control packets (e.g., ARP, BGP) which are required for the
> correct functioning of the control plane. You can see this in the SAI
> interface, which is an abstraction layer over vendors' SDKs:
> 
> https://github.com/opencomputeproject/SAI/blob/master/inc/saihostif.h#L157
> 
> We need to be able to configure the hardware policers of these traps and
> read their statistics to understand how many packets they dropped. We
> currently do not have a way to do any of that and we rely on hardcoded
> defaults in the driver which do not fit every use case (from
> experience):
> 
> https://elixir.bootlin.com/linux/v5.2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum.c#L4103
> 
> We plan to extend devlink-trap mechanism to cover all these use cases. I
> hope you agree that this functionality belongs in devlink given it is a
> device-specific configuration and not a netdev-specific one.

No disagreement on providing knobs for traps.

> That being said, in its current form, this mechanism is focused on traps
> that correlate to packets the device decided to drop as this is very
> useful for debugging.

That'd be mixing two things - trap configuration and tracing exceptions
in one API. That's a little suboptimal but not too terrible, especially
if there is a higher level APIs users can default to.

> Given that the entire configuration is done via devlink and that devlink
> stores all the information about these traps, it seems logical to also
> report these packets and their metadata to user space as devlink events.
> 
> If this is not desirable, we can try to call into drop_monitor from
> devlink and add a new command (e.g., NET_DM_CMD_HW_ALERT), which will
> encode all the information we currently have in DEVLINK_CMD_TRAP_REPORT.
> 
> IMO, this is less desirable, as instead of having one tool (devlink) to
> interact with this mechanism we will need two (devlink & dropwatch).
> 
> Below I tried to answer all your questions and refer to all the points
> you brought up.
> 
> > When you say silently dropped do you mean that mlxsw as of today
> > doesn't have any counters exposed for those events?  
> 
> Some of these packets are counted, but not all of them.
> 
> > If we wanted to consolidate this into something existing we can either
> >  (a) add similar traps in the kernel data path;
> >  (b) make these traps extension of statistics.
> > 
> > My knee jerk reaction to seeing the patches was that it adds a new
> > place where device statistics are reported.  
> 
> Not at all. This would be a step back. We can already count discards due
> to VLAN membership on ingress on a per-port basis. A software maintained
> global counter does not buy us anything.
> 
> By also getting the dropped packet - coupled with the drop reason and
> ingress port - you can understand exactly why and on which VLAN the
> packet was dropped. I wrote a Wireshark dissector for these netlink
> packets to make our life easier. You can see the details in my comment
> to the cover letter:
> 
> https://marc.info/?l=linux-netdev&m=156248736710238&w=2
> 
> In case you do not care about individual packets, but still want more
> fine-grained statistics for your monitoring application, you can use
> eBPF. For example, one thing we did is attaching a kprobe to
> devlink_trap_report() with an eBPF program that dissects the incoming
> skbs and maintains a counter per-{5 tuple, drop reason}. With
> ebpf_exporter you can export these statistics to Prometheus on which you
> can run queries and visualize the results with Grafana. This is
> especially useful for tail and early drops since it allows you to
> understand which flows contribute to most of the drops.

No question that the mechanism is useful.

> > Users who want to know why things are dropped will not get detailed
> > breakdown from ethtool -S which for better or worse is the one stop
> > shop for device stats today.  
> 
> I hope I managed to explain why counters are not enough, but I also want
> to point out that ethtool statistics are not properly documented and
> this hinders their effectiveness. I did my best to document the exposed
> traps in order to avoid the same fate:
> 
> https://patchwork.ozlabs.org/patch/1128585/
> 
> In addition, there are selftests to show how each trap can be triggered
> to reduce the ambiguity even further:
> 
> https://patchwork.ozlabs.org/patch/1128610/
> 
> And a note in the documentation to make sure future functionality is
> tested as well:
> 
> https://patchwork.ozlabs.org/patch/1128608/
> 
> > Having thought about it some more, however, I think that having a
> > forwarding "exception" object and hanging statistics off of it is a
> > better design, even if we need to deal with some duplication to get
> > there.
> > 
> > IOW having an way to "trap all packets which would increment a
> > statistic" (option (b) above) is probably a bad design.
> > 
> > As for (a) I wonder how many of those events have a corresponding event
> > in the kernel stack?  
> 
> Generic packet drops all have a corresponding kfree_skb() calls in the
> kernel, but that does not mean that every packet dropped by the hardware
> would also be dropped by the kernel if it were to be injected to its Rx
> path.

The notion that all SW events get captured by kfree_skb() would not be
correct. We have the kfree_skb(), and xdp_exception(), and drivers can
drop packets if various allocations fail.. the situation is already not
great.

I think that having a single useful place where users can look to see
all traffic exception events would go a long way. Software side as I
mentioned is pretty brutal, IDK how many users are actually willing to
decode stack traces to figure out why their system is dropping
packets :/  

> In my reply to Dave I gave buffer drops as an example.

The example of buffer drops is also probably the case where having the
packet is least useful, but yes, I definitely agree devices need a way
of reporting events that can't happen in SW.

> There are also situations in which packets can be dropped due to
> device-specific exceptions and these do not have a corresponding drop
> reason in the kernel. See example here:
> 
> https://patchwork.ozlabs.org/patch/1128587/
> 
> > If we could add corresponding trace points and just feed those from
> > the device driver, that'd obviously be a holy grail.  
> 
> Unlike tracepoints, netlink gives you a structured and extensible
> interface. For example, in Spectrum-1 we cannot provide the Tx port for
> early/tail drops, whereas for Spectrum-2 and later we can. With netlink,
> we can just omit the DEVLINK_ATTR_TRAP_OUT_PORT attribute for
> Spectrum-1. You also get a programmatic interface that you can query for
> this information:
> 
> # devlink -v trap show netdevsim/netdevsim10 trap ingress_vlan_filter
> netdevsim/netdevsim10:
>   name ingress_vlan_filter type drop generic true report false action drop group l2_drops
>     metadata:
>        input_port

Right, you can set or not set skb fields to some extent but its
definitely not as flexible as netlink.

^ permalink raw reply

* Re: [PATCH v3 net-next 13/19] ionic: Add initial ethtool support
From: Shannon Nelson @ 2019-07-09 22:35 UTC (permalink / raw)
  To: Michal Kubecek, netdev; +Cc: Andrew Lunn
In-Reply-To: <20190709051501.GA16610@unicorn.suse.cz>

On 7/8/19 10:15 PM, Michal Kubecek wrote:
> Also, there is no need to zero initialize ks->link_modes.advertising
> above if it's going to be rewritten here anyway.
>
> Michal

Got it.

Thanks,
sln


^ permalink raw reply

* Re: [PATCH net-next iproute2 v1] devlink: Show devlink port number
From: David Ahern @ 2019-07-09 22:33 UTC (permalink / raw)
  To: Parav Pandit, netdev; +Cc: stephen, jiri, dsahern
In-Reply-To: <20190709172654.24057-1-parav@mellanox.com>

On 7/9/19 11:26 AM, Parav Pandit wrote:
> Show devlink port number whenever kernel reports that attribute.
> 
> An example output for a physical port.
> $ devlink port show
> pci/0000:06:00.1/65535: type eth netdev eth1_p1 flavour physical port 1
> 
> Acked-by: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> ---
> Changelog:
> v0->v1:
>  - Declare and assign port_number as two different lines.
> ---
>  devlink/devlink.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 

applied to iproute2-next. Thanks



^ permalink raw reply

* Re: [PATCH v3 net-next 13/19] ionic: Add initial ethtool support
From: Shannon Nelson @ 2019-07-09 22:34 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: netdev
In-Reply-To: <20190709052541.GB16610@unicorn.suse.cz>

On 7/8/19 10:25 PM, Michal Kubecek wrote:
> On Mon, Jul 08, 2019 at 12:25:26PM -0700, Shannon Nelson wrote:
>> Add in the basic ethtool callbacks for device information
>> and control.
>>

>> +
>> +	if (fec_type != idev->port_info->config.fec_type) {
>> +		mutex_lock(&ionic->dev_cmd_lock);
>> +		ionic_dev_cmd_port_fec(idev, PORT_FEC_TYPE_NONE);
> The second argument should be fec_type, I believe.

Yep.

>
>> +		err = ionic_dev_cmd_wait(ionic, devcmd_timeout);
>> +		mutex_unlock(&ionic->dev_cmd_lock);
>> +		if (err)
>> +			return err;
>> +
>> +		idev->port_info->config.fec_type = fec_type;
>> +	}
>> +
>> +	return 0;
>> +}
> ...
>> +static int ionic_set_ringparam(struct net_device *netdev,
>> +			       struct ethtool_ringparam *ring)
>> +{
>> +	struct lif *lif = netdev_priv(netdev);
>> +	bool running;
>> +	int i, j;
>> +
>> +	if (ring->rx_mini_pending || ring->rx_jumbo_pending) {
>> +		netdev_info(netdev, "Changing jumbo or mini descriptors not supported\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	i = ring->tx_pending & (ring->tx_pending - 1);
>> +	j = ring->rx_pending & (ring->rx_pending - 1);
>> +	if (i || j) {
>> +		netdev_info(netdev, "Descriptor count must be a power of 2\n");
>> +		return -EINVAL;
>> +	}
> You can use is_power_of_2() here (it wouldn't allow 0 but you probably
> don't want to allow that either).

Sure.

Thanks,
sln


^ permalink raw reply

* Re: [PATCH iproute2 2/2] ip tunnel: warn when changing IPv6 tunnel without tunnel name
From: Mahesh Bandewar (महेश बंडेवार) @ 2019-07-09 22:14 UTC (permalink / raw)
  To: Andrea Claudi; +Cc: linux-netdev, stephen, dsahern
In-Reply-To: <dfb76d0e40b0158cf6a87ae9558b256915d73f6f.1562667648.git.aclaudi@redhat.com>

On Tue, Jul 9, 2019 at 6:16 AM Andrea Claudi <aclaudi@redhat.com> wrote:
>
> Tunnel change fails if a tunnel name is not specified while using
> 'ip -6 tunnel change'. However, no warning message is printed and
> no error code is returned.
>
> $ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
> $ ip -6 tunnel change dev dummy0 local 2001:1234::1 remote 2001:1234::2
> $ ip -6 tunnel show ip6tnl1
> ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)
>
> This commit checks if tunnel interface name is equal to an empty
> string: in this case, it prints a warning message to the user.
> It intentionally avoids to return an error to not break existing
> script setup.
>
> This is the output after this commit:
> $ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
> $ ip -6 tunnel change dev dummy0 local 2001:1234::1 remote 2001:1234::2
> Tunnel interface name not specified
> $ ip -6 tunnel show ip6tnl1
> ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)
>
> Reviewed-by: Matteo Croce <mcroce@redhat.com>
> Signed-off-by: Andrea Claudi <aclaudi@redhat.com>

I tried your patch and the commands that I posted in my (previous) patch.

Here is the output after reverting my patch and applying your patch

<show command>
------------------------
vm0:/tmp# ./ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote
fd::2 tos inherit ttl 127 encaplimit none
vm0:/tmp# ./ip -6 tunnel show dev ip6tnl1
vm0:/tmp# echo $?
0

here the output is NULL and return code is 0. This is wrong and I
would expect to see the tunnel info (as displayed in 'ip -6 tunnel
show ip6tnl1')

<change command>
lpaa10:/tmp# ip -6 tunnel change dev ip6tnl1 local 2001:1234::1 remote
2001:1234::2 encaplimit none ttl 127 tos inherit allow-localremote
lpaa10:/tmp# echo $?
0
lpaa10:/tmp# ip -6 tunnel show dev ip6tnl1
lpaa10:/tmp# ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 encaplimit none hoplimit
127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)

the change command appeared to be successful but change wasn't applied
(expecting the allow-localremote to be present on the tunnel).
---
>  ip/ip6tunnel.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
> index 999408ed801b1..e3da11eb4518e 100644
> --- a/ip/ip6tunnel.c
> +++ b/ip/ip6tunnel.c
> @@ -386,6 +386,9 @@ static int do_add(int cmd, int argc, char **argv)
>         if (parse_args(argc, argv, cmd, &p) < 0)
>                 return -1;
>
> +       if (!*p.name)
> +               fprintf(stderr, "Tunnel interface name not specified\n");
> +
>         if (p.proto == IPPROTO_GRE)
>                 basedev = "ip6gre0";
>         else if (p.i_flags & VTI_ISVTI)
> --
> 2.20.1
>

^ permalink raw reply

* Re: [PATCH v2 1/4] dt-bindings: allow up to four clocks for orion-mdio
From: Rob Herring @ 2019-07-09 22:07 UTC (permalink / raw)
  To: josua; +Cc: netdev, stable, David S. Miller, Mark Rutland, Andrew Lunn
In-Reply-To: <20190709130101.5160-2-josua@solid-run.com>

On Tue, Jul 9, 2019 at 7:13 AM <josua@solid-run.com> wrote:
>
> From: Josua Mayer <josua@solid-run.com>
>
> Armada 8040 needs four clocks to be enabled for MDIO accesses to work.
> Update the binding to allow the extra clock to be specified.
>
> Cc: stable@vger.kernel.org
> Fixes: 6d6a331f44a1 ("dt-bindings: allow up to three clocks for orion-mdio")
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
> Signed-off-by: Josua Mayer <josua@solid-run.com>
> ---
>  Documentation/devicetree/bindings/net/marvell-orion-mdio.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Please don't resend patches when the last one is still under discussion.

Rob

^ permalink raw reply

* Re: [PATCH 1/4] dt-bindings: allow up to four clocks for orion-mdio
From: Rob Herring @ 2019-07-09 22:03 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: josua, netdev, stable, David S. Miller, Mark Rutland
In-Reply-To: <20190709024143.GD5835@lunn.ch>

On Mon, Jul 8, 2019 at 8:41 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > >  Optional properties:
> > >  - interrupts: interrupt line number for the SMI error/done interrupt
> > > -- clocks: phandle for up to three required clocks for the MDIO instance
> > > +- clocks: phandle for up to four required clocks for the MDIO instance
> >
> > This needs to enumerate exactly what the clocks are. Shouldn't there
> > be an additional clock-names value too?
>
> Hi Rob
>
> The driver does not care what they are called. It just turns them all
> on, and turns them off again when removed.

That's fine for the driver to do, but this is the hardware description.

It's not just what they are called, but how many too. Is 1 clock in
the DT valid? 0? It would be unusual for a given piece of h/w to
function with a variable number of clocks.

Rob

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: add support for BRIDGE_MROUTER attribute
From: David Miller @ 2019-07-09 21:50 UTC (permalink / raw)
  To: vivien.didelot; +Cc: netdev, linux, f.fainelli, idosch, andrew
In-Reply-To: <20190709033113.8837-1-vivien.didelot@gmail.com>

From: Vivien Didelot <vivien.didelot@gmail.com>
Date: Mon,  8 Jul 2019 23:31:13 -0400

> This patch adds support for enabling or disabling the flooding of
> unknown multicast traffic on the CPU ports, depending on the value
> of the switchdev SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute.
> 
> The current behavior is kept unchanged but a user can now prevent
> the CPU conduit to be flooded with a lot of unregistered traffic that
> the network stack needs to filter in software with e.g.:
> 
>     echo 0 > /sys/class/net/br0/multicast_router
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>

Looks very reasonable and straightforward.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] pkt_sched: Include const.h
From: David Miller @ 2019-07-09 21:48 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, dsahern, vedang.patel
In-Reply-To: <20190709214517.31684-1-dsahern@kernel.org>

From: David Ahern <dsahern@kernel.org>
Date: Tue,  9 Jul 2019 14:45:17 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> Commit 9903c8dc7342 changed TC_ETF defines to use _BITUL instead of BIT
> but did not add the dependecy on linux/const.h. As a consequence,
> importing the uapi headers into iproute2 causes builds to fail. Add
> the dependency.
> 
> Fixes: 9903c8dc7342 ("etf: Don't use BIT() in UAPI headers.")
> Cc: Vedang Patel <vedang.patel@intel.com>
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2] net: netsec: remove static declaration for netsec_set_tx_de()
From: David Miller @ 2019-07-09 21:47 UTC (permalink / raw)
  To: ilias.apalodimas; +Cc: netdev, jaswinder.singh
In-Reply-To: <1562706889-15471-2-git-send-email-ilias.apalodimas@linaro.org>

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Date: Wed, 10 Jul 2019 00:14:49 +0300

> On commit ba2b232108d3 ("net: netsec: add XDP support") a static
> declaration for netsec_set_tx_de() was added to make the diff easier
> to read.  Now that the patch is merged let's move the functions around
> and get rid of that
> 
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] net: netsec: remove superfluous if statement
From: David Miller @ 2019-07-09 21:46 UTC (permalink / raw)
  To: ilias.apalodimas; +Cc: netdev, jaswinder.singh
In-Reply-To: <1562706889-15471-1-git-send-email-ilias.apalodimas@linaro.org>

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Date: Wed, 10 Jul 2019 00:14:48 +0300

> While freeing tx buffers the memory has to be unmapped if the packet was
> an skb or was used for .ndo_xdp_xmit using the same arguments. Get rid
> of the unneeded extra 'else if' statement
> 
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

Applied.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox