Netdev List
 help / color / mirror / Atom feed
* Re: [patch iproute2] tc: add -bs option for batch mode
From: Stephen Hemminger @ 2017-12-19 15:22 UTC (permalink / raw)
  To: Chris Mi; +Cc: netdev, gerlitz.or
In-Reply-To: <20171219063346.19300-1-chrism@mellanox.com>

On Tue, 19 Dec 2017 15:33:46 +0900
Chris Mi <chrism@mellanox.com> wrote:

> Currently in tc batch mode, only one command is read from the batch
> file and sent to kernel to process. With this patch, we can accumulate
> several commands before sending to kernel. The batch size is specified
> using option -bs or -batchsize.
> 
> To accumulate the commands in tc, we allocate an array of struct iovec.
> If batchsize is bigger than 1 and we haven't accumulated enough
> commands, rtnl_talk() will return without actually sending the message.
> One exception is that there is no more command in the batch file.
> 
> But please note that kernel still processes the requests one by one.
> To process the requests in parallel in kernel is another effort.
> The time we're saving in this patch is the user mode and kernel mode
> context switch. So this patch works on top of the current kernel.
> 
> Using the following script in kernel, we can generate 1,000,000 rules.
> 	tools/testing/selftests/tc-testing/tdc_batch.py
> 
> Without this patch, 'tc -b $file' exection time is:
> 
> real    0m14.916s
> user    0m6.808s
> sys     0m8.046s
> 
> With this patch, 'tc -b $file -bs 10' exection time is:
> 
> real    0m12.286s
> user    0m5.903s
> sys     0m6.312s
> 
> The insertion rate is improved more than 10%.
> 
> Signed-off-by: Chris Mi <chrism@mellanox.com>
> ---
>  include/libnetlink.h | 27 ++++++++++++++++
>  include/utils.h      |  8 +++++
>  lib/libnetlink.c     | 30 +++++++++++++-----
>  lib/utils.c          | 20 ++++++++++++
>  man/man8/tc.8        |  5 +++
>  tc/m_action.c        | 63 ++++++++++++++++++++++++++++---------
>  tc/tc.c              | 27 ++++++++++++++--
>  tc/tc_common.h       |  3 ++
>  tc/tc_filter.c       | 89 ++++++++++++++++++++++++++++++++++++----------------
>  9 files changed, 221 insertions(+), 51 deletions(-)

In addition to my earlier comments, these are the implementation
issues.

> 
> diff --git a/include/libnetlink.h b/include/libnetlink.h
> index a4d83b9e..07e88c94 100644
> --- a/include/libnetlink.h
> +++ b/include/libnetlink.h
> @@ -13,6 +13,8 @@
>  #include <linux/netconf.h>
>  #include <arpa/inet.h>
>  
> +#define MSG_IOV_MAX 256
> +
>  struct rtnl_handle {
>  	int			fd;
>  	struct sockaddr_nl	local;
> @@ -93,6 +95,31 @@ int rtnl_dump_filter_nc(struct rtnl_handle *rth,
>  			void *arg, __u16 nc_flags);
>  #define rtnl_dump_filter(rth, filter, arg) \
>  	rtnl_dump_filter_nc(rth, filter, arg, 0)
> +
> +extern int msg_iov_index;
> +static inline int get_msg_iov_index(void)
> +{
> +	return msg_iov_index;
> +}
> +static inline void set_msg_iov_index(int index)
> +{
> +	msg_iov_index = index;
> +}
> +static inline void incr_msg_iov_index(void)
> +{
> +	++msg_iov_index;
> +}
> +
> +extern int batch_size;
> +static inline int get_batch_size(void)
> +{
> +	return batch_size;
> +}
> +static inline void set_batch_size(int size)
> +{
> +	batch_size = size;
> +}

Iproute2 is C not C++; no accessors for every variable. 


>  int rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
>  	      struct nlmsghdr **answer)
>  	__attribute__((warn_unused_result));
> diff --git a/include/utils.h b/include/utils.h
> index d3895d56..66cb4747 100644
> --- a/include/utils.h
> +++ b/include/utils.h
> @@ -235,6 +235,14 @@ void print_nlmsg_timestamp(FILE *fp, const struct nlmsghdr *n);
>  
>  extern int cmdlineno;
>  ssize_t getcmdline(char **line, size_t *len, FILE *in);
> +
> +extern int cmdlinetotal;
> +static inline int getcmdlinetotal(void)
> +{
> +	return cmdlinetotal;
> +}
> +void setcmdlinetotal(const char *name);
> +
>  int makeargs(char *line, char *argv[], int maxargs);
>  int inet_get_addr(const char *src, __u32 *dst, struct in6_addr *dst6);
>  
> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> index 00e6ce0c..f9be1c6d 100644
> --- a/lib/libnetlink.c
> +++ b/lib/libnetlink.c
> @@ -24,6 +24,7 @@
>  #include <sys/uio.h>
>  
>  #include "libnetlink.h"
> +#include "utils.h"
>  
>  #ifndef SOL_NETLINK
>  #define SOL_NETLINK 270
> @@ -581,6 +582,10 @@ static void rtnl_talk_error(struct nlmsghdr *h, struct nlmsgerr *err,
>  		strerror(-err->error));
>  }
>  
> +static struct iovec msg_iov[MSG_IOV_MAX];
> +int msg_iov_index;
> +int batch_size = 1;
> +
>  static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
>  		       struct nlmsghdr **answer,
>  		       bool show_rtnl_err, nl_ext_ack_fn_t errfn)
> @@ -589,23 +594,34 @@ static int __rtnl_talk(struct rtnl_handle *rtnl, struct nlmsghdr *n,
>  	unsigned int seq;
>  	struct nlmsghdr *h;
>  	struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
> -	struct iovec iov = {
> -		.iov_base = n,
> -		.iov_len = n->nlmsg_len
> -	};
> +	struct iovec *iov = &msg_iov[get_msg_iov_index()];
> +	int index;
> +	char *buf;
> +
> +	iov->iov_base = n;
> +	iov->iov_len = n->nlmsg_len;
> +
> +	incr_msg_iov_index();
>  	struct msghdr msg = {
>  		.msg_name = &nladdr,
>  		.msg_namelen = sizeof(nladdr),
> -		.msg_iov = &iov,
> -		.msg_iovlen = 1,
> +		.msg_iov = msg_iov,
> +		.msg_iovlen = get_msg_iov_index(),
>  	};
> -	char *buf;
>  
>  	n->nlmsg_seq = seq = ++rtnl->seq;
>  
>  	if (answer == NULL)
>  		n->nlmsg_flags |= NLM_F_ACK;

Your real performance win is just not asking for ACK for every rule.
If you switched to pure streaming mode, then the batching wouldn't
be necessary.

>  
> +	index = get_msg_iov_index() % get_batch_size();
> +	if (index != 0 && get_batch_size() > 1 &&
> +	    cmdlineno != getcmdlinetotal() &&
> +	    (n->nlmsg_type == RTM_NEWTFILTER ||
> +	    n->nlmsg_type == RTM_NEWACTION))
> +		return 3;
> +	set_msg_iov_index(index);
> +
>  	status = sendmsg(rtnl->fd, &msg, 0);
>  	if (status < 0) {
>  		perror("Cannot talk to rtnetlink");
> diff --git a/lib/utils.c b/lib/utils.c
> index 7ced8c06..53ca389f 100644
> --- a/lib/utils.c
> +++ b/lib/utils.c
> @@ -1202,6 +1202,26 @@ ssize_t getcmdline(char **linep, size_t *lenp, FILE *in)
>  	return cc;
>  }
>  
> +int cmdlinetotal;
> +
> +void setcmdlinetotal(const char *name)
> +{
> +	char *line = NULL;
> +	size_t len = 0;
> +
> +	if (name && strcmp(name, "-") != 0) {
> +		if (freopen(name, "r", stdin) == NULL) {
> +			fprintf(stderr, "Cannot open file \"%s\" for reading: %s\n",
> +				name, strerror(errno));
> +			return;
> +		}
> +	}
> +
> +	cmdlinetotal = 0;
> +	while (getcmdline(&line, &len, stdin) != -1)
> +		cmdlinetotal++;
> +}
> +
>  /* split command line into argument vector */
>  int makeargs(char *line, char *argv[], int maxargs)
>  {
> diff --git a/man/man8/tc.8 b/man/man8/tc.8
> index ff071b33..de137e16 100644
> --- a/man/man8/tc.8
> +++ b/man/man8/tc.8
> @@ -601,6 +601,11 @@ must exist already.
>  read commands from provided file or standard input and invoke them.
>  First failure will cause termination of tc.
>  
> +.TP
> +.BR "\-bs", " \-bs size", " \-batchsize", " \-batchsize size"
> +How many commands are accumulated before sending to kernel.
> +By default, it is 1. It only takes effect in batch mode.
> +
>  .TP
>  .BR "\-force"
>  don't terminate tc on errors in batch mode.
> diff --git a/tc/m_action.c b/tc/m_action.c
> index 13f942bf..574b78ca 100644
> --- a/tc/m_action.c
> +++ b/tc/m_action.c
> @@ -23,6 +23,7 @@
>  #include <arpa/inet.h>
>  #include <string.h>
>  #include <dlfcn.h>
> +#include <errno.h>
>  
>  #include "utils.h"
>  #include "tc_common.h"
> @@ -546,33 +547,65 @@ bad_val:
>  	return ret;
>  }
>  
> +typedef struct {
> +	struct nlmsghdr		n;
> +	struct tcamsg		t;
> +	char			buf[MAX_MSG];
> +} tc_action_req;
> +
> +static tc_action_req *action_reqs;
> +
> +void free_action_reqs(void)
> +{
> +	if (action_reqs)
> +		free(action_reqs);
> +}
> +
> +static tc_action_req *get_action_req(void)
> +{
> +	tc_action_req *req;
> +
> +	if (action_reqs == NULL) {
> +		action_reqs = malloc(get_batch_size() * sizeof (tc_action_req));
> +		if (action_reqs == NULL)
> +			return NULL;
> +	}
> +	req = &action_reqs[get_msg_iov_index()];
> +	memset(req, 0, sizeof (*req));
> +
> +	return req;
> +}
> +
>  static int tc_action_modify(int cmd, unsigned int flags,
>  			    int *argc_p, char ***argv_p)
>  {
>  	int argc = *argc_p;
>  	char **argv = *argv_p;
>  	int ret = 0;
> -	struct {
> -		struct nlmsghdr         n;
> -		struct tcamsg           t;
> -		char                    buf[MAX_MSG];
> -	} req = {
> -		.n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg)),
> -		.n.nlmsg_flags = NLM_F_REQUEST | flags,
> -		.n.nlmsg_type = cmd,
> -		.t.tca_family = AF_UNSPEC,
> -	};
> -	struct rtattr *tail = NLMSG_TAIL(&req.n);
> +	tc_action_req *req;
> +
> +	req = get_action_req();
> +	if (req == NULL) {
> +		fprintf(stderr, "get_action_req error: not enough buffer\n");
> +		return -ENOMEM;
> +	}
> +
> +	req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcamsg));
> +	req->n.nlmsg_flags = NLM_F_REQUEST | flags;
> +	req->n.nlmsg_type = cmd;
> +	req->t.tca_family = AF_UNSPEC;
> +	struct rtattr *tail = NLMSG_TAIL(&req->n);
>  
>  	argc -= 1;
>  	argv += 1;
> -	if (parse_action(&argc, &argv, TCA_ACT_TAB, &req.n)) {
> +	if (parse_action(&argc, &argv, TCA_ACT_TAB, &req->n)) {
>  		fprintf(stderr, "Illegal \"action\"\n");
>  		return -1;
>  	}
> -	tail->rta_len = (void *) NLMSG_TAIL(&req.n) - (void *) tail;
> +	tail->rta_len = (void *) NLMSG_TAIL(&req->n) - (void *) tail;
>  
> -	if (rtnl_talk(&rth, &req.n, NULL) < 0) {
> +	ret = rtnl_talk(&rth, &req->n, NULL);
> +	if (ret < 0) {
>  		fprintf(stderr, "We have an error talking to the kernel\n");
>  		ret = -1;
>  	}
> @@ -739,5 +772,5 @@ int do_action(int argc, char **argv)
>  			return -1;
>  	}
>  
> -	return 0;
> +	return ret;
>  }
> diff --git a/tc/tc.c b/tc/tc.c
> index ad9f07e9..9c8e828b 100644
> --- a/tc/tc.c
> +++ b/tc/tc.c
> @@ -189,7 +189,7 @@ static void usage(void)
>  	fprintf(stderr, "Usage: tc [ OPTIONS ] OBJECT { COMMAND | help }\n"
>  			"       tc [-force] -batch filename\n"
>  			"where  OBJECT := { qdisc | class | filter | action | monitor | exec }\n"
> -	                "       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] | -n[etns] name |\n"
> +	                "       OPTIONS := { -s[tatistics] | -d[etails] | -r[aw] | -p[retty] | -b[atch] [filename] | -bs | -batchsize [size] | -n[etns] name |\n"
>  			"                    -nm | -nam[es] | { -cf | -conf } path } | -j[son]\n");
>  }
>  
> @@ -223,6 +223,9 @@ static int batch(const char *name)
>  	size_t len = 0;
>  	int ret = 0;
>  
> +	if (get_batch_size() > 1)
> +		setcmdlinetotal(name);
> +
>  	batch_mode = 1;
>  	if (name && strcmp(name, "-") != 0) {
>  		if (freopen(name, "r", stdin) == NULL) {
> @@ -248,15 +251,22 @@ static int batch(const char *name)
>  		if (largc == 0)
>  			continue;	/* blank line */
>  
> -		if (do_cmd(largc, largv)) {
> +		ret = do_cmd(largc, largv);
> +		if (ret != 0 && ret != 3) {
>  			fprintf(stderr, "Command failed %s:%d\n", name, cmdlineno);
>  			ret = 1;
>  			if (!force)
>  				break;
>  		}
>  	}
> +	if (ret == 3)
> +		fprintf(stderr,
> +			"Command is not sent to kernel due to internal error %s:%d\n",
> +			name, cmdlineno);
>  	if (line)
>  		free(line);
> +	free_filter_reqs();
> +	free_action_reqs();
>  
>  	rtnl_close(&rth);
>  	return ret;
> @@ -267,6 +277,7 @@ int main(int argc, char **argv)
>  {
>  	int ret;
>  	char *batch_file = NULL;
> +	int size;
>  
>  	while (argc > 1) {
>  		if (argv[1][0] != '-')
> @@ -297,6 +308,16 @@ int main(int argc, char **argv)
>  			if (argc <= 1)
>  				usage();
>  			batch_file = argv[1];
> +		} else if (matches(argv[1], "-batchsize") == 0 ||
> +				matches(argv[1], "-bs") == 0) {
> +			argc--;	argv++;
> +			if (argc <= 1)
> +				usage();
> +			size = atoi(argv[1]);
> +			if (size > MSG_IOV_MAX)
> +				set_batch_size(MSG_IOV_MAX);
> +			else
> +				set_batch_size(size);
>  		} else if (matches(argv[1], "-netns") == 0) {
>  			NEXT_ARG();
>  			if (netns_switch(argv[1]))
> @@ -342,6 +363,8 @@ int main(int argc, char **argv)
>  	}
>  
>  	ret = do_cmd(argc-1, argv+1);
> +	free_filter_reqs();
> +	free_action_reqs();
>  Exit:
>  	rtnl_close(&rth);
>  
> diff --git a/tc/tc_common.h b/tc/tc_common.h
> index 264fbdac..fde8db1b 100644
> --- a/tc/tc_common.h
> +++ b/tc/tc_common.h
> @@ -24,5 +24,8 @@ struct tc_sizespec;
>  extern int parse_size_table(int *p_argc, char ***p_argv, struct tc_sizespec *s);
>  extern int check_size_table_opts(struct tc_sizespec *s);
>  
> +extern void free_filter_reqs(void);
> +extern void free_action_reqs(void);
> +
>  extern int show_graph;
>  extern bool use_names;
> diff --git a/tc/tc_filter.c b/tc/tc_filter.c
> index 545cc3a1..dc53c128 100644
> --- a/tc/tc_filter.c
> +++ b/tc/tc_filter.c
> @@ -19,6 +19,7 @@
>  #include <arpa/inet.h>
>  #include <string.h>
>  #include <linux/if_ether.h>
> +#include <errno.h>
>  
>  #include "rt_names.h"
>  #include "utils.h"
> @@ -42,18 +43,51 @@ static void usage(void)
>  		"OPTIONS := ... try tc filter add <desired FILTER_KIND> help\n");
>  }
>  
> +typedef struct {
> +	struct nlmsghdr		n;
> +	struct tcmsg		t;
> +	char			buf[MAX_MSG];
> +} tc_filter_req;
> +
> +static tc_filter_req *filter_reqs;
> +
> +void free_filter_reqs(void)
> +{
> +	if (filter_reqs)
> +		free(filter_reqs);
> +}

You don't need to check for NULL when calling free.
free(NULL) is a nop. 

^ permalink raw reply

* Re: [PATCH] net: arc_emac: fix arc_emac_rx() error paths
From: David Miller @ 2017-12-19 15:22 UTC (permalink / raw)
  To: al.kochet; +Cc: netdev, linux-kernel, f.fainelli, edumazet
In-Reply-To: <1513358406-32503-1-git-send-email-al.kochet@gmail.com>

From: Alexander Kochetkov <al.kochet@gmail.com>
Date: Fri, 15 Dec 2017 20:20:06 +0300

> arc_emac_rx() has some issues found by code review.
> 
> In case netdev_alloc_skb_ip_align() or dma_map_single() failure
> rx fifo entry will not be returned to EMAC.
> 
> In case dma_map_single() failure previously allocated skb became
> lost to driver. At the same time address of newly allocated skb
> will not be provided to EMAC.
> 
> Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>

This patch adds quite a few bugs, here is one which shows this is not
functionally tested:

> -		/* receive_skb only if new skb was allocated to avoid holes */
> -		netif_receive_skb(skb);
> -
> -		addr = dma_map_single(&ndev->dev, (void *)rx_buff->skb->data,
> +		addr = dma_map_single(&ndev->dev, (void *)skb->data,
>  				      EMAC_BUFFER_SIZE, DMA_FROM_DEVICE);

Map the new SKB.

>  		if (dma_mapping_error(&ndev->dev, addr)) {
>  			if (net_ratelimit())
> -				netdev_err(ndev, "cannot dma map\n");
> -			dev_kfree_skb(rx_buff->skb);
> +				netdev_err(ndev, "cannot map dma buffer\n");
> +			dev_kfree_skb(skb);
> +			/* Return ownership to EMAC */
> +			rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
>  			stats->rx_errors++;
> +			stats->rx_dropped++;
>  			continue;
>  		}
> +
> +		/* unmap previosly mapped skb */
> +		dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
> +				 dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);

And then you unmap it.  "addr" is the new DMA mapping, not the old one.

^ permalink raw reply

* Re: [PATCH] tg3: Fix rx hang on MTU change with 5717/5719
From: David Miller @ 2017-12-19 15:25 UTC (permalink / raw)
  To: brking
  Cc: siva.kallam, prashant, mchan, benjamin.kun, netdev, maurosr,
	muvic, brking, stable
In-Reply-To: <1513372910-32121-1-git-send-email-brking@linux.vnet.ibm.com>

From: Brian King <brking@linux.vnet.ibm.com>
Date: Fri, 15 Dec 2017 15:21:50 -0600

> This fixes a hang issue seen when changing the MTU size from 1500 MTU
> to 9000 MTU on both 5717 and 5719 chips. In discussion with Broadcom,
> they've indicated that these chipsets have the same phy as the 57766
> chipset, so the same workarounds apply. This has been tested by IBM
> on both Power 8 and Power 9 systems as well as by Broadcom on x86
> hardware and has been confirmed to resolve the hang issue.
> 
> Cc: stable <stable@vger.kernel.org>
> Signed-off-by: Brian King <brking@linux.vnet.ibm.com>

Applied and queued up for -stable.

Please do not add explicit CC: stable tags for networking patches, as
documented in Documentation/networking/netdev-FAQ.txt

Thank you.

^ permalink raw reply

* Re: Linux ECN Handling
From: Neal Cardwell @ 2017-12-19 15:28 UTC (permalink / raw)
  To: Steve Ibanez
  Cc: Eric Dumazet, Yuchung Cheng, Daniel Borkmann, Netdev,
	Florian Westphal, Mohammad Alizadeh, Lawrence Brakmo
In-Reply-To: <CACJspmKJkHQ+yMO5JCjc8PiZ4VeZd_vKbUEUqzX-MmdXKJFAZA@mail.gmail.com>

On Tue, Dec 19, 2017 at 12:16 AM, Steve Ibanez <sibanez@stanford.edu> wrote:
>
> Hi Neal,
>
> I started looking into this receiver ACKing issue today. Strangely,
> when I tried adding printk statements at the top of the
> tcp_v4_do_rcv(), tcp_rcv_established(), __tcp_ack_snd_check() and
> tcp_send_delayed_ack() functions they were never executed on the
> machine running the iperf3 server (i.e. the destination of the flows).
> Maybe the iperf3 server is using its own TCP stack?
>
> In any case, the ACKing problem is reproducible using just normal
> iperf for which I do see my printk statements being executed. I can
> now confirm that when the CWR marked packet (for which no ACK is sent)
> arrives at the receiver, the __tcp_ack_snd_check() function is never
> invoked; and hence neither is the tcp_send_delayed_ack() function.
> Hopefully this helps narrow down where the issue might be? I started
> adding some printk statements into the tcp_rcv_established() function,
> but I'm not sure where the best places to look would be so I wanted to
> ask your advice on this.

Thanks for the detailed report!

As a next step to narrow down why the CWR-marked packet is not acked,
I would suggest adding printk statements at the bottom of
tcp_rcv_established() in all the spots where we have a goto or return
that would cause us to short-circuit and not reach the
tcp_ack_snd_check() at the bottom of the function. This could be much
like your existing nice debugging printks, that log any time we get to
that spot and if (sk->sk_family == AF_INET && th->cwr). And these
could be in the following spots (marked "here"):

slow_path:
        if (len < (th->doff << 2) || tcp_checksum_complete(skb))
                goto csum_error;                /* <=== here */

        if (!th->ack && !th->rst && !th->syn)
                goto discard;                /* <=== here */

        /*
         *      Standard slow path.
         */

        if (!tcp_validate_incoming(sk, skb, th, 1))
                return;                /* <=== here */

step5:
        if (tcp_ack(sk, skb, FLAG_SLOWPATH | FLAG_UPDATE_TS_RECENT) < 0)
                goto discard;                /* <=== here */


thanks,
neal

^ permalink raw reply

* Re: [PATCH v2,net-next] ip6_gre: fix a pontential issue in ip6erspan_rcv
From: David Miller @ 2017-12-19 15:34 UTC (permalink / raw)
  To: yanhaishuang; +Cc: kuznet, yoshfuji, netdev, linux-kernel, u9012063
In-Reply-To: <1513391125-28227-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 16 Dec 2017 10:25:25 +0800

> pskb_may_pull() can change skb->data, so we need to load ipv6h/ershdr at
> the right place.
> 
> Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
> Acked-by: William Tu <u9012063@gmail.com>
> Cc: William Tu <u9012063@gmail.com>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

This patch does not apply:

> +	ipv6h = ipv6_hdr(skb);
> +	ershdr = (struct erspan_base_hdr *)skb->data;
>  	ver = (ntohs(ershdr->ver_vlan) & VER_MASK) >> VER_OFFSET;
>  	tpi->key = cpu_to_be32(ntohs(ershdr->session_id) & ID_MASK);
>  	pkt_md = (struct erspan_metadata *)(ershdr + 1);

There is not "pkt_md = ..." assignment in net-next on this line.

^ permalink raw reply

* Re: [PATCH v2,net-next 1/2] ip_gre: fix potential memory leak in erspan_rcv
From: David Miller @ 2017-12-19 15:36 UTC (permalink / raw)
  To: yanhaishuang; +Cc: kuznet, yoshfuji, netdev, linux-kernel, u9012063
In-Reply-To: <1513392519-30127-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 16 Dec 2017 10:48:38 +0800

> If md is NULL, tun_dst must be freed, otherwise it will cause memory
> leak.
> 
> Fixes: 1a66a836da6 ("gre: add collect_md mode to ERSPAN tunnel")
> Cc: William Tu <u9012063@gmail.com>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> 
> Change since v2:
>   * Rebase on latest master branch.
>   * Correct wrong fix information.

Please do not put a changelog after the fixes and signoff tags, those tags must
appear last in the commit message.

Thank you.

^ permalink raw reply

* Re: [PATCH v2,net-next 2/2] ip6_gre: fix potential memory leak in ip6erspan_rcv
From: David Miller @ 2017-12-19 15:36 UTC (permalink / raw)
  To: yanhaishuang; +Cc: kuznet, yoshfuji, netdev, linux-kernel, u9012063
In-Reply-To: <1513392519-30127-2-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 16 Dec 2017 10:48:39 +0800

> If md is NULL, tun_dst must be freed, otherwise it will cause memory
> leak.
> 
> Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
> Cc: William Tu <u9012063@gmail.com>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
> 
> Change since v2:
>   * Rebase on latest master branch.
>   * Correct wrong fix information.

Likewise, please place the tags at the end of the commit message.

^ permalink raw reply

* RE: [PATCH v3 net-next 6/6] tls: Add generic NIC offload infrastructure.
From: Ilya Lesokhin @ 2017-12-19 15:38 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: netdev@vger.kernel.org, davem@davemloft.net, davejwatson@fb.com,
	tom@herbertland.com, hannes@stressinduktion.org, Boris Pismenny,
	Aviad Yehezkel, Liran Liss
In-Reply-To: <20171219151138.GD6122@localhost.localdomain>

Tuesday, December 19, 2017 5:12 PM, Marcelo Ricardo Leitner wrote:

> > I'm not quite sure what you mean by "no net_device's are registered"
> > Presumably you mean there is no device that implements the
> > NETIF_F_HW_TLS_TX capability yet.
> 
> Not really. Let me try again. This patchset is using the expression "tls_device".
> When I read that, I expect a new interface type, like a tunnel, that would be
> created on top of another interface that has the offloading capability. That's
> why I'm confused. IMHO "tls_offload" is a better fit. Makes sense?
> 

We don't expose a new interface. An existing netdev does the offload.

The xfrm layer also calls the offload layer xfrm_device and It also doesn't need to
add another interface to offload ipsec to a netdev.

I thought about calling it tls_hw or tls_hw_offload.
The problem is that the important distinction here is that the 
offload is done by a netdev.
tls_sw can also use hw offload if you have the required 
memory to memory crypto engine and crypto_alloc_aead("gcm(aes)", 0, 0); 
decides on using it.

^ permalink raw reply

* Re: [RFC PATCH] virtio_net: Extend virtio to use VF datapath when available
From: Michael S. Tsirkin @ 2017-12-19 15:47 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: stephen, netdev, virtualization, alexander.duyck
In-Reply-To: <1513644036-45230-1-git-send-email-sridhar.samudrala@intel.com>

I'll need to look at this more, in particular the feature
bit is missing here. For now one question:

On Mon, Dec 18, 2017 at 04:40:36PM -0800, Sridhar Samudrala wrote:
> @@ -56,6 +58,8 @@ module_param(napi_tx, bool, 0644);
>   */
>  DECLARE_EWMA(pkt_len, 0, 64)
>  
> +#define VF_TAKEOVER_INT	(HZ / 10)
> +
>  #define VIRTNET_DRIVER_VERSION "1.0.0"
>  
>  static const unsigned long guest_offloads[] = {

Why is this delay necessary? And why by 100ms?

^ permalink raw reply

* Re: [PATCH v4 net-next 1/6] lwt: Add net to build_state argument
From: Roopa Prabhu @ 2017-12-19 15:47 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev, rohit
In-Reply-To: <20171215182800.10248-2-tom@quantonium.net>

On Fri, Dec 15, 2017 at 10:27 AM, Tom Herbert <tom@quantonium.net> wrote:
> Users of LWT need to know net if they want to have per net operations
> in LWT.
>
> Signed-off-by: Tom Herbert <tom@quantonium.net>

Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>

^ permalink raw reply

* Re: [PATCH] net: arc_emac: fix arc_emac_rx() error paths
From: Alexander Kochetkov @ 2017-12-19 15:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, f.fainelli, edumazet
In-Reply-To: <20171219.102254.1959679351283055093.davem@davemloft.net>


> 19 дек. 2017 г., в 18:22, David Miller <davem@davemloft.net> написал(а):
> 
> From: Alexander Kochetkov <al.kochet@gmail.com>
> Date: Fri, 15 Dec 2017 20:20:06 +0300
> 
>> arc_emac_rx() has some issues found by code review.
>> 
>> In case netdev_alloc_skb_ip_align() or dma_map_single() failure
>> rx fifo entry will not be returned to EMAC.
>> 
>> In case dma_map_single() failure previously allocated skb became
>> lost to driver. At the same time address of newly allocated skb
>> will not be provided to EMAC.
>> 
>> Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
> 
> This patch adds quite a few bugs, here is one which shows this is not
> functionally tested:

May be I don’t understand correctly, but I don’t see bug here.

Wrong dma mapping usage should immediately manifest itself (kernel
instability, koops and so on). The patch was tested on rk3188 and work fine for me.
Also I did simulations of netdev_alloc_skb_ip_align() and dma_map_single()
faults and can confirm that new error handling work.

Could someone else with ARC EMAC test the patch? I would be very grateful for the help.
Florian or Eric, can you test it on your hardware?

>> +
>> +		/* unmap previosly mapped skb */
>> +		dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
>> +				 dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
> 
> And then you unmap it.  "addr" is the new DMA mapping, not the old one.

The old mapping should be unmapped here. It refer to old skb what contains already
received packet. Because buffer doesn’t belong to EMAC anymore.
That old mapping point to old skb buffer. And netif_receive_skb() will be
called for old skb after that.

The new mapping «addr" will be unmapped then EMAC receive
new packet. Later by the code (after netif_receive_skb()) there are lines for
saving «addr» value:

    rx_buff->skb = skb;
    dma_unmap_addr_set(rx_buff, addr, addr);
    dma_unmap_len_set(rx_buff, len, EMAC_BUFFER_SIZE);

Regards,
Alexander.

^ permalink raw reply

* Re: [PATCH net-next v5 0/5] Introduce NETIF_F_GRO_HW
From: David Miller @ 2017-12-19 15:50 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev, andrew.gospodarek
In-Reply-To: <1513411784-17653-1-git-send-email-michael.chan@broadcom.com>

From: Michael Chan <michael.chan@broadcom.com>
Date: Sat, 16 Dec 2017 03:09:39 -0500

> Introduce NETIF_F_GRO_HW feature flag and convert drivers that support
> hardware GRO to use the new flag.

Series applied, thanks for following through with this work.

^ permalink raw reply

* Re: [RFC PATCH net-next] tools/bpf: fix build with binutils >= 2.28
From: Quentin Monnet @ 2017-12-19 15:57 UTC (permalink / raw)
  To: Roman Gushchin, netdev
  Cc: linux-kernel, kernel-team, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann
In-Reply-To: <20171219143821.26291-1-guro@fb.com>

Hi Roman, thanks for working on this!

2017-12-19 14:38 UTC+0000 ~ Roman Gushchin <guro@fb.com>
> Bpftool build is broken with binutils version 2.28 and later.
> The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
> in the binutils repo, which changed the disassembler() function
> signature.
>
> Fix this by checking binutils version and use an appropriate
> disassembler() signature.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  tools/bpf/Makefile             | 6 ++++++
>  tools/bpf/bpf_jit_disasm.c     | 7 +++++++
>  tools/bpf/bpftool/Makefile     | 6 ++++++
>  tools/bpf/bpftool/jit_disasm.c | 5 +++++
>  4 files changed, 24 insertions(+)
>
> diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
> index 07a6697466ef..3fd32fd0daa1 100644
> --- a/tools/bpf/Makefile
> +++ b/tools/bpf/Makefile
> @@ -6,8 +6,14 @@ LEX = flex
>  YACC = bison
>  MAKE = make
>  
> +BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
> +BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> +BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> +
>  CFLAGS += -Wall -O2
>  CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
> +CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
> +CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
>  
>  %.yacc.c: %.y
>  	$(YACC) -o $@ -d $<
> diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
> index 75bf526a0168..3ef7c8bdc0f3 100644
> --- a/tools/bpf/bpf_jit_disasm.c
> +++ b/tools/bpf/bpf_jit_disasm.c
> @@ -72,7 +72,14 @@ static void get_asm_insns(uint8_t *image, size_t len, int opcodes)
>  
>  	disassemble_init_for_target(&info);
>  
> +#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
> +	disassemble = disassembler(bfd_get_arch(bfdf),
> +				   bfd_big_endian(bfdf),
> +				   bfd_get_mach(bfdf),
> +				   bfdf);
> +#else
>  	disassemble = disassembler(bfdf);
> +#endif
>  	assert(disassemble);
>  
>  	do {
> diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> index 3f17ad317512..94ad51bf14b5 100644
> --- a/tools/bpf/bpftool/Makefile
> +++ b/tools/bpf/bpftool/Makefile
> @@ -40,6 +40,12 @@ CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
>  CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
>  LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
>  
> +BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))

This does not seem to be portable. I tried that on Ubuntu and `readelf
-v` returns "GNU readelf (GNU Binutils for Ubuntu) 2.26.1", and
BINUTILS_VER catches "Binutils".

> +BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> +BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> +CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
> +CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
> +
>  INSTALL ?= install
>  RM ?= rm -f
>  
> diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
> index 1551d3918d4c..eaa7127e9eeb 100644
> --- a/tools/bpf/bpftool/jit_disasm.c
> +++ b/tools/bpf/bpftool/jit_disasm.c
> @@ -107,7 +107,12 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes)
>  
>  	disassemble_init_for_target(&info);
>  
> +#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
> +	disassemble = disassembler(bfd_get_arch(bfdf), bfd_big_endian(bfdf),
> +				   bfd_get_mach(bfdf), bfdf);
> +#else
>  	disassemble = disassembler(bfdf);
> +#endif
>  	assert(disassemble);
>  
>  	if (json_output)

I discussed this issue with Jakub recently, and one suggestion he had
was to look in tools/build/feature to add a new "feature", by trying to
compile short programs, for making the distinction between binutils
versions. It probably requires more work, but could be more robust than
parsing the version from the command line?

Best,
Quentin

^ permalink raw reply

* Re: [PATCH net-next 1/1] forcedeth: remove duplicate structure member in xmit
From: David Miller @ 2017-12-19 15:57 UTC (permalink / raw)
  To: yanjun.zhu; +Cc: netdev, stephen
In-Reply-To: <1513416663-9321-1-git-send-email-yanjun.zhu@oracle.com>

From: Zhu Yanjun <yanjun.zhu@oracle.com>
Date: Sat, 16 Dec 2017 04:31:03 -0500

> Since both first_tx_ctx and tx_skb are the head of tx ctx, it not
> necessary to use two structure members to statically indicate
> the head of tx ctx. So first_tx_ctx is removed.
> 
> CC: Srinivas Eeda <srinivas.eeda@oracle.com>
> CC: Joe Jin <joe.jin@oracle.com>
> CC: Junxiao Bi <junxiao.bi@oracle.com>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH iproute2 1/3] iplink: Improve index parameter handling
From: Stephen Hemminger @ 2017-12-19 15:59 UTC (permalink / raw)
  To: Serhey Popovich; +Cc: netdev
In-Reply-To: <18e98fc6-4145-0bb9-143d-d4305d22bdc8@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2680 bytes --]

On Mon, 18 Dec 2017 23:37:09 +0200
Serhey Popovich <serhe.popovych@gmail.com> wrote:

> Stephen Hemminger wrote:
> > On Mon, 18 Dec 2017 23:02:07 +0200
> > Serhey Popovich <serhe.popovych@gmail.com> wrote:
> >   
> >> Stephen Hemminger wrote:  
> >>> On Mon, 18 Dec 2017 20:54:06 +0200
> >>> Serhey Popovych <serhe.popovych@gmail.com> wrote:
> >>>     
> >>>> diff --git a/ip/iplink.c b/ip/iplink.c
> >>>> index 1e685cc..4f9c169 100644
> >>>> --- a/ip/iplink.c
> >>>> +++ b/ip/iplink.c
> >>>> @@ -586,8 +586,10 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
> >>>>  			*name = *argv;
> >>>>  		} else if (strcmp(*argv, "index") == 0) {
> >>>>  			NEXT_ARG();
> >>>> +			if (*index)
> >>>> +				duparg("index", *argv);
> >>>>  			*index = atoi(*argv);
> >>>> -			if (*index < 0)
> >>>> +			if (*index <= 0)    
> >>>
> >>> Why not use strtoul instead of atoi?    
> >> Do not see reason for strtoul() instead atoi():
> >>
> >>   1) main arg: indexes in kernel represented as "int", which is
> >>      signed. <= 0 values are reserved for various special purposes
> >>      (see net/core/fib_rules.c on how device matching implemented).
> >>
> >>      Configuring network device manually with index <= 0 is not correct
> >>      (however possible). Kernel itself never chooses ifindex <= 0.
> >>
> >>      Having unsigned int > 0x7fffffff actually means index <= 0.
> >>
> >>   2) this is not single place in iproute2 where it is used: not
> >>      going to remove last user.
> >>
> >>   3) make changes clear and transparent for review.  
> > 
> > I would rather all of iproute2 correctly handles unsigned values.
> > Too much code is old K&R style C "the world is an int" and "who needs
> > to check for negative".  
> 
> You are right :(. I'm just trying to improve things a bit.
> 
> > 
> > There already is get_unsigned() in iproute2 util functions.  
> This is good one based on strtoul(). But do we want to submit say
> index = (unsigned int)2147483648(0x7fffffff) to the kernel that is
> illegal from it's perspective?
> 
> Or do you mean I can prepare treewide change to replace atoi() with
> get_unsigned()/get_integer() where appropriate?
> 
> We already check if (*index < 0) since commit 3c682146aeff
> (iplink: forbid negative ifindex and modifying ifindex), and I just
> put index == 0 in the same range of invalid indexes.
> 

The legacy BSD ABI for interfaces uses int, so that sets the upper
bound for kernel.

The netlink ABI limit is u32 for ifindex so technically 1..UINT32_MAX are
possible values but kernel is bound by BSD mistake.

I will take the original patch.



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 0/2] net: speedup geneve/vxlan tunnels dismantle
From: David Miller @ 2017-12-19 16:00 UTC (permalink / raw)
  To: yanhaishuang; +Cc: edumazet, netdev, linux-kernel
In-Reply-To: <1513418090-8595-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sat, 16 Dec 2017 17:54:48 +0800

> This patch series add batching to vxlan/geneve tunnels so that netns
> dismantles are less costly.

Series applied, thank you.

^ permalink raw reply

* Re: [PATCH] isdn: avm: Handle return value of skb_dequeue()
From: David Miller @ 2017-12-19 16:04 UTC (permalink / raw)
  To: arvind.yadav.cs; +Cc: isdn, stephen, johannes.berg, linux-kernel, netdev
In-Reply-To: <5818ec104ed4354db3625ac8c7741d771719e4d4.1513453258.git.arvind.yadav.cs@gmail.com>

From: Arvind Yadav <arvind.yadav.cs@gmail.com>
Date: Sun, 17 Dec 2017 01:17:23 +0530

> skb_dequeue() will return NULL for an empty list or a pointer
> to the head element.
> 
> Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>

There is no code path where this is possible.

The two call sites:

1) Explicitly add an SKB to the queue

OR

2) Explcitily guard the call with an skb_queue_empty() check

Therefore the stated condition is not possible.

^ permalink raw reply

* Re: [PATCH iproute2 1/3] iplink: Improve index parameter handling
From: Serhey Popovich @ 2017-12-19 16:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20171219075951.7aca0d53@xeon-e3>


[-- Attachment #1.1: Type: text/plain, Size: 2783 bytes --]

Stephen Hemminger wrote:
> On Mon, 18 Dec 2017 23:37:09 +0200
> Serhey Popovich <serhe.popovych@gmail.com> wrote:
> 
>> Stephen Hemminger wrote:
>>> On Mon, 18 Dec 2017 23:02:07 +0200
>>> Serhey Popovich <serhe.popovych@gmail.com> wrote:
>>>   
>>>> Stephen Hemminger wrote:  
>>>>> On Mon, 18 Dec 2017 20:54:06 +0200
>>>>> Serhey Popovych <serhe.popovych@gmail.com> wrote:
>>>>>     
>>>>>> diff --git a/ip/iplink.c b/ip/iplink.c
>>>>>> index 1e685cc..4f9c169 100644
>>>>>> --- a/ip/iplink.c
>>>>>> +++ b/ip/iplink.c
>>>>>> @@ -586,8 +586,10 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
>>>>>>  			*name = *argv;
>>>>>>  		} else if (strcmp(*argv, "index") == 0) {
>>>>>>  			NEXT_ARG();
>>>>>> +			if (*index)
>>>>>> +				duparg("index", *argv);
>>>>>>  			*index = atoi(*argv);
>>>>>> -			if (*index < 0)
>>>>>> +			if (*index <= 0)    
>>>>>
>>>>> Why not use strtoul instead of atoi?    
>>>> Do not see reason for strtoul() instead atoi():
>>>>
>>>>   1) main arg: indexes in kernel represented as "int", which is
>>>>      signed. <= 0 values are reserved for various special purposes
>>>>      (see net/core/fib_rules.c on how device matching implemented).
>>>>
>>>>      Configuring network device manually with index <= 0 is not correct
>>>>      (however possible). Kernel itself never chooses ifindex <= 0.
>>>>
>>>>      Having unsigned int > 0x7fffffff actually means index <= 0.
>>>>
>>>>   2) this is not single place in iproute2 where it is used: not
>>>>      going to remove last user.
>>>>
>>>>   3) make changes clear and transparent for review.  
>>>
>>> I would rather all of iproute2 correctly handles unsigned values.
>>> Too much code is old K&R style C "the world is an int" and "who needs
>>> to check for negative".  
>>
>> You are right :(. I'm just trying to improve things a bit.
>>
>>>
>>> There already is get_unsigned() in iproute2 util functions.  
>> This is good one based on strtoul(). But do we want to submit say
>> index = (unsigned int)2147483648(0x7fffffff) to the kernel that is
>> illegal from it's perspective?
>>
>> Or do you mean I can prepare treewide change to replace atoi() with
>> get_unsigned()/get_integer() where appropriate?
>>
>> We already check if (*index < 0) since commit 3c682146aeff
>> (iplink: forbid negative ifindex and modifying ifindex), and I just
>> put index == 0 in the same range of invalid indexes.
>>
> 
> The legacy BSD ABI for interfaces uses int, so that sets the upper
> bound for kernel.
> 
> The netlink ABI limit is u32 for ifindex so technically 1..UINT32_MAX are
> possible values but kernel is bound by BSD mistake.
Thank you for in depth explanation!

> 
> I will take the original patch.
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [PATCH 0/4] bcm63xx_enet: remove mac_id usage
From: David Miller @ 2017-12-19 16:07 UTC (permalink / raw)
  To: jonas.gorski
  Cc: netdev, linux-mips, ralf, f.fainelli, bcm-kernel-feedback-list
In-Reply-To: <20171217160255.30342-1-jonas.gorski@gmail.com>

From: Jonas Gorski <jonas.gorski@gmail.com>
Date: Sun, 17 Dec 2017 17:02:51 +0100

> This patchset aims at reducing the platform device id number usage with
> the target of making it eventually possible to probe the driver through OF.
> 
> Runtested on BCM6358.
> 
> Since the patches touch mostly net/, they should go through net-next.

Series applied, thank you.

^ permalink raw reply

* Re: [patch net] mlxsw: spectrum_router: Remove batch neighbour deletion causing FW bug
From: David Miller @ 2017-12-19 16:08 UTC (permalink / raw)
  To: jiri; +Cc: netdev, petrm, idosch, mlxsw
In-Reply-To: <20171217161643.2725-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Sun, 17 Dec 2017 17:16:43 +0100

> From: Petr Machata <petrm@mellanox.com>
> 
> This reverts commit 63dd00fa3e524c27cc0509190084ab147ecc8ae2.
> 
> RAUHT DELETE_ALL seems to trigger a bug in FW. That manifests by later
> calls to RAUHT ADD of an IPv6 neighbor to fail with "bad parameter"
> error code.
> 
> Signed-off-by: Petr Machata <petrm@mellanox.com>
> Fixes: 63dd00fa3e52 ("mlxsw: spectrum_router: Add batch neighbour deletion")
> Reviewed-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Applied.

^ permalink raw reply

* Re: [patch net-next] mlxsw: spectrum: Add "spectrum" prefix macro
From: David Miller @ 2017-12-19 16:09 UTC (permalink / raw)
  To: joe; +Cc: jiri, netdev, arkadis, idosch, mlxsw
In-Reply-To: <1513527900.31581.27.camel@perches.com>

From: Joe Perches <joe@perches.com>
Date: Sun, 17 Dec 2017 08:25:00 -0800

> On Sun, 2017-12-17 at 17:15 +0100, Jiri Pirko wrote:
>> From: Arkadi Sharshevsky <arkadis@mellanox.com>
>> 
>> Add "spectrum" string prefix macro for error strings.
> []
>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
> []
>> @@ -4168,13 +4168,11 @@ mlxsw_sp_master_lag_check(struct mlxsw_sp *mlxsw_sp,
>>  	u16 lag_id;
>>  
>>  	if (mlxsw_sp_lag_index_get(mlxsw_sp, lag_dev, &lag_id) != 0) {
>> -		NL_SET_ERR_MSG(extack,
>> -			       "spectrum: Exceeded number of supported LAG devices");
>> +		NL_SET_ERR_MSG(extack, MLXSW_SP_PREFIX "Exceeded number of supported LAG devices");
> 
> Perhaps use NL_SET_ERR_MSG_MOD instead.

Yeah that probably makes sense.

^ permalink raw reply

* Re: [RFC PATCH net-next] tools/bpf: fix build with binutils >= 2.28
From: Roman Gushchin @ 2017-12-19 16:10 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: netdev, linux-kernel, kernel-team, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann
In-Reply-To: <79acdc04-bba9-c9d5-a651-57d0e9628653@netronome.com>

On Tue, Dec 19, 2017 at 03:57:02PM +0000, Quentin Monnet wrote:
> Hi Roman, thanks for working on this!
> 
> 2017-12-19 14:38 UTC+0000 ~ Roman Gushchin <guro@fb.com>
> > Bpftool build is broken with binutils version 2.28 and later.
> > The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
> > in the binutils repo, which changed the disassembler() function
> > signature.
> >
> > Fix this by checking binutils version and use an appropriate
> > disassembler() signature.
> >
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Cc: Alexei Starovoitov <ast@kernel.org>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > ---
> >  tools/bpf/Makefile             | 6 ++++++
> >  tools/bpf/bpf_jit_disasm.c     | 7 +++++++
> >  tools/bpf/bpftool/Makefile     | 6 ++++++
> >  tools/bpf/bpftool/jit_disasm.c | 5 +++++
> >  4 files changed, 24 insertions(+)
> >
> > diff --git a/tools/bpf/Makefile b/tools/bpf/Makefile
> > index 07a6697466ef..3fd32fd0daa1 100644
> > --- a/tools/bpf/Makefile
> > +++ b/tools/bpf/Makefile
> > @@ -6,8 +6,14 @@ LEX = flex
> >  YACC = bison
> >  MAKE = make
> >  
> > +BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
> > +BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> > +BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> > +
> >  CFLAGS += -Wall -O2
> >  CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
> > +CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
> > +CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
> >  
> >  %.yacc.c: %.y
> >  	$(YACC) -o $@ -d $<
> > diff --git a/tools/bpf/bpf_jit_disasm.c b/tools/bpf/bpf_jit_disasm.c
> > index 75bf526a0168..3ef7c8bdc0f3 100644
> > --- a/tools/bpf/bpf_jit_disasm.c
> > +++ b/tools/bpf/bpf_jit_disasm.c
> > @@ -72,7 +72,14 @@ static void get_asm_insns(uint8_t *image, size_t len, int opcodes)
> >  
> >  	disassemble_init_for_target(&info);
> >  
> > +#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
> > +	disassemble = disassembler(bfd_get_arch(bfdf),
> > +				   bfd_big_endian(bfdf),
> > +				   bfd_get_mach(bfdf),
> > +				   bfdf);
> > +#else
> >  	disassemble = disassembler(bfdf);
> > +#endif
> >  	assert(disassemble);
> >  
> >  	do {
> > diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
> > index 3f17ad317512..94ad51bf14b5 100644
> > --- a/tools/bpf/bpftool/Makefile
> > +++ b/tools/bpf/bpftool/Makefile
> > @@ -40,6 +40,12 @@ CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow
> >  CFLAGS += -D__EXPORTED_HEADERS__ -I$(srctree)/tools/include/uapi -I$(srctree)/tools/include -I$(srctree)/tools/lib/bpf -I$(srctree)/kernel/bpf/
> >  LIBS = -lelf -lbfd -lopcodes $(LIBBPF)
> >  
> > +BINUTILS_VER := $(word 4, $(shell readelf -v | head -n 1))
> 
> This does not seem to be portable. I tried that on Ubuntu and `readelf
> -v` returns "GNU readelf (GNU Binutils for Ubuntu) 2.26.1", and
> BINUTILS_VER catches "Binutils".
> 
> > +BINUTILS_VER_MAJ := $(word 1, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> > +BINUTILS_VER_MIN := $(word 2, $(subst ., , $(subst -, , ${BINUTILS_VER})))
> > +CFLAGS += -DBINUTILS_VER_MAJ=${BINUTILS_VER_MAJ}
> > +CFLAGS += -DBINUTILS_VER_MIN=${BINUTILS_VER_MIN}
> > +
> >  INSTALL ?= install
> >  RM ?= rm -f
> >  
> > diff --git a/tools/bpf/bpftool/jit_disasm.c b/tools/bpf/bpftool/jit_disasm.c
> > index 1551d3918d4c..eaa7127e9eeb 100644
> > --- a/tools/bpf/bpftool/jit_disasm.c
> > +++ b/tools/bpf/bpftool/jit_disasm.c
> > @@ -107,7 +107,12 @@ void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes)
> >  
> >  	disassemble_init_for_target(&info);
> >  
> > +#if (BINUTILS_VER_MAJ >= 2) && (BINUTILS_VER_MIN >= 28)
> > +	disassemble = disassembler(bfd_get_arch(bfdf), bfd_big_endian(bfdf),
> > +				   bfd_get_mach(bfdf), bfdf);
> > +#else
> >  	disassemble = disassembler(bfdf);
> > +#endif
> >  	assert(disassemble);
> >  
> >  	if (json_output)
> 
> I discussed this issue with Jakub recently, and one suggestion he had
> was to look in tools/build/feature to add a new "feature", by trying to
> compile short programs, for making the distinction between binutils
> versions. It probably requires more work, but could be more robust than
> parsing the version from the command line?

Hm, might be an option. Parsing readelf output is pretty ugly, here I agree.
In general it feels more like a binutils issue, so we have to workaround it
in either way.

Is Jakub or someone else working on it?

Thanks!

^ permalink raw reply

* Re: [PATCH net] vxlan: update skb dst pmtu on tx path
From: David Miller @ 2017-12-19 16:12 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, jbenc
In-Reply-To: <62a0508ac6fde19a62a81f0a5451d39b8a07d722.1513578056.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 18 Dec 2017 14:20:56 +0800

> Unlike ip tunnels, now vxlan doesn't do any pmtu update for
> upper dst pmtu, even if it doesn't match the lower dst pmtu
> any more.
> 
> The problem can be reproduced when reducing the vxlan lower
> dev's pmtu when running netperf. In jianlin's testing, the
> performance went to 1/7 of the previous.
> 
> This patch is to update the upper dst pmtu to match the lower
> dst pmtu on tx path so that packets can be sent out even when
> lower dev's pmtu has been changed.
> 
> It also works for metadata dst.
> 
> Note that this patch doesn't process any pmtu icmp packet.
> But even in the future, the support for pmtu icmp packets
> process of udp tunnels will also needs this.
> 
> The same thing will be done for geneve in another patch.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Yikes...

You're going to have to find a way to fix this without
invoking ->update_pmtu() on every single transmit.  That's
really excessive, especially for an operation which is
going to be a NOP %99.9999 of the time.

We need some way, instead, for the MTU change event to propagate
properly.  I know this might be hard, but doing this in the transmit
handler on every packet to deal with it is not the way to go.

Thanks.

^ permalink raw reply

* Re: [net] Revert "net: core: maybe return -EEXIST in __dev_alloc_name"
From: Johannes Berg @ 2017-12-19 16:15 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: netdev@vger.kernel.org, Jouni Malinen, Rasmus Villemoes,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <87vah29a1m.fsf@concordia.ellerman.id.au>

Hi,

> This revert seems to have broken networking on one of my powerpc
> machines, according to git bisect.

Fun!

TBH, I only looked at the immediate problem we ran into, and reverted
what was causing it. I don't think we saw the follow-up problem you're
seeing.

> The symptom is DHCP fails and I don't get a link, I didn't dig any
> further than that. I can if it's helpful.
> 
> I think the problem is that 87c320e51519 ("net: core: dev_get_valid_name
> is now the same as dev_alloc_name_ns") only makes sense while
> d6f295e9def0 remains in the tree.
> 
> ie. before the entire series, dev_get_valid_name() would return EEXIST,
> and that was retained when 87c320e51519 was merged, but now that
> d6f295e9def0 has been reverted dev_get_valid_name() is returning ENFILE.
> 
> I can get the network up again if I also revert 87c320e51519 ("net:
> core: dev_get_valid_name is now the same as dev_alloc_name_ns"), or with
> the gross patch below.

Makes sense. I guess that should be reverted too then, or even your
"gross" patch applied.

johannes

^ permalink raw reply

* Re: [patch net-next] mlxsw: spectrum: Add "spectrum" prefix macro
From: Jiri Pirko @ 2017-12-19 16:17 UTC (permalink / raw)
  To: David Miller; +Cc: joe, netdev, arkadis, idosch, mlxsw
In-Reply-To: <20171219.110906.560509930655165996.davem@davemloft.net>

Tue, Dec 19, 2017 at 05:09:06PM CET, davem@davemloft.net wrote:
>From: Joe Perches <joe@perches.com>
>Date: Sun, 17 Dec 2017 08:25:00 -0800
>
>> On Sun, 2017-12-17 at 17:15 +0100, Jiri Pirko wrote:
>>> From: Arkadi Sharshevsky <arkadis@mellanox.com>
>>> 
>>> Add "spectrum" string prefix macro for error strings.
>> []
>>> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
>> []
>>> @@ -4168,13 +4168,11 @@ mlxsw_sp_master_lag_check(struct mlxsw_sp *mlxsw_sp,
>>>  	u16 lag_id;
>>>  
>>>  	if (mlxsw_sp_lag_index_get(mlxsw_sp, lag_dev, &lag_id) != 0) {
>>> -		NL_SET_ERR_MSG(extack,
>>> -			       "spectrum: Exceeded number of supported LAG devices");
>>> +		NL_SET_ERR_MSG(extack, MLXSW_SP_PREFIX "Exceeded number of supported LAG devices");
>> 
>> Perhaps use NL_SET_ERR_MSG_MOD instead.
>
>Yeah that probably makes sense.

Yes. Thanks

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox