* Re: tc-ipt v0.2: Extension does not know id 1504083504
From: Corey Hickey @ 2017-09-30 6:52 UTC (permalink / raw)
To: Sergey K., netdev
In-Reply-To: <CAGv8E+wkpsZL8SMzx+oXei6245RU+SacrBcUN7p2RtXcaBCafw@mail.gmail.com>
On 2017-09-29 23:34, Sergey K. wrote:
> Hello.
>
> I have to apply this patch
>
> https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/commit/?h=v4.10.0&id=97a02cabefb2e2dcfe27f89943709afa84be5525
>
> to my version of iproute2.
> But now I have a new information message, when I'm using construction like this:
>
> tc filter add dev eth0 parent ffff: u32 match u32 0 0 action xt -j
> MARK --set-mark 0
>
> message text:
> tc-ipt v0.2: Extension does not know id 1504083504
>
>
> I'm using Debian Stretch, kernel 4.9.0-3-amd64, iptables 1.6.0 and
> patched iproute 4.9.0
>
> How to solve?
Funny, I just ran into this too and subscribed here to report it. The
error occurs during parsing of any options to the jump target; if the
target has no options, there is no error.
The problem seems to be an outdated version of struct xtables_target in
include/xtables.h. The version in iptables has an additional member
"udata" that makes the offsets in the struct different for anything
following.
A quick fix for this particular problem is to copy include/xtables.h from:
git://git.netfilter.org/iptables
...into include/ in the iproute2 source, then recompile after a 'make
clean'.
As for a comprehensive fix, I don't know--presumably other headers in
include/ may be out of date, but I don't want to just blindly send a
patch unless someone who knows the ramifications says it's ok. This
seems like it would need maintainer oversight. If there's something I
can do, though, let me know.
-Corey
^ permalink raw reply
* Re: [PATCH v2] netlink: do not proceed if dump's start() errs
From: Johannes Berg @ 2017-09-30 6:56 UTC (permalink / raw)
To: Jason A. Donenfeld, davem, netdev, linux-kernel
In-Reply-To: <20170927224144.1749-1-Jason@zx2c4.com>
On Thu, 2017-09-28 at 00:41 +0200, Jason A. Donenfeld wrote:
> Drivers that use the start method for netlink dumping rely on dumpit
> not
> being called if start fails. For example, ila_xlat.c allocates memory
> and assigns it to cb->args[0] in its start() function. It might fail
> to
> do that and return -ENOMEM instead. However, even when returning an
> error, dumpit will be called, which, in the example above, quickly
> dereferences the memory in cb->args[0], which will OOPS the kernel.
> This
> is but one example of how this goes wrong.
>
> Since start() has always been a function with an int return type, it
> therefore makes sense to use it properly, rather than ignoring it.
> This
> patch thus returns early and does not call dumpit() when start()
> fails.
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
FWIW, I found another (indirect, via genetlink, like ila_xlat.c) in-
tree user that cares and expects the correct failure behaviour:
net/ipv6/seg6.c
.start = seg6_genl_dumphmac_start,
which can also have memory allocation failures. No others appear to
exist, afaict.
Either way, perhaps it's worth sending this to stable for that reason.
johannes
^ permalink raw reply
* tc-ipt v0.2: Extension does not know id 1504083504
From: Sergey K. @ 2017-09-30 6:34 UTC (permalink / raw)
To: netdev
Hello.
I have to apply this patch
https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git/commit/?h=v4.10.0&id=97a02cabefb2e2dcfe27f89943709afa84be5525
to my version of iproute2.
But now I have a new information message, when I'm using construction like this:
tc filter add dev eth0 parent ffff: u32 match u32 0 0 action xt -j
MARK --set-mark 0
message text:
tc-ipt v0.2: Extension does not know id 1504083504
I'm using Debian Stretch, kernel 4.9.0-3-amd64, iptables 1.6.0 and
patched iproute 4.9.0
How to solve?
^ permalink raw reply
* Re: [PATCH v2] netlink: do not proceed if dump's start() errs
From: David Miller @ 2017-09-30 6:27 UTC (permalink / raw)
To: Jason; +Cc: johannes.berg, netdev, linux-kernel, johannes
In-Reply-To: <20170927224144.1749-1-Jason@zx2c4.com>
From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Thu, 28 Sep 2017 00:41:44 +0200
> Drivers that use the start method for netlink dumping rely on dumpit not
> being called if start fails. For example, ila_xlat.c allocates memory
> and assigns it to cb->args[0] in its start() function. It might fail to
> do that and return -ENOMEM instead. However, even when returning an
> error, dumpit will be called, which, in the example above, quickly
> dereferences the memory in cb->args[0], which will OOPS the kernel. This
> is but one example of how this goes wrong.
>
> Since start() has always been a function with an int return type, it
> therefore makes sense to use it properly, rather than ignoring it. This
> patch thus returns early and does not call dumpit() when start() fails.
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Johannes Berg <johannes@sipsolutions.net>
Johannes, this looks straightforward to me, but please would you give
it a quick review?
Thank you.
^ permalink raw reply
* Re: [PATCH] net: hns3: Fix an error handling path in 'hclge_rss_init_hw()'
From: Yunsheng Lin @ 2017-09-30 6:10 UTC (permalink / raw)
To: Christophe JAILLET, yisen.zhuang, salil.mehta, davem, lipeng321,
colin.king, arnd
Cc: netdev, linux-kernel, kernel-janitors
In-Reply-To: <20170930053434.4558-1-christophe.jaillet@wanadoo.fr>
Hi, Christophe
On 2017/9/30 13:34, Christophe JAILLET wrote:
> If this sanity check fails, we must free 'rss_indir'. Otherwise there is a
> memory leak.
> 'goto err' as done in the other error handling paths to fix it.
Thanks for fixing.
>
> Fixes: 46a3df9f9718 ("net: hns3: Fix for setting rss_size incorrectly")
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
> ---
> drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> index e0685e630afe..c1cdbfd83bdb 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> @@ -2652,7 +2652,8 @@ static int hclge_rss_init_hw(struct hclge_dev *hdev)
> dev_err(&hdev->pdev->dev,
> "Configure rss tc size failed, invalid TC_SIZE = %d\n",
> rss_size);
> - return -EINVAL;
> + ret = -EINVAL;
> + goto err;
> }
>
> roundup_size = roundup_pow_of_two(rss_size);
>
^ permalink raw reply
* [PATCH] net: hns3: Fix an error handling path in 'hclge_rss_init_hw()'
From: Christophe JAILLET @ 2017-09-30 5:34 UTC (permalink / raw)
To: yisen.zhuang, salil.mehta, davem, linyunsheng, lipeng321,
colin.king, arnd
Cc: netdev, linux-kernel, kernel-janitors, Christophe JAILLET
If this sanity check fails, we must free 'rss_indir'. Otherwise there is a
memory leak.
'goto err' as done in the other error handling paths to fix it.
Fixes: 46a3df9f9718 ("net: hns3: Fix for setting rss_size incorrectly")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index e0685e630afe..c1cdbfd83bdb 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2652,7 +2652,8 @@ static int hclge_rss_init_hw(struct hclge_dev *hdev)
dev_err(&hdev->pdev->dev,
"Configure rss tc size failed, invalid TC_SIZE = %d\n",
rss_size);
- return -EINVAL;
+ ret = -EINVAL;
+ goto err;
}
roundup_size = roundup_pow_of_two(rss_size);
--
2.11.0
^ permalink raw reply related
* Re: [PATCH] mkiss: remove redundant check on len being zero
From: David Miller @ 2017-09-30 4:44 UTC (permalink / raw)
To: colin.king
Cc: stephen, johannes.berg, ralf, netdev, kernel-janitors,
linux-kernel
In-Reply-To: <20170927214513.22562-1-colin.king@canonical.com>
From: Colin King <colin.king@canonical.com>
Date: Wed, 27 Sep 2017 22:45:13 +0100
> From: Colin Ian King <colin.king@canonical.com>
>
> The check on len is redundant as it is always greater than 1,
> so just remove it and make the printk less complex.
>
> Detected by CoverityScan, CID#1226729 ("Logically dead code")
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] net-ipv6: add support for sockopt(SOL_IPV6, IPV6_FREEBIND)
From: David Miller @ 2017-09-30 4:31 UTC (permalink / raw)
To: zenczykowski; +Cc: maze, netdev
In-Reply-To: <20170927043242.151591-1-zenczykowski@gmail.com>
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Tue, 26 Sep 2017 21:32:42 -0700
> From: Maciej Żenczykowski <maze@google.com>
>
> So far we've been relying on sockopt(SOL_IP, IP_FREEBIND) being usable
> even on IPv6 sockets.
>
> However, it turns out it is perfectly reasonable to want to set freebind
> on an AF_INET6 SOCK_RAW socket - but there is no way to set any SOL_IP
> socket option on such a socket (they're all blindly errored out).
>
> One use case for this is to allow spoofing src ip on a raw socket
> via sendmsg cmsg.
>
> Tested:
> built, and booted
> # python
> >>> import socket
> >>> SOL_IP = socket.SOL_IP
> >>> SOL_IPV6 = socket.IPPROTO_IPV6
> >>> IP_FREEBIND = 15
> >>> IPV6_FREEBIND = 78
> >>> s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0)
> >>> s.getsockopt(SOL_IP, IP_FREEBIND)
> 0
> >>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND)
> 0
> >>> s.setsockopt(SOL_IPV6, IPV6_FREEBIND, 1)
> >>> s.getsockopt(SOL_IP, IP_FREEBIND)
> 1
> >>> s.getsockopt(SOL_IPV6, IPV6_FREEBIND)
> 1
>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
Applied to net-next, thanks.
^ permalink raw reply
* Re: [PATCH net-next] liquidio: fix format truncation warning reported by gcc 7.1.1
From: David Miller @ 2017-09-30 4:29 UTC (permalink / raw)
To: felix.manlunas; +Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla
In-Reply-To: <20170926184827.GA3512@felix-thinkpad.cavium.com>
From: Felix Manlunas <felix.manlunas@cavium.com>
Date: Tue, 26 Sep 2017 11:48:27 -0700
> From: Satanand Burla <satananda.burla@cavium.com>
>
> gcc 7.1.1 with -Wformat-truncation reports these warnings:
>
> drivers/net/ethernet/cavium/liquidio/lio_core.c: In function `octeon_setup_interrupt':
> drivers/net/ethernet/cavium/liquidio/lio_core.c:1003:41: warning: `%u' directive output may be truncated writing between 1 and 10 bytes into a region of size between 0 and 13 [-Wformat-truncation=]
> INTRNAMSIZ, "LiquidIO%u-pf%u-rxtx-%u",
> ^~
Please just increase INTRNAMSIZ as needed.
Thanks.
^ permalink raw reply
* Re: [PATCH net-next] net: ipv6: send NS for DAD when link operationally up
From: David Miller @ 2017-09-30 4:26 UTC (permalink / raw)
To: mmanning; +Cc: netdev
In-Reply-To: <1506373296-24977-1-git-send-email-mmanning@brocade.com>
From: Mike Manning <mmanning@brocade.com>
Date: Mon, 25 Sep 2017 22:01:36 +0100
> The NS for DAD are sent on admin up as long as a valid qdisc is found.
> A race condition exists by which these packets will not egress the
> interface if the operational state of the lower device is not yet up.
> The solution is to delay DAD until the link is operationally up
> according to RFC2863. Rather than only doing this, follow the existing
> code checks by deferring IPv6 device initialization altogether. The fix
> allows DAD on devices like tunnels that are controlled by userspace
> control plane. The fix has no impact on regular deployments, but means
> that there is no IPv6 connectivity until the port has been opened in
> the case of port-based network access control, which should be
> desirable.
>
> Signed-off-by: Mike Manning <mmanning@brocade.com>
Applied, thank you.
^ permalink raw reply
* Re: [net-next V2 PATCH 5/5] samples/bpf: add cpumap sample program xdp_redirect_cpu
From: Alexei Starovoitov @ 2017-09-30 3:06 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: netdev, jakub.kicinski, Michael S. Tsirkin, Jason Wang, mchan,
John Fastabend, peter.waskiewicz.jr, Daniel Borkmann,
Andy Gospodarek
In-Reply-To: <150670287254.23765.608477053861949304.stgit@firesoul>
On Fri, Sep 29, 2017 at 06:34:32PM +0200, Jesper Dangaard Brouer wrote:
> +
> +static __always_inline
> +u16 get_dest_port_ipv4_udp(struct xdp_md *ctx, u64 nh_off)
> +{
> + void *data_end = (void *)(long)ctx->data_end;
> + void *data = (void *)(long)ctx->data;
> + struct iphdr *iph = data + nh_off;
> + struct udphdr *udph;
> + u16 dport;
formatting is off in the above.
> +
> + if (iph + 1 > data_end)
> + return 0;
> + if (!(iph->protocol == IPPROTO_UDP))
> + return 0;
> +
> + udph = (void *)(iph + 1);
> + if (udph + 1 > data_end)
> + return 0;
> +
> + dport = ntohs(udph->dest);
> + return dport;
> +}
> +
> +static __always_inline
> +int get_proto_ipv4(struct xdp_md *ctx, u64 nh_off)
> +{
> + void *data_end = (void *)(long)ctx->data_end;
> + void *data = (void *)(long)ctx->data;
> + struct iphdr *iph = data + nh_off;
> +
> + if (iph + 1 > data_end)
> + return 0;
> + return iph->protocol;
and here
> +}
> +
> +static __always_inline
> +int get_proto_ipv6(struct xdp_md *ctx, u64 nh_off)
> +{
> + void *data_end = (void *)(long)ctx->data_end;
> + void *data = (void *)(long)ctx->data;
> + struct ipv6hdr *ip6h = data + nh_off;
> +
> + if (ip6h + 1 > data_end)
> + return 0;
and here
> +/*** Trace point code ***/
> +
> +/* Tracepoint format: /sys/kernel/debug/tracing/events/xdp/xdp_redirect/format
> + * Code in: kernel/include/trace/events/xdp.h
> + */
> +struct xdp_redirect_ctx {
> + unsigned short common_type; // offset:0; size:2; signed:0;
> + unsigned char common_flags; // offset:2; size:1; signed:0;
> + unsigned char common_preempt_count;// offset:3; size:1; signed:0;
> + int common_pid; // offset:4; size:4; signed:1;
this part is not right. First 8 bytes are not accessible by bpf code.
Please use __u64 pad; or similar here.
Just noticed that samples/bpf/xdp_monitor_kern.c has the same problem.
> +
> + int prog_id; // offset:8; size:4; signed:1;
> + u32 act; // offset:12 size:4; signed:0;
> + int ifindex; // offset:16 size:4; signed:1;
> + int err; // offset:20 size:4; signed:1;
> + int to_ifindex; // offset:24 size:4; signed:1;
> + u32 map_id; // offset:28 size:4; signed:0;
> + int map_index; // offset:32 size:4; signed:1;
> +}; // offset:36
the second part of fields is correct.
> +/* Tracepoint format: /sys/kernel/debug/tracing/events/xdp/xdp_exception/format
> + * Code in: kernel/include/trace/events/xdp.h
> + */
> +struct xdp_exception_ctx {
> + unsigned short common_type; // offset:0; size:2; signed:0;
> + unsigned char common_flags; // offset:2; size:1; signed:0;
> + unsigned char common_preempt_count;// offset:3; size:1; signed:0;
> + int common_pid; // offset:4; size:4; signed:1;
same problem here.
> + int prog_id; // offset:8; size:4; signed:1;
> + u32 act; // offset:12; size:4; signed:0;
> + int ifindex; // offset:16; size:4; signed:1;
> +};
> +
> +SEC("tracepoint/xdp/xdp_exception")
> +int trace_xdp_exception(struct xdp_exception_ctx *ctx)
> +{
> + struct datarec *rec;
> + u32 key = 0;
> +
> + rec = bpf_map_lookup_elem(&exception_cnt, &key);
> + if (!rec)
> + return 1;
> + rec->dropped += 1;
> +
> + return 0;
> +}
> +
> +/* Tracepoint: /sys/kernel/debug/tracing/events/xdp/xdp_cpumap_enqueue/format
> + * Code in: kernel/include/trace/events/xdp.h
> + */
> +struct cpumap_enqueue_ctx {
> + unsigned short common_type; // offset:0; size:2; signed:0;
> + unsigned char common_flags; // offset:2; size:1; signed:0;
> + unsigned char common_preempt_count;// offset:3; size:1; signed:0;
> + int common_pid; // offset:4; size:4; signed:1;
and here
> +/* Tracepoint: /sys/kernel/debug/tracing/events/xdp/xdp_cpumap_kthread/format
> + * Code in: kernel/include/trace/events/xdp.h
> + */
> +struct cpumap_kthread_ctx {
> + unsigned short common_type; // offset:0; size:2; signed:0;
> + unsigned char common_flags; // offset:2; size:1; signed:0;
> + unsigned char common_preempt_count;// offset:3; size:1; signed:0;
> + int common_pid; // offset:4; size:4; signed:1;
and here.
> +int main(int argc, char **argv)
> +{
> + struct rlimit r = {10 * 1024*1024, RLIM_INFINITY};
extra spaces around * wouldn't hurt
> + /* Parse commands line args */
> + while ((opt = getopt_long(argc, argv, "hSd:",
> + long_options, &longindex)) != -1) {
> + switch (opt) {
> + case 'd':
> + if (strlen(optarg) >= IF_NAMESIZE) {
> + fprintf(stderr, "ERR: --dev name too long\n");
> + goto error;
> + }
> + ifname = (char *)&ifname_buf;
> + strncpy(ifname, optarg, IF_NAMESIZE);
> + ifindex = if_nametoindex(ifname);
> + if (ifindex == 0) {
> + fprintf(stderr,
> + "ERR: --dev name unknown err(%d):%s\n",
> + errno, strerror(errno));
> + goto error;
> + }
> + break;
> + case 's':
> + interval = atoi(optarg);
> + break;
> + case 'S':
> + xdp_flags |= XDP_FLAGS_SKB_MODE;
> + break;
> + case 'D':
> + debug = true;
> + break;
> + case 'z':
> + use_separators = false;
> + break;
> + case 'p':
> + /* Selecting eBPF prog to load */
> + prog_num = atoi(optarg);
> + if (prog_num < 0 || prog_num >= MAX_PROG) {
> + fprintf(stderr,
> + "--prognum too large err(%d):%s\n",
> + errno, strerror(errno));
> + goto error;
> + }
> + break;
> + case 'c':
> + /* Add multiple CPUs */
> + add_cpu = strtoul(optarg, NULL, 0);
> + if (add_cpu > MAX_CPUS) {
> + fprintf(stderr,
> + "--cpu nr too large for cpumap err(%d):%s\n",
> + errno, strerror(errno));
> + goto error;
> + }
> + create_cpu_entry(add_cpu, qsize, added_cpus, true);
> + added_cpus++;
> + break;
> + case 'q':
> + qsize = atoi(optarg);
> + break;
> + case 'h':
> + error:
> + default:
> + usage(argv);
> + return EXIT_FAIL_OPTION;
the rest looks good. Nice that it parses cmdline properly
and even has -h flag :)
^ permalink raw reply
* Re: [PATCH net-next 2/5] bpf: Add map_name to bpf_map_info
From: Jakub Kicinski @ 2017-09-30 2:52 UTC (permalink / raw)
To: Martin KaFai Lau; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20170930023518.bmc6dbpnz27wunod@kafai-mbp>
On Fri, 29 Sep 2017 19:35:18 -0700, Martin KaFai Lau wrote:
> On Sat, Sep 30, 2017 at 02:07:46AM +0000, Jakub Kicinski wrote:
> > Hi Martin!
> >
> > On Wed, 27 Sep 2017 14:37:53 -0700, Martin KaFai Lau wrote:
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 33ccc474fb04..252f4bc9eb25 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -56,6 +56,7 @@ struct bpf_map {
> > > struct work_struct work;
> > > atomic_t usercnt;
> > > struct bpf_map *inner_map_meta;
> > > + u8 name[BPF_OBJ_NAME_LEN];
> >
> > Any reason not to use plain char? I was looking at adding names to
> > bpftool and:
> Happy to have early user :)
The newly exposed info is so useful I couldn't resist :)
> It was mostly due to my early idea on treating the name as
> a blob without much char checking, which was then trashed
> but did not reflect that here.
>
> I will make a followup patch later.
Awesome, thanks!
^ permalink raw reply
* Re: [PATCH net-next 2/5] bpf: Add map_name to bpf_map_info
From: Martin KaFai Lau @ 2017-09-30 2:35 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20170929190746.322df52d@cakuba>
On Sat, Sep 30, 2017 at 02:07:46AM +0000, Jakub Kicinski wrote:
> Hi Martin!
>
> On Wed, 27 Sep 2017 14:37:53 -0700, Martin KaFai Lau wrote:
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 33ccc474fb04..252f4bc9eb25 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -56,6 +56,7 @@ struct bpf_map {
> > struct work_struct work;
> > atomic_t usercnt;
> > struct bpf_map *inner_map_meta;
> > + u8 name[BPF_OBJ_NAME_LEN];
>
> Any reason not to use plain char? I was looking at adding names to
> bpftool and:
Happy to have early user :)
It was mostly due to my early idea on treating the name as
a blob without much char checking, which was then trashed
but did not reflect that here.
I will make a followup patch later.
>
> map.c: In function ‘show_map_close’:
> map.c:386:13: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness [-Wpointer-sign]
> if (strlen(info->name))
> ^~~~
> In file included from map.c:43:0:
> /usr/include/string.h:399:15: note: expected ‘const char *’ but argument is of type ‘__u8 * {aka unsigned char *}’
> extern size_t strlen (const char *__s)
> ^~~~~~
> > };
> >
> > /* function argument constraints */
>
^ permalink raw reply
* Re: [PATCH net-next 2/5] bpf: Add map_name to bpf_map_info
From: Jakub Kicinski @ 2017-09-30 2:07 UTC (permalink / raw)
To: Martin KaFai Lau; +Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20170927213756.1254938-3-kafai@fb.com>
Hi Martin!
On Wed, 27 Sep 2017 14:37:53 -0700, Martin KaFai Lau wrote:
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 33ccc474fb04..252f4bc9eb25 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -56,6 +56,7 @@ struct bpf_map {
> struct work_struct work;
> atomic_t usercnt;
> struct bpf_map *inner_map_meta;
> + u8 name[BPF_OBJ_NAME_LEN];
Any reason not to use plain char? I was looking at adding names to
bpftool and:
map.c: In function ‘show_map_close’:
map.c:386:13: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness [-Wpointer-sign]
if (strlen(info->name))
^~~~
In file included from map.c:43:0:
/usr/include/string.h:399:15: note: expected ‘const char *’ but argument is of type ‘__u8 * {aka unsigned char *}’
extern size_t strlen (const char *__s)
^~~~~~
> };
>
> /* function argument constraints */
^ permalink raw reply
* Re: [PATCH net] RDS: IB: Limit the scope of has_fr/has_fmr variables
From: Santosh Shilimkar @ 2017-09-30 1:38 UTC (permalink / raw)
To: Avinash Repaka, David S. Miller, netdev, linux-rdma, rds-devel,
linux-kernel
In-Reply-To: <1506734030-15205-1-git-send-email-avinash.repaka@oracle.com>
On 9/29/2017 6:13 PM, Avinash Repaka wrote:
> This patch fixes the scope of has_fr and has_fmr variables as they are
> needed only in rds_ib_add_one().
>
> Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
> ---
Indeed the final merge version actually didn't need
those across files. Change looks good to me. Thanks !!
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
^ permalink raw reply
* Re: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
From: Willem de Bruijn @ 2017-09-30 1:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Network Development, David Miller, Jason Wang, Koichiro Den,
virtualization, Willem de Bruijn
In-Reply-To: <20170929223710-mutt-send-email-mst@kernel.org>
On Fri, Sep 29, 2017 at 3:38 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, Sep 27, 2017 at 08:25:56PM -0400, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Vhost-net has a hard limit on the number of zerocopy skbs in flight.
>> When reached, transmission stalls. Stalls cause latency, as well as
>> head-of-line blocking of other flows that do not use zerocopy.
>>
>> Instead of stalling, revert to copy-based transmission.
>>
>> Tested by sending two udp flows from guest to host, one with payload
>> of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
>> large flow is redirected to a netem instance with 1MBps rate limit
>> and deep 1000 entry queue.
>>
>> modprobe ifb
>> ip link set dev ifb0 up
>> tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit
>>
>> tc qdisc add dev tap0 ingress
>> tc filter add dev tap0 parent ffff: protocol ip \
>> u32 match ip dport 8000 0xffff \
>> action mirred egress redirect dev ifb0
>>
>> Before the delay, both flows process around 80K pps. With the delay,
>> before this patch, both process around 400. After this patch, the
>> large flow is still rate limited, while the small reverts to its
>> original rate. See also discussion in the first link, below.
>>
>> The limit in vhost_exceeds_maxpend must be carefully chosen. When
>> vq->num >> 1, the flows remain correlated. This value happens to
>> correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller
>> fractions and ensure correctness also for much smaller values of
>> vq->num, by testing the min() of both explicitly. See also the
>> discussion in the second link below.
>>
>> Link:http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
>> Link:http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>
> I'd like to see the effect on the non rate limited case though.
> If guest is quick won't we have lots of copies then?
Yes, but not significantly more than without this patch.
I ran 1, 10 and 100 flow tcp_stream throughput tests from a sender
in the guest to a receiver in the host.
To answer the other benchmark question first, I did not see anything
noteworthy when increasing vq->num from 256 to 1024.
With 1 and 10 flows without this patch all packets use zerocopy.
With the patch, less than 1% eschews zerocopy.
With 100 flows, even without this patch, 90+% of packets are copied.
Some zerocopy packets from vhost_net fail this test in tun.c
if (iov_iter_npages(&i, INT_MAX) <= MAX_SKB_FRAGS)
Generating packets with up to 21 frags. I'm not sure yet why or
what the fraction of these packets is. But this in turn can
disable zcopy_used in vhost_net_tx_select_zcopy for a
larger share of packets:
return !net->tx_flush &&
net->tx_packets / 64 >= net->tx_zcopy_err;
Because the number of copied and zerocopy packets are the
same before and after the patch, so are the overall throughput
numbers.
^ permalink raw reply
* Re: [PATCH net-next v2 1/7] net: dsa: add master helper to look up slaves
From: Florian Fainelli @ 2017-09-30 1:14 UTC (permalink / raw)
To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170929211921.5571-2-vivien.didelot@savoirfairelinux.com>
On 09/29/2017 02:19 PM, Vivien Didelot wrote:
> The DSA tagging code does not need to know about the DSA architecture,
> it only needs to return the slave device corresponding to the source
> port index (and eventually the source device index for cascade-capable
> switches) parsed from the frame received on the master device.
>
> For this purpose, provide an inline dsa_master_get_slave helper which
> validates the device and port indexes and look up the slave device.
>
> This makes the tagging rcv functions more concise and robust, and also
> makes dsa_get_cpu_port obsolete.
>
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* [PATCH net] RDS: IB: Limit the scope of has_fr/has_fmr variables
From: Avinash Repaka @ 2017-09-30 1:13 UTC (permalink / raw)
To: Santosh Shilimkar, David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
rds-devel-N0ozoZBvEnrZJqsBc5GL+g,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
Cc: Avinash Repaka
This patch fixes the scope of has_fr and has_fmr variables as they are
needed only in rds_ib_add_one().
Signed-off-by: Avinash Repaka <avinash.repaka-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/rds/ib.c | 11 ++++++-----
net/rds/ib.h | 2 --
2 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/net/rds/ib.c b/net/rds/ib.c
index a0954ac..36dd209 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -126,6 +126,7 @@ void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
static void rds_ib_add_one(struct ib_device *device)
{
struct rds_ib_device *rds_ibdev;
+ bool has_fr, has_fmr;
/* Only handle IB (no iWARP) devices */
if (device->node_type != RDMA_NODE_IB_CA)
@@ -143,11 +144,11 @@ static void rds_ib_add_one(struct ib_device *device)
rds_ibdev->max_wrs = device->attrs.max_qp_wr;
rds_ibdev->max_sge = min(device->attrs.max_sge, RDS_IB_MAX_SGE);
- rds_ibdev->has_fr = (device->attrs.device_cap_flags &
- IB_DEVICE_MEM_MGT_EXTENSIONS);
- rds_ibdev->has_fmr = (device->alloc_fmr && device->dealloc_fmr &&
- device->map_phys_fmr && device->unmap_fmr);
- rds_ibdev->use_fastreg = (rds_ibdev->has_fr && !rds_ibdev->has_fmr);
+ has_fr = (device->attrs.device_cap_flags &
+ IB_DEVICE_MEM_MGT_EXTENSIONS);
+ has_fmr = (device->alloc_fmr && device->dealloc_fmr &&
+ device->map_phys_fmr && device->unmap_fmr);
+ rds_ibdev->use_fastreg = (has_fr && !has_fmr);
rds_ibdev->fmr_max_remaps = device->attrs.max_map_per_fmr?: 32;
rds_ibdev->max_1m_mrs = device->attrs.max_mr ?
diff --git a/net/rds/ib.h b/net/rds/ib.h
index bf48224..6ea6a27 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -215,8 +215,6 @@ struct rds_ib_device {
struct list_head conn_list;
struct ib_device *dev;
struct ib_pd *pd;
- bool has_fmr;
- bool has_fr;
bool use_fastreg;
unsigned int max_mrs;
--
2.4.11
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [net-next 05/15] i40e: relax warning message in case of version mismatch
From: Jeff Kirsher @ 2017-09-30 0:44 UTC (permalink / raw)
To: davem; +Cc: Mariusz Stachura, netdev, nhorman, sassmann, jogreene,
Jeff Kirsher
In-Reply-To: <20170930004507.20072-1-jeffrey.t.kirsher@intel.com>
From: Mariusz Stachura <mariusz.stachura@intel.com>
Fortville and Fort Park devices are often on different firmware release
schedules. This change relaxes the minor version warning message,
so it is only displayed for older FW warning version for old
firmware Fortville 3 or earlier.
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3c650917b54f..a887087d08cd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11374,8 +11374,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
hw->aq.api_min_ver > I40E_FW_API_VERSION_MINOR)
dev_info(&pdev->dev,
"The driver for the device detected a newer version of the NVM image than expected. Please install the most recent version of the network driver.\n");
- else if (hw->aq.api_maj_ver < I40E_FW_API_VERSION_MAJOR ||
- hw->aq.api_min_ver < (I40E_FW_API_VERSION_MINOR - 1))
+ else if (hw->aq.api_maj_ver == 1 && hw->aq.api_min_ver < 4)
dev_info(&pdev->dev,
"The driver for the device detected an older version of the NVM image than expected. Please update the NVM image.\n");
--
2.14.1
^ permalink raw reply related
* [net-next 11/15] i40e: prevent service task from running while we're suspended
From: Jeff Kirsher @ 2017-09-30 0:45 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170930004507.20072-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Although the service task does check the suspended status before
running, it might already be part way through running when we go to
suspend. Lets ensure that the service task is stopped and will not be
restarted again until we finish resuming. This ensures that service task
code does not cause strange interactions with the suspend/resume
handlers.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 494cafde6b26..368373459ad5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12065,6 +12065,10 @@ static int i40e_suspend(struct device *dev)
set_bit(__I40E_DOWN, pf->state);
+ /* Ensure service task will not be running */
+ del_timer_sync(&pf->service_timer);
+ cancel_work_sync(&pf->service_task);
+
if (pf->wol_en && (pf->hw_features & I40E_HW_WOL_MC_MAGIC_PKT_WAKE))
i40e_enable_mc_magic_wake(pf);
@@ -12097,6 +12101,10 @@ static int i40e_resume(struct device *dev)
/* Clear suspended state last after everything is recovered */
clear_bit(__I40E_SUSPENDED, pf->state);
+ /* Restart the service task */
+ mod_timer(&pf->service_timer,
+ round_jiffies(jiffies + pf->service_timer_period));
+
return 0;
}
--
2.14.1
^ permalink raw reply related
* [net-next 04/15] i40e: simplify member variable accesses
From: Jeff Kirsher @ 2017-09-30 0:44 UTC (permalink / raw)
To: davem
Cc: Sudheer Mogilappagari, netdev, nhorman, sassmann, jogreene,
Jeff Kirsher
In-Reply-To: <20170930004507.20072-1-jeffrey.t.kirsher@intel.com>
From: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
This commit replaces usage of vsi->back in i40e_print_link_message()
(which is actually a PF pointer) with temp variable.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2e8fe6186b38..3c650917b54f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5346,13 +5346,14 @@ static int i40e_init_pf_dcb(struct i40e_pf *pf)
void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
{
enum i40e_aq_link_speed new_speed;
+ struct i40e_pf *pf = vsi->back;
char *speed = "Unknown";
char *fc = "Unknown";
char *fec = "";
char *req_fec = "";
char *an = "";
- new_speed = vsi->back->hw.phy.link_info.link_speed;
+ new_speed = pf->hw.phy.link_info.link_speed;
if ((vsi->current_isup == isup) && (vsi->current_speed == new_speed))
return;
@@ -5366,13 +5367,13 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
/* Warn user if link speed on NPAR enabled partition is not at
* least 10GB
*/
- if (vsi->back->hw.func_caps.npar_enable &&
- (vsi->back->hw.phy.link_info.link_speed == I40E_LINK_SPEED_1GB ||
- vsi->back->hw.phy.link_info.link_speed == I40E_LINK_SPEED_100MB))
+ if (pf->hw.func_caps.npar_enable &&
+ (pf->hw.phy.link_info.link_speed == I40E_LINK_SPEED_1GB ||
+ pf->hw.phy.link_info.link_speed == I40E_LINK_SPEED_100MB))
netdev_warn(vsi->netdev,
"The partition detected link speed that is less than 10Gbps\n");
- switch (vsi->back->hw.phy.link_info.link_speed) {
+ switch (pf->hw.phy.link_info.link_speed) {
case I40E_LINK_SPEED_40GB:
speed = "40 G";
break;
@@ -5395,7 +5396,7 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
break;
}
- switch (vsi->back->hw.fc.current_mode) {
+ switch (pf->hw.fc.current_mode) {
case I40E_FC_FULL:
fc = "RX/TX";
break;
@@ -5410,18 +5411,18 @@ void i40e_print_link_message(struct i40e_vsi *vsi, bool isup)
break;
}
- if (vsi->back->hw.phy.link_info.link_speed == I40E_LINK_SPEED_25GB) {
+ if (pf->hw.phy.link_info.link_speed == I40E_LINK_SPEED_25GB) {
req_fec = ", Requested FEC: None";
fec = ", FEC: None";
an = ", Autoneg: False";
- if (vsi->back->hw.phy.link_info.an_info & I40E_AQ_AN_COMPLETED)
+ if (pf->hw.phy.link_info.an_info & I40E_AQ_AN_COMPLETED)
an = ", Autoneg: True";
- if (vsi->back->hw.phy.link_info.fec_info &
+ if (pf->hw.phy.link_info.fec_info &
I40E_AQ_CONFIG_FEC_KR_ENA)
fec = ", FEC: CL74 FC-FEC/BASE-R";
- else if (vsi->back->hw.phy.link_info.fec_info &
+ else if (pf->hw.phy.link_info.fec_info &
I40E_AQ_CONFIG_FEC_RS_ENA)
fec = ", FEC: CL108 RS-FEC";
--
2.14.1
^ permalink raw reply related
* [net-next 06/15] i40e: fix for flow director counters not wrapping as expected
From: Jeff Kirsher @ 2017-09-30 0:44 UTC (permalink / raw)
To: davem; +Cc: Mariusz Stachura, netdev, nhorman, sassmann, jogreene,
Jeff Kirsher
In-Reply-To: <20170930004507.20072-1-jeffrey.t.kirsher@intel.com>
From: Mariusz Stachura <mariusz.stachura@intel.com>
An errata with GLQF_PCNT causes it to not wrap as expected. This
can cause an error in flow director statistics. This patch resets
affected counters just after reading.
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 35 +++++++++++++++++++----------
1 file changed, 23 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index a887087d08cd..638f5bad0bd7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -599,6 +599,20 @@ static void i40e_stat_update32(struct i40e_hw *hw, u32 reg,
*stat = (u32)((new_data + BIT_ULL(32)) - *offset);
}
+/**
+ * i40e_stat_update_and_clear32 - read and clear hw reg, update a 32 bit stat
+ * @hw: ptr to the hardware info
+ * @reg: the hw reg to read and clear
+ * @stat: ptr to the stat
+ **/
+static void i40e_stat_update_and_clear32(struct i40e_hw *hw, u32 reg, u64 *stat)
+{
+ u32 new_data = rd32(hw, reg);
+
+ wr32(hw, reg, 1); /* must write a nonzero value to clear register */
+ *stat += new_data;
+}
+
/**
* i40e_update_eth_stats - Update VSI-specific ethernet statistics counters.
* @vsi: the VSI to be updated
@@ -1040,18 +1054,15 @@ static void i40e_update_pf_stats(struct i40e_pf *pf)
&osd->rx_jabber, &nsd->rx_jabber);
/* FDIR stats */
- i40e_stat_update32(hw,
- I40E_GLQF_PCNT(I40E_FD_ATR_STAT_IDX(pf->hw.pf_id)),
- pf->stat_offsets_loaded,
- &osd->fd_atr_match, &nsd->fd_atr_match);
- i40e_stat_update32(hw,
- I40E_GLQF_PCNT(I40E_FD_SB_STAT_IDX(pf->hw.pf_id)),
- pf->stat_offsets_loaded,
- &osd->fd_sb_match, &nsd->fd_sb_match);
- i40e_stat_update32(hw,
- I40E_GLQF_PCNT(I40E_FD_ATR_TUNNEL_STAT_IDX(pf->hw.pf_id)),
- pf->stat_offsets_loaded,
- &osd->fd_atr_tunnel_match, &nsd->fd_atr_tunnel_match);
+ i40e_stat_update_and_clear32(hw,
+ I40E_GLQF_PCNT(I40E_FD_ATR_STAT_IDX(hw->pf_id)),
+ &nsd->fd_atr_match);
+ i40e_stat_update_and_clear32(hw,
+ I40E_GLQF_PCNT(I40E_FD_SB_STAT_IDX(hw->pf_id)),
+ &nsd->fd_sb_match);
+ i40e_stat_update_and_clear32(hw,
+ I40E_GLQF_PCNT(I40E_FD_ATR_TUNNEL_STAT_IDX(hw->pf_id)),
+ &nsd->fd_atr_tunnel_match);
val = rd32(hw, I40E_PRTPM_EEE_STAT);
nsd->tx_lpi_status =
--
2.14.1
^ permalink raw reply related
* [net-next 12/15] i40e: shutdown all IRQs and disable MSI-X when suspended
From: Jeff Kirsher @ 2017-09-30 0:45 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170930004507.20072-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
On some platforms with a large number of CPUs, we will allocate many IRQ
vectors. When hibernating, the system will attempt to migrate all of the
vectors back to CPU0 when shutting down all the other CPUs. It is
possible that we have so many vectors that it cannot re-assign them to
CPU0. This is even more likely if we have many devices installed in one
platform.
The end result is failure to hibernate, as it is not possible to
shutdown the CPUs. We can avoid this by disabling MSI-X and clearing our
interrupt scheme when the device is suspended. A more ideal solution
would be some method for the stack to properly handle this for all
drivers, rather than on a case-by-case basis for each driver to fix
itself.
However, until this more ideal solution exists, we can do our part and
shutdown our IRQs during suspend, which should allow systems with
a large number of CPUs to safely suspend or hibernate.
It may be worth investigating if we should shut down even further when
we suspend as it may make the path cleaner, but this was the minimum fix
for the hibernation issue mentioned here.
Testing-hints:
This affects systems with a large number of CPUs, and with multiple
devices enabled. Without this change, those platforms are unable to
hibernate at all.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 68 ++++++++++++++++++++++++-
drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +-
2 files changed, 68 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 368373459ad5..8a44793d5390 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8354,6 +8354,57 @@ static int i40e_init_interrupt_scheme(struct i40e_pf *pf)
return 0;
}
+#ifdef CONFIG_PM
+/**
+ * i40e_restore_interrupt_scheme - Restore the interrupt scheme
+ * @pf: private board data structure
+ *
+ * Restore the interrupt scheme that was cleared when we suspended the
+ * device. This should be called during resume to re-allocate the q_vectors
+ * and reacquire IRQs.
+ */
+static int i40e_restore_interrupt_scheme(struct i40e_pf *pf)
+{
+ int err, i;
+
+ /* We cleared the MSI and MSI-X flags when disabling the old interrupt
+ * scheme. We need to re-enabled them here in order to attempt to
+ * re-acquire the MSI or MSI-X vectors
+ */
+ pf->flags |= (I40E_FLAG_MSIX_ENABLED | I40E_FLAG_MSI_ENABLED);
+
+ err = i40e_init_interrupt_scheme(pf);
+ if (err)
+ return err;
+
+ /* Now that we've re-acquired IRQs, we need to remap the vectors and
+ * rings together again.
+ */
+ for (i = 0; i < pf->num_alloc_vsi; i++) {
+ if (pf->vsi[i]) {
+ err = i40e_vsi_alloc_q_vectors(pf->vsi[i]);
+ if (err)
+ goto err_unwind;
+ i40e_vsi_map_rings_to_vectors(pf->vsi[i]);
+ }
+ }
+
+ err = i40e_setup_misc_vector(pf);
+ if (err)
+ goto err_unwind;
+
+ return 0;
+
+err_unwind:
+ while (i--) {
+ if (pf->vsi[i])
+ i40e_vsi_free_q_vectors(pf->vsi[i]);
+ }
+
+ return err;
+}
+#endif /* CONFIG_PM */
+
/**
* i40e_setup_misc_vector - Setup the misc vector to handle non queue events
* @pf: board private structure
@@ -12077,7 +12128,12 @@ static int i40e_suspend(struct device *dev)
wr32(hw, I40E_PFPM_APM, (pf->wol_en ? I40E_PFPM_APM_APME_MASK : 0));
wr32(hw, I40E_PFPM_WUFC, (pf->wol_en ? I40E_PFPM_WUFC_MAG_MASK : 0));
- i40e_free_misc_vector(pf);
+ /* Clear the interrupt scheme and release our IRQs so that the system
+ * can safely hibernate even when there are a large number of CPUs.
+ * Otherwise hibernation might fail when mapping all the vectors back
+ * to CPU0.
+ */
+ i40e_clear_interrupt_scheme(pf);
return 0;
}
@@ -12090,11 +12146,21 @@ static int i40e_resume(struct device *dev)
{
struct pci_dev *pdev = to_pci_dev(dev);
struct i40e_pf *pf = pci_get_drvdata(pdev);
+ int err;
/* If we're not suspended, then there is nothing to do */
if (!test_bit(__I40E_SUSPENDED, pf->state))
return 0;
+ /* We cleared the interrupt scheme when we suspended, so we need to
+ * restore it now to resume device functionality.
+ */
+ err = i40e_restore_interrupt_scheme(pf);
+ if (err) {
+ dev_err(&pdev->dev, "Cannot restore interrupt scheme: %d\n",
+ err);
+ }
+
clear_bit(__I40E_DOWN, pf->state);
i40e_reset_and_rebuild(pf, false, false);
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index c243f9da95ae..80ade6510279 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -46,7 +46,7 @@ static const char i40evf_driver_string[] =
#define DRV_VERSION_MAJOR 3
#define DRV_VERSION_MINOR 0
-#define DRV_VERSION_BUILD 0
+#define DRV_VERSION_BUILD 1
#define DRV_VERSION __stringify(DRV_VERSION_MAJOR) "." \
__stringify(DRV_VERSION_MINOR) "." \
__stringify(DRV_VERSION_BUILD) \
--
2.14.1
^ permalink raw reply related
* [net-next 15/15] i40e: refactor FW version checking
From: Jeff Kirsher @ 2017-09-30 0:45 UTC (permalink / raw)
To: davem; +Cc: Mitch Williams, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170930004507.20072-1-jeffrey.t.kirsher@intel.com>
From: Mitch Williams <mitch.a.williams@intel.com>
The i40e driver now supports two different devices with two different
firmware versions. So be smart about how we handle these. Move the FW
version macros to the appropriate header file, and add a convenience
macro that checks the version based on the device. Then use this macro
to check whether or not the driver can use the new link info API.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h | 10 +++++++++-
drivers/net/ethernet/intel/i40e/i40e_common.c | 6 ++++--
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h | 10 +++++++++-
4 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 5d5f422cbae5..e2a9ec80a623 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -34,7 +34,15 @@
*/
#define I40E_FW_API_VERSION_MAJOR 0x0001
-#define I40E_FW_API_VERSION_MINOR 0x0005
+#define I40E_FW_API_VERSION_MINOR_X722 0x0005
+#define I40E_FW_API_VERSION_MINOR_X710 0x0007
+
+#define I40E_FW_MINOR_VERSION(_h) ((_h)->mac.type == I40E_MAC_XL710 ? \
+ I40E_FW_API_VERSION_MINOR_X710 : \
+ I40E_FW_API_VERSION_MINOR_X722)
+
+/* API version 1.7 implements additional link and PHY-specific APIs */
+#define I40E_MINOR_VER_GET_LINK_INFO_XL710 0x0007
struct i40e_aq_desc {
__le16 flags;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 111426ba5fbc..7346d8850c8e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -1593,8 +1593,10 @@ i40e_status i40e_aq_get_phy_capabilities(struct i40e_hw *hw,
status = I40E_ERR_UNKNOWN_PHY;
if (report_init) {
- hw->phy.phy_types = le32_to_cpu(abilities->phy_type);
- hw->phy.phy_types |= ((u64)abilities->phy_type_ext << 32);
+ if (hw->mac.type == I40E_MAC_XL710 &&
+ hw->aq.api_maj_ver == I40E_FW_API_VERSION_MAJOR &&
+ hw->aq.api_min_ver >= I40E_MINOR_VER_GET_LINK_INFO_XL710)
+ status = i40e_aq_get_link_info(hw, true, NULL, NULL);
}
return status;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 8a44793d5390..47f71d7c3ae0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11434,7 +11434,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
i40e_nvm_version_str(hw));
if (hw->aq.api_maj_ver == I40E_FW_API_VERSION_MAJOR &&
- hw->aq.api_min_ver > I40E_FW_API_VERSION_MINOR)
+ hw->aq.api_min_ver > I40E_FW_MINOR_VERSION(hw))
dev_info(&pdev->dev,
"The driver for the device detected a newer version of the NVM image than expected. Please install the most recent version of the network driver.\n");
else if (hw->aq.api_maj_ver == 1 && hw->aq.api_min_ver < 4)
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
index 83e63e55c4b4..f9f48d1900b0 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h
@@ -34,7 +34,15 @@
*/
#define I40E_FW_API_VERSION_MAJOR 0x0001
-#define I40E_FW_API_VERSION_MINOR 0x0005
+#define I40E_FW_API_VERSION_MINOR_X722 0x0005
+#define I40E_FW_API_VERSION_MINOR_X710 0x0007
+
+#define I40E_FW_MINOR_VERSION(_h) ((_h)->mac.type == I40E_MAC_XL710 ? \
+ I40E_FW_API_VERSION_MINOR_X710 : \
+ I40E_FW_API_VERSION_MINOR_X722)
+
+/* API version 1.7 implements additional link and PHY-specific APIs */
+#define I40E_MINOR_VER_GET_LINK_INFO_XL710 0x0007
struct i40e_aq_desc {
__le16 flags;
--
2.14.1
^ permalink raw reply related
* [net-next 14/15] i40e: Enable VF to negotiate number of allocated queues
From: Jeff Kirsher @ 2017-09-30 0:45 UTC (permalink / raw)
To: davem; +Cc: Alan Brady, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20170930004507.20072-1-jeffrey.t.kirsher@intel.com>
From: Alan Brady <alan.brady@intel.com>
Currently the PF allocates a default number of queues for each VF and
cannot be changed. This patch enables the VF to request a different
number of queues allocated to it. This patch also adds a new virtchnl
op and capability flag to facilitate this negotiation.
After the PF receives a request message, it will set a requested number
of queues for that VF. Then when the VF resets, its VSI will get a new
number of queues allocated to it.
This is a best effort request and since we only allocate a guaranteed
default number, if the VF tries to ask for more than the guaranteed
number, there may not be enough in HW to accommodate it unless other
queues for other VFs are freed. It should also be noted decreasing the
number queues allocated to a VF to below the default will NOT enable the
allocation of more than 32 VFs per PF and will not free queues guaranteed
to each VF by default.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e.h | 1 +
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 75 ++++++++++++++++++++++
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h | 1 +
include/linux/avf/virtchnl.h | 20 ++++++
4 files changed, 97 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index b7a539cdca00..439c63cb2a0c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -77,6 +77,7 @@
#define i40e_default_queues_per_vmdq(pf) \
(((pf)->hw_features & I40E_HW_RSS_AQ_CAPABLE) ? 4 : 1)
#define I40E_DEFAULT_QUEUES_PER_VF 4
+#define I40E_MAX_VF_QUEUES 16
#define I40E_DEFAULT_QUEUES_PER_TC 1 /* should be a power of 2 */
#define i40e_pf_get_max_q_per_tc(pf) \
(((pf)->hw_features & I40E_HW_128_QP_RSS_CAPABLE) ? 128 : 64)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 4d1e670f490e..a75396c157d9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -815,6 +815,14 @@ static void i40e_free_vf_res(struct i40e_vf *vf)
*/
clear_bit(I40E_VF_STATE_INIT, &vf->vf_states);
+ /* It's possible the VF had requeuested more queues than the default so
+ * do the accounting here when we're about to free them.
+ */
+ if (vf->num_queue_pairs > I40E_DEFAULT_QUEUES_PER_VF) {
+ pf->queues_left += vf->num_queue_pairs -
+ I40E_DEFAULT_QUEUES_PER_VF;
+ }
+
/* free vsi & disconnect it from the parent uplink */
if (vf->lan_vsi_idx) {
i40e_vsi_release(pf->vsi[vf->lan_vsi_idx]);
@@ -868,12 +876,27 @@ static int i40e_alloc_vf_res(struct i40e_vf *vf)
int total_queue_pairs = 0;
int ret;
+ if (vf->num_req_queues &&
+ vf->num_req_queues <= pf->queues_left + I40E_DEFAULT_QUEUES_PER_VF)
+ pf->num_vf_qps = vf->num_req_queues;
+ else
+ pf->num_vf_qps = I40E_DEFAULT_QUEUES_PER_VF;
+
/* allocate hw vsi context & associated resources */
ret = i40e_alloc_vsi_res(vf, I40E_VSI_SRIOV);
if (ret)
goto error_alloc;
total_queue_pairs += pf->vsi[vf->lan_vsi_idx]->alloc_queue_pairs;
+ /* We account for each VF to get a default number of queue pairs. If
+ * the VF has now requested more, we need to account for that to make
+ * certain we never request more queues than we actually have left in
+ * HW.
+ */
+ if (total_queue_pairs > I40E_DEFAULT_QUEUES_PER_VF)
+ pf->queues_left -=
+ total_queue_pairs - I40E_DEFAULT_QUEUES_PER_VF;
+
if (vf->trusted)
set_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);
else
@@ -1579,6 +1602,9 @@ static int i40e_vc_get_vf_resources_msg(struct i40e_vf *vf, u8 *msg)
VIRTCHNL_VF_OFFLOAD_WB_ON_ITR;
}
+ if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_REQ_QUEUES)
+ vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_REQ_QUEUES;
+
vfres->num_vsis = num_vsis;
vfres->num_queue_pairs = vf->num_queue_pairs;
vfres->max_vectors = pf->hw.func_caps.num_msix_vectors_vf;
@@ -1986,6 +2012,52 @@ static int i40e_vc_disable_queues_msg(struct i40e_vf *vf, u8 *msg, u16 msglen)
aq_ret);
}
+/**
+ * i40e_vc_request_queues_msg
+ * @vf: pointer to the VF info
+ * @msg: pointer to the msg buffer
+ * @msglen: msg length
+ *
+ * VFs get a default number of queues but can use this message to request a
+ * different number. Will respond with either the number requested or the
+ * maximum we can support.
+ **/
+static int i40e_vc_request_queues_msg(struct i40e_vf *vf, u8 *msg, int msglen)
+{
+ struct virtchnl_vf_res_request *vfres =
+ (struct virtchnl_vf_res_request *)msg;
+ int req_pairs = vfres->num_queue_pairs;
+ int cur_pairs = vf->num_queue_pairs;
+ struct i40e_pf *pf = vf->pf;
+
+ if (!test_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states))
+ return -EINVAL;
+
+ if (req_pairs <= 0) {
+ dev_err(&pf->pdev->dev,
+ "VF %d tried to request %d queues. Ignoring.\n",
+ vf->vf_id, req_pairs);
+ } else if (req_pairs > I40E_MAX_VF_QUEUES) {
+ dev_err(&pf->pdev->dev,
+ "VF %d tried to request more than %d queues.\n",
+ vf->vf_id,
+ I40E_MAX_VF_QUEUES);
+ vfres->num_queue_pairs = I40E_MAX_VF_QUEUES;
+ } else if (req_pairs - cur_pairs > pf->queues_left) {
+ dev_warn(&pf->pdev->dev,
+ "VF %d requested %d more queues, but only %d left.\n",
+ vf->vf_id,
+ req_pairs - cur_pairs,
+ pf->queues_left);
+ vfres->num_queue_pairs = pf->queues_left + cur_pairs;
+ } else {
+ vf->num_req_queues = req_pairs;
+ }
+
+ return i40e_vc_send_msg_to_vf(vf, VIRTCHNL_OP_REQUEST_QUEUES, 0,
+ (u8 *)vfres, sizeof(vfres));
+}
+
/**
* i40e_vc_get_stats_msg
* @vf: pointer to the VF info
@@ -2708,6 +2780,9 @@ int i40e_vc_process_vf_msg(struct i40e_pf *pf, s16 vf_id, u32 v_opcode,
case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING:
ret = i40e_vc_disable_vlan_stripping(vf, msg, msglen);
break;
+ case VIRTCHNL_OP_REQUEST_QUEUES:
+ ret = i40e_vc_request_queues_msg(vf, msg, msglen);
+ break;
case VIRTCHNL_OP_UNKNOWN:
default:
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
index 1f4b0c504368..5111d05d5f2f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.h
@@ -97,6 +97,7 @@ struct i40e_vf {
u16 lan_vsi_id; /* ID as used by firmware */
u8 num_queue_pairs; /* num of qps assigned to VF vsis */
+ u8 num_req_queues; /* num of requested qps */
u64 num_mdd_events; /* num of mdd events detected */
/* num of continuous malformed or invalid msgs detected */
u64 num_invalid_msgs;
diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index 2b038442c352..60e5d90cb18a 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -135,6 +135,7 @@ enum virtchnl_ops {
VIRTCHNL_OP_SET_RSS_HENA = 26,
VIRTCHNL_OP_ENABLE_VLAN_STRIPPING = 27,
VIRTCHNL_OP_DISABLE_VLAN_STRIPPING = 28,
+ VIRTCHNL_OP_REQUEST_QUEUES = 29,
};
/* This macro is used to generate a compilation error if a structure
@@ -235,6 +236,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
#define VIRTCHNL_VF_OFFLOAD_RSS_AQ 0x00000008
#define VIRTCHNL_VF_OFFLOAD_RSS_REG 0x00000010
#define VIRTCHNL_VF_OFFLOAD_WB_ON_ITR 0x00000020
+#define VIRTCHNL_VF_OFFLOAD_REQ_QUEUES 0x00000040
#define VIRTCHNL_VF_OFFLOAD_VLAN 0x00010000
#define VIRTCHNL_VF_OFFLOAD_RX_POLLING 0x00020000
#define VIRTCHNL_VF_OFFLOAD_RSS_PCTYPE_V2 0x00040000
@@ -325,6 +327,21 @@ struct virtchnl_vsi_queue_config_info {
struct virtchnl_queue_pair_info qpair[1];
};
+/* VIRTCHNL_OP_REQUEST_QUEUES
+ * VF sends this message to request the PF to allocate additional queues to
+ * this VF. Each VF gets a guaranteed number of queues on init but asking for
+ * additional queues must be negotiated. This is a best effort request as it
+ * is possible the PF does not have enough queues left to support the request.
+ * If the PF cannot support the number requested it will respond with the
+ * maximum number it is able to support; otherwise it will respond with the
+ * number requested.
+ */
+
+/* VF resource request */
+struct virtchnl_vf_res_request {
+ u16 num_queue_pairs;
+};
+
VIRTCHNL_CHECK_STRUCT_LEN(72, virtchnl_vsi_queue_config_info);
/* VIRTCHNL_OP_CONFIG_IRQ_MAP
@@ -691,6 +708,9 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING:
case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING:
break;
+ case VIRTCHNL_OP_REQUEST_QUEUES:
+ valid_len = sizeof(struct virtchnl_vf_res_request);
+ break;
/* These are always errors coming from the VF. */
case VIRTCHNL_OP_EVENT:
case VIRTCHNL_OP_UNKNOWN:
--
2.14.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox