* Re: [RFC v2 bpf-next 8/9] bpf: Provide helper to do lookups in kernel FIB table
From: David Ahern @ 2018-04-30 1:13 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: netdev, borkmann, ast, davem, shm, roopa, brouer, toke,
john.fastabend
In-Reply-To: <20180429233640.jklasxafvap2q7ig@ast-mbp>
On 4/29/18 5:36 PM, Alexei Starovoitov wrote:
>> + if (flags & BPF_FIB_LOOKUP_DIRECT) {
>> + u32 tbid = l3mdev_fib_table_rcu(dev) ? : RT_TABLE_MAIN;
>> + struct fib_table *tb;
>> +
>> + tb = fib_get_table(net, tbid);
>> + if (unlikely(!tb))
>> + return 0;
>> +
>> + err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
>> + } else {
>> + fl4.flowi4_mark = 0;
>> + fl4.flowi4_secid = 0;
>> + fl4.flowi4_tun_key.tun_id = 0;
>> + fl4.flowi4_uid = sock_net_uid(net, NULL);
>> +
>> + err = fib_lookup(net, &fl4, &res, FIB_LOOKUP_NOREF);
>> + }
>> +
>> + if (err || res.type != RTN_UNICAST)
>> + return 0;
>
> should this be an error returned to the user instead of zero?
> Seems useful to indicate.
res.type != RTN_UNICAST is not an error; it means some other delivery type
(e.g., local).
err < 0 means unreachable, prohibit, blackhole, etc. Arguably the error
could be returned to the xdp program, but it is more complicated than
that. Blackhole is a common default route or policy, but a lookup that
hits an RTN_BLACKHOLE route returns -EINVAL, which is also the error code
if the user passes invalid arguments to the program.
>
>> +
>> + if (res.fi->fib_nhs > 1)
>> + fib_select_path(net, &res, &fl4, NULL);
>> +
>> + nh = &res.fi->fib_nh[res.nh_sel];
>> +
>> + /* do not handle lwt encaps right now */
>> + if (nh->nh_lwtstate)
>> + return 0;
>
> and return enotsupp here?
see below
>
>> +
>> + dev = nh->nh_dev;
>> + if (unlikely(!dev))
>> + return 0;
>
> enodev ?
see below
>
>> +
>> + if (nh->nh_gw)
>> + params->ipv4_dst = nh->nh_gw;
>> +
>> + params->rt_metric = res.fi->fib_priority;
>> +
>> + /* xdp and cls_bpf programs are run in RCU-bh so
>> + * rcu_read_lock_bh is not needed here
>> + */
>> + neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst);
>> + if (neigh)
>> + return bpf_fib_set_fwd_params(params, neigh, dev);
>> +
>> + return 0;
>
> Even this return 0 doesn't quite fit to what doc says:
> "0 if packet needs to continue up the stack for further processing"
> What is the stack supposed to do?
For the first packet on a route, the nexthop may not be resolved yet.
Without punting to the stack, there is never an impetus to resolve that
neighbor.
> It will hit the same condition and packet will be dropped, right?
No. The stack can resolve the neighbor so that follow-up packets can be
forwarded in the fast path.
> Isn't it better to report all errors back to bpf prog and let
> the program make decision instead of 'return 0' almost everywhere?
The idea here is to fast-path packets that fit a supported profile and
are to be forwarded. Everything else should continue up the stack as it
has wider capabilities. The helper and XDP programs should make no
assumptions about what the broader kernel and userspace might be monitoring
or want to do with packets that cannot be forwarded in the fast path.
This is very similar to hardware forwarding when it punts packets to the
CPU for control plane assistance.
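A minimal sketch of the punt-by-default policy described above, written
as plain C rather than as a real XDP program. The return convention
(positive = egress ifindex, 0 = punt, negative = bad arguments) and the
verdict names are assumptions made for illustration; they are not taken
from the patch.

```c
/* Hypothetical verdicts standing in for XDP_PASS / XDP_REDIRECT. */
enum verdict {
	VERDICT_PASS_TO_STACK,
	VERDICT_REDIRECT,
};

/*
 * lookup_ret follows an assumed convention for the FIB lookup helper:
 *   > 0   egress ifindex; forwarding parameters were filled in
 *   == 0  not forwardable in the fast path (local delivery,
 *         unresolved neighbor, lwt encap, blackhole, ...)
 *   < 0   the program passed invalid arguments
 */
static enum verdict fast_path_verdict(int lookup_ret)
{
	if (lookup_ret > 0)
		return VERDICT_REDIRECT;

	/*
	 * Everything else goes up the stack, which can resolve the
	 * neighbor, deliver locally, or apply policy. This sketch also
	 * punts on argument errors instead of dropping the packet.
	 */
	return VERDICT_PASS_TO_STACK;
}
```

Only a positive result takes the fast path; the program never decides on
its own to drop a packet the helper could not forward.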
^ permalink raw reply
* [PATCH] ethtool: fix a potential missing-check bug
From: Wenwen Wang @ 2018-04-30 1:31 UTC (permalink / raw)
To: Wenwen Wang
Cc: Kangjie Lu, David S. Miller, Florian Fainelli, Andrew Lunn,
Russell King, Edward Cree, Inbar Karmy, Eugenia Emantayev,
Al Viro, Yury Norov, Vidya Sagar Ravipati, Alan Brady,
Stephen Hemminger, open list:NETWORKING [GENERAL], open list
In ethtool_get_rxnfc(), the object "info" is first copied from
user-space. If the FLOW_RSS flag is set in the member field flow_type of
"info" (and cmd is ETHTOOL_GRXFH), info needs to be copied again from
user-space because FLOW_RSS is newer and has a new definition, as mentioned
in the comment. However, given that the user data resides in user-space, a
malicious user can race to change the data after the first copy. By doing
so, the user can inject inconsistent data. For example, in the second
copy, the FLOW_RSS flag could be cleared in the field flow_type of "info".
In the following execution, "info" will be used in the function
ops->get_rxnfc(). Such inconsistent data can potentially lead to unexpected
information leakage since ops->get_rxnfc() will prepare various types of
data according to flow_type, and the prepared data will be eventually
copied to user-space. This inconsistent data may also cause undefined
behaviors based on how ops->get_rxnfc() is implemented.
This patch re-verifies the flow_type field of "info" after the second copy.
If the value is not as expected, an error code will be returned.
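The double-fetch pattern and the re-check can be sketched in user-space C
as follows. This is only an illustration: struct rxnfc_info,
fake_copy_from_user() and the racer callback are hypothetical stand-ins,
and the FLOW_RSS value is used purely as an example bit.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define FLOW_RSS 0x20000000u	/* example flag bit for this sketch */

/* Hypothetical, much reduced stand-in for the real structure. */
struct rxnfc_info {
	unsigned int cmd;
	unsigned int flow_type;
	unsigned int rss_context;	/* only present in the newer layout */
};

/* Stand-in for copy_from_user(); returns 0 on success. */
static int fake_copy_from_user(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/*
 * Fetch "info" twice, as the commit message describes. The racer
 * callback (may be NULL) runs between the two fetches and mimics a
 * user thread modifying the buffer concurrently.
 */
static int fetch_info(struct rxnfc_info *info, struct rxnfc_info *user,
		      void (*racer)(struct rxnfc_info *))
{
	size_t legacy_size = offsetof(struct rxnfc_info, rss_context);

	memset(info, 0, sizeof(*info));
	if (fake_copy_from_user(info, user, legacy_size))
		return -EFAULT;

	if (info->flow_type & FLOW_RSS) {
		if (racer)
			racer(user);
		if (fake_copy_from_user(info, user, sizeof(*info)))
			return -EFAULT;
		/* Re-verify: the flag that justified the second fetch
		 * must still be set in the data that was fetched.
		 */
		if (!(info->flow_type & FLOW_RSS))
			return -EINVAL;
	}
	return 0;
}

/* Example racer: clears the flag after the first fetch. */
static void clear_rss_flag(struct rxnfc_info *user)
{
	user->flow_type &= ~FLOW_RSS;
}
```

With a NULL racer the second fetch succeeds; with clear_rss_flag() the
re-check rejects the inconsistent copy with -EINVAL.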
Signed-off-by: Wenwen Wang <wang6495@umn.edu>
---
net/core/ethtool.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 03416e6..a121034 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1032,6 +1032,8 @@ static noinline_for_stack int ethtool_get_rxnfc(struct net_device *dev,
info_size = sizeof(info);
if (copy_from_user(&info, useraddr, info_size))
return -EFAULT;
+ if (!(info.flow_type & FLOW_RSS))
+ return -EINVAL;
}
if (info.cmd == ETHTOOL_GRXCLSRLALL) {
--
2.7.4
^ permalink raw reply related
* Re: [PATCH v4 net-next 0/2] tcp: mmap: rework zerocopy receive
From: David Miller @ 2018-04-30 1:34 UTC (permalink / raw)
To: edumazet; +Cc: netdev, luto, linux-kernel, linux-mm, ka-cheong.poon,
eric.dumazet
In-Reply-To: <20180427155809.79094-1-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Fri, 27 Apr 2018 08:58:07 -0700
> syzbot reported a lockdep issue caused by tcp mmap() support.
>
> I implemented Andy Lutomirski's nice suggestions to resolve the
> issue and increase scalability as well.
>
> First patch is adding a new getsockopt() operation and changes mmap()
> behavior.
>
> Second patch changes tcp_mmap reference program.
>
> v4: tcp mmap() support depends on CONFIG_MMU, as kbuild bot told us.
>
> v3: change TCP_ZEROCOPY_RECEIVE to be a getsockopt() option
> instead of setsockopt(), feedback from Ka-Cheong Poon
>
> v2: Added a missing page align of zc->length in tcp_zerocopy_receive()
> Properly clear zc->recv_skip_hint in case user request was completed.
Looks great, series applied, thanks Eric.
^ permalink raw reply
* Re: [PATCH] vhost: make msg padding explicit
From: David Miller @ 2018-04-30 1:34 UTC (permalink / raw)
To: mst; +Cc: linux-kernel, kevin, jasowang, kvm, virtualization, netdev
In-Reply-To: <1524844881-178524-1-git-send-email-mst@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 27 Apr 2018 19:02:05 +0300
> There's a 32 bit hole just after type. It's best to
> give it a name, this way compiler is forced to initialize
> it with rest of the structure.
>
> Reported-by: Kevin Easton <kevin@guarana.org>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Who applied this, me? :-)
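For readers unfamiliar with the padding issue in the quoted commit
message, here is a small sketch. The two structs are hypothetical and
are not the real vhost message layout; they only show how naming the
hole removes compiler-inserted padding after a 32-bit field.

```c
#include <stddef.h>
#include <stdint.h>

/* On ABIs that align uint64_t to 8 bytes, the compiler inserts a
 * 4-byte hole after "type". Its contents are unspecified.
 */
struct msg_implicit {
	int32_t  type;
	uint64_t payload;
};

/* Naming the hole makes it an ordinary member, so initializers and
 * assignments of the whole struct cover it.
 */
struct msg_explicit {
	int32_t  type;
	uint32_t padding;
	uint64_t payload;
};

/* Bytes of unnamed padding between "type" and "payload". */
static size_t hole_after_type_implicit(void)
{
	return offsetof(struct msg_implicit, payload) - sizeof(int32_t);
}

/* Bytes of unnamed padding between "padding" and "payload". */
static size_t hole_after_type_explicit(void)
{
	return offsetof(struct msg_explicit, payload) -
	       (offsetof(struct msg_explicit, padding) + sizeof(uint32_t));
}
```

The explicit variant has no unnamed bytes before payload on any ABI with
a 4-byte int32_t, which is what keeps stale data from leaking when the
struct is copied out.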
^ permalink raw reply
* Re: [PATCH v6 net-next 0/3] lan78xx updates along with Fixed phy Support
From: David Miller @ 2018-04-30 1:41 UTC (permalink / raw)
To: raghuramchary.jallipalli; +Cc: netdev, unglinuxdriver, woojung.huh
In-Reply-To: <20180428060316.31396-1-raghuramchary.jallipalli@microchip.com>
From: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Date: Sat, 28 Apr 2018 11:33:13 +0530
> This series of patches handles a few modifications in the driver
> and adds support for fixed phy.
Series applied, thank you.
^ permalink raw reply
* [PATCH net] tcp: fix TCP_REPAIR_QUEUE bound checking
From: Eric Dumazet @ 2018-04-30 1:55 UTC (permalink / raw)
To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet, Pavel Emelyanov
syzbot is able to produce a nasty WARN_ON() in tcp_verify_left_out()
with the following C-repro:
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0
setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [-1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(20002), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
sendto(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
1242, MSG_FASTOPEN, {sa_family=AF_INET, sin_port=htons(20002), sin_addr=inet_addr("127.0.0.1")}, 16) = 1242
setsockopt(3, SOL_TCP, TCP_REPAIR_WINDOW, "\4\0\0@+\205\0\0\377\377\0\0\377\377\377\177\0\0\0\0", 20) = 0
writev(3, [{"\270", 1}], 1) = 1
setsockopt(3, SOL_TCP, TCP_REPAIR_OPTIONS, "\10\0\0\0\0\0\0\0\0\0\0\0|\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 386) = 0
writev(3, [{"\210v\r[\226\320t\231qwQ\204\264l\254\t\1\20\245\214p\350H\223\254;\\\37\345\307p$"..., 3144}], 1) = 3144
The 3rd system call looks odd :
setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [-1], 4) = 0
This patch makes sure bound checking is using an unsigned compare.
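Why the cast matters can be shown with a few lines of standalone C.
QUEUES_NR is a hypothetical stand-in for the kernel constant and its
value here is only illustrative.

```c
#define QUEUES_NR 3	/* illustrative upper bound, not from the patch */

/* Signed compare: any negative value passes the bound check. */
static int signed_check(int val)
{
	return val < QUEUES_NR;
}

/* Unsigned compare: a negative value converts to a very large
 * unsigned number and is rejected.
 */
static int unsigned_check(int val)
{
	return (unsigned int)val < QUEUES_NR;
}
```

With val == -1, as in the setsockopt() call above, signed_check()
accepts the value while unsigned_check() rejects it; valid indexes
below QUEUES_NR pass both.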
Fixes: ee9952831cfd ("tcp: Initial repair mode")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
---
net/ipv4/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4b18ad41d4df354b151ef2a05e771cce48dcb5b7..44be7f43455e4aefde8db61e2d941a69abcc642a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2674,7 +2674,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
case TCP_REPAIR_QUEUE:
if (!tp->repair)
err = -EPERM;
- else if (val < TCP_QUEUES_NR)
+ else if ((unsigned int)val < TCP_QUEUES_NR)
tp->repair_queue = val;
else
err = -EINVAL;
--
2.17.0.441.gb46fe60e1d-goog
^ permalink raw reply related
* Re: [PATCH 0/3] Clean up users of skb_tx_hash and __skb_tx_hash
From: David Miller @ 2018-04-30 2:01 UTC (permalink / raw)
To: alexander.h.duyck
Cc: netdev, linux-rdma, dennis.dalessandro, niranjana.vishwanathapura,
tariqt
In-Reply-To: <20180427180142.4883.96259.stgit@ahduyck-green-test.jf.intel.com>
From: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Fri, 27 Apr 2018 14:06:22 -0400
> I am in the process of doing some work to try and enable macvlan Tx queue
> selection without using ndo_select_queue. As a part of that I will likely
> need to make changes to skb_tx_hash. As such this is a clean up or refactor
> of the two spots where the function has been used. In both cases it didn't
> really seem like the function was being used correctly so I have updated
> both code paths to not make use of the function.
>
> My current development environment doesn't have an mlx4 or OPA vnic
> available so the changes to those have been build tested only.
Looks good, series applied, thanks.
^ permalink raw reply
* Re: [PATCH net-next 2/2] sctp: add sctp_make_op_error_limited and reuse inner functions
From: kbuild test robot @ 2018-04-30 2:14 UTC (permalink / raw)
To: Marcelo Ricardo Leitner
Cc: kbuild-all, netdev, linux-sctp, Vlad Yasevich, Neil Horman,
Xin Long
In-Reply-To: <53909b1bd523d45a453431968fd1e03fd4be6196.1525017179.git.marcelo.leitner@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2012 bytes --]
Hi Marcelo,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on net-next/master]
url: https://github.com/0day-ci/linux/commits/Marcelo-Ricardo-Leitner/sctp-allow-sctp_init_cause-to-return-errors/20180430-073613
config: i386-randconfig-s1-201817 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
net/sctp/sm_make_chunk.c: In function 'sctp_make_op_error_limited':
>> net/sctp/sm_make_chunk.c:1260:9: error: implicit declaration of function 'sctp_mtu_payload' [-Werror=implicit-function-declaration]
size = sctp_mtu_payload(sp, size, sizeof(struct sctp_errhdr));
^~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +/sctp_mtu_payload +1260 net/sctp/sm_make_chunk.c
1240
1241 /* Create an Operation Error chunk of a fixed size, specifically,
1242 * min(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT) - overheads.
1243 * This is a helper function to allocate an error chunk for for those
1244 * invalid parameter codes in which we may not want to report all the
1245 * errors, if the incoming chunk is large. If it can't fit in a single
1246 * packet, we ignore it.
1247 */
1248 static inline struct sctp_chunk *sctp_make_op_error_limited(
1249 const struct sctp_association *asoc,
1250 const struct sctp_chunk *chunk)
1251 {
1252 size_t size = SCTP_DEFAULT_MAXSEGMENT;
1253 struct sctp_sock *sp = NULL;
1254
1255 if (asoc) {
1256 size = min_t(size_t, size, asoc->pathmtu);
1257 sp = sctp_sk(asoc->base.sk);
1258 }
1259
> 1260 size = sctp_mtu_payload(sp, size, sizeof(struct sctp_errhdr));
1261
1262 return sctp_make_op_error_space(asoc, chunk, size);
1263 }
1264
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28784 bytes --]
^ permalink raw reply
* Re: [PATCH net-next 2/2] sctp: add sctp_make_op_error_limited and reuse inner functions
From: kbuild test robot @ 2018-04-30 2:14 UTC (permalink / raw)
To: Marcelo Ricardo Leitner
Cc: kbuild-all, netdev, linux-sctp, Vlad Yasevich, Neil Horman,
Xin Long
In-Reply-To: <53909b1bd523d45a453431968fd1e03fd4be6196.1525017179.git.marcelo.leitner@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2055 bytes --]
Hi Marcelo,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on net-next/master]
url: https://github.com/0day-ci/linux/commits/Marcelo-Ricardo-Leitner/sctp-allow-sctp_init_cause-to-return-errors/20180430-073613
config: x86_64-randconfig-x006-201817 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All errors (new ones prefixed by >>):
net//sctp/sm_make_chunk.c: In function 'sctp_make_op_error_limited':
>> net//sctp/sm_make_chunk.c:1260:9: error: implicit declaration of function 'sctp_mtu_payload'; did you mean 'sctp_do_peeloff'? [-Werror=implicit-function-declaration]
size = sctp_mtu_payload(sp, size, sizeof(struct sctp_errhdr));
^~~~~~~~~~~~~~~~
sctp_do_peeloff
cc1: some warnings being treated as errors
vim +1260 net//sctp/sm_make_chunk.c
1240
1241 /* Create an Operation Error chunk of a fixed size, specifically,
1242 * min(asoc->pathmtu, SCTP_DEFAULT_MAXSEGMENT) - overheads.
1243 * This is a helper function to allocate an error chunk for for those
1244 * invalid parameter codes in which we may not want to report all the
1245 * errors, if the incoming chunk is large. If it can't fit in a single
1246 * packet, we ignore it.
1247 */
1248 static inline struct sctp_chunk *sctp_make_op_error_limited(
1249 const struct sctp_association *asoc,
1250 const struct sctp_chunk *chunk)
1251 {
1252 size_t size = SCTP_DEFAULT_MAXSEGMENT;
1253 struct sctp_sock *sp = NULL;
1254
1255 if (asoc) {
1256 size = min_t(size_t, size, asoc->pathmtu);
1257 sp = sctp_sk(asoc->base.sk);
1258 }
1259
> 1260 size = sctp_mtu_payload(sp, size, sizeof(struct sctp_errhdr));
1261
1262 return sctp_make_op_error_space(asoc, chunk, size);
1263 }
1264
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29738 bytes --]
^ permalink raw reply
* [PATCH bpf-next v2] samples/bpf: fix kprobe attachment issue on x64
From: Yonghong Song @ 2018-04-30 2:27 UTC (permalink / raw)
To: ast, daniel, netdev; +Cc: kernel-team
Commit d5a00528b58c ("syscalls/core, syscalls/x86: Rename
struct pt_regs-based sys_*() to __x64_sys_*()") renamed a lot
of syscall function sys_*() to __x64_sys_*().
This caused several kprobe-based samples/bpf tests to fail.
This patch fixes the problem in bpf_load.c.
For x86_64 architecture, function name __x64_sys_*() will be
first used for kprobe event creation. If the creation is successful,
it will be used. Otherwise, function name sys_*() will be used
for kprobe event creation.
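The try-prefixed-name-first logic described above can be sketched
independently of bpf_load.c. The helper names and the try_create
callback below are hypothetical; in the real patch the attempt is made
by writing to kprobe_events, which this sketch does not do.

```c
#include <stdio.h>
#include <string.h>

/* Format a kprobe definition such as "p:__x64_sys_write __x64_sys_write". */
static int format_kprobe_def(char *buf, size_t len, int is_kprobe,
			     const char *prefix, const char *event)
{
	return snprintf(buf, len, "%c:%s%s %s%s",
			is_kprobe ? 'p' : 'r',
			prefix, event, prefix, event);
}

/*
 * Try the "__x64_" name first for sys_* events, then the plain name.
 * try_create() returns 0 when the event could be created.
 * Returns the prefix that worked, or NULL if every attempt failed.
 */
static const char *pick_event_prefix(const char *event,
				     int (*try_create)(const char *def))
{
	static const char *const prefixes[] = { "__x64_", "" };
	char def[256];
	size_t i = 0;

	/* Only syscall entry points were renamed. */
	if (strncmp(event, "sys_", 4) != 0)
		i = 1;

	for (; i < 2; i++) {
		int n = format_kprobe_def(def, sizeof(def), 1,
					  prefixes[i], event);

		if (n < 0 || (size_t)n >= sizeof(def))
			continue;	/* would have been truncated */
		if (try_create(def) == 0)
			return prefixes[i];
	}
	return NULL;
}

/* Example callbacks: a kernel with the renamed symbols, and one without. */
static int accept_any(const char *def)
{
	(void)def;
	return 0;
}

static int accept_only_plain(const char *def)
{
	return strstr(def, "__x64_") ? -1 : 0;
}
```

The returned prefix is what later has to be prepended when building the
events/kprobes/<name>/id path, so that the id is read from the event
that was actually created.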
Fixes: d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()")
Signed-off-by: Yonghong Song <yhs@fb.com>
---
samples/bpf/bpf_load.c | 34 ++++++++++++++++++++++++++--------
1 file changed, 26 insertions(+), 8 deletions(-)
Changelogs:
v1 -> v2:
. make change in bpf_load.c instead of each individual bpf programs.
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index a27ef3c..da9bccf 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -145,6 +145,9 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
}
if (is_kprobe || is_kretprobe) {
+ bool need_normal_check = true;
+ const char *event_prefix = "";
+
if (is_kprobe)
event += 7;
else
@@ -158,18 +161,33 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
if (isdigit(*event))
return populate_prog_array(event, fd);
- snprintf(buf, sizeof(buf),
- "echo '%c:%s %s' >> /sys/kernel/debug/tracing/kprobe_events",
- is_kprobe ? 'p' : 'r', event, event);
- err = system(buf);
- if (err < 0) {
- printf("failed to create kprobe '%s' error '%s'\n",
- event, strerror(errno));
- return -1;
+#ifdef __x86_64__
+ if (strncmp(event, "sys_", 4) == 0) {
+ snprintf(buf, sizeof(buf),
+ "echo '%c:__x64_%s __x64_%s' >> /sys/kernel/debug/tracing/kprobe_events",
+ is_kprobe ? 'p' : 'r', event, event);
+ err = system(buf);
+ if (err >= 0) {
+ need_normal_check = false;
+ event_prefix = "__x64_";
+ }
+ }
+#endif
+ if (need_normal_check) {
+ snprintf(buf, sizeof(buf),
+ "echo '%c:%s %s' >> /sys/kernel/debug/tracing/kprobe_events",
+ is_kprobe ? 'p' : 'r', event, event);
+ err = system(buf);
+ if (err < 0) {
+ printf("failed to create kprobe '%s' error '%s'\n",
+ event, strerror(errno));
+ return -1;
+ }
}
strcpy(buf, DEBUGFS);
strcat(buf, "events/kprobes/");
+ strcat(buf, event_prefix);
strcat(buf, event);
strcat(buf, "/id");
} else if (is_tracepoint) {
--
2.9.5
^ permalink raw reply related
* Re: [PATCH net-next 2/2] sctp: add sctp_make_op_error_limited and reuse inner functions
From: Marcelo Ricardo Leitner @ 2018-04-30 2:34 UTC (permalink / raw)
To: kbuild test robot
Cc: kbuild-all, netdev, linux-sctp, Vlad Yasevich, Neil Horman,
Xin Long
In-Reply-To: <201804300849.DRgYHLmw%fengguang.wu@intel.com>
On Mon, Apr 30, 2018 at 10:14:06AM +0800, kbuild test robot wrote:
> Hi Marcelo,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on net-next/master]
>
> url: https://github.com/0day-ci/linux/commits/Marcelo-Ricardo-Leitner/sctp-allow-sctp_init_cause-to-return-errors/20180430-073613
This URL doesn't work, 404.
> config: x86_64-randconfig-x006-201817 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> All errors (new ones prefixed by >>):
>
> net//sctp/sm_make_chunk.c: In function 'sctp_make_op_error_limited':
> >> net//sctp/sm_make_chunk.c:1260:9: error: implicit declaration of function 'sctp_mtu_payload'; did you mean 'sctp_do_peeloff'? [-Werror=implicit-function-declaration]
> size = sctp_mtu_payload(sp, size, sizeof(struct sctp_errhdr));
> ^~~~~~~~~~~~~~~~
> sctp_do_peeloff
> cc1: some warnings being treated as errors
Seems the test didn't pick up the MTU refactor patchset yet.
$ grep sctp/sctp -- net/sctp/sm_make_chunk.c
#include <net/sctp/sctp.h>
$ git grep sctp_mtu_payload -- include/
include/net/sctp/sctp.h:static inline __u32 sctp_mtu_payload(const
struct sctp_sock *sp,
it should be reachable.
Marcelo
^ permalink raw reply
* Re: [PATCH net-next v9 1/4] virtio_net: Introduce VIRTIO_NET_F_STANDBY feature bit
From: Samudrala, Sridhar @ 2018-04-30 2:47 UTC (permalink / raw)
To: Jiri Pirko
Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh, aaron.f.brown
In-Reply-To: <20180428075027.GI5632@nanopsycho.orion>
[-- Attachment #1: Type: text/plain, Size: 1370 bytes --]
On 4/28/2018 12:50 AM, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 07:06:57PM CEST,sridhar.samudrala@intel.com wrote:
>> This feature bit can be used by hypervisor to indicate virtio_net device to
>> act as a standby for another device with the same MAC address.
>>
>> VIRTIO_NET_F_STANDBY is defined as bit 62 as it is a device feature bit.
>>
>> Signed-off-by: Sridhar Samudrala<sridhar.samudrala@intel.com>
>> ---
>> drivers/net/virtio_net.c | 2 +-
>> include/uapi/linux/virtio_net.h | 3 +++
>> 2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 3b5991734118..51a085b1a242 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -2999,7 +2999,7 @@ static struct virtio_device_id id_table[] = {
>> VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>> VIRTIO_NET_F_CTRL_MAC_ADDR, \
>> VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
>> - VIRTIO_NET_F_SPEED_DUPLEX
>> + VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY
> This is not part of current qemu master (head 6f0c4706b35dead265509115ddbd2a8d1af516c1)
> Where can I find the qemu code?
>
> Also, I think it makes sense to push HW (qemu HW in this case) first
> and only then the driver.
I had sent the qemu patch with a couple of earlier versions of this patchset.
Will include it when I send out v10.
[-- Attachment #2: Type: text/html, Size: 1897 bytes --]
^ permalink raw reply
* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-04-30 2:47 UTC (permalink / raw)
To: Jiri Pirko
Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh, aaron.f.brown
In-Reply-To: <20180428081542.GJ5632@nanopsycho.orion>
[-- Attachment #1: Type: text/plain, Size: 2046 bytes --]
On 4/28/2018 1:15 AM, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 07:06:58PM CEST,sridhar.samudrala@intel.com wrote:
>> This provides a generic interface for paravirtual drivers to listen
>> for netdev register/unregister/link change events from pci ethernet
>> devices with the same MAC and takeover their datapath. The notifier and
>> event handling code is based on the existing netvsc implementation.
>>
>> It exposes 2 sets of interfaces to the paravirtual drivers.
>> 1. For paravirtual drivers like virtio_net that use 3 netdev model, the
>> failover module provides interfaces to create/destroy additional
>> master netdev and all the slave events are managed internally.
>> net_failover_create()
>> net_failover_destroy()
>> A failover netdev is created that acts as a master device and controls 2
>> slave devices. The original virtio_net netdev is registered as 'standby'
>> netdev and a passthru/vf device with the same MAC gets registered as
>> 'primary' netdev. Both 'standby' and 'primary' netdevs are associated
>> with the same 'pci' device. The user accesses the network interface via
> 'standby' and 'primary' netdevs are not associated with the same 'pci'
> device.
> "Primary" is the VF netdevice and "standby" is virtio_net. Each
> associated with different pci device.
I meant to say that 'standby' and 'failover' netdevs are associated with
the same 'pci' device. Will fix it in v10.
>
>> 'failover' netdev. The 'failover' netdev chooses 'primary' netdev as
>> default for transmits when it is available with link up and running.
>> 2. For existing netvsc driver that uses 2 netdev model, no master netdev
>> is created. The paravirtual driver registers each instance of netvsc
>> as a 'failover' netdev along with a set of ops to manage the slave
>> events. There is no 'standby' netdev in this model. A passthru/vf device
>> with the same MAC gets registered as 'primary' netdev.
>> net_failover_register()
>> net_failover_unregister()
> [...]
[-- Attachment #2: Type: text/html, Size: 2696 bytes --]
^ permalink raw reply
* Re: [PATCH] net: systemport: fix spelling mistake: "asymetric" -> "asymmetric"
From: David Miller @ 2018-04-30 2:49 UTC (permalink / raw)
To: colin.king; +Cc: f.fainelli, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20180427190925.17097-1-colin.king@canonical.com>
From: Colin King <colin.king@canonical.com>
Date: Fri, 27 Apr 2018 20:09:25 +0100
> From: Colin Ian King <colin.king@canonical.com>
>
> Trivial fix to spelling mistake in netdev_warn warning message
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
Applied, thank you.
^ permalink raw reply
* Re: [PATCH net] MAINTAINERS: add myself as SCTP co-maintainer
From: David Miller @ 2018-04-30 2:49 UTC (permalink / raw)
To: marcelo.leitner; +Cc: netdev, linux-sctp, vyasevich, nhorman
In-Reply-To: <88bbaf8536e2c38ee3d0382a7b23a54409548964.1524775440.git.marcelo.leitner@gmail.com>
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Fri, 27 Apr 2018 16:46:11 -0300
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next] net: core: Assert the size of netdev_features_t
From: David Miller @ 2018-04-30 2:52 UTC (permalink / raw)
To: f.fainelli; +Cc: netdev, edumazet
In-Reply-To: <20180427201114.28830-1-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Fri, 27 Apr 2018 13:11:14 -0700
> We have about 53 netdev_features_t bits defined and counting, add a
> build time check to catch when an u64 type will not be enough and we
> will have to convert that to a bitmap. This is done in
> register_netdevice() for convenience.
>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Applied, but I don't know about putting that check as an inline
function in a header file included by every networking foo.c file.
It means that the inline function has to be parsed and (potentially)
optimized by the compiler for every foo.c file that either directly or
indirectly includes that header.
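As an illustration of the kind of build-time check being discussed, here
is a sketch using C11 _Static_assert at file scope. The enum and type
names are hypothetical stand-ins and this is not the code from the
patch; it only shows the technique of failing the build once the bit
count outgrows a 64-bit mask.

```c
#include <limits.h>
#include <stdint.h>

typedef uint64_t features_t;	/* stand-in for a 64-bit feature mask */

/* Hypothetical feature bits; the last enumerator counts them. */
enum {
	DEMO_FEATURE_A,
	DEMO_FEATURE_B,
	DEMO_FEATURE_C,
	DEMO_FEATURE_COUNT
};

/* Evaluated once, at compile time, in the file that contains it. */
_Static_assert(DEMO_FEATURE_COUNT <= sizeof(features_t) * CHAR_BIT,
	       "feature bits no longer fit in features_t; use a bitmap");

/* Mask for a single feature bit. */
static features_t feature_mask(int bit)
{
	return (features_t)1 << bit;
}
```

Keeping such an assertion in a single .c file rather than in a widely
included header is one way to avoid having every translation unit parse
it, which is the cost raised above.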
^ permalink raw reply
* Re: [RFC net-next 0/5] Support for PHY test modes
From: David Miller @ 2018-04-30 2:55 UTC (permalink / raw)
To: f.fainelli
Cc: netdev, andrew, rmk, linux-kernel, cphealy, nikita.yoush,
vivien.didelot, Nisar.Sayed, UNGLinuxDriver
In-Reply-To: <20180428003237.1536-1-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Fri, 27 Apr 2018 17:32:30 -0700
> This patch series adds support for specifying PHY test modes through
> ethtool and paves the ground for adding support for more complex
> test modes that might require data to be exchanged between user and
> kernel space.
>
> As an example, patches are included to add support for the IEEE
> electrical test modes for 100BaseT2 and 1000BaseT. Those do not
> require data to be passed back and forth.
>
> I believe the infrastructure to be usable enough to add support for
> other things like:
>
> - cable diagnostics
> - pattern generator/waveform generator with specific pattern being
> indicated for instance
>
> Questions for Andrew, and others:
>
> - there could be room for adding additional ETH_TEST_FL_* values in order to
> help determine how the test should be running
> - some of these tests can be disruptive to connectivity, the minimum we could
> do is stop the PHY state machine and restart it when "normal" is used to exit
> those test modes
>
> Comments welcome!
Generally, no objection to providing this in the general manner you
have implemented it via ethtool.
I think in order to answer the disruptive question, you need to give
some information about what kind of context this stuff would be
used in, and if in those contexts what the user expectations are
or might be.
Are these test modes something that usually would be initiated with
the interface down?
^ permalink raw reply
* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Samudrala, Sridhar @ 2018-04-30 3:00 UTC (permalink / raw)
To: Jiri Pirko
Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh, aaron.f.brown
In-Reply-To: <20180428082433.GK5632@nanopsycho.orion>
On 4/28/2018 1:24 AM, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>> This patch enables virtio_net to switch over to a VF datapath when a VF
>> netdev is present with the same MAC address. It allows live migration
>> of a VM with a direct attached VF without the need to setup a bond/team
>> between a VF and virtio net device in the guest.
>>
>> The hypervisor needs to enable only one datapath at any time so that
>> packets don't get looped back to the VM over the other datapath. When a VF
> Why? Both datapaths could be enabled at a time. Why would the loop on
> the hypervisor side be a problem? This is not an issue for
> bonding/team as well.
Somehow the hypervisor needs to make sure that the broadcasts/multicasts from the VM
sent over the VF datapath don't get looped back to the VM via the virtio-net datapath.
This can happen if both datapaths are enabled at the same time.
I would think this is an issue even with bonding/team as well when virtio-net and
VF are backed by the same PF.
>
>
>> is plugged, the virtio datapath link state can be marked as down. The
>> hypervisor needs to unplug the VF device from the guest on the source host
>> and reset the MAC filter of the VF to initiate failover of datapath to
> "reset the MAC filter of the VF" - you mean "set the VF mac"?
Yes. The PF should take away the MAC address assigned to the VF so that the PF
starts receiving those packets.
>
>
>> virtio before starting the migration. After the migration is completed,
>> the destination hypervisor sets the MAC filter on the VF and plugs it back
>> to the guest to switch over to VF datapath.
>>
>> It uses the generic failover framework that provides 2 functions to create
>> and destroy a master failover netdev. When STANDBY feature is enabled, an
>> additional netdev(failover netdev) is created that acts as a master device
>> and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>> is marked as 'standby' netdev and a passthru device with the same MAC is
>> registered as 'primary' netdev.
>>
>> This patch is based on the discussion initiated by Jesse on this thread.
>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
> [...]
>
^ permalink raw reply
* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Samudrala, Sridhar @ 2018-04-30 3:03 UTC (permalink / raw)
To: Jiri Pirko
Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh, aaron.f.brown
In-Reply-To: <20180428090601.GL5632@nanopsycho.orion>
On 4/28/2018 2:06 AM, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 07:06:58PM CEST, sridhar.samudrala@intel.com wrote:
>> This provides a generic interface for paravirtual drivers to listen
>> for netdev register/unregister/link change events from pci ethernet
>> devices with the same MAC and takeover their datapath. The notifier and
>> event handling code is based on the existing netvsc implementation.
>>
>> It exposes 2 sets of interfaces to the paravirtual drivers.
>> 1. For paravirtual drivers like virtio_net that use 3 netdev model, the
>> failover module provides interfaces to create/destroy additional
>> master netdev and all the slave events are managed internally.
>> net_failover_create()
>> net_failover_destroy()
>> A failover netdev is created that acts as a master device and controls 2
>> slave devices. The original virtio_net netdev is registered as 'standby'
>> netdev and a passthru/vf device with the same MAC gets registered as
>> 'primary' netdev. Both 'standby' and 'primary' netdevs are associated
>> with the same 'pci' device. The user accesses the network interface via
>> 'failover' netdev. The 'failover' netdev chooses 'primary' netdev as
>> default for transmits when it is available with link up and running.
>> 2. For existing netvsc driver that uses 2 netdev model, no master netdev
>> is created. The paravirtual driver registers each instance of netvsc
>> as a 'failover' netdev along with a set of ops to manage the slave
>> events. There is no 'standby' netdev in this model. A passthru/vf device
>> with the same MAC gets registered as 'primary' netdev.
>> net_failover_register()
>> net_failover_unregister()
>>
> First of all, I like this v9 very much. Nice progress!
> Couple of notes inlined.
Thanks for the detailed reviews and all your suggestions for improvements.
I agree with all your comments and will address them in v10.
^ permalink raw reply
* Re: [PATCH bpf-next v2] samples/bpf: fix kprobe attachment issue on x64
From: Alexei Starovoitov @ 2018-04-30 3:39 UTC (permalink / raw)
To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180430022748.4093322-1-yhs@fb.com>
On Sun, Apr 29, 2018 at 07:27:48PM -0700, Yonghong Song wrote:
> Commit d5a00528b58c ("syscalls/core, syscalls/x86: Rename
> struct pt_regs-based sys_*() to __x64_sys_*()") renamed a lot
> of syscall function sys_*() to __x64_sys_*().
> This caused several kprobe based samples/bpf tests failing.
>
> This patch fixed the problem in bpf_load.c.
> For x86_64 architecture, function name __x64_sys_*() will be
> first used for kprobe event creation. If the creation is successful,
> it will be used. Otherwise, function name sys_*() will be used
> for kprobe event creation.
>
> Fixes: d5a00528b58c ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()")
> Signed-off-by: Yonghong Song <yhs@fb.com>
Applied, Thanks Yonghong.
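
The fallback described in the commit message above can be sketched roughly as follows. This is only an illustration, not the code from bpf_load.c: attach_syscall_kprobe(), create_event_fn and the two stand-in callbacks are hypothetical names, and the sketch tries both prefixes unconditionally, whereas the patch description limits the __x64_sys_*() attempt to x86_64.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: returns 0 if a kprobe event could be created
 * for the given symbol, non-zero otherwise. */
typedef int (*create_event_fn)(const char *symbol);

/*
 * Try "__x64_sys_<name>" first; if event creation fails, fall back to
 * "sys_<name>". On success the symbol that worked is left in out and
 * 0 is returned; otherwise -1 is returned.
 */
static int attach_syscall_kprobe(const char *name, create_event_fn create,
                                 char *out, size_t out_len)
{
    static const char *const prefixes[] = { "__x64_sys_", "sys_" };
    size_t i;

    for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
        int n = snprintf(out, out_len, "%s%s", prefixes[i], name);

        if (n < 0 || (size_t)n >= out_len)
            return -1;      /* symbol name does not fit in out */
        if (create(out) == 0)
            return 0;       /* event created with this name */
    }
    return -1;
}

/* Stand-in for a kernel that only has the old sys_*() symbols. */
static int old_kernel_create(const char *symbol)
{
    return strncmp(symbol, "sys_", 4) == 0 ? 0 : -1;
}

/* Stand-in for a kernel with the renamed __x64_sys_*() symbols. */
static int new_kernel_create(const char *symbol)
{
    return strncmp(symbol, "__x64_sys_", 10) == 0 ? 0 : -1;
}
```

With the stand-in callbacks, a lookup for "write" settles on __x64_sys_write on the renamed kernel and on sys_write on the older one.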
^ permalink raw reply
* Re: [PATCH net-next v9 3/4] virtio_net: Extend virtio to use VF datapath when available
From: Samudrala, Sridhar @ 2018-04-30 4:16 UTC (permalink / raw)
To: Jiri Pirko
Cc: mst, stephen, davem, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh, aaron.f.brown
In-Reply-To: <20180428094205.GM5632@nanopsycho.orion>
On 4/28/2018 2:42 AM, Jiri Pirko wrote:
> Fri, Apr 27, 2018 at 07:06:59PM CEST, sridhar.samudrala@intel.com wrote:
>> This patch enables virtio_net to switch over to a VF datapath when a VF
>> netdev is present with the same MAC address. It allows live migration
>> of a VM with a direct attached VF without the need to setup a bond/team
>> between a VF and virtio net device in the guest.
>>
>> The hypervisor needs to enable only one datapath at any time so that
>> packets don't get looped back to the VM over the other datapath. When a VF
>> is plugged, the virtio datapath link state can be marked as down. The
>> hypervisor needs to unplug the VF device from the guest on the source host
>> and reset the MAC filter of the VF to initiate failover of datapath to
>> virtio before starting the migration. After the migration is completed,
>> the destination hypervisor sets the MAC filter on the VF and plugs it back
>> to the guest to switch over to VF datapath.
>>
>> It uses the generic failover framework that provides 2 functions to create
>> and destroy a master failover netdev. When STANDBY feature is enabled, an
>> additional netdev(failover netdev) is created that acts as a master device
>> and tracks the state of the 2 lower netdevs. The original virtio_net netdev
>> is marked as 'standby' netdev and a passthru device with the same MAC is
>> registered as 'primary' netdev.
>>
>> This patch is based on the discussion initiated by Jesse on this thread.
>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>
> When I enabled the standby feature (hardcoded), I have 2 netdevices now:
> 4: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
> link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
> inet6 fe80::5054:ff:feb2:a7f1/64 scope link
> valid_lft forever preferred_lft forever
> 5: ens3n_sby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
> link/ether 52:54:00:b2:a7:f1 brd ff:ff:ff:ff:ff:ff
> inet6 fe80::5054:ff:feb2:a7f1/64 scope link
> valid_lft forever preferred_lft forever
>
> However, it seems to confuse my initscripts on Fedora:
> [root@test1 ~]# ifup ens3
> ./network-functions: line 78: [: /etc/dhcp/dhclient-ens3: binary operator expected
> ./network-functions: line 80: [: /etc/dhclient-ens3: binary operator expected
> ./network-functions: line 69: [: /var/lib/dhclient/dhclient-ens3: binary operator expected
>
> Determining IP information for ens3
> ens3n_sby...Cannot find device "ens3n_sby.pid"
> Cannot find device "ens3n_sby.lease"
> failed.
>
> I tried to change the standby device mac:
> ip link set ens3n_sby addr 52:54:00:b2:a7:f2
> [root@test1 ~]# ifup ens3
>
> Determining IP information for ens3... done.
> [root@test1 ~]#
>
> But now the network does not work. I think that the mac change on
> standby device should be probably refused, no?
Yes, we should block changing the standby device's MAC.
> When I change the mac back, all works fine.
This is strange. So you had to change the standby device's MAC twice to
get DHCP working.
I do see NetworkManager trying to get a DHCP address on the standby device,
but I don't see any issue with connectivity. To be totally transparent, we
need to expose only one netdev.
>
> Now I try to change mac of the failover master:
> [root@test1 ~]# ip link set ens3 addr 52:54:00:b2:a7:f3
> RTNETLINK answers: Operation not supported
>
> That I did expect to work. I would expect this would change the mac of
> the master and both standby and primary slaves.
If a VF is untrusted, a VM will not be able to change its MAC. Moreover,
in this mode we are assuming that the hypervisor has assigned the MAC and
the guest is not expected to change it.
For the initial implementation, I would propose not allowing the guest to
change the MAC of the failover or standby dev.
>
> Now I tried to add a primary pci device. I don't have any fancy VF on my
> test setup, but I expected the good old 8139cp to work:
> [root@test1 ~]# ethtool -i ens9
> driver: 8139cp
> ....
> [root@test1 ~]# ip link set ens9 addr 52:54:00:b2:a7:f1
>
> I see no message in dmesg, so I guess the failover module did not
> enslave this netdev. The mac change is not monitored. I would expect
> that it is and whenever a device changes mac to the failover one, it
> should be enslaved and whenever it changes mac back to something else,
> it should be released - the primary one of course.
Sure, that may be the best way to handle the guest changing the primary
netdev's MAC.
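
The behaviour suggested above (enslave a device when its MAC changes to the failover MAC, release it when the MAC changes to something else) can be sketched as a small decision function. This is a user-space illustration under assumptions: failover_on_mac_change() and enum failover_action are hypothetical names, not part of the failover module in this series.

```c
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6  /* length of an Ethernet MAC address in bytes */

/* Hypothetical result of handling a MAC address change event. */
enum failover_action {
    FAILOVER_NONE,      /* nothing to do */
    FAILOVER_ENSLAVE,   /* device now matches the failover MAC */
    FAILOVER_RELEASE,   /* enslaved device no longer matches */
};

/*
 * Decide what to do when a candidate primary device changes its MAC.
 * failover_mac is the MAC of the failover master, new_mac is the MAC
 * the device has just been set to.
 */
static enum failover_action
failover_on_mac_change(const unsigned char failover_mac[ETH_ALEN],
                       const unsigned char new_mac[ETH_ALEN],
                       bool currently_enslaved)
{
    bool matches = memcmp(failover_mac, new_mac, ETH_ALEN) == 0;

    if (matches && !currently_enslaved)
        return FAILOVER_ENSLAVE;
    if (!matches && currently_enslaved)
        return FAILOVER_RELEASE;
    return FAILOVER_NONE;
}
```

In the 8139cp test quoted above, setting ens9 to 52:54:00:b2:a7:f1 would give FAILOVER_ENSLAVE, and changing it away again would give FAILOVER_RELEASE.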
>
>
>
> [...]
>
>> +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>> + size_t len)
>> +{
>> + struct virtnet_info *vi = netdev_priv(dev);
>> + int ret;
>> +
>> + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_STANDBY))
>> + return -EOPNOTSUPP;
>> +
>> + ret = snprintf(buf, len, "_sby");
> please avoid the "_".
>
> [...]
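
A minimal user-space sketch of the naming with the leading "_" dropped, as requested above. standby_port_name() and its has_standby flag are hypothetical stand-ins for the ndo callback and the VIRTIO_NET_F_STANDBY feature check; the truncation check is an added assumption, not something taken from the patch.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write the standby port name into buf. has_standby stands in for the
 * VIRTIO_NET_F_STANDBY feature test done in the quoted patch.
 * Returns 0 on success or -EOPNOTSUPP.
 */
static int standby_port_name(char *buf, size_t len, int has_standby)
{
    int ret;

    if (!has_standby)
        return -EOPNOTSUPP;

    /* "sby" rather than "_sby", per the review comment */
    ret = snprintf(buf, len, "sby");
    if (ret < 0 || (size_t)ret >= len)
        return -EOPNOTSUPP;     /* name would have been truncated */

    return 0;
}
```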
^ permalink raw reply
* Product Inquiry
From: Gerhard Kahmann @ 2018-04-30 5:01 UTC (permalink / raw)
Dear Sir,
We recently visited your website, we were recommended by one of your customer and we are interested in your models, We will like to place an order from the list of your products. However, we would like to see your company's latest catalogs with the; minimum order quantity, delivery time/FOB, payment terms etc. Official order placement will follow as soon as possible.
Awaiting your prompt reply
Best Regards
Gerhard Kahmann
Purchasing Dept
*****************************
^ permalink raw reply
* Re: [PATCH ipsec] vti6: Change minimum MTU to IPV4_MIN_MTU, vti6 can carry IPv4 too
From: Steffen Klassert @ 2018-04-30 5:59 UTC (permalink / raw)
To: Stefano Brivio
Cc: Xin Long, Alexey Kodanev, Jarod Wilson, Sabrina Dubroca, netdev
In-Reply-To: <5b82986d3d509cf02bcc81b463a9be66babb90c8.1524764025.git.sbrivio@redhat.com>
On Thu, Apr 26, 2018 at 07:39:09PM +0200, Stefano Brivio wrote:
> A vti6 interface can carry IPv4 as well, so it makes no sense to
> enforce a minimum MTU of IPV6_MIN_MTU.
>
> If the user sets an MTU below IPV6_MIN_MTU, IPv6 will be
> disabled on the interface, courtesy of addrconf_notify().
>
> Reported-by: Xin Long <lucien.xin@gmail.com>
> Fixes: b96f9afee4eb ("ipv4/6: use core net MTU range checking")
> Fixes: c6741fbed6dc ("vti6: Properly adjust vti6 MTU from MTU of lower device")
> Fixes: 53c81e95df17 ("ip6_vti: adjust vti mtu according to mtu of lower device")
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Applied, thanks Stefano!
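
The effect of the change can be illustrated with the two kernel constants involved (IPV4_MIN_MTU is 68, IPV6_MIN_MTU is 1280). The helper names below are hypothetical and the sketch ignores the upper MTU bound; it only shows which MTUs are accepted and which of them leave IPv6 usable on the interface.

```c
#include <stdbool.h>

#define IPV4_MIN_MTU 68     /* minimum IPv4 MTU, as defined in the kernel */
#define IPV6_MIN_MTU 1280   /* minimum IPv6 MTU, as defined in the kernel */

/* With the patch, an MTU is accepted as long as IPv4 can be carried. */
static bool vti6_mtu_acceptable(int mtu)
{
    return mtu >= IPV4_MIN_MTU;
}

/* Below IPV6_MIN_MTU, IPv6 gets disabled on the interface. */
static bool vti6_mtu_keeps_ipv6(int mtu)
{
    return mtu >= IPV6_MIN_MTU;
}
```

For example, an MTU of 1000 is accepted but leaves only IPv4 usable, while 1280 keeps both address families.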
^ permalink raw reply
* Re: [PATCH bpf-next] selftests/bpf: bpf tunnel test.
From: Y Song @ 2018-04-30 7:02 UTC (permalink / raw)
To: William Tu; +Cc: Linux Kernel Network Developers
In-Reply-To: <CALDO+SY68uk2KF-c85De7qs+69553azz8Wp3j4LKRywDnnjjgg@mail.gmail.com>
Hi, William,
When I compiled selftests/bpf on my CentOS 7 based system, I got
the following failure:
clang -I. -I./include/uapi -I../../../include/uapi
-Wno-compare-distinct-pointer-types \
-O2 -target bpf -emit-llvm -c test_tunnel_kern.c -o - | \
llc -march=bpf -mcpu=generic -filetype=obj -o
/data/users/yhs/work/net-next/tools/testing/selftests/bpf/test_tunnel_kern.o
test_tunnel_kern.c:21:10: fatal error: 'linux/erspan.h' file not found
#include <linux/erspan.h>
^~~~~~~~~~~~~~~~
1 error generated.
Maybe I am missing some packages that need to be installed?
Thanks!
On Thu, Apr 26, 2018 at 6:59 AM, William Tu <u9012063@gmail.com> wrote:
> On Wed, Apr 25, 2018 at 8:01 AM, William Tu <u9012063@gmail.com> wrote:
>> The patch migrates the original tests at samples/bpf/tcbpf2_kern.c
>> and samples/bpf/test_tunnel_bpf.sh to selftests. There are a couple
>> changes from the original:
>> 1) add ipv6 vxlan, ipv6 geneve, ipv6 ipip tests
>> 2) simplify the original ipip tests (remove iperf tests)
>> 3) improve documentation
>> 4) use bpf_ntoh* and bpf_hton* api
>>
>> In summary, 'test_tunnel_kern.o' contains the following bpf program:
>> GRE: gre_set_tunnel, gre_get_tunnel
>> IP6GRE: ip6gretap_set_tunnel, ip6gretap_get_tunnel
>> ERSPAN: erspan_set_tunnel, erspan_get_tunnel
>> IP6ERSPAN: ip4ip6erspan_set_tunnel, ip4ip6erspan_get_tunnel
>> VXLAN: vxlan_set_tunnel, vxlan_get_tunnel
>> IP6VXLAN: ip6vxlan_set_tunnel, ip6vxlan_get_tunnel
>> GENEVE: geneve_set_tunnel, geneve_get_tunnel
>> IP6GENEVE: ip6geneve_set_tunnel, ip6geneve_get_tunnel
>> IPIP: ipip_set_tunnel, ipip_get_tunnel
>> IP6IP: ipip6_set_tunnel, ipip6_get_tunnel,
>> ip6ip6_set_tunnel, ip6ip6_get_tunnel
>>
>> Signed-off-by: William Tu <u9012063@gmail.com>
>> ---
>
> I made a mistake by removing the recent XFRM helper test cases.
> I will send v2.
>
> William
^ permalink raw reply