Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] net: decnet: Replace GFP_ATOMIC with GFP_KERNEL in dn_route_init
From: Jia-Ju Bai @ 2018-04-09 14:10 UTC (permalink / raw)
  To: davem, kafai, weiwan, dsa, johannes.berg, fw
  Cc: linux-decnet-user, netdev, linux-kernel, Jia-Ju Bai

dn_route_init() is never called in atomic context.

The call chain ending up at dn_route_init() is:
[1] dn_route_init() <- decnet_init()
decnet_init() is only set as a parameter of module_init().

Despite never getting called from atomic context,
dn_route_init() calls __get_free_pages() with GFP_ATOMIC,
which waits busily for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
to avoid busy waiting and improve the possibility of sucessful allocation.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 net/decnet/dn_route.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 0bd3afd..59ed12a 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1898,7 +1898,7 @@ void __init dn_route_init(void)
 		while(dn_rt_hash_mask & (dn_rt_hash_mask - 1))
 			dn_rt_hash_mask--;
 		dn_rt_hash_table = (struct dn_rt_hash_bucket *)
-			__get_free_pages(GFP_ATOMIC, order);
+			__get_free_pages(GFP_KERNEL, order);
 	} while (dn_rt_hash_table == NULL && --order > 0);

 	if (!dn_rt_hash_table)
-- 
1.9.1

^ permalink raw reply related

* [PATCH] net: dccp: Replace GFP_ATOMIC with GFP_KERNEL in dccp_init
From: Jia-Ju Bai @ 2018-04-09 14:10 UTC (permalink / raw)
  To: gerrit, davem; +Cc: dccp, netdev, linux-kernel, Jia-Ju Bai

dccp_init() is never called in atomic context.
This function is only set as a parameter of module_init().

Despite never getting called from atomic context,
dccp_init() calls __get_free_pages() with GFP_ATOMIC,
which waits busily for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
to avoid busy waiting and improve the possibility of sucessful allocation.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 net/dccp/proto.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index b68168f..f63ba93 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -1159,7 +1159,7 @@ static int __init dccp_init(void)
 			hash_size--;
 		dccp_hashinfo.ehash_mask = hash_size - 1;
 		dccp_hashinfo.ehash = (struct inet_ehash_bucket *)
-			__get_free_pages(GFP_ATOMIC|__GFP_NOWARN, ehash_order);
+			__get_free_pages(GFP_KERNEL|__GFP_NOWARN, ehash_order);
 	} while (!dccp_hashinfo.ehash && --ehash_order > 0);
 
 	if (!dccp_hashinfo.ehash) {
@@ -1182,7 +1182,7 @@ static int __init dccp_init(void)
 		    bhash_order > 0)
 			continue;
 		dccp_hashinfo.bhash = (struct inet_bind_hashbucket *)
-			__get_free_pages(GFP_ATOMIC|__GFP_NOWARN, bhash_order);
+			__get_free_pages(GFP_KERNEL|__GFP_NOWARN, bhash_order);
 	} while (!dccp_hashinfo.bhash && --bhash_order >= 0);
 
 	if (!dccp_hashinfo.bhash) {
-- 
1.9.1

^ permalink raw reply related

* [PATCH net] inetpeer: fix uninit-value in inet_getpeer
From: Eric Dumazet @ 2018-04-09 13:43 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

syzbot/KMSAN reported that p->dtime was read while it was
not yet initialized in :

	delta = (__u32)jiffies - p->dtime;
	if (delta < ttl || !refcount_dec_if_one(&p->refcnt))
		gc_stack[i] = NULL;

This is a false positive, because the inetpeer wont be erased
from rb-tree if the refcount_dec_if_one(&p->refcnt) does not
succeed. And this wont happen before first inet_putpeer() call
for this inetpeer has been done, and ->dtime field is written
exactly before the refcount_dec_and_test(&p->refcnt).

The KMSAN report was :

BUG: KMSAN: uninit-value in inet_peer_gc net/ipv4/inetpeer.c:163 [inline]
BUG: KMSAN: uninit-value in inet_getpeer+0x1567/0x1e70 net/ipv4/inetpeer.c:228
CPU: 0 PID: 9494 Comm: syz-executor5 Not tainted 4.16.0+ #82
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x185/0x1d0 lib/dump_stack.c:53
 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
 inet_peer_gc net/ipv4/inetpeer.c:163 [inline]
 inet_getpeer+0x1567/0x1e70 net/ipv4/inetpeer.c:228
 inet_getpeer_v4 include/net/inetpeer.h:110 [inline]
 icmpv4_xrlim_allow net/ipv4/icmp.c:330 [inline]
 icmp_send+0x2b44/0x3050 net/ipv4/icmp.c:725
 ip_options_compile+0x237c/0x29f0 net/ipv4/ip_options.c:472
 ip_rcv_options net/ipv4/ip_input.c:284 [inline]
 ip_rcv_finish+0xda8/0x16d0 net/ipv4/ip_input.c:365
 NF_HOOK include/linux/netfilter.h:288 [inline]
 ip_rcv+0x119d/0x16f0 net/ipv4/ip_input.c:493
 __netif_receive_skb_core+0x47cf/0x4a80 net/core/dev.c:4562
 __netif_receive_skb net/core/dev.c:4627 [inline]
 netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701
 netif_receive_skb+0x230/0x240 net/core/dev.c:4725
 tun_rx_batched drivers/net/tun.c:1555 [inline]
 tun_get_user+0x6d88/0x7580 drivers/net/tun.c:1962
 tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
 do_iter_write+0x30d/0xd40 fs/read_write.c:932
 vfs_writev fs/read_write.c:977 [inline]
 do_writev+0x3c9/0x830 fs/read_write.c:1012
 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
 SyS_writev+0x56/0x80 fs/read_write.c:1082
 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x455111
RSP: 002b:00007fae0365cba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 000000000000002e RCX: 0000000000455111
RDX: 0000000000000001 RSI: 00007fae0365cbf0 RDI: 00000000000000fc
RBP: 0000000020000040 R08: 00000000000000fc R09: 0000000000000000
R10: 000000000000002e R11: 0000000000000293 R12: 00000000ffffffff
R13: 0000000000000658 R14: 00000000006fc8e0 R15: 0000000000000000

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
 kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
 kmem_cache_alloc+0xaab/0xb90 mm/slub.c:2756
 inet_getpeer+0xed8/0x1e70 net/ipv4/inetpeer.c:210
 inet_getpeer_v4 include/net/inetpeer.h:110 [inline]
 ip4_frag_init+0x4d1/0x740 net/ipv4/ip_fragment.c:153
 inet_frag_alloc net/ipv4/inet_fragment.c:369 [inline]
 inet_frag_create net/ipv4/inet_fragment.c:385 [inline]
 inet_frag_find+0x7da/0x1610 net/ipv4/inet_fragment.c:418
 ip_find net/ipv4/ip_fragment.c:275 [inline]
 ip_defrag+0x448/0x67a0 net/ipv4/ip_fragment.c:676
 ip_check_defrag+0x775/0xda0 net/ipv4/ip_fragment.c:724
 packet_rcv_fanout+0x2a8/0x8d0 net/packet/af_packet.c:1447
 deliver_skb net/core/dev.c:1897 [inline]
 deliver_ptype_list_skb net/core/dev.c:1912 [inline]
 __netif_receive_skb_core+0x314a/0x4a80 net/core/dev.c:4545
 __netif_receive_skb net/core/dev.c:4627 [inline]
 netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701
 netif_receive_skb+0x230/0x240 net/core/dev.c:4725
 tun_rx_batched drivers/net/tun.c:1555 [inline]
 tun_get_user+0x6d88/0x7580 drivers/net/tun.c:1962
 tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
 do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
 do_iter_write+0x30d/0xd40 fs/read_write.c:932
 vfs_writev fs/read_write.c:977 [inline]
 do_writev+0x3c9/0x830 fs/read_write.c:1012
 SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
 SyS_writev+0x56/0x80 fs/read_write.c:1082
 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/ipv4/inetpeer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 1f04bd91fc2e999ddb82f4be92d39d229166b691..d757b9642d0d1c418bffad5bcd50e8e7bf336c66 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -211,6 +211,7 @@ struct inet_peer *inet_getpeer(struct inet_peer_base *base,
 		p = kmem_cache_alloc(peer_cachep, GFP_ATOMIC);
 		if (p) {
 			p->daddr = *daddr;
+			p->dtime = (__u32)jiffies;
 			refcount_set(&p->refcnt, 2);
 			atomic_set(&p->rid, 0);
 			p->metrics[RTAX_LOCK-1] = INETPEER_METRICS_NEW;
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Quentin Monnet @ 2018-04-09 13:33 UTC (permalink / raw)
  To: Markus Heiser, Daniel Borkmann
  Cc: Jonathan Corbet, ast, netdev, oss-drivers, Linux Doc Mailing List,
	linux-man
In-Reply-To: <550CD0C6-10B5-4C8C-9C1E-70AA61ABDC34@darmarit.de>

2018-04-09 12:52 UTC+0200 ~ Markus Heiser <markus.heiser@darmarit.de>
> 
>> Am 09.04.2018 um 12:08 schrieb Daniel Borkmann <daniel@iogearbox.net>:
> [...]
> 
>>> May I completely misunderstood you, so correct my if I'am wrong:
>>>
>>> - ./scripts/bpf_helpers_doc.py : produces reST markup from C-comments
>>> - ./scripts/kerne-doc          : produces reST markup from C-comments
>>>
>>> IMO: both are doing the same job, so why not using kernel-doc?
>>
>> They are not really doing the same job, in bpf_helpers_doc.py case you don't
>> want the whole header rendered, but just a fraction of it, that is, the
>> single big comment which describes all BPF helper functions that a BPF
>> program developer has available to use in user space - aka the entries in
>> the __BPF_FUNC_MAPPER() macro;
> 
> 
>> I also doubt the latter would actually qualify
>> in kdoc context as some sort of a function description.
> 
> latter .. ah, OK .. thanks for clarifying. 
> 
> -- Markus --

As Daniel explained, kernel-doc does not apply in this case, we do not
have the full function prototype for eBPF helpers in the header file.
But to be honest, I didn't even realise kernel-doc was available and
could do something close to what I was looking for, so thanks for your
feedback! :)

Quentin

^ permalink raw reply

* v4.16 in_dev_finish_destroy() accessing freed idev->dev->pcpu_refcnt
From: Mark Rutland @ 2018-04-09 13:30 UTC (permalink / raw)
  To: netdev, linux-kernel

Hi all,

As a heads-up, while fuzzing v4.16 on arm64, I'm hitting an intermittent
issue where in_dev_finish_destroy() calls dev_put() on idev->dev, where
idev->dev->pcpu_refcnt is NULL. Apparently idev->dev has already been
freed.

This results in a fault where we try to access the percpu offset of
NULL, such as in the example splat at the end of this mail.

So far I've had no luck in extracting a reliable reproducer, but I've
uploaded the relevant syzkaller logs (and reports) to my kernel.org
webspace [1].

Thanks,
Mark.

[1] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20180409-in_dev_finish_destroy-null-pcpu_refcnt/

Unable to handle kernel paging request at virtual address 60002be80000
Mem abort info:
  ESR = 0x96000004
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000004
  CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000fb56b784
[000060002be80000] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 4.16.0-00006-g6eee13dfb529 #1
Hardware name: linux,dummy-virt (DT)
pstate: 80400005 (Nzcv daif +PAN -UAO)
pc : __percpu_add arch/arm64/include/asm/percpu.h:102 [inline]
pc : dev_put include/linux/netdevice.h:3395 [inline]
pc : in_dev_finish_destroy+0xcc/0x230 net/ipv4/devinet.c:231
lr : dev_put include/linux/netdevice.h:3395 [inline]
lr : in_dev_finish_destroy+0xa4/0x230 net/ipv4/devinet.c:231
sp : ffff800036317b70
x29: ffff800036317b70 x28: ffff80001f39d338 
x27: ffff20000a917460 x26: ffff20000a917000 
x25: 000000000000000a x24: ffff20000a919000 
x23: ffff800036317c90 x22: ffff200009b84da8 
x21: 1ffff00003e73a68 x20: ffff80006e7a9100 
x19: ffff80001f39d180 x18: 0000000000000000 
x17: 0000000000000000 x16: ffff2000082951f0 
x15: 0000000000000001 x14: 00000000f2000000 
x13: ffff20000a9fb560 x12: 0000000000000002 
x11: 1ffff0000c147fbb x10: 0000000000000004 
x9 : 0000000000000000 x8 : 1fffe4000178908c 
x7 : 0000000000000000 x6 : 0000000000000000 
x5 : 0000000000000000 x4 : 1fffe4000178908c 
x3 : 1fffe4000178908c x2 : ffffffffffffffff 
x1 : 000060002be80000 x0 : 000060002be80000 
Process ksoftirqd/2 (pid: 22, stack limit = 0x00000000e96adbb2)
Call trace:
 __percpu_add arch/arm64/include/asm/percpu.h:102 [inline]
 dev_put include/linux/netdevice.h:3395 [inline]
 in_dev_finish_destroy+0xcc/0x230 net/ipv4/devinet.c:231
 in_dev_put include/linux/inetdevice.h:248 [inline]
 in_dev_rcu_put+0x34/0x48 net/ipv4/devinet.c:287
 __rcu_reclaim kernel/rcu/rcu.h:172 [inline]
 rcu_do_batch kernel/rcu/tree.c:2674 [inline]
 invoke_rcu_callbacks kernel/rcu/tree.c:2933 [inline]
 __rcu_process_callbacks kernel/rcu/tree.c:2900 [inline]
 rcu_process_callbacks+0x350/0x988 kernel/rcu/tree.c:2917
 __do_softirq+0x318/0x734 kernel/softirq.c:285
 run_ksoftirqd+0x70/0xa8 kernel/softirq.c:666
 smpboot_thread_fn+0x544/0x9d0 kernel/smpboot.c:164
 kthread+0x2f8/0x380 kernel/kthread.c:238
 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1158
Code: d538d081 f9425a80 92800002 8b010000 (885f7c04) 
---[ end trace 577c2c0fbbf0d4d4 ]---

^ permalink raw reply

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Quentin Monnet @ 2018-04-09 13:25 UTC (permalink / raw)
  To: Daniel Borkmann, ast; +Cc: netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <05d2d03a-0b39-9d0c-9ba0-3461afc45fac@iogearbox.net>

2018-04-09 11:01 UTC+0200 ~ Daniel Borkmann <daniel@iogearbox.net>
> On 04/06/2018 01:11 PM, Quentin Monnet wrote:
>> eBPF helper functions can be called from within eBPF programs to perform
>> a variety of tasks that would be otherwise hard or impossible to do with
>> eBPF itself. There is a growing number of such helper functions in the
>> kernel, but documentation is scarce. The main user space header file
>> does contain a short commented description of most helpers, but it is
>> somewhat outdated and not complete. It is more a "cheat sheet" than a
>> real documentation accessible to new eBPF developers.
>>
>> This commit attempts to improve the situation by replacing the existing
>> overview for the helpers with a more developed description. Furthermore,
>> a Python script is added to generate a manual page for eBPF helpers. The
>> workflow is the following, and requires the rst2man utility:
>>
>>     $ ./scripts/bpf_helpers_doc.py \
>>             --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
>>     $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
>>     $ man /tmp/bpf-helpers.7
>>
>> The objective is to keep all documentation related to the helpers in a
>> single place, and to be able to generate from here a manual page that
>> could be packaged in the man-pages repository and shipped with most
>> distributions [1].
>>
>> Additionally, parsing the prototypes of the helper functions could
>> hopefully be reused, with a different Printer object, to generate
>> header files needed in some eBPF-related projects.
>>
>> Regarding the description of each helper, it comprises several items:
>>
>> - The function prototype.
>> - A description of the function and of its arguments (except for a
>>   couple of cases, when there are no arguments and the return value
>>   makes the function usage really obvious).
>> - A description of return values (if not void).
>> - A listing of eBPF program types (if relevant, map types) compatible
>>   with the helper.
>> - Information about the helper being restricted to GPL programs, or not.
>> - The kernel version in which the helper was introduced.
>> - The commit that introduced the helper (this is mostly to have it in
>>   the source of the man page, as it can be used to track changes and
>>   update the page).
>>
>> For several helpers, descriptions are inspired (at times, nearly copied)
>> from the commit logs introducing them in the kernel--Many thanks to
>> their respective authors! They were completed as much as possible, the
>> objective being to have something easily accessible even for people just
>> starting with eBPF. There is probably a bit more work to do in this
>> direction for some helpers.
>>
>> Some RST formatting is used in the descriptions (not in function
>> prototypes, to keep them readable, but the Python script provided in
>> order to generate the RST for the manual page does add formatting to
>> prototypes, to produce something pretty) to get "bold" and "italics" in
>> manual pages. Hopefully, the descriptions in bpf.h file remains
>> perfectly readable. Note that the few trailing white spaces are
>> intentional, removing them would break paragraphs for rst2man.
>>
>> The descriptions should ideally be updated each time someone adds a new
>> helper, or updates the behaviour (compatibility extended to new program
>> types, new socket option supported...) or the interface (new flags
>> available, ...) of existing ones.
>>
>> [1] I have not contacted people from the man-pages project prior to
>>     sending this RFC, so I can offer no guaranty at this time that they
>>     would accept to take the generated man page.
>>
>> Cc: linux-doc@vger.kernel.org
>> Cc: linux-man@vger.kernel.org
>> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> 
> Great work, thanks a lot for doing this!
> 
> [...]
>> + * int bpf_probe_read(void *dst, u32 size, const void *src)
>> + * 	Description
>> + * 		For tracing programs, safely attempt to read *size* bytes from
>> + * 		address *src* and store the data in *dst*.
>> + * 	Return
>> + * 		0 on success, or a negative error in case of failure.
>> + * 	For
>> + * 		*Tracing programs*.
>> + * 	GPL only
>> + * 		Yes
>> + * 	Since
>> + * 		Linux 4.1
>> + *
>> + * .. commit 2541517c32be
>>   *
>>   * u64 bpf_ktime_get_ns(void)
>> - *     Return: current ktime
>> - *
>> - * int bpf_trace_printk(const char *fmt, int fmt_size, ...)
>> - *     Return: length of buffer written or negative error
>> + * 	Description
>> + * 		Return the time elapsed since system boot, in nanoseconds.
>> + * 	Return
>> + * 		Current *ktime*.
>> + * 	For
>> + * 		All program types, except
>> + * 		**BPF_PROG_TYPE_CGROUP_DEVICE**.
> 
> I think we should probably always just list the actual program types instead,
> since when new types get added to the kernel not supporting bpf_ktime_get_ns()
> helper in this example, the above statement would be misleading (potentially
> more misleading than the other way around when it's not yet mentioned to be
> supported).

Agreed. I realise “All program types” is really awkward here.

>> + * 	GPL only
>> + * 		Yes
>> + * 	Since
>> + * 		Linux 4.1
>> + *
>> + * .. commit d9847d310ab4
>> + *

[...]

>> + * u32 bpf_get_smp_processor_id(void)
>> + * 	Return
>> + * 		The SMP (Symmetric multiprocessing) processor id.
>> + * 	For
>> + * 		*Networking programs*.
> 
> Ditto plus above is actually not correct. See tracing_func_proto():
> 
>         [...]
>         case BPF_FUNC_get_smp_processor_id:
>                 return &bpf_get_smp_processor_id_proto;
>         [...]

Thanks for the catch! I will fix on my side, even if we drop the "For"
section for now.

>> + * 	GPL only
>> + * 		No
>> + * 	Since
>> + * 		Linux 4.1
>> + *
>> + * .. commit c04167ce2ca0
>> + *
>> + * int bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from, u32 len, u64 flags)
>> + * 	Description
>> + * 		Store *len* bytes from address *from* into the packet
>> + * 		associated to *skb*, at *offset*. *flags* are a combination of
>> + * 		**BPF_F_RECOMPUTE_CSUM** (automatically recompute the
>> + * 		checksum for the packet after storing the bytes) and
>> + * 		**BPF_F_INVALIDATE_HASH** (set *skb*\ **->hash**, *skb*\
>> + * 		**->swhash** and *skb*\ **->l4hash** to 0).
>> + *
>> + * 		A call to this helper is susceptible to change data from the
>> + * 		packet. Therefore, at load time, all checks on pointers
>> + * 		previously done by the verifier are invalidated and must be
>> + * 		performed again.
>> + * 	Return
> [...]
> 
> Agree with Alexei that 'Description' but also 'Return' is the most useful
> (the latter we can still improve a bit wrt errors). 'For' gets indeed quickly
> out of hand but I do think that for BPF program developers 'Since' has good
> value to quickly check when a helper could be available, but then again
> this is actually tied to an individual kconfig and/or program type, and could
> potentially require to add or (worst case) remove the info in a second commit
> e.g. after the merge window similar with the commit sha. Probably best indeed
> to stick to the signature, 'Description' and 'Return' initially.

I understand that the "For" section will be the hardest to maintain
up-to-date, however, I believe that--along with minimal kernel version
and GPL information--it would be useful for readers (Alexei, what is
your case against the "GPL only" section?). Daniel, your proposal for
the "Since" information in a second time sounds good! But I do not have
a solution right now for maintaining the "For" section on the long term.

Anyway, I am fine with keeping just signatures, descriptions and return
values for now. I will submit a new version with only those items.

> We should probably also have the bpf_helpers_doc.py workflow in a separate
> comment above this one such that developers don't have to look up the original
> commit message, but have the required steps to render the rst/man page available
> right where they need it.

Sure! I was considering adding some comments on top of the description,
but haven't done it for the RFC. I will do for the next version.

Thanks for the reviews!
Quentin

^ permalink raw reply

* [PATCH v5] net: thunderx: rework mac addresses list to u64 array
From: Vadim Lomovtsev @ 2018-04-09 13:24 UTC (permalink / raw)
  To: sgoutham, sunil.kovvuri, rric, linux-arm-kernel, netdev,
	linux-kernel, davem
  Cc: dnelson, robin.murphy, hch, gustavo, Vadim Lomovtsev
In-Reply-To: <20180406195354.16037-1-Vadim.Lomovtsev@caviumnetworks.com>

From: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>

It is too expensive to pass u64 values via linked list, instead
allocate array for them by overall number of mac addresses from netdev.

This eventually removes multiple kmalloc() calls, aviod memory
fragmentation and allow to put single null check on kmalloc
return value in order to prevent a potential null pointer dereference.

Addresses-Coverity-ID: 1467429 ("Dereference null return value")
Fixes: 37c3347eb247 ("net: thunderx: add ndo_set_rx_mode callback implementation for VF")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
---
Changes from v1 to v2:
 - C99 syntax: update xcast_addr_list struct field mc[0] -> mc[];
Changes from v2 to v3:
 - update commit description with 'Reported-by: Dan Carpenter';
 - update size calculations for mc list to offsetof() call
   instead of explicit arithmetic;
Changes from v3 to v4:
 - change loop control variable type from u8 to int, accordingly
   to mc_count size;
Changes from v4 to v5:
 - remove explicit initialization of 'idx' variable as per review comment;
---
 drivers/net/ethernet/cavium/thunder/nic.h        |  7 +-----
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 28 +++++++++---------------
 2 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index 5fc46c5a4f36..448d1fafc827 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -265,14 +265,9 @@ struct nicvf_drv_stats {
 
 struct cavium_ptp;
 
-struct xcast_addr {
-	struct list_head list;
-	u64              addr;
-};
-
 struct xcast_addr_list {
-	struct list_head list;
 	int              count;
+	u64              mc[];
 };
 
 struct nicvf_work {
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 1e9a31fef729..707db3304396 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -1929,7 +1929,7 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
 						  work.work);
 	struct nicvf *nic = container_of(vf_work, struct nicvf, rx_mode_work);
 	union nic_mbx mbx = {};
-	struct xcast_addr *xaddr, *next;
+	int idx;
 
 	if (!vf_work)
 		return;
@@ -1956,16 +1956,10 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
 	/* check if we have any specific MACs to be added to PF DMAC filter */
 	if (vf_work->mc) {
 		/* now go through kernel list of MACs and add them one by one */
-		list_for_each_entry_safe(xaddr, next,
-					 &vf_work->mc->list, list) {
+		for (idx = 0; idx < vf_work->mc->count; idx++) {
 			mbx.xcast.msg = NIC_MBOX_MSG_ADD_MCAST;
-			mbx.xcast.data.mac = xaddr->addr;
+			mbx.xcast.data.mac = vf_work->mc->mc[idx];
 			nicvf_send_msg_to_pf(nic, &mbx);
-
-			/* after receiving ACK from PF release memory */
-			list_del(&xaddr->list);
-			kfree(xaddr);
-			vf_work->mc->count--;
 		}
 		kfree(vf_work->mc);
 	}
@@ -1996,17 +1990,15 @@ static void nicvf_set_rx_mode(struct net_device *netdev)
 			mode |= BGX_XCAST_MCAST_FILTER;
 			/* here we need to copy mc addrs */
 			if (netdev_mc_count(netdev)) {
-				struct xcast_addr *xaddr;
-
-				mc_list = kmalloc(sizeof(*mc_list), GFP_ATOMIC);
-				INIT_LIST_HEAD(&mc_list->list);
+				mc_list = kmalloc(offsetof(typeof(*mc_list),
+							   mc[netdev_mc_count(netdev)]),
+						  GFP_ATOMIC);
+				if (unlikely(!mc_list))
+					return;
+				mc_list->count = 0;
 				netdev_hw_addr_list_for_each(ha, &netdev->mc) {
-					xaddr = kmalloc(sizeof(*xaddr),
-							GFP_ATOMIC);
-					xaddr->addr =
+					mc_list->mc[mc_list->count] =
 						ether_addr_to_u64(ha->addr);
-					list_add_tail(&xaddr->list,
-						      &mc_list->list);
 					mc_list->count++;
 				}
 			}
-- 
2.14.3

^ permalink raw reply related

* Re: kernel BUG at drivers/vhost/vhost.c:LINE! (2)
From: Stefan Hajnoczi @ 2018-04-09 13:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, netdev, syzkaller-bugs, syzbot, linux-kernel,
	Linux Virtualization
In-Reply-To: <20180409032835.GB1648@stefanha-x1.localdomain>

On Mon, Apr 9, 2018 at 11:28 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Apr 09, 2018 at 05:44:36AM +0300, Michael S. Tsirkin wrote:
>> On Mon, Apr 09, 2018 at 10:37:45AM +0800, Stefan Hajnoczi wrote:
>> > On Sat, Apr 7, 2018 at 3:02 AM, syzbot
>> > <syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com> wrote:
>> > > syzbot hit the following crash on upstream commit
>> > > 38c23685b273cfb4ccf31a199feccce3bdcb5d83 (Fri Apr 6 04:29:35 2018 +0000)
>> > > Merge tag 'armsoc-drivers' of
>> > > git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
>> > > syzbot dashboard link:
>> > > https://syzkaller.appspot.com/bug?extid=65a84dde0214b0387ccd
>> >
>> > To prevent duplicated work: I am working on this one.
>> >
>> > Stefan
>>
>> Do you want to try this patchset:
>> https://lkml.org/lkml/2018/4/5/665
>>
>> ?
>
> Thanks, I'll give it a shot.
>
> I also noticed a regression in commit
> d65026c6c62e7d9616c8ceb5a53b68bcdc050525 ("vhost: validate log when
> IOTLB is enabled") and am currently testing a fix.

I have sent a fix:
https://lkml.org/lkml/2018/4/9/390

Stefan

^ permalink raw reply

* Re: WARNING in ip_rt_bug
From: Eric Dumazet @ 2018-04-09 13:11 UTC (permalink / raw)
  To: Dmitry Vyukov, syzbot
  Cc: David Miller, Alexey Kuznetsov, LKML, netdev, syzkaller-bugs,
	Hideaki YOSHIFUJI, Eric Dumazet
In-Reply-To: <CACT4Y+YfM=SL3-cgeJ_7JCvAwX2KsyMWV=HHpGfLV=-RKo7TQQ@mail.gmail.com>



On 04/08/2018 11:06 PM, Dmitry Vyukov wrote:
> On Mon, Apr 9, 2018 at 7:59 AM, syzbot
> <syzbot+b09ac67a2af842b12eab@syzkaller.appspotmail.com> wrote:
>> Hello,
>>
>> syzbot hit the following crash on net-next commit
>> 8bde261e535257e81087d39ff808414e2f5aa39d (Sun Apr 1 02:31:43 2018 +0000)
>> Merge tag 'mlx5-updates-2018-03-30' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
>> syzbot dashboard link:
>> https://syzkaller.appspot.com/bug?extid=b09ac67a2af842b12eab
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>> Raw console output:
>> https://syzkaller.appspot.com/x/log.txt?id=5991727739437056
>> Kernel config:
>> https://syzkaller.appspot.com/x/.config?id=3327544840960562528
>> compiler: gcc (GCC) 7.1.1 20170620
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+b09ac67a2af842b12eab@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for
>> details.
>> If you forward the report, please keep this part and the footer.
> 
> 
> +Eric said that perhaps we just need to revert:
> 
> commit c378a9c019cf5e017d1ed24954b54fae7bebd2bc
> Date:   Sat May 21 07:16:42 2011 +0000
>     ipv4: Give backtrace in ip_rt_bug().
> 

And David replied :

<quote>
Let's not do the revert, I wouldn't have seen the backtrace which
points where this bug is if we had.

icmp_route_lookup(), in one branch, does an input route lookup and
uses the result of that to send the icmp message.

That can't be right, input routes should never be used for
transmitting traffice and that's how we end up at ip_rt_bug().

</quote>

^ permalink raw reply

* [PATCH] net/mlx5: remove some extraneous spaces in indentations
From: Colin King @ 2018-04-09 12:43 UTC (permalink / raw)
  To: Saeed Mahameed, Matan Barak, Leon Romanovsky, netdev, linux-rdma
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

There are several lines where there is an extraneous space causing
indentation misalignment. Remove them.

Cleans up Cocconelle warnings:

./drivers/net/ethernet/mellanox/mlx5/core/qp.c:409:3-18: code aligned
with following code on line 410
./drivers/net/ethernet/mellanox/mlx5/core/qp.c:415:3-18: code aligned
with following code on line 416
./drivers/net/ethernet/mellanox/mlx5/core/qp.c:421:3-18: code aligned
with following code on line 422

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/qp.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
index 02d6c5b5d502..4ca07bfb6b14 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c
@@ -407,21 +407,21 @@ static int modify_qp_mbox_alloc(struct mlx5_core_dev *dev, u16 opcode, int qpn,
 	case MLX5_CMD_OP_RST2INIT_QP:
 		if (MBOX_ALLOC(mbox, rst2init_qp))
 			return -ENOMEM;
-		 MOD_QP_IN_SET_QPC(rst2init_qp, mbox->in, opcode, qpn,
-				   opt_param_mask, qpc);
-		 break;
+		MOD_QP_IN_SET_QPC(rst2init_qp, mbox->in, opcode, qpn,
+				  opt_param_mask, qpc);
+		break;
 	case MLX5_CMD_OP_INIT2RTR_QP:
 		if (MBOX_ALLOC(mbox, init2rtr_qp))
 			return -ENOMEM;
-		 MOD_QP_IN_SET_QPC(init2rtr_qp, mbox->in, opcode, qpn,
-				   opt_param_mask, qpc);
-		 break;
+		MOD_QP_IN_SET_QPC(init2rtr_qp, mbox->in, opcode, qpn,
+				  opt_param_mask, qpc);
+		break;
 	case MLX5_CMD_OP_RTR2RTS_QP:
 		if (MBOX_ALLOC(mbox, rtr2rts_qp))
 			return -ENOMEM;
-		 MOD_QP_IN_SET_QPC(rtr2rts_qp, mbox->in, opcode, qpn,
-				   opt_param_mask, qpc);
-		 break;
+		MOD_QP_IN_SET_QPC(rtr2rts_qp, mbox->in, opcode, qpn,
+				  opt_param_mask, qpc);
+		break;
 	case MLX5_CMD_OP_RTS2RTS_QP:
 		if (MBOX_ALLOC(mbox, rts2rts_qp))
 			return -ENOMEM;
-- 
2.15.1

^ permalink raw reply related

* Re: [PATCH 2/2] alx: add disable_wol paramenter
From: Andrew Lunn @ 2018-04-09 12:39 UTC (permalink / raw)
  To: AceLan Kao
  Cc: Jay Cliburn, Chris Snook, David S . Miller, Rakesh Pandit, netdev,
	linux-kernel
In-Reply-To: <1523273714-17264-2-git-send-email-acelan.kao@canonical.com>

On Mon, Apr 09, 2018 at 07:35:14PM +0800, AceLan Kao wrote:
> The WoL feature was reported broken and will lead to
> the system resume immediately after suspending.
> This symptom is not happening on every system, so adding
> disable_wol option and disable WoL by default to prevent the issue from
> happening again.

>  const char alx_drv_name[] = "alx";
>  
> +/* disable WoL by default */
> +bool disable_wol = 1;
> +module_param(disable_wol, bool, 0);
> +MODULE_PARM_DESC(disable_wol, "Disable Wake on Lan feature");
> +

Hi AceLan

This seems like you are papering over the cracks. And module
parameters are not liked.

Please try to find the real problem.

       Andrew

^ permalink raw reply

* Re: Passing uninitialised local variable
From: Petr Machata @ 2018-04-09 12:23 UTC (permalink / raw)
  To: Arend van Spriel
  Cc: Himanshu Jha, franky.lin, hante.meuleman, chi-hsien.lin,
	wright.feng, kvalo, johannes.berg, linux-wireless,
	brcm80211-dev-list.pdl, brcm80211-dev-list, netdev
In-Reply-To: <5ABD5735.1050608@broadcom.com>

Arend van Spriel <arend.vanspriel@broadcom.com> writes:

> On 3/28/2018 1:20 PM, Himanshu Jha wrote:
>> I recently found that a local variable in passed uninitialised to the
>> function at
>>
>> drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:2950
>>
>>                  u32 var;
>>                  err = brcmf_fil_iovar_int_get(ifp, "dtim_assoc", &var);

>>
>> s32
>> brcmf_fil_iovar_int_get(struct brcmf_if *ifp, char *name, u32 *data)
>> {
>>          __le32 data_le = cpu_to_le32(*data);

>> }
>>
>> We can cleary see that 'var' in used uninitialised in the very first line
>> which is an undefined behavior.
>
> Why undefined? We copy some stack data and we do transfer that to the device. However in this case
> the device does nothing with it and it is simply overwritten by the response.

"Undefined behavior" is a technical term for when there are no
guarantees as to what the result of executing a given code will be. None
at all--it might for example abort, and that would be perfectly valid as
well. (To be clear, this is not about the device, but about the CPU that
this code runs on.)

Uninitialized reads are one example of a code construct that invokes
undefined behavior.

Thanks,
Petr

^ permalink raw reply

* [PATCH 2/2] alx: add disable_wol paramenter
From: AceLan Kao @ 2018-04-09 11:35 UTC (permalink / raw)
  To: Jay Cliburn, Chris Snook, David S . Miller, Rakesh Pandit, netdev,
	linux-kernel
In-Reply-To: <1523273714-17264-1-git-send-email-acelan.kao@canonical.com>

The WoL feature was reported broken and will lead to
the system resume immediately after suspending.
This symptom is not happening on every system, so adding
disable_wol option and disable WoL by default to prevent the issue from
happening again.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=61651
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
---
 drivers/net/ethernet/atheros/alx/ethtool.c | 7 ++++++-
 drivers/net/ethernet/atheros/alx/main.c    | 5 +++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/atheros/alx/ethtool.c b/drivers/net/ethernet/atheros/alx/ethtool.c
index 859e272..e50129d 100644
--- a/drivers/net/ethernet/atheros/alx/ethtool.c
+++ b/drivers/net/ethernet/atheros/alx/ethtool.c
@@ -46,6 +46,8 @@
 #include "reg.h"
 #include "hw.h"
 
+extern const bool disable_wol;
+
 /* The order of these strings must match the order of the fields in
  * struct alx_hw_stats
  * See hw.h
@@ -315,6 +317,9 @@ static void alx_get_wol(struct net_device *netdev, struct ethtool_wolinfo *wol)
 	struct alx_priv *alx = netdev_priv(netdev);
 	struct alx_hw *hw = &alx->hw;
 
+	if (disable_wol)
+		return;
+
 	wol->supported = WAKE_MAGIC | WAKE_PHY;
 	wol->wolopts = 0;
 
@@ -329,7 +334,7 @@ static int alx_set_wol(struct net_device *netdev, struct ethtool_wolinfo *wol)
 	struct alx_priv *alx = netdev_priv(netdev);
 	struct alx_hw *hw = &alx->hw;
 
-	if (wol->wolopts & ~(WAKE_MAGIC | WAKE_PHY))
+	if (disable_wol || (wol->wolopts & ~(WAKE_MAGIC | WAKE_PHY)))
 		return -EOPNOTSUPP;
 
 	hw->sleep_ctrl = 0;
diff --git a/drivers/net/ethernet/atheros/alx/main.c b/drivers/net/ethernet/atheros/alx/main.c
index c0e2bb2..c27fdf7 100644
--- a/drivers/net/ethernet/atheros/alx/main.c
+++ b/drivers/net/ethernet/atheros/alx/main.c
@@ -51,6 +51,11 @@
 
 const char alx_drv_name[] = "alx";
 
+/* disable WoL by default */
+bool disable_wol = 1;
+module_param(disable_wol, bool, 0);
+MODULE_PARM_DESC(disable_wol, "Disable Wake on Lan feature");
+
 static void alx_free_txbuf(struct alx_tx_queue *txq, int entry)
 {
 	struct alx_buffer *txb = &txq->bufs[entry];
-- 
2.7.4

^ permalink raw reply related

* [PATCH 1/2] Revert "alx: remove WoL support"
From: AceLan Kao @ 2018-04-09 11:35 UTC (permalink / raw)
  To: Jay Cliburn, Chris Snook, David S . Miller, Rakesh Pandit, netdev,
	linux-kernel

This reverts commit bc2bebe8de8ed4ba6482c9cc370b0dd72ffe8cd2.

There are still many people need this feature, so try adding it back.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=61651
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
---
 drivers/net/ethernet/atheros/alx/ethtool.c |  36 +++++++
 drivers/net/ethernet/atheros/alx/hw.c      | 154 ++++++++++++++++++++++++++++-
 drivers/net/ethernet/atheros/alx/hw.h      |   5 +
 drivers/net/ethernet/atheros/alx/main.c    | 142 ++++++++++++++++++++++++--
 4 files changed, 326 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/atheros/alx/ethtool.c b/drivers/net/ethernet/atheros/alx/ethtool.c
index 2f4eabf..859e272 100644
--- a/drivers/net/ethernet/atheros/alx/ethtool.c
+++ b/drivers/net/ethernet/atheros/alx/ethtool.c
@@ -310,11 +310,47 @@ static int alx_get_sset_count(struct net_device *netdev, int sset)
 	}
 }
 
+static void alx_get_wol(struct net_device *netdev, struct ethtool_wolinfo *wol)
+{
+	struct alx_priv *alx = netdev_priv(netdev);
+	struct alx_hw *hw = &alx->hw;
+
+	wol->supported = WAKE_MAGIC | WAKE_PHY;
+	wol->wolopts = 0;
+
+	if (hw->sleep_ctrl & ALX_SLEEP_WOL_MAGIC)
+		wol->wolopts |= WAKE_MAGIC;
+	if (hw->sleep_ctrl & ALX_SLEEP_WOL_PHY)
+		wol->wolopts |= WAKE_PHY;
+}
+
+static int alx_set_wol(struct net_device *netdev, struct ethtool_wolinfo *wol)
+{
+	struct alx_priv *alx = netdev_priv(netdev);
+	struct alx_hw *hw = &alx->hw;
+
+	if (wol->wolopts & ~(WAKE_MAGIC | WAKE_PHY))
+		return -EOPNOTSUPP;
+
+	hw->sleep_ctrl = 0;
+
+	if (wol->wolopts & WAKE_MAGIC)
+		hw->sleep_ctrl |= ALX_SLEEP_WOL_MAGIC;
+	if (wol->wolopts & WAKE_PHY)
+		hw->sleep_ctrl |= ALX_SLEEP_WOL_PHY;
+
+	device_set_wakeup_enable(&alx->hw.pdev->dev, hw->sleep_ctrl);
+
+	return 0;
+}
+
 const struct ethtool_ops alx_ethtool_ops = {
 	.get_pauseparam	= alx_get_pauseparam,
 	.set_pauseparam	= alx_set_pauseparam,
 	.get_msglevel	= alx_get_msglevel,
 	.set_msglevel	= alx_set_msglevel,
+	.get_wol	= alx_get_wol,
+	.set_wol	= alx_set_wol,
 	.get_link	= ethtool_op_get_link,
 	.get_strings	= alx_get_strings,
 	.get_sset_count	= alx_get_sset_count,
diff --git a/drivers/net/ethernet/atheros/alx/hw.c b/drivers/net/ethernet/atheros/alx/hw.c
index 6ac40b0..f9bf612 100644
--- a/drivers/net/ethernet/atheros/alx/hw.c
+++ b/drivers/net/ethernet/atheros/alx/hw.c
@@ -332,6 +332,16 @@ void alx_set_macaddr(struct alx_hw *hw, const u8 *addr)
 	alx_write_mem32(hw, ALX_STAD1, val);
 }
 
+static void alx_enable_osc(struct alx_hw *hw)
+{
+	u32 val;
+
+	/* rising edge */
+	val = alx_read_mem32(hw, ALX_MISC);
+	alx_write_mem32(hw, ALX_MISC, val & ~ALX_MISC_INTNLOSC_OPEN);
+	alx_write_mem32(hw, ALX_MISC, val | ALX_MISC_INTNLOSC_OPEN);
+}
+
 static void alx_reset_osc(struct alx_hw *hw, u8 rev)
 {
 	u32 val, val2;
@@ -774,7 +784,6 @@ int alx_setup_speed_duplex(struct alx_hw *hw, u32 ethadv, u8 flowctrl)
 	return err;
 }
 
-
 void alx_post_phy_link(struct alx_hw *hw)
 {
 	u16 phy_val, len, agc;
@@ -848,6 +857,65 @@ void alx_post_phy_link(struct alx_hw *hw)
 	}
 }
 
+/* NOTE:
+ *    1. phy link must be established before calling this function
+ *    2. wol option (pattern,magic,link,etc.) is configed before call it.
+ */
+int alx_pre_suspend(struct alx_hw *hw, int speed, u8 duplex)
+{
+	u32 master, mac, phy, val;
+	int err = 0;
+
+	master = alx_read_mem32(hw, ALX_MASTER);
+	master &= ~ALX_MASTER_PCLKSEL_SRDS;
+	mac = hw->rx_ctrl;
+	/* 10/100 half */
+	ALX_SET_FIELD(mac, ALX_MAC_CTRL_SPEED,  ALX_MAC_CTRL_SPEED_10_100);
+	mac &= ~(ALX_MAC_CTRL_FULLD | ALX_MAC_CTRL_RX_EN | ALX_MAC_CTRL_TX_EN);
+
+	phy = alx_read_mem32(hw, ALX_PHY_CTRL);
+	phy &= ~(ALX_PHY_CTRL_DSPRST_OUT | ALX_PHY_CTRL_CLS);
+	phy |= ALX_PHY_CTRL_RST_ANALOG | ALX_PHY_CTRL_HIB_PULSE |
+	       ALX_PHY_CTRL_HIB_EN;
+
+	/* without any activity  */
+	if (!(hw->sleep_ctrl & ALX_SLEEP_ACTIVE)) {
+		err = alx_write_phy_reg(hw, ALX_MII_IER, 0);
+		if (err)
+			return err;
+		phy |= ALX_PHY_CTRL_IDDQ | ALX_PHY_CTRL_POWER_DOWN;
+	} else {
+		if (hw->sleep_ctrl & (ALX_SLEEP_WOL_MAGIC | ALX_SLEEP_CIFS))
+			mac |= ALX_MAC_CTRL_RX_EN | ALX_MAC_CTRL_BRD_EN;
+		if (hw->sleep_ctrl & ALX_SLEEP_CIFS)
+			mac |= ALX_MAC_CTRL_TX_EN;
+		if (duplex == DUPLEX_FULL)
+			mac |= ALX_MAC_CTRL_FULLD;
+		if (speed == SPEED_1000)
+			ALX_SET_FIELD(mac, ALX_MAC_CTRL_SPEED,
+				      ALX_MAC_CTRL_SPEED_1000);
+		phy |= ALX_PHY_CTRL_DSPRST_OUT;
+		err = alx_write_phy_ext(hw, ALX_MIIEXT_ANEG,
+					ALX_MIIEXT_S3DIG10,
+					ALX_MIIEXT_S3DIG10_SL);
+		if (err)
+			return err;
+	}
+
+	alx_enable_osc(hw);
+	hw->rx_ctrl = mac;
+	alx_write_mem32(hw, ALX_MASTER, master);
+	alx_write_mem32(hw, ALX_MAC_CTRL, mac);
+	alx_write_mem32(hw, ALX_PHY_CTRL, phy);
+
+	/* set val of PDLL D3PLLOFF */
+	val = alx_read_mem32(hw, ALX_PDLL_TRNS1);
+	val |= ALX_PDLL_TRNS1_D3PLLOFF_EN;
+	alx_write_mem32(hw, ALX_PDLL_TRNS1, val);
+
+	return 0;
+}
+
 bool alx_phy_configured(struct alx_hw *hw)
 {
 	u32 cfg, hw_cfg;
@@ -920,6 +988,26 @@ int alx_clear_phy_intr(struct alx_hw *hw)
 	return alx_read_phy_reg(hw, ALX_MII_ISR, &isr);
 }
 
+int alx_config_wol(struct alx_hw *hw)
+{
+	u32 wol = 0;
+	int err = 0;
+
+	/* turn on magic packet event */
+	if (hw->sleep_ctrl & ALX_SLEEP_WOL_MAGIC)
+		wol |= ALX_WOL0_MAGIC_EN | ALX_WOL0_PME_MAGIC_EN;
+
+	/* turn on link up event */
+	if (hw->sleep_ctrl & ALX_SLEEP_WOL_PHY) {
+		wol |=  ALX_WOL0_LINK_EN | ALX_WOL0_PME_LINK;
+		/* only link up can wake up */
+		err = alx_write_phy_reg(hw, ALX_MII_IER, ALX_IER_LINK_UP);
+	}
+	alx_write_mem32(hw, ALX_WOL0, wol);
+
+	return err;
+}
+
 void alx_disable_rss(struct alx_hw *hw)
 {
 	u32 ctrl = alx_read_mem32(hw, ALX_RXQ0);
@@ -1044,6 +1132,70 @@ void alx_mask_msix(struct alx_hw *hw, int index, bool mask)
 	alx_post_write(hw);
 }
 
+int alx_select_powersaving_speed(struct alx_hw *hw, int *speed, u8 *duplex)
+{
+	int i, err;
+	u16 lpa;
+
+	err = alx_read_phy_link(hw);
+	if (err)
+		return err;
+
+	if (hw->link_speed == SPEED_UNKNOWN) {
+		*speed = SPEED_UNKNOWN;
+		*duplex = DUPLEX_UNKNOWN;
+		return 0;
+	}
+
+	err = alx_read_phy_reg(hw, MII_LPA, &lpa);
+	if (err)
+		return err;
+
+	if (!(lpa & LPA_LPACK)) {
+		*speed = hw->link_speed;
+		return 0;
+	}
+
+	if (lpa & LPA_10FULL) {
+		*speed = SPEED_10;
+		*duplex = DUPLEX_FULL;
+	} else if (lpa & LPA_10HALF) {
+		*speed = SPEED_10;
+		*duplex = DUPLEX_HALF;
+	} else if (lpa & LPA_100FULL) {
+		*speed = SPEED_100;
+		*duplex = DUPLEX_FULL;
+	} else {
+		*speed = SPEED_100;
+		*duplex = DUPLEX_HALF;
+	}
+
+	if (*speed == hw->link_speed && *duplex == hw->duplex)
+		return 0;
+	err = alx_write_phy_reg(hw, ALX_MII_IER, 0);
+	if (err)
+		return err;
+	err = alx_setup_speed_duplex(hw, alx_speed_to_ethadv(*speed, *duplex) |
+					ADVERTISED_Autoneg, ALX_FC_ANEG |
+					ALX_FC_RX | ALX_FC_TX);
+	if (err)
+		return err;
+
+	/* wait for linkup */
+	for (i = 0; i < ALX_MAX_SETUP_LNK_CYCLE; i++) {
+		msleep(100);
+
+		err = alx_read_phy_link(hw);
+		if (err < 0)
+			return err;
+		if (hw->link_speed != SPEED_UNKNOWN)
+			break;
+	}
+	if (i == ALX_MAX_SETUP_LNK_CYCLE)
+		return -ETIMEDOUT;
+
+	return 0;
+}
 
 bool alx_get_phy_info(struct alx_hw *hw)
 {
diff --git a/drivers/net/ethernet/atheros/alx/hw.h b/drivers/net/ethernet/atheros/alx/hw.h
index e42d7e0..a7fb6c8 100644
--- a/drivers/net/ethernet/atheros/alx/hw.h
+++ b/drivers/net/ethernet/atheros/alx/hw.h
@@ -487,6 +487,8 @@ struct alx_hw {
 	u8 flowctrl;
 	u32 adv_cfg;
 
+	u32 sleep_ctrl;
+
 	spinlock_t mdio_lock;
 	struct mdio_if_info mdio;
 	u16 phy_id[2];
@@ -549,12 +551,14 @@ void alx_reset_pcie(struct alx_hw *hw);
 void alx_enable_aspm(struct alx_hw *hw, bool l0s_en, bool l1_en);
 int alx_setup_speed_duplex(struct alx_hw *hw, u32 ethadv, u8 flowctrl);
 void alx_post_phy_link(struct alx_hw *hw);
+int alx_pre_suspend(struct alx_hw *hw, int speed, u8 duplex);
 int alx_read_phy_reg(struct alx_hw *hw, u16 reg, u16 *phy_data);
 int alx_write_phy_reg(struct alx_hw *hw, u16 reg, u16 phy_data);
 int alx_read_phy_ext(struct alx_hw *hw, u8 dev, u16 reg, u16 *pdata);
 int alx_write_phy_ext(struct alx_hw *hw, u8 dev, u16 reg, u16 data);
 int alx_read_phy_link(struct alx_hw *hw);
 int alx_clear_phy_intr(struct alx_hw *hw);
+int alx_config_wol(struct alx_hw *hw);
 void alx_cfg_mac_flowcontrol(struct alx_hw *hw, u8 fc);
 void alx_start_mac(struct alx_hw *hw);
 int alx_reset_mac(struct alx_hw *hw);
@@ -563,6 +567,7 @@ bool alx_phy_configured(struct alx_hw *hw);
 void alx_configure_basic(struct alx_hw *hw);
 void alx_mask_msix(struct alx_hw *hw, int index, bool mask);
 void alx_disable_rss(struct alx_hw *hw);
+int alx_select_powersaving_speed(struct alx_hw *hw, int *speed, u8 *duplex);
 bool alx_get_phy_info(struct alx_hw *hw);
 void alx_update_hw_stats(struct alx_hw *hw);
 
diff --git a/drivers/net/ethernet/atheros/alx/main.c b/drivers/net/ethernet/atheros/alx/main.c
index 567ee54..c0e2bb2 100644
--- a/drivers/net/ethernet/atheros/alx/main.c
+++ b/drivers/net/ethernet/atheros/alx/main.c
@@ -1070,6 +1070,7 @@ static int alx_init_sw(struct alx_priv *alx)
 	alx->dev->max_mtu = ALX_MAX_FRAME_LEN(ALX_MAX_FRAME_SIZE);
 	alx->tx_ringsz = 256;
 	alx->rx_ringsz = 512;
+	hw->sleep_ctrl = ALX_SLEEP_WOL_MAGIC | ALX_SLEEP_WOL_PHY;
 	hw->imt = 200;
 	alx->int_mask = ALX_ISR_MISC;
 	hw->dma_chnl = hw->max_dma_chnl;
@@ -1346,6 +1347,66 @@ static int alx_stop(struct net_device *netdev)
 	return 0;
 }
 
+static int __alx_shutdown(struct pci_dev *pdev, bool *wol_en)
+{
+	struct alx_priv *alx = pci_get_drvdata(pdev);
+	struct net_device *netdev = alx->dev;
+	struct alx_hw *hw = &alx->hw;
+	int err, speed;
+	u8 duplex;
+
+	netif_device_detach(netdev);
+
+	if (netif_running(netdev))
+		__alx_stop(alx);
+
+#ifdef CONFIG_PM_SLEEP
+	err = pci_save_state(pdev);
+	if (err)
+		return err;
+#endif
+
+	err = alx_select_powersaving_speed(hw, &speed, &duplex);
+	if (err)
+		return err;
+	err = alx_clear_phy_intr(hw);
+	if (err)
+		return err;
+	err = alx_pre_suspend(hw, speed, duplex);
+	if (err)
+		return err;
+	err = alx_config_wol(hw);
+	if (err)
+		return err;
+
+	*wol_en = false;
+	if (hw->sleep_ctrl & ALX_SLEEP_ACTIVE) {
+		netif_info(alx, wol, netdev,
+			   "wol: ctrl=%X, speed=%X\n",
+			   hw->sleep_ctrl, speed);
+		device_set_wakeup_enable(&pdev->dev, true);
+		*wol_en = true;
+	}
+
+	pci_disable_device(pdev);
+
+	return 0;
+}
+
+static void alx_shutdown(struct pci_dev *pdev)
+{
+	int err;
+	bool wol_en;
+
+	err = __alx_shutdown(pdev, &wol_en);
+	if (!err) {
+		pci_wake_from_d3(pdev, wol_en);
+		pci_set_power_state(pdev, PCI_D3hot);
+	} else {
+		dev_err(&pdev->dev, "shutdown fail %d\n", err);
+	}
+}
+
 static void alx_link_check(struct work_struct *work)
 {
 	struct alx_priv *alx;
@@ -1841,6 +1902,8 @@ static int alx_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto out_unmap;
 	}
 
+	device_set_wakeup_enable(&pdev->dev, hw->sleep_ctrl);
+
 	netdev_info(netdev,
 		    "Qualcomm Atheros AR816x/AR817x Ethernet [%pM]\n",
 		    netdev->dev_addr);
@@ -1883,12 +1946,22 @@ static void alx_remove(struct pci_dev *pdev)
 static int alx_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
-	struct alx_priv *alx = pci_get_drvdata(pdev);
+	int err;
+	bool wol_en;
+
+	err = __alx_shutdown(pdev, &wol_en);
+	if (err) {
+		dev_err(&pdev->dev, "shutdown fail in suspend %d\n", err);
+		return err;
+	}
+
+	if (wol_en) {
+		pci_prepare_to_sleep(pdev);
+	} else {
+		pci_wake_from_d3(pdev, false);
+		pci_set_power_state(pdev, PCI_D3hot);
+	}
 
-	if (!netif_running(alx->dev))
-		return 0;
-	netif_device_detach(alx->dev);
-	__alx_stop(alx);
 	return 0;
 }
 
@@ -1896,23 +1969,69 @@ static int alx_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct alx_priv *alx = pci_get_drvdata(pdev);
+	struct net_device *netdev = alx->dev;
 	struct alx_hw *hw = &alx->hw;
+	int err;
+
+	pci_set_power_state(pdev, PCI_D0);
+	pci_restore_state(pdev);
+	pci_save_state(pdev);
+
+	pci_enable_wake(pdev, PCI_D3hot, 0);
+	pci_enable_wake(pdev, PCI_D3cold, 0);
 
+	hw->link_speed = SPEED_UNKNOWN;
+	alx->int_mask = ALX_ISR_MISC;
+
+	alx_reset_pcie(hw);
 	alx_reset_phy(hw);
 
-	if (!netif_running(alx->dev))
-		return 0;
-	netif_device_attach(alx->dev);
-	return __alx_open(alx, true);
+	pci_set_power_state(pdev, PCI_D0);
+	pci_restore_state(pdev);
+	pci_save_state(pdev);
+
+	pci_enable_wake(pdev, PCI_D3hot, 0);
+	pci_enable_wake(pdev, PCI_D3cold, 0);
+
+	hw->link_speed = SPEED_UNKNOWN;
+	alx->int_mask = ALX_ISR_MISC;
+
+	alx_reset_pcie(hw);
+	alx_reset_phy(hw);
+
+	err = alx_reset_mac(hw);
+	if (err) {
+		netif_err(alx, hw, alx->dev,
+			  "resume:reset_mac fail %d\n", err);
+		return -EIO;
+	}
+
+	err = alx_setup_speed_duplex(hw, hw->adv_cfg, hw->flowctrl);
+	if (err) {
+		netif_err(alx, hw, alx->dev,
+			  "resume:setup_speed_duplex fail %d\n", err);
+		return -EIO;
+	}
+
+	if (netif_running(netdev)) {
+		err = __alx_open(alx, true);
+		if (err)
+			return err;
+	}
+
+	netif_device_attach(netdev);
+
+	return err;
 }
+#endif
 
+#ifdef CONFIG_PM_SLEEP
 static SIMPLE_DEV_PM_OPS(alx_pm_ops, alx_suspend, alx_resume);
 #define ALX_PM_OPS      (&alx_pm_ops)
 #else
 #define ALX_PM_OPS      NULL
 #endif
 
-
 static pci_ers_result_t alx_pci_error_detected(struct pci_dev *pdev,
 					       pci_channel_state_t state)
 {
@@ -1955,6 +2074,8 @@ static pci_ers_result_t alx_pci_error_slot_reset(struct pci_dev *pdev)
 	}
 
 	pci_set_master(pdev);
+	pci_enable_wake(pdev, PCI_D3hot, 0);
+	pci_enable_wake(pdev, PCI_D3cold, 0);
 
 	alx_reset_pcie(hw);
 	if (!alx_reset_mac(hw))
@@ -2011,6 +2132,7 @@ static struct pci_driver alx_driver = {
 	.id_table    = alx_pci_tbl,
 	.probe       = alx_probe,
 	.remove      = alx_remove,
+	.shutdown    = alx_shutdown,
 	.err_handler = &alx_err_handlers,
 	.driver.pm   = ALX_PM_OPS,
 };
-- 
2.7.4

^ permalink raw reply related

* Re: [iovisor-dev] Best userspace programming API for XDP features query to kernel?
From: Jesper Dangaard Brouer @ 2018-04-09 11:26 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Eric Leblond, Victor Julien, Peter Manev, oisf-devel,
	Jakub Kicinski, Alexei Starovoitov, iovisor-dev@lists.iovisor.org,
	Saeed Mahameed, Daniel Borkmann, netdev, davem, Jiri Benc, brouer
In-Reply-To: <4b850e02-0481-dd75-9341-3951aa76272e@iogearbox.net>

On Fri, 6 Apr 2018 12:36:18 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 04/05/2018 10:51 PM, Jesper Dangaard Brouer wrote:
> > On Thu, 5 Apr 2018 12:37:19 +0200
> > Daniel Borkmann <daniel@iogearbox.net> wrote:
> >   
> >> On 04/04/2018 02:28 PM, Jesper Dangaard Brouer via iovisor-dev wrote:  
> >>> Hi Suricata people,
> >>>
> >>> When Eric Leblond (and I helped) integrated XDP in Suricata, we ran
> >>> into the issue, that at Suricata load/start time, we cannot determine
> >>> if the chosen XDP config options, like xdp-cpu-redirect[1], is valid on
> >>> this HW (e.g require driver XDP_REDIRECT support and bpf cpumap).
> >>>
> >>> We would have liked a way to report that suricata.yaml config was
> >>> invalid for this hardware/setup.  Now, it just loads, and packets gets
> >>> silently dropped by XDP (well a WARN_ONCE and catchable via tracepoints).
> >>>
> >>> My question to suricata developers: (Q1) Do you already have code that
> >>> query the kernel or drivers for features?
> >>>
> >>> At the IOvisor call (2 weeks ago), we discussed two options of exposing
> >>> XDP features avail in a given driver.
> >>>
> >>> Option#1: Extend existing ethtool -k/-K "offload and other features"
> >>> with some XDP features, that userspace can query. (Do you already query
> >>> offloads, regarding Q1)
> >>>
> >>> Option#2: Invent a new 'ip link set xdp' netlink msg with a query option.    
> >>
> >> I don't really mind if you go via ethtool, as long as we handle this
> >> generically from there and e.g. call the dev's ndo_bpf handler such that
> >> we keep all the information in one place. This can be a new ndo_bpf command
> >> e.g. XDP_QUERY_FEATURES or such.  
> > 
> > Just to be clear: notice as Victor points out[2], they are programmable
> > going though the IOCTL (SIOCETHTOOL) and not using cmdline tools.  
> 
> Sure, that was perfectly clear. (But at the same time if you extend the
> ioctl, it's obvious to also add support to actual ethtool cmdline tool.)
> 
> > [2] https://github.com/OISF/suricata/blob/master/src/util-ioctl.c#L326
> > 
> > If you want everything to go through the drivers ndo_bpf call anyway
> > (which userspace API is netlink based) then at what point to you  
> 
> Not really, that's the front end. ndo_bpf itself is a plain netdev op
> and has no real tie to netlink.
> 
> > want drivers to call their own ndo_bpf, when activated though their
> > ethtool_ops ? (Sorry, but I don't follow the flow you are proposing)
> > 
> > Notice, I'm not directly against using the drivers ndo_bpf call.  I can
> > see it does provide kernel more flexibility than the ethtool IOCTL.  
> 
> What I was saying is that even if you go via ethtool ioctl api, where
> you end up in dev_ethtool() and have some new ETHTOOL_* query command,
> then instead of adding a new ethtool_ops callback, we can and should
> reuse ndo_bpf from there.

Okay, you want to catch this in dev_ethtool(), that connected the dots
for me, thanks.  BUT it does feel a little fake and confusing to do
this, as the driver developers seeing the info via ethtool, would
expect they need to implement an ethtool_ops callback.

My original idea behind going through the ethtool APIs, were that every
driver is already using this, and it's familiar to the driver
developers.  And there are existing userspace tools that sysadm's
already know, that we can leverage.

Thinking about this over the weekend.  I have changed my mind a bit,
based on your feedback. I do buy into your idea of forcing things
through the drivers ndo_bpf call.  I'm no-longer convinced this should
go through ethtool, but as (you desc) we can, if we find that this is
the easiest ways to export and leverage an existing userspace tool.



> [...]
> > Here, I want to discuss how drivers expose/tell userspace that they
> > support a given feature: Specifically a bit for: XDP_REDIRECT action
> > support.
> >   
> >> Same for meta data,  
> > 
> > Well, not really.  It would be a "nice-to-have", but not strictly
> > needed as a feature bit.  XDP meta-data is controlled via a helper.
> > And the BPF-prog can detect/see runtime, that the helper bpf_xdp_adjust_meta
> > returns -ENOTSUPP (and need to check the ret value anyhow).  Thus,
> > there is that not much gained by exposing this to be detected setup
> > time, as all drivers should eventually support this, and we can detect
> > it runtime.
> > 
> > The missing XDP_REDIRECT action features bit it different, as the
> > BPF-prog cannot detect runtime that this is an unsupported action.
> > Plus, setup time we cannot query the driver for supported XDP actions.  
> 
> Ok, so with the example of meta data, you're arguing that it's okay
> to load a native XDP program onto a driver, and run actual traffic on
> the NIC in order probe for the availability of the feature when you're
> saying that it "can detect/see [at] runtime". I totally agree with you
> that all drivers should eventually support this (same with XDP_REDIRECT),
> but today there are even differences in drivers on bpf_xdp_adjust_meta()/
> bpf_xdp_adjust_head() with regards to how much headroom they have available,
> etc (e.g. some of them have none), so right now you can either go and
> read the code or do a runtime test with running actual traffic through
> the NIC to check whether your BPF prog is supported or not. Theoretically,
> you can do the same runtime test with XDP_REDIRECT (taking the warn in
> bpf_warn_invalid_xdp_action() aside for a moment), but you do have the
> trace_xdp_exception() tracepoint to figure it out, yes, it's a painful
> hassle, but overall, it's not that different as you were trying to argue
> here. For /both/ cases it would be nice to know at setup time whether
> this would be supported or not. Hence, such query is not just limited to
> XDP_REDIRECT alone. Eventually once such interface is agreed upon,
> undoubtedly the list of feature bits will grow is what I'm trying to say;
> only arguing on the XDP_REDIRECT here would be short term.
> 
> [...]
> The underlying problem is effectively the decoupling of program
> verification that doesn't have/know the context of where it is being
> attached to in this case. 

Yes, exactly, that the underlying problem. (plus the first XDP prog
gets verified and driver attached, and subsequent added bpf tail calls,
cannot be "rejected" (at "driver attachment time") as its too late).

> Thinking out loud for a sec on couple of other options aside
> from feature bits, what about i) providing the target ifindex to the
> verifier for XDP programs, such that at verification time you have the
> full context similar to nfp offload case today, 

I do like that we could provide the target ifindex earlier.  But
userspace still need some way to express that it e.g. need the
XDP_REDIRECT features (as verifier cannot reliability detect the action
return codes used, as discussed before, due to bpf tail calls and maps
used as return values).

> or ii) populating some
> XDP specific auxillary data to the BPF program at verification time such
> that the driver can check at program attach time whether the requested
> features are possible and if not it will reject and respond with netlink
> extack message to the user (as we do in various XDP attach cases already
> through XDP_SETUP_PROG{,_HW}).

I like proposal ii) better.  But how do I specify/input that I need
e.g. the XDP_REDIRECT feature, such that is it avail to "the BPF
program at verification time"?


My proposal iii), what about at XDP attachment time, create a new
netlink attribute that describe XDP action codes the program
needs/wants. If the info is provided, the ndo_bpf call check and
reject, and respond with netlink extack message.
  If I want to query for avail action codes, then I can use the same
netlink attribute format, and kernel will return it "populated" with
the info.


It is very useful that the program gets rejected at attachment time (to
avoid loading an XDP prog that silently drops packets). BUT I also
want a query option/functionality (reuse netlink attr format).

Specifically for Suricata the same "bypass" features can be implemented
at different levels (XDP, XDP-generic or clsbpf), and the XDP program
contains multiple features.  Thus, depending on what NIC driver
supports, we want to load different XDP and/or clsbpf/TC BPF-programs.
Thus, still support the same user requested feature/functionality, even
if XDP_REDIRECT is not avail, just slightly slower.


> This would, for example, avoid the need for feature bits, and do actual
> rejection of the program while retaining flexibility (and avoiding to
> expose bits that over time hopefully will deprecate anyway due to all
> XDP aware drivers implementing them). For both cases i) and ii), it
> would mean we make the verifier a bit smarter with regards to keeping
> track of driver related (XDP) requirements. Program return code checking
> is already present in the verifier's check_return_code() and we could
> extend it for XDP as well, for example. Seems cleaner and more extensible
> than feature bits, imho.

I thought the verifier's check_return_code() couldn't see/verify if the
XDP return action code is provided as a map lookup result(?). How is
that handled?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH] netfilter: fix CONFIG_NF_REJECT_IPV6=m link error
From: Arnd Bergmann @ 2018-04-09 10:53 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
	David S. Miller
  Cc: Arnd Bergmann, netfilter-devel, coreteam, netdev, linux-kernel

We get a new link error with CONFIG_NFT_REJECT_INET=y and CONFIG_NF_REJECT_IPV6=m
after larger parts of the nftables modules are linked together:

net/netfilter/nft_reject_inet.o: In function `nft_reject_inet_eval':
nft_reject_inet.c:(.text+0x17c): undefined reference to `nf_send_unreach6'
nft_reject_inet.c:(.text+0x190): undefined reference to `nf_send_reset6'

The problem is that with NF_TABLES_INET set, we implicitly try to use
the ipv6 version as well for NFT_REJECT, but when CONFIG_IPV6 is set to
a loadable module, it's impossible to reach that.

The best workaround I found is to express the above as a Kconfig
dependency, forcing NFT_REJECT itself to be 'm' in that particular
configuration.

Fixes: 02c7b25e5f54 ("netfilter: nf_tables: build-in filter chain type")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 net/netfilter/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 704b3832dbad..44d8a55e9721 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -594,6 +594,7 @@ config NFT_QUOTA
 config NFT_REJECT
 	default m if NETFILTER_ADVANCED=n
 	tristate "Netfilter nf_tables reject support"
+	depends on !NF_TABLES_INET || (IPV6!=m || m)
 	help
 	  This option adds the "reject" expression that you can use to
 	  explicitly deny and notify via TCP reset/ICMP informational errors
-- 
2.9.0

^ permalink raw reply related

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Markus Heiser @ 2018-04-09 10:52 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jonathan Corbet, Quentin Monnet, ast, netdev, oss-drivers,
	Linux Doc Mailing List, linux-man
In-Reply-To: <b2fbd09d-7ba2-9884-e18d-eac586af1665@iogearbox.net>


> Am 09.04.2018 um 12:08 schrieb Daniel Borkmann <daniel@iogearbox.net>:
[...]

>> May I completely misunderstood you, so correct my if I'am wrong:
>> 
>> - ./scripts/bpf_helpers_doc.py : produces reST markup from C-comments
>> - ./scripts/kerne-doc          : produces reST markup from C-comments
>> 
>> IMO: both are doing the same job, so why not using kernel-doc?
> 
> They are not really doing the same job, in bpf_helpers_doc.py case you don't
> want the whole header rendered, but just a fraction of it, that is, the
> single big comment which describes all BPF helper functions that a BPF
> program developer has available to use in user space - aka the entries in
> the __BPF_FUNC_MAPPER() macro;


> I also doubt the latter would actually qualify
> in kdoc context as some sort of a function description.

latter .. ah, OK .. thanks for clarifying. 

-- Markus --

^ permalink raw reply

* Re: [PATCH v2 2/2] af_key: Use DIV_ROUND_UP() instead of open-coded equivalent
From: Steffen Klassert @ 2018-04-09 10:34 UTC (permalink / raw)
  To: Kevin Easton; +Cc: Herbert Xu, David S. Miller, netdev, linux-kernel
In-Reply-To: <494dcebdc07530d6afbd7eda300da890a3e68e1c.1523115061.git.kevin@guarana.org>

On Sat, Apr 07, 2018 at 11:40:47AM -0400, Kevin Easton wrote:
> Several places use (x + 7) / 8 to convert from a number of bits to a number
> of bytes.  Replace those with DIV_ROUND_UP(x, 8) instead, for consistency
> with other parts of the same file.
> 
> Signed-off-by: Kevin Easton <kevin@guarana.org>

Please resubmit this one to ipsec-next after the
merge window. Thanks!

^ permalink raw reply

* Re: [PATCH v2 1/2] af_key: Always verify length of provided sadb_key
From: Steffen Klassert @ 2018-04-09 10:33 UTC (permalink / raw)
  To: Kevin Easton; +Cc: Herbert Xu, David S. Miller, netdev, linux-kernel
In-Reply-To: <de803d3d01f8b70d306c0d52569cc5265ed20bc9.1523115061.git.kevin@guarana.org>

On Sat, Apr 07, 2018 at 11:40:33AM -0400, Kevin Easton wrote:
> Key extensions (struct sadb_key) include a user-specified number of key
> bits.  The kernel uses that number to determine how much key data to copy
> out of the message in pfkey_msg2xfrm_state().
> 
> The length of the sadb_key message must be verified to be long enough,
> even in the case of SADB_X_AALG_NULL.  Furthermore, the sadb_key_len value
> must be long enough to include both the key data and the struct sadb_key
> itself.
> 
> Introduce a helper function verify_key_len(), and call it from
> parse_exthdrs() where other exthdr types are similarly checked for
> correctness.
> 
> Signed-off-by: Kevin Easton <kevin@guarana.org>
> Reported-by: syzbot+5022a34ca5a3d49b84223653fab632dfb7b4cf37@syzkaller.appspotmail.com

Applied to the ipsec tree, thanks Kevin!

^ permalink raw reply

* Re: [PATCH v2 0/2] af_key: Fix for sadb_key memcpy read overrun
From: Steffen Klassert @ 2018-04-09 10:32 UTC (permalink / raw)
  To: Kevin Easton; +Cc: Herbert Xu, David S. Miller, netdev, linux-kernel
In-Reply-To: <cover.1523115061.git.kevin@guarana.org>

On Sat, Apr 07, 2018 at 11:40:18AM -0400, Kevin Easton wrote:
> As found by syzbot, af_key does not properly validate the key length in
> sadb_key messages from userspace.  This can result in copying from beyond
> the end of the sadb_key part of the message, or indeed beyond the end of
> the entire packet.
> 
> Both these patches apply cleanly to ipsec-next.  Based on Steffen's
> feedback I have re-ordered them so that the fix only is in patch 1, which
> I would suggest is also a stable tree candidate, whereas patch 2 is a
> cleanup only.

I think here is some explanation needed. Usually bugfixes and cleanups
don't go to the same tree. On IPsec bugfixes go to the'ipsec' tree
while cleanups and new features go to the 'ipsec-next' tree. So
you need to split up your patchsets into patches that are targeted
to 'ipsec' and 'ipsec-next'. Aside from that, we are in 'merge window'
currently. This means that most maintainers don't accept patches to
their -next trees. If you have patches for a -next tree, wait until
the merge window is over (when v4.17-rc1 is released) and send them
then.

^ permalink raw reply

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Daniel Borkmann @ 2018-04-09 10:08 UTC (permalink / raw)
  To: Markus Heiser
  Cc: Jonathan Corbet, Quentin Monnet, ast, netdev, oss-drivers,
	Linux Doc Mailing List, linux-man
In-Reply-To: <C6491E01-3FC2-4AC4-8190-7B17CFAC01AD@darmarit.de>

On 04/09/2018 11:35 AM, Markus Heiser wrote:
> 
>> Am 09.04.2018 um 11:25 schrieb Daniel Borkmann <daniel@iogearbox.net>:
>>
>> On 04/09/2018 11:21 AM, Markus Heiser wrote:
>> [...]
>>> Do we really need another kernel-doc parser?
>>>
>>>  ./scripts/kernel-doc include/uapi/linux/bpf.h
>>>
>>> should already do the job (producing .rst). For more infos, take a look at
>>
>> This has absolutely zero to do with kernel-doc, but rather producing
>> a description of BPF helper function that are later assembled into an
>> actual man-page that BPF program developers (user space) can use.
> 
> May I completely misunderstood you, so correct my if I'am wrong:
> 
> - ./scripts/bpf_helpers_doc.py : produces reST markup from C-comments
> - ./scripts/kerne-doc          : produces reST markup from C-comments
> 
> IMO: both are doing the same job, so why not using kernel-doc?

They are not really doing the same job, in bpf_helpers_doc.py case you don't
want the whole header rendered, but just a fraction of it, that is, the
single big comment which describes all BPF helper functions that a BPF
program developer has available to use in user space - aka the entries in
the __BPF_FUNC_MAPPER() macro; I also doubt the latter would actually qualify
in kdoc context as some sort of a function description.

^ permalink raw reply

* Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper
From: Daniel Borkmann @ 2018-04-09 10:01 UTC (permalink / raw)
  To: Alexei Starovoitov, Yonghong Song, netdev; +Cc: kernel-team
In-Reply-To: <49c15114-c518-b591-470a-b4073a675588@fb.com>

On 04/09/2018 07:02 AM, Alexei Starovoitov wrote:
> On 4/8/18 9:53 PM, Yonghong Song wrote:
>>>> @@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog
>>>> *prog, bool do_idr_lock)
>>>>              bpf_prog_kallsyms_del(prog->aux->func[i]);
>>>>          bpf_prog_kallsyms_del(prog);
>>>>
>>>> -        call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
>>>> +        synchronize_rcu();
>>>> +        __bpf_prog_put_rcu(&prog->aux->rcu);
>>>
>>> there should have been lockdep splat.
>>> We cannot call synchronize_rcu here, since we cannot sleep
>>> in some cases.
>>
>> Let me double check this. The following is the reason
>> why I am using synchronize_rcu().
>>
>> With call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu) and
>> _bpf_prog_put_rcu calls put_callchain_buffers() which
>> calls mutex_lock(), the runtime with CONFIG_DEBUG_ATOMIC_SLEEP=y
>> will complains since potential sleep inside the call_rcu is not
>> allowed.
> 
> I see. Indeed. We cannot call put_callchain_buffers() from rcu callback,
> but doing synchronize_rcu() here is also not possible.
> How about moving put_callchain into bpf_prog_free_deferred() ?

+1, the assumption is that you can call bpf_prog_put() and also the
bpf_map_put() from any context. Sleeping here for a long time might
subtly break things badly.

^ permalink raw reply

* Re: [PATCH v4] net: thunderx: rework mac addresses list to u64 array
From: Vadim Lomovtsev @ 2018-04-09  9:42 UTC (permalink / raw)
  To: David Miller
  Cc: sgoutham, sunil.kovvuri, rric, linux-arm-kernel, netdev,
	linux-kernel, dnelson, robin.murphy, hch, gustavo,
	Vadim.Lomovtsev
In-Reply-To: <20180408.124200.2204489876311923873.davem@davemloft.net>

On Sun, Apr 08, 2018 at 12:42:00PM -0400, David Miller wrote:
> From: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
> Date: Fri,  6 Apr 2018 12:53:54 -0700
> 
> > @@ -1929,7 +1929,7 @@ static void nicvf_set_rx_mode_task(struct work_struct *work_arg)
> >  						  work.work);
> >  	struct nicvf *nic = container_of(vf_work, struct nicvf, rx_mode_work);
> >  	union nic_mbx mbx = {};
> > -	struct xcast_addr *xaddr, *next;
> > +	int idx = 0;
> 
> No need to initialize idx.
> 
> > +		for (idx = 0; idx < vf_work->mc->count; idx++) {
> 
> As it is always explicitly initialized at, and only used inside of,
> this loop.

Ok, will post next version shortly.
Thanks for your time.

Vadim

^ permalink raw reply

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Markus Heiser @ 2018-04-09  9:35 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jonathan Corbet, Quentin Monnet, ast, netdev, oss-drivers,
	Linux Doc Mailing List, linux-man
In-Reply-To: <81150b4e-79d5-f837-72bc-d988e87f842a@iogearbox.net>


> Am 09.04.2018 um 11:25 schrieb Daniel Borkmann <daniel@iogearbox.net>:
> 
> On 04/09/2018 11:21 AM, Markus Heiser wrote:
> [...]
>> Do we really need another kernel-doc parser?
>> 
>>  ./scripts/kernel-doc include/uapi/linux/bpf.h
>> 
>> should already do the job (producing .rst). For more infos, take a look at
> 
> This has absolutely zero to do with kernel-doc, but rather producing
> a description of BPF helper function that are later assembled into an
> actual man-page that BPF program developers (user space) can use.

May I completely misunderstood you, so correct my if I'am wrong:

- ./scripts/bpf_helpers_doc.py : produces reST markup from C-comments
- ./scripts/kerne-doc          : produces reST markup from C-comments

IMO: both are doing the same job, so why not using kernel-doc?

-- Markus --

^ permalink raw reply

* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Markus Heiser @ 2018-04-09  9:26 UTC (permalink / raw)
  To: Daniel Borkmann, Jonathan Corbet
  Cc: Quentin Monnet, ast, netdev, oss-drivers, Linux Doc Mailing List,
	linux-man
In-Reply-To: <05d2d03a-0b39-9d0c-9ba0-3461afc45fac@iogearbox.net>

On 04/06/2018 01:11 PM, Quentin Monnet wrote:
>> eBPF helper functions can be called from within eBPF programs to perform
>> a variety of tasks that would be otherwise hard or impossible to do with
>> eBPF itself. There is a growing number of such helper functions in the
>> kernel, but documentation is scarce. The main user space header file
>> does contain a short commented description of most helpers, but it is
>> somewhat outdated and not complete. It is more a "cheat sheet" than a
>> real documentation accessible to new eBPF developers.
>> 
>> This commit attempts to improve the situation by replacing the existing
>> overview for the helpers with a more developed description. Furthermore,
>> a Python script is added to generate a manual page for eBPF helpers. The
>> workflow is the following, and requires the rst2man utility:
>> 
>>    $ ./scripts/bpf_helpers_doc.py \
>>            --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
>>    $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
>>    $ man /tmp/bpf-helpers.7
>> 
>> The objective is to keep all documentation related to the helpers in a
>> single place,

Do we really need another kernel-doc parser?

  ./scripts/kernel-doc include/uapi/linux/bpf.h

should already do the job (producing .rst). For more infos, take a look at

  https://www.kernel.org/doc/html/latest/doc-guide/index.html

-- Markus --

PS: sorry for re-post, first post was HTML which is not accepted by ML :o

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox