Netdev List

* Re: [PATCH 1/7] fix hnode refcounting
From: Al Viro @ 2018-09-08 15:03 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: netdev, Cong Wang, Jiri Pirko, stable
In-Reply-To: <612ba054-370d-d118-b439-c68ea466eec9@mojatatu.com>

On Fri, Sep 07, 2018 at 08:13:56AM -0400, Jamal Hadi Salim wrote:
> > 	} else {
> >                  bool last;
> > 
> >                  err = tfilter_del_notify(net, skb, n, tp, block,
> >                                           q, parent, fh, false, &last,
> >                                           extack);
> > How can we ever get there with NULL fh?
> > 
> 
> Try:
> tc filter delete dev $P parent ffff: protocol ip prio 10 u32
> tcm handle is 0, so will hit that code path.

Huh?  It will hit tcf_proto_destroy() (and thus u32_destroy()), but where will
it hit u32_delete()?  Sure, we have fh == NULL there; what happens next is
                if (t->tcm_handle == 0) {
                        tcf_chain_tp_remove(chain, &chain_info, tp);    
                        tfilter_notify(net, skb, n, tp, block, q, parent, fh,
                                       RTM_DELTFILTER, false);
                        tcf_proto_destroy(tp, extack);
and that's it.  IDGI...  Direct experiment shows that on e.g.
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 10 u32 match ip protocol 1 0xff
tc filter delete dev eth0 parent ffff: protocol ip prio 10 u32
we get u32_destroy() called, with u32_destroy_hnode() called by it,
but no u32_delete() is called at all, let alone with ht == NULL...

^ permalink raw reply

* Re: Why not use all the syn queues? in the function "tcp_conn_request", I have some questions.
From: Ttttabcd @ 2018-09-08 15:23 UTC (permalink / raw)
  To: Neal Cardwell; +Cc: Netdev
In-Reply-To: <vN3mecCgCus4YcjtQ_kmAF54GhPE5BuQBEmDmPauw6J-FAWXZfzrXiw15lD6p2TmQtEH_ZORkfbA7BYgu7XeBIAXVzto628SNysPZz2Ia9s=@protonmail.com>

Thank you very much for your previous answer, sorry for the inconvenience.

But now I want to ask you one more question.

My question is: why do we need two variables to control the SYN queue?

The first is the "backlog" parameter of the listen() system call, which limits the maximum length of the SYN queue; it also limits the accept queue.

The second is /proc/sys/net/ipv4/tcp_max_syn_backlog, which also limits the maximum length of the SYN queue.

So changing only one of them does not increase the SYN queue.

From our last discussion, I understood that tcp_max_syn_backlog reserves the last quarter of the queue for IPs that have been proven to be alive.

But if tcp_max_syn_backlog is very large, the SYN queue will still be filled.

So I don't understand why a single variable is not used to control the SYN queue.

For example, use only tcp_max_syn_backlog as the maximum length of the SYN queue; it could still reserve the last quarter for IPs that have been proven to be alive.

The backlog parameter of the listen() system call would then control only the accept queue.

I feel this is more reasonable. Without reading the source code, I could never have guessed that the backlog parameter also controls the SYN queue.

Before I looked at the source code, I always thought it controlled only the accept queue, because that is what the man page says.

Here is the man page's original wording:

The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests. The maximum length of the queue for incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog. When syncookies are enabled there is no logical maximum length and this setting is ignored. See tcp(7) for more information.
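
The two limits discussed in this message can be put side by side in a small model. This is only a sketch of the behaviour as described in this thread, not the kernel's code: the names syn_admit, listener_model and syn_verdict are hypothetical, and the accept-queue check is left out.

```c
#include <stdbool.h>

/* Hypothetical model of the SYN admission decision described in the thread. */
enum syn_verdict { SYN_QUEUE, SYN_COOKIE, SYN_DROP };

struct listener_model {
	int listen_backlog;   /* "backlog" argument passed to listen() */
	int max_syn_backlog;  /* /proc/sys/net/ipv4/tcp_max_syn_backlog */
	bool syncookies;      /* whether syncookies are enabled */
};

enum syn_verdict syn_admit(const struct listener_model *l, int syn_qlen,
			   bool peer_proven)
{
	/* Limit 1: the listen() backlog caps the SYN queue length. */
	if (syn_qlen >= l->listen_backlog)
		return l->syncookies ? SYN_COOKIE : SYN_DROP;

	/*
	 * Limit 2: without syncookies, the last quarter of
	 * tcp_max_syn_backlog is kept for peers proven to be alive.
	 */
	if (!l->syncookies &&
	    l->max_syn_backlog - syn_qlen < (l->max_syn_backlog >> 2) &&
	    !peer_proven)
		return SYN_DROP;

	return SYN_QUEUE;
}
```

With listen_backlog = 128 and max_syn_backlog = 1024 the first limit is reached long before the reserved quarter, which matches the observation that raising only one of the two values does not grow the SYN queue.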

^ permalink raw reply

* Re: [PATCH net] powerpc: use big endian to hash len and proto in csum_ipv6_magic
From: LEROY Christophe @ 2018-09-08 16:33 UTC (permalink / raw)
  To: Xin Long; +Cc: Roopa Prabhu, Michael Ellerman, linuxppc-dev, network dev
In-Reply-To: <9183876a4a8ff0099686521d60f395a5230b67ed.1536401712.git.lucien.xin@gmail.com>

Xin Long <lucien.xin@gmail.com> a écrit :

> The function csum_ipv6_magic doesn't convert len and proto to big
> endian before doing ipv6 csum hash, which is not consistent with
> RFC and other arches.
>
> Jianlin found it when ICMPv6 packets from other hosts were dropped
> in the powerpc64 system.
>
> This patch is to fix it by using instruction 'lwbrx' to do this
> conversion in powerpc32/64 csum_ipv6_magic.
>
> Fixes: e9c4943a107b ("powerpc: Implement csum_ipv6_magic in assembly")
> Reported-by: Jianlin Shi <jishi@redhat.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  arch/powerpc/lib/checksum_32.S | 4 ++++
>  arch/powerpc/lib/checksum_64.S | 4 ++++
>  2 files changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
> index aa22406..7d3446e 100644
> --- a/arch/powerpc/lib/checksum_32.S
> +++ b/arch/powerpc/lib/checksum_32.S
> @@ -325,6 +325,10 @@ _GLOBAL(csum_ipv6_magic)
>  	adde	r0, r0, r9
>  	lwz	r11, 12(r4)
>  	adde	r0, r0, r10
> +	STWX_BE	r5, 0, r1
> +	lwz	r5, 0(r1)
> +	STWX_BE	r6, 0, r1
> +	lwz	r6, 0(r1)

PPC32 doesn't support little endian, so nothing to do here.

>  	add	r5, r5, r6	/* assumption: len + proto doesn't carry */
>  	adde	r0, r0, r11
>  	adde	r0, r0, r5
> diff --git a/arch/powerpc/lib/checksum_64.S b/arch/powerpc/lib/checksum_64.S
> index 886ed94..302e732 100644
> --- a/arch/powerpc/lib/checksum_64.S
> +++ b/arch/powerpc/lib/checksum_64.S
> @@ -439,6 +439,10 @@ EXPORT_SYMBOL(csum_partial_copy_generic)
>  _GLOBAL(csum_ipv6_magic)
>  	ld	r8, 0(r3)
>  	ld	r9, 8(r3)
> +	STWX_BE	r5, 0, r1
> +	lwz	r5, 0(r1)
> +	STWX_BE	r6, 0, r1
> +	lwz	r6, 0(r1)
>  	add	r5, r5, r6

This is overkill. For LE it should be enough to rotate r5 by 8 bits  
after the sum. Best place to do it would be after ld r11 I think.

Christophe

>  	addc	r0, r8, r9
>  	ld	r10, 0(r4)
> --
> 2.1.0
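
The suggestion that a rotate by 8 bits is enough on LE can be checked in plain C. A minimal sketch, assuming the value is later folded into a 16-bit ones' complement sum; the helper names (byteswap32, rotl32_8, fold16) are illustrative only and do not come from the patch.

```c
#include <stdint.h>

/* Full 32-bit byte swap, the effect the patch gets from STWX_BE + lwz. */
uint32_t byteswap32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Rotate left by 8 bits, the cheaper alternative suggested in the review. */
uint32_t rotl32_8(uint32_t x)
{
	return (x << 8) | (x >> 24);
}

/* Fold a 32-bit partial sum into 16 bits with end-around carry. */
uint16_t fold16(uint32_t x)
{
	x = (x & 0xffffu) + (x >> 16);
	x = (x & 0xffffu) + (x >> 16);
	return (uint16_t)x;
}
```

For bytes b3 b2 b1 b0, both byteswap32() and rotl32_8() produce 16-bit halves that add up to 256*(b0 + b2) + (b1 + b3), so fold16() gives the same result for either one.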

^ permalink raw reply

* [PATCH net-next] tcp: show number of network segments in some SNMP counters
From: Yafang Shao @ 2018-09-08 16:58 UTC (permalink / raw)
  To: edumazet, davem; +Cc: netdev, linux-kernel, Yafang Shao

It is better to show the number of network segments in the SNMP
counters below, because that could be more useful for the user.
For example, the user could easily figure out how many packets are
dropped and how many packets are queued in the out-of-order queue.

- LINUX_MIB_TCPRCVQDROP
- LINUX_MIB_TCPZEROWINDOWDROP
- LINUX_MIB_TCPBACKLOGDROP
- LINUX_MIB_TCPMINTTLDROP
- LINUX_MIB_TCPOFODROP
- LINUX_MIB_TCPOFOQUEUE

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 net/ipv4/tcp_input.c | 18 ++++++++++++------
 net/ipv4/tcp_ipv4.c  |  9 ++++++---
 net/ipv6/tcp_ipv6.c  |  6 ++++--
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 62508a2..90f449b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4496,7 +4496,8 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 	tcp_ecn_check_ce(sk, skb);
 
 	if (unlikely(tcp_try_rmem_schedule(sk, skb, skb->truesize))) {
-		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFODROP);
+		NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPOFODROP,
+			      tcp_skb_pcount(skb));
 		tcp_drop(sk, skb);
 		return;
 	}
@@ -4505,7 +4506,8 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 	tp->pred_flags = 0;
 	inet_csk_schedule_ack(sk);
 
-	NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFOQUEUE);
+	NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPOFOQUEUE,
+		      tcp_skb_pcount(skb));
 	seq = TCP_SKB_CB(skb)->seq;
 	end_seq = TCP_SKB_CB(skb)->end_seq;
 	SOCK_DEBUG(sk, "out of order segment: rcv_next %X seq %X - %X\n",
@@ -4666,7 +4668,8 @@ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
 	skb->len = size;
 
 	if (tcp_try_rmem_schedule(sk, skb, skb->truesize)) {
-		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPRCVQDROP);
+		NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPRCVQDROP,
+			      tcp_skb_pcount(skb));
 		goto err_free;
 	}
 
@@ -4725,7 +4728,8 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 	 */
 	if (TCP_SKB_CB(skb)->seq == tp->rcv_nxt) {
 		if (tcp_receive_window(tp) == 0) {
-			NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPZEROWINDOWDROP);
+			NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPZEROWINDOWDROP,
+				      tcp_skb_pcount(skb));
 			goto out_of_window;
 		}
 
@@ -4734,7 +4738,8 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 		if (skb_queue_len(&sk->sk_receive_queue) == 0)
 			sk_forced_mem_schedule(sk, skb->truesize);
 		else if (tcp_try_rmem_schedule(sk, skb, skb->truesize)) {
-			NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPRCVQDROP);
+			NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPRCVQDROP,
+				      tcp_skb_pcount(skb));
 			goto drop;
 		}
 
@@ -4796,7 +4801,8 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 		 * remembering D-SACK for its head made in previous line.
 		 */
 		if (!tcp_receive_window(tp)) {
-			NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPZEROWINDOWDROP);
+			NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPZEROWINDOWDROP,
+				      tcp_skb_pcount(skb));
 			goto out_of_window;
 		}
 		goto queue_and_out;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 09547ef..f2fe14b 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -475,7 +475,8 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
 		goto out;
 
 	if (unlikely(iph->ttl < inet_sk(sk)->min_ttl)) {
-		__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
+		__NET_ADD_STATS(net, LINUX_MIB_TCPMINTTLDROP,
+				tcp_skb_pcount(skb));
 		goto out;
 	}
 
@@ -1633,7 +1634,8 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
 
 	if (unlikely(sk_add_backlog(sk, skb, limit))) {
 		bh_unlock_sock(sk);
-		__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPBACKLOGDROP);
+		__NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPBACKLOGDROP,
+				tcp_skb_pcount(skb));
 		return true;
 	}
 	return false;
@@ -1790,7 +1792,8 @@ int tcp_v4_rcv(struct sk_buff *skb)
 		}
 	}
 	if (unlikely(iph->ttl < inet_sk(sk)->min_ttl)) {
-		__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
+		__NET_ADD_STATS(net, LINUX_MIB_TCPMINTTLDROP,
+				tcp_skb_pcount(skb));
 		goto discard_and_relse;
 	}
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 03e6b7a..97dfc16 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -391,7 +391,8 @@ static void tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
 		goto out;
 
 	if (ipv6_hdr(skb)->hop_limit < inet6_sk(sk)->min_hopcount) {
-		__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
+		__NET_ADD_STATS(net, LINUX_MIB_TCPMINTTLDROP,
+				tcp_skb_pcount(skb));
 		goto out;
 	}
 
@@ -1523,7 +1524,8 @@ static int tcp_v6_rcv(struct sk_buff *skb)
 		}
 	}
 	if (hdr->hop_limit < inet6_sk(sk)->min_hopcount) {
-		__NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
+		__NET_ADD_STATS(net, LINUX_MIB_TCPMINTTLDROP,
+				tcp_skb_pcount(skb));
 		goto discard_and_relse;
 	}
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] net/sock: move memory_allocated over to percpu_counter variables
From: Olof Johansson @ 2018-09-08 17:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Herbert Xu, David Miller, Neil Horman, Marcelo Ricardo Leitner,
	Vladislav Yasevich, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	linux-crypto, LKML, linux-sctp, netdev, linux-decnet-user,
	kernel-team, Yuchung Cheng, Neal Cardwell
In-Reply-To: <CANn89iKgZkfwQ8nAGEfOzubOh69y285TNKB5Q518Wf_phbq2Yg@mail.gmail.com>

Hi,

On Fri, Sep 7, 2018 at 12:21 AM, Eric Dumazet <edumazet@google.com> wrote:
> On Fri, Sep 7, 2018 at 12:03 AM Eric Dumazet <edumazet@google.com> wrote:
>
>> Problem is : we have platforms with more than 100 cpus, and
>> sk_memory_allocated() cost will be too expensive,
>> especially if the host is under memory pressure, since all cpus will
>> touch their private counter.
>>
>> per cpu variables do not really scale, they were ok 10 years ago when
>> no more than 16 cpus were the norm.
>>
>> I would prefer change TCP to not aggressively call
>> __sk_mem_reduce_allocated() from tcp_write_timer()
>>
>> Ideally only tcp_retransmit_timer() should attempt to reduce forward
>> allocations, after recurring timeout.
>>
>> Note that after 20c64d5cd5a2bdcdc8982a06cb05e5e1bd851a3d ("net: avoid
>> sk_forward_alloc overflows")
>> we have better control over sockets having huge forward allocations.
>>
>> Something like :
>
> Or something less risky :

I gave both of these patches a run, and neither do as well on the
system that has slower atomics. :(

The percpu version:

     8.05%  workload         [kernel.vmlinux]    [k] __do_softirq
     7.04%  swapper          [kernel.vmlinux]    [k] cpuidle_enter_state
     5.54%  workload         [kernel.vmlinux]    [k] _raw_spin_unlock_irqrestore
     1.66%  swapper          [kernel.vmlinux]    [k] __do_softirq
     1.55%  workload         [kernel.vmlinux]    [k] finish_task_switch
     1.24%  swapper          [kernel.vmlinux]    [k] finish_task_switch
     1.07%  workload         [kernel.vmlinux]    [k] net_rx_action

The first patch from you still has a significant amount of time spent in
the atomics paths (non-inlined versions used):

     7.87%  workload         [kernel.vmlinux]    [k] __ll_sc_atomic64_sub
     7.48%  workload         [kernel.vmlinux]    [k] __do_softirq
     5.05%  workload         [kernel.vmlinux]    [k] _raw_spin_unlock_irqrestore
     2.42%  workload         [kernel.vmlinux]    [k] __ll_sc_atomic64_add_return
     1.49%  swapper          [kernel.vmlinux]    [k] cpuidle_enter_state
     1.31%  workload         [kernel.vmlinux]    [k] finish_task_switch
     1.09%  workload         [kernel.vmlinux]    [k] tcp_sendmsg_locked
     1.08%  workload         [kernel.vmlinux]    [k] __arch_copy_from_user
     1.02%  workload         [kernel.vmlinux]    [k] net_rx_action

I think a lot of the overhead from percpu approach can be alleviated
if we can use percpu_counter_read() instead of _sum() (i.e. no need to
iterate through the local per-cpu recent delta). I don't know the TCP
stack well enough to tell where it's OK to use a bit of slack in the
numbers though -- by default count will at most be off by 32*online
cpus. Might not be a significant number in reality.
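
The trade-off between percpu_counter_read() and percpu_counter_sum() can be modelled in userspace. This is a minimal single-threaded sketch, assuming 4 CPUs and a batch of 32; the pcm_* functions and the struct are hypothetical stand-ins rather than the kernel API, and no locking or preemption handling is modelled.

```c
#include <stdint.h>

#define PCM_NCPUS 4   /* assumed number of online CPUs */
#define PCM_BATCH 32  /* assumed per-CPU batch size */

struct pcpu_counter_model {
	int64_t count;             /* shared value, updated only on a fold */
	int32_t local[PCM_NCPUS];  /* per-CPU deltas not yet folded in */
};

/* Add to the local delta; fold into the shared count once it reaches the batch. */
void pcm_add(struct pcpu_counter_model *c, int cpu, int32_t amount)
{
	int64_t v = (int64_t)c->local[cpu] + amount;

	if (v >= PCM_BATCH || v <= -PCM_BATCH) {
		c->count += v;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = (int32_t)v;
	}
}

/* Cheap read: shared value only, may lag by the unfolded local deltas. */
int64_t pcm_read(const struct pcpu_counter_model *c)
{
	return c->count;
}

/* Exact read: walks every CPU's local delta, which is the expensive part. */
int64_t pcm_sum(const struct pcpu_counter_model *c)
{
	int64_t s = c->count;

	for (int cpu = 0; cpu < PCM_NCPUS; cpu++)
		s += c->local[cpu];
	return s;
}
```

In this model each local delta stays below the batch in magnitude, so pcm_read() and pcm_sum() differ by less than PCM_BATCH * PCM_NCPUS, which is the slack mentioned above.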


-Olof

^ permalink raw reply

* Re: [PATCH net-next] net: sched: act_skbedit: remove dependency on rtnl lock
From: David Miller @ 2018-09-08 17:18 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri
In-Reply-To: <1535958435-6861-1-git-send-email-vladbu@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>
Date: Mon,  3 Sep 2018 10:07:15 +0300

> According to the new locking rule, we have to take tcf_lock for both
> ->init() and ->dump(), as RTNL will be removed.
> 
> Use tcf lock to protect skbedit action struct private data from concurrent
> modification in init and dump. Use rcu swap operation to reassign params
> pointer under protection of tcf lock. (old params value is not used by
> init, so there is no need of standalone rcu dereference step)
> 
> Remove rtnl lock assertion that is no longer required.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: sched: act_nat: remove dependency on rtnl lock
From: David Miller @ 2018-09-08 17:18 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri
In-Reply-To: <1535958560-6967-1-git-send-email-vladbu@mellanox.com>

From: Vlad Buslov <vladbu@mellanox.com>
Date: Mon,  3 Sep 2018 10:09:20 +0300

> According to the new locking rule, we have to take tcf_lock for both
> ->init() and ->dump(), as RTNL will be removed.
> 
> Use tcf spinlock to protect private nat action data from concurrent
> modification during dump. (nat init already uses tcf spinlock when changing
> action state)
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>

Also applied, thanks.

^ permalink raw reply

* Re: KASAN: use-after-free Read in __rhashtable_lookup (2)
From: syzbot @ 2018-09-08 22:07 UTC (permalink / raw)
  To: davem, dvyukov, linux-kernel, linux-rdma, netdev, rds-devel,
	santosh.shilimkar, sowmini.varadhan, syzkaller-bugs
In-Reply-To: <00000000000027a1e605741b2afa@google.com>

syzbot has found a reproducer for the following crash on:

HEAD commit:    d7b686ebf704 Merge branch 'i2c/for-current' of git://git.k..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1132d70a400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8f59875069d721b6
dashboard link: https://syzkaller.appspot.com/bug?extid=8967084bcac563795dc6
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10b67e49400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11119e49400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in memcmp+0xe3/0x160 lib/string.c:861
Read of size 1 at addr ffff8801ce73eb70 by task syz-executor383/11736

CPU: 1 PID: 11736 Comm: syz-executor383 Not tainted 4.19.0-rc2+ #228
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
  print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
  __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
  memcmp+0xe3/0x160 lib/string.c:861
  memcmp include/linux/string.h:386 [inline]
  rhashtable_compare include/linux/rhashtable.h:462 [inline]
  __rhashtable_lookup.isra.8.constprop.20+0x73a/0xd00 include/linux/rhashtable.h:484
  rhashtable_lookup include/linux/rhashtable.h:516 [inline]
  rhashtable_lookup_fast include/linux/rhashtable.h:542 [inline]
  rds_add_bound net/rds/bind.c:117 [inline]
  rds_bind+0x7d2/0x1520 net/rds/bind.c:238
  __sys_bind+0x331/0x440 net/socket.c:1481
  __do_sys_bind net/socket.c:1492 [inline]
  __se_sys_bind net/socket.c:1490 [inline]
  __x64_sys_bind+0x73/0xb0 net/socket.c:1490
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x444e29
Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b ce fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffec4b12988 EFLAGS: 00000217 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000444e29
RDX: 0000000000000010 RSI: 00000000200002c0 RDI: 0000000000000006
RBP: 0000000000000000 R08: 00000000004002e0 R09: 00000000004002e0
R10: 0000000000000004 R11: 0000000000000217 R12: 00000000000333a3
R13: 0000000000402170 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 11738:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
  kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
  kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554
  sk_prot_alloc+0x69/0x2e0 net/core/sock.c:1462
  sk_alloc+0x10d/0x1690 net/core/sock.c:1522
  rds_create+0x14f/0x740 net/rds/af_rds.c:666
  __sock_create+0x536/0x930 net/socket.c:1275
  sock_create net/socket.c:1315 [inline]
  __sys_socket+0x106/0x260 net/socket.c:1345
  __do_sys_socket net/socket.c:1354 [inline]
  __se_sys_socket net/socket.c:1352 [inline]
  __x64_sys_socket+0x73/0xb0 net/socket.c:1352
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 11738:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kmem_cache_free+0x83/0x290 mm/slab.c:3756
  sk_prot_free net/core/sock.c:1503 [inline]
  __sk_destruct+0x766/0xbd0 net/core/sock.c:1587
  sk_destruct+0x78/0x90 net/core/sock.c:1595
  __sk_free+0xcf/0x300 net/core/sock.c:1606
  sk_free+0x42/0x50 net/core/sock.c:1617
  sock_put include/net/sock.h:1691 [inline]
  rds_release+0x3e8/0x570 net/rds/af_rds.c:91
  __sock_release+0xd7/0x250 net/socket.c:579
  sock_close+0x19/0x20 net/socket.c:1139
  __fput+0x385/0xa30 fs/file_table.c:278
  ____fput+0x15/0x20 fs/file_table.c:309
  task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
  tracehook_notify_resume include/linux/tracehook.h:193 [inline]
  exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
  do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801ce73e700
  which belongs to the cache RDS of size 1608
The buggy address is located 1136 bytes inside of
  1608-byte region [ffff8801ce73e700, ffff8801ce73ed48)
The buggy address belongs to the page:
page:ffffea000739cf80 count:1 mapcount:0 mapping:ffff8801cb1c0c40 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea00073a5348 ffffea000738ea48 ffff8801cb1c0c40
raw: 0000000000000000 ffff8801ce73e000 0000000100000002 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801ce73ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801ce73ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801ce73eb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                              ^
  ffff8801ce73eb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801ce73ec00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: Enable TC Ops for GMAC >= 4
From: David Miller @ 2018-09-08 17:26 UTC (permalink / raw)
  To: Jose.Abreu; +Cc: netdev, Joao.Pinto, peppe.cavallaro, alexandre.torgue
In-Reply-To: <37423859c349aececab23d1875c41325f816a8bf.1536236899.git.joabreu@synopsys.com>

From: Jose Abreu <Jose.Abreu@synopsys.com>
Date: Thu,  6 Sep 2018 13:29:30 +0100

> GMAC >= 4 also supports CBS. Lets enable the TC Ops for these versions.
> 
> Signed-off-by: Jose Abreu <joabreu@synopsys.com>

Applied, thanks.

Please work to fix the performance regressions et al. that are still
unresolved in the driver.

This stmmac MAC is in such a varied number of chipsets that you
must be extremely careful with every change you make to this
driver, and I know that you cannot test a large portion of the
chips affected by your changes, as you have stated in the
past.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next] tcp: show number of network segments in some SNMP counters
From: Yafang Shao @ 2018-09-08 17:42 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev, LKML, Yafang Shao
In-Reply-To: <1536425898-12059-1-git-send-email-laoar.shao@gmail.com>

On Sun, Sep 9, 2018 at 12:58 AM, Yafang Shao <laoar.shao@gmail.com> wrote:
> It is better to show the number of network segments in the SNMP
> counters below, because that could be more useful for the user.
> For example, the user could easily figure out how many packets are
> dropped and how many packets are queued in the out-of-order queue.
>
> - LINUX_MIB_TCPRCVQDROP
> - LINUX_MIB_TCPZEROWINDOWDROP
> - LINUX_MIB_TCPBACKLOGDROP
> - LINUX_MIB_TCPMINTTLDROP
> - LINUX_MIB_TCPOFODROP
> - LINUX_MIB_TCPOFOQUEUE
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  net/ipv4/tcp_input.c | 18 ++++++++++++------
>  net/ipv4/tcp_ipv4.c  |  9 ++++++---
>  net/ipv6/tcp_ipv6.c  |  6 ++++--
>  3 files changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 62508a2..90f449b 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4496,7 +4496,8 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
>         tcp_ecn_check_ce(sk, skb);
>
>         if (unlikely(tcp_try_rmem_schedule(sk, skb, skb->truesize))) {
> -               NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFODROP);
> +               NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPOFODROP,
> +                             tcp_skb_pcount(skb));
>                 tcp_drop(sk, skb);
>                 return;
>         }
> @@ -4505,7 +4506,8 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
>         tp->pred_flags = 0;
>         inet_csk_schedule_ack(sk);
>
> -       NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFOQUEUE);
> +       NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPOFOQUEUE,
> +                     tcp_skb_pcount(skb));
>         seq = TCP_SKB_CB(skb)->seq;
>         end_seq = TCP_SKB_CB(skb)->end_seq;
>         SOCK_DEBUG(sk, "out of order segment: rcv_next %X seq %X - %X\n",
> @@ -4666,7 +4668,8 @@ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
>         skb->len = size;
>
>         if (tcp_try_rmem_schedule(sk, skb, skb->truesize)) {
> -               NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPRCVQDROP);
> +               NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPRCVQDROP,
> +                             tcp_skb_pcount(skb));
>                 goto err_free;
>         }
>
> @@ -4725,7 +4728,8 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
>          */
>         if (TCP_SKB_CB(skb)->seq == tp->rcv_nxt) {
>                 if (tcp_receive_window(tp) == 0) {
> -                       NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPZEROWINDOWDROP);
> +                       NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPZEROWINDOWDROP,
> +                                     tcp_skb_pcount(skb));
>                         goto out_of_window;
>                 }
>
> @@ -4734,7 +4738,8 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
>                 if (skb_queue_len(&sk->sk_receive_queue) == 0)
>                         sk_forced_mem_schedule(sk, skb->truesize);
>                 else if (tcp_try_rmem_schedule(sk, skb, skb->truesize)) {
> -                       NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPRCVQDROP);
> +                       NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPRCVQDROP,
> +                                     tcp_skb_pcount(skb));
>                         goto drop;
>                 }
>
> @@ -4796,7 +4801,8 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
>                  * remembering D-SACK for its head made in previous line.
>                  */
>                 if (!tcp_receive_window(tp)) {
> -                       NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPZEROWINDOWDROP);
> +                       NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPZEROWINDOWDROP,
> +                                     tcp_skb_pcount(skb));
>                         goto out_of_window;
>                 }
>                 goto queue_and_out;
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 09547ef..f2fe14b 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -475,7 +475,8 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
>                 goto out;
>
>         if (unlikely(iph->ttl < inet_sk(sk)->min_ttl)) {
> -               __NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
> +               __NET_ADD_STATS(net, LINUX_MIB_TCPMINTTLDROP,
> +                               tcp_skb_pcount(skb));
>                 goto out;
>         }
>
> @@ -1633,7 +1634,8 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
>
>         if (unlikely(sk_add_backlog(sk, skb, limit))) {
>                 bh_unlock_sock(sk);
> -               __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPBACKLOGDROP);
> +               __NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPBACKLOGDROP,
> +                               tcp_skb_pcount(skb));
>                 return true;
>         }
>         return false;
> @@ -1790,7 +1792,8 @@ int tcp_v4_rcv(struct sk_buff *skb)
>                 }
>         }
>         if (unlikely(iph->ttl < inet_sk(sk)->min_ttl)) {
> -               __NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
> +               __NET_ADD_STATS(net, LINUX_MIB_TCPMINTTLDROP,
> +                               tcp_skb_pcount(skb));
>                 goto discard_and_relse;
>         }
>
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 03e6b7a..97dfc16 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -391,7 +391,8 @@ static void tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
>                 goto out;
>
>         if (ipv6_hdr(skb)->hop_limit < inet6_sk(sk)->min_hopcount) {
> -               __NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
> +               __NET_ADD_STATS(net, LINUX_MIB_TCPMINTTLDROP,
> +                               tcp_skb_pcount(skb));
>                 goto out;
>         }
>
> @@ -1523,7 +1524,8 @@ static int tcp_v6_rcv(struct sk_buff *skb)
>                 }
>         }
>         if (hdr->hop_limit < inet6_sk(sk)->min_hopcount) {
> -               __NET_INC_STATS(net, LINUX_MIB_TCPMINTTLDROP);
> +               __NET_ADD_STATS(net, LINUX_MIB_TCPMINTTLDROP,
> +                               tcp_skb_pcount(skb));
>                 goto discard_and_relse;
>         }
>
> --
> 1.8.3.1
>

It seems tcp_skb_pcount(skb) is not the proper thing to use here. I will send a V2.
Sorry about the noise.


Thanks
Yafang

^ permalink raw reply

* Re: Fixed PHYs and link up/down from user space ?
From: Florian Fainelli @ 2018-09-08 18:14 UTC (permalink / raw)
  To: Joakim Tjernlund, netdev@vger.kernel.org
In-Reply-To: <f06e484fc7d3e0b213c53d1a5dae90c787311768.camel@infinera.com>

On September 8, 2018 6:59:31 AM PDT, Joakim Tjernlund <Joakim.Tjernlund@infinera.com> wrote:
>I am looking for a way to set physical link state from user space for a
>Fixed PHY.
>Found the /sys/class/net/eth1/carrier I/F but that didn't work and I
>cannot find something else.

The carrier sysfs attribute is not writable by default, but it can be made writable by hooking an ndo_change_carrier() callback into your network device.

Fixed PHYs also offer the ability to poll a GPIO to determine the link state, or register a callback to update the link status based on an event (interrupt handler or otherwise). Note that attempting to change the carrier from user space and the fixed PHY being polled by the PHY state machine will likely both want to force the carrier, so you may have to register a fixed link status callback just to get them to agree.

>
>I want to make ifplugd/dhcp function as if there were a real cable
>there(or not)
> 
>
> Jocke

-- 
Florian

^ permalink raw reply

* Re: Why not use all the syn queues? in the function "tcp_conn_request", I have some questions.
From: Neal Cardwell @ 2018-09-08 18:24 UTC (permalink / raw)
  To: ttttabcd; +Cc: Netdev
In-Reply-To: <xtLQxGmTfE30khT4_p5QLLMtYtEMDzpenHusMD3NTAS3ikmWwsrKBHLsYdn1dYOp1na5jnapsUyxENcGDTbKbcgYHwYLYe7QBXeFLXGJznk=@protonmail.com>

On Sat, Sep 8, 2018 at 11:23 AM Ttttabcd <ttttabcd@protonmail.com> wrote:
>
> Thank you very much for your previous answer, sorry for the inconvenience.
>
> But now I want to ask you one more question.
>
> The question is why we need two variables to control the syn queue?
>
> The first is the "backlog" parameter of the "listen" system call that controls the maximum length limit of the syn queue, it also controls the accept queue.

By default, and essentially always in practice (AFAIK), Linux
installations enable syncookies. With syncookies, there is essentially
no limit on the syn queue, or number of incomplete passive connections
(as the man page you quoted notes). So in practice the listen()
parameter usually controls only the accept queue.

> The second is /proc/sys/net/ipv4/tcp_max_syn_backlog, which also controls the maximum length limit of the syn queue.
>
> So simply changing one of them and wanting to increase the syn queue is not working.
>
> In our last discussion, I understood tcp_max_syn_backlog will retain the last quarter to the IP that has been proven to be alive

That discussion pertains to a code path that is relevant if syncookies
are disabled, which is very uncommon (see above).

> But if tcp_max_syn_backlog is very large, the syn queue will be filled as well.
>
> So I don't understand why not just use a variable to control the syn queue.
>
> For example, just use tcp_max_syn_backlog, which is the maximum length limit for the syn queue, and it can also be retained to prove that the IP remains the last quarter.
>
> The backlog parameter of the listen system call only controls the accept queue.
>
> I feel this is more reasonable. If I don't look at the source code, I really can't guess the backlog parameter actually controls the syn queue.
>
> I always thought that it only controlled the accept queue before I looked at the source code, because the man page is written like this.

Keep in mind that the semantics of the listen() argument and the
/proc/sys/net/ipv4/tcp_max_syn_backlog sysctl knob, as described in
the man page, are part of the Linux kernel's user-visible API. So, in
essence, they cannot be changed. Changing the semantics of system
calls and sysctl knobs breaks applications and system configuration
scripts. :-)

neal
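
To make the two knobs discussed above concrete, here is a minimal userspace sketch. It only shows where the listen() backlog is passed in and how the syncookies sysctl can be read; it does not measure either queue. The helper names open_listener() and read_syncookies() are made up for this illustration, and the backlog value used by a caller is arbitrary.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on 127.0.0.1 with a kernel-chosen port.
 * Returns the socket fd, or -1 on failure.
 */
static int open_listener(int backlog)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, backlog) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Read /proc/sys/net/ipv4/tcp_syncookies.
 * Returns the value, or -1 if the file cannot be read or parsed.
 */
static int read_syncookies(void)
{
	FILE *f = fopen("/proc/sys/net/ipv4/tcp_syncookies", "r");
	int val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

A caller would do something like open_listener(128) and then check read_syncookies(). If the sysctl is nonzero, the reply above suggests the backlog argument mostly matters for the accept queue.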

^ permalink raw reply

* [PATCH net-next 00/15] SKB list handling cleanups
From: David Miller @ 2018-09-08 20:09 UTC (permalink / raw)
  To: netdev

This is a preparatory patch series which cleans up various forms of
sloppy SKB list handling, and makes certain semantics explicit.

We are trying to eliminate code that directly accesses the SKB
list and SKB queue head next/prev members in any way.  It is
impossible to convert SKB queue head over to struct list_head
while such code exists.

This patch series does not eliminate all such code, only the simplest
cases.  A later series will tackle the complicated ones.

A helper is added to make the "skb->next == NULL means not on a list"
rule explicit, and another is added to combine this with list_del().

Signed-off-by: David S. Miller <davem@davemloft.net>
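
The "skb->next == NULL means not on a list" rule can be modelled in plain userspace C. The sketch below is not kernel code: struct toy_skb and the toy_*() helpers are hypothetical names chosen for illustration, and the helpers actually added by this series may look different.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: only the link fields matter here. */
struct toy_skb {
	struct toy_skb *next;
	struct toy_skb *prev;
};

/* Make the "not on a list" state explicit instead of open-coding it. */
static inline void toy_mark_not_on_list(struct toy_skb *skb)
{
	skb->next = NULL;
}

static inline int toy_on_list(const struct toy_skb *skb)
{
	return skb->next != NULL;
}

/* Insert new_skb right after pos in a circular doubly linked list. */
static inline void toy_insert_after(struct toy_skb *pos,
				    struct toy_skb *new_skb)
{
	new_skb->next = pos->next;
	new_skb->prev = pos;
	pos->next->prev = new_skb;
	pos->next = new_skb;
}

/* Unlink skb and mark it as not on a list in one step. */
static inline void toy_unlink_init(struct toy_skb *skb)
{
	assert(toy_on_list(skb));
	skb->prev->next = skb->next;
	skb->next->prev = skb->prev;
	skb->prev = NULL;
	toy_mark_not_on_list(skb);
}
```

With the rule wrapped in helpers, callers never touch next directly, which is what would allow the underlying list representation to change later.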

^ permalink raw reply

* [PATCH net-next 01/15] sch_htb: Remove local SKB queue handling code.
From: David Miller @ 2018-09-08 20:09 UTC (permalink / raw)
  To: netdev


Instead, adjust __qdisc_enqueue_tail() so that HTB can use it.

The only other caller of __qdisc_enqueue_tail() is
qdisc_enqueue_tail() so we can move the backlog and return value
handling (which HTB doesn't need/want) to the latter.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/sch_generic.h | 11 +++++------
 net/sched/sch_htb.c       | 18 +-----------------
 2 files changed, 6 insertions(+), 23 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a6d00093f35e..bc8f6b0b6610 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -828,8 +828,8 @@ static inline void qdisc_skb_head_init(struct qdisc_skb_head *qh)
 	qh->qlen = 0;
 }
 
-static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
-				       struct qdisc_skb_head *qh)
+static inline void __qdisc_enqueue_tail(struct sk_buff *skb,
+					struct qdisc_skb_head *qh)
 {
 	struct sk_buff *last = qh->tail;
 
@@ -842,14 +842,13 @@ static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
 		qh->head = skb;
 	}
 	qh->qlen++;
-	qdisc_qstats_backlog_inc(sch, skb);
-
-	return NET_XMIT_SUCCESS;
 }
 
 static inline int qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch)
 {
-	return __qdisc_enqueue_tail(skb, sch, &sch->q);
+	__qdisc_enqueue_tail(skb, &sch->q);
+	qdisc_qstats_backlog_inc(sch, skb);
+	return NET_XMIT_SUCCESS;
 }
 
 static inline struct sk_buff *__qdisc_dequeue_head(struct qdisc_skb_head *qh)
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 43c4bfe625a9..cf23829cbede 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -577,22 +577,6 @@ static inline void htb_deactivate(struct htb_sched *q, struct htb_class *cl)
 	cl->prio_activity = 0;
 }
 
-static void htb_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
-			     struct qdisc_skb_head *qh)
-{
-	struct sk_buff *last = qh->tail;
-
-	if (last) {
-		skb->next = NULL;
-		last->next = skb;
-		qh->tail = skb;
-	} else {
-		qh->tail = skb;
-		qh->head = skb;
-	}
-	qh->qlen++;
-}
-
 static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		       struct sk_buff **to_free)
 {
@@ -603,7 +587,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	if (cl == HTB_DIRECT) {
 		/* enqueue to helper queue */
 		if (q->direct_queue.qlen < q->direct_qlen) {
-			htb_enqueue_tail(skb, sch, &q->direct_queue);
+			__qdisc_enqueue_tail(skb, &q->direct_queue);
 			q->direct_pkts++;
 		} else {
 			return qdisc_drop(skb, sch, to_free);
-- 
2.17.1
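
The split made by this patch (a low-level linking helper with no accounting, plus a wrapper that adds backlog and a return value) can be sketched in userspace C. This is a toy model only: the toy_* types and functions are hypothetical, and unlike the kernel helper the sketch always clears skb->next.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb {
	struct toy_skb *next;
	unsigned int len;
};

struct toy_skb_head {
	struct toy_skb *head;
	struct toy_skb *tail;
	unsigned int qlen;
};

struct toy_qdisc {
	struct toy_skb_head q;
	unsigned int backlog;	/* bytes currently queued */
};

#define TOY_XMIT_SUCCESS 0

/* Low-level helper: link skb at the tail. No statistics, no return value,
 * so it can be used on any private queue (like HTB's direct queue).
 */
static inline void toy_link_tail(struct toy_skb *skb, struct toy_skb_head *qh)
{
	struct toy_skb *last = qh->tail;

	assert(skb != NULL);
	skb->next = NULL;
	if (last)
		last->next = skb;
	else
		qh->head = skb;
	qh->tail = skb;
	qh->qlen++;
}

/* Wrapper for the qdisc's own queue: adds backlog accounting and a status. */
static inline int toy_enqueue_tail(struct toy_skb *skb, struct toy_qdisc *sch)
{
	toy_link_tail(skb, &sch->q);
	sch->backlog += skb->len;
	return TOY_XMIT_SUCCESS;
}
```

A caller that keeps its own side queue uses toy_link_tail() directly and leaves the qdisc's backlog untouched.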

^ permalink raw reply related

* [PATCH net-next 02/15] sch_netem: Move private queue handler to generic location.
From: David Miller @ 2018-09-08 20:10 UTC (permalink / raw)
  To: netdev


Hand-written copies of SKB list handlers do not belong in individual
packet schedulers.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/sch_generic.h | 11 +++++++++++
 net/sched/sch_netem.c     | 12 +-----------
 2 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index bc8f6b0b6610..fdaa5506e6f7 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -851,6 +851,17 @@ static inline int qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch)
 	return NET_XMIT_SUCCESS;
 }
 
+static inline void __qdisc_enqueue_head(struct sk_buff *skb,
+					struct qdisc_skb_head *qh)
+{
+	skb->next = qh->head;
+
+	if (!qh->head)
+		qh->tail = skb;
+	qh->head = skb;
+	qh->qlen++;
+}
+
 static inline struct sk_buff *__qdisc_dequeue_head(struct qdisc_skb_head *qh)
 {
 	struct sk_buff *skb = qh->head;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index ad18a2052416..b9541ce4d672 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -412,16 +412,6 @@ static struct sk_buff *netem_segment(struct sk_buff *skb, struct Qdisc *sch,
 	return segs;
 }
 
-static void netem_enqueue_skb_head(struct qdisc_skb_head *qh, struct sk_buff *skb)
-{
-	skb->next = qh->head;
-
-	if (!qh->head)
-		qh->tail = skb;
-	qh->head = skb;
-	qh->qlen++;
-}
-
 /*
  * Insert one skb into qdisc.
  * Note: parent depends on return value to account for queue length.
@@ -570,7 +560,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		cb->time_to_send = ktime_get_ns();
 		q->counter = 0;
 
-		netem_enqueue_skb_head(&sch->q, skb);
+		__qdisc_enqueue_head(skb, &sch->q);
 		sch->qstats.requeues++;
 	}
 
-- 
2.17.1
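
For readers who want to see the effect of a head enqueue on ordering, here is a small userspace model. The toy_* names are hypothetical; only the linking logic follows the helper moved by this patch, and the dequeue function is added purely so the order can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb {
	struct toy_skb *next;
	int id;
};

struct toy_skb_head {
	struct toy_skb *head;
	struct toy_skb *tail;
	unsigned int qlen;
};

/* Link skb at the front of the queue (the requeue case). */
static inline void toy_link_head(struct toy_skb *skb, struct toy_skb_head *qh)
{
	assert(skb != NULL);
	skb->next = qh->head;

	if (!qh->head)
		qh->tail = skb;
	qh->head = skb;
	qh->qlen++;
}

/* Remove and return the first skb, or NULL if the queue is empty. */
static inline struct toy_skb *toy_dequeue_head(struct toy_skb_head *qh)
{
	struct toy_skb *skb = qh->head;

	if (skb) {
		qh->head = skb->next;
		qh->qlen--;
		if (!qh->head)
			qh->tail = NULL;
		skb->next = NULL;
	}
	return skb;
}
```

Packets linked at the head come back out in reverse order of insertion, which is the behavior a requeue path relies on.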

^ permalink raw reply related

* [PATCH net-next 03/15] infiniband: nes: Use skb_peek_next() and skb_queue_walk().
From: David Miller @ 2018-09-08 20:10 UTC (permalink / raw)
  To: netdev


Instead of direct SKB list accesses.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/infiniband/hw/nes/nes_mgt.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_mgt.c b/drivers/infiniband/hw/nes/nes_mgt.c
index 9bdb84dc225c..e96ffff61c3a 100644
--- a/drivers/infiniband/hw/nes/nes_mgt.c
+++ b/drivers/infiniband/hw/nes/nes_mgt.c
@@ -198,9 +198,9 @@ static struct sk_buff *nes_get_next_skb(struct nes_device *nesdev, struct nes_qp
 
 	if (skb) {
 		/* Continue processing fpdu */
-		if (skb->next == (struct sk_buff *)&nesqp->pau_list)
+		skb = skb_peek_next(skb, &nesqp->pau_list);
+		if (!skb)
 			goto out;
-		skb = skb->next;
 		processacks = false;
 	} else {
 		/* Starting a new one */
@@ -553,12 +553,10 @@ static void queue_fpdus(struct sk_buff *skb, struct nes_vnic *nesvnic, struct ne
 	if (skb_queue_len(&nesqp->pau_list) == 0) {
 		skb_queue_head(&nesqp->pau_list, skb);
 	} else {
-		tmpskb = nesqp->pau_list.next;
-		while (tmpskb != (struct sk_buff *)&nesqp->pau_list) {
+		skb_queue_walk(&nesqp->pau_list, tmpskb) {
 			cb = (struct nes_rskb_cb *)&tmpskb->cb[0];
 			if (before(seqnum, cb->seqnum))
 				break;
-			tmpskb = tmpskb->next;
 		}
 		skb_insert(tmpskb, skb, &nesqp->pau_list);
 	}
-- 
2.17.1
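
The two idioms this patch switches to can be modelled outside the kernel. In the sketch below the queue head embeds a sentinel node, so no pointer casts are needed; that layout, the toy_* names, and the plain integer comparison (the driver uses a wrap-safe sequence comparison) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb {
	struct toy_skb *next;
	struct toy_skb *prev;
	int seq;
};

struct toy_skb_head {
	struct toy_skb sentinel;	/* stands in for the queue head itself */
	unsigned int qlen;
};

static inline void toy_queue_init(struct toy_skb_head *list)
{
	list->sentinel.next = &list->sentinel;
	list->sentinel.prev = &list->sentinel;
	list->qlen = 0;
}

/* Return the skb following @skb, or NULL if @skb is the last one. */
static inline struct toy_skb *toy_peek_next(struct toy_skb *skb,
					    struct toy_skb_head *list)
{
	struct toy_skb *next = skb->next;

	return next == &list->sentinel ? NULL : next;
}

#define toy_queue_walk(list, skb)			\
	for ((skb) = (list)->sentinel.next;		\
	     (skb) != &(list)->sentinel;		\
	     (skb) = (skb)->next)

static inline void toy_insert_before(struct toy_skb *pos, struct toy_skb *skb,
				     struct toy_skb_head *list)
{
	assert(pos != NULL && skb != NULL);
	skb->next = pos;
	skb->prev = pos->prev;
	pos->prev->next = skb;
	pos->prev = skb;
	list->qlen++;
}

/* Keep the queue ordered by seq. If the walk runs to completion, pos is
 * the sentinel, so inserting before it appends at the tail.
 */
static inline void toy_insert_sorted(struct toy_skb_head *list,
				     struct toy_skb *skb)
{
	struct toy_skb *pos;

	toy_queue_walk(list, pos) {
		if (skb->seq < pos->seq)
			break;
	}
	toy_insert_before(pos, skb, list);
}
```

Note that the walk leaves the cursor on the head when no entry matches, so a sorted insert needs no separate empty-queue branch in this model.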

^ permalink raw reply related

* [PATCH net-next 04/15] ppp: Remove direct skb_queue_head list pointer access.
From: David Miller @ 2018-09-08 20:10 UTC (permalink / raw)
  To: netdev


Add a helper, __skb_peek(), and use it in ppp_mp_reconstruct().

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ppp/ppp_generic.c |  2 +-
 include/linux/skbuff.h        | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 02ad03a2fab7..500bc0027c1b 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -2400,7 +2400,7 @@ ppp_mp_reconstruct(struct ppp *ppp)
 
 	if (ppp->mrru == 0)	/* do nothing until mrru is set */
 		return NULL;
-	head = list->next;
+	head = __skb_peek(list);
 	tail = NULL;
 	skb_queue_walk_safe(list, p, tmp) {
 	again:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 17a13e4785fc..89283b77294d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1592,6 +1592,17 @@ static inline struct sk_buff *skb_peek(const struct sk_buff_head *list_)
 	return skb;
 }
 
+/**
+ *	__skb_peek - peek at the head of a non-empty &sk_buff_head
+ *	@list_: list to peek at
+ *
+ *	Like skb_peek(), but the caller knows that the list is not empty.
+ */
+static inline struct sk_buff *__skb_peek(const struct sk_buff_head *list_)
+{
+	return list_->next;
+}
+
 /**
  *	skb_peek_next - peek skb following the given one from a queue
  *	@skb: skb to start from
-- 
2.17.1
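
The difference between a checked peek and the unchecked variant added here can be shown with a small userspace model. Everything named toy_* is hypothetical, and the assert() in the unchecked variant exists only in this sketch to make the caller's obligation visible; the kernel helper in the patch has no such check.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb {
	struct toy_skb *next;
	struct toy_skb *prev;
};

struct toy_skb_head {
	struct toy_skb sentinel;	/* stands in for the queue head itself */
	unsigned int qlen;
};

static inline void toy_queue_init(struct toy_skb_head *list)
{
	list->sentinel.next = &list->sentinel;
	list->sentinel.prev = &list->sentinel;
	list->qlen = 0;
}

static inline void toy_queue_tail(struct toy_skb_head *list,
				  struct toy_skb *skb)
{
	skb->next = &list->sentinel;
	skb->prev = list->sentinel.prev;
	list->sentinel.prev->next = skb;
	list->sentinel.prev = skb;
	list->qlen++;
}

/* Checked peek: returns NULL when the queue is empty. */
static inline struct toy_skb *toy_peek(const struct toy_skb_head *list)
{
	struct toy_skb *skb = list->sentinel.next;

	return skb == &list->sentinel ? NULL : skb;
}

/* Unchecked peek: the caller guarantees that the queue is not empty. */
static inline struct toy_skb *toy_peek_nonempty(const struct toy_skb_head *list)
{
	assert(list->qlen > 0);
	return list->sentinel.next;
}
```

On an empty queue the unchecked form would hand back the head itself disguised as a packet, which is why it is only appropriate where emptiness has already been ruled out.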

^ permalink raw reply related

* [PATCH net-next 05/15] mac80211: Don't access sk_queue_head->next directly.
From: David Miller @ 2018-09-08 20:10 UTC (permalink / raw)
  To: netdev


Use __skb_peek() instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/mac80211/rx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index c6bfd4019d44..a0ca27aeb732 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -2077,6 +2077,7 @@ ieee80211_reassemble_find(struct ieee80211_sub_if_data *sdata,
 	idx = sdata->fragment_next;
 	for (i = 0; i < IEEE80211_FRAGMENT_MAX; i++) {
 		struct ieee80211_hdr *f_hdr;
+		struct sk_buff *f_skb;
 
 		idx--;
 		if (idx < 0)
@@ -2088,7 +2089,8 @@ ieee80211_reassemble_find(struct ieee80211_sub_if_data *sdata,
 		    entry->last_frag + 1 != frag)
 			continue;
 
-		f_hdr = (struct ieee80211_hdr *)entry->skb_list.next->data;
+		f_skb = __skb_peek(&entry->skb_list);
+		f_hdr = (struct ieee80211_hdr *) f_skb->data;
 
 		/*
 		 * Check ftype and addresses are equal, else check next fragment
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 06/15] lan78xx: Do not access skb_queue_head list pointers directly.
From: David Miller @ 2018-09-08 20:10 UTC (permalink / raw)
  To: netdev

Use skb_queue_walk() instead.

Adjust inner loop test to utilize skb_queue_is_first().
Unfortunately we have to keep pkt_cnt around because it is
used by a later loop in this function.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/usb/lan78xx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 331bc99d55e7..3ce3c66559e4 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3340,9 +3340,9 @@ static void lan78xx_tx_bh(struct lan78xx_net *dev)
 	count = 0;
 	length = 0;
 	spin_lock_irqsave(&tqp->lock, flags);
-	for (skb = tqp->next; pkt_cnt < tqp->qlen; skb = skb->next) {
+	skb_queue_walk(tqp, skb) {
 		if (skb_is_gso(skb)) {
-			if (pkt_cnt) {
+			if (!skb_queue_is_first(tqp, skb)) {
 				/* handle previous packets first */
 				break;
 			}
-- 
2.17.1
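
The "is this the first entry" test used in the new loop can be modelled as follows. This is a simplified userspace sketch, not the driver's logic: the toy_* names, the is_gso flag, and the toy_batch_len() function are assumptions used to show how a walk plus a first-entry check replaces a manual packet counter.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb {
	struct toy_skb *next;
	struct toy_skb *prev;
	int is_gso;
};

struct toy_skb_head {
	struct toy_skb sentinel;	/* stands in for the queue head itself */
};

static inline void toy_queue_init(struct toy_skb_head *list)
{
	list->sentinel.next = &list->sentinel;
	list->sentinel.prev = &list->sentinel;
}

static inline void toy_queue_tail(struct toy_skb_head *list,
				  struct toy_skb *skb)
{
	assert(skb != NULL);
	skb->next = &list->sentinel;
	skb->prev = list->sentinel.prev;
	list->sentinel.prev->next = skb;
	list->sentinel.prev = skb;
}

/* An skb is first when the entry before it is the queue head. */
static inline int toy_queue_is_first(const struct toy_skb_head *list,
				     const struct toy_skb *skb)
{
	return skb->prev == &list->sentinel;
}

#define toy_queue_walk(list, skb)			\
	for ((skb) = (list)->sentinel.next;		\
	     (skb) != &(list)->sentinel;		\
	     (skb) = (skb)->next)

/* Count how many packets at the front would be handled together.
 * A GSO packet is handled alone if it is first; otherwise it ends the batch.
 */
static inline unsigned int toy_batch_len(struct toy_skb_head *list)
{
	struct toy_skb *skb;
	unsigned int count = 0;

	toy_queue_walk(list, skb) {
		if (skb->is_gso) {
			if (!toy_queue_is_first(list, skb))
				break;	/* handle previous packets first */
			return 1;
		}
		count++;
	}
	return count;
}
```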

^ permalink raw reply related

* [PATCH net-next 07/15] sctp: Use skb_queue_is_first().
From: David Miller @ 2018-09-08 20:10 UTC (permalink / raw)
  To: netdev


Instead of direct skb_queue_head pointer accesses.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sctp/ulpqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index 0b427100b0d4..331cc734e3db 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -459,7 +459,7 @@ static struct sctp_ulpevent *sctp_ulpq_retrieve_reassembled(struct sctp_ulpq *ul
 			 * element in the queue, then count it towards
 			 * possible PD.
 			 */
-			if (pos == ulpq->reasm.next) {
+			if (skb_queue_is_first(&ulpq->reasm, pos)) {
 			    pd_first = pos;
 			    pd_last = pos;
 			    pd_len = pos->len;
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 08/15] p54: Use skb_peek_tail() instead of direct head pointer accesses.
From: David Miller @ 2018-09-08 20:10 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/wireless/intersil/p54/txrx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/intersil/p54/txrx.c b/drivers/net/wireless/intersil/p54/txrx.c
index 3a4214d362ff..790784568ad2 100644
--- a/drivers/net/wireless/intersil/p54/txrx.c
+++ b/drivers/net/wireless/intersil/p54/txrx.c
@@ -121,8 +121,8 @@ static int p54_assign_address(struct p54_common *priv, struct sk_buff *skb)
 	}
 	if (unlikely(!target_skb)) {
 		if (priv->rx_end - last_addr >= len) {
-			target_skb = priv->tx_queue.prev;
-			if (!skb_queue_empty(&priv->tx_queue)) {
+			target_skb = skb_peek_tail(&priv->tx_queue);
+			if (target_skb) {
 				info = IEEE80211_SKB_CB(target_skb);
 				range = (void *)info->rate_driver_data;
 				target_addr = range->end_addr;
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 09/15] bnx2fc_fcoe: Use skb_queue_walk_safe().
From: David Miller @ 2018-09-08 20:10 UTC (permalink / raw)
  To: netdev


Instead of direct list pointer accesses.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/scsi/bnx2fc/bnx2fc_fcoe.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
index f00045813378..27c8d6ba05bb 100644
--- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
+++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
@@ -150,15 +150,11 @@ static void bnx2fc_clean_rx_queue(struct fc_lport *lp)
 	struct fcoe_rcv_info *fr;
 	struct sk_buff_head *list;
 	struct sk_buff *skb, *next;
-	struct sk_buff *head;
 
 	bg = &bnx2fc_global;
 	spin_lock_bh(&bg->fcoe_rx_list.lock);
 	list = &bg->fcoe_rx_list;
-	head = list->next;
-	for (skb = head; skb != (struct sk_buff *)list;
-	     skb = next) {
-		next = skb->next;
+	skb_queue_walk_safe(list, skb, next) {
 		fr = fcoe_dev_from_skb(skb);
 		if (fr->fr_dev == lp) {
 			__skb_unlink(skb, list);
-- 
2.17.1
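
Why a "safe" walk is needed when entries are unlinked during iteration can be seen in a small userspace model. The toy_* names, the owner field, and toy_purge_owner() are hypothetical; the macro mirrors the usual pattern of caching the next pointer before the loop body runs.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb {
	struct toy_skb *next;
	struct toy_skb *prev;
	int owner;
};

struct toy_skb_head {
	struct toy_skb sentinel;	/* stands in for the queue head itself */
	unsigned int qlen;
};

static inline void toy_queue_init(struct toy_skb_head *list)
{
	list->sentinel.next = &list->sentinel;
	list->sentinel.prev = &list->sentinel;
	list->qlen = 0;
}

static inline void toy_queue_tail(struct toy_skb_head *list,
				  struct toy_skb *skb)
{
	skb->next = &list->sentinel;
	skb->prev = list->sentinel.prev;
	list->sentinel.prev->next = skb;
	list->sentinel.prev = skb;
	list->qlen++;
}

/* Unlinking clears the link fields, so skb->next is unusable afterwards. */
static inline void toy_unlink(struct toy_skb *skb, struct toy_skb_head *list)
{
	assert(list->qlen > 0);
	skb->prev->next = skb->next;
	skb->next->prev = skb->prev;
	skb->next = NULL;
	skb->prev = NULL;
	list->qlen--;
}

/* Cache the successor in tmp before the body runs. */
#define toy_queue_walk_safe(list, skb, tmp)				\
	for ((skb) = (list)->sentinel.next, (tmp) = (skb)->next;	\
	     (skb) != &(list)->sentinel;				\
	     (skb) = (tmp), (tmp) = (skb)->next)

/* Remove every entry that belongs to @owner; returns how many were removed. */
static inline unsigned int toy_purge_owner(struct toy_skb_head *list, int owner)
{
	struct toy_skb *skb, *next;
	unsigned int removed = 0;

	toy_queue_walk_safe(list, skb, next) {
		if (skb->owner == owner) {
			toy_unlink(skb, list);
			removed++;
		}
	}
	return removed;
}
```

A plain walk would dereference the cleared next pointer of the entry it just unlinked; the safe form avoids that without any open-coded bookkeeping in the caller.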

^ permalink raw reply related

* [PATCH net-next 10/15] staging: rtl8192e: Use __skb_peek().
From: David Miller @ 2018-09-08 20:10 UTC (permalink / raw)
  To: netdev


Instead of direct list head pointer accesses.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/staging/rtl8192e/rtl8192e/rtl_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
index d2605158546b..96f265eee007 100644
--- a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
+++ b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
@@ -1149,7 +1149,7 @@ static enum reset_type _rtl92e_tx_check_stuck(struct net_device *dev)
 		if (skb_queue_len(&ring->queue) == 0) {
 			continue;
 		} else {
-			skb = (&ring->queue)->next;
+			skb = __skb_peek(&ring->queue);
 			tcb_desc = (struct cb_desc *)(skb->cb +
 				    MAX_DEV_ADDR_SIZE);
 			tcb_desc->nStuckCount++;
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 11/15] brcmfmac: Use __skb_peek().
From: David Miller @ 2018-09-08 20:10 UTC (permalink / raw)
  To: netdev


Instead of direct SKB list pointer accesses.

In these situations, we absolutely know that the SKB queue in question
is non-empty.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c | 2 +-
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
index d2f788d88668..3e37c8cf82c6 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
@@ -576,7 +576,7 @@ int brcmf_sdiod_recv_chain(struct brcmf_sdio_dev *sdiodev,
 
 	if (pktq->qlen == 1)
 		err = brcmf_sdiod_skbuff_read(sdiodev, sdiodev->func2, addr,
-					      pktq->next);
+					      __skb_peek(pktq));
 	else if (!sdiodev->sg_support) {
 		glom_skb = brcmu_pkt_buf_get_skb(totlen);
 		if (!glom_skb)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index a907d7b065fa..1e2fd289323a 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -2189,7 +2189,7 @@ brcmf_sdio_txpkt_prep(struct brcmf_sdio *bus, struct sk_buff_head *pktq,
 	 * length of the chain (including padding)
 	 */
 	if (bus->txglom)
-		brcmf_sdio_update_hwhdr(pktq->next->data, total_len);
+		brcmf_sdio_update_hwhdr(__skb_peek(pktq)->data, total_len);
 	return 0;
 }
 
-- 
2.17.1

^ permalink raw reply related
