Netdev List
* linux-next: build failure after merge of the bpf tree
From: Stephen Rothwell @ 2018-04-16  2:30 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Networking
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
	John Fastabend

[-- Attachment #1: Type: text/plain, Size: 2927 bytes --]

Hi all,

After merging the bpf tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:

kernel/bpf/core.o: In function `sock_map_release':
core.c:(.text+0xd04): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
kernel/events/core.o: In function `sock_map_release':
core.c:(.text+0x85cc): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
block/blk-core.o: In function `sock_map_release':
blk-core.c:(.text+0x58e8): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
drivers/net/virtio_net.o: In function `sock_map_release':
virtio_net.c:(.text+0x53ec): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
net/core/dev.o: In function `sock_map_release':
dev.c:(.text+0x6c68): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
net/core/rtnetlink.o: In function `sock_map_release':
rtnetlink.c:(.text+0x63e0): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
net/core/filter.o: In function `sock_map_release':
filter.c:(.text+0x8c8c): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
net/core/sock_reuseport.o: In function `sock_map_release':
sock_reuseport.c:(.text+0x398): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
net/bpf/test_run.o: In function `sock_map_release':
test_run.c:(.text+0x3dc): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
net/packet/af_packet.o: In function `sock_map_release':
af_packet.c:(.text+0x6958): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here

Caused by commit

  9b2e8bbc4e7a ("bpf: sockmap, map_release does not hold refcnt for pinned maps")

I applied the following patch for today:

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 16 Apr 2018 12:27:24 +1000
Subject: [PATCH] fix for "bpf: sockmap, map_release does not hold refcnt for
 pinned maps"

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 include/linux/bpf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f46561de5154..3b6c2b66f414 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -660,7 +660,7 @@ static inline int sock_map_prog(struct bpf_map *map,
 	return -EOPNOTSUPP;
 }
 
-void sock_map_release(struct bpf_map *map) {}
+static inline void sock_map_release(struct bpf_map *map) {}
 #endif
 
 /* verifier prototypes for helper functions called from eBPF programs */
-- 
2.16.3

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
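
For readers following the build failure above: a function body written in a header without `static inline` is emitted as an external definition in every object file that includes the header, which is what the linker is complaining about. Below is a minimal sketch of the stub pattern the one-line fix restores. The macro DEMO_CONFIG_SOCK_MAP and all demo_ names are hypothetical stand-ins for illustration, not kernel symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_map. */
struct demo_map { int refcnt; };

#ifdef DEMO_CONFIG_SOCK_MAP
/* Feature built in: the header carries declarations only. */
int demo_sock_map_prog(struct demo_map *map);
void demo_sock_map_release(struct demo_map *map);
#else
/*
 * Feature compiled out: the stubs are `static inline`, so every
 * translation unit gets its own private copy and the linker never
 * sees more than one external definition.
 */
static inline int demo_sock_map_prog(struct demo_map *map)
{
	(void)map;
	return -EOPNOTSUPP;
}

static inline void demo_sock_map_release(struct demo_map *map)
{
	(void)map;	/* nothing to release when the feature is off */
}
#endif

/* Small helper so the stub can be exercised from a caller. */
static inline int demo_release_and_check(struct demo_map *map)
{
	assert(map != NULL);
	demo_sock_map_release(map);
	return map->refcnt;
}
```

With DEMO_CONFIG_SOCK_MAP undefined, calling demo_sock_map_prog() returns -EOPNOTSUPP and demo_sock_map_release() is a no-op, in any number of including files.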


* Re: tcp hang when socket fills up ?
From: Eric Dumazet @ 2018-04-16  2:26 UTC (permalink / raw)
  To: Dominique Martinet, Eric Dumazet; +Cc: Michal Kubecek, netdev
In-Reply-To: <20180416014740.GA12245@nautica>



On 04/15/2018 06:47 PM, Dominique Martinet wrote:

> Also, here are the per-socket stats I could find (ss -i after having
> reproduced hang):
> 	 reno wscale:7,7 rto:7456 backoff:5 rtt:32.924/1.41 ato:40 mss:1374
> 	 pmtu:1500 rcvmss:1248 advmss:1448 cwnd:1 ssthresh:16
> 	 bytes_acked:32004 bytes_received:4189 segs_out:85 segs_in:54
> 	 data_segs_out:78 data_segs_in:18 send 333.9Kbps lastsnd:3912
> 	 lastrcv:11464 lastack:11387 pacing_rate 21.4Mbps delivery_rate
> 	 3.5Mbps busy:12188ms unacked:33 retrans:1/5 lost:33 rcv_rtt:37
> 	 rcv_space:29200 rcv_ssthresh:39184 notsent:28796 minrtt:24.986
> 

ss -temoi might give us more info

It really looks like, at some point, all incoming packets are shown by tcpdump but no longer reach the TCP socket.

(segs_in: might be steady; look at the d0 counter shown by ss -temoi. dX are the drop counters, sk->sk_drops.)


Are you sure you do not have some iptables/netfilter rules in place?

While running your experiment, try this on the server:

perf record -a -g -e skb:kfree_skb  sleep 30
perf report


* [PATCH net] net: Fix one possible memleak in ip_setup_cork
From: gfree.wind @ 2018-04-16  2:16 UTC (permalink / raw)
  To: davem, kuznet, netdev; +Cc: Gao Feng
In-Reply-To: <1523845005-6353-1-git-send-email-gfree.wind@vip.163.com>

From: Gao Feng <gfree.wind@vip.163.com>

ip_setup_cork() allocates memory for cork->opt when it is NULL. If the
later rt check then fails, the function returns the error directly without
freeing that memory. This leaks memory when the caller is ip_make_skb(),
which also does not free cork->opt when it gets an error.

Move the rt check ahead of the allocation to avoid the memleak.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
---
 net/ipv4/ip_output.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 4c11b81..83c73ba 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1109,6 +1109,10 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	struct ip_options_rcu *opt;
 	struct rtable *rt;
 
+	rt = *rtp;
+	if (unlikely(!rt))
+		return -EFAULT;
+
 	/*
 	 * setup for corking.
 	 */
@@ -1124,9 +1128,7 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 		cork->flags |= IPCORK_OPT;
 		cork->addr = ipc->addr;
 	}
-	rt = *rtp;
-	if (unlikely(!rt))
-		return -EFAULT;
+
 	/*
 	 * We steal reference to this route, caller should not release it
 	 */
-- 
1.9.1
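
The reordering in the patch above follows a general rule: run the checks that can fail before acquiring anything, so the error path has nothing to free. A minimal userspace sketch of that ordering follows. demo_cork and demo_setup_cork() are hypothetical stand-ins, not the kernel structures, and malloc() takes the place of the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct inet_cork. */
struct demo_cork {
	void *opt;
	const void *route;
};

/* Validate first, allocate second: no early return can leak cork->opt. */
static int demo_setup_cork(struct demo_cork *cork, const void *route,
			   size_t opt_len)
{
	assert(cork != NULL);

	/* The check that can fail comes before any allocation. */
	if (route == NULL)
		return -EFAULT;

	if (cork->opt == NULL) {
		cork->opt = malloc(opt_len);
		if (cork->opt == NULL)
			return -ENOBUFS;
	}

	cork->route = route;
	return 0;
}
```

Calling demo_setup_cork() with a NULL route returns -EFAULT and leaves cork->opt untouched, so the caller has nothing to clean up on that path.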


* Re: tcp hang when socket fills up ?
From: Dominique Martinet @ 2018-04-16  1:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Michal Kubecek, netdev
In-Reply-To: <20180414015515.GA24798@nautica>

Eric Dumazet wrote on Fri, Apr 13, 2018:
> That might be caused by some TS val/ecr breakage :
> 
> Many acks were received by the server tcpdump,
> but none of them was accepted by TCP stack, for some reason.
> 
> Try to disable TCP timestamps, it will give some hint if bug does not reproduce.

This was spot on: after disabling TCP timestamps I can no longer reproduce
the hang.

I've had another look at the original sequence (as seen by the server)
and I don't see much wrong; tell me what I missed:
 - the replayed packet has seq 32004:33378, so the first ignored ack
would be the one with ack 33378, is that right? (meaning the server did
accept the one for 32004 and none after that)

Assuming it is, excerpt from around then (first emission of that packet then
client replies):
16:49:26.700531 IP <server>.13317 > <client>.31872: Flags [.], seq 32004:33378, ack 4190, win 307, options [nop,nop,TS val 1313937607 ecr 1617129440], length 1374
...
16:49:26.728084 IP <client>.31872 > <server>.13317: Flags [.], ack 32004, win 759, options [nop,nop,TS val 1617129473 ecr 1313937602], length 0
...
16:49:26.729531 IP <client>.31872 > <server>.13317: Flags [.], ack 33378, win 782, options [nop,nop,TS val 1617129475 ecr 1313937607], length 0
16:49:26.730002 IP <client>.31872 > <server>.13317: Flags [.], ack 34752, win 805, options [nop,nop,TS val 1617129475 ecr 1313937607], length 0
...
16:49:26.731634 IP <client>.31872 > <server>.13317: Flags [.], ack 36126, win 827, options [nop,nop,TS val 1617129476 ecr 1313937607], length 0


 - the ecr value matches the val of the packet it acks
 - the val is >= that of previous packet (won't be considered
reorder/should pass paws check?)
 - even if the packets are processed in parallel and some kind of race
occurs, a "bigger" ack should ack all the previous packets, right?

 - Just to make sure, I checked /proc/net/netstat for PAWSEstab but that
is 0:
TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW TWRecycled TWKilled PAWSActive PAWSEstab DelayedACKs DelayedACKLocked DelayedACKLost ListenOverflows ListenDrops TCPHPHits TCPPureAcks TCPHPAcks TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPFACKReorder TCPSACKReorder TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures TCPFastRetrans TCPSlowStartRetrans TCPTimeouts TCPLossProbes TCPLossProbeRecovery TCPRenoRecoveryFail TCPSackRecoveryFail TCPRcvCollapsed TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger TCPAbortFailed TCPMemoryPressures TCPMemoryPressuresChrono TCPSACKDiscard TCPDSACKIgnoredOld TCPDSACKIgnoredNoUndo TCPSpuriousRTOs TCPMD5NotFound TCPMD5Unexpected TCPMD5Failure TCPSackShifted TCPSackMerged TCPSackShiftFallback TCPBacklogDrop PFMemallocDrop TCPMinTTLDrop TCPDeferAcceptDrop IPReversePathFilter TCPTimeWaitOverflow TCPReqQFullDoCookies TCPReqQFullDrop TCPRetransFail TCPRcvCoalesce TCPOFOQueue TCPOFODrop TCPOFOMerge TCPChallengeACK TCPSYNChallenge TCPFastOpenActive TCPFastOpenActiveFail TCPFastOpenPassive TCPFastOpenPassiveFail TCPFastOpenListenOverflow TCPFastOpenCookieReqd TCPFastOpenBlackhole TCPSpuriousRtxHostQueues BusyPollRxPackets TCPAutoCorking TCPFromZeroWindowAdv TCPToZeroWindowAdv TCPWantZeroWindowAdv TCPSynRetrans TCPOrigDataSent TCPHystartTrainDetect TCPHystartTrainCwnd TCPHystartDelayDetect TCPHystartDelayCwnd TCPACKSkippedSynRecv TCPACKSkippedPAWS TCPACKSkippedSeq TCPACKSkippedFinWait2 TCPACKSkippedTimeWait TCPACKSkippedChallenge TCPWinProbe TCPKeepAlive TCPMTUPFail TCPMTUPSuccess
TcpExt: 0 0 0 0 58 0 0 26 0 0 50 0 0 0 0 75402 17 201 0 0 6876848 59804 2258387 0 33 0 0 3 0 0 0 0 1 102 15 0 0 0 1306 60 386 292 10 0 0 108750 201 1 228 1 8 4 0 63 0 3 0 0 0 0 107 1 0 0 0 2834 1962 622 0 0 0 0 0 0 0 0 0 1065022 54160 0 1 3 3 0 0 0 0 0 0 0 475 0 9578 6 8 71 257 5116325 0 0 0 0 0 0 6 0 0 0 61 85 0 0
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets InCsumErrors InNoECTPkts InECT1Pkts InECT0Pkts InCEPkts
IpExt: 0 0 0 0 206 0 16405602866 8921427728 0 0 57928 0 0 16121410 0 5388 0

Also, here are the per-socket stats I could find (ss -i after having
reproduced hang):
	 reno wscale:7,7 rto:7456 backoff:5 rtt:32.924/1.41 ato:40 mss:1374
	 pmtu:1500 rcvmss:1248 advmss:1448 cwnd:1 ssthresh:16
	 bytes_acked:32004 bytes_received:4189 segs_out:85 segs_in:54
	 data_segs_out:78 data_segs_in:18 send 333.9Kbps lastsnd:3912
	 lastrcv:11464 lastack:11387 pacing_rate 21.4Mbps delivery_rate
	 3.5Mbps busy:12188ms unacked:33 retrans:1/5 lost:33 rcv_rtt:37
	 rcv_space:29200 rcv_ssthresh:39184 notsent:28796 minrtt:24.986


Here are the same stats with TCP timestamps disabled (after running my
reproducer, e.g. outputting a big chunk of text quickly):
	 reno wscale:7,7 rto:228 rtt:27.267/1.423 ato:40 mss:1386 pmtu:1500
	 rcvmss:1248 advmss:1460 cwnd:10 ssthresh:12 bytes_acked:17311070
	 bytes_received:40445 segs_out:13331 segs_in:7523
	 data_segs_out:13279 data_segs_in:947 send 4.1Mbps lastsnd:5
	 lastrcv:5 lastack:5 pacing_rate 4.9Mbps delivery_rate 431.4Kbps
	 app_limited busy:36064ms unacked:1 retrans:0/6 rcv_rtt:9112.95
	 rcv_space:29233 rcv_ssthresh:39184 minrtt:25.566

So I guess lost:33 matching unacked:33 might be another hint?
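
As a cross-check on the two ss outputs above, the "send" figure in both is consistent with cwnd * mss * 8 / rtt. The sketch below assumes that formula (it is inferred here from the printed values) and uses a hypothetical helper name.

```c
#include <assert.h>

/*
 * Estimated send rate in bits per second for a given congestion window
 * (in segments), MSS (in bytes) and smoothed RTT (in milliseconds).
 */
static double demo_send_bps(unsigned int cwnd, unsigned int mss,
			    double rtt_ms)
{
	assert(rtt_ms > 0.0);
	return (double)cwnd * (double)mss * 8.0 * 1000.0 / rtt_ms;
}
```

With cwnd:1 mss:1374 rtt:32.924 this gives roughly 333.9 Kbps, and with cwnd:10 mss:1386 rtt:27.267 roughly 4.1 Mbps, matching both outputs. In other words, the low rate on the hung socket is simply what a congestion window of 1 allows.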


I'll need a bit more time reading the code to understand what this all
implies; feel free to beat me to it.

Thanks,
-- 
Dominique Martinet | Asmadeus


* [PATCH 1/1] net/mlx4_core: avoid resetting HCA when accessing an offline device
From: Zhu Yanjun @ 2018-04-16  1:02 UTC (permalink / raw)
  To: tariqt, netdev, linux-rdma, haakon.bugge

When a faulty cable is used or the HCA firmware hits an error, the HCA
device goes offline. When the driver accesses this offline device, the
following call trace appears.

"
...
  [<ffffffff816e4842>] dump_stack+0x63/0x81
  [<ffffffff816e459e>] panic+0xcc/0x21b
  [<ffffffffa03e5f8a>] mlx4_enter_error_state+0xba/0xf0 [mlx4_core]
  [<ffffffffa03e7298>] mlx4_cmd_reset_flow+0x38/0x60 [mlx4_core]
  [<ffffffffa03e7381>] mlx4_cmd_poll+0xc1/0x2e0 [mlx4_core]
  [<ffffffffa03e9f00>] __mlx4_cmd+0xb0/0x160 [mlx4_core]
  [<ffffffffa0406934>] mlx4_SENSE_PORT+0x54/0xd0 [mlx4_core]
  [<ffffffffa03f5f54>] mlx4_dev_cap+0x4a4/0xb50 [mlx4_core]
...
"
In the above call trace, mlx4_cmd_poll() calls mlx4_cmd_post() to access
the HCA while the HCA is offline, and mlx4_cmd_post() returns -EIO. On
-EIO, mlx4_cmd_poll() calls mlx4_cmd_reset_flow() to reset the HCA, which
produces the call trace above.

This is not reasonable. Since the HCA is already offline when it is
accessed, it should not be reset again.

With this patch, mlx4_cmd_post() returns -EINVAL when the HCA is offline.
On -EINVAL, mlx4_cmd_poll() returns directly instead of resetting the HCA.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 6a9086d..f1c8c42 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -451,6 +451,8 @@ static int mlx4_cmd_post(struct mlx4_dev *dev, u64 in_param, u64 out_param,
 		 * Device is going through error recovery
 		 * and cannot accept commands.
 		 */
+		mlx4_err(dev, "%s : Device is in error recovery.\n", __func__);
+		ret = -EINVAL;
 		goto out;
 	}
 
@@ -657,6 +659,9 @@ static int mlx4_cmd_poll(struct mlx4_dev *dev, u64 in_param, u64 *out_param,
 	}
 
 out_reset:
+	if (err == -EINVAL)
+		goto out;
+
 	if (err)
 		err = mlx4_cmd_reset_flow(dev, op, op_modifier, err);
 out:
@@ -766,6 +771,9 @@ static int mlx4_cmd_wait(struct mlx4_dev *dev, u64 in_param, u64 *out_param,
 		*out_param = context->out_param;
 
 out_reset:
+	if (err == -EINVAL)
+		goto out;
+
 	if (err)
 		err = mlx4_cmd_reset_flow(dev, op, op_modifier, err);
 out:
-- 
2.7.4
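
The control flow described in the commit message above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the driver code: demo_dev, its fields and the demo_ functions are hypothetical, and the reset is reduced to a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device model: offline flag, forced I/O failure, reset count. */
struct demo_dev {
	int offline;
	int fail_io;
	int resets;
};

/* Posting a command to an offline device reports -EINVAL, not -EIO. */
static int demo_cmd_post(struct demo_dev *dev)
{
	assert(dev != NULL);
	if (dev->offline)
		return -EINVAL;
	if (dev->fail_io)
		return -EIO;
	return 0;
}

/* Stand-in for the reset flow: only counts how often it ran. */
static int demo_reset_flow(struct demo_dev *dev, int err)
{
	dev->resets++;
	return err;
}

static int demo_cmd_poll(struct demo_dev *dev)
{
	int err = demo_cmd_post(dev);

	/* Offline device: return directly, do not reset it again. */
	if (err == -EINVAL)
		return err;

	if (err)
		err = demo_reset_flow(dev, err);
	return err;
}
```

An offline device makes demo_cmd_poll() return -EINVAL with the reset counter unchanged, while an I/O failure on a live device still goes through the reset path.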


* Re: [PATCH] ibmvnic: Clear pending interrupt after device reset
From: David Miller @ 2018-04-16  0:55 UTC (permalink / raw)
  To: tlfalcon; +Cc: netdev, linuxppc-dev, jallen, nfont, benh
In-Reply-To: <1523836416-16531-1-git-send-email-tlfalcon@linux.vnet.ibm.com>

From: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Date: Sun, 15 Apr 2018 18:53:36 -0500

> Due to a firmware bug, the hypervisor can send an interrupt to a
> transmit or receive queue just prior to a partition migration, not
> allowing the device enough time to handle it and send an EOI. When
> the partition migrates, the interrupt is lost but an "EOI-pending"
> flag for the interrupt line is still set in firmware. No further
> interrupts will be sent until that flag is cleared, effectively
> freezing that queue. To work around this, the driver will disable the
> hardware interrupt and send an H_EOI signal prior to re-enabling it.
> This will flush the pending EOI and allow the driver to continue
> operation.
> 
> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>

Hey Thomas, I see two copies of this patch posted.  Any special
reason for that?

Thanks.


* [PATCH net] net: af_packet: fix race in PACKET_{R|T}X_RING
From: Eric Dumazet @ 2018-04-16  0:52 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

In order to remove the race caught by syzbot [1], we need
to lock the socket before using po->tp_version as this could
change under us otherwise.

This means lock_sock() and release_sock() must be done by
packet_set_ring() callers.

[1] :
BUG: KMSAN: uninit-value in packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
CPU: 0 PID: 20195 Comm: syzkaller707632 Not tainted 4.16.0+ #83
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x185/0x1d0 lib/dump_stack.c:53
 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
 packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
 packet_setsockopt+0x12c6/0x5a90 net/packet/af_packet.c:3662
 SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
 SyS_setsockopt+0x76/0xa0 net/socket.c:1828
 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x449099
RSP: 002b:00007f42b5307ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 000000000070003c RCX: 0000000000449099
RDX: 0000000000000005 RSI: 0000000000000107 RDI: 0000000000000003
RBP: 0000000000700038 R08: 000000000000001c R09: 0000000000000000
R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000080eecf R14: 00007f42b53089c0 R15: 0000000000000001

Local variable description: ----req_u@packet_setsockopt
Variable was created at:
 packet_setsockopt+0x13f/0x5a90 net/packet/af_packet.c:3612
 SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849

Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/packet/af_packet.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 616cb9c18f88edd759dfb461051670c225978afa..c31b0687396a6ef45413f06efcc7c3f923e91d01 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3008,6 +3008,7 @@ static int packet_release(struct socket *sock)
 
 	packet_flush_mclist(sk);
 
+	lock_sock(sk);
 	if (po->rx_ring.pg_vec) {
 		memset(&req_u, 0, sizeof(req_u));
 		packet_set_ring(sk, &req_u, 1, 0);
@@ -3017,6 +3018,7 @@ static int packet_release(struct socket *sock)
 		memset(&req_u, 0, sizeof(req_u));
 		packet_set_ring(sk, &req_u, 1, 1);
 	}
+	release_sock(sk);
 
 	f = fanout_release(sk);
 
@@ -3643,6 +3645,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
 		union tpacket_req_u req_u;
 		int len;
 
+		lock_sock(sk);
 		switch (po->tp_version) {
 		case TPACKET_V1:
 		case TPACKET_V2:
@@ -3653,12 +3656,17 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
 			len = sizeof(req_u.req3);
 			break;
 		}
-		if (optlen < len)
-			return -EINVAL;
-		if (copy_from_user(&req_u.req, optval, len))
-			return -EFAULT;
-		return packet_set_ring(sk, &req_u, 0,
-			optname == PACKET_TX_RING);
+		if (optlen < len) {
+			ret = -EINVAL;
+		} else {
+			if (copy_from_user(&req_u.req, optval, len))
+				ret = -EFAULT;
+			else
+				ret = packet_set_ring(sk, &req_u, 0,
+						    optname == PACKET_TX_RING);
+		}
+		release_sock(sk);
+		return ret;
 	}
 	case PACKET_COPY_THRESH:
 	{
@@ -4208,8 +4216,6 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
 	/* Added to avoid minimal code churn */
 	struct tpacket_req *req = &req_u->req;
 
-	lock_sock(sk);
-
 	rb = tx_ring ? &po->tx_ring : &po->rx_ring;
 	rb_queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;
 
@@ -4347,7 +4353,6 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
 	if (pg_vec)
 		free_pg_vec(pg_vec, order, req->tp_block_nr);
 out:
-	release_sock(sk);
 	return err;
 }
 
-- 
2.17.0.484.g0c8726318c-goog
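
The locking change in the patch above moves lock/unlock out of the callee and into its callers, so that the version field is read under the same lock that protects the ring setup. A single-threaded sketch of that convention follows. It assumes a flag-based stand-in for the socket lock (it provides no real mutual exclusion), the demo_ names are hypothetical, and the sizes 16 and 28 are illustrative request lengths.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical socket model; `locked` is a flag, not a real lock. */
struct demo_sock {
	int locked;
	int version;
	size_t ring_len;
};

static void demo_lock(struct demo_sock *sk)
{
	assert(!sk->locked);
	sk->locked = 1;
}

static void demo_unlock(struct demo_sock *sk)
{
	assert(sk->locked);
	sk->locked = 0;
}

/* The callee no longer locks; it relies on, and checks, the caller's lock. */
static int demo_set_ring(struct demo_sock *sk, size_t len)
{
	assert(sk->locked);
	sk->ring_len = len;
	return 0;
}

static int demo_setsockopt_ring(struct demo_sock *sk, size_t optlen)
{
	size_t need;
	int ret;

	demo_lock(sk);
	/* version is read under the lock, so it cannot change before set_ring */
	need = (sk->version == 3) ? 28 : 16;
	if (optlen < need)
		ret = -EINVAL;
	else
		ret = demo_set_ring(sk, need);
	demo_unlock(sk);	/* single exit path releases the lock */
	return ret;
}
```

Note how the early `return -EINVAL` style of the old code has to become a single exit path once the caller holds the lock, which is the same restructuring the diff performs.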


* Re: [PATCH] filter.txt: update 'tools/net/' to 'tools/bpf/'
From: David Miller @ 2018-04-16  0:45 UTC (permalink / raw)
  To: shhuiw; +Cc: ast, daniel, corbet, netdev, linux-doc
In-Reply-To: <20180415080712.2213-1-shhuiw@foxmail.com>

From: Wang Sheng-Hui <shhuiw@foxmail.com>
Date: Sun, 15 Apr 2018 16:07:12 +0800

> The tools are located at tools/bpf/ instead of tools/net/.
> Update the filter.txt doc.
> 
> Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>

Applied, thank you.


* Re: [PATCH iproute2-next 1/1] tc: jsonify ife action
From: David Ahern @ 2018-04-16  0:24 UTC (permalink / raw)
  To: Roman Mashak; +Cc: stephen, netdev, kernel, jhs, xiyou.wangcong, jiri
In-Reply-To: <1523655605-20765-1-git-send-email-mrv@mojatatu.com>

On 4/13/18 3:40 PM, Roman Mashak wrote:
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
> ---
>  tc/m_ife.c | 54 ++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 32 insertions(+), 22 deletions(-)
> 

applied to iproute2-next


* Re: [PATCH v2 iproute2-next 1/1] tc: jsonify skbedit action
From: David Ahern @ 2018-04-16  0:11 UTC (permalink / raw)
  To: Roman Mashak; +Cc: stephen, netdev, kernel, jhs, xiyou.wangcong, jiri
In-Reply-To: <1523383469-26207-1-git-send-email-mrv@mojatatu.com>

On 4/10/18 12:04 PM, Roman Mashak wrote:
> v2:
>    Fixed string format in print_string()
> 
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
> ---
>  tc/m_skbedit.c | 53 +++++++++++++++++++++++++++++------------------------
>  1 file changed, 29 insertions(+), 24 deletions(-)
> 

applied to iproute2-next


* [PATCH] ibmvnic: Clear pending interrupt after device reset
From: Thomas Falcon @ 2018-04-15 23:53 UTC (permalink / raw)
  To: netdev; +Cc: linuxppc-dev, jallen, nfont, benh, Thomas Falcon

Due to a firmware bug, the hypervisor can send an interrupt to a
transmit or receive queue just prior to a partition migration, not
allowing the device enough time to handle it and send an EOI. When
the partition migrates, the interrupt is lost but an "EOI-pending"
flag for the interrupt line is still set in firmware. No further
interrupts will be sent until that flag is cleared, effectively
freezing that queue. To work around this, the driver will disable the
hardware interrupt and send an H_EOI signal prior to re-enabling it.
This will flush the pending EOI and allow the driver to continue
operation.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index f84a920..ef7995fc 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1034,16 +1034,14 @@ static int __ibmvnic_open(struct net_device *netdev)
 		netdev_dbg(netdev, "Enabling rx_scrq[%d] irq\n", i);
 		if (prev_state == VNIC_CLOSED)
 			enable_irq(adapter->rx_scrq[i]->irq);
-		else
-			enable_scrq_irq(adapter, adapter->rx_scrq[i]);
+		enable_scrq_irq(adapter, adapter->rx_scrq[i]);
 	}
 
 	for (i = 0; i < adapter->req_tx_queues; i++) {
 		netdev_dbg(netdev, "Enabling tx_scrq[%d] irq\n", i);
 		if (prev_state == VNIC_CLOSED)
 			enable_irq(adapter->tx_scrq[i]->irq);
-		else
-			enable_scrq_irq(adapter, adapter->tx_scrq[i]);
+		enable_scrq_irq(adapter, adapter->tx_scrq[i]);
 	}
 
 	rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_UP);
@@ -1184,6 +1182,7 @@ static void ibmvnic_disable_irqs(struct ibmvnic_adapter *adapter)
 			if (adapter->tx_scrq[i]->irq) {
 				netdev_dbg(netdev,
 					   "Disabling tx_scrq[%d] irq\n", i);
+				disable_scrq_irq(adapter, adapter->tx_scrq[i]);
 				disable_irq(adapter->tx_scrq[i]->irq);
 			}
 	}
@@ -1193,6 +1192,7 @@ static void ibmvnic_disable_irqs(struct ibmvnic_adapter *adapter)
 			if (adapter->rx_scrq[i]->irq) {
 				netdev_dbg(netdev,
 					   "Disabling rx_scrq[%d] irq\n", i);
+				disable_scrq_irq(adapter, adapter->rx_scrq[i]);
 				disable_irq(adapter->rx_scrq[i]->irq);
 			}
 		}
@@ -2601,12 +2601,19 @@ static int enable_scrq_irq(struct ibmvnic_adapter *adapter,
 {
 	struct device *dev = &adapter->vdev->dev;
 	unsigned long rc;
+	u64 val;
 
 	if (scrq->hw_irq > 0x100000000ULL) {
 		dev_err(dev, "bad hw_irq = %lx\n", scrq->hw_irq);
 		return 1;
 	}
 
+	val = (0xff000000) | scrq->hw_irq;
+	rc = plpar_hcall_norets(H_EOI, val);
+	if (rc)
+		dev_err(dev, "H_EOI FAILED irq 0x%llx. rc=%ld\n",
+			val, rc);
+
 	rc = plpar_hcall_norets(H_VIOCTL, adapter->vdev->unit_address,
 				H_ENABLE_VIO_INTERRUPT, scrq->hw_irq, 0, 0);
 	if (rc)
-- 
1.8.3.1


* Re: [PATCH] ibmvnic: Clear pending interrupt after device reset
From: Thomas Falcon @ 2018-04-15 23:46 UTC (permalink / raw)
  To: netdev; +Cc: linuxppc-dev, jallen, nfont, benh
In-Reply-To: <1523834853-15448-1-git-send-email-tlfalcon@linux.vnet.ibm.com>

On 04/15/2018 06:27 PM, Thomas Falcon wrote:
> Due to a firmware bug, the hypervisor can send an interrupt to a
> transmit or receive queue just prior to a partition migration, not
> allowing the device enough time to handle it and send an EOI. When
> the partition migrates, the interrupt is lost but an "EOI-pending"
> flag for the interrupt line is still set in firmware. No further
> interrupts will be sent until that flag is cleared, effectively
> freezing that queue. To work around this, the driver will disable the
> hardware interrupt and send an H_EOI signal prior to re-enabling it.
> This will flush the pending EOI and allow the driver to continue
> operation.

Excuse me, I misspelled the linuxppc-dev email address.

Tom

> Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index f84a920..ef7995fc 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -1034,16 +1034,14 @@ static int __ibmvnic_open(struct net_device *netdev)
>  		netdev_dbg(netdev, "Enabling rx_scrq[%d] irq\n", i);
>  		if (prev_state == VNIC_CLOSED)
>  			enable_irq(adapter->rx_scrq[i]->irq);
> -		else
> -			enable_scrq_irq(adapter, adapter->rx_scrq[i]);
> +		enable_scrq_irq(adapter, adapter->rx_scrq[i]);
>  	}
>
>  	for (i = 0; i < adapter->req_tx_queues; i++) {
>  		netdev_dbg(netdev, "Enabling tx_scrq[%d] irq\n", i);
>  		if (prev_state == VNIC_CLOSED)
>  			enable_irq(adapter->tx_scrq[i]->irq);
> -		else
> -			enable_scrq_irq(adapter, adapter->tx_scrq[i]);
> +		enable_scrq_irq(adapter, adapter->tx_scrq[i]);
>  	}
>
>  	rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_UP);
> @@ -1184,6 +1182,7 @@ static void ibmvnic_disable_irqs(struct ibmvnic_adapter *adapter)
>  			if (adapter->tx_scrq[i]->irq) {
>  				netdev_dbg(netdev,
>  					   "Disabling tx_scrq[%d] irq\n", i);
> +				disable_scrq_irq(adapter, adapter->tx_scrq[i]);
>  				disable_irq(adapter->tx_scrq[i]->irq);
>  			}
>  	}
> @@ -1193,6 +1192,7 @@ static void ibmvnic_disable_irqs(struct ibmvnic_adapter *adapter)
>  			if (adapter->rx_scrq[i]->irq) {
>  				netdev_dbg(netdev,
>  					   "Disabling rx_scrq[%d] irq\n", i);
> +				disable_scrq_irq(adapter, adapter->rx_scrq[i]);
>  				disable_irq(adapter->rx_scrq[i]->irq);
>  			}
>  		}
> @@ -2601,12 +2601,19 @@ static int enable_scrq_irq(struct ibmvnic_adapter *adapter,
>  {
>  	struct device *dev = &adapter->vdev->dev;
>  	unsigned long rc;
> +	u64 val;
>
>  	if (scrq->hw_irq > 0x100000000ULL) {
>  		dev_err(dev, "bad hw_irq = %lx\n", scrq->hw_irq);
>  		return 1;
>  	}
>
> +	val = (0xff000000) | scrq->hw_irq;
> +	rc = plpar_hcall_norets(H_EOI, val);
> +	if (rc)
> +		dev_err(dev, "H_EOI FAILED irq 0x%llx. rc=%ld\n",
> +			val, rc);
> +
>  	rc = plpar_hcall_norets(H_VIOCTL, adapter->vdev->unit_address,
>  				H_ENABLE_VIO_INTERRUPT, scrq->hw_irq, 0, 0);
>  	if (rc)


* [PATCH] ibmvnic: Clear pending interrupt after device reset
From: Thomas Falcon @ 2018-04-15 23:27 UTC (permalink / raw)
  To: netdev; +Cc: linuxppc-dev, jallen, nfont, benh, Thomas Falcon

Due to a firmware bug, the hypervisor can send an interrupt to a
transmit or receive queue just prior to a partition migration, not
allowing the device enough time to handle it and send an EOI. When
the partition migrates, the interrupt is lost but an "EOI-pending"
flag for the interrupt line is still set in firmware. No further
interrupts will be sent until that flag is cleared, effectively
freezing that queue. To work around this, the driver will disable the
hardware interrupt and send an H_EOI signal prior to re-enabling it.
This will flush the pending EOI and allow the driver to continue
operation.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index f84a920..ef7995fc 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1034,16 +1034,14 @@ static int __ibmvnic_open(struct net_device *netdev)
 		netdev_dbg(netdev, "Enabling rx_scrq[%d] irq\n", i);
 		if (prev_state == VNIC_CLOSED)
 			enable_irq(adapter->rx_scrq[i]->irq);
-		else
-			enable_scrq_irq(adapter, adapter->rx_scrq[i]);
+		enable_scrq_irq(adapter, adapter->rx_scrq[i]);
 	}
 
 	for (i = 0; i < adapter->req_tx_queues; i++) {
 		netdev_dbg(netdev, "Enabling tx_scrq[%d] irq\n", i);
 		if (prev_state == VNIC_CLOSED)
 			enable_irq(adapter->tx_scrq[i]->irq);
-		else
-			enable_scrq_irq(adapter, adapter->tx_scrq[i]);
+		enable_scrq_irq(adapter, adapter->tx_scrq[i]);
 	}
 
 	rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_UP);
@@ -1184,6 +1182,7 @@ static void ibmvnic_disable_irqs(struct ibmvnic_adapter *adapter)
 			if (adapter->tx_scrq[i]->irq) {
 				netdev_dbg(netdev,
 					   "Disabling tx_scrq[%d] irq\n", i);
+				disable_scrq_irq(adapter, adapter->tx_scrq[i]);
 				disable_irq(adapter->tx_scrq[i]->irq);
 			}
 	}
@@ -1193,6 +1192,7 @@ static void ibmvnic_disable_irqs(struct ibmvnic_adapter *adapter)
 			if (adapter->rx_scrq[i]->irq) {
 				netdev_dbg(netdev,
 					   "Disabling rx_scrq[%d] irq\n", i);
+				disable_scrq_irq(adapter, adapter->rx_scrq[i]);
 				disable_irq(adapter->rx_scrq[i]->irq);
 			}
 		}
@@ -2601,12 +2601,19 @@ static int enable_scrq_irq(struct ibmvnic_adapter *adapter,
 {
 	struct device *dev = &adapter->vdev->dev;
 	unsigned long rc;
+	u64 val;
 
 	if (scrq->hw_irq > 0x100000000ULL) {
 		dev_err(dev, "bad hw_irq = %lx\n", scrq->hw_irq);
 		return 1;
 	}
 
+	val = (0xff000000) | scrq->hw_irq;
+	rc = plpar_hcall_norets(H_EOI, val);
+	if (rc)
+		dev_err(dev, "H_EOI FAILED irq 0x%llx. rc=%ld\n",
+			val, rc);
+
 	rc = plpar_hcall_norets(H_VIOCTL, adapter->vdev->unit_address,
 				H_ENABLE_VIO_INTERRUPT, scrq->hw_irq, 0, 0);
 	if (rc)
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH net] team: avoid adding twice the same option to the event list
From: Paolo Abeni @ 2018-04-15 19:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, jiri
In-Reply-To: <20180413.140705.1693433489799741559.davem@davemloft.net>

On Fri, 2018-04-13 at 14:07 -0400, David Miller wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> Date: Fri, 13 Apr 2018 13:59:25 +0200
> 
> > When parsing the options provided by the user space,
> > team_nl_cmd_options_set() inserts them in a temporary list to send
> > multiple events with a single message.
> > While each option's attribute is correctly validated, the code does
> > not check for duplicate entries before inserting into the event
> > list.
> > 
> > Exploiting the above, the syzbot was able to trigger the following
> > splat:
>  ...
> > This changeset addresses the issue by avoiding list_add() if the
> > current option is already present in the event list.
> > 
> > Reported-and-tested-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message")
> 
> Looks good to me.
> 
> It's too bad that the tmp list entries don't get marked as they are
> added, or get unlinked by the list processor.  Either scheme would
> make the "already added" test a lot simpler.

Yes, I considered both changes, but then opted for this solution,
believing it would be less invasive and more suitable for -net.
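
For reference, the "already added" test amounts to a linear walk of the temporary list before linking the entry. A minimal userspace sketch follows; the names (struct opt, struct event_list, event_list_add_once) are made up for illustration and are not the team driver's structures, and a singly linked list stands in for the kernel's list_head.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative types only, not the team driver's structures. */
struct opt {
	int id;
	struct opt *next_tmp;	/* link used only while on the event list */
};

struct event_list {
	struct opt *head;
};

/* Walk the temporary list and report whether the entry is already linked. */
static bool event_list_contains(const struct event_list *l,
				const struct opt *needle)
{
	for (const struct opt *o = l->head; o; o = o->next_tmp)
		if (o == needle)
			return true;
	return false;
}

/* Link the entry only once. Pushing the same node twice would make it
 * point at itself and turn the list into a cycle.
 */
static bool event_list_add_once(struct event_list *l, struct opt *o)
{
	if (event_list_contains(l, o))
		return false;
	o->next_tmp = l->head;
	l->head = o;
	return true;
}
```

The walk is O(n) per insertion, which is the cost that marking entries as they are added would avoid.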

Cheers,

Paolo

^ permalink raw reply

* Re: [PATCH iproute2] utils: Do not reset family for default, any, all addresses
From: Thomas Deutschmann @ 2018-04-15 19:14 UTC (permalink / raw)
  To: David Ahern, stephen; +Cc: netdev, Serhey Popovych
In-Reply-To: <20180413163633.1844-1-dsahern@gmail.com>

Hi,

I can confirm that this patch solves the issue for us and restores
previous behavior.

Thank you.


-- 
Regards,
Thomas Deutschmann / Gentoo Linux Developer
C4DD 695F A713 8F24 2AA1 5638 5849 7EE5 1D5D 74A5

^ permalink raw reply

* Re: linux-next on x60: network manager often complains "network is disabled" after resume
From: Pavel Machek @ 2018-04-15 16:16 UTC (permalink / raw)
  To: Dan Williams
  Cc: Woody Suwalski, Rafael J. Wysocki, kernel list,
	Linux-pm mailing list, Netdev list
In-Reply-To: <95efbba35c3389015d4919a59f8d01bc2d375a19.camel@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2698 bytes --]

On Mon 2018-03-26 10:33:55, Dan Williams wrote:
> On Sun, 2018-03-25 at 08:19 +0200, Pavel Machek wrote:
> > > > > Ok, what does 'nmcli dev' and 'nmcli radio' show?
> > > > 
> > > > Broken state.
> > > > 
> > > > pavel@amd:~$ nmcli dev
> > > > DEVICE  TYPE      STATE        CONNECTION
> > > > eth1    ethernet  unavailable  --
> > > > lo      loopback  unmanaged    --
> > > > wlan0   wifi      unmanaged    --
> > > 
> > > If the state is "unmanaged" on resume, that would indicate a
> > > problem
> > > with sleep/wake and likely not a kernel network device issue.
> > > 
> > > We should probably move this discussion to the NM lists to debug
> > > further.  Before you suspend, run "nmcli gen log level trace" to
> > > turn
> > > on full debug logging, then reproduce the issue, and send a pointer
> > > to
> > > those logs (scrubbed for anything you consider sensitive) to the NM
> > > mailing list.
> > 
> > Hmm :-)
> > 
> > root@amd:/data/pavel# nmcli gen log level trace
> > Error: Unknown log level 'trace'
> 
> What NM version?  'trace' is pretty old (since 1.0 from December 2014)
> so unless you're using a really, really old version of Debian I'd
> expect you'd have it.  Anyway, debug would do.

Hmm.

pavel@duo:~$ /usr/sbin/NetworkManager --version
You must be root to run NetworkManager!
pavel@duo:~$ sudo /usr/sbin/NetworkManager --version
0.9.10.0

So I set the log level, but I still don't see much in the log:

Apr 14 18:14:29 duo dbus[3009]: [system] Successfully activated
service 'org.freedesktop.nm_dispatcher'
Apr 14 18:14:29 duo nm-dispatcher: Dispatching action 'down' for wlan1
Apr 14 18:14:29 duo systemd[1]: Started Network Manager Script
Dispatcher Service.
Apr 14 18:14:29 duo systemd-sleep[6853]: Suspending system...
Apr 14 21:27:53 duo systemd[1]: systemd-journald.service watchdog
timeout (limit 1min)!
pavel@duo:~$ date
Sun Apr 15 12:26:32 CEST 2018
pavel@duo:~$

Is it possible that time handling across suspend changed in v4.17?

I get some weird effects. With display backlight...

> > Where do I get the logs? I don't see much in the syslog...
> 
> > And.. It seems that it is "every other suspend". One resume results
> > in
> > broken network, one in working one, one in broken one...
> 
> Does your distro use pm-utils, upower, or systemd for suspend/resume
> handling?

upower, I guess:

pavel@duo:/data/l/linux$ ps aux | grep upower
root      3820  0.0  0.1  42848  7984 ?        Ssl  Apr14   0:01
/usr/lib/upower/upowerd

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: linux-next on x60: network manager often complains "network is disabled" after resume
From: Pavel Machek @ 2018-04-15 16:15 UTC (permalink / raw)
  To: Woody Suwalski
  Cc: Rafael J. Wysocki, kernel list, Linux-pm mailing list,
	Netdev list
In-Reply-To: <c7d96582-e2e6-d9c8-1140-3f1dab836132@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1969 bytes --]

On Tue 2018-03-20 21:11:54, Woody Suwalski wrote:
> Woody Suwalski wrote:
> >Pavel Machek wrote:
> >>On Mon 2018-03-19 05:17:45, Woody Suwalski wrote:
> >>>Pavel Machek wrote:
> >>>>Hi!
> >>>>
> >>>>With recent linux-next, after resume networkmanager often claims that
> >>>>"network is disabled". Sometimes suspend/resume clears that.
> >>>>
> >>>>Any ideas? Does it work for you?
> >>>>                                    Pavel
> >>>Tried the 4.16-rc6 with nm 1.4.4 - I do not see the issue.
> >>Thanks for testing... but yes, 4.16 should be ok. If not fixed,
> >>problem will appear in 4.17-rc1.
> >>
> >Works here OK. Tried ~10 suspends, all restarted OK.
> >kernel next-20180320
> >nmcli shows that Wifi always connects OK
> >
> >Woody
> >
> Contrary, it just happened to me on a 64-bit build 4.16-rc5 on T440.
> I think that Dan's suspicion is correct - it is a snafu in the PM: trying to
> hibernate results in a message:
> Failed to hibernate system via logind: There's already a shutdown or sleep
> operation in progress.
> 
> And ps shows "Ds /lib/systemd/systemd-sleep suspend"...

Problem now seems to be in the mainline.

But no, I don't see systemd-sleep in my process list :-(.

I guess you can't reproduce it easily? I tried bisecting, but while it
happens often enough to make v4.17 hard to use, it does not permit
reliable bisect.

These should be bad according to my notes

b04240a33b99b32cf6fbdf5c943c04e505a0cb07 
 ed80dc19e4dd395c951f745acd1484d61c4cfb20
 52113a0d3889d6e2738cf09bf79bc9cac7b5e1c6
 4fc97ef94bbfa185d16b3e44199b7559d0668747
 14ebdb2c814f508936fe178a2abc906a16a3ab48
 639adbeef5ae1bb8eeebbb0cde0b885397bde192

bisection claimed

c16add24522547bf52c189b3c0d1ab6f5c2b4375

is first bad commit, but I'm not sure if I trust that.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* One question about __tcp_select_window()
From: Wang Jian @ 2018-04-15 12:50 UTC (permalink / raw)
  To: netdev

Hi all,

While reading the __tcp_select_window() code, I found that it may return
a window smaller than the one currently offered.
Here is one scenario I came up with, which may not be right:
In __tcp_select_window(), assume:
full_space is 6 * mss, free_space is 2 * mss, tp->rcv_wnd is 3 * mss.
Also assume window scaling is disabled. Then
window = tp->rcv_wnd, so window > free_space,
and the function rounds free_space down and returns it.

Is this expected behavior? The comment also says
"Get the largest window that is a nice multiple of mss."

Should we do something like the change below? Or am I missing something?
I don't know how to verify it yet.

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2680,9 +2680,9 @@ u32 __tcp_select_window(struct sock *sk)
                 * We also don't do any window rounding when the free space
                 * is too small.
                 */
-               if (window <= free_space - mss || window > free_space)
+               if (window <= free_space - mss)
                        window = rounddown(free_space, mss);
-               else if (mss == full_space &&
+               else if (window <= free_space && mss == full_space &&
                         free_space > window + (full_space >> 1))
                        window = free_space;
        }
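
One way to check the scenario without a kernel build is a small userspace model of just the branch touched by the diff above. This is a sketch under assumptions: only the non-window-scaling branch is modeled, the function and parameter names are illustrative rather than kernel API, and the mss value used in the note after the code is arbitrary.

```c
/* Simplified userspace model of the non-window-scaling branch only.
 * All names are illustrative; nothing here is kernel API.
 */
static int rounddown_int(int x, int y)
{
	return x - (x % y);
}

/* Condition as it appears on the '-' lines of the diff. */
static int window_current(int rcv_wnd, int free_space, int full_space, int mss)
{
	int window = rcv_wnd;

	if (window <= free_space - mss || window > free_space)
		window = rounddown_int(free_space, mss);
	else if (mss == full_space &&
		 free_space > window + (full_space >> 1))
		window = free_space;
	return window;
}

/* Condition as it appears on the '+' lines of the diff. */
static int window_proposed(int rcv_wnd, int free_space, int full_space, int mss)
{
	int window = rcv_wnd;

	if (window <= free_space - mss)
		window = rounddown_int(free_space, mss);
	else if (window <= free_space && mss == full_space &&
		 free_space > window + (full_space >> 1))
		window = free_space;
	return window;
}
```

With mss = 1000, full_space = 6000, free_space = 2000 and rcv_wnd = 3000, window_current() returns 2000 while window_proposed() returns 3000, so the proposed condition keeps a window larger than free_space in this scenario.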

Thanks.

^ permalink raw reply

* Re: [Patch net] llc: properly handle dev_queue_xmit() return value
From: Noam Rathaus @ 2018-04-15 10:08 UTC (permalink / raw)
  To: David Miller; +Cc: Cong Wang, netdev
In-Reply-To: <CAHqykcRxO2SSQXbpg_tNs49TNxpLZzDsYePokJSusdkdfTyp8g@mail.gmail.com>

Hi,

Is there any update?

On Fri, Apr 13, 2018 at 7:49 PM, Noam Rathaus <noamr@beyondsecurity.com> wrote:
> Hi
>
> Any update?
>
> On Thu, 29 Mar 2018 at 14:11, Noam Rathaus <noamr@beyondsecurity.com> wrote:
>>
>> Hi,
>>
>> Will you notify me when it's been accepted? If not, how can I check
>> for myself whether it was accepted?
>>
>> On Tue, Mar 27, 2018 at 8:13 PM, David Miller <davem@davemloft.net> wrote:
>> > From: Noam Rathaus <noamr@beyondsecurity.com>
>> > Date: Tue, 27 Mar 2018 16:27:49 +0000
>> >
>> >> Guys please fill me in on the next step?
>> >>
>> >> If it’s applied it means it’s part of the official code of the kernel
>> >> now?
>> >
>> > It means it is in my networking GIT tree and will make its way to Linus
>> > in the not so distant future.
>>
>>
>>
>> --
>>
>> Thanks,
>> Noam Rathaus
>> Beyond Security
>>
>> PGP Key ID: 7EF920D3C045D63F (Exp 2019-03)
>
> --
> Thanks,
> Noam Rathaus



-- 

Thanks,
Noam Rathaus
Beyond Security

PGP Key ID: 7EF920D3C045D63F (Exp 2019-03)

^ permalink raw reply

* [PATCH] filter.txt: update 'tools/net/' to 'tools/bpf/'
From: Wang Sheng-Hui @ 2018-04-15  8:07 UTC (permalink / raw)
  To: ast, daniel, corbet, netdev, linux-doc

The tools are located at tools/bpf/ instead of tools/net/.
Update the filter.txt doc.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
---
 Documentation/networking/filter.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index a4508ec1816b..fd55c7de9991 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -169,7 +169,7 @@ access to BPF code as well.
 BPF engine and instruction set
 ------------------------------
 
-Under tools/net/ there's a small helper tool called bpf_asm which can
+Under tools/bpf/ there's a small helper tool called bpf_asm which can
 be used to write low-level filters for example scenarios mentioned in the
 previous section. Asm-like syntax mentioned here has been implemented in
 bpf_asm and will be used for further explanations (instead of dealing with
@@ -359,7 +359,7 @@ $ ./bpf_asm -c foo
 In particular, as usage with xt_bpf or cls_bpf can result in more complex BPF
 filters that might not be obvious at first, it's good to test filters before
 attaching to a live system. For that purpose, there's a small tool called
-bpf_dbg under tools/net/ in the kernel source directory. This debugger allows
+bpf_dbg under tools/bpf/ in the kernel source directory. This debugger allows
 for testing BPF filters against given pcap files, single stepping through the
 BPF code on the pcap's packets and to do BPF machine register dumps.
 
@@ -483,7 +483,7 @@ Example output from dmesg:
 [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
 [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
 
-In the kernel source tree under tools/net/, there's bpf_jit_disasm for
+In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
 generating disassembly out of the kernel log's hexdump:
 
 # ./bpf_jit_disasm
-- 
2.11.0

^ permalink raw reply related

* Re: SRIOV switchdev mode BoF minutes
From: Or Gerlitz @ 2018-04-15  6:01 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: David Miller, Anjali Singhai Jain, Andy Gospodarek, Michael Chan,
	Simon Horman, Jakub Kicinski, John Fastabend, Saeed Mahameed,
	Jiri Pirko, Rony Efraim, Linux Netdev List
In-Reply-To: <e93e22c3-6c2e-00c9-10c6-163c4aacff14@intel.com>

On Sat, Apr 14, 2018 at 2:03 AM, Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:

> I meant between PFs on 2 compute nodes.

If the PF serves as uplink rep, it functions as a switch port -- applications
don't run on switch ports. One way to get apps to run on the host in switchdev
mode is to probe one of the VFs there.


[...]

> By smartnic env, i guess you are referring to OVS control plane also running
> on the NIC.

correct

> I will look forward to your patches.

FWIW, note that my patches don't bring any newz for you.. I am aligning
mlx5 with what was agreed on netdev, e.g nfp does it (uplink rep and
such) already.

^ permalink raw reply

* [PATCH linux-stable-4.14] tcp: clear tp->packets_out when purging write queue
From: Soheil Hassas Yeganeh @ 2018-04-15  0:45 UTC (permalink / raw)
  To: davem, netdev
  Cc: ycheng, ncardwell, subashab, hvtaifwkbgefbaei,
	Soheil Hassas Yeganeh, Eric Dumazet

From: Soheil Hassas Yeganeh <soheil@google.com>

Clear tp->packets_out when purging the write queue, otherwise
tcp_rearm_rto() mistakenly assumes TCP write queue is not empty.
This results in NULL pointer dereference.

Also, remove the redundant `tp->packets_out = 0` from
tcp_disconnect(), since tcp_disconnect() calls
tcp_write_queue_purge().
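
The invariant at stake can be shown with a toy userspace model. This is only a sketch: every name (toy_sock, purge_stale, purge_fixed, rearm_is_safe) is made up, and the real tcp_rearm_rto() logic is more involved than the stand-in used here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: a queue head plus a counter that is
 * supposed to be zero whenever the queue is empty.
 */
struct toy_skb {
	struct toy_skb *next;
};

struct toy_sock {
	struct toy_skb *write_queue;	/* NULL when the queue is empty */
	unsigned int packets_out;
};

/* Purge that forgets the counter (the behaviour being fixed). */
static void purge_stale(struct toy_sock *sk)
{
	sk->write_queue = NULL;
}

/* Purge that also clears the counter (the behaviour after the fix). */
static void purge_fixed(struct toy_sock *sk)
{
	sk->write_queue = NULL;
	sk->packets_out = 0;
}

/* Stand-in for a timer re-arm path that trusts packets_out and, when it
 * is non-zero, goes on to look at the queue head.
 */
static bool rearm_would_touch_head(const struct toy_sock *sk)
{
	return sk->packets_out != 0;
}

/* Safe only if the head is never touched while it is NULL. */
static bool rearm_is_safe(const struct toy_sock *sk)
{
	return !rearm_would_touch_head(sk) || sk->write_queue != NULL;
}
```

After purge_stale() the model reports an unsafe state (non-zero counter, NULL head); after purge_fixed() it does not.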

Fixes: a27fd7a8ed38 (tcp: purge write queue upon RST)
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Reported-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Tested-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
---
 include/net/tcp.h | 1 +
 net/ipv4/tcp.c    | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d323d4fa742ca..fb653736f3353 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1616,6 +1616,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
 	sk_mem_reclaim(sk);
 	tcp_clear_all_retrans_hints(tcp_sk(sk));
 	tcp_init_send_head(sk);
+	tcp_sk(sk)->packets_out = 0;
 }
 
 static inline struct sk_buff *tcp_write_queue_head(const struct sock *sk)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 38b9a6276a9de..4dda8d301802e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2354,7 +2354,6 @@ int tcp_disconnect(struct sock *sk, int flags)
 	icsk->icsk_backoff = 0;
 	tp->snd_cwnd = 2;
 	icsk->icsk_probes_out = 0;
-	tp->packets_out = 0;
 	tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
 	tp->snd_cwnd_cnt = 0;
 	tp->window_clamp = 0;
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* [PATCH net] tcp: clear tp->packets_out when purging write queue
From: Soheil Hassas Yeganeh @ 2018-04-15  0:44 UTC (permalink / raw)
  To: davem, netdev
  Cc: ycheng, ncardwell, subashab, hvtaifwkbgefbaei,
	Soheil Hassas Yeganeh, Eric Dumazet

From: Soheil Hassas Yeganeh <soheil@google.com>

Clear tp->packets_out when purging the write queue, otherwise
tcp_rearm_rto() mistakenly assumes TCP write queue is not empty.
This results in NULL pointer dereference.

Also, remove the redundant `tp->packets_out = 0` from
tcp_disconnect(), since tcp_disconnect() calls
tcp_write_queue_purge().

Fixes: a27fd7a8ed38 (tcp: purge write queue upon RST)
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Reported-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Tested-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4fa3f812b9ff8..9ce1c726185eb 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2368,6 +2368,7 @@ void tcp_write_queue_purge(struct sock *sk)
 	INIT_LIST_HEAD(&tcp_sk(sk)->tsorted_sent_queue);
 	sk_mem_reclaim(sk);
 	tcp_clear_all_retrans_hints(tcp_sk(sk));
+	tcp_sk(sk)->packets_out = 0;
 }
 
 int tcp_disconnect(struct sock *sk, int flags)
@@ -2417,7 +2418,6 @@ int tcp_disconnect(struct sock *sk, int flags)
 	icsk->icsk_backoff = 0;
 	tp->snd_cwnd = 2;
 	icsk->icsk_probes_out = 0;
-	tp->packets_out = 0;
 	tp->snd_ssthresh = TCP_INFINITE_SSTHRESH;
 	tp->snd_cwnd_cnt = 0;
 	tp->window_clamp = 0;
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* Re: Cavium Octeon III network driver.
From: Florian Fainelli @ 2018-04-15  0:08 UTC (permalink / raw)
  To: Steven J. Hill, netdev
In-Reply-To: <c269ed89-75ac-895a-984f-badc0b4d9a05@cavium.com>

Hi Steven,

On 04/13/2018 03:43 PM, Steven J. Hill wrote:
> Patches for Cavium's Octeon III network driver were submitted by
> David Daney back on 20180222. David has since left the company and
> I am now responsible for the upstreaming effort. When looking at
> <pachwork.ozlabs.org> they are marked as "Not Applicable". What
> steps do I take next? Thanks.

net-next tree is currently closed, but once it opens back up, you would
likely want to resubmit those patches. Last I remember they were ready
to go.
-- 
Florian

^ permalink raw reply

