Netdev List


* [PATCH net-next 0/7] tcp: implement rb-tree based retransmit queue
From: Eric Dumazet @ 2017-10-06  5:21 UTC (permalink / raw)
  To: David S . Miller, Neal Cardwell, Yuchung Cheng
  Cc: netdev, Eric Dumazet, Eric Dumazet

This patch series implements an RB-tree based retransmit queue for TCP,
to better match modern BDP (bandwidth-delay product).
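The main benefit of a sequence-ordered tree is that locating the segment covering a given sequence number (for example, the start of a SACK block) no longer requires walking the whole retransmit queue. The sketch below is illustrative only. It swaps the kernel's RB-tree for binary search over a sorted array, which gives the same O(log n) lookup for a static queue but not the cheap insert/erase that an RB-tree provides. `seq_before()`, `struct seg_sketch` and `find_seg()` are hypothetical names; the wrap-safe comparison follows the same idea as the kernel's `before()` helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Wrap-safe "a is before b" for 32-bit TCP sequence numbers. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical queue entry covering [seq, end_seq). */
struct seg_sketch {
	uint32_t seq;     /* first sequence number in the segment */
	uint32_t end_seq; /* one past the last sequence number */
};

/* Return the index of the segment containing seq, or -1 if none.
 * q must be sorted by seq with contiguous, non-overlapping segments. */
static long find_seg(const struct seg_sketch *q, size_t n, uint32_t seq)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (seq_before(seq, q[mid].seq))
			hi = mid;          /* target lies in an earlier segment */
		else if (!seq_before(seq, q[mid].end_seq))
			lo = mid + 1;      /* target lies in a later segment */
		else
			return (long)mid;  /* q[mid].seq <= seq < q[mid].end_seq */
	}
	return -1;
}
```

Each lookup touches about log2(n) entries instead of up to n, which is what matters when a large-BDP flow keeps tens of thousands of skbs in flight.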

Tested:

 On receiver:
 netem on ingress: delay 150ms 200us loss 1 (150 ms delay, 200 us jitter, 1% loss)
 GRO disabled to force stress and SACK storms.

# Ten 30-second netperf runs with BBR; print each throughput and the total.
for f in $(seq 1 10)
do
 ./netperf -H lpaa6 -l30 -- -K bbr -o THROUGHPUT | tail -1
done | awk '{print $0} {sum += $0} END {printf "%7u\n", sum}'

Before patch:

323.87  351.48  339.59  338.62  306.72
204.07  304.93  291.88  202.47  176.88
->   2840

After patch:

1700.83 2207.98 2070.17 1544.26 2114.76
2124.89 1693.14 1080.91 2216.82 1299.94
->  18053

Average of 1805 Mbit/s instead of 284 Mbit/s.

Eric Dumazet (7):
  net: add rb_to_skb() and other rb tree helpers
  tcp: uninline tcp_write_queue_purge()
  tcp: tcp_tx_timestamp() cleanup
  tcp: tcp_mark_head_lost() optimization
  tcp: reduce tcp_fastretrans_alert() verbosity
  tcp: pass previous skb to tcp_shifted_skb()
  tcp: implement rb-tree based retransmit queue

 include/linux/skbuff.h  |  18 +++++
 include/net/sock.h      |   7 +-
 include/net/tcp.h       | 100 ++++++++++++--------------
 net/ipv4/tcp.c          |  63 ++++++++++++----
 net/ipv4/tcp_fastopen.c |   8 +--
 net/ipv4/tcp_input.c    | 187 ++++++++++++++++++++++++------------------------
 net/ipv4/tcp_ipv4.c     |   2 +-
 net/ipv4/tcp_output.c   | 137 +++++++++++++++++++----------------
 net/ipv4/tcp_timer.c    |  24 ++++---
 net/sched/sch_netem.c   |  14 ++--
 10 files changed, 311 insertions(+), 249 deletions(-)

-- 
2.14.2.920.gcf0c67979c-goog

^ permalink raw reply

* Re: [PATCH] net/ipv6: remove unused err variable on icmpv6_push_pending_frames
From: David Miller @ 2017-10-06  5:18 UTC (permalink / raw)
  To: devtimhansen; +Cc: kuznet, yoshfuji, netdev, linux-kernel, alexander.levin
In-Reply-To: <20171005194532.GA126147@debian>

From: Tim Hansen <devtimhansen@gmail.com>
Date: Thu, 5 Oct 2017 15:45:32 -0400

> int err is unused in icmpv6_push_pending_frames(); this patch removes the variable and returns 0 from the function.
> 
> git bisect shows this variable has been around since Linux was first imported into git, in commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2.
> 
> This was found by running make coccicheck M=net/ipv6/ on Linus' tree at commit 77ede3a014a32746002f7889211f0cecf4803163 (current HEAD as of this patch).
> 
> Signed-off-by: Tim Hansen <devtimhansen@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net: ipv6: remove unused code in ipv6_find_hdr()
From: David Miller @ 2017-10-06  5:15 UTC (permalink / raw)
  To: xiaolou4617; +Cc: kuznet, yoshfuji, netdev
In-Reply-To: <1507226828-51366-1-git-send-email-xiaolou4617@gmail.com>

From: Lin Zhang <xiaolou4617@gmail.com>
Date: Fri,  6 Oct 2017 02:07:08 +0800

> Storing the remaining length of the skb in 'len' has no effect,
> so we can remove it.
> 
> Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2 net-next 06/12] qed: Add LL2 slowpath handling
From: Kalderon, Michal @ 2017-10-06  5:09 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Elior, Ariel
In-Reply-To: <20171005.172013.746380495399822.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org <linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> on behalf of David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

>From: "Kalderon, Michal" <Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
>Date: Thu, 5 Oct 2017 20:27:22 +0000
>
>> The spinlock is required for the case that rx buffers are posted
>> from a different thread, where it could be run simultaneously to the
>> rxq_completion.
>
>This only brings us back to my original argument, if the lock is
>necessary in order to synchronize with those paths, how can you
>possibly drop the lock safely here?
>
>Is it because you re-read the head and tail pointers of the queue each
>time around the loop?

It's safe to drop the lock here because the queue implementation (qed_chain)
maintains both a consumer index (tail) and a producer index (head).
The main loop reads the FW's consumer value, stores it as the "new cons",
and traverses the queue only until it reaches that value.
The post function adds buffers to the end of the queue and updates only the
producer index; it never touches the consumer index, so posting buffers
doesn't affect the main loop.
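A minimal userspace sketch of the argument above, assuming a simple power-of-two ring rather than the real qed_chain API: the completion loop snapshots the consumer value reported by firmware and stops there, so buffers posted in the meantime (which only advance the producer index) cannot change where it stops. `struct ring`, `ring_post()` and `ring_complete()` are hypothetical, and the sketch deliberately omits the locking that still protects the descriptor lists.

```c
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical size; must be a power of two */

struct ring {
	uint32_t prod;          /* advanced only by the buffer-posting path */
	uint32_t cons;          /* advanced only by the completion loop */
	int buf[RING_SIZE];
};

/* Posting path: append at the producer end. Never touches cons. */
static int ring_post(struct ring *r, int val)
{
	if (r->prod - r->cons == RING_SIZE)
		return -1; /* ring full */
	r->buf[r->prod & (RING_SIZE - 1)] = val;
	r->prod++;
	return 0;
}

/* Completion path: consume entries only up to new_cons, a snapshot of
 * the consumer value (reported by firmware in the real driver). Entries
 * posted after the snapshot do not move the stopping point. */
static int ring_complete(struct ring *r, uint32_t new_cons, int *sum)
{
	int n = 0;

	while (r->cons != new_cons) {
		*sum += r->buf[r->cons & (RING_SIZE - 1)];
		r->cons++;
		n++;
	}
	return n;
}
```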

The resources that are protected by the lock and accessed both inside the loop
and from the post-buffers path are three linked lists, free_descq, posting_descq
and active_descq; their head and tail are re-read on every access
(elements are removed from and moved between the lists).

Following this discussion, it looks like there was no need to take the lock in the
outer function, only around the places that access these lists. However, this is a
delicate change which affects the LL2 clients (iSCSI, FCoE, RoCE, iWARP). As this
series is already rather large and is intended for iWARP, please consider taking this
patch as it is, since it doesn't change existing functionality and doesn't introduce
risk to other components.

thanks,
Michal

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next 1/3] bpf: Change bpf_obj_name_cpy() to better ensure map's name is init by 0
From: Martin KaFai Lau @ 2017-10-06  4:52 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20171006045213.752372-1-kafai@fb.com>

During get_info_by_fd, the prog/map name is memcpy-ed.  This relies
on prog->aux->name and map->name being zero-initialized.

For bpf_prog_aux, it is easy to guarantee that aux->name is zero-initialized.

For bpf_map, this may be harder to guarantee in the future as
new map types are added.

Hence, this patch makes bpf_obj_name_cpy() always zero-initialize
the prog/map name.
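A userspace sketch of the patched helper, for illustration. The loop body is not shown in the diff context, so the character copy here is an assumption; `SKETCH_OBJ_NAME_LEN` mirrors the uapi `BPF_OBJ_NAME_LEN` (16), and `obj_name_cpy_sketch()` is a hypothetical name. As in the kernel, where the source is a fixed-size attribute field, `src` must point to a buffer of at least `SKETCH_OBJ_NAME_LEN` bytes. Zeroing `dst` up front means every byte after the copied name is 0, whatever the caller's buffer held before.

```c
#include <ctype.h>
#include <errno.h>
#include <string.h>

#define SKETCH_OBJ_NAME_LEN 16 /* mirrors BPF_OBJ_NAME_LEN (16U) */

static int obj_name_cpy_sketch(char *dst, const char *src)
{
	const char *end = src + SKETCH_OBJ_NAME_LEN;

	/* Zero the whole destination so trailing bytes are always 0. */
	memset(dst, 0, SKETCH_OBJ_NAME_LEN);

	/* Copy only [A-Za-z0-9_] characters (copy step assumed). */
	while (src < end && *src) {
		if (!isalnum((unsigned char)*src) && *src != '_')
			return -EINVAL;
		*dst++ = *src++;
	}

	/* No terminating '\0' within SKETCH_OBJ_NAME_LEN bytes. */
	if (src == end)
		return -EINVAL;

	return 0;
}
```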

Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/bpf/syscall.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0048cb24ba7b..d124e702e040 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -322,6 +322,8 @@ static int bpf_obj_name_cpy(char *dst, const char *src)
 {
 	const char *end = src + BPF_OBJ_NAME_LEN;
 
+	memset(dst, 0, BPF_OBJ_NAME_LEN);
+
 	/* Copy all isalnum() and '_' char */
 	while (src < end && *src) {
 		if (!isalnum(*src) && *src != '_')
@@ -333,9 +335,6 @@ static int bpf_obj_name_cpy(char *dst, const char *src)
 	if (src == end)
 		return -EINVAL;
 
-	/* '\0' terminates dst */
-	*dst = 0;
-
 	return 0;
 }
 
-- 
2.9.5

^ permalink raw reply related

* [PATCH net-next 3/3] bpf: Append prog->aux->name in bpf_get_prog_name()
From: Martin KaFai Lau @ 2017-10-06  4:52 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team
In-Reply-To: <20171006045213.752372-1-kafai@fb.com>

This patch makes the bpf_prog's name available
in kallsyms.

The new format is bpf_prog_tag[_name].

Sample kallsyms from running selftests/bpf/test_progs:
[root@arch-fb-vm1 ~]# egrep ' bpf_prog_[0-9a-fA-F]{16}' /proc/kallsyms
ffffffffa0048000 t bpf_prog_dabf0207d1992486_test_obj_id
ffffffffa0038000 t bpf_prog_a04f5eef06a7f555__123456789ABCDE
ffffffffa0050000 t bpf_prog_a04f5eef06a7f555
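The resulting symbol layout can be sketched in userspace as follows. `SKETCH_KSYM_NAME_LEN` is assumed to be 128 (the value of `KSYM_NAME_LEN` at the time), the tag is 8 bytes, and `hex_encode()` is a hypothetical stand-in for the kernel's `bin2hex()`, using lowercase digits. The static assertion repeats the BUILD_BUG_ON arithmetic: "bpf_prog_" (9 chars) + 16 hex digits + '_' + at most 15 name chars + NUL is 42 bytes, which equals sizeof("bpf_prog_") (10) + 16 + 16.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_KSYM_NAME_LEN 128 /* assumed value of KSYM_NAME_LEN */
#define SKETCH_TAG_SIZE 8        /* BPF program tag size in bytes */
#define SKETCH_OBJ_NAME_LEN 16   /* mirrors BPF_OBJ_NAME_LEN */

_Static_assert(sizeof("bpf_prog_") + SKETCH_TAG_SIZE * 2 +
	       SKETCH_OBJ_NAME_LEN <= SKETCH_KSYM_NAME_LEN,
	       "bpf_prog_<tag>_<name> must fit in a ksym name");

/* Write 2*n lowercase hex digits; return pointer past the last one. */
static char *hex_encode(char *dst, const unsigned char *src, size_t n)
{
	static const char digits[] = "0123456789abcdef";

	for (size_t i = 0; i < n; i++) {
		*dst++ = digits[src[i] >> 4];
		*dst++ = digits[src[i] & 0xf];
	}
	return dst;
}

/* Build "bpf_prog_<tag>" or "bpf_prog_<tag>_<name>" into sym. */
static void prog_sym_name(const unsigned char *tag, const char *name, char *sym)
{
	const char *end = sym + SKETCH_KSYM_NAME_LEN;

	sym += snprintf(sym, SKETCH_KSYM_NAME_LEN, "bpf_prog_");
	sym = hex_encode(sym, tag, SKETCH_TAG_SIZE);
	if (name[0])
		snprintf(sym, (size_t)(end - sym), "_%s", name);
	else
		*sym = 0;
}
```

With the tag of the first sample line above and the name "test_obj_id", this produces `bpf_prog_dabf0207d1992486_test_obj_id`.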

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
---
 kernel/bpf/core.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index c6be15ae83ee..248961af2421 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -309,12 +309,25 @@ bpf_get_prog_addr_region(const struct bpf_prog *prog,
 
 static void bpf_get_prog_name(const struct bpf_prog *prog, char *sym)
 {
+	const char *end = sym + KSYM_NAME_LEN;
+
 	BUILD_BUG_ON(sizeof("bpf_prog_") +
-		     sizeof(prog->tag) * 2 + 1 > KSYM_NAME_LEN);
+		     sizeof(prog->tag) * 2 +
+		     /* name has been null terminated.
+		      * We should need +1 for the '_' preceding
+		      * the name.  However, the null character
+		      * is double counted between the name and the
+		      * sizeof("bpf_prog_") above, so we omit
+		      * the +1 here.
+		      */
+		     sizeof(prog->aux->name) > KSYM_NAME_LEN);
 
 	sym += snprintf(sym, KSYM_NAME_LEN, "bpf_prog_");
 	sym  = bin2hex(sym, prog->tag, sizeof(prog->tag));
-	*sym = 0;
+	if (prog->aux->name[0])
+		snprintf(sym, (size_t)(end - sym), "_%s", prog->aux->name);
+	else
+		*sym = 0;
 }
 
 static __always_inline unsigned long
-- 
2.9.5

^ permalink raw reply related

* [PATCH net-next 2/3] bpf: Use char in prog and map name
From: Martin KaFai Lau @ 2017-10-06  4:52 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Jakub Kicinski
In-Reply-To: <20171006045213.752372-1-kafai@fb.com>

Instead of u8, use char for the prog and map name.  This avoids
compiler signedness warnings in userspace tools.  The
bpf_prog_aux, bpf_map, bpf_attr, bpf_prog_info and
bpf_map_info structs are changed.
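A small userspace illustration of the warning being avoided, using a hypothetical mirror struct rather than the real uapi header. With a `char` array the field can be passed straight to standard string functions; had it been `__u8` (unsigned char), `strncmp()` would receive an `unsigned char *`, which GCC flags under `-Wpointer-sign` (enabled by `-Wall` for C) unless the caller adds a cast.

```c
#include <string.h>

#define SKETCH_OBJ_NAME_LEN 16 /* mirrors BPF_OBJ_NAME_LEN (16U) */

/* Hypothetical userspace view of a name field after this change. */
struct prog_info_sketch {
	char name[SKETCH_OBJ_NAME_LEN];
};

/* No cast needed: info->name already has type char[]. */
static int prog_name_is(const struct prog_info_sketch *info, const char *want)
{
	return strncmp(info->name, want, sizeof(info->name)) == 0;
}
```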

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
---
 include/linux/bpf.h            | 4 ++--
 include/uapi/linux/bpf.h       | 8 ++++----
 tools/include/uapi/linux/bpf.h | 8 ++++----
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a67daea731ab..bc7da2ddfcaf 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -56,7 +56,7 @@ struct bpf_map {
 	struct work_struct work;
 	atomic_t usercnt;
 	struct bpf_map *inner_map_meta;
-	u8 name[BPF_OBJ_NAME_LEN];
+	char name[BPF_OBJ_NAME_LEN];
 };
 
 /* function argument constraints */
@@ -189,7 +189,7 @@ struct bpf_prog_aux {
 	struct bpf_prog *prog;
 	struct user_struct *user;
 	u64 load_time; /* ns since boottime */
-	u8 name[BPF_OBJ_NAME_LEN];
+	char name[BPF_OBJ_NAME_LEN];
 	union {
 		struct work_struct work;
 		struct rcu_head	rcu;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6082faf5fd2a..a37ad348c436 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -230,7 +230,7 @@ union bpf_attr {
 		__u32	numa_node;	/* numa node (effective only if
 					 * BPF_F_NUMA_NODE is set).
 					 */
-		__u8	map_name[BPF_OBJ_NAME_LEN];
+		char	map_name[BPF_OBJ_NAME_LEN];
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -253,7 +253,7 @@ union bpf_attr {
 		__aligned_u64	log_buf;	/* user supplied buffer */
 		__u32		kern_version;	/* checked when prog_type=kprobe */
 		__u32		prog_flags;
-		__u8		prog_name[BPF_OBJ_NAME_LEN];
+		char		prog_name[BPF_OBJ_NAME_LEN];
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -869,7 +869,7 @@ struct bpf_prog_info {
 	__u32 created_by_uid;
 	__u32 nr_map_ids;
 	__aligned_u64 map_ids;
-	__u8  name[BPF_OBJ_NAME_LEN];
+	char name[BPF_OBJ_NAME_LEN];
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
@@ -879,7 +879,7 @@ struct bpf_map_info {
 	__u32 value_size;
 	__u32 max_entries;
 	__u32 map_flags;
-	__u8  name[BPF_OBJ_NAME_LEN];
+	char  name[BPF_OBJ_NAME_LEN];
 } __attribute__((aligned(8)));
 
 /* User bpf_sock_ops struct to access socket values and specify request ops
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index cb2b9f95160a..f75ac330831d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -230,7 +230,7 @@ union bpf_attr {
 		__u32	numa_node;	/* numa node (effective only if
 					 * BPF_F_NUMA_NODE is set).
 					 */
-		__u8	map_name[BPF_OBJ_NAME_LEN];
+		char	map_name[BPF_OBJ_NAME_LEN];
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
@@ -253,7 +253,7 @@ union bpf_attr {
 		__aligned_u64	log_buf;	/* user supplied buffer */
 		__u32		kern_version;	/* checked when prog_type=kprobe */
 		__u32		prog_flags;
-		__u8		prog_name[BPF_OBJ_NAME_LEN];
+		char		prog_name[BPF_OBJ_NAME_LEN];
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
@@ -869,7 +869,7 @@ struct bpf_prog_info {
 	__u32 created_by_uid;
 	__u32 nr_map_ids;
 	__aligned_u64 map_ids;
-	__u8  name[BPF_OBJ_NAME_LEN];
+	char  name[BPF_OBJ_NAME_LEN];
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
@@ -879,7 +879,7 @@ struct bpf_map_info {
 	__u32 value_size;
 	__u32 max_entries;
 	__u32 map_flags;
-	__u8  name[BPF_OBJ_NAME_LEN];
+	char  name[BPF_OBJ_NAME_LEN];
 } __attribute__((aligned(8)));
 
 /* User bpf_sock_ops struct to access socket values and specify request ops
-- 
2.9.5

^ permalink raw reply related

* [PATCH net-next 0/3] bpf: Misc improvements and a new usage on bpf obj name
From: Martin KaFai Lau @ 2017-10-06  4:52 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

The first two patches improve the handling of the bpf obj name.

The last patch adds the prog name to kallsyms.

Martin KaFai Lau (3):
  bpf: Change bpf_obj_name_cpy() to better ensure map's name is init by
    0
  bpf: Use char in prog and map name
  bpf: Append prog->aux->name in bpf_get_prog_name()

 include/linux/bpf.h            |  4 ++--
 include/uapi/linux/bpf.h       |  8 ++++----
 kernel/bpf/core.c              | 17 +++++++++++++++--
 kernel/bpf/syscall.c           |  5 ++---
 tools/include/uapi/linux/bpf.h |  8 ++++----
 5 files changed, 27 insertions(+), 15 deletions(-)

-- 
2.9.5

^ permalink raw reply

* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
From: SviMik @ 2017-10-06  4:45 UTC (permalink / raw)
  To: James Chapman; +Cc: netdev, Guillaume Nault
In-Reply-To: <CAEwTi7SS7qsbV19NsPGwdiwTcejxygUEqQi-_MJ07uMec0CY9A@mail.gmail.com>

2017-10-04 10:49 GMT+03:00 James Chapman <jchapman@katalix.com>:
> On 3 October 2017 at 08:27, James Chapman <jchapman@katalix.com> wrote:
>> For capturing complete oops messages, have you tried setting up
>> netconsole? You might also find the full text in the syslog on reboot.

Why, thank you! You've just told me that Santa Claus exists :)
I've set up netconsole on 93 of my servers, and I hope that starting
tomorrow I'll have more complete kernel panic reports, including from
servers where I never had a chance to capture the console before.

>> It's interesting that you are seeing l2tp issues since switching to
>> 4.x kernels. Are you able to try earlier kernels to find the latest
>> version that works? I'm curious whether things broke at v3.15.

I'll try, but it will take some time to gather enough statistics. The
bug is relatively rare: only a few panics per day across all
93 servers.

> It's possible that this may be fixed by a patch that is already
> upstream and merged for v4.14. The fix is from Guillaume Nault:
>
> f3c66d4 l2tp: prevent creation of sessions on terminated tunnels
>
> If it's possible that the L2TP server may try to create a session in a
> tunnel that is being closed, this bug would be exposed.
>
> Guillaume's fix isn't yet pushed to stable releases. Are you able to
> try a v4.14-rc build?

Sorry, I'm not skilled enough to build a kernel for CentOS on my own.
I will wait until it appears in ELRepo; the latest version there is
currently 4.13.5. Meanwhile I'll try switching to 3.10 and see how it
works.

I have also captured a few more kernel panics in the last few days.
Please see if they are related to this bug:
http://svimik.com/hdmmsk1kp2.png
http://svimik.com/hdmmsk1kp3.png
http://svimik.com/hdmmsk1kp4.png
http://svimik.com/hdmmsk2kp6.png

^ permalink raw reply

* Re: [PATCH net-next v3 0/2] libbpf: support more map options
From: David Miller @ 2017-10-06  4:42 UTC (permalink / raw)
  To: kraigatgoog; +Cc: ast, daniel, brouer, chonggangli, netdev
In-Reply-To: <20171005144158.14860-1-kraigatgoog@gmail.com>

From: Craig Gallek <kraigatgoog@gmail.com>
Date: Thu,  5 Oct 2017 10:41:56 -0400

> The functional change to this series is the ability to use flags when
> creating maps from object files loaded by libbpf.  In order to do this,
> the first patch updates the library to handle map definitions that
> differ in size from libbpf's struct bpf_map_def.
> 
> For object files with a larger map definition, libbpf will continue to load
> if the unknown fields are all zero, otherwise the map is rejected.  If the
> map definition in the object file is smaller than expected, libbpf will use
> zero as a default value in the missing fields.

Series applied, thanks.
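The size-compatibility rule described in the cover letter can be sketched as follows. This is not libbpf's code: `struct map_def_sketch` and `load_map_def()` are hypothetical, and the field list only approximates `struct bpf_map_def`. A definition no larger than the known struct is copied and the missing fields are left at zero; a larger one is accepted only if every unknown trailing byte is zero.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the map definition libbpf knows about. */
struct map_def_sketch {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

static int load_map_def(struct map_def_sketch *out, const void *obj_def,
			size_t obj_size)
{
	const unsigned char *p = obj_def;

	/* Fields missing from a smaller definition default to zero. */
	memset(out, 0, sizeof(*out));
	if (obj_size <= sizeof(*out)) {
		memcpy(out, p, obj_size);
		return 0;
	}

	/* Larger definition: reject if any unknown trailing byte is set. */
	for (size_t i = sizeof(*out); i < obj_size; i++)
		if (p[i] != 0)
			return -EINVAL;

	memcpy(out, p, sizeof(*out));
	return 0;
}
```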

^ permalink raw reply

* Re: [PATCH net] selftests/net: rxtimestamp: Fix an off by one
From: David Miller @ 2017-10-06  4:29 UTC (permalink / raw)
  To: dan.carpenter
  Cc: shuah, maloney, willemb, linux-kselftest, netdev, kernel-janitors
In-Reply-To: <20171005125347.otplfss4mo7hfopv@mwanda>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Thu, 5 Oct 2017 15:53:47 +0300

> The > should be >= so that we don't write one element beyond the end of
> the array.
> 
> Fixes: 16e781224198 ("selftests/net: Add a test to validate behavior of rx timestamps")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Applied.
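A minimal illustration of the off-by-one being fixed, using a hypothetical fixed-size array rather than the selftest's actual code. For an array of N elements the valid indices are 0..N-1, so the guard must reject idx >= N; a guard of idx > N would let idx == N write one element past the end.

```c
#include <stddef.h>

#define NUM_SLOTS 4 /* hypothetical array size */

static int slots[NUM_SLOTS];

static int set_slot(size_t idx, int val)
{
	if (idx >= NUM_SLOTS) /* ">" here would allow slots[NUM_SLOTS] */
		return -1;
	slots[idx] = val;
	return 0;
}
```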

^ permalink raw reply

* Re: [PATCH] net: qcom/emac: make function emac_isr static
From: David Miller @ 2017-10-06  4:27 UTC (permalink / raw)
  To: colin.king; +Cc: timur, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20171005091023.27781-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Thu,  5 Oct 2017 10:10:23 +0100

> From: Colin Ian King <colin.king@canonical.com>
> 
> The function emac_isr is local to the source and does not need to
> be in global scope, so make it static.
> 
> Cleans up sparse warnings:
> symbol 'emac_isr' was not declared. Should it be static?
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 0/3] tcp: improving RACK cpu performance
From: David Miller @ 2017-10-06  4:26 UTC (permalink / raw)
  To: ycheng; +Cc: netdev
In-Reply-To: <20171004200000.39257-1-ycheng@google.com>

From: Yuchung Cheng <ycheng@google.com>
Date: Wed,  4 Oct 2017 12:59:57 -0700

> This patch set improves the CPU consumption of the RACK TCP loss
> recovery algorithm, in particular for high-speed networks. Currently,
> for every ACK in recovery RACK can potentially iterate over all sent
> packets in the write queue. On large BDP networks with non-trivial
> losses the RACK write queue walk CPU usage becomes unreasonably high.
> 
> This patch introduces a new queue in TCP that keeps only skbs sent and
> not yet (s)acked or marked lost, in time order instead of sequence
> order.  With that, RACK can examine this time-sorted list and only
> check packets that were sent recently, within the reordering window,
> per ACK. This is the fastest way without any write queue walks. The
> number of skbs examined per ACK is reduced by orders of magnitude.

That's a pretty risky way to implement the second SKB list... but
you avoided making sk_buff larger so what can I say :-)

Series applied, thanks.
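A simplified sketch of why a time-ordered list bounds the work per ACK. It assumes the array holds only outstanding packets (not (s)acked and not already marked lost) in send-time order, and it uses a reduced loss rule, "sent more than reo_wnd_us before the most recently delivered packet was sent", omitting the RTT-based timer that real RACK also applies. `struct pkt_sketch` and `rack_mark_lost()` are hypothetical, not the kernel's implementation. In the design described above, packets marked lost leave the list, so a later ACK would not re-examine them; this sketch only sets a flag.

```c
#include <stddef.h>
#include <stdint.h>

struct pkt_sketch {
	uint64_t sent_us; /* transmit time; the array is sorted by this */
	int lost;
};

/* Mark packets lost under the simplified rule and return how many
 * entries were examined. Time ordering lets the walk stop at the first
 * packet that is too recent, since every later packet is newer still. */
static size_t rack_mark_lost(struct pkt_sketch *pkts, size_t n,
			     uint64_t delivered_sent_us, uint64_t reo_wnd_us)
{
	size_t examined = 0;

	for (size_t i = 0; i < n; i++) {
		examined++;
		if (pkts[i].sent_us + reo_wnd_us >= delivered_sent_us)
			break; /* this and all later packets are too recent */
		pkts[i].lost = 1;
	}
	return examined;
}
```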

^ permalink raw reply

* Re: [PATCH] net/ipv4: Remove unused variable in route.c
From: David Miller @ 2017-10-06  4:17 UTC (permalink / raw)
  To: devtimhansen; +Cc: kuznet, yoshfuji, netdev, linux-kernel, alexander.levin
In-Reply-To: <20171004195949.GA39492@debian>

From: Tim Hansen <devtimhansen@gmail.com>
Date: Wed, 4 Oct 2017 15:59:49 -0400

> int rc is never modified after initialization in net/ipv4/route.c; this patch removes the variable and simply returns 0.
> 
> This was found with coccicheck M=net/ipv4/ on Linus' tree.
> 
> Signed-off-by: Tim Hansen <devtimhansen@gmail.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH net v2] RDS: IB: Initialize max_items based on underlying device attributes
From: David Miller @ 2017-10-06  4:16 UTC (permalink / raw)
  To: avinash.repaka
  Cc: santosh.shilimkar, netdev, linux-rdma, rds-devel, linux-kernel
In-Reply-To: <1507144289-1690-1-git-send-email-avinash.repaka@oracle.com>

From: Avinash Repaka <avinash.repaka@oracle.com>
Date: Wed,  4 Oct 2017 12:11:29 -0700

> Use max_1m_mrs/max_8k_mrs while setting max_items, as the former
> variables are set based on the underlying device attributes.
> 
> Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH net v2] RDS: IB: Limit the scope of has_fr/has_fmr variables
From: David Miller @ 2017-10-06  4:16 UTC (permalink / raw)
  To: avinash.repaka
  Cc: santosh.shilimkar, netdev, linux-rdma, rds-devel, linux-kernel
In-Reply-To: <1507144244-1611-1-git-send-email-avinash.repaka@oracle.com>

From: Avinash Repaka <avinash.repaka@oracle.com>
Date: Wed,  4 Oct 2017 12:10:43 -0700

> This patch limits the scope of the has_fr and has_fmr variables, as they are
> needed only in rds_ib_add_one().
> 
> Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH v2 net-next 2/2] tcp: clean up TFO server's initial tcp_rearm_rto() call
From: David Miller @ 2017-10-06  4:10 UTC (permalink / raw)
  To: weiwan; +Cc: netdev, ycheng, ncardwell, edumazet
In-Reply-To: <20171004170404.132419-1-tracywwnj@gmail.com>

From: Wei Wang <weiwan@google.com>
Date: Wed,  4 Oct 2017 10:04:04 -0700

> From: Wei Wang <weiwan@google.com>
> 
> This commit does a cleanup and moves the tcp_rearm_rto() call in the TFO
> server case to an earlier spot in tcp_rcv_state_process() to make
> the code more compact.
> This is only a cosmetic change.
> 
> Suggested-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2 net-next 1/2] tcp: uniform the set up of sockets after successful connection
From: David Miller @ 2017-10-06  4:10 UTC (permalink / raw)
  To: weiwan; +Cc: netdev, ycheng, ncardwell, edumazet
In-Reply-To: <20171004170344.132339-1-tracywwnj@gmail.com>

From: Wei Wang <weiwan@google.com>
Date: Wed,  4 Oct 2017 10:03:44 -0700

> From: Wei Wang <weiwan@google.com>
> 
> Currently in the TCP code, the initialization sequence for cached
> metrics, congestion control, BPF, etc, after successful connection
> is very inconsistent. This introduces inconsistent behavior and is
> prone to bugs. The current call sequence is as follows:
 ...
> This commit unifies the above functions to use the following sequence:
>         tcp_mtup_init(sk);
>         icsk->icsk_af_ops->rebuild_header(sk);
>         tcp_init_metrics(sk);
>         tcp_call_bpf(sk, BPF_SOCK_OPS_ACTIVE/PASSIVE_ESTABLISHED_CB);
>         tcp_init_congestion_control(sk);
>         tcp_init_buffer_space(sk);
> This sequence is the same as the (1) active case. We pick this sequence
> because this order correctly allows BPF to override the settings
> including congestion control module and initial cwnd, etc from
> the route, and then allows the CC module to see those settings.
> 
> Suggested-by: Neal Cardwell <ncardwell@google.com>
> Tested-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> Acked-by: Eric Dumazet <edumazet@google.com>

Nice change, applied, thanks.
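A toy model of why this order matters, with hypothetical functions standing in for tcp_init_metrics(), tcp_call_bpf() and tcp_init_congestion_control(): because the BPF hook runs after route metrics are applied and before congestion control initializes, the CC module observes the BPF-overridden values. The struct fields, CC names and cwnd numbers are illustrative only.

```c
#include <string.h>

/* Hypothetical socket state; not the kernel's struct sock. */
struct sk_sketch {
	unsigned int snd_cwnd;     /* initial congestion window */
	char cc_name[16];          /* selected congestion control */
	unsigned int cc_init_cwnd; /* cwnd seen by the CC at init time */
	char cc_init_name[16];     /* CC name seen at init time */
};

static void init_metrics_sketch(struct sk_sketch *sk)
{
	sk->snd_cwnd = 10;              /* illustrative route value */
	strcpy(sk->cc_name, "cubic");
}

static void call_bpf_sketch(struct sk_sketch *sk)
{
	sk->snd_cwnd = 20;              /* illustrative BPF override */
	strcpy(sk->cc_name, "bbr");
}

static void init_cc_sketch(struct sk_sketch *sk)
{
	/* The CC module snapshots whatever settings exist at this point. */
	sk->cc_init_cwnd = sk->snd_cwnd;
	strcpy(sk->cc_init_name, sk->cc_name);
}

/* Order from the commit message: metrics, then BPF, then CC init. */
static void finish_connect_sketch(struct sk_sketch *sk)
{
	init_metrics_sketch(sk);
	call_bpf_sketch(sk);
	init_cc_sketch(sk);
}
```

Swapping the last two calls would leave the CC module initialized with the route values rather than the BPF ones.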

^ permalink raw reply

* Re: [PATCH net v2 3/9] net/mac89x0: Fix and modernize log messages
From: David Miller @ 2017-10-06  4:08 UTC (permalink / raw)
  To: fthain; +Cc: netdev, linux-kernel
In-Reply-To: <e9fb341944d187cabf1ccef78cfdc9de64e3f158.1507211120.git.fthain@telegraphics.com.au>

From: Finn Thain <fthain@telegraphics.com.au>
Date: Thu,  5 Oct 2017 21:11:05 -0400 (EDT)

> Fix misplaced newlines in conditional log messages.

Please don't do this, the way the author formatted the strings
was intentional, they intended to print out:

	NAME: cs89%c0%s rev %c found at %#8lx IRQ %d ADDR %pM

But now you are splitting it into multiple lines.  Also, you're
printing the IRQ information after register_netdev() which is
bad.  As soon as register_netdev() is called, the driver's
->open() routine can be invoked, and during which time some
log messages could be emitted during that operation.

And that would cut the probe messages up.

I know how you got to this state, you saw a reference to dev->name
before it had a real value.  You just removed the "eth%d" string
entirely.  And since you removed the dev->name reference, you had
no reason to move log messages after register_netdev() at all.

Anyways, you can also see the intention of the author here because
they have _explicit_ leading newlines in the error path messages that
come after the initial probe printk.

The real way to fix the early dev->name reference is to replace it
with a dev_info() call and have it use the struct device name rather
than the netdev device one.

Again, I think you really shouldn't be making these small weird
changes to these old drivers.

^ permalink raw reply

* Re: [PATCH v3 0/5] VSOCK: add sock_diag interface
From: David Miller @ 2017-10-06  3:40 UTC (permalink / raw)
  To: stefanha; +Cc: netdev, jhansen, decui
In-Reply-To: <20171005204654.2737-1-stefanha@redhat.com>

From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Thu,  5 Oct 2017 16:46:49 -0400

> v3:
>  * Rebased onto net-next/master and resolved Hyper-V transport conflict
> 
> v2:
>  * Moved tests to tools/testing/vsock/.  I was unable to put them in selftests/
>    because they require manual setup of a VMware/KVM guest.
>  * Moved __vsock_in_bound/connected_table() to af_vsock.h
>  * Fixed local variable ordering in Patch 4
> 
> There is currently no way for userspace to query open AF_VSOCK sockets.  This
> means ss(8), netstat(8), and other utilities cannot display AF_VSOCK sockets.
> 
> This patch series adds the netlink sock_diag interface for AF_VSOCK.  Userspace
> programs send a DUMP request including an sk_state bitmap to filter sockets
> based on their state (connected, listening, etc).  The vsock_diag.ko module
> replies with information about matching sockets.  This userspace ABI is defined
> in <linux/vm_sockets_diag.h>.
> 
> The final patch adds a test suite that exercises the basic cases.
> 
> Jorgen and Dexuan: I have only tested the virtio transport but this should also
> work for VMCI and Hyper-V.  Please give it a shot if you have time.

Series applied, thanks.
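A sketch of the state-bitmap filtering described in the cover letter, assuming (as with inet_diag) that bit N of the request's bitmap selects sockets whose sk_state is N. The enum values and function names are hypothetical; <linux/vm_sockets_diag.h> defines the real request layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state numbers, for illustration only. */
enum sketch_sk_state {
	SK_ST_ESTABLISHED = 1,
	SK_ST_CLOSE = 7,
	SK_ST_LISTEN = 10,
};

/* Does a socket in sk_state pass a DUMP filter with this bitmap? */
static int state_selected(uint32_t states, unsigned int sk_state)
{
	return sk_state < 32 && ((states >> sk_state) & 1u);
}

/* Count the sockets a dump with this bitmap would report. */
static size_t count_matching(const unsigned int *sk_states, size_t n,
			     uint32_t states)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (state_selected(states, sk_states[i]))
			hits++;
	return hits;
}
```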

^ permalink raw reply

* Re: r8169 Wake-on-LAN causes immediate ACPI GPE wakeup
From: Daniel Drake @ 2017-10-06  2:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: nic_swsd, netdev, ACPI Devel Maling List, Linux Kernel,
	Linux Upstreaming Team, Linux PM
In-Reply-To: <CAJZ5v0i+Ct9HbT+vMchW0dNxYO2WvyY5Tmz1a+3k0FLmbFGE2w@mail.gmail.com>

On Fri, Oct 6, 2017 at 9:24 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On the other hand, the RP05 (root port) _PRW says it will wake up the
>> system via GPE09, and the _L09 handler at least has one codepath which
>> could potentially do a Notify(PXSX, 2) to indicate an ethernet wakeup.
>
> Which can only happen in the S0 system state.

Not quite sure I understand your comment here. Of course the _L09
handler (and any other ACPI code) can only execute in S0 state.
However if Linux leaves GPE09 enabled during S3 suspend, and then
detects that it has an event pending on resume, it will execute _L09
during resume. (However, we have not observed GPE09 firing at all)

>> But in testing:
>>  - If GPE08 is enabled as a wakeup source, the system will always wake
>> up as soon as it goes to sleep
>
> What exactly do you mean by "enabled as a wakeup source"?

Linux associates the ethernet PCI device with PCI0.RP05.PXSX, which
has _PRW referencing GPE08.
r8169 has already done device_set_wakeup_enable() at probe time.
So when going into suspend, acpi_enable_wakeup_devices() and the code
beneath that will ensure that GPE08 is enabled when the system goes
into suspend.
See acpi_hw_enable_wakeup_gpe_block().

To disable it, "echo disabled > power/wakeup" for the ethernet device in sysfs.

To test with GPE09 instead, I modified the _PRW method in the DSDT to
point at GPE09, and again used sysfs to control whether the GPE is
enabled during suspend or not.

Thanks
Daniel

^ permalink raw reply

* Re: r8169 Wake-on-LAN causes immediate ACPI GPE wakeup
From: Daniel Drake @ 2017-10-06  2:34 UTC (permalink / raw)
  To: Francois Romieu
  Cc: nic_swsd, netdev, linux-acpi, Linux Kernel,
	Linux Upstreaming Team
In-Reply-To: <20171006001649.GA21884@electric-eye.fr.zoreil.com>

On Fri, Oct 6, 2017 at 8:16 AM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> Daniel Drake <drake@endlessm.com> :
> [...]
>> Also, is there a standard behaviour defined for ethernet drivers
>> regarding wake-on-LAN? r8169 appears to enable wake-on-LAN by default
>> if it believes the hardware is capable of it,
>
> If so it isn't its designed behavior.
>
> The r8169 driver does not enable specific WoL event source (unicast packet,
> link, etc.). It should keep the current settings unless one of those holds:
> - explicit wol config from userspace (obviously :o) )
> - runtime pm requires different settings to resume. The change should
>   be temporary (save before suspend, restore after resume).
>
> The device is supposed to require both an event source and Config1.PMEnable.
>
> A problem may happen if some event source bit is already set while
> Config1.PMEnable is not. The driver has been forcing Config1.PMEnable
> since 5d06a99f543e734ceb53bbc9e550537be97f0c49. One may thus experience
> transition from inconsistent wol settings to enabled ones (if you want
> to dig into it, check beforehand whether Config1.PMEnable is really read-write or
> hardwired to 1).

The code in question here is in rtl_init_one():

    device_set_wakeup_enable(&pdev->dev, tp->features & RTL_FEATURE_WOL);

This enables wakeups regardless of current WOL settings, as long as
the hardware supports the WOL feature.
Should we remove this line? rtl8169_set_wol() looks like it will do
the right thing here if WOL is later enabled.
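The enable condition Francois describes, and the alternative Daniel asks about, can be modelled as follows. The bit names and `struct nic_wol_sketch` are hypothetical and do not match the r8169 register layout; `probe_wakeup_flag()` only illustrates deriving the wakeup flag from armed event sources instead of from hardware capability, and is not a tested fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event-source bits; the real r8169 layout differs. */
#define WOL_EV_MAGIC (1u << 0)
#define WOL_EV_UCAST (1u << 1)
#define WOL_EV_LINK  (1u << 2)

struct nic_wol_sketch {
	uint32_t event_sources; /* which WoL events are armed */
	bool pm_enable;         /* stands in for Config1.PMEnable */
};

/* Per the rule quoted above: waking requires at least one armed
 * event source AND PMEnable. */
static bool nic_can_wake(const struct nic_wol_sketch *n)
{
	return n->event_sources != 0 && n->pm_enable;
}

/* Alternative to enabling wakeup purely on capability: report wakeup
 * as enabled only when some event source is actually armed. */
static bool probe_wakeup_flag(const struct nic_wol_sketch *n)
{
	return n->event_sources != 0;
}
```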

Daniel

^ permalink raw reply

* Re: r8169 Wake-on-LAN causes immediate ACPI GPE wakeup
From: Rafael J. Wysocki @ 2017-10-06  1:24 UTC (permalink / raw)
  To: Daniel Drake
  Cc: nic_swsd, netdev, ACPI Devel Maling List, Linux Kernel,
	Linux Upstreaming Team, Linux PM
In-Reply-To: <CAD8Lp45YVtw48+8jSq6gtu1xA7Rt+qr6vvVZF4AC-QXeCKNj4w@mail.gmail.com>

On Thu, Oct 5, 2017 at 10:57 AM, Daniel Drake <drake@endlessm.com> wrote:
> Hi,
>
> On the Acer laptop models Aspire ES1-533, Aspire ES1-732, PackardBell
> ENTE69AP and Gateway NE533, we are seeing a problem where the system
> immediately wakes up after being put into S3 suspend.
>
> This problem has been seen on all kernel versions that we have tried,
> including 4.14-rc3.
>
> After disabling wakeup sources one by one, we found that the r8169
> ethernet is responsible for these wakeups here, the hardware is:
>
> 01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
>     Subsystem: Acer Incorporated [ALI] Device 1084
>     Flags: bus master, fast devsel, latency 0, IRQ 124
>     I/O ports at 1000 [size=256]
>     Memory at 91204000 (64-bit, non-prefetchable) [size=4K]
>     Memory at 91200000 (64-bit, non-prefetchable) [size=16K]
>     Capabilities: [40] Power Management version 3
>     Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
>     Capabilities: [70] Express Endpoint, MSI 01
>     Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
>     Capabilities: [100] Advanced Error Reporting
>     Capabilities: [140] Virtual Channel
>     Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
>     Capabilities: [170] Latency Tolerance Reporting
>     Capabilities: [178] L1 PM Substates
>     Kernel driver in use: r8169
>
> This driver enables WOL by default. The system wakes up immediately
> when it is put into S3 suspend, even if there is no ethernet cable
> plugged in.
>
> The problem was also reproduced with the r8168 vendor driver, however
> it does not occur under Windows, where we can suspend the system just
> fine and also wake it up with a magic WOL packet.
>
> Further investigation takes us into ACPI-land. The complete DSDT is here:
> https://gist.github.com/dsd/62293b6d8c30a5204128709813a55ffb
>
> Both Windows and Linux associate PCI0.RP05.PXSX with this device, so
> let's consider this part of the DSDT:
>
>   Device (RP05)
>   {
>       Method (_ADR, 0, NotSerialized)  // _ADR: Address
>       {
>           If (RPA5 != Zero)
>           {
>               Return (RPA5) /* \RPA5 */
>           }
>           Else
>           {
>               Return (0x00130002)
>           }
>       }
>
>       Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
>       {
>           Return (GPRW (0x09, 0x04))
>       }
>
>       Device (PXSX)
>       {
>           Name (_ADR, Zero)  // _ADR: Address
>           Name (_PRW, Package (0x02)  // _PRW: Power Resources for Wake
>           {
>               0x08,
>               0x04
>           })
>       }
>
>   }
>
> RP05 corresponds to
> 00:13.0 PCI bridge: Intel Corporation Device 5ada (rev fb)
>
> I am not familiar with this subdevice approach, where PXSX (with
> address 0) is detected as a child of the PCI bridge; however, both
> Windows and Linux associate PXSX with the ethernet device, so I guess
> it is correct.
>
> Now to focus on the _PRW power resource for wakeup. The PXSX
> (ethernet) device says that it will wake up the system using GPE08.
> However if you look at the _L08 GPE08 event handler, you will see that
> it does not do anything related to RP05/PXSX (it instead calls into
> RP02, which does not even physically exist on this platform) -
> suspicious.

Well, it is broken, but that doesn't matter for wakeups from S3.
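To make the DSDT values concrete: per the ACPI specification, a PCI `_ADR` packs the device number into the high word and the function number into the low word, and an integer-form `_PRW` package holds the wake GPE followed by the deepest system sleep state from which the device can still wake the system. The helpers below are a hypothetical decoding sketch; they model only the plain-integer GPE form of `_PRW` and ignore the GPE-block package form and any power-resource entries.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI _ADR: high word is the device number, low word the function. */
static unsigned adr_device(uint32_t adr)   { return adr >> 16; }
static unsigned adr_function(uint32_t adr) { return adr & 0xFFFF; }

/* Simplified _PRW package: integer GPE form only. */
struct prw {
	unsigned gpe;            /* element 0: GPE number */
	unsigned deepest_sstate; /* element 1: deepest Sx that still allows wake */
};

/* True if the device may wake the system from sleep state Sx. */
static bool prw_can_wake_from(const struct prw *p, unsigned sx)
{
	return sx <= p->deepest_sstate;
}
```

Applied to the table above, PXSX's `_ADR` of zero decodes to device 0, function 0 behind the RP05 bridge, consistent with the 01:00.0 ethernet function, and its `_PRW` of {0x08, 0x04} arms GPE 0x08 for any sleep state up to S4, which includes S3.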

> On the other hand, the RP05 (root port) _PRW says it will wake up the
> system via GPE09, and the _L09 handler at least has one codepath which
> could potentially do a Notify(PXSX, 2) to indicate an ethernet wakeup.

Which can only happen in the S0 system state.

> But in testing:
>  - If GPE08 is enabled as a wakeup source, the system will always wake
> up as soon as it goes to sleep

What exactly do you mean by "enabled as a wakeup source"?

>  - I have never seen a wakeup on GPE09
>  - Disabling GPE08 and all other GPE wakeups, the system sleeps fine,
> and Wake-on-LAN works fine too

Again, what exactly do you mean by "Disabling GPE08 and all other GPE
wakeups"?  That is, what exactly do you do to disable/enable them?

> So in summary, the messy situation is that the DSDT suggests that
> GPE08 will be used for ethernet wakeups, however that GPE seems to
> fire instantly during suspend, and actually wake-on-LAN does not
> appear to use ACPI GPEs to wake the system at all - it must use some
> other mechanism. Windows is for some reason ignoring the ethernet
> device _PRW information so it does not suffer this issue.

Oh well.

> Does anyone have suggestions for how Linux should work with this?
>
> What logic should we use to ignore the _PRW in this case, or how can
> we quirk it?

User space can do that via /proc/acpi/wakeup.  The kernel not so much.
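Writing an ACPI device name to /proc/acpi/wakeup toggles that device's wakeup setting (for example `echo PXSX > /proc/acpi/wakeup`), so a userspace tool that wants to switch a device off only when it is currently on must read the status first. The sketch below parses one row of that file, assuming the usual layout of device name, S-state, status (possibly prefixed by `*`) and sysfs node; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed row of /proc/acpi/wakeup (assumed column layout). */
struct wakeup_row {
	char name[16];   /* ACPI device name, e.g. "PXSX" */
	unsigned sstate; /* number from the S-state column */
	bool enabled;    /* Status column reads "enabled" */
};

/* Returns true if `line` looked like a device row and was parsed. */
static bool parse_wakeup_line(const char *line, struct wakeup_row *row)
{
	char status[16];
	const char *s;

	/* Whitespace in the format matches the tabs and spaces between columns. */
	if (sscanf(line, "%15s S%u %15s", row->name, &row->sstate, status) != 3)
		return false; /* e.g. the header line */

	s = status;
	if (*s == '*') /* some rows prefix the status with '*' */
		s++;
	if (strcmp(s, "enabled") == 0)
		row->enabled = true;
	else if (strcmp(s, "disabled") == 0)
		row->enabled = false;
	else
		return false;
	return true;
}
```

A caller would read the file line by line, find the row whose name matches, and write the name back only if `enabled` differs from the desired state.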

I guess it might be possible to add a DMI-based quirk for this system
to ignore _PRW for a specific ACPI device object, but that would be
super-ugly.

> Also, is there a standard behaviour defined for ethernet drivers
> regarding wake-on-LAN?

I'm not aware of any. :-)

> r8169 appears to enable wake-on-LAN by default
> if it believes the hardware is capable of it, but other ethernet
> drivers seem to default to WOL off. (I don't expect users of the
> affected consumer laptops here to care about WOL support.)

Defaulting to off is generally safer, because you avoid spurious
wakeups when the user doesn't care about WoL.

Thanks,
Rafael

* [PATCH net v2 3/9] net/mac89x0: Fix and modernize log messages
From: Finn Thain @ 2017-10-06  1:11 UTC
  To: David S. Miller; +Cc: netdev, linux-kernel
In-Reply-To: <cover.1507211120.git.fthain@telegraphics.com.au>

Fix misplaced newlines in conditional log messages.
Add missing printk severity levels.
Log the MAC address after the interface gets a meaningful name.
Drop deprecated "out of memory" message as per checkpatch advice.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/net/ethernet/cirrus/mac89x0.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/cirrus/mac89x0.c b/drivers/net/ethernet/cirrus/mac89x0.c
index 4fd72c1a69f5..fa4b6968afd5 100644
--- a/drivers/net/ethernet/cirrus/mac89x0.c
+++ b/drivers/net/ethernet/cirrus/mac89x0.c
@@ -56,6 +56,8 @@
   local_irq_{dis,en}able()
 */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 static const char version[] =
 "cs89x0.c:v1.02 11/26/96 Russell Nelson <nelson@crynwr.com>\n";
 
@@ -248,16 +250,14 @@ struct net_device * __init mac89x0_probe(int unit)
 	if (net_debug && version_printed++ == 0)
 		printk(version);
 
-	printk(KERN_INFO "%s: cs89%c0%s rev %c found at %#8lx",
-	       dev->name,
-	       lp->chip_type==CS8900?'0':'2',
-	       lp->chip_type==CS8920M?"M":"",
-	       lp->chip_revision,
-	       dev->base_addr);
+	pr_info("CS89%c0%s rev %c found at %#8lx\n",
+	        lp->chip_type == CS8900 ? '0' : '2',
+	        lp->chip_type == CS8920M ? "M" : "",
+	        lp->chip_revision, dev->base_addr);
 
 	/* Try to read the MAC address */
 	if ((readreg(dev, PP_SelfST) & (EEPROM_PRESENT | EEPROM_OK)) == 0) {
-		printk("\nmac89x0: No EEPROM, giving up now.\n");
+		pr_info("No EEPROM, giving up now\n");
 		goto out1;
         } else {
                 for (i = 0; i < ETH_ALEN; i += 2) {
@@ -270,15 +270,14 @@ struct net_device * __init mac89x0_probe(int unit)
 
 	dev->irq = SLOT2IRQ(slot);
 
-	/* print the IRQ and ethernet address. */
-
-	printk(" IRQ %d ADDR %pM\n", dev->irq, dev->dev_addr);
-
 	dev->netdev_ops		= &mac89x0_netdev_ops;
 
 	err = register_netdev(dev);
 	if (err)
 		goto out1;
+
+	netdev_info(dev, "MAC %pM, IRQ %d\n", dev->dev_addr, dev->irq);
+
 	return NULL;
 out1:
 	nubus_writew(0, dev->base_addr + ADD_PORT);
@@ -473,7 +472,6 @@ net_rx(struct net_device *dev)
 	/* Malloc up new buffer. */
 	skb = alloc_skb(length, GFP_ATOMIC);
 	if (skb == NULL) {
-		printk("%s: Memory squeeze, dropping packet.\n", dev->name);
 		dev->stats.rx_dropped++;
 		return;
 	}
@@ -561,7 +559,7 @@ static int set_mac_address(struct net_device *dev, void *addr)
 		return -EADDRNOTAVAIL;
 
 	memcpy(dev->dev_addr, saddr->sa_data, ETH_ALEN);
-	printk("%s: Setting MAC address to %pM\n", dev->name, dev->dev_addr);
+	netdev_info(dev, "Setting MAC address to %pM\n", dev->dev_addr);
 
 	/* set the Ethernet address */
 	for (i=0; i < ETH_ALEN/2; i++)
@@ -585,7 +583,7 @@ init_module(void)
 	net_debug = debug;
         dev_cs89x0 = mac89x0_probe(-1);
 	if (IS_ERR(dev_cs89x0)) {
-                printk(KERN_WARNING "mac89x0.c: No card found\n");
+		pr_warn("No card found\n");
 		return PTR_ERR(dev_cs89x0);
 	}
 	return 0;
-- 
2.13.5
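The `pr_fmt()` definition added near the top of the file is what lets the new `pr_info()`/`pr_warn()` calls drop the hand-written `mac89x0:` prefix: the kernel's `pr_*` macros pass their format string through `pr_fmt()`, and adjacent string literals are concatenated at compile time. The block below is a userspace model of that mechanism only; `model_pr_info()` is a hypothetical stand-in that formats into a buffer instead of the kernel log, and it relies on the GNU `##__VA_ARGS__` extension (supported by GCC and Clang) so that calls with no extra arguments also work.

```c
#include <stdio.h>
#include <string.h>

/* Userspace stand-ins; in the kernel KBUILD_MODNAME comes from the build. */
#define KBUILD_MODNAME "mac89x0"
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Hypothetical helper: same prefixing as pr_info(), but into `buf`. */
#define model_pr_info(buf, len, fmt, ...) \
	snprintf(buf, len, pr_fmt(fmt), ##__VA_ARGS__)
```

Because the prefix is pasted in by the preprocessor, every message in the file gets it consistently, which is also why the patch can fix the old continuation-style messages without repeating the driver name.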

* [PATCH net v2 1/9] net/smc9194: Remove bogus CONFIG_MAC reference
From: Finn Thain @ 2017-10-06  1:11 UTC
  To: David S. Miller; +Cc: netdev, linux-kernel
In-Reply-To: <cover.1507211120.git.fthain@telegraphics.com.au>

The only version of smc9194.c with Mac support is the one in the
linux-mac68k CVS repo. AFAIK that driver never made it to the mainline.

Despite that, as of v2.3.45, arch/m68k/config.in listed CONFIG_SMC9194
under CONFIG_MAC. This mistake got carried over into Kconfig in v2.5.55.
(See pre-git era "[PATCH] add m68k dependencies to net driver config".)

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
---
 drivers/net/ethernet/smsc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/smsc/Kconfig b/drivers/net/ethernet/smsc/Kconfig
index 63aca9f847e1..4c2f612e4414 100644
--- a/drivers/net/ethernet/smsc/Kconfig
+++ b/drivers/net/ethernet/smsc/Kconfig
@@ -20,7 +20,7 @@ if NET_VENDOR_SMSC
 
 config SMC9194
 	tristate "SMC 9194 support"
-	depends on (ISA || MAC && BROKEN)
+	depends on ISA
 	select CRC32
 	---help---
 	  This is support for the SMC9xxx based Ethernet cards. Choose this
-- 
2.13.5
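In Kconfig expressions `&&` binds tighter than `||`, so the old line reads `ISA || (MAC && BROKEN)`. Since `BROKEN` is not normally selectable, the right-hand side evaluates to n and the old and new dependencies agree. The sketch below models Kconfig's tristate logic, in which n=0, m=1, y=2, `&&` takes the minimum and `||` the maximum; the enum and helper names are hypothetical.

```c
/* Kconfig tristate values: n = 0, m = 1, y = 2. */
enum tri { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig defines "a && b" as min(a, b) and "a || b" as max(a, b). */
static enum tri tri_and(enum tri a, enum tri b) { return a < b ? a : b; }
static enum tri tri_or(enum tri a, enum tri b)  { return a > b ? a : b; }

/* Old: depends on (ISA || MAC && BROKEN), i.e. ISA || (MAC && BROKEN). */
static enum tri old_dep(enum tri isa, enum tri mac, enum tri broken)
{
	return tri_or(isa, tri_and(mac, broken));
}

/* New: depends on ISA */
static enum tri new_dep(enum tri isa)
{
	return isa;
}
```

Only a configuration with `BROKEN=y` and `MAC=y` would make the two differ, which is exactly the path the patch removes.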
