Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2 net-next 0/8] bpf: Add option to set mark and priority in cgroup sock programs
From: David Miller @ 2017-08-29 21:53 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, daniel, ast, tj
In-Reply-To: <1503687941-626-1-git-send-email-dsahern@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Fri, 25 Aug 2017 12:05:33 -0700

> Add option to set mark and priority in addition to bound device for newly
> created sockets. Also, allow the bpf programs to use the get_current_uid_gid
> helper meaning socket marks, priority and device can be set base on the
> uid/gid of the running process.
> 
> For flexbility in deploying these programs, option is added to allow cgroups
> to be walked from current to root running any program attached. This allows
> one cgroup level to control the device a socket is bound to (e.g, a VRF) while
> cgroups can be used to set socket marks and priority.
> 
> Sample programs are updated to demonstrate the new options.
> 
> v2
> - added flag to control recursive behavior as requested by Alexei
> - added comment to sock_filter_func_proto regarding use of
>   get_current_uid_gid helper
> - updated test programs for recursive option

I'm marking this patch series as "deferred" while the semantic issues
keep getting discussed.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next 0/4] mlx4 misc patches
From: David Miller @ 2017-08-29 21:58 UTC (permalink / raw)
  To: tariqt; +Cc: netdev, eranbe
In-Reply-To: <1503927503-29065-1-git-send-email-tariqt@mellanox.com>

From: Tariq Toukan <tariqt@mellanox.com>
Date: Mon, 28 Aug 2017 16:38:19 +0300

> This patchset contains misc patches from the team
> to the mlx4 Core and Eth drivers.
> 
> Patch 1 by Eran replaces large static allocations by dynamic ones.
> Patch 2 by Leon makes an explicit conversion and solves a smatch warning.
> In patch 3 I fix a misplaced brackets of the sizeof operation.
> Patch 4 by Moshe adds the ability to inform the FW regarding user mac updates.
> 
> Series generated against net-next commit:
> 901c5d2fbfcd ARM: dts: rk3228-evb: Fix the compiling error

Series applied, thanks.

^ permalink raw reply

* [PATCH net-next] Documentation: networking: Add blurb about patches in patchwork
From: Florian Fainelli @ 2017-08-29 22:07 UTC (permalink / raw)
  To: netdev; +Cc: davem, Florian Fainelli

Explain that the patch queue in patchwork should not be touched by patch
submitters.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 Documentation/networking/netdev-FAQ.txt | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/networking/netdev-FAQ.txt b/Documentation/networking/netdev-FAQ.txt
index 247a30ba8e17..cfc66ea72329 100644
--- a/Documentation/networking/netdev-FAQ.txt
+++ b/Documentation/networking/netdev-FAQ.txt
@@ -111,6 +111,14 @@ A: Generally speaking, the patches get triaged quickly (in less than 48h).
    patch is a good way to ensure your patch is ignored or pushed to
    the bottom of the priority list.
 
+Q: I submitted multiple versions of the patch series, should I directly update
+   patchwork for the previous versions of these patch series?
+
+A: No, please don't interfere with the patch status on patchwork, leave it to
+   the maintainer to figure out what is the most recent and current version that
+   should be applied. If there is any doubt, the maintainer will reply and ask
+   what should be done.
+
 Q: How can I tell what patches are queued up for backporting to the
    various stable releases?
 
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH net-next 0/4] Endian fixes for SYSTEMPORT/SF2/MDIO
From: Florian Fainelli @ 2017-08-29 22:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, opendmb, andrew, vivien.didelot
In-Reply-To: <20170829.145203.2052260546778215324.davem@davemloft.net>

On 08/29/2017 02:52 PM, David Miller wrote:
> From: Florian Fainelli <f.fainelli@gmail.com>
> Date: Tue, 29 Aug 2017 14:45:30 -0700
> 
>> On 08/29/2017 02:42 PM, David Miller wrote:
>>> From: Florian Fainelli <f.fainelli@gmail.com>
>>> Date: Tue, 29 Aug 2017 11:39:41 -0700
>>>
>>>> While trying an ARM BE kernel for kinks, the 3 drivers below started not
>>>> working and the reasons why became pretty obvious because the register space
>>>> remains LE (hardwired), except for Broadcom MIPS where it follows the CPU's
>>>> native endian (let's call that a feature).
>>>
>>> Series applied, thanks Florian.
>>
>> If you have not pushed yet, seems like not, can you check you applied v2
>> of this patch series, in particular this patch:
>>
>> http://patchwork.ozlabs.org/patch/807296/
> 
> If you didn't keep messing with the patchwork state of patches in my
> queue this confusion would not have happened.
> 
> I did happen to apply v2, but because of all of the confusion you keep
> creating, I used the header message from v1.
> 
> It's already pushed out so there is nothing we can do about it.
> 

It's all good, thanks! /me goes updating netdev-FAQ.txt to mention not
touching patchwork.
-- 
Florian

^ permalink raw reply

* Re: [PATCH net v1 1/1] tipc: permit bond slave as bearer
From: David Miller @ 2017-08-29 22:10 UTC (permalink / raw)
  To: parthasarathy.bhuvaragan
  Cc: netdev, tipc-discussion, jon.maloy, maloy, ying.xue
In-Reply-To: <1503935822-20445-1-git-send-email-parthasarathy.bhuvaragan@ericsson.com>

From: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Date: Mon, 28 Aug 2017 17:57:02 +0200

> For a bond slave device as a tipc bearer, the dev represents the bond
> interface and orig_dev represents the slave in tipc_l2_rcv_msg().
> Since we decode the tipc_ptr from bonding device (dev), we fail to
> find the bearer and thus tipc links are not established.
> 
> In this commit, we register the tipc protocol callback per device and
> look for tipc bearer from both the devices.
> 
> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>

Applied.

^ permalink raw reply

* Re: [PATCH] packet: Don't write vnet header beyond end of buffer
From: David Miller @ 2017-08-29 22:10 UTC (permalink / raw)
  To: bpoirier; +Cc: netdev, linux-kernel, willemb
In-Reply-To: <20170828182941.10677-1-bpoirier@suse.com>

From: Benjamin Poirier <bpoirier@suse.com>
Date: Mon, 28 Aug 2017 14:29:41 -0400

> ... which may happen with certain values of tp_reserve and maclen.
> 
> Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv")
> Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> Cc: Willem de Bruijn <willemb@google.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net v2 0/6] net:ethernet:aquantia: Atlantic driver Update 2017-08-23
From: David Miller @ 2017-08-29 22:12 UTC (permalink / raw)
  To: Pavel.Belous
  Cc: netdev, darcari, Igor.Russkikh, Nadezhda.Krupnina, simon.edelhaus
In-Reply-To: <cover.1503945861.git.pavel.belous@aquantia.com>

From: Pavel Belous <Pavel.Belous@aquantia.com>
Date: Mon, 28 Aug 2017 21:52:07 +0300

> From: Pavel Belous <pavel.belous@aquantia.com>
> 
> This series contains updates for aQuantia Atlantic driver.
> 
> It has bugfixes and some improvements.
> 
> Changes in v2:
>  - "MCP state change" fix removed (will be sent as
>     a separate fix after further investigation.)

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next 0/4] Endian fixes for SYSTEMPORT/SF2/MDIO
From: David Miller @ 2017-08-29 22:12 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, opendmb, andrew, vivien.didelot
In-Reply-To: <cf41703a-67e4-609b-e7ec-bd962ffec6c7@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 29 Aug 2017 15:08:00 -0700

> On 08/29/2017 02:52 PM, David Miller wrote:
>> From: Florian Fainelli <f.fainelli@gmail.com>
>> Date: Tue, 29 Aug 2017 14:45:30 -0700
>> 
>>> On 08/29/2017 02:42 PM, David Miller wrote:
>>>> From: Florian Fainelli <f.fainelli@gmail.com>
>>>> Date: Tue, 29 Aug 2017 11:39:41 -0700
>>>>
>>>>> While trying an ARM BE kernel for kinks, the 3 drivers below started not
>>>>> working and the reasons why became pretty obvious because the register space
>>>>> remains LE (hardwired), except for Broadcom MIPS where it follows the CPU's
>>>>> native endian (let's call that a feature).
>>>>
>>>> Series applied, thanks Florian.
>>>
>>> If you have not pushed yet, seems like not, can you check you applied v2
>>> of this patch series, in particular this patch:
>>>
>>> http://patchwork.ozlabs.org/patch/807296/
>> 
>> If you didn't keep messing with the patchwork state of patches in my
>> queue this confusion would not have happened.
>> 
>> I did happen to apply v2, but because of all of the confusion you keep
>> creating, I used the header message from v1.
>> 
>> It's already pushed out so there is nothing we can do about it.
>> 
> 
> It's all good, thanks! /me goes updating netdev-FAQ.txt to mention not
> touching patchwork.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next] Documentation: networking: Add blurb about patches in patchwork
From: David Miller @ 2017-08-29 22:12 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev
In-Reply-To: <20170829220751.3814-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 29 Aug 2017 15:07:51 -0700

> Explain that the patch queue in patchwork should not be touched by patch
> submitters.
> 
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 0/3] tc: act_ife: handle IEEE IFE ethertype as default
From: David Miller @ 2017-08-29 22:14 UTC (permalink / raw)
  To: aring; +Cc: jhs, yotamg, xiyou.wangcong, jiri, lucasb, netdev,
	linux-kselftest
In-Reply-To: <20170828190315.26646-1-aring@mojatatu.com>

From: Alexander Aring <aring@mojatatu.com>
Date: Mon, 28 Aug 2017 15:03:12 -0400

> this patch series will introduce the IFE ethertype which is registered by
> IEEE. If the netlink act_ife type netlink attribute is not given it will
> use this value by default now.
> At least it will introduce some UAPI testcases to check if the default type
> is used if not specified and vice versa.

Series applied, thank you.

^ permalink raw reply

* [PATCH net-next] neigh: increase queue_len_bytes to match wmem_default
From: Eric Dumazet @ 2017-08-29 22:16 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: David Miller, netdev
In-Reply-To: <1504029689.11498.79.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <edumazet@google.com>

Florian reported UDP xmit drops that could be root caused to the
too small neigh limit.

Current limit is 64 KB, meaning that even a single UDP socket would hit
it, since its default sk_sndbuf comes from net.core.wmem_default
(~212992 bytes on 64bit arches).

Once ARP/ND resolution is in progress, we should allow a little more
packets to be queued, at least for one producer.

Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf
limit and either block in sendmsg() or return -EAGAIN.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
---
 Documentation/networking/ip-sysctl.txt |    7 +++++--
 include/net/sock.h                     |   10 ++++++++++
 net/core/sock.c                        |   10 ----------
 net/decnet/dn_neigh.c                  |    2 +-
 net/ipv4/arp.c                         |    2 +-
 net/ipv4/tcp_input.c                   |    2 +-
 net/ipv6/ndisc.c                       |    2 +-
 7 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 6b0bc0f715346a097a6df46e2ba2771359abcd23..b3345d0fe0a67e477a6754848e7fc7be144322d5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -109,7 +109,10 @@ neigh/default/unres_qlen_bytes - INTEGER
 	queued for each	unresolved address by other network layers.
 	(added in linux 3.3)
 	Setting negative value is meaningless and will return error.
-	Default: 65536 Bytes(64KB)
+	Default: SK_WMEM_MAX, (same as net.core.wmem_default).
+		Exact value depends on architecture and kernel options,
+		but should be enough to allow queuing 256 packets
+		of medium size.
 
 neigh/default/unres_qlen - INTEGER
 	The maximum number of packets which may be queued for each
@@ -119,7 +122,7 @@ neigh/default/unres_qlen - INTEGER
 	unexpected packet loss. The current default value is calculated
 	according to default value of unres_qlen_bytes and true size of
 	packet.
-	Default: 31
+	Default: 101
 
 mtu_expires - INTEGER
 	Time, in seconds, that cached PMTU information is kept.
diff --git a/include/net/sock.h b/include/net/sock.h
index 1c2912d433e81b10f3fdc87bcfcbb091570edc03..03a362568357acc7278a318423dd3873103f90ca 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2368,6 +2368,16 @@ bool sk_net_capable(const struct sock *sk, int cap);
 
 void sk_get_meminfo(const struct sock *sk, u32 *meminfo);
 
+/* Take into consideration the size of the struct sk_buff overhead in the
+ * determination of these values, since that is non-constant across
+ * platforms.  This makes socket queueing behavior and performance
+ * not depend upon such differences.
+ */
+#define _SK_MEM_PACKETS		256
+#define _SK_MEM_OVERHEAD	SKB_TRUESIZE(256)
+#define SK_WMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
+#define SK_RMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
+
 extern __u32 sysctl_wmem_max;
 extern __u32 sysctl_rmem_max;
 
diff --git a/net/core/sock.c b/net/core/sock.c
index dfdd14cac775e9bfcee0085ee32ffcd0ab28b67b..9b7b6bbb2a23e7652a1f34a305f29d49de00bc8c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -307,16 +307,6 @@ static struct lock_class_key af_wlock_keys[AF_MAX];
 static struct lock_class_key af_elock_keys[AF_MAX];
 static struct lock_class_key af_kern_callback_keys[AF_MAX];
 
-/* Take into consideration the size of the struct sk_buff overhead in the
- * determination of these values, since that is non-constant across
- * platforms.  This makes socket queueing behavior and performance
- * not depend upon such differences.
- */
-#define _SK_MEM_PACKETS		256
-#define _SK_MEM_OVERHEAD	SKB_TRUESIZE(256)
-#define SK_WMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
-#define SK_RMEM_MAX		(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS)
-
 /* Run time adjustable parameters. */
 __u32 sysctl_wmem_max __read_mostly = SK_WMEM_MAX;
 EXPORT_SYMBOL(sysctl_wmem_max);
diff --git a/net/decnet/dn_neigh.c b/net/decnet/dn_neigh.c
index 21dedf6fd0f76dec22b2b3685beb89cfefea7ded..22bf0b95d6edc3c27ef3a99d27cb70a1551e3e0e 100644
--- a/net/decnet/dn_neigh.c
+++ b/net/decnet/dn_neigh.c
@@ -94,7 +94,7 @@ struct neigh_table dn_neigh_table = {
 			[NEIGH_VAR_BASE_REACHABLE_TIME] = 30 * HZ,
 			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
 			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
-			[NEIGH_VAR_QUEUE_LEN_BYTES] = 64*1024,
+			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
 			[NEIGH_VAR_PROXY_QLEN] = 0,
 			[NEIGH_VAR_ANYCAST_DELAY] = 0,
 			[NEIGH_VAR_PROXY_DELAY] = 0,
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 8b52179ddc6e54eabf6d3c2ed0132083228680bb..7c45b8896709815c5dde5972fd57cb5c3bcb2648 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -171,7 +171,7 @@ struct neigh_table arp_tbl = {
 			[NEIGH_VAR_BASE_REACHABLE_TIME] = 30 * HZ,
 			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
 			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
-			[NEIGH_VAR_QUEUE_LEN_BYTES] = 64 * 1024,
+			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
 			[NEIGH_VAR_PROXY_QLEN] = 64,
 			[NEIGH_VAR_ANYCAST_DELAY] = 1 * HZ,
 			[NEIGH_VAR_PROXY_DELAY]	= (8 * HZ) / 10,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 568ccfd6dd371d88136ffabe5cfcc36f099786b6..7616cd76f6f6a62f395da897baef2c66c0098193 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6086,9 +6086,9 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct net *net = sock_net(sk);
 	struct sock *fastopen_sk = NULL;
-	struct dst_entry *dst = NULL;
 	struct request_sock *req;
 	bool want_cookie = false;
+	struct dst_entry *dst;
 	struct flowi fl;
 
 	/* TW buckets are converted to open requests without
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 5e338eb89509b1df6ebd060f8bd19fcb4b86fe05..266a530414d7be4f1e7be922e465bbab46f7cbac 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -127,7 +127,7 @@ struct neigh_table nd_tbl = {
 			[NEIGH_VAR_BASE_REACHABLE_TIME] = ND_REACHABLE_TIME,
 			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,
 			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,
-			[NEIGH_VAR_QUEUE_LEN_BYTES] = 64 * 1024,
+			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,
 			[NEIGH_VAR_PROXY_QLEN] = 64,
 			[NEIGH_VAR_ANYCAST_DELAY] = 1 * HZ,
 			[NEIGH_VAR_PROXY_DELAY] = (8 * HZ) / 10,

^ permalink raw reply related

* Re: [PATCH net-next 0/4] nsh: headers, GSO
From: David Miller @ 2017-08-29 22:17 UTC (permalink / raw)
  To: jbenc; +Cc: netdev, yi.y.yang, e, jan.scheurich, blp
In-Reply-To: <cover.1503948295.git.jbenc@redhat.com>

From: Jiri Benc <jbenc@redhat.com>
Date: Mon, 28 Aug 2017 21:43:20 +0200

> This adds header structs and helpers for NSH together with GSO support.
> 
> Note there is no code in this patchset that actually manipulates the NSH
> headers. That was sent to netdev by Yi Yang ("[PATCH net-next v6 0/3]
> openvswitch: add NSH support"). The aim of this series is to lay the
> groundwork and ease the implementation for him.
> 
> In addition to openvswitch, the NSH support should be added to tc (flower to
> match, act_nsh to push/pop NSH headers). That will come later. There's
> currently no plan to support NSH by other means than those two.
> 
> The patch 3 in this patchset was written by Yi Yang, I took it from the
> aforementioned series and slightly modified it - see the note in the patch.

Series applied, thanks Jiri.

^ permalink raw reply

* Re: Permissions for eBPF objects
From: Mickaël Salaün @ 2017-08-29 22:23 UTC (permalink / raw)
  To: Chenbo Feng, Alexei Starovoitov
  Cc: Daniel Borkmann, Jeffrey Vander Stoep, Stephen Smalley, netdev,
	SELinux
In-Reply-To: <CAMOXUJknLRNjtfQmFy65xiO974MhPK7As5D7La_3udqmS-BxNg@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 2147 bytes --]


On 29/08/2017 03:44, Chenbo Feng wrote:
> On Mon, Aug 28, 2017 at 6:15 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Mon, Aug 28, 2017 at 05:47:19PM -0700, Chenbo Feng wrote:
>>> On Fri, Aug 25, 2017 at 6:03 PM, Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>> On Fri, Aug 25, 2017 at 10:07:27PM +0200, Daniel Borkmann wrote:
>>>>> On 08/25/2017 09:52 PM, Chenbo Feng wrote:
>>>>>> On Fri, Aug 25, 2017 at 12:45 PM, Jeffrey Vander Stoep <jeffv@google.com> wrote:
>>>>>>> On Fri, Aug 25, 2017 at 12:26 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote:
>>>>>>>> On Fri, 2017-08-25 at 11:01 -0700, Jeffrey Vander Stoep via Selinux
>>>>>>>> wrote:
>>>>>>>>> I’d like to get your thoughts on adding LSM permission checks on BPF
>>>>>>>>> objects.
>>>>
>>>> before reinventing the wheel please take a look at landlock work.
>>>> Everything that was discussed in this thread is covered by it.
>>>> The patches have been in development for more than a year and most of the early
>>>> issues have been resolved.
>>>> It will be presented again during security summit in LA in September.
>>>>
>>> I am not very familiar with landlock lsm, isn't this module also
>>> depend on the lsm hooks to do
>>> the landlock check? If so then adding lsm hooks for eBPF object seems
>>> not conflict with the
>>> work on progress.
>>
>> I see. I got it the other way around. What lsm checks are you proposing?
>> and why unprivileged_bpf_disabled is not enough?
>> you want to allow unpriv only for specific user(s) ?
>>
> Exactly, the proposal patch I am currently working on will add checks
> before map creation,
> map read,  and map modify, since all these functionalities will be
> available to all users when
> unprivileged_bpf_disabled is turned off. And eBPF prog_load may also
> need a check as well
> since loading some types of program is not restricted either.
> 

It would be interesting to be able to check a wide range of actions
performed with the BPF syscall: the command and the union bpf_attr
argument. Because it is a multiplexer, that may be challenging, though.


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH net-next v3 2/2] tcp_diag: report TCP MD5 signing keys and addresses
From: Ivan Delalande @ 2017-08-29 22:29 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, netdev, Ivan Delalande
In-Reply-To: <20170829222954.24863-1-colona@arista.com>

Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to
processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is
not possible to retrieve these from the kernel once they have been
configured on sockets.

Signed-off-by: Ivan Delalande <colona@arista.com>
---
 include/uapi/linux/inet_diag.h |   1 +
 net/ipv4/tcp_diag.c            | 115 ++++++++++++++++++++++++++++++++++++++---
 2 files changed, 110 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index 678496897a68..f52ff62bfabe 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -143,6 +143,7 @@ enum {
 	INET_DIAG_MARK,
 	INET_DIAG_BBRINFO,
 	INET_DIAG_CLASS_ID,
+	INET_DIAG_MD5SIG,
 	__INET_DIAG_MAX,
 };
 
diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index a748c74aa8b7..f972f9f7eae4 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -16,6 +16,7 @@
 
 #include <linux/tcp.h>
 
+#include <net/netlink.h>
 #include <net/tcp.h>
 
 static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
@@ -36,6 +37,106 @@ static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
 		tcp_get_info(sk, info);
 }
 
+#ifdef CONFIG_TCP_MD5SIG
+static void inet_diag_md5sig_fill(struct tcp_md5sig *info,
+				  const struct tcp_md5sig_key *key)
+{
+	#if IS_ENABLED(CONFIG_IPV6)
+	if (key->family == AF_INET6) {
+		struct sockaddr_in6 *sin6 =
+			(struct sockaddr_in6 *)&info->tcpm_addr;
+
+		memcpy(&sin6->sin6_addr, &key->addr.a6,
+		       sizeof(struct in6_addr));
+	} else
+	#endif
+	{
+		struct sockaddr_in *sin =
+			(struct sockaddr_in *)&info->tcpm_addr;
+
+		memcpy(&sin->sin_addr, &key->addr.a4, sizeof(struct in_addr));
+	}
+
+	info->tcpm_addr.ss_family = key->family;
+	info->tcpm_prefixlen = key->prefixlen;
+	info->tcpm_keylen = key->keylen;
+	memcpy(info->tcpm_key, key->key, key->keylen);
+}
+
+static int inet_diag_put_md5sig(struct sk_buff *skb,
+				const struct tcp_md5sig_info *md5sig)
+{
+	const struct tcp_md5sig_key *key;
+	struct nlattr *attr;
+	struct tcp_md5sig *info;
+	int md5sig_count = 0;
+
+	hlist_for_each_entry_rcu(key, &md5sig->head, node)
+		md5sig_count++;
+	if (md5sig_count == 0)
+		return 0;
+
+	attr = nla_reserve(skb, INET_DIAG_MD5SIG,
+			   md5sig_count * sizeof(struct tcp_md5sig));
+	if (!attr)
+		return -EMSGSIZE;
+
+	info = nla_data(attr);
+	hlist_for_each_entry_rcu(key, &md5sig->head, node) {
+		inet_diag_md5sig_fill(info++, key);
+		if (--md5sig_count == 0)
+			break;
+	}
+	if (md5sig_count > 0)
+		memset(info, 0, md5sig_count * sizeof(struct tcp_md5sig));
+
+	return 0;
+}
+#endif
+
+static int tcp_diag_get_aux(struct sock *sk, bool net_admin,
+			    struct sk_buff *skb)
+{
+#ifdef CONFIG_TCP_MD5SIG
+	if (net_admin) {
+		struct tcp_md5sig_info *md5sig;
+		int err = 0;
+
+		rcu_read_lock();
+		md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info);
+		if (md5sig)
+			err = inet_diag_put_md5sig(skb, md5sig);
+		rcu_read_unlock();
+		if (err < 0)
+			return err;
+	}
+#endif
+
+	return 0;
+}
+
+static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
+{
+	size_t size = 0;
+
+#ifdef CONFIG_TCP_MD5SIG
+	if (sk_fullsock(sk)) {
+		const struct tcp_md5sig_info *md5sig;
+		const struct tcp_md5sig_key *key;
+
+		rcu_read_lock();
+		md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info);
+		if (md5sig) {
+			hlist_for_each_entry_rcu(key, &md5sig->head, node)
+				size += sizeof(struct tcp_md5sig);
+		}
+		rcu_read_unlock();
+	}
+#endif
+
+	return size;
+}
+
 static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb,
 			  const struct inet_diag_req_v2 *r, struct nlattr *bc)
 {
@@ -68,13 +169,15 @@ static int tcp_diag_destroy(struct sk_buff *in_skb,
 #endif
 
 static const struct inet_diag_handler tcp_diag_handler = {
-	.dump		 = tcp_diag_dump,
-	.dump_one	 = tcp_diag_dump_one,
-	.idiag_get_info	 = tcp_diag_get_info,
-	.idiag_type	 = IPPROTO_TCP,
-	.idiag_info_size = sizeof(struct tcp_info),
+	.dump			= tcp_diag_dump,
+	.dump_one		= tcp_diag_dump_one,
+	.idiag_get_info		= tcp_diag_get_info,
+	.idiag_get_aux		= tcp_diag_get_aux,
+	.idiag_get_aux_size	= tcp_diag_get_aux_size,
+	.idiag_type		= IPPROTO_TCP,
+	.idiag_info_size	= sizeof(struct tcp_info),
 #ifdef CONFIG_INET_DIAG_DESTROY
-	.destroy	 = tcp_diag_destroy,
+	.destroy		= tcp_diag_destroy,
 #endif
 };
 
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next v3 0/2] report TCP MD5 signing keys and addresses
From: Ivan Delalande @ 2017-08-29 22:29 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, netdev, Ivan Delalande

Allow userspace to retrieve MD5 signature keys and addresses configured
on TCP sockets through inet_diag.

Thank you Eric Dumazet for the useful explanations and feedback.

v3: - rename inet_diag_*md5sig in tcp_diag.c to tcp_diag_* for
      consistency,
    - don't lock the socket tcp_diag_put_md5sig,
    - add checks on md5sig_count in tcp_diag_put_md5sig to not create
      the netlink attribute if the list is empty, and to avoid overflows
      or memory leaks if the list has changed in the meantime.

v2: - move changes to tcp_diag.c and extend inet_diag_handler to allow
      protocols to provide additional data on INET_DIAG_INFO,
    - lock socket before calling tcp_diag_put_md5sig.


I also have a patch for iproute2/ss to test this change, making it print
this new attribute. I'm planning to polish and send it if this series
gets applied.


Ivan Delalande (2):
  inet_diag: allow protocols to provide additional data
  tcp_diag: report TCP MD5 signing keys and addresses

 include/linux/inet_diag.h      |   7 +++
 include/uapi/linux/inet_diag.h |   1 +
 net/ipv4/inet_diag.c           |  22 ++++++--
 net/ipv4/tcp_diag.c            | 115 ++++++++++++++++++++++++++++++++++++++---
 4 files changed, 135 insertions(+), 10 deletions(-)

-- 
2.14.1

^ permalink raw reply

* [PATCH net-next v3 1/2] inet_diag: allow protocols to provide additional data
From: Ivan Delalande @ 2017-08-29 22:29 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, netdev, Ivan Delalande
In-Reply-To: <20170829222954.24863-1-colona@arista.com>

Extend inet_diag_handler to allow individual protocols to report
additional data on INET_DIAG_INFO through idiag_get_aux. The size
can be dynamic and is computed by idiag_get_aux_size.

Signed-off-by: Ivan Delalande <colona@arista.com>
---
 include/linux/inet_diag.h |  7 +++++++
 net/ipv4/inet_diag.c      | 22 ++++++++++++++++++----
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h
index 65da430e260f..ee251c585854 100644
--- a/include/linux/inet_diag.h
+++ b/include/linux/inet_diag.h
@@ -25,6 +25,13 @@ struct inet_diag_handler {
 					  struct inet_diag_msg *r,
 					  void *info);
 
+	int		(*idiag_get_aux)(struct sock *sk,
+					 bool net_admin,
+					 struct sk_buff *skb);
+
+	size_t		(*idiag_get_aux_size)(struct sock *sk,
+					      bool net_admin);
+
 	int		(*destroy)(struct sk_buff *in_skb,
 				   const struct inet_diag_req_v2 *req);
 
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 67325d5832d7..8a88ef373395 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -93,8 +93,17 @@ void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(inet_diag_msg_common_fill);
 
-static size_t inet_sk_attr_size(void)
+static size_t inet_sk_attr_size(struct sock *sk,
+				const struct inet_diag_req_v2 *req,
+				bool net_admin)
 {
+	const struct inet_diag_handler *handler;
+	size_t aux = 0;
+
+	handler = inet_diag_table[req->sdiag_protocol];
+	if (handler && handler->idiag_get_aux_size)
+		aux = handler->idiag_get_aux_size(sk, net_admin);
+
 	return	  nla_total_size(sizeof(struct tcp_info))
 		+ nla_total_size(1) /* INET_DIAG_SHUTDOWN */
 		+ nla_total_size(1) /* INET_DIAG_TOS */
@@ -105,6 +114,7 @@ static size_t inet_sk_attr_size(void)
 		+ nla_total_size(SK_MEMINFO_VARS * sizeof(u32))
 		+ nla_total_size(TCP_CA_NAME_MAX)
 		+ nla_total_size(sizeof(struct tcpvegas_info))
+		+ nla_total_size(aux)
 		+ 64;
 }
 
@@ -260,6 +270,10 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 
 	handler->idiag_get_info(sk, r, info);
 
+	if (ext & (1 << (INET_DIAG_INFO - 1)) && handler->idiag_get_aux)
+		if (handler->idiag_get_aux(sk, net_admin, skb) < 0)
+			goto errout;
+
 	if (sk->sk_state < TCP_TIME_WAIT) {
 		union tcp_cc_info info;
 		size_t sz = 0;
@@ -452,13 +466,14 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo,
 	struct net *net = sock_net(in_skb->sk);
 	struct sk_buff *rep;
 	struct sock *sk;
+	bool net_admin = netlink_net_capable(in_skb, CAP_NET_ADMIN);
 	int err;
 
 	sk = inet_diag_find_one_icsk(net, hashinfo, req);
 	if (IS_ERR(sk))
 		return PTR_ERR(sk);
 
-	rep = nlmsg_new(inet_sk_attr_size(), GFP_KERNEL);
+	rep = nlmsg_new(inet_sk_attr_size(sk, req, net_admin), GFP_KERNEL);
 	if (!rep) {
 		err = -ENOMEM;
 		goto out;
@@ -467,8 +482,7 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo,
 	err = sk_diag_fill(sk, rep, req,
 			   sk_user_ns(NETLINK_CB(in_skb).sk),
 			   NETLINK_CB(in_skb).portid,
-			   nlh->nlmsg_seq, 0, nlh,
-			   netlink_net_capable(in_skb, CAP_NET_ADMIN));
+			   nlh->nlmsg_seq, 0, nlh, net_admin);
 	if (err < 0) {
 		WARN_ON(err == -EMSGSIZE);
 		nlmsg_free(rep);
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH net-next v1] amd-xgbe: Interrupt summary bits are h/w version dependent
From: David Miller @ 2017-08-29 22:31 UTC (permalink / raw)
  To: thomas.lendacky; +Cc: netdev
In-Reply-To: <20170828202934.17073.940.stgit@tlendack-t1.amdoffice.net>

From: Tom Lendacky <thomas.lendacky@amd.com>
Date: Mon, 28 Aug 2017 15:29:34 -0500

> There is a difference in the bit position of the normal interrupt summary
> enable (NIE) and abnormal interrupt summary enable (AIE) between revisions
> of the hardware.  For older revisions the NIE and AIE bits are positions
> 16 and 15 respectively.  For newer revisions the NIE and AIE bits are
> positions 15 and 14.  The effect in changing the bit position is that
> newer hardware won't receive AIE interrupts in the current version of the
> driver.  Specifically, the driver uses this interrupt to collect
> statistics on when a receive buffer unavailable event occurs and to
> restart the driver/device when a fatal bus error occurs.
> 
> Update the driver to set the interrupt enable bit based on the reported
> version of the hardware.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] ipv6: Use rt6i_idev index for echo replies to a local address
From: David Miller @ 2017-08-29 22:33 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, tariqt
In-Reply-To: <1503953614-32395-1-git-send-email-dsahern@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Mon, 28 Aug 2017 13:53:34 -0700

> Tariq repored local pings to linklocal address is failing:
> $ ifconfig ens8
> ens8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         inet 11.141.16.6  netmask 255.255.0.0  broadcast 11.141.255.255
>         inet6 fe80::7efe:90ff:fecb:7502  prefixlen 64  scopeid 0x20<link>
>         ether 7c:fe:90:cb:75:02  txqueuelen 1000  (Ethernet)
>         RX packets 12  bytes 1164 (1.1 KiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 30  bytes 2484 (2.4 KiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
> 
> $  /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8
> PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes
> 
> --- fe80::7efe:90ff:fecb:7502%ens8 ping statistics ---
> 3 packets transmitted, 0 received, 100% packet loss, time 2043ms
> 
> icmpv6_echo_reply needs to use the rt6i_idev dev index for local traffic
> similar to how icmp6_send does. Convert the change for icmp6_send into a
> helper that can be used in both places. Add the long over due
> skb_rt6_info helper to convert dst on an skb to rt6_info similar to
> skb_rtable for ipv4.
> 
> Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on
>        device with address")
> Reported-by: Tariq Toukan <tariqt@mellanox.com>
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied but please do not break up Fixes: tags even if the line is
very long.

Otherwise it's a pain to grep the logs for specific multi-word string
sequences in the Fixes tag.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next 0/6] flow_dissector: Protocol specific flow dissector offload
From: David Miller @ 2017-08-29 22:34 UTC (permalink / raw)
  To: tom; +Cc: netdev
In-Reply-To: <20170829171942.8974-1-tom@quantonium.net>


Please add proper signoffs to your patches.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next v3 1/2] inet_diag: allow protocols to provide additional data
From: Stephen Hemminger @ 2017-08-29 22:34 UTC (permalink / raw)
  To: Ivan Delalande; +Cc: David Miller, Eric Dumazet, netdev
In-Reply-To: <20170829222954.24863-2-colona@arista.com>

On Tue, 29 Aug 2017 15:29:53 -0700
Ivan Delalande <colona@arista.com> wrote:

> @@ -452,13 +466,14 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo,
>  	struct net *net = sock_net(in_skb->sk);
>  	struct sk_buff *rep;
>  	struct sock *sk;
> +	bool net_admin = netlink_net_capable(in_skb, CAP_NET_ADMIN);

Please keep declarations in Christmas tree order if possible.

int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
		      struct sk_buff *skb, const struct inet_diag_req_v2 *req,
		      struct user_namespace *user_ns,
		      u32 portid, u32 seq, u16 nlmsg_flags,
		      const struct nlmsghdr *unlh,
		      bool net_admin)
{
	bool net_admin = netlink_net_capable(in_skb, CAP_NET_ADMIN);
	const struct tcp_congestion_ops *ca_ops;
	const struct inet_diag_handler *handler;
	int ext = req->idiag_ext;
	struct inet_diag_msg *r;
	struct nlmsghdr  *nlh;
	struct nlattr *attr;
	void *info = NULL;

^ permalink raw reply

* Re: [PATCH RFC 2/2] virtio_net: enable probing for NEEDS_RESET support
From: Willem de Bruijn @ 2017-08-29 22:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Network Development, Jason Wang, virtualization, Willem de Bruijn,
	virtio-dev
In-Reply-To: <20170830000452-mutt-send-email-mst@kernel.org>

On Tue, Aug 29, 2017 at 5:13 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Aug 29, 2017 at 05:02:29PM -0400, Willem de Bruijn wrote:
>> + virtio-dev
>>
>> On Tue, Aug 29, 2017 at 4:38 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Tue, Aug 29, 2017 at 04:27:41PM -0400, Willem de Bruijn wrote:
>> >> On Tue, Aug 29, 2017 at 4:16 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> >> > On Tue, Aug 29, 2017 at 04:07:59PM -0400, Willem de Bruijn wrote:
>> >> >> From: Willem de Bruijn <willemb@google.com>
>> >> >>
>> >> >> Implement a mechanism to signal that a virtio device implements the
>> >> >> VIRTIO_CONFIG_S_NEEDS_RESET command.
>> >> >>
>> >> >> Testing for VIRTIO_CONFIG_S_NEEDS_RESET support by issuing the request
>> >> >> and verifying the reset state would require an expensive state change.
>> >> >>
>> >> >> To avoid that, add a status bit to the feature register used during
>> >> >> feature negotiation between host and guest.
>> >> >>
>> >> >> Set the bit for virtio-net, which supports the feature.
>> >> >>
>> >> >> Signed-off-by: Willem de Bruijn <willemb@google.com>
>> >> >
>> >> > All virtio 1 devices have the reset feature
>> >>
>> >> I don't quite follow. No device drivers implement this request currently?
>> >
>> > Depends. Spec 1.0 describes the bit and that driver can respond
>> > by reseting the device. You seem to do something else
>> > in this patchset, but as designed in 1.0 it does not seem to need
>> > a feature bit.
>>
>> I see. So support is designed to be best effort?
>>
>> The feature bit is only needed if driver support is optional.
>>
>> The ack response is necessary if the device acts differently
>> depending on whether the reset happened. The device has
>> to reset its local state, too, so I think that this is needed.
>
> That reset should only happen when guest driver resets the device.
> And spec already has a mechanism for that anyway.

And the device can read the ring state to see whether it has reset,
perhaps.

>>
>> >> > so maybe guest does
>> >> > not need this flag. Does device need it? Does device really
>> >> > care that guest can't recover?
>> >>
>> >> If all device drivers support it, then the flag is not needed.
>> >>
>> >> But as long as legacy device drivers can exist that do not support
>> >> this feature, it has to have a way of differentiating between the two.
>> >
>> > Why? Device won't set this unless it's in a bad state. In that case,
>> > setting the bit is harmless even if drivers ignore it.
>>
>> The goal is for the device to be able to rely on the driver reset to get
>> to a good state even if it gets it into a bad state.
>>
>> That allows it to implement optimizations that could get it into that bad
>> state.
>
> I see. I don't think this is what the need reset was designed for.
>
> We can extend it to cover this case but let's add a bit more
> documentation then.

Okay.

> And in particular won't it be better if we could just reset one ring,
> and not the whole device state? This might be nicer so flows on other
> rings aren't disrupted.

Indeed. But that would require a different request, then? It also
depends on the use case. A full device reset is a big hammer,
but if used only to get out of rare edge cases, it is good enough.

>>
>> In particular, in the edge case where the device performs backpressure,
>> takes the descriptor out of the avail ring and does not immediately post
>> it to the used ring.
>>
>> A reset will make the guest free all delayed packets and treat any
>> unsent and unacknowledged as network drops. This allows the
>> device to indeed drop long delayed packets when they eventually
>> surface (e.g., leave a qdisc queue).
>
> In this particular case, won't it be easier for device to just
> report all packets as used, without involving the guest?

When the device can just iterate over the outstanding packet, yeah,
that's actually simpler.

>> This is of course not safe with
>> zerocopy packets.
>
>
> I wonder if we can teach kernel to drop zero copy packets too.

Your point about changing frags[] underneath a cloned skb really does
make that hard. We might be able to mitigate individual specific cases
of high latency, such as TC queue occupancy.

>
> --
> MST

^ permalink raw reply

* [PATCH] drivers: net: xgene: Correct probe sequence handling
From: Iyappan Subramanian @ 2017-08-29 22:43 UTC (permalink / raw)
  To: davem, netdev
  Cc: linux-arm-kernel, dnelson, patches, Quan Nguyen,
	Iyappan Subramanian

From: Quan Nguyen <qnguyen@apm.com>

The phy is connected at early stage of probe but not properly
disconnected if error occurs.  This patch fixes the issue.

Also changing the return type of xgene_enet_check_phy_handle(),
since this function always returns success.

Signed-off-by: Quan Nguyen <qnguyen@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 27 ++++++++++++------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 1d307f2..6e253d9 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1661,21 +1661,21 @@ static int xgene_enet_get_irqs(struct xgene_enet_pdata *pdata)
 	return 0;
 }
 
-static int xgene_enet_check_phy_handle(struct xgene_enet_pdata *pdata)
+static void xgene_enet_check_phy_handle(struct xgene_enet_pdata *pdata)
 {
 	int ret;
 
 	if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII)
-		return 0;
+		return;
 
 	if (!IS_ENABLED(CONFIG_MDIO_XGENE))
-		return 0;
+		return;
 
 	ret = xgene_enet_phy_connect(pdata->ndev);
 	if (!ret)
 		pdata->mdio_driver = true;
 
-	return 0;
+	return;
 }
 
 static void xgene_enet_gpiod_get(struct xgene_enet_pdata *pdata)
@@ -1779,10 +1779,6 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata)
 	if (ret)
 		return ret;
 
-	ret = xgene_enet_check_phy_handle(pdata);
-	if (ret)
-		return ret;
-
 	xgene_enet_gpiod_get(pdata);
 
 	pdata->clk = devm_clk_get(&pdev->dev, NULL);
@@ -2097,9 +2093,11 @@ static int xgene_enet_probe(struct platform_device *pdev)
 		goto err;
 	}
 
+	xgene_enet_check_phy_handle(pdata);
+
 	ret = xgene_enet_init_hw(pdata);
 	if (ret)
-		goto err;
+		goto err2;
 
 	link_state = pdata->mac_ops->link_state;
 	if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) {
@@ -2117,29 +2115,30 @@ static int xgene_enet_probe(struct platform_device *pdev)
 	spin_lock_init(&pdata->stats_lock);
 	ret = xgene_extd_stats_init(pdata);
 	if (ret)
-		goto err2;
+		goto err1;
 
 	xgene_enet_napi_add(pdata);
 	ret = register_netdev(ndev);
 	if (ret) {
 		netdev_err(ndev, "Failed to register netdev\n");
-		goto err2;
+		goto err1;
 	}
 
 	return 0;
 
-err2:
+err1:
 	/*
 	 * If necessary, free_netdev() will call netif_napi_del() and undo
 	 * the effects of xgene_enet_napi_add()'s calls to netif_napi_add().
 	 */
 
+	xgene_enet_delete_desc_rings(pdata);
+
+err2:
 	if (pdata->mdio_driver)
 		xgene_enet_phy_disconnect(pdata);
 	else if (phy_interface_mode_is_rgmii(pdata->phy_mode))
 		xgene_enet_mdio_remove(pdata);
-err1:
-	xgene_enet_delete_desc_rings(pdata);
 err:
 	free_netdev(ndev);
 	return ret;
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
From: Willem de Bruijn @ 2017-08-29 22:55 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Network Development, Koichiro Den, virtualization
In-Reply-To: <20170829233920-mutt-send-email-mst@kernel.org>

On Tue, Aug 29, 2017 at 4:40 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Aug 29, 2017 at 03:53:08PM -0400, Willem de Bruijn wrote:
>> On Tue, Aug 29, 2017 at 3:42 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > On Tue, Aug 29, 2017 at 03:35:38PM -0400, Willem de Bruijn wrote:
>> >> By the way, I have had an unrelated patch outstanding for a while
>> >> to have virtio-net support the VIRTIO_CONFIG_S_NEEDS_RESET
>> >> command. Will send that as RFC.
>> >
>> > Oh nice.
>>
>> Great :)
>>
>> > One needs to be careful about locking there which is why
>> > no devices support that yet.
>>
>> I originally wrote it based on the virtnet_reset function introduced
>> for xdp. Calling this from virtnet_config_changed_work is non trivial,
>> as virtnet_freeze_down waits until no config worker is running.
>>
>> Otherwise, I could not find any constraints on when freeze may be
>> called, and it largely follows the same path. I hope I didn't miss anything.
>
> The issue is that on freeze processes are not running so we
> generally know no new packets will arrive (might be wrong
> for bridging, then it's a bug). On device error you must
> prevent new skbs from coming in, etc.

Thanks a lot for the quick review. I had indeed not yet figured out
which invariants freeze can depend on that are not universal. Same
for the virtnet_reset call from the .ndo_xdp. Will need to take a much
better look at that.

^ permalink raw reply

* Re: [PATCH net-next v3 2/2] tcp_diag: report TCP MD5 signing keys and addresses
From: Eric Dumazet @ 2017-08-29 23:01 UTC (permalink / raw)
  To: Ivan Delalande; +Cc: David Miller, netdev
In-Reply-To: <20170829222954.24863-3-colona@arista.com>

On Tue, 2017-08-29 at 15:29 -0700, Ivan Delalande wrote:
> Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to
> processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is
> not possible to retrieve these from the kernel once they have been
> configured on sockets.
> 
> Signed-off-by: Ivan Delalande <colona@arista.com>
> ---
>  include/uapi/linux/inet_diag.h |   1 +
>  net/ipv4/tcp_diag.c            | 115 ++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 110 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
> index 678496897a68..f52ff62bfabe 100644
> --- a/include/uapi/linux/inet_diag.h
> +++ b/include/uapi/linux/inet_diag.h
> @@ -143,6 +143,7 @@ enum {
>  	INET_DIAG_MARK,
>  	INET_DIAG_BBRINFO,
>  	INET_DIAG_CLASS_ID,
> +	INET_DIAG_MD5SIG,
>  	__INET_DIAG_MAX,
>  };
>  
> diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
> index a748c74aa8b7..f972f9f7eae4 100644
> --- a/net/ipv4/tcp_diag.c
> +++ b/net/ipv4/tcp_diag.c
> @@ -16,6 +16,7 @@
>  
>  #include <linux/tcp.h>
>  
> +#include <net/netlink.h>
>  #include <net/tcp.h>
>  
>  static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
> @@ -36,6 +37,106 @@ static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r,
>  		tcp_get_info(sk, info);
>  }
>  
> +#ifdef CONFIG_TCP_MD5SIG
> +static void inet_diag_md5sig_fill(struct tcp_md5sig *info,
> +				  const struct tcp_md5sig_key *key)
> +{
> +	#if IS_ENABLED(CONFIG_IPV6)
> +	if (key->family == AF_INET6) {
> +		struct sockaddr_in6 *sin6 =
> +			(struct sockaddr_in6 *)&info->tcpm_addr;
> +
> +		memcpy(&sin6->sin6_addr, &key->addr.a6,
> +		       sizeof(struct in6_addr));
> +	} else
> +	#endif
> +	{
> +		struct sockaddr_in *sin =
> +			(struct sockaddr_in *)&info->tcpm_addr;
> +
> +		memcpy(&sin->sin_addr, &key->addr.a4, sizeof(struct in_addr));

You probably need a memset(... 0  ...) to clear the XX bytes that are
not used by IPv4 address.

Otherwise your patch is going to 'leak'  124 bytes of kernel memory to
user space and offer yet another way for attackers to build exploits.

AFAIK, struct tcp_md5sig has a huge hole, since it uses

"struct __kernel_sockaddr_storage tcpm_addr "

So a similar problem exists for IPv6.

Maybe it is time to define a much smaller structure, using only 16 bytes
instead of 128 for the address.

^ permalink raw reply

* Re: [RESEND PATCH] Allow passing tid or pid in SCM_CREDENTIALS without CAP_SYS_ADMIN
From: David Miller @ 2017-08-29 23:02 UTC (permalink / raw)
  To: prakash.sangappa; +Cc: linux-kernel, netdev, ebiederm, drepper
In-Reply-To: <1503965540-30393-1-git-send-email-prakash.sangappa@oracle.com>

From: Prakash Sangappa <prakash.sangappa@oracle.com>
Date: Mon, 28 Aug 2017 17:12:20 -0700

> Currently passing tid(gettid(2)) of a thread in struct ucred in
> SCM_CREDENTIALS message requires CAP_SYS_ADMIN capability otherwise
> it fails with EPERM error. Some applications deal with thread id
> of a thread(tid) and so it would help to allow tid in SCM_CREDENTIALS
> message. Basically, either tgid(pid of the process) or the tid of
> the thread should be allowed without the need for CAP_SYS_ADMIN capability.
> 
> SCM_CREDENTIALS will be used to determine the global id of a process or
> a thread running inside a pid namespace.
> 
> This patch adds necessary check to accept tid in SCM_CREDENTIALS
> struct ucred.
> 
> Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>

I'm pretty sure that by the descriptions in previous changes to this
function, what you are proposing is basically a minor form of PID
spoofing which we only want someone with CAP_SYS_ADMIN over the
PID namespace to be able to do.

Sorry, I'm not applying this.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox