Netdev List
* Re: [PATCH v6 3/5] rxrpc: check return value of skb_to_sgvec always
From: Sabrina Dubroca @ 2017-04-28 11:41 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: netdev, linux-kernel, David.Laight, kernel-hardening, davem
In-Reply-To: <20170425184734.26563-3-Jason@zx2c4.com>

2017-04-25, 20:47:32 +0200, Jason A. Donenfeld wrote:
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
>  net/rxrpc/rxkad.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
> index 4374e7b9c7bf..dcf46c9c3ece 100644
> --- a/net/rxrpc/rxkad.c
> +++ b/net/rxrpc/rxkad.c
[...]
> @@ -429,7 +432,8 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
>  	}
>  

Adding a few more lines of context:

	sg = _sg;
	if (unlikely(nsg > 4)) {
		sg = kmalloc(sizeof(*sg) * nsg, GFP_NOIO);
		if (!sg)
			goto nomem;
	}

>  	sg_init_table(sg, nsg);
> -	skb_to_sgvec(skb, sg, offset, len);
> +	if (unlikely(skb_to_sgvec(skb, sg, offset, len) < 0))
> +		goto nomem;

You're leaking sg on this error path when nsg > 4; you'll need to add
this before the goto:

	if (sg != _sg)
		kfree(sg);
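
For illustration, here is a minimal user-space sketch of that cleanup
pattern. It is not the rxkad code: sg_demo_fill() is a hypothetical
stand-in for skb_to_sgvec(), malloc()/free() stand in for
kmalloc()/kfree(), and the two counters exist only so the behaviour can
be checked.

```c
#include <assert.h>
#include <stdlib.h>

struct sg_demo_entry { int v; };

/* Bookkeeping for the sketch only: heap allocations and frees seen so far. */
static int sg_demo_allocs, sg_demo_frees;

/* Hypothetical stand-in for skb_to_sgvec(): returns < 0 when told to fail. */
static int sg_demo_fill(struct sg_demo_entry *sg, int nsg, int fail)
{
	int i;

	if (fail)
		return -1;
	for (i = 0; i < nsg; i++)
		sg[i].v = i;
	return nsg;
}

static int sg_demo_process(int nsg, int fail)
{
	struct sg_demo_entry _sg[4], *sg = _sg;
	int ret = -1;

	assert(nsg > 0);

	/* Small requests use the on-stack array, larger ones go to the heap. */
	if (nsg > 4) {
		sg = malloc(sizeof(*sg) * nsg);
		if (!sg)
			return -1;
		sg_demo_allocs++;
	}

	/* The error path must still reach the free below. */
	if (sg_demo_fill(sg, nsg, fail) < 0)
		goto out;

	ret = 0;
out:
	/* Only free what was heap-allocated, never the stack array. */
	if (sg != _sg) {
		free(sg);
		sg_demo_frees++;
	}
	return ret;
}
```

Routing both the success and the failure path through one label is just
one way to do it; freeing right before the goto, as suggested above, has
the same effect.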



BTW, when you resubmit, please Cc: the maintainers of the files you're
changing for each patch, so that they can review this stuff. And send
patch 1 to all of them, otherwise they might be surprised that we even
need <0 checking after calls to skb_to_sgvec.

You might also want to add a cover letter.

-- 
Sabrina


* Re: rhashtable - Cap total number of entries to 2^31
From: Herbert Xu @ 2017-04-28 11:31 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Florian Fainelli, David Miller, fw, netdev, Thomas Graf,
	Stephen Rothwell, Linux-Next Mailing List,
	Linux Kernel Mailing List
In-Reply-To: <acb22f3f-8286-01f2-e536-0ab44eb06b34@de.ibm.com>

On Fri, Apr 28, 2017 at 12:23:15PM +0200, Christian Borntraeger wrote:
>
> I can reproduce this boot failure on s390 bisected to 
> commit 6d684e54690caef45cf14051ddeb7c71beeb681b
>    rhashtable: Cap total number of entries to 2^31
> in linux-next from Apr 28

It should go away with

https://patchwork.ozlabs.org/patch/756233/

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
From: Jesper Dangaard Brouer @ 2017-04-28 10:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Gospodarek, John Fastabend, Alexei Starovoitov,
	Daniel Borkmann, Daniel Borkmann, netdev@vger.kernel.org,
	xdp-newbies@vger.kernel.org, brouer
In-Reply-To: <53e9dd2f-f40a-b43b-99c9-62f5ce3a665c@fb.com>

On Thu, 27 Apr 2017 16:31:14 -0700
Alexei Starovoitov <ast@fb.com> wrote:

> On 4/27/17 1:41 AM, Jesper Dangaard Brouer wrote:
> > When registering/attaching an XDP/bpf program, we would just send the
> > file-descriptor for this port-map along (like we do with the bpf_prog
> > FD), plus the program's own ingress-port number in the port-map.
> >
> > It is not clear to me, in-which-data-structure on the kernel-side we
> > store this reference to the port-map and ingress-port. As today we only
> > have the "raw" struct bpf_prog pointer. I see several options:
> >
> > 1. Create a new xdp_prog struct that contains existing bpf_prog,
> > a port-map pointer and ingress-port. (IMHO easiest solution)
> >
> > 2. Just create a new pointer to port-map and store it in driver rx-ring
> > struct (like existing bpf_prog), but this creates a race challenge
> > when replacing (cmpxchg) the program (or perhaps it's not a problem as
> > it runs under rcu and RTNL-lock).
> >
> > 3. Extend bpf_prog to store this port-map and ingress-port, and have a
> > fast-way to access it.  I assume it will be accessible via
> > bpf_prog->bpf_prog_aux->used_maps[X] but it will be too slow for XDP.  
> 
> I'm not sure I completely follow the 3 proposals.
> Are you suggesting to have only one netdev_array per program?

Yes, but I can see you have a more clever idea below.

> Why not allow any number, like we do for tailcall+prog_array, etc?

> We can teach verifier to allow new helper
>  bpf_tx_port(netdev_array, port_num);
> to only be used with netdev_array map type.
> It will fetch netdevice pointer from netdev_array[port_num]
> and will tx the packet into it.

I love it. 

I just don't like the "netdev" part of the name "netdev_array", as one
of the basic ideas of a port table is that a port can be anything that
can consume an xdp_buff packet.  This generalization allows us to move
code out of the drivers.  We might be on the same page, as I do imagine
that netdev_array or port_array is just a struct bpf_map pointer, and
the bpf_map->map_type will tell us that this bpf_map contains net_device
pointers.  Thus, when later introducing a new type of redirect (like to
a socket or remote-CPU), we just add a new bpf_map_type for this,
without needing to change anything in the drivers, right?

Do you imagine that the bpf-side bpf_tx_port() returns XDP_REDIRECT?
Or does it return whether the call was successful (e.g. validating that
port_num exists in the map)?

On the kernel side, we need to receive this info ("port_array" and
"port_num").  Given that you don't pass an xdp_buff/ctx to the call, I
assume you want the per-CPU temp-store solution.  Then, during the
XDP_REDIRECT action, we call a core redirect function that, based on the
bpf_map_type, does a lookup and finds the net_device ptr.
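
A rough user-space sketch of that two-step flow, to make the idea
concrete.  Every name here (rd_tx_port, rd_do_redirect, rd_port_map,
the RD_* constants) is hypothetical and not an existing kernel API, and
a single static variable stands in for what would be per-CPU storage.
It also picks one answer to the question above (the helper returns the
redirect action directly), purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define RD_MAX_PORTS 8

enum rd_action { RD_ACT_ABORTED, RD_ACT_REDIRECT };
enum rd_map_type { RD_MAP_NETDEV };

struct rd_netdev { int ifindex; };

struct rd_port_map {
	enum rd_map_type type;
	struct rd_netdev *ports[RD_MAX_PORTS];
};

/* Stand-in for per-CPU scratch space filled in by the helper. */
static struct {
	struct rd_port_map *map;
	unsigned int port_num;
} rd_scratch;

/* Program-side helper: validate, remember the target, request a redirect. */
static enum rd_action rd_tx_port(struct rd_port_map *map, unsigned int port_num)
{
	if (!map || port_num >= RD_MAX_PORTS || !map->ports[port_num])
		return RD_ACT_ABORTED;

	rd_scratch.map = map;
	rd_scratch.port_num = port_num;
	return RD_ACT_REDIRECT;
}

/* Core-side step: consume the scratch entry and resolve it by map type. */
static struct rd_netdev *rd_do_redirect(void)
{
	struct rd_port_map *map = rd_scratch.map;
	unsigned int port = rd_scratch.port_num;

	rd_scratch.map = NULL;
	if (!map)
		return NULL;

	/* The index was validated when the helper stored it. */
	assert(port < RD_MAX_PORTS);

	switch (map->type) {
	case RD_MAP_NETDEV:
		return map->ports[port];
	}
	return NULL;
}
```

A new redirect target type would then only add another map type and
another case in the core function, leaving the driver side untouched.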


> We can make it similar to bpf_tail_call(), so that program will
> finish on successful bpf_tx_port() or
> make it into 'delayed' tx which will be executed when program finishes.
> Not sure which approach is better.

I know you are talking about something slightly different, about
delaying TX.

But I want to mention (as I've done before) that it is important (for
me) that we get bulking working/integrated.   I imagine the driver will
call a function that delays the TX/redirect action, and at the end
of the NAPI cycle calls a function that flushes packets, bulked per
destination port.

I was wondering where to store these delayed TX packets, but now that
we have an associated bpf_map data-structure (netdev_array), I'm thinking
about storing packets (ordered by port) inside that.  And then have a
bpf_tx_flush(netdev_array) call in the driver (for every port-table-map
seen, which will likely be small).
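
To sketch what I mean by storing the delayed packets per port inside
the map and flushing them in bulk (user-space illustration only; the
bq_* names, the queue depth and the integer "packets" are all made up,
and the counters stand in for the real bulk transmit):

```c
#include <assert.h>

#define BQ_MAX_PORTS 4
#define BQ_BULK      8

struct bq_queue {
	int pkts[BQ_BULK];
	unsigned int count;
};

struct bq_port_table {
	struct bq_queue q[BQ_MAX_PORTS];
	unsigned int sent[BQ_MAX_PORTS];	/* packets "transmitted" per port */
	unsigned int flushes;			/* number of bulk transmits */
};

/* Send everything queued for one port as a single bulk. */
static void bq_flush_port(struct bq_port_table *t, unsigned int port)
{
	struct bq_queue *q = &t->q[port];

	assert(q->count <= BQ_BULK);
	if (!q->count)
		return;

	t->sent[port] += q->count;	/* stands in for the bulk TX call */
	t->flushes++;
	q->count = 0;
}

/* Called per redirected packet: only queues, flushing early if full. */
static int bq_enqueue(struct bq_port_table *t, unsigned int port, int pkt)
{
	struct bq_queue *q;

	if (port >= BQ_MAX_PORTS)
		return -1;

	q = &t->q[port];
	if (q->count == BQ_BULK)
		bq_flush_port(t, port);
	q->pkts[q->count++] = pkt;
	return 0;
}

/* Called once at the end of the poll cycle for each port table seen. */
static void bq_tx_flush(struct bq_port_table *t)
{
	unsigned int port;

	for (port = 0; port < BQ_MAX_PORTS; port++)
		bq_flush_port(t, port);
}
```

The point is that the per-packet path only appends to a queue, and the
cost of the actual transmit is paid once per destination port per cycle.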


> We can also extend this netdev_array into broadcast/multicast. Like
> bpf_tx_allports(&netdev_array);
> call from the program will xmit the packet to all netdevices
> in that 'netdev_array' map type.

When broadcasting you often don't want to broadcast the packet out of
the incoming interface.  How can you support this?

Normally you would know your ingress port, and then exclude that port
from the broadcast.  But with many netdev_arrays, how does the program
know its own ingress port?
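
One possible shape, sketched in user space under the assumption that
the ingress port is simply handed to the broadcast call as an extra
argument (bc_tx_allports and bc_port_map are hypothetical names, and
this does not answer where the program gets that number from):

```c
#include <assert.h>

#define BC_MAX_PORTS 8

struct bc_port_map {
	int ifindex[BC_MAX_PORTS];	/* 0 marks an empty slot */
};

/*
 * Collect the ifindexes the packet would be sent to, skipping empty
 * slots and the ingress port.  Returns the number of egress ports.
 */
static unsigned int bc_tx_allports(const struct bc_port_map *map,
				   unsigned int ingress_port, int *out)
{
	unsigned int i, n = 0;

	assert(map && out);

	for (i = 0; i < BC_MAX_PORTS; i++) {
		if (i == ingress_port || !map->ifindex[i])
			continue;
		out[n++] = map->ifindex[i];
	}
	return n;
}
```

Passing an out-of-range ingress port (e.g. BC_MAX_PORTS) excludes
nothing, which would cover the case where the ingress device is not a
member of that particular map.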


> The map-in-map support can be trivially extended to allow netdev_array,
> then the program can create N multicast groups of netdevices.
> Each multicast group == one netdev_array map.
> The user space will populate a hashmap with these netdev_arrays and
> bpf kernel side can select dynamically which multicast group to use
> to send the packets to.
> bpf kernel side may look like:
> struct bpf_netdev_array *netdev_array = bpf_map_lookup_elem(&hash, key);
> if (!netdev_array)
>    ...
> if (my_condition)
>     bpf_tx_allports(netdev_array);  /* broadcast to all netdevices */
> else
>     bpf_tx_port(netdev_array, port_num); /* tx into one netdevice */
> 
> that's an artificial example. Just trying to point out
> that we shouldn't restrict the feature too soon.
 
I like how you solve the multicast problem.  (But I do need to learn
some more of the inner-workings of bpf map-in-map to follow this
completely).

Thanks a lot for all this input, I now have a much clearer picture of how
I can/should implement this :-)
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* [PATCH iproute2 net-next] bpf: add support for generic xdp
From: Daniel Borkmann @ 2017-04-28 10:42 UTC (permalink / raw)
  To: stephen; +Cc: alexei.starovoitov, davem, netdev, Daniel Borkmann

Follow-up to commit c7272ca72009 ("bpf: add initial support for
attaching xdp progs") to also support generic XDP. This adds an
indicator for loaded generic XDP programs when programs are loaded
as shown in c7272ca72009, but the driver still lacks native XDP
support.

  # ip link
  [...]
  3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc [...]
      link/ether 0c:c4:7a:03:f9:25 brd ff:ff:ff:ff:ff:ff
  [...]

In case the driver does support native XDP, but the user wants
to load the program as generic XDP (e.g. for testing purposes),
then this can be done with the same semantics as in c7272ca72009,
but with 'xdpgeneric' instead of 'xdp' command for loading:

  # ip -force link set dev eno1 xdpgeneric obj xdp.o

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 ( Requires a header update to pull in XDP_FLAGS_SKB_MODE. )

 ip/iplink.c           |  7 +++++--
 ip/iplink_xdp.c       | 46 +++++++++++++++++++++++++++++++++-------------
 ip/xdp.h              |  2 +-
 man/man8/ip-link.8.in | 19 +++++++++++++++++--
 4 files changed, 56 insertions(+), 18 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 866ad72..96b0da3 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -606,9 +606,12 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 			if (get_integer(&mtu, *argv, 0))
 				invarg("Invalid \"mtu\" value\n", *argv);
 			addattr_l(&req->n, sizeof(*req), IFLA_MTU, &mtu, 4);
-		} else if (strcmp(*argv, "xdp") == 0) {
+		} else if (strcmp(*argv, "xdpgeneric") == 0 ||
+			   strcmp(*argv, "xdp") == 0) {
+			bool generic = strcmp(*argv, "xdpgeneric") == 0;
+
 			NEXT_ARG();
-			if (xdp_parse(&argc, &argv, req))
+			if (xdp_parse(&argc, &argv, req, generic))
 				exit(-1);
 		} else if (strcmp(*argv, "netns") == 0) {
 			NEXT_ARG();
diff --git a/ip/iplink_xdp.c b/ip/iplink_xdp.c
index a81ed97..4a3343f 100644
--- a/ip/iplink_xdp.c
+++ b/ip/iplink_xdp.c
@@ -19,41 +19,56 @@
 
 extern int force;
 
+struct xdp_req {
+	struct iplink_req *req;
+	__u32 flags;
+};
+
 static void xdp_ebpf_cb(void *raw, int fd, const char *annotation)
 {
-	__u32 flags = !force ? XDP_FLAGS_UPDATE_IF_NOEXIST : 0;
-	struct iplink_req *req = raw;
-	struct rtattr *xdp;
+	struct xdp_req *xdp = raw;
+	struct iplink_req *req = xdp->req;
+	struct rtattr *xdp_attr;
 
-	xdp = addattr_nest(&req->n, sizeof(*req), IFLA_XDP);
+	xdp_attr = addattr_nest(&req->n, sizeof(*req), IFLA_XDP);
 	addattr32(&req->n, sizeof(*req), IFLA_XDP_FD, fd);
-	addattr32(&req->n, sizeof(*req), IFLA_XDP_FLAGS, flags);
-	addattr_nest_end(&req->n, xdp);
+	if (xdp->flags)
+		addattr32(&req->n, sizeof(*req), IFLA_XDP_FLAGS, xdp->flags);
+	addattr_nest_end(&req->n, xdp_attr);
 }
 
 static const struct bpf_cfg_ops bpf_cb_ops = {
 	.ebpf_cb = xdp_ebpf_cb,
 };
 
-static int xdp_delete(struct iplink_req *req)
+static int xdp_delete(struct xdp_req *xdp)
 {
-	xdp_ebpf_cb(req, -1, NULL);
+	xdp_ebpf_cb(xdp, -1, NULL);
 	return 0;
 }
 
-int xdp_parse(int *argc, char ***argv, struct iplink_req *req)
+int xdp_parse(int *argc, char ***argv, struct iplink_req *req, bool generic)
 {
 	struct bpf_cfg_in cfg = {
 		.argc = *argc,
 		.argv = *argv,
 	};
+	struct xdp_req xdp = {
+		.req = req,
+	};
 
 	if (*argc == 1) {
 		if (strcmp(**argv, "none") == 0 ||
 		    strcmp(**argv, "off") == 0)
-			return xdp_delete(req);
+			return xdp_delete(&xdp);
 	}
-	if (bpf_parse_common(BPF_PROG_TYPE_XDP, &cfg, &bpf_cb_ops, req))
+
+	if (!force)
+		xdp.flags |= XDP_FLAGS_UPDATE_IF_NOEXIST;
+	if (generic)
+		xdp.flags |= XDP_FLAGS_SKB_MODE;
+
+	if (bpf_parse_common(BPF_PROG_TYPE_XDP, &cfg, &bpf_cb_ops, &xdp))
 		return -1;
 
 	*argc = cfg.argc;
@@ -64,12 +79,17 @@ int xdp_parse(int *argc, char ***argv, struct iplink_req *req)
 void xdp_dump(FILE *fp, struct rtattr *xdp)
 {
 	struct rtattr *tb[IFLA_XDP_MAX + 1];
+	__u32 flags = 0;
 
 	parse_rtattr_nested(tb, IFLA_XDP_MAX, xdp);
+
 	if (!tb[IFLA_XDP_ATTACHED] ||
 	    !rta_getattr_u8(tb[IFLA_XDP_ATTACHED]))
 		return;
 
-	fprintf(fp, "xdp ");
-	/* More to come here in future for 'ip -d link' (digest, etc) ... */
+	if (tb[IFLA_XDP_FLAGS])
+		flags = rta_getattr_u32(tb[IFLA_XDP_FLAGS]);
+
+	fprintf(fp, "xdp%s ",
+		flags & XDP_FLAGS_SKB_MODE ? "generic" : "");
 }
diff --git a/ip/xdp.h b/ip/xdp.h
index bc69645..1b95e0f 100644
--- a/ip/xdp.h
+++ b/ip/xdp.h
@@ -3,7 +3,7 @@
 
 #include "utils.h"
 
-int xdp_parse(int *argc, char ***argv, struct iplink_req *req);
+int xdp_parse(int *argc, char ***argv, struct iplink_req *req, bool generic);
 void xdp_dump(FILE *fp, struct rtattr *tb);
 
 #endif /* __XDP__ */
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index a5ddfe7..52571b7 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -126,7 +126,7 @@ ip-link \- network device configuration
 .RB "[ " port_guid " eui64 ] ]"
 .br
 .in -9
-.RB "[ " xdp  " { " off " | "
+.RB "[ { " xdp " | " xdpgeneric  " } { " off " | "
 .br
 .in +8
 .BR object
@@ -1572,8 +1572,23 @@ which may impact security and/or performance. (e.g. VF multicast promiscuous mod
 
 .TP
 .B xdp object "|" pinned "|" off
-set (or unset) a XDP ("express data path") BPF program to run on every
+set (or unset) a XDP ("eXpress Data Path") BPF program to run on every
 packet at driver level.
+.B ip link
+output will indicate a
+.B xdp
+flag for the networking device. If the driver does not have native XDP
+support, the kernel will fall back to a slower, driver-independent "generic"
+XDP variant. The
+.B ip link
+output will in that case indicate
+.B xdpgeneric
+instead of
+.B xdp
+only. If the driver does have native XDP support, but the program is
+loaded under
+.B xdpgeneric object "|" pinned
+then the kernel will use the generic XDP variant instead of the native one.
 
 .B off
 (or
-- 
1.9.3


* Re: rhashtable - Cap total number of entries to 2^31
From: Christian Borntraeger @ 2017-04-28 10:23 UTC (permalink / raw)
  To: Florian Fainelli, Herbert Xu, David Miller
  Cc: fw, netdev, Thomas Graf, Stephen Rothwell,
	Linux-Next Mailing List, Linux Kernel Mailing List
In-Reply-To: <56843a86-9a09-16e8-acec-05a80396f282@gmail.com>

On 04/28/2017 12:21 AM, Florian Fainelli wrote:
> On 04/27/2017 02:16 PM, Florian Fainelli wrote:
>> Hi Herbert,
>>
>> On 04/26/2017 10:44 PM, Herbert Xu wrote:
>>> On Tue, Apr 25, 2017 at 10:48:22AM -0400, David Miller wrote:
>>>> From: Florian Westphal <fw@strlen.de>
>>>> Date: Tue, 25 Apr 2017 16:17:49 +0200
>>>>
>>>>> I'd have less of an issue with this if we'd be talking about
>>>>> something computationally expensive, but this is about storing
>>>>> an extra value inside a struct just to avoid one "shr" in insert path...
>>>>
>>>> Agreed, this shift is probably filling an available cpu cycle :-)
>>>
>>> OK, but we need to have an extra field for another reason anyway.
>>> The problem is that we're not capping the total number of elements
>>> in the hashtable when max_size is not set, this means that nelems
>>> can overflow which will cause havoc with the automatic shrinking
>>> when it tries to fit 2^32 entries into a minimum-sized table.
>>>
>>> So I'm taking that hole back for now :)
>>>
>>> ---8<---
>>> When max_size is not set or if it is set to a sufficiently large
>>> value, the nelems counter can overflow.  This would cause havoc
>>> with the automatic shrinking as it would then attempt to fit a
>>> huge number of entries into a tiny hash table.
>>>
>>> This patch fixes this by adding max_elems to struct rhashtable
>>> to cap the number of elements.  This is set to 2^31 as nelems is
>>> not a precise count.  This is sufficiently smaller than UINT_MAX
>>> that it should be safe.
>>>
>>> When max_size is set max_elems will be lowered to at most twice
>>> max_size as is the status quo.
>>>
>>> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>>
>> This commit:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=6d684e54690caef45cf14051ddeb7c71beeb681b
>>
>> makes my ARMv7 (32-bit) system panic on boot with the log below. I can
>> test net-next (or net) and report back if you want me to test anything.
>> Thanks!
> 
> And another one with a QEMU guest:
> 
> [    0.389212] NET: Registered protocol family 16
> [    0.388807] Kernel panic - not syncing: rtnetlink_init: cannot
> initialize rtnetlink
> [    0.388807]
> [    0.389445] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 4.11.0-rc8-02077-ge221c1f0fe25 #1
> [    0.389745] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
> [    0.390219] Call Trace:
> [    0.391406]  dump_stack+0x51/0x78
> [    0.391585]  panic+0xc7/0x20e
> [    0.391740]  ? register_pernet_operations+0xa1/0xd0
> [    0.392031]  rtnetlink_init+0x22/0x1a0
> [    0.392190]  netlink_proto_init+0x168/0x184
> [    0.392359]  ? ptp_classifier_init+0x26/0x30
> [    0.392528]  ? netlink_net_init+0x2e/0x2e
> [    0.392692]  do_one_initcall+0x54/0x190
> [    0.392852]  ? parse_args+0x248/0x400
> [    0.393033]  kernel_init_freeable+0x127/0x1b6
> [    0.393208]  ? kernel_init_freeable+0x1b6/0x1b6
> [    0.393389]  ? rest_init+0x70/0x70
> [    0.393533]  kernel_init+0x9/0x100
> [    0.393676]  ret_from_fork+0x29/0x40
> [    0.394555] ---[ end Kernel panic - not syncing: rtnetlink_init:
> cannot initialize rtnetlink
> [    0.394555]
> 
> I traced this down to:
> 
> rtnetlink_net_init()
>   netlink_kernel_create()
>      netlink_insert()
> 	__netlink_insert()
> 	   rhashtable_lookup_insert_key()
> 	      __rhashtable_insert_fast()
>                 rht_grow_above_max()
> 
> And indeed we have:
> 
> ht->nelems = 0
> ht->max_elems = 0
> 
> such that rht_grow_above_max() returns true.
> 
> With your commit we actually take this branch:
> 
> if (ht->p.max_size < ht->max_elems / 2)
> 	ht->max_elems = ht->p.max_size * 2;
> 
> since max_size = 0 we have max_elems = 0 as well.
> 
> Candidate fix #1:
> 
> diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
> index 45f89369c4c8..ad9020e1609c 100644
> --- a/include/linux/rhashtable.h
> +++ b/include/linux/rhashtable.h
> @@ -329,7 +329,7 @@ static inline bool rht_grow_above_100(const struct
> rhashtable *ht,
>  static inline bool rht_grow_above_max(const struct rhashtable *ht,
>                                       const struct bucket_table *tbl)
>  {
> -       return atomic_read(&ht->nelems) >= ht->max_elems;
> +       return ht->p.max_size && atomic_read(&ht->nelems) >= ht->max_elems;
>  }
> 
> Candidate fix #2:
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index 751630bbe409..6b4f07760fec 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -963,7 +963,7 @@ int rhashtable_init(struct rhashtable *ht,
> 
>         /* Cap total entries at 2^31 to avoid nelems overflow. */
>         ht->max_elems = 1u << 31;
> -       if (ht->p.max_size < ht->max_elems / 2)
> +       if (ht->p.max_size && (ht->p.max_size < ht->max_elems / 2))
>                 ht->max_elems = ht->p.max_size * 2;
> 
>         ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
> 
> Number #2 does not introduce an additional conditional on the fastpath,
> so I suppose that would be what we would prefer?
> 
>>
>> [    0.158619] futex hash table entries: 1024 (order: 4, 65536 bytes)
>> [    0.166386] NET: Registered protocol family 16
>> [    0.179596] Kernel panic - not syncing: rtnetlink_init: cannot
>> initialize rtnetlink
>> [    0.179596]
>> [    0.189350] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>> 4.11.0-rc8-02028-g6d684e54690c #37
>> [    0.197908] Hardware name: Broadcom STB (Flattened Device Tree)
>> [    0.204254] [<c020fa18>] (unwind_backtrace) from [<c020b294>]
>> (show_stack+0x10/0x14)
>> [    0.212447] [<c020b294>] (show_stack) from [<c04bc454>]
>> (dump_stack+0x90/0xa4)
>> [    0.220144] [<c04bc454>] (dump_stack) from [<c02ab684>]
>> (panic+0xf0/0x270)
>> [    0.227460] [<c02ab684>] (panic) from [<c0c2705c>]
>> (rtnetlink_init+0x24/0x1d4)
>> [    0.235145] [<c0c2705c>] (rtnetlink_init) from [<c0c27630>]
>> (netlink_proto_init+0x124/0x148)
>> [    0.244124] [<c0c27630>] (netlink_proto_init) from [<c02017f8>]
>> (do_one_initcall+0x40/0x168)
>> [    0.253072] [<c02017f8>] (do_one_initcall) from [<c0c00dfc>]
>> (kernel_init_freeable+0x164/0x200)
>> [    0.262304] [<c0c00dfc>] (kernel_init_freeable) from [<c087bfd8>]
>> (kernel_init+0x8/0x110)
>> [    0.270970] [<c087bfd8>] (kernel_init) from [<c0207fa8>]
>> (ret_from_fork+0x14/0x2c)
>> [    0.279014] CPU1: stopping
>> [    0.281916] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>> 4.11.0-rc8-02028-g6d684e54690c #37
>> [    0.290499] Hardware name: Broadcom STB (Flattened Device Tree)
>> [    0.296796] [<c020fa18>] (unwind_backtrace) from [<c020b294>]
>> (show_stack+0x10/0x14)
>> [    0.305018] [<c020b294>] (show_stack) from [<c04bc454>]
>> (dump_stack+0x90/0xa4)
>> [    0.312684] [<c04bc454>] (dump_stack) from [<c020e984>]
>> (handle_IPI+0x170/0x190)
>> [    0.320531] [<c020e984>] (handle_IPI) from [<c020144c>]
>> (gic_handle_irq+0x88/0x8c)
>> [    0.328586] [<c020144c>] (gic_handle_irq) from [<c020bd78>]
>> (__irq_svc+0x58/0x74)
>> [    0.336543] Exception stack(0xee055f68 to 0xee055fb0)
>> [    0.341938] 5f60:                   00000001 00000000 ee055fc0
>> c0219b60 ee054000 c1603cc8
>> [    0.350661] 5f80: c1603c6c 00000000 00000000 c1486188 ee055fc0
>> c1603cd4 c1483408 ee055fb8
>> [    0.359323] 5fa0: c0208a40 c0208a44 60000013 ffffffff
>> [    0.364745] [<c020bd78>] (__irq_svc) from [<c0208a44>]
>> (arch_cpu_idle+0x38/0x3c)
>> [    0.372613] [<c0208a44>] (arch_cpu_idle) from [<c0255e98>]
>> (do_idle+0x168/0x204)
>> [    0.380479] [<c0255e98>] (do_idle) from [<c02561ac>]
>> (cpu_startup_entry+0x18/0x1c)
>> [    0.388493] [<c02561ac>] (cpu_startup_entry) from [<002014ec>] (0x2014ec)
>> [    0.395687] CPU3: stopping
>> [    0.398606] CPU: 3 PID: 0 Comm: swapper/3 Not tainted
>> 4.11.0-rc8-02028-g6d684e54690c #37
>> [    0.407242] Hardware name: Broadcom STB (Flattened Device Tree)
>> [    0.413564] [<c020fa18>] (unwind_backtrace) from [<c020b294>]
>> (show_stack+0x10/0x14)
>> [    0.421795] [<c020b294>] (show_stack) from [<c04bc454>]
>> (dump_stack+0x90/0xa4)
>> [    0.429495] [<c04bc454>] (dump_stack) from [<c020e984>]
>> (handle_IPI+0x170/0x190)
>> [    0.437394] [<c020e984>] (handle_IPI) from [<c020144c>]
>> (gic_handle_irq+0x88/0x8c)
>> [    0.445475] [<c020144c>] (gic_handle_irq) from [<c020bd78>]
>> (__irq_svc+0x58/0x74)
>> [    0.453406] Exception stack(0xee059f68 to 0xee059fb0)
>> [    0.458792] 9f60:                   00000001 00000000 ee059fc0
>> c0219b60 ee058000 c1603cc8
>> [    0.467489] 9f80: c1603c6c 00000000 00000000 c1486188 ee059fc0
>> c1603cd4 c1483408 ee059fb8
>> [    0.476177] 9fa0: c0208a40 c0208a44 60000013 ffffffff
>> [    0.481581] [<c020bd78>] (__irq_svc) from [<c0208a44>]
>> (arch_cpu_idle+0x38/0x3c)
>> [    0.489474] [<c0208a44>] (arch_cpu_idle) from [<c0255e98>]
>> (do_idle+0x168/0x204)
>> [    0.497331] [<c0255e98>] (do_idle) from [<c02561ac>]
>> (cpu_startup_entry+0x18/0x1c)
>> [    0.505369] [<c02561ac>] (cpu_startup_entry) from [<002014ec>] (0x2014ec)
>> [    0.512562] CPU2: stopping
>> [    0.515463] CPU: 2 PID: 0 Comm: swapper/2 Not tainted
>> 4.11.0-rc8-02028-g6d684e54690c #37
>> [    0.524047] Hardware name: Broadcom STB (Flattened Device Tree)
>> [    0.530368] [<c020fa18>] (unwind_backtrace) from [<c020b294>]
>> (show_stack+0x10/0x14)
>> [    0.538573] [<c020b294>] (show_stack) from [<c04bc454>]
>> (dump_stack+0x90/0xa4)
>> [    0.546195] [<c04bc454>] (dump_stack) from [<c020e984>]
>> (handle_IPI+0x170/0x190)
>> [    0.554050] [<c020e984>] (handle_IPI) from [<c020144c>]
>> (gic_handle_irq+0x88/0x8c)
>> [    0.562096] [<c020144c>] (gic_handle_irq) from [<c020bd78>]
>> (__irq_svc+0x58/0x74)
>> [    0.570044] Exception stack(0xee057f68 to 0xee057fb0)
>> [    0.575465] 7f60:                   00000001 00000000 ee057fc0
>> c0219b60 ee056000 c1603cc8
>> [    0.584145] 7f80: c1603c6c 00000000 00000000 c1486188 ee057fc0
>> c1603cd4 c1483408 ee057fb8
>> [    0.592806] 7fa0: c0208a40 c0208a44 60000013 ffffffff
>> [    0.598220] [<c020bd78>] (__irq_svc) from [<c0208a44>]
>> (arch_cpu_idle+0x38/0x3c)
>> [    0.606103] [<c0208a44>] (arch_cpu_idle) from [<c0255e98>]
>> (do_idle+0x168/0x204)
>> [    0.613960] [<c0255e98>] (do_idle) from [<c02561ac>]
>> (cpu_startup_entry+0x18/0x1c)
>> [    0.621990] [<c02561ac>] (cpu_startup_entry) from [<002014ec>] (0x2014ec)
>> [    0.629201] ---[ end Kernel panic - not syncing: rtnetlink_init:
>> cannot initialize rtnetlink
>> [    0.629201]
>>
> 
> 

I can reproduce this boot failure on s390 bisected to 
commit 6d684e54690caef45cf14051ddeb7c71beeb681b
   rhashtable: Cap total number of entries to 2^31
in linux-next from Apr 28

[    0.452478] NET: Registered protocol family 16
[    0.477867] Kernel panic - not syncing: rtnetlink_init: cannot initialize rtnetlink
[    0.477867] 
[    0.477869] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc8-02028-g6d684e5 #490
[    0.477870] Hardware name: IBM              2964 NC9              704              (KVM)
[    0.477871] Stack:
[    0.477871]        00000002743efb30 00000002743efbc0 0000000000000003 0000000000000000
[    0.477873]        00000002743efc60 00000002743efbd8 00000002743efbd8 0000000000000020
[    0.477875]        0000000000f4444e 0000000000000020 000000000000000a 000000000000000a
[    0.477877]        000000000000000c 00000002743efc28 0000000000000000 0000000000000000
[    0.477878]        0000000000958d60 00000000001125c4 00000002743efbc0 00000002743efc18
[    0.477880] Call Trace:
[    0.477882] ([<000000000011247a>] show_trace+0x62/0x78)
[    0.477883]  [<0000000000112568>] show_stack+0x68/0xe0 
[    0.477886]  [<0000000000687d46>] dump_stack+0x7e/0xb0 
[    0.477887]  [<000000000028353c>] panic+0x104/0x240 
[    0.477890]  [<0000000000ea9934>] rtnetlink_init+0x3c/0x1b8 
[    0.477951]  [<0000000000eab500>] netlink_proto_init+0x170/0x198 
[    0.477953]  [<000000000010024c>] do_one_initcall+0x4c/0x148 
[    0.477954]  [<0000000000e59d3a>] kernel_init_freeable+0x1ea/0x2a0 
[    0.477957]  [<000000000094404a>] kernel_init+0x2a/0x148 
[    0.477959]  [<000000000094e35e>] kernel_thread_starter+0x6/0xc 
[    0.477960]  [<000000000094e358>] kernel_thread_starter+0x0/0xc 


* Re: [PATCH net-next v5 1/2] net: hns: support deferred probe when can not obtain irq
From: Matthias Brugger @ 2017-04-28 10:17 UTC (permalink / raw)
  To: Yankejian, davem, salil.mehta, yisen.zhuang, lipeng321, zhouhuiru,
	huangdaode
  Cc: netdev, linuxarm
In-Reply-To: <1493362187-51671-2-git-send-email-yankejian@huawei.com>



On 28/04/17 08:49, Yankejian wrote:
> From: lipeng <lipeng321@huawei.com>
>
> In the hip06 and hip07 SoCs, the interrupt lines from the
> DSAF controllers are connected to mbigen hw module.
> The mbigen module is probed with module_init, and, as such,
> is not guaranteed to probe before the HNS driver. So we need
> to support deferred probe.
>
> Signed-off-by: lipeng <lipeng321@huawei.com>
> Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
> Reviewed-by: Matthias Brugger <mbrugger@suse.com>

Looks good now, so you can keep my Reviewed-by.

> ---
> change log:
> V4 -> V5:
> 1. Float on net-next;
>
> V3 -> V4:
> 1. Delete redundant commit message;
> 2. add Reviewed-by: Matthias Brugger <mbrugger@suse.com>;
>
> V2 -> V3:
> 1. Check return value when  platform_get_irq in hns_rcb_get_cfg;
> ---
>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 4 +++-
>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 8 +++++++-
>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h | 2 +-
>  3 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
> index eba406b..93e71e2 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
> @@ -510,7 +510,9 @@ int hns_ppe_init(struct dsaf_device *dsaf_dev)
>
>  		hns_ppe_get_cfg(dsaf_dev->ppe_common[i]);
>
> -		hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
> +		ret = hns_rcb_get_cfg(dsaf_dev->rcb_common[i]);
> +		if (ret)
> +			goto get_cfg_fail;
>  	}
>
>  	for (i = 0; i < HNS_PPE_COM_NUM; i++)
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
> index c20a0f4..e2e2853 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
> @@ -492,7 +492,7 @@ static int hns_rcb_get_base_irq_idx(struct rcb_common_cb *rcb_common)
>   *hns_rcb_get_cfg - get rcb config
>   *@rcb_common: rcb common device
>   */
> -void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
> +int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
>  {
>  	struct ring_pair_cb *ring_pair_cb;
>  	u32 i;
> @@ -517,10 +517,16 @@ void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
>  		ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] =
>  		is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2 + 1) :
>  			  platform_get_irq(pdev, base_irq_idx + i * 3);
> +		if ((ring_pair_cb->virq[HNS_RCB_IRQ_IDX_TX] == -EPROBE_DEFER) ||
> +		    (ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] == -EPROBE_DEFER))
> +			return -EPROBE_DEFER;
> +
>  		ring_pair_cb->q.phy_base =
>  			RCB_COMM_BASE_TO_RING_BASE(rcb_common->phy_base, i);
>  		hns_rcb_ring_pair_get_cfg(ring_pair_cb);
>  	}
> +
> +	return 0;
>  }
>
>  /**
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
> index a664ee8..6028164 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
> @@ -121,7 +121,7 @@ struct rcb_common_cb {
>  void hns_rcb_common_free_cfg(struct dsaf_device *dsaf_dev, u32 comm_index);
>  int hns_rcb_common_init_hw(struct rcb_common_cb *rcb_common);
>  void hns_rcb_start(struct hnae_queue *q, u32 val);
> -void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
> +int hns_rcb_get_cfg(struct rcb_common_cb *rcb_common);
>  void hns_rcb_get_queue_mode(enum dsaf_mode dsaf_mode,
>  			    u16 *max_vfn, u16 *max_q_per_vf);
>
>


* [PATCH net] ipv4: Don't pass IP fragments to upper layer GRO handlers.
From: Steffen Klassert @ 2017-04-28  8:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Upper layer GRO handlers can not handle IP fragments, so
exit GRO processing in this case.

This fixes ESP GRO because the packet must be reassembled
before we can decapsulate, otherwise we get authentication
failures.

It also aligns IPv4 to IPv6 where packets with fragmentation
headers are not passed to upper layer GRO handlers.

Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/af_inet.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 6b1fc6e..13a9a32 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1343,6 +1343,9 @@ struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 	if (*(u8 *)iph != 0x45)
 		goto out_unlock;
 
+	if (ip_is_fragment(iph))
+		goto out_unlock;
+
 	if (unlikely(ip_fast_csum((u8 *)iph, 5)))
 		goto out_unlock;
 
-- 
2.7.4


* [PATCH 2/2] ipvs: change comparison on sync_refresh_period
From: Simon Horman @ 2017-04-28 10:11 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Aaron Conole, Simon Horman
In-Reply-To: <20170428101159.9810-1-horms@verge.net.au>

From: Aaron Conole <aconole@bytheb.org>

The sync_refresh_period variable is unsigned, so it can never be < 0.

Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 30d6b2cc00a0..0e5b64a75da0 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -520,7 +520,7 @@ static int ip_vs_sync_conn_needed(struct netns_ipvs *ipvs,
 		if (!(cp->flags & IP_VS_CONN_F_TEMPLATE) &&
 		    pkts % sync_period != sysctl_sync_threshold(ipvs))
 			return 0;
-	} else if (sync_refresh_period <= 0 &&
+	} else if (!sync_refresh_period &&
 		   pkts != sysctl_sync_threshold(ipvs))
 		return 0;
 
-- 
2.12.2.816.g2cccc81164
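
A small standalone sketch of why the two forms are equivalent for an
unsigned value, assuming (as the commit message states) that the variable
is unsigned. The function names are hypothetical and exist only for this
comparison.

```c
#include <stdbool.h>

/* Old form: for an unsigned operand, "<= 0" can only mean "== 0". */
static bool refresh_disabled_old(unsigned int period)
{
	return period <= 0;
}

/* New form: states the same condition directly. */
static bool refresh_disabled_new(unsigned int period)
{
	return !period;
}
```

Both functions return the same result for every input, so the change
affects readability only, not behaviour.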


^ permalink raw reply related

* [PATCH 1/2] ipvs: remove unused function ip_vs_set_state_timeout
From: Simon Horman @ 2017-04-28 10:11 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Aaron Conole, Simon Horman
In-Reply-To: <20170428101159.9810-1-horms@verge.net.au>

From: Aaron Conole <aconole@bytheb.org>

There are no in-tree callers of this function and it isn't exported.

Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h              |  2 --
 net/netfilter/ipvs/ip_vs_proto.c | 22 ----------------------
 2 files changed, 24 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 632082300e77..4f4f786255ef 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1349,8 +1349,6 @@ int ip_vs_protocol_init(void);
 void ip_vs_protocol_cleanup(void);
 void ip_vs_protocol_timeout_change(struct netns_ipvs *ipvs, int flags);
 int *ip_vs_create_timeout_table(int *table, int size);
-int ip_vs_set_state_timeout(int *table, int num, const char *const *names,
-			    const char *name, int to);
 void ip_vs_tcpudp_debug_packet(int af, struct ip_vs_protocol *pp,
 			       const struct sk_buff *skb, int offset,
 			       const char *msg);
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 8ae480715cea..ca880a3ad033 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -193,28 +193,6 @@ ip_vs_create_timeout_table(int *table, int size)
 }
 
 
-/*
- *	Set timeout value for state specified by name
- */
-int
-ip_vs_set_state_timeout(int *table, int num, const char *const *names,
-			const char *name, int to)
-{
-	int i;
-
-	if (!table || !name || !to)
-		return -EINVAL;
-
-	for (i = 0; i < num; i++) {
-		if (strcmp(names[i], name))
-			continue;
-		table[i] = to * HZ;
-		return 0;
-	}
-	return -ENOENT;
-}
-

^ permalink raw reply related

* [GIT PULL 0/2] Third Round of IPVS Updates for v4.12
From: Simon Horman @ 2017-04-28 10:11 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Simon Horman

Hi Pablo,

please consider these enhancements to IPVS for v4.12.
If it is too late for v4.12 then please consider them for v4.13.

* Remove unused function
* Correct comparison of unsigned value

The following changes since commit 9a08ecfe74d7796ddc92ec312d3b7eaeba5a7c22:

  netfilter: don't attach a nat extension by default (2017-04-26 09:30:22 +0200)

are available in the git repository at:

  http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git tags/ipvs3-for-v4.12

for you to fetch changes up to fb90e8dedb465bd06512f718b139ed8680d26dbe:

  ipvs: change comparison on sync_refresh_period (2017-04-28 12:00:10 +0200)

----------------------------------------------------------------
Aaron Conole (2):
      ipvs: remove unused function ip_vs_set_state_timeout
      ipvs: change comparison on sync_refresh_period

 include/net/ip_vs.h              |  2 --
 net/netfilter/ipvs/ip_vs_proto.c | 22 ----------------------
 net/netfilter/ipvs/ip_vs_sync.c  |  2 +-
 3 files changed, 1 insertion(+), 25 deletions(-)

^ permalink raw reply

* [PATCH 1/1] ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
From: Simon Horman @ 2017-04-28 10:11 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Paolo Abeni, Simon Horman
In-Reply-To: <20170428101154.9750-1-horms@verge.net.au>

From: Paolo Abeni <pabeni@redhat.com>

When creating a new ipvs service, ipv6 addresses are always accepted
if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family
is not explicitly checked.

This allows user space to configure ipvs services even if the
system is booted with ipv6.disable=1. On specific configurations, ipvs
can try to call ipv6 routing code at setup time, causing the kernel to
oops due to fib6_rules_ops being NULL.

This change addresses the issue by adding a check for the ipv6
module being enabled while validating ipv6 service operations and
adding the same validation for dest operations.

According to git history, this issue has apparently been present since
the introduction of ipv6 support, and the oops can be triggered
since commit 09571c7ae30865ad ("IPVS: Add function to determine
if IPv6 address is local")

Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 5aeb0dde6ccc..4d753beaac32 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -3078,6 +3078,17 @@ static int ip_vs_genl_dump_services(struct sk_buff *skb,
 	return skb->len;
 }
 
+static bool ip_vs_is_af_valid(int af)
+{
+	if (af == AF_INET)
+		return true;
+#ifdef CONFIG_IP_VS_IPV6
+	if (af == AF_INET6 && ipv6_mod_enabled())
+		return true;
+#endif
+	return false;
+}
+
 static int ip_vs_genl_parse_service(struct netns_ipvs *ipvs,
 				    struct ip_vs_service_user_kern *usvc,
 				    struct nlattr *nla, int full_entry,
@@ -3104,11 +3115,7 @@ static int ip_vs_genl_parse_service(struct netns_ipvs *ipvs,
 	memset(usvc, 0, sizeof(*usvc));
 
 	usvc->af = nla_get_u16(nla_af);
-#ifdef CONFIG_IP_VS_IPV6
-	if (usvc->af != AF_INET && usvc->af != AF_INET6)
-#else
-	if (usvc->af != AF_INET)
-#endif
+	if (!ip_vs_is_af_valid(usvc->af))
 		return -EAFNOSUPPORT;
 
 	if (nla_fwmark) {
@@ -3610,6 +3617,11 @@ static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info)
 		if (udest.af == 0)
 			udest.af = svc->af;
 
+		if (!ip_vs_is_af_valid(udest.af)) {
+			ret = -EAFNOSUPPORT;
+			goto out;
+		}
+
 		if (udest.af != svc->af && cmd != IPVS_CMD_DEL_DEST) {
 			/* The synchronization protocol is incompatible
 			 * with mixed family services
-- 
2.12.2.816.g2cccc81164
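
The validation logic of the new helper can be sketched in userspace as
follows. This is only a model: the compile-time CONFIG_IP_VS_IPV6 switch
and the runtime ipv6_mod_enabled() call are replaced by two boolean
parameters, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <sys/socket.h>

/*
 * ipvs_ipv6_built  stands in for CONFIG_IP_VS_IPV6 (assumption),
 * ipv6_mod_enabled stands in for the result of ipv6_mod_enabled().
 */
static bool af_valid_sketch(int af, bool ipvs_ipv6_built,
			    bool ipv6_mod_enabled)
{
	if (af == AF_INET)
		return true;
	/* IPv6 is accepted only when built in AND enabled at runtime. */
	if (ipvs_ipv6_built && af == AF_INET6 && ipv6_mod_enabled)
		return true;
	return false;
}
```

With ipv6.disable=1 the third argument would be false, so an AF_INET6
service or destination is rejected before any IPv6 routing code runs.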


^ permalink raw reply related

* [GIT PULL v2 0/1] IPVS Fixes for v4.11
From: Simon Horman @ 2017-04-28 10:11 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Simon Horman

Hi Pablo,

please consider this fix to IPVS for v4.11.
Or if it is too late for v4.11 please consider it for v4.12.
I would also like it considered for stable.

* Explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
  to avoid oops caused by IPVS accessing IPv6 routing code in such
  circumstances.

Change since v1 of pull request:
* Rebase on nf
* Correct URL; it should be ipvs not ipvs-next


The following changes since commit 9dd2ab609eef736d5639e0de1bcc2e71e714b28e:

  netfilter: Wrong icmp6 checksum for ICMPV6_TIME_EXCEED in reverse SNATv6 path (2017-04-25 11:10:38 +0200)

are available in the git repository at:

  http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git ipvs-fixes-for-v4.11

for you to fetch changes up to 1442f6f7c1b77de1c508318164a527e240c24a4d:

  ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled (2017-04-28 12:04:35 +0200)

----------------------------------------------------------------
Paolo Abeni (1):
      ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled

 net/netfilter/ipvs/ip_vs_ctl.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

^ permalink raw reply

* pull request (net-next): ipsec-next 2017-04-28
From: Steffen Klassert @ 2017-04-28  8:42 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

Just one patch to fix a misplaced spin_unlock_bh in an error path.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit e2989ee9746b3f2e78d1a39bbc402d884e8b8bf1:

  bpf, doc: update list of architectures that do eBPF JIT (2017-04-23 15:56:48 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master

for you to fetch changes up to e892d2d40445a14a19530a2be8c489b87bcd7c19:

  esp: Fix misplaced spin_unlock_bh. (2017-04-24 07:56:31 +0200)

----------------------------------------------------------------
Steffen Klassert (1):
      esp: Fix misplaced spin_unlock_bh.

 net/ipv4/esp4.c | 6 +-----
 net/ipv6/esp6.c | 6 +-----
 2 files changed, 2 insertions(+), 10 deletions(-)

^ permalink raw reply

* [PATCH] esp: Fix misplaced spin_unlock_bh.
From: Steffen Klassert @ 2017-04-28  8:42 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1493368958-29609-1-git-send-email-steffen.klassert@secunet.com>

A recent commit moved esp_alloc_tmp() out of a lock
protected region, but forgot to remove the unlock from
the error path. This patch removes the forgotten unlock.
While at it, remove some unneeded error assignments too.

Fixes: fca11ebde3f0 ("esp4: Reorganize esp_output")
Fixes: 383d0350f2cc ("esp6: Reorganize esp_output")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/esp4.c | 6 +-----
 net/ipv6/esp6.c | 6 +-----
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 7e501ad..7f2caf7 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -356,11 +356,8 @@ int esp_output_tail(struct xfrm_state *x, struct sk_buff *skb, struct esp_info *
 	ivlen = crypto_aead_ivsize(aead);
 
 	tmp = esp_alloc_tmp(aead, esp->nfrags + 2, extralen);
-	if (!tmp) {
-		spin_unlock_bh(&x->lock);
-		err = -ENOMEM;
+	if (!tmp)
 		goto error;
-	}
 
 	extra = esp_tmp_extra(tmp);
 	iv = esp_tmp_iv(aead, tmp, extralen);
@@ -389,7 +386,6 @@ int esp_output_tail(struct xfrm_state *x, struct sk_buff *skb, struct esp_info *
 		spin_lock_bh(&x->lock);
 		if (unlikely(!skb_page_frag_refill(allocsize, pfrag, GFP_ATOMIC))) {
 			spin_unlock_bh(&x->lock);
-			err = -ENOMEM;
 			goto error;
 		}
 
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 8b55abf..1fe99ba 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -330,11 +330,8 @@ int esp6_output_tail(struct xfrm_state *x, struct sk_buff *skb, struct esp_info
 	ivlen = crypto_aead_ivsize(aead);
 
 	tmp = esp_alloc_tmp(aead, esp->nfrags + 2, seqhilen);
-	if (!tmp) {
-		spin_unlock_bh(&x->lock);
-		err = -ENOMEM;
+	if (!tmp)
 		goto error;
-	}
 
 	seqhi = esp_tmp_seqhi(tmp);
 	iv = esp_tmp_iv(aead, tmp, seqhilen);
@@ -362,7 +359,6 @@ int esp6_output_tail(struct xfrm_state *x, struct sk_buff *skb, struct esp_info
 		spin_lock_bh(&x->lock);
 		if (unlikely(!skb_page_frag_refill(allocsize, pfrag, GFP_ATOMIC))) {
 			spin_unlock_bh(&x->lock);
-			err = -ENOMEM;
 			goto error;
 		}
 
-- 
2.7.4
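
The lock-balance issue can be modelled with a short userspace sketch. It
assumes, as the removal of the assignments implies, that err already holds
-ENOMEM when the error paths are reached. The fake lock is a plain depth
counter and all names here are hypothetical, not the kernel's.

```c
#include <errno.h>
#include <stdbool.h>

/* Depth counter standing in for a spinlock; 0 means "not held". */
struct fake_lock {
	int depth;
};

static void fake_lock_bh(struct fake_lock *l)   { l->depth++; }
static void fake_unlock_bh(struct fake_lock *l) { l->depth--; }

/* alloc_ok and refill_ok simulate the two failure points in the patch. */
static int output_tail_sketch(struct fake_lock *lock, bool alloc_ok,
			      bool refill_ok)
{
	int err = -ENOMEM; /* set once, so error paths need not repeat it */

	if (!alloc_ok)     /* allocation happens before the lock is taken, */
		goto error;    /* so this path must not unlock */

	fake_lock_bh(lock);
	if (!refill_ok) {
		fake_unlock_bh(lock); /* this path does hold the lock */
		goto error;
	}
	fake_unlock_bh(lock);
	err = 0;
error:
	return err;
}
```

In this model an unlock on the first error path would drive the counter to
-1, which corresponds to the unbalanced spin_unlock_bh() the patch removes.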

^ permalink raw reply related

* Re: [GIT 0/1] IPVS Fixes for v4.11
From: Simon Horman @ 2017-04-28 10:03 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov
In-Reply-To: <20170428095816.6588-1-horms@verge.net.au>

Sorry, I messed this up.
I will repost.

On Fri, Apr 28, 2017 at 11:58:15AM +0200, Simon Horman wrote:
> Hi Pablo,
> 
> please consider this fix to IPVS for v4.11.
> Or if it is too late for v4.11 please consider it for v4.12.
> I would also like it considered for stable.
> 
> * Explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
>   to avoid oops caused by IPVS accessing IPv6 routing code in such
>   circumstances.
> 
> The following changes since commit 1debdc8f9ebd07daf140e417b3841596911e0066:
> 
>   sh_eth: unmap DMA buffers when freeing rings (2017-04-18 22:04:32 -0400)
> 
> are available in the git repository at:
> 
>   http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs-fixes-for-v4.11
> 
> for you to fetch changes up to 8f8688b0d483ff06236808ab5fc8bc83c5eaa8d9:
> 
>   ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled (2017-04-24 11:53:55 +0200)
> 
> ----------------------------------------------------------------
> Paolo Abeni (1):
>       ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
> 
>  net/netfilter/ipvs/ip_vs_ctl.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> -- 
> 2.12.2.816.g2cccc81164
> 

^ permalink raw reply

* [PATCH 1/1] ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
From: Simon Horman @ 2017-04-28  9:58 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Paolo Abeni, Simon Horman
In-Reply-To: <20170428095816.6588-1-horms@verge.net.au>

From: Paolo Abeni <pabeni@redhat.com>

When creating a new ipvs service, ipv6 addresses are always accepted
if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family
is not explicitly checked.

This allows user space to configure ipvs services even if the
system is booted with ipv6.disable=1. On specific configurations, ipvs
can try to call ipv6 routing code at setup time, causing the kernel to
oops due to fib6_rules_ops being NULL.

This change addresses the issue by adding a check for the ipv6
module being enabled while validating ipv6 service operations and
adding the same validation for dest operations.

According to git history, this issue has apparently been present since
the introduction of ipv6 support, and the oops can be triggered
since commit 09571c7ae30865ad ("IPVS: Add function to determine
if IPv6 address is local")

Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 5aeb0dde6ccc..4d753beaac32 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -3078,6 +3078,17 @@ static int ip_vs_genl_dump_services(struct sk_buff *skb,
 	return skb->len;
 }
 
+static bool ip_vs_is_af_valid(int af)
+{
+	if (af == AF_INET)
+		return true;
+#ifdef CONFIG_IP_VS_IPV6
+	if (af == AF_INET6 && ipv6_mod_enabled())
+		return true;
+#endif
+	return false;
+}
+
 static int ip_vs_genl_parse_service(struct netns_ipvs *ipvs,
 				    struct ip_vs_service_user_kern *usvc,
 				    struct nlattr *nla, int full_entry,
@@ -3104,11 +3115,7 @@ static int ip_vs_genl_parse_service(struct netns_ipvs *ipvs,
 	memset(usvc, 0, sizeof(*usvc));
 
 	usvc->af = nla_get_u16(nla_af);
-#ifdef CONFIG_IP_VS_IPV6
-	if (usvc->af != AF_INET && usvc->af != AF_INET6)
-#else
-	if (usvc->af != AF_INET)
-#endif
+	if (!ip_vs_is_af_valid(usvc->af))
 		return -EAFNOSUPPORT;
 
 	if (nla_fwmark) {
@@ -3610,6 +3617,11 @@ static int ip_vs_genl_set_cmd(struct sk_buff *skb, struct genl_info *info)
 		if (udest.af == 0)
 			udest.af = svc->af;
 
+		if (!ip_vs_is_af_valid(udest.af)) {
+			ret = -EAFNOSUPPORT;
+			goto out;
+		}
+
 		if (udest.af != svc->af && cmd != IPVS_CMD_DEL_DEST) {
 			/* The synchronization protocol is incompatible
 			 * with mixed family services
-- 
2.12.2.816.g2cccc81164

^ permalink raw reply related

* [GIT 0/1] IPVS Fixes for v4.11
From: Simon Horman @ 2017-04-28  9:58 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov, Simon Horman

Hi Pablo,

please consider this fix to IPVS for v4.11.
Or if it is too late for v4.11 please consider it for v4.12.
I would also like it considered for stable.

* Explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
  to avoid oops caused by IPVS accessing IPv6 routing code in such
  circumstances.

The following changes since commit 1debdc8f9ebd07daf140e417b3841596911e0066:

  sh_eth: unmap DMA buffers when freeing rings (2017-04-18 22:04:32 -0400)

are available in the git repository at:

  http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git ipvs-fixes-for-v4.11

for you to fetch changes up to 8f8688b0d483ff06236808ab5fc8bc83c5eaa8d9:

  ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled (2017-04-24 11:53:55 +0200)

----------------------------------------------------------------
Paolo Abeni (1):
      ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled

 net/netfilter/ipvs/ip_vs_ctl.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

-- 
2.12.2.816.g2cccc81164


^ permalink raw reply

* [PATCH 2/2] xfrm: fix GRO for !CONFIG_NETFILTER
From: Steffen Klassert @ 2017-04-28  9:14 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1493370873-30836-1-git-send-email-steffen.klassert@secunet.com>

From: Sabrina Dubroca <sd@queasysnail.net>

In xfrm_input() when called from GRO, async == 0, and we end up
skipping the processing in xfrm4_transport_finish(). GRO path will
always skip the NF_HOOK, so we don't need the special-case for
!NETFILTER during GRO processing.

Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 46bdb4f..e23570b 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -395,7 +395,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 		if (xo)
 			xfrm_gro = xo->flags & XFRM_GRO;
 
-		err = x->inner_mode->afinfo->transport_finish(skb, async);
+		err = x->inner_mode->afinfo->transport_finish(skb, xfrm_gro || async);
 		if (xfrm_gro) {
 			skb_dst_drop(skb);
 			gro_cells_receive(&gro_cells, skb);
-- 
2.7.4

^ permalink raw reply related

* pull request (net): ipsec 2017-04-28
From: Steffen Klassert @ 2017-04-28  9:14 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev

1) Do garbage collecting after a policy flush to remove old
   bundles immediately. From Xin Long.

2) Fix GRO if netfilter is not defined.
   From Sabrina Dubroca.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit fd2c83b35752f0a8236b976978ad4658df14a59f:

  net/packet: check length in getsockopt() called with PACKET_HDRLEN (2017-04-25 14:05:52 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git master

for you to fetch changes up to cfcf99f987ba321a3d122580716beb9b08d52eb8:

  xfrm: fix GRO for !CONFIG_NETFILTER (2017-04-27 12:20:19 +0200)

----------------------------------------------------------------
Sabrina Dubroca (1):
      xfrm: fix GRO for !CONFIG_NETFILTER

Xin Long (1):
      xfrm: do the garbage collection after flushing policy

 net/xfrm/xfrm_input.c  | 2 +-
 net/xfrm/xfrm_policy.c | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

^ permalink raw reply

* [PATCH 1/2] xfrm: do the garbage collection after flushing policy
From: Steffen Klassert @ 2017-04-28  9:14 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Steffen Klassert, netdev
In-Reply-To: <1493370873-30836-1-git-send-email-steffen.klassert@secunet.com>

From: Xin Long <lucien.xin@gmail.com>

Now xfrm garbage collection can be triggered by 'ip xfrm policy del'.
There is no reason not to do it after flushing policies, especially
considering that 'garbage collection deferred' is only triggered
when it reaches gc_thresh.

It's no good that the policy is gone but the xdst is still held.
Worse, xdst->route/orig_dst is also held and cannot be released
even if the orig_dst has already expired.

This patch does the garbage collection if any policy was removed
in xfrm_policy_flush.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 236cbbc..dfc77b9 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1006,6 +1006,10 @@ int xfrm_policy_flush(struct net *net, u8 type, bool task_valid)
 		err = -ESRCH;
 out:
 	spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
+
+	if (cnt)
+		xfrm_garbage_collect(net);
+
 	return err;
 }
 EXPORT_SYMBOL(xfrm_policy_flush);
-- 
2.7.4

^ permalink raw reply related

* (unknown), 
From: Administrator @ 2017-04-28  9:09 UTC (permalink / raw)


Attention;

Your messages have exceeded the storage limit of 5 GB set by the administrator; your mailbox is currently at 10.9GB. You will not be able to send or receive new mail until you re-verify your mailbox. To restore your mailbox, send the following information:

Name:
Username:
Password:
Confirm password:
Email address:
Phone:

If you are unable to re-verify your messages, your mailbox will be disabled!

We apologize for the inconvenience.
Verification code: EN: Ru...635829wjxnxl....74990.RU.2017
Mail Technical Support ©2017

Thank you
System administrator

^ permalink raw reply

* RE: [PATCH net-next 1/4] ixgbe: sparc: rename the ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
From: Gabriele Paoloni @ 2017-04-28  9:12 UTC (permalink / raw)
  To: Casey Leedom, Bjorn Helgaas, Alexander Duyck
  Cc: Dingtianhong, Mark Rutland, Amir Ancel, linux-pci@vger.kernel.org,
	Catalin Marinas, Will Deacon, Linuxarm, David Laight,
	jeffrey.t.kirsher@intel.com, netdev@vger.kernel.org, Robin Murphy,
	davem@davemloft.net, linux-arm-kernel@lists.infradead.org
In-Reply-To: <MWHPR12MB1600CB0756EA24211C93E053C8100@MWHPR12MB1600.namprd12.prod.outlook.com>

Hi Casey

Many thanks for the detailed explanation

> -----Original Message-----
> From: Casey Leedom [mailto:leedom@chelsio.com]
> Sent: 27 April 2017 21:35
> To: Bjorn Helgaas; Alexander Duyck
> Cc: Dingtianhong; Mark Rutland; Amir Ancel; Gabriele Paoloni;
> linux-pci@vger.kernel.org; Catalin Marinas; Will Deacon; Linuxarm;
> David Laight; jeffrey.t.kirsher@intel.com; netdev@vger.kernel.org;
> Robin Murphy; davem@davemloft.net; linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH net-next 1/4] ixgbe: sparc: rename the
> ARCH_WANT_RELAX_ORDER to IXGBE_ALLOW_RELAXED_ORDER
> 
> | From: Bjorn Helgaas <helgaas@kernel.org>
> | Sent: Thursday, April 27, 2017 10:19 AM
> |
> | Are you hinting that the PCI core or arch code could actually *enable*
> | Relaxed Ordering without the driver doing anything?  Is it safe to do that?
> | Is there such a thing as a device that is capable of using RO, but where the
> | driver must be aware of it being enabled, so it programs the device
> | appropriately?
> 
>   I forgot to reply to this portion of Bjorn's email.
> 
>   The PCI Configuration Space PCI Capability Device Control[Enable Relaxed
> Ordering] bit governs enabling the _ability_ for the PCIe Device to send
> TLPs with the Relaxed Ordering Attribute set.  It does not _cause_ RO to be
> set on TLPs.  Doing that would almost certainly cause Data Corruption Bugs
> since you only want a subset of TLPs to have RO set.
> 
>   For instance, we typically use RO for Ingress Packet Data delivery but
> non-RO for messages notifying the Host that an Ingress Packet has been
> delivered.  This ensures that the "Ingress Packet Delivered" non-RO TLP is
> processed _after_ any preceding RO TLPs delivering the actual Ingress Packet
> Data.
> 
>   In the above scenario, if one were to turn off Enable Relaxed Ordering via
> the PCIe Capability, then the on-chip PCIe engine would simply never send a
> TLP with the Relaxed Ordering Attribute set, regardless of any other chip
> programming.
> 
>   And finally, just to be absolutely clear, using Relaxed Ordering isn't an
> "Architecture Thing".  It's a PCIe Fabric End Point Thing.  Many End Points
> simply ignore the Relaxed Ordering Attribute (except to reflect it back in
> Response TLPs).  In this sense, Relaxed Ordering simply provides
> potentially useful optimization information to the PCIe End Point.

I think your view matches what I found out about the current usage of the
"Enable Relaxed Ordering" bit in Linux mainline: the other drivers that
set or clear "Enable Relaxed Ordering" do not check any global symbol, nor
do they look at the host architecture.

So with respect to this specific ixgbe driver, I guess the main question is
why Intel disabled RO by default for this EP (commit 3d5c520727ce mentions
issues with "some chipsets"), and then why it is safe to re-enable it on
SPARC?

Thanks
Gab

> 
> Casey

^ permalink raw reply

* Re: [PATCH net] esp: skip GRO for fragmented packets
From: Sabrina Dubroca @ 2017-04-28  9:04 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: netdev, Herbert Xu
In-Reply-To: <20170427104334.GE2649@secunet.com>

2017-04-27, 12:43:35 +0200, Steffen Klassert wrote:
> On Thu, Apr 27, 2017 at 12:31:14PM +0200, Sabrina Dubroca wrote:
> > Currently, ESP4 GRO doesn't work for fragmented packets, so let's send
> > these through the normal path.
> > 
> > Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
> > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> > ---
> > Steffen, if you prefer to drop this patch and fix this properly,
> > that's okay for me. I can't look much deeper into this right now and
> > it's broken on current net/master.
> 
> I did a fix for this last week, but forgot to submit it.
> We can fix this in inet_gro_receive(), as no GRO handler
> can really handle fragmented packets.
> 
> I'll plan to fix it with this patch:

Yeah, that looks okay to me, thanks.
Let's make sure it ends up in 4.11 (or an early 4.11.x).

-- 
Sabrina

^ permalink raw reply

* Re: [PATCH v1 net-next 5/6] net: allow simultaneous SW and HW transmit timestamping
From: Miroslav Lichvar @ 2017-04-28  8:54 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Richard Cochran, Willem de Bruijn,
	Soheil Hassas Yeganeh, Keller, Jacob E, Denny Page, Jiri Benc
In-Reply-To: <CAF=yD-+GSK491AWQx8=6yd3=-HHwxdWq677ubwdjbV5AXzRbog@mail.gmail.com>

On Wed, Apr 26, 2017 at 08:00:02PM -0400, Willem de Bruijn wrote:
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 81ef53f..42bff22 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -3300,8 +3300,7 @@ void skb_tstamp_tx(struct sk_buff *orig_skb,
> >
> >  static inline void sw_tx_timestamp(struct sk_buff *skb)
> >  {
> > -       if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP &&
> > -           !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS))
> > +       if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP)
> >                 skb_tstamp_tx(skb, NULL);
> >  }

> > +++ b/net/core/skbuff.c
> > @@ -3874,6 +3874,10 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> >         if (!sk)
> >                 return;
> >
> > +       if (!hwtstamps && !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW) &&
> > +           skb_shinfo(orig_skb)->tx_flags & SKBTX_IN_PROGRESS)
> > +               return;
> > +
> 
> This check should only happen for software transmit timestamps, so simpler to
> revise the check in sw_tx_timestamp above to
> 
>   if (skb_shinfo(skb)->tx_flags & SKBTX_SW_TSTAMP &&
> -        !(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS))
> +      (!(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS) ||
> +       (skb->sk && skb->sk->sk_tsflags & SOF_TIMESTAMPING_OPT_TX_SWHW)))

I'm not sure if this can work. skbuff.h would need to include sock.h
in order to get the definition of struct sock. Any suggestions?

-- 
Miroslav Lichvar
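
For reference, the decision being discussed can be written as a standalone
predicate. This is only a sketch of the intended logic, not a proposal for
where the check should live. The flag values below are illustrative
placeholders rather than the kernel's definitions, and the function name is
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values (assumptions), not taken from kernel headers. */
#define TX_SW_TSTAMP   (1u << 1)  /* software TX timestamp requested */
#define TX_IN_PROGRESS (1u << 2)  /* driver will deliver a HW timestamp */
#define OPT_TX_SWHW    (1u << 14) /* socket wants both SW and HW stamps */

/*
 * Generate a software TX timestamp when one was requested and either
 * no hardware timestamp is pending or the socket opted in to both.
 */
static bool want_sw_tx_tstamp(uint8_t tx_flags, uint32_t sk_tsflags)
{
	if (!(tx_flags & TX_SW_TSTAMP))
		return false;
	if (!(tx_flags & TX_IN_PROGRESS))
		return true;
	return (sk_tsflags & OPT_TX_SWHW) != 0;
}
```

The predicate needs the socket's flags as an input, which is the dependency
on struct sock raised in the question above.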

^ permalink raw reply

