Netdev List


* Re: [PATCH bpf-next 05/13] bpf: get better bpf_prog ksyms based on btf func type_id
From: Alexei Starovoitov @ 2018-10-16 17:59 UTC
  To: Yonghong Song; +Cc: ast, kafai, daniel, netdev, kernel-team
In-Reply-To: <20181012185446.2379289-1-yhs@fb.com>

On Fri, Oct 12, 2018 at 11:54:42AM -0700, Yonghong Song wrote:
> This patch added interface to load a program with the following
> additional information:
>    . prog_btf_fd
>    . func_info and func_info_len
> where func_info will provide function range and type_id
> corresponding to each function.
> 
> If verifier agrees with function range provided by the user,
> the bpf_prog ksym for each function will use the func name
> provided in the type_id, which is supposed to provide better
> encoding as it is not limited by 16 bytes program name
> limitation and this is better for bpf program which contains
> multiple subprograms.
> 
> The bpf_prog_info interface is also extended to
> return btf_id and jited_func_types, so user spaces can
> print out the function prototype for each jited function.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
...
>  	BUILD_BUG_ON(sizeof("bpf_prog_") +
>  		     sizeof(prog->tag) * 2 +
> @@ -401,6 +403,13 @@ static void bpf_get_prog_name(const struct bpf_prog *prog, char *sym)
>  
>  	sym += snprintf(sym, KSYM_NAME_LEN, "bpf_prog_");
>  	sym  = bin2hex(sym, prog->tag, sizeof(prog->tag));
> +
> +	if (prog->aux->btf) {
> +		func_name = btf_get_name_by_id(prog->aux->btf, prog->aux->type_id);
> +		snprintf(sym, (size_t)(end - sym), "_%s", func_name);
> +		return;

Would it make sense to add a comment here that prog->aux->name is ignored
when full btf name is available? (otherwise the same name will appear twice in ksym)

> +	}
> +
>  	if (prog->aux->name[0])
>  		snprintf(sym, (size_t)(end - sym), "_%s", prog->aux->name);
...
> +static int check_btf_func(struct bpf_prog *prog, struct bpf_verifier_env *env,
> +			  union bpf_attr *attr)
> +{
> +	struct bpf_func_info *data;
> +	int i, nfuncs, ret = 0;
> +
> +	if (!attr->func_info_len)
> +		return 0;
> +
> +	nfuncs = attr->func_info_len / sizeof(struct bpf_func_info);
> +	if (env->subprog_cnt != nfuncs) {
> +		verbose(env, "number of funcs in func_info does not match verifier\n");

'does not match verifier' is hard to make sense of.
How about 'number of funcs in func_info doesn't match number of subprogs' ?

> +		return -EINVAL;
> +	}
> +
> +	data = kvmalloc(attr->func_info_len, GFP_KERNEL | __GFP_NOWARN);
> +	if (!data) {
> +		verbose(env, "no memory to allocate attr func_info\n");

I don't think we ever print such warnings for memory allocations.
imo this can be removed, since enomem is enough.

> +		return -ENOMEM;
> +	}
> +
> +	if (copy_from_user(data, u64_to_user_ptr(attr->func_info),
> +			   attr->func_info_len)) {
> +		verbose(env, "memory copy error for attr func_info\n");

similar thing. kernel never warns about copy_from_user errors.

> +		ret = -EFAULT;
> +		goto cleanup;
> +	}
> +
> +	for (i = 0; i < nfuncs; i++) {
> +		if (env->subprog_info[i].start != data[i].insn_offset) {
> +			verbose(env, "func_info subprog start (%d) does not match verifier (%d)\n",
> +				env->subprog_info[i].start, data[i].insn_offset);

I think printing the exact insn offset isn't going to be much help
for a regular user trying to debug it. If this happens, it's likely an llvm issue.
How about 'func_info BTF section doesn't match subprog layout in BPF program' ?


* Re: [PATCH net] net/sched: properly init chain in case of multiple control actions
From: Davide Caratti @ 2018-10-16 17:38 UTC
  To: Cong Wang
  Cc: Jiri Pirko, Jamal Hadi Salim, David Miller,
	Linux Kernel Network Developers
In-Reply-To: <CAM_iQpWShGR3Kq+6bYs6UbdzPq0XuM86bQV2B2GkV3MaeTNQZA@mail.gmail.com>

On Mon, 2018-10-15 at 11:31 -0700, Cong Wang wrote:
> On Sat, Oct 13, 2018 at 8:23 AM Davide Caratti <dcaratti@redhat.com> wrote:
> > 
> > On Fri, 2018-10-12 at 13:57 -0700, Cong Wang wrote:
> > > Why not just validate the fallback action in each action init()?
> > > For example, checking tcfg_paction in tcf_gact_init().
> > > 
> > > I don't see the need of making it generic.
...
> > A (legal?) trick  is to let tcf_action store the fallback action when it
> > contains a 'goto chain' command, I just posted a proposal for gact. If you
> > think it's ok, I will test and post the same for act_police.
> 
> Do we really need to support TC_ACT_GOTO_CHAIN for
> gact->tcfg_paction etc.? I mean, is it useful in practice or is it just for
> completeness?
> 
> If we don't need to support it, we can just make it invalid without needing
> to initialize it in ->init() at all.
> 
> If we do, however, we really need to move it into each ->init(), because
> we have to lock each action if we are modifying an existing one. With
> your patch, tcf_action_goto_chain_init() is still called without the per-action
> lock.
> 
> What's more, if we support two different actions in gact, that is, tcfg_paction
> and tcf_action, how could you still only have one a->goto_chain pointer?
> There should be two pointers for each of them. :)

whatever fixes the NULL dereference is OK for me.
I thought that the proposal made with

https://www.mail-archive.com/netdev@vger.kernel.org/msg251933.html

(i.e., letting init() copy tcfg_paction to tcf_action in case it contained
'goto chain x') was smart enough to preserve the current behavior, and
also let 'goto chain' work in case it was configured  *only* for the
fallback action.
When the action is modified, the change to tcfg_paction is done with the
same spinlock as tcf_action, so I didn't notice anything worse than the
current locking layout. 

(well, after some more thinking I looked again at that patch and yes, it
lacked the most important thing:)

--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -88,6 +88,9 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
                p_parm = nla_data(tb[TCA_GACT_PROB]);
                if (p_parm->ptype >= MAX_RAND)
                        return -EINVAL;
+               if (TC_ACT_EXT_CMP(p_parm->paction, TC_ACT_GOTO_CHAIN) &&
+                   TC_ACT_EXT_CMP(parm->action, TC_ACT_GOTO_CHAIN))
+                       return -EINVAL;
        }
 #endif

That said, 'goto chain' never worked for police and gact since the first
introduction of 'goto chain', so we are not breaking any userspace program.
And I don't necessarily need 'goto chain' in police and gact fallback
actions; nobody complained in 1 year, so we can just add these two lines
in tcf_gact_init() and something similar in tcf_police_init():


                if (p_parm->ptype >= MAX_RAND)
                        return -EINVAL;
+               if (TC_ACT_EXT_CMP(p_parm->paction, TC_ACT_GOTO_CHAIN))
+                       return -EINVAL;


(and maybe also help users with a proper extack). Just let me know which
approach you prefer, I will test and send patches.
thanks!

-- 
davide
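
For readers unfamiliar with the opcode encoding, the simpler of the two
checks above (reject 'goto chain' as a fallback action) can be sketched
in standalone C. The DEMO_* macros are local, hypothetical copies written
from memory of the TC_ACT_EXT_* helpers in include/uapi/linux/pkt_cls.h;
treat the shift and opcode values as assumptions and check the header
before relying on them.

```c
#include <stdbool.h>

/* Assumed layout: the top bits of the action hold an extended opcode,
 * the low 28 bits hold its argument (e.g. the chain index).
 */
#define DEMO_ACT_EXT_SHIFT	28
#define DEMO_ACT_EXT_VAL_MASK	((1u << DEMO_ACT_EXT_SHIFT) - 1)
#define DEMO_ACT_EXT_OPCODE(c)	((c) & ~DEMO_ACT_EXT_VAL_MASK)
#define DEMO_ACT_EXT_CMP(c, op)	(DEMO_ACT_EXT_OPCODE(c) == (op))
#define DEMO_ACT_GOTO_CHAIN	(2u << DEMO_ACT_EXT_SHIFT)

/* Hypothetical helper: true when the fallback action is acceptable,
 * i.e. it is not a 'goto chain' command, whatever the chain index is.
 */
static bool demo_paction_valid(unsigned int paction)
{
	return !DEMO_ACT_EXT_CMP(paction, DEMO_ACT_GOTO_CHAIN);
}
```

The comparison masks out the low bits first, so 'goto chain 5' and
'goto chain 7' are both rejected, while plain verdicts pass.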


* [bpf-next PATCH] bpf: sockmap, fix skmsg recvmsg handler to track size correctly
From: John Fastabend @ 2018-10-16 17:36 UTC
  To: ast, daniel; +Cc: netdev

When converting sockmap to new skmsg generic data structures we missed
that the recvmsg handler did not correctly use sg.size and instead was
using individual elements length. The result is if a sock is closed
with outstanding data we omit the call to sk_mem_uncharge() and can
get the warning below.

[   66.728282] WARNING: CPU: 6 PID: 5783 at net/core/stream.c:206 sk_stream_kill_queues+0x1fa/0x210

To fix this, correct the redirect handler to transfer the size along with
the scatterlist and also decrement the size from the recvmsg handler.
Now when a sock is closed the remaining 'size' will be decremented
with sk_mem_uncharge().
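
The accounting rule described above can be modeled in a few lines of
standalone C. This is only a sketch under simplifying assumptions:
struct demo_msg, demo_xfer() and demo_consume() are hypothetical
stand-ins for struct sk_msg, sk_msg_xfer() and the copy loop in
__tcp_bpf_recvmsg(), and memory charging is left out entirely. The point
it shows is that 'size' has to follow the element lengths on both paths.

```c
#define DEMO_MAX_FRAGS 4

struct demo_sge {
	unsigned int offset;
	unsigned int length;
};

struct demo_msg {
	unsigned int size;	/* must equal the sum of data[i].length */
	struct demo_sge data[DEMO_MAX_FRAGS];
};

/* Move 'size' bytes of element 'which' from src to dst, bumping
 * dst->size as the fixed sk_msg_xfer() does.
 */
static void demo_xfer(struct demo_msg *dst, struct demo_msg *src,
		      int which, unsigned int size)
{
	dst->data[which] = src->data[which];
	dst->data[which].length = size;
	dst->size += size;
	src->data[which].length -= size;
	src->data[which].offset += size;
}

/* Consume 'copy' bytes from element 'which', as the recvmsg path does,
 * and keep the total in step with the element length.
 */
static void demo_consume(struct demo_msg *msg, int which, unsigned int copy)
{
	msg->data[which].offset += copy;
	msg->data[which].length -= copy;
	msg->size -= copy;
}

/* Recompute the total from the elements, to check the invariant. */
static unsigned int demo_sum_lengths(const struct demo_msg *msg)
{
	unsigned int sum = 0;
	int i;

	for (i = 0; i < DEMO_MAX_FRAGS; i++)
		sum += msg->data[i].length;
	return sum;
}
```

With both updates in place, whatever is left in 'size' at close time is
exactly the amount that still has to be uncharged.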

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/skmsg.h |    1 +
 net/ipv4/tcp_bpf.c    |    1 +
 2 files changed, 2 insertions(+)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 0b919f0..31df0d9 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -176,6 +176,7 @@ static inline void sk_msg_xfer(struct sk_msg *dst, struct sk_msg *src,
 {
 	dst->sg.data[which] = src->sg.data[which];
 	dst->sg.data[which].length  = size;
+	dst->sg.size		   += size;
 	src->sg.data[which].length -= size;
 	src->sg.data[which].offset += size;
 }
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 80debb0..f9d3cf1 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -73,6 +73,7 @@ int __tcp_bpf_recvmsg(struct sock *sk, struct sk_psock *psock,
 			sge->offset += copy;
 			sge->length -= copy;
 			sk_mem_uncharge(sk, copy);
+			msg_rx->sg.size -= copy;
 			if (!sge->length) {
 				i++;
 				if (i == MAX_SKB_FRAGS)


* [PATCH net] r8169: re-enable MSI-X on RTL8168g
From: Heiner Kallweit @ 2018-10-16 17:35 UTC
  To: David Miller, Realtek linux nic maintainers; +Cc: netdev@vger.kernel.org

Similar to d49c88d7677b ("r8169: Enable MSI-X on RTL8106e") after
e9d0ba506ea8 ("PCI: Reprogram bridge prefetch registers on resume")
we can safely assume that this also fixes the root cause of
the issue worked around by 7c53a722459c ("r8169: don't use MSI-X on
RTL8168g"). So let's revert it.

Fixes: 7c53a722459c ("r8169: don't use MSI-X on RTL8168g")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index f4df367fb..28184b984 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7098,11 +7098,6 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
 		RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
 		RTL_W8(tp, Cfg9346, Cfg9346_Lock);
 		flags = PCI_IRQ_LEGACY;
-	} else if (tp->mac_version == RTL_GIGA_MAC_VER_40) {
-		/* This version was reported to have issues with resume
-		 * from suspend when using MSI-X
-		 */
-		flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI;
 	} else {
 		flags = PCI_IRQ_ALL_TYPES;
 	}
-- 
2.19.1


* Re: [PATCH net] netfilter: fix DNAT target for shifted portmap ranges
From: Pablo Neira Ayuso @ 2018-10-16 17:35 UTC
  To: Paolo Abeni
  Cc: netdev, Thierry Du Tre, Florian Westphal, David S. Miller,
	netfilter-devel
In-Reply-To: <e59ead42affbd4280e678a7c77eda13106d40984.1539701235.git.pabeni@redhat.com>

On Tue, Oct 16, 2018 at 04:52:05PM +0200, Paolo Abeni wrote:
> The commit 2eb0f624b709 ("netfilter: add NAT support for shifted
> portmap ranges") did not set the checkentry/destroy callbacks for
> the newly added DNAT target. As a result, rulesets using only
> such nat targets are not effective, as the relevant conntrack hooks
> are not enabled.
> The above also affects nft_compat rulesets.
> Fix the issue by adding the missing initializers.

Applied, thanks Paolo.


* [PATCH 4.9 68/71] ip: add helpers to process in-order fragments faster.
From: Greg Kroah-Hartman @ 2018-10-16 17:10 UTC
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Willem de Bruijn, Peter Oskolkov,
	Eric Dumazet, Florian Westphal, David S. Miller
In-Reply-To: <20181016170539.315587743@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Peter Oskolkov <posk@google.com>

This patch introduces several helper functions/macros that will be
used in the follow-up patch. No runtime changes yet.

The new logic (fully implemented in the second patch) is as follows:

* Nodes in the rb-tree will now contain not single fragments, but lists
  of consecutive fragments ("runs").

* At each point in time, the current "active" run at the tail is
  maintained/tracked. Fragments that arrive in-order, adjacent
  to the previous tail fragment, are added to this tail run without
  triggering the re-balancing of the rb-tree.

* If a fragment arrives out of order with the offset _before_ the tail run,
  it is inserted into the rb-tree as a single fragment.

* If a fragment arrives after the current tail fragment (with a gap),
  it starts a new "tail" run, and is inserted into the rb-tree
  at the end as the head of the new run.

skb->cb is used to store additional information
needed here (suggested by Eric Dumazet).
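
The placement rules listed above can be sketched as a small decision
function. This is a userspace illustration only: struct demo_queue,
demo_classify() and demo_add() are hypothetical names, the rb-tree itself
is not modeled, and overlapping fragments are simply treated as
"before the tail", which is a simplification not taken from the patch.

```c
enum demo_placement {
	DEMO_APPEND_TO_TAIL_RUN,	/* adjacent to the tail: no rb-tree work */
	DEMO_NEW_TAIL_RUN,		/* gap after the tail: new node at the end */
	DEMO_INSERT_SINGLE,		/* before the tail: ordinary rb-tree insert */
};

struct demo_queue {
	int have_tail;			/* non-zero once a tail run exists */
	unsigned int tail_end;		/* end offset of the current tail fragment */
};

/* Decide where a fragment starting at 'offset' would go. */
static enum demo_placement demo_classify(const struct demo_queue *q,
					 unsigned int offset)
{
	if (!q->have_tail)
		return DEMO_NEW_TAIL_RUN;
	if (offset == q->tail_end)
		return DEMO_APPEND_TO_TAIL_RUN;
	if (offset > q->tail_end)
		return DEMO_NEW_TAIL_RUN;
	return DEMO_INSERT_SINGLE;
}

/* Classify a fragment and advance the tail when it lands at the end. */
static enum demo_placement demo_add(struct demo_queue *q,
				    unsigned int offset, unsigned int len)
{
	enum demo_placement p = demo_classify(q, offset);

	if (p != DEMO_INSERT_SINGLE) {
		q->have_tail = 1;
		q->tail_end = offset + len;
	}
	return p;
}
```

In the common in-order case every fragment after the first takes the
DEMO_APPEND_TO_TAIL_RUN path, which is the case the helpers are meant
to make cheap.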

Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 353c9cb360874e737fb000545f783df756c06f9a)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/inet_frag.h |    6 +++
 net/ipv4/ip_fragment.c  |   73 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -56,7 +56,9 @@ struct frag_v6_compare_key {
  * @lock: spinlock protecting this frag
  * @refcnt: reference count of the queue
  * @fragments: received fragments head
+ * @rb_fragments: received fragments rb-tree root
  * @fragments_tail: received fragments tail
+ * @last_run_head: the head of the last "run". see ip_fragment.c
  * @stamp: timestamp of the last received fragment
  * @len: total length of the original datagram
  * @meat: length of received fragments so far
@@ -77,6 +79,7 @@ struct inet_frag_queue {
 	struct sk_buff		*fragments;  /* Used in IPv6. */
 	struct rb_root		rb_fragments; /* Used in IPv4. */
 	struct sk_buff		*fragments_tail;
+	struct sk_buff		*last_run_head;
 	ktime_t			stamp;
 	int			len;
 	int			meat;
@@ -112,6 +115,9 @@ void inet_frag_kill(struct inet_frag_que
 void inet_frag_destroy(struct inet_frag_queue *q);
 struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key);
 
+/* Free all skbs in the queue; return the sum of their truesizes. */
+unsigned int inet_frag_rbtree_purge(struct rb_root *root);
+
 static inline void inet_frag_put(struct inet_frag_queue *q)
 {
 	if (atomic_dec_and_test(&q->refcnt))
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -56,6 +56,57 @@
  */
 static const char ip_frag_cache_name[] = "ip4-frags";
 
+/* Use skb->cb to track consecutive/adjacent fragments coming at
+ * the end of the queue. Nodes in the rb-tree queue will
+ * contain "runs" of one or more adjacent fragments.
+ *
+ * Invariants:
+ * - next_frag is NULL at the tail of a "run";
+ * - the head of a "run" has the sum of all fragment lengths in frag_run_len.
+ */
+struct ipfrag_skb_cb {
+	struct inet_skb_parm	h;
+	struct sk_buff		*next_frag;
+	int			frag_run_len;
+};
+
+#define FRAG_CB(skb)		((struct ipfrag_skb_cb *)((skb)->cb))
+
+static void ip4_frag_init_run(struct sk_buff *skb)
+{
+	BUILD_BUG_ON(sizeof(struct ipfrag_skb_cb) > sizeof(skb->cb));
+
+	FRAG_CB(skb)->next_frag = NULL;
+	FRAG_CB(skb)->frag_run_len = skb->len;
+}
+
+/* Append skb to the last "run". */
+static void ip4_frag_append_to_last_run(struct inet_frag_queue *q,
+					struct sk_buff *skb)
+{
+	RB_CLEAR_NODE(&skb->rbnode);
+	FRAG_CB(skb)->next_frag = NULL;
+
+	FRAG_CB(q->last_run_head)->frag_run_len += skb->len;
+	FRAG_CB(q->fragments_tail)->next_frag = skb;
+	q->fragments_tail = skb;
+}
+
+/* Create a new "run" with the skb. */
+static void ip4_frag_create_run(struct inet_frag_queue *q, struct sk_buff *skb)
+{
+	if (q->last_run_head)
+		rb_link_node(&skb->rbnode, &q->last_run_head->rbnode,
+			     &q->last_run_head->rbnode.rb_right);
+	else
+		rb_link_node(&skb->rbnode, NULL, &q->rb_fragments.rb_node);
+	rb_insert_color(&skb->rbnode, &q->rb_fragments);
+
+	ip4_frag_init_run(skb);
+	q->fragments_tail = skb;
+	q->last_run_head = skb;
+}
+
 /* Describe an entry in the "incomplete datagrams" queue. */
 struct ipq {
 	struct inet_frag_queue q;
@@ -652,6 +703,28 @@ struct sk_buff *ip_check_defrag(struct n
 }
 EXPORT_SYMBOL(ip_check_defrag);
 
+unsigned int inet_frag_rbtree_purge(struct rb_root *root)
+{
+	struct rb_node *p = rb_first(root);
+	unsigned int sum = 0;
+
+	while (p) {
+		struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode);
+
+		p = rb_next(p);
+		rb_erase(&skb->rbnode, root);
+		while (skb) {
+			struct sk_buff *next = FRAG_CB(skb)->next_frag;
+
+			sum += skb->truesize;
+			kfree_skb(skb);
+			skb = next;
+		}
+	}
+	return sum;
+}
+EXPORT_SYMBOL(inet_frag_rbtree_purge);
+
 #ifdef CONFIG_SYSCTL
 static int dist_min;
 


* [PATCH 4.9 66/71] net: add rb_to_skb() and other rb tree helpers
From: Greg Kroah-Hartman @ 2018-10-16 17:10 UTC
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller
In-Reply-To: <20181016170539.315587743@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

Generalize private netem_rb_to_skb()

TCP rtx queue will soon be converted to rb-tree,
so we will need skb_rbtree_walk() helpers.
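
As an aside, the "safe" walk idiom used by skb_rbtree_walk_from_safe()
can be tried out in userspace. The sketch below swaps the rb-tree for a
plain singly linked list, so it only demonstrates the iteration pattern
(fetch the successor before the loop body runs), not the tree helpers.
All demo_* names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_node {
	int val;
	struct demo_node *next;
};

#define demo_next(n)	((n)->next)

/* Same shape as skb_rbtree_walk_from_safe(): tmp is loaded in the loop
 * condition, before the body, so the body is free to release n.
 */
#define demo_walk_from_safe(n, tmp)					\
	for (; tmp = n ? demo_next(n) : NULL, (n != NULL);		\
	     n = tmp)

/* Push a new node at the head; returns the old head if malloc fails. */
static struct demo_node *demo_push(struct demo_node *head, int val)
{
	struct demo_node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->val = val;
	n->next = head;
	return n;
}

/* Free every node starting at 'head' and return how many were freed. */
static int demo_free_all(struct demo_node *head)
{
	struct demo_node *n = head, *tmp;
	int freed = 0;

	demo_walk_from_safe(n, tmp) {
		free(n);	/* safe: the successor is already in tmp */
		freed++;
	}
	return freed;
}
```

A non-safe walk would read the next pointer from a node that was just
freed; here that read happens before the body runs.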

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 18a4c0eab2623cc95be98a1e6af1ad18e7695977)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/skbuff.h |   18 ++++++++++++++++++
 net/ipv4/tcp_input.c   |   33 ++++++++++++---------------------
 2 files changed, 30 insertions(+), 21 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2988,6 +2988,12 @@ static inline int __skb_grow_rcsum(struc
 
 #define rb_to_skb(rb) rb_entry_safe(rb, struct sk_buff, rbnode)
 
+#define rb_to_skb(rb) rb_entry_safe(rb, struct sk_buff, rbnode)
+#define skb_rb_first(root) rb_to_skb(rb_first(root))
+#define skb_rb_last(root)  rb_to_skb(rb_last(root))
+#define skb_rb_next(skb)   rb_to_skb(rb_next(&(skb)->rbnode))
+#define skb_rb_prev(skb)   rb_to_skb(rb_prev(&(skb)->rbnode))
+
 #define skb_queue_walk(queue, skb) \
 		for (skb = (queue)->next;					\
 		     skb != (struct sk_buff *)(queue);				\
@@ -3002,6 +3008,18 @@ static inline int __skb_grow_rcsum(struc
 		for (; skb != (struct sk_buff *)(queue);			\
 		     skb = skb->next)
 
+#define skb_rbtree_walk(skb, root)						\
+		for (skb = skb_rb_first(root); skb != NULL;			\
+		     skb = skb_rb_next(skb))
+
+#define skb_rbtree_walk_from(skb)						\
+		for (; skb != NULL;						\
+		     skb = skb_rb_next(skb))
+
+#define skb_rbtree_walk_from_safe(skb, tmp)					\
+		for (; tmp = skb ? skb_rb_next(skb) : NULL, (skb != NULL);	\
+		     skb = tmp)
+
 #define skb_queue_walk_from_safe(queue, skb, tmp)				\
 		for (tmp = skb->next;						\
 		     skb != (struct sk_buff *)(queue);				\
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4406,7 +4406,7 @@ static void tcp_ofo_queue(struct sock *s
 
 	p = rb_first(&tp->out_of_order_queue);
 	while (p) {
-		skb = rb_entry(p, struct sk_buff, rbnode);
+		skb = rb_to_skb(p);
 		if (after(TCP_SKB_CB(skb)->seq, tp->rcv_nxt))
 			break;
 
@@ -4470,7 +4470,7 @@ static int tcp_try_rmem_schedule(struct
 static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct rb_node **p, *q, *parent;
+	struct rb_node **p, *parent;
 	struct sk_buff *skb1;
 	u32 seq, end_seq;
 	bool fragstolen;
@@ -4529,7 +4529,7 @@ coalesce_done:
 	parent = NULL;
 	while (*p) {
 		parent = *p;
-		skb1 = rb_entry(parent, struct sk_buff, rbnode);
+		skb1 = rb_to_skb(parent);
 		if (before(seq, TCP_SKB_CB(skb1)->seq)) {
 			p = &parent->rb_left;
 			continue;
@@ -4574,9 +4574,7 @@ insert:
 
 merge_right:
 	/* Remove other segments covered by skb. */
-	while ((q = rb_next(&skb->rbnode)) != NULL) {
-		skb1 = rb_entry(q, struct sk_buff, rbnode);
-
+	while ((skb1 = skb_rb_next(skb)) != NULL) {
 		if (!after(end_seq, TCP_SKB_CB(skb1)->seq))
 			break;
 		if (before(end_seq, TCP_SKB_CB(skb1)->end_seq)) {
@@ -4591,7 +4589,7 @@ merge_right:
 		tcp_drop(sk, skb1);
 	}
 	/* If there is no skb after us, we are the last_skb ! */
-	if (!q)
+	if (!skb1)
 		tp->ooo_last_skb = skb;
 
 add_sack:
@@ -4792,7 +4790,7 @@ static struct sk_buff *tcp_skb_next(stru
 	if (list)
 		return !skb_queue_is_last(list, skb) ? skb->next : NULL;
 
-	return rb_entry_safe(rb_next(&skb->rbnode), struct sk_buff, rbnode);
+	return skb_rb_next(skb);
 }
 
 static struct sk_buff *tcp_collapse_one(struct sock *sk, struct sk_buff *skb,
@@ -4821,7 +4819,7 @@ static void tcp_rbtree_insert(struct rb_
 
 	while (*p) {
 		parent = *p;
-		skb1 = rb_entry(parent, struct sk_buff, rbnode);
+		skb1 = rb_to_skb(parent);
 		if (before(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb1)->seq))
 			p = &parent->rb_left;
 		else
@@ -4941,19 +4939,12 @@ static void tcp_collapse_ofo_queue(struc
 	struct tcp_sock *tp = tcp_sk(sk);
 	u32 range_truesize, sum_tiny = 0;
 	struct sk_buff *skb, *head;
-	struct rb_node *p;
 	u32 start, end;
 
-	p = rb_first(&tp->out_of_order_queue);
-	skb = rb_entry_safe(p, struct sk_buff, rbnode);
+	skb = skb_rb_first(&tp->out_of_order_queue);
 new_range:
 	if (!skb) {
-		p = rb_last(&tp->out_of_order_queue);
-		/* Note: This is possible p is NULL here. We do not
-		 * use rb_entry_safe(), as ooo_last_skb is valid only
-		 * if rbtree is not empty.
-		 */
-		tp->ooo_last_skb = rb_entry(p, struct sk_buff, rbnode);
+		tp->ooo_last_skb = skb_rb_last(&tp->out_of_order_queue);
 		return;
 	}
 	start = TCP_SKB_CB(skb)->seq;
@@ -4961,7 +4952,7 @@ new_range:
 	range_truesize = skb->truesize;
 
 	for (head = skb;;) {
-		skb = tcp_skb_next(skb, NULL);
+		skb = skb_rb_next(skb);
 
 		/* Range is terminated when we see a gap or when
 		 * we are at the queue end.
@@ -5017,7 +5008,7 @@ static bool tcp_prune_ofo_queue(struct s
 		prev = rb_prev(node);
 		rb_erase(node, &tp->out_of_order_queue);
 		goal -= rb_to_skb(node)->truesize;
-		tcp_drop(sk, rb_entry(node, struct sk_buff, rbnode));
+		tcp_drop(sk, rb_to_skb(node));
 		if (!prev || goal <= 0) {
 			sk_mem_reclaim(sk);
 			if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf &&
@@ -5027,7 +5018,7 @@ static bool tcp_prune_ofo_queue(struct s
 		}
 		node = prev;
 	} while (node);
-	tp->ooo_last_skb = rb_entry(prev, struct sk_buff, rbnode);
+	tp->ooo_last_skb = rb_to_skb(prev);
 
 	/* Reset SACK state.  A conforming SACK implementation will
 	 * do the same at a timeout based retransmit.  When a connection


* [PATCH 4.9 44/71] inet: frags: add a pointer to struct netns_frags
From: Greg Kroah-Hartman @ 2018-10-16 17:09 UTC
  To: linux-kernel, netdev
  Cc: Greg Kroah-Hartman, stable, Eric Dumazet, David S. Miller
In-Reply-To: <20181016170539.315587743@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

In order to simplify the API, add a pointer to struct inet_frags
in struct netns_frags. This will allow us to make things less complex.

These functions no longer have a struct inet_frags parameter:

inet_frag_destroy(struct inet_frag_queue *q  /*, struct inet_frags *f */)
inet_frag_put(struct inet_frag_queue *q /*, struct inet_frags *f */)
inet_frag_kill(struct inet_frag_queue *q /*, struct inet_frags *f */)
inet_frags_exit_net(struct netns_frags *nf /*, struct inet_frags *f */)
ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
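
The effect of the new back-pointer can be shown with a reduced model.
This is a sketch only: the demo_* structures are hypothetical cut-down
versions of struct inet_frags, struct netns_frags and
struct inet_frag_queue, and the reference count is a plain int rather
than an atomic. What it illustrates is that the callee reaches the
protocol descriptor through q->net->f instead of taking it as an argument.

```c
/* Stands in for struct inet_frags (per-protocol descriptor). */
struct demo_frags {
	const char *name;
	int destroyed;		/* counts destroy calls, for the demo only */
};

/* Stands in for struct netns_frags, with the new back-pointer. */
struct demo_netns_frags {
	struct demo_frags *f;
};

/* Stands in for struct inet_frag_queue. */
struct demo_frag_queue {
	struct demo_netns_frags *net;
	int refcnt;		/* not atomic here, unlike the kernel */
};

/* No 'f' parameter: it is derived from the queue itself. */
static void demo_frag_destroy(struct demo_frag_queue *q)
{
	struct demo_frags *f = q->net->f;

	f->destroyed++;
}

/* Drop one reference and destroy the queue when it was the last one. */
static void demo_frag_put(struct demo_frag_queue *q)
{
	if (--q->refcnt == 0)
		demo_frag_destroy(q);
}
```

Callers such as ipq_put() then shrink to a single-argument call, which is
the simplification seen throughout the diff.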

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 093ba72914b696521e4885756a68a3332782c8de)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/inet_frag.h                 |   11 ++++++-----
 include/net/ipv6.h                      |    3 +--
 net/ieee802154/6lowpan/reassembly.c     |   13 +++++++------
 net/ipv4/inet_fragment.c                |   17 ++++++++++-------
 net/ipv4/ip_fragment.c                  |    9 +++++----
 net/ipv6/netfilter/nf_conntrack_reasm.c |   16 +++++++++-------
 net/ipv6/reassembly.c                   |   20 ++++++++++----------
 7 files changed, 48 insertions(+), 41 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -9,6 +9,7 @@ struct netns_frags {
 	int			high_thresh;
 	int			low_thresh;
 	int			max_dist;
+	struct inet_frags	*f;
 };
 
 /**
@@ -108,20 +109,20 @@ static inline int inet_frags_init_net(st
 	atomic_set(&nf->mem, 0);
 	return 0;
 }
-void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f);
+void inet_frags_exit_net(struct netns_frags *nf);
 
-void inet_frag_kill(struct inet_frag_queue *q, struct inet_frags *f);
-void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f);
+void inet_frag_kill(struct inet_frag_queue *q);
+void inet_frag_destroy(struct inet_frag_queue *q);
 struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 		struct inet_frags *f, void *key, unsigned int hash);
 
 void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
 				   const char *prefix);
 
-static inline void inet_frag_put(struct inet_frag_queue *q, struct inet_frags *f)
+static inline void inet_frag_put(struct inet_frag_queue *q)
 {
 	if (atomic_dec_and_test(&q->refcnt))
-		inet_frag_destroy(q, f);
+		inet_frag_destroy(q);
 }
 
 static inline bool inet_frag_evicting(struct inet_frag_queue *q)
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -559,8 +559,7 @@ struct frag_queue {
 	u8			ecn;
 };
 
-void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq,
-			   struct inet_frags *frags);
+void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq);
 
 static inline bool ipv6_addr_any(const struct in6_addr *a)
 {
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -93,10 +93,10 @@ static void lowpan_frag_expire(unsigned
 	if (fq->q.flags & INET_FRAG_COMPLETE)
 		goto out;
 
-	inet_frag_kill(&fq->q, &lowpan_frags);
+	inet_frag_kill(&fq->q);
 out:
 	spin_unlock(&fq->q.lock);
-	inet_frag_put(&fq->q, &lowpan_frags);
+	inet_frag_put(&fq->q);
 }
 
 static inline struct lowpan_frag_queue *
@@ -229,7 +229,7 @@ static int lowpan_frag_reasm(struct lowp
 	struct sk_buff *fp, *head = fq->q.fragments;
 	int sum_truesize;
 
-	inet_frag_kill(&fq->q, &lowpan_frags);
+	inet_frag_kill(&fq->q);
 
 	/* Make the one we just received the head. */
 	if (prev) {
@@ -437,7 +437,7 @@ int lowpan_frag_rcv(struct sk_buff *skb,
 		ret = lowpan_frag_queue(fq, skb, frag_type);
 		spin_unlock(&fq->q.lock);
 
-		inet_frag_put(&fq->q, &lowpan_frags);
+		inet_frag_put(&fq->q);
 		return ret;
 	}
 
@@ -585,13 +585,14 @@ static int __net_init lowpan_frags_init_
 	ieee802154_lowpan->frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
 	ieee802154_lowpan->frags.low_thresh = IPV6_FRAG_LOW_THRESH;
 	ieee802154_lowpan->frags.timeout = IPV6_FRAG_TIMEOUT;
+	ieee802154_lowpan->frags.f = &lowpan_frags;
 
 	res = inet_frags_init_net(&ieee802154_lowpan->frags);
 	if (res < 0)
 		return res;
 	res = lowpan_frags_ns_sysctl_register(net);
 	if (res < 0)
-		inet_frags_exit_net(&ieee802154_lowpan->frags, &lowpan_frags);
+		inet_frags_exit_net(&ieee802154_lowpan->frags);
 	return res;
 }
 
@@ -601,7 +602,7 @@ static void __net_exit lowpan_frags_exit
 		net_ieee802154_lowpan(net);
 
 	lowpan_frags_ns_sysctl_unregister(net);
-	inet_frags_exit_net(&ieee802154_lowpan->frags, &lowpan_frags);
+	inet_frags_exit_net(&ieee802154_lowpan->frags);
 }
 
 static struct pernet_operations lowpan_frags_ops = {
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -219,8 +219,9 @@ void inet_frags_fini(struct inet_frags *
 }
 EXPORT_SYMBOL(inet_frags_fini);
 
-void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f)
+void inet_frags_exit_net(struct netns_frags *nf)
 {
+	struct inet_frags *f = nf->f;
 	unsigned int seq;
 	int i;
 
@@ -264,33 +265,34 @@ __acquires(hb->chain_lock)
 	return hb;
 }
 
-static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
+static inline void fq_unlink(struct inet_frag_queue *fq)
 {
 	struct inet_frag_bucket *hb;
 
-	hb = get_frag_bucket_locked(fq, f);
+	hb = get_frag_bucket_locked(fq, fq->net->f);
 	hlist_del(&fq->list);
 	fq->flags |= INET_FRAG_COMPLETE;
 	spin_unlock(&hb->chain_lock);
 }
 
-void inet_frag_kill(struct inet_frag_queue *fq, struct inet_frags *f)
+void inet_frag_kill(struct inet_frag_queue *fq)
 {
 	if (del_timer(&fq->timer))
 		atomic_dec(&fq->refcnt);
 
 	if (!(fq->flags & INET_FRAG_COMPLETE)) {
-		fq_unlink(fq, f);
+		fq_unlink(fq);
 		atomic_dec(&fq->refcnt);
 	}
 }
 EXPORT_SYMBOL(inet_frag_kill);
 
-void inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f)
+void inet_frag_destroy(struct inet_frag_queue *q)
 {
 	struct sk_buff *fp;
 	struct netns_frags *nf;
 	unsigned int sum, sum_truesize = 0;
+	struct inet_frags *f;
 
 	WARN_ON(!(q->flags & INET_FRAG_COMPLETE));
 	WARN_ON(del_timer(&q->timer) != 0);
@@ -298,6 +300,7 @@ void inet_frag_destroy(struct inet_frag_
 	/* Release all fragment data. */
 	fp = q->fragments;
 	nf = q->net;
+	f = nf->f;
 	while (fp) {
 		struct sk_buff *xp = fp->next;
 
@@ -333,7 +336,7 @@ static struct inet_frag_queue *inet_frag
 			atomic_inc(&qp->refcnt);
 			spin_unlock(&hb->chain_lock);
 			qp_in->flags |= INET_FRAG_COMPLETE;
-			inet_frag_put(qp_in, f);
+			inet_frag_put(qp_in);
 			return qp;
 		}
 	}
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -167,7 +167,7 @@ static void ip4_frag_free(struct inet_fr
 
 static void ipq_put(struct ipq *ipq)
 {
-	inet_frag_put(&ipq->q, &ip4_frags);
+	inet_frag_put(&ipq->q);
 }
 
 /* Kill ipq entry. It is not destroyed immediately,
@@ -175,7 +175,7 @@ static void ipq_put(struct ipq *ipq)
  */
 static void ipq_kill(struct ipq *ipq)
 {
-	inet_frag_kill(&ipq->q, &ip4_frags);
+	inet_frag_kill(&ipq->q);
 }
 
 static bool frag_expire_skip_icmp(u32 user)
@@ -875,20 +875,21 @@ static int __net_init ipv4_frags_init_ne
 	net->ipv4.frags.timeout = IP_FRAG_TIME;
 
 	net->ipv4.frags.max_dist = 64;
+	net->ipv4.frags.f = &ip4_frags;
 
 	res = inet_frags_init_net(&net->ipv4.frags);
 	if (res < 0)
 		return res;
 	res = ip4_frags_ns_ctl_register(net);
 	if (res < 0)
-		inet_frags_exit_net(&net->ipv4.frags, &ip4_frags);
+		inet_frags_exit_net(&net->ipv4.frags);
 	return res;
 }
 
 static void __net_exit ipv4_frags_exit_net(struct net *net)
 {
 	ip4_frags_ns_ctl_unregister(net);
-	inet_frags_exit_net(&net->ipv4.frags, &ip4_frags);
+	inet_frags_exit_net(&net->ipv4.frags);
 }
 
 static struct pernet_operations ip4_frags_ops = {
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -177,7 +177,7 @@ static void nf_ct_frag6_expire(unsigned
 	fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
 	net = container_of(fq->q.net, struct net, nf_frag.frags);
 
-	ip6_expire_frag_queue(net, fq, &nf_frags);
+	ip6_expire_frag_queue(net, fq);
 }
 
 /* Creation primitives. */
@@ -263,7 +263,7 @@ static int nf_ct_frag6_queue(struct frag
 			 * this case. -DaveM
 			 */
 			pr_debug("end of fragment not rounded to 8 bytes.\n");
-			inet_frag_kill(&fq->q, &nf_frags);
+			inet_frag_kill(&fq->q);
 			return -EPROTO;
 		}
 		if (end > fq->q.len) {
@@ -356,7 +356,7 @@ found:
 	return 0;
 
 discard_fq:
-	inet_frag_kill(&fq->q, &nf_frags);
+	inet_frag_kill(&fq->q);
 err:
 	return -EINVAL;
 }
@@ -378,7 +378,7 @@ nf_ct_frag6_reasm(struct frag_queue *fq,
 	int    payload_len;
 	u8 ecn;
 
-	inet_frag_kill(&fq->q, &nf_frags);
+	inet_frag_kill(&fq->q);
 
 	WARN_ON(head == NULL);
 	WARN_ON(NFCT_FRAG6_CB(head)->offset != 0);
@@ -623,7 +623,7 @@ int nf_ct_frag6_gather(struct net *net,
 
 out_unlock:
 	spin_unlock_bh(&fq->q.lock);
-	inet_frag_put(&fq->q, &nf_frags);
+	inet_frag_put(&fq->q);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);
@@ -635,19 +635,21 @@ static int nf_ct_net_init(struct net *ne
 	net->nf_frag.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
 	net->nf_frag.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
 	net->nf_frag.frags.timeout = IPV6_FRAG_TIMEOUT;
+	net->nf_frag.frags.f = &nf_frags;
+
 	res = inet_frags_init_net(&net->nf_frag.frags);
 	if (res < 0)
 		return res;
 	res = nf_ct_frag6_sysctl_register(net);
 	if (res < 0)
-		inet_frags_exit_net(&net->nf_frag.frags, &nf_frags);
+		inet_frags_exit_net(&net->nf_frag.frags);
 	return res;
 }
 
 static void nf_ct_net_exit(struct net *net)
 {
 	nf_ct_frags6_sysctl_unregister(net);
-	inet_frags_exit_net(&net->nf_frag.frags, &nf_frags);
+	inet_frags_exit_net(&net->nf_frag.frags);
 }
 
 static struct pernet_operations nf_ct_net_ops = {
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -128,8 +128,7 @@ void ip6_frag_init(struct inet_frag_queu
 }
 EXPORT_SYMBOL(ip6_frag_init);
 
-void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq,
-			   struct inet_frags *frags)
+void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
 {
 	struct net_device *dev = NULL;
 
@@ -138,7 +137,7 @@ void ip6_expire_frag_queue(struct net *n
 	if (fq->q.flags & INET_FRAG_COMPLETE)
 		goto out;
 
-	inet_frag_kill(&fq->q, frags);
+	inet_frag_kill(&fq->q);
 
 	rcu_read_lock();
 	dev = dev_get_by_index_rcu(net, fq->iif);
@@ -166,7 +165,7 @@ out_rcu_unlock:
 	rcu_read_unlock();
 out:
 	spin_unlock(&fq->q.lock);
-	inet_frag_put(&fq->q, frags);
+	inet_frag_put(&fq->q);
 }
 EXPORT_SYMBOL(ip6_expire_frag_queue);
 
@@ -178,7 +177,7 @@ static void ip6_frag_expire(unsigned lon
 	fq = container_of((struct inet_frag_queue *)data, struct frag_queue, q);
 	net = container_of(fq->q.net, struct net, ipv6.frags);
 
-	ip6_expire_frag_queue(net, fq, &ip6_frags);
+	ip6_expire_frag_queue(net, fq);
 }
 
 static struct frag_queue *
@@ -359,7 +358,7 @@ found:
 	return -1;
 
 discard_fq:
-	inet_frag_kill(&fq->q, &ip6_frags);
+	inet_frag_kill(&fq->q);
 err:
 	__IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
 			IPSTATS_MIB_REASMFAILS);
@@ -386,7 +385,7 @@ static int ip6_frag_reasm(struct frag_qu
 	int sum_truesize;
 	u8 ecn;
 
-	inet_frag_kill(&fq->q, &ip6_frags);
+	inet_frag_kill(&fq->q);
 
 	ecn = ip_frag_ecn_table[fq->ecn];
 	if (unlikely(ecn == 0xff))
@@ -563,7 +562,7 @@ static int ipv6_frag_rcv(struct sk_buff
 		ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff);
 
 		spin_unlock(&fq->q.lock);
-		inet_frag_put(&fq->q, &ip6_frags);
+		inet_frag_put(&fq->q);
 		return ret;
 	}
 
@@ -714,6 +713,7 @@ static int __net_init ipv6_frags_init_ne
 	net->ipv6.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
 	net->ipv6.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
 	net->ipv6.frags.timeout = IPV6_FRAG_TIMEOUT;
+	net->ipv6.frags.f = &ip6_frags;
 
 	res = inet_frags_init_net(&net->ipv6.frags);
 	if (res < 0)
@@ -721,14 +721,14 @@ static int __net_init ipv6_frags_init_ne
 
 	res = ip6_frags_ns_sysctl_register(net);
 	if (res < 0)
-		inet_frags_exit_net(&net->ipv6.frags, &ip6_frags);
+		inet_frags_exit_net(&net->ipv6.frags);
 	return res;
 }
 
 static void __net_exit ipv6_frags_exit_net(struct net *net)
 {
 	ip6_frags_ns_sysctl_unregister(net);
-	inet_frags_exit_net(&net->ipv6.frags, &ip6_frags);
+	inet_frags_exit_net(&net->ipv6.frags);
 }
 
 static struct pernet_operations ip6_frags_ops = {

^ permalink raw reply

* Re: [PATCH net-next 0/5] Align PTT and add various link modes.
From: David Miller @ 2018-10-16 17:04 UTC (permalink / raw)
  To: rahul.verma; +Cc: netdev, Ariel.Elior, Dept-EngEverestLinuxL2
In-Reply-To: <20181016105922.25562-1-rahul.verma@cavium.com>

From: Rahul Verma <rahul.verma@cavium.com>
Date: Tue, 16 Oct 2018 03:59:17 -0700

> From: Rahul Verma <Rahul.Verma@cavium.com>
> 
> This series aligns the ptt propagation as local ptt or global ptt.
> Adds new transceiver modes, speed capabilities and board config,
> which is utilized to display the enhanced link modes, media types
> and speed. Enhances the link with detailed information.

Series applied.

^ permalink raw reply

* Re: [PATCH net] sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL instead
From: David Miller @ 2018-10-16 16:59 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, linux-sctp, marcelo.leitner, nhorman
In-Reply-To: <e1b1741db983e1775312816bc2e6f0f685f9828d.1539676322.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Tue, 16 Oct 2018 15:52:02 +0800

> According to rfc7496 section 4.3 or 4.4:
> 
>    sprstat_policy:  This parameter indicates for which PR-SCTP policy
>       the user wants the information.  It is an error to use
>       SCTP_PR_SCTP_NONE in sprstat_policy.  If SCTP_PR_SCTP_ALL is used,
>       the counters provided are aggregated over all supported policies.
> 
> We change to dump pr_assoc and pr_stream all status by SCTP_PR_SCTP_ALL
> instead, and return error for SCTP_PR_SCTP_NONE, as it also said "It is
> an error to use SCTP_PR_SCTP_NONE in sprstat_policy. "
> 
> Fixes: 826d253d57b1 ("sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt")
> Fixes: d229d48d183f ("sctp: add SCTP_PR_STREAM_STATUS sockopt for prsctp")
> Reported-by: Ying Xu <yinxu@redhat.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply

* [RFC] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Sebastian Andrzej Siewior @ 2018-10-16 16:55 UTC (permalink / raw)
  To: netdev, virtualization
  Cc: tglx, Toshiaki Makita, Michael S. Tsirkin, Jason Wang,
	David S. Miller

on 32bit, lockdep notices:
| ================================
| WARNING: inconsistent lock state
| 4.19.0-rc8+ #9 Tainted: G        W
| --------------------------------
| inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
| ip/1106 [HC0[0]:SC1[1]:HE1:SE0] takes:
| (ptrval) (&syncp->seq#2){+.?.}, at: net_rx_action+0xc8/0x380
| {SOFTIRQ-ON-W} state was registered at:
|   lock_acquire+0x7e/0x170
|   try_fill_recv+0x5fa/0x700
|   virtnet_open+0xe0/0x180
|   __dev_open+0xae/0x130
|   __dev_change_flags+0x17f/0x200
|   dev_change_flags+0x23/0x60
|   do_setlink+0x2bb/0xa20
|   rtnl_newlink+0x523/0x830
|   rtnetlink_rcv_msg+0x14b/0x470
|   netlink_rcv_skb+0x6e/0xf0
|   rtnetlink_rcv+0xd/0x10
|   netlink_unicast+0x16e/0x1f0
|   netlink_sendmsg+0x1af/0x3a0
|   ___sys_sendmsg+0x20f/0x240
|   __sys_sendmsg+0x39/0x80
|   sys_socketcall+0x13a/0x2a0
|   do_int80_syscall_32+0x50/0x180
|   restore_all+0x0/0xb2
| irq event stamp: 3326
| hardirqs last  enabled at (3326): [<c159e6d0>] net_rx_action+0x80/0x380
| hardirqs last disabled at (3325): [<c159e6aa>] net_rx_action+0x5a/0x380
| softirqs last  enabled at (3322): [<c14b440d>] virtnet_napi_enable+0xd/0x60
| softirqs last disabled at (3323): [<c101d63d>] call_on_stack+0xd/0x50
|
| other info that might help us debug this:
|  Possible unsafe locking scenario:
|
|        CPU0
|        ----
|   lock(&syncp->seq#2);
|   <Interrupt>
|     lock(&syncp->seq#2);
|
|  *** DEADLOCK ***

This is the "up" path, which is not a hot path. There is also
refill_work().
It might be unwise to add local_bh_disable() to try_fill_recv() itself:
it is mostly called in BH context, so the local_bh_disable() /
local_bh_enable() pair would be a waste of cycles.

Adding local_bh_disable() around try_fill_recv() for the non-BH call
sites would render GFP_KERNEL pointless.

Also, ptr->var++ is not an atomic operation, even on 64bit CPUs. This
means that try_fill_recv() can run on CPU0 (via virtnet_receive()) while
the worker runs on CPU1, and the two updates can race.
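
For illustration only, the user-space sketch below models the
sequence-counter protocol that u64_stats_update_begin() and
u64_stats_update_end() depend on in 32-bit builds. The names
stats_model, stats_add_kick and stats_read_kicks are hypothetical and
do not exist in virtio_net. The model assumes a single writer at a
time; the assert in the writer marks the condition that a softirq
interrupting the update (or a second CPU) would violate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space model of a seqcount-protected 64-bit counter. */
struct stats_model {
	atomic_uint seq;	/* odd while a writer is inside the update */
	uint32_t kicks_lo;	/* the 64-bit value is kept as two halves, */
	uint32_t kicks_hi;	/* the way a 32-bit CPU has to store it    */
};

/* Writer side: must never be entered while another update is in flight. */
static void stats_add_kick(struct stats_model *s)
{
	/* A nested or concurrent writer would find the sequence odd here. */
	assert((atomic_load(&s->seq) & 1) == 0);

	atomic_fetch_add(&s->seq, 1);	/* begin: sequence becomes odd */
	if (++s->kicks_lo == 0)
		s->kicks_hi++;		/* carry into the upper half */
	atomic_fetch_add(&s->seq, 1);	/* end: sequence is even again */
}

/* Reader side: retry until the same even sequence is seen before and after. */
static uint64_t stats_read_kicks(struct stats_model *s)
{
	unsigned int start;
	uint64_t val;

	do {
		start = atomic_load(&s->seq);
		val = ((uint64_t)s->kicks_hi << 32) | s->kicks_lo;
	} while ((start & 1) || start != atomic_load(&s->seq));

	return val;
}
```

If a writer is interrupted between the two increments and the
interrupting context reads or updates the same counter on that CPU,
the sequence stays odd, so a reader in that context would spin
forever. That is the scenario the lockdep report describes.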

Do we care or is this just stupid stats?  Any suggestions?

This warning has appeared since commit 461f03dc99cf6 ("virtio_net: Add kick stats").

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/net/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index dab504ec5e502..d782160cfa882 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1206,9 +1206,11 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
 			break;
 	} while (rq->vq->num_free);
 	if (virtqueue_kick_prepare(rq->vq) && virtqueue_notify(rq->vq)) {
+		local_bh_disable();
 		u64_stats_update_begin(&rq->stats.syncp);
 		rq->stats.kicks++;
 		u64_stats_update_end(&rq->stats.syncp);
+		local_bh_enable();
 	}
 
 	return !oom;
-- 
2.19.1

^ permalink raw reply related

* Re: bpfilter causes a leftover kernel process
From: Alexei Starovoitov @ 2018-10-16 16:38 UTC (permalink / raw)
  To: Olivier Brunel; +Cc: Network Development, Daniel Borkmann
In-Reply-To: <20180905175243.78a6ba81@jjacky.com>

On Wed, Sep 5, 2018 at 5:05 PM Olivier Brunel <jjk@jjacky.com> wrote:
>
> You'll see in the end that systemd complains that it can't
> unmount /oldroot (EBUSY), aka the root fs; and that's because of the
> bpfilter helper, which wasn't killed because it's seen as a kernel
> thread due to its empty command line and therefore not signaled.

Thanks for tracking it down.
Can somebody send a patch to give bpfilter a non-empty cmdline?
I think that would be a better fix than tweaking all pid1s.

^ permalink raw reply

* Reclaiming memory for network interface
From: Sujeev Dias @ 2018-10-16 16:36 UTC (permalink / raw)
  Cc: netdev, Tony Truong

Hi

Setup: sdm845 connected to an external modem over a PCIe interface

During a data call, we found that we spend more than 25% of the CPU on
memory operations with IO coherency. That includes allocation, freeing,
DMA mapping, and unmapping. As we push to higher data rates (beyond 7
Gbps), the time we spend in memory operations is significant. So, we're
looking into ways we can reclaim this memory.

One idea we're considering is:

1. Allocate pages
2. Increment the reference count of each page
3. Allocate an skb, and assign the page to its paged data portion
4. Assign a callback function to skb->destructor
5. Once the destructor gets called, move the page to a new skb

It sounds simple enough, but we couldn't find anyone actually doing it
this way. Is there anything to be concerned about with the above
proposal? We see some examples of using the destructor to do a deferred
unmap, but didn't see any example of re-using the buffer. We also
couldn't find any meaningful discussion about reclaiming memory for
network data. Any thoughts on how we should solve this issue? Any
comments are welcome, thanks.
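
The five steps above can be sketched in user space as follows. This is
a minimal model under stated assumptions, not driver code: page_model,
skb_model, rx_pool and every function name are hypothetical stand-ins
for struct page, struct sk_buff and a driver-private pool, and plain
integers replace the kernel's atomic page reference counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SIZE  4
#define PAGE_BYTES 4096

/* Hypothetical stand-in for struct page. */
struct page_model {
	int refcount;
	unsigned char *data;
};

/* Hypothetical driver-private pool of recycled pages. */
struct rx_pool {
	struct page_model *free[POOL_SIZE];
	size_t nfree;
};

/* Hypothetical stand-in for struct sk_buff with one paged fragment. */
struct skb_model {
	struct page_model *frag;
	struct rx_pool *pool;
	void (*destructor)(struct skb_model *skb);
};

/* Steps 1-2: take a recycled page if available, else allocate one. */
static struct page_model *pool_get_page(struct rx_pool *pool)
{
	struct page_model *pg;

	if (pool->nfree)
		return pool->free[--pool->nfree];	/* no allocation needed */

	pg = malloc(sizeof(*pg));
	if (!pg)
		return NULL;
	pg->data = malloc(PAGE_BYTES);
	if (!pg->data) {
		free(pg);
		return NULL;
	}
	pg->refcount = 1;	/* the pool's own long-lived reference */
	return pg;
}

/* Step 5: drop the skb's reference and recycle if the pool is the last user. */
static void rx_skb_destructor(struct skb_model *skb)
{
	struct page_model *pg = skb->frag;
	struct rx_pool *pool = skb->pool;

	assert(pg->refcount >= 2);	/* pool reference + skb reference */
	pg->refcount--;
	skb->frag = NULL;

	if (pg->refcount != 1)
		return;			/* someone else still uses the page */
	if (pool->nfree < POOL_SIZE) {
		pool->free[pool->nfree++] = pg;
	} else {
		free(pg->data);		/* pool is full: really free the page */
		free(pg);
	}
}

/* Steps 3-4: attach a page to the skb and install the destructor. */
static int rx_skb_attach(struct skb_model *skb, struct rx_pool *pool)
{
	struct page_model *pg = pool_get_page(pool);

	if (!pg)
		return -1;
	pg->refcount++;			/* reference held by the skb */
	skb->frag = pg;
	skb->pool = pool;
	skb->destructor = rx_skb_destructor;
	return 0;
}
```

The model only recycles a page when the pool holds the last reference.
In the real stack skb->destructor can run when the skb is orphaned,
which may be earlier than the last access to the paged data, so a
check of this kind would presumably be needed there as well.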

Sincerely

Sujeev

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply

* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2018-10-16 23:46 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
	Davide Caratti, David Ahern

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  net/sched/cls_api.c

between commit:

  e331473fee3d ("net/sched: cls_api: add missing validation of netlink attributes")

from the net tree and commit:

  dac9c9790e54 ("net: Add extack to nlmsg_parse")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc net/sched/cls_api.c
index 70f144ac5e1d,43c8559aca56..000000000000
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@@ -1951,8 -2055,8 +2057,8 @@@ static int tc_dump_chain(struct sk_buf
  	if (nlmsg_len(cb->nlh) < sizeof(*tcm))
  		return skb->len;
  
 -	err = nlmsg_parse(cb->nlh, sizeof(*tcm), tca, TCA_MAX, NULL,
 +	err = nlmsg_parse(cb->nlh, sizeof(*tcm), tca, TCA_MAX, rtm_tca_policy,
- 			  NULL);
+ 			  cb->extack);
  	if (err)
  		return err;
  

^ permalink raw reply

* Re: [PATCH net] net: bpfilter: use get_pid_task instead of pid_task
From: Alexei Starovoitov @ 2018-10-16 15:51 UTC (permalink / raw)
  To: Taehee Yoo; +Cc: davem, netdev, daniel, ast
In-Reply-To: <20181016153510.16962-1-ap420073@gmail.com>

On Wed, Oct 17, 2018 at 12:35:10AM +0900, Taehee Yoo wrote:
> pid_task() dereferences rcu protected tasks array.
> But there is no rcu_read_lock() in shutdown_umh() routine so that
> rcu_read_lock() is needed.
> get_pid_task() is wrapper function of pid_task. it holds rcu_read_lock()
> then calls pid_task(). if task isn't NULL, it increases reference count
> of task.
> 
> test commands:
>    %modprobe bpfilter
>    %modprobe -rv bpfilter
> 
> splat looks like:
> [15102.030932] =============================
> [15102.030957] WARNING: suspicious RCU usage
> [15102.030985] 4.19.0-rc7+ #21 Not tainted
> [15102.031010] -----------------------------
> [15102.031038] kernel/pid.c:330 suspicious rcu_dereference_check() usage!
> [15102.031063]
> 	       other info that might help us debug this:
> 
> [15102.031332]
> 	       rcu_scheduler_active = 2, debug_locks = 1
> [15102.031363] 1 lock held by modprobe/1570:
> [15102.031389]  #0: 00000000580ef2b0 (bpfilter_lock){+.+.}, at: stop_umh+0x13/0x52 [bpfilter]
> [15102.031552]
>                stack backtrace:
> [15102.031583] CPU: 1 PID: 1570 Comm: modprobe Not tainted 4.19.0-rc7+ #21
> [15102.031607] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
> [15102.031628] Call Trace:
> [15102.031676]  dump_stack+0xc9/0x16b
> [15102.031723]  ? show_regs_print_info+0x5/0x5
> [15102.031801]  ? lockdep_rcu_suspicious+0x117/0x160
> [15102.031855]  pid_task+0x134/0x160
> [15102.031900]  ? find_vpid+0xf0/0xf0
> [15102.032017]  shutdown_umh.constprop.1+0x1e/0x53 [bpfilter]
> [15102.032055]  stop_umh+0x46/0x52 [bpfilter]
> [15102.032092]  __x64_sys_delete_module+0x47e/0x570
> [ ... ]
> 
> Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>

thanks a lot for the fix
Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply

* [PATCH net] net: bpfilter: use get_pid_task instead of pid_task
From: Taehee Yoo @ 2018-10-16 15:35 UTC (permalink / raw)
  To: davem, netdev; +Cc: daniel, ast, ap420073

pid_task() dereferences the RCU-protected tasks array, but
shutdown_umh() does not hold rcu_read_lock(), so the lookup needs
that protection.
get_pid_task() is a wrapper around pid_task(): it takes
rcu_read_lock(), calls pid_task() and, if the task isn't NULL,
increases the task's reference count.
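
As a rough user-space illustration of the difference, the sketch below
models a lookup that is only valid inside a read-side critical section
and a wrapper that enters the section and pins the result with a
reference. All names (task_model, model_pid_task, model_get_pid_task,
model_put_task, rcu_depth) are hypothetical, and a plain nesting
counter stands in for RCU.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 8

/* Hypothetical stand-in for struct task_struct. */
struct task_model {
	int pid;
	int usage;	/* reference count */
};

static struct task_model *task_table[MAX_TASKS];
static int rcu_depth;	/* models rcu_read_lock() nesting */

static void model_rcu_read_lock(void)
{
	rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(rcu_depth > 0);
	rcu_depth--;
}

/* Model of pid_task(): the result is only stable inside the read section. */
static struct task_model *model_pid_task(int pid)
{
	/* Counterpart of the check behind "suspicious RCU usage". */
	assert(rcu_depth > 0);

	for (size_t i = 0; i < MAX_TASKS; i++)
		if (task_table[i] && task_table[i]->pid == pid)
			return task_table[i];
	return NULL;
}

/* Model of get_pid_task(): look up under the lock and take a reference. */
static struct task_model *model_get_pid_task(int pid)
{
	struct task_model *tsk;

	model_rcu_read_lock();
	tsk = model_pid_task(pid);
	if (tsk)
		tsk->usage++;	/* keeps the task alive after unlock */
	model_rcu_read_unlock();
	return tsk;
}

/* Model of put_task_struct(): release the reference taken above. */
static void model_put_task(struct task_model *tsk)
{
	assert(tsk->usage > 0);
	tsk->usage--;
}
```

A caller in this model follows the same shape as the patched
shutdown_umh(): get the task, signal it if non-NULL, then put it.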

test commands:
   %modprobe bpfilter
   %modprobe -rv bpfilter

splat looks like:
[15102.030932] =============================
[15102.030957] WARNING: suspicious RCU usage
[15102.030985] 4.19.0-rc7+ #21 Not tainted
[15102.031010] -----------------------------
[15102.031038] kernel/pid.c:330 suspicious rcu_dereference_check() usage!
[15102.031063]
	       other info that might help us debug this:

[15102.031332]
	       rcu_scheduler_active = 2, debug_locks = 1
[15102.031363] 1 lock held by modprobe/1570:
[15102.031389]  #0: 00000000580ef2b0 (bpfilter_lock){+.+.}, at: stop_umh+0x13/0x52 [bpfilter]
[15102.031552]
               stack backtrace:
[15102.031583] CPU: 1 PID: 1570 Comm: modprobe Not tainted 4.19.0-rc7+ #21
[15102.031607] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[15102.031628] Call Trace:
[15102.031676]  dump_stack+0xc9/0x16b
[15102.031723]  ? show_regs_print_info+0x5/0x5
[15102.031801]  ? lockdep_rcu_suspicious+0x117/0x160
[15102.031855]  pid_task+0x134/0x160
[15102.031900]  ? find_vpid+0xf0/0xf0
[15102.032017]  shutdown_umh.constprop.1+0x1e/0x53 [bpfilter]
[15102.032055]  stop_umh+0x46/0x52 [bpfilter]
[15102.032092]  __x64_sys_delete_module+0x47e/0x570
[ ... ]

Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
 net/bpfilter/bpfilter_kern.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
index b64e1649993b..94e88f510c5b 100644
--- a/net/bpfilter/bpfilter_kern.c
+++ b/net/bpfilter/bpfilter_kern.c
@@ -23,9 +23,11 @@ static void shutdown_umh(struct umh_info *info)
 
 	if (!info->pid)
 		return;
-	tsk = pid_task(find_vpid(info->pid), PIDTYPE_PID);
-	if (tsk)
+	tsk = get_pid_task(find_vpid(info->pid), PIDTYPE_PID);
+	if (tsk) {
 		force_sig(SIGKILL, tsk);
+		put_task_struct(tsk);
+	}
 	fput(info->pipe_to_umh);
 	fput(info->pipe_from_umh);
 	info->pid = 0;
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH bpf-next] bpf, tls: add tls header to tools infrastructure
From: Alexei Starovoitov @ 2018-10-16 15:21 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: rdna, john.fastabend, netdev
In-Reply-To: <20181016135936.6032-1-daniel@iogearbox.net>

On Tue, Oct 16, 2018 at 03:59:36PM +0200, Daniel Borkmann wrote:
> Andrey reported a build error for the BPF kselftest suite when compiled on
> a machine which does not have tls related header bits installed natively:
> 
>   test_sockmap.c:120:23: fatal error: linux/tls.h: No such file or directory
>    #include <linux/tls.h>
>                          ^
>   compilation terminated.
> 
> Fix it by adding the header to the tools include infrastructure and add
> definitions such as SOL_TLS that could potentially be missing.
> 
> Fixes: e9dd904708c4 ("bpf: add tls support for testing in test_sockmap")
> Reported-by: Andrey Ignatov <rdna@fb.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Applied, Thanks

^ permalink raw reply

* Re: [PATCH net-next v2 0/2] FDDI: DEC FDDIcontroller 700 TURBOchannel adapter support
From: Maciej W. Rozycki @ 2018-10-16 14:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20181015.214629.1214428866405613348.davem@davemloft.net>

On Mon, 15 Oct 2018, David Miller wrote:

> Series applied, thank you.

 Great, thanks!

  Maciej

^ permalink raw reply

* [PATCH net] netfilter: fix DNAT target for shifted portmap ranges
From: Paolo Abeni @ 2018-10-16 14:52 UTC (permalink / raw)
  To: netdev
  Cc: Thierry Du Tre, Pablo Neira Ayuso, Florian Westphal,
	David S. Miller, netfilter-devel

Commit 2eb0f624b709 ("netfilter: add NAT support for shifted
portmap ranges") did not set the checkentry/destroy callbacks for
the newly added DNAT target. As a result, rulesets using only
such nat targets are not effective, as the relevant conntrack hooks
are not enabled.
This also affects nft_compat rulesets.
Fix the issue by adding the missing initializers.
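
The failure mode can be modelled in a few lines of user-space C. This
is only a sketch: target_model, model_add_rule and conntrack_users are
hypothetical names, and the assumption that the checkentry callback is
what enables the conntrack hooks is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct xt_target. */
struct target_model {
	const char *name;
	unsigned int revision;
	int  (*checkentry)(void);
	void (*destroy)(void);
};

static int conntrack_users;	/* > 0 means the conntrack hooks are enabled */

static int model_nat_checkentry(void)
{
	conntrack_users++;
	return 0;
}

static void model_nat_destroy(void)
{
	assert(conntrack_users > 0);
	conntrack_users--;
}

static const struct target_model targets[] = {
	{
		.name		= "DNAT",
		.revision	= 1,
		.checkentry	= model_nat_checkentry,
		.destroy	= model_nat_destroy,
	},
	{
		/* Callbacks left out, as in the entry before the fix. */
		.name		= "DNAT",
		.revision	= 2,
	},
};

/* Rule insertion: a missing checkentry is silently skipped. */
static int model_add_rule(const struct target_model *t)
{
	return t->checkentry ? t->checkentry() : 0;
}
```

Because a NULL callback is treated as "nothing to do", the omission
produces no error at registration or rule insertion time; the only
symptom is that the hooks never get enabled.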

Fixes: 2eb0f624b709 ("netfilter: add NAT support for shifted portmap ranges")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/netfilter/xt_nat.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netfilter/xt_nat.c b/net/netfilter/xt_nat.c
index 8af9707f8789..ac91170fc8c8 100644
--- a/net/netfilter/xt_nat.c
+++ b/net/netfilter/xt_nat.c
@@ -216,6 +216,8 @@ static struct xt_target xt_nat_target_reg[] __read_mostly = {
 	{
 		.name		= "DNAT",
 		.revision	= 2,
+		.checkentry	= xt_nat_checkentry,
+		.destroy	= xt_nat_destroy,
 		.target		= xt_dnat_target_v2,
 		.targetsize	= sizeof(struct nf_nat_range2),
 		.table		= "nat",
-- 
2.17.2

^ permalink raw reply related

* [PATCH] net/ipv4: fix tcp_poll for SMC fallback
From: Karsten Graul @ 2018-10-16 14:45 UTC (permalink / raw)
  To: netdev; +Cc: ubraun, hch, linux-s390

Commit dd979b4df817 ("net: simplify sock_poll_wait") breaks tcp_poll for
SMC fallback: An AF_SMC socket establishes an internal TCP socket for the
CLC handshake with the remote peer. Whenever the SMC connection cannot be
established, this CLC socket is used as a fallback. All socket operations on
the SMC socket are then forwarded to the CLC socket. In case of poll, the
file->private_data pointer references the SMC socket because the CLC socket
has no file assigned. This causes tcp_poll to wait on the wrong socket.

This patch fixes the issue by (re)introducing a sock_poll_wait variant with
a socket parameter, and letting tcp_poll use this variant.
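
A small user-space model may help to see the effect. The types and
function names below (file_model, socket_model, model_tcp_poll and so
on) are hypothetical; a counter stands in for registering a waiter on
the socket's wait queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the wait queue, struct socket and struct file. */
struct wq_model     { int waiters; };
struct socket_model { struct wq_model wq; };
struct file_model   { void *private_data; };

/* Variant with an explicit socket, like _sock_poll_wait() in the patch. */
static void model_sock_poll_wait_on(struct file_model *filp,
				    struct socket_model *sock)
{
	(void)filp;
	assert(sock != NULL);
	sock->wq.waiters++;	/* register on this socket's wait queue */
}

/* Wrapper deriving the socket from the file, like sock_poll_wait(). */
static void model_sock_poll_wait(struct file_model *filp)
{
	model_sock_poll_wait_on(filp, filp->private_data);
}

/* Model of tcp_poll(): use_fix selects the behaviour after the patch. */
static void model_tcp_poll(struct file_model *filp, struct socket_model *sock,
			   int use_fix)
{
	if (use_fix)
		model_sock_poll_wait_on(filp, sock);	/* socket passed in */
	else
		model_sock_poll_wait(filp);		/* socket from the file */
}
```

With a file whose private_data points at the SMC socket and the CLC
socket passed as the sock argument, the unpatched path registers the
waiter on the SMC socket, while the patched path registers it on the
CLC socket.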

Fixes: dd979b4df817 ("net: simplify sock_poll_wait")
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
---
 include/net/sock.h | 20 +++++++++++++++++---
 net/ipv4/tcp.c     |  2 +-
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 433f45fc2d68..eb2980d48aeb 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2057,14 +2057,14 @@ static inline bool skwq_has_sleeper(struct socket_wq *wq)
 /**
  * sock_poll_wait - place memory barrier behind the poll_wait call.
  * @filp:           file
+ * @sock:           socket to wait
  * @p:              poll_table
  *
  * See the comments in the wq_has_sleeper function.
  */
-static inline void sock_poll_wait(struct file *filp, poll_table *p)
+static inline void _sock_poll_wait(struct file *filp, struct socket *sock,
+				   poll_table *p)
 {
-	struct socket *sock = filp->private_data;
-
 	if (!poll_does_not_wait(p)) {
 		poll_wait(filp, &sock->wq->wait, p);
 		/* We need to be sure we are in sync with the
@@ -2076,6 +2076,20 @@ static inline void sock_poll_wait(struct file *filp, poll_table *p)
 	}
 }
 
+/**
+ * sock_poll_wait - place memory barrier behind the poll_wait call.
+ * @filp:           file
+ * @p:              poll_table
+ *
+ * See the comments in the wq_has_sleeper function.
+ */
+static inline void sock_poll_wait(struct file *filp, poll_table *p)
+{
+	struct socket *sock = filp->private_data;
+
+	_sock_poll_wait(filp, sock, p);
+}
+
 static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
 {
 	if (sk->sk_txhash) {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 10c6246396cc..a8041729839d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -507,7 +507,7 @@ __poll_t tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	const struct tcp_sock *tp = tcp_sk(sk);
 	int state;
 
-	sock_poll_wait(file, wait);
+	_sock_poll_wait(file, sock, wait);
 
 	state = inet_sk_state_load(sk);
 	if (state == TCP_LISTEN)
-- 
2.18.0

^ permalink raw reply related

* Re: [PATCH bpf-next v2 1/8] tcp, ulp: enforce sock_owned_by_me upon ulp init and cleanup
From: Daniel Borkmann @ 2018-10-16 14:29 UTC (permalink / raw)
  To: Eric Dumazet, alexei.starovoitov; +Cc: john.fastabend, davejwatson, netdev
In-Reply-To: <0b093f03-f9e2-99dd-9303-448eeb8c04f7@gmail.com>

On 10/16/2018 04:17 PM, Eric Dumazet wrote:
> On 10/12/2018 05:45 PM, Daniel Borkmann wrote:
[...]
>> diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c
>> index a5995bb..34e9635 100644
>> --- a/net/ipv4/tcp_ulp.c
>> +++ b/net/ipv4/tcp_ulp.c
>> @@ -123,6 +123,8 @@ void tcp_cleanup_ulp(struct sock *sk)
>>  {
>>  	struct inet_connection_sock *icsk = inet_csk(sk);
>>  
>> +	sock_owned_by_me(sk);
>> +
>>  	if (!icsk->icsk_ulp_ops)
>>  		return;
> 
> Ahem... inet_csk_prepare_forced_close() releases the socket lock,
> and tcp_done(newsk); is called after inet_csk_prepare_forced_close() 

Right you are, will fix it up. Thanks!

^ permalink raw reply

* Re: [PATCH bpf-next v2 1/8] tcp, ulp: enforce sock_owned_by_me upon ulp init and cleanup
From: Eric Dumazet @ 2018-10-16 14:17 UTC (permalink / raw)
  To: Daniel Borkmann, alexei.starovoitov; +Cc: john.fastabend, davejwatson, netdev
In-Reply-To: <20181013004603.3747-2-daniel@iogearbox.net>



On 10/12/2018 05:45 PM, Daniel Borkmann wrote:
> Whenever the ULP data on the socket is mangled, enforce that the
> caller has the socket lock held as otherwise things may race with
> initialization and cleanup callbacks from ulp ops as both would
> mangle internal socket state.
> 
> Joint work with John.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  net/ipv4/tcp_ulp.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c
> index a5995bb..34e9635 100644
> --- a/net/ipv4/tcp_ulp.c
> +++ b/net/ipv4/tcp_ulp.c
> @@ -123,6 +123,8 @@ void tcp_cleanup_ulp(struct sock *sk)
>  {
>  	struct inet_connection_sock *icsk = inet_csk(sk);
>  
> +	sock_owned_by_me(sk);
> +
>  	if (!icsk->icsk_ulp_ops)
>  		return;

Ahem... inet_csk_prepare_forced_close() releases the socket lock,
and tcp_done(newsk); is called after inet_csk_prepare_forced_close() 


syzkaller got the following trace:

TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending cookies.  Check SNMP counters.
tmpfs: Bad mount option s\xA8\xFE\x9E\x92\xE9K\xD7:\x85\x87$z\x94\xFB3\xBF\xE4\x8E\x88\xE2\xF0\x19\x11\b%\x92\xF8\xE5\xC3lh
WARNING: CPU: 0 PID: 12625 at include/net/sock.h:1539 sock_owned_by_me include/net/sock.h:1539 [inline]
WARNING: CPU: 0 PID: 12625 at include/net/sock.h:1539 tcp_cleanup_ulp+0x1ad/0x200 net/ipv4/tcp_ulp.c:102
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 12625 Comm: syz-executor3 Not tainted 4.19.0-rc8-next-20181016+ #95
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x244/0x39d lib/dump_stack.c:113
 panic+0x2ad/0x55c kernel/panic.c:188
 __warn.cold.8+0x20/0x45 kernel/panic.c:540
 report_bug+0x254/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
 do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
RIP: 0010:sock_owned_by_me include/net/sock.h:1539 [inline]
RIP: 0010:tcp_cleanup_ulp+0x1ad/0x200 net/ipv4/tcp_ulp.c:102
Code: 83 c0 03 38 d0 7c 04 84 d2 75 61 44 8b 25 cb 4e df 02 31 ff 44 89 e6 e8 51 3d ed fa 45 85 e4 0f 84 91 fe ff ff e8 33 3c ed fa <0f> 0b e9 85 fe ff ff 4c 89 ef e8 34 84 30 fb e9 9f fe ff ff 4c 89
RSP: 0018:ffff8801dae06860 EFLAGS: 00010206
RAX: ffff8801bf202040 RBX: ffff8801918501c0 RCX: ffffffff8690e6ff
RDX: 0000000000000100 RSI: ffffffff8690e70d RDI: 0000000000000005
RBP: ffff8801dae06880 R08: ffff8801bf202040 R09: 0000000000000002
R10: 0000000000000000 R11: ffff8801bf202040 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000003 R15: ffff8801dae069a0
 tcp_v4_destroy_sock+0x15c/0x980 net/ipv4/tcp_ipv4.c:1980
 tcp_v6_destroy_sock+0x15/0x20 net/ipv6/tcp_ipv6.c:1762
 inet_csk_destroy_sock+0x19f/0x440 net/ipv4/inet_connection_sock.c:838
 tcp_done+0x272/0x310 net/ipv4/tcp.c:3760
 tcp_v6_syn_recv_sock+0x1f21/0x25f0 net/ipv6/tcp_ipv6.c:1236
 tcp_get_cookie_sock+0x10e/0x580 net/ipv4/syncookies.c:213
 cookie_v6_check+0x1830/0x27d0 net/ipv6/syncookies.c:257
 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1028 [inline]
 tcp_v6_do_rcv+0x10ea/0x13c0 net/ipv6/tcp_ipv6.c:1336
 tcp_v6_rcv+0x34e0/0x3ab0 net/ipv6/tcp_ipv6.c:1545
 ip6_input_finish+0x3fc/0x1aa0 net/ipv6/ip6_input.c:384
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_input+0xe4/0x600 net/ipv6/ip6_input.c:427
 dst_input include/net/dst.h:450 [inline]
 ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ipv6_rcv+0x110/0x630 net/ipv6/ip6_input.c:272
 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4930
 __netif_receive_skb+0x27/0x1e0 net/core/dev.c:5040
 process_backlog+0x24e/0x7a0 net/core/dev.c:5844
 napi_poll net/core/dev.c:6264 [inline]
 net_rx_action+0x7fa/0x19b0 net/core/dev.c:6330
 __do_softirq+0x308/0xb7e kernel/softirq.c:292
 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1023
 </IRQ>
 do_softirq.part.14+0x126/0x160 kernel/softirq.c:337
 do_softirq kernel/softirq.c:329 [inline]
 __local_bh_enable_ip+0x21d/0x260 kernel/softirq.c:189
 local_bh_enable include/linux/bottom_half.h:32 [inline]
 rcu_read_unlock_bh include/linux/rcupdate.h:696 [inline]
 ip6_finish_output2+0xce4/0x27a0 net/ipv6/ip6_output.c:121
 ip6_finish_output+0x468/0xc60 net/ipv6/ip6_output.c:154
 NF_HOOK_COND include/linux/netfilter.h:278 [inline]
 ip6_output+0x232/0x9d0 net/ipv6/ip6_output.c:171
 dst_output include/net/dst.h:444 [inline]
 NF_HOOK include/linux/netfilter.h:289 [inline]
 ip6_xmit+0xf64/0x2410 net/ipv6/ip6_output.c:275
 inet6_csk_xmit+0x375/0x630 net/ipv6/inet6_connection_sock.c:139
 __tcp_transmit_skb+0x1bc5/0x3b00 net/ipv4/tcp_output.c:1162
 tcp_transmit_skb net/ipv4/tcp_output.c:1178 [inline]
 tcp_write_xmit+0x1676/0x5710 net/ipv4/tcp_output.c:2364
 tcp_push_one+0xdd/0x110 net/ipv4/tcp_output.c:2551
 tcp_sendmsg_locked+0xbc3/0x3fa0 net/ipv4/tcp.c:1386
 tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1443
 inet_sendmsg+0x19c/0x690 net/ipv4/af_inet.c:798
 sock_sendmsg_nosec net/socket.c:622 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:632
 __sys_sendto+0x3d7/0x670 net/socket.c:1789
 __do_sys_sendto net/socket.c:1801 [inline]
 __se_sys_sendto net/socket.c:1797 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1797
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f438931cc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000457569
RDX: fffffffffffffedd RSI: 0000000020000280 RDI: 0000000000000005
RBP: 000000000072bf00 R08: 0000000020000080 R09: 000000000000001c
R10: 000000002000012c R11: 0000000000000246 R12: 00007f438931d6d4
R13: 00000000004c3921 R14: 00000000004d57d8 R15: 00000000ffffffff
Kernel Offset: disabled
Rebooting in 86400 seconds..

^ permalink raw reply

* [PATCH bpf-next] bpf, tls: add tls header to tools infrastructure
From: Daniel Borkmann @ 2018-10-16 13:59 UTC (permalink / raw)
  To: alexei.starovoitov; +Cc: rdna, john.fastabend, netdev, Daniel Borkmann

Andrey reported a build error for the BPF kselftest suite when compiled on
a machine which does not have tls related header bits installed natively:

  test_sockmap.c:120:23: fatal error: linux/tls.h: No such file or directory
   #include <linux/tls.h>
                         ^
  compilation terminated.

Fix it by adding the header to the tools include infrastructure and add
definitions such as SOL_TLS that could potentially be missing.
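
As an illustration of the kind of fallback definition meant here, a
test program can guard the constants so that it builds against older
system headers. This is a sketch, not the selftest's code: the helper
enable_tls_ulp() is hypothetical, and the numeric values are the ones
used by the kernel's socket and TCP headers.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Fallbacks for system headers that predate kernel TLS support. */
#ifndef SOL_TLS
#define SOL_TLS 282
#endif
#ifndef TCP_ULP
#define TCP_ULP 31
#endif

/* Catch a mismatch between a system-provided value and the fallback. */
static_assert(SOL_TLS == 282, "unexpected SOL_TLS value");

/* Hypothetical helper: attach the "tls" upper layer protocol to a socket. */
static int enable_tls_ulp(int fd)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
}
```

The helper returns -1 with errno set if the descriptor is invalid or
the running kernel has no TLS ULP available, so callers can skip the
TLS part of a test instead of failing the build.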

Fixes: e9dd904708c4 ("bpf: add tls support for testing in test_sockmap")
Reported-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/include/uapi/linux/tls.h             | 78 ++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/test_sockmap.c | 13 +++--
 2 files changed, 86 insertions(+), 5 deletions(-)
 create mode 100644 tools/include/uapi/linux/tls.h

diff --git a/tools/include/uapi/linux/tls.h b/tools/include/uapi/linux/tls.h
new file mode 100644
index 0000000..ff02287
--- /dev/null
+++ b/tools/include/uapi/linux/tls.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
+/*
+ * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _UAPI_LINUX_TLS_H
+#define _UAPI_LINUX_TLS_H
+
+#include <linux/types.h>
+
+/* TLS socket options */
+#define TLS_TX			1	/* Set transmit parameters */
+#define TLS_RX			2	/* Set receive parameters */
+
+/* Supported versions */
+#define TLS_VERSION_MINOR(ver)	((ver) & 0xFF)
+#define TLS_VERSION_MAJOR(ver)	(((ver) >> 8) & 0xFF)
+
+#define TLS_VERSION_NUMBER(id)	((((id##_VERSION_MAJOR) & 0xFF) << 8) |	\
+				 ((id##_VERSION_MINOR) & 0xFF))
+
+#define TLS_1_2_VERSION_MAJOR	0x3
+#define TLS_1_2_VERSION_MINOR	0x3
+#define TLS_1_2_VERSION		TLS_VERSION_NUMBER(TLS_1_2)
+
+/* Supported ciphers */
+#define TLS_CIPHER_AES_GCM_128				51
+#define TLS_CIPHER_AES_GCM_128_IV_SIZE			8
+#define TLS_CIPHER_AES_GCM_128_KEY_SIZE		16
+#define TLS_CIPHER_AES_GCM_128_SALT_SIZE		4
+#define TLS_CIPHER_AES_GCM_128_TAG_SIZE		16
+#define TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE		8
+
+#define TLS_SET_RECORD_TYPE	1
+#define TLS_GET_RECORD_TYPE	2
+
+struct tls_crypto_info {
+	__u16 version;
+	__u16 cipher_type;
+};
+
+struct tls12_crypto_info_aes_gcm_128 {
+	struct tls_crypto_info info;
+	unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
+	unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
+	unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
+	unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
+};
+
+#endif /* _UAPI_LINUX_TLS_H */
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 10a5fa8..7cb69ce 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -28,6 +28,7 @@
 #include <linux/sock_diag.h>
 #include <linux/bpf.h>
 #include <linux/if_link.h>
+#include <linux/tls.h>
 #include <assert.h>
 #include <libgen.h>
 
@@ -43,6 +44,13 @@
 int running;
 static void running_handler(int a);
 
+#ifndef TCP_ULP
+# define TCP_ULP 31
+#endif
+#ifndef SOL_TLS
+# define SOL_TLS 282
+#endif
+
 /* randomly selected ports for testing on lo */
 #define S1_PORT 10000
 #define S2_PORT 10001
@@ -114,11 +122,6 @@ static void usage(char *argv[])
 	printf("\n");
 }
 
-#define TCP_ULP 31
-#define TLS_TX 1
-#define TLS_RX 2
-#include <linux/tls.h>
-
 char *sock_to_string(int s)
 {
 	if (s == c1)
-- 
2.9.5
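
A minimal standalone sketch of the two ideas in the patch above: guarded
fallback definitions (the #ifndef pattern used for TCP_ULP and SOL_TLS) and
the version-packing macros from the new tls.h. The macros and constants are
copied from the diff; the helper tls_version_is_1_2() is hypothetical and
only there for illustration, it is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Fallbacks in the style of the patch: define a constant only when the
 * installed headers did not already provide it. */
#ifndef TCP_ULP
# define TCP_ULP 31
#endif
#ifndef SOL_TLS
# define SOL_TLS 282
#endif

/* Version macros copied from the tools/include/uapi/linux/tls.h hunk. */
#define TLS_VERSION_MINOR(ver)	((ver) & 0xFF)
#define TLS_VERSION_MAJOR(ver)	(((ver) >> 8) & 0xFF)

#define TLS_VERSION_NUMBER(id)	((((id##_VERSION_MAJOR) & 0xFF) << 8) |	\
				 ((id##_VERSION_MINOR) & 0xFF))

#define TLS_1_2_VERSION_MAJOR	0x3
#define TLS_1_2_VERSION_MINOR	0x3
#define TLS_1_2_VERSION		TLS_VERSION_NUMBER(TLS_1_2)

/* Major 0x3 lands in the high byte, minor 0x3 in the low byte. */
static_assert(TLS_1_2_VERSION == 0x0303, "TLS 1.2 packs to 0x0303");

/* Hypothetical helper (not in the patch): test a packed version field. */
static inline int tls_version_is_1_2(uint16_t version)
{
	return TLS_VERSION_MAJOR(version) == TLS_1_2_VERSION_MAJOR &&
	       TLS_VERSION_MINOR(version) == TLS_1_2_VERSION_MINOR;
}
```

With the guards in place, the same source builds whether or not the system
headers already define TCP_ULP or SOL_TLS, which is the property the
selftest needs.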


* [PATCH] ptp: fix Spectre v1 vulnerability
From: Gustavo A. R. Silva @ 2018-10-16 13:06 UTC (permalink / raw)
  To: Richard Cochran, David S. Miller
  Cc: netdev, linux-kernel, Gustavo A. R. Silva

pin_index can be indirectly controlled by user space, which could allow
exploitation of the Spectre variant 1 vulnerability.

This issue was detected with the help of Smatch:

drivers/ptp/ptp_chardev.c:253 ptp_ioctl() warn: potential spectre issue
'ops->pin_config' [r] (local cap)

Fix this by sanitizing pin_index before using it to index
ops->pin_config, and before passing it as an argument to
function ptp_set_pinfunc(), in which it is used to index
info->pin_config.

Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].

[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/ptp/ptp_chardev.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index 01b0e2b..2012551 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -24,6 +24,8 @@
 #include <linux/slab.h>
 #include <linux/timekeeping.h>
 
+#include <linux/nospec.h>
+
 #include "ptp_private.h"
 
 static int ptp_disable_pinfunc(struct ptp_clock_info *ops,
@@ -248,6 +250,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 			err = -EINVAL;
 			break;
 		}
+		pin_index = array_index_nospec(pin_index, ops->n_pins);
 		if (mutex_lock_interruptible(&ptp->pincfg_mux))
 			return -ERESTARTSYS;
 		pd = ops->pin_config[pin_index];
@@ -266,6 +269,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 			err = -EINVAL;
 			break;
 		}
+		pin_index = array_index_nospec(pin_index, ops->n_pins);
 		if (mutex_lock_interruptible(&ptp->pincfg_mux))
 			return -ERESTARTSYS;
 		err = ptp_set_pinfunc(ptp, pin_index, pd.func, pd.chan);
-- 
2.7.4
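
For readers unfamiliar with the pattern in the patch above, here is a rough
user-space sketch of what clamping an index after a bounds check looks like.
It is modeled on the branch-free mask idea behind array_index_nospec(), but
it is not the kernel implementation: all names ending in _sketch are
hypothetical, it assumes size <= LONG_MAX, and a user-space compiler gives
no guarantee about speculation, so treat it purely as an illustration of
the arithmetic.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* The mask trick relies on arithmetic right shift of negative values,
 * which is implementation-defined in C; check it at compile time. */
static_assert((-1L >> 1) == -1L, "sketch assumes arithmetic right shift");

#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* All ones when index < size, zero otherwise, computed without a branch.
 * Assumes size <= LONG_MAX. */
static inline unsigned long index_mask_sketch(unsigned long index,
					      unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (SKETCH_BITS_PER_LONG - 1));
}

/* In-range indices pass through unchanged; out-of-range ones become 0. */
static inline unsigned long clamp_index_sketch(unsigned long index,
					       unsigned long size)
{
	return index & index_mask_sketch(index, size);
}

/* Same shape as the patched ioctl path: bounds check, clamp, then load. */
static inline int lookup_pin_sketch(const int *pin_config,
				    unsigned long n_pins,
				    unsigned long pin_index, int *out)
{
	if (pin_index >= n_pins)
		return -EINVAL;
	pin_index = clamp_index_sketch(pin_index, n_pins);
	*out = pin_config[pin_index];
	return 0;
}
```

The point of the clamp is that the load uses an index derived from data
flow rather than from the outcome of the preceding branch, so a
mispredicted bounds check cannot feed an out-of-range index to the load.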

