Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [net-next PATCH 3/3] net: for rate-limited ICMP replies save one atomic operation
From: Eric Dumazet @ 2017-01-09 17:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netdev, xiyou.wangcong
In-Reply-To: <20170109150414.30215.63724.stgit@firesoul>

On Mon, 2017-01-09 at 16:04 +0100, Jesper Dangaard Brouer wrote:
> It is possible to avoid the atomic operation in icmp{v6,}_xmit_lock,
> by checking the sysctl_icmp_msgs_per_sec ratelimit before these calls,
> as pointed out by Eric Dumazet, but the BH disabled state must be correct.
> 
> The icmp_global_allow() call states it must be called with BH
> disabled.  This protection was given by the calls icmp_xmit_lock and
> icmpv6_xmit_lock.  Thus, split out local_bh_disable/enable from these
> functions and maintain it explicitly at callers.
> 
> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [net-next PATCH 1/3] Revert "icmp: avoid allocating large struct on stack"
From: Eric Dumazet @ 2017-01-09 17:50 UTC (permalink / raw)
  To: Cong Wang; +Cc: Jesper Dangaard Brouer, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpUNmaBx6L4CcrXkh6aFj-fEXdoGTBJt+33W3Xg-=7yhWA@mail.gmail.com>

On Mon, 2017-01-09 at 09:42 -0800, Cong Wang wrote:
> On Mon, Jan 9, 2017 at 7:04 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> > This reverts commit 9a99d4a50cb8 ("icmp: avoid allocating large struct
> > on stack"), because struct icmp_bxm no really a large struct, and
> > allocating and free of this small 112 bytes hurts performance.
> 
> The original commit fixes a warning for large stack usage, icmp_send()
> is deep in the call stack.
> 
> Your optimization for a slow path makes no sense to me.

Do you have the stack trace of this event ?

Even Linus allowed vmalloc() kernel stacks, while it certainly was an
heresy 10 years ago.

I doubt it makes a difference trying to save 104 bytes of kernel stack.

^ permalink raw reply

* Re: [net-next PATCH 0/3] net: optimize ICMP-reply code path
From: Eric Dumazet @ 2017-01-09 17:56 UTC (permalink / raw)
  To: Cong Wang; +Cc: Jesper Dangaard Brouer, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpWVokaBs=mvG+PKArVx_t3xVg5n9SWjj69Uu=m-ywyH+w@mail.gmail.com>

On Mon, 2017-01-09 at 09:43 -0800, Cong Wang wrote:
> On Mon, Jan 9, 2017 at 7:03 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > Use-case: The specific case I experienced this being a bottleneck is,
> > sending UDP packets to a port with no listener, which obviously result
> > in kernel replying with ICMP Destination Unreachable (type:3), Port
> > Unreachable (code:3), which cause the bottleneck.
> 
> Why this is a case we should care about for performance?

This is called provisioning.

If you have a server farm that was qualified to handle 100 Mpps,
you want to absorb these 100 Mpps, even if the UDP server is restarted
in a clean or catastrophic mode.

The catastrophic mode would be the case that Jesper described : No UDP
socket is bound and ready to receive packets.

^ permalink raw reply

* Re: [PATCH net-next v2] net: dsa: make "label" property optional for dsa2
From: Jiri Pirko @ 2017-01-09 17:58 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vivien Didelot, netdev, linux-kernel, kernel, David S. Miller,
	Andrew Lunn, Uwe Kleine-König, Andrey Smirnov
In-Reply-To: <89492624-84f2-ca73-75dd-7fa10819ad09@gmail.com>

Mon, Jan 09, 2017 at 06:42:07PM CET, f.fainelli@gmail.com wrote:
>On 01/09/2017 08:06 AM, Jiri Pirko wrote:
>> Mon, Jan 09, 2017 at 04:45:33PM CET, vivien.didelot@savoirfairelinux.com wrote:
>>> Hi Jiri,
>>>
>>> Jiri Pirko <jiri@resnulli.us> writes:
>>>
>>>>> Extra question: shouldn't phys_port_{id,name} be switchdev attributes in
>>>>
>>>> Again, phys_port_id has nothing to do with switches. Should be removed
>>>> from dsa because its use there is incorrect.
>>>
>>> Florian, since 3a543ef just got in, can it be reverted?
>> 
>> Yes, please revert it. It is only in net-next.
>
>Maybe the use case can be understood before reverting the change. How do
>we actually the physical port number of an Ethernet switch per-port
>network device? The name is not enough, because there are plenty of
>cases where we need to manipulate a physical port number (be it just for
>informational purposes).

Like what?

Why the name is not enough? This is something propagated to userspace
and never used internally in kernel.

Btw, ndo_get_phys_port_id does not give you number, but arbitrary binary.


>
>Should we just amend the existing description of ndo_get_phys_port_id()?
>Should we introduce another ndo for that?
>-- 
>Florian

^ permalink raw reply

* Re: [net-next PATCH 1/3] Revert "icmp: avoid allocating large struct on stack"
From: Cong Wang @ 2017-01-09 17:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jesper Dangaard Brouer, Linux Kernel Network Developers
In-Reply-To: <1483984208.5846.9.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, Jan 9, 2017 at 9:50 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2017-01-09 at 09:42 -0800, Cong Wang wrote:
>> On Mon, Jan 9, 2017 at 7:04 AM, Jesper Dangaard Brouer
>> <brouer@redhat.com> wrote:
>> > This reverts commit 9a99d4a50cb8 ("icmp: avoid allocating large struct
>> > on stack"), because struct icmp_bxm no really a large struct, and
>> > allocating and free of this small 112 bytes hurts performance.
>>
>> The original commit fixes a warning for large stack usage, icmp_send()
>> is deep in the call stack.
>>
>> Your optimization for a slow path makes no sense to me.
>
> Do you have the stack trace of this event ?
>
> Even Linus allowed vmalloc() kernel stacks, while it certainly was an
> heresy 10 years ago.
>
> I doubt it makes a difference trying to save 104 bytes of kernel stack.

I think you should have known this, quote from Eric Dumazet
(hopefully the same one):

    On Fri, 2013-05-31 at 22:22 -0700, Eric Dumazet wrote:

    Strange, I posted a patch like that some days ago.

which is from: https://patchwork.ozlabs.org/patch/248051/

Facepalm...

^ permalink raw reply

* Re: [PATCH net-next v2] net: dsa: make "label" property optional for dsa2
From: Florian Fainelli @ 2017-01-09 18:06 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Vivien Didelot, netdev, linux-kernel, kernel, David S. Miller,
	Andrew Lunn, Uwe Kleine-König, Andrey Smirnov
In-Reply-To: <20170109175842.GH1862@nanopsycho>

On 01/09/2017 09:58 AM, Jiri Pirko wrote:
> Mon, Jan 09, 2017 at 06:42:07PM CET, f.fainelli@gmail.com wrote:
>> On 01/09/2017 08:06 AM, Jiri Pirko wrote:
>>> Mon, Jan 09, 2017 at 04:45:33PM CET, vivien.didelot@savoirfairelinux.com wrote:
>>>> Hi Jiri,
>>>>
>>>> Jiri Pirko <jiri@resnulli.us> writes:
>>>>
>>>>>> Extra question: shouldn't phys_port_{id,name} be switchdev attributes in
>>>>>
>>>>> Again, phys_port_id has nothing to do with switches. Should be removed
>>>>> from dsa because its use there is incorrect.
>>>>
>>>> Florian, since 3a543ef just got in, can it be reverted?
>>>
>>> Yes, please revert it. It is only in net-next.
>>
>> Maybe the use case can be understood before reverting the change. How do
>> we actually the physical port number of an Ethernet switch per-port
>> network device? The name is not enough, because there are plenty of
>> cases where we need to manipulate a physical port number (be it just for
>> informational purposes).
> 
> Like what?

Specifying the physical port number (and derive a queue number
eventually) for some ethtool (e.g: rxnfc)/tc (queue mapping) operations
where there is an action/queue/port destination argument that gets
programmed into the hardware.

You already have the originating port number from the interface you call
the method against, but you also need the destination port number since
that is what the HW understands.

Aside from that, it is useful for allowing interface naming in user
space if you don't want to use labels.

> 
> Why the name is not enough? This is something propagated to userspace
> and never used internally in kernel.

Because the name is not reflective of the port number in some switches.
In my case for instance, we have 5 ports that are named after the
entities they connect to (an integrated Gigabit PHY, two RGMII pads, one
MoCA interface, and the CPU)

0 -> gphy
1 -> rgmii_1
2 -> rgmii_2
7 -> moca
8 -> cpu

> 
> Btw, ndo_get_phys_port_id does not give you number, but arbitrary binary.

It's not entirely arbitrary for DSA switches since the port number is
stored in an u8 whose value is the port number in hexadecimal (as shown
by sysfs at least).
-- 
Florian

^ permalink raw reply

* Re: [net-next PATCH 1/3] Revert "icmp: avoid allocating large struct on stack"
From: Eric Dumazet @ 2017-01-09 18:07 UTC (permalink / raw)
  To: Cong Wang; +Cc: Jesper Dangaard Brouer, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpV6YQ6nh_Y0f3BDgMheOic3bPiqQ1oNHR+saxxE7nBuoA@mail.gmail.com>

On Mon, 2017-01-09 at 09:59 -0800, Cong Wang wrote:
> On Mon, Jan 9, 2017 at 9:50 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Mon, 2017-01-09 at 09:42 -0800, Cong Wang wrote:
> >> On Mon, Jan 9, 2017 at 7:04 AM, Jesper Dangaard Brouer
> >> <brouer@redhat.com> wrote:
> >> > This reverts commit 9a99d4a50cb8 ("icmp: avoid allocating large struct
> >> > on stack"), because struct icmp_bxm no really a large struct, and
> >> > allocating and free of this small 112 bytes hurts performance.
> >>
> >> The original commit fixes a warning for large stack usage, icmp_send()
> >> is deep in the call stack.
> >>
> >> Your optimization for a slow path makes no sense to me.
> >
> > Do you have the stack trace of this event ?
> >
> > Even Linus allowed vmalloc() kernel stacks, while it certainly was an
> > heresy 10 years ago.
> >
> > I doubt it makes a difference trying to save 104 bytes of kernel stack.
> 
> I think you should have known this, quote from Eric Dumazet
> (hopefully the same one):
> 
>     On Fri, 2013-05-31 at 22:22 -0700, Eric Dumazet wrote:
> 
>     Strange, I posted a patch like that some days ago.
> 
> which is from: https://patchwork.ozlabs.org/patch/248051/
> 
> Facepalm...


We are in 2017.  Whatever was said in 2013 is irrelevant.

You really should come to netdev conferences so that you understand
goals and efforts, instead of living in your cave.

Then you can slap me in the face, since this is obviously your desire.

Then, we will drink a beer and relax.

^ permalink raw reply

* [PATCH net-next 3/5] bpf: allow adjusted map element values to spill
From: Alexei Starovoitov @ 2017-01-09 18:19 UTC (permalink / raw)
  To: David S . Miller; +Cc: Daniel Borkmann, Gianluca Borello, Josef Bacik, netdev
In-Reply-To: <1483985990-1532850-1-git-send-email-ast@fb.com>

From: Gianluca Borello <g.borello@gmail.com>

commit 484611357c19 ("bpf: allow access into map value arrays")
introduces the ability to do pointer math inside a map element value via
the PTR_TO_MAP_VALUE_ADJ register type.

The current support doesn't handle the case where a PTR_TO_MAP_VALUE_ADJ
is spilled into the stack, limiting several use cases, especially when
generating bpf code from a compiler.

Handle this case by explicitly enabling the register type
PTR_TO_MAP_VALUE_ADJ to be spilled. Also, make sure that min_value and
max_value are reset just for BPF_LDX operations that don't result in a
restore of a spilled register from stack.

Signed-off-by: Gianluca Borello <g.borello@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c                       | 21 +++++++++----
 tools/testing/selftests/bpf/test_verifier.c | 46 +++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b7014606745b..59ed07b9b4ea 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -481,6 +481,13 @@ static void reset_reg_range_values(struct bpf_reg_state *regs, u32 regno)
 	regs[regno].max_value = BPF_REGISTER_MAX_RANGE;
 }
 
+static void mark_reg_unknown_value_and_range(struct bpf_reg_state *regs,
+					     u32 regno)
+{
+	mark_reg_unknown_value(regs, regno);
+	reset_reg_range_values(regs, regno);
+}
+
 enum reg_arg_type {
 	SRC_OP,		/* register is used as source operand */
 	DST_OP,		/* register is used as destination operand */
@@ -532,6 +539,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	switch (type) {
 	case PTR_TO_MAP_VALUE:
 	case PTR_TO_MAP_VALUE_OR_NULL:
+	case PTR_TO_MAP_VALUE_ADJ:
 	case PTR_TO_STACK:
 	case PTR_TO_CTX:
 	case PTR_TO_PACKET:
@@ -616,7 +624,8 @@ static int check_stack_read(struct bpf_verifier_state *state, int off, int size,
 		}
 		if (value_regno >= 0)
 			/* have read misc data from the stack */
-			mark_reg_unknown_value(state->regs, value_regno);
+			mark_reg_unknown_value_and_range(state->regs,
+							 value_regno);
 		return 0;
 	}
 }
@@ -825,7 +834,8 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off,
 		else
 			err = check_map_access(env, regno, off, size);
 		if (!err && t == BPF_READ && value_regno >= 0)
-			mark_reg_unknown_value(state->regs, value_regno);
+			mark_reg_unknown_value_and_range(state->regs,
+							 value_regno);
 
 	} else if (reg->type == PTR_TO_CTX) {
 		enum bpf_reg_type reg_type = UNKNOWN_VALUE;
@@ -837,7 +847,8 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off,
 		}
 		err = check_ctx_access(env, off, size, t, &reg_type);
 		if (!err && t == BPF_READ && value_regno >= 0) {
-			mark_reg_unknown_value(state->regs, value_regno);
+			mark_reg_unknown_value_and_range(state->regs,
+							 value_regno);
 			/* note that reg.[id|off|range] == 0 */
 			state->regs[value_regno].type = reg_type;
 		}
@@ -870,7 +881,8 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off,
 		}
 		err = check_packet_access(env, regno, off, size);
 		if (!err && t == BPF_READ && value_regno >= 0)
-			mark_reg_unknown_value(state->regs, value_regno);
+			mark_reg_unknown_value_and_range(state->regs,
+							 value_regno);
 	} else {
 		verbose("R%d invalid mem access '%s'\n",
 			regno, reg_type_str[reg->type]);
@@ -2744,7 +2756,6 @@ static int do_check(struct bpf_verifier_env *env)
 			if (err)
 				return err;
 
-			reset_reg_range_values(regs, insn->dst_reg);
 			if (BPF_SIZE(insn->code) != BPF_W &&
 			    BPF_SIZE(insn->code) != BPF_DW) {
 				insn_idx++;
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index b7732e557bf9..e7b075819c08 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -3396,6 +3396,52 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
 	},
+	{
+		"map element value is preserved across register spilling",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+			BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 42),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -184),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_1, 0),
+			BPF_ST_MEM(BPF_DW, BPF_REG_3, 0, 42),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr_unpriv = "R0 leaks addr",
+		.result = ACCEPT,
+		.result_unpriv = REJECT,
+	},
+	{
+		"map element value (adjusted) is preserved across register spilling",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0,
+				offsetof(struct test_val, foo)),
+			BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 42),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -184),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_1, 0),
+			BPF_ST_MEM(BPF_DW, BPF_REG_3, 0, 42),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr_unpriv = "R0 pointer arithmetic prohibited",
+		.result = ACCEPT,
+		.result_unpriv = REJECT,
+	},
 };
 
 static int probe_filter_length(const struct bpf_insn *fp)
-- 
2.8.0

^ permalink raw reply related

* [PATCH net-next 1/5] bpf: split check_mem_access logic for map values
From: Alexei Starovoitov @ 2017-01-09 18:19 UTC (permalink / raw)
  To: David S . Miller; +Cc: Daniel Borkmann, Gianluca Borello, Josef Bacik, netdev
In-Reply-To: <1483985990-1532850-1-git-send-email-ast@fb.com>

From: Gianluca Borello <g.borello@gmail.com>

Move the logic to check memory accesses to a PTR_TO_MAP_VALUE_ADJ from
check_mem_access() to a separate helper check_map_access_adj(). This
enables to use those checks in other parts of the verifier as well,
where boundaries on PTR_TO_MAP_VALUE_ADJ might need to be checked, for
example when checking helper function arguments. The same thing is
already happening for other types such as PTR_TO_PACKET and its
check_packet_access() helper.

The code has been copied verbatim, with the only difference of removing
the "off += reg->max_value" statement and moving the sum into the call
statement to check_map_access(), as that was only needed due to the
earlier common check_map_access() call.

Signed-off-by: Gianluca Borello <g.borello@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c | 88 ++++++++++++++++++++++++++++-----------------------
 1 file changed, 49 insertions(+), 39 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 83ed2f8f6f22..8333fbcfbfe7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -635,6 +635,51 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
 	return 0;
 }
 
+/* check read/write into an adjusted map element */
+static int check_map_access_adj(struct bpf_verifier_env *env, u32 regno,
+				int off, int size)
+{
+	struct bpf_verifier_state *state = &env->cur_state;
+	struct bpf_reg_state *reg = &state->regs[regno];
+	int err;
+
+	/* We adjusted the register to this map value, so we
+	 * need to change off and size to min_value and max_value
+	 * respectively to make sure our theoretical access will be
+	 * safe.
+	 */
+	if (log_level)
+		print_verifier_state(state);
+	env->varlen_map_value_access = true;
+	/* The minimum value is only important with signed
+	 * comparisons where we can't assume the floor of a
+	 * value is 0.  If we are using signed variables for our
+	 * index'es we need to make sure that whatever we use
+	 * will have a set floor within our range.
+	 */
+	if (reg->min_value < 0) {
+		verbose("R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n",
+			regno);
+		return -EACCES;
+	}
+	err = check_map_access(env, regno, reg->min_value + off, size);
+	if (err) {
+		verbose("R%d min value is outside of the array range\n",
+			regno);
+		return err;
+	}
+
+	/* If we haven't set a max value then we need to bail
+	 * since we can't be sure we won't do bad things.
+	 */
+	if (reg->max_value == BPF_REGISTER_MAX_RANGE) {
+		verbose("R%d unbounded memory access, make sure to bounds check any array access into a map\n",
+			regno);
+		return -EACCES;
+	}
+	return check_map_access(env, regno, reg->max_value + off, size);
+}
+
 #define MAX_PACKET_OFF 0xffff
 
 static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
@@ -775,45 +820,10 @@ static int check_mem_access(struct bpf_verifier_env *env, u32 regno, int off,
 			return -EACCES;
 		}
 
-		/* If we adjusted the register to this map value at all then we
-		 * need to change off and size to min_value and max_value
-		 * respectively to make sure our theoretical access will be
-		 * safe.
-		 */
-		if (reg->type == PTR_TO_MAP_VALUE_ADJ) {
-			if (log_level)
-				print_verifier_state(state);
-			env->varlen_map_value_access = true;
-			/* The minimum value is only important with signed
-			 * comparisons where we can't assume the floor of a
-			 * value is 0.  If we are using signed variables for our
-			 * index'es we need to make sure that whatever we use
-			 * will have a set floor within our range.
-			 */
-			if (reg->min_value < 0) {
-				verbose("R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n",
-					regno);
-				return -EACCES;
-			}
-			err = check_map_access(env, regno, reg->min_value + off,
-					       size);
-			if (err) {
-				verbose("R%d min value is outside of the array range\n",
-					regno);
-				return err;
-			}
-
-			/* If we haven't set a max value then we need to bail
-			 * since we can't be sure we won't do bad things.
-			 */
-			if (reg->max_value == BPF_REGISTER_MAX_RANGE) {
-				verbose("R%d unbounded memory access, make sure to bounds check any array access into a map\n",
-					regno);
-				return -EACCES;
-			}
-			off += reg->max_value;
-		}
-		err = check_map_access(env, regno, off, size);
+		if (reg->type == PTR_TO_MAP_VALUE_ADJ)
+			err = check_map_access_adj(env, regno, off, size);
+		else
+			err = check_map_access(env, regno, off, size);
 		if (!err && t == BPF_READ && value_regno >= 0)
 			mark_reg_unknown_value(state->regs, value_regno);
 
-- 
2.8.0

^ permalink raw reply related

* [PATCH net-next 2/5] bpf: allow helpers access to map element values
From: Alexei Starovoitov @ 2017-01-09 18:19 UTC (permalink / raw)
  To: David S . Miller; +Cc: Daniel Borkmann, Gianluca Borello, Josef Bacik, netdev
In-Reply-To: <1483985990-1532850-1-git-send-email-ast@fb.com>

From: Gianluca Borello <g.borello@gmail.com>

Enable helpers to directly access a map element value by passing a
register type PTR_TO_MAP_VALUE (or PTR_TO_MAP_VALUE_ADJ) to helper
arguments ARG_PTR_TO_STACK or ARG_PTR_TO_RAW_STACK.

This enables several use cases. For example, a typical tracing program
might want to capture pathnames passed to sys_open() with:

struct trace_data {
	char pathname[PATHLEN];
};

SEC("kprobe/sys_open")
void bpf_sys_open(struct pt_regs *ctx)
{
	struct trace_data data;
	bpf_probe_read(data.pathname, sizeof(data.pathname), ctx->di);

	/* consume data.pathname, for example via
	 * bpf_trace_printk() or bpf_perf_event_output()
	 */
}

Such a program could easily hit the stack limit in case PATHLEN needs to
be large or more local variables need to exist, both of which are quite
common scenarios. Allowing direct helper access to map element values,
one could do:

struct bpf_map_def SEC("maps") scratch_map = {
	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
	.key_size = sizeof(u32),
	.value_size = sizeof(struct trace_data),
	.max_entries = 1,
};

SEC("kprobe/sys_open")
int bpf_sys_open(struct pt_regs *ctx)
{
	int id = 0;
	struct trace_data *p = bpf_map_lookup_elem(&scratch_map, &id);
	if (!p)
		return;
	bpf_probe_read(p->pathname, sizeof(p->pathname), ctx->di);

	/* consume p->pathname, for example via
	 * bpf_trace_printk() or bpf_perf_event_output()
	 */
}

And wouldn't risk exhausting the stack.

Code changes are loosely modeled after commit 6841de8b0d03 ("bpf: allow
helpers access the packet directly"). Unlike with PTR_TO_PACKET, these
changes just work with ARG_PTR_TO_STACK and ARG_PTR_TO_RAW_STACK (not
ARG_PTR_TO_MAP_KEY, ARG_PTR_TO_MAP_VALUE, ...): adding those would be
trivial, but since there is not currently a use case for that, it's
reasonable to limit the set of changes.

Also, add new tests to make sure accesses to map element values from
helpers never go out of boundary, even when adjusted.

Signed-off-by: Gianluca Borello <g.borello@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c                       |   9 +-
 tools/testing/selftests/bpf/test_verifier.c | 491 ++++++++++++++++++++++++++++
 2 files changed, 498 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 8333fbcfbfe7..b7014606745b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -627,7 +627,7 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
 {
 	struct bpf_map *map = env->cur_state.regs[regno].map_ptr;
 
-	if (off < 0 || off + size > map->value_size) {
+	if (off < 0 || size <= 0 || off + size > map->value_size) {
 		verbose("invalid access to map value, value_size=%d off=%d size=%d\n",
 			map->value_size, off, size);
 		return -EACCES;
@@ -1025,7 +1025,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		 */
 		if (type == CONST_IMM && reg->imm == 0)
 			/* final test in check_stack_boundary() */;
-		else if (type != PTR_TO_PACKET && type != expected_type)
+		else if (type != PTR_TO_PACKET && type != PTR_TO_MAP_VALUE &&
+			 type != PTR_TO_MAP_VALUE_ADJ && type != expected_type)
 			goto err_type;
 		meta->raw_mode = arg_type == ARG_PTR_TO_RAW_STACK;
 	} else {
@@ -1088,6 +1089,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		}
 		if (regs[regno - 1].type == PTR_TO_PACKET)
 			err = check_packet_access(env, regno - 1, 0, reg->imm);
+		else if (regs[regno - 1].type == PTR_TO_MAP_VALUE)
+			err = check_map_access(env, regno - 1, 0, reg->imm);
+		else if (regs[regno - 1].type == PTR_TO_MAP_VALUE_ADJ)
+			err = check_map_access_adj(env, regno - 1, 0, reg->imm);
 		else
 			err = check_stack_boundary(env, regno - 1, reg->imm,
 						   zero_size_allowed, meta);
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 853d7e43434a..b7732e557bf9 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -2905,6 +2905,497 @@ static struct bpf_test tests[] = {
 		.result = REJECT,
 		.errstr = "invalid bpf_context access",
 	},
+	{
+		"helper access to map: full range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_2, sizeof(struct test_val)),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to map: partial range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_2, 8),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to map: empty range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "invalid access to map value, value_size=48 off=0 size=0",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to map: out-of-bound range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_2, sizeof(struct test_val) + 8),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "invalid access to map value, value_size=48 off=0 size=56",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to map: negative range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_2, -8),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "invalid access to map value, value_size=48 off=0 size=-8",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via const imm): full range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
+				offsetof(struct test_val, foo)),
+			BPF_MOV64_IMM(BPF_REG_2,
+				sizeof(struct test_val) -
+				offsetof(struct test_val, foo)),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via const imm): partial range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
+				offsetof(struct test_val, foo)),
+			BPF_MOV64_IMM(BPF_REG_2, 8),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via const imm): empty range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
+				offsetof(struct test_val, foo)),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "R1 min value is outside of the array range",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via const imm): out-of-bound range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
+				offsetof(struct test_val, foo)),
+			BPF_MOV64_IMM(BPF_REG_2,
+				sizeof(struct test_val) -
+				offsetof(struct test_val, foo) + 8),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "invalid access to map value, value_size=48 off=4 size=52",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via const imm): negative range (> adjustment)",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
+				offsetof(struct test_val, foo)),
+			BPF_MOV64_IMM(BPF_REG_2, -8),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "invalid access to map value, value_size=48 off=4 size=-8",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via const imm): negative range (< adjustment)",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1,
+				offsetof(struct test_val, foo)),
+			BPF_MOV64_IMM(BPF_REG_2, -1),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "R1 min value is outside of the array range",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via const reg): full range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_3,
+				offsetof(struct test_val, foo)),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
+			BPF_MOV64_IMM(BPF_REG_2,
+				sizeof(struct test_val) -
+				offsetof(struct test_val, foo)),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via const reg): partial range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_3,
+				offsetof(struct test_val, foo)),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
+			BPF_MOV64_IMM(BPF_REG_2, 8),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via const reg): empty range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "R1 min value is outside of the array range",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via const reg): out-of-bound range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_3,
+				offsetof(struct test_val, foo)),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
+			BPF_MOV64_IMM(BPF_REG_2,
+				sizeof(struct test_val) -
+				offsetof(struct test_val, foo) + 8),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "invalid access to map value, value_size=48 off=4 size=52",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via const reg): negative range (> adjustment)",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_3,
+				offsetof(struct test_val, foo)),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
+			BPF_MOV64_IMM(BPF_REG_2, -8),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "invalid access to map value, value_size=48 off=4 size=-8",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via const reg): negative range (< adjustment)",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_3,
+				offsetof(struct test_val, foo)),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
+			BPF_MOV64_IMM(BPF_REG_2, -1),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "R1 min value is outside of the array range",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via variable): full range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
+			BPF_JMP_IMM(BPF_JGT, BPF_REG_3,
+				offsetof(struct test_val, foo), 4),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
+			BPF_MOV64_IMM(BPF_REG_2,
+				sizeof(struct test_val) -
+				offsetof(struct test_val, foo)),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via variable): partial range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
+			BPF_JMP_IMM(BPF_JGT, BPF_REG_3,
+				offsetof(struct test_val, foo), 4),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
+			BPF_MOV64_IMM(BPF_REG_2, 8),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via variable): empty range",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
+			BPF_JMP_IMM(BPF_JGT, BPF_REG_3,
+				offsetof(struct test_val, foo), 4),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "R1 min value is outside of the array range",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via variable): no max check",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "R1 min value is negative, either use unsigned index or do a if (index >=0) check",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to adjusted map (via variable): wrong max check",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_0, 0),
+			BPF_JMP_IMM(BPF_JGT, BPF_REG_3,
+				offsetof(struct test_val, foo), 4),
+			BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_3),
+			BPF_MOV64_IMM(BPF_REG_2,
+				sizeof(struct test_val) -
+				offsetof(struct test_val, foo) + 1),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "invalid access to map value, value_size=48 off=4 size=45",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
 };
 
 static int probe_filter_length(const struct bpf_insn *fp)
-- 
2.8.0

^ permalink raw reply related

* [PATCH net-next 5/5] bpf: rename ARG_PTR_TO_STACK
From: Alexei Starovoitov @ 2017-01-09 18:19 UTC (permalink / raw)
  To: David S . Miller; +Cc: Daniel Borkmann, Gianluca Borello, Josef Bacik, netdev
In-Reply-To: <1483985990-1532850-1-git-send-email-ast@fb.com>

since ARG_PTR_TO_STACK is no longer just pointer to stack
rename it to ARG_PTR_TO_MEM and adjust comment.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h      | 12 ++++++------
 kernel/bpf/helpers.c     |  4 ++--
 kernel/bpf/verifier.c    | 28 ++++++++++++++--------------
 kernel/trace/bpf_trace.c | 20 ++++++++++----------
 net/core/filter.c        | 40 ++++++++++++++++++++--------------------
 5 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f74ae68086dc..94ea8d2383e6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -69,14 +69,14 @@ enum bpf_arg_type {
 	/* the following constraints used to prototype bpf_memcmp() and other
 	 * functions that access data on eBPF program stack
 	 */
-	ARG_PTR_TO_STACK,	/* any pointer to eBPF program stack */
-	ARG_PTR_TO_RAW_STACK,	/* any pointer to eBPF program stack, area does not
-				 * need to be initialized, helper function must fill
-				 * all bytes or clear them in error case.
+	ARG_PTR_TO_MEM,		/* pointer to valid memory (stack, packet, map value) */
+	ARG_PTR_TO_UNINIT_MEM,	/* pointer to memory does not need to be initialized,
+				 * helper function must fill all bytes or clear
+				 * them in error case.
 				 */
 
-	ARG_CONST_STACK_SIZE,	/* number of bytes accessed from stack */
-	ARG_CONST_STACK_SIZE_OR_ZERO, /* number of bytes accessed from stack or 0 */
+	ARG_CONST_SIZE,		/* number of bytes accessed from memory */
+	ARG_CONST_SIZE_OR_ZERO,	/* number of bytes accessed from memory or 0 */
 
 	ARG_PTR_TO_CTX,		/* pointer to context */
 	ARG_ANYTHING,		/* any (initialized) argument is ok */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 045cbe673356..3d24e238221e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -176,6 +176,6 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
 	.func		= bpf_get_current_comm,
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_RAW_STACK,
-	.arg2_type	= ARG_CONST_STACK_SIZE,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE,
 };
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3d4f7bf32aaf..2efdc9128e3c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1034,8 +1034,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		expected_type = PTR_TO_STACK;
 		if (type != PTR_TO_PACKET && type != expected_type)
 			goto err_type;
-	} else if (arg_type == ARG_CONST_STACK_SIZE ||
-		   arg_type == ARG_CONST_STACK_SIZE_OR_ZERO) {
+	} else if (arg_type == ARG_CONST_SIZE ||
+		   arg_type == ARG_CONST_SIZE_OR_ZERO) {
 		expected_type = CONST_IMM;
 		/* One exception. Allow UNKNOWN_VALUE registers when the
 		 * boundaries are known and don't cause unsafe memory accesses
@@ -1050,8 +1050,8 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		expected_type = PTR_TO_CTX;
 		if (type != expected_type)
 			goto err_type;
-	} else if (arg_type == ARG_PTR_TO_STACK ||
-		   arg_type == ARG_PTR_TO_RAW_STACK) {
+	} else if (arg_type == ARG_PTR_TO_MEM ||
+		   arg_type == ARG_PTR_TO_UNINIT_MEM) {
 		expected_type = PTR_TO_STACK;
 		/* One exception here. In case function allows for NULL to be
 		 * passed in as argument, it's a CONST_IMM type. Final test
@@ -1062,7 +1062,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		else if (type != PTR_TO_PACKET && type != PTR_TO_MAP_VALUE &&
 			 type != PTR_TO_MAP_VALUE_ADJ && type != expected_type)
 			goto err_type;
-		meta->raw_mode = arg_type == ARG_PTR_TO_RAW_STACK;
+		meta->raw_mode = arg_type == ARG_PTR_TO_UNINIT_MEM;
 	} else {
 		verbose("unsupported arg_type %d\n", arg_type);
 		return -EFAULT;
@@ -1108,9 +1108,9 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			err = check_stack_boundary(env, regno,
 						   meta->map_ptr->value_size,
 						   false, NULL);
-	} else if (arg_type == ARG_CONST_STACK_SIZE ||
-		   arg_type == ARG_CONST_STACK_SIZE_OR_ZERO) {
-		bool zero_size_allowed = (arg_type == ARG_CONST_STACK_SIZE_OR_ZERO);
+	} else if (arg_type == ARG_CONST_SIZE ||
+		   arg_type == ARG_CONST_SIZE_OR_ZERO) {
+		bool zero_size_allowed = (arg_type == ARG_CONST_SIZE_OR_ZERO);
 
 		/* bpf_xxx(..., buf, len) call will access 'len' bytes
 		 * from stack pointer 'buf'. Check it
@@ -1118,7 +1118,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		 */
 		if (regno == 0) {
 			/* kernel subsystem misconfigured verifier */
-			verbose("ARG_CONST_STACK_SIZE cannot be first argument\n");
+			verbose("ARG_CONST_SIZE cannot be first argument\n");
 			return -EACCES;
 		}
 
@@ -1235,15 +1235,15 @@ static int check_raw_mode(const struct bpf_func_proto *fn)
 {
 	int count = 0;
 
-	if (fn->arg1_type == ARG_PTR_TO_RAW_STACK)
+	if (fn->arg1_type == ARG_PTR_TO_UNINIT_MEM)
 		count++;
-	if (fn->arg2_type == ARG_PTR_TO_RAW_STACK)
+	if (fn->arg2_type == ARG_PTR_TO_UNINIT_MEM)
 		count++;
-	if (fn->arg3_type == ARG_PTR_TO_RAW_STACK)
+	if (fn->arg3_type == ARG_PTR_TO_UNINIT_MEM)
 		count++;
-	if (fn->arg4_type == ARG_PTR_TO_RAW_STACK)
+	if (fn->arg4_type == ARG_PTR_TO_UNINIT_MEM)
 		count++;
-	if (fn->arg5_type == ARG_PTR_TO_RAW_STACK)
+	if (fn->arg5_type == ARG_PTR_TO_UNINIT_MEM)
 		count++;
 
 	return count > 1 ? -EINVAL : 0;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index fa77311dadb2..f883c43c96f3 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -76,8 +76,8 @@ static const struct bpf_func_proto bpf_probe_read_proto = {
 	.func		= bpf_probe_read,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_RAW_STACK,
-	.arg2_type	= ARG_CONST_STACK_SIZE,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE,
 	.arg3_type	= ARG_ANYTHING,
 };
 
@@ -109,8 +109,8 @@ static const struct bpf_func_proto bpf_probe_write_user_proto = {
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_ANYTHING,
-	.arg2_type	= ARG_PTR_TO_STACK,
-	.arg3_type	= ARG_CONST_STACK_SIZE,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
 };
 
 static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
@@ -213,8 +213,8 @@ static const struct bpf_func_proto bpf_trace_printk_proto = {
 	.func		= bpf_trace_printk,
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_STACK,
-	.arg2_type	= ARG_CONST_STACK_SIZE,
+	.arg1_type	= ARG_PTR_TO_MEM,
+	.arg2_type	= ARG_CONST_SIZE,
 };
 
 const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
@@ -329,8 +329,8 @@ static const struct bpf_func_proto bpf_perf_event_output_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_CONST_MAP_PTR,
 	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_STACK,
-	.arg5_type	= ARG_CONST_STACK_SIZE,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE,
 };
 
 static DEFINE_PER_CPU(struct pt_regs, bpf_pt_regs);
@@ -492,8 +492,8 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_tp = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_CONST_MAP_PTR,
 	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_STACK,
-	.arg5_type	= ARG_CONST_STACK_SIZE,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE,
 };
 
 BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map,
diff --git a/net/core/filter.c b/net/core/filter.c
index 1969b3f118c1..f4d16a905754 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1416,8 +1416,8 @@ static const struct bpf_func_proto bpf_skb_store_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_STACK,
-	.arg4_type	= ARG_CONST_STACK_SIZE,
+	.arg3_type	= ARG_PTR_TO_MEM,
+	.arg4_type	= ARG_CONST_SIZE,
 	.arg5_type	= ARG_ANYTHING,
 };
 
@@ -1447,8 +1447,8 @@ static const struct bpf_func_proto bpf_skb_load_bytes_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_ANYTHING,
-	.arg3_type	= ARG_PTR_TO_RAW_STACK,
-	.arg4_type	= ARG_CONST_STACK_SIZE,
+	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type	= ARG_CONST_SIZE,
 };
 
 BPF_CALL_2(bpf_skb_pull_data, struct sk_buff *, skb, u32, len)
@@ -1601,10 +1601,10 @@ static const struct bpf_func_proto bpf_csum_diff_proto = {
 	.gpl_only	= false,
 	.pkt_access	= true,
 	.ret_type	= RET_INTEGER,
-	.arg1_type	= ARG_PTR_TO_STACK,
-	.arg2_type	= ARG_CONST_STACK_SIZE_OR_ZERO,
-	.arg3_type	= ARG_PTR_TO_STACK,
-	.arg4_type	= ARG_CONST_STACK_SIZE_OR_ZERO,
+	.arg1_type	= ARG_PTR_TO_MEM,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg3_type	= ARG_PTR_TO_MEM,
+	.arg4_type	= ARG_CONST_SIZE_OR_ZERO,
 	.arg5_type	= ARG_ANYTHING,
 };
 
@@ -2306,8 +2306,8 @@ static const struct bpf_func_proto bpf_skb_event_output_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_CONST_MAP_PTR,
 	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_STACK,
-	.arg5_type	= ARG_CONST_STACK_SIZE,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE,
 };
 
 static unsigned short bpf_tunnel_key_af(u64 flags)
@@ -2377,8 +2377,8 @@ static const struct bpf_func_proto bpf_skb_get_tunnel_key_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_RAW_STACK,
-	.arg3_type	= ARG_CONST_STACK_SIZE,
+	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
 	.arg4_type	= ARG_ANYTHING,
 };
 
@@ -2412,8 +2412,8 @@ static const struct bpf_func_proto bpf_skb_get_tunnel_opt_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_RAW_STACK,
-	.arg3_type	= ARG_CONST_STACK_SIZE,
+	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
 };
 
 static struct metadata_dst __percpu *md_dst;
@@ -2483,8 +2483,8 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_key_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_STACK,
-	.arg3_type	= ARG_CONST_STACK_SIZE,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
 	.arg4_type	= ARG_ANYTHING,
 };
 
@@ -2509,8 +2509,8 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
-	.arg2_type	= ARG_PTR_TO_STACK,
-	.arg3_type	= ARG_CONST_STACK_SIZE,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
 };
 
 static const struct bpf_func_proto *
@@ -2593,8 +2593,8 @@ static const struct bpf_func_proto bpf_xdp_event_output_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 	.arg2_type	= ARG_CONST_MAP_PTR,
 	.arg3_type	= ARG_ANYTHING,
-	.arg4_type	= ARG_PTR_TO_STACK,
-	.arg5_type	= ARG_CONST_STACK_SIZE,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE,
 };
 
 static const struct bpf_func_proto *
-- 
2.8.0

^ permalink raw reply related

* [PATCH net-next 0/5] bpf: verifier improvements
From: Alexei Starovoitov @ 2017-01-09 18:19 UTC (permalink / raw)
  To: David S . Miller; +Cc: Daniel Borkmann, Gianluca Borello, Josef Bacik, netdev

A number of bpf verifier improvements from Gianluca.
See individual patches for details.

Alexei Starovoitov (1):
  bpf: rename ARG_PTR_TO_STACK

Gianluca Borello (4):
  bpf: split check_mem_access logic for map values
  bpf: allow helpers access to map element values
  bpf: allow adjusted map element values to spill
  bpf: allow helpers access to variable memory

 include/linux/bpf.h                         |  12 +-
 kernel/bpf/helpers.c                        |   4 +-
 kernel/bpf/verifier.c                       | 212 +++++--
 kernel/trace/bpf_trace.c                    |  20 +-
 net/core/filter.c                           |  40 +-
 tools/testing/selftests/bpf/test_verifier.c | 947 ++++++++++++++++++++++++++++
 6 files changed, 1131 insertions(+), 104 deletions(-)

-- 
2.8.0

^ permalink raw reply

* [PATCH net-next 4/5] bpf: allow helpers access to variable memory
From: Alexei Starovoitov @ 2017-01-09 18:19 UTC (permalink / raw)
  To: David S . Miller; +Cc: Daniel Borkmann, Gianluca Borello, Josef Bacik, netdev
In-Reply-To: <1483985990-1532850-1-git-send-email-ast@fb.com>

From: Gianluca Borello <g.borello@gmail.com>

Currently, helpers that read and write from/to the stack can do so using
a pair of arguments of type ARG_PTR_TO_STACK and ARG_CONST_STACK_SIZE.
ARG_CONST_STACK_SIZE accepts a constant register of type CONST_IMM, so
that the verifier can safely check the memory access. However, requiring
the argument to be a constant can be limiting in some circumstances.

Since the current logic keeps track of the minimum and maximum value of
a register throughout the simulated execution, ARG_CONST_STACK_SIZE can
be changed to also accept an UNKNOWN_VALUE register in case its
boundaries have been set and the range doesn't cause invalid memory
accesses.

One common situation when this is useful:

int len;
char buf[BUFSIZE]; /* BUFSIZE is 128 */

if (some_condition)
	len = 42;
else
	len = 84;

some_helper(..., buf, len & (BUFSIZE - 1));

The compiler can often decide to assign the constant values 42 or 48
into a variable on the stack, instead of keeping it in a register. When
the variable is then read back from stack into the register in order to
be passed to the helper, the verifier will not be able to recognize the
register as constant (the verifier is not currently tracking all
constant writes into memory), and the program won't be valid.

However, by allowing the helper to accept an UNKNOWN_VALUE register,
this program will work because the bitwise AND operation will set the
range of possible values for the UNKNOWN_VALUE register to [0, BUFSIZE),
so the verifier can guarantee the helper call will be safe (assuming the
argument is of type ARG_CONST_STACK_SIZE_OR_ZERO, otherwise one more
check against 0 would be needed). Custom ranges can be set not only with
ALU operations, but also by explicitly comparing the UNKNOWN_VALUE
register with constants.

Another very common example happens when intercepting system call
arguments and accessing user-provided data of variable size using
bpf_probe_read(). One can load at runtime the user-provided length in an
UNKNOWN_VALUE register, and then read that exact amount of data up to a
compile-time determined limit in order to fit into the proper local
storage allocated on the stack, without having to guess a suboptimal
access size at compile time.

Also, in case the helpers accepting the UNKNOWN_VALUE register operate
in raw mode, disable the raw mode so that the program is required to
initialize all memory, since there is no guarantee the helper will fill
it completely, leaving possibilities for data leak (just relevant when
the memory used by the helper is the stack, not when using a pointer to
map element value or packet). In other words, ARG_PTR_TO_RAW_STACK will
be treated as ARG_PTR_TO_STACK.

Signed-off-by: Gianluca Borello <g.borello@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c                       |  74 ++++-
 tools/testing/selftests/bpf/test_verifier.c | 410 ++++++++++++++++++++++++++++
 2 files changed, 474 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 59ed07b9b4ea..3d4f7bf32aaf 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -980,6 +980,25 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
 	return 0;
 }
 
+static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
+				   int access_size, bool zero_size_allowed,
+				   struct bpf_call_arg_meta *meta)
+{
+	struct bpf_reg_state *regs = env->cur_state.regs;
+
+	switch (regs[regno].type) {
+	case PTR_TO_PACKET:
+		return check_packet_access(env, regno, 0, access_size);
+	case PTR_TO_MAP_VALUE:
+		return check_map_access(env, regno, 0, access_size);
+	case PTR_TO_MAP_VALUE_ADJ:
+		return check_map_access_adj(env, regno, 0, access_size);
+	default: /* const_imm|ptr_to_stack or invalid ptr */
+		return check_stack_boundary(env, regno, access_size,
+					    zero_size_allowed, meta);
+	}
+}
+
 static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			  enum bpf_arg_type arg_type,
 			  struct bpf_call_arg_meta *meta)
@@ -1018,7 +1037,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 	} else if (arg_type == ARG_CONST_STACK_SIZE ||
 		   arg_type == ARG_CONST_STACK_SIZE_OR_ZERO) {
 		expected_type = CONST_IMM;
-		if (type != expected_type)
+		/* One exception. Allow UNKNOWN_VALUE registers when the
+		 * boundaries are known and don't cause unsafe memory accesses
+		 */
+		if (type != UNKNOWN_VALUE && type != expected_type)
 			goto err_type;
 	} else if (arg_type == ARG_CONST_MAP_PTR) {
 		expected_type = CONST_PTR_TO_MAP;
@@ -1099,15 +1121,47 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			verbose("ARG_CONST_STACK_SIZE cannot be first argument\n");
 			return -EACCES;
 		}
-		if (regs[regno - 1].type == PTR_TO_PACKET)
-			err = check_packet_access(env, regno - 1, 0, reg->imm);
-		else if (regs[regno - 1].type == PTR_TO_MAP_VALUE)
-			err = check_map_access(env, regno - 1, 0, reg->imm);
-		else if (regs[regno - 1].type == PTR_TO_MAP_VALUE_ADJ)
-			err = check_map_access_adj(env, regno - 1, 0, reg->imm);
-		else
-			err = check_stack_boundary(env, regno - 1, reg->imm,
-						   zero_size_allowed, meta);
+
+		/* If the register is UNKNOWN_VALUE, the access check happens
+		 * using its boundaries. Otherwise, just use its imm
+		 */
+		if (type == UNKNOWN_VALUE) {
+			/* For unprivileged variable accesses, disable raw
+			 * mode so that the program is required to
+			 * initialize all the memory that the helper could
+			 * just partially fill up.
+			 */
+			meta = NULL;
+
+			if (reg->min_value < 0) {
+				verbose("R%d min value is negative, either use unsigned or 'var &= const'\n",
+					regno);
+				return -EACCES;
+			}
+
+			if (reg->min_value == 0) {
+				err = check_helper_mem_access(env, regno - 1, 0,
+							      zero_size_allowed,
+							      meta);
+				if (err)
+					return err;
+			}
+
+			if (reg->max_value == BPF_REGISTER_MAX_RANGE) {
+				verbose("R%d unbounded memory access, use 'var &= const' or 'if (var < const)'\n",
+					regno);
+				return -EACCES;
+			}
+			err = check_helper_mem_access(env, regno - 1,
+						      reg->max_value,
+						      zero_size_allowed, meta);
+			if (err)
+				return err;
+		} else {
+			/* register is CONST_IMM */
+			err = check_helper_mem_access(env, regno - 1, reg->imm,
+						      zero_size_allowed, meta);
+		}
 	}
 
 	return err;
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index e7b075819c08..9bb45346dc72 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -3442,6 +3442,416 @@ static struct bpf_test tests[] = {
 		.result = ACCEPT,
 		.result_unpriv = REJECT,
 	},
+	{
+		"helper access to variable memory: stack, bitwise AND + JMP, correct bounds",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -64),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -56),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -48),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -40),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -32),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -24),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+			BPF_MOV64_IMM(BPF_REG_2, 16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128),
+			BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 64),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: stack, bitwise AND, zero included",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64),
+			BPF_MOV64_IMM(BPF_REG_2, 16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128),
+			BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 64),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid stack type R1 off=-64 access_size=0",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: stack, bitwise AND + JMP, wrong max",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64),
+			BPF_MOV64_IMM(BPF_REG_2, 16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128),
+			BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 65),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid stack type R1 off=-64 access_size=65",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: stack, JMP, correct bounds",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -64),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -56),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -48),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -40),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -32),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -24),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+			BPF_MOV64_IMM(BPF_REG_2, 16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128),
+			BPF_JMP_IMM(BPF_JGT, BPF_REG_2, 64, 4),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: stack, JMP (signed), correct bounds",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -64),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -56),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -48),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -40),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -32),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -24),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+			BPF_MOV64_IMM(BPF_REG_2, 16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128),
+			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2, 64, 4),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_JMP_REG(BPF_JSGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: stack, JMP, bounds + offset",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64),
+			BPF_MOV64_IMM(BPF_REG_2, 16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128),
+			BPF_JMP_IMM(BPF_JGT, BPF_REG_2, 64, 5),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 3),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, 1),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid stack type R1 off=-64 access_size=65",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: stack, JMP, wrong max",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64),
+			BPF_MOV64_IMM(BPF_REG_2, 16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128),
+			BPF_JMP_IMM(BPF_JGT, BPF_REG_2, 65, 4),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid stack type R1 off=-64 access_size=65",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: stack, JMP, no max check",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64),
+			BPF_MOV64_IMM(BPF_REG_2, 16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R2 unbounded memory access",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: stack, JMP, no min check",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64),
+			BPF_MOV64_IMM(BPF_REG_2, 16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128),
+			BPF_JMP_IMM(BPF_JGT, BPF_REG_2, 64, 3),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid stack type R1 off=-64 access_size=0",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: stack, JMP (signed), no min check",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64),
+			BPF_MOV64_IMM(BPF_REG_2, 16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -128),
+			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2, 64, 3),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R2 min value is negative",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: map, JMP, correct bounds",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 10),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_2, sizeof(struct test_val)),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_10, -128),
+			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2,
+				sizeof(struct test_val), 4),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: map, JMP, wrong max",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 10),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_2, sizeof(struct test_val)),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_10, -128),
+			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2,
+				sizeof(struct test_val) + 1, 4),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "invalid access to map value, value_size=48 off=0 size=49",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: map adjusted, JMP, correct bounds",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 11),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 20),
+			BPF_MOV64_IMM(BPF_REG_2, sizeof(struct test_val)),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_10, -128),
+			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2,
+				sizeof(struct test_val) - 20, 4),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: map adjusted, JMP, wrong max",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+			BPF_ST_MEM(BPF_DW, BPF_REG_2, 0, 0),
+			BPF_LD_MAP_FD(BPF_REG_1, 0),
+			BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 11),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 20),
+			BPF_MOV64_IMM(BPF_REG_2, sizeof(struct test_val)),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -128),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_10, -128),
+			BPF_JMP_IMM(BPF_JSGT, BPF_REG_2,
+				sizeof(struct test_val) - 19, 4),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_JMP_REG(BPF_JGE, BPF_REG_4, BPF_REG_2, 2),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_map2 = { 3 },
+		.errstr = "R1 min value is outside of the array range",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: size > 0 not allowed on NULL",
+		.insns = {
+			BPF_MOV64_IMM(BPF_REG_1, 0),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 64),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_EMIT_CALL(BPF_FUNC_csum_diff),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "R1 type=imm expected=fp",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"helper access to variable memory: size = 0 not allowed on != NULL",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_2, 0),
+			BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 8),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_EMIT_CALL(BPF_FUNC_csum_diff),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid stack type R1 off=-8 access_size=0",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	},
+	{
+		"helper access to variable memory: 8 bytes leak",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -64),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -56),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -48),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -40),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -24),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 63),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, 1),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
+			BPF_EXIT_INSN(),
+		},
+		.errstr = "invalid indirect read from stack off -64+32 size 64",
+		.result = REJECT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
+	{
+		"helper access to variable memory: 8 bytes no leak (init memory)",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -64),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -56),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -48),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -40),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -32),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -24),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -16),
+			BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -64),
+			BPF_MOV64_IMM(BPF_REG_2, 0),
+			BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 32),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, 32),
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_EMIT_CALL(BPF_FUNC_probe_read),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, -16),
+			BPF_EXIT_INSN(),
+		},
+		.result = ACCEPT,
+		.prog_type = BPF_PROG_TYPE_TRACEPOINT,
+	},
 };
 
 static int probe_filter_length(const struct bpf_insn *fp)
-- 
2.8.0

^ permalink raw reply related

* Re: [PATCH net-next] net/sched: cls_flower: Add user specified data
From: John Fastabend @ 2017-01-09 18:23 UTC (permalink / raw)
  To: Jiri Pirko, Jamal Hadi Salim
  Cc: Paul Blakey, David S. Miller, netdev, Jiri Pirko, Hadar Hen Zion,
	Or Gerlitz, Roi Dayan, Roman Mashak, Simon Horman
In-Reply-To: <20170108171902.GH1971@nanopsycho>

On 17-01-08 09:19 AM, Jiri Pirko wrote:
> Mon, Jan 02, 2017 at 11:21:41PM CET, jhs@mojatatu.com wrote:
>> On 17-01-02 01:23 PM, John Fastabend wrote:
>>
>>>
>>> Additionally I would like to point out this is an arbitrary length binary
>>> blob (for undefined use, without even a specified encoding) that gets pushed
>>> between user space and hardware ;) This seemed to get folks fairly excited in
>>> the past.
>>>
>>
>> The binary blob size is a little strange - but i think there is value
>> in storing some "cookie" field. The challenge is whether the kernel
>> gets to intepret it; in which case encoding must be specified. Or
>> whether we should leave it up to user space - in which something
>> like tc could standardize its own encodings.
> 
> This should never be interpreted by kernel. I think this would be good
> to make clear in the comment in the code.
> 

Ah OK I had assumed you would be pushing this via tc_cls_flower_offload into
the driver in a follow up patch. But if it lives in kernel space as opaque
cookie guess its no different then other id fields order/prio/cookie.

Thanks for clarifying.

^ permalink raw reply

* Re: [PATCHv2 net-next 0/5] sctp: add support for generating stream reconf chunks
From: Xin Long @ 2017-01-09 18:27 UTC (permalink / raw)
  To: David Miller
  Cc: Neil Horman, network dev, linux-sctp, Marcelo Ricardo Leitner,
	Vlad Yasevich
In-Reply-To: <20170109.105301.469495957021068521.davem@davemloft.net>

On Mon, Jan 9, 2017 at 11:53 PM, David Miller <davem@davemloft.net> wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Mon, 9 Jan 2017 07:43:25 -0500
>
>> These all look reasonably good, but it seems before we accept them,
>> there should be an additional patch that actually makes use of the code.
>> I presume that is forthcomming?
>
> This all comes from my asking that the original huge set of patches be
> split up.
>
> People always just rush this kind of work and never think about laying
> out the resubmission properly.
>
> One should always only submit new interfaces along with an actual use
> because only with a use can we properly review whether the new
> interface is good or not.

I was trying to keep the same order with rfc, but it seems not a good
idea, will resplit, thanks.

^ permalink raw reply

* [PATCH net-next] tcp: make TCP_INFO more consistent
From: Eric Dumazet @ 2017-01-09 18:29 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

tcp_get_info() has to lock the socket, so lets lock it
for an extended critical section, so that various fields
have consistent values.

This solves an annoying issue that some applications
reported when multiple counters are updated during one
particular rx/rx event, and TCP_INFO was called from
another cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
---
 net/ipv4/tcp.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ec97e4b4a62f44decdcb559991500a737c7d6379..c8d46c140b4a47ee7c954317977393664e295abf 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2766,6 +2766,9 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 		info->tcpi_sacked = sk->sk_max_ack_backlog;
 		return;
 	}
+
+	slow = lock_sock_fast(sk);
+
 	info->tcpi_ca_state = icsk->icsk_ca_state;
 	info->tcpi_retransmits = icsk->icsk_retransmits;
 	info->tcpi_probes = icsk->icsk_probes_out;
@@ -2816,15 +2819,11 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 
 	info->tcpi_total_retrans = tp->total_retrans;
 
-	slow = lock_sock_fast(sk);
-
 	info->tcpi_bytes_acked = tp->bytes_acked;
 	info->tcpi_bytes_received = tp->bytes_received;
 	info->tcpi_notsent_bytes = max_t(int, 0, tp->write_seq - tp->snd_nxt);
 	tcp_get_info_chrono_stats(tp, info);
 
-	unlock_sock_fast(sk, slow);
-
 	info->tcpi_segs_out = tp->segs_out;
 	info->tcpi_segs_in = tp->segs_in;
 
@@ -2840,6 +2839,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 		do_div(rate64, intv);
 		info->tcpi_delivery_rate = rate64;
 	}
+	unlock_sock_fast(sk, slow);
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
 

^ permalink raw reply related

* Re: [PATCHv3 4/5] arm: mvebu: Add device tree for 98DX3236 SoCs
From: Rob Herring @ 2017-01-09 18:44 UTC (permalink / raw)
  To: Chris Packham
  Cc: linux-arm-kernel, Mark Rutland, Jason Cooper, Andrew Lunn,
	Gregory Clement, Sebastian Hesselbarth, Russell King, devicetree,
	linux-kernel, netdev
In-Reply-To: <20170106041517.9589-5-chris.packham@alliedtelesis.co.nz>

On Fri, Jan 06, 2017 at 05:15:01PM +1300, Chris Packham wrote:
> The Marvell 98DX3236, 98DX3336, 98DX4521 and variants are switch ASICs
> with integrated CPUs. They are similar to the Armada XP SoCs but have
> different I/O interfaces.
> 
> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
> ---
>     Changes in v2:
>     - Update devicetree binding documentation to reflect that 98DX3336 and
>       984251 are supersets of 98DX3236.
>     - disable crypto block
>     - disable sdio for 98DX3236, enable for 98DX4251
>     Changes in v3:
>     - fix typo 4521 -> 4251
>     - document prestera bindings
>     - rework corediv-clock binding
>     - add label to packet processor node
>     - add new compativle string for DFX server
> 
>  .../devicetree/bindings/arm/marvell/98dx3236.txt   |  23 ++
>  .../devicetree/bindings/net/marvell,prestera.txt   |  50 ++++
>  arch/arm/boot/dts/armada-xp-98dx3236.dtsi          | 254 +++++++++++++++++++++
>  arch/arm/boot/dts/armada-xp-98dx3336.dtsi          |  76 ++++++
>  arch/arm/boot/dts/armada-xp-98dx4251.dtsi          |  90 ++++++++
>  5 files changed, 493 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/marvell/98dx3236.txt
>  create mode 100644 Documentation/devicetree/bindings/net/marvell,prestera.txt
>  create mode 100644 arch/arm/boot/dts/armada-xp-98dx3236.dtsi
>  create mode 100644 arch/arm/boot/dts/armada-xp-98dx3336.dtsi
>  create mode 100644 arch/arm/boot/dts/armada-xp-98dx4251.dtsi

Acked-by: Rob Herring <robh@kernel.org>

^ permalink raw reply

* Re: [PATCH net-next] liquidio: store the L4 hash of rx packets in skb
From: Felix Manlunas @ 2017-01-09 18:45 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla
In-Reply-To: <20170108.170941.1437038681712014840.davem@davemloft.net>

David Miller <davem@davemloft.net> wrote on Sun [2017-Jan-08 17:09:41 -0500]:
> From: Felix Manlunas <felix.manlunas@cavium.com>
> Date: Fri, 6 Jan 2017 16:55:42 -0800
> 
> >  
> > +		if (rh->r_dh.has_hash) {
> > +			u32 hash = be32_to_cpu(*(u32 *)(skb->data + r_dh_off));
> 
> Is the checksum defined to be in the first 4-bytes of the 8-byte DHLEN unit,
> or the second 4-bytes?  Is the answer to this question endian-dependent?

The hash is always in the first 4-bytes.  The location of the hash is endian
independent.  The hash itself (in its original form) is big endian.

^ permalink raw reply

* Re: [PATCH stable 4.1] openvswitch: gre: filter gre packets
From: Joe Stringer @ 2017-01-09 18:48 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev, Uri Foox
In-Reply-To: <1483884867-7264-1-git-send-email-pshelar@ovn.org>

On 8 January 2017 at 06:14, Pravin B Shelar <pshelar@ovn.org> wrote:
> OVS can only process L2 packets. But OVS GRE receive handler
> can accept IP-GRE packets. When such packet is processed by
> OVS datapath it can trigger following assert failure due
> to insufficient linear data in skb. Following patch filters
> received packets to avoid this issue.
>
> [68240.441681] ------------[ cut here ]------------
> [68240.496918] kernel BUG at /build/linux-lts-trusty-D60X6T/linux-lts-trusty-3.13.0/include/linux/skbuff.h:1486!
> [68240.615520] invalid opcode: 0000 [#1] SMP
> [68241.953939] RIP: [<ffffffffa052b4fe>] __skb_pull.part.7+0x4/0x6 [openvswitch]
> [68243.099945] Call Trace:
> [68243.129188]  <IRQ>
> [68243.152204]  [<ffffffffa0524e64>] ovs_flow_extract+0x664/0x720 [openvswitch]
> [68243.314912]  [<ffffffffa0523a80>] ovs_dp_process_received_packet+0x60/0x130 [openvswitch]
> [68243.481559]  [<ffffffffa0529e3a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
> [68243.564884]  [<ffffffffa052b374>] gre_rcv+0xa4/0xb8 [openvswitch]
> [68243.637802]  [<ffffffffa03e2795>] gre_cisco_rcv+0x75/0xbc [gre]
> [68243.708621]  [<ffffffffa03e22f5>] gre_rcv+0x65/0x90 [gre]
> [68243.773214]  [<ffffffff816941d8>] ip_local_deliver_finish+0xa8/0x220
> [68243.849244]  [<ffffffff816944db>] ip_local_deliver+0x4b/0x90
> [68243.916951]  [<ffffffff81693ed1>] ip_rcv_finish+0x121/0x380
> [68243.983627]  [<ffffffff816947a6>] ip_rcv+0x286/0x380
> [68244.043023]  [<ffffffff8165b80a>] __netif_receive_skb_core+0x61a/0x760
> [68244.121122]  [<ffffffff8165b971>] __netif_receive_skb+0x21/0x70
> [68244.191942]  [<ffffffff8165c131>] process_backlog+0xb1/0x190
> [68244.259642]  [<ffffffff8165ca09>] net_rx_action+0x139/0x280
> [68244.326305]  [<ffffffff8107367d>] __do_softirq+0xed/0x360
> [68244.390887]  [<ffffffff81073c8e>] irq_exit+0x11e/0x140
> [68244.452358]  [<ffffffff8177d873>] do_IRQ+0x63/0xe0
> [68244.509674]  [<ffffffff817728ad>] common_interrupt+0x6d/0x6d
> [68245.392237] RIP  [<ffffffffa052b4fe>] __skb_pull.part.7+0x4/0x6 [openvswitch]
> [68245.520082] ---[ end trace 383bac9f3e676970 ]---
>
> Fixes: aa310701e7 ("openvswitch: Add gre tunnel support.")
> Reported-by: Uri Foox <uri@zoey.com>
> CC: Joe Stringer <joe@ovn.org>
> Signed-off-by: Pravin B Shelar <pshelar@ovn.org>

Acked-by: Joe Stringer <joe@ovn.org>

^ permalink raw reply

* Re: [net-next PATCH 1/3] Revert "icmp: avoid allocating large struct on stack"
From: David Miller @ 2017-01-09 18:47 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: eric.dumazet, brouer, netdev
In-Reply-To: <CAM_iQpV6YQ6nh_Y0f3BDgMheOic3bPiqQ1oNHR+saxxE7nBuoA@mail.gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 9 Jan 2017 09:59:34 -0800

> Facepalm...

This is completely unnecessary.  Discuss facts and real issues, rather
than make emotional retorts.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next] liquidio: store the L4 hash of rx packets in skb
From: Eric Dumazet @ 2017-01-09 18:51 UTC (permalink / raw)
  To: Felix Manlunas
  Cc: davem, netdev, raghu.vatsavayi, derek.chickles, satananda.burla
In-Reply-To: <20170107005542.GA16278@felix.cavium.com>

On Fri, 2017-01-06 at 16:55 -0800, Felix Manlunas wrote:
> From: Prasad Kanneganti <prasad.kanneganti@cavium.com>
> 
> Store the L4 hash of received packets in the skb; the hash is computed in
> the NIC firmware.
...

>  
> +		if (rh->r_dh.has_hash) {
> +			u32 hash = be32_to_cpu(*(u32 *)(skb->data + r_dh_off));

sparse should complain on this line, right ?

make C=2 CF=-D__CHECK_ENDIAN__ M=drivers/net/ethernet/cavium

^ permalink raw reply

* Re: [net-next PATCH 1/3] Revert "icmp: avoid allocating large struct on stack"
From: David Miller @ 2017-01-09 18:52 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiyou.wangcong, brouer, netdev
In-Reply-To: <1483985224.21472.3.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 09 Jan 2017 10:07:04 -0800

> You really should come to netdev conferences so that you understand
> goals and efforts, instead of living in your cave.

I completely agree with Eric.

Cong we have a very serious problem with you exactly because you make
quite vicious emotional statements targetted at other developers
merely when they say something you disagree with.

This is completely unacceptable behavior, and you must stop doing
this, now.

And I am absolutely positive that if you had met us all in person at
netdev you would not be treating us like this.

So please do us a _HUGE_ favor, and leave your cave, and come to
an upcoming netdev.

Thank you.

^ permalink raw reply

* Re: [PATCHv2 net-next 0/5] sctp: add support for generating stream reconf chunks
From: David Miller @ 2017-01-09 18:53 UTC (permalink / raw)
  To: lucien.xin; +Cc: nhorman, netdev, linux-sctp, marcelo.leitner, vyasevich
In-Reply-To: <CADvbK_dE74bAoTNo9gLczLuj71w3WMcPJ56tCFGJ=XjTMtTgRw@mail.gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Tue, 10 Jan 2017 02:27:09 +0800

> I was trying to keep the same order with rfc, but it seems not a good
> idea, will resplit, thanks.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next] net: ipv4: remove disable of bottom half in inet_rtm_getroute
From: David Miller @ 2017-01-09 18:55 UTC (permalink / raw)
  To: dsa; +Cc: netdev
In-Reply-To: <1483848263-17031-1-git-send-email-dsa@cumulusnetworks.com>

From: David Ahern <dsa@cumulusnetworks.com>
Date: Sat,  7 Jan 2017 20:04:23 -0800

> Nothing about the route lookup requires bottom half to be disabled.
> Remove the local_bh_disable ... local_bh_enable around ip_route_input.
> This appears to be a vestige of days gone by as it has been there
> since the beginning of git time.
> 
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Agreed, this shouldn't be necessary.

The key with input route lookups is that the skb->dst reference cannot
escape the packet receive path, unless we force the dst, which
ip_route_input() does under RCU protection.

So this should all be ok.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] liquidio: store the L4 hash of rx packets in skb
From: David Miller @ 2017-01-09 18:56 UTC (permalink / raw)
  To: felix.manlunas; +Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla
In-Reply-To: <20170109184505.GA8756@felix.cavium.com>

From: Felix Manlunas <felix.manlunas@cavium.com>
Date: Mon, 9 Jan 2017 10:45:05 -0800

> David Miller <davem@davemloft.net> wrote on Sun [2017-Jan-08 17:09:41 -0500]:
>> From: Felix Manlunas <felix.manlunas@cavium.com>
>> Date: Fri, 6 Jan 2017 16:55:42 -0800
>> 
>> >  
>> > +		if (rh->r_dh.has_hash) {
>> > +			u32 hash = be32_to_cpu(*(u32 *)(skb->data + r_dh_off));
>> 
>> Is the checksum defined to be in the first 4-bytes of the 8-byte DHLEN unit,
>> or the second 4-bytes?  Is the answer to this question endian-dependent?
> 
> The hash is always in the first 4-bytes.  The location of the hash is endian
> independent.  The hash itself (in its original form) is big endian.

Thanks for explaining.

Please fix the SPARSE issue Eric Dumazet mentioned.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox