Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v3 iproute2-next 0/3] RDMA tool driver-specific resource tracking
From: David Ahern @ 2018-05-18 16:23 UTC (permalink / raw)
  To: Steve Wise, leon; +Cc: stephen, netdev, linux-rdma
In-Reply-To: <cover.1526483157.git.swise@opengridcomputing.com>

On 5/16/18 9:05 AM, Steve Wise wrote:
> This series enhances the iproute2 rdma tool to include displaying
> driver-specific resource attributes.  It is the user-space part of the
> kernel driver resource tracking series that has been accepted for merging
> into linux-4.18 [1]
> 
> If there are no additional review comments, it can now be merged, I think.
> 

applied to iproute2-next.

BTW: Something is off with the header dependencies. After applying patch
1 nothing was rebuilt.

^ permalink raw reply

* Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support
From: Daniel Borkmann @ 2018-05-18 16:17 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Alexei Starovoitov, Alexei Starovoitov, Karlsson, Magnus,
	Duyck, Alexander H, Alexander Duyck, John Fastabend,
	Jesper Dangaard Brouer, Willem de Bruijn, Michael S. Tsirkin,
	Netdev, Björn Töpel, michael.lundkvist,
	Brandeburg, Jesse, Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <CAJ+HfNgB71Z4mNQB0Sq3D8dk7yMTZtnbEUBnj=-kjkac5QA33A@mail.gmail.com>

On 05/18/2018 05:18 PM, Björn Töpel wrote:
> 2018-05-18 15:43 GMT+02:00 Daniel Borkmann <daniel@iogearbox.net>:
>> On 05/18/2018 05:38 AM, Alexei Starovoitov wrote:
>>> On 5/16/18 11:46 PM, Björn Töpel wrote:
>>>> 2018-05-04 1:38 GMT+02:00 Alexei Starovoitov <alexei.starovoitov@gmail.com>:
>>>>> On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote:
>>>>>> On 05/02/2018 01:01 PM, Björn Töpel wrote:
>>>>>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>>>>>
>>>>>>> This patch set introduces a new address family called AF_XDP that is
>>>>>>> optimized for high performance packet processing and, in upcoming
>>>>>>> patch sets, zero-copy semantics. In this patch set, we have removed
>>>>>>> all zero-copy related code in order to make it smaller, simpler and
>>>>>>> hopefully more review friendly. This patch set only supports copy-mode
>>>>>>> for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
>>>>>>> for RX using the XDP_DRV path. Zero-copy support requires XDP and
>>>>>>> driver changes that Jesper Dangaard Brouer is working on. Some of his
>>>>>>> work has already been accepted. We will publish our zero-copy support
>>>>>>> for RX and TX on top of his patch sets at a later point in time.
>>>>>>
>>>>>> +1, would be great to see it land this cycle. Saw few minor nits here
>>>>>> and there but nothing to hold it up, for the series:
>>>>>>
>>>>>> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
>>>>>>
>>>>>> Thanks everyone!
>>>>>
>>>>> Great stuff!
>>>>>
>>>>> Applied to bpf-next, with one condition.
>>>>> Upcoming zero-copy patches for both RX and TX need to be posted
>>>>> and reviewed within this release window.
>>>>> If netdev community as a whole won't be able to agree on the zero-copy
>>>>> bits we'd need to revert this feature before the next merge window.
>>>>>
>>>>> Few other minor nits:
>>>>> patch 3:
>>>>> +struct xdp_ring {
>>>>> +       __u32 producer __attribute__((aligned(64)));
>>>>> +       __u32 consumer __attribute__((aligned(64)));
>>>>> +};
>>>>> It kinda begs for ____cacheline_aligned_in_smp to be introduced for uapi headers.
>>>>
>>>> Hmm, I need some guidance on what a sane uapi variant would be. We
>>>> can't have the uapi depend on the kernel build. ARM64, e.g., can have
>>>> both 64B and 128B according to the specs. Contemporary IA processors
>>>> have 64B.
>>>>
>>>> The simplest, and maybe most future-proof, would be 128B aligned for
>>>> all. Another is having 128B for ARM and 64B for all IA. A third option
>>>> is having a hand-shaking API (I think virtio has that) for determine
>>>> the cache line size, but I'd rather not go down that route.
>>>>
>>>> Thoughts/ideas on how a uapi ____cacheline_aligned_in_smp version
>>>> would look like?
>>>
>>> I suspect i40e+arm combination wasn't tested anyway.
>>> The api may have endianness issues too on something like sparc.
>>> I think the way to be backwards compatible in this area
>>> is to make the api usable on x86 only by adding
>>> to include/uapi/linux/if_xdp.h
>>> #if defined(__x86_64__)
>>> #define AF_XDP_CACHE_BYTES 64
>>> #else
>>> #error "AF_XDP support is not yet available for this architecture"
>>> #endif
>>> and doing:
>>>     __u32 producer __attribute__((aligned(AF_XDP_CACHE_BYTES)));
>>>     __u32 consumer __attribute__((aligned(AF_XDP_CACHE_BYTES)));
>>>
>>> And progressively add to this for arm64 and few other archs.
>>> Eventually removing #error and adding some generic define
>>> that's good enough for long tail of architectures that
>>> we really cannot test.
>>
>> Been looking into this yesterday as well a bit, and it's a bit of a mess what
>> uapi headers do on this regard (though there are just a handful of such headers).
>> Some of the kernel uapi headers hard-code generally 64 bytes regardless of the
>> underlying arch. In general, the kernel does expose it to user space via sysfs
>> (coherency_line_size). Here's what perf does to retrieve it:
>>
>> #ifdef _SC_LEVEL1_DCACHE_LINESIZE
>> #define cache_line_size(cacheline_sizep) *cacheline_sizep = sysconf(_SC_LEVEL1_DCACHE_LINESIZE)
>> #else
>> static void cache_line_size(int *cacheline_sizep)
>> {
>>         if (sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", cacheline_sizep))
>>                 pr_debug("cannot determine cache line size");
>> }
>> #endif
>>
>> The sysconf() implementation for _SC_LEVEL1_DCACHE_LINESIZE seems also only
>> available for x86, arm64, s390 and ppc on a cursory glance in the glibc code.
>> In the x86 case it retrieves the info from cpuid insn. In order to generically
>> use it in combination with the header you'd have some probe which would then
>> set this as a define before including the header.
> 
> But as a uapi we cannot depend on the L1 cache line size for what's
> currently running on the system, right? So, either a "one cache line
> size for all flavors of an arch", e.g. for ARMv8 that would be 128,
> even though there can be 64 flavors out there.
> 
> Another way would be to remove the ring structure completely, and
> leave that to the user-space application to figure out. So there's a
> runtime interface (getsockopt) to probe the offsets the head and tail
> pointer is after/before the mmap call, and only expose the descriptor
> format explicitly in if_xdp.h. Don't know if that is too unorthodox or
> not...

Good point, I think that may not be too unreasonable thing to do, imho,
at least this doesn't get us into the issue discussed here in the first
place, and it would work for other archs more seamlessly rather than ugly
build error or single per arch 'catch-all' define.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls
From: Sandipan Das @ 2018-05-18 16:17 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: ast, netdev, naveen.n.rao, linuxppc-dev, jakub.kicinski
In-Reply-To: <242592f3-c2b4-04bb-7a6c-2394ae4fee98@iogearbox.net>


On 05/18/2018 08:45 PM, Daniel Borkmann wrote:
> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>> The imm field of a bpf instruction is a signed 32-bit integer.
>> For JIT bpf-to-bpf function calls, it stores the offset of the
>> start address of the callee's JITed image from __bpf_call_base.
>>
>> For some architectures, such as powerpc64, this offset may be
>> as large as 64 bits and cannot be accomodated in the imm field
>> without truncation.
>>
>> We resolve this by:
>>
>> [1] Additionally using the auxillary data of each function to
>>     keep a list of start addresses of the JITed images for all
>>     functions determined by the verifier.
>>
>> [2] Retaining the subprog id inside the off field of the call
>>     instructions and using it to index into the list mentioned
>>     above and lookup the callee's address.
>>
>> To make sure that the existing JIT compilers continue to work
>> without requiring changes, we keep the imm field as it is.
>>
>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>> ---
>>  kernel/bpf/verifier.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index a9e4b1372da6..6c56cce9c4e3 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>>  			    insn->src_reg != BPF_PSEUDO_CALL)
>>  				continue;
>>  			subprog = insn->off;
>> -			insn->off = 0;
>>  			insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
>>  				func[subprog]->bpf_func -
>>  				__bpf_call_base;
>>  		}
>> +
>> +		/* we use the aux data to keep a list of the start addresses
>> +		 * of the JITed images for each function in the program
>> +		 *
>> +		 * for some architectures, such as powerpc64, the imm field
>> +		 * might not be large enough to hold the offset of the start
>> +		 * address of the callee's JITed image from __bpf_call_base
>> +		 *
>> +		 * in such cases, we can lookup the start address of a callee
>> +		 * by using its subprog id, available from the off field of
>> +		 * the call instruction, as an index for this list
>> +		 */
>> +		func[i]->aux->func = func;
>> +		func[i]->aux->func_cnt = env->subprog_cnt + 1;
> 
> The target tree you have here is infact bpf, since in bpf-next there was a
> cleanup where the + 1 is removed. Just for the record that we need to keep
> this in mind for bpf into bpf-next merge since this would otherwise subtly
> break.
> 

Sorry about the wrong tag. This series is indeed based off bpf-next.

- Sandipan

>>  	}
>>  	for (i = 0; i < env->subprog_cnt; i++) {
>>  		old_bpf_func = func[i]->bpf_func;
>>
> 
> 

^ permalink raw reply

* Re: [iproute2-next v3 1/1] tipc: fixed node and name table listings
From: David Ahern @ 2018-05-18 16:13 UTC (permalink / raw)
  To: Jon Maloy, stephen, netdev
  Cc: mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen, hoang.h.le,
	canh.d.luu, ying.xue, tipc-discussion
In-Reply-To: <1526565762-3681-1-git-send-email-jon.maloy@ericsson.com>

On 5/17/18 8:02 AM, Jon Maloy wrote:
> We make it easier for users to correlate between 128-bit node
> identities and 32-bit node hash number by extending the 'node list'
> command to also show the hash number.
> 
> We also improve the 'nametable show' command to show the node identity
> instead of the node hash number. Since the former potentially is much
> longer than the latter, we make room for it by eliminating the (to the
> user) irrelevant publication key. We also reorder some of the columns so
> that the node id comes last, since this looks nicer and is more logical.
> 
> ---
> v2: Fixed compiler warning as per comment from David Ahern
> v3: Fixed leaking socket as per comment from David Ahern
> 
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

applied to iproute2-next. Thanks

^ permalink raw reply

* Re: [PATCH iproute2-next 1/1] tc: add missing space symbol in ife output
From: David Ahern @ 2018-05-18 16:11 UTC (permalink / raw)
  To: Roman Mashak, stephen; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri
In-Reply-To: <1526563682-28095-1-git-send-email-mrv@mojatatu.com>

On 5/17/18 7:28 AM, Roman Mashak wrote:
> In order to make TDC tests match the output patterns, the missing space
> character must be added in the mode output string.
> 
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
> ---
>  tc/m_ife.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Please add the Fixes tag when submitting bug fixes.

Added Fixes tag and applied to iproute2-next. Thanks

^ permalink raw reply

* Re: [PATCH net v2] net: dsa: Do not register devlink for unused ports
From: David Miller @ 2018-05-18 16:09 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, jiri, andrew, vivien.didelot, linux-kernel
In-Reply-To: <20180517235539.19418-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 17 May 2018 16:55:39 -0700

> Even if commit 1d27732f411d ("net: dsa: setup and teardown ports") indicated
> that registering a devlink instance for unused ports is not a problem, and this
> is true, this can be confusing nonetheless, so let's not do it.
> 
> Fixes: 1d27732f411d ("net: dsa: setup and teardown ports")
> Reported-by: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied and queued up for -stable, thanks!

^ permalink raw reply

* Re: [PATCH bpf v2 4/6] tools: bpf: sync bpf uapi header
From: Alexei Starovoitov @ 2018-05-18 16:08 UTC (permalink / raw)
  To: Sandipan Das
  Cc: Alexei Starovoitov, Daniel Borkmann, Network Development,
	linuxppc-dev, Naveen N. Rao, Michael Ellerman, Jakub Kicinski

On Fri, May 18, 2018 at 5:50 AM, Sandipan Das
<sandipan@linux.vnet.ibm.com> wrote:
> Syncing the bpf.h uapi header with tools so that struct
> bpf_prog_info has the two new fields for passing on the
> addresses of the kernel symbols corresponding to each
> function in a JITed program.
>
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  tools/include/uapi/linux/bpf.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d94d333a8225..040c9cac7303 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2188,6 +2188,8 @@ struct bpf_prog_info {
>         __u32 xlated_prog_len;
>         __aligned_u64 jited_prog_insns;
>         __aligned_u64 xlated_prog_insns;
> +       __aligned_u64 jited_ksyms;
> +       __u32 nr_jited_ksyms;
>         __u64 load_time;        /* ns since boottime */
>         __u32 created_by_uid;
>         __u32 nr_map_ids;

this breaks uapi.
New fields can only be added to the end.

^ permalink raw reply

* Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
From: Daniel Borkmann @ 2018-05-18 16:08 UTC (permalink / raw)
  To: Naveen N. Rao, ast, Sandipan Das
  Cc: jakub.kicinski, linuxppc-dev, mpe, netdev
In-Reply-To: <1526658936.9wk22hv49g.naveen@linux.ibm.com>

On 05/18/2018 06:05 PM, Naveen N. Rao wrote:
> Daniel Borkmann wrote:
>> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>>> This adds support for bpf-to-bpf function calls in the powerpc64
>>> JIT compiler. The JIT compiler converts the bpf call instructions
>>> to native branch instructions. After a round of the usual passes,
>>> the start addresses of the JITed images for the callee functions
>>> are known. Finally, to fixup the branch target addresses, we need
>>> to perform an extra pass.
>>>
>>> Because of the address range in which JITed images are allocated
>>> on powerpc64, the offsets of the start addresses of these images
>>> from __bpf_call_base are as large as 64 bits. So, for a function
>>> call, we cannot use the imm field of the instruction to determine
>>> the callee's address. Instead, we use the alternative method of
>>> getting it from the list of function addresses in the auxillary
>>> data of the caller by using the off field as an index.
>>>
>>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/net/bpf_jit_comp64.c | 79 ++++++++++++++++++++++++++++++++++-----
>>>  1 file changed, 69 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
>>> index 1bdb1aff0619..25939892d8f7 100644
>>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>>> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
>>>  /* Assemble the body code between the prologue & epilogue */
>>>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>>                    struct codegen_context *ctx,
>>> -                  u32 *addrs)
>>> +                  u32 *addrs, bool extra_pass)
>>>  {
>>>      const struct bpf_insn *insn = fp->insnsi;
>>>      int flen = fp->len;
>>> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>>              break;
>>>  
>>>          /*
>>> -         * Call kernel helper
>>> +         * Call kernel helper or bpf function
>>>           */
>>>          case BPF_JMP | BPF_CALL:
>>>              ctx->seen |= SEEN_FUNC;
>>> -            func = (u8 *) __bpf_call_base + imm;
>>> +
>>> +            /* bpf function call */
>>> +            if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
>>
>> Perhaps it might make sense here for !extra_pass to set func to some dummy
>> address as otherwise the 'kernel helper call' branch used for this is a bit
>> misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
>> optimizes the immediate addr, I presume the JIT can handle situations where
>> in the final extra_pass the image needs to grow/shrink again (due to different
>> final address for the call)?
> 
> That's a good catch. We don't handle that -- we expect to get the size right on first pass. We could probably have PPC_FUNC_ADDR() pad the result with nops to make it a constant 5-instruction sequence.

Yeah, arm64 does something similar by not optimizing the imm in order to always
emit 4 insns for it.

Thanks,
Daniel

^ permalink raw reply

* Re: [net PATCH] net: Fix a bug in removing queues from XPS map
From: David Miller @ 2018-05-18 16:07 UTC (permalink / raw)
  To: amritha.nambiar
  Cc: netdev, alexander.h.duyck, f.fainelli, sridhar.samudrala,
	edumazet, hannes, tom
In-Reply-To: <152659384481.2834.5040004714298449412.stgit@anamdev.jf.intel.com>

From: Amritha Nambiar <amritha.nambiar@intel.com>
Date: Thu, 17 May 2018 14:50:44 -0700

> While removing queues from the XPS map, the individual CPU ID
> alone was used to index the CPUs map, this should be changed to also
> factor in the traffic class mapping for the CPU-to-queue lookup.
> 
> Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>

Applied and queued up for -stable, thank you.

^ permalink raw reply

* Re: [PATCH iproute2-next] tc: flower: add support for verbose logging
From: David Ahern @ 2018-05-18 16:06 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner, netdev
  Cc: Jakub Kicinski, Stephen Hemminger, Jiri Pirko, Alexander Aring,
	Jamal Hadi Salim, Cong Wang
In-Reply-To: <7475756f86139d889344cf2675b13ce4364b81b1.1526243604.git.mleitner@redhat.com>

On 5/13/18 2:44 PM, Marcelo Ricardo Leitner wrote:
> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
> 
> Currently there is no way to log offloading errors if the rule is not
> explicitly marked as skip_sw, making it hard for other applications such
> as Open vSwitch to log why a given could not be offloaded.
> 
> This patch adds support for signaling the kernel that more verbose
> logging is wanted, which now will include such messages.
> 
> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> ---
>  include/uapi/linux/pkt_cls.h | 1 +
>  man/man8/tc-flower.8         | 7 +++++++
>  tc/f_flower.c                | 4 +++-
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 

applied to iproute2-next. Thanks

^ permalink raw reply

* Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
From: Naveen N. Rao @ 2018-05-18 16:05 UTC (permalink / raw)
  To: ast, Daniel Borkmann, Sandipan Das
  Cc: jakub.kicinski, linuxppc-dev, mpe, netdev
In-Reply-To: <c3b23da2-0781-a1ee-0d49-71e9efb52e66@iogearbox.net>

Daniel Borkmann wrote:
> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>> This adds support for bpf-to-bpf function calls in the powerpc64
>> JIT compiler. The JIT compiler converts the bpf call instructions
>> to native branch instructions. After a round of the usual passes,
>> the start addresses of the JITed images for the callee functions
>> are known. Finally, to fixup the branch target addresses, we need
>> to perform an extra pass.
>> 
>> Because of the address range in which JITed images are allocated
>> on powerpc64, the offsets of the start addresses of these images
>> from __bpf_call_base are as large as 64 bits. So, for a function
>> call, we cannot use the imm field of the instruction to determine
>> the callee's address. Instead, we use the alternative method of
>> getting it from the list of function addresses in the auxillary
>> data of the caller by using the off field as an index.
>> 
>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/net/bpf_jit_comp64.c | 79 ++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 69 insertions(+), 10 deletions(-)
>> 
>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
>> index 1bdb1aff0619..25939892d8f7 100644
>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
>>  /* Assemble the body code between the prologue & epilogue */
>>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>  			      struct codegen_context *ctx,
>> -			      u32 *addrs)
>> +			      u32 *addrs, bool extra_pass)
>>  {
>>  	const struct bpf_insn *insn = fp->insnsi;
>>  	int flen = fp->len;
>> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>  			break;
>>  
>>  		/*
>> -		 * Call kernel helper
>> +		 * Call kernel helper or bpf function
>>  		 */
>>  		case BPF_JMP | BPF_CALL:
>>  			ctx->seen |= SEEN_FUNC;
>> -			func = (u8 *) __bpf_call_base + imm;
>> +
>> +			/* bpf function call */
>> +			if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
> 
> Perhaps it might make sense here for !extra_pass to set func to some dummy
> address as otherwise the 'kernel helper call' branch used for this is a bit
> misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
> optimizes the immediate addr, I presume the JIT can handle situations where
> in the final extra_pass the image needs to grow/shrink again (due to different
> final address for the call)?

That's a good catch. We don't handle that -- we expect to get the size 
right on first pass. We could probably have PPC_FUNC_ADDR() pad the 
result with nops to make it a constant 5-instruction sequence.

> 
>> +				if (fp->aux->func && off < fp->aux->func_cnt)
>> +					/* use the subprog id from the off
>> +					 * field to lookup the callee address
>> +					 */
>> +					func = (u8 *) fp->aux->func[off]->bpf_func;
>> +				else
>> +					return -EINVAL;
>> +			/* kernel helper call */
>> +			else
>> +				func = (u8 *) __bpf_call_base + imm;
>>  
>>  			bpf_jit_emit_func_call(image, ctx, (u64)func);
>>  
>> @@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>  	return 0;
>>  }
>>  
>> +struct powerpc64_jit_data {
>> +	struct bpf_binary_header *header;
>> +	u32 *addrs;
>> +	u8 *image;
>> +	u32 proglen;
>> +	struct codegen_context ctx;
>> +};
>> +
>>  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  {
>>  	u32 proglen;
>> @@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  	u8 *image = NULL;
>>  	u32 *code_base;
>>  	u32 *addrs;
>> +	struct powerpc64_jit_data *jit_data;
>>  	struct codegen_context cgctx;
>>  	int pass;
>>  	int flen;
>> @@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  	struct bpf_prog *org_fp = fp;
>>  	struct bpf_prog *tmp_fp;
>>  	bool bpf_blinded = false;
>> +	bool extra_pass = false;
>>  
>>  	if (!fp->jit_requested)
>>  		return org_fp;
>> @@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  		fp = tmp_fp;
>>  	}
>>  
>> +	jit_data = fp->aux->jit_data;
>> +	if (!jit_data) {
>> +		jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
>> +		if (!jit_data) {
>> +			fp = org_fp;
>> +			goto out;
>> +		}
>> +		fp->aux->jit_data = jit_data;
>> +	}
>> +
>>  	flen = fp->len;
>> +	addrs = jit_data->addrs;
>> +	if (addrs) {
>> +		cgctx = jit_data->ctx;
>> +		image = jit_data->image;
>> +		bpf_hdr = jit_data->header;
>> +		proglen = jit_data->proglen;
>> +		alloclen = proglen + FUNCTION_DESCR_SIZE;
>> +		extra_pass = true;
>> +		goto skip_init_ctx;
>> +	}
>> +
>>  	addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
>>  	if (addrs == NULL) {
>>  		fp = org_fp;
> 
> In this case of !addrs, we leak the just allocated jit_data here!
> 
>> @@ -904,10 +947,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
>>  
>>  	/* Scouting faux-generate pass 0 */
>> -	if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
>> +	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
>>  		/* We hit something illegal or unsupported. */
>>  		fp = org_fp;
>> -		goto out;
>> +		goto out_addrs;
>>  	}
>>  
>>  	/*
>> @@ -925,9 +968,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  			bpf_jit_fill_ill_insns);
>>  	if (!bpf_hdr) {
>>  		fp = org_fp;
>> -		goto out;
>> +		goto out_addrs;
>>  	}
>>  
>> +skip_init_ctx:
>>  	code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
>>  
>>  	/* Code generation passes 1-2 */
>> @@ -935,7 +979,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  		/* Now build the prologue, body code & epilogue for real. */
>>  		cgctx.idx = 0;
>>  		bpf_jit_build_prologue(code_base, &cgctx);
>> -		bpf_jit_build_body(fp, code_base, &cgctx, addrs);
>> +		bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
>>  		bpf_jit_build_epilogue(code_base, &cgctx);
>>  
>>  		if (bpf_jit_enable > 1)
>> @@ -956,15 +1000,30 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  	((u64 *)image)[1] = local_paca->kernel_toc;
>>  #endif
>>  
>> +	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
>> +
>> +	if (!fp->is_func || extra_pass) {
>> +		bpf_jit_binary_lock_ro(bpf_hdr);
> 
> powerpc doesn't implement set_memory_ro(). Generally this is not a problem since
> set_memory_ro() defaults to 'return 0' in this case, but since the bpf_jit_free()
> destructor is overridden here, there's no bpf_jit_binary_unlock_ro() and in case
> powerpc would get set_memory_*() support one day this will then crash in random
> places once the mem gets back to the allocator, thus hard to debug. Two options:
> either you remove the bpf_jit_free() override or you remove the bpf_jit_binary_lock_ro().

Yeah, we shouldn't be using the lock here.

Thanks,
Naveen

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: mscc: ocelot: add bonding support
From: Andrew Lunn @ 2018-05-18 16:04 UTC (permalink / raw)
  To: Alexandre Belloni
  Cc: David S . Miller, Allan Nielsen, razvan.stefanescu, po.liu,
	Thomas Petazzoni, Florian Fainelli, netdev, linux-kernel
In-Reply-To: <20180518064106.10122-2-alexandre.belloni@bootlin.com>

On Fri, May 18, 2018 at 08:41:05AM +0200, Alexandre Belloni wrote:
> Add link aggregation hardware offload support for Ocelot.

Hi Alexandre

What i don't see here is anything checking the mode being
requested. If the user requests 'random' but the hardware only
supports 'roundrobin', you probably should not do the offload.

	 Andrew

^ permalink raw reply

* Re: KASAN: use-after-free Read in remove_wait_queue (2)
From: Guillaume Nault @ 2018-05-18 16:02 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-ppp, Paul Mackerras, netdev, linux-kernel, syzkaller-bugs,
	syzbot, viro
In-Reply-To: <20180514061155.GL677@sol.localdomain>

On Sun, May 13, 2018 at 11:11:55PM -0700, Eric Biggers wrote:
> [+ppp list and maintainer]
> 
> This is a bug in ppp_generic.c; it still happens on Linus' tree and it's easily
> reproducible, see program below.  The bug is that the PPPIOCDETACH ioctl doesn't
> consider that the file can still be attached to epoll instances even when
> ->f_count == 1.

Right. What would it take to remove the file for the epoll instances?
Sorry for the naive question, but I'm not familiar with VFS and didn't
find a helper function we could call.

> Also, the reproducer doesn't test this but I think ppp_poll(),
> ppp_read(), and ppp_write() can all race with PPPIOCDETACH, causing
> use-after-frees as well.

I also believe so. ppp_release() resets ->private_data, and even though
functions like ppp_read() test ->private_data before executing, there's
no synchronisation mechanism to ensure that the update is visible
before the unit or channel is destroyed. Is that the kind of race you
had in mind?

> Any chance that PPPIOCDETACH can simply be removed,
> given that it's apparently been "deprecated" for 16 years?
> Does anyone use it?

The only users I'm aware of are pppd versions older than ppp-2.4.2
(released in November 2003). But even at that time, there were issues
with PPPIOCDETACH as pppd didn't seem to react properly when this call
failed. An Internet search on the "PPPIOCDETACH file->f_count=" kernel
log string, or on the "Couldn't release PPP unit: Invalid argument"
error message of pppd, returns several related bug reports.

Originally, PPPIOCDETACH never failed, but testing ->f_count was
later introduced to fix crashes when the file descriptor had been
duplicated. It seems that this was motivated by polling issues too.

Long story short, it looks like PPPIOCDETACH never has worked well
and we have at least two more bugs to fix. Given how it has proven
fragile, I wouldn't be surprised if there were even more lurking
around. I'd say that it's probably safer to drop it than to add more
workarounds and playing wack-a-mole with those bugs.

^ permalink raw reply

* [PATCH v2 net-next] net: stmmac: Populate missing callbacks in HWIF initialization
From: Jose Abreu @ 2018-05-18 15:54 UTC (permalink / raw)
  To: netdev
  Cc: Jose Abreu, Corentin Labbe, David S. Miller, Joao Pinto,
	Giuseppe Cavallaro, Alexandre Torgue

Some HW specific setups, like sun8i, do not populate all the necessary
callbacks, which is what HWIF helpers were expecting.

Fix this by always trying to get the generic helpers and populate them
if they were not previously populated by HW specific setup.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Fixes: 5f0456b43140 ("net: stmmac: Implement logic to automatically
select HW Interface")
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Cc: Corentin Labbe <clabbe.montjoie@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
Changes from v1:
	- Rebased to latest net-next
---
 drivers/net/ethernet/stmicro/stmmac/hwif.c |   38 ++++++++++++++++-----------
 1 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 23a1264..14770fc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -189,13 +189,16 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
 	bool needs_gmac = priv->plat->has_gmac;
 	const struct stmmac_hwif_entry *entry;
 	struct mac_device_info *mac;
+	bool needs_setup = true;
 	int i, ret;
 	u32 id;
 
 	if (needs_gmac) {
 		id = stmmac_get_id(priv, GMAC_VERSION);
-	} else {
+	} else if (needs_gmac4) {
 		id = stmmac_get_id(priv, GMAC4_VERSION);
+	} else {
+		id = 0;
 	}
 
 	/* Save ID for later use */
@@ -209,13 +212,12 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
 
 	/* Check for HW specific setup first */
 	if (priv->plat->setup) {
-		priv->hw = priv->plat->setup(priv);
-		if (!priv->hw)
-			return -ENOMEM;
-		return 0;
+		mac = priv->plat->setup(priv);
+		needs_setup = false;
+	} else {
+		mac = devm_kzalloc(priv->device, sizeof(*mac), GFP_KERNEL);
 	}
 
-	mac = devm_kzalloc(priv->device, sizeof(*mac), GFP_KERNEL);
 	if (!mac)
 		return -ENOMEM;
 
@@ -227,24 +229,28 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
 			continue;
 		if (needs_gmac4 ^ entry->gmac4)
 			continue;
-		if (id < entry->min_id)
+		/* Use synopsys_id var because some setups can override this */
+		if (priv->synopsys_id < entry->min_id)
 			continue;
 
-		mac->desc = entry->desc;
-		mac->dma = entry->dma;
-		mac->mac = entry->mac;
-		mac->ptp = entry->hwtimestamp;
-		mac->mode = entry->mode;
-		mac->tc = entry->tc;
+		/* Only use generic HW helpers if needed */
+		mac->desc = mac->desc ? : entry->desc;
+		mac->dma = mac->dma ? : entry->dma;
+		mac->mac = mac->mac ? : entry->mac;
+		mac->ptp = mac->ptp ? : entry->hwtimestamp;
+		mac->mode = mac->mode ? : entry->mode;
+		mac->tc = mac->tc ? : entry->tc;
 
 		priv->hw = mac;
 		priv->ptpaddr = priv->ioaddr + entry->regs.ptp_off;
 		priv->mmcaddr = priv->ioaddr + entry->regs.mmc_off;
 
 		/* Entry found */
-		ret = entry->setup(priv);
-		if (ret)
-			return ret;
+		if (needs_setup) {
+			ret = entry->setup(priv);
+			if (ret)
+				return ret;
+		}
 
 		/* Run quirks, if needed */
 		if (entry->quirks) {
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH bpf v2 6/6] bpf: fix JITed dump for multi-function programs via syscall
From: Daniel Borkmann @ 2018-05-18 15:51 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-7-sandipan@linux.vnet.ibm.com>

On 05/18/2018 02:50 PM, Sandipan Das wrote:
> Currently, for multi-function programs, we cannot get the JITed
> instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
> command. Because of this, userspace tools such as bpftool fail
> to identify a multi-function program as being JITed or not.
> 
> With the JIT enabled and the test program running, this can be
> verified as follows:
> 
>   # cat /proc/sys/net/core/bpf_jit_enable
>   1
> 
> Before applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>           loaded_at 2018-05-16T11:43:38+0530  uid 0
>           xlated 216B  not jited  memlock 65536B
>   ...
> 
>   # bpftool prog dump jited id 1
>   no instructions returned
> 
> After applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>           loaded_at 2018-05-16T12:13:01+0530  uid 0
>           xlated 216B  jited 308B  memlock 65536B
>   ...

That's really nice! One comment inline below:

>   # bpftool prog dump jited id 1
>      0:   nop
>      4:   nop
>      8:   mflr    r0
>      c:   std     r0,16(r1)
>     10:   stdu    r1,-112(r1)
>     14:   std     r31,104(r1)
>     18:   addi    r31,r1,48
>     1c:   li      r3,10
>   ...
> 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  kernel/bpf/syscall.c | 38 ++++++++++++++++++++++++++++++++------
>  1 file changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 54a72fafe57c..2430d159078c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1896,7 +1896,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	struct bpf_prog_info info = {};
>  	u32 info_len = attr->info.info_len;
>  	char __user *uinsns;
> -	u32 ulen;
> +	u32 ulen, i;
>  	int err;
>  
>  	err = check_uarg_tail_zero(uinfo, sizeof(info), info_len);
> @@ -1922,7 +1922,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	ulen = min_t(u32, info.nr_map_ids, ulen);
>  	if (ulen) {
>  		u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids);
> -		u32 i;
>  
>  		for (i = 0; i < ulen; i++)
>  			if (put_user(prog->aux->used_maps[i]->id,
> @@ -1970,13 +1969,41 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	 * for offload.
>  	 */
>  	ulen = info.jited_prog_len;
> -	info.jited_prog_len = prog->jited_len;
> +	if (prog->aux->func_cnt) {
> +		info.jited_prog_len = 0;
> +		for (i = 0; i < prog->aux->func_cnt; i++)
> +			info.jited_prog_len += prog->aux->func[i]->jited_len;
> +	} else {
> +		info.jited_prog_len = prog->jited_len;
> +	}
> +
>  	if (info.jited_prog_len && ulen) {
>  		if (bpf_dump_raw_ok()) {
>  			uinsns = u64_to_user_ptr(info.jited_prog_insns);
>  			ulen = min_t(u32, info.jited_prog_len, ulen);
> -			if (copy_to_user(uinsns, prog->bpf_func, ulen))
> -				return -EFAULT;
> +
> +			/* for multi-function programs, copy the JITed
> +			 * instructions for all the functions
> +			 */
> +			if (prog->aux->func_cnt) {
> +				u32 len, free;
> +				u8 *img;
> +
> +				free = ulen;
> +				for (i = 0; i < prog->aux->func_cnt; i++) {
> +					len = prog->aux->func[i]->jited_len;
> +					img = (u8 *) prog->aux->func[i]->bpf_func;
> +					if (len > free)
> +						break;
> +					if (copy_to_user(uinsns, img, len))
> +						return -EFAULT;
> +					uinsns += len;
> +					free -= len;

Is there any way we can introduce a delimiter between the different
images such that they could be more easily correlated with the call
from the main (or other sub-)program instead of having one contiguous
dump blob?

> +				}
> +			} else {
> +				if (copy_to_user(uinsns, prog->bpf_func, ulen))
> +					return -EFAULT;
> +			}
>  		} else {
>  			info.jited_prog_insns = 0;
>  		}
> @@ -1987,7 +2014,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	if (info.nr_jited_ksyms && ulen) {
>  		u64 __user *user_jited_ksyms = u64_to_user_ptr(info.jited_ksyms);
>  		ulong ksym_addr;
> -		u32 i;
>  
>  		/* copy the address of the kernel symbol corresponding to
>  		 * each function
> 

^ permalink raw reply

* Re: [PATCH v3 net-next 0/6] tcp: implement SACK compression
From: Eric Dumazet @ 2018-05-18 15:48 UTC (permalink / raw)
  To: David Miller, edumazet
  Cc: netdev, toke, ncardwell, ycheng, soheil, eric.dumazet
In-Reply-To: <20180518.114238.2120564820986009221.davem@davemloft.net>



On 05/18/2018 08:42 AM, David Miller wrote:

> This looks great, series applied, thanks Eric.
> 
> So now we handle locally terminated traffic well, however we still need
> something that handles traffic flowing through the system.  And for that
> reason the CAKE ACK filter still has merit.
> 
> I completely agree with you that it has to be implemented properly,
> parse all TCP options and the SACK blocks, and pass the ACKs through
> when there are options it doesn't understand or the SACK/timestampe/etc.
> update contains new information that must be passed along and not dropper.
> 
> Thanks again.
> 

Be assured I will carefully review Cake ACK filter ;)

Thanks David !

^ permalink raw reply

* [PATCH] net: sched: don't disable bh when accessing action idr
From: Vlad Buslov @ 2018-05-18 15:45 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, linux-kernel, Vlad Buslov

Underlying implementation of action map has changed and doesn't require
disabling bh anymore. Replace all action idr spinlock usage with regular
calls that do not disable bh.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 net/sched/act_api.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 72251241..3f4cf93 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -77,9 +77,9 @@ static void free_tcf(struct tc_action *p)
 
 static void tcf_idr_remove(struct tcf_idrinfo *idrinfo, struct tc_action *p)
 {
-	spin_lock_bh(&idrinfo->lock);
+	spin_lock(&idrinfo->lock);
 	idr_remove(&idrinfo->action_idr, p->tcfa_index);
-	spin_unlock_bh(&idrinfo->lock);
+	spin_unlock(&idrinfo->lock);
 	gen_kill_estimator(&p->tcfa_rate_est);
 	free_tcf(p);
 }
@@ -156,7 +156,7 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	struct tc_action *p;
 	unsigned long id = 1;
 
-	spin_lock_bh(&idrinfo->lock);
+	spin_lock(&idrinfo->lock);
 
 	s_i = cb->args[0];
 
@@ -191,7 +191,7 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	if (index >= 0)
 		cb->args[0] = index + 1;
 
-	spin_unlock_bh(&idrinfo->lock);
+	spin_unlock(&idrinfo->lock);
 	if (n_i) {
 		if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
 			cb->args[1] = n_i;
@@ -261,9 +261,9 @@ static struct tc_action *tcf_idr_lookup(u32 index, struct tcf_idrinfo *idrinfo)
 {
 	struct tc_action *p = NULL;
 
-	spin_lock_bh(&idrinfo->lock);
+	spin_lock(&idrinfo->lock);
 	p = idr_find(&idrinfo->action_idr, index);
-	spin_unlock_bh(&idrinfo->lock);
+	spin_unlock(&idrinfo->lock);
 
 	return p;
 }
@@ -323,7 +323,7 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 	}
 	spin_lock_init(&p->tcfa_lock);
 	idr_preload(GFP_KERNEL);
-	spin_lock_bh(&idrinfo->lock);
+	spin_lock(&idrinfo->lock);
 	/* user doesn't specify an index */
 	if (!index) {
 		index = 1;
@@ -331,7 +331,7 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 	} else {
 		err = idr_alloc_u32(idr, NULL, &index, index, GFP_ATOMIC);
 	}
-	spin_unlock_bh(&idrinfo->lock);
+	spin_unlock(&idrinfo->lock);
 	idr_preload_end();
 	if (err)
 		goto err3;
@@ -369,9 +369,9 @@ void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a)
 {
 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
 
-	spin_lock_bh(&idrinfo->lock);
+	spin_lock(&idrinfo->lock);
 	idr_replace(&idrinfo->action_idr, a, a->tcfa_index);
-	spin_unlock_bh(&idrinfo->lock);
+	spin_unlock(&idrinfo->lock);
 }
 EXPORT_SYMBOL(tcf_idr_insert);
 
-- 
2.7.5

^ permalink raw reply related

* Re: WARNING in ip_recv_error
From: David Miller @ 2018-05-18 15:44 UTC (permalink / raw)
  To: eric.dumazet
  Cc: threeearcat, kuznet, yoshfuji, netdev, linux-kernel, byoungyoung,
	kt0755, bammanag, willemb
In-Reply-To: <293d029c-b14c-a625-3703-97a5754e99f1@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 18 May 2018 08:30:43 -0700

> We probably need to revert Willem patch (7ce875e5ecb8562fd44040f69bda96c999e38bbc)

Is it really valid to reach ip_recv_err with an ipv6 socket?

^ permalink raw reply

* Re: [PATCH bpf v2 3/6] bpf: get kernel symbol addresses via syscall
From: Daniel Borkmann @ 2018-05-18 15:43 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-4-sandipan@linux.vnet.ibm.com>

On 05/18/2018 02:50 PM, Sandipan Das wrote:
> This adds new two new fields to struct bpf_prog_info. For
> multi-function programs, these fields can be used to pass
> a list of kernel symbol addresses for all functions in a
> given program and to userspace using the bpf system call
> with the BPF_OBJ_GET_INFO_BY_FD command.
> 
> When bpf_jit_kallsyms is enabled, we can get the address
> of the corresponding kernel symbol for a callee function
> and resolve the symbol's name. The address is determined
> by adding the value of the call instruction's imm field
> to __bpf_call_base. This offset gets assigned to the imm
> field by the verifier.
> 
> For some architectures, such as powerpc64, the imm field
> is not large enough to hold this offset.
> 
> We resolve this by:
> 
> [1] Assigning the subprog id to the imm field of a call
>     instruction in the verifier instead of the offset of
>     the callee's symbol's address from __bpf_call_base.
> 
> [2] Determining the address of a callee's corresponding
>     symbol by using the imm field as an index for the
>     list of kernel symbol addresses now available from
>     the program info.
> 
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  include/uapi/linux/bpf.h |  2 ++
>  kernel/bpf/syscall.c     | 20 ++++++++++++++++++++
>  kernel/bpf/verifier.c    |  7 +------
>  3 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d94d333a8225..040c9cac7303 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2188,6 +2188,8 @@ struct bpf_prog_info {
>  	__u32 xlated_prog_len;
>  	__aligned_u64 jited_prog_insns;
>  	__aligned_u64 xlated_prog_insns;
> +	__aligned_u64 jited_ksyms;
> +	__u32 nr_jited_ksyms;
>  	__u64 load_time;	/* ns since boottime */
>  	__u32 created_by_uid;
>  	__u32 nr_map_ids;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index bfcde949c7f8..54a72fafe57c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1933,6 +1933,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	if (!capable(CAP_SYS_ADMIN)) {
>  		info.jited_prog_len = 0;
>  		info.xlated_prog_len = 0;
> +		info.nr_jited_ksyms = 0;
>  		goto done;
>  	}
>  
> @@ -1981,6 +1982,25 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  		}
>  	}
>  
> +	ulen = info.nr_jited_ksyms;
> +	info.nr_jited_ksyms = prog->aux->func_cnt;
> +	if (info.nr_jited_ksyms && ulen) {

Since this exposes addresses (though masked one which is correct), this
definitely needs to be guarded with bpf_dump_raw_ok() like we do in other
places here (see JIT dump for example).

> +		u64 __user *user_jited_ksyms = u64_to_user_ptr(info.jited_ksyms);
> +		ulong ksym_addr;
> +		u32 i;
> +
> +		/* copy the address of the kernel symbol corresponding to
> +		 * each function
> +		 */
> +		ulen = min_t(u32, info.nr_jited_ksyms, ulen);
> +		for (i = 0; i < ulen; i++) {
> +			ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
> +			ksym_addr &= PAGE_MASK;
> +			if (put_user((u64) ksym_addr, &user_jited_ksyms[i]))
> +				return -EFAULT;
> +		}
> +	}
> +
>  done:
>  	if (copy_to_user(uinfo, &info, info_len) ||
>  	    put_user(info_len, &uattr->info.info_len))
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 6c56cce9c4e3..e826c396aba2 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5426,17 +5426,12 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>  	 * later look the same as if they were interpreted only.
>  	 */
>  	for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
> -		unsigned long addr;
> -
>  		if (insn->code != (BPF_JMP | BPF_CALL) ||
>  		    insn->src_reg != BPF_PSEUDO_CALL)
>  			continue;
>  		insn->off = env->insn_aux_data[i].call_imm;
>  		subprog = find_subprog(env, i + insn->off + 1);
> -		addr  = (unsigned long)func[subprog]->bpf_func;

Hmm, in current bpf tree this says 'subprog + 1' here, so this is not
rebased against bpf tree but bpf-next (unlike what subject says)?

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/kernel/bpf/verifier.c#n5351

> -		addr &= PAGE_MASK;
> -		insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
> -			    addr - __bpf_call_base;
> +		insn->imm = subprog;
>  	}
>  
>  	prog->jited = 1;
> 

^ permalink raw reply

* Re: [PATCH v3 net-next 0/6] tcp: implement SACK compression
From: David Miller @ 2018-05-18 15:42 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, toke, ncardwell, ycheng, soheil, eric.dumazet
In-Reply-To: <20180517214729.186094-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Thu, 17 May 2018 14:47:23 -0700

> When TCP receives an out-of-order packet, it immediately sends
> a SACK packet, generating network load but also forcing the
> receiver to send 1-MSS pathological packets, increasing its
> RTX queue length/depth, and thus processing time.
> 
> Wifi networks suffer from this aggressive behavior, but generally
> speaking, all these SACK packets add fuel to the fire when networks
> are under congestion.
> 
> This patch series adds SACK compression, but the infrastructure
> could be leveraged to also compress ACK in the future.
> 
> v2: Addressed Neal feedback.
>     Added two sysctls to allow fine tuning, or even disabling the feature.
> 
> v3: take rtt = min(srtt, rcv_rtt) as Yuchung suggested, because rcv_rtt
>     can be over estimated for RPC (or sender limited)

This looks great, series applied, thanks Eric.

So now we handle locally terminated traffic well, however we still need
something that handles traffic flowing through the system.  And for that
reason the CAKE ACK filter still has merit.

I completely agree with you that it has to be implemented properly,
parse all TCP options and the SACK blocks, and pass the ACKs through
when there are options it doesn't understand or the SACK/timestampe/etc.
update contains new information that must be passed along and not dropper.

Thanks again.

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 01/13] audit: add container id
From: Steve Grubb @ 2018-05-18 15:38 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, eparis, serge
In-Reply-To: <20180518152106.do5b3mu6e6eyvo7q@madcap2.tricolour.ca>

On Fri, 18 May 2018 11:21:06 -0400
Richard Guy Briggs <rgb@redhat.com> wrote:

> On 2018-05-18 09:56, Steve Grubb wrote:
> > On Thu, 17 May 2018 17:56:00 -0400
> > Richard Guy Briggs <rgb@redhat.com> wrote:
> >   
> > > > During syscall events, the path info is returned in a a record
> > > > simply called AUDIT_PATH, cwd info is returned in AUDIT_CWD. So,
> > > > rather than calling the record that gets attached to everything
> > > > AUDIT_CONTAINER_INFO, how about simply AUDIT_CONTAINER.    
> > > 
> > > Considering the container initiation record is different than the
> > > record to document the container involved in an otherwise normal
> > > syscall, we need two names.  I don't have a strong opinion what
> > > they are.
> > > 
> > > I'd prefer AUDIT_CONTAIN and AUDIT_CONTAINER_INFO so that the two
> > > are different enough to be visually distinct while leaving
> > > AUDIT_CONTAINERID for the field type in patch 4 ("audit: add
> > > containerid filtering")  
> 
> (Sorry, I had intended AUDIT_CONTAINER for the first in that paragraph
> above.)
> 
> > How about AUDIT_CONTAINER for the auxiliary record? The one that
> > starts the container, I don't have a strong opinion on. Could be
> > AUDIT_CONTAINER_INIT, AUDIT_CONTAINER_START, AUDIT_CONTAINERID,
> > AUDIT_CONTAINER_ID, or something else. The API call that sets the ID
> > for filtering could be AUDIT_CID or AUDIT_CONTID if that helps
> > decide what the initial event might be. Normally, it should match
> > the field being filtered.  
> 
> Ok, I had shortened the record field name to "contid=" to be unique
> enough while not using too much netlink bandwidth.  I could have used
> "cid=" but that could be unobvious or ambiguous.  I didn't want to use
> the full "containerid=" due to that.  I suppose I could change the
> field name macro to AUDIT_CONTID.
> 
> For the one that starts the container, I'd prefer to leave the name a
> bit more general than "_INIT", "_START", so maybe I'll swap them
> around and use AUDIT_CONTAINER_INFO for the startup record, and use
> AUDIT_CONTAINER for the syscall auxiliary record.
> 
> Does that work?

I'll go along with that. Thanks. But making that swap frees up
AUDIT_CONTAINER_ID which could be the first event. But
AUDIT_CONTAINER_INFO is also fine with me.

Best Regards,
-Steve

^ permalink raw reply

* Re: [PATCH net-next ] net: mscc: Add SPDX identifier
From: David Miller @ 2018-05-18 15:30 UTC (permalink / raw)
  To: alexandre.belloni
  Cc: Allan.Nielsen, razvan.stefanescu, po.liu, thomas.petazzoni,
	andrew, f.fainelli, netdev, linux-kernel
In-Reply-To: <20180517192305.17517-1-alexandre.belloni@bootlin.com>

From: Alexandre Belloni <alexandre.belloni@bootlin.com>
Date: Thu, 17 May 2018 21:23:05 +0200

> ocelot_qsys.h is missing the SPDX identfier, fix that.
> 
> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

Applied, thank you.

^ permalink raw reply

* Re: WARNING in ip_recv_error
From: Eric Dumazet @ 2018-05-18 15:30 UTC (permalink / raw)
  To: DaeRyong Jeong, davem, kuznet, yoshfuji
  Cc: netdev, linux-kernel, byoungyoung, kt0755, bammanag,
	Willem de Bruijn
In-Reply-To: <20180518120826.GA19515@dragonet.kaist.ac.kr>



On 05/18/2018 05:08 AM, DaeRyong Jeong wrote:
> We report the crash: WARNING in ip_recv_error
> (I resend the email since I mistakenly missed the subject in my previous
> email. I'm sorry.)
> 
> 
> This crash has been found in v4.17-rc1 using RaceFuzzer (a modified
> version of Syzkaller), which we describe more at the end of this
> report. Our analysis shows that the race occurs when invoking two
> syscalls concurrently, setsockopt$inet6_IPV6_ADDRFORM and recvmsg.
> 
> 
> Diagnosis:
> We think the concurrent execution of do_ipv6_setsockopt() with optname
> IPV6_ADDRFORM and inet_recvmsg() causes the crash. do_ipv6_setsockopt()
> can update sk->prot to &udp_prot and sk->sk_family to PF_INET. But
> inet_recvmsg() can execute sk->sk_prot->recvmsg() right after that
> sk->prot is updated and sk->sk_family is not updated by
> do_ipv6_setsockopt(). This will lead WARN_ON in ip_recv_error().
> 
> 
> Thread interleaving:
> CPU0 (do_ipv6_setsockopt)			CPU1 (inet_recvmsg)
> =====						=====
> struct proto *prot = &udp_prot;
> ...
> sk->sk_prot = prot;
> sk->sk_socket->ops = &inet_dgram_ops;
> 						err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
> 									flags & ~MSG_DONTWAIT, &addr_len);
> 
> 						(in udp_recvmsg)
> 						if (flags & MSG_ERRQUEUE)
> 							return ip_recv_error(sk, msg, len, addr_len);
> 	
> 						(in ip_recv_error)
> 						WARN_ON_ONCE(sk->sk_family == AF_INET6);
> sk->sk_family = PF_INET;
> 
> 
> Call Sequence:
> CPU0
> =====
> udpv6_setsockopt
> 	ipv6_setsockopt
> 		do_ipv6_setsockopt
> 
> CPU1
> =====
> sock_recvmsg
> 	sock_recvmsg_nosec
> 		inet_recvmsg
> 			udp_recvmsg
> 
> 
> ==================================================================
> WARNING: CPU: 1 PID: 32600 at /home/daeryong/workspace/new-race-fuzzer/kernels_repo/kernel_v4.17-rc1/net/ipv4/ip_sockglue.c:508 ip_recv_error+0x6f2/0x720 net/ipv4/ip_sockglue.c:508
> Kernel panic - not syncing: panic_on_warn set ...
> 
> CPU: 1 PID: 32600 Comm: syz-executor0 Not tainted 4.17.0-rc1 #1
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x166/0x21c lib/dump_stack.c:113
>  panic+0x1a0/0x3a7 kernel/panic.c:184
>  __warn+0x191/0x1a0 kernel/panic.c:536
>  report_bug+0x132/0x1b0 lib/bug.c:186
>  fixup_bug.part.11+0x28/0x50 arch/x86/kernel/traps.c:178
>  fixup_bug arch/x86/kernel/traps.c:247 [inline]
>  do_error_trap+0x28b/0x2d0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
> RIP: 0010:ip_recv_error+0x6f2/0x720 net/ipv4/ip_sockglue.c:508
> RSP: 0018:ffff8801dadff630 EFLAGS: 00010212
> RAX: 0000000000040000 RBX: 0000000000002002 RCX: ffffffff8327de12
> RDX: 000000000000008a RSI: ffffc90001a0c000 RDI: ffff8801be615010
> RBP: ffff8801dadff720 R08: 0000000000002002 R09: ffff8801dadff918
> R10: ffff8801dadff738 R11: ffff8801dadffaff R12: ffff8801be615000
> R13: ffff8801dadffd50 R14: 1ffff1003b5bfece R15: ffff8801dadffb90
>  udp_recvmsg+0x834/0xa10 net/ipv4/udp.c:1571
>  inet_recvmsg+0x121/0x420 net/ipv4/af_inet.c:830
>  sock_recvmsg_nosec net/socket.c:802 [inline]
>  sock_recvmsg+0x7f/0xa0 net/socket.c:809
>  ___sys_recvmsg+0x1f0/0x430 net/socket.c:2279
>  __sys_recvmsg+0xfc/0x1c0 net/socket.c:2328
>  __do_sys_recvmsg net/socket.c:2338 [inline]
>  __se_sys_recvmsg net/socket.c:2335 [inline]
>  __x64_sys_recvmsg+0x48/0x50 net/socket.c:2335
>  do_syscall_64+0x15f/0x4a0 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4563f9
> RSP: 002b:00007f24f6927b28 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
> RAX: ffffffffffffffda RBX: 000000000072bfa0 RCX: 00000000004563f9
> RDX: 0000000000002002 RSI: 0000000020000240 RDI: 0000000000000016
> RBP: 00000000000004e4 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f24f69286d4
> R13: 00000000ffffffff R14: 00000000006fc600 R15: 0000000000000000
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> ==================================================================
> 
> 
> = About RaceFuzzer
> 
> RaceFuzzer is a customized version of Syzkaller, specifically tailored
> to find race condition bugs in the Linux kernel. While we leverage
> many different technique, the notable feature of RaceFuzzer is in
> leveraging a custom hypervisor (QEMU/KVM) to interleave the
> scheduling. In particular, we modified the hypervisor to intentionally
> stall a per-core execution, which is similar to supporting per-core
> breakpoint functionality. This allows RaceFuzzer to force the kernel
> to deterministically trigger racy condition (which may rarely happen
> in practice due to randomness in scheduling).
> 
> RaceFuzzer's C repro always pinpoints two racy syscalls. Since C
> repro's scheduling synchronization should be performed at the user
> space, its reproducibility is limited (reproduction may take from 1
> second to 10 minutes (or even more), depending on a bug). This is
> because, while RaceFuzzer precisely interleaves the scheduling at the
> kernel's instruction level when finding this bug, C repro cannot fully
> utilize such a feature. Please disregard all code related to
> "should_hypercall" in the C repro, as this is only for our debugging
> purposes using our own hypervisor.
> 


We probably need to revert Willem patch (7ce875e5ecb8562fd44040f69bda96c999e38bbc)

^ permalink raw reply

* Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
From: Daniel Borkmann @ 2018-05-18 15:30 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-3-sandipan@linux.vnet.ibm.com>

On 05/18/2018 02:50 PM, Sandipan Das wrote:
> This adds support for bpf-to-bpf function calls in the powerpc64
> JIT compiler. The JIT compiler converts the bpf call instructions
> to native branch instructions. After a round of the usual passes,
> the start addresses of the JITed images for the callee functions
> are known. Finally, to fixup the branch target addresses, we need
> to perform an extra pass.
> 
> Because of the address range in which JITed images are allocated
> on powerpc64, the offsets of the start addresses of these images
> from __bpf_call_base are as large as 64 bits. So, for a function
> call, we cannot use the imm field of the instruction to determine
> the callee's address. Instead, we use the alternative method of
> getting it from the list of function addresses in the auxillary
> data of the caller by using the off field as an index.
> 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/net/bpf_jit_comp64.c | 79 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 69 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> index 1bdb1aff0619..25939892d8f7 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
>  /* Assemble the body code between the prologue & epilogue */
>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>  			      struct codegen_context *ctx,
> -			      u32 *addrs)
> +			      u32 *addrs, bool extra_pass)
>  {
>  	const struct bpf_insn *insn = fp->insnsi;
>  	int flen = fp->len;
> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>  			break;
>  
>  		/*
> -		 * Call kernel helper
> +		 * Call kernel helper or bpf function
>  		 */
>  		case BPF_JMP | BPF_CALL:
>  			ctx->seen |= SEEN_FUNC;
> -			func = (u8 *) __bpf_call_base + imm;
> +
> +			/* bpf function call */
> +			if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)

Perhaps it might make sense here for !extra_pass to set func to some dummy
address as otherwise the 'kernel helper call' branch used for this is a bit
misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
optimizes the immediate addr, I presume the JIT can handle situations where
in the final extra_pass the image needs to grow/shrink again (due to different
final address for the call)?

> +				if (fp->aux->func && off < fp->aux->func_cnt)
> +					/* use the subprog id from the off
> +					 * field to lookup the callee address
> +					 */
> +					func = (u8 *) fp->aux->func[off]->bpf_func;
> +				else
> +					return -EINVAL;
> +			/* kernel helper call */
> +			else
> +				func = (u8 *) __bpf_call_base + imm;
>  
>  			bpf_jit_emit_func_call(image, ctx, (u64)func);
>  
> @@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>  	return 0;
>  }
>  
> +struct powerpc64_jit_data {
> +	struct bpf_binary_header *header;
> +	u32 *addrs;
> +	u8 *image;
> +	u32 proglen;
> +	struct codegen_context ctx;
> +};
> +
>  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  {
>  	u32 proglen;
> @@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  	u8 *image = NULL;
>  	u32 *code_base;
>  	u32 *addrs;
> +	struct powerpc64_jit_data *jit_data;
>  	struct codegen_context cgctx;
>  	int pass;
>  	int flen;
> @@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  	struct bpf_prog *org_fp = fp;
>  	struct bpf_prog *tmp_fp;
>  	bool bpf_blinded = false;
> +	bool extra_pass = false;
>  
>  	if (!fp->jit_requested)
>  		return org_fp;
> @@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  		fp = tmp_fp;
>  	}
>  
> +	jit_data = fp->aux->jit_data;
> +	if (!jit_data) {
> +		jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
> +		if (!jit_data) {
> +			fp = org_fp;
> +			goto out;
> +		}
> +		fp->aux->jit_data = jit_data;
> +	}
> +
>  	flen = fp->len;
> +	addrs = jit_data->addrs;
> +	if (addrs) {
> +		cgctx = jit_data->ctx;
> +		image = jit_data->image;
> +		bpf_hdr = jit_data->header;
> +		proglen = jit_data->proglen;
> +		alloclen = proglen + FUNCTION_DESCR_SIZE;
> +		extra_pass = true;
> +		goto skip_init_ctx;
> +	}
> +
>  	addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
>  	if (addrs == NULL) {
>  		fp = org_fp;

In this case of !addrs, we leak the just allocated jit_data here!

> @@ -904,10 +947,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
>  
>  	/* Scouting faux-generate pass 0 */
> -	if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
> +	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
>  		/* We hit something illegal or unsupported. */
>  		fp = org_fp;
> -		goto out;
> +		goto out_addrs;
>  	}
>  
>  	/*
> @@ -925,9 +968,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  			bpf_jit_fill_ill_insns);
>  	if (!bpf_hdr) {
>  		fp = org_fp;
> -		goto out;
> +		goto out_addrs;
>  	}
>  
> +skip_init_ctx:
>  	code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
>  
>  	/* Code generation passes 1-2 */
> @@ -935,7 +979,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  		/* Now build the prologue, body code & epilogue for real. */
>  		cgctx.idx = 0;
>  		bpf_jit_build_prologue(code_base, &cgctx);
> -		bpf_jit_build_body(fp, code_base, &cgctx, addrs);
> +		bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
>  		bpf_jit_build_epilogue(code_base, &cgctx);
>  
>  		if (bpf_jit_enable > 1)
> @@ -956,15 +1000,30 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>  	((u64 *)image)[1] = local_paca->kernel_toc;
>  #endif
>  
> +	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
> +
> +	if (!fp->is_func || extra_pass) {
> +		bpf_jit_binary_lock_ro(bpf_hdr);

powerpc doesn't implement set_memory_ro(). Generally this is not a problem since
set_memory_ro() defaults to 'return 0' in this case, but since the bpf_jit_free()
destructor is overridden here, there's no bpf_jit_binary_unlock_ro() and in case
powerpc would get set_memory_*() support one day this will then crash in random
places once the mem gets back to the allocator, thus hard to debug. Two options:
either you remove the bpf_jit_free() override or you remove the bpf_jit_binary_lock_ro().

> +	} else {
> +		jit_data->addrs = addrs;
> +		jit_data->ctx = cgctx;
> +		jit_data->proglen = proglen;
> +		jit_data->image = image;
> +		jit_data->header = bpf_hdr;
> +	}
> +
>  	fp->bpf_func = (void *)image;
>  	fp->jited = 1;
>  	fp->jited_len = alloclen;
>  
> -	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
> +	if (!fp->is_func || extra_pass) {
> +out_addrs:
> +		kfree(addrs);
> +		kfree(jit_data);
> +		fp->aux->jit_data = NULL;
> +	}
>  
>  out:
> -	kfree(addrs);
> -
>  	if (bpf_blinded)
>  		bpf_jit_prog_release_other(fp, fp == org_fp ? tmp_fp : org_fp);
>  
> 

^ permalink raw reply

* Re: [PATCH] hippi: fix spelling mistake: "Framming" -> "Framing"
From: David Miller @ 2018-05-18 15:26 UTC (permalink / raw)
  To: jes.sorensen; +Cc: colin.king, netdev, kernel-janitors, linux-kernel
In-Reply-To: <bdb779e6-1705-3607-7ad8-3b57dfcba31a@gmail.com>

From: Jes Sorensen <jes.sorensen@gmail.com>
Date: Fri, 18 May 2018 10:33:50 -0400

> I do wonder if it's time to retire this driver and the HIPPI code. I
> haven't had access to hardware for over a decade so I have no idea
> if it even works anymore.

That's a good question.

If you really think nobody is using this stuff, yes we can remove it.

Then if someone brings it to our attention that there are users, we
can very easily bring it back.

But as the HIPPI maintiner it is really up to you.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox