Netdev List
 help / color / mirror / Atom feed
* Re: xdp and fragments with virtio
From: David Ahern @ 2018-05-18 16:42 UTC (permalink / raw)
  To: Jason Wang, netdev@vger.kernel.org
In-Reply-To: <fbcf2fcc-0e40-6f0f-f8a5-8255a59eaec5@redhat.com>

On 5/17/18 3:24 AM, Jason Wang wrote:
> It looks like we wrongly drop packets after linearizing the packets
> during XDP_REDIRECT.
> 
> Please try the patch (but I do spot some other issues, will post a series):
> 
> Thanks
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f34794a..59702f9 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -800,7 +800,7 @@ static struct sk_buff *receive_mergeable(struct
> net_device *dev,
>                         }
>                         *xdp_xmit = true;
>                         if (unlikely(xdp_page != page))
> -                               goto err_xdp;
> +                               put_page(page);
>                         rcu_read_unlock();
>                         goto xdp_xmit;
>                 default:

Yes, that does solve the problem of fragments getting lost.

^ permalink raw reply

* Re: [PATCH bpf-next 0/5] fix test_sockmap
From: John Fastabend @ 2018-05-18 16:42 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, Shuah Khan, netdev
In-Reply-To: <20180518071753.4768-1-bhole_prashant_q7@lab.ntt.co.jp>

On 05/18/2018 12:17 AM, Prashant Bhole wrote:
> This series fixes bugs in test_sockmap code. They weren't caught
> previously because failure in RX/TX thread was not notified to the
> main thread.
> 
> Also fixed data verification logic and slightly improved test output
> such that parameters values (cork, apply, start, end) of failed test
> can be easily seen.
> 

Great, this was on my list so thanks for taking care of it.

> Note: Even after fixing above problems there are issues with tests
> which set cork parameter. Tests fail (RX thread timeout) when cork
> value is non-zero and overall data sent by TX thread isn't multiples
> of cork value.


This is expected. When 'cork' is set the sender should only xmit
the data when 'cork' bytes are available. If the user doesn't
provide the N bytes the data is cork'ed waiting for the bytes and
if the socket is closed the state is cleaned up. What these tests
are testing is the cleanup path when a user doesn't provide the
N bytes. In practice this is used to validate headers and prevent
users from sending partial headers. We want to keep these tests because
they verify a tear-down path in the code.

After your changes do these get reported as failures? If so we
need to account for the above in the calculations.

> 
> Prashant Bhole (5):
>   selftests/bpf: test_sockmap, check test failure
>   selftests/bpf: test_sockmap, join cgroup in selftest mode
>   selftests/bpf: test_sockmap, fix test timeout
>   selftests/bpf: test_sockmap, fix data verification
>   selftests/bpf: test_sockmap, print additional test options
> 
>  tools/testing/selftests/bpf/test_sockmap.c | 76 +++++++++++++++++++++++-------
>  1 file changed, 58 insertions(+), 18 deletions(-)
> 

^ permalink raw reply

* Re: [PATCH 00/14] Modify action API for implementing lockless actions
From: Vlad Buslov @ 2018-05-18 16:37 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, Roman Mashak, Linux Kernel Network Developers,
	David Miller, Cong Wang, pablo, kadlec, fw, ast, Daniel Borkmann,
	Eric Dumazet, kliteyn, Lucas Bates
In-Reply-To: <e594a4b0-079d-9353-7443-443ab1134f88@mojatatu.com>


On Fri 18 May 2018 at 12:33, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 17/05/18 09:35 AM, Vlad Buslov wrote:
>> 
>> On Wed 16 May 2018 at 21:51, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Wed, May 16, 2018 at 11:23:41PM CEST, vladbu@mellanox.com wrote:
>>>>
>
>>>>> Please make sure you have these in your kernel config:
>>>>>
>>>>> CONFIG_NET_ACT_IFE=y
>>>>> CONFIG_NET_IFE_SKBMARK=m
>>>>> CONFIG_NET_IFE_SKBPRIO=m
>>>>> CONFIG_NET_IFE_SKBTCINDEX=m
>>>
>>> Roman, could you please add this to some file? Something similar to:
>>> tools/testing/selftests/net/forwarding/config
>>>
>
> How would putting the file there help?
>
>>> Thanks!
>>>
>>>>>
>>>>> For tdc to run all the tests, it is assumed that all the supported tc
>>>>> actions/filters are enabled and compiled.
>>>>
>>>> Enabling these options allowed all ife tests to pass. Thanks!
>>>>
>>>> Error in u32 test still appears however:
>>>>
>>>> Test e9a3: Add u32 with source match
>>>>
>>>> -----> prepare stage *** Could not execute: "$TC qdisc add dev $DEV1 ingress"
>>>>
>>>> -----> prepare stage *** Error message: "Cannot find device "v0p1"
>> 
>> I investigated and was able to fix u32 problems.
>> 
>> First of all, u32 test requires having veth interfaces that are not
>> created by test infrastructure by default. Following command fixes the
>> issue:
>> 
>> sudo ip link add v0p0 type veth peer name v0p1
>> 
>
> That is documented on the README i believe - however, we should
> be able to detect that a test needs the device and create it via
> a plugin. Lucas?

Description from README:

*  The kernel must have veth support available, as a veth pair is created
   prior to running the tests.

For me it looked like test suite will create veth pair by itself and I
only need to enable veth support in kernel config.

>
>> After executing this command test passes, however looking at test
>> definition itself it seems meaningless. It creates filter with match
>> source IP 127.0.0.1, then tests if filter with source IP 127.0.0.2
>> exists, but passes successfully because it actually expects to match
>> zero filters with such IP :)
>> 
>> I fixed it and it passed properly matching single filter with source IP
>> 127.0.0.2.
>>
>
> Please send a patch.

Done.

>
>> After this flower test failed. The flower test expects that user
>> explicitly provide "-d" option with interface to use. With -d it failed
>> again. This time because it expects action to have 1m references, but
>> actual value was 1000001. I investigated it and found out that test
>> passed, if executed without running other tests first. So it seemed that
>> some other test was leaking reference to gact action. It turned out that
>> culprit was mirred test 6fb4, which created pipe action but didn't flush
>> it afterward.
>>
>
> Hopefully the last patch from Roman fixes that? Otherwise send something
> on top.

Done.

>
>> With all tests passing on that particular version of net-next, I will
>> now rebase my changes on top of it and run them again.
>> 
>
> Thank you Vlad!
>
> cheers,
> jamal

^ permalink raw reply

* Re: [PATCH] net: ethernet: ti: cpsw: fix packet leaking in dual_mac mode
From: Grygorii Strashko @ 2018-05-18 16:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Naresh Kamboju
  Cc: David Miller, netdev, nsekhar, open list, linux-omap
In-Reply-To: <20180517191037.GE504@kroah.com>



On 05/17/2018 02:10 PM, Greg Kroah-Hartman wrote:
> On Thu, May 17, 2018 at 11:18:16PM +0530, Naresh Kamboju wrote:
>> On 2 May 2018 at 20:38, David Miller <davem@davemloft.net> wrote:
>>> From: Grygorii Strashko <grygorii.strashko@ti.com>
>>> Date: Tue, 1 May 2018 12:41:22 -0500
>> <trim>
>>>> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
>>>
>>> Applied and queued up for -stable, thank you.
>>
>> 4.4 stable-rc build failed for arm32.
>> MACHINE=am57xx-evm
>>
>> Build error log:
>> --------------------
>> drivers/net/ethernet/ti/cpsw.c:
>>   In function 'cpsw_add_dual_emac_def_ale_entries':
>> drivers/net/ethernet/ti/cpsw.c:1112:23:
>>   error: 'cpsw' undeclared (first use in this function)
>>     cpsw_ale_control_set(cpsw->ale, slave_port,
>>                          ^~~~
>> drivers/net/ethernet/ti/cpsw.c:1112:23: note:
>>   each undeclared identifier is reported only once for each function it appears
>>   in
>> scripts/Makefile.build:269: recipe for target 'drivers/net/ethernet/ti/cpsw.o'
>>   failed
>>   make[6]: *** [drivers/net/ethernet/ti/cpsw.o] Error 1
>> scripts/Makefile.build:476: recipe for target 'drivers/net/ethernet/ti' failed
>>   make[5]: *** [drivers/net/ethernet/ti] Error 2
>>
> 
> Now dropped, it's nice to see I got 3 reports about this :)
> 

Sry, my bad. I've tested till 4.4 if it can be applied without 
conflicts, but not tested build.

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH bpf-next v2 1/7] perf/core: add perf_get_event() to return perf_event given a struct file
From: Yonghong Song @ 2018-05-18 16:32 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180518071822.GC12217@hirez.programming.kicks-ass.net>



On 5/18/18 12:18 AM, Peter Zijlstra wrote:
> On Thu, May 17, 2018 at 10:32:53PM -0700, Yonghong Song wrote:
>> A new extern function, perf_get_event(), is added to return a perf event
>> given a struct file. This function will be used in later patches.
> 
> Can't you do a narrower interface? Like return the prog. I'm not too
> keen on random !perf code frobbing around inside the event.

Hi, Peter,

My initial implementation (not upstreamed) actually have the whole
function bpf_get_perf_event_info() in the events/core.c. In that
case, the "struct file *" pointer is passed. This way, the event pointer
does not need to go to kernel/bpf/syscall.c or kernel/trace/bpf_trace.c.

I dropped this mechanism since it added more codes in the events/core.c
file, and I felt that such query code might clutter events/core.c.
The function bpf_get_perf_event_info() is now placed in 
kernel/trace/bpf_trace.c.

Just getting bpf prog pointer is not enough as it does not provide
enough attachment information. Getting such information requires
poking into event/tp_event etc.

Currently we have this extern function exposed by events/core.c:
    extern struct perf_event *perf_get_event(struct file *file);
We could make the result value "const" like
    extern const struct perf_event *perf_get_event(struct file *file);
This will make it clear that we do not change "event" fields, and
merely poking at it.

Please let me know your preference.

Thanks!
Yonghong

^ permalink raw reply

* Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support
From: Björn Töpel @ 2018-05-18 16:32 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Alexei Starovoitov, Karlsson, Magnus,
	Duyck, Alexander H, Alexander Duyck, John Fastabend,
	Jesper Dangaard Brouer, Willem de Bruijn, Michael S. Tsirkin,
	Netdev, Björn Töpel, michael.lundkvist,
	Brandeburg, Jesse, Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <ccb3ed90-440a-bc29-d6bb-9dc09824ed82@iogearbox.net>

2018-05-18 18:17 GMT+02:00 Daniel Borkmann <daniel@iogearbox.net>:
> On 05/18/2018 05:18 PM, Björn Töpel wrote:
>> 2018-05-18 15:43 GMT+02:00 Daniel Borkmann <daniel@iogearbox.net>:
>>> On 05/18/2018 05:38 AM, Alexei Starovoitov wrote:
>>>> On 5/16/18 11:46 PM, Björn Töpel wrote:
>>>>> 2018-05-04 1:38 GMT+02:00 Alexei Starovoitov <alexei.starovoitov@gmail.com>:
>>>>>> On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote:
>>>>>>> On 05/02/2018 01:01 PM, Björn Töpel wrote:
>>>>>>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>>>>>>
>>>>>>>> This patch set introduces a new address family called AF_XDP that is
>>>>>>>> optimized for high performance packet processing and, in upcoming
>>>>>>>> patch sets, zero-copy semantics. In this patch set, we have removed
>>>>>>>> all zero-copy related code in order to make it smaller, simpler and
>>>>>>>> hopefully more review friendly. This patch set only supports copy-mode
>>>>>>>> for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
>>>>>>>> for RX using the XDP_DRV path. Zero-copy support requires XDP and
>>>>>>>> driver changes that Jesper Dangaard Brouer is working on. Some of his
>>>>>>>> work has already been accepted. We will publish our zero-copy support
>>>>>>>> for RX and TX on top of his patch sets at a later point in time.
>>>>>>>
>>>>>>> +1, would be great to see it land this cycle. Saw few minor nits here
>>>>>>> and there but nothing to hold it up, for the series:
>>>>>>>
>>>>>>> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
>>>>>>>
>>>>>>> Thanks everyone!
>>>>>>
>>>>>> Great stuff!
>>>>>>
>>>>>> Applied to bpf-next, with one condition.
>>>>>> Upcoming zero-copy patches for both RX and TX need to be posted
>>>>>> and reviewed within this release window.
>>>>>> If netdev community as a whole won't be able to agree on the zero-copy
>>>>>> bits we'd need to revert this feature before the next merge window.
>>>>>>
>>>>>> Few other minor nits:
>>>>>> patch 3:
>>>>>> +struct xdp_ring {
>>>>>> +       __u32 producer __attribute__((aligned(64)));
>>>>>> +       __u32 consumer __attribute__((aligned(64)));
>>>>>> +};
>>>>>> It kinda begs for ____cacheline_aligned_in_smp to be introduced for uapi headers.
>>>>>
>>>>> Hmm, I need some guidance on what a sane uapi variant would be. We
>>>>> can't have the uapi depend on the kernel build. ARM64, e.g., can have
>>>>> both 64B and 128B according to the specs. Contemporary IA processors
>>>>> have 64B.
>>>>>
>>>>> The simplest, and maybe most future-proof, would be 128B aligned for
>>>>> all. Another is having 128B for ARM and 64B for all IA. A third option
>>>>> is having a hand-shaking API (I think virtio has that) for determine
>>>>> the cache line size, but I'd rather not go down that route.
>>>>>
>>>>> Thoughts/ideas on how a uapi ____cacheline_aligned_in_smp version
>>>>> would look like?
>>>>
>>>> I suspect i40e+arm combination wasn't tested anyway.
>>>> The api may have endianness issues too on something like sparc.
>>>> I think the way to be backwards compatible in this area
>>>> is to make the api usable on x86 only by adding
>>>> to include/uapi/linux/if_xdp.h
>>>> #if defined(__x86_64__)
>>>> #define AF_XDP_CACHE_BYTES 64
>>>> #else
>>>> #error "AF_XDP support is not yet available for this architecture"
>>>> #endif
>>>> and doing:
>>>>     __u32 producer __attribute__((aligned(AF_XDP_CACHE_BYTES)));
>>>>     __u32 consumer __attribute__((aligned(AF_XDP_CACHE_BYTES)));
>>>>
>>>> And progressively add to this for arm64 and few other archs.
>>>> Eventually removing #error and adding some generic define
>>>> that's good enough for long tail of architectures that
>>>> we really cannot test.
>>>
>>> Been looking into this yesterday as well a bit, and it's a bit of a mess what
>>> uapi headers do on this regard (though there are just a handful of such headers).
>>> Some of the kernel uapi headers hard-code generally 64 bytes regardless of the
>>> underlying arch. In general, the kernel does expose it to user space via sysfs
>>> (coherency_line_size). Here's what perf does to retrieve it:
>>>
>>> #ifdef _SC_LEVEL1_DCACHE_LINESIZE
>>> #define cache_line_size(cacheline_sizep) *cacheline_sizep = sysconf(_SC_LEVEL1_DCACHE_LINESIZE)
>>> #else
>>> static void cache_line_size(int *cacheline_sizep)
>>> {
>>>         if (sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", cacheline_sizep))
>>>                 pr_debug("cannot determine cache line size");
>>> }
>>> #endif
>>>
>>> The sysconf() implementation for _SC_LEVEL1_DCACHE_LINESIZE seems also only
>>> available for x86, arm64, s390 and ppc on a cursory glance in the glibc code.
>>> In the x86 case it retrieves the info from cpuid insn. In order to generically
>>> use it in combination with the header you'd have some probe which would then
>>> set this as a define before including the header.
>>
>> But as a uapi we cannot depend on the L1 cache line size for what's
>> currently running on the system, right? So, either a "one cache line
>> size for all flavors of an arch", e.g. for ARMv8 that would be 128,
>> even though there can be 64 flavors out there.
>>
>> Another way would be to remove the ring structure completely, and
>> leave that to the user-space application to figure out. So there's a
>> runtime interface (getsockopt) to probe the offsets the head and tail
>> pointer is after/before the mmap call, and only expose the descriptor
>> format explicitly in if_xdp.h. Don't know if that is too unorthodox or
>> not...
>
> Good point, I think that may not be too unreasonable thing to do, imho,
> at least this doesn't get us into the issue discussed here in the first
> place, and it would work for other archs more seamlessly rather than ugly
> build error or single per arch 'catch-all' define.
>

I'll try that route (runtime-based offset to prod/cons), and see where
it ends up!


Enjoy the weekend,
Björn

> Thanks,
> Daniel

^ permalink raw reply

* Re: [PATCH v3 iproute2-next 0/3] RDMA tool driver-specific resource tracking
From: David Ahern @ 2018-05-18 16:23 UTC (permalink / raw)
  To: Steve Wise, leon; +Cc: stephen, netdev, linux-rdma
In-Reply-To: <cover.1526483157.git.swise@opengridcomputing.com>

On 5/16/18 9:05 AM, Steve Wise wrote:
> This series enhances the iproute2 rdma tool to include displaying
> driver-specific resource attributes.  It is the user-space part of the
> kernel driver resource tracking series that has been accepted for merging
> into linux-4.18 [1]
> 
> If there are no additional review comments, it can now be merged, I think.
> 

applied to iproute2-next.

BTW: Something is off with the header dependencies. After applying patch
1 nothing was rebuilt.

^ permalink raw reply

* Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support
From: Daniel Borkmann @ 2018-05-18 16:17 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Alexei Starovoitov, Alexei Starovoitov, Karlsson, Magnus,
	Duyck, Alexander H, Alexander Duyck, John Fastabend,
	Jesper Dangaard Brouer, Willem de Bruijn, Michael S. Tsirkin,
	Netdev, Björn Töpel, michael.lundkvist,
	Brandeburg, Jesse, Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <CAJ+HfNgB71Z4mNQB0Sq3D8dk7yMTZtnbEUBnj=-kjkac5QA33A@mail.gmail.com>

On 05/18/2018 05:18 PM, Björn Töpel wrote:
> 2018-05-18 15:43 GMT+02:00 Daniel Borkmann <daniel@iogearbox.net>:
>> On 05/18/2018 05:38 AM, Alexei Starovoitov wrote:
>>> On 5/16/18 11:46 PM, Björn Töpel wrote:
>>>> 2018-05-04 1:38 GMT+02:00 Alexei Starovoitov <alexei.starovoitov@gmail.com>:
>>>>> On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote:
>>>>>> On 05/02/2018 01:01 PM, Björn Töpel wrote:
>>>>>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>>>>>
>>>>>>> This patch set introduces a new address family called AF_XDP that is
>>>>>>> optimized for high performance packet processing and, in upcoming
>>>>>>> patch sets, zero-copy semantics. In this patch set, we have removed
>>>>>>> all zero-copy related code in order to make it smaller, simpler and
>>>>>>> hopefully more review friendly. This patch set only supports copy-mode
>>>>>>> for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
>>>>>>> for RX using the XDP_DRV path. Zero-copy support requires XDP and
>>>>>>> driver changes that Jesper Dangaard Brouer is working on. Some of his
>>>>>>> work has already been accepted. We will publish our zero-copy support
>>>>>>> for RX and TX on top of his patch sets at a later point in time.
>>>>>>
>>>>>> +1, would be great to see it land this cycle. Saw few minor nits here
>>>>>> and there but nothing to hold it up, for the series:
>>>>>>
>>>>>> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
>>>>>>
>>>>>> Thanks everyone!
>>>>>
>>>>> Great stuff!
>>>>>
>>>>> Applied to bpf-next, with one condition.
>>>>> Upcoming zero-copy patches for both RX and TX need to be posted
>>>>> and reviewed within this release window.
>>>>> If netdev community as a whole won't be able to agree on the zero-copy
>>>>> bits we'd need to revert this feature before the next merge window.
>>>>>
>>>>> Few other minor nits:
>>>>> patch 3:
>>>>> +struct xdp_ring {
>>>>> +       __u32 producer __attribute__((aligned(64)));
>>>>> +       __u32 consumer __attribute__((aligned(64)));
>>>>> +};
>>>>> It kinda begs for ____cacheline_aligned_in_smp to be introduced for uapi headers.
>>>>
>>>> Hmm, I need some guidance on what a sane uapi variant would be. We
>>>> can't have the uapi depend on the kernel build. ARM64, e.g., can have
>>>> both 64B and 128B according to the specs. Contemporary IA processors
>>>> have 64B.
>>>>
>>>> The simplest, and maybe most future-proof, would be 128B aligned for
>>>> all. Another is having 128B for ARM and 64B for all IA. A third option
>>>> is having a hand-shaking API (I think virtio has that) for determine
>>>> the cache line size, but I'd rather not go down that route.
>>>>
>>>> Thoughts/ideas on how a uapi ____cacheline_aligned_in_smp version
>>>> would look like?
>>>
>>> I suspect i40e+arm combination wasn't tested anyway.
>>> The api may have endianness issues too on something like sparc.
>>> I think the way to be backwards compatible in this area
>>> is to make the api usable on x86 only by adding
>>> to include/uapi/linux/if_xdp.h
>>> #if defined(__x86_64__)
>>> #define AF_XDP_CACHE_BYTES 64
>>> #else
>>> #error "AF_XDP support is not yet available for this architecture"
>>> #endif
>>> and doing:
>>>     __u32 producer __attribute__((aligned(AF_XDP_CACHE_BYTES)));
>>>     __u32 consumer __attribute__((aligned(AF_XDP_CACHE_BYTES)));
>>>
>>> And progressively add to this for arm64 and few other archs.
>>> Eventually removing #error and adding some generic define
>>> that's good enough for long tail of architectures that
>>> we really cannot test.
>>
>> Been looking into this yesterday as well a bit, and it's a bit of a mess what
>> uapi headers do on this regard (though there are just a handful of such headers).
>> Some of the kernel uapi headers hard-code generally 64 bytes regardless of the
>> underlying arch. In general, the kernel does expose it to user space via sysfs
>> (coherency_line_size). Here's what perf does to retrieve it:
>>
>> #ifdef _SC_LEVEL1_DCACHE_LINESIZE
>> #define cache_line_size(cacheline_sizep) *cacheline_sizep = sysconf(_SC_LEVEL1_DCACHE_LINESIZE)
>> #else
>> static void cache_line_size(int *cacheline_sizep)
>> {
>>         if (sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", cacheline_sizep))
>>                 pr_debug("cannot determine cache line size");
>> }
>> #endif
>>
>> The sysconf() implementation for _SC_LEVEL1_DCACHE_LINESIZE seems also only
>> available for x86, arm64, s390 and ppc on a cursory glance in the glibc code.
>> In the x86 case it retrieves the info from cpuid insn. In order to generically
>> use it in combination with the header you'd have some probe which would then
>> set this as a define before including the header.
> 
> But as a uapi we cannot depend on the L1 cache line size for what's
> currently running on the system, right? So, either a "one cache line
> size for all flavors of an arch", e.g. for ARMv8 that would be 128,
> even though there can be 64 flavors out there.
> 
> Another way would be to remove the ring structure completely, and
> leave that to the user-space application to figure out. So there's a
> runtime interface (getsockopt) to probe the offsets the head and tail
> pointer is after/before the mmap call, and only expose the descriptor
> format explicitly in if_xdp.h. Don't know if that is too unorthodox or
> not...

Good point, I think that may not be too unreasonable thing to do, imho,
at least this doesn't get us into the issue discussed here in the first
place, and it would work for other archs more seamlessly rather than ugly
build error or single per arch 'catch-all' define.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls
From: Sandipan Das @ 2018-05-18 16:17 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: ast, netdev, naveen.n.rao, linuxppc-dev, jakub.kicinski
In-Reply-To: <242592f3-c2b4-04bb-7a6c-2394ae4fee98@iogearbox.net>


On 05/18/2018 08:45 PM, Daniel Borkmann wrote:
> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>> The imm field of a bpf instruction is a signed 32-bit integer.
>> For JIT bpf-to-bpf function calls, it stores the offset of the
>> start address of the callee's JITed image from __bpf_call_base.
>>
>> For some architectures, such as powerpc64, this offset may be
>> as large as 64 bits and cannot be accomodated in the imm field
>> without truncation.
>>
>> We resolve this by:
>>
>> [1] Additionally using the auxillary data of each function to
>>     keep a list of start addresses of the JITed images for all
>>     functions determined by the verifier.
>>
>> [2] Retaining the subprog id inside the off field of the call
>>     instructions and using it to index into the list mentioned
>>     above and lookup the callee's address.
>>
>> To make sure that the existing JIT compilers continue to work
>> without requiring changes, we keep the imm field as it is.
>>
>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>> ---
>>  kernel/bpf/verifier.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index a9e4b1372da6..6c56cce9c4e3 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>>  			    insn->src_reg != BPF_PSEUDO_CALL)
>>  				continue;
>>  			subprog = insn->off;
>> -			insn->off = 0;
>>  			insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
>>  				func[subprog]->bpf_func -
>>  				__bpf_call_base;
>>  		}
>> +
>> +		/* we use the aux data to keep a list of the start addresses
>> +		 * of the JITed images for each function in the program
>> +		 *
>> +		 * for some architectures, such as powerpc64, the imm field
>> +		 * might not be large enough to hold the offset of the start
>> +		 * address of the callee's JITed image from __bpf_call_base
>> +		 *
>> +		 * in such cases, we can lookup the start address of a callee
>> +		 * by using its subprog id, available from the off field of
>> +		 * the call instruction, as an index for this list
>> +		 */
>> +		func[i]->aux->func = func;
>> +		func[i]->aux->func_cnt = env->subprog_cnt + 1;
> 
> The target tree you have here is infact bpf, since in bpf-next there was a
> cleanup where the + 1 is removed. Just for the record that we need to keep
> this in mind for bpf into bpf-next merge since this would otherwise subtly
> break.
> 

Sorry about the wrong tag. This series is indeed based off bpf-next.

- Sandipan

>>  	}
>>  	for (i = 0; i < env->subprog_cnt; i++) {
>>  		old_bpf_func = func[i]->bpf_func;
>>
> 
> 

^ permalink raw reply

* Re: [iproute2-next v3 1/1] tipc: fixed node and name table listings
From: David Ahern @ 2018-05-18 16:13 UTC (permalink / raw)
  To: Jon Maloy, stephen, netdev
  Cc: mohan.krishna.ghanta.krishnamurthy, tung.q.nguyen, hoang.h.le,
	canh.d.luu, ying.xue, tipc-discussion
In-Reply-To: <1526565762-3681-1-git-send-email-jon.maloy@ericsson.com>

On 5/17/18 8:02 AM, Jon Maloy wrote:
> We make it easier for users to correlate between 128-bit node
> identities and 32-bit node hash number by extending the 'node list'
> command to also show the hash number.
> 
> We also improve the 'nametable show' command to show the node identity
> instead of the node hash number. Since the former potentially is much
> longer than the latter, we make room for it by eliminating the (to the
> user) irrelevant publication key. We also reorder some of the columns so
> that the node id comes last, since this looks nicer and is more logical.
> 
> ---
> v2: Fixed compiler warning as per comment from David Ahern
> v3: Fixed leaking socket as per comment from David Ahern
> 
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

applied to iproute2-next. Thanks

^ permalink raw reply

* Re: [PATCH iproute2-next 1/1] tc: add missing space symbol in ife output
From: David Ahern @ 2018-05-18 16:11 UTC (permalink / raw)
  To: Roman Mashak, stephen; +Cc: netdev, kernel, jhs, xiyou.wangcong, jiri
In-Reply-To: <1526563682-28095-1-git-send-email-mrv@mojatatu.com>

On 5/17/18 7:28 AM, Roman Mashak wrote:
> In order to make TDC tests match the output patterns, the missing space
> character must be added in the mode output string.
> 
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
> ---
>  tc/m_ife.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Please add the Fixes tag when submitting bug fixes.

Added Fixes tag and applied to iproute2-next. Thanks

^ permalink raw reply

* Re: [PATCH net v2] net: dsa: Do not register devlink for unused ports
From: David Miller @ 2018-05-18 16:09 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, jiri, andrew, vivien.didelot, linux-kernel
In-Reply-To: <20180517235539.19418-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 17 May 2018 16:55:39 -0700

> Even if commit 1d27732f411d ("net: dsa: setup and teardown ports") indicated
> that registering a devlink instance for unused ports is not a problem, and this
> is true, this can be confusing nonetheless, so let's not do it.
> 
> Fixes: 1d27732f411d ("net: dsa: setup and teardown ports")
> Reported-by: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Applied and queued up for -stable, thanks!

^ permalink raw reply

* Re: [PATCH bpf v2 4/6] tools: bpf: sync bpf uapi header
From: Alexei Starovoitov @ 2018-05-18 16:08 UTC (permalink / raw)
  To: Sandipan Das
  Cc: Alexei Starovoitov, Daniel Borkmann, Network Development,
	linuxppc-dev, Naveen N. Rao, Michael Ellerman, Jakub Kicinski

On Fri, May 18, 2018 at 5:50 AM, Sandipan Das
<sandipan@linux.vnet.ibm.com> wrote:
> Syncing the bpf.h uapi header with tools so that struct
> bpf_prog_info has the two new fields for passing on the
> addresses of the kernel symbols corresponding to each
> function in a JITed program.
>
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  tools/include/uapi/linux/bpf.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d94d333a8225..040c9cac7303 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2188,6 +2188,8 @@ struct bpf_prog_info {
>         __u32 xlated_prog_len;
>         __aligned_u64 jited_prog_insns;
>         __aligned_u64 xlated_prog_insns;
> +       __aligned_u64 jited_ksyms;
> +       __u32 nr_jited_ksyms;
>         __u64 load_time;        /* ns since boottime */
>         __u32 created_by_uid;
>         __u32 nr_map_ids;

this breaks uapi.
New fields can only be added to the end.

^ permalink raw reply

* Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
From: Daniel Borkmann @ 2018-05-18 16:08 UTC (permalink / raw)
  To: Naveen N. Rao, ast, Sandipan Das
  Cc: jakub.kicinski, linuxppc-dev, mpe, netdev
In-Reply-To: <1526658936.9wk22hv49g.naveen@linux.ibm.com>

On 05/18/2018 06:05 PM, Naveen N. Rao wrote:
> Daniel Borkmann wrote:
>> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>>> This adds support for bpf-to-bpf function calls in the powerpc64
>>> JIT compiler. The JIT compiler converts the bpf call instructions
>>> to native branch instructions. After a round of the usual passes,
>>> the start addresses of the JITed images for the callee functions
>>> are known. Finally, to fixup the branch target addresses, we need
>>> to perform an extra pass.
>>>
>>> Because of the address range in which JITed images are allocated
>>> on powerpc64, the offsets of the start addresses of these images
>>> from __bpf_call_base are as large as 64 bits. So, for a function
>>> call, we cannot use the imm field of the instruction to determine
>>> the callee's address. Instead, we use the alternative method of
>>> getting it from the list of function addresses in the auxillary
>>> data of the caller by using the off field as an index.
>>>
>>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/net/bpf_jit_comp64.c | 79 ++++++++++++++++++++++++++++++++++-----
>>>  1 file changed, 69 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
>>> index 1bdb1aff0619..25939892d8f7 100644
>>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>>> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
>>>  /* Assemble the body code between the prologue & epilogue */
>>>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>>                    struct codegen_context *ctx,
>>> -                  u32 *addrs)
>>> +                  u32 *addrs, bool extra_pass)
>>>  {
>>>      const struct bpf_insn *insn = fp->insnsi;
>>>      int flen = fp->len;
>>> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>>              break;
>>>  
>>>          /*
>>> -         * Call kernel helper
>>> +         * Call kernel helper or bpf function
>>>           */
>>>          case BPF_JMP | BPF_CALL:
>>>              ctx->seen |= SEEN_FUNC;
>>> -            func = (u8 *) __bpf_call_base + imm;
>>> +
>>> +            /* bpf function call */
>>> +            if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
>>
>> Perhaps it might make sense here for !extra_pass to set func to some dummy
>> address as otherwise the 'kernel helper call' branch used for this is a bit
>> misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
>> optimizes the immediate addr, I presume the JIT can handle situations where
>> in the final extra_pass the image needs to grow/shrink again (due to different
>> final address for the call)?
> 
> That's a good catch. We don't handle that -- we expect to get the size right on first pass. We could probably have PPC_FUNC_ADDR() pad the result with nops to make it a constant 5-instruction sequence.

Yeah, arm64 does something similar by not optimizing the imm in order to always
emit 4 insns for it.

Thanks,
Daniel

^ permalink raw reply

* Re: [net PATCH] net: Fix a bug in removing queues from XPS map
From: David Miller @ 2018-05-18 16:07 UTC (permalink / raw)
  To: amritha.nambiar
  Cc: netdev, alexander.h.duyck, f.fainelli, sridhar.samudrala,
	edumazet, hannes, tom
In-Reply-To: <152659384481.2834.5040004714298449412.stgit@anamdev.jf.intel.com>

From: Amritha Nambiar <amritha.nambiar@intel.com>
Date: Thu, 17 May 2018 14:50:44 -0700

> While removing queues from the XPS map, the individual CPU ID
> alone was used to index the CPUs map, this should be changed to also
> factor in the traffic class mapping for the CPU-to-queue lookup.
> 
> Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>

Applied and queued up for -stable, thank you.

^ permalink raw reply

* Re: [PATCH iproute2-next] tc: flower: add support for verbose logging
From: David Ahern @ 2018-05-18 16:06 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner, netdev
  Cc: Jakub Kicinski, Stephen Hemminger, Jiri Pirko, Alexander Aring,
	Jamal Hadi Salim, Cong Wang
In-Reply-To: <7475756f86139d889344cf2675b13ce4364b81b1.1526243604.git.mleitner@redhat.com>

On 5/13/18 2:44 PM, Marcelo Ricardo Leitner wrote:
> From: Marcelo Ricardo Leitner <mleitner@redhat.com>
> 
> Currently there is no way to log offloading errors if the rule is not
> explicitly marked as skip_sw, making it hard for other applications such
> as Open vSwitch to log why a given could not be offloaded.
> 
> This patch adds support for signaling the kernel that more verbose
> logging is wanted, which now will include such messages.
> 
> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> ---
>  include/uapi/linux/pkt_cls.h | 1 +
>  man/man8/tc-flower.8         | 7 +++++++
>  tc/f_flower.c                | 4 +++-
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 

applied to iproute2-next. Thanks

^ permalink raw reply

* Re: [PATCH bpf v2 2/6] bpf: powerpc64: add JIT support for multi-function programs
From: Naveen N. Rao @ 2018-05-18 16:05 UTC (permalink / raw)
  To: ast, Daniel Borkmann, Sandipan Das
  Cc: jakub.kicinski, linuxppc-dev, mpe, netdev
In-Reply-To: <c3b23da2-0781-a1ee-0d49-71e9efb52e66@iogearbox.net>

Daniel Borkmann wrote:
> On 05/18/2018 02:50 PM, Sandipan Das wrote:
>> This adds support for bpf-to-bpf function calls in the powerpc64
>> JIT compiler. The JIT compiler converts the bpf call instructions
>> to native branch instructions. After a round of the usual passes,
>> the start addresses of the JITed images for the callee functions
>> are known. Finally, to fixup the branch target addresses, we need
>> to perform an extra pass.
>> 
>> Because of the address range in which JITed images are allocated
>> on powerpc64, the offsets of the start addresses of these images
>> from __bpf_call_base are as large as 64 bits. So, for a function
>> call, we cannot use the imm field of the instruction to determine
>> the callee's address. Instead, we use the alternative method of
>> getting it from the list of function addresses in the auxillary
>> data of the caller by using the off field as an index.
>> 
>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/net/bpf_jit_comp64.c | 79 ++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 69 insertions(+), 10 deletions(-)
>> 
>> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
>> index 1bdb1aff0619..25939892d8f7 100644
>> --- a/arch/powerpc/net/bpf_jit_comp64.c
>> +++ b/arch/powerpc/net/bpf_jit_comp64.c
>> @@ -256,7 +256,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
>>  /* Assemble the body code between the prologue & epilogue */
>>  static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>  			      struct codegen_context *ctx,
>> -			      u32 *addrs)
>> +			      u32 *addrs, bool extra_pass)
>>  {
>>  	const struct bpf_insn *insn = fp->insnsi;
>>  	int flen = fp->len;
>> @@ -712,11 +712,23 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>  			break;
>>  
>>  		/*
>> -		 * Call kernel helper
>> +		 * Call kernel helper or bpf function
>>  		 */
>>  		case BPF_JMP | BPF_CALL:
>>  			ctx->seen |= SEEN_FUNC;
>> -			func = (u8 *) __bpf_call_base + imm;
>> +
>> +			/* bpf function call */
>> +			if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
> 
> Perhaps it might make sense here for !extra_pass to set func to some dummy
> address as otherwise the 'kernel helper call' branch used for this is a bit
> misleading in that sense. The PPC_LI64() used in bpf_jit_emit_func_call()
> optimizes the immediate addr, I presume the JIT can handle situations where
> in the final extra_pass the image needs to grow/shrink again (due to different
> final address for the call)?

That's a good catch. We don't handle that -- we expect to get the size 
right on first pass. We could probably have PPC_FUNC_ADDR() pad the 
result with nops to make it a constant 5-instruction sequence.

> 
>> +				if (fp->aux->func && off < fp->aux->func_cnt)
>> +					/* use the subprog id from the off
>> +					 * field to lookup the callee address
>> +					 */
>> +					func = (u8 *) fp->aux->func[off]->bpf_func;
>> +				else
>> +					return -EINVAL;
>> +			/* kernel helper call */
>> +			else
>> +				func = (u8 *) __bpf_call_base + imm;
>>  
>>  			bpf_jit_emit_func_call(image, ctx, (u64)func);
>>  
>> @@ -864,6 +876,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
>>  	return 0;
>>  }
>>  
>> +struct powerpc64_jit_data {
>> +	struct bpf_binary_header *header;
>> +	u32 *addrs;
>> +	u8 *image;
>> +	u32 proglen;
>> +	struct codegen_context ctx;
>> +};
>> +
>>  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  {
>>  	u32 proglen;
>> @@ -871,6 +891,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  	u8 *image = NULL;
>>  	u32 *code_base;
>>  	u32 *addrs;
>> +	struct powerpc64_jit_data *jit_data;
>>  	struct codegen_context cgctx;
>>  	int pass;
>>  	int flen;
>> @@ -878,6 +899,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  	struct bpf_prog *org_fp = fp;
>>  	struct bpf_prog *tmp_fp;
>>  	bool bpf_blinded = false;
>> +	bool extra_pass = false;
>>  
>>  	if (!fp->jit_requested)
>>  		return org_fp;
>> @@ -891,7 +913,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  		fp = tmp_fp;
>>  	}
>>  
>> +	jit_data = fp->aux->jit_data;
>> +	if (!jit_data) {
>> +		jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
>> +		if (!jit_data) {
>> +			fp = org_fp;
>> +			goto out;
>> +		}
>> +		fp->aux->jit_data = jit_data;
>> +	}
>> +
>>  	flen = fp->len;
>> +	addrs = jit_data->addrs;
>> +	if (addrs) {
>> +		cgctx = jit_data->ctx;
>> +		image = jit_data->image;
>> +		bpf_hdr = jit_data->header;
>> +		proglen = jit_data->proglen;
>> +		alloclen = proglen + FUNCTION_DESCR_SIZE;
>> +		extra_pass = true;
>> +		goto skip_init_ctx;
>> +	}
>> +
>>  	addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
>>  	if (addrs == NULL) {
>>  		fp = org_fp;
> 
> In this case of !addrs, we leak the just allocated jit_data here!
> 
>> @@ -904,10 +947,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
>>  
>>  	/* Scouting faux-generate pass 0 */
>> -	if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
>> +	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
>>  		/* We hit something illegal or unsupported. */
>>  		fp = org_fp;
>> -		goto out;
>> +		goto out_addrs;
>>  	}
>>  
>>  	/*
>> @@ -925,9 +968,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  			bpf_jit_fill_ill_insns);
>>  	if (!bpf_hdr) {
>>  		fp = org_fp;
>> -		goto out;
>> +		goto out_addrs;
>>  	}
>>  
>> +skip_init_ctx:
>>  	code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
>>  
>>  	/* Code generation passes 1-2 */
>> @@ -935,7 +979,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  		/* Now build the prologue, body code & epilogue for real. */
>>  		cgctx.idx = 0;
>>  		bpf_jit_build_prologue(code_base, &cgctx);
>> -		bpf_jit_build_body(fp, code_base, &cgctx, addrs);
>> +		bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
>>  		bpf_jit_build_epilogue(code_base, &cgctx);
>>  
>>  		if (bpf_jit_enable > 1)
>> @@ -956,15 +1000,30 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
>>  	((u64 *)image)[1] = local_paca->kernel_toc;
>>  #endif
>>  
>> +	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
>> +
>> +	if (!fp->is_func || extra_pass) {
>> +		bpf_jit_binary_lock_ro(bpf_hdr);
> 
> powerpc doesn't implement set_memory_ro(). Generally this is not a problem since
> set_memory_ro() defaults to 'return 0' in this case, but since the bpf_jit_free()
> destructor is overridden here, there's no bpf_jit_binary_unlock_ro() and in case
> powerpc would get set_memory_*() support one day this will then crash in random
> places once the mem gets back to the allocator, thus hard to debug. Two options:
> either you remove the bpf_jit_free() override or you remove the bpf_jit_binary_lock_ro().

Yeah, we shouldn't be using the lock here.

Thanks,
Naveen

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: mscc: ocelot: add bonding support
From: Andrew Lunn @ 2018-05-18 16:04 UTC (permalink / raw)
  To: Alexandre Belloni
  Cc: David S . Miller, Allan Nielsen, razvan.stefanescu, po.liu,
	Thomas Petazzoni, Florian Fainelli, netdev, linux-kernel
In-Reply-To: <20180518064106.10122-2-alexandre.belloni@bootlin.com>

On Fri, May 18, 2018 at 08:41:05AM +0200, Alexandre Belloni wrote:
> Add link aggregation hardware offload support for Ocelot.

Hi Alexandre

What i don't see here is anything checking the mode being
requested. If the user requests 'random' but the hardware only
supports 'roundrobin', you probably should not do the offload.

	 Andrew

^ permalink raw reply

* Re: KASAN: use-after-free Read in remove_wait_queue (2)
From: Guillaume Nault @ 2018-05-18 16:02 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-ppp, Paul Mackerras, netdev, linux-kernel, syzkaller-bugs,
	syzbot, viro
In-Reply-To: <20180514061155.GL677@sol.localdomain>

On Sun, May 13, 2018 at 11:11:55PM -0700, Eric Biggers wrote:
> [+ppp list and maintainer]
> 
> This is a bug in ppp_generic.c; it still happens on Linus' tree and it's easily
> reproducible, see program below.  The bug is that the PPPIOCDETACH ioctl doesn't
> consider that the file can still be attached to epoll instances even when
> ->f_count == 1.

Right. What would it take to remove the file for the epoll instances?
Sorry for the naive question, but I'm not familiar with VFS and didn't
find a helper function we could call.

> Also, the reproducer doesn't test this but I think ppp_poll(),
> ppp_read(), and ppp_write() can all race with PPPIOCDETACH, causing
> use-after-frees as well.

I also believe so. ppp_release() resets ->private_data, and even though
functions like ppp_read() test ->private_data before executing, there's
no synchronisation mechanism to ensure that the update is visible
before the unit or channel is destroyed. Is that the kind of race you
had in mind?

> Any chance that PPPIOCDETACH can simply be removed,
> given that it's apparently been "deprecated" for 16 years?
> Does anyone use it?

The only users I'm aware of are pppd versions older than ppp-2.4.2
(released in November 2003). But even at that time, there were issues
with PPPIOCDETACH as pppd didn't seem to react properly when this call
failed. An Internet search on the "PPPIOCDETACH file->f_count=" kernel
log string, or on the "Couldn't release PPP unit: Invalid argument"
error message of pppd, returns several related bug reports.

Originally, PPPIOCDETACH never failed, but testing ->f_count was
later introduced to fix crashes when the file descriptor had been
duplicated. It seems that this was motivated by polling issues too.

Long story short, it looks like PPPIOCDETACH never has worked well
and we have at least two more bugs to fix. Given how it has proven
fragile, I wouldn't be surprised if there were even more lurking
around. I'd say that it's probably safer to drop it than to add more
workarounds and playing wack-a-mole with those bugs.

^ permalink raw reply

* [PATCH v2 net-next] net: stmmac: Populate missing callbacks in HWIF initialization
From: Jose Abreu @ 2018-05-18 15:54 UTC (permalink / raw)
  To: netdev
  Cc: Jose Abreu, Corentin Labbe, David S. Miller, Joao Pinto,
	Giuseppe Cavallaro, Alexandre Torgue

Some HW specific setups, like sun8i, do not populate all the necessary
callbacks, which is what HWIF helpers were expecting.

Fix this by always trying to get the generic helpers and populate them
if they were not previously populated by HW specific setup.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Fixes: 5f0456b43140 ("net: stmmac: Implement logic to automatically
select HW Interface")
Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Cc: Corentin Labbe <clabbe.montjoie@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
Changes from v1:
	- Rebased to latest net-next
---
 drivers/net/ethernet/stmicro/stmmac/hwif.c |   38 ++++++++++++++++-----------
 1 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 23a1264..14770fc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -189,13 +189,16 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
 	bool needs_gmac = priv->plat->has_gmac;
 	const struct stmmac_hwif_entry *entry;
 	struct mac_device_info *mac;
+	bool needs_setup = true;
 	int i, ret;
 	u32 id;
 
 	if (needs_gmac) {
 		id = stmmac_get_id(priv, GMAC_VERSION);
-	} else {
+	} else if (needs_gmac4) {
 		id = stmmac_get_id(priv, GMAC4_VERSION);
+	} else {
+		id = 0;
 	}
 
 	/* Save ID for later use */
@@ -209,13 +212,12 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
 
 	/* Check for HW specific setup first */
 	if (priv->plat->setup) {
-		priv->hw = priv->plat->setup(priv);
-		if (!priv->hw)
-			return -ENOMEM;
-		return 0;
+		mac = priv->plat->setup(priv);
+		needs_setup = false;
+	} else {
+		mac = devm_kzalloc(priv->device, sizeof(*mac), GFP_KERNEL);
 	}
 
-	mac = devm_kzalloc(priv->device, sizeof(*mac), GFP_KERNEL);
 	if (!mac)
 		return -ENOMEM;
 
@@ -227,24 +229,28 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
 			continue;
 		if (needs_gmac4 ^ entry->gmac4)
 			continue;
-		if (id < entry->min_id)
+		/* Use synopsys_id var because some setups can override this */
+		if (priv->synopsys_id < entry->min_id)
 			continue;
 
-		mac->desc = entry->desc;
-		mac->dma = entry->dma;
-		mac->mac = entry->mac;
-		mac->ptp = entry->hwtimestamp;
-		mac->mode = entry->mode;
-		mac->tc = entry->tc;
+		/* Only use generic HW helpers if needed */
+		mac->desc = mac->desc ? : entry->desc;
+		mac->dma = mac->dma ? : entry->dma;
+		mac->mac = mac->mac ? : entry->mac;
+		mac->ptp = mac->ptp ? : entry->hwtimestamp;
+		mac->mode = mac->mode ? : entry->mode;
+		mac->tc = mac->tc ? : entry->tc;
 
 		priv->hw = mac;
 		priv->ptpaddr = priv->ioaddr + entry->regs.ptp_off;
 		priv->mmcaddr = priv->ioaddr + entry->regs.mmc_off;
 
 		/* Entry found */
-		ret = entry->setup(priv);
-		if (ret)
-			return ret;
+		if (needs_setup) {
+			ret = entry->setup(priv);
+			if (ret)
+				return ret;
+		}
 
 		/* Run quirks, if needed */
 		if (entry->quirks) {
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH bpf v2 6/6] bpf: fix JITed dump for multi-function programs via syscall
From: Daniel Borkmann @ 2018-05-18 15:51 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-7-sandipan@linux.vnet.ibm.com>

On 05/18/2018 02:50 PM, Sandipan Das wrote:
> Currently, for multi-function programs, we cannot get the JITed
> instructions using the bpf system call's BPF_OBJ_GET_INFO_BY_FD
> command. Because of this, userspace tools such as bpftool fail
> to identify a multi-function program as being JITed or not.
> 
> With the JIT enabled and the test program running, this can be
> verified as follows:
> 
>   # cat /proc/sys/net/core/bpf_jit_enable
>   1
> 
> Before applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>           loaded_at 2018-05-16T11:43:38+0530  uid 0
>           xlated 216B  not jited  memlock 65536B
>   ...
> 
>   # bpftool prog dump jited id 1
>   no instructions returned
> 
> After applying this patch:
> 
>   # bpftool prog list
>   1: kprobe  name foo  tag b811aab41a39ad3d  gpl
>           loaded_at 2018-05-16T12:13:01+0530  uid 0
>           xlated 216B  jited 308B  memlock 65536B
>   ...

That's really nice! One comment inline below:

>   # bpftool prog dump jited id 1
>      0:   nop
>      4:   nop
>      8:   mflr    r0
>      c:   std     r0,16(r1)
>     10:   stdu    r1,-112(r1)
>     14:   std     r31,104(r1)
>     18:   addi    r31,r1,48
>     1c:   li      r3,10
>   ...
> 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  kernel/bpf/syscall.c | 38 ++++++++++++++++++++++++++++++++------
>  1 file changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 54a72fafe57c..2430d159078c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1896,7 +1896,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	struct bpf_prog_info info = {};
>  	u32 info_len = attr->info.info_len;
>  	char __user *uinsns;
> -	u32 ulen;
> +	u32 ulen, i;
>  	int err;
>  
>  	err = check_uarg_tail_zero(uinfo, sizeof(info), info_len);
> @@ -1922,7 +1922,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	ulen = min_t(u32, info.nr_map_ids, ulen);
>  	if (ulen) {
>  		u32 __user *user_map_ids = u64_to_user_ptr(info.map_ids);
> -		u32 i;
>  
>  		for (i = 0; i < ulen; i++)
>  			if (put_user(prog->aux->used_maps[i]->id,
> @@ -1970,13 +1969,41 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	 * for offload.
>  	 */
>  	ulen = info.jited_prog_len;
> -	info.jited_prog_len = prog->jited_len;
> +	if (prog->aux->func_cnt) {
> +		info.jited_prog_len = 0;
> +		for (i = 0; i < prog->aux->func_cnt; i++)
> +			info.jited_prog_len += prog->aux->func[i]->jited_len;
> +	} else {
> +		info.jited_prog_len = prog->jited_len;
> +	}
> +
>  	if (info.jited_prog_len && ulen) {
>  		if (bpf_dump_raw_ok()) {
>  			uinsns = u64_to_user_ptr(info.jited_prog_insns);
>  			ulen = min_t(u32, info.jited_prog_len, ulen);
> -			if (copy_to_user(uinsns, prog->bpf_func, ulen))
> -				return -EFAULT;
> +
> +			/* for multi-function programs, copy the JITed
> +			 * instructions for all the functions
> +			 */
> +			if (prog->aux->func_cnt) {
> +				u32 len, free;
> +				u8 *img;
> +
> +				free = ulen;
> +				for (i = 0; i < prog->aux->func_cnt; i++) {
> +					len = prog->aux->func[i]->jited_len;
> +					img = (u8 *) prog->aux->func[i]->bpf_func;
> +					if (len > free)
> +						break;
> +					if (copy_to_user(uinsns, img, len))
> +						return -EFAULT;
> +					uinsns += len;
> +					free -= len;

Is there any way we can introduce a delimiter between the different
images such that they could be more easily correlated with the call
from the main (or other sub-)program instead of having one contiguous
dump blob?

> +				}
> +			} else {
> +				if (copy_to_user(uinsns, prog->bpf_func, ulen))
> +					return -EFAULT;
> +			}
>  		} else {
>  			info.jited_prog_insns = 0;
>  		}
> @@ -1987,7 +2014,6 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	if (info.nr_jited_ksyms && ulen) {
>  		u64 __user *user_jited_ksyms = u64_to_user_ptr(info.jited_ksyms);
>  		ulong ksym_addr;
> -		u32 i;
>  
>  		/* copy the address of the kernel symbol corresponding to
>  		 * each function
> 

^ permalink raw reply

* Re: [PATCH v3 net-next 0/6] tcp: implement SACK compression
From: Eric Dumazet @ 2018-05-18 15:48 UTC (permalink / raw)
  To: David Miller, edumazet
  Cc: netdev, toke, ncardwell, ycheng, soheil, eric.dumazet
In-Reply-To: <20180518.114238.2120564820986009221.davem@davemloft.net>



On 05/18/2018 08:42 AM, David Miller wrote:

> This looks great, series applied, thanks Eric.
> 
> So now we handle locally terminated traffic well, however we still need
> something that handles traffic flowing through the system.  And for that
> reason the CAKE ACK filter still has merit.
> 
> I completely agree with you that it has to be implemented properly,
> parse all TCP options and the SACK blocks, and pass the ACKs through
> when there are options it doesn't understand or the SACK/timestampe/etc.
> update contains new information that must be passed along and not dropper.
> 
> Thanks again.
> 

Be assured I will carefully review Cake ACK filter ;)

Thanks David !

^ permalink raw reply

* [PATCH] net: sched: don't disable bh when accessing action idr
From: Vlad Buslov @ 2018-05-18 15:45 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, linux-kernel, Vlad Buslov

Underlying implementation of action map has changed and doesn't require
disabling bh anymore. Replace all action idr spinlock usage with regular
calls that do not disable bh.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 net/sched/act_api.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 72251241..3f4cf93 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -77,9 +77,9 @@ static void free_tcf(struct tc_action *p)
 
 static void tcf_idr_remove(struct tcf_idrinfo *idrinfo, struct tc_action *p)
 {
-	spin_lock_bh(&idrinfo->lock);
+	spin_lock(&idrinfo->lock);
 	idr_remove(&idrinfo->action_idr, p->tcfa_index);
-	spin_unlock_bh(&idrinfo->lock);
+	spin_unlock(&idrinfo->lock);
 	gen_kill_estimator(&p->tcfa_rate_est);
 	free_tcf(p);
 }
@@ -156,7 +156,7 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	struct tc_action *p;
 	unsigned long id = 1;
 
-	spin_lock_bh(&idrinfo->lock);
+	spin_lock(&idrinfo->lock);
 
 	s_i = cb->args[0];
 
@@ -191,7 +191,7 @@ static int tcf_dump_walker(struct tcf_idrinfo *idrinfo, struct sk_buff *skb,
 	if (index >= 0)
 		cb->args[0] = index + 1;
 
-	spin_unlock_bh(&idrinfo->lock);
+	spin_unlock(&idrinfo->lock);
 	if (n_i) {
 		if (act_flags & TCA_FLAG_LARGE_DUMP_ON)
 			cb->args[1] = n_i;
@@ -261,9 +261,9 @@ static struct tc_action *tcf_idr_lookup(u32 index, struct tcf_idrinfo *idrinfo)
 {
 	struct tc_action *p = NULL;
 
-	spin_lock_bh(&idrinfo->lock);
+	spin_lock(&idrinfo->lock);
 	p = idr_find(&idrinfo->action_idr, index);
-	spin_unlock_bh(&idrinfo->lock);
+	spin_unlock(&idrinfo->lock);
 
 	return p;
 }
@@ -323,7 +323,7 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 	}
 	spin_lock_init(&p->tcfa_lock);
 	idr_preload(GFP_KERNEL);
-	spin_lock_bh(&idrinfo->lock);
+	spin_lock(&idrinfo->lock);
 	/* user doesn't specify an index */
 	if (!index) {
 		index = 1;
@@ -331,7 +331,7 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 	} else {
 		err = idr_alloc_u32(idr, NULL, &index, index, GFP_ATOMIC);
 	}
-	spin_unlock_bh(&idrinfo->lock);
+	spin_unlock(&idrinfo->lock);
 	idr_preload_end();
 	if (err)
 		goto err3;
@@ -369,9 +369,9 @@ void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a)
 {
 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
 
-	spin_lock_bh(&idrinfo->lock);
+	spin_lock(&idrinfo->lock);
 	idr_replace(&idrinfo->action_idr, a, a->tcfa_index);
-	spin_unlock_bh(&idrinfo->lock);
+	spin_unlock(&idrinfo->lock);
 }
 EXPORT_SYMBOL(tcf_idr_insert);
 
-- 
2.7.5

^ permalink raw reply related

* Re: WARNING in ip_recv_error
From: David Miller @ 2018-05-18 15:44 UTC (permalink / raw)
  To: eric.dumazet
  Cc: threeearcat, kuznet, yoshfuji, netdev, linux-kernel, byoungyoung,
	kt0755, bammanag, willemb
In-Reply-To: <293d029c-b14c-a625-3703-97a5754e99f1@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 18 May 2018 08:30:43 -0700

> We probably need to revert Willem patch (7ce875e5ecb8562fd44040f69bda96c999e38bbc)

Is it really valid to reach ip_recv_err with an ipv6 socket?

^ permalink raw reply

* Re: [PATCH bpf v2 3/6] bpf: get kernel symbol addresses via syscall
From: Daniel Borkmann @ 2018-05-18 15:43 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-4-sandipan@linux.vnet.ibm.com>

On 05/18/2018 02:50 PM, Sandipan Das wrote:
> This adds new two new fields to struct bpf_prog_info. For
> multi-function programs, these fields can be used to pass
> a list of kernel symbol addresses for all functions in a
> given program and to userspace using the bpf system call
> with the BPF_OBJ_GET_INFO_BY_FD command.
> 
> When bpf_jit_kallsyms is enabled, we can get the address
> of the corresponding kernel symbol for a callee function
> and resolve the symbol's name. The address is determined
> by adding the value of the call instruction's imm field
> to __bpf_call_base. This offset gets assigned to the imm
> field by the verifier.
> 
> For some architectures, such as powerpc64, the imm field
> is not large enough to hold this offset.
> 
> We resolve this by:
> 
> [1] Assigning the subprog id to the imm field of a call
>     instruction in the verifier instead of the offset of
>     the callee's symbol's address from __bpf_call_base.
> 
> [2] Determining the address of a callee's corresponding
>     symbol by using the imm field as an index for the
>     list of kernel symbol addresses now available from
>     the program info.
> 
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  include/uapi/linux/bpf.h |  2 ++
>  kernel/bpf/syscall.c     | 20 ++++++++++++++++++++
>  kernel/bpf/verifier.c    |  7 +------
>  3 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d94d333a8225..040c9cac7303 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2188,6 +2188,8 @@ struct bpf_prog_info {
>  	__u32 xlated_prog_len;
>  	__aligned_u64 jited_prog_insns;
>  	__aligned_u64 xlated_prog_insns;
> +	__aligned_u64 jited_ksyms;
> +	__u32 nr_jited_ksyms;
>  	__u64 load_time;	/* ns since boottime */
>  	__u32 created_by_uid;
>  	__u32 nr_map_ids;
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index bfcde949c7f8..54a72fafe57c 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1933,6 +1933,7 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  	if (!capable(CAP_SYS_ADMIN)) {
>  		info.jited_prog_len = 0;
>  		info.xlated_prog_len = 0;
> +		info.nr_jited_ksyms = 0;
>  		goto done;
>  	}
>  
> @@ -1981,6 +1982,25 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
>  		}
>  	}
>  
> +	ulen = info.nr_jited_ksyms;
> +	info.nr_jited_ksyms = prog->aux->func_cnt;
> +	if (info.nr_jited_ksyms && ulen) {

Since this exposes addresses (though masked one which is correct), this
definitely needs to be guarded with bpf_dump_raw_ok() like we do in other
places here (see JIT dump for example).

> +		u64 __user *user_jited_ksyms = u64_to_user_ptr(info.jited_ksyms);
> +		ulong ksym_addr;
> +		u32 i;
> +
> +		/* copy the address of the kernel symbol corresponding to
> +		 * each function
> +		 */
> +		ulen = min_t(u32, info.nr_jited_ksyms, ulen);
> +		for (i = 0; i < ulen; i++) {
> +			ksym_addr = (ulong) prog->aux->func[i]->bpf_func;
> +			ksym_addr &= PAGE_MASK;
> +			if (put_user((u64) ksym_addr, &user_jited_ksyms[i]))
> +				return -EFAULT;
> +		}
> +	}
> +
>  done:
>  	if (copy_to_user(uinfo, &info, info_len) ||
>  	    put_user(info_len, &uattr->info.info_len))
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 6c56cce9c4e3..e826c396aba2 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5426,17 +5426,12 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>  	 * later look the same as if they were interpreted only.
>  	 */
>  	for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
> -		unsigned long addr;
> -
>  		if (insn->code != (BPF_JMP | BPF_CALL) ||
>  		    insn->src_reg != BPF_PSEUDO_CALL)
>  			continue;
>  		insn->off = env->insn_aux_data[i].call_imm;
>  		subprog = find_subprog(env, i + insn->off + 1);
> -		addr  = (unsigned long)func[subprog]->bpf_func;

Hmm, in current bpf tree this says 'subprog + 1' here, so this is not
rebased against bpf tree but bpf-next (unlike what subject says)?

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/kernel/bpf/verifier.c#n5351

> -		addr &= PAGE_MASK;
> -		insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
> -			    addr - __bpf_call_base;
> +		insn->imm = subprog;
>  	}
>  
>  	prog->jited = 1;
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox