Netdev List


* Re: [PATCH bpf-next 0/5] Add support for SKIP_BPF flag for AF_XDP sockets
From: Samudrala, Sridhar @ 2019-08-16  6:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: magnus.karlsson, bjorn.topel, netdev, bpf, intel-wired-lan,
	maciej.fijalkowski, tom.herbert
In-Reply-To: <20190815122844.52eeda08@cakuba.netronome.com>



On 8/15/2019 12:28 PM, Jakub Kicinski wrote:
> On Wed, 14 Aug 2019 20:46:18 -0700, Sridhar Samudrala wrote:
>> This patch series introduces XDP_SKIP_BPF flag that can be specified
>> during the bind() call of an AF_XDP socket to skip calling the BPF
>> program in the receive path and pass the buffer directly to the socket.
>>
>> When a single AF_XDP socket is associated with a queue and a HW
>> filter is used to redirect the packets and the app is interested in
>> receiving all the packets on that queue, we don't need an additional
>> BPF program to do further filtering or lookup/redirect to a socket.
>>
>> Here are some performance numbers collected on
>>    - 2 socket 28 core Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
>>    - Intel 40Gb Ethernet NIC (i40e)
>>
>> All tests use 2 cores and the results are in Mpps.
>>
>> turbo on (default)
>> ---------------------------------------------	
>>                        no-skip-bpf    skip-bpf
>> ---------------------------------------------	
>> rxdrop zerocopy           21.9         38.5
>> l2fwd  zerocopy           17.0         20.5
>> rxdrop copy               11.1         13.3
>> l2fwd  copy                1.9          2.0
>>
>> no turbo :  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
>> ---------------------------------------------	
>>                        no-skip-bpf    skip-bpf
>> ---------------------------------------------	
>> rxdrop zerocopy           15.4         29.0
>> l2fwd  zerocopy           11.8         18.2
>> rxdrop copy                8.2         10.5
>> l2fwd  copy                1.7          1.7
>> ---------------------------------------------	
> 
> Could you include a third column here - namely the in-XDP performance?
> AFAIU the way to achieve better performance with AF_XDP is to move the
> fast path into the kernel's XDP program..

The in-XDP drop rate that can be measured with xdp1 is lower than rxdrop
zerocopy with skip-bpf, although in-XDP drop uses only 1 core. AF_XDP
1-core performance would improve with the need-wakeup or busypoll patches
and, based on early experiments so far, AF_XDP with need-wakeup/busypoll +
skip-bpf performs better than in-XDP drop.

Will include in-xdp drop data too in the next revision.
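
For reference, the relative gains implied by the turbo-on table quoted
above can be worked out with a small sketch like the one below. The
struct, array and function names are made up for illustration; only the
Mpps figures come from the cover letter.

```c
#include <stddef.h>
#include <stdio.h>

/* One row of the quoted benchmark table (values in Mpps). */
struct bench_row {
	const char *name;
	double no_skip_bpf;
	double skip_bpf;
};

/* Turbo-on numbers copied from the cover letter. */
static const struct bench_row turbo_on[] = {
	{ "rxdrop zerocopy", 21.9, 38.5 },
	{ "l2fwd  zerocopy", 17.0, 20.5 },
	{ "rxdrop copy",     11.1, 13.3 },
	{ "l2fwd  copy",      1.9,  2.0 },
};

/* Relative gain of skip-bpf over no-skip-bpf, in percent. */
static double gain_percent(const struct bench_row *row)
{
	return (row->skip_bpf - row->no_skip_bpf) / row->no_skip_bpf * 100.0;
}

/* Print one line per row with the computed gain. */
static void print_gains(const struct bench_row *rows, size_t n)
{
	for (size_t i = 0; i < n; i++)
		printf("%-16s %+6.1f%%\n", rows[i].name, gain_percent(&rows[i]));
}
```

Calling print_gains(turbo_on, 4) from a main() lists the per-test gains;
the rxdrop zerocopy row works out to roughly a 76% improvement, while the
l2fwd copy row gains only about 5%.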

> 
> Maciej's work on batching XDP program's execution should lower the
> retpoline overhead, without leaning close to the bypass model.
> 


* Re: [PATCH bpf-next 0/5] Add support for SKIP_BPF flag for AF_XDP sockets
From: Samudrala, Sridhar @ 2019-08-16  6:12 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, magnus.karlsson, bjorn.topel,
	netdev, bpf, intel-wired-lan, maciej.fijalkowski, tom.herbert
In-Reply-To: <87ftm2wdzk.fsf@toke.dk>



On 8/15/2019 10:11 AM, Toke Høiland-Jørgensen wrote:
> "Samudrala, Sridhar" <sridhar.samudrala@intel.com> writes:
> 
>> On 8/15/2019 4:12 AM, Toke Høiland-Jørgensen wrote:
>>> Sridhar Samudrala <sridhar.samudrala@intel.com> writes:
>>>
>>>> This patch series introduces XDP_SKIP_BPF flag that can be specified
>>>> during the bind() call of an AF_XDP socket to skip calling the BPF
>>>> program in the receive path and pass the buffer directly to the socket.
>>>>
>>>> When a single AF_XDP socket is associated with a queue and a HW
>>>> filter is used to redirect the packets and the app is interested in
>>>> receiving all the packets on that queue, we don't need an additional
>>>> BPF program to do further filtering or lookup/redirect to a socket.
>>>>
>>>> Here are some performance numbers collected on
>>>>     - 2 socket 28 core Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
>>>>     - Intel 40Gb Ethernet NIC (i40e)
>>>>
>>>> All tests use 2 cores and the results are in Mpps.
>>>>
>>>> turbo on (default)
>>>> ---------------------------------------------	
>>>>                         no-skip-bpf    skip-bpf
>>>> ---------------------------------------------	
>>>> rxdrop zerocopy           21.9         38.5
>>>> l2fwd  zerocopy           17.0         20.5
>>>> rxdrop copy               11.1         13.3
>>>> l2fwd  copy                1.9          2.0
>>>>
>>>> no turbo :  echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
>>>> ---------------------------------------------	
>>>>                         no-skip-bpf    skip-bpf
>>>> ---------------------------------------------	
>>>> rxdrop zerocopy           15.4         29.0
>>>> l2fwd  zerocopy           11.8         18.2
>>>> rxdrop copy                8.2         10.5
>>>> l2fwd  copy                1.7          1.7
>>>> ---------------------------------------------
>>>
>>> You're getting this performance boost by adding more code in the fast
>>> path for every XDP program; so what's the performance impact of that for
>>> cases where we do run an eBPF program?
>>
>> The no-skip-bpf results are pretty close to what I see before the
>> patches are applied. As umem is cached in rx_ring for zerocopy the
>> overhead is much smaller compared to the copy scenario where I am
>> currently calling xdp_get_umem_from_qid().
> 
> I meant more for other XDP programs; what is the performance impact of
> XDP_DROP, for instance?

Will run xdp1 with and without the patches and include that data with 
the next revision.



* Re: [PATCH v4 11/14] net: phy: adin: implement Energy Detect Powerdown mode
From: Ardelean, Alexandru @ 2019-08-16  6:09 UTC (permalink / raw)
  To: devicetree@vger.kernel.org, netdev@vger.kernel.org,
	f.fainelli@gmail.com, linux-kernel@vger.kernel.org
  Cc: andrew@lunn.ch, davem@davemloft.net, mark.rutland@arm.com,
	robh+dt@kernel.org, hkallweit1@gmail.com
In-Reply-To: <f13feaee-0bad-a774-5527-296b6f74c91b@gmail.com>

On Wed, 2019-08-14 at 10:57 -0700, Florian Fainelli wrote:
> On 8/12/2019 4:23 AM, Alexandru Ardelean wrote:
> > The ADIN PHYs support Energy Detect Powerdown mode, which puts the PHY into
> > a low power mode when there is no signal on the wire (typically cable
> > unplugged).
> > This behavior is enabled by default, but can be disabled via device
> > property.
> 
> We could consider adding a PHY tunable, having this as a Device Tree
> property amounts to putting a policy inside DT, which is frowned upon.

That would be interesting actually, and I would also prefer it over static DT.
Maybe for this patch, I'll just enable EDPD by default and see about a tunable option.

> 
> > Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
> 
> Other than that, the code looks fine:
> 
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>


* Re: linux-next: manual merge of the net-next tree with the kbuild tree
From: Stephen Rothwell @ 2019-08-16  6:01 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David Miller, Networking, Masahiro Yamada,
	Linux Next Mailing List, Linux Kernel Mailing List, Kees Cook,
	Andrii Nakryiko, Daniel Borkmann
In-Reply-To: <CAEf4BzY9dDZF-DBDmuQQz0Rcx3DNGvQn_GLr0Uar1PAbAf2iig@mail.gmail.com>

Hi Andrii,

On Thu, 15 Aug 2019 22:21:29 -0700 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> Thanks, Stephen! Looks good except one minor issue below.

Thanks for checking.

> >   vmlinux_link()
> >   {
> >  +      info LD ${2}  
> 
> This needs to be ${1}.

At least it's only an informational message and doesn't affect the build.
I will fix my resolution for Monday.

-- 
Cheers,
Stephen Rothwell



* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Song Liu @ 2019-08-16  5:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Kees Cook, Andy Lutomirski, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
	Jann Horn, Greg KH, Linux API, LSM List
In-Reply-To: <B0364660-AD6A-4E5C-B04F-3B6DA78B4BBE@amacapital.net>



> On Aug 15, 2019, at 5:54 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> 
> 
> 
>> On Aug 15, 2019, at 4:46 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> 
>>> 
>>> I'm not sure why you draw the line for VMs -- they're just as buggy
>>> as anything else. Regardless, I reject this line of thinking: yes,
>>> all software is buggy, but that isn't a reason to give up.
>> 
>> hmm. are you saying you want kernel community to work towards
>> making containers (namespaces) being able to run arbitrary code
>> downloaded from the internet?
> 
> Yes.
> 
> As an example, Sandstorm uses a combination of namespaces (user, network, mount, ipc) and a moderately permissive seccomp policy to run arbitrary code. Not just little snippets, either — node.js, Mongo, MySQL, Meteor, and other fairly heavyweight stacks can all run under Sandstorm, with the whole stack (database engine binaries, etc) supplied by entirely untrusted customers.  During the time Sandstorm was under active development, I can recall *one* bug that would have allowed a sandbox escape. That’s a pretty good track record.  (Also, Meltdown and Spectre, sigh.)
> 
> To be clear, Sandstorm did not allow creation of a userns by the untrusted code, and Sandstorm would have heavily restricted bpf(), but that should only be necessary because of the possibility of kernel bugs, not because of the overall design.
> 
> Alexei, I’m trying to encourage you to aim for something even better than you have now. Right now, if you grant a user various very strong capabilities, that user’s systemd can use bpf network filters.  Your proposal would allow this with a different, but still very strong, set of capabilities. There’s nothing wrong with this per se, but I think you can aim much higher:
> 
> CAP_NET_ADMIN and your CAP_BPF both effectively allow the holder to take over the system, *by design*.  I’m suggesting that you engage the security community (Kees, myself, Aleksa, Jann, Serge, Christian, etc) to aim for something better: make it so that a normal Linux distro would be willing to relax its settings enough so that normal users can use bpf filtering in the systemd units and maybe eventually use even more bpf() capabilities. And let’s make it so that mainstream container managers (that use userns!) will be willing (as an option) to delegate bpf() to their containers. We’re happy to help design, review, and even write code, but we need you to be willing to work with us to make a design that seems like it will work and then to wait long enough to merge it for us to think about it, try to poke holes in it, and convince ourselves and each other that it has a good chance of being sound.
> 
> Obviously there will be many cases where an unprivileged program should *not* be able to use bpf() IP filtering, but let’s make it so that enabling these advanced features does not automatically give away the keys to the kingdom.
> 
> (Sandstorm still exists but is no longer as actively developed, sadly.)

I am trying to understand different perspectives here. 

Disclaimer: Alexei and I both work for Facebook. But he may disagree 
with everything I am about to say below, because we haven't sync'ed 
about this for a while. :)

I think there are two types of use cases here: 

    1. CAP_BPF_ADMIN: one big key to all sys_bpf(). 
    2. CAP_BPF: subset of sys_bpf() that is safe for containers.

IIUC, currently, CAP_BPF_ADMIN is (almost) the same as CAP_SYS_ADMIN.
And there aren't many real-world use cases for CAP_BPF.

The /dev/bpf patch tries to separate CAP_BPF_ADMIN from CAP_SYS_ADMIN.
On the other hand, Andy would like to introduce CAP_BPF and build
amazing use cases around it (chicken-egg problem). 

Did I misunderstand anything?

If not, I think these two use cases do not really conflict with each
other, and we probably need both of them. Then, the next question is 
do we really need both/either of them. Maybe having two separate 
discussions would make it easier?


The following are some questions I am trying to understand for 
the two cases. 

For CAP_BPF_ADMIN (or /dev/bpf):
Can we just use CAP_NET_ADMIN? It is safer than CAP_SYS_ADMIN, and
reusing an existing CAP_ should be easier than introducing a new one.

For CAP_BPF: 
Do we really need it for the containers? Is it possible to implement 
all container use cases with SUID? At this moment, I think SUID is 
the right way to go for this use case, because this is likely to 
start with a small set of functionalities. We can introduce CAP_BPF
when the container use case is too complicated for SUID. 


I hope some of these questions/thoughts make sense.

Thanks,
Song


* Re: [net-next v2 1/1] tipc: clean up skb list lock handling on send path
From: Xin Long @ 2019-08-16  5:29 UTC (permalink / raw)
  To: Jon Maloy
  Cc: davem, network dev, tung.q.nguyen, hoang.h.le, Long Xin, shuali,
	Ying Xue, Eric Dumazet, tipc-discussion
In-Reply-To: <1565880170-19548-1-git-send-email-jon.maloy@ericsson.com>

On Thu, Aug 15, 2019 at 11:36 PM Jon Maloy <jon.maloy@ericsson.com> wrote:
>
> The policy for handling the skb list locks on the send and receive paths
> is simple.
>
> - On the send path we never need to grab the lock on the 'xmitq' list
>   when the destination is an external node.
>
> - On the receive path we always need to grab the lock on the 'inputq'
>   list, irrespective of source node.
>
> However, when transmitting node local messages those will eventually
> end up on the receive path of a local socket, meaning that the argument
> 'xmitq' in tipc_node_xmit() will become the 'inputq' argument in the
> function tipc_sk_rcv(). This has been handled by always initializing
> the spinlock of the 'xmitq' list at message creation, just in case it
> may end up on the receive path later, and despite knowing that the lock
> in most cases never will be used.
>
> This approach is inaccurate and confusing, and has also concealed the
> fact that the stated 'no lock grabbing' policy for the send path is
> violated in some cases.
>
> We now clean this up by never initializing the lock at message creation,
> instead doing this at the moment we find that the message actually will
> enter the receive path. At the same time we fix the four locations
> where we incorrectly access the spinlock on the send/error path.
>
> This patch also reverts commit d12cffe9329f ("tipc: ensure head->lock
> is initialised") which has now become redundant.
>
> CC: Eric Dumazet <edumazet@google.com>
> Reported-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
> Acked-by: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
>
> ---
> v2: removed more unnecessary lock initializations after feedback
>     from Xin Long.
> ---
>  net/tipc/bcast.c      | 10 +++++-----
>  net/tipc/group.c      |  4 ++--
>  net/tipc/link.c       | 14 +++++++-------
>  net/tipc/name_distr.c |  2 +-
>  net/tipc/node.c       |  7 ++++---
>  net/tipc/socket.c     | 14 +++++++-------
>  6 files changed, 26 insertions(+), 25 deletions(-)
>
> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> index 34f3e56..6ef1abd 100644
> --- a/net/tipc/bcast.c
> +++ b/net/tipc/bcast.c
> @@ -185,7 +185,7 @@ static void tipc_bcbase_xmit(struct net *net, struct sk_buff_head *xmitq)
>         }
>
>         /* We have to transmit across all bearers */
> -       skb_queue_head_init(&_xmitq);
> +       __skb_queue_head_init(&_xmitq);
>         for (bearer_id = 0; bearer_id < MAX_BEARERS; bearer_id++) {
>                 if (!bb->dests[bearer_id])
>                         continue;
> @@ -256,7 +256,7 @@ static int tipc_bcast_xmit(struct net *net, struct sk_buff_head *pkts,
>         struct sk_buff_head xmitq;
>         int rc = 0;
>
> -       skb_queue_head_init(&xmitq);
> +       __skb_queue_head_init(&xmitq);
>         tipc_bcast_lock(net);
>         if (tipc_link_bc_peers(l))
>                 rc = tipc_link_xmit(l, pkts, &xmitq);
> @@ -286,7 +286,7 @@ static int tipc_rcast_xmit(struct net *net, struct sk_buff_head *pkts,
>         u32 dnode, selector;
>
>         selector = msg_link_selector(buf_msg(skb_peek(pkts)));
> -       skb_queue_head_init(&_pkts);
> +       __skb_queue_head_init(&_pkts);
>
>         list_for_each_entry_safe(dst, tmp, &dests->list, list) {
>                 dnode = dst->node;
> @@ -344,7 +344,7 @@ static int tipc_mcast_send_sync(struct net *net, struct sk_buff *skb,
>         msg_set_size(_hdr, MCAST_H_SIZE);
>         msg_set_is_rcast(_hdr, !msg_is_rcast(hdr));
>
> -       skb_queue_head_init(&tmpq);
> +       __skb_queue_head_init(&tmpq);
>         __skb_queue_tail(&tmpq, _skb);
>         if (method->rcast)
>                 tipc_bcast_xmit(net, &tmpq, cong_link_cnt);
> @@ -378,7 +378,7 @@ int tipc_mcast_xmit(struct net *net, struct sk_buff_head *pkts,
>         int rc = 0;
>
>         skb_queue_head_init(&inputq);
> -       skb_queue_head_init(&localq);
> +       __skb_queue_head_init(&localq);
>
>         /* Clone packets before they are consumed by next call */
>         if (dests->local && !tipc_msg_reassemble(pkts, &localq)) {
> diff --git a/net/tipc/group.c b/net/tipc/group.c
> index 5f98d38..89257e2 100644
> --- a/net/tipc/group.c
> +++ b/net/tipc/group.c
> @@ -199,7 +199,7 @@ void tipc_group_join(struct net *net, struct tipc_group *grp, int *sk_rcvbuf)
>         struct tipc_member *m, *tmp;
>         struct sk_buff_head xmitq;
>
> -       skb_queue_head_init(&xmitq);
> +       __skb_queue_head_init(&xmitq);
>         rbtree_postorder_for_each_entry_safe(m, tmp, tree, tree_node) {
>                 tipc_group_proto_xmit(grp, m, GRP_JOIN_MSG, &xmitq);
>                 tipc_group_update_member(m, 0);
> @@ -435,7 +435,7 @@ bool tipc_group_cong(struct tipc_group *grp, u32 dnode, u32 dport,
>                 return true;
>         if (state == MBR_PENDING && adv == ADV_IDLE)
>                 return true;
> -       skb_queue_head_init(&xmitq);
> +       __skb_queue_head_init(&xmitq);
>         tipc_group_proto_xmit(grp, m, GRP_ADV_MSG, &xmitq);
>         tipc_node_distr_xmit(grp->net, &xmitq);
>         return true;
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index dd3155b..289e848 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -959,7 +959,7 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
>                 pr_warn("Too large msg, purging xmit list %d %d %d %d %d!\n",
>                         skb_queue_len(list), msg_user(hdr),
>                         msg_type(hdr), msg_size(hdr), mtu);
> -               skb_queue_purge(list);
> +               __skb_queue_purge(list);
>                 return -EMSGSIZE;
>         }
>
> @@ -988,7 +988,7 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
>                 if (likely(skb_queue_len(transmq) < maxwin)) {
>                         _skb = skb_clone(skb, GFP_ATOMIC);
>                         if (!_skb) {
> -                               skb_queue_purge(list);
> +                               __skb_queue_purge(list);
>                                 return -ENOBUFS;
>                         }
>                         __skb_dequeue(list);
> @@ -1668,7 +1668,7 @@ void tipc_link_create_dummy_tnl_msg(struct tipc_link *l,
>         struct sk_buff *skb;
>         u32 dnode = l->addr;
>
> -       skb_queue_head_init(&tnlq);
> +       __skb_queue_head_init(&tnlq);
>         skb = tipc_msg_create(TUNNEL_PROTOCOL, FAILOVER_MSG,
>                               INT_H_SIZE, BASIC_H_SIZE,
>                               dnode, onode, 0, 0, 0);
> @@ -1708,9 +1708,9 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl,
>         if (!tnl)
>                 return;
>
> -       skb_queue_head_init(&tnlq);
> -       skb_queue_head_init(&tmpxq);
> -       skb_queue_head_init(&frags);
> +       __skb_queue_head_init(&tnlq);
> +       __skb_queue_head_init(&tmpxq);
> +       __skb_queue_head_init(&frags);
>
>         /* At least one packet required for safe algorithm => add dummy */
>         skb = tipc_msg_create(TIPC_LOW_IMPORTANCE, TIPC_DIRECT_MSG,
> @@ -1720,7 +1720,7 @@ void tipc_link_tnl_prepare(struct tipc_link *l, struct tipc_link *tnl,
>                 pr_warn("%sunable to create tunnel packet\n", link_co_err);
>                 return;
>         }
> -       skb_queue_tail(&tnlq, skb);
> +       __skb_queue_tail(&tnlq, skb);
>         tipc_link_xmit(l, &tnlq, &tmpxq);
>         __skb_queue_purge(&tmpxq);
>
> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
> index 44abc8e..61219f0 100644
> --- a/net/tipc/name_distr.c
> +++ b/net/tipc/name_distr.c
> @@ -190,7 +190,7 @@ void tipc_named_node_up(struct net *net, u32 dnode)
>         struct name_table *nt = tipc_name_table(net);
>         struct sk_buff_head head;
>
> -       skb_queue_head_init(&head);
> +       __skb_queue_head_init(&head);
>
>         read_lock_bh(&nt->cluster_scope_lock);
>         named_distribute(net, &head, dnode, &nt->cluster_scope);
> diff --git a/net/tipc/node.c b/net/tipc/node.c
> index 1bdcf0f..c8f6177 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -1444,13 +1444,14 @@ int tipc_node_xmit(struct net *net, struct sk_buff_head *list,
>
>         if (in_own_node(net, dnode)) {
>                 tipc_loopback_trace(net, list);
> +               spin_lock_init(&list->lock);
>                 tipc_sk_rcv(net, list);
>                 return 0;
>         }
>
>         n = tipc_node_find(net, dnode);
>         if (unlikely(!n)) {
> -               skb_queue_purge(list);
> +               __skb_queue_purge(list);
>                 return -EHOSTUNREACH;
>         }
>
> @@ -1459,7 +1460,7 @@ int tipc_node_xmit(struct net *net, struct sk_buff_head *list,
>         if (unlikely(bearer_id == INVALID_BEARER_ID)) {
>                 tipc_node_read_unlock(n);
>                 tipc_node_put(n);
> -               skb_queue_purge(list);
> +               __skb_queue_purge(list);
>                 return -EHOSTUNREACH;
>         }
>
> @@ -1491,7 +1492,7 @@ int tipc_node_xmit_skb(struct net *net, struct sk_buff *skb, u32 dnode,
>  {
>         struct sk_buff_head head;
>
> -       skb_queue_head_init(&head);
> +       __skb_queue_head_init(&head);
>         __skb_queue_tail(&head, skb);
>         tipc_node_xmit(net, &head, dnode, selector);
>         return 0;
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 83ae41d..3b9f8cc 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -809,7 +809,7 @@ static int tipc_sendmcast(struct  socket *sock, struct tipc_name_seq *seq,
>         msg_set_nameupper(hdr, seq->upper);
>
>         /* Build message as chain of buffers */
> -       skb_queue_head_init(&pkts);
> +       __skb_queue_head_init(&pkts);
>         rc = tipc_msg_build(hdr, msg, 0, dlen, mtu, &pkts);
>
>         /* Send message if build was successful */
> @@ -853,7 +853,7 @@ static int tipc_send_group_msg(struct net *net, struct tipc_sock *tsk,
>         msg_set_grp_bc_seqno(hdr, bc_snd_nxt);
>
>         /* Build message as chain of buffers */
> -       skb_queue_head_init(&pkts);
> +       __skb_queue_head_init(&pkts);
>         mtu = tipc_node_get_mtu(net, dnode, tsk->portid);
>         rc = tipc_msg_build(hdr, m, 0, dlen, mtu, &pkts);
>         if (unlikely(rc != dlen))
> @@ -1058,7 +1058,7 @@ static int tipc_send_group_bcast(struct socket *sock, struct msghdr *m,
>         msg_set_grp_bc_ack_req(hdr, ack);
>
>         /* Build message as chain of buffers */
> -       skb_queue_head_init(&pkts);
> +       __skb_queue_head_init(&pkts);
>         rc = tipc_msg_build(hdr, m, 0, dlen, mtu, &pkts);
>         if (unlikely(rc != dlen))
>                 return rc;
> @@ -1387,7 +1387,7 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dlen)
>         if (unlikely(rc))
>                 return rc;
>
> -       skb_queue_head_init(&pkts);
> +       __skb_queue_head_init(&pkts);
>         mtu = tipc_node_get_mtu(net, dnode, tsk->portid);
>         rc = tipc_msg_build(hdr, m, 0, dlen, mtu, &pkts);
>         if (unlikely(rc != dlen))
> @@ -1445,7 +1445,7 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen)
>         int send, sent = 0;
>         int rc = 0;
>
> -       skb_queue_head_init(&pkts);
> +       __skb_queue_head_init(&pkts);
>
>         if (unlikely(dlen > INT_MAX))
>                 return -EMSGSIZE;
> @@ -1805,7 +1805,7 @@ static int tipc_recvmsg(struct socket *sock, struct msghdr *m,
>
>         /* Send group flow control advertisement when applicable */
>         if (tsk->group && msg_in_group(hdr) && !grp_evt) {
> -               skb_queue_head_init(&xmitq);
> +               __skb_queue_head_init(&xmitq);
>                 tipc_group_update_rcv_win(tsk->group, tsk_blocks(hlen + dlen),
>                                           msg_orignode(hdr), msg_origport(hdr),
>                                           &xmitq);
> @@ -2674,7 +2674,7 @@ static void tipc_sk_timeout(struct timer_list *t)
>         struct sk_buff_head list;
>         int rc = 0;
>
> -       skb_queue_head_init(&list);
> +       __skb_queue_head_init(&list);
>         bh_lock_sock(sk);
>
>         /* Try again later if socket is busy */
> --
> 2.1.4
>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
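
The locking policy described in the commit message can be pictured with
a small userspace model. In the kernel headers, skb_queue_head_init()
initializes the list's spinlock and then calls __skb_queue_head_init(),
which only sets up the list itself. Everything below (struct
model_queue, the model_* functions, the lock_ready flag) is a
hypothetical stand-in and not TIPC code.

```c
#include <stdbool.h>

/* Toy stand-in for struct sk_buff_head: only what the sketch needs. */
struct model_queue {
	unsigned int qlen;   /* number of queued buffers */
	bool lock_ready;     /* models whether the spinlock was initialized */
};

/*
 * Models __skb_queue_head_init(): set up the list, leave the lock alone.
 * The flag is cleared here only to keep the model deterministic.
 */
static void model_queue_init_nolock(struct model_queue *q)
{
	q->qlen = 0;
	q->lock_ready = false;
}

/* Models skb_queue_head_init(): list setup plus lock initialization. */
static void model_queue_init(struct model_queue *q)
{
	model_queue_init_nolock(q);
	q->lock_ready = true;
}

/*
 * Models the send path after the patch: the lock is prepared only when
 * the list is about to enter the receive path of a local socket.
 * Returns true if the list ended up with a usable lock.
 */
static bool model_node_xmit(struct model_queue *q, bool dest_is_local)
{
	if (dest_is_local)
		q->lock_ready = true;   /* cf. spin_lock_init(&list->lock) */
	return q->lock_ready;
}
```

In this model a list created with model_queue_init_nolock() never gets
a lock when the destination is external, and gets one just in time when
the destination is the local node.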


* Re: [PATCH -next v2] btf: fix return value check in btf_vmlinux_init()
From: Alexei Starovoitov @ 2019-08-16  5:24 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, Network Development, bpf,
	kernel-janitors
In-Reply-To: <20190816024044.139761-1-weiyongjun1@huawei.com>

On Thu, Aug 15, 2019 at 7:36 PM Wei Yongjun <weiyongjun1@huawei.com> wrote:
>
> In case of error, the function kobject_create_and_add() returns a NULL
> pointer, not ERR_PTR(). The IS_ERR() test in the return value check
> should be replaced with a NULL test.
>
> Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> Acked-by: Andrii Nakryiko <andriin@fb.com>

Applied. Thanks.

Please spell out [PATCH v2 bpf-next] in the subject next time.
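
Why an IS_ERR() test cannot catch a NULL return can be shown with a
userspace approximation of the kernel's error-pointer helpers. The
model_* names are hypothetical; the range check mirrors the idea behind
IS_ERR() (pointers in the top 4095 values of the address space encode an
errno), but this is a sketch, not the kernel's definition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer, as ERR_PTR() does. */
static void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointers in the last MODEL_MAX_ERRNO addresses. */
static bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical constructor that reports failure by returning NULL. */
static void *model_create(bool fail)
{
	static int object;

	return fail ? NULL : &object;
}

/* The wrong check: a NULL failure slips through unnoticed. */
static bool failure_seen_by_is_err(void)
{
	return model_is_err(model_create(true));
}

/* The right check for a NULL-returning function. */
static bool failure_seen_by_null_test(void)
{
	return model_create(true) == NULL;
}
```

NULL is address 0, far below the error range, so
failure_seen_by_is_err() reports no failure while
failure_seen_by_null_test() does.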


* Re: [PATCH bpf-next 2/4] selftests/bpf: test_progs: test__skip
From: Andrii Nakryiko @ 2019-08-16  5:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stanislav Fomichev, Stanislav Fomichev, Networking, bpf,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko
In-Reply-To: <CAADnVQ+Bz6R17bassdr3xOR7rhbuw-HbdXYu-hHkxE8S2WiNrA@mail.gmail.com>

On Thu, Aug 15, 2019 at 10:16 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 14, 2019 at 1:01 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > >
> > > Let me know if you see a value in highlighting test vs subtest skip.
> > >
> > > Other related question is: should we do verbose output in case
> > > of a skip? Right now we don't do it.
> >
> > It might be useful, I guess, especially if it's not too common. But
> > Alexei is way more picky about stuff like that, so I'd defer to him. I
> > have no problem with a clean "SKIPPED: <test>/<subtest> (maybe some
> > reason for skipping here)" message.
>
> Since test_progs prints a single number for FAILED tests, a single number
> for SKIPPED tests is fine as well.

I'm fine with a single number, but it should count the number of subtests
skipped, if there are subtests within a test, same as for FAILED.


* Re: linux-next: manual merge of the net-next tree with the kbuild tree
From: Andrii Nakryiko @ 2019-08-16  5:21 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, Networking, Masahiro Yamada,
	Linux Next Mailing List, Linux Kernel Mailing List, Kees Cook,
	Andrii Nakryiko, Daniel Borkmann
In-Reply-To: <20190816124143.2640218a@canb.auug.org.au>

On Thu, Aug 15, 2019 at 7:42 PM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Hi all,
>
> Today's linux-next merge of the net-next tree got a conflict in:
>
>   scripts/link-vmlinux.sh
>
> between commit:
>
>   e167191e4a8a ("kbuild: Parameterize kallsyms generation and correct reporting")
>
> from the kbuild tree and commits:
>
>   341dfcf8d78e ("btf: expose BTF info through sysfs")
>   7fd785685e22 ("btf: rename /sys/kernel/btf/kernel into /sys/kernel/btf/vmlinux")
>
> from the net-next tree.
>
> I fixed it up (I think - see below) and can carry the fix as necessary.

Thanks, Stephen! Looks good except one minor issue below.

> This is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
>
> --
> Cheers,
> Stephen Rothwell
>
> diff --cc scripts/link-vmlinux.sh
> index 2438a9faf3f1,c31193340108..000000000000
> --- a/scripts/link-vmlinux.sh
> +++ b/scripts/link-vmlinux.sh
> @@@ -56,11 -56,10 +56,11 @@@ modpost_link(
>   }
>
>   # Link of vmlinux
> - # ${1} - optional extra .o files
> - # ${2} - output file
> + # ${1} - output file
> + # ${@:2} - optional extra .o files
>   vmlinux_link()
>   {
>  +      info LD ${2}

This needs to be ${1}.

>         local lds="${objtree}/${KBUILD_LDS}"
>         local objects
>
> @@@ -139,18 -149,6 +150,18 @@@ kallsyms(
>         ${CC} ${aflags} -c -o ${2} ${afile}
>   }
>
>  +# Perform one step in kallsyms generation, including temporary linking of
>  +# vmlinux.
>  +kallsyms_step()
>  +{
>  +      kallsymso_prev=${kallsymso}
>  +      kallsymso=.tmp_kallsyms${1}.o
>  +      kallsyms_vmlinux=.tmp_vmlinux${1}
>  +
> -       vmlinux_link "${kallsymso_prev}" ${kallsyms_vmlinux}
> ++      vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" ${btf_vmlinux_bin_o}
>  +      kallsyms ${kallsyms_vmlinux} ${kallsymso}
>  +}
>  +
>   # Create map file with all symbols from ${1}
>   # See mksymap for additional details
>   mksysmap()
> @@@ -228,8 -227,14 +240,15 @@@ ${MAKE} -f "${srctree}/scripts/Makefile
>   info MODINFO modules.builtin.modinfo
>   ${OBJCOPY} -j .modinfo -O binary vmlinux.o modules.builtin.modinfo
>
> + btf_vmlinux_bin_o=""
> + if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
> +       if gen_btf .tmp_vmlinux.btf .btf.vmlinux.bin.o ; then
> +               btf_vmlinux_bin_o=.btf.vmlinux.bin.o
> +       fi
> + fi
> +
>   kallsymso=""
>  +kallsymso_prev=""
>   kallsyms_vmlinux=""
>   if [ -n "${CONFIG_KALLSYMS}" ]; then
>
> @@@ -268,11 -285,8 +287,7 @@@
>         fi
>   fi
>
> - vmlinux_link "${kallsymso}" vmlinux
> -
> - if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
> -       gen_btf vmlinux
> - fi
>  -info LD vmlinux
> + vmlinux_link vmlinux "${kallsymso}" "${btf_vmlinux_bin_o}"
>
>   if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
>         info SORTEX vmlinux


* Re: [PATCH net-next] r8152: divide the tx and rx bottom functions
From: David Miller @ 2019-08-16  5:17 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, nic_swsd, linux-kernel
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2F18D43A3@RTITMBSVM03.realtek.com.tw>

From: Hayes Wang <hayeswang@realtek.com>
Date: Fri, 16 Aug 2019 02:59:16 +0000

> David Miller [mailto:davem@davemloft.net]
>> Sent: Friday, August 16, 2019 4:59 AM
> [...]
>> Theoretically, yes.
>> 
>> But do you have actual performance numbers showing this to be worth
>> the change?
>> 
>> Always provide performance numbers with changes that are supposed to
>> improve performance.
> 
> On x86, they are almost the same.
> Tx/Rx: 943/943 Mbits/sec -> 945/944
> 
> On the ARM platform,
> Tx/Rx: 917/917 Mbits/sec -> 933/933
> An improvement of about 1.74%.

Belongs in the commit message.

^ permalink raw reply

* Re: [PATCH bpf-next 2/4] selftests/bpf: test_progs: test__skip
From: Alexei Starovoitov @ 2019-08-16  5:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stanislav Fomichev, Stanislav Fomichev, Networking, bpf,
	David S. Miller, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko
In-Reply-To: <CAEf4BzaEJcTKV6s8cVinpJcBStvs2LAJ+obNjevw54EOQq1QdQ@mail.gmail.com>

On Wed, Aug 14, 2019 at 1:01 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> >
> > Let me know if you see a value in highlighting test vs subtest skip.
> >
> > Other related question is: should we do verbose output in case
> > of a skip? Right now we don't do it.
>
> It might be useful, I guess, especially if it's not too common. But
> Alexei is way more picky about stuff like that, so I'd defer to him. I
> have no problem with a clean "SKIPPED: <test>/<subtest> (maybe some
> reason for skipping here)" message.

Since test_progs prints a single number for FAILED tests, a single number
for SKIPPED tests is fine as well.

^ permalink raw reply

* Re: [PATCH bpf] tools: bpftool: close prog FD before exit on showing a single program
From: Alexei Starovoitov @ 2019-08-16  5:11 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Network Development,
	oss-drivers
In-Reply-To: <20190815142223.2203-1-quentin.monnet@netronome.com>

On Thu, Aug 15, 2019 at 7:22 AM Quentin Monnet
<quentin.monnet@netronome.com> wrote:
>
> When showing metadata about a single program by invoking
> "bpftool prog show PROG", the file descriptor referring to the program
> is not closed before returning from the function. Let's close it.
>
> Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool")
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Applied. Thanks
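
For readers unfamiliar with this class of bug, the following is a minimal
userspace sketch of the pattern the fix restores: a descriptor obtained
inside a function is closed on every return path. It is not bpftool code;
`show_size` is a hypothetical helper that uses a plain file in place of a
BPF program FD.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: report the size of one file. The descriptor is
 * closed on every return path, including the error path. */
long show_size(const char *path)
{
	struct stat st;
	long size;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	if (fstat(fd, &st) < 0) {
		close(fd);	/* do not leak the fd when bailing out */
		return -1;
	}
	size = (long)st.st_size;

	close(fd);		/* analogous to the close() the patch adds */
	return size;
}
```

Without the final close(), each call would leave one descriptor open until
the process exits.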

^ permalink raw reply

* Re: [PATCH bpf 0/6] tools: bpftool: fix printf()-like functions
From: Alexei Starovoitov @ 2019-08-16  5:08 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Network Development,
	oss-drivers
In-Reply-To: <20190815143220.4199-1-quentin.monnet@netronome.com>

On Thu, Aug 15, 2019 at 7:32 AM Quentin Monnet
<quentin.monnet@netronome.com> wrote:
>
> Hi,
> Because the "__printf()" attributes were used only where the functions are
> implemented, and not in header files, the checks have not been enforced on
> all the calls to printf()-like functions, and a number of errors slipped in
> bpftool over time.
>
> This set cleans up such errors, and then moves the "__printf()" attributes
> to header files, so that the checks are performed at all locations.

Applied. Thanks
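
As an illustration of why the placement of the attribute matters, here is a
minimal sketch. It is not taken from bpftool; `log_err` is a hypothetical
helper. The kernel's `__printf(a, b)` macro expands to
`__attribute__((format(printf, a, b)))`. When that attribute is on the
prototype that callers see (as it would be in a header), GCC and Clang can
check each call's arguments against the format string under -Wformat.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper, not taken from bpftool. Putting the format
 * attribute on the prototype (what a header would export) lets the
 * compiler check every call site, not only the defining file.
 * Argument 3 is the format string; variadic arguments start at 4. */
int log_err(char *buf, size_t len, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

int log_err(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);	/* bounded formatting */
	va_end(ap);
	return n;
}
```

With the prototype above, a call such as log_err(buf, len, "%s", 42) would
be expected to draw a -Wformat warning at the call site, which is the kind
of check the series enables across bpftool.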

^ permalink raw reply

* Reminder: 6 active syzbot reports in "net/tls" subsystem
From: Eric Biggers @ 2019-08-16  4:18 UTC (permalink / raw)
  To: netdev, Boris Pismenny, Aviad Yehezkel, Dave Watson,
	John Fastabend, Daniel Borkmann, Jakub Kicinski, David S. Miller,
	Vakul Garg
  Cc: syzkaller-bugs

[This email was generated by a script.  Let me know if you have any suggestions
to make it better, or if you want it re-generated with the latest status.]

Of the distinct crashes that syzbot has seen in the last week, I've manually
marked 6 of them as possibly being bugs in the "net/tls" subsystem.  I've listed
these bug reports below.

Of these 6 reports, 3 were bisected to commits from the following people:

	Vakul Garg <vakul.garg@nxp.com>
	Dave Watson <davejwatson@fb.com>

I've manually checked that these bisection results look plausible.

If you believe a bug report is no longer valid, please close it by sending a
'#syz fix', '#syz dup', or '#syz invalid' command in reply to the original
thread, as explained at https://goo.gl/tpsmEJ#status

If you believe I misattributed a bug report to the "net/tls" subsystem, please
let me know and (if possible) forward it to the correct place.

Note: in total, I've actually assigned 27 open syzbot reports to this subsystem.
But to help focus people's efforts, I've only listed the 6 that have
(re-)occurred in the last week.  Let me know if you want the full list.

Here are the bug reports:

--------------------------------------------------------------------------------
Title:              kernel BUG at include/linux/scatterlist.h:LINE!
Last occurred:      0 days ago
Reported:           85 days ago
Branches:           Mainline and others
Dashboard link:     https://syzkaller.appspot.com/bug?id=effb623cefb879664122cc47df3af728957eb279
Original thread:    https://lore.kernel.org/lkml/000000000000f41cd905897c075e@google.com/T/#u

This bug has a C reproducer.

This bug was bisected to:

		commit f295b3ae9f5927e084bd5decdff82390e3471801
		Author: Vakul Garg <vakul.garg@nxp.com>
		Date:   Wed Mar 20 02:03:36 2019 +0000

		  net/tls: Add support of AES128-CCM based ciphers

The original thread for this bug has received 1 reply, 66 days ago.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+df0d4ec12332661dd1f9@syzkaller.appspotmail.com

If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lore.kernel.org/r/000000000000f41cd905897c075e@google.com

--------------------------------------------------------------------------------
Title:              kernel BUG at ./include/linux/scatterlist.h:LINE!
Last occurred:      6 days ago
Reported:           56 days ago
Branches:           Mainline
Dashboard link:     https://syzkaller.appspot.com/bug?id=3008161aab5958fe4125a4cae3e4b7ad3ea50a26
Original thread:    https://lore.kernel.org/lkml/000000000000417551058bc0bef9@google.com/T/#u

This bug has a C reproducer.

This bug was bisected to:

		commit f295b3ae9f5927e084bd5decdff82390e3471801
		Author: Vakul Garg <vakul.garg@nxp.com>
		Date:   Wed Mar 20 02:03:36 2019 +0000

		  net/tls: Add support of AES128-CCM based ciphers

No one has replied to the original thread for this bug yet.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+ef0daa6ce95facb233c1@syzkaller.appspotmail.com

If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lore.kernel.org/r/000000000000417551058bc0bef9@google.com

--------------------------------------------------------------------------------
Title:              INFO: task hung in tls_sw_release_resources_tx
Last occurred:      0 days ago
Reported:           0 days ago
Branches:           Mainline and others
Dashboard link:     https://syzkaller.appspot.com/bug?id=845e2a9172ab3afe80b95af12014c65930a053d5
Original thread:    https://lore.kernel.org/lkml/000000000000523ea3059025b11d@google.com/T/#u

This bug has a C reproducer.

This bug was bisected to:

		commit 130b392c6cd6b2aed1b7eb32253d4920babb4891
		Author: Dave Watson <davejwatson@fb.com>
		Date:   Wed Jan 30 21:58:31 2019 +0000

		  net: tls: Add tls 1.3 support

The original thread for this bug has received 1 reply, 3 hours ago.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+6a9ff159672dfbb41c95@syzkaller.appspotmail.com

If you send any email or patch for this bug, please reply to the original
thread, which had activity only 3 hours ago.  For the git send-email command to
use, or tips on how to reply if the thread isn't in your mailbox, see the "Reply
instructions" at https://lore.kernel.org/r/000000000000523ea3059025b11d@google.com

--------------------------------------------------------------------------------
Title:              KMSAN: uninit-value in gf128mul_4k_lle (3)
Last occurred:      0 days ago
Reported:           265 days ago
Branches:           https://github.com/google/kmsan.git master
Dashboard link:     https://syzkaller.appspot.com/bug?id=a01db4c67933e9e4be8e721a8ee15a9530f1ac04
Original thread:    https://lore.kernel.org/lkml/000000000000bf2457057b5ccda3@google.com/T/#u

This bug has a C reproducer.

The original thread for this bug received 2 replies; the last was 260 days ago.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+f8495bff23a879a6d0bd@syzkaller.appspotmail.com

If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lore.kernel.org/r/000000000000bf2457057b5ccda3@google.com

--------------------------------------------------------------------------------
Title:              INFO: task hung in __flush_work
Last occurred:      7 days ago
Reported:           180 days ago
Branches:           Mainline and others
Dashboard link:     https://syzkaller.appspot.com/bug?id=9613d8dffb5c6cc39da8ec290cb8f3eb62bdf21f
Original thread:    https://lore.kernel.org/lkml/0000000000008f9c780581fd7417@google.com/T/#u

This bug has a C reproducer.

No one replied to the original thread for this bug.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+aa0b64a57e300a1c6bcc@syzkaller.appspotmail.com

If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lore.kernel.org/r/0000000000008f9c780581fd7417@google.com

--------------------------------------------------------------------------------
Title:              KMSAN: uninit-value in aesti_encrypt
Last occurred:      1 day ago
Reported:           49 days ago
Branches:           https://github.com/google/kmsan.git master
Dashboard link:     https://syzkaller.appspot.com/bug?id=9e9babd01df34db0c4d4dbde8ca57a0380e6db0b
Original thread:    https://lore.kernel.org/lkml/000000000000a97a15058c50c52e@google.com/T/#u

This bug has a C reproducer.

The original thread for this bug has received 4 replies; the last was 43 days
ago.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+6f50c99e8f6194bf363f@syzkaller.appspotmail.com

If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lore.kernel.org/r/000000000000a97a15058c50c52e@google.com

^ permalink raw reply

* Reminder: 8 active syzbot reports in "net/bpf" subsystem
From: Eric Biggers @ 2019-08-16  4:17 UTC (permalink / raw)
  To: netdev, bpf, David S. Miller, Alexei Starovoitov, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, syzkaller-bugs

[This email was generated by a script.  Let me know if you have any suggestions
to make it better, or if you want it re-generated with the latest status.]

Of the distinct crashes that syzbot has seen in the last week, I've manually
marked 8 of them as possibly being bugs in the "net/bpf" subsystem.  I've listed
these bug reports below.

Of these 8 reports, 1 was bisected to a commit from the following person:

	Alexei Starovoitov <ast@kernel.org>

I've manually checked that this bisection result looks plausible.

If you believe a bug report is no longer valid, please close it by sending a
'#syz fix', '#syz dup', or '#syz invalid' command in reply to the original
thread, as explained at https://goo.gl/tpsmEJ#status

If you believe I misattributed a bug report to the "net/bpf" subsystem, please
let me know and (if possible) forward it to the correct place.

Note: in total, I've actually assigned 42 open syzbot reports to this subsystem.
But to help focus people's efforts, I've only listed the 8 that have
(re-)occurred in the last week.  Let me know if you want the full list.

Here are the bug reports:

--------------------------------------------------------------------------------
Title:              WARNING in bpf_jit_free
Last occurred:      0 days ago
Reported:           395 days ago
Branches:           Mainline and others
Dashboard link:     https://syzkaller.appspot.com/bug?id=d04f9c2ec11ab2678f7427795ff5170cb9eb2220
Original thread:    https://lore.kernel.org/lkml/000000000000e92d1805711f5552@google.com/T/#u

This bug has a C reproducer.

syzbot has bisected this bug, but I think the bisection result is incorrect.

The original thread for this bug received 5 replies; the last was 65 days ago.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+2ff1e7cb738fd3c41113@syzkaller.appspotmail.com

If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lore.kernel.org/r/000000000000e92d1805711f5552@google.com

--------------------------------------------------------------------------------
Title:              WARNING: kernel stack frame pointer has bad value (2)
Last occurred:      1 day ago
Reported:           395 days ago
Branches:           Mainline and others
Dashboard link:     https://syzkaller.appspot.com/bug?id=02a32f98a4e3b5a2ed6929aabdd28dd1618b9c03
Original thread:    https://lore.kernel.org/lkml/0000000000000956640571197f98@google.com/T/#u

This bug has a C reproducer.

The original thread for this bug received 1 reply, 395 days ago.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+903cdd6bce9a6eb832a4@syzkaller.appspotmail.com

If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lore.kernel.org/r/0000000000000956640571197f98@google.com

--------------------------------------------------------------------------------
Title:              BUG: unable to handle kernel paging request in bpf_prog_kallsyms_add
Last occurred:      0 days ago
Reported:           339 days ago
Branches:           Mainline and others
Dashboard link:     https://syzkaller.appspot.com/bug?id=97f89d84d528e4f5150dcfbdeb97347bc8471e96
Original thread:    https://lore.kernel.org/lkml/0000000000009417ef0575802d44@google.com/T/#u

This bug has a syzkaller reproducer only.

The original thread for this bug received 2 replies; the last was 164 days ago.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+c827a78260579449ad39@syzkaller.appspotmail.com

If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lore.kernel.org/r/0000000000009417ef0575802d44@google.com

--------------------------------------------------------------------------------
Title:              WARNING in bpf_prog_kallsyms_find
Last occurred:      0 days ago
Reported:           100 days ago
Branches:           Mainline and others
Dashboard link:     https://syzkaller.appspot.com/bug?id=40b0c218e639f1d882b86abff2549cfe11c5101e
Original thread:    https://lore.kernel.org/lkml/000000000000a8fa360588580820@google.com/T/#u

This bug has a C reproducer.

No one replied to the original thread for this bug.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+89d1ce6e80218a6192d8@syzkaller.appspotmail.com

If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lore.kernel.org/r/000000000000a8fa360588580820@google.com

--------------------------------------------------------------------------------
Title:              KASAN: use-after-free Read in sk_psock_unlink
Last occurred:      5 days ago
Reported:           293 days ago
Branches:           Mainline and others
Dashboard link:     https://syzkaller.appspot.com/bug?id=d691981726208716cc7aec231fb915e27763d662
Original thread:    https://lore.kernel.org/lkml/000000000000fd342e05791cc86f@google.com/T/#u

This bug has a syzkaller reproducer only.

syzbot has bisected this bug, but I think the bisection result is incorrect.

The original thread for this bug received 1 reply, 85 days ago.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+3acd9f67a6a15766686e@syzkaller.appspotmail.com

If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lore.kernel.org/r/000000000000fd342e05791cc86f@google.com

--------------------------------------------------------------------------------
Title:              KASAN: slab-out-of-bounds Read in do_jit
Last occurred:      0 days ago
Reported:           23 days ago
Branches:           Mainline and others
Dashboard link:     https://syzkaller.appspot.com/bug?id=3aacade388873fa82bd6d2efb6aaa9ab85964020
Original thread:    https://lore.kernel.org/lkml/000000000000a6ab6b058e5b899b@google.com/T/#u

This bug has a C reproducer.

This bug was bisected to:

		commit 2589726d12a1b12eaaa93c7f1ea64287e383c7a5
		Author: Alexei Starovoitov <ast@kernel.org>
		Date:   Sat Jun 15 19:12:20 2019 +0000

		  bpf: introduce bounded loops

No one has replied to the original thread for this bug yet.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+6b40f58c6d280fa23b40@syzkaller.appspotmail.com

If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lore.kernel.org/r/000000000000a6ab6b058e5b899b@google.com

--------------------------------------------------------------------------------
Title:              WARNING in is_bpf_text_address
Last occurred:      0 days ago
Reported:           55 days ago
Branches:           Mainline
Dashboard link:     https://syzkaller.appspot.com/bug?id=2386340f7a641010bb1e17228d1e9319592c01ba
Original thread:    https://lore.kernel.org/lkml/00000000000000ac4f058bd50039@google.com/T/#u

This bug has a C reproducer.

The original thread for this bug has received 5 replies; the last was 2 hours
ago.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+bd3bba6ff3fcea7a6ec6@syzkaller.appspotmail.com

If you send any email or patch for this bug, please reply to the original
thread, which had activity only 2 hours ago.  For the git send-email command to
use, or tips on how to reply if the thread isn't in your mailbox, see the "Reply
instructions" at https://lore.kernel.org/r/00000000000000ac4f058bd50039@google.com

--------------------------------------------------------------------------------
Title:              memory leak in sock_hash_update_common
Last occurred:      7 days ago
Reported:           85 days ago
Branches:           Mainline
Dashboard link:     https://syzkaller.appspot.com/bug?id=9992588b3bbe2617f62f41b1162af9fc8ea4829c
Original thread:    https://lore.kernel.org/lkml/000000000000fa662405897c0774@google.com/T/#u

This bug has a syzkaller reproducer only.

No one has replied to the original thread for this bug yet.

If you fix this bug, please add the following tag to the commit:
    Reported-by: syzbot+30c7a1fc662026545124@syzkaller.appspotmail.com

If you send any email or patch for this bug, please consider replying to the
original thread.  For the git send-email command to use, or tips on how to reply
if the thread isn't in your mailbox, see the "Reply instructions" at
https://lore.kernel.org/r/000000000000fa662405897c0774@google.com

^ permalink raw reply

* Re: [PATCH net] tunnel: fix dev null pointer dereference when send pkg larger than mtu in collect_md mode
From: Hangbin Liu @ 2019-08-16  4:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Stefano Brivio, wenxu, Alexei Starovoitov,
	David S . Miller
In-Reply-To: <20190816032418.GX18865@dhcp-12-139.nay.redhat.com>

On Fri, Aug 16, 2019 at 11:24:18AM +0800, Hangbin Liu wrote:
> If yes, how about just setting the skb dst to rt->dst, as
> iptunnel_xmit would do later?
> 
> skb_dst_drop(skb);
> skb_dst_set(skb, &rt->dst);
> 

Tested, and this doesn't work well.

^ permalink raw reply

* [PATCH] airo: fix memory leaks
From: Wenwen Wang @ 2019-08-16  3:50 UTC (permalink / raw)
  To: Wenwen Wang
  Cc: Kalle Valo, David S. Miller, Herbert Xu, Dan Carpenter,
	Eric Biggers, Ard Biesheuvel,
	open list:NETWORKING DRIVERS (WIRELESS),
	open list:NETWORKING DRIVERS, open list

In proc_BSSList_open(), 'file->private_data' is allocated through kzalloc()
and 'data->rbuffer' is allocated through kmalloc(). If an error occurs later
in the function, neither is freed, leading to memory leaks. To fix this
issue, free the allocated memory before returning the error.

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
---
 drivers/net/wireless/cisco/airo.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/cisco/airo.c b/drivers/net/wireless/cisco/airo.c
index 9342ffb..f43c065 100644
--- a/drivers/net/wireless/cisco/airo.c
+++ b/drivers/net/wireless/cisco/airo.c
@@ -5441,11 +5441,18 @@ static int proc_BSSList_open( struct inode *inode, struct file *file ) {
 			Cmd cmd;
 			Resp rsp;
 
-			if (ai->flags & FLAG_RADIO_MASK) return -ENETDOWN;
+			if (ai->flags & FLAG_RADIO_MASK) {
+				kfree(data->rbuffer);
+				kfree(file->private_data);
+				return -ENETDOWN;
+			}
 			memset(&cmd, 0, sizeof(cmd));
 			cmd.cmd=CMD_LISTBSS;
-			if (down_interruptible(&ai->sem))
+			if (down_interruptible(&ai->sem)) {
+				kfree(data->rbuffer);
+				kfree(file->private_data);
 				return -ERESTARTSYS;
+			}
 			issuecommand(ai, &cmd, &rsp);
 			up(&ai->sem);
 			data->readlen = 0;
-- 
2.7.4
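
The patch repeats the two kfree() calls on each error path. A common
alternative in kernel code is to unwind through cleanup labels reached by
goto. The following is a userspace sketch of that pattern, not the airo
driver's code: malloc()/free() stand in for kzalloc()/kfree(), and
`open_list`, `struct list_data`, and the `fail_at` parameter are
hypothetical names that exist only to exercise the error paths.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-file data. */
struct list_data {
	char *rbuffer;
};

/* Allocates *out and its rbuffer. fail_at simulates a later error
 * (1 = radio down, 2 = interrupted) so the unwind path can be seen.
 * On any error, everything allocated so far is freed and *out is NULL. */
int open_list(struct list_data **out, int fail_at)
{
	struct list_data *data;
	int err;

	*out = NULL;
	data = calloc(1, sizeof(*data));
	if (!data)
		return -ENOMEM;

	data->rbuffer = malloc(1024);
	if (!data->rbuffer) {
		err = -ENOMEM;
		goto free_data;
	}

	if (fail_at == 1) {
		err = -ENETDOWN;
		goto free_rbuffer;
	}
	if (fail_at == 2) {
		err = -EINTR;	/* userspace stand-in for -ERESTARTSYS */
		goto free_rbuffer;
	}

	*out = data;
	return 0;

free_rbuffer:
	free(data->rbuffer);
free_data:
	free(data);
	return err;
}
```

The labels are ordered so that each failure point frees only what was
allocated before it, which keeps the cleanup in one place if more error
paths are added later.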


^ permalink raw reply related

* Re: [PATCH v2] socket.7: Add description of SO_SELECT_ERR_QUEUE
From: Ricardo Biehl Pasquali @ 2019-08-16  3:43 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages), jacob.e.keller
  Cc: linux-man, netdev, stefan.puiu, corbet, davem
In-Reply-To: <f053fe2c-20e5-4754-8b13-89cddfbfb52d@gmail.com>

TL;DR: This email proposes a description of the socket
option SO_SELECT_ERR_QUEUE taking into account the change
in wake up behavior when errors are enqueued introduced by
the commit 6e5d58fdc9bedd0255a8 ("skbuff: Fix not waking
applications when errors are enqueued") in Linux 4.16.

On Mon, Jul 29, 2019 at 08:51:42PM +0200, Michael Kerrisk (man-pages) wrote:
> Sorry -- I've not had a lot of cycles to spare for man-pages of late.

Hi. No problem, I was just wondering whether you were
receiving the messages.

> Thanks for the patch. But your text doesn't quite capture the idea
> in this commit message:
> 
> commit 7d4c04fc170087119727119074e72445f2bb192b
> Author: Keller, Jacob E <jacob.e.keller@intel.com>
> Date:   Thu Mar 28 11:19:25 2013 +0000

It definitely does not.

Initially, despite the description of the commit and the
name of the option, I was investigating only the poll() case
as this was what I was working on.

Sorry.

I have now investigated the behavior of both select() and poll(),
and updated some test code that I wrote a while ago.

See <https://github.com/pasqualirb/poll_select_test>.

I've also written a Behavior section in README which I did
not include here.

> What would you think of something like this:
>        SO_SELECT_ERR_QUEUE (since Linux 3.10)
>               When this option is set on a socket, an error condition  on
>               a socket causes notification not only via the exceptfds set
>               of select(2).  Similarly, poll(2) also  returns  a  POLLPRI
>               whenever a POLLERR event is returned.
> 
>               Background:  this  option  was  added  when waking up on an
>               error condition occurred only via the  readfds  and
>               writefds  sets of select(2).  The option was added to allow
>               monitoring for error conditions via the exceptfds  argument
>               without simultaneously having to receive notifications (via
>               readfds) for regular data that can be read from the socket.
>               After changes in Linux 4.16, the use of this
>               flag to achieve the desired notifications is no longer
>               necessary.  This option is nevertheless retained for backwards
>               compatibility.
> 
> ?

I think the part "causes notification not only via the
exceptfds set" implies that the option causes notification
in other sets besides exceptfds. However, the option causes
notification in exceptfds (before Linux 4.16).

In "Background", before Linux 4.16, "waking up" also happened
for exceptfds (see the 'Internal details' section), although
select() did not return.

A description covering poll() and select() cases plus wake
up behavior might be:

  When this option is set on a socket and an error condition
  triggers wake up (see Background below), an exceptional
  condition (POLLPRI of poll(2); exceptfds of select(2)) is
  returned if the user requested it.

  Background:

  Before Linux 4.16, an error condition triggers wake up only
  if the user requested POLLIN or POLLPRI (i.e. any of readfds,
  writefds or exceptfds of select(2)). However, for an error
  condition to be returned to the user instead of sleeping
  again in the kernel, POLLERR (i.e. readfds or writefds of
  select(2)) must also have been requested (implicit in
  poll(2)). The option eliminates this need in select(2) by
  returning POLLPRI (i.e. exceptfds) if the user requested it.

  Since Linux 4.16, an error condition triggers wake up only
  if the user requested POLLERR (i.e. readfds or writefds of
  select(2)). Wake up is not triggered when requesting only
  exceptfds, although returning on it occurs if the error
  condition was generated before calling select(2).

  // Linux 4.16 commit 6e5d58fdc9bedd0255a8 ("skbuff: Fix not
  // waking applications when errors are enqueued")

Another description, focusing on select(), might be:

  Before Linux 4.16, when this option is set on a socket and
  an error condition occurs, select(2) returns on exceptfds if
  the user requested it. It is already returned on readfds and
  writefds. Since Linux 4.16, when the option is set, an error
  condition does not return via exceptfds anymore unless it
  occurred before calling select(2).

  For poll(2), regardless of the kernel version, the option
  causes POLLPRI to be added when POLLERR is returned.

  The option does not affect wake up, it affects only whether
  select(2) returns. The wake up behavior is affected in Linux
  4.16. Before this release, waking up on an error condition
  required requesting POLLIN or POLLPRI. However, for an error
  condition to be returned to the user instead of sleeping
  again in the kernel, POLLERR must also be requested. Since
  Linux 4.16, waking up requires requesting only POLLERR.

I have been rewriting this multiple times in the past two
weeks, and I still think it is not clear/simple enough.

What do you think? Please share your understanding and
your ideas.

Internal details
================

The commit 6e5d58fdc9bedd0255a8 ("skbuff: Fix not waking
applications when errors are enqueued"), introduced in Linux
4.16, changed the function that sock_queue_err_skb() calls to
trigger the wake up. The function sk_data_ready()
(sock_def_readable()), which wakes up the task if POLLIN or
POLLPRI is requested, was replaced by sk_error_report()
(sock_def_error_report()), which wakes up the task only if
POLLERR is requested.

With the option (SO_SELECT_ERR_QUEUE) set, requesting only
exceptfds (POLLPRI) does not intersect the trigger events
anymore, so the task is not woken. However, if POLLERR is
triggered __before__ calling select(), select() __will__
return because availability of events is checked before
sleeping.

In select(), POLLPRI is always requested [1]. POLLERR is
requested by readfds and writefds [2]. POLLIN and POLLHUP
by readfds [2]. POLLOUT by writefds [2].

In poll(), user freely requests events, but POLLERR and
POLLHUP are always requested [3].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/
    linux.git/tree/fs/select.c?id=6e5d58fdc9bedd0255a8#n443

[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/
    linux.git/tree/fs/select.c?id=6e5d58fdc9bedd0255a8#n435

[3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/
    linux.git/tree/fs/select.c?id=6e5d58fdc9bedd0255a8#n820

	pasquali
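
For readers who want to try the option discussed above, here is a minimal
sketch. It is not taken from the poll_select_test repository; the helper
names `udp_socket_with_err_queue`, `err_queue_enabled`, and
`poll_exceptional` are hypothetical, and the fallback value 45 for
SO_SELECT_ERR_QUEUE is the asm-generic one, assumed only for headers that
lack the definition. It sets the option on a UDP socket, reads it back,
and polls while requesting only POLLPRI.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_SELECT_ERR_QUEUE
#define SO_SELECT_ERR_QUEUE 45	/* asm-generic value; assumed for old headers */
#endif

/* Create a UDP socket with SO_SELECT_ERR_QUEUE set; returns fd or -1. */
int udp_socket_with_err_queue(void)
{
	int one = 1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_SELECT_ERR_QUEUE,
		       &one, sizeof(one)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Read the option back: 1 if set, 0 if clear, -1 on failure. */
int err_queue_enabled(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_SELECT_ERR_QUEUE, &val, &len) < 0)
		return -1;
	return val;
}

/* Request only POLLPRI; poll(2) reports POLLERR and POLLHUP regardless.
 * Returns the revents mask, or -1 if poll(2) itself fails. */
int poll_exceptional(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLPRI };

	if (poll(&pfd, 1, timeout_ms) < 0)
		return -1;
	return pfd.revents;
}
```

On an idle socket, poll_exceptional(fd, 0) returns 0. Per the description
above, once an error is queued the returned mask would be expected to
contain POLLERR together with POLLPRI when the option is set.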

^ permalink raw reply

* Re: [PATCH net] tunnel: fix dev null pointer dereference when send pkg larger than mtu in collect_md mode
From: Hangbin Liu @ 2019-08-16  3:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Stefano Brivio, wenxu, Alexei Starovoitov,
	David S . Miller
In-Reply-To: <cb5b5d82-1239-34a9-23f5-1894a2ec92a2@gmail.com>

Hi Eric,

Thanks for the review.
On Thu, Aug 15, 2019 at 11:16:58AM +0200, Eric Dumazet wrote:
> > diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> > index 38c02bb62e2c..c6713c7287df 100644
> > --- a/net/ipv4/ip_tunnel.c
> > +++ b/net/ipv4/ip_tunnel.c
> > @@ -597,6 +597,9 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
> >  		goto tx_error;
> >  	}
> >  
> > +	if (skb_dst(skb) && !skb_dst(skb)->dev)
> > +		skb_dst(skb)->dev = rt->dst.dev;
> > +
> 
> 
> IMO this looks wrong.
> This dst seems shared. 

If the dst is shared, this may cause problems. Could you point me to where
the dst might be shared?

> Once set, we will reuse the same dev ?

If yes, how about just setting the skb dst to rt->dst, as
iptunnel_xmit would do later?

skb_dst_drop(skb);
skb_dst_set(skb, &rt->dst);

or do you have any other idea?
> 
> If intended, why not doing this in __metadata_dst_init() instead of in the fast path ?

I'm afraid we can't do this: I didn't find a way to init dev in
__metadata_dst_init(). Do you see one?

Thanks
Hangbin

^ permalink raw reply

* RE: [PATCH net-next] r8152: divide the tx and rx bottom functions
From: Hayes Wang @ 2019-08-16  2:59 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, nic_swsd, linux-kernel@vger.kernel.org
In-Reply-To: <20190815.135851.1942927063321516679.davem@davemloft.net>

David Miller [mailto:davem@davemloft.net]
> Sent: Friday, August 16, 2019 4:59 AM
[...]
> Theoretically, yes.
> 
> But do you have actual performance numbers showing this to be worth
> the change?
> 
> Always provide performance numbers with changes that are supposed to
> improve performance.

On x86, they are almost the same.
Tx/Rx: 943/943 Mbits/sec -> 945/944

On the ARM platform,
Tx/Rx: 917/917 Mbits/sec -> 933/933
An improvement of about 1.74% (933 / 917 ≈ 1.0174).

Best Regards,
Hayes



^ permalink raw reply

* linux-next: manual merge of the net-next tree with the kbuild tree
From: Stephen Rothwell @ 2019-08-16  2:41 UTC (permalink / raw)
  To: David Miller, Networking, Masahiro Yamada
  Cc: Linux Next Mailing List, Linux Kernel Mailing List, Kees Cook,
	Andrii Nakryiko, Daniel Borkmann

[-- Attachment #1: Type: text/plain, Size: 2690 bytes --]

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  scripts/link-vmlinux.sh

between commit:

  e167191e4a8a ("kbuild: Parameterize kallsyms generation and correct reporting")

from the kbuild tree and commits:

  341dfcf8d78e ("btf: expose BTF info through sysfs")
  7fd785685e22 ("btf: rename /sys/kernel/btf/kernel into /sys/kernel/btf/vmlinux")

from the net-next tree.

I fixed it up (I think - see below) and can carry the fix as necessary.
This is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc scripts/link-vmlinux.sh
index 2438a9faf3f1,c31193340108..000000000000
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@@ -56,11 -56,10 +56,11 @@@ modpost_link(
  }
  
  # Link of vmlinux
- # ${1} - optional extra .o files
- # ${2} - output file
+ # ${1} - output file
+ # ${@:2} - optional extra .o files
  vmlinux_link()
  {
 +	info LD ${2}
  	local lds="${objtree}/${KBUILD_LDS}"
  	local objects
  
@@@ -139,18 -149,6 +150,18 @@@ kallsyms(
  	${CC} ${aflags} -c -o ${2} ${afile}
  }
  
 +# Perform one step in kallsyms generation, including temporary linking of
 +# vmlinux.
 +kallsyms_step()
 +{
 +	kallsymso_prev=${kallsymso}
 +	kallsymso=.tmp_kallsyms${1}.o
 +	kallsyms_vmlinux=.tmp_vmlinux${1}
 +
- 	vmlinux_link "${kallsymso_prev}" ${kallsyms_vmlinux}
++	vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" ${btf_vmlinux_bin_o}
 +	kallsyms ${kallsyms_vmlinux} ${kallsymso}
 +}
 +
  # Create map file with all symbols from ${1}
  # See mksymap for additional details
  mksysmap()
@@@ -228,8 -227,14 +240,15 @@@ ${MAKE} -f "${srctree}/scripts/Makefile
  info MODINFO modules.builtin.modinfo
  ${OBJCOPY} -j .modinfo -O binary vmlinux.o modules.builtin.modinfo
  
+ btf_vmlinux_bin_o=""
+ if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
+ 	if gen_btf .tmp_vmlinux.btf .btf.vmlinux.bin.o ; then
+ 		btf_vmlinux_bin_o=.btf.vmlinux.bin.o
+ 	fi
+ fi
+ 
  kallsymso=""
 +kallsymso_prev=""
  kallsyms_vmlinux=""
  if [ -n "${CONFIG_KALLSYMS}" ]; then
  
@@@ -268,11 -285,8 +287,7 @@@
  	fi
  fi
  
- vmlinux_link "${kallsymso}" vmlinux
- 
- if [ -n "${CONFIG_DEBUG_INFO_BTF}" ]; then
- 	gen_btf vmlinux
- fi
 -info LD vmlinux
+ vmlinux_link vmlinux "${kallsymso}" "${btf_vmlinux_bin_o}"
  
  if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
  	info SORTEX vmlinux

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH -next v2] btf: fix return value check in btf_vmlinux_init()
From: Wei Yongjun @ 2019-08-16  2:40 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko
  Cc: Wei Yongjun, netdev, bpf, kernel-janitors
In-Reply-To: <20190815142432.101401-1-weiyongjun1@huawei.com>

In case of error, the function kobject_create_and_add() returns a NULL
pointer, not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with a NULL test.

Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
---
 kernel/bpf/sysfs_btf.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/sysfs_btf.c b/kernel/bpf/sysfs_btf.c
index 4659349fc795..7ae5dddd1fe6 100644
--- a/kernel/bpf/sysfs_btf.c
+++ b/kernel/bpf/sysfs_btf.c
@@ -30,17 +30,12 @@ static struct kobject *btf_kobj;
 
 static int __init btf_vmlinux_init(void)
 {
-	int err;
-
 	if (!_binary__btf_vmlinux_bin_start)
 		return 0;
 
 	btf_kobj = kobject_create_and_add("btf", kernel_kobj);
-	if (IS_ERR(btf_kobj)) {
-		err = PTR_ERR(btf_kobj);
-		btf_kobj = NULL;
-		return err;
-	}
+	if (!btf_kobj)
+		return -ENOMEM;
 
 	bin_attr_btf_vmlinux.size = _binary__btf_vmlinux_bin_end -
 				    _binary__btf_vmlinux_bin_start;
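
To illustrate why the old check could never fire, here is a minimal userspace sketch. The helpers is_err() and err_ptr() are stand-ins modelled on the kernel's IS_ERR() and ERR_PTR(), and create_or_null() is a hypothetical allocator that fails the way kobject_create_and_add() does; none of this is the kernel code itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modelled on the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

/* True only for pointers in the top MAX_ERRNO bytes of the address space. */
bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Encode a negative errno value as a pointer, as ERR_PTR() does. */
void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Hypothetical allocator that signals failure with NULL, not an error pointer. */
void *create_or_null(bool fail)
{
	static int object;

	return fail ? NULL : &object;
}
```

Because NULL lies outside the error-pointer range, is_err(create_or_null(true)) is false, so a failure would slip past an IS_ERR()-style check. A plain NULL test catches it.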




^ permalink raw reply related

* Re: [PATCH v5] perf machine: arm/arm64: Improve completeness for kernel address space
From: Leo Yan @ 2019-08-16  1:45 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, linux-kernel, netdev,
	bpf, clang-built-linux, Mathieu Poirier, Peter Zijlstra,
	Suzuki Poulouse, coresight, linux-arm-kernel
In-Reply-To: <e0919e39-7607-815b-3a12-96f098e45a5f@intel.com>

Hi Adrian,

On Thu, Aug 15, 2019 at 02:45:57PM +0300, Adrian Hunter wrote:

[...]

> >> How come you cannot use kallsyms to get the information?
> > 
> > Thanks for pointing this out.  Sorry I skipped your comment "I don't
> > know how you intend to calculate ARM_PRE_START_SIZE" when you reviewed
> > patch v3; I should have used that chance to explain the idea in detail
> > and get more feedback/guidance before proceeding.
> > 
> > Actually, I considered using kallsyms when working on the previous
> > patch set.
> > 
> > As mentioned in patch set v4's cover letter, I tried to implement
> > machine__create_extra_kernel_maps() for arm/arm64; the purpose is to
> > parse kallsyms to find more kernel maps and thus also fix up
> > the kernel start address.  But I found that the 'perf script' tool calls
> > machine__get_kernel_start() directly instead of going through
> > machine__create_extra_kernel_maps();
> 
> Doesn't it just need to loop through each kernel map to find the lowest
> start address?

Based on your suggestion, I worked out the change below and verified that
it fixes up the start address correctly on arm64; please let me know
whether it works for you.

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index f6ee7fbad3e4..51d78313dca1 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2671,9 +2671,26 @@ int machine__nr_cpus_avail(struct machine *machine)
 	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
 }
 
+static int machine__fixup_kernel_start(void *arg,
+				       const char *name __maybe_unused,
+				       char type,
+				       u64 start)
+{
+	struct machine *machine = arg;
+
+	type = toupper(type);
+
+	/* Fixup for text, weak, data and bss sections. */
+	if (type == 'T' || type == 'W' || type == 'D' || type == 'B')
+		machine->kernel_start = min(machine->kernel_start, start);
+
+	return 0;
+}
+
 int machine__get_kernel_start(struct machine *machine)
 {
 	struct map *map = machine__kernel_map(machine);
+	char filename[PATH_MAX];
 	int err = 0;
 
 	/*
@@ -2687,6 +2704,7 @@ int machine__get_kernel_start(struct machine *machine)
 	machine->kernel_start = 1ULL << 63;
 	if (map) {
 		err = map__load(map);
 		/*
 		 * On x86_64, PTI entry trampolines are less than the
 		 * start of kernel text, but still above 2^63. So leave
@@ -2695,6 +2713,16 @@ int machine__get_kernel_start(struct machine *machine)
 		if (!err && !machine__is(machine, "x86_64"))
 			machine->kernel_start = map->start;
 	}
+
+	machine__get_kallsyms_filename(machine, filename, PATH_MAX);
+
+	if (symbol__restricted_filename(filename, "/proc/kallsyms"))
+		goto out;
+
+	if (kallsyms__parse(filename, machine, machine__fixup_kernel_start))
+		pr_warning("Fail to fixup kernel start address. skipping...\n");
+
+out:
 	return err;
 }

Thanks,
Leo Yan
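
The callback in the patch above lowers the kernel start address to the smallest text, weak, data or bss symbol it sees. As a standalone sketch of that idea, the function below scans an in-memory buffer in kallsyms format instead of going through perf's kallsyms__parse(). The name lowest_kernel_start and the buffer-based interface are assumptions made for illustration.

```c
#include <ctype.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the lowest address among text, weak, data and bss symbols in a
 * kallsyms-style buffer ("<hex addr> <type> <name>" per line), or
 * `fallback` when no such symbol lies below it.
 */
uint64_t lowest_kernel_start(const char *kallsyms, uint64_t fallback)
{
	uint64_t lowest = fallback;
	const char *line = kallsyms;

	while (line && *line) {
		uint64_t addr;
		char type;

		if (sscanf(line, "%" SCNx64 " %c", &addr, &type) == 2) {
			/* Treat local and global symbols alike, as the patch does. */
			type = (char)toupper((unsigned char)type);
			if ((type == 'T' || type == 'W' ||
			     type == 'D' || type == 'B') && addr < lowest)
				lowest = addr;
		}

		/* Advance to the character after the next newline, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}

	return lowest;
}
```

Here `fallback` plays the role of the map start address that machine__get_kernel_start() assigns before the fixup runs. Symbols of other types, such as absolute symbols, are ignored even if their address is lower.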

^ permalink raw reply related

* Re: WARNING in is_bpf_text_address
From: Bart Van Assche @ 2019-08-16  1:39 UTC (permalink / raw)
  To: Will Deacon, syzbot
  Cc: akpm, ast, bpf, daniel, davem, dvyukov, hawk, hdanton,
	jakub.kicinski, johannes.berg, johannes, john.fastabend, kafai,
	linux-kernel, longman, mingo, netdev, paulmck, peterz,
	songliubraving, syzkaller-bugs, tglx, tj, torvalds, will.deacon,
	xdp-newbies, yhs
In-Reply-To: <20190815075142.vuza32plqtiuhixx@willie-the-truck>

On 8/15/19 12:51 AM, Will Deacon wrote:
> Hi Bart,
> 
> On Sat, Aug 10, 2019 at 05:24:06PM -0700, syzbot wrote:
>> syzbot has found a reproducer for the following crash on:
>>
>> HEAD commit:    451577f3 Merge tag 'kbuild-fixes-v5.3-3' of git://git.kern..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=120850a6600000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=2031e7d221391b8a
>> dashboard link: https://syzkaller.appspot.com/bug?extid=bd3bba6ff3fcea7a6ec6
>> compiler:       clang version 9.0.0 (/home/glider/llvm/clang
>> 80fee25776c2fb61e74c1ecb1a523375c2500b69)
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=130ffe4a600000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17137d2c600000
>>
>> The bug was bisected to:
>>
>> commit a0b0fd53e1e67639b303b15939b9c653dbe7a8c4
>> Author: Bart Van Assche <bvanassche@acm.org>
>> Date:   Thu Feb 14 23:00:46 2019 +0000
>>
>>      locking/lockdep: Free lock classes that are no longer in use
>>
>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=152f6a9da00000
>> final crash:    https://syzkaller.appspot.com/x/report.txt?x=172f6a9da00000
>> console output: https://syzkaller.appspot.com/x/log.txt?x=132f6a9da00000
> 
> I know you don't think much to these reports, but please could you have a
> look (even if it's just to declare it a false positive)?

Hi Will,

Had you already noticed the following message?

https://lore.kernel.org/bpf/d76d7a63-7854-e92d-30cb-52546d333ffe@iogearbox.net/

From that message: "Hey Bart, don't think it's related in any way to
your commit. I'll allocate some time on working on this issue today, 
thanks!"

Bart.

^ permalink raw reply

* Re: [RFC PATCH bpf-next 00/14] xdp_flow: Flow offload to XDP
From: Toshiaki Makita @ 2019-08-16  1:38 UTC (permalink / raw)
  To: William Tu
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David S. Miller, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Jamal Hadi Salim,
	Cong Wang, Jiri Pirko, Linux Kernel Network Developers, bpf
In-Reply-To: <CALDO+SYC4sPw-7iDkFMCD=kf2UnTW2qc0m6Kgz41zLmNNxQ+Ww@mail.gmail.com>

On 2019/08/16 0:46, William Tu wrote:
> On Tue, Aug 13, 2019 at 5:07 AM Toshiaki Makita
> <toshiaki.makita1@gmail.com> wrote:
>>
>> This is a rough PoC for an idea to offload TC flower to XDP.
>>
>>
>> * Motivation
>>
>> The purpose is to speed up software TC flower by using XDP.
>>
>> I chose TC flower because my current interest is in OVS. OVS uses TC to
>> offload flow tables to hardware, so if TC can offload flows to XDP, OVS
>> also can be offloaded to XDP.
>>
>> When TC flower filter is offloaded to XDP, the received packets are
>> handled by XDP first, and if their protocol or something is not
>> supported by the eBPF program, the program returns XDP_PASS and packets
>> are passed to upper layer TC.
>>
>> The packet processing flow will be like this when this mechanism,
>> xdp_flow, is used with OVS.
>>
>>   +-------------+
>>   | openvswitch |
>>   |    kmod     |
>>   +-------------+
>>          ^
>>          | if not match in filters (flow key or action not supported by TC)
>>   +-------------+
>>   |  TC flower  |
>>   +-------------+
>>          ^
>>          | if not match in flow tables (flow key or action not supported by XDP)
>>   +-------------+
>>   |  XDP prog   |
>>   +-------------+
>>          ^
>>          | incoming packets
>>
> I like this idea, some comments about the OVS AF_XDP work.
> 
> Another way when using OVS AF_XDP is to serve as slow path of TC flow
> HW offload.
> For example:
> 
>   Userspace OVS datapath (the one used by OVS-DPDK)
>          ^
>          |
>   +-------------------+
>   | OVS AF_XDP netdev |
>   +-------------------+
>          ^
>          | if not supported or not match in flow tables
>   +-------------------+
>   |   TC HW flower    |
>   +-------------------+
>          ^
>          | incoming packets
> 
> So in this case it's either TC HW flower offload, or the userspace PMD OVS.
> Both cases should be pretty fast.
> 
> I think xdp_flow can also be used by OVS AF_XDP netdev, sitting between
> TC HW flower and OVS AF_XDP netdev.
> Before the XDP program sending packet to AF_XDP socket, the
> xdp_flow can execute first, and if not match, then send to AF_XDP.
> So in your patch set, implement something like
>    bpf_redirect_map(&xsks_map, index, 0);

Thanks, the concept sounds good, but this is probably difficult as long as
this is a TC offload, which emulates TC.
If I changed direction and implemented the offload in ovs-vswitchd, it would
be possible. I'll keep this optimization in mind.

> Another thing is that at each layer we are doing its own packet parsing.
> From your graph, first parse at XDP program, then at TC flow, then at
> openvswitch kmod.
> I wonder if we can reuse some parsing result.

That would be nice if possible...
Currently I don't have any ideas on how to do that. Someday XDP may support more
metadata for this or for HW offloads like checksum. Then we could store the information
and upper layers might be able to use it.

Toshiaki Makita
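
The fallback chain discussed in this thread (XDP program first, then TC flower, then the openvswitch kmod) can be pictured as an ordered list of matchers where a miss passes the packet to the next layer. The sketch below is purely illustrative: the types, function names and EtherType-based match rules are all hypothetical, and the real xdp_flow matching is driven by flow tables, not by these checks.

```c
#include <stdbool.h>

/* Hypothetical layers, ordered from fastest to slowest. */
enum layer { LAYER_XDP, LAYER_TC_FLOWER, LAYER_OVS_KMOD, LAYER_COUNT };

/* Hypothetical packet summary: only the EtherType is modelled here. */
struct pkt {
	unsigned int ethertype;
};

/* Each layer reports whether it has a matching flow for the packet. */
typedef bool (*match_fn)(const struct pkt *);

/* Assumed for illustration: the XDP layer only handles IPv4. */
bool xdp_match(const struct pkt *p)
{
	return p->ethertype == 0x0800;
}

/* Assumed for illustration: TC flower handles IPv4 and IPv6. */
bool tc_match(const struct pkt *p)
{
	return p->ethertype == 0x0800 || p->ethertype == 0x86dd;
}

/* The last layer accepts everything that reaches it. */
bool ovs_match(const struct pkt *p)
{
	(void)p;
	return true;
}

/* Return the first layer that handles the packet; a miss falls through. */
enum layer classify(const struct pkt *p)
{
	static const match_fn layers[LAYER_COUNT] = {
		[LAYER_XDP] = xdp_match,
		[LAYER_TC_FLOWER] = tc_match,
		[LAYER_OVS_KMOD] = ovs_match,
	};
	int i;

	for (i = 0; i < LAYER_COUNT; i++)
		if (layers[i](p))
			return (enum layer)i;

	return LAYER_OVS_KMOD;
}
```

In this picture, a miss at the XDP layer corresponds to the program returning XDP_PASS so that the packet continues to TC, as described in the quoted cover letter.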

^ permalink raw reply
