Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH bpf-next] bpf: Extend the sk_lookup() helper to XDP hookpoint.
From: Nitin Hande @ 2018-10-18 23:32 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Joe Stringer, netdev, ast, Jesper Brouer, john fastabend
In-Reply-To: <f13124bf-bb4f-b681-d014-299993305262@iogearbox.net>

On Thu, 18 Oct 2018 23:20:17 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 10/18/2018 11:06 PM, Joe Stringer wrote:
> > On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.hande@gmail.com> wrote:  
> [...]
> >> Open Issue
> >> * The underlying code relies on presence of an skb to find out the
> >> right sk for the case of REUSEPORT socket option. Since there is
> >> no skb available at XDP hookpoint, the helper function will return
> >> the first available sk based off the 5 tuple hash. If the desire
> >> is to return a particular sk matching reuseport_cb function, please
> >> suggest way to tackle it, which can be addressed in a future commit.  
> 
> >> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>  
> > 
> > Thanks Nitin, LGTM overall.
> > 
> > The REUSEPORT thing suggests that the usage of this helper from XDP
> > layer may lead to a different socket being selected vs. the equivalent
> > call at TC hook, or other places where the selection may occur. This
> > could be a bit counter-intuitive.
> > 
> > One thought I had to work around this was to introduce a flag,
> > something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
> > effectively communicate in the API that the bpf_sk_lookup_xxx()
> > functions will only select a REUSEPORT socket based on the hash and
> > not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence
> > of the flag would support finding REUSEPORT sockets by other
> > mechanisms (which would be allowed for now from TC hooks but would be
> > disallowed from XDP, since there's no specific plan to support this). 

Thanks Joe for the quick response.This certainly looks feasible. With the
flag, both tc and XDP hookpoints will be consistent in their approach.
 
> 
> Hmm, given skb is NULL here the only way to lookup the socket in such
> scenario is based on hash, that is, inet_ehashfn() / inet6_ehashfn(),
> perhaps alternative is to pass this hash in from XDP itself to the
> helper so it could be custom selector.

Interesting, and this will be an additional helper or done within this
sk_lookup() helper ?

 Do you have a specific use case
> on this for XDP (just curious)?

Yes, this is for a dual-stack solution. The XDP program functions as a
demux, if there is a receiver the packet will be ingressed on Linux
networking stack, else it enters the other stack path.

Thanks
Nitin

> 
> Thanks,
> Daniel

^ permalink raw reply

* Re: [PATCH net] net: ipmr: fix unresolved entry dumps
From: David Miller @ 2018-10-18 23:46 UTC (permalink / raw)
  To: dsahern; +Cc: nikolay, netdev
In-Reply-To: <ec18010b-cd8a-f419-a929-c0b1a0a09928@gmail.com>

From: David Ahern <dsahern@gmail.com>
Date: Thu, 18 Oct 2018 09:16:17 -0600

> On 10/17/18 11:36 PM, David Miller wrote:
>> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> Date: Wed, 17 Oct 2018 22:34:34 +0300
>> 
>>> If the skb space ends in an unresolved entry while dumping we'll miss
>>> some unresolved entries. The reason is due to zeroing the entry counter
>>> between dumping resolved and unresolved mfc entries. We should just
>>> keep counting until the whole table is dumped and zero when we move to
>>> the next as we have a separate table counter.
>>>
>>> Reported-by: Colin Ian King <colin.king@canonical.com>
>>> Fixes: 8fb472c09b9d ("ipmr: improve hash scalability")
>>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> 
>> Applied and queued up for -stable.
>> 
> 
> FYI: on the net to net-next merge, those 2 lines were moved to
> mr_table_dump.

Thanks for the heads up.

^ permalink raw reply

* Re: [net 1/1] tipc: fix info leak from kernel tipc_event
From: David Miller @ 2018-10-18 23:51 UTC (permalink / raw)
  To: jon.maloy
  Cc: netdev, gordan.mihaljevic, tung.q.nguyen, hoang.h.le, canh.d.luu,
	ying.xue, tipc-discussion
In-Reply-To: <1539877109-7722-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Thu, 18 Oct 2018 17:38:29 +0200

> We initialize a struct tipc_event allocated on the kernel stack to
> zero to avert info leak to user space.
> 
> Reported-by: syzbot+057458894bc8cada4dee@syzkaller.appspotmail.com
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net-next] tcp: fix TCP_REPAIR xmit queue setup
From: David Miller @ 2018-10-18 23:51 UTC (permalink / raw)
  To: edumazet; +Cc: ncardwell, soheil, avagin, netdev, eric.dumazet
In-Reply-To: <20181018161219.127534-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Thu, 18 Oct 2018 09:12:19 -0700

> Andrey reported the following warning triggered while running CRIU tests:
> 
> tcp_clean_rtx_queue()
> ...
> 	last_ackt = tcp_skb_timestamp_us(skb);
> 	WARN_ON_ONCE(last_ackt == 0);
> 
> This is caused by 5f6188a8003d ("tcp: do not change tcp_wstamp_ns
> in tcp_mstamp_refresh"), as we end up having skbs in retransmit queue
> with a zero skb->skb_mstamp_ns field.
> 
> We could fix this bug in different ways, like making sure
> tp->tcp_wstamp_ns is not zero at socket creation, but as Neal pointed
> out, we also do not want that pacing status of a repaired socket
> could push tp->tcp_wstamp_ns far ahead in the future.
> 
> So we prefer changing tcp_write_xmit() to not call tcp_update_skb_after_send()
> and instead do what is requested by TCP_REPAIR logic.
> 
> Fixes: 5f6188a8003d ("tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Andrey Vagin <avagin@openvz.org>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Neal Cardwell <ncardwell@google.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH bpf-next] bpf: Extend the sk_lookup() helper to XDP hookpoint.
From: Joe Stringer @ 2018-10-18 23:52 UTC (permalink / raw)
  To: daniel, Martin KaFai Lau
  Cc: Joe Stringer, Nitin Hande, netdev, ast, Jesper Brouer,
	john fastabend
In-Reply-To: <f13124bf-bb4f-b681-d014-299993305262@iogearbox.net>

On Thu, 18 Oct 2018 at 14:20, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 10/18/2018 11:06 PM, Joe Stringer wrote:
> > On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.hande@gmail.com> wrote:
> [...]
> >> Open Issue
> >> * The underlying code relies on presence of an skb to find out the
> >> right sk for the case of REUSEPORT socket option. Since there is
> >> no skb available at XDP hookpoint, the helper function will return
> >> the first available sk based off the 5 tuple hash. If the desire
> >> is to return a particular sk matching reuseport_cb function, please
> >> suggest way to tackle it, which can be addressed in a future commit.
>
> >> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
> >
> > Thanks Nitin, LGTM overall.
> >
> > The REUSEPORT thing suggests that the usage of this helper from XDP
> > layer may lead to a different socket being selected vs. the equivalent
> > call at TC hook, or other places where the selection may occur. This
> > could be a bit counter-intuitive.
> >
> > One thought I had to work around this was to introduce a flag,
> > something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
> > effectively communicate in the API that the bpf_sk_lookup_xxx()
> > functions will only select a REUSEPORT socket based on the hash and
> > not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence
> > of the flag would support finding REUSEPORT sockets by other
> > mechanisms (which would be allowed for now from TC hooks but would be
> > disallowed from XDP, since there's no specific plan to support this).
>
> Hmm, given skb is NULL here the only way to lookup the socket in such
> scenario is based on hash, that is, inet_ehashfn() / inet6_ehashfn(),
> perhaps alternative is to pass this hash in from XDP itself to the
> helper so it could be custom selector. Do you have a specific use case
> on this for XDP (just curious)?

I don't have a use case for SO_REUSEPORT introspection from XDP, so
I'm primarily thinking from the perspective of making the behaviour
clear in the API in a way that leaves open the possibility for a
reasonable implementation in future. From that perspective, my main
concern is that it may surprise some BPF writers that the same
"bpf_sk_lookup_tcp()" call (with identical parameters) may have
different behaviour at TC vs. XDP layers, as the BPF selection of
sockets is respected at TC but not at XDP.

FWIW we're already out of parameters for the actual call, so if we
wanted to allow passing a hash in, we'd need to either dedicate half
the 'flags' field for this configurable hash, or consider adding the
new hash parameter to 'struct bpf_sock_tuple'.

+Martin for any thoughts on SO_REUSEPORT and XDP here.

^ permalink raw reply

* Re: [PATCH net] ip6_tunnel: Fix encapsulation layout
From: David Miller @ 2018-10-18 23:55 UTC (permalink / raw)
  To: sbrivio; +Cc: tom, kraig, pabeni, netdev
In-Reply-To: <f5f6a69915694976b6feec7ae64a89e1da4e551f.1539890448.git.sbrivio@redhat.com>

From: Stefano Brivio <sbrivio@redhat.com>
Date: Thu, 18 Oct 2018 21:25:07 +0200

> Commit 058214a4d1df ("ip6_tun: Add infrastructure for doing
> encapsulation") added the ip6_tnl_encap() call in ip6_tnl_xmit(), before
> the call to ipv6_push_frag_opts() to append the IPv6 Tunnel Encapsulation
> Limit option (option 4, RFC 2473, par. 5.1) to the outer IPv6 header.
> 
> As long as the option didn't actually end up in generated packets, this
> wasn't an issue. Then commit 89a23c8b528b ("ip6_tunnel: Fix missing tunnel
> encapsulation limit option") fixed sending of this option, and the
> resulting layout, e.g. for FoU, is:
> 
> .-------------------.------------.----------.-------------------.----- - -
> | Outer IPv6 Header | UDP header | Option 4 | Inner IPv6 Header | Payload
> '-------------------'------------'----------'-------------------'----- - -
> 
> Needless to say, FoU and GUE (at least) won't work over IPv6. The option
> is appended by default, and I couldn't find a way to disable it with the
> current iproute2.
> 
> Turn this into a more reasonable:
> 
> .-------------------.----------.------------.-------------------.----- - -
> | Outer IPv6 Header | Option 4 | UDP header | Inner IPv6 Header | Payload
> '-------------------'----------'------------'-------------------'----- - -
> 
> With this, and with 84dad55951b0 ("udp6: fix encap return code for
> resubmitting"), FoU and GUE work again over IPv6.
> 
> Fixes: 058214a4d1df ("ip6_tun: Add infrastructure for doing encapsulation")
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

This goes back to v4.7 then, I can't believe this has been broken for
so long. :-/

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [pull request][net-next 0/8] Mellanox, mlx5 updates 2018-10-18
From: David Miller @ 2018-10-19  0:02 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <20181018203907.25149-1-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Thu, 18 Oct 2018 13:38:59 -0700

> This pull request provides some misc updates to mlx5 core and netdevice
> driver.
> 
> For more details please see tag log below.
> 
> Please pull and let me know if there's any problem.

Pulled, thanks Saeed.

^ permalink raw reply

* Re: [GIT] Networking
From: Greg KH @ 2018-10-19  8:13 UTC (permalink / raw)
  To: David Miller; +Cc: akpm, netdev, linux-kernel
In-Reply-To: <20181018.171914.1210096727703399564.davem@davemloft.net>

On Thu, Oct 18, 2018 at 05:19:14PM -0700, David Miller wrote:
> 
> 1) Fix gro_cells leak in xfrm layer, from Li RongQing.
> 
> 2) BPF selftests change RLIMIT_MEMLOCK blindly, don't do that.
>    From Eric Dumazet.
> 
> 3) AF_XDP calls synchronize_net() under RCU lock, fix from Björn
>    Töpel.
> 
> 4) Out of bounds packet access in _decode_session6(), from Alexei
>    Starovoitov.
> 
> 5) Several ethtool bugs, where we copy a struct into the kernel
>    twice and our validations of the values in the first copy can
>    be invalidated by the second copy due to asynchronous updates
>    to the memory by the user.  From Wenwen Wang.
> 
> 6) Missing netlink attribute validation in cls_api, from Davide
>    Caratti.
> 
> 7) LLC SAP sockets neet to be SOCK_RCU FREE, from Cong Wang.
> 
> 8) rxrpc operates on wrong kvec, from Yue Haibing.
> 
> 9) A regression was introduced by the disassosciation of route
>    neighbour references in rt6_probe(), causing probe for
>    neighbourless routes to not be properly rate limited.  Fix
>    from Sabrina Dubroca.
> 
> 10) Unsafe RCU locking in tipc, from Tung Nguyen.
> 
> 11) Use after free in inet6_mc_check(), from Eric Dumazet.
> 
> 12) PMTU from icmp packets should update the SCTP transport
>     pathmtu, from Xin Long.
> 
> 13) Missing peer put on error in rxrpc, from David Howells.
> 
> 14) Fix pedit in nfp driver, from Pieter Jansen van Vuuren.
> 
> 15) Fix overflowing shift statement in qla3xxx driver, from Nathan
>     Chancellor.
> 
> 16) Fix Spectre v1 in ptp code, from Gustavo A. R. Silva.
> 
> 17) udp6_unicast_rcv_skb() interprets udpv6_queue_rcv_skb() return
>     value in an inverted manner, fix from Paolo Abeni.
> 
> 18) Fix missed unresolved entries in ipmr dumps, from Nikolay
>     Aleksandrov.
> 
> 19) Fix NAPI handling under high load, we can completely miss events
>     when NAPI has to loop more than one time in a cycle.  From Heiner
>     Kallweit.
> 
> Please pull, thanks a lot!
> 
> The following changes since commit bab5c80b211035739997ebd361a679fa85b39465:
> 
>   Merge tag 'armsoc-fixes-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc (2018-10-12 17:41:27 +0200)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

Now merged, thanks.

greg k-h

^ permalink raw reply

* Re: [PATCH v4 bpf-next 1/2] bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
From: Daniel Borkmann @ 2018-10-19  0:14 UTC (permalink / raw)
  To: Song Liu, netdev; +Cc: ast, kernel-team
In-Reply-To: <20181018160649.1611530-2-songliubraving@fb.com>

On 10/18/2018 06:06 PM, Song Liu wrote:
> BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
> skb. This patch enables direct access of skb for these programs.
> 
> Two helper functions bpf_compute_and_save_data_pointers() and
> bpf_restore_data_pointers() are introduced. There are used in
> __cgroup_bpf_run_filter_skb(), to compute proper data_end for the
> BPF program, and restore original data afterwards.
> 
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  include/linux/filter.h | 24 ++++++++++++++++++++++++
>  kernel/bpf/cgroup.c    |  6 ++++++
>  net/core/filter.c      | 36 +++++++++++++++++++++++++++++++++++-
>  3 files changed, 65 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 5771874bc01e..96b3ee7f14c9 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -548,6 +548,30 @@ static inline void bpf_compute_data_pointers(struct sk_buff *skb)
>  	cb->data_end  = skb->data + skb_headlen(skb);
>  }
>  
> +/* Similar to bpf_compute_data_pointers(), except that save orginal
> + * data in cb->data and cb->meta_data for restore.
> + */
> +static inline void bpf_compute_and_save_data_pointers(
> +	struct sk_buff *skb, void *saved_pointers[2])
> +{
> +	struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
> +
> +	saved_pointers[0] = cb->data_meta;
> +	saved_pointers[1] = cb->data_end;
> +	cb->data_meta = skb->data - skb_metadata_len(skb);
> +	cb->data_end  = skb->data + skb_headlen(skb);

Hmm, can you elaborate why populating data_meta here ...

> +}
> +
> +/* Restore data saved by bpf_compute_data_pointers(). */
> +static inline void bpf_restore_data_pointers(
> +	struct sk_buff *skb, void *saved_pointers[2])
> +{
> +	struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
> +
> +	cb->data_meta = saved_pointers[0];
> +	cb->data_end = saved_pointers[1];;
> +}
> +
>  static inline u8 *bpf_skb_cb(struct sk_buff *skb)
>  {
>  	/* eBPF programs may read/write skb->cb[] area to transfer meta
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 00f6ed2e4f9a..5f5180104ddc 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -554,6 +554,7 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
>  	unsigned int offset = skb->data - skb_network_header(skb);
>  	struct sock *save_sk;
>  	struct cgroup *cgrp;
> +	void *saved_pointers[2];
>  	int ret;
>  
>  	if (!sk || !sk_fullsock(sk))
> @@ -566,8 +567,13 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
>  	save_sk = skb->sk;
>  	skb->sk = sk;
>  	__skb_push(skb, offset);
> +
> +	/* compute pointers for the bpf prog */
> +	bpf_compute_and_save_data_pointers(skb, saved_pointers);
> +
>  	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
>  				 bpf_prog_run_save_cb);
> +	bpf_restore_data_pointers(skb, saved_pointers);
>  	__skb_pull(skb, offset);
>  	skb->sk = save_sk;
>  	return ret == 1 ? 0 : -EPERM;
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 1a3ac6c46873..e3ca30bd6840 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5346,6 +5346,40 @@ static bool sk_filter_is_valid_access(int off, int size,
>  	return bpf_skb_is_valid_access(off, size, type, prog, info);
>  }
>  
> +static bool cg_skb_is_valid_access(int off, int size,
> +				   enum bpf_access_type type,
> +				   const struct bpf_prog *prog,
> +				   struct bpf_insn_access_aux *info)
> +{
> +	switch (off) {
> +	case bpf_ctx_range(struct __sk_buff, tc_classid):
> +	case bpf_ctx_range(struct __sk_buff, data_meta):
> +	case bpf_ctx_range(struct __sk_buff, flow_keys):
> +		return false;

... if it's disallowed anyway (disallowing it is the right thing to do,
but no need to save/restore then..)?

> +	}
> +	if (type == BPF_WRITE) {
> +		switch (off) {
> +		case bpf_ctx_range(struct __sk_buff, mark):
> +		case bpf_ctx_range(struct __sk_buff, priority):
> +		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
> +			break;
> +		default:
> +			return false;
> +		}
> +	}
> +
> +	switch (off) {
> +	case bpf_ctx_range(struct __sk_buff, data):
> +		info->reg_type = PTR_TO_PACKET;
> +		break;
> +	case bpf_ctx_range(struct __sk_buff, data_end):
> +		info->reg_type = PTR_TO_PACKET_END;
> +		break;
> +	}
> +
> +	return bpf_skb_is_valid_access(off, size, type, prog, info);
> +}
> +
>  static bool lwt_is_valid_access(int off, int size,
>  				enum bpf_access_type type,
>  				const struct bpf_prog *prog,
> @@ -7038,7 +7072,7 @@ const struct bpf_prog_ops xdp_prog_ops = {
>  
>  const struct bpf_verifier_ops cg_skb_verifier_ops = {
>  	.get_func_proto		= cg_skb_func_proto,
> -	.is_valid_access	= sk_filter_is_valid_access,
> +	.is_valid_access	= cg_skb_is_valid_access,
>  	.convert_ctx_access	= bpf_convert_ctx_access,
>  };
>  
> 

^ permalink raw reply

* Re: [PATCH v2] isdn: hfc_{pci,sx}: Avoid empty body if statements
From: David Miller @ 2018-10-19  0:50 UTC (permalink / raw)
  To: natechancellor; +Cc: isdn, netdev, linux-kernel, yamada.masahiro
In-Reply-To: <20181019004251.GA28878@flashbox>

From: Nathan Chancellor <natechancellor@gmail.com>
Date: Thu, 18 Oct 2018 17:42:51 -0700

> On Thu, Oct 18, 2018 at 05:23:10PM -0700, David Miller wrote:
>> From: Nathan Chancellor <natechancellor@gmail.com>
>> Date: Thu, 18 Oct 2018 17:21:17 -0700
>> 
>> > Thanks for the review, I went ahead and compiled with the following diff
>> > on top of v2 and got no warnings from Clang, GCC, or sparse, does this
>> > seem satisfactory for v3?
>> 
>> Well, one thing I notice.
>> 
> 
>> > @@ -86,7 +86,7 @@ release_io_hfcpci(struct IsdnCardState *cs)
>> >         pci_free_consistent(cs->hw.hfcpci.dev, 0x8000,
>> >                             cs->hw.hfcpci.fifos, cs->hw.hfcpci.dma);
>> >         cs->hw.hfcpci.fifos = NULL;
>> > -       iounmap((void *)cs->hw.hfcpci.pci_io);
>> > +       iounmap(cs->hw.hfcpci.pci_io);
>> >  }
>> 
>> Driver uses iounmap().
>> 
>> > @@ -1692,7 +1692,7 @@ setup_hfcpci(struct IsdnCard *card)
>> >                 printk(KERN_WARNING "HFC-PCI: No IRQ for PCI card found\n");
>> >                 return (0);
>> >         }
>> > -       cs->hw.hfcpci.pci_io = (char *)(unsigned long)dev_hfcpci->resource[1].start;
>> > +       cs->hw.hfcpci.pci_io = (void __iomem *)(unsigned long)dev_hfcpci->resource[1].start;
>> >         printk(KERN_INFO "HiSax: HFC-PCI card manufacturer: %s card name: %s\n", id_list[i].vendor_name, id_list[i].card_name);
>> 
>> But does not use iomap().  You won't need any cast here if it did use
>> iomap() properly.
>> 
>> Thanks.
> 
> So this?

That's definitely a lot better, yes!

I wonder what exactly it is checking there though.  I guess it wants to see if the
resource->start value is zero and bail with an error it so.

Anyways, this code has been like this for ages and what you are proposing is
definitely a significant improvement.

Thanks.

^ permalink raw reply

* [PATCH 0/3] Bugfix for the netsec driver
From: masahisa.kojima @ 2018-10-19  1:08 UTC (permalink / raw)
  To: netdev
  Cc: ilias.apalodimas, jaswinder.singh, ard.biesheuvel,
	osaki.yoshitoyo, Masahisa Kojima

From: Masahisa Kojima <masahisa.kojima@linaro.org>

This patch series include bugfix for the netsec ethernet
controller driver, fix the problem in interface down/up.

Masahisa Kojima (3):
  net: socionext: stop PHY before resetting netsec
  net: socionext: Add dummy PHY register read in phy_write()
  net: socionext: reset tx queue in ndo_stop

 drivers/net/ethernet/socionext/netsec.c | 36 +++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

-- 
2.14.2

^ permalink raw reply

* [PATCH 1/3] net: socionext: Stop PHY before resetting netsec
From: masahisa.kojima @ 2018-10-19  1:08 UTC (permalink / raw)
  To: netdev
  Cc: ilias.apalodimas, jaswinder.singh, ard.biesheuvel,
	osaki.yoshitoyo, Masahisa Kojima
In-Reply-To: <20181019010843.3605-1-masahisa.kojima@linaro.org>

From: Masahisa Kojima <masahisa.kojima@linaro.org>

After resetting netsec IP, driver have to wait until
netsec mode turns to NRM mode.
But sometimes mode transition to NRM will not complete
if the PHY is in normal operation state.
To avoid this situation, stop PHY before resetting netsec.

Signed-off-by: Masahisa Kojima <masahisa.kojima@linaro.org>
Signed-off-by: Yoshitoyo Osaki <osaki.yoshitoyo@socionext.com>
---
 drivers/net/ethernet/socionext/netsec.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 4289ccb26e4e..273cc5fc07e0 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1343,11 +1343,11 @@ static int netsec_netdev_stop(struct net_device *ndev)
 	netsec_uninit_pkt_dring(priv, NETSEC_RING_TX);
 	netsec_uninit_pkt_dring(priv, NETSEC_RING_RX);
 
-	ret = netsec_reset_hardware(priv, false);
-
 	phy_stop(ndev->phydev);
 	phy_disconnect(ndev->phydev);
 
+	ret = netsec_reset_hardware(priv, false);
+
 	pm_runtime_put_sync(priv->dev);
 
 	return ret;
@@ -1415,7 +1415,7 @@ static const struct net_device_ops netsec_netdev_ops = {
 };
 
 static int netsec_of_probe(struct platform_device *pdev,
-			   struct netsec_priv *priv)
+			   struct netsec_priv *priv, u32 *phy_addr)
 {
 	priv->phy_np = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0);
 	if (!priv->phy_np) {
@@ -1423,6 +1423,8 @@ static int netsec_of_probe(struct platform_device *pdev,
 		return -EINVAL;
 	}
 
+	*phy_addr = of_mdio_parse_addr(&pdev->dev, priv->phy_np);
+
 	priv->clk = devm_clk_get(&pdev->dev, NULL); /* get by 'phy_ref_clk' */
 	if (IS_ERR(priv->clk)) {
 		dev_err(&pdev->dev, "phy_ref_clk not found\n");
@@ -1473,6 +1475,7 @@ static int netsec_register_mdio(struct netsec_priv *priv, u32 phy_addr)
 {
 	struct mii_bus *bus;
 	int ret;
+	u16 data;
 
 	bus = devm_mdiobus_alloc(priv->dev);
 	if (!bus)
@@ -1486,6 +1489,10 @@ static int netsec_register_mdio(struct netsec_priv *priv, u32 phy_addr)
 	bus->parent = priv->dev;
 	priv->mii_bus = bus;
 
+	/* set phy power down */
+	data = netsec_phy_read(bus, phy_addr, 0) | 0x800;
+	netsec_phy_write(bus, phy_addr, 0, data);
+
 	if (dev_of_node(priv->dev)) {
 		struct device_node *mdio_node, *parent = dev_of_node(priv->dev);
 
@@ -1623,7 +1630,7 @@ static int netsec_probe(struct platform_device *pdev)
 	}
 
 	if (dev_of_node(&pdev->dev))
-		ret = netsec_of_probe(pdev, priv);
+		ret = netsec_of_probe(pdev, priv, &phy_addr);
 	else
 		ret = netsec_acpi_probe(pdev, priv, &phy_addr);
 	if (ret)
-- 
2.14.2

^ permalink raw reply related

* [PATCH 2/3] net: socionext: Add dummy PHY register read in phy_write()
From: masahisa.kojima @ 2018-10-19  1:08 UTC (permalink / raw)
  To: netdev
  Cc: ilias.apalodimas, jaswinder.singh, ard.biesheuvel,
	osaki.yoshitoyo, Masahisa Kojima
In-Reply-To: <20181019010843.3605-1-masahisa.kojima@linaro.org>

From: Masahisa Kojima <masahisa.kojima@linaro.org>

There is a compatibility issue between RTL8211E implemented
in Developerbox and netsec network controller IP(F_GMAC4).

RTL8211E expects MDC clock must be kept toggling for several
clock cycle with MDIO high before entering the IDLE state.
To meet this requirement, netsec driver needs to issue dummy
read(e.g. read PHYID1(offset 0x2) register) right after write.

Signed-off-by: Masahisa Kojima <masahisa.kojima@linaro.org>
Signed-off-by: Yoshitoyo Osaki <osaki.yoshitoyo@socionext.com>
---
 drivers/net/ethernet/socionext/netsec.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 273cc5fc07e0..e7faaf8be99e 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -431,9 +431,12 @@ static int netsec_mac_update_to_phy_state(struct netsec_priv *priv)
 	return 0;
 }
 
+static int netsec_phy_read(struct mii_bus *bus, int phy_addr, int reg_addr);
+
 static int netsec_phy_write(struct mii_bus *bus,
 			    int phy_addr, int reg, u16 val)
 {
+	int status;
 	struct netsec_priv *priv = bus->priv;
 
 	if (netsec_mac_write(priv, GMAC_REG_GDR, val))
@@ -446,8 +449,19 @@ static int netsec_phy_write(struct mii_bus *bus,
 			      GMAC_REG_SHIFT_CR_GAR)))
 		return -ETIMEDOUT;
 
-	return netsec_mac_wait_while_busy(priv, GMAC_REG_GAR,
-					  NETSEC_GMAC_GAR_REG_GB);
+	status = netsec_mac_wait_while_busy(priv, GMAC_REG_GAR,
+					    NETSEC_GMAC_GAR_REG_GB);
+
+	/* Developerbox implements RTL8211E PHY and there is
+	 * a compatibility problem with F_GMAC4.
+	 * RTL8211E expects MDC clock must be kept toggling for several
+	 * clock cycle with MDIO high before entering the IDLE state.
+	 * To meet this requirement, netsec driver needs to issue dummy
+	 * read(e.g. read PHYID1(offset 0x2) register) right after write.
+	 */
+	netsec_phy_read(bus, phy_addr, 2);
+
+	return status;
 }
 
 static int netsec_phy_read(struct mii_bus *bus, int phy_addr, int reg_addr)
-- 
2.14.2

^ permalink raw reply related

* [PATCH 3/3] net: socionext: Reset tx queue in ndo_stop
From: masahisa.kojima @ 2018-10-19  1:08 UTC (permalink / raw)
  To: netdev
  Cc: ilias.apalodimas, jaswinder.singh, ard.biesheuvel,
	osaki.yoshitoyo, Masahisa Kojima
In-Reply-To: <20181019010843.3605-1-masahisa.kojima@linaro.org>

From: Masahisa Kojima <masahisa.kojima@linaro.org>

Without resetting tx queue in ndo_stop, packets and bytes count
are not reset when the interface is down.
Eventually, tx queue is exhausted and packets will not be
sent out.

Signed-off-by: Masahisa Kojima <masahisa.kojima@linaro.org>
Signed-off-by: Yoshitoyo Osaki <osaki.yoshitoyo@socionext.com>
---
 drivers/net/ethernet/socionext/netsec.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index e7faaf8be99e..4b32da76d577 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -954,6 +954,9 @@ static void netsec_uninit_pkt_dring(struct netsec_priv *priv, int id)
 	dring->head = 0;
 	dring->tail = 0;
 	dring->pkt_cnt = 0;
+
+	if (id == NETSEC_RING_TX)
+		netdev_reset_queue(priv->ndev);
 }
 
 static void netsec_free_dring(struct netsec_priv *priv, int id)
-- 
2.14.2

^ permalink raw reply related

* [PATCH v3 1/2] dt-bindings: phy: Update SERDES_MAX to be SERDES_MAX + 1
From: Gustavo A. R. Silva @ 2018-10-19  9:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: Rob Herring, Mark Rutland, devicetree, Kishon Vijay Abraham I,
	David S. Miller, Quentin Schulz, netdev, Gustavo A. R. Silva
In-Reply-To: <cover.1539940387.git.gustavo@embeddedor.com>

SERDES_MAX is a valid value to index ctrl->phys in
drivers/phy/mscc/phy-ocelot-serdes.c. But, currently,
there is an out-of-bounds bug in the mentioned driver
when reading from ctrl->phys, because the size of
array ctrl->phys is SERDES_MAX.

Partially fix this by updating SERDES_MAX to be SERDES6G_MAX + 1.

Notice that this is the first part of the solution to
the out-of-bounds bug mentioned above. Although this
change is not dependent on any other one.

Suggested-by: Quentin Schulz <quentin.schulz@bootlin.com>
Reviewed-by: Quentin Schulz <quentin.schulz@bootlin.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
Changes in v3:
 - Post the patch to netdev.

Changes in v2:
 - Add Quentin's Reviewed-by to commit log.
 - Add Rob's Acked-by to commit log.

 include/dt-bindings/phy/phy-ocelot-serdes.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/dt-bindings/phy/phy-ocelot-serdes.h b/include/dt-bindings/phy/phy-ocelot-serdes.h
index bd28f21..fe70ada 100644
--- a/include/dt-bindings/phy/phy-ocelot-serdes.h
+++ b/include/dt-bindings/phy/phy-ocelot-serdes.h
@@ -7,6 +7,6 @@
 #define SERDES1G_MAX	SERDES1G(5)
 #define SERDES6G(x)	(SERDES1G_MAX + 1 + (x))
 #define SERDES6G_MAX	SERDES6G(2)
-#define SERDES_MAX	SERDES6G_MAX
+#define SERDES_MAX	(SERDES6G_MAX + 1)
 
 #endif
-- 
2.7.4

^ permalink raw reply related

* [PATCH v3 2/2] phy: ocelot-serdes: fix out-of-bounds read
From: Gustavo A. R. Silva @ 2018-10-19  9:21 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kishon Vijay Abraham I, David S. Miller, Quentin Schulz, netdev,
	Gustavo A. R. Silva
In-Reply-To: <cover.1539940387.git.gustavo@embeddedor.com>

Currently, there is an out-of-bounds read on array ctrl->phys,
once variable i reaches the maximum array size of SERDES_MAX
in the for loop.

Fix this by changing the condition in the for loop from
i <= SERDES_MAX to i < SERDES_MAX.

Addresses-Coverity-ID: 1473966 ("Out-of-bounds read")
Addresses-Coverity-ID: 1473959 ("Out-of-bounds read")
Fixes: 51f6b410fc22 ("phy: add driver for Microsemi Ocelot SerDes muxing")
Reviewed-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
Changes in v3:
 - Post the patch to netdev.

Changes in v2:
 - Rebase and add Quentin's Reviewed-by to commit log.

 drivers/phy/mscc/phy-ocelot-serdes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/phy/mscc/phy-ocelot-serdes.c b/drivers/phy/mscc/phy-ocelot-serdes.c
index b2be546..cbb49d9 100644
--- a/drivers/phy/mscc/phy-ocelot-serdes.c
+++ b/drivers/phy/mscc/phy-ocelot-serdes.c
@@ -206,7 +206,7 @@ static struct phy *serdes_simple_xlate(struct device *dev,
 	port = args->args[0];
 	idx = args->args[1];
 
-	for (i = 0; i <= SERDES_MAX; i++) {
+	for (i = 0; i < SERDES_MAX; i++) {
 		struct serdes_macro *macro = phy_get_drvdata(ctrl->phys[i]);
 
 		if (idx != macro->idx)
@@ -260,7 +260,7 @@ static int serdes_probe(struct platform_device *pdev)
 	if (IS_ERR(ctrl->regs))
 		return PTR_ERR(ctrl->regs);
 
-	for (i = 0; i <= SERDES_MAX; i++) {
+	for (i = 0; i < SERDES_MAX; i++) {
 		ret = serdes_phy_create(ctrl, i, &ctrl->phys[i]);
 		if (ret)
 			return ret;
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v4 bpf-next 1/2] bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
From: Alexei Starovoitov @ 2018-10-19  1:33 UTC (permalink / raw)
  To: Daniel Borkmann, Song Liu, netdev@vger.kernel.org
  Cc: ast@kernel.org, Kernel Team
In-Reply-To: <74f40850-9040-4131-b6a0-5727e9e5e19e@iogearbox.net>

On 10/18/18 5:14 PM, Daniel Borkmann wrote:
>> +	case bpf_ctx_range(struct __sk_buff, data_meta):
>> +	case bpf_ctx_range(struct __sk_buff, flow_keys):
>> +		return false;
> ... if it's disallowed anyway (disallowing it is the right thing to do,
> but no need to save/restore then..)?
>

that's a good point.
why shouldn't we allow cg_skb to access data_meta?
xdp can set it and cgroup_skb_ingress will consume it here.


^ permalink raw reply

* [PATCH] can: flexcan: Implement CAN Runtime PM
From: Joakim Zhang @ 2018-10-19  9:45 UTC (permalink / raw)
  To: wg@grandegger.com, mkl@pengutronix.de, davem@davemloft.net
  Cc: linux-can@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, dl-linux-imx, A.s. Dong,
	Joakim Zhang

From: Dong Aisheng <aisheng.dong@nxp.com>

Flexcan will be disabled during suspend if no wakeup function required
and enabled after resume accordingly. During this period, we could
explicitly disable clocks.

Implement Runtime PM which will:
1) Keep device in suspend state (clocks disabled) if it's not openned
2) Make Power Domain framework be able to shutdown the corresponding power
domain of this device.

Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
---
 drivers/net/can/flexcan.c | 125 +++++++++++++++++++++++++-------------
 1 file changed, 82 insertions(+), 43 deletions(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 8e972ef08637..3813f6708201 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -23,6 +23,7 @@
 #include <linux/of.h>
 #include <linux/of_device.h>
 #include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
 #include <linux/regulator/consumer.h>
 
 #define DRV_NAME			"flexcan"
@@ -266,6 +267,7 @@ struct flexcan_priv {
 	u32 reg_imask1_default;
 	u32 reg_imask2_default;
 
+	struct device *dev;
 	struct clk *clk_ipg;
 	struct clk *clk_per;
 	const struct flexcan_devtype_data *devtype_data;
@@ -369,6 +371,27 @@ static inline void flexcan_error_irq_disable(const struct flexcan_priv *priv)
 	priv->write(reg_ctrl, &regs->ctrl);
 }
 
+static int flexcan_clks_enable(const struct flexcan_priv *priv)
+{
+	int err;
+
+	err = clk_prepare_enable(priv->clk_ipg);
+	if (err)
+		return err;
+
+	err = clk_prepare_enable(priv->clk_per);
+	if (err)
+		clk_disable_unprepare(priv->clk_ipg);
+
+	return err;
+}
+
+static void flexcan_clks_disable(const struct flexcan_priv *priv)
+{
+	clk_disable_unprepare(priv->clk_ipg);
+	clk_disable_unprepare(priv->clk_per);
+}
+
 static inline int flexcan_transceiver_enable(const struct flexcan_priv *priv)
 {
 	if (!priv->reg_xceiver)
@@ -495,19 +518,11 @@ static int flexcan_get_berr_counter(const struct net_device *dev,
 	const struct flexcan_priv *priv = netdev_priv(dev);
 	int err;
 
-	err = clk_prepare_enable(priv->clk_ipg);
-	if (err)
-		return err;
-
-	err = clk_prepare_enable(priv->clk_per);
-	if (err)
-		goto out_disable_ipg;
+	pm_runtime_get_sync(priv->dev);
 
 	err = __flexcan_get_berr_counter(dev, bec);
 
-	clk_disable_unprepare(priv->clk_per);
- out_disable_ipg:
-	clk_disable_unprepare(priv->clk_ipg);
+	pm_runtime_put(priv->dev);
 
 	return err;
 }
@@ -1096,17 +1111,13 @@ static int flexcan_open(struct net_device *dev)
 	struct flexcan_priv *priv = netdev_priv(dev);
 	int err;
 
-	err = clk_prepare_enable(priv->clk_ipg);
+	err = pm_runtime_get_sync(priv->dev);
 	if (err)
 		return err;
 
-	err = clk_prepare_enable(priv->clk_per);
-	if (err)
-		goto out_disable_ipg;
-
 	err = open_candev(dev);
 	if (err)
-		goto out_disable_per;
+		goto out_pm_runtime;
 
 	err = request_irq(dev->irq, flexcan_irq, IRQF_SHARED, dev->name, dev);
 	if (err)
@@ -1128,10 +1139,8 @@ static int flexcan_open(struct net_device *dev)
 	free_irq(dev->irq, dev);
  out_close:
 	close_candev(dev);
- out_disable_per:
-	clk_disable_unprepare(priv->clk_per);
- out_disable_ipg:
-	clk_disable_unprepare(priv->clk_ipg);
+ out_pm_runtime:
+	pm_runtime_put(priv->dev);
 
 	return err;
 }
@@ -1145,11 +1154,11 @@ static int flexcan_close(struct net_device *dev)
 	flexcan_chip_stop(dev);
 
 	free_irq(dev->irq, dev);
-	clk_disable_unprepare(priv->clk_per);
-	clk_disable_unprepare(priv->clk_ipg);
 
 	close_candev(dev);
 
+	pm_runtime_put(priv->dev);
+
 	can_led_event(dev, CAN_LED_EVENT_STOP);
 
 	return 0;
@@ -1188,18 +1197,10 @@ static int register_flexcandev(struct net_device *dev)
 	struct flexcan_regs __iomem *regs = priv->regs;
 	u32 reg, err;
 
-	err = clk_prepare_enable(priv->clk_ipg);
-	if (err)
-		return err;
-
-	err = clk_prepare_enable(priv->clk_per);
-	if (err)
-		goto out_disable_ipg;
-
 	/* select "bus clock", chip must be disabled */
 	err = flexcan_chip_disable(priv);
 	if (err)
-		goto out_disable_per;
+		return err;
 	reg = priv->read(&regs->ctrl);
 	reg |= FLEXCAN_CTRL_CLK_SRC;
 	priv->write(reg, &regs->ctrl);
@@ -1228,13 +1229,8 @@ static int register_flexcandev(struct net_device *dev)
 
 	err = register_candev(dev);
 
-	/* disable core and turn off clocks */
  out_chip_disable:
 	flexcan_chip_disable(priv);
- out_disable_per:
-	clk_disable_unprepare(priv->clk_per);
- out_disable_ipg:
-	clk_disable_unprepare(priv->clk_ipg);
 
 	return err;
 }
@@ -1342,6 +1338,7 @@ static int flexcan_probe(struct platform_device *pdev)
 		priv->write = flexcan_write_le;
 	}
 
+	priv->dev = &pdev->dev;
 	priv->can.clock.freq = clock_freq;
 	priv->can.bittiming_const = &flexcan_bittiming_const;
 	priv->can.do_set_mode = flexcan_set_mode;
@@ -1388,22 +1385,35 @@ static int flexcan_probe(struct platform_device *pdev)
 	if (err)
 		goto failed_offload;
 
+	pm_runtime_enable(&pdev->dev);
+	err = pm_runtime_get_sync(&pdev->dev);
+	if (err < 0) {
+		dev_err(&pdev->dev, "pm_runtime_get failed(%d)\n", err);
+		goto failed_rpm_disable;
+	}
+
 	err = register_flexcandev(dev);
 	if (err) {
 		dev_err(&pdev->dev, "registering netdev failed\n");
-		goto failed_register;
+		goto failed_rpm_put;
 	}
 
 	devm_can_led_init(dev);
 
+	pm_runtime_put(&pdev->dev);
+
 	dev_info(&pdev->dev, "device registered (reg_base=%p, irq=%d)\n",
 		 priv->regs, dev->irq);
 
 	return 0;
 
+ failed_rpm_put:
+	pm_runtime_put(&pdev->dev);
+ failed_rpm_disable:
+	pm_runtime_disable(&pdev->dev);
  failed_offload:
- failed_register:
 	free_candev(dev);
+
 	return err;
 }
 
@@ -1413,6 +1423,7 @@ static int flexcan_remove(struct platform_device *pdev)
 	struct flexcan_priv *priv = netdev_priv(dev);
 
 	unregister_flexcandev(dev);
+	pm_runtime_disable(&pdev->dev);
 	can_rx_offload_del(&priv->offload);
 	free_candev(dev);
 
@@ -1423,38 +1434,66 @@ static int __maybe_unused flexcan_suspend(struct device *device)
 {
 	struct net_device *dev = dev_get_drvdata(device);
 	struct flexcan_priv *priv = netdev_priv(dev);
-	int err;
+	int err = 0;
 
 	if (netif_running(dev)) {
 		err = flexcan_chip_disable(priv);
 		if (err)
 			return err;
+		err = pm_runtime_force_suspend(device);
+
 		netif_stop_queue(dev);
 		netif_device_detach(dev);
 	}
 	priv->can.state = CAN_STATE_SLEEPING;
 
-	return 0;
+	return err;
 }
 
 static int __maybe_unused flexcan_resume(struct device *device)
 {
 	struct net_device *dev = dev_get_drvdata(device);
 	struct flexcan_priv *priv = netdev_priv(dev);
-	int err;
+	int err = 0;
 
 	priv->can.state = CAN_STATE_ERROR_ACTIVE;
 	if (netif_running(dev)) {
 		netif_device_attach(dev);
 		netif_start_queue(dev);
-		err = flexcan_chip_enable(priv);
+
+		err = pm_runtime_force_resume(device);
 		if (err)
 			return err;
+		err = flexcan_chip_enable(priv);
 	}
+	return err;
+}
+
+static int __maybe_unused flexcan_runtime_suspend(struct device *device)
+{
+	struct net_device *dev = dev_get_drvdata(device);
+	struct flexcan_priv *priv = netdev_priv(dev);
+
+	flexcan_clks_disable(priv);
+
 	return 0;
 }
 
-static SIMPLE_DEV_PM_OPS(flexcan_pm_ops, flexcan_suspend, flexcan_resume);
+static int __maybe_unused flexcan_runtime_resume(struct device *device)
+{
+	struct net_device *dev = dev_get_drvdata(device);
+	struct flexcan_priv *priv = netdev_priv(dev);
+
+	flexcan_clks_enable(priv);
+
+	return 0;
+}
+
+static const struct dev_pm_ops flexcan_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(flexcan_suspend, flexcan_resume)
+	SET_RUNTIME_PM_OPS(flexcan_runtime_suspend, flexcan_runtime_resume,
+						NULL)
+};
 
 static struct platform_driver flexcan_driver = {
 	.driver = {
-- 
2.17.1

^ permalink raw reply related

* BUG: unable to handle kernel NULL pointer dereference in __do_softirq
From: syzbot @ 2018-10-19  9:49 UTC (permalink / raw)
  To: brouer, davem, edumazet, jasowang, linux-kernel, mst, netdev, sd,
	syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    4cb967a3d9d9 selftests/bpf: fix file resource leak
git tree:       bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15a82ba1400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=2c8e578694e13f29
dashboard link: https://syzkaller.appspot.com/bug?extid=6528c946c37ba2809d0a
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=127664f6400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+6528c946c37ba2809d0a@syzkaller.appspotmail.com

device nr0\x01 entered promiscuous mode
device nr0\x01 entered promiscuous mode
device nr0\x01 entered promiscuous mode
device nr0\x01 entered promiscuous mode
device nr0\x01 entered promiscuous mode
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
PGD 1d3a54067 P4D 1d3a54067 PUD 1ba78a067 PMD 0
Oops: 0010 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 10511 Comm: syz-executor4 Not tainted 4.19.0-rc7+ #125
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:          (null)
Code: Bad RIP value.
RSP: 0018:ffff8801daf07868 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff860a27bf
RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffff8801ab2bd298
RBP: ffff8801daf07df8 R08: ffff8801c7b2e140 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: dffffc0000000000 R14: ffff8801daf07dd0 R15: ffff8801ab2bd298
FS:  00007f851cacc700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000001bc908000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  <IRQ>
  __do_softirq+0x30b/0xad8 kernel/softirq.c:292
  do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1047
  </IRQ>
  do_softirq.part.13+0x126/0x160 kernel/softirq.c:336
  do_softirq kernel/softirq.c:328 [inline]
  __local_bh_enable_ip+0x21d/0x260 kernel/softirq.c:189
  local_bh_enable include/linux/bottom_half.h:32 [inline]
  tun_get_user+0x1c0b/0x4250 drivers/net/tun.c:1963
  tun_chr_write_iter+0xb9/0x160 drivers/net/tun.c:1993
  call_write_iter include/linux/fs.h:1808 [inline]
  new_sync_write fs/read_write.c:474 [inline]
  __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
  vfs_write+0x1fc/0x560 fs/read_write.c:549
  ksys_write+0x101/0x260 fs/read_write.c:598
  __do_sys_write fs/read_write.c:610 [inline]
  __se_sys_write fs/read_write.c:607 [inline]
  __x64_sys_write+0x73/0xb0 fs/read_write.c:607
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f851cacbc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
RDX: 000000000000017b RSI: 0000000020000000 RDI: 000000000000000e
RBP: 000000000072c040 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f851cacc6d4
R13: 00000000004c5560 R14: 00000000004d8e98 R15: 00000000ffffffff
Modules linked in:
CR2: 0000000000000000
---[ end trace 108e300304238a6e ]---
RIP: 0010:          (null)
Code: Bad RIP value.
RSP: 0018:ffff8801daf07868 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff860a27bf
RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffff8801ab2bd298
RBP: ffff8801daf07df8 R08: ffff8801c7b2e140 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
R13: dffffc0000000000 R14: ffff8801daf07dd0 R15: ffff8801ab2bd298
FS:  00007f851cacc700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000001bc908000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH v4 bpf-next 0/2] bpf: add cg_skb_is_valid_access
From: Alexei Starovoitov @ 2018-10-19  1:46 UTC (permalink / raw)
  To: Alexei Starovoitov, Song Liu
  Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	Kernel Team
In-Reply-To: <20181018233403.xnea2m7ntwxue44r@ast-mbp.dhcp.thefacebook.com>

On 10/18/18 4:34 PM, Alexei Starovoitov wrote:
> On Thu, Oct 18, 2018 at 09:06:47AM -0700, Song Liu wrote:
>> Changes v3 -> v4:
>> 1. Fixed crash issue reported by Alexei.
>>
>> Changes v2 -> v3:
>> 1. Added helper function bpf_compute_and_save_data_pointers() and
>>    bpf_restore_data_pointers().
>>
>> Changes v1 -> v2:
>> 1. Updated the list of read-only fields, and read-write fields.
>> 2. Added dummy sk to bpf_prog_test_run_skb().
>>
>> This set enables BPF program of type BPF_PROG_TYPE_CGROUP_SKB to access
>> some __skb_buff data directly.
>
> Applied, Thanks

Daniel brought up good point re: data_meta. Let's address it one way
or the other and respin the series instead of adding follow up patches.
So I tossed this series out of bpf-next.

^ permalink raw reply

* Re: netconsole warning in 4.19.0-rc7
From: Meelis Roos @ 2018-10-19  9:51 UTC (permalink / raw)
  To: Dave Jones, Cong Wang, LKML, Linux Kernel Network Developers
In-Reply-To: <20181018145922.ytqxvdihp3iqounl@codemonkey.org.uk>

>   > > I took another look at that error path. Turns out this is all we need I
>   > > think..
>   >
>   > With this patch applied on top of 4.19-rc8, I stll get the warning:
>   >
>   > [   13.722919] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:168 __local_bh_enable_ip+0x2e/0x44
> 
> It's going to be a couple days before I get chance to get back to this.
> Did the previous patch in this thread (without the _bh) fare any better?

Tried it with rcu_read_unlock() instead, still the same.

-- 
Meelis Roos <mroos@linux.ee>

^ permalink raw reply

* Re: [PATCH v2 net] r8169: fix NAPI handling under high load
From: Eric Dumazet @ 2018-10-19  1:51 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Heiner Kallweit, David Miller, Realtek linux nic maintainers,
	netdev@vger.kernel.org
In-Reply-To: <20181018225900.GA16653@electric-eye.fr.zoreil.com>



On 10/18/2018 03:59 PM, Francois Romieu wrote:
> Eric Dumazet <eric.dumazet@gmail.com> :
> [...]
>> One has to wonder why rtl8169_poll(), which might be called in a loop under DOS,
>> has to call rtl_ack_events() ?
> 
> So as to cover a wider temporal range before any event can trigger an
> extra irq. I was more worried about irq cost than about IO cost (and
> I still am).
> 
Normally the IRQ would not be enabled under DOS.

Only when a poll would receive less packets than the budget
the following would normally allow the device to send another IRQ

if (work_done < budget) {
    napi_complete_done(napi, work_done);
    rtl_irq_enable(tp, enable_mask);
    mmiowb();
}

^ permalink raw reply

* Re: [PATCH v4 bpf-next 1/2] bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB
From: Alexei Starovoitov @ 2018-10-19  2:03 UTC (permalink / raw)
  To: Daniel Borkmann, Song Liu, netdev@vger.kernel.org
  Cc: ast@kernel.org, Kernel Team
In-Reply-To: <13f972be-ab41-8519-24cc-4e891579dc08@fb.com>

On 10/18/18 6:33 PM, Alexei Starovoitov wrote:
> On 10/18/18 5:14 PM, Daniel Borkmann wrote:
>>> +    case bpf_ctx_range(struct __sk_buff, data_meta):
>>> +    case bpf_ctx_range(struct __sk_buff, flow_keys):
>>> +        return false;
>> ... if it's disallowed anyway (disallowing it is the right thing to do,
>> but no need to save/restore then..)?
>>
>
> that's a good point.
> why shouldn't we allow cg_skb to access data_meta?
> xdp can set it and cgroup_skb_ingress will consume it here.

I'll take it back.
When xdp doesn't set meta_data it will be zero and
bpf_compute_data_pointers() will point data_meta to skb->data.
On ingress that's eth header, but for tx it will point
to reserved space for future eth header.
So we cannot do that.
Let's keep it disabled and adjust
bpf_compute_and_save_data_pointers() to save only 'data' pointer.


^ permalink raw reply

* Re: [RFC] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Jason Wang @ 2018-10-19  2:17 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Toshiaki Makita, netdev, virtualization, tglx, Michael S. Tsirkin,
	David S. Miller
In-Reply-To: <20181018084753.wefvsypdevbzoadg@linutronix.de>


On 2018/10/18 下午4:47, Sebastian Andrzej Siewior wrote:
> On 2018-10-17 14:48:02 [+0800], Jason Wang wrote:
>> On 2018/10/17 上午9:13, Toshiaki Makita wrote:
>>> I'm not sure what condition triggered this warning.
> If the seqlock is acquired once in softirq and then in process context
> again it is enough evidence for lockdep to trigger this warning.
>
>>> Toshiaki Makita
>>
>> Or maybe NAPI is enabled unexpectedly somewhere?
>>
>> Btw, the schedule_delayed_work() in virtnet_open() is also suspicious, if
>> the work is executed before virtnet_napi_enable(), there will be a deadloop
>> for napi_disable().
> something like this? It is also likely if it runs OOM on queue 2, it
> will run OOM again on queue 3.
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index fbcfb4d272336..87d6ec4765270 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1263,22 +1263,22 @@ static void refill_work(struct work_struct *work)
>   {
>   	struct virtnet_info *vi =
>   		container_of(work, struct virtnet_info, refill.work);
> -	bool still_empty;
> +	int still_empty = 0;
>   	int i;
>   
>   	for (i = 0; i < vi->curr_queue_pairs; i++) {
>   		struct receive_queue *rq = &vi->rq[i];
>   
>   		napi_disable(&rq->napi);
> -		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
> +		if (!try_fill_recv(vi, rq, GFP_KERNEL))
> +		    still_empty++;
>   		virtnet_napi_enable(rq->vq, &rq->napi);
> -
> -		/* In theory, this can happen: if we don't get any buffers in
> -		 * we will *never* try to fill again.
> -		 */
> -		if (still_empty)
> -			schedule_delayed_work(&vi->refill, HZ/2);
>   	}
> +	/* In theory, this can happen: if we don't get any buffers in
> +	 * we will *never* try to fill again.
> +	 */
> +	if (still_empty)
> +		schedule_delayed_work(&vi->refill, HZ/2);
>   }


I think this part is not a must or an independent optimization?


Thanks


>   
>   static int virtnet_receive(struct receive_queue *rq, int budget,
> @@ -1407,12 +1407,13 @@ static int virtnet_open(struct net_device *dev)
>   {
>   	struct virtnet_info *vi = netdev_priv(dev);
>   	int i, err;
> +	int need_refill = 0;
>   
>   	for (i = 0; i < vi->max_queue_pairs; i++) {
>   		if (i < vi->curr_queue_pairs)
>   			/* Make sure we have some buffers: if oom use wq. */
>   			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> -				schedule_delayed_work(&vi->refill, 0);
> +				need_refill++;
>   
>   		err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
>   		if (err < 0)
> @@ -1428,6 +1429,8 @@ static int virtnet_open(struct net_device *dev)
>   		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
>   		virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
>   	}
> +	if (need_refill)
> +		schedule_delayed_work(&vi->refill, 0);
>   
>   	return 0;
>   }
> @@ -2236,6 +2239,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>   {
>   	struct virtnet_info *vi = vdev->priv;
>   	int err, i;
> +	int need_refill = 0;
>   
>   	err = init_vqs(vi);
>   	if (err)
> @@ -2246,13 +2250,15 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>   	if (netif_running(vi->dev)) {
>   		for (i = 0; i < vi->curr_queue_pairs; i++)
>   			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> -				schedule_delayed_work(&vi->refill, 0);
> +				need_refill++;
>   
>   		for (i = 0; i < vi->max_queue_pairs; i++) {
>   			virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
>   			virtnet_napi_tx_enable(vi, vi->sq[i].vq,
>   					       &vi->sq[i].napi);
>   		}
> +		if (need_refill)
> +			schedule_delayed_work(&vi->refill, 0);
>   	}
>   
>   	netif_device_attach(vi->dev);
>
>> Thanks
> Sebastian

^ permalink raw reply

* Re: [PATCH] virtio_net: add local_bh_disable() around u64_stats_update_begin
From: Jason Wang @ 2018-10-19  2:19 UTC (permalink / raw)
  To: David Miller, bigeasy
  Cc: stephen, netdev, virtualization, tglx, makita.toshiaki, mst
In-Reply-To: <20181018.162308.2295937118791060714.davem@davemloft.net>


On 2018/10/19 上午7:23, David Miller wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Date: Thu, 18 Oct 2018 10:43:13 +0200
>
>> on 32bit, lockdep notices that virtnet_open() and refill_work() invoke
>> try_fill_recv() from process context while virtnet_receive() invokes the
>> same function from BH context. The problem that the seqcounter within
>> u64_stats_update_begin() may deadlock if it is interrupted by BH and
>> then acquired again.
>>
>> Introduce u64_stats_update_begin_bh() which disables BH on 32bit
>> architectures. Since the BH might interrupt the worker, this new
>> function should not limited to SMP like the others which are expected
>> to be used in softirq.
>>
>> With this change we might lose increments but this is okay. The
>> important part that the two 32bit parts of the 64bit counter are not
>> corrupted.
>>
>> Fixes: 461f03dc99cf6 ("virtio_net: Add kick stats").
>> Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
>> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Trying to get down to the bottom of this:
>
> 1) virtnet_receive() runs from softirq but only if NAPI is active and
>     enabled.  It is in this context that it invokes try_fill_recv().
>
> 2) refill_work() runs from process context, but disables NAPI (and
>     thus invocation of virtnet_receive()) before calling
>     try_fill_recv().
>
> 3) virtnet_open() invokes from process context as well, but before the
>     NAPI instances are enabled, it is same as case #2.
>
> 4) virtnet_restore_up() is the same situations as #3.
>
> Therefore I agree that this is a false positive, and simply lockdep
> cannot see the NAPI synchronization done by case #2.
>
> I think we shouldn't add unnecessary BH disabling here, and instead
> find some way to annotate this for lockdep's sake.
>
> Thank you.
>

+1

Thanks

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox