Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2] riscv: add support for SECCOMP and SECCOMP_FILTER
From: David Abdurachmanov @ 2019-08-26 16:39 UTC (permalink / raw)
  To: Tycho Andersen
  Cc: Paul Walmsley, Palmer Dabbelt, Albert Ou, Oleg Nesterov,
	Kees Cook, Andy Lutomirski, Will Drewry, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, David Abdurachmanov, Thomas Gleixner,
	Allison Randal, Alexios Zavras, Anup Patel, Vincent Chen,
	Alan Kao, linux-riscv, linux-kernel, linux-kselftest, netdev, bpf,
	me
In-Reply-To: <20190826145756.GB4664@cisco>

On Mon, Aug 26, 2019 at 7:57 AM Tycho Andersen <tycho@tycho.ws> wrote:
>
> Hi,
>
> On Fri, Aug 23, 2019 at 05:30:53PM -0700, Paul Walmsley wrote:
> > On Thu, 22 Aug 2019, David Abdurachmanov wrote:
> >
> > > There is one failing kernel selftest: global.user_notification_signal
> >
> > Also - could you follow up with the author of this failing test to see if
> > we can get some more clarity about what might be going wrong here?  It
> > appears that the failing test was added in commit 6a21cc50f0c7f ("seccomp:
> > add a return code to trap to userspace") by Tycho Andersen
> > <tycho@tycho.ws>.
>
> Can you post an strace and a cat of /proc/$pid/stack for both tasks
> where it gets stuck? I don't have any riscv hardware, and it "works
> for me" on x86 and arm64 with 100 tries.

I don't have the a build with SECCOMP for the board right now, so it
will have to wait. I just finished a new kernel (almost rc6) for Fedora,
but it will take time to assemble new repositories and a disk image.

There is older disk image available (5.2.0-rc7 kernel with v2 SECCOMP)
for QEMU or libvirt/QEMU:

https://dl.fedoraproject.org/pub/alt/risc-v/disk-images/fedora/rawhide/20190703.n.0/Developer/
https://fedoraproject.org/wiki/Architectures/RISC-V/Installing#Boot_with_libvirt

(If you are interesting trying it locally.)

IIRC I attempted to connected with strace, but it quickly returns and fails
properly. Simply put strace unblocks whatever is stuck.

david

^ permalink raw reply

* Re: [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register
From: Jakub Kicinski @ 2019-08-26 16:41 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Song Liu, bpf, Networking, OSS Drivers,
	Jiong Wang
In-Reply-To: <1417962c-e63d-6c46-bf07-9284f5332583@iogearbox.net>

On Mon, 26 Aug 2019 18:25:10 +0200, Daniel Borkmann wrote:
> On 8/26/19 6:18 PM, Alexei Starovoitov wrote:
> > On Mon, Aug 26, 2019 at 8:57 AM Jakub Kicinski
> > <jakub.kicinski@netronome.com> wrote:  
> >> On Sun, Aug 25, 2019 at 10:37 PM Song Liu <liu.song.a23@gmail.com> wrote:  
> >>> On Fri, Aug 23, 2019 at 7:04 PM Jakub Kicinski wrote:  
> >>>> From: Jiong Wang <jiong.wang@netronome.com>
> >>>>
> >>>> NFP is using Local Memory to model stack. LM_addr could be used as base of
> >>>> a 16 32-bit word region of Local Memory. Then, if the stack offset is
> >>>> beyond the current region, the local index needs to be updated. The update
> >>>> needs at least three cycles to take effect, therefore the sequence normally
> >>>> looks like:
> >>>>
> >>>>    local_csr_wr[ActLMAddr3, gprB_5]
> >>>>    nop
> >>>>    nop
> >>>>    nop
> >>>>
> >>>> If the local index switch happens on a narrow loads, then the instruction
> >>>> preparing value to zero high 32-bit of the destination register could be
> >>>> counted as one cycle, the sequence then could be something like:
> >>>>
> >>>>    local_csr_wr[ActLMAddr3, gprB_5]
> >>>>    nop
> >>>>    nop
> >>>>    immed[gprB_5, 0]
> >>>>
> >>>> However, we have zero extension optimization that zeroing high 32-bit could
> >>>> be eliminated, therefore above IMMED insn won't be available for which case
> >>>> the first sequence needs to be generated.
> >>>>
> >>>> Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen")
> >>>> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> >>>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>  
> >>> I haven't looked into the code yet. But ^^^ should be
> >>>
> >>> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >>>
> >>> right?  
> >>
> >> I prefer Review on code I review, ack on code I ack, and sign-off on
> >> code I co-author.  
> > 
> > I believe if you're sending somebody else patch you have to add your SOB
> > in addition to their 'Author:' and their SOB fields.  
> 
> +1, for co-authoring there's a 'Co-authored-by:' tag which seems to be frequently
> used these days.

Ack, there is a difference between co-author of code, and co-author as
step by step guidance. I've been doing this for 6 years now, and nobody
ever complained :)

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Is that enough or should I repost?

^ permalink raw reply

* Re: Unable to create htb tc classes more than 64K
From: Jesper Dangaard Brouer @ 2019-08-26 16:45 UTC (permalink / raw)
  To: Akshat Kakkar
  Cc: brouer, Cong Wang, NetFilter, lartc, netdev, Eric Dumazet,
	Toke Høiland-Jørgensen, Anton Danilov
In-Reply-To: <CAA5aLPjzX+9YFRGgCgceHjkU0=e6x8YMENfp_cC9fjfHYK3e+A@mail.gmail.com>

On Sun, 18 Aug 2019 00:34:33 +0530
Akshat Kakkar <akshat.1984@gmail.com> wrote:

> My goal is not just to make as many classes as possible, but also to
> use them to do rate limiting per ip per server. Say, I have a list of
> 10000 IPs and more than 100 servers. So simply if I want few IPs to
> get speed of says 1Mbps per server but others say speed of 2 Mbps per
> server. How can I achieve this without having 10000 x 100 classes.
> These numbers can be large than this and hence I am looking for a
> generic solution to this.

As Eric Dumazet also points out indirectly, you will be creating a huge
bottleneck for SMP/multi-core CPUs.  As your HTB root qdisc is a
serialization point for all egress traffic, that all CPUs will need to
take a lock on.

It sounds like your use-case is not global rate limiting, but instead
the goal is to rate limit customers or services (to something
significantly lower than NIC link speed).  To get scalability, in this
case, you can instead use the MQ qdisc (as Eric also points out).
I have an example script here[1], that shows how to setup MQ as root
qdisc and add HTB leafs based on how many TX-queue the interface have
via /sys/class/net/$DEV/queues/tx-*/

[1] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/bin/tc_mq_htb_setup_example.sh


You are not done, yet.  For solving the TX-queue locking congestion, the
traffic needs to be redirected to the appropriate/correct TX CPUs. This
can either be done with RSS (Receive Side Scaling) HW ethtool
adjustment (reduce hash to IPs L3 only), or RPS (Receive Packet
Steering), or with XDP cpumap redirect.

The XDP cpumap redirect feature is implemented with XDP+TC BPF code
here[2]. Notice, that XPS can screw with this so there is a XPS disable
script here[3].


[2] https://github.com/xdp-project/xdp-cpumap-tc
[3] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/bin/xps_setup.sh

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH 1/1] netfilter: nf_tables: fib: Drop IPV6 packages if IPv6 is disabled on boot
From: Leonardo Bras @ 2019-08-26 16:47 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso
  Cc: netfilter-devel, coreteam, netdev, linux-kernel, Jozsef Kadlecsik,
	David S. Miller
In-Reply-To: <20190821095844.me6kscvnfruinseu@salvia>

[-- Attachment #1: Type: text/plain, Size: 1169 bytes --]

Hello Pablo, Florian,

I implemented a V2 of this patch with the changes you proposed.
Could you please give your feedback on that patch?
https://lkml.org/lkml/2019/8/21/527

Thanks!

On Wed, 2019-08-21 at 11:58 +0200, Pablo Neira Ayuso wrote:
> On Tue, Aug 20, 2019 at 01:15:58PM -0300, Leonardo Bras wrote:
> > On Tue, 2019-08-20 at 07:36 +0200, Florian Westphal wrote:
> > > Wouldn't fib_netdev.c have the same problem?
> > Probably, but I haven't hit this issue yet.
> > 
> > > If so, might be better to place this test in both
> > > nft_fib6_eval_type and nft_fib6_eval.
> > 
> > I think that is possible, and not very hard to do.
> > 
> > But in my humble viewpoint, it looks like it's nft_fib_inet_eval() and
> > nft_fib_netdev_eval() have the responsibility to choose a valid
> > protocol or drop the package. 
> > I am not sure if it would be a good move to transfer this
> > responsibility to nft_fib6_eval_type() and nft_fib6_eval(), so I would
> > rather add the same test to nft_fib_netdev_eval().
> > 
> > Does it make sense?
> 
> Please, update common code to netdev and ip6 extensions as Florian
> suggests.
> 
> Thanks.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net] tcp: remove empty skb from write queue in error cases
From: Soheil Hassas Yeganeh @ 2019-08-26 16:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Neal Cardwell, Eric Dumazet,
	Jason Baron, Vladimir Rutsky
In-Reply-To: <20190826161915.81676-1-edumazet@google.com>

On Mon, Aug 26, 2019 at 12:19 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Vladimir Rutsky reported stuck TCP sessions after memory pressure
> events. Edge Trigger epoll() user would never receive an EPOLLOUT
> notification allowing them to retry a sendmsg().
>
> Jason tested the case of sk_stream_alloc_skb() returning NULL,
> but there are other paths that could lead both sendmsg() and sendpage()
> to return -1 (EAGAIN), with an empty skb queued on the write queue.
>
> This patch makes sure we remove this empty skb so that
> Jason code can detect that the queue is empty, and
> call sk->sk_write_space(sk) accordingly.
>
> Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Reported-by: Vladimir Rutsky <rutsky@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Nice find!

> ---
>  net/ipv4/tcp.c | 30 ++++++++++++++++++++----------
>  1 file changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 77b485d60b9d0e00edc4e2f0d6c5bb3a9460b23b..61082065b26a068975c411b74eb46739ab0632ca 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -935,6 +935,22 @@ static int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
>         return mss_now;
>  }
>
> +/* In some cases, both sendpage() and sendmsg() could have added
> + * an skb to the write queue, but failed adding payload on it.
> + * We need to remove it to consume less memory, but more
> + * importantly be able to generate EPOLLOUT for Edge Trigger epoll()
> + * users.
> + */
> +static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
> +{
> +       if (skb && !skb->len) {
> +               tcp_unlink_write_queue(skb, sk);
> +               if (tcp_write_queue_empty(sk))
> +                       tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
> +               sk_wmem_free_skb(sk, skb);
> +       }
> +}
> +
>  ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>                          size_t size, int flags)
>  {
> @@ -1064,6 +1080,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
>         return copied;
>
>  do_error:
> +       tcp_remove_empty_skb(sk, tcp_write_queue_tail(sk));
>         if (copied)
>                 goto out;
>  out_err:
> @@ -1388,18 +1405,11 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>         sock_zerocopy_put(uarg);
>         return copied + copied_syn;
>
> +do_error:
> +       skb = tcp_write_queue_tail(sk);
>  do_fault:
> -       if (!skb->len) {
> -               tcp_unlink_write_queue(skb, sk);
> -               /* It is the one place in all of TCP, except connection
> -                * reset, where we can be unlinking the send_head.
> -                */
> -               if (tcp_write_queue_empty(sk))
> -                       tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
> -               sk_wmem_free_skb(sk, skb);
> -       }
> +       tcp_remove_empty_skb(sk, skb);
>
> -do_error:
>         if (copied + copied_syn)
>                 goto out;
>  out_err:
> --
> 2.23.0.187.g17f5b7556c-goog
>

^ permalink raw reply

* Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames
From: Jakub Kicinski @ 2019-08-26 16:55 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Ahern, Roopa Prabhu, netdev, David Miller,
	Stephen Hemminger, dcbw, Michal Kubecek, Andrew Lunn, parav,
	Saeed Mahameed, mlxsw
In-Reply-To: <20190826160916.GE2309@nanopsycho.orion>

On Mon, 26 Aug 2019 18:09:16 +0200, Jiri Pirko wrote:
> DaveA, Roopa. Do you insist on doing add/remove of altnames in the
> existing setlist command using embedded message op attrs? I'm asking
> because after some time thinking about it, it still feels wrong to me :/
> 
> If this would be a generic netlink api, we would just add another couple
> of commands. What is so different we can't add commands here?
> It is also much simpler code. Easy error handling, no need for
> rollback, no possibly inconsistent state, etc.

+1 the separate op feels like a better uapi to me as well.

Perhaps we could redo the iproute2 command line interface to make the
name the primary object? Would that address your concern Dave and Roopa?

^ permalink raw reply

* Re: [PATCH v2 0/3] Add NETIF_F_HW_BR_CAP feature
From: Florian Fainelli @ 2019-08-26 17:01 UTC (permalink / raw)
  To: Andrew Lunn, Horatiu Vultur
  Cc: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
	allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <20190826123811.GA13411@lunn.ch>

On 8/26/19 5:38 AM, Andrew Lunn wrote:
> On Mon, Aug 26, 2019 at 10:11:12AM +0200, Horatiu Vultur wrote:
>> When a network port is added to a bridge then the port is added in
>> promisc mode. Some HW that has bridge capabilities(can learn, forward,
>> flood etc the frames) they are disabling promisc mode in the network
>> driver when the port is added to the SW bridge.
>>
>> This patch adds the feature NETIF_F_HW_BR_CAP so that the network ports
>> that have this feature will not be set in promisc mode when they are
>> added to a SW bridge.
>>
>> In this way the HW that has bridge capabilities don't need to send all the
>> traffic to the CPU and can also implement the promisc mode and toggle it
>> using the command 'ip link set dev swp promisc on'
> 
> Hi Horatiu
> 
> I'm still not convinced this is needed. The model is, the hardware is
> there to accelerate what Linux can do in software. Any peculiarities
> of the accelerator should be hidden in the driver.  If the accelerator
> can do its job without needing promisc mode, do that in the driver.
> 
> So you are trying to differentiate between promisc mode because the
> interface is a member of a bridge, and promisc mode because some
> application, like pcap, has asked for promisc mode.
> 
> dev->promiscuity is a counter. So what you can do it look at its
> value, and how the interface is being used. If the interface is not a
> member of a bridge, and the count > 0, enable promisc mode in the
> accelerator. If the interface is a member of a bridge, and the count >
> 1, enable promisc mode in the accelerator.

That is an excellent suggestion actually.

Horatiu, the other issue with your approach here is that the features
don't propagate to/from lower/upper/real devices, so if e.g.: you have a
VLAN interface enslaved as a part of the bridge, or a bond, or a tunnel
interface, the logic won't make us check NETIF_F_HW_BR_CAP because those
virtual network devices won't inherit it from their real device. I am
not suggesting you fix this with your patch series, but rather, seek a
driver local solution.
-- 
Florian

^ permalink raw reply

* Re: [PATCH 12/16] arm64: prefer __section from compiler_attributes.h
From: Nick Desaulniers @ 2019-08-26 17:03 UTC (permalink / raw)
  To: Miguel Ojeda
  Cc: Will Deacon, Andrew Morton, Sedat Dilek, Josh Poimboeuf,
	Yonghong Song, clang-built-linux, Catalin Marinas,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Andrey Konovalov, Greg Kroah-Hartman, Enrico Weigelt,
	Suzuki K Poulose, Thomas Gleixner, Masayoshi Mizuma,
	Shaokun Zhang, Alexios Zavras, Allison Randal, Linux ARM,
	linux-kernel, Network Development, bpf
In-Reply-To: <CANiq72mcSniCzMzW6AX_5tG5W2edjEmZ=Rf=jo-Mw3H-9RVJqw@mail.gmail.com>

On Sat, Aug 24, 2019 at 5:48 AM Miguel Ojeda
<miguel.ojeda.sandonis@gmail.com> wrote:
>
> On Sat, Aug 24, 2019 at 1:25 PM Will Deacon <will@kernel.org> wrote:
> >
> > Which bit are you pinging about? This patch (12/16) has been in -next for a
> > while and is queued in the arm64 tree for 5.4. The Oops/boot issue is
> > addressed in patch 14 which probably needs to be sent as a separate patch
> > (with a commit message) if it's targetting 5.3 and, I assume, routed via
> > somebody like akpm.
>
> I was pinging about the bit I was quoting, i.e. whether the Oops in
> the cover letter was #14 indeed. Also, since Nick said he wanted to
> get this ASAP through compiler-attributes, I assumed he wanted it to
> be in 5.3, but I have not seen the independent patch.
>
> Since he seems busy, I will write a better commit message myself and
> send it to Linus next week.

Sorry, very hectic week here last week.  I'll try to get the import
bit split off, collect the acks/reviewed-by tags, and resend a v2 of
the series this week.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply

* KASAN: slab-out-of-bounds Read in sctp_inq_pop
From: syzbot @ 2019-08-26 17:14 UTC (permalink / raw)
  To: davem, linux-kernel, linux-sctp, marcelo.leitner, netdev, nhorman,
	syzkaller-bugs, vyasevich

Hello,

syzbot found the following crash on:

HEAD commit:    9733a7c6 Add linux-next specific files for 20190823
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=143ec11e600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f6c78a1438582bd1
dashboard link: https://syzkaller.appspot.com/bug?extid=3ca06c5cb35ee3fc1f89
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+3ca06c5cb35ee3fc1f89@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-out-of-bounds in sctp_inq_pop+0xafd/0xd80  
net/sctp/inqueue.c:201
Read of size 2 at addr ffff8880a4e37222 by task syz-executor.3/32407

CPU: 1 PID: 32407 Comm: syz-executor.3 Not tainted 5.3.0-rc5-next-20190823  
#72
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
  __kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
  kasan_report+0x12/0x17 mm/kasan/common.c:610
  __asan_report_load2_noabort+0x14/0x20 mm/kasan/generic_report.c:130
  sctp_inq_pop+0xafd/0xd80 net/sctp/inqueue.c:201
  sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:335
  sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
  sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
  sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
  ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
  ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
  NF_HOOK include/linux/netfilter.h:305 [inline]
  NF_HOOK include/linux/netfilter.h:299 [inline]
  ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
  dst_input include/net/dst.h:442 [inline]
  ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
  ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline]
  ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282
  ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316
  __netif_receive_skb_list_ptype net/core/dev.c:5049 [inline]
  __netif_receive_skb_list_core+0x1a2/0x9d0 net/core/dev.c:5087
  __netif_receive_skb_list net/core/dev.c:5149 [inline]
  netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244
  gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757
  gro_normal_list net/core/dev.c:5755 [inline]
  gro_normal_one net/core/dev.c:5769 [inline]
  napi_frags_finish net/core/dev.c:5782 [inline]
  napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855
  tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974
  tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020
  call_write_iter include/linux/fs.h:1890 [inline]
  do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
  do_iter_write fs/read_write.c:976 [inline]
  do_iter_write+0x17b/0x380 fs/read_write.c:957
  vfs_writev+0x1b3/0x2f0 fs/read_write.c:1021
  do_writev+0x15b/0x330 fs/read_write.c:1064
  __do_sys_writev fs/read_write.c:1137 [inline]
  __se_sys_writev fs/read_write.c:1134 [inline]
  __x64_sys_writev+0x75/0xb0 fs/read_write.c:1134
  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459731
Code: 75 14 b8 14 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 34 b9 fb ff c3 48  
83 ec 08 e8 fa 2c 00 00 48 89 04 24 b8 14 00 00 00 0f 05 <48> 8b 3c 24 48  
89 c2 e8 43 2d 00 00 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007fb4cd361ba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 000000000000002a RCX: 0000000000459731
RDX: 0000000000000001 RSI: 00007fb4cd361c00 RDI: 00000000000000f0
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 00007fb4cd3626d4
R13: 00000000004c87e3 R14: 00000000004df640 R15: 00000000ffffffff

Allocated by task 32407:
  save_stack+0x23/0x90 mm/kasan/common.c:69
  set_track mm/kasan/common.c:77 [inline]
  __kasan_kmalloc mm/kasan/common.c:486 [inline]
  __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:459
  kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:494
  slab_post_alloc_hook mm/slab.h:584 [inline]
  slab_alloc mm/slab.c:3319 [inline]
  kmem_cache_alloc+0x121/0x710 mm/slab.c:3483
  __build_skb+0x26/0x70 net/core/skbuff.c:310
  __napi_alloc_skb+0x1d2/0x300 net/core/skbuff.c:523
  napi_alloc_skb include/linux/skbuff.h:2801 [inline]
  napi_get_frags net/core/dev.c:5742 [inline]
  napi_get_frags+0x65/0x140 net/core/dev.c:5737
  tun_napi_alloc_frags drivers/net/tun.c:1473 [inline]
  tun_get_user+0x16bd/0x3fa0 drivers/net/tun.c:1834
  tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020
  call_write_iter include/linux/fs.h:1890 [inline]
  do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
  do_iter_write fs/read_write.c:976 [inline]
  do_iter_write+0x17b/0x380 fs/read_write.c:957
  vfs_writev+0x1b3/0x2f0 fs/read_write.c:1021
  do_writev+0x15b/0x330 fs/read_write.c:1064
  __do_sys_writev fs/read_write.c:1137 [inline]
  __se_sys_writev fs/read_write.c:1134 [inline]
  __x64_sys_writev+0x75/0xb0 fs/read_write.c:1134
  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 3891:
  save_stack+0x23/0x90 mm/kasan/common.c:69
  set_track mm/kasan/common.c:77 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/common.c:448
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:456
  __cache_free mm/slab.c:3425 [inline]
  kmem_cache_free+0x86/0x320 mm/slab.c:3693
  kfree_skbmem net/core/skbuff.c:623 [inline]
  kfree_skbmem+0xc5/0x150 net/core/skbuff.c:617
  __kfree_skb net/core/skbuff.c:680 [inline]
  consume_skb net/core/skbuff.c:838 [inline]
  consume_skb+0x103/0x3b0 net/core/skbuff.c:832
  skb_free_datagram+0x1b/0x100 net/core/datagram.c:328
  netlink_recvmsg+0x6c6/0xf50 net/netlink/af_netlink.c:1996
  sock_recvmsg_nosec net/socket.c:871 [inline]
  sock_recvmsg net/socket.c:889 [inline]
  sock_recvmsg+0xce/0x110 net/socket.c:885
  ___sys_recvmsg+0x271/0x5a0 net/socket.c:2480
  __sys_recvmsg+0x102/0x1d0 net/socket.c:2537
  __do_sys_recvmsg net/socket.c:2547 [inline]
  __se_sys_recvmsg net/socket.c:2544 [inline]
  __x64_sys_recvmsg+0x78/0xb0 net/socket.c:2544
  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8880a4e37140
  which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 2 bytes to the right of
  224-byte region [ffff8880a4e37140, ffff8880a4e37220)
The buggy address belongs to the page:
page:ffffea0002938dc0 refcount:1 mapcount:0 mapping:ffff88821b6a3a80  
index:0x0
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea000257fa88 ffffea00023a2008 ffff88821b6a3a80
raw: 0000000000000000 ffff8880a4e37000 000000010000000c 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8880a4e37100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
  ffff8880a4e37180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ffff8880a4e37200: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
                                ^
  ffff8880a4e37280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8880a4e37300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply

* Re: [PATCH net] tcp: remove empty skb from write queue in error cases
From: Neal Cardwell @ 2019-08-26 17:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Soheil Hassas Yeganeh, Eric Dumazet,
	Jason Baron, Vladimir Rutsky
In-Reply-To: <20190826161915.81676-1-edumazet@google.com>

On Mon, Aug 26, 2019 at 12:19 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Vladimir Rutsky reported stuck TCP sessions after memory pressure
> events. Edge Trigger epoll() user would never receive an EPOLLOUT
> notification allowing them to retry a sendmsg().
>
> Jason tested the case of sk_stream_alloc_skb() returning NULL,
> but there are other paths that could lead both sendmsg() and sendpage()
> to return -1 (EAGAIN), with an empty skb queued on the write queue.
>
> This patch makes sure we remove this empty skb so that
> Jason code can detect that the queue is empty, and
> call sk->sk_write_space(sk) accordingly.
>
> Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Reported-by: Vladimir Rutsky <rutsky@google.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

Nice detective work. :-) Thanks, Eric!

neal

^ permalink raw reply

* [PATCH v1 1/3] can: mcp251x: Use devm_clk_get_optional() to get the input clock
From: Andy Shevchenko @ 2019-08-26 17:26 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, linux-can,
	David S. Miller, netdev
  Cc: Andy Shevchenko

Simplify the code which fetches the input clock by using
devm_clk_get_optional(). This comes with a small functional change: previously
all errors were ignored when platform data is present. Now all errors are
treated as errors. If no input clock is present devm_clk_get_optional() will
return NULL instead of an error which matches the behavior of the old code.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/net/can/spi/mcp251x.c | 30 ++++++++++++------------------
 1 file changed, 12 insertions(+), 18 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index 58992fd61cb9..e04b578f2b1f 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -1014,15 +1014,13 @@ static int mcp251x_can_probe(struct spi_device *spi)
 	struct clk *clk;
 	int freq, ret;
 
-	clk = devm_clk_get(&spi->dev, NULL);
-	if (IS_ERR(clk)) {
-		if (pdata)
-			freq = pdata->oscillator_frequency;
-		else
-			return PTR_ERR(clk);
-	} else {
-		freq = clk_get_rate(clk);
-	}
+	clk = devm_clk_get_optional(&spi->dev, NULL);
+	if (IS_ERR(clk))
+		return PTR_ERR(clk);
+
+	freq = clk_get_rate(clk);
+	if (freq == 0 && pdata)
+		freq = pdata->oscillator_frequency;
 
 	/* Sanity check */
 	if (freq < 1000000 || freq > 25000000)
@@ -1033,11 +1031,9 @@ static int mcp251x_can_probe(struct spi_device *spi)
 	if (!net)
 		return -ENOMEM;
 
-	if (!IS_ERR(clk)) {
-		ret = clk_prepare_enable(clk);
-		if (ret)
-			goto out_free;
-	}
+	ret = clk_prepare_enable(clk);
+	if (ret)
+		goto out_free;
 
 	net->netdev_ops = &mcp251x_netdev_ops;
 	net->flags |= IFF_ECHO;
@@ -1122,8 +1118,7 @@ static int mcp251x_can_probe(struct spi_device *spi)
 	mcp251x_power_enable(priv->power, 0);
 
 out_clk:
-	if (!IS_ERR(clk))
-		clk_disable_unprepare(clk);
+	clk_disable_unprepare(clk);
 
 out_free:
 	free_candev(net);
@@ -1141,8 +1136,7 @@ static int mcp251x_can_remove(struct spi_device *spi)
 
 	mcp251x_power_enable(priv->power, 0);
 
-	if (!IS_ERR(priv->clk))
-		clk_disable_unprepare(priv->clk);
+	clk_disable_unprepare(priv->clk);
 
 	free_candev(net);
 
-- 
2.23.0.rc1


^ permalink raw reply related

* [PATCH v1 3/3] can: mcp251x: Call wrapper instead of regulator_disable()
From: Andy Shevchenko @ 2019-08-26 17:26 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, linux-can,
	David S. Miller, netdev
  Cc: Andy Shevchenko
In-Reply-To: <20190826172623.79378-1-andriy.shevchenko@linux.intel.com>

There is no need to check for regulator presence in the ->suspend()
since a wrapper does it for us. Due to this we may unconditionally set
AFTER_SUSPEND_POWER flag.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/net/can/spi/mcp251x.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index 0b7e743ca0a0..6ee0ea51399a 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -1162,10 +1162,8 @@ static int __maybe_unused mcp251x_can_suspend(struct device *dev)
 		priv->after_suspend = AFTER_SUSPEND_DOWN;
 	}
 
-	if (!IS_ERR_OR_NULL(priv->power)) {
-		regulator_disable(priv->power);
-		priv->after_suspend |= AFTER_SUSPEND_POWER;
-	}
+	mcp251x_power_enable(priv->power, 0);
+	priv->after_suspend |= AFTER_SUSPEND_POWER;
 
 	return 0;
 }
-- 
2.23.0.rc1


^ permalink raw reply related

* [PATCH v1 2/3] can: mcp251x: Make use of device property API
From: Andy Shevchenko @ 2019-08-26 17:26 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, linux-can,
	David S. Miller, netdev
  Cc: Andy Shevchenko
In-Reply-To: <20190826172623.79378-1-andriy.shevchenko@linux.intel.com>

Make use of device property API in this driver so that both OF based
system and ACPI based system can use this driver.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 drivers/net/can/spi/mcp251x.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index e04b578f2b1f..0b7e743ca0a0 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -53,8 +53,7 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/netdevice.h>
-#include <linux/of.h>
-#include <linux/of_device.h>
+#include <linux/property.h>
 #include <linux/platform_device.h>
 #include <linux/slab.h>
 #include <linux/spi/spi.h>
@@ -914,7 +913,7 @@ static int mcp251x_open(struct net_device *net)
 	priv->tx_skb = NULL;
 	priv->tx_len = 0;
 
-	if (!spi->dev.of_node)
+	if (!dev_fwnode(&spi->dev))
 		flags = IRQF_TRIGGER_FALLING;
 
 	ret = request_threaded_irq(spi->irq, NULL, mcp251x_can_ist,
@@ -1006,8 +1005,7 @@ MODULE_DEVICE_TABLE(spi, mcp251x_id_table);
 
 static int mcp251x_can_probe(struct spi_device *spi)
 {
-	const struct of_device_id *of_id = of_match_device(mcp251x_of_match,
-							   &spi->dev);
+	const void *match = device_get_match_data(&spi->dev);
 	struct mcp251x_platform_data *pdata = dev_get_platdata(&spi->dev);
 	struct net_device *net;
 	struct mcp251x_priv *priv;
@@ -1044,8 +1042,8 @@ static int mcp251x_can_probe(struct spi_device *spi)
 	priv->can.clock.freq = freq / 2;
 	priv->can.ctrlmode_supported = CAN_CTRLMODE_3_SAMPLES |
 		CAN_CTRLMODE_LOOPBACK | CAN_CTRLMODE_LISTENONLY;
-	if (of_id)
-		priv->model = (enum mcp251x_model)of_id->data;
+	if (match)
+		priv->model = (enum mcp251x_model)match;
 	else
 		priv->model = spi_get_device_id(spi)->driver_data;
 	priv->net = net;
-- 
2.23.0.rc1


^ permalink raw reply related

* Re: [PATCH net-next v4 6/6] net: dsa: mv88e6xxx: fully support SERDES on Topaz family
From: Marek Behun @ 2019-08-26 17:27 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Vivien Didelot, Florian Fainelli, Vladimir Oltean
In-Reply-To: <20190826153830.GE2168@lunn.ch>

On Mon, 26 Aug 2019 17:38:30 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> > +static int mv88e6xxx_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
> > +				    phy_interface_t mode, bool allow_over_2500,
> > +				    bool make_cmode_writable)  
> 
> I don't like these two parameters. The caller of this function can do
> the check for allow_over_2500 and error out before calling this.
> 
> Is make_cmode_writable something that could be done once at probe and
> then forgotten about? Or is it needed before every write? At least
> move it into the specific port_set_cmode() that requires it.

It can be done once at probe. At first I thought about doing this in
setup_errata, but this is not an erratum. So shall I create a new
method for this in chip operations structure? Something like
port_additional_setup() ?

Marek

^ permalink raw reply

* Re: [PATCH net-next v4 6/6] net: dsa: mv88e6xxx: fully support SERDES on Topaz family
From: Marek Behun @ 2019-08-26 17:31 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Vivien Didelot, Florian Fainelli, Vladimir Oltean
In-Reply-To: <20190826153830.GE2168@lunn.ch>

On Mon, 26 Aug 2019 17:38:30 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> > +static int mv88e6xxx_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
> > +				    phy_interface_t mode, bool allow_over_2500,
> > +				    bool make_cmode_writable)  
> 
> I don't like these two parameters. The caller of this function can do
> the check for allow_over_2500 and error out before calling this.
> 
> Is make_cmode_writable something that could be done once at probe and
> then forgotten about? Or is it needed before every write? At least
> move it into the specific port_set_cmode() that requires it.
> 
> Thanks
> 	Andrew

Btw those two additional parameters were modeled as in
  static int mv88e6xxx_port_set_speed(struct mv88e6xxx_chip *chip,
                                      int port, int speed, bool alt_bit,
                                      bool force_bit);

^ permalink raw reply

* [PATCH v1 net-next 3/4] net: stmmac: add EHL RGMII 1Gbps PCI info and PCI ID
From: Voon Weifeng @ 2019-08-27  1:38 UTC (permalink / raw)
  To: David S. Miller, Maxime Coquelin
  Cc: netdev, linux-kernel, Jose Abreu, Giuseppe Cavallaro, Andrew Lunn,
	Alexandre Torgue, Ong Boon Leong, Voon Weifeng
In-Reply-To: <1566869891-29239-1-git-send-email-weifeng.voon@intel.com>

Added EHL RGMII 1Gbps PCI ID. Different MII and speed will have
different PCI ID.

Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index edb76408308b..e969dc9bb9f0 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -213,6 +213,19 @@ static int ehl_sgmii_data(struct pci_dev *pdev,
 	.setup = ehl_sgmii_data,
 };
 
+static int ehl_rgmii_data(struct pci_dev *pdev,
+			  struct plat_stmmacenet_data *plat)
+{
+	plat->bus_id = 1;
+	plat->phy_addr = 0;
+	plat->interface = PHY_INTERFACE_MODE_RGMII;
+	return ehl_common_data(pdev, plat);
+}
+
+static struct stmmac_pci_info ehl_rgmii1g_pci_info = {
+	.setup = ehl_rgmii_data,
+};
+
 static int tgl_common_data(struct pci_dev *pdev,
 			   struct plat_stmmacenet_data *plat)
 {
@@ -481,6 +494,7 @@ static int __maybe_unused stmmac_pci_resume(struct device *dev)
 
 #define STMMAC_QUARK_ID  0x0937
 #define STMMAC_DEVICE_ID 0x1108
+#define STMMAC_EHL_RGMII1G_ID	0x4b30
 #define STMMAC_EHL_SGMII1G_ID	0x4b31
 #define STMMAC_TGL_SGMII1G_ID	0xa0ac
 
@@ -493,6 +507,7 @@ static int __maybe_unused stmmac_pci_resume(struct device *dev)
 	STMMAC_DEVICE(STMMAC, STMMAC_DEVICE_ID, stmmac_pci_info),
 	STMMAC_DEVICE(STMICRO, PCI_DEVICE_ID_STMICRO_MAC, stmmac_pci_info),
 	STMMAC_DEVICE(INTEL, STMMAC_QUARK_ID, quark_pci_info),
+	STMMAC_DEVICE(INTEL, STMMAC_EHL_RGMII1G_ID, ehl_rgmii1g_pci_info),
 	STMMAC_DEVICE(INTEL, STMMAC_EHL_SGMII1G_ID, ehl_sgmii1g_pci_info),
 	STMMAC_DEVICE(INTEL, STMMAC_TGL_SGMII1G_ID, tgl_sgmii1g_pci_info),
 	{}
-- 
1.9.1


^ permalink raw reply related

* [PATCH v1 net-next 1/4] net: stmmac: add EHL SGMII 1Gbps PCI info and PCI ID
From: Voon Weifeng @ 2019-08-27  1:38 UTC (permalink / raw)
  To: David S. Miller, Maxime Coquelin
  Cc: netdev, linux-kernel, Jose Abreu, Giuseppe Cavallaro, Andrew Lunn,
	Alexandre Torgue, Ong Boon Leong, Voon Weifeng
In-Reply-To: <1566869891-29239-1-git-send-email-weifeng.voon@intel.com>

Added EHL SGMII 1Gbps PCI ID. Different MII and speed will have
different PCI ID.

Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 107 +++++++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index d5d08e11c353..f6930e02f578 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -108,6 +108,111 @@ static int stmmac_default_data(struct pci_dev *pdev,
 	.setup = stmmac_default_data,
 };
 
+static int intel_mgbe_common_data(struct pci_dev *pdev,
+				  struct plat_stmmacenet_data *plat)
+{
+	int i;
+
+	plat->clk_csr = 5;
+	plat->has_gmac = 0;
+	plat->has_gmac4 = 1;
+	plat->force_sf_dma_mode = 0;
+	plat->tso_en = 1;
+
+	plat->rx_sched_algorithm = MTL_RX_ALGORITHM_SP;
+
+	for (i = 0; i < plat->rx_queues_to_use; i++) {
+		plat->rx_queues_cfg[i].mode_to_use = MTL_QUEUE_DCB;
+		plat->rx_queues_cfg[i].chan = i;
+
+		/* Disable Priority config by default */
+		plat->rx_queues_cfg[i].use_prio = false;
+
+		/* Disable RX queues routing by default */
+		plat->rx_queues_cfg[i].pkt_route = 0x0;
+	}
+
+	for (i = 0; i < plat->tx_queues_to_use; i++) {
+		plat->tx_queues_cfg[i].mode_to_use = MTL_QUEUE_DCB;
+
+		/* Disable Priority config by default */
+		plat->tx_queues_cfg[i].use_prio = false;
+	}
+
+	/* FIFO size is 4096 bytes for 1 tx/rx queue */
+	plat->tx_fifo_size = plat->tx_queues_to_use * 4096;
+	plat->rx_fifo_size = plat->rx_queues_to_use * 4096;
+
+	plat->tx_sched_algorithm = MTL_TX_ALGORITHM_WRR;
+	plat->tx_queues_cfg[0].weight = 0x09;
+	plat->tx_queues_cfg[1].weight = 0x0A;
+	plat->tx_queues_cfg[2].weight = 0x0B;
+	plat->tx_queues_cfg[3].weight = 0x0C;
+	plat->tx_queues_cfg[4].weight = 0x0D;
+	plat->tx_queues_cfg[5].weight = 0x0E;
+	plat->tx_queues_cfg[6].weight = 0x0F;
+	plat->tx_queues_cfg[7].weight = 0x10;
+
+	plat->mdio_bus_data->phy_mask = 0;
+
+	plat->dma_cfg->pbl = 32;
+	plat->dma_cfg->pblx8 = true;
+	plat->dma_cfg->fixed_burst = 0;
+	plat->dma_cfg->mixed_burst = 0;
+	plat->dma_cfg->aal = 0;
+
+	plat->axi = devm_kzalloc(&pdev->dev, sizeof(*plat->axi),
+				 GFP_KERNEL);
+	if (!plat->axi)
+		return -ENOMEM;
+
+	plat->axi->axi_lpi_en = 0;
+	plat->axi->axi_xit_frm = 0;
+	plat->axi->axi_wr_osr_lmt = 1;
+	plat->axi->axi_rd_osr_lmt = 1;
+	plat->axi->axi_blen[0] = 4;
+	plat->axi->axi_blen[1] = 8;
+	plat->axi->axi_blen[2] = 16;
+
+	/* Set default value for multicast hash bins */
+	plat->multicast_filter_bins = HASH_TABLE_SIZE;
+
+	/* Set default value for unicast filter entries */
+	plat->unicast_filter_entries = 1;
+
+	/* Set the maxmtu to a default of JUMBO_LEN */
+	plat->maxmtu = JUMBO_LEN;
+
+	return 0;
+}
+
+static int ehl_common_data(struct pci_dev *pdev,
+			   struct plat_stmmacenet_data *plat)
+{
+	int ret;
+
+	plat->rx_queues_to_use = 8;
+	plat->tx_queues_to_use = 8;
+	ret = intel_mgbe_common_data(pdev, plat);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int ehl_sgmii_data(struct pci_dev *pdev,
+			  struct plat_stmmacenet_data *plat)
+{
+	plat->bus_id = 1;
+	plat->phy_addr = 0;
+	plat->interface = PHY_INTERFACE_MODE_SGMII;
+	return ehl_common_data(pdev, plat);
+}
+
+static struct stmmac_pci_info ehl_sgmii1g_pci_info = {
+	.setup = ehl_sgmii_data,
+};
+
 static const struct stmmac_pci_func_data galileo_stmmac_func_data[] = {
 	{
 		.func = 6,
@@ -349,6 +454,7 @@ static int __maybe_unused stmmac_pci_resume(struct device *dev)
 
 #define STMMAC_QUARK_ID  0x0937
 #define STMMAC_DEVICE_ID 0x1108
+#define STMMAC_EHL_SGMII1G_ID	0x4b31
 
 #define STMMAC_DEVICE(vendor_id, dev_id, info)	{	\
 	PCI_VDEVICE(vendor_id, dev_id),			\
@@ -359,6 +465,7 @@ static int __maybe_unused stmmac_pci_resume(struct device *dev)
 	STMMAC_DEVICE(STMMAC, STMMAC_DEVICE_ID, stmmac_pci_info),
 	STMMAC_DEVICE(STMICRO, PCI_DEVICE_ID_STMICRO_MAC, stmmac_pci_info),
 	STMMAC_DEVICE(INTEL, STMMAC_QUARK_ID, quark_pci_info),
+	STMMAC_DEVICE(INTEL, STMMAC_EHL_SGMII1G_ID, ehl_sgmii1g_pci_info),
 	{}
 };
 
-- 
1.9.1


^ permalink raw reply related

* [PATCH v1 net-next 2/4] net: stmmac: add TGL SGMII 1Gbps PCI info and PCI ID
From: Voon Weifeng @ 2019-08-27  1:38 UTC (permalink / raw)
  To: David S. Miller, Maxime Coquelin
  Cc: netdev, linux-kernel, Jose Abreu, Giuseppe Cavallaro, Andrew Lunn,
	Alexandre Torgue, Ong Boon Leong, Voon Weifeng
In-Reply-To: <1566869891-29239-1-git-send-email-weifeng.voon@intel.com>

Added TGL SGMII 1Gbps PCI ID. Different MII and speed will have
different PCI ID.

Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 29 ++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index f6930e02f578..edb76408308b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -213,6 +213,33 @@ static int ehl_sgmii_data(struct pci_dev *pdev,
 	.setup = ehl_sgmii_data,
 };
 
+static int tgl_common_data(struct pci_dev *pdev,
+			   struct plat_stmmacenet_data *plat)
+{
+	int ret;
+
+	plat->rx_queues_to_use = 6;
+	plat->tx_queues_to_use = 4;
+	ret = intel_mgbe_common_data(pdev, plat);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int tgl_sgmii_data(struct pci_dev *pdev,
+			  struct plat_stmmacenet_data *plat)
+{
+	plat->bus_id = 1;
+	plat->phy_addr = 0;
+	plat->interface = PHY_INTERFACE_MODE_SGMII;
+	return tgl_common_data(pdev, plat);
+}
+
+static struct stmmac_pci_info tgl_sgmii1g_pci_info = {
+	.setup = tgl_sgmii_data,
+};
+
 static const struct stmmac_pci_func_data galileo_stmmac_func_data[] = {
 	{
 		.func = 6,
@@ -455,6 +482,7 @@ static int __maybe_unused stmmac_pci_resume(struct device *dev)
 #define STMMAC_QUARK_ID  0x0937
 #define STMMAC_DEVICE_ID 0x1108
 #define STMMAC_EHL_SGMII1G_ID	0x4b31
+#define STMMAC_TGL_SGMII1G_ID	0xa0ac
 
 #define STMMAC_DEVICE(vendor_id, dev_id, info)	{	\
 	PCI_VDEVICE(vendor_id, dev_id),			\
@@ -466,6 +494,7 @@ static int __maybe_unused stmmac_pci_resume(struct device *dev)
 	STMMAC_DEVICE(STMICRO, PCI_DEVICE_ID_STMICRO_MAC, stmmac_pci_info),
 	STMMAC_DEVICE(INTEL, STMMAC_QUARK_ID, quark_pci_info),
 	STMMAC_DEVICE(INTEL, STMMAC_EHL_SGMII1G_ID, ehl_sgmii1g_pci_info),
+	STMMAC_DEVICE(INTEL, STMMAC_TGL_SGMII1G_ID, tgl_sgmii1g_pci_info),
 	{}
 };
 
-- 
1.9.1


^ permalink raw reply related

* [PATCH v1 net-next 4/4] net: stmmac: setup higher frequency clk support for EHL & TGL
From: Voon Weifeng @ 2019-08-27  1:38 UTC (permalink / raw)
  To: David S. Miller, Maxime Coquelin
  Cc: netdev, linux-kernel, Jose Abreu, Giuseppe Cavallaro, Andrew Lunn,
	Alexandre Torgue, Ong Boon Leong, Voon Weifeng
In-Reply-To: <1566869891-29239-1-git-send-email-weifeng.voon@intel.com>

EHL DW EQOS is running on a 200MHz clock. Setting up stmmac-clk,
ptp clock and ptp_max_adj to 200MHz.

Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 21 +++++++++++++++++++++
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c |  3 +++
 include/linux/stmmac.h                           |  1 +
 3 files changed, 25 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index e969dc9bb9f0..20906287b6d4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -9,6 +9,7 @@
   Author: Giuseppe Cavallaro <peppe.cavallaro@st.com>
 *******************************************************************************/
 
+#include <linux/clk-provider.h>
 #include <linux/pci.h>
 #include <linux/dmi.h>
 
@@ -174,6 +175,19 @@ static int intel_mgbe_common_data(struct pci_dev *pdev,
 	plat->axi->axi_blen[1] = 8;
 	plat->axi->axi_blen[2] = 16;
 
+	plat->ptp_max_adj = plat->clk_ptp_rate;
+
+	/* Set system clock */
+	plat->stmmac_clk = clk_register_fixed_rate(&pdev->dev,
+						   "stmmac-clk", NULL, 0,
+						   plat->clk_ptp_rate);
+
+	if (IS_ERR(plat->stmmac_clk)) {
+		dev_warn(&pdev->dev, "Fail to register stmmac-clk\n");
+		plat->stmmac_clk = NULL;
+	}
+	clk_prepare_enable(plat->stmmac_clk);
+
 	/* Set default value for multicast hash bins */
 	plat->multicast_filter_bins = HASH_TABLE_SIZE;
 
@@ -193,6 +207,7 @@ static int ehl_common_data(struct pci_dev *pdev,
 
 	plat->rx_queues_to_use = 8;
 	plat->tx_queues_to_use = 8;
+	plat->clk_ptp_rate = 200000000;
 	ret = intel_mgbe_common_data(pdev, plat);
 	if (ret)
 		return ret;
@@ -233,6 +248,7 @@ static int tgl_common_data(struct pci_dev *pdev,
 
 	plat->rx_queues_to_use = 6;
 	plat->tx_queues_to_use = 4;
+	plat->clk_ptp_rate = 200000000;
 	ret = intel_mgbe_common_data(pdev, plat);
 	if (ret)
 		return ret;
@@ -438,10 +454,15 @@ static int stmmac_pci_probe(struct pci_dev *pdev,
  */
 static void stmmac_pci_remove(struct pci_dev *pdev)
 {
+	struct net_device *ndev = dev_get_drvdata(&pdev->dev);
+	struct stmmac_priv *priv = netdev_priv(ndev);
 	int i;
 
 	stmmac_dvr_remove(&pdev->dev);
 
+	if (priv->plat->stmmac_clk)
+		clk_unregister_fixed_rate(priv->plat->stmmac_clk);
+
 	for (i = 0; i <= PCI_STD_RESOURCE_END; i++) {
 		if (pci_resource_len(pdev, i) == 0)
 			continue;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
index c48224973a37..173493db038c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
@@ -194,6 +194,9 @@ void stmmac_ptp_register(struct stmmac_priv *priv)
 		priv->pps[i].available = true;
 	}
 
+	if (priv->plat->ptp_max_adj)
+		stmmac_ptp_clock_ops.max_adj = priv->plat->ptp_max_adj;
+
 	stmmac_ptp_clock_ops.n_per_out = priv->dma_cap.pps_out_num;
 
 	spin_lock_init(&priv->ptp_lock);
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 5cc6b6faf359..7ad7ae35cf88 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -168,6 +168,7 @@ struct plat_stmmacenet_data {
 	struct clk *clk_ptp_ref;
 	unsigned int clk_ptp_rate;
 	unsigned int clk_ref_rate;
+	s32 ptp_max_adj;
 	struct reset_control *stmmac_rst;
 	struct stmmac_axi *axi;
 	int has_gmac4;
-- 
1.9.1


^ permalink raw reply related

* [PATCH v1 net-next 0/4] Add EHL and TGL PCI info and PCI ID
From: Voon Weifeng @ 2019-08-27  1:38 UTC (permalink / raw)
  To: David S. Miller, Maxime Coquelin
  Cc: netdev, linux-kernel, Jose Abreu, Giuseppe Cavallaro, Andrew Lunn,
	Alexandre Torgue, Ong Boon Leong, Voon Weifeng

In order to keep PCI info simple and neat, this patch series have
introduced a 3 hierarchy of struct. First layer will be the
intel_mgbe_common_data struct which keeps all Intel common configuration.
Second layer will be xxx_common_data which keeps all the different Intel
microarchitecture, e.g tgl, ehl. The third layer will be configuration
that tied to the PCI ID only based on speed and RGMII/SGMII interface.

EHL and TGL will also having a higher system clock which is 200Mhz.

Voon Weifeng (4):
  net: stmmac: add EHL SGMII 1Gbps PCI info and PCI ID
  net: stmmac: add TGL SGMII 1Gbps PCI info and PCI ID
  net: stmmac: add EHL RGMII 1Gbps PCI info and PCI ID
  net: stmmac: setup higher frequency clk support for EHL & TGL

 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 172 +++++++++++++++++++++++
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c |   3 +
 include/linux/stmmac.h                           |   1 +
 3 files changed, 176 insertions(+)

-- 
1.9.1


^ permalink raw reply

* Re: [PATCH net-next v4 6/6] net: dsa: mv88e6xxx: fully support SERDES on Topaz family
From: Vivien Didelot @ 2019-08-26 17:44 UTC (permalink / raw)
  To: Marek Behun; +Cc: Andrew Lunn, netdev, Florian Fainelli, Vladimir Oltean
In-Reply-To: <20190826192717.50738e37@nic.cz>

On Mon, 26 Aug 2019 19:27:17 +0200, Marek Behun <marek.behun@nic.cz> wrote:
> On Mon, 26 Aug 2019 17:38:30 +0200
> Andrew Lunn <andrew@lunn.ch> wrote:
> 
> > > +static int mv88e6xxx_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
> > > +				    phy_interface_t mode, bool allow_over_2500,
> > > +				    bool make_cmode_writable)  
> > 
> > I don't like these two parameters. The caller of this function can do
> > the check for allow_over_2500 and error out before calling this.
> > 
> > Is make_cmode_writable something that could be done once at probe and
> > then forgotten about? Or is it needed before every write? At least
> > move it into the specific port_set_cmode() that requires it.
> 
> It can be done once at probe. At first I thought about doing this in
> setup_errata, but this is not an erratum. So shall I create a new
> method for this in chip operations structure? Something like
> port_additional_setup() ?

No. Those "setup" or "config" functions are likely to do everything and
become a mess, thus unmaintainable. Operations must be specific.

^ permalink raw reply

* [PATCH v1 net-next] net: stmmac: Add support for MDIO interrupts
From: Voon Weifeng @ 2019-08-27  1:45 UTC (permalink / raw)
  To: David S. Miller, Maxime Coquelin
  Cc: netdev, linux-kernel, Jose Abreu, Giuseppe Cavallaro, Andrew Lunn,
	Alexandre Torgue, Ong Boon Leong, Voon Weifeng

From: "Chuah, Kim Tatt" <kim.tatt.chuah@intel.com>

DW EQoS v5.xx controllers added capability for interrupt generation
when MDIO interface is done (GMII Busy bit is cleared).
This patch adds support for this interrupt on supported HW to avoid
polling on GMII Busy bit.

stmmac_mdio_read() & stmmac_mdio_write() will sleep until wake_up() is
called by the interrupt handler.

Reviewed-by: Voon Weifeng <weifeng.voon@intel.com>
Reviewed-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
Reviewed-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Chuah, Kim Tatt <kim.tatt.chuah@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 49aa56ca09cc..775a1c114b1a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -448,6 +448,8 @@ struct mac_device_info {
 	unsigned int pcs;
 	unsigned int pmt;
 	unsigned int ps;
+	bool mdio_intr_en;
+	wait_queue_head_t mdio_busy_wait;
 };
 
 struct stmmac_rx_routing {
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
index 2ed11a581d80..1be6a8a88b8f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4.h
@@ -106,6 +106,7 @@ enum dwmac4_irq_status {
 	mmc_irq = 0x00000100,
 	lpi_irq = 0x00000020,
 	pmt_irq = 0x00000010,
+	mdio_irq = 0x00040000,
 };
 
 /* MAC PMT bitmap */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index fc9954e4a772..54adad616b90 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -59,6 +59,9 @@ static void dwmac4_core_init(struct mac_device_info *hw,
 	if (hw->pcs)
 		value |= GMAC_PCS_IRQ_DEFAULT;
 
+	if (hw->mdio_intr_en)
+		value |= GMAC_INT_MDIO_EN;
+
 	writel(value, ioaddr + GMAC_INT_EN);
 }
 
@@ -629,6 +632,9 @@ static int dwmac4_irq_status(struct mac_device_info *hw,
 			x->irq_rx_path_exit_lpi_mode_n++;
 	}
 
+	if (intr_status & mdio_irq)
+		wake_up(&hw->mdio_busy_wait);
+
 	dwmac_pcs_isr(ioaddr, GMAC_PCS_BASE, intr_status, x);
 	if (intr_status & PCS_RGSMIIIS_IRQ)
 		dwmac4_phystatus(ioaddr, x);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac5.h b/drivers/net/ethernet/stmicro/stmmac/dwmac5.h
index 775db776b3cc..48550d617b01 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac5.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac5.h
@@ -72,6 +72,9 @@
 #define TCEIE				BIT(0)
 #define DMA_ECC_INT_STATUS		0x00001088
 
+/* MDIO interrupt enable in MAC_Interrupt_Enable register */
+#define GMAC_INT_MDIO_EN		BIT(18)
+
 int dwmac5_safety_feat_config(void __iomem *ioaddr, unsigned int asp);
 int dwmac5_safety_feat_irq_status(struct net_device *ndev,
 		void __iomem *ioaddr, unsigned int asp,
diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index 3af2e5015245..11c7f92e99b4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -73,6 +73,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
 	bool gmac;
 	bool gmac4;
 	bool xgmac;
+	bool mdio_intr_en;
 	u32 min_id;
 	const struct stmmac_regs_off regs;
 	const void *desc;
@@ -90,6 +91,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
 		.gmac = false,
 		.gmac4 = false,
 		.xgmac = false,
+		.mdio_intr_en = false,
 		.min_id = 0,
 		.regs = {
 			.ptp_off = PTP_GMAC3_X_OFFSET,
@@ -108,6 +110,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
 		.gmac = true,
 		.gmac4 = false,
 		.xgmac = false,
+		.mdio_intr_en = false,
 		.min_id = 0,
 		.regs = {
 			.ptp_off = PTP_GMAC3_X_OFFSET,
@@ -126,6 +129,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
 		.gmac = false,
 		.gmac4 = true,
 		.xgmac = false,
+		.mdio_intr_en = false,
 		.min_id = 0,
 		.regs = {
 			.ptp_off = PTP_GMAC4_OFFSET,
@@ -144,6 +148,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
 		.gmac = false,
 		.gmac4 = true,
 		.xgmac = false,
+		.mdio_intr_en = false,
 		.min_id = DWMAC_CORE_4_00,
 		.regs = {
 			.ptp_off = PTP_GMAC4_OFFSET,
@@ -162,6 +167,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
 		.gmac = false,
 		.gmac4 = true,
 		.xgmac = false,
+		.mdio_intr_en = false,
 		.min_id = DWMAC_CORE_4_10,
 		.regs = {
 			.ptp_off = PTP_GMAC4_OFFSET,
@@ -180,6 +186,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
 		.gmac = false,
 		.gmac4 = true,
 		.xgmac = false,
+		.mdio_intr_en = true,
 		.min_id = DWMAC_CORE_5_10,
 		.regs = {
 			.ptp_off = PTP_GMAC4_OFFSET,
@@ -198,6 +205,7 @@ static int stmmac_dwmac4_quirks(struct stmmac_priv *priv)
 		.gmac = false,
 		.gmac4 = false,
 		.xgmac = true,
+		.mdio_intr_en = false,
 		.min_id = DWXGMAC_CORE_2_10,
 		.regs = {
 			.ptp_off = PTP_XGMAC_OFFSET,
@@ -276,6 +284,10 @@ int stmmac_hwif_init(struct stmmac_priv *priv)
 		mac->mode = mac->mode ? : entry->mode;
 		mac->tc = mac->tc ? : entry->tc;
 		mac->mmc = mac->mmc ? : entry->mmc;
+		mac->mdio_intr_en = mac->mdio_intr_en ? : entry->mdio_intr_en;
+
+		if (mac->mdio_intr_en)
+			init_waitqueue_head(&mac->mdio_busy_wait);
 
 		priv->hw = mac;
 		priv->ptpaddr = priv->ioaddr + entry->regs.ptp_off;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 40c42637ad75..c9a23218307b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -142,6 +142,15 @@ static int stmmac_xgmac2_mdio_write(struct mii_bus *bus, int phyaddr,
 				  !(tmp & MII_XGMAC_BUSY), 100, 10000);
 }
 
+static bool stmmac_mdio_intr_done(struct mii_bus *bus)
+{
+	struct net_device *ndev = bus->priv;
+	struct stmmac_priv *priv = netdev_priv(ndev);
+	unsigned int mii_address = priv->hw->mii.addr;
+
+	return !(readl(priv->ioaddr + mii_address) & MII_BUSY);
+}
+
 /**
  * stmmac_mdio_read
  * @bus: points to the mii_bus structure
@@ -181,16 +190,26 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg)
 		}
 	}
 
-	if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
-			       100, 10000))
+	if (priv->hw->mdio_intr_en) {
+		if (!wait_event_timeout(priv->hw->mdio_busy_wait,
+					stmmac_mdio_intr_done(bus), HZ / 100))
+			return -EBUSY;
+	} else if (readl_poll_timeout(priv->ioaddr + mii_address, v,
+				      !(v & MII_BUSY), 100, 10000)) {
 		return -EBUSY;
+	}
 
 	writel(data, priv->ioaddr + mii_data);
 	writel(value, priv->ioaddr + mii_address);
 
-	if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
-			       100, 10000))
+	if (priv->hw->mdio_intr_en) {
+		if (!wait_event_timeout(priv->hw->mdio_busy_wait,
+					stmmac_mdio_intr_done(bus), HZ / 100))
+			return -EBUSY;
+	} else if (readl_poll_timeout(priv->ioaddr + mii_address, v,
+				      !(v & MII_BUSY), 100, 10000)) {
 		return -EBUSY;
+	}
 
 	/* Read the data from the MII data register */
 	data = (int)readl(priv->ioaddr + mii_data) & MII_DATA_MASK;
@@ -240,17 +259,30 @@ static int stmmac_mdio_write(struct mii_bus *bus, int phyaddr, int phyreg,
 	}
 
 	/* Wait until any existing MII operation is complete */
-	if (readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
-			       100, 10000))
+	if (priv->hw->mdio_intr_en) {
+		if (!wait_event_timeout(priv->hw->mdio_busy_wait,
+					stmmac_mdio_intr_done(bus), HZ / 100))
+			return -EBUSY;
+	} else if (readl_poll_timeout(priv->ioaddr + mii_address, v,
+				      !(v & MII_BUSY), 100, 10000)) {
 		return -EBUSY;
+	}
 
 	/* Set the MII address register to write */
 	writel(data, priv->ioaddr + mii_data);
 	writel(value, priv->ioaddr + mii_address);
 
 	/* Wait until any existing MII operation is complete */
-	return readl_poll_timeout(priv->ioaddr + mii_address, v, !(v & MII_BUSY),
-				  100, 10000);
+	if (priv->hw->mdio_intr_en) {
+		if (!wait_event_timeout(priv->hw->mdio_busy_wait,
+					stmmac_mdio_intr_done(bus), HZ / 100))
+			return -EBUSY;
+	} else if (readl_poll_timeout(priv->ioaddr + mii_address, v,
+				      !(v & MII_BUSY), 100, 10000)) {
+		return -EBUSY;
+	}
+
+	return 0;
 }
 
 /**
-- 
1.9.1


^ permalink raw reply related

* Re: BUG_ON in skb_segment, after bpf_skb_change_proto was applied
From: Eric Dumazet @ 2019-08-26 17:47 UTC (permalink / raw)
  To: Shmulik Ladkani, Daniel Borkmann, Eric Dumazet, netdev,
	Alexander Duyck
  Cc: Alexei Starovoitov, Yonghong Song, Steffen Klassert, shmulik,
	eyal
In-Reply-To: <20190826170724.25ff616f@pixies>



On 8/26/19 4:07 PM, Shmulik Ladkani wrote:
> Hi,
> 
> In our production systems, running v4.19.y longterm kernels, we hit a
> BUG_ON in 'skb_segment()'. It occurs rarely and although tried, couldn't
> synthetically reproduce.
> 
> In v4.19.41 it crashes at net/core/skbuff.c:3711
> 
> 		while (pos < offset + len) {
> 			if (i >= nfrags) {
> 				i = 0;
> 				nfrags = skb_shinfo(list_skb)->nr_frags;
> 				frag = skb_shinfo(list_skb)->frags;
> 				frag_skb = list_skb;
> 				if (!skb_headlen(list_skb)) {
> 					BUG_ON(!nfrags);
> 				} else {
> 3711:					BUG_ON(!list_skb->head_frag);
> 
> With the accompanying dump:
> 
>  kernel BUG at net/core/skbuff.c:3711!
>  invalid opcode: 0000 [#1] SMP PTI
>  CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 4.19.41-041941-generic #201905080231
>  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
>  RIP: 0010:skb_segment+0xb65/0xda9
>  Code: 89 44 24 60 48 89 4c 24 70 e8 87 b3 ff ff 48 8b 4c 24 70 44 8b 44 24 60 85 c0 44 8b 54 24 4c 0f 84 fc fb ff ff e9 16 fd ff ff <0f> 0b 29 c1 89 ce 09 ca e9 61 ff ff ff 0f 0b 41 8b bf 84 00 00 00
>  RSP: 0018:ffff9e4d79b037c0 EFLAGS: 00010246
>  RAX: ffff9e4d75012ec0 RBX: ffff9e4d74067500 RCX: 0000000000000000
>  RDX: 0000000000480020 RSI: 0000000000000000 RDI: ffff9e4d74e3a200
>  RBP: ffff9e4d79b03898 R08: 0000000000000564 R09: f69d84ecbfe8b972
>  R10: 0000000000000571 R11: a6b66a32f69d84ec R12: 0000000000000564
>  R13: ffff9e4c18d03ef0 R14: 0000000000000000 R15: ffff9e4d74e3a200
>  FS:  0000000000000000(0000) GS:ffff9e4d79b00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00000000007f50d8 CR3: 000000009420a003 CR4: 00000000001606e0
>  Call Trace:
>   <IRQ>
>   tcp_gso_segment+0xf9/0x4e0
>   tcp6_gso_segment+0x5e/0x100
>   ipv6_gso_segment+0x112/0x340
>   skb_mac_gso_segment+0xb9/0x130
>   __skb_gso_segment+0x84/0x190
>   validate_xmit_skb+0x14a/0x2f0
>   validate_xmit_skb_list+0x4b/0x70
>   sch_direct_xmit+0x154/0x390
>   __dev_queue_xmit+0x808/0x920
>   dev_queue_xmit+0x10/0x20
>   neigh_direct_output+0x11/0x20
>   ip6_finish_output2+0x1b9/0x5b0
>   ip6_finish_output+0x13a/0x1b0
>   ip6_output+0x6c/0x110
>   ? ip6_fragment+0xa40/0xa40
>   ip6_forward+0x501/0x810
>   ip6_rcv_finish+0x7a/0x90
>   ipv6_rcv+0x69/0xe0
>   ? nf_hook.part.24+0x10/0x10
>   __netif_receive_skb_core+0x4fa/0xc80
>   ? netif_receive_skb_core+0x20/0x20
>   ? netif_receive_skb_internal+0x45/0xf0
>   ? tcp4_gro_complete+0x86/0x90
>   ? napi_gro_complete+0x53/0x90
>   __netif_receive_skb_one_core+0x3b/0x80
>   __netif_receive_skb+0x18/0x60
>   process_backlog+0xb3/0x170
>   net_rx_action+0x130/0x350
>   __do_softirq+0xdc/0x2d4
> 
> To our best knowledge, the packet flow leading to this BUG_ON is:
> 
>   - ingress on eth0 (veth, gro:on), ipv4 udp encapsulated esp
>   - re-ingresss on eth0, after xfrm, decapsulated ipv4 tcp
>   - the skb was GROed (skb_is_gso:true)
>   - ipv4 forwarding to dummy1, where eBPF nat4-to-6 program is attached
>     at TC Egress (calls 'bpf_skb_change_proto()'), then redirect to ingress
>     on same device.
>     NOTE: 'bpf_skb_proto_4_to_6()' mangles 'shinfo->gso_size'

Doing this on an skb with a frag_list is doomed, in current gso_segment() state.

A rewrite  would be needed (I believe I did so at some point, but Herbert Xu fought hard against it)


>   - ingress on dummy1, ipv6 tcp
>   - ipv6 forwarding
>   - egress on tun2 (tun device) that calls:
>     validate_xmit_skb -> ... -> skb_segment BUG_ON
> 
> A similar issue was reported and fixed by Yonghong Song in commit
> 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb").
> 
> However 13acc94eff12 added "BUG_ON(!list_skb->head_frag)" to line 3711,
> and patchwork states:
> 
>     This patch addressed the issue by handling skb_headlen(list_skb) != 0
>     case properly if list_skb->head_frag is true, which is expected in
>     most cases. [1]
> 
> meaning, 13acc94eff12 does not support list_skb->head_frag=0 case.
> 
> Historically, it is claimed that skb_segment is rather intolerant to
> gso_size changes, quote:
> 
>     Eric suggested to shrink gso_size instead to avoid segmentation+fragments.
>     I think its nice idea, but skb_gso_segment makes certain assumptions about
>     nr_frags and gso_size (it can't handle frag size > desired mss). [2]
> 
> Any suggestions how to debug and fix this?
> 
> Could it be that 'bpf_skb_change_proto()' isn't really allowed to
> mangle 'gso_size', and we should somehow enforce a 'skb_segment()' call
> PRIOR translation?
> 
> Appreciate any input and assistance,
> Shmulik
> 
> [1] https://patchwork.ozlabs.org/patch/889166/
> [2] https://patchwork.ozlabs.org/patch/314327/
> 

^ permalink raw reply

* Re: [PATCH net-next v3 00/10] Refactor cls hardware offload API to support rtnl-independent drivers
From: Jakub Kicinski @ 2019-08-26 17:48 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, pablo
In-Reply-To: <20190826134506.9705-1-vladbu@mellanox.com>

On Mon, 26 Aug 2019 16:44:56 +0300, Vlad Buslov wrote:
> Currently, all cls API hardware offloads driver callbacks require caller
> to hold rtnl lock when calling them. This patch set introduces new API
> that allows drivers to register callbacks that are not dependent on rtnl
> lock and unlocked classifiers to offload filters without obtaining rtnl
> lock first, which is intended to allow offloading tc rules in parallel.
> 
> Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
> TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are
> already registered with this flag and only take rtnl lock when qdisc or
> classifier requires it. Classifiers can indicate that their ops
> callbacks don't require caller to hold rtnl lock by setting the
> TCF_PROTO_OPS_DOIT_UNLOCKED flag. Unlocked implementation of flower
> classifier is now upstreamed. However, this implementation still obtains
> rtnl lock before calling hardware offloads API.
> 
> Implement following cls API changes:
> 
> - Introduce new "unlocked_driver_cb" flag to struct flow_block_offload
>   to allow registering and unregistering block hardware offload
>   callbacks that do not require caller to hold rtnl lock. Drivers that
>   doesn't require users of its tc offload callbacks to hold rtnl lock
>   sets the flag to true on block bind/unbind. Internally tcf_block is
>   extended with additional lockeddevcnt counter that is used to count
>   number of devices that require rtnl lock that block is bound to. When
>   this counter is zero, tc_setup_cb_*() functions execute callbacks
>   without obtaining rtnl lock.
> 
> - Extend cls API single hardware rule update tc_setup_cb_call() function
>   with tc_setup_cb_add(), tc_setup_cb_replace(), tc_setup_cb_destroy()
>   and tc_setup_cb_reoffload() functions. These new APIs are needed to
>   move management of block offload counter, filter in hardware counter
>   and flag from classifier implementations to cls API, which is now
>   responsible for managing them in concurrency-safe manner. Access to
>   cb_list from callback execution code is synchronized by obtaining new
>   'cb_lock' rw_semaphore in read mode, which allows executing callbacks
>   in parallel, but excludes any modifications of data from
>   register/unregister code. tcf_block offloads counter type is changed
>   to atomic integer to allow updating the counter concurrently.
> 
> - Extend classifier ops with new ops->hw_add() and ops->hw_del()
>   callbacks which are used to notify unlocked classifiers when filter is
>   successfully added or deleted to hardware without releasing cb_lock.
>   This is necessary to update classifier state atomically with callback
>   list traversal and updating of all relevant counters and allows
>   unlocked classifiers to synchronize with concurrent reoffload without
>   requiring any changes to driver callback API implementations.
> 
> New tc flow_action infrastructure is also modified to allow its user to
> execute without rtnl lock protection. Function tc_setup_flow_action() is
> modified to conditionally obtain rtnl lock before accessing action
> state. Action data that is accessed by reference is either copied or
> reference counted to prevent concurrent action overwrite from
> deallocating it. New function tc_cleanup_flow_action() is introduced to
> cleanup/release all such data obtained by tc_setup_flow_action().
> 
> Flower classifier (only unlocked classifier at the moment) is modified
> to use new cls hardware offloads API and no longer obtains rtnl lock
> before calling it.

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>

^ permalink raw reply

* Re: [PATCH net-next v4 6/6] net: dsa: mv88e6xxx: fully support SERDES on Topaz family
From: Vivien Didelot @ 2019-08-26 17:50 UTC (permalink / raw)
  To: Marek Behun; +Cc: Andrew Lunn, netdev, Florian Fainelli, Vladimir Oltean
In-Reply-To: <20190826193125.4c94662e@nic.cz>

On Mon, 26 Aug 2019 19:31:25 +0200, Marek Behun <marek.behun@nic.cz> wrote:
> On Mon, 26 Aug 2019 17:38:30 +0200
> Andrew Lunn <andrew@lunn.ch> wrote:
> 
> > > +static int mv88e6xxx_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
> > > +				    phy_interface_t mode, bool allow_over_2500,
> > > +				    bool make_cmode_writable)  
> > 
> > I don't like these two parameters. The caller of this function can do
> > the check for allow_over_2500 and error out before calling this.
> > 
> > Is make_cmode_writable something that could be done once at probe and
> > then forgotten about? Or is it needed before every write? At least
> > move it into the specific port_set_cmode() that requires it.
> > 
> > Thanks
> > 	Andrew
> 
> Btw those two additional parameters were modeled as in
>   static int mv88e6xxx_port_set_speed(struct mv88e6xxx_chip *chip,
>                                       int port, int speed, bool alt_bit,
>                                       bool force_bit);

"AltSpeed" and "ForceSpeed" are two bits found in the MAC Control Register
of certain switch models, which can be set by this private helper.

I don't think that is the case for "allow_over_2500" and "make_cmode_writable".

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox