Netdev List


* Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup
From: Kristian Evensen @ 2018-06-14  8:38 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Tobias Hommel, Markus Berner, Network Development,
	Florian Westphal
In-Reply-To: <CAKfDRXjFC4L7Rmv_-nUbuOLLstvid64JxF9ECOL4Dbzn3FZwLA@mail.gmail.com>

Hello,

On Tue, Jun 12, 2018 at 10:29 AM, Kristian Evensen
<kristian.evensen@gmail.com> wrote:
> Thanks for spending time on this. I will see what I can manage in
> terms of a bisect. Our last good kernel was 4.9, so at least it
> narrows the scope down a bit compared to 4.4 or 4.1.

I hope we might have got somewhere. While looking more into ipsec and
4.14, we noticed large performance regressions (roughly 20%) on some
low-powered devices we are also using. We quickly identified the
removal of the flow cache as the "culprit", and the performance
regression is discussed in the netdev-thread for the removal of the
cache ("xfrm: remove flow cache"). For the time being and in order to
restore the performance, we have reverted the patch series removing
the flow cache. When running our tests (on the APU) after the revert,
we no longer see the crash. Before the revert, the APU would always
crash within some hours. After the revert, our tests have been running
for 24 hours+. Our test is quite basic: we establish 1, 2, 3, ..., 50
tunnels and then run iperf on all tunnels in parallel. The tunnels are
torn down between each iteration.

We are still running the test and will keep doing so, but I thought I
should share this finding in case it can help in fixing the error. I
will report back in case we find out something more, and please let me
know if you have any suggestions for things I can test. For example,
I don't know if it is safe to revert the flow cache commits one at a
time, to try to pin down the crash further.

BR,
Kristian


* Re: [RFC PATCH RESEND] tcp: avoid F-RTO if SACK and timestamps are disabled
From: Ilpo Järvinen @ 2018-06-14  8:42 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: Michal Kubecek, netdev, Eric Dumazet, LKML
In-Reply-To: <CAK6E8=eCOLU9AX0+bSrOg_UYBm1mFxrGT=ybksba9B0OUfp7jg@mail.gmail.com>

On Wed, 13 Jun 2018, Yuchung Cheng wrote:

> On Wed, Jun 13, 2018 at 9:55 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
> >
> > When F-RTO algorithm (RFC 5682) is used on connection without both SACK and
> > timestamps (either because of (mis)configuration or because the other
> > endpoint does not advertise them), specific pattern loss can make RTO grow
> > exponentially until the sender is only able to send one packet per two
> > minutes (TCP_RTO_MAX).
> >
> > One way to reproduce is to
> >
> >   - make sure the connection uses neither SACK nor timestamps
> >   - let tp->reorder grow enough so that lost packets are retransmitted
> >     after RTO (rather than when high_seq - snd_una > reorder * MSS)
> >   - let the data flow stabilize
> >   - drop multiple sender packets in "every second" pattern

Hmm? What is deterministically dropping every second packet for a 
particular flow that has RTOs in between?

Years back I was privately contacted by somebody from a middlebox vendor 
for a case with very similar exponentially growing RTO due to the FRTO 
heuristic. It turned out that they didn't want to send dupacks for 
out-of-order packets because they wanted to keep the TCP side of their 
deep packet inspection middlebox primitive. He claimed that the middlebox 
doesn't need to send dupacks because there could be such a TCP 
implementation that too doesn't do them either (not that he had anything 
to point to besides their middlebox ;-)), which according to him was 
not required because of his interpretation of RFC793 (IIRC). ...Never mind
anything that has occurred since that era.

...Back then, I also envisioned in that mail exchange with him that a 
middlebox could break FRTO by always forcing a drop on the key packet
FRTO depends on. Ironically, that is exactly what is required to trigger 
this issue? Sure, every heuristic can be fooled if a deterministic (or
crafted) pattern is introduced to defeat that particular heuristic. ...But 
I'd prefer that networks "dropping every second packet" of a flow to be 
fixed rather than FRTO?

In addition, one could even argue that the sender is sending the whole
time with lower and lower rate (given the exponentially increasing RTO)
and still gets losses, so that a further rate reduction would be the
correct action. ...But take this intuitive reasoning with a grain of
salt (that is, I can see reasons myself to disagree with it :-)).

> >   - either there is no new data to send or acks received in response to new
> >     data are also window updates (i.e. not dupacks by definition)

Can you explain what exactly you mean by this "no new data to send"
condition here, as F-RTO is/should not be used if there's no new data to
send?!?

...Or, why is the receiver going against SHOULD in RFC5681:
   "A TCP receiver SHOULD send an immediate duplicate ACK when an out-
   of-order segment arrives."
? ...And yes, I know there's this very issue with window updates masking 
duplicate ACKs in Linux TCP receiver but I was met with some skepticism 
on whether fixing it is worth it or not.

> > In this scenario, the sender keeps cycling between retransmitting first
> > lost packet (step 1 of RFC 5682), sending new data by (2b) and timing out
> > again. In this loop, the sender only gets
> >
> >   (a) acks for retransmitted segments (possibly together with old ones)
> >   (b) window updates
> >
> > Without timestamps, neither can be used for RTT estimator and without SACK,
> > we have no newly sacked segments to estimate RTT either. Therefore each
> > timeout doubles RTO and without usable RTT samples so that there is nothing
> > to counter the exponential growth.
> >
> > While disabling both SACK and timestamps doesn't make any sense, the
> > resulting behaviour is so pathological that it deserves an improvement.
> > (Also, both can be disabled on the other side.) Avoid F-RTO algorithm in
> > case both SACK and timestamps are disabled so that the sender falls back to
> > traditional slow start retransmission.
> >
> > Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> 
> Thanks for the patch (and packetdrill test)! I would encourage
> submitting an erratum to the F-RTO RFC about this case.

Unless there's a convincing explanation of how such a drop pattern would
occur in the real world except due to serious brokenness/misconfiguration on
the network side (that should not be there), I'm not that sure it's exactly
what errata are meant for.

-- 
 i.


* Re: [RFC PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
From: Roger Pau Monné @ 2018-06-14  8:43 UTC (permalink / raw)
  To: Anchal Agarwal
  Cc: tglx, mingo, hpa, x86, boris.ostrovsky, konrad.wilk, netdev,
	jgross, xen-devel, linux-kernel, kamatam, eduval, vallish,
	fllinden, guruanb, rjw, pavel, len.brown, linux-pm, cyberax
In-Reply-To: <20180613222048.GB33296@kaos-source-ops-60001.pdx1.amazon.com>

Please try to avoid top posting.

On Wed, Jun 13, 2018 at 10:20:48PM +0000, Anchal Agarwal wrote:
> Hi Roger,
> To answer your question, due to the lack of the mentioned commit
> (commit 12ea729645ac ("xen/blkback: unmap all persistent grants when
> frontend gets disconnected")) in the older dom0 kernels (<3.2), resume from

This fix that you mention is only present in kernels >= 3.18 AFAICT,
and persistent grants were introduced in 3.8 (0a8704a51f38), so
anything < 3.8 should work fine. Not sure why you mention 3.2 here.

> hibernation can fail on the guest side. In the absence of the commit,
> persistent grants are not unmapped immediately when the frontend is
> disconnected from the backend and hence leave the block device in an
> inconsistent state. To avoid this instability and work with a larger set
> of kernel versions, this approach has been used. Once you don't have
> any pending req/resp it is safer for the guest to resume from hibernation.

I think the fix should be backported (if it hasn't been done yet) to
kernels between 3.8 and 3.18. I don't like to add all this code just
to work around a Linux backend kernel bug.

AFAICT if persistent grants work as expected you could use almost the
same path that's used for migration, greatly reducing the amount of
code that you need to add.

Thanks, Roger.


* Re: [PATCH bpf v2] xdp: Fix handling of devmap in generic XDP
From: Jesper Dangaard Brouer @ 2018-06-14  8:49 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: Alexei Starovoitov, Daniel Borkmann, netdev, brouer
In-Reply-To: <1528942062-2353-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>

On Thu, 14 Jun 2018 11:07:42 +0900
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:

> Commit 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue") changed
> the return value type of __devmap_lookup_elem() from struct net_device *
> to struct bpf_dtab_netdev * but forgot to modify generic XDP code
> accordingly.
> Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
> net_device is expected, then skb->dev was set to invalid value.
> 
> v2:
> - Fix compiler warning without CONFIG_BPF_SYSCALL.
> 
> Fixes: 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue")
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>

Thanks for catching this!

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

Note that the current code works (and does not crash), but only by
pure luck, because struct bpf_dtab_netdev happens to have the
net_device as its first member.

struct bpf_dtab_netdev {
	struct net_device *dev; /* must be first member, due to tracepoint */
	struct bpf_dtab *dtab;
	unsigned int bit;
	struct xdp_bulk_queue __percpu *bulkq;
	struct rcu_head rcu;
};

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH bpf v2] xdp: Fix handling of devmap in generic XDP
From: Toshiaki Makita @ 2018-06-14  9:00 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Alexei Starovoitov, Daniel Borkmann, netdev
In-Reply-To: <20180614104959.4e4e57b8@redhat.com>

On 2018/06/14 17:49, Jesper Dangaard Brouer wrote:
> On Thu, 14 Jun 2018 11:07:42 +0900
> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:
> 
>> Commit 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue") changed
>> the return value type of __devmap_lookup_elem() from struct net_device *
>> to struct bpf_dtab_netdev * but forgot to modify generic XDP code
>> accordingly.
>> Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
>> net_device is expected, then skb->dev was set to invalid value.
>>
>> v2:
>> - Fix compiler warning without CONFIG_BPF_SYSCALL.
>>
>> Fixes: 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue")
>> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> 
> Thanks for catching this!
> 
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> 
> Note that the current code works (and does not crash), but only by
> pure luck, because struct bpf_dtab_netdev happens to have the
> net_device as its first member.
> 
> struct bpf_dtab_netdev {
> 	struct net_device *dev; /* must be first member, due to tracepoint */
> 	struct bpf_dtab *dtab;
> 	unsigned int bit;
> 	struct xdp_bulk_queue __percpu *bulkq;
> 	struct rcu_head rcu;
> };
> 

Actually no, the current code does not work and can crash, because we
need to dereference the pointer, i.e. need fwd->dev (IOW *fwd) not fwd.
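
A minimal userspace sketch of the difference, assuming simplified
stand-in structs (not the kernel definitions) and two hypothetical
helper names. Because the first member is a pointer to the net_device,
not an embedded net_device, the address of the map entry is the address
of that pointer, so the entry must be dereferenced to reach the device:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the layout matters. */
struct net_device {
	int ifindex;
};

struct bpf_dtab_netdev {
	struct net_device *dev;	/* first member, as in the kernel struct */
	unsigned int bit;
};

/* Buggy pattern (hypothetical helper): use the map entry itself as the
 * net_device. The result aliases the entry, not the device. */
struct net_device *fwd_as_netdev_buggy(struct bpf_dtab_netdev *fwd)
{
	return (struct net_device *)fwd;
}

/* Fixed pattern (hypothetical helper): follow the pointer stored in the
 * entry, i.e. fwd->dev. */
struct net_device *fwd_as_netdev_fixed(struct bpf_dtab_netdev *fwd)
{
	assert(fwd != NULL);
	return fwd->dev;
}
```

In the buggy variant, any later field access through the returned
pointer reads bytes of the map entry instead of the device.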

-- 
Toshiaki Makita


* RE: [Intel-wired-lan] [PATCH net-queue] i40e: Fix incorrect skb reserved size on rx
From: Malek, Patryk @ 2018-06-14  9:14 UTC (permalink / raw)
  To: Toshiaki Makita, Daniel Borkmann, Kirsher, Jeffrey T
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org
In-Reply-To: <8963a38e-0583-1a3f-bcfe-8a62d5da6dbf@lab.ntt.co.jp>

> On 2018/06/13 18:06, Daniel Borkmann wrote:
> > On 06/13/2018 10:08 AM, Toshiaki Makita wrote:
> >> i40e_build_skb() reserves I40E_SKB_PAD + (xdp->data -
> >> xdp->data_hard_start) but obviously I40E_SKB_PAD is unnecessary
> here
> >> and mac_header/data feilds in skb becomes incorrect, and breaks

Shouldn't this be fields instead of feilds?


* Re: [Intel-wired-lan] [PATCH net-queue] i40e: Fix incorrect skb reserved size on rx
From: Toshiaki Makita @ 2018-06-14  9:21 UTC (permalink / raw)
  To: Malek, Patryk
  Cc: Daniel Borkmann, Kirsher, Jeffrey T,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org
In-Reply-To: <FA03331EB45A2544B0CBCB1A14B6429E2B2A3ED7@IRSMSX104.ger.corp.intel.com>

On 2018/06/14 18:14, Malek, Patryk wrote:
>> On 2018/06/13 18:06, Daniel Borkmann wrote:
>>> On 06/13/2018 10:08 AM, Toshiaki Makita wrote:
>>>> i40e_build_skb() reserves I40E_SKB_PAD + (xdp->data -
>>>> xdp->data_hard_start) but obviously I40E_SKB_PAD is unnecessary
>> here
>>>> and mac_header/data feilds in skb becomes incorrect, and breaks
> 
> Shouldn't this be fields instead of feilds?

Thanks, but this patch is now superseded by Daniel's patch, so I think it has been dropped.
http://patchwork.ozlabs.org/patch/928778/

-- 
Toshiaki Makita


* Re: Request to enable setting the nested network namespace
From: Jiri Pirko @ 2018-06-14  9:27 UTC (permalink / raw)
  To: Pamela Mei; +Cc: netdev
In-Reply-To: <CAG89sxLKUdcDNj8JHSX_QnHxJitZEMBpDywUDkZpy9qv8wGanw@mail.gmail.com>

Thu, Jun 14, 2018 at 10:04:57AM CEST, pamela.mei@gmail.com wrote:
>In linux, set up 2 network namespaces, ns1 and ns2. "ip netns list"
>can view the 2 network namespaces.
>Move one network device from linux root namespace to ns1 then from ns1
>to ns2, then delete ns2,
>expect that network device can move back to ns1,
>but actual result is that eth1 is back to linux root network
>namespace. I'm not sure whether it's as expected.
>
>Here is the detail test steps:
>
>1.ip netns add ns1
>
>2.ip netns add ns2
>
>3.ip link set eth1 netns ns1
>
>4.ip netns exec ns1 ip link set eth1 netns ns2
>
>5.ip netns del ns2
>
>Expected result: eth1 will be in ns1
>
>Actual result: eth1 is back in linux root namespace 1
>
>Question: is there any method to realize such scenario to make sure
>device can be back to ns1 not linux root network namespace 1?
>
>How about if there's a function to enable nest network namespace e.g.
>can set ns1 as the parent namespace of ns2, then device can return to
>ns1 when ns2 is gone.

You would have to track the whole history of netns changes for each
netdevice. That does not sound right. Moving back to the initial netns
seems correct to me.


>
>
>Cheers,
>
>Pamela MEI


* Re: [PATCH bpf v2] xdp: Fix handling of devmap in generic XDP
From: Jesper Dangaard Brouer @ 2018-06-14  9:33 UTC (permalink / raw)
  To: Toshiaki Makita; +Cc: Alexei Starovoitov, Daniel Borkmann, netdev, brouer
In-Reply-To: <23f82d78-88dd-a5e5-ecb1-718fcf5c4a1e@lab.ntt.co.jp>

On Thu, 14 Jun 2018 18:00:22 +0900
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:

> On 2018/06/14 17:49, Jesper Dangaard Brouer wrote:
> > On Thu, 14 Jun 2018 11:07:42 +0900
> > Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:
> >   
> >> Commit 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue") changed
> >> the return value type of __devmap_lookup_elem() from struct net_device *
> >> to struct bpf_dtab_netdev * but forgot to modify generic XDP code
> >> accordingly.
> >> Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
> >> net_device is expected, then skb->dev was set to invalid value.
> >>
> >> v2:
> >> - Fix compiler warning without CONFIG_BPF_SYSCALL.
> >>
> >> Fixes: 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue")
> >> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>  
> > 
> > Thanks for catching this!
> > 
> > Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > 
> > Note that the current code works (and does not crash), but only by
> > pure luck, because struct bpf_dtab_netdev happens to have the
> > net_device as its first member.
> > 
> > struct bpf_dtab_netdev {
> > 	struct net_device *dev; /* must be first member, due to tracepoint */
> > 	struct bpf_dtab *dtab;
> > 	unsigned int bit;
> > 	struct xdp_bulk_queue __percpu *bulkq;
> > 	struct rcu_head rcu;
> > };
> >   
> 
> Actually no, the current code does not work and can crash, because we
> need to dereference the pointer, i.e. need fwd->dev (IOW *fwd) not fwd.

You are right, I ran some more tests, and yes, I managed to crash the
kernel.  Strange that it worked in my initial testing.  Now it
consistently crashes.

[] general protection fault: 0000 [#1] SMP PTI
[] Modules linked in: time_bench_sample(O) time_bench(O) fuse mlx5_ib ib_uverbs ib_core tun nfnetli
nllc bpfilter sunrpc coretemp kvm_intel kvm irqbypass intel_cstate intel_uncore intel_rapl_perf pcs
phpchp wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad sch_fq_codel hid_generic mlx5_core i40e ml
xdevlink mdio i2c_algo_bit ptp sd_mod i2c_core pps_core [last unloaded: x_tables]
[] CPU: 0 PID: 8 Comm: ksoftirqd/0 Tainted: G        W  O      4.17.0-rc7-net-next-xdp-xdp_paper01+
 
[] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0a 08/01/2016
[] RIP: 0010:netdev_pick_tx+0x3f/0xc0
[] RSP: 0018:ffffc900031c3b98 EFLAGS: 00010296
[] RAX: dead000000000200 RBX: ffff88070f3d2e80 RCX: 0000000000000200
[] RDX: 0000000000000000 RSI: ffff88070b678d00 RDI: ffff88070f3d2e80
[] RBP: ffff88070f3d2e80 R08: ffff88084fda8080 R09: ffff88087c802f00
[] R10: ffffea001c2d1e00 R11: ffff88081e8287f0 R12: ffff88070b678d00
[] R13: ffffc90003843000 R14: 0000000000000000 R15: ffffc900031c3c30
[] FS:  0000000000000000(0000) GS:ffff88087fc00000(0000) knlGS:0000000000000000
[] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[] CR2: 00007fc939b36140 CR3: 000000087f20a005 CR4: 00000000003606f0
[] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[] Call Trace:
[]  generic_xdp_tx+0x24/0x180
[]  xdp_do_generic_redirect+0x240/0x390
[]  do_xdp_generic+0x250/0x3b0
[]  ? kmem_cache_alloc+0x38/0x1c0
[]  netif_receive_skb_internal+0x8d/0xe0
[]  napi_gro_receive+0xb5/0xd0
[]  mlx5e_handle_rx_cqe+0x1a4/0x5d0 [mlx5_core]
[]  mlx5e_poll_rx_cq+0xbc/0x8d0 [mlx5_core]
[]  ? mlx5e_post_rx_wqes+0x2bc/0x400 [mlx5_core]
[]  mlx5e_napi_poll+0xb0/0xcc0 [mlx5_core]
[]  net_rx_action+0x145/0x3d0
[]  ? sort_range+0x20/0x20
[]  __do_softirq+0xdc/0x2b4
[]  ? sort_range+0x20/0x20
[]  run_ksoftirqd+0x18/0x20
[]  smpboot_thread_fn+0xdf/0x150
[]  kthread+0x111/0x130
[]  ? kthread_create_worker_on_cpu+0x70/0x70
[]  ret_from_fork+0x1f/0x30
[] Code: 00 83 e8 01 3d ff 1f 00 00 76 10 65 8b 05 3a 02 94 7e 83 c0 01 89 86 ac 00 00 00 83 bd 8c 03 00 00 01 74 52 48 8b 85 e8 01 00 00 <48> 8b 40 30 48 85 c0 74 48 48 c7 c1 50 85 6c 81 4c 89 e6 48 89 
[] RIP: netdev_pick_tx+0x3f/0xc0 RSP: ffffc900031c3b98
[] ---[ end trace 8b77c7349af71e1b ]---
[] Kernel panic - not syncing: Fatal exception in interrupt
[] Kernel Offset: disabled
[] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---


(gdb) list *(generic_xdp_tx)+0x24
0xffffffff816cf874 is in generic_xdp_tx (net/core/dev.c:4142).
4137		struct netdev_queue *txq;
4138		bool free_skb = true;
4139		int cpu, rc;
4140	
4141		txq = netdev_pick_tx(dev, skb, NULL);
4142		cpu = smp_processor_id();
4143		HARD_TX_LOCK(dev, txq, cpu);
4144		if (!netif_xmit_stopped(txq)) {
4145			rc = netdev_start_xmit(skb, dev, txq, 0);
4146			if (dev_xmit_complete(rc))


(gdb) list *(netdev_pick_tx)+0x3f
0xffffffff816ceeef is in netdev_pick_tx (net/core/dev.c:3472).
3467	#endif
3468	
3469		if (dev->real_num_tx_queues != 1) {
3470			const struct net_device_ops *ops = dev->netdev_ops;
3471	
3472			if (ops->ndo_select_queue)
3473				queue_index = ops->ndo_select_queue(dev, skb, accel_priv,
3474								    __netdev_pick_tx);
3475			else
3476				queue_index = __netdev_pick_tx(dev, skb);
(gdb) 


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [RFC PATCH RESEND] tcp: avoid F-RTO if SACK and timestamps are disabled
From: Michal Kubecek @ 2018-06-14  9:34 UTC (permalink / raw)
  To: Ilpo Jarvinen; +Cc: Yuchung Cheng, netdev, Eric Dumazet, LKML
In-Reply-To: <alpine.DEB.2.20.1806141045430.29120@whs-18.cs.helsinki.fi>

On Thu, Jun 14, 2018 at 11:42:43AM +0300, Ilpo Järvinen wrote:
> On Wed, 13 Jun 2018, Yuchung Cheng wrote:
> 
> > On Wed, Jun 13, 2018 at 9:55 AM, Michal Kubecek <mkubecek@suse.cz> wrote:
> > >
> > > When F-RTO algorithm (RFC 5682) is used on connection without both SACK and
> > > timestamps (either because of (mis)configuration or because the other
> > > endpoint does not advertise them), specific pattern loss can make RTO grow
> > > exponentially until the sender is only able to send one packet per two
> > > minutes (TCP_RTO_MAX).
> > >
> > > One way to reproduce is to
> > >
> > >   - make sure the connection uses neither SACK nor timestamps
> > >   - let tp->reorder grow enough so that lost packets are retransmitted
> > >     after RTO (rather than when high_seq - snd_una > reorder * MSS)
> > >   - let the data flow stabilize
> > >   - drop multiple sender packets in "every second" pattern
> 
> Hmm? What is deterministically dropping every second packet for a 
> particular flow that has RTOs in between?

AFAIK the customer we managed to push to investigate the primary source
of the packet loss identified some problems with their load balancing
solution but I don't have more details. For the record, the loss didn't
last through the phase of RTO growing exponentially (so that there were
no lost retransmissions) but did last long enough to drop at least 20
packets. With the exponential growth, that was enough for RTO to reach
TCP_RTO_MAX (120s) and make the connection essentially stalled.
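
The arithmetic behind that can be sketched as follows. This is only an
illustration: the 200 ms starting value is an assumption (Linux's usual
minimum RTO), the 120 s cap is the TCP_RTO_MAX mentioned above, and the
function name is hypothetical. It counts how many consecutive timeouts,
each doubling the RTO with no RTT sample in between, are needed to hit
the cap:

```c
#include <assert.h>

/* Assumed values for illustration: 200 ms minimum RTO, 120 s maximum. */
#define RTO_MIN_MS	200u
#define RTO_MAX_MS	120000u

/* Count consecutive timeouts (each doubling the RTO, with no usable RTT
 * sample in between to bring it back down) needed to grow the RTO from
 * start_ms to RTO_MAX_MS. */
unsigned int timeouts_until_rto_max(unsigned int start_ms)
{
	unsigned int rto = start_ms;
	unsigned int n = 0;

	assert(start_ms > 0);
	while (rto < RTO_MAX_MS) {
		rto *= 2;
		if (rto > RTO_MAX_MS)
			rto = RTO_MAX_MS;	/* clamp at the cap */
		n++;
	}
	return n;
}
```

Under these assumptions, timeouts_until_rto_max(RTO_MIN_MS) is 10, so a
loss episode of about 20 packets in the pattern described leaves plenty
of room to reach the cap.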

Actually, it doesn't need to be exactly "every second". As long as you
don't lose two consecutive segments (which would allow you to fall back
in step (2a)), you can have more than one received segment between them
and get the same issue.

> Years back I was privately contacted by somebody from a middlebox vendor 
> for a case with very similar exponentially growing RTO due to the FRTO 
> heuristic. It turned out that they didn't want to send dupacks for 
> out-of-order packets because they wanted to keep the TCP side of their 
> deep packet inspection middlebox primitive. He claimed that the middlebox 
> doesn't need to send dupacks because there could be such a TCP 
> implementation that too doesn't do them either (not that he had anything 
> to point to besides their middlebox ;-)), which according to him was 
> not required because of his interpretation of RFC793 (IIRC). ...Never mind
> anything that has occurred since that era.
> 
> ...Back then, I also envisioned in that mail exchange with him that a 
> middlebox could break FRTO by always forcing a drop on the key packet
> FRTO depends on. Ironically, that is exactly what is required to trigger 
> this issue? Sure, every heuristic can be fooled if a deterministic (or
> crafted) pattern is introduced to defeat that particular heuristic.

OK, let me elaborate a bit more about the background. Within the last few
months, we had six different reports of TCP stalls (typically for NFS
connections alternating between idle periods and bulk transfers) which
started after an upgrade from SLE11 (with 3.0 kernel) to SLE12 SP2 or
SP3 (both 4.4 kernel).

Two of them were analysed down to the NAS on the other side which was
sending SACK blocks violating the RFC in two different ways - as
described in thread "TCP one-by-one acking - RFC interpretation
question".

Three of them do not seem to show any apparent RFC violation and the
problem is only in RTO doubling with each retransmission while there are
no usable replies that could be used for RTT estimate (in the absence of
both SACK and timestamps).

For the sake of completeness, there was also one report from two days
ago which looked almost the same but in the end it turned out that in
this case, SLES (with Firefox) was the receiver and sender was actually
Windows 2016 server with Microsoft IIS.

> I'd prefer that networks "dropping every second packet" of a flow to be 
> fixed rather than FRTO?

Yes, my first reaction was that their primary focus should be the
lossy network. However, it's not behaving like this all the time, the
periods of loss are relatively short - but long enough to trigger the
"RTO loop".

> In addition, one could even argue that the sender is sending the whole
> time with lower and lower rate (given the exponentially increasing RTO)
> and still gets losses, so that a further rate reduction would be the
> correct action. ...But take this intuitive reasoning with a grain of
> salt (that is, I can see reasons myself to disagree with it :-)).

As I explained above, the loss was over by the time of the first RTO
retransmission. I should probably have made that clear in the commit
message.

> > >   - either there is no new data to send or acks received in response to new
> > >     data are also window updates (i.e. not dupacks by definition)
> 
> Can you explain what exactly you mean by this "no new data to send"
> condition here, as F-RTO is/should not be used if there's no new data to
> send?!?

AFAICS RFC 5682 is not explicit about this and offers multiple options.
Anyway, this is not essential and in most of the customer provided
captures, it wasn't the case.

> ...Or, why is the receiver going against SHOULD in RFC5681:
>    "A TCP receiver SHOULD send an immediate duplicate ACK when an out-
>    of-order segment arrives."
> ? ...And yes, I know there's this very issue with window updates masking 
> duplicate ACKs in Linux TCP receiver but I was met with some skepticism 
> on whether fixing it is worth it or not.

Normally, we would have timestamps (and even SACK). Without them, you
cannot reliably recognize a dupack with changed window size from
a spontaneous window update.

> > Acked-by: Yuchung Cheng <ycheng@google.com>
> > 
> > Thanks for the patch (and packetdrill test)! I would encourage
> > submitting an erratum to the F-RTO RFC about this case.
> 
> Unless there's a convincing explanation of how such a drop pattern would
> occur in the real world except due to serious brokenness/misconfiguration on
> the network side (that should not be there), I'm not that sure it's exactly
> what errata are meant for.

As explained above, this commit was not inspired by some theoretical
study trying to find dark corner cases; it was the result of investigating
reports from multiple customers encountering the problem in
real life.  Sure, there was always something bad, namely SACK/timestamps
being disabled and the network losing packets, but the effect (one packet
per two minutes) is so disastrous that I believe it should be handled.

Michal Kubecek


* Re: WARNING in bpf_prog_select_runtime
From: Daniel Borkmann @ 2018-06-14  9:45 UTC (permalink / raw)
  To: syzbot, ast, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <000000000000556929056e952ae0@google.com>

On 06/14/2018 09:37 AM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    ee946c36be21 Merge tag 'platform-drivers-x86-v4.17-2' of g..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=11ca275b800000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=889265cebaf9bda1
> dashboard link: https://syzkaller.appspot.com/bug?extid=3b889862e65a98317058
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=17530b5b800000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+3b889862e65a98317058@syzkaller.appspotmail.com

Will submit a fix for this today.


* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck @ 2018-06-14 10:02 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Samudrala, Sridhar, Alexander Duyck, virtio-dev, aaron.f.brown,
	Jiri Pirko, Michael S. Tsirkin, Jakub Kicinski, Netdev,
	qemu-devel, virtualization
In-Reply-To: <CADGSJ213f8tJpNXuOhv8qRew-Y5VZAwA+srNMrLZYnKdVGLdAA@mail.gmail.com>

I've been pointed to this discussion (which I had missed previously)
and I'm getting a headache. Let me first summarize how I understand how
this feature is supposed to work, then I'll respond to some individual
points.

The basic idea is to enable guests to migrate seamlessly, while still
making it possible for them to use a passed-through device for more
performance etc. The means to do so is to hook a virtio-net device
together with a network device passed through via vfio. The
vfio-handled device is there for performance, the virtio device for
migratability. We have a new virtio feature bit for that which needs to
be negotiated for that 'combined' device to be available. We have to
consider two cases:

- Older guests that do not support the new feature bit. We presume that
  those guests will be confused if they get two network devices with
  the same MAC, so the idea is to not show them the vfio-handled device
  at all.
- Guests that negotiate the feature bit. We only know positively that
  they (a) know the feature bit and (b) are prepared to handle the
  consequences of negotiating it after they set the FEATURES_OK bit.
  This is therefore the earliest point in time that the vfio-handled
  device should be visible or usable for the guest.
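
The two cases above can be expressed as a small predicate. This is a
sketch only, not QEMU code: the helper name is hypothetical, and the
numeric values (FEATURES_OK as status bit 8, STANDBY as feature bit 62)
are what I believe the virtio spec and the proposed header use, so
treat them as assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; check them against the virtio spec and headers. */
#define VIRTIO_CONFIG_S_FEATURES_OK	8u
#define VIRTIO_NET_F_STANDBY		62

static_assert(VIRTIO_NET_F_STANDBY < 64, "feature bit must fit in 64 bits");

/* Hypothetical helper: may the vfio-handled device be shown to the guest?
 * Only once the guest has set FEATURES_OK and STANDBY was negotiated. */
bool primary_may_be_visible(uint8_t status, uint64_t negotiated_features)
{
	bool features_ok = (status & VIRTIO_CONFIG_S_FEATURES_OK) != 0;
	bool standby = ((negotiated_features >> VIRTIO_NET_F_STANDBY) & 1) != 0;

	return features_ok && standby;
}
```

An older guest never negotiates the bit, so the predicate stays false
and the vfio-handled device is never shown to it.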

On Wed, 13 Jun 2018 18:02:01 -0700
Siwei Liu <loseweigh@gmail.com> wrote:

> On Tue, Jun 12, 2018 at 5:08 PM, Samudrala, Sridhar
> <sridhar.samudrala@intel.com> wrote:
> > On 6/12/2018 4:34 AM, Michael S. Tsirkin wrote:  
> >>
> >> On Mon, Jun 11, 2018 at 10:02:45PM -0700, Samudrala, Sridhar wrote:  
> >>>
> >>> On 6/11/2018 7:17 PM, Michael S. Tsirkin wrote:  
> >>>>
> >>>> On Tue, Jun 12, 2018 at 09:54:44AM +0800, Jason Wang wrote:  
> >>>>>
> >>>>> On 2018年06月12日 01:26, Michael S. Tsirkin wrote:  
> >>>>>>
> >>>>>> On Mon, May 07, 2018 at 04:09:54PM -0700, Sridhar Samudrala wrote:  
> >>>>>>>
> >>>>>>> This feature bit can be used by hypervisor to indicate virtio_net
> >>>>>>> device to
> >>>>>>> act as a standby for another device with the same MAC address.
> >>>>>>>
> >>>>>>> I tested this with a small change to the patch to mark the STANDBY
> >>>>>>> feature 'true'
> >>>>>>> by default as i am using libvirt to start the VMs.
> >>>>>>> Is there a way to pass the newly added feature bit 'standby' to qemu
> >>>>>>> via libvirt
> >>>>>>> XML file?
> >>>>>>>
> >>>>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>  
> >>>>>>
> >>>>>> So I do not think we can commit to this interface: we
> >>>>>> really need to control visibility of the primary device.  
> >>>>>
> >>>>> The problem is legacy guest won't use primary device at all if we do
> >>>>> this.  
> >>>>
> >>>> And that's by design - I think it's the only way to ensure the
> >>>> legacy guest isn't confused.  
> >>>
> >>> Yes. I think so. But i am not sure if Qemu is the right place to control
> >>> the visibility
> >>> of the primary device. The primary device may not be specified as an
> >>> argument to Qemu. It
> >>> may be plugged in later.
> >>> The cloud service provider is providing a feature that enables low
> >>> latency datapath and live
> >>> migration capability.
> >>> A tenant can use this feature only if he is running a VM that has
> >>> virtio-net with failover support.  

So, do you know from the outset that there will be such a coupled
device? I.e., is it a property of the VM definition?

Can there be a 'prepared' virtio-net device that presents the STANDBY
feature even if there currently is no vfio-handled device available --
but making it possible to simply hotplug that device later?

Should it be possible to add a virtio/vfio pair later on?

> >>
> >> Well live migration is there already. The new feature is low latency
> >> data path.  
> >
> >
> > we get live migration with just virtio.  But I meant live migration with VF
> > as
> > primary device.
> >  
> >>
> >> And it's the guest that needs failover support not the VM.  
> >
> >
> > Isn't guest and VM synonymous?

I think we need to be really careful to not mix up the two: The VM
contains the definitions, but it is up to the guest how it uses them.

> >
> >  
> >>
> >>  
> >>> I think Qemu should check if guest virtio-net supports this feature and
> >>> provide a mechanism for
> >>> an upper layer indicating if the STANDBY feature is successfully
> >>> negotiated or not.
> >>> The upper layer can then decide if it should hot plug a VF with the same
> >>> MAC and manage the 2 links.
> >>> If VF is successfully hot plugged, virtio-net link should be disabled.  
> >>
> >> Did you even talk to upper layer management about it?
> >> Just list the steps they need to do and you will see
> >> that's a lot of machinery to manage by the upper layer.
> >>
> >> What do we gain in flexibility? As far as I can see the
> >> only gain is some resources saved for legacy VMs.
> >>
> >> That's not a lot as tenant of the upper layer probably already has
> >> at least a hunch that it's a new guest otherwise
> >> why bother specifying the feature at all - you
> >> save even more resources without it.
> >>  
> >
> > I am not all that familiar with how Qemu manages network devices. If we can
> > do all the
> > required management of the primary/standby devices within Qemu, that is
> > definitely a better
> > approach without upper layer involvement.  
> 
> Right. I would imagine in the extreme case the upper layer doesn't
> have to be involved at all if QEMU manages all hot plug/unplug logic.
> The management tool can supply passthrough device and virtio with the
> same group UUID, QEMU auto-manages the presence of the primary, and
> hot plug the device as needed before or after the migration.

I do not really see how you can manage that kind of stuff in QEMU only.
Have you talked to some libvirt folks? (And I'm not sure what you refer
to with 'group UUID'?)

Also, I think you need to make a distinction between hotplugging a
device and making it visible to the guest. What does 'hotplugging' mean
here? Adding it to the VM definition? Would it be enough to have the
vfio-based device not operational until the virtio feature bit has been
negotiated?

What happens if the guest does not use the vfio-based device after it
has been made available? Will you still disable the virtio-net link?
(All that link handling definitely sounds like a task for libvirt or
the like.)
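
One way to read the question about keeping the vfio-based device
non-operational until the feature bit has been negotiated is sketched below.
It is only an illustration: the function names are hypothetical, the bit
number 62 for VIRTIO_NET_F_STANDBY is an assumption (the thread does not
state it), and none of this is QEMU or libvirt code.

```python
# Assumed bit number for the STANDBY feature; not given anywhere in this thread.
VIRTIO_NET_F_STANDBY = 62


def negotiated_features(device_features: int, driver_features: int) -> int:
    """Features in effect are those offered by the device AND acked by the guest driver."""
    return device_features & driver_features


def primary_may_be_exposed(device_features: int, driver_features: int) -> bool:
    """Hypothetical policy: only let the guest see the primary (VF) once STANDBY is negotiated."""
    return bool(negotiated_features(device_features, driver_features)
                & (1 << VIRTIO_NET_F_STANDBY))


# Device offers STANDBY plus some other (arbitrary) feature bit 5.
offered = (1 << VIRTIO_NET_F_STANDBY) | (1 << 5)
legacy_guest = 1 << 5                                 # does not know STANDBY
new_guest = (1 << VIRTIO_NET_F_STANDBY) | (1 << 5)    # acks STANDBY

print(primary_may_be_exposed(offered, legacy_guest))
print(primary_may_be_exposed(offered, new_guest))
```

Under this policy a legacy guest never gets the primary device, which is the
behaviour described as "by design" earlier in the quoted discussion.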

Regarding hot(un)plugging during migration, I think you also need to
keep in mind that different architectures/busses have different
semantics there. Something that works if there's an unplug handshake may
not work on a platform with surprise removal.

Have you considered guest agents? All of this is punching through
several layers, and I'm not sure if that is a good idea.

^ permalink raw reply

* [PATCH v2 3/5] batman: use BIT_ULL for NL80211_STA_INFO_* attribute types
From: Omer Efrat @ 2018-06-14 10:12 UTC (permalink / raw)
  To: linux-wireless-u79uwXL29TY76Z2rM5mHXA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r, Omer Efrat

Since 'filled' member in station_info changed to u64, BIT_ULL macro
should be used with NL80211_STA_INFO_* attribute types instead of BIT.

The BIT macro uses unsigned long type which some architectures handle as 32bit
and this results in compilation warnings such as:

net/mac80211/sta_info.c:2223:2: warning: left shift count >= width of type
  sinfo->filled |= BIT(NL80211_STA_INFO_TID_STATS);
  ^

Signed-off-by: Omer Efrat <omer.efrat-CtGflUZwD1xBDgjK7y7TUQ@public.gmane.org>
---
 net/batman-adv/bat_v_elp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/bat_v_elp.c b/net/batman-adv/bat_v_elp.c
index 71c20c1..71e6474 100644
--- a/net/batman-adv/bat_v_elp.c
+++ b/net/batman-adv/bat_v_elp.c
@@ -114,7 +114,7 @@ static u32 batadv_v_elp_get_throughput(struct batadv_hardif_neigh_node *neigh)
 		}
 		if (ret)
 			goto default_throughput;
-		if (!(sinfo.filled & BIT(NL80211_STA_INFO_EXPECTED_THROUGHPUT)))
+		if (!(sinfo.filled & BIT_ULL(NL80211_STA_INFO_EXPECTED_THROUGHPUT)))
 			goto default_throughput;
 
 		return sinfo.expected_throughput / 100;
-- 
2.7.4

^ permalink raw reply related

* Re: [RFC PATCH RESEND] tcp: avoid F-RTO if SACK and timestamps are disabled
From: Ilpo Järvinen @ 2018-06-14 10:18 UTC (permalink / raw)
  To: Michal Kubecek; +Cc: Netdev, Eric Dumazet, Yuchung Cheng, LKML
In-Reply-To: <20180613165716.4fy7ufk7jnk3r67r@unicorn.suse.cz>

On Wed, 13 Jun 2018, Michal Kubecek wrote:

> On Wed, Jun 13, 2018 at 06:55:43PM +0200, Michal Kubecek wrote:
> > When F-RTO algorithm (RFC 5682) is used on connection without both SACK and
> > timestamps (either because of (mis)configuration or because the other
> > endpoint does not advertise them), specific pattern loss can make RTO grow
> > exponentially until the sender is only able to send one packet per two
> > minutes (TCP_RTO_MAX).
> > 
> > One way to reproduce is to
> > 
> >   - make sure the connection uses neither SACK nor timestamps
> >   - let tp->reorder grow enough so that lost packets are retransmitted
> >     after RTO (rather than when high_seq - snd_una > reorder * MSS)
> >   - let the data flow stabilize
> >   - drop multiple sender packets in "every second" pattern
> >   - either there is no new data to send or acks received in response to new
> >     data are also window updates (i.e. not dupacks by definition)
> > 
> > In this scenario, the sender keeps cycling between retransmitting first
> > lost packet (step 1 of RFC 5682), sending new data by (2b) and timing out
> > again. In this loop, the sender only gets
> > 
> >   (a) acks for retransmitted segments (possibly together with old ones)
> >   (b) window updates
> > 
> > Without timestamps, neither can be used for RTT estimator and without SACK,
> > we have no newly sacked segments to estimate RTT either. Therefore each
> > timeout doubles RTO and without usable RTT samples so that there is nothing
> > to counter the exponential growth.
> > 
> > While disabling both SACK and timestamps doesn't make any sense, the
> > resulting behaviour is so pathological that it deserves an improvement.
> > (Also, both can be disabled on the other side.) Avoid F-RTO algorithm in
> > case both SACK and timestamps are disabled so that the sender falls back to
> > traditional slow start retransmission.
> > 
> > Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
> 
> I was able to illustrate the issue using a packetdrill script. It cheats
> a bit by setting net.ipv4.tcp_reordering to 30 so that we can get to
> the issue more quickly. In this case, we don't have more data to send
> but it's not essential; the issue can be reproduced even with sending of
> new data in F-RTO, it would only make everything more complicated.
> 
> I was able to run the same script on kernels 4.17-rc6, 4.12 (SLE15) and
> 4.4 (SLE12-SP2). Kernel 3.12 required minor modifications but not in the
> important part (the slow start is a bit slower there).
> 
> ---------------------------------------------------------------------------
> --tolerance_usecs=10000
> 
> // flush cached TCP metrics
> 0.000  `ip tcp_metrics flush all`
> +0.000 `sysctl -q net.ipv4.tcp_reordering=20`
> 
> 
> // establish a connection
> +0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> +0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> +0.000 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [131072], 4) = 0
> +0.000 bind(3, ..., ...) = 0
> +0.000 listen(3, 1) = 0
> 
> +0.100 < S 0:0(0) win 40000 <mss 1000>
> +0.000 > S. 0:0(0) ack 1 <mss 1460>
> +0.100 < . 1:1(0) ack 1 win 40000
> +0.000 accept(3, ..., ...) = 4
> 
> // Send 10 data segments.
> +0.100 write(4, ..., 30000) = 30000
> // For some reason (unknown yet), GSO packets are only 2000 bytes long
> +0.000 > . 1:2001(2000) ack 1
> +0.000 > . 2001:4001(2000) ack 1
> +0.000 > . 4001:6001(2000) ack 1
> +0.000 > . 6001:8001(2000) ack 1
> +0.000 > . 8001:10001(2000) ack 1
> +0.100 < . 1:1(0) ack 2001 win 38000
> +0.000 > . 10001:12001(2000) ack 1
> +0.000 > . 12001:14001(2000) ack 1
> +0.001 < . 1:1(0) ack 4001 win 36000
> +0.000 > . 14001:16001(2000) ack 1
> +0.000 > . 16001:18001(2000) ack 1
> +0.001 < . 1:1(0) ack 6001 win 34000
> +0.000 > . 18001:20001(2000) ack 1
> +0.000 > . 20001:22001(2000) ack 1
> +0.001 < . 1:1(0) ack 8001 win 32000
> +0.000 > . 22001:24001(2000) ack 1
> +0.000 > . 24001:26001(2000) ack 1
> +0.001 < . 1:1(0) ack 10001 win 30000
> +0.000 > . 26001:28001(2000) ack 1
> +0.000 > P. 28001:30001(2000) ack 1
> 
> // loss of 12001:13001, 14001:15001, ..., 28001:29001
> +0.100 < . 1:1(0) ack 12001 win 30000	// original ack
> +0.000 < . 1:1(0) ack 12001 win 30000	// 13001:14001
> +0.000 < . 1:1(0) ack 12001 win 30000	// 15001:16001
> +0.000 < . 1:1(0) ack 12001 win 30000	// 17001:18001
> +0.000 < . 1:1(0) ack 12001 win 30000	// 19001:20001
> +0.000 < . 1:1(0) ack 12001 win 30000	// 21001:22001
> +0.000 < . 1:1(0) ack 12001 win 30000	// 23001:24001
> +0.000 < . 1:1(0) ack 12001 win 30000	// 25001:26001
> +0.000 < . 1:1(0) ack 12001 win 30000	// 27001:28001
> +0.000 < . 1:1(0) ack 12001 win 30000	// 29001:30001
> 
> // RTO 300ms
> +0.270~+0.330 > . 12001:13001(1000) ack 1

Let's analyze this case:
ca_state = CA_Loss

> +0.100 < . 1:1(0) ack 14001 win 38000

snd_una advances => icsk_retransmits = 0

...The lack of new data segments here seems very relevant to me and it 
hides from you what is really happening under the hood...

> // RTO 600ms
> +0.540~+0.660 > . 14001:15001(1000) ack 1

With the above, the following FRTO condition should already evaluate to false in this case:
                   (new_recovery || icsk->icsk_retransmits) &&

...But it doesn't. If there were new data segments, they would show 
you that we're running a bogus FRTO undo here (with a burst of new 
data segments before the second RTO). The bogus undo on that ACK causes 
ca_state to switch away from CA_Loss, and FRTO can then reoccur even though 
it was not intended. Please try with this patch:
  https://patchwork.ozlabs.org/patch/883654/


...Since you're dealing with non-SACK flows here, you might want to 
consider the other fixes in that same series too as they all fix bad 
brokenness. I should do an updated version for that series but I've been 
waiting for the TCP testsuite to be published...
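
The exponential growth described in the quoted commit message can be put
into numbers with a small model. This is only a sketch, assuming each
timeout simply doubles the RTO and that no RTT sample ever resets it; the
starting value of 300 ms is taken from the packetdrill script and the two
minute cap from the TCP_RTO_MAX remark. The function name is made up for
illustration.

```python
TCP_RTO_MAX_MS = 120_000  # "one packet per two minutes" from the quoted description


def rto_backoff(initial_ms: int, max_ms: int = TCP_RTO_MAX_MS) -> list[int]:
    """Successive RTO values when every timeout doubles the RTO up to the cap."""
    values = [initial_ms]
    while values[-1] < max_ms:
        values.append(min(values[-1] * 2, max_ms))
    return values


schedule = rto_backoff(300)
print(schedule)
print("timeouts needed to hit the cap:", len(schedule) - 1)
```

The first two values match the "RTO 300ms" and "RTO 600ms" steps in the
script; after that nothing in this model pulls the value back down.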


-- 
 i.

^ permalink raw reply

* Re: [PATCH v2 3/5] batman: use BIT_ULL for NL80211_STA_INFO_* attribute types
From: Sven Eckelmann @ 2018-06-14 10:40 UTC (permalink / raw)
  To: Omer Efrat
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1528971137-432-1-git-send-email-omer.efrat-CtGflUZwD1xBDgjK7y7TUQ@public.gmane.org>

On Donnerstag, 14. Juni 2018 13:12:17 CEST Omer Efrat wrote:
> Since 'filled' member in station_info changed to u64, BIT_ULL macro
> should be used with NL80211_STA_INFO_* attribute types instead of BIT.
> 
> The BIT macro uses unsigned long type which some architectures handle as 32bit
> and this results in compilation warnings such as:
> 
> net/mac80211/sta_info.c:2223:2: warning: left shift count >= width of type
>   sinfo->filled |= BIT(NL80211_STA_INFO_TID_STATS);
>   ^
> 
> Signed-off-by: Omer Efrat <omer.efrat-CtGflUZwD1xBDgjK7y7TUQ@public.gmane.org>
> ---
>  net/batman-adv/bat_v_elp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

It is called "batman-adv" and not "batman". And when (as in: which commit) did it 
change to 64 bit? Shouldn't there be a "Fixes: " line to show which kernels 
are affected (especially for the stable kernel developers)?

Kind regards,
	Sven

^ permalink raw reply

* Re: [PATCH v2 3/5] batman: use BIT_ULL for NL80211_STA_INFO_* attribute types
From: Sven Eckelmann @ 2018-06-14 10:53 UTC (permalink / raw)
  To: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA, Omer Efrat
In-Reply-To: <32533954.9n15W0HXMB@bentobox>

Hi,

here is the information that was missing and should be included in the 
commit message

> > Since 'filled' member in station_info changed to u64

in commit 739960f128e5 ("cfg80211/nl80211: Add support for 
NL80211_STA_INFO_RX_DURATION")

[...]

Fixes: d62890885efb ("batman-adv: Accept only filled wifi station info")

> > Signed-off-by: Omer Efrat <omer.efrat-CtGflUZwD1xBDgjK7y7TUQ@public.gmane.org>


Kind regards,
	Sven

^ permalink raw reply

* Re: [B.A.T.M.A.N.] [PATCH v2 3/5] batman: use BIT_ULL for NL80211_STA_INFO_* attribute types
From: Johannes Berg @ 2018-06-14 11:05 UTC (permalink / raw)
  To: Sven Eckelmann, b.a.t.m.a.n; +Cc: Omer Efrat, netdev, linux-wireless
In-Reply-To: <1567584.jbsRn7ofiA@bentobox>

On Thu, 2018-06-14 at 12:53 +0200, Sven Eckelmann wrote:
> Hi,
> 
> here is the information that was missing and should be included in the 
> commit message
> 
> > > Since 'filled' member in station_info changed to u64
> 
> in commit 739960f128e5 ("cfg80211/nl80211: Add support for 
> NL80211_STA_INFO_RX_DURATION")

Yeah, which actually means this patch isn't needed?

BIT(NL80211_STA_INFO_EXPECTED_THROUGHPUT) is fine since
NL80211_STA_INFO_EXPECTED_THROUGHPUT is actually == 27.
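
To make the width argument concrete, here is a small model of the two
macros. It is a sketch and not kernel code: Python integers are unbounded,
so the "unsigned long" width is passed in explicitly, and a shift count at or
beyond that width is reported as an error instead of reproducing C's
undefined behaviour. The index 27 comes from this mail; 32 is used only as
the first index that no longer fits a 32-bit long.

```python
def bit(nr: int, long_bits: int) -> int:
    """Model of BIT(nr): 1UL << nr, valid only while nr fits in unsigned long."""
    if not 0 <= nr < long_bits:
        raise OverflowError(f"shift count {nr} >= width of type ({long_bits})")
    return 1 << nr


def bit_ull(nr: int) -> int:
    """Model of BIT_ULL(nr): 1ULL << nr, always 64 bits wide."""
    if not 0 <= nr < 64:
        raise OverflowError(f"shift count {nr} >= width of type (64)")
    return 1 << nr


EXPECTED_THROUGHPUT = 27  # value stated above

# Below bit 32 both macros yield the same mask, even with a 32-bit long.
print(hex(bit(EXPECTED_THROUGHPUT, 32)), hex(bit_ull(EXPECTED_THROUGHPUT)))

# From bit 32 on, only the 64-bit variant can represent the mask.
print(hex(bit_ull(32)))
```

So the change only matters for attribute indices of 32 and above, which is
the point being made about this particular hunk.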

johannes

^ permalink raw reply

* Re: [PATCH] selftests: bpf: config: add config fragments
From: William Tu @ 2018-06-14 11:06 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Anders Roxell, Alexei Starovoitov, Shuah Khan,
	Linux Kernel Network Developers, LKML, linux-kselftest
In-Reply-To: <d6851756-ae3b-09ec-f487-1eeece6bd4c6@iogearbox.net>

On Tue, Jun 12, 2018 at 5:08 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 06/12/2018 01:05 PM, Anders Roxell wrote:
>> The test_tunnel.sh tests fail because the needed config fragments aren't enabled.
>>
>> Fixes: 933a741e3b82 ("selftests/bpf: bpf tunnel test.")
>> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
>> ---
>>
>> All tests pass except ip6gretap, which still fails. I'm unsure why.
>> Ideas?

Hi Anders,

ip6erspan is based on ip6gretap, does ip6erspan pass?

Regards,
William

^ permalink raw reply

* [RFC v2, net-next, PATCH 0/4] Add switchdev on TI-CPSW
From: Ilias Apalodimas @ 2018-06-14 11:11 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, andrew, f.fainelli
  Cc: francois.ozog, yogeshs, spatton, Jose.Abreu, Ilias Apalodimas

Hello,

This is RFC v2, based on net-next, which does not register the CPU port.
I didn't manage to rewrite the driver and split it into a common library
plus old and new drivers, but I did reorganize the patches a bit based
on Andrew's suggestions. Hopefully it's easier to read.

patch #1: Prepares header files and moves common code to cpsw_priv.h.
patch #2: Adds functions to ALE for modifying VLANs/MDBs.
patch #3: Prepares cpsw driver for switchdev mode, without changing any
of the functionality.
patch #4: Adds new mode of operation based on switchdev.

To enable this you need to enable CONFIG_NET_SWITCHDEV,
CONFIG_BRIDGE_VLAN_FILTERING and CONFIG_TI_CPSW_SWITCHDEV,
and add this to the udev config:

SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="0f011900", \
        ATTR{phys_port_name}!="", NAME="sw0$attr{phys_port_name}"

Since the phys_switch_id is based on the cpsw version, users with a different
version will need to run 'ip -d link show dev sw0p1 | grep switchid' and
replace it with the correct value.
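
The id in the udev rule looks like the 32-bit version value 0x19010f
(CPSW_VERSION_3 in the driver) printed byte by byte in little-endian order.
That reading is an assumption drawn from the two numbers, not something the
cover letter states, and the helper names below are invented for
illustration. The value reported by 'ip -d link show' remains the
authoritative one.

```python
import struct


def switch_id_from_version(version: int) -> str:
    """Hex dump of a 32-bit version value laid out in little-endian byte order."""
    return struct.pack("<I", version).hex()


def version_from_switch_id(switch_id: str) -> int:
    """Inverse helper: turn an 8-hex-digit switch id back into the version value."""
    return struct.unpack("<I", bytes.fromhex(switch_id))[0]


print(switch_id_from_version(0x19010F))
print(hex(version_from_switch_id("0f011900")))
```

If the assumption holds, the same helper gives a first guess for the other
version constants before checking against the running system.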

This patch creates 2 ports, sw0p1 and sw0p2 both connected to PHYs.

Bridge setup:
ip link add name br0 type bridge
ip link set dev br0 type bridge ageing_time 1000
ip link set dev br0 type bridge vlan_filtering 1
ip link set dev sw0p1 up
ip link set dev sw0p2 up
ip link set dev sw0p1 master br0
ip link set dev sw0p2 master br0
ifconfig br0 up

- VLAN config:
untagged:
bridge vlan add dev sw0p1 vid 100 pvid untagged master
bridge vlan add dev sw0p2 vid 100 pvid untagged master

tagged:
bridge vlan add dev sw0p1 vid 100 master
bridge vlan add dev sw0p2 vid 100 master

IP address on br0:
This will add VLAN 100 on the cpu port.
bridge vlan add dev br0 vid 100 pvid untagged self
udhcpc -i br0

- FDBs:
FDBs are automatically added on the appropriate switch port upon detection.
Manually adding FDBs:
bridge fdb add aa:bb:cc:dd:ee:ff dev sw0p1 master vlan 100
bridge fdb add aa:bb:cc:dd:ee:fe dev sw0p2 master

- MDBs:
MDBs are automatically added on the appropriate switch port upon detection.
Manually adding MDBs:
bridge mdb add dev br0 port sw0p1 grp 239.1.1.1 permanent vid 100

- Multicast testing client-port1(tagged on vlan 100) server-port1
switch-config is provided by TI (https://git.ti.com/switch-config)
and is used to verify correct switch configuration.
1. switch-config output
	- type: vlan , vid = 100, untag_force = 0x4, reg_mcast = 0x6,
	unreg_mcast = 0x0, member_list = 0x6
Server running on sw0p2: iperf -s -u -B 239.1.1.1 -i 1
Client running on sw0p1: iperf -c 239.1.1.1 -u -b 990m -f m -i 5 -t 3600
No IGMP reaches the CPU port to add MDBs(since CPU does not receive 
unregistered multicast as programmed).

If the MDB is added manually via:
bridge mdb add dev br0 port sw0p2 grp 239.1.1.1 permanent vid 100
or unregistered flooding is enabled via: 
bridge link set dev sw0p2 mcast_flood on
Multicast traffic is offloaded as expected.

2. switch-config output
	- type: vlan , vid = 100, untag_force = 0x7, reg_mcast = 0x7, 
	unreg_mcast = 0x1, member_list = 0x7
In this case CPU port receives the IGMP message and programs the
switch accordingly. 

tcpdump on br0 shows no packets. If the MDB entry is removed with
"bridge mdb del dev br0 port sw0p1 grp 239.1.1.1 permanent"
br0 is flooded with multicast packets correctly(since unreg multicast is
set for the CPU port).
If the MDB entry is manually added, tcpdump shows no packets and
multicast offloading starts working again.

root@ti:~# bridge mdb show
dev br0 port sw0p1 grp ff02::fb temp offload vid 100
dev br0 port sw0p1 grp 239.1.1.1 temp offload vid 100
root@ti:~# switch-config -d
type: mcast, vid = 100, addr = 01:00:5e:01:01:01, mcast_state = f, \
no super, port_mask = 0x2
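
The address 01:00:5e:01:01:01 in the ALE dump is the Ethernet multicast
address for group 239.1.1.1: the fixed prefix 01:00:5e followed by the low
23 bits of the IPv4 group address. A short sketch of that mapping (the
function name is made up for illustration):

```python
import ipaddress


def ipv4_mcast_to_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF  # only the low 23 bits are carried over
    octets = [0x01, 0x00, 0x5E,
              (low23 >> 16) & 0x7F, (low23 >> 8) & 0xFF, low23 & 0xFF]
    return ":".join(f"{o:02x}" for o in octets)


print(ipv4_mcast_to_mac("239.1.1.1"))
```

Because only 23 bits survive, several groups share one MAC address, so an
ALE entry for this address also matches e.g. 224.129.1.1.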

- Multicast testing server-port0 client-port1
The CPU port (port 0) does not show up in 'bridge mdb show'.
Using TI's switch-config to dump the switch status shows that the MDB is 
installed correctly.

root@ti:~# switch-config -d
type: mcast, vid = 100, addr = 01:00:5e:01:01:01, mcast_state = f, \
no super, port_mask = 0x1
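
The masks in the switch-config dumps above can be decoded with a few lines.
This sketch assumes bit N of a mask stands for port N, with port 0 being the
CPU port (HOST_PORT_NUM) and three ALE ports in total; the port names are
the ones used in this cover letter and the helper name is hypothetical.

```python
HOST_PORT_NUM = 0        # from the driver headers in this series
CPSW_ALE_PORTS_NUM = 3   # from the driver headers in this series

PORT_NAMES = {HOST_PORT_NUM: "cpu", 1: "sw0p1", 2: "sw0p2"}


def ports_in_mask(mask: int) -> list[str]:
    """Names of all ports whose bit is set in an ALE port mask."""
    return [PORT_NAMES[p] for p in range(CPSW_ALE_PORTS_NUM) if mask & (1 << p)]


for label, mask in [("member_list", 0x6), ("member_list", 0x7),
                    ("port_mask", 0x2), ("port_mask", 0x1)]:
    print(f"{label} = {mask:#x}: {ports_in_mask(mask)}")
```

Read this way, member_list = 0x6 is the case where the CPU port is not a
member of VLAN 100, and port_mask = 0x1 is the entry installed for the CPU
port only.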

- registered multicast:
Setting IFF_MULTICAST on/off (on eth0/eth1/br0) will affect the registered 
multicast masks programmed in the switch (for port1, port2 and the cpu port
respectively).
This must occur before adding VLANs on the interfaces. If you change the
flag after the VLAN configuration you need to re-issue the VLAN config 
commands. 

If the CPU port is participating in the proper VLANs, MDBs/FDBs will be
offloaded by the switch as described in the switchdev API. This will also be
reflected in the "bridge vlan/fdb/mdb show" commands (if the host sends
the join, the mdb entry won't show there, but the ALE status confirms that 
it's added).

- NFS:
The only way for NFS to work is by chrooting to a minimal environment when 
switch configuration that will affect connectivity is needed.
The scripts below assume you are booting over NFS with the eth1 interface (they
are hacky and are just there to prove NFS is doable).

setup.sh:
#!/bin/sh
mkdir proc
mount -t proc none /proc
ifconfig br0  > /dev/null
if [ $? -ne 0 ]; then
        echo "Setting up bridge"
        ip link add name br0 type bridge
        ip link set dev br0 type bridge ageing_time 1000
        ip link set dev br0 type bridge vlan_filtering 1

        ip link set eth1 down 
        ip link set eth1 name sw0p1 
        ip link set dev sw0p1 up
        ip link set dev sw0p2 up
        ip link set dev sw0p2 master br0
        ip link set dev sw0p1 master br0
        bridge vlan add dev br0 vid 1 pvid untagged self
        ifconfig sw0p1 0.0.0.0
        udhcpc -i br0
fi
umount /proc

run_nfs.sh:
#!/bin/sh
mkdir /tmp/root/bin -p
mkdir /tmp/root/lib -p

cp -r /lib/ /tmp/root/
cp -r /bin/ /tmp/root/
cp /sbin/ip /tmp/root/bin
cp /sbin/bridge /tmp/root/bin
cp /sbin/ifconfig /tmp/root/bin
cp /sbin/udhcpc /tmp/root/bin
cp /path/to/setup.sh /tmp/root/bin
chroot /tmp/root/ busybox sh /bin/setup.sh

run ./run_nfs.sh

- Current issues/future work:
1. For this hardware and its applications it's essential to control the 
	CPU port individually. After removing the CPU port we lost the ability 
	to control unregistered multicast traffic flags. 
	This code unconditionally(if it participates on that VLAN) adds CPU port
	on the unregistered multicast mask, while for ports 1 and 2 this is 
	configurable via:
	"bridge link set dev eth1 mcast_flood on/off"
	Petr Machata introduced functionality on
	VLANs(9c86ce2c1ae337fc10568a12aea812ed03de8319) where the command
	"bridge vlan add dev br0 vid 100 pvid untagged self" is propagated to the
	driver and allows us to configure the CPU port. 
	Adding something similar for MDBs i.e 
	"bridge link set dev br0 mcast_flood on self" that reaches the driver
	is an idea on how to control the CPU port independently and removing the
	need to add/remove the CPU port on the vlan group for this to happen.
2. VLAN CoS is always set to 0
3. Add support for ageing configuration
4. ALE_P0_UNI_FLOOD can be controlled via: 
	"bridge link set dev br0 flood on self" if this propagates to the driver 
	as well.
5. Add documentation for CPSW configuration on the patch

- Changes since RFC v1:
 - Removed CPU port registration. User can now add CPU port VLANs with
        "bridge vlan add/del dev br0 vid 100 pvid untagged self".
 - Removing VLANs will modify registered/unregistered multicast port masks
 	properly.
 - ALE_P0_UNI_FLOOD is controlled from bridge members. As long as the 
        bridge has members(switch interfaces) this will be enabled.
 - added management for SWITCHDEV_OBJ_ID_HOST_MDB to control MDBs for the 
        CPU port.
 - Added STP support.
 - Added multicast flood support. CPU port is always enabled for now.

Ilias Apalodimas (4):
  net/cpsw: move common headers definitions to cpsw_priv.h
  net/cpsw_ale: add functions to modify VLANs/MDBs
  net/cpsw: prepare cpsw for switchdev support
  net/cpsw_switchdev: add switchdev mode of operation on cpsw driver

 drivers/net/ethernet/ti/Kconfig          |   9 +
 drivers/net/ethernet/ti/Makefile         |   1 +
 drivers/net/ethernet/ti/cpsw.c           | 555 ++++++++++++++++++++++---------
 drivers/net/ethernet/ti/cpsw_ale.c       | 188 ++++++++++-
 drivers/net/ethernet/ti/cpsw_ale.h       |  10 +
 drivers/net/ethernet/ti/cpsw_priv.h      | 148 +++++++++
 drivers/net/ethernet/ti/cpsw_switchdev.c | 418 +++++++++++++++++++++++
 drivers/net/ethernet/ti/cpsw_switchdev.h |   4 +
 8 files changed, 1167 insertions(+), 166 deletions(-)
 create mode 100644 drivers/net/ethernet/ti/cpsw_priv.h
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.c
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.h

-- 
2.7.4

^ permalink raw reply

* [RFC v2, net-next, PATCH 1/4] net/cpsw: move common headers definitions to cpsw_priv.h
From: Ilias Apalodimas @ 2018-06-14 11:11 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, andrew, f.fainelli
  Cc: francois.ozog, yogeshs, spatton, Jose.Abreu, Ilias Apalodimas
In-Reply-To: <1528974690-31600-1-git-send-email-ilias.apalodimas@linaro.org>

A following patch introduces switchdev functionality. Move common
definitions to a private header file

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 drivers/net/ethernet/ti/cpsw.c      | 111 +---------------------------
 drivers/net/ethernet/ti/cpsw_priv.h | 141 ++++++++++++++++++++++++++++++++++++
 2 files changed, 142 insertions(+), 110 deletions(-)
 create mode 100644 drivers/net/ethernet/ti/cpsw_priv.h

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 534596c..d13b57f 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -42,6 +42,7 @@
 
 #include "cpsw.h"
 #include "cpsw_ale.h"
+#include "cpsw_priv.h"
 #include "cpts.h"
 #include "davinci_cpdma.h"
 
@@ -89,7 +90,6 @@ do {								\
 #define CPSW_VERSION_3		0x19010f
 #define CPSW_VERSION_4		0x190112
 
-#define HOST_PORT_NUM		0
 #define CPSW_ALE_PORTS_NUM	3
 #define SLIVER_SIZE		0x40
 
@@ -310,16 +310,6 @@ struct cpsw_ss_regs {
 #define CPSW_MAX_BLKS_TX_SHIFT		4
 #define CPSW_MAX_BLKS_RX		5
 
-struct cpsw_host_regs {
-	u32	max_blks;
-	u32	blk_cnt;
-	u32	tx_in_ctl;
-	u32	port_vlan;
-	u32	tx_pri_map;
-	u32	cpdma_tx_pri_map;
-	u32	cpdma_rx_chan_map;
-};
-
 struct cpsw_sliver_regs {
 	u32	id_ver;
 	u32	mac_control;
@@ -371,105 +361,6 @@ struct cpsw_hw_stats {
 	u32	rxdmaoverruns;
 };
 
-struct cpsw_slave_data {
-	struct device_node *phy_node;
-	char		phy_id[MII_BUS_ID_SIZE];
-	int		phy_if;
-	u8		mac_addr[ETH_ALEN];
-	u16		dual_emac_res_vlan;	/* Reserved VLAN for DualEMAC */
-};
-
-struct cpsw_platform_data {
-	struct cpsw_slave_data	*slave_data;
-	u32	ss_reg_ofs;	/* Subsystem control register offset */
-	u32	channels;	/* number of cpdma channels (symmetric) */
-	u32	slaves;		/* number of slave cpgmac ports */
-	u32	active_slave; /* time stamping, ethtool and SIOCGMIIPHY slave */
-	u32	ale_entries;	/* ale table size */
-	u32	bd_ram_size;  /*buffer descriptor ram size */
-	u32	mac_control;	/* Mac control register */
-	u16	default_vlan;	/* Def VLAN for ALE lookup in VLAN aware mode*/
-	bool	dual_emac;	/* Enable Dual EMAC mode */
-};
-
-struct cpsw_slave {
-	void __iomem			*regs;
-	struct cpsw_sliver_regs __iomem	*sliver;
-	int				slave_num;
-	u32				mac_control;
-	struct cpsw_slave_data		*data;
-	struct phy_device		*phy;
-	struct net_device		*ndev;
-	u32				port_vlan;
-};
-
-static inline u32 slave_read(struct cpsw_slave *slave, u32 offset)
-{
-	return readl_relaxed(slave->regs + offset);
-}
-
-static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
-{
-	writel_relaxed(val, slave->regs + offset);
-}
-
-struct cpsw_vector {
-	struct cpdma_chan *ch;
-	int budget;
-};
-
-struct cpsw_common {
-	struct device			*dev;
-	struct cpsw_platform_data	data;
-	struct napi_struct		napi_rx;
-	struct napi_struct		napi_tx;
-	struct cpsw_ss_regs __iomem	*regs;
-	struct cpsw_wr_regs __iomem	*wr_regs;
-	u8 __iomem			*hw_stats;
-	struct cpsw_host_regs __iomem	*host_port_regs;
-	u32				version;
-	u32				coal_intvl;
-	u32				bus_freq_mhz;
-	int				rx_packet_max;
-	struct cpsw_slave		*slaves;
-	struct cpdma_ctlr		*dma;
-	struct cpsw_vector		txv[CPSW_MAX_QUEUES];
-	struct cpsw_vector		rxv[CPSW_MAX_QUEUES];
-	struct cpsw_ale			*ale;
-	bool				quirk_irq;
-	bool				rx_irq_disabled;
-	bool				tx_irq_disabled;
-	u32 irqs_table[IRQ_NUM];
-	struct cpts			*cpts;
-	int				rx_ch_num, tx_ch_num;
-	int				speed;
-	int				usage_count;
-};
-
-struct cpsw_priv {
-	struct net_device		*ndev;
-	struct device			*dev;
-	u32				msg_enable;
-	u8				mac_addr[ETH_ALEN];
-	bool				rx_pause;
-	bool				tx_pause;
-	u32 emac_port;
-	struct cpsw_common *cpsw;
-};
-
-struct cpsw_stats {
-	char stat_string[ETH_GSTRING_LEN];
-	int type;
-	int sizeof_stat;
-	int stat_offset;
-};
-
-enum {
-	CPSW_STATS,
-	CPDMA_RX_STATS,
-	CPDMA_TX_STATS,
-};
-
 #define CPSW_STAT(m)		CPSW_STATS,				\
 				sizeof(((struct cpsw_hw_stats *)0)->m), \
 				offsetof(struct cpsw_hw_stats, m)
diff --git a/drivers/net/ethernet/ti/cpsw_priv.h b/drivers/net/ethernet/ti/cpsw_priv.h
new file mode 100644
index 0000000..3b02a83
--- /dev/null
+++ b/drivers/net/ethernet/ti/cpsw_priv.h
@@ -0,0 +1,141 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+
+#define HOST_PORT_NUM		0
+#define IRQ_NUM			2
+#define CPSW_MAX_QUEUES		8
+
+#define CPSW_VERSION_1		0x19010a
+#define CPSW_VERSION_2		0x19010c
+#define CPSW_VERSION_3		0x19010f
+#define CPSW_VERSION_4		0x190112
+
+/* CPSW_PORT_V1 */
+#define CPSW1_MAX_BLKS      0x00 /* Maximum FIFO Blocks */
+#define CPSW1_BLK_CNT       0x04 /* FIFO Block Usage Count (Read Only) */
+#define CPSW1_TX_IN_CTL     0x08 /* Transmit FIFO Control */
+#define CPSW1_PORT_VLAN     0x0c /* VLAN Register */
+#define CPSW1_TX_PRI_MAP    0x10 /* Tx Header Priority to Switch Pri Mapping */
+#define CPSW1_TS_CTL        0x14 /* Time Sync Control */
+#define CPSW1_TS_SEQ_LTYPE  0x18 /* Time Sync Sequence ID Offset and Msg Type */
+#define CPSW1_TS_VLAN       0x1c /* Time Sync VLAN1 and VLAN2 */
+
+/* CPSW_PORT_V2 */
+#define CPSW2_CONTROL       0x00 /* Control Register */
+#define CPSW2_MAX_BLKS      0x08 /* Maximum FIFO Blocks */
+#define CPSW2_BLK_CNT       0x0c /* FIFO Block Usage Count (Read Only) */
+#define CPSW2_TX_IN_CTL     0x10 /* Transmit FIFO Control */
+#define CPSW2_PORT_VLAN     0x14 /* VLAN Register */
+#define CPSW2_TX_PRI_MAP    0x18 /* Tx Header Priority to Switch Pri Mapping */
+#define CPSW2_TS_SEQ_MTYPE  0x1c /* Time Sync Sequence ID Offset and Msg Type */
+
+struct cpsw_slave_data {
+	struct	device_node *phy_node;
+	char	phy_id[MII_BUS_ID_SIZE];
+	int	phy_if;
+	u8	mac_addr[ETH_ALEN];
+	u16	dual_emac_res_vlan;	/* Reserved VLAN for DualEMAC */
+};
+
+struct cpsw_platform_data {
+	struct cpsw_slave_data	*slave_data;
+	u32	ss_reg_ofs;	/* Subsystem control register offset */
+	u32	channels;	/* number of cpdma channels (symmetric) */
+	u32	slaves;		/* number of slave cpgmac ports */
+	u32	active_slave; /* time stamping, ethtool and SIOCGMIIPHY slave */
+	u32	ale_entries;	/* ale table size */
+	u32	bd_ram_size;  /*buffer descriptor ram size */
+	u32	mac_control;	/* Mac control register */
+	u16	default_vlan;	/* Def VLAN for ALE lookup in VLAN aware mode*/
+	bool	dual_emac;	/* Enable Dual EMAC mode */
+};
+
+struct cpsw_slave {
+	void __iomem			*regs;
+	struct cpsw_sliver_regs __iomem	*sliver;
+	int				slave_num;
+	u32				mac_control;
+	struct cpsw_slave_data		*data;
+	struct phy_device		*phy;
+	struct net_device		*ndev;
+	u32				port_vlan;
+};
+
+struct cpsw_vector {
+	struct cpdma_chan *ch;
+	int budget;
+};
+
+struct cpsw_common {
+	struct device			*dev;
+	struct cpsw_platform_data	data;
+	struct napi_struct		napi_rx;
+	struct napi_struct		napi_tx;
+	struct cpsw_ss_regs __iomem	*regs;
+	struct cpsw_wr_regs __iomem	*wr_regs;
+	u8 __iomem			*hw_stats;
+	struct cpsw_host_regs __iomem	*host_port_regs;
+	u32				version;
+	u32				coal_intvl;
+	u32				bus_freq_mhz;
+	int				rx_packet_max;
+	struct cpsw_slave		*slaves;
+	struct cpdma_ctlr		*dma;
+	struct cpsw_vector		txv[CPSW_MAX_QUEUES];
+	struct cpsw_vector		rxv[CPSW_MAX_QUEUES];
+	struct cpsw_ale			*ale;
+	bool				quirk_irq;
+	bool				rx_irq_disabled;
+	bool				tx_irq_disabled;
+	u32				irqs_table[IRQ_NUM];
+	struct cpts			*cpts;
+	int				rx_ch_num, tx_ch_num;
+	int				speed;
+	int				usage_count;
+};
+
+struct cpsw_priv {
+	struct net_device	*ndev;
+	struct device		*dev;
+	u32			msg_enable;
+	u8			mac_addr[ETH_ALEN];
+	bool			rx_pause;
+	bool			tx_pause;
+	u8			port_state[3];
+	u32			emac_port;
+	struct cpsw_common	*cpsw;
+};
+
+struct cpsw_stats {
+	char stat_string[ETH_GSTRING_LEN];
+	int type;
+	int sizeof_stat;
+	int stat_offset;
+};
+
+enum {
+	CPSW_STATS,
+	CPDMA_RX_STATS,
+	CPDMA_TX_STATS,
+};
+
+struct cpsw_host_regs {
+	u32	max_blks;
+	u32	blk_cnt;
+	u32	tx_in_ctl;
+	u32	port_vlan;
+	u32	tx_pri_map;
+	u32	cpdma_tx_pri_map;
+	u32	cpdma_rx_chan_map;
+};
+
+static inline u32 slave_read(struct cpsw_slave *slave, u32 offset)
+{
+	return readl_relaxed(slave->regs + offset);
+}
+
+static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
+{
+	writel_relaxed(val, slave->regs + offset);
+}
-- 
2.7.4

^ permalink raw reply related

* [RFC v2, net-next, PATCH 2/4] net/cpsw_ale: add functions to modify VLANs/MDBs
From: Ilias Apalodimas @ 2018-06-14 11:11 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, andrew, f.fainelli
  Cc: francois.ozog, yogeshs, spatton, Jose.Abreu, Ilias Apalodimas
In-Reply-To: <1528974690-31600-1-git-send-email-ilias.apalodimas@linaro.org>

A following patch introduces switchdev functionality. Add functions
to the cpsw ALE engine to modify VLANs/MDBs.
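
As a rough illustration of the read-modify-write pattern these helpers
follow, the sketch below models one VLAN entry as plain bitmasks. The
vlan_entry struct and the vlan_add_modify()/vlan_del_modify() names are
hypothetical stand-ins and not the driver's API; the real helpers read and
write ALE table entries in hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one ALE VLAN entry (the real one lives in hardware). */
struct vlan_entry {
	bool     in_use;
	uint32_t members;	/* ports that belong to the VLAN */
	uint32_t untag;		/* ports that egress the VLAN untagged */
	uint32_t reg_mcast;	/* registered multicast flood mask */
	uint32_t unreg_mcast;	/* unregistered multicast flood mask */
};

/* Add the ports in port_mask, keeping the settings of all other ports. */
static void vlan_add_modify(struct vlan_entry *e, uint32_t port_mask,
			    uint32_t untag_mask, uint32_t reg_mask,
			    uint32_t unreg_mask)
{
	/* Clear the old per-port bits first, then apply the new ones. */
	e->members     = (e->members & ~port_mask) | port_mask;
	e->untag       = (e->untag & ~port_mask) | untag_mask;
	e->reg_mcast   = (e->reg_mcast & ~port_mask) | reg_mask;
	e->unreg_mcast = (e->unreg_mcast & ~port_mask) | unreg_mask;
	e->in_use = true;
}

/* Remove the ports in port_mask; free the entry when no member is left. */
static void vlan_del_modify(struct vlan_entry *e, uint32_t port_mask)
{
	uint32_t remaining = e->members & ~port_mask;

	if (!remaining) {
		*e = (struct vlan_entry){0};
		return;
	}

	/* Restrict every mask to the ports that are still members. */
	e->members      = remaining;
	e->untag       &= remaining;
	e->reg_mcast   &= remaining;
	e->unreg_mcast &= remaining;
}
```

In this model, removing one port leaves the other members' untag and
multicast bits in place, and removing the last member frees the entry.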

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 drivers/net/ethernet/ti/cpsw_ale.c | 188 ++++++++++++++++++++++++++++++++++++-
 drivers/net/ethernet/ti/cpsw_ale.h |  10 ++
 2 files changed, 195 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index 93dc05c..98e6bcd 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -287,6 +287,9 @@ int cpsw_ale_flush_multicast(struct cpsw_ale *ale, int port_mask, int vid)
 		if (cpsw_ale_get_mcast(ale_entry)) {
 			u8 addr[6];
 
+			if (cpsw_ale_get_super(ale_entry))
+				continue;
+
 			cpsw_ale_get_addr(ale_entry, addr);
 			if (!is_broadcast_ether_addr(addr))
 				cpsw_ale_flush_mcast(ale, ale_entry, port_mask);
@@ -365,7 +368,7 @@ int cpsw_ale_add_mcast(struct cpsw_ale *ale, u8 *addr, int port_mask,
 	cpsw_ale_set_vlan_entry_type(ale_entry, flags, vid);
 
 	cpsw_ale_set_addr(ale_entry, addr);
-	cpsw_ale_set_super(ale_entry, (flags & ALE_BLOCKED) ? 1 : 0);
+	cpsw_ale_set_super(ale_entry, (flags & ALE_SUPER) ? 1 : 0);
 	cpsw_ale_set_mcast_state(ale_entry, mcast_state);
 
 	mask = cpsw_ale_get_port_mask(ale_entry,
@@ -409,6 +412,46 @@ int cpsw_ale_del_mcast(struct cpsw_ale *ale, u8 *addr, int port_mask,
 }
 EXPORT_SYMBOL_GPL(cpsw_ale_del_mcast);
 
+static int cpsw_ale_read_mc(struct cpsw_ale *ale, u8 *addr, int flags, u16 vid)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
+	int idx;
+
+	idx = cpsw_ale_match_addr(ale, addr, (flags & ALE_VLAN) ? vid : 0);
+	if (idx >= 0)
+		cpsw_ale_read(ale, idx, ale_entry);
+
+	return cpsw_ale_get_port_mask(ale_entry, ale->port_mask_bits);
+}
+
+int cpsw_ale_mcast_add_modify(struct cpsw_ale *ale, u8 *addr, int port_mask,
+			      int flags, u16 vid, int mcast_state)
+{
+	int mcast_members, ret;
+
+	mcast_members = cpsw_ale_read_mc(ale, addr, flags, vid) | port_mask;
+	ret = cpsw_ale_add_mcast(ale, addr, mcast_members, flags, vid,
+				 mcast_state);
+
+	return ret;
+}
+
+int cpsw_ale_mcast_del_modify(struct cpsw_ale *ale, u8 *addr, int port_mask,
+			      int flags, u16 vid)
+{
+	int mcast_members, ret;
+	int idx;
+
+	mcast_members = cpsw_ale_read_mc(ale, addr, flags, vid) & ~port_mask;
+	idx = cpsw_ale_match_addr(ale, addr, (flags & ALE_VLAN) ? vid : 0);
+	if (idx < 0)
+		return 0;
+	ret = cpsw_ale_del_mcast(ale, addr, mcast_members, flags, vid);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpsw_ale_mcast_del_modify);
+
 /* ALE NetCP NU switch specific vlan functions */
 static void cpsw_ale_set_vlan_mcast(struct cpsw_ale *ale, u32 *ale_entry,
 				    int reg_mcast, int unreg_mcast)
@@ -424,6 +467,52 @@ static void cpsw_ale_set_vlan_mcast(struct cpsw_ale *ale, u32 *ale_entry,
 	writel(unreg_mcast, ale->params.ale_regs + ALE_VLAN_MASK_MUX(idx));
 }
 
+static int cpsw_ale_read_untagged(struct cpsw_ale *ale, u16 vid)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
+	int idx;
+
+	idx = cpsw_ale_match_vlan(ale, vid);
+	if (idx >= 0)
+		cpsw_ale_read(ale, idx, ale_entry);
+
+	return cpsw_ale_get_vlan_untag_force(ale_entry, ale->vlan_field_bits);
+}
+
+/* returns mask of current members for specified vlan */
+static int cpsw_ale_read_vlan_members(struct cpsw_ale *ale, u16 vid)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
+	int idx;
+
+	idx = cpsw_ale_match_vlan(ale, vid);
+	if (idx >= 0)
+		cpsw_ale_read(ale, idx, ale_entry);
+
+	return cpsw_ale_get_vlan_member_list(ale_entry, ale->vlan_field_bits);
+}
+
+/* returns mask of registered/unregistered multicast registration */
+static int cpsw_ale_read_reg_unreg_mc(struct cpsw_ale *ale, u16 vid, bool unreg)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
+	int idx;
+	int ret;
+
+	idx = cpsw_ale_match_vlan(ale, vid);
+	if (idx >= 0)
+		cpsw_ale_read(ale, idx, ale_entry);
+
+	if (unreg)
+		ret = cpsw_ale_get_vlan_unreg_mcast(ale_entry,
+						    ale->vlan_field_bits);
+	else
+		ret = cpsw_ale_get_vlan_reg_mcast(ale_entry,
+						  ale->vlan_field_bits);
+
+	return ret;
+}
+
 int cpsw_ale_add_vlan(struct cpsw_ale *ale, u16 vid, int port, int untag,
 		      int reg_mcast, int unreg_mcast)
 {
@@ -462,6 +551,11 @@ EXPORT_SYMBOL_GPL(cpsw_ale_add_vlan);
 
 int cpsw_ale_del_vlan(struct cpsw_ale *ale, u16 vid, int port_mask)
 {
+	int reg_mcast =
+		cpsw_ale_read_reg_unreg_mc(ale, vid, 0) & port_mask;
+	int unreg_mcast =
+		cpsw_ale_read_reg_unreg_mc(ale, vid, 1) & port_mask;
+	int untag = cpsw_ale_read_untagged(ale, vid) & port_mask;
 	u32 ale_entry[ALE_ENTRY_WORDS] = {0, 0, 0};
 	int idx;
 
@@ -471,17 +565,105 @@ int cpsw_ale_del_vlan(struct cpsw_ale *ale, u16 vid, int port_mask)
 
 	cpsw_ale_read(ale, idx, ale_entry);
 
-	if (port_mask)
+	if (port_mask) {
+		cpsw_ale_set_vlan_untag_force(ale_entry, untag,
+					      ale->vlan_field_bits);
+		if (!ale->params.nu_switch_ale) {
+			cpsw_ale_set_vlan_reg_mcast(ale_entry, reg_mcast,
+						    ale->vlan_field_bits);
+			cpsw_ale_set_vlan_unreg_mcast(ale_entry, unreg_mcast,
+						      ale->vlan_field_bits);
+		} else {
+			cpsw_ale_set_vlan_mcast(ale, ale_entry, reg_mcast,
+						unreg_mcast);
+		}
 		cpsw_ale_set_vlan_member_list(ale_entry, port_mask,
 					      ale->vlan_field_bits);
-	else
+	} else {
 		cpsw_ale_set_entry_type(ale_entry, ALE_TYPE_FREE);
+	}
 
 	cpsw_ale_write(ale, idx, ale_entry);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(cpsw_ale_del_vlan);
 
+int cpsw_ale_vlan_add_modify(struct cpsw_ale *ale, u16 vid, int port_mask,
+			     int untag_mask, int reg_mask, int unreg_mask)
+{
+	int ret = 0;
+	int vlan_members = cpsw_ale_read_vlan_members(ale, vid) & ~port_mask;
+	int reg_mcast_members =
+		cpsw_ale_read_reg_unreg_mc(ale, vid, 0) & ~port_mask;
+	int unreg_mcast_members =
+		cpsw_ale_read_reg_unreg_mc(ale, vid, 1) & ~port_mask;
+	int untag_members = cpsw_ale_read_untagged(ale, vid) & ~port_mask;
+
+	vlan_members |= port_mask;
+	untag_members |= untag_mask;
+	reg_mcast_members |= reg_mask;
+	unreg_mcast_members |= unreg_mask;
+
+	ret = cpsw_ale_add_vlan(ale, vid, vlan_members, untag_members,
+				reg_mcast_members, unreg_mcast_members);
+	if (ret) {
+		dev_err(ale->params.dev, "Unable to add vlan\n");
+		return ret;
+	}
+	dev_dbg(ale->params.dev, "port mask 0x%x untag 0x%x\n", vlan_members,
+		untag_mask);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpsw_ale_vlan_add_modify);
+
+int cpsw_ale_vlan_del_modify(struct cpsw_ale *ale, u16 vid, int port_mask)
+{
+	int ret = 0;
+	int vlan_members;
+
+	vlan_members = cpsw_ale_read_vlan_members(ale, vid);
+	vlan_members &= ~port_mask;
+
+	ret = cpsw_ale_del_vlan(ale, vid, vlan_members);
+	if (ret) {
+		dev_err(ale->params.dev, "Unable to del vlan\n");
+		return ret;
+	}
+	dev_dbg(ale->params.dev, "port mask 0x%x\n", port_mask);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(cpsw_ale_vlan_del_modify);
+
+void cpsw_ale_set_unreg_mcast(struct cpsw_ale *ale, int unreg_mcast_mask,
+			      bool add)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS];
+	int unreg_members = 0;
+	int type, idx;
+
+	for (idx = 0; idx < ale->params.ale_entries; idx++) {
+		cpsw_ale_read(ale, idx, ale_entry);
+		type = cpsw_ale_get_entry_type(ale_entry);
+		if (type != ALE_TYPE_VLAN)
+			continue;
+
+		unreg_members =
+			cpsw_ale_get_vlan_unreg_mcast(ale_entry,
+						      ale->vlan_field_bits);
+		if (add)
+			unreg_members |= unreg_mcast_mask;
+		else
+			unreg_members &= ~unreg_mcast_mask;
+		cpsw_ale_set_vlan_unreg_mcast(ale_entry, unreg_members,
+					      ale->vlan_field_bits);
+		cpsw_ale_write(ale, idx, ale_entry);
+	}
+}
+EXPORT_SYMBOL_GPL(cpsw_ale_set_unreg_mcast);
+
 void cpsw_ale_set_allmulti(struct cpsw_ale *ale, int allmulti)
 {
 	u32 ale_entry[ALE_ENTRY_WORDS];
diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h
index d4fe901..1eef640 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.h
+++ b/drivers/net/ethernet/ti/cpsw_ale.h
@@ -123,4 +123,14 @@ int cpsw_ale_control_set(struct cpsw_ale *ale, int port,
 			 int control, int value);
 void cpsw_ale_dump(struct cpsw_ale *ale, u32 *data);
 
+int cpsw_ale_vlan_add_modify(struct cpsw_ale *ale, u16 vid, int port_mask,
+			     int untag_mask, int reg_mcast, int unreg_mcast);
+int cpsw_ale_vlan_del_modify(struct cpsw_ale *ale, u16 vid, int port_mask);
+int cpsw_ale_mcast_add_modify(struct cpsw_ale *ale, u8 *addr, int port_mask,
+			      int flags, u16 vid, int mcast_state);
+int cpsw_ale_mcast_del_modify(struct cpsw_ale *ale, u8 *addr, int port,
+			      int flags, u16 vid);
+void cpsw_ale_set_unreg_mcast(struct cpsw_ale *ale, int unreg_mcast_mask,
+			      bool add);
+
 #endif
-- 
2.7.4

^ permalink raw reply related

* [RFC v2, net-next, PATCH 3/4] net/cpsw: prepare cpsw for switchdev support
From: Ilias Apalodimas @ 2018-06-14 11:11 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, andrew, f.fainelli
  Cc: francois.ozog, yogeshs, spatton, Jose.Abreu, Ilias Apalodimas
In-Reply-To: <1528974690-31600-1-git-send-email-ilias.apalodimas@linaro.org>

A following patch introduces switchdev functionality. Prepare the cpsw
driver to accommodate an extra mode of operation using switchdev.
This patch does not change the cpsw driver's current functionality.
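
One visible effect of the preparation is that emac_port becomes a 1-based
port number instead of a 0-based slave index. The sketch below models that
mapping in isolation. The port_model struct and slave_index() are
hypothetical names used only for illustration, and HOST_PORT being 0 is an
assumption about the driver's HOST_PORT_NUM.

```c
/* Mirrors the order of CPSW_TI_SWITCH / CPSW_DUAL_EMAC in the patch. */
enum switch_mode {
	MODE_TI_SWITCH,
	MODE_DUAL_EMAC,
};

#define HOST_PORT 0	/* assumption: host (CPU) port is port 0 */

/* Hypothetical, reduced view of the fields the index lookup needs. */
struct port_model {
	enum switch_mode mode;
	int emac_port;		/* 1-based port number, or HOST_PORT */
	int active_slave;	/* slave used for PHY access in switch mode */
};

/* Map a port to its index in the slaves array. */
static int slave_index(const struct port_model *p)
{
	/* Any non-switch mode: port N is backed by slave N - 1. */
	if (p->mode != MODE_TI_SWITCH)
		return p->emac_port - 1;

	/* Plain switch mode: a single netdev uses the active slave. */
	return p->active_slave;
}
```

With this numbering, the second dual EMAC interface (emac_port 2) resolves
to slave 1, which matches the priv_sl2->emac_port change in the diff.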

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 drivers/net/ethernet/ti/cpsw.c      | 146 ++++++++++++++++++++++++------------
 drivers/net/ethernet/ti/cpsw_priv.h |   7 +-
 2 files changed, 104 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index d13b57f..e5765cc 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -147,9 +147,6 @@ do {								\
 #define CPSW_CMINTMAX_INTVL	(1000 / CPSW_CMINTMIN_CNT)
 #define CPSW_CMINTMIN_INTVL	((1000 / CPSW_CMINTMAX_CNT) + 1)
 
-#define cpsw_slave_index(cpsw, priv)				\
-		((cpsw->data.dual_emac) ? priv->emac_port :	\
-		cpsw->data.active_slave)
 #define IRQ_NUM			2
 #define CPSW_MAX_QUEUES		8
 #define CPSW_CPDMA_DESCS_POOL_SIZE_DEFAULT 256
@@ -182,6 +179,9 @@ static int descs_pool_size = CPSW_CPDMA_DESCS_POOL_SIZE_DEFAULT;
 module_param(descs_pool_size, int, 0444);
 MODULE_PARM_DESC(descs_pool_size, "Number of CPDMA CPPI descriptors in pool");
 
+static int cpsw_is_dual_mac(u8 switch_mode);
+static int cpsw_is_switch(u8 switch_mode);
+
 struct cpsw_wr_regs {
 	u32	id_ver;
 	u32	soft_reset;
@@ -434,8 +434,9 @@ static const struct cpsw_stats cpsw_gstrings_ch_stats[] = {
 		struct cpsw_slave *slave;				\
 		struct cpsw_common *cpsw = (priv)->cpsw;		\
 		int n;							\
-		if (cpsw->data.dual_emac)				\
-			(func)((cpsw)->slaves + priv->emac_port, ##arg);\
+		if (!cpsw_is_switch(cpsw->data.switch_mode))		\
+			(func)((cpsw)->slaves + priv->emac_port - 1,	\
+			       ##arg);					\
 		else							\
 			for (n = cpsw->data.slaves,			\
 					slave = cpsw->slaves;		\
@@ -445,7 +446,7 @@ static const struct cpsw_stats cpsw_gstrings_ch_stats[] = {
 
 #define cpsw_dual_emac_src_port_detect(cpsw, status, ndev, skb)		\
 	do {								\
-		if (!cpsw->data.dual_emac)				\
+		if (cpsw_is_switch(cpsw->data.switch_mode))		\
 			break;						\
 		if (CPDMA_RX_SOURCE_PORT(status) == 1) {		\
 			ndev = cpsw->slaves[0].ndev;			\
@@ -457,7 +458,7 @@ static const struct cpsw_stats cpsw_gstrings_ch_stats[] = {
 	} while (0)
 #define cpsw_add_mcast(cpsw, priv, addr)				\
 	do {								\
-		if (cpsw->data.dual_emac) {				\
+		if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {		\
 			struct cpsw_slave *slave = cpsw->slaves +	\
 						priv->emac_port;	\
 			int slave_port = cpsw_get_slave_port(		\
@@ -477,13 +478,31 @@ static inline int cpsw_get_slave_port(u32 slave_num)
 	return slave_num + 1;
 }
 
+static int cpsw_is_dual_mac(u8 switch_mode)
+{
+	return switch_mode == CPSW_DUAL_EMAC;
+}
+
+static int cpsw_is_switch(u8 switch_mode)
+{
+	return switch_mode == CPSW_TI_SWITCH;
+}
+
+static int cpsw_slave_index(struct cpsw_priv *priv)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+
+	return cpsw->data.switch_mode ? priv->emac_port - 1 :
+		cpsw->data.active_slave;
+}
+
 static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 {
 	struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
 	struct cpsw_ale *ale = cpsw->ale;
 	int i;
 
-	if (cpsw->data.dual_emac) {
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		bool flag = false;
 
 		/* Enabling promiscuous mode for one interface will be
@@ -509,7 +528,7 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 			cpsw_ale_control_set(ale, 0, ALE_BYPASS, 0);
 			dev_dbg(&ndev->dev, "promiscuity disabled\n");
 		}
-	} else {
+	} else if (cpsw_is_switch(cpsw->data.switch_mode)) {
 		if (enable) {
 			unsigned long timeout = jiffies + HZ;
 
@@ -556,10 +575,11 @@ static void cpsw_ndo_set_rx_mode(struct net_device *ndev)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
+	int slave_no = cpsw_slave_index(priv);
 	int vid;
 
-	if (cpsw->data.dual_emac)
-		vid = cpsw->slaves[priv->emac_port].port_vlan;
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode))
+		vid = cpsw->slaves[slave_no].port_vlan;
 	else
 		vid = cpsw->data.default_vlan;
 
@@ -630,8 +650,9 @@ static void cpsw_tx_handler(void *token, int len, int status)
 static void cpsw_rx_vlan_encap(struct sk_buff *skb)
 {
 	struct cpsw_priv *priv = netdev_priv(skb->dev);
-	struct cpsw_common *cpsw = priv->cpsw;
 	u32 rx_vlan_encap_hdr = *((u32 *)skb->data);
+	struct cpsw_common *cpsw = priv->cpsw;
+	int slave_no = cpsw_slave_index(priv);
 	u16 vtag, vid, prio, pkt_type;
 
 	/* Remove VLAN header encapsulation word */
@@ -652,8 +673,8 @@ static void cpsw_rx_vlan_encap(struct sk_buff *skb)
 	if (!vid)
 		return;
 	/* Ignore default vlans in dual mac mode */
-	if (cpsw->data.dual_emac &&
-	    vid == cpsw->slaves[priv->emac_port].port_vlan)
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode) &&
+	    vid == cpsw->slaves[slave_no].port_vlan)
 		return;
 
 	prio = (rx_vlan_encap_hdr >>
@@ -682,9 +703,9 @@ static void cpsw_rx_handler(void *token, int len, int status)
 	cpsw_dual_emac_src_port_detect(cpsw, status, ndev, skb);
 
 	if (unlikely(status < 0) || unlikely(!netif_running(ndev))) {
-		/* In dual emac mode check for all interfaces */
-		if (cpsw->data.dual_emac && cpsw->usage_count &&
-		    (status >= 0)) {
+		/* In any mode other than switch, check for all interfaces */
+		if (!cpsw_is_switch(cpsw->data.switch_mode) &&
+		    cpsw->usage_count && status >= 0) {
 			/* The packet received is for the interface which
 			 * is already down and the other interface is up
 			 * and running, instead of freeing which results
@@ -1235,11 +1256,11 @@ static inline int cpsw_tx_packet_submit(struct cpsw_priv *priv,
 					struct sk_buff *skb,
 					struct cpdma_chan *txch)
 {
-	struct cpsw_common *cpsw = priv->cpsw;
 
 	skb_tx_timestamp(skb);
+
 	return cpdma_chan_submit(txch, skb, skb->data, skb->len,
-				 priv->emac_port + cpsw->data.dual_emac);
+				 priv->emac_port);
 }
 
 static inline void cpsw_add_dual_emac_def_ale_entries(
@@ -1314,7 +1335,7 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
 
 	slave_port = cpsw_get_slave_port(slave->slave_num);
 
-	if (cpsw->data.dual_emac)
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode))
 		cpsw_add_dual_emac_def_ale_entries(priv, slave, slave_port);
 	else
 		cpsw_ale_add_mcast(cpsw->ale, priv->ndev->broadcast,
@@ -1393,8 +1414,8 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
 	control_reg = readl(&cpsw->regs->control);
 	control_reg |= CPSW_VLAN_AWARE | CPSW_RX_VLAN_ENCAP;
 	writel(control_reg, &cpsw->regs->control);
-	fifo_mode = (cpsw->data.dual_emac) ? CPSW_FIFO_DUAL_MAC_MODE :
-		     CPSW_FIFO_NORMAL_MODE;
+	fifo_mode = cpsw_is_dual_mac(cpsw->data.switch_mode) ?
+		CPSW_FIFO_DUAL_MAC_MODE : CPSW_FIFO_NORMAL_MODE;
 	writel(fifo_mode, &cpsw->host_port_regs->tx_in_ctl);
 
 	/* setup host port priority mapping */
@@ -1405,7 +1426,7 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
 	cpsw_ale_control_set(cpsw->ale, HOST_PORT_NUM,
 			     ALE_PORT_STATE, ALE_PORT_STATE_FORWARD);
 
-	if (!cpsw->data.dual_emac) {
+	if (!cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		cpsw_ale_add_ucast(cpsw->ale, priv->mac_addr, HOST_PORT_NUM,
 				   0, 0);
 		cpsw_ale_add_mcast(cpsw->ale, priv->ndev->broadcast,
@@ -1508,7 +1529,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
 	for_each_slave(priv, cpsw_slave_open, priv);
 
 	/* Add default VLAN */
-	if (!cpsw->data.dual_emac)
+	if (!cpsw_is_dual_mac(cpsw->data.switch_mode))
 		cpsw_add_default_vlan(priv);
 	else
 		cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan,
@@ -1685,9 +1706,13 @@ static void cpsw_hwtstamp_v2(struct cpsw_priv *priv)
 {
 	struct cpsw_slave *slave;
 	struct cpsw_common *cpsw = priv->cpsw;
+	int slave_no = cpsw_slave_index(priv);
 	u32 ctrl, mtype;
 
-	slave = &cpsw->slaves[cpsw_slave_index(cpsw, priv)];
+	if (slave_no < 0)
+		return;
+
+	slave = &cpsw->slaves[slave_no];
 
 	ctrl = slave_read(slave, CPSW2_CONTROL);
 	switch (cpsw->version) {
@@ -1822,7 +1847,7 @@ static int cpsw_ndo_ioctl(struct net_device *dev, struct ifreq *req, int cmd)
 {
 	struct cpsw_priv *priv = netdev_priv(dev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
 
 	if (!netif_running(dev))
 		return -EINVAL;
@@ -1863,6 +1888,7 @@ static int cpsw_ndo_set_mac_address(struct net_device *ndev, void *p)
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct sockaddr *addr = (struct sockaddr *)p;
 	struct cpsw_common *cpsw = priv->cpsw;
+	int slave_no = cpsw_slave_index(priv);
 	int flags = 0;
 	u16 vid = 0;
 	int ret;
@@ -1876,8 +1902,8 @@ static int cpsw_ndo_set_mac_address(struct net_device *ndev, void *p)
 		return ret;
 	}
 
-	if (cpsw->data.dual_emac) {
-		vid = cpsw->slaves[priv->emac_port].port_vlan;
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
+		vid = cpsw->slaves[slave_no].port_vlan;
 		flags = ALE_VLAN;
 	}
 
@@ -1915,8 +1941,8 @@ static inline int cpsw_add_vlan_ale_entry(struct cpsw_priv *priv,
 	u32 port_mask;
 	struct cpsw_common *cpsw = priv->cpsw;
 
-	if (cpsw->data.dual_emac) {
-		port_mask = (1 << (priv->emac_port + 1)) | ALE_PORT_HOST;
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
+		port_mask = (1 << priv->emac_port) | ALE_PORT_HOST;
 
 		if (priv->ndev->flags & IFF_ALLMULTI)
 			unreg_mcast_mask = port_mask;
@@ -1969,7 +1995,7 @@ static int cpsw_ndo_vlan_rx_add_vid(struct net_device *ndev,
 		return ret;
 	}
 
-	if (cpsw->data.dual_emac) {
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		/* In dual EMAC, reserved VLAN id should not be used for
 		 * creating VLAN interfaces as this can break the dual
 		 * EMAC port separation
@@ -2005,7 +2031,7 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
 		return ret;
 	}
 
-	if (cpsw->data.dual_emac) {
+	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		int i;
 
 		for (i = 0; i < cpsw->data.slaves; i++) {
@@ -2183,7 +2209,10 @@ static int cpsw_get_link_ksettings(struct net_device *ndev,
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (!cpsw->slaves[slave_no].phy)
 		return -EOPNOTSUPP;
@@ -2197,7 +2226,10 @@ static int cpsw_set_link_ksettings(struct net_device *ndev,
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return phy_ethtool_ksettings_set(cpsw->slaves[slave_no].phy,
@@ -2210,7 +2242,10 @@ static void cpsw_get_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return;
 
 	wol->supported = 0;
 	wol->wolopts = 0;
@@ -2223,7 +2258,10 @@ static int cpsw_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return phy_ethtool_set_wol(cpsw->slaves[slave_no].phy, wol);
@@ -2487,7 +2525,10 @@ static int cpsw_get_eee(struct net_device *ndev, struct ethtool_eee *edata)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return phy_ethtool_get_eee(cpsw->slaves[slave_no].phy, edata);
@@ -2499,7 +2540,10 @@ static int cpsw_set_eee(struct net_device *ndev, struct ethtool_eee *edata)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return phy_ethtool_set_eee(cpsw->slaves[slave_no].phy, edata);
@@ -2511,7 +2555,10 @@ static int cpsw_nway_reset(struct net_device *ndev)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct cpsw_common *cpsw = priv->cpsw;
-	int slave_no = cpsw_slave_index(cpsw, priv);
+	int slave_no = cpsw_slave_index(priv);
+
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
 
 	if (cpsw->slaves[slave_no].phy)
 		return genphy_restart_aneg(cpsw->slaves[slave_no].phy);
@@ -2662,7 +2709,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 	data->mac_control = prop;
 
 	if (of_property_read_bool(node, "dual_emac"))
-		data->dual_emac = 1;
+		data->switch_mode = CPSW_DUAL_EMAC;
 
 	/*
 	 * Populate all the child nodes here...
@@ -2743,7 +2790,7 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 			if (ret)
 				return ret;
 		}
-		if (data->dual_emac) {
+		if (cpsw_is_dual_mac(data->switch_mode)) {
 			if (of_property_read_u32(slave_node, "dual_emac_res_vlan",
 						 &prop)) {
 				dev_err(&pdev->dev, "Missing dual_emac_res_vlan in DT.\n");
@@ -2823,7 +2870,7 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv)
 	}
 	memcpy(ndev->dev_addr, priv_sl2->mac_addr, ETH_ALEN);
 
-	priv_sl2->emac_port = 1;
+	priv_sl2->emac_port = 2;
 	cpsw->slaves[1].ndev = ndev;
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
 
@@ -2947,7 +2994,10 @@ static int cpsw_probe(struct platform_device *pdev)
 		cpsw->slaves[i].slave_num = i;
 
 	cpsw->slaves[0].ndev = ndev;
-	priv->emac_port = 0;
+	if (cpsw_is_switch(cpsw->data.switch_mode))
+		priv->emac_port = HOST_PORT_NUM;
+	else
+		priv->emac_port = 1;
 
 	clk = devm_clk_get(&pdev->dev, "fck");
 	if (IS_ERR(clk)) {
@@ -3106,7 +3156,7 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_dma_ret;
 	}
 
-	if (cpsw->data.dual_emac) {
+	if (!cpsw_is_switch(cpsw->data.switch_mode)) {
 		ret = cpsw_probe_dual_emac(priv);
 		if (ret) {
 			cpsw_err(priv, probe, "error probe slave 2 emac interface\n");
@@ -3186,7 +3236,7 @@ static int cpsw_remove(struct platform_device *pdev)
 		return ret;
 	}
 
-	if (cpsw->data.dual_emac)
+	if (!cpsw_is_switch(cpsw->data.switch_mode))
 		unregister_netdev(cpsw->slaves[1].ndev);
 	unregister_netdev(ndev);
 
@@ -3195,7 +3245,7 @@ static int cpsw_remove(struct platform_device *pdev)
 	cpsw_remove_dt(pdev);
 	pm_runtime_put_sync(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);
-	if (cpsw->data.dual_emac)
+	if (!cpsw_is_switch(cpsw->data.switch_mode))
 		free_netdev(cpsw->slaves[1].ndev);
 	free_netdev(ndev);
 	return 0;
@@ -3208,7 +3258,7 @@ static int cpsw_suspend(struct device *dev)
 	struct net_device	*ndev = platform_get_drvdata(pdev);
 	struct cpsw_common	*cpsw = ndev_to_cpsw(ndev);
 
-	if (cpsw->data.dual_emac) {
+	if (!cpsw_is_switch(cpsw->data.switch_mode)) {
 		int i;
 
 		for (i = 0; i < cpsw->data.slaves; i++) {
@@ -3237,7 +3287,7 @@ static int cpsw_resume(struct device *dev)
 
 	/* shut up ASSERT_RTNL() warning in netif_set_real_num_tx/rx_queues */
 	rtnl_lock();
-	if (cpsw->data.dual_emac) {
+	if (!cpsw_is_switch(cpsw->data.switch_mode)) {
 		int i;
 
 		for (i = 0; i < cpsw->data.slaves; i++) {
diff --git a/drivers/net/ethernet/ti/cpsw_priv.h b/drivers/net/ethernet/ti/cpsw_priv.h
index 3b02a83..86a2709 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.h
+++ b/drivers/net/ethernet/ti/cpsw_priv.h
@@ -30,6 +30,11 @@
 #define CPSW2_TX_PRI_MAP    0x18 /* Tx Header Priority to Switch Pri Mapping */
 #define CPSW2_TS_SEQ_MTYPE  0x1c /* Time Sync Sequence ID Offset and Msg Type */
 
+enum {
+	CPSW_TI_SWITCH,
+	CPSW_DUAL_EMAC,
+};
+
 struct cpsw_slave_data {
 	struct	device_node *phy_node;
 	char	phy_id[MII_BUS_ID_SIZE];
@@ -48,7 +53,7 @@ struct cpsw_platform_data {
 	u32	bd_ram_size;  /*buffer descriptor ram size */
 	u32	mac_control;	/* Mac control register */
 	u16	default_vlan;	/* Def VLAN for ALE lookup in VLAN aware mode*/
-	bool	dual_emac;	/* Enable Dual EMAC mode */
+	u8	switch_mode;    /* Enable Dual EMAC/switchdev mode */
 };
 
 struct cpsw_slave {
-- 
2.7.4

^ permalink raw reply related

* [RFC v2, net-next, PATCH 4/4] net/cpsw_switchdev: add switchdev mode of operation on cpsw driver
From: Ilias Apalodimas @ 2018-06-14 11:11 UTC (permalink / raw)
  To: netdev, grygorii.strashko, ivan.khoronzhuk, nsekhar, jiri,
	ivecera, andrew, f.fainelli
  Cc: francois.ozog, yogeshs, spatton, Jose.Abreu, Ilias Apalodimas
In-Reply-To: <1528974690-31600-1-git-send-email-ilias.apalodimas@linaro.org>

This patch enables switchdev functionality on the driver based on a
.config option (CONFIG_TI_CPSW_SWITCHDEV). The CPSW driver uses a DTS option
called dual_emac to enable switch or dual emac mode. The new config option
will override this configuration.

It creates 2 ports, eth0 and eth1 (that can be renamed to sw0p1 and sw0p2
via udev rules).
sw0p1 and sw0p2 are the netdev interfaces connected to PHY devices.
This hardware also has a CPU port which is configured individually in
the case of VLANs.

On device init all netdevices (including the CPU port) will operate on
VLAN 0. sw0p1 and sw0p2 will operate as normal netdev interfaces.
Once they are added to a bridge, the default bridge VLAN will not be added
to the CPU port. In order to get an IP address on br0 you'll need to add
the CPU port to that VLAN by issuing:
bridge vlan add dev br0 vid <vid> pvid untagged self

Multicast traffic:
Setting IFF_MULTICAST on and off will affect registered multicast on that
port (if enabled, the port will be added to the registered multicast traffic
mask). This must occur before adding VLANs on the interfaces. If you change
the flag after the VLAN configuration you need to re-issue the VLAN config
commands.

MDBs/FDBs:
If the CPU port is a member of the appropriate VLANs then the switchdev API
will add FDB/MDB entries upon detection. If the CPU port is not a member
the user can manually specify the entries.

ALE_P0_UNI_FLOOD will be enabled when the first interface joins the bridge
and will be disabled once the last interface leaves the bridge
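
The first-join/last-leave rule for ALE_P0_UNI_FLOOD amounts to a reference
count on bridge membership. A minimal sketch of that idea follows. The
bridge_model struct and the port_join()/port_leave() helpers are
hypothetical, and the boolean stands in for the real ALE control write.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping; p0_uni_flood stands in for ALE_P0_UNI_FLOOD. */
struct bridge_model {
	int  members;		/* switch ports currently in the bridge */
	bool p0_uni_flood;	/* unicast flooding to the host port */
};

/* Called when a switch port joins the bridge. */
static void port_join(struct bridge_model *b)
{
	/* Only the first member turns flooding on. */
	if (b->members++ == 0)
		b->p0_uni_flood = true;
}

/* Called when a switch port leaves the bridge. */
static void port_leave(struct bridge_model *b)
{
	if (b->members == 0)
		return;		/* nothing to undo */

	/* Only the last member turns flooding off. */
	if (--b->members == 0)
		b->p0_uni_flood = false;
}
```

Flooding stays enabled while at least one port remains in the bridge, so a
port leaving does not disturb the remaining member.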

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 drivers/net/ethernet/ti/Kconfig          |   9 +
 drivers/net/ethernet/ti/Makefile         |   1 +
 drivers/net/ethernet/ti/cpsw.c           | 306 +++++++++++++++++++++-
 drivers/net/ethernet/ti/cpsw_priv.h      |   2 +
 drivers/net/ethernet/ti/cpsw_switchdev.c | 418 +++++++++++++++++++++++++++++++
 drivers/net/ethernet/ti/cpsw_switchdev.h |   4 +
 6 files changed, 731 insertions(+), 9 deletions(-)
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.c
 create mode 100644 drivers/net/ethernet/ti/cpsw_switchdev.h

diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
index 9263d63..a299d86 100644
--- a/drivers/net/ethernet/ti/Kconfig
+++ b/drivers/net/ethernet/ti/Kconfig
@@ -73,6 +73,15 @@ config TI_CPSW
 	  To compile this driver as a module, choose M here: the module
 	  will be called cpsw.
 
+config TI_CPSW_SWITCHDEV
+	bool "TI CPSW switchdev support"
+	depends on TI_CPSW
+	depends on NET_SWITCHDEV
+	help
+	  Enable switchdev support on TI's CPSW Ethernet Switch.
+
+	  This will allow you to configure the switch using standard tools.
+
 config TI_CPTS
 	bool "TI Common Platform Time Sync (CPTS) Support"
 	depends on TI_CPSW || TI_KEYSTONE_NETCP || COMPILE_TEST
diff --git a/drivers/net/ethernet/ti/Makefile b/drivers/net/ethernet/ti/Makefile
index 0be551d..d6eb2a2 100644
--- a/drivers/net/ethernet/ti/Makefile
+++ b/drivers/net/ethernet/ti/Makefile
@@ -15,6 +15,7 @@ obj-$(CONFIG_TI_CPSW_PHY_SEL) += cpsw-phy-sel.o
 obj-$(CONFIG_TI_CPSW_ALE) += cpsw_ale.o
 obj-$(CONFIG_TI_CPTS_MOD) += cpts.o
 obj-$(CONFIG_TI_CPSW) += ti_cpsw.o
+obj-$(CONFIG_TI_CPSW_SWITCHDEV) += cpsw_switchdev.o
 ti_cpsw-y := cpsw.o
 
 obj-$(CONFIG_TI_KEYSTONE_NETCP) += keystone_netcp.o
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index e5765cc..b501908 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -18,12 +18,10 @@
 #include <linux/clk.h>
 #include <linux/timer.h>
 #include <linux/module.h>
-#include <linux/platform_device.h>
 #include <linux/irqreturn.h>
 #include <linux/interrupt.h>
 #include <linux/if_ether.h>
 #include <linux/etherdevice.h>
-#include <linux/netdevice.h>
 #include <linux/net_tstamp.h>
 #include <linux/phy.h>
 #include <linux/workqueue.h>
@@ -43,6 +41,7 @@
 #include "cpsw.h"
 #include "cpsw_ale.h"
 #include "cpsw_priv.h"
+#include "cpsw_switchdev.h"
 #include "cpts.h"
 #include "davinci_cpdma.h"
 
@@ -361,6 +360,13 @@ struct cpsw_hw_stats {
 	u32	rxdmaoverruns;
 };
 
+struct cpsw_switchdev_event_work {
+	struct work_struct work;
+	struct switchdev_notifier_fdb_info fdb_info;
+	struct cpsw_priv *priv;
+	unsigned long event;
+};
+
 #define CPSW_STAT(m)		CPSW_STATS,				\
 				sizeof(((struct cpsw_hw_stats *)0)->m), \
 				offsetof(struct cpsw_hw_stats, m)
@@ -488,14 +494,32 @@ static int cpsw_is_switch(u8 switch_mode)
 	return switch_mode == CPSW_TI_SWITCH;
 }
 
+static int cpsw_is_switchdev(u8 switch_mode)
+{
+	return switch_mode == CPSW_SWITCHDEV;
+}
+
 static int cpsw_slave_index(struct cpsw_priv *priv)
 {
 	struct cpsw_common *cpsw = priv->cpsw;
 
+#if IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV)
+	if (priv->emac_port == HOST_PORT_NUM)
+		return -1;
+#endif
+
 	return cpsw->data.switch_mode ? priv->emac_port - 1 :
 		cpsw->data.active_slave;
 }
 
+static void cpsw_switchdev_port_enable(struct net_device *ndev)
+{
+#if IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV)
+	cpsw_port_switchdev_init(ndev);
+	ndev->features |= NETIF_F_NETNS_LOCAL;
+#endif
+}
+
 static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 {
 	struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
@@ -521,6 +545,7 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 		if (enable) {
 			/* Enable Bypass */
 			cpsw_ale_control_set(ale, 0, ALE_BYPASS, 1);
+			cpsw_ale_set_allmulti(ale, IFF_ALLMULTI);
 
 			dev_dbg(&ndev->dev, "promiscuity enabled\n");
 		} else {
@@ -554,6 +579,7 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 
 			/* Flood All Unicast Packets to Host port */
 			cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 1);
+			cpsw_ale_set_allmulti(ale, IFF_ALLMULTI);
 			dev_dbg(&ndev->dev, "promiscuity enabled\n");
 		} else {
 			/* Don't Flood All Unicast Packets to Host port */
@@ -568,6 +594,19 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 			}
 			dev_dbg(&ndev->dev, "promiscuity disabled\n");
 		}
+	} else if (cpsw_is_switchdev(cpsw->data.switch_mode)) {
+		/* When interfaces are placed into a bridge they'll switch to
+		 * promiscuous mode. In the switchdev case ALE_P0_UNI_FLOOD is
+		 * set or cleared depending on whether any switch port is a
+		 * bridge member
+		 */
+		struct cpsw_priv *priv = netdev_priv(ndev);
+		int slave_idx = cpsw_slave_index(priv);
+		int slave_num;
+
+		slave_num = cpsw_get_slave_port(slave_idx);
+		cpsw_ale_control_set(ale, slave_num, ALE_PORT_NOLEARN, 0);
+		cpsw_ale_control_set(ale, slave_num, ALE_PORT_NO_SA_UPDATE, 0);
 	}
 }
 
@@ -586,7 +625,6 @@ static void cpsw_ndo_set_rx_mode(struct net_device *ndev)
 	if (ndev->flags & IFF_PROMISC) {
 		/* Enable promiscuous mode */
 		cpsw_set_promiscious(ndev, true);
-		cpsw_ale_set_allmulti(cpsw->ale, IFF_ALLMULTI);
 		return;
 	} else {
 		/* Disable promiscuous mode */
@@ -721,6 +759,10 @@ static void cpsw_rx_handler(void *token, int len, int status)
 		return;
 	}
 
+#if IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV)
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		skb->offload_fwd_mark = 1;
+#endif
 	new_skb = netdev_alloc_skb_ip_align(ndev, cpsw->rx_packet_max);
 	if (new_skb) {
 		skb_copy_queue_mapping(new_skb, skb);
@@ -1427,10 +1469,13 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
 			     ALE_PORT_STATE, ALE_PORT_STATE_FORWARD);
 
 	if (!cpsw_is_dual_mac(cpsw->data.switch_mode)) {
-		cpsw_ale_add_ucast(cpsw->ale, priv->mac_addr, HOST_PORT_NUM,
-				   0, 0);
+		char stpa[] = {0x01, 0x80, 0xc2, 0x0, 0x0, 0x0};
+
 		cpsw_ale_add_mcast(cpsw->ale, priv->ndev->broadcast,
 				   ALE_PORT_HOST, 0, 0, ALE_MCAST_FWD_2);
+		cpsw_ale_add_mcast(cpsw->ale, stpa,
+				   ALE_PORT_HOST, ALE_SUPER, 0,
+				   ALE_MCAST_BLOCK_LEARN_FWD);
 	}
 }
 
@@ -1529,11 +1574,14 @@ static int cpsw_ndo_open(struct net_device *ndev)
 	for_each_slave(priv, cpsw_slave_open, priv);
 
 	/* Add default VLAN */
-	if (!cpsw_is_dual_mac(cpsw->data.switch_mode))
+	if (!cpsw_is_dual_mac(cpsw->data.switch_mode)) {
 		cpsw_add_default_vlan(priv);
-	else
+		cpsw_ale_add_ucast(cpsw->ale, priv->mac_addr, HOST_PORT_NUM, 0,
+				   0);
+	} else {
 		cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan,
 				  ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
+	}
 
 	/* initialize shared resources for every ndev */
 	if (!cpsw->usage_count) {
@@ -1852,6 +1900,9 @@ static int cpsw_ndo_ioctl(struct net_device *dev, struct ifreq *req, int cmd)
 	if (!netif_running(dev))
 		return -EINVAL;
 
+	if (slave_no < 0)
+		return -EOPNOTSUPP;
+
 	switch (cmd) {
 	case SIOCSHWTSTAMP:
 		return cpsw_hwtstamp_set(dev, req);
@@ -1941,7 +1992,7 @@ static inline int cpsw_add_vlan_ale_entry(struct cpsw_priv *priv,
 	u32 port_mask;
 	struct cpsw_common *cpsw = priv->cpsw;
 
-	if (cpsw_is_dual_mac(cpsw->data.switch_mode)) {
+	if (!cpsw_is_switch(cpsw->data.switch_mode)) {
 		port_mask = (1 << priv->emac_port) | ALE_PORT_HOST;
 
 		if (priv->ndev->flags & IFF_ALLMULTI)
@@ -1989,6 +2040,10 @@ static int cpsw_ndo_vlan_rx_add_vid(struct net_device *ndev,
 	if (vid == cpsw->data.default_vlan)
 		return 0;
 
+	if (cpsw_is_switchdev(cpsw->data.switch_mode) &&
+	    (netif_is_bridge_port(ndev)))
+		return -EOPNOTSUPP;
+
 	ret = pm_runtime_get_sync(cpsw->dev);
 	if (ret < 0) {
 		pm_runtime_put_noidle(cpsw->dev);
@@ -2025,6 +2080,10 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
 	if (vid == cpsw->data.default_vlan)
 		return 0;
 
+	if (cpsw_is_switchdev(cpsw->data.switch_mode) &&
+	    (netif_is_bridge_port(ndev)))
+		return -EOPNOTSUPP;
+
 	ret = pm_runtime_get_sync(cpsw->dev);
 	if (ret < 0) {
 		pm_runtime_put_noidle(cpsw->dev);
@@ -2056,6 +2115,24 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
 	return ret;
 }
 
+static int cpsw_ndo_get_phys_port_name(struct net_device *ndev, char *name,
+				       size_t len)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+	int err;
+
+	if (!cpsw_is_switchdev(cpsw->data.switch_mode))
+		return -EOPNOTSUPP;
+
+	err = snprintf(name, len, "p%d", priv->emac_port);
+
+	if (err >= len)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int cpsw_ndo_set_tx_maxrate(struct net_device *ndev, int queue, u32 rate)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
@@ -2122,6 +2199,7 @@ static const struct net_device_ops cpsw_netdev_ops = {
 #endif
 	.ndo_vlan_rx_add_vid	= cpsw_ndo_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= cpsw_ndo_vlan_rx_kill_vid,
+	.ndo_get_phys_port_name = cpsw_ndo_get_phys_port_name,
 };
 
 static int cpsw_get_regs_len(struct net_device *ndev)
@@ -2711,6 +2789,10 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 	if (of_property_read_bool(node, "dual_emac"))
 		data->switch_mode = CPSW_DUAL_EMAC;
 
+	/* switchdev overrides DTS */
+	if (IS_ENABLED(CONFIG_TI_CPSW_SWITCHDEV))
+		data->switch_mode = CPSW_SWITCHDEV;
+
 	/*
 	 * Populate all the child nodes here...
 	 */
@@ -2874,6 +2956,9 @@ static int cpsw_probe_dual_emac(struct cpsw_priv *priv)
 	cpsw->slaves[1].ndev = ndev;
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
 
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		cpsw_switchdev_port_enable(ndev);
+
 	ndev->netdev_ops = &cpsw_netdev_ops;
 	ndev->ethtool_ops = &cpsw_ethtool_ops;
 
@@ -2903,6 +2988,196 @@ static const struct soc_device_attribute cpsw_soc_devices[] = {
 	{ /* sentinel */ }
 };
 
+static bool cpsw_port_dev_check(const struct net_device *dev)
+{
+	return dev->netdev_ops == &cpsw_netdev_ops;
+}
+
+static void cpsw_fdb_offload_notify(struct net_device *ndev,
+				    struct switchdev_notifier_fdb_info *rcv)
+{
+	struct switchdev_notifier_fdb_info info;
+
+	info.addr = rcv->addr;
+	info.vid = rcv->vid;
+	call_switchdev_notifiers(SWITCHDEV_FDB_OFFLOADED,
+				 ndev, &info.info);
+}
+
+static void cpsw_switchdev_event_work(struct work_struct *work)
+{
+	struct cpsw_switchdev_event_work *switchdev_work =
+		container_of(work, struct cpsw_switchdev_event_work, work);
+	struct cpsw_priv *priv = switchdev_work->priv;
+	struct switchdev_notifier_fdb_info *fdb;
+	struct cpsw_common *cpsw = priv->cpsw;
+	int port = priv->emac_port;
+
+	rtnl_lock();
+	switch (switchdev_work->event) {
+	case SWITCHDEV_FDB_ADD_TO_DEVICE:
+		fdb = &switchdev_work->fdb_info;
+		if (memcmp(priv->mac_addr, (u8 *)fdb->addr, ETH_ALEN) == 0)
+			port = HOST_PORT_NUM;
+		cpsw_ale_add_ucast(cpsw->ale, (u8 *)fdb->addr, port, ALE_VLAN,
+				   fdb->vid);
+		cpsw_fdb_offload_notify(priv->ndev, fdb);
+		break;
+	case SWITCHDEV_FDB_DEL_TO_DEVICE:
+		fdb = &switchdev_work->fdb_info;
+		if (memcmp(priv->mac_addr, (u8 *)fdb->addr, ETH_ALEN) == 0)
+			port = HOST_PORT_NUM;
+		cpsw_ale_del_ucast(cpsw->ale, (u8 *)fdb->addr, port, ALE_VLAN,
+				   fdb->vid);
+		break;
+	default:
+		break;
+	}
+	rtnl_unlock();
+
+	kfree(switchdev_work->fdb_info.addr);
+	kfree(switchdev_work);
+	dev_put(priv->ndev);
+}
+
+/* called under rcu_read_lock() */
+static int cpsw_switchdev_event(struct notifier_block *unused,
+				unsigned long event, void *ptr)
+{
+	struct net_device *ndev = switchdev_notifier_info_to_dev(ptr);
+	struct switchdev_notifier_fdb_info *fdb_info = ptr;
+	struct cpsw_switchdev_event_work *switchdev_work;
+	struct cpsw_priv *priv = netdev_priv(ndev);
+
+	if (!cpsw_port_dev_check(ndev))
+		return NOTIFY_DONE;
+
+	switchdev_work = kzalloc(sizeof(*switchdev_work), GFP_ATOMIC);
+	if (WARN_ON(!switchdev_work))
+		return NOTIFY_BAD;
+
+	INIT_WORK(&switchdev_work->work, cpsw_switchdev_event_work);
+	switchdev_work->priv = priv;
+	switchdev_work->event = event;
+
+	switch (event) {
+	case SWITCHDEV_FDB_ADD_TO_DEVICE:
+	case SWITCHDEV_FDB_DEL_TO_DEVICE:
+		memcpy(&switchdev_work->fdb_info, ptr,
+		       sizeof(switchdev_work->fdb_info));
+		switchdev_work->fdb_info.addr = kzalloc(ETH_ALEN, GFP_ATOMIC);
+		ether_addr_copy((u8 *)switchdev_work->fdb_info.addr,
+				fdb_info->addr);
+		dev_hold(ndev);
+		break;
+	default:
+		kfree(switchdev_work);
+		return NOTIFY_DONE;
+	}
+
+	queue_work(system_long_wq, &switchdev_work->work);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block cpsw_switchdev_notifier = {
+	.notifier_call = cpsw_switchdev_event,
+};
+
+static void cpsw_netdevice_port_link(struct net_device *ndev)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+
+	if (!cpsw->br_members) {
+		cpsw_ale_control_set(cpsw->ale, HOST_PORT_NUM, ALE_P0_UNI_FLOOD,
+				     1);
+		dev_dbg(&ndev->dev, "Set P0_UNI_FLOOD\n");
+	}
+	cpsw->br_members++;
+}
+
+static void cpsw_netdevice_port_unlink(struct net_device *ndev)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+
+	cpsw->br_members--;
+	if (!cpsw->br_members) {
+		cpsw_ale_control_set(cpsw->ale, HOST_PORT_NUM, ALE_P0_UNI_FLOOD,
+				     0);
+		dev_dbg(&ndev->dev, "unset P0_UNI_FLOOD\n");
+	}
+}
+
+/* netdev notifier */
+static int cpsw_netdevice_event(struct notifier_block *unused,
+				unsigned long event, void *ptr)
+{
+	struct net_device *ndev = netdev_notifier_info_to_dev(ptr);
+	struct netdev_notifier_changeupper_info *info;
+
+	switch (event) {
+	case NETDEV_CHANGEUPPER:
+		info = ptr;
+		if (!info->master)
+			goto out;
+		if (info->linking)
+			cpsw_netdevice_port_link(ndev);
+		else
+			cpsw_netdevice_port_unlink(ndev);
+		break;
+	default:
+		return NOTIFY_DONE;
+	}
+
+out:
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block cpsw_netdevice_nb __read_mostly = {
+	.notifier_call = cpsw_netdevice_event,
+};
+
+static int cpsw_register_notifiers(struct cpsw_priv *priv)
+{
+	int ret;
+
+	ret = register_netdevice_notifier(&cpsw_netdevice_nb);
+	if (ret) {
+		cpsw_err(priv, probe, "can't register netdevice notifier\n");
+		return ret;
+	}
+
+	ret = register_switchdev_notifier(&cpsw_switchdev_notifier);
+	if (ret) {
+		cpsw_err(priv, probe, "can't register switchdev notifier\n");
+		goto unreg_netdevice;
+	}
+
+	return ret;
+
+unreg_netdevice:
+	unregister_netdevice_notifier(&cpsw_netdevice_nb);
+
+	return ret;
+}
+
+static int cpsw_unregister_notifiers(struct cpsw_priv *priv)
+{
+	int ret;
+
+	ret = unregister_switchdev_notifier(&cpsw_switchdev_notifier);
+	if (ret)
+		dev_err(priv->dev, "can't unregister switchdev notifier\n");
+
+	ret += unregister_netdevice_notifier(&cpsw_netdevice_nb);
+	if (ret)
+		dev_err(priv->dev, "can't unregister netdevice notifier\n");
+
+	return ret;
+}
+
 static int cpsw_probe(struct platform_device *pdev)
 {
 	struct clk			*clk;
@@ -3135,6 +3410,9 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_dma_ret;
 	}
 
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		cpsw_switchdev_port_enable(ndev);
+
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_HW_VLAN_CTAG_RX;
 
 	ndev->netdev_ops = &cpsw_netdev_ops;
@@ -3202,6 +3480,12 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_dma_ret;
 	}
 
+	if (cpsw_is_switchdev(cpsw->data.switch_mode)) {
+		ret = cpsw_register_notifiers(priv);
+		if (ret)
+			goto clean_dma_ret;
+	}
+
 	cpsw_notice(priv, probe,
 		    "initialized device (regs %pa, irq %d, pool size %d)\n",
 		    &ss_res->start, ndev->irq, dma_params.descs_pool_size);
@@ -3227,7 +3511,8 @@ static int cpsw_probe(struct platform_device *pdev)
 static int cpsw_remove(struct platform_device *pdev)
 {
 	struct net_device *ndev = platform_get_drvdata(pdev);
-	struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
 	int ret;
 
 	ret = pm_runtime_get_sync(&pdev->dev);
@@ -3236,6 +3521,9 @@ static int cpsw_remove(struct platform_device *pdev)
 		return ret;
 	}
 
+	if (cpsw_is_switchdev(cpsw->data.switch_mode))
+		ret = cpsw_unregister_notifiers(priv);
+
 	if (!cpsw_is_switch(cpsw->data.switch_mode))
 		unregister_netdev(cpsw->slaves[1].ndev);
 	unregister_netdev(ndev);
diff --git a/drivers/net/ethernet/ti/cpsw_priv.h b/drivers/net/ethernet/ti/cpsw_priv.h
index 86a2709..4380b1c 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.h
+++ b/drivers/net/ethernet/ti/cpsw_priv.h
@@ -33,6 +33,7 @@
 enum {
 	CPSW_TI_SWITCH,
 	CPSW_DUAL_EMAC,
+	CPSW_SWITCHDEV,
 };
 
 struct cpsw_slave_data {
@@ -98,6 +99,7 @@ struct cpsw_common {
 	int				rx_ch_num, tx_ch_num;
 	int				speed;
 	int				usage_count;
+	u8				br_members;
 };
 
 struct cpsw_priv {
diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.c b/drivers/net/ethernet/ti/cpsw_switchdev.c
new file mode 100644
index 0000000..528e99e
--- /dev/null
+++ b/drivers/net/ethernet/ti/cpsw_switchdev.c
@@ -0,0 +1,418 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Texas Instruments switchdev Driver
+ *
+ * Copyright (C) 2018 Texas Instruments
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/etherdevice.h>
+#include <linux/if_bridge.h>
+#include <net/switchdev.h>
+#include "cpsw.h"
+#include "cpsw_priv.h"
+#include "cpsw_ale.h"
+
+static u32 cpsw_switchdev_get_ver(struct net_device *ndev)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+
+	return cpsw->version;
+}
+
+static int cpsw_port_stp_state_set(struct cpsw_priv *priv,
+				   struct switchdev_trans *trans, u8 state)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	u8 cpsw_state;
+	int ret = 0;
+
+	if (switchdev_trans_ph_prepare(trans))
+		return 0;
+
+	switch (state) {
+	case BR_STATE_FORWARDING:
+		cpsw_state = ALE_PORT_STATE_FORWARD;
+		break;
+	case BR_STATE_LEARNING:
+		cpsw_state = ALE_PORT_STATE_LEARN;
+		break;
+	case BR_STATE_DISABLED:
+		cpsw_state = ALE_PORT_STATE_DISABLE;
+		break;
+	case BR_STATE_LISTENING:
+	case BR_STATE_BLOCKING:
+		cpsw_state = ALE_PORT_STATE_BLOCK;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	ret = cpsw_ale_control_set(cpsw->ale, priv->emac_port,
+				   ALE_PORT_STATE, cpsw_state);
+	dev_dbg(priv->dev, "ale state: %u\n", cpsw_state);
+
+	return ret;
+}
+
+static int cpsw_port_attr_br_flags_set(struct cpsw_priv *priv,
+				       struct switchdev_trans *trans,
+				       struct net_device *orig_dev,
+				       unsigned long brport_flags)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	bool unreg_mcast_add = false;
+
+	if (switchdev_trans_ph_prepare(trans))
+		return 0;
+
+	if (brport_flags & BR_MCAST_FLOOD)
+		unreg_mcast_add = true;
+	cpsw_ale_set_unreg_mcast(cpsw->ale, BIT(priv->emac_port),
+				 unreg_mcast_add);
+
+	return 0;
+}
+
+static int cpsw_port_attr_set(struct net_device *ndev,
+			      const struct switchdev_attr *attr,
+			      struct switchdev_trans *trans)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	u8 state;
+	int ret;
+
+	dev_dbg(priv->dev, "attr: id %u dev: %s port: %u\n", attr->id,
+		priv->ndev->name, priv->emac_port);
+	switch (attr->id) {
+	case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
+		state = attr->u.stp_state;
+		ret = cpsw_port_stp_state_set(priv, trans, state);
+		dev_dbg(priv->dev, "stp state: %u\n", state);
+		break;
+	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
+		ret = cpsw_port_attr_br_flags_set(priv, trans, attr->orig_dev,
+						  attr->u.brport_flags);
+		break;
+	default:
+		ret = -EOPNOTSUPP;
+		break;
+	}
+
+	return ret;
+}
+
+static int cpsw_port_attr_get(struct net_device *dev,
+			      struct switchdev_attr *attr)
+{
+	u32 cpsw_ver;
+	int err = 0;
+
+	switch (attr->id) {
+	case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
+		cpsw_ver = cpsw_switchdev_get_ver(dev);
+		attr->u.ppid.id_len = sizeof(cpsw_ver);
+		memcpy(&attr->u.ppid.id, &cpsw_ver, attr->u.ppid.id_len);
+		break;
+	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
+		attr->u.brport_flags_support = BR_MCAST_FLOOD;
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return err;
+}
+
+static u16 cpsw_get_pvid(struct cpsw_priv *priv)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	u32 __iomem *port_vlan_reg;
+	u32 pvid;
+
+	if (priv->emac_port) {
+		int reg = CPSW2_PORT_VLAN;
+
+		if (cpsw->version == CPSW_VERSION_1)
+			reg = CPSW1_PORT_VLAN;
+		pvid = slave_read(cpsw->slaves + (priv->emac_port - 1), reg);
+	} else {
+		port_vlan_reg = &cpsw->host_port_regs->port_vlan;
+		pvid = readl(port_vlan_reg);
+	}
+
+	pvid = pvid & 0xfff;
+
+	return pvid;
+}
+
+static void cpsw_set_pvid(struct cpsw_priv *priv, u16 vid, bool cfi, u32 cos)
+{
+	struct cpsw_common *cpsw = priv->cpsw;
+	void __iomem *port_vlan_reg;
+	u32 pvid;
+
+	pvid = vid;
+	pvid |= cfi ? BIT(12) : 0;
+	pvid |= (cos & 0x7) << 13;
+
+	if (priv->emac_port) {
+		int reg = CPSW2_PORT_VLAN;
+
+		if (cpsw->version == CPSW_VERSION_1)
+			reg = CPSW1_PORT_VLAN;
+		/* no barrier */
+		slave_write(cpsw->slaves + (priv->emac_port - 1), pvid, reg);
+	} else {
+		/* CPU port */
+		port_vlan_reg = &cpsw->host_port_regs->port_vlan;
+		writel(pvid, port_vlan_reg);
+	}
+}
+
+static int cpsw_port_vlan_add(struct cpsw_priv *priv, bool untag, bool pvid,
+			      u16 vid, struct net_device *orig_dev)
+{
+	bool cpu_port = netif_is_bridge_master(orig_dev);
+	struct cpsw_common *cpsw = priv->cpsw;
+	int unreg_mcast_mask = 0;
+	int reg_mcast_mask = 0;
+	int untag_mask = 0;
+	int port_mask;
+	int ret = 0;
+	u32 flags;
+
+	if (cpu_port) {
+		port_mask = BIT(HOST_PORT_NUM);
+		flags = orig_dev->flags;
+		unreg_mcast_mask = port_mask;
+	} else {
+		port_mask = BIT(priv->emac_port);
+		flags = priv->ndev->flags;
+	}
+
+	if (flags & IFF_MULTICAST)
+		reg_mcast_mask = port_mask;
+
+	if (untag)
+		untag_mask = port_mask;
+
+	ret = cpsw_ale_vlan_add_modify(cpsw->ale, vid, port_mask, untag_mask,
+				       reg_mcast_mask, unreg_mcast_mask);
+	if (ret) {
+		dev_err(priv->dev, "Unable to add vlan\n");
+		return ret;
+	}
+
+	if (!pvid)
+		return ret;
+
+	cpsw_set_pvid(priv, vid, 0, 0);
+
+	dev_dbg(priv->dev, "VID add: %u dev: %s port: %u\n", vid,
+		priv->ndev->name, priv->emac_port);
+
+	return ret;
+}
+
+static int cpsw_port_vlan_del(struct cpsw_priv *priv, u16 vid,
+			      struct net_device *orig_dev)
+{
+	bool cpu_port = netif_is_bridge_master(orig_dev);
+	struct cpsw_common *cpsw = priv->cpsw;
+	int port_mask;
+	int ret = 0;
+
+	if (cpu_port)
+		port_mask = BIT(HOST_PORT_NUM);
+	else
+		port_mask = BIT(priv->emac_port);
+
+	ret = cpsw_ale_vlan_del_modify(cpsw->ale, vid, port_mask);
+	if (ret != 0)
+		return ret;
+
+	/* We don't care for the return value here, error is returned only if
+	 * the unicast entry is not present
+	 */
+	cpsw_ale_del_ucast(cpsw->ale, priv->mac_addr,
+			   HOST_PORT_NUM, ALE_VLAN, vid);
+
+	if (vid == cpsw_get_pvid(priv))
+		cpsw_set_pvid(priv, 0, 0, 0);
+
+	/* We don't care for the return value here, error is returned only if
+	 * the multicast entry is not present
+	 */
+	cpsw_ale_del_mcast(cpsw->ale, priv->ndev->broadcast,
+			   0, ALE_VLAN, vid);
+
+	dev_dbg(priv->dev, "VID del: %u dev: %s port: %u\n", vid,
+		priv->ndev->name, priv->emac_port);
+
+	return ret;
+}
+
+static int cpsw_port_vlans_add(struct cpsw_priv *priv,
+			       const struct switchdev_obj_port_vlan *vlan,
+			       struct switchdev_trans *trans)
+{
+	bool untag = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
+	struct net_device *orig_dev = vlan->obj.orig_dev;
+	bool cpu_port = netif_is_bridge_master(orig_dev);
+	bool pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID;
+	u16 vid;
+
+	if (cpu_port && !(vlan->flags & BRIDGE_VLAN_INFO_BRENTRY))
+		return 0;
+
+	if (switchdev_trans_ph_prepare(trans))
+		return 0;
+
+	for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) {
+		int err;
+
+		err = cpsw_port_vlan_add(priv, untag, pvid, vid, orig_dev);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int cpsw_port_vlans_del(struct cpsw_priv *priv,
+			       const struct switchdev_obj_port_vlan *vlan)
+
+{
+	struct net_device *orig_dev = vlan->obj.orig_dev;
+	u16 vid;
+
+	for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) {
+		int err;
+
+		err = cpsw_port_vlan_del(priv, vid, orig_dev);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int cpsw_port_mdb_add(struct cpsw_priv *priv,
+			     struct switchdev_obj_port_mdb *mdb,
+			     struct switchdev_trans *trans)
+
+{
+	struct net_device *orig_dev = mdb->obj.orig_dev;
+	bool cpu_port = netif_is_bridge_master(orig_dev);
+	struct cpsw_common *cpsw = priv->cpsw;
+	int port_mask;
+	int err;
+
+	if (switchdev_trans_ph_prepare(trans))
+		return 0;
+
+	if (cpu_port)
+		port_mask = BIT(HOST_PORT_NUM);
+	else
+		port_mask = BIT(priv->emac_port);
+
+	err = cpsw_ale_mcast_add_modify(cpsw->ale, mdb->addr, port_mask,
+					ALE_VLAN, mdb->vid, 0);
+
+	dev_dbg(priv->dev, "MDB add: %pM dev: %s vid %u port: %u\n", mdb->addr,
+		priv->ndev->name, mdb->vid, priv->emac_port);
+
+	return err;
+}
+
+static int cpsw_port_mdb_del(struct cpsw_priv *priv,
+			     struct switchdev_obj_port_mdb *mdb)
+
+{
+	struct net_device *orig_dev = mdb->obj.orig_dev;
+	bool cpu_port = netif_is_bridge_master(orig_dev);
+	struct cpsw_common *cpsw = priv->cpsw;
+	int del_mask;
+	int err;
+
+	if (cpu_port)
+		del_mask = BIT(HOST_PORT_NUM);
+	else
+		del_mask = BIT(priv->emac_port);
+	err = cpsw_ale_mcast_del_modify(cpsw->ale, mdb->addr, del_mask,
+					ALE_VLAN, mdb->vid);
+	dev_dbg(priv->dev, "MDB del: %pM dev: %s vid %u port: %u\n", mdb->addr,
+		priv->ndev->name, mdb->vid, priv->emac_port);
+
+	return err;
+}
+
+static int cpsw_port_obj_add(struct net_device *ndev,
+			     const struct switchdev_obj *obj,
+			     struct switchdev_trans *trans)
+{
+	struct switchdev_obj_port_vlan *vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
+	struct switchdev_obj_port_mdb *mdb = SWITCHDEV_OBJ_PORT_MDB(obj);
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	int err = 0;
+
+	switch (obj->id) {
+	case SWITCHDEV_OBJ_ID_PORT_VLAN:
+		err = cpsw_port_vlans_add(priv, vlan, trans);
+		break;
+	case SWITCHDEV_OBJ_ID_PORT_MDB:
+	case SWITCHDEV_OBJ_ID_HOST_MDB:
+		err = cpsw_port_mdb_add(priv, mdb, trans);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	return err;
+}
+
+static int cpsw_port_obj_del(struct net_device *ndev,
+			     const struct switchdev_obj *obj)
+{
+	struct switchdev_obj_port_vlan *vlan = SWITCHDEV_OBJ_PORT_VLAN(obj);
+	struct switchdev_obj_port_mdb *mdb = SWITCHDEV_OBJ_PORT_MDB(obj);
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	int err = 0;
+
+	switch (obj->id) {
+	case SWITCHDEV_OBJ_ID_PORT_VLAN:
+		err = cpsw_port_vlans_del(priv, vlan);
+		break;
+	case SWITCHDEV_OBJ_ID_PORT_MDB:
+	case SWITCHDEV_OBJ_ID_HOST_MDB:
+		err = cpsw_port_mdb_del(priv, mdb);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+
+	return err;
+}
+
+static const struct switchdev_ops cpsw_port_switchdev_ops = {
+	.switchdev_port_attr_set	= cpsw_port_attr_set,
+	.switchdev_port_attr_get	= cpsw_port_attr_get,
+	.switchdev_port_obj_add		= cpsw_port_obj_add,
+	.switchdev_port_obj_del		= cpsw_port_obj_del,
+};
+
+void cpsw_port_switchdev_init(struct net_device *ndev)
+{
+	ndev->switchdev_ops = &cpsw_port_switchdev_ops;
+}
diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.h b/drivers/net/ethernet/ti/cpsw_switchdev.h
new file mode 100644
index 0000000..4940462
--- /dev/null
+++ b/drivers/net/ethernet/ti/cpsw_switchdev.h
@@ -0,0 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <net/switchdev.h>
+
+void cpsw_port_switchdev_init(struct net_device *ndev);
-- 
2.7.4
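
As an illustration of the port VLAN register handling in cpsw_set_pvid()
and cpsw_get_pvid() above, the bit packing can be sketched as a standalone
C fragment. This is only a sketch under assumptions: the macro and function
names (PORT_VLAN_*, port_vlan_pack, port_vlan_vid) are made up for the
example and are not driver symbols, and the assert on the VID range is an
addition that the patch does not have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout as used by cpsw_set_pvid()/cpsw_get_pvid() in the patch:
 * bits 11:0 VID, bit 12 CFI, bits 15:13 priority (CoS).
 * All names below are hypothetical and exist only for this sketch.
 */
#define PORT_VLAN_VID_MASK	0xfffu
#define PORT_VLAN_CFI		(1u << 12)
#define PORT_VLAN_COS_SHIFT	13
#define PORT_VLAN_COS_MASK	0x7u

/* Build the 32-bit value that would be written to the port VLAN register. */
static uint32_t port_vlan_pack(uint16_t vid, bool cfi, uint32_t cos)
{
	uint32_t val;

	/* Sketch-only precondition: a VID must fit into 12 bits. */
	assert(vid <= PORT_VLAN_VID_MASK);

	val = vid;
	val |= cfi ? PORT_VLAN_CFI : 0;
	val |= (cos & PORT_VLAN_COS_MASK) << PORT_VLAN_COS_SHIFT;

	return val;
}

/* Extract the VID again, mirroring the "pvid & 0xfff" step in the patch. */
static uint16_t port_vlan_vid(uint32_t val)
{
	return val & PORT_VLAN_VID_MASK;
}
```

For instance, port_vlan_pack(1, true, 5) gives 0xb001, and
port_vlan_vid(0xb001) recovers VID 1.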

^ permalink raw reply related

* Re: [BUG] net: stmmac: socfpga ethernet no longer working on linux-next
From: Marek Vasut @ 2018-06-14 10:59 UTC (permalink / raw)
  To: Jose Abreu, Dinh Nguyen, netdev; +Cc: David Miller, clabbe, Dinh Nguyen
In-Reply-To: <dbc3dc9b-7eaf-2480-74a7-a25bcb9428e9@synopsys.com>

On 06/14/2018 10:18 AM, Jose Abreu wrote:
> On 14-06-2018 08:38, Jose Abreu wrote:
>> Hello,
>>
>> On 13-06-2018 21:46, Dinh Nguyen wrote:
>>> Hi,
>>>
>>> The stmmac ethernet has stopped working in the linux-next and
>>> linus/master branches (v4.17-11782-gbe779f03d563).
>>>
>>> It appears that the stmmac ethernet has stopped working after these 2 commits:
>>>
>>> 4dbbe8dde848 net: stmmac: Add support for U32 TC filter using Flexible RX Parser
>>> 5f0456b43140 net: stmmac: Implement logic to automatically select HW Interface
>>>
>>> If I move to this commit "565020aaeebf net: stmmac: Disable ACS
>>> Feature for GMAC >= 4", then the stmmac works again on SoCFPGA.
>>>
>>> I was following this thread:
>>> https://www.spinics.net/lists/netdev/msg502858.html
>>>
>>> Was wondering if there was a patch to fix dwmac-sun8i that the socfpga
>>> platform needs as well?
>> Probably. I will check and get back to you ASAP.
> 
> This seems to be a different problem. Can you send me your dmesg
> log and DT bindings you are using?

For example, arch/arm/boot/dts/socfpga_arria10_socdk_sdmmc.dts
fails for me in next/master. It worked on 4.17-rc7.

-- 
Best regards,
Marek Vasut

^ permalink raw reply

* Re: mainline: x86_64: kernel panic: RIP: 0010:__xfrm_policy_check+0xcb/0x690
From: William Tu @ 2018-06-14 11:15 UTC (permalink / raw)
  To: Anders Roxell
  Cc: Steffen Klassert, Naresh Kamboju, Networking, David S. Miller,
	herbert, open list:KERNEL SELFTEST FRAMEWORK, open list
In-Reply-To: <CADYN=9LztFGUOb-RaEByN-G-E1MpnEv_HtZtObhyZUWWt0nffg@mail.gmail.com>

On Tue, Jun 12, 2018 at 5:09 AM, Anders Roxell <anders.roxell@linaro.org> wrote:
> On 12 June 2018 at 10:34, Steffen Klassert <steffen.klassert@secunet.com> wrote:
>> On Mon, Jun 11, 2018 at 10:11:46PM +0530, Naresh Kamboju wrote:
>>> Running the selftests bpf test_tunnel.sh test on an x86_64 machine with
>>> a mainline 4.17.0 kernel caused this kernel panic.
>>> I have noticed that this kernel panic started happening with
>>> 4.17.0-rc7-next-20180529 and is still happening on 4.17.0-next-20180608.
>>>
>>> [  213.638287] BUG: unable to handle kernel NULL pointer dereference
>>> at 0000000000000008
>>> ++[ ip xfrm poli  213.674036] PGD 0 P4D 0
>>> [  213.674118] audit: type=1327 audit(1528917683.623:7):
>>> proctitle=6970007866726D00706F6C69637900616464007372630031302E312E312E3130302F3332006473740031302E312E312E3230302F33320064697200696E00746D706C00737263003137322E31362E312E31303000647374003137322E31362E312E3230300070726F746F006573700072657169640031006D6F64650074756E6E
>>> [  213.677950] Oops: 0000 [#1] SMP PTI
>>> cy[ add src 10.1.  213.677952] CPU: 2 PID: 0 Comm: swapper/2 Tainted:
>>> G        W         4.17.0-next-20180608 #1
>>> [  213.677953] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
>>> 2.0b 07/27/2017
>>> [  213.726998] RIP: 0010:__xfrm_policy_check+0xcb/0x690
>>> [  213.731962] Code: 80 3d 0a d8 f1 00 00 0f 84 c1 02 00 00 4c 8b 25
>>> 2b af f4 00 e8 66 a6 6a ff 85 c0 74 0d 80 3d eb d7 f1 00 00 0f 84 d5
>>> 02 00 00 <49> 8b 44 24 08 48 85 c0 74 0c 48 8d b5 78 ff ff ff 4c 89 ff
>>> ff d0
>>
>> This looks like a bug that I've seen already. If it is what I think,
>> then commit 2c205dd3981f ("netfilter: add struct nf_nat_hook and use
>> it") introduced this bug.
>>
>> There was already a fix for this on the netdev list, but
>> I don't know the current status of that patch:
>>
>> https://patchwork.ozlabs.org/patch/921387/
>
> Hi, I applied the patch and ran bpf/test_tunnel.sh, and I couldn't
> see any crash.
> However, the script never returned (I had to Ctrl+c to get back). Any ideas?
> See log from the test below.
>
> Cheers,
> Anders
>
> PASS: xfrm tunnel

Hi Anders,
I think it should return 0 if you reach the above line.
The console output looks pretty messy because of the use of 'tee'.
I will send a patch to make the output more readable.

Thanks
William

^ permalink raw reply
