Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups
From: John Stultz @ 2016-12-06  0:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Andy Lutomirski, Mickaël Salaün,
	Daniel Mack, David S. Miller, kafai, Florian Westphal,
	Harald Hoyer, Network Development, Sargun Dhillon,
	Pablo Neira Ayuso, lkml, Tejun Heo, Li Zefan, Jonathan Corbet,
	open list:CONTROL GROUP (CGROUP), Android Kernel Team,
	Rom Lemarchand, Colin Cross, Dmitry Shmidt
In-Reply-To: <CALAqxLXk+A9=A0oXGEduphXOQdbMg-wuGxmLkEk9cPHyn44X5Q@mail.gmail.com>

On Tue, Nov 22, 2016 at 4:57 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
>>>>
>>>> I hate to say it, but I think I may see a problem.  Current
>>>> developments are afoot to make cgroups do more than resource control.
>>>> For example, there's Landlock and there's Daniel's ingress/egress
>>>> filter thing.  Current cgroup controllers can mostly just DoS their
>>>> controlled processes.  These new controllers (or controller-like
>>>> things) can exfiltrate data and change semantics.
>>>>
>>>> Does anyone have a security model in mind for these controllers and
>>>> the cgroups that they're attached to?  I'm reasonably confident that
>>>> CAP_SYS_RESOURCE is not the answer...
>>>
>>> and specifically the answer is... ?
>>> Also would be great if you start with specifying the question first
>>> and the problem you're trying to solve.
>>>
>>
>> I don't have a good answer right now.  Here are some constraints, though:
>>
>> 1. An insufficiently privileged process should not be able to move a
>> victim into a dangerous cgroup.
>>
>> 2. An insufficiently privileged process should not be able to move
>> itself into a dangerous cgroup and then use execve to gain privilege
>> such that the execve'd program can be compromised.
>>
>> 3. An insufficiently privileged process should not be able to make an
>> existing cgroup dangerous in a way that could compromise a victim in
>> that cgroup.
>>
>> 4. An insufficiently privileged process should not be able to make a
>> cgroup dangerous in a way that bypasses protections that would
>> otherwise protect execve() as used by itself or some other process in
>> that cgroup.
>>
>> Keep in mind that "dangerous" may apply to a cgroup's descendents in
>> addition to the cgroup being controlled.
>
> Sorry for taking awhile to get back to you here.  I'm a little
> befuddled as to what next steps I should consider (and honestly, I'm
> not totally sure I really grok your concern here, particularly what
> you mean with "dangrous cgroups").
>
> So is going back to the CAP_CGROUP_MIGRATE approach (to properly
> separate "sufficiently" from "insufficiently privileged") better?
>
> Or something closer to the original method Android used of each cgroup
> having an allow_attach() check which could determine what is
> sufficiently privledged for the respective level of danger the cgroup
> might poise?
>
> Or just stepping back, what method would you imagine to be reasonable
> to allow a specified task to migrate other tasks between cgroups
> without it having to be root/suid?

Any suggested feedback here?

thanks
-john

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Florian Westphal @ 2016-12-06  0:20 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Willem de Bruijn, Pablo Neira Ayuso, netfilter-devel,
	Network Development, Daniel Borkmann, Florian Westphal,
	Eric Dumazet
In-Reply-To: <CAF=yD-J-vcQQ_-48xG8qhMuYvLvucCr7SG99rsfJA5-okvhHzA@mail.gmail.com>

Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> While we're discussing the patch, another question, about revisions: I
> tested both modified and original iptables binaries on both standard
> and modified kernels. It all works as expected, except for the case
> where both binaries are used on a single kernel. For instance:
> 
>   iptables -A OUTPUT -m bpf --bytecode "`./nfbpf_compile RAW 'udp port
> 8000'`" -j LOG
>   ./iptables.new -L
> 
> Here the new binary will interpret the object as xt_bpf_match_v1, but
> iptables has inserted xt_bpf_match. The same problem happens the other
> way around. A new binary can be made robust to detect old structs, but
> not the other way around. Specific to bpf, the existing xt_bpf code
> has an unfortunate bug that it always prints at least one line of
> code, even if ->bpf_program_num_elems == 0.
> 
> I notice that other extensions also do not necessarily only extend
> struct vN in vN+1. Is the above a known issue?

Yes, I guess noone ever bothered to fix this.

The kernel blob should contain the match/target revision number,
so userspace can in fact see that 'this is bpf v42', but iirc
the netfilter userspace just loads the highest userspace revision
supported by the kernel (which is then different for the 2 iptables
binaries).

But we *could* display message like 'kernel uses revision 2 but I can
only find 0 and 1' or fall back to the lower supported revision without
guess-the-struct-by-size games.

^ permalink raw reply

* Oops with CONFIG_VMAP_STCK and bond device + virtio-net
From: Laura Abbott @ 2016-12-05 23:53 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: zbyszek, virtualization, netdev, Linux Kernel Mailing List

Hi,

Fedora got a bug report https://bugzilla.redhat.com/show_bug.cgi?id=1401612
In qemu with two virtio-net interfaces:

$ ip l
...
5: ens14: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:e9:64:41 brd ff:ff:ff:ff:ff:ff
6: ens15: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:e9:64:42 brd ff:ff:ff:ff:ff:ff

$ sudo ip link add bond1 type bond
$ sudo ip link set ens14 master bond1
Segmentation fault

 ------------[ cut here ]------------
 kernel BUG at ./include/linux/scatterlist.h:140!
 invalid opcode: 0000 [#1] SMP
 Modules linked in: bonding ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip
  ata_generic crc32c_intel qxl drm_kms_helper virtio_pci serio_raw ttm drm pata_acpi
 CPU: 5 PID: 1983 Comm: ip Not tainted 4.9.0-0.rc6.git2.1.fc26.x86_64 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
 task: ffff9d50a3583240 task.stack: ffffb06e41040000
 RIP: 0010:[<ffffffffbc4896fc>]  [<ffffffffbc4896fc>] sg_init_one+0x8c/0xa0
 RSP: 0018:ffffb06e41043698  EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffffb06e41043774 RCX: 0000000000000028
 RDX: 0000131ec1043774 RSI: 0000000000000013 RDI: ffffb06ec1043774
 RBP: ffffb06e410436b0 R08: 00000000001ddbe0 R09: ffffb06e410436c8
 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000006
 R13: ffffb06e410436c8 R14: ffff9d50b2dc1800 R15: ffff9d50b3db9600
 FS:  00007f15347e5700(0000) GS:ffff9d50bb000000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007ffc09bc4000 CR3: 0000000135797000 CR4: 00000000000406e0
 Stack:
  ffff9d50b229d000 0000000000000000 ffffb06e41043772 ffffb06e41043720
  ffffffffc0051123 ffff9d50a3583240 0000000087654321 0000000000000002
  0000000000000000 0000000000000000 0000000000000000 000000007b8f5301
 Call Trace:
  [<ffffffffc0051123>] virtnet_set_mac_address+0xb3/0x140 [virtio_net]
  [<ffffffffbc7ae305>] dev_set_mac_address+0x55/0xc0
  [<ffffffffc03f319e>] bond_enslave+0x34e/0x1180 [bonding]
  [<ffffffffbc7ca22f>] do_setlink+0x6cf/0xd10
  [<ffffffffbc20dd6a>] ? get_page_from_freelist+0x6ba/0xca0
  [<ffffffffbc037de9>] ? sched_clock+0x9/0x10
  [<ffffffffbc068475>] ? kvm_sched_clock_read+0x25/0x40
  [<ffffffffbc111ed6>] ? __lock_acquire+0x346/0x1290
  [<ffffffffbc4aa436>] ? nla_parse+0xa6/0x120
  [<ffffffffbc7ce9e8>] rtnl_newlink+0x5c8/0x870
  [<ffffffffbc3ecb32>] ? avc_has_perm_noaudit+0x32/0x210
  [<ffffffffbc0bbfca>] ? ns_capable_common+0x7a/0x90
  [<ffffffffbc0bbff3>] ? ns_capable+0x13/0x20
  [<ffffffffbc7ced76>] rtnetlink_rcv_msg+0xe6/0x210
  [<ffffffffbc7c951b>] ? rtnetlink_rcv+0x1b/0x40
  [<ffffffffbc7c951b>] ? rtnetlink_rcv+0x1b/0x40
  [<ffffffffbc7cec90>] ? rtnl_newlink+0x870/0x870
  [<ffffffffbc7f7394>] netlink_rcv_skb+0xa4/0xc0
  [<ffffffffbc7c952a>] rtnetlink_rcv+0x2a/0x40
  [<ffffffffbc7f6d07>] netlink_unicast+0x1f7/0x2f0
  [<ffffffffbc7f6c7f>] ? netlink_unicast+0x16f/0x2f0
  [<ffffffffbc7f7102>] netlink_sendmsg+0x302/0x3c0
  [<ffffffffbc790c28>] sock_sendmsg+0x38/0x50
  [<ffffffffbc791773>] ___sys_sendmsg+0x2e3/0x2f0
  [<ffffffffbc18830d>] ? __audit_syscall_entry+0xad/0xf0
  [<ffffffffbc068475>] ? kvm_sched_clock_read+0x25/0x40
  [<ffffffffbc037de9>] ? sched_clock+0x9/0x10
  [<ffffffffbc18830d>] ? __audit_syscall_entry+0xad/0xf0
  [<ffffffffbc18830d>] ? __audit_syscall_entry+0xad/0xf0
  [<ffffffffbc111775>] ? trace_hardirqs_on_caller+0xf5/0x1b0
  [<ffffffffbc7924b4>] __sys_sendmsg+0x54/0x90
  [<ffffffffbc792502>] SyS_sendmsg+0x12/0x20
  [<ffffffffbc003eec>] do_syscall_64+0x6c/0x1f0
  [<ffffffffbc917589>] entry_SYSCALL64_slow_path+0x25/0x25
 Code: ca 75 2c 49 8b 55 08 f6 c2 01 75 25 83 e2 03 81 e3 ff 0f 00 00 45 89 65 14 48
 RIP  [<ffffffffbc4896fc>] sg_init_one+0x8c/0xa0
  RSP <ffffb06e41043698>
 ---[ end trace 9076d2284efbf735 ]---

This looks like an issue with CONFIG_VMAP_STACK since bond_enslave uses
struct sockaddr from the stack and virtnet_set_mac_address calls
sg_init_one which triggers BUG_ON(!virt_addr_valid(buf));

I know there have been a lot of CONFIG_VMAP_STACK fixes around but I
didn't find this one reported yet.

Thanks,
Laura

^ permalink raw reply

* Re: wl1251 & mac address & calibration data
From: Tony Lindgren @ 2016-12-05 23:51 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Aaro Koskinen, Sebastian Reichel, Pavel Machek, Michal Kazior,
	Kalle Valo, Ivaylo Dimitrov, linux-wireless, Network Development,
	linux-kernel
In-Reply-To: <201611261820.47995@pali>

* Pali Rohár <pali.rohar@gmail.com> [161126 09:21]:
> On Thursday 24 November 2016 19:46:01 Aaro Koskinen wrote:
> > Hi,
> > 
> > On Thu, Nov 24, 2016 at 04:20:45PM +0100, Pali Rohár wrote:
> > > Proprietary, signed and closed bootloader NOLO does not support DT.
> > > So for booting you need to append DTS file to kernel image.
> > > 
> > > U-Boot is optional and can be used as intermediate bootloader
> > > between NOLO and kernel. But still it has problems with reading
> > > from nand, so cannot read NVS data nor MAC address.
> > 
> > You could use kexec to pass the fixed DT.
> > 
> > A.
> 
> IIRC it was broken for N900/omap3, no idea if somebody fixed it.

FYI, at least in next-20161205 kexec works on omap3 for me.

Regards,

Tony

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Willem de Bruijn @ 2016-12-05 23:51 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Pablo Neira Ayuso, netfilter-devel, Network Development,
	Daniel Borkmann, Florian Westphal, Eric Dumazet
In-Reply-To: <CA+FuTSfHryyqBhRyfQCChG2VPwpj1KYBk2Z7DZ-dcYL-FbbSeA@mail.gmail.com>

On Mon, Dec 5, 2016 at 6:29 PM, Willem de Bruijn <willemb@google.com> wrote:
> On Mon, Dec 5, 2016 at 6:22 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>> On Mon, Dec 05, 2016 at 06:06:05PM -0500, Willem de Bruijn wrote:
>> [...]
>>> Eric also suggests a private variable to avoid being subject to
>>> changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
>>> length than current PATH_MAX.
>>
>> Good.
>>
>>> FWIW, there is a workaround for users with deeply nested paths: the
>>> path passed does not have to be absolute. It is literally what is
>>> passed on the command line to iptables right now, including relative
>>> addresses.
>>
>> If iptables userspace always expects to have the bpf file repository
>> in some given location (suggesting to have a directory that we specify
>> at ./configure time, similar to what we do with connlabel.conf), then
>> I think we can rely on relative paths. Would this be flexible enough
>> for your usecase?
>
> As long as it accepts relative paths, I think it will always work.
> Worst case, a user has to cd. No need for hardcoding the bpf mount
> point at compile time.
>
> I have the matching iptables patch for pinned objects, btw. Not for
> elf objects, which requires linking to libelf and parsing the object,
> which is more work (and perhaps best punted on by expanding libbpf in
> bcc to include this functionality. it already exists under samples/bpf
> and iproute2).

While we're discussing the patch, another question, about revisions: I
tested both modified and original iptables binaries on both standard
and modified kernels. It all works as expected, except for the case
where both binaries are used on a single kernel. For instance:

  iptables -A OUTPUT -m bpf --bytecode "`./nfbpf_compile RAW 'udp port
8000'`" -j LOG
  ./iptables.new -L

Here the new binary will interpret the object as xt_bpf_match_v1, but
iptables has inserted xt_bpf_match. The same problem happens the other
way around. A new binary can be made robust to detect old structs, but
not the other way around. Specific to bpf, the existing xt_bpf code
has an unfortunate bug that it always prints at least one line of
code, even if ->bpf_program_num_elems == 0.

I notice that other extensions also do not necessarily only extend
struct vN in vN+1. Is the above a known issue?

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Willem de Bruijn @ 2016-12-05 23:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Willem de Bruijn, netfilter-devel, Network Development,
	Daniel Borkmann, Florian Westphal, Eric Dumazet
In-Reply-To: <20161205232214.GA15825@salvia>

On Mon, Dec 5, 2016 at 6:22 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Dec 05, 2016 at 06:06:05PM -0500, Willem de Bruijn wrote:
> [...]
>> Eric also suggests a private variable to avoid being subject to
>> changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
>> length than current PATH_MAX.
>
> Good.
>
>> FWIW, there is a workaround for users with deeply nested paths: the
>> path passed does not have to be absolute. It is literally what is
>> passed on the command line to iptables right now, including relative
>> addresses.
>
> If iptables userspace always expects to have the bpf file repository
> in some given location (suggesting to have a directory that we specify
> at ./configure time, similar to what we do with connlabel.conf), then
> I think we can rely on relative paths. Would this be flexible enough
> for your usecase?

As long as it accepts relative paths, I think it will always work.
Worst case, a user has to cd. No need for hardcoding the bpf mount
point at compile time.

I have the matching iptables patch for pinned objects, btw. Not for
elf objects, which requires linking to libelf and parsing the object,
which is more work (and perhaps best punted on by expanding libbpf in
bcc to include this functionality. it already exists under samples/bpf
and iproute2).

^ permalink raw reply

* [PATCH next] Revert "dctcp: update cwnd on congestion event"
From: Florian Westphal @ 2016-12-05 23:23 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal, Neal Cardwell

Neal Cardwell says:
 If I am reading the code correctly, then I would have two concerns:
 1) Has that been tested? That seems like an extremely dramatic
    decrease in cwnd. For example, if the cwnd is 80, and there are 40
    ACKs, and half the ACKs are ECE marked, then my back-of-the-envelope
    calculations seem to suggest that after just 11 ACKs the cwnd would be
    down to a minimal value of 2 [..]
 2) That seems to contradict another passage in the draft [..] where it
    sazs:
       Just as specified in [RFC3168], DCTCP does not react to congestion
       indications more than once for every window of data.

Neal is right.  Fortunately we don't have to complicate this by testing
vs. current rtt estimate, we can just revert the patch.

Normal stack already handles this for us: receiving ACKs with ECE
set causes a call to tcp_enter_cwr(), from there on the ssthresh gets
adjusted and prr will take care of cwnd adjustment.

Fixes: 4780566784b396 ("dctcp: update cwnd on congestion event")
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv4/tcp_dctcp.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
index bde22ebb92a8..5f5e5936760e 100644
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -188,8 +188,8 @@ static void dctcp_ce_state_1_to_0(struct sock *sk)
 
 static void dctcp_update_alpha(struct sock *sk, u32 flags)
 {
+	const struct tcp_sock *tp = tcp_sk(sk);
 	struct dctcp *ca = inet_csk_ca(sk);
-	struct tcp_sock *tp = tcp_sk(sk);
 	u32 acked_bytes = tp->snd_una - ca->prior_snd_una;
 
 	/* If ack did not advance snd_una, count dupack as MSS size.
@@ -229,13 +229,6 @@ static void dctcp_update_alpha(struct sock *sk, u32 flags)
 		WRITE_ONCE(ca->dctcp_alpha, alpha);
 		dctcp_reset(tp, ca);
 	}
-
-	if (flags & CA_ACK_ECE) {
-		unsigned int cwnd = dctcp_ssthresh(sk);
-
-		if (cwnd != tp->snd_cwnd)
-			tp->snd_cwnd = cwnd;
-	}
 }
 
 static void dctcp_state(struct sock *sk, u8 new_state)
-- 
2.7.3

^ permalink raw reply related

* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Pablo Neira Ayuso @ 2016-12-05 23:22 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: netfilter-devel, Network Development, Daniel Borkmann,
	Willem de Bruijn, Florian Westphal, Eric Dumazet
In-Reply-To: <CAF=yD-Jk4W+zU=tdixWe3LPGcFp-sANxXLLoQTRvrkN_3=a9uw@mail.gmail.com>

On Mon, Dec 05, 2016 at 06:06:05PM -0500, Willem de Bruijn wrote:
[...]
> Eric also suggests a private variable to avoid being subject to
> changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
> length than current PATH_MAX.

Good.

> FWIW, there is a workaround for users with deeply nested paths: the
> path passed does not have to be absolute. It is literally what is
> passed on the command line to iptables right now, including relative
> addresses.

If iptables userspace always expects to have the bpf file repository
in some given location (suggesting to have a directory that we specify
at ./configure time, similar to what we do with connlabel.conf), then
I think we can rely on relative paths. Would this be flexible enough
for your usecase?

^ permalink raw reply

* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Lino Sanfilippo @ 2016-12-05 23:11 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Giuseppe CAVALLARO, alexandre.torgue, David Miller, netdev,
	linux-kernel
In-Reply-To: <98fb3577-5f31-bf17-3e02-96c150854108@gmx.de>


> 
> You mean stmmac_xmit()? Thats also softirq AFAICT, its the TX softirq....
> 
> Regards,
> Lino
> 
> 

Hmm. netdevices.txt says:

ndo_start_xmit:
	...

	Context: Process with BHs disabled or BH (timer),
	         will be called with interrupts disabled by netconsole.

	...

If this is correct it can indeed be process context, too. However BHs are already
disabled.

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Willem de Bruijn @ 2016-12-05 23:06 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, Network Development, Daniel Borkmann,
	Willem de Bruijn, Florian Westphal, Eric Dumazet
In-Reply-To: <20161205230055.GA15379@salvia>

On Mon, Dec 5, 2016 at 6:00 PM, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Dec 05, 2016 at 11:34:15PM +0100, Pablo Neira Ayuso wrote:
>> On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
>> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
>> > > > From: Willem de Bruijn <willemb@google.com>
>> > > >
>> > > > Add support for attaching an eBPF object by file descriptor.
>> > > >
>> > > > The iptables binary can be called with a path to an elf object or a
>> > > > pinned bpf object. Also pass the mode and path to the kernel to be
>> > > > able to return it later for iptables dump and save.
>> > > >
>> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
>> > > > ---
>> > >
>> > > Assuming there is no simple way to get variable matchsize in iptables,
>> > > this looks good to me, thanks.
>> >
>> > It should be possible by setting kernel .matchsize to ~0 which
>> > suppresses strict size enforcement.
>> >
>> > Its currently only used by ebt_among, but this should work for any xtables
>> > module.
>>
>> This is likely going to trigger a large rewrite of the core userspace
>> iptables codebase, and likely going to pull part of the mess we have
>> in ebtables into iptables. So I'd prefer not to follow this path.
>
> So this variable path is there to annotate what userspace claims that
> is the file that contains the bpf blob that was loaded, actually this
> is irrelevant to the kernel, so this is just there to dump it back
> when iptables-save it is called. Just a side note, one could set
> anything there from userspace, point somewhere else actually...
>
> Well anyway, going back to the path problem to keep it simple: Why
> don't just trim this down to something smaller, are you really
> expecting to reach PATH_MAX in your usecase?

Not often. Module-specific limitations that differ from global
definitions are just a pain when they bite. This module also has an
arbitrary low limit on the length of the cBPF program passed, for
instance.

Eric also suggests a private variable to avoid being subject to
changes to PATH_MAX. Then we can indeed also choose an arbitrary lower
length than current PATH_MAX.

FWIW, there is a workaround for users with deeply nested paths: the
path passed does not have to be absolute. It is literally what is
passed on the command line to iptables right now, including relative
addresses.

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Pablo Neira Ayuso @ 2016-12-05 23:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Florian Westphal, Willem de Bruijn, netfilter-devel, netdev,
	daniel, Willem de Bruijn
In-Reply-To: <1480978749.18162.561.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, Dec 05, 2016 at 02:59:09PM -0800, Eric Dumazet wrote:
> On Mon, 2016-12-05 at 23:40 +0100, Florian Westphal wrote:
> 
> > Fair enough, I have no objections to the patch.
> 
> An additional question is about PATH_MAX :
> 
> Is it guaranteed to stay at 4096 forever ?
> 
> To be safe, maybe we should use a constant of our own.

Right, this reminds me we have to fix something else.

So constant of our own plus something smaller, if possible, would be
good to go. Thanks.

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Willem de Bruijn @ 2016-12-05 23:01 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: netfilter-devel, pablo, Network Development, Willem de Bruijn
In-Reply-To: <5845F060.2050700@iogearbox.net>

On Mon, Dec 5, 2016 at 5:55 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> Hi Willem,
>
> On 12/05/2016 09:28 PM, Willem de Bruijn wrote:
>>
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Add support for attaching an eBPF object by file descriptor.
>>
>> The iptables binary can be called with a path to an elf object or a
>> pinned bpf object. Also pass the mode and path to the kernel to be
>> able to return it later for iptables dump and save.
>>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>
>
> just out of pure curiosity, use case is for android guys wrt
> accounting, or anything specific that cls_bpf on tc ingress +
> egress cannot do already?

That is the immediate motivation, yes.

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Pablo Neira Ayuso @ 2016-12-05 23:00 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: netfilter-devel, netdev, daniel, Willem de Bruijn,
	Florian Westphal, Eric Dumazet
In-Reply-To: <20161205223415.GA14689@salvia>

On Mon, Dec 05, 2016 at 11:34:15PM +0100, Pablo Neira Ayuso wrote:
> On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn <willemb@google.com>
> > > > 
> > > > Add support for attaching an eBPF object by file descriptor.
> > > > 
> > > > The iptables binary can be called with a path to an elf object or a
> > > > pinned bpf object. Also pass the mode and path to the kernel to be
> > > > able to return it later for iptables dump and save.
> > > > 
> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > ---
> > > 
> > > Assuming there is no simple way to get variable matchsize in iptables,
> > > this looks good to me, thanks.
> > 
> > It should be possible by setting kernel .matchsize to ~0 which
> > suppresses strict size enforcement.
> > 
> > Its currently only used by ebt_among, but this should work for any xtables
> > module.
> 
> This is likely going to trigger a large rewrite of the core userspace
> iptables codebase, and likely going to pull part of the mess we have
> in ebtables into iptables. So I'd prefer not to follow this path.

So this variable path is there to annotate what userspace claims that
is the file that contains the bpf blob that was loaded, actually this
is irrelevant to the kernel, so this is just there to dump it back
when iptables-save it is called. Just a side note, one could set
anything there from userspace, point somewhere else actually...

Well anyway, going back to the path problem to keep it simple: Why
don't just trim this down to something smaller, are you really
expecting to reach PATH_MAX in your usecase?

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Eric Dumazet @ 2016-12-05 22:59 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Willem de Bruijn, netfilter-devel, netdev,
	daniel, Willem de Bruijn
In-Reply-To: <20161205224051.GB16819@breakpoint.cc>

On Mon, 2016-12-05 at 23:40 +0100, Florian Westphal wrote:

> Fair enough, I have no objections to the patch.

An additional question is about PATH_MAX :

Is it guaranteed to stay at 4096 forever ?

To be safe, maybe we should use a constant of our own.



^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Daniel Borkmann @ 2016-12-05 22:55 UTC (permalink / raw)
  To: Willem de Bruijn, netfilter-devel; +Cc: pablo, netdev, Willem de Bruijn
In-Reply-To: <1480969684-74414-1-git-send-email-willemdebruijn.kernel@gmail.com>

Hi Willem,

On 12/05/2016 09:28 PM, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Add support for attaching an eBPF object by file descriptor.
>
> The iptables binary can be called with a path to an elf object or a
> pinned bpf object. Also pass the mode and path to the kernel to be
> able to return it later for iptables dump and save.
>
> Signed-off-by: Willem de Bruijn <willemb@google.com>

just out of pure curiosity, use case is for android guys wrt
accounting, or anything specific that cls_bpf on tc ingress +
egress cannot do already?

Thanks,
Daniel

^ permalink raw reply

* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Lino Sanfilippo @ 2016-12-05 22:54 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Giuseppe CAVALLARO, alexandre.torgue, David Miller, netdev,
	linux-kernel
In-Reply-To: <20161205224057.GB19135@amd>

On 05.12.2016 23:40, Pavel Machek wrote:
> On Mon 2016-12-05 23:37:09, Lino Sanfilippo wrote:
>> Hi Pavel,
>> 
>> On 05.12.2016 23:02, Pavel Machek wrote:
>> > 
>> > we need spin_lock_bh at minimum, as we are locking user context
>> > against timer.
>> > 
>> > Best regards,
>> > 									Pavel
>> > 
>> 
>> I was referring to stmmac_tx_clean() which AFAICS is only called from softirq context,
>> (one time in the timer handler and one time in napi poll handler) so a spin_lock() should
>> be sufficient. I cant see how this is called from userspace. If it were, a spin_lock_bh() had
>> to be used, of course.
> 
> stmmac_tx_clean() shares lock with stmmac_tx() -- and that's process
> context as far as I can tell. So... spin_lock_bh() at
> minimum... right?
> 
> 									Pavel
> 

You mean stmmac_xmit()? Thats also softirq AFAICT, its the TX softirq....

Regards,
Lino

^ permalink raw reply

* Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit
From: Michał Mirosław @ 2016-12-05 22:52 UTC (permalink / raw)
  To: Ben Pfaff
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	bridge-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
In-Reply-To: <20161205185545.GB3129-LZ6Gd1LRuIk@public.gmane.org>

On Mon, Dec 05, 2016 at 10:55:45AM -0800, Ben Pfaff wrote:
> On Mon, Dec 05, 2016 at 06:24:36PM +0100, Michał Mirosław wrote:
> > On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> > > On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > > > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > > > intact through linux networking stack.
> > > This appears to change the established Open vSwitch userspace API.  You
> > > can see that simply from the way that it changes the documentation for
> > > the userspace API.  If I'm right about that, then this change will break
> > > all userspace programs that use the Open vSwitch kernel module,
> > > including Open vSwitch itself.
> > 
> > If I understood the code correctly, it does change expected meaning for
> > the (unlikely?) case of header truncated just before the VLAN TCI - it will
> > be impossible to differentiate this case from the VLAN TCI == 0.
> > 
> > I guess this is a problem with OVS API, because it doesn't directly show
> > the "missing" state of elements, but relies on an "invalid" value.
> 
> That particular corner case should not be a huge problem in any case.
> 
> The real problem is that this appears to break the common case use of
> VLANs in Open vSwitch.  After this patch, parse_vlan() in
> net/openvswitch/flow.c copies the tpid and tci from sk_buff (either the
> accelerated version of them or the version in the skb data) into
> sw_flow_key members.  OK, that's fine on it's own.  However, I don't see
> any corresponding change to the code in flow_netlink.c to compensate for
> the fact that, until now, the VLAN CFI bit (formerly VLAN_TAG_PRESENT)
> was always required to be set to 1 in flow matches inside Netlink
> messages sent from userspace, and the kernel always set it to 1 in
> corresponding messages sent to userspace.
> 
> In other words, if I'm reading this change correctly:
> 
>     * With a kernel before this change, userspace always had to set
>       VLAN_TAG_PRESENT to 1 to match on a VLAN, or the kernel would
>       reject the flow match.
> 
>     * With a kernel after this change, userspace must not set
>       VLAN_TAG_PRESENT to 1, otherwise the kernel will accept the flow
>       match but nothing will ever match because packets do not actually
>       have the CFI bit set.
> 
> Take a look at this code that the patch deletes from
> validate_vlan_from_nlattrs(), for example, and see how it insisted that
> VLAN_TAG_PRESENT was set:
> 
> 	if (!(tci & htons(VLAN_TAG_PRESENT))) {
> 		if (tci) {
> 			OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT bit set.",
> 				  (inner) ? "C-VLAN" : "VLAN");
> 			return -EINVAL;
> 		} else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
> 			/* Corner case for truncated VLAN header. */
> 			OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
> 				  (inner) ? "C-VLAN" : "VLAN");
> 			return -EINVAL;
> 		}
> 	}
> 
> Please let me know if I'm overlooking something.

Hmm. So the easiest change without disrupting current userspace, would be
to flip the CFI bit on the way to/from OVS userspace. Does this seem
correct?

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Florian Westphal @ 2016-12-05 22:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, Eric Dumazet, Willem de Bruijn, netfilter-devel,
	netdev, daniel, Willem de Bruijn
In-Reply-To: <20161205223415.GA14689@salvia>

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > > > From: Willem de Bruijn <willemb@google.com>
> > > > 
> > > > Add support for attaching an eBPF object by file descriptor.
> > > > 
> > > > The iptables binary can be called with a path to an elf object or a
> > > > pinned bpf object. Also pass the mode and path to the kernel to be
> > > > able to return it later for iptables dump and save.
> > > > 
> > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > ---
> > > 
> > > Assuming there is no simple way to get variable matchsize in iptables,
> > > this looks good to me, thanks.
> > 
> > It should be possible by setting kernel .matchsize to ~0 which
> > suppresses strict size enforcement.
> > 
> > Its currently only used by ebt_among, but this should work for any xtables
> > module.
> 
> This is likely going to trigger a large rewrite of the core userspace
> iptables codebase, and likely going to pull part of the mess we have
> in ebtables into iptables. So I'd prefer not to follow this path.

Fair enough, I have no objections to the patch.

^ permalink raw reply

* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Pavel Machek @ 2016-12-05 22:40 UTC (permalink / raw)
  To: Lino Sanfilippo
  Cc: Giuseppe CAVALLARO, alexandre.torgue, David Miller, netdev,
	linux-kernel
In-Reply-To: <9a85da24-85cf-f94e-908f-a10eecac2369@gmx.de>

[-- Attachment #1: Type: text/plain, Size: 874 bytes --]

On Mon 2016-12-05 23:37:09, Lino Sanfilippo wrote:
> Hi Pavel,
> 
> On 05.12.2016 23:02, Pavel Machek wrote:
> > 
> > we need spin_lock_bh at minimum, as we are locking user context
> > against timer.
> > 
> > Best regards,
> > 									Pavel
> > 
> 
> I was referring to stmmac_tx_clean() which AFAICS is only called from softirq context,
> (one time in the timer handler and one time in napi poll handler) so a spin_lock() should
> be sufficient. I cant see how this is called from userspace. If it were, a spin_lock_bh() had
> to be used, of course.

stmmac_tx_clean() shares lock with stmmac_tx() -- and that's process
context as far as I can tell. So... spin_lock_bh() at
minimum... right?

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* Re: stmmac ethernet in kernel 4.9-rc6: coalescing related pauses.
From: Lino Sanfilippo @ 2016-12-05 22:37 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Giuseppe CAVALLARO, alexandre.torgue, David Miller, netdev,
	linux-kernel
In-Reply-To: <20161205220221.GA19135@amd>

Hi Pavel,

On 05.12.2016 23:02, Pavel Machek wrote:
> 
> we need spin_lock_bh at minimum, as we are locking user context
> against timer.
> 
> Best regards,
> 									Pavel
> 

I was referring to stmmac_tx_clean() which AFAICS is only called from softirq context,
(one time in the timer handler and one time in napi poll handler) so a spin_lock() should
be sufficient. I cant see how this is called from userspace. If it were, a spin_lock_bh() had
to be used, of course.

Regards,
Lino 

^ permalink raw reply

* Re: [PATCH nf-next] netfilter: xt_bpf: support ebpf
From: Pablo Neira Ayuso @ 2016-12-05 22:34 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Eric Dumazet, Willem de Bruijn, netfilter-devel, netdev, daniel,
	Willem de Bruijn
In-Reply-To: <20161205213001.GA16819@breakpoint.cc>

On Mon, Dec 05, 2016 at 10:30:01PM +0100, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Mon, 2016-12-05 at 15:28 -0500, Willem de Bruijn wrote:
> > > From: Willem de Bruijn <willemb@google.com>
> > > 
> > > Add support for attaching an eBPF object by file descriptor.
> > > 
> > > The iptables binary can be called with a path to an elf object or a
> > > pinned bpf object. Also pass the mode and path to the kernel to be
> > > able to return it later for iptables dump and save.
> > > 
> > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > ---
> > 
> > Assuming there is no simple way to get variable matchsize in iptables,
> > this looks good to me, thanks.
> 
> It should be possible by setting kernel .matchsize to ~0 which
> suppresses strict size enforcement.
> 
> Its currently only used by ebt_among, but this should work for any xtables
> module.

This is likely going to trigger a large rewrite of the core userspace
iptables codebase, and likely going to pull part of the mess we have
in ebtables into iptables. So I'd prefer not to follow this path.

^ permalink raw reply

* [PATCH v3 net-next v3 4/4] net: dsa: mv88e6xxx: add PPU operations
From: Vivien Didelot @ 2016-12-05 22:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Stefan Eichenberger, Richard Cochran, Vivien Didelot
In-Reply-To: <20161205223028.20308-1-vivien.didelot@savoirfairelinux.com>

Some Marvell chips can enable/disable the PPU on demand. This is needed
to access the PHY registers when there is no indirection mechanism.

Add two new ppu_enable and ppu_disable ops to describe this and finally
get rid of the MV88E6XXX_FLAG_PPU* flags.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c      | 76 ++++++++---------------------------
 drivers/net/dsa/mv88e6xxx/global1.c   | 57 ++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/global1.h   |  3 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 19 ++-------
 4 files changed, 81 insertions(+), 74 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 5aae5d7..731d262 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -527,58 +527,18 @@ int mv88e6xxx_update(struct mv88e6xxx_chip *chip, int addr, int reg, u16 update)
 
 static int mv88e6xxx_ppu_disable(struct mv88e6xxx_chip *chip)
 {
-	u16 val;
-	int i, err;
+	if (!chip->info->ops->ppu_disable)
+		return 0;
 
-	err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL, &val);
-	if (err)
-		return err;
-
-	err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL,
-				 val & ~GLOBAL_CONTROL_PPU_ENABLE);
-	if (err)
-		return err;
-
-	for (i = 0; i < 16; i++) {
-		err = mv88e6xxx_g1_read(chip, GLOBAL_STATUS, &val);
-		if (err)
-			return err;
-
-		usleep_range(1000, 2000);
-		val &= GLOBAL_STATUS_PPU_STATE_MASK;
-		if (val != GLOBAL_STATUS_PPU_STATE_POLLING)
-			return 0;
-	}
-
-	return -ETIMEDOUT;
+	return chip->info->ops->ppu_disable(chip);
 }
 
 static int mv88e6xxx_ppu_enable(struct mv88e6xxx_chip *chip)
 {
-	u16 val;
-	int i, err;
+	if (!chip->info->ops->ppu_enable)
+		return 0;
 
-	err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL, &val);
-	if (err)
-		return err;
-
-	err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL,
-				 val | GLOBAL_CONTROL_PPU_ENABLE);
-	if (err)
-		return err;
-
-	for (i = 0; i < 16; i++) {
-		err = mv88e6xxx_g1_read(chip, GLOBAL_STATUS, &val);
-		if (err)
-			return err;
-
-		usleep_range(1000, 2000);
-		val &= GLOBAL_STATUS_PPU_STATE_MASK;
-		if (val == GLOBAL_STATUS_PPU_STATE_POLLING)
-			return 0;
-	}
-
-	return -ETIMEDOUT;
+	return chip->info->ops->ppu_enable(chip);
 }
 
 static void mv88e6xxx_ppu_reenable_work(struct work_struct *ugly)
@@ -2746,22 +2706,12 @@ static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
 {
 	struct dsa_switch *ds = chip->ds;
 	u32 upstream_port = dsa_upstream_port(ds);
-	u16 reg;
 	int err;
 
 	/* Enable the PHY Polling Unit if present, don't discard any packets,
 	 * and mask all interrupt sources.
 	 */
-	err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL, &reg);
-	if (err < 0)
-		return err;
-
-	reg &= ~GLOBAL_CONTROL_PPU_ENABLE;
-	if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU) ||
-	    mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU_ACTIVE))
-		reg |= GLOBAL_CONTROL_PPU_ENABLE;
-
-	err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL, reg);
+	err = mv88e6xxx_ppu_enable(chip);
 	if (err)
 		return err;
 
@@ -3223,6 +3173,8 @@ static const struct mv88e6xxx_ops mv88e6085_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.ppu_enable = mv88e6185_g1_ppu_enable,
+	.ppu_disable = mv88e6185_g1_ppu_disable,
 	.reset = mv88e6185_g1_reset,
 };
 
@@ -3241,6 +3193,8 @@ static const struct mv88e6xxx_ops mv88e6095_ops = {
 	.stats_get_strings = mv88e6095_stats_get_strings,
 	.stats_get_stats = mv88e6095_stats_get_stats,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.ppu_enable = mv88e6185_g1_ppu_enable,
+	.ppu_disable = mv88e6185_g1_ppu_disable,
 	.reset = mv88e6185_g1_reset,
 };
 
@@ -3311,6 +3265,8 @@ static const struct mv88e6xxx_ops mv88e6131_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.ppu_enable = mv88e6185_g1_ppu_enable,
+	.ppu_disable = mv88e6185_g1_ppu_disable,
 	.reset = mv88e6185_g1_reset,
 };
 
@@ -3483,6 +3439,8 @@ static const struct mv88e6xxx_ops mv88e6185_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.ppu_enable = mv88e6185_g1_ppu_enable,
+	.ppu_disable = mv88e6185_g1_ppu_disable,
 	.reset = mv88e6185_g1_reset,
 };
 
@@ -4262,13 +4220,13 @@ static struct mv88e6xxx_chip *mv88e6xxx_alloc_chip(struct device *dev)
 
 static void mv88e6xxx_phy_init(struct mv88e6xxx_chip *chip)
 {
-	if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU))
+	if (chip->info->ops->ppu_enable && chip->info->ops->ppu_disable)
 		mv88e6xxx_ppu_state_init(chip);
 }
 
 static void mv88e6xxx_phy_destroy(struct mv88e6xxx_chip *chip)
 {
-	if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU))
+	if (chip->info->ops->ppu_enable && chip->info->ops->ppu_disable)
 		mv88e6xxx_ppu_state_destroy(chip);
 }
 
diff --git a/drivers/net/dsa/mv88e6xxx/global1.c b/drivers/net/dsa/mv88e6xxx/global1.c
index c868eb0..75af86a 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.c
+++ b/drivers/net/dsa/mv88e6xxx/global1.c
@@ -35,6 +35,27 @@ int mv88e6xxx_g1_wait(struct mv88e6xxx_chip *chip, int reg, u16 mask)
 
 /* Offset 0x00: Switch Global Status Register */
 
+static int mv88e6185_g1_wait_ppu_disabled(struct mv88e6xxx_chip *chip)
+{
+	u16 state;
+	int i, err;
+
+	for (i = 0; i < 16; i++) {
+		err = mv88e6xxx_g1_read(chip, GLOBAL_STATUS, &state);
+		if (err)
+			return err;
+
+		/* Check the value of the PPUState bits 15:14 */
+		state &= GLOBAL_STATUS_PPU_STATE_MASK;
+		if (state != GLOBAL_STATUS_PPU_STATE_POLLING)
+			return 0;
+
+		usleep_range(1000, 2000);
+	}
+
+	return -ETIMEDOUT;
+}
+
 static int mv88e6185_g1_wait_ppu_polling(struct mv88e6xxx_chip *chip)
 {
 	u16 state;
@@ -154,6 +175,42 @@ int mv88e6352_g1_reset(struct mv88e6xxx_chip *chip)
 	return mv88e6352_g1_wait_ppu_polling(chip);
 }
 
+int mv88e6185_g1_ppu_enable(struct mv88e6xxx_chip *chip)
+{
+	u16 val;
+	int err;
+
+	err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL, &val);
+	if (err)
+		return err;
+
+	val |= GLOBAL_CONTROL_PPU_ENABLE;
+
+	err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL, val);
+	if (err)
+		return err;
+
+	return mv88e6185_g1_wait_ppu_polling(chip);
+}
+
+int mv88e6185_g1_ppu_disable(struct mv88e6xxx_chip *chip)
+{
+	u16 val;
+	int err;
+
+	err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL, &val);
+	if (err)
+		return err;
+
+	val &= ~GLOBAL_CONTROL_PPU_ENABLE;
+
+	err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL, val);
+	if (err)
+		return err;
+
+	return mv88e6185_g1_wait_ppu_disabled(chip);
+}
+
 /* Offset 0x1a: Monitor Control */
 /* Offset 0x1a: Monitor & MGMT Control on some devices */
 
diff --git a/drivers/net/dsa/mv88e6xxx/global1.h b/drivers/net/dsa/mv88e6xxx/global1.h
index 9fca215..1aec738 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.h
+++ b/drivers/net/dsa/mv88e6xxx/global1.h
@@ -23,6 +23,9 @@ int mv88e6xxx_g1_wait(struct mv88e6xxx_chip *chip, int reg, u16 mask);
 int mv88e6185_g1_reset(struct mv88e6xxx_chip *chip);
 int mv88e6352_g1_reset(struct mv88e6xxx_chip *chip);
 
+int mv88e6185_g1_ppu_enable(struct mv88e6xxx_chip *chip);
+int mv88e6185_g1_ppu_disable(struct mv88e6xxx_chip *chip);
+
 int mv88e6xxx_g1_stats_wait(struct mv88e6xxx_chip *chip);
 int mv88e6xxx_g1_stats_snapshot(struct mv88e6xxx_chip *chip, int port);
 int mv88e6320_g1_stats_snapshot(struct mv88e6xxx_chip *chip, int port);
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index f201d13..af54bae 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -490,12 +490,6 @@ enum mv88e6xxx_cap {
 	MV88E6XXX_CAP_G2_PVT_DATA,	/* (0x0c) Cross Chip Port VLAN Data */
 	MV88E6XXX_CAP_G2_POT,		/* (0x0f) Priority Override Table */
 
-	/* PHY Polling Unit.
-	 * See GLOBAL_CONTROL_PPU_ENABLE and GLOBAL_STATUS_PPU_POLLING.
-	 */
-	MV88E6XXX_CAP_PPU,
-	MV88E6XXX_CAP_PPU_ACTIVE,
-
 	/* Per VLAN Spanning Tree Unit (STU).
 	 * The Port State database, if present, is accessed through VTU
 	 * operations and dedicated SID registers. See GLOBAL_VTU_SID.
@@ -537,8 +531,6 @@ enum mv88e6xxx_cap {
 #define MV88E6XXX_FLAG_G2_PVT_DATA	BIT_ULL(MV88E6XXX_CAP_G2_PVT_DATA)
 #define MV88E6XXX_FLAG_G2_POT		BIT_ULL(MV88E6XXX_CAP_G2_POT)
 
-#define MV88E6XXX_FLAG_PPU		BIT_ULL(MV88E6XXX_CAP_PPU)
-#define MV88E6XXX_FLAG_PPU_ACTIVE	BIT_ULL(MV88E6XXX_CAP_PPU_ACTIVE)
 #define MV88E6XXX_FLAG_STU		BIT_ULL(MV88E6XXX_CAP_STU)
 #define MV88E6XXX_FLAG_TEMP		BIT_ULL(MV88E6XXX_CAP_TEMP)
 #define MV88E6XXX_FLAG_TEMP_LIMIT	BIT_ULL(MV88E6XXX_CAP_TEMP_LIMIT)
@@ -567,7 +559,6 @@ enum mv88e6xxx_cap {
 #define MV88E6XXX_FLAGS_FAMILY_6095	\
 	(MV88E6XXX_FLAG_GLOBAL2 |	\
 	 MV88E6XXX_FLAG_G2_MGMT_EN_0X |	\
-	 MV88E6XXX_FLAG_PPU |		\
 	 MV88E6XXX_FLAG_VTU |		\
 	 MV88E6XXX_FLAGS_MULTI_CHIP)
 
@@ -578,7 +569,6 @@ enum mv88e6xxx_cap {
 	 MV88E6XXX_FLAG_G2_MGMT_EN_2X |	\
 	 MV88E6XXX_FLAG_G2_MGMT_EN_0X |	\
 	 MV88E6XXX_FLAG_G2_POT |	\
-	 MV88E6XXX_FLAG_PPU |		\
 	 MV88E6XXX_FLAG_STU |		\
 	 MV88E6XXX_FLAG_VTU |		\
 	 MV88E6XXX_FLAGS_IRL |		\
@@ -605,7 +595,6 @@ enum mv88e6xxx_cap {
 	 MV88E6XXX_FLAG_G2_INT |	\
 	 MV88E6XXX_FLAG_G2_MGMT_EN_0X |	\
 	 MV88E6XXX_FLAGS_MULTI_CHIP |	\
-	 MV88E6XXX_FLAG_PPU |		\
 	 MV88E6XXX_FLAG_VTU)
 
 #define MV88E6XXX_FLAGS_FAMILY_6320	\
@@ -614,7 +603,6 @@ enum mv88e6xxx_cap {
 	 MV88E6XXX_FLAG_G2_MGMT_EN_2X |	\
 	 MV88E6XXX_FLAG_G2_MGMT_EN_0X |	\
 	 MV88E6XXX_FLAG_G2_POT |	\
-	 MV88E6XXX_FLAG_PPU_ACTIVE |	\
 	 MV88E6XXX_FLAG_TEMP |		\
 	 MV88E6XXX_FLAG_TEMP_LIMIT |	\
 	 MV88E6XXX_FLAG_VTU |		\
@@ -630,7 +618,6 @@ enum mv88e6xxx_cap {
 	 MV88E6XXX_FLAG_G2_MGMT_EN_2X |	\
 	 MV88E6XXX_FLAG_G2_MGMT_EN_0X |	\
 	 MV88E6XXX_FLAG_G2_POT |	\
-	 MV88E6XXX_FLAG_PPU_ACTIVE |	\
 	 MV88E6XXX_FLAG_STU |		\
 	 MV88E6XXX_FLAG_TEMP |		\
 	 MV88E6XXX_FLAG_VTU |		\
@@ -647,7 +634,6 @@ enum mv88e6xxx_cap {
 	 MV88E6XXX_FLAG_G2_MGMT_EN_2X |	\
 	 MV88E6XXX_FLAG_G2_MGMT_EN_0X |	\
 	 MV88E6XXX_FLAG_G2_POT |	\
-	 MV88E6XXX_FLAG_PPU_ACTIVE |	\
 	 MV88E6XXX_FLAG_STU |		\
 	 MV88E6XXX_FLAG_TEMP |		\
 	 MV88E6XXX_FLAG_TEMP_LIMIT |	\
@@ -662,7 +648,6 @@ struct mv88e6xxx_ops;
 #define MV88E6XXX_FLAGS_FAMILY_6390	\
 	(MV88E6XXX_FLAG_EEE |		\
 	 MV88E6XXX_FLAG_GLOBAL2 |	\
-	 MV88E6XXX_FLAG_PPU_ACTIVE |	\
 	 MV88E6XXX_FLAG_STU |		\
 	 MV88E6XXX_FLAG_TEMP |		\
 	 MV88E6XXX_FLAG_TEMP_LIMIT |	\
@@ -792,6 +777,10 @@ struct mv88e6xxx_ops {
 	int (*phy_write)(struct mv88e6xxx_chip *chip, int addr, int reg,
 			 u16 val);
 
+	/* PHY Polling Unit (PPU) operations */
+	int (*ppu_enable)(struct mv88e6xxx_chip *chip);
+	int (*ppu_disable)(struct mv88e6xxx_chip *chip);
+
 	/* Switch Software Reset */
 	int (*reset)(struct mv88e6xxx_chip *chip);
 
-- 
2.10.2

^ permalink raw reply related

* [PATCH v3 net-next v3 3/4] net: dsa: mv88e6xxx: add a soft reset operation
From: Vivien Didelot @ 2016-12-05 22:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Stefan Eichenberger, Richard Cochran, Vivien Didelot
In-Reply-To: <20161205223028.20308-1-vivien.didelot@savoirfairelinux.com>

Marvell chips have different way to issue a software reset.

Old chips (such as 88E6060) have a reset bit in an ATU control register.

Newer chips moved this bit in a Global control register. Chips with
controllable PPU should reset the PPU when resetting the switch.

Add a new reset operation to implement these differences and introduce a
mv88e6xxx_software_reset() helper to wrap it conveniently.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c      |  72 ++++++++++----------
 drivers/net/dsa/mv88e6xxx/global1.c   | 121 ++++++++++++++++++++++++++++++++++
 drivers/net/dsa/mv88e6xxx/global1.h   |   4 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  15 +++--
 4 files changed, 172 insertions(+), 40 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 27dfb5d..5aae5d7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -545,7 +545,8 @@ static int mv88e6xxx_ppu_disable(struct mv88e6xxx_chip *chip)
 			return err;
 
 		usleep_range(1000, 2000);
-		if ((val & GLOBAL_STATUS_PPU_MASK) != GLOBAL_STATUS_PPU_POLLING)
+		val &= GLOBAL_STATUS_PPU_STATE_MASK;
+		if (val != GLOBAL_STATUS_PPU_STATE_POLLING)
 			return 0;
 	}
 
@@ -572,7 +573,8 @@ static int mv88e6xxx_ppu_enable(struct mv88e6xxx_chip *chip)
 			return err;
 
 		usleep_range(1000, 2000);
-		if ((val & GLOBAL_STATUS_PPU_MASK) == GLOBAL_STATUS_PPU_POLLING)
+		val &= GLOBAL_STATUS_PPU_STATE_MASK;
+		if (val == GLOBAL_STATUS_PPU_STATE_POLLING)
 			return 0;
 	}
 
@@ -2356,6 +2358,14 @@ static void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port)
 	mutex_unlock(&chip->reg_lock);
 }
 
+static int mv88e6xxx_software_reset(struct mv88e6xxx_chip *chip)
+{
+	if (chip->info->ops->reset)
+		return chip->info->ops->reset(chip);
+
+	return 0;
+}
+
 static void mv88e6xxx_hardware_reset(struct mv88e6xxx_chip *chip)
 {
 	struct gpio_desc *gpiod = chip->reset;
@@ -2391,10 +2401,6 @@ static int mv88e6xxx_disable_ports(struct mv88e6xxx_chip *chip)
 
 static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
 {
-	bool ppu_active = mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU_ACTIVE);
-	u16 is_reset = (ppu_active ? 0x8800 : 0xc800);
-	unsigned long timeout;
-	u16 reg;
 	int err;
 
 	err = mv88e6xxx_disable_ports(chip);
@@ -2403,34 +2409,7 @@ static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
 
 	mv88e6xxx_hardware_reset(chip);
 
-	/* Reset the switch. Keep the PPU active if requested. The PPU
-	 * needs to be active to support indirect phy register access
-	 * through global registers 0x18 and 0x19.
-	 */
-	if (ppu_active)
-		err = mv88e6xxx_g1_write(chip, 0x04, 0xc000);
-	else
-		err = mv88e6xxx_g1_write(chip, 0x04, 0xc400);
-	if (err)
-		return err;
-
-	/* Wait up to one second for reset to complete. */
-	timeout = jiffies + 1 * HZ;
-	while (time_before(jiffies, timeout)) {
-		err = mv88e6xxx_g1_read(chip, 0x00, &reg);
-		if (err)
-			return err;
-
-		if ((reg & is_reset) == is_reset)
-			break;
-		usleep_range(1000, 2000);
-	}
-	if (time_after(jiffies, timeout))
-		err = -ETIMEDOUT;
-	else
-		err = 0;
-
-	return err;
+	return mv88e6xxx_software_reset(chip);
 }
 
 static int mv88e6xxx_serdes_power_on(struct mv88e6xxx_chip *chip)
@@ -3244,6 +3223,7 @@ static const struct mv88e6xxx_ops mv88e6085_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6185_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6095_ops = {
@@ -3261,6 +3241,7 @@ static const struct mv88e6xxx_ops mv88e6095_ops = {
 	.stats_get_strings = mv88e6095_stats_get_strings,
 	.stats_get_stats = mv88e6095_stats_get_stats,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6185_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6097_ops = {
@@ -3285,6 +3266,7 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6123_ops = {
@@ -3304,6 +3286,7 @@ static const struct mv88e6xxx_ops mv88e6123_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6131_ops = {
@@ -3328,6 +3311,7 @@ static const struct mv88e6xxx_ops mv88e6131_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6185_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6161_ops = {
@@ -3352,6 +3336,7 @@ static const struct mv88e6xxx_ops mv88e6161_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6165_ops = {
@@ -3369,6 +3354,7 @@ static const struct mv88e6xxx_ops mv88e6165_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6171_ops = {
@@ -3394,6 +3380,7 @@ static const struct mv88e6xxx_ops mv88e6171_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6172_ops = {
@@ -3421,6 +3408,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6175_ops = {
@@ -3446,6 +3434,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6176_ops = {
@@ -3473,6 +3462,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6185_ops = {
@@ -3493,6 +3483,7 @@ static const struct mv88e6xxx_ops mv88e6185_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6185_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6190_ops = {
@@ -3517,6 +3508,7 @@ static const struct mv88e6xxx_ops mv88e6190_ops = {
 	.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6390_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6190x_ops = {
@@ -3541,6 +3533,7 @@ static const struct mv88e6xxx_ops mv88e6190x_ops = {
 	.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6390_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6191_ops = {
@@ -3565,6 +3558,7 @@ static const struct mv88e6xxx_ops mv88e6191_ops = {
 	.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6390_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6240_ops = {
@@ -3592,6 +3586,7 @@ static const struct mv88e6xxx_ops mv88e6240_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6290_ops = {
@@ -3616,6 +3611,7 @@ static const struct mv88e6xxx_ops mv88e6290_ops = {
 	.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6390_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6320_ops = {
@@ -3642,6 +3638,7 @@ static const struct mv88e6xxx_ops mv88e6320_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6321_ops = {
@@ -3667,6 +3664,7 @@ static const struct mv88e6xxx_ops mv88e6321_ops = {
 	.stats_get_stats = mv88e6320_stats_get_stats,
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6350_ops = {
@@ -3692,6 +3690,7 @@ static const struct mv88e6xxx_ops mv88e6350_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6351_ops = {
@@ -3717,6 +3716,7 @@ static const struct mv88e6xxx_ops mv88e6351_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6352_ops = {
@@ -3744,6 +3744,7 @@ static const struct mv88e6xxx_ops mv88e6352_ops = {
 	.g1_set_cpu_port = mv88e6095_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6095_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6095_g2_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6390_ops = {
@@ -3770,6 +3771,7 @@ static const struct mv88e6xxx_ops mv88e6390_ops = {
 	.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6390_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6390x_ops = {
@@ -3796,6 +3798,7 @@ static const struct mv88e6xxx_ops mv88e6390x_ops = {
 	.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6390_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static const struct mv88e6xxx_ops mv88e6391_ops = {
@@ -3820,6 +3823,7 @@ static const struct mv88e6xxx_ops mv88e6391_ops = {
 	.g1_set_cpu_port = mv88e6390_g1_set_cpu_port,
 	.g1_set_egress_port = mv88e6390_g1_set_egress_port,
 	.mgmt_rsvd2cpu = mv88e6390_g1_mgmt_rsvd2cpu,
+	.reset = mv88e6352_g1_reset,
 };
 
 static int mv88e6xxx_verify_madatory_ops(struct mv88e6xxx_chip *chip,
diff --git a/drivers/net/dsa/mv88e6xxx/global1.c b/drivers/net/dsa/mv88e6xxx/global1.c
index 44136ee..c868eb0 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.c
+++ b/drivers/net/dsa/mv88e6xxx/global1.c
@@ -33,6 +33,127 @@ int mv88e6xxx_g1_wait(struct mv88e6xxx_chip *chip, int reg, u16 mask)
 	return mv88e6xxx_wait(chip, chip->info->global1_addr, reg, mask);
 }
 
+/* Offset 0x00: Switch Global Status Register */
+
+static int mv88e6185_g1_wait_ppu_polling(struct mv88e6xxx_chip *chip)
+{
+	u16 state;
+	int i, err;
+
+	for (i = 0; i < 16; ++i) {
+		err = mv88e6xxx_g1_read(chip, GLOBAL_STATUS, &state);
+		if (err)
+			return err;
+
+		/* Check the value of the PPUState bits 15:14 */
+		state &= GLOBAL_STATUS_PPU_STATE_MASK;
+		if (state == GLOBAL_STATUS_PPU_STATE_POLLING)
+			return 0;
+
+		usleep_range(1000, 2000);
+	}
+
+	return -ETIMEDOUT;
+}
+
+static int mv88e6352_g1_wait_ppu_polling(struct mv88e6xxx_chip *chip)
+{
+	u16 state;
+	int i, err;
+
+	for (i = 0; i < 16; ++i) {
+		err = mv88e6xxx_g1_read(chip, GLOBAL_STATUS, &state);
+		if (err)
+			return err;
+
+		/* Check the value of the PPUState (or InitState) bit 15 */
+		if (state & GLOBAL_STATUS_PPU_STATE)
+			return 0;
+
+		usleep_range(1000, 2000);
+	}
+
+	return -ETIMEDOUT;
+}
+
+static int mv88e6xxx_g1_wait_init_ready(struct mv88e6xxx_chip *chip)
+{
+	const unsigned long timeout = jiffies + 1 * HZ;
+	u16 val;
+	int err;
+
+	/* Wait up to 1 second for the switch to be ready. The InitReady bit 11
+	 * is set to a one when all units inside the device (ATU, VTU, etc.)
+	 * have finished their initialization and are ready to accept frames.
+	 */
+	while (time_before(jiffies, timeout)) {
+		err = mv88e6xxx_g1_read(chip, GLOBAL_STATUS, &val);
+		if (err)
+			return err;
+
+		if (val & GLOBAL_STATUS_INIT_READY)
+			break;
+
+		usleep_range(1000, 2000);
+	}
+
+	if (time_after(jiffies, timeout))
+		return -ETIMEDOUT;
+
+	return 0;
+}
+
+/* Offset 0x04: Switch Global Control Register */
+
+int mv88e6185_g1_reset(struct mv88e6xxx_chip *chip)
+{
+	u16 val;
+	int err;
+
+	/* Set the SWReset bit 15 along with the PPUEn bit 14, to also restart
+	 * the PPU, including re-doing PHY detection and initialization
+	 */
+	err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL, &val);
+	if (err)
+		return err;
+
+	val |= GLOBAL_CONTROL_SW_RESET;
+	val |= GLOBAL_CONTROL_PPU_ENABLE;
+
+	err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL, val);
+	if (err)
+		return err;
+
+	err = mv88e6xxx_g1_wait_init_ready(chip);
+	if (err)
+		return err;
+
+	return mv88e6185_g1_wait_ppu_polling(chip);
+}
+
+int mv88e6352_g1_reset(struct mv88e6xxx_chip *chip)
+{
+	u16 val;
+	int err;
+
+	/* Set the SWReset bit 15 */
+	err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL, &val);
+	if (err)
+		return err;
+
+	val |= GLOBAL_CONTROL_SW_RESET;
+
+	err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL, val);
+	if (err)
+		return err;
+
+	err = mv88e6xxx_g1_wait_init_ready(chip);
+	if (err)
+		return err;
+
+	return mv88e6352_g1_wait_ppu_polling(chip);
+}
+
 /* Offset 0x1a: Monitor Control */
 /* Offset 0x1a: Monitor & MGMT Control on some devices */
 
diff --git a/drivers/net/dsa/mv88e6xxx/global1.h b/drivers/net/dsa/mv88e6xxx/global1.h
index cb61378..9fca215 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.h
+++ b/drivers/net/dsa/mv88e6xxx/global1.h
@@ -19,6 +19,10 @@
 int mv88e6xxx_g1_read(struct mv88e6xxx_chip *chip, int reg, u16 *val);
 int mv88e6xxx_g1_write(struct mv88e6xxx_chip *chip, int reg, u16 val);
 int mv88e6xxx_g1_wait(struct mv88e6xxx_chip *chip, int reg, u16 mask);
+
+int mv88e6185_g1_reset(struct mv88e6xxx_chip *chip);
+int mv88e6352_g1_reset(struct mv88e6xxx_chip *chip);
+
 int mv88e6xxx_g1_stats_wait(struct mv88e6xxx_chip *chip);
 int mv88e6xxx_g1_stats_snapshot(struct mv88e6xxx_chip *chip, int port);
 int mv88e6320_g1_stats_snapshot(struct mv88e6xxx_chip *chip, int port);
diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index 13c7cc4..f201d13 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -193,12 +193,12 @@
 
 #define GLOBAL_STATUS		0x00
 #define GLOBAL_STATUS_PPU_STATE BIT(15) /* 6351 and 6171 */
-/* Two bits for 6165, 6185 etc */
-#define GLOBAL_STATUS_PPU_MASK		(0x3 << 14)
-#define GLOBAL_STATUS_PPU_DISABLED_RST	(0x0 << 14)
-#define GLOBAL_STATUS_PPU_INITIALIZING	(0x1 << 14)
-#define GLOBAL_STATUS_PPU_DISABLED	(0x2 << 14)
-#define GLOBAL_STATUS_PPU_POLLING	(0x3 << 14)
+#define GLOBAL_STATUS_PPU_STATE_MASK		(0x3 << 14) /* 6165 6185 */
+#define GLOBAL_STATUS_PPU_STATE_DISABLED_RST	(0x0 << 14)
+#define GLOBAL_STATUS_PPU_STATE_INITIALIZING	(0x1 << 14)
+#define GLOBAL_STATUS_PPU_STATE_DISABLED	(0x2 << 14)
+#define GLOBAL_STATUS_PPU_STATE_POLLING		(0x3 << 14)
+#define GLOBAL_STATUS_INIT_READY	BIT(11)
 #define GLOBAL_STATUS_IRQ_AVB		8
 #define GLOBAL_STATUS_IRQ_DEVICE	7
 #define GLOBAL_STATUS_IRQ_STATS		6
@@ -792,6 +792,9 @@ struct mv88e6xxx_ops {
 	int (*phy_write)(struct mv88e6xxx_chip *chip, int addr, int reg,
 			 u16 val);
 
+	/* Switch Software Reset */
+	int (*reset)(struct mv88e6xxx_chip *chip);
+
 	/* RGMII Receive/Transmit Timing Control
 	 * Add delay on PHY_INTERFACE_MODE_RGMII_*ID, no delay otherwise.
 	 */
-- 
2.10.2

^ permalink raw reply related

* [PATCH v3 net-next v3 2/4] net: dsa: mv88e6xxx: add helper to hardware reset
From: Vivien Didelot @ 2016-12-05 22:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Stefan Eichenberger, Richard Cochran, Vivien Didelot
In-Reply-To: <20161205223028.20308-1-vivien.didelot@savoirfairelinux.com>

Add an helper to toggle the eventual GPIO connected to the reset pin.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 1d4d3be..27dfb5d 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2356,6 +2356,19 @@ static void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port)
 	mutex_unlock(&chip->reg_lock);
 }
 
+static void mv88e6xxx_hardware_reset(struct mv88e6xxx_chip *chip)
+{
+	struct gpio_desc *gpiod = chip->reset;
+
+	/* If there is a GPIO connected to the reset pin, toggle it */
+	if (gpiod) {
+		gpiod_set_value_cansleep(gpiod, 1);
+		usleep_range(10000, 20000);
+		gpiod_set_value_cansleep(gpiod, 0);
+		usleep_range(10000, 20000);
+	}
+}
+
 static int mv88e6xxx_disable_ports(struct mv88e6xxx_chip *chip)
 {
 	int i, err;
@@ -2380,7 +2393,6 @@ static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
 {
 	bool ppu_active = mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU_ACTIVE);
 	u16 is_reset = (ppu_active ? 0x8800 : 0xc800);
-	struct gpio_desc *gpiod = chip->reset;
 	unsigned long timeout;
 	u16 reg;
 	int err;
@@ -2389,13 +2401,7 @@ static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
 	if (err)
 		return err;
 
-	/* If there is a gpio connected to the reset pin, toggle it */
-	if (gpiod) {
-		gpiod_set_value_cansleep(gpiod, 1);
-		usleep_range(10000, 20000);
-		gpiod_set_value_cansleep(gpiod, 0);
-		usleep_range(10000, 20000);
-	}
+	mv88e6xxx_hardware_reset(chip);
 
 	/* Reset the switch. Keep the PPU active if requested. The PPU
 	 * needs to be active to support indirect phy register access
-- 
2.10.2

^ permalink raw reply related

* [PATCH v3 net-next v3 1/4] net: dsa: mv88e6xxx: add helper to disable ports
From: Vivien Didelot @ 2016-12-05 22:30 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Stefan Eichenberger, Richard Cochran, Vivien Didelot
In-Reply-To: <20161205223028.20308-1-vivien.didelot@savoirfairelinux.com>

Before resetting a switch, the ports should be set to the Disabled state
and the transmit queues should be drained.

Add an helper to explicit that.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index ca453f3..1d4d3be 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2356,6 +2356,26 @@ static void mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port)
 	mutex_unlock(&chip->reg_lock);
 }
 
+static int mv88e6xxx_disable_ports(struct mv88e6xxx_chip *chip)
+{
+	int i, err;
+
+	/* Set all ports to the Disabled state */
+	for (i = 0; i < mv88e6xxx_num_ports(chip); i++) {
+		err = mv88e6xxx_port_set_state(chip, i,
+					       PORT_CONTROL_STATE_DISABLED);
+		if (err)
+			return err;
+	}
+
+	/* Wait for transmit queues to drain,
+	 * i.e. 2ms for a maximum frame to be transmitted at 10 Mbps.
+	 */
+	usleep_range(2000, 4000);
+
+	return 0;
+}
+
 static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
 {
 	bool ppu_active = mv88e6xxx_has(chip, MV88E6XXX_FLAG_PPU_ACTIVE);
@@ -2364,18 +2384,10 @@ static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
 	unsigned long timeout;
 	u16 reg;
 	int err;
-	int i;
 
-	/* Set all ports to the disabled state. */
-	for (i = 0; i < mv88e6xxx_num_ports(chip); i++) {
-		err = mv88e6xxx_port_set_state(chip, i,
-					       PORT_CONTROL_STATE_DISABLED);
-		if (err)
-			return err;
-	}
-
-	/* Wait for transmit queues to drain. */
-	usleep_range(2000, 4000);
+	err = mv88e6xxx_disable_ports(chip);
+	if (err)
+		return err;
 
 	/* If there is a gpio connected to the reset pin, toggle it */
 	if (gpiod) {
-- 
2.10.2

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox