Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [V2] brcmfmac: drop Inter-Access Point Protocol packets by default
From: Kalle Valo @ 2018-03-16 13:01 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: James Hughes, Arend van Spriel, netdev, Chi-Hsien Lin, bridge,
	linux-wireless, Hante Meuleman, Pieter-Paul Giesberts,
	brcm80211-dev-list.pdl, Wright Feng, Rafał Miłecki,
	Felix Fietkau, brcm80211-dev-list, Franky Lin
In-Reply-To: <20180315072909.1512-1-zajec5@gmail.com>

Rafał Miłecki wrote:

> From: Rafał Miłecki <rafal@milecki.pl>
> 
> Testing brcmfmac with more recent firmwares resulted in AP interfaces
> not working in some specific setups. Debugging resulted in discovering
> support for IAPP in Broadcom's firmwares.
> 
> Older firmwares were only generating 802.11f frames. Newer ones like:
> 1) 10.10 (TOB) (r663589)
> 2) 10.10.122.20 (r683106)
> for 4366b1 and 4366c0 respectively seem to also /respect/ 802.11f frames
> in the Tx path by performing a STA disassociation.
> 
> This obsoleted standard and its implementation is something that:
> 1) Most people don't need / want to use
> 2) Can allow local DoS attacks
> 3) Breaks AP interfaces in some specific bridge setups
> 
> To solve issues it can cause this commit modifies brcmfmac to drop IAPP
> packets. If affects:
> 1) Rx path: driver won't be sending these unwanted packets up.
> 2) Tx path: driver will reject packets that would trigger STA
>    disassociation perfromed by a firmware (possible local DoS attack).
> 
> It appears there are some Broadcom's clients/users who care about this
> feature despite the drawbacks. They can switch it on using a new module
> param.
> 
> This change results in only two more comparisons (check for module param
> and check for Ethernet packet length) for 99.9% of packets. Its overhead
> should be very minimal.
> 
> Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>

Patch applied to wireless-drivers.git, thanks.

125905517028 brcmfmac: drop Inter-Access Point Protocol packets by default

-- 
https://patchwork.kernel.org/patch/10283971/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [V2] brcmfmac: drop Inter-Access Point Protocol packets by default
From: Kalle Valo @ 2018-03-16 13:01 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, Pieter-Paul Giesberts, James Hughes,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	brcm80211-dev-list.pdl-dY08KVG/lbpWk0Htik3J/w,
	brcm80211-dev-list-+wT8y+m8/X5BDgjK7y7TUQ,
	netdev-u79uwXL29TY76Z2rM5mHXA, Linus Lüssing, Felix Fietkau,
	bridge-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Rafał Miłecki
In-Reply-To: <20180315072909.1512-1-zajec5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

Rafał Miłecki wrote:

> From: Rafał Miłecki <rafal-g1n6cQUeyibVItvQsEIGlw@public.gmane.org>
> 
> Testing brcmfmac with more recent firmwares resulted in AP interfaces
> not working in some specific setups. Debugging resulted in discovering
> support for IAPP in Broadcom's firmwares.
> 
> Older firmwares were only generating 802.11f frames. Newer ones like:
> 1) 10.10 (TOB) (r663589)
> 2) 10.10.122.20 (r683106)
> for 4366b1 and 4366c0 respectively seem to also /respect/ 802.11f frames
> in the Tx path by performing a STA disassociation.
> 
> This obsoleted standard and its implementation is something that:
> 1) Most people don't need / want to use
> 2) Can allow local DoS attacks
> 3) Breaks AP interfaces in some specific bridge setups
> 
> To solve issues it can cause this commit modifies brcmfmac to drop IAPP
> packets. If affects:
> 1) Rx path: driver won't be sending these unwanted packets up.
> 2) Tx path: driver will reject packets that would trigger STA
>    disassociation perfromed by a firmware (possible local DoS attack).
> 
> It appears there are some Broadcom's clients/users who care about this
> feature despite the drawbacks. They can switch it on using a new module
> param.
> 
> This change results in only two more comparisons (check for module param
> and check for Ethernet packet length) for 99.9% of packets. Its overhead
> should be very minimal.
> 
> Signed-off-by: Rafał Miłecki <rafal-g1n6cQUeyibVItvQsEIGlw@public.gmane.org>
> Acked-by: Arend van Spriel <arend.vanspriel-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>

Patch applied to wireless-drivers.git, thanks.

125905517028 brcmfmac: drop Inter-Access Point Protocol packets by default

-- 
https://patchwork.kernel.org/patch/10283971/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)
From: Sowmini Varadhan @ 2018-03-16 13:00 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: santosh.shilimkar, davem, netdev, linux-rdma, rds-devel, edumazet
In-Reply-To: <152120385651.2065.2567466917573907029.stgit@localhost.localdomain>

On (03/16/18 15:38), Kirill Tkhai wrote:
> 
> 467fa15356acf by Sowmini Varadhan added NETDEV_UNREGISTER_FINAL dependence
> with the commentary:
> 
> 	/* rds-tcp registers as a pernet subys, so the ->exit will only
> 	 * get invoked after network acitivity has quiesced. We need to
> 	 * clean up all sockets  to quiesce network activity, and use
> 	 * the unregistration of the per-net loopback device as a trigger
> 	 * to start that cleanup.
> 	 */
> 
> It seems all the protocols pernet subsystems meet this situation, but they
> solve it in generic way. What does rds so specific have?

The difference with rds is this: most consumers of netns associate
a net_device with the netns, and cleanup of the netns state 
happens as part of the net_device teardown without the constraint
above. rds-tcp does has a netns tied to listening socket, not
to a specific network interface (net_device) so it registers
as a pernet-subsys. But this means that cleanup has to be
cone carefully (see comments in net_namespace.h before
register_pernet_subsys)

For rds-tcp, we need to be able to do the right thing in both of these
cases
1. modprobe -r rds-tcp (cleanup of rds-tcp state should happen in
   every namespace, including init_net)
2. netns delete (rds_tcp.ko should remain loaded for other namespaces)

> This commit makes event handler to iterate rds_tcp_conn_list and
> kill them. If we change the stage to NETDEV_UNREGISTER, what will change?

The above two cases need to work correctly.

> Can unregistered loopback device on dead net add new items to rds_tcp_conn_list?
> How it's possible?

I dont understand the question- no unregistered loopback devices
cannot add items. 

fwiw, I had asked questions about this (netns per net_device
vs netns for module) on the netdev list a few years ago, I can
try to hunt down that thread for you later (nobody replied to 
it, but maybe it will help answer your questions).

--Sowmini

^ permalink raw reply

* [PATCH v2] netns: send uevent messages
From: Christian Brauner @ 2018-03-16 12:50 UTC (permalink / raw)
  To: ebiederm, gregkh, netdev, linux-kernel
  Cc: serge, avagin, ktkhai, Christian Brauner

This patch adds a receive method to NETLINK_KOBJECT_UEVENT netlink sockets
to allow sending uevent messages into the network namespace the socket
belongs to.

Currently non-initial network namespaces are already isolated and don't
receive uevents. There are a number of cases where it is beneficial for a
sufficiently privileged userspace process to send a uevent into a network
namespace.

One such use case would be debugging and fuzzing of a piece of software
which listens and reacts to uevents. By running a copy of that software
inside a network namespace, specific uevents could then be presented to it.
More concretely, this would allow for easy testing of udevd/ueventd.

This will also allow some piece of software to run components inside a
separate network namespace and then effectively filter what that software
can receive. Some examples of software that do directly listen to uevents
and that we have in the past attempted to run inside a network namespace
are rbd (CEPH client) or the X server.

Implementation:
The implementation has been kept as simple as possible from the kernel's
perspective. Specifically, a simple input method uevent_net_rcv() is added
to NETLINK_KOBJECT_UEVENT sockets which completely reuses existing
af_netlink infrastructure and does neither add an additional netlink family
nor requires any user-visible changes.

For example, by using netlink_rcv_skb() we can make use of existing netlink
infrastructure to report back informative error messages to userspace.

Furthermore, this implementation does not introduce any overhead for
existing uevent generating codepaths. The struct netns gets a new uevent
socket member that records the uevent socket associated with that network
namespace including its position in the uevent socket list. Since we record
the uevent socket for each network namespace in struct net we don't have to
walk the whole uevent socket list. Instead we can directly retrieve the
relevant uevent socket and send the message. At exit time we can now also
trivially remove the uevent socket from the uevent socket list. This keeps
the codepath very performant without introducing needless overhead and even
makes older codepaths faster.

Uevent sequence numbers are kept global. When a uevent message is sent to
another network namespace the implementation will simply increment the
global uevent sequence number and append it to the received uevent. This
has the advantage that the kernel will never need to parse the received
uevent message to replace any existing uevent sequence numbers. Instead it
is up to the userspace process to remove any existing uevent sequence
numbers in case the uevent message to be sent contains any.

Security:
In order for a caller to send uevent messages to a target network namespace
the caller must have CAP_SYS_ADMIN in the owning user namespace of the
target network namespace. Additionally, any received uevent message is
verified to not exceed size UEVENT_BUFFER_SIZE. This includes the space
needed to append the uevent sequence number.

Testing:
This patch has been tested and verified to work with the following udev
implementations:
1. CentOS 6 with udevd version 147
2. Debian Sid with systemd-udevd version 237
3. Android 7.1.1 with ueventd

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
Changelog v1->v2:
* Add the whole struct uevent_sock to struct net not just the socket
  member. Since struct uevent_sock records the position of the uevent
  socket in the uevent socket list we can trivially remove it from the
  uevent socket list during cleanup. This speeds up the old removal
  codepath. list_del() will hitl __list_del_entry_valid() in its call chain
  which will validate that the element is a member of the list. If it isn't
  it will take care that the list is not modified.
Changelog v0->v1:
* Hold mutex_lock() until uevent is sent to preserve uevent message
  ordering. See udev and commit for reference:

  commit 7b60a18da393ed70db043a777fd9e6d5363077c4
  Author: Andrew Vagin <avagin@openvz.org>
  Date:   Wed Mar 7 14:49:56 2012 +0400

      uevent: send events in correct order according to seqnum (v3)

      The queue handling in the udev daemon assumes that the events are
      ordered.
---
 include/linux/kobject.h     |  6 +++
 include/net/net_namespace.h |  4 +-
 lib/kobject_uevent.c        | 98 ++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 93 insertions(+), 15 deletions(-)

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index 7f6f93c3df9c..c572c7abc609 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -39,6 +39,12 @@ extern char uevent_helper[];
 /* counter to tag the uevent, read only except for the kobject core */
 extern u64 uevent_seqnum;

+/* uevent socket */
+struct uevent_sock {
+	struct list_head list;
+	struct sock *sk;
+};
+
 /*
  * The actions here must match the index to the string array
  * in lib/kobject_uevent.c
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index f306b2aa15a4..abd7d91bffac 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -40,7 +40,7 @@ struct net_device;
 struct sock;
 struct ctl_table_header;
 struct net_generic;
-struct sock;
+struct uevent_sock;
 struct netns_ipvs;

@@ -79,6 +79,8 @@ struct net {
 	struct sock 		*rtnl;			/* rtnetlink socket */
 	struct sock		*genl_sock;

+	struct uevent_sock	*uevent_sock;		/* uevent socket */
+
 	struct list_head 	dev_base_head;
 	struct hlist_head 	*dev_name_head;
 	struct hlist_head	*dev_index_head;
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 9fe6ec8fda28..53e9123474c0 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -25,6 +25,7 @@
 #include <linux/uuid.h>
 #include <linux/ctype.h>
 #include <net/sock.h>
+#include <net/netlink.h>
 #include <net/net_namespace.h>

@@ -33,10 +34,6 @@ u64 uevent_seqnum;
 char uevent_helper[UEVENT_HELPER_PATH_LEN] = CONFIG_UEVENT_HELPER_PATH;
 #endif
 #ifdef CONFIG_NET
-struct uevent_sock {
-	struct list_head list;
-	struct sock *sk;
-};
 static LIST_HEAD(uevent_sock_list);
 #endif

@@ -602,12 +599,88 @@ int add_uevent_var(struct kobj_uevent_env *env, const char *format, ...)
 EXPORT_SYMBOL_GPL(add_uevent_var);

 #if defined(CONFIG_NET)
+static int uevent_net_broadcast(struct sock *usk, struct sk_buff *skb,
+				struct netlink_ext_ack *extack)
+{
+	int ret;
+	/* u64 to chars: 2^64 - 1 = 21 chars */
+	char buf[sizeof("SEQNUM=") + 21];
+	struct sk_buff *skbc;
+
+	/* bump and prepare sequence number */
+	ret = snprintf(buf, sizeof(buf), "SEQNUM=%llu", ++uevent_seqnum);
+	if (ret < 0 || (size_t)ret >= sizeof(buf))
+		return -ENOMEM;
+	ret++;
+
+	/* verify message does not overflow */
+	if ((skb->len + ret) > UEVENT_BUFFER_SIZE) {
+		NL_SET_ERR_MSG(extack, "uevent message too big");
+		return -EINVAL;
+	}
+
+	/* copy skb and extend to accommodate sequence number */
+	skbc = skb_copy_expand(skb, 0, ret, GFP_KERNEL);
+	if (!skbc)
+		return -ENOMEM;
+
+	/* append sequence number */
+	skb_put_data(skbc, buf, ret);
+
+	/* remove msg header */
+	skb_pull(skbc, NLMSG_HDRLEN);
+
+	/* set portid 0 to inform userspace message comes from kernel */
+	NETLINK_CB(skbc).portid = 0;
+	NETLINK_CB(skbc).dst_group = 1;
+
+	ret = netlink_broadcast(usk, skbc, 0, 1, GFP_KERNEL);
+	/* ENOBUFS should be handled in userspace */
+	if (ret == -ENOBUFS || ret == -ESRCH)
+		ret = 0;
+
+	return ret;
+}
+
+static int uevent_net_rcv_skb(struct sk_buff *skb, struct nlmsghdr *nlh,
+			      struct netlink_ext_ack *extack)
+{
+	int ret;
+	struct net *net;
+
+	if (!nlmsg_data(nlh))
+		return -EINVAL;
+
+	/*
+	 * Verify that we are allowed to send messages to the target
+	 * network namespace. The caller must have CAP_SYS_ADMIN in the
+	 * owning user namespace of the target network namespace.
+	 */
+	net = sock_net(NETLINK_CB(skb).sk);
+	if (!netlink_ns_capable(skb, net->user_ns, CAP_SYS_ADMIN)) {
+		NL_SET_ERR_MSG(extack, "missing CAP_SYS_ADMIN capability");
+		return -EPERM;
+	}
+
+	mutex_lock(&uevent_sock_mutex);
+	ret = uevent_net_broadcast(net->uevent_sock->sk, skb, extack);
+	mutex_unlock(&uevent_sock_mutex);
+
+	return ret;
+}
+
+static void uevent_net_rcv(struct sk_buff *skb)
+{
+	netlink_rcv_skb(skb, &uevent_net_rcv_skb);
+}
+
 static int uevent_net_init(struct net *net)
 {
 	struct uevent_sock *ue_sk;
 	struct netlink_kernel_cfg cfg = {
 		.groups	= 1,
-		.flags	= NL_CFG_F_NONROOT_RECV,
+		.input = uevent_net_rcv,
+		.flags	= NL_CFG_F_NONROOT_RECV
 	};

 	ue_sk = kzalloc(sizeof(*ue_sk), GFP_KERNEL);
@@ -621,6 +694,9 @@ static int uevent_net_init(struct net *net)
 		kfree(ue_sk);
 		return -ENODEV;
 	}
+
+	net->uevent_sock = ue_sk;
+
 	mutex_lock(&uevent_sock_mutex);
 	list_add_tail(&ue_sk->list, &uevent_sock_list);
 	mutex_unlock(&uevent_sock_mutex);
@@ -629,22 +705,16 @@ static int uevent_net_init(struct net *net)

 static void uevent_net_exit(struct net *net)
 {
-	struct uevent_sock *ue_sk;
+	struct uevent_sock *ue_sk = net->uevent_sock;

 	mutex_lock(&uevent_sock_mutex);
-	list_for_each_entry(ue_sk, &uevent_sock_list, list) {
-		if (sock_net(ue_sk->sk) == net)
-			goto found;
-	}
-	mutex_unlock(&uevent_sock_mutex);
-	return;
-
-found:
 	list_del(&ue_sk->list);
 	mutex_unlock(&uevent_sock_mutex);

 	netlink_kernel_release(ue_sk->sk);
 	kfree(ue_sk);
+
+	return;
 }

 static struct pernet_operations uevent_net_ops = {
-- 
2.15.1

^ permalink raw reply related

* [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)
From: Kirill Tkhai @ 2018-03-16 12:38 UTC (permalink / raw)
  To: santosh.shilimkar, davem, netdev, linux-rdma, rds-devel, edumazet,
	ktkhai, sowmini.varadhan

Hi,

rds_tcp_dev_event() is the last user of NETDEV_UNREGISTER_FINAL stage.
If we rework it, we'll be able to kill the stage, and this will decrease
the number of rtnl_lock() we take during generic net device unregistration.
This is very hot path for namespaces.

467fa15356acf by Sowmini Varadhan added NETDEV_UNREGISTER_FINAL dependence
with the commentary:

	/* rds-tcp registers as a pernet subys, so the ->exit will only
	 * get invoked after network acitivity has quiesced. We need to
	 * clean up all sockets  to quiesce network activity, and use
	 * the unregistration of the per-net loopback device as a trigger
	 * to start that cleanup.
	 */

It seems all the protocols pernet subsystems meet this situation, but they
solve it in generic way. What does rds so specific have?

This commit makes event handler to iterate rds_tcp_conn_list and
kill them. If we change the stage to NETDEV_UNREGISTER, what will change?
Can unregistered loopback device on dead net add new items to rds_tcp_conn_list?
How it's possible?

What the problem is to move rds_tcp_kill_sock() into pernet subsys exit?

Kirill
---
 net/rds/tcp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index eb04e7fa2467..4c6db9cb6261 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -567,7 +567,7 @@ static int rds_tcp_dev_event(struct notifier_block *this,
 	 * the unregistration of the per-net loopback device as a trigger
 	 * to start that cleanup.
 	 */
-	if (event == NETDEV_UNREGISTER_FINAL &&
+	if (event == NETDEV_UNREGISTER &&
 	    dev->ifindex == LOOPBACK_IFINDEX)
 		rds_tcp_kill_sock(dev_net(dev));

^ permalink raw reply related

* Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()
From: Florian Weimer @ 2018-03-16 11:47 UTC (permalink / raw)
  To: Kees Cook, Andrew Morton
  Cc: Linus Torvalds, Josh Poimboeuf, Rasmus Villemoes, Randy Dunlap,
	Miguel Ojeda, Ingo Molnar, David Laight, Ian Abbott, linux-input,
	linux-btrfs, netdev, linux-kernel, kernel-hardening
In-Reply-To: <1521174359-46392-1-git-send-email-keescook@chromium.org>

On 03/16/2018 05:25 AM, Kees Cook wrote:
> In the effort to remove all VLAs from the kernel[1], it is desirable to
> build with -Wvla. However, this warning is overly pessimistic, in that
> it is only happy with stack array sizes that are declared as constant
> expressions, and not constant values. One case of this is the evaluation
> of the max() macro which, due to its construction, ends up converting
> constant expression arguments into a constant value result. Attempts
> to adjust the behavior of max() ran afoul of version-dependent compiler
> behavior[2].

I find this commit message confusing.  VLAs have precisely defined 
semantics which differ from other arrays, and these differences can be 
observable (maybe not in the kernel, but certainly for userspace), so 
the compiler has to treat a VLA as such even if the length is a constant 
known at compile time.  (The original intent of the warning probably was 
a portability check anyway.)

If you want to catch stack frames which have unbounded size, 
-Werror=stack-usage=1000 or -Werror=vla-larger-than=1000 (with the 
constant adjusted as needed) might be the better approach.

Thanks,
Florian

^ permalink raw reply

* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
From: Jason Wang @ 2018-03-16 11:36 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180316100413.vtqwatregzrmhvt3@debian>



On 2018年03月16日 18:04, Tiwei Bie wrote:
> On Fri, Mar 16, 2018 at 04:34:28PM +0800, Jason Wang wrote:
>> On 2018年03月16日 15:40, Tiwei Bie wrote:
>>> On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
>>>> On 2018年03月16日 14:10, Tiwei Bie wrote:
>>>>> On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
>>>>>> On 2018年02月23日 19:18, Tiwei Bie wrote:
>>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>>>>>> ---
>>>>>>>      drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
>>>>>>>      include/linux/virtio_ring.h  |   8 +-
>>>>>>>      2 files changed, 618 insertions(+), 89 deletions(-)
>>>

[...]

>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +	for (; n < (out_sgs + in_sgs); n++) {
>>>>>>> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>>>>>>> +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
>>>>>>> +			if (vring_mapping_error(vq, addr))
>>>>>>> +				goto unmap_release;
>>>>>>> +
>>>>>>> +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
>>>>>>> +					VRING_DESC_F_WRITE |
>>>>>>> +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
>>>>>>> +					VRING_DESC_F_USED(!vq->wrap_counter));
>>>>>>> +			if (!indirect && i == head)
>>>>>>> +				head_flags = flags;
>>>>>>> +			else
>>>>>>> +				desc[i].flags = flags;
>>>>>>> +
>>>>>>> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
>>>>>>> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
>>>>>>> +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
>>>>>>> +			prev = i;
>>>>>>> +			i++;
>>>>>>> +			if (!indirect && i >= vq->vring_packed.num) {
>>>>>>> +				i = 0;
>>>>>>> +				vq->wrap_counter ^= 1;
>>>>>>> +			}
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +	/* Last one doesn't continue. */
>>>>>>> +	if (!indirect && (head + 1) % vq->vring_packed.num == i)
>>>>>>> +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>>>>>> I can't get the why we need this here.
>>>>> If only one desc is used, we will need to clear the
>>>>> VRING_DESC_F_NEXT flag from the head_flags.
>>>> Yes, I meant why following desc[prev].flags won't work for this?
>>> Because the update of desc[head].flags (in above case,
>>> prev == head) has been delayed. The flags is saved in
>>> head_flags.
>> Ok, but let's try to avoid modular here e.g tracking the number of sgs in a
>> counter.
>>
>> And I see lots of duplication in the above two loops, I believe we can unify
>> them with a a single loop. the only difference is dma direction and write
>> flag.
> The above implementation for packed ring is basically
> an mirror of the existing implementation in split ring
> as I want to keep the coding style consistent. Below
> is the corresponding code in split ring:
>
> static inline int virtqueue_add(struct virtqueue *_vq,
> 				struct scatterlist *sgs[],
> 				unsigned int total_sg,
> 				unsigned int out_sgs,
> 				unsigned int in_sgs,
> 				void *data,
> 				void *ctx,
> 				gfp_t gfp)
> {
> 	......
>
> 	for (n = 0; n < out_sgs; n++) {
> 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> 			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> 			if (vring_mapping_error(vq, addr))
> 				goto unmap_release;
>
> 			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT);
> 			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> 			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> 			prev = i;
> 			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> 		}
> 	}
> 	for (; n < (out_sgs + in_sgs); n++) {
> 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> 			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> 			if (vring_mapping_error(vq, addr))
> 				goto unmap_release;
>
> 			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT | VRING_DESC_F_WRITE);
> 			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> 			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> 			prev = i;
> 			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> 		}
> 	}
>
> 	......
> }

There's no need for such consistency especially consider it's a new kind 
of ring. Anyway, you can stick to it.

[...]

>>>>>>> +	} else
>>>>>>> +		vq->free_head = i;
>>>>>> ID is only valid in the last descriptor in the list, so head + 1 should be
>>>>>> ok too?
>>>>> I don't really get your point. The vq->free_head stores
>>>>> the index of the next avail desc.
>>>> I think I get your idea now, free_head has two meanings:
>>>>
>>>> - next avail index
>>>> - buffer id
>>> In my design, free_head is just the index of the next
>>> avail desc.
>>>
>>> Driver can set anything to buffer ID.
>> Then you need another method to track id to context e.g hashing.
> I keep the context in desc_state[desc_idx]. So there is
> no extra method needed to track the context.

Well, it works for this patch but my reply was for "set anything to 
buffer ID". The size of desc_state is limited, so in fact you can't use 
a value greater than vq.num.


>

[...]

>> @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
>>>>>>>      	if (!queue) {
>>>>>>>      		/* Try to get a single page. You are my only hope! */
>>>>>>> -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
>>>>>>> +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
>>>>>>> +							     packed),
>>>>>>>      					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
>>>>>>>      	}
>>>>>>>      	if (!queue)
>>>>>>>      		return NULL;
>>>>>>> -	queue_size_in_bytes = vring_size(num, vring_align);
>>>>>>> -	vring_init(&vring, num, queue, vring_align);
>>>>>>> +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
>>>>>>> +	if (packed)
>>>>>>> +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
>>>>>>> +	else
>>>>>>> +		vring_init(&vring.vring_split, num, queue, vring_align);
>>>>>> Let's rename vring_init to vring_init_split() like other helpers?
>>>>> The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
>>>>> I don't think we can rename it.
>>>> I see, then this need more thoughts to unify the API.
>>> My thought is to keep the old API as is, and introduce
>>> new types and helpers for packed ring.
>> I admit it's not a fault of this patch. But we'd better think of this in the
>> future, consider we may have new kinds of ring.
>>
>>> More details can be found in this patch:
>>> https://lkml.org/lkml/2018/2/23/243
>>> (PS. The type which has bit fields is just for reference,
>>>    and will be changed in next version.)
>>>
>>> Do you have any other suggestions?
>> No.
> Hmm.. Sorry, I didn't describe my question well.
> I mean do you have any suggestions about the API
> design for packed ring in uapi header? Currently
> I introduced below two new helpers:
>
> static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
> 				     void *p, unsigned long align);
> static inline unsigned vring_packed_size(unsigned int num, unsigned long align);
>
> When new rings are introduced in the future, above
> helpers can't be reused. Maybe we should make the
> helpers be able to determine the ring type?

Let's wait for Michael's comment here. Generally, I fail to understand 
why vring_init() become a part of uapi. Git grep shows the only use 
cases are virtio_test/vringh_test.

Thanks

>
> Best regards,
> Tiwei Bie
>
>> Thanks
>>
>>> Best regards,
>>> Tiwei Bie
>>>
>>>>>>> -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
>>>>>>> -				   notify, callback, name);
>>>>>>> +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
>>>>>>> +				   context, notify, callback, name);
>>>>>>>      	if (!vq) {
>>>>>>>      		vring_free_queue(vdev, queue_size_in_bytes, queue,
>>>>>>>      				 dma_addr);
>>> [...]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* RE: [PATCH v6 0/6] staging: Introduce DPAA2 Ethernet Switch driver
From: Razvan Stefanescu @ 2018-03-16 11:36 UTC (permalink / raw)
  To: Dan Carpenter, Andrew Lunn, Laurentiu Tudor, stuyoder@gmail.com
  Cc: devel@driverdev.osuosl.org, arnd@arndb.de,
	gregkh@linuxfoundation.org, Alexandru Marginean, Alexander Graf,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Ioana Ciornei
In-Reply-To: <20180315105642.szn2zhrsgwsv35yf@mwanda>



> -----Original Message-----
> From: Dan Carpenter [mailto:dan.carpenter@oracle.com]
> Sent: Thursday, March 15, 2018 12:57 PM
> To: Andrew Lunn <andrew@lunn.ch>; Laurentiu Tudor
> <laurentiu.tudor@nxp.com>; stuyoder@gmail.com
> Cc: Razvan Stefanescu <razvan.stefanescu@nxp.com>;
> devel@driverdev.osuosl.org; arnd@arndb.de; gregkh@linuxfoundation.org;
> Ioana Ciornei <ioana.ciornei@nxp.com>; Alexandru Marginean
> <alexandru.marginean@nxp.com>; Alexander Graf <agraf@suse.de>; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org
> Subject: Re: [PATCH v6 0/6] staging: Introduce DPAA2 Ethernet Switch driver
> 
> On Thu, Mar 15, 2018 at 12:44:37AM +0100, Andrew Lunn wrote:
> > On Wed, Mar 14, 2018 at 10:55:52AM -0500, Razvan Stefanescu wrote:
> > > This patchset introduces the Ethernet Switch Driver for Freescale/NXP
> SoCs
> > > with DPAA2 (DataPath Acceleration Architecture v2). The driver manages
> > > switch objects discovered on the fsl-mc bus. A description of the driver
> > > can be found in the associated README file.
> >
> > Hi Greg
> >
> > This code has much better quality than the usual stuff in staging. I
> > see no reason not to merge it.
> 
> Yeah.  It seems pretty decent.  Stuart, Laurentiu, care to comment?
> 
> Meanwhile, netdev and DaveM aren't even on the CC list and they're the
> ones to ultimately decide.

I'll keep in mind to add Dave M and netdev list on CC for the next contributions.
Thank you for the suggestion.

Regards,
Razvan
> 
> regards,
> dan carpenter

^ permalink raw reply

* [RFC v2 2/2] cxgb4: collect hardware dump in second kernel
From: Rahul Lakkireddy @ 2018-03-16 11:12 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: indranil-ut6Up61K2wZBDgjK7y7TUQ, nirranjan-ut6Up61K2wZBDgjK7y7TUQ,
	Rahul Lakkireddy, ganeshgr-ut6Up61K2wZBDgjK7y7TUQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, ebiederm-aS9lmoZGLiVWk0Htik3J/w
In-Reply-To: <1521198725-13463-1-git-send-email-rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

Register callback to collect hardware/firmware dumps in second kernel
before hardware/firmware is initialized.  The dumps for each device
will be available under /proc/crashdd/cxgb4/ directory in second
kernel.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Ganesh Goudar <ganeshgr-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
---
v2:
- Updated dump registration to the new API in patch 1.

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h       |  4 ++++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c | 25 ++++++++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h |  3 +++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  | 12 ++++++++++++
 4 files changed, 44 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index b2df0ffb7c94..00201e98ad19 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -50,6 +50,7 @@
 #include <linux/net_tstamp.h>
 #include <linux/ptp_clock_kernel.h>
 #include <linux/ptp_classify.h>
+#include <linux/crashdd.h>
 #include <asm/io.h>
 #include "t4_chip_type.h"
 #include "cxgb4_uld.h"
@@ -959,6 +960,9 @@ struct adapter {
 
 	/* HMA */
 	struct hma_data hma;
+
+	/* Dump buffer for collecting logs in kdump kernel */
+	struct crashdd_data crashdd_buf;
 };
 
 /* Support for "sched-class" command to allow a TX Scheduling Class to be
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
index 143686c60234..ce9f544781af 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
@@ -488,3 +488,28 @@ void cxgb4_init_ethtool_dump(struct adapter *adapter)
 	adapter->eth_dump.version = adapter->params.fw_vers;
 	adapter->eth_dump.len = 0;
 }
+
+static int cxgb4_cudbg_crashdd_collect(struct crashdd_data *data, void *buf)
+{
+	struct adapter *adap = container_of(data, struct adapter, crashdd_buf);
+	u32 len = data->size;
+
+	return cxgb4_cudbg_collect(adap, buf, &len, CXGB4_ETH_DUMP_ALL);
+}
+
+int cxgb4_cudbg_crashdd_add_dump(struct adapter *adap)
+{
+	struct crashdd_data *data = &adap->crashdd_buf;
+	u32 len;
+
+	len = sizeof(struct cudbg_hdr) +
+	      sizeof(struct cudbg_entity_hdr) * CUDBG_MAX_ENTITY;
+	len += CUDBG_DUMP_BUFF_SIZE;
+
+	data->size = len;
+	snprintf(data->name, sizeof(data->name), "%s_%s", cxgb4_driver_name,
+		 adap->name);
+	data->crashdd_callback = cxgb4_cudbg_crashdd_collect;
+
+	return crashdd_add_dump(cxgb4_driver_name, data);
+}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
index ce1ac9a1c878..095c6f04357e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h
@@ -41,8 +41,11 @@ enum CXGB4_ETHTOOL_DUMP_FLAGS {
 	CXGB4_ETH_DUMP_HW = (1 << 1), /* various FW and HW dumps */
 };
 
+#define CXGB4_ETH_DUMP_ALL (CXGB4_ETH_DUMP_MEM | CXGB4_ETH_DUMP_HW)
+
 u32 cxgb4_get_dump_length(struct adapter *adap, u32 flag);
 int cxgb4_cudbg_collect(struct adapter *adap, void *buf, u32 *buf_size,
 			u32 flag);
 void cxgb4_init_ethtool_dump(struct adapter *adapter);
+int cxgb4_cudbg_crashdd_add_dump(struct adapter *adap);
 #endif /* __CXGB4_CUDBG_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 5a349e1576cb..6a380bee85b8 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5484,6 +5484,18 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		goto out_free_adapter;
 
+	if (is_kdump_kernel()) {
+		/* Collect hardware state and append to /proc/crashdd/cxgb4/
+		 * directory
+		 */
+		err = cxgb4_cudbg_crashdd_add_dump(adapter);
+		if (err) {
+			dev_warn(adapter->pdev_dev,
+				 "Fail collecting crash driver dump, err: %d. Continuing\n",
+				 err);
+			err = 0;
+		}
+	}
 
 	if (!is_t4(adapter->params.chip)) {
 		s_qpp = (QUEUESPERPAGEPF0_S +
-- 
2.14.1

^ permalink raw reply related

* [RFC v2 1/2] proc/crashdd: add API to collect hardware dump in second kernel
From: Rahul Lakkireddy @ 2018-03-16 11:12 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	kexec-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: indranil-ut6Up61K2wZBDgjK7y7TUQ, nirranjan-ut6Up61K2wZBDgjK7y7TUQ,
	Rahul Lakkireddy, ganeshgr-ut6Up61K2wZBDgjK7y7TUQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, ebiederm-aS9lmoZGLiVWk0Htik3J/w
In-Reply-To: <1521198725-13463-1-git-send-email-rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

Add a new module crashdd that exports the /proc/crashdd/ directory
in second kernel, containing collected hardware/firmware dumps.

The sequence of actions done by device drivers to append their device
specific hardware/firmware logs to /proc/crashdd/ directory are as
follows:

1. During probe (before hardware is initialized), device drivers
register to the crashdd module (via crashdd_add_dump()), with
callback function, along with buffer size and log name needed for
firmware/hardware log collection.

2. Crashdd creates a driver's directory under /proc/crashdd/<driver>.
Then, it allocates the buffer with requested size and invokes the
device driver's registered callback function.

3. Device driver collects all hardware/firmware logs into the buffer
and returns control back to crashdd.

4. Crashdd exposes the buffer as a file via
/proc/crashdd/<driver>/<dump_file>.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Ganesh Goudar <ganeshgr-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
---
v2:
- Patch added in this series.

 fs/proc/Kconfig         |  11 ++
 fs/proc/Makefile        |   1 +
 fs/proc/crashdd.c       | 263 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/crashdd.h |  43 ++++++++
 4 files changed, 318 insertions(+)
 create mode 100644 fs/proc/crashdd.c
 create mode 100644 include/linux/crashdd.h

diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 1ade1206bb89..c378edffe7b3 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -43,6 +43,17 @@ config PROC_VMCORE
         help
         Exports the dump image of crashed kernel in ELF format.
 
+config PROC_CRASH_DRIVER_DUMP
+	bool "/proc/crashdd support"
+	depends on PROC_FS && CRASH_DUMP
+	default y
+        ---help---
+	Device drivers can collect the device specific snapshot of
+	their hardware or firmware before they are initialized in
+	crash recovery kernel. If you say Y here a tree of device
+	specific dumps will be made available under /proc/crashdd/
+	directory.
+
 config PROC_SYSCTL
 	bool "Sysctl support (/proc/sys)" if EXPERT
 	depends on PROC_FS
diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index ead487e80510..73883bc857b5 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -33,3 +33,4 @@ proc-$(CONFIG_PROC_KCORE)	+= kcore.o
 proc-$(CONFIG_PROC_VMCORE)	+= vmcore.o
 proc-$(CONFIG_PRINTK)	+= kmsg.o
 proc-$(CONFIG_PROC_PAGE_MONITOR)	+= page.o
+proc-$(CONFIG_PROC_CRASH_DRIVER_DUMP) += crashdd.o
diff --git a/fs/proc/crashdd.c b/fs/proc/crashdd.c
new file mode 100644
index 000000000000..c7585031541e
--- /dev/null
+++ b/fs/proc/crashdd.c
@@ -0,0 +1,263 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/uaccess.h>
+#include <linux/vmalloc.h>
+#include <linux/crash_dump.h>
+#include <linux/crashdd.h>
+#include "internal.h"
+
+static LIST_HEAD(crashdd_list);
+static DEFINE_MUTEX(crashdd_mutex);
+
+#define CRASHDD_PROC_PERM 400 /* S_ISRUSR */
+static struct proc_dir_entry *proc_crashdd;
+
+static ssize_t crashdd_read_data(struct file *file, char __user *buffer,
+				 size_t buflen, loff_t *fpos)
+{
+	struct crashdd_dump_node *dump = PDE_DATA(file->f_inode);
+	unsigned long len;
+	char *start;
+
+	if (*fpos < 0)
+		return -EINVAL;
+
+	if (!dump || !buflen)
+		return 0;
+
+	if (*fpos >= dump->size)
+		return 0;
+
+	len = dump->size - *fpos;
+	if (len > buflen)
+		len = buflen;
+
+	start = (char *)dump->buf + *fpos;
+	if (copy_to_user(buffer, start, len))
+		return -EFAULT;
+
+	*fpos += len;
+	return len;
+}
+
+static const struct file_operations proc_crashdd_ops = {
+	.read   = crashdd_read_data,
+	.llseek = default_llseek,
+	.open   = simple_open,
+};
+
+static void *crashdd_proc_mkdir(const char *name)
+{
+	return proc_mkdir_mode(name, CRASHDD_PROC_PERM, proc_crashdd);
+}
+
+static void *crashdd_proc_add(struct proc_dir_entry *parent,
+			      const char *name, void *dump)
+{
+	return proc_create_data(name, CRASHDD_PROC_PERM, parent,
+				&proc_crashdd_ops, dump);
+}
+
+static void crashdd_proc_del(struct proc_dir_entry *entry)
+{
+	proc_remove(entry);
+}
+
+/**
+ * crashdd_init_driver - create a proc driver context.
+ * @name: Name of the directory.
+ *
+ * Creates a directory under /proc/crashdd/ with @name.  Allocates and
+ * saves the proc context.  The proc context is added to the global list
+ * and then returned to the caller. On failure, returns NULL.
+ */
+static struct crashdd_driver_node *crashdd_init_driver(const char *name)
+{
+	struct crashdd_driver_node *node;
+
+	node = vzalloc(sizeof(*node));
+	if (!node)
+		return NULL;
+
+	/* Create a driver's directory under /proc/crashdd/ */
+	node->proc_node = crashdd_proc_mkdir(name);
+	if (!node->proc_node) {
+		vfree(node);
+		return NULL;
+	}
+
+	atomic_set(&node->refcnt, 1);
+
+	/* Initialize the list of dumps that go under this driver's
+	 * directory.
+	 */
+	INIT_LIST_HEAD(&node->dump_list);
+
+	/* Add the driver's context to global list */
+	mutex_lock(&crashdd_mutex);
+	list_add_tail(&node->list, &crashdd_list);
+	mutex_unlock(&crashdd_mutex);
+
+	return node;
+}
+
+/**
+ * crashdd_get_driver - get an exisiting proc driver context.
+ * @name: Name of the directory.
+ *
+ * Searches and fetches a proc context having @name.  If @name is
+ * found, then the reference count is incremented and the context
+ * is returned.  If @name is not found, NULL is returned.
+ */
+static struct crashdd_driver_node *crashdd_get_driver(const char *name)
+{
+	struct crashdd_driver_node *node;
+	int found = 0;
+
+	/* Search for an existing driver context having @name */
+	mutex_lock(&crashdd_mutex);
+	list_for_each_entry(node, &crashdd_list, list) {
+		if (!strcmp(node->proc_node->name, name)) {
+			atomic_inc(&node->refcnt);
+			found = 1;
+			break;
+		}
+	}
+	mutex_unlock(&crashdd_mutex);
+
+	if (found)
+		return node;
+
+	/* No driver with @name found */
+	return NULL;
+}
+
+/**
+ * crashdd_put_driver - put an exisiting proc driver context.
+ * @node: driver proc context
+ *
+ * Decrement @node reference count.  If there are no dumps left under it,
+ * delete the proc directory and remove it from the global list.
+ */
+static void crashdd_put_driver(struct crashdd_driver_node *node)
+{
+	mutex_lock(&crashdd_mutex);
+	if (atomic_dec_and_test(&node->refcnt)) {
+		/* Delete @node driver context if it has no dumps under it */
+		crashdd_proc_del(node->proc_node);
+		node->proc_node = NULL;
+		list_del(&node->list);
+	}
+	mutex_unlock(&crashdd_mutex);
+}
+
+/**
+ * crashdd_add_dump - Allocate a directory under /proc/crashdd/ and add the
+ * dump to it.
+ * @driver_name: directory name under which the dump should be added.
+ * @data: dump info.
+ *
+ * Search for @driver_name directory under /proc/crashdd/.  If not found,
+ * allocate a new directory under /proc/crashdd/ with @driver_name.
+ * Allocate the dump context and invoke the calling driver's dump collect
+ * routine.  Once collection is done, add the dump under
+ * /proc/crashdd/@driver_name/ directory.
+ */
+int crashdd_add_dump(const char *driver_name, struct crashdd_data *data)
+{
+	struct crashdd_driver_node *node;
+	struct crashdd_dump_node *dump;
+	void *buf = NULL;
+	int ret;
+
+	if (!driver_name || !strlen(driver_name) ||
+	    !data || !strlen(data->name) ||
+	    !data->crashdd_callback || !data->size)
+		return -EINVAL;
+
+	/* Get a driver proc context with specified name. */
+	node = crashdd_get_driver(driver_name);
+	if (!node) {
+		/* No driver proc context found with specified name.
+		 * So create a new one
+		 */
+		node = crashdd_init_driver(driver_name);
+		if (!node)
+			return -ENOMEM;
+	}
+
+	dump = vzalloc(sizeof(*dump));
+	if (!dump) {
+		ret = -ENOMEM;
+		goto out_err;
+	}
+
+	/* Allocate buffer for driver's to write their dumps */
+	buf = vzalloc(data->size);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto out_err;
+	}
+
+	/* Allocate a proc file under /proc/crashdd/@driver_name/.
+	 * Also set the dump as proc file's data
+	 */
+	dump->proc_node = crashdd_proc_add(node->proc_node, data->name, dump);
+	if (!dump->proc_node) {
+		ret = -ENOMEM;
+		goto out_err;
+	}
+
+	/* Invoke the driver's dump collection routing */
+	ret = data->crashdd_callback(data, buf);
+	if (ret)
+		goto out_err;
+
+	dump->buf = buf;
+	dump->size = data->size;
+	dump->proc_node->size = dump->size;
+
+	/* Add the dump to driver proc context list */
+	mutex_lock(&crashdd_mutex);
+	list_add_tail(&dump->list, &node->dump_list);
+	atomic_inc(&node->refcnt);
+	mutex_unlock(&crashdd_mutex);
+
+	/* Return back the driver proc context reference */
+	crashdd_put_driver(node);
+	return 0;
+
+out_err:
+	if (buf)
+		vfree(buf);
+
+	if (dump) {
+		if (dump->proc_node) {
+			crashdd_proc_del(dump->proc_node);
+			dump->proc_node = NULL;
+		}
+		vfree(dump);
+	}
+
+	crashdd_put_driver(node);
+	return ret;
+}
+EXPORT_SYMBOL(crashdd_add_dump);
+
+/* Init function for crash driver dump module. */
+static int __init crashdd_proc_init(void)
+{
+	/*
+	 * Only export this directory in 2nd kernel.
+	 */
+	if (!is_kdump_kernel())
+		return 0;
+
+	/* Create /proc/crashdd/ directory */
+	proc_crashdd = proc_mkdir_mode("crashdd", CRASHDD_PROC_PERM, NULL);
+	if (!proc_crashdd)
+		return -ENOMEM;
+
+	return 0;
+}
+fs_initcall(crashdd_proc_init);
diff --git a/include/linux/crashdd.h b/include/linux/crashdd.h
new file mode 100644
index 000000000000..1f1ec280a2bb
--- /dev/null
+++ b/include/linux/crashdd.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef CRASH_DRIVER_DUMP_H
+#define CRASH_DRIVER_DUMP_H
+
+/* Max driver/dump name length */
+#define CRASHDD_NAME_LENGTH 32
+
+/* Dump proc context internal to crashdd */
+struct crashdd_dump_node {
+	/* Pointer to list of dumps under the driver proc context */
+	struct list_head list;
+	void *buf;             /* Buffer containing device's dump */
+	unsigned long size;    /* Size of the buffer */
+	/* Pointer to dump's entry in driver directory */
+	struct proc_dir_entry *proc_node;
+};
+
+/* Driver proc context internal to crashdd */
+struct crashdd_driver_node {
+	/* Pointer to global list of driver proc contexts */
+	struct list_head list;
+	struct list_head dump_list; /* List of dumps under this driver */
+	atomic_t refcnt;            /* Number of dumps under this directory */
+	/* Pointer to driver directory entry */
+	struct proc_dir_entry *proc_node;
+};
+
+/* Driver Dump information to be filled by drivers */
+struct crashdd_data {
+	char name[CRASHDD_NAME_LENGTH]; /* Unique name of the dump */
+	unsigned long size;             /* Size of the dump */
+	/* Driver's registered callback to be invoked to collect dump */
+	int (*crashdd_callback)(struct crashdd_data *data, void *buf);
+};
+
+#ifdef CONFIG_PROC_CRASH_DRIVER_DUMP
+int crashdd_add_dump(const char *driver_name, struct crashdd_data *data);
+#else
+#define crashdd_add_dump(x, y) 0
+#endif /* CONFIG_PROC_CRASH_DRIVER_DUMP */
+
+#endif /* CRASH_DRIVER_DUMP_H */
-- 
2.14.1

^ permalink raw reply related

* [RFC v2 0/2] kernel: add support to collect hardware logs in crash recovery kernel
From: Rahul Lakkireddy @ 2018-03-16 11:12 UTC (permalink / raw)
  To: linux-kernel, netdev, kexec
  Cc: davem, ebiederm, akpm, torvalds, ganeshgr, nirranjan, indranil,
	Rahul Lakkireddy

On production servers running variety of workloads over time, kernel
panic can happen sporadically after days or even months. It is
important to collect as much debug logs as possible to root cause
and fix the problem, that may not be easy to reproduce. Snapshot of
underlying hardware/firmware state (like register dump, firmware
logs, adapter memory, etc.), at the time of kernel panic will be very
helpful while debugging the culprit device driver.

This series of patches add new generic framework that enable device
drivers to collect device specific snapshot of the hardware/firmware
state of the underlying device in the crash recovery kernel. In crash
recovery kernel, the collected logs are exposed via /proc/crashdd/
directory, which is copied by user space scripts for post-analysis.

A kernel module crashdd is newly added. In crash recovery kernel,
crashdd exposes /proc/crashdd/ directory containing device specific
hardware/firmware logs.

The sequence of actions done by device drivers to append their device
specific hardware/firmware logs to /proc/crashdd/ directory are as
follows:

1. During probe (before hardware is initialized), device drivers
register to the crashdd module (via crashdd_add_dump()), with
callback function, along with buffer size and log name needed for
firmware/hardware log collection.

2. Crashdd creates a driver's directory under /proc/crashdd/<driver>.
Then, it allocates the buffer with requested size and invokes the
device driver's registered callback function.

3. Device driver collects all hardware/firmware logs into the buffer
and returns control back to crashdd.

4. Crashdd exposes the buffer as a file via
/proc/crashdd/<driver>/<dump_file>.

5. User space script (/usr/lib/kdump/kdump-lib-initramfs.sh) copies
the entire /proc/crashdd/ directory to /var/crash/ directory.

Patch 1 adds crashdd module to allow drivers to register callback to
collect the device specific hardware/firmware logs.  The module also
exports /proc/crashdd/ directory containing the hardware/firmware logs.

Patch 2 shows a cxgb4 driver example using the API to collect
hardware/firmware logs in crash recovery kernel, before hardware is
initialized.  The logs for the devices are made available under
/proc/crashdd/cxgb4/ directory.

Suggestions and feedback will be much appreciated.

Thanks,
Rahul

RFC v1: https://www.spinics.net/lists/netdev/msg486562.html

---
v2:
- Added new crashdd module that exports /proc/crashdd/ containing
  driver's registered hardware/firmware logs in patch 1.
- Replaced the API to allow drivers to register their hardware/firmware
  log collect routine in crash recovery kernel in patch 1.
- Updated patch 2 to use the new API in patch 1.

Rahul Lakkireddy (2):
  proc/crashdd: add API to collect hardware dump in second kernel
  cxgb4: collect hardware dump in second kernel

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h       |   4 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c |  25 +++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.h |   3 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |  12 ++
 fs/proc/Kconfig                                  |  11 +
 fs/proc/Makefile                                 |   1 +
 fs/proc/crashdd.c                                | 263 +++++++++++++++++++++++
 include/linux/crashdd.h                          |  43 ++++
 8 files changed, 362 insertions(+)
 create mode 100644 fs/proc/crashdd.c
 create mode 100644 include/linux/crashdd.h

-- 
2.14.1

^ permalink raw reply

* Re: [PATCH v6 0/6] staging: Introduce DPAA2 Ethernet Switch driver
From: Laurentiu Tudor @ 2018-03-16 10:59 UTC (permalink / raw)
  To: Dan Carpenter, Andrew Lunn, stuyoder@gmail.com
  Cc: devel@driverdev.osuosl.org, arnd@arndb.de,
	gregkh@linuxfoundation.org, Alexandru Marginean, Alexander Graf,
	linux-kernel@vger.kernel.org, Razvan Stefanescu, Ioana Ciornei,
	netdev@vger.kernel.org
In-Reply-To: <20180315105642.szn2zhrsgwsv35yf@mwanda>

Hi Dan,

On 03/15/2018 12:56 PM, Dan Carpenter wrote:
> On Thu, Mar 15, 2018 at 12:44:37AM +0100, Andrew Lunn wrote:
>> On Wed, Mar 14, 2018 at 10:55:52AM -0500, Razvan Stefanescu wrote:
>>> This patchset introduces the Ethernet Switch Driver for Freescale/NXP SoCs
>>> with DPAA2 (DataPath Acceleration Architecture v2). The driver manages
>>> switch objects discovered on the fsl-mc bus. A description of the driver
>>> can be found in the associated README file.
>>
>> Hi Greg
>>
>> This code has much better quality than the usual stuff in staging. I
>> see no reason not to merge it.
>
> Yeah.  It seems pretty decent.  Stuart, Laurentiu, care to comment?

Not sure on what you want us to comment ...

> Meanwhile, netdev and DaveM aren't even on the CC list and they're the
> ones to ultimately decide.

I think we'll post to netdev when we'll be done with the TODOs
and start moving the driver out of staging.

---
Best Regards, Laurentiu

^ permalink raw reply

* [PATCH net 7/7] net: aquantia: driver version bump
From: Igor Russkikh @ 2018-03-16 10:53 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1521192913.git.igor.russkikh@aquantia.com>

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/ver.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/ver.h b/drivers/net/ethernet/aquantia/atlantic/ver.h
index 5265b93..a445de6 100644
--- a/drivers/net/ethernet/aquantia/atlantic/ver.h
+++ b/drivers/net/ethernet/aquantia/atlantic/ver.h
@@ -13,7 +13,7 @@
 #define NIC_MAJOR_DRIVER_VERSION           2
 #define NIC_MINOR_DRIVER_VERSION           0
 #define NIC_BUILD_DRIVER_VERSION           2
-#define NIC_REVISION_DRIVER_VERSION        0
+#define NIC_REVISION_DRIVER_VERSION        1
 
 #define AQ_CFG_DRV_VERSION_SUFFIX "-kern"
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 6/7] net: aquantia: Implement pci shutdown callback
From: Igor Russkikh @ 2018-03-16 10:53 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1521192913.git.igor.russkikh@aquantia.com>

We should close link and all NIC operations during shutdown.
On some systems graceful reboot never closes NIC interface on its own,
but only indicates pci device shutdown. Without explicit handler, NIC
rx rings continued to transfer DMA data into prepared buffers while CPU
rebooted already. That caused memory corruptions on soft reboot.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c      | 20 ++++++++++++++++++++
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h      |  1 +
 drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 15 +++++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index f32e4b3..79b0914 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -943,3 +943,23 @@ int aq_nic_change_pm_state(struct aq_nic_s *self, pm_message_t *pm_msg)
 out:
 	return err;
 }
+
+void aq_nic_shutdown(struct aq_nic_s *self)
+{
+	int err = 0;
+
+	if (!self->ndev)
+		return;
+
+	rtnl_lock();
+
+	netif_device_detach(self->ndev);
+
+	err = aq_nic_stop(self);
+	if (err < 0)
+		goto err_exit;
+	aq_nic_deinit(self);
+
+err_exit:
+	rtnl_unlock();
+}
\ No newline at end of file
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.h b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
index 2b689b7..b770eeb 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
@@ -120,5 +120,6 @@ struct aq_nic_cfg_s *aq_nic_get_cfg(struct aq_nic_s *self);
 u32 aq_nic_get_fw_version(struct aq_nic_s *self);
 int aq_nic_change_pm_state(struct aq_nic_s *self, pm_message_t *pm_msg);
 int aq_nic_update_interrupt_moderation_settings(struct aq_nic_s *self);
+void aq_nic_shutdown(struct aq_nic_s *self);
 
 #endif /* AQ_NIC_H */
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
index 87c4308..ecc6306 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
@@ -323,6 +323,20 @@ static void aq_pci_remove(struct pci_dev *pdev)
 	pci_disable_device(pdev);
 }
 
+static void aq_pci_shutdown(struct pci_dev *pdev)
+{
+	struct aq_nic_s *self = pci_get_drvdata(pdev);
+
+	aq_nic_shutdown(self);
+
+	pci_disable_device(pdev);
+
+	if (system_state == SYSTEM_POWER_OFF) {
+		pci_wake_from_d3(pdev, false);
+		pci_set_power_state(pdev, PCI_D3hot);
+	}
+}
+
 static int aq_pci_suspend(struct pci_dev *pdev, pm_message_t pm_msg)
 {
 	struct aq_nic_s *self = pci_get_drvdata(pdev);
@@ -345,6 +359,7 @@ static struct pci_driver aq_pci_ops = {
 	.remove = aq_pci_remove,
 	.suspend = aq_pci_suspend,
 	.resume = aq_pci_resume,
+	.shutdown = aq_pci_shutdown,
 };
 
 module_pci_driver(aq_pci_ops);
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 5/7] net: aquantia: Allow live mac address changes
From: Igor Russkikh @ 2018-03-16 10:53 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1521192913.git.igor.russkikh@aquantia.com>

There is nothing prevents us from changing MAC on the running interface.
Allow this with ndev priv flag.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index 5723f2c..f32e4b3 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -251,6 +251,8 @@ void aq_nic_ndev_init(struct aq_nic_s *self)
 	self->ndev->hw_features |= aq_hw_caps->hw_features;
 	self->ndev->features = aq_hw_caps->hw_features;
 	self->ndev->priv_flags = aq_hw_caps->hw_priv_flags;
+	self->ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+
 	self->ndev->mtu = aq_nic_cfg->mtu - ETH_HLEN;
 	self->ndev->max_mtu = aq_hw_caps->mtu - ETH_FCS_LEN - ETH_HLEN;

-- 
2.7.4

^ permalink raw reply related

* [PATCH net 4/7] net: aquantia: Add aq_tx_clean_budget and valid budget handling logic
From: Igor Russkikh @ 2018-03-16 10:53 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1521192913.git.igor.russkikh@aquantia.com>

We should report to napi full budget only when we have more job to do.
Before this fix, on any tx queue cleanup we forced napi to do poll again.
Thats a waste of cpu resources and caused storming with napi polls when
there was at least one tx on each interrupt.

With this fix we report full budget only when there is more job on TX
to do. Or, as before, when rx budget was fully consumed.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c  | 4 ++++
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h  | 2 ++
 drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 7 +++++--
 drivers/net/ethernet/aquantia/atlantic/aq_ring.h | 2 +-
 drivers/net/ethernet/aquantia/atlantic/aq_vec.c  | 7 +++----
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index ebbaf63..5723f2c 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -36,6 +36,10 @@ static unsigned int aq_itr_rx;
 module_param_named(aq_itr_rx, aq_itr_rx, uint, 0644);
 MODULE_PARM_DESC(aq_itr_rx, "RX interrupt throttle rate");
 
+unsigned aq_tx_clean_budget = 256;
+module_param_named(aq_tx_clean_budget, aq_tx_clean_budget, uint, 0644);
+MODULE_PARM_DESC(aq_tx_clean_budget, "Maximum descriptors to cleanup on TX at once");
+
 static void aq_nic_update_ndev_stats(struct aq_nic_s *self);
 
 static void aq_nic_rss_init(struct aq_nic_s *self, unsigned int num_rss_queues)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.h b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
index d16b0f1..2b689b7 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
@@ -87,6 +87,8 @@ static inline struct device *aq_nic_get_dev(struct aq_nic_s *self)
 	return self->ndev->dev.parent;
 }
 
+extern unsigned aq_tx_clean_budget;
+
 void aq_nic_ndev_init(struct aq_nic_s *self);
 struct aq_nic_s *aq_nic_alloc_hot(struct net_device *ndev);
 void aq_nic_set_tx_ring(struct aq_nic_s *self, unsigned int idx,
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index 0be6a11..64a6de6 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -136,11 +136,12 @@ void aq_ring_queue_stop(struct aq_ring_s *ring)
 		netif_stop_subqueue(ndev, ring->idx);
 }
 
-void aq_ring_tx_clean(struct aq_ring_s *self)
+bool aq_ring_tx_clean(struct aq_ring_s *self)
 {
 	struct device *dev = aq_nic_get_dev(self->aq_nic);
+	unsigned int budget = aq_tx_clean_budget;
 
-	for (; self->sw_head != self->hw_head;
+	for (; self->sw_head != self->hw_head && budget--;
 		self->sw_head = aq_ring_next_dx(self, self->sw_head)) {
 		struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
 
@@ -167,6 +168,8 @@ void aq_ring_tx_clean(struct aq_ring_s *self)
 		buff->pa = 0U;
 		buff->eop_index = 0xffffU;
 	}
+
+	return !!budget;
 }
 
 #define AQ_SKB_ALIGN SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
index 965fae0..ac1329f 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
@@ -153,7 +153,7 @@ void aq_ring_free(struct aq_ring_s *self);
 void aq_ring_update_queue_state(struct aq_ring_s *ring);
 void aq_ring_queue_wake(struct aq_ring_s *ring);
 void aq_ring_queue_stop(struct aq_ring_s *ring);
-void aq_ring_tx_clean(struct aq_ring_s *self);
+bool aq_ring_tx_clean(struct aq_ring_s *self);
 int aq_ring_rx_clean(struct aq_ring_s *self,
 		     struct napi_struct *napi,
 		     int *work_done,
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
index f890b8a..9528507 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
@@ -40,7 +40,7 @@ static int aq_vec_poll(struct napi_struct *napi, int budget)
 	int err = 0;
 	unsigned int i = 0U;
 	unsigned int sw_tail_old = 0U;
-	bool was_tx_cleaned = false;
+	bool was_tx_cleaned = true;
 
 	if (!self) {
 		err = -EINVAL;
@@ -57,9 +57,8 @@ static int aq_vec_poll(struct napi_struct *napi, int budget)
 
 			if (ring[AQ_VEC_TX_ID].sw_head !=
 			    ring[AQ_VEC_TX_ID].hw_head) {
-				aq_ring_tx_clean(&ring[AQ_VEC_TX_ID]);
+				was_tx_cleaned = aq_ring_tx_clean(&ring[AQ_VEC_TX_ID]);
 				aq_ring_update_queue_state(&ring[AQ_VEC_TX_ID]);
-				was_tx_cleaned = true;
 			}
 
 			err = self->aq_hw_ops->hw_ring_rx_receive(self->aq_hw,
@@ -90,7 +89,7 @@ static int aq_vec_poll(struct napi_struct *napi, int budget)
 			}
 		}
 
-		if (was_tx_cleaned)
+		if (!was_tx_cleaned)
 			work_done = budget;
 
 		if (work_done < budget) {
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 3/7] net: aquantia: Change inefficient wait loop on fw data reads
From: Igor Russkikh @ 2018-03-16 10:53 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1521192913.git.igor.russkikh@aquantia.com>

B1 hardware changes behavior of mailbox interface, it has busy bit
always raised. Data ready condition should be detected by increment
of address register.

Old code has empty `for` loop, and that caused cpu overloads on B1
hardware. aq_nic_service_timer_cb consumed ~100ms because of that.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 .../aquantia/atlantic/hw_atl/hw_atl_utils.c        | 42 ++++++++++++++--------
 .../aquantia/atlantic/hw_atl/hw_atl_utils.h        |  1 +
 2 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index 9a53a4c..2accffc 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -21,6 +21,10 @@
 
 #define HW_ATL_UCP_0X370_REG    0x0370U
 
+#define HW_ATL_MIF_CMD          0x0200U
+#define HW_ATL_MIF_ADDR         0x0208U
+#define HW_ATL_MIF_VAL          0x020CU
+
 #define HW_ATL_FW_SM_RAM        0x2U
 #define HW_ATL_MPI_FW_VERSION	0x18
 #define HW_ATL_MPI_CONTROL_ADR  0x0368U
@@ -269,18 +273,22 @@ int hw_atl_utils_fw_downld_dwords(struct aq_hw_s *self, u32 a,
 		}
 	}
 
-	aq_hw_write_reg(self, 0x00000208U, a);
-
-	for (++cnt; --cnt;) {
-		u32 i = 0U;
+	aq_hw_write_reg(self, HW_ATL_MIF_ADDR, a);
 
-		aq_hw_write_reg(self, 0x00000200U, 0x00008000U);
+	for (++cnt; --cnt && !err;) {
+		aq_hw_write_reg(self, HW_ATL_MIF_CMD, 0x00008000U);
 
-		for (i = 1024U;
-			(0x100U & aq_hw_read_reg(self, 0x00000200U)) && --i;) {
-		}
+		if (IS_CHIP_FEATURE(REVISION_B1))
+			AQ_HW_WAIT_FOR(a != aq_hw_read_reg(self,
+							   HW_ATL_MIF_ADDR),
+				       1, 1000U);
+		else
+			AQ_HW_WAIT_FOR(!(0x100 & aq_hw_read_reg(self,
+							   HW_ATL_MIF_CMD)),
+				       1, 1000U);
 
-		*(p++) = aq_hw_read_reg(self, 0x0000020CU);
+		*(p++) = aq_hw_read_reg(self, HW_ATL_MIF_VAL);
+		a += 4;
 	}
 
 	hw_atl_reg_glb_cpu_sem_set(self, 1U, HW_ATL_FW_SM_RAM);
@@ -676,14 +684,18 @@ void hw_atl_utils_hw_chip_features_init(struct aq_hw_s *self, u32 *p)
 	u32 val = hw_atl_reg_glb_mif_id_get(self);
 	u32 mif_rev = val & 0xFFU;
 
-	if ((3U & mif_rev) == 1U) {
-		chip_features |=
-			HAL_ATLANTIC_UTILS_CHIP_REVISION_A0 |
+	if ((0xFU & mif_rev) == 1U) {
+		chip_features |= HAL_ATLANTIC_UTILS_CHIP_REVISION_A0 |
 			HAL_ATLANTIC_UTILS_CHIP_MPI_AQ |
 			HAL_ATLANTIC_UTILS_CHIP_MIPS;
-	} else if ((3U & mif_rev) == 2U) {
-		chip_features |=
-			HAL_ATLANTIC_UTILS_CHIP_REVISION_B0 |
+	} else if ((0xFU & mif_rev) == 2U) {
+		chip_features |= HAL_ATLANTIC_UTILS_CHIP_REVISION_B0 |
+			HAL_ATLANTIC_UTILS_CHIP_MPI_AQ |
+			HAL_ATLANTIC_UTILS_CHIP_MIPS |
+			HAL_ATLANTIC_UTILS_CHIP_TPO2 |
+			HAL_ATLANTIC_UTILS_CHIP_RPF2;
+	} else if ((0xFU & mif_rev) == 0xAU) {
+		chip_features |= HAL_ATLANTIC_UTILS_CHIP_REVISION_B1 |
 			HAL_ATLANTIC_UTILS_CHIP_MPI_AQ |
 			HAL_ATLANTIC_UTILS_CHIP_MIPS |
 			HAL_ATLANTIC_UTILS_CHIP_TPO2 |
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
index 2c69094..cd8f18f 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
@@ -161,6 +161,7 @@ struct __packed hw_aq_atl_utils_mbox {
 #define HAL_ATLANTIC_UTILS_CHIP_MPI_AQ       0x00000010U
 #define HAL_ATLANTIC_UTILS_CHIP_REVISION_A0  0x01000000U
 #define HAL_ATLANTIC_UTILS_CHIP_REVISION_B0  0x02000000U
+#define HAL_ATLANTIC_UTILS_CHIP_REVISION_B1  0x04000000U
 
 #define IS_CHIP_FEATURE(_F_) (HAL_ATLANTIC_UTILS_CHIP_##_F_ & \
 	self->chip_features)
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 2/7] net: aquantia: Fix a regression with reset on old firmware
From: Igor Russkikh @ 2018-03-16 10:53 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1521192913.git.igor.russkikh@aquantia.com>

FW 1.5.58 and below needs a fixed delay even after 0x18 register
is filled. Otherwise, setting MPI_INIT state too fast causes
traffic hang.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index 0da480b..9a53a4c 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -147,6 +147,8 @@ static int hw_atl_utils_soft_reset_flb(struct aq_hw_s *self)
 		aq_pr_err("FW kickstart failed\n");
 		return -EIO;
 	}
+	/* Old FW requires fixed delay after init */
+	AQ_HW_SLEEP(15);

 	return 0;
 }
@@ -214,6 +216,8 @@ static int hw_atl_utils_soft_reset_rbl(struct aq_hw_s *self)
 		aq_pr_err("FW kickstart failed\n");
 		return -EIO;
 	}
+	/* Old FW requires fixed delay after init */
+	AQ_HW_SLEEP(15);

 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 1/7] net: aquantia: Fix hardware reset when SPI may rarely hangup
From: Igor Russkikh @ 2018-03-16 10:53 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1521192913.git.igor.russkikh@aquantia.com>

Under some circumstances (notably using thunderbolt interface) SPI
on chip reset may be in active transaction.
Here we forcibly cleanup SPI to prevent possible hangups.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index 967f0fd..0da480b 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -80,15 +80,14 @@ int hw_atl_utils_initfw(struct aq_hw_s *self, const struct aq_fw_ops **fw_ops)
 static int hw_atl_utils_soft_reset_flb(struct aq_hw_s *self)
 {
 	int k = 0;
-	u32 gsr;
+	u32 gsr, val;
 
 	aq_hw_write_reg(self, 0x404, 0x40e1);
 	AQ_HW_SLEEP(50);
 
 	/* Cleanup SPI */
-	aq_hw_write_reg(self, 0x534, 0xA0);
-	aq_hw_write_reg(self, 0x100, 0x9F);
-	aq_hw_write_reg(self, 0x100, 0x809F);
+	val = aq_hw_read_reg(self, 0x53C);
+	aq_hw_write_reg(self, 0x53C, val | 0x10);
 
 	gsr = aq_hw_read_reg(self, HW_ATL_GLB_SOFT_RES_ADR);
 	aq_hw_write_reg(self, HW_ATL_GLB_SOFT_RES_ADR, (gsr & 0xBFFF) | 0x8000);
@@ -97,7 +96,14 @@ static int hw_atl_utils_soft_reset_flb(struct aq_hw_s *self)
 	aq_hw_write_reg(self, 0x404, 0x80e0);
 	aq_hw_write_reg(self, 0x32a8, 0x0);
 	aq_hw_write_reg(self, 0x520, 0x1);
+
+	/* Reset SPI again because of possible interrupted SPI burst */
+	val = aq_hw_read_reg(self, 0x53C);
+	aq_hw_write_reg(self, 0x53C, val | 0x10);
 	AQ_HW_SLEEP(10);
+	/* Clear SPI reset state */
+	aq_hw_write_reg(self, 0x53C, val & ~0x10);
+
 	aq_hw_write_reg(self, 0x404, 0x180e0);
 
 	for (k = 0; k < 1000; k++) {
@@ -147,7 +153,7 @@ static int hw_atl_utils_soft_reset_flb(struct aq_hw_s *self)
 
 static int hw_atl_utils_soft_reset_rbl(struct aq_hw_s *self)
 {
-	u32 gsr, rbl_status;
+	u32 gsr, val, rbl_status;
 	int k;
 
 	aq_hw_write_reg(self, 0x404, 0x40e1);
@@ -157,6 +163,10 @@ static int hw_atl_utils_soft_reset_rbl(struct aq_hw_s *self)
 	/* Alter RBL status */
 	aq_hw_write_reg(self, 0x388, 0xDEAD);
 
+	/* Cleanup SPI */
+	val = aq_hw_read_reg(self, 0x53C);
+	aq_hw_write_reg(self, 0x53C, val | 0x10);
+
 	/* Global software reset*/
 	hw_atl_rx_rx_reg_res_dis_set(self, 0U);
 	hw_atl_tx_tx_reg_res_dis_set(self, 0U);
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 0/7] Aquantia atlantic hot fixes 03-2018
From: Igor Russkikh @ 2018-03-16 10:53 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh

This is a set of atlantic driver hot fixes for various areas:

Some rare issues with hardware reset covered,
Fixed napi_poll flood happening on some traffic conditions,
Allow system to change MAC address on live device,
Add pci shutdown handler.

Igor Russkikh (7):
  net: aquantia: Fix hardware reset when SPI may rarely hangup
  net: aquantia: Fix a regression with reset on old firmware
  net: aquantia: Change inefficient wait loop on fw data reads
  net: aquantia: Add aq_tx_clean_budget and valid budget handling logic
  net: aquantia: Allow live mac address changes
  net: aquantia: Implement pci shutdown callback
  net: aquantia: driver version bump

 drivers/net/ethernet/aquantia/atlantic/aq_nic.c    | 26 +++++++++
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h    |  3 +
 .../net/ethernet/aquantia/atlantic/aq_pci_func.c   | 15 +++++
 drivers/net/ethernet/aquantia/atlantic/aq_ring.c   |  7 ++-
 drivers/net/ethernet/aquantia/atlantic/aq_ring.h   |  2 +-
 drivers/net/ethernet/aquantia/atlantic/aq_vec.c    |  7 +--
 .../aquantia/atlantic/hw_atl/hw_atl_utils.c        | 66 +++++++++++++++-------
 .../aquantia/atlantic/hw_atl/hw_atl_utils.h        |  1 +
 drivers/net/ethernet/aquantia/atlantic/ver.h       |  2 +-
 9 files changed, 101 insertions(+), 28 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH net-next 10/10] arm64: dts: marvell: mcbin: enable the fourth network interface
From: Antoine Tenart @ 2018-03-16 10:33 UTC (permalink / raw)
  To: davem, kishon, linux, gregory.clement, andrew, jason,
	sebastian.hesselbarth
  Cc: Antoine Tenart, netdev, linux-kernel, thomas.petazzoni,
	maxime.chevallier, miquel.raynal, nadavh, stefanc, ymarkman, mw,
	linux-arm-kernel
In-Reply-To: <20180316103351.16616-1-antoine.tenart@bootlin.com>

This patch enables the fourth network interface on the Marvell
Macchiatobin. It is configured in the 2500Base-X PHY mode.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
---
 arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
index 626e9d0462c3..1ce31bfe6197 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
+++ b/arch/arm64/boot/dts/marvell/armada-8040-mcbin.dts
@@ -66,6 +66,7 @@
 		ethernet0 = &cp0_eth0;
 		ethernet1 = &cp1_eth0;
 		ethernet2 = &cp1_eth1;
+		ethernet3 = &cp1_eth2;
 	};
 
 	/* Regulator labels correspond with schematics */
@@ -285,6 +286,16 @@
 	phys = <&cp1_comphy0 1>;
 };
 
+&cp1_eth2 {
+	/* CPS Lane 5 */
+	status = "okay";
+	/* Network PHY */
+	phy-mode = "2500base-x";
+	managed = "in-band-status";
+	/* Generic PHY, providing serdes lanes */
+	phys = <&cp1_comphy5 2>;
+};
+
 &cp1_pinctrl {
 	cp1_spi1_pins: spi1-pins {
 		marvell,pins = "mpp12", "mpp13", "mpp14", "mpp15", "mpp16";
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 09/10] arm64: dts: marvell: 8040-db: set the 10G interfaces management to in-band
From: Antoine Tenart @ 2018-03-16 10:33 UTC (permalink / raw)
  To: davem, kishon, linux, gregory.clement, andrew, jason,
	sebastian.hesselbarth
  Cc: Antoine Tenart, netdev, linux-kernel, thomas.petazzoni,
	maxime.chevallier, miquel.raynal, nadavh, stefanc, ymarkman, mw,
	linux-arm-kernel
In-Reply-To: <20180316103351.16616-1-antoine.tenart@bootlin.com>

This patch sets the 10G interfaces (cp0_eth0, cp1_eth0) management to
in-band. This is needed for the PPv2 driver to handle such ports, with
its conversion to phylink.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
---
 arch/arm64/boot/dts/marvell/armada-8040-db.dts | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-8040-db.dts b/arch/arm64/boot/dts/marvell/armada-8040-db.dts
index dba55baff20f..f6224d323c7a 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040-db.dts
+++ b/arch/arm64/boot/dts/marvell/armada-8040-db.dts
@@ -216,6 +216,7 @@
 &cp0_eth0 {
 	status = "okay";
 	phy-mode = "10gbase-kr";
+	managed = "in-band-status";
 };
 
 &cp0_eth2 {
@@ -334,6 +335,7 @@
 &cp1_eth0 {
 	status = "okay";
 	phy-mode = "10gbase-kr";
+	managed = "in-band-status";
 };
 
 &cp1_eth1 {
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 08/10] arm64: dts: marvell: 7040-db: set the 10G interface management to in-band
From: Antoine Tenart @ 2018-03-16 10:33 UTC (permalink / raw)
  To: davem, kishon, linux, gregory.clement, andrew, jason,
	sebastian.hesselbarth
  Cc: Antoine Tenart, netdev, linux-kernel, thomas.petazzoni,
	maxime.chevallier, miquel.raynal, nadavh, stefanc, ymarkman, mw,
	linux-arm-kernel
In-Reply-To: <20180316103351.16616-1-antoine.tenart@bootlin.com>

This patch sets the 10G interface (cp0_eth0) management to in-band. This
is needed for the PPv2 driver to handle such ports, with its conversion
to phylink.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
---
 arch/arm64/boot/dts/marvell/armada-7040-db.dts | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/marvell/armada-7040-db.dts b/arch/arm64/boot/dts/marvell/armada-7040-db.dts
index 3ae05eee2c9a..f27af07db381 100644
--- a/arch/arm64/boot/dts/marvell/armada-7040-db.dts
+++ b/arch/arm64/boot/dts/marvell/armada-7040-db.dts
@@ -267,6 +267,7 @@
 	status = "okay";
 	/* Network PHY */
 	phy-mode = "10gbase-kr";
+	managed = "in-band-status";
 	/* Generic PHY, providing serdes lanes */
 	phys = <&cp0_comphy2 0>;
 };
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 07/10] net: mvpp2: 2500baseX support
From: Antoine Tenart @ 2018-03-16 10:33 UTC (permalink / raw)
  To: davem, kishon, linux, gregory.clement, andrew, jason,
	sebastian.hesselbarth
  Cc: Antoine Tenart, netdev, linux-kernel, thomas.petazzoni,
	maxime.chevallier, miquel.raynal, nadavh, stefanc, ymarkman, mw,
	linux-arm-kernel
In-Reply-To: <20180316103351.16616-1-antoine.tenart@bootlin.com>

This patch adds the 2500Base-X PHY mode support in the Marvell PPv2
driver. 2500Base-X is quite close to 1000Base-X and SGMII modes and uses
nearly the same code path.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 51 +++++++++++++++++++++++++++---------
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index f6c35b688af4..c4de3422d688 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -4897,6 +4897,7 @@ static int mvpp22_gop_init(struct mvpp2_port *port)
 		break;
 	case PHY_INTERFACE_MODE_SGMII:
 	case PHY_INTERFACE_MODE_1000BASEX:
+	case PHY_INTERFACE_MODE_2500BASEX:
 		mvpp22_gop_init_sgmii(port);
 		break;
 	case PHY_INTERFACE_MODE_10GKR:
@@ -4935,7 +4936,8 @@ static void mvpp22_gop_unmask_irq(struct mvpp2_port *port)
 
 	if (phy_interface_mode_is_rgmii(port->phy_interface) ||
 	    port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
-	    port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
+	    port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+	    port->phy_interface == PHY_INTERFACE_MODE_2500BASEX) {
 		/* Enable the GMAC link status irq for this port */
 		val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
 		val |= MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
@@ -4966,7 +4968,8 @@ static void mvpp22_gop_mask_irq(struct mvpp2_port *port)
 
 	if (phy_interface_mode_is_rgmii(port->phy_interface) ||
 	    port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
-	    port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
+	    port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+	    port->phy_interface == PHY_INTERFACE_MODE_2500BASEX) {
 		val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
 		val &= ~MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
 		writel(val, port->base + MVPP22_GMAC_INT_SUM_MASK);
@@ -4979,7 +4982,8 @@ static void mvpp22_gop_setup_irq(struct mvpp2_port *port)
 
 	if (phy_interface_mode_is_rgmii(port->phy_interface) ||
 	    port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
-	    port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
+	    port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+	    port->phy_interface == PHY_INTERFACE_MODE_2500BASEX) {
 		val = readl(port->base + MVPP22_GMAC_INT_MASK);
 		val |= MVPP22_GMAC_INT_MASK_LINK_STAT;
 		writel(val, port->base + MVPP22_GMAC_INT_MASK);
@@ -4994,6 +4998,16 @@ static void mvpp22_gop_setup_irq(struct mvpp2_port *port)
 	mvpp22_gop_unmask_irq(port);
 }
 
+/* Sets the PHY mode of the COMPHY (which configures the serdes lanes).
+ *
+ * The PHY mode used by the PPv2 driver comes from the network subsystem, while
+ * the one given to the COMPHY comes from the generic PHY subsystem. Hence they
+ * differ.
+ *
+ * The COMPHY configures the serdes lanes regardless of the actual use of the
+ * lanes by the physical layer. This is why configurations like
+ * "PPv2 (2500BaseX) - COMPHY (2500SGMII)" are valid.
+ */
 static int mvpp22_comphy_init(struct mvpp2_port *port)
 {
 	enum phy_mode mode;
@@ -5007,6 +5021,9 @@ static int mvpp22_comphy_init(struct mvpp2_port *port)
 	case PHY_INTERFACE_MODE_1000BASEX:
 		mode = PHY_MODE_SGMII;
 		break;
+	case PHY_INTERFACE_MODE_2500BASEX:
+		mode = PHY_MODE_2500SGMII;
+		break;
 	case PHY_INTERFACE_MODE_10GKR:
 		mode = PHY_MODE_10GKR;
 		break;
@@ -5085,7 +5102,8 @@ static void mvpp2_port_loopback_set(struct mvpp2_port *port,
 		val &= ~MVPP2_GMAC_GMII_LB_EN_MASK;
 
 	if (port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
-	    port->phy_interface == PHY_INTERFACE_MODE_1000BASEX)
+	    port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+	    port->phy_interface == PHY_INTERFACE_MODE_2500BASEX)
 		val |= MVPP2_GMAC_PCS_LB_EN_MASK;
 	else
 		val &= ~MVPP2_GMAC_PCS_LB_EN_MASK;
@@ -6295,7 +6313,8 @@ static irqreturn_t mvpp2_link_status_isr(int irq, void *dev_id)
 		}
 	} else if (phy_interface_mode_is_rgmii(port->phy_interface) ||
 		   port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
-		   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
+		   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX ||
+		   port->phy_interface == PHY_INTERFACE_MODE_2500BASEX) {
 		val = readl(port->base + MVPP22_GMAC_INT_STAT);
 		if (val & MVPP22_GMAC_INT_STAT_LINK) {
 			event = true;
@@ -8079,8 +8098,10 @@ static void mvpp2_phylink_validate(struct net_device *dev,
 		phylink_set(mask, 10000baseT_Full);
 		/* Fall-through */
 	case PHY_INTERFACE_MODE_1000BASEX:
+	case PHY_INTERFACE_MODE_2500BASEX:
 		phylink_set(mask, 1000baseT_Full);
 		phylink_set(mask, 1000baseX_Full);
+		phylink_set(mask, 2500baseX_Full);
 	}
 
 	bitmap_and(supported, supported, mask, __ETHTOOL_LINK_MODE_MASK_NBITS);
@@ -8123,6 +8144,9 @@ static void mvpp2_gmac_link_state(struct mvpp2_port *port,
 	case PHY_INTERFACE_MODE_1000BASEX:
 		state->speed = SPEED_1000;
 		break;
+	case PHY_INTERFACE_MODE_2500BASEX:
+		state->speed = SPEED_2500;
+		break;
 	default:
 		if (val & MVPP2_GMAC_STATUS0_GMII_SPEED)
 			state->speed = SPEED_1000;
@@ -8221,11 +8245,12 @@ static void mvpp2_gmac_config(struct mvpp2_port *port, unsigned int mode,
 	ctrl0 &= ~MVPP2_GMAC_PORT_TYPE_MASK;
 	ctrl2 &= ~(MVPP2_GMAC_PORT_RESET_MASK | MVPP2_GMAC_PCS_ENABLE_MASK);
 
-	if (state->interface == PHY_INTERFACE_MODE_1000BASEX) {
-		/* 1000BaseX port cannot negotiate speed nor can it negotiate
-		 * duplex: they are always operating with a fixed speed of
-		 * 1000Mbps in full duplex, so force 1000 speed and full duplex
-		 * here.
+	if (state->interface == PHY_INTERFACE_MODE_1000BASEX ||
+	    state->interface == PHY_INTERFACE_MODE_2500BASEX) {
+		/* 1000BaseX and 2500BaseX ports cannot negotiate speed nor can
+		 * they negotiate duplex: they are always operating with a fixed
+		 * speed of 1000/2500Mbps in full duplex, so force 1000/2500
+		 * speed and full duplex here.
 		 */
 		ctrl0 |= MVPP2_GMAC_PORT_TYPE_MASK;
 		an |= MVPP2_GMAC_CONFIG_GMII_SPEED |
@@ -8242,7 +8267,8 @@ static void mvpp2_gmac_config(struct mvpp2_port *port, unsigned int mode,
 		an |= MVPP2_GMAC_FC_ADV_ASM_EN;
 
 	if (state->interface == PHY_INTERFACE_MODE_SGMII ||
-	    state->interface == PHY_INTERFACE_MODE_1000BASEX) {
+	    state->interface == PHY_INTERFACE_MODE_1000BASEX ||
+	    state->interface == PHY_INTERFACE_MODE_2500BASEX) {
 		an |= MVPP2_GMAC_IN_BAND_AUTONEG;
 		ctrl2 |= MVPP2_GMAC_INBAND_AN_MASK | MVPP2_GMAC_PCS_ENABLE_MASK;
 
@@ -8305,7 +8331,8 @@ static void mvpp2_mac_config(struct net_device *dev, unsigned int mode,
 		mvpp2_xlg_config(port, mode, state);
 	else if (phy_interface_mode_is_rgmii(state->interface) ||
 		 state->interface == PHY_INTERFACE_MODE_SGMII ||
-		 state->interface == PHY_INTERFACE_MODE_1000BASEX)
+		 state->interface == PHY_INTERFACE_MODE_1000BASEX ||
+		 state->interface == PHY_INTERFACE_MODE_2500BASEX)
 		mvpp2_gmac_config(port, mode, state);
 
 	if (port->priv->hw_version == MVPP21 && port->flags & MVPP2_F_LOOPBACK)
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 06/10] net: mvpp2: 1000baseX support
From: Antoine Tenart @ 2018-03-16 10:33 UTC (permalink / raw)
  To: davem, kishon, linux, gregory.clement, andrew, jason,
	sebastian.hesselbarth
  Cc: Antoine Tenart, netdev, linux-kernel, thomas.petazzoni,
	maxime.chevallier, miquel.raynal, nadavh, stefanc, ymarkman, mw,
	linux-arm-kernel
In-Reply-To: <20180316103351.16616-1-antoine.tenart@bootlin.com>

This patch adds the 1000Base-X PHY mode support in the Marvell PPv2
driver. 1000Base-X is quite close the SGMII and uses nearly the same
code path.

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
---
 drivers/net/ethernet/marvell/mvpp2.c | 74 +++++++++++++++++++++++++-----------
 1 file changed, 52 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 8e8e7afcd437..f6c35b688af4 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -4896,6 +4896,7 @@ static int mvpp22_gop_init(struct mvpp2_port *port)
 		mvpp22_gop_init_rgmii(port);
 		break;
 	case PHY_INTERFACE_MODE_SGMII:
+	case PHY_INTERFACE_MODE_1000BASEX:
 		mvpp22_gop_init_sgmii(port);
 		break;
 	case PHY_INTERFACE_MODE_10GKR:
@@ -4933,7 +4934,8 @@ static void mvpp22_gop_unmask_irq(struct mvpp2_port *port)
 	u32 val;
 
 	if (phy_interface_mode_is_rgmii(port->phy_interface) ||
-	    port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+	    port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+	    port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
 		/* Enable the GMAC link status irq for this port */
 		val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
 		val |= MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
@@ -4963,7 +4965,8 @@ static void mvpp22_gop_mask_irq(struct mvpp2_port *port)
 	}
 
 	if (phy_interface_mode_is_rgmii(port->phy_interface) ||
-	    port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+	    port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+	    port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
 		val = readl(port->base + MVPP22_GMAC_INT_SUM_MASK);
 		val &= ~MVPP22_GMAC_INT_SUM_MASK_LINK_STAT;
 		writel(val, port->base + MVPP22_GMAC_INT_SUM_MASK);
@@ -4975,7 +4978,8 @@ static void mvpp22_gop_setup_irq(struct mvpp2_port *port)
 	u32 val;
 
 	if (phy_interface_mode_is_rgmii(port->phy_interface) ||
-	    port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+	    port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+	    port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
 		val = readl(port->base + MVPP22_GMAC_INT_MASK);
 		val |= MVPP22_GMAC_INT_MASK_LINK_STAT;
 		writel(val, port->base + MVPP22_GMAC_INT_MASK);
@@ -5000,6 +5004,7 @@ static int mvpp22_comphy_init(struct mvpp2_port *port)
 
 	switch (port->phy_interface) {
 	case PHY_INTERFACE_MODE_SGMII:
+	case PHY_INTERFACE_MODE_1000BASEX:
 		mode = PHY_MODE_SGMII;
 		break;
 	case PHY_INTERFACE_MODE_10GKR:
@@ -5079,7 +5084,8 @@ static void mvpp2_port_loopback_set(struct mvpp2_port *port,
 	else
 		val &= ~MVPP2_GMAC_GMII_LB_EN_MASK;
 
-	if (port->phy_interface == PHY_INTERFACE_MODE_SGMII)
+	if (port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+	    port->phy_interface == PHY_INTERFACE_MODE_1000BASEX)
 		val |= MVPP2_GMAC_PCS_LB_EN_MASK;
 	else
 		val &= ~MVPP2_GMAC_PCS_LB_EN_MASK;
@@ -6288,7 +6294,8 @@ static irqreturn_t mvpp2_link_status_isr(int irq, void *dev_id)
 				link = true;
 		}
 	} else if (phy_interface_mode_is_rgmii(port->phy_interface) ||
-		   port->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+		   port->phy_interface == PHY_INTERFACE_MODE_SGMII ||
+		   port->phy_interface == PHY_INTERFACE_MODE_1000BASEX) {
 		val = readl(port->base + MVPP22_GMAC_INT_STAT);
 		if (val & MVPP22_GMAC_INT_STAT_LINK) {
 			event = true;
@@ -8055,21 +8062,25 @@ static void mvpp2_phylink_validate(struct net_device *dev,
 	phylink_set(mask, Pause);
 	phylink_set(mask, Asym_Pause);
 
-	phylink_set(mask, 10baseT_Half);
-	phylink_set(mask, 10baseT_Full);
-	phylink_set(mask, 100baseT_Half);
-	phylink_set(mask, 100baseT_Full);
-	phylink_set(mask, 1000baseT_Full);
-	phylink_set(mask, 1000baseX_Full);
-	phylink_set(mask, 10000baseT_Full);
-
-	if (state->interface == PHY_INTERFACE_MODE_10GKR) {
+	switch (state->interface) {
+	case PHY_INTERFACE_MODE_10GKR:
 		phylink_set(mask, 10000baseCR_Full);
 		phylink_set(mask, 10000baseSR_Full);
 		phylink_set(mask, 10000baseLR_Full);
 		phylink_set(mask, 10000baseLRM_Full);
 		phylink_set(mask, 10000baseER_Full);
 		phylink_set(mask, 10000baseKR_Full);
+		/* Fall-through */
+	default:
+		phylink_set(mask, 10baseT_Half);
+		phylink_set(mask, 10baseT_Full);
+		phylink_set(mask, 100baseT_Half);
+		phylink_set(mask, 100baseT_Full);
+		phylink_set(mask, 10000baseT_Full);
+		/* Fall-through */
+	case PHY_INTERFACE_MODE_1000BASEX:
+		phylink_set(mask, 1000baseT_Full);
+		phylink_set(mask, 1000baseX_Full);
 	}
 
 	bitmap_and(supported, supported, mask, __ETHTOOL_LINK_MODE_MASK_NBITS);
@@ -8108,12 +8119,18 @@ static void mvpp2_gmac_link_state(struct mvpp2_port *port,
 	state->link = !!(val & MVPP2_GMAC_STATUS0_LINK_UP);
 	state->duplex = !!(val & MVPP2_GMAC_STATUS0_FULL_DUPLEX);
 
-	if (val & MVPP2_GMAC_STATUS0_GMII_SPEED)
+	switch (port->phy_interface) {
+	case PHY_INTERFACE_MODE_1000BASEX:
 		state->speed = SPEED_1000;
-	else if (val & MVPP2_GMAC_STATUS0_MII_SPEED)
-		state->speed = SPEED_100;
-	else
-		state->speed = SPEED_10;
+		break;
+	default:
+		if (val & MVPP2_GMAC_STATUS0_GMII_SPEED)
+			state->speed = SPEED_1000;
+		else if (val & MVPP2_GMAC_STATUS0_MII_SPEED)
+			state->speed = SPEED_100;
+		else
+			state->speed = SPEED_10;
+	}
 
 	state->pause = 0;
 	if (val & MVPP2_GMAC_STATUS0_RX_PAUSE)
@@ -8204,7 +8221,18 @@ static void mvpp2_gmac_config(struct mvpp2_port *port, unsigned int mode,
 	ctrl0 &= ~MVPP2_GMAC_PORT_TYPE_MASK;
 	ctrl2 &= ~(MVPP2_GMAC_PORT_RESET_MASK | MVPP2_GMAC_PCS_ENABLE_MASK);
 
-	an |= MVPP2_GMAC_AN_SPEED_EN | MVPP2_GMAC_FLOW_CTRL_AUTONEG;
+	if (state->interface == PHY_INTERFACE_MODE_1000BASEX) {
+		/* 1000BaseX port cannot negotiate speed nor can it negotiate
+		 * duplex: they are always operating with a fixed speed of
+		 * 1000Mbps in full duplex, so force 1000 speed and full duplex
+		 * here.
+		 */
+		ctrl0 |= MVPP2_GMAC_PORT_TYPE_MASK;
+		an |= MVPP2_GMAC_CONFIG_GMII_SPEED |
+		      MVPP2_GMAC_CONFIG_FULL_DUPLEX;
+	} else {
+		an |= MVPP2_GMAC_AN_SPEED_EN | MVPP2_GMAC_FLOW_CTRL_AUTONEG;
+	}
 
 	if (state->duplex)
 		an |= MVPP2_GMAC_CONFIG_FULL_DUPLEX;
@@ -8213,7 +8241,8 @@ static void mvpp2_gmac_config(struct mvpp2_port *port, unsigned int mode,
 	if (phylink_test(state->advertising, Asym_Pause))
 		an |= MVPP2_GMAC_FC_ADV_ASM_EN;
 
-	if (state->interface == PHY_INTERFACE_MODE_SGMII) {
+	if (state->interface == PHY_INTERFACE_MODE_SGMII ||
+	    state->interface == PHY_INTERFACE_MODE_1000BASEX) {
 		an |= MVPP2_GMAC_IN_BAND_AUTONEG;
 		ctrl2 |= MVPP2_GMAC_INBAND_AN_MASK | MVPP2_GMAC_PCS_ENABLE_MASK;
 
@@ -8275,7 +8304,8 @@ static void mvpp2_mac_config(struct net_device *dev, unsigned int mode,
 	if (state->interface == PHY_INTERFACE_MODE_10GKR)
 		mvpp2_xlg_config(port, mode, state);
 	else if (phy_interface_mode_is_rgmii(state->interface) ||
-		 state->interface == PHY_INTERFACE_MODE_SGMII)
+		 state->interface == PHY_INTERFACE_MODE_SGMII ||
+		 state->interface == PHY_INTERFACE_MODE_1000BASEX)
 		mvpp2_gmac_config(port, mode, state);
 
 	if (port->priv->hw_version == MVPP21 && port->flags & MVPP2_F_LOOPBACK)
-- 
2.14.3

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox