Netdev List

Netdev List
 help / color / mirror / Atom feed

* [iproute PATCH 0/2] Minor ss filter fix and review
From: Phil Sutter @ 2016-04-13 20:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Vadim Kochan, netdev
In-Reply-To: <20160329193242.GA28502@orbyte.nwl.cc>

While looking for a solution to the problem described in patch 2/2, I
discovered the overly complicated assignment in filter_states_set() which
is simplified in patch 1/2.

Phil Sutter (2):
  ss: Drop silly assignment
  ss: Fix accidental state filter override

 misc/ss.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-- 
2.8.0

^ permalink raw reply

* [iproute PATCH 1/2] ss: Drop silly assignment
From: Phil Sutter @ 2016-04-13 20:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Vadim Kochan, netdev
In-Reply-To: <1460578025-12224-1-git-send-email-phil@nwl.cc>

An expression of the form '(a | b) & b' will evaluate to the value of b
for any value of a or b.

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 misc/ss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index 38cf3312a746e..d6090018c5dbb 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -267,7 +267,7 @@ static void filter_default_dbs(struct filter *f)
 static void filter_states_set(struct filter *f, int states)
 {
 	if (states)
-		f->states = (f->states | states) & states;
+		f->states = states;
 }
 
 static void filter_merge_defaults(struct filter *f)
-- 
2.8.0

^ permalink raw reply related

* [iproute PATCH 2/2] ss: Fix accidental state filter override
From: Phil Sutter @ 2016-04-13 20:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Vadim Kochan, netdev
In-Reply-To: <1460578025-12224-1-git-send-email-phil@nwl.cc>

Passing a filter expression and selecting an address family using the
'-f' flag would overwrite the state filter by accident. Therefore
calling e.g. 'ss -nl -f inet '(sport = :22)' would not only print
listening sockets (as requested by '-l' flag) but connected ones, as
well.

Fix this by reusing the formerly ineffective call to filter_states_set()
to restore the state filter as it was before the call to
filter_af_set().

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 misc/ss.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index d6090018c5dbb..544def3f08ea8 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1556,9 +1556,10 @@ void *parse_hostcond(char *addr, bool is_port)

 out:
 	if (fam != AF_UNSPEC) {
+		int states = f->states;
 		f->families = 0;
 		filter_af_set(f, fam);
-		filter_states_set(f, 0);
+		filter_states_set(f, states);
 	}

 	res = malloc(sizeof(*res));
-- 
2.8.0

^ permalink raw reply related

* Re: [PATCH 1/2] [v4] net: emac: emac gigabit ethernet controller driver
From: Bjørn Mork @ 2016-04-13 20:07 UTC (permalink / raw)
  To: Timur Tabi
  Cc: Shanker Donthineni, kbuild test robot, kbuild-all, netdev,
	linux-kernel, devicetree, linux-arm-msm, sdharia,
	Greg Kroah-Hartman, vikrams, cov, gavidov, Rob Herring, andrew,
	bjorn.andersson, Mark Langsdorf, Jon Masters, Andy Gross,
	David S. Miller
In-Reply-To: <570EA434.30007@codeaurora.org>

Timur Tabi <timur@codeaurora.org> writes:
> Shanker Donthineni wrote:
>>>> >>     drivers/net/ethernet/qualcomm/emac/emac-mac.c: In function 'emac_mac_up':
>>>>>>>> >>>> >>drivers/net/ethernet/qualcomm/emac/emac-mac.c:1076:9: warning: large integer implicitly truncated to unsigned type [-Woverflow]
>>>> >>       writel(~DIS_INT, adpt->base + EMAC_INT_STATUS);
>>> >
>>> >This doesn't happen on arm64, and I don't know how to fix it.  DIS_INT is defined as:
>>> >
>>> >     #define DIS_INT          BIT(31)
>>> >
>> Try with (1U<<31).
>>
>
> Except that Gilad was previously asked to use the BIT() macros:
>
> 	https://lkml.org/lkml/2015/12/15/797

So typecast it.

  writel((u32)~DIS_INT, adpt->base + EMAC_INT_STATUS);


I believe the reason you don't see this on arm64 is that the writel
macro includes the typecast there.  But it doesn't on x86_64


Bjørn

^ permalink raw reply

* RE: TCP reaching to maximum throughput after a long time
From: Machani, Yaniv @ 2016-04-13 20:26 UTC (permalink / raw)
  To: Neal Cardwell, Eric Dumazet
  Cc: Yuchung Cheng, Ben Greear, netdev, David S. Miller, Eric Dumazet,
	Nandita Dukkipati, open list, Kama, Meirav
In-Reply-To: <CADVnQy=1eZbWxLRJ3t8grazBJzQrF6LjudiX3HF3sG=sNmGq5Q@mail.gmail.com>

On Wed, Apr 13, 2016 at 17:32:29, Neal Cardwell wrote:
> Miller; Eric Dumazet; Nandita Dukkipati; open list; Kama, Meirav
> Subject: Re: TCP reaching to maximum throughput after a long time
> 
> I like the idea to disable hystart ack-train.
> 
> 
> Yaniv, can you please re-run your test with CUBIC in three different
> scenarios:
> 
> a) echo 0 > /sys/module/tcp_cubic/parameters/hystart_detect
This fixes the issues, got to max throughput immediately. 

> 
> b) echo 1 > /sys/module/tcp_cubic/parameters/hystart_detect
> 
This shows the same results as before, starting low and increasing slowly.

>
> c) echo 2 > /sys/module/tcp_cubic/parameters/hystart_detect
This gets us a bit higher at the beginning, but never (waited ~60 sec) goes to the max.


Appreciate your help on this.
Yaniv
> 
> 
> This should help us isolate whether the hystart ack-train algorithm is 
> causing problems in this particular case.
> 
> Thanks!
> neal
> 
> 
> 
> On Tue, Apr 12, 2016 at 11:32 PM, Eric Dumazet 
> <eric.dumazet@gmail.com>
> wrote:
> 
> 
> 	On Tue, 2016-04-12 at 20:08 -0700, Yuchung Cheng wrote:
> 
> 	> based on the prev thread I propose we disable hystart ack-train. It 
> is
> 	> brittle under various circumstances. We've disabled that at Google 
> for
> 	> years.
> 
> 	Right, but because we also use sch_fq packet scheduler and pacing ;)
> 
> 
> 
> 




^ permalink raw reply

* Re: [PATCH iproute2] ss: 64bit inode numbers
From: Stephen Hemminger @ 2016-04-13 20:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1460499749.6473.601.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, 12 Apr 2016 15:22:29 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> Lets prepare for a possibility to have 64bit inode numbers for sockets,
> even if the kernel currently enforces 32bit numbers.
> 
> Presumably, if both kernel and userland are 64bit (no 32bit emulation),
> kernel could switch to 64bit inode numbers soon.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
 
Why not use ino_t rather than __u64 which is really intended for kernel/user abi.
Then make code generic on sizeof inode type.

^ permalink raw reply

* Re: [PATCH iproute2] ss: 64bit inode numbers
From: Eric Dumazet @ 2016-04-13 21:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20160413135511.4e683440@xeon-e3>

On Wed, 2016-04-13 at 13:55 -0700, Stephen Hemminger wrote:
> On Tue, 12 Apr 2016 15:22:29 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Lets prepare for a possibility to have 64bit inode numbers for sockets,
> > even if the kernel currently enforces 32bit numbers.
> > 
> > Presumably, if both kernel and userland are 64bit (no 32bit emulation),
> > kernel could switch to 64bit inode numbers soon.
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>  
> Why not use ino_t rather than __u64 which is really intended for kernel/user abi.
> Then make code generic on sizeof inode type

Well, maybe because code dealing with ino_t is very often buggy ;)

For example iproute2 makes the assumption an ino_t can be printed using
%lu format. This does not work with a 32bit binary.


static void bpf_info_loop(int *fds, struct bpf_map_aux *aux)
{
        int i, tfd[BPF_MAP_ID_MAX];

        printf("ver: %d\nobj: %s\ndev: %lu\nino: %lu\nmaps: %u\n",
               aux->uds_ver, aux->obj_name, aux->obj_st.st_dev,
               aux->obj_st.st_ino, aux->num_ent);

...

No strong preference from my side.

^ permalink raw reply

* Re: [PATCH net-next 1/1] tipc: remove remnants of old broadcast code
From: David Miller @ 2016-04-13 21:50 UTC (permalink / raw)
  To: jon.maloy
  Cc: netdev, paul.gortmaker, parthasarathy.bhuvaragan, richard.alpe,
	ying.xue, maloy, tipc-discussion
In-Reply-To: <1460562347-26436-1-git-send-email-jon.maloy@ericsson.com>

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Wed, 13 Apr 2016 11:45:47 -0400

> We remove a couple of leftover fields in struct tipc_bearer. Those
> were used by the old broadcast implementation, and are not needed
> any longer. There is no functional changes in this commit.
> 
> Acked-by: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH for-next V1 1/2] net/mlx5: Fix mlx5 ifc cmd_hca_cap bad offsets
From: Saeed Mahameed @ 2016-04-13 21:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Saeed Mahameed, David S. Miller, Doug Ledford, Linux Netdev List,
	linux-rdma, Linus Torvalds, Or Gerlitz, Matan Barak,
	Leon Romanovsky, Tal Alon, Tariq Toukan
In-Reply-To: <20160413174820.GC28411@obsidianresearch.com>

On Wed, Apr 13, 2016 at 8:48 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Wed, Apr 13, 2016 at 07:11:03PM +0300, Saeed Mahameed wrote:
>
>> Fixes: b0844444590e ("net/mlx5_core: Introduce access function to read internal timer ")
>> Fixes: b4ff3a36d3e4 ("net/mlx5: Use offset based reserved field names in the IFC header file")
>
> Are you sure those are right? b doesn't have the
> reserved_at names.
>

Yes, as stated in the commit message, change b0844444590e replaced two
reserved fields with 3 fields with a different size than the original
two, which broke all the offsets of the fields that came after.

>
> You know, the reserved_XXX just need to have unique unchanging
> numbers, it doesn't matter what the numbers are - but you have to stop
> changing them :( That is the key to avoiding conflicts when
> backporting/merging/etc.
>
> I guess the big rename has already landed, but simply stopping the
> practice of renumbing the reserved fields would have been enough.
>

This was a mistake which had to be fixed, the offset numbering was
wrong from day one of introducing those offsets, I prefer to do it
now, once and forever.

But i totally agree, we will do our best to keep this file clean as
much as possible.
Thanks for your concern.

Saeed.

^ permalink raw reply

* Re: pull-request: wireless-drivers 2016-04-13
From: David Miller @ 2016-04-13 21:56 UTC (permalink / raw)
  To: kvalo-sgV2jX0FEOL9JmXXK+q4OQ
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <87d1pts6zq.fsf-HodKDYzPHsUD5k0oWYwrnHL1okKdlPRT@public.gmane.org>

From: Kalle Valo <kvalo-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
Date: Wed, 13 Apr 2016 21:23:21 +0300

> few very small fixes for 4.6. All but one are either build fixes or
> memory leaks. More info in the signed tag below.
> 
> Please let me know if there are any problems.

Pulled, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: pull-request: mac80211-next 2016-04-13
From: David Miller @ 2016-04-13 22:01 UTC (permalink / raw)
  To: johannes; +Cc: netdev, linux-wireless
In-Reply-To: <1460576192-24712-1-git-send-email-johannes@sipsolutions.net>

From: Johannes Berg <johannes@sipsolutions.net>
Date: Wed, 13 Apr 2016 21:36:31 +0200

> For a while now I've wanted to remove enum ieee80211_band, and in
> order to synchronize more easily with Kalle that's (apart from a
> documentation change I already had pending) all in this pull.
> 
> Let me know if there's any problem.

Pulled, thanks Johannes.

^ permalink raw reply

* [PATCH] sctp: simplify sk_receive_queue locking
From: Marcelo Ricardo Leitner @ 2016-04-13 22:12 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Vlad Yasevich, Neil Horman

SCTP already serializes access to rcvbuf through its sock lock:
sctp_recvmsg takes it right in the start and release at the end, while
rx path will also take the lock before doing any socket processing. On
sctp_rcv() it will check if there is an user using the socket and, if
there is, it will queue incoming packets to the backlog. The backlog
processing will do the same. Even timers will do such check and
re-schedule if an user is using the socket.

Simplifying this will allow us to remove sctp_skb_list_tail and get ride
of some expensive lockings.  The lists that it is used on are also
mangled with functions like __skb_queue_tail and __skb_unlink in the
same context, like on sctp_ulpq_tail_event() and sctp_clear_pd().
sctp_close() will also purge those while using only the sock lock.

Therefore the lockings performed by sctp_skb_list_tail() are not
necessary. This patch removes this function and replaces its calls with
just skb_queue_splice_tail_init() instead.

The biggest gain is at sctp_ulpq_tail_event(), because the events always
contain a list, even if it's queueing a single skb and this was
triggering expensive calls to spin_lock_irqsave/_irqrestore for every
data chunk received.

As SCTP will deliver each data chunk on a corresponding recvmsg, the
more effective the change will be.
Before this patch, with chunks with 30 bytes:
netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000
400000 -s 400000 400000
on a 10Gbit link with 1500 MTU:

SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

425984 425984     30    60.00       137.45   7.34     7.36     52.504  52.608

With it:

SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

425984 425984     30    60.00       179.10   7.97     6.70     43.740  36.788

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---

Tested doing single and multi-threaded recvmsg()s on a socket flooded
with chunks, varying sizes across tests, all good.
netperf as above also good.
sctp_test too.

 include/net/sctp/sctp.h | 15 ---------------
 net/sctp/socket.c       |  4 +---
 net/sctp/ulpqueue.c     |  5 +++--
 3 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 03fb33efcae21d54192204629ff4ced2e36e7d4d..978d5f67d5a700aaa6d15d31c7eff62b430b589a 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -359,21 +359,6 @@ int sctp_do_peeloff(struct sock *sk, sctp_assoc_t id, struct socket **sockp);
 #define sctp_skb_for_each(pos, head, tmp) \
 	skb_queue_walk_safe(head, pos, tmp)

-/* A helper to append an entire skb list (list) to another (head). */
-static inline void sctp_skb_list_tail(struct sk_buff_head *list,
-				      struct sk_buff_head *head)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&head->lock, flags);
-	spin_lock(&list->lock);
-
-	skb_queue_splice_tail_init(list, head);
-
-	spin_unlock(&list->lock);
-	spin_unlock_irqrestore(&head->lock, flags);
-}
-
 /**
  *	sctp_list_dequeue - remove from the head of the queue
  *	@list: list to dequeue from
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 36697f85ce48be39f41ddedd9f53369c7f9e28d8..bf265a4bba6e85a31cb9779511c2af9eac077710 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6766,13 +6766,11 @@ struct sk_buff *sctp_skb_recv_datagram(struct sock *sk, int flags,
 		 *  However, this function was correct in any case. 8)
 		 */
 		if (flags & MSG_PEEK) {
-			spin_lock_bh(&sk->sk_receive_queue.lock);
 			skb = skb_peek(&sk->sk_receive_queue);
 			if (skb)
 				atomic_inc(&skb->users);
-			spin_unlock_bh(&sk->sk_receive_queue.lock);
 		} else {
-			skb = skb_dequeue(&sk->sk_receive_queue);
+			skb = __skb_dequeue(&sk->sk_receive_queue);
 		}

 		if (skb)
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index 72e5b3e41cddf9d79371de8ab01484e4601b97b6..ec12a8920e5fd7a0f26d19f1695bc2feeae41518 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -141,7 +141,8 @@ int sctp_clear_pd(struct sock *sk, struct sctp_association *asoc)
 		 */
 		if (!skb_queue_empty(&sp->pd_lobby)) {
 			struct list_head *list;
-			sctp_skb_list_tail(&sp->pd_lobby, &sk->sk_receive_queue);
+			skb_queue_splice_tail_init(&sp->pd_lobby,
+						   &sk->sk_receive_queue);
 			list = (struct list_head *)&sctp_sk(sk)->pd_lobby;
 			INIT_LIST_HEAD(list);
 			return 1;
@@ -252,7 +253,7 @@ int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
 	 * collected on a list.
 	 */
 	if (skb_list)
-		sctp_skb_list_tail(skb_list, queue);
+		skb_queue_splice_tail_init(skb_list, queue);
 	else
 		__skb_queue_tail(queue, skb);

-- 
2.5.0

^ permalink raw reply related

* Re: [PATCH v2 net-next 0/7] DSA refactoring: set 1
From: David Miller @ 2016-04-13 22:15 UTC (permalink / raw)
  To: andrew; +Cc: f.fainelli, netdev, vivien.didelot
In-Reply-To: <1460508045-20254-1-git-send-email-andrew@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Wed, 13 Apr 2016 02:40:38 +0200

> There has been a long running effort to refractor DSA probing to make
> the switches true linux devices. Here are a small collection of
> patches moving in this direction. Most have been seen before.
> 
> We take a little step forward by passing the dsa device point to the
> driver, thus allowing it to perform resource allocations using the
> normal mechanisms. This device structure will later be replaced by the
> devices own device structure.
> 
> Future patches will add a true driver probe function, so we rename the
> current probe function, cleaning up the namespace.
> 
> phys_port_mask continually confuses me, thinking it is about PHYs. But
> it is actually about ports enabled to the outside world. So rename it to
> enabled_port_mask.
> 
> Lots more patches yet to follow, this is just doing some ground work.
> 
> v2:
>   enabled_port_mask instread of user_port_masks
>   Added Tested-by's and Reviewed-by.

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH 1/2] [v4] net: emac: emac gigabit ethernet controller driver
From: Florian Fainelli @ 2016-04-13 22:16 UTC (permalink / raw)
  To: Timur Tabi, netdev, linux-kernel, devicetree, linux-arm-msm,
	sdharia, Shanker Donthineni, Greg Kroah-Hartman, vikrams, cov,
	gavidov, Rob Herring, andrew, bjorn.andersson, Mark Langsdorf,
	Jon Masters, Andy Gross, David S. Miller
In-Reply-To: <1460570393-19838-1-git-send-email-timur@codeaurora.org>

On 13/04/16 10:59, Timur Tabi wrote:
> From: Gilad Avidov <gavidov@codeaurora.org>
> 
> Add supports for ethernet controller HW on Qualcomm Technologies, Inc. SoC.
> This driver supports the following features:
> 1) Checksum offload.
> 2) Runtime power management support.
> 3) Interrupt coalescing support.
> 4) SGMII phy.
> 5) SGMII direct connection without external phy.

I think you should shoot for more simple for an initial submission:

- no offload
- no timestamping

get that accepted, and then add features one by one, it sure is more
work, but it helps with the review, and makes you work off a solid base.

You will see below, but a pet peeve of mine is authors reimplementing
code that exists in PHYLIB.

[snip]

> diff --git a/Documentation/devicetree/bindings/net/qcom-emac.txt b/Documentation/devicetree/bindings/net/qcom-emac.txt
> new file mode 100644
> index 0000000..df5e7c0
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/qcom-emac.txt
> @@ -0,0 +1,65 @@
> +Qualcomm EMAC Gigabit Ethernet Controller
> +
> +Required properties:
> +- compatible : Should be "qcom,emac".
> +- reg : Offset and length of the register regions for the device
> +- reg-names : Register region names referenced in 'reg' above.
> +	Required register resource entries are:
> +	"base"   : EMAC controller base register block.
> +	"csr"    : EMAC wrapper register block.
> +	Optional register resource entries are:
> +	"ptp"    : EMAC PTP (1588) register block.
> +		   Required if 'qcom,emac-tstamp-en' is present.
> +	"sgmii"  : EMAC SGMII PHY register block.
> +- interrupts : Interrupt numbers used by this controller
> +- interrupt-names : Interrupt resource names referenced in 'interrupts' above.
> +	Required interrupt resource entries are:
> +	"emac_core0"   : EMAC core0 interrupt.
> +	"sgmii_irq"   : EMAC SGMII interrupt.
> +- phy-addr            : Specifies phy address on MDIO bus.
> +			Required if the optional property "qcom,no-external-phy"
> +			is not specified.

This is not the standard way to represent an Ethernet PHY hanging off a
MDIO bus see ethernet.txt and phy.txt in D/dt/bindings/net/

> +
> +Optional properties:
> +- qcom,emac-tstamp-en       : Enables the PTP (1588) timestamping feature.
> +			      Include this only if PTP (1588) timestamping
> +			      feature is needed. If included, "ptp" register
> +			      base should be specified.

If the "ptp" register range is not specified, then PTP gets disabled, so
is a boolean really required here, considering that this looks like a
policy decision more than anything.

> +- mac-address               : The 6-byte MAC address. If present, it is the
> +			      default MAC address.

This property is pretty much mandatory

> +- qcom,no-external-phy      : Indicates there is no external PHY connected to
> +			      EMAC. Include this only if the EMAC is directly
> +			      connected to the peer end without EPHY.

How is the internal PHY accessed, is it responding on the MDIO bus at a
particular address? If so, standard MDIO scanning/probing works, and you
can have your PHY driver flag this device has internal. Worst case, you
can do what BCMGENET does, and have a special "phy-mode" value set to
"internal" when this knowledge needs to exist prior to MDIO bus scanning
(e.g: to power on the PHY).

> +Example:
> +	emac0: qcom,emac@feb20000 {
> +		compatible = "qcom,fsm9900-emac";
> +		reg-names = "base", "csr", "ptp", "sgmii";
> +		reg =   <0xfeb20000 0x10000>,
> +			<0xfeb36000 0x1000>,
> +			<0xfeb3c000 0x4000>,
> +			<0xfeb38000 0x400>;
> +		#address-cells = <0>;
> +		interrupt-parent = <&emac0>;
> +		#interrupt-cells = <1>;
> +		interrupts = <0 1>;
> +		interrupt-map-mask = <0xffffffff>;
> +		interrupt-map = <0 &intc 0 76 0
> +				 1 &intc 0 80 0>;
> +		interrupt-names = "emac_core0", "sgmii_irq";
> +		qcom,emac-tstamp-en;
> +		phy-addr = <0>;
> +
> +		pinctrl-names = "default";
> +		pinctrl-0 = <&mdio_pins_a>;
> +	};
> +
> +	tlmm: pinctrl@fd510000 {
> +		compatible = "qcom,fsm9900-pinctrl";
> +
> +		mdio_pins_a: mdio {
> +			state {
> +				pins = "gpio123", "gpio124";
> +				function = "mdio";
> +			};
> +		};
> +	};
> diff --git a/drivers/net/ethernet/qualcomm/Kconfig b/drivers/net/ethernet/qualcomm/Kconfig
> index a76e380..85b599f 100644
> --- a/drivers/net/ethernet/qualcomm/Kconfig
> +++ b/drivers/net/ethernet/qualcomm/Kconfig
> @@ -24,4 +24,15 @@ config QCA7000
>  	  To compile this driver as a module, choose M here. The module
>  	  will be called qcaspi.
>  
> +config QCOM_EMAC
> +	tristate "Qualcomm Technologies, Inc. EMAC Gigabit Ethernet support"
> +	select CRC32
> +	---help---
> +	  This driver supports the Qualcomm Technologies, Inc. Gigabit
> +	  Ethernet Media Access Controller (EMAC). The controller
> +	  supports IEEE 802.3-2002, half-duplex mode at 10/100 Mb/s,
> +	  full-duplex mode at 10/100/1000Mb/s, Wake On LAN (WOL) for
> +	  low power, Receive-Side Scaling (RSS), and IEEE 1588-2008
> +	  Precision Clock Synchronization Protocol.
> +
>  endif # NET_VENDOR_QUALCOMM

[snip]

> +/* Config MAC modes */
> +void emac_mac_mode_config(struct emac_adapter *adpt)
> +{
> +	u32 mac;
> +
> +	mac = readl(adpt->base + EMAC_MAC_CTRL);
> +
> +	if (test_bit(EMAC_STATUS_VLANSTRIP_EN, &adpt->status))
> +		mac |= VLAN_STRIP;
> +	else
> +		mac &= ~VLAN_STRIP;
> +
> +	if (test_bit(EMAC_STATUS_PROMISC_EN, &adpt->status))
> +		mac |= PROM_MODE;
> +	else
> +		mac &= ~PROM_MODE;
> +
> +	if (test_bit(EMAC_STATUS_MULTIALL_EN, &adpt->status))
> +		mac |= MULTI_ALL;
> +	else
> +		mac &= ~MULTI_ALL;
> +
> +	if (test_bit(EMAC_STATUS_LOOPBACK_EN, &adpt->status))
> +		mac |= MAC_LP_EN;
> +	else
> +		mac &= ~MAC_LP_EN;

Do you need to maintain these flags when most, if not all of them
already exist in dev->flags or dev->features?

[snip]

> +	/* setup link speed */
> +	mac &= ~SPEED_BMSK;
> +	switch (phy->link_speed) {
> +	case EMAC_LINK_SPEED_1GB_FULL:
> +		mac |= ((emac_mac_speed_1000 << SPEED_SHFT) & SPEED_BMSK);
> +		csr1 |= FREQ_MODE;
> +		break;
> +	default:
> +		mac |= ((emac_mac_speed_10_100 << SPEED_SHFT) & SPEED_BMSK);
> +		csr1 &= ~FREQ_MODE;
> +		break;
> +	}

If you implement the driver using PHYLIB, which you should in order to
support arbitrary or internal PHYs, then this function gets invoked
whenever there is a link parameter change (auto-neg, forcing,
duplex/speed/no link etc.).

[snip]

> +	napi_enable(&adpt->rx_q.napi);
> +
> +	/* enable mac irq */
> +	writel(~DIS_INT, adpt->base + EMAC_INT_STATUS);
> +	writel(adpt->irq.mask, adpt->base + EMAC_INT_MASK);
> +
> +	netif_start_queue(netdev);

Starting the TX queue is typically the last ting you want to do, to
avoid a transient state where the TX queue is enabled, and the link is
not (which is okay if your driver is properly implemented and reflects
carrier changes anyway).

> +	clear_bit(EMAC_STATUS_DOWN, &adpt->status);
> +
> +	/* check link status */
> +	set_bit(EMAC_STATUS_TASK_LSC_REQ, &adpt->status);
> +	adpt->link_chk_timeout = jiffies + EMAC_TRY_LINK_TIMEOUT;
> +	mod_timer(&adpt->timers, jiffies);

Please implement a PHYLIB driver and use phy_start() here.

> +
> +	return 0;
> +}
> +
> +/* Bring down the interface/HW */
> +void emac_mac_down(struct emac_adapter *adpt, bool reset)
> +{
> +	struct net_device *netdev = adpt->netdev;
> +	struct emac_phy *phy = &adpt->phy;
> +	unsigned long flags;
> +
> +	set_bit(EMAC_STATUS_DOWN, &adpt->status);

Do you need to maintain that? Would not netif_running() tell you what
you want if you reflect the carrier state properly?

> +
> +	netif_stop_queue(netdev);
> +	netif_carrier_off(netdev);

phy_stop() would take care of the latter.

[snip]

> +/* Process transmit event */
> +void emac_mac_tx_process(struct emac_adapter *adpt, struct emac_tx_queue *tx_q)
> +{
> +	struct emac_buffer *tpbuf;
> +	u32 hw_consume_idx;
> +	u32 pkts_compl = 0, bytes_compl = 0;
> +	u32 reg = readl_relaxed(adpt->base + tx_q->consume_reg);
> +
> +	hw_consume_idx = (reg & tx_q->consume_mask) >> tx_q->consume_shift;
> +
> +	while (tx_q->tpd.consume_idx != hw_consume_idx) {
> +		tpbuf = GET_TPD_BUFFER(tx_q, tx_q->tpd.consume_idx);
> +		if (tpbuf->dma_addr) {
> +			dma_unmap_single(adpt->netdev->dev.parent,
> +					 tpbuf->dma_addr, tpbuf->length,
> +					 DMA_TO_DEVICE);
> +			tpbuf->dma_addr = 0;
> +		}
> +
> +		if (tpbuf->skb) {
> +			pkts_compl++;
> +			bytes_compl += tpbuf->skb->len;
> +			dev_kfree_skb_irq(tpbuf->skb);
> +			tpbuf->skb = NULL;
> +		}
> +
> +		if (++tx_q->tpd.consume_idx == tx_q->tpd.count)
> +			tx_q->tpd.consume_idx = 0;
> +	}
> +
> +	if (pkts_compl || bytes_compl)
> +		netdev_completed_queue(adpt->netdev, pkts_compl, bytes_compl);

The condition can be eliminated.

[snip]

> +	if (skb_network_offset(skb) != ETH_HLEN)
> +		TPD_TYP_SET(&tpd, 1);
> +
> +	emac_tx_fill_tpd(adpt, tx_q, skb, &tpd);
> +
> +	netdev_sent_queue(adpt->netdev, skb->len);
> +
> +	/* update produce idx */
> +	prod_idx = (tx_q->tpd.produce_idx << tx_q->produce_shift) &
> +		    tx_q->produce_mask;
> +	emac_reg_update32(adpt->base + tx_q->produce_reg,
> +			  tx_q->produce_mask, prod_idx);

Since you have a producer index, you should consider checking
skb->xmit_more to know whether you can update the register now, or
later, which could save some expensive operation and batch TX.

[snip]

> diff --git a/drivers/net/ethernet/qualcomm/emac/emac-phy.c b/drivers/net/ethernet/qualcomm/emac/emac-phy.c
> new file mode 100644
> index 0000000..7d18de3
> --- /dev/null
> +++ b/drivers/net/ethernet/qualcomm/emac/emac-phy.c

This file is really really ugly, and duplicates a lot of functionality
provided by PHYLIB, you really need to implement a PHYLIB MDIO driver
and eventually a small PHY driver for your internal PHY if it needs some
baby sitting.
[snip]

> diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
> new file mode 100644
> index 0000000..ce328f5
> --- /dev/null
> +++ b/drivers/net/ethernet/qualcomm/emac/emac.c
> @@ -0,0 +1,1206 @@
> +/* Copyright (c) 2013-2016, The Linux Foundation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 and
> + * only version 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +/* Qualcomm Technologies, Inc. EMAC Gigabit Ethernet Driver
> + * The EMAC driver supports following features:
> + * 1) Receive Side Scaling (RSS).
> + * 2) Checksum offload.
> + * 3) Multiple PHY support on MDIO bus.
> + * 4) Runtime power management support.
> + * 5) Interrupt coalescing support.
> + * 6) SGMII phy.
> + * 7) SGMII direct connection (without external phy).
> + */
> +
> +#include <linux/if_ether.h>
> +#include <linux/if_vlan.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_net.h>
> +#include <linux/phy.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
> +#include "emac.h"
> +#include "emac-mac.h"
> +#include "emac-phy.h"
> +#include "emac-sgmii.h"
> +
> +#define DRV_VERSION "1.3.0.0"
> +
> +static int debug = -1;
> +module_param(debug, int, S_IRUGO | S_IWUSR | S_IWGRP);

ethtool -s <iface> msglvl provides you with that already.

> +
> +static int emac_irq_use_extended;
> +module_param(emac_irq_use_extended, int, S_IRUGO | S_IWUSR | S_IWGRP);

What is that module parameter used for?

> +
> +const char emac_drv_name[] = "qcom-emac";
> +const char emac_drv_description[] =
> +			"Qualcomm Technologies, Inc. EMAC Ethernet Driver";
> +const char emac_drv_version[] = DRV_VERSION;

Static all other the place?

[snip]

> +
> +/* NAPI */
> +static int emac_napi_rtx(struct napi_struct *napi, int budget)
> +{
> +	struct emac_rx_queue *rx_q = container_of(napi, struct emac_rx_queue,
> +						   napi);
> +	struct emac_adapter *adpt = netdev_priv(rx_q->netdev);
> +	struct emac_irq *irq = rx_q->irq;
> +
> +	int work_done = 0;
> +
> +	/* Keep link state information with original netdev */
> +	if (!netif_carrier_ok(adpt->netdev))
> +		goto quit_polling;

I do not think this is a condition that could occur?

> +
> +	emac_mac_rx_process(adpt, rx_q, &work_done, budget);
> +
> +	if (work_done < budget) {
> +quit_polling:
> +		napi_complete(napi);
> +
> +		irq->mask |= rx_q->intr;
> +		writel(irq->mask, adpt->base + EMAC_INT_MASK);
> +	}
> +
> +	return work_done;
> +}
> +
> +/* Transmit the packet */
> +static int emac_start_xmit(struct sk_buff *skb, struct net_device *netdev)
> +{
> +	struct emac_adapter *adpt = netdev_priv(netdev);
> +
> +	return emac_mac_tx_buf_send(adpt, &adpt->tx_q, skb);

I would inline emac_mac_tx_buf_send()'s body here to make it much easier
to read and audit...

> +}
> +
> +irqreturn_t emac_isr(int _irq, void *data)
> +{
> +	struct emac_irq *irq = data;
> +	struct emac_adapter *adpt = container_of(irq, struct emac_adapter, irq);
> +	struct emac_rx_queue *rx_q = &adpt->rx_q;
> +
> +	int max_ints = 1;
> +	u32 isr, status;
> +
> +	/* disable the interrupt */
> +	writel(0, adpt->base + EMAC_INT_MASK);
> +
> +	do {

With max_ints = 1, this is essentially the same as no loop, so just
inline it to reduce the indentation.

> +		isr = readl_relaxed(adpt->base + EMAC_INT_STATUS);
> +		status = isr & irq->mask;
> +
> +		if (status == 0)
> +			break;
> +
> +		if (status & ISR_ERROR) {
> +			netif_warn(adpt,  intr, adpt->netdev,
> +				   "warning: error irq status 0x%lx\n",
> +				   status & ISR_ERROR);
> +			/* reset MAC */
> +			set_bit(EMAC_STATUS_TASK_REINIT_REQ, &adpt->status);
> +			emac_work_thread_reschedule(adpt);
> +		}
> +
> +		/* Schedule the napi for receive queue with interrupt
> +		 * status bit set
> +		 */
> +		if ((status & rx_q->intr)) {
> +			if (napi_schedule_prep(&rx_q->napi)) {
> +				irq->mask &= ~rx_q->intr;
> +				__napi_schedule(&rx_q->napi);
> +			}
> +		}
> +
> +		if (status & TX_PKT_INT)
> +			emac_mac_tx_process(adpt, &adpt->tx_q);

You should consider using a NAPI instance for reclaiming TX buffers as well.

> +
> +		if (status & ISR_OVER)
> +			netif_warn(adpt, intr, adpt->netdev,
> +				   "warning: TX/RX overflow status 0x%lx\n",
> +				   status & ISR_OVER);

This should be ratelimited presumably

> +
> +		/* link event */
> +		if (status & (ISR_GPHY_LINK | SW_MAN_INT)) {
> +			emac_lsc_schedule_check(adpt);
> +			break;
> +		}
> +	} while (--max_ints > 0);
> +
> +	/* enable the interrupt */
> +	writel(irq->mask, adpt->base + EMAC_INT_MASK);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +/* Configure VLAN tag strip/insert feature */
> +static int emac_set_features(struct net_device *netdev,
> +			     netdev_features_t features)
> +{
> +	struct emac_adapter *adpt = netdev_priv(netdev);
> +
> +	netdev_features_t changed = features ^ netdev->features;
> +
> +	if (!(changed & (NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX)))
> +		return 0;
> +
> +	netdev->features = features;
> +	if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX)
> +		set_bit(EMAC_STATUS_VLANSTRIP_EN, &adpt->status);
> +	else
> +		clear_bit(EMAC_STATUS_VLANSTRIP_EN, &adpt->status);

What about TX vlan offload?

[snip]

> +
> +/* Called when the network interface is made active */
> +static int emac_open(struct net_device *netdev)
> +{
> +	struct emac_adapter *adpt = netdev_priv(netdev);
> +	int ret;
> +
> +	netif_carrier_off(netdev);

That seems unnecessary here because your close/down function does that,
and with PHYLIB you would get it set correctly anyway.

[snip]

> +/* PHY related IOCTLs */
> +static int emac_mii_ioctl(struct net_device *netdev,
> +			  struct ifreq *ifr, int cmd)
> +{
> +	struct emac_adapter *adpt = netdev_priv(netdev);
> +	struct emac_phy *phy = &adpt->phy;
> +	struct mii_ioctl_data *data = if_mii(ifr);
> +
> +	switch (cmd) {
> +	case SIOCGMIIPHY:
> +		data->phy_id = phy->addr;
> +		return 0;
> +
> +	case SIOCGMIIREG:
> +		if (!capable(CAP_NET_ADMIN))
> +			return -EPERM;
> +
> +		if (data->reg_num & ~(0x1F))
> +			return -EFAULT;
> +
> +		if (data->phy_id >= PHY_MAX_ADDR)
> +			return -EFAULT;
> +
> +		if (phy->external && data->phy_id != phy->addr)
> +			return -EFAULT;
> +
> +		return emac_phy_read(adpt, data->phy_id, data->reg_num,
> +				     &data->val_out);
> +
> +	case SIOCSMIIREG:
> +		if (!capable(CAP_NET_ADMIN))
> +			return -EPERM;
> +
> +		if (data->reg_num & ~(0x1F))
> +			return -EFAULT;
> +
> +		if (data->phy_id >= PHY_MAX_ADDR)
> +			return -EFAULT;
> +
> +		if (phy->external && data->phy_id != phy->addr)
> +			return -EFAULT;
> +
> +		return emac_phy_write(adpt, data->phy_id, data->reg_num,
> +				      data->val_in);
> +	default:
> +		return -EFAULT;
> +	}

All of that can be eliminated with a PHYLIB implementation too.

[snip]

> +/* Provide network statistics info for the interface */
> +struct rtnl_link_stats64 *emac_get_stats64(struct net_device *netdev,
> +					   struct rtnl_link_stats64 *net_stats)
> +{
> +	struct emac_adapter *adpt = netdev_priv(netdev);
> +	struct emac_stats *stats = &adpt->stats;
> +	u16 addr = REG_MAC_RX_STATUS_BIN;
> +	u64 *stats_itr = &adpt->stats.rx_ok;
> +	u32 val;
> +
> +	while (addr <= REG_MAC_RX_STATUS_END) {
> +		val = readl_relaxed(adpt->base + addr);
> +		*stats_itr += val;
> +		++stats_itr;
> +		addr += sizeof(u32);
> +	}

There is no reader locking here, what happens if two applications read
the statistics at the same time?

[snip]

> +/* Get the resources */
> +static int emac_probe_resources(struct platform_device *pdev,
> +				struct emac_adapter *adpt)
> +{
> +	struct net_device *netdev = adpt->netdev;
> +	struct device_node *node = pdev->dev.of_node;
> +	struct resource *res;
> +	const void *maddr;
> +	int ret = 0;
> +	int i;
> +
> +	/* get time stamp enable flag */
> +	adpt->timestamp_en = of_property_read_bool(node, "qcom,emac-tstamp-en");
> +
> +	/* get mac address */
> +	maddr = of_get_mac_address(node);
> +	if (!maddr)
> +		return -ENODEV;

No, generate a random one, continue, but warn,

> +
> +	memcpy(adpt->mac_perm_addr, maddr, netdev->addr_len);
> +
> +	ret = platform_get_irq_byname(pdev, EMAC_MAC_IRQ_RES);
> +	if (ret < 0) {
> +		netdev_err(adpt->netdev,
> +			   "error: missing %s resource\n", EMAC_MAC_IRQ_RES);
> +		return ret;
> +	}
> +	adpt->irq.irq = ret;
> +
> +	ret = emac_clks_get(pdev, adpt);
> +	if (ret)
> +		return ret;
> +
> +	/* get register addresses */
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "base");
> +	if (!res) {
> +		netdev_err(adpt->netdev, "error: missing 'base' resource\n");
> +		ret = -ENXIO;
> +		goto err_reg_res;
> +	}
> +
> +	adpt->base = devm_ioremap_resource(&pdev->dev, res);
> +	if (!adpt->base) {
> +		ret = -ENOMEM;
> +		goto err_reg_res;
> +	}
> +
> +	res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "csr");
> +	if (!res) {
> +		netdev_err(adpt->netdev, "error: missing 'csr' resource\n");
> +		ret = -ENXIO;
> +		goto err_reg_res;
> +	}

No need to check that, devm_ioremap_resource() does it too.

> +
> +	adpt->csr = devm_ioremap_resource(&pdev->dev, res);
> +	if (!adpt->csr) {
> +		ret = -ENOMEM;
> +		goto err_reg_res;
> +	}
> +
> +	netdev->base_addr = (unsigned long)adpt->base;
> +	return 0;
> +
> +err_reg_res:
> +	for (i = 0; i < EMAC_CLK_CNT; i++) {
> +		if (adpt->clk[i]) {
> +			clk_put(adpt->clk[i]);
> +			adpt->clk[i] = NULL;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
> +/* Release resources */
> +static void emac_release_resources(struct emac_adapter *adpt)
> +{
> +	int i;
> +
> +	for (i = 0; i < EMAC_CLK_CNT; i++)
> +		if (adpt->clk[i]) {
> +			clk_put(adpt->clk[i]);
> +			adpt->clk[i] = NULL;
> +		}
> +}
> +
> +/* Probe function */
> +static int emac_probe(struct platform_device *pdev)
> +{
> +	struct net_device *netdev;
> +	struct emac_adapter *adpt;
> +	struct emac_phy *phy;
> +	int ret = 0;
> +	u32 hw_ver;
> +	u32 extended_irq_mask = emac_irq_use_extended ? IMR_EXTENDED_MASK :
> +							IMR_NORMAL_MASK;
> +
> +	netdev = alloc_etherdev(sizeof(struct emac_adapter));
> +	if (!netdev)
> +		return -ENOMEM;

There are references to multiple queues in the code, so why not
alloc_etherdev_mq() here with the correct number of queues?

> +
> +	dev_set_drvdata(&pdev->dev, netdev);
> +	SET_NETDEV_DEV(netdev, &pdev->dev);
> +
> +	adpt = netdev_priv(netdev);
> +	adpt->netdev = netdev;
> +	phy = &adpt->phy;
> +	adpt->msg_enable = netif_msg_init(debug, EMAC_MSG_DEFAULT);
> +
> +	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));

Really, is not that supposed to run on ARM64 servers?
-- 
Florian

^ permalink raw reply

* [PATCH iproute2] ss: take care of unknown min_rtt
From: Eric Dumazet @ 2016-04-13 22:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

Kernel sets info->tcpi_min_rtt to ~0U when no RTT sample was ever
taken for the session, thus min_rtt is unknown.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 misc/ss.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index 38cf331..51ff5ac 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2018,7 +2018,8 @@ static void tcp_show_info(const struct nlmsghdr *nlh, struct inet_diag_msg *r,
 		s.segs_out = info->tcpi_segs_out;
 		s.segs_in = info->tcpi_segs_in;
 		s.not_sent = info->tcpi_notsent_bytes;
-		s.min_rtt = (double) info->tcpi_min_rtt / 1000;
+		if (info->tcpi_min_rtt && info->tcpi_min_rtt != ~0U)
+			s.min_rtt = (double) info->tcpi_min_rtt / 1000;
 		tcp_stats_print(&s);
 		free(s.dctcp);
 	}

^ permalink raw reply related

* Re: [PATCH 0/3] crypto: af_alg - add TLS type encryption
From: Tadeusz Struk @ 2016-04-13 22:46 UTC (permalink / raw)
  To: Fridolin Pokorny
  Cc: Tom Herbert, Herbert Xu, linux-crypto, LKML, David S. Miller,
	Linux Kernel Network Developers, davejwatson, nmav,
	fridolin.pokorny
In-Reply-To: <570CD852.7060003@redhat.com>

Hi Fridolin,
On 04/12/2016 04:13 AM, Fridolin Pokorny wrote:
> we were experimenting with this. We have a prove of concept of a kernel
> TLS type socket, so called AF_KTLS, which is based on Dave Watson's
> RFC5288 patch. It handles both TLS and DTLS, unfortunately it is not
> ready now to be proposed here. There are still issues which should be
> solved (but mostly user space API design) [1]. If you are interested, we
> could combine efforts.
> 
> Regards,
> Fridolin Pokorny
> 
> [1] https://github.com/fridex/af_ktls

I had a quick look and it looks like is limited only to gcm(aes). 
I would be more interested to have a generic interface that could do generic algorithm
suits like aes-cbc-hmac-sha1 also.
This also seems to work in a synchronous (send one and wait) mode, which is a not good
solution for HW accelerators, which I'm trying to enable.
Thanks,
-- 
TS

^ permalink raw reply

* [PATCH 2/3] netfilter: nf_conntrack_tcp: Fix stack out of bounds when parsing TCP options
From: Pablo Neira Ayuso @ 2016-04-13 22:54 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1460588094-3933-1-git-send-email-pablo@netfilter.org>

From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

Baozeng Ding reported a KASAN stack out of bounds issue - it uncovered that
the TCP option parsing routines in netfilter TCP connection tracking could
read one byte out of the buffer of the TCP options.  Therefore in the patch
we check that the available data length is large enough to parse both TCP
option code and size.

Reported-by: Baozeng Ding <sploving1@gmail.com>
Tested-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_proto_tcp.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 278f3b9..7cc1d9c 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -410,6 +410,8 @@ static void tcp_options(const struct sk_buff *skb,
 			length--;
 			continue;
 		default:
+			if (length < 2)
+				return;
 			opsize=*ptr++;
 			if (opsize < 2) /* "silly options" */
 				return;
@@ -470,6 +472,8 @@ static void tcp_sack(const struct sk_buff *skb, unsigned int dataoff,
 			length--;
 			continue;
 		default:
+			if (length < 2)
+				return;
 			opsize = *ptr++;
 			if (opsize < 2) /* "silly options" */
 				return;
-- 
2.1.4


^ permalink raw reply related

* [PATCH 3/3] netfilter: ebtables: Fix extension lookup with identical name
From: Pablo Neira Ayuso @ 2016-04-13 22:54 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1460588094-3933-1-git-send-email-pablo@netfilter.org>

From: Phil Sutter <phil@nwl.cc>

If a requested extension exists as module and is not loaded,
ebt_check_match() might accidentally use an NFPROTO_UNSPEC one with same
name and fail.

Reproduced with limit match: Given xt_limit and ebt_limit both built as
module, the following would fail:

  modprobe xt_limit
  ebtables -I INPUT --limit 1/s -j ACCEPT

The fix is to make ebt_check_match() distrust a found NFPROTO_UNSPEC
extension and retry after requesting an appropriate module.

Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/ebtables.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 8570bc7..5a61f35 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -370,7 +370,11 @@ ebt_check_match(struct ebt_entry_match *m, struct xt_mtchk_param *par,
 	    left - sizeof(struct ebt_entry_match) < m->match_size)
 		return -EINVAL;
 
-	match = xt_request_find_match(NFPROTO_BRIDGE, m->u.name, 0);
+	match = xt_find_match(NFPROTO_BRIDGE, m->u.name, 0);
+	if (IS_ERR(match) || match->family != NFPROTO_BRIDGE) {
+		request_module("ebt_%s", m->u.name);
+		match = xt_find_match(NFPROTO_BRIDGE, m->u.name, 0);
+	}
 	if (IS_ERR(match))
 		return PTR_ERR(match);
 	m->u.match = match;
-- 
2.1.4


^ permalink raw reply related

* [PATCH 0/3] Netfilter fixes for net
From: Pablo Neira Ayuso @ 2016-04-13 22:54 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains Netfilter fixes for your net tree. More
specifically, they are:

1) Fix missing filter table per-netns registration in arptables, from
   Florian Westphal.

2) Resolve out of bound access when parsing TCP options in
   nf_conntrack_tcp, patch from Jozsef Kadlecsik.

3) Prefer NFPROTO_BRIDGE extensions over NFPROTO_UNSPEC in ebtables,
   this resolves conflict between xt_limit and ebt_limit, from Phil Sutter.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

----------------------------------------------------------------

The following changes since commit 0a1a37b6d62e6864a77a82e925217c720f91f963:

  net: add the AF_KCM entries to family name tables (2016-04-06 16:59:01 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to bcf4934288402be3464110109a4dae3bd6fb3e93:

  netfilter: ebtables: Fix extension lookup with identical name (2016-04-13 01:16:57 +0200)

----------------------------------------------------------------
Florian Westphal (1):
      netfilter: arp_tables: register table in initns

Jozsef Kadlecsik (1):
      netfilter: nf_conntrack_tcp: Fix stack out of bounds when parsing TCP options

Phil Sutter (1):
      netfilter: ebtables: Fix extension lookup with identical name

 net/bridge/netfilter/ebtables.c        | 6 +++++-
 net/ipv4/netfilter/arptable_filter.c   | 6 ++++++
 net/netfilter/nf_conntrack_proto_tcp.c | 4 ++++
 3 files changed, 15 insertions(+), 1 deletion(-)

^ permalink raw reply

* [PATCH 1/3] netfilter: arp_tables: register table in initns
From: Pablo Neira Ayuso @ 2016-04-13 22:54 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1460588094-3933-1-git-send-email-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

arptables is broken since we didn't register the table anymore --
even 'arptables -L' fails.

Fixes: b9e69e127397187b ("netfilter: xtables: don't hook tables by default")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/arptable_filter.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv4/netfilter/arptable_filter.c b/net/ipv4/netfilter/arptable_filter.c
index dd8c80d..8f8713b 100644
--- a/net/ipv4/netfilter/arptable_filter.c
+++ b/net/ipv4/netfilter/arptable_filter.c
@@ -81,6 +81,12 @@ static int __init arptable_filter_init(void)
 		return ret;
 	}
 
+	ret = arptable_filter_table_init(&init_net);
+	if (ret) {
+		unregister_pernet_subsys(&arptable_filter_net_ops);
+		kfree(arpfilter_ops);
+	}
+
 	return ret;
 }
 
-- 
2.1.4

^ permalink raw reply related

* [PATCH 1/1] hv_netvsc: Implement support for VF drivers on Hyper-V
From: K. Y. Srinivasan @ 2016-04-13 23:58 UTC (permalink / raw)
  To: davem, netdev, linux-kernel, devel, olaf, apw, jasowang

Support VF drivers on Hyper-V. On Hyper-V, each VF instance presented to
the guest has an associated synthetic interface that shares the MAC address
with the VF instance. Typically these are bonded together to support
live migration. By default, the host delivers all the incoming packets
on the synthetic interface. Once the VF is up, we need to explicitly switch
the data path on the host to divert traffic onto the VF interface. Even after
switching the data path, broadcast and multicast packets are always delivered
on the synthetic interface and these will have to be injected back onto the
VF interface (if VF is up).
This patch implements the necessary support in netvsc to support Linux
VF drivers.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h |   14 ++
 drivers/net/hyperv/netvsc.c     |   29 ++++
 drivers/net/hyperv/netvsc_drv.c |  309 +++++++++++++++++++++++++++++++++++----
 3 files changed, 326 insertions(+), 26 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 8b3bd8e..6700a4d 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -202,6 +202,8 @@ int rndis_filter_receive(struct hv_device *dev,
 int rndis_filter_set_packet_filter(struct rndis_device *dev, u32 new_filter);
 int rndis_filter_set_device_mac(struct hv_device *hdev, char *mac);
 
+void netvsc_switch_datapath(struct netvsc_device *nv_dev, bool vf);
+
 #define NVSP_INVALID_PROTOCOL_VERSION	((u32)0xFFFFFFFF)
 
 #define NVSP_PROTOCOL_VERSION_1		2
@@ -641,6 +643,12 @@ struct netvsc_reconfig {
 	u32 event;
 };
 
+struct garp_wrk {
+	struct work_struct dwrk;
+	struct net_device *netdev;
+	struct netvsc_device *netvsc_dev;
+};
+
 /* The context of the netvsc device  */
 struct net_device_context {
 	/* point back to our device context */
@@ -656,6 +664,7 @@ struct net_device_context {
 
 	struct work_struct work;
 	u32 msg_enable; /* debug level */
+	struct garp_wrk gwrk;
 
 	struct netvsc_stats __percpu *tx_stats;
 	struct netvsc_stats __percpu *rx_stats;
@@ -730,6 +739,11 @@ struct netvsc_device {
 	u32 vf_alloc;
 	/* Serial number of the VF to team with */
 	u32 vf_serial;
+	atomic_t open_cnt;
+	/* State to manage the associated VF interface. */
+	bool vf_inject;
+	struct net_device *vf_netdev;
+	atomic_t vf_use_cnt;
 };
 
 /* NdisInitialize message */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index ec313fc..eddce3c 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -33,6 +33,30 @@
 
 #include "hyperv_net.h"
 
+/*
+ * Switch the data path from the synthetic interface to the VF
+ * interface.
+ */
+void netvsc_switch_datapath(struct netvsc_device *nv_dev, bool vf)
+{
+	struct nvsp_message *init_pkt = &nv_dev->channel_init_pkt;
+	struct hv_device *dev = nv_dev->dev;
+
+	memset(init_pkt, 0, sizeof(struct nvsp_message));
+	init_pkt->hdr.msg_type = NVSP_MSG4_TYPE_SWITCH_DATA_PATH;
+	if (vf)
+		init_pkt->msg.v4_msg.active_dp.active_datapath =
+			NVSP_DATAPATH_VF;
+	else
+		init_pkt->msg.v4_msg.active_dp.active_datapath =
+			NVSP_DATAPATH_SYNTHETIC;
+
+	vmbus_sendpacket(dev->channel, init_pkt,
+			       sizeof(struct nvsp_message),
+			       (unsigned long)init_pkt,
+			       VM_PKT_DATA_INBAND, 0);
+}
+
 
 static struct netvsc_device *alloc_net_device(struct hv_device *device)
 {
@@ -52,11 +76,16 @@ static struct netvsc_device *alloc_net_device(struct hv_device *device)
 	init_waitqueue_head(&net_device->wait_drain);
 	net_device->start_remove = false;
 	net_device->destroy = false;
+	atomic_set(&net_device->open_cnt, 0);
+	atomic_set(&net_device->vf_use_cnt, 0);
 	net_device->dev = device;
 	net_device->ndev = ndev;
 	net_device->max_pkt = RNDIS_MAX_PKT_DEFAULT;
 	net_device->pkt_align = RNDIS_PKT_ALIGN_DEFAULT;
 
+	net_device->vf_netdev = NULL;
+	net_device->vf_inject = false;
+
 	hv_set_drvdata(device, net_device);
 	return net_device;
 }
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index b8121eb..5669589 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -610,42 +610,24 @@ void netvsc_linkstatus_callback(struct hv_device *device_obj,
 	schedule_delayed_work(&ndev_ctx->dwork, 0);
 }
 
-/*
- * netvsc_recv_callback -  Callback when we receive a packet from the
- * "wire" on the specified device.
- */
-int netvsc_recv_callback(struct hv_device *device_obj,
+
+static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
 				struct hv_netvsc_packet *packet,
-				void **data,
 				struct ndis_tcp_ip_checksum_info *csum_info,
-				struct vmbus_channel *channel,
-				u16 vlan_tci)
+				void *data, u16 vlan_tci)
 {
-	struct net_device *net;
-	struct net_device_context *net_device_ctx;
 	struct sk_buff *skb;
-	struct netvsc_stats *rx_stats;
 
-	net = ((struct netvsc_device *)hv_get_drvdata(device_obj))->ndev;
-	if (!net || net->reg_state != NETREG_REGISTERED) {
-		return NVSP_STAT_FAIL;
-	}
-	net_device_ctx = netdev_priv(net);
-	rx_stats = this_cpu_ptr(net_device_ctx->rx_stats);
-
-	/* Allocate a skb - TODO direct I/O to pages? */
 	skb = netdev_alloc_skb_ip_align(net, packet->total_data_buflen);
-	if (unlikely(!skb)) {
-		++net->stats.rx_dropped;
-		return NVSP_STAT_FAIL;
-	}
+	if (!skb)
+		return skb;
 
 	/*
 	 * Copy to skb. This copy is needed here since the memory pointed by
 	 * hv_netvsc_packet cannot be deallocated
 	 */
-	memcpy(skb_put(skb, packet->total_data_buflen), *data,
-		packet->total_data_buflen);
+	memcpy(skb_put(skb, packet->total_data_buflen), data,
+	       packet->total_data_buflen);
 
 	skb->protocol = eth_type_trans(skb, net);
 	if (csum_info) {
@@ -663,6 +645,75 @@ int netvsc_recv_callback(struct hv_device *device_obj,
 		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
 				       vlan_tci);
 
+	return skb;
+}
+
+/*
+ * netvsc_recv_callback -  Callback when we receive a packet from the
+ * "wire" on the specified device.
+ */
+int netvsc_recv_callback(struct hv_device *device_obj,
+				struct hv_netvsc_packet *packet,
+				void **data,
+				struct ndis_tcp_ip_checksum_info *csum_info,
+				struct vmbus_channel *channel,
+				u16 vlan_tci)
+{
+	struct net_device *net;
+	struct net_device_context *net_device_ctx;
+	struct sk_buff *skb;
+	struct sk_buff *vf_skb;
+	struct netvsc_stats *rx_stats;
+	struct netvsc_device *netvsc_dev = hv_get_drvdata(device_obj);
+	u32 bytes_recvd = packet->total_data_buflen;
+	int ret = 0;
+
+	net = netvsc_dev->ndev;
+	if (!net || net->reg_state != NETREG_REGISTERED)
+		return NVSP_STAT_FAIL;
+
+	if (READ_ONCE(netvsc_dev->vf_inject)) {
+		atomic_inc(&netvsc_dev->vf_use_cnt);
+		if (READ_ONCE(netvsc_dev->vf_inject)) {
+			/*
+			 * We raced; just move on.
+			 */
+			atomic_dec(&netvsc_dev->vf_use_cnt);
+			goto vf_injection_done;
+		}
+
+		/*
+		 * Inject this packet into the VF inerface.
+		 * On Hyper-V, multicast and brodcast packets
+		 * are only delivered on the synthetic interface
+		 * (after subjecting these to policy filters on
+		 * the host). Deliver these via the VF interface
+		 * in the guest.
+		 */
+		vf_skb = netvsc_alloc_recv_skb(netvsc_dev->vf_netdev, packet,
+					       csum_info, *data, vlan_tci);
+		if (vf_skb != NULL) {
+			++netvsc_dev->vf_netdev->stats.rx_packets;
+			netvsc_dev->vf_netdev->stats.rx_bytes += bytes_recvd;
+			netif_receive_skb(vf_skb);
+		} else {
+			++net->stats.rx_dropped;
+			ret = NVSP_STAT_FAIL;
+		}
+		atomic_dec(&netvsc_dev->vf_use_cnt);
+		return ret;
+	}
+
+vf_injection_done:
+	net_device_ctx = netdev_priv(net);
+	rx_stats = this_cpu_ptr(net_device_ctx->rx_stats);
+
+	/* Allocate a skb - TODO direct I/O to pages? */
+	skb = netvsc_alloc_recv_skb(net, packet, csum_info, *data, vlan_tci);
+	if (unlikely(!skb)) {
+		++net->stats.rx_dropped;
+		return NVSP_STAT_FAIL;
+	}
 	skb_record_rx_queue(skb, channel->
 			    offermsg.offer.sub_channel_index);
 
@@ -1102,6 +1153,172 @@ static void netvsc_free_netdev(struct net_device *netdev)
 	free_netdev(netdev);
 }
 
+static void netvsc_notify_peers(struct work_struct *wrk)
+{
+	struct garp_wrk *gwrk;
+
+	gwrk = container_of(wrk, struct garp_wrk, dwrk);
+
+	netdev_notify_peers(gwrk->netdev);
+
+	atomic_dec(&gwrk->netvsc_dev->vf_use_cnt);
+}
+
+static struct netvsc_device *get_netvsc_device(char *mac)
+{
+	struct net_device *dev;
+	struct net_device_context *netvsc_ctx = NULL;
+	int rtnl_locked;
+
+	rtnl_locked = rtnl_trylock();
+
+	for_each_netdev(&init_net, dev) {
+		if (memcmp(dev->dev_addr, mac, ETH_ALEN) == 0) {
+			if (dev->netdev_ops != &device_ops)
+				continue;
+			netvsc_ctx = netdev_priv(dev);
+			break;
+		}
+	}
+	if (rtnl_locked)
+		rtnl_unlock();
+
+	if (netvsc_ctx == NULL)
+		return NULL;
+
+	return hv_get_drvdata(netvsc_ctx->device_ctx);
+}
+
+static int netvsc_register_vf(struct net_device *vf_netdev)
+{
+	struct netvsc_device *netvsc_dev;
+	const struct ethtool_ops *eth_ops = vf_netdev->ethtool_ops;
+
+	if (eth_ops == NULL || eth_ops == &ethtool_ops)
+		return NOTIFY_DONE;
+
+	/*
+	 * We will use the MAC address to locate the synthetic interface to
+	 * associate with the VF interface. If we don't find a matching
+	 * synthetic interface, move on.
+	 */
+	netvsc_dev = get_netvsc_device(vf_netdev->dev_addr);
+	if (netvsc_dev == NULL)
+		return NOTIFY_DONE;
+
+	pr_info("VF registering: %s\n", vf_netdev->name);
+	/*
+	 * Take a reference on the module.
+	 */
+	try_module_get(THIS_MODULE);
+	netvsc_dev->vf_netdev = vf_netdev;
+	return NOTIFY_OK;
+}
+
+
+static int netvsc_vf_up(struct net_device *vf_netdev)
+{
+	struct netvsc_device *netvsc_dev;
+	const struct ethtool_ops *eth_ops = vf_netdev->ethtool_ops;
+	struct net_device_context *net_device_ctx;
+
+	if (eth_ops == &ethtool_ops)
+		return NOTIFY_DONE;
+
+	netvsc_dev = get_netvsc_device(vf_netdev->dev_addr);
+
+	if ((netvsc_dev == NULL) || (netvsc_dev->vf_netdev == NULL))
+		return NOTIFY_DONE;
+
+	pr_info("VF up: %s\n", vf_netdev->name);
+	net_device_ctx = netdev_priv(netvsc_dev->ndev);
+	netvsc_dev->vf_inject = true;
+
+	/*
+	 * Open the device before switching data path.
+	 */
+	rndis_filter_open(net_device_ctx->device_ctx);
+
+	/*
+	 * notify the host to switch the data path.
+	 */
+	netvsc_switch_datapath(netvsc_dev, true);
+	pr_info("Data path switched to VF: %s\n", vf_netdev->name);
+
+	netif_carrier_off(netvsc_dev->ndev);
+
+	/*
+	 * Now notify peers. We are scheduling work to
+	 * notify peers; take a reference to prevent
+	 * the VF interface from vanishing.
+	 */
+	atomic_inc(&netvsc_dev->vf_use_cnt);
+	net_device_ctx->gwrk.netdev = vf_netdev;
+	net_device_ctx->gwrk.netvsc_dev = netvsc_dev;
+	schedule_work(&net_device_ctx->gwrk.dwrk);
+
+	return NOTIFY_OK;
+}
+
+
+static int netvsc_vf_down(struct net_device *vf_netdev)
+{
+	struct netvsc_device *netvsc_dev;
+	struct net_device_context *net_device_ctx;
+	const struct ethtool_ops *eth_ops = vf_netdev->ethtool_ops;
+
+	if (eth_ops == &ethtool_ops)
+		return NOTIFY_DONE;
+
+	netvsc_dev = get_netvsc_device(vf_netdev->dev_addr);
+
+	if ((netvsc_dev == NULL) || (netvsc_dev->vf_netdev == NULL))
+		return NOTIFY_DONE;
+
+	pr_info("VF down: %s\n", vf_netdev->name);
+	net_device_ctx = netdev_priv(netvsc_dev->ndev);
+	netvsc_dev->vf_inject = false;
+	/*
+	 * Wait for currently active users to
+	 * drain out.
+	 */
+
+	while (atomic_read(&netvsc_dev->vf_use_cnt) != 0)
+		udelay(50);
+	netvsc_switch_datapath(netvsc_dev, false);
+	pr_info("Data path switched from VF: %s\n", vf_netdev->name);
+	rndis_filter_close(net_device_ctx->device_ctx);
+	netif_carrier_on(netvsc_dev->ndev);
+	/*
+	 * Notify peers.
+	 */
+	atomic_inc(&netvsc_dev->vf_use_cnt);
+	net_device_ctx->gwrk.netdev = netvsc_dev->ndev;
+	net_device_ctx->gwrk.netvsc_dev = netvsc_dev;
+	schedule_work(&net_device_ctx->gwrk.dwrk);
+
+	return NOTIFY_OK;
+}
+
+
+static int netvsc_unregister_vf(struct net_device *vf_netdev)
+{
+	struct netvsc_device *netvsc_dev;
+	const struct ethtool_ops *eth_ops = vf_netdev->ethtool_ops;
+
+	if (eth_ops == &ethtool_ops)
+		return NOTIFY_DONE;
+
+	netvsc_dev = get_netvsc_device(vf_netdev->dev_addr);
+	if (netvsc_dev == NULL)
+		return NOTIFY_DONE;
+	pr_info("VF unregistering: %s\n", vf_netdev->name);
+
+	netvsc_dev->vf_netdev = NULL;
+	module_put(THIS_MODULE);
+	return NOTIFY_OK;
+}
+
 static int netvsc_probe(struct hv_device *dev,
 			const struct hv_vmbus_device_id *dev_id)
 {
@@ -1140,6 +1357,7 @@ static int netvsc_probe(struct hv_device *dev,
 	hv_set_drvdata(dev, net);
 	INIT_DELAYED_WORK(&net_device_ctx->dwork, netvsc_link_change);
 	INIT_WORK(&net_device_ctx->work, do_set_multicast);
+	INIT_WORK(&net_device_ctx->gwrk.dwrk, netvsc_notify_peers);
 
 	spin_lock_init(&net_device_ctx->lock);
 	INIT_LIST_HEAD(&net_device_ctx->reconfig_events);
@@ -1235,19 +1453,58 @@ static struct  hv_driver netvsc_drv = {
 	.remove = netvsc_remove,
 };
 
+
+/*
+ * On Hyper-V, every VF interface is matched with a corresponding
+ * synthetic interface. The synthetic interface is presented first
+ * to the guest. When the corresponding VF instance is registered,
+ * we will take care of switching the data path.
+ */
+static int netvsc_netdev_event(struct notifier_block *this,
+			       unsigned long event, void *ptr)
+{
+	struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
+
+	switch (event) {
+	case NETDEV_REGISTER:
+		return netvsc_register_vf(event_dev);
+	case NETDEV_UNREGISTER:
+		return netvsc_unregister_vf(event_dev);
+	case NETDEV_UP:
+		return netvsc_vf_up(event_dev);
+	case NETDEV_DOWN:
+		return netvsc_vf_down(event_dev);
+	default:
+		return NOTIFY_DONE;
+	}
+}
+
+static struct notifier_block netvsc_netdev_notifier = {
+	.notifier_call = netvsc_netdev_event,
+};
+
 static void __exit netvsc_drv_exit(void)
 {
+	unregister_netdevice_notifier(&netvsc_netdev_notifier);
 	vmbus_driver_unregister(&netvsc_drv);
 }
 
 static int __init netvsc_drv_init(void)
 {
+	int ret;
+
 	if (ring_size < RING_SIZE_MIN) {
 		ring_size = RING_SIZE_MIN;
 		pr_info("Increased ring_size to %d (min allowed)\n",
 			ring_size);
 	}
-	return vmbus_driver_register(&netvsc_drv);
+	ret = vmbus_driver_register(&netvsc_drv);
+
+	if (ret)
+		return ret;
+
+	register_netdevice_notifier(&netvsc_netdev_notifier);
+	return 0;
 }
 
 MODULE_LICENSE("GPL");
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH net-next 0/8] DSA refactoring: set 2
From: Andrew Lunn @ 2016-04-13 23:59 UTC (permalink / raw)
  To: David Miller; +Cc: Florian Fainelli, netdev, Vivien Didelot, Andrew Lunn

More preparatory patches for the DSA probe refactoring.

The first three patches turn the Marvell drivers into individual
modules, and a shared library module. They also become real Linux
devices, with probe functions. However, at the moment, this is just
stub code to be filled out with later patches.

With the drivers becoming real devices, it becomes easier to interact
with device tree properties. One such driver property is a GPIO line
used for resetting the switch. This is only used in Marvell devices,
so move it out of the core code into the Marvell shared code. This
changes the existing binding, but no in tree device tree actually uses
it.

The REG_READ and REG_WRITE macros have caused problems in the past,
since they contain a return statement, messing up locking code. They
also assume a ds variable is available. Kill them off.

With the addition of the drivers becoming devices, the order in which
data structures are created is changing. Previously, the DSA framework
would create the dsa_switch structure and pass it to the setup()
method, which would then allocate the driver private structure. Now
the drivers probe as devices, they allocate there private structure
first, and will only get a dsa_switch structure later. Future changes
to have the device export its own MDIO bus rather than have the
framework do it, requires at some of the driver functions operate
using only the private structure. Perform a mass internal API change
to pass the private structure internally, and only pass the dsa_switch
structure for the public APIs.

Each driver maintains a table of IDs to name strings. Extending this
table with the number of ports a switch has. It is likely this will
get further extended with number of VLANs, address databases etc.

The mv88e6131 driver does something different for a single switch vs a
collection of switches. All other drivers don't special case a single
switch. There is no need to do this, so remove the special case. This
will help later with unifying this code into the shared library.

So nothing too interesting here, that will come later...

Andrew Lunn (8):
  dsa: mv88e6xxx: Prepare for turning this into a library module
  dsa: mv88e6xxx: Add macro for registering the drivers
  dsa: Add mdio device support to Marvell switches
  dsa: Move gpio reset into switch driver
  dsa: mv88e6xxx: Kill the REG_READ and REG_WRITE macros
  dsa: mv88e6xxx: Replace ds with ps where possible
  dsa: mv88e6xxx: Use the name table to determine number of ports
  dsa: mv88e6131: Don't special case a single device

 Documentation/devicetree/bindings/net/dsa/dsa.txt  |    2 -
 .../devicetree/bindings/net/dsa/marvell.txt        |   39 +
 drivers/net/dsa/Makefile                           |   19 +-
 drivers/net/dsa/mv88e6123.c                        |   78 +-
 drivers/net/dsa/mv88e6131.c                        |  112 +-
 drivers/net/dsa/mv88e6171.c                        |   62 +-
 drivers/net/dsa/mv88e6352.c                        |   90 +-
 drivers/net/dsa/mv88e6xxx.c                        | 1240 +++++++++++---------
 drivers/net/dsa/mv88e6xxx.h                        |   66 +-
 include/net/dsa.h                                  |    8 -
 net/dsa/dsa.c                                      |   16 -
 11 files changed, 1000 insertions(+), 732 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/dsa/marvell.txt

-- 
2.7.0

^ permalink raw reply

* [PATCH net-next 4/8] dsa: Move gpio reset into switch driver
From: Andrew Lunn @ 2016-04-13 23:59 UTC (permalink / raw)
  To: David Miller; +Cc: Florian Fainelli, netdev, Vivien Didelot, Andrew Lunn
In-Reply-To: <1460591998-20598-1-git-send-email-andrew@lunn.ch>

Resetting the switch is something the driver does, not the framework.
So move the parsing of this property into the driver.

There are no in kernel users of this property, so moving it does not
break anything. There is a board which will make use of this property
making its way into the kernel.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 Documentation/devicetree/bindings/net/dsa/dsa.txt     |  2 --
 Documentation/devicetree/bindings/net/dsa/marvell.txt | 10 ++++++++++
 drivers/net/dsa/mv88e6xxx.c                           | 14 +++++++++++++-
 drivers/net/dsa/mv88e6xxx.h                           |  7 +++++++
 include/net/dsa.h                                     |  8 --------
 net/dsa/dsa.c                                         | 16 ----------------
 6 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index 5fdbbcdf8c4b..9f4807f90c31 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -31,8 +31,6 @@ A switch child node has the following optional property:
 			  switch. Must be set if the switch can not detect
 			  the presence and/or size of a connected EEPROM,
 			  otherwise optional.
-- reset-gpios		: phandle and specifier to a gpio line connected to
-			  reset pin of the switch chip.
 
 A switch may have multiple "port" children nodes
 
diff --git a/Documentation/devicetree/bindings/net/dsa/marvell.txt b/Documentation/devicetree/bindings/net/dsa/marvell.txt
index 51b7cd9408f2..301416e94513 100644
--- a/Documentation/devicetree/bindings/net/dsa/marvell.txt
+++ b/Documentation/devicetree/bindings/net/dsa/marvell.txt
@@ -10,12 +10,20 @@ If you need a stable binding, use the old dsa.txt binding.
 Marvell Switches are MDIO devices. The following properties should be
 placed as a child node of an mdio device.
 
+The properties described here are those specific to Marvell devices.
+Additional required and optional properties can be found in dsa2.txt.
+
 Required properties:
 - compatible		: Should be one of "marvell,mv88e6123",
 			  "marvell,mv88e6131", "marvell,mv88e6171",
 			  "marvell,mv88e6352" or "marvell,mv88e6060"
 - reg			: Address on the MII bus for the switch.
 
+Optional properties:
+
+- reset-gpios		: Should be a gpio specifier for a reset line
+
+Optional Properties
 Example:
 
 	mdio {
@@ -25,5 +33,7 @@ Example:
 		switch0: switch@1 {
 			reg = <0>;
 			compatible = "marvell,mv88e6131";
+
+			reset-gpios = <&gpio5 1 GPIO_ACTIVE_LOW>;
 		};
 	};
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 9e906e0459c7..ef29a94da975 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2830,7 +2830,7 @@ int mv88e6xxx_switch_reset(struct dsa_switch *ds, bool ppu_active)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
 	u16 is_reset = (ppu_active ? 0x8800 : 0xc800);
-	struct gpio_desc *gpiod = ds->pd->reset;
+	struct gpio_desc *gpiod = ps->reset;
 	unsigned long timeout;
 	int ret;
 	int i;
@@ -3178,6 +3178,7 @@ int mv88e6xxx_probe(struct mdio_device *mdiodev, struct dsa_switch_driver *ops,
 	struct mv88e6xxx_priv_state *ps;
 	struct dsa_switch *ds;
 	const char *name;
+	int err;
 
 	ds = devm_kzalloc(dev, sizeof(*ds) + sizeof(*ps), GFP_KERNEL);
 	if (!ds)
@@ -3199,6 +3200,17 @@ int mv88e6xxx_probe(struct mdio_device *mdiodev, struct dsa_switch_driver *ops,
 		return -ENODEV;
 	}
 
+	ps->reset = devm_gpiod_get(&mdiodev->dev, "reset", GPIOD_ASIS);
+	err = PTR_ERR(ps->reset);
+	if (err) {
+		if (err == -ENOENT) {
+			/* Optional, so not an error */
+			ps->reset = NULL;
+		} else {
+			return err;
+		}
+	}
+
 	dev_set_drvdata(dev, ds);
 
 	return 0;
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 3bdc0fdf692f..739d3ff1bddf 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -12,6 +12,7 @@
 #define __MV88E6XXX_H
 
 #include <linux/if_vlan.h>
+#include <linux/gpio/consumer.h>
 
 #ifndef UINT64_MAX
 #define UINT64_MAX		(u64)(~((u64)0))
@@ -446,6 +447,12 @@ struct mv88e6xxx_priv_state {
 	DECLARE_BITMAP(port_state_update_mask, DSA_MAX_PORTS);
 
 	struct work_struct bridge_work;
+
+	/* A switch may have a GPIO line tied to its reset pin. Parse
+	 * this from the device tree, and use it before performing
+	 * switch soft reset.
+	 */
+	struct gpio_desc *reset;
 };
 
 enum stat_type {
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 689ebd3542ba..b54a2ed1002c 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -16,7 +16,6 @@
 #include <linux/timer.h>
 #include <linux/workqueue.h>
 #include <linux/of.h>
-#include <linux/of_gpio.h>
 #include <linux/phy.h>
 #include <linux/phy_fixed.h>
 #include <linux/ethtool.h>
@@ -65,13 +64,6 @@ struct dsa_chip_data {
 	 * NULL if there is only one switch chip.
 	 */
 	s8		*rtable;
-
-	/*
-	 * A switch may have a GPIO line tied to its reset pin. Parse
-	 * this from the device tree, and use it before performing
-	 * switch soft reset.
-	 */
-	struct gpio_desc *reset;
 };
 
 struct dsa_platform_data {
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 60ea98481806..1e2238e71ea7 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -660,9 +660,6 @@ static int dsa_of_probe(struct device *dev)
 	const char *port_name;
 	int chip_index, port_index;
 	const unsigned int *sw_addr, *port_reg;
-	int gpio;
-	enum of_gpio_flags of_flags;
-	unsigned long flags;
 	u32 eeprom_len;
 	int ret;
 
@@ -741,19 +738,6 @@ static int dsa_of_probe(struct device *dev)
 			put_device(cd->host_dev);
 			cd->host_dev = &mdio_bus_switch->dev;
 		}
-		gpio = of_get_named_gpio_flags(child, "reset-gpios", 0,
-					       &of_flags);
-		if (gpio_is_valid(gpio)) {
-			flags = (of_flags == OF_GPIO_ACTIVE_LOW ?
-				 GPIOF_ACTIVE_LOW : 0);
-			ret = devm_gpio_request_one(dev, gpio, flags,
-						    "switch_reset");
-			if (ret)
-				goto out_free_chip;
-
-			cd->reset = gpio_to_desc(gpio);
-			gpiod_direction_output(cd->reset, 0);
-		}
 
 		for_each_available_child_of_node(child, port) {
 			port_reg = of_get_property(port, "reg", NULL);
-- 
2.7.0

^ permalink raw reply related

* [PATCH net-next 5/8] dsa: mv88e6xxx: Kill the REG_READ and REG_WRITE macros
From: Andrew Lunn @ 2016-04-13 23:59 UTC (permalink / raw)
  To: David Miller; +Cc: Florian Fainelli, netdev, Vivien Didelot, Andrew Lunn
In-Reply-To: <1460591998-20598-1-git-send-email-andrew@lunn.ch>

These macros hide a ds variable and a return statement on error, which
can lead to locking issues. Kill them off.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/mv88e6123.c |  13 ++-
 drivers/net/dsa/mv88e6131.c |  41 ++++----
 drivers/net/dsa/mv88e6171.c |  16 +--
 drivers/net/dsa/mv88e6352.c |  15 +--
 drivers/net/dsa/mv88e6xxx.c | 232 +++++++++++++++++++++++++++++++-------------
 drivers/net/dsa/mv88e6xxx.h |  19 ----
 6 files changed, 217 insertions(+), 119 deletions(-)

diff --git a/drivers/net/dsa/mv88e6123.c b/drivers/net/dsa/mv88e6123.c
index 7ef0e3c7b703..450dad5af568 100644
--- a/drivers/net/dsa/mv88e6123.c
+++ b/drivers/net/dsa/mv88e6123.c
@@ -54,7 +54,9 @@ static int mv88e6123_setup_global(struct dsa_switch *ds)
 	 * external PHYs to poll), don't discard packets with
 	 * excessive collisions, and mask all interrupt sources.
 	 */
-	REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL, 0x0000);
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_CONTROL, 0x0000);
+	if (ret)
+		return ret;
 
 	/* Configure the upstream port, and configure the upstream
 	 * port as the port to which ingress and egress monitor frames
@@ -63,14 +65,15 @@ static int mv88e6123_setup_global(struct dsa_switch *ds)
 	reg = upstream_port << GLOBAL_MONITOR_CONTROL_INGRESS_SHIFT |
 		upstream_port << GLOBAL_MONITOR_CONTROL_EGRESS_SHIFT |
 		upstream_port << GLOBAL_MONITOR_CONTROL_ARP_SHIFT;
-	REG_WRITE(REG_GLOBAL, GLOBAL_MONITOR_CONTROL, reg);
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_MONITOR_CONTROL, reg);
+	if (ret)
+		return ret;
 
 	/* Disable remote management for now, and set the switch's
 	 * DSA device number.
 	 */
-	REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL_2, ds->index & 0x1f);
-
-	return 0;
+	return mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_CONTROL_2,
+				   ds->index & 0x1f);
 }
 
 static int mv88e6123_setup(struct dsa_switch *ds)
diff --git a/drivers/net/dsa/mv88e6131.c b/drivers/net/dsa/mv88e6131.c
index 25f35937787e..e1e2410a660c 100644
--- a/drivers/net/dsa/mv88e6131.c
+++ b/drivers/net/dsa/mv88e6131.c
@@ -51,11 +51,16 @@ static int mv88e6131_setup_global(struct dsa_switch *ds)
 	 * to arbitrate between packet queues, set the maximum frame
 	 * size to 1632, and mask all interrupt sources.
 	 */
-	REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL,
-		  GLOBAL_CONTROL_PPU_ENABLE | GLOBAL_CONTROL_MAX_FRAME_1632);
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_CONTROL,
+				  GLOBAL_CONTROL_PPU_ENABLE |
+				  GLOBAL_CONTROL_MAX_FRAME_1632);
+	if (ret)
+		return ret;
 
 	/* Set the VLAN ethertype to 0x8100. */
-	REG_WRITE(REG_GLOBAL, GLOBAL_CORE_TAG_TYPE, 0x8100);
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_CORE_TAG_TYPE, 0x8100);
+	if (ret)
+		return ret;
 
 	/* Disable ARP mirroring, and configure the upstream port as
 	 * the port to which ingress and egress monitor frames are to
@@ -64,31 +69,33 @@ static int mv88e6131_setup_global(struct dsa_switch *ds)
 	reg = upstream_port << GLOBAL_MONITOR_CONTROL_INGRESS_SHIFT |
 		upstream_port << GLOBAL_MONITOR_CONTROL_EGRESS_SHIFT |
 		GLOBAL_MONITOR_CONTROL_ARP_DISABLED;
-	REG_WRITE(REG_GLOBAL, GLOBAL_MONITOR_CONTROL, reg);
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_MONITOR_CONTROL, reg);
+	if (ret)
+		return ret;
 
 	/* Disable cascade port functionality unless this device
 	 * is used in a cascade configuration, and set the switch's
 	 * DSA device number.
 	 */
 	if (ds->dst->pd->nr_chips > 1)
-		REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL_2,
-			  GLOBAL_CONTROL_2_MULTIPLE_CASCADE |
-			  (ds->index & 0x1f));
+		ret = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_CONTROL_2,
+					  GLOBAL_CONTROL_2_MULTIPLE_CASCADE |
+					  (ds->index & 0x1f));
 	else
-		REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL_2,
-			  GLOBAL_CONTROL_2_NO_CASCADE |
-			  (ds->index & 0x1f));
+		ret = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_CONTROL_2,
+					  GLOBAL_CONTROL_2_NO_CASCADE |
+					  (ds->index & 0x1f));
+	if (ret)
+		return ret;
 
 	/* Force the priority of IGMP/MLD snoop frames and ARP frames
 	 * to the highest setting.
 	 */
-	REG_WRITE(REG_GLOBAL2, GLOBAL2_PRIO_OVERRIDE,
-		  GLOBAL2_PRIO_OVERRIDE_FORCE_SNOOP |
-		  7 << GLOBAL2_PRIO_OVERRIDE_SNOOP_SHIFT |
-		  GLOBAL2_PRIO_OVERRIDE_FORCE_ARP |
-		  7 << GLOBAL2_PRIO_OVERRIDE_ARP_SHIFT);
-
-	return 0;
+	return mv88e6xxx_reg_write(ds, REG_GLOBAL2, GLOBAL2_PRIO_OVERRIDE,
+				   GLOBAL2_PRIO_OVERRIDE_FORCE_SNOOP |
+				   7 << GLOBAL2_PRIO_OVERRIDE_SNOOP_SHIFT |
+				   GLOBAL2_PRIO_OVERRIDE_FORCE_ARP |
+				   7 << GLOBAL2_PRIO_OVERRIDE_ARP_SHIFT);
 }
 
 static int mv88e6131_setup(struct dsa_switch *ds)
diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index abf2a9bbbec5..e3998be13515 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -48,8 +48,11 @@ static int mv88e6171_setup_global(struct dsa_switch *ds)
 	/* Discard packets with excessive collisions, mask all
 	 * interrupt sources, enable PPU.
 	 */
-	REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL,
-		  GLOBAL_CONTROL_PPU_ENABLE | GLOBAL_CONTROL_DISCARD_EXCESS);
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_CONTROL,
+				  GLOBAL_CONTROL_PPU_ENABLE |
+				  GLOBAL_CONTROL_DISCARD_EXCESS);
+	if (ret)
+		return ret;
 
 	/* Configure the upstream port, and configure the upstream
 	 * port as the port to which ingress and egress monitor frames
@@ -59,14 +62,15 @@ static int mv88e6171_setup_global(struct dsa_switch *ds)
 		upstream_port << GLOBAL_MONITOR_CONTROL_EGRESS_SHIFT |
 		upstream_port << GLOBAL_MONITOR_CONTROL_ARP_SHIFT |
 		upstream_port << GLOBAL_MONITOR_CONTROL_MIRROR_SHIFT;
-	REG_WRITE(REG_GLOBAL, GLOBAL_MONITOR_CONTROL, reg);
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_MONITOR_CONTROL, reg);
+	if (ret)
+		return ret;
 
 	/* Disable remote management for now, and set the switch's
 	 * DSA device number.
 	 */
-	REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL_2, ds->index & 0x1f);
-
-	return 0;
+	return mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_CONTROL_2,
+				   ds->index & 0x1f);
 }
 
 static int mv88e6171_setup(struct dsa_switch *ds)
diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index 22d42fb51991..baeb594b63cc 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -60,8 +60,11 @@ static int mv88e6352_setup_global(struct dsa_switch *ds)
 	/* Discard packets with excessive collisions,
 	 * mask all interrupt sources, enable PPU (bit 14, undocumented).
 	 */
-	REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL,
-		  GLOBAL_CONTROL_PPU_ENABLE | GLOBAL_CONTROL_DISCARD_EXCESS);
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_CONTROL,
+				  GLOBAL_CONTROL_PPU_ENABLE |
+				  GLOBAL_CONTROL_DISCARD_EXCESS);
+	if (ret)
+		return ret;
 
 	/* Configure the upstream port, and configure the upstream
 	 * port as the port to which ingress and egress monitor frames
@@ -70,14 +73,14 @@ static int mv88e6352_setup_global(struct dsa_switch *ds)
 	reg = upstream_port << GLOBAL_MONITOR_CONTROL_INGRESS_SHIFT |
 		upstream_port << GLOBAL_MONITOR_CONTROL_EGRESS_SHIFT |
 		upstream_port << GLOBAL_MONITOR_CONTROL_ARP_SHIFT;
-	REG_WRITE(REG_GLOBAL, GLOBAL_MONITOR_CONTROL, reg);
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_MONITOR_CONTROL, reg);
+	if (ret)
+		return ret;
 
 	/* Disable remote management for now, and set the switch's
 	 * DSA device number.
 	 */
-	REG_WRITE(REG_GLOBAL, 0x1c, ds->index & 0x1f);
-
-	return 0;
+	return mv88e6xxx_reg_write(ds, REG_GLOBAL, 0x1c, ds->index & 0x1f);
 }
 
 static int mv88e6352_setup(struct dsa_switch *ds)
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index ef29a94da975..93110460e0c3 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -182,29 +182,47 @@ EXPORT_SYMBOL_GPL(mv88e6xxx_reg_write);
 
 int mv88e6xxx_set_addr_direct(struct dsa_switch *ds, u8 *addr)
 {
-	REG_WRITE(REG_GLOBAL, GLOBAL_MAC_01, (addr[0] << 8) | addr[1]);
-	REG_WRITE(REG_GLOBAL, GLOBAL_MAC_23, (addr[2] << 8) | addr[3]);
-	REG_WRITE(REG_GLOBAL, GLOBAL_MAC_45, (addr[4] << 8) | addr[5]);
+	int err;
 
-	return 0;
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_MAC_01,
+				  (addr[0] << 8) | addr[1]);
+	if (err)
+		return err;
+
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_MAC_23,
+				  (addr[2] << 8) | addr[3]);
+	if (err)
+		return err;
+
+	return mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_MAC_45,
+				   (addr[4] << 8) | addr[5]);
 }
 EXPORT_SYMBOL_GPL(mv88e6xxx_set_addr_direct);
 
 int mv88e6xxx_set_addr_indirect(struct dsa_switch *ds, u8 *addr)
 {
+	int ret, err;
 	int i;
-	int ret;
 
 	for (i = 0; i < 6; i++) {
 		int j;
 
 		/* Write the MAC address byte. */
-		REG_WRITE(REG_GLOBAL2, GLOBAL2_SWITCH_MAC,
-			  GLOBAL2_SWITCH_MAC_BUSY | (i << 8) | addr[i]);
+		err = mv88e6xxx_reg_write(ds, REG_GLOBAL2, GLOBAL2_SWITCH_MAC,
+					  GLOBAL2_SWITCH_MAC_BUSY |
+					  (i << 8) | addr[i]);
+		if (err)
+			return err;
 
 		/* Wait for the write to complete. */
 		for (j = 0; j < 16; j++) {
-			ret = REG_READ(REG_GLOBAL2, GLOBAL2_SWITCH_MAC);
+			ret = mv88e6xxx_reg_read(ds, REG_GLOBAL2,
+						 GLOBAL2_SWITCH_MAC);
+			if (ret < 0)
+				return ret;
+
+			if (ret < 0)
+				return ret;
 			if ((ret & GLOBAL2_SWITCH_MAC_BUSY) == 0)
 				break;
 		}
@@ -237,13 +255,21 @@ static int mv88e6xxx_ppu_disable(struct dsa_switch *ds)
 	int ret;
 	unsigned long timeout;
 
-	ret = REG_READ(REG_GLOBAL, GLOBAL_CONTROL);
-	REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL,
-		  ret & ~GLOBAL_CONTROL_PPU_ENABLE);
+	ret = mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_CONTROL);
+	if (ret < 0)
+		return ret;
+
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_CONTROL,
+				  ret & ~GLOBAL_CONTROL_PPU_ENABLE);
+	if (ret)
+		return ret;
 
 	timeout = jiffies + 1 * HZ;
 	while (time_before(jiffies, timeout)) {
-		ret = REG_READ(REG_GLOBAL, GLOBAL_STATUS);
+		ret = mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_STATUS);
+		if (ret < 0)
+			return ret;
+
 		usleep_range(1000, 2000);
 		if ((ret & GLOBAL_STATUS_PPU_MASK) !=
 		    GLOBAL_STATUS_PPU_POLLING)
@@ -255,15 +281,24 @@ static int mv88e6xxx_ppu_disable(struct dsa_switch *ds)
 
 static int mv88e6xxx_ppu_enable(struct dsa_switch *ds)
 {
-	int ret;
+	int ret, err;
 	unsigned long timeout;
 
-	ret = REG_READ(REG_GLOBAL, GLOBAL_CONTROL);
-	REG_WRITE(REG_GLOBAL, GLOBAL_CONTROL, ret | GLOBAL_CONTROL_PPU_ENABLE);
+	ret = mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_CONTROL);
+	if (ret < 0)
+		return ret;
+
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_CONTROL,
+				  ret | GLOBAL_CONTROL_PPU_ENABLE);
+	if (err)
+		return err;
 
 	timeout = jiffies + 1 * HZ;
 	while (time_before(jiffies, timeout)) {
-		ret = REG_READ(REG_GLOBAL, GLOBAL_STATUS);
+		ret = mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_STATUS);
+		if (ret < 0)
+			return ret;
+
 		usleep_range(1000, 2000);
 		if ((ret & GLOBAL_STATUS_PPU_MASK) ==
 		    GLOBAL_STATUS_PPU_POLLING)
@@ -2697,7 +2732,8 @@ int mv88e6xxx_setup_common(struct dsa_switch *ds)
 	ps->ds = ds;
 	mutex_init(&ps->smi_mutex);
 
-	ps->id = REG_READ(REG_PORT(0), PORT_SWITCH_ID) & 0xfff0;
+	ps->id = __mv88e6xxx_reg_read(ps->bus, ps->sw_addr, REG_PORT(0),
+				      PORT_SWITCH_ID) & 0xfff0;
 
 	INIT_WORK(&ps->bridge_work, mv88e6xxx_bridge_work);
 
@@ -2708,42 +2744,66 @@ EXPORT_SYMBOL_GPL(mv88e6xxx_setup_common);
 int mv88e6xxx_setup_global(struct dsa_switch *ds)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-	int ret;
+	int err;
 	int i;
 
 	/* Set the default address aging time to 5 minutes, and
 	 * enable address learn messages to be sent to all message
 	 * ports.
 	 */
-	REG_WRITE(REG_GLOBAL, GLOBAL_ATU_CONTROL,
-		  0x0140 | GLOBAL_ATU_CONTROL_LEARN2ALL);
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_ATU_CONTROL,
+				  0x0140 | GLOBAL_ATU_CONTROL_LEARN2ALL);
+	if (err)
+		return err;
 
 	/* Configure the IP ToS mapping registers. */
-	REG_WRITE(REG_GLOBAL, GLOBAL_IP_PRI_0, 0x0000);
-	REG_WRITE(REG_GLOBAL, GLOBAL_IP_PRI_1, 0x0000);
-	REG_WRITE(REG_GLOBAL, GLOBAL_IP_PRI_2, 0x5555);
-	REG_WRITE(REG_GLOBAL, GLOBAL_IP_PRI_3, 0x5555);
-	REG_WRITE(REG_GLOBAL, GLOBAL_IP_PRI_4, 0xaaaa);
-	REG_WRITE(REG_GLOBAL, GLOBAL_IP_PRI_5, 0xaaaa);
-	REG_WRITE(REG_GLOBAL, GLOBAL_IP_PRI_6, 0xffff);
-	REG_WRITE(REG_GLOBAL, GLOBAL_IP_PRI_7, 0xffff);
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_IP_PRI_0, 0x0000);
+	if (err)
+		return err;
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_IP_PRI_1, 0x0000);
+	if (err)
+		return err;
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_IP_PRI_2, 0x5555);
+	if (err)
+		return err;
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_IP_PRI_3, 0x5555);
+	if (err)
+		return err;
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_IP_PRI_4, 0xaaaa);
+	if (err)
+		return err;
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_IP_PRI_5, 0xaaaa);
+	if (err)
+		return err;
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_IP_PRI_6, 0xffff);
+	if (err)
+		return err;
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_IP_PRI_7, 0xffff);
+	if (err)
+		return err;
 
 	/* Configure the IEEE 802.1p priority mapping register. */
-	REG_WRITE(REG_GLOBAL, GLOBAL_IEEE_PRI, 0xfa41);
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_IEEE_PRI, 0xfa41);
+	if (err)
+		return err;
 
 	/* Send all frames with destination addresses matching
 	 * 01:80:c2:00:00:0x to the CPU port.
 	 */
-	REG_WRITE(REG_GLOBAL2, GLOBAL2_MGMT_EN_0X, 0xffff);
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL2, GLOBAL2_MGMT_EN_0X, 0xffff);
+	if (err)
+		return err;
 
 	/* Ignore removed tag data on doubly tagged packets, disable
 	 * flow control messages, force flow control priority to the
 	 * highest, and send all special multicast frames to the CPU
 	 * port at the highest priority.
 	 */
-	REG_WRITE(REG_GLOBAL2, GLOBAL2_SWITCH_MGMT,
-		  0x7 | GLOBAL2_SWITCH_MGMT_RSVD2CPU | 0x70 |
-		  GLOBAL2_SWITCH_MGMT_FORCE_FLOW_CTRL_PRI);
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL2, GLOBAL2_SWITCH_MGMT,
+				  0x7 | GLOBAL2_SWITCH_MGMT_RSVD2CPU | 0x70 |
+				  GLOBAL2_SWITCH_MGMT_FORCE_FLOW_CTRL_PRI);
+	if (err)
+		return err;
 
 	/* Program the DSA routing table. */
 	for (i = 0; i < 32; i++) {
@@ -2753,23 +2813,35 @@ int mv88e6xxx_setup_global(struct dsa_switch *ds)
 		    i != ds->index && i < ds->dst->pd->nr_chips)
 			nexthop = ds->pd->rtable[i] & 0x1f;
 
-		REG_WRITE(REG_GLOBAL2, GLOBAL2_DEVICE_MAPPING,
-			  GLOBAL2_DEVICE_MAPPING_UPDATE |
-			  (i << GLOBAL2_DEVICE_MAPPING_TARGET_SHIFT) |
-			  nexthop);
+		err = mv88e6xxx_reg_write(
+			ds, REG_GLOBAL2,
+			GLOBAL2_DEVICE_MAPPING,
+			GLOBAL2_DEVICE_MAPPING_UPDATE |
+			(i << GLOBAL2_DEVICE_MAPPING_TARGET_SHIFT) | nexthop);
+	if (err)
+		return err;
 	}
 
 	/* Clear all trunk masks. */
-	for (i = 0; i < 8; i++)
-		REG_WRITE(REG_GLOBAL2, GLOBAL2_TRUNK_MASK,
-			  0x8000 | (i << GLOBAL2_TRUNK_MASK_NUM_SHIFT) |
-			  ((1 << ps->num_ports) - 1));
+	for (i = 0; i < 8; i++) {
+		err = mv88e6xxx_reg_write(ds, REG_GLOBAL2, GLOBAL2_TRUNK_MASK,
+					  0x8000 |
+					  (i << GLOBAL2_TRUNK_MASK_NUM_SHIFT) |
+					  ((1 << ps->num_ports) - 1));
+		if (err)
+			return err;
+	}
 
 	/* Clear all trunk mappings. */
-	for (i = 0; i < 16; i++)
-		REG_WRITE(REG_GLOBAL2, GLOBAL2_TRUNK_MAPPING,
-			  GLOBAL2_TRUNK_MAPPING_UPDATE |
-			  (i << GLOBAL2_TRUNK_MAPPING_ID_SHIFT));
+	for (i = 0; i < 16; i++) {
+		err = mv88e6xxx_reg_write(
+			ds, REG_GLOBAL2,
+			GLOBAL2_TRUNK_MAPPING,
+			GLOBAL2_TRUNK_MAPPING_UPDATE |
+			(i << GLOBAL2_TRUNK_MAPPING_ID_SHIFT));
+		if (err)
+			return err;
+	}
 
 	if (mv88e6xxx_6352_family(ds) || mv88e6xxx_6351_family(ds) ||
 	    mv88e6xxx_6165_family(ds) || mv88e6xxx_6097_family(ds) ||
@@ -2777,17 +2849,27 @@ int mv88e6xxx_setup_global(struct dsa_switch *ds)
 		/* Send all frames with destination addresses matching
 		 * 01:80:c2:00:00:2x to the CPU port.
 		 */
-		REG_WRITE(REG_GLOBAL2, GLOBAL2_MGMT_EN_2X, 0xffff);
+		err = mv88e6xxx_reg_write(ds, REG_GLOBAL2,
+					  GLOBAL2_MGMT_EN_2X, 0xffff);
+		if (err)
+			return err;
 
 		/* Initialise cross-chip port VLAN table to reset
 		 * defaults.
 		 */
-		REG_WRITE(REG_GLOBAL2, GLOBAL2_PVT_ADDR, 0x9000);
+		err = mv88e6xxx_reg_write(ds, REG_GLOBAL2,
+					  GLOBAL2_PVT_ADDR, 0x9000);
+		if (err)
+			return err;
 
 		/* Clear the priority override table. */
-		for (i = 0; i < 16; i++)
-			REG_WRITE(REG_GLOBAL2, GLOBAL2_PRIO_OVERRIDE,
-				  0x8000 | (i << 8));
+		for (i = 0; i < 16; i++) {
+			err = mv88e6xxx_reg_write(ds, REG_GLOBAL2,
+						  GLOBAL2_PRIO_OVERRIDE,
+						  0x8000 | (i << 8));
+			if (err)
+				return err;
+		}
 	}
 
 	if (mv88e6xxx_6352_family(ds) || mv88e6xxx_6351_family(ds) ||
@@ -2798,31 +2880,38 @@ int mv88e6xxx_setup_global(struct dsa_switch *ds)
 		 * ingress rate limit registers to their initial
 		 * state.
 		 */
-		for (i = 0; i < ps->num_ports; i++)
-			REG_WRITE(REG_GLOBAL2, GLOBAL2_INGRESS_OP,
-				  0x9000 | (i << 8));
+		for (i = 0; i < ps->num_ports; i++) {
+			err = mv88e6xxx_reg_write(ds, REG_GLOBAL2,
+						  GLOBAL2_INGRESS_OP,
+						  0x9000 | (i << 8));
+			if (err)
+				return err;
+		}
 	}
 
 	/* Clear the statistics counters for all ports */
-	REG_WRITE(REG_GLOBAL, GLOBAL_STATS_OP, GLOBAL_STATS_OP_FLUSH_ALL);
+	err = mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_STATS_OP,
+				  GLOBAL_STATS_OP_FLUSH_ALL);
+	if (err)
+		return err;
 
 	/* Wait for the flush to complete. */
 	mutex_lock(&ps->smi_mutex);
-	ret = _mv88e6xxx_stats_wait(ds);
-	if (ret < 0)
+	err = _mv88e6xxx_stats_wait(ds);
+	if (err < 0)
 		goto unlock;
 
 	/* Clear all ATU entries */
-	ret = _mv88e6xxx_atu_flush(ds, 0, true);
-	if (ret < 0)
+	err = _mv88e6xxx_atu_flush(ds, 0, true);
+	if (err < 0)
 		goto unlock;
 
 	/* Clear all the VTU and STU entries */
-	ret = _mv88e6xxx_vtu_stu_flush(ds);
+	err = _mv88e6xxx_vtu_stu_flush(ds);
 unlock:
 	mutex_unlock(&ps->smi_mutex);
 
-	return ret;
+	return err;
 }
 EXPORT_SYMBOL_GPL(mv88e6xxx_setup_global);
 
@@ -2832,13 +2921,19 @@ int mv88e6xxx_switch_reset(struct dsa_switch *ds, bool ppu_active)
 	u16 is_reset = (ppu_active ? 0x8800 : 0xc800);
 	struct gpio_desc *gpiod = ps->reset;
 	unsigned long timeout;
-	int ret;
+	int ret, err;
 	int i;
 
 	/* Set all ports to the disabled state. */
 	for (i = 0; i < ps->num_ports; i++) {
-		ret = REG_READ(REG_PORT(i), PORT_CONTROL);
-		REG_WRITE(REG_PORT(i), PORT_CONTROL, ret & 0xfffc);
+		ret = mv88e6xxx_reg_read(ds, REG_PORT(i), PORT_CONTROL);
+		if (ret < 0)
+			return ret;
+
+		err = mv88e6xxx_reg_write(ds, REG_PORT(i), PORT_CONTROL,
+					  ret & 0xfffc);
+		if (err)
+			return err;
 	}
 
 	/* Wait for transmit queues to drain. */
@@ -2857,14 +2952,19 @@ int mv88e6xxx_switch_reset(struct dsa_switch *ds, bool ppu_active)
 	 * through global registers 0x18 and 0x19.
 	 */
 	if (ppu_active)
-		REG_WRITE(REG_GLOBAL, 0x04, 0xc000);
+		err = mv88e6xxx_reg_write(ds, REG_GLOBAL, 0x04, 0xc000);
 	else
-		REG_WRITE(REG_GLOBAL, 0x04, 0xc400);
+		err = mv88e6xxx_reg_write(ds, REG_GLOBAL, 0x04, 0xc400);
+	if (err)
+		return err;
 
 	/* Wait up to one second for reset to complete. */
 	timeout = jiffies + 1 * HZ;
 	while (time_before(jiffies, timeout)) {
-		ret = REG_READ(REG_GLOBAL, 0x00);
+		ret = mv88e6xxx_reg_read(ds, REG_GLOBAL, 0x00);
+		if (ret < 0)
+			return ret;
+
 		if ((ret & is_reset) == is_reset)
 			break;
 		usleep_range(1000, 2000);
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 739d3ff1bddf..6d1b6207144d 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -554,25 +554,6 @@ extern struct dsa_switch_driver mv88e6123_switch_driver;
 extern struct dsa_switch_driver mv88e6352_switch_driver;
 extern struct dsa_switch_driver mv88e6171_switch_driver;
 
-#define REG_READ(addr, reg)						\
-	({								\
-		int __ret;						\
-									\
-		__ret = mv88e6xxx_reg_read(ds, addr, reg);		\
-		if (__ret < 0)						\
-			return __ret;					\
-		__ret;							\
-	})
-
-#define REG_WRITE(addr, reg, val)					\
-	({								\
-		int __ret;						\
-									\
-		__ret = mv88e6xxx_reg_write(ds, addr, reg, val);	\
-		if (__ret < 0)						\
-			return __ret;					\
-	})
-
 /**
  * mv88e6xxx_module_driver() - Helper macro for registering mv88e6xxx drivers
  *
-- 
2.7.0

^ permalink raw reply related

* [PATCH net-next 8/8] dsa: mv88e6131: Don't special case a single device
From: Andrew Lunn @ 2016-04-13 23:59 UTC (permalink / raw)
  To: David Miller; +Cc: Florian Fainelli, netdev, Vivien Didelot, Andrew Lunn
In-Reply-To: <1460591998-20598-1-git-send-email-andrew@lunn.ch>

When multiple switches are interconnected in a tree, a routing table
is used for directing packets out the correct port towards a remote
destination. This works equally well for a single switch, since the
routing table is empty. So don't special case for a single switch.
This makes the mv88e6131 behave like the other drivers.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/dsa/mv88e6131.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/mv88e6131.c b/drivers/net/dsa/mv88e6131.c
index 969f67c77835..c7ee11f73be6 100644
--- a/drivers/net/dsa/mv88e6131.c
+++ b/drivers/net/dsa/mv88e6131.c
@@ -74,18 +74,12 @@ static int mv88e6131_setup_global(struct dsa_switch *ds)
 	if (ret)
 		return ret;
 
-	/* Disable cascade port functionality unless this device
-	 * is used in a cascade configuration, and set the switch's
-	 * DSA device number.
+	/* Set the switch's DSA device number and enable the use of
+	 * the routing table.
 	 */
-	if (ds->dst->pd->nr_chips > 1)
-		ret = mv88e6xxx_reg_write(ps, REG_GLOBAL, GLOBAL_CONTROL_2,
-					  GLOBAL_CONTROL_2_MULTIPLE_CASCADE |
-					  (ds->index & 0x1f));
-	else
-		ret = mv88e6xxx_reg_write(ps, REG_GLOBAL, GLOBAL_CONTROL_2,
-					  GLOBAL_CONTROL_2_NO_CASCADE |
-					  (ds->index & 0x1f));
+	ret = mv88e6xxx_reg_write(ps, REG_GLOBAL, GLOBAL_CONTROL_2,
+				  GLOBAL_CONTROL_2_MULTIPLE_CASCADE |
+				  (ds->index & 0x1f));
 	if (ret)
 		return ret;
 
-- 
2.7.0

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox