Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] net: fec: turn on device when extracting statistics
From: Nikita Yushchenko @ 2016-11-28  7:06 UTC (permalink / raw)
  To: David Miller
  Cc: fugang.duan, troy.kisky, andrew, eric, tremyfr, johannes, netdev,
	linux-kernel, cphealy, Fabio Estevam
In-Reply-To: <20161127.202945.1759992980026862076.davem@davemloft.net>



28.11.2016 04:29, David Miller пишет:
> From: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
> Date: Fri, 25 Nov 2016 13:02:00 +0300
> 
>> +	int i, ret;
>> +
>> +	ret = pm_runtime_get_sync(&fep->pdev->dev);
>> +	if (IS_ERR_VALUE(ret)) {
>> +		memset(data, 0, sizeof(*data) * ARRAY_SIZE(fec_stats));
>> +		return;
>> +	}
> 
> This really isn't the way to do this.
> 
> When the device is suspended and the clocks are going to be stopped,
> you must fetch the statistic values into a software copy and provide
> those if the device is suspended when statistics are requested.

Ok, can do that, although can't see what's wrong with waking device
here. The situation of requesting stats on down device isn't something
widely used, thus keeping handling of that as local as possible looks
better for me.

^ permalink raw reply

* [PATCH net-next v2 0/6] tcp: sender chronographs instrumentation
From: Yuchung Cheng @ 2016-11-28  7:07 UTC (permalink / raw)
  To: davem, soheil, francisyyan; +Cc: netdev, ncardwell, edumazet, Yuchung Cheng

This patch set provides instrumentation on TCP sender limitations.
While developing the BBR congestion control, we noticed that TCP
sending process is often limited by factors unrelated to congestion
control: insufficient sender buffer and/or insufficient receive
window/buffer to saturate the network bandwidth. Unfortunately these
limits are not visible to the users and often the poor performance
is attributed to the congestion control of choice.

Thie patch aims to help users get the high level understanding of
where sending process is limited by, similar to the TCP_INFO design.
It is not to replace detailed kernel tracing and instrumentation
facilities.

In addition this patch set provide a new option to the timestamping
work to instrument these limits on application data unit. For exampe,
one can use SO_TIMESTAMPING and this patch set to measure the how
long a particular HTTP response is limited by small receive window.

Patch set was initially written by Francis Yan then polished
by Yuchung Cheng, with lots of help from Eric Dumazet and Soheil
Hassas Yeganeh.

Francis Yan (6):
  tcp: instrument tcp sender limits chronographs
  tcp: instrument how long TCP is busy sending
  tcp: instrument how long TCP is limited by receive window
  tcp: instrument how long TCP is limited by insufficient send buffer
  tcp: export sender limits chronographs to TCP_INFO
  tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING

 Documentation/networking/timestamping.txt | 10 +++++
 arch/alpha/include/uapi/asm/socket.h      |  2 +
 arch/frv/include/uapi/asm/socket.h        |  2 +
 arch/ia64/include/uapi/asm/socket.h       |  2 +
 arch/m32r/include/uapi/asm/socket.h       |  2 +
 arch/mips/include/uapi/asm/socket.h       |  2 +
 arch/mn10300/include/uapi/asm/socket.h    |  2 +
 arch/parisc/include/uapi/asm/socket.h     |  2 +
 arch/powerpc/include/uapi/asm/socket.h    |  2 +
 arch/s390/include/uapi/asm/socket.h       |  2 +
 arch/sparc/include/uapi/asm/socket.h      |  2 +
 arch/xtensa/include/uapi/asm/socket.h     |  2 +
 include/linux/tcp.h                       |  9 ++++-
 include/net/tcp.h                         | 20 +++++++++-
 include/uapi/asm-generic/socket.h         |  2 +
 include/uapi/linux/net_tstamp.h           |  3 +-
 include/uapi/linux/tcp.h                  | 12 ++++++
 net/core/skbuff.c                         | 14 +++++--
 net/core/sock.c                           |  7 ++++
 net/ipv4/tcp.c                            | 50 ++++++++++++++++++++++-
 net/ipv4/tcp_input.c                      |  8 +++-
 net/ipv4/tcp_output.c                     | 66 ++++++++++++++++++++++++++++++-
 net/socket.c                              |  7 +++-
 23 files changed, 217 insertions(+), 13 deletions(-)

-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply

* [PATCH net-next v2 1/6] tcp: instrument tcp sender limits chronographs
From: Yuchung Cheng @ 2016-11-28  7:07 UTC (permalink / raw)
  To: davem, soheil, francisyyan; +Cc: netdev, ncardwell, edumazet, Yuchung Cheng
In-Reply-To: <1480316838-154141-1-git-send-email-ycheng@google.com>

From: Francis Yan <francisyyan@gmail.com>

This patch implements the skeleton of the TCP chronograph
instrumentation on sender side limits:

	1) idle (unspec)
	2) busy sending data other than 3-4 below
	3) rwnd-limited
	4) sndbuf-limited

The limits are enumerated 'tcp_chrono'. Since a connection in
theory can idle forever, we do not track the actual length of this
uninteresting idle period. For the rest we track how long the sender
spends in each limit. At any point during the life time of a
connection, the sender must be in one of the four states.

If there are multiple conditions worthy of tracking in a chronograph
then the highest priority enum takes precedence over
the other conditions. So that if something "more interesting"
starts happening, stop the previous chrono and start a new one.

The time unit is jiffy(u32) in order to save space in tcp_sock.
This implies application must sample the stats no longer than every
49 days of 1ms jiffy.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
 include/linux/tcp.h   |  7 +++++--
 include/net/tcp.h     | 14 ++++++++++++++
 net/ipv4/tcp_output.c | 30 ++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 32a7c7e..d5d3bd8 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -211,8 +211,11 @@ struct tcp_sock {
 		u8 reord;    /* reordering detected */
 	} rack;
 	u16	advmss;		/* Advertised MSS			*/
-	u8	rate_app_limited:1,  /* rate_{delivered,interval_us} limited? */
-		unused:7;
+	u32	chrono_start;	/* Start time in jiffies of a TCP chrono */
+	u32	chrono_stat[3];	/* Time in jiffies for chrono_stat stats */
+	u8	chrono_type:2,	/* current chronograph type */
+		rate_app_limited:1,  /* rate_{delivered,interval_us} limited? */
+		unused:5;
 	u8	nonagle     : 4,/* Disable Nagle algorithm?             */
 		thin_lto    : 1,/* Use linear timeouts for thin streams */
 		thin_dupack : 1,/* Fast retransmit on first dupack      */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de8073..e5ff408 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1516,6 +1516,20 @@ struct tcp_fastopen_context {
 	struct rcu_head		rcu;
 };
 
+/* Latencies incurred by various limits for a sender. They are
+ * chronograph-like stats that are mutually exclusive.
+ */
+enum tcp_chrono {
+	TCP_CHRONO_UNSPEC,
+	TCP_CHRONO_BUSY, /* Actively sending data (non-empty write queue) */
+	TCP_CHRONO_RWND_LIMITED, /* Stalled by insufficient receive window */
+	TCP_CHRONO_SNDBUF_LIMITED, /* Stalled by insufficient send buffer */
+	__TCP_CHRONO_MAX,
+};
+
+void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type);
+void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type);
+
 /* write queue abstraction */
 static inline void tcp_write_queue_purge(struct sock *sk)
 {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 19105b4..34f7517 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2081,6 +2081,36 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 	return false;
 }
 
+static void tcp_chrono_set(struct tcp_sock *tp, const enum tcp_chrono new)
+{
+	const u32 now = tcp_time_stamp;
+
+	if (tp->chrono_type > TCP_CHRONO_UNSPEC)
+		tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
+	tp->chrono_start = now;
+	tp->chrono_type = new;
+}
+
+void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	/* If there are multiple conditions worthy of tracking in a
+	 * chronograph then the highest priority enum takes precedence over
+	 * the other conditions. So that if something "more interesting"
+	 * starts happening, stop the previous chrono and start a new one.
+	 */
+	if (type > tp->chrono_type)
+		tcp_chrono_set(tp, type);
+}
+
+void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	tcp_chrono_set(tp, TCP_CHRONO_UNSPEC);
+}
+
 /* This routine writes packets to the network.  It advances the
  * send_head.  This happens as incoming acks open up the remote
  * window for us.
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next v2 2/6] tcp: instrument how long TCP is busy sending
From: Yuchung Cheng @ 2016-11-28  7:07 UTC (permalink / raw)
  To: davem, soheil, francisyyan; +Cc: netdev, ncardwell, edumazet, Yuchung Cheng
In-Reply-To: <1480316838-154141-1-git-send-email-ycheng@google.com>

From: Francis Yan <francisyyan@gmail.com>

This patch measures TCP busy time, which is defined as the period
of time when sender has data (or FIN) to send. The time starts when
data is buffered and stops when the write queue is flushed by ACKs
or error events.

Note the busy time does not include SYN time, unless data is
included in SYN (i.e. Fast Open). It does include FIN time even
if the FIN carries no payload. Excluding pure FIN is possible but
would incur one additional test in the fast path, which may not
be worth it.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
 include/net/tcp.h     |  6 +++++-
 net/ipv4/tcp_input.c  |  3 +++
 net/ipv4/tcp_output.c | 19 ++++++++++++++++---
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e5ff408..3e097e3 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1535,6 +1535,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
 {
 	struct sk_buff *skb;
 
+	tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
 	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
 		sk_wmem_free_skb(sk, skb);
 	sk_mem_reclaim(sk);
@@ -1593,8 +1594,10 @@ static inline void tcp_advance_send_head(struct sock *sk, const struct sk_buff *
 
 static inline void tcp_check_send_head(struct sock *sk, struct sk_buff *skb_unlinked)
 {
-	if (sk->sk_send_head == skb_unlinked)
+	if (sk->sk_send_head == skb_unlinked) {
 		sk->sk_send_head = NULL;
+		tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+	}
 	if (tcp_sk(sk)->highest_sack == skb_unlinked)
 		tcp_sk(sk)->highest_sack = NULL;
 }
@@ -1616,6 +1619,7 @@ static inline void tcp_add_write_queue_tail(struct sock *sk, struct sk_buff *skb
 	/* Queue it, remembering where we must start sending. */
 	if (sk->sk_send_head == NULL) {
 		sk->sk_send_head = skb;
+		tcp_chrono_start(sk, TCP_CHRONO_BUSY);
 
 		if (tcp_sk(sk)->highest_sack == NULL)
 			tcp_sk(sk)->highest_sack = skb;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 22e6a20..a5d1727 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3178,6 +3178,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 			tp->lost_skb_hint = NULL;
 	}
 
+	if (!skb)
+		tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
+
 	if (likely(between(tp->snd_up, prior_snd_una, tp->snd_una)))
 		tp->snd_up = tp->snd_una;
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 34f7517..e8ea584 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2096,8 +2096,8 @@ void tcp_chrono_start(struct sock *sk, const enum tcp_chrono type)
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	/* If there are multiple conditions worthy of tracking in a
-	 * chronograph then the highest priority enum takes precedence over
-	 * the other conditions. So that if something "more interesting"
+	 * chronograph then the highest priority enum takes precedence
+	 * over the other conditions. So that if something "more interesting"
 	 * starts happening, stop the previous chrono and start a new one.
 	 */
 	if (type > tp->chrono_type)
@@ -2108,7 +2108,18 @@ void tcp_chrono_stop(struct sock *sk, const enum tcp_chrono type)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	tcp_chrono_set(tp, TCP_CHRONO_UNSPEC);
+
+	/* There are multiple conditions worthy of tracking in a
+	 * chronograph, so that the highest priority enum takes
+	 * precedence over the other conditions (see tcp_chrono_start).
+	 * If a condition stops, we only stop chrono tracking if
+	 * it's the "most interesting" or current chrono we are
+	 * tracking and starts busy chrono if we have pending data.
+	 */
+	if (tcp_write_queue_empty(sk))
+		tcp_chrono_set(tp, TCP_CHRONO_UNSPEC);
+	else if (type == tp->chrono_type)
+		tcp_chrono_set(tp, TCP_CHRONO_BUSY);
 }
 
 /* This routine writes packets to the network.  It advances the
@@ -3328,6 +3339,8 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 	fo->copied = space;
 
 	tcp_connect_queue_skb(sk, syn_data);
+	if (syn_data->len)
+		tcp_chrono_start(sk, TCP_CHRONO_BUSY);
 
 	err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next v2 3/6] tcp: instrument how long TCP is limited by receive window
From: Yuchung Cheng @ 2016-11-28  7:07 UTC (permalink / raw)
  To: davem, soheil, francisyyan; +Cc: netdev, ncardwell, edumazet, Yuchung Cheng
In-Reply-To: <1480316838-154141-1-git-send-email-ycheng@google.com>

From: Francis Yan <francisyyan@gmail.com>

This patch measures the total time when the TCP stops sending because
the receiver's advertised window is not large enough. Note that
once the limit is lifted we are likely in the busy status if we
have data pending.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
 net/ipv4/tcp_output.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e8ea584..b74444c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2144,7 +2144,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 	unsigned int tso_segs, sent_pkts;
 	int cwnd_quota;
 	int result;
-	bool is_cwnd_limited = false;
+	bool is_cwnd_limited = false, is_rwnd_limited = false;
 	u32 max_segs;
 
 	sent_pkts = 0;
@@ -2181,8 +2181,10 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 				break;
 		}
 
-		if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now)))
+		if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now))) {
+			is_rwnd_limited = true;
 			break;
+		}
 
 		if (tso_segs == 1) {
 			if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
@@ -2227,6 +2229,11 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 			break;
 	}
 
+	if (is_rwnd_limited)
+		tcp_chrono_start(sk, TCP_CHRONO_RWND_LIMITED);
+	else
+		tcp_chrono_stop(sk, TCP_CHRONO_RWND_LIMITED);
+
 	if (likely(sent_pkts)) {
 		if (tcp_in_cwnd_reduction(sk))
 			tp->prr_out += sent_pkts;
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next v2 4/6] tcp: instrument how long TCP is limited by insufficient send buffer
From: Yuchung Cheng @ 2016-11-28  7:07 UTC (permalink / raw)
  To: davem, soheil, francisyyan; +Cc: netdev, ncardwell, edumazet, Yuchung Cheng
In-Reply-To: <1480316838-154141-1-git-send-email-ycheng@google.com>

From: Francis Yan <francisyyan@gmail.com>

This patch measures the amount of time when TCP runs out of new data
to send to the network due to insufficient send buffer, while TCP
is still busy delivering (i.e. write queue is not empty). The goal
is to indicate either the send buffer autotuning or user SO_SNDBUF
setting has resulted network under-utilization.

The measurement starts conservatively by checking various conditions
to minimize false claims (i.e. under-estimation is more likely).
The measurement stops when the SOCK_NOSPACE flag is cleared. But it
does not account the time elapsed till the next application write.
Also the measurement only starts if the sender is still busy sending
data, s.t. the limit accounted is part of the total busy time.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
 net/ipv4/tcp.c        | 10 ++++++++--
 net/ipv4/tcp_input.c  |  5 ++++-
 net/ipv4/tcp_output.c | 12 ++++++++++++
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 913f9bb..259ffb5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -996,8 +996,11 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 		goto out;
 out_err:
 	/* make sure we wake any epoll edge trigger waiter */
-	if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && err == -EAGAIN))
+	if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
+		     err == -EAGAIN)) {
 		sk->sk_write_space(sk);
+		tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+	}
 	return sk_stream_error(sk, flags, err);
 }
 
@@ -1331,8 +1334,11 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 out_err:
 	err = sk_stream_error(sk, flags, err);
 	/* make sure we wake any epoll edge trigger waiter */
-	if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 && err == -EAGAIN))
+	if (unlikely(skb_queue_len(&sk->sk_write_queue) == 0 &&
+		     err == -EAGAIN)) {
 		sk->sk_write_space(sk);
+		tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+	}
 	release_sock(sk);
 	return err;
 }
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a5d1727..56fe736 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5059,8 +5059,11 @@ static void tcp_check_space(struct sock *sk)
 		/* pairs with tcp_poll() */
 		smp_mb__after_atomic();
 		if (sk->sk_socket &&
-		    test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
+		    test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
 			tcp_new_space(sk);
+			if (!test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
+				tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED);
+		}
 	}
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b74444c..d3545d0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1514,6 +1514,18 @@ static void tcp_cwnd_validate(struct sock *sk, bool is_cwnd_limited)
 		if (sysctl_tcp_slow_start_after_idle &&
 		    (s32)(tcp_time_stamp - tp->snd_cwnd_stamp) >= inet_csk(sk)->icsk_rto)
 			tcp_cwnd_application_limited(sk);
+
+		/* The following conditions together indicate the starvation
+		 * is caused by insufficient sender buffer:
+		 * 1) just sent some data (see tcp_write_xmit)
+		 * 2) not cwnd limited (this else condition)
+		 * 3) no more data to send (null tcp_send_head )
+		 * 4) application is hitting buffer limit (SOCK_NOSPACE)
+		 */
+		if (!tcp_send_head(sk) && sk->sk_socket &&
+		    test_bit(SOCK_NOSPACE, &sk->sk_socket->flags) &&
+		    (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT))
+			tcp_chrono_start(sk, TCP_CHRONO_SNDBUF_LIMITED);
 	}
 }
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next v2 5/6] tcp: export sender limits chronographs to TCP_INFO
From: Yuchung Cheng @ 2016-11-28  7:07 UTC (permalink / raw)
  To: davem, soheil, francisyyan; +Cc: netdev, ncardwell, edumazet, Yuchung Cheng
In-Reply-To: <1480316838-154141-1-git-send-email-ycheng@google.com>

From: Francis Yan <francisyyan@gmail.com>

This patch exports all the sender chronograph measurements collected
in the previous patches to TCP_INFO interface. Note that busy time
exported includes all the other sending limits (rwnd-limited,
sndbuf-limited). Internally the time unit is jiffy but externally
the measurements are in microseconds for future extensions.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
 include/uapi/linux/tcp.h |  4 ++++
 net/ipv4/tcp.c           | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 73ac0db..2863b66 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -214,6 +214,10 @@ struct tcp_info {
 	__u32	tcpi_data_segs_out;	/* RFC4898 tcpEStatsDataSegsOut */
 
 	__u64   tcpi_delivery_rate;
+
+	__u64	tcpi_busy_time;      /* Time (usec) busy sending data */
+	__u64	tcpi_rwnd_limited;   /* Time (usec) limited by receive window */
+	__u64	tcpi_sndbuf_limited; /* Time (usec) limited by send buffer */
 };
 
 /* for TCP_MD5SIG socket option */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 259ffb5..cdde20f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2708,6 +2708,25 @@ int compat_tcp_setsockopt(struct sock *sk, int level, int optname,
 EXPORT_SYMBOL(compat_tcp_setsockopt);
 #endif
 
+static void tcp_get_info_chrono_stats(const struct tcp_sock *tp,
+				      struct tcp_info *info)
+{
+	u64 stats[__TCP_CHRONO_MAX], total = 0;
+	enum tcp_chrono i;
+
+	for (i = TCP_CHRONO_BUSY; i < __TCP_CHRONO_MAX; ++i) {
+		stats[i] = tp->chrono_stat[i - 1];
+		if (i == tp->chrono_type)
+			stats[i] += tcp_time_stamp - tp->chrono_start;
+		stats[i] *= USEC_PER_SEC / HZ;
+		total += stats[i];
+	}
+
+	info->tcpi_busy_time = total;
+	info->tcpi_rwnd_limited = stats[TCP_CHRONO_RWND_LIMITED];
+	info->tcpi_sndbuf_limited = stats[TCP_CHRONO_SNDBUF_LIMITED];
+}
+
 /* Return information about state of tcp endpoint in API format. */
 void tcp_get_info(struct sock *sk, struct tcp_info *info)
 {
@@ -2800,6 +2819,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 	info->tcpi_bytes_acked = tp->bytes_acked;
 	info->tcpi_bytes_received = tp->bytes_received;
 	info->tcpi_notsent_bytes = max_t(int, 0, tp->write_seq - tp->snd_nxt);
+	tcp_get_info_chrono_stats(tp, info);
 
 	unlock_sock_fast(sk, slow);
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH net-next v2 6/6] tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
From: Yuchung Cheng @ 2016-11-28  7:07 UTC (permalink / raw)
  To: davem, soheil, francisyyan; +Cc: netdev, ncardwell, edumazet, Yuchung Cheng
In-Reply-To: <1480316838-154141-1-git-send-email-ycheng@google.com>

From: Francis Yan <francisyyan@gmail.com>

This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allow further breaking down the various sender
limitation.  For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing small receive window. It is not possible to
tell before this patch without packet traces.

To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.

Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
ChangeLog since v1:
 - fix build break if CONFIG_INET is not defined

 Documentation/networking/timestamping.txt | 10 ++++++++++
 arch/alpha/include/uapi/asm/socket.h      |  2 ++
 arch/frv/include/uapi/asm/socket.h        |  2 ++
 arch/ia64/include/uapi/asm/socket.h       |  2 ++
 arch/m32r/include/uapi/asm/socket.h       |  2 ++
 arch/mips/include/uapi/asm/socket.h       |  2 ++
 arch/mn10300/include/uapi/asm/socket.h    |  2 ++
 arch/parisc/include/uapi/asm/socket.h     |  2 ++
 arch/powerpc/include/uapi/asm/socket.h    |  2 ++
 arch/s390/include/uapi/asm/socket.h       |  2 ++
 arch/sparc/include/uapi/asm/socket.h      |  2 ++
 arch/xtensa/include/uapi/asm/socket.h     |  2 ++
 include/linux/tcp.h                       |  2 ++
 include/uapi/asm-generic/socket.h         |  2 ++
 include/uapi/linux/net_tstamp.h           |  3 ++-
 include/uapi/linux/tcp.h                  |  8 ++++++++
 net/core/skbuff.c                         | 14 +++++++++++---
 net/core/sock.c                           |  7 +++++++
 net/ipv4/tcp.c                            | 20 ++++++++++++++++++++
 net/socket.c                              |  7 ++++++-
 20 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 671cccf..96f5069 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -182,6 +182,16 @@ SOF_TIMESTAMPING_OPT_TSONLY:
   the timestamp even if sysctl net.core.tstamp_allow_data is 0.
   This option disables SOF_TIMESTAMPING_OPT_CMSG.
 
+SOF_TIMESTAMPING_OPT_STATS:
+
+  Optional stats that are obtained along with the transmit timestamps.
+  It must be used together with SOF_TIMESTAMPING_OPT_TSONLY. When the
+  transmit timestamp is available, the stats are available in a
+  separate control message of type SCM_TIMESTAMPING_OPT_STATS, as a
+  list of TLVs (struct nlattr) of types. These stats allow the
+  application to associate various transport layer stats with
+  the transmit timestamps, such as how long a certain block of
+  data was limited by peer's receiver window.
 
 New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
 disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 9e46d6e..afc901b 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index afbc98f0..81e0353 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 0018fad..57feb0c 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -99,4 +99,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 5fe42fc..5853f8e9 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 2027240a..566ecdc 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -108,4 +108,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index 5129f23..0e12527 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 9c935d7..7a109b7 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 
 #define SO_CNX_ADVICE		0x402E
 
+#define SCM_TIMESTAMPING_OPT_STATS	0x402F
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index 1672e33..44583a5 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index 41b51c2..b24a64c 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -96,4 +96,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 31aede3..a25dc32 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -86,6 +86,8 @@
 
 #define SO_CNX_ADVICE		0x0037
 
+#define SCM_TIMESTAMPING_OPT_STATS	0x0038
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 81435d9..9fdbe1f 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -101,4 +101,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index d5d3bd8..00e0ee8 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -428,4 +428,6 @@ static inline void tcp_saved_syn_free(struct tcp_sock *tp)
 	tp->saved_syn = NULL;
 }
 
+struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk);
+
 #endif	/* _LINUX_TCP_H */
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 67d632f..2c748dd 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -92,4 +92,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SCM_TIMESTAMPING_OPT_STATS	54
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 264e515..464dcca 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -25,8 +25,9 @@ enum {
 	SOF_TIMESTAMPING_TX_ACK = (1<<9),
 	SOF_TIMESTAMPING_OPT_CMSG = (1<<10),
 	SOF_TIMESTAMPING_OPT_TSONLY = (1<<11),
+	SOF_TIMESTAMPING_OPT_STATS = (1<<12),
 
-	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_TSONLY,
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_STATS,
 	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 				 SOF_TIMESTAMPING_LAST
 };
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 2863b66..c53de26 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -220,6 +220,14 @@ struct tcp_info {
 	__u64	tcpi_sndbuf_limited; /* Time (usec) limited by send buffer */
 };
 
+/* netlink attributes types for SCM_TIMESTAMPING_OPT_STATS */
+enum {
+	TCP_NLA_PAD,
+	TCP_NLA_BUSY,		/* Time (usec) busy sending data */
+	TCP_NLA_RWND_LIMITED,	/* Time (usec) limited by receive window */
+	TCP_NLA_SNDBUF_LIMITED,	/* Time (usec) limited by send buffer */
+};
+
 /* for TCP_MD5SIG socket option */
 #define TCP_MD5SIG_MAXKEYLEN	80
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d1d1a5a..ea6fa95 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3839,10 +3839,18 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
 	if (!skb_may_tx_timestamp(sk, tsonly))
 		return;
 
-	if (tsonly)
-		skb = alloc_skb(0, GFP_ATOMIC);
-	else
+	if (tsonly) {
+#ifdef CONFIG_INET
+		if ((sk->sk_tsflags & SOF_TIMESTAMPING_OPT_STATS) &&
+		    sk->sk_protocol == IPPROTO_TCP &&
+		    sk->sk_type == SOCK_STREAM)
+			skb = tcp_get_timestamping_opt_stats(sk);
+		else
+#endif
+			skb = alloc_skb(0, GFP_ATOMIC);
+	} else {
 		skb = skb_clone(orig_skb, GFP_ATOMIC);
+	}
 	if (!skb)
 		return;
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 14e6145..d8c7f8c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -854,6 +854,13 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
 				sk->sk_tskey = 0;
 			}
 		}
+
+		if (val & SOF_TIMESTAMPING_OPT_STATS &&
+		    !(val & SOF_TIMESTAMPING_OPT_TSONLY)) {
+			ret = -EINVAL;
+			break;
+		}
+
 		sk->sk_tsflags = val;
 		if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
 			sock_enable_timestamp(sk,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index cdde20f..1149b48 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2841,6 +2841,26 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
 
+struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk)
+{
+	const struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *stats;
+	struct tcp_info info;
+
+	stats = alloc_skb(3 * nla_total_size_64bit(sizeof(u64)), GFP_ATOMIC);
+	if (!stats)
+		return NULL;
+
+	tcp_get_info_chrono_stats(tp, &info);
+	nla_put_u64_64bit(stats, TCP_NLA_BUSY,
+			  info.tcpi_busy_time, TCP_NLA_PAD);
+	nla_put_u64_64bit(stats, TCP_NLA_RWND_LIMITED,
+			  info.tcpi_rwnd_limited, TCP_NLA_PAD);
+	nla_put_u64_64bit(stats, TCP_NLA_SNDBUF_LIMITED,
+			  info.tcpi_sndbuf_limited, TCP_NLA_PAD);
+	return stats;
+}
+
 static int do_tcp_getsockopt(struct sock *sk, int level,
 		int optname, char __user *optval, int __user *optlen)
 {
diff --git a/net/socket.c b/net/socket.c
index e2584c5..e631894 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -693,9 +693,14 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 	    (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
 	    ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2))
 		empty = 0;
-	if (!empty)
+	if (!empty) {
 		put_cmsg(msg, SOL_SOCKET,
 			 SCM_TIMESTAMPING, sizeof(tss), &tss);
+
+		if (skb->len && (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_STATS))
+			put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_OPT_STATS,
+				 skb->len, skb->data);
+	}
 }
 EXPORT_SYMBOL_GPL(__sock_recv_timestamp);
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* [PATCH] net: arc_emac: add dependencies on associated arches and compile test
From: Peter Robinson @ 2016-11-28  7:12 UTC (permalink / raw)
  To: Xing Zheng, Alexander Kochetkov, Philippe Reynes, David S. Miller,
	netdev
  Cc: Peter Robinson

Add dependencies on the architectures that support these devices and
add compile test to ensure ongoing code build coverage.

Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
---
 drivers/net/ethernet/arc/Kconfig | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/arc/Kconfig b/drivers/net/ethernet/arc/Kconfig
index 6890451..e743ddf 100644
--- a/drivers/net/ethernet/arc/Kconfig
+++ b/drivers/net/ethernet/arc/Kconfig
@@ -17,13 +17,14 @@ if NET_VENDOR_ARC
 
 config ARC_EMAC_CORE
 	tristate
+	depends on ARC || ARCH_ROCKCHIP || COMPILE_TEST
 	select MII
 	select PHYLIB
 
 config ARC_EMAC
 	tristate "ARC EMAC support"
 	select ARC_EMAC_CORE
-	depends on OF_IRQ && OF_NET && HAS_DMA
+	depends on OF_IRQ && OF_NET && HAS_DMA && (ARC || COMPILE_TEST)
 	---help---
 	  On some legacy ARC (Synopsys) FPGA boards such as ARCAngel4/ML50x
 	  non-standard on-chip ethernet device ARC EMAC 10/100 is used.
@@ -32,7 +33,7 @@ config ARC_EMAC
 config EMAC_ROCKCHIP
 	tristate "Rockchip EMAC support"
 	select ARC_EMAC_CORE
-	depends on OF_IRQ && OF_NET && REGULATOR && HAS_DMA
+	depends on OF_IRQ && OF_NET && REGULATOR && HAS_DMA && (ARCH_ROCKCHIP || COMPILE_TEST)
 	---help---
 	  Support for Rockchip RK3036/RK3066/RK3188 EMAC ethernet controllers.
 	  This selects Rockchip SoC glue layer support for the
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH net-next v3 0/4] Documentation: net: phy: Improve documentation
From: Jerome Brunet @ 2016-11-28  7:31 UTC (permalink / raw)
  To: Florian Fainelli, netdev
  Cc: davem, andrew, sf84, martin.blumenstingl, mans, alexandre.torgue,
	peppe.cavallaro, timur
In-Reply-To: <20161128024515.13070-1-f.fainelli@gmail.com>

On Sun, 2016-11-27 at 18:45 -0800, Florian Fainelli wrote:
> Hi all,
> 
> This patch series addresses discussions and feedback that was
> recently received
> on the mailing-list in the area of: flow control/pause frames,
> interpretation of
> phy_interface_t and finally add some links to useful standards
> documents.
> 
> Changes in v3:
> 
> - add Timur's feedback into patch 3
> 
> Changes in v2:
> 
> - clarify a few things in the RGMII section, add a paragraph about
> common issues
>   with RGMII delay mismatches
> 

Thanks a lot Florian. This is really helping, especially the part about
RGMII delays.

Reviewed-by: Jerome Brunet <jbrunet@baylibre.com>

> Florian Fainelli (4):
>   Documentation: net: phy: remove description of function pointers
>   Documentation: net: phy: Add a paragraph about pause frames/flow
>     control
>   Documentation: net: phy: Add blurb about RGMII
>   Documentation: net: phy: Add links to several standards documents
> 
>  Documentation/networking/phy.txt | 140
> +++++++++++++++++++++++++++++----------
>  1 file changed, 105 insertions(+), 35 deletions(-)
> 

^ permalink raw reply

* Re: [PATCH net] net/sched: act_pedit: limit negative offset
From: Amir Vadai" @ 2016-11-28  7:51 UTC (permalink / raw)
  To: David Miller; +Cc: xiyou.wangcong, netdev, jhs, ogerlitz, hadarh, jiri
In-Reply-To: <20161128.004936.2064564176474656911.davem@davemloft.net>

On Mon, Nov 28, 2016 at 12:49:36AM -0500, David Miller wrote:
> From: Cong Wang <xiyou.wangcong@gmail.com>
> Date: Sun, 27 Nov 2016 21:39:33 -0800
> 
> > On Sun, Nov 27, 2016 at 7:58 AM, Amir Vadai <amir@vadai.me> wrote:
> >> Should not allow setting a negative offset that goes below the skb head.
> > ...
> >> diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
> >> index b54d56d4959b..e79e8a88f2d2 100644
> >> --- a/net/sched/act_pedit.c
> >> +++ b/net/sched/act_pedit.c
> >> @@ -154,8 +154,11 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
> >>                         }
> >>
> >>                         ptr = skb_header_pointer(skb, off + offset, 4, &_data);
> >> -                       if (!ptr)
> >> +                       if ((unsigned char *)ptr < skb->head) {
> > 
> > 
> > ptr returned could be &_data, which is on stack, so why this comparison
> > makes sense for this case?
> 
> Indeed, this will definitely do the wrong thing when the on-stack area
> passed back to ptr.
yes - my bad. will correct it and send v1

^ permalink raw reply

* [PATCH 1/1] net: macb: ensure ordering write to re-enable RX smoothly
From: Zumeng Chen @ 2016-11-28  7:57 UTC (permalink / raw)
  To: nicolas.ferre; +Cc: davem, netdev, linux-kernel

When a hardware issue happened as described by inline comments, the register
write pattern looks like the following:

  <write ~MACB_BIT(RE)>
  + wmb();
  <write MACB_BIT(RE)>

There might be a memory barrier between these two write operations, so add wmb
to ensure an flip from 0 to 1 for NCR.

Signed-off-by: Zumeng Chen <zumeng.chen@windriver.com>
---
 drivers/net/ethernet/cadence/macb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index 533653b..2f9c5b2 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -1156,6 +1156,7 @@ static irqreturn_t macb_interrupt(int irq, void *dev_id)
 		if (status & MACB_BIT(RXUBR)) {
 			ctrl = macb_readl(bp, NCR);
 			macb_writel(bp, NCR, ctrl & ~MACB_BIT(RE));
+			wmb();
 			macb_writel(bp, NCR, ctrl | MACB_BIT(RE));
 
 			if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
-- 
2.4.11

^ permalink raw reply related

* Re: [PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support
From: Uwe Kleine-König @ 2016-11-28  8:09 UTC (permalink / raw)
  To: Andrew Lunn, Rob Herring, Frank Rowand
  Cc: Andreas Färber, netdev, linux-arm-kernel, Michal Hrusecki,
	Tomas Hlavacek, Bed??icha Ko??atu, Vivien Didelot,
	Florian Fainelli, linux-kernel, devicetree
In-Reply-To: <20161127231009.GA17704@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 2004 bytes --]

Hello Andrew,

On Mon, Nov 28, 2016 at 12:10:09AM +0100, Andrew Lunn wrote:
> > Try to see it from my perspective: I see that some vf610 device I don't
> > have (found via `git grep marvell,mv88e6` or so) uses
> > "marvell,mv88e6085". I then assume it has that device on board. How
> > would I know it doesn't? Same for the other boards you mention.
> > 
> > Unfortunately some of your replies are slightly cryptic. Had you simply
> > replied 'please just use "marvell,mv88e6085" instead', it would've been
> > much more clear what you want. (Same for extending the subject instead
> > of just pointing to some FAQ.)
> 
> By reading the FAQ you have learnt more than me saying put the correct
> tree in the subject line. By asking you to explain why you need a
> compatible string, i'm trying to make you think, look at the code and
> understand it. In the future, you might think and understand the code
> before posting a patch, and then we all save time.

I agree to Andreas though, that it makes an school teacher impression.
Something like:

	Please fix the subject. Check the FAQ for the details, which btw
	is worth a read completely.

is IMHO better in this regard and once you found the problem there you
don't need to ask back if it's that what was meant.

> > So are you okay with patch 1/2 documenting the compatible? Then we could
> > drop 2/2 and use "marvell,mv88e6176", "marvell,mv88e6085" instead of
> > just the latter. Or would you rather drop both and keep the actual chip
> > a comment?
> 
> A comment only please.

I still wonder (and didn't get an answer back when I asked about this)
why a comment is preferred here. For other devices I know it's usual and
requested by the maintainers to use:

	compatible = "exact name", "earlyer device to match driver";

. This is more robust, documents the situation more formally and makes
it better greppable. The price to pay is only a few bytes in the dtb
which IMO is ok.

Best regards
Uwe

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 1/5] net: mvneta: Use cacheable memory to store the rx buffer virtual address
From: Jisheng Zhang @ 2016-11-28  8:35 UTC (permalink / raw)
  To: Gregory CLEMENT
  Cc: David S. Miller, linux-kernel, netdev, Arnd Bergmann,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth,
	Thomas Petazzoni, linux-arm-kernel, Nadav Haklai, Marcin Wojtas,
	Dmitri Epshtein, Yelena Krivosheev
In-Reply-To: <7e6004f918d3fcde9ae71e7893d26b19086236a3.1480087510.git-series.gregory.clement@free-electrons.com>

Hi Gregory,

On Fri, 25 Nov 2016 16:30:14 +0100 Gregory CLEMENT wrote:

> Until now the virtual address of the received buffer were stored in the
> cookie field of the rx descriptor. However, this field is 32-bits only
> which prevents to use the driver on a 64-bits architecture.
> 
> With this patch the virtual address is stored in an array not shared with
> the hardware (no more need to use the DMA API). Thanks to this, it is
> possible to use cache contrary to the access of the rx descriptor member.
> 
> The change is done in the swbm path only because the hwbm uses the cookie
> field, this also means that currently the hwbm is not usable in 64-bits.
> 
> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 96 ++++++++++++++++++++++++----
>  1 file changed, 84 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 87274d4ab102..b6849f88cab7 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -561,6 +561,9 @@ struct mvneta_rx_queue {
>  	u32 pkts_coal;
>  	u32 time_coal;
>  
> +	/* Virtual address of the RX buffer */
> +	void  **buf_virt_addr;

can we store buf_phys_addr in cacheable memory as well?

> +
>  	/* Virtual address of the RX DMA descriptors array */
>  	struct mvneta_rx_desc *descs;
>  
> @@ -1573,10 +1576,14 @@ static void mvneta_tx_done_pkts_coal_set(struct mvneta_port *pp,
>  
>  /* Handle rx descriptor fill by setting buf_cookie and buf_phys_addr */
>  static void mvneta_rx_desc_fill(struct mvneta_rx_desc *rx_desc,
> -				u32 phys_addr, u32 cookie)
> +				u32 phys_addr, void *virt_addr,
> +				struct mvneta_rx_queue *rxq)
>  {
> -	rx_desc->buf_cookie = cookie;
> +	int i;
> +
>  	rx_desc->buf_phys_addr = phys_addr;
> +	i = rx_desc - rxq->descs;
> +	rxq->buf_virt_addr[i] = virt_addr;
>  }
>  
>  /* Decrement sent descriptors counter */
> @@ -1781,7 +1788,8 @@ EXPORT_SYMBOL_GPL(mvneta_frag_free);
>  
>  /* Refill processing for SW buffer management */
>  static int mvneta_rx_refill(struct mvneta_port *pp,
> -			    struct mvneta_rx_desc *rx_desc)
> +			    struct mvneta_rx_desc *rx_desc,
> +			    struct mvneta_rx_queue *rxq)
>  
>  {
>  	dma_addr_t phys_addr;
> @@ -1799,7 +1807,7 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
>  		return -ENOMEM;
>  	}
>  
> -	mvneta_rx_desc_fill(rx_desc, phys_addr, (u32)data);
> +	mvneta_rx_desc_fill(rx_desc, phys_addr, data, rxq);
>  	return 0;
>  }
>  
> @@ -1861,7 +1869,12 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
>  
>  	for (i = 0; i < rxq->size; i++) {
>  		struct mvneta_rx_desc *rx_desc = rxq->descs + i;
> -		void *data = (void *)rx_desc->buf_cookie;
> +		void *data;
> +
> +		if (!pp->bm_priv)
> +			data = rxq->buf_virt_addr[i];
> +		else
> +			data = (void *)(uintptr_t)rx_desc->buf_cookie;
>  
>  		dma_unmap_single(pp->dev->dev.parent, rx_desc->buf_phys_addr,
>  				 MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
> @@ -1894,12 +1907,13 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
>  		unsigned char *data;
>  		dma_addr_t phys_addr;
>  		u32 rx_status, frag_size;
> -		int rx_bytes, err;
> +		int rx_bytes, err, index;
>  
>  		rx_done++;
>  		rx_status = rx_desc->status;
>  		rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
> -		data = (unsigned char *)rx_desc->buf_cookie;
> +		index = rx_desc - rxq->descs;
> +		data = (unsigned char *)rxq->buf_virt_addr[index];
>  		phys_addr = rx_desc->buf_phys_addr;
>  
>  		if (!mvneta_rxq_desc_is_first_last(rx_status) ||
> @@ -1938,7 +1952,7 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
>  		}
>  
>  		/* Refill processing */
> -		err = mvneta_rx_refill(pp, rx_desc);
> +		err = mvneta_rx_refill(pp, rx_desc, rxq);
>  		if (err) {
>  			netdev_err(dev, "Linux processing - Can't refill\n");
>  			rxq->missed++;
> @@ -2020,7 +2034,7 @@ static int mvneta_rx_hwbm(struct mvneta_port *pp, int rx_todo,
>  		rx_done++;
>  		rx_status = rx_desc->status;
>  		rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
> -		data = (unsigned char *)rx_desc->buf_cookie;
> +		data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
>  		phys_addr = rx_desc->buf_phys_addr;
>  		pool_id = MVNETA_RX_GET_BM_POOL_ID(rx_desc);
>  		bm_pool = &pp->bm_priv->bm_pools[pool_id];
> @@ -2708,6 +2722,57 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
>  	return rx_done;
>  }
>  
> +/* Refill processing for HW buffer management */
> +static int mvneta_rx_hwbm_refill(struct mvneta_port *pp,
> +				 struct mvneta_rx_desc *rx_desc)
> +
> +{
> +	dma_addr_t phys_addr;
> +	void *data;
> +
> +	data = mvneta_frag_alloc(pp->frag_size);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	phys_addr = dma_map_single(pp->dev->dev.parent, data,
> +				   MVNETA_RX_BUF_SIZE(pp->pkt_size),
> +				   DMA_FROM_DEVICE);
> +	if (unlikely(dma_mapping_error(pp->dev->dev.parent, phys_addr))) {
> +		mvneta_frag_free(pp->frag_size, data);
> +		return -ENOMEM;
> +	}
> +
> +	phys_addr += pp->rx_offset_correction;
> +	rx_desc->buf_phys_addr = phys_addr;
> +	rx_desc->buf_cookie = (uintptr_t)data;
> +
> +	return 0;
> +}
> +
> +/* Handle rxq fill: allocates rxq skbs; called when initializing a port */
> +static int mvneta_rxq_bm_fill(struct mvneta_port *pp,
> +			      struct mvneta_rx_queue *rxq,
> +			      int num)
> +{
> +	int i;
> +
> +	for (i = 0; i < num; i++) {
> +		memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));
> +		if (mvneta_rx_hwbm_refill(pp, rxq->descs + i) != 0) {
> +			netdev_err(pp->dev, "%s:rxq %d, %d of %d buffs  filled\n",
> +				   __func__, rxq->id, i, num);
> +			break;
> +		}
> +	}
> +
> +	/* Add this number of RX descriptors as non occupied (ready to
> +	 * get packets)
> +	 */
> +	mvneta_rxq_non_occup_desc_add(pp, rxq, i);
> +
> +	return i;
> +}
> +
>  /* Handle rxq fill: allocates rxq skbs; called when initializing a port */
>  static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>  			   int num)
> @@ -2716,7 +2781,7 @@ static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>  
>  	for (i = 0; i < num; i++) {
>  		memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));
> -		if (mvneta_rx_refill(pp, rxq->descs + i) != 0) {
> +		if (mvneta_rx_refill(pp, rxq->descs + i, rxq) != 0) {
>  			netdev_err(pp->dev, "%s:rxq %d, %d of %d buffs  filled\n",
>  				__func__, rxq->id, i, num);
>  			break;
> @@ -2784,14 +2849,21 @@ static int mvneta_rxq_init(struct mvneta_port *pp,
>  		mvneta_rxq_buf_size_set(pp, rxq,
>  					MVNETA_RX_BUF_SIZE(pp->pkt_size));
>  		mvneta_rxq_bm_disable(pp, rxq);
> +
> +		rxq->buf_virt_addr = devm_kmalloc(pp->dev->dev.parent,
> +						  rxq->size * sizeof(void *),
> +						  GFP_KERNEL);

I would suggest allocate this buffer during probe. Otherwise, there's
memory leak if we either change the mtu or close then open the eth in
a loop, e.g

while true
do
	ifconfig eth0 up
	ifconfig eth0 down
done

Thanks,
Jisheng

> +		if (!rxq->buf_virt_addr)
> +			return -ENOMEM;
> +
> +		mvneta_rxq_fill(pp, rxq, rxq->size);
>  	} else {
>  		mvneta_rxq_bm_enable(pp, rxq);
>  		mvneta_rxq_long_pool_set(pp, rxq);
>  		mvneta_rxq_short_pool_set(pp, rxq);
> +		mvneta_rxq_bm_fill(pp, rxq, rxq->size);
>  	}
>  
> -	mvneta_rxq_fill(pp, rxq, rxq->size);
> -
>  	return 0;
>  }
>  

^ permalink raw reply

* [PATCH v4] cpsw: ethtool: add support for getting/setting EEE registers
From: yegorslists @ 2016-11-28  8:41 UTC (permalink / raw)
  To: netdev; +Cc: linux-omap, grygorii.strashko, mugunthanvnm, davem,
	Yegor Yefremov

From: Yegor Yefremov <yegorslists@googlemail.com>

Add the ability to query and set Energy Efficient Ethernet parameters
via ethtool for applicable devices.

This patch doesn't activate full EEE support in cpsw driver, but it
enables reading and writing EEE advertising settings. This way one
can disable advertising EEE for certain speeds.

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Acked-by: Rami Rosen <roszenrami@gmail.com>
---
Changes:
	v4: respine against net-next (David Miller)
	v3: explain what features will be available with this patch (Florian Fainelli)
	v2: make routines static (Rami Rosen)

 drivers/net/ethernet/ti/cpsw.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index da40ea5..df87bff 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2237,6 +2237,30 @@ static int cpsw_set_channels(struct net_device *ndev,
 	return ret;
 }
 
+static int cpsw_get_eee(struct net_device *ndev, struct ethtool_eee *edata)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+	int slave_no = cpsw_slave_index(cpsw, priv);
+
+	if (cpsw->slaves[slave_no].phy)
+		return phy_ethtool_get_eee(cpsw->slaves[slave_no].phy, edata);
+	else
+		return -EOPNOTSUPP;
+}
+
+static int cpsw_set_eee(struct net_device *ndev, struct ethtool_eee *edata)
+{
+	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
+	int slave_no = cpsw_slave_index(cpsw, priv);
+
+	if (cpsw->slaves[slave_no].phy)
+		return phy_ethtool_set_eee(cpsw->slaves[slave_no].phy, edata);
+	else
+		return -EOPNOTSUPP;
+}
+
 static const struct ethtool_ops cpsw_ethtool_ops = {
 	.get_drvinfo	= cpsw_get_drvinfo,
 	.get_msglevel	= cpsw_get_msglevel,
@@ -2260,6 +2284,8 @@ static const struct ethtool_ops cpsw_ethtool_ops = {
 	.set_channels	= cpsw_set_channels,
 	.get_link_ksettings	= cpsw_get_link_ksettings,
 	.set_link_ksettings	= cpsw_set_link_ksettings,
+	.get_eee	= cpsw_get_eee,
+	.set_eee	= cpsw_set_eee,
 };
 
 static void cpsw_slave_init(struct cpsw_slave *slave, struct cpsw_common *cpsw,
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH net-next 0/5] Support Armada 37xx SoC (ARMv8 64-bits) in mvneta driver
From: Jisheng Zhang @ 2016-11-28  8:40 UTC (permalink / raw)
  To: Gregory CLEMENT
  Cc: David S. Miller, linux-kernel, netdev, Arnd Bergmann,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth,
	Thomas Petazzoni, linux-arm-kernel, Nadav Haklai, Marcin Wojtas,
	Dmitri Epshtein, Yelena Krivosheev
In-Reply-To: <cover.2b146800967005632cd02d4da77397e6e2fdf51f.1480087510.git-series.gregory.clement@free-electrons.com>

Hi Gregory,

On Fri, 25 Nov 2016 16:30:13 +0100 Gregory CLEMENT wrote:

> Hi,
> 
> The Armada 37xx is a new ARMv8 SoC from Marvell using same network
> controller as the older Armada 370/38x/XP SoCs. This series adapts the
> driver in order to be able to use it on this new SoC. The main changes
> are:
> 
> - 64-bits support: the first patches allow using the driver on a 64-bit
>   architecture.
> 
> - MBUS support: the mbus configuration is different on Armada 37xx
>   from the older SoCs.
> 
> - per cpu interrupt: Armada 37xx do not support per cpu interrupt for
>   the NETA IP, the non-per-CPU behavior was added back.
> 
> The first item is solved by patches 1 to 3.
> The 2 last items are solved by patch 4.
> In patch 5 the dt support is added.
> 
> Beside Armada 37xx, the series have been tested on Armada XP and
> Armada 38x (with Hardware Buffer Management and with Software Buffer
> Managment).

AFAICT, this is a V2? seems no Change log ;)

Thanks,
Jisheng

^ permalink raw reply

* [PATCH net-next V2 0/9] liquidio VF operations
From: Raghu Vatsavayi @ 2016-11-28  8:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, Raghu Vatsavayi

Hi Dave,

This patchseries adds support for VF device specific operations
like mailbox, queues and register access. I also removed extra 'void *'
casting that was reported in V1 patch. Please apply the patches in
following order as these patches depend on each other.

Thanks

Raghu Vatsavayi (9):
  liquidio CN23XX: VF register definitions
  liquidio CN23XX: VF registration
  liquidio CN23XX: VF config setup
  liquidio CN23XX: VF queue setup
  liquidio CN23XX: VF register access
  liquidio CN23XX: init VF softcommand queues
  liquidio CN23XX: VF mailbox
  liquidio CN23XX: VF interrupt
  liquidio CN23XX: VF init and destroy

 drivers/net/ethernet/cavium/Kconfig                |  12 +
 drivers/net/ethernet/cavium/liquidio/Makefile      |  22 +
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c    | 701 +++++++++++++++++++++
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h    |  48 ++
 .../net/ethernet/cavium/liquidio/cn23xx_vf_regs.h  | 274 ++++++++
 drivers/net/ethernet/cavium/liquidio/lio_core.c    |   7 -
 drivers/net/ethernet/cavium/liquidio/lio_main.c    |   6 +-
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 614 ++++++++++++++++++
 .../net/ethernet/cavium/liquidio/octeon_device.c   |  66 +-
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   9 +-
 .../net/ethernet/cavium/liquidio/request_manager.c |  11 +-
 11 files changed, 1755 insertions(+), 15 deletions(-)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c

-- 
1.8.3.1

^ permalink raw reply

* [PATCH net-next V2 1/9] liquidio CN23XX: VF register definitions
From: Raghu Vatsavayi @ 2016-11-28  8:50 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1480323038-13702-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds support for CN23xx VF registers.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 .../net/ethernet/cavium/liquidio/cn23xx_vf_regs.h  | 274 +++++++++++++++++++++
 1 file changed, 274 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
new file mode 100644
index 0000000..d33dd8f
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_regs.h
@@ -0,0 +1,274 @@
+/**********************************************************************
+ * Author: Cavium, Inc.
+ *
+ * Contact: support@cavium.com
+ *          Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT.  See the GNU General Public License for more details.
+ ***********************************************************************/
+/*! \file cn23xx_vf_regs.h
+ * \brief Host Driver: Register Address and Register Mask values for
+ * Octeon CN23XX vf functions.
+ */
+
+#ifndef __CN23XX_VF_REGS_H__
+#define __CN23XX_VF_REGS_H__
+
+#define     CN23XX_CONFIG_XPANSION_BAR             0x38
+
+#define     CN23XX_CONFIG_PCIE_CAP                 0x70
+#define     CN23XX_CONFIG_PCIE_DEVCAP              0x74
+#define     CN23XX_CONFIG_PCIE_DEVCTL              0x78
+#define     CN23XX_CONFIG_PCIE_LINKCAP             0x7C
+#define     CN23XX_CONFIG_PCIE_LINKCTL             0x80
+#define     CN23XX_CONFIG_PCIE_SLOTCAP             0x84
+#define     CN23XX_CONFIG_PCIE_SLOTCTL             0x88
+
+#define     CN23XX_CONFIG_PCIE_FLTMSK              0x720
+
+/* The input jabber is used to determine the TSO max size.
+ * Due to H/W limitation, this need to be reduced to 60000
+ * in order to to H/W TSO and avoid the WQE malfarmation
+ * PKO_BUG_24989_WQE_LEN
+ */
+#define    CN23XX_DEFAULT_INPUT_JABBER             0xEA60 /*60000*/
+
+/* ##############  BAR0 Registers ################ */
+
+/* Each Input Queue register is at a 16-byte Offset in BAR0 */
+#define    CN23XX_VF_IQ_OFFSET                     0x20000
+
+/*###################### REQUEST QUEUE #########################*/
+
+/* 64 registers for Input Queue Instr Count - SLI_PKT_IN_DONE0_CNTS */
+#define    CN23XX_VF_SLI_IQ_INSTR_COUNT_START64     0x10040
+
+/* 64 registers for Input Queues Start Addr - SLI_PKT0_INSTR_BADDR */
+#define    CN23XX_VF_SLI_IQ_BASE_ADDR_START64       0x10010
+
+/* 64 registers for Input Doorbell - SLI_PKT0_INSTR_BAOFF_DBELL */
+#define    CN23XX_VF_SLI_IQ_DOORBELL_START          0x10020
+
+/* 64 registers for Input Queue size - SLI_PKT0_INSTR_FIFO_RSIZE */
+#define    CN23XX_VF_SLI_IQ_SIZE_START              0x10030
+
+/* 64 registers (64-bit) - ES, RO, NS, Arbitration for Input Queue Data &
+ * gather list fetches. SLI_PKT(0..63)_INPUT_CONTROL.
+ */
+#define    CN23XX_VF_SLI_IQ_PKT_CONTROL_START64     0x10000
+
+/*------- Request Queue Macros ---------*/
+#define CN23XX_VF_SLI_IQ_PKT_CONTROL64(iq)		\
+	(CN23XX_VF_SLI_IQ_PKT_CONTROL_START64 + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_BASE_ADDR64(iq)		\
+	(CN23XX_VF_SLI_IQ_BASE_ADDR_START64 + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_SIZE(iq)			\
+	(CN23XX_VF_SLI_IQ_SIZE_START + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_DOORBELL(iq)			\
+	(CN23XX_VF_SLI_IQ_DOORBELL_START + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+#define CN23XX_VF_SLI_IQ_INSTR_COUNT64(iq)		\
+	(CN23XX_VF_SLI_IQ_INSTR_COUNT_START64 + ((iq) * CN23XX_VF_IQ_OFFSET))
+
+/*------------------ Masks ----------------*/
+#define    CN23XX_PKT_INPUT_CTL_VF_NUM                  BIT_ULL(32)
+#define    CN23XX_PKT_INPUT_CTL_MAC_NUM                 BIT(29)
+/* Number of instructions to be read in one MAC read request.
+ * setting to Max value(4)
+ */
+#define    CN23XX_PKT_INPUT_CTL_RDSIZE                  (3 << 25)
+#define    CN23XX_PKT_INPUT_CTL_IS_64B                  BIT(24)
+#define    CN23XX_PKT_INPUT_CTL_RST                     BIT(23)
+#define    CN23XX_PKT_INPUT_CTL_QUIET                   BIT(28)
+#define    CN23XX_PKT_INPUT_CTL_RING_ENB                BIT(22)
+#define    CN23XX_PKT_INPUT_CTL_DATA_NS                 BIT(8)
+#define    CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP        BIT(6)
+#define    CN23XX_PKT_INPUT_CTL_DATA_RO                 BIT(5)
+#define    CN23XX_PKT_INPUT_CTL_USE_CSR                 BIT(4)
+#define    CN23XX_PKT_INPUT_CTL_GATHER_NS               BIT(3)
+#define    CN23XX_PKT_INPUT_CTL_GATHER_ES_64B_SWAP      (2)
+#define    CN23XX_PKT_INPUT_CTL_GATHER_RO               (1)
+
+/** Rings per Virtual Function [RO] **/
+#define    CN23XX_PKT_INPUT_CTL_RPVF_MASK               (0x3F)
+#define    CN23XX_PKT_INPUT_CTL_RPVF_POS                (48)
+/* These bits[47:44][RO] give the Physical function number info within the MAC*/
+#define    CN23XX_PKT_INPUT_CTL_PF_NUM_MASK             (0x7)
+#define    CN23XX_PKT_INPUT_CTL_PF_NUM_POS              (45)
+/** These bits[43:32][RO] give the virtual function number info within the PF*/
+#define    CN23XX_PKT_INPUT_CTL_VF_NUM_MASK             (0x1FFF)
+#define    CN23XX_PKT_INPUT_CTL_VF_NUM_POS              (32)
+#define    CN23XX_PKT_INPUT_CTL_MAC_NUM_MASK            (0x3)
+#define    CN23XX_PKT_INPUT_CTL_MAC_NUM_POS             (29)
+#define    CN23XX_PKT_IN_DONE_WMARK_MASK                (0xFFFFULL)
+#define    CN23XX_PKT_IN_DONE_WMARK_BIT_POS             (32)
+#define    CN23XX_PKT_IN_DONE_CNT_MASK                  (0x00000000FFFFFFFFULL)
+
+#ifdef __LITTLE_ENDIAN_BITFIELD
+#define CN23XX_PKT_INPUT_CTL_MASK			\
+	(CN23XX_PKT_INPUT_CTL_RDSIZE			\
+	 | CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP	\
+	 | CN23XX_PKT_INPUT_CTL_USE_CSR)
+#else
+#define CN23XX_PKT_INPUT_CTL_MASK			\
+	(CN23XX_PKT_INPUT_CTL_RDSIZE			\
+	 | CN23XX_PKT_INPUT_CTL_DATA_ES_64B_SWAP	\
+	 | CN23XX_PKT_INPUT_CTL_USE_CSR			\
+	 | CN23XX_PKT_INPUT_CTL_GATHER_ES_64B_SWAP)
+#endif
+
+/** Masks for SLI_PKT_IN_DONE(0..63)_CNTS Register */
+#define    CN23XX_IN_DONE_CNTS_PI_INT               BIT_ULL(62)
+#define    CN23XX_IN_DONE_CNTS_CINT_ENB             BIT_ULL(48)
+
+/*############################ OUTPUT QUEUE #########################*/
+
+/* 64 registers for Output queue control - SLI_PKT(0..63)_OUTPUT_CONTROL */
+#define    CN23XX_VF_SLI_OQ_PKT_CONTROL_START       0x10050
+
+/* 64 registers for Output queue buffer and info size - SLI_PKT0_OUT_SIZE */
+#define    CN23XX_VF_SLI_OQ0_BUFF_INFO_SIZE         0x10060
+
+/* 64 registers for Output Queue Start Addr - SLI_PKT0_SLIST_BADDR */
+#define    CN23XX_VF_SLI_OQ_BASE_ADDR_START64       0x10070
+
+/* 64 registers for Output Queue Packet Credits - SLI_PKT0_SLIST_BAOFF_DBELL */
+#define    CN23XX_VF_SLI_OQ_PKT_CREDITS_START       0x10080
+
+/* 64 registers for Output Queue size - SLI_PKT0_SLIST_FIFO_RSIZE */
+#define    CN23XX_VF_SLI_OQ_SIZE_START              0x10090
+
+/* 64 registers for Output Queue Packet Count - SLI_PKT0_CNTS */
+#define    CN23XX_VF_SLI_OQ_PKT_SENT_START          0x100B0
+
+/* 64 registers for Output Queue INT Levels - SLI_PKT0_INT_LEVELS */
+#define    CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_START64  0x100A0
+
+/* Each Output Queue register is at a 16-byte Offset in BAR0 */
+#define    CN23XX_VF_OQ_OFFSET                      0x20000
+
+/*------- Output Queue Macros ---------*/
+
+#define CN23XX_VF_SLI_OQ_PKT_CONTROL(oq)		\
+	(CN23XX_VF_SLI_OQ_PKT_CONTROL_START + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_BASE_ADDR64(oq)		\
+	(CN23XX_VF_SLI_OQ_BASE_ADDR_START64 + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_SIZE(oq)			\
+	(CN23XX_VF_SLI_OQ_SIZE_START + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_BUFF_INFO_SIZE(oq)		\
+	(CN23XX_VF_SLI_OQ0_BUFF_INFO_SIZE + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_PKTS_SENT(oq)		\
+	(CN23XX_VF_SLI_OQ_PKT_SENT_START + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_PKTS_CREDIT(oq)		\
+	(CN23XX_VF_SLI_OQ_PKT_CREDITS_START + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_PKT_INT_LEVELS(oq)		\
+	(CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_START64 + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+/* Macro's for accessing CNT and TIME separately from INT_LEVELS */
+#define CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_CNT(oq)	\
+	(CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_START64 + ((oq) * CN23XX_VF_OQ_OFFSET))
+
+#define CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_TIME(oq)	\
+	(CN23XX_VF_SLI_OQ_PKT_INT_LEVELS_START64 +	\
+	 ((oq) * CN23XX_VF_OQ_OFFSET) + 4)
+
+/*------------------ Masks ----------------*/
+#define    CN23XX_PKT_OUTPUT_CTL_TENB                  BIT(13)
+#define    CN23XX_PKT_OUTPUT_CTL_CENB                  BIT(12)
+#define    CN23XX_PKT_OUTPUT_CTL_IPTR                  BIT(11)
+#define    CN23XX_PKT_OUTPUT_CTL_ES                    BIT(9)
+#define    CN23XX_PKT_OUTPUT_CTL_NSR                   BIT(8)
+#define    CN23XX_PKT_OUTPUT_CTL_ROR                   BIT(7)
+#define    CN23XX_PKT_OUTPUT_CTL_DPTR                  BIT(6)
+#define    CN23XX_PKT_OUTPUT_CTL_BMODE                 BIT(5)
+#define    CN23XX_PKT_OUTPUT_CTL_ES_P                  BIT(3)
+#define    CN23XX_PKT_OUTPUT_CTL_NSR_P                 BIT(2)
+#define    CN23XX_PKT_OUTPUT_CTL_ROR_P                 BIT(1)
+#define    CN23XX_PKT_OUTPUT_CTL_RING_ENB              BIT(0)
+
+/*######################### Mailbox Reg Macros ########################*/
+#define    CN23XX_VF_SLI_PKT_MBOX_INT_START            0x10210
+#define    CN23XX_SLI_PKT_PF_VF_MBOX_SIG_START         0x10200
+
+#define    CN23XX_SLI_MBOX_OFFSET                      0x20000
+#define    CN23XX_SLI_MBOX_SIG_IDX_OFFSET              0x8
+
+#define CN23XX_VF_SLI_PKT_MBOX_INT(q)	\
+	(CN23XX_VF_SLI_PKT_MBOX_INT_START + ((q) * CN23XX_SLI_MBOX_OFFSET))
+
+#define CN23XX_SLI_PKT_PF_VF_MBOX_SIG(q, idx)		\
+	(CN23XX_SLI_PKT_PF_VF_MBOX_SIG_START +		\
+	 ((q) * CN23XX_SLI_MBOX_OFFSET +		\
+	  (idx) * CN23XX_SLI_MBOX_SIG_IDX_OFFSET))
+
+/*######################## INTERRUPTS #########################*/
+
+#define    CN23XX_VF_SLI_INT_SUM_START		  0x100D0
+
+#define CN23XX_VF_SLI_INT_SUM(q)			\
+	(CN23XX_VF_SLI_INT_SUM_START + ((q) * CN23XX_VF_IQ_OFFSET))
+
+/*------------------ Interrupt Masks ----------------*/
+
+#define    CN23XX_INTR_PO_INT                   BIT_ULL(63)
+#define    CN23XX_INTR_PI_INT                   BIT_ULL(62)
+#define    CN23XX_INTR_MBOX_INT                 BIT_ULL(61)
+#define    CN23XX_INTR_RESEND                   BIT_ULL(60)
+
+#define    CN23XX_INTR_CINT_ENB                 BIT_ULL(48)
+#define    CN23XX_INTR_MBOX_ENB                 BIT(0)
+
+/*############################ MIO #########################*/
+#define    CN23XX_MIO_PTP_CLOCK_CFG       0x0001070000000f00ULL
+#define    CN23XX_MIO_PTP_CLOCK_LO        0x0001070000000f08ULL
+#define    CN23XX_MIO_PTP_CLOCK_HI        0x0001070000000f10ULL
+#define    CN23XX_MIO_PTP_CLOCK_COMP      0x0001070000000f18ULL
+#define    CN23XX_MIO_PTP_TIMESTAMP       0x0001070000000f20ULL
+#define    CN23XX_MIO_PTP_EVT_CNT         0x0001070000000f28ULL
+#define    CN23XX_MIO_PTP_CKOUT_THRESH_LO 0x0001070000000f30ULL
+#define    CN23XX_MIO_PTP_CKOUT_THRESH_HI 0x0001070000000f38ULL
+#define    CN23XX_MIO_PTP_CKOUT_HI_INCR   0x0001070000000f40ULL
+#define    CN23XX_MIO_PTP_CKOUT_LO_INCR   0x0001070000000f48ULL
+#define    CN23XX_MIO_PTP_PPS_THRESH_LO   0x0001070000000f50ULL
+#define    CN23XX_MIO_PTP_PPS_THRESH_HI   0x0001070000000f58ULL
+#define    CN23XX_MIO_PTP_PPS_HI_INCR     0x0001070000000f60ULL
+#define    CN23XX_MIO_PTP_PPS_LO_INCR     0x0001070000000f68ULL
+
+/*############################ RST #########################*/
+#define    CN23XX_RST_BOOT                0x0001180006001600ULL
+
+/*######################## MSIX TABLE #########################*/
+
+#define    CN23XX_MSIX_TABLE_ADDR_START    0x0
+#define    CN23XX_MSIX_TABLE_DATA_START    0x8
+
+#define    CN23XX_MSIX_TABLE_SIZE          0x10
+#define    CN23XX_MSIX_TABLE_ENTRIES       0x41
+
+#define    CN23XX_MSIX_ENTRY_VECTOR_CTL    BIT_ULL(32)
+
+#define CN23XX_MSIX_TABLE_ADDR(idx)		\
+	(CN23XX_MSIX_TABLE_ADDR_START + ((idx) * CN23XX_MSIX_TABLE_SIZE))
+
+#define CN23XX_MSIX_TABLE_DATA(idx)		\
+	(CN23XX_MSIX_TABLE_DATA_START + ((idx) * CN23XX_MSIX_TABLE_SIZE))
+
+#endif
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next V2 2/9] liquidio CN23XX: VF registration
From: Raghu Vatsavayi @ 2016-11-28  8:50 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1480323038-13702-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds support for cn23xx VF probe and registration.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 drivers/net/ethernet/cavium/Kconfig                |  12 +++
 drivers/net/ethernet/cavium/liquidio/Makefile      |  21 ++++
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h    |  34 ++++++
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 120 +++++++++++++++++++++
 .../net/ethernet/cavium/liquidio/octeon_device.c   |   4 +
 5 files changed, 191 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c

diff --git a/drivers/net/ethernet/cavium/Kconfig b/drivers/net/ethernet/cavium/Kconfig
index 92f411c..c0679c2 100644
--- a/drivers/net/ethernet/cavium/Kconfig
+++ b/drivers/net/ethernet/cavium/Kconfig
@@ -74,4 +74,16 @@ config OCTEON_MGMT_ETHERNET
 	  port on Cavium Networks' Octeon CN57XX, CN56XX, CN55XX,
 	  CN54XX, CN52XX, and CN6XXX chips.
 
+config LIQUIDIO_VF
+	tristate "Cavium LiquidIO VF support"
+	depends on 64BIT && PCI_MSI
+	select PTP_1588_CLOCK
+	---help---
+	  This driver supports Cavium LiquidIO Intelligent Server Adapter
+	  based on CN23XX chips.
+
+	  To compile this driver as a module, choose M here: The module
+	  will be called liquidio_vf. MSI-X interrupt support is required
+	  for this driver to work correctly
+
 endif # NET_VENDOR_CAVIUM
diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile b/drivers/net/ethernet/cavium/liquidio/Makefile
index 14958de..69d23fc 100644
--- a/drivers/net/ethernet/cavium/liquidio/Makefile
+++ b/drivers/net/ethernet/cavium/liquidio/Makefile
@@ -17,3 +17,24 @@ liquidio-$(CONFIG_LIQUIDIO) += lio_ethtool.o \
 			octeon_nic.o
 
 liquidio-objs := lio_main.o octeon_console.o $(liquidio-y)
+
+obj-$(CONFIG_LIQUIDIO_VF) += liquidio_vf.o
+
+ifeq ($(CONFIG_LIQUIDIO)$(CONFIG_LIQUIDIO_VF), yy)
+	liquidio_vf-objs := lio_vf_main.o
+else
+liquidio_vf-$(CONFIG_LIQUIDIO_VF) += lio_ethtool.o \
+			lio_core.o         \
+			request_manager.o  \
+			response_manager.o \
+			octeon_device.o    \
+			cn66xx_device.o    \
+			cn68xx_device.o    \
+			cn23xx_pf_device.o \
+			octeon_mailbox.o   \
+			octeon_mem_ops.o   \
+			octeon_droq.o      \
+			octeon_nic.o
+
+liquidio_vf-objs := lio_vf_main.o $(liquidio_vf-y)
+endif
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
new file mode 100644
index 0000000..015b6d4
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -0,0 +1,34 @@
+/**********************************************************************
+ * Author: Cavium, Inc.
+ *
+ * Contact: support@cavium.com
+ *          Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT.  See the GNU General Public License for more details.
+ ***********************************************************************/
+/*! \file  cn23xx_device.h
+ * \brief Host Driver: Routines that perform CN23XX specific operations.
+ */
+
+#ifndef __CN23XX_VF_DEVICE_H__
+#define __CN23XX_VF_DEVICE_H__
+
+#include "cn23xx_vf_regs.h"
+
+/* Register address and configuration for a CN23XX devices.
+ * If device specific changes need to be made then add a struct to include
+ * device specific fields as shown in the commented section
+ */
+struct octeon_cn23xx_vf {
+	struct octeon_config *conf;
+};
+#endif
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
new file mode 100644
index 0000000..fd108cd
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -0,0 +1,120 @@
+/**********************************************************************
+ * Author: Cavium, Inc.
+ *
+ * Contact: support@cavium.com
+ *          Please include "LiquidIO" in the subject.
+ *
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, Version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+ * NONINFRINGEMENT.  See the GNU General Public License for more details.
+ ***********************************************************************/
+#include <linux/pci.h>
+#include <net/vxlan.h>
+#include "liquidio_common.h"
+#include "octeon_droq.h"
+#include "octeon_iq.h"
+#include "response_manager.h"
+#include "octeon_device.h"
+
+MODULE_AUTHOR("Cavium Networks, <support@cavium.com>");
+MODULE_DESCRIPTION("Cavium LiquidIO Intelligent Server Adapter Virtual Function Driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(LIQUIDIO_VERSION);
+
+struct octeon_device_priv {
+	/* Tasklet structures for this device. */
+	struct tasklet_struct droq_tasklet;
+	unsigned long napi_mask;
+};
+
+static int
+liquidio_vf_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
+static void liquidio_vf_remove(struct pci_dev *pdev);
+
+static const struct pci_device_id liquidio_vf_pci_tbl[] = {
+	{
+		PCI_VENDOR_ID_CAVIUM, OCTEON_CN23XX_VF_VID,
+		PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0
+	},
+	{
+		0, 0, 0, 0, 0, 0, 0
+	}
+};
+MODULE_DEVICE_TABLE(pci, liquidio_vf_pci_tbl);
+
+static struct pci_driver liquidio_vf_pci_driver = {
+	.name		= "LiquidIO_VF",
+	.id_table	= liquidio_vf_pci_tbl,
+	.probe		= liquidio_vf_probe,
+	.remove		= liquidio_vf_remove,
+};
+
+/**
+ * \brief PCI probe handler
+ * @param pdev PCI device structure
+ * @param ent unused
+ */
+static int
+liquidio_vf_probe(struct pci_dev *pdev,
+		  const struct pci_device_id *ent __attribute__((unused)))
+{
+	struct octeon_device *oct_dev = NULL;
+
+	oct_dev = octeon_allocate_device(pdev->device,
+					 sizeof(struct octeon_device_priv));
+
+	if (!oct_dev) {
+		dev_err(&pdev->dev, "Unable to allocate device\n");
+		return -ENOMEM;
+	}
+
+	dev_info(&pdev->dev, "Initializing device %x:%x.\n",
+		 (u32)pdev->vendor, (u32)pdev->device);
+
+	/* Assign octeon_device for this device to the private data area. */
+	pci_set_drvdata(pdev, oct_dev);
+
+	/* set linux specific device pointer */
+	oct_dev->pci_dev = pdev;
+
+	return 0;
+}
+
+/**
+ * \brief Cleans up resources at unload time
+ * @param pdev PCI device structure
+ */
+static void liquidio_vf_remove(struct pci_dev *pdev)
+{
+	struct octeon_device *oct_dev = pci_get_drvdata(pdev);
+
+	dev_dbg(&oct_dev->pci_dev->dev, "Stopping device\n");
+
+	/* This octeon device has been removed. Update the global
+	 * data structure to reflect this. Free the device structure.
+	 */
+	octeon_free_device_mem(oct_dev);
+}
+
+static int __init liquidio_vf_init(void)
+{
+	octeon_init_device_list(0);
+	return pci_register_driver(&liquidio_vf_pci_driver);
+}
+
+static void __exit liquidio_vf_exit(void)
+{
+	pci_unregister_driver(&liquidio_vf_pci_driver);
+
+	pr_info("LiquidIO_VF network module is now unloaded\n");
+}
+
+module_init(liquidio_vf_init);
+module_exit(liquidio_vf_exit);
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index 79c8875..05bb0fd 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -28,6 +28,7 @@
 #include "cn66xx_regs.h"
 #include "cn66xx_device.h"
 #include "cn23xx_pf_device.h"
+#include "cn23xx_vf_device.h"
 
 /** Default configuration
  *  for CN66XX OCTEON Models.
@@ -672,6 +673,9 @@ static struct octeon_device *octeon_allocate_device_mem(u32 pci_id,
 	case OCTEON_CN23XX_PF_VID:
 		configsize = sizeof(struct octeon_cn23xx_pf);
 		break;
+	case OCTEON_CN23XX_VF_VID:
+		configsize = sizeof(struct octeon_cn23xx_vf);
+		break;
 	default:
 		pr_err("%s: Unknown PCI Device: 0x%x\n",
 		       __func__,
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next V2 7/9] liquidio CN23XX: VF mailbox
From: Raghu Vatsavayi @ 2016-11-28  8:50 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1480323038-13702-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds support for VF mailbox setup.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c    | 59 ++++++++++++++++++++++
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 10 ++++
 2 files changed, 69 insertions(+)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
index ad4e442..7dfec44 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -17,6 +17,7 @@
  ***********************************************************************/
 #include <linux/pci.h>
 #include <linux/netdevice.h>
+#include <linux/vmalloc.h>
 #include "liquidio_common.h"
 #include "octeon_droq.h"
 #include "octeon_iq.h"
@@ -24,6 +25,7 @@
 #include "octeon_device.h"
 #include "cn23xx_vf_device.h"
 #include "octeon_main.h"
+#include "octeon_mailbox.h"
 
 static int cn23xx_vf_reset_io_queues(struct octeon_device *oct, u32 num_queues)
 {
@@ -231,6 +233,61 @@ static void cn23xx_setup_vf_oq_regs(struct octeon_device *oct, u32 oq_no)
 	    (u8 *)oct->mmio[0].hw_addr + CN23XX_VF_SLI_OQ_PKTS_CREDIT(oq_no);
 }
 
+static void cn23xx_vf_mbox_thread(struct work_struct *work)
+{
+	struct cavium_wk *wk = (struct cavium_wk *)work;
+	struct octeon_mbox *mbox = (struct octeon_mbox *)wk->ctxptr;
+
+	octeon_mbox_process_message(mbox);
+}
+
+static int cn23xx_free_vf_mbox(struct octeon_device *oct)
+{
+	cancel_delayed_work_sync(&oct->mbox[0]->mbox_poll_wk.work);
+	vfree(oct->mbox[0]);
+	return 0;
+}
+
+static int cn23xx_setup_vf_mbox(struct octeon_device *oct)
+{
+	struct octeon_mbox *mbox = NULL;
+
+	mbox = vmalloc(sizeof(*mbox));
+	if (!mbox)
+		return 1;
+
+	memset(mbox, 0, sizeof(struct octeon_mbox));
+
+	spin_lock_init(&mbox->lock);
+
+	mbox->oct_dev = oct;
+
+	mbox->q_no = 0;
+
+	mbox->state = OCTEON_MBOX_STATE_IDLE;
+
+	/* VF mbox interrupt reg */
+	mbox->mbox_int_reg =
+	    (u8 *)oct->mmio[0].hw_addr + CN23XX_VF_SLI_PKT_MBOX_INT(0);
+	/* VF reads from SIG0 reg */
+	mbox->mbox_read_reg =
+	    (u8 *)oct->mmio[0].hw_addr + CN23XX_SLI_PKT_PF_VF_MBOX_SIG(0, 0);
+	/* VF writes into SIG1 reg */
+	mbox->mbox_write_reg =
+	    (u8 *)oct->mmio[0].hw_addr + CN23XX_SLI_PKT_PF_VF_MBOX_SIG(0, 1);
+
+	INIT_DELAYED_WORK(&mbox->mbox_poll_wk.work,
+			  cn23xx_vf_mbox_thread);
+
+	mbox->mbox_poll_wk.ctxptr = mbox;
+
+	oct->mbox[0] = mbox;
+
+	writeq(OCTEON_PFVFSIG, mbox->mbox_read_reg);
+
+	return 0;
+}
+
 static int cn23xx_enable_vf_io_queues(struct octeon_device *oct)
 {
 	u32 q_no;
@@ -338,6 +395,8 @@ int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
 
 	oct->fn_list.setup_iq_regs = cn23xx_setup_vf_iq_regs;
 	oct->fn_list.setup_oq_regs = cn23xx_setup_vf_oq_regs;
+	oct->fn_list.setup_mbox = cn23xx_setup_vf_mbox;
+	oct->fn_list.free_mbox = cn23xx_free_vf_mbox;
 	oct->fn_list.setup_device_regs = cn23xx_setup_vf_device_regs;
 
 	oct->fn_list.enable_io_queues = cn23xx_enable_vf_io_queues;
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 162e47b..43a1e3f 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -143,6 +143,10 @@ static void octeon_destroy_resources(struct octeon_device *oct)
 	int i;
 
 	switch (atomic_read(&oct->status)) {
+	case OCT_DEV_MBOX_SETUP_DONE:
+		oct->fn_list.free_mbox(oct);
+
+		/* fallthrough */
 	case OCT_DEV_IN_RESET:
 	case OCT_DEV_DROQ_INIT_DONE:
 		mdelay(100);
@@ -316,6 +320,12 @@ static int octeon_device_init(struct octeon_device *oct)
 	}
 	atomic_set(&oct->status, OCT_DEV_DROQ_INIT_DONE);
 
+	if (oct->fn_list.setup_mbox(oct)) {
+		dev_err(&oct->pci_dev->dev, "Mailbox setup failed\n");
+		return 1;
+	}
+	atomic_set(&oct->status, OCT_DEV_MBOX_SETUP_DONE);
+
 	return 0;
 }
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next V2 8/9] liquidio CN23XX: VF interrupt
From: Raghu Vatsavayi @ 2016-11-28  8:50 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1480323038-13702-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds support for VF interrupt processing.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c    | 265 +++++++++++++++++++++
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h    |   6 +
 drivers/net/ethernet/cavium/liquidio/lio_core.c    |   7 -
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 162 +++++++++++++
 .../net/ethernet/cavium/liquidio/octeon_device.c   |   3 +
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   2 +
 6 files changed, 438 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
index 7dfec44..108e487 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -27,6 +27,26 @@
 #include "octeon_main.h"
 #include "octeon_mailbox.h"
 
+u32 cn23xx_vf_get_oq_ticks(struct octeon_device *oct, u32 time_intr_in_us)
+{
+	/* This gives the SLI clock per microsec */
+	u32 oqticks_per_us = (u32)oct->pfvf_hsword.coproc_tics_per_us;
+
+	/* This gives the clock cycles per millisecond */
+	oqticks_per_us *= 1000;
+
+	/* This gives the oq ticks (1024 core clock cycles) per millisecond */
+	oqticks_per_us /= 1024;
+
+	/* time_intr is in microseconds. The next 2 steps gives the oq ticks
+	 * corressponding to time_intr.
+	 */
+	oqticks_per_us *= time_intr_in_us;
+	oqticks_per_us /= 1000;
+
+	return oqticks_per_us;
+}
+
 static int cn23xx_vf_reset_io_queues(struct octeon_device *oct, u32 num_queues)
 {
 	u32 loop = BUSY_READING_REG_VF_LOOP_COUNT;
@@ -212,6 +232,11 @@ static void cn23xx_setup_vf_iq_regs(struct octeon_device *oct, u32 iq_no)
 	 */
 	pkt_in_done = readq(iq->inst_cnt_reg);
 
+	if (oct->msix_on) {
+		/* Set CINT_ENB to enable IQ interrupt */
+		writeq((pkt_in_done | CN23XX_INTR_CINT_ENB),
+		       iq->inst_cnt_reg);
+	}
 	iq->reset_instr_cnt = 0;
 }
 
@@ -342,6 +367,240 @@ static void cn23xx_disable_vf_io_queues(struct octeon_device *oct)
 	cn23xx_vf_reset_io_queues(oct, num_queues);
 }
 
+void cn23xx_vf_ask_pf_to_do_flr(struct octeon_device *oct)
+{
+	struct octeon_mbox_cmd mbox_cmd;
+
+	mbox_cmd.msg.u64 = 0;
+	mbox_cmd.msg.s.type = OCTEON_MBOX_REQUEST;
+	mbox_cmd.msg.s.resp_needed = 0;
+	mbox_cmd.msg.s.cmd = OCTEON_VF_FLR_REQUEST;
+	mbox_cmd.msg.s.len = 1;
+	mbox_cmd.q_no = 0;
+	mbox_cmd.recv_len = 0;
+	mbox_cmd.recv_status = 0;
+	mbox_cmd.fn = NULL;
+	mbox_cmd.fn_arg = 0;
+
+	octeon_mbox_write(oct, &mbox_cmd);
+}
+
+static void octeon_pfvf_hs_callback(struct octeon_device *oct,
+				    struct octeon_mbox_cmd *cmd,
+				    void *arg)
+{
+	u32 major = 0;
+
+	memcpy((uint8_t *)&oct->pfvf_hsword, cmd->msg.s.params,
+	       CN23XX_MAILBOX_MSGPARAM_SIZE);
+	if (cmd->recv_len > 1)  {
+		major = ((struct lio_version *)(cmd->data))->major;
+		major = major << 16;
+	}
+
+	atomic_set((atomic_t *)arg, major | 1);
+}
+
+int cn23xx_octeon_pfvf_handshake(struct octeon_device *oct)
+{
+	struct octeon_mbox_cmd mbox_cmd;
+	u32 q_no, count = 0;
+	atomic_t status;
+	u32 pfmajor;
+	u32 vfmajor;
+	u32 ret;
+
+	/* Sending VF_ACTIVE indication to the PF driver */
+	dev_dbg(&oct->pci_dev->dev, "requesting info from pf\n");
+
+	mbox_cmd.msg.u64 = 0;
+	mbox_cmd.msg.s.type = OCTEON_MBOX_REQUEST;
+	mbox_cmd.msg.s.resp_needed = 1;
+	mbox_cmd.msg.s.cmd = OCTEON_VF_ACTIVE;
+	mbox_cmd.msg.s.len = 2;
+	mbox_cmd.data[0] = 0;
+	((struct lio_version *)&mbox_cmd.data[0])->major =
+						LIQUIDIO_BASE_MAJOR_VERSION;
+	((struct lio_version *)&mbox_cmd.data[0])->minor =
+						LIQUIDIO_BASE_MINOR_VERSION;
+	((struct lio_version *)&mbox_cmd.data[0])->micro =
+						LIQUIDIO_BASE_MICRO_VERSION;
+	mbox_cmd.q_no = 0;
+	mbox_cmd.recv_len = 0;
+	mbox_cmd.recv_status = 0;
+	mbox_cmd.fn = (octeon_mbox_callback_t)octeon_pfvf_hs_callback;
+	mbox_cmd.fn_arg = &status;
+
+	/* Interrupts are not enabled at this point.
+	 * Enable them with default oq ticks
+	 */
+	oct->fn_list.enable_interrupt(oct, OCTEON_ALL_INTR);
+
+	octeon_mbox_write(oct, &mbox_cmd);
+
+	atomic_set(&status, 0);
+
+	do {
+		schedule_timeout_uninterruptible(1);
+	} while ((!atomic_read(&status)) && (count++ < 100000));
+
+	/* Disable the interrupt so that the interrupsts will be reenabled
+	 * with the oq ticks received from the PF
+	 */
+	oct->fn_list.disable_interrupt(oct, OCTEON_ALL_INTR);
+
+	ret = atomic_read(&status);
+	if (!ret) {
+		dev_err(&oct->pci_dev->dev, "octeon_pfvf_handshake timeout\n");
+		return 1;
+	}
+
+	for (q_no = 0 ; q_no < oct->num_iqs ; q_no++)
+		oct->instr_queue[q_no]->txpciq.s.pkind = oct->pfvf_hsword.pkind;
+
+	vfmajor = LIQUIDIO_BASE_MAJOR_VERSION;
+	pfmajor = ret >> 16;
+	if (pfmajor != vfmajor) {
+		dev_err(&oct->pci_dev->dev,
+			"VF Liquidio driver (major version %d) is not compatible with Liquidio PF driver (major version %d)\n",
+			vfmajor, pfmajor);
+		return 1;
+	}
+
+	dev_dbg(&oct->pci_dev->dev,
+		"VF Liquidio driver (major version %d), Liquidio PF driver (major version %d)\n",
+		vfmajor, pfmajor);
+
+	dev_dbg(&oct->pci_dev->dev, "got data from pf pkind is %d\n",
+		oct->pfvf_hsword.pkind);
+
+	return 0;
+}
+
+static void cn23xx_handle_vf_mbox_intr(struct octeon_ioq_vector *ioq_vector)
+{
+	struct octeon_device *oct = ioq_vector->oct_dev;
+	u64 mbox_int_val;
+
+	if (!ioq_vector->droq_index) {
+		/* read and clear by writing 1 */
+		mbox_int_val = readq(oct->mbox[0]->mbox_int_reg);
+		writeq(mbox_int_val, oct->mbox[0]->mbox_int_reg);
+		if (octeon_mbox_read(oct->mbox[0]))
+			schedule_delayed_work(&oct->mbox[0]->mbox_poll_wk.work,
+					      msecs_to_jiffies(0));
+	}
+}
+
+static u64 cn23xx_vf_msix_interrupt_handler(void *dev)
+{
+	struct octeon_ioq_vector *ioq_vector = (struct octeon_ioq_vector *)dev;
+	struct octeon_device *oct = ioq_vector->oct_dev;
+	struct octeon_droq *droq = oct->droq[ioq_vector->droq_index];
+	u64 pkts_sent;
+	u64 ret = 0;
+
+	dev_dbg(&oct->pci_dev->dev, "In %s octeon_dev @ %p\n", __func__, oct);
+	pkts_sent = readq(droq->pkts_sent_reg);
+
+	/* If our device has interrupted, then proceed. Also check
+	 * for all f's if interrupt was triggered on an error
+	 * and the PCI read fails.
+	 */
+	if (!pkts_sent || (pkts_sent == 0xFFFFFFFFFFFFFFFFULL))
+		return ret;
+
+	/* Write count reg in sli_pkt_cnts to clear these int. */
+	if ((pkts_sent & CN23XX_INTR_PO_INT) ||
+	    (pkts_sent & CN23XX_INTR_PI_INT)) {
+		if (pkts_sent & CN23XX_INTR_PO_INT)
+			ret |= MSIX_PO_INT;
+	}
+
+	if (pkts_sent & CN23XX_INTR_PI_INT)
+		/* We will clear the count when we update the read_index. */
+		ret |= MSIX_PI_INT;
+
+	if (pkts_sent & CN23XX_INTR_MBOX_INT) {
+		cn23xx_handle_vf_mbox_intr(ioq_vector);
+		ret |= MSIX_MBOX_INT;
+	}
+
+	return ret;
+}
+
+static void cn23xx_enable_vf_interrupt(struct octeon_device *oct, u8 intr_flag)
+{
+	struct octeon_cn23xx_vf *cn23xx = (struct octeon_cn23xx_vf *)oct->chip;
+	u32 q_no, time_threshold;
+
+	if (intr_flag & OCTEON_OUTPUT_INTR) {
+		for (q_no = 0; q_no < oct->num_oqs; q_no++) {
+			/* Set up interrupt packet and time thresholds
+			 * for all the OQs
+			 */
+			time_threshold = cn23xx_vf_get_oq_ticks(
+				oct, (u32)CFG_GET_OQ_INTR_TIME(cn23xx->conf));
+
+			octeon_write_csr64(
+			    oct, CN23XX_VF_SLI_OQ_PKT_INT_LEVELS(q_no),
+			    (CFG_GET_OQ_INTR_PKT(cn23xx->conf) |
+			     ((u64)time_threshold << 32)));
+		}
+	}
+
+	if (intr_flag & OCTEON_INPUT_INTR) {
+		for (q_no = 0; q_no < oct->num_oqs; q_no++) {
+			/* Set CINT_ENB to enable IQ interrupt */
+			octeon_write_csr64(
+			    oct, CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no),
+			    ((octeon_read_csr64(
+				  oct, CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no)) &
+			      ~CN23XX_PKT_IN_DONE_CNT_MASK) |
+			     CN23XX_INTR_CINT_ENB));
+		}
+	}
+
+	/* Set queue-0 MBOX_ENB to enable VF mailbox interrupt */
+	if (intr_flag & OCTEON_MBOX_INTR) {
+		octeon_write_csr64(
+		    oct, CN23XX_VF_SLI_PKT_MBOX_INT(0),
+		    (octeon_read_csr64(oct, CN23XX_VF_SLI_PKT_MBOX_INT(0)) |
+		     CN23XX_INTR_MBOX_ENB));
+	}
+}
+
+static void cn23xx_disable_vf_interrupt(struct octeon_device *oct, u8 intr_flag)
+{
+	u32 q_no;
+
+	if (intr_flag & OCTEON_OUTPUT_INTR) {
+		for (q_no = 0; q_no < oct->num_oqs; q_no++) {
+			/* Write all 1's in INT_LEVEL reg to disable PO_INT */
+			octeon_write_csr64(
+			    oct, CN23XX_VF_SLI_OQ_PKT_INT_LEVELS(q_no),
+			    0x3fffffffffffff);
+		}
+	}
+	if (intr_flag & OCTEON_INPUT_INTR) {
+		for (q_no = 0; q_no < oct->num_oqs; q_no++) {
+			octeon_write_csr64(
+			    oct, CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no),
+			    (octeon_read_csr64(
+				 oct, CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no)) &
+			     ~(CN23XX_INTR_CINT_ENB |
+			       CN23XX_PKT_IN_DONE_CNT_MASK)));
+		}
+	}
+
+	if (intr_flag & OCTEON_MBOX_INTR) {
+		octeon_write_csr64(
+		    oct, CN23XX_VF_SLI_PKT_MBOX_INT(0),
+		    (octeon_read_csr64(oct, CN23XX_VF_SLI_PKT_MBOX_INT(0)) &
+		     ~CN23XX_INTR_MBOX_ENB));
+	}
+}
+
 int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
 {
 	struct octeon_cn23xx_vf *cn23xx = (struct octeon_cn23xx_vf *)oct->chip;
@@ -397,8 +656,14 @@ int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
 	oct->fn_list.setup_oq_regs = cn23xx_setup_vf_oq_regs;
 	oct->fn_list.setup_mbox = cn23xx_setup_vf_mbox;
 	oct->fn_list.free_mbox = cn23xx_free_vf_mbox;
+
+	oct->fn_list.msix_interrupt_handler = cn23xx_vf_msix_interrupt_handler;
+
 	oct->fn_list.setup_device_regs = cn23xx_setup_vf_device_regs;
 
+	oct->fn_list.enable_interrupt = cn23xx_enable_vf_interrupt;
+	oct->fn_list.disable_interrupt = cn23xx_disable_vf_interrupt;
+
 	oct->fn_list.enable_io_queues = cn23xx_enable_vf_io_queues;
 	oct->fn_list.disable_io_queues = cn23xx_disable_vf_io_queues;
 
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
index d17c1ce..8590bdb 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -34,6 +34,12 @@ struct octeon_cn23xx_vf {
 
 #define BUSY_READING_REG_VF_LOOP_COUNT		10000
 
+#define CN23XX_MAILBOX_MSGPARAM_SIZE		6
+
+void cn23xx_vf_ask_pf_to_do_flr(struct octeon_device *oct);
+
+int cn23xx_octeon_pfvf_handshake(struct octeon_device *oct);
+
 int cn23xx_setup_octeon_vf_device(struct octeon_device *oct);
 
 void cn23xx_dump_vf_initialized_regs(struct octeon_device *oct);
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_core.c b/drivers/net/ethernet/cavium/liquidio/lio_core.c
index 403bcaa..f629c2f 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_core.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_core.c
@@ -85,13 +85,6 @@ void octeon_update_tx_completion_counters(void *buf, int reqtype,
 	}
 
 	(*pkts_compl)++;
-/*TODO, Use some other pound define to suggest
- * the fact that iqs are not tied to netdevs
- * and can take traffic from different netdevs
- * hence bql reporting is done per packet
- * than in bulk. Usage of NO_NAPI in txq completion is
- * a little confusing
- */
 	*bytes_compl += skb->len;
 }
 
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 43a1e3f..3d5c61a 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -59,6 +59,118 @@ struct octeon_device_priv {
 	.remove		= liquidio_vf_remove,
 };
 
+static
+int liquidio_schedule_msix_droq_pkt_handler(struct octeon_droq *droq, u64 ret)
+{
+	struct octeon_device *oct = droq->oct_dev;
+	struct octeon_device_priv *oct_priv =
+	    (struct octeon_device_priv *)oct->priv;
+
+	if (droq->ops.poll_mode) {
+		droq->ops.napi_fn(droq);
+	} else {
+		if (ret & MSIX_PO_INT) {
+			dev_err(&oct->pci_dev->dev,
+				"should not come here should not get rx when poll mode = 0 for vf\n");
+			tasklet_schedule(&oct_priv->droq_tasklet);
+			return 1;
+		}
+		/* this will be flushed periodically by check iq db */
+		if (ret & MSIX_PI_INT)
+			return 0;
+	}
+	return 0;
+}
+
+static irqreturn_t
+liquidio_msix_intr_handler(int irq __attribute__((unused)), void *dev)
+{
+	struct octeon_ioq_vector *ioq_vector = (struct octeon_ioq_vector *)dev;
+	struct octeon_device *oct = ioq_vector->oct_dev;
+	struct octeon_droq *droq = oct->droq[ioq_vector->droq_index];
+	u64 ret;
+
+	ret = oct->fn_list.msix_interrupt_handler(ioq_vector);
+
+	if ((ret & MSIX_PO_INT) || (ret & MSIX_PI_INT))
+		liquidio_schedule_msix_droq_pkt_handler(droq, ret);
+
+	return IRQ_HANDLED;
+}
+
+/**
+ * \brief Setup interrupt for octeon device
+ * @param oct octeon device
+ *
+ *  Enable interrupt in Octeon device as given in the PCI interrupt mask.
+ */
+static int octeon_setup_interrupt(struct octeon_device *oct)
+{
+	struct msix_entry *msix_entries;
+	int num_alloc_ioq_vectors;
+	int num_ioq_vectors;
+	int irqret;
+	int i;
+
+	if (oct->msix_on) {
+		oct->num_msix_irqs = oct->sriov_info.rings_per_vf;
+
+		oct->msix_entries = kcalloc(
+		    oct->num_msix_irqs, sizeof(struct msix_entry), GFP_KERNEL);
+		if (!oct->msix_entries)
+			return 1;
+
+		msix_entries = (struct msix_entry *)oct->msix_entries;
+
+		for (i = 0; i < oct->num_msix_irqs; i++)
+			msix_entries[i].entry = i;
+		num_alloc_ioq_vectors = pci_enable_msix_range(
+						oct->pci_dev, msix_entries,
+						oct->num_msix_irqs,
+						oct->num_msix_irqs);
+		if (num_alloc_ioq_vectors < 0) {
+			dev_err(&oct->pci_dev->dev, "unable to Allocate MSI-X interrupts\n");
+			kfree(oct->msix_entries);
+			oct->msix_entries = NULL;
+			return 1;
+		}
+		dev_dbg(&oct->pci_dev->dev, "OCTEON: Enough MSI-X interrupts are allocated...\n");
+
+		num_ioq_vectors = oct->num_msix_irqs;
+
+		for (i = 0; i < num_ioq_vectors; i++) {
+			irqret = request_irq(msix_entries[i].vector,
+					     liquidio_msix_intr_handler, 0,
+					     "octeon", &oct->ioq_vector[i]);
+			if (irqret) {
+				dev_err(&oct->pci_dev->dev,
+					"OCTEON: Request_irq failed for MSIX interrupt Error: %d\n",
+					irqret);
+
+				while (i) {
+					i--;
+					irq_set_affinity_hint(
+					    msix_entries[i].vector, NULL);
+					free_irq(msix_entries[i].vector,
+						 &oct->ioq_vector[i]);
+				}
+				pci_disable_msix(oct->pci_dev);
+				kfree(oct->msix_entries);
+				oct->msix_entries = NULL;
+				return 1;
+			}
+			oct->ioq_vector[i].vector = msix_entries[i].vector;
+			/* assign the cpu mask for this msix interrupt vector */
+			irq_set_affinity_hint(
+			    msix_entries[i].vector,
+			    (&oct->ioq_vector[i].affinity_mask));
+		}
+		dev_dbg(&oct->pci_dev->dev,
+			"OCTEON[%d]: MSI-X enabled\n", oct->octeon_id);
+	}
+	return 0;
+}
+
 /**
  * \brief PCI probe handler
  * @param pdev PCI device structure
@@ -77,6 +189,7 @@ struct octeon_device_priv {
 		dev_err(&pdev->dev, "Unable to allocate device\n");
 		return -ENOMEM;
 	}
+	oct_dev->msix_on = LIO_FLAG_MSIX_ENABLED;
 
 	dev_info(&pdev->dev, "Initializing device %x:%x.\n",
 		 (u32)pdev->vendor, (u32)pdev->device);
@@ -140,9 +253,37 @@ static void octeon_pci_flr(struct octeon_device *oct)
  */
 static void octeon_destroy_resources(struct octeon_device *oct)
 {
+	struct msix_entry *msix_entries;
 	int i;
 
 	switch (atomic_read(&oct->status)) {
+	case OCT_DEV_INTR_SET_DONE:
+		/* Disable interrupts  */
+		oct->fn_list.disable_interrupt(oct, OCTEON_ALL_INTR);
+
+		if (oct->msix_on) {
+			msix_entries = (struct msix_entry *)oct->msix_entries;
+			for (i = 0; i < oct->num_msix_irqs; i++) {
+				irq_set_affinity_hint(msix_entries[i].vector,
+						      NULL);
+				free_irq(msix_entries[i].vector,
+					 &oct->ioq_vector[i]);
+			}
+			pci_disable_msix(oct->pci_dev);
+			kfree(oct->msix_entries);
+			oct->msix_entries = NULL;
+		}
+		/* Soft reset the octeon device before exiting */
+		if (oct->pci_dev->reset_fn)
+			octeon_pci_flr(oct);
+		else
+			cn23xx_vf_ask_pf_to_do_flr(oct);
+
+		/* fallthrough */
+	case OCT_DEV_MSIX_ALLOC_VECTOR_DONE:
+		octeon_free_ioq_vector(oct);
+
+		/* fallthrough */
 	case OCT_DEV_MBOX_SETUP_DONE:
 		oct->fn_list.free_mbox(oct);
 
@@ -326,6 +467,27 @@ static int octeon_device_init(struct octeon_device *oct)
 	}
 	atomic_set(&oct->status, OCT_DEV_MBOX_SETUP_DONE);
 
+	if (octeon_allocate_ioq_vector(oct)) {
+		dev_err(&oct->pci_dev->dev, "ioq vector allocation failed\n");
+		return 1;
+	}
+	atomic_set(&oct->status, OCT_DEV_MSIX_ALLOC_VECTOR_DONE);
+
+	dev_info(&oct->pci_dev->dev, "OCTEON_CN23XX VF Version: %s, %d ioqs\n",
+		 LIQUIDIO_VERSION, oct->sriov_info.rings_per_vf);
+
+	/* Setup the interrupt handler and record the INT SUM register address*/
+	if (octeon_setup_interrupt(oct))
+		return 1;
+
+	if (cn23xx_octeon_pfvf_handshake(oct))
+		return 1;
+
+	/* Enable Octeon device interrupts */
+	oct->fn_list.enable_interrupt(oct, OCTEON_ALL_INTR);
+
+	atomic_set(&oct->status, OCT_DEV_INTR_SET_DONE);
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index fcc5f10..6d54032 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -754,6 +754,9 @@ struct octeon_device *octeon_allocate_device(u32 pci_id,
 
 	if (OCTEON_CN23XX_PF(oct))
 		num_ioqs = oct->sriov_info.num_pf_rings;
+	else if (OCTEON_CN23XX_VF(oct))
+		num_ioqs = oct->sriov_info.rings_per_vf;
+
 	size = sizeof(struct octeon_ioq_vector) * num_ioqs;
 
 	oct->ioq_vector = vmalloc(size);
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.h b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
index 1e6bfa1..18f6836 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
@@ -53,6 +53,7 @@ enum {
 	NUM_OCTEON_CONFS,
 };
 
+#define  OCTEON_INPUT_INTR    (1)
 #define  OCTEON_OUTPUT_INTR   (2)
 #define  OCTEON_MBOX_INTR     (4)
 #define  OCTEON_ALL_INTR      0xff
@@ -294,6 +295,7 @@ struct octdev_props {
 #define LIO_FLAG_MSIX_ENABLED	0x1
 #define MSIX_PO_INT		0x1
 #define MSIX_PI_INT		0x2
+#define MSIX_MBOX_INT		0x4
 
 struct octeon_pf_vf_hs_word {
 #ifdef __LITTLE_ENDIAN_BITFIELD
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next V2 9/9] liquidio CN23XX: VF init and destroy
From: Raghu Vatsavayi @ 2016-11-28  8:50 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1480323038-13702-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds support for VF initialization and destroy resources.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h    |   2 +
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 106 +++++++++++++++++++++
 2 files changed, 108 insertions(+)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
index 8590bdb..6715df3 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -36,6 +36,8 @@ struct octeon_cn23xx_vf {
 
 #define CN23XX_MAILBOX_MSGPARAM_SIZE		6
 
+#define MAX_VF_IP_OP_PENDING_PKT_COUNT		100
+
 void cn23xx_vf_ask_pf_to_do_flr(struct octeon_device *oct);
 
 int cn23xx_octeon_pfvf_handshake(struct octeon_device *oct);
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 3d5c61a..e6321f3 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -41,6 +41,60 @@ struct octeon_device_priv {
 static void liquidio_vf_remove(struct pci_dev *pdev);
 static int octeon_device_init(struct octeon_device *oct);
 
+static int lio_wait_for_oq_pkts(struct octeon_device *oct)
+{
+	struct octeon_device_priv *oct_priv =
+	    (struct octeon_device_priv *)oct->priv;
+	int retry = MAX_VF_IP_OP_PENDING_PKT_COUNT;
+	int pkt_cnt = 0, pending_pkts;
+	int i;
+
+	do {
+		pending_pkts = 0;
+
+		for (i = 0; i < MAX_OCTEON_OUTPUT_QUEUES(oct); i++) {
+			if (!(oct->io_qmask.oq & BIT_ULL(i)))
+				continue;
+			pkt_cnt += octeon_droq_check_hw_for_pkts(oct->droq[i]);
+		}
+		if (pkt_cnt > 0) {
+			pending_pkts += pkt_cnt;
+			tasklet_schedule(&oct_priv->droq_tasklet);
+		}
+		pkt_cnt = 0;
+		schedule_timeout_uninterruptible(1);
+
+	} while (retry-- && pending_pkts);
+
+	return pkt_cnt;
+}
+
+/**
+ * \brief wait for all pending requests to complete
+ * @param oct Pointer to Octeon device
+ *
+ * Called during shutdown sequence
+ */
+static int wait_for_pending_requests(struct octeon_device *oct)
+{
+	int i, pcount = 0;
+
+	for (i = 0; i < MAX_VF_IP_OP_PENDING_PKT_COUNT; i++) {
+		pcount = atomic_read(
+		    &oct->response_list[OCTEON_ORDERED_SC_LIST]
+			 .pending_req_count);
+		if (pcount)
+			schedule_timeout_uninterruptible(HZ / 10);
+		else
+			break;
+	}
+
+	if (pcount)
+		return 1;
+
+	return 0;
+}
+
 static const struct pci_device_id liquidio_vf_pci_tbl[] = {
 	{
 		PCI_VENDOR_ID_CAVIUM, OCTEON_CN23XX_VF_VID,
@@ -257,6 +311,35 @@ static void octeon_destroy_resources(struct octeon_device *oct)
 	int i;
 
 	switch (atomic_read(&oct->status)) {
+	case OCT_DEV_RUNNING:
+	case OCT_DEV_CORE_OK:
+		/* No more instructions will be forwarded. */
+		atomic_set(&oct->status, OCT_DEV_IN_RESET);
+
+		dev_dbg(&oct->pci_dev->dev, "Device state is now %s\n",
+			lio_get_state_string(&oct->status));
+
+		schedule_timeout_uninterruptible(HZ / 10);
+
+		/* fallthrough */
+	case OCT_DEV_HOST_OK:
+		/* fallthrough */
+	case OCT_DEV_IO_QUEUES_DONE:
+		if (wait_for_pending_requests(oct))
+			dev_err(&oct->pci_dev->dev, "There were pending requests\n");
+
+		if (lio_wait_for_instr_fetch(oct))
+			dev_err(&oct->pci_dev->dev, "IQ had pending instructions\n");
+
+		/* Disable the input and output queues now. No more packets will
+		 * arrive from Octeon, but we should wait for all packet
+		 * processing to finish.
+		 */
+		oct->fn_list.disable_io_queues(oct);
+
+		if (lio_wait_for_oq_pkts(oct))
+			dev_err(&oct->pci_dev->dev, "OQ had pending packets\n");
+
 	case OCT_DEV_INTR_SET_DONE:
 		/* Disable interrupts  */
 		oct->fn_list.disable_interrupt(oct, OCTEON_ALL_INTR);
@@ -395,6 +478,7 @@ static int octeon_pci_os_setup(struct octeon_device *oct)
 static int octeon_device_init(struct octeon_device *oct)
 {
 	u32 rev_id;
+	int j;
 
 	atomic_set(&oct->status, OCT_DEV_BEGIN_STATE);
 
@@ -488,6 +572,28 @@ static int octeon_device_init(struct octeon_device *oct)
 
 	atomic_set(&oct->status, OCT_DEV_INTR_SET_DONE);
 
+	/* Enable the input and output queues for this Octeon device */
+	if (oct->fn_list.enable_io_queues(oct)) {
+		dev_err(&oct->pci_dev->dev, "enabling io queues failed\n");
+		return 1;
+	}
+
+	atomic_set(&oct->status, OCT_DEV_IO_QUEUES_DONE);
+
+	atomic_set(&oct->status, OCT_DEV_HOST_OK);
+
+	/* Send Credit for Octeon Output queues. Credits are always sent after
+	 * the output queue is enabled.
+	 */
+	for (j = 0; j < oct->num_oqs; j++)
+		writel(oct->droq[j]->max_count, oct->droq[j]->pkts_credit_reg);
+
+	/* Packets can start arriving on the output queues from this point. */
+
+	atomic_set(&oct->status, OCT_DEV_CORE_OK);
+
+	atomic_set(&oct->status, OCT_DEV_RUNNING);
+
 	return 0;
 }
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next V2 5/9] liquidio CN23XX: VF register access
From: Raghu Vatsavayi @ 2016-11-28  8:50 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1480323038-13702-1-git-send-email-rvatsavayi@caviumnetworks.com>

This patch adds support for VF device register access.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c    | 189 +++++++++++++++++++++
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h    |   2 +
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c |   5 +
 3 files changed, 196 insertions(+)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
index 60fd138..ad4e442 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -76,6 +76,161 @@ static int cn23xx_vf_reset_io_queues(struct octeon_device *oct, u32 num_queues)
 	return ret_val;
 }
 
+static int cn23xx_vf_setup_global_input_regs(struct octeon_device *oct)
+{
+	struct octeon_cn23xx_vf *cn23xx = (struct octeon_cn23xx_vf *)oct->chip;
+	struct octeon_instr_queue *iq;
+	u64 q_no, intr_threshold;
+	u64 d64;
+
+	if (cn23xx_vf_reset_io_queues(oct, oct->sriov_info.rings_per_vf))
+		return -1;
+
+	for (q_no = 0; q_no < (oct->sriov_info.rings_per_vf); q_no++) {
+		void __iomem *inst_cnt_reg;
+
+		octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_DOORBELL(q_no),
+				   0xFFFFFFFF);
+		iq = oct->instr_queue[q_no];
+
+		if (iq)
+			inst_cnt_reg = iq->inst_cnt_reg;
+		else
+			inst_cnt_reg = (u8 *)oct->mmio[0].hw_addr +
+				       CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no);
+
+		d64 = octeon_read_csr64(oct,
+					CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no));
+
+		d64 &= 0xEFFFFFFFFFFFFFFFL;
+
+		octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_INSTR_COUNT64(q_no),
+				   d64);
+
+		/* Select ES, RO, NS, RDSIZE,DPTR Fomat#0 for
+		 * the Input Queues
+		 */
+		octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no),
+				   CN23XX_PKT_INPUT_CTL_MASK);
+
+		/* set the wmark level to trigger PI_INT */
+		intr_threshold = CFG_GET_IQ_INTR_PKT(cn23xx->conf) &
+				 CN23XX_PKT_IN_DONE_WMARK_MASK;
+
+		writeq((readq(inst_cnt_reg) &
+			~(CN23XX_PKT_IN_DONE_WMARK_MASK <<
+			  CN23XX_PKT_IN_DONE_WMARK_BIT_POS)) |
+		       (intr_threshold << CN23XX_PKT_IN_DONE_WMARK_BIT_POS),
+		       inst_cnt_reg);
+	}
+	return 0;
+}
+
+static void cn23xx_vf_setup_global_output_regs(struct octeon_device *oct)
+{
+	u32 reg_val;
+	u32 q_no;
+
+	for (q_no = 0; q_no < (oct->sriov_info.rings_per_vf); q_no++) {
+		octeon_write_csr(oct, CN23XX_VF_SLI_OQ_PKTS_CREDIT(q_no),
+				 0xFFFFFFFF);
+
+		reg_val =
+		    octeon_read_csr(oct, CN23XX_VF_SLI_OQ_PKTS_SENT(q_no));
+
+		reg_val &= 0xEFFFFFFFFFFFFFFFL;
+
+		reg_val =
+		    octeon_read_csr(oct, CN23XX_VF_SLI_OQ_PKT_CONTROL(q_no));
+
+		/* set IPTR & DPTR */
+		reg_val |=
+		    (CN23XX_PKT_OUTPUT_CTL_IPTR | CN23XX_PKT_OUTPUT_CTL_DPTR);
+
+		/* reset BMODE */
+		reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_BMODE);
+
+		/* No Relaxed Ordering, No Snoop, 64-bit Byte swap
+		 * for Output Queue ScatterList reset ROR_P, NSR_P
+		 */
+		reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_ROR_P);
+		reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_NSR_P);
+
+#ifdef __LITTLE_ENDIAN_BITFIELD
+		reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_ES_P);
+#else
+		reg_val |= (CN23XX_PKT_OUTPUT_CTL_ES_P);
+#endif
+		/* No Relaxed Ordering, No Snoop, 64-bit Byte swap
+		 * for Output Queue Data reset ROR, NSR
+		 */
+		reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_ROR);
+		reg_val &= ~(CN23XX_PKT_OUTPUT_CTL_NSR);
+		/* set the ES bit */
+		reg_val |= (CN23XX_PKT_OUTPUT_CTL_ES);
+
+		/* write all the selected settings */
+		octeon_write_csr(oct, CN23XX_VF_SLI_OQ_PKT_CONTROL(q_no),
+				 reg_val);
+	}
+}
+
+static int cn23xx_setup_vf_device_regs(struct octeon_device *oct)
+{
+	if (cn23xx_vf_setup_global_input_regs(oct))
+		return -1;
+
+	cn23xx_vf_setup_global_output_regs(oct);
+
+	return 0;
+}
+
+static void cn23xx_setup_vf_iq_regs(struct octeon_device *oct, u32 iq_no)
+{
+	struct octeon_instr_queue *iq = oct->instr_queue[iq_no];
+	u64 pkt_in_done;
+
+	/* Write the start of the input queue's ring and its size */
+	octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_BASE_ADDR64(iq_no),
+			   iq->base_addr_dma);
+	octeon_write_csr(oct, CN23XX_VF_SLI_IQ_SIZE(iq_no), iq->max_count);
+
+	/* Remember the doorbell & instruction count register addr
+	 * for this queue
+	 */
+	iq->doorbell_reg =
+	    (u8 *)oct->mmio[0].hw_addr + CN23XX_VF_SLI_IQ_DOORBELL(iq_no);
+	iq->inst_cnt_reg =
+	    (u8 *)oct->mmio[0].hw_addr + CN23XX_VF_SLI_IQ_INSTR_COUNT64(iq_no);
+	dev_dbg(&oct->pci_dev->dev, "InstQ[%d]:dbell reg @ 0x%p instcnt_reg @ 0x%p\n",
+		iq_no, iq->doorbell_reg, iq->inst_cnt_reg);
+
+	/* Store the current instruction counter (used in flush_iq
+	 * calculation)
+	 */
+	pkt_in_done = readq(iq->inst_cnt_reg);
+
+	iq->reset_instr_cnt = 0;
+}
+
+static void cn23xx_setup_vf_oq_regs(struct octeon_device *oct, u32 oq_no)
+{
+	struct octeon_droq *droq = oct->droq[oq_no];
+
+	octeon_write_csr64(oct, CN23XX_VF_SLI_OQ_BASE_ADDR64(oq_no),
+			   droq->desc_ring_dma);
+	octeon_write_csr(oct, CN23XX_VF_SLI_OQ_SIZE(oq_no), droq->max_count);
+
+	octeon_write_csr(oct, CN23XX_VF_SLI_OQ_BUFF_INFO_SIZE(oq_no),
+			 (droq->buffer_size | (OCT_RH_SIZE << 16)));
+
+	/* Get the mapped address of the pkt_sent and pkts_credit regs */
+	droq->pkts_sent_reg =
+	    (u8 *)oct->mmio[0].hw_addr + CN23XX_VF_SLI_OQ_PKTS_SENT(oq_no);
+	droq->pkts_credit_reg =
+	    (u8 *)oct->mmio[0].hw_addr + CN23XX_VF_SLI_OQ_PKTS_CREDIT(oq_no);
+}
+
 static int cn23xx_enable_vf_io_queues(struct octeon_device *oct)
 {
 	u32 q_no;
@@ -181,8 +336,42 @@ int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
 		}
 	}
 
+	oct->fn_list.setup_iq_regs = cn23xx_setup_vf_iq_regs;
+	oct->fn_list.setup_oq_regs = cn23xx_setup_vf_oq_regs;
+	oct->fn_list.setup_device_regs = cn23xx_setup_vf_device_regs;
+
 	oct->fn_list.enable_io_queues = cn23xx_enable_vf_io_queues;
 	oct->fn_list.disable_io_queues = cn23xx_disable_vf_io_queues;
 
 	return 0;
 }
+
+void cn23xx_dump_vf_iq_regs(struct octeon_device *oct)
+{
+	u32 regval, q_no;
+
+	dev_dbg(&oct->pci_dev->dev, "SLI_IQ_DOORBELL_0 [0x%x]: 0x%016llx\n",
+		CN23XX_VF_SLI_IQ_DOORBELL(0),
+		CVM_CAST64(octeon_read_csr64(
+					oct, CN23XX_VF_SLI_IQ_DOORBELL(0))));
+
+	dev_dbg(&oct->pci_dev->dev, "SLI_IQ_BASEADDR_0 [0x%x]: 0x%016llx\n",
+		CN23XX_VF_SLI_IQ_BASE_ADDR64(0),
+		CVM_CAST64(octeon_read_csr64(
+			oct, CN23XX_VF_SLI_IQ_BASE_ADDR64(0))));
+
+	dev_dbg(&oct->pci_dev->dev, "SLI_IQ_FIFO_RSIZE_0 [0x%x]: 0x%016llx\n",
+		CN23XX_VF_SLI_IQ_SIZE(0),
+		CVM_CAST64(octeon_read_csr64(oct, CN23XX_VF_SLI_IQ_SIZE(0))));
+
+	for (q_no = 0; q_no < oct->sriov_info.rings_per_vf; q_no++) {
+		dev_dbg(&oct->pci_dev->dev, "SLI_PKT[%d]_INPUT_CTL [0x%x]: 0x%016llx\n",
+			q_no, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no),
+			CVM_CAST64(octeon_read_csr64(
+				oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no))));
+	}
+
+	pci_read_config_dword(oct->pci_dev, CN23XX_CONFIG_PCIE_DEVCTL, &regval);
+	dev_dbg(&oct->pci_dev->dev, "Config DevCtl [0x%x]: 0x%08x\n",
+		CN23XX_CONFIG_PCIE_DEVCTL, regval);
+}
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
index 6785796..d17c1ce 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -35,4 +35,6 @@ struct octeon_cn23xx_vf {
 #define BUSY_READING_REG_VF_LOOP_COUNT		10000
 
 int cn23xx_setup_octeon_vf_device(struct octeon_device *oct);
+
+void cn23xx_dump_vf_initialized_regs(struct octeon_device *oct);
 #endif
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 41fc9d2..61c8b78 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -241,6 +241,11 @@ static int octeon_device_init(struct octeon_device *oct)
 		return 1;
 	}
 
+	if (oct->fn_list.setup_device_regs(oct)) {
+		dev_err(&oct->pci_dev->dev, "device registers configuration failed\n");
+		return 1;
+	}
+
 	return 0;
 }
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next V2 4/9] liquidio CN23XX: VF queue setup
From: Raghu Vatsavayi @ 2016-11-28  8:50 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1480323038-13702-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds support for configuring VF input/output queues.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 .../ethernet/cavium/liquidio/cn23xx_vf_device.c    | 144 +++++++++++++++++++++
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h    |   2 +
 drivers/net/ethernet/cavium/liquidio/lio_main.c    |   6 +-
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c |   5 +
 .../net/ethernet/cavium/liquidio/octeon_device.c   |  43 +++++-
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   7 +-
 .../net/ethernet/cavium/liquidio/request_manager.c |   4 +-
 7 files changed, 207 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
index d683bda..60fd138 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c
@@ -25,13 +25,134 @@
 #include "cn23xx_vf_device.h"
 #include "octeon_main.h"
 
+static int cn23xx_vf_reset_io_queues(struct octeon_device *oct, u32 num_queues)
+{
+	u32 loop = BUSY_READING_REG_VF_LOOP_COUNT;
+	int ret_val = 0;
+	u32 q_no;
+	u64 d64;
+
+	for (q_no = 0; q_no < num_queues; q_no++) {
+		/* set RST bit to 1. This bit applies to both IQ and OQ */
+		d64 = octeon_read_csr64(oct,
+					CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no));
+		d64 |= CN23XX_PKT_INPUT_CTL_RST;
+		octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no),
+				   d64);
+	}
+
+	/* wait until the RST bit is clear or the RST and QUIET bits are set */
+	for (q_no = 0; q_no < num_queues; q_no++) {
+		u64 reg_val = octeon_read_csr64(oct,
+					CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no));
+		while ((READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_RST) &&
+		       !(READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_QUIET) &&
+		       loop) {
+			WRITE_ONCE(reg_val, octeon_read_csr64(
+			    oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no)));
+			loop--;
+		}
+		if (!loop) {
+			dev_err(&oct->pci_dev->dev,
+				"clearing the reset reg failed or setting the quiet reg failed for qno: %u\n",
+				q_no);
+			return -1;
+		}
+		WRITE_ONCE(reg_val, READ_ONCE(reg_val) &
+			   ~CN23XX_PKT_INPUT_CTL_RST);
+		octeon_write_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no),
+				   READ_ONCE(reg_val));
+
+		WRITE_ONCE(reg_val, octeon_read_csr64(
+		    oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no)));
+		if (READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_RST) {
+			dev_err(&oct->pci_dev->dev,
+				"clearing the reset failed for qno: %u\n",
+				q_no);
+			ret_val = -1;
+		}
+	}
+
+	return ret_val;
+}
+
+static int cn23xx_enable_vf_io_queues(struct octeon_device *oct)
+{
+	u32 q_no;
+
+	for (q_no = 0; q_no < oct->num_iqs; q_no++) {
+		u64 reg_val;
+
+		/* set the corresponding IQ IS_64B bit */
+		if (oct->io_qmask.iq64B & BIT_ULL(q_no)) {
+			reg_val = octeon_read_csr64(
+			    oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no));
+			reg_val |= CN23XX_PKT_INPUT_CTL_IS_64B;
+			octeon_write_csr64(
+			    oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no), reg_val);
+		}
+
+		/* set the corresponding IQ ENB bit */
+		if (oct->io_qmask.iq & BIT_ULL(q_no)) {
+			reg_val = octeon_read_csr64(
+			    oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no));
+			reg_val |= CN23XX_PKT_INPUT_CTL_RING_ENB;
+			octeon_write_csr64(
+			    oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no), reg_val);
+		}
+	}
+	for (q_no = 0; q_no < oct->num_oqs; q_no++) {
+		u32 reg_val;
+
+		/* set the corresponding OQ ENB bit */
+		if (oct->io_qmask.oq & BIT_ULL(q_no)) {
+			reg_val = octeon_read_csr(
+			    oct, CN23XX_VF_SLI_OQ_PKT_CONTROL(q_no));
+			reg_val |= CN23XX_PKT_OUTPUT_CTL_RING_ENB;
+			octeon_write_csr(
+			    oct, CN23XX_VF_SLI_OQ_PKT_CONTROL(q_no), reg_val);
+		}
+	}
+
+	return 0;
+}
+
+static void cn23xx_disable_vf_io_queues(struct octeon_device *oct)
+{
+	u32 num_queues = oct->num_iqs;
+
+	/* per HRM, rings can only be disabled via reset operation,
+	 * NOT via SLI_PKT()_INPUT/OUTPUT_CONTROL[ENB]
+	 */
+	if (num_queues < oct->num_oqs)
+		num_queues = oct->num_oqs;
+
+	cn23xx_vf_reset_io_queues(oct, num_queues);
+}
+
 int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
 {
 	struct octeon_cn23xx_vf *cn23xx = (struct octeon_cn23xx_vf *)oct->chip;
+	u32 rings_per_vf, ring_flag;
+	u64 reg_val;
 
 	if (octeon_map_pci_barx(oct, 0, 0))
 		return 1;
 
+	/* INPUT_CONTROL[RPVF] gives the VF IOq count */
+	reg_val = octeon_read_csr64(oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(0));
+
+	oct->pf_num = (reg_val >> CN23XX_PKT_INPUT_CTL_PF_NUM_POS) &
+		      CN23XX_PKT_INPUT_CTL_PF_NUM_MASK;
+	oct->vf_num = (reg_val >> CN23XX_PKT_INPUT_CTL_VF_NUM_POS) &
+		      CN23XX_PKT_INPUT_CTL_VF_NUM_MASK;
+
+	reg_val = reg_val >> CN23XX_PKT_INPUT_CTL_RPVF_POS;
+
+	rings_per_vf = reg_val & CN23XX_PKT_INPUT_CTL_RPVF_MASK;
+
+	ring_flag = 0;
+
 	cn23xx->conf  = oct_get_config_info(oct, LIO_23XX);
 	if (!cn23xx->conf) {
 		dev_err(&oct->pci_dev->dev, "%s No Config found for CN23XX\n",
@@ -40,5 +161,28 @@ int cn23xx_setup_octeon_vf_device(struct octeon_device *oct)
 		return 1;
 	}
 
+	if (oct->sriov_info.rings_per_vf > rings_per_vf) {
+		dev_warn(&oct->pci_dev->dev,
+			 "num_queues:%d greater than PF configured rings_per_vf:%d. Reducing to %d.\n",
+			 oct->sriov_info.rings_per_vf, rings_per_vf,
+			 rings_per_vf);
+		oct->sriov_info.rings_per_vf = rings_per_vf;
+	} else {
+		if (rings_per_vf > num_present_cpus()) {
+			dev_warn(&oct->pci_dev->dev,
+				 "PF configured rings_per_vf:%d greater than num_cpu:%d. Using rings_per_vf:%d equal to num cpus\n",
+				 rings_per_vf,
+				 num_present_cpus(),
+				 num_present_cpus());
+			oct->sriov_info.rings_per_vf =
+				num_present_cpus();
+		} else {
+			oct->sriov_info.rings_per_vf = rings_per_vf;
+		}
+	}
+
+	oct->fn_list.enable_io_queues = cn23xx_enable_vf_io_queues;
+	oct->fn_list.disable_io_queues = cn23xx_disable_vf_io_queues;
+
 	return 0;
 }
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
index 9e4fb50..6785796 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -32,5 +32,7 @@ struct octeon_cn23xx_vf {
 	struct octeon_config *conf;
 };
 
+#define BUSY_READING_REG_VF_LOOP_COUNT		10000
+
 int cn23xx_setup_octeon_vf_device(struct octeon_device *oct);
 #endif
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 3d05b2f..39a9665 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -40,6 +40,7 @@
 MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_210SV_NAME LIO_FW_NAME_SUFFIX);
 MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_210NV_NAME LIO_FW_NAME_SUFFIX);
 MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_410NV_NAME LIO_FW_NAME_SUFFIX);
+MODULE_FIRMWARE(LIO_FW_DIR LIO_FW_BASE_NAME LIO_23XX_NAME LIO_FW_NAME_SUFFIX);
 
 static int ddr_timeout = 10000;
 module_param(ddr_timeout, int, 0644);
@@ -4484,7 +4485,10 @@ static int octeon_device_init(struct octeon_device *octeon_dev)
 
 	atomic_set(&octeon_dev->status, OCT_DEV_DISPATCH_INIT_DONE);
 
-	octeon_set_io_queues_off(octeon_dev);
+	if (octeon_set_io_queues_off(octeon_dev)) {
+		dev_err(&octeon_dev->pci_dev->dev, "setting io queues off failed\n");
+		return 1;
+	}
 
 	if (OCTEON_CN23XX_PF(octeon_dev)) {
 		ret = octeon_dev->fn_list.setup_device_regs(octeon_dev);
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index dd1dad1..41fc9d2 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -236,6 +236,11 @@ static int octeon_device_init(struct octeon_device *oct)
 
 	atomic_set(&oct->status, OCT_DEV_PCI_MAP_DONE);
 
+	if (octeon_set_io_queues_off(oct)) {
+		dev_err(&oct->pci_dev->dev, "setting io queues off failed\n");
+		return 1;
+	}
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index 7e6c8b8..fe84e90 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -860,12 +860,53 @@ int octeon_setup_output_queues(struct octeon_device *oct)
 	return 0;
 }
 
-void octeon_set_io_queues_off(struct octeon_device *oct)
+int octeon_set_io_queues_off(struct octeon_device *oct)
 {
+	int loop = BUSY_READING_REG_VF_LOOP_COUNT;
+
 	if (OCTEON_CN6XXX(oct)) {
 		octeon_write_csr(oct, CN6XXX_SLI_PKT_INSTR_ENB, 0);
 		octeon_write_csr(oct, CN6XXX_SLI_PKT_OUT_ENB, 0);
+	} else if (oct->chip_id == OCTEON_CN23XX_VF_VID) {
+		u32 q_no;
+
+		/* IOQs will already be in reset.
+		 * If RST bit is set, wait for quiet bit to be set.
+		 * Once quiet bit is set, clear the RST bit.
+		 */
+		for (q_no = 0; q_no < oct->sriov_info.rings_per_vf; q_no++) {
+			u64 reg_val = octeon_read_csr64(
+				oct, CN23XX_VF_SLI_IQ_PKT_CONTROL64(q_no));
+
+			while ((reg_val & CN23XX_PKT_INPUT_CTL_RST) &&
+			       !(reg_val &  CN23XX_PKT_INPUT_CTL_QUIET) &&
+			       loop) {
+				reg_val = octeon_read_csr64(
+					oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
+				loop--;
+			}
+			if (!loop) {
+				dev_err(&oct->pci_dev->dev,
+					"clearing the reset reg failed or setting the quiet reg failed for qno: %u\n",
+					q_no);
+				return -1;
+			}
+
+			reg_val = reg_val & ~CN23XX_PKT_INPUT_CTL_RST;
+			octeon_write_csr64(oct,
+					   CN23XX_SLI_IQ_PKT_CONTROL64(q_no),
+					   reg_val);
+
+			reg_val = octeon_read_csr64(
+					oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
+			if (reg_val & CN23XX_PKT_INPUT_CTL_RST) {
+				dev_err(&oct->pci_dev->dev,
+					"unable to reset qno %u\n", q_no);
+				return -1;
+			}
+		}
 	}
+	return 0;
 }
 
 void octeon_set_droq_pkt_op(struct octeon_device *oct,
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.h b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
index 5ce2048..1e6bfa1 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
@@ -401,8 +401,13 @@ struct octeon_device {
 
 	/** Octeon Chip type. */
 	u16 chip_id;
+
 	u16 rev_id;
+
 	u16 pf_num;
+
+	u16 vf_num;
+
 	/** This device's id - set by the driver. */
 	u32 octeon_id;
 
@@ -766,7 +771,7 @@ int octeon_download_firmware(struct octeon_device *oct, const u8 *data,
 /** Turns off the input and output queues for the device
  *  @param oct which octeon to disable
  */
-void octeon_set_io_queues_off(struct octeon_device *oct);
+int octeon_set_io_queues_off(struct octeon_device *oct);
 
 /** Turns on or off the given output queue for the device
  *  @param oct which octeon to change
diff --git a/drivers/net/ethernet/cavium/liquidio/request_manager.c b/drivers/net/ethernet/cavium/liquidio/request_manager.c
index 8531a00..0e10e2a 100644
--- a/drivers/net/ethernet/cavium/liquidio/request_manager.c
+++ b/drivers/net/ethernet/cavium/liquidio/request_manager.c
@@ -235,7 +235,9 @@ int octeon_setup_iq(struct octeon_device *oct,
 	}
 
 	oct->num_iqs++;
-	oct->fn_list.enable_io_queues(oct);
+	if (oct->fn_list.enable_io_queues(oct))
+		return 1;
+
 	return 0;
 }
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH net] net, sched: respect rcu grace period on cls destruction
From: Daniel Borkmann @ 2016-11-28  9:09 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, John Fastabend, Roi Dayan, ast,
	Hannes Frederic Sowa, Jiri Pirko, Linux Kernel Network Developers,
	Paul E. McKenney
In-Reply-To: <CAM_iQpVRDnLM_Cmp63PVkpYtmdtNECP-2m89=DGdidkUYyHwug@mail.gmail.com>

On 11/28/2016 07:57 AM, Cong Wang wrote:
> On Sat, Nov 26, 2016 at 4:18 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> Roi reported a crash in flower where tp->root was NULL in ->classify()
>> callbacks. Reason is that in ->destroy() tp->root is set to NULL via
>> RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
>> this doesn't respect RCU grace period for them, and as a result, still
>> outstanding readers from tc_classify() will try to blindly dereference
>> a NULL tp->root.
>>
>> The tp->root object is strictly private to the classifier implementation
>> and holds internal data the core such as tc_ctl_tfilter() doesn't know
>> about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
>> is only checked for NULL in ->get() callback, but nowhere else. This is
>> misleading and seemed to be copied from old classifier code that was not
>> cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic:
>> fix NULL pointer dereference") moved tp->root initialization into ->init()
>> routine, where before it was part of ->change(), so ->get() had to deal
>> with tp->root being NULL back then, so that was indeed a valid case, after
>> d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long
>> ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg()
>> in packet classifiers"); but the NULLifying was reintroduced with the
>> RCUification, but it's not correct for every classifier implementation.
>>
>> In the cases that are fixed here with one exception of cls_cgroup, tp->root
>> object is allocated and initialized inside ->init() callback, which is always
>> performed at a point in time after we allocate a new tp, which means tp and
>> thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
>> Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
>> handler, same for the tp which is kfree_rcu()'ed right when we return
>> from ->destroy() in tcf_destroy(). This means, the head object's lifetime
>> for such classifiers is always tied to the tp lifetime. The RCU callback
>> invocation for the two kfree_rcu() could be out of order, but that's fine
>> since both are independent.
>>
>> Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
>> means that 1) we don't need a useless NULL check in fast-path and, 2) that
>> outstanding readers of that tp in tc_classify() can still execute under
>> respect with RCU grace period as it is actually expected.
>>
>> Things that haven't been touched here: cls_fw and cls_route. They each
>> handle tp->root being NULL in ->classify() path for historic reasons, so
>> their ->destroy() implementation can stay as is. If someone actually
>> cares, they could get cleaned up at some point to avoid the test in fast
>> path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
>> !head should anyone actually be using/testing it, so it at least aligns with
>> cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
>> destruction (to a sleepable context) after RCU grace period as concurrent
>> readers might still access it. (Note that in this case we need to hold module
>> reference to keep work callback address intact, since we only wait on module
>> unload for all call_rcu()s to finish.)
>>
>> This fixes one race to bring RCU grace period guarantees back. Next step
>> as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy
>> proto tp when all filters are gone") to get the order of unlinking the tp
>> in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
>> RCU_INIT_POINTER() before tcf_destroy() and let the notification for
>> removal be done through the prior ->delete() callback. Both are independant
>> issues. Once we have that right, we can then clean tp->root up for a number
>> of classifiers by not making them RCU pointers, which requires a new callback
>> (->uninit) that is triggered from tp's RCU callback, where we just kfree()
>> tp->root from there.
>
> Looks good to my eyes,
>
> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>

Thanks for the review (also to John)!

> The ugly part is the work struct, I am not an RCU expert so don't know if we
> have any API to execute an RCU callback in process context. Paul?

Same way we do this in BPF with prog destruction, by the way. I'm not aware
of any callback API for RCU that lets you do this, but maybe Paul knows.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox