Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: this cpu documentation
From: Randy Dunlap @ 2013-04-04  0:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, Tejun Heo, srostedt
In-Reply-To: <0000013dd1a20ebf-4a76fb06-4b9d-492e-9d77-4b3f43aceca7-000000@email.amazonses.com>

On 04/03/13 13:41, Christoph Lameter wrote:
> 
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation
> 
> Document the rationale and the way to use this_cpu operations.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/Documentation/this_cpu_ops
> ===================================================================
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/this_cpu_ops	2013-04-03 15:25:41.424846306 -0500
> @@ -0,0 +1,194 @@
> +this_cpu operations
> +-------------------
> +
> +this_cpu operations are a way of optimizing access to per cpu variables
> +associated with the *currently* executing processor
> +through the use of segment registers (or a dedicated register where the cpu
> +permanently stored the beginning of the per cpu area for a specific
> +processor).
> +
> +The this_cpu operations add an per cpu variable offset to the processor

                           add a per

> +specific percpu base and encode that operation in the instruction operating
> +on the per cpu variable.
> +
> +This mean there are no atomicity issues between the calculation

        means

> +of the offset and the operation on the data. Therefore it is not necessary
> +to disable preempt or interrupts to ensure that the processor is not changed
> +between the calculation of the address and the operation on the data.
> +
> +Read-modify-write operations are of particular interest. Frequently
> +processors have special lower latency instructions that can operate without
> +the typical synchronization overhead but still provide some sort of relaxed
> +atomicity guarantee. The x86 for example can execute RMV instructions like

                                                        RMW ??

> +inc/dec/cmpxchg without the lock prefix and the associated latency penalty.
> +
> +Access to the variable without the lock prefix is not synchronized but
> +synchronization is not necessary since we are dealing with per cpu data
> +specific to the currently executing processor. Only the current processor
> +should be accessing that variable and therefore there are no concurency

                                                                concurrency

> +issues with other processors in the system.
> +
> +On x86 the fs: or the gs: segment registers contain the basis of the per cpu area. It is

                                                           base

> +then possible to simply use the segment override to relocate a per cpu relative address
> +to the proper per cpu area for the processor. So the relocation to the per cpu base
> +is encoded in the instruction via a segment register prefix.
> +
> +For example:
> +
> +	DEFINE_PER_CPU(int, x);
> +	int z;
> +
> +	z = this_cpu_read(x);
> +
> +results in a single instruction
> +
> +	mov ax, gs:[x]
> +
> +instead of a sequence of calculation of the address and then a fetch from
> +that address which occurs with the percpu operations. Before this_cpu_ops
> +such sequence also required preempt disable/enable to prevent the Os from

                                                                     OS or O/S or kernel

> +moving the thread to a different processor while the calculation is performed.
> +
> +
> +The main use of the this_cpu operations has been to optimize counter operations.
> +
> +
> +	this_cpu_inc(x)
> +
> +results in the following single instruction (no lock prefix!)
> +
> +	inc gs:[x]
> +
> +
> +instead of the following operations required if there is no segment register.
> +
> +	int *y;
> +	int cpu;
> +
> +	cpu = get_cpu();
> +	y = per_cpu_ptr(&x, cpu);
> +	(*y)++;
> +	put_cpu();
> +
> +
> +Note that these operations can only be used on percpu data that is reserved for
> +a specific processor. Without disabling preemption in the surrounding code
> +this_cpu_inc() will only guarantee that one of the percpu counters is correctly
> +incremented. However, there is no guarantee that the OS will not move the process
> +directly before or after the this_cpu instruction is executed. In general this
> +means that the value of the individual counters for each processor are
> +meaningless. The sum of all the per cpu counters is the only value that is of
> +interest.
> +
> +Per cpu variables are used for performance reasons. Bouncing cache lines can
> +be avoided if multiple processors concurrently go through the same code paths.
> +Since each processor has its own per cpu variables no concurrent cacheline
> +updates take place. The price that has to be paid for this optimization is
> +the need to add up the per cpu counters when the value of the counter is
> +needed.
> +
> +
> +Special operations:
> +-------------------
> +
> +	y = this_cpu_ptr(&x)
> +
> +Takes the offset of a per cpu variable (&x !) and returns the address of the
> +per cpu variable that belongs to the currently executing processor.
> +this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
> +requires. No processor number is available. Instead the offset of the local\

drop ending backslash

> +per cpu area is simply added to the percpu offset.
> +
> +
> +
> +Per cpu variables and offsets
> +-----------------------------
> +
> +Per cpu variables have *offsets* to the beginning of the percpu area. They do
> +not have addresses although they look like that in the code. Offsets
> +cannot be directly dereferenced. The offset must be added to a base pointer of
> +a percpu area of a processor in order to form a valid address.
> +
> +Therefore the use of x or &x outside of the context of per cpu operations
> +is invalid and will generally be treated like a NULL pointer dereference.
> +
> +In the context of per cpu operations
> +
> +	x is a per cpu variable. Most this_cpu operations take a cpu variable.
> +
> +	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
> +		of a per cpu variable which makes this look a bit strange.
> +
> +
> +
> +Operations on a field of a per cpu structure
> +--------------------------------------------
> +
> +Lets say we have a percpu structure

   Let's

> +
> +	struct s {
> +		int n,m;
> +	};
> +
> +	DEFINE_PER_CPU(struct s, p);
> +
> +
> +Operations on these fields are straightforward
> +
> +	this_cpu_inc(p.m)
> +
> +	z = this_cpu_cmpxchg(p.m, 0, 1);
> +
> +
> +If we have an offset to struct s:
> +
> +	struct s __percpu *ps = &p;
> +
> +	z = this_cpu_dec(ps->m);
> +
> +	z = this_cpu_inc_return(ps->n);
> +
> +
> +The calculation of the pointer may require the use of this_cpu_ptr() if we
> +do not make use of this_cpu ops later to manipulate fields:
> +
> +	struct s *pp;
> +
> +	pp = this_cpu_ptr(&p);
> +
> +	pp->m--

	add    ;

> +
> +	z = pp->n++

	add        ;

> +
> +
> +Variants of this_cpu ops
> +-------------------------
> +
> +this_cpu ops are interupt safe. Some architecture do not support these per

                    interrupt

> +cpu local operations. In that case the operation must be replaced by code
> +that disables interrupts, then does the operations that are guaranteed to be
> +atomic and then reenable interrupts. Doing so is expensive. If there are
> +other reasons why the scheduler cannot change the processor we are executing
> +on then there is no reason to disable interrupts. For that purpose
> +the __this_cpu operations are provided. F.e.

                                           E.g. or For example:


> +
> +	__this_cpu_inc(x)
> +
> +Will increment x and will not fallback to code that disables interrupts on
> +platforms that cannot accomplish atomicity through address relocation and
> +an RMV operation in the same instruction.

      RMW ?

> +
> +
> +
> +&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
> +--------------------------------------------
> +
> +The first operation takes the offset and forms an address and then adds
> +the offset of the n field.
> +
> +The second one first adds the two offsets and then does the relocation.
> +IMHO the second form looks cleaner and has an easier time with ().
> +
> +
> +Christoph Lameter, April 3rd, 2013


-- 
~Randy

^ permalink raw reply

* Re: TCP: snd_cwnd is nul, please report this bug.
From: Eric Dumazet @ 2013-04-04  0:08 UTC (permalink / raw)
  To: Dave Jones; +Cc: netdev, Yuchung Cheng, Neal Cardwell
In-Reply-To: <1364965967.5113.195.camel@edumazet-glaptop>

On Tue, 2013-04-02 at 22:12 -0700, Eric Dumazet wrote:
> On Tue, 2013-04-02 at 23:54 -0400, Dave Jones wrote:
> > Just had this warning on 3.9-rc5
> > 
> > Not sure what else to report.  Nothing really odd going on,
> > just some ssh traffic and firefox over wifi (iwlwifi)
> > 
> > anything else I can provide ?
> 
> Thanks for the report.
> 
> I guess we'll need to resurrect a patch I did to track the (buggy)
> setting to 0.

Any chance you can use David net-next tree + following patch ?

(it also applies to Linus tree with some fuzz)

Thanks !

 include/net/tcp.h       |    8 ++++++++
 net/ipv4/tcp.c          |   27 +++++++++++++++++++++++++++
 net/ipv4/tcp_cong.c     |    2 +-
 net/ipv4/tcp_hybla.c    |    6 +++---
 net/ipv4/tcp_illinois.c |    5 +++--
 net/ipv4/tcp_input.c    |   24 +++++++++++++-----------
 net/ipv4/tcp_lp.c       |    2 +-
 net/ipv4/tcp_metrics.c  |    2 +-
 net/ipv4/tcp_output.c   |    2 +-
 net/ipv4/tcp_vegas.c    |    5 +++--
 net/ipv4/tcp_veno.c     |    2 +-
 net/ipv4/tcp_westwood.c |    3 ++-
 net/ipv4/tcp_yeah.c     |    6 ++----
 13 files changed, 66 insertions(+), 28 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 4475aaf..e2e89aa 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -609,6 +609,14 @@ static inline u32 __tcp_set_rto(const struct tcp_sock *tp)
 
 extern void tcp_set_rto(struct sock *sk);
 
+extern void tcp_snd_cwnd_bad(const struct tcp_sock *tp);
+static inline void tcp_snd_cwnd_set(struct tcp_sock *tp, u32 newval)
+{
+	if (unlikely(!newval))
+		tcp_snd_cwnd_bad(tp);
+	tp->snd_cwnd = newval;
+}
+
 static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd)
 {
 	tp->pred_flags = htonl((tp->tcp_header_len << 26) |
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a96f7b5..4347e22 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3300,6 +3300,33 @@ void tcp_done(struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(tcp_done);
 
+void tcp_snd_cwnd_bad(const struct tcp_sock *tp)
+{
+	static bool warned = false;
+
+	if (!warned) {
+		const struct sock *sk = (struct sock *)tp;
+		const struct inet_sock *inet = inet_sk(sk);
+
+		warned = true;
+		if (sk->sk_family == AF_INET)
+			pr_err("TCP: zero snd_cwnd src:%pI4.%u dst:%pI4.%u\n",
+			       &inet->inet_saddr, ntohs(inet->inet_sport),
+			       &inet->inet_daddr, ntohs(inet->inet_dport));
+#if IS_ENABLED(CONFIG_IPV6)
+		else if (sk->sk_family == AF_INET6) {
+			const struct ipv6_pinfo *np = inet6_sk(sk);
+
+			pr_err("TCP: zero snd_cwnd src:%pI6.%u dst:%pI6.%u\n",
+			       &np->saddr, ntohs(inet->inet_sport),
+			       &np->daddr, ntohs(inet->inet_dport));
+		}
+#endif
+		WARN_ON_ONCE(1);
+	}
+}
+EXPORT_SYMBOL(tcp_snd_cwnd_bad);
+
 extern struct tcp_congestion_ops tcp_reno;
 
 static __initdata unsigned long thash_entries;
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 019c238..6ddd167 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -327,7 +327,7 @@ void tcp_slow_start(struct tcp_sock *tp)
 		tp->snd_cwnd_cnt -= snd_cwnd;
 		delta++;
 	}
-	tp->snd_cwnd = min(snd_cwnd + delta, tp->snd_cwnd_clamp);
+	tcp_snd_cwnd_set(tp, min(snd_cwnd + delta, tp->snd_cwnd_clamp));
 }
 EXPORT_SYMBOL_GPL(tcp_slow_start);
 
diff --git a/net/ipv4/tcp_hybla.c b/net/ipv4/tcp_hybla.c
index 57bdd17..d236b70 100644
--- a/net/ipv4/tcp_hybla.c
+++ b/net/ipv4/tcp_hybla.c
@@ -60,7 +60,7 @@ static void hybla_init(struct sock *sk)
 
 	/* set minimum rtt as this is the 1st ever seen */
 	ca->minrtt = tp->srtt;
-	tp->snd_cwnd = ca->rho;
+	tcp_snd_cwnd_set(tp, ca->rho);
 }
 
 static void hybla_state(struct sock *sk, u8 ca_state)
@@ -157,9 +157,9 @@ static void hybla_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
 	}
 	/* clamp down slowstart cwnd to ssthresh value. */
 	if (is_slowstart)
-		tp->snd_cwnd = min(tp->snd_cwnd, tp->snd_ssthresh);
+		tcp_snd_cwnd_set(tp, min(tp->snd_cwnd, tp->snd_ssthresh));
 
-	tp->snd_cwnd = min_t(u32, tp->snd_cwnd, tp->snd_cwnd_clamp);
+	tcp_snd_cwnd_set(tp, min_t(u32, tp->snd_cwnd, tp->snd_cwnd_clamp));
 }
 
 static struct tcp_congestion_ops tcp_hybla __read_mostly = {
diff --git a/net/ipv4/tcp_illinois.c b/net/ipv4/tcp_illinois.c
index 834857f..71135e3 100644
--- a/net/ipv4/tcp_illinois.c
+++ b/net/ipv4/tcp_illinois.c
@@ -284,8 +284,9 @@ static void tcp_illinois_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
 		*/
 		delta = (tp->snd_cwnd_cnt * ca->alpha) >> ALPHA_SHIFT;
 		if (delta >= tp->snd_cwnd) {
-			tp->snd_cwnd = min(tp->snd_cwnd + delta / tp->snd_cwnd,
-					   (u32) tp->snd_cwnd_clamp);
+			tcp_snd_cwnd_set(tp,
+					 min(tp->snd_cwnd + delta / tp->snd_cwnd,
+					     (u32) tp->snd_cwnd_clamp));
 			tp->snd_cwnd_cnt = 0;
 		}
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6d9ca35..b972577 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2259,8 +2259,8 @@ static void tcp_update_scoreboard(struct sock *sk, int fast_rexmit)
  */
 static inline void tcp_moderate_cwnd(struct tcp_sock *tp)
 {
-	tp->snd_cwnd = min(tp->snd_cwnd,
-			   tcp_packets_in_flight(tp) + tcp_max_burst(tp));
+	tcp_snd_cwnd_set(tp, min(tp->snd_cwnd,
+				 tcp_packets_in_flight(tp) + tcp_max_burst(tp)));
 	tp->snd_cwnd_stamp = tcp_time_stamp;
 }
 
@@ -2312,18 +2312,20 @@ static void tcp_undo_cwr(struct sock *sk, const bool undo_ssthresh)
 
 	if (tp->prior_ssthresh) {
 		const struct inet_connection_sock *icsk = inet_csk(sk);
+		u32 newval;
 
 		if (icsk->icsk_ca_ops->undo_cwnd)
-			tp->snd_cwnd = icsk->icsk_ca_ops->undo_cwnd(sk);
+			newval = icsk->icsk_ca_ops->undo_cwnd(sk);
 		else
-			tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh << 1);
+			newval = max(tp->snd_cwnd, tp->snd_ssthresh << 1);
+		tcp_snd_cwnd_set(tp, newval);
 
 		if (undo_ssthresh && tp->prior_ssthresh > tp->snd_ssthresh) {
 			tp->snd_ssthresh = tp->prior_ssthresh;
 			TCP_ECN_withdraw_cwr(tp);
 		}
 	} else {
-		tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh);
+		tcp_snd_cwnd_set(tp, max(tp->snd_cwnd, tp->snd_ssthresh));
 	}
 	tp->snd_cwnd_stamp = tcp_time_stamp;
 }
@@ -2512,7 +2514,7 @@ static void tcp_cwnd_reduction(struct sock *sk, int newly_acked_sacked,
 	}
 
 	sndcnt = max(sndcnt, (fast_rexmit ? 1 : 0));
-	tp->snd_cwnd = tcp_packets_in_flight(tp) + sndcnt;
+	tcp_snd_cwnd_set(tp, tcp_packets_in_flight(tp) + sndcnt);
 }
 
 static inline void tcp_end_cwnd_reduction(struct sock *sk)
@@ -2522,7 +2524,7 @@ static inline void tcp_end_cwnd_reduction(struct sock *sk)
 	/* Reset cwnd to ssthresh in CWR or Recovery (unless it's undone) */
 	if (inet_csk(sk)->icsk_ca_state == TCP_CA_CWR ||
 	    (tp->undo_marker && tp->snd_ssthresh < TCP_INFINITE_SSTHRESH)) {
-		tp->snd_cwnd = tp->snd_ssthresh;
+		tcp_snd_cwnd_set(tp, tp->snd_ssthresh);
 		tp->snd_cwnd_stamp = tcp_time_stamp;
 	}
 	tcp_ca_event(sk, CA_EVENT_COMPLETE_CWR);
@@ -2591,9 +2593,9 @@ static void tcp_mtup_probe_success(struct sock *sk)
 
 	/* FIXME: breaks with very large cwnd */
 	tp->prior_ssthresh = tcp_current_ssthresh(sk);
-	tp->snd_cwnd = tp->snd_cwnd *
-		       tcp_mss_to_mtu(sk, tp->mss_cache) /
-		       icsk->icsk_mtup.probe_size;
+	tcp_snd_cwnd_set(tp, tp->snd_cwnd *
+			     tcp_mss_to_mtu(sk, tp->mss_cache) /
+			     icsk->icsk_mtup.probe_size);
 	tp->snd_cwnd_cnt = 0;
 	tp->snd_cwnd_stamp = tcp_time_stamp;
 	tp->snd_ssthresh = tcp_current_ssthresh(sk);
@@ -4665,7 +4667,7 @@ void tcp_cwnd_application_limited(struct sock *sk)
 		u32 win_used = max(tp->snd_cwnd_used, init_win);
 		if (win_used < tp->snd_cwnd) {
 			tp->snd_ssthresh = tcp_current_ssthresh(sk);
-			tp->snd_cwnd = (tp->snd_cwnd + win_used) >> 1;
+			tcp_snd_cwnd_set(tp, (tp->snd_cwnd + win_used) >> 1);
 		}
 		tp->snd_cwnd_used = 0;
 	}
diff --git a/net/ipv4/tcp_lp.c b/net/ipv4/tcp_lp.c
index 72f7218..09bf418 100644
--- a/net/ipv4/tcp_lp.c
+++ b/net/ipv4/tcp_lp.c
@@ -307,7 +307,7 @@ static void tcp_lp_pkts_acked(struct sock *sk, u32 num_acked, s32 rtt_us)
 	/* happened after inference
 	 * cut snd_cwnd into half */
 	else
-		tp->snd_cwnd = max(tp->snd_cwnd >> 1U, 1U);
+		tcp_snd_cwnd_set(tp, max(tp->snd_cwnd >> 1U, 1U));
 
 	/* record this drop time */
 	lp->last_drop = tcp_time_stamp;
diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index f696d7c..d5a6608 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -526,7 +526,7 @@ reset:
 	if (tp->total_retrans > 1)
 		tp->snd_cwnd = 1;
 	else
-		tp->snd_cwnd = tcp_init_cwnd(tp, dst);
+		tcp_snd_cwnd_set(tp, tcp_init_cwnd(tp, dst));
 	tp->snd_cwnd_stamp = tcp_time_stamp;
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index af354c98..2b80083 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -148,7 +148,7 @@ static void tcp_cwnd_restart(struct sock *sk, const struct dst_entry *dst)
 
 	while ((delta -= inet_csk(sk)->icsk_rto) > 0 && cwnd > restart_cwnd)
 		cwnd >>= 1;
-	tp->snd_cwnd = max(cwnd, restart_cwnd);
+	tcp_snd_cwnd_set(tp, max(cwnd, restart_cwnd));
 	tp->snd_cwnd_stamp = tcp_time_stamp;
 	tp->snd_cwnd_used = 0;
 }
diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c
index 80fa2bf..a270cf9 100644
--- a/net/ipv4/tcp_vegas.c
+++ b/net/ipv4/tcp_vegas.c
@@ -238,7 +238,8 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
 				 * truncation robs us of full link
 				 * utilization.
 				 */
-				tp->snd_cwnd = min(tp->snd_cwnd, (u32)target_cwnd+1);
+				tcp_snd_cwnd_set(tp, min(tp->snd_cwnd,
+							 (u32)target_cwnd + 1));
 				tp->snd_ssthresh = tcp_vegas_ssthresh(tp);
 
 			} else if (tp->snd_cwnd <= tp->snd_ssthresh) {
@@ -272,7 +273,7 @@ static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
 			if (tp->snd_cwnd < 2)
 				tp->snd_cwnd = 2;
 			else if (tp->snd_cwnd > tp->snd_cwnd_clamp)
-				tp->snd_cwnd = tp->snd_cwnd_clamp;
+				tcp_snd_cwnd_set(tp, tp->snd_cwnd_clamp);
 
 			tp->snd_ssthresh = tcp_current_ssthresh(sk);
 		}
diff --git a/net/ipv4/tcp_veno.c b/net/ipv4/tcp_veno.c
index ac43cd7..d1fa5c1 100644
--- a/net/ipv4/tcp_veno.c
+++ b/net/ipv4/tcp_veno.c
@@ -180,7 +180,7 @@ static void tcp_veno_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
 		if (tp->snd_cwnd < 2)
 			tp->snd_cwnd = 2;
 		else if (tp->snd_cwnd > tp->snd_cwnd_clamp)
-			tp->snd_cwnd = tp->snd_cwnd_clamp;
+			tcp_snd_cwnd_set(tp, tp->snd_cwnd_clamp);
 	}
 	/* Wipe the slate clean for the next rtt. */
 	/* veno->cntrtt = 0; */
diff --git a/net/ipv4/tcp_westwood.c b/net/ipv4/tcp_westwood.c
index 76a1e23..060fbb5 100644
--- a/net/ipv4/tcp_westwood.c
+++ b/net/ipv4/tcp_westwood.c
@@ -233,7 +233,8 @@ static void tcp_westwood_event(struct sock *sk, enum tcp_ca_event event)
 		break;
 
 	case CA_EVENT_COMPLETE_CWR:
-		tp->snd_cwnd = tp->snd_ssthresh = tcp_westwood_bw_rttmin(sk);
+		tp->snd_ssthresh = tcp_westwood_bw_rttmin(sk);
+		tcp_snd_cwnd_set(tp, tp->snd_ssthresh);
 		break;
 
 	case CA_EVENT_LOSS:
diff --git a/net/ipv4/tcp_yeah.c b/net/ipv4/tcp_yeah.c
index 05c3b6f..3ad1dbb 100644
--- a/net/ipv4/tcp_yeah.c
+++ b/net/ipv4/tcp_yeah.c
@@ -161,11 +161,9 @@ static void tcp_yeah_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
 				    tp->snd_cwnd > yeah->reno_count) {
 					u32 reduction = min(queue / TCP_YEAH_GAMMA ,
 							    tp->snd_cwnd >> TCP_YEAH_EPSILON);
+					u32 tmp = tp->snd_cwnd - reduction;
 
-					tp->snd_cwnd -= reduction;
-
-					tp->snd_cwnd = max(tp->snd_cwnd,
-							   yeah->reno_count);
+					tcp_snd_cwnd_set(tp, max(tmp, yeah->reno_count));
 
 					tp->snd_ssthresh = tp->snd_cwnd;
 				}

^ permalink raw reply related

* Re: [PATCH v5 1/6] drivers: phy: add generic PHY framework
From: Stephen Warren @ 2013-04-03 23:54 UTC (permalink / raw)
  To: Kishon Vijay Abraham I
  Cc: balbi-l0cyMroinI0, gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	arnd-r2nGTMty4D4, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	sylvester.nawrocki-Re5JQEeQqe8AvxtiuMwx3w,
	rob-VoJi6FS/r0vR7s880joybQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, cesarb-PWySMVKUnqmsTnJN9+BGXg,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-omap-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, tony-4v6yS6AI5VpBDgjK7y7TUQ,
	grant.likely-s3s/WqlpOiPyB63q8FvJNQ,
	rob.herring-bsGFqQB8/DxBDgjK7y7TUQ, b-cousson-l0cyMroinI0,
	linux-lFZ/pmaqli7XmaaqVzeoHQ, eballetbo-Re5JQEeQqe8AvxtiuMwx3w,
	javier-0uQlZySMnqxg9hUCZPvPmw, mchehab-H+wXaHxf7aLQT0dZR+AlfA,
	santosh.shilimkar-l0cyMroinI0,
	broonie-yzvPICuk2AATkU/dhu1WVueM+bqZidxxQQ4Iyu8u01E,
	swarren-DDmLM1+adcrQT0dZR+AlfA, linux-doc-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <1364993634-6378-2-git-send-email-kishon-l0cyMroinI0@public.gmane.org>

On 04/03/2013 06:53 AM, Kishon Vijay Abraham I wrote:
> The PHY framework provides a set of APIs for the PHY drivers to
> create/destroy a PHY and APIs for the PHY users to obtain a reference to the
> PHY with or without using phandle. To obtain a reference to the PHY without
> using phandle, the platform specfic intialization code (say from board file)
> should have already called phy_bind with the binding information. The binding
> information consists of phy's device name, phy user device name and an index.
> The index is used when the same phy user binds to mulitple phys.
> 
> PHY drivers should create the PHY by passing phy_descriptor that has
> describes the PHY (label, type etc..) and ops like init, exit, suspend, resume,
> power_on, power_off.
> 
> The documentation for the generic PHY framework is added in
> Documentation/phy.txt and the documentation for the sysfs entry is added
> in Documentation/ABI/testing/sysfs-class-phy and the documentation for
> dt binding is can be found at
> Documentation/devicetree/bindings/phy/phy-bindings.txt

> diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h

> +extern struct phy *devm_phy_create(struct device *dev, const char *label,
> +	struct device_node *of_node, int type, struct phy_ops *ops,
> +	void *priv);

Can't the function get of_node from dev->of_node?

I wonder if we shouldn't split up the registration a bit though:

A function which registers a PHY object itself. That's the function above.

A function which registers a DT-based PHY provider.

Then, the of_xlate op would be part of the PHY provider, not part of
some random PHY that happens to exist on that node. So:

struct phy {
	struct device *dev;
	struct module *owner;
	int	(*init)(struct phy *phy);
	int	(*exit)(struct phy *phy);
	int	(*suspend)(struct phy *phy);
	int	(*resume)(struct phy *phy);
	int	(*power_on)(struct phy *phy);
	int	(*power_off)(struct phy *phy);
};

int phy_register(struct phy *phy);

All PHY providers would use that API, whether running in a DT-base
system or not.

struct of_phy_provider {
	struct device *dev;
	struct phy * (*of_xlate)(struct of_phy_provider *provider,
			struct of_phandle_args *args);
};

int phy_register_of_provider(struct of_phy_provider *provider);

Only DT-based PHY providers would use that API.

... or something like that?

phy_get() would do something like:

	if dev->of_node:
		# look up using registerd of_phy_providers
		phy = phy_get_of(...)
		if phy: return phy
	# now look up using whatever other mapping table exists
	phy = ...
	return phy

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] MPLS: Add limited GSO support
From: Jesse Gross @ 2013-04-03 23:51 UTC (permalink / raw)
  To: Simon Horman
  Cc: dev@openvswitch.org, netdev, Jarno Rajahalme, Pravin B Shelar
In-Reply-To: <1364980315-13649-1-git-send-email-horms@verge.net.au>

On Wed, Apr 3, 2013 at 2:11 AM, Simon Horman <horms@verge.net.au> wrote:
> In the case where a non-MPLS packet is recieved and an MPLS stack is
> added it may well be the case that the original skb is GSO but the
> NIC used for transmit does not support GSO of MPLS packets.
>
> The aim of this code is to provide GSO in software for MPLS packets
> whose skbs are GSO.
>
> When an implementation adds an MPLS stack to a non-MPLS packet it should do
> the following to skb metadata:
>
> * Set skb_mac_header(skb)->protocol to the new MPLS ethertype.
>   That is, either ETH_P_MPLS_UC or ETH_P_MPLS_MC.
>
> * Leave skb->protocol as the old non-MPLS ethertype.
>
> * Set skb->encapsulation = 1.
>
>   This may not strictly be necessary as I believe that checking
>   skb_mac_header(skb)->protocol and skb->protocol should be necessary and
>   sufficient.
>
>   However, it does seem to fit nicely with the current implementation of
>   dev_hard_start_xmit() where the more expensive check of
>   skb_mac_header(skb)->protocol may be guarded by an existing check of
>   skb->encapsulation.
>
> One aspect of this patch that I am unsure about is the modification I have
> made to skb_segment(). This seems to be necessary as checskum accelearation
> may no longer be possible as the packet has changed to be MPLS from some
> other packet type which may have been supported by the hardware in use.
>
> I will post a patch, "[PATCH v3.24] datapath: Add basic MPLS support to
> kernel" which adds MPLS support to the kernel datapath of Open vSwtich.
> That patch sets the above requirements in
> datapath/actions.c:set_ethertype() and was used to exercise the MPLS GSO
> code. The datapath patch is against the Open vSwtich tree but it is
> intended that it be added to the Open vSwtich code present in the mainline
> Linux kernel at some point.
>
> Suggested by Jesse Gross. Based heavily on "v4 GRE: Add TCP segmentation
> offload for GRE" by Pravin B Shelar.
>
> Cc: Jesse Gross <jesse@nicira.com>
> Cc: Pravin B Shelar <pshelar@nicira.com>
> Signed-off-by: Simon Horman <horms@verge.net.au>

MPLS is very similar to both the Ethernet header and vlans in that GSO
only requires replication without any modification.  That means that
if we look at the mac_len as containing all three then we can just
copy it without any special knowledge.  I don't know that we carefully
maintain mac_len in all places but you are already doing that in your
MPLS patches.

The other piece at that point is getting the inner protocol.  I'm
worried that using skb->protocol for this will break things that try
to use it for filtering.  It also means that depending on whether an
MPLS packet is locally sourced or not skb->protocol may be different
because we won't always be able to find the inner header.  I think
this will require a careful definition of that field to make it
consistent.

^ permalink raw reply

* Re: [PATCH v5 0/6] Generic PHY Framework
From: Stephen Warren @ 2013-04-03 23:42 UTC (permalink / raw)
  To: Kishon Vijay Abraham I
  Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA, mchehab-H+wXaHxf7aLQT0dZR+AlfA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, linux-lFZ/pmaqli7XmaaqVzeoHQ,
	javier-0uQlZySMnqxg9hUCZPvPmw, cesarb-PWySMVKUnqmsTnJN9+BGXg,
	eballetbo-Re5JQEeQqe8AvxtiuMwx3w,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	rob.herring-bsGFqQB8/DxBDgjK7y7TUQ,
	swarren-DDmLM1+adcrQT0dZR+AlfA,
	sylvester.nawrocki-Re5JQEeQqe8AvxtiuMwx3w,
	linux-omap-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	broonie-yzvPICuk2AATkU/dhu1WVueM+bqZidxxQQ4Iyu8u01E,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, balbi-l0cyMroinI0,
	santosh.shilimkar-l0cyMroinI0, netdev-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <1364993634-6378-1-git-send-email-kishon-l0cyMroinI0@public.gmane.org>

On 04/03/2013 06:53 AM, Kishon Vijay Abraham I wrote:
> Added a generic PHY framework that provides a set of APIs for the PHY drivers
> to create/destroy a PHY and APIs for the PHY users to obtain a reference to
> the PHY with or without using phandle. To obtain a reference to the PHY
> without using phandle, the platform specfic intialization code (say from board
> file) should have already called phy_bind with the binding information. The
> binding information consists of phy's device name, phy user device name and an
> index. The index is used when the same phy user binds to mulitple phys.
...
> Changes from v4:
> * removed of_phy_get_with_args/devm_of_phy_get_with_args. Now the *phy providers*
>   should use their custom implementation of of_xlate or use of_phy_xlate to get
>   *phy instance* from *phy providers*.
> * Added of_phy_xlate to be used by *phy providers* if it provides only one PHY.
> * changed phy_core from having subsys_initcall to module_init.

s/of_phy_xlate/of_phy_simple_xlate/ ? That would hightlight the fact
it's just /an/ implementation for the simple case, not /the/
implementation for all cases. It'd be consistent with e.g. drivers/gpio,
which has of_gpio_simple_xlate().

^ permalink raw reply

* Re: [net-next PATCH 3/3] net: frag queue per hash bucket locking
From: Jesper Dangaard Brouer @ 2013-04-03 22:11 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Eric Dumazet, David S. Miller, netdev, Florian Westphal,
	Daniel Borkmann
In-Reply-To: <1364583693.3232.257.camel@localhost>

On Fri, 2013-03-29 at 20:01 +0100, Jesper Dangaard Brouer wrote:
> On Fri, 2013-03-29 at 01:33 +0100, Hannes Frederic Sowa wrote:
> > On Thu, Mar 28, 2013 at 04:39:42PM -0700, Eric Dumazet wrote:
> > > On Fri, 2013-03-29 at 00:30 +0100, Hannes Frederic Sowa wrote:
> > > > On Thu, Mar 28, 2013 at 01:22:44PM -0700, Eric Dumazet wrote:
> > > > > On Thu, 2013-03-28 at 19:57 +0100, Hannes Frederic Sowa wrote:
> > > > > 
> > > > > > I assume that it has to do with the usage of this code in
> > > > > > ipv6/netfilter/nf_conntrack_reasm.c, which could be invoked from process
> > > > > > context, if I read it correctly.
> > > > > 
> > > > > Then there would be a possible deadlock in current code.
> > > > 
> > > > Netfilter currently does a local_bh_disable() before entering inet_fragment
> > > > (and later enables it, again).
> > > > 
> > > 
> > > Good, so no need for the _bh() as I suspected.
> > 
> > Ack.
> > 
> > I replaced the _bh spin_locks with plain spinlocks and tested the code
> > with sending fragments and receiving fragments (netfilter and reassmbly
> > logic) with lockdep and didn't get any splats. Looks good so far.
> 
> Well, it's great to see, that you are working on solving my patch
> proposal.  While I'm on Easter vacation ;-)  Much appreciated.
> I'm officially back from vacation Tuesday, and I'll repost then (after
> testing it on my 10G testlab).

When I rebased patch-03 (on top of net-next commit a210576c) and
removed the _bh spinlock, I saw a performance regression.  BUT this
was caused by some unrelated change in-between.  See tests below.

Test (A) is what I reported before for patch-02, accepted in commit 1b5ab0de.
Test (B) verifying-retest of commit 1b5ab0de correspond to patch-02.
Test (C) is what I reported before for patch-03

Test (D) is net-next master HEAD (commit a210576c), which reveals some
(unknown) performance regression (compared against test (B)). And (D)
function as a new base-test.

(#) Test-type:  20G64K    20G3F    20G64K+DoS  20G3F+DoS  20G64K+MQ 20G3F+MQ
    ----------  -------   -------  ----------  ---------  --------  -------
(A) Patch-02  : 18848.7   13230.1   4103.04     5310.36     130.0    440.2
(B) 1b5ab0de  : 18841.5   13156.8   4101.08     5314.57     129.0    424.2
(C) Patch-03v1: 18838.0   13490.5   4405.11     6814.72     196.6    461.6

(D) a210576c  : 18321.5   11250.4   3635.34     5160.13     119.1    405.2
(E) with _bh  : 17247.3   11492.6   3994.74     6405.29     166.7    413.6
(F) without bh: 17471.3   11298.7   3818.05     6102.11     165.7    406.3

Test (E) and (F) is patch-03, with and without the _bh spinlocks.

I cannot explain the slow down for 20G64K (but its an artificial
"labtest" so I'm not worried).  But the other results does show
improvements.  And test (E) "with _bh" version is slightly better.

p.s. Damm, it too a bit longer, than expected, to test this fairly small
correction to the patch...
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH v5 1/6] drivers: phy: add generic PHY framework
From: Sylwester Nawrocki @ 2013-04-03 21:46 UTC (permalink / raw)
  To: Kishon Vijay Abraham I
  Cc: balbi, gregkh, arnd, akpm, swarren, rob, netdev, davem, cesarb,
	linux-usb, linux-omap, linux-kernel, tony, grant.likely,
	rob.herring, b-cousson, linux, eballetbo, javier, mchehab,
	santosh.shilimkar, broonie, swarren, linux-doc,
	devicetree-discuss, linux-arm-kernel
In-Reply-To: <1364993634-6378-2-git-send-email-kishon@ti.com>

On 04/03/2013 02:53 PM, Kishon Vijay Abraham I wrote:
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/phy/phy-bindings.txt
> @@ -0,0 +1,67 @@
> +This document explains only the dt data binding. For general information about

s/dt data/device tree ?

> +PHY subsystem refer Documentation/phy.txt

s/refer/refer to ?

> +
> +PHY device node
> +===============
> +
> +Required Properties:
> +#phy-cells:	Number of cells in a PHY specifier;  The meaning of all those
> +		cells is defined by the binding for the phy node. The PHY
> +		provider can use the values in cells to find the appropriate
> +		PHY.
> +
> +For example:
> +
> +phys: phy {
> +    compatible = "xxx";
> +    reg =<...>;
> +    .
> +    .
> +    #phy-cells =<1>;
> +    .
> +    .
> +};
> +
> +That node describes an IP block that implements 2 different PHYs. In order to
> +differentiate between these 2 PHYs, an additonal specifier should be given
> +while trying to get a reference to it.
> +
> +PHY user node
> +=============
> +
> +Required Properties:
> +phys : the phandle for the PHY device (used by the PHY subsystem)
> +
> +Optional properties:
> +phy-names : the names of the PHY corresponding to the PHYs present in the
> +	    *phys* phandle
> +
> +Example 1:
> +usb1: usb_otg_ss@xxx {
> +    compatible = "xxx";
> +    reg =<xxx>;
> +    .
> +    .
> +    phys =<&usb2_phy>,<&usb3_phy>;
> +    phy-names = "usb2phy", "usb3phy";
> +    .
> +    .
> +};
> +
> +This node represents a controller that uses two PHYs one for usb2 and one for

s/PHYs/PHYs,  ?

> +usb3.
> +
> +Example 2:
> +usb2: usb_otg_ss@xxx {
> +    compatible = "xxx";
> +    reg =<xxx>;
> +    .
> +    .
> +    phys =<&phys 1>;
> +    .
> +    .
> +};
> +
> +This node represents a controller that uses one of the PHYs which is defined
> +previously. Note that the phy handle has an additional specifier "1" to

I find it a bit difficult to parse. Perhaps

"This node represents a controller that uses one of the PHYs of the PHY 
provider
device defined previously. ..."

or something similar ?

> +differentiate between the two PHYs.

s/the two/the two available  ?

> diff --git a/Documentation/phy.txt b/Documentation/phy.txt
> new file mode 100644
> index 0000000..7785ec0
> --- /dev/null
> +++ b/Documentation/phy.txt
> @@ -0,0 +1,113 @@
> +			    PHY SUBSYSTEM
> +		  Kishon Vijay Abraham I<kishon@ti.com>
> +
> +This document explains the Generic PHY Framework along with the APIs provided,
> +and how-to-use.
> +
> +1. Introduction
> +
> +*PHY* is the abbreviation for physical layer. It is used to connect a device
> +to the physical medium e.g., the USB controller has a PHY to provide functions

Shouldn't it be "...medium, e.g. the USB..." ?

> +such as serialization, de-serialization, encoding, decoding and is responsible
> +for obtaining the required data transmission rate. Note that some USB
> +controller has PHY functionality embedded into it and others use an external

"controllers have PHY functionality embedded into them" ?

> +PHY. Other peripherals that uses a PHY include Wireless LAN, Ethernet,

s/uses/use ?

> +SATA etc.
> +
> +The intention of creating this framework is to bring the phy drivers spread
> +all over the Linux kernel to drivers/phy to increase code re-use and to
> +increase code maintainability.
> +
> +This framework will be of use only to devices that uses external PHY (PHY

s/uses/use ?

> +functionality is not embedded within the controller).
> +
> +2. Creating the PHY
> +
> +The PHY driver should create the PHY in order for other peripheral controllers
> +to make use of it. The PHY framework provides 2 APIs to create the PHY.
> +
> +struct phy *phy_create(struct device *dev, const char *label,
> +	struct device_node *of_node, int type, struct phy_ops *ops,
> +	void *priv);
> +struct phy *devm_phy_create(struct device *dev, const char *label,
> +	struct device_node *of_node, int type, struct phy_ops *ops,
> +	void *priv);
> +
> +The PHY drivers can use one of the above 2 APIs to create the PHY by passing
> +the device pointer, label, device node, type, phy ops and a driver data.
> +phy_ops is a set of function pointers for performing PHY operations such as
> +init, exit, suspend, resume, power_on and power_off.
> +
> +3. Binding the PHY to the controller
> +
> +The framework provides an API for binding the controller to the PHY in the
> +case of non dt boot.
> +
> +struct phy_bind *phy_bind(const char *dev_name, int index,
> +				const char *phy_dev_name);
> +
> +The API fills the phy_bind structure with the dev_name (device name of the
> +controller), index and phy_dev_name (device name of the PHY). This will
> +be used when the controller requests this phy. This API should be used by
> +platform specific initialization code (board file).
> +
> +In the case of dt boot, the binding information should be added in the dt
> +data of the controller.

s/in the dt data of/in the device tree node of ?

> +4. Getting a reference to the PHY
> +
> +Before the controller can make use of the PHY, it has to get a reference to
> +it. This framework provides 6 APIs to get a reference to the PHY.
> +
> +struct phy *phy_get(struct device *dev, int index);
> +struct phy *devm_phy_get(struct device *dev, int index);
> +struct phy *of_phy_get(struct device *dev, const char *phandle, int index);
> +struct phy *devm_of_phy_get(struct device *dev, const char *phandle, int index);

Why do we need separate functions for dt and non-dt ? Couldn't it be handled
internally by the framework ? So the PHY users can use same calls for dt
and non-dt, like in case of e.g. the regulators API ?

Also signatures of some functions are different now:

extern struct phy *phy_get(struct device *dev, int index);
extern struct phy *devm_phy_get(struct device *dev, int index);
extern struct phy *of_phy_get(struct device *dev, int index);
extern struct phy *devm_of_phy_get(struct device *dev, int index);

BTW, I think "extern" could be dropped, does it have any significance in
function declarations in header files ?

> +struct phy *of_phy_get_byname(struct device *dev, const char *string);
> +struct phy *devm_of_phy_get_byname(struct device *dev, const char *string);
> +
> +phy_get and devm_phy_get can be used to get the PHY in non-dt boot. This API
> +uses the binding information added using the phy_bind API to find and return
> +the appropriate PHY. The only difference between the two APIs is that
> +devm_phy_get associates the device with the PHY using devres on successful PHY
> +get. On driver detach, release function is invoked on the the devres data and

s/the the/the

> +devres data is freed.
> +
> +of_phy_get and devm_of_phy_get can be used to get the PHY in dt boot. These
> +APIs take the phandle and index to get a reference to the PHY. The only
> +difference between the two APIs is that devm_of_phy_get associates the device
> +with the PHY using devres on successful phy get. On driver detach, release
> +function is invoked on the devres data and it is freed.
> +
> +of_phy_get_byname and devm_of_phy_get_byname can also be used to get the PHY
> +in dt boot. It is same as the above API except that the user has to pass the
> +phy name as filled in "phy-names" phandle in dt data and the framework will

s/phandle/property ?

> +find the index and get the PHY.
> +
> +5. Releasing a reference to the PHY
> +
> +When the controller no longer needs the PHY, it has to release the reference
> +to the PHY it has obtained using the APIs mentioned in the above section. The
> +PHY framework provides 2 APIS to release a reference to the PHY.

s/APIS/APIs ?

> +
> +void phy_put(struct phy *phy);
> +void devm_phy_put(struct device *dev, struct phy *phy);
> +
> +Both these APIs are used to release a reference to the PHY and devm_phy_put
> +destroys the devres associated with this PHY.
> +
> +6. Destroying the PHY
> +
> +When the driver that created the PHY is unloaded, it should destroy the PHY it
> +created using one of the following 2 APIs.
> +
> +void phy_destroy(struct phy *phy);
> +void devm_phy_destroy(struct device *dev, struct phy *phy);
> +
> +Both these APIs destroys the PHY and devm_phy_destroy destroys the devres

s/APIs destroys/APIs destroy ?

> +associated with this PHY.
> +
> +7. DeviceTree Binding
> +
> +The documentation for PHY dt binding can be found @
> +Documentation/devicetree/bindings/phy/phy-bindings.txt

> --- /dev/null
> +++ b/drivers/phy/phy-core.c
> @@ -0,0 +1,616 @@

> +static struct phy *of_phy_lookup(struct device_node *node)
> +{
> +	struct phy *phy;
> +	struct device *dev;
> +	struct class_dev_iter iter;
> +
> +	class_dev_iter_init(&iter, phy_class, NULL, NULL);

There is currently nothing preventing a call to this function before
phy_class is initialized. It happened during my tests, and the nice
stack dump showed that it was the PHY user attempting to get the PHY
before the framework got initialized. So either phy_core_init should
be a subsys_initcall or we need a better protection against phy_class
being NULL or ERR_PTR in more places.

> +	while ((dev = class_dev_iter_next(&iter))) {
> +		phy = container_of(dev, struct phy, dev);
> +		if (node != phy->of_node)
> +			continue;
> +
> +		class_dev_iter_exit(&iter);
> +		return phy;
> +	}
> +
> +	class_dev_iter_exit(&iter);
> +	return ERR_PTR(-EPROBE_DEFER);
> +}

> +/**
> + * of_phy_get() - lookup and obtain a reference to a phy by phandle
> + * @dev: device that requests this phy
> + * @index: the index of the phy
> + *
> + * Returns the phy associated with the given phandle value,
> + * after getting a refcount to it or -ENODEV if there is no such phy or
> + * -EPROBE_DEFER if there is a phandle to the phy, but the device is
> + * not yet loaded.
> + */
> +struct phy *of_phy_get(struct device *dev, int index)
> +{
> +	int ret;
> +	struct phy *phy = NULL;
> +	struct phy_bind *phy_map = NULL;
> +	struct of_phandle_args args;
> +	struct device_node *node;
> +
> +	if (!dev->of_node) {
> +		dev_dbg(dev, "device does not have a device node entry\n");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	ret = of_parse_phandle_with_args(dev->of_node, "phys", "#phy-cells",
> +		index,&args);
> +	if (ret) {
> +		dev_dbg(dev, "failed to get phy in %s node\n",
> +			dev->of_node->full_name);
> +		return ERR_PTR(-ENODEV);
> +	}
> +
> +	phy = of_phy_lookup(args.np);
> +	if (IS_ERR(phy) || !try_module_get(phy->ops->owner)) {
> +		phy = ERR_PTR(-EPROBE_DEFER);
> +		goto err0;
> +	}
> +
> +	phy = phy->ops->of_xlate(phy,&args);
> +	if (IS_ERR(phy))
> +		goto err1;
> +
> +	phy_map = phy_bind(dev_name(dev), index, dev_name(&phy->dev));
> +	if (!IS_ERR(phy_map)) {
> +		phy_map->phy = phy;
> +		phy_map->auto_bind = true;

Shouldn't it be done with the phy_bind_mutex held ? I guess an unlocked
version of the phy_bind functions is needed, so it can be used internally.

> +	}
> +
> +	get_device(&phy->dev);
> +
> +err1:
> +	module_put(phy->ops->owner);
> +
> +err0:
> +	of_node_put(node);
> +
> +	return phy;
> +}
> +EXPORT_SYMBOL_GPL(of_phy_get);

> +/**
> + * phy_create() - create a new phy
> + * @dev: device that is creating the new phy
> + * @label: label given to phy
> + * @of_node: device node of the phy
> + * @type: specifies the phy type
> + * @ops: function pointers for performing phy operations
> + * @priv: private data from phy driver
> + *
> + * Called to create a phy using phy framework.
> + */
> +struct phy *phy_create(struct device *dev, const char *label,
> +	struct device_node *of_node, int type, struct phy_ops *ops,
> +	void *priv)
> +{
> +	int ret;
> +	struct phy *phy;
> +	struct phy_bind *phy_bind;
> +	const char *devname = NULL;
> +
> +	if (!dev) {
> +		dev_err(dev, "no device provided for PHY\n");
> +		ret = -EINVAL;
> +		goto err0;
> +	}
> +
> +	if (!ops || !ops->of_xlate || !priv) {
> +		dev_err(dev, "no PHY ops/PHY data provided\n");
> +		ret = -EINVAL;
> +		goto err0;
> +	}
> +
> +	if (!phy_class)
> +		phy_core_init();
> +
> +	phy = kzalloc(sizeof(*phy), GFP_KERNEL);
> +	if (!phy) {
> +		ret = -ENOMEM;
> +		goto err0;
> +	}
> +
> +	devname = dev_name(dev);

OK, a sort of serious issue here is that you can't call phy_create()
multiple times for same PHY provider device.

device_add() further will fail as sysfs won't let you create multiple
objects with same name. So I would assume we need to add a new argument
to phy_create() or rename @type to e.g. @phy_id, which could be
concatenated with dev_name(dev) to create a unique name of the phy
device.

And how is the PHY provider supposed to identify a PHY in its common PHY
ops, now when the struct phy port field is removed ? I have temporarily
(ab)used the type field in order to select proper registers within the
phy ops.

> +	device_initialize(&phy->dev);
> +
> +	phy->dev.class = phy_class;
> +	phy->dev.parent = dev;
> +	phy->label = label;

What about duplicating the string here ? That would make the API a bit
more convenient and the callers wouldn't be required to keep all the
labels around.

> +	phy->of_node = of_node;
> +	phy->type = type;
> +	phy->ops = ops;
> +
> +	dev_set_drvdata(&phy->dev, priv);
> +
> +	ret = dev_set_name(&phy->dev, "%s", devname);
> +	if (ret)
> +		goto err1;
> +
> +	mutex_lock(&phy_bind_mutex);
> +	list_for_each_entry(phy_bind,&phy_bind_list, list)
> +		if (!(strcmp(phy_bind->phy_dev_name, devname)))
> +			phy_bind->phy = phy;
> +	mutex_unlock(&phy_bind_mutex);
> +
> +	ret = device_add(&phy->dev);
> +	if (ret)
> +		goto err2;
> +
> +	return phy;
> +
> +err2:
> +	phy_bind->phy = NULL;
> +
> +err1:
> +	put_device(&phy->dev);
> +	kfree(phy);
> +
> +err0:
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL_GPL(phy_create);

> +/**
> + * devm_phy_create() - create a new phy
> + * @dev: device that is creating the new phy
> + * @dev: device that is creating the new phy

Duplicated line.

> + * @label: label given to phy
> + * @of_node: device node of the phy
> + * @type: specifies the phy type
> + * @ops: function pointers for performing phy operations
> + * @priv: private data from phy driver
> + *
> + * Creates a new PHY device adding it to the PHY class.
> + * While at that, it also associates the device with the phy using devres.
> + * On driver detach, release function is invoked on the devres data,
> + * then, devres data is freed.
> + */

> +/**
> + * phy_bind() - bind the phy and the controller that uses the phy
> + * @dev_name: the device name of the device that will bind to the phy
> + * @index: index to specify the port number
> + * @phy_dev_name: the device name of the phy
> + *
> + * Fills the phy_bind structure with the dev_name and phy_dev_name. This will
> + * be used when the phy driver registers the phy and when the controller
> + * requests this phy.
> + *
> + * To be used by platform specific initialization code.
> + */
> +struct phy_bind *phy_bind(const char *dev_name, int index,
> +				const char *phy_dev_name)
> +{
> +	struct phy_bind *phy_bind;
> +
> +	mutex_lock(&phy_bind_mutex);
> +	list_for_each_entry(phy_bind,&phy_bind_list, list) {
> +		if (!strcmp(phy_bind->dev_name, dev_name)&&  phy_bind->index ==
> +			index) {
> +			phy_bind->phy_dev_name = phy_dev_name;
> +			goto ret0;
> +		}
> +	}
> +
> +	phy_bind = kzalloc(sizeof(*phy_bind), GFP_KERNEL);
> +	if (!phy_bind) {
> +		phy_bind = ERR_PTR(-ENOMEM);
> +		goto ret0;
> +	}
> +
> +	phy_bind->dev_name = dev_name;
> +	phy_bind->phy_dev_name = phy_dev_name;
> +	phy_bind->index = index;
> +	phy_bind->auto_bind = false;
> +
> +	list_add_tail(&phy_bind->list,&phy_bind_list);
> +
> +ret0:
> +	mutex_unlock(&phy_bind_mutex);
> +	return phy_bind;
> +}
> +EXPORT_SYMBOL_GPL(phy_bind);

> +static ssize_t phy_bind_show(struct device *dev,
> +			     struct device_attribute *attr, char *buf)
> +{
> +	struct phy_bind *phy_bind;
> +	struct phy *phy;
> +	char *p = buf;
> +	int len;
> +
> +	phy = container_of(dev, struct phy, dev);
> +

Shouldn't this iteration also be protected with the phy_bind_mutex ?

> +	list_for_each_entry(phy_bind,&phy_bind_list, list)
> +		if (phy_bind->phy == phy)
> +			p += sprintf(p, "%s %d %s\n", phy_bind->dev_name,
> +				phy_bind->index, phy_bind->phy_dev_name);
> +	len = (p - buf);
> +
> +	return len;
> +}

> +static int phy_core_init(void)
> +{
> +	if (phy_class)
> +		return 0;
> +
> +	phy_class = class_create(THIS_MODULE, "phy");
> +	if (IS_ERR(phy_class)) {
> +		pr_err("failed to create phy class -->  %ld\n",
> +			PTR_ERR(phy_class));
> +		return PTR_ERR(phy_class);
> +	}
> +
> +	phy_class->dev_release = phy_release;
> +	phy_class->dev_attrs = phy_dev_attrs;
> +
> +	return 0;
> +}
> +module_init(phy_core_init);

Having this framework initialized before the PHY provider and consumer
drivers could save some unnecessary probe deferrals. I was wondering,
what is really an advantage of having it as module_init(), rather than
subsys_initcall() ?

> +static void __exit phy_core_exit(void)
> +{
> +	class_destroy(phy_class);
> +}
> +module_exit(phy_core_exit);
> +
> +MODULE_DESCRIPTION("Generic PHY Framework");
> +MODULE_AUTHOR("Kishon Vijay Abraham I<kishon@ti.com>");
> +MODULE_LICENSE("GPL v2");
> diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h
> new file mode 100644
> index 0000000..97a48bd
> --- /dev/null
> +++ b/include/linux/phy/phy.h
> @@ -0,0 +1,228 @@
> +/*
> + * phy.h -- generic phy header file
> + *

> +/**
> + * struct phy - represent the phy device
> + * @dev: phy device
> + * @label: label given to phy
> + * @type: specifies the phy type
> + * @of_node: device node of the phy
> + * @ops: function pointers for performing phy operations
> + */
> +struct phy {
> +	struct device		dev;
> +	const char		*label;
> +	int			type;
> +	struct device_node	*of_node;
> +	struct phy_ops		*ops;

Something in this structure that would identify a PHY when a PHY provider
supports multiple PHYs is probably needed.

> +};
> +
> +/**
> + * struct phy_bind - represent the binding for the phy
> + * @dev_name: the device name of the device that will bind to the phy
> + * @phy_dev_name: the device name of the phy
> + * @index: used if a single controller uses multiple phys
> + * @auto_bind: tells if the binding is done explicitly from board file or not
> + * @phy: reference to the phy
> + * @list: to maintain a linked list of the binding information
> + */
> +struct phy_bind {
> +	const char	*dev_name;
> +	const char	*phy_dev_name;
> +	int		index;
> +	int		auto_bind:1;
> +	struct phy	*phy;
> +	struct list_head list;
> +};
> +
> +static inline int phy_poweron(struct phy *phy)

phy_power_on ? Just a humble suggestion...

> +{
> +	if (phy->ops->power_on)
> +		return phy->ops->power_on(phy);
> +
> +	return -EINVAL;
> +}
> +
> +static inline int phy_poweroff(struct phy *phy)

and phy_power_off ? :-)

> +{
> +	if (phy->ops->power_off)
> +		return phy->ops->power_off(phy);
> +
> +	return -EINVAL;
> +}
> +#endif /* __DRIVERS_PHY_H */

Thanks,
Sylwester

^ permalink raw reply

* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Eric Dumazet @ 2013-04-03 21:29 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Christoph Lameter, RongQing Li, Shan Wei, netdev
In-Reply-To: <20130403212450.GD3411@htj.dyndns.org>

On Wed, 2013-04-03 at 14:24 -0700, Tejun Heo wrote:

> I don't know about this one.  I actually prefer the latter in that the
> pointer being passed into this_cpu_ptr() is something which is the
> actual percpu pointer either from variable declaration or the
> allocator.  Sure, they both are just different expressions of the same
> thing but the former requires an extra guarantee from percpu subsystem
> that the accessors would work for pointers which aren't the exact
> values defined or allocated.  I'd much prefer unfiying things toward
> the latter than the former.

I agree with you, I prefer &this_cpu_ptr(percpu_pointer)->field

The offset is added after getting the address of the (percpu) base
object.

^ permalink raw reply

* Re: [PERCPU] Remove & in front of this_cpu_ptr
From: Tejun Heo @ 2013-04-03 21:24 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: RongQing Li, Shan Wei, netdev, Eric Dumazet
In-Reply-To: <0000013dd1a300fb-1fbb26a9-77a7-4c24-95ff-f088309206d9-000000@email.amazonses.com>

Hello, Christoph.

On Wed, Apr 03, 2013 at 08:42:33PM +0000, Christoph Lameter wrote:
> Subject: percpu: Remove & in front of this_cpu_ptr
> 
> Both
> 
> 	this_cpu_ptr(&percpu_pointer->field)
> 
> 
> [Add Offset in percpu pointer to the field offset in the struct
> and then add to the local percpu base]
> 
> as well as
> 
> 	 &this_cpu_ptr(percpu_pointer)->field
> 
> [Add percpu variable offset to local percpu base to form an address
> and then add the field offset to the address].
> 
> are correct. However, the latter looks a bit more complicated.
> The first one is easier to understand. The second one may be
> more difficult for the compiler to optimize as well.

I don't know about this one.  I actually prefer the latter in that the
pointer being passed into this_cpu_ptr() is something which is the
actual percpu pointer either from variable declaration or the
allocator.  Sure, they both are just different expressions of the same
thing but the former requires an extra guarantee from percpu subsystem
that the accessors would work for pointers which aren't the exact
values defined or allocated.  I'd much prefer unfiying things toward
the latter than the former.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: this cpu documentation
From: Tejun Heo @ 2013-04-03 21:18 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, RongQing Li, Shan Wei, netdev, srostedt
In-Reply-To: <0000013dd1a20ebf-4a76fb06-4b9d-492e-9d77-4b3f43aceca7-000000@email.amazonses.com>

On Wed, Apr 03, 2013 at 08:41:32PM +0000, Christoph Lameter wrote:
> 
> From: Christoph Lameter <cl@linux.com>
> Subject: this_cpu: Add documentation
> 
> Document the rationale and the way to use this_cpu operations.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>

Applied to percpu/for-3.10 with the file renamed to this_cpu_ops.txt.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH net-next] e100: Add dma mapping error check
From: Jeff Kirsher @ 2013-04-03 21:13 UTC (permalink / raw)
  To: Neil Horman; +Cc: netdev, Josh Boyer, David S. Miller, e1000-devel
In-Reply-To: <1364923838-14271-1-git-send-email-nhorman@tuxdriver.com>

[-- Attachment #1: Type: text/plain, Size: 4823 bytes --]

On Tue, 2013-04-02 at 13:30 -0400, Neil Horman wrote:
> e100 uses pci_map_single, but fails to check for a dma mapping error
> after its
> use, resulting in a stack trace:
> 
> [   46.656594] ------------[ cut here ]------------
> [   46.657004] WARNING: at lib/dma-debug.c:933 check_unmap
> +0x47b/0x950()
> [   46.657004] Hardware name: To Be Filled By O.E.M.
> [   46.657004] e100 0000:00:0e.0: DMA-API: device driver failed to
> check map
> error[device address=0x000000007a4540fa] [size=90 bytes] [mapped as
> single]
> [   46.657004] Modules linked in:
> [   46.657004]  w83627hf hwmon_vid snd_via82xx ppdev snd_ac97_codec
> ac97_bus
> snd_seq snd_pcm snd_mpu401 snd_mpu401_uart ns558 snd_rawmidi gameport
> parport_pc
> e100 snd_seq_device parport snd_page_alloc snd_timer snd soundcore
> skge shpchp
> k8temp mii edac_core i2c_viapro edac_mce_amd nfsd auth_rpcgss nfs_acl
> lockd
> sunrpc binfmt_misc uinput ata_generic pata_acpi radeon i2c_algo_bit
> drm_kms_helper ttm firewire_ohci drm firewire_core pata_via sata_via
> i2c_core
> sata_promise crc_itu_t
> [   46.657004] Pid: 792, comm: ip Not tainted
> 3.8.0-0.rc6.git0.1.fc19.x86_64 #1
> [   46.657004] Call Trace:
> [   46.657004]  <IRQ>  [<ffffffff81065ed0>] warn_slowpath_common
> +0x70/0xa0
> [   46.657004]  [<ffffffff81065f4c>] warn_slowpath_fmt+0x4c/0x50
> [   46.657004]  [<ffffffff81364cfb>] check_unmap+0x47b/0x950
> [   46.657004]  [<ffffffff8136522f>] debug_dma_unmap_page+0x5f/0x70
> [   46.657004]  [<ffffffffa030f0f0>] ? e100_tx_clean+0x30/0x210 [e100]
> [   46.657004]  [<ffffffffa030f1a8>] e100_tx_clean+0xe8/0x210 [e100]
> [   46.657004]  [<ffffffffa030fc6f>] e100_poll+0x56f/0x6c0 [e100]
> [   46.657004]  [<ffffffff8159dce1>] ? net_rx_action+0xa1/0x370
> [   46.657004]  [<ffffffff8159ddb2>] net_rx_action+0x172/0x370
> [   46.657004]  [<ffffffff810703bf>] __do_softirq+0xef/0x3d0
> [   46.657004]  [<ffffffff816e4ebc>] call_softirq+0x1c/0x30
> [   46.657004]  [<ffffffff8101c485>] do_softirq+0x85/0xc0
> [   46.657004]  [<ffffffff81070885>] irq_exit+0xd5/0xe0
> [   46.657004]  [<ffffffff816e5756>] do_IRQ+0x56/0xc0
> [   46.657004]  [<ffffffff816dacb2>] common_interrupt+0x72/0x72
> [   46.657004]  <EOI>  [<ffffffff816da1eb>] ?
> _raw_spin_unlock_irqrestore+0x3b/0x70
> [   46.657004]  [<ffffffff816d124d>] __slab_free+0x58/0x38b
> [   46.657004]  [<ffffffff81214424>] ? fsnotify_clear_marks_by_inode
> +0x34/0x120
> [   46.657004]  [<ffffffff811b0417>] ? kmem_cache_free+0x97/0x320
> [   46.657004]  [<ffffffff8157fc14>] ? sock_destroy_inode+0x34/0x40
> [   46.657004]  [<ffffffff8157fc14>] ? sock_destroy_inode+0x34/0x40
> [   46.657004]  [<ffffffff811b0692>] kmem_cache_free+0x312/0x320
> [   46.657004]  [<ffffffff8157fc14>] sock_destroy_inode+0x34/0x40
> [   46.657004]  [<ffffffff811e8c28>] destroy_inode+0x38/0x60
> [   46.657004]  [<ffffffff811e8d5e>] evict+0x10e/0x1a0
> [   46.657004]  [<ffffffff811e9605>] iput+0xf5/0x180
> [   46.657004]  [<ffffffff811e4338>] dput+0x248/0x310
> [   46.657004]  [<ffffffff811ce0e1>] __fput+0x171/0x240
> [   46.657004]  [<ffffffff811ce26e>] ____fput+0xe/0x10
> [   46.657004]  [<ffffffff8108d54c>] task_work_run+0xac/0xe0
> [   46.657004]  [<ffffffff8106c6ed>] do_exit+0x26d/0xc30
> [   46.657004]  [<ffffffff8109eccc>] ? finish_task_switch+0x7c/0x120
> [   46.657004]  [<ffffffff816dad58>] ? retint_swapgs+0x13/0x1b
> [   46.657004]  [<ffffffff8106d139>] do_group_exit+0x49/0xc0
> [   46.657004]  [<ffffffff8106d1c4>] sys_exit_group+0x14/0x20
> [   46.657004]  [<ffffffff816e3b19>] system_call_fastpath+0x16/0x1b
> [   46.657004] ---[ end trace 4468c44e2156e7d1 ]---
> [   46.657004] Mapped at:
> [   46.657004]  [<ffffffff813663d1>] debug_dma_map_page+0x91/0x140
> [   46.657004]  [<ffffffffa030e8eb>] e100_xmit_prepare+0x12b/0x1c0
> [e100]
> [   46.657004]  [<ffffffffa030c924>] e100_exec_cb+0x84/0x140 [e100]
> [   46.657004]  [<ffffffffa030e56a>] e100_xmit_frame+0x3a/0x190 [e100]
> [   46.657004]  [<ffffffff8159ee89>] dev_hard_start_xmit+0x259/0x6c0
> 
> Easy fix, modify the cb paramter to e100_exec_cb to return an error,
> and do the
> dma_mapping_error check in the obvious place
> 
> This was reported previously here:
> http://article.gmane.org/gmane.linux.network/257893
> 
> But nobody stepped up and fixed it.
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> Reported-by: Michal Jaegermann <michal@harddata.com>
> CC: Josh Boyer <jwboyer@redhat.com>
> CC: "David S. Miller" <davem@davemloft.net>
> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> CC: e1000-devel@lists.sourceforge.net
> ---
>  drivers/net/ethernet/intel/e100.c | 36
> +++++++++++++++++++++++++-----------
>  1 file changed, 25 insertions(+), 11 deletions(-) 

Thanks Neil, I have added it to my queue.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* [PERCPU] Remove & in front of this_cpu_ptr
From: Christoph Lameter @ 2013-04-03 20:42 UTC (permalink / raw)
  To: Tejun Heo; +Cc: RongQing Li, Shan Wei, netdev, Eric Dumazet
In-Reply-To: <alpine.DEB.2.02.1304031540110.3444@gentwo.org>

Subject: percpu: Remove & in front of this_cpu_ptr

Both

	this_cpu_ptr(&percpu_pointer->field)


[Add Offset in percpu pointer to the field offset in the struct
and then add to the local percpu base]

as well as

	 &this_cpu_ptr(percpu_pointer)->field

[Add percpu variable offset to local percpu base to form an address
and then add the field offset to the address].

are correct. However, the latter looks a bit more complicated.
The first one is easier to understand. The second one may be
more difficult for the compiler to optimize as well.

Convert all of them to this_cpu_ptr(&percpu_pointer->field).

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/fs/gfs2/rgrp.c
===================================================================
--- linux.orig/fs/gfs2/rgrp.c	2013-04-03 15:25:22.576562629 -0500
+++ linux/fs/gfs2/rgrp.c	2013-04-03 15:26:43.045773676 -0500
@@ -1726,7 +1726,7 @@ static bool gfs2_rgrp_congested(const st
 	s64 var;

 	preempt_disable();
-	st = &this_cpu_ptr(sdp->sd_lkstats)->lkstats[LM_TYPE_RGRP];
+	st = this_cpu_ptr(&sdp->sd_lkstats->lkstats[LM_TYPE_RGRP]);
 	r_srttb = st->stats[GFS2_LKS_SRTTB];
 	r_dcount = st->stats[GFS2_LKS_DCOUNT];
 	var = st->stats[GFS2_LKS_SRTTVARB] +
Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c	2013-04-03 15:25:22.576562629 -0500
+++ linux/mm/page_alloc.c	2013-04-03 15:30:02.124769119 -0500
@@ -1342,7 +1342,7 @@ void free_hot_cold_page(struct page *pag
 		migratetype = MIGRATE_MOVABLE;
 	}

-	pcp = &this_cpu_ptr(zone->pageset)->pcp;
+	pcp = this_cpu_ptr(&zone->pageset->pcp);
 	if (cold)
 		list_add_tail(&page->lru, &pcp->lists[migratetype]);
 	else
@@ -1484,7 +1484,7 @@ again:
 		struct list_head *list;

 		local_irq_save(flags);
-		pcp = &this_cpu_ptr(zone->pageset)->pcp;
+		pcp = this_cpu_ptr(&zone->pageset->pcp);
 		list = &pcp->lists[migratetype];
 		if (list_empty(list)) {
 			pcp->count += rmqueue_bulk(zone, 0,
Index: linux/net/core/flow.c
===================================================================
--- linux.orig/net/core/flow.c	2013-04-03 15:25:22.576562629 -0500
+++ linux/net/core/flow.c	2013-04-03 15:26:43.045773676 -0500
@@ -328,7 +328,7 @@ static void flow_cache_flush_per_cpu(voi
 	struct flow_flush_info *info = data;
 	struct tasklet_struct *tasklet;

-	tasklet = &this_cpu_ptr(info->cache->percpu)->flush_tasklet;
+	tasklet = this_cpu_ptr(&info->cache->percpu->flush_tasklet);
 	tasklet->data = (unsigned long)info;
 	tasklet_schedule(tasklet);
 }

^ permalink raw reply

* this cpu documentation
From: Christoph Lameter @ 2013-04-03 20:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: RongQing Li, Shan Wei, netdev, Tejun Heo, srostedt
In-Reply-To: <1364833887.5113.161.camel@edumazet-glaptop>


From: Christoph Lameter <cl@linux.com>
Subject: this_cpu: Add documentation

Document the rationale and the way to use this_cpu operations.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/Documentation/this_cpu_ops
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/Documentation/this_cpu_ops	2013-04-03 15:25:41.424846306 -0500
@@ -0,0 +1,194 @@
+this_cpu operations
+-------------------
+
+this_cpu operations are a way of optimizing access to per cpu variables
+associated with the *currently* executing processor
+through the use of segment registers (or a dedicated register where the cpu
+permanently stored the beginning of the per cpu area for a specific
+processor).
+
+The this_cpu operations add an per cpu variable offset to the processor
+specific percpu base and encode that operation in the instruction operating
+on the per cpu variable.
+
+This mean there are no atomicity issues between the calculation
+of the offset and the operation on the data. Therefore it is not necessary
+to disable preempt or interrupts to ensure that the processor is not changed
+between the calculation of the address and the operation on the data.
+
+Read-modify-write operations are of particular interest. Frequently
+processors have special lower latency instructions that can operate without
+the typical synchronization overhead but still provide some sort of relaxed
+atomicity guarantee. The x86 for example can execute RMV instructions like
+inc/dec/cmpxchg without the lock prefix and the associated latency penalty.
+
+Access to the variable without the lock prefix is not synchronized but
+synchronization is not necessary since we are dealing with per cpu data
+specific to the currently executing processor. Only the current processor
+should be accessing that variable and therefore there are no concurency
+issues with other processors in the system.
+
+On x86 the fs: or the gs: segment registers contain the basis of the per cpu area. It is
+then possible to simply use the segment override to relocate a per cpu relative address
+to the proper per cpu area for the processor. So the relocation to the per cpu base
+is encoded in the instruction via a segment register prefix.
+
+For example:
+
+	DEFINE_PER_CPU(int, x);
+	int z;
+
+	z = this_cpu_read(x);
+
+results in a single instruction
+
+	mov ax, gs:[x]
+
+instead of a sequence of calculation of the address and then a fetch from
+that address which occurs with the percpu operations. Before this_cpu_ops
+such sequence also required preempt disable/enable to prevent the Os from
+moving the thread to a different processor while the calculation is performed.
+
+
+The main use of the this_cpu operations has been to optimize counter operations.
+
+
+	this_cpu_inc(x)
+
+results in the following single instruction (no lock prefix!)
+
+	inc gs:[x]
+
+
+instead of the following operations required if there is no segment register.
+
+	int *y;
+	int cpu;
+
+	cpu = get_cpu();
+	y = per_cpu_ptr(&x, cpu);
+	(*y)++;
+	put_cpu();
+
+
+Note that these operations can only be used on percpu data that is reserved for
+a specific processor. Without disabling preemption in the surrounding code
+this_cpu_inc() will only guarantee that one of the percpu counters is correctly
+incremented. However, there is no guarantee that the OS will not move the process
+directly before or after the this_cpu instruction is executed. In general this
+means that the value of the individual counters for each processor are
+meaningless. The sum of all the per cpu counters is the only value that is of
+interest.
+
+Per cpu variables are used for performance reasons. Bouncing cache lines can
+be avoided if multiple processors concurrently go through the same code paths.
+Since each processor has its own per cpu variables no concurrent cacheline
+updates take place. The price that has to be paid for this optimization is
+the need to add up the per cpu counters when the value of the counter is
+needed.
+
+
+Special operations:
+-------------------
+
+	y = this_cpu_ptr(&x)
+
+Takes the offset of a per cpu variable (&x !) and returns the address of the
+per cpu variable that belongs to the currently executing processor.
+this_cpu_ptr avoids multiple steps that the common get_cpu/put_cpu sequence
+requires. No processor number is available. Instead the offset of the local\
+per cpu area is simply added to the percpu offset.
+
+
+
+Per cpu variables and offsets
+-----------------------------
+
+Per cpu variables have *offsets* to the beginning of the percpu area. They do
+not have addresses although they look like that in the code. Offsets
+cannot be directly dereferenced. The offset must be added to a base pointer of
+a percpu area of a processor in order to form a valid address.
+
+Therefore the use of x or &x outside of the context of per cpu operations
+is invalid and will generally be treated like a NULL pointer dereference.
+
+In the context of per cpu operations
+
+	x is a per cpu variable. Most this_cpu operations take a cpu variable.
+
+	&x is the *offset* a per cpu variable. this_cpu_ptr() takes the offset
+		of a per cpu variable which makes this look a bit strange.
+
+
+
+Operations on a field of a per cpu structure
+--------------------------------------------
+
+Lets say we have a percpu structure
+
+	struct s {
+		int n,m;
+	};
+
+	DEFINE_PER_CPU(struct s, p);
+
+
+Operations on these fields are straightforward
+
+	this_cpu_inc(p.m)
+
+	z = this_cpu_cmpxchg(p.m, 0, 1);
+
+
+If we have an offset to struct s:
+
+	struct s __percpu *ps = &p;
+
+	z = this_cpu_dec(ps->m);
+
+	z = this_cpu_inc_return(ps->n);
+
+
+The calculation of the pointer may require the use of this_cpu_ptr() if we
+do not make use of this_cpu ops later to manipulate fields:
+
+	struct s *pp;
+
+	pp = this_cpu_ptr(&p);
+
+	pp->m--
+
+	z = pp->n++
+
+
+Variants of this_cpu ops
+-------------------------
+
+this_cpu ops are interupt safe. Some architecture do not support these per
+cpu local operations. In that case the operation must be replaced by code
+that disables interrupts, then does the operations that are guaranteed to be
+atomic and then reenable interrupts. Doing so is expensive. If there are
+other reasons why the scheduler cannot change the processor we are executing
+on then there is no reason to disable interrupts. For that purpose
+the __this_cpu operations are provided. F.e.
+
+	__this_cpu_inc(x)
+
+Will increment x and will not fallback to code that disables interrupts on
+platforms that cannot accomplish atomicity through address relocation and
+an RMV operation in the same instruction.
+
+
+
+&this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
+--------------------------------------------
+
+The first operation takes the offset and forms an address and then adds
+the offset of the n field.
+
+The second one first adds the two offsets and then does the relocation.
+IMHO the second form looks cleaner and has an easier time with ().
+
+
+Christoph Lameter, April 3rd, 2013
+

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: Provide OVS_DP_ATTR_UPCALL_PID in datapath messages
From: Jesse Gross @ 2013-04-03 20:17 UTC (permalink / raw)
  To: Thomas Graf; +Cc: dev@openvswitch.org, netdev
In-Reply-To: <20130403073356.GA6402@casper.infradead.org>

On Wed, Apr 3, 2013 at 12:33 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 04/02/13 at 04:30pm, Jesse Gross wrote:
>> Can you describe the race condition some more?  The kernel doesn't
>> change the port ID on its own so even needing to request the value
>> seems rare.
>
> The upcall nlport is changeable with OVS_VPORT_CMD_SET and may
> be received between the OVS_DP_CMD_GET and OVS_VPORT_CMD_GET.
>
>> Assigning the local ports upcall PID through datapath creation is
>> really somewhat of a hack since it's port state.  I don't disagree
>> that it's somewhat asymmetric now but it seems better to move away
>> from the current model if possible.
>
> Would you suggest to wait with local vport creation if no upcall
> nlport is provided and let new user space binaries create the local
> port explicitly?
>
> We can't get rid of the attribute in datapath messages w/o breaking
> ABI but we could make its use optional I guess.

We have to keep the local port as part of datapath creation since it
is used as an identifier for the datapath itself.  However, you could
probably achieve the same result by using a port ID of zero for the
local port, which will prevent it from doing upcalls.  You could then
set it to a real value when you want to "create" it (this is actually
what OVS userspace does).  I think this is probably about as close to
removing it as we can get.

^ permalink raw reply

* Re: [PATCH RFC 0/5] tipc: add support for TIPC over InfiniBand
From: Jon Maloy @ 2013-04-03 19:44 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: allan.stephens-CWA4WttNNZF54TAoqtyWWQ,
	netdev-u79uwXL29TY76Z2rM5mHXA, roland-DgEjT+Ai2ygdnm+yROfE0A,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Paul.Gortmaker-CWA4WttNNZF54TAoqtyWWQ@public.gmane.org
In-Reply-To: <1364993010-15515-1-git-send-email-kaber-dcUjhNyLwpNeoWH0uzbU5w@public.gmane.org>

On 04/03/2013 08:43 AM, Patrick McHardy wrote:
> The following patchset adds support for running TIPC over InfiniBand.
> The patchset consists of three parts (+ a minor fix for the ethernet media
> type):
> 
> - Preparation: removal of an the unused str2addr callback and move of the
>   bcast_addr from struct tipc_media to struct tipc_bearer. This is necessary
>   because InfiniBand doesn't have a fixed broadcast address like ethernet,
>   so it needs to be initialized with the device's broadcast address when
>   the bearer is enabled
> 
> - Introduction of a TIPC InfiniBand media type. A new media type is needed
>   to deal with the different address sizes
> 
> - Support for ETH_P_TIPC in IPoIB
> 
> The last patch is something I'd like to discuss, I realize that this diverges
> from the IPoIB specification, however the alternative would be to implement
> something which would be pretty much identical to IPoIB with the only
> difference of handling a different ethertype in the xmit function.
> 
> In fact I'd like to propose to remove all higher layer protocol knowledge
> from IPoIB except for ARP and RARP, which need special treatment. With the
> recent patch to manage neighbour entries in IPoIB itself, no further knowledge
> of higher layer protocols is required.
> 
> The patchset is based on net-next.
> 
> Comments welcome.
> 

Happy to see this initiative being taken.
It seems to me that you have grasped our intentions for 
how to add a new bearer, so I really don't have much comments,
except the one already made by Erik.

To me it looks good.

Regards
///jon



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] SUNRPC/cache: add module_put() on error path in cache_open()
From: J. Bruce Fields @ 2013-04-03 19:32 UTC (permalink / raw)
  To: Alexey Khoroshilov
  Cc: Trond Myklebust, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	ldv-project-tpLiQldItUH5n4uC9ZG1Ww
In-Reply-To: <1363984604-17739-1-git-send-email-khoroshilov-ufN2psIa012HXe+LvDLADg@public.gmane.org>

Thanks, applying.--b.

On Sat, Mar 23, 2013 at 12:36:44AM +0400, Alexey Khoroshilov wrote:
> If kmalloc() fails in cache_open(), module cd->owner left locked.
> The patch adds module_put(cd->owner) on this path.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Alexey Khoroshilov <khoroshilov-ufN2psIa012HXe+LvDLADg@public.gmane.org>
> ---
>  net/sunrpc/cache.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index 25d58e76..1d3c514 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -986,8 +986,10 @@ static int cache_open(struct inode *inode, struct file *filp,
>  	nonseekable_open(inode, filp);
>  	if (filp->f_mode & FMODE_READ) {
>  		rp = kmalloc(sizeof(*rp), GFP_KERNEL);
> -		if (!rp)
> +		if (!rp) {
> +			module_put(cd->owner);
>  			return -ENOMEM;
> +		}
>  		rp->offset = 0;
>  		rp->q.reader = 1;
>  		atomic_inc(&cd->readers);
> -- 
> 1.7.9.5
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v3] netdev/phy: Implement ieee802.3 clause 45 in mdio-octeon.c
From: David Daney @ 2013-04-03 19:25 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: linux-kernel, David Daney

From: David Daney <david.daney@cavium.com>

The Octeon SMI/MDIO interfaces can do clause 45 communications, so
implement this in the driver.

Also fix some comment formatting to make it consistent and to comply
with the netdev style.

Signed-off-by: David Daney <david.daney@cavium.com>
---

v3: Remove now bogus comment noted by Ben Huchings

v2: No code changes from v1, just fixed comment formatting snafu.

Sorry for the spam.


 drivers/net/phy/mdio-octeon.c | 94 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 87 insertions(+), 7 deletions(-)

diff --git a/drivers/net/phy/mdio-octeon.c b/drivers/net/phy/mdio-octeon.c
index c2c878d..b51fa1f 100644
--- a/drivers/net/phy/mdio-octeon.c
+++ b/drivers/net/phy/mdio-octeon.c
@@ -3,7 +3,7 @@
  * License.  See the file "COPYING" in the main directory of this archive
  * for more details.
  *
- * Copyright (C) 2009,2011 Cavium, Inc.
+ * Copyright (C) 2009-2012 Cavium, Inc.
  */
 
 #include <linux/platform_device.h>
@@ -27,30 +27,98 @@
 #define SMI_CLK		0x18
 #define SMI_EN		0x20
 
+enum octeon_mdiobus_mode {
+	UNINIT = 0,
+	C22,
+	C45
+};
+
 struct octeon_mdiobus {
 	struct mii_bus *mii_bus;
 	u64 register_base;
 	resource_size_t mdio_phys;
 	resource_size_t regsize;
+	enum octeon_mdiobus_mode mode;
 	int phy_irq[PHY_MAX_ADDR];
 };
 
+static void octeon_mdiobus_set_mode(struct octeon_mdiobus *p,
+				    enum octeon_mdiobus_mode m)
+{
+	union cvmx_smix_clk smi_clk;
+
+	if (m == p->mode)
+		return;
+
+	smi_clk.u64 = cvmx_read_csr(p->register_base + SMI_CLK);
+	smi_clk.s.mode = (m == C45) ? 1 : 0;
+	smi_clk.s.preamble = 1;
+	cvmx_write_csr(p->register_base + SMI_CLK, smi_clk.u64);
+	p->mode = m;
+}
+
+static int octeon_mdiobus_c45_addr(struct octeon_mdiobus *p,
+				   int phy_id, int regnum)
+{
+	union cvmx_smix_cmd smi_cmd;
+	union cvmx_smix_wr_dat smi_wr;
+	int timeout = 1000;
+
+	octeon_mdiobus_set_mode(p, C45);
+
+	smi_wr.u64 = 0;
+	smi_wr.s.dat = regnum & 0xffff;
+	cvmx_write_csr(p->register_base + SMI_WR_DAT, smi_wr.u64);
+
+	regnum = (regnum >> 16) & 0x1f;
+
+	smi_cmd.u64 = 0;
+	smi_cmd.s.phy_op = 0; /* MDIO_CLAUSE_45_ADDRESS */
+	smi_cmd.s.phy_adr = phy_id;
+	smi_cmd.s.reg_adr = regnum;
+	cvmx_write_csr(p->register_base + SMI_CMD, smi_cmd.u64);
+
+	do {
+		/* Wait 1000 clocks so we don't saturate the RSL bus
+		 * doing reads.
+		 */
+		__delay(1000);
+		smi_wr.u64 = cvmx_read_csr(p->register_base + SMI_WR_DAT);
+	} while (smi_wr.s.pending && --timeout);
+
+	if (timeout <= 0)
+		return -EIO;
+	return 0;
+}
+
 static int octeon_mdiobus_read(struct mii_bus *bus, int phy_id, int regnum)
 {
 	struct octeon_mdiobus *p = bus->priv;
 	union cvmx_smix_cmd smi_cmd;
 	union cvmx_smix_rd_dat smi_rd;
+	unsigned int op = 1; /* MDIO_CLAUSE_22_READ */
 	int timeout = 1000;
 
+	if (regnum & MII_ADDR_C45) {
+		int r = octeon_mdiobus_c45_addr(p, phy_id, regnum);
+		if (r < 0)
+			return r;
+
+		regnum = (regnum >> 16) & 0x1f;
+		op = 3; /* MDIO_CLAUSE_45_READ */
+	} else {
+		octeon_mdiobus_set_mode(p, C22);
+	}
+
+
 	smi_cmd.u64 = 0;
-	smi_cmd.s.phy_op = 1; /* MDIO_CLAUSE_22_READ */
+	smi_cmd.s.phy_op = op;
 	smi_cmd.s.phy_adr = phy_id;
 	smi_cmd.s.reg_adr = regnum;
 	cvmx_write_csr(p->register_base + SMI_CMD, smi_cmd.u64);
 
 	do {
-		/*
-		 * Wait 1000 clocks so we don't saturate the RSL bus
+		/* Wait 1000 clocks so we don't saturate the RSL bus
 		 * doing reads.
 		 */
 		__delay(1000);
@@ -69,21 +137,33 @@ static int octeon_mdiobus_write(struct mii_bus *bus, int phy_id,
 	struct octeon_mdiobus *p = bus->priv;
 	union cvmx_smix_cmd smi_cmd;
 	union cvmx_smix_wr_dat smi_wr;
+	unsigned int op = 0; /* MDIO_CLAUSE_22_WRITE */
 	int timeout = 1000;
 
+
+	if (regnum & MII_ADDR_C45) {
+		int r = octeon_mdiobus_c45_addr(p, phy_id, regnum);
+		if (r < 0)
+			return r;
+
+		regnum = (regnum >> 16) & 0x1f;
+		op = 1; /* MDIO_CLAUSE_45_WRITE */
+	} else {
+		octeon_mdiobus_set_mode(p, C22);
+	}
+
 	smi_wr.u64 = 0;
 	smi_wr.s.dat = val;
 	cvmx_write_csr(p->register_base + SMI_WR_DAT, smi_wr.u64);
 
 	smi_cmd.u64 = 0;
-	smi_cmd.s.phy_op = 0; /* MDIO_CLAUSE_22_WRITE */
+	smi_cmd.s.phy_op = op;
 	smi_cmd.s.phy_adr = phy_id;
 	smi_cmd.s.reg_adr = regnum;
 	cvmx_write_csr(p->register_base + SMI_CMD, smi_cmd.u64);
 
 	do {
-		/*
-		 * Wait 1000 clocks so we don't saturate the RSL bus
+		/* Wait 1000 clocks so we don't saturate the RSL bus
 		 * doing reads.
 		 */
 		__delay(1000);
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH] netdev/phy: Implement ieee802.3 clause 45 in mdio-octeon.c
From: David Daney @ 2013-04-03 19:13 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David S. Miller, netdev, linux-kernel, David Daney
In-Reply-To: <1365016137.2897.26.camel@bwh-desktop.uk.solarflarecom.com>

On 04/03/2013 12:08 PM, Ben Hutchings wrote:
> On Wed, 2013-04-03 at 11:16 -0700, David Daney wrote:
>> From: David Daney <david.daney@cavium.com>
>>
>> The Octeon SMI/MDIO interfaces can do clause 45 communications, so
>> implement this in the driver.
>>
>> Signed-off-by: David Daney <david.daney@cavium.com>
>> ---
>>   drivers/net/phy/mdio-octeon.c | 89 +++++++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 86 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/phy/mdio-octeon.c b/drivers/net/phy/mdio-octeon.c
>> index c2c878d..f4f3abf 100644
>> --- a/drivers/net/phy/mdio-octeon.c
>> +++ b/drivers/net/phy/mdio-octeon.c
> [...]
>>   static int octeon_mdiobus_read(struct mii_bus *bus, int phy_id, int regnum)
>>   {
>>   	struct octeon_mdiobus *p = bus->priv;
>>   	union cvmx_smix_cmd smi_cmd;
>>   	union cvmx_smix_rd_dat smi_rd;
>> +	unsigned int op = 1; /* MDIO_CLAUSE_22_READ */
>>   	int timeout = 1000;
>>
>> +	if (regnum & MII_ADDR_C45) {
>> +		int r = octeon_mdiobus_c45_addr(p, phy_id, regnum);
>> +		if (r < 0)
>> +			return r;
>> +
>> +		regnum = (regnum >> 16) & 0x1f;
>> +		op = 3; /* MDIO_CLAUSE_45_READ */
>> +	} else {
>> +		octeon_mdiobus_set_mode(p, C22);
>> +	}
>> +
>> +
>>   	smi_cmd.u64 = 0;
>> -	smi_cmd.s.phy_op = 1; /* MDIO_CLAUSE_22_READ */
>> +	smi_cmd.s.phy_op = op; /* MDIO_CLAUSE_22_READ */
> [...]
>
> This comment should now be removed.

You have very sharp eyes.  I will send yet another version of the patch 
with this removed.

David Daney


>
> Ben.
>

^ permalink raw reply

* Re: [PATCH] netdev/phy: Implement ieee802.3 clause 45 in mdio-octeon.c
From: Ben Hutchings @ 2013-04-03 19:08 UTC (permalink / raw)
  To: David Daney; +Cc: David S. Miller, netdev, linux-kernel, David Daney
In-Reply-To: <1365013008-10914-1-git-send-email-ddaney.cavm@gmail.com>

On Wed, 2013-04-03 at 11:16 -0700, David Daney wrote:
> From: David Daney <david.daney@cavium.com>
> 
> The Octeon SMI/MDIO interfaces can do clause 45 communications, so
> implement this in the driver.
> 
> Signed-off-by: David Daney <david.daney@cavium.com>
> ---
>  drivers/net/phy/mdio-octeon.c | 89 +++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 86 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/phy/mdio-octeon.c b/drivers/net/phy/mdio-octeon.c
> index c2c878d..f4f3abf 100644
> --- a/drivers/net/phy/mdio-octeon.c
> +++ b/drivers/net/phy/mdio-octeon.c
[...]
>  static int octeon_mdiobus_read(struct mii_bus *bus, int phy_id, int regnum)
>  {
>  	struct octeon_mdiobus *p = bus->priv;
>  	union cvmx_smix_cmd smi_cmd;
>  	union cvmx_smix_rd_dat smi_rd;
> +	unsigned int op = 1; /* MDIO_CLAUSE_22_READ */
>  	int timeout = 1000;
>  
> +	if (regnum & MII_ADDR_C45) {
> +		int r = octeon_mdiobus_c45_addr(p, phy_id, regnum);
> +		if (r < 0)
> +			return r;
> +
> +		regnum = (regnum >> 16) & 0x1f;
> +		op = 3; /* MDIO_CLAUSE_45_READ */
> +	} else {
> +		octeon_mdiobus_set_mode(p, C22);
> +	}
> +
> +
>  	smi_cmd.u64 = 0;
> -	smi_cmd.s.phy_op = 1; /* MDIO_CLAUSE_22_READ */
> +	smi_cmd.s.phy_op = op; /* MDIO_CLAUSE_22_READ */
[...]

This comment should now be removed.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH v2] netdev/phy: Implement ieee802.3 clause 45 in mdio-octeon.c
From: David Daney @ 2013-04-03 18:34 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: linux-kernel, David Daney

From: David Daney <david.daney@cavium.com>

The Octeon SMI/MDIO interfaces can do clause 45 communications, so
implement this in the driver.

Also fix some comment formatting to make it consistent and to comply
with the netdev style.

Signed-off-by: David Daney <david.daney@cavium.com>
---

No code changes from v1, just fixed comment formatting snafu.

Sorry for the spam.

 drivers/net/phy/mdio-octeon.c | 94 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 87 insertions(+), 7 deletions(-)

diff --git a/drivers/net/phy/mdio-octeon.c b/drivers/net/phy/mdio-octeon.c
index c2c878d..39a081b 100644
--- a/drivers/net/phy/mdio-octeon.c
+++ b/drivers/net/phy/mdio-octeon.c
@@ -3,7 +3,7 @@
  * License.  See the file "COPYING" in the main directory of this archive
  * for more details.
  *
- * Copyright (C) 2009,2011 Cavium, Inc.
+ * Copyright (C) 2009-2012 Cavium, Inc.
  */
 
 #include <linux/platform_device.h>
@@ -27,30 +27,98 @@
 #define SMI_CLK		0x18
 #define SMI_EN		0x20
 
+enum octeon_mdiobus_mode {
+	UNINIT = 0,
+	C22,
+	C45
+};
+
 struct octeon_mdiobus {
 	struct mii_bus *mii_bus;
 	u64 register_base;
 	resource_size_t mdio_phys;
 	resource_size_t regsize;
+	enum octeon_mdiobus_mode mode;
 	int phy_irq[PHY_MAX_ADDR];
 };
 
+static void octeon_mdiobus_set_mode(struct octeon_mdiobus *p,
+				    enum octeon_mdiobus_mode m)
+{
+	union cvmx_smix_clk smi_clk;
+
+	if (m == p->mode)
+		return;
+
+	smi_clk.u64 = cvmx_read_csr(p->register_base + SMI_CLK);
+	smi_clk.s.mode = (m == C45) ? 1 : 0;
+	smi_clk.s.preamble = 1;
+	cvmx_write_csr(p->register_base + SMI_CLK, smi_clk.u64);
+	p->mode = m;
+}
+
+static int octeon_mdiobus_c45_addr(struct octeon_mdiobus *p,
+				   int phy_id, int regnum)
+{
+	union cvmx_smix_cmd smi_cmd;
+	union cvmx_smix_wr_dat smi_wr;
+	int timeout = 1000;
+
+	octeon_mdiobus_set_mode(p, C45);
+
+	smi_wr.u64 = 0;
+	smi_wr.s.dat = regnum & 0xffff;
+	cvmx_write_csr(p->register_base + SMI_WR_DAT, smi_wr.u64);
+
+	regnum = (regnum >> 16) & 0x1f;
+
+	smi_cmd.u64 = 0;
+	smi_cmd.s.phy_op = 0; /* MDIO_CLAUSE_45_ADDRESS */
+	smi_cmd.s.phy_adr = phy_id;
+	smi_cmd.s.reg_adr = regnum;
+	cvmx_write_csr(p->register_base + SMI_CMD, smi_cmd.u64);
+
+	do {
+		/* Wait 1000 clocks so we don't saturate the RSL bus
+		 * doing reads.
+		 */
+		__delay(1000);
+		smi_wr.u64 = cvmx_read_csr(p->register_base + SMI_WR_DAT);
+	} while (smi_wr.s.pending && --timeout);
+
+	if (timeout <= 0)
+		return -EIO;
+	return 0;
+}
+
 static int octeon_mdiobus_read(struct mii_bus *bus, int phy_id, int regnum)
 {
 	struct octeon_mdiobus *p = bus->priv;
 	union cvmx_smix_cmd smi_cmd;
 	union cvmx_smix_rd_dat smi_rd;
+	unsigned int op = 1; /* MDIO_CLAUSE_22_READ */
 	int timeout = 1000;
 
+	if (regnum & MII_ADDR_C45) {
+		int r = octeon_mdiobus_c45_addr(p, phy_id, regnum);
+		if (r < 0)
+			return r;
+
+		regnum = (regnum >> 16) & 0x1f;
+		op = 3; /* MDIO_CLAUSE_45_READ */
+	} else {
+		octeon_mdiobus_set_mode(p, C22);
+	}
+
+
 	smi_cmd.u64 = 0;
-	smi_cmd.s.phy_op = 1; /* MDIO_CLAUSE_22_READ */
+	smi_cmd.s.phy_op = op; /* MDIO_CLAUSE_22_READ */
 	smi_cmd.s.phy_adr = phy_id;
 	smi_cmd.s.reg_adr = regnum;
 	cvmx_write_csr(p->register_base + SMI_CMD, smi_cmd.u64);
 
 	do {
-		/*
-		 * Wait 1000 clocks so we don't saturate the RSL bus
+		/* Wait 1000 clocks so we don't saturate the RSL bus
 		 * doing reads.
 		 */
 		__delay(1000);
@@ -69,21 +137,33 @@ static int octeon_mdiobus_write(struct mii_bus *bus, int phy_id,
 	struct octeon_mdiobus *p = bus->priv;
 	union cvmx_smix_cmd smi_cmd;
 	union cvmx_smix_wr_dat smi_wr;
+	unsigned int op = 0; /* MDIO_CLAUSE_22_WRITE */
 	int timeout = 1000;
 
+
+	if (regnum & MII_ADDR_C45) {
+		int r = octeon_mdiobus_c45_addr(p, phy_id, regnum);
+		if (r < 0)
+			return r;
+
+		regnum = (regnum >> 16) & 0x1f;
+		op = 1; /* MDIO_CLAUSE_45_WRITE */
+	} else {
+		octeon_mdiobus_set_mode(p, C22);
+	}
+
 	smi_wr.u64 = 0;
 	smi_wr.s.dat = val;
 	cvmx_write_csr(p->register_base + SMI_WR_DAT, smi_wr.u64);
 
 	smi_cmd.u64 = 0;
-	smi_cmd.s.phy_op = 0; /* MDIO_CLAUSE_22_WRITE */
+	smi_cmd.s.phy_op = op;
 	smi_cmd.s.phy_adr = phy_id;
 	smi_cmd.s.reg_adr = regnum;
 	cvmx_write_csr(p->register_base + SMI_CMD, smi_cmd.u64);
 
 	do {
-		/*
-		 * Wait 1000 clocks so we don't saturate the RSL bus
+		/* Wait 1000 clocks so we don't saturate the RSL bus
 		 * doing reads.
 		 */
 		__delay(1000);
-- 
1.7.11.7

^ permalink raw reply related

* [PATCH] netdev/phy: Implement ieee802.3 clause 45 in mdio-octeon.c
From: David Daney @ 2013-04-03 18:16 UTC (permalink / raw)
  To: David S. Miller, netdev; +Cc: linux-kernel, David Daney

From: David Daney <david.daney@cavium.com>

The Octeon SMI/MDIO interfaces can do clause 45 communications, so
implement this in the driver.

Signed-off-by: David Daney <david.daney@cavium.com>
---
 drivers/net/phy/mdio-octeon.c | 89 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 86 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/mdio-octeon.c b/drivers/net/phy/mdio-octeon.c
index c2c878d..f4f3abf 100644
--- a/drivers/net/phy/mdio-octeon.c
+++ b/drivers/net/phy/mdio-octeon.c
@@ -3,7 +3,7 @@
  * License.  See the file "COPYING" in the main directory of this archive
  * for more details.
  *
- * Copyright (C) 2009,2011 Cavium, Inc.
+ * Copyright (C) 2009-2012 Cavium, Inc.
  */
 
 #include <linux/platform_device.h>
@@ -27,23 +27,93 @@
 #define SMI_CLK		0x18
 #define SMI_EN		0x20
 
+enum octeon_mdiobus_mode {
+	UNINIT = 0,
+	C22,
+	C45
+};
+
 struct octeon_mdiobus {
 	struct mii_bus *mii_bus;
 	u64 register_base;
 	resource_size_t mdio_phys;
 	resource_size_t regsize;
+	enum octeon_mdiobus_mode mode;
 	int phy_irq[PHY_MAX_ADDR];
 };
 
+static void octeon_mdiobus_set_mode(struct octeon_mdiobus *p,
+				    enum octeon_mdiobus_mode m)
+{
+	union cvmx_smix_clk smi_clk;
+
+	if (m == p->mode)
+		return;
+
+	smi_clk.u64 = cvmx_read_csr(p->register_base + SMI_CLK);
+	smi_clk.s.mode = (m == C45) ? 1 : 0;
+	smi_clk.s.preamble = 1;
+	cvmx_write_csr(p->register_base + SMI_CLK, smi_clk.u64);
+	p->mode = m;
+}
+
+static int octeon_mdiobus_c45_addr(struct octeon_mdiobus *p,
+				   int phy_id, int regnum)
+{
+	union cvmx_smix_cmd smi_cmd;
+	union cvmx_smix_wr_dat smi_wr;
+	int timeout = 1000;
+
+	octeon_mdiobus_set_mode(p, C45);
+
+	smi_wr.u64 = 0;
+	smi_wr.s.dat = regnum & 0xffff;
+	cvmx_write_csr(p->register_base + SMI_WR_DAT, smi_wr.u64);
+
+	regnum = (regnum >> 16) & 0x1f;
+
+	smi_cmd.u64 = 0;
+	smi_cmd.s.phy_op = 0; /* MDIO_CLAUSE_45_ADDRESS */
+	smi_cmd.s.phy_adr = phy_id;
+	smi_cmd.s.reg_adr = regnum;
+	cvmx_write_csr(p->register_base + SMI_CMD, smi_cmd.u64);
+
+	do {
+		/*
+		 * Wait 1000 clocks so we don't saturate the RSL bus
+		 * doing reads.
+		 */
+		__delay(1000);
+		smi_wr.u64 = cvmx_read_csr(p->register_base + SMI_WR_DAT);
+	} while (smi_wr.s.pending && --timeout);
+
+	if (timeout <= 0)
+		return -EIO;
+	return 0;
+}
+
 static int octeon_mdiobus_read(struct mii_bus *bus, int phy_id, int regnum)
 {
 	struct octeon_mdiobus *p = bus->priv;
 	union cvmx_smix_cmd smi_cmd;
 	union cvmx_smix_rd_dat smi_rd;
+	unsigned int op = 1; /* MDIO_CLAUSE_22_READ */
 	int timeout = 1000;
 
+	if (regnum & MII_ADDR_C45) {
+		int r = octeon_mdiobus_c45_addr(p, phy_id, regnum);
+		if (r < 0)
+			return r;
+
+		regnum = (regnum >> 16) & 0x1f;
+		op = 3; /* MDIO_CLAUSE_45_READ */
+	} else {
+		octeon_mdiobus_set_mode(p, C22);
+	}
+
+
 	smi_cmd.u64 = 0;
-	smi_cmd.s.phy_op = 1; /* MDIO_CLAUSE_22_READ */
+	smi_cmd.s.phy_op = op; /* MDIO_CLAUSE_22_READ */
 	smi_cmd.s.phy_adr = phy_id;
 	smi_cmd.s.reg_adr = regnum;
 	cvmx_write_csr(p->register_base + SMI_CMD, smi_cmd.u64);
@@ -69,14 +139,27 @@ static int octeon_mdiobus_write(struct mii_bus *bus, int phy_id,
 	struct octeon_mdiobus *p = bus->priv;
 	union cvmx_smix_cmd smi_cmd;
 	union cvmx_smix_wr_dat smi_wr;
+	unsigned int op = 0; /* MDIO_CLAUSE_22_WRITE */
 	int timeout = 1000;
 
+
+	if (regnum & MII_ADDR_C45) {
+		int r = octeon_mdiobus_c45_addr(p, phy_id, regnum);
+		if (r < 0)
+			return r;
+
+		regnum = (regnum >> 16) & 0x1f;
+		op = 1; /* MDIO_CLAUSE_45_WRITE */
+	} else {
+		octeon_mdiobus_set_mode(p, C22);
+	}
+
 	smi_wr.u64 = 0;
 	smi_wr.s.dat = val;
 	cvmx_write_csr(p->register_base + SMI_WR_DAT, smi_wr.u64);
 
 	smi_cmd.u64 = 0;
-	smi_cmd.s.phy_op = 0; /* MDIO_CLAUSE_22_WRITE */
+	smi_cmd.s.phy_op = op;
 	smi_cmd.s.phy_adr = phy_id;
 	smi_cmd.s.reg_adr = regnum;
 	cvmx_write_csr(p->register_base + SMI_CMD, smi_cmd.u64);
-- 
1.7.11.7

^ permalink raw reply related

* pull request: wireless 2013-04-03
From: John W. Linville @ 2013-04-03 18:16 UTC (permalink / raw)
  To: davem; +Cc: linux-wireless, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 18792 bytes --]

Dave,

Here are some more fixes intended for the 3.9 stream...

Regarding the mac80211 bits, Johannes says:

"I had changed the idle handling to simplify it, but broken the
sequencing of commands, at least for ath9k-htc, one patch restores the
sequence. The other patch fixes a crash Jouni found while stress-testing
the remain-on-channel code, when an item is deleted the work struct can
run twice and crash the second time."

As for the iwlwifi bits, Johannes says:

"The only fix here is to the passive-no-RX firmware regulatory
enforcement driver support code to not drop auth frames in quick
succession, leading to not being able to connect to APs on passive
channels in certain circumstances."

Don't forget the NFC bits, about which Samuel says:

"This time we have:

- A crash fix for when a DGRAM LLCP socket is listening while the NFC adapter
  is physically removed.
- A potential double skb free when the LLCP socket receive queue is full.
- A fix for properly handling multiple and consecutive LLCP connections, and
  not trash the socket ack log.
- A build failure for the MEI microread physical layer, now that the MEI bus
  APIs have been merged into char-misc-next."

On top of that, Stone Piao provides an mwifiex fix to avoid accessing
beyond the end of a buffer.

Please let me know if there are problems!

Thanks,

John

---

The following changes since commit 3658f3604066d5500ebd73a04084f127dc779441:

  Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile (2013-04-01 08:17:09 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless.git for-davem

for you to fetch changes up to 407ad2b7efebe42f8331fd42c4576ed3a6117e29:

  Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem (2013-04-03 13:50:34 -0400)

----------------------------------------------------------------

Johannes Berg (3):
      mac80211: fix remain-on-channel cancel crash
      mac80211: fix idle handling sequence
      iwlwifi: dvm: fix the passive-no-RX workaround

John W. Linville (4):
      Merge branch 'for-john' of git://git.kernel.org/.../jberg/mac80211
      Merge branch 'for-john' of git://git.kernel.org/.../iwlwifi/iwlwifi-fixes
      Merge tag 'nfc-fixes-3.9-2' of git://git.kernel.org/.../sameo/nfc-fixes
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem

Samuel Ortiz (3):
      NFC: llcp: Detach socket from process context only when releasing the socket
      NFC: llcp: Keep the connected socket parent pointer alive
      NFC: microread: Fix build failure due to a new MEI bus API

Stone Piao (1):
      mwifiex: limit channel number not to overflow memory

Thierry Escande (1):
      NFC: llcp: Remove possible double call to kfree_skb

 drivers/net/wireless/iwlwifi/dvm/rxon.c | 18 +++++++---------
 drivers/net/wireless/iwlwifi/dvm/tx.c   |  2 +-
 drivers/net/wireless/mwifiex/cfg80211.c |  3 ++-
 drivers/nfc/microread/mei.c             | 38 +++++++++++++++------------------
 net/mac80211/cfg.c                      |  6 ++++--
 net/mac80211/chan.c                     | 17 ++++++++++++---
 net/mac80211/ieee80211_i.h              |  4 +++-
 net/mac80211/iface.c                    |  2 +-
 net/mac80211/offchannel.c               | 23 ++++++++++++++------
 net/nfc/llcp/llcp.c                     |  8 -------
 net/nfc/llcp/sock.c                     |  6 +++---
 11 files changed, 70 insertions(+), 57 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/dvm/rxon.c b/drivers/net/wireless/iwlwifi/dvm/rxon.c
index 23be948..a82b6b3 100644
--- a/drivers/net/wireless/iwlwifi/dvm/rxon.c
+++ b/drivers/net/wireless/iwlwifi/dvm/rxon.c
@@ -1419,6 +1419,14 @@ void iwlagn_bss_info_changed(struct ieee80211_hw *hw,
 
 	mutex_lock(&priv->mutex);
 
+	if (changes & BSS_CHANGED_IDLE && bss_conf->idle) {
+		/*
+		 * If we go idle, then clearly no "passive-no-rx"
+		 * workaround is needed any more, this is a reset.
+		 */
+		iwlagn_lift_passive_no_rx(priv);
+	}
+
 	if (unlikely(!iwl_is_ready(priv))) {
 		IWL_DEBUG_MAC80211(priv, "leave - not ready\n");
 		mutex_unlock(&priv->mutex);
@@ -1450,16 +1458,6 @@ void iwlagn_bss_info_changed(struct ieee80211_hw *hw,
 			priv->timestamp = bss_conf->sync_tsf;
 			ctx->staging.filter_flags |= RXON_FILTER_ASSOC_MSK;
 		} else {
-			/*
-			 * If we disassociate while there are pending
-			 * frames, just wake up the queues and let the
-			 * frames "escape" ... This shouldn't really
-			 * be happening to start with, but we should
-			 * not get stuck in this case either since it
-			 * can happen if userspace gets confused.
-			 */
-			iwlagn_lift_passive_no_rx(priv);
-
 			ctx->staging.filter_flags &= ~RXON_FILTER_ASSOC_MSK;
 
 			if (ctx->ctxid == IWL_RXON_CTX_BSS)
diff --git a/drivers/net/wireless/iwlwifi/dvm/tx.c b/drivers/net/wireless/iwlwifi/dvm/tx.c
index 6aec2df..d1a670d 100644
--- a/drivers/net/wireless/iwlwifi/dvm/tx.c
+++ b/drivers/net/wireless/iwlwifi/dvm/tx.c
@@ -1192,7 +1192,7 @@ int iwlagn_rx_reply_tx(struct iwl_priv *priv, struct iwl_rx_cmd_buffer *rxb,
 			memset(&info->status, 0, sizeof(info->status));
 
 			if (status == TX_STATUS_FAIL_PASSIVE_NO_RX &&
-			    iwl_is_associated_ctx(ctx) && ctx->vif &&
+			    ctx->vif &&
 			    ctx->vif->type == NL80211_IFTYPE_STATION) {
 				/* block and stop all queues */
 				priv->passive_no_rx = true;
diff --git a/drivers/net/wireless/mwifiex/cfg80211.c b/drivers/net/wireless/mwifiex/cfg80211.c
index a44023a..8aaf56a 100644
--- a/drivers/net/wireless/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/mwifiex/cfg80211.c
@@ -1892,7 +1892,8 @@ mwifiex_cfg80211_scan(struct wiphy *wiphy,
 		}
 	}
 
-	for (i = 0; i < request->n_channels; i++) {
+	for (i = 0; i < min_t(u32, request->n_channels,
+			      MWIFIEX_USER_SCAN_CHAN_MAX); i++) {
 		chan = request->channels[i];
 		priv->user_scan_cfg->chan_list[i].chan_number = chan->hw_value;
 		priv->user_scan_cfg->chan_list[i].radio_type = chan->band;
diff --git a/drivers/nfc/microread/mei.c b/drivers/nfc/microread/mei.c
index eef38cf..ca33ae1 100644
--- a/drivers/nfc/microread/mei.c
+++ b/drivers/nfc/microread/mei.c
@@ -22,7 +22,7 @@
 #include <linux/slab.h>
 #include <linux/interrupt.h>
 #include <linux/gpio.h>
-#include <linux/mei_bus.h>
+#include <linux/mei_cl_bus.h>
 
 #include <linux/nfc.h>
 #include <net/nfc/hci.h>
@@ -32,9 +32,6 @@
 
 #define MICROREAD_DRIVER_NAME "microread"
 
-#define MICROREAD_UUID UUID_LE(0x0bb17a78, 0x2a8e, 0x4c50, 0x94, \
-			       0xd4, 0x50, 0x26, 0x67, 0x23, 0x77, 0x5c)
-
 struct mei_nfc_hdr {
 	u8 cmd;
 	u8 status;
@@ -48,7 +45,7 @@ struct mei_nfc_hdr {
 #define MEI_NFC_MAX_READ (MEI_NFC_HEADER_SIZE + MEI_NFC_MAX_HCI_PAYLOAD)
 
 struct microread_mei_phy {
-	struct mei_device *mei_device;
+	struct mei_cl_device *device;
 	struct nfc_hci_dev *hdev;
 
 	int powered;
@@ -105,14 +102,14 @@ static int microread_mei_write(void *phy_id, struct sk_buff *skb)
 
 	MEI_DUMP_SKB_OUT("mei frame sent", skb);
 
-	r = mei_send(phy->device, skb->data, skb->len);
+	r = mei_cl_send(phy->device, skb->data, skb->len);
 	if (r > 0)
 		r = 0;
 
 	return r;
 }
 
-static void microread_event_cb(struct mei_device *device, u32 events,
+static void microread_event_cb(struct mei_cl_device *device, u32 events,
 			       void *context)
 {
 	struct microread_mei_phy *phy = context;
@@ -120,7 +117,7 @@ static void microread_event_cb(struct mei_device *device, u32 events,
 	if (phy->hard_fault != 0)
 		return;
 
-	if (events & BIT(MEI_EVENT_RX)) {
+	if (events & BIT(MEI_CL_EVENT_RX)) {
 		struct sk_buff *skb;
 		int reply_size;
 
@@ -128,7 +125,7 @@ static void microread_event_cb(struct mei_device *device, u32 events,
 		if (!skb)
 			return;
 
-		reply_size = mei_recv(device, skb->data, MEI_NFC_MAX_READ);
+		reply_size = mei_cl_recv(device, skb->data, MEI_NFC_MAX_READ);
 		if (reply_size < MEI_NFC_HEADER_SIZE) {
 			kfree(skb);
 			return;
@@ -149,8 +146,8 @@ static struct nfc_phy_ops mei_phy_ops = {
 	.disable = microread_mei_disable,
 };
 
-static int microread_mei_probe(struct mei_device *device,
-			       const struct mei_id *id)
+static int microread_mei_probe(struct mei_cl_device *device,
+			       const struct mei_cl_device_id *id)
 {
 	struct microread_mei_phy *phy;
 	int r;
@@ -164,9 +161,9 @@ static int microread_mei_probe(struct mei_device *device,
 	}
 
 	phy->device = device;
-	mei_set_clientdata(device, phy);
+	mei_cl_set_drvdata(device, phy);
 
-	r = mei_register_event_cb(device, microread_event_cb, phy);
+	r = mei_cl_register_event_cb(device, microread_event_cb, phy);
 	if (r) {
 		pr_err(MICROREAD_DRIVER_NAME ": event cb registration failed\n");
 		goto err_out;
@@ -186,9 +183,9 @@ err_out:
 	return r;
 }
 
-static int microread_mei_remove(struct mei_device *device)
+static int microread_mei_remove(struct mei_cl_device *device)
 {
-	struct microread_mei_phy *phy = mei_get_clientdata(device);
+	struct microread_mei_phy *phy = mei_cl_get_drvdata(device);
 
 	pr_info("Removing microread\n");
 
@@ -202,16 +199,15 @@ static int microread_mei_remove(struct mei_device *device)
 	return 0;
 }
 
-static struct mei_id microread_mei_tbl[] = {
-	{ MICROREAD_DRIVER_NAME, MICROREAD_UUID },
+static struct mei_cl_device_id microread_mei_tbl[] = {
+	{ MICROREAD_DRIVER_NAME },
 
 	/* required last entry */
 	{ }
 };
-
 MODULE_DEVICE_TABLE(mei, microread_mei_tbl);
 
-static struct mei_driver microread_driver = {
+static struct mei_cl_driver microread_driver = {
 	.id_table = microread_mei_tbl,
 	.name = MICROREAD_DRIVER_NAME,
 
@@ -225,7 +221,7 @@ static int microread_mei_init(void)
 
 	pr_debug(DRIVER_DESC ": %s\n", __func__);
 
-	r = mei_driver_register(&microread_driver);
+	r = mei_cl_driver_register(&microread_driver);
 	if (r) {
 		pr_err(MICROREAD_DRIVER_NAME ": driver registration failed\n");
 		return r;
@@ -236,7 +232,7 @@ static int microread_mei_init(void)
 
 static void microread_mei_exit(void)
 {
-	mei_driver_unregister(&microread_driver);
+	mei_cl_driver_unregister(&microread_driver);
 }
 
 module_init(microread_mei_init);
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index fb30681..a689360 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -2582,7 +2582,7 @@ static int ieee80211_cancel_roc(struct ieee80211_local *local,
 			list_del(&dep->list);
 			mutex_unlock(&local->mtx);
 
-			ieee80211_roc_notify_destroy(dep);
+			ieee80211_roc_notify_destroy(dep, true);
 			return 0;
 		}
 
@@ -2622,7 +2622,7 @@ static int ieee80211_cancel_roc(struct ieee80211_local *local,
 			ieee80211_start_next_roc(local);
 		mutex_unlock(&local->mtx);
 
-		ieee80211_roc_notify_destroy(found);
+		ieee80211_roc_notify_destroy(found, true);
 	} else {
 		/* work may be pending so use it all the time */
 		found->abort = true;
@@ -2632,6 +2632,8 @@ static int ieee80211_cancel_roc(struct ieee80211_local *local,
 
 		/* work will clean up etc */
 		flush_delayed_work(&found->work);
+		WARN_ON(!found->to_be_freed);
+		kfree(found);
 	}
 
 	return 0;
diff --git a/net/mac80211/chan.c b/net/mac80211/chan.c
index 78c0d90..931be41 100644
--- a/net/mac80211/chan.c
+++ b/net/mac80211/chan.c
@@ -63,6 +63,7 @@ ieee80211_new_chanctx(struct ieee80211_local *local,
 		      enum ieee80211_chanctx_mode mode)
 {
 	struct ieee80211_chanctx *ctx;
+	u32 changed;
 	int err;
 
 	lockdep_assert_held(&local->chanctx_mtx);
@@ -76,6 +77,13 @@ ieee80211_new_chanctx(struct ieee80211_local *local,
 	ctx->conf.rx_chains_dynamic = 1;
 	ctx->mode = mode;
 
+	/* acquire mutex to prevent idle from changing */
+	mutex_lock(&local->mtx);
+	/* turn idle off *before* setting channel -- some drivers need that */
+	changed = ieee80211_idle_off(local);
+	if (changed)
+		ieee80211_hw_config(local, changed);
+
 	if (!local->use_chanctx) {
 		local->_oper_channel_type =
 			cfg80211_get_chandef_type(chandef);
@@ -85,14 +93,17 @@ ieee80211_new_chanctx(struct ieee80211_local *local,
 		err = drv_add_chanctx(local, ctx);
 		if (err) {
 			kfree(ctx);
-			return ERR_PTR(err);
+			ctx = ERR_PTR(err);
+
+			ieee80211_recalc_idle(local);
+			goto out;
 		}
 	}
 
+	/* and keep the mutex held until the new chanctx is on the list */
 	list_add_rcu(&ctx->list, &local->chanctx_list);
 
-	mutex_lock(&local->mtx);
-	ieee80211_recalc_idle(local);
+ out:
 	mutex_unlock(&local->mtx);
 
 	return ctx;
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 388580a..5672533 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -309,6 +309,7 @@ struct ieee80211_roc_work {
 	struct ieee80211_channel *chan;
 
 	bool started, abort, hw_begun, notified;
+	bool to_be_freed;
 
 	unsigned long hw_start_time;
 
@@ -1347,7 +1348,7 @@ void ieee80211_offchannel_return(struct ieee80211_local *local);
 void ieee80211_roc_setup(struct ieee80211_local *local);
 void ieee80211_start_next_roc(struct ieee80211_local *local);
 void ieee80211_roc_purge(struct ieee80211_sub_if_data *sdata);
-void ieee80211_roc_notify_destroy(struct ieee80211_roc_work *roc);
+void ieee80211_roc_notify_destroy(struct ieee80211_roc_work *roc, bool free);
 void ieee80211_sw_roc_work(struct work_struct *work);
 void ieee80211_handle_roc_started(struct ieee80211_roc_work *roc);
 
@@ -1361,6 +1362,7 @@ int ieee80211_if_change_type(struct ieee80211_sub_if_data *sdata,
 			     enum nl80211_iftype type);
 void ieee80211_if_remove(struct ieee80211_sub_if_data *sdata);
 void ieee80211_remove_interfaces(struct ieee80211_local *local);
+u32 ieee80211_idle_off(struct ieee80211_local *local);
 void ieee80211_recalc_idle(struct ieee80211_local *local);
 void ieee80211_adjust_monitor_flags(struct ieee80211_sub_if_data *sdata,
 				    const int offset);
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 3bfe261..58150f8 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -78,7 +78,7 @@ void ieee80211_recalc_txpower(struct ieee80211_sub_if_data *sdata)
 		ieee80211_bss_info_change_notify(sdata, BSS_CHANGED_TXPOWER);
 }
 
-static u32 ieee80211_idle_off(struct ieee80211_local *local)
+u32 ieee80211_idle_off(struct ieee80211_local *local)
 {
 	if (!(local->hw.conf.flags & IEEE80211_CONF_IDLE))
 		return 0;
diff --git a/net/mac80211/offchannel.c b/net/mac80211/offchannel.c
index cc79b4a..430bd25 100644
--- a/net/mac80211/offchannel.c
+++ b/net/mac80211/offchannel.c
@@ -297,10 +297,13 @@ void ieee80211_start_next_roc(struct ieee80211_local *local)
 	}
 }
 
-void ieee80211_roc_notify_destroy(struct ieee80211_roc_work *roc)
+void ieee80211_roc_notify_destroy(struct ieee80211_roc_work *roc, bool free)
 {
 	struct ieee80211_roc_work *dep, *tmp;
 
+	if (WARN_ON(roc->to_be_freed))
+		return;
+
 	/* was never transmitted */
 	if (roc->frame) {
 		cfg80211_mgmt_tx_status(&roc->sdata->wdev,
@@ -316,9 +319,12 @@ void ieee80211_roc_notify_destroy(struct ieee80211_roc_work *roc)
 						   GFP_KERNEL);
 
 	list_for_each_entry_safe(dep, tmp, &roc->dependents, list)
-		ieee80211_roc_notify_destroy(dep);
+		ieee80211_roc_notify_destroy(dep, true);
 
-	kfree(roc);
+	if (free)
+		kfree(roc);
+	else
+		roc->to_be_freed = true;
 }
 
 void ieee80211_sw_roc_work(struct work_struct *work)
@@ -331,6 +337,9 @@ void ieee80211_sw_roc_work(struct work_struct *work)
 
 	mutex_lock(&local->mtx);
 
+	if (roc->to_be_freed)
+		goto out_unlock;
+
 	if (roc->abort)
 		goto finish;
 
@@ -370,7 +379,7 @@ void ieee80211_sw_roc_work(struct work_struct *work)
  finish:
 		list_del(&roc->list);
 		started = roc->started;
-		ieee80211_roc_notify_destroy(roc);
+		ieee80211_roc_notify_destroy(roc, !roc->abort);
 
 		if (started) {
 			drv_flush(local, false);
@@ -410,7 +419,7 @@ static void ieee80211_hw_roc_done(struct work_struct *work)
 
 	list_del(&roc->list);
 
-	ieee80211_roc_notify_destroy(roc);
+	ieee80211_roc_notify_destroy(roc, true);
 
 	/* if there's another roc, start it now */
 	ieee80211_start_next_roc(local);
@@ -460,12 +469,14 @@ void ieee80211_roc_purge(struct ieee80211_sub_if_data *sdata)
 	list_for_each_entry_safe(roc, tmp, &tmp_list, list) {
 		if (local->ops->remain_on_channel) {
 			list_del(&roc->list);
-			ieee80211_roc_notify_destroy(roc);
+			ieee80211_roc_notify_destroy(roc, true);
 		} else {
 			ieee80211_queue_delayed_work(&local->hw, &roc->work, 0);
 
 			/* work will clean up etc */
 			flush_delayed_work(&roc->work);
+			WARN_ON(!roc->to_be_freed);
+			kfree(roc);
 		}
 	}
 
diff --git a/net/nfc/llcp/llcp.c b/net/nfc/llcp/llcp.c
index b530afa..ee25f25 100644
--- a/net/nfc/llcp/llcp.c
+++ b/net/nfc/llcp/llcp.c
@@ -107,8 +107,6 @@ static void nfc_llcp_socket_release(struct nfc_llcp_local *local, bool listen,
 				accept_sk->sk_state_change(sk);
 
 				bh_unlock_sock(accept_sk);
-
-				sock_orphan(accept_sk);
 			}
 
 			if (listen == true) {
@@ -134,8 +132,6 @@ static void nfc_llcp_socket_release(struct nfc_llcp_local *local, bool listen,
 
 		bh_unlock_sock(sk);
 
-		sock_orphan(sk);
-
 		sk_del_node_init(sk);
 	}
 
@@ -164,8 +160,6 @@ static void nfc_llcp_socket_release(struct nfc_llcp_local *local, bool listen,
 
 		bh_unlock_sock(sk);
 
-		sock_orphan(sk);
-
 		sk_del_node_init(sk);
 	}
 
@@ -827,7 +821,6 @@ static void nfc_llcp_recv_ui(struct nfc_llcp_local *local,
 		skb_get(skb);
 	} else {
 		pr_err("Receive queue is full\n");
-		kfree_skb(skb);
 	}
 
 	nfc_llcp_sock_put(llcp_sock);
@@ -1028,7 +1021,6 @@ static void nfc_llcp_recv_hdlc(struct nfc_llcp_local *local,
 			skb_get(skb);
 		} else {
 			pr_err("Receive queue is full\n");
-			kfree_skb(skb);
 		}
 	}
 
diff --git a/net/nfc/llcp/sock.c b/net/nfc/llcp/sock.c
index 5c7cdf3..8f02574 100644
--- a/net/nfc/llcp/sock.c
+++ b/net/nfc/llcp/sock.c
@@ -270,7 +270,9 @@ struct sock *nfc_llcp_accept_dequeue(struct sock *parent,
 		}
 
 		if (sk->sk_state == LLCP_CONNECTED || !newsock) {
-			nfc_llcp_accept_unlink(sk);
+			list_del_init(&lsk->accept_queue);
+			sock_put(sk);
+
 			if (newsock)
 				sock_graft(sk, newsock);
 
@@ -464,8 +466,6 @@ static int llcp_sock_release(struct socket *sock)
 			nfc_llcp_accept_unlink(accept_sk);
 
 			release_sock(accept_sk);
-
-			sock_orphan(accept_sk);
 		}
 	}
 
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related

* Re: TCP: snd_cwnd is nul, please report this bug.
From: David Miller @ 2013-04-03 17:08 UTC (permalink / raw)
  To: ycheng; +Cc: eric.dumazet, davej, netdev, ncardwell, nanditad
In-Reply-To: <CAK6E8=cLLsBYkYPgNwL08-BWsG+eR0Jkkck9S7e1Ts0RzfVHLQ@mail.gmail.com>

From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 3 Apr 2013 10:01:32 -0700

> On Tue, Apr 2, 2013 at 10:12 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Tue, 2013-04-02 at 23:54 -0400, Dave Jones wrote:
>>> Just had this warning on 3.9-rc5
>>>
>>> Not sure what else to report.  Nothing really odd going on,
>>> just some ssh traffic and firefox over wifi (iwlwifi)
>>>
>>> anything else I can provide ?
> How did you get this warning? is this a WARN_ON?

net/ipv4/tcp_cong.c:		pr_err_once("snd_cwnd is nul, please report this bug.\n");

^ permalink raw reply

* Re: TCP: snd_cwnd is nul, please report this bug.
From: Yuchung Cheng @ 2013-04-03 17:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Dave Jones, netdev, Neal Cardwell, Nandita Dukkipati
In-Reply-To: <1364965967.5113.195.camel@edumazet-glaptop>

On Tue, Apr 2, 2013 at 10:12 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2013-04-02 at 23:54 -0400, Dave Jones wrote:
>> Just had this warning on 3.9-rc5
>>
>> Not sure what else to report.  Nothing really odd going on,
>> just some ssh traffic and firefox over wifi (iwlwifi)
>>
>> anything else I can provide ?
How did you get this warning? is this a WARN_ON?
>
> Thanks for the report.
>
> I guess we'll need to resurrect a patch I did to track the (buggy)
> setting to 0.
>
>
>

^ permalink raw reply

* Re: [net-next.git 3/7] stmmac: review private structure fields
From: Ben Hutchings @ 2013-04-03 16:09 UTC (permalink / raw)
  To: Giuseppe CAVALLARO; +Cc: Eric Dumazet, netdev
In-Reply-To: <515BDB30.4080603@st.com>

On Wed, 2013-04-03 at 09:33 +0200, Giuseppe CAVALLARO wrote:
> On 4/3/2013 9:21 AM, Eric Dumazet wrote:
> > On Wed, 2013-04-03 at 07:41 +0200, Giuseppe CAVALLARO wrote:
> >> recently many new supports have been added in the stmmac driver w/o taking care
> >> about where each new field had to be placed inside the private structure for
> >> guaranteeing the best cache usage.
> >> This is what I wanted in the beginning, so this patch reorganizes all the fields
> >> in order to keep adjacent fields for cache effect.
> >> I have also tried to optimize them by using pahole.
> >>
> >> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> >> ---
> >>   drivers/net/ethernet/stmicro/stmmac/stmmac.h |   70 +++++++++++++-------------
> >>   1 files changed, 35 insertions(+), 35 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> >> index 75f997b..8aa28c5 100644
> >> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> >> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> >> @@ -35,36 +35,45 @@
> >>
> >>   struct stmmac_priv {
> >>   	/* Frequently used values are kept adjacent for cache effect */
> >> -	struct dma_desc *dma_tx ____cacheline_aligned;	/* Basic TX desc */
> >> -	struct dma_extended_desc *dma_etx;	/* Extended TX descriptor */
> >> -	dma_addr_t dma_tx_phy;
> >> -	struct sk_buff **tx_skbuff;
> >> -	dma_addr_t *tx_skbuff_dma;
> >> +	struct dma_extended_desc *dma_etx;
> >> +	struct dma_desc *dma_tx ____cacheline_aligned_in_smp;
> >> +	struct sk_buff **tx_skbuff ____cacheline_aligned_in_smp;
> >
> > dma_tx & tx_skbuff are readonly, why put them in separate cache lines ?
> 
> I put tx_skbuff in a separate cache line because, when we use extended
> descriptors, the driver works with dma_etx instead of dma_tx.
> So my idea was to have both dma_etx, dma_tx and tx_skbuff aligned in
> any case.
[...]

It's generally not that important to put fields at the beginning of a
cache line.  (If you've measured with typical systems using stmmac and
found an advantage, then I accept that you have a good reason to do
this.  But that advantage is unlikely to be specific to SMP systems.)

I would use ____cacheline_aligned_in_smp to separate fields that are
likely to be changed on different CPUs, so that the cache line doesn't
bounce between their caches.  Fields that are rarely modified (such as
these pointers), or are used together on the same CPU should be packed
together into a small number of cache lines.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox