Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v1 2/2] TCPCT API sockopt update to draft -03
From: William Allen Simpson @ 2011-01-14  3:00 UTC (permalink / raw)
  To: Arnaud Lacombe
  Cc: Stephen Hemminger, Linux Kernel Developers,
	Linux Kernel Network Developers, David Miller, Andrew Morton
In-Reply-To: <AANLkTinhnaF3-scKnTOC5rF68+5otVW2NG3v1y80L0k6@mail.gmail.com>

On 1/13/11 12:53 PM, Arnaud Lacombe wrote:
> On Thu, Jan 13, 2011 at 12:32 PM, William Allen Simpson
> <william.allen.simpson@gmail.com>  wrote:
>> Even though I'm not paid to work on Linux, I'm doing my best to give you
>> folks a quick heads up and provide code to rectify the very recent changes
>> that can be propagated back through the stable tree (to 2.6.33).
>>
>> As always, what you actually do with my code is up to you....
>>
> FWIW, what is the basis of this hunk ? The RFC text[0] seems to use
> the TCP_COOKIE_* naming, not TCPCT_.
>
> Thanks,
>   - Arnaud
>
> [0]: http://www.rfc-editor.org/authors/rfc6013.txt
>
Is this supposed to be humorous?  Maybe folks here find it amusing that
somebody thinks they know more than the *author* about the contents of the
document?  Did you note the words above?  That is, "very recent changes"?

Perhaps you are viewing an older cached version.  Please check for the
current month on every page: "January 2011".

We discussed -- and ultimately decided -- these changes in private email
during the independent review process before making them available to the
general public.  That's how the RFC publication procedure works.

I tried to be helpful to the Linux community in advance of publication, so
you would be prepared.  I'm sorry that the community here is so lacking in
appreciation for my efforts on your behalf.

As always, what you actually do with my code is up to you....

^ permalink raw reply

* Re: [PATCH v1 2/2] TCPCT API sockopt update to draft -03
From: Eric Dumazet @ 2011-01-14  3:08 UTC (permalink / raw)
  To: William Allen Simpson
  Cc: Arnaud Lacombe, Stephen Hemminger, Linux Kernel Developers,
	Linux Kernel Network Developers, David Miller, Andrew Morton
In-Reply-To: <4D2FBC34.6050901@gmail.com>

Le jeudi 13 janvier 2011 à 22:00 -0500, William Allen Simpson a écrit :

> Is this supposed to be humorous?  Maybe folks here find it amusing that
> somebody thinks they know more than the *author* about the contents of the
> document?  Did you note the words above?  That is, "very recent changes"?
> 
> Perhaps you are viewing an older cached version.  Please check for the
> current month on every page: "January 2011".
> 
> We discussed -- and ultimately decided -- these changes in private email
> during the independent review process before making them available to the
> general public.  That's how the RFC publication procedure works.
> 
> I tried to be helpful to the Linux community in advance of publication, so
> you would be prepared.  I'm sorry that the community here is so lacking in
> appreciation for my efforts on your behalf.
> 
> As always, what you actually do with my code is up to you....
> --


Next time you come here, provide an up2date link for us mere mortals, so
that we can check your code against your claims. We dont trust you
anymore, we had to fix several bugs.

This is getting ridiculous.

As I said, we are going to wait for official RFC, because its time
consuming to review your patches, and nobody asked for early TCPCT
coding in linux kernel (you already said your buddies dont care at all)

^ permalink raw reply

* Re: [PATCH] CHOKe flow scheduler (0.8)
From: Eric Dumazet @ 2011-01-14  3:29 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20110113153436.70d3c0a3@s6510>

Le jeudi 13 janvier 2011 à 15:34 -0800, Stephen Hemminger a écrit :
> CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
> packet scheduler based on the Random Exponential Drop (RED) algorithm.
> 
> The core idea is:
>   For every packet arrival:
>   	Calculate Qave
> 	if (Qave < minth) 
> 	     Queue the new packet
> 	else 
> 	     Select randomly a packet from the queue 
> 	     if (both packets from same flow)
> 	     then Drop both the packets
> 	     else if (Qave > maxth)
> 	          Drop packet
> 	     else
> 	       	  Admit packet with probability P (same as RED)
> 
> See also:
>   Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
>    queue management scheme for approximating fair bandwidth allocation", 
>   Proceeding of INFOCOM'2000, March 2000.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
> 0.8 change queue length and holes account.
>     keep sch->q.qlen updated, and holes counter not needed.
> 
> ---
>  net/sched/Kconfig     |   11 +
>  net/sched/Makefile    |    1 
>  net/sched/sch_choke.c |  536 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 548 insertions(+)
> 

Hi Stephen

Your diffstat was an old one, here the right one.

 include/linux/pkt_sched.h |   29 +
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_choke.c     |  552 ++++++++++++++++++++++++++++++++++++


I tested v8 and found several serious problems, please find a diff of my
latest changes :

- wrong oskb/skb used in choke_enqueue()
- choke_zap_head_holes() is called from choke_dequeue() and crash if we
dequeued last packet. (!!!)
- out of bound access in choke_zap_tail_holes()
- choke_dequeue() can be shorter
- choke_change() must dequeue/drop in excess packets or risk new array
overfill (if we reduce queue limit by tc qdisc change ...)
- inline is not needed, space errors in include file

Thanks !

 include/linux/pkt_sched.h |    8 +++---
 net/sched/sch_choke.c     |   43 ++++++++++++++++++++++--------------
 2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 83bac92..498c798 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -269,10 +269,10 @@ struct tc_choke_qopt {
 };
 
 struct tc_choke_xstats {
-	__u32           early;          /* Early drops */
-	__u32           pdrop;          /* Drops due to queue limits */
-	__u32           other;          /* Drops due to drop() calls */
-	__u32           marked;         /* Marked packets */
+	__u32		early;          /* Early drops */
+	__u32		pdrop;          /* Drops due to queue limits */
+	__u32		other;          /* Drops due to drop() calls */
+	__u32		marked;         /* Marked packets */
 	__u32		matched;	/* Drops due to flow match */
 };
 
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 136d4e5..c710c57 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -74,7 +74,7 @@ struct choke_sched_data {
 };
 
 /* deliver a random number between 0 and N - 1 */
-static inline u32 random_N(unsigned int N)
+static u32 random_N(unsigned int N)
 {
 	return reciprocal_divide(random32(), N);
 }
@@ -99,13 +99,13 @@ static struct sk_buff *choke_peek_random(struct Qdisc *sch,
 }
 
 /* Is ECN parameter configured */
-static inline int use_ecn(const struct choke_sched_data *q)
+static int use_ecn(const struct choke_sched_data *q)
 {
 	return q->flags & TC_RED_ECN;
 }
 
 /* Should packets over max just be dropped (versus marked) */
-static inline int use_harddrop(const struct choke_sched_data *q)
+static int use_harddrop(const struct choke_sched_data *q)
 {
 	return q->flags & TC_RED_HARDDROP;
 }
@@ -113,20 +113,21 @@ static inline int use_harddrop(const struct choke_sched_data *q)
 /* Move head pointer forward to skip over holes */
 static void choke_zap_head_holes(struct choke_sched_data *q)
 {
-	while (q->tab[q->head] == NULL) {
+	do {
 		q->head = (q->head + 1) & q->tab_mask;
-
-		BUG_ON(q->head == q->tail);
-	}
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->head] == NULL);
 }
 
 /* Move tail pointer backwards to reuse holes */
 static void choke_zap_tail_holes(struct choke_sched_data *q)
 {
-	while (q->tab[q->tail - 1] == NULL) {
+	do {
 		q->tail = (q->tail - 1) & q->tab_mask;
-		BUG_ON(q->head == q->tail);
-	}
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->tail] == NULL);
 }
 
 /* Drop packet from queue array by creating a "hole" */
@@ -145,7 +146,7 @@ static void choke_drop_by_idx(struct choke_sched_data *q, unsigned int idx)
    2. fast internal classification
    3. use TC filter based classification
 */
-static inline unsigned int choke_classify(struct sk_buff *skb,
+static unsigned int choke_classify(struct sk_buff *skb,
 					  struct Qdisc *sch, int *qerr)
 
 {
@@ -218,7 +219,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			/* Drop both packets */
 			q->stats.matched++;
 			choke_drop_by_idx(q, idx);
-			sch->qstats.backlog -= qdisc_pkt_len(skb);
+			sch->qstats.backlog -= qdisc_pkt_len(oskb);
 			--sch->q.qlen;
 			qdisc_drop(oskb, sch);
 			goto congestion_drop;
@@ -285,8 +286,7 @@ static struct sk_buff *choke_dequeue(struct Qdisc *sch)
 	}
 
 	skb = q->tab[q->head];
-	q->tab[q->head] = NULL; /* not really needed */
-	q->head = (q->head + 1) & q->tab_mask;
+	q->tab[q->head] = NULL;
 	choke_zap_head_holes(q);
 	--sch->q.qlen;
 	sch->qstats.backlog -= qdisc_pkt_len(skb);
@@ -371,12 +371,23 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 		sch_tree_lock(sch);
 		old = q->tab;
 		if (old) {
-			unsigned int tail = 0;
+			unsigned int oqlen = sch->q.qlen, tail = 0;
 
 			while (q->head != q->tail) {
-				ntab[tail++] = q->tab[q->head];
+				struct sk_buff *skb = q->tab[q->head];
+
 				q->head = (q->head + 1) & q->tab_mask;
+				if (!skb)
+					continue;
+				if (tail < mask) {
+					ntab[tail++] = skb;
+					continue;
+				}
+				sch->qstats.backlog -= qdisc_pkt_len(skb);
+				--sch->q.qlen;
+				qdisc_drop(skb, sch);
 			}
+			qdisc_tree_decrease_qlen(sch, oqlen - sch->q.qlen);
 			q->head = 0;
 			q->tail = tail;
 		}



^ permalink raw reply related

* Re: [PATCH] CHOKe flow scheduler (0.8)
From: Eric Dumazet @ 2011-01-14  3:34 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <1294975793.3403.117.camel@edumazet-laptop>

Le vendredi 14 janvier 2011 à 04:29 +0100, Eric Dumazet a écrit :
> Le jeudi 13 janvier 2011 à 15:34 -0800, Stephen Hemminger a écrit :
> > CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
> > packet scheduler based on the Random Exponential Drop (RED) algorithm.
> > 
> > The core idea is:
> >   For every packet arrival:
> >   	Calculate Qave
> > 	if (Qave < minth) 
> > 	     Queue the new packet
> > 	else 
> > 	     Select randomly a packet from the queue 
> > 	     if (both packets from same flow)
> > 	     then Drop both the packets
> > 	     else if (Qave > maxth)
> > 	          Drop packet
> > 	     else
> > 	       	  Admit packet with probability P (same as RED)
> > 
> > See also:
> >   Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
> >    queue management scheme for approximating fair bandwidth allocation", 
> >   Proceeding of INFOCOM'2000, March 2000.
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > 
> > ---
> > 0.8 change queue length and holes account.
> >     keep sch->q.qlen updated, and holes counter not needed.
> > 
> > ---
> >  net/sched/Kconfig     |   11 +
> >  net/sched/Makefile    |    1 
> >  net/sched/sch_choke.c |  536 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 548 insertions(+)
> > 
> 
> Hi Stephen
> 
> Your diffstat was an old one, here the right one.
> 
>  include/linux/pkt_sched.h |   29 +
>  net/sched/Kconfig         |   11 
>  net/sched/Makefile        |    1 
>  net/sched/sch_choke.c     |  552 ++++++++++++++++++++++++++++++++++++
> 
> 
> I tested v8 and found several serious problems, please find a diff of my
> latest changes :
> 
> - wrong oskb/skb used in choke_enqueue()
> - choke_zap_head_holes() is called from choke_dequeue() and crash if we
> dequeued last packet. (!!!)
> - out of bound access in choke_zap_tail_holes()
> - choke_dequeue() can be shorter
> - choke_change() must dequeue/drop in excess packets or risk new array
> overfill (if we reduce queue limit by tc qdisc change ...)
> - inline is not needed, space errors in include file
> 
> Thanks !

Hmm, please wait a bit, I had another crash when I stopped my
bench/stress




^ permalink raw reply

* Re: [PATCH] CHOKe flow scheduler (0.8)
From: Eric Dumazet @ 2011-01-14  3:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <1294976067.3403.118.camel@edumazet-laptop>

Le vendredi 14 janvier 2011 à 04:34 +0100, Eric Dumazet a écrit :

> Hmm, please wait a bit, I had another crash when I stopped my
> bench/stress

I am not sure p->qavg is correctly computed.

Crash happened because choke_peek_random() was called while no packet
was in queue.

With my params (min=10833 max=32500 burst=18055 limit=130000) this
implies qavg was very big while qlen==0 !

qdisc choke 11: dev ifb0 parent 1:11 limit 130000b min 10833b max 32500b ewma 13 Plog 21 Scell_log 30
 Sent 200857857 bytes 365183 pkt (dropped 1010937, overlimits 557577 requeues 0) 
 rate 32253Kbit 7330pps backlog 17875996b 32505p requeues 0 
  marked 0 early 557577 pdrop 0 other 0 matched 226680


Here is latest diff :

 include/linux/pkt_sched.h |    8 +++----
 net/sched/sch_choke.c     |   50 +++++++++++++++++++++++++++++-----------------
 2 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 83bac92..498c798 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -269,10 +269,10 @@ struct tc_choke_qopt {
 };
 
 struct tc_choke_xstats {
-	__u32           early;          /* Early drops */
-	__u32           pdrop;          /* Drops due to queue limits */
-	__u32           other;          /* Drops due to drop() calls */
-	__u32           marked;         /* Marked packets */
+	__u32		early;          /* Early drops */
+	__u32		pdrop;          /* Drops due to queue limits */
+	__u32		other;          /* Drops due to drop() calls */
+	__u32		marked;         /* Marked packets */
 	__u32		matched;	/* Drops due to flow match */
 };
 
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 136d4e5..2f94dad 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -74,7 +74,7 @@ struct choke_sched_data {
 };
 
 /* deliver a random number between 0 and N - 1 */
-static inline u32 random_N(unsigned int N)
+static u32 random_N(unsigned int N)
 {
 	return reciprocal_divide(random32(), N);
 }
@@ -94,18 +94,20 @@ static struct sk_buff *choke_peek_random(struct Qdisc *sch,
 			return skb;
 	} while (--retrys > 0);
 
-	/* queue is has lots of holes use the head which is known to exist */
+	/* queue is has lots of holes use the head which is known to exist
+	 * Note : result can still be NULL if q->head == q->tail
+	 */
 	return q->tab[*pidx = q->head];
 }
 
 /* Is ECN parameter configured */
-static inline int use_ecn(const struct choke_sched_data *q)
+static int use_ecn(const struct choke_sched_data *q)
 {
 	return q->flags & TC_RED_ECN;
 }
 
 /* Should packets over max just be dropped (versus marked) */
-static inline int use_harddrop(const struct choke_sched_data *q)
+static int use_harddrop(const struct choke_sched_data *q)
 {
 	return q->flags & TC_RED_HARDDROP;
 }
@@ -113,20 +115,21 @@ static inline int use_harddrop(const struct choke_sched_data *q)
 /* Move head pointer forward to skip over holes */
 static void choke_zap_head_holes(struct choke_sched_data *q)
 {
-	while (q->tab[q->head] == NULL) {
+	do {
 		q->head = (q->head + 1) & q->tab_mask;
-
-		BUG_ON(q->head == q->tail);
-	}
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->head] == NULL);
 }
 
 /* Move tail pointer backwards to reuse holes */
 static void choke_zap_tail_holes(struct choke_sched_data *q)
 {
-	while (q->tab[q->tail - 1] == NULL) {
+	do {
 		q->tail = (q->tail - 1) & q->tab_mask;
-		BUG_ON(q->head == q->tail);
-	}
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->tail] == NULL);
 }
 
 /* Drop packet from queue array by creating a "hole" */
@@ -145,7 +148,7 @@ static void choke_drop_by_idx(struct choke_sched_data *q, unsigned int idx)
    2. fast internal classification
    3. use TC filter based classification
 */
-static inline unsigned int choke_classify(struct sk_buff *skb,
+static unsigned int choke_classify(struct sk_buff *skb,
 					  struct Qdisc *sch, int *qerr)
 
 {
@@ -214,11 +217,12 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		oskb = choke_peek_random(sch, &idx);
 
 		/* Both packets from same flow ? */
-		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
+		if (oskb &&
+		    *(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
 			/* Drop both packets */
 			q->stats.matched++;
 			choke_drop_by_idx(q, idx);
-			sch->qstats.backlog -= qdisc_pkt_len(skb);
+			sch->qstats.backlog -= qdisc_pkt_len(oskb);
 			--sch->q.qlen;
 			qdisc_drop(oskb, sch);
 			goto congestion_drop;
@@ -285,8 +289,7 @@ static struct sk_buff *choke_dequeue(struct Qdisc *sch)
 	}
 
 	skb = q->tab[q->head];
-	q->tab[q->head] = NULL; /* not really needed */
-	q->head = (q->head + 1) & q->tab_mask;
+	q->tab[q->head] = NULL;
 	choke_zap_head_holes(q);
 	--sch->q.qlen;
 	sch->qstats.backlog -= qdisc_pkt_len(skb);
@@ -371,12 +374,23 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 		sch_tree_lock(sch);
 		old = q->tab;
 		if (old) {
-			unsigned int tail = 0;
+			unsigned int oqlen = sch->q.qlen, tail = 0;
 
 			while (q->head != q->tail) {
-				ntab[tail++] = q->tab[q->head];
+				struct sk_buff *skb = q->tab[q->head];
+
 				q->head = (q->head + 1) & q->tab_mask;
+				if (!skb)
+					continue;
+				if (tail < mask) {
+					ntab[tail++] = skb;
+					continue;
+				}
+				sch->qstats.backlog -= qdisc_pkt_len(skb);
+				--sch->q.qlen;
+				qdisc_drop(skb, sch);
 			}
+			qdisc_tree_decrease_qlen(sch, oqlen - sch->q.qlen);
 			q->head = 0;
 			q->tail = tail;
 		}



^ permalink raw reply related

* Re: Flow Control and Port Mirroring Revisited
From: Michael S. Tsirkin @ 2011-01-14  4:58 UTC (permalink / raw)
  To: Simon Horman
  Cc: Jesse Gross, Eric Dumazet, Rusty Russell, virtualization, dev,
	virtualization, netdev, kvm
In-Reply-To: <20110113234135.GC8426@verge.net.au>

On Fri, Jan 14, 2011 at 08:41:36AM +0900, Simon Horman wrote:
> On Thu, Jan 13, 2011 at 10:45:38AM -0500, Jesse Gross wrote:
> > On Thu, Jan 13, 2011 at 1:47 AM, Simon Horman <horms@verge.net.au> wrote:
> > > On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
> > >> On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
> > >> > On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
> > >> >
> > >> > [ snip ]
> > >> > >
> > >> > > I know that everyone likes a nice netperf result but I agree with
> > >> > > Michael that this probably isn't the right question to be asking.  I
> > >> > > don't think that socket buffers are a real solution to the flow
> > >> > > control problem: they happen to provide that functionality but it's
> > >> > > more of a side effect than anything.  It's just that the amount of
> > >> > > memory consumed by packets in the queue(s) doesn't really have any
> > >> > > implicit meaning for flow control (think multiple physical adapters,
> > >> > > all with the same speed instead of a virtual device and a physical
> > >> > > device with wildly different speeds).  The analog in the physical
> > >> > > world that you're looking for would be Ethernet flow control.
> > >> > > Obviously, if the question is limiting CPU or memory consumption then
> > >> > > that's a different story.
> > >> >
> > >> > Point taken. I will see if I can control CPU (and thus memory) consumption
> > >> > using cgroups and/or tc.
> > >>
> > >> I have found that I can successfully control the throughput using
> > >> the following techniques
> > >>
> > >> 1) Place a tc egress filter on dummy0
> > >>
> > >> 2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
> > >>    this is effectively the same as one of my hacks to the datapath
> > >>    that I mentioned in an earlier mail. The result is that eth1
> > >>    "paces" the connection.

This is actually a bug. This means that one slow connection will
affect fast ones. I intend to change the default for qemu to sndbuf=0 :
this will fix it but break your "pacing". So pls do not count on this behaviour.

> > > Further to this, I wonder if there is any interest in providing
> > > a method to switch the action order - using ovs-ofctl is a hack imho -
> > > and/or switching the default action order for mirroring.
> > 
> > I'm not sure that there is a way to do this that is correct in the
> > generic case.  It's possible that the destination could be a VM while
> > packets are being mirrored to a physical device or we could be
> > multicasting or some other arbitrarily complex scenario.  Just think
> > of what a physical switch would do if it has ports with two different
> > speeds.
> 
> Yes, I have considered that case. And I agree that perhaps there
> is no sensible default. But perhaps we could make it configurable somehow?

The fix is at the application level. Run netperf with -b and -w flags to
limit the speed to a sensible value.

-- 
MST

^ permalink raw reply

* Re: [PATCH] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Po-Yu Chuang @ 2011-01-14  5:37 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, linux-kernel, Po-Yu Chuang
In-Reply-To: <1294927387.3044.76.camel@localhost>

Dear Ben,

On Thu, Jan 13, 2011 at 10:03 PM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Thu, 2011-01-13 at 19:49 +0800, Po-Yu Chuang wrote:
>> From: Po-Yu Chuang <ratbert@faraday-tech.com>
>>
>> FTMAC100 Ethernet Media Access Controller supports 10/100 Mbps and
>> MII.  This driver has been working on some ARM/NDS32 SoC including
>> Faraday A320 and Andes AG101.
> [...]
>> +#define USE_NAPI
>
> All new network drivers should use NAPI only, so please remove the
> definition of USE_NAPI and change the conditional code to use NAPI
> always.

OK, fixed.

> [...]
>> +     struct net_device_stats stats;
>
> There is a net_device_stats structure in the net_device structure; you
> should normally use that instead of adding another one.

OK, fixed.

> [...]
>> +static int ftmac100_reset(struct ftmac100 *priv)
>> +{
>> +     struct device *dev = &priv->netdev->dev;
>> +     unsigned long flags;
>> +     int i;
>> +
>> +     /* NOTE: reset clears all registers */
>> +
>> +     spin_lock_irqsave(&priv->hw_lock, flags);
>> +     iowrite32(FTMAC100_MACCR_SW_RST, priv->base + FTMAC100_OFFSET_MACCR);
>> +     spin_unlock_irqrestore(&priv->hw_lock, flags);
>> +
>> +     for (i = 0; i < 5; i++) {
>> +             int maccr;
>> +
>> +             spin_lock_irqsave(&priv->hw_lock, flags);
>> +             maccr = ioread32(priv->base + FTMAC100_OFFSET_MACCR);
>> +             spin_unlock_irqrestore(&priv->hw_lock, flags);
>> +             if (!(maccr & FTMAC100_MACCR_SW_RST)) {
>> +                     /*
>> +                      * FTMAC100_MACCR_SW_RST cleared does not indicate
>> +                      * that hardware reset completed (what the f*ck).
>> +                      * We still need to wait for a while.
>> +                      */
>> +                     usleep_range(500, 1000);
>
> Sleeping here means this must be running in process context.  But you
> used spin_lock_irqsave() above which implies you're not sure what the
> context is.  One of these must be wrong.
>
> I wonder whether hw_lock is even needed; you seem to acquire and release
> it around each PIO (read or write).  This should be unnecessary since
> each PIO is atomic.

OK, fixed.

> I think you can also get rid of rx_lock; it's only used in the RX data
> path which is already serialised by NAPI.

OK, fixed.

> [...]
>> +     netdev->last_rx = jiffies;
>
> Don't set last_rx; the networking core takes care of it now.

OK, fixed.

>> +     priv->stats.rx_packets++;
>> +     priv->stats.rx_bytes += skb->len;
>
> This should be done earlier, so that these stats include packets that
> are dropped for any reason.

OK, fixed.

> [...]
>> +     netdev->trans_start = jiffies;
>
> Don't set trans_start; the networking core takes care of it now.

OK, fixed.

> [...]
>> +     priv->descs = dma_alloc_coherent(priv->dev,
>> +             sizeof(struct ftmac100_descs), &priv->descs_dma_addr,
>> +             GFP_KERNEL | GFP_DMA);
>
> On x86, GFP_DMA means the memory must be within the ISA DMA range
> (< 16 MB).  I don't know quite what it means on ARM but it may not be
> what you want.

On ARM, all 4G address space can be used by DMA (at least for all the
hardwares I have ever used). All memory pages are in DMA zone AFAIK.
I put GFP_DMA here just to be safe if there were any constraints.

By checking drivers in drivers/net/arm/, ep93xx_eth.c also uses this flag,
so I guess this is acceptable?

> [...]
>> +     if (status & (FTMAC100_INT_XPKT_OK | FTMAC100_INT_XPKT_LOST)) {
>> +             /*
>> +              * FTMAC100_INT_XPKT_OK:
>> +              *       packet transmitted to ethernet successfully
>> +              *
>> +              * FTMAC100_INT_XPKT_LOST:
>> +              *      packet transmitted to ethernet lost due to late
>> +              *      collision or excessive collision
>> +              */
>> +             ftmac100_tx_complete(priv);
>> +     }
>
> TX completions should also be handled through NAPI if possible.

OK, I'll study how to do that.

> [...]
>> +     priv->rx_pointer = 0;
>> +     priv->tx_clean_pointer = 0;
>> +     priv->tx_pointer = 0;
>> +     spin_lock_init(&priv->hw_lock);
>> +     spin_lock_init(&priv->rx_lock);
>> +     spin_lock_init(&priv->tx_lock);
>
> These locks should be initialised in your probe function.

OK, fixed.

> [...]
>> +     unregister_netdev(netdev);
>
> There should be a netif_napi_del() before this.

OK, fixed.

> A general comment: please use netdev_err(), netdev_info() etc. for
> logging.  This ensures that both the platform device address and the
> network device name appears in the log messages.

OK, fixed.

Thanks a lot for your detailed review. I'll submit a new version ASAP.

Thanks,
Po-Yu Chuang

^ permalink raw reply

* Re: [PATCH] net: remove dev_txq_stats_fold()
From: David Miller @ 2011-01-14  5:45 UTC (permalink / raw)
  To: eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w
  Cc: jarkao2-Re5JQEeQqe8AvxtiuMwx3w, david-b-yBeKhBN/0LDR7s880joybQ,
	neiljay-Re5JQEeQqe8AvxtiuMwx3w, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	alexander.h.duyck-ral2JQCrhuEAvxtiuMwx3w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	mina86-deATy8a+UHjQT0dZR+AlfA,
	sandeep.kumar-KZfg59tc24xl57MIdRCFDg
In-Reply-To: <1294870394.3335.75.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Wed, 12 Jan 2011 23:13:14 +0100

> [PATCH v2] net: remove dev_txq_stats_fold()
> 
> After recent changes, (percpu stats on vlan/tunnels...), we dont need
> anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.
> 
> Only remaining users are ixgbe, sch_teql, gianfar & macvlan :
> 
> 1) ixgbe can be converted to use existing tx_ring counters.
> 
> 2) macvlan incremented txq->tx_dropped, it can use the
> dev->stats.tx_dropped counter.
> 
> 3) sch_teql : almost revert ab35cd4b8f42 (Use net_device internal stats)
>     Now we have ndo_get_stats64(), use it, even for "unsigned long"
> fields (No need to bring back a struct net_device_stats)
> 
> 4) gianfar adds a stats structure per tx queue to hold
> tx_bytes/tx_packets
> 
> This removes a lockdep warning (and possible lockup) in rndis gadget,
> calling dev_get_stats() from hard IRQ context.
> 
> Ref: http://www.spinics.net/lists/netdev/msg149202.html
> 
> Reported-by: Neil Jones <neiljay-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> CC: Jarek Poplawski <jarkao2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> CC: Alexander Duyck <alexander.h.duyck-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> CC: Jeff Kirsher <jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> CC: Sandeep Gopalpet <sandeep.kumar-KZfg59tc24xl57MIdRCFDg@public.gmane.org>
> CC: Michal Nazarewicz <mina86-deATy8a+UHjQT0dZR+AlfA@public.gmane.org>

Applied, thanks everyone.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v4 05/10] net/fec: add dual fec support for mx28
From: Shawn Guo @ 2011-01-14  5:48 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: davem, gerg, baruch, eric, bryan.wu, r64343, B32542, lw, w.sang,
	s.hauer, jamie, jamie, netdev, linux-arm-kernel
In-Reply-To: <20110113144805.GS24920@pengutronix.de>

Hi Uwe,

On Thu, Jan 13, 2011 at 03:48:05PM +0100, Uwe Kleine-König wrote:

[...]

> > +/* Controller is ENET-MAC */
> > +#define FEC_QUIRK_ENET_MAC           (1 << 0)
> does this really qualify to be a quirk?
> 
My understanding is that ENET-MAC is a type of "quirky" FEC
controller.

> > +/* Controller needs driver to swap frame */
> > +#define FEC_QUIRK_SWAP_FRAME         (1 << 1)
> IMHO this is a bit misnamed.  FEC_QUIRK_NEEDS_BE_DATA or similar would
> be more accurate.
> 
When your make this change, you may want to pick a better name for
function swap_buffer too.

[...]

> > +static void *swap_buffer(void *bufaddr, int len)
> > +{
> > +     int i;
> > +     unsigned int *buf = bufaddr;
> > +
> > +     for (i = 0; i < (len + 3) / 4; i++, buf++)
> > +             *buf = cpu_to_be32(*buf);
> if len isn't a multiple of 4 this accesses bytes behind len.  Is this
> generally OK here?  (E.g. because skbs always have a length that is a
> multiple of 4?)
The len may not be a multiple of 4.  But I believe bufaddr is always
a buffer allocated in a length that is a multiple of 4, and the 1~3
bytes exceeding the len very likely has no data that matters.  But
yes, it deserves a safer implementation.

[...]

> > +     /*
> > +      * The dual fec interfaces are not equivalent with enet-mac.
> > +      * Here are the differences:
> > +      *
> > +      *  - fec0 supports MII & RMII modes while fec1 only supports RMII
> > +      *  - fec0 acts as the 1588 time master while fec1 is slave
> > +      *  - external phys can only be configured by fec0
> > +      *
> > +      * That is to say fec1 can not work independently. It only works
> > +      * when fec0 is working. The reason behind this design is that the
> > +      * second interface is added primarily for Switch mode.
> > +      *
> > +      * Because of the last point above, both phys are attached on fec0
> > +      * mdio interface in board design, and need to be configured by
> > +      * fec0 mii_bus.
> > +      */
> > +     if ((id_entry->driver_data & FEC_QUIRK_ENET_MAC) && pdev->id) {
> > +             /* fec1 uses fec0 mii_bus */
> > +             fep->mii_bus = fec0_mii_bus;
> > +             return 0;
> What happens if imx28-fec.1 is probed before imx28-fec.0?
It's something that generally should not happen, as these two fec are
not equivalent, and fec.1 should always be added after fec.0 if you
intend to get dual interfaces.  But yes, we should add error checking
for this case in the driver.

-- 
Regards,
Shawn


^ permalink raw reply

* Re: [PATCH] ipsec: update MAX_AH_AUTH_LEN to support sha512
From: David Miller @ 2011-01-14  5:50 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev
In-Reply-To: <4D2F73C7.8010107@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 13 Jan 2011 22:51:03 +0100

>>From e330817aa2b33e9d1f44071072fdc4778acf8d76 Mon Sep 17 00:00:00 2001
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Thu, 13 Jan 2011 14:20:19 -0500
> Subject: [PATCH] ipsec: update MAX_AH_AUTH_LEN to support sha512
> 
> icv_truncbits is set to 256 for sha512, so update
> MAX_AH_AUTH_LEN to 64.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

Applied.

^ permalink raw reply

* Re: [PATCH] r8169: keep firmware in memory.
From: David Miller @ 2011-01-14  5:50 UTC (permalink / raw)
  To: romieu; +Cc: netdev, jarek, hayeswang, benh, torvalds
In-Reply-To: <20110113230753.GA2750@electric-eye.fr.zoreil.com>

From: Francois Romieu <romieu@fr.zoreil.com>
Date: Fri, 14 Jan 2011 00:07:53 +0100

> The firmware agent is not available during resume. Loading the firmware
> during open() (see eee3a96c6368f47df8df5bd4ed1843600652b337) is not
> enough.
> 
> close() is run during resume through rtl8169_reset_task(), whence the
> mildly natural release of firmware in the driver removal method instead.
> 
> It will help with http://bugs.debian.org/609538. It will not avoid
> the 60 seconds delay when:
> - there is no firmware
> - the driver is loaded and the device is not up before a suspend/resume
> 
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
> Tested-by: Jarek Kamiński <jarek@vilo.eu.org>
> Cc: Hayes <hayeswang@realtek.com>
> Cc: Ben Hutchings <benh@debian.org>

Applied.

^ permalink raw reply

* Re: [PATCH] USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable.
From: David Miller @ 2011-01-14  5:50 UTC (permalink / raw)
  To: jj
  Cc: linux-kernel, oliver, gregkh, linux-usb, netdev, alexey.orishko,
	hans.petter.selasky
In-Reply-To: <alpine.LNX.2.00.1101132229260.11347@swampdragon.chaosbits.net>

From: Jesper Juhl <jj@chaosbits.net>
Date: Thu, 13 Jan 2011 22:40:11 +0100 (CET)

> skb_clone() dynamically allocates memory and may fail. If it does it 
> returns NULL. This means we'll dereference a NULL pointer in 
> drivers/net/usb/cdc_ncm.c::cdc_ncm_rx_fixup().
> As far as I can tell, the proper way to deal with this is simply to goto 
> the error label.
> 
> Furthermore gcc complains that 'skb' may be used uninitialized:
>   drivers/net/usb/cdc_ncm.c: In function ‘cdc_ncm_rx_fixup’:
>   drivers/net/usb/cdc_ncm.c:922:18: warning: ‘skb’ may be used uninitialized in this function
> and I believe it is right. On the line where we
>   pr_debug("invalid frame detected (ignored)" ...
> we are using the local variable 'skb' but nothing has ever been assigned 
> to that variable yet. I believe the correct fix for that is to use 
> 'skb_in' instead.
> 
> Signed-off-by: Jesper Juhl <jj@chaosbits.net>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] ks8695net: Disable non-working ethtool operations
From: David Miller @ 2011-01-14  5:50 UTC (permalink / raw)
  To: bhutchings; +Cc: figo1802, zealcook, netdev
In-Reply-To: <1294941014.3946.46.camel@bwh-desktop>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 13 Jan 2011 17:50:14 +0000

> Some ethtool operations can only be implemented for the WAN port, and
> not all such operations are allowed to return an error code such as
> -EOPNOTSUPP.  Therefore, define two separate ethtool_ops structures
> for WAN and non-WAN ports; simplify and rename the WAN-only functions.
> 
> This is completely untested as I don't have an ARM build environment.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

^ permalink raw reply

* Re: [PATCH] vxge: Remember to release firmware after upgrading firmware
From: David Miller @ 2011-01-14  5:50 UTC (permalink / raw)
  To: jj
  Cc: netdev, linux-kernel, ramkrishna.vepa, jon.mason,
	sivakumar.subramani, sreenivasa.honnur
In-Reply-To: <alpine.LNX.2.00.1101132119080.11347@swampdragon.chaosbits.net>

From: Jesper Juhl <jj@chaosbits.net>
Date: Thu, 13 Jan 2011 21:25:20 +0100 (CET)

> Regardless of whether the firmware update being performed by 
> vxge_fw_upgrade() is a success or not we must still remember to always 
> release_firmware() before returning.
> 
> Signed-off-by: Jesper Juhl <jj@chaosbits.net>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2] ks8695net: Use default implementation of ethtool_ops::get_link
From: David Miller @ 2011-01-14  5:51 UTC (permalink / raw)
  To: bhutchings; +Cc: figo1802, zealcook, netdev
In-Reply-To: <1294941171.3946.49.camel@bwh-desktop>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 13 Jan 2011 17:52:51 +0000

> This is completely untested as I don't have an ARM build environment.
> 
> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
 ...
> -/**
>   *	ks8695_wan_get_pause - Retrieve network pause/flow-control
> advertising
>   *	@ndev: The device to retrieve settings from
>   *	@param: The structure to fill out with the information

Line-wrap damage from your mail client.

> @@ -1058,7 +1044,7 @@ static const struct ethtool_ops
> ks8695_wan_ethtool_ops = {
>  	.get_settings	= ks8695_wan_get_settings,

Same here.

I fixed this up and applied your patch.

^ permalink raw reply

* Re: [PATCH net-next-2.6] netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr
From: David Miller @ 2011-01-14  5:51 UTC (permalink / raw)
  To: joe; +Cc: vapier.adi, tklauser, michael.hennerich, uclinux-dist-devel,
	netdev
In-Reply-To: <1294891685.4114.29.camel@Joe-Laptop>

From: Joe Perches <joe@perches.com>
Date: Wed, 12 Jan 2011 20:08:04 -0800

> Remove code that has no effect.
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6] etherdevice.h: Add is_unicast_ether_addr function
From: David Miller @ 2011-01-14  5:51 UTC (permalink / raw)
  To: tklauser; +Cc: cmetcalf, netdev
In-Reply-To: <1294906496-14950-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Thu, 13 Jan 2011 09:14:56 +0100

>>From a check for !is_multicast_ether_addr it is not always obvious that
> we're checking for a unicast address. So add this helper function to
> make those code paths easier to read.
> 
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

^ permalink raw reply

* Re: [PATCH v2 net-next-2.6] netdev: tilepro: Use is_unicast_ether_addr helper
From: David Miller @ 2011-01-14  5:51 UTC (permalink / raw)
  To: tklauser; +Cc: cmetcalf, netdev
In-Reply-To: <1294906508-14999-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Thu, 13 Jan 2011 09:15:08 +0100

> Use is_unicast_ether_addr from linux/etherdevice.h instead of custom
> macros.
> 
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

^ permalink raw reply

* Re: [PATCH 02/10] GRETH: added option to disable a device node from bootloader.
From: David Miller @ 2011-01-14  6:12 UTC (permalink / raw)
  To: daniel; +Cc: netdev, kristoffer
In-Reply-To: <1294907135-24884-2-git-send-email-daniel@gaisler.com>

From: Daniel Hellstrom <daniel@gaisler.com>
Date: Thu, 13 Jan 2011 09:25:27 +0100

> Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>

This is not how you do this.

Simply not present the device in the OpenFirmware tree at all.  If you
can make this special properly appear, you can also toss the device
node away completely.

There is zero reason whatsoever to create a special hack-job
non-standardized device node properly to do this.

^ permalink raw reply

* Re: [PATCH 03/10] GRETH: added no_gbit option
From: David Miller @ 2011-01-14  6:13 UTC (permalink / raw)
  To: daniel; +Cc: netdev, kristoffer
In-Reply-To: <1294907135-24884-3-git-send-email-daniel@gaisler.com>

From: Daniel Hellstrom <daniel@gaisler.com>
Date: Thu, 13 Jan 2011 09:25:28 +0100

> For debug only. The driver does not report that it is GBit capable, instead
> it will report 10/100 mode to the generic PHY layer.
> 
> Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>

You may not add special purpose module parameters to your ethernet driver.

Instead, users can use ethtool to control what speeds the device will
try to advertise for during auto-negotiation, or use with a fixed
link configuration.

^ permalink raw reply

* Re: [PATCH 04/10] GRETH: added greth_compat_mode module parameter
From: David Miller @ 2011-01-14  6:14 UTC (permalink / raw)
  To: daniel; +Cc: netdev, kristoffer
In-Reply-To: <1294907135-24884-4-git-send-email-daniel@gaisler.com>

From: Daniel Hellstrom <daniel@gaisler.com>
Date: Thu, 13 Jan 2011 09:25:29 +0100

> The greth_compat_mode option can be used to set a GRETH GBit capable MAC
> in operate as if the GRETH 10/100 device was found. The GRETH GBit supports
> TCP/UDP checksum offloading, unaligned frame buffers, scatter gather etc.
> Enabling this mode allows the developer to test the GRETH 10/100 device
> without all features mentioned above on a GBit MAC capable of the above.
> 
> Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>

No special device specific module parameters please.

What if every single driver author added something like this and they
all named it something different or made it behave in slightly differing
ways?

What kind of user experience would that result in?

It would result in a sucky one, which is why we avoid adding all kinds
of hacky driver specific module options.

Find a generic way to provide this functionality.

^ permalink raw reply

* Re: [PATCH] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Po-Yu Chuang @ 2011-01-14  6:35 UTC (permalink / raw)
  To: Joe Perches; +Cc: netdev, linux-kernel, Po-Yu Chuang
In-Reply-To: <1294933197.4114.85.camel@Joe-Laptop>

Dear Joe,

On Thu, Jan 13, 2011 at 11:39 PM, Joe Perches <joe@perches.com> wrote:
> On Thu, 2011-01-13 at 19:49 +0800, Po-Yu Chuang wrote:
>> From: Po-Yu Chuang <ratbert@faraday-tech.com>
>>
>> FTMAC100 Ethernet Media Access Controller supports 10/100 Mbps and
>> MII.  This driver has been working on some ARM/NDS32 SoC including
>> Faraday A320 and Andes AG101.
>
> A couple of trivial comments not already mentioned by others.
>
>> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> []
>> @@ -2014,6 +2014,15 @@ config BCM63XX_ENET
>>         This driver supports the ethernet MACs in the Broadcom 63xx
>>         MIPS chipset family (BCM63XX).
>>
>> +config FTMAC100
>> +     tristate "Faraday FTMAC100 10/100 Ethernet support"
>> +     depends on ARM
>> +     select MII
>> +     help
>> +       This driver supports the FTMAC100 Ethernet controller from
>> +       Faraday. It is used on Faraday A320, Andes AG101, AG101P
>> +       and some other ARM/NDS32 SoC's.
>> +
>
> ARM specific net drivers are for now in drivers/net/arm/
> Perhaps it's better to locate these files there?

This controller is used by not only ARM-base platforms, but also
NDS32-base ones.
NDS32 is an ISA designed by Andes tech. Although it is not yet in the mainline,
they plan to push their stuff to mainline and are working on that.

So... I don't know if it is better to put my driver to driver/net/arm/.

Should I leave my driver at driver/net/ or put it in drvier/net/arm/
and let Andes tech guys
move it out of driver/net/arm/ when they use it later?

>> +     if (unlikely(ftmac100_rxdes_rx_error(rxdes))) {
>> +             if (printk_ratelimit())
>> +                     dev_info(dev, "rx err\n");
>
> There are many printk_ratelimit() tests.
>
> It's better to use net_ratelimit() or a local ratelimit_state
> so there's less possible suppression of other subsystem
> messages.

OK, fixed.


Thanks a lot for your absolutely non-trivial review. :-)
I'll submit a new version ASAP.

Thanks,
Po-Yu  Chuang

^ permalink raw reply

* Re: Flow Control and Port Mirroring Revisited
From: Simon Horman @ 2011-01-14  6:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesse Gross, Eric Dumazet, Rusty Russell, virtualization, dev,
	virtualization, netdev, kvm
In-Reply-To: <20110114045818.GA29738@redhat.com>

On Fri, Jan 14, 2011 at 06:58:18AM +0200, Michael S. Tsirkin wrote:
> On Fri, Jan 14, 2011 at 08:41:36AM +0900, Simon Horman wrote:
> > On Thu, Jan 13, 2011 at 10:45:38AM -0500, Jesse Gross wrote:
> > > On Thu, Jan 13, 2011 at 1:47 AM, Simon Horman <horms@verge.net.au> wrote:
> > > > On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
> > > >> On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
> > > >> > On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
> > > >> >
> > > >> > [ snip ]
> > > >> > >
> > > >> > > I know that everyone likes a nice netperf result but I agree with
> > > >> > > Michael that this probably isn't the right question to be asking.  I
> > > >> > > don't think that socket buffers are a real solution to the flow
> > > >> > > control problem: they happen to provide that functionality but it's
> > > >> > > more of a side effect than anything.  It's just that the amount of
> > > >> > > memory consumed by packets in the queue(s) doesn't really have any
> > > >> > > implicit meaning for flow control (think multiple physical adapters,
> > > >> > > all with the same speed instead of a virtual device and a physical
> > > >> > > device with wildly different speeds).  The analog in the physical
> > > >> > > world that you're looking for would be Ethernet flow control.
> > > >> > > Obviously, if the question is limiting CPU or memory consumption then
> > > >> > > that's a different story.
> > > >> >
> > > >> > Point taken. I will see if I can control CPU (and thus memory) consumption
> > > >> > using cgroups and/or tc.
> > > >>
> > > >> I have found that I can successfully control the throughput using
> > > >> the following techniques
> > > >>
> > > >> 1) Place a tc egress filter on dummy0
> > > >>
> > > >> 2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
> > > >>    this is effectively the same as one of my hacks to the datapath
> > > >>    that I mentioned in an earlier mail. The result is that eth1
> > > >>    "paces" the connection.
> 
> This is actually a bug. This means that one slow connection will affect
> fast ones. I intend to change the default for qemu to sndbuf=0 : this
> will fix it but break your "pacing". So pls do not count on this
> behaviour.

Do you have a patch I could test?

> > > > Further to this, I wonder if there is any interest in providing
> > > > a method to switch the action order - using ovs-ofctl is a hack imho -
> > > > and/or switching the default action order for mirroring.
> > > 
> > > I'm not sure that there is a way to do this that is correct in the
> > > generic case.  It's possible that the destination could be a VM while
> > > packets are being mirrored to a physical device or we could be
> > > multicasting or some other arbitrarily complex scenario.  Just think
> > > of what a physical switch would do if it has ports with two different
> > > speeds.
> > 
> > Yes, I have considered that case. And I agree that perhaps there
> > is no sensible default. But perhaps we could make it configurable somehow?
> 
> The fix is at the application level. Run netperf with -b and -w flags to
> limit the speed to a sensible value.

Perhaps I should have stated my goals more clearly.
I'm interested in situations where I don't control the application.


^ permalink raw reply

* Re: [PATCH v4 06/10] ARM: mx28: update clock and device name for dual fec support
From: Shawn Guo @ 2011-01-14  6:46 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: davem, gerg, baruch, eric, bryan.wu, r64343, B32542, lw, w.sang,
	s.hauer, jamie, jamie, netdev, linux-arm-kernel
In-Reply-To: <20110113150622.GV24920@pengutronix.de>

Hi Uwe,

On Thu, Jan 13, 2011 at 04:06:22PM +0100, Uwe Kleine-König wrote:
> Hi Shawn,
> 
> On Thu, Jan 06, 2011 at 03:13:14PM +0800, Shawn Guo wrote:
> > Change device name from "fec" to "imx28-fec", so that fec driver
> > can distinguish mx28.
> > 
> > Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
> > ---
> > Changes for v4:
> >  - Use "imx28-fec" as fec device name
> > 
> > Changes for v3:
> >  - Change device name to "enet-mac"
> > 
> >  arch/arm/mach-mxs/clock-mx28.c           |    3 ++-
> >  arch/arm/mach-mxs/devices/platform-fec.c |    2 +-
> >  2 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm/mach-mxs/clock-mx28.c b/arch/arm/mach-mxs/clock-mx28.c
> > index f20b254..e2a8b0f 100644
> > --- a/arch/arm/mach-mxs/clock-mx28.c
> > +++ b/arch/arm/mach-mxs/clock-mx28.c
> > @@ -606,7 +606,8 @@ static struct clk_lookup lookups[] = {
> >  	_REGISTER_CLOCK("duart", "apb_pclk", xbus_clk)
> >  	/* for amba-pl011 driver */
> >  	_REGISTER_CLOCK("duart", NULL, uart_clk)
> > -	_REGISTER_CLOCK("fec.0", NULL, fec_clk)
> > +	_REGISTER_CLOCK("imx28-fec.0", NULL, fec_clk)
> > +	_REGISTER_CLOCK("imx28-fec.1", NULL, fec_clk)
> >  	_REGISTER_CLOCK("rtc", NULL, rtc_clk)
> >  	_REGISTER_CLOCK("pll2", NULL, pll2_clk)
> >  	_REGISTER_CLOCK(NULL, "hclk", hbus_clk)
> > diff --git a/arch/arm/mach-mxs/devices/platform-fec.c b/arch/arm/mach-mxs/devices/platform-fec.c
> > index c08168c..c42dff7 100644
> > --- a/arch/arm/mach-mxs/devices/platform-fec.c
> > +++ b/arch/arm/mach-mxs/devices/platform-fec.c
> > @@ -45,6 +45,6 @@ struct platform_device *__init mxs_add_fec(
> >  		},
> >  	};
> >  
> > -	return mxs_add_platform_device("fec", data->id,
> > +	return mxs_add_platform_device("imx28-fec", data->id,
> IMHO "imx28-fec" shouldn't be hard coded here but come from data.  See
> imx-spi device registration for an example.
> 
Sascha has merged the patch, so I will send a follow-up patch.

-- 
Regards,
Shawn


^ permalink raw reply

* Re: [PATCH] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Po-Yu Chuang @ 2011-01-14  6:44 UTC (permalink / raw)
  To: Andres Salomon; +Cc: Eric Dumazet, netdev, linux-kernel, Po-Yu Chuang
In-Reply-To: <20110113082950.6747ebb8@queued.net>

Dear Andres,

On Fri, Jan 14, 2011 at 12:29 AM, Andres Salomon <dilinger@queued.net> wrote:
> On Thu, 13 Jan 2011 15:22:39 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le jeudi 13 janvier 2011 à 19:49 +0800, Po-Yu Chuang a écrit :

> No one else mentioned it, so I'll add:
>
> Don't explicitly inline functions unless they're in a header, or you
> have a really good reason (and that reason should probably be described
> in a comment).  Otherwise, just leave off the 'inline' keyword; the
> compiler is smart enough to decide whether a function should be inlined
> or not.

OK, fixed.

Thanks,
Po-Yu Chuang

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox