Netdev List

* Re: [PATCH 2/2] packet: Add fanout support.
From: Eric Dumazet @ 2011-07-05 16:23 UTC (permalink / raw)
  To: Victor Julien; +Cc: Loke, Chetan, David Miller, netdev
In-Reply-To: <4E133A09.8030907@inliniac.net>

On Tuesday 05 July 2011 at 18:21 +0200, Victor Julien wrote:
> On 07/05/2011 06:16 PM, Eric Dumazet wrote:
> > Remember, goal is that _all_ packets of a given flow end in same queue.
> > 
> 
> What about a hashing scheme based on just the ip addresses? Would make
> rxhash useless for this purpose, but would be a lot simpler overall maybe...
> 

What about loads where a single IP address is used?

I wonder what the problem is, since David added a defrag unit ;)

* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Michael Büsch @ 2011-07-05 16:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neil Horman, Alexey Zaytsev, Andrew Morton, netdev, Gary Zambrano,
	bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309882352.2271.19.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, 05 Jul 2011 18:12:32 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Hmm... We are in a NAPI handler... There wont be a new interrupt.
> 
> Plus, we do at start of b44_rx() :
> 
> prod  = br32(bp, B44_DMARX_STAT) & DMARX_STAT_CDMASK;
> 
> So all descriptors before prod are guaranteed to be ready for host
> consume... Fact that a dma access is running on 'next descriptor' should
> be irrelevant.
> 
> IMHO Peeking B44_DMARX_STAT for each packet would be a waste of time.

Yeah, I think so too. We don't need to wait for the _whole_ DMA engine
to go idle before we can access a buffer in the ring. We just need to make sure
that the device is finished with that buffer, and we do this by reading the current
descriptor pointer.

* Re: [RFC] non-preemptible kernel socket for RAMster
From: Eric Dumazet @ 2011-07-05 16:30 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: netdev, Konrad Wilk, linux-mm
In-Reply-To: <4232c4b6-15be-42d8-be42-6e27f9188ce2@default>

On Tuesday 05 July 2011 at 08:54 -0700, Dan Magenheimer wrote:
> In working on a kernel project called RAMster* (where RAM on a
> remote system may be used for clean page cache pages and for swap
> pages), I found I have need for a kernel socket to be used when
> in non-preemptible state.  I admit to being a networking idiot,
> but I have been successfully using the following small patch.
> I'm not sure whether I am lucky so far... perhaps more
> sockets or larger/different loads will require a lot more
> changes (or maybe even make my objective impossible).
> So I thought I'd post it for comment.  I'd appreciate
> any thoughts or suggestions.
> 
> Thanks,
> Dan
> 
> * http://events.linuxfoundation.org/events/linuxcon/magenheimer 
> 
> diff -Napur linux-2.6.37/net/core/sock.c linux-2.6.37-ramster/net/core/sock.c
> --- linux-2.6.37/net/core/sock.c	2011-07-03 19:14:52.267853088 -0600
> +++ linux-2.6.37-ramster/net/core/sock.c	2011-07-03 19:10:04.340980799 -0600
> @@ -1587,6 +1587,14 @@ static void __lock_sock(struct sock *sk)
>  	__acquires(&sk->sk_lock.slock)
>  {
>  	DEFINE_WAIT(wait);
> +	if (!preemptible()) {
> +		while (sock_owned_by_user(sk)) {
> +			spin_unlock_bh(&sk->sk_lock.slock);
> +			cpu_relax();
> +			spin_lock_bh(&sk->sk_lock.slock);
> +		}
> +		return;
> +	}

Hmm, was this tested on a UP machine?

>  
>  	for (;;) {
>  		prepare_to_wait_exclusive(&sk->sk_lock.wq, &wait,
> @@ -1623,7 +1631,8 @@ static void __release_sock(struct sock *
>  			 * This is safe to do because we've taken the backlog
>  			 * queue private:
>  			 */
> -			cond_resched_softirq();
> +			if (preemptible())
> +				cond_resched_softirq();
>  			skb = next;
>  		} while (skb != NULL);


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .


* RE: [RFC] non-preemptible kernel socket for RAMster
From: Loke, Chetan @ 2011-07-05 16:36 UTC (permalink / raw)
  To: Dan Magenheimer, netdev; +Cc: Konrad Wilk, linux-mm
In-Reply-To: <4232c4b6-15be-42d8-be42-6e27f9188ce2@default>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Dan Magenheimer
> Sent: July 05, 2011 11:54 AM
> To: netdev@vger.kernel.org
> Cc: Konrad Wilk; linux-mm
> Subject: [RFC] non-preemptible kernel socket for RAMster
> 
> In working on a kernel project called RAMster* (where RAM on a
> remote system may be used for clean page cache pages and for swap
> pages), I found I have need for a kernel socket to be used when


How is RAMster+swap different from the (pending?) support for swap
over NBD?


Chetan Loke


* [PATCH net-next-2.6] net: sched: constify tcf_proto and tc_action
From: Eric Dumazet @ 2011-07-05 16:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/act_api.h     |    6 +++---
 include/net/pkt_sched.h   |    4 ++--
 include/net/sch_generic.h |   12 +++++++-----
 net/sched/act_api.c       |    4 ++--
 net/sched/act_gact.c      |    3 ++-
 net/sched/act_mirred.c    |    2 +-
 net/sched/act_nat.c       |    2 +-
 net/sched/act_pedit.c     |    2 +-
 net/sched/act_police.c    |    2 +-
 net/sched/act_simple.c    |    3 ++-
 net/sched/act_skbedit.c   |    2 +-
 net/sched/cls_api.c       |    6 +++---
 net/sched/cls_basic.c     |    2 +-
 net/sched/cls_flow.c      |    2 +-
 net/sched/cls_fw.c        |    2 +-
 net/sched/cls_route.c     |    2 +-
 net/sched/cls_rsvp.h      |    2 +-
 net/sched/cls_tcindex.c   |    2 +-
 net/sched/cls_u32.c       |    2 +-
 net/sched/sch_api.c       |    6 +++---
 20 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index bab385f..c739531 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -72,7 +72,7 @@ struct tcf_act_hdr {
 
 struct tc_action {
 	void			*priv;
-	struct tc_action_ops	*ops;
+	const struct tc_action_ops	*ops;
 	__u32			type; /* for backward compat(TCA_OLD_COMPAT) */
 	__u32			order;
 	struct tc_action	*next;
@@ -86,7 +86,7 @@ struct tc_action_ops {
 	__u32   type; /* TBD to match kind */
 	__u32 	capab;  /* capabilities includes 4 bit version */
 	struct module		*owner;
-	int     (*act)(struct sk_buff *, struct tc_action *, struct tcf_result *);
+	int     (*act)(struct sk_buff *, const struct tc_action *, struct tcf_result *);
 	int     (*get_stats)(struct sk_buff *, struct tc_action *);
 	int     (*dump)(struct sk_buff *, struct tc_action *, int, int);
 	int     (*cleanup)(struct tc_action *, int bind);
@@ -115,7 +115,7 @@ extern void tcf_hash_insert(struct tcf_common *p, struct tcf_hashinfo *hinfo);
 extern int tcf_register_action(struct tc_action_ops *a);
 extern int tcf_unregister_action(struct tc_action_ops *a);
 extern void tcf_action_destroy(struct tc_action *a, int bind);
-extern int tcf_action_exec(struct sk_buff *skb, struct tc_action *a, struct tcf_result *res);
+extern int tcf_action_exec(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res);
 extern struct tc_action *tcf_action_init(struct nlattr *nla, struct nlattr *est, char *n, int ovr, int bind);
 extern struct tc_action *tcf_action_init_1(struct nlattr *nla, struct nlattr *est, char *n, int ovr, int bind);
 extern int tcf_action_dump(struct sk_buff *skb, struct tc_action *a, int, int);
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 65afc49..fffdc60 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -99,9 +99,9 @@ static inline void qdisc_run(struct Qdisc *q)
 		__qdisc_run(q);
 }
 
-extern int tc_classify_compat(struct sk_buff *skb, struct tcf_proto *tp,
+extern int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
 			      struct tcf_result *res);
-extern int tc_classify(struct sk_buff *skb, struct tcf_proto *tp,
+extern int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		       struct tcf_result *res);
 
 /* Calculate maximal size of packet seen by hard_start_xmit
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index b931f02..626177b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -181,8 +181,9 @@ struct tcf_proto_ops {
 	struct tcf_proto_ops	*next;
 	char			kind[IFNAMSIZ];
 
-	int			(*classify)(struct sk_buff*, struct tcf_proto*,
-					struct tcf_result *);
+	int			(*classify)(struct sk_buff *,
+					    const struct tcf_proto*,
+					    struct tcf_result *);
 	int			(*init)(struct tcf_proto*);
 	void			(*destroy)(struct tcf_proto*);
 
@@ -205,8 +206,9 @@ struct tcf_proto {
 	/* Fast access part */
 	struct tcf_proto	*next;
 	void			*root;
-	int			(*classify)(struct sk_buff*, struct tcf_proto*,
-					struct tcf_result *);
+	int			(*classify)(struct sk_buff *,
+					    const struct tcf_proto *,
+					    struct tcf_result *);
 	__be16			protocol;
 
 	/* All the rest */
@@ -214,7 +216,7 @@ struct tcf_proto {
 	u32			classid;
 	struct Qdisc		*q;
 	void			*data;
-	struct tcf_proto_ops	*ops;
+	const struct tcf_proto_ops	*ops;
 };
 
 struct qdisc_skb_cb {
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 2f64262..f2fb67e 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -365,10 +365,10 @@ static struct tc_action_ops *tc_lookup_action_id(u32 type)
 }
 #endif
 
-int tcf_action_exec(struct sk_buff *skb, struct tc_action *act,
+int tcf_action_exec(struct sk_buff *skb, const struct tc_action *act,
 		    struct tcf_result *res)
 {
-	struct tc_action *a;
+	const struct tc_action *a;
 	int ret = -1;
 
 	if (skb->tc_verd & TC_NCLS) {
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index 2b4ab4b..b77f5a0 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -125,7 +125,8 @@ static int tcf_gact_cleanup(struct tc_action *a, int bind)
 	return 0;
 }
 
-static int tcf_gact(struct sk_buff *skb, struct tc_action *a, struct tcf_result *res)
+static int tcf_gact(struct sk_buff *skb, const struct tc_action *a,
+		    struct tcf_result *res)
 {
 	struct tcf_gact *gact = a->priv;
 	int action = TC_ACT_SHOT;
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 961386e..102fc21 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -154,7 +154,7 @@ static int tcf_mirred_cleanup(struct tc_action *a, int bind)
 	return 0;
 }
 
-static int tcf_mirred(struct sk_buff *skb, struct tc_action *a,
+static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		      struct tcf_result *res)
 {
 	struct tcf_mirred *m = a->priv;
diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
index 762b027..001d1b3 100644
--- a/net/sched/act_nat.c
+++ b/net/sched/act_nat.c
@@ -102,7 +102,7 @@ static int tcf_nat_cleanup(struct tc_action *a, int bind)
 	return tcf_hash_release(&p->common, bind, &nat_hash_info);
 }
 
-static int tcf_nat(struct sk_buff *skb, struct tc_action *a,
+static int tcf_nat(struct sk_buff *skb, const struct tc_action *a,
 		   struct tcf_result *res)
 {
 	struct tcf_nat *p = a->priv;
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 7affe9a..10d3aed 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -120,7 +120,7 @@ static int tcf_pedit_cleanup(struct tc_action *a, int bind)
 	return 0;
 }
 
-static int tcf_pedit(struct sk_buff *skb, struct tc_action *a,
+static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 		     struct tcf_result *res)
 {
 	struct tcf_pedit *p = a->priv;
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index b3b9b32..6fb3f5a 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -282,7 +282,7 @@ static int tcf_act_police_cleanup(struct tc_action *a, int bind)
 	return ret;
 }
 
-static int tcf_act_police(struct sk_buff *skb, struct tc_action *a,
+static int tcf_act_police(struct sk_buff *skb, const struct tc_action *a,
 			  struct tcf_result *res)
 {
 	struct tcf_police *police = a->priv;
diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index a34a22d..73e0a3a 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -36,7 +36,8 @@ static struct tcf_hashinfo simp_hash_info = {
 };
 
 #define SIMP_MAX_DATA	32
-static int tcf_simp(struct sk_buff *skb, struct tc_action *a, struct tcf_result *res)
+static int tcf_simp(struct sk_buff *skb, const struct tc_action *a,
+		    struct tcf_result *res)
 {
 	struct tcf_defact *d = a->priv;
 
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 5f6f0c7..35dbbe9 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -39,7 +39,7 @@ static struct tcf_hashinfo skbedit_hash_info = {
 	.lock	=	&skbedit_lock,
 };
 
-static int tcf_skbedit(struct sk_buff *skb, struct tc_action *a,
+static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
 		       struct tcf_result *res)
 {
 	struct tcf_skbedit *d = a->priv;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 9563887..a69d44f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -40,9 +40,9 @@ static DEFINE_RWLOCK(cls_mod_lock);
 
 /* Find classifier type by string name */
 
-static struct tcf_proto_ops *tcf_proto_lookup_ops(struct nlattr *kind)
+static const struct tcf_proto_ops *tcf_proto_lookup_ops(struct nlattr *kind)
 {
-	struct tcf_proto_ops *t = NULL;
+	const struct tcf_proto_ops *t = NULL;
 
 	if (kind) {
 		read_lock(&cls_mod_lock);
@@ -132,7 +132,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 	struct Qdisc  *q;
 	struct tcf_proto **back, **chain;
 	struct tcf_proto *tp;
-	struct tcf_proto_ops *tp_ops;
+	const struct tcf_proto_ops *tp_ops;
 	const struct Qdisc_class_ops *cops;
 	unsigned long cl;
 	unsigned long fh;
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 8be8872..ea1f70b 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -39,7 +39,7 @@ static const struct tcf_ext_map basic_ext_map = {
 	.police = TCA_BASIC_POLICE
 };
 
-static int basic_classify(struct sk_buff *skb, struct tcf_proto *tp,
+static int basic_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			  struct tcf_result *res)
 {
 	int r;
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 34533a5..6994214 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -356,7 +356,7 @@ static u32 flow_key_get(struct sk_buff *skb, int key)
 	}
 }
 
-static int flow_classify(struct sk_buff *skb, struct tcf_proto *tp,
+static int flow_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			 struct tcf_result *res)
 {
 	struct flow_head *head = tp->root;
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 26e7bc4..389af15 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -77,7 +77,7 @@ static inline int fw_hash(u32 handle)
 		return handle & (HTSIZE - 1);
 }
 
-static int fw_classify(struct sk_buff *skb, struct tcf_proto *tp,
+static int fw_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			  struct tcf_result *res)
 {
 	struct fw_head *head = (struct fw_head *)tp->root;
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index a9079053..13ab66e 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -125,7 +125,7 @@ static inline int route4_hash_wild(void)
 	return 0;						\
 }
 
-static int route4_classify(struct sk_buff *skb, struct tcf_proto *tp,
+static int route4_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			   struct tcf_result *res)
 {
 	struct route4_head *head = (struct route4_head *)tp->root;
diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index ed691b1..be4505e 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -130,7 +130,7 @@ static struct tcf_ext_map rsvp_ext_map = {
 		return r;				\
 }
 
-static int rsvp_classify(struct sk_buff *skb, struct tcf_proto *tp,
+static int rsvp_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			 struct tcf_result *res)
 {
 	struct rsvp_session **sht = ((struct rsvp_head *)tp->root)->ht;
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 36667fa..dbe1992 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -79,7 +79,7 @@ tcindex_lookup(struct tcindex_data *p, u16 key)
 }
 
 
-static int tcindex_classify(struct sk_buff *skb, struct tcf_proto *tp,
+static int tcindex_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			    struct tcf_result *res)
 {
 	struct tcindex_data *p = PRIV(tp);
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 3b93fc0..939b627 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -93,7 +93,7 @@ static inline unsigned int u32_hash_fold(__be32 key,
 	return h;
 }
 
-static int u32_classify(struct sk_buff *skb, struct tcf_proto *tp, struct tcf_result *res)
+static int u32_classify(struct sk_buff *skb, const struct tcf_proto *tp, struct tcf_result *res)
 {
 	struct {
 		struct tc_u_knode *knode;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 8182aef..dca6c1a 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1644,7 +1644,7 @@ done:
  * to this qdisc, (optionally) tests for protocol and asks
  * specific classifiers.
  */
-int tc_classify_compat(struct sk_buff *skb, struct tcf_proto *tp,
+int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
 		       struct tcf_result *res)
 {
 	__be16 protocol = skb->protocol;
@@ -1668,12 +1668,12 @@ int tc_classify_compat(struct sk_buff *skb, struct tcf_proto *tp,
 }
 EXPORT_SYMBOL(tc_classify_compat);
 
-int tc_classify(struct sk_buff *skb, struct tcf_proto *tp,
+int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 		struct tcf_result *res)
 {
 	int err = 0;
 #ifdef CONFIG_NET_CLS_ACT
-	struct tcf_proto *otp = tp;
+	const struct tcf_proto *otp = tp;
 reclassify:
 #endif
 



* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Neil Horman @ 2011-07-05 16:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexey Zaytsev, Michael Büsch, Andrew Morton, netdev,
	Gary Zambrano, bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309882352.2271.19.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Jul 05, 2011 at 06:12:32PM +0200, Eric Dumazet wrote:
> On Tuesday 05 July 2011 at 12:05 -0400, Neil Horman wrote:
> > On Tue, Jul 05, 2011 at 07:59:33AM +0200, Eric Dumazet wrote:
> > > On Tuesday 05 July 2011 at 07:33 +0200, Eric Dumazet wrote:
> > > > On Tuesday 05 July 2011 at 09:18 +0400, Alexey Zaytsev wrote:
> > > > 
> > > > > Actually, I've added a trace to show b44_init_rings and b44_free_rings
> > > > > calls, and they are only called once, right after the driver is
> > > > > loaded. So it can't be related to START_RFO. Will attach the diff and
> > > > > dmesg.
> > > > 
> > > > Thanks
> > > > 
> > > > I was wondering if DMA could be faster if providing word aligned
> > > > addresses, could you try :
> > > > 
> > > > -#define RX_PKT_OFFSET          (RX_HEADER_LEN + 2)
> > > > +#define RX_PKT_OFFSET          (RX_HEADER_LEN + NET_IP_ALIGN)
> > > > 
> > > > (On x86, we now have NET_IP_ALIGN = 0 since commit ea812ca1)
> > > > 
> > > 
> > > I suspect a hardware bug.
> > > 
> > I'm not sure if this helps, but I've been reading over this bug, and it seems
> > that the rx path never checks the status of a buffers rx header prior to
> > unmapping it or otherwise modifying it in hardware.  If we were to start munging
> > pointers in the rx channel while a dma was active in it still, it sems like the
> > observed corruption might be the result.  The docs aren't super clear on this,
> > but I think a descriptor needs to be in the idle wait or stopped state prior to
> > being acessed.  This patch might help out there (although I don't have hardware
> > to test)
> > Neil
> > 
> > diff --git a/drivers/net/b44.c b/drivers/net/b44.c
> > index 3d247f3..48540ad 100644
> > --- a/drivers/net/b44.c
> > +++ b/drivers/net/b44.c
> > @@ -769,7 +769,19 @@ static int b44_rx(struct b44 *bp, int budget)
> >  		dma_addr_t map = rp->mapping;
> >  		struct rx_header *rh;
> >  		u16 len;
> > -
> > +		u32 state = br32(bp, B44_DMARX_STAT) & DMARX_STAT_SMASK;
> > +		state >>= 12;
> > +
> > +		/*
> > + 		 * I _think_ descriptors need to be in the idle or stopped state
> > + 		 * before its safe to access them.  If the current buffer
> > + 		 * pointed to by the dma channel is in state 1 or lower (active
> > + 		 * or disabled), then we should just stop receving until the
> > + 		 * next interrupt kicks us again (I think)
> > + 		 */
> > +		if (state < 2)
> > +			return;
> > + 
> >  		dma_sync_single_for_cpu(bp->sdev->dev, map,
> >  					    RX_PKT_BUF_SZ,
> >  					    DMA_FROM_DEVICE);
> 
> Hmm... We are in a NAPI handler... There wont be a new interrupt.
> 
Not until we're done with the NAPI handler, no.  But if we encounter a DMA
descriptor that isn't idle, then we know that either we're clearing out the ring
(i.e. for a shutdown), or we'll get another interrupt when the descriptor we
failed on completes.

> Plus, we do at start of b44_rx() :
> 
> prod  = br32(bp, B44_DMARX_STAT) & DMARX_STAT_CDMASK;
> 
Yes, that just tells us which is the current DMA index.  After that we loop
through the subsequent DMA descriptors, incrementing the index each time.

> So all descriptors before prod are guaranteed to be ready for host
> consume... Fact that a dma access is running on 'next descriptor' should
> be irrelevant.
> 
But we handle more than one descriptor per b44_rx() call: there's a while loop in
there where we do advance to the next descriptor.

> IMHO Peeking B44_DMARX_STAT for each packet would be a waste of time.
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Eric Dumazet @ 2011-07-05 16:47 UTC (permalink / raw)
  To: Neil Horman
  Cc: Alexey Zaytsev, Michael Büsch, Andrew Morton, netdev,
	Gary Zambrano, bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <20110705164202.GD2959@hmsreliant.think-freely.org>

On Tuesday 05 July 2011 at 12:42 -0400, Neil Horman wrote:
> On Tue, Jul 05, 2011 at 06:12:32PM +0200, Eric Dumazet wrote:

> > So all descriptors before prod are guaranteed to be ready for host
> > consume... Fact that a dma access is running on 'next descriptor' should
> > be irrelevant.
> > 
> But we handle more than one descriptor per b44_rx call - theres a while loop in
> there where we do advance to the next descriptor.

Yes, but we only advance up to 'prod', which marks the end of the
descriptors that are safe to consume.

If hardware advertised descriptor X as ready to be handled by the host
while DMA on descriptor X was not yet finished, that would be
really useless hardware ;)

* Re: libpcap and tc filters
From: Adam Katz @ 2011-07-05 16:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: jhs, netdev
In-Reply-To: <1309882485.2271.21.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

bah
I tried many times on both my Ubuntu 10.04 boxes (2.6.32-32-generic),
one 32-bit and the other 64-bit, and in both cases it simply refused to work.
I tried it with several different versions of libpcap (just in case),
as well as two different versions of tcpreplay (3.4.4 and 3.4.3) and
iproute (the Ubuntu 10.04 default version and the latest 2.6.39). One of
the machines had several NICs from different manufacturers, so I tried
it on both too, and nothing!

However, once I moved to a third machine, a virtual machine running
Fedora Core 15 (2.6.38.6-26), it suddenly worked.
In all cases I used the same pcap file and configuration.

Unless someone suggests a better solution, it seems like I'll be
adhering to the first law of engineering, "if it works, don't fix it",
and simply install Fedora Core 15 instead of Ubuntu.

On Tue, Jul 5, 2011 at 7:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday 05 July 2011 at 18:16 +0300, Adam Katz wrote:
>> strange.
>> I've now tried the exact same configuration and it simply refuses to
>> work. Maybe your tcpreplay is configured differently...
>>
>> What distro are you using? What kernel? What version of libpcap?
>
> I did the same tests here and it works correctly for me.
>
> latest kernel 3.0-rc6
>
> # /usr/local/bin/tcpreplay -V
> tcpreplay version: 3.4.4 (build 2450)
> Copyright 2000-2010 by Aaron Turner <aturner at synfin dot net>
> Cache file supported: 04
> Not compiled with libdnet.
> Compiled against libpcap: 1.1.1
> 64 bit packet counters: enabled
> Verbose printing via tcpdump: enabled
> Packet editing: disabled
> Fragroute engine: disabled
> Injection method: PF_PACKET send()
>
>
>

* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Eric Dumazet @ 2011-07-05 16:57 UTC (permalink / raw)
  To: Neil Horman
  Cc: Alexey Zaytsev, Michael Büsch, Andrew Morton, netdev,
	Gary Zambrano, bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309884441.2271.34.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tuesday 05 July 2011 at 18:47 +0200, Eric Dumazet wrote:
> On Tuesday 05 July 2011 at 12:42 -0400, Neil Horman wrote:
> > On Tue, Jul 05, 2011 at 06:12:32PM +0200, Eric Dumazet wrote:
> 
> > > So all descriptors before prod are guaranteed to be ready for host
> > > consume... Fact that a dma access is running on 'next descriptor' should
> > > be irrelevant.
> > > 
> > But we handle more than one descriptor per b44_rx call - theres a while loop in
> > there where we do advance to the next descriptor.
> 
> Yes, but we advance up to 'prod', which is the very last safe
> descriptor.
> 
> If hardware advertises descriptor X being ready to be handled by host,
> while DMA on this X descriptor is not yet finished, this would be a
> really useless hardware ;)
> 
> 


BTW, the code around line 782 seems really suspect.

We should log an error once, just in case...

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 98c977e..bc7ce27 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -781,6 +781,7 @@ static int b44_rx(struct b44 *bp, int budget)
 		if (len == 0) {
 			int i = 0;
 
+			pr_err_once("b44: zero len !\n");
 			do {
 				udelay(2);
 				barrier();

* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Joe Perches @ 2011-07-05 17:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Neil Horman, Alexey Zaytsev, Michael Büsch, Andrew Morton,
	netdev, Gary Zambrano, bugme-daemon, David S. Miller,
	Pekka Pietikainen, Florian Schirmer, Felix Fietkau,
	Michael Buesch
In-Reply-To: <1309885050.2271.36.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, 2011-07-05 at 18:57 +0200, Eric Dumazet wrote:
> We should log an error once, just in case...
[]
> diff --git a/drivers/net/b44.c b/drivers/net/b44.c
[]
> @@ -781,6 +781,7 @@ static int b44_rx(struct b44 *bp, int budget)
>  		if (len == 0) {
>  			int i = 0;
>  
> +			pr_err_once("b44: zero len !\n");

Trivia:

You don't need the "b44: " prefix.
The embedded pr_fmt in pr_<level>_once adds it.

* Re: [PATCH 2/2] packet: Add fanout support.
From: Victor Julien @ 2011-07-05 17:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Loke, Chetan, David Miller, netdev
In-Reply-To: <1309882999.2271.25.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On 07/05/2011 06:23 PM, Eric Dumazet wrote:
> On Tuesday 05 July 2011 at 18:21 +0200, Victor Julien wrote:
>> On 07/05/2011 06:16 PM, Eric Dumazet wrote:
>>> Remember, goal is that _all_ packets of a given flow end in same queue.
>>>
>>
>> What about a hashing scheme based on just the ip addresses? Would make
>> rxhash useless for this purpose, but would be a lot simpler overall maybe...
>>
> 
> What about loads where a single IP address is used ?

How would that be a problem?

> I wonder what's the problem, since David added a defrag unit ;)

No problem; I just suggested it because Chetan Loke had brought up
hashing. David's solution would work fine for me.

-- 
---------------------------------------------
Victor Julien
http://www.inliniac.net/
PGP: http://www.inliniac.net/victorjulien.asc
---------------------------------------------

* Re: divide error: 0000, in bictcp_cong_avoid, kernel 2.6.39
From: Stephen Hemminger @ 2011-07-05 17:16 UTC (permalink / raw)
  To: TB; +Cc: netdev
In-Reply-To: <4E120208.2090500@techboom.com>

On Mon, 04 Jul 2011 14:10:16 -0400
TB <lkml@techboom.com> wrote:

> On 11-07-04 01:36 PM, Stephen Hemminger wrote:
> > Any data about the type of connection, kernel configuration or other
> > information that might be useful in reproducing the problem?
> > 
> > Also please try 2.6.39.2
> 
> We haven't found a sure way of reproducing it.
> It happened on 1.2% of our servers over the weekend and seems random.
> Both are connected with 2 gigabit ports using bonding. Traffic tends to
> be heavy, but doesn't seem to be a factor.
> 
> Would a .config help ?
> 
> Only the very basic filter module for iptables is compiled in.
> 
> We will try 2.6.39.2 soon

Kernel config (and compiler version) would help in identifying which
of the three divides is dividing by zero.

* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Neil Horman @ 2011-07-05 17:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexey Zaytsev, Michael Büsch, Andrew Morton, netdev,
	Gary Zambrano, bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309884441.2271.34.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Jul 05, 2011 at 06:47:21PM +0200, Eric Dumazet wrote:
> On Tuesday 05 July 2011 at 12:42 -0400, Neil Horman wrote:
> > On Tue, Jul 05, 2011 at 06:12:32PM +0200, Eric Dumazet wrote:
> 
> > > So all descriptors before prod are guaranteed to be ready for host
> > > consume... Fact that a dma access is running on 'next descriptor' should
> > > be irrelevant.
> > > 
> > But we handle more than one descriptor per b44_rx call - theres a while loop in
> > there where we do advance to the next descriptor.
> 
> Yes, but we advance up to 'prod', which is the very last safe
> descriptor.
> 
> If hardware advertises descriptor X being ready to be handled by host,
> while DMA on this X descriptor is not yet finished, this would be a
> really useless hardware ;)
> 
Doh, sorry, I completely missed the fact that the status register index value
might be more than one entry ahead of cons, and that we advance cons up to prod,
but not past.  I assume, then, that the DMA state refers to the state of the
channel, rather than the state of the packet that the status register currently
indexes?

Please disregard everything I said :)
Neil

^ permalink raw reply

* RE: [RFC] non-preemptible kernel socket for RAMster
From: Dan Magenheimer @ 2011-07-05 17:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Konrad Wilk, linux-mm
In-Reply-To: <1309883430.2271.27.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Tuesday, July 05, 2011 10:31 AM
> To: Dan Magenheimer
> Cc: netdev@vger.kernel.org; Konrad Wilk; linux-mm
> Subject: Re: [RFC] non-preemptible kernel socket for RAMster
> 
> Le mardi 05 juillet 2011 à 08:54 -0700, Dan Magenheimer a écrit :
> > In working on a kernel project called RAMster* (where RAM on a
> > remote system may be used for clean page cache pages and for swap
> > pages), I found I have need for a kernel socket to be used when
> > in non-preemptible state.  I admit to being a networking idiot,
> > but I have been successfully using the following small patch.
> > I'm not sure whether I am lucky so far... perhaps more
> > sockets or larger/different loads will require a lot more
> > changes (or maybe even make my objective impossible).
> > So I thought I'd post it for comment.  I'd appreciate
> > any thoughts or suggestions.
> >
> > Thanks,
> > Dan
> >
> > * http://events.linuxfoundation.org/events/linuxcon/magenheimer
> >
> > diff -Napur linux-2.6.37/net/core/sock.c linux-2.6.37-ramster/net/core/sock.c
> > --- linux-2.6.37/net/core/sock.c	2011-07-03 19:14:52.267853088 -0600
> > +++ linux-2.6.37-ramster/net/core/sock.c	2011-07-03 19:10:04.340980799 -0600
> > @@ -1587,6 +1587,14 @@ static void __lock_sock(struct sock *sk)
> >  	__acquires(&sk->sk_lock.slock)
> >  {
> >  	DEFINE_WAIT(wait);
> > +	if (!preemptible()) {
> > +		while (sock_owned_by_user(sk)) {
> > +			spin_unlock_bh(&sk->sk_lock.slock);
> > +			cpu_relax();
> > +			spin_lock_bh(&sk->sk_lock.slock);
> > +		}
> > +		return;
> > +	}
> 
> Hmm, was this tested on UP machine ?

Hi Eric --

Thanks for the reply!

I hadn't tested UP in a while, so I am testing now, and it seems to
work OK so far.  However, I am just testing my socket, *not* testing
sockets in general.  Are you implying that this patch will
break (kernel) sockets in general on a UP machine?  If so,
could you be more specific as to why?  (Again, I said
I am a networking idiot. ;-)  I played a bit with adding
a new SOCK_ flag and triggering off of that, but this
version of the patch seemed much simpler.

Thanks,
Dan

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* RE: [RFC] non-preemptible kernel socket for RAMster
From: Dan Magenheimer @ 2011-07-05 17:25 UTC (permalink / raw)
  To: Loke, Chetan, netdev; +Cc: Konrad Wilk, linux-mm
In-Reply-To: <D3F292ADF945FB49B35E96C94C2061B91257D65C@nsmail.netscout.com>

> From: Loke, Chetan [mailto:Chetan.Loke@netscout.com]
> Sent: Tuesday, July 05, 2011 10:37 AM
> To: Dan Magenheimer; netdev@vger.kernel.org
> Cc: Konrad Wilk; linux-mm
> Subject: RE: [RFC] non-preemptible kernel socket for RAMster
> 
> > In working on a kernel project called RAMster* (where RAM on a
> > remote system may be used for clean page cache pages and for swap
> > pages), I found I have need for a kernel socket to be used when
> 
> How is RAMster+swap different than NBD's (pending etc?) support for SWAP
> over NBD?

Hi Chetan --

Thanks for your question.

I may be ignorant of the details of NBD, but I did some quick
research using Google.  If I understand correctly, swap over
NBD is still writing to a configured swap disk on the remote
machine.  RAMster is swapping to *RAM* on the remote machine.
The idea is that most machines are very overprovisioned in
RAM, and are rarely using all of their RAM, especially when
a machine is (mostly) idle.  In other words, the "max of
the sums" of RAM usage on a group of machines is much lower
than the "sum of the max" of RAM usage.
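
The "max of the sums" versus "sum of the max" argument can be checked with a
small worked example. The usage numbers are made up for illustration: three
machines that each idle at 2 GB but peak at 8 GB, with the peaks falling at
different times. Provisioning every machine for its own peak needs 24 GB in
total, while the cluster-wide total never exceeds 12 GB.

```c
#include <stddef.h>

/*
 * usage[m * nt + t] is the RAM used by machine m at time t (any unit).
 * Peak of the cluster-wide total: max over t of (sum over m).
 */
static unsigned long max_of_sums(const unsigned long *usage, size_t nm, size_t nt)
{
	unsigned long best = 0;

	for (size_t t = 0; t < nt; t++) {
		unsigned long sum = 0;

		for (size_t m = 0; m < nm; m++)
			sum += usage[m * nt + t];
		if (sum > best)
			best = sum;
	}
	return best;
}

/* Per-machine provisioning: sum over m of (max over t). */
static unsigned long sum_of_max(const unsigned long *usage, size_t nm, size_t nt)
{
	unsigned long total = 0;

	for (size_t m = 0; m < nm; m++) {
		unsigned long peak = 0;

		for (size_t t = 0; t < nt; t++)
			if (usage[m * nt + t] > peak)
				peak = usage[m * nt + t];
		total += peak;
	}
	return total;
}
```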

So if the network is sufficiently faster than disk for
moving a page of data, RAMster provides a significant
performance improvement.  OR RAMster may allow a significant
reduction in the total amount of RAM across a data center.

The version of RAMster I am working on now is really
a proof-of-concept that works over sockets, using the
ocfs2 cluster layer.  One can easily envision a future
"exo-fabric" which allows one machine to write to the
RAM of another machine... for this future hardware,
RAMster becomes much more interesting.

Thanks,
Dan

^ permalink raw reply

* RE: [PATCH 2/2] packet: Add fanout support.
From: Loke, Chetan @ 2011-07-05 17:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Victor Julien, David Miller, netdev
In-Reply-To: <1309882577.2271.23.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: July 05, 2011 12:16 PM
> To: Loke, Chetan
> Cc: Victor Julien; David Miller; netdev@vger.kernel.org
> Subject: RE: [PATCH 2/2] packet: Add fanout support.
> 
> Le mardi 05 juillet 2011 à 12:03 -0400, Loke, Chetan a écrit :
> > > -----Original Message-----
> > > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Eric Dumazet
> > > Sent: July 05, 2011 3:00 AM
> > > To: Victor Julien
> > > Cc: David Miller; netdev@vger.kernel.org
> > > Subject: Re: [PATCH 2/2] packet: Add fanout support.
> > >
> > > Le mardi 05 juillet 2011 à 08:56 +0200, Victor Julien a écrit :
> > >
> > > > Is this still also true for IP fragments?
> > > >
> > >
> > > This point was already raised. IP fragments have rxhash = 0, obviously,
> > > since we don't have full information (source / destination ports for
> > > example)
> >
> > Can we not do something like:
> >
> > a = src_ip_addr;
> > b = dst_ip_addr;
> >
> > if (ip_is_fragment(ip_hdr(skb)))
> > 	c = ip_hdr->id;
> > else
> > 	c = src_port | dest_port ; /* port_32 etc - Similar to what we have today */
> >
> > /* swap a/b etc */
> > jhash3_words(a,b,c);
> >
> >
> >
> 
> Sure, but non fragmented packets will then get a different rxhash.
> 
> Remember, goal is that _all_ packets of a given flow end in same queue.
> 
> 

Sure, a lookup is needed (to steer what I call hot/cold flows); I proposed one on the OISF mailing list. So should we always use the ip_id then? Another problem that needs to be solved: what if some decoders are overloaded? How will this scheme cope, and how will we utilize the other CPUs? RPS is needed for sure.
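
To make Eric's objection concrete, here is a minimal sketch assuming IPv4 and
a direction-independent hash in the spirit of the "swap a/b etc" comment quoted
above. flow_hash() and mix3() are hypothetical; mix3() is an FNV-style mixer
over 32-bit words standing in for the kernel's jhash_3words(), and the ports
are packed into one word rather than OR-ed. The third word is the IP ID for a
fragment but the port pair otherwise, so fragments and unfragmented packets of
the same flow hash from different inputs.

```c
#include <stdbool.h>
#include <stdint.h>

/* FNV-style mixing over three 32-bit words (not byte-wise FNV-1a). */
static uint32_t mix3(uint32_t a, uint32_t b, uint32_t c)
{
	uint32_t h = 2166136261u;       /* FNV offset basis */
	uint32_t w[3] = { a, b, c };

	for (int i = 0; i < 3; i++) {
		h ^= w[i];
		h *= 16777619u;         /* FNV prime (odd, so invertible mod 2^32) */
	}
	return h;
}

static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
			  uint16_t sport, uint16_t dport,
			  bool is_fragment, uint16_t ip_id)
{
	uint32_t c;

	/* Order the endpoints so both directions of a flow hash alike. */
	if (saddr > daddr) {
		uint32_t ta = saddr; saddr = daddr; daddr = ta;
		uint16_t tp = sport; sport = dport; dport = tp;
	}
	/* The proposal: IP ID for fragments, ports for everything else. */
	c = is_fragment ? ip_id : ((uint32_t)sport << 16 | dport);
	return mix3(saddr, daddr, c);
}
```

Queue selection would then be something like flow_hash(...) % nqueues, so the
fragments and the unfragmented packets of one flow can end up in different
queues, which is exactly what the fanout goal rules out.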

If we maintain i) a per-port lookup table, ii) 2^20 flows per table and iii) 16 bytes per flow (one can also squeeze it down to 8 bytes), then we will need around 16MB (2^20 x 16 bytes) of memory per port. That is not a huge memory pressure for folks who want to use Linux for IPS/IDS sort of work.
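
A quick check of the sizing arithmetic, assuming one table per port, exactly
2^20 entries and no load-factor or allocator overhead (none of which the mail
spells out): 2^20 x 16 bytes is 16 MiB, and 2^20 x 8 bytes is 8 MiB. A table
kept at, say, a 50% load factor would double these figures.

```c
#include <stdint.h>

/*
 * Bytes for one flow table: 2^log2_flows entries of bytes_per_flow each.
 * Ignores hash-table slack and allocator overhead (an assumption).
 */
static uint64_t flow_table_bytes(unsigned int log2_flows, unsigned int bytes_per_flow)
{
	return ((uint64_t)1 << log2_flows) * bytes_per_flow;
}
```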

User-space decoders end up copying the packet anyway, so fanout can be implemented in user space to achieve effective CPU utilization.
As long as we don't bounce across CPU sockets, we should be OK.

Chetan Loke

^ permalink raw reply

* Re: [PATCH v2 1/3] dt/net: add helper function of_get_phy_mode
From: Grant Likely @ 2011-07-05 17:35 UTC (permalink / raw)
  To: Shawn Guo
  Cc: netdev, linux-arm-kernel, devicetree-discuss, patches,
	David Miller
In-Reply-To: <1309878839-25743-2-git-send-email-shawn.guo@linaro.org>

On Tue, Jul 5, 2011 at 9:13 AM, Shawn Guo <shawn.guo@linaro.org> wrote:
> It adds the helper function of_get_phy_mode getting phy interface
> from device tree.
>
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
> Cc: Grant Likely <grant.likely@secretlab.ca>

Acked-by: Grant Likely <grant.likely@secretlab.ca>

Dave, this should probably get merged via your tree since the rest of
this series depends on it.

g.

> ---
>  drivers/of/of_net.c    |   43 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/of_net.h |    1 +
>  2 files changed, 44 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/of/of_net.c b/drivers/of/of_net.c
> index 86f334a..cc117db 100644
> --- a/drivers/of/of_net.c
> +++ b/drivers/of/of_net.c
> @@ -8,6 +8,49 @@
>  #include <linux/etherdevice.h>
>  #include <linux/kernel.h>
>  #include <linux/of_net.h>
> +#include <linux/phy.h>
> +
> +/**
> + * It maps 'enum phy_interface_t' found in include/linux/phy.h
> + * into the device tree binding of 'phy-mode', so that Ethernet
> + * device driver can get phy interface from device tree.
> + */
> +static const char *phy_modes[] = {
> +       [PHY_INTERFACE_MODE_MII]        = "mii",
> +       [PHY_INTERFACE_MODE_GMII]       = "gmii",
> +       [PHY_INTERFACE_MODE_SGMII]      = "sgmii",
> +       [PHY_INTERFACE_MODE_TBI]        = "tbi",
> +       [PHY_INTERFACE_MODE_RMII]       = "rmii",
> +       [PHY_INTERFACE_MODE_RGMII]      = "rgmii",
> +       [PHY_INTERFACE_MODE_RGMII_ID]   = "rgmii-id",
> +       [PHY_INTERFACE_MODE_RGMII_RXID] = "rgmii-rxid",
> +       [PHY_INTERFACE_MODE_RGMII_TXID] = "rgmii-txid",
> +       [PHY_INTERFACE_MODE_RTBI]       = "rtbi",
> +};
> +
> +/**
> + * of_get_phy_mode - Get phy mode for given device_node
> + * @np:        Pointer to the given device_node
> + *
> + * The function gets phy interface string from property 'phy-mode',
> + * and return its index in phy_modes table, or errno in error case.
> + */
> +const int of_get_phy_mode(struct device_node *np)
> +{
> +       const char *pm;
> +       int err, i;
> +
> +       err = of_property_read_string(np, "phy-mode", &pm);
> +       if (err < 0)
> +               return err;
> +
> +       for (i = 0; i < ARRAY_SIZE(phy_modes); i++)
> +               if (!strcasecmp(pm, phy_modes[i]))
> +                       return i;
> +
> +       return -ENODEV;
> +}
> +EXPORT_SYMBOL_GPL(of_get_phy_mode);
>
>  /**
>  * Search the device tree for the best MAC address to use.  'mac-address' is
> diff --git a/include/linux/of_net.h b/include/linux/of_net.h
> index e913081..f474641 100644
> --- a/include/linux/of_net.h
> +++ b/include/linux/of_net.h
> @@ -9,6 +9,7 @@
>
>  #ifdef CONFIG_OF_NET
>  #include <linux/of.h>
> +extern const int of_get_phy_mode(struct device_node *np);
>  extern const void *of_get_mac_address(struct device_node *np);
>  #endif
>
> --
> 1.7.4.1
>
>
>
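
Since of_get_phy_mode() returns the array index, which is also the
phy_interface_t value, the helper amounts to a case-insensitive table lookup.
A minimal user-space sketch, assuming a copy of the string table from this
patch; the demo_ names are hypothetical, and the device-tree property read is
reduced to a plain string argument (a NULL string stands in for a missing
property and returns -EINVAL, an assumption of this sketch). Because the table
uses designated initializers keyed by the enum, a new mode only needs a
matching entry.

```c
#include <errno.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical mirror of phy_interface_t as of this patch. */
enum demo_phy_interface {
	DEMO_MODE_MII, DEMO_MODE_GMII, DEMO_MODE_SGMII, DEMO_MODE_TBI,
	DEMO_MODE_RMII, DEMO_MODE_RGMII, DEMO_MODE_RGMII_ID,
	DEMO_MODE_RGMII_RXID, DEMO_MODE_RGMII_TXID, DEMO_MODE_RTBI,
};

static const char *const demo_phy_modes[] = {
	[DEMO_MODE_MII]        = "mii",
	[DEMO_MODE_GMII]       = "gmii",
	[DEMO_MODE_SGMII]      = "sgmii",
	[DEMO_MODE_TBI]        = "tbi",
	[DEMO_MODE_RMII]       = "rmii",
	[DEMO_MODE_RGMII]      = "rgmii",
	[DEMO_MODE_RGMII_ID]   = "rgmii-id",
	[DEMO_MODE_RGMII_RXID] = "rgmii-rxid",
	[DEMO_MODE_RGMII_TXID] = "rgmii-txid",
	[DEMO_MODE_RTBI]       = "rtbi",
};

/* Returns the enum value matching 'pm' (case-insensitive), or a -errno. */
static int demo_phy_mode_from_string(const char *pm)
{
	if (!pm)
		return -EINVAL; /* models a missing "phy-mode" property */

	for (size_t i = 0; i < sizeof(demo_phy_modes) / sizeof(demo_phy_modes[0]); i++)
		if (demo_phy_modes[i] && !strcasecmp(pm, demo_phy_modes[i]))
			return (int)i;

	return -ENODEV; /* property present but not a known mode */
}
```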



-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

^ permalink raw reply

* Re: [PATCH v2 2/3] net: ibm_newemac: convert it to use of_get_phy_mode
From: Grant Likely @ 2011-07-05 17:38 UTC (permalink / raw)
  To: Shawn Guo
  Cc: netdev, linux-arm-kernel, devicetree-discuss, patches,
	David S. Miller
In-Reply-To: <1309878839-25743-3-git-send-email-shawn.guo@linaro.org>

On Tue, Jul 5, 2011 at 9:13 AM, Shawn Guo <shawn.guo@linaro.org> wrote:
> The patch extends 'enum phy_interface_t' and of_get_phy_mode a little
> bit with PHY_INTERFACE_MODE_NA and PHY_INTERFACE_MODE_SMII added,
> and then converts ibm_newemac net driver to use of_get_phy_mode
> getting phy mode from device tree.
>
> It also resolves the namespace conflict on phy_read/write between
> common mdiobus interface and ibm_newemac private one.
>
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Grant Likely <grant.likely@secretlab.ca>

I'm okay with this, but I don't know whether it is a good idea to add the
new PHY_INTERFACE_MODE defines (though I cannot think of a reason not
to).  I'll let someone else comment on that.

Acked-by: Grant Likely <grant.likely@secretlab.ca>


> ---
>  drivers/net/ibm_newemac/core.c |   33 ++++-----------------------------
>  drivers/net/ibm_newemac/emac.h |   19 ++++++++++---------
>  drivers/net/ibm_newemac/phy.c  |    7 +++++--
>  drivers/of/of_net.c            |    2 ++
>  include/linux/phy.h            |    4 +++-
>  5 files changed, 24 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/net/ibm_newemac/core.c b/drivers/net/ibm_newemac/core.c
> index 725399e..70cb7d8 100644
> --- a/drivers/net/ibm_newemac/core.c
> +++ b/drivers/net/ibm_newemac/core.c
> @@ -39,6 +39,7 @@
>  #include <linux/bitops.h>
>  #include <linux/workqueue.h>
>  #include <linux/of.h>
> +#include <linux/of_net.h>
>  #include <linux/slab.h>
>
>  #include <asm/processor.h>
> @@ -2506,18 +2507,6 @@ static int __devinit emac_init_config(struct emac_instance *dev)
>  {
>        struct device_node *np = dev->ofdev->dev.of_node;
>        const void *p;
> -       unsigned int plen;
> -       const char *pm, *phy_modes[] = {
> -               [PHY_MODE_NA] = "",
> -               [PHY_MODE_MII] = "mii",
> -               [PHY_MODE_RMII] = "rmii",
> -               [PHY_MODE_SMII] = "smii",
> -               [PHY_MODE_RGMII] = "rgmii",
> -               [PHY_MODE_TBI] = "tbi",
> -               [PHY_MODE_GMII] = "gmii",
> -               [PHY_MODE_RTBI] = "rtbi",
> -               [PHY_MODE_SGMII] = "sgmii",
> -       };
>
>        /* Read config from device-tree */
>        if (emac_read_uint_prop(np, "mal-device", &dev->mal_ph, 1))
> @@ -2566,23 +2555,9 @@ static int __devinit emac_init_config(struct emac_instance *dev)
>                dev->mal_burst_size = 256;
>
>        /* PHY mode needs some decoding */
> -       dev->phy_mode = PHY_MODE_NA;
> -       pm = of_get_property(np, "phy-mode", &plen);
> -       if (pm != NULL) {
> -               int i;
> -               for (i = 0; i < ARRAY_SIZE(phy_modes); i++)
> -                       if (!strcasecmp(pm, phy_modes[i])) {
> -                               dev->phy_mode = i;
> -                               break;
> -                       }
> -       }
> -
> -       /* Backward compat with non-final DT */
> -       if (dev->phy_mode == PHY_MODE_NA && pm != NULL && plen == 4) {
> -               u32 nmode = *(const u32 *)pm;
> -               if (nmode > PHY_MODE_NA && nmode <= PHY_MODE_SGMII)
> -                       dev->phy_mode = nmode;
> -       }
> +       dev->phy_mode = of_get_phy_mode(np);
> +       if (dev->phy_mode < 0)
> +               dev->phy_mode = PHY_MODE_NA;
>
>        /* Check EMAC version */
>        if (of_device_is_compatible(np, "ibm,emac4sync")) {
> diff --git a/drivers/net/ibm_newemac/emac.h b/drivers/net/ibm_newemac/emac.h
> index 8a61b597..1568278 100644
> --- a/drivers/net/ibm_newemac/emac.h
> +++ b/drivers/net/ibm_newemac/emac.h
> @@ -26,6 +26,7 @@
>  #define __IBM_NEWEMAC_H
>
>  #include <linux/types.h>
> +#include <linux/phy.h>
>
>  /* EMAC registers                      Write Access rules */
>  struct emac_regs {
> @@ -106,15 +107,15 @@ struct emac_regs {
>  /*
>  * PHY mode settings (EMAC <-> ZMII/RGMII bridge <-> PHY)
>  */
> -#define PHY_MODE_NA    0
> -#define PHY_MODE_MII   1
> -#define PHY_MODE_RMII  2
> -#define PHY_MODE_SMII  3
> -#define PHY_MODE_RGMII 4
> -#define PHY_MODE_TBI   5
> -#define PHY_MODE_GMII  6
> -#define PHY_MODE_RTBI  7
> -#define PHY_MODE_SGMII 8
> +#define PHY_MODE_NA    PHY_INTERFACE_MODE_NA
> +#define PHY_MODE_MII   PHY_INTERFACE_MODE_MII
> +#define PHY_MODE_RMII  PHY_INTERFACE_MODE_RMII
> +#define PHY_MODE_SMII  PHY_INTERFACE_MODE_SMII
> +#define PHY_MODE_RGMII PHY_INTERFACE_MODE_RGMII
> +#define PHY_MODE_TBI   PHY_INTERFACE_MODE_TBI
> +#define PHY_MODE_GMII  PHY_INTERFACE_MODE_GMII
> +#define PHY_MODE_RTBI  PHY_INTERFACE_MODE_RTBI
> +#define PHY_MODE_SGMII PHY_INTERFACE_MODE_SGMII
>
>  /* EMACx_MR0 */
>  #define EMAC_MR0_RXI                   0x80000000
> diff --git a/drivers/net/ibm_newemac/phy.c b/drivers/net/ibm_newemac/phy.c
> index ac9d964..ab4e596 100644
> --- a/drivers/net/ibm_newemac/phy.c
> +++ b/drivers/net/ibm_newemac/phy.c
> @@ -28,12 +28,15 @@
>  #include "emac.h"
>  #include "phy.h"
>
> -static inline int phy_read(struct mii_phy *phy, int reg)
> +#define phy_read _phy_read
> +#define phy_write _phy_write
> +
> +static inline int _phy_read(struct mii_phy *phy, int reg)
>  {
>        return phy->mdio_read(phy->dev, phy->address, reg);
>  }
>
> -static inline void phy_write(struct mii_phy *phy, int reg, int val)
> +static inline void _phy_write(struct mii_phy *phy, int reg, int val)
>  {
>        phy->mdio_write(phy->dev, phy->address, reg, val);
>  }
> diff --git a/drivers/of/of_net.c b/drivers/of/of_net.c
> index cc117db..bb18471 100644
> --- a/drivers/of/of_net.c
> +++ b/drivers/of/of_net.c
> @@ -16,6 +16,7 @@
>  * device driver can get phy interface from device tree.
>  */
>  static const char *phy_modes[] = {
> +       [PHY_INTERFACE_MODE_NA]         = "",
>        [PHY_INTERFACE_MODE_MII]        = "mii",
>        [PHY_INTERFACE_MODE_GMII]       = "gmii",
>        [PHY_INTERFACE_MODE_SGMII]      = "sgmii",
> @@ -26,6 +27,7 @@ static const char *phy_modes[] = {
>        [PHY_INTERFACE_MODE_RGMII_RXID] = "rgmii-rxid",
>        [PHY_INTERFACE_MODE_RGMII_TXID] = "rgmii-txid",
>        [PHY_INTERFACE_MODE_RTBI]       = "rtbi",
> +       [PHY_INTERFACE_MODE_SMII]       = "smii",
>  };
>
>  /**
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 7da5fa8..1622081 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -53,6 +53,7 @@
>
>  /* Interface Mode definitions */
>  typedef enum {
> +       PHY_INTERFACE_MODE_NA,
>        PHY_INTERFACE_MODE_MII,
>        PHY_INTERFACE_MODE_GMII,
>        PHY_INTERFACE_MODE_SGMII,
> @@ -62,7 +63,8 @@ typedef enum {
>        PHY_INTERFACE_MODE_RGMII_ID,
>        PHY_INTERFACE_MODE_RGMII_RXID,
>        PHY_INTERFACE_MODE_RGMII_TXID,
> -       PHY_INTERFACE_MODE_RTBI
> +       PHY_INTERFACE_MODE_RTBI,
> +       PHY_INTERFACE_MODE_SMII,
>  } phy_interface_t;
>
>
> --
> 1.7.4.1
>
>
>



-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

^ permalink raw reply

* Re: [PATCH v2 3/3] net/fec: add device tree probe support
From: Grant Likely @ 2011-07-05 17:42 UTC (permalink / raw)
  To: Shawn Guo
  Cc: netdev, linux-arm-kernel, devicetree-discuss, patches, Jason Liu,
	David S. Miller
In-Reply-To: <1309878839-25743-4-git-send-email-shawn.guo@linaro.org>

On Tue, Jul 5, 2011 at 9:13 AM, Shawn Guo <shawn.guo@linaro.org> wrote:
> It adds device tree probe support for fec driver.
>
> Signed-off-by: Jason Liu <jason.hui@linaro.org>
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Grant Likely <grant.likely@secretlab.ca>
> Acked-by: Grant Likely <grant.likely@secretlab.ca>

Minor comments below, but my Acked-by above still stands.

g.

> ---
>  Documentation/devicetree/bindings/net/fsl-fec.txt |   24 +++++
>  drivers/net/fec.c                                 |   99 +++++++++++++++++++-
>  2 files changed, 118 insertions(+), 5 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/net/fsl-fec.txt
>
> diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt b/Documentation/devicetree/bindings/net/fsl-fec.txt
> new file mode 100644
> index 0000000..1dad888
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/fsl-fec.txt
> @@ -0,0 +1,24 @@
> +* Freescale Fast Ethernet Controller (FEC)
> +
> +Required properties:
> +- compatible : Should be "fsl,<soc>-fec"
> +- reg : Address and length of the register set for the device
> +- interrupts : Should contain fec interrupt
> +- phy-mode : String, operation mode of the PHY interface.
> +  Supported values are: "mii", "gmii", "sgmii", "tbi", "rmii",
> +  "rgmii", "rgmii-id", "rgmii-rxid", "rgmii-txid", "rtbi".
> +- gpios : Should specify the gpio for phy reset

Let's be explicit for the gpio property: phy-reset-gpios.

> +
> +Optional properties:
> +- local-mac-address : 6 bytes, mac address
> +
> +Example:
> +
> +fec@83fec000 {
> +       compatible = "fsl,imx51-fec", "fsl,imx27-fec";
> +       reg = <0x83fec000 0x4000>;
> +       interrupts = <87>;
> +       phy-mode = "mii";
> +       gpios = <&gpio1 14 0>; /* phy-reset, GPIO2_14 */
> +       local-mac-address = [00 04 9F 01 1B B9];
> +};
> diff --git a/drivers/net/fec.c b/drivers/net/fec.c
> index 7ae3f28..dec94f4 100644
> --- a/drivers/net/fec.c
> +++ b/drivers/net/fec.c
> @@ -44,6 +44,10 @@
>  #include <linux/platform_device.h>
>  #include <linux/phy.h>
>  #include <linux/fec.h>
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +#include <linux/of_gpio.h>
> +#include <linux/of_net.h>
>
>  #include <asm/cacheflush.h>
>
> @@ -78,6 +82,17 @@ static struct platform_device_id fec_devtype[] = {
>        { }
>  };
>
> +enum fec_type {
> +       IMX27_FEC,
> +       IMX28_FEC,
> +};
> +
> +static const struct of_device_id fec_dt_ids[] = {
> +       { .compatible = "fsl,imx27-fec", .data = &fec_devtype[IMX27_FEC], },
> +       { .compatible = "fsl,imx28-fec", .data = &fec_devtype[IMX28_FEC], },
> +       { /* sentinel */ }
> +};
> +
>  static unsigned char macaddr[ETH_ALEN];
>  module_param_array(macaddr, byte, NULL, 0);
>  MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
> @@ -734,8 +749,22 @@ static void __inline__ fec_get_mac(struct net_device *ndev)
>         */
>        iap = macaddr;
>
> +#ifdef CONFIG_OF
> +       /*
> +        * 2) from device tree data
> +        */
> +       if (!is_valid_ether_addr(iap)) {
> +               struct device_node *np = fep->pdev->dev.of_node;
> +               if (np) {
> +                       const char *mac = of_get_mac_address(np);
> +                       if (mac)
> +                               iap = (unsigned char *) mac;
> +               }
> +       }
> +#endif
> +
>        /*
> -        * 2) from flash or fuse (via platform data)
> +        * 3) from flash or fuse (via platform data)
>         */
>        if (!is_valid_ether_addr(iap)) {
>  #ifdef CONFIG_M5272
> @@ -748,7 +777,7 @@ static void __inline__ fec_get_mac(struct net_device *ndev)
>        }
>
>        /*
> -        * 3) FEC mac registers set by bootloader
> +        * 4) FEC mac registers set by bootloader
>         */
>        if (!is_valid_ether_addr(iap)) {
>                *((unsigned long *) &tmpaddr[0]) =
> @@ -1358,6 +1387,53 @@ static int fec_enet_init(struct net_device *ndev)
>        return 0;
>  }
>
> +#ifdef CONFIG_OF
> +static int __devinit fec_get_phy_mode_dt(struct platform_device *pdev)
> +{
> +       struct device_node *np = pdev->dev.of_node;
> +
> +       if (np)
> +               return of_get_phy_mode(np);
> +
> +       return -ENODEV;
> +}
> +
> +static int __devinit fec_reset_phy(struct platform_device *pdev)
> +{
> +       int err, phy_reset;
> +       struct device_node *np = pdev->dev.of_node;
> +
> +       if (!np)
> +               return -ENODEV;
> +
> +       phy_reset = of_get_gpio(np, 0);
> +       err = gpio_request_one(phy_reset, GPIOF_OUT_INIT_LOW, "phy-reset");
> +       if (err) {
> +               pr_warn("FEC: failed to get gpio phy-reset: %d\n", err);
> +               return err;
> +       }
> +
> +       msleep(1);
> +       gpio_set_value(phy_reset, 1);
> +
> +       return 0;
> +}
> +#else /* CONFIG_OF */
> +static inline int fec_get_phy_mode_dt(struct platform_device *pdev)
> +{
> +       return -ENODEV;
> +}
> +
> +static inline int fec_reset_phy(struct platform_device *pdev)
> +{
> +       /*
> +        * In case of platform probe, the reset has been done
> +        * by machine code.
> +        */
> +       return 0;

Perhaps platform code should be reworked to use GPIO also.

> +}
> +#endif /* CONFIG_OF */
> +
>  static int __devinit
>  fec_probe(struct platform_device *pdev)
>  {
> @@ -1366,6 +1442,11 @@ fec_probe(struct platform_device *pdev)
>        struct net_device *ndev;
>        int i, irq, ret = 0;
>        struct resource *r;
> +       const struct of_device_id *of_id;
> +
> +       of_id = of_match_device(fec_dt_ids, &pdev->dev);
> +       if (of_id)
> +               pdev->id_entry = of_id->data;
>
>        r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>        if (!r)
> @@ -1397,9 +1478,16 @@ fec_probe(struct platform_device *pdev)
>
>        platform_set_drvdata(pdev, ndev);
>
> -       pdata = pdev->dev.platform_data;
> -       if (pdata)
> -               fep->phy_interface = pdata->phy;
> +       fep->phy_interface = fec_get_phy_mode_dt(pdev);
> +       if (fep->phy_interface < 0) {
> +               pdata = pdev->dev.platform_data;
> +               if (pdata)
> +                       fep->phy_interface = pdata->phy;
> +               else
> +                       fep->phy_interface = PHY_INTERFACE_MODE_MII;
> +       }
> +
> +       fec_reset_phy(pdev);
>
>        /* This device has up to three irqs on some platforms */
>        for (i = 0; i < 3; i++) {
> @@ -1534,6 +1622,7 @@ static struct platform_driver fec_driver = {
>  #ifdef CONFIG_PM
>                .pm     = &fec_pm_ops,
>  #endif
> +               .of_match_table = fec_dt_ids,
>        },
>        .id_table = fec_devtype,
>        .probe  = fec_probe,
> --
> 1.7.4.1
>
>
>
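
With this patch, fec_get_mac() tries each source in turn (module parameter,
device tree, platform data, bootloader-programmed registers) and stops at the
first address that passes is_valid_ether_addr(). A minimal user-space sketch of
that priority walk, assuming each source has already been read into a 6-byte
buffer; demo_pick_mac() and demo_valid_ether_addr() are hypothetical helpers,
with the validity rule mirroring the kernel's (not multicast, not all zeros).

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ETH_ALEN 6

/* Valid unicast MAC: not all zeros and multicast bit (LSB of octet 0) clear. */
static bool demo_valid_ether_addr(const unsigned char *a)
{
	bool nonzero = false;

	for (size_t i = 0; i < DEMO_ETH_ALEN; i++)
		nonzero |= (a[i] != 0);
	return nonzero && !(a[0] & 0x01);
}

/*
 * Return the index of the first valid address among the candidate sources,
 * in priority order, or -1 if none is usable. NULL models an absent source.
 */
static int demo_pick_mac(const unsigned char *const sources[], size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (sources[i] && demo_valid_ether_addr(sources[i]))
			return (int)i;
	return -1;
}
```

For example, an unset module parameter (all zeros) followed by the binding's
example local-mac-address [00 04 9F 01 1B B9] selects the device-tree entry.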



-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

^ permalink raw reply

* [PATCH] b44: Use pr_<level>_once and DRV_DESCRIPTION
From: Joe Perches @ 2011-07-05 17:43 UTC (permalink / raw)
  To: Gary Zambrano; +Cc: netdev, linux-kernel

Convert a printk guarded by a static counter to pr_<level>_once.
Add and use DRV_DESCRIPTION to reduce string duplication.
Remove the now-unused version string.

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/net/b44.c |   14 ++++----------
 1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index cced4fd..6acf73d 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -39,6 +39,7 @@
 
 #define DRV_MODULE_NAME		"b44"
 #define DRV_MODULE_VERSION	"2.0"
+#define DRV_DESCRIPTION		"Broadcom 44xx/47xx 10/100 PCI ethernet driver"
 
 #define B44_DEF_MSG_ENABLE	  \
 	(NETIF_MSG_DRV		| \
@@ -91,11 +92,8 @@
 #define B44_ETHIPV6UDP_HLEN	62
 #define B44_ETHIPV4UDP_HLEN	42
 
-static char version[] __devinitdata =
-	DRV_MODULE_NAME ".c:v" DRV_MODULE_VERSION "\n";
-
 MODULE_AUTHOR("Felix Fietkau, Florian Schirmer, Pekka Pietikainen, David S. Miller");
-MODULE_DESCRIPTION("Broadcom 44xx/47xx 10/100 PCI ethernet driver");
+MODULE_DESCRIPTION(DRV_DESCRIPTION);
 MODULE_LICENSE("GPL");
 MODULE_VERSION(DRV_MODULE_VERSION);
 
@@ -2130,16 +2128,13 @@ static const struct net_device_ops b44_netdev_ops = {
 static int __devinit b44_init_one(struct ssb_device *sdev,
 				  const struct ssb_device_id *ent)
 {
-	static int b44_version_printed = 0;
 	struct net_device *dev;
 	struct b44 *bp;
 	int err;
 
 	instance++;
 
-	if (b44_version_printed++ == 0)
-		pr_info("%s", version);
-
+	pr_info_once("%s version %s\n", DRV_DESCRIPTION, DRV_MODULE_VERSION);
 
 	dev = alloc_etherdev(sizeof(*bp));
 	if (!dev) {
@@ -2225,8 +2220,7 @@ static int __devinit b44_init_one(struct ssb_device *sdev,
 	if (b44_phy_reset(bp) < 0)
 		bp->phy_addr = B44_PHY_ADDR_NO_PHY;
 
-	netdev_info(dev, "Broadcom 44xx/47xx 10/100BaseT Ethernet %pM\n",
-		    dev->dev_addr);
+	netdev_info(dev, "%s %pM\n", DRV_DESCRIPTION, dev->dev_addr);
 
 	return 0;
 
-- 
1.7.6.131.g99019
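
The patch replaces a hand-rolled "static int b44_version_printed" counter with
pr_info_once(). A user-space analogue of the once-per-call-site behaviour, as a
sketch: demo_print_once and demo_init_one are hypothetical names, and output
goes to stdout instead of the kernel log.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts emitted lines so the once-only behaviour can be checked. */
static int demo_lines_printed;

/*
 * Each expansion site gets its own static flag, so the message appears at
 * most once per call site. The flag is a plain bool, so two racing first
 * calls could both print.
 */
#define demo_print_once(...)				\
	do {						\
		static bool demo_printed_;		\
		if (!demo_printed_) {			\
			demo_printed_ = true;		\
			demo_lines_printed++;		\
			printf(__VA_ARGS__);		\
		}					\
	} while (0)

#define DEMO_DRV_DESCRIPTION "Broadcom 44xx/47xx 10/100 PCI ethernet driver"
#define DEMO_DRV_VERSION "2.0"

static int demo_probe_calls;

/* Stand-in for b44_init_one(): runs once per probed device. */
static void demo_init_one(void)
{
	demo_probe_calls++;
	demo_print_once("%s version %s\n", DEMO_DRV_DESCRIPTION, DEMO_DRV_VERSION);
}
```

Probing three devices calls demo_init_one() three times but prints the banner
only once.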

^ permalink raw reply related

* RE: [RFC] non-preemptible kernel socket for RAMster
From: Loke, Chetan @ 2011-07-05 17:52 UTC (permalink / raw)
  To: Dan Magenheimer, netdev; +Cc: Konrad Wilk, linux-mm
In-Reply-To: <6147447c-ecab-43ea-9b4a-1ff64b2089f0@default>

> -----Original Message-----
> From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> Sent: July 05, 2011 1:25 PM
> To: Loke, Chetan; netdev@vger.kernel.org
> Cc: Konrad Wilk; linux-mm
> Subject: RE: [RFC] non-preemptible kernel socket for RAMster
> 
> > From: Loke, Chetan [mailto:Chetan.Loke@netscout.com]
> > Sent: Tuesday, July 05, 2011 10:37 AM
> > To: Dan Magenheimer; netdev@vger.kernel.org
> > Cc: Konrad Wilk; linux-mm
> > Subject: RE: [RFC] non-preemptible kernel socket for RAMster
> >
> > > In working on a kernel project called RAMster* (where RAM on a
> > > remote system may be used for clean page cache pages and for swap
> > > pages), I found I have need for a kernel socket to be used when
> >
> > How is RAMster+swap different than NBD's (pending etc?) support for
> > SWAP over NBD?
> 
> Hi Chetan --
> 
> Thanks for your question.
> 
> I may be ignorant of details about NBD, but did some quick
> research using google.  If I understand correctly, swap over
> NBD is still writing to a configured swap disk on the remote

Hi - I thought the NBD server needs a backing store (a file).
Now, the file itself could reside on a RAM drive or a disk drive, etc.,
so a remote NBD (disk or RAM) can be mounted locally as a swap
device. The local client would still see it as a block device.

I haven't used the RAM-drive feature myself, but you may want to check
whether it works, or even borrow that logic for your code.


> machine.  RAMster is swapping to *RAM* on the remote machine.
> The idea is that most machines are very overprovisioned in
> RAM, and are rarely using all of their RAM, especially when
> a machine is (mostly) idle.  In other words, the "max of
> the sums" of RAM usage on a group of machines is much lower
> than the "sum of the max" of RAM usage.
> 
> So if the network is sufficiently faster than disk for
> moving a page of data, RAMster provides a significant
> performance improvement.  OR RAMster may allow a significant
> reduction in the total amount of RAM across a data center.
> 
> The version of RAMster I am working on now is really
> a proof-of-concept that works over sockets, using the
> ocfs2 cluster layer.  One can easily envision a future
> "exo-fabric" which allows one machine to write to the
> RAM of another machine... for this future hardware,
> RAMster becomes much more interesting.
> 

Or you can also try scst-in-RAM mode (if you want to experiment with
different fabrics).


> Thanks,
> Dan

Thanks
Chetan Loke

^ permalink raw reply

* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Neil Horman @ 2011-07-05 18:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexey Zaytsev, Michael Büsch, Andrew Morton, netdev,
	Gary Zambrano, bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <1309884441.2271.34.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>

On Tue, Jul 05, 2011 at 06:47:21PM +0200, Eric Dumazet wrote:
> Le mardi 05 juillet 2011 à 12:42 -0400, Neil Horman a écrit :
> > On Tue, Jul 05, 2011 at 06:12:32PM +0200, Eric Dumazet wrote:
> 
> > > So all descriptors before prod are guaranteed to be ready for host
> > > consume... Fact that a dma access is running on 'next descriptor' should
> > > be irrelevant.
> > > 
> > But we handle more than one descriptor per b44_rx call - theres a while loop in
> > there where we do advance to the next descriptor.
> 
> Yes, but we advance up to 'prod', which is the very last safe
> descriptor.
> 
> If hardware advertises descriptor X being ready to be handled by host,
> while DMA on this X descriptor is not yet finished, this would be a
> really useless hardware ;)
> 

Something else just jumped out at me. During b44_open, we call b44_init_rings.
This function allocates bp->rx_pending skbs and iteratively puts them in the RX
DMA ring. bp->rx_pending is initialized to B44_DEF_RX_RING_PENDING, which is
defined as 200 (well under half of the 512 entries that the DMA ring actually
supports in hardware). This is normally OK, as subsequent calls to
b44_alloc_rx_skb will fill in entries in the ring as those skbs are consumed.
The problem, however, is that b44_alloc_rx_skb only sets the
DESC_CTRL_EOT bit in the descriptor of the 512th entry, indicating that the
hardware should wrap around and reset the index counter. If a large volume of
traffic is pushed through the adapter early on after initialization, or if the
CPU is busy during init, the ring buffer could fill up before additional
entries are added to it. The result would be that the DMA engine reaches the
end of the allocated descriptors, does not see an EOT bit set, and continues
on using unallocated descriptors.
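The theory can be modelled in a few lines, assuming (as the theory does) that nothing other than the EOT flag stops the DMA engine. `struct desc`, `DESC_EOT_FLAG`, `next_desc()`, `init_ring()` and `walks_into_unposted()` are hypothetical stand-ins, not the b44 definitions; the engine wraps only at a descriptor carrying the end-of-table flag, which sits on the last of the 512 slots regardless of how many were posted.

```c
#include <assert.h>
#include <stdbool.h>

#define RX_RING_SLOTS 512u
#define DESC_EOT_FLAG 0x1u   /* hypothetical stand-in for DESC_CTRL_EOT */

struct desc {
	unsigned int ctrl;
	bool posted;   /* hypothetical: host attached a buffer to this slot */
};

/* The engine wraps to slot 0 only at a descriptor carrying the EOT flag. */
static unsigned int next_desc(const struct desc *ring, unsigned int idx)
{
	return (ring[idx].ctrl & DESC_EOT_FLAG) ? 0 : idx + 1;
}

/* Post the first 'pending' slots; EOT goes on the last ring slot only,
 * independent of 'pending' (the behaviour described in the theory). */
static void init_ring(struct desc *ring, unsigned int pending)
{
	assert(pending <= RX_RING_SLOTS);
	for (unsigned int i = 0; i < RX_RING_SLOTS; i++) {
		ring[i].posted = i < pending;
		ring[i].ctrl = (i == RX_RING_SLOTS - 1) ? DESC_EOT_FLAG : 0;
	}
}

/* Walk 'n' descriptors from 'start' with no host refill and no other
 * limit on the engine; report whether it reaches a never-posted slot. */
static bool walks_into_unposted(const struct desc *ring,
				unsigned int start, unsigned int n)
{
	unsigned int idx = start;

	for (unsigned int i = 0; i < n; i++) {
		if (!ring[idx].posted)
			return true;
		idx = next_desc(ring, idx);
	}
	return false;
}
```

With 200 posted slots, the 201st step lands on an unposted descriptor; with all 512 posted, the engine wraps at slot 511 and never does. Whether the real NIC can take that 201st step at all is the open question.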

Just a theory, but it would be interesting to see whether the problem subsides if you
ensure that a full descriptor ring is allocated in b44_open.
Neil

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 3d247f3..1b58a7c 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -57,7 +57,7 @@
 #define B44_MAX_MTU			1500

 #define B44_RX_RING_SIZE		512
-#define B44_DEF_RX_RING_PENDING		200
+#define B44_DEF_RX_RING_PENDING		512
 #define B44_RX_RING_BYTES	(sizeof(struct dma_desc) * \
 				 B44_RX_RING_SIZE)
 #define B44_TX_RING_SIZE		512


* Re: [Bugme-new] [Bug 38102] New: BUG kmalloc-2048: Poison overwritten
From: Eric Dumazet @ 2011-07-05 18:13 UTC (permalink / raw)
  To: Neil Horman
  Cc: Alexey Zaytsev, Michael Büsch, Andrew Morton, netdev,
	Gary Zambrano, bugme-daemon, David S. Miller, Pekka Pietikainen,
	Florian Schirmer, Felix Fietkau, Michael Buesch
In-Reply-To: <20110705180650.GF2959@hmsreliant.think-freely.org>

On Tuesday, July 5, 2011 at 14:06 -0400, Neil Horman wrote:
> On Tue, Jul 05, 2011 at 06:47:21PM +0200, Eric Dumazet wrote:
> > On Tuesday, July 5, 2011 at 12:42 -0400, Neil Horman wrote:
> > > On Tue, Jul 05, 2011 at 06:12:32PM +0200, Eric Dumazet wrote:
> > 
> > > > > So all descriptors before prod are guaranteed to be ready for the host
> > > > > to consume... The fact that a DMA access is running on the 'next
> > > > > descriptor' should be irrelevant.
> > > > > 
> > > > But we handle more than one descriptor per b44_rx call; there's a while loop in
> > > > there where we do advance to the next descriptor.
> > 
> > Yes, but we advance only up to 'prod', which is the very last safe
> > descriptor.
> > 
> > If hardware advertised descriptor X as ready to be handled by the host
> > while DMA on descriptor X was not yet finished, that would be really
> > useless hardware ;)
>
> Something else just jumped out at me. During b44_open, we call b44_init_rings.
> This function allocates bp->rx_pending skbs and iteratively puts them in the RX
> DMA ring. bp->rx_pending is initialized to B44_DEF_RX_RING_PENDING, which is
> defined as 200 (well under half of the 512 entries that the DMA ring actually
> supports in hardware). This is normally OK, as subsequent calls to
> b44_alloc_rx_skb will fill in entries in the ring as those skbs are consumed.
> The problem, however, is that b44_alloc_rx_skb only sets the
> DESC_CTRL_EOT bit in the descriptor of the 512th entry, indicating that the
> hardware should wrap around and reset the index counter. If a large volume of
> traffic is pushed through the adapter early on after initialization, or if the
> CPU is busy during init, the ring buffer could fill up before additional
> entries are added to it. The result would be that the DMA engine reaches the
> end of the allocated descriptors, does not see an EOT bit set, and continues
> on using unallocated descriptors.
> 
> Just a theory, but it would be interesting to see whether the problem subsides if you
> ensure that a full descriptor ring is allocated in b44_open.
> Neil
>  
> diff --git a/drivers/net/b44.c b/drivers/net/b44.c
> index 3d247f3..1b58a7c 100644
> --- a/drivers/net/b44.c
> +++ b/drivers/net/b44.c
> @@ -57,7 +57,7 @@
>  #define B44_MAX_MTU			1500
>  
>  #define B44_RX_RING_SIZE		512
> -#define B44_DEF_RX_RING_PENDING		200
> +#define B44_DEF_RX_RING_PENDING		512
>  #define B44_RX_RING_BYTES	(sizeof(struct dma_desc) * \
>  				 B44_RX_RING_SIZE)
>  #define B44_TX_RING_SIZE		512

No.

Please take the time to read the driver again.

200 descriptors are set up, and the NIC is not allowed to use more than
200 of them (this limit is enforced through B44_DMARX_PTR).

We carefully advance this pointer after new descriptors are set up.
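The limit Eric describes can be sketched as a device that may only fill slots up to a host-advertised index. This is a simplified model under stated assumptions: `struct rx_model`, `post_idx`, `device_fill_one()` and `host_post()` are hypothetical, `post_idx` merely plays the role of B44_DMARX_PTR, and the register's real units and the exact points where the driver writes it are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define RX_SLOTS 512u

/* Hypothetical model: the device may fill descriptors from hw_idx up to,
 * but not including, post_idx (the limit the host last advertised). */
struct rx_model {
	unsigned int hw_idx;     /* next descriptor the device would fill */
	unsigned int post_idx;   /* first descriptor the device may NOT use */
};

/* Device side: fill one descriptor only if the host has posted it. */
static bool device_fill_one(struct rx_model *m)
{
	if (m->hw_idx == m->post_idx)
		return false;   /* nothing posted: the device must stall or drop */
	m->hw_idx = (m->hw_idx + 1) % RX_SLOTS;
	return true;
}

/* Host side: after setting up 'n' fresh descriptors, advance the limit. */
static void host_post(struct rx_model *m, unsigned int n)
{
	assert(n < RX_SLOTS);
	m->post_idx = (m->post_idx + n) % RX_SLOTS;
}
```

In this model, posting 200 descriptors lets the device fill exactly 200 and then stop, however much traffic arrives; it proceeds again only after the host posts more. That is why a missing EOT bit beyond slot 200 would not matter.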





* RE: [PATCH 2/2] packet: Add fanout support.
From: Eric Dumazet @ 2011-07-05 18:20 UTC (permalink / raw)
  To: Loke, Chetan; +Cc: Victor Julien, David Miller, netdev
In-Reply-To: <D3F292ADF945FB49B35E96C94C2061B91257D6E1@nsmail.netscout.com>

On Tuesday, July 5, 2011 at 13:35 -0400, Loke, Chetan wrote:

> Sure, a lookup is needed (to steer what I call Hot/Cold flows), and I
> proposed one on the oisf mailing list. Should we always use the ip_id bit,
> then? Another problem that needs to be solved: what if some decoders
> are overloaded? How will this scheme work then? How will we utilize
> other CPUs? RPS is needed for sure.
> 
> If we maintain (i) a per-port lookup table, (ii) 2^20 flows per table, and
> (iii) 16 bytes per flow (one can also squeeze it down to 8 bytes), then we
> will need around 32MB worth of memory per port. That's not huge memory
> pressure for folks who want to use Linux for IPS/IDS sorts of
> things.
> 
> User-space decoders end up copying the packet anyway, so fanout can
> be implemented in user space to achieve effective CPU utilization.
> As long as we don't bounce packets between CPU sockets, we should be OK.

This is the problem we want to address.

Going into user space to perform the fanout is what you already have
today: one socket, with one thread doing the fanout to worker threads.

David's patch is non-adaptive: it's a hash over N queues, with a fixed hash
function.
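To make "non-adaptive" concrete, here is a sketch of fixed-hash fanout: the queue is a pure function of the flow hash and the queue count, so a flow always lands on the same queue, however loaded that queue is. Assumptions: `struct flow_key`, `flow_hash()` (FNV-1a, swapped in as a stand-in) and `fanout_fixed()` are hypothetical; the patch itself works from the packet's rxhash, and this is not its code or its exact reduction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow key; the kernel would use skb->rxhash instead. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t  proto;
};

/* FNV-1a over the key's fields, each taken as 4 little-endian bytes.
 * Hashing field by field avoids feeding struct padding into the hash. */
static uint32_t flow_hash(const struct flow_key *k)
{
	const uint32_t words[5] = { k->saddr, k->daddr, k->sport, k->dport, k->proto };
	uint32_t h = 2166136261u;   /* FNV-1a 32-bit offset basis */

	for (int i = 0; i < 5; i++) {
		for (int b = 0; b < 4; b++) {
			h ^= (words[i] >> (8 * b)) & 0xffu;
			h *= 16777619u;     /* FNV-1a 32-bit prime */
		}
	}
	return h;
}

/* Fixed fanout: the result never depends on how busy each queue is. */
static unsigned int fanout_fixed(const struct flow_key *k, unsigned int nqueues)
{
	assert(nqueues > 0);
	return flow_hash(k) % nqueues;
}
```

Every packet of a flow goes to one queue, which is the property the thread requires. But if a few heavy flows happen to hash to the same queue, that decoder is overloaded and nothing moves them.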

What you want is to add another 'control queue' where new flows are
directed. Then the user application is able to feed back to the kernel flow
director the information "this flow should go to queue X".

Or, let the kernel do a mix of rxhash and load balancing: be able to
select a queue for a new flow without userland control, using a flow
hash table.
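The second option can be sketched as a flow table in which a known flow keeps its queue and only a new flow is placed, here on the currently least-loaded queue. All names (`struct flow_entry`, `flow_table_init()`, `flow_steer()`, `NQUEUES`) are hypothetical, the least-loaded rule is one assumed policy among many, and colliding hashes simply share an entry. The 16-byte entry matches the per-flow size quoted earlier. For 2^20 such entries the raw table is 2^20 × 16 B = 16 MiB per port. The "around 32MB" quoted above is twice that; it could, for instance, reflect a table sized at twice the flow count to keep probe chains short, but the original message does not say how it was derived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NQUEUES 4u

/* Hypothetical 16-byte flow entry. */
struct flow_entry {
	uint32_t hash;      /* flow hash (e.g. rxhash); 0 marks an empty slot */
	uint32_t queue;     /* queue this flow is pinned to */
	uint64_t packets;   /* per-flow packet counter */
};

struct flow_table {
	struct flow_entry *slots;
	size_t nslots;              /* must be a power of two */
	uint64_t load[NQUEUES];     /* packets steered to each queue so far */
};

static int flow_table_init(struct flow_table *t, size_t nslots)
{
	assert(nslots && (nslots & (nslots - 1)) == 0);
	t->slots = calloc(nslots, sizeof(*t->slots));
	t->nslots = nslots;
	for (unsigned int q = 0; q < NQUEUES; q++)
		t->load[q] = 0;
	return t->slots ? 0 : -1;
}

/* Known flow: stay on its queue. New flow: pick the least-loaded queue.
 * Open addressing with linear probing; if the table is full, fall back
 * to a fixed hash, as in non-adaptive fanout. */
static unsigned int flow_steer(struct flow_table *t, uint32_t hash)
{
	size_t mask = t->nslots - 1;

	if (hash == 0)
		hash = 1;   /* 0 is reserved for empty slots */
	for (size_t i = 0; i < t->nslots; i++) {
		struct flow_entry *e = &t->slots[(hash + i) & mask];

		if (e->hash == hash) {
			e->packets++;
			t->load[e->queue]++;
			return e->queue;
		}
		if (e->hash == 0) {
			unsigned int best = 0;

			for (unsigned int q = 1; q < NQUEUES; q++)
				if (t->load[q] < t->load[best])
					best = q;
			e->hash = hash;
			e->queue = best;
			e->packets = 1;
			t->load[best]++;
			return best;
		}
	}
	t->load[hash % NQUEUES]++;
	return hash % NQUEUES;
}
```

Unlike the fixed-hash case, placement adapts to load at the moment a flow first appears, while every later packet of that flow still reaches the same queue.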
