Netdev List
* [PATCH net-next-2.6] sch_choke: add choke_skb_cb
From: Eric Dumazet @ 2011-02-25  3:45 UTC
  To: David Miller; +Cc: netdev, Stephen Hemminger, Patrick McHardy

Better document choke's use of skb->cb[], as we did in netem and sfb.

This adds a compile-time check to make sure we don't exhaust skb->cb[]
space.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Patrick McHardy <kaber@trash.net>
---
 net/sched/sch_choke.c |   15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index ee1e209..06afbae 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -219,14 +219,25 @@ static bool choke_match_flow(struct sk_buff *skb1,
 	return *ports1 == *ports2;
 }
 
+struct choke_skb_cb {
+	u16 classid;
+};
+
+static inline struct choke_skb_cb *choke_skb_cb(const struct sk_buff *skb)
+{
+	BUILD_BUG_ON(sizeof(skb->cb) <
+		sizeof(struct qdisc_skb_cb) + sizeof(struct choke_skb_cb));
+	return (struct choke_skb_cb *)qdisc_skb_cb(skb)->data;
+}
+
 static inline void choke_set_classid(struct sk_buff *skb, u16 classid)
 {
-	*(unsigned int *)(qdisc_skb_cb(skb)->data) = classid;
+	choke_skb_cb(skb)->classid = classid;
 }
 
 static u16 choke_get_classid(const struct sk_buff *skb)
 {
-	return *(unsigned int *)(qdisc_skb_cb(skb)->data);
+	return choke_skb_cb(skb)->classid;
 }
 
 /*
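The accessor pattern in this patch can be tried outside the kernel. What follows is a minimal userspace sketch, not kernel code. mock_sk_buff, mock_qdisc_skb_cb and mock_choke_skb_cb are hypothetical stand-ins. The 48-byte cb[] size is assumed to match kernels of this era, and C11 _Static_assert stands in for BUILD_BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sk_buff: only the 48-byte cb[] scratch
 * area is modelled, stored as longs so it is suitably aligned. */
struct mock_sk_buff {
	long cb[48 / sizeof(long)];
};

/* Generic qdisc header occupying the front of cb[] (illustrative layout) */
struct mock_qdisc_skb_cb {
	unsigned int pkt_len;
	long data[];		/* per-qdisc private area starts here */
};

/* Private layout owned by one qdisc, like struct choke_skb_cb */
struct mock_choke_skb_cb {
	uint16_t classid;
};

static struct mock_qdisc_skb_cb *mock_qdisc_skb_cb(struct mock_sk_buff *skb)
{
	return (struct mock_qdisc_skb_cb *)skb->cb;
}

static struct mock_choke_skb_cb *mock_choke_skb_cb(struct mock_sk_buff *skb)
{
	/* Compile-time check: the build fails if the qdisc header plus the
	 * private struct no longer fit inside cb[]. */
	_Static_assert(sizeof(((struct mock_sk_buff *)0)->cb) >=
		       sizeof(struct mock_qdisc_skb_cb) +
		       sizeof(struct mock_choke_skb_cb),
		       "private cb data does not fit in skb->cb");
	return (struct mock_choke_skb_cb *)mock_qdisc_skb_cb(skb)->data;
}

static void mock_set_classid(struct mock_sk_buff *skb, uint16_t classid)
{
	mock_choke_skb_cb(skb)->classid = classid;
}

static uint16_t mock_get_classid(struct mock_sk_buff *skb)
{
	return mock_choke_skb_cb(skb)->classid;
}
```

The typed struct documents exactly which bytes of cb[] the qdisc owns. If mock_choke_skb_cb later grows beyond the space left after the qdisc header, the build fails instead of silently overwriting neighbouring skb state.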

* Re: [Bugme-new] [Bug 29712] New: Bonding Driver(version : 3.5.0) - Problem with ARP monitoring in active backup mode
From: Brian Haley @ 2011-02-25  3:42 UTC
  To: Andrew Morton
  Cc: harsha.r02, bugzilla-daemon, bugme-daemon, netdev, Jay Vosburgh
In-Reply-To: <20110224145129.f366b59e.akpm@linux-foundation.org>

On 02/24/2011 05:51 PM, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Wed, 23 Feb 2011 10:41:34 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
> 
>> https://bugzilla.kernel.org/show_bug.cgi?id=29712
>>
>>            Summary: Bonding Driver(version : 3.5.0) - Problem with ARP
>>                     monitoring in active backup mode
>>            Product: Drivers
>>            Version: 2.5
>>     Kernel Version: 2.6.32
> 
> That's a paleolithic kernel you have there.  This problem might have
> been fixed already.  Can you test a more recent kernel?

I can add some more info since I originally looked at the problem.  This
happens on 2.6.38 as well, and on this 2.6.32 kernel with a backported
3.7.0 bonding driver (with the primary_reselect option).  Harsha has a
prototype patch that's being tested, but wanted to log the bug to see
if one of the bonding maintainers had a better solution.

I'll let him respond as I'm now out of the loop...

Thanks,

-Brian

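Brian mentions the primary_reselect option. The bonding documentation describes three values for it. With "always", the default, the primary becomes active whenever it is up. With "better", it becomes active only if its speed and duplex beat those of the current active slave. With "failure", it becomes active only when the current active slave fails. The sketch below models just that decision under those assumptions. The struct, enum and function names are hypothetical, and the sketch does not explain why the ARP monitor in this report keeps eth2 down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the documented primary_reselect policies;
 * this is not the bonding driver's code. */
enum reselect_policy {
	RESELECT_ALWAYS,	/* default: primary takes over whenever it is up */
	RESELECT_BETTER,	/* only if primary's speed/duplex beat the active slave */
	RESELECT_FAILURE,	/* only when the current active slave has failed */
};

struct mock_slave {
	bool up;
	int speed;		/* Mb/s */
	bool full_duplex;
};

/* Should the bond move from 'curr' back to 'primary'? */
static bool should_switch_to_primary(enum reselect_policy policy,
				     const struct mock_slave *primary,
				     const struct mock_slave *curr)
{
	if (!primary->up)
		return false;		/* a down primary is never selected */
	if (!curr || !curr->up)
		return true;		/* active slave failed: primary takes over */

	switch (policy) {
	case RESELECT_ALWAYS:
		return true;
	case RESELECT_BETTER:
		return primary->speed > curr->speed ||
		       (primary->speed == curr->speed &&
			primary->full_duplex && !curr->full_duplex);
	case RESELECT_FAILURE:
		return false;
	}
	return false;
}
```

Under the default policy, step 3 of the report (eth2 up again, eth3 still active) would be expected to switch back to eth2 once the bond considers eth2 up.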

* [PATCH 1/3] ipvs: use hlist instead of list
From: Simon Horman @ 2011-02-25  2:43 UTC
  To: lvs-devel, netdev, netfilter-devel, netfilter
  Cc: Changli Gao, Wensong Zhang, Julian Anastasov, Patrick McHardy,
	Simon Horman
In-Reply-To: <1298601812-8168-1-git-send-email-horms@verge.net.au>

From: Changli Gao <xiaosuo@gmail.com>

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h             |    2 +-
 net/netfilter/ipvs/ip_vs_conn.c |   52 +++++++++++++++++++++-----------------
 2 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 9399549..17b01b2 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -494,7 +494,7 @@ struct ip_vs_conn_param {
  *	IP_VS structure allocated for each dynamically scheduled connection
  */
 struct ip_vs_conn {
-	struct list_head        c_list;         /* hashed list heads */
+	struct hlist_node	c_list;         /* hashed list heads */
 #ifdef CONFIG_NET_NS
 	struct net              *net;           /* Name space */
 #endif
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 83233fe..9c2a517 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -59,7 +59,7 @@ static int ip_vs_conn_tab_mask __read_mostly;
 /*
  *  Connection hash table: for input and output packets lookups of IPVS
  */
-static struct list_head *ip_vs_conn_tab __read_mostly;
+static struct hlist_head *ip_vs_conn_tab __read_mostly;
 
 /*  SLAB cache for IPVS connections */
 static struct kmem_cache *ip_vs_conn_cachep __read_mostly;
@@ -201,7 +201,7 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
 	spin_lock(&cp->lock);
 
 	if (!(cp->flags & IP_VS_CONN_F_HASHED)) {
-		list_add(&cp->c_list, &ip_vs_conn_tab[hash]);
+		hlist_add_head(&cp->c_list, &ip_vs_conn_tab[hash]);
 		cp->flags |= IP_VS_CONN_F_HASHED;
 		atomic_inc(&cp->refcnt);
 		ret = 1;
@@ -234,7 +234,7 @@ static inline int ip_vs_conn_unhash(struct ip_vs_conn *cp)
 	spin_lock(&cp->lock);
 
 	if (cp->flags & IP_VS_CONN_F_HASHED) {
-		list_del(&cp->c_list);
+		hlist_del(&cp->c_list);
 		cp->flags &= ~IP_VS_CONN_F_HASHED;
 		atomic_dec(&cp->refcnt);
 		ret = 1;
@@ -259,12 +259,13 @@ __ip_vs_conn_in_get(const struct ip_vs_conn_param *p)
 {
 	unsigned hash;
 	struct ip_vs_conn *cp;
+	struct hlist_node *n;
 
 	hash = ip_vs_conn_hashkey_param(p, false);
 
 	ct_read_lock(hash);
 
-	list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
+	hlist_for_each_entry(cp, n, &ip_vs_conn_tab[hash], c_list) {
 		if (cp->af == p->af &&
 		    p->cport == cp->cport && p->vport == cp->vport &&
 		    ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
@@ -345,12 +346,13 @@ struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p)
 {
 	unsigned hash;
 	struct ip_vs_conn *cp;
+	struct hlist_node *n;
 
 	hash = ip_vs_conn_hashkey_param(p, false);
 
 	ct_read_lock(hash);
 
-	list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
+	hlist_for_each_entry(cp, n, &ip_vs_conn_tab[hash], c_list) {
 		if (!ip_vs_conn_net_eq(cp, p->net))
 			continue;
 		if (p->pe_data && p->pe->ct_match) {
@@ -394,6 +396,7 @@ struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
 {
 	unsigned hash;
 	struct ip_vs_conn *cp, *ret=NULL;
+	struct hlist_node *n;
 
 	/*
 	 *	Check for "full" addressed entries
@@ -402,7 +405,7 @@ struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
 
 	ct_read_lock(hash);
 
-	list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
+	hlist_for_each_entry(cp, n, &ip_vs_conn_tab[hash], c_list) {
 		if (cp->af == p->af &&
 		    p->vport == cp->cport && p->cport == cp->dport &&
 		    ip_vs_addr_equal(p->af, p->vaddr, &cp->caddr) &&
@@ -818,7 +821,7 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
 		return NULL;
 	}
 
-	INIT_LIST_HEAD(&cp->c_list);
+	INIT_HLIST_NODE(&cp->c_list);
 	setup_timer(&cp->timer, ip_vs_conn_expire, (unsigned long)cp);
 	ip_vs_conn_net_set(cp, p->net);
 	cp->af		   = p->af;
@@ -894,8 +897,8 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
  */
 #ifdef CONFIG_PROC_FS
 struct ip_vs_iter_state {
-	struct seq_net_private p;
-	struct list_head *l;
+	struct seq_net_private	p;
+	struct hlist_head	*l;
 };
 
 static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
@@ -903,13 +906,14 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
 	int idx;
 	struct ip_vs_conn *cp;
 	struct ip_vs_iter_state *iter = seq->private;
+	struct hlist_node *n;
 
 	for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
 		ct_read_lock_bh(idx);
-		list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
+		hlist_for_each_entry(cp, n, &ip_vs_conn_tab[idx], c_list) {
 			if (pos-- == 0) {
 				iter->l = &ip_vs_conn_tab[idx];
-			return cp;
+				return cp;
 			}
 		}
 		ct_read_unlock_bh(idx);
@@ -930,7 +934,8 @@ static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
 	struct ip_vs_conn *cp = v;
 	struct ip_vs_iter_state *iter = seq->private;
-	struct list_head *e, *l = iter->l;
+	struct hlist_node *e;
+	struct hlist_head *l = iter->l;
 	int idx;
 
 	++*pos;
@@ -938,15 +943,15 @@ static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 		return ip_vs_conn_array(seq, 0);
 
 	/* more on same hash chain? */
-	if ((e = cp->c_list.next) != l)
-		return list_entry(e, struct ip_vs_conn, c_list);
+	if ((e = cp->c_list.next))
+		return hlist_entry(e, struct ip_vs_conn, c_list);
 
 	idx = l - ip_vs_conn_tab;
 	ct_read_unlock_bh(idx);
 
 	while (++idx < ip_vs_conn_tab_size) {
 		ct_read_lock_bh(idx);
-		list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
+		hlist_for_each_entry(cp, e, &ip_vs_conn_tab[idx], c_list) {
 			iter->l = &ip_vs_conn_tab[idx];
 			return cp;
 		}
@@ -959,7 +964,7 @@ static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 static void ip_vs_conn_seq_stop(struct seq_file *seq, void *v)
 {
 	struct ip_vs_iter_state *iter = seq->private;
-	struct list_head *l = iter->l;
+	struct hlist_head *l = iter->l;
 
 	if (l)
 		ct_read_unlock_bh(l - ip_vs_conn_tab);
@@ -1148,13 +1153,14 @@ void ip_vs_random_dropentry(struct net *net)
 	 */
 	for (idx = 0; idx < (ip_vs_conn_tab_size>>5); idx++) {
 		unsigned hash = net_random() & ip_vs_conn_tab_mask;
+		struct hlist_node *n;
 
 		/*
 		 *  Lock is actually needed in this loop.
 		 */
 		ct_write_lock_bh(hash);
 
-		list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
+		hlist_for_each_entry(cp, n, &ip_vs_conn_tab[hash], c_list) {
 			if (cp->flags & IP_VS_CONN_F_TEMPLATE)
 				/* connection template */
 				continue;
@@ -1202,12 +1208,14 @@ static void ip_vs_conn_flush(struct net *net)
 
 flush_again:
 	for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
+		struct hlist_node *n;
+
 		/*
 		 *  Lock is actually needed in this loop.
 		 */
 		ct_write_lock_bh(idx);
 
-		list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
+		hlist_for_each_entry(cp, n, &ip_vs_conn_tab[idx], c_list) {
 			if (!ip_vs_conn_net_eq(cp, net))
 				continue;
 			IP_VS_DBG(4, "del connection\n");
@@ -1265,8 +1273,7 @@ int __init ip_vs_conn_init(void)
 	/*
 	 * Allocate the connection hash table and initialize its list heads
 	 */
-	ip_vs_conn_tab = vmalloc(ip_vs_conn_tab_size *
-				 sizeof(struct list_head));
+	ip_vs_conn_tab = vmalloc(ip_vs_conn_tab_size * sizeof(*ip_vs_conn_tab));
 	if (!ip_vs_conn_tab)
 		return -ENOMEM;
 
@@ -1286,9 +1293,8 @@ int __init ip_vs_conn_init(void)
 	IP_VS_DBG(0, "Each connection entry needs %Zd bytes at least\n",
 		  sizeof(struct ip_vs_conn));
 
-	for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
-		INIT_LIST_HEAD(&ip_vs_conn_tab[idx]);
-	}
+	for (idx = 0; idx < ip_vs_conn_tab_size; idx++)
+		INIT_HLIST_HEAD(&ip_vs_conn_tab[idx]);
 
 	for (idx = 0; idx < CT_LOCKARRAY_SIZE; idx++)  {
 		rwlock_init(&__ip_vs_conntbl_lock_array[idx].l);
-- 
1.7.2.3

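Two properties of hlist explain this patch. A bucket head holds a single pointer instead of the two in a list_head, so the vmalloc()ed ip_vs_conn_tab shrinks by half. A chain also ends at NULL instead of looping back to its head, which is why ip_vs_conn_seq_next() now tests cp->c_list.next against NULL. The following is a minimal userspace re-creation of that layout with hypothetical demo_* names. Unlike the kernel's hlist_del(), which poisons the unlinked pointers, this sketch simply NULLs them.

```c
#include <assert.h>
#include <stddef.h>

struct demo_hlist_node {
	struct demo_hlist_node *next;
	struct demo_hlist_node **pprev;	/* address of the pointer that points at us */
};

struct demo_hlist_head {
	struct demo_hlist_node *first;	/* one pointer per hash bucket */
};

static void demo_hlist_add_head(struct demo_hlist_node *n,
				struct demo_hlist_head *h)
{
	struct demo_hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink without knowing which bucket head the node hangs off */
static void demo_hlist_del(struct demo_hlist_node *n)
{
	struct demo_hlist_node *next = n->next;

	*n->pprev = next;
	if (next)
		next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Walk one bucket: the chain terminates at NULL, not at the head */
static size_t demo_hlist_count(const struct demo_hlist_head *h)
{
	const struct demo_hlist_node *n;
	size_t count = 0;

	for (n = h->first; n; n = n->next)
		count++;
	return count;
}
```

The pprev back-pointer is what lets hlist_del() work with a one-pointer head. The extra struct hlist_node *n cursor added throughout the diff is required by the hlist_for_each_entry() signature of this era.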

* [GIT PULL nf-next-2.6] IPVS
From: Simon Horman @ 2011-02-25  2:43 UTC
  To: lvs-devel, netdev, netfilter-devel, netfilter
  Cc: Changli Gao, Wensong Zhang, Julian Anastasov, Patrick McHardy,
	Simon Horman

Hi Patrick,

please consider pulling
git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git master
to get the following changes by Changli.

      ipvs: use hlist instead of list
      ipvs: use enum to instead of magic numbers
      ipvs: unify the formula to estimate the overhead of processing connections

 include/net/ip_vs.h              |   16 +++++++++++-
 net/netfilter/ipvs/ip_vs_conn.c  |   52 +++++++++++++++++++++----------------
 net/netfilter/ipvs/ip_vs_lblc.c  |   13 ++-------
 net/netfilter/ipvs/ip_vs_lblcr.c |   25 +++++-------------
 net/netfilter/ipvs/ip_vs_lc.c    |   18 +------------
 net/netfilter/ipvs/ip_vs_wlc.c   |   20 +-------------
 net/netfilter/ipvs/ip_vs_xmit.c  |   41 +++++++++++++++++++----------
 7 files changed, 84 insertions(+), 101 deletions(-)


* [PATCH 3/3] ipvs: unify the formula to estimate the overhead of processing connections
From: Simon Horman @ 2011-02-25  2:43 UTC
  To: lvs-devel, netdev, netfilter-devel, netfilter
  Cc: Changli Gao, Wensong Zhang, Julian Anastasov, Patrick McHardy,
	Simon Horman
In-Reply-To: <1298601812-8168-1-git-send-email-horms@verge.net.au>

From: Changli Gao <xiaosuo@gmail.com>

lc and wlc use the same formula, but lblc and lblcr use another one. There
is no reason for using two different formulas for the lc variants.

This patch makes all the lc variants use the formula lc already uses:
activeconns * 256 + inactconns.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Wensong Zhang <wensong@linux-vs.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h              |   14 ++++++++++++++
 net/netfilter/ipvs/ip_vs_lblc.c  |   13 +++----------
 net/netfilter/ipvs/ip_vs_lblcr.c |   25 +++++++------------------
 net/netfilter/ipvs/ip_vs_lc.c    |   18 +-----------------
 net/netfilter/ipvs/ip_vs_wlc.c   |   20 ++------------------
 5 files changed, 27 insertions(+), 63 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 17b01b2..e74da41e 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1243,6 +1243,20 @@ static inline void ip_vs_conn_drop_conntrack(struct ip_vs_conn *cp)
 /* CONFIG_IP_VS_NFCT */
 #endif
 
+static inline unsigned int
+ip_vs_dest_conn_overhead(struct ip_vs_dest *dest)
+{
+	/*
+	 * We think the overhead of processing active connections is 256
+	 * times higher than that of inactive connections in average. (This
+	 * 256 times might not be accurate, we will change it later) We
+	 * use the following formula to estimate the overhead now:
+	 *		  dest->activeconns*256 + dest->inactconns
+	 */
+	return (atomic_read(&dest->activeconns) << 8) +
+		atomic_read(&dest->inactconns);
+}
+
 #endif /* __KERNEL__ */
 
 #endif	/* _NET_IP_VS_H */
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index 4a9c8cd..6bf7a80 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -389,12 +389,7 @@ __ip_vs_lblc_schedule(struct ip_vs_service *svc)
 	int loh, doh;
 
 	/*
-	 * We think the overhead of processing active connections is fifty
-	 * times higher than that of inactive connections in average. (This
-	 * fifty times might not be accurate, we will change it later.) We
-	 * use the following formula to estimate the overhead:
-	 *                dest->activeconns*50 + dest->inactconns
-	 * and the load:
+	 * We use the following formula to estimate the load:
 	 *                (dest overhead) / dest->weight
 	 *
 	 * Remember -- no floats in kernel mode!!!
@@ -410,8 +405,7 @@ __ip_vs_lblc_schedule(struct ip_vs_service *svc)
 			continue;
 		if (atomic_read(&dest->weight) > 0) {
 			least = dest;
-			loh = atomic_read(&least->activeconns) * 50
-				+ atomic_read(&least->inactconns);
+			loh = ip_vs_dest_conn_overhead(least);
 			goto nextstage;
 		}
 	}
@@ -425,8 +419,7 @@ __ip_vs_lblc_schedule(struct ip_vs_service *svc)
 		if (dest->flags & IP_VS_DEST_F_OVERLOAD)
 			continue;
 
-		doh = atomic_read(&dest->activeconns) * 50
-			+ atomic_read(&dest->inactconns);
+		doh = ip_vs_dest_conn_overhead(dest);
 		if (loh * atomic_read(&dest->weight) >
 		    doh * atomic_read(&least->weight)) {
 			least = dest;
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index bd329b1..0063176 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -178,8 +178,7 @@ static inline struct ip_vs_dest *ip_vs_dest_set_min(struct ip_vs_dest_set *set)
 
 		if ((atomic_read(&least->weight) > 0)
 		    && (least->flags & IP_VS_DEST_F_AVAILABLE)) {
-			loh = atomic_read(&least->activeconns) * 50
-				+ atomic_read(&least->inactconns);
+			loh = ip_vs_dest_conn_overhead(least);
 			goto nextstage;
 		}
 	}
@@ -192,8 +191,7 @@ static inline struct ip_vs_dest *ip_vs_dest_set_min(struct ip_vs_dest_set *set)
 		if (dest->flags & IP_VS_DEST_F_OVERLOAD)
 			continue;
 
-		doh = atomic_read(&dest->activeconns) * 50
-			+ atomic_read(&dest->inactconns);
+		doh = ip_vs_dest_conn_overhead(dest);
 		if ((loh * atomic_read(&dest->weight) >
 		     doh * atomic_read(&least->weight))
 		    && (dest->flags & IP_VS_DEST_F_AVAILABLE)) {
@@ -228,8 +226,7 @@ static inline struct ip_vs_dest *ip_vs_dest_set_max(struct ip_vs_dest_set *set)
 	list_for_each_entry(e, &set->list, list) {
 		most = e->dest;
 		if (atomic_read(&most->weight) > 0) {
-			moh = atomic_read(&most->activeconns) * 50
-				+ atomic_read(&most->inactconns);
+			moh = ip_vs_dest_conn_overhead(most);
 			goto nextstage;
 		}
 	}
@@ -239,8 +236,7 @@ static inline struct ip_vs_dest *ip_vs_dest_set_max(struct ip_vs_dest_set *set)
   nextstage:
 	list_for_each_entry(e, &set->list, list) {
 		dest = e->dest;
-		doh = atomic_read(&dest->activeconns) * 50
-			+ atomic_read(&dest->inactconns);
+		doh = ip_vs_dest_conn_overhead(dest);
 		/* moh/mw < doh/dw ==> moh*dw < doh*mw, where mw,dw>0 */
 		if ((moh * atomic_read(&dest->weight) <
 		     doh * atomic_read(&most->weight))
@@ -563,12 +559,7 @@ __ip_vs_lblcr_schedule(struct ip_vs_service *svc)
 	int loh, doh;
 
 	/*
-	 * We think the overhead of processing active connections is fifty
-	 * times higher than that of inactive connections in average. (This
-	 * fifty times might not be accurate, we will change it later.) We
-	 * use the following formula to estimate the overhead:
-	 *                dest->activeconns*50 + dest->inactconns
-	 * and the load:
+	 * We use the following formula to estimate the load:
 	 *                (dest overhead) / dest->weight
 	 *
 	 * Remember -- no floats in kernel mode!!!
@@ -585,8 +576,7 @@ __ip_vs_lblcr_schedule(struct ip_vs_service *svc)
 
 		if (atomic_read(&dest->weight) > 0) {
 			least = dest;
-			loh = atomic_read(&least->activeconns) * 50
-				+ atomic_read(&least->inactconns);
+			loh = ip_vs_dest_conn_overhead(least);
 			goto nextstage;
 		}
 	}
@@ -600,8 +590,7 @@ __ip_vs_lblcr_schedule(struct ip_vs_service *svc)
 		if (dest->flags & IP_VS_DEST_F_OVERLOAD)
 			continue;
 
-		doh = atomic_read(&dest->activeconns) * 50
-			+ atomic_read(&dest->inactconns);
+		doh = ip_vs_dest_conn_overhead(dest);
 		if (loh * atomic_read(&dest->weight) >
 		    doh * atomic_read(&least->weight)) {
 			least = dest;
diff --git a/net/netfilter/ipvs/ip_vs_lc.c b/net/netfilter/ipvs/ip_vs_lc.c
index 6063800..f391819 100644
--- a/net/netfilter/ipvs/ip_vs_lc.c
+++ b/net/netfilter/ipvs/ip_vs_lc.c
@@ -22,22 +22,6 @@
 
 #include <net/ip_vs.h>
 
-
-static inline unsigned int
-ip_vs_lc_dest_overhead(struct ip_vs_dest *dest)
-{
-	/*
-	 * We think the overhead of processing active connections is 256
-	 * times higher than that of inactive connections in average. (This
-	 * 256 times might not be accurate, we will change it later) We
-	 * use the following formula to estimate the overhead now:
-	 *		  dest->activeconns*256 + dest->inactconns
-	 */
-	return (atomic_read(&dest->activeconns) << 8) +
-		atomic_read(&dest->inactconns);
-}
-
-
 /*
  *	Least Connection scheduling
  */
@@ -62,7 +46,7 @@ ip_vs_lc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
 		if ((dest->flags & IP_VS_DEST_F_OVERLOAD) ||
 		    atomic_read(&dest->weight) == 0)
 			continue;
-		doh = ip_vs_lc_dest_overhead(dest);
+		doh = ip_vs_dest_conn_overhead(dest);
 		if (!least || doh < loh) {
 			least = dest;
 			loh = doh;
diff --git a/net/netfilter/ipvs/ip_vs_wlc.c b/net/netfilter/ipvs/ip_vs_wlc.c
index fdf0f58..bc1bfc4 100644
--- a/net/netfilter/ipvs/ip_vs_wlc.c
+++ b/net/netfilter/ipvs/ip_vs_wlc.c
@@ -27,22 +27,6 @@
 
 #include <net/ip_vs.h>
 
-
-static inline unsigned int
-ip_vs_wlc_dest_overhead(struct ip_vs_dest *dest)
-{
-	/*
-	 * We think the overhead of processing active connections is 256
-	 * times higher than that of inactive connections in average. (This
-	 * 256 times might not be accurate, we will change it later) We
-	 * use the following formula to estimate the overhead now:
-	 *		  dest->activeconns*256 + dest->inactconns
-	 */
-	return (atomic_read(&dest->activeconns) << 8) +
-		atomic_read(&dest->inactconns);
-}
-
-
 /*
  *	Weighted Least Connection scheduling
  */
@@ -71,7 +55,7 @@ ip_vs_wlc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
 		if (!(dest->flags & IP_VS_DEST_F_OVERLOAD) &&
 		    atomic_read(&dest->weight) > 0) {
 			least = dest;
-			loh = ip_vs_wlc_dest_overhead(least);
+			loh = ip_vs_dest_conn_overhead(least);
 			goto nextstage;
 		}
 	}
@@ -85,7 +69,7 @@ ip_vs_wlc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
 	list_for_each_entry_continue(dest, &svc->destinations, n_list) {
 		if (dest->flags & IP_VS_DEST_F_OVERLOAD)
 			continue;
-		doh = ip_vs_wlc_dest_overhead(dest);
+		doh = ip_vs_dest_conn_overhead(dest);
 		if (loh * atomic_read(&dest->weight) >
 		    doh * atomic_read(&least->weight)) {
 			least = dest;
-- 
1.7.2.3

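The schedulers touched here pick the destination with the smallest overhead / weight. Because floats are unavailable in the kernel, they compare loh * dw against doh * lw instead. The following is a simplified userspace sketch of that selection using the unified formula. struct demo_dest and both functions are hypothetical. Plain ints stand in for atomic_t, the overload flag and two-stage loop are omitted, and unsigned long products are used to make overflow less likely than with the kernel's int loh/doh.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dest {
	int activeconns;
	int inactconns;
	int weight;
};

/* Same formula as ip_vs_dest_conn_overhead(): activeconns * 256 + inactconns */
static unsigned int demo_conn_overhead(const struct demo_dest *d)
{
	return ((unsigned int)d->activeconns << 8) + (unsigned int)d->inactconns;
}

/* Weighted least connection: minimise overhead / weight by cross-multiplying */
static const struct demo_dest *demo_wlc_pick(const struct demo_dest *dests,
					     size_t n)
{
	const struct demo_dest *least = NULL;
	unsigned long loh = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct demo_dest *d = &dests[i];
		unsigned long doh;

		if (d->weight <= 0)
			continue;	/* weight 0: do not schedule to this server */
		doh = demo_conn_overhead(d);
		/* loh/lw > doh/dw  <=>  loh*dw > doh*lw, with lw, dw > 0 */
		if (!least ||
		    loh * (unsigned long)d->weight >
		    doh * (unsigned long)least->weight) {
			least = d;
			loh = doh;
		}
	}
	return least;
}
```

For servers {2 active, 0 inactive} and {0 active, 200 inactive} with equal weights, the unified formula scores 512 against 200 and picks the second. The multiplier of 50 that lblc and lblcr used before this patch scores 100 against 200 and would pick the first.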

* [PATCH 2/3] ipvs: use enum to instead of magic numbers
From: Simon Horman @ 2011-02-25  2:43 UTC
  To: lvs-devel, netdev, netfilter-devel, netfilter
  Cc: Changli Gao, Wensong Zhang, Julian Anastasov, Patrick McHardy,
	Simon Horman
In-Reply-To: <1298601812-8168-1-git-send-email-horms@verge.net.au>

From: Changli Gao <xiaosuo@gmail.com>

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_xmit.c |   41 +++++++++++++++++++++++++-------------
 1 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 1f2a4e3..a48239a 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -43,6 +43,13 @@
 
 #include <net/ip_vs.h>
 
+enum {
+	IP_VS_RT_MODE_LOCAL	= 1, /* Allow local dest */
+	IP_VS_RT_MODE_NON_LOCAL	= 2, /* Allow non-local dest */
+	IP_VS_RT_MODE_RDR	= 4, /* Allow redirect from remote daddr to
+				      * local
+				      */
+};
 
 /*
  *      Destination cache to speed up outgoing route lookup
@@ -77,11 +84,7 @@ __ip_vs_dst_check(struct ip_vs_dest *dest, u32 rtos)
 	return dst;
 }
 
-/*
- * Get route to destination or remote server
- * rt_mode: flags, &1=Allow local dest, &2=Allow non-local dest,
- *	    &4=Allow redirect from remote daddr to local
- */
+/* Get route to destination or remote server */
 static struct rtable *
 __ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest,
 		   __be32 daddr, u32 rtos, int rt_mode)
@@ -126,15 +129,16 @@ __ip_vs_get_out_rt(struct sk_buff *skb, struct ip_vs_dest *dest,
 	}
 
 	local = rt->rt_flags & RTCF_LOCAL;
-	if (!((local ? 1 : 2) & rt_mode)) {
+	if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) &
+	      rt_mode)) {
 		IP_VS_DBG_RL("Stopping traffic to %s address, dest: %pI4\n",
 			     (rt->rt_flags & RTCF_LOCAL) ?
 			     "local":"non-local", &rt->rt_dst);
 		ip_rt_put(rt);
 		return NULL;
 	}
-	if (local && !(rt_mode & 4) && !((ort = skb_rtable(skb)) &&
-					 ort->rt_flags & RTCF_LOCAL)) {
+	if (local && !(rt_mode & IP_VS_RT_MODE_RDR) &&
+	    !((ort = skb_rtable(skb)) && ort->rt_flags & RTCF_LOCAL)) {
 		IP_VS_DBG_RL("Redirect from non-local address %pI4 to local "
 			     "requires NAT method, dest: %pI4\n",
 			     &ip_hdr(skb)->daddr, &rt->rt_dst);
@@ -383,8 +387,8 @@ ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	EnterFunction(10);
 
-	if (!(rt = __ip_vs_get_out_rt(skb, NULL, iph->daddr,
-				      RT_TOS(iph->tos), 2)))
+	if (!(rt = __ip_vs_get_out_rt(skb, NULL, iph->daddr, RT_TOS(iph->tos),
+				      IP_VS_RT_MODE_NON_LOCAL)))
 		goto tx_error_icmp;
 
 	/* MTU checking */
@@ -512,7 +516,10 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	}
 
 	if (!(rt = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
-				      RT_TOS(iph->tos), 1|2|4)))
+				      RT_TOS(iph->tos),
+				      IP_VS_RT_MODE_LOCAL |
+					IP_VS_RT_MODE_NON_LOCAL |
+					IP_VS_RT_MODE_RDR)))
 		goto tx_error_icmp;
 	local = rt->rt_flags & RTCF_LOCAL;
 	/*
@@ -755,7 +762,8 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	EnterFunction(10);
 
 	if (!(rt = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
-				      RT_TOS(tos), 1|2)))
+				      RT_TOS(tos), IP_VS_RT_MODE_LOCAL |
+						   IP_VS_RT_MODE_NON_LOCAL)))
 		goto tx_error_icmp;
 	if (rt->rt_flags & RTCF_LOCAL) {
 		ip_rt_put(rt);
@@ -984,7 +992,9 @@ ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	EnterFunction(10);
 
 	if (!(rt = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
-				      RT_TOS(iph->tos), 1|2)))
+				      RT_TOS(iph->tos),
+				      IP_VS_RT_MODE_LOCAL |
+					IP_VS_RT_MODE_NON_LOCAL)))
 		goto tx_error_icmp;
 	if (rt->rt_flags & RTCF_LOCAL) {
 		ip_rt_put(rt);
@@ -1128,7 +1138,10 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	 */
 
 	if (!(rt = __ip_vs_get_out_rt(skb, cp->dest, cp->daddr.ip,
-				      RT_TOS(ip_hdr(skb)->tos), 1|2|4)))
+				      RT_TOS(ip_hdr(skb)->tos),
+				      IP_VS_RT_MODE_LOCAL |
+					IP_VS_RT_MODE_NON_LOCAL |
+					IP_VS_RT_MODE_RDR)))
 		goto tx_error_icmp;
 	local = rt->rt_flags & RTCF_LOCAL;
 
-- 
1.7.2.3

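The following shows how the three flags combine. It is a simplified userspace model of the two rt_mode checks in __ip_vs_get_out_rt(), with the route lookup itself left out. The enum values are copied from the patch. demo_rt_mode_allows() and its boolean parameters are hypothetical: route_is_local stands for rt->rt_flags & RTCF_LOCAL, and pkt_was_local for the skb_rtable() test on the incoming packet.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	IP_VS_RT_MODE_LOCAL	= 1, /* Allow local dest */
	IP_VS_RT_MODE_NON_LOCAL	= 2, /* Allow non-local dest */
	IP_VS_RT_MODE_RDR	= 4, /* Allow redirect from remote daddr to local */
};

/* Returns true if a route with the given locality may be used */
static bool demo_rt_mode_allows(int rt_mode, bool route_is_local,
				bool pkt_was_local)
{
	int needed = route_is_local ? IP_VS_RT_MODE_LOCAL
				    : IP_VS_RT_MODE_NON_LOCAL;

	/* "Stopping traffic to local/non-local address" */
	if (!(needed & rt_mode))
		return false;
	/* Redirecting a non-local daddr to a local dest requires NAT (RDR) */
	if (route_is_local && !(rt_mode & IP_VS_RT_MODE_RDR) && !pkt_was_local)
		return false;
	return true;
}
```

The bypass path passes only IP_VS_RT_MODE_NON_LOCAL, tunnel and DR pass LOCAL | NON_LOCAL, and NAT and ICMP pass all three. Only the last two may redirect a remote destination address to the local host.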

* Re: [PATCH ref0] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Po-Yu Chuang @ 2011-02-25  2:32 UTC
  To: Eric Dumazet
  Cc: netdev, linux-kernel, bhutchings, joe, dilinger, mirqus, davem,
	Po-Yu Chuang
In-Reply-To: <1298569685.2814.16.camel@edumazet-laptop>

Hi Eric,

On Fri, Feb 25, 2011 at 1:48 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday 24 February 2011 at 18:39 +0100, Eric Dumazet wrote:
>> On Thursday 24 February 2011 at 17:29 +0800, Po-Yu Chuang wrote:
>> > From: Po-Yu Chuang <ratbert@faraday-tech.com>
>> > +
>> > +static bool ftmac100_rx_packet(struct ftmac100 *priv, int *processed)
>> > +{
>> > +   struct net_device *netdev = priv->netdev;
>> > +   struct ftmac100_rxdes *rxdes;
>> > +   struct sk_buff *skb;
>> > +   struct page *page;
>> > +   dma_addr_t map;
>> > +   int length;
>> > +
>> > +   rxdes = ftmac100_rx_locate_first_segment(priv);
>> > +   if (!rxdes)
>> > +           return false;
>> > +
>> > +   if (unlikely(ftmac100_rx_packet_error(priv, rxdes))) {
>> > +           ftmac100_rx_drop_packet(priv);
>> > +           return true;
>> > +   }
>> > +
>> > +   /*
>> > +    * It is impossible to get multi-segment packets
>> > +    * because we always provide big enough receive buffers.
>> > +    */
>> > +   if (unlikely(!ftmac100_rxdes_last_segment(rxdes)))
>> > +           BUG();
>> > +
>> > +   /* start processing */
>> > +   skb = netdev_alloc_skb_ip_align(netdev, ETH_HLEN);
>>
>> Oh I see... You should allocate a bigger head (say... 128 bytes)
>>
>> And copy up to the first 128 bytes of the packet into it, so that the
>> upper stack does not have to reallocate the skb head (IP/TCP
>> processing needs its headers in the skb head).
>
> Take a look at drivers/net/niu.c :
>
> #define RX_SKB_ALLOC_SIZE   128 + NET_IP_ALIGN
>
> static int niu_process_rx_pkt(...)
> {
>        ...
>        skb = netdev_alloc_skb(np->dev, RX_SKB_ALLOC_SIZE);
>        ...
>        while (1) {
>                ...
>                niu_rx_skb_append(skb, page, off, append_size);
>        }
> }

Oh I got it.

I will try this and redo the benchmarking.

Thanks,
Po-Yu Chuang

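This is a rough userspace model of the layout Eric suggests. At most the first 128 bytes of the received frame are copied into a linear head, so the Ethernet/IP/TCP headers are there, and the rest stays in the receive page as a fragment. The struct and function names are hypothetical. NET_IP_ALIGN, DMA unmapping and the kernel's real page-fragment helpers are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_HDR_COPY 128	/* bytes copied into the linear head */

/* Small linear head plus one page fragment */
struct demo_rx_skb {
	unsigned char head[DEMO_HDR_COPY];
	size_t headlen;
	const unsigned char *frag;	/* points into the receive buffer, no copy */
	size_t fraglen;
};

static void demo_build_rx_skb(struct demo_rx_skb *skb,
			      const unsigned char *buf, size_t len)
{
	size_t copy = len < DEMO_HDR_COPY ? len : DEMO_HDR_COPY;

	/* Protocol headers land in the head, where IP/TCP expect them */
	memcpy(skb->head, buf, copy);
	skb->headlen = copy;
	/* The remaining payload is referenced, not copied */
	skb->frag = len > copy ? buf + copy : NULL;
	skb->fraglen = len - copy;
}
```

The design trades one small memcpy per packet for avoiding a later head reallocation when the stack pulls headers.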

* Re: [net-next-2.6 0/7][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2011-02-25  0:29 UTC
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips
In-Reply-To: <1298545109-8990-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 24 Feb 2011 02:58:22 -0800

> The following series contains jumbo frame support for X540 devices,
> comment cleanup/fixes for ixgbevf & igb, and the addition of Tx rate
> limiting feature for igb.
> 
> The following are changes since commit 55ae22d08fc9b531bc8a88b7306004e7053bb425:
>  Merge branch 'tipc-Feb23-2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/net-next-2.6
> 
> and are available in the git repository at:
>  master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next-2.6 master

Pulled, thanks a lot Jeff.


* Re: [Bugme-new] [Bug 29712] New: Bonding Driver(version : 3.5.0) - Problem with ARP monitoring in active backup mode
From: Andrew Morton @ 2011-02-24 22:51 UTC
  To: harsha.r02; +Cc: bugzilla-daemon, bugme-daemon, netdev, Jay Vosburgh
In-Reply-To: <bug-29712-10286@https.bugzilla.kernel.org/>


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 23 Feb 2011 10:41:34 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=29712
> 
>            Summary: Bonding Driver(version : 3.5.0) - Problem with ARP
>                     monitoring in active backup mode
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.32

That's a paleolithic kernel you have there.  This problem might have
been fixed already.  Can you test a more recent kernel?


>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: harsha.r02@mphasis.com
>         Regression: No
> 
> 
> We are facing an issue with arp_monitoring in active_backup mode when
> two network interfaces of two systems are connected back to back (point
> to point connected without switch connection) and bond is created on
> either systems with point-to-point connected interfaces as slaves.
> 
> Steps to reproduce :
> 
> 1. Initially, the bond was created with two interfaces, eth2 and eth3, with
> eth2 as primary:
> 
>     # modprobe bonding primary=eth2 mode=1 arp_interval=500 arp_ip_target=192.168.4.61
> 
>     # ifconfig bond0 192.168.2.63 netmask 255.255.255.0
> 
>     # ifenslave bond0 eth2 eth3
> 
>     # ifconfig bond0 up
> 
>     # cat /proc/net/bonding/bond0
> 
>     Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
> 
>     Bonding Mode: fault-tolerance (active-backup)
> 
>     Primary Slave: eth2
>     Currently Active Slave: eth2
>     MII Status: up
>     MII Polling Interval (ms): 0
>     Up Delay (ms): 0
>     Down Delay (ms): 0
>     ARP Polling Interval (ms): 500
>     ARP IP target/s (n.n.n.n form): 192.168.4.61
> 
>     Slave Interface: eth2
>     MII Status: up
>     Link Failure Count: 1
>     Permanent HW addr: 00:26:55:27:88:52
> 
>     Slave Interface: eth3
>     MII Status: down
>     Link Failure Count: 1
>     Permanent HW addr: 00:26:55:27:88:54
> 
> 2. The primary interface was brought down, and failover happened
> 
>     # ifconfig eth2 down
> 
>     # cat /proc/net/bonding/bond0
> 
>     Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
> 
>     Bonding Mode: fault-tolerance (active-backup)
>     Primary Slave: eth2
>     Currently Active Slave: eth3 <-- As expected -->
>     MII Status: up
>     MII Polling Interval (ms): 0
>     Up Delay (ms): 0
>     Down Delay (ms): 0
>     ARP Polling Interval (ms): 500
>     ARP IP target/s (n.n.n.n form): 192.168.4.61
> 
>     Slave Interface: eth2
>     MII Status: down
>     Link Failure Count: 2
>     Permanent HW addr: 00:26:55:27:88:52
> 
>     Slave Interface: eth3
>     MII Status: up
>     Link Failure Count: 1
>     Permanent HW addr: 00:26:55:27:88:54
> 
> 3. The primary interface was brought up again, but we did not see a
> failover back to the primary
> 
>     ned1g6:~# ifconfig eth2 up
> 
>     ned1g6:~# cat /proc/net/bonding/bond0
> 
>     Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
> 
>     Bonding Mode: fault-tolerance (active-backup)
>     Primary Slave: eth2
>     Currently Active Slave: eth3 <-- Ideally this should have been eth2 -->
>     MII Status: up
>     MII Polling Interval (ms): 0
>     Up Delay (ms): 0
>     Down Delay (ms): 0
>     ARP Polling Interval (ms): 500
>     ARP IP target/s (n.n.n.n form): 192.168.4.61
> 
>     Slave Interface: eth2
>     MII Status: down
>     Link Failure Count: 2
>     Permanent HW addr: 00:26:55:27:88:52
> 
>     Slave Interface: eth3
>     MII Status: up
>     Link Failure Count: 1
>     Permanent HW addr: 00:26:55:27:88:54
> 
> The problem is that when the primary slave comes back up from the down
> state, it is not selected as the currently active slave for the bond.
> 
> Best Regards,
> Harsha
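
For readers following the report, the selection rule it describes can be modelled in a few lines. This is a hypothetical sketch, not the bonding driver's code: struct slave, select_active() and the rule it encodes (a primary whose link is up always wins, otherwise keep the current slave, otherwise take the first slave that is up) are assumptions that only capture the behaviour the reporter expects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of active-backup slave selection; not driver code. */
struct slave {
	const char *name;
	bool link_up;		/* as judged by the link monitor (MII or ARP) */
};

/*
 * Return the index of the slave that should be active, or -1 if none is up.
 * Assumed rule: a primary whose link is up always wins; otherwise keep the
 * current active slave if it is still up; otherwise pick the first slave
 * that is up.
 */
static int select_active(const struct slave *slaves, size_t n,
			 int primary, int current)
{
	size_t i;

	if (primary >= 0 && slaves[primary].link_up)
		return primary;
	if (current >= 0 && slaves[current].link_up)
		return current;
	for (i = 0; i < n; i++)
		if (slaves[i].link_up)
			return (int)i;
	return -1;
}
```

In the step 3 output, eth2 is still reported with "MII Status: down" after "ifconfig eth2 up". Under a rule like this, a primary that the link monitor still considers down would not be reselected. The question may therefore be why ARP monitoring never marks eth2 up again on this back-to-back setup, rather than the reselection step itself.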


* RE: [PATCH net-next-2.6 1/2] dcbnl: add support for retrieving peer configuration - ieee
From: Shmulik Ravid - Rabinovitz @ 2011-02-24 22:04 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem@davemloft.net, John Fastabend, Eilon Greenstein,
	netdev@vger.kernel.org
In-Reply-To: <1298576559.2613.40.camel@bwh-desktop>



> -----Original Message-----
> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> Sent: Thursday, February 24, 2011 9:43 PM
> To: Shmulik Ravid - Rabinovitz
> Cc: davem@davemloft.net; John Fastabend; Eilon Greenstein;
> netdev@vger.kernel.org
> Subject: Re: [PATCH net-next-2.6 1/2] dcbnl: add support for retrieving
> peer configuration - ieee
> 
> On Thu, 2011-02-24 at 23:03 +0200, Shmulik Ravid wrote:
> > These 2 patches add the support for retrieving the remote or peer DCBX
> > configuration via dcbnl for embedded DCBX stacks. The peer configuration
> > is part of the DCBX MIB and is useful for debugging and diagnostics of
> > the overall DCB configuration. The first patch add this support for IEEE
> > 802.1Qaz standard the second patch add the same support for the older
> > CEE standard.
> >
> > Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
> > ---
> >  include/linux/dcbnl.h |   38 ++++++++++++++++++++++++++
> >  include/net/dcbnl.h   |    5 +++
> >  net/dcb/dcbnl.c       |   71 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 114 insertions(+), 0 deletions(-)
> >
> > diff --git a/include/linux/dcbnl.h b/include/linux/dcbnl.h
> > index 4c5b26e..3102185 100644
> > --- a/include/linux/dcbnl.h
> > +++ b/include/linux/dcbnl.h
> > @@ -110,6 +110,22 @@ struct dcb_app {
> >  	__u16	protocol;
> >  };
> >
> > +/* This structure contains the APP feature information sent by the peer.
> > + * It is used for both the IEEE 802.1Qaz and the CEE flavors.
> > + *
> > + * @willing: willing bit in the peer APP tlv
> > + * @error: error bit in the peer APP tlv
> > + * @app_count: The number of objects in the peer APP table.
> [...]
> 
> It looks like this was supposed to be a kernel-doc comment, but it's not
> valid as such unless you start with:
> 
>     /**
>      * struct dcb_peer_app_info - one-line description here
> 
> Ben.
> 
OK, thanks. I'll fix this.

Shmulik

* RE: [PATCH net-next-2.6 1/2] dcbnl: add support for retrieving peer configuration - ieee
From: Shmulik Ravid - Rabinovitz @ 2011-02-24 22:03 UTC (permalink / raw)
  To: John Fastabend
  Cc: davem@davemloft.net, Eilon Greenstein, netdev@vger.kernel.org
In-Reply-To: <4D66C186.7040409@intel.com>

> -----Original Message-----
> From: John Fastabend [mailto:john.r.fastabend@intel.com]
> Sent: Thursday, February 24, 2011 10:37 PM
> To: Shmulik Ravid - Rabinovitz
> Cc: davem@davemloft.net; Eilon Greenstein; netdev@vger.kernel.org
> Subject: Re: [PATCH net-next-2.6 1/2] dcbnl: add support for retrieving
> peer configuration - ieee
> 
> On 2/24/2011 1:03 PM, Shmulik Ravid wrote:
> > These 2 patches add the support for retrieving the remote or peer DCBX
> > configuration via dcbnl for embedded DCBX stacks. The peer configuration
> > is part of the DCBX MIB and is useful for debugging and diagnostics of
> > the overall DCB configuration. The first patch add this support for IEEE
> > 802.1Qaz standard the second patch add the same support for the older
> > CEE standard.
> >
> > Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
> > ---
> >  include/linux/dcbnl.h |   38 ++++++++++++++++++++++++++
> >  include/net/dcbnl.h   |    5 +++
> >  net/dcb/dcbnl.c       |   71 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 114 insertions(+), 0 deletions(-)
> >
> > diff --git a/include/linux/dcbnl.h b/include/linux/dcbnl.h
> > index 4c5b26e..3102185 100644
> > --- a/include/linux/dcbnl.h
> > +++ b/include/linux/dcbnl.h
> > @@ -110,6 +110,22 @@ struct dcb_app {
> >  	__u16	protocol;
> >  };
> >
> > +/* This structure contains the APP feature information sent by the peer.
> > + * It is used for both the IEEE 802.1Qaz and the CEE flavors.
> > + *
> > + * @willing: willing bit in the peer APP tlv
> > + * @error: error bit in the peer APP tlv
> > + * @app_count: The number of objects in the peer APP table.
> > + *
> > + * In addition to this information the full peer APP tlv also contains
> > + * a table of 'app_count' APP objects defined above.
> > + */
> > +struct dcb_peer_app_info {
> > +	__u8	willing;
> > +	__u8	error;
> > +	__u16	app_count;
> > +};
> > +
> 
> The IEEE 802.1Qaz spec defines the APP TLV as informational
> so there are no willing or error bits in this case. See
> section D.2.12 of the 802.1Qaz draft.
> 
> Can we drop these fields or do they have some other meaning
> here?
> 
OK. They are part of the CEE APP TLV, though.
I wanted to share this structure between 802.1Qaz and CEE so that
a single driver handler retrieves the number of peer apps. How about
keeping a single driver handler, but exposing the APP info to the user
only in the CEE flavor? That is, the PEER_APP attribute would be
CEE-specific?

Shmulik 


* [PATCH] ipv4: Rearrange how ip_route_newports() gets port keys.
From: David Miller @ 2011-02-24 21:42 UTC (permalink / raw)
  To: netdev


ip_route_newports() is the only place in the entire kernel that
cares about the port members in the routing cache entry's lookup
flow key.

Therefore the only reason we store an entire flow inside of the
struct rtable is for this one special case.

Rewrite ip_route_newports() such that:

1) The caller passes in the original port values, so we don't need
   to use the rth->fl.fl_ip_{s,d}port values to remember them.

2) The lookup flow is constructed by hand instead of being copied
   from the routing cache entry's flow.
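
The idea behind the new signature can be sketched in plain C. This is a simplified, hypothetical model: struct route_info, struct flow_key and newports_key() are not kernel types or functions, and the real struct flowi has more members. What it shows is that once the caller supplies the ports used for the first lookup (orig_sport/orig_dport), the cached route no longer needs to store ports at all. A new lookup key is built field by field only when the ports changed. In tcp_v4_connect() this typically happens because the source port was still 0 when ip_route_connect() ran and an ephemeral port was assigned afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for what a cached route remembers: note that it
 * carries no port numbers at all. */
struct route_info {
	uint32_t daddr, saddr;
	uint8_t tos, proto;
	int oif;
};

/* Hypothetical lookup key; the kernel's struct flowi has more members. */
struct flow_key {
	uint32_t daddr, saddr;
	uint8_t tos, proto;
	int oif;
	uint16_t sport, dport;	/* __be16 in the kernel; only compared here */
};

/*
 * If the ports differ from the ones used for the first lookup, build a new
 * key by hand from the cached route plus the new ports and return true.
 * Return false when the existing route can be kept as is.
 */
static bool newports_key(const struct route_info *rt,
			 uint16_t orig_sport, uint16_t orig_dport,
			 uint16_t sport, uint16_t dport,
			 struct flow_key *out)
{
	if (sport == orig_sport && dport == orig_dport)
		return false;

	*out = (struct flow_key){
		.daddr = rt->daddr,
		.saddr = rt->saddr,
		.tos   = rt->tos,
		.proto = rt->proto,
		.oif   = rt->oif,
		.sport = sport,
		.dport = dport,
	};
	return true;
}
```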

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/route.h |   19 +++++++++++--------
 net/dccp/ipv4.c     |   10 +++++++---
 net/ipv4/tcp_ipv4.c |    6 +++++-
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index bf790c1..b3f89ad 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -200,16 +200,19 @@ static inline int ip_route_connect(struct rtable **rp, __be32 dst,
 }
 
 static inline int ip_route_newports(struct rtable **rp, u8 protocol,
+				    __be16 orig_sport, __be16 orig_dport,
 				    __be16 sport, __be16 dport, struct sock *sk)
 {
-	if (sport != (*rp)->fl.fl_ip_sport ||
-	    dport != (*rp)->fl.fl_ip_dport) {
-		struct flowi fl;
-
-		memcpy(&fl, &(*rp)->fl, sizeof(fl));
-		fl.fl_ip_sport = sport;
-		fl.fl_ip_dport = dport;
-		fl.proto = protocol;
+	if (sport != orig_sport || dport != orig_dport) {
+		struct flowi fl = { .oif = (*rp)->fl.oif,
+				    .mark = (*rp)->fl.mark,
+				    .fl4_dst = (*rp)->fl.fl4_dst,
+				    .fl4_src = (*rp)->fl.fl4_src,
+				    .fl4_tos = (*rp)->fl.fl4_tos,
+				    .proto = (*rp)->fl.proto,
+				    .fl_ip_sport = sport,
+				    .fl_ip_dport = dport };
+
 		if (inet_sk(sk)->transparent)
 			fl.flags |= FLOWI_FLAG_ANYSRC;
 		if (protocol == IPPROTO_TCP)
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 45a434f..9379891 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -43,6 +43,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	struct inet_sock *inet = inet_sk(sk);
 	struct dccp_sock *dp = dccp_sk(sk);
 	const struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;
+	__be16 orig_sport, orig_dport;
 	struct rtable *rt;
 	__be32 daddr, nexthop;
 	int tmp;
@@ -63,10 +64,12 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		nexthop = inet->opt->faddr;
 	}
 
+	orig_sport = inet->inet_sport;
+	orig_dport = usin->sin_port;
 	tmp = ip_route_connect(&rt, nexthop, inet->inet_saddr,
 			       RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
 			       IPPROTO_DCCP,
-			       inet->inet_sport, usin->sin_port, sk, 1);
+			       orig_sport, orig_dport, sk, 1);
 	if (tmp < 0)
 		return tmp;
 
@@ -99,8 +102,9 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	if (err != 0)
 		goto failure;
 
-	err = ip_route_newports(&rt, IPPROTO_DCCP, inet->inet_sport,
-				inet->inet_dport, sk);
+	err = ip_route_newports(&rt, IPPROTO_DCCP,
+				orig_sport, orig_dport,
+				inet->inet_sport, inet->inet_dport, sk);
 	if (err != 0)
 		goto failure;
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ef5a90b..27a0cc8 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -149,6 +149,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	struct inet_sock *inet = inet_sk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;
+	__be16 orig_sport, orig_dport;
 	struct rtable *rt;
 	__be32 daddr, nexthop;
 	int tmp;
@@ -167,10 +168,12 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		nexthop = inet->opt->faddr;
 	}
 
+	orig_sport = inet->inet_sport;
+	orig_dport = usin->sin_port;
 	tmp = ip_route_connect(&rt, nexthop, inet->inet_saddr,
 			       RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
 			       IPPROTO_TCP,
-			       inet->inet_sport, usin->sin_port, sk, 1);
+			       orig_sport, orig_dport, sk, 1);
 	if (tmp < 0) {
 		if (tmp == -ENETUNREACH)
 			IP_INC_STATS_BH(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
@@ -234,6 +237,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		goto failure;
 
 	err = ip_route_newports(&rt, IPPROTO_TCP,
+				orig_sport, orig_dport,
 				inet->inet_sport, inet->inet_dport, sk);
 	if (err)
 		goto failure;
-- 
1.7.4.1


* Re: [PATCH net-next-2.6 1/2] dcbnl: add support for retrieving peer configuration - ieee
From: John Fastabend @ 2011-02-24 20:37 UTC (permalink / raw)
  To: Shmulik Ravid
  Cc: davem@davemloft.net, Eilon Greenstein, netdev@vger.kernel.org
In-Reply-To: <1298581410.8877.21.camel@lb-tlvb-shmulik.il.broadcom.com>

On 2/24/2011 1:03 PM, Shmulik Ravid wrote:
> These 2 patches add the support for retrieving the remote or peer DCBX
> configuration via dcbnl for embedded DCBX stacks. The peer configuration
> is part of the DCBX MIB and is useful for debugging and diagnostics of
> the overall DCB configuration. The first patch add this support for IEEE
> 802.1Qaz standard the second patch add the same support for the older
> CEE standard. 
> 
> Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
> ---
>  include/linux/dcbnl.h |   38 ++++++++++++++++++++++++++
>  include/net/dcbnl.h   |    5 +++
>  net/dcb/dcbnl.c       |   71 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 114 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/dcbnl.h b/include/linux/dcbnl.h
> index 4c5b26e..3102185 100644
> --- a/include/linux/dcbnl.h
> +++ b/include/linux/dcbnl.h
> @@ -110,6 +110,22 @@ struct dcb_app {
>  	__u16	protocol;
>  };
>  
> +/* This structure contains the APP feature information sent by the peer.
> + * It is used for both the IEEE 802.1Qaz and the CEE flavors.
> + *
> + * @willing: willing bit in the peer APP tlv
> + * @error: error bit in the peer APP tlv
> + * @app_count: The number of objects in the peer APP table.
> + *
> + * In addition to this information the full peer APP tlv also contains
> + * a table of 'app_count' APP objects defined above.
> + */
> +struct dcb_peer_app_info {
> +	__u8	willing;
> +	__u8	error;
> +	__u16	app_count;
> +};
> +

The IEEE 802.1Qaz spec defines the APP TLV as informational
so there are no willing or error bits in this case. See
section D.2.12 of the 802.1Qaz draft.

Can we drop these fields or do they have some other meaning
here?

Thanks,
John.


* [RFC] be2net: add rxhash support
From: Eric Dumazet @ 2011-02-24 20:24 UTC (permalink / raw)
  To: Ajit Khaparde; +Cc: netdev
In-Reply-To: <1298560543.2814.4.camel@edumazet-laptop>

Ajit, it seems be2net provides the RSS hash value in the rx completion
descriptor?

Could we feed skb->rxhash with it?

Thanks!

diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 0bdccb1..f2db5b2 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -1038,6 +1038,9 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 
 	skb->truesize = skb->len + sizeof(struct sk_buff);
 	skb->protocol = eth_type_trans(skb, adapter->netdev);
+	if (adapter->netdev->features & NETIF_F_RXHASH)
+		skb->rxhash = AMAP_GET_BITS(struct amap_eth_rx_compl, rsshash, rxcp);
+
 
 	vlanf = AMAP_GET_BITS(struct amap_eth_rx_compl, vtp, rxcp);
 	vtm = AMAP_GET_BITS(struct amap_eth_rx_compl, vtm, rxcp);
@@ -1099,6 +1102,9 @@ static void be_rx_compl_process_gro(struct be_adapter *adapter,
 		return;
 	}
 
+	if (adapter->netdev->features & NETIF_F_RXHASH)
+		skb->rxhash = AMAP_GET_BITS(struct amap_eth_rx_compl, rsshash, rxcp);
+
 	remaining = pkt_size;
 	for (i = 0, j = -1; i < num_rcvd; i++) {
 		page_info = get_rx_page_info(adapter, rxo, rxq_idx);
@@ -2618,6 +2624,7 @@ static void be_netdev_init(struct net_device *netdev)
 		NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_FILTER |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
 		NETIF_F_GRO | NETIF_F_TSO6;
+	netdev->features |= NETIF_F_RXHASH;
 
 	netdev->vlan_features |= NETIF_F_SG | NETIF_F_TSO |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
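
Extracting a field such as the RSS hash from a completion descriptor comes down to a shift and a mask. The sketch below is a hypothetical stand-in for what AMAP_GET_BITS() is used for here. The real macro derives offsets from the struct amap_eth_rx_compl layout in the be2net headers, which this thread does not show, so get_bits() simply takes the bit offset and width as parameters.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: read a 'width'-bit field starting at bit 'offset'
 * of a descriptor stored as 32-bit words (bit 0 = LSB of word 0).
 * Assumes the words are already in CPU byte order and that the field
 * does not cross a 32-bit word boundary.
 */
static uint32_t get_bits(const uint32_t *words, unsigned int offset,
			 unsigned int width)
{
	uint32_t word = words[offset / 32];
	unsigned int shift = offset % 32;
	uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1);

	return (word >> shift) & mask;
}
```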



* Re: via-rhine -- VT6105M and checksum offloading
From: David Miller @ 2011-02-24 20:23 UTC (permalink / raw)
  To: bcrl; +Cc: netdev
In-Reply-To: <20110224185805.GI30393@kvack.org>

From: Benjamin LaHaise <bcrl@kvack.org>
Date: Thu, 24 Feb 2011 13:58:05 -0500

> I've recently noticed that one of the embedded systems I'm using (the 
> PCEngines ALIXes) is becoming CPU bound under heavy network traffic.  
> Upon investigation, it looks like the VT6105M isn't actually using the 
> hardware checksum offloading support of the hardware.  Are there any 
> known reasons why this isn't enabled (hardware bugs?)?  I'll test enabling 
> it in the driver, but I figured it would be worth asking if this path 
> has been explored already.  Cheers,

As far as I can tell it was never attempted.  So it should work.

If you do that, while you're here, you can make rhine_rx() take
a "napi" arg and make this driver use napi_gro_receive() too.
Don't forget to set NETIF_F_GRO or similar in netdev->features during
probe, and also hook up the necessary ethtool hooks.

* Re: [PATCH net-next-2.6 1/2] dcbnl: add support for retrieving peer configuration - ieee
From: Ben Hutchings @ 2011-02-24 19:42 UTC (permalink / raw)
  To: Shmulik Ravid; +Cc: davem, John Fastabend, Eilon Greenstein, netdev
In-Reply-To: <1298581410.8877.21.camel@lb-tlvb-shmulik.il.broadcom.com>

On Thu, 2011-02-24 at 23:03 +0200, Shmulik Ravid wrote:
> These 2 patches add the support for retrieving the remote or peer DCBX
> configuration via dcbnl for embedded DCBX stacks. The peer configuration
> is part of the DCBX MIB and is useful for debugging and diagnostics of
> the overall DCB configuration. The first patch add this support for IEEE
> 802.1Qaz standard the second patch add the same support for the older
> CEE standard. 
> 
> Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
> ---
>  include/linux/dcbnl.h |   38 ++++++++++++++++++++++++++
>  include/net/dcbnl.h   |    5 +++
>  net/dcb/dcbnl.c       |   71 +++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 114 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/dcbnl.h b/include/linux/dcbnl.h
> index 4c5b26e..3102185 100644
> --- a/include/linux/dcbnl.h
> +++ b/include/linux/dcbnl.h
> @@ -110,6 +110,22 @@ struct dcb_app {
>  	__u16	protocol;
>  };
>  
> +/* This structure contains the APP feature information sent by the peer.
> + * It is used for both the IEEE 802.1Qaz and the CEE flavors.
> + *
> + * @willing: willing bit in the peer APP tlv
> + * @error: error bit in the peer APP tlv
> + * @app_count: The number of objects in the peer APP table.
[...]

It looks like this was supposed to be a kernel-doc comment, but it's not
valid as such unless you start with:

    /**
     * struct dcb_peer_app_info - one-line description here

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
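
For reference, the patch's comment in the kernel-doc form Ben describes might look like the sketch below. The comment opens with /**, the first line is "struct <name> - <short description>", and each member gets an @member: line. The uint8_t/uint16_t types stand in for __u8/__u16 only so the sketch builds in userspace; the descriptions are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/**
 * struct dcb_peer_app_info - APP feature information sent by the peer
 * @willing: willing bit in the peer APP tlv
 * @error: error bit in the peer APP tlv
 * @app_count: the number of objects in the peer APP table
 *
 * It is used for both the IEEE 802.1Qaz and the CEE flavors. In addition
 * to this information, the full peer APP tlv also contains a table of
 * @app_count APP objects.
 */
struct dcb_peer_app_info {
	uint8_t  willing;	/* __u8 in the patch */
	uint8_t  error;		/* __u8 in the patch */
	uint16_t app_count;	/* __u16 in the patch */
};

/* Layout check: no padding is expected between these members. */
_Static_assert(sizeof(struct dcb_peer_app_info) == 4,
	       "unexpected padding in struct dcb_peer_app_info");
```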


* Re: [PATCH] cxgb{3,4}: streamline Kconfig options
From: Dimitris Michailidis @ 2011-02-24 19:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: David Miller, divy, linux-kbuild, netdev
In-Reply-To: <4D661E2702000078000336B5@vpn.id2.novell.com>

Jan Beulich wrote:

> As to that INET vs NET dependency - is it possible that the
> network drivers really just need NET, but the iSCSI ones need
> INET? In which case the only common dependency would be
> PCI - certainly not worth a custom helper option.

I see about a dozen network drivers that depend on INET.  These may be the 
result of cut&paste from other drivers' Kconfig entries rather than actual 
dependencies.  Also, some of these drivers select (or used to select) 
INET_LRO, and that may have something to do with their INET dependency; 
I'm not sure.

The commit message that introduced CHELSIO_T3_DEPENDS talks of hidden 
dependencies that select does not see.  I am not sure exactly which ones, 
but since it's been a few years since that commit, I'll check what the 
situation is today without the *_DEPENDS symbols and let you know.

* via-rhine -- VT6105M and checksum offloading
From: Benjamin LaHaise @ 2011-02-24 18:58 UTC (permalink / raw)
  To: netdev

Hi folks,

I've recently noticed that one of the embedded systems I'm using (the 
PCEngines ALIXes) is becoming CPU bound under heavy network traffic.  
Upon investigation, it looks like the driver isn't actually using the
VT6105M's hardware checksum offloading support.  Are there any known
reasons why this isn't enabled (hardware bugs?)?  I'll test enabling
it in the driver, but I figured it would be worth asking whether this
path has been explored already.  Cheers,

		-ben
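
Without checksum offload, the stack has to compute the Internet checksum (RFC 1071) in software for TCP/UDP traffic. A minimal, unoptimized C version is shown below for illustration. The kernel uses its own optimized, often architecture-specific routines, and the thread does not establish whether this work is what makes the ALIX boards CPU bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum (RFC 1071): one's-complement sum of 16-bit words,
 * then complemented. Bytes are summed as big-endian 16-bit words, so the
 * result is in network byte order regardless of host endianness.
 * A 32-bit accumulator is enough for packet-sized buffers.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += (uint32_t)data[0] << 8 | data[1];
		data += 2;
		len -= 2;
	}
	if (len)		/* odd trailing byte is padded with zero */
		sum += (uint32_t)data[0] << 8;

	while (sum >> 16)	/* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

The receiver can verify a packet by running the same sum over the data plus the transmitted checksum and checking that the result is 0.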

* Re: tun.c non formal header protocol?
From: Rémi Denis-Courmont @ 2011-02-24 19:04 UTC (permalink / raw)
  To: Kfir Lavi; +Cc: netdev
In-Reply-To: <AANLkTikpkgiTRDJe0gxuXosp_uvg0gKM_c7GcvL+hhLm@mail.gmail.com>

On Thursday 24 February 2011 20:37:32, Kfir Lavi wrote:
> I would like to use custom protocol over tun/tap device.

Over TAP, you can only exchange Ethernet frames. However you can use whatever 
network layer you like (so long as it has an Ethernet type associated).

Over TUN, you can exchange packets without any link layer header, for any 
network layer protocol defined in Linux.

In principle, you can probably just use an (Ethernet) type that is not used by 
any existing stack in the Linux kernel. But I don't see any point in doing so, 
as the kernel will just drop the packets on the floor afterward.
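
As a sketch of this suggestion, a frame written to a TAP device could carry a custom header behind an otherwise unused EtherType. The example uses 0x88B5, one of the IEEE 802 local experimental EtherTypes. The 4-byte header layout (version, reserved byte, big-endian payload length) and build_frame() are hypothetical, and opening the TAP device and writing the frame are not shown. Because the local kernel stack will not understand this EtherType, a userspace program has to be the one consuming these frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN	14
#define CUSTOM_HDR_LEN	4	/* hypothetical custom header size */
#define CUSTOM_ETHERTYPE 0x88B5	/* IEEE 802 local experimental EtherType */

/*
 * Build: dst MAC | src MAC | EtherType | custom header | payload.
 * Returns the frame length, or 0 if 'out' is too small.
 * No padding to the Ethernet minimum frame size is done here.
 */
static size_t build_frame(uint8_t *out, size_t out_len,
			  const uint8_t dst[6], const uint8_t src[6],
			  uint8_t version,
			  const uint8_t *payload, uint16_t payload_len)
{
	size_t need = ETH_HDR_LEN + CUSTOM_HDR_LEN + (size_t)payload_len;

	if (out_len < need)
		return 0;

	memcpy(out, dst, 6);
	memcpy(out + 6, src, 6);
	out[12] = (uint8_t)(CUSTOM_ETHERTYPE >> 8);	/* big-endian */
	out[13] = (uint8_t)(CUSTOM_ETHERTYPE & 0xff);

	/* Hypothetical custom header: version, reserved, payload length. */
	out[14] = version;
	out[15] = 0;
	out[16] = (uint8_t)(payload_len >> 8);
	out[17] = (uint8_t)(payload_len & 0xff);

	memcpy(out + ETH_HDR_LEN + CUSTOM_HDR_LEN, payload, payload_len);
	return need;
}
```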

> I'm grabbing packets, and changing them, to deliver via tap,
> to a listener that knows this custom protocol.
> The custom protocol is just wrapping the packet with another
> small header.
> Is it possible to move custom packets via tun.c ?

It's difficult to say without a clearer picture of what you are trying to do.

-- 
Rémi Denis-Courmont
http://www.remlab.info/
http://fi.linkedin.com/in/remidenis

* [PATCH net-next-2.6 2/2] dcbnl: add support for retrieving peer configuration - cee
From: Shmulik Ravid @ 2011-02-24 21:03 UTC (permalink / raw)
  To: davem; +Cc: John Fastabend, Eilon Greenstein, netdev

This patch adds support for retrieving the remote or peer DCBX
configuration via dcbnl for embedded DCBX stacks supporting the CEE DCBX
standard.

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
---
 include/linux/dcbnl.h |   62 ++++++++++++++++++++++++++++++++++++++++++++++
 include/net/dcbnl.h   |    3 ++
 net/dcb/dcbnl.c       |   66 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 131 insertions(+), 0 deletions(-)

diff --git a/include/linux/dcbnl.h b/include/linux/dcbnl.h
index 3102185..f937c75 100644
--- a/include/linux/dcbnl.h
+++ b/include/linux/dcbnl.h
@@ -87,6 +87,43 @@ struct ieee_pfc {
 	__u64	indications[IEEE_8021QAZ_MAX_TCS];
 };
 
+/* CEE DCBX std supported values */
+#define CEE_DCBX_MAX_PGS	8
+#define CEE_DCBX_MAX_PRIO	8
+
+/* This structure contains the CEE Priority-Group managed object
+ *
+ * @willing: willing bit in the PG tlv
+ * @error: error bit in the PG tlv
+ * @pg_en: enable bit of the PG feature
+ * @tcs_supported: number of traffic classes supported
+ * @pg_bw: bandwidth percentage for each priority group
+ * @prio_pg: priority to PG mapping indexed by priority
+ */
+struct cee_pg {
+	__u8	willing;
+	__u8	error;
+	__u8	pg_en;
+	__u8	tcs_supported;
+	__u8	pg_bw[CEE_DCBX_MAX_PGS];
+	__u8	prio_pg[CEE_DCBX_MAX_PGS];
+};
+
+/* This structure contains the PFC managed object
+ *
+ * @willing: willing bit in the PFC tlv
+ * @error: error bit in the PFC tlv
+ * @pfc_en: bitmap indicating pfc enabled traffic classes
+ * @tcs_supported: number of traffic classes supported
+ */
+struct cee_pfc {
+	__u8	willing;
+	__u8	error;
+	__u8	pfc_en;
+	__u8	tcs_supported;
+};
+
+
 /* This structure contains the IEEE 802.1Qaz APP managed object. This
  * object is also used for the CEE std as well. There is no difference
  * between the objects.
@@ -160,6 +197,7 @@ struct dcbmsg {
  * @DCB_CMD_SDCBX: set DCBX engine configuration
  * @DCB_CMD_GFEATCFG: get DCBX features flags
  * @DCB_CMD_SFEATCFG: set DCBX features negotiation flags
+ * @DCB_CMD_CEE_GET: get CEE aggregated configuration
  */
 enum dcbnl_commands {
 	DCB_CMD_UNDEFINED,
@@ -202,6 +240,8 @@ enum dcbnl_commands {
 	DCB_CMD_GFEATCFG,
 	DCB_CMD_SFEATCFG,
 
+	DCB_CMD_CEE_GET,
+
 	__DCB_CMD_ENUM_MAX,
 	DCB_CMD_MAX = __DCB_CMD_ENUM_MAX - 1,
 };
@@ -224,6 +264,7 @@ enum dcbnl_commands {
  * @DCB_ATTR_IEEE: IEEE 802.1Qaz supported attributes (NLA_NESTED)
  * @DCB_ATTR_DCBX: DCBX engine configuration in the device (NLA_U8)
  * @DCB_ATTR_FEATCFG: DCBX features flags (NLA_NESTED)
+ * @DCB_ATTR_CEE: CEE std supported attributes (NLA_NESTED)
  */
 enum dcbnl_attrs {
 	DCB_ATTR_UNDEFINED,
@@ -247,6 +288,9 @@ enum dcbnl_attrs {
 	DCB_ATTR_DCBX,
 	DCB_ATTR_FEATCFG,
 
+	/* CEE nested attributes */
+	DCB_ATTR_CEE,
+
 	__DCB_ATTR_ENUM_MAX,
 	DCB_ATTR_MAX = __DCB_ATTR_ENUM_MAX - 1,
 };
@@ -281,6 +325,24 @@ enum ieee_attrs_app {
 };
 #define DCB_ATTR_IEEE_APP_MAX (__DCB_ATTR_IEEE_APP_MAX - 1)
 
+
+/**
+ * enum cee_attrs - CEE DCBX get attributes
+ *
+ * @DCB_ATTR_CEE_UNSPEC: unspecified
+ * @DCB_ATTR_CEE_PEER_PG: peer PG configuration - get only
+ * @DCB_ATTR_CEE_PEER_PFC: peer PFC configuration - get only
+ * @DCB_ATTR_CEE_PEER_APP: peer APP tlv - get only
+ */
+enum cee_attrs {
+	DCB_ATTR_CEE_UNSPEC,
+	DCB_ATTR_CEE_PEER_PG,
+	DCB_ATTR_CEE_PEER_PFC,
+	DCB_ATTR_CEE_PEER_APP,
+	__DCB_ATTR_CEE_MAX
+};
+#define DCB_ATTR_CEE_MAX (__DCB_ATTR_CEE_MAX - 1)
+
 enum peer_app_attr {
 	DCB_ATTR_PEER_APP_UNSPEC,
 	DCB_ATTR_PEER_APP_INFO,
diff --git a/include/net/dcbnl.h b/include/net/dcbnl.h
index 739feb4..06227e83 100644
--- a/include/net/dcbnl.h
+++ b/include/net/dcbnl.h
@@ -83,6 +83,9 @@ struct dcbnl_rtnl_ops {
 	int (*peer_getappinfo)(struct net_device *, struct dcb_peer_app_info *);
 	int (*peer_getapptable)(struct net_device *, struct dcb_app *);
 
+	/* CEE peer */
+	int (*cee_peer_getpg) (struct net_device *, struct cee_pg *);
+	int (*cee_peer_getpfc) (struct net_device *, struct cee_pfc *);
 };
 
 #endif /* __NET_DCBNL_H__ */
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 0eb7461..ea39e51 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1512,6 +1512,68 @@ err:
 	return ret;
 }
 
+/* Handle CEE DCBX GET commands. */
+static int dcbnl_cee_get(struct net_device *netdev, struct nlattr **tb,
+			 u32 pid, u32 seq, u16 flags)
+{
+	struct sk_buff *skb;
+	struct nlmsghdr *nlh;
+	struct dcbmsg *dcb;
+	struct nlattr *cee;
+	const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops;
+	int err;
+
+	if (!ops)
+		return -EOPNOTSUPP;
+
+	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOBUFS;
+
+	nlh = NLMSG_NEW(skb, pid, seq, RTM_GETDCB, sizeof(*dcb), flags);
+
+	dcb = NLMSG_DATA(nlh);
+	dcb->dcb_family = AF_UNSPEC;
+	dcb->cmd = DCB_CMD_CEE_GET;
+
+	NLA_PUT_STRING(skb, DCB_ATTR_IFNAME, netdev->name);
+
+	cee = nla_nest_start(skb, DCB_ATTR_CEE);
+	if (!cee)
+		goto nla_put_failure;
+
+	/* get peer info if available */
+	if (ops->cee_peer_getpg) {
+		struct cee_pg pg;
+		err = ops->cee_peer_getpg(netdev, &pg);
+		if (!err)
+			NLA_PUT(skb, DCB_ATTR_CEE_PEER_PG, sizeof(pg), &pg);
+	}
+
+	if (ops->cee_peer_getpfc) {
+		struct cee_pfc pfc;
+		err = ops->cee_peer_getpfc(netdev, &pfc);
+		if (!err)
+			NLA_PUT(skb, DCB_ATTR_CEE_PEER_PFC, sizeof(pfc), &pfc);
+	}
+
+	if (ops->peer_getappinfo && ops->peer_getapptable) {
+		err = dcbnl_build_peer_app(netdev, skb, DCB_ATTR_CEE_PEER_APP);
+		if (err)
+			goto nla_put_failure;
+	}
+
+	nla_nest_end(skb, cee);
+	nlmsg_end(skb, nlh);
+
+	return rtnl_unicast(skb, &init_net, pid);
+nla_put_failure:
+	nlmsg_cancel(skb, nlh);
+nlmsg_failure:
+	kfree_skb(skb);
+	return -1;
+}
+
 static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
 	struct net *net = sock_net(skb->sk);
@@ -1641,6 +1703,10 @@ static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 		ret = dcbnl_setfeatcfg(netdev, tb, pid, nlh->nlmsg_seq,
 				       nlh->nlmsg_flags);
 		goto out;
+	case DCB_CMD_CEE_GET:
+		ret = dcbnl_cee_get(netdev, tb, pid, nlh->nlmsg_seq,
+				    nlh->nlmsg_flags);
+		goto out;
 	default:
 		goto errout;
 	}
-- 
1.7.3.5
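
The payload sizes that dcbnl_cee_get() places in the DCB_ATTR_CEE nest follow from the struct layouts in this patch and the netlink attribute rules: a 4-byte attribute header, with every attribute padded to a multiple of 4 bytes, as in nla_total_size(). The sketch below redoes that arithmetic in userspace. The *_mirror structs copy the field layout of struct cee_pg and struct cee_pfc using uint8_t, and the helper names are hypothetical. Because NLA_PUT() copies these structs byte for byte, their size and layout become part of the user-visible interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Netlink attribute arithmetic: 4-byte header, 4-byte alignment
 * (same rule as the kernel's NLA_ALIGN()/nla_total_size()). */
static size_t sk_nla_align(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

static size_t sk_nla_total_size(size_t payload)
{
	return sk_nla_align(4 + payload);
}

/* Userspace mirrors of struct cee_pg / struct cee_pfc from the patch. */
#define CEE_DCBX_MAX_PGS 8

struct cee_pg_mirror {
	uint8_t willing, error, pg_en, tcs_supported;
	uint8_t pg_bw[CEE_DCBX_MAX_PGS];
	uint8_t prio_pg[CEE_DCBX_MAX_PGS];
};

struct cee_pfc_mirror {
	uint8_t willing, error, pfc_en, tcs_supported;
};

/* Size of a DCB_ATTR_CEE nest holding the peer PG and PFC attributes
 * (the optional peer APP nest is not counted). */
static size_t cee_nest_size(void)
{
	return sk_nla_total_size(sk_nla_total_size(sizeof(struct cee_pg_mirror)) +
				 sk_nla_total_size(sizeof(struct cee_pfc_mirror)));
}
```

With both peer PG and PFC present, the nest is 36 bytes: 24 for the PG attribute, 8 for the PFC attribute, plus the 4-byte nest header.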





* [PATCH net-next-2.6 1/2] dcbnl: add support for retrieving peer configuration - ieee
From: Shmulik Ravid @ 2011-02-24 21:03 UTC (permalink / raw)
  To: davem; +Cc: John Fastabend, Eilon Greenstein, netdev

These 2 patches add support for retrieving the remote (peer) DCBX
configuration via dcbnl for embedded DCBX stacks. The peer configuration
is part of the DCBX MIB and is useful for debugging and diagnosing
the overall DCB configuration. The first patch adds this support for the
IEEE 802.1Qaz standard; the second patch adds the same support for the
older CEE standard.

Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
---
 include/linux/dcbnl.h |   38 ++++++++++++++++++++++++++
 include/net/dcbnl.h   |    5 +++
 net/dcb/dcbnl.c       |   71 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 114 insertions(+), 0 deletions(-)

diff --git a/include/linux/dcbnl.h b/include/linux/dcbnl.h
index 4c5b26e..3102185 100644
--- a/include/linux/dcbnl.h
+++ b/include/linux/dcbnl.h
@@ -110,6 +110,22 @@ struct dcb_app {
 	__u16	protocol;
 };
 
+/* This structure contains the APP feature information sent by the peer.
+ * It is used for both the IEEE 802.1Qaz and the CEE flavors.
+ *
+ * @willing: willing bit in the peer APP tlv
+ * @error: error bit in the peer APP tlv
+ * @app_count: The number of objects in the peer APP table.
+ *
+ * In addition to this information the full peer APP tlv also contains
+ * a table of 'app_count' APP objects defined above.
+ */
+struct dcb_peer_app_info {
+	__u8	willing;
+	__u8	error;
+	__u16	app_count;
+};
+
 struct dcbmsg {
 	__u8               dcb_family;
 	__u8               cmd;
@@ -235,11 +251,25 @@ enum dcbnl_attrs {
 	DCB_ATTR_MAX = __DCB_ATTR_ENUM_MAX - 1,
 };
 
+/**
+ * enum ieee_attrs - IEEE 802.1Qaz get/set attributes
+ *
+ * @DCB_ATTR_IEEE_UNSPEC: unspecified
+ * @DCB_ATTR_IEEE_ETS: negotiated ETS configuration
+ * @DCB_ATTR_IEEE_PFC: negotiated PFC configuration
+ * @DCB_ATTR_IEEE_APP_TABLE: negotiated APP configuration
+ * @DCB_ATTR_IEEE_PEER_ETS: peer ETS configuration - get only
+ * @DCB_ATTR_IEEE_PEER_PFC: peer PFC configuration - get only
+ * @DCB_ATTR_IEEE_PEER_APP: peer APP tlv - get only
+ */
 enum ieee_attrs {
 	DCB_ATTR_IEEE_UNSPEC,
 	DCB_ATTR_IEEE_ETS,
 	DCB_ATTR_IEEE_PFC,
 	DCB_ATTR_IEEE_APP_TABLE,
+	DCB_ATTR_IEEE_PEER_ETS,
+	DCB_ATTR_IEEE_PEER_PFC,
+	DCB_ATTR_IEEE_PEER_APP,
 	__DCB_ATTR_IEEE_MAX
 };
 #define DCB_ATTR_IEEE_MAX (__DCB_ATTR_IEEE_MAX - 1)
@@ -251,6 +281,14 @@ enum ieee_attrs_app {
 };
 #define DCB_ATTR_IEEE_APP_MAX (__DCB_ATTR_IEEE_APP_MAX - 1)
 
+enum peer_app_attr {
+	DCB_ATTR_PEER_APP_UNSPEC,
+	DCB_ATTR_PEER_APP_INFO,
+	DCB_ATTR_PEER_APP,
+	__DCB_ATTR_PEER_APP_MAX
+};
+#define DCB_ATTR_PEER_APP_MAX (__DCB_ATTR_PEER_APP_MAX - 1)
+
 /**
  * enum dcbnl_pfc_attrs - DCB Priority Flow Control user priority nested attrs
  *
diff --git a/include/net/dcbnl.h b/include/net/dcbnl.h
index a8e7852..739feb4 100644
--- a/include/net/dcbnl.h
+++ b/include/net/dcbnl.h
@@ -43,6 +43,8 @@ struct dcbnl_rtnl_ops {
 	int (*ieee_setpfc) (struct net_device *, struct ieee_pfc *);
 	int (*ieee_getapp) (struct net_device *, struct dcb_app *);
 	int (*ieee_setapp) (struct net_device *, struct dcb_app *);
+	int (*ieee_peer_getets) (struct net_device *, struct ieee_ets *);
+	int (*ieee_peer_getpfc) (struct net_device *, struct ieee_pfc *);
 
 	/* CEE std */
 	u8   (*getstate)(struct net_device *);
@@ -77,6 +79,9 @@ struct dcbnl_rtnl_ops {
 	u8   (*getdcbx)(struct net_device *);
 	u8   (*setdcbx)(struct net_device *, u8);
 
+	/* peer apps */
+	int (*peer_getappinfo)(struct net_device *, struct dcb_peer_app_info *);
+	int (*peer_getapptable)(struct net_device *, struct dcb_app *);
 
 };
 
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index d5074a5..0eb7461 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1224,6 +1224,56 @@ err:
 	return err;
 }
 
+static int dcbnl_build_peer_app(struct net_device *netdev, struct sk_buff* skb,
+				int attrtype)
+{
+	struct dcb_peer_app_info info;
+	struct dcb_app *table = NULL;
+	const struct dcbnl_rtnl_ops *ops = netdev->dcbnl_ops;
+	int err;
+
+	/**
+	 * Retrieve the peer app configuration from the driver. If the driver
+	 * handlers fail, exit without doing anything.
+	 */
+	err = ops->peer_getappinfo(netdev, &info);
+	if (!err && info.app_count) {
+		table = kmalloc(sizeof(struct dcb_app) * info.app_count,
+				GFP_KERNEL);
+		if (!table)
+			return -ENOMEM;
+
+		err = ops->peer_getapptable(netdev, table);
+	}
+
+	if (!err) {
+		u16 i;
+		struct nlattr *app;
+
+		/**
+		 * Build the message; from here on the only possible failure
+		 * is running out of room in the skb.
+		 */
+		err = -EMSGSIZE;
+
+		app = nla_nest_start(skb, attrtype);
+		if (!app)
+			goto nla_put_failure;
+
+		NLA_PUT(skb, DCB_ATTR_PEER_APP_INFO, sizeof(info), &info);
+
+		for (i = 0; i < info.app_count; i++)
+			NLA_PUT(skb, DCB_ATTR_PEER_APP, sizeof(struct dcb_app),
+				&table[i]);
+
+		nla_nest_end(skb, app);
+	}
+	err = 0;
+
+nla_put_failure:
+	kfree(table);
+	return err;
+}
 
 /* Handle IEEE 802.1Qaz GET commands. */
 static int dcbnl_ieee_get(struct net_device *netdev, struct nlattr **tb,
@@ -1288,6 +1338,27 @@ static int dcbnl_ieee_get(struct net_device *netdev, struct nlattr **tb,
 	spin_unlock(&dcb_lock);
 	nla_nest_end(skb, app);
 
+	/* get peer info if available */
+	if (ops->ieee_peer_getets) {
+		struct ieee_ets ets;
+		err = ops->ieee_peer_getets(netdev, &ets);
+		if (!err)
+			NLA_PUT(skb, DCB_ATTR_IEEE_PEER_ETS, sizeof(ets), &ets);
+	}
+
+	if (ops->ieee_peer_getpfc) {
+		struct ieee_pfc pfc;
+		err = ops->ieee_peer_getpfc(netdev, &pfc);
+		if (!err)
+			NLA_PUT(skb, DCB_ATTR_IEEE_PEER_PFC, sizeof(pfc), &pfc);
+	}
+
+	if (ops->peer_getappinfo && ops->peer_getapptable) {
+		err = dcbnl_build_peer_app(netdev, skb, DCB_ATTR_IEEE_PEER_APP);
+		if (err)
+			goto nla_put_failure;
+	}
+
 	nla_nest_end(skb, ieee);
 	nlmsg_end(skb, nlh);
 
-- 
1.7.3.5
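The control flow of dcbnl_build_peer_app() in the patch above is a two-phase query: ask the driver how many peer APP entries it has, allocate a table of that size, let the driver fill it, and only then build the netlink nest. A driver error at either step is deliberately turned into "nothing to report" (err is reset to 0), while allocation failure is returned. The userspace sketch below models just that flow; fake_ops, fake_app, fake_app_info, collect_peer_apps() and the mock callbacks are all hypothetical stand-ins, not the kernel's types.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct dcb_app / struct dcb_peer_app_info. */
struct fake_app { uint8_t selector; uint8_t priority; uint16_t protocol; };
struct fake_app_info { uint16_t app_count; };

/* Hypothetical subset of the ops table: the two peer-app callbacks. */
struct fake_ops {
	int (*peer_getappinfo)(void *dev, struct fake_app_info *info);
	int (*peer_getapptable)(void *dev, struct fake_app *table);
};

/*
 * Same shape as dcbnl_build_peer_app(): count, allocate, fill, emit.
 * "Emitting" here just reports how many entries would become attributes.
 */
static int collect_peer_apps(const struct fake_ops *ops, void *dev, int *emitted)
{
	struct fake_app_info info = { 0 };
	struct fake_app *table = NULL;
	int err;

	*emitted = 0;
	err = ops->peer_getappinfo(dev, &info);
	if (!err && info.app_count) {
		/* calloc() also checks count * size for overflow */
		table = calloc(info.app_count, sizeof(*table));
		if (!table)
			return -ENOMEM;
		err = ops->peer_getapptable(dev, table);
	}
	if (!err)
		*emitted = info.app_count;	/* one attribute per entry */
	free(table);
	return 0;	/* driver errors are swallowed, as in the patch */
}

/* Mock driver callbacks for exercising the flow. */
static int mock_getappinfo(void *dev, struct fake_app_info *info)
{
	(void)dev;
	info->app_count = 2;
	return 0;
}

static int mock_getapptable(void *dev, struct fake_app *table)
{
	(void)dev;
	table[0] = (struct fake_app){ 1, 3, 0x8906 };
	table[1] = (struct fake_app){ 2, 4, 3260 };
	return 0;
}

static int failing_getapptable(void *dev, struct fake_app *table)
{
	(void)dev;
	(void)table;
	return -EIO;
}
```

Wiring the mocks into a fake_ops and calling collect_peer_apps() reports two entries; swapping in failing_getapptable still returns 0 but reports none, and the table is freed either way. In kernel code, kcalloc() would be the counterpart of the calloc() used here.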

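On the wire, the patch produces a nested attribute (DCB_ATTR_IEEE_PEER_APP inside the IEEE nest) that holds one DCB_ATTR_PEER_APP_INFO attribute followed by app_count DCB_ATTR_PEER_APP attributes. The sketch below builds and walks such a nest in plain userspace C to show that layout. It mirrors the standard netlink attribute header and 4-byte alignment; the payload structs (sketch_peer_app_info, sketch_dcb_app) and their field layouts are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Netlink attribute header and alignment, as in <linux/netlink.h>. */
struct nlattr {
	uint16_t nla_len;	/* header + payload, excluding padding */
	uint16_t nla_type;
};
#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN ((int)NLA_ALIGN(sizeof(struct nlattr)))

/* Attribute numbers from the patch's enum peer_app_attr. */
enum { DCB_ATTR_PEER_APP_UNSPEC, DCB_ATTR_PEER_APP_INFO, DCB_ATTR_PEER_APP };

/* Assumed payload layouts (hypothetical). */
struct sketch_peer_app_info { uint8_t willing; uint8_t error; uint16_t app_count; };
struct sketch_dcb_app { uint8_t selector; uint8_t priority; uint16_t protocol; };

/* Append one attribute at buf + off (caller guarantees space). */
static size_t put_attr(uint8_t *buf, size_t off, uint16_t type,
		       const void *data, uint16_t len)
{
	struct nlattr hdr = { (uint16_t)(NLA_HDRLEN + len), type };

	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + NLA_HDRLEN, data, len);
	/* zero the padding up to the next 4-byte boundary */
	memset(buf + off + NLA_HDRLEN + len, 0, NLA_ALIGN(len) - len);
	return off + NLA_ALIGN(hdr.nla_len);
}

/* Emit nest{ INFO, APP x app_count } in the order the patch uses. */
static size_t build_peer_app(uint8_t *buf, uint16_t nest_type,
			     const struct sketch_peer_app_info *info,
			     const struct sketch_dcb_app *table)
{
	size_t off = NLA_HDRLEN;	/* room for the nest header */
	struct nlattr nest;
	uint16_t i;

	off = put_attr(buf, off, DCB_ATTR_PEER_APP_INFO, info, sizeof(*info));
	for (i = 0; i < info->app_count; i++)
		off = put_attr(buf, off, DCB_ATTR_PEER_APP, &table[i], sizeof(table[i]));

	/* like nla_nest_end(): fix up the nest length after the children */
	nest.nla_len = (uint16_t)off;
	nest.nla_type = nest_type;
	memcpy(buf, &nest, sizeof(nest));
	return off;
}

/* Walk the nest; returns the number of APP entries copied, or -1. */
static int parse_peer_app(const uint8_t *buf, size_t buflen,
			  struct sketch_peer_app_info *info,
			  struct sketch_dcb_app *out, int max)
{
	struct nlattr nest, a;
	size_t off = NLA_HDRLEN;
	int n = 0;

	if (buflen < (size_t)NLA_HDRLEN)
		return -1;
	memcpy(&nest, buf, sizeof(nest));
	if (nest.nla_len < NLA_HDRLEN || nest.nla_len > buflen)
		return -1;

	while (off + NLA_HDRLEN <= nest.nla_len) {
		const uint8_t *payload = buf + off + NLA_HDRLEN;
		size_t plen;

		memcpy(&a, buf + off, sizeof(a));
		if (a.nla_len < NLA_HDRLEN || off + a.nla_len > nest.nla_len)
			return -1;	/* malformed child attribute */
		plen = (size_t)a.nla_len - NLA_HDRLEN;
		if (a.nla_type == DCB_ATTR_PEER_APP_INFO && plen >= sizeof(*info))
			memcpy(info, payload, sizeof(*info));
		else if (a.nla_type == DCB_ATTR_PEER_APP && plen >= sizeof(*out) && n < max)
			memcpy(&out[n++], payload, sizeof(*out));
		off += NLA_ALIGN(a.nla_len);
	}
	return n;
}
```

With two table entries and these 4-byte payloads, the nest is 28 bytes: a 4-byte nest header plus three 8-byte child attributes. A consumer should not assume a fixed attribute order; the parser above dispatches on nla_type for that reason.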


* Re: [PATCH v4] net: add Faraday FTMAC100 10/100 Ethernet driver
From: David Miller @ 2011-02-24 18:43 UTC (permalink / raw)
  To: ratbert.chuang
  Cc: mirqus, netdev, linux-kernel, bhutchings, eric.dumazet, joe,
	dilinger, ratbert
In-Reply-To: <AANLkTikE_OU_qiBUdiDP39js8jQzGqqo1FYW2uHmw3He@mail.gmail.com>

From: Po-Yu Chuang <ratbert.chuang@gmail.com>
Date: Thu, 24 Feb 2011 16:07:48 +0800

> On Thu, Feb 24, 2011 at 3:51 PM, David Miller <davem@davemloft.net> wrote:
>> Just emit garbage bytes into the sub-word alignment padding if the chip
>> wants to word align its DMA writes.
> 
> Not sure what you mean. The problem is that HW does not accept a
> base address of RX buffer which is not 8 bytes aligned.

I am saying this is what hardware should do if it has such a
restriction.
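The arithmetic behind this suggestion, sketched under two assumptions: a 14-byte Ethernet header, and a controller that requires an 8-byte aligned RX buffer base (as stated above). With no padding, the IP header starts at offset 14 and is only 2-byte aligned; if the hardware writes 2 filler bytes first, it starts at offset 16. That 2-byte shift is what the kernel's NET_IP_ALIGN normally provides through skb_reserve(). The helpers align_up(), rx_data_start() and ip_header_offset() are illustrative, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14	/* Ethernet header length in bytes */

/* Round addr up to a multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Where the frame's first byte lands if the DMA base is aligned for the
 * hardware and the hardware then writes `pad` garbage bytes.
 */
static uintptr_t rx_data_start(uintptr_t raw, uintptr_t hw_align, size_t pad)
{
	return align_up(raw, hw_align) + pad;
}

/* Offset of the IP header from the aligned DMA base. */
static size_t ip_header_offset(size_t pad)
{
	return pad + ETH_HLEN;
}
```

For example, a raw buffer at 0x1003 gives an 8-byte aligned base of 0x1008; with 2 bytes of padding the frame starts at 0x100a, and the IP header at 0x1018 is 4-byte aligned.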


* tun.c non formal header protocol?
From: Kfir Lavi @ 2011-02-24 18:37 UTC (permalink / raw)
  To: netdev
In-Reply-To: <AANLkTin-PJs07ruhhbC5U0DmqFd-rP8Csorethz_1kes@mail.gmail.com>

Hi,
I would like to use a custom protocol over a tun/tap device.
I'm grabbing packets and changing them, to deliver them via tap
to a listener that knows this custom protocol.
The custom protocol just wraps the packet with another
small header.
Is it possible to move custom packets via tun.c?
Is it possible to have a hook, just before the receiver gets
the packet, that modifies the packet's header so that
it will not disturb tun.c?

Are there any other options that keep the power of the Linux stack
and netfilter while using this custom protocol?

I'm grabbing the packets to userspace using nfq, and then with
a verdict I'm returning them to the tap device.

Thanks,
Kfir
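One possible arrangement, sketched under the assumption that both ends are your own userspace programs: keep the wrapper header entirely in userspace, so the frames written to or read from the tap fd stay ordinary Ethernet frames. The sender prepends the small header before forwarding a frame; the listener validates and strips it before using the frame. The 8-byte header layout (magic, tag, payload length in network byte order), WRAP_MAGIC and the helper names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper header: 4-byte magic, 2-byte tag, 2-byte length. */
#define WRAP_MAGIC 0x4B4C5031u	/* arbitrary value chosen for this sketch */
#define WRAP_HLEN 8

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
	put_be16(p, (uint16_t)(v >> 16));
	put_be16(p + 2, (uint16_t)v);
}

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)(p[0] << 8 | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
	return (uint32_t)get_be16(p) << 16 | get_be16(p + 2);
}

/* Wrap one frame; returns bytes written to out, or 0 if it does not fit. */
static size_t wrap_frame(uint8_t *out, size_t outlen, uint16_t tag,
			 const uint8_t *frame, size_t len)
{
	if (len > 0xFFFF || outlen < WRAP_HLEN + len)
		return 0;
	put_be32(out, WRAP_MAGIC);
	put_be16(out + 4, tag);
	put_be16(out + 6, (uint16_t)len);
	memcpy(out + WRAP_HLEN, frame, len);
	return WRAP_HLEN + len;
}

/* Validate and strip the header; returns the inner frame or NULL. */
static const uint8_t *unwrap_frame(const uint8_t *in, size_t inlen,
				   uint16_t *tag, size_t *len)
{
	size_t plen;

	if (inlen < WRAP_HLEN || get_be32(in) != WRAP_MAGIC)
		return NULL;
	plen = get_be16(in + 6);
	if (WRAP_HLEN + plen > inlen)
		return NULL;	/* truncated */
	*tag = get_be16(in + 4);
	*len = plen;
	return in + WRAP_HLEN;
}
```

In use, the sending program would pass each frame it read (from the tap fd, or from the nfqueue payload) through wrap_frame() before sending it on, and the listener would call unwrap_frame() on each received buffer, dropping anything that returns NULL.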


* Re: STMMAC driver: NFS Problem on 2.6.37
From: Chuck Lever @ 2011-02-24 18:33 UTC (permalink / raw)
  To: Shiraz Hashim
  Cc: Brian Downing, Deepak SIKRI, Armando VISCONTI, Trond Myklebust,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Linux NFS Mailing List, Viresh KUMAR, Peppe CAVALLARO, amitgoel
In-Reply-To: <20110224133627.GO920@DLHLAP0379>


On Feb 24, 2011, at 5:36 AM, Shiraz Hashim wrote:

> On Thu, Feb 10, 2011 at 05:26:16AM +0800, Chuck Lever wrote:
>> 
>> On Feb 9, 2011, at 3:58 PM, Brian Downing wrote:
>> 
>>> On Wed, Feb 09, 2011 at 03:12:22PM -0500, Chuck Lever wrote:
>>>> Based on your console logs, I see that the working case uses UDP to
>>>> contact the server's mountd, but the failing case uses TCP.  You can
>>>> try explicitly specifying "proto=udp" to force the use of UDP, to test
>>>> this theory.
>>> 
>>> This does indeed make it work again for me, thanks!
>>> 
>>>> Meanwhile, the patch description explicitly states that the default
>>>> mount option settings have changed.  Does it make sense to change the
>>>> default behavior of NFSROOT mounts to use UDP again?  I don't see
>>>> another way to make this process more reliable across NIC
>>>> initialization.  If this is considered a regression, we can make a
>>>> patch for 2.6.38-rc and 2.6.37.
>>> 
>>> I only use nfsroot for development, so I don't have a terribly strong
>>> opinion.  I would point out though that the default u-boot parameters
>>> for nfsrooting a lot of boards will no longer work at this point, so if
>>> it's not patched to work again without specifying nfs options I think
>>> there should at least be a note in the documentation and possibly a
>>> "maybe try proto=udp?" console message on failure.
>>> 
>>> I assume it's not feasible to either wait until the chosen interface's
>>> link is ready before trying to mount nfsroot, or retrying TCP-based
>>> connections a little bit more aggressively/at all?
>> 
>> Our goal is to use the same mount logic for both normal user
>> space mounts and for NFSROOT (that was the purpose of the patch
>> series this particular patch comes from).  It's
>> exceptionally difficult to add a special case for retrying TCP
>> connections here, as that would change the behavior of user
>> space mounts, which often want to fail quickly, and don't need
>> to worry about NIC initialization.
>> 
>> Sounds like the right thing to do is restore the default UDP behavior.  I'll cook up a patch.
> 
> Is there a patch available for this now?

Yes, it was posted a couple of weeks ago (sorry, I don't have an exact reference).  I will ping Trond again about getting this upstream.

> One more observation (on 2.6.37): when I pass
> nfsroot=$(ip):$(rootpath),udp, it works fine.
> If I pass proto=udp, it doesn't work. Is there any difference
> between the two methods?

It may be that proto=udp has an effect only on the transport used for NFS requests, but not for the MNT request.  "udp" means "proto=udp,mountproto=udp."

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


