Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] mlx4: Sensing link type at device initialization
From: David Miller @ 2011-04-08  3:36 UTC (permalink / raw)
  To: yevgenyp; +Cc: netdev
In-Reply-To: <4D9D82DA.2090301@mellanox.co.il>

From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date: Thu, 7 Apr 2011 12:24:42 +0300

> 
> When bringing the port up, performing a SENSE_PORT command
> To try and check to which physical link type (IB or Ethernet) the physical
> port is connected.
> In case there is no valid link partner, the port will come up as its
> supported default.
> 
> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>

Applied.

^ permalink raw reply

* Re: [PATCH] mlx4_en: Restoring RX buffer pointer in case of failure
From: David Miller @ 2011-04-08  3:36 UTC (permalink / raw)
  To: yevgenyp; +Cc: netdev
In-Reply-To: <4D9D8319.4030208@mellanox.co.il>

From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date: Thu, 7 Apr 2011 12:25:45 +0300

> 
> If not done, second attempt to open the RX ring would cause memory corruption.
> 
> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>

Applied.

^ permalink raw reply

* linux-next: manual merge of the net tree with Linus' tree
From: Stephen Rothwell @ 2011-04-08  3:45 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, Lucas De Marchi, Joe Perches

Hi all,

Today's linux-next merge of the net tree got a conflict in
drivers/net/smsc911x.c between commit 25985edcedea ("Fix common
misspellings") from Linus' tree and commit dffc6b2432ea ("smsc911x: Use
pr_fmt, netdev_<level>, and netif_<level>") from the net tree.

I fixedd it up (see below) anc can carry the fix as necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc drivers/net/smsc911x.c
index 4b42ecc,05a882e..0000000
--- a/drivers/net/smsc911x.c
+++ b/drivers/net/smsc911x.c
@@@ -1669,7 -1669,7 +1669,7 @@@ static int smsc911x_eeprom_send_cmd(str
  	}
  
  	if (e2cmd & E2P_CMD_EPC_TIMEOUT_) {
- 		SMSC_TRACE(DRV, "Error occurred during eeprom operation");
 -		SMSC_TRACE(pdata, drv, "Error occured during eeprom operation");
++		SMSC_TRACE(pdata, drv, "Error occurred during eeprom operation");
  		return -EINVAL;
  	}
  

^ permalink raw reply

* Emergency Email
From: Mrs. Grace Yee @ 2011-04-07 23:14 UTC (permalink / raw)


Deal worth US45,275,000.00. Interested?




^ permalink raw reply

* Re: linux-next: manual merge of the net tree with Linus' tree
From: David Miller @ 2011-04-08  3:49 UTC (permalink / raw)
  To: sfr; +Cc: netdev, linux-next, linux-kernel, lucas.demarchi, joe
In-Reply-To: <20110408134551.521c6606.sfr@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Fri, 8 Apr 2011 13:45:51 +1000

> Today's linux-next merge of the net tree got a conflict in
> drivers/net/smsc911x.c between commit 25985edcedea ("Fix common
> misspellings") from Linus' tree and commit dffc6b2432ea ("smsc911x: Use
> pr_fmt, netdev_<level>, and netif_<level>") from the net tree.
> 
> I fixedd it up (see below) anc can carry the fix as necessary.

Thanks Stephen, I'll clear this one up some time in the next week.

^ permalink raw reply

* (unknown), 
From: WESTERN UNION MONEY TRANSFER @ 2011-04-08  1:17 UTC (permalink / raw)





We have been trying to reach you as My associate has helped me to send
your first payment of US$7,500 to you
as instructed by Minister, David Cameron the British prime minister after
the last G20 meeting that was held on April 2nd in London, making you
one of the beneficaries. Here is the information below:

MONEY TRANSFER CONTROL NUMBER:8309124803
SENDER NAME:SOLOMON
Last name:DANEIL.Q
AMOUNT: US$7,500

I told him to keep sending you US$7,500 twice a week until the FULL
payment of (US $360,000.00 Dollars) is completed within 6 (six)
Months.For track, send your

Full Names via Email to:

Mr Garry Moore
E-mail:western.uniontransfer08@live.com


^ permalink raw reply

* [PATCH net-next-2.6 v3 0/3] sctp: Patch series
From: Michio Honda @ 2011-04-08  5:18 UTC (permalink / raw)
  To: netdev; +Cc: lksctp-developers

Series of 5 patches to support auto_asconf and the other related functionalities that auto_asconf relies on. 

Cheers,
- Michio

[1/3] Add Auto-ASCONF support
[2/3] Add ASCONF operation on the single-homed host
[3/3] Add a valid address list in association local

^ permalink raw reply

* [PATCH net-next-2.6 v3 1/3] sctp: Add Auto-ASCONF support
From: Michio Honda @ 2011-04-08  5:18 UTC (permalink / raw)
  To: netdev; +Cc: lksctp-developers

SCTP reconfigure the IP addresses in the association by using ASCONF chunks as mentioned in RFC5061.  
For example, we can start to use the newly configured IP address in the existing association.  
ASCONF operation is invoked in two ways: 
First is done by the application to call sctp_bindx() system call.  
Second is automatic operation in the SCTP stack with address events in the host computer (called auto_asconf) .  
The former is already implemented, but the latter is not yet. This patch enables it with one sysctl parameter and setsockopt() system call.  

Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
---
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 11684d9..11c3060 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -767,6 +767,7 @@ enum {
 	NET_SCTP_SNDBUF_POLICY		 = 15,
 	NET_SCTP_SACK_TIMEOUT		 = 16,
 	NET_SCTP_RCVBUF_POLICY		 = 17,
+	NET_SCTP_AUTO_ASCONF_ENABLE	 = 18,
 };
 
 /* /proc/sys/net/bridge */
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 505845d..75ba6a4 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -121,6 +121,7 @@ extern int sctp_copy_local_addr_list(struct sctp_bind_addr *,
 				     int flags);
 extern struct sctp_pf *sctp_get_pf_specific(sa_family_t family);
 extern int sctp_register_pf(struct sctp_pf *, sa_family_t);
+void sctp_addr_wq_mgmt(union sctp_addr *, int);
 
 /*
  * sctp/socket.c
@@ -135,6 +136,7 @@ void sctp_sock_rfree(struct sk_buff *skb);
 void sctp_copy_sock(struct sock *newsk, struct sock *sk,
 		    struct sctp_association *asoc);
 extern struct percpu_counter sctp_sockets_allocated;
+int sctp_asconf_mgmt(struct sctp_endpoint *);
 
 /*
  * sctp/primitive.c
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index cc9185c..3e0351a 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -205,6 +205,11 @@ extern struct sctp_globals {
 	 * It is a list of sctp_sockaddr_entry.
 	 */
 	struct list_head local_addr_list;
+	int auto_asconf_enable;
+	struct list_head addr_waitq;
+	struct timer_list addr_wq_timer;
+	struct list_head auto_asconf_eplist;
+	spinlock_t addr_wq_lock;
 
 	/* Lock that protects the local_addr_list writers */
 	spinlock_t addr_list_lock;
@@ -264,6 +269,11 @@ extern struct sctp_globals {
 #define sctp_port_hashtable		(sctp_globals.port_hashtable)
 #define sctp_local_addr_list		(sctp_globals.local_addr_list)
 #define sctp_local_addr_lock		(sctp_globals.addr_list_lock)
+#define sctp_auto_asconf_eplist		(sctp_globals.auto_asconf_eplist)
+#define sctp_addr_waitq			(sctp_globals.addr_waitq)
+#define sctp_addr_wq_timer		(sctp_globals.addr_wq_timer)
+#define sctp_addr_wq_lock		(sctp_globals.addr_wq_lock)
+#define sctp_auto_asconf_enable		(sctp_globals.auto_asconf_enable)
 #define sctp_scope_policy		(sctp_globals.ipv4_scope_policy)
 #define sctp_addip_enable		(sctp_globals.addip_enable)
 #define sctp_addip_noauth		(sctp_globals.addip_noauth_enable)
@@ -796,6 +806,15 @@ struct sctp_sockaddr_entry {
 	__u8 valid;
 };
 
+#define SCTP_NEWADDR	1
+#define SCTP_DELADDR	2
+#define SCTP_ADDRESS_TICK_DELAY	500
+struct sctp_addr_wait {
+	struct list_head list;
+	union sctp_addr a;
+	int	cmd;
+};
+
 typedef struct sctp_chunk *(sctp_packet_phandler_t)(struct sctp_association *);
 
 /* This structure holds lists of chunks as we are assembling for
@@ -1239,6 +1258,7 @@ sctp_scope_t sctp_scope(const union sctp_addr *);
 int sctp_in_scope(const union sctp_addr *addr, const sctp_scope_t scope);
 int sctp_is_any(struct sock *sk, const union sctp_addr *addr);
 int sctp_addr_is_valid(const union sctp_addr *addr);
+int sctp_is_ep_boundall(struct sock *sk);
 
 
 /* What type of endpoint?  */
@@ -1267,6 +1287,7 @@ struct sctp_ep_common {
 	/* Fields to help us manage our entries in the hash tables. */
 	struct hlist_node node;
 	int hashent;
+	struct list_head auto_asconf_list;
 
 	/* Runtime type information.  What kind of endpoint is this? */
 	sctp_endpoint_type_t type;
@@ -1369,6 +1390,7 @@ struct sctp_endpoint {
 	/* SCTP-AUTH: endpoint shared keys */
 	struct list_head endpoint_shared_keys;
 	__u16 active_key_id;
+	int do_auto_asconf;
 };
 
 /* Recover the outter endpoint structure. */
diff --git a/include/net/sctp/user.h b/include/net/sctp/user.h
index e73ebda..75c96b1 100644
--- a/include/net/sctp/user.h
+++ b/include/net/sctp/user.h
@@ -91,6 +91,7 @@ typedef __s32 sctp_assoc_t;
 #define SCTP_PEER_AUTH_CHUNKS	26	/* Read only */
 #define SCTP_LOCAL_AUTH_CHUNKS	27	/* Read only */
 #define SCTP_GET_ASSOC_NUMBER	28	/* Read only */
+#define SCTP_AUTO_ASCONF	29
 
 /* Internal Socket Options. Some of the sctp library functions are
  * implemented using these socket options.
diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index faf71d1..426715f 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -536,6 +536,23 @@ int sctp_in_scope(const union sctp_addr *addr, sctp_scope_t scope)
 	return 0;
 }
 
+int sctp_is_ep_boundall(struct sock *sk)
+{
+	struct sctp_bind_addr *bp;
+	struct sctp_sockaddr_entry *addr;
+
+	bp = &sctp_sk(sk)->ep->base.bind_addr;
+	if (sctp_list_single_entry(&bp->address_list)) {
+		addr = list_entry(bp->address_list.next,
+				  struct sctp_sockaddr_entry, list);
+		if (sctp_is_any(sk, &addr->a))
+			return 1;
+		else
+			return 0;
+	}
+	return 1;
+}
+
 /********************************************************************
  * 3rd Level Abstractions
  ********************************************************************/
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 865ce7b..1b31b4d 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -105,6 +105,7 @@ static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
 			addr->valid = 1;
 			spin_lock_bh(&sctp_local_addr_lock);
 			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
+			sctp_addr_wq_mgmt(&addr->a, SCTP_NEWADDR);
 			spin_unlock_bh(&sctp_local_addr_lock);
 		}
 		break;
@@ -115,6 +116,7 @@ static int sctp_inet6addr_event(struct notifier_block *this, unsigned long ev,
 			if (addr->a.sa.sa_family == AF_INET6 &&
 					ipv6_addr_equal(&addr->a.v6.sin6_addr,
 						&ifa->addr)) {
+				sctp_addr_wq_mgmt(&addr->a, SCTP_DELADDR);
 				found = 1;
 				addr->valid = 0;
 				list_del_rcu(&addr->list);
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 152976e..f9e0bd6 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -636,6 +636,194 @@ static void sctp_v4_ecn_capable(struct sock *sk)
 	INET_ECN_xmit(sk);
 }
 
+void sctp_addr_wq_timeout_handler(unsigned long arg)
+{
+	struct sctp_addr_wait *addrw = NULL;
+	union sctp_addr *addr = NULL;
+	struct sctp_ep_common *epb = NULL;
+	struct sctp_endpoint *ep = NULL;
+
+	spin_lock_bh(&sctp_addr_wq_lock);
+retry_wq:
+	if (list_empty(&sctp_addr_waitq)) {
+		SCTP_DEBUG_PRINTK("sctp_addrwq_timo_handler: nothing in addr waitq\n");
+		spin_unlock_bh(&sctp_addr_wq_lock);
+		return;
+	}
+	addrw = list_first_entry(&sctp_addr_waitq, struct sctp_addr_wait, list);
+	if (addrw->cmd != SCTP_NEWADDR && addrw->cmd != SCTP_DELADDR) {
+		SCTP_DEBUG_PRINTK("sctp_addrwq_timo_handler: cmd is neither NEWADDR nor DELADDR\n");
+		list_del(&addrw->list);
+		kfree(addrw);
+		goto retry_wq;
+	}
+
+	addr = &addrw->a;
+	SCTP_DEBUG_PRINTK_IPADDR("sctp_addrwq_timo_handler: the first ent in wq %p is ",
+	    " for cmd %d at entry %p\n", &sctp_addr_waitq, addr, addrw->cmd,
+	    addrw);
+
+	/* Now we send an ASCONF for each association */
+	/* Note. we currently don't handle link local IPv6 addressees */
+	if (addr->sa.sa_family == AF_INET6) {
+		struct in6_addr *in6 = (struct in6_addr *)&addr->v6.sin6_addr;
+
+		if (ipv6_addr_type(&addr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+			SCTP_DEBUG_PRINTK("sctp_timo_handler: link local, hence don't tell eps\n");
+			list_del(&addrw->list);
+			kfree(addrw);
+			goto retry_wq;
+		}
+		if (ipv6_chk_addr(&init_net, in6, NULL, 0) == 0 &&
+		    addrw->cmd == SCTP_NEWADDR) {
+			unsigned long timeo_val;
+
+			SCTP_DEBUG_PRINTK("sctp_timo_handler: this is on DAD, trying %d sec later\n",
+			    SCTP_ADDRESS_TICK_DELAY);
+			timeo_val = jiffies;
+			timeo_val += msecs_to_jiffies(SCTP_ADDRESS_TICK_DELAY);
+			(void)mod_timer(&sctp_addr_wq_timer, timeo_val);
+			spin_unlock_bh(&sctp_addr_wq_lock);
+			return;
+		}
+	}
+	list_for_each_entry(epb, &sctp_auto_asconf_eplist, auto_asconf_list) {
+		if (epb == NULL) {
+			SCTP_DEBUG_PRINTK("addrwq_timo_handler: no epb\n");
+			continue;
+		}
+		if (!sctp_is_ep_boundall(epb->sk))
+			/* ignore bound-specific endpoints */
+			continue;
+		ep = sctp_ep(epb);
+		sctp_bh_lock_sock(epb->sk);
+		if (sctp_asconf_mgmt(ep) < 0) {
+			SCTP_DEBUG_PRINTK("sctp_addrwq_timo_handler: sctp_asconf_mgmt failed\n");
+			sctp_bh_unlock_sock(epb->sk);
+			continue;
+		}
+		sctp_bh_unlock_sock(epb->sk);
+	}
+
+	list_del(&addrw->list);
+	kfree(addrw);
+
+	if (list_empty(&sctp_addr_waitq)) {
+		spin_unlock_bh(&sctp_addr_wq_lock);
+		return;
+	} else
+		goto retry_wq;
+
+	spin_unlock_bh(&sctp_addr_wq_lock);
+}
+
+static void sctp_free_addr_wq()
+{
+	struct sctp_addr_wait *addrw = NULL;
+	struct sctp_addr_wait *temp = NULL;
+
+	spin_lock_bh(&sctp_addr_wq_lock);
+	(void)del_timer(&sctp_addr_wq_timer);
+	list_for_each_entry_safe(addrw, temp, &sctp_addr_waitq, list) {
+		list_del(&addrw->list);
+		kfree(addrw);
+	}
+	spin_unlock_bh(&sctp_addr_wq_lock);
+}
+
+void sctp_addr_wq_mgmt(union sctp_addr *reqaddr, int cmd)
+{
+	struct sctp_addr_wait *addrw = NULL;
+	struct sctp_addr_wait *addrw_new = NULL;
+	unsigned long timeo_val;
+	union sctp_addr *tmpaddr;
+
+	/* first, we check if an opposite message already exist in the queue.
+	 * If we found such message, it is removed.
+	 * This operation is a bit stupid, but the DHCP client attaches the
+	 * new address after a couple of addition and deletion of that address
+	 */
+
+	if (reqaddr == NULL) {
+		SCTP_DEBUG_PRINTK("sctp_addr_wq_mgmt: no address message?\n");
+		return;
+	}
+
+	spin_lock_bh(&sctp_addr_wq_lock);
+	/* Offsets existing events in addr_wq */
+	list_for_each_entry(addrw, &sctp_addr_waitq, list) {
+		if (addrw->a.sa.sa_family != reqaddr->sa.sa_family)
+			continue;
+		if (reqaddr->sa.sa_family == AF_INET) {
+			if (reqaddr->v4.sin_addr.s_addr ==
+			    addrw->a.v4.sin_addr.s_addr) {
+				if (cmd != addrw->cmd) {
+					tmpaddr = &addrw->a;
+					SCTP_DEBUG_PRINTK_IPADDR("sctp_addr_wq_mgmt offsets existing entry for %d ",
+					    " in waitq %p\n", addrw->cmd,
+					    tmpaddr, &sctp_addr_waitq);
+					list_del(&addrw->list);
+					kfree(addrw);
+					/* nothing to do anymore */
+					spin_unlock_bh(&sctp_addr_wq_lock);
+					return;
+				}
+			}
+		} else if (reqaddr->sa.sa_family == AF_INET6) {
+			if (ipv6_addr_equal(&reqaddr->v6.sin6_addr,
+			    &addrw->a.v6.sin6_addr)) {
+				if (cmd != addrw->cmd) {
+					tmpaddr = &addrw->a;
+					SCTP_DEBUG_PRINTK_IPADDR("sctp_addr_wq_mgmt: offsets existing entry for %d ",
+					    " in waitq %p\n", addrw->cmd,
+					    tmpaddr, &sctp_addr_waitq);
+					list_del(&addrw->list);
+					kfree(addrw);
+					spin_unlock_bh(&sctp_addr_wq_lock);
+					return;
+				}
+			}
+		}
+	}
+
+	/* OK, we have to add the new address to the wait queue */
+	addrw_new = kzalloc(sizeof(struct sctp_addr_wait), GFP_ATOMIC);
+	if (addrw_new == NULL) {
+		SCTP_DEBUG_PRINTK("sctp_addr_weitq_mgmt no memory? return\n");
+		spin_unlock_bh(&sctp_addr_wq_lock);
+		return;
+	}
+	if (reqaddr->sa.sa_family == AF_INET) {
+		addrw_new->a.v4.sin_family = AF_INET;
+		addrw_new->a.v4.sin_addr.s_addr = reqaddr->v4.sin_addr.s_addr;
+	} else if (reqaddr->sa.sa_family == AF_INET6) {
+		addrw_new->a.v6.sin6_family = AF_INET6;
+		ipv6_addr_copy(&addrw_new->a.v6.sin6_addr,
+		    &reqaddr->v6.sin6_addr);
+	} else {
+		SCTP_DEBUG_PRINTK("sctp_addr_waitq_mgmt: Unknown family of request addr, return\n");
+		kfree(addrw_new);
+		spin_unlock_bh(&sctp_addr_wq_lock);
+		return;
+	}
+	addrw_new->cmd = cmd;
+	list_add_tail(&addrw_new->list, &sctp_addr_waitq);
+	tmpaddr = &addrw_new->a;
+	SCTP_DEBUG_PRINTK_IPADDR("sctp_addr_wq_mgmt add new entry for cmd:%d ",
+	    " in waitq %p, start a timer\n",
+	    addrw_new->cmd, tmpaddr, &sctp_addr_waitq);
+
+	if (timer_pending(&sctp_addr_wq_timer)) {
+		SCTP_DEBUG_PRINTK("sctp_addr_wq_mgmt: addr_wq timer is already running\n");
+		spin_unlock_bh(&sctp_addr_wq_lock);
+		return;
+	}
+	timeo_val = jiffies;
+	timeo_val += msecs_to_jiffies(SCTP_ADDRESS_TICK_DELAY);
+	(void)mod_timer(&sctp_addr_wq_timer, timeo_val);
+	spin_unlock_bh(&sctp_addr_wq_lock);
+}
+
 /* Event handler for inet address addition/deletion events.
  * The sctp_local_addr_list needs to be protocted by a spin lock since
  * multiple notifiers (say IPv4 and IPv6) may be running at the same
@@ -663,6 +851,7 @@ static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
 			addr->valid = 1;
 			spin_lock_bh(&sctp_local_addr_lock);
 			list_add_tail_rcu(&addr->list, &sctp_local_addr_list);
+			sctp_addr_wq_mgmt(&addr->a, SCTP_NEWADDR);
 			spin_unlock_bh(&sctp_local_addr_lock);
 		}
 		break;
@@ -673,6 +862,7 @@ static int sctp_inetaddr_event(struct notifier_block *this, unsigned long ev,
 			if (addr->a.sa.sa_family == AF_INET &&
 					addr->a.v4.sin_addr.s_addr ==
 					ifa->ifa_local) {
+				sctp_addr_wq_mgmt(&addr->a, SCTP_DELADDR);
 				found = 1;
 				addr->valid = 0;
 				list_del_rcu(&addr->list);
@@ -1280,6 +1470,14 @@ SCTP_STATIC __init int sctp_init(void)
 	spin_lock_init(&sctp_local_addr_lock);
 	sctp_get_local_addr_list();
 
+	/* Initialize the address event list */
+	INIT_LIST_HEAD(&sctp_addr_waitq);
+	INIT_LIST_HEAD(&sctp_auto_asconf_eplist);
+	spin_lock_init(&sctp_addr_wq_lock);
+	sctp_addr_wq_timer.expires = 0;
+	setup_timer(&sctp_addr_wq_timer, sctp_addr_wq_timeout_handler,
+	    (unsigned long)NULL);
+
 	status = sctp_v4_protosw_init();
 
 	if (status)
@@ -1344,6 +1542,7 @@ err_chunk_cachep:
 /* Exit handler for the SCTP protocol.  */
 SCTP_STATIC __exit void sctp_exit(void)
 {
+	sctp_free_addr_wq();
 	/* BUG.  This should probably do something useful like clean
 	 * up all the remaining associations and all that memory.
 	 */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 3951a10..27dffa3 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -807,6 +807,47 @@ out:
 	return retval;
 }
 
+/* set addr events to assocs in the endpoint.  ep and addr_wq must be locked */
+int
+sctp_asconf_mgmt(struct sctp_endpoint *ep)
+{
+	struct sock *sk = ep->base.sk;
+	struct sctp_addr_wait *addrw = NULL;
+	union sctp_addr *addr = NULL;
+	int cmd;
+	int error = 0;
+
+	if (ep == NULL || sk == NULL)
+		return -EINVAL;
+	if (list_empty(&sctp_addr_waitq)) {
+		SCTP_DEBUG_PRINTK("asconf_mgmt: nothing in the wq\n");
+		return -EINVAL;
+	}
+	addrw = list_first_entry(&sctp_addr_waitq, struct sctp_addr_wait, list);
+	if (addrw->cmd != SCTP_NEWADDR && addrw->cmd != SCTP_DELADDR)
+		return -EINVAL;
+	addr = &addrw->a;
+	addr->v4.sin_port = htons(ep->base.bind_addr.port);
+	cmd = addrw->cmd;
+
+	SCTP_DEBUG_PRINTK("sctp_asconf_mgmt sk:%p ep:%p\n", sk, ep);
+	if (cmd == SCTP_NEWADDR) {
+		error = sctp_send_asconf_add_ip(sk, (struct sockaddr *)addr, 1);
+		if (error) {
+			SCTP_DEBUG_PRINTK("asconf_mgmt: send_asconf_add_ip returns %d\n", error);
+			return error;
+		}
+	} else if (cmd == SCTP_DELADDR) {
+		error = sctp_send_asconf_del_ip(sk, (struct sockaddr *)addr, 1);
+		if (error) {
+			SCTP_DEBUG_PRINTK("asconf_mgmt: send_asconf_del_ip returns %d\n", error);
+			return error;
+		}
+	}
+
+	return 0;
+}
+
 /* Helper for tunneling sctp_bindx() requests through sctp_setsockopt()
  *
  * API 8.1
@@ -3341,6 +3382,45 @@ static int sctp_setsockopt_del_key(struct sock *sk,
 
 }
 
+/*
+ * 8.1.23 SCTP_AUTO_ASCONF
+ *
+ * This option will enable or disable the use of the automatic generation of
+ * ASCONF chunks to add and delete addresses to an existing association.  Note
+ * that this option has two caveats namely: a) it only affects sockets that
+ * are bound to all addresses available to the SCTP stack, and b) the system
+ * administrator may have an overriding control that turns the ASCONF feature
+ * off no matter what setting the socket option may have.
+ * This option expects an integer boolean flag, where a non-zero value turns on
+ * the option, and a zero value turns off the option.
+ * Note. In this implementation, socket operation overrides default parameter
+ * being set by sysctl as well as FreeBSD implementation
+ */
+static int sctp_setsockopt_auto_asconf(struct sock *sk, char __user *optval,
+					unsigned int optlen)
+{
+	int val;
+	struct sctp_endpoint *ep = sctp_sk(sk)->ep;
+
+	if (optlen < sizeof(int))
+		return -EINVAL;
+	if (get_user(val, (int __user *)optval))
+		return -EFAULT;
+	if (!sctp_is_ep_boundall(sk) && val)
+		return -EINVAL;
+	if ((val && ep->do_auto_asconf) || (!val && !ep->do_auto_asconf))
+		return 0;
+
+	if (val == 0 && ep->do_auto_asconf) {
+		list_del(&ep->base.auto_asconf_list);
+		ep->do_auto_asconf = 0;
+	} else if (val && !ep->do_auto_asconf) {
+		list_add_tail(&ep->base.auto_asconf_list,
+		    &sctp_auto_asconf_eplist);
+		ep->do_auto_asconf = 1;
+	}
+	return 0;
+}
 
 /* API 6.2 setsockopt(), getsockopt()
  *
@@ -3488,6 +3568,9 @@ SCTP_STATIC int sctp_setsockopt(struct sock *sk, int level, int optname,
 	case SCTP_AUTH_DELETE_KEY:
 		retval = sctp_setsockopt_del_key(sk, optval, optlen);
 		break;
+	case SCTP_AUTO_ASCONF:
+		retval = sctp_setsockopt_auto_asconf(sk, optval, optlen);
+		break;
 	default:
 		retval = -ENOPROTOOPT;
 		break;
@@ -3770,6 +3853,13 @@ SCTP_STATIC int sctp_init_sock(struct sock *sk)
 	local_bh_disable();
 	percpu_counter_inc(&sctp_sockets_allocated);
 	sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
+	if (sctp_auto_asconf_enable) {
+		list_add_tail(&ep->base.auto_asconf_list,
+		    &sctp_auto_asconf_eplist);
+		ep->do_auto_asconf = 1;
+	} else
+		ep->do_auto_asconf = 0;
+	SCTP_DEBUG_PRINTK("sctp_init_sk sk:%p ep:%p\n", sk, ep);
 	local_bh_enable();
 
 	return 0;
@@ -3784,6 +3874,10 @@ SCTP_STATIC void sctp_destroy_sock(struct sock *sk)
 
 	/* Release our hold on the endpoint. */
 	ep = sctp_sk(sk)->ep;
+	if (ep->do_auto_asconf) {
+		ep->do_auto_asconf = 0;
+		list_del(&ep->base.auto_asconf_list);
+	}
 	sctp_endpoint_free(ep);
 	local_bh_disable();
 	percpu_counter_dec(&sctp_sockets_allocated);
@@ -5283,6 +5377,28 @@ static int sctp_getsockopt_assoc_number(struct sock *sk, int len,
 	return 0;
 }
 
+/*
+ * 8.1.23 SCTP_AUTO_ASCONF
+ * See the corresponding setsockopt entry as description
+ */
+static int sctp_getsockopt_auto_asconf(struct sock *sk, int len,
+				   char __user *optval, int __user *optlen)
+{
+	int val = 0;
+
+	if (len < sizeof(int))
+		return -EINVAL;
+
+	len = sizeof(int);
+	if (sctp_sk(sk)->ep->do_auto_asconf && sctp_is_ep_boundall(sk))
+		val = 1;
+	if (put_user(len, optlen))
+		return -EFAULT;
+	if (copy_to_user(optval, &val, len))
+		return -EFAULT;
+	return 0;
+}
+
 SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
 				char __user *optval, int __user *optlen)
 {
@@ -5415,6 +5531,9 @@ SCTP_STATIC int sctp_getsockopt(struct sock *sk, int level, int optname,
 	case SCTP_GET_ASSOC_NUMBER:
 		retval = sctp_getsockopt_assoc_number(sk, len, optval, optlen);
 		break;
+	case SCTP_AUTO_ASCONF:
+		retval = sctp_getsockopt_auto_asconf(sk, len, optval, optlen);
+		break;
 	default:
 		retval = -ENOPROTOOPT;
 		break;
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index 50cb57f..df39789 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -183,6 +183,13 @@ static ctl_table sctp_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 	{
+		.procname	= "auto_asconf_enable",
+		.data		= &sctp_auto_asconf_enable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
 		.procname	= "prsctp_enable",
 		.data		= &sctp_prsctp_enable,
 		.maxlen		= sizeof(int),


^ permalink raw reply related

* [PATCH net-next-2.6 v3 2/3] sctp: Add ASCONF operation on the single-homed host
From: Michio Honda @ 2011-04-08  5:19 UTC (permalink / raw)
  To: netdev; +Cc: lksctp-developers

SCTP can change the IP address on the single-homed host.  
In this case, the SCTP association transmits an ASCONF packet including addition of the new IP address and deletion of the old address.  
This patch implements this functionality.  

Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
---
diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
index c70d8cc..d7a4ee3 100644
--- a/include/net/sctp/constants.h
+++ b/include/net/sctp/constants.h
@@ -441,4 +441,8 @@ enum {
  */
 #define SCTP_AUTH_RANDOM_LENGTH 32
 
+/* ASCONF PARAMETERS */
+#define SCTP_ASCONF_V4_PARAM_LEN 16
+#define SCTP_ASCONF_V6_PARAM_LEN 28
+
 #endif /* __sctp_constants_h__ */
diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 9352d12..498a3cf 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -259,6 +259,7 @@ struct sctp_chunk *sctp_process_asconf(struct sctp_association *asoc,
 				       struct sctp_chunk *asconf);
 int sctp_process_asconf_ack(struct sctp_association *asoc,
 			    struct sctp_chunk *asconf_ack);
+void sctp_path_check_and_react(struct sctp_association *, struct sockaddr *);
 struct sctp_chunk *sctp_make_fwdtsn(const struct sctp_association *asoc,
 				    __u32 new_cum_tsn, size_t nstreams,
 				    struct sctp_fwdtsn_skip *skiplist);
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index cc9185c..400ee8a 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1901,6 +1901,9 @@ struct sctp_association {
 	 * after reaching 4294967295.
 	 */
 	__u32 addip_serial;
+	union sctp_addr *asconf_addr_del_pending;
+	__u32 asconf_del_pending_cid;
+	int src_out_of_asoc_ok;
 
 	/* SCTP AUTH: list of the endpoint shared keys.  These
 	 * keys are provided out of band by the user applicaton
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 6b04287..2dfd0e8 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -279,6 +279,9 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
 	asoc->peer.asconf_capable = 0;
 	if (sctp_addip_noauth)
 		asoc->peer.asconf_capable = 1;
+	asoc->asconf_addr_del_pending = NULL;
+	asoc->asconf_del_pending_cid = 0;
+	asoc->src_out_of_asoc_ok = 0;
 
 	/* Create an input queue.  */
 	sctp_inq_init(&asoc->base.inqueue);
@@ -443,6 +446,10 @@ void sctp_association_free(struct sctp_association *asoc)
 
 	asoc->peer.transport_count = 0;
 
+	/* Free pending address space being deleted */
+	if (asoc->asconf_addr_del_pending != NULL)
+		kfree(asoc->asconf_addr_del_pending);
+
 	/* Free any cached ASCONF_ACK chunk. */
 	sctp_assoc_free_asconf_acks(asoc);
 
@@ -1277,7 +1284,7 @@ void sctp_assoc_update(struct sctp_association *asoc,
  */
 void sctp_assoc_update_retran_path(struct sctp_association *asoc)
 {
-	struct sctp_transport *t, *next;
+	struct sctp_transport *t, *next, *unconfirmed;
 	struct list_head *head = &asoc->peer.transport_addr_list;
 	struct list_head *pos;
 
@@ -1287,7 +1294,7 @@ void sctp_assoc_update_retran_path(struct sctp_association *asoc)
 	/* Find the next transport in a round-robin fashion. */
 	t = asoc->peer.retran_path;
 	pos = &t->transports;
-	next = NULL;
+	next = unconfirmed = NULL;
 
 	while (1) {
 		/* Skip the head. */
@@ -1318,11 +1325,15 @@ void sctp_assoc_update_retran_path(struct sctp_association *asoc)
 			 */
 			if (t->state != SCTP_UNCONFIRMED && !next)
 				next = t;
+			else if (t->state == SCTP_UNCONFIRMED)
+				unconfirmed = t;
 		}
 	}
 
 	if (t)
 		asoc->peer.retran_path = t;
+	else if (unconfirmed)
+		asoc->peer.retran_path = t = unconfirmed;
 
 	SCTP_DEBUG_PRINTK_IPADDR("sctp_assoc_update_retran_path:association"
 				 " %p addr: ",
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 865ce7b..56c97ce 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -332,6 +332,13 @@ static void sctp_v6_get_saddr(struct sctp_sock *sk,
 				matchlen = bmatchlen;
 			}
 		}
+		if (laddr->state == SCTP_ADDR_NEW && asoc->src_out_of_asoc_ok) {
+			bmatchlen = sctp_v6_addr_match_len(daddr, &laddr->a);
+			if (!baddr || (matchlen < bmatchlen)) {
+				baddr = &laddr->a;
+				matchlen = bmatchlen;
+			}
+		}
 	}
 
 	if (baddr) {
diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 26dc005..033ea20 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -344,7 +344,14 @@ int sctp_outq_tail(struct sctp_outq *q, struct sctp_chunk *chunk)
 			break;
 		}
 	} else {
-		list_add_tail(&chunk->list, &q->control_chunk_list);
+		/* We add the ASCONF for the only one newly added address at
+		 * the front of the queue
+		 */
+		if (q->asoc->src_out_of_asoc_ok && \
+		    chunk->chunk_hdr->type == SCTP_CID_ASCONF)
+			list_add(&chunk->list, &q->control_chunk_list);
+		else
+			list_add_tail(&chunk->list, &q->control_chunk_list);
 		SCTP_INC_STATS(SCTP_MIB_OUTCTRLCHUNKS);
 	}
 
@@ -850,6 +857,24 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout)
 		case SCTP_CID_SHUTDOWN:
 		case SCTP_CID_ECN_ECNE:
 		case SCTP_CID_ASCONF:
+			/* RFC 5061, 5.3
+			 * F1) This means that until such time as the ASCONF
+			 * containing the add is acknowledged, the sender MUST
+			 * NOT use the new IP address as a source for ANY SCTP
+			 * packet except on carrying an ASCONF Chunk.
+			 */
+			if (asoc->src_out_of_asoc_ok) {
+				SCTP_DEBUG_PRINTK("outq_flush: out_of_asoc_ok, transmit chunk type %d\n",
+				    chunk->chunk_hdr->type);
+				packet = &transport->packet;
+				sctp_packet_config(packet, vtag,
+						asoc->peer.ecn_capable);
+				sctp_packet_append_chunk(packet, chunk);
+				error = sctp_packet_transmit(packet);
+				if (error < 0)
+					return error;
+				goto sctp_flush_out;
+			}
 		case SCTP_CID_FWD_TSN:
 			status = sctp_packet_transmit_chunk(packet, chunk,
 							    one_packet);
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 152976e..b8ec3cc 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -510,7 +510,8 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 		sctp_v4_dst_saddr(&dst_saddr, dst, htons(bp->port));
 		rcu_read_lock();
 		list_for_each_entry_rcu(laddr, &bp->address_list, list) {
-			if (!laddr->valid || (laddr->state != SCTP_ADDR_SRC))
+			if (!laddr->valid || (laddr->state != SCTP_ADDR_SRC &&
+			    asoc->src_out_of_asoc_ok == 0))
 				continue;
 			if (sctp_v4_cmp_addr(&dst_saddr, &laddr->a))
 				goto out_unlock;
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index de98665..5a085b3 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2651,6 +2651,61 @@ __u32 sctp_generate_tsn(const struct sctp_endpoint *ep)
 	return retval;
 }
 
+void
+sctp_path_check_and_react(struct sctp_association *asoc, struct sockaddr *sa)
+{
+	struct sctp_transport *trans;
+	int addrnum, family;
+	struct sctp_sockaddr_entry *saddr;
+	struct sctp_bind_addr *bp;
+	union sctp_addr *tmpaddr;
+
+	family = sa->sa_family;
+	bp = &asoc->base.bind_addr;
+	addrnum = 0;
+	/* count up the number of local addresses in the same family */
+	list_for_each_entry(saddr, &bp->address_list, list) {
+		if (saddr->a.sa.sa_family == family) {
+			tmpaddr = &saddr->a;
+			if (family == AF_INET6 &&
+			    ipv6_addr_type(&tmpaddr->v6.sin6_addr) &
+			    IPV6_ADDR_LINKLOCAL) {
+				continue;
+			}
+			addrnum++;
+		}
+	}
+	if (addrnum == 1) {
+		union sctp_addr *tmpaddr;
+		tmpaddr = (union sctp_addr *)sa;
+		SCTP_DEBUG_PRINTK_IPADDR("pcheck_react: only 1 local addr in asoc %p ",
+		    " family %d\n", asoc, tmpaddr, family);
+		list_for_each_entry(trans, &asoc->peer.transport_addr_list,
+		    transports) {
+			/* reset path information and release refcount to the
+			 * dst_entry  based on the src change */
+			sctp_transport_hold(trans);
+			trans->cwnd = min(4*asoc->pathmtu,
+			    max_t(__u32, 2*asoc->pathmtu, 4380));
+			trans->ssthresh = asoc->peer.i.a_rwnd;
+			trans->rtt = 0;
+			trans->srtt = 0;
+			trans->rttvar = 0;
+			trans->rto = asoc->rto_initial;
+			dst_release(trans->dst);
+			trans->dst = NULL;
+			memset(&trans->saddr, 0, sizeof(union sctp_addr));
+			sctp_transport_route(trans, NULL,
+			    sctp_sk(asoc->base.sk));
+			SCTP_DEBUG_PRINTK_IPADDR("we freed dst_entry (asoc: %p dst: ",
+			    " trans: %p)\n", asoc, (&trans->ipaddr), trans);
+			trans->rto_pending = 1;
+			sctp_transport_put(trans);
+		}
+	}
+	return;
+}
+
 /*
  * ADDIP 3.1.1 Address Configuration Change Chunk (ASCONF)
  *      0                   1                   2                   3
@@ -2744,11 +2799,29 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
 	int			addr_param_len = 0;
 	int 			totallen = 0;
 	int 			i;
+	sctp_addip_param_t del_param; /* 8 Bytes (Type 0xC002, Len and CrrID) */
+	sctp_addip_param_t spr_param;
+	struct sctp_af *del_af;
+	struct sctp_af *spr_af;
+	int del_addr_param_len = 0;
+	int spr_addr_param_len = 0;
+	int del_paramlen = sizeof(sctp_addip_param_t);
+	int spr_paramlen = sizeof(sctp_addip_param_t);
+	union sctp_addr_param del_addr_param; /* (v4) 8 Bytes, (v6) 20 Bytes */
+	union sctp_addr_param spr_addr_param;
+	int			v4 = 0;
+	int			v6 = 0;
 
 	/* Get total length of all the address parameters. */
 	addr_buf = addrs;
 	for (i = 0; i < addrcnt; i++) {
 		addr = (union sctp_addr *)addr_buf;
+		if (addr != NULL) {
+			if (addr->sa.sa_family == AF_INET)
+				v4 = 1;
+			else if (addr->sa.sa_family == AF_INET6)
+				v6 = 1;
+		}
 		af = sctp_get_af_specific(addr->v4.sin_family);
 		addr_param_len = af->to_addr_param(addr, &addr_param);
 
@@ -2757,6 +2830,40 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
 
 		addr_buf += af->sockaddr_len;
 	}
+	/* Add the length of a pending address being deleted */
+	if (flags == SCTP_PARAM_ADD_IP && asoc->asconf_addr_del_pending) {
+		if ((asoc->asconf_addr_del_pending->sa.sa_family == AF_INET
+		    && v4) ||
+		    (asoc->asconf_addr_del_pending->sa.sa_family == AF_INET6
+		    && v6)) {
+			del_af = sctp_get_af_specific(
+			    asoc->asconf_addr_del_pending->sa.sa_family);
+			del_addr_param_len = del_af->to_addr_param(
+			    asoc->asconf_addr_del_pending, &del_addr_param);
+			totallen += del_paramlen;
+			totallen += del_addr_param_len;
+			SCTP_DEBUG_PRINTK("mkasconf_update_ip: now we picked del_pending addr, totallen for all addresses is %d\n",
+			    totallen);
+			/* for Set Primary (equal size as del parameters */
+			totallen += del_paramlen;
+			totallen += del_addr_param_len;
+		}
+		if (v4) {
+			if (totallen != SCTP_ASCONF_V4_PARAM_LEN * 2 &&
+			    totallen != SCTP_ASCONF_V4_PARAM_LEN * 3) {
+				SCTP_DEBUG_PRINTK("mkasconf_update_ip: incorrect total length of ASCONF parameters, del + add MUST be 32 bytes, but %d bytes\n", totallen);
+			return NULL;
+			}
+		} else if (v6) {
+			if (totallen != SCTP_ASCONF_V6_PARAM_LEN * 2 &&
+			    totallen != SCTP_ASCONF_V6_PARAM_LEN * 3) {
+				SCTP_DEBUG_PRINTK("mkasconf_update_ip: incorrect total length of ASCONF parameters, del + add MUST be 56 bytes, but %d bytes\n", totallen);
+			return NULL;
+			}
+		}
+	}
+	SCTP_DEBUG_PRINTK("mkasconf_update_ip: call mkasconf() for %d bytes\n",
+	    totallen);
 
 	/* Create an asconf chunk with the required length. */
 	retval = sctp_make_asconf(asoc, laddr, totallen);
@@ -2778,6 +2885,32 @@ struct sctp_chunk *sctp_make_asconf_update_ip(struct sctp_association *asoc,
 
 		addr_buf += af->sockaddr_len;
 	}
+	if (flags == SCTP_PARAM_ADD_IP && asoc->asconf_addr_del_pending) {
+		addr = asoc->asconf_addr_del_pending;
+		del_af = sctp_get_af_specific(addr->v4.sin_family);
+		del_addr_param_len = del_af->to_addr_param(addr,
+		    &del_addr_param);
+		del_param.param_hdr.type = SCTP_PARAM_DEL_IP;
+		del_param.param_hdr.length = htons(del_paramlen +
+		    del_addr_param_len);
+		del_param.crr_id = i;
+		asoc->asconf_del_pending_cid = i;
+
+		sctp_addto_chunk(retval, del_paramlen, &del_param);
+		sctp_addto_chunk(retval, del_addr_param_len, &del_addr_param);
+		/* For SET_PRIMARY */
+		addr_buf = addrs;
+		addr = (union sctp_addr *)addr_buf;
+		spr_af = sctp_get_af_specific(addr->v4.sin_family);
+		spr_addr_param_len = spr_af->to_addr_param(addr,
+		    &spr_addr_param);
+		spr_param.param_hdr.type = SCTP_PARAM_SET_PRIMARY;
+		spr_param.param_hdr.length = htons(spr_paramlen +
+		    spr_addr_param_len);
+		spr_param.crr_id = (i+1);
+		sctp_addto_chunk(retval, spr_paramlen, &spr_param);
+		sctp_addto_chunk(retval, spr_addr_param_len, &spr_addr_param);
+	}
 	return retval;
 }
 
@@ -2990,7 +3123,7 @@ static __be16 sctp_process_asconf_param(struct sctp_association *asoc,
 		 * an Error Cause TLV set to the new error code 'Request to
 		 * Delete Source IP Address'
 		 */
-		if (sctp_cmp_addr_exact(sctp_source(asconf), &addr))
+		if (sctp_cmp_addr_exact(&asconf->source, &addr))
 			return SCTP_ERROR_DEL_SRC_IP;
 
 		/* Section 4.2.2
@@ -3193,16 +3326,37 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 				transports) {
-			if (transport->state == SCTP_ACTIVE)
+			if (transport->state == SCTP_ACTIVE &&
+			    !asoc->src_out_of_asoc_ok)
 				continue;
 			dst_release(transport->dst);
 			sctp_transport_route(transport, NULL,
 					     sctp_sk(asoc->base.sk));
 		}
+		asoc->src_out_of_asoc_ok = 0;
 		break;
 	case SCTP_PARAM_DEL_IP:
 		local_bh_disable();
 		sctp_del_bind_addr(bp, &addr);
+		if (asoc->asconf_addr_del_pending != NULL) {
+			if ((addr.sa.sa_family == AF_INET) &&
+			    (asoc->asconf_addr_del_pending->sa.sa_family ==
+			     AF_INET)) {
+				if (asoc->asconf_addr_del_pending->v4.sin_addr.s_addr == addr.v4.sin_addr.s_addr) {
+					kfree(asoc->asconf_addr_del_pending);
+					asoc->asconf_del_pending_cid = 0;
+					asoc->asconf_addr_del_pending = NULL;
+				}
+			} else if ((addr.sa.sa_family == AF_INET6) &&
+				(asoc->asconf_addr_del_pending->sa.sa_family ==
+				 AF_INET6)) {
+				if (ipv6_addr_equal(&asoc->asconf_addr_del_pending->v6.sin6_addr, &addr.v6.sin6_addr)) {
+					kfree(asoc->asconf_addr_del_pending);
+					asoc->asconf_del_pending_cid = 0;
+					asoc->asconf_addr_del_pending = NULL;
+				}
+			}
+		}
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 				transports) {
@@ -3293,6 +3447,8 @@ int sctp_process_asconf_ack(struct sctp_association *asoc,
 	int	no_err = 1;
 	int	retval = 0;
 	__be16	err_code = SCTP_ERROR_NO_ERROR;
+	sctp_addip_param_t *first_asconf_param = NULL;
+	int first_asconf_paramlen;
 
 	/* Skip the chunkhdr and addiphdr from the last asconf sent and store
 	 * a pointer to address parameter.
@@ -3307,6 +3463,8 @@ int sctp_process_asconf_ack(struct sctp_association *asoc,
 	length = ntohs(addr_param->v4.param_hdr.length);
 	asconf_param = (sctp_addip_param_t *)((void *)addr_param + length);
 	asconf_len -= length;
+	first_asconf_param = asconf_param;
+	first_asconf_paramlen = ntohs(first_asconf_param->param_hdr.length);
 
 	/* ADDIP 4.1
 	 * A8) If there is no response(s) to specific TLV parameter(s), and no
@@ -3361,6 +3519,35 @@ int sctp_process_asconf_ack(struct sctp_association *asoc,
 		asconf_len -= length;
 	}
 
+	/* When the source address obviously changes to newly added one, we
+	   reset the cwnd to re-probe the path condition
+	*/
+	if (no_err && first_asconf_param->param_hdr.type == SCTP_PARAM_ADD_IP) {
+		if (first_asconf_paramlen == SCTP_ASCONF_V4_PARAM_LEN) {
+			struct sockaddr_in sin;
+
+			memset(&sin, 0, sizeof(struct sockaddr_in));
+			sin.sin_family = AF_INET;
+			memcpy(&sin.sin_addr.s_addr, first_asconf_param + 1,
+					sizeof(struct in_addr));
+			sctp_path_check_and_react(asoc,
+					(struct sockaddr *)&sin);
+
+		} else if (first_asconf_paramlen == SCTP_ASCONF_V6_PARAM_LEN) {
+			struct sockaddr_in6 sin6;
+
+			memset(&sin6, 0, sizeof(struct sockaddr_in6));
+			sin6.sin6_family = AF_INET6;
+			memcpy(&sin6.sin6_addr, first_asconf_param + 1,
+					sizeof(struct in6_addr));
+			sctp_path_check_and_react(asoc,
+					(struct sockaddr *)&sin6);
+		} else {
+			SCTP_DEBUG_PRINTK("funny asconf_paramlen? (%d)\n",
+			    first_asconf_paramlen);
+		}
+	}
+
 	/* Free the cached last sent asconf chunk. */
 	list_del_init(&asconf->transmitted_list);
 	sctp_chunk_free(asconf);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 3951a10..2bfe2a9 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -527,6 +527,7 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 	struct list_head		*p;
 	int 				i;
 	int 				retval = 0;
+	struct sctp_transport		*trans = NULL;
 
 	if (!sctp_addip_enable)
 		return retval;
@@ -583,13 +584,10 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 			goto out;
 		}
 
-		retval = sctp_send_asconf(asoc, chunk);
-		if (retval)
-			goto out;
-
 		/* Add the new addresses to the bind address list with
 		 * use_as_src set to 0.
 		 */
+		SCTP_DEBUG_PRINTK("snd_asconf_addip: next, add_bind_addr with ADDR_NEW flag\n");
 		addr_buf = addrs;
 		for (i = 0; i < addrcnt; i++) {
 			addr = (union sctp_addr *)addr_buf;
@@ -599,6 +597,28 @@ static int sctp_send_asconf_add_ip(struct sock		*sk,
 						    SCTP_ADDR_NEW, GFP_ATOMIC);
 			addr_buf += af->sockaddr_len;
 		}
+		list_for_each_entry(trans, &asoc->peer.transport_addr_list,
+		    transports) {
+			if (asoc->asconf_addr_del_pending != NULL)
+				/* This ADDIP ASCONF piggybacks DELIP for the
+				 * last address, so need to select src addr
+				 * from the out_of_asoc addrs
+				 */
+				asoc->src_out_of_asoc_ok = 1;
+			/* Clear the source and route cache in the path */
+			memset(&trans->saddr, 0, sizeof(union sctp_addr));
+			dst_release(trans->dst);
+			trans->cwnd = min(4*asoc->pathmtu, max_t(__u32,
+			    2*asoc->pathmtu, 4380));
+			trans->ssthresh = asoc->peer.i.a_rwnd;
+			trans->rto = asoc->rto_initial;
+			trans->rtt = 0;
+			trans->srtt = 0;
+			trans->rttvar = 0;
+			sctp_transport_route(trans, NULL,
+			    sctp_sk(asoc->base.sk));
+		}
+		retval = sctp_send_asconf(asoc, chunk);
 	}
 
 out:
@@ -711,7 +731,9 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
 	struct sctp_sockaddr_entry *saddr;
 	int 			i;
 	int 			retval = 0;
+	int			stored = 0;
 
+	chunk = NULL;
 	if (!sctp_addip_enable)
 		return retval;
 
@@ -762,8 +784,42 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
 		bp = &asoc->base.bind_addr;
 		laddr = sctp_find_unmatch_addr(bp, (union sctp_addr *)addrs,
 					       addrcnt, sp);
-		if (!laddr)
-			continue;
+		if ((laddr == NULL) && (addrcnt == 1)) {
+			union sctp_addr *sa_addr = NULL;
+
+			if (asoc->asconf_addr_del_pending == NULL) {
+				asoc->asconf_addr_del_pending =
+				    kmalloc(sizeof(union sctp_addr),
+				    GFP_ATOMIC);
+				memset(asoc->asconf_addr_del_pending, 0,
+						sizeof(union sctp_addr));
+				if (addrs->sa_family == AF_INET) {
+					struct sockaddr_in *sin;
+
+					sin = (struct sockaddr_in *)addrs;
+					asoc->asconf_addr_del_pending->v4.sin_family = AF_INET;
+					memcpy(&asoc->asconf_addr_del_pending->v4.sin_addr, &sin->sin_addr, sizeof(struct in_addr));
+				} else if (addrs->sa_family == AF_INET6) {
+					struct sockaddr_in6 *sin6;
+
+					sin6 = (struct sockaddr_in6 *)addrs;
+					asoc->asconf_addr_del_pending->v6.sin6_family = AF_INET6;
+					memcpy(&asoc->asconf_addr_del_pending->v6.sin6_addr, &sin6->sin6_addr, sizeof(struct in6_addr));
+				}
+				sa_addr = (union sctp_addr *)addrs;
+				SCTP_DEBUG_PRINTK_IPADDR("send_asconf_del_ip: keep the last address asoc: %p ",
+				    " at %p\n", asoc, sa_addr,
+				    asoc->asconf_addr_del_pending);
+				stored = 1;
+				goto skip_mkasconf;
+			} else {
+				SCTP_DEBUG_PRINTK_IPADDR("send_asconf_del_ip: asoc %p, deleting last address ",
+				    " is already stored at %p\n", asoc,
+				    asoc->asconf_addr_del_pending,
+				    asoc->asconf_addr_del_pending);
+				continue;
+			}
+		}
 
 		/* We do not need RCU protection throughout this loop
 		 * because this is done under a socket lock from the
@@ -776,6 +832,7 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
 			goto out;
 		}
 
+skip_mkasconf:
 		/* Reset use_as_src flag for the addresses in the bind address
 		 * list that are to be deleted.
 		 */
@@ -797,10 +854,16 @@ static int sctp_send_asconf_del_ip(struct sock		*sk,
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 					transports) {
 			dst_release(transport->dst);
+			/* Clear source address cache */
+			memset(&transport->saddr, 0, sizeof(union sctp_addr));
 			sctp_transport_route(transport, NULL,
 					     sctp_sk(asoc->base.sk));
 		}
 
+		if (stored) {
+			/* We don't need to transmit ASCONF */
+			continue;
+		}
 		retval = sctp_send_asconf(asoc, chunk);
 	}
 out:


^ permalink raw reply related

* [PATCH net-next-2.6 v2 3/3] sctp: Add a valid address list in association local
From: Michio Honda @ 2011-04-08  5:19 UTC (permalink / raw)
  To: netdev; +Cc: lksctp-developers

When the SCTP association transmits an ASCONF with ADD_IP_ADDRESS, that association cannot use the adding address until it receives ASCONF-ACK.  
This patch prevents that associations that do not receive ASCONF-ACK use the adding address.

Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
---
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 505845d..f1f439c 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -135,6 +135,7 @@ void sctp_sock_rfree(struct sk_buff *skb);
 void sctp_copy_sock(struct sock *newsk, struct sock *sk,
 		    struct sctp_association *asoc);
 extern struct percpu_counter sctp_sockets_allocated;
+void sctp_add_addr_to_laddr(struct sockaddr *, struct sctp_association *);
 
 /*
  * sctp/primitive.c
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index cc9185c..f6ca775 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1901,6 +1901,14 @@ struct sctp_association {
 	 * after reaching 4294967295.
 	 */
 	__u32 addip_serial;
+	/* list of valid addresses in association local
+	 * This list is needed to ensure base.bind_addr being a valid address
+	 * list of the endpoint-wide.  When one of associations receives
+	 * ASCONF-ACK, that address is added to this list.  When all
+	 * associations belonging to the same endpoint receive ASCONF-ACKs,
+	 * that address is added to base.bind_addr
+	 */
+	struct list_head asoc_laddr_list;
 
 	/* SCTP AUTH: list of the endpoint shared keys.  These
 	 * keys are provided out of band by the user applicaton
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 6b04287..614834f 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -279,6 +279,7 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
 	asoc->peer.asconf_capable = 0;
 	if (sctp_addip_noauth)
 		asoc->peer.asconf_capable = 1;
+	INIT_LIST_HEAD(&asoc->asoc_laddr_list);
 
 	/* Create an input queue.  */
 	sctp_inq_init(&asoc->base.inqueue);
@@ -446,6 +447,15 @@ void sctp_association_free(struct sctp_association *asoc)
 	/* Free any cached ASCONF_ACK chunk. */
 	sctp_assoc_free_asconf_acks(asoc);
 
+	if (!list_empty(&asoc->asoc_laddr_list)) {
+		struct sctp_sockaddr_entry *laddr, *tmp;
+		list_for_each_entry_safe(laddr, tmp, &asoc->asoc_laddr_list, \
+		    list) {
+			list_del(&laddr->list);
+			kfree(laddr);
+		}
+	}
+
 	/* Free any cached ASCONF chunk. */
 	if (asoc->addip_last_asconf)
 		sctp_chunk_free(asoc->addip_last_asconf);
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 865ce7b..bbf3152 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -333,6 +333,18 @@ static void sctp_v6_get_saddr(struct sctp_sock *sk,
 			}
 		}
 	}
+	if (baddr == NULL) {
+		/* We don't have a valid src addr in "endpoint-wide".
+		 * Looking up in assoc-locally valid address list.
+		 */
+		list_for_each_entry(laddr, &asoc->asoc_laddr_list, list) {
+			bmatchlen = sctp_v6_addr_match_len(daddr, &laddr->a);
+			if (!baddr || (matchlen < bmatchlen)) {
+				baddr = &laddr->a;
+				matchlen = bmatchlen;
+			}
+		}
+	}
 
 	if (baddr) {
 		memcpy(saddr, baddr, sizeof(union sctp_addr));
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 152976e..918a62e 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -516,6 +516,13 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 				goto out_unlock;
 		}
 		rcu_read_unlock();
+		/* We don't have a valid src addr in "endpoint-wide".
+		 * Looking up in assoc-locally valid address list.
+		 */
+		list_for_each_entry(laddr, &asoc->asoc_laddr_list, list) {
+			if (sctp_v4_cmp_addr(&dst_saddr, &laddr->a))
+				goto out_unlock;
+		}
 
 		/* None of the bound addresses match the source address of the
 		 * dst. So release it.
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index de98665..1a44f88 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -3171,7 +3171,6 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
 	struct sctp_bind_addr *bp = &asoc->base.bind_addr;
 	union sctp_addr_param *addr_param;
 	struct sctp_transport *transport;
-	struct sctp_sockaddr_entry *saddr;
 
 	addr_param = (union sctp_addr_param *)
 			((void *)asconf_param + sizeof(sctp_addip_param_t));
@@ -3186,10 +3185,10 @@ static void sctp_asconf_param_success(struct sctp_association *asoc,
 		 * held, so the list can not change.
 		 */
 		local_bh_disable();
-		list_for_each_entry(saddr, &bp->address_list, list) {
-			if (sctp_cmp_addr_exact(&saddr->a, &addr))
-				saddr->state = SCTP_ADDR_SRC;
-		}
+		/* Until this ASCONF is acked on all associations, we cannot
+		 * consider this address as ADDR_SRC
+		 */
+		sctp_add_addr_to_laddr(&addr.sa, asoc);
 		local_bh_enable();
 		list_for_each_entry(transport, &asoc->peer.transport_addr_list,
 				transports) {
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 3951a10..1a06469 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -807,6 +807,145 @@ out:
 	return retval;
 }
 
+/* Add a new address to the list contains available addresses only in the
+ * association.  If the new address is also available on the other associations
+ * on the endpoint, it is marked as SCTP_ADDR_SRC in the bind address list on
+ * the endpoint.  This situation is possible when some of associations receive
+ * ASCONF-ACK for ADD_IP at the endpoint
+ */
+void
+sctp_add_addr_to_laddr(struct sockaddr *sa, struct sctp_association *asoc)
+{
+	struct sctp_endpoint *ep = asoc->ep;
+	struct sctp_association *tmp = NULL;
+	struct sctp_bind_addr *bp;
+	struct sctp_sockaddr_entry *addr;
+	struct sockaddr_in *sin = NULL;
+	struct sockaddr_in6 *sin6 = NULL;
+	int local;
+	int found;
+
+	union sctp_addr *tmpaddr = NULL;
+	tmpaddr = (union sctp_addr *)sa;
+	SCTP_DEBUG_PRINTK_IPADDR("add_addr_to_laddr: asoc: %p ", " ep: %p",
+	    asoc, tmpaddr, ep);
+	if (sa->sa_family == AF_INET)
+		sin = (struct sockaddr_in *)sa;
+	else if (sa->sa_family == AF_INET6)
+		sin6 = (struct sockaddr_in6 *)sa;
+
+	/* Check if this address is locally available in the other asocs */
+	local = 0;
+	list_for_each_entry(tmp, &ep->asocs, asocs) {
+		if (tmp == asoc)
+			continue;
+		found = 0;
+		list_for_each_entry(addr, &tmp->asoc_laddr_list, list) {
+			tmpaddr = &addr->a;
+			if (sa->sa_family != addr->a.sa.sa_family)
+				continue;
+			if (sa->sa_family == AF_INET) {
+				if (sin->sin_addr.s_addr ==
+				    addr->a.v4.sin_addr.s_addr)
+					found = 1;
+			} else if (sa->sa_family == AF_INET6) {
+				if (ipv6_addr_equal(&sin6->sin6_addr,
+				    &addr->a.v6.sin6_addr))
+					found = 1;
+
+			}
+		}
+		if (!found) {
+			SCTP_DEBUG_PRINTK("add_addr_to_laddr: not found in asoc %p\n", tmp);
+			local = 1;
+			break;
+		}
+	}
+	addr = NULL;
+
+	if (local) {
+		/* this address is not available in some of the other
+		 * associations.  So add as locally-available in this
+		 * asocciation
+		 */
+		addr = kmalloc(sizeof(struct sctp_sockaddr_entry), GFP_ATOMIC);
+		if  (addr == NULL) {
+			SCTP_DEBUG_PRINTK("add_addr_to_laddr: failed to allocate memory for this address\n");
+			return;
+		}
+		memset(addr, 0, sizeof(struct sctp_sockaddr_entry));
+		if (sa->sa_family == AF_INET) {
+			addr->a.sa.sa_family = AF_INET;
+			addr->a.v4.sin_port = sin->sin_port;
+			addr->a.v4.sin_addr.s_addr = sin->sin_addr.s_addr;
+		} else if (sa->sa_family == AF_INET6) {
+			addr->a.sa.sa_family = AF_INET6;
+			addr->a.v6.sin6_port = sin6->sin6_port;
+			memcpy(&addr->a.v6.sin6_addr, &sin6->sin6_addr,
+			    sizeof(struct in6_addr));
+		}
+		list_add_tail(&addr->list, &asoc->asoc_laddr_list);
+		SCTP_DEBUG_PRINTK("add_addr_to_laddr: now we added this address to the local list on asoc %p\n", asoc);
+	} else {
+		/* this address is also available in all other asocs.  So set
+		 * it as ADDR_SRC in the bind-addr list in the endpoint, then
+		 * remove from the asoc_laddr_list on the associations.
+		 */
+		SCTP_DEBUG_PRINTK("add_addr_to_laddr: this address is available in all other asocs\n");
+		bp = &asoc->base.bind_addr;
+
+		/* change state of the new address in the bind list */
+		list_for_each_entry(addr, &bp->address_list, list) {
+			if (addr->state != SCTP_ADDR_NEW)
+				continue;
+			if (addr->a.sa.sa_family != sa->sa_family)
+				continue;
+			if (addr->a.sa.sa_family == AF_INET) {
+				if (sin->sin_port != addr->a.v4.sin_port)
+					continue;
+				if (sin->sin_addr.s_addr !=
+				    addr->a.v4.sin_addr.s_addr)
+					continue;
+			} else if (addr->a.sa.sa_family == AF_INET6) {
+				if (sin6->sin6_port != addr->a.v6.sin6_port)
+					continue;
+				if (!ipv6_addr_equal(&sin6->sin6_addr,
+				    &addr->a.v6.sin6_addr))
+					continue;
+			}
+			SCTP_DEBUG_PRINTK("add_addr_to_laddr: found the entry for this address with ADDR_NEW flag, set to ADDR_SRC\n");
+			addr->state = SCTP_ADDR_SRC;
+		}
+
+		/* remove the entry of this address from the asoc-local list */
+		list_for_each_entry(tmp, &ep->asocs, asocs) {
+			if (tmp == asoc)
+				continue;
+			addr = NULL;
+			list_for_each_entry(addr, &tmp->asoc_laddr_list, list) {
+				if (sa->sa_family != addr->a.sa.sa_family)
+					continue;
+				if (sa->sa_family == AF_INET) {
+					if (sin->sin_addr.s_addr !=
+					    addr->a.v4.sin_addr.s_addr)
+						continue;
+				} else if (sa->sa_family == AF_INET6) {
+					if (!ipv6_addr_equal(&sin6->sin6_addr,
+					    &addr->a.v6.sin6_addr))
+						continue;
+				}
+				break;
+			}
+			if (addr == NULL) {
+				SCTP_DEBUG_PRINTK("add_addr_to_laddr: Huh, asoc %p doesn't have the entry for this address?\n", asoc);
+				continue;
+			}
+			list_del(&addr->list);
+			kfree(addr);
+		}
+	}
+}
+
 /* Helper for tunneling sctp_bindx() requests through sctp_setsockopt()
  *
  * API 8.1


^ permalink raw reply related

* [patch net-next-2.6 v2] net: vlan: make non-hw-accel rx path similar to hw-accel
From: Jiri Pirko @ 2011-04-08  5:48 UTC (permalink / raw)
  To: netdev
  Cc: davem, shemminger, kaber, fubar, eric.dumazet, nicolas.2p.debian,
	andy, xiaosuo, jesse, ebiederm

Now there are 2 paths for rx vlan frames. When rx-vlan-hw-accel is
enabled, skb is untagged by NIC, vlan_tci is set and the skb gets into
vlan code in __netif_receive_skb - vlan_hwaccel_do_receive.

For non-rx-vlan-hw-accel however, tagged skb goes thru whole
__netif_receive_skb, it's untagged in ptype_base hander and reinjected

This incosistency is fixed by this patch. Vlan untagging happens early in
__netif_receive_skb so the rest of code (ptype_all handlers, rx_handlers)
see the skb like it was untagged by hw.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>

v1->v2:
	remove "inline" from vlan_core.c functions

---
 include/linux/if_vlan.h |   10 ++-
 net/8021q/vlan.c        |    8 --
 net/8021q/vlan.h        |    2 -
 net/8021q/vlan_core.c   |   85 +++++++++++++++++++++++-
 net/8021q/vlan_dev.c    |  173 -----------------------------------------------
 net/core/dev.c          |    8 ++-
 6 files changed, 99 insertions(+), 187 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 635e1fa..998b299 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -132,7 +132,8 @@ extern u16 vlan_dev_vlan_id(const struct net_device *dev);
 
 extern int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
 			     u16 vlan_tci, int polling);
-extern bool vlan_hwaccel_do_receive(struct sk_buff **skb);
+extern bool vlan_do_receive(struct sk_buff **skb);
+extern struct sk_buff *vlan_untag(struct sk_buff *skb);
 extern gro_result_t
 vlan_gro_receive(struct napi_struct *napi, struct vlan_group *grp,
 		 unsigned int vlan_tci, struct sk_buff *skb);
@@ -166,13 +167,18 @@ static inline int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
 	return NET_XMIT_SUCCESS;
 }
 
-static inline bool vlan_hwaccel_do_receive(struct sk_buff **skb)
+static inline bool vlan_do_receive(struct sk_buff **skb)
 {
 	if ((*skb)->vlan_tci & VLAN_VID_MASK)
 		(*skb)->pkt_type = PACKET_OTHERHOST;
 	return false;
 }
 
+inline struct sk_buff *vlan_untag(struct sk_buff *skb)
+{
+	return skb;
+}
+
 static inline gro_result_t
 vlan_gro_receive(struct napi_struct *napi, struct vlan_group *grp,
 		 unsigned int vlan_tci, struct sk_buff *skb)
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index e47600b..14ef5ef 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -49,11 +49,6 @@ const char vlan_version[] = DRV_VERSION;
 static const char vlan_copyright[] = "Ben Greear <greearb@candelatech.com>";
 static const char vlan_buggyright[] = "David S. Miller <davem@redhat.com>";
 
-static struct packet_type vlan_packet_type __read_mostly = {
-	.type = cpu_to_be16(ETH_P_8021Q),
-	.func = vlan_skb_recv, /* VLAN receive method */
-};
-
 /* End of global variables definitions. */
 
 static void vlan_group_free(struct vlan_group *grp)
@@ -684,7 +679,6 @@ static int __init vlan_proto_init(void)
 	if (err < 0)
 		goto err4;
 
-	dev_add_pack(&vlan_packet_type);
 	vlan_ioctl_set(vlan_ioctl_handler);
 	return 0;
 
@@ -705,8 +699,6 @@ static void __exit vlan_cleanup_module(void)
 
 	unregister_netdevice_notifier(&vlan_notifier_block);
 
-	dev_remove_pack(&vlan_packet_type);
-
 	unregister_pernet_subsys(&vlan_net_ops);
 	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index 5687c9b..c3408de 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -75,8 +75,6 @@ static inline struct vlan_dev_info *vlan_dev_info(const struct net_device *dev)
 }
 
 /* found in vlan_dev.c */
-int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
-		  struct packet_type *ptype, struct net_device *orig_dev);
 void vlan_dev_set_ingress_priority(const struct net_device *dev,
 				   u32 skb_prio, u16 vlan_prio);
 int vlan_dev_set_egress_priority(const struct net_device *dev,
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index ce8e3ab..41495dc 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -4,7 +4,7 @@
 #include <linux/netpoll.h>
 #include "vlan.h"
 
-bool vlan_hwaccel_do_receive(struct sk_buff **skbp)
+bool vlan_do_receive(struct sk_buff **skbp)
 {
 	struct sk_buff *skb = *skbp;
 	u16 vlan_id = skb->vlan_tci & VLAN_VID_MASK;
@@ -88,3 +88,86 @@ gro_result_t vlan_gro_frags(struct napi_struct *napi, struct vlan_group *grp,
 	return napi_gro_frags(napi);
 }
 EXPORT_SYMBOL(vlan_gro_frags);
+
+static struct sk_buff *vlan_check_reorder_header(struct sk_buff *skb)
+{
+	if (vlan_dev_info(skb->dev)->flags & VLAN_FLAG_REORDER_HDR) {
+		if (skb_cow(skb, skb_headroom(skb)) < 0)
+			skb = NULL;
+		if (skb) {
+			/* Lifted from Gleb's VLAN code... */
+			memmove(skb->data - ETH_HLEN,
+				skb->data - VLAN_ETH_HLEN, 12);
+			skb->mac_header += VLAN_HLEN;
+		}
+	}
+	return skb;
+}
+
+static void vlan_set_encap_proto(struct sk_buff *skb, struct vlan_hdr *vhdr)
+{
+	__be16 proto;
+	unsigned char *rawp;
+
+	/*
+	 * Was a VLAN packet, grab the encapsulated protocol, which the layer
+	 * three protocols care about.
+	 */
+
+	proto = vhdr->h_vlan_encapsulated_proto;
+	if (ntohs(proto) >= 1536) {
+		skb->protocol = proto;
+		return;
+	}
+
+	rawp = skb->data;
+	if (*(unsigned short *) rawp == 0xFFFF)
+		/*
+		 * This is a magic hack to spot IPX packets. Older Novell
+		 * breaks the protocol design and runs IPX over 802.3 without
+		 * an 802.2 LLC layer. We look for FFFF which isn't a used
+		 * 802.2 SSAP/DSAP. This won't work for fault tolerant netware
+		 * but does for the rest.
+		 */
+		skb->protocol = htons(ETH_P_802_3);
+	else
+		/*
+		 * Real 802.2 LLC
+		 */
+		skb->protocol = htons(ETH_P_802_2);
+}
+
+struct sk_buff *vlan_untag(struct sk_buff *skb)
+{
+	struct vlan_hdr *vhdr;
+	u16 vlan_tci;
+
+	if (unlikely(vlan_tx_tag_present(skb))) {
+		/* vlan_tci is already set-up so leave this for another time */
+		return skb;
+	}
+
+	skb = skb_share_check(skb, GFP_ATOMIC);
+	if (unlikely(!skb))
+		goto err_free;
+
+	if (unlikely(!pskb_may_pull(skb, VLAN_HLEN)))
+		goto err_free;
+
+	vhdr = (struct vlan_hdr *) skb->data;
+	vlan_tci = ntohs(vhdr->h_vlan_TCI);
+	__vlan_hwaccel_put_tag(skb, vlan_tci);
+
+	skb_pull_rcsum(skb, VLAN_HLEN);
+	vlan_set_encap_proto(skb, vhdr);
+
+	skb = vlan_check_reorder_header(skb);
+	if (unlikely(!skb))
+		goto err_free;
+
+	return skb;
+
+err_free:
+	kfree_skb(skb);
+	return NULL;
+}
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index b84a46b..d174c31 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -65,179 +65,6 @@ static int vlan_dev_rebuild_header(struct sk_buff *skb)
 	return 0;
 }
 
-static inline struct sk_buff *vlan_check_reorder_header(struct sk_buff *skb)
-{
-	if (vlan_dev_info(skb->dev)->flags & VLAN_FLAG_REORDER_HDR) {
-		if (skb_cow(skb, skb_headroom(skb)) < 0)
-			skb = NULL;
-		if (skb) {
-			/* Lifted from Gleb's VLAN code... */
-			memmove(skb->data - ETH_HLEN,
-				skb->data - VLAN_ETH_HLEN, 12);
-			skb->mac_header += VLAN_HLEN;
-		}
-	}
-
-	return skb;
-}
-
-static inline void vlan_set_encap_proto(struct sk_buff *skb,
-		struct vlan_hdr *vhdr)
-{
-	__be16 proto;
-	unsigned char *rawp;
-
-	/*
-	 * Was a VLAN packet, grab the encapsulated protocol, which the layer
-	 * three protocols care about.
-	 */
-
-	proto = vhdr->h_vlan_encapsulated_proto;
-	if (ntohs(proto) >= 1536) {
-		skb->protocol = proto;
-		return;
-	}
-
-	rawp = skb->data;
-	if (*(unsigned short *)rawp == 0xFFFF)
-		/*
-		 * This is a magic hack to spot IPX packets. Older Novell
-		 * breaks the protocol design and runs IPX over 802.3 without
-		 * an 802.2 LLC layer. We look for FFFF which isn't a used
-		 * 802.2 SSAP/DSAP. This won't work for fault tolerant netware
-		 * but does for the rest.
-		 */
-		skb->protocol = htons(ETH_P_802_3);
-	else
-		/*
-		 * Real 802.2 LLC
-		 */
-		skb->protocol = htons(ETH_P_802_2);
-}
-
-/*
- *	Determine the packet's protocol ID. The rule here is that we
- *	assume 802.3 if the type field is short enough to be a length.
- *	This is normal practice and works for any 'now in use' protocol.
- *
- *  Also, at this point we assume that we ARE dealing exclusively with
- *  VLAN packets, or packets that should be made into VLAN packets based
- *  on a default VLAN ID.
- *
- *  NOTE:  Should be similar to ethernet/eth.c.
- *
- *  SANITY NOTE:  This method is called when a packet is moving up the stack
- *                towards userland.  To get here, it would have already passed
- *                through the ethernet/eth.c eth_type_trans() method.
- *  SANITY NOTE 2: We are referencing to the VLAN_HDR frields, which MAY be
- *                 stored UNALIGNED in the memory.  RISC systems don't like
- *                 such cases very much...
- *  SANITY NOTE 2a: According to Dave Miller & Alexey, it will always be
- *  		    aligned, so there doesn't need to be any of the unaligned
- *  		    stuff.  It has been commented out now...  --Ben
- *
- */
-int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
-		  struct packet_type *ptype, struct net_device *orig_dev)
-{
-	struct vlan_hdr *vhdr;
-	struct vlan_pcpu_stats *rx_stats;
-	struct net_device *vlan_dev;
-	u16 vlan_id;
-	u16 vlan_tci;
-
-	skb = skb_share_check(skb, GFP_ATOMIC);
-	if (skb == NULL)
-		goto err_free;
-
-	if (unlikely(!pskb_may_pull(skb, VLAN_HLEN)))
-		goto err_free;
-
-	vhdr = (struct vlan_hdr *)skb->data;
-	vlan_tci = ntohs(vhdr->h_vlan_TCI);
-	vlan_id = vlan_tci & VLAN_VID_MASK;
-
-	rcu_read_lock();
-	vlan_dev = vlan_find_dev(dev, vlan_id);
-
-	/* If the VLAN device is defined, we use it.
-	 * If not, and the VID is 0, it is a 802.1p packet (not
-	 * really a VLAN), so we will just netif_rx it later to the
-	 * original interface, but with the skb->proto set to the
-	 * wrapped proto: we do nothing here.
-	 */
-
-	if (!vlan_dev) {
-		if (vlan_id) {
-			pr_debug("%s: ERROR: No net_device for VID: %u on dev: %s\n",
-				 __func__, vlan_id, dev->name);
-			goto err_unlock;
-		}
-		rx_stats = NULL;
-	} else {
-		skb->dev = vlan_dev;
-
-		rx_stats = this_cpu_ptr(vlan_dev_info(skb->dev)->vlan_pcpu_stats);
-
-		u64_stats_update_begin(&rx_stats->syncp);
-		rx_stats->rx_packets++;
-		rx_stats->rx_bytes += skb->len;
-
-		skb->priority = vlan_get_ingress_priority(skb->dev, vlan_tci);
-
-		pr_debug("%s: priority: %u for TCI: %hu\n",
-			 __func__, skb->priority, vlan_tci);
-
-		switch (skb->pkt_type) {
-		case PACKET_BROADCAST:
-			/* Yeah, stats collect these together.. */
-			/* stats->broadcast ++; // no such counter :-( */
-			break;
-
-		case PACKET_MULTICAST:
-			rx_stats->rx_multicast++;
-			break;
-
-		case PACKET_OTHERHOST:
-			/* Our lower layer thinks this is not local, let's make
-			 * sure.
-			 * This allows the VLAN to have a different MAC than the
-			 * underlying device, and still route correctly.
-			 */
-			if (!compare_ether_addr(eth_hdr(skb)->h_dest,
-						skb->dev->dev_addr))
-				skb->pkt_type = PACKET_HOST;
-			break;
-		default:
-			break;
-		}
-		u64_stats_update_end(&rx_stats->syncp);
-	}
-
-	skb_pull_rcsum(skb, VLAN_HLEN);
-	vlan_set_encap_proto(skb, vhdr);
-
-	if (vlan_dev) {
-		skb = vlan_check_reorder_header(skb);
-		if (!skb) {
-			rx_stats->rx_errors++;
-			goto err_unlock;
-		}
-	}
-
-	netif_rx(skb);
-
-	rcu_read_unlock();
-	return NET_RX_SUCCESS;
-
-err_unlock:
-	rcu_read_unlock();
-err_free:
-	atomic_long_inc(&dev->rx_dropped);
-	kfree_skb(skb);
-	return NET_RX_DROP;
-}
-
 static inline u16
 vlan_dev_get_egress_qos_mask(struct net_device *dev, struct sk_buff *skb)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index 5d0b4f6..52aed0a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3130,6 +3130,12 @@ another_round:
 
 	__this_cpu_inc(softnet_data.processed);
 
+	if (skb->protocol == cpu_to_be16(ETH_P_8021Q)) {
+		skb = vlan_untag(skb);
+		if (unlikely(!skb))
+			goto out;
+	}
+
 #ifdef CONFIG_NET_CLS_ACT
 	if (skb->tc_verd & TC_NCLS) {
 		skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
@@ -3177,7 +3183,7 @@ ncls:
 			ret = deliver_skb(skb, pt_prev, orig_dev);
 			pt_prev = NULL;
 		}
-		if (vlan_hwaccel_do_receive(&skb)) {
+		if (vlan_do_receive(&skb)) {
 			ret = __netif_receive_skb(skb);
 			goto out;
 		} else if (unlikely(!skb))
-- 
1.7.4


^ permalink raw reply related

* Re: [RFC 1/6]Fix typo "recieve" in various parts of the kernel.
From: Ricard Wanderlof @ 2011-04-08  7:14 UTC (permalink / raw)
  To: Justin P. Mattock
  Cc: Uwe Kleine-König, trivial@kernel.org,
	linux-scsi@vger.kernel.org, netdev@vger.kernel.org,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	socketcan-core@lists.berlios.de, linux-mtd@lists.infradead.org,
	linux-input@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <4D9E4A59.8010604@gmail.com>

On Fri, 8 Apr 2011, Justin P. Mattock wrote:

> On 04/07/2011 01:05 PM, Uwe Kleine-König wrote:
>> On Thu, Apr 07, 2011 at 09:09:22AM -0700, Justin P. Mattock wrote:
>>> The patch below fixes some typos "recieve" in various parts of the kernel.
>>> Note: these below are in actual code rather than comments(excpet for r852.c which
>> s/excpet/except/
>>
>>> has a code fix, and comment).
>>> compile tested as best as I can...
>> I'm not a native speaker, but "as good as" sounds better in my ears.
>>
>> Best regards
>> Uwe
>>
>
> your right.. gcc and friends are the one's doing all the reall work: "as 
> good as *"

I think Uwe was thinking of grammar rather than who was doing the actual 
work.

Now that I've gone more or less off topic I may as well voice my opinion 
the 'as best as I can' sounds like a mixup of two constructs: either 'as 
best I can' (which is probably slightly unusual these days), or 'as well 
as I can'. 'I've done it as good as I can' is gramatically wrong (since 
'good' is an adjective and 'well' is an adverb, and it is referring to 
'how I did it' (or in the case above, 'how well I tested it'), but fairly 
common nevertheless, and given the large percentage of non-native speakers 
in the Linux community I wouldn't worry about it, the meaning is clear 
anyway.

I'll shut up and go now. :-)

/Ricard
-- 
Ricard Wolf Wanderlöf                           ricardw(at)axis.com
Axis Communications AB, Lund, Sweden            www.axis.com
Phone +46 46 272 2016                           Fax +46 46 13 61 30

^ permalink raw reply

* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-08  8:59 UTC (permalink / raw)
  To: Alexander Duyck, Eric Dumazet; +Cc: netdev, Kirsher, Jeffrey T
In-Reply-To: <4D9DE465.1080008@intel.com>

Hi,
On my configuration of Linux 2.6.38 we do enabled the CONFIG_DMAR
CONFIG_DMAR=y
CONFIG_DMAR_DEFAULT_ON=y
CONFIG_DMAR_FLOPPY_WA=y
CONFIG_INTR_REMAP=y

But I enabled this DMAR also by default of the RHEL6 2.6.32 kernel, which we don't see this problem on this kernel. The only different I can see on the config is "CONFIG_DMAR_DEFAULT_ON"
CONFIG_DMAR=y
# CONFIG_DMAR_DEFAULT_ON is not set
CONFIG_DMAR_FLOPPY_WA=y
CONFIG_INTR_REMAP=y

I could try the same configuration as 2.6.32 for the 2.6.38 kernel by unset the CONFIG_DMAR_DEFAULT_ON. And run the test again, hopefully we can get the test result next Monday:)

Thanks
WeiGu

-----Original Message-----
From: Alexander Duyck [mailto:alexander.h.duyck@intel.com]
Sent: Friday, April 08, 2011 12:21 AM
To: Eric Dumazet
Cc: Wei Gu; netdev; Kirsher, Jeffrey T
Subject: Re: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

On 4/7/2011 9:03 AM, Eric Dumazet wrote:
> Le jeudi 07 avril 2011 à 08:58 -0700, Alexander Duyck a écrit :
>> On 4/7/2011 4:46 AM, Eric Dumazet wrote:
>>> Le jeudi 07 avril 2011 à 19:15 +0800, Wei Gu a écrit :
>>>> Hi,
>>>> I compile the ixgbe driver into the kernel and run the test again
>>>> and also change the copy to clone in the fw hook This is the perf
>>>> report while I was forwarding 150Kpps with The attached file include the basic info about my test system. Please let me know if I did some thing wrong.
>>>>
>>>> +     71.91%          swapper  [kernel.kallsyms]            [k] poll_idle
>>>> +     10.43%          swapper  [kernel.kallsyms]            [k] intel_idle
>>>> -      8.00%     ksoftirqd/24  [kernel.kallsyms]            [k] _raw_spin_unlock_irqrestore
>>>> \u2592   - _raw_spin_unlock_irqrestore
>>>> \u2592      - 42.25% alloc_iova
>>>> \u2592           intel_alloc_iova
>>>> \u2592           __intel_map_single
>>>> \u2592           intel_map_page
>>
>> I'm almost certain this is the issue here.  I am pretty sure the
>> intel_map_page call indicates that you are running with the Intel
>> IOMMU enabled.  As Eric suggested you can either rebuild your kernel
>> with "CONFIG_DMAR=N", or pass the kernel the parameter
>> "intel_iommu=off" in order to disable it so that it will instead just use SWIOTLB.
>
> What's the purpose of intel_iommu ?
>
> Could this be speedup if ixgbe uses a perqueue iommu context instead
> of a per device, so that we dont hit a single spinlock ?

The intel_iommu is meant to be a security feature.  Primarily it is used in virtualzation where it allows KVM or Xen to direct assign a device without having to worry about the guest getting access to the hosts physical memory by submitting invalid DMA requests.

If virtualzation isn't in use I would recommend turning it off as it can have a negative impact especially on small packet performance due to the extra locking overhead that is required for DMA map and unmap calls.

Thanks,

Alex


^ permalink raw reply

* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Eric Dumazet @ 2011-04-08  9:07 UTC (permalink / raw)
  To: Wei Gu; +Cc: Alexander Duyck, netdev, Kirsher, Jeffrey T
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E48BA58D@ESGSCCMS0001.eapac.ericsson.se>

Le vendredi 08 avril 2011 à 16:59 +0800, Wei Gu a écrit :
> Hi,
> On my configuration of Linux 2.6.38 we do enabled the CONFIG_DMAR
> CONFIG_DMAR=y
> CONFIG_DMAR_DEFAULT_ON=y
> CONFIG_DMAR_FLOPPY_WA=y
> CONFIG_INTR_REMAP=y
> 
> But I enabled this DMAR also by default of the RHEL6 2.6.32 kernel,
> which we don't see this problem on this kernel. The only different I
> can see on the config is "CONFIG_DMAR_DEFAULT_ON"
> CONFIG_DMAR=y
> # CONFIG_DMAR_DEFAULT_ON is not set
> CONFIG_DMAR_FLOPPY_WA=y
> CONFIG_INTR_REMAP=y
> 
> I could try the same configuration as 2.6.32 for the 2.6.38 kernel by
> unset the CONFIG_DMAR_DEFAULT_ON. And run the test again, hopefully we
> can get the test result next Monday:)

Just say no to CONFIG_DMAR, end of problems, no need to try various
knobs and find the solution next month ;)

Unless you have a reason to use it.




^ permalink raw reply

* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-08  9:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Alexander Duyck, netdev, Kirsher, Jeffrey T
In-Reply-To: <1302253651.4409.2.camel@edumazet-laptop>

Yeap, you are right, right now I will try the CONFIG_DMAR off:)

If you guys know the relation between CONFIG_DMAR and CONFIG_DMAR_DEFAULT_ON, please let me know:)

Thanks
WeiGu

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Friday, April 08, 2011 5:08 PM
To: Wei Gu
Cc: Alexander Duyck; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

Le vendredi 08 avril 2011 à 16:59 +0800, Wei Gu a écrit :
> Hi,
> On my configuration of Linux 2.6.38 we do enabled the CONFIG_DMAR
> CONFIG_DMAR=y CONFIG_DMAR_DEFAULT_ON=y CONFIG_DMAR_FLOPPY_WA=y
> CONFIG_INTR_REMAP=y
>
> But I enabled this DMAR also by default of the RHEL6 2.6.32 kernel,
> which we don't see this problem on this kernel. The only different I
> can see on the config is "CONFIG_DMAR_DEFAULT_ON"
> CONFIG_DMAR=y
> # CONFIG_DMAR_DEFAULT_ON is not set
> CONFIG_DMAR_FLOPPY_WA=y
> CONFIG_INTR_REMAP=y
>
> I could try the same configuration as 2.6.32 for the 2.6.38 kernel by
> unset the CONFIG_DMAR_DEFAULT_ON. And run the test again, hopefully we
> can get the test result next Monday:)

Just say no to CONFIG_DMAR, end of problems, no need to try various knobs and find the solution next month ;)

Unless you have a reason to use it.




^ permalink raw reply

* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-08  9:41 UTC (permalink / raw)
  To: Eric Dumazet, Alexander Duyck; +Cc: netdev, Kirsher, Jeffrey T
In-Reply-To: <1302253651.4409.2.camel@edumazet-laptop>

Hi,
I just tried the CONFIG_DMAR off and CONFIG_DMAR_DEFAULT_ON not set.
I get a pretty good performance(1.5Mpps 400byte Rx&Tx) with shipped ixgbe driver.

The I think it is the root cause why I get that low performance previouslly.

Is it better to put some notes on the Intel-IOMMU.txt Or somewhere like the Q&A about the " CONFIG_DMAR" in no virtualization setup:)

I will keep testing the large setup by enable the rest 3 NICs, to see how much we can have with this 4 sock machine.

BTW, even we gain some preformance with this shipped ixgbe driver, but it doesn't seem that much scallable compare to the Intel offical release 3.2.10 driver. I don't know if it is a problem?

Thanks
WeiGu

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Friday, April 08, 2011 5:08 PM
To: Wei Gu
Cc: Alexander Duyck; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

Le vendredi 08 avril 2011 à 16:59 +0800, Wei Gu a écrit :
> Hi,
> On my configuration of Linux 2.6.38 we do enabled the CONFIG_DMAR
> CONFIG_DMAR=y CONFIG_DMAR_DEFAULT_ON=y CONFIG_DMAR_FLOPPY_WA=y
> CONFIG_INTR_REMAP=y
>
> But I enabled this DMAR also by default of the RHEL6 2.6.32 kernel,
> which we don't see this problem on this kernel. The only different I
> can see on the config is "CONFIG_DMAR_DEFAULT_ON"
> CONFIG_DMAR=y
> # CONFIG_DMAR_DEFAULT_ON is not set
> CONFIG_DMAR_FLOPPY_WA=y
> CONFIG_INTR_REMAP=y
>
> I could try the same configuration as 2.6.32 for the 2.6.38 kernel by
> unset the CONFIG_DMAR_DEFAULT_ON. And run the test again, hopefully we
> can get the test result next Monday:)

Just say no to CONFIG_DMAR, end of problems, no need to try various knobs and find the solution next month ;)

Unless you have a reason to use it.




^ permalink raw reply

* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Eric Dumazet @ 2011-04-08  9:49 UTC (permalink / raw)
  To: Wei Gu; +Cc: Alexander Duyck, netdev, Kirsher, Jeffrey T
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E48BA5A9@ESGSCCMS0001.eapac.ericsson.se>

Le vendredi 08 avril 2011 à 17:15 +0800, Wei Gu a écrit :
> Yeap, you are right, right now I will try the CONFIG_DMAR off:)
> 
> If you guys know the relation between CONFIG_DMAR and CONFIG_DMAR_DEFAULT_ON, please let me know:)


CONFIG_DMAR_DEFAULT_ON is ON by default since linux-2.6.29 and commit

commit f6be37fdc62d0c0214bc49815d1180ebfbd716e2
Author: Kyle McMartin <kyle@redhat.com>
Date:   Thu Feb 26 12:57:56 2009 -0500

    x86: enable DMAR by default
    
    Now that the obvious bugs have been worked out, specifically
    the iwlagn issue, and the write buffer errata, DMAR should be safe
    to turn back on by default. (We've had it on since those patches
were
    first written a few weeks ago, without any noticeable bug reports
    (most have been due to the dma-api debug patchset.))
    
    Signed-off-by: Kyle McMartin <kyle@redhat.com>
    Acked-by: David Woodhouse <David.Woodhouse@intel.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9c39095..bc2fbad 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1803,7 +1803,7 @@ config DMAR
          remapping devices.
 
 config DMAR_DEFAULT_ON
-       def_bool n
+       def_bool y
        prompt "Enable DMA Remapping Devices by default"
        depends on DMAR
        help


git describe --contains f6be37fd
v2.6.29-rc7~24^2


But the .config used to build your 2.6.32 kernel, had it set to OFF




^ permalink raw reply related

* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-08  9:59 UTC (permalink / raw)
  To: Eric Dumazet, Alexander Duyck; +Cc: netdev, Kirsher, Jeffrey T
In-Reply-To: <1302256158.4409.6.camel@edumazet-laptop>

Yeap, I guess the red hat guy make this config, since I chosed a no virtualization setup of the RHEL6.

If the from 2.6.29 the DMAR_DEFAULT_ON is set, then I guess there will a big performance degrade at least for the Intel 10GE NIC.

I don't know if this kind of performance degrade is a fault Or it just meet the design expectation?

@Alexander, I guess you know much about this?

Thanks
WeiGu

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Friday, April 08, 2011 5:49 PM
To: Wei Gu
Cc: Alexander Duyck; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

Le vendredi 08 avril 2011 à 17:15 +0800, Wei Gu a écrit :
> Yeap, you are right, right now I will try the CONFIG_DMAR off:)
>
> If you guys know the relation between CONFIG_DMAR and
> CONFIG_DMAR_DEFAULT_ON, please let me know:)


CONFIG_DMAR_DEFAULT_ON is ON by default since linux-2.6.29 and commit

commit f6be37fdc62d0c0214bc49815d1180ebfbd716e2
Author: Kyle McMartin <kyle@redhat.com>
Date:   Thu Feb 26 12:57:56 2009 -0500

    x86: enable DMAR by default

    Now that the obvious bugs have been worked out, specifically
    the iwlagn issue, and the write buffer errata, DMAR should be safe
    to turn back on by default. (We've had it on since those patches were
    first written a few weeks ago, without any noticeable bug reports
    (most have been due to the dma-api debug patchset.))

    Signed-off-by: Kyle McMartin <kyle@redhat.com>
    Acked-by: David Woodhouse <David.Woodhouse@intel.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 9c39095..bc2fbad 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1803,7 +1803,7 @@ config DMAR
          remapping devices.

 config DMAR_DEFAULT_ON
-       def_bool n
+       def_bool y
        prompt "Enable DMA Remapping Devices by default"
        depends on DMAR
        help


git describe --contains f6be37fd
v2.6.29-rc7~24^2


But the .config used to build your 2.6.32 kernel, had it set to OFF




^ permalink raw reply

* Re: extending feature word.
From: Michał Mirosław @ 2011-04-08 10:05 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: linux-netdev, Ben Hutchings, David Miller
In-Reply-To: <BANLkTin52LqjxtVT+_y+gGam-QxeLq_iOA@mail.gmail.com>

On Fri, Apr 01, 2011 at 07:07:05PM -0700, Mahesh Bandewar wrote:
> Thanks for your comments on my loop-back patch. I was looking at the
> code today from the perspective of extending various "features" for
> word to an array of words and as Michael has pointed out, it's a huge
> change. So I'm thinking on the following lines
> (include/linux/netdevice.h)
> 
> +#define DEV_FEATURE_WORDS      2
> +#define LEGACY_FEATURE_WORD    0
>        /* currently active device features */
> -       u32                     features;
> +       u32                     features[DEV_FEATURE_WORDS];
>        /* user-changeable features */
> -       u32                     hw_features;
> +       u32                     hw_features[DEV_FEATURE_WORDS];
>        /* user-requested features */
> -       u32                     wanted_features;
> +       u32                     wanted_features[DEV_FEATURE_WORDS];
>        /* VLAN feature mask */
> -       u32                     vlan_features;
> +       u32                     vlan_features[DEV_FEATURE_WORDS];

Hmm. There might be no point in making features field an array.
This gives us nothing really. Maybe just add features_2 or similar?
If we ever get to the point there need to be more than two words for
features we can think of some abstraction layer then.

Or we might add a new field and put there NETIF_F_LLTX, NETIF_F_HIGHDMA
and others that are not user changeable ever. Those don't need dynamic
propagation to slave devices (e.g. VLAN) and wanted/hw_features for them.

Best Regards,
Michał Mirosław

^ permalink raw reply

* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Wei Gu @ 2011-04-08 12:19 UTC (permalink / raw)
  To: Eric Dumazet, Alexander Duyck; +Cc: netdev, Kirsher, Jeffrey T
In-Reply-To: <1302253651.4409.2.camel@edumazet-laptop>

Hi again,
I tried more testing with by disable this CONFIG_DMAR with shipped 2.6.38 ixgbe and Intel released 3.2.10/3.1.15.
All these test looks we can get >1Mpps 400bype packtes but not stable at all, there will huge number missing errors with 100% CPU IDLE:
ethtool -S eth10 |grep rx_missed_errors

        rx_missed_errors: 76832040

SUM: 1102212 ETH8: 0  ETH10: 1102212 ETH6: 0 ETH4: 0
SUM: 521841 ETH8: 0  ETH10: 521841 ETH6: 0 ETH4: 0
SUM: 426776 ETH8: 0  ETH10: 426776 ETH6: 0 ETH4: 0
SUM: 927520 ETH8: 0  ETH10: 927520 ETH6: 0 ETH4: 0
SUM: 1171995 ETH8: 0  ETH10: 1171995 ETH6: 0 ETH4: 0
SUM: 855980 ETH8: 0  ETH10: 855980 ETH6: 0 ETH4: 0


Do you know if there is other options in the kernel will cause high rate rx_missed_errors with low CPU usage. (No problem on 2.6.32 with same test case)

perf  record:
+     69.74%          swapper  [kernel.kallsyms]          [k] poll_idle
+     11.62%          swapper  [kernel.kallsyms]          [k] intel_idle
+      0.80%          swapper  [ixgbe]                    [k] ixgbe_poll
+      0.79%             perf  [ixgbe]                    [k] ixgbe_poll
+      0.77%             perf  [kernel.kallsyms]          [k] skb_copy_bits
+      0.64%          swapper  [kernel.kallsyms]          [k] skb_copy_bits
+      0.48%             perf  [kernel.kallsyms]          [k] __kmalloc_node_track_caller
+      0.44%          swapper  [kernel.kallsyms]          [k] __kmalloc_node_track_caller
+      0.36%          swapper  [kernel.kallsyms]          [k] kmem_cache_alloc_node
+      0.35%          swapper  [kernel.kallsyms]          [k] kfree
+      0.35%             perf  [kernel.kallsyms]          [k] kmem_cache_alloc_node


-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Friday, April 08, 2011 5:08 PM
To: Wei Gu
Cc: Alexander Duyck; netdev; Kirsher, Jeffrey T
Subject: RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel

Le vendredi 08 avril 2011 à 16:59 +0800, Wei Gu a écrit :
> Hi,
> On my configuration of Linux 2.6.38 we do enabled the CONFIG_DMAR
> CONFIG_DMAR=y CONFIG_DMAR_DEFAULT_ON=y CONFIG_DMAR_FLOPPY_WA=y
> CONFIG_INTR_REMAP=y
>
> But I enabled this DMAR also by default of the RHEL6 2.6.32 kernel,
> which we don't see this problem on this kernel. The only different I
> can see on the config is "CONFIG_DMAR_DEFAULT_ON"
> CONFIG_DMAR=y
> # CONFIG_DMAR_DEFAULT_ON is not set
> CONFIG_DMAR_FLOPPY_WA=y
> CONFIG_INTR_REMAP=y
>
> I could try the same configuration as 2.6.32 for the 2.6.38 kernel by
> unset the CONFIG_DMAR_DEFAULT_ON. And run the test again, hopefully we
> can get the test result next Monday:)

Just say no to CONFIG_DMAR, end of problems, no need to try various knobs and find the solution next month ;)

Unless you have a reason to use it.




^ permalink raw reply

* [PATCH] net: benet: convert to hw_features - fixup
From: Michał Mirosław @ 2011-04-08 12:38 UTC (permalink / raw)
  To: netdev; +Cc: Sathya Perla, Subbu Seetharaman, Ajit Khaparde

Fix up after merge with NETIF_F_RXHASH implementation.

This allows to toggle NETIF_F_RXHASH and NETIF_F_HW_VLAN_TX.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/benet/be_main.c |   15 +++++++--------
 1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 58a652f..5497336 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -2629,14 +2629,13 @@ static void be_netdev_init(struct net_device *netdev)
 	int i;
 
 	netdev->hw_features |= NETIF_F_SG | NETIF_F_TSO | NETIF_F_TSO6 |
-		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM;
-
-	netdev->features |= netdev->hw_features |
-		NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX |
-		NETIF_F_HW_VLAN_FILTER;
-
+		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM |
+		NETIF_F_HW_VLAN_TX;
 	if (be_multi_rxq(adapter))
-		netdev->features |= NETIF_F_RXHASH;
+		netdev->hw_features |= NETIF_F_RXHASH;
+
+	netdev->features |= netdev->hw_features |
+		NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_FILTER;
 
 	netdev->vlan_features |= NETIF_F_SG | NETIF_F_TSO |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
-- 
1.7.2.5


^ permalink raw reply related

* [PATCH] net: Remove invalid offloads
From: Michał Mirosław @ 2011-04-08 12:38 UTC (permalink / raw)
  To: netdev; +Cc: Pantelis Antoniou, Vitaly Bordug, Li Yang, linuxppc-dev

Remove offload changing ethtool ops which drivers don't really support:

 - fs_enet
 - ucc_geth

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/fs_enet/fs_enet-main.c |    2 --
 drivers/net/ucc_geth_ethtool.c     |    1 -
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/fs_enet/fs_enet-main.c b/drivers/net/fs_enet/fs_enet-main.c
index 24cb953..a938894 100644
--- a/drivers/net/fs_enet/fs_enet-main.c
+++ b/drivers/net/fs_enet/fs_enet-main.c
@@ -956,8 +956,6 @@ static const struct ethtool_ops fs_ethtool_ops = {
 	.get_link = ethtool_op_get_link,
 	.get_msglevel = fs_get_msglevel,
 	.set_msglevel = fs_set_msglevel,
-	.set_tx_csum = ethtool_op_set_tx_csum,	/* local! */
-	.set_sg = ethtool_op_set_sg,
 	.get_regs = fs_get_regs,
 };
 
diff --git a/drivers/net/ucc_geth_ethtool.c b/drivers/net/ucc_geth_ethtool.c
index 6f92e48..537fbc0 100644
--- a/drivers/net/ucc_geth_ethtool.c
+++ b/drivers/net/ucc_geth_ethtool.c
@@ -410,7 +410,6 @@ static const struct ethtool_ops uec_ethtool_ops = {
 	.set_ringparam          = uec_set_ringparam,
 	.get_pauseparam         = uec_get_pauseparam,
 	.set_pauseparam         = uec_set_pauseparam,
-	.set_sg                 = ethtool_op_set_sg,
 	.get_sset_count		= uec_get_sset_count,
 	.get_strings            = uec_get_strings,
 	.get_ethtool_stats      = uec_get_ethtool_stats,
-- 
1.7.2.5


^ permalink raw reply related

* [PATCH] net: r8169: convert to hw_features
From: Michał Mirosław @ 2011-04-08 12:38 UTC (permalink / raw)
  To: netdev; +Cc: Francois Romieu

This enables SG+IP_CSUM+TSO by default (there were no comments suggesting
leaving them out was intentional).

This also fixes confusion around vlan_features in rtl8169_vlan_mode().

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
---
 drivers/net/r8169.c |   89 +++++++++++++-------------------------------------
 1 files changed, 23 insertions(+), 66 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index caa99cd..051dacd 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1286,14 +1286,7 @@ static int rtl8169_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 	return ret;
 }
 
-static u32 rtl8169_get_rx_csum(struct net_device *dev)
-{
-	struct rtl8169_private *tp = netdev_priv(dev);
-
-	return tp->cp_cmd & RxChkSum;
-}
-
-static int rtl8169_set_rx_csum(struct net_device *dev, u32 data)
+static int rtl8169_set_features(struct net_device *dev, u32 features)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
 	void __iomem *ioaddr = tp->mmio_addr;
@@ -1301,11 +1294,16 @@ static int rtl8169_set_rx_csum(struct net_device *dev, u32 data)
 
 	spin_lock_irqsave(&tp->lock, flags);
 
-	if (data)
+	if (features & NETIF_F_RXCSUM)
 		tp->cp_cmd |= RxChkSum;
 	else
 		tp->cp_cmd &= ~RxChkSum;
 
+	if (dev->features & NETIF_F_HW_VLAN_RX)
+		tp->cp_cmd |= RxVlan;
+	else
+		tp->cp_cmd &= ~RxVlan;
+
 	RTL_W16(CPlusCmd, tp->cp_cmd);
 	RTL_R16(CPlusCmd);
 
@@ -1321,27 +1319,6 @@ static inline u32 rtl8169_tx_vlan_tag(struct rtl8169_private *tp,
 		TxVlanTag | swab16(vlan_tx_tag_get(skb)) : 0x00;
 }
 
-#define NETIF_F_HW_VLAN_TX_RX	(NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX)
-
-static void rtl8169_vlan_mode(struct net_device *dev)
-{
-	struct rtl8169_private *tp = netdev_priv(dev);
-	void __iomem *ioaddr = tp->mmio_addr;
-	unsigned long flags;
-
-	spin_lock_irqsave(&tp->lock, flags);
-	if (dev->features & NETIF_F_HW_VLAN_RX)
-		tp->cp_cmd |= RxVlan;
-	else
-		tp->cp_cmd &= ~RxVlan;
-	RTL_W16(CPlusCmd, tp->cp_cmd);
-	/* PCI commit */
-	RTL_R16(CPlusCmd);
-	spin_unlock_irqrestore(&tp->lock, flags);
-
-	dev->vlan_features = dev->features &~ NETIF_F_HW_VLAN_TX_RX;
-}
-
 static void rtl8169_rx_vlan_tag(struct RxDesc *desc, struct sk_buff *skb)
 {
 	u32 opts2 = le32_to_cpu(desc->opts2);
@@ -1522,28 +1499,6 @@ static void rtl8169_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 	}
 }
 
-static int rtl8169_set_flags(struct net_device *dev, u32 data)
-{
-	struct rtl8169_private *tp = netdev_priv(dev);
-	unsigned long old_feat = dev->features;
-	int rc;
-
-	if ((tp->mac_version == RTL_GIGA_MAC_VER_05) &&
-	    !(data & ETH_FLAG_RXVLAN)) {
-		netif_info(tp, drv, dev, "8110SCd requires hardware Rx VLAN\n");
-		return -EINVAL;
-	}
-
-	rc = ethtool_op_set_flags(dev, data, ETH_FLAG_TXVLAN | ETH_FLAG_RXVLAN);
-	if (rc)
-		return rc;
-
-	if ((old_feat ^ dev->features) & NETIF_F_HW_VLAN_RX)
-		rtl8169_vlan_mode(dev);
-
-	return 0;
-}
-
 static const struct ethtool_ops rtl8169_ethtool_ops = {
 	.get_drvinfo		= rtl8169_get_drvinfo,
 	.get_regs_len		= rtl8169_get_regs_len,
@@ -1552,19 +1507,12 @@ static const struct ethtool_ops rtl8169_ethtool_ops = {
 	.set_settings		= rtl8169_set_settings,
 	.get_msglevel		= rtl8169_get_msglevel,
 	.set_msglevel		= rtl8169_set_msglevel,
-	.get_rx_csum		= rtl8169_get_rx_csum,
-	.set_rx_csum		= rtl8169_set_rx_csum,
-	.set_tx_csum		= ethtool_op_set_tx_csum,
-	.set_sg			= ethtool_op_set_sg,
-	.set_tso		= ethtool_op_set_tso,
 	.get_regs		= rtl8169_get_regs,
 	.get_wol		= rtl8169_get_wol,
 	.set_wol		= rtl8169_set_wol,
 	.get_strings		= rtl8169_get_strings,
 	.get_sset_count		= rtl8169_get_sset_count,
 	.get_ethtool_stats	= rtl8169_get_ethtool_stats,
-	.set_flags		= rtl8169_set_flags,
-	.get_flags		= ethtool_op_get_flags,
 };
 
 static void rtl8169_get_mac_version(struct rtl8169_private *tp,
@@ -2979,6 +2927,7 @@ static const struct net_device_ops rtl8169_netdev_ops = {
 	.ndo_tx_timeout		= rtl8169_tx_timeout,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_change_mtu		= rtl8169_change_mtu,
+	.ndo_set_features	= rtl8169_set_features,
 	.ndo_set_mac_address	= rtl_set_mac_address,
 	.ndo_do_ioctl		= rtl8169_ioctl,
 	.ndo_set_multicast_list	= rtl_set_rx_mode,
@@ -3425,7 +3374,16 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	netif_napi_add(dev, &tp->napi, rtl8169_poll, R8169_NAPI_WEIGHT);
 
-	dev->features |= NETIF_F_HW_VLAN_TX_RX | NETIF_F_GRO;
+	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
+		NETIF_F_RXCSUM |
+		NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
+	dev->features |= dev->hw_features;
+	dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO |
+		NETIF_F_HIGHDMA;
+
+	if (tp->mac_version == RTL_GIGA_MAC_VER_05)
+		/* 8110SCd requires hardware Rx VLAN - disallow toggling */
+		dev->hw_features &= ~NETIF_F_HW_VLAN_RX;
 
 	tp->intr_mask = 0xffff;
 	tp->hw_start = cfg->hw_start;
@@ -3545,7 +3503,7 @@ static int rtl8169_open(struct net_device *dev)
 
 	rtl8169_init_phy(dev, tp);
 
-	rtl8169_vlan_mode(dev);
+	rtl8169_set_features(dev, dev->features);
 
 	rtl_pll_power_up(tp);
 
@@ -4642,12 +4600,11 @@ err_out:
 
 static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev)
 {
-	if (dev->features & NETIF_F_TSO) {
-		u32 mss = skb_shinfo(skb)->gso_size;
+	u32 mss = skb_shinfo(skb)->gso_size;
+
+	if (mss)
+		return LargeSend | ((mss & MSSMask) << MSSShift);
 
-		if (mss)
-			return LargeSend | ((mss & MSSMask) << MSSShift);
-	}
 	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		const struct iphdr *ip = ip_hdr(skb);
 
-- 
1.7.2.5


^ permalink raw reply related

* Re: [PATCH] net: r8169: convert to hw_features
From: Michał Mirosław @ 2011-04-08 12:44 UTC (permalink / raw)
  To: netdev; +Cc: Francois Romieu
In-Reply-To: <20110408123847.64FF813A64@rere.qmqm.pl>

On Fri, Apr 08, 2011 at 02:38:46PM +0200, Michał Mirosław wrote:
> This enables SG+IP_CSUM+TSO by default (there were no comments suggesting
> leaving them out was intentional).
> 
> This also fixes confusion around vlan_features in rtl8169_vlan_mode().

BTW, I noticed that TSO will break for MTU > 4095+(TCP+IP header len).
This needs handing in ndo_fix_features callback like other MTU-limited TSO
engines.

Best Regards,
Michał Mirosław

^ permalink raw reply

* RE: Low performance Intel 10GE NIC (3.2.10) on 2.6.38 Kernel
From: Eric Dumazet @ 2011-04-08 12:56 UTC (permalink / raw)
  To: Wei Gu; +Cc: Alexander Duyck, netdev, Kirsher, Jeffrey T
In-Reply-To: <D12839161ADD3A4B8DA63D1A134D084026E48BA66B@ESGSCCMS0001.eapac.ericsson.se>

Le vendredi 08 avril 2011 à 20:19 +0800, Wei Gu a écrit :
> Hi again,
> I tried more testing with by disable this CONFIG_DMAR with shipped
> 2.6.38 ixgbe and Intel released 3.2.10/3.1.15.
> All these test looks we can get >1Mpps 400bype packtes but not stable
> at all, there will huge number missing errors with 100% CPU IDLE:
> ethtool -S eth10 |grep rx_missed_errors
> 
>         rx_missed_errors: 76832040
> 
> SUM: 1102212 ETH8: 0  ETH10: 1102212 ETH6: 0 ETH4: 0
> SUM: 521841 ETH8: 0  ETH10: 521841 ETH6: 0 ETH4: 0
> SUM: 426776 ETH8: 0  ETH10: 426776 ETH6: 0 ETH4: 0
> SUM: 927520 ETH8: 0  ETH10: 927520 ETH6: 0 ETH4: 0
> SUM: 1171995 ETH8: 0  ETH10: 1171995 ETH6: 0 ETH4: 0
> SUM: 855980 ETH8: 0  ETH10: 855980 ETH6: 0 ETH4: 0
> 
> 
> Do you know if there is other options in the kernel will cause high
> rate rx_missed_errors with low CPU usage. (No problem on 2.6.32 with
> same test case)
> 
> perf  record:
> +     69.74%          swapper  [kernel.kallsyms]          [k] poll_idle
> +     11.62%          swapper  [kernel.kallsyms]          [k] intel_idle
> +      0.80%          swapper  [ixgbe]                    [k] ixgbe_poll
> +      0.79%             perf  [ixgbe]                    [k] ixgbe_poll
> +      0.77%             perf  [kernel.kallsyms]          [k] skb_copy_bits
> +      0.64%          swapper  [kernel.kallsyms]          [k] skb_copy_bits
> +      0.48%             perf  [kernel.kallsyms]          [k] __kmalloc_node_track_caller
> +      0.44%          swapper  [kernel.kallsyms]          [k] __kmalloc_node_track_caller
> +      0.36%          swapper  [kernel.kallsyms]          [k] kmem_cache_alloc_node
> +      0.35%          swapper  [kernel.kallsyms]          [k] kfree
> +      0.35%             perf  [kernel.kallsyms]          [k] kmem_cache_alloc_node
> 

Make sure enough cpus serves interrupts, _before_ even starting your
stress test.

Then, make sure trafic is distributed to many different queues.
If a single flow is used, it probably uses a single queue ->single CPU.

Say you have irq affinities set to fffffffffffff  (all cpus able to
serve IRQ X,Y,Z,T,...)

Then you have a network burst (because you start your packet generator
at full rate), spreaded on many queues.

CPU0 takes hard interrupt for queue 0, eth8, and queues NAPI mode.
CPU0 takes hard interrupt for queue 0, eth10, and queues NAPI mode.
CPU0 takes hard interrupt for queue 1, eth8, and queues NAPI mode.
CPU0 takes hard interrupt for queue 1, eth10, and queues NAPI mode.
CPU0 takes hard interrupt for queue 2, eth8, and queues NAPI mode.
CPU0 takes hard interrupt for queue 2, eth10, and queues NAPI mode.
...
CPU0 takes hard interrupt for queue X, eth8, and queues NAPI mode.
...

Then softirq can start, and only CPU0 is able to handle NAPI for all the
queued devices. You are stuck, with CPU0 never leaving ksoftirqd.

NAPI handling is always performed on the CPU that received the hardware
interrupt, until we exit NAPI (and rearm interrupt delivery).
It cannot migrate to an "idle cpu"

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox