Netdev List
 help / color / mirror / Atom feed
* [PATCH] forcedeth: Stay in NAPI as long as there's work
From: Tom Herbert @ 2010-04-28  6:36 UTC (permalink / raw)
  To: netdev, aabdulla, davem

Add loop in NAPI poll routine to keep processing RX and TX as long as
there is more work to do.  This is similar to what tg3 and some other
drivers do.

This modification seems improves performance (maximum pps).  Running
500 instances of netperf TCP_RR test with one byte sizes on between
two sixteen core AMD machines (RPS enabled) gives:

Before patch: 186715 tps
With patch: 400949 tps

Signed-off-by: Tom Herbert <therbert@google.com>
---
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index a1c0e7b..1e4de7b 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -3736,6 +3736,23 @@ static irqreturn_t nv_nic_irq_tx(int foo, void *data)
 }
 
 #ifdef CONFIG_FORCEDETH_NAPI
+static inline int nv_has_work(struct fe_priv *np)
+{
+	if (nv_optimized(np)) {
+		return (
+		    ((np->get_rx.ex != np->put_rx.ex) &&
+		     !(le32_to_cpu(np->get_rx.ex->flaglen) & NV_RX2_AVAIL)) ||
+		    ((np->get_tx.ex != np->put_tx.ex) &&
+		     !(le32_to_cpu(np->get_tx.ex->flaglen) & NV_TX_VALID)));
+	} else {
+		return (
+		    ((np->get_rx.orig != np->put_rx.orig) &&
+		     !(le32_to_cpu(np->get_rx.orig->flaglen) & NV_RX_AVAIL)) ||
+		    ((np->get_tx.orig != np->put_tx.orig) &&
+		     !(le32_to_cpu(np->get_tx.orig->flaglen) & NV_TX_VALID)));
+	}
+}
+
 static int nv_napi_poll(struct napi_struct *napi, int budget)
 {
 	struct fe_priv *np = container_of(napi, struct fe_priv, napi);
@@ -3743,30 +3760,33 @@ static int nv_napi_poll(struct napi_struct *napi, int budget)
 	u8 __iomem *base = get_hwbase(dev);
 	unsigned long flags;
 	int retcode;
-	int tx_work, rx_work;
+	int tx_work = 0, rx_work = 0;
 
-	if (!nv_optimized(np)) {
-		spin_lock_irqsave(&np->lock, flags);
-		tx_work = nv_tx_done(dev, np->tx_ring_size);
-		spin_unlock_irqrestore(&np->lock, flags);
+	do {
+		if (!nv_optimized(np)) {
+			spin_lock_irqsave(&np->lock, flags);
+			tx_work += nv_tx_done(dev, np->tx_ring_size);
+			spin_unlock_irqrestore(&np->lock, flags);
 
-		rx_work = nv_rx_process(dev, budget);
-		retcode = nv_alloc_rx(dev);
-	} else {
-		spin_lock_irqsave(&np->lock, flags);
-		tx_work = nv_tx_done_optimized(dev, np->tx_ring_size);
-		spin_unlock_irqrestore(&np->lock, flags);
+			rx_work += nv_rx_process(dev, budget);
+			retcode = nv_alloc_rx(dev);
+		} else {
+			spin_lock_irqsave(&np->lock, flags);
+			tx_work += nv_tx_done_optimized(dev, np->tx_ring_size);
+			spin_unlock_irqrestore(&np->lock, flags);
 
-		rx_work = nv_rx_process_optimized(dev, budget);
-		retcode = nv_alloc_rx_optimized(dev);
-	}
+			rx_work += nv_rx_process_optimized(dev, budget);
+			retcode = nv_alloc_rx_optimized(dev);
+		}
 
-	if (retcode) {
-		spin_lock_irqsave(&np->lock, flags);
-		if (!np->in_shutdown)
-			mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-		spin_unlock_irqrestore(&np->lock, flags);
-	}
+		if (unlikely(retcode)) {
+			spin_lock_irqsave(&np->lock, flags);
+			if (!np->in_shutdown)
+				mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
+			spin_unlock_irqrestore(&np->lock, flags);
+			break;
+		}
+	} while (rx_work < budget && nv_has_work(np));
 
 	nv_change_interrupt_mode(dev, tx_work + rx_work);
 

^ permalink raw reply related

* Re: [PATCH torvalds-2.6] cdc_ether: fix autosuspend for mbm devices
From: Torgny Johansson @ 2010-04-28  7:07 UTC (permalink / raw)
  To: David Miller; +Cc: oneukum@suse.de, netdev@vger.kernel.org
In-Reply-To: <20100427.170757.146099597.davem@davemloft.net>

onsdagen den 28 april 2010 02.07.57 skrev  David Miller:
> From: Oliver Neukum <oneukum@suse.de>
> Date: Mon, 26 Apr 2010 12:30:21 +0200
> 
> >> Also, if you like this patch, is it possible to get it included in a
> >> minor release of 2.6.33 (e.g. 2.6.33.3) as a bugfix?
> > 
> > Up to David.
> 
> I'll apply to net-2.6 and queue up for -stable, thanks.

Great, thank you!

^ permalink raw reply

* [PATCH] xfrm: potential uninitialized variable num_xfrms
From: Changli Gao @ 2010-04-28  7:20 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jamal Hadi Salim, Timo Teras, Herbert Xu, Alexey Dobriyan, netdev,
	Changli Gao

potential uninitialized variable num_xfrms

fix compiler warning: 'num_xfrms' may be used uninitialized in this function.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/xfrm/xfrm_policy.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 7430ac2..31f4ba4 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1732,7 +1732,7 @@ int __xfrm_lookup(struct net *net, struct dst_entry **dst_p, struct flowi *fl,
 	struct dst_entry *dst, *dst_orig = *dst_p, *route;
 	u16 family = dst_orig->ops->family;
 	u8 dir = policy_to_flow_dir(XFRM_POLICY_OUT);
-	int i, err, num_pols, num_xfrms, drop_pols = 0;
+	int i, err, num_pols, num_xfrms = 0, drop_pols = 0;
 
 restart:
 	dst = NULL;

^ permalink raw reply related

* Re: [PATCH/RFC Resubmission] cdc_ether: Identify MBM devices by GUID in MDLM descriptor
From: Oliver Neukum @ 2010-04-28  8:51 UTC (permalink / raw)
  To: David Miller; +Cc: jonas.sjoquist, netdev
In-Reply-To: <20100427.170959.256868398.davem@davemloft.net>

Am Mittwoch, 28. April 2010 02:09:59 schrieb David Miller:
> From: Jonas Sjoquist <jonas.sjoquist@ericsson.com>
> Date: Fri, 23 Apr 2010 13:07:45 +0200
> 
> > From: Jonas Sjöquist <jonas.sjoquist@ericsson.com>
> > 
> > This patch removes vid/pid for Ericsson MBM devices from the whitelist set of
> > devices. The MBM devices are instead identified by GUID.
> > 
> > In order for cdc_ether to handle these devices the GUID in the MDLM descriptor
> > is tested. All MBM devices currently handled by cdc_ether as well as future
> > CDC Ethernet MBM devices can be identified by the GUID.
> > 
> > This is the same solution used in Carl Nordbeck's mbm driver,
> > http://kerneltrap.org/mailarchive/linux-usb/2008/11/17/4141384/thread
> > 
> > I post this as RFC to get feedback on however cdc_ether is the correct place to
> > do the binding, or if it should be done in a separate driver, e.g. zaurus.
> > 
> > Signed-off-by: Jonas Sjöquist <jonas.sjoquist@ericsson.com>
> 
> Can someone knowledgable with the cdc_ether driver review this change?

The patch looks good.

	Regards
		Oliver

^ permalink raw reply

* iputils flowlabel
From: Jiri Skala @ 2010-04-28  8:51 UTC (permalink / raw)
  To: netdev

Hi,
I'd like to ask about current state of FLOWLABEL functionality. This is
currently wrapped into define and disabled by default.

Trying to enable it means to define (somehow = in proprietary header)
in6_flowlabel_req structure because usage of linux/in6.h conflicts with
glibc's headers.

Is flowlabel fnc inside ifdef due to described 'header' issue? Any other
comment to this topic?

Thanks

Jiri


^ permalink raw reply

* [patch] iwl: cleanup: remove unneeded error handling
From: Dan Carpenter @ 2010-04-28  9:01 UTC (permalink / raw)
  To: Samuel Ortiz
  Cc: Zhu Yi, Intel Linux Wireless, John W. Linville, Andrew Morton,
	Alexey Dobriyan, linux-wireless, netdev, linux-kernel,
	kernel-janitors

This is just a cleanup and doesn't change how the code works.

debugfs_create_dir() and debugfs_create_file() return an error pointer 
(-ENODEV) if CONFIG_DEBUG_FS is not enabled, otherwise if an error occurs
they return NULL.  This is how they are implemented and what it says in 
the DebugFS documentation.  DebugFS can not be compiled as a module.  

As a result, we only need to check for error pointers and particularly 
-ENODEV one time to know that DebugFS is enabled.  This patch keeps the 
first check for error pointers and removes the rest. 

The other reason for this patch, is that it silences some Smatch warnings.
Smatch sees the condition "(result != -ENODEV)" and assumes that it's 
possible for "result" to equal -ENODEV.  If it were possible it would lead
to an error pointer dereference.  But since it's not, we can just remove
the check.

Signed-off-by: Dan Carpenter <error27@gmail.com>

diff --git a/drivers/net/wireless/iwmc3200wifi/debugfs.c b/drivers/net/wireless/iwmc3200wifi/debugfs.c
index cbb81be..10eeeeb 100644
--- a/drivers/net/wireless/iwmc3200wifi/debugfs.c
+++ b/drivers/net/wireless/iwmc3200wifi/debugfs.c
@@ -438,40 +438,10 @@ int iwm_debugfs_init(struct iwm_priv *iwm)
 	snprintf(devdir, sizeof(devdir), "%s", wiphy_name(iwm_to_wiphy(iwm)));
 
 	iwm->dbg.devdir = debugfs_create_dir(devdir, iwm->dbg.rootdir);
-	result = PTR_ERR(iwm->dbg.devdir);
-	if (IS_ERR(iwm->dbg.devdir) && (result != -ENODEV)) {
-		IWM_ERR(iwm, "Couldn't create devdir: %d\n", result);
-		goto error;
-	}
-
 	iwm->dbg.dbgdir = debugfs_create_dir("debug", iwm->dbg.devdir);
-	result = PTR_ERR(iwm->dbg.dbgdir);
-	if (IS_ERR(iwm->dbg.dbgdir) && (result != -ENODEV)) {
-		IWM_ERR(iwm, "Couldn't create dbgdir: %d\n", result);
-		goto error;
-	}
-
 	iwm->dbg.rxdir = debugfs_create_dir("rx", iwm->dbg.devdir);
-	result = PTR_ERR(iwm->dbg.rxdir);
-	if (IS_ERR(iwm->dbg.rxdir) && (result != -ENODEV)) {
-		IWM_ERR(iwm, "Couldn't create rx dir: %d\n", result);
-		goto error;
-	}
-
 	iwm->dbg.txdir = debugfs_create_dir("tx", iwm->dbg.devdir);
-	result = PTR_ERR(iwm->dbg.txdir);
-	if (IS_ERR(iwm->dbg.txdir) && (result != -ENODEV)) {
-		IWM_ERR(iwm, "Couldn't create tx dir: %d\n", result);
-		goto error;
-	}
-
 	iwm->dbg.busdir = debugfs_create_dir("bus", iwm->dbg.devdir);
-	result = PTR_ERR(iwm->dbg.busdir);
-	if (IS_ERR(iwm->dbg.busdir) && (result != -ENODEV)) {
-		IWM_ERR(iwm, "Couldn't create bus dir: %d\n", result);
-		goto error;
-	}
-
 	if (iwm->bus_ops->debugfs_init) {
 		result = iwm->bus_ops->debugfs_init(iwm, iwm->dbg.busdir);
 		if (result < 0) {
@@ -480,27 +450,15 @@ int iwm_debugfs_init(struct iwm_priv *iwm)
 		}
 	}
 
-
 	iwm->dbg.dbg_level = IWM_DL_NONE;
 	iwm->dbg.dbg_level_dentry =
 		debugfs_create_file("level", 0200, iwm->dbg.dbgdir, iwm,
 				    &fops_iwm_dbg_level);
-	result = PTR_ERR(iwm->dbg.dbg_level_dentry);
-	if (IS_ERR(iwm->dbg.dbg_level_dentry) && (result != -ENODEV)) {
-		IWM_ERR(iwm, "Couldn't create dbg_level: %d\n", result);
-		goto error;
-	}
-
 
 	iwm->dbg.dbg_modules = IWM_DM_DEFAULT;
 	iwm->dbg.dbg_modules_dentry =
 		debugfs_create_file("modules", 0200, iwm->dbg.dbgdir, iwm,
 				    &fops_iwm_dbg_modules);
-	result = PTR_ERR(iwm->dbg.dbg_modules_dentry);
-	if (IS_ERR(iwm->dbg.dbg_modules_dentry) && (result != -ENODEV)) {
-		IWM_ERR(iwm, "Couldn't create dbg_modules: %d\n", result);
-		goto error;
-	}
 
 	for (i = 0; i < __IWM_DM_NR; i++)
 		add_dbg_module(iwm->dbg, iwm_debug_module[i].name,
@@ -509,39 +467,18 @@ int iwm_debugfs_init(struct iwm_priv *iwm)
 	iwm->dbg.txq_dentry = debugfs_create_file("queues", 0200,
 						  iwm->dbg.txdir, iwm,
 						  &iwm_debugfs_txq_fops);
-	result = PTR_ERR(iwm->dbg.txq_dentry);
-	if (IS_ERR(iwm->dbg.txq_dentry) && (result != -ENODEV)) {
-		IWM_ERR(iwm, "Couldn't create tx queue: %d\n", result);
-		goto error;
-	}
 
 	iwm->dbg.tx_credit_dentry = debugfs_create_file("credits", 0200,
 						   iwm->dbg.txdir, iwm,
 						   &iwm_debugfs_tx_credit_fops);
-	result = PTR_ERR(iwm->dbg.tx_credit_dentry);
-	if (IS_ERR(iwm->dbg.tx_credit_dentry) && (result != -ENODEV)) {
-		IWM_ERR(iwm, "Couldn't create tx credit: %d\n", result);
-		goto error;
-	}
 
 	iwm->dbg.rx_ticket_dentry = debugfs_create_file("tickets", 0200,
 						  iwm->dbg.rxdir, iwm,
 						  &iwm_debugfs_rx_ticket_fops);
-	result = PTR_ERR(iwm->dbg.rx_ticket_dentry);
-	if (IS_ERR(iwm->dbg.rx_ticket_dentry) && (result != -ENODEV)) {
-		IWM_ERR(iwm, "Couldn't create rx ticket: %d\n", result);
-		goto error;
-	}
 
 	iwm->dbg.fw_err_dentry = debugfs_create_file("last_fw_err", 0200,
 						     iwm->dbg.dbgdir, iwm,
 						     &iwm_debugfs_fw_err_fops);
-	result = PTR_ERR(iwm->dbg.fw_err_dentry);
-	if (IS_ERR(iwm->dbg.fw_err_dentry) && (result != -ENODEV)) {
-		IWM_ERR(iwm, "Couldn't create last FW err: %d\n", result);
-		goto error;
-	}
-
 
 	return 0;
 

^ permalink raw reply related

* Re: [v4 Patch 1/3] netpoll: add generic support for bridge and  bonding devices
From: Cong Wang @ 2010-04-28  9:59 UTC (permalink / raw)
  To: Dongdong Deng
  Cc: linux-kernel, Matt Mackall, netdev, bridge, Andy Gospodarek,
	Neil Horman, Jeff Moyer, Stephen Hemminger, bonding-devel,
	Jay Vosburgh, David Miller
In-Reply-To: <j2mce2c83091004272102ye46e65fbod138e63dc58bccd5@mail.gmail.com>

Dongdong Deng wrote:
> 
> 
> +             if (ops->ndo_netpoll_cleanup)
> +                                       ops->ndo_netpoll_cleanup(np->dev);
> +             np->dev->npinfo = NULL;
> 
> I think it is good to set np->dev->npinfo to NULL  even though we have
> the netpoll_cleanup opt.
> 

This is redundant, since ->ndo_netpoll_cleanup will set it.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next-2.6] bnx2x: Remove two prefetch()
From: jamal @ 2010-04-28 11:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, xiaosuo, therbert, shemminger, netdev,
	Eilon Greenstein
In-Reply-To: <1272406693.2343.26.camel@edumazet-laptop>

On Wed, 2010-04-28 at 00:18 +0200, Eric Dumazet wrote:

> Thanks David, I was about to resubmit the cumulative patch ;)

Hrm, i never got the email with your patch on top of Changlis
(the fscking ISP has creative ways of reordering, delaying and also
occassionaly loosing my emails). So all my tests from last
week did not include the extra patch. I will try to make time today
to test with latest net-next which seems to have some extra goodies.
If there is any other patch you want me to try let me know...

cheers,
jamal


^ permalink raw reply

* Re: [PATCH 2/4] [RFC] Add sock_create_kern_net()
From: jamal @ 2010-04-28 11:44 UTC (permalink / raw)
  To: Dan Smith; +Cc: containers, netdev, Daniel Lezcano, Eric W. Biederman
In-Reply-To: <1272034539-19899-3-git-send-email-danms@us.ibm.com>


On Fri, 2010-04-23 at 07:55 -0700, Dan Smith wrote:
> This helper allows kernel routines to create a socket in a given netns,
> instead of forcing it to the initial or current one.
> 
> I know this seems like it's violating the netns boundary.  The intended
> use (as in the following patches) is specifically when talking to RTNETLINK
> in another netns for the purposes of creating or examining resources there.
> It is expected that this will be used for that sort of transient socket
> creation only.  In other words:
> 
>   s = sock_create_kern_net(AF_NETLINK, ..., other_netns, ...);
>   rtnl_talk(s);
>   close(s);
> 

CCing Eric B. and Daniel with whom i have had this discussion before.

So ... how does user space know what "other_netns" is?

Also note Eric's recent patches introduced another way of opening a
socket in a different namespace - are you using those in the abstraction
to find what netns is?

cheers,
jamal


^ permalink raw reply

* Re: [PATCH net-next-2.6] bnx2x: Remove two prefetch()
From: Eric Dumazet @ 2010-04-28 12:33 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, xiaosuo, therbert, shemminger, netdev,
	Eilon Greenstein
In-Reply-To: <1272454432.14068.4.camel@bigi>

Le mercredi 28 avril 2010 à 07:33 -0400, jamal a écrit :
> On Wed, 2010-04-28 at 00:18 +0200, Eric Dumazet wrote:
> 
> > Thanks David, I was about to resubmit the cumulative patch ;)
> 
> Hrm, i never got the email with your patch on top of Changlis
> (the fscking ISP has creative ways of reordering, delaying and also
> occassionaly loosing my emails). So all my tests from last
> week did not include the extra patch. I will try to make time today
> to test with latest net-next which seems to have some extra goodies.
> If there is any other patch you want me to try let me know...
> 
> cheers,
> jamal

If you wait a bit, I have another patch to speedup udp receive path ;)



^ permalink raw reply

* Re: [PATCH net-next-2.6] bnx2x: Remove two prefetch()
From: jamal @ 2010-04-28 12:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, xiaosuo, therbert, shemminger, netdev,
	Eilon Greenstein
In-Reply-To: <1272458001.2267.0.camel@edumazet-laptop>

On Wed, 2010-04-28 at 14:33 +0200, Eric Dumazet wrote:

> If you wait a bit, I have another patch to speedup udp receive path ;)

Shoot whenever you are ready ;-> I will test with and without your
patch..

cheers,
jamal


^ permalink raw reply

* [PATCH] bonding: fix arp_validate on bonds inside a bridge
From: Jiri Bohac @ 2010-04-28 12:59 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: bonding-devel, netdev

Hi,

bonding with arp_validate does not currently work when the
bonding master is part of a bridge. This is because
bond_arp_rcv() is registered as a packet type handler for ARP,
but before netif_receive_skb() processes the ptype_base hash
table, handle_bridge() is called and changes the skb->dev to
point to the bridge device.

This patch makes bonding_should_drop() call the bonding ARP
handler directly if a IFF_MASTER_NEEDARP flag is set on the
bonding master.  bond_register_arp() now only needs to set the
IFF_MASTER_NEEDARP flag.

We ne longer need special ARP handling for inactive slaves, hence
IFF_SLAVE_NEEDARP is not needed.

skb_reset_network_header() and skb_reset_transport_header() need
to be called before the call to bonding_should_drop() because
bond_handle_arp() needs the offsets initialized.

P.S.: bonding_should_drop() should probably be renamed to
handle_bonding() -- we already have handle_bridge() and
handle_macvlan(), and bonding_should_drop() has long been doing
other stuff than deciding which packets to drop...



Signed-off-by: Jiri Bohac <jbohac@suse.cz>

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 0075514..cafd404 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1940,8 +1940,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 	}
 
 	slave_dev->priv_flags &= ~(IFF_MASTER_8023AD | IFF_MASTER_ALB |
-				   IFF_SLAVE_INACTIVE | IFF_BONDING |
-				   IFF_SLAVE_NEEDARP);
+				   IFF_SLAVE_INACTIVE | IFF_BONDING);
 
 	kfree(slave);
 
@@ -2612,11 +2611,12 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
 	}
 }
 
-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
+static void bond_handle_arp(struct sk_buff *skb)
 {
 	struct arphdr *arp;
 	struct slave *slave;
 	struct bonding *bond;
+	struct net_device *dev = skb->dev->master, *orig_dev = skb->dev;
 	unsigned char *arp_ptr;
 	__be32 sip, tip;
 
@@ -2637,9 +2637,8 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
 	bond = netdev_priv(dev);
 	read_lock(&bond->lock);
 
-	pr_debug("bond_arp_rcv: bond %s skb->dev %s orig_dev %s\n",
-		 bond->dev->name, skb->dev ? skb->dev->name : "NULL",
-		 orig_dev ? orig_dev->name : "NULL");
+	pr_debug("bond_handle_arp: bond: %s, master: %s, slave: %s\n",
+		bond->dev->name, dev->name, orig_dev->name);
 
 	slave = bond_get_slave_by_dev(bond, orig_dev);
 	if (!slave || !slave_do_arp_validate(bond, slave))
@@ -2684,8 +2683,7 @@ static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct pack
 out_unlock:
 	read_unlock(&bond->lock);
 out:
-	dev_kfree_skb(skb);
-	return NET_RX_SUCCESS;
+	return;
 }
 
 /*
@@ -3567,23 +3565,12 @@ static void bond_unregister_lacpdu(struct bonding *bond)
 
 void bond_register_arp(struct bonding *bond)
 {
-	struct packet_type *pt = &bond->arp_mon_pt;
-
-	if (pt->type)
-		return;
-
-	pt->type = htons(ETH_P_ARP);
-	pt->dev = bond->dev;
-	pt->func = bond_arp_rcv;
-	dev_add_pack(pt);
+	bond->dev->priv_flags |= IFF_MASTER_NEEDARP;
 }
 
 void bond_unregister_arp(struct bonding *bond)
 {
-	struct packet_type *pt = &bond->arp_mon_pt;
-
-	dev_remove_pack(pt);
-	pt->type = 0;
+	bond->dev->priv_flags &= ~IFF_MASTER_NEEDARP;
 }
 
 /*---------------------------- Hashing Policies -----------------------------*/
@@ -5041,6 +5028,7 @@ static int __init bonding_init(void)
 	register_netdevice_notifier(&bond_netdev_notifier);
 	register_inetaddr_notifier(&bond_inetaddr_notifier);
 	bond_register_ipv6_notifier();
+	bond_handle_arp_hook = bond_handle_arp;
 out:
 	return res;
 err:
@@ -5061,6 +5049,7 @@ static void __exit bonding_exit(void)
 
 	rtnl_link_unregister(&bond_link_ops);
 	unregister_pernet_subsys(&bond_net_ops);
+	bond_handle_arp_hook = NULL;
 }
 
 module_init(bonding_init);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 257a7a4..57adfe5 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -212,7 +212,6 @@ struct bonding {
 	struct   bond_params params;
 	struct   list_head vlan_list;
 	struct   vlan_group *vlgrp;
-	struct   packet_type arp_mon_pt;
 	struct   workqueue_struct *wq;
 	struct   delayed_work mii_work;
 	struct   delayed_work arp_work;
@@ -292,14 +291,12 @@ static inline void bond_set_slave_inactive_flags(struct slave *slave)
 	if (!bond_is_lb(bond))
 		slave->state = BOND_STATE_BACKUP;
 	slave->dev->priv_flags |= IFF_SLAVE_INACTIVE;
-	if (slave_do_arp_validate(bond, slave))
-		slave->dev->priv_flags |= IFF_SLAVE_NEEDARP;
 }
 
 static inline void bond_set_slave_active_flags(struct slave *slave)
 {
 	slave->state = BOND_STATE_ACTIVE;
-	slave->dev->priv_flags &= ~(IFF_SLAVE_INACTIVE | IFF_SLAVE_NEEDARP);
+	slave->dev->priv_flags &= ~IFF_SLAVE_INACTIVE;
 }
 
 static inline void bond_set_master_3ad_flags(struct bonding *bond)
diff --git a/include/linux/if.h b/include/linux/if.h
index 3a9f410..84ab2c8 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -63,7 +63,7 @@
 #define IFF_MASTER_8023AD	0x8	/* bonding master, 802.3ad. 	*/
 #define IFF_MASTER_ALB	0x10		/* bonding master, balance-alb.	*/
 #define IFF_BONDING	0x20		/* bonding master or slave	*/
-#define IFF_SLAVE_NEEDARP 0x40		/* need ARPs for validation	*/
+#define IFF_MASTER_NEEDARP 0x40		/* need ARPs for validation	*/
 #define IFF_ISATAP	0x80		/* ISATAP interface (RFC4214)	*/
 #define IFF_MASTER_ARPMON 0x100		/* bonding master, ARP mon in use */
 #define IFF_WAN_HDLC	0x200		/* WAN HDLC device		*/
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fa8b476..9f82fc6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2055,6 +2055,8 @@ static inline void skb_bond_set_mac_by_master(struct sk_buff *skb,
 	}
 }
 
+extern void (*bond_handle_arp_hook)(struct sk_buff *skb);
+
 /* On bonding slaves other than the currently active slave, suppress
  * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
  * ARP on active-backup slaves with arp_validate enabled.
@@ -2076,11 +2078,13 @@ static inline int skb_bond_should_drop(struct sk_buff *skb,
 			skb_bond_set_mac_by_master(skb, master);
 		}
 
-		if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
-			if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
-			    skb->protocol == __cpu_to_be16(ETH_P_ARP))
-				return 0;
+		/* pass ARP frames directly to bonding
+		   before bridging or other hooks change them */
+		if ((master->priv_flags & IFF_MASTER_NEEDARP) &&
+		    skb->protocol == __cpu_to_be16(ETH_P_ARP))
+			bond_handle_arp_hook(skb);
 
+		if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
 			if (master->priv_flags & IFF_MASTER_ALB) {
 				if (skb->pkt_type != PACKET_BROADCAST &&
 				    skb->pkt_type != PACKET_MULTICAST)
diff --git a/net/core/dev.c b/net/core/dev.c
index f769098..98d85a8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2314,6 +2314,9 @@ static inline int deliver_skb(struct sk_buff *skb,
 	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 }
 
+void (*bond_handle_arp_hook)(struct sk_buff *skb);
+EXPORT_SYMBOL_GPL(bond_handle_arp_hook);
+
 #if defined(CONFIG_BRIDGE) || defined (CONFIG_BRIDGE_MODULE)
 
 #if defined(CONFIG_ATM_LANE) || defined(CONFIG_ATM_LANE_MODULE)
@@ -2507,6 +2510,10 @@ int netif_receive_skb(struct sk_buff *skb)
 	if (!skb->skb_iif)
 		skb->skb_iif = skb->dev->ifindex;
 
+	skb_reset_network_header(skb);
+	skb_reset_transport_header(skb);
+	skb->mac_len = skb->network_header - skb->mac_header;
+
 	null_or_orig = NULL;
 	orig_dev = skb->dev;
 	master = ACCESS_ONCE(orig_dev->master);
@@ -2519,10 +2526,6 @@ int netif_receive_skb(struct sk_buff *skb)
 
 	__get_cpu_var(netdev_rx_stat).total++;
 
-	skb_reset_network_header(skb);
-	skb_reset_transport_header(skb);
-	skb->mac_len = skb->network_header - skb->mac_header;
-
 	pt_prev = NULL;
 
 	rcu_read_lock();


-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ


^ permalink raw reply related

* Re: [net-next-2.6 PATCH 1/2] Add netdev port-profile support (take III, was iovnl)
From: Arnd Bergmann @ 2010-04-28 13:13 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw
In-Reply-To: <20100428044235.8646.61943.stgit@savbu-pc100.cisco.com>

On Wednesday 28 April 2010, Scott Feldman wrote:
> From: Scott Feldman <scofeldm@cisco.com>
> 
> Add new netdev ops ndo_{set|get}_port_profile to allow setting of port-profile
> on a netdev interface.  Extends RTM_SETLINK/RTM_GETLINK with new sub cmd called
> IFLA_PORT_PROFILE (added to end of IFLA_cmd list).  The port-profile cmd
> arguments are (as seen from modified iproute2 cmdline):
>
>        ip link set DEVICE [ { up | down } ]
>                           [ arp { on | off } ]
>                           [ dynamic { on | off } ]
>                           [ multicast { on | off } ]
>                           ...
>                           [ vf NUM [ mac LLADDR ]
>                                    [ vlan VLANID [ qos VLAN-QOS ] ]
>                                    [ rate TXRATE ] ] 
>                           [ port_profile [ PORT-PROFILE
>                                    [ mac LLADDR ]
>                                    [ host_uuid HOST_UUID ]
>                                    [ client_uuid CLIENT_UUID ]
>                                    [ client_name CLIENT_NAME ] ] ]
>        ip link show [ DEVICE ]

We will need a few more options to cover draft VDP in addition to the protocol
your NIC is using. I still think it's possible to use the same interface
for both, but the differences are obviously showing.

The missing bits that I can see so far are:

- You only have 'get' and 'set'. We will also need a 'unset' or 'delete'
  option in order to get rid of a port profile association.
- VDP has three different ways to 'set' a port profile: 'associate',
  'pre-associate with resource reservation' and 'pre-associate without
  resource reservation'. This could become an extra option flag.
- Instead of a port profile name, VDP specifies a tuple like
   struct vsi_associate {
	unsigned char VSI_Mgr_ID;       /* VSI manager ID */
	unsigned char VSI_Type_ID[3];   /* 24 bit VSI Type ID */
	unsigned char VSI_Type_Version; /* VSI Type version */
   };
   I'm not sure how to deal with that best, but there needs to be
   some parsing of these numbers.
- VDP requires a vlan ID to be part of the association, in addition to
  the MAC address. In theory, we could have multiple tuples of MAC+VLAN
  addresses, but we can probably just associate each tuple separately
  and ignore that part of the standard.
- we have a set of possible error conditions that can be returned by
  the switch (invalid format, insufficient resources, unknown VTID,
  VTID violation, VTID verison violation, out of sync). It should be
  possible to return each of these to the user with 'get'.

> Since we're using netlink sockets, the receiver of the RTM_SETLINK msg can
> be in kernel- or user-space.  For kernel-space recipient, rtnetlink.c, the
> new ndo_set_port_profile netdev op is called to set the port-profile.
> User-space recipients can decide how they propagate the msg to the switch.
> There is also a RTM_GETLINK cmd to to return port-profile setting of an
> interface and to also return the status of the last port-profile.

More on a stylistic note, I'm not convinced that using RTM_SETLINK/GETLINK
is the right interface for this, unlike the 'ip link set DEV vf ...' stuff,
because it seems to suggest that this is an option of the adapter itself.
I actually liked the iovnl family better in this regard, because it kept
the protocols separate.

What I could imagine to unify this is something like

   ip port_profile set DEVICE [ { pre_associate | pre_associate_rr } ]
                              { name PORT-PROFILE | vsi MGR:VTID:VER }
                                    mac LLADDR
				    [ vlan VID ]
                                    [ host_uuid HOST_UUID ]
                                    [ client_uuid CLIENT_UUID ]
                                    [ client_name CLIENT_NAME ]
   ip port_profile del DEVICE [ mac LLADDR [ vlan VID ] ]
   ip port_profile show DEVICE

	Arnd

^ permalink raw reply

* Re: [PATCH net-next-2.6] bnx2x: Remove two prefetch()
From: Eilon Greenstein @ 2010-04-28 13:14 UTC (permalink / raw)
  To: David Miller
  Cc: vladz, eliezert, eric.dumazet@gmail.com, xiaosuo@gmail.com,
	hadi@cyberus.ca, therbert@google.com, shemminger@vyatta.com,
	netdev@vger.kernel.org
In-Reply-To: <20100427.151937.62344362.davem@davemloft.net>

On Tue, 2010-04-27 at 15:19 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 28 Apr 2010 00:18:13 +0200
> 
> > [PATCH net-next-2.6] bnx2x: Remove two prefetch()
> > 
> > 1) Even on 64bit arches, sizeof(struct sk_buff) < 256
> > 2) No need to prefetch same pointer twice.
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > CC: Eilon Greenstein <eilong@broadcom.com>
> 
> Eilon please review and ACK/NACK

Vlad ran few benchmarks, and we couldn't find any justification for
those prefetch calls. After consulting with Eliezer Tamir (the original
author) we are glad to Ack this patch.

Thanks Eric!
Acked-by: <eilong@broadcom.com>




^ permalink raw reply

* Re: [net-next-2.6 PATCH 2/2] add ndo_set_port_profile op support for enic dynamic vnics
From: Arnd Bergmann @ 2010-04-28 13:32 UTC (permalink / raw)
  To: Scott Feldman; +Cc: davem, netdev, chrisw
In-Reply-To: <20100428044240.8646.27412.stgit@savbu-pc100.cisco.com>

On Wednesday 28 April 2010, Scott Feldman wrote:
> +static int enic_set_port_profile(struct net_device *netdev,
> +       struct ifla_port_profile *ipp)
> +{
> +       struct enic *enic = netdev_priv(netdev);
> +       struct vic_provinfo *vp;
> +       u8 oui[3] = VIC_PROVINFO_CISCO_OUI;
> +       u8 *mac = ipp->mac;
> +       int err;
> +
> +       memset(&enic->port_profile, 0, sizeof(enic->port_profile));
> +
> +       if (!enic_is_dynamic(enic))
> +               return -EOPNOTSUPP;

Not sure I understand how this fits together. You said in an earlier mail:

> > Anything that ties port profiles to VFs seems fundamentally flawed AFAICT,
> > at least when we want to extend this to adapters that don't do it in firmware.
>
> Ya, I tend I agree.  Let's just make port-profile a setting of any netdev,
> an eth, macvtap, eth.x, bond, etc.  That's probably what I should have done
> in the first place.  Something like:

I thought you had meant that we can do the association of attached interfaces
through any interface, rather than tying it to the slave interface. While I'm
not sure I read your code correctly, it seems like you now only talk to the
slave interface, not to the master at all!

At least the check above should be 'if (enic_is_dynamic(enic)) return -EOPNOTSUPP',
not the other way round.
Moreover, if the netdev is the master here, you only allow a single slave, which
is not enough for larger setups (n > 1), though that could be a limitation of your
first version.

Passing just the slave device however would not work in the general case, as I
tried to point out in the mail you replied to. If the slave interface is owned
by a guest using PCI passthrough, or it sits below a stack of nested interfaces
(vlan, bridge, tap, vhost, ...), it's impossible to know what interface is
responsible for setting up the slave. Note that you cannot perform the association
through the slave interface itself because the remote switch would discard any
traffic originating from an unassociated interface.

> +static int enic_get_port_profile(struct net_device *netdev,
> +       struct ifla_port_profile *ipp)
> +{
> +       struct enic *enic = netdev_priv(netdev);
> +       int done, err, error;
> +
> +       enic->port_profile.status = IFLA_PORT_PROFILE_STATUS_UNKNOWN;
> +
> +       spin_lock(&enic->devcmd_lock);
> +       err = vnic_dev_init_done(enic->vdev, &done, &error);
> +       spin_unlock(&enic->devcmd_lock);
> +
> +       if (err || error)
> +               enic->port_profile.status = IFLA_PORT_PROFILE_STATUS_ERROR;
> +
> +       if (!done)
> +               enic->port_profile.status = IFLA_PORT_PROFILE_STATUS_INPROGRESS;
> +
> +       if (!error)
> +               enic->port_profile.status = IFLA_PORT_PROFILE_STATUS_SUCCESS;
> +
> +       memcpy(ipp, &enic->port_profile, sizeof(enic->port_profile));
> +
> +       return 0;
> +}
> +

Similarly, this interface only passes back a single port profile association,
where it should have a way to return all of the slave ports.

	Arnd

^ permalink raw reply

* Re: [PATCH 2/4] [RFC] Add sock_create_kern_net()
From: Dan Smith @ 2010-04-28 13:38 UTC (permalink / raw)
  To: hadi; +Cc: containers, netdev, Daniel Lezcano, Eric W. Biederman
In-Reply-To: <1272455094.14068.15.camel@bigi>

j> So ... how does user space know what "other_netns" is?

That's the point, userspace doesn't know about and can't use this
interface.  This is a way for the kernel to open a socket in another
netns to talk to that netns' RTNETLINK.  I realize in its current form
it could be used for something more nefarious, but it would be kernel
code doing it.

j> Also note Eric's recent patches introduced another way of opening a
j> socket in a different namespace - are you using those in the
j> abstraction to find what netns is?

No.  The process doing the checkpoint already has pointers to the
tasks and thus their netns pointers.  Eric's interface is an
abstraction to allow userspace to do something similar, I think that
using it from the kernel would be rather messy.

-- 
Dan Smith
IBM Linux Technology Center
email: danms@us.ibm.com

^ permalink raw reply

* [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173)
From: Neil Horman @ 2010-04-28 13:47 UTC (permalink / raw)
  To: vladislav.yasevich, sri, linux-sctp
  Cc: eteo, netdev, davem, security, nhorman

Hey-
	Recently, it was reported to me that the kernel could oops in the
following way:

<5> kernel BUG at net/core/skbuff.c:91!
<5> invalid operand: 0000 [#1]
<5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
mptbase sd_mod scsi_mod
<5> CPU:    0
<5> EIP:    0060:[<c02bff27>]    Not tainted VLI
<5> EFLAGS: 00010216   (2.6.9-89.0.25.EL) 
<5> EIP is at skb_over_panic+0x1f/0x2d
<5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
<5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
<5> ds: 007b   es: 007b   ss: 0068
<5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
<5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
e0c2947d 
<5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
df653490 
<5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
00000004 
<5> Call Trace:
<5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
<5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
<5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
<5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
<5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
<5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
<5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
<5>  [<c01555a4>] cache_grow+0x140/0x233
<5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
<5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
<5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
<5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
<5>  [<c02d005e>] nf_iterate+0x40/0x81
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
<5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
<5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e103e>] ip_rcv+0x334/0x3b4
<5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
<5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
<5>  [<c02c67a4>] process_backlog+0x6c/0xd9
<5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
<5>  [<c012a7b1>] __do_softirq+0x35/0x79
<5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
<5>  [<c01094de>] do_softirq+0x46/0x4d

Its an skb_over_panic BUG halt that results from processing an init chunk in
which too many of its variable length parameters are in some way malformed.

The problem is in sctp_process_unk_param:
if (NULL == *errp)
	*errp = sctp_make_op_error_space(asoc, chunk,
					 ntohs(chunk->chunk_hdr->length));

	if (*errp) {
		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
				 WORD_ROUND(ntohs(param.p->length)));
		sctp_addto_chunk(*errp,
			WORD_ROUND(ntohs(param.p->length)),
				  param.v);

When we allocate an error chunk, we assume that the worst case scenario requires
that we have chunk_hdr->length data allocated, which would be correct nominally,
given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
chunk, so the worst case situation in which all parameters are in violation
requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.

The result of this error is that a deliberately malformed packet sent to a
listening host can cause a remote DOS, described in CVE-2010-1173:
http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173

I've tested the below fix and confirmed that it fixes the issue.  It
pre-allocates the error chunk in sctp_verify_init, where we are able to count
the total number of variable length parameters, so we know how many error
headers we might need.  Then we simply use that chunk, if we find an error, or
discard/free it if all the parameters are valid.  Applies on top of the
lksctp-dev tree

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>


 sm_make_chunk.c |   24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)


diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index f592163..990457b 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2134,6 +2134,8 @@ int sctp_verify_init(const struct sctp_association *asoc,
 	union sctp_params param;
 	int has_cookie = 0;
 	int result;
+	unsigned int param_cnt;
+	unsigned int len;
 
 	/* Verify stream values are non-zero. */
 	if ((0 == peer_init->init_hdr.num_outbound_streams) ||
@@ -2149,6 +2151,7 @@ int sctp_verify_init(const struct sctp_association *asoc,
 
 		if (SCTP_PARAM_STATE_COOKIE == param.p->type)
 			has_cookie = 1;
+		param_cnt++;
 
 	} /* for (loop through all parameters) */
 
@@ -2169,6 +2172,20 @@ int sctp_verify_init(const struct sctp_association *asoc,
 		return sctp_process_missing_param(asoc, SCTP_PARAM_STATE_COOKIE,
 						  chunk, errp);
 
+	if (!*errp) {
+		/*
+		 * Pre-allocate the error packet here
+		 * we do this as we need to reserve space
+		 * for the worst case scenario in which 
+		 * every parameter is in error and needs 
+		 * an errhdr attached to it
+		 */
+		len = ntohs(chunk->chunk_hdr->length);
+		len += sizeof(sctp_errhdr_t)*param_cnt;
+
+		*errp = sctp_make_op_error_space(asoc, chunk, len);
+	}
+
 	/* Verify all the variable length parameters */
 	sctp_walk_params(param, peer_init, init_hdr.params) {
 
@@ -2176,9 +2193,11 @@ int sctp_verify_init(const struct sctp_association *asoc,
 		switch (result) {
 		    case SCTP_IERROR_ABORT:
 		    case SCTP_IERROR_NOMEM:
-				return 0;
 		    case SCTP_IERROR_ERROR:
-				return 1;
+				len = ntohs((*errp)->chunk_hdr->length);
+				if ((*errp) && (len == sizeof(sctp_chunkhdr_t)))
+					sctp_chunk_free(*errp);
+				return (result == SCTP_IERROR_ERROR) ? 1 : 0;
 		    case SCTP_IERROR_NO_ERROR:
 		    default:
 				break;
@@ -2186,6 +2205,7 @@ int sctp_verify_init(const struct sctp_association *asoc,
 
 	} /* for (loop through all parameters) */
 
+	sctp_chunk_free(*errp);
 	return 1;
 }
 

^ permalink raw reply related

* Re: [PATCH 0/3] [RFC] ptp: IEEE 1588 clock support
From: Wolfgang Grandegger @ 2010-04-28 13:50 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev
In-Reply-To: <20100428054706.GA4516@riccoc20.at.omicron.at>

Richard Cochran wrote:
> On Tue, Apr 27, 2010 at 06:20:25PM +0200, Wolfgang Grandegger wrote:
>> Do you have also a patch adding support for hardware timestamping to ptpd?
> 
> Yes, I do:
> 
>    https://sourceforge.net/tracker/index.php?func=detail&aid=2992847&group_id=139814&atid=744634

Thanks.

> I should have mentioned, you also need the gianfar HW time stamping
> patches, recently posted to netdev by Manfred Rudigier.

I'm aware of these patches. I'm actually using the net-next-2.6 git tree.

I got ptpd working but I do not yet see the PPS-Signals on my scope. At
a first glance, the PPS-Signal seems to be configured by the gianfar_ptp
driver (setting the fiper1 and timer1 registers) but I might have missed
something.

Wolfgang.

^ permalink raw reply

* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173)
From: Vlad Yasevich @ 2010-04-28 14:00 UTC (permalink / raw)
  To: Neil Horman; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <20100428134748.GA4818@hmsreliant.think-freely.org>

I have this patch and a few others already queued.

I was planning on sending these today for stable.

Here is the full list of stable patches I have:

sctp: Fix oops when sending queued ASCONF chunks
sctp: fix to calc the INIT/INIT-ACK chunk length correctly is set
sctp: per_cpu variables should be in bh_disabled section
sctp: fix potential reference of a freed pointer
sctp: avoid irq lock inversion while call sk->sk_data_ready()

-vlad

Neil Horman wrote:
> Hey-
> 	Recently, it was reported to me that the kernel could oops in the
> following way:
> 
> <5> kernel BUG at net/core/skbuff.c:91!
> <5> invalid operand: 0000 [#1]
> <5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
> ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
> vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
> ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
> snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
> pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
> mptbase sd_mod scsi_mod
> <5> CPU:    0
> <5> EIP:    0060:[<c02bff27>]    Not tainted VLI
> <5> EFLAGS: 00010216   (2.6.9-89.0.25.EL) 
> <5> EIP is at skb_over_panic+0x1f/0x2d
> <5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
> <5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
> <5> ds: 007b   es: 007b   ss: 0068
> <5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
> <5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
> e0c2947d 
> <5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
> df653490 
> <5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
> 00000004 
> <5> Call Trace:
> <5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
> <5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
> <5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
> <5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
> <5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
> <5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
> <5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
> <5>  [<c01555a4>] cache_grow+0x140/0x233
> <5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
> <5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
> <5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
> <5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
> <5>  [<c02d005e>] nf_iterate+0x40/0x81
> <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
> <5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
> <5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
> <5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
> <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
> <5>  [<c02e103e>] ip_rcv+0x334/0x3b4
> <5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
> <5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
> <5>  [<c02c67a4>] process_backlog+0x6c/0xd9
> <5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
> <5>  [<c012a7b1>] __do_softirq+0x35/0x79
> <5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
> <5>  [<c01094de>] do_softirq+0x46/0x4d
> 
> Its an skb_over_panic BUG halt that results from processing an init chunk in
> which too many of its variable length parameters are in some way malformed.
> 
> The problem is in sctp_process_unk_param:
> if (NULL == *errp)
> 	*errp = sctp_make_op_error_space(asoc, chunk,
> 					 ntohs(chunk->chunk_hdr->length));
> 
> 	if (*errp) {
> 		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> 				 WORD_ROUND(ntohs(param.p->length)));
> 		sctp_addto_chunk(*errp,
> 			WORD_ROUND(ntohs(param.p->length)),
> 				  param.v);
> 
> When we allocate an error chunk, we assume that the worst case scenario requires
> that we have chunk_hdr->length data allocated, which would be correct nominally,
> given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
> we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
> chunk, so the worst case situation in which all parameters are in violation
> requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.
> 
> The result of this error is that a deliberately malformed packet sent to a
> listening host can cause a remote DOS, described in CVE-2010-1173:
> http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173
> 
> I've tested the below fix and confirmed that it fixes the issue.  It
> pre-allocates the error chunk in sctp_verify_init, where we are able to count
> the total number of variable length parameters, so we know how many error
> headers we might need.  Then we simply use that chunk, if we find an error, or
> discard/free it if all the parameters are valid.  Applies on top of the
> lksctp-dev tree
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> 
> 
>  sm_make_chunk.c |   24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index f592163..990457b 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -2134,6 +2134,8 @@ int sctp_verify_init(const struct sctp_association *asoc,
>  	union sctp_params param;
>  	int has_cookie = 0;
>  	int result;
> +	unsigned int param_cnt;
> +	unsigned int len;
>  
>  	/* Verify stream values are non-zero. */
>  	if ((0 == peer_init->init_hdr.num_outbound_streams) ||
> @@ -2149,6 +2151,7 @@ int sctp_verify_init(const struct sctp_association *asoc,
>  
>  		if (SCTP_PARAM_STATE_COOKIE == param.p->type)
>  			has_cookie = 1;
> +		param_cnt++;
>  
>  	} /* for (loop through all parameters) */
>  
> @@ -2169,6 +2172,20 @@ int sctp_verify_init(const struct sctp_association *asoc,
>  		return sctp_process_missing_param(asoc, SCTP_PARAM_STATE_COOKIE,
>  						  chunk, errp);
>  
> +	if (!*errp) {
> +		/*
> +		 * Pre-allocate the error packet here
> +		 * we do this as we need to reserve space
> +		 * for the worst case scenario in which 
> +		 * every parameter is in error and needs 
> +		 * an errhdr attached to it
> +		 */
> +		len = ntohs(chunk->chunk_hdr->length);
> +		len += sizeof(sctp_errhdr_t)*param_cnt;
> +
> +		*errp = sctp_make_op_error_space(asoc, chunk, len);
> +	}
> +
>  	/* Verify all the variable length parameters */
>  	sctp_walk_params(param, peer_init, init_hdr.params) {
>  
> @@ -2176,9 +2193,11 @@ int sctp_verify_init(const struct sctp_association *asoc,
>  		switch (result) {
>  		    case SCTP_IERROR_ABORT:
>  		    case SCTP_IERROR_NOMEM:
> -				return 0;
>  		    case SCTP_IERROR_ERROR:
> -				return 1;
> +				len = ntohs((*errp)->chunk_hdr->length);
> +				if ((*errp) && (len == sizeof(sctp_chunkhdr_t)))
> +					sctp_chunk_free(*errp);
> +				return (result == SCTP_IERROR_ERROR) ? 1 : 0;
>  		    case SCTP_IERROR_NO_ERROR:
>  		    default:
>  				break;
> @@ -2186,6 +2205,7 @@ int sctp_verify_init(const struct sctp_association *asoc,
>  
>  	} /* for (loop through all parameters) */
>  
> +	sctp_chunk_free(*errp);
>  	return 1;
>  }
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* [PATCH net-next-2.6] net: speedup udp receive path
From: Eric Dumazet @ 2010-04-28 14:06 UTC (permalink / raw)
  To: hadi
  Cc: David Miller, xiaosuo, therbert, shemminger, netdev,
	Eilon Greenstein, Brian Bloniarz
In-Reply-To: <1272458174.14068.16.camel@bigi>

Le mercredi 28 avril 2010 à 08:36 -0400, jamal a écrit :
> On Wed, 2010-04-28 at 14:33 +0200, Eric Dumazet wrote:
> 
> > If you wait a bit, I have another patch to speedup udp receive path ;)
> 
> Shoot whenever you are ready ;-> I will test with and without your
> patch..
> 

Here it is ;)

Thanks

[PATCH net-next-2.6] net: speedup udp receive path

Since commit 95766fff ([UDP]: Add memory accounting.), 
each received packet needs one extra sock_lock()/sock_release() pair.

This added latency because of possible backlog handling. Then later,
ticket spinlocks added yet another latency source in case of DDOS.

This patch introduces lock_sock_bh() and unlock_sock_bh()
synchronization primitives, avoiding one atomic operation and backlog
processing.

skb_free_datagram_locked() uses them instead of full blown
lock_sock()/release_sock(). skb is orphaned inside locked section for
proper socket memory reclaim, and finally freed outside of it.

UDP receive path now take the socket spinlock only once.

Signed-off-by: Eric DUmazet <eric.dumazet@gmail.com>
---
 include/net/sock.h  |   10 ++++++++++
 net/core/datagram.c |   10 +++++++---
 net/ipv4/udp.c      |   12 ++++++------
 net/ipv6/udp.c      |    4 ++--
 4 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index cf12b1e..d361c77 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1021,6 +1021,16 @@ extern void release_sock(struct sock *sk);
 				SINGLE_DEPTH_NESTING)
 #define bh_unlock_sock(__sk)	spin_unlock(&((__sk)->sk_lock.slock))
 
+static inline void lock_sock_bh(struct sock *sk)
+{
+	spin_lock_bh(&sk->sk_lock.slock);
+}
+
+static inline void unlock_sock_bh(struct sock *sk)
+{
+	spin_unlock_bh(&sk->sk_lock.slock);
+}
+
 extern struct sock		*sk_alloc(struct net *net, int family,
 					  gfp_t priority,
 					  struct proto *prot);
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 5574a5d..95b851f 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -229,9 +229,13 @@ EXPORT_SYMBOL(skb_free_datagram);
 
 void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 {
-	lock_sock(sk);
-	skb_free_datagram(sk, skb);
-	release_sock(sk);
+	lock_sock_bh(sk);
+	skb_orphan(skb);
+	sk_mem_reclaim_partial(sk);
+	unlock_sock_bh(sk);
+
+	/* skb is now orphaned, might be freed outside of locked section */
+	consume_skb(skb);
 }
 EXPORT_SYMBOL(skb_free_datagram_locked);
 
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 63eb56b..1f86965 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1062,10 +1062,10 @@ static unsigned int first_packet_length(struct sock *sk)
 	spin_unlock_bh(&rcvq->lock);
 
 	if (!skb_queue_empty(&list_kill)) {
-		lock_sock(sk);
+		lock_sock_bh(sk);
 		__skb_queue_purge(&list_kill);
 		sk_mem_reclaim_partial(sk);
-		release_sock(sk);
+		unlock_sock_bh(sk);
 	}
 	return res;
 }
@@ -1196,10 +1196,10 @@ out:
 	return err;
 
 csum_copy_err:
-	lock_sock(sk);
+	lock_sock_bh(sk);
 	if (!skb_kill_datagram(sk, skb, flags))
 		UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
-	release_sock(sk);
+	unlock_sock_bh(sk);
 
 	if (noblock)
 		return -EAGAIN;
@@ -1624,9 +1624,9 @@ int udp_rcv(struct sk_buff *skb)
 
 void udp_destroy_sock(struct sock *sk)
 {
-	lock_sock(sk);
+	lock_sock_bh(sk);
 	udp_flush_pending_frames(sk);
-	release_sock(sk);
+	unlock_sock_bh(sk);
 }
 
 /*
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 3ead20a..91c60f0 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -424,7 +424,7 @@ out:
 	return err;
 
 csum_copy_err:
-	lock_sock(sk);
+	lock_sock_bh(sk);
 	if (!skb_kill_datagram(sk, skb, flags)) {
 		if (is_udp4)
 			UDP_INC_STATS_USER(sock_net(sk),
@@ -433,7 +433,7 @@ csum_copy_err:
 			UDP6_INC_STATS_USER(sock_net(sk),
 					UDP_MIB_INERRORS, is_udplite);
 	}
-	release_sock(sk);
+	unlock_sock_bh(sk);
 
 	if (flags & MSG_DONTWAIT)
 		return -EAGAIN;



^ permalink raw reply related

* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173)
From: Vlad Yasevich @ 2010-04-28 14:17 UTC (permalink / raw)
  To: Neil Horman; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <4BD83F85.8090308@hp.com>



Vlad Yasevich wrote:
> I have this patch and a few others already queued.

Scratch that.  I totally misread the description and the patch.

-vlad
> 
> I was planning on sending these today for stable.
> 
> Here is the full list of stable patches I have:
> 
> sctp: Fix oops when sending queued ASCONF chunks
> sctp: fix to calc the INIT/INIT-ACK chunk length correctly is set
> sctp: per_cpu variables should be in bh_disabled section
> sctp: fix potential reference of a freed pointer
> sctp: avoid irq lock inversion while call sk->sk_data_ready()
> 
> -vlad
> 
> Neil Horman wrote:
>> Hey-
>> 	Recently, it was reported to me that the kernel could oops in the
>> following way:
>>
>> <5> kernel BUG at net/core/skbuff.c:91!
>> <5> invalid operand: 0000 [#1]
>> <5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
>> ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
>> vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
>> ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
>> snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
>> pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
>> mptbase sd_mod scsi_mod
>> <5> CPU:    0
>> <5> EIP:    0060:[<c02bff27>]    Not tainted VLI
>> <5> EFLAGS: 00010216   (2.6.9-89.0.25.EL) 
>> <5> EIP is at skb_over_panic+0x1f/0x2d
>> <5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
>> <5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
>> <5> ds: 007b   es: 007b   ss: 0068
>> <5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
>> <5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
>> e0c2947d 
>> <5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
>> df653490 
>> <5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
>> 00000004 
>> <5> Call Trace:
>> <5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
>> <5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
>> <5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
>> <5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
>> <5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
>> <5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
>> <5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
>> <5>  [<c01555a4>] cache_grow+0x140/0x233
>> <5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
>> <5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
>> <5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
>> <5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
>> <5>  [<c02d005e>] nf_iterate+0x40/0x81
>> <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
>> <5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
>> <5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
>> <5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
>> <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
>> <5>  [<c02e103e>] ip_rcv+0x334/0x3b4
>> <5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
>> <5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
>> <5>  [<c02c67a4>] process_backlog+0x6c/0xd9
>> <5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
>> <5>  [<c012a7b1>] __do_softirq+0x35/0x79
>> <5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
>> <5>  [<c01094de>] do_softirq+0x46/0x4d
>>
>> Its an skb_over_panic BUG halt that results from processing an init chunk in
>> which too many of its variable length parameters are in some way malformed.
>>
>> The problem is in sctp_process_unk_param:
>> if (NULL == *errp)
>> 	*errp = sctp_make_op_error_space(asoc, chunk,
>> 					 ntohs(chunk->chunk_hdr->length));
>>
>> 	if (*errp) {
>> 		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
>> 				 WORD_ROUND(ntohs(param.p->length)));
>> 		sctp_addto_chunk(*errp,
>> 			WORD_ROUND(ntohs(param.p->length)),
>> 				  param.v);
>>
>> When we allocate an error chunk, we assume that the worst case scenario requires
>> that we have chunk_hdr->length data allocated, which would be correct nominally,
>> given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
>> we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
>> chunk, so the worst case situation in which all parameters are in violation
>> requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.
>>
>> The result of this error is that a deliberately malformed packet sent to a
>> listening host can cause a remote DOS, described in CVE-2010-1173:
>> http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173
>>
>> I've tested the below fix and confirmed that it fixes the issue.  It
>> pre-allocates the error chunk in sctp_verify_init, where we are able to count
>> the total number of variable length parameters, so we know how many error
>> headers we might need.  Then we simply use that chunk, if we find an error, or
>> discard/free it if all the parameters are valid.  Applies on top of the
>> lksctp-dev tree
>>
>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>>
>>
>>  sm_make_chunk.c |   24 ++++++++++++++++++++++--
>>  1 file changed, 22 insertions(+), 2 deletions(-)
>>
>>
>> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
>> index f592163..990457b 100644
>> --- a/net/sctp/sm_make_chunk.c
>> +++ b/net/sctp/sm_make_chunk.c
>> @@ -2134,6 +2134,8 @@ int sctp_verify_init(const struct sctp_association *asoc,
>>  	union sctp_params param;
>>  	int has_cookie = 0;
>>  	int result;
>> +	unsigned int param_cnt;
>> +	unsigned int len;
>>  
>>  	/* Verify stream values are non-zero. */
>>  	if ((0 == peer_init->init_hdr.num_outbound_streams) ||
>> @@ -2149,6 +2151,7 @@ int sctp_verify_init(const struct sctp_association *asoc,
>>  
>>  		if (SCTP_PARAM_STATE_COOKIE == param.p->type)
>>  			has_cookie = 1;
>> +		param_cnt++;
>>  
>>  	} /* for (loop through all parameters) */
>>  
>> @@ -2169,6 +2172,20 @@ int sctp_verify_init(const struct sctp_association *asoc,
>>  		return sctp_process_missing_param(asoc, SCTP_PARAM_STATE_COOKIE,
>>  						  chunk, errp);
>>  
>> +	if (!*errp) {
>> +		/*
>> +		 * Pre-allocate the error packet here
>> +		 * we do this as we need to reserve space
>> +		 * for the worst case scenario in which 
>> +		 * every parameter is in error and needs 
>> +		 * an errhdr attached to it
>> +		 */
>> +		len = ntohs(chunk->chunk_hdr->length);
>> +		len += sizeof(sctp_errhdr_t)*param_cnt;
>> +
>> +		*errp = sctp_make_op_error_space(asoc, chunk, len);
>> +	}
>> +
>>  	/* Verify all the variable length parameters */
>>  	sctp_walk_params(param, peer_init, init_hdr.params) {
>>  
>> @@ -2176,9 +2193,11 @@ int sctp_verify_init(const struct sctp_association *asoc,
>>  		switch (result) {
>>  		    case SCTP_IERROR_ABORT:
>>  		    case SCTP_IERROR_NOMEM:
>> -				return 0;
>>  		    case SCTP_IERROR_ERROR:
>> -				return 1;
>> +				len = ntohs((*errp)->chunk_hdr->length);
>> +				if ((*errp) && (len == sizeof(sctp_chunkhdr_t)))
>> +					sctp_chunk_free(*errp);
>> +				return (result == SCTP_IERROR_ERROR) ? 1 : 0;
>>  		    case SCTP_IERROR_NO_ERROR:
>>  		    default:
>>  				break;
>> @@ -2186,6 +2205,7 @@ int sctp_verify_init(const struct sctp_association *asoc,
>>  
>>  	} /* for (loop through all parameters) */
>>  
>> +	sctp_chunk_free(*errp);
>>  	return 1;
>>  }
>>  
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: speedup udp receive path
From: Eric Dumazet @ 2010-04-28 14:19 UTC (permalink / raw)
  To: hadi
  Cc: David Miller, xiaosuo, therbert, shemminger, netdev,
	Eilon Greenstein, Brian Bloniarz
In-Reply-To: <1272463605.2267.70.camel@edumazet-laptop>

Le mercredi 28 avril 2010 à 16:06 +0200, Eric Dumazet a écrit :
> Le mercredi 28 avril 2010 à 08:36 -0400, jamal a écrit :
> > On Wed, 2010-04-28 at 14:33 +0200, Eric Dumazet wrote:
> > 
> > > If you wait a bit, I have another patch to speedup udp receive path ;)
> > 
> > Shoot whenever you are ready ;-> I will test with and without your
> > patch..
> > 
> 
> Here it is ;)
> 
> Thanks

I forgot to say that with my previous DDOS test/bench (16 cpus trying to
feed one udp socket), my receiver can now process 420.000 pps instead of
200.000 ;)




^ permalink raw reply

* Re: [PATCH]: sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173)
From: Neil Horman @ 2010-04-28 14:21 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: sri, linux-sctp, eteo, netdev, davem, security
In-Reply-To: <4BD83F85.8090308@hp.com>

On Wed, Apr 28, 2010 at 10:00:37AM -0400, Vlad Yasevich wrote:
> I have this patch and a few others already queued.
> 
> I was planning on sending these today for stable.
> 
> Here is the full list of stable patches I have:
> 
> sctp: Fix oops when sending queued ASCONF chunks
> sctp: fix to calc the INIT/INIT-ACK chunk length correctly is set
> sctp: per_cpu variables should be in bh_disabled section
> sctp: fix potential reference of a freed pointer
> sctp: avoid irq lock inversion while call sk->sk_data_ready()
> 
> -vlad
> 
Are you sure?  this oops looks _very_ simmilar to the INIT/INIT-ACK length
calculation oops described above, but is in fact different, and requires this
patch, from what I can see.  The right fix might be in the ASCONF chunk patch
you list above, but I don't see that in your tree at the moment, so I can't be
sure.

Neil

> Neil Horman wrote:
> > Hey-
> > 	Recently, it was reported to me that the kernel could oops in the
> > following way:
> > 
> > <5> kernel BUG at net/core/skbuff.c:91!
> > <5> invalid operand: 0000 [#1]
> > <5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
> > ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
> > vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
> > ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
> > snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
> > pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
> > mptbase sd_mod scsi_mod
> > <5> CPU:    0
> > <5> EIP:    0060:[<c02bff27>]    Not tainted VLI
> > <5> EFLAGS: 00010216   (2.6.9-89.0.25.EL) 
> > <5> EIP is at skb_over_panic+0x1f/0x2d
> > <5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
> > <5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
> > <5> ds: 007b   es: 007b   ss: 0068
> > <5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
> > <5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
> > e0c2947d 
> > <5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
> > df653490 
> > <5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
> > 00000004 
> > <5> Call Trace:
> > <5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
> > <5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
> > <5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
> > <5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
> > <5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
> > <5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
> > <5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
> > <5>  [<c01555a4>] cache_grow+0x140/0x233
> > <5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
> > <5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
> > <5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
> > <5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
> > <5>  [<c02d005e>] nf_iterate+0x40/0x81
> > <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
> > <5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
> > <5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
> > <5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
> > <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
> > <5>  [<c02e103e>] ip_rcv+0x334/0x3b4
> > <5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
> > <5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
> > <5>  [<c02c67a4>] process_backlog+0x6c/0xd9
> > <5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
> > <5>  [<c012a7b1>] __do_softirq+0x35/0x79
> > <5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
> > <5>  [<c01094de>] do_softirq+0x46/0x4d
> > 
> > Its an skb_over_panic BUG halt that results from processing an init chunk in
> > which too many of its variable length parameters are in some way malformed.
> > 
> > The problem is in sctp_process_unk_param:
> > if (NULL == *errp)
> > 	*errp = sctp_make_op_error_space(asoc, chunk,
> > 					 ntohs(chunk->chunk_hdr->length));
> > 
> > 	if (*errp) {
> > 		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
> > 				 WORD_ROUND(ntohs(param.p->length)));
> > 		sctp_addto_chunk(*errp,
> > 			WORD_ROUND(ntohs(param.p->length)),
> > 				  param.v);
> > 
> > When we allocate an error chunk, we assume that the worst case scenario requires
> > that we have chunk_hdr->length data allocated, which would be correct nominally,
> > given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
> > we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
> > chunk, so the worst case situation in which all parameters are in violation
> > requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.
> > 
> > The result of this error is that a deliberately malformed packet sent to a
> > listening host can cause a remote DOS, described in CVE-2010-1173:
> > http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173
> > 
> > I've tested the below fix and confirmed that it fixes the issue.  It
> > pre-allocates the error chunk in sctp_verify_init, where we are able to count
> > the total number of variable length parameters, so we know how many error
> > headers we might need.  Then we simply use that chunk, if we find an error, or
> > discard/free it if all the parameters are valid.  Applies on top of the
> > lksctp-dev tree
> > 
> > Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> > 
> > 
> >  sm_make_chunk.c |   24 ++++++++++++++++++++++--
> >  1 file changed, 22 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> > index f592163..990457b 100644
> > --- a/net/sctp/sm_make_chunk.c
> > +++ b/net/sctp/sm_make_chunk.c
> > @@ -2134,6 +2134,8 @@ int sctp_verify_init(const struct sctp_association *asoc,
> >  	union sctp_params param;
> >  	int has_cookie = 0;
> >  	int result;
> > +	unsigned int param_cnt;
> > +	unsigned int len;
> >  
> >  	/* Verify stream values are non-zero. */
> >  	if ((0 == peer_init->init_hdr.num_outbound_streams) ||
> > @@ -2149,6 +2151,7 @@ int sctp_verify_init(const struct sctp_association *asoc,
> >  
> >  		if (SCTP_PARAM_STATE_COOKIE == param.p->type)
> >  			has_cookie = 1;
> > +		param_cnt++;
> >  
> >  	} /* for (loop through all parameters) */
> >  
> > @@ -2169,6 +2172,20 @@ int sctp_verify_init(const struct sctp_association *asoc,
> >  		return sctp_process_missing_param(asoc, SCTP_PARAM_STATE_COOKIE,
> >  						  chunk, errp);
> >  
> > +	if (!*errp) {
> > +		/*
> > +		 * Pre-allocate the error packet here
> > +		 * we do this as we need to reserve space
> > +		 * for the worst case scenario in which 
> > +		 * every parameter is in error and needs 
> > +		 * an errhdr attached to it
> > +		 */
> > +		len = ntohs(chunk->chunk_hdr->length);
> > +		len += sizeof(sctp_errhdr_t)*param_cnt;
> > +
> > +		*errp = sctp_make_op_error_space(asoc, chunk, len);
> > +	}
> > +
> >  	/* Verify all the variable length parameters */
> >  	sctp_walk_params(param, peer_init, init_hdr.params) {
> >  
> > @@ -2176,9 +2193,11 @@ int sctp_verify_init(const struct sctp_association *asoc,
> >  		switch (result) {
> >  		    case SCTP_IERROR_ABORT:
> >  		    case SCTP_IERROR_NOMEM:
> > -				return 0;
> >  		    case SCTP_IERROR_ERROR:
> > -				return 1;
> > +				len = ntohs((*errp)->chunk_hdr->length);
> > +				if ((*errp) && (len == sizeof(sctp_chunkhdr_t)))
> > +					sctp_chunk_free(*errp);
> > +				return (result == SCTP_IERROR_ERROR) ? 1 : 0;
> >  		    case SCTP_IERROR_NO_ERROR:
> >  		    default:
> >  				break;
> > @@ -2186,6 +2205,7 @@ int sctp_verify_init(const struct sctp_association *asoc,
> >  
> >  	} /* for (loop through all parameters) */
> >  
> > +	sctp_chunk_free(*errp);
> >  	return 1;
> >  }
> >  
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

^ permalink raw reply

* Re: Checkpoint and Restart of INET routing information
From: Daniel Lezcano @ 2010-04-28 14:24 UTC (permalink / raw)
  To: Dan Smith; +Cc: containers, netdev
In-Reply-To: <1272034539-19899-1-git-send-email-danms@us.ibm.com>

Dan Smith wrote:
> This set extends the existing network socket, device, and namespace support
> in the checkpoint tree to cover routing information.  It does so by making
> heavy use of RTNETLINK to dump and insert routes much like userspace would.
> Because the task doing the checkpointing or restarting needs to examine
> or setup resources for tasks in network namespaces other than its own, an
> additional kernel socket setup call is added.  It provides us the ability
> to talk to RTNETLINK in a foreign netns.
>
> The support added in this set allows me to configure various inet4 and inet6
> routes in a container and have them saved and restored successfully during
> a checkpoint/restart process.
>   

Why do you need to do that from the kernel ? Same remark for ipv4/6 
addresses.
What prevents you to do 'ip route show'  and use these informations to 
restore the routes later ?
Will we end up by moving all the network userspace tools in the kernel ? :)

If you use the Eric's setns patchset, you will be able to do that easily 
from userspace, no ?



^ permalink raw reply

* Re: [PATCH 0/3] [RFC] ptp: IEEE 1588 clock support
From: Wolfgang Grandegger @ 2010-04-28 14:31 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev
In-Reply-To: <4BD83D37.4060301@grandegger.com>

Wolfgang Grandegger wrote:
> Richard Cochran wrote:
>> On Tue, Apr 27, 2010 at 06:20:25PM +0200, Wolfgang Grandegger wrote:
>>> Do you have also a patch adding support for hardware timestamping to ptpd?
>> Yes, I do:
>>
>>    https://sourceforge.net/tracker/index.php?func=detail&aid=2992847&group_id=139814&atid=744634
> 
> Thanks.
> 
>> I should have mentioned, you also need the gianfar HW time stamping
>> patches, recently posted to netdev by Manfred Rudigier.
> 
> I'm aware of these patches. I'm actually using the net-next-2.6 git tree.
> 
> I got ptpd working but I do not yet see the PPS-Signals on my scope. At
> a first glance, the PPS-Signal seems to be configured by the gianfar_ptp
> driver (setting the fiper1 and timer1 registers) but I might have missed
> something.

That's because some 1588_PPS related bits are not yet setup in the
platform code of mainline kernel.

Wolfgang.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox