Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 1/7] vxge: code cleanup and reorganization
From: David Miller @ 2010-12-11  0:08 UTC (permalink / raw)
  To: jon.mason
  Cc: netdev, sivakumar.subramani, sreenivasa.honnur, ram.vepa,
	arpit.patel
In-Reply-To: <1292025782-16372-1-git-send-email-jon.mason@exar.com>

From: Jon Mason <jon.mason@exar.com>
Date: Fri, 10 Dec 2010 18:02:56 -0600

> Move function locations to remove the need for internal declarations and
> other misc clean-ups.
> 
> Signed-off-by: Jon Mason <jon.mason@exar.com>
> Signed-off-by: Arpit Patel <arpit.patel@exar.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/7] vxge: fix crash of VF when unloading PF
From: David Miller @ 2010-12-11  0:08 UTC (permalink / raw)
  To: jon.mason; +Cc: netdev, sivakumar.subramani, sreenivasa.honnur, ram.vepa
In-Reply-To: <1292025782-16372-2-git-send-email-jon.mason@exar.com>

From: Jon Mason <jon.mason@exar.com>
Date: Fri, 10 Dec 2010 18:02:57 -0600

> Calling pci_disable_sriov when unloading a SR-IOV physical function
> driver from a host when a guest is using a virtual function from that
> device can cause a host crash or VM crash.  The crash is caused by the
> virtual config space no longer being present when PF is removed (due to
> the pci_disable_sriov).  This can be avoided by not calling
> pci_disable_sriov to disable the PCI space when shutting down the PF.
> Each function in the X3100 operates independently and in this case will
> operate properly in the absence of the PF.
> 
> Also, added improved logic in the detection of SR-IOV initialization.
> 
> Signed-off-by: Jon Mason <jon.mason@exar.com>
> Signed-off-by: Ram Vepa <ram.vepa@exar.com>

Applied.

^ permalink raw reply

* Re: [PATCH 3/7] vxge: use pci_request_region()
From: David Miller @ 2010-12-11  0:09 UTC (permalink / raw)
  To: jon.mason; +Cc: netdev, sivakumar.subramani, sreenivasa.honnur, ram.vepa
In-Reply-To: <1292025782-16372-3-git-send-email-jon.mason@exar.com>

From: Jon Mason <jon.mason@exar.com>
Date: Fri, 10 Dec 2010 18:02:58 -0600

> Only BAR0 is ever accessed, thus making the calls to pci_request_regions
> overkill.  Change calls of pci_request_regions to pci_request_region to
> reduce the size of the mapped area.
> 
> Signed-off-by: Jon Mason <jon.mason@exar.com>
> Signed-off-by: Ram Vepa <ram.vepa@exar.com>

Applied.

^ permalink raw reply

* Re: [PATCH 4/7] vxge: transmit timeout deadlock
From: David Miller @ 2010-12-11  0:09 UTC (permalink / raw)
  To: jon.mason; +Cc: netdev, sivakumar.subramani, sreenivasa.honnur, ram.vepa
In-Reply-To: <1292025782-16372-4-git-send-email-jon.mason@exar.com>

From: Jon Mason <jon.mason@exar.com>
Date: Fri, 10 Dec 2010 18:02:59 -0600

> Use a workqueue to handle the device reset during a transmit timeout, as
> there can be a deadlock during bringup.  Also, set the netif carrier off
> before the watchdog reset is started to prevent the timeout from
> reoccurring while still processing the first.
> 
> Signed-off-by: Jon Mason <jon.mason@exar.com>
> Signed-off-by: Ram Vepa <ram.vepa@exar.com>

Applied.

^ permalink raw reply

* Re: [PATCH 5/7] vxge: hotplug stall
From: David Miller @ 2010-12-11  0:09 UTC (permalink / raw)
  To: jon.mason; +Cc: netdev, sivakumar.subramani, sreenivasa.honnur, ram.vepa
In-Reply-To: <1292025782-16372-5-git-send-email-jon.mason@exar.com>

From: Jon Mason <jon.mason@exar.com>
Date: Fri, 10 Dec 2010 18:03:00 -0600

> When hot-unplugging a vxge adapter while running, the driver's remove
> routine prints warning and then stalls the calling thread.  This is due
> to vxge_remove calling vxge_device_unregister to unregister the netdev
> before calling flush_scheduled_work clear any pending work.  Swapping
> the order of these two functions resolves the issue.
> 
> Signed-off-by: Jon Mason <jon.mason@exar.com>
> Signed-off-by: Ram Vepa <ram.vepa@exar.com>

Applied.

^ permalink raw reply

* Re: [PATCH 6/7] vxge: independent interrupt moderation
From: David Miller @ 2010-12-11  0:09 UTC (permalink / raw)
  To: jon.mason; +Cc: netdev, sivakumar.subramani, sreenivasa.honnur, ram.vepa
In-Reply-To: <1292025782-16372-6-git-send-email-jon.mason@exar.com>

From: Jon Mason <jon.mason@exar.com>
Date: Fri, 10 Dec 2010 18:03:01 -0600

> Configure the workload clock register and TIM register for independent
> interrupt moderation based on the individual vpath utilization instead
> of common link utilization.  This greatly improves latency.
> 
> Signed-off-by: Jon Mason <jon.mason@exar.com>
> Signed-off-by: Ram Vepa <ram.vepa@exar.com>

Applied.

^ permalink raw reply

* Re: [PATCH 7/7] vxge: update driver version
From: David Miller @ 2010-12-11  0:09 UTC (permalink / raw)
  To: jon.mason; +Cc: netdev, sivakumar.subramani, sreenivasa.honnur, ram.vepa
In-Reply-To: <1292025782-16372-7-git-send-email-jon.mason@exar.com>

From: Jon Mason <jon.mason@exar.com>
Date: Fri, 10 Dec 2010 18:03:02 -0600

> Update vxge driver version to 2.5.1
> 
> Signed-off-by: Jon Mason <jon.mason@exar.com>
> Signed-off-by: Ram Vepa <ram.vepa@exar.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2] kptr_restrict for hiding kernel pointers from unprivileged users
From: Kees Cook @ 2010-12-11  0:11 UTC (permalink / raw)
  To: Dan Rosenberg
  Cc: linux-kernel, linux-security-module, netdev, jmorris, eugeneteo,
	mingo, davem
In-Reply-To: <1292025924.2965.20.camel@Dan>

On Fri, Dec 10, 2010 at 07:05:24PM -0500, Dan Rosenberg wrote:
> The below patch adds the %pK format specifier, the
> CONFIG_SECURITY_KPTR_RESTRICT configuration option, and the
> kptr_restrict sysctl.
> 
> The %pK format specifier is designed to hide exposed kernel pointers
> from unprivileged users, specifically via /proc interfaces.  Its
> behavior depends on the kptr_restrict sysctl, whose default value
> depends on CONFIG_SECURITY_KPTR_RESTRICT.  If kptr_restrict is set to 0,
> no deviation from the standard %p behavior occurs.  If kptr_restrict is
> set to 1, if the current user (intended to be a reader via seq_printf(),
> etc.) does not have CAP_SYSLOG (which is currently in the LSM tree),
> kernel pointers using %pK are printed as 0's.  This was chosen over the
> default "(null)", which cannot be parsed by userland %p, which expects
> "(nil)".
> 
> v2 improves checking for inappropriate context, on suggestion by Peter
> Zijlstra.  Thanks to Thomas Graf for suggesting use of a centralized
> format specifier.
> 
> Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>

This will come in very handy! Thanks for working on this approach. :)

Acked-by: Kees Cook <kees.cook@canonical.com>

-Kees

-- 
Kees Cook
Ubuntu Security Team

^ permalink raw reply

* Re: [patch] isdn: return -EFAULT if copy_from_user() fails
From: David Miller @ 2010-12-11  0:16 UTC (permalink / raw)
  To: error27; +Cc: isdn, netdev, kernel-janitors
In-Reply-To: <20101210124009.GC10623@bicker>

From: Dan Carpenter <error27@gmail.com>
Date: Fri, 10 Dec 2010 15:40:09 +0300

> We should be returning -EFAULT here.
> 
> Mostly this patch is to silence a smatch warning.  The upper levels
> of this driver turn all non-zero return values from isar_load_firmware()
> into 1.
> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>

Applied.

^ permalink raw reply

* Re: kernel panic with time-stamping in phy devices (monitor mode)
From: David Miller @ 2010-12-11  0:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, richardcochran, akwatts, stable, xiaosuo
In-Reply-To: <1291611032.2806.310.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 06 Dec 2010 05:50:32 +0100

> Le lundi 06 décembre 2010 à 08:01 +0800, Changli Gao a écrit :
> 
>> How about using skb_headroom(skb) < ETH_HLEN ?
>> 
> 
> Yes, good idea, thanks !
> 
> [PATCH net-2.6] net: fix skb_defer_rx_timestamp()
> 
> After commit c1f19b51d1d8 (net: support time stamping in phy devices.),
> kernel might crash if CONFIG_NETWORK_PHY_TIMESTAMPING=y and
> skb_defer_rx_timestamp() handles a packet without an ethernet header.
> 
> Fixes kernel bugzilla #24102
> 
> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=24102
> Reported-and-tested-by: Andrew Watts <akwatts@ymail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied and queued up for -stable, thanks!

^ permalink raw reply

* Re: [PATCH 1/3] bonding: add the debugfs facility to the bonding driver
From: Jay Vosburgh @ 2010-12-11  0:21 UTC (permalink / raw)
  To: Taku Izumi; +Cc: netdev@vger.kernel.org, eric.dumazet, shemminger
In-Reply-To: <4D017F99.2040706@jp.fujitsu.com>

Taku Izumi <izumi.taku@jp.fujitsu.com> wrote:

>
>This patch provides the debugfs facility to the bonding driver.
>The "bonding" directory is created in the debugfs root and directories of
>each bonding interface (like bond0, bond1...) are created in that.
>
> # mount -t debugfs none /sys/kernel/debug
>
> # ls /sys/kernel/debug/bonding
> bond0  bond1
>
>Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>

	Tested replacement patch 1 with original patches 2 and 3.  Works
as advertised, no pesky crashes.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

	-J


>---
> drivers/net/bonding/Makefile       |    2
> drivers/net/bonding/bond_debugfs.c |   96 +++++++++++++++++++++++++++++++++++++
> drivers/net/bonding/bond_main.c    |   10 +++
> drivers/net/bonding/bonding.h      |    9 +++
> 4 files changed, 115 insertions(+), 2 deletions(-)
>
>Index: net-next/drivers/net/bonding/bond_main.c
>===================================================================
>--- net-next.orig/drivers/net/bonding/bond_main.c
>+++ net-next/drivers/net/bonding/bond_main.c
>@@ -3507,6 +3507,8 @@ static int bond_event_changename(struct
> 	bond_remove_proc_entry(bond);
> 	bond_create_proc_entry(bond);
>
>+	bond_debug_reregister(bond);
>+
> 	return NOTIFY_DONE;
> }
>
>@@ -4789,6 +4791,8 @@ static void bond_uninit(struct net_devic
>
> 	bond_remove_proc_entry(bond);
>
>+	bond_debug_unregister(bond);
>+
> 	__hw_addr_flush(&bond->mc_list);
>
> 	list_for_each_entry_safe(vlan, tmp, &bond->vlan_list, vlan_list) {
>@@ -5191,6 +5195,8 @@ static int bond_init(struct net_device *
>
> 	bond_prepare_sysfs_group(bond);
>
>+	bond_debug_register(bond);
>+
> 	__hw_addr_init(&bond->mc_list);
> 	return 0;
> }
>@@ -5312,6 +5318,8 @@ static int __init bonding_init(void)
> 	if (res)
> 		goto err_link;
>
>+	bond_create_debugfs();
>+
> 	for (i = 0; i < max_bonds; i++) {
> 		res = bond_create(&init_net, NULL);
> 		if (res)
>@@ -5322,7 +5330,6 @@ static int __init bonding_init(void)
> 	if (res)
> 		goto err;
>
>-
> 	register_netdevice_notifier(&bond_netdev_notifier);
> 	register_inetaddr_notifier(&bond_inetaddr_notifier);
> 	bond_register_ipv6_notifier();
>@@ -5346,6 +5353,7 @@ static void __exit bonding_exit(void)
> 	bond_unregister_ipv6_notifier();
>
> 	bond_destroy_sysfs();
>+	bond_destroy_debugfs();
>
> 	rtnl_link_unregister(&bond_link_ops);
> 	unregister_pernet_subsys(&bond_net_ops);
>Index: net-next/drivers/net/bonding/Makefile
>===================================================================
>--- net-next.orig/drivers/net/bonding/Makefile
>+++ net-next/drivers/net/bonding/Makefile
>@@ -4,7 +4,7 @@
>
> obj-$(CONFIG_BONDING) += bonding.o
>
>-bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o
>+bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o bond_debugfs.o
>
> ipv6-$(subst m,y,$(CONFIG_IPV6)) += bond_ipv6.o
> bonding-objs += $(ipv6-y)
>Index: net-next/drivers/net/bonding/bond_debugfs.c
>===================================================================
>--- /dev/null
>+++ net-next/drivers/net/bonding/bond_debugfs.c
>@@ -0,0 +1,96 @@
>+#include <linux/kernel.h>
>+#include <linux/module.h>
>+#include <linux/device.h>
>+#include <linux/netdevice.h>
>+
>+#include "bonding.h"
>+
>+#ifdef CONFIG_DEBUG_FS
>+
>+#include <linux/debugfs.h>
>+#include <linux/seq_file.h>
>+
>+static struct dentry *bonding_debug_root;
>+
>+void bond_debug_register(struct bonding *bond)
>+{
>+	if (!bonding_debug_root)
>+		return;
>+
>+	bond->debug_dir =
>+		debugfs_create_dir(bond->dev->name, bonding_debug_root);
>+
>+	if (!bond->debug_dir) {
>+		pr_warning("%s: Warning: failed to register to debugfs\n",
>+			bond->dev->name);
>+		return;
>+	}
>+}
>+
>+void bond_debug_unregister(struct bonding *bond)
>+{
>+	if (!bonding_debug_root)
>+		return;
>+
>+	debugfs_remove_recursive(bond->debug_dir);
>+}
>+
>+void bond_debug_reregister(struct bonding *bond)
>+{
>+	struct dentry *d;
>+
>+	if (!bonding_debug_root)
>+		return;
>+
>+	d = debugfs_rename(bonding_debug_root, bond->debug_dir,
>+			   bonding_debug_root, bond->dev->name);
>+	if (d) {
>+		bond->debug_dir = d;
>+	} else {
>+		pr_warning("%s: Warning: failed to reregister, "
>+				"so just unregister old one\n",
>+				bond->dev->name);
>+		bond_debug_unregister(bond);
>+	}
>+}
>+
>+void bond_create_debugfs(void)
>+{
>+	bonding_debug_root = debugfs_create_dir("bonding", NULL);
>+
>+	if (!bonding_debug_root) {
>+		pr_warning("Warning: Cannot create bonding directory"
>+				" in debugfs\n");
>+	}
>+}
>+
>+void bond_destroy_debugfs(void)
>+{
>+	debugfs_remove_recursive(bonding_debug_root);
>+	bonding_debug_root = NULL;
>+}
>+
>+
>+#else /* !CONFIG_DEBUG_FS */
>+
>+void bond_debug_register(struct bonding *bond)
>+{
>+}
>+
>+void bond_debug_unregister(struct bonding *bond)
>+{
>+}
>+
>+void bond_debug_reregister(struct bonding *bond)
>+{
>+}
>+
>+void bond_create_debugfs(void)
>+{
>+}
>+
>+void bond_destroy_debugfs(void)
>+{
>+}
>+
>+#endif /* CONFIG_DEBUG_FS */
>Index: net-next/drivers/net/bonding/bonding.h
>===================================================================
>--- net-next.orig/drivers/net/bonding/bonding.h
>+++ net-next/drivers/net/bonding/bonding.h
>@@ -259,6 +259,10 @@ struct bonding {
> #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> 	struct   in6_addr master_ipv6;
> #endif
>+#ifdef CONFIG_DEBUG_FS
>+	/* debugging suport via debugfs */
>+	struct	 dentry *debug_dir;
>+#endif /* CONFIG_DEBUG_FS */
> };
>
> /**
>@@ -380,6 +384,11 @@ void bond_select_active_slave(struct bon
> void bond_change_active_slave(struct bonding *bond, struct slave *new_active);
> void bond_register_arp(struct bonding *);
> void bond_unregister_arp(struct bonding *);
>+void bond_create_debugfs(void);
>+void bond_destroy_debugfs(void);
>+void bond_debug_register(struct bonding *bond);
>+void bond_debug_unregister(struct bonding *bond);
>+void bond_debug_reregister(struct bonding *bond);
>
> struct bond_net {
> 	struct net *		net;	/* Associated network namespace */
>
>
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [net-next-2.6 PATCH] enic: Move enic port profile handling code to a new 802.1Qbh provisioning info type
From: David Miller @ 2010-12-11  0:22 UTC (permalink / raw)
  To: roprabhu; +Cc: netdev
In-Reply-To: <20101210220233.5010.37181.stgit@savbu-pc100.cisco.com>

From: Roopa Prabhu <roprabhu@cisco.com>
Date: Fri, 10 Dec 2010 14:02:33 -0800

> From: Roopa Prabhu <roprabhu@cisco.com>
> 
> 
> 
> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
> Signed-off-by: David Wang <dwang2@cisco.com>
> Signed-off-by: Christian Benvenuti <benve@cisco.com>

Commit message could explain why you're doing this, but I'll
apply this as it is pretty simple.

^ permalink raw reply

* Re: [PATCH 1/3] bonding: add the debugfs facility to the bonding driver
From: David Miller @ 2010-12-11  0:23 UTC (permalink / raw)
  To: fubar; +Cc: izumi.taku, netdev, eric.dumazet, shemminger
In-Reply-To: <29210.1292026905@death>

From: Jay Vosburgh <fubar@us.ibm.com>
Date: Fri, 10 Dec 2010 16:21:45 -0800

> Taku Izumi <izumi.taku@jp.fujitsu.com> wrote:
> 
>>
>>This patch provides the debugfs facility to the bonding driver.
>>The "bonding" directory is created in the debugfs root and directories of
>>each bonding interface (like bond0, bond1...) are created in that.
>>
>> # mount -t debugfs none /sys/kernel/debug
>>
>> # ls /sys/kernel/debug/bonding
>> bond0  bond1
>>
>>Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
> 
> 	Tested replacement patch 1 with original patches 2 and 3.  Works
> as advertised, no pesky crashes.
> 
> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

I'll apply this patch #1.  Taku-san, could you please report patch #2
and #3 and remember to include Jay's signoff.

Thanks.

^ permalink raw reply

* Re: [patch] isdn: return -EFAULT if copy_from_user() fails
From: Nicolas Kaiser @ 2010-12-11  0:41 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: David S. Miller, Karsten Keil, netdev, kernel-janitors
In-Reply-To: <20101210124009.GC10623@bicker>

Ahem, we're printing this return value:

* Dan Carpenter <error27@gmail.com>:
> -	if ((ret = copy_from_user(&size, p, sizeof(int)))) {
             ^^^
> +	if (copy_from_user(&size, p, sizeof(int))) {
>  		printk(KERN_ERR"isar_load_firmware copy_from_user ret %d\n", ret);
                                                                             ^^^
> -		return ret;
> +		return -EFAULT;

Cheers,
Nicolas Kaiser

^ permalink raw reply

* Re: [patch] isdn: return -EFAULT if copy_from_user() fails
From: David Miller @ 2010-12-11  0:47 UTC (permalink / raw)
  To: nikai; +Cc: error27, isdn, netdev, kernel-janitors
In-Reply-To: <20101211014154.3157588b@absol.kitzblitz>

From: Nicolas Kaiser <nikai@nikai.net>
Date: Sat, 11 Dec 2010 01:41:54 +0100

> Ahem, we're printing this return value:
> 
> * Dan Carpenter <error27@gmail.com>:
>> -	if ((ret = copy_from_user(&size, p, sizeof(int)))) {
>              ^^^
>> +	if (copy_from_user(&size, p, sizeof(int))) {
>>  		printk(KERN_ERR"isar_load_firmware copy_from_user ret %d\n", ret);
>                                                                              ^^^
>> -		return ret;
>> +		return -EFAULT;

I'll fix this, thanks.

^ permalink raw reply

* Re: [patch] isdn: return -EFAULT if copy_from_user() fails
From: David Miller @ 2010-12-11  0:50 UTC (permalink / raw)
  To: nikai; +Cc: error27, isdn, netdev, kernel-janitors
In-Reply-To: <20101210.164721.85429801.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Fri, 10 Dec 2010 16:47:21 -0800 (PST)

> From: Nicolas Kaiser <nikai@nikai.net>
> Date: Sat, 11 Dec 2010 01:41:54 +0100
> 
>> Ahem, we're printing this return value:
>> 
>> * Dan Carpenter <error27@gmail.com>:
>>> -	if ((ret = copy_from_user(&size, p, sizeof(int)))) {
>>              ^^^
>>> +	if (copy_from_user(&size, p, sizeof(int))) {
>>>  		printk(KERN_ERR"isar_load_firmware copy_from_user ret %d\n", ret);
>>                                                                              ^^^
>>> -		return ret;
>>> +		return -EFAULT;
> 
> I'll fix this, thanks.

As follows:

>From cf108fdd482e80161128c2ed01e7f4fb5bc728b9 Mon Sep 17 00:00:00 2001
From: David S. Miller <davem@davemloft.net>
Date: Fri, 10 Dec 2010 16:49:24 -0800
Subject: [PATCH] isdn: Fix printed out copy_from_user() return value after previous change.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/isdn/hisax/isar.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/isdn/hisax/isar.c b/drivers/isdn/hisax/isar.c
index c9cd4d2..313ce2d 100644
--- a/drivers/isdn/hisax/isar.c
+++ b/drivers/isdn/hisax/isar.c
@@ -189,7 +189,7 @@ ISARVersion(struct IsdnCardState *cs, char *s)
 static int
 isar_load_firmware(struct IsdnCardState *cs, u_char __user *buf)
 {
-	int ret, size, cnt, debug;
+	int cfu_ret, size, cnt, debug;
 	u_char len, nom, noc;
 	u_short sadr, left, *sp;
 	u_char __user *p = buf;
@@ -212,8 +212,9 @@ isar_load_firmware(struct IsdnCardState *cs, u_char __user *buf)
 	cs->debug &= ~(L1_DEB_HSCX | L1_DEB_HSCX_FIFO);
 #endif
 	
-	if (copy_from_user(&size, p, sizeof(int))) {
-		printk(KERN_ERR"isar_load_firmware copy_from_user ret %d\n", ret);
+	cfu_ret = copy_from_user(&size, p, sizeof(int));
+	if (cfu_ret) {
+		printk(KERN_ERR"isar_load_firmware copy_from_user ret %d\n", cfu_ret);
 		return -EFAULT;
 	}
 	p += sizeof(int);
-- 
1.7.3.2


^ permalink raw reply related

* Re: [patch] isdn: return -EFAULT if copy_from_user() fails
From: David Miller @ 2010-12-11  0:52 UTC (permalink / raw)
  To: nikai; +Cc: error27, isdn, netdev, kernel-janitors
In-Reply-To: <20101210.165003.267942493.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Fri, 10 Dec 2010 16:50:03 -0800 (PST)

> From: David Miller <davem@davemloft.net>
> Date: Fri, 10 Dec 2010 16:47:21 -0800 (PST)
> 
>> From: Nicolas Kaiser <nikai@nikai.net>
>> Date: Sat, 11 Dec 2010 01:41:54 +0100
>> 
>>> Ahem, we're printing this return value:
>>> 
>>> * Dan Carpenter <error27@gmail.com>:
>>>> -	if ((ret = copy_from_user(&size, p, sizeof(int)))) {
>>>              ^^^
>>>> +	if (copy_from_user(&size, p, sizeof(int))) {
>>>>  		printk(KERN_ERR"isar_load_firmware copy_from_user ret %d\n", ret);
>>>                                                                              ^^^
>>>> -		return ret;
>>>> +		return -EFAULT;
>> 
>> I'll fix this, thanks.
> 
> As follows:

Ugh, that was buggy, let's try this :-)

--------------------
isdn: Fix printed out copy_from_user() return value after previous change.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/isdn/hisax/isar.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/isdn/hisax/isar.c b/drivers/isdn/hisax/isar.c
index c9cd4d2..d4cce33 100644
--- a/drivers/isdn/hisax/isar.c
+++ b/drivers/isdn/hisax/isar.c
@@ -189,7 +189,7 @@ ISARVersion(struct IsdnCardState *cs, char *s)
 static int
 isar_load_firmware(struct IsdnCardState *cs, u_char __user *buf)
 {
-	int ret, size, cnt, debug;
+	int cfu_ret, ret, size, cnt, debug;
 	u_char len, nom, noc;
 	u_short sadr, left, *sp;
 	u_char __user *p = buf;
@@ -212,8 +212,9 @@ isar_load_firmware(struct IsdnCardState *cs, u_char __user *buf)
 	cs->debug &= ~(L1_DEB_HSCX | L1_DEB_HSCX_FIFO);
 #endif
 	
-	if (copy_from_user(&size, p, sizeof(int))) {
-		printk(KERN_ERR"isar_load_firmware copy_from_user ret %d\n", ret);
+	cfu_ret = copy_from_user(&size, p, sizeof(int));
+	if (cfu_ret) {
+		printk(KERN_ERR"isar_load_firmware copy_from_user ret %d\n", cfu_ret);
 		return -EFAULT;
 	}
 	p += sizeof(int);
-- 
1.7.3.2


^ permalink raw reply related

* Re: [PATCH 2/7] vxge: fix crash of VF when unloading PF
From: Chris Wright @ 2010-12-11  1:04 UTC (permalink / raw)
  To: Jon Mason
  Cc: David S. Miller, netdev, Sivakumar Subramani, Sreenivasa Honnur,
	Ram Vepa
In-Reply-To: <1292025782-16372-2-git-send-email-jon.mason@exar.com>

* Jon Mason (jon.mason@exar.com) wrote:
> +static int __devinit is_sriov_initialized(struct pci_dev *pdev)
> +{
> +	int pos;
> +	u16 ctrl;
> +
> +	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
> +	if (pos) {
> +		pci_read_config_word(pdev, pos + PCI_SRIOV_CTRL, &ctrl);
> +		if (ctrl & PCI_SRIOV_CTRL_VFE)
> +			return 1;
> +	}
> +	return 0;
> +}

This is a helper that should go in drivers/pci/iov.c (it's a pure
pci thing).

> +
>  /**
>   * vxge_probe
>   * @pdev : structure containing the PCI related information of the device.
> @@ -4370,14 +4384,13 @@ vxge_probe(struct pci_dev *pdev, const struct pci_device_id *pre)
>  		num_vfs = vxge_get_num_vfs(function_mode) - 1;
>  
>  	/* Enable SRIOV mode, if firmware has SRIOV support and if it is a PF */
> -	if (is_sriov(function_mode) && (max_config_dev > 1) &&
> -		(ll_config->intr_type != INTA) &&
> -		(is_privileged == VXGE_HW_OK)) {
> -		ret = pci_enable_sriov(pdev, ((max_config_dev - 1) < num_vfs)
> -			? (max_config_dev - 1) : num_vfs);
> +	if (is_sriov(function_mode) && !is_sriov_initialized(pdev) &&
> +	   (ll_config->intr_type != INTA)) {
> +		ret = pci_enable_sriov(pdev, num_vfs);

This fundamentally changes the way VF's are allocated.  Now you cannot
specifiy the number of vfs to allocate with max_config_dev module
parameter.

>  		if (ret)
>  			vxge_debug_ll_config(VXGE_ERR,
>  				"Failed in enabling SRIOV mode: %d\n", ret);
> +			/* No need to fail out, as an error here is non-fatal */
>  	}
>  
>  	/*
> @@ -4673,8 +4686,6 @@ static void __devexit vxge_remove(struct pci_dev *pdev)
>  
>  	iounmap(vdev->bar0);
>  
> -	pci_disable_sriov(pdev);
> -

And you can never disable sriov.

This doesn't look like the right behaviour.

thanks,
-chris

^ permalink raw reply

* RE: [PATCH 2/7] vxge: fix crash of VF when unloading PF
From: Ramkrishna Vepa @ 2010-12-11  1:35 UTC (permalink / raw)
  To: Chris Wright, Jon Mason
  Cc: David S. Miller, netdev@vger.kernel.org, Sivakumar Subramani,
	Sreenivasa Honnur
In-Reply-To: <20101211010441.GI4040@sequoia.sous-sol.org>

> > +
> >  /**
> >   * vxge_probe
> >   * @pdev : structure containing the PCI related information of the
> device.
> > @@ -4370,14 +4384,13 @@ vxge_probe(struct pci_dev *pdev, const struct
> pci_device_id *pre)
> >             num_vfs = vxge_get_num_vfs(function_mode) - 1;
> >
> >     /* Enable SRIOV mode, if firmware has SRIOV support and if it is a
> PF */
> > -   if (is_sriov(function_mode) && (max_config_dev > 1) &&
> > -           (ll_config->intr_type != INTA) &&
> > -           (is_privileged == VXGE_HW_OK)) {
> > -           ret = pci_enable_sriov(pdev, ((max_config_dev - 1) < num_vfs)
> > -                   ? (max_config_dev - 1) : num_vfs);
> > +   if (is_sriov(function_mode) && !is_sriov_initialized(pdev) &&
> > +      (ll_config->intr_type != INTA)) {
> > +           ret = pci_enable_sriov(pdev, num_vfs);
>
> This fundamentally changes the way VF's are allocated.  Now you cannot
> specifiy the number of vfs to allocate with max_config_dev module
> parameter.
The X3100 supports 11 different pci function modes where the user has the ability to choose the number of functions for each mode. This is more efficient usage of the hardware as the resources are carved out equally for the functions. By configuring the max_config_dev less than num_vfs, there's unnecessary wastage of resources.

>
> >             if (ret)
> >                     vxge_debug_ll_config(VXGE_ERR,
> >                             "Failed in enabling SRIOV mode: %d\n", ret);
> > +                   /* No need to fail out, as an error here is non-fatal */
> >     }
> >
> >     /*
> > @@ -4673,8 +4686,6 @@ static void __devexit vxge_remove(struct pci_dev
> *pdev)
> >
> >     iounmap(vdev->bar0);
> >
> > -   pci_disable_sriov(pdev);
> > -
>
> And you can never disable sriov.
If the device's pci function mode is changed, a power cycle is required in which case the functions are re-enumerated.

>
> This doesn't look like the right behaviour.
When the driver is loaded for the X3100 in SRIOV mode, it will be working in that mode even after it is unloaded and reloaded. As mentioned earlier, a change in the function mode requires and power cycle of the system.

The SRIOV feature is shipping in many distros and we need this fix back ported to prevent a possible crash when the PF is unloaded while the VFs are running in the guest OS in pass through mode.

If you have a better or simpler solution, that may take longer to implement, I would suggest that this solution be accepted in the interim.

Thanks,
Ram

The information and any attached documents contained in this message
may be confidential and/or legally privileged.  The message is
intended solely for the addressee(s).  If you are not the intended
recipient, you are hereby notified that any use, dissemination, or
reproduction is strictly prohibited and may be unlawful.  If you are
not the intended recipient, please contact the sender immediately by
return e-mail and destroy all copies of the original message.

^ permalink raw reply

* [PATCH 1/4] s2io: rx_ring_sz bounds checking
From: Jon Mason @ 2010-12-11  1:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Sivakumar Subramani, Sreenivasa Honnur, Ram Vepa

modparm rx_ring_sz can be set to be greater than the maximum allowable
number of blocks.  This results in an array overrun when probing the
driver, and causes memory corruption.

Also, the MAX_RX_DESC_1 multiply the max number of rings by max number
of blocker per ring by 127, but the driver does the same calculation
with 127+1.  This results in the possibility of the value being set
being larger than the maximum allowable value.

Finally, clean-up the s2io_ethtool_gringparam code to be more
intuitive.

Signed-off-by: Jon Mason <jon.mason@exar.com>
---
 drivers/net/s2io.c |   40 ++++++++++++++++++++++++----------------
 drivers/net/s2io.h |    5 ++---
 2 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index 0f4219c..a6d3eaf 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -5568,30 +5568,27 @@ static void s2io_ethtool_gringparam(struct net_device *dev,
 	struct s2io_nic *sp = netdev_priv(dev);
 	int i, tx_desc_count = 0, rx_desc_count = 0;
 
-	if (sp->rxd_mode == RXD_MODE_1)
+	if (sp->rxd_mode == RXD_MODE_1) {
 		ering->rx_max_pending = MAX_RX_DESC_1;
-	else if (sp->rxd_mode == RXD_MODE_3B)
+		ering->rx_jumbo_max_pending = MAX_RX_DESC_1;
+	} else {
 		ering->rx_max_pending = MAX_RX_DESC_2;
+		ering->rx_jumbo_max_pending = MAX_RX_DESC_2;
+	}
 
+	ering->rx_mini_max_pending = 0;
 	ering->tx_max_pending = MAX_TX_DESC;
-	for (i = 0 ; i < sp->config.tx_fifo_num ; i++)
-		tx_desc_count += sp->config.tx_cfg[i].fifo_len;
 
-	DBG_PRINT(INFO_DBG, "max txds: %d\n", sp->config.max_txds);
-	ering->tx_pending = tx_desc_count;
-	rx_desc_count = 0;
-	for (i = 0 ; i < sp->config.rx_ring_num ; i++)
+	for (i = 0; i < sp->config.rx_ring_num; i++)
 		rx_desc_count += sp->config.rx_cfg[i].num_rxd;
-
 	ering->rx_pending = rx_desc_count;
-
-	ering->rx_mini_max_pending = 0;
-	ering->rx_mini_pending = 0;
-	if (sp->rxd_mode == RXD_MODE_1)
-		ering->rx_jumbo_max_pending = MAX_RX_DESC_1;
-	else if (sp->rxd_mode == RXD_MODE_3B)
-		ering->rx_jumbo_max_pending = MAX_RX_DESC_2;
 	ering->rx_jumbo_pending = rx_desc_count;
+	ering->rx_mini_pending = 0;
+
+	for (i = 0; i < sp->config.tx_fifo_num; i++)
+		tx_desc_count += sp->config.tx_cfg[i].fifo_len;
+	ering->tx_pending = tx_desc_count;
+	DBG_PRINT(INFO_DBG, "max txds: %d\n", sp->config.max_txds);
 }
 
 /**
@@ -7692,6 +7689,8 @@ static void s2io_init_pci(struct s2io_nic *sp)
 static int s2io_verify_parm(struct pci_dev *pdev, u8 *dev_intr_type,
 			    u8 *dev_multiq)
 {
+	int i;
+
 	if ((tx_fifo_num > MAX_TX_FIFOS) || (tx_fifo_num < 1)) {
 		DBG_PRINT(ERR_DBG, "Requested number of tx fifos "
 			  "(%d) not supported\n", tx_fifo_num);
@@ -7750,6 +7749,15 @@ static int s2io_verify_parm(struct pci_dev *pdev, u8 *dev_intr_type,
 		DBG_PRINT(ERR_DBG, "Defaulting to 1-buffer mode\n");
 		rx_ring_mode = 1;
 	}
+
+	for (i = 0; i < MAX_RX_RINGS; i++)
+		if (rx_ring_sz[i] > MAX_RX_BLOCKS_PER_RING) {
+			DBG_PRINT(ERR_DBG, "Requested rx ring size not "
+				  "supported\nDefaulting to %d\n",
+				  MAX_RX_BLOCKS_PER_RING);
+			rx_ring_sz[i] = MAX_RX_BLOCKS_PER_RING;
+		}
+
 	return SUCCESS;
 }
 
diff --git a/drivers/net/s2io.h b/drivers/net/s2io.h
index 00b8614..1671443 100644
--- a/drivers/net/s2io.h
+++ b/drivers/net/s2io.h
@@ -355,9 +355,8 @@ struct stat_block {
 #define FIFO_OTHER_MAX_NUM			1
 
 
-#define MAX_RX_DESC_1  (MAX_RX_RINGS * MAX_RX_BLOCKS_PER_RING * 127 )
-#define MAX_RX_DESC_2  (MAX_RX_RINGS * MAX_RX_BLOCKS_PER_RING * 85 )
-#define MAX_RX_DESC_3  (MAX_RX_RINGS * MAX_RX_BLOCKS_PER_RING * 85 )
+#define MAX_RX_DESC_1  (MAX_RX_RINGS * MAX_RX_BLOCKS_PER_RING * 128)
+#define MAX_RX_DESC_2  (MAX_RX_RINGS * MAX_RX_BLOCKS_PER_RING * 86)
 #define MAX_TX_DESC    (MAX_AVAILABLE_TXDS)
 
 /* FIFO mappings for all possible number of fifos configured */
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 2/4] s2io: make strings at tables const
From: Jon Mason @ 2010-12-11  1:40 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Sivakumar Subramani, Sreenivasa Honnur, Ram Vepa,
	Stephen Hemminger
In-Reply-To: <1292031604-16628-1-git-send-email-jon.mason@exar.com>

Put immutable data in read/only section.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jon Mason <jon.mason@exar.com>
---
 drivers/net/s2io.c |    8 ++++----
 drivers/net/s2io.h |    4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index a6d3eaf..6a87b8c 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -91,11 +91,11 @@
 #define DRV_VERSION "2.0.26.27"
 
 /* S2io Driver name & version. */
-static char s2io_driver_name[] = "Neterion";
-static char s2io_driver_version[] = DRV_VERSION;
+static const char s2io_driver_name[] = "Neterion";
+static const char s2io_driver_version[] = DRV_VERSION;
 
-static int rxd_size[2] = {32, 48};
-static int rxd_count[2] = {127, 85};
+static const int rxd_size[2] = {32, 48};
+static const int rxd_count[2] = {127, 85};
 
 static inline int RXD_IS_UP2DT(struct RxD_t *rxdp)
 {
diff --git a/drivers/net/s2io.h b/drivers/net/s2io.h
index 1671443..7d16030 100644
--- a/drivers/net/s2io.h
+++ b/drivers/net/s2io.h
@@ -360,7 +360,7 @@ struct stat_block {
 #define MAX_TX_DESC    (MAX_AVAILABLE_TXDS)
 
 /* FIFO mappings for all possible number of fifos configured */
-static int fifo_map[][MAX_TX_FIFOS] = {
+static const int fifo_map[][MAX_TX_FIFOS] = {
 	{0, 0, 0, 0, 0, 0, 0, 0},
 	{0, 0, 0, 0, 1, 1, 1, 1},
 	{0, 0, 0, 1, 1, 1, 2, 2},
@@ -371,7 +371,7 @@ static int fifo_map[][MAX_TX_FIFOS] = {
 	{0, 1, 2, 3, 4, 5, 6, 7},
 };
 
-static u16 fifo_selector[MAX_TX_FIFOS] = {0, 1, 3, 3, 7, 7, 7, 7};
+static const u16 fifo_selector[MAX_TX_FIFOS] = {0, 1, 3, 3, 7, 7, 7, 7};
 
 /* Maintains Per FIFO related information. */
 struct tx_fifo_config {
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 3/4] s2io: Update Driver Version
From: Jon Mason @ 2010-12-11  1:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Sivakumar Subramani, Sreenivasa Honnur, Ram Vepa
In-Reply-To: <1292031604-16628-1-git-send-email-jon.mason@exar.com>

Update Driver Version

Signed-off-by: Jon Mason <jon.mason@exar.com>
---
 drivers/net/s2io.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index 6a87b8c..80efc05 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -88,7 +88,7 @@
 #include "s2io.h"
 #include "s2io-regs.h"
 
-#define DRV_VERSION "2.0.26.27"
+#define DRV_VERSION "2.0.26.28"
 
 /* S2io Driver name & version. */
 static const char s2io_driver_name[] = "Neterion";
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH 4/4] Using static const generally increases object text and decreases data size. It also generally decreases overall object size.
From: Jon Mason @ 2010-12-11  1:40 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Sivakumar Subramani, Sreenivasa Honnur, Ram Vepa,
	Joe Perches
In-Reply-To: <1292031604-16628-1-git-send-email-jon.mason@exar.com>

   text	   data	    bss	    dec	    hex	filename
 109387	    389	  24432	 134208	  20c40	drivers/net/s2io.o.old
 109358	    389	  24432	 134179	  20c23	drivers/net/s2io.o.new

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jon Mason <jon.mason@exar.com>
---
 drivers/net/s2io.c |   20 ++++++++++++--------
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index 80efc05..9a1e32f 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -3598,10 +3598,12 @@ static int s2io_set_swapper(struct s2io_nic *sp)
 	val64 = readq(&bar0->pif_rd_swapper_fb);
 	if (val64 != 0x0123456789ABCDEFULL) {
 		int i = 0;
-		u64 value[] = { 0xC30000C3C30000C3ULL,   /* FE=1, SE=1 */
-				0x8100008181000081ULL,  /* FE=1, SE=0 */
-				0x4200004242000042ULL,  /* FE=0, SE=1 */
-				0};                     /* FE=0, SE=0 */
+		static const u64 value[] = {
+			0xC30000C3C30000C3ULL,	/* FE=1, SE=1 */
+			0x8100008181000081ULL,	/* FE=1, SE=0 */
+			0x4200004242000042ULL,	/* FE=0, SE=1 */
+			0			/* FE=0, SE=0 */
+		};
 
 		while (i < 4) {
 			writeq(value[i], &bar0->swapper_ctrl);
@@ -3627,10 +3629,12 @@ static int s2io_set_swapper(struct s2io_nic *sp)
 
 	if (val64 != valt) {
 		int i = 0;
-		u64 value[] = { 0x00C3C30000C3C300ULL,  /* FE=1, SE=1 */
-				0x0081810000818100ULL,  /* FE=1, SE=0 */
-				0x0042420000424200ULL,  /* FE=0, SE=1 */
-				0};                     /* FE=0, SE=0 */
+		static const u64 value[] = {
+			0x00C3C30000C3C300ULL,	/* FE=1, SE=1 */
+			0x0081810000818100ULL,	/* FE=1, SE=0 */
+			0x0042420000424200ULL,	/* FE=0, SE=1 */
+			0			/* FE=0, SE=0 */
+		};
 
 		while (i < 4) {
 			writeq((value[i] | valr), &bar0->swapper_ctrl);
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH] rfc: ethtool: early-orphan control
From: Simon Horman @ 2010-12-11  4:13 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Ben Hutchings, Simon Horman

Early orphaning is an optimisation which avoids unnecessary cache misses by
orphaning an skb just before it is handed to a device for transmit thus
avoiding the case where the orphaning occurs on a different CPU.

In the case of bonded devices this has the unfortunate side-effect of
breaking down flow control allowing a socket to send UDP packets as fast as
the CPU will allow. This is particularly undesirable in virtualised
network environments.

This patch introduces ethtool control of early orphaning.
It remains on by default by it now may be disabled on a per-interface basis.

I have implemented this as a generic flag.
As it seems to be the first generic flag that requires
no driver awareness I also supplied a default flag handler.
I am unsure if any aspect of this approach is acceptable.

I believe Eric has it in mind that some of the calls
to skb_orphan() in drivers can be removed with the addition
of this feature. I need to discuss that with him further.

A patch for the ethtool user-space utility accompanies this patch.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/linux/ethtool.h   |    1 +
 include/linux/netdevice.h |    1 +
 net/core/dev.c            |   10 +++++++---
 net/core/ethtool.c        |   16 +++++++++++++---
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 1908929..e444d1e 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -314,6 +314,7 @@ enum ethtool_flags {
 	ETH_FLAG_LRO		= (1 << 15),	/* LRO is enabled */
 	ETH_FLAG_NTUPLE		= (1 << 27),	/* N-tuple filters enabled */
 	ETH_FLAG_RXHASH		= (1 << 28),
+	ETH_FLAG_EARLY_ORPHAN	= (1 << 29),
 };
 
 /* The following structures are for supporting RX network flow
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d31bc3c..4aa85d6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -905,6 +905,7 @@ struct net_device {
 #define NETIF_F_FCOE_MTU	(1 << 26) /* Supports max FCoE MTU, 2158 bytes*/
 #define NETIF_F_NTUPLE		(1 << 27) /* N-tuple filters supported */
 #define NETIF_F_RXHASH		(1 << 28) /* Receive hashing offload */
+#define NETIF_F_EARLY_ORPHAN	(1 << 29) /* Early Orphaning of skbs */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
diff --git a/net/core/dev.c b/net/core/dev.c
index d28b3a0..39e8c38 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1958,11 +1958,12 @@ static int dev_gso_segment(struct sk_buff *skb)
  * We cannot orphan skb if tx timestamp is requested or the sk-reference
  * is needed on driver level for other reasons, e.g. see net/can/raw.c
  */
-static inline void skb_orphan_try(struct sk_buff *skb)
+static inline void skb_orphan_try(struct sk_buff *skb, struct net_device *dev)
 {
 	struct sock *sk = skb->sk;
 
-	if (sk && !skb_shinfo(skb)->tx_flags) {
+	if (dev->features & NETIF_F_EARLY_ORPHAN &&
+	    sk && !skb_shinfo(skb)->tx_flags) {
 		/* skb_tx_hash() wont be able to get sk.
 		 * We copy sk_hash into skb->rxhash
 		 */
@@ -2032,7 +2033,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (!list_empty(&ptype_all))
 			dev_queue_xmit_nit(skb, dev);
 
-		skb_orphan_try(skb);
+		skb_orphan_try(skb, dev);
 
 		if (vlan_tx_tag_present(skb) &&
 		    !(dev->features & NETIF_F_HW_VLAN_TX)) {
@@ -5216,6 +5217,9 @@ int register_netdevice(struct net_device *dev)
 	if (dev->features & NETIF_F_SG)
 		dev->features |= NETIF_F_GSO;
 
+	/* Enable early orphaning - everything supports it */
+	dev->features |= NETIF_F_EARLY_ORPHAN;
+
 	/* Enable GRO and NETIF_F_HIGHDMA for vlans by default,
 	 * vlan_dev_init() will do the dev->features check, so these features
 	 * are enabled only if supported by underlying device.
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 1774178..f63bdce 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -133,7 +133,7 @@ EXPORT_SYMBOL(ethtool_op_set_ufo);
  */
 static const u32 flags_dup_features =
 	(ETH_FLAG_LRO | ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN | ETH_FLAG_NTUPLE |
-	 ETH_FLAG_RXHASH);
+	 ETH_FLAG_RXHASH | ETH_FLAG_EARLY_ORPHAN);
 
 u32 ethtool_op_get_flags(struct net_device *dev)
 {
@@ -148,7 +148,8 @@ EXPORT_SYMBOL(ethtool_op_get_flags);
 
 int ethtool_op_set_flags(struct net_device *dev, u32 data, u32 supported)
 {
-	if (data & ~supported)
+	/* Everything supports early orphan */
+	if (data & ~(supported | NETIF_F_EARLY_ORPHAN))
 		return -EINVAL;
 
 	dev->features = ((dev->features & ~flags_dup_features) |
@@ -157,6 +158,13 @@ int ethtool_op_set_flags(struct net_device *dev, u32 data, u32 supported)
 }
 EXPORT_SYMBOL(ethtool_op_set_flags);
 
+static int ethtool_op_set_flags_early_orphan(struct net_device *dev, u32 data)
+{
+	dev->features = ((dev->features & ~NETIF_F_EARLY_ORPHAN) |
+			 (data & NETIF_F_EARLY_ORPHAN));
+	return 0;
+}
+
 void ethtool_ntuple_flush(struct net_device *dev)
 {
 	struct ethtool_rx_ntuple_flow_spec_container *fsc, *f;
@@ -1644,7 +1652,9 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 		break;
 	case ETHTOOL_SFLAGS:
 		rc = ethtool_set_value(dev, useraddr,
-				       dev->ethtool_ops->set_flags);
+				       dev->ethtool_ops->set_flags ?
+				       dev->ethtool_ops->set_flags :
+				       ethtool_op_set_flags_early_orphan);
 		break;
 	case ETHTOOL_GPFLAGS:
 		rc = ethtool_get_value(dev, useraddr, ethcmd,
-- 
1.7.2.3


^ permalink raw reply related

* [PATCH] rfc: ethtool: early-orphan control (user-space)
From: Simon Horman @ 2010-12-11  4:13 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Ben Hutchings, Simon Horman

Early orphaning is an optimisation which avoids unnecessary cache misses by
orphaning an skb just before it is handed to a device for transmit thus
avoiding the case where the orphaning occurs on a different CPU.

In the case of bonded devices this has the unfortunate side-effect of
breaking down flow control allowing a socket to send UDP packets as fast as
the CPU will allow. This is particularly undesirable in virtualised
network environments.

This patch introduces ethtool control of early orphaning.
It remains on by default by it now may be disabled on a per-interface basis.

I have implemented this as a generic option using a generic flag.
I am unsure if any aspect of this approach is acceptable.

A patch for the kernel accompanies this patch.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 ethtool-copy.h |    1 +
 ethtool.8      |    1 +
 ethtool.c      |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 75c3ae7..9ed758f 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -314,6 +314,7 @@ enum ethtool_flags {
 	ETH_FLAG_LRO		= (1 << 15),	/* LRO is enabled */
 	ETH_FLAG_NTUPLE		= (1 << 27),	/* N-tuple filters enabled */
 	ETH_FLAG_RXHASH		= (1 << 28),
+	ETH_FLAG_EARLY_ORPHAN	= (1 << 29),
 };
 
 /* The following structures are for supporting RX network flow
diff --git a/ethtool.8 b/ethtool.8
index 1760924..6384681 100644
--- a/ethtool.8
+++ b/ethtool.8
@@ -209,6 +209,7 @@ ethtool \- Display or change ethernet card settings
 .BI msglvl \ type
 .A1 on off
 .RB ...]
+.B2 early-orphan on off
 
 .B ethtool \-n
 .I ethX
diff --git a/ethtool.c b/ethtool.c
index 239912b..579b4e4 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -163,7 +163,8 @@ static struct option {
 		"		[ xcvr internal|external ]\n"
 		"		[ wol p|u|m|b|a|g|s|d... ]\n"
 		"		[ sopass %x:%x:%x:%x:%x:%x ]\n"
-		"		[ msglvl %d | msglvl type on|off ... ]\n" },
+		"		[ msglvl %d | msglvl type on|off ... ]\n"
+		"		[ early-orphan on|off ]\n" },
     { "-a", "--show-pause", MODE_GPAUSE, "Show pause options" },
     { "-A", "--pause", MODE_SPAUSE, "Set pause options",
       "		[ autoneg on|off ]\n"
@@ -411,6 +412,10 @@ static int msglvl_changed;
 static u32 msglvl_wanted = 0;
 static u32 msglvl_mask = 0;
 
+static u32 generic_flags_changed;
+static u32 generic_flags_wanted = 0;
+static u32 generic_flags_mask = 0;
+
 static enum {
 	ONLINE=0,
 	OFFLINE,
@@ -608,6 +613,11 @@ static struct cmdline_info cmdline_msglvl[] = {
 	  NETIF_MSG_WOL, &msglvl_mask },
 };
 
+static struct cmdline_info cmdline_generic_flags[] = {
+	{ "early-orphan", CMDL_FLAG, &generic_flags_wanted, NULL,
+	  ETH_FLAG_EARLY_ORPHAN, &generic_flags_mask },
+};
+
 static long long
 get_int_range(char *str, int base, long long min, long long max)
 {
@@ -1125,7 +1135,13 @@ static void parse_cmdline(int argc, char **argp)
 				}
 				break;
 			}
-			show_usage(1);
+			parse_generic_cmdline(
+					argc, argp, i,
+					&generic_flags_changed,
+					cmdline_generic_flags,
+					ARRAY_SIZE(cmdline_generic_flags));
+			i = argc;
+			break;
 		}
 	}
 
@@ -2498,6 +2514,18 @@ static int do_gset(int fd, struct ifreq *ifr)
 		perror("Cannot get link status");
 	}
 
+	edata.cmd = ETHTOOL_GFLAGS;
+	ifr->ifr_data = (caddr_t)&edata;
+	err = ioctl(fd, SIOCETHTOOL, ifr);
+	if (err) {
+		perror("Cannot get device flags");
+	} else {
+		int eo = (edata.data & ETH_FLAG_EARLY_ORPHAN) != 0;
+		fprintf(stdout, "	Early Orphan: %s\n",
+			eo ? "on" : "off");
+		allfail = 0;
+	}
+
 	if (allfail) {
 		fprintf(stdout, "No data available\n");
 		return 75;
@@ -2623,6 +2651,28 @@ static int do_sset(int fd, struct ifreq *ifr)
 		}
 	}
 
+	if (generic_flags_changed) {
+		printf("generic flags changed\n");
+		struct ethtool_value edata;
+
+		edata.cmd = ETHTOOL_GFLAGS;
+		edata.data = 0;
+		ifr->ifr_data = (caddr_t)&edata;
+		err = ioctl(fd, SIOCETHTOOL, ifr);
+		if (err) {
+			perror("Cannot get device flag settings");
+			return 91;
+		}
+
+		edata.cmd = ETHTOOL_SFLAGS;
+		edata.data = ((edata.data & ~generic_flags_mask) |
+			      generic_flags_wanted);
+
+		err = ioctl(fd, SIOCETHTOOL, ifr);
+		if (err)
+			perror("Cannot set device flag settings");
+	}
+
 	return 0;
 }
 
-- 
1.7.2.3


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox