Netdev List
* [PATCH 5/5] vt: move vt notifiers into vt.h
From: Amerigo Wang @ 2011-06-22  6:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, netdev, Amerigo Wang, David S. Miller, Lucas De Marchi,
	Paul E. McKenney, Josh Triplett
In-Reply-To: <1308724522-32461-1-git-send-email-amwang@redhat.com>

It is not necessary to share the same notifier.h.

Signed-off-by: WANG Cong <amwang@redhat.com>

---
 include/linux/notifier.h |    9 ++-------
 include/linux/vt.h       |    7 +++++++
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index ae8f7d9..d65746e 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -193,6 +193,8 @@ static inline int notifier_to_errno(int ret)
 
 /* Hibernation and suspend events are defined in include/linux/suspend.h. */
 
+/* Virtual Terminal events are defined in include/linux/vt.h. */
+
 #define NETLINK_URELEASE	0x0001	/* Unicast netlink socket released */
 
 /* Console keyboard events.
@@ -206,12 +208,5 @@ static inline int notifier_to_errno(int ret)
 
 extern struct blocking_notifier_head reboot_notifier_list;
 
-/* Virtual Terminal events. */
-#define VT_ALLOCATE		0x0001 /* Console got allocated */
-#define VT_DEALLOCATE		0x0002 /* Console will be deallocated */
-#define VT_WRITE		0x0003 /* A char got output */
-#define VT_UPDATE		0x0004 /* A bigger update occurred */
-#define VT_PREWRITE		0x0005 /* A char is about to be written to the console */
-
 #endif /* __KERNEL__ */
 #endif /* _LINUX_NOTIFIER_H */
diff --git a/include/linux/vt.h b/include/linux/vt.h
index d5dd0bc..30a8dd9 100644
--- a/include/linux/vt.h
+++ b/include/linux/vt.h
@@ -86,6 +86,13 @@ struct vt_setactivate {
 
 #ifdef __KERNEL__
 
+/* Virtual Terminal events. */
+#define VT_ALLOCATE		0x0001 /* Console got allocated */
+#define VT_DEALLOCATE		0x0002 /* Console will be deallocated */
+#define VT_WRITE		0x0003 /* A char got output */
+#define VT_UPDATE		0x0004 /* A bigger update occurred */
+#define VT_PREWRITE		0x0005 /* A char is about to be written to the console */
+
 #ifdef CONFIG_VT_CONSOLE
 
 extern int vt_kmsg_redirect(int new);
-- 
1.7.4.4


* [PATCH 4/5] pm: move pm notifiers into suspend.h
From: Amerigo Wang @ 2011-06-22  6:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, netdev, Amerigo Wang, Chris Ball, Len Brown, Pavel Machek,
	Rafael J. Wysocki, Ohad Ben-Cohen, Linus Walleij, Philip Rakity,
	David S. Miller, Lucas De Marchi, Paul E. McKenney, Josh Triplett,
	linux-mmc, linux-pm
In-Reply-To: <1308724522-32461-1-git-send-email-amwang@redhat.com>

It is not necessary to share the same notifier.h.

Signed-off-by: WANG Cong <amwang@redhat.com>

---
 drivers/mmc/core/core.c  |    3 +++
 include/linux/notifier.h |   10 ++--------
 include/linux/suspend.h  |    8 ++++++++
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 68091dd..2cd4ec5 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -23,6 +23,9 @@
 #include <linux/log2.h>
 #include <linux/regulator/consumer.h>
 #include <linux/pm_runtime.h>
+#ifdef CONFIG_PM
+#include <linux/suspend.h>
+#endif
 
 #include <linux/mmc/card.h>
 #include <linux/mmc/host.h>
diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 145c436..ae8f7d9 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -191,15 +191,9 @@ static inline int notifier_to_errno(int ret)
 
 /* reboot notifiers are defined in include/linux/reboot.h. */
 
-#define NETLINK_URELEASE	0x0001	/* Unicast netlink socket released */
+/* Hibernation and suspend events are defined in include/linux/suspend.h. */
 
-/* Hibernation and suspend events */
-#define PM_HIBERNATION_PREPARE	0x0001 /* Going to hibernate */
-#define PM_POST_HIBERNATION	0x0002 /* Hibernation finished */
-#define PM_SUSPEND_PREPARE	0x0003 /* Going to suspend the system */
-#define PM_POST_SUSPEND		0x0004 /* Suspend finished */
-#define PM_RESTORE_PREPARE	0x0005 /* Going to restore a saved image */
-#define PM_POST_RESTORE		0x0006 /* Restore failed */
+#define NETLINK_URELEASE	0x0001	/* Unicast netlink socket released */
 
 /* Console keyboard events.
  * Note: KBD_KEYCODE is always sent before KBD_UNBOUND_KEYCODE, KBD_UNICODE and
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 083ffea..95bc81c 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -260,6 +260,14 @@ static inline int hibernate(void) { return -ENOSYS; }
 static inline bool system_entering_hibernation(void) { return false; }
 #endif /* CONFIG_HIBERNATION */
 
+/* Hibernation and suspend events */
+#define PM_HIBERNATION_PREPARE	0x0001 /* Going to hibernate */
+#define PM_POST_HIBERNATION	0x0002 /* Hibernation finished */
+#define PM_SUSPEND_PREPARE	0x0003 /* Going to suspend the system */
+#define PM_POST_SUSPEND		0x0004 /* Suspend finished */
+#define PM_RESTORE_PREPARE	0x0005 /* Going to restore a saved image */
+#define PM_POST_RESTORE		0x0006 /* Restore failed */
+
 #ifdef CONFIG_PM_SLEEP
 void save_processor_state(void);
 void restore_processor_state(void);
-- 
1.7.4.4


* [PATCH 3/5] sys: move reboot notifiers into reboot.h
From: Amerigo Wang @ 2011-06-22  6:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, netdev, Amerigo Wang, David S. Miller, Lucas De Marchi,
	Paul E. McKenney, Josh Triplett, Jiri Slaby, Greg Kroah-Hartman,
	David Howells, James Morris
In-Reply-To: <1308724522-32461-1-git-send-email-amwang@redhat.com>

It is not necessary to share the same notifier.h.
This patch also moves register_reboot_notifier()
and unregister_reboot_notifier() from kernel/notifier.c to
kernel/sys.c.

Signed-off-by: WANG Cong <amwang@redhat.com>

---
 include/linux/notifier.h |    5 +----
 include/linux/reboot.h   |    5 +++++
 kernel/notifier.c        |   31 -------------------------------
 kernel/sys.c             |   32 +++++++++++++++++++++++++++++++-
 4 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index e8a858a..145c436 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -189,10 +189,7 @@ static inline int notifier_to_errno(int ret)
 
 /* netdevice notifiers are defined in include/linux/netdevice.h */
 
-#define SYS_DOWN	0x0001	/* Notify of system down */
-#define SYS_RESTART	SYS_DOWN
-#define SYS_HALT	0x0002	/* Notify of system halt */
-#define SYS_POWER_OFF	0x0003	/* Notify of system power off */
+/* reboot notifiers are defined in include/linux/reboot.h. */
 
 #define NETLINK_URELEASE	0x0001	/* Unicast netlink socket released */
 
diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index 3005d5a..e0879a7 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -39,6 +39,11 @@
 
 #include <linux/notifier.h>
 
+#define SYS_DOWN	0x0001	/* Notify of system down */
+#define SYS_RESTART	SYS_DOWN
+#define SYS_HALT	0x0002	/* Notify of system halt */
+#define SYS_POWER_OFF	0x0003	/* Notify of system power off */
+
 extern int register_reboot_notifier(struct notifier_block *);
 extern int unregister_reboot_notifier(struct notifier_block *);
 
diff --git a/kernel/notifier.c b/kernel/notifier.c
index 2488ba7..8d7b435 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -525,37 +525,6 @@ void srcu_init_notifier_head(struct srcu_notifier_head *nh)
 }
 EXPORT_SYMBOL_GPL(srcu_init_notifier_head);
 
-/**
- *	register_reboot_notifier - Register function to be called at reboot time
- *	@nb: Info about notifier function to be called
- *
- *	Registers a function with the list of functions
- *	to be called at reboot time.
- *
- *	Currently always returns zero, as blocking_notifier_chain_register()
- *	always returns zero.
- */
-int register_reboot_notifier(struct notifier_block *nb)
-{
-	return blocking_notifier_chain_register(&reboot_notifier_list, nb);
-}
-EXPORT_SYMBOL(register_reboot_notifier);
-
-/**
- *	unregister_reboot_notifier - Unregister previously registered reboot notifier
- *	@nb: Hook to be unregistered
- *
- *	Unregisters a previously registered reboot
- *	notifier function.
- *
- *	Returns zero on success, or %-ENOENT on failure.
- */
-int unregister_reboot_notifier(struct notifier_block *nb)
-{
-	return blocking_notifier_chain_unregister(&reboot_notifier_list, nb);
-}
-EXPORT_SYMBOL(unregister_reboot_notifier);
-
 static ATOMIC_NOTIFIER_HEAD(die_chain);
 
 int notrace __kprobes notify_die(enum die_val val, const char *str,
diff --git a/kernel/sys.c b/kernel/sys.c
index e4128b2..a101ba3 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -8,7 +8,6 @@
 #include <linux/mm.h>
 #include <linux/utsname.h>
 #include <linux/mman.h>
-#include <linux/notifier.h>
 #include <linux/reboot.h>
 #include <linux/prctl.h>
 #include <linux/highuid.h>
@@ -320,6 +319,37 @@ void kernel_restart_prepare(char *cmd)
 }
 
 /**
+ *	register_reboot_notifier - Register function to be called at reboot time
+ *	@nb: Info about notifier function to be called
+ *
+ *	Registers a function with the list of functions
+ *	to be called at reboot time.
+ *
+ *	Currently always returns zero, as blocking_notifier_chain_register()
+ *	always returns zero.
+ */
+int register_reboot_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&reboot_notifier_list, nb);
+}
+EXPORT_SYMBOL(register_reboot_notifier);
+
+/**
+ *	unregister_reboot_notifier - Unregister previously registered reboot notifier
+ *	@nb: Hook to be unregistered
+ *
+ *	Unregisters a previously registered reboot
+ *	notifier function.
+ *
+ *	Returns zero on success, or %-ENOENT on failure.
+ */
+int unregister_reboot_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&reboot_notifier_list, nb);
+}
+EXPORT_SYMBOL(unregister_reboot_notifier);
+
+/**
  *	kernel_restart - reboot the system
  *	@cmd: pointer to buffer containing command to execute for restart
  *		or %NULL
-- 
1.7.4.4


* [PATCH 2/5] net: move netdevice notifiers into netdevice.h
From: Amerigo Wang @ 2011-06-22  6:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, netdev, Amerigo Wang, David S. Miller, Lucas De Marchi,
	Paul E. McKenney, Josh Triplett
In-Reply-To: <1308724522-32461-1-git-send-email-amwang@redhat.com>

It is not necessary to share the same notifier.h.

Signed-off-by: WANG Cong <amwang@redhat.com>
---
 include/linux/netdevice.h |   36 +++++++++++++++++++++++++++++++++---
 include/linux/notifier.h  |   28 +---------------------------
 2 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 54b8b4d..7c2026b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1566,6 +1566,39 @@ struct packet_type {
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
 
+/* netdevice notifier chain. Please remember to update the rtnetlink
+ * notification exclusion list in rtnetlink_event() when adding new
+ * types.
+ */
+#define NETDEV_UP	0x0001	/* For now you can't veto a device up/down */
+#define NETDEV_DOWN	0x0002
+#define NETDEV_REBOOT	0x0003	/* Tell a protocol stack a network interface
+				   detected a hardware crash and restarted
+				   - we can use this eg to kick tcp sessions
+				   once done */
+#define NETDEV_CHANGE	0x0004	/* Notify device state change */
+#define NETDEV_REGISTER 0x0005
+#define NETDEV_UNREGISTER	0x0006
+#define NETDEV_CHANGEMTU	0x0007
+#define NETDEV_CHANGEADDR	0x0008
+#define NETDEV_GOING_DOWN	0x0009
+#define NETDEV_CHANGENAME	0x000A
+#define NETDEV_FEAT_CHANGE	0x000B
+#define NETDEV_BONDING_FAILOVER 0x000C
+#define NETDEV_PRE_UP		0x000D
+#define NETDEV_PRE_TYPE_CHANGE	0x000E
+#define NETDEV_POST_TYPE_CHANGE	0x000F
+#define NETDEV_POST_INIT	0x0010
+#define NETDEV_UNREGISTER_BATCH 0x0011
+#define NETDEV_RELEASE		0x0012
+#define NETDEV_NOTIFY_PEERS	0x0013
+#define NETDEV_JOIN		0x0014
+
+extern int register_netdevice_notifier(struct notifier_block *nb);
+extern int unregister_netdevice_notifier(struct notifier_block *nb);
+extern int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
+
+
 extern rwlock_t				dev_base_lock;		/* Device list lock */
 
 
@@ -1648,12 +1681,9 @@ static inline void unregister_netdevice(struct net_device *dev)
 extern int 		netdev_refcnt_read(const struct net_device *dev);
 extern void		free_netdev(struct net_device *dev);
 extern void		synchronize_net(void);
-extern int 		register_netdevice_notifier(struct notifier_block *nb);
-extern int		unregister_netdevice_notifier(struct notifier_block *nb);
 extern int		init_dummy_netdev(struct net_device *dev);
 extern void		netdev_resync_ops(struct net_device *dev);
 
-extern int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
 extern struct net_device	*dev_get_by_index(struct net *net, int ifindex);
 extern struct net_device	*__dev_get_by_index(struct net *net, int ifindex);
 extern struct net_device	*dev_get_by_index_rcu(struct net *net, int ifindex);
diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 9eb25fc..e8a858a 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -187,33 +187,7 @@ static inline int notifier_to_errno(int ret)
  
 /* CPU notfiers are defined in include/linux/cpu.h. */
 
-/* netdevice notifier chain. Please remember to update the rtnetlink
- * notification exclusion list in rtnetlink_event() when adding new
- * types.
- */
-#define NETDEV_UP	0x0001	/* For now you can't veto a device up/down */
-#define NETDEV_DOWN	0x0002
-#define NETDEV_REBOOT	0x0003	/* Tell a protocol stack a network interface
-				   detected a hardware crash and restarted
-				   - we can use this eg to kick tcp sessions
-				   once done */
-#define NETDEV_CHANGE	0x0004	/* Notify device state change */
-#define NETDEV_REGISTER 0x0005
-#define NETDEV_UNREGISTER	0x0006
-#define NETDEV_CHANGEMTU	0x0007
-#define NETDEV_CHANGEADDR	0x0008
-#define NETDEV_GOING_DOWN	0x0009
-#define NETDEV_CHANGENAME	0x000A
-#define NETDEV_FEAT_CHANGE	0x000B
-#define NETDEV_BONDING_FAILOVER 0x000C
-#define NETDEV_PRE_UP		0x000D
-#define NETDEV_PRE_TYPE_CHANGE	0x000E
-#define NETDEV_POST_TYPE_CHANGE	0x000F
-#define NETDEV_POST_INIT	0x0010
-#define NETDEV_UNREGISTER_BATCH 0x0011
-#define NETDEV_RELEASE		0x0012
-#define NETDEV_NOTIFY_PEERS	0x0013
-#define NETDEV_JOIN		0x0014
+/* netdevice notifiers are defined in include/linux/netdevice.h */
 
 #define SYS_DOWN	0x0001	/* Notify of system down */
 #define SYS_RESTART	SYS_DOWN
-- 
1.7.4.4


* [PATCH 1/5] cpu: move cpu notifiers into cpu.h
From: Amerigo Wang @ 2011-06-22  6:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: akpm, netdev, Amerigo Wang, Andy Grover, David S. Miller,
	Tejun Heo, Greg Kroah-Hartman, Brandon Philips, Lucas De Marchi,
	Paul E. McKenney, Josh Triplett, rds-devel
In-Reply-To: <1308724522-32461-1-git-send-email-amwang@redhat.com>

It is not necessary to share the same notifier.h.

Signed-off-by: WANG Cong <amwang@redhat.com>
---
 include/linux/cpu.h      |   33 +++++++++++++++++++++++++++++++++
 include/linux/notifier.h |   34 ++--------------------------------
 net/rds/page.c           |    1 +
 3 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 5f09323..b1a635a 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -70,6 +70,39 @@ enum {
 	CPU_PRI_WORKQUEUE	= 5,
 };
 
+#define CPU_ONLINE		0x0002 /* CPU (unsigned)v is up */
+#define CPU_UP_PREPARE		0x0003 /* CPU (unsigned)v coming up */
+#define CPU_UP_CANCELED		0x0004 /* CPU (unsigned)v NOT coming up */
+#define CPU_DOWN_PREPARE	0x0005 /* CPU (unsigned)v going down */
+#define CPU_DOWN_FAILED		0x0006 /* CPU (unsigned)v NOT going down */
+#define CPU_DEAD		0x0007 /* CPU (unsigned)v dead */
+#define CPU_DYING		0x0008 /* CPU (unsigned)v not running any task,
+					* not handling interrupts, soon dead.
+					* Called on the dying cpu, interrupts
+					* are already disabled. Must not
+					* sleep, must not fail */
+#define CPU_POST_DEAD		0x0009 /* CPU (unsigned)v dead, cpu_hotplug
+					* lock is dropped */
+#define CPU_STARTING		0x000A /* CPU (unsigned)v soon running.
+					* Called on the new cpu, just before
+					* enabling interrupts. Must not sleep,
+					* must not fail */
+
+/* Used for CPU hotplug events occurring while tasks are frozen due to a suspend
+ * operation in progress
+ */
+#define CPU_TASKS_FROZEN	0x0010
+
+#define CPU_ONLINE_FROZEN	(CPU_ONLINE | CPU_TASKS_FROZEN)
+#define CPU_UP_PREPARE_FROZEN	(CPU_UP_PREPARE | CPU_TASKS_FROZEN)
+#define CPU_UP_CANCELED_FROZEN	(CPU_UP_CANCELED | CPU_TASKS_FROZEN)
+#define CPU_DOWN_PREPARE_FROZEN	(CPU_DOWN_PREPARE | CPU_TASKS_FROZEN)
+#define CPU_DOWN_FAILED_FROZEN	(CPU_DOWN_FAILED | CPU_TASKS_FROZEN)
+#define CPU_DEAD_FROZEN		(CPU_DEAD | CPU_TASKS_FROZEN)
+#define CPU_DYING_FROZEN	(CPU_DYING | CPU_TASKS_FROZEN)
+#define CPU_STARTING_FROZEN	(CPU_STARTING | CPU_TASKS_FROZEN)
+
+
 #ifdef CONFIG_SMP
 /* Need to know about CPUs going up/down? */
 #if defined(CONFIG_HOTPLUG_CPU) || !defined(MODULE)
diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index c0688b0..9eb25fc 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -185,6 +185,8 @@ static inline int notifier_to_errno(int ret)
  *	VC switch chains (for loadable kernel svgalib VC switch helpers) etc...
  */
  
+/* CPU notfiers are defined in include/linux/cpu.h. */
+
 /* netdevice notifier chain. Please remember to update the rtnetlink
  * notification exclusion list in rtnetlink_event() when adding new
  * types.
@@ -220,38 +222,6 @@ static inline int notifier_to_errno(int ret)
 
 #define NETLINK_URELEASE	0x0001	/* Unicast netlink socket released */
 
-#define CPU_ONLINE		0x0002 /* CPU (unsigned)v is up */
-#define CPU_UP_PREPARE		0x0003 /* CPU (unsigned)v coming up */
-#define CPU_UP_CANCELED		0x0004 /* CPU (unsigned)v NOT coming up */
-#define CPU_DOWN_PREPARE	0x0005 /* CPU (unsigned)v going down */
-#define CPU_DOWN_FAILED		0x0006 /* CPU (unsigned)v NOT going down */
-#define CPU_DEAD		0x0007 /* CPU (unsigned)v dead */
-#define CPU_DYING		0x0008 /* CPU (unsigned)v not running any task,
-					* not handling interrupts, soon dead.
-					* Called on the dying cpu, interrupts
-					* are already disabled. Must not
-					* sleep, must not fail */
-#define CPU_POST_DEAD		0x0009 /* CPU (unsigned)v dead, cpu_hotplug
-					* lock is dropped */
-#define CPU_STARTING		0x000A /* CPU (unsigned)v soon running.
-					* Called on the new cpu, just before
-					* enabling interrupts. Must not sleep,
-					* must not fail */
-
-/* Used for CPU hotplug events occurring while tasks are frozen due to a suspend
- * operation in progress
- */
-#define CPU_TASKS_FROZEN	0x0010
-
-#define CPU_ONLINE_FROZEN	(CPU_ONLINE | CPU_TASKS_FROZEN)
-#define CPU_UP_PREPARE_FROZEN	(CPU_UP_PREPARE | CPU_TASKS_FROZEN)
-#define CPU_UP_CANCELED_FROZEN	(CPU_UP_CANCELED | CPU_TASKS_FROZEN)
-#define CPU_DOWN_PREPARE_FROZEN	(CPU_DOWN_PREPARE | CPU_TASKS_FROZEN)
-#define CPU_DOWN_FAILED_FROZEN	(CPU_DOWN_FAILED | CPU_TASKS_FROZEN)
-#define CPU_DEAD_FROZEN		(CPU_DEAD | CPU_TASKS_FROZEN)
-#define CPU_DYING_FROZEN	(CPU_DYING | CPU_TASKS_FROZEN)
-#define CPU_STARTING_FROZEN	(CPU_STARTING | CPU_TASKS_FROZEN)
-
 /* Hibernation and suspend events */
 #define PM_HIBERNATION_PREPARE	0x0001 /* Going to hibernate */
 #define PM_POST_HIBERNATION	0x0002 /* Hibernation finished */
diff --git a/net/rds/page.c b/net/rds/page.c
index d8acdeb..b82d63e 100644
--- a/net/rds/page.c
+++ b/net/rds/page.c
@@ -32,6 +32,7 @@
  */
 #include <linux/highmem.h>
 #include <linux/gfp.h>
+#include <linux/cpu.h>
 
 #include "rds.h"
 
-- 
1.7.4.4
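
Editor's sketch (not part of the patch): the *_FROZEN values moved above are
each a base action OR'ed with CPU_TASKS_FROZEN, so a handler that does not
care whether tasks are frozen can mask that bit off before comparing. The
constants below are copied from the patch; the demo_* helper names are
hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Values copied from the cpu.h hunk above. */
#define CPU_ONLINE		0x0002
#define CPU_DEAD		0x0007
#define CPU_STARTING		0x000A
#define CPU_TASKS_FROZEN	0x0010

#define CPU_ONLINE_FROZEN	(CPU_ONLINE | CPU_TASKS_FROZEN)
#define CPU_DEAD_FROZEN		(CPU_DEAD | CPU_TASKS_FROZEN)
#define CPU_STARTING_FROZEN	(CPU_STARTING | CPU_TASKS_FROZEN)

/* Hypothetical helper: strip the frozen bit to recover the base action. */
static unsigned long demo_cpu_base_action(unsigned long action)
{
	return action & ~(unsigned long)CPU_TASKS_FROZEN;
}

/* Hypothetical helper: nonzero if the event was sent while tasks were frozen. */
static int demo_cpu_action_is_frozen(unsigned long action)
{
	return (action & CPU_TASKS_FROZEN) != 0;
}

/* Hypothetical helper: build the frozen variant of a base action. */
static unsigned long demo_cpu_frozen_variant(unsigned long action)
{
	/* A base action must not already carry the frozen bit. */
	assert(!(action & CPU_TASKS_FROZEN));
	return action | CPU_TASKS_FROZEN;
}
```

This works because none of the base values (0x0002 through 0x000A) uses bit
0x0010, so the mask never changes which base action is seen.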


* [PATCH 0/5] notifiers: split notifier.h into subsystem headers
From: Amerigo Wang @ 2011-06-22  6:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, netdev, WANG Cong

Currently all kinds of notifier events are defined in notifier.h. This is
not necessary, since different subsystems use different notifiers and
they are almost unrelated to each other.

Splitting them up can also save a lot of build time. Suppose I add a new
netdevice event: I should only have to recompile the network-related
sources. Without this patchset, everything that includes notifier.h is
recompiled.

I move the notifier events next to their subsystem's notifier registration
functions, so that they can be found more easily.

To avoid conflicts, I hope Andrew Morton will take the whole patchset,
rather than having each subsystem maintainer take their own part.

Signed-off-by: WANG Cong <amwang@redhat.com>

---
 drivers/mmc/core/core.c   |    3 ++
 include/linux/cpu.h       |   33 ++++++++++++++++++
 include/linux/netdevice.h |   36 ++++++++++++++++++--
 include/linux/notifier.h  |   82 +++-----------------------------------------
 include/linux/reboot.h    |    5 +++
 include/linux/suspend.h   |    8 ++++
 include/linux/vt.h        |    7 ++++
 kernel/notifier.c         |   31 -----------------
 kernel/sys.c              |   32 +++++++++++++++++-
 net/rds/page.c            |    1 +
 10 files changed, 127 insertions(+), 111 deletions(-)
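
Editor's sketch (not part of the series): one way to see why the events are
unrelated is that their numeric values only mean something on the chain they
are sent to. NETDEV_UP and SYS_DOWN are both 0x0001 in the patches, yet a
reboot callback never sees NETDEV_UP because it sits on a different chain.
The demo_* types and functions below are simplified, hypothetical stand-ins
written for userspace; they are not the kernel's notifier API.

```c
#include <assert.h>
#include <stddef.h>

/* Event values copied from the patches in this series. */
#define NETDEV_UP	0x0001
#define SYS_DOWN	0x0001

/* Hypothetical, simplified stand-ins for the kernel's notifier types. */
struct demo_notifier_block {
	int (*notifier_call)(struct demo_notifier_block *nb,
			     unsigned long event, void *data);
	struct demo_notifier_block *next;
};

struct demo_notifier_head {
	struct demo_notifier_block *head;
};

/* Push a callback onto one chain. */
static void demo_chain_register(struct demo_notifier_head *nh,
				struct demo_notifier_block *nb)
{
	assert(nb->notifier_call != NULL);
	nb->next = nh->head;
	nh->head = nb;
}

/* Deliver an event to every callback on one chain; returns how many ran. */
static int demo_chain_call(struct demo_notifier_head *nh,
			   unsigned long event, void *data)
{
	struct demo_notifier_block *nb;
	int calls = 0;

	for (nb = nh->head; nb; nb = nb->next) {
		nb->notifier_call(nb, event, data);
		calls++;
	}
	return calls;
}

static int demo_netdev_up_seen;
static int demo_sys_down_seen;

/* Callback meant for the netdevice chain. */
static int demo_netdev_cb(struct demo_notifier_block *nb,
			  unsigned long event, void *data)
{
	(void)nb;
	(void)data;
	if (event == NETDEV_UP)
		demo_netdev_up_seen++;
	return 0;
}

/* Callback meant for the reboot chain. */
static int demo_reboot_cb(struct demo_notifier_block *nb,
			  unsigned long event, void *data)
{
	(void)nb;
	(void)data;
	if (event == SYS_DOWN)
		demo_sys_down_seen++;
	return 0;
}
```

Registering demo_netdev_cb on one head and demo_reboot_cb on another, then
calling demo_chain_call() on the netdevice head with NETDEV_UP, increments
only demo_netdev_up_seen even though SYS_DOWN has the same value.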


* Re: [PATCH net-next 3/3] udp/recvmsg: Clear MSG_TRUNC flag when starting over for a new packet
From: Eric Dumazet @ 2011-06-22  5:43 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: davem, netdev, Xufeng Zhang
In-Reply-To: <1308689020-1873-4-git-send-email-paul.gortmaker@windriver.com>

Le mardi 21 juin 2011 à 16:43 -0400, Paul Gortmaker a écrit :
> From: Xufeng Zhang <xufeng.zhang@windriver.com>
> 
> Consider this scenario: When the size of the first received udp packet
> is bigger than the receive buffer, MSG_TRUNC bit is set in msg->msg_flags.
> However, if checksum error happens and this is a blocking socket, it will
> goto try_again loop to receive the next packet.  But if the size of the
> next udp packet is smaller than receive buffer, MSG_TRUNC flag should not
> be set, but because MSG_TRUNC bit is not cleared in msg->msg_flags before
> receiving the next packet, MSG_TRUNC is still set, which is wrong.
> 
> Fix this problem by clearing MSG_TRUNC flag when starting over for a
> new packet.
> 
> Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>  net/ipv4/udp.c |    3 +++
>  net/ipv6/udp.c |    3 +++
>  2 files changed, 6 insertions(+), 0 deletions(-)

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
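
Editor's sketch (not the kernel code): the scenario in the quoted commit
message can be modelled in userspace with a small queue of fake packets. The
struct, the function name and the clear_on_retry switch are hypothetical;
only MSG_TRUNC and the try_again idea come from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>	/* MSG_TRUNC */

/* Hypothetical model of a queued datagram. */
struct demo_pkt {
	size_t len;	/* payload length */
	int csum_ok;	/* nonzero if the checksum verifies */
};

/*
 * Simplified model of a blocking receive loop.
 * Returns the index of the delivered packet, or -1 if the queue runs dry.
 * With clear_on_retry == 0 a stale MSG_TRUNC from a dropped packet survives;
 * with clear_on_retry != 0 the flag is cleared before starting over.
 */
static int demo_recv(const struct demo_pkt *q, size_t n, size_t buflen,
		     int clear_on_retry, int *msg_flags)
{
	size_t i = 0;

	assert(msg_flags != NULL);
try_again:
	if (i >= n)
		return -1;
	if (q[i].len > buflen)
		*msg_flags |= MSG_TRUNC;
	if (!q[i].csum_ok) {
		/* Bad checksum: drop this packet and start over for a new one. */
		i++;
		if (clear_on_retry)
			*msg_flags &= ~MSG_TRUNC;
		goto try_again;
	}
	return (int)i;
}
```

With a 1500-byte buffer and a queue holding a corrupt 2000-byte packet
followed by a valid 100-byte packet, the second packet is delivered in both
modes, but only the clear_on_retry mode reports it without MSG_TRUNC.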




* Re: [PATCH net-next 2/3] ipv6/udp: Use the correct variable to determine non-blocking condition
From: Eric Dumazet @ 2011-06-22  5:42 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: davem, netdev, Xufeng Zhang
In-Reply-To: <1308689020-1873-3-git-send-email-paul.gortmaker@windriver.com>

Le mardi 21 juin 2011 à 16:43 -0400, Paul Gortmaker a écrit :
> From: Xufeng Zhang <xufeng.zhang@windriver.com>
> 
> udpv6_recvmsg() function is not using the correct variable to determine
> whether or not the socket is in non-blocking operation, this will lead
> to unexpected behavior when a UDP checksum error occurs.
> 
> Consider a non-blocking udp receive scenario: when udpv6_recvmsg() is
> called by sock_common_recvmsg(), MSG_DONTWAIT bit of flags variable in
> udpv6_recvmsg() is cleared by "flags & ~MSG_DONTWAIT" in this call:
> 
>     err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
>                    flags & ~MSG_DONTWAIT, &addr_len);
> 
> i.e. with udpv6_recvmsg() getting these values:
> 
> 	int noblock = flags & MSG_DONTWAIT
> 	int flags = flags & ~MSG_DONTWAIT
> 
> So, when udp checksum error occurs, the execution will go to
> csum_copy_err, and then the problem happens:
> 
>     csum_copy_err:
>             ...............
>             if (flags & MSG_DONTWAIT)
>                     return -EAGAIN;
>             goto try_again;
>             ...............
> 
> But it will always go to try_again as MSG_DONTWAIT has been cleared
> from flags at call time -- only noblock contains the original value
> of MSG_DONTWAIT, so the test should be:
> 
>             if (noblock)
>                     return -EAGAIN;
> 
> This is also consistent with what the ipv4/udp code does.
> 
> Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
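
Editor's sketch (not the kernel code): the flag split described in the quoted
message can be reproduced in userspace. The demo_* names and the
DEMO_TRY_AGAIN return value are hypothetical; the two expressions in
demo_split_flags() are the ones quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>	/* MSG_DONTWAIT */

/* Hypothetical marker meaning "would goto try_again". */
enum { DEMO_TRY_AGAIN = 1 };

/* Mirror how the caller splits the user's flags into two arguments. */
static void demo_split_flags(int user_flags, int *noblock, int *flags)
{
	assert(noblock != NULL && flags != NULL);
	*noblock = user_flags & MSG_DONTWAIT;
	*flags = user_flags & ~MSG_DONTWAIT;
}

/* Checksum-error path as described before the fix: tests the wrong variable. */
static int demo_csum_err_buggy(int noblock, int flags)
{
	(void)noblock;
	if (flags & MSG_DONTWAIT)	/* never true: the bit was cleared */
		return -EAGAIN;
	return DEMO_TRY_AGAIN;
}

/* Checksum-error path with the proposed test on noblock. */
static int demo_csum_err_fixed(int noblock, int flags)
{
	(void)flags;
	if (noblock)
		return -EAGAIN;
	return DEMO_TRY_AGAIN;
}
```

For a caller that passed MSG_DONTWAIT, the first variant loops back and may
block, while the second returns -EAGAIN as a non-blocking caller expects.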




* net-2.6 GIT rolled back 3 commits...
From: David Miller @ 2011-06-22  5:36 UTC (permalink / raw)
  To: netdev; +Cc: sfr


I had to roll back the net-2.6 GIT tree by three commits to get
rid of the buggy UDP uhash_entries patch.

You can easily fix your tree by going:

git reset --hard 58fa45973117ab7a79d5b6818275a887867fc4d7

and then re-pulling.

Sorry for the inconvenience this may cause.


* Re: [PATCH net-next 1/3] net: ipv4: fix potential memory leak by assigning uhash_entries
From: David Miller @ 2011-06-22  5:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: paul.gortmaker, netdev, mark.asselstine
In-Reply-To: <1308719087.2713.4.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 22 Jun 2011 07:04:46 +0200

> Arg no, I really wanted to get more hash slots in my 32bit machines,
> with 4Gbytes of memory.
> 
> Here is what I currently have (without your patch)
> 
> [    1.903086] UDP hash table entries: 512 (order: 2, 16384 bytes)
> 
> 
> I mean, this kmemleak was already reported.
> 
> 32MB machines are things of the past.
> 
> If you really care, please add a change to alloc_large_system_hash() ?

Crap, I thought that argument to alloc_large_system_hash() provided a
lower bound.

Indeed, I'm going to revert this patch.


* Re: [PATCH net-next 1/3] net: ipv4: fix potential memory leak by assigning uhash_entries
From: Eric Dumazet @ 2011-06-22  5:04 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: davem, netdev, Mark Asselstine
In-Reply-To: <1308689020-1873-2-git-send-email-paul.gortmaker@windriver.com>

Le mardi 21 juin 2011 à 16:43 -0400, Paul Gortmaker a écrit :
> From: Mark Asselstine <mark.asselstine@windriver.com>
> 
> Commit f86dcc5a [udp: dynamically size hash tables at boot time]
> introduced the uhash_entries boot option and made sure to keep
> it set within acceptable limits -- if used.  It did not assign a
> default value, however, so it defaults to zero.  This results in
> alloc_large_system_hash() being relied upon to specify an acceptable
> number of hash entries, something it can't be relied on to always do
> correctly. For example, when it fails to set an acceptable minimum
> (UDP_HTABLE_SIZE_MIN) we get a second allocation and a memory leak.
> So we need to set a default value for uhash_entries to ensure we get
> the required minimum and prevent a second allocation.
> 
> This was found by using DEBUG_KMEMLEAK, producing the following log:
> 
> unreferenced object 0xc1b0d000 (size 4096):
>   comm "swapper", pid 1, jiffies 4294667562 (age 136.225s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<c10e9027>] create_object+0xd7/0x210
>     [<c15d73d7>] kmemleak_alloc+0x27/0x50
>     [<c1877032>] alloc_large_system_hash+0x16d/0x1f7
>     [<c189121d>] udp_table_init+0x43/0xf8
>     [<c18912e4>] udp_init+0x12/0x74
>     [<c1891637>] inet_init+0x179/0x250
>     [<c10011f0>] do_one_initcall+0x30/0x160
>     [<c18607c9>] kernel_init+0xb9/0x14e
>     [<c15fcff6>] kernel_thread_helper+0x6/0xd
>     [<ffffffff>] 0xffffffff
> 
> This is fairly easy to reproduce using ARCH=x86 defconfig (i386_defconfig)
> enabling DEBUG_KMEMLEAK and running on a system with 32MB of memory
> (qemu -m 32). With systems with larger amounts of memory we may not
> see this leak since the logic in alloc_large_system_hash() will result
> in a large enough (>UDP_HTABLE_SIZE_MIN) number of entries being set.
> 
> Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>  net/ipv4/udp.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index abca870..6f53a5a 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2155,7 +2155,7 @@ void udp4_proc_exit(void)
>  }
>  #endif /* CONFIG_PROC_FS */
>  
> -static __initdata unsigned long uhash_entries;
> +static __initdata unsigned long uhash_entries = UDP_HTABLE_SIZE_MIN;
>  static int __init set_uhash_entries(char *str)
>  {
>  	if (!str)

Arg no, I really wanted to get more hash slots in my 32bit machines,
with 4Gbytes of memory.

Here is what I currently have (without your patch)

[    1.903086] UDP hash table entries: 512 (order: 2, 16384 bytes)


I mean, this kmemleak was already reported.

32MB machines are things of the past.

If you really care, please add a change to alloc_large_system_hash() ?
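
Editor's sketch (a userspace model, not the kernel code): the disagreement
above comes down to two behaviours. If the sizing hint is zero and the
automatically chosen table is below the minimum, a second allocation replaces
the first pointer and the first table leaks. If the hint is nonzero it fixes
the size rather than acting as a lower bound, so a default of
UDP_HTABLE_SIZE_MIN would shrink tables on large machines. All demo_* names
are hypothetical, and the value 256 for the minimum is an assumption made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_HTABLE_SIZE_MIN 256u	/* assumed value, for illustration only */

struct demo_table {
	void *hash;
	unsigned int slots;
};

/* Allocations made so far; nothing in this model ever frees. */
static int demo_live_allocs;

static void *demo_alloc(size_t bytes)
{
	void *p = malloc(bytes);

	if (p)
		demo_live_allocs++;
	return p;
}

/*
 * Stand-in for the boot-time hash allocator: a nonzero hint fixes the size,
 * a zero hint falls back to auto_slots (modelling "sized from system memory").
 */
static void *demo_large_hash(unsigned int hint, unsigned int auto_slots,
			     unsigned int *slots)
{
	*slots = hint ? hint : auto_slots;
	return demo_alloc((size_t)*slots * sizeof(void *));
}

/* Stand-in for table setup with the "make sure it is big enough" fallback. */
static void demo_table_init(struct demo_table *t, unsigned int hint,
			    unsigned int auto_slots)
{
	assert(t != NULL);
	t->hash = demo_large_hash(hint, auto_slots, &t->slots);
	if (t->slots < DEMO_HTABLE_SIZE_MIN) {
		/* The first pointer is overwritten without being freed. */
		t->slots = DEMO_HTABLE_SIZE_MIN;
		t->hash = demo_alloc((size_t)t->slots * sizeof(void *));
	}
}
```

In this model, a zero hint with 64 automatic slots leaves two live
allocations for one usable table, a zero hint with 512 automatic slots keeps
all 512, and a hint equal to the minimum caps the same machine at 256.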





* Re: [PATCH v2 net-next] ip: introduce ip_is_fragment helper inline function
From: David Miller @ 2011-06-22  3:33 UTC (permalink / raw)
  To: paul.gortmaker; +Cc: netdev, bhutchings
In-Reply-To: <1308680702-2212-1-git-send-email-paul.gortmaker@windriver.com>

From: Paul Gortmaker <paul.gortmaker@windriver.com>
Date: Tue, 21 Jun 2011 14:25:02 -0400

> There are enough instances of this:
> 
>     iph->frag_off & htons(IP_MF | IP_OFFSET)
> 
> that a helper function is probably warranted.
> 
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Applied, thanks.
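
Editor's sketch (not the applied patch): a helper wrapping the quoted
expression could look roughly like the following in userspace. The struct and
function names are hypothetical, and the IP_MF and IP_OFFSET values are
written out here as assumptions (the more-fragments bit and the 13-bit
fragment offset mask of the IPv4 header).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>	/* htons */

#define IP_MF		0x2000	/* "more fragments" flag (assumed value) */
#define IP_OFFSET	0x1FFF	/* fragment offset mask (assumed value) */

/* Hypothetical minimal header: only the field the test needs. */
struct demo_iphdr {
	uint16_t frag_off;	/* flags + fragment offset, network byte order */
};

/* Nonzero if the packet is a fragment: MF set or a nonzero offset. */
static int demo_ip_is_fragment(const struct demo_iphdr *iph)
{
	assert(iph != NULL);
	return (iph->frag_off & htons(IP_MF | IP_OFFSET)) != 0;
}
```

A packet with only the don't-fragment bit (0x4000) set is not reported as a
fragment, while a first fragment (MF set, offset 0) and a last fragment (MF
clear, offset nonzero) both are.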


* Re: [PATCH] irda: fix smsc-ircc2 section mismatch warning
From: David Miller @ 2011-06-22  3:33 UTC (permalink / raw)
  To: randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, samuel-jcdQHdrhKHMdnm+yROfE0A,
	irda-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <20110621123938.5c932de8.randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

From: Randy Dunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Date: Tue, 21 Jun 2011 12:39:38 -0700

> From: Randy Dunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> 
> Fix section mismatch warning:
> 
> WARNING: drivers/net/irda/smsc-ircc2.o(.devinit.text+0x1a7): Section mismatch in reference from the function smsc_ircc_pnp_probe() to the function .init.text:smsc_ircc_open()
> 
> Signed-off-by: Randy Dunlap <randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

Applied to net-next-2.6, thanks!



* Re: [PATCH v2 net-next af-packet 0/2] Enhance af-packet to provide (near zero)lossless packet capture functionality.
From: chetan loke @ 2011-06-22  3:02 UTC (permalink / raw)
  To: davem, netdev
  Cc: eric.dumazet, joe, bhutchings, shemminger, linux-kernel,
	Chetan Loke
In-Reply-To: <1308708650-25509-1-git-send-email-loke.chetan@gmail.com>

On Tue, Jun 21, 2011 at 10:10 PM, Chetan Loke <loke.chetan@gmail.com> wrote:
> Hello,
>
> Please review the patchset.
>
> Changes from v1:
>
> 1) v1 was based on 2.6.38.9. v2 is rebased to net-next.
> 2) Aligned bdqc members, pr_err to WARN, sob email      (Joe Perches)
> 3) Added tp_padding                                     (Eric Dumazet)
> 4) Nuked useless ;) white space                         (Stephen H)
> 5) Use __u types in headers                             (Ben Hutchings)
> 6) Added field for creating private area                (Chetan Loke)
>

Hi Dave,

Is there a chance of getting this merged either in 3.0 or 3.1 or ever ;) ?

thanks
Chetan

^ permalink raw reply

* Re: [RFC PATCH] packet: Add fanout support.
From: David Miller @ 2011-06-22  2:12 UTC (permalink / raw)
  To: xiaosuo; +Cc: victor, netdev
In-Reply-To: <BANLkTinxvHbkCLehJkqoCfryeTqL0zhWiw@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Wed, 22 Jun 2011 09:44:00 +0800

> I think he also needs all the packets belonging to related
> connections to be received via the same socket. I am afraid that he has
> to dispatch these kinds of packets among the userland processes again.
> :)

I mean, if we really wanted to, we could create a new ip_defrag()
client case in the rxhash receive code.  But this would need to be
configurable and off by default.

It would provide the desired behavior.

^ permalink raw reply

* [PATCH v2 net-next af-packet 2/2] Enhance af-packet to provide (near zero)lossless packet capture functionality.
From: Chetan Loke @ 2011-06-22  2:10 UTC (permalink / raw)
  To: netdev
  Cc: davem, eric.dumazet, joe, bhutchings, shemminger, linux-kernel,
	Chetan Loke
In-Reply-To: <1308708650-25509-1-git-send-email-loke.chetan@gmail.com>

1) Blocks can now be configured with non-static frame format.
   Non-static frame format provides following benefits:
   1.1) Increases packet density by a factor of 2x.
   1.2) Ability to capture entire packet.
   1.3) Captures 99% 64-byte traffic as seen by the kernel.
2) Read/poll is now at a block-level rather than at packet level.
3) Added user-configurable timeout knob for timing out blocks on slow/bursty links.
4) Block level processing now allows monitoring multiple links as a single
   logical pipe.
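
As a rough illustration of item 3: when user space leaves the timeout at 0,
the patch derives one from block size and link speed in
prb_calc_retire_blk_tmo(). The sketch below mirrors only that arithmetic in
user space. The name calc_retire_tmo_ms() and the plain speed_mbps argument
are illustrative assumptions, not part of the patch, and the ethtool lookup
(including the path where no link settings are available) is left out.

```c
#include <assert.h>
#include <limits.h>

#define DEFAULT_PRB_RETIRE_TOV 8	/* ms, same default as in the patch */

/* Hypothetical helper: speed_mbps stands in for ecmd.speed. */
unsigned int calc_retire_tmo_ms(unsigned int blk_size_in_bytes,
				unsigned int speed_mbps)
{
	unsigned int mbits, div;

	/* The bytes-to-bits conversion below must not overflow. */
	assert(blk_size_in_bytes <= UINT_MAX / 8);

	switch (speed_mbps) {
	case 10000:
		div = 10;
		break;
	case 1000:
		div = 1;
		break;
	default:
		/* Slow or unrecognised link speed: use the default. */
		return DEFAULT_PRB_RETIRE_TOV;
	}

	/* Block size in megabits, scaled down by the speed in Gbps. */
	mbits = (blk_size_in_bytes * 8) / (1024 * 1024);
	mbits /= div;

	/* msec is 1 for both handled speeds; the patch adds 1 ms on top. */
	return mbits + 1;
}
```

Under these assumptions a 1MB block yields 9 ms at 1Gbps and 1 ms at 10Gbps,
because the integer division truncates 8/10 to 0 before the extra
millisecond is added.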

Changes:
C1) tpacket_rcv()
    C1.1) packet_current_frame() is replaced by packet_current_rx_frame().
          The bulk of the processing is then moved into the following chain:
          packet_current_rx_frame()
            __packet_lookup_frame_in_block()
              prb_fill_curr_block()
              or
              prb_retire_current_block()
              prb_dispatch_next_block()
              or
              return NULL (queue is plugged/paused)
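
The chain above can be modelled in user space roughly as follows. This is a
simplified sketch, not the patch's code: struct ring, struct blk,
open_block() and lookup_frame_in_block() are hypothetical stand-ins for
kbdq_core, block_desc, prb_open_block() and
__packet_lookup_frame_in_block(). Timers, locking and memory barriers are
left out, and the function returns a byte offset instead of a pointer.

```c
#include <assert.h>

#define ALIGN_4(x)	(((x) + 3u) & ~3u)

enum { ST_KERNEL = 0, ST_USER = 1 };	/* stand-ins for TP_STATUS_* */

struct blk {				/* much-reduced block_desc */
	unsigned int status;
	unsigned int used;		/* bytes consumed, header included */
	unsigned int num_pkts;
};

struct ring {				/* much-reduced kbdq_core */
	struct blk *blks;
	unsigned int nblks;
	unsigned int blk_size;
	unsigned int hdr_len;
	unsigned int active;
	int frozen;
};

void open_block(struct ring *r, struct blk *b)
{
	b->used = r->hdr_len;
	b->num_pkts = 0;
	r->frozen = 0;			/* opening a block thaws the queue */
}

/* Returns the offset of the new packet in the active block, -1 on drop. */
long lookup_frame_in_block(struct ring *r, unsigned int len)
{
	struct blk *b = &r->blks[r->active];
	unsigned int need = ALIGN_4(len);
	long off;

	/* Sketch-only precondition: the packet fits into an empty block. */
	assert(r->hdr_len + need < r->blk_size);

	if (r->frozen) {
		if (b->status == ST_USER)
			return -1;	/* user space still owns the block */
		open_block(r, b);
	}

	if (b->used + need >= r->blk_size) {
		/* Retire: hand the block to user space and advance. */
		b->status = ST_USER;
		r->active = (r->active + 1 < r->nblks) ? r->active + 1 : 0;
		b = &r->blks[r->active];

		/* Dispatch: freeze if the next block is not free yet. */
		if (b->status == ST_USER) {
			r->frozen = 1;
			return -1;
		}
		open_block(r, b);
	}

	/* Fill: reserve space for this packet in the active block. */
	off = b->used;
	b->used += need;
	b->num_pkts++;
	return off;
}
```

With two 64-byte blocks and a 16-byte header, two 20-byte packets fit into
each block, the fifth packet freezes the queue and is dropped, and capture
resumes once block 0 is handed back with its status set to ST_KERNEL.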

Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
---
 net/packet/af_packet.c |  881 +++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 836 insertions(+), 45 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b54ec41..bcbe6ec 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -40,6 +40,9 @@
  *					byte arrays at the end of sockaddr_ll
  *					and packet_mreq.
  *		Johann Baudy	:	Added TX RING.
+ *		Chetan Loke	:	Implemented TPACKET_V3 block abstraction
+ *					layer. Copyright (C) 2011, <lokec@ccs.neu.edu>
+ *
  *
  *		This program is free software; you can redistribute it and/or
  *		modify it under the terms of the GNU General Public License
@@ -161,9 +164,55 @@ struct packet_mreq_max {
 	unsigned char	mr_address[MAX_ADDR_LEN];
 };
 
-static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
+static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
 		int closing, int tx_ring);
 
+
+#define V3_ALIGNMENT	(4)
+#define ALIGN_4(x)	(((x)+V3_ALIGNMENT-1)&~(V3_ALIGNMENT-1))
+
+#define BLK_HDR_LEN	(ALIGN_4(sizeof(struct block_desc)))
+
+#define BLK_PLUS_PRIV(sz_of_priv) \
+	(BLK_HDR_LEN + ALIGN_4((sz_of_priv)))
+
+/* kbdq - kernel block descriptor queue */
+struct kbdq_core {
+	struct pgv	*pkbdq;
+	unsigned int	hdrlen;
+	unsigned char	reset_pending_on_curr_blk;
+	unsigned char   delete_blk_timer;
+	unsigned short	kactive_blk_num;
+	unsigned short	blk_sizeof_priv;
+
+	/* last_kactive_blk_num:
+	 * trick to see if user-space has caught up
+	 * in order to avoid refreshing timer when every single pkt arrives.
+	 */
+	unsigned short	last_kactive_blk_num;
+
+	char		*pkblk_start;
+	char		*pkblk_end;
+	int		kblk_size;
+	unsigned int	knum_blocks;
+	uint64_t	knxt_seq_num;
+	char		*prev;
+	char		*nxt_offset;
+
+
+	atomic_t	blk_fill_in_prog;
+
+	/* Default is set to 8ms */
+#define DEFAULT_PRB_RETIRE_TOV	(8)
+
+	unsigned short  retire_blk_tov;
+	unsigned short  version;
+	unsigned long	tov_in_jiffies;
+
+	/* timer to retire an outstanding block */
+	struct timer_list retire_blk_timer;
+};
+
 struct pgv {
 	char *buffer;
 };
@@ -179,18 +228,36 @@ struct packet_ring_buffer {
 	unsigned int		pg_vec_pages;
 	unsigned int		pg_vec_len;
 
+	struct kbdq_core	prb_bdqc;
 	atomic_t		pending;
 };
 
 struct packet_sock;
 static int tpacket_snd(struct packet_sock *po, struct msghdr *msg);
 
+static void *packet_previous_frame(struct packet_sock *po,
+		struct packet_ring_buffer *rb,
+		int status);
+static void packet_increment_head(struct packet_ring_buffer *buff);
+static int prb_curr_blk_in_use(struct kbdq_core *,
+			struct block_desc *);
+static void *prb_dispatch_next_block(struct kbdq_core *,
+			struct packet_sock *);
+static void prb_retire_current_block(struct kbdq_core *,
+		struct packet_sock *, unsigned int status);
+static int prb_queue_frozen(struct kbdq_core *);
+static void prb_open_block(struct kbdq_core *, struct block_desc *);
+static void prb_retire_rx_blk_timer_expired(unsigned long);
+static void _prb_refresh_rx_retire_blk_timer(struct kbdq_core *);
+static void prb_init_blk_timer(struct packet_sock *, struct kbdq_core *,
+				void (*func) (unsigned long));
 static void packet_flush_mclist(struct sock *sk);
 
 struct packet_sock {
 	/* struct sock has to be the first member of packet_sock */
 	struct sock		sk;
 	struct tpacket_stats	stats;
+	union  tpacket_stats_u	stats_u;
 	struct packet_ring_buffer	rx_ring;
 	struct packet_ring_buffer	tx_ring;
 	int			copy_thresh;
@@ -222,6 +289,19 @@ struct packet_skb_cb {
 
 #define PACKET_SKB_CB(__skb)	((struct packet_skb_cb *)((__skb)->cb))
 
+#define GET_PBDQC_FROM_RB(x)	((struct kbdq_core *)(&(x)->prb_bdqc))
+
+#define GET_PBLOCK_DESC(x, bid)	((struct block_desc *)((x)->pkbdq[(bid)].buffer))
+
+#define GET_CURR_PBLOCK_DESC_FROM_CORE(x)	\
+	((struct block_desc *)((x)->pkbdq[(x)->kactive_blk_num].buffer))
+
+
+#define GET_NEXT_PRB_BLK_NUM(x) \
+	(((x)->kactive_blk_num < ((x)->knum_blocks-1)) ? \
+	((x)->kactive_blk_num+1) : 0)
+
+
 static inline __pure struct page *pgv_to_page(void *addr)
 {
 	if (is_vmalloc_addr(addr))
@@ -247,6 +327,7 @@ static void __packet_set_status(struct packet_sock *po, void *frame, int status)
 		h.h2->tp_status = status;
 		flush_dcache_page(pgv_to_page(&h.h2->tp_status));
 		break;
+	case TPACKET_V3:
 	default:
 		pr_err("TPACKET version not supported\n");
 		BUG();
@@ -273,6 +354,7 @@ static int __packet_get_status(struct packet_sock *po, void *frame)
 	case TPACKET_V2:
 		flush_dcache_page(pgv_to_page(&h.h2->tp_status));
 		return h.h2->tp_status;
+	case TPACKET_V3:
 	default:
 		pr_err("TPACKET version not supported\n");
 		BUG();
@@ -311,6 +393,618 @@ static inline void *packet_current_frame(struct packet_sock *po,
 	return packet_lookup_frame(po, rb, rb->head, status);
 }
 
+static void prb_del_retire_blk_timer(struct kbdq_core *pkc)
+{
+	del_timer_sync(&pkc->retire_blk_timer);
+}
+
+static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
+		int tx_ring,
+		struct sk_buff_head *rb_queue)
+{
+	struct kbdq_core *pkc;
+
+	pkc = tx_ring ? &po->tx_ring.prb_bdqc : &po->rx_ring.prb_bdqc;
+
+	spin_lock(&rb_queue->lock);
+	pkc->delete_blk_timer = 1;
+	spin_unlock(&rb_queue->lock);
+
+	prb_del_retire_blk_timer(pkc);
+}
+
+static void prb_init_blk_timer(struct packet_sock *po,
+		struct kbdq_core *pkc,
+		void (*func) (unsigned long))
+{
+	init_timer(&pkc->retire_blk_timer);
+	pkc->retire_blk_timer.data = (long)po;
+	pkc->retire_blk_timer.function = func;
+	pkc->retire_blk_timer.expires = jiffies;
+}
+
+static void prb_setup_retire_blk_timer(struct packet_sock *po, int tx_ring)
+{
+	struct kbdq_core *pkc;
+
+	if (tx_ring)
+		BUG();
+
+	pkc = tx_ring ? &po->tx_ring.prb_bdqc : &po->rx_ring.prb_bdqc;
+	prb_init_blk_timer(po, pkc, prb_retire_rx_blk_timer_expired);
+}
+
+static int prb_calc_retire_blk_tmo(struct packet_sock *po,
+				int blk_size_in_bytes)
+{
+	struct net_device *dev;
+	unsigned int mbits = 0, msec = 0, div = 0, tmo = 0;
+
+	dev = dev_get_by_index(sock_net(&po->sk), po->ifindex);
+	if (unlikely(dev == NULL))
+		return DEFAULT_PRB_RETIRE_TOV;
+
+	if (dev->ethtool_ops && dev->ethtool_ops->get_settings) {
+		struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET, };
+
+		if (!dev->ethtool_ops->get_settings(dev, &ecmd)) {
+			switch (ecmd.speed) {
+			case SPEED_10000:
+				msec = 1;
+				div = 10000/1000;
+				break;
+			case SPEED_1000:
+				msec = 1;
+				div = 1000/1000;
+				break;
+			/*
+			 * If the link speed is so slow you don't really
+			 * need to worry about perf anyways
+			 */
+			case SPEED_100:
+			case SPEED_10:
+			default:
+				return DEFAULT_PRB_RETIRE_TOV;
+			}
+		}
+	}
+
+	mbits = (blk_size_in_bytes * 8) / (1024 * 1024);
+
+	if (div)
+		mbits /= div;
+
+	tmo = mbits * msec;
+
+	if (div)
+		return tmo+1;
+	return tmo;
+}
+
+static void init_prb_bdqc(struct packet_sock *po,
+			struct packet_ring_buffer *rb,
+			struct pgv *pg_vec,
+			union tpacket_req_u *req_u, int tx_ring)
+{
+	struct kbdq_core *p1 = &rb->prb_bdqc;
+	struct block_desc *pbd;
+
+	memset(p1, 0x0, sizeof(*p1));
+	p1->knxt_seq_num = 1;
+	p1->pkbdq = pg_vec;
+	pbd = (struct block_desc *)pg_vec[0].buffer;
+	p1->pkblk_start	= (char *)pg_vec[0].buffer;
+	p1->kblk_size = req_u->req3.tp_block_size;
+	p1->knum_blocks	= req_u->req3.tp_block_nr;
+	p1->hdrlen = po->tp_hdrlen;
+	p1->version = po->tp_version;
+	p1->last_kactive_blk_num = 0;
+	po->stats_u.stats3.tp_freeze_q_cnt = 0;
+	if (req_u->req3.tp_retire_blk_tov)
+		p1->retire_blk_tov = req_u->req3.tp_retire_blk_tov;
+	else
+		p1->retire_blk_tov = prb_calc_retire_blk_tmo(po,
+						req_u->req3.tp_block_size);
+	p1->tov_in_jiffies = msecs_to_jiffies(p1->retire_blk_tov);
+	p1->blk_sizeof_priv = req_u->req3.tp_sizeof_priv;
+	prb_setup_retire_blk_timer(po, tx_ring);
+	prb_open_block(p1, pbd);
+}
+
+/*  Do NOT update the last_blk_num first.
+ *  Assumes sk_buff_head lock is held.
+ */
+static void _prb_refresh_rx_retire_blk_timer(struct kbdq_core *pkc)
+{
+	mod_timer(&pkc->retire_blk_timer,
+			jiffies + pkc->tov_in_jiffies);
+	pkc->last_kactive_blk_num = pkc->kactive_blk_num;
+}
+
+/*
+ * Timer logic:
+ * 1) We refresh the timer only when we open a block.
+ *    By doing this we don't waste cycles refreshing the timer
+ *	  on packet-by-packet basis.
+ *
+ * With a 1MB block-size, on a 1Gbps line, it will take
+ * i) ~8 ms to fill a block + ii) memcpy etc.
+ * In this cut we are not accounting for the memcpy time.
+ *
+ * So, if the user sets the 'tmo' to 10ms then the timer
+ * will never fire while the block is still getting filled
+ * (which is what we want). However, the user could choose
+ * to close a block early and that's fine.
+ *
+ * But when the timer does fire, we check whether or not to refresh it.
+ * Since the tmo granularity is in msecs, it is not too expensive
+ * to refresh the timer, lets say every '8' msecs.
+ * Either the user can set the 'tmo' or we can derive it based on
+ * a) line-speed and b) block-size.
+ * prb_calc_retire_blk_tmo() calculates the tmo.
+ *
+ */
+static void prb_retire_rx_blk_timer_expired(unsigned long data)
+{
+	struct packet_sock *po = (struct packet_sock *)data;
+	struct kbdq_core *pkc = &po->rx_ring.prb_bdqc;
+	unsigned int frozen;
+	struct block_desc *pbd;
+
+	spin_lock(&po->sk.sk_receive_queue.lock);
+
+	frozen = prb_queue_frozen(pkc);
+	pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
+
+	if (unlikely(pkc->delete_blk_timer))
+		goto out;
+
+	/* We only need to plug the race when the block is partially filled.
+	 * tpacket_rcv:
+	 *		lock(); increment BLOCK_NUM_PKTS; unlock()
+	 *		copy_bits() is in progress ...
+	 * timer fires on other cpu:
+	 *		we can't retire the current block because copy_bits
+	 *		is in progress.
+	 *
+	 */
+	if (BLOCK_NUM_PKTS(pbd)) {
+		while (atomic_read(&pkc->blk_fill_in_prog)) {
+			/* Waiting for skb_copy_bits to finish... */
+			cpu_relax();
+		}
+	}
+
+	if (pkc->last_kactive_blk_num == pkc->kactive_blk_num) {
+		if (!frozen) {
+			prb_retire_current_block(pkc, po, TP_STATUS_BLK_TMO);
+			if (!prb_dispatch_next_block(pkc, po))
+				goto refresh_timer;
+			else
+				goto out;
+		} else {
+			/* Case 1. Queue was frozen because user-space was
+			 *	   lagging behind.
+			 */
+			if (prb_curr_blk_in_use(pkc, pbd)) {
+			       /*
+				* Ok, user-space is still behind.
+				* So just refresh the timer.
+				*/
+				goto refresh_timer;
+			} else {
+			       /* Case 2. queue was frozen, user-space caught up,
+				* now the link went idle && the timer fired.
+				* We don't have a block to close. So we open this
+				* block and restart the timer.
+				* opening a block thaws the queue, restarts timer.
+				* Thawing/timer-refresh is a side effect.
+				*/
+				prb_open_block(pkc, pbd);
+				goto out;
+			}
+		}
+	}
+
+refresh_timer:
+	_prb_refresh_rx_retire_blk_timer(pkc);
+
+out:
+	spin_unlock(&po->sk.sk_receive_queue.lock);
+}
+
+static inline void prb_flush_block(struct kbdq_core *pkc1, struct block_desc *pbd1,
+			__u32 status)
+{
+	/* Flush everything minus the block header */
+
+#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 1
+	u8 *start, *end;
+
+	start = (u8 *)pbd1;
+
+	/* Skip the block header(we know header WILL fit in 4K) */
+	start += PAGE_SIZE;
+
+	end = (u8 *)PAGE_ALIGN((unsigned long)pkc1->pkblk_end);
+	for (; start < end; start += PAGE_SIZE)
+		flush_dcache_page(pgv_to_page(start));
+
+	smp_wmb();
+#endif
+
+	/* Now update the block status. */
+
+	BLOCK_STATUS(pbd1) = status;
+
+	/* Flush the block header */
+
+#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 1
+	start = (u8 *)pbd1;
+	flush_dcache_page(pgv_to_page(start));
+
+	smp_wmb();
+#endif
+}
+
+/*
+ * Side effect:
+ *
+ * 1) flush the block
+ * 2) Increment kactive_blk_num
+ *
+ * Note:We DONT refresh the timer on purpose.
+ *	Because almost always the next block will be opened.
+ */
+static void prb_close_block(struct kbdq_core *pkc1, struct block_desc *pbd1,
+		struct packet_sock *po, unsigned int stat)
+{
+	__u32 status = TP_STATUS_USER | stat;
+
+	struct tpacket3_hdr *last_pkt;
+	struct bd_v1 *b1 = &pbd1->bd1;
+
+	if (po->stats.tp_drops)
+		status |= TP_STATUS_LOSING;
+
+	last_pkt = (struct tpacket3_hdr *)pkc1->prev;
+	last_pkt->tp_next_offset = 0;
+
+	/* Get the ts of the last pkt */
+	if (BLOCK_NUM_PKTS(pbd1)) {
+		b1->ts_last_pkt.ts_sec = last_pkt->tp_sec;
+		b1->ts_last_pkt.ts_nsec	= last_pkt->tp_nsec;
+	} else {
+		/* Ok, we tmo'd - so get the current time */
+		struct timespec ts;
+		getnstimeofday(&ts);
+		b1->ts_last_pkt.ts_sec = ts.tv_sec;
+		b1->ts_last_pkt.ts_nsec	= ts.tv_nsec;
+	}
+
+	smp_wmb();
+
+	/* Flush the block */
+	prb_flush_block(pkc1, pbd1, status);
+
+	pkc1->kactive_blk_num = GET_NEXT_PRB_BLK_NUM(pkc1);
+}
+
+static inline void prb_thaw_queue(struct kbdq_core *pkc)
+{
+	pkc->reset_pending_on_curr_blk = 0;
+}
+
+/*
+ * Side effect of opening a block:
+ *
+ * 1) prb_queue is thawed.
+ * 2) retire_blk_timer is refreshed.
+ *
+ */
+static void prb_open_block(struct kbdq_core *pkc1, struct block_desc *pbd1)
+{
+	struct timespec ts;
+	struct bd_v1 *b1 = &pbd1->bd1;
+
+	smp_rmb();
+
+	if (likely(TP_STATUS_KERNEL == BLOCK_STATUS(pbd1))) {
+
+		/* We could have just memset this but we will lose the flexibility of
+		 * making the priv area sticky
+		 */
+		BLOCK_SNUM(pbd1) = pkc1->knxt_seq_num++;
+		BLOCK_NUM_PKTS(pbd1) = 0;
+		BLOCK_LEN(pbd1) = BLK_PLUS_PRIV(pkc1->blk_sizeof_priv);
+		getnstimeofday(&ts);
+		b1->ts_first_pkt.ts_sec = ts.tv_sec;
+		b1->ts_first_pkt.ts_nsec = ts.tv_nsec;
+		pkc1->pkblk_start = (char *)pbd1;
+		pkc1->nxt_offset = (char *)(pkc1->pkblk_start+BLK_PLUS_PRIV(pkc1->blk_sizeof_priv));
+		BLOCK_O2FP(pbd1) = (__u32)BLK_PLUS_PRIV(pkc1->blk_sizeof_priv);
+		BLOCK_O2PRIV(pbd1) = (__u16)BLK_HDR_LEN;
+		pbd1->version = pkc1->version;
+		pkc1->prev = pkc1->nxt_offset;
+		pkc1->pkblk_end = pkc1->pkblk_start + pkc1->kblk_size;
+		prb_thaw_queue(pkc1);
+		_prb_refresh_rx_retire_blk_timer(pkc1);
+
+		smp_wmb();
+
+		return;
+	}
+
+	WARN(1, "ERROR block:%p is NOT FREE status:%d kactive_blk_num:%d\n",
+		pbd1, BLOCK_STATUS(pbd1), pkc1->kactive_blk_num);
+	dump_stack();
+	BUG();
+}
+
+/*
+ * Queue freeze logic:
+ * 1) Assume tp_block_nr = 8 blocks.
+ * 2) At time 't0', user opens Rx ring.
+ * 3) Some time past 't0', kernel starts filling blocks starting from 0 .. 7
+ * 4) user-space is either sleeping or processing block '0'.
+ * 5) tpacket_rcv is currently filling block '7', since there is no space left,
+ *    it will close block-7,loop around and try to fill block '0'.
+ *    call-flow:
+ *    __packet_lookup_frame_in_block
+ *      prb_retire_current_block()
+ *      prb_dispatch_next_block()
+ *        |->(BLOCK_STATUS == USER) evaluates to true
+ *    5.1) Since block-0 is currently in-use, we just freeze the queue.
+ * 6) Now there are two cases:
+ *    6.1) Link goes idle right after the queue is frozen.
+ *         But remember, the last open_block() refreshed the timer.
+ *         When this timer expires,it will refresh itself so that we can
+ *         re-open block-0 in near future.
+ *    6.2) Link is busy and keeps on receiving packets. This is a simple
+ *         case and __packet_lookup_frame_in_block will check if block-0
+ *         is free and can now be re-used.
+ */
+static inline void prb_freeze_queue(struct kbdq_core *pkc,
+				  struct packet_sock *po)
+{
+	pkc->reset_pending_on_curr_blk = 1;
+	po->stats_u.stats3.tp_freeze_q_cnt++;
+}
+
+#define TOTAL_PKT_LEN_INCL_ALIGN(length) (ALIGN_4((length)))
+
+/*
+ * If the next block is free then we will dispatch it
+ * and return a good offset.
+ * Else, we will freeze the queue.
+ * So, caller must check the return value.
+ */
+static void *prb_dispatch_next_block(struct kbdq_core *pkc,
+		struct packet_sock *po)
+{
+	struct block_desc *pbd;
+
+	smp_rmb();
+
+	/* 1. Get current block num */
+	pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
+
+	/* 2. If this block is currently in_use then freeze the queue */
+	if (TP_STATUS_USER & BLOCK_STATUS(pbd)) {
+		prb_freeze_queue(pkc, po);
+		return NULL;
+	}
+
+	/*
+	 * 3.
+	 * open this block and return the offset where the first packet
+	 * needs to get stored.
+	 */
+	prb_open_block(pkc, pbd);
+	return (void *)pkc->nxt_offset;
+}
+
+static void prb_retire_current_block(struct kbdq_core *pkc,
+		struct packet_sock *po, unsigned int status)
+{
+	struct block_desc *pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
+
+	/* retire/close the current block */
+	if (likely(TP_STATUS_KERNEL == BLOCK_STATUS(pbd))) {
+		/*
+		 * Plug the case where copy_bits() is in progress on
+		 * cpu-0 and tpacket_rcv() got invoked on cpu-1, didn't
+		 * have space to copy the pkt in the current block and
+		 * called prb_retire_current_block()
+		 *
+		 * TODO:DURING REVIEW ASK IF THIS IS A VALID RACE.
+		 *	MAIN CONCERN IS ABOUT r[f/p]s THREADS(?) EXECUTING
+		 *	IN PARALLEL.
+		 *
+		 * We don't need to worry about the TMO case because
+		 * the timer-handler already handled this case.
+		 */
+		if (!(status & TP_STATUS_BLK_TMO)) {
+			while (atomic_read(&pkc->blk_fill_in_prog)) {
+				/* Waiting for skb_copy_bits to finish... */
+				cpu_relax();
+			}
+		}
+		prb_close_block(pkc, pbd, po, status);
+		return;
+	}
+
+	WARN(1, "ERROR-pbd[%d]:%p\n", pkc->kactive_blk_num, pbd);
+	dump_stack();
+	BUG();
+}
+
+static inline int prb_curr_blk_in_use(struct kbdq_core *pkc,
+				      struct block_desc *pbd)
+{
+	return TP_STATUS_USER & BLOCK_STATUS(pbd);
+}
+
+static inline int prb_queue_frozen(struct kbdq_core *pkc)
+{
+	return pkc->reset_pending_on_curr_blk;
+}
+
+static inline void prb_clear_blk_fill_status(struct packet_ring_buffer *rb)
+{
+	struct kbdq_core *pkc  = GET_PBDQC_FROM_RB(rb);
+	atomic_dec(&pkc->blk_fill_in_prog);
+}
+
+static inline void prb_fill_curr_block(char *curr, struct kbdq_core *pkc,
+				struct block_desc *pbd,
+				unsigned int len)
+{
+	struct tpacket3_hdr *ppd;
+
+	ppd  = (struct tpacket3_hdr *)curr;
+	ppd->tp_next_offset = TOTAL_PKT_LEN_INCL_ALIGN(len);
+	pkc->prev = curr;
+	pkc->nxt_offset += TOTAL_PKT_LEN_INCL_ALIGN(len);
+	BLOCK_LEN(pbd) += TOTAL_PKT_LEN_INCL_ALIGN(len);
+	BLOCK_NUM_PKTS(pbd) += 1;
+	atomic_inc(&pkc->blk_fill_in_prog);
+}
+
+/* Assumes caller has the sk->rx_queue.lock */
+static void *__packet_lookup_frame_in_block(struct packet_ring_buffer *rb,
+					    int status,
+					    unsigned int len,
+					    struct packet_sock *po)
+{
+	struct kbdq_core *pkc  = GET_PBDQC_FROM_RB(rb);
+	struct block_desc *pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
+	char *curr, *end;
+
+	/* Queue is frozen when user space is lagging behind */
+	if (prb_queue_frozen(pkc)) {
+		/*
+		 * Check if that last block which caused the queue to freeze,
+		 * is still in_use by user-space.
+		 */
+		if (prb_curr_blk_in_use(pkc, pbd)) {
+			/* Can't record this packet */
+			return NULL;
+		} else {
+			/*
+			 * Ok, the block was released by user-space.
+			 * Now let's open that block.
+			 * opening a block also thaws the queue.
+			 * Thawing is a side effect.
+			 */
+			prb_open_block(pkc, pbd);
+		}
+	}
+
+	smp_mb();
+	curr = pkc->nxt_offset;
+	end = (char *) ((char *)pbd + pkc->kblk_size);
+
+	/* first try the current block */
+	if (curr+TOTAL_PKT_LEN_INCL_ALIGN(len) < end) {
+		prb_fill_curr_block(curr, pkc, pbd, len);
+		return (void *)curr;
+	}
+
+	/* Ok, close the current block */
+	prb_retire_current_block(pkc, po, 0);
+
+	/* Now, try to dispatch the next block */
+	curr = (char *)prb_dispatch_next_block(pkc, po);
+	if (curr) {
+		pbd = GET_CURR_PBLOCK_DESC_FROM_CORE(pkc);
+		prb_fill_curr_block(curr, pkc, pbd, len);
+		return (void *)curr;
+	}
+
+	/*
+	 * No free blocks are available. User space hasn't caught up yet.
+	 * Queue was just frozen and now this packet will get dropped.
+	 */
+	return NULL;
+}
+
+static inline void *packet_current_rx_frame(struct packet_sock *po,
+					    struct packet_ring_buffer *rb,
+					    int status, unsigned int len)
+{
+	char *curr = NULL;
+	switch (po->tp_version) {
+	case TPACKET_V1:
+	case TPACKET_V2:
+		curr = packet_lookup_frame(po, rb, rb->head, status);
+		return curr;
+	case TPACKET_V3:
+		return __packet_lookup_frame_in_block(rb, status, len, po);
+	default:
+		WARN(1, "TPACKET version not supported\n");
+		BUG();
+		return 0;
+	}
+}
+
+static inline void *prb_lookup_block(struct packet_sock *po,
+				     struct packet_ring_buffer *rb,
+				     unsigned int previous,
+				     int status)
+{
+	struct kbdq_core *pkc  = GET_PBDQC_FROM_RB(rb);
+	struct block_desc *pbd = GET_PBLOCK_DESC(pkc, previous);
+
+	if (status != BLOCK_STATUS(pbd))
+		return NULL;
+	return pbd;
+}
+
+static inline int prb_previous_blk_num(struct packet_ring_buffer *rb)
+{
+	unsigned int prev;
+	if (rb->prb_bdqc.kactive_blk_num)
+		prev = rb->prb_bdqc.kactive_blk_num-1;
+	else
+		prev = rb->prb_bdqc.knum_blocks-1;
+	return prev;
+}
+
+/* Assumes caller has held the rx_queue.lock */
+static inline void *__prb_previous_block(struct packet_sock *po,
+					 struct packet_ring_buffer *rb,
+					 int status)
+{
+	unsigned int previous = prb_previous_blk_num(rb);
+	return prb_lookup_block(po, rb, previous, status);
+}
+
+static inline void *packet_previous_rx_frame(struct packet_sock *po,
+					     struct packet_ring_buffer *rb,
+					     int status)
+{
+	if (po->tp_version <= TPACKET_V2)
+		return packet_previous_frame(po, rb, status);
+
+	return __prb_previous_block(po, rb, status);
+}
+
+static inline void packet_increment_rx_head(struct packet_sock *po,
+					    struct packet_ring_buffer *rb)
+{
+	switch (po->tp_version) {
+	case TPACKET_V1:
+	case TPACKET_V2:
+		return packet_increment_head(rb);
+	case TPACKET_V3:
+	default:
+		WARN(1, "TPACKET version not supported.\n");
+		BUG();
+		return;
+	}
+}
+
 static inline void *packet_previous_frame(struct packet_sock *po,
 		struct packet_ring_buffer *rb,
 		int status)
@@ -675,12 +1369,13 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 	union {
 		struct tpacket_hdr *h1;
 		struct tpacket2_hdr *h2;
+		struct tpacket3_hdr *h3;
 		void *raw;
 	} h;
 	u8 *skb_head = skb->data;
 	int skb_len = skb->len;
 	unsigned int snaplen, res;
-	unsigned long status = TP_STATUS_LOSING|TP_STATUS_USER;
+	unsigned long status = TP_STATUS_USER;
 	unsigned short macoff, netoff, hdrlen;
 	struct sk_buff *copy_skb = NULL;
 	struct timeval tv;
@@ -726,37 +1421,46 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 			po->tp_reserve;
 		macoff = netoff - maclen;
 	}
-
-	if (macoff + snaplen > po->rx_ring.frame_size) {
-		if (po->copy_thresh &&
-		    atomic_read(&sk->sk_rmem_alloc) + skb->truesize <
-		    (unsigned)sk->sk_rcvbuf) {
-			if (skb_shared(skb)) {
-				copy_skb = skb_clone(skb, GFP_ATOMIC);
-			} else {
-				copy_skb = skb_get(skb);
-				skb_head = skb->data;
+	if (po->tp_version <= TPACKET_V2) {
+		if (macoff + snaplen > po->rx_ring.frame_size) {
+			if (po->copy_thresh &&
+				atomic_read(&sk->sk_rmem_alloc) + skb->truesize <
+				(unsigned)sk->sk_rcvbuf) {
+				if (skb_shared(skb)) {
+					copy_skb = skb_clone(skb, GFP_ATOMIC);
+				} else {
+					copy_skb = skb_get(skb);
+					skb_head = skb->data;
+				}
+				if (copy_skb)
+					skb_set_owner_r(copy_skb, sk);
 			}
-			if (copy_skb)
-				skb_set_owner_r(copy_skb, sk);
+			snaplen = po->rx_ring.frame_size - macoff;
+			if ((int)snaplen < 0)
+				snaplen = 0;
 		}
-		snaplen = po->rx_ring.frame_size - macoff;
-		if ((int)snaplen < 0)
-			snaplen = 0;
 	}
-
 	spin_lock(&sk->sk_receive_queue.lock);
-	h.raw = packet_current_frame(po, &po->rx_ring, TP_STATUS_KERNEL);
+	h.raw = packet_current_rx_frame(po, &po->rx_ring,
+					TP_STATUS_KERNEL, (macoff+snaplen));
 	if (!h.raw)
 		goto ring_is_full;
-	packet_increment_head(&po->rx_ring);
+	if (po->tp_version <= TPACKET_V2) {
+		packet_increment_rx_head(po, &po->rx_ring);
+	/*
+	 * LOSING will be reported till you read the stats,
+	 * because it's COR - Clear On Read.
+	 * Anyways, moving it for V1/V2 only as V3 doesn't need this
+	 * at packet level.
+	 */
+		if (po->stats.tp_drops)
+			status |= TP_STATUS_LOSING;
+	}
 	po->stats.tp_packets++;
 	if (copy_skb) {
 		status |= TP_STATUS_COPY;
 		__skb_queue_tail(&sk->sk_receive_queue, copy_skb);
 	}
-	if (!po->stats.tp_drops)
-		status &= ~TP_STATUS_LOSING;
 	spin_unlock(&sk->sk_receive_queue.lock);
 
 	skb_copy_bits(skb, 0, h.raw + macoff, snaplen);
@@ -806,6 +1510,36 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 		}
 		hdrlen = sizeof(*h.h2);
 		break;
+	case TPACKET_V3:
+		/* tp_next_offset is already populated above.
+		 * So DONT clear those fields here
+		 */
+		h.h3->tp_status = status;
+		h.h3->tp_len = skb->len;
+		h.h3->tp_snaplen = snaplen;
+		h.h3->tp_mac = macoff;
+		h.h3->tp_net = netoff;
+		if ((po->tp_tstamp & SOF_TIMESTAMPING_SYS_HARDWARE)
+				&& shhwtstamps->syststamp.tv64)
+			ts = ktime_to_timespec(shhwtstamps->syststamp);
+		else if ((po->tp_tstamp & SOF_TIMESTAMPING_RAW_HARDWARE)
+				&& shhwtstamps->hwtstamp.tv64)
+			ts = ktime_to_timespec(shhwtstamps->hwtstamp);
+		else if (skb->tstamp.tv64)
+			ts = ktime_to_timespec(skb->tstamp);
+		else
+			getnstimeofday(&ts);
+		h.h3->tp_sec  = ts.tv_sec;
+		h.h3->tp_nsec = ts.tv_nsec;
+		if (vlan_tx_tag_present(skb)) {
+			h.h3->tp_vlan_tci = vlan_tx_tag_get(skb);
+			h.h3->tp_status |= TP_STATUS_VLAN_VALID;
+		} else {
+			h.h3->tp_vlan_tci = 0;
+		}
+		h.h3->tp_padding = 0;
+		hdrlen = sizeof(*h.h3);
+		break;
 	default:
 		BUG();
 	}
@@ -820,18 +1554,22 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 		sll->sll_ifindex = orig_dev->ifindex;
 	else
 		sll->sll_ifindex = dev->ifindex;
-
-	__packet_set_status(po, h.raw, status);
+	if (po->tp_version <= TPACKET_V2)
+		__packet_set_status(po, h.raw, status);
 	smp_mb();
 #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE == 1
 	{
 		u8 *start, *end;
 
-		end = (u8 *)PAGE_ALIGN((unsigned long)h.raw + macoff + snaplen);
-		for (start = h.raw; start < end; start += PAGE_SIZE)
-			flush_dcache_page(pgv_to_page(start));
+		if (po->tp_version <= TPACKET_V2) {
+			end = (u8 *)PAGE_ALIGN((unsigned long)h.raw + macoff + snaplen);
+			for (start = h.raw; start < end; start += PAGE_SIZE)
+				flush_dcache_page(pgv_to_page(start));
+		}
 	}
 #endif
+	if (po->tp_version > TPACKET_V2)
+		prb_clear_blk_fill_status(&po->rx_ring);
 
 	sk->sk_data_ready(sk, 0);
 
@@ -1322,7 +2060,7 @@ static int packet_release(struct socket *sock)
 	struct sock *sk = sock->sk;
 	struct packet_sock *po;
 	struct net *net;
-	struct tpacket_req req;
+	union tpacket_req_u req_u;
 
 	if (!sk)
 		return 0;
@@ -1353,13 +2091,13 @@ static int packet_release(struct socket *sock)
 
 	packet_flush_mclist(sk);
 
-	memset(&req, 0, sizeof(req));
+	memset(&req_u, 0, sizeof(req_u));
 
 	if (po->rx_ring.pg_vec)
-		packet_set_ring(sk, &req, 1, 0);
+		packet_set_ring(sk, &req_u, 1, 0);
 
 	if (po->tx_ring.pg_vec)
-		packet_set_ring(sk, &req, 1, 1);
+		packet_set_ring(sk, &req_u, 1, 1);
 
 	synchronize_net();
 	/*
@@ -1988,15 +2726,26 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
 	case PACKET_RX_RING:
 	case PACKET_TX_RING:
 	{
-		struct tpacket_req req;
+		union tpacket_req_u req_u;
+		int len;
 
-		if (optlen < sizeof(req))
+		switch (po->tp_version) {
+		case TPACKET_V1:
+		case TPACKET_V2:
+			len = sizeof(req_u.req);
+			break;
+		case TPACKET_V3:
+		default:
+			len = sizeof(req_u.req3);
+			break;
+		}
+		if (optlen < len)
 			return -EINVAL;
 		if (pkt_sk(sk)->has_vnet_hdr)
 			return -EINVAL;
-		if (copy_from_user(&req, optval, sizeof(req)))
+		if (copy_from_user(&req_u.req, optval, len))
 			return -EFAULT;
-		return packet_set_ring(sk, &req, 0, optname == PACKET_TX_RING);
+		return packet_set_ring(sk, &req_u, 0, optname == PACKET_TX_RING);
 	}
 	case PACKET_COPY_THRESH:
 	{
@@ -2023,6 +2772,7 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
 		switch (val) {
 		case TPACKET_V1:
 		case TPACKET_V2:
+		case TPACKET_V3:
 			po->tp_version = val;
 			return 0;
 		default:
@@ -2121,6 +2871,7 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 	struct packet_sock *po = pkt_sk(sk);
 	void *data;
 	struct tpacket_stats st;
+	union tpacket_stats_u st_u;
 
 	if (level != SOL_PACKET)
 		return -ENOPROTOOPT;
@@ -2133,15 +2884,26 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 
 	switch (optname) {
 	case PACKET_STATISTICS:
-		if (len > sizeof(struct tpacket_stats))
-			len = sizeof(struct tpacket_stats);
+		if (po->tp_version == TPACKET_V3) {
+			len = sizeof(struct tpacket_stats_v3);
+		} else {
+			if (len > sizeof(struct tpacket_stats))
+				len = sizeof(struct tpacket_stats);
+		}
 		spin_lock_bh(&sk->sk_receive_queue.lock);
-		st = po->stats;
+		if (po->tp_version == TPACKET_V3) {
+			memcpy(&st_u.stats3, &po->stats,
+			sizeof(struct tpacket_stats));
+			st_u.stats3.tp_freeze_q_cnt = po->stats_u.stats3.tp_freeze_q_cnt;
+			st_u.stats3.tp_packets += po->stats.tp_drops;
+			data = &st_u.stats3;
+		} else {
+			st = po->stats;
+			st.tp_packets += st.tp_drops;
+			data = &st;
+		}
 		memset(&po->stats, 0, sizeof(st));
 		spin_unlock_bh(&sk->sk_receive_queue.lock);
-		st.tp_packets += st.tp_drops;
-
-		data = &st;
 		break;
 	case PACKET_AUXDATA:
 		if (len > sizeof(int))
@@ -2182,6 +2944,9 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 		case TPACKET_V2:
 			val = sizeof(struct tpacket2_hdr);
 			break;
+		case TPACKET_V3:
+			val = sizeof(struct tpacket3_hdr);
+			break;
 		default:
 			return -EINVAL;
 		}
@@ -2334,7 +3099,7 @@ static unsigned int packet_poll(struct file *file, struct socket *sock,
 
 	spin_lock_bh(&sk->sk_receive_queue.lock);
 	if (po->rx_ring.pg_vec) {
-		if (!packet_previous_frame(po, &po->rx_ring, TP_STATUS_KERNEL))
+		if (!packet_previous_rx_frame(po, &po->rx_ring, TP_STATUS_KERNEL))
 			mask |= POLLIN | POLLRDNORM;
 	}
 	spin_unlock_bh(&sk->sk_receive_queue.lock);
@@ -2453,7 +3218,7 @@ out_free_pgvec:
 	goto out;
 }
 
-static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
+static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
 		int closing, int tx_ring)
 {
 	struct pgv *pg_vec = NULL;
@@ -2462,7 +3227,15 @@ static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
 	struct packet_ring_buffer *rb;
 	struct sk_buff_head *rb_queue;
 	__be16 num;
-	int err;
+	int err = -EINVAL;
+	/* Added to minimize code churn */
+	struct tpacket_req *req = &req_u->req;
+
+	/* Opening a Tx-ring is NOT supported in TPACKET_V3 */
+	if (!closing && tx_ring && (po->tp_version > TPACKET_V2)) {
+		WARN(1, "Tx-ring is not supported.\n");
+		goto out;
+	}
 
 	rb = tx_ring ? &po->tx_ring : &po->rx_ring;
 	rb_queue = tx_ring ? &sk->sk_write_queue : &sk->sk_receive_queue;
@@ -2488,6 +3261,9 @@ static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
 		case TPACKET_V2:
 			po->tp_hdrlen = TPACKET2_HDRLEN;
 			break;
+		case TPACKET_V3:
+			po->tp_hdrlen = TPACKET3_HDRLEN;
+			break;
 		}
 
 		err = -EINVAL;
@@ -2513,6 +3289,17 @@ static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
 		pg_vec = alloc_pg_vec(req, order);
 		if (unlikely(!pg_vec))
 			goto out;
+		switch (po->tp_version) {
+		case TPACKET_V3:
+		/* Transmit path is not supported. We checked
+		 * it above but just being paranoid
+		 */
+			if (!tx_ring)
+				init_prb_bdqc(po, rb, pg_vec, req_u, tx_ring);
+			break;
+		default:
+			break;
+		}
 	}
 	/* Done */
 	else {
@@ -2569,7 +3356,11 @@ static int packet_set_ring(struct sock *sk, struct tpacket_req *req,
 		dev_add_pack(&po->prot_hook);
 	}
 	spin_unlock(&po->bind_lock);
-
+	if (closing && (po->tp_version > TPACKET_V2)) {
+		/* Because we don't support block-based V3 on tx-ring */
+		if (!tx_ring)
+			prb_shutdown_retire_blk_timer(po, tx_ring, rb_queue);
+	}
 	release_sock(sk);
 
 	if (pg_vec)
-- 
1.7.5.2



* [PATCH v2 net-next af-packet 1/2] Enhance af-packet to provide (near zero)lossless packet capture functionality.
From: Chetan Loke @ 2011-06-22  2:10 UTC (permalink / raw)
  To: netdev
  Cc: davem, eric.dumazet, joe, bhutchings, shemminger, linux-kernel,
	Chetan Loke
In-Reply-To: <1308708650-25509-1-git-send-email-loke.chetan@gmail.com>

Added TPACKET_V3 definitions

Signed-off-by: Chetan Loke <loke.chetan@gmail.com>
---
 include/linux/if_packet.h |  128 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 128 insertions(+), 0 deletions(-)

diff --git a/include/linux/if_packet.h b/include/linux/if_packet.h
index 6d66ce1..e5fad08 100644
--- a/include/linux/if_packet.h
+++ b/include/linux/if_packet.h
@@ -55,6 +55,17 @@ struct tpacket_stats {
 	unsigned int	tp_drops;
 };
 
+struct tpacket_stats_v3 {
+	unsigned int	tp_packets;
+	unsigned int	tp_drops;
+	unsigned int	tp_freeze_q_cnt;
+};
+
+union tpacket_stats_u {
+	struct tpacket_stats stats1;
+	struct tpacket_stats_v3 stats3;
+};
+
 struct tpacket_auxdata {
 	__u32		tp_status;
 	__u32		tp_len;
@@ -71,6 +82,7 @@ struct tpacket_auxdata {
 #define TP_STATUS_LOSING	0x4
 #define TP_STATUS_CSUMNOTREADY	0x8
 #define TP_STATUS_VLAN_VALID   0x10 /* auxdata has valid tp_vlan_tci */
+#define TP_STATUS_BLK_TMO	0x20
 
 /* Tx ring - header status */
 #define TP_STATUS_AVAILABLE	0x0
@@ -102,12 +114,114 @@ struct tpacket2_hdr {
 	__u32		tp_nsec;
 	__u16		tp_vlan_tci;
 };
+struct tpacket3_hdr {
+	__u32		tp_status;
+	__u32		tp_len;
+	__u32		tp_snaplen;
+	__u16		tp_mac;
+	__u16		tp_net;
+	__u32		tp_sec;
+	__u32		tp_nsec;
+	__u16		tp_vlan_tci;
+	__u16		tp_padding;
+	__u32		tp_next_offset;
+};
+
+struct bd_ts {
+	unsigned int ts_sec;
+	union {
+		struct {
+			unsigned int ts_usec;
+		};
+		struct {
+			unsigned int ts_nsec;
+		};
+	};
+} __attribute__ ((__packed__));
+
+struct bd_v1 {
+	/*
+	 * If you re-order the first 5 fields then
+	 * the BLOCK_XXX macros will NOT work.
+	 */
+	__u32	block_status;
+	__u32	num_pkts;
+	__u32	offset_to_first_pkt;
+
+	/* Number of valid bytes (including padding)
+	 * blk_len <= tp_block_size
+	 */
+	__u32	blk_len;
+
+	/*
+	 * Quite a few uses of sequence number:
+	 * 1. Make sure cache flush etc worked.
+	 *    Well, one can argue - why not use the increasing ts below?
+	 *    But look at 2. below first.
+	 * 2. When you pass around blocks to other user space decoders,
+	 *    you can see which blk[s] is[are] outstanding etc.
+	 * 3. Validate kernel code.
+	 */
+	__u64	seq_num;
+
+	/*
+	 * ts_last_pkt:
+	 *
+	 * Case 1.	Block has 'N'(N >=1) packets and TMO'd(timed out)
+	 *		ts_last_pkt == 'time-stamp of last packet' and NOT the
+	 *		time when the timer fired and the block was closed.
+	 *		By providing the ts of the last packet we can absolutely
+	 *		guarantee that time-stamp wise, the first packet in the next
+	 *		block will never precede the last packet of the previous
+	 *		block.
+	 * Case 2.	Block has zero packets and TMO'd
+	 *		ts_last_pkt = time when the timer fired and the block
+	 *		was closed.
+	 * Case 3.	Block has 'N' packets and NO TMO.
+	 *		ts_last_pkt = time-stamp of the last pkt in the block.
+	 *
+	 * ts_first_pkt:
+	 *		Is always the time-stamp when the block was opened.
+	 *		Case a)	ZERO packets
+	 *			No packets to deal with but at least you know the
+	 *			time-interval of this block.
+	 *		Case b) Non-zero packets
+	 *			Use the ts of the first packet in the block.
+	 *
+	 */
+	struct bd_ts	ts_first_pkt;
+	struct bd_ts	ts_last_pkt;
+} __attribute__ ((__packed__));
+
+struct block_desc {
+	__u16 version;
+	__u16 offset_to_priv;
+	union {
+		struct {
+			__u32	words[4];
+			__u64	dword;
+		} __attribute__ ((__packed__));
+		struct bd_v1 bd1;
+	};
+} __attribute__ ((__packed__));
+
+
 
 #define TPACKET2_HDRLEN		(TPACKET_ALIGN(sizeof(struct tpacket2_hdr)) + sizeof(struct sockaddr_ll))
+#define TPACKET3_HDRLEN		(TPACKET_ALIGN(sizeof(struct tpacket3_hdr)) + sizeof(struct sockaddr_ll))
+
+#define BLOCK_STATUS(x)	((x)->words[0])
+#define BLOCK_NUM_PKTS(x)	((x)->words[1])
+#define BLOCK_O2FP(x)		((x)->words[2])
+#define BLOCK_LEN(x)		((x)->words[3])
+#define BLOCK_SNUM(x)		((x)->dword)
+#define BLOCK_O2PRIV(x)	((x)->offset_to_priv)
+#define BLOCK_PRIV(x)		((void *)((char *)(x) + BLOCK_O2PRIV(x)))
 
 enum tpacket_versions {
 	TPACKET_V1,
 	TPACKET_V2,
+	TPACKET_V3,
 };
 
 /*
@@ -130,6 +244,20 @@ struct tpacket_req {
 	unsigned int	tp_frame_nr;	/* Total number of frames */
 };
 
+struct tpacket_req3 {
+	unsigned int	tp_block_size;	/* Minimal size of contiguous block */
+	unsigned int	tp_block_nr;	/* Number of blocks */
+	unsigned int	tp_frame_size;	/* Size of frame */
+	unsigned int	tp_frame_nr;	/* Total number of frames */
+	unsigned int	tp_retire_blk_tov; /* timeout in msecs */
+	unsigned int	tp_sizeof_priv; /* size of private data area */
+};
+
+union tpacket_req_u {
+	struct tpacket_req	req;
+	struct tpacket_req3	req3;
+};
+
 struct packet_mreq {
 	int		mr_ifindex;
 	unsigned short	mr_type;
-- 
1.7.5.2



* [PATCH v2 net-next af-packet 0/2] Enhance af-packet to provide (near zero)lossless packet capture functionality.
From: Chetan Loke @ 2011-06-22  2:10 UTC (permalink / raw)
  To: netdev
  Cc: davem, eric.dumazet, joe, bhutchings, shemminger, linux-kernel,
	Chetan Loke

Hello,

Please review the patchset.

Changes from v1:

1) v1 was based on 2.6.38.9. v2 is rebased to net-next.
2) Aligned bdqc members, pr_err to WARN, sob email      (Joe Perches)
3) Added tp_padding                                     (Eric Dumazet)
4) Nuked useless ;) white space                         (Stephen H)
5) Use __u types in headers                             (Ben Hutchings)
6) Added field for creating private area             	(Chetan Loke)

This patch attempts to:
1) Improve network capture visibility by increasing packet density
2) Assist in analyzing multiple (aggregated) capture ports.

Benefits:
  B1) ~15-20% reduction in cpu-usage.
  B2) ~20% increase in packet capture rate.
  B3) ~2x  increase in packet density.
  B4) Port aggregation analysis.
  B5) Non static frame size to capture entire packet payload.

With the current af_packet->rx::mmap based approach, the element size
in the block needs to be statically configured. Nothing wrong with this
config/implementation. But the traffic profile cannot be known in advance,
so it would be nice if that configuration wasn't static. Normally,
one would configure the element-size to be '2048' so that you can at least
capture the entire 'MTU-size'. But if the traffic profile varies then we
would end up either i) wasting memory or ii) getting a sliced frame.
In other words, the packet density will be much less in the first case.
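
To make the density argument concrete, here is a back-of-the-envelope
sketch in C. It is not code from the patch: the helper names are made up
for illustration, and the per-packet header length and alignment are
parameters that would have to be filled in from the real headers.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of a; a must be a power of two. */
size_t demo_align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Static layout: every packet occupies one fixed-size element. */
size_t demo_fixed_frames_per_block(size_t block_size, size_t frame_size)
{
	assert(frame_size > 0);
	return block_size / frame_size;
}

/*
 * Packed layout: packets are stored back to back, each one taking
 * only (per-packet header + payload), rounded up to the alignment.
 * blk_hdr_len is the space reserved at the start of the block.
 * All sizes here are illustrative assumptions.
 */
size_t demo_packed_pkts_per_block(size_t block_size, size_t blk_hdr_len,
				  size_t per_pkt_hdr, size_t pkt_len,
				  size_t align)
{
	size_t slot = demo_align_up(per_pkt_hdr + pkt_len, align);

	if (blk_hdr_len >= block_size)
		return 0;
	return (block_size - blk_hdr_len) / slot;
}
```

With a 1MB block, 2048-byte static elements give 512 slots regardless of
packet size. Packing 64-byte packets with an assumed 52-byte per-packet
header, 16-byte alignment and no block header gives 8192 slots. Real
traffic is a mix of sizes, so the gain in practice would be smaller than
this extreme case.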

--------------------
Performance results:
--------------------

Tpacket config(same on Physical/Virtual setup):
64 blocks(1MB block size)

**************
Physical setup
**************

pktgen: 64 byte traffic.

1G Intel
driver: igb
version: 2.1.0-k2
firmware-version: 3.19-0


Tpacket          V1            V3
capture-rate     600K pps      720K pps
cpu usage        70%           53%
Drop-rate        7-10%         ~1%

**********************
Virtual Machine setup:
**********************

pktgen: 64 byte traffic,40M packets(clone_skb <40000000>)

Worker VMs(FC12):
3 VMs:VM0 .. VM2, each sending 40M packets.

probe-VM(FC15): 1-vCPU/512MB memory
running patched kernel


Tpacket          V1                  V3
capture-rate     700-800K pps        1M pps
cpu usage        50%                 ~30%
Drop-rate        9-10%               <1%


Plus, in the VM setup, V3 sees/captures around 5-10% more traffic than V1/V2.

------------
Enhancement:
------------
E1) Enhanced tpacket_rcv so that it can dump/copy the packets one after another.
E2) Also implemented basic timeout mechanism to close 'a' current block.
    That way, user-space won't be blocked forever on an idle link.
    This is a much needed feature while monitoring multiple ports.
    Look at 3) below.
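
As a rough illustration of E1 from the consumer side: once packets are
laid out one after another inside a block, user-space can walk the whole
block in one pass instead of checking a status word per frame. The sketch
below is NOT the patch's code. It uses a made-up two-field header
(demo_pkt_hdr) and assumes the next-offset is relative to the current
header, with 0 marking the last packet; the real layout is whatever
struct tpacket3_hdr and the block descriptor define.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a per-packet header inside a block. */
struct demo_pkt_hdr {
	uint32_t next_offset;	/* bytes from this header to the next; 0 = last */
	uint32_t snaplen;	/* captured bytes for this packet */
};

/*
 * Walk up to num_pkts packets in one block and return the sum of
 * their snaplen fields. Stops early if an offset would leave the block.
 */
uint64_t demo_walk_block(const uint8_t *blk, uint32_t blk_len,
			 uint32_t first_off, uint32_t num_pkts)
{
	uint64_t total = 0;
	uint32_t off = first_off;
	uint32_t i;

	assert(blk != NULL);
	for (i = 0; i < num_pkts; i++) {
		struct demo_pkt_hdr h;

		/* Header must fit entirely inside the block. */
		if (off > blk_len || blk_len - off < sizeof(h))
			break;
		/* memcpy avoids unaligned access into the byte buffer. */
		memcpy(&h, blk + off, sizeof(h));
		total += h.snaplen;
		if (h.next_offset == 0 || h.next_offset > blk_len - off)
			break;
		off += h.next_offset;
	}
	return total;
}
```

The point of the design is that the reader does one readiness check per
block and then only follows offsets, so the per-packet cost is a header
read rather than a poll.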

-------------------------------
Why is such enhancement needed?
-------------------------------
1) Well, spin-waiting/polling on a per-packet basis to see if it's ready
   to be consumed does not scale while monitoring multiple ports.
   poll() is not performance friendly either.
2) Also, typically a user-space packet capture interface hands off multiple
   packets to another user-space protocol-decoder.

   ----------------
   protocol-decoder
          T2
   ----------------
    =============
    ship pkts
    =============
           ^
           |
           v
   -----------------
   pkt-capture logic
           T1
   -----------------
   ================
     nic/sock IF
   ================
           ^
           |
           V

T1 and T2 are user-space threads. If the hand-off between T1 and T2
happens on a per-pkt basis then the solution does NOT scale.

However, one can argue that T1 can coalesce packets and then pass off a
single chunk to T2. But T1's packet consumption granularity is still at
an individual packet level and that is something that needs to be
addressed to avoid excessive polling.


3) Port aggregation analysis:
   Multiple ports are viewed/analyzed as one logical pipe.
   Example:
   3.1) up-stream    path can be tapped in eth1
   3.2) down-stream  path can be tapped in eth2
   3.3) Network TAP splits Rx/Tx paths and then feeds to eth1,eth2.

   If both eth1,eth2 need to be viewed as one logical channel,
   then that implies we need to timesort the packets as they come across
   eth1,eth2.

   3.4) But the following issues further complicate the problem:
        3.4.1) What if one stream is bursty and the other is flowing
               at line rate?
        3.4.2) How long do we wait before we can actually make a
               decision in the app-space and bail out from the spin-wait?

   Solution:
   3.5) Once we receive a block from multiple ports, we can compare
        the timestamps from the block-descriptor and then easily time sort
        the packets and feed them to the decoders.
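
The time-sort step in 3.5 amounts to a two-way merge, since packets
within each port's block are already in arrival order. A minimal sketch,
assuming each packet carries a (sec, nsec) pair; the struct and function
names here are hypothetical and not part of the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet record: timestamp plus originating port. */
struct demo_pkt {
	uint32_t sec;
	uint32_t nsec;
	int port;
};

/* Returns <0, 0, >0 when a is earlier than, equal to, later than b. */
int demo_ts_cmp(const struct demo_pkt *a, const struct demo_pkt *b)
{
	if (a->sec != b->sec)
		return a->sec < b->sec ? -1 : 1;
	if (a->nsec != b->nsec)
		return a->nsec < b->nsec ? -1 : 1;
	return 0;
}

/*
 * Merge two already time-ordered packet lists (one per port) into out,
 * which must have room for na + nb entries. Returns the entry count.
 */
size_t demo_merge(const struct demo_pkt *a, size_t na,
		  const struct demo_pkt *b, size_t nb,
		  struct demo_pkt *out)
{
	size_t i = 0, j = 0, n = 0;

	assert(out != NULL);
	while (i < na && j < nb) {
		/* <= keeps the merge stable: ties go to the first list. */
		if (demo_ts_cmp(&a[i], &b[j]) <= 0)
			out[n++] = a[i++];
		else
			out[n++] = b[j++];
	}
	while (i < na)
		out[n++] = a[i++];
	while (j < nb)
		out[n++] = b[j++];
	return n;
}
```

Comparing the first/last timestamps in the block descriptors first can
tell the application whether two blocks overlap in time at all; when they
do not, the earlier block can be shipped whole without a per-packet merge.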

PS: The actual patch is ~744 lines of code. The remaining ~220 lines are code comments.

sample user space code:
git://lolpcap.git.sourceforge.net/gitroot/lolpcap/lolpcap

Chetan Loke (2):

 include/linux/if_packet.h |  128 +++++++
 net/packet/af_packet.c    |  881 ++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 964 insertions(+), 45 deletions(-)

-- 
1.7.5.2


* Re: [RFC PATCH] packet: Add fanout support.
From: Changli Gao @ 2011-06-22  1:44 UTC (permalink / raw)
  To: David Miller; +Cc: victor, netdev
In-Reply-To: <20110621.143902.274396574751811372.davem@davemloft.net>

On Wed, Jun 22, 2011 at 5:39 AM, David Miller <davem@davemloft.net> wrote:
> From: Victor Julien <victor@inliniac.net>
> Date: Tue, 21 Jun 2011 15:27:54 +0200
>
>> From a Suricata IDS point of view, I would need to have the
>> fragments of a flow/tuple on the same socket.
>
> Currently you would, they would all go to the first socket in
> the fanout.
>

I think he also needs all the packets belonging to related
connections to be received via the same socket. I am afraid that he has
to dispatch these kinds of packets among the userland processes again.
:)

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: Steps to integrate new 40G Network driver to the kernel tree
From: Ben Hutchings @ 2011-06-22  0:14 UTC (permalink / raw)
  To: Joyce Yu - System Software; +Cc: Stephen Hemminger, netdev
In-Reply-To: <4E012EF1.3050005@oracle.com>

On Tue, 2011-06-21 at 16:53 -0700, Joyce Yu - System Software wrote:
> Since it is a new driver, shall I follow the "Submitting Drivers for the 
> Linux Kernel" Doc? The driver is ready and passed all our QA cycles. I  
> have three source base, one is for 2.6.16 and 2.6.18 (for RHL 5.X and 
> SLES 10 SP3) , one is for 2.6.32 (For RHL 6.X, SLES11SP1), one for 
> 2.6.27 (For SLES11). Can it be integrated to the 2.6.18 and 2.6.32 tree? 
> or it has to be in the latest 2.6.39 or later?

There are 'longterm' branches for Linux 2.6.27 and 2.6.32, but they
don't include new drivers.  If you want your driver to go into existing
distributions then you have to talk to the distributors.  They will
generally tell you that your driver should be accepted in mainline Linux
first.  So you should first submit a driver based on David Miller's
net-2.6 tree (targeting 3.0) or net-next-2.6 (targeting 3.1).

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: linux-next: manual merge of the staging tree with the trivial and net trees
From: Greg KH @ 2011-06-22  0:10 UTC (permalink / raw)
  To: Vitaliy Ivanov
  Cc: Stephen Rothwell, linux-next, linux-kernel, Jiri Kosina,
	David Miller, netdev
In-Reply-To: <BANLkTimmy0GJMgho4YK62gru6QXGox6Eqg@mail.gmail.com>

On Wed, Jun 22, 2011 at 01:40:48AM +0300, Vitaliy Ivanov wrote:
> Stephen,
> 
> On Tue, Jun 21, 2011 at 8:10 AM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> > Today's linux-next merge of the staging tree got a conflict in
> > drivers/staging/brcm80211/brcmfmac/wl_iw.c between commit e44ba033c565
> > ("treewide: remove duplicate includes") from the trivial tree, commit
> > 219eb47e6f35 ("net/staging: add needed interrupt.h and hardirq.h
> > includes") from the net tree and various commits from the staging tree.
> >
> > I fixed them up (see below) and can carry the fix as necessary.
> 
> This one and all the others look good to me.

Me too, thanks for doing this.

greg k-h


* Re: Steps to integrate new 40G Network driver to the kernel tree
From: Joyce Yu - System Software @ 2011-06-21 23:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <4E012D80.4090907@oracle.com>


Since it is a new driver, shall I follow the "Submitting Drivers for the
Linux Kernel" Doc? The driver is ready and passed all our QA cycles. I
have three source bases: one for 2.6.16 and 2.6.18 (for RHL 5.X and
SLES 10 SP3), one for 2.6.32 (for RHL 6.X, SLES11SP1), and one for
2.6.27 (for SLES11). Can it be integrated into the 2.6.18 and 2.6.32 trees,
or does it have to be in the latest 2.6.39 or later?

Thanks,
Joyce



* Re: Steps to integrate new 40G Network driver to the kernel tree
From: Stephen Hemminger @ 2011-06-21 23:52 UTC (permalink / raw)
  To: Joyce Yu - System Software; +Cc: netdev
In-Reply-To: <4E012D80.4090907@oracle.com>

On Tue, 21 Jun 2011 16:47:12 -0700
Joyce Yu - System Software <joyce.yu@oracle.com> wrote:

> 
> Since it is a new driver, shall I follow the "Submitting Drivers for the 
> Linux Kernel" Doc? The driver is ready and passed all our QA cycles. I  
> have three source base, one is for 2.6.16 and 2.6.18 (for RHL 5.X and 
> SLES 10 SP3) , one is for 2.6.32 (For RHL 6.X, SLES11SP1), one for 
> 2.6.27 (For SLES11). Can it be integrated to the 2.6.18 and 2.6.32 tree? 
> or it has to be in the latest 2.6.39 or later?
> 

Do you want the driver to show up in 3.0 (aka 2.6.40) or the next release?
For 3.0, submit a patch against 3.0-rc4; for the next release, base the patch
against the network development tree (net-next-2.6).

The code should remove all special case code to deal with old kernel
versions. No backward compatibility #ifdefs.



* Re: [PATCH]: Add Network Sysrq Support
From: Flavio Leitner @ 2011-06-21 23:32 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: netdev, davem, agospoda, nhorman, lwoodman
In-Reply-To: <4E011A96.7050509@redhat.com>

On 06/21/2011 07:26 PM, Prarit Bhargava wrote:
>> I'm thinking on a situation where we leave the systems with this enabled
>> and then an ordinary user starts pinging the network guessing the hexa to
>> cause reboots.
>>   
> 
> Good point Flavio, but that's *exactly* why I wrote this in single-shot
> mode.  I really think the code might be a bit too risky for most people
> to deploy in production environments.  It's too risky for me to let
> someone ping and ping and ping until they luckily hit the magic number
> and figure out how to bring *all* of my systems down.  What are the
> chances that a lab admin is smart enough to set the password to
> different numbers across different machines in a single lab?

I see your point.  I liked the patch because of the simplicity but
oh well, if we care that much about the security, then in the end
we will have something similar to what the xt_SYSRQ does already.

fbl



* Re: [PATCH net-next 0/3] Three possible UDP fixes.
From: David Miller @ 2011-06-21 23:31 UTC (permalink / raw)
  To: paul.gortmaker; +Cc: eric.dumazet, netdev
In-Reply-To: <1308689020-1873-1-git-send-email-paul.gortmaker@windriver.com>

From: Paul Gortmaker <paul.gortmaker@windriver.com>
Date: Tue, 21 Jun 2011 16:43:37 -0400

> These were originally found on a 2.6.34 baseline, but I looked
> at them and couldn't see any reason why they wouldn't be valid
> fixes on net-next.  But I'll feel better when someone like
> Dave and/or Eric sanity checks them too.
> 
> There was one thing that was a consideration.  In the 3rd patch,
> where we clear MSG_TRUNC bit -- is there anything in there that
> we really need to be concerned about preserving on the retry,
> or could we just unconditionally do "msg->msg_flags = 0" ?
> I wasn't sure, and so sticking with clearing the offending bit
> seemed like the most cautious approach.

All applied and queued up for -stable, thanks!


