[PATCH 00/10 net-next] Introduce per interface ipv4 statistics

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
@ 2011-12-16 15:25 igorm
  2011-12-16 15:25 ` [PATCH 01/10 net-next] include: net: netns: mib: Add proc_dir_entry for ipv4 per interface stats igorm
                   ` (10 more replies)
  0 siblings, 11 replies; 26+ messages in thread
From: igorm @ 2011-12-16 15:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, Igor Maravic

From: Igor Maravic <igorm@etf.rs>

Hi all,
in this patch series I introduced per interface statistics for ipv4.

I referenced how it was done for ipv6 per interface statistics.
BR
Igor

Igor Maravic (10):
  include: net: netns: mib: Add proc_dir_entry for ipv4 per interface
    stats
  include: net: snmp: Create icmp per device counters and add macros
    for per device stats
  include:linux:inetdevice: Add struct ipv4_devstat and func
    __in_dev_get_rcu_safely
  include:net:ipv6: Moved _DEV* macros
  include:net:ip: Tuned up IP_*_STATS macros for per device statistics
    and added functions for (un)registering per device proc entries
  include:net:icmp: Tuned up ICMP_*_STATS macros for per device
    statistics and changed prototype for icmp_out_count
  net:ipv4:devinet: Add support for alloc/free of per device stats and
    (un)register of per device proc files
  net:ipv4:af_inet: Init proc fs before ip_init
  net:ipv4:proc: Introduce proc files for ipv4 per interface stats
  net: Enable ipv4 per interface statistics

 include/linux/inetdevice.h      |   16 +++++
 include/net/icmp.h              |   37 ++++++++++--
 include/net/ip.h                |   60 +++++++++++++++++--
 include/net/ipv6.h              |   64 ++++-----------------
 include/net/netns/mib.h         |    1 +
 include/net/snmp.h              |   55 ++++++++++++++++++
 net/bridge/br_netfilter.c       |    6 +-
 net/dccp/ipv4.c                 |    9 ++-
 net/ipv4/af_inet.c              |    7 ++-
 net/ipv4/datagram.c             |    2 +-
 net/ipv4/devinet.c              |   75 +++++++++++++++++++++---
 net/ipv4/icmp.c                 |   29 +++++----
 net/ipv4/inet_connection_sock.c |    8 ++-
 net/ipv4/ip_forward.c           |    8 ++-
 net/ipv4/ip_fragment.c          |   43 ++++++++------
 net/ipv4/ip_input.c             |   33 ++++++-----
 net/ipv4/ip_output.c            |   43 ++++++++------
 net/ipv4/ipmr.c                 |    6 +-
 net/ipv4/ping.c                 |    8 +-
 net/ipv4/proc.c                 |  121 +++++++++++++++++++++++++++++++++++++++
 net/ipv4/raw.c                  |    4 +-
 net/ipv4/route.c                |    2 +-
 net/ipv4/tcp_ipv4.c             |    9 ++-
 net/ipv4/udp.c                  |    7 +-
 net/l2tp/l2tp_ip.c              |    6 +-
 net/sctp/input.c                |    4 +-
 net/sctp/output.c               |    2 +-
 27 files changed, 488 insertions(+), 177 deletions(-)

-- 
1.7.5.4

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 01/10 net-next] include: net: netns: mib: Add proc_dir_entry for ipv4 per interface stats
  2011-12-16 15:25 [PATCH 00/10 net-next] Introduce per interface ipv4 statistics igorm
@ 2011-12-16 15:25 ` igorm
  2011-12-16 15:25 ` [PATCH 02/10 net-next] include: net: snmp: Create icmp per device counters and add macros for per device stats igorm
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: igorm @ 2011-12-16 15:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, Igor Maravic

From: Igor Maravic <igorm@etf.rs>

Introduces proc_dir_entry in which we will hold ipv4 per interface proc files

Signed-off-by: Igor Maravic <igorm@etf.rs>
---
 include/net/netns/mib.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/net/netns/mib.h b/include/net/netns/mib.h
index d542a4b..eaac2d1 100644
--- a/include/net/netns/mib.h
+++ b/include/net/netns/mib.h
@@ -4,6 +4,7 @@
 #include <net/snmp.h>
 
 struct netns_mib {
+	struct proc_dir_entry *proc_net_devsnmp;
 	DEFINE_SNMP_STAT(struct tcp_mib, tcp_statistics);
 	DEFINE_SNMP_STAT(struct ipstats_mib, ip_statistics);
 	DEFINE_SNMP_STAT(struct linux_mib, net_statistics);
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 02/10 net-next] include: net: snmp: Create icmp per device counters and add macros for per device stats
  2011-12-16 15:25 [PATCH 00/10 net-next] Introduce per interface ipv4 statistics igorm
  2011-12-16 15:25 ` [PATCH 01/10 net-next] include: net: netns: mib: Add proc_dir_entry for ipv4 per interface stats igorm
@ 2011-12-16 15:25 ` igorm
  2011-12-16 15:25 ` [PATCH 03/10 net-next] include:linux:inetdevice: Add struct ipv4_devstat and func __in_dev_get_rcu_safely igorm
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: igorm @ 2011-12-16 15:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, Igor Maravic

From: Igor Maravic <igorm@etf.rs>

Added per device counters for ipv4 icmp statistics.

Moved _DEV* macros here, from include/net/ipv6.h.
Also, made them more generic, by adding new argument - type.
Because of that, they can be used for in_device stats accounting
in the same way as for inet6_dev.

Signed-off-by: Igor Maravic <igorm@etf.rs>
---
 include/net/snmp.h |   55 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/include/net/snmp.h b/include/net/snmp.h
index 2f65e16..65f4052 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -61,14 +61,24 @@ struct ipstats_mib {
 
 /* ICMP */
 #define ICMP_MIB_MAX	__ICMP_MIB_MAX
+/* per network ns counters */
 struct icmp_mib {
 	unsigned long	mibs[ICMP_MIB_MAX];
 };
+/*per device counters, (shared on all cpus) */
+struct icmp_mib_dev {
+	atomic_long_t	mibs[ICMP_MIB_MAX];
+};
 
 #define ICMPMSG_MIB_MAX	__ICMPMSG_MIB_MAX
+/* per network ns counters */
 struct icmpmsg_mib {
 	atomic_long_t	mibs[ICMPMSG_MIB_MAX];
 };
+/* per device counters, (shared on all cpus) */
+struct icmpmsg_mib_dev {
+	atomic_long_t	mibs[ICMPMSG_MIB_MAX];
+};
 
 /* ICMP6 (IPv6-ICMP) */
 #define ICMP6_MIB_MAX	__ICMP6_MIB_MAX
@@ -214,4 +224,49 @@ struct linux_xfrm_mib {
 #define SNMP_UPD_PO_STATS64_BH(mib, basefield, addend) SNMP_UPD_PO_STATS_BH(mib, basefield, addend)
 #endif
 
+/* Macros for enabling per device statistics */
+
+#define _DEVINC(net, statname, modifier, type, idev, field)			\
+({	\
+	__typeof__(type) *_idev = (idev);	\
+	if (likely(_idev))	\
+		SNMP_INC_STATS##modifier((_idev)->stats.statname, (field));	\
+	SNMP_INC_STATS##modifier((net)->mib.statname##_statistics, (field));	\
+})
+
+/* per device counters are atomic_long_t */
+#define _DEVINCATOMIC(net, statname, modifier, type, idev, field)	\
+({	\
+	__typeof__(type) *_idev = (idev);	\
+	if (likely(_idev))	\
+		SNMP_INC_STATS_ATOMIC_LONG((_idev)->stats.statname##dev, (field));	\
+	SNMP_INC_STATS##modifier((net)->mib.statname##_statistics, (field));	\
+})
+
+/* per device and per net counters are atomic_long_t */
+
+#define _DEVINC_ATOMIC_ATOMIC(net, statname, type, idev, field)	\
+({	\
+	__typeof__(type) *_idev = (idev);	\
+	if (likely(_idev))	\
+		SNMP_INC_STATS_ATOMIC_LONG((_idev)->stats.statname##dev, (field));	\
+	SNMP_INC_STATS_ATOMIC_LONG((net)->mib.statname##_statistics, (field));	\
+})
+
+#define _DEVADD(net, statname, modifier, type, idev, field, val)		\
+({	\
+	__typeof__(type) *_idev = (idev);	\
+	if (likely(_idev))	\
+		SNMP_ADD_STATS##modifier((_idev)->stats.statname, (field), (val));	\
+	SNMP_ADD_STATS##modifier((net)->mib.statname##_statistics, (field), (val));	\
+})
+
+#define _DEVUPD(net, statname, modifier, type, idev, field, val)		\
+({	\
+	__typeof__(type) *_idev = (idev);	\
+	if (likely(_idev))	\
+		SNMP_UPD_PO_STATS##modifier((_idev)->stats.statname, field, (val));	\
+	SNMP_UPD_PO_STATS##modifier((net)->mib.statname##_statistics, field, (val));	\
+})
+
 #endif
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 03/10 net-next] include:linux:inetdevice: Add struct ipv4_devstat and func __in_dev_get_rcu_safely
  2011-12-16 15:25 [PATCH 00/10 net-next] Introduce per interface ipv4 statistics igorm
  2011-12-16 15:25 ` [PATCH 01/10 net-next] include: net: netns: mib: Add proc_dir_entry for ipv4 per interface stats igorm
  2011-12-16 15:25 ` [PATCH 02/10 net-next] include: net: snmp: Create icmp per device counters and add macros for per device stats igorm
@ 2011-12-16 15:25 ` igorm
  2011-12-16 15:25 ` [PATCH 04/10 net-next] include:net:ipv6: Moved _DEV* macros igorm
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: igorm @ 2011-12-16 15:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, Igor Maravic

From: Igor Maravic <igorm@etf.rs>

Added struct ipv4_devstat for holding per device ipv4 stats.

Added function __in_dev_get_rcu_safely.
Did that so I would have cleaner code in IP_*_STATS and ICMP_*_STATS macros.

Signed-off-by: Igor Maravic <igorm@etf.rs>
---
 include/linux/inetdevice.h |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 5f81466..a4bffa7 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -49,6 +49,13 @@ struct ipv4_devconf {
 	DECLARE_BITMAP(state, IPV4_DEVCONF_MAX);
 };
 
+struct ipv4_devstat {
+	struct proc_dir_entry   *proc_dir_entry;
+	DEFINE_SNMP_STAT(struct ipstats_mib, ip);
+	DEFINE_SNMP_STAT_ATOMIC(struct icmp_mib_dev, icmpdev);
+	DEFINE_SNMP_STAT_ATOMIC(struct icmpmsg_mib_dev, icmpmsgdev);
+};
+
 struct in_device {
 	struct net_device	*dev;
 	atomic_t		refcnt;
@@ -69,6 +76,7 @@ struct in_device {
 
 	struct neigh_parms	*arp_parms;
 	struct ipv4_devconf	cnf;
+	struct ipv4_devstat	stats;
 	struct rcu_head		rcu_head;
 };
 
@@ -209,6 +217,14 @@ static inline struct in_device *__in_dev_get_rcu(const struct net_device *dev)
 	return rcu_dereference(dev->ip_ptr);
 }
 
+static inline struct in_device *__in_dev_get_rcu_safely(const struct net_device *dev)
+{
+	if (likely(dev))
+		return rcu_dereference(dev->ip_ptr);
+	else
+		return NULL;
+}
+
 static inline struct in_device *in_dev_get(const struct net_device *dev)
 {
 	struct in_device *in_dev;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 04/10 net-next] include:net:ipv6: Moved _DEV* macros
  2011-12-16 15:25 [PATCH 00/10 net-next] Introduce per interface ipv4 statistics igorm
                   ` (2 preceding siblings ...)
  2011-12-16 15:25 ` [PATCH 03/10 net-next] include:linux:inetdevice: Add struct ipv4_devstat and func __in_dev_get_rcu_safely igorm
@ 2011-12-16 15:25 ` igorm
  2011-12-16 15:25 ` [PATCH 05/10 net-next] include:net:ip: Tuned up IP_*_STATS macros for per device statistics and added functions for (un)registering per device proc entries igorm
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: igorm @ 2011-12-16 15:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, Igor Maravic

From: Igor Maravic <igorm@etf.rs>

Moved _DEV* macros to /include/net/snmp.h so they
could be reused for ipv4 per device statistics.

Changed calling of _DEV* macros because they now have +1 argument - type.

Signed-off-by: Igor Maravic <igorm@etf.rs>
---
 include/net/ipv6.h |   64 +++++++++-------------------------------------------
 1 files changed, 11 insertions(+), 53 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index e4170a2..fe65b9b 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -115,73 +115,31 @@ struct frag_hdr {
 extern int sysctl_mld_max_msf;
 extern struct ctl_path net_ipv6_ctl_path[];
 
-#define _DEVINC(net, statname, modifier, idev, field)			\
-({									\
-	struct inet6_dev *_idev = (idev);				\
-	if (likely(_idev != NULL))					\
-		SNMP_INC_STATS##modifier((_idev)->stats.statname, (field)); \
-	SNMP_INC_STATS##modifier((net)->mib.statname##_statistics, (field));\
-})
-
-/* per device counters are atomic_long_t */
-#define _DEVINCATOMIC(net, statname, modifier, idev, field)		\
-({									\
-	struct inet6_dev *_idev = (idev);				\
-	if (likely(_idev != NULL))					\
-		SNMP_INC_STATS_ATOMIC_LONG((_idev)->stats.statname##dev, (field)); \
-	SNMP_INC_STATS##modifier((net)->mib.statname##_statistics, (field));\
-})
-
-/* per device and per net counters are atomic_long_t */
-#define _DEVINC_ATOMIC_ATOMIC(net, statname, idev, field)		\
-({									\
-	struct inet6_dev *_idev = (idev);				\
-	if (likely(_idev != NULL))					\
-		SNMP_INC_STATS_ATOMIC_LONG((_idev)->stats.statname##dev, (field)); \
-	SNMP_INC_STATS_ATOMIC_LONG((net)->mib.statname##_statistics, (field));\
-})
-
-#define _DEVADD(net, statname, modifier, idev, field, val)		\
-({									\
-	struct inet6_dev *_idev = (idev);				\
-	if (likely(_idev != NULL))					\
-		SNMP_ADD_STATS##modifier((_idev)->stats.statname, (field), (val)); \
-	SNMP_ADD_STATS##modifier((net)->mib.statname##_statistics, (field), (val));\
-})
-
-#define _DEVUPD(net, statname, modifier, idev, field, val)		\
-({									\
-	struct inet6_dev *_idev = (idev);				\
-	if (likely(_idev != NULL))					\
-		SNMP_UPD_PO_STATS##modifier((_idev)->stats.statname, field, (val)); \
-	SNMP_UPD_PO_STATS##modifier((net)->mib.statname##_statistics, field, (val));\
-})
-
 /* MIBs */
 
 #define IP6_INC_STATS(net, idev,field)		\
-		_DEVINC(net, ipv6, 64, idev, field)
+		_DEVINC(net, ipv6, 64, struct inet6_dev, idev, field)
 #define IP6_INC_STATS_BH(net, idev,field)	\
-		_DEVINC(net, ipv6, 64_BH, idev, field)
+		_DEVINC(net, ipv6, 64_BH, struct inet6_dev, idev, field)
 #define IP6_ADD_STATS(net, idev,field,val)	\
-		_DEVADD(net, ipv6, 64, idev, field, val)
+		_DEVADD(net, ipv6, 64, struct inet6_dev, idev, field, val)
 #define IP6_ADD_STATS_BH(net, idev,field,val)	\
-		_DEVADD(net, ipv6, 64_BH, idev, field, val)
+		_DEVADD(net, ipv6, 64_BH, struct inet6_dev, idev, field, val)
 #define IP6_UPD_PO_STATS(net, idev,field,val)   \
-		_DEVUPD(net, ipv6, 64, idev, field, val)
+		_DEVUPD(net, ipv6, 64, struct inet6_dev, idev, field, val)
 #define IP6_UPD_PO_STATS_BH(net, idev,field,val)   \
-		_DEVUPD(net, ipv6, 64_BH, idev, field, val)
+		_DEVUPD(net, ipv6, 64_BH, struct inet6_dev, idev, field, val)
 #define ICMP6_INC_STATS(net, idev, field)	\
-		_DEVINCATOMIC(net, icmpv6, , idev, field)
+		_DEVINCATOMIC(net, icmpv6, , struct inet6_dev, idev, field)
 #define ICMP6_INC_STATS_BH(net, idev, field)	\
-		_DEVINCATOMIC(net, icmpv6, _BH, idev, field)
+		_DEVINCATOMIC(net, icmpv6, _BH, struct inet6_dev, idev, field)
 
 #define ICMP6MSGOUT_INC_STATS(net, idev, field)		\
-	_DEVINC_ATOMIC_ATOMIC(net, icmpv6msg, idev, field +256)
+	_DEVINC_ATOMIC_ATOMIC(net, icmpv6msg, struct inet6_dev, idev, field +256)
 #define ICMP6MSGOUT_INC_STATS_BH(net, idev, field)	\
-	_DEVINC_ATOMIC_ATOMIC(net, icmpv6msg, idev, field +256)
+	_DEVINC_ATOMIC_ATOMIC(net, icmpv6msg, struct inet6_dev, idev, field +256)
 #define ICMP6MSGIN_INC_STATS_BH(net, idev, field)	\
-	_DEVINC_ATOMIC_ATOMIC(net, icmpv6msg, idev, field)
+	_DEVINC_ATOMIC_ATOMIC(net, icmpv6msg, struct inet6_dev, idev, field)
 
 struct ip6_ra_chain {
 	struct ip6_ra_chain	*next;
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 05/10 net-next] include:net:ip: Tuned up IP_*_STATS macros for per device statistics and added functions for (un)registering per device proc entries
  2011-12-16 15:25 [PATCH 00/10 net-next] Introduce per interface ipv4 statistics igorm
                   ` (3 preceding siblings ...)
  2011-12-16 15:25 ` [PATCH 04/10 net-next] include:net:ipv6: Moved _DEV* macros igorm
@ 2011-12-16 15:25 ` igorm
  2011-12-16 15:25 ` [PATCH 06/10 net-next] include:net:icmp: Tuned up ICMP_*_STATS macros for per device statistics and changed prototype for icmp_out_count igorm
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: igorm @ 2011-12-16 15:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, Igor Maravic

From: Igor Maravic <igorm@etf.rs>

Changed IP_*_STATS so now they call _DEV* macros.
They call _DEV* macros under rcu_read_lock,
so we could get in_device* from net_device* with function
__in_dev_get_rcu_safely. This function is safe if dev==NULL.

Included <linux/inetdevice.h> so we could call __in_dev_get_rcu_safely.

Added prototypes for snmp_(un)register_dev.

Signed-off-by: Igor Maravic <igorm@etf.rs>
---
 include/net/ip.h |   60 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 775009f..9895b1f 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -26,6 +26,7 @@
 #include <linux/ip.h>
 #include <linux/in.h>
 #include <linux/skbuff.h>
+#include <linux/inetdevice.h>
 
 #include <net/inet_sock.h>
 #include <net/snmp.h>
@@ -184,12 +185,54 @@ struct ipv4_config {
 };
 
 extern struct ipv4_config ipv4_config;
-#define IP_INC_STATS(net, field)	SNMP_INC_STATS64((net)->mib.ip_statistics, field)
-#define IP_INC_STATS_BH(net, field)	SNMP_INC_STATS64_BH((net)->mib.ip_statistics, field)
-#define IP_ADD_STATS(net, field, val)	SNMP_ADD_STATS64((net)->mib.ip_statistics, field, val)
-#define IP_ADD_STATS_BH(net, field, val) SNMP_ADD_STATS64_BH((net)->mib.ip_statistics, field, val)
-#define IP_UPD_PO_STATS(net, field, val) SNMP_UPD_PO_STATS64((net)->mib.ip_statistics, field, val)
-#define IP_UPD_PO_STATS_BH(net, field, val) SNMP_UPD_PO_STATS64_BH((net)->mib.ip_statistics, field, val)
+#define IP_INC_STATS(net, dev, field)	\
+	({	\
+		rcu_read_lock();	\
+		_DEVINC(net, ip, 64, struct in_device,	\
+			__in_dev_get_rcu_safely(dev), field);	\
+		rcu_read_unlock();	\
+	})
+
+#define IP_INC_STATS_BH(net, dev, field)	\
+	({	\
+		rcu_read_lock();	\
+		_DEVINC(net, ip, 64_BH, struct in_device,	\
+			__in_dev_get_rcu_safely(dev), field);	\
+		rcu_read_unlock();	\
+	})
+
+#define IP_ADD_STATS(net, dev, field, val)	\
+	({	\
+		rcu_read_lock();	\
+		_DEVADD(net, ip, 64, struct in_device,	\
+			__in_dev_get_rcu_safely(dev), field, val);	\
+		rcu_read_unlock();	\
+	})
+
+#define IP_ADD_STATS_BH(net, dev, field, val)	\
+	({	\
+		rcu_read_lock();	\
+		_DEVADD(net, ip, 64_BH, struct in_device,	\
+			__in_dev_get_rcu_safely(dev), field, val);	\
+		rcu_read_unlock();	\
+	})
+
+#define IP_UPD_PO_STATS(net, dev, field, val)	\
+	({	\
+		rcu_read_lock();	\
+		_DEVUPD(net, ip, 64, struct in_device,	\
+			__in_dev_get_rcu_safely(dev), field, val);	\
+		rcu_read_unlock();	\
+	})
+
+#define IP_UPD_PO_STATS_BH(net, dev, field, val)	\
+	({	\
+		rcu_read_lock();	\
+		_DEVUPD(net, ip, 64_BH, struct in_device,	\
+			__in_dev_get_rcu_safely(dev), field, val);	\
+		rcu_read_unlock();	\
+	})
+
 #define NET_INC_STATS(net, field)	SNMP_INC_STATS((net)->mib.net_statistics, field)
 #define NET_INC_STATS_BH(net, field)	SNMP_INC_STATS_BH((net)->mib.net_statistics, field)
 #define NET_INC_STATS_USER(net, field) 	SNMP_INC_STATS_USER((net)->mib.net_statistics, field)
@@ -470,6 +513,11 @@ extern void	ip_local_error(struct sock *sk, int err, __be32 daddr, __be16 dport,
 
 #ifdef CONFIG_PROC_FS
 extern int ip_misc_proc_init(void);
+extern int snmp_register_dev(struct in_device *idev);
+extern int snmp_unregister_dev(struct in_device *idev);
+#else
+extern int snmp_register_dev(struct in_device *idev) { return 0;}
+extern int snmp_unregister_dev(struct in_device *idev) { return 0;}
 #endif
 
 #endif	/* _IP_H */
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 06/10 net-next] include:net:icmp: Tuned up ICMP_*_STATS macros for per device statistics and changed prototype for icmp_out_count
  2011-12-16 15:25 [PATCH 00/10 net-next] Introduce per interface ipv4 statistics igorm
                   ` (4 preceding siblings ...)
  2011-12-16 15:25 ` [PATCH 05/10 net-next] include:net:ip: Tuned up IP_*_STATS macros for per device statistics and added functions for (un)registering per device proc entries igorm
@ 2011-12-16 15:25 ` igorm
  2011-12-16 15:26 ` [PATCH 07/10 net-next] net:ipv4:devinet: Add support for alloc/free of per device stats and (un)register of per device proc files igorm
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: igorm @ 2011-12-16 15:25 UTC (permalink / raw)
  To: netdev; +Cc: davem, Igor Maravic

From: Igor Maravic <igorm@etf.rs>

Changed ICMP_*_STATS so now they call _DEV* macros.
They call _DEV* macros under rcu_read_lock,
so we could get in_device from net_device with function
__in_dev_get_rcu_safely. This function is safe if dev==NULL.

Changed prototype for icmp_out_count so we can have per device statistics.

Signed-off-by: Igor Maravic <igorm@etf.rs>
---
 include/net/icmp.h |   37 ++++++++++++++++++++++++++++++++-----
 1 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/include/net/icmp.h b/include/net/icmp.h
index 75d6156..aa71b9a 100644
--- a/include/net/icmp.h
+++ b/include/net/icmp.h
@@ -29,10 +29,37 @@ struct icmp_err {
 };
 
 extern const struct icmp_err icmp_err_convert[];
-#define ICMP_INC_STATS(net, field)	SNMP_INC_STATS((net)->mib.icmp_statistics, field)
-#define ICMP_INC_STATS_BH(net, field)	SNMP_INC_STATS_BH((net)->mib.icmp_statistics, field)
-#define ICMPMSGOUT_INC_STATS(net, field)	SNMP_INC_STATS_ATOMIC_LONG((net)->mib.icmpmsg_statistics, field+256)
-#define ICMPMSGIN_INC_STATS_BH(net, field)	SNMP_INC_STATS_ATOMIC_LONG((net)->mib.icmpmsg_statistics, field)
+#define ICMP_INC_STATS(net, dev, field)	\
+	({	\
+		rcu_read_lock();	\
+		_DEVINCATOMIC(net, icmp, , struct in_device,	\
+			__in_dev_get_rcu_safely(dev), field);	\
+		rcu_read_unlock();	\
+	})
+			
+#define ICMP_INC_STATS_BH(net, dev, field)	\
+	({	\
+		rcu_read_lock();	\
+		_DEVINCATOMIC(net, icmp, _BH, struct in_device,	\
+			__in_dev_get_rcu_safely(dev), field);	\
+		rcu_read_unlock();	\
+	})
+
+#define ICMPMSGOUT_INC_STATS(net, dev, field)	\
+	({	\
+		rcu_read_lock();	\
+		_DEVINC_ATOMIC_ATOMIC(net, icmpmsg, struct in_device,	\
+			__in_dev_get_rcu_safely(dev), field +256);	\
+		rcu_read_unlock();	\
+	})
+
+#define ICMPMSGIN_INC_STATS_BH(net, dev, field)	\
+	({	\
+		rcu_read_lock();	\
+		_DEVINC_ATOMIC_ATOMIC(net, icmpmsg, struct in_device,	\
+			__in_dev_get_rcu_safely(dev), field);	\
+		rcu_read_unlock();	\
+	})
 
 struct dst_entry;
 struct net_proto_family;
@@ -43,6 +70,6 @@ extern void	icmp_send(struct sk_buff *skb_in,  int type, int code, __be32 info);
 extern int	icmp_rcv(struct sk_buff *skb);
 extern int	icmp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 extern int	icmp_init(void);
-extern void	icmp_out_count(struct net *net, unsigned char type);
+extern void	icmp_out_count(struct net_device *dev, unsigned char type);
 
 #endif	/* _ICMP_H */
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 07/10 net-next] net:ipv4:devinet: Add support for alloc/free of per device stats and (un)register of per device proc files
  2011-12-16 15:25 [PATCH 00/10 net-next] Introduce per interface ipv4 statistics igorm
                   ` (5 preceding siblings ...)
  2011-12-16 15:25 ` [PATCH 06/10 net-next] include:net:icmp: Tuned up ICMP_*_STATS macros for per device statistics and changed prototype for icmp_out_count igorm
@ 2011-12-16 15:26 ` igorm
  2011-12-16 15:26 ` [PATCH 08/10 net-next] net:ipv4:af_inet: Init proc fs before ip_init igorm
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: igorm @ 2011-12-16 15:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, Igor Maravic

From: Igor Maravic <igorm@etf.rs>

Added function snmp_alloc_dev and snmp_free_dev for
allocing/freeing per device statistics.

snmp_alloc_dev is only called when in_device is created. If it failes,
in_device can't be created.
snmp_free_dev is only called when in_device is destroyed.

Added calls for snmp_(un)register_dev functions.

snmp_unregister_dev is called when in_device is destroyed and when it changes name.
snmp_register_dev is called when in_device is created. If it failes in_device
can be created. snmp_register_dev is also called when in_device change name.

Signed-off-by: Igor Maravic <igorm@etf.rs>
---
 net/ipv4/devinet.c |   75 +++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 65f01dc..c18564f 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -211,6 +211,38 @@ static inline void inet_free_ifa(struct in_ifaddr *ifa)
 	call_rcu(&ifa->rcu_head, inet_rcu_free_ifa);
 }
 
+static int snmp_alloc_dev(struct in_device *idev)
+{
+	if (snmp_mib_init((void __percpu **)idev->stats.ip,
+			sizeof(struct ipstats_mib),
+			__alignof__(struct ipstats_mib)) < 0)
+		goto err_ip;
+	idev->stats.icmpdev = kzalloc(sizeof(struct icmp_mib_dev),
+						GFP_KERNEL);
+	if (!idev->stats.icmpdev)
+		goto err_icmp;
+	idev->stats.icmpmsgdev = kzalloc(sizeof(struct icmpmsg_mib_dev),
+						GFP_KERNEL);
+	if (!idev->stats.icmpmsgdev)
+		goto err_icmpmsg;
+
+	return 0;
+
+err_icmpmsg:
+	kfree(idev->stats.icmpdev);
+err_icmp:
+	snmp_mib_free((void __percpu **)idev->stats.ip);
+err_ip:
+	return -ENOMEM;
+}
+
+static void snmp_free_dev(struct in_device *idev)
+{
+	kfree(idev->stats.icmpmsgdev);
+	kfree(idev->stats.icmpdev);
+	snmp_mib_free((void __percpu **)idev->stats.ip);
+}
+
 void in_dev_finish_destroy(struct in_device *idev)
 {
 	struct net_device *dev = idev->dev;
@@ -224,8 +256,10 @@ void in_dev_finish_destroy(struct in_device *idev)
 	dev_put(dev);
 	if (!idev->dead)
 		pr_err("Freeing alive in_device %p\n", idev);
-	else
+	else {
+		snmp_free_dev(idev);
 		kfree(idev);
+	}
 }
 EXPORT_SYMBOL(in_dev_finish_destroy);
 
@@ -249,6 +283,22 @@ static struct in_device *inetdev_init(struct net_device *dev)
 		dev_disable_lro(dev);
 	/* Reference in_dev->dev */
 	dev_hold(dev);
+
+	if (snmp_alloc_dev(in_dev) < 0) {
+		printk(KERN_CRIT
+			"%s(): cannot allocate memory for statistics; dev=%s.\n",
+			__func__, dev->name);
+		neigh_parms_release(&arp_tbl, in_dev->arp_parms);
+		dev_put(dev);
+		kfree(in_dev);
+		return NULL;
+	}
+
+	if (snmp_register_dev(in_dev) < 0)
+		printk(KERN_WARNING
+			"%s(): cannot create /proc/net/dev_snmp/%s\n",
+			__func__, dev->name);
+
 	/* Account for reference dev->ip_ptr (below) */
 	in_dev_hold(in_dev);
 
@@ -292,6 +342,7 @@ static void inetdev_destroy(struct in_device *in_dev)
 	}
 
 	RCU_INIT_POINTER(dev->ip_ptr, NULL);
+	snmp_unregister_dev(in_dev);
 
 	devinet_sysctl_unregister(in_dev);
 	neigh_parms_release(&arp_tbl, in_dev->arp_parms);
@@ -1222,14 +1273,20 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 	case NETDEV_UNREGISTER:
 		inetdev_destroy(in_dev);
 		break;
-	case NETDEV_CHANGENAME:
-		/* Do not notify about label change, this event is
-		 * not interesting to applications using netlink.
-		 */
-		inetdev_changename(dev, in_dev);
-
-		devinet_sysctl_unregister(in_dev);
-		devinet_sysctl_register(in_dev);
+	case NETDEV_CHANGENAME: {
+			int err;
+			/* Do not notify about label change, this event is
+			 * not interesting to applications using netlink.
+			 */
+			inetdev_changename(dev, in_dev);
+
+			snmp_unregister_dev(in_dev);
+			devinet_sysctl_unregister(in_dev);
+			devinet_sysctl_register(in_dev);
+			err = snmp_register_dev(in_dev);
+			if (err)
+				return notifier_from_errno(err);
+		}
 		break;
 	}
 out:
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 08/10 net-next] net:ipv4:af_inet: Init proc fs before ip_init
  2011-12-16 15:25 [PATCH 00/10 net-next] Introduce per interface ipv4 statistics igorm
                   ` (6 preceding siblings ...)
  2011-12-16 15:26 ` [PATCH 07/10 net-next] net:ipv4:devinet: Add support for alloc/free of per device stats and (un)register of per device proc files igorm
@ 2011-12-16 15:26 ` igorm
  2011-12-16 15:26 ` [PATCH 09/10 net-next] net:ipv4:proc: Introduce proc files for ipv4 per interface stats igorm
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 26+ messages in thread
From: igorm @ 2011-12-16 15:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, Igor Maravic

From: Igor Maravic <igorm@etf.rs>

Moved ipv4_proc_init() before ip_init().
Did that, so the proc fs for ipv4 would be initialised
before we initialise devinet. If I didn't do that,
per device proc files couldn't be created

Signed-off-by: Igor Maravic <igorm@etf.rs>
---
 net/ipv4/af_inet.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index f7b5670..7384a20 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1695,6 +1695,11 @@ static int __init inet_init(void)
 
 	for (q = inetsw_array; q < &inetsw_array[INETSW_ARRAY_LEN]; ++q)
 		inet_register_protosw(q);
+		
+	/*
+	 * Init proc fs
+	 */
+	ipv4_proc_init();
 
 	/*
 	 *	Set the ARP module up
@@ -1742,8 +1747,6 @@ static int __init inet_init(void)
 	if (init_ipv4_mibs())
 		printk(KERN_CRIT "inet_init: Cannot init ipv4 mibs\n");
 
-	ipv4_proc_init();
-
 	ipfrag_init();
 
 	dev_add_pack(&ip_packet_type);
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 09/10 net-next] net:ipv4:proc: Introduce proc files for ipv4 per interface stats
  2011-12-16 15:25 [PATCH 00/10 net-next] Introduce per interface ipv4 statistics igorm
                   ` (7 preceding siblings ...)
  2011-12-16 15:26 ` [PATCH 08/10 net-next] net:ipv4:af_inet: Init proc fs before ip_init igorm
@ 2011-12-16 15:26 ` igorm
  2011-12-16 15:26 ` [PATCH 10/10 net-next] net: Enable ipv4 per interface statistics igorm
  2011-12-16 15:41 ` [PATCH 00/10 net-next] Introduce per interface ipv4 statistics Eric Dumazet
  10 siblings, 0 replies; 26+ messages in thread
From: igorm @ 2011-12-16 15:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, Igor Maravic

From: Igor Maravic <igorm@etf.rs>

In ip_proc_init_net dev_snmp proc directory is created.

Functions snmp_(un)register_dev are for creating/deleting proc files
that have same names as interfaces for which they are created.

Per device proc files for ipv4 interfaces have the same form as
per device proc files for ipv6 interfaces.

Signed-off-by: Igor Maravic <igorm@etf.rs>
---
 net/ipv4/proc.c |  121 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 121 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index 3569d8e..c6202d9 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -128,6 +128,14 @@ static const struct snmp_mib snmp4_ipextstats_list[] = {
 	SNMP_MIB_SENTINEL
 };
 
+static const struct snmp_mib snmp4_icmp_list[] = {
+	SNMP_MIB_ITEM("InMsgs", ICMP_MIB_INMSGS),
+	SNMP_MIB_ITEM("InErrors", ICMP_MIB_INERRORS),
+	SNMP_MIB_ITEM("OutMsgs", ICMP_MIB_OUTMSGS),
+	SNMP_MIB_ITEM("OutErrors", ICMP_MIB_OUTERRORS),
+	SNMP_MIB_SENTINEL
+};
+
 static const struct {
 	const char *name;
 	int index;
@@ -459,6 +467,113 @@ static const struct file_operations netstat_seq_fops = {
 	.release = single_release_net,
 };
 
+static void snmp_seq_show_item(struct seq_file *seq, void __percpu **pcpumib,
+				atomic_long_t *smib,
+				const struct snmp_mib *itemlist,
+				char *prefix)
+{
+	char name[32];
+	int i;
+	unsigned long val;
+
+	for (i = 0; itemlist[i].name; i++) {
+		val = pcpumib ?
+			snmp_fold_field64(pcpumib, itemlist[i].entry,
+				offsetof(struct ipstats_mib, syncp)) :
+			atomic_long_read(smib + itemlist[i].entry);
+		snprintf(name, sizeof(name), "%s%s",
+				prefix, itemlist[i].name);
+		seq_printf(seq, "%-32s\t%lu\n", name, val);
+	}
+}
+
+static void snmp_seq_show_icmpmsg(struct seq_file *seq, atomic_long_t *smib)
+{
+	char name[32];
+	int i;
+	unsigned long val;
+	for (i = 0; i < ICMPMSG_MIB_MAX; i++) {		
+		val = atomic_long_read(smib + i);
+		if (val) {
+			snprintf(name, sizeof(name), "Icmp%sType%u",
+				i & 0x100 ? "Out" : "In", i & 0xff);
+			seq_printf(seq, "%-32s\t%lu\n", name, val);
+		}
+	}
+}
+
+static int snmp_dev_seq_show(struct seq_file *seq, void *v)
+{
+	struct in_device *idev = (struct in_device *)seq->private;
+
+	seq_printf(seq, "%-32s\t%u\n", "ifIndex", idev->dev->ifindex);
+	seq_printf(seq, "%-32s\t%u\n", "Forwarding",
+		IN_DEV_FORWARD(idev));
+	seq_printf(seq, "%-32s\t%u\n", "McForwarding",
+		IN_DEV_MFORWARD(idev));
+	seq_printf(seq, "%-32s\t%u\n", "DefaultTTL",
+		   sysctl_ip_default_ttl);
+
+	BUILD_BUG_ON(offsetof(struct ipstats_mib, mibs) != 0);
+
+	snmp_seq_show_item(seq, (void __percpu **)idev->stats.ip, NULL,
+				snmp4_ipstats_list, "Ip");
+	snmp_seq_show_item(seq, (void __percpu **)idev->stats.ip, NULL,
+				snmp4_ipextstats_list, "Ip");
+	snmp_seq_show_item(seq, NULL, idev->stats.icmpdev->mibs,
+				snmp4_icmp_list, "Icmp");
+	snmp_seq_show_icmpmsg(seq, idev->stats.icmpmsgdev->mibs);
+	return 0;
+}
+
+static int snmp_dev_seq_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, snmp_dev_seq_show, PDE(inode)->data);
+}
+
+static const struct file_operations snmp_dev_seq_fops = {
+	.owner   = THIS_MODULE,
+	.open    = snmp_dev_seq_open,
+	.read    = seq_read,
+	.llseek  = seq_lseek,
+	.release = single_release,
+};
+
+int snmp_register_dev(struct in_device *idev)
+{
+	struct proc_dir_entry *p;
+	struct net *net;
+
+	if (!idev || !idev->dev)
+		return -EINVAL;
+
+	net = dev_net(idev->dev);
+	if (!net->mib.proc_net_devsnmp)
+		return -ENOENT;
+
+	p = proc_create_data(idev->dev->name, S_IRUGO,
+			net->mib.proc_net_devsnmp,
+			&snmp_dev_seq_fops, idev);
+	if (!p) 
+		return -ENOMEM;
+
+	idev->stats.proc_dir_entry = p;
+	return 0;
+}
+
+int snmp_unregister_dev(struct in_device *idev)
+{
+	struct net *net = dev_net(idev->dev);
+	if (!net->mib.proc_net_devsnmp)
+		return -ENOENT;
+	if (!idev->stats.proc_dir_entry)
+		return -EINVAL;
+	remove_proc_entry(idev->stats.proc_dir_entry->name,
+		net->mib.proc_net_devsnmp);
+	idev->stats.proc_dir_entry = NULL;
+	return 0;
+}
+
 static __net_init int ip_proc_init_net(struct net *net)
 {
 	if (!proc_net_fops_create(net, "sockstat", S_IRUGO, &sockstat_seq_fops))
@@ -467,9 +582,14 @@ static __net_init int ip_proc_init_net(struct net *net)
 		goto out_netstat;
 	if (!proc_net_fops_create(net, "snmp", S_IRUGO, &snmp_seq_fops))
 		goto out_snmp;
+	net->mib.proc_net_devsnmp = proc_mkdir("dev_snmp", net->proc_net);
+	if (!net->mib.proc_net_devsnmp)
+		goto out_dev_snmp;
 
 	return 0;
 
+out_dev_snmp:
+	proc_net_remove(net, "snmp");
 out_snmp:
 	proc_net_remove(net, "netstat");
 out_netstat:
@@ -483,6 +603,7 @@ static __net_exit void ip_proc_exit_net(struct net *net)
 	proc_net_remove(net, "snmp");
 	proc_net_remove(net, "netstat");
 	proc_net_remove(net, "sockstat");
+	proc_net_remove(net, "dev_snmp");
 }
 
 static __net_initdata struct pernet_operations ip_proc_ops = {
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 10/10 net-next] net: Enable ipv4 per interface statistics
  2011-12-16 15:25 [PATCH 00/10 net-next] Introduce per interface ipv4 statistics igorm
                   ` (8 preceding siblings ...)
  2011-12-16 15:26 ` [PATCH 09/10 net-next] net:ipv4:proc: Introduce proc files for ipv4 per interface stats igorm
@ 2011-12-16 15:26 ` igorm
  2011-12-16 15:41 ` [PATCH 00/10 net-next] Introduce per interface ipv4 statistics Eric Dumazet
  10 siblings, 0 replies; 26+ messages in thread
From: igorm @ 2011-12-16 15:26 UTC (permalink / raw)
  To: netdev; +Cc: davem, Igor Maravic

From: Igor Maravic <igorm@etf.rs>

Added argument of type net_device* to macros IP_*_STATS and ICMP_*_STATS
In most places that was trivial - just added net_device* from
which we've read net*.

Signed-off-by: Igor Maravic <igorm@etf.rs>
---
 net/bridge/br_netfilter.c       |    6 ++--
 net/dccp/ipv4.c                 |    9 ++++---
 net/ipv4/datagram.c             |    2 +-
 net/ipv4/icmp.c                 |   29 ++++++++++++++-----------
 net/ipv4/inet_connection_sock.c |    8 +++++-
 net/ipv4/ip_forward.c           |    8 ++++--
 net/ipv4/ip_fragment.c          |   43 +++++++++++++++++++++-----------------
 net/ipv4/ip_input.c             |   33 +++++++++++++++--------------
 net/ipv4/ip_output.c            |   43 +++++++++++++++++++++-----------------
 net/ipv4/ipmr.c                 |    6 +++-
 net/ipv4/ping.c                 |    8 +++---
 net/ipv4/raw.c                  |    4 +-
 net/ipv4/route.c                |    2 +-
 net/ipv4/tcp_ipv4.c             |    9 ++++---
 net/ipv4/udp.c                  |    7 +++--
 net/l2tp/l2tp_ip.c              |    6 ++--
 net/sctp/input.c                |    4 +-
 net/sctp/output.c               |    2 +-
 18 files changed, 127 insertions(+), 102 deletions(-)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 834dfab..a821217 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -255,13 +255,13 @@ static int br_parse_ip_options(struct sk_buff *skb)
 
 	len = ntohs(iph->tot_len);
 	if (skb->len < len) {
-		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INTRUNCATEDPKTS);
+		IP_INC_STATS_BH(dev_net(dev), dev, IPSTATS_MIB_INTRUNCATEDPKTS);
 		goto drop;
 	} else if (len < (iph->ihl*4))
 		goto inhdr_error;
 
 	if (pskb_trim_rcsum(skb, len)) {
-		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
+		IP_INC_STATS_BH(dev_net(dev), dev, IPSTATS_MIB_INDISCARDS);
 		goto drop;
 	}
 
@@ -286,7 +286,7 @@ static int br_parse_ip_options(struct sk_buff *skb)
 	return 0;
 
 inhdr_error:
-	IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INHDRERRORS);
+	IP_INC_STATS_BH(dev_net(dev), dev, IPSTATS_MIB_INHDRERRORS);
 drop:
 	return -1;
 }
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 1c67fe8..ca8a024 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -219,11 +219,12 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
 	struct sock *sk;
 	__u64 seq;
 	int err;
-	struct net *net = dev_net(skb->dev);
+	struct net_device *dev = skb->dev;
+	struct net *net = dev_net(dev);
 
 	if (skb->len < offset + sizeof(*dh) ||
 	    skb->len < offset + __dccp_basic_hdr_len(dh)) {
-		ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
+		ICMP_INC_STATS_BH(net, dev, ICMP_MIB_INERRORS);
 		return;
 	}
 
@@ -231,7 +232,7 @@ static void dccp_v4_err(struct sk_buff *skb, u32 info)
 			iph->daddr, dh->dccph_dport,
 			iph->saddr, dh->dccph_sport, inet_iif(skb));
 	if (sk == NULL) {
-		ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
+		ICMP_INC_STATS_BH(net, dev, ICMP_MIB_INERRORS);
 		return;
 	}
 
@@ -488,7 +489,7 @@ static struct dst_entry* dccp_v4_route_skb(struct net *net, struct sock *sk,
 	security_skb_classify_flow(skb, flowi4_to_flowi(&fl4));
 	rt = ip_route_output_flow(net, &fl4, sk);
 	if (IS_ERR(rt)) {
-		IP_INC_STATS_BH(net, IPSTATS_MIB_OUTNOROUTES);
+		IP_INC_STATS_BH(net, NULL, IPSTATS_MIB_OUTNOROUTES);
 		return NULL;
 	}
 
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 424fafb..ca1d0ba 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -57,7 +57,7 @@ int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	if (IS_ERR(rt)) {
 		err = PTR_ERR(rt);
 		if (err == -ENETUNREACH)
-			IP_INC_STATS_BH(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
+			IP_INC_STATS_BH(sock_net(sk), NULL, IPSTATS_MIB_OUTNOROUTES);
 		goto out;
 	}
 
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index ab188ae..8a064db 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -264,10 +264,10 @@ out:
 /*
  *	Maintain the counters used in the SNMP statistics for outgoing ICMP
  */
-void icmp_out_count(struct net *net, unsigned char type)
+void icmp_out_count(struct net_device *dev, unsigned char type)
 {
-	ICMPMSGOUT_INC_STATS(net, type);
-	ICMP_INC_STATS(net, ICMP_MIB_OUTMSGS);
+	ICMPMSGOUT_INC_STATS(dev_net(dev), dev, type);
+	ICMP_INC_STATS(dev_net(dev), dev, ICMP_MIB_OUTMSGS);
 }
 
 /*
@@ -296,13 +296,14 @@ static void icmp_push_reply(struct icmp_bxm *icmp_param,
 {
 	struct sock *sk;
 	struct sk_buff *skb;
+	struct net_device *dev = (*rt)->dst.dev;
 
-	sk = icmp_sk(dev_net((*rt)->dst.dev));
+	sk = icmp_sk(dev_net(dev));
 	if (ip_append_data(sk, fl4, icmp_glue_bits, icmp_param,
 			   icmp_param->data_len+icmp_param->head_len,
 			   icmp_param->head_len,
 			   ipc, rt, MSG_DONTWAIT) < 0) {
-		ICMP_INC_STATS_BH(sock_net(sk), ICMP_MIB_OUTERRORS);
+		ICMP_INC_STATS_BH(sock_net(sk), dev, ICMP_MIB_OUTERRORS);
 		ip_flush_pending_frames(sk);
 	} else if ((skb = skb_peek(&sk->sk_write_queue)) != NULL) {
 		struct icmphdr *icmph = icmp_hdr(skb);
@@ -643,8 +644,9 @@ static void icmp_unreach(struct sk_buff *skb)
 	const struct net_protocol *ipprot;
 	u32 info = 0;
 	struct net *net;
+	struct net_device *dev = skb_dst(skb)->dev;
 
-	net = dev_net(skb_dst(skb)->dev);
+	net = dev_net(dev);
 
 	/*
 	 *	Incomplete header ?
@@ -747,7 +749,7 @@ static void icmp_unreach(struct sk_buff *skb)
 out:
 	return;
 out_err:
-	ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
+	ICMP_INC_STATS_BH(net, dev, ICMP_MIB_INERRORS);
 	goto out;
 }
 
@@ -796,7 +798,7 @@ static void icmp_redirect(struct sk_buff *skb)
 out:
 	return;
 out_err:
-	ICMP_INC_STATS_BH(dev_net(skb->dev), ICMP_MIB_INERRORS);
+	ICMP_INC_STATS_BH(dev_net(skb->dev), skb->dev, ICMP_MIB_INERRORS);
 	goto out;
 }
 
@@ -867,7 +869,7 @@ static void icmp_timestamp(struct sk_buff *skb)
 out:
 	return;
 out_err:
-	ICMP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), ICMP_MIB_INERRORS);
+	ICMP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), skb_dst(skb)->dev, ICMP_MIB_INERRORS);
 	goto out;
 }
 
@@ -963,7 +965,8 @@ int icmp_rcv(struct sk_buff *skb)
 {
 	struct icmphdr *icmph;
 	struct rtable *rt = skb_rtable(skb);
-	struct net *net = dev_net(rt->dst.dev);
+	struct net_device *dev = rt->dst.dev;
+	struct net *net = dev_net(dev);
 
 	if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
 		struct sec_path *sp = skb_sec_path(skb);
@@ -985,7 +988,7 @@ int icmp_rcv(struct sk_buff *skb)
 		skb_set_network_header(skb, nh);
 	}
 
-	ICMP_INC_STATS_BH(net, ICMP_MIB_INMSGS);
+	ICMP_INC_STATS_BH(net, dev, ICMP_MIB_INMSGS);
 
 	switch (skb->ip_summed) {
 	case CHECKSUM_COMPLETE:
@@ -1003,7 +1006,7 @@ int icmp_rcv(struct sk_buff *skb)
 
 	icmph = icmp_hdr(skb);
 
-	ICMPMSGIN_INC_STATS_BH(net, icmph->type);
+	ICMPMSGIN_INC_STATS_BH(net, dev, icmph->type);
 	/*
 	 *	18 is the highest 'known' ICMP type. Anything else is a mystery
 	 *
@@ -1044,7 +1047,7 @@ drop:
 	kfree_skb(skb);
 	return 0;
 error:
-	ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
+	ICMP_INC_STATS_BH(net, dev, ICMP_MIB_INERRORS);
 	goto drop;
 }
 
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 2e4e244..f527d70 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -372,9 +372,11 @@ struct dst_entry *inet_csk_route_req(struct sock *sk,
 	return &rt->dst;
 
 route_err:
+	IP_INC_STATS_BH(net, rt->dst.dev, IPSTATS_MIB_OUTNOROUTES);
 	ip_rt_put(rt);
+	return NULL;
 no_route:
-	IP_INC_STATS_BH(net, IPSTATS_MIB_OUTNOROUTES);
+	IP_INC_STATS_BH(net, NULL, IPSTATS_MIB_OUTNOROUTES);
 	return NULL;
 }
 EXPORT_SYMBOL_GPL(inet_csk_route_req);
@@ -405,9 +407,11 @@ struct dst_entry *inet_csk_route_child_sock(struct sock *sk,
 	return &rt->dst;
 
 route_err:
+	IP_INC_STATS_BH(net, rt->dst.dev, IPSTATS_MIB_OUTNOROUTES);
 	ip_rt_put(rt);
+	return NULL;
 no_route:
-	IP_INC_STATS_BH(net, IPSTATS_MIB_OUTNOROUTES);
+	IP_INC_STATS_BH(net, NULL, IPSTATS_MIB_OUTNOROUTES);
 	return NULL;
 }
 EXPORT_SYMBOL_GPL(inet_csk_route_child_sock);
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 29a07b6..f8ab57e 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -43,7 +43,8 @@ static int ip_forward_finish(struct sk_buff *skb)
 {
 	struct ip_options * opt	= &(IPCB(skb)->opt);
 
-	IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);
+	IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), skb_dst(skb)->dev,
+		IPSTATS_MIB_OUTFORWDATAGRAMS);
 
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
@@ -89,7 +90,7 @@ int ip_forward(struct sk_buff *skb)
 
 	if (unlikely(skb->len > dst_mtu(&rt->dst) && !skb_is_gso(skb) &&
 		     (ip_hdr(skb)->frag_off & htons(IP_DF))) && !skb->local_df) {
-		IP_INC_STATS(dev_net(rt->dst.dev), IPSTATS_MIB_FRAGFAILS);
+		IP_INC_STATS(dev_net(rt->dst.dev), rt->dst.dev, IPSTATS_MIB_FRAGFAILS);
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 			  htonl(dst_mtu(&rt->dst)));
 		goto drop;
@@ -124,7 +125,8 @@ sr_failed:
 
 too_many_hops:
 	/* Tell the sender its packet died... */
-	IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_INHDRERRORS);
+	IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), skb_dst(skb)->dev,
+		IPSTATS_MIB_INHDRERRORS);
 	icmp_send(skb, ICMP_TIME_EXCEEDED, ICMP_EXC_TTL, 0);
 drop:
 	kfree_skb(skb);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index fdaabf2..9aa00e1 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -209,13 +209,14 @@ static void ipq_kill(struct ipq *ipq)
 /* Memory limiting on fragments.  Evictor trashes the oldest
  * fragment queue until we are back under the threshold.
  */
-static void ip_evictor(struct net *net)
+static void ip_evictor(struct net_device *dev)
 {
 	int evicted;
+	struct net *net = dev_net(dev);
 
 	evicted = inet_frag_evictor(&net->ipv4.frags, &ip4_frags);
 	if (evicted)
-		IP_ADD_STATS_BH(net, IPSTATS_MIB_REASMFAILS, evicted);
+		IP_ADD_STATS_BH(net, dev, IPSTATS_MIB_REASMFAILS, evicted);
 }
 
 /*
@@ -225,6 +226,7 @@ static void ip_expire(unsigned long arg)
 {
 	struct ipq *qp;
 	struct net *net;
+	struct net_device *dev;
 
 	qp = container_of((struct inet_frag_queue *) arg, struct ipq, q);
 	net = container_of(qp->q.net, struct net, ipv4.frags);
@@ -235,19 +237,18 @@ static void ip_expire(unsigned long arg)
 		goto out;
 
 	ipq_kill(qp);
+	
+	rcu_read_lock();
+	dev = dev_get_by_index_rcu(net, qp->iif);
+	IP_INC_STATS_BH(net, dev, IPSTATS_MIB_REASMTIMEOUT);
+	IP_INC_STATS_BH(net, dev, IPSTATS_MIB_REASMFAILS);
 
-	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMTIMEOUT);
-	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMFAILS);
-
-	if ((qp->q.last_in & INET_FRAG_FIRST_IN) && qp->q.fragments != NULL) {
+	if ((qp->q.last_in & INET_FRAG_FIRST_IN) && qp->q.fragments && dev) {
 		struct sk_buff *head = qp->q.fragments;
 		const struct iphdr *iph;
 		int err;
 
-		rcu_read_lock();
-		head->dev = dev_get_by_index_rcu(net, qp->iif);
-		if (!head->dev)
-			goto out_rcu_unlock;
+		head->dev = dev;
 
 		/* skb dst is stale, drop it, and perform route lookup again */
 		skb_dst_drop(head);
@@ -269,9 +270,9 @@ static void ip_expire(unsigned long arg)
 
 		/* Send an ICMP "Fragment Reassembly Timeout" message. */
 		icmp_send(head, ICMP_TIME_EXCEEDED, ICMP_EXC_FRAGTIME, 0);
-out_rcu_unlock:
-		rcu_read_unlock();
 	}
+out_rcu_unlock:
+	rcu_read_unlock();
 out:
 	spin_unlock(&qp->q.lock);
 	ipq_put(qp);
@@ -325,7 +326,10 @@ static inline int ip_frag_too_far(struct ipq *qp)
 		struct net *net;
 
 		net = container_of(qp->q.net, struct net, ipv4.frags);
-		IP_INC_STATS_BH(net, IPSTATS_MIB_REASMFAILS);
+		rcu_read_lock();
+		IP_INC_STATS_BH(net, dev_get_by_index_rcu(net, qp->iif),
+			IPSTATS_MIB_REASMFAILS);
+		rcu_read_unlock();
 	}
 
 	return rc;
@@ -631,7 +635,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 	iph->frag_off = 0;
 	iph->tot_len = htons(len);
 	iph->tos |= ecn;
-	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMOKS);
+	IP_INC_STATS_BH(net, dev, IPSTATS_MIB_REASMOKS);
 	qp->q.fragments = NULL;
 	qp->q.fragments_tail = NULL;
 	return 0;
@@ -646,7 +650,7 @@ out_oversize:
 		printk(KERN_INFO "Oversized IP packet from %pI4.\n",
 			&qp->saddr);
 out_fail:
-	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMFAILS);
+	IP_INC_STATS_BH(net, dev, IPSTATS_MIB_REASMFAILS);
 	return err;
 }
 
@@ -655,13 +659,14 @@ int ip_defrag(struct sk_buff *skb, u32 user)
 {
 	struct ipq *qp;
 	struct net *net;
+	struct net_device *dev = skb->dev ? skb->dev : skb_dst(skb)->dev;
 
-	net = skb->dev ? dev_net(skb->dev) : dev_net(skb_dst(skb)->dev);
-	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMREQDS);
+	net = dev_net(dev);
+	IP_INC_STATS_BH(net, dev, IPSTATS_MIB_REASMREQDS);
 
 	/* Start by cleaning up the memory. */
 	if (atomic_read(&net->ipv4.frags.mem) > net->ipv4.frags.high_thresh)
-		ip_evictor(net);
+		ip_evictor(dev);
 
 	/* Lookup (or create) queue header */
 	if ((qp = ip_find(net, ip_hdr(skb), user)) != NULL) {
@@ -676,7 +681,7 @@ int ip_defrag(struct sk_buff *skb, u32 user)
 		return ret;
 	}
 
-	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMFAILS);
+	IP_INC_STATS_BH(net, dev, IPSTATS_MIB_REASMFAILS);
 	kfree_skb(skb);
 	return -ENOMEM;
 }
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 073a9b0..53f5ba0 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -187,8 +187,9 @@ int ip_call_ra_chain(struct sk_buff *skb)
 
 static int ip_local_deliver_finish(struct sk_buff *skb)
 {
-	struct net *net = dev_net(skb->dev);
-
+	struct net_device *dev = skb->dev;
+	struct net *net = dev_net(dev);
+	
 	__skb_pull(skb, ip_hdrlen(skb));
 
 	/* Point into the IP datagram, just past the header. */
@@ -228,16 +229,16 @@ static int ip_local_deliver_finish(struct sk_buff *skb)
 				protocol = -ret;
 				goto resubmit;
 			}
-			IP_INC_STATS_BH(net, IPSTATS_MIB_INDELIVERS);
+			IP_INC_STATS_BH(net, dev, IPSTATS_MIB_INDELIVERS);
 		} else {
 			if (!raw) {
 				if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-					IP_INC_STATS_BH(net, IPSTATS_MIB_INUNKNOWNPROTOS);
+					IP_INC_STATS_BH(net, dev, IPSTATS_MIB_INUNKNOWNPROTOS);
 					icmp_send(skb, ICMP_DEST_UNREACH,
 						  ICMP_PROT_UNREACH, 0);
 				}
 			} else
-				IP_INC_STATS_BH(net, IPSTATS_MIB_INDELIVERS);
+				IP_INC_STATS_BH(net, dev, IPSTATS_MIB_INDELIVERS);
 			kfree_skb(skb);
 		}
 	}
@@ -279,7 +280,7 @@ static inline int ip_rcv_options(struct sk_buff *skb)
 					      --ANK (980813)
 	*/
 	if (skb_cow(skb, skb_headroom(skb))) {
-		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
+		IP_INC_STATS_BH(dev_net(dev), dev, IPSTATS_MIB_INDISCARDS);
 		goto drop;
 	}
 
@@ -288,7 +289,7 @@ static inline int ip_rcv_options(struct sk_buff *skb)
 	opt->optlen = iph->ihl*4 - sizeof(struct iphdr);
 
 	if (ip_options_compile(dev_net(dev), opt, skb)) {
-		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INHDRERRORS);
+		IP_INC_STATS_BH(dev_net(dev), dev, IPSTATS_MIB_INHDRERRORS);
 		goto drop;
 	}
 
@@ -328,10 +329,10 @@ static int ip_rcv_finish(struct sk_buff *skb)
 					       iph->tos, skb->dev);
 		if (unlikely(err)) {
 			if (err == -EHOSTUNREACH)
-				IP_INC_STATS_BH(dev_net(skb->dev),
+				IP_INC_STATS_BH(dev_net(skb->dev), skb->dev,
 						IPSTATS_MIB_INADDRERRORS);
 			else if (err == -ENETUNREACH)
-				IP_INC_STATS_BH(dev_net(skb->dev),
+				IP_INC_STATS_BH(dev_net(skb->dev), skb->dev,
 						IPSTATS_MIB_INNOROUTES);
 			else if (err == -EXDEV)
 				NET_INC_STATS_BH(dev_net(skb->dev),
@@ -356,10 +357,10 @@ static int ip_rcv_finish(struct sk_buff *skb)
 
 	rt = skb_rtable(skb);
 	if (rt->rt_type == RTN_MULTICAST) {
-		IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INMCAST,
+		IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), rt->dst.dev, IPSTATS_MIB_INMCAST,
 				skb->len);
 	} else if (rt->rt_type == RTN_BROADCAST)
-		IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INBCAST,
+		IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), rt->dst.dev, IPSTATS_MIB_INBCAST,
 				skb->len);
 
 	return dst_input(skb);
@@ -384,10 +385,10 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 		goto drop;
 
 
-	IP_UPD_PO_STATS_BH(dev_net(dev), IPSTATS_MIB_IN, skb->len);
+	IP_UPD_PO_STATS_BH(dev_net(dev), dev, IPSTATS_MIB_IN, skb->len);
 
 	if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL) {
-		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
+		IP_INC_STATS_BH(dev_net(dev), dev, IPSTATS_MIB_INDISCARDS);
 		goto out;
 	}
 
@@ -420,7 +421,7 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 
 	len = ntohs(iph->tot_len);
 	if (skb->len < len) {
-		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INTRUNCATEDPKTS);
+		IP_INC_STATS_BH(dev_net(dev), dev, IPSTATS_MIB_INTRUNCATEDPKTS);
 		goto drop;
 	} else if (len < (iph->ihl*4))
 		goto inhdr_error;
@@ -430,7 +431,7 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 	 * Note this now means skb->len holds ntohs(iph->tot_len).
 	 */
 	if (pskb_trim_rcsum(skb, len)) {
-		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
+		IP_INC_STATS_BH(dev_net(dev), dev, IPSTATS_MIB_INDISCARDS);
 		goto drop;
 	}
 
@@ -444,7 +445,7 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt,
 		       ip_rcv_finish);
 
 inhdr_error:
-	IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INHDRERRORS);
+	IP_INC_STATS_BH(dev_net(dev), dev, IPSTATS_MIB_INHDRERRORS);
 drop:
 	kfree_skb(skb);
 out:
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index ff302bd..994bbb5 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -186,9 +186,9 @@ static inline int ip_finish_output2(struct sk_buff *skb)
 	struct neighbour *neigh;
 
 	if (rt->rt_type == RTN_MULTICAST) {
-		IP_UPD_PO_STATS(dev_net(dev), IPSTATS_MIB_OUTMCAST, skb->len);
+		IP_UPD_PO_STATS(dev_net(dev), dev, IPSTATS_MIB_OUTMCAST, skb->len);
 	} else if (rt->rt_type == RTN_BROADCAST)
-		IP_UPD_PO_STATS(dev_net(dev), IPSTATS_MIB_OUTBCAST, skb->len);
+		IP_UPD_PO_STATS(dev_net(dev), dev, IPSTATS_MIB_OUTBCAST, skb->len);
 
 	/* Be paranoid, rather than too clever. */
 	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
@@ -253,7 +253,7 @@ int ip_mc_output(struct sk_buff *skb)
 	/*
 	 *	If the indicated interface is up and running, send the packet.
 	 */
-	IP_UPD_PO_STATS(dev_net(dev), IPSTATS_MIB_OUT, skb->len);
+	IP_UPD_PO_STATS(dev_net(dev), dev, IPSTATS_MIB_OUT, skb->len);
 
 	skb->dev = dev;
 	skb->protocol = htons(ETH_P_IP);
@@ -309,7 +309,7 @@ int ip_output(struct sk_buff *skb)
 {
 	struct net_device *dev = skb_dst(skb)->dev;
 
-	IP_UPD_PO_STATS(dev_net(dev), IPSTATS_MIB_OUT, skb->len);
+	IP_UPD_PO_STATS(dev_net(dev), dev, IPSTATS_MIB_OUT, skb->len);
 
 	skb->dev = dev;
 	skb->protocol = htons(ETH_P_IP);
@@ -375,7 +375,7 @@ int ip_queue_xmit(struct sk_buff *skb, struct flowi *fl)
 					   RT_CONN_FLAGS(sk),
 					   sk->sk_bound_dev_if);
 		if (IS_ERR(rt))
-			goto no_route;
+			goto no_route_no_dev;
 		sk_setup_caps(sk, &rt->dst);
 	}
 	skb_dst_set_noref(skb, &rt->dst);
@@ -414,11 +414,16 @@ packet_routed:
 	rcu_read_unlock();
 	return res;
 
-no_route:
+out_err:
 	rcu_read_unlock();
-	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
 	kfree_skb(skb);
 	return -EHOSTUNREACH;
+no_route_no_dev:
+	IP_INC_STATS(sock_net(sk), NULL, IPSTATS_MIB_OUTNOROUTES);
+	goto out_err;
+no_route:
+	IP_INC_STATS(sock_net(sk), rt->dst.dev, IPSTATS_MIB_OUTNOROUTES);
+	goto out_err;
 }
 EXPORT_SYMBOL(ip_queue_xmit);
 
@@ -478,7 +483,7 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 	iph = ip_hdr(skb);
 
 	if (unlikely((iph->frag_off & htons(IP_DF)) && !skb->local_df)) {
-		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
+		IP_INC_STATS(dev_net(dev), dev, IPSTATS_MIB_FRAGFAILS);
 		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 			  htonl(ip_skb_dst_mtu(skb)));
 		kfree_skb(skb);
@@ -570,7 +575,7 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 			err = output(skb);
 
 			if (!err)
-				IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGCREATES);
+				IP_INC_STATS(dev_net(dev), dev, IPSTATS_MIB_FRAGCREATES);
 			if (err || !frag)
 				break;
 
@@ -580,7 +585,7 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 		}
 
 		if (err == 0) {
-			IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGOKS);
+			IP_INC_STATS(dev_net(dev), dev, IPSTATS_MIB_FRAGOKS);
 			return 0;
 		}
 
@@ -589,7 +594,7 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 			kfree_skb(frag);
 			frag = skb;
 		}
-		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
+		IP_INC_STATS(dev_net(dev), dev, IPSTATS_MIB_FRAGFAILS);
 		return err;
 
 slow_path_clean:
@@ -708,15 +713,15 @@ slow_path:
 		if (err)
 			goto fail;
 
-		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGCREATES);
+		IP_INC_STATS(dev_net(dev), dev, IPSTATS_MIB_FRAGCREATES);
 	}
 	kfree_skb(skb);
-	IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGOKS);
+	IP_INC_STATS(dev_net(dev), dev, IPSTATS_MIB_FRAGOKS);
 	return err;
 
 fail:
 	kfree_skb(skb);
-	IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
+	IP_INC_STATS(dev_net(dev), dev, IPSTATS_MIB_FRAGFAILS);
 	return err;
 }
 EXPORT_SYMBOL(ip_fragment);
@@ -1049,7 +1054,7 @@ alloc_new_skb:
 
 error:
 	cork->length -= length;
-	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTDISCARDS);
+	IP_INC_STATS(sock_net(sk), rt->dst.dev, IPSTATS_MIB_OUTDISCARDS);
 	return err;
 }
 
@@ -1270,7 +1275,7 @@ ssize_t	ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page,
 
 error:
 	cork->length -= size;
-	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTDISCARDS);
+	IP_INC_STATS(sock_net(sk), rt->dst.dev, IPSTATS_MIB_OUTDISCARDS);
 	return err;
 }
 
@@ -1295,7 +1300,6 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	struct sk_buff *skb, *tmp_skb;
 	struct sk_buff **tail_skb;
 	struct inet_sock *inet = inet_sk(sk);
-	struct net *net = sock_net(sk);
 	struct ip_options *opt = NULL;
 	struct rtable *rt = (struct rtable *)cork->dst;
 	struct iphdr *iph;
@@ -1368,7 +1372,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	skb_dst_set(skb, &rt->dst);
 
 	if (iph->protocol == IPPROTO_ICMP)
-		icmp_out_count(net, ((struct icmphdr *)
+		icmp_out_count(rt->dst.dev, ((struct icmphdr *)
 			skb_transport_header(skb))->type);
 
 	ip_cork_release(cork);
@@ -1379,6 +1383,7 @@ out:
 int ip_send_skb(struct sk_buff *skb)
 {
 	struct net *net = sock_net(skb->sk);
+	struct net_device *dev = skb_dst(skb)->dev;
 	int err;
 
 	err = ip_local_out(skb);
@@ -1386,7 +1391,7 @@ int ip_send_skb(struct sk_buff *skb)
 		if (err > 0)
 			err = net_xmit_errno(err);
 		if (err)
-			IP_INC_STATS(net, IPSTATS_MIB_OUTDISCARDS);
+			IP_INC_STATS(net, dev, IPSTATS_MIB_OUTDISCARDS);
 	}
 
 	return err;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 8e54490..0aaa704 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1575,7 +1575,8 @@ static inline int ipmr_forward_finish(struct sk_buff *skb)
 {
 	struct ip_options *opt = &(IPCB(skb)->opt);
 
-	IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);
+	IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev),
+		skb_dst(skb)->dev, IPSTATS_MIB_OUTFORWDATAGRAMS);
 
 	if (unlikely(opt->optlen))
 		ip_forward_options(skb);
@@ -1637,7 +1638,8 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 		 * to blackhole.
 		 */
 
-		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
+		IP_INC_STATS_BH(dev_net(dev), dev,
+			IPSTATS_MIB_FRAGFAILS);
 		ip_rt_put(rt);
 		goto out_free;
 	}
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 43d4c3b..10504e8 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -567,7 +567,7 @@ static int ping_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		err = PTR_ERR(rt);
 		rt = NULL;
 		if (err == -ENETUNREACH)
-			IP_INC_STATS_BH(net, IPSTATS_MIB_OUTNOROUTES);
+			IP_INC_STATS_BH(net, NULL, IPSTATS_MIB_OUTNOROUTES);
 		goto out;
 	}
 
@@ -602,13 +602,13 @@ back_from_confirm:
 	release_sock(sk);
 
 out:
-	ip_rt_put(rt);
 	if (free)
 		kfree(ipc.opt);
 	if (!err) {
-		icmp_out_count(sock_net(sk), user_icmph.type);
-		return len;
+		icmp_out_count(rt->dst.dev, user_icmph.type);
+		err = len;
 	}
+	ip_rt_put(rt);
 	return err;
 
 do_confirm:
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 3ccda5a..7c579c3 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -387,7 +387,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 		iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);
 	}
 	if (iph->protocol == IPPROTO_ICMP)
-		icmp_out_count(net, ((struct icmphdr *)
+		icmp_out_count(rt->dst.dev, ((struct icmphdr *)
 			skb_transport_header(skb))->type);
 
 	err = NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_OUT, skb, NULL,
@@ -402,7 +402,7 @@ out:
 error_free:
 	kfree_skb(skb);
 error:
-	IP_INC_STATS(net, IPSTATS_MIB_OUTDISCARDS);
+	IP_INC_STATS(net, rt->dst.dev, IPSTATS_MIB_OUTDISCARDS);
 	if (err == -ENOBUFS && !inet->recverr)
 		err = 0;
 	return err;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f30112f..e9b1124 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1545,7 +1545,7 @@ static int ip_error(struct sk_buff *skb)
 	case ENETUNREACH:
 		code = ICMP_NET_UNREACH;
 		IP_INC_STATS_BH(dev_net(rt->dst.dev),
-				IPSTATS_MIB_INNOROUTES);
+				rt->dst.dev, IPSTATS_MIB_INNOROUTES);
 		break;
 	case EACCES:
 		code = ICMP_PKT_FILTERED;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 1eb4ad5..8b99f21 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -183,7 +183,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	if (IS_ERR(rt)) {
 		err = PTR_ERR(rt);
 		if (err == -ENETUNREACH)
-			IP_INC_STATS_BH(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
+			IP_INC_STATS_BH(sock_net(sk), NULL, IPSTATS_MIB_OUTNOROUTES);
 		return err;
 	}
 
@@ -359,17 +359,18 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
 	__u32 seq;
 	__u32 remaining;
 	int err;
-	struct net *net = dev_net(icmp_skb->dev);
+	struct net_device *dev = icmp_skb->dev;
+	struct net *net = dev_net(dev);
 
 	if (icmp_skb->len < (iph->ihl << 2) + 8) {
-		ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
+		ICMP_INC_STATS_BH(net, dev, ICMP_MIB_INERRORS);
 		return;
 	}
 
 	sk = inet_lookup(net, &tcp_hashinfo, iph->daddr, th->dest,
 			iph->saddr, th->source, inet_iif(icmp_skb));
 	if (!sk) {
-		ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
+		ICMP_INC_STATS_BH(net, dev, ICMP_MIB_INERRORS);
 		return;
 	}
 	if (sk->sk_state == TCP_TIME_WAIT) {
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 5d075b5..b4af9c0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -587,12 +587,13 @@ void __udp4_lib_err(struct sk_buff *skb, u32 info, struct udp_table *udptable)
 	struct sock *sk;
 	int harderr;
 	int err;
-	struct net *net = dev_net(skb->dev);
+	struct net_device *dev = skb->dev;
+	struct net *net = dev_net(dev);
 
 	sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
 			iph->saddr, uh->source, skb->dev->ifindex, udptable);
 	if (sk == NULL) {
-		ICMP_INC_STATS_BH(net, ICMP_MIB_INERRORS);
+		ICMP_INC_STATS_BH(net, dev, ICMP_MIB_INERRORS);
 		return;	/* No socket for error */
 	}
 
@@ -937,7 +938,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 			err = PTR_ERR(rt);
 			rt = NULL;
 			if (err == -ENETUNREACH)
-				IP_INC_STATS_BH(net, IPSTATS_MIB_OUTNOROUTES);
+				IP_INC_STATS_BH(net, NULL, IPSTATS_MIB_OUTNOROUTES);
 			goto out;
 		}
 
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index d21e7eb..1620860 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -330,7 +330,7 @@ static int l2tp_ip_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	if (IS_ERR(rt)) {
 		rc = PTR_ERR(rt);
 		if (rc == -ENETUNREACH)
-			IP_INC_STATS_BH(&init_net, IPSTATS_MIB_OUTNOROUTES);
+			IP_INC_STATS_BH(&init_net, NULL, IPSTATS_MIB_OUTNOROUTES);
 		goto out;
 	}
 
@@ -406,7 +406,7 @@ static int l2tp_ip_backlog_recv(struct sock *sk, struct sk_buff *skb)
 	return 0;
 
 drop:
-	IP_INC_STATS(&init_net, IPSTATS_MIB_INDISCARDS);
+	IP_INC_STATS(&init_net, skb->dev, IPSTATS_MIB_INDISCARDS);
 	kfree_skb(skb);
 	return -1;
 }
@@ -532,7 +532,7 @@ out:
 
 no_route:
 	rcu_read_unlock();
-	IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
+	IP_INC_STATS(sock_net(sk), NULL, IPSTATS_MIB_OUTNOROUTES);
 	kfree_skb(skb);
 	rc = -EHOSTUNREACH;
 	goto out;
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 80f71af..f86f811 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -576,7 +576,7 @@ void sctp_v4_err(struct sk_buff *skb, __u32 info)
 	int err;
 
 	if (skb->len < ihlen + 8) {
-		ICMP_INC_STATS_BH(&init_net, ICMP_MIB_INERRORS);
+		ICMP_INC_STATS_BH(&init_net, skb->dev, ICMP_MIB_INERRORS);
 		return;
 	}
 
@@ -590,7 +590,7 @@ void sctp_v4_err(struct sk_buff *skb, __u32 info)
 	skb->network_header = saveip;
 	skb->transport_header = savesctp;
 	if (!sk) {
-		ICMP_INC_STATS_BH(&init_net, ICMP_MIB_INERRORS);
+		ICMP_INC_STATS_BH(&init_net, skb->dev, ICMP_MIB_INERRORS);
 		return;
 	}
 	/* Warning:  The sock lock is held.  Remember to call
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 08b3cea..7fb40d0 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -569,7 +569,7 @@ out:
 	return err;
 no_route:
 	kfree_skb(nskb);
-	IP_INC_STATS_BH(&init_net, IPSTATS_MIB_OUTNOROUTES);
+	IP_INC_STATS_BH(&init_net, NULL, IPSTATS_MIB_OUTNOROUTES);
 
 	/* FIXME: Returning the 'err' will effect all the associations
 	 * associated with a socket, although only one of the paths of the
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 15:25 [PATCH 00/10 net-next] Introduce per interface ipv4 statistics igorm
                   ` (9 preceding siblings ...)
  2011-12-16 15:26 ` [PATCH 10/10 net-next] net: Enable ipv4 per interface statistics igorm
@ 2011-12-16 15:41 ` Eric Dumazet
  2011-12-16 15:58   ` Igor Maravić
  2011-12-16 18:27   ` David Miller
  10 siblings, 2 replies; 26+ messages in thread
From: Eric Dumazet @ 2011-12-16 15:41 UTC (permalink / raw)
  To: igorm; +Cc: netdev, davem

Le vendredi 16 décembre 2011 à 16:25 +0100, igorm@etf.rs a écrit :
> From: Igor Maravic <igorm@etf.rs>
> 
> Hi all,
> in this patch series I introduced per interface statistics for ipv4.
> 
> I referenced how it was done for ipv6 per interface statistics.
> BR
> Igor

Questions/Comments are :

1) Why is it needed ? Any RFC requires this bloat ?

2) Every patch in a patch serie must be compilable. Think about
bisection. This is mandatory.

  You cannot have a patch that change prototypes, like your 06/10 
-#define ICMP_INC_STATS(net, field)     SNMP_INC_STATS((net)->mib.icmp_statistics, field)
+#define ICMP_INC_STATS(net, dev, field)        \

 Then, in another patch, 'fix' callers, like your 10/10

-               IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INTRUNCATEDPKTS);
+               IP_INC_STATS_BH(dev_net(dev), dev, IPSTATS_MIB_INTRUNCATEDPKTS);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 15:41 ` [PATCH 00/10 net-next] Introduce per interface ipv4 statistics Eric Dumazet
@ 2011-12-16 15:58   ` Igor Maravić
  2011-12-16 16:33     ` Eric Dumazet
  2011-12-16 18:28     ` David Miller
  2011-12-16 18:27   ` David Miller
  1 sibling, 2 replies; 26+ messages in thread
From: Igor Maravić @ 2011-12-16 15:58 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem

> 1) Why is it needed ? Any RFC requires this bloat ?
>

RFC4293 defines ipIfStatsTable.
Also I did it because ipv6 has per interface stats.

> 2) Every patch in a patch serie must be compilable. Think about
> bisection. This is mandatory.
>

OK, I didn't know about that...
Do I need to resubmit it? If I do, I'l do that in monday.
Would it be acceptable to send it as one big patch?
BR
Igor

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 15:58   ` Igor Maravić
@ 2011-12-16 16:33     ` Eric Dumazet
  2011-12-16 16:44       ` Christoph Lameter
  2011-12-16 18:31       ` David Miller
  2011-12-16 18:28     ` David Miller
  1 sibling, 2 replies; 26+ messages in thread
From: Eric Dumazet @ 2011-12-16 16:33 UTC (permalink / raw)
  To: igorm; +Cc: netdev, davem

Le vendredi 16 décembre 2011 à 16:58 +0100, Igor Maravić a écrit :
> > 1) Why is it needed ? Any RFC requires this bloat ?
> >
> 
> RFC4293 defines ipIfStatsTable.

Only as an option :

   In addition to the ipSystemStatsTable, the MIB includes the
   ipIfStatsTable.  This table counts the same items as the system table
   but does so on a per interface basis.  It is optional and may be
   ignored.  If you decide to implement it, you may wish to arrange to
   collect the data on a per-interface basis and then sum those counters
   in order to provide the aggregate system level statistics.  However,
   if you choose to provide the system level statistics by summing the
   interface level counters, no interface level statistics can be lost -
   if an interface is removed, the statistics associated with it must be
   retained.

If not enough memory is available, we cannot up the interfaces anymore
after your patches. Some people disable ipv6 on their machines because
they setup thousand of interfaces, and added memory cost of
ipIfStatsTable on IPv6 is huge.

> Also I did it because ipv6 has per interface stats.

I was considering _removing_ them, or make it optional.

Memory costs are huge for percpu data.

> 
> > 2) Every patch in a patch serie must be compilable. Think about
> > bisection. This is mandatory.
> >
> 
> OK, I didn't know about that...
> Do I need to resubmit it? If I do, I'l do that in monday.

As I said, this is _mandatory_.

Before spending time on this stuff, please wait for other people
comments.

> Would it be acceptable to send it as one big patch?

I dont think so, its too big.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 16:33     ` Eric Dumazet
@ 2011-12-16 16:44       ` Christoph Lameter
  2011-12-16 16:50         ` Eric Dumazet
  2011-12-16 18:31       ` David Miller
  1 sibling, 1 reply; 26+ messages in thread
From: Christoph Lameter @ 2011-12-16 16:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: igorm, netdev, davem

On Fri, 16 Dec 2011, Eric Dumazet wrote:

> > Also I did it because ipv6 has per interface stats.
>
> I was considering _removing_ them, or make it optional.
>
> Memory costs are huge for percpu data.
>

And these costs are going to increase as we add more processors. Intel is
talking about hundred in the future.

So maybe we need to change the per cpu subsystem to be able to allocate
these only for subsets of cpus and then only allow the operation of
the network subsystems on the same subset? That would also increase cache
hotness. Pretty radical but I think at some point we will have to consider
that given that the number of cpus keep growing.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 16:44       ` Christoph Lameter
@ 2011-12-16 16:50         ` Eric Dumazet
  2011-12-16 16:55           ` Christoph Lameter
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-12-16 16:50 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: igorm, netdev, davem

Le vendredi 16 décembre 2011 à 10:44 -0600, Christoph Lameter a écrit :
> On Fri, 16 Dec 2011, Eric Dumazet wrote:
> 
> > > Also I did it because ipv6 has per interface stats.
> >
> > I was considering _removing_ them, or make it optional.
> >
> > Memory costs are huge for percpu data.
> >
> 
> And these costs are going to increase as we add more processors. Intel is
> talking about hundred in the future.
> 
> So maybe we need to change the per cpu subsystem to be able to allocate
> these only for subsets of cpus and then only allow the operation of
> the network subsystems on the same subset? That would also increase cache
> hotness. Pretty radical but I think at some point we will have to consider
> that given that the number of cpus keep growing.
> 

Or we could use a hierarchical split : Say 16 (or 32 or 64) cpus share
same counters (must be atomic if NR_CPUS > 16/32/64)

percpu_alloc() -> percpugroup_alloc()

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 16:50         ` Eric Dumazet
@ 2011-12-16 16:55           ` Christoph Lameter
  2011-12-16 17:14             ` Eric Dumazet
  2011-12-16 17:19             ` Stephen Hemminger
  0 siblings, 2 replies; 26+ messages in thread
From: Christoph Lameter @ 2011-12-16 16:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: igorm, netdev, davem

On Fri, 16 Dec 2011, Eric Dumazet wrote:

> Or we could use a hierarchical split : Say 16 (or 32 or 64) cpus share
> same counters (must be atomic if NR_CPUS > 16/32/64)

Then you'd need to have locking or full atomic operations for the
counters. Reduction of network processing to a set of processors also will
have other beneficial effects in addition to cache hotness. It would
removing OS jitter etc etc.

> percpu_alloc() -> percpugroup_alloc()

Sounds like a per cpuset/cgroup allocation?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 16:55           ` Christoph Lameter
@ 2011-12-16 17:14             ` Eric Dumazet
  2011-12-16 17:29               ` Christoph Lameter
  2011-12-16 17:19             ` Stephen Hemminger
  1 sibling, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-12-16 17:14 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: igorm, netdev, davem

Le vendredi 16 décembre 2011 à 10:55 -0600, Christoph Lameter a écrit :
> On Fri, 16 Dec 2011, Eric Dumazet wrote:
> 
> > Or we could use a hierarchical split : Say 16 (or 32 or 64) cpus share
> > same counters (must be atomic if NR_CPUS > 16/32/64)
> 
> Then you'd need to have locking or full atomic operations for the
> counters.

I mentioned that : "(must be atomic if NR_CPUS > 16/32/64)"


>  Reduction of network processing to a set of processors also will
> have other beneficial effects in addition to cache hotness. It would
> removing OS jitter etc etc.
> 

You already can do that right now, with or without hardware help.

See numerous improvements in this area (RPS/RFS/RSS/XPS ... in
Documentation/networking/scaling.txt)


> > percpu_alloc() -> percpugroup_alloc()
> 
> Sounds like a per cpuset/cgroup allocation?
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 17:14             ` Eric Dumazet
@ 2011-12-16 17:29               ` Christoph Lameter
  2011-12-16 18:08                 ` Eric Dumazet
  2011-12-16 18:39                 ` David Miller
  0 siblings, 2 replies; 26+ messages in thread
From: Christoph Lameter @ 2011-12-16 17:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: igorm, netdev, davem

On Fri, 16 Dec 2011, Eric Dumazet wrote:

> >  Reduction of network processing to a set of processors also will
> > have other beneficial effects in addition to cache hotness. It would
> > removing OS jitter etc etc.
> >
>
> You already can do that right now, with or without hardware help.
>
> See numerous improvements in this area (RPS/RFS/RSS/XPS ... in
> Documentation/networking/scaling.txt)

I have some latency critical processes here and I wish I could get
networking in general off the processor where the latency sensitive stuff
is running should that process decide to do some calls that cause network
I/O.

Traditional networking is a slow process these days.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 17:29               ` Christoph Lameter
@ 2011-12-16 18:08                 ` Eric Dumazet
  2011-12-16 18:30                   ` Christoph Lameter
  2011-12-16 18:39                 ` David Miller
  1 sibling, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-12-16 18:08 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: igorm, netdev, davem

Le vendredi 16 décembre 2011 à 11:29 -0600, Christoph Lameter a écrit :

> I have some latency critical processes here and I wish I could get
> networking in general off the processor where the latency sensitive stuff
> is running should that process decide to do some calls that cause network
> I/O.
> 
> Traditional networking is a slow process these days.


Most of the slow process is in the process scheduler actually and cache
line misses in large TCP structures, but also in many layers (Qdisc...),
not counting icache footprint.

We slowly improve things, but its always a tradeoff (did I said code
bloat ?)

If you have dedicated network thread(s) in your application, bound to
the right cpu(s), then all network stack can be run on the cpus you
decide [ Also needs using correct irq affinities for the NIC interrupts]

This means your latency critical threads should delegate their network
IO (like disk IO) to other threads, _and_ avoid being blocked in
scheduler land.

Given the nature of socket api, I am not sure adding a layer to
transparently delegate network IO to a pool of dedicated cpus would be a
win.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 18:08                 ` Eric Dumazet
@ 2011-12-16 18:30                   ` Christoph Lameter
  0 siblings, 0 replies; 26+ messages in thread
From: Christoph Lameter @ 2011-12-16 18:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: igorm, netdev, davem

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1807 bytes --]

On Fri, 16 Dec 2011, Eric Dumazet wrote:

> Le vendredi 16 décembre 2011 à 11:29 -0600, Christoph Lameter a écrit :
>
> > I have some latency critical processes here and I wish I could get
> > networking in general off the processor where the latency sensitive stuff
> > is running should that process decide to do some calls that cause network
> > I/O.
> >
> > Traditional networking is a slow process these days.
>
>
> Most of the slow process is in the process scheduler actually and cache
> line misses in large TCP structures, but also in many layers (Qdisc...),
> not counting icache footprint.
>
> We slowly improve things, but its always a tradeoff (did I said code
> bloat ?)
>
> If you have dedicated network thread(s) in your application, bound to
> the right cpu(s), then all network stack can be run on the cpus you
> decide [ Also needs using correct irq affinities for the NIC interrupts]
>
> This means your latency critical threads should delegate their network
> IO (like disk IO) to other threads, _and_ avoid being blocked in
> scheduler land.

Right. So why can the OS not do this? If I do read/write via a socket then
do the usual processing as much as possible on the current cpu. Never
schedule any operations later on this cpu. Run scheduled tasks only on the
cpus that are allowed to be used for network stuff.

> Given the nature of socket api, I am not sure adding a layer to
> transparently delegate network IO to a pool of dedicated cpus would be a
> win.

The problem is that the socket layer is drag for low latency apps. They
only use the socket api in non latency critical sections. If an action in
a non latency critical section causes the processor to be interrupted
later in a latency critical section then this is not good.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 17:29               ` Christoph Lameter
  2011-12-16 18:08                 ` Eric Dumazet
@ 2011-12-16 18:39                 ` David Miller
  1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2011-12-16 18:39 UTC (permalink / raw)
  To: cl; +Cc: eric.dumazet, igorm, netdev

From: Christoph Lameter <cl@linux.com>
Date: Fri, 16 Dec 2011 11:29:57 -0600 (CST)

> I have some latency critical processes here and I wish I could get

I hear there's FPGAs for that.... what drives me nuts about these
discussions is that custom silicon for trading platforms is all but
inevitable, and will save tons of space and power utilization to boot.

Madly trying to focus on lowering latency in a general purpose OS so
much just to compare two numbers and execute a trade is silly when in
the end it'll be implemented in a discrete circuit.

This stuff is special purpose, so let's treat it as such.

Lowering latency is a good goal, don't get me wrong, but the extreme
goals set out in these cases is for a very specific small group of
users.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 16:55           ` Christoph Lameter
  2011-12-16 17:14             ` Eric Dumazet
@ 2011-12-16 17:19             ` Stephen Hemminger
  1 sibling, 0 replies; 26+ messages in thread
From: Stephen Hemminger @ 2011-12-16 17:19 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Eric Dumazet, igorm, netdev, davem

On Fri, 16 Dec 2011 10:55:18 -0600 (CST)
Christoph Lameter <cl@linux.com> wrote:

> On Fri, 16 Dec 2011, Eric Dumazet wrote:
> 
> > Or we could use a hierarchical split : Say 16 (or 32 or 64) cpus share
> > same counters (must be atomic if NR_CPUS > 16/32/64)
> 
> Then you'd need to have locking or full atomic operations for the
> counters. Reduction of network processing to a set of processors also will
> have other beneficial effects in addition to cache hotness. It would
> removing OS jitter etc etc.
> 
> > percpu_alloc() -> percpugroup_alloc()
> 
> Sounds like a per cpuset/cgroup allocation?

The problem is not per cpu usage for the traffic counters at the ipv4 level. The problem is
the multiplicative growth with 10,000 interfaces and 1024 cpus! Also, IPV6
was ridiculous with keeping for per-cpu counters for things that don't matter
the ICMP counters (thanks to Eric for addressing that one. Only a few
values in the ipstats mib are really in the hot path, others seem to be handled
that way only because the code is cleaner doing it that way.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 16:33     ` Eric Dumazet
  2011-12-16 16:44       ` Christoph Lameter
@ 2011-12-16 18:31       ` David Miller
  1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2011-12-16 18:31 UTC (permalink / raw)
  To: eric.dumazet; +Cc: igorm, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 16 Dec 2011 17:33:04 +0100

> Le vendredi 16 décembre 2011 à 16:58 +0100, Igor Maravić a écrit :
>> Also I did it because ipv6 has per interface stats.
> 
> I was considering _removing_ them, or make it optional.

It is not the only huge problem with this feature.

The other one is, as I said, it requires having some inetdev available in
every single corner of every packet processing path, just so we can hit
some statistic.

Which means we have all of this special case code to make sure we have at
least a reference to the loopback device when a physical device is brought
down or removed.

It's painful, ugly, and not something I want propagating into ipv4 as well.

There is no way I'm letting this feature in, I've already wasted enough time
when trying to make ipv6 changes because of this facility.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 15:58   ` Igor Maravić
  2011-12-16 16:33     ` Eric Dumazet
@ 2011-12-16 18:28     ` David Miller
  1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2011-12-16 18:28 UTC (permalink / raw)
  To: igorm; +Cc: eric.dumazet, netdev

From: Igor Maravić <igorm@etf.rs>
Date: Fri, 16 Dec 2011 16:58:21 +0100

>> 1) Why is it needed ? Any RFC requires this bloat ?
>>
> 
> RFC4293 defines ipIfStatsTable.

That's too bad.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 00/10 net-next] Introduce per interface ipv4 statistics
  2011-12-16 15:41 ` [PATCH 00/10 net-next] Introduce per interface ipv4 statistics Eric Dumazet
  2011-12-16 15:58   ` Igor Maravić
@ 2011-12-16 18:27   ` David Miller
  1 sibling, 0 replies; 26+ messages in thread
From: David Miller @ 2011-12-16 18:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: igorm, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 16 Dec 2011 16:41:20 +0100

> 1) Why is it needed ? Any RFC requires this bloat ?

I'm not allowing something like this into ipv4, there are too many
terrible side effects in ipv6 because we do it there.

The mere necessity of having to have some device handle at every single packet
processing spot is incredibly painful, and makes changes in the ipv6 packet
paths 10 times more difficult than they otherwise would be.

I'm not allowing this difficulty to be added to the ipv4 side as well.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2011-12-16 18:39 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-16 15:25 [PATCH 00/10 net-next] Introduce per interface ipv4 statistics igorm
2011-12-16 15:25 ` [PATCH 01/10 net-next] include: net: netns: mib: Add proc_dir_entry for ipv4 per interface stats igorm
2011-12-16 15:25 ` [PATCH 02/10 net-next] include: net: snmp: Create icmp per device counters and add macros for per device stats igorm
2011-12-16 15:25 ` [PATCH 03/10 net-next] include:linux:inetdevice: Add struct ipv4_devstat and func __in_dev_get_rcu_safely igorm
2011-12-16 15:25 ` [PATCH 04/10 net-next] include:net:ipv6: Moved _DEV* macros igorm
2011-12-16 15:25 ` [PATCH 05/10 net-next] include:net:ip: Tuned up IP_*_STATS macros for per device statistics and added functions for (un)registering per device proc entries igorm
2011-12-16 15:25 ` [PATCH 06/10 net-next] include:net:icmp: Tuned up ICMP_*_STATS macros for per device statistics and changed prototype for icmp_out_count igorm
2011-12-16 15:26 ` [PATCH 07/10 net-next] net:ipv4:devinet: Add support for alloc/free of per device stats and (un)register of per device proc files igorm
2011-12-16 15:26 ` [PATCH 08/10 net-next] net:ipv4:af_inet: Init proc fs before ip_init igorm
2011-12-16 15:26 ` [PATCH 09/10 net-next] net:ipv4:proc: Introduce proc files for ipv4 per interface stats igorm
2011-12-16 15:26 ` [PATCH 10/10 net-next] net: Enable ipv4 per interface statistics igorm
2011-12-16 15:41 ` [PATCH 00/10 net-next] Introduce per interface ipv4 statistics Eric Dumazet
2011-12-16 15:58   ` Igor Maravić
2011-12-16 16:33     ` Eric Dumazet
2011-12-16 16:44       ` Christoph Lameter
2011-12-16 16:50         ` Eric Dumazet
2011-12-16 16:55           ` Christoph Lameter
2011-12-16 17:14             ` Eric Dumazet
2011-12-16 17:29               ` Christoph Lameter
2011-12-16 18:08                 ` Eric Dumazet
2011-12-16 18:30                   ` Christoph Lameter
2011-12-16 18:39                 ` David Miller
2011-12-16 17:19             ` Stephen Hemminger
2011-12-16 18:31       ` David Miller
2011-12-16 18:28     ` David Miller
2011-12-16 18:27   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).