* [RFC] addition of a dropped packet notification service
@ 2009-02-06 18:20 Neil Horman
2009-02-07 0:57 ` Stephen Hemminger
From: Neil Horman @ 2009-02-06 18:20 UTC (permalink / raw)
To: netdev; +Cc: davem, kuznet, pekkas, jmorris, yoshfuji, herbert, nhorman
Hey all-
A week or so ago I tried posting a tracepoint patch for net-next which
was met with some resistance, with the opposing arguments running along the lines
of not having an upstream user for those points, which I think is good
criticism. As such, I think I've come up with a project idea here that I can
implement using a few tracepoints (not that that really matters in light of the
overall scheme of things), but I wanted to propose it here and get some feedback
from people on what they think might be good and bad about this.
Problem:
Gathering information about packets that are dropped within the kernel
network stack.
Problem Background:
The Linux kernel is nominally quite good about avoiding packet
drops whenever possible. However, there are of course times when packet
processing errors, malformed frames, or other conditions result in the need to
abandon a packet during reception or transmission. Savvy system administrators
are perfectly capable of monitoring for and detecting these lost packets so that
possible corrective action can be taken. However, the sysadmin's job here suffers
from three distinct shortcomings in our user space drop detection facilities:
1) Fragmentation of information: Dropped packets occur at many different layers
of the network stack, and different mechanisms are used to access information
about drops in those various layers. Statistics at one layer may require a
simple reading of a proc file, while at another they may require the use of one
or more tools. By my count, at least 6 files/tools must be queried to get a
complete picture of where in the network stack a packet is being dropped.
2) Clarity of meaning: While some statistics are clear, others may be less so.
Even if a sysadmin knows that there are several places to look for a dropped
packet, [s]he may be far less clear on which statistics in those tools/files map
to an actual lost packet. For instance, does a TCP AttemptFail imply a dropped
packet or not? A quick reading of the source may indicate that, but that's at
best a subpar solution.
3) Ambiguity of cause: Even if a sysadmin correctly checks all the locations
for dropped packets and gleans which are the relevant stats for that purpose,
there is still missing information that some might enjoy. Namely, the root
cause of the problem. For example, the UDPInErrors stat is incremented in several
places in the code, and for two primary reasons (application congestion leading
to a full rcvbuf, or a UDP checksum error). While the stats presented to the
user provide information indicating that packets were dropped in the UDP code,
the root cause is still a mystery.
Solution:
To solve this problem, I would like to propose the addition of a new netlink
protocol, NETLINK_DRPMON. The notion is that user space applications would
dynamically engage this service, which would then monitor several tracepoints
throughout the kernel (which would, in aggregate, cover all the possible locations,
from the system call down to the hardware, at which a network packet might be
dropped). These tracepoints would be hooked by the "drop monitor" to catch
increments in the relevant statistics at these points and, if/when they occur,
broadcast a netlink message to listening applications to inform them that a drop
has taken place. This alert would include information about the location of the
drop: its class (IPV4/IPV6/arp/hardware/etc), type (InHdrErrors, etc), and specific
location (function and line number). Using such a method, admins could then
use an application to reliably monitor for network packet drops in one
consolidated place, while keeping the performance impact to a minimum (since
tracepoints are meant to have no impact when disabled, and very little impact
otherwise). It consolidates information, provides clarity about what does and
doesn't constitute a drop, and provides, down to the line number, information
about where the drop occurred.
I've written some of this already, but I wanted to stop and get feedback before
I went any farther. Please bear in mind that the patch below is totally
incomplete. Most notably, it's missing most of the netlink protocol
implementation, and there is far from complete coverage of all the in-kernel
drop point locations. But the IPv4 SNMP stats are completely covered and serve
as an exemplar of how I was planning on doing drop recording. Also notably
missing is the user space app to listen for these messages, but if there is
general consensus that this is indeed a good idea, I'll get started on the
protocol and user app straight away.
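For concreteness, here is a rough sketch of what a minimal user space listener for
this service might look like. Everything in it is an assumption at this point: the
NETLINK_DRPMON and NET_DM_* values simply mirror the proposed header in the patch
below, the alert payload layout is not yet defined (so the sketch only reports that
an alert arrived), and how alerts would actually be delivered (unicast vs. a
multicast group) is still an open question.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#define NETLINK_DRPMON	20		/* mirrors the proposed netlink.h addition */
#define NET_DM_BASE	0x10		/* mirrors the proposed net_dropmon.h */
#define NET_DM_ALERT	(NET_DM_BASE + 1)
#define NET_DM_START	(NET_DM_BASE + 3)

int main(void)
{
	struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
	char buf[4096];
	struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
	int fd, len;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_DRPMON);
	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		return 1;

	/* ask the kernel to start monitoring its drop points */
	memset(buf, 0, sizeof(buf));
	nlh->nlmsg_len = NLMSG_LENGTH(0);
	nlh->nlmsg_type = NET_DM_START;
	nlh->nlmsg_flags = NLM_F_REQUEST;
	nlh->nlmsg_pid = getpid();
	if (send(fd, nlh, nlh->nlmsg_len, 0) < 0)
		return 1;

	/* then sit and wait for NET_DM_ALERT messages */
	while ((len = recv(fd, buf, sizeof(buf), 0)) > 0) {
		for (nlh = (struct nlmsghdr *)buf; NLMSG_OK(nlh, len);
		     nlh = NLMSG_NEXT(nlh, len)) {
			if (nlh->nlmsg_type == NET_DM_ALERT)
				printf("drop alert: %u bytes of payload\n",
				       (unsigned int)(nlh->nlmsg_len - NLMSG_LENGTH(0)));
		}
	}
	return 0;
}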
So, have at it. Good thoughts and bad all welcome. Thanks for the interest and
the feedback!
Thanks & Regards
Neil
diff --git a/include/linux/net_dropmon.h b/include/linux/net_dropmon.h
new file mode 100644
index 0000000..fdcd02c
--- /dev/null
+++ b/include/linux/net_dropmon.h
@@ -0,0 +1,42 @@
+#ifndef __NET_DROPMON_H
+#define __NET_DROPMON_H
+
+#include <linux/netlink.h>
+
+
+struct net_dm_config_msg {
+};
+
+struct net_dm_user_msg {
+ union {
+ struct net_dm_config_msg cmsg;
+ } u;
+};
+
+struct net_dm_drop_point {
+ char function[64];
+ unsigned int line;
+};
+
+/*
+ * These are the classes of drops that we can have
+ * Each one corresponds to a stats file/utility
+ * you can use to gather more data on the drop
+ */
+enum {
+ DROP_CLASS_SNMP_IPV4 = 0,
+ DROP_CLASS_SNMP_IPV6,
+ DROP_CLASS_SNMP_TCP,
+ DROP_CLASS_SNMP_UDP,
+ DROP_CLASS_SNMP_LINUX,
+};
+
+/* These are the netlink message types for this protocol */
+
+#define NET_DM_BASE 0x10 /* Standard Netlink Messages below this */
+#define NET_DM_ALERT (NET_DM_BASE + 1) /* Alert about dropped packets */
+#define NET_DM_CONFIG (NET_DM_BASE + 2) /* Configuration message */
+#define NET_DM_START (NET_DM_BASE + 3) /* Start monitoring */
+#define NET_DM_STOP (NET_DM_BASE + 4) /* Stop monitoring */
+#define NET_DM_MAX (NET_DM_BASE + 5)
+#endif
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 51b09a1..255d6ad 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -24,6 +24,7 @@
/* leave room for NETLINK_DM (DM Events) */
#define NETLINK_SCSITRANSPORT 18 /* SCSI Transports */
#define NETLINK_ECRYPTFS 19
+#define NETLINK_DRPMON 20 /* Network packet drop alerts */
#define MAX_LINKS 32
diff --git a/include/net/ip.h b/include/net/ip.h
index 1086813..08398f8 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -26,6 +26,7 @@
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/skbuff.h>
+#include <trace/snmp.h>
#include <net/inet_sock.h>
#include <net/snmp.h>
@@ -165,9 +166,24 @@ struct ipv4_config
};
extern struct ipv4_config ipv4_config;
-#define IP_INC_STATS(net, field) SNMP_INC_STATS((net)->mib.ip_statistics, field)
-#define IP_INC_STATS_BH(net, field) SNMP_INC_STATS_BH((net)->mib.ip_statistics, field)
-#define IP_ADD_STATS_BH(net, field, val) SNMP_ADD_STATS_BH((net)->mib.ip_statistics, field, val)
+#define IP_INC_STATS(net, field) do {\
+ DECLARE_DROP_POINT(dp,field);\
+ SNMP_INC_STATS((net)->mib.ip_statistics, field);\
+ trace_snmp_ipv4_mib(&dp, 1);\
+} while(0)
+
+#define IP_INC_STATS_BH(net, field) do {\
+ DECLARE_DROP_POINT(dp, field);\
+ SNMP_INC_STATS_BH((net)->mib.ip_statistics, field);\
+ trace_snmp_ipv4_mib(&dp, 1);\
+} while(0)
+
+#define IP_ADD_STATS_BH(net, field, val) do{\
+ DECLARE_DROP_POINT(dp, field);\
+ SNMP_ADD_STATS_BH((net)->mib.ip_statistics, field, val);\
+ trace_snmp_ipv4_mib(&dp, val);\
+} while(0)
+
#define NET_INC_STATS(net, field) SNMP_INC_STATS((net)->mib.net_statistics, field)
#define NET_INC_STATS_BH(net, field) SNMP_INC_STATS_BH((net)->mib.net_statistics, field)
#define NET_INC_STATS_USER(net, field) SNMP_INC_STATS_USER((net)->mib.net_statistics, field)
diff --git a/include/trace/snmp.h b/include/trace/snmp.h
new file mode 100644
index 0000000..289dca9
--- /dev/null
+++ b/include/trace/snmp.h
@@ -0,0 +1,33 @@
+#ifndef _TRACE_SNMP_H
+#define _TRACE_SNMP_H
+
+#include <linux/tracepoint.h>
+#include <linux/list.h>
+
+#define DP_IN_USE 0
+
+struct snmp_drop_point {
+ const char *function;
+ unsigned int line;
+ unsigned int type;
+ uint8_t flags;
+ struct list_head list;
+};
+
+#ifdef CONFIG_TRACEPOINTS
+#define DECLARE_DROP_POINT(name, kind) struct snmp_drop_point name = {\
+ .function = __FUNCTION__,\
+ .line = __LINE__,\
+ .type = kind,\
+ .flags = 0,\
+ .list = LIST_HEAD_INIT(name.list),\
+}
+#else
+#define DECLARE_DROP_POINT(name, type)
+#endif
+
+DECLARE_TRACE(snmp_ipv4_mib,
+ TPPROTO(struct snmp_drop_point *dp, int count),
+ TPARGS(dp, count));
+
+#endif
diff --git a/net/Kconfig b/net/Kconfig
index a12bae0..f6dc56a 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -221,6 +221,17 @@ config NET_TCPPROBE
To compile this code as a module, choose M here: the
module will be called tcp_probe.
+config NET_DROP_MONITOR
+ boolean "Network packet drop alerting service"
+ depends on INET && EXPERIMENTAL && TRACEPOINTS
+ ---help---
+ This feature provides an alerting service to userspace in the
+ event that packets are discarded in the network stack. Alerts
+ are broadcast via netlink socket to any listening user space
+ process. If you don't need network drop alerts, or if you are ok
+ just checking the various proc files and other utilities for
+ drop statistics, say N here.
+
endmenu
endmenu
diff --git a/net/core/Makefile b/net/core/Makefile
index 26a37cb..245d7ab 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -17,3 +17,5 @@ obj-$(CONFIG_NET_PKTGEN) += pktgen.o
obj-$(CONFIG_NETPOLL) += netpoll.o
obj-$(CONFIG_NET_DMA) += user_dma.o
obj-$(CONFIG_FIB_RULES) += fib_rules.o
+obj-$(CONFIG_NET_DROP_MONITOR) += drop_monitor.o
+
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
new file mode 100644
index 0000000..5bd2128
--- /dev/null
+++ b/net/core/drop_monitor.c
@@ -0,0 +1,211 @@
+/*
+ * Monitoring code for network dropped packet alerts
+ *
+ * Copyright (C) 2009 Neil Horman <nhorman@tuxdriver.com>
+ */
+
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/string.h>
+#include <linux/if_arp.h>
+#include <linux/inetdevice.h>
+#include <linux/inet.h>
+#include <linux/interrupt.h>
+#include <linux/netpoll.h>
+#include <linux/sched.h>
+#include <linux/delay.h>
+#include <linux/rcupdate.h>
+#include <linux/types.h>
+#include <linux/workqueue.h>
+#include <linux/netlink.h>
+#include <linux/net_dropmon.h>
+
+#include <asm/unaligned.h>
+#include <asm/bitops.h>
+#include <trace/snmp.h>
+
+#define RCV_SKB_FAIL(err) do { netlink_ack(skb, nlh, (err)); return; } while (0)
+
+#define TRACE_ON 1
+#define TRACE_OFF 0
+
+static void send_dm_alert(struct work_struct *unused);
+
+
+/*
+ * Globals, our netlink socket pointer
+ * and the work handle that will send up
+ * netlink alerts
+ */
+struct sock *dm_sock;
+DECLARE_WORK(dm_alert_work, send_dm_alert);
+
+DEFINE_TRACE(snmp_ipv4_mib);
+EXPORT_TRACEPOINT_SYMBOL_GPL(snmp_ipv4_mib);
+
+/*
+ * Bitmasks for our hit classes, and one for each
+ * class so we know which type of hit in each class
+ * we got
+ */
+static uint64_t drop_class_hits;
+
+
+/*
+ * ipv4 mib list of drop detections
+ */
+static struct list_head ipv4_drop_hits[2] = {
+ LIST_HEAD_INIT(ipv4_drop_hits[0]),
+ LIST_HEAD_INIT(ipv4_drop_hits[1]),
+};
+
+static struct list_head *ipv4_drop_hitp = &ipv4_drop_hits[0];
+static int ipv4_dh_index = 0;
+
+static void send_dm_alert(struct work_struct *unused)
+{
+ struct list_head *last_hitp;
+
+ printk(KERN_INFO "Sending netlink alert message\n");
+ drop_class_hits = 0;
+
+ last_hitp = rcu_dereference(ipv4_drop_hitp);
+ ipv4_dh_index = !ipv4_dh_index;
+ rcu_assign_pointer(ipv4_drop_hitp, &ipv4_drop_hits[ipv4_dh_index]);
+
+}
+
+static void snmp_ipv4_mib_hit(struct snmp_drop_point *dp, int count)
+{
+ struct list_head *hitp;
+
+ printk(KERN_CRIT "Got IPV4 MIB HIT\n");
+ switch (dp->type) {
+ case IPSTATS_MIB_INRECEIVES:
+ case IPSTATS_MIB_INTOOBIGERRORS:
+ case IPSTATS_MIB_INNOROUTES:
+ case IPSTATS_MIB_INADDRERRORS:
+ case IPSTATS_MIB_INUNKNOWNPROTOS:
+ case IPSTATS_MIB_INTRUNCATEDPKTS:
+ case IPSTATS_MIB_INDISCARDS:
+ case IPSTATS_MIB_OUTDISCARDS:
+ case IPSTATS_MIB_OUTNOROUTES:
+ case IPSTATS_MIB_REASMTIMEOUT:
+ case IPSTATS_MIB_REASMFAILS:
+ case IPSTATS_MIB_FRAGFAILS:
+ set_bit(DROP_CLASS_SNMP_IPV4, (void *)&drop_class_hits);
+ if (!test_and_set_bit(DP_IN_USE, (void *)&dp->flags)) {
+ hitp = rcu_dereference(ipv4_drop_hitp);
+ /*
+ * we got the dp, add it to the list
+ */
+ list_add_tail(&dp->list, hitp);
+ }
+ schedule_work(&dm_alert_work);
+ break;
+ default:
+ return;
+ };
+
+}
+
+static int set_all_monitor_traces(int state)
+{
+ int rc = 0;
+
+ switch (state) {
+ case TRACE_ON:
+ rc |= register_trace_snmp_ipv4_mib(snmp_ipv4_mib_hit);
+ break;
+ case TRACE_OFF:
+ rc |= unregister_trace_snmp_ipv4_mib(snmp_ipv4_mib_hit);
+ break;
+ default:
+ rc = 1;
+ break;
+ }
+
+ if (rc)
+ return -EFAULT;
+ return rc;
+}
+
+static int dropmon_handle_msg(struct net_dm_user_msg *pmsg,
+ unsigned char type, unsigned int len)
+{
+ int status = 0;
+
+ if (pmsg && (len < sizeof(*pmsg)))
+ return -EINVAL;
+
+ switch (type) {
+ case NET_DM_START:
+ printk(KERN_INFO "Start dropped packet monitor\n");
+ set_all_monitor_traces(TRACE_ON);
+ break;
+ case NET_DM_STOP:
+ printk(KERN_INFO "Stop dropped packet monitor\n");
+ set_all_monitor_traces(TRACE_OFF);
+ break;
+
+ default:
+ status = -EINVAL;
+ }
+ return status;
+}
+
+
+static void drpmon_rcv(struct sk_buff *skb)
+{
+ int status, type, pid, flags, nlmsglen, skblen;
+ struct nlmsghdr *nlh;
+
+ skblen = skb->len;
+ if (skblen < sizeof(*nlh))
+ return;
+
+ nlh = nlmsg_hdr(skb);
+ nlmsglen = nlh->nlmsg_len;
+ if (nlmsglen < sizeof(*nlh) || skblen < nlmsglen)
+ return;
+
+ pid = nlh->nlmsg_pid;
+ flags = nlh->nlmsg_flags;
+
+ if(pid <= 0 || !(flags & NLM_F_REQUEST) || flags & NLM_F_MULTI)
+ RCV_SKB_FAIL(-EINVAL);
+
+ if (flags & MSG_TRUNC)
+ RCV_SKB_FAIL(-ECOMM);
+
+ type = nlh->nlmsg_type;
+ if (type < NLMSG_NOOP || type >= NET_DM_MAX)
+ RCV_SKB_FAIL(-EINVAL);
+
+ if (type <= NET_DM_BASE)
+ return;
+
+ status = dropmon_handle_msg(NLMSG_DATA(nlh), type,
+ nlmsglen - NLMSG_LENGTH(0));
+ if (status < 0)
+ RCV_SKB_FAIL(status);
+
+ if (flags & NLM_F_ACK)
+ netlink_ack(skb, nlh, 0);
+ return;
+}
+
+void __init init_net_drop_monitor(void)
+{
+
+ printk(KERN_INFO "INITIALIZING NETWORK DROP MONITOR SERVICE\n");
+
+ dm_sock = netlink_kernel_create(&init_net, NETLINK_DRPMON, 0,
+ drpmon_rcv, NULL, THIS_MODULE);
+
+ if (dm_sock == NULL) {
+ printk(KERN_ERR "Could not create drop monitor socket\n");
+ return;
+ }
+}
+
* Re: [RFC] addition of a dropped packet notification service
2009-02-06 18:20 [RFC] addition of a dropped packet notification service Neil Horman
@ 2009-02-07 0:57 ` Stephen Hemminger
2009-02-07 17:49 ` Neil Horman
From: Stephen Hemminger @ 2009-02-07 0:57 UTC (permalink / raw)
To: Neil Horman
Cc: netdev, davem, kuznet, pekkas, jmorris, yoshfuji, herbert,
nhorman
On Fri, 6 Feb 2009 13:20:20 -0500
Neil Horman <nhorman@tuxdriver.com> wrote:
> [...]
I like the concept but I'm not really happy about the implementation. It overloads
the SNMP stats stuff, which is expensive, and doesn't cover hardware or transmit
queue droppage.
* Re: [RFC] addition of a dropped packet notification service
2009-02-07 0:57 ` Stephen Hemminger
@ 2009-02-07 17:49 ` Neil Horman
2009-02-09 10:21 ` Herbert Xu
From: Neil Horman @ 2009-02-07 17:49 UTC (permalink / raw)
To: Stephen Hemminger
Cc: netdev, davem, kuznet, pekkas, jmorris, yoshfuji, herbert
On Fri, Feb 06, 2009 at 04:57:36PM -0800, Stephen Hemminger wrote:
> On Fri, 6 Feb 2009 13:20:20 -0500
> Neil Horman <nhorman@tuxdriver.com> wrote:
>
> > [...]
>
> I like the concept but I'm not really happy about the implementation. It overloads
> the SNMP stats stuff, which is expensive, and doesn't cover hardware or transmit
> queue droppage.
>
Well, as I mentioned, it's totally incomplete. I only posted it so that you
could see an exemplar of how I wanted to use tracepoints to dynamically
intercept various in-kernel events so that I could gather drop notifications. Of
course several other tracepoints will be needed to capture other classes of drop
(IPv6 stats, arp queue overflows, qdisc drops, etc).
As for the expense, I'm not sure what you're referring to. The idea was to use
tracepoints, which (when disabled) provide effectively no performance penalty,
and only a minimal penalty when enabled. What do you see as the major
performance impact here?
Best
Neil
>
* Re: [RFC] addition of a dropped packet notification service
2009-02-07 17:49 ` Neil Horman
@ 2009-02-09 10:21 ` Herbert Xu
2009-02-09 13:28 ` Neil Horman
2009-02-25 7:48 ` David Miller
From: Herbert Xu @ 2009-02-09 10:21 UTC (permalink / raw)
To: Neil Horman
Cc: Stephen Hemminger, netdev, davem, kuznet, pekkas, jmorris,
yoshfuji
On Sat, Feb 07, 2009 at 12:49:32PM -0500, Neil Horman wrote:
>
> Well, as I mentioned, it's totally incomplete. I only posted it so that you
> could see an exemplar of how I wanted to use tracepoints to dynamically
> intercept various in kernel events so that I could gather drop notifications. Of
> course several other tracepoints will be needed to capture other classes of drop
> (IPv6 stats, arp queue overflows, qdisc drops, etc).
FWIW it sounds pretty reasonable to me, although I'm still unsure
about what impact all these trace points will have on the maintenance
of our source code.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [RFC] addition of a dropped packet notification service
2009-02-09 10:21 ` Herbert Xu
@ 2009-02-09 13:28 ` Neil Horman
2009-02-25 7:48 ` David Miller
From: Neil Horman @ 2009-02-09 13:28 UTC (permalink / raw)
To: Herbert Xu
Cc: Stephen Hemminger, netdev, davem, kuznet, pekkas, jmorris,
yoshfuji
On Mon, Feb 09, 2009 at 09:21:34PM +1100, Herbert Xu wrote:
> On Sat, Feb 07, 2009 at 12:49:32PM -0500, Neil Horman wrote:
> >
> > Well, as I mentioned, it's totally incomplete. I only posted it so that you
> > could see an exemplar of how I wanted to use tracepoints to dynamically
> > intercept various in kernel events so that I could gather drop notifications. Of
> > course several other tracepoints will be needed to capture other classes of drop
> > (IPv6 stats, arp queue overflows, qdisc drops, etc).
>
> FWIW it sounds pretty reasonable to me, although I'm still unsure
> about what impact all these trace points will have on the maintenance
> of our source code.
>
I think that's a fair question, and I don't honestly know the answer. In my
projections I expect the tracepoints to only need to be placed in the
snmp v4/v6/linux macros and perhaps 3 other points. I'll know more when I
complete the work, and then it can get a thorough review.
To address the performance questions, I'll certainly run some tests to compare
the number of jiffies a trace takes on a baseline kernel (CONFIG_TRACEPOINTS not set),
with tracepoints configured but not enabled, and with them enabled.
Thanks for the feedback.
Neil
> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
>
* Re: [RFC] addition of a dropped packet notification service
2009-02-09 10:21 ` Herbert Xu
2009-02-09 13:28 ` Neil Horman
@ 2009-02-25 7:48 ` David Miller
2009-02-25 8:16 ` Herbert Xu
2009-02-25 11:54 ` Neil Horman
From: David Miller @ 2009-02-25 7:48 UTC (permalink / raw)
To: herbert; +Cc: nhorman, shemminger, netdev, kuznet, pekkas, jmorris, yoshfuji
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 9 Feb 2009 21:21:34 +1100
> On Sat, Feb 07, 2009 at 12:49:32PM -0500, Neil Horman wrote:
> >
> > Well, as I mentioned, it's totally incomplete. I only posted it so that you
> > could see an exemplar of how I wanted to use tracepoints to dynamically
> > intercept various in kernel events so that I could gather drop notifications. Of
> > course several other tracepoints will be needed to capture other classes of drop
> > (IPv6 stats, arp queue overflows, qdisc drops, etc).
>
> FWIW it sounds pretty reasonable to me, although I'm still unsure
> about what impact all these trace points will have on the maintenance
> of our source code.
It occurs to me that this kind of event tracking is more about
a negated test rather than a straight one.
It makes no sense to annotate all of the abnormal drop cases,
there are tons of those.
Rather it's easier to consolidate the normal cases.
Here's an idea that doesn't use tracepoints (I really don't like
them, to be honest):
1) Isolate the normal packet freeing contexts, have them call
kfree_skb_clean() or something like that.
2) What remains are the abnormal drop cases. They still call
plain kfree_skb() which records __builtin_return_address(0)
and builds a hash table of counts per call site.
Then you just dump the table via some user visible interface.
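To make (1) and (2) concrete, a minimal sketch of the call-site accounting might
look something like the following. The names (record_drop, drop_table) and the
fixed-size open-addressed table are purely illustrative; locking, the reference
counting that kfree_skb() already performs, and the user visible dump interface
are all left out.
#include <linux/hash.h>

#define DROP_HASH_BITS	7
#define DROP_HASH_SIZE	(1 << DROP_HASH_BITS)

struct drop_site {
	void		*location;	/* return address of the dropping caller */
	unsigned long	count;
};

static struct drop_site drop_table[DROP_HASH_SIZE];

/*
 * Called only from the remaining plain kfree_skb() path (the abnormal
 * drop cases), with location = __builtin_return_address(0).  Callers
 * converted to kfree_skb_clean() never reach this.
 */
static void record_drop(void *location)
{
	unsigned int i = hash_ptr(location, DROP_HASH_BITS);

	/* open addressing; a real table would bound the probe length
	 * and cope with overflow */
	while (drop_table[i].location && drop_table[i].location != location)
		i = (i + 1) & (DROP_HASH_SIZE - 1);

	drop_table[i].location = location;
	drop_table[i].count++;
}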
* Re: [RFC] addition of a dropped packet notification service
2009-02-25 7:48 ` David Miller
@ 2009-02-25 8:16 ` Herbert Xu
2009-02-25 11:54 ` Neil Horman
From: Herbert Xu @ 2009-02-25 8:16 UTC (permalink / raw)
To: David Miller
Cc: nhorman, shemminger, netdev, kuznet, pekkas, jmorris, yoshfuji
On Tue, Feb 24, 2009 at 11:48:40PM -0800, David Miller wrote:
>
> Here's an idea that doesn't use tracepoints (I really don't like
> them, to be honest):
>
> 1) Isolate the normal packet freeing contexts, have them call
> kfree_skb_clean() or something like that.
>
> 2) What remains are the abnormal drop cases. They still call
> plain kfree_skb() which records __builtin_return_address(0)
> and builds a hash table of counts per call site.
>
> Then you just dump the table via some user visible interface.
This sounds like a great idea to me.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [RFC] addition of a dropped packet notification service
2009-02-25 7:48 ` David Miller
2009-02-25 8:16 ` Herbert Xu
@ 2009-02-25 11:54 ` Neil Horman
2009-02-25 12:01 ` David Miller
From: Neil Horman @ 2009-02-25 11:54 UTC (permalink / raw)
To: David Miller
Cc: herbert, shemminger, netdev, kuznet, pekkas, jmorris, yoshfuji
On Tue, Feb 24, 2009 at 11:48:40PM -0800, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Mon, 9 Feb 2009 21:21:34 +1100
>
> > On Sat, Feb 07, 2009 at 12:49:32PM -0500, Neil Horman wrote:
> > >
> > > Well, as I mentioned, it's totally incomplete. I only posted it so that you
> > > could see an exemplar of how I wanted to use tracepoints to dynamically
> > > intercept various in kernel events so that I could gather drop notifications. Of
> > > course several other tracepoints will be needed to capture other classes of drop
> > > (IPv6 stats, arp queue overflows, qdisc drops, etc).
> >
> > FWIW it sounds pretty reasonable to me, although I'm still unsure
> > about what impact all these trace points will have on the maintenance
> > of our source code.
>
> It occurs to me that this kind of event tracking is more about
> a negated test rather than a straight one.
>
> It makes no sense to annotate all of the abnormal drop cases,
> there are tons of those.
>
> Rather it's easier to consolidate the normal cases.
>
> Here's an idea that doesn't use tracepoints (I really don't like
> them, to be honest):
>
> 1) Isolate the normal packet freeing contexts, have them call
> kfree_skb_clean() or something like that.
>
> 2) What remains are the abnormal drop cases. They still call
> plain kfree_skb() which records __builtin_return_address(0)
> and builds a hash table of counts per call site.
>
> Then you just dump the table via some user visible interface.
>
I had actually started down this road for a while, and liked it for its low
maintenance overhead (as you mentioned, less to annotate), but I balked at it
after a bit, mostly for its increased ambiguity. By catching the drop where the
packet is freed, we lose information about the exact point at which we decided
to drop it. As such, users of said dumped table lose (or at least potentially lose) the
ability to correlate the saved stack pointer with an exact point in the code
where a corresponding statistic was incremented, as well as the correlation to a
specific point in the code (they just get the calling function program counter
rather than file, function and line number).
Is the loss of that information worth the additional ease of maintenance? I'm not
sure. It was my feeling that, while the number of drop points was large, it was
a fairly stable set, and by embedding the tracepoints in the macros for stat
drops, new points are still likely to get picked up transparently (the remaining
points in the queueing disciplines are outliers there). But the above idea is
functional as well. If the consensus is that recording where we actually free
the skb is preferable to where we note that we're going to drop (i.e. the
various points at which we increment one of our statistics counters), then I'm
happy to make that conversion.
Regards
Neil
* Re: [RFC] addition of a dropped packet notification service
2009-02-25 11:54 ` Neil Horman
@ 2009-02-25 12:01 ` David Miller
2009-02-25 14:18 ` Neil Horman
From: David Miller @ 2009-02-25 12:01 UTC (permalink / raw)
To: nhorman; +Cc: herbert, shemminger, netdev, kuznet, pekkas, jmorris, yoshfuji
From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 25 Feb 2009 06:54:19 -0500
> I had actually started down this road for a while, and liked it for
> its low maintenance overhead (as you mentioned, less to annotate),
> but I balked at it after a bit, mostly for its increased ambiguity.
> By catching the drop where the packet is freed, we lose information
> about the exact point at which we decided to drop it. As such,
> users of said dumped table lose (or at least potentially lose) the ability
> to correlate the saved stack pointer with an exact point in the code
> where a corresponding statistic was incremented, as well as the
> correlation to a specific point in the code (they just get the
> calling function program counter rather than file, function and line
> number).
Userland tools can extract this information from a non-stripped
kernel image file, given just a kernel PC value.
And you need the same thing to extract the same information
if you used tracepoints.
* Re: [RFC] addition of a dropped packet notification service
2009-02-25 12:01 ` David Miller
@ 2009-02-25 14:18 ` Neil Horman
2009-02-25 22:07 ` David Miller
From: Neil Horman @ 2009-02-25 14:18 UTC (permalink / raw)
To: David Miller
Cc: herbert, shemminger, netdev, kuznet, pekkas, jmorris, yoshfuji
On Wed, Feb 25, 2009 at 04:01:30AM -0800, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Wed, 25 Feb 2009 06:54:19 -0500
>
> > I had actually started down this road for a while, and liked it for
> > its low maintenance overhead (as you mentioned, less to annotate),
> > but I balked at it after a bit, mostly for its increased ambiguity.
> > By catching the drop where the packet is freed, we lose information
> > about the exact point at which we decided to drop it. As such,
> > users of said dumped table lose (or at least potentially lose) the ability
> > to correlate the saved stack pointer with an exact point in the code
> > where a corresponding statistic was incremented, as well as the
> > correlation to a specific point in the code (they just get the
> > calling function program counter rather than file, function and line
> > number).
>
> Userland tools can extract this information from a non-stripped
> kernel image file, given just a kernel PC value.
>
Sure, they can, but my original post had this information captured at the drop
point (using __func__, __FILE__ and __LINE__). My intent was that no debuginfo
package should be needed for a user to determine the point of the drop, just the
source tree (which can be consolidated to one location for multiple hosts, via
LXR, etc.), rather than installed on each individual machine.
Although as I think about it, there's no reason that a user space utility can't
use a non-stripped kernel to translate PC values if available, and do a reduced
symbol+offset translation using /proc/kallsyms in its absence.
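A rough sketch of that reduced fallback, assuming the drop location reaches user
space as a raw kernel text address (no caching, minimal error handling):
#include <stdio.h>
#include <string.h>

/* Print "symbol+0xoffset" for a kernel text address, or the raw value
 * if /proc/kallsyms is unavailable or no text symbol precedes it. */
static void print_drop_location(unsigned long pc)
{
	FILE *f = fopen("/proc/kallsyms", "r");
	char line[256], sym[128], best_sym[128] = "", type;
	unsigned long addr, best_addr = 0;

	if (!f) {
		printf("0x%lx\n", pc);
		return;
	}

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%lx %c %127s", &addr, &type, sym) != 3)
			continue;
		if (type != 't' && type != 'T')	/* text symbols only */
			continue;
		/* keep the highest text symbol that does not exceed pc */
		if (addr <= pc && addr >= best_addr) {
			best_addr = addr;
			strcpy(best_sym, sym);
		}
	}
	fclose(f);

	if (best_addr)
		printf("%s+0x%lx\n", best_sym, pc - best_addr);
	else
		printf("0x%lx\n", pc);
}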
> And you need the same thing to extract the same information
> if you used tracepoints.
>
Well, I think the above implementation can be done with or without tracepoints.
The mechanism by which we get the drop information is orthogonal to what we do
with it.
Thanks for the feedback, guys. I'll start modifying my kernel changes to reflect
this straight away. If you'd like to see something in the interim, let me know
and I'll send you what I have. Currently I've got the user space delivery
mechanism working as a Netlink protocol, and with your above suggestions, I
expect to have the drop capture piece recoded and working in the next few weeks.
Best
Neil
* Re: [RFC] addition of a dropped packet notification service
2009-02-25 14:18 ` Neil Horman
@ 2009-02-25 22:07 ` David Miller
2009-02-26 0:01 ` Neil Horman
From: David Miller @ 2009-02-25 22:07 UTC (permalink / raw)
To: nhorman; +Cc: herbert, shemminger, netdev, kuznet, pekkas, jmorris, yoshfuji
From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 25 Feb 2009 09:18:41 -0500
> On Wed, Feb 25, 2009 at 04:01:30AM -0800, David Miller wrote:
> > From: Neil Horman <nhorman@tuxdriver.com>
> > Date: Wed, 25 Feb 2009 06:54:19 -0500
> >
> > And you need the same thing to extract the same information
> > if you used tracepoints.
>
> Well, I think the above implementation can be done with or without
> tracepoints. The mechanism by which we get the drop information is
> orthogonal to what we do with it.
>
> Thanks for the feedback, guys. I'll start modifying my kernel
> changes to reflect this straight away. If you'd like to see
> something in the interim, let me know and I'll send you what I have.
> Currently I've got the user space delivery mechanism working as a
> Netlink protocol, and with your above suggestions, I expect to have
> the drop capture piece recoded and working in the next few weeks.
Don't get me wrong.
If tracepoint annotations can be manageable (Herbert's concern), and
solve multiple useful problems in practice rather than just
theoretically (my concern), then that's what we should use for this
kind of stuff.
I just personally haven't been convinced of that yet.
* Re: [RFC] addition of a dropped packet notification service
2009-02-25 22:07 ` David Miller
@ 2009-02-26 0:01 ` Neil Horman
From: Neil Horman @ 2009-02-26 0:01 UTC (permalink / raw)
To: David Miller
Cc: herbert, shemminger, netdev, kuznet, pekkas, jmorris, yoshfuji
On Wed, Feb 25, 2009 at 02:07:01PM -0800, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Wed, 25 Feb 2009 09:18:41 -0500
>
> > On Wed, Feb 25, 2009 at 04:01:30AM -0800, David Miller wrote:
> > > From: Neil Horman <nhorman@tuxdriver.com>
> > > Date: Wed, 25 Feb 2009 06:54:19 -0500
> > >
> > > And you need the same thing to extract the same information
> > > if you used tracepoints.
> >
> > Well, I think the above implementation can be done with or without
> > tracepoints. The mechanism by which we get the drop information is
> > orthogonal to what we do with it.
> >
> > Thanks for the feedback, guys. I'll start modifying my kernel
> > changes to reflect this straight away. If you'd like to see
> > something in the interim, let me know and I'll send you what I have.
> > Currently I've got the user space delivery mechanism working as a
> > Netlink protocol, and with your above suggestions, I expect to have
> > the drop capture piece recoded and working in the next few weeks.
>
> Don't get me wrong.
>
> If tracepoint annotations can be manageable (Herbert's concern), and
> solve multiple useful problems in practice rather than just
> theoretically (my concern), then that's what we should use for this
> kind of stuff.
>
> I just personally haven't been convinced of that yet.
>
Well, I think the manageability issue is acceptable when using multiple tracepoints
at the various locations where we increment the corresponding statistics (either
visible via /proc/net/[snmp|snmp6|etc], or via another tool, like tc or netstat).
By wrapping the tracepoints in the macros, they are fairly scalable and invisible
to other coders.
That said, I think marking drops at kfree_skb can also be made manageable, as
long as the assumption holds that the set of points at which we free without
dropping (kfree_skb_clean, as you called it) is small. I'm not 100% sure of that
at the moment, given that right now it looks like there might be a few hundred
places in various net drivers where I need to modify a call (but a sed script
can handle that, I think).
In the end, I'm going to guess that the ability to solve multiple useful
problems will make the kfree solution win out. I can see a lot more potential
in noting when an skb is freed than in just marking a statistic (tracing skbs
through the kernel, for instance).
Now that I've got the netlink delivery working and a userspace app put together
(it'll be on fedorahosted soon), it's pretty easy to play with both solutions.
I'll post a status update here in a few days letting you know how the kfree
solution works (I should have some initial thoughts by Friday, I think).
Thanks again for the thoughts; I owe you a beer at OLS (assuming my paper gets
accepted) :)
Neil