From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stephen Hemminger
Subject: Re: [Patch 0/5] Network Drop Monitor
Date: Tue, 3 Mar 2009 10:06:37 -0800
Message-ID: <20090303100637.31a5dac7@nehalam>
References: <20090303165747.GA1480@hmsreliant.think-freely.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org, nhorman@tuxdriver.com, davem@davemloft.net, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi, jmorris@namei.org, yoshfuji@linux-ipv6.org, kaber@trash.net
To: Neil Horman
Return-path:
Received: from mail.vyatta.com ([76.74.103.46]:41544 "EHLO mail.vyatta.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752047AbZCCSGo (ORCPT ); Tue, 3 Mar 2009 13:06:44 -0500
In-Reply-To: <20090303165747.GA1480@hmsreliant.think-freely.org>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tue, 3 Mar 2009 11:57:47 -0500
Neil Horman wrote:

>
> Create Network Drop Monitoring service in the kernel
>
> A few weeks ago I posted an RFC requesting some feedback on a proposal that I
> had to enhance our ability to monitor the Linux network stack for dropped
> packets. This patchset is the result of that RFC and its feedback.
>
> Overview:
>
> The Linux networking stack, from a user's point of view, suffers from four
> shortcomings:
>
> 1) Consolidation: The ability to detect dropped network packets is spread out
> over several proc file interfaces and various other utilities (tc,
> /proc/net/dev, snmp, etc)
>
> 2) Clarity: The ability to discern which statistics reflect dropped packets is
> not always clear
>
> 3) Ambiguity: The ability to understand the root cause of a lost packet is not
> always clear (some stats are incremented at multiple points in the kernel for
> subtly different reasons)
>
> 4) Performance: Interrogating all of these interfaces as they currently exist
> requires a polling operation, and potentially requires the serialization of
> various kernel operations, which can result in performance degradation.
>
> Proposed solution: dropwatch
>
> My proposed solution consists of 4 primary aspects:
>
> A) A hook into kfree_skb to detect dropped packets. Based on feedback from the
> earlier RFC, there are relatively few places in the kernel where packets are
> dropped because they have been successfully received or sent (for lack of a
> better term, end-of-line points). The remaining calls to kfree_skb are made
> because there is something wrong and the packet must be discarded. I've split
> kfree_skb into two calls: kfree_skb and kfree_skb_clean. The latter is simply a
> pass through to __kfree_skb, while the former adds a trace hook to capture a
> pointer to the skb and the location of the call.
>
> B) A trace hook to monitor the trace point in (A). This records the locations
> at which frames were dropped, and saves them for periodic reporting.
>
> C) A netlink protocol to both control the enabling/disabling of the trace hook
> in (B) and to deliver information on drops to interested applications in user
> space
>
> D) A user space application to listen for drop alerts from (C) and report them
> to an administrator/save them for later analysis/etc.
> I've implemented the start
> of this application, which relies on this patch set here:
> https://fedorahosted.org/dropwatch/
>
>
> Implementation Notes:
>
> About the only out-of-the-ordinary aspects I'd like to call attention to at this
> point are:
>
> 1) The trace point. I know that tracepoints are currently a controversial
> subject, and that their need was discussed briefly during the RFC. I elected to
> use a tracepoint here, simply because I felt like I was re-inventing the wheel
> otherwise. In order to implement this feature, I needed an ability to record
> when kfree_skb was called in certain places whose performance impact would be 0
> when the feature wasn't configured into the kernel, and when it was configured,
> but disabled. Given that anything else I used or wrote myself to hook into this
> point in the kernel would be a partial approximation of what tracepoints already
> offer, I think it's preferable to go with a tracepoint here, simply because it's
> a good use of existing functionality.
>
> 2) The configuration messages in the netlink protocol are just a placeholder
> right now. I'm ok with that, given that the dropwatch user app doesn't have
> code to configure anything yet anyway (it just turns the service off/on and
> listens for drops right now). I figure I'll implement configuration messages in
> the app and kernel in parallel.
>
> 3) Performance. I'm not sure of the best way to model the performance here, but
> I disassembled the code in question, and at the point at which we hook kfree_skb,
> this patch set only adds a conditional branch to the path, which is optimized
> for the not-taken case (the case in which the service is disabled), so adding
> this feature is as close to a zero impact as it can be when the service is
> disabled. Likewise, when tracepoints are not configured in the kernel, the
> tracepoint (which is defined as a macro) is preprocessed away, making the
> performance impact zero. That leaves the case in which the service is enabled.
> While I don't have specific numbers, I can say that the trace path is lockless
> and per-cpu, and should run O(n) where n is the number of recordable drop points
> (default is 64). Sending/allocation of frames to userspace is done in the
> context of keventd, with a timer for hysteresis, to keep the number of sends
> lower and consolidate drop information. So performance should be reasonably
> good there. Again, no hard numbers, but I've monitored drops by passing udp
> traffic through localhost with netcat and SIGSTOP-ing the receiver. Console and
> ssh access remained very responsive
>
>
> Ok, so that's it, hope it meets with everybody's approval!
> Regards
> Neil
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

It would be good to have a way to mask off certain tracepoints.
For example, when running a performance test, after measuring the number of
packets dropped by TX queue overflow, one could mask that drop point and see
only the others.
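
To make the masking idea concrete, here is a rough user-space sketch in C.
It is not code from the patch set, and every name in it (drop_table,
mask_drop_point, record_drop) is hypothetical. It assumes the drop table is
a fixed array of 64 entries searched linearly, which matches the O(n) lookup
described in the cover letter, and it adds a second small array of masked
call locations that are checked before anything is counted. Locking and
per-cpu handling are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default number of recordable drop points, per the cover letter. */
#define MAX_DROP_POINTS 64

/* One recorded drop location and how many times it was hit. */
struct drop_point {
    uintptr_t pc;        /* location of the call that dropped the packet */
    unsigned int count;  /* drops seen at that location */
};

/* Hypothetical table: recorded drop points plus a list of masked ones. */
struct drop_table {
    struct drop_point entries[MAX_DROP_POINTS];
    size_t used;
    uintptr_t masked[MAX_DROP_POINTS];
    size_t nmasked;
};

static void drop_table_init(struct drop_table *t)
{
    assert(t != NULL);
    t->used = 0;
    t->nmasked = 0;
}

/* Hide future drops from this location. Returns 0, or -1 if the mask list is full. */
static int mask_drop_point(struct drop_table *t, uintptr_t pc)
{
    assert(t != NULL);
    if (t->nmasked == MAX_DROP_POINTS)
        return -1;
    t->masked[t->nmasked++] = pc;
    return 0;
}

/*
 * Record one drop at location pc.
 * Returns 0 if counted, 1 if the location is masked, -1 if the table is full.
 */
static int record_drop(struct drop_table *t, uintptr_t pc)
{
    size_t i;

    assert(t != NULL);

    /* Masked locations are ignored before any counting happens. */
    for (i = 0; i < t->nmasked; i++)
        if (t->masked[i] == pc)
            return 1;

    /* Linear search over the known drop points: O(n), n <= 64. */
    for (i = 0; i < t->used; i++) {
        if (t->entries[i].pc == pc) {
            t->entries[i].count++;
            return 0;
        }
    }

    /* New location: take a free slot if one is left. */
    if (t->used == MAX_DROP_POINTS)
        return -1;
    t->entries[t->used].pc = pc;
    t->entries[t->used].count = 1;
    t->used++;
    return 0;
}
```

In the TX queue overflow case, the test would first let that location be
counted, then pass its address to mask_drop_point() so that later reports
contain only the remaining drop points. Whether the mask belongs in the
kernel or in the user space application is an open design question; this
sketch does not take a position on it.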