From: Stephen Hemminger <shemminger@vyatta.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: netdev@vger.kernel.org, nhorman@tuxdriver.com,
davem@davemloft.net, kuznet@ms2.inr.ac.ru, pekkas@netcore.fi,
jmorris@namei.org, yoshfuji@linux-ipv6.org, kaber@trash.net
Subject: Re: [Patch 0/5] Network Drop Monitor
Date: Tue, 3 Mar 2009 10:06:37 -0800
Message-ID: <20090303100637.31a5dac7@nehalam>
In-Reply-To: <20090303165747.GA1480@hmsreliant.think-freely.org>
On Tue, 3 Mar 2009 11:57:47 -0500
Neil Horman <nhorman@tuxdriver.com> wrote:
>
> Create Network Drop Monitoring service in the kernel
>
> A few weeks ago I posted an RFC requesting some feedback on a proposal that I
> had to enhance our ability to monitor the Linux network stack for dropped
> packets. This patchset is the result of that RFC and its feedback.
>
> Overview:
>
> The Linux networking stack, from a user's point of view, suffers from four
> shortcomings:
>
> 1) Consolidation: The ability to detect dropped network packets is spread out
> over several proc file interfaces and various other utilities (tc,
> /proc/net/dev, snmp, etc)
>
> 2) Clarity: The ability to discern which statistics reflect dropped packets is
> not always clear
>
> 3) Ambiguity: The ability to understand the root cause of a lost packet is not
> always clear (some stats are incremented at multiple points in the kernel for
> subtly different reasons)
>
> 4) Performance: Interrogating all of these interfaces as they currently exist
> requires a polling operation, and potentially requires the serialization of
> various kernel operations, which can result in performance degradation.
>
> Proposed solution: dropwatch
>
> My proposed solution consists of 4 primary aspects:
>
> A) A hook into kfree_skb to detect dropped packets. Based on feedback from the
> earlier RFC, there are relatively few places in the kernel where packets are
> dropped because they have been successfully received or sent (for lack of a
> better term, end-of-line points). The remaining calls to kfree_skb are made
> because there is something wrong and the packet must be discarded. I've split
> kfree_skb into two calls: kfree_skb and kfree_skb_clean. The latter is simply a
> pass through to __kfree_skb, while the former adds a trace hook to capture a
> pointer to the skb and the location of the call.
>
> B) A trace hook to monitor the trace point in (A). This records the locations
> at which frames were dropped, and saves them for periodic reporting.
>
> C) A netlink protocol to both control the enabling/disabling of the trace hook
> in (B) and to deliver information on drops to interested applications in user
> space
>
> D) A user space application to listen for drop alerts from (C) and report them
> to an administrator/save them for later analysis/etc. I've implemented the start
> of this application, which relies on this patch set here:
> https://fedorahosted.org/dropwatch/
>
>
> Implementation Notes:
>
> About the only out-of-the-ordinary aspects I'd like to call attention to at this
> point are:
>
> 1) The trace point. I know that tracepoints are currently a controversial
> subject, and that their need was discussed briefly during the RFC. I elected to
> use a tracepoint here, simply because I felt like I was re-inventing the wheel
> otherwise. In order to implement this feature, I needed an ability to record
> when kfree_skb was called in certain places whose performance impact would be 0
> when the feature wasn't configured into the kernel, and when it was configured,
> but disabled. Given that anything else I used or wrote myself to hook into this
> point in the kernel would be a partial approximation of what tracepoints already
> offer, I think it's preferable to go with a tracepoint here, simply because it's
> a good use of existing functionality.
>
> 2) The configuration messages in the netlink protocol are just a placeholder
> right now. I'm ok with that, given that the dropwatch user app doesn't have
> code to configure anything yet anyway (it just turns the service off/on and
> listens for drops right now). I figure I'll implement configuration messages in
> the app and kernel in parallel.
>
> 3) Performance. I'm not sure of the best way to model the performance here, but
> I disassembled the code in question, and at the point at which we hook kfree_skb,
> this patch set only adds a conditional branch to the path, which is optimized
> for the not-taken case (the case in which the service is disabled), so adding
> this feature is as close to a zero impact as it can be when the service is
> disabled. Likewise, when tracepoints are not configured in the kernel, the
> tracepoint (which is defined as a macro) is preprocessed away, making the
> performance impact zero. That leaves the case in which the service is enabled.
> While I don't have specific numbers, I can say that the trace path is lockless
> and per-cpu, and should run O(n) where n is the number of recordable drop points
> (default is 64). Sending/allocation of frames to userspace is done in the
> context of keventd, with a timer for hysteresis, to keep the number of sends
> lower and consolidate drop information. So performance should be reasonably
> good there. Again, no hard numbers, but I've monitored drops by passing udp
> traffic through localhost with netcat and SIGSTOP-ing the receiver. Console and
> ssh access remained very responsive.
>
>
> Ok, so that's it, hope it meets with everybody's approval!
> Regards
> Neil
>

It would be good to have a way to mask off certain tracepoints.
For example, when running a performance test, once the number of
packets dropped by TX queue overflow has been measured, it should be
possible to hide those drops and see only the others.