* [PATCH v4 0/5] netdev: show a process of packets @ 2010-08-23 9:41 Koki Sanagi 2010-08-23 9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi ` (5 more replies) 0 siblings, 6 replies; 23+ messages in thread From: Koki Sanagi @ 2010-08-23 9:41 UTC (permalink / raw) To: netdev Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers Rebase to the latest net-next. CHANGE-LOG since v3: 1) change arguments of softirq tracepoint into original one. 2) remove tracepoint of dev_kfree_skb_irq and skb_free_datagram_locked and add trace_kfree_skb before __kfree_skb instead of them. 3) add tracepoint to netif_rx and display it by netdev-times script. These patch-set adds tracepoints to show us a process of packets. Using these tracepoints and existing points, we can get the time when packet passes through some points in transmit or receive sequence. For example, this is an output of perf script which is attached by patch 5/5. 106133.171439sec cpu=0 irq_entry(+0.000msec irq=24:eth4) | softirq_entry(+0.006msec) | |---netif_receive_skb(+0.010msec skb=f2d15900 len=100) | | | skb_copy_datagram_iovec(+0.039msec 10291::10291) | napi_poll_exit(+0.022msec eth4) 106134.175634sec cpu=1 irq_entry(+0.000msec irq=28:eth1) | |---netif_rx(+0.009msec skb=f3ef0a00) | softirq_entry(+0.018msec) | |---netif_receive_skb(+0.021msec skb=f3ef0a00 len=84) | | | skb_copy_datagram_iovec(+0.033msec 0:swapper) | napi_poll_exit(+0.035msec (no_device)) The above is a receive side(eth4 is NAPI. eth1 is non-NAPI). Like this, it can show receive sequence frominterrupt(irq_entry) to application (skb_copy_datagram_iovec). This script shows one NET_RX softirq and events related to it. All relative time bases on first irq_entry which raise NET_RX softirq. dev len Qdisc netdevice free eth4 74 106125.030004sec 0.006msec 0.009msec eth4 87 106125.041020sec 0.007msec 0.023msec eth4 66 106125.042291sec 0.003msec 0.012msec eth4 66 106125.043274sec 0.006msec 0.004msec eth4 850 106125.044283sec 0.007msec 0.018msec The above is a transmit side. There are three check-time-points. Point1 is before putting a packet to Qdisc. point2 is after ndo_start_xmit in dev_hard_start_xmit. It indicates finishing putting a packet to driver. point3 is in consume_skb and kfree_skb. It indicates freeing a transmitted packet. Values of this script are, from left, device name, length of a packet, a time of point1, an interval time between point1 and point2 and an interval time between point2 and point3. These times are useful to analyze a performance or to detect a point where packet delays. For example, - NET_RX softirq calling is late. - Application is late to take a packet. - It takes much time to put a transmitting packet to driver (It may be caused by packed queue) And also, these tracepoint help us to investigate a network driver's trouble from memory dump because ftrace records it to memory. And ftrace is so light even if always trace on. So, in a case investigating a problem which doesn't reproduce, it is useful. Thanks, Koki Sanagi. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v4 1/5] irq: add tracepoint to softirq_raise 2010-08-23 9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi @ 2010-08-23 9:42 ` Koki Sanagi 2010-09-03 15:29 ` Frederic Weisbecker 2010-08-23 9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi ` (4 subsequent siblings) 5 siblings, 1 reply; 23+ messages in thread From: Koki Sanagi @ 2010-08-23 9:42 UTC (permalink / raw) To: netdev Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers From: Lai Jiangshan <laijs@cn.fujitsu.com> Add a tracepoint for tracing when softirq action is raised. It and the existed tracepoints complete softirq's tracepoints: softirq_raise, softirq_entry and softirq_exit. And when this tracepoint is used in combination with the softirq_entry tracepoint we can determine the softirq raise latency. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> [ factorize softirq events with DECLARE_EVENT_CLASS ] Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> --- include/linux/interrupt.h | 8 +++++++- include/trace/events/irq.h | 26 ++++++++++++++++++++++++-- 2 files changed, 31 insertions(+), 3 deletions(-) diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index a0384a4..d3e8e90 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -18,6 +18,7 @@ #include <asm/atomic.h> #include <asm/ptrace.h> #include <asm/system.h> +#include <trace/events/irq.h> /* * These correspond to the IORESOURCE_IRQ_* defines in @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void); asmlinkage void __do_softirq(void); extern void open_softirq(int nr, void (*action)(struct softirq_action *)); extern void softirq_init(void); -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0) +static inline void __raise_softirq_irqoff(unsigned int nr) +{ + trace_softirq_raise((struct softirq_action *)&nr, NULL); + or_softirq_pending(1UL << nr); +} + extern void raise_softirq_irqoff(unsigned int nr); extern void raise_softirq(unsigned int nr); extern void wakeup_softirqd(void); diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h index 0e4cfb6..3ddda02 100644 --- a/include/trace/events/irq.h +++ b/include/trace/events/irq.h @@ -5,7 +5,9 @@ #define _TRACE_IRQ_H #include <linux/tracepoint.h> -#include <linux/interrupt.h> + +struct irqaction; +struct softirq_action; #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq } #define show_softirq_name(val) \ @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq, ), TP_fast_assign( - __entry->vec = (int)(h - vec); + if (vec) + __entry->vec = (int)(h - vec); + else + __entry->vec = *((int *)h); ), TP_printk("vec=%d [action=%s]", __entry->vec, @@ -136,6 +141,23 @@ DEFINE_EVENT(softirq, softirq_exit, TP_ARGS(h, vec) ); +/** + * softirq_raise - called immediately when a softirq is raised + * @h: pointer to struct softirq_action + * @vec: pointer to first struct softirq_action in softirq_vec array + * + * The @h parameter contains a pointer to the softirq vector number which is + * raised. @vec is NULL and it means @h includes vector number not + * softirq_action. When used in combination with the softirq_entry tracepoint + * we can determine the softirq raise latency. + */ +DEFINE_EVENT(softirq, softirq_raise, + + TP_PROTO(struct softirq_action *h, struct softirq_action *vec), + + TP_ARGS(h, vec) +); + #endif /* _TRACE_IRQ_H */ /* This part must be outside protection */ ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise 2010-08-23 9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi @ 2010-09-03 15:29 ` Frederic Weisbecker 2010-09-03 15:39 ` Steven Rostedt 2010-09-03 15:43 ` Steven Rostedt 0 siblings, 2 replies; 23+ messages in thread From: Frederic Weisbecker @ 2010-09-03 15:29 UTC (permalink / raw) To: Koki Sanagi Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, mathieu.desnoyers On Mon, Aug 23, 2010 at 06:42:48PM +0900, Koki Sanagi wrote: > From: Lai Jiangshan <laijs@cn.fujitsu.com> > > Add a tracepoint for tracing when softirq action is raised. > > It and the existed tracepoints complete softirq's tracepoints: > softirq_raise, softirq_entry and softirq_exit. > > And when this tracepoint is used in combination with > the softirq_entry tracepoint we can determine > the softirq raise latency. > > Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> > Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> > Acked-by: Frederic Weisbecker <fweisbec@gmail.com> > > [ factorize softirq events with DECLARE_EVENT_CLASS ] > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> > --- > include/linux/interrupt.h | 8 +++++++- > include/trace/events/irq.h | 26 ++++++++++++++++++++++++-- > 2 files changed, 31 insertions(+), 3 deletions(-) > > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h > index a0384a4..d3e8e90 100644 > --- a/include/linux/interrupt.h > +++ b/include/linux/interrupt.h > @@ -18,6 +18,7 @@ > #include <asm/atomic.h> > #include <asm/ptrace.h> > #include <asm/system.h> > +#include <trace/events/irq.h> > > /* > * These correspond to the IORESOURCE_IRQ_* defines in > @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void); > asmlinkage void __do_softirq(void); > extern void open_softirq(int nr, void (*action)(struct softirq_action *)); > extern void softirq_init(void); > -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0) > +static inline void __raise_softirq_irqoff(unsigned int nr) > +{ > + trace_softirq_raise((struct softirq_action *)&nr, NULL); > + or_softirq_pending(1UL << nr); > +} > + > extern void raise_softirq_irqoff(unsigned int nr); > extern void raise_softirq(unsigned int nr); > extern void wakeup_softirqd(void); > diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h > index 0e4cfb6..3ddda02 100644 > --- a/include/trace/events/irq.h > +++ b/include/trace/events/irq.h > @@ -5,7 +5,9 @@ > #define _TRACE_IRQ_H > > #include <linux/tracepoint.h> > -#include <linux/interrupt.h> > + > +struct irqaction; > +struct softirq_action; > > #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq } > #define show_softirq_name(val) \ > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq, > ), > > TP_fast_assign( > - __entry->vec = (int)(h - vec); > + if (vec) > + __entry->vec = (int)(h - vec); > + else > + __entry->vec = *((int *)h); > ), It seems that this will break softirq_entry/exit tracepoints. __entry->vec will deref vec->action() for these two, which is not what we want. If you can't have the same tracepoint signature for the three, just split the new one in a seperate TRACE_EVENT(). Thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise 2010-09-03 15:29 ` Frederic Weisbecker @ 2010-09-03 15:39 ` Steven Rostedt 2010-09-03 15:42 ` Frederic Weisbecker 2010-09-03 15:43 ` Steven Rostedt 1 sibling, 1 reply; 23+ messages in thread From: Steven Rostedt @ 2010-09-03 15:39 UTC (permalink / raw) To: Frederic Weisbecker Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, mathieu.desnoyers On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote: > > #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq } > > #define show_softirq_name(val) \ > > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq, > > ), > > > > TP_fast_assign( > > - __entry->vec = (int)(h - vec); > > + if (vec) > > + __entry->vec = (int)(h - vec); > > + else > > + __entry->vec = *((int *)h); > > ), > > > > It seems that this will break softirq_entry/exit tracepoints. > __entry->vec will deref vec->action() for these two, which is not > what we want. But for trace_softirq_entry and trace_softirq_exit, vec will not be NULL. > > If you can't have the same tracepoint signature for the three, just > split the new one in a seperate TRACE_EVENT(). It may be a bit of a hack, and questionable about adding another TRACE_EVENT(). There still is a pretty good space savings in using DEFINE_EVENT() over TRACE_EVENT() though. -- Steve ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise 2010-09-03 15:39 ` Steven Rostedt @ 2010-09-03 15:42 ` Frederic Weisbecker 0 siblings, 0 replies; 23+ messages in thread From: Frederic Weisbecker @ 2010-09-03 15:42 UTC (permalink / raw) To: Steven Rostedt Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, mathieu.desnoyers On Fri, Sep 03, 2010 at 11:39:36AM -0400, Steven Rostedt wrote: > On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote: > > > > #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq } > > > #define show_softirq_name(val) \ > > > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq, > > > ), > > > > > > TP_fast_assign( > > > - __entry->vec = (int)(h - vec); > > > + if (vec) > > > + __entry->vec = (int)(h - vec); > > > + else > > > + __entry->vec = *((int *)h); > > > ), > > > > > > > > It seems that this will break softirq_entry/exit tracepoints. > > __entry->vec will deref vec->action() for these two, which is not > > what we want. > > But for trace_softirq_entry and trace_softirq_exit, vec will not be > NULL. Oh right... /me slaps his forehead > > > > > If you can't have the same tracepoint signature for the three, just > > split the new one in a seperate TRACE_EVENT(). > > It may be a bit of a hack, and questionable about adding another > TRACE_EVENT(). There still is a pretty good space savings in using > DEFINE_EVENT() over TRACE_EVENT() though. Yeah, let's keep it as is. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise 2010-09-03 15:29 ` Frederic Weisbecker 2010-09-03 15:39 ` Steven Rostedt @ 2010-09-03 15:43 ` Steven Rostedt 2010-09-03 15:50 ` Frederic Weisbecker 1 sibling, 1 reply; 23+ messages in thread From: Steven Rostedt @ 2010-09-03 15:43 UTC (permalink / raw) To: Frederic Weisbecker Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, mathieu.desnoyers On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote: > > /* > > * These correspond to the IORESOURCE_IRQ_* defines in > > @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void); > > asmlinkage void __do_softirq(void); > > extern void open_softirq(int nr, void (*action)(struct softirq_action *)); > > extern void softirq_init(void); > > -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0) > > +static inline void __raise_softirq_irqoff(unsigned int nr) > > +{ > > + trace_softirq_raise((struct softirq_action *)&nr, NULL); Perhaps doing: trace_softirq_raise((struct softirq_action *)((unsigend long)nr), NULL); and ... > > + or_softirq_pending(1UL << nr); > > +} > > + > > extern void raise_softirq_irqoff(unsigned int nr); > > extern void raise_softirq(unsigned int nr); > > extern void wakeup_softirqd(void); > > diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h > > index 0e4cfb6..3ddda02 100644 > > --- a/include/trace/events/irq.h > > +++ b/include/trace/events/irq.h > > @@ -5,7 +5,9 @@ > > #define _TRACE_IRQ_H > > > > #include <linux/tracepoint.h> > > -#include <linux/interrupt.h> > > + > > +struct irqaction; > > +struct softirq_action; > > > > #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq } > > #define show_softirq_name(val) \ > > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq, > > ), > > > > TP_fast_assign( > > - __entry->vec = (int)(h - vec); > > + if (vec) > > + __entry->vec = (int)(h - vec); > > + else > > + __entry->vec = *((int *)h); __entry->vec = (int)h; would be better. > > ), > > > > It seems that this will break softirq_entry/exit tracepoints. > __entry->vec will deref vec->action() for these two, which is not > what we want. > > If you can't have the same tracepoint signature for the three, just > split the new one in a seperate TRACE_EVENT(). Doing the above will at least be a bit safer. -- Steve ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise 2010-09-03 15:43 ` Steven Rostedt @ 2010-09-03 15:50 ` Frederic Weisbecker 2010-09-06 1:46 ` Koki Sanagi 0 siblings, 1 reply; 23+ messages in thread From: Frederic Weisbecker @ 2010-09-03 15:50 UTC (permalink / raw) To: Steven Rostedt Cc: Koki Sanagi, netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, mathieu.desnoyers On Fri, Sep 03, 2010 at 11:43:12AM -0400, Steven Rostedt wrote: > On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote: > > > > /* > > > * These correspond to the IORESOURCE_IRQ_* defines in > > > @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void); > > > asmlinkage void __do_softirq(void); > > > extern void open_softirq(int nr, void (*action)(struct softirq_action *)); > > > extern void softirq_init(void); > > > -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0) > > > +static inline void __raise_softirq_irqoff(unsigned int nr) > > > +{ > > > + trace_softirq_raise((struct softirq_action *)&nr, NULL); > > Perhaps doing: > > trace_softirq_raise((struct softirq_action *)((unsigend long)nr), > NULL); > > and ... > > > > + or_softirq_pending(1UL << nr); > > > +} > > > + > > > extern void raise_softirq_irqoff(unsigned int nr); > > > extern void raise_softirq(unsigned int nr); > > > extern void wakeup_softirqd(void); > > > diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h > > > index 0e4cfb6..3ddda02 100644 > > > --- a/include/trace/events/irq.h > > > +++ b/include/trace/events/irq.h > > > @@ -5,7 +5,9 @@ > > > #define _TRACE_IRQ_H > > > > > > #include <linux/tracepoint.h> > > > -#include <linux/interrupt.h> > > > + > > > +struct irqaction; > > > +struct softirq_action; > > > > > > #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq } > > > #define show_softirq_name(val) \ > > > @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq, > > > ), > > > > > > TP_fast_assign( > > > - __entry->vec = (int)(h - vec); > > > + if (vec) > > > + __entry->vec = (int)(h - vec); > > > + else > > > + __entry->vec = *((int *)h); > > __entry->vec = (int)h; > > would be better. > > > > > ), > > > > > > > > It seems that this will break softirq_entry/exit tracepoints. > > __entry->vec will deref vec->action() for these two, which is not > > what we want. > > > > If you can't have the same tracepoint signature for the three, just > > split the new one in a seperate TRACE_EVENT(). > > Doing the above will at least be a bit safer. Agreed, I'm going to change that in the patch. Thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 1/5] irq: add tracepoint to softirq_raise 2010-09-03 15:50 ` Frederic Weisbecker @ 2010-09-06 1:46 ` Koki Sanagi 0 siblings, 0 replies; 23+ messages in thread From: Koki Sanagi @ 2010-09-06 1:46 UTC (permalink / raw) To: Frederic Weisbecker Cc: Steven Rostedt, netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, mathieu.desnoyers (2010/09/04 0:50), Frederic Weisbecker wrote: > On Fri, Sep 03, 2010 at 11:43:12AM -0400, Steven Rostedt wrote: >> On Fri, 2010-09-03 at 17:29 +0200, Frederic Weisbecker wrote: >> >>>> /* >>>> * These correspond to the IORESOURCE_IRQ_* defines in >>>> @@ -407,7 +408,12 @@ asmlinkage void do_softirq(void); >>>> asmlinkage void __do_softirq(void); >>>> extern void open_softirq(int nr, void (*action)(struct softirq_action *)); >>>> extern void softirq_init(void); >>>> -#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0) >>>> +static inline void __raise_softirq_irqoff(unsigned int nr) >>>> +{ >>>> + trace_softirq_raise((struct softirq_action *)&nr, NULL); >> >> Perhaps doing: >> >> trace_softirq_raise((struct softirq_action *)((unsigend long)nr), >> NULL); >> >> and ... >> >>>> + or_softirq_pending(1UL << nr); >>>> +} >>>> + >>>> extern void raise_softirq_irqoff(unsigned int nr); >>>> extern void raise_softirq(unsigned int nr); >>>> extern void wakeup_softirqd(void); >>>> diff --git a/include/trace/events/irq.h b/include/trace/events/irq.h >>>> index 0e4cfb6..3ddda02 100644 >>>> --- a/include/trace/events/irq.h >>>> +++ b/include/trace/events/irq.h >>>> @@ -5,7 +5,9 @@ >>>> #define _TRACE_IRQ_H >>>> >>>> #include <linux/tracepoint.h> >>>> -#include <linux/interrupt.h> >>>> + >>>> +struct irqaction; >>>> +struct softirq_action; >>>> >>>> #define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq } >>>> #define show_softirq_name(val) \ >>>> @@ -93,7 +95,10 @@ DECLARE_EVENT_CLASS(softirq, >>>> ), >>>> >>>> TP_fast_assign( >>>> - __entry->vec = (int)(h - vec); >>>> + if (vec) >>>> + __entry->vec = (int)(h - vec); >>>> + else >>>> + __entry->vec = *((int *)h); >> >> __entry->vec = (int)h; >> >> would be better. >> >> >>>> ), >>> >>> >>> >>> It seems that this will break softirq_entry/exit tracepoints. >>> __entry->vec will deref vec->action() for these two, which is not >>> what we want. >>> >>> If you can't have the same tracepoint signature for the three, just >>> split the new one in a seperate TRACE_EVENT(). >> >> Doing the above will at least be a bit safer. > > > Agreed, I'm going to change that in the patch. > > Thanks. > I agree. Thanks, Koki Sanagi. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT 2010-08-23 9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi 2010-08-23 9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi @ 2010-08-23 9:43 ` Koki Sanagi 2010-08-24 3:52 ` David Miller 2010-08-23 9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi ` (3 subsequent siblings) 5 siblings, 1 reply; 23+ messages in thread From: Koki Sanagi @ 2010-08-23 9:43 UTC (permalink / raw) To: netdev Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers From: Neil Horman <nhorman@tuxdriver.com> This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve the usability of napi_poll tracepoint. <idle>-0 [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3 <idle>-0 [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1 An original patch is below. http://marc.info/?l=linux-kernel&m=126021713809450&w=2 Signed-off-by: Neil Horman <nhorman@tuxdriver.com> And add a fix by Steven Rostedt. http://marc.info/?l=linux-kernel&m=126150506519173&w=2 Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> --- include/trace/events/napi.h | 25 +++++++++++++++++++++++-- 1 files changed, 23 insertions(+), 2 deletions(-) diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h index 188deca..8fe1e93 100644 --- a/include/trace/events/napi.h +++ b/include/trace/events/napi.h @@ -6,10 +6,31 @@ #include <linux/netdevice.h> #include <linux/tracepoint.h> +#include <linux/ftrace.h> + +#define NO_DEV "(no_device)" + +TRACE_EVENT(napi_poll, -DECLARE_TRACE(napi_poll, TP_PROTO(struct napi_struct *napi), - TP_ARGS(napi)); + + TP_ARGS(napi), + + TP_STRUCT__entry( + __field( struct napi_struct *, napi) + __string( dev_name, napi->dev ? napi->dev->name : NO_DEV) + ), + + TP_fast_assign( + __entry->napi = napi; + __assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV); + ), + + TP_printk("napi poll on napi struct %p for device %s", + __entry->napi, __get_str(dev_name)) +); + +#undef NO_DEV #endif /* _TRACE_NAPI_H_ */ ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT 2010-08-23 9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi @ 2010-08-24 3:52 ` David Miller 0 siblings, 0 replies; 23+ messages in thread From: David Miller @ 2010-08-24 3:52 UTC (permalink / raw) To: sanagi.koki Cc: netdev, linux-kernel, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers From: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Date: Mon, 23 Aug 2010 18:43:51 +0900 > From: Neil Horman <nhorman@tuxdriver.com> > > This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT to improve > the usability of napi_poll tracepoint. > > <idle>-0 [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3 > <idle>-0 [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1 > > An original patch is below. > http://marc.info/?l=linux-kernel&m=126021713809450&w=2 > Signed-off-by: Neil Horman <nhorman@tuxdriver.com> > > And add a fix by Steven Rostedt. > http://marc.info/?l=linux-kernel&m=126150506519173&w=2 > > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v4 3/5] netdev: add tracepoints to netdev layer 2010-08-23 9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi 2010-08-23 9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi 2010-08-23 9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi @ 2010-08-23 9:45 ` Koki Sanagi 2010-08-24 3:53 ` David Miller 2010-08-23 9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi ` (2 subsequent siblings) 5 siblings, 1 reply; 23+ messages in thread From: Koki Sanagi @ 2010-08-23 9:45 UTC (permalink / raw) To: netdev Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit, netif_rx and netif_receive_skb. These tracepoints help you to monitor network driver's input/output. <idle>-0 [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84 <idle>-0 [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84 sshd-6828 [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226 sshd-6828 [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0 Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> --- include/trace/events/net.h | 82 ++++++++++++++++++++++++++++++++++++++++++++ net/core/dev.c | 6 +++ net/core/net-traces.c | 1 + 3 files changed, 89 insertions(+), 0 deletions(-) diff --git a/include/trace/events/net.h b/include/trace/events/net.h new file mode 100644 index 0000000..5f247f5 --- /dev/null +++ b/include/trace/events/net.h @@ -0,0 +1,82 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM net + +#if !defined(_TRACE_NET_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_NET_H + +#include <linux/skbuff.h> +#include <linux/netdevice.h> +#include <linux/ip.h> +#include <linux/tracepoint.h> + +TRACE_EVENT(net_dev_xmit, + + TP_PROTO(struct sk_buff *skb, + int rc), + + TP_ARGS(skb, rc), + + TP_STRUCT__entry( + __field( void *, skbaddr ) + __field( unsigned int, len ) + __field( int, rc ) + __string( name, skb->dev->name ) + ), + + TP_fast_assign( + __entry->skbaddr = skb; + __entry->len = skb->len; + __entry->rc = rc; + __assign_str(name, skb->dev->name); + ), + + TP_printk("dev=%s skbaddr=%p len=%u rc=%d", + __get_str(name), __entry->skbaddr, __entry->len, __entry->rc) +); + +DECLARE_EVENT_CLASS(net_dev_template, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb), + + TP_STRUCT__entry( + __field( void *, skbaddr ) + __field( unsigned int, len ) + __string( name, skb->dev->name ) + ), + + TP_fast_assign( + __entry->skbaddr = skb; + __entry->len = skb->len; + __assign_str(name, skb->dev->name); + ), + + TP_printk("dev=%s skbaddr=%p len=%u", + __get_str(name), __entry->skbaddr, __entry->len) +) + +DEFINE_EVENT(net_dev_template, net_dev_queue, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb) +); + +DEFINE_EVENT(net_dev_template, netif_receive_skb, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb) +); + +DEFINE_EVENT(net_dev_template, netif_rx, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb) +); +#endif /* _TRACE_NET_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/net/core/dev.c b/net/core/dev.c index 7cd5237..c9b026a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -128,6 +128,7 @@ #include <linux/jhash.h> #include <linux/random.h> #include <trace/events/napi.h> +#include <trace/events/net.h> #include <linux/pci.h> #include "net-sysfs.h" @@ -1978,6 +1979,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, } rc = ops->ndo_start_xmit(skb, dev); + trace_net_dev_xmit(skb, rc); if (rc == NETDEV_TX_OK) txq_trans_update(txq); return rc; @@ -1998,6 +2000,7 @@ gso: skb_dst_drop(nskb); rc = ops->ndo_start_xmit(nskb, dev); + trace_net_dev_xmit(nskb, rc); if (unlikely(rc != NETDEV_TX_OK)) { if (rc & ~NETDEV_TX_MASK) goto out_kfree_gso_skb; @@ -2186,6 +2189,7 @@ int dev_queue_xmit(struct sk_buff *skb) #ifdef CONFIG_NET_CLS_ACT skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS); #endif + trace_net_dev_queue(skb); if (q->enqueue) { rc = __dev_xmit_skb(skb, q, dev, txq); goto out; @@ -2525,6 +2529,7 @@ int netif_rx(struct sk_buff *skb) if (netdev_tstamp_prequeue) net_timestamp_check(skb); + trace_netif_rx(skb); #ifdef CONFIG_RPS { struct rps_dev_flow voidflow, *rflow = &voidflow; @@ -2841,6 +2846,7 @@ static int __netif_receive_skb(struct sk_buff *skb) if (!netdev_tstamp_prequeue) net_timestamp_check(skb); + trace_netif_receive_skb(skb); if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb)) return NET_RX_SUCCESS; diff --git a/net/core/net-traces.c b/net/core/net-traces.c index afa6380..7f1bb2a 100644 --- a/net/core/net-traces.c +++ b/net/core/net-traces.c @@ -26,6 +26,7 @@ #define CREATE_TRACE_POINTS #include <trace/events/skb.h> +#include <trace/events/net.h> #include <trace/events/napi.h> EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb); ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v4 3/5] netdev: add tracepoints to netdev layer 2010-08-23 9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi @ 2010-08-24 3:53 ` David Miller 0 siblings, 0 replies; 23+ messages in thread From: David Miller @ 2010-08-24 3:53 UTC (permalink / raw) To: sanagi.koki Cc: netdev, linux-kernel, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers From: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Date: Mon, 23 Aug 2010 18:45:02 +0900 > This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit, netif_rx and > netif_receive_skb. These tracepoints help you to monitor network driver's > input/output. > > <idle>-0 [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84 > <idle>-0 [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84 > sshd-6828 [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226 > sshd-6828 [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0 > > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v4 4/5] skb: add tracepoints to freeing skb 2010-08-23 9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi ` (2 preceding siblings ...) 2010-08-23 9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi @ 2010-08-23 9:46 ` Koki Sanagi 2010-08-24 3:53 ` David Miller 2010-08-23 9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi 2010-08-30 23:50 ` [PATCH v4 0/5] netdev: show a process of packets Steven Rostedt 5 siblings, 1 reply; 23+ messages in thread From: Koki Sanagi @ 2010-08-23 9:46 UTC (permalink / raw) To: netdev Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers This patch adds tracepoint to consume_skb and add trace_kfree_skb before __kfree_skb in skb_free_datagram_locked and net_tx_action. Combinating with tracepoint on dev_hard_start_xmit, we can check how long it takes to free transmited packets. And using it, we can calculate how many packets driver had at that time. It is useful when a drop of transmited packet is a problem. sshd-6828 [000] 112689.258154: consume_skb: skbaddr=f2d99bb8 Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> --- include/trace/events/skb.h | 17 +++++++++++++++++ net/core/datagram.c | 1 + net/core/dev.c | 2 ++ net/core/skbuff.c | 1 + 4 files changed, 21 insertions(+), 0 deletions(-) diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h index 4b2be6d..75ce9d5 100644 --- a/include/trace/events/skb.h +++ b/include/trace/events/skb.h @@ -35,6 +35,23 @@ TRACE_EVENT(kfree_skb, __entry->skbaddr, __entry->protocol, __entry->location) ); +TRACE_EVENT(consume_skb, + + TP_PROTO(struct sk_buff *skb), + + TP_ARGS(skb), + + TP_STRUCT__entry( + __field( void *, skbaddr ) + ), + + TP_fast_assign( + __entry->skbaddr = skb; + ), + + TP_printk("skbaddr=%p", __entry->skbaddr) +); + TRACE_EVENT(skb_copy_datagram_iovec, TP_PROTO(const struct sk_buff *skb, int len), diff --git a/net/core/datagram.c b/net/core/datagram.c index 251997a..282806b 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -243,6 +243,7 @@ void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb) unlock_sock_fast(sk, slow); /* skb is now orphaned, can be freed outside of locked section */ + trace_kfree_skb(skb, skb_free_datagram_locked); __kfree_skb(skb); } EXPORT_SYMBOL(skb_free_datagram_locked); diff --git a/net/core/dev.c b/net/core/dev.c index c9b026a..48f7977 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -129,6 +129,7 @@ #include <linux/random.h> #include <trace/events/napi.h> #include <trace/events/net.h> +#include <trace/events/skb.h> #include <linux/pci.h> #include "net-sysfs.h" @@ -2589,6 +2590,7 @@ static void net_tx_action(struct softirq_action *h) clist = clist->next; WARN_ON(atomic_read(&skb->users)); + trace_kfree_skb(skb, net_tx_action); __kfree_skb(skb); } } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 99ef721..ef4ffa8 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb) smp_rmb(); else if (likely(!atomic_dec_and_test(&skb->users))) return; + trace_consume_skb(skb); __kfree_skb(skb); } EXPORT_SYMBOL(consume_skb); ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v4 4/5] skb: add tracepoints to freeing skb 2010-08-23 9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi @ 2010-08-24 3:53 ` David Miller 0 siblings, 0 replies; 23+ messages in thread From: David Miller @ 2010-08-24 3:53 UTC (permalink / raw) To: sanagi.koki Cc: netdev, linux-kernel, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers From: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Date: Mon, 23 Aug 2010 18:46:12 +0900 > This patch adds tracepoint to consume_skb and add trace_kfree_skb before > __kfree_skb in skb_free_datagram_locked and net_tx_action. > Combinating with tracepoint on dev_hard_start_xmit, we can check how long it > takes to free transmited packets. And using it, we can calculate how many > packets driver had at that time. It is useful when a drop of transmited packet > is a problem. > > sshd-6828 [000] 112689.258154: consume_skb: skbaddr=f2d99bb8 > > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v4 5/5] perf:add a script shows a process of packet 2010-08-23 9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi ` (3 preceding siblings ...) 2010-08-23 9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi @ 2010-08-23 9:47 ` Koki Sanagi 2010-08-24 3:53 ` David Miller 2010-09-07 16:57 ` Frederic Weisbecker 2010-08-30 23:50 ` [PATCH v4 0/5] netdev: show a process of packets Steven Rostedt 5 siblings, 2 replies; 23+ messages in thread From: Koki Sanagi @ 2010-08-23 9:47 UTC (permalink / raw) To: netdev Cc: linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers Add a perf script which shows a process of packets and processed time. It helps us to investigate networking or network device. If you want to use it, install perf and record perf.data like following. #perf trace record netdev-times [script] If you set script, perf gathers records until it ends. If not, you must Ctrl-C to stop recording. And if you want a report from record, #perf trace report netdev-times [options] If you use some options, you can limit an output. Option is below. tx: show only process of tx packets rx: show only process of rx packets dev=: show a process specified with this option debug: work with debug mode. It shows buffer status. For example, if you want to show a process of received packets associated with eth4, #perf trace report netdev-times rx dev=eth4 106133.171439sec cpu=0 irq_entry(+0.000msec irq=24:eth4) | softirq_entry(+0.006msec) | |---netif_receive_skb(+0.010msec skb=f2d15900 len=100) | | | skb_copy_datagram_iovec(+0.039msec 10291::10291) | napi_poll_exit(+0.022msec eth4) This perf script helps us to analyze a process time of transmit/receive sequence. Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> --- tools/perf/scripts/python/bin/netdev-times-record | 8 + tools/perf/scripts/python/bin/netdev-times-report | 5 + tools/perf/scripts/python/netdev-times.py | 464 +++++++++++++++++++++ 3 files changed, 477 insertions(+), 0 deletions(-) diff --git a/tools/perf/scripts/python/bin/netdev-times-record b/tools/perf/scripts/python/bin/netdev-times-record new file mode 100644 index 0000000..2b59511 --- /dev/null +++ b/tools/perf/scripts/python/bin/netdev-times-record @@ -0,0 +1,8 @@ +#!/bin/bash +perf record -c 1 -f -R -a -e net:net_dev_xmit -e net:net_dev_queue \ + -e net:netif_receive_skb -e net:netif_rx \ + -e skb:consume_skb -e skb:kfree_skb \ + -e skb:skb_copy_datagram_iovec -e napi:napi_poll \ + -e irq:irq_handler_entry -e irq:irq_handler_exit \ + -e irq:softirq_entry -e irq:softirq_exit \ + -e irq:softirq_raise $@ diff --git a/tools/perf/scripts/python/bin/netdev-times-report b/tools/perf/scripts/python/bin/netdev-times-report new file mode 100644 index 0000000..c3d0a63 --- /dev/null +++ b/tools/perf/scripts/python/bin/netdev-times-report @@ -0,0 +1,5 @@ +#!/bin/bash +# description: display a process of packet and processing time +# args: [tx] [rx] [dev=] [debug] + +perf trace -s ~/libexec/perf-core/scripts/python/netdev-times.py $@ diff --git a/tools/perf/scripts/python/netdev-times.py b/tools/perf/scripts/python/netdev-times.py new file mode 100644 index 0000000..9aa0a32 --- /dev/null +++ b/tools/perf/scripts/python/netdev-times.py @@ -0,0 +1,464 @@ +# Display a process of packets and processed time. +# It helps us to investigate networking or network device. +# +# options +# tx: show only tx chart +# rx: show only rx chart +# dev=: show only thing related to specified device +# debug: work with debug mode. It shows buffer status. + +import os +import sys + +sys.path.append(os.environ['PERF_EXEC_PATH'] + \ + '/scripts/python/Perf-Trace-Util/lib/Perf/Trace') + +from perf_trace_context import * +from Core import * +from Util import * + +all_event_list = []; # insert all tracepoint event related with this script +irq_dic = {}; # key is cpu and value is a list which stacks irqs + # which raise NET_RX softirq +net_rx_dic = {}; # key is cpu and value include time of NET_RX softirq-entry + # and a list which stacks receive +receive_hunk_list = []; # a list which include a sequence of receive events +rx_skb_list = []; # received packet list for matching + # skb_copy_datagram_iovec + +buffer_budget = 65536; # the budget of rx_skb_list, tx_queue_list and + # tx_xmit_list +of_count_rx_skb_list = 0; # overflow count + +tx_queue_list = []; # list of packets which pass through dev_queue_xmit +of_count_tx_queue_list = 0; # overflow count + +tx_xmit_list = []; # list of packets which pass through dev_hard_start_xmit +of_count_tx_xmit_list = 0; # overflow count + +tx_free_list = []; # list of packets which is freed + +# options +show_tx = 0; +show_rx = 0; +dev = 0; # store a name of device specified by option "dev=" +debug = 0; + +# indices of event_info tuple +EINFO_IDX_NAME= 0 +EINFO_IDX_CONTEXT=1 +EINFO_IDX_CPU= 2 +EINFO_IDX_TIME= 3 +EINFO_IDX_PID= 4 +EINFO_IDX_COMM= 5 + +# Calculate a time interval(msec) from src(nsec) to dst(nsec) +def diff_msec(src, dst): + return (dst - src) / 1000000.0 + +# Display a process of transmitting a packet +def print_transmit(hunk): + if dev != 0 and hunk['dev'].find(dev) < 0: + return + print "%7s %5d %6d.%06dsec %12.3fmsec %12.3fmsec" % \ + (hunk['dev'], hunk['len'], + nsecs_secs(hunk['queue_t']), + nsecs_nsecs(hunk['queue_t'])/1000, + diff_msec(hunk['queue_t'], hunk['xmit_t']), + diff_msec(hunk['xmit_t'], hunk['free_t'])) + +# Format for displaying rx packet processing +PF_IRQ_ENTRY= " irq_entry(+%.3fmsec irq=%d:%s)" +PF_SOFT_ENTRY=" softirq_entry(+%.3fmsec)" +PF_NAPI_POLL= " napi_poll_exit(+%.3fmsec %s)" +PF_JOINT= " |" +PF_WJOINT= " | |" +PF_NET_RECV= " |---netif_receive_skb(+%.3fmsec skb=%x len=%d)" +PF_NET_RX= " |---netif_rx(+%.3fmsec skb=%x)" +PF_CPY_DGRAM= " | skb_copy_datagram_iovec(+%.3fmsec %d:%s)" +PF_KFREE_SKB= " | kfree_skb(+%.3fmsec location=%x)" +PF_CONS_SKB= " | consume_skb(+%.3fmsec)" + +# Display a process of received packets and interrputs associated with +# a NET_RX softirq +def print_receive(hunk): + show_hunk = 0 + irq_list = hunk['irq_list'] + cpu = irq_list[0]['cpu'] + base_t = irq_list[0]['irq_ent_t'] + # check if this hunk should be showed + if dev != 0: + for i in range(len(irq_list)): + if irq_list[i]['name'].find(dev) >= 0: + show_hunk = 1 + break + else: + show_hunk = 1 + if show_hunk == 0: + return + + print "%d.%06dsec cpu=%d" % \ + (nsecs_secs(base_t), nsecs_nsecs(base_t)/1000, cpu) + for i in range(len(irq_list)): + print PF_IRQ_ENTRY % \ + (diff_msec(base_t, irq_list[i]['irq_ent_t']), + irq_list[i]['irq'], irq_list[i]['name']) + print PF_JOINT + irq_event_list = irq_list[i]['event_list'] + for j in range(len(irq_event_list)): + irq_event = irq_event_list[j] + if irq_event['event'] == 'netif_rx': + print PF_NET_RX % \ + (diff_msec(base_t, irq_event['time']), + irq_event['skbaddr']) + print PF_JOINT + print PF_SOFT_ENTRY % \ + diff_msec(base_t, hunk['sirq_ent_t']) + print PF_JOINT + event_list = hunk['event_list'] + for i in range(len(event_list)): + event = event_list[i] + if event['event_name'] == 'napi_poll': + print PF_NAPI_POLL % \ + (diff_msec(base_t, event['event_t']), event['dev']) + if i == len(event_list) - 1: + print "" + else: + print PF_JOINT + else: + print PF_NET_RECV % \ + (diff_msec(base_t, event['event_t']), event['skbaddr'], + event['len']) + if 'comm' in event.keys(): + print PF_WJOINT + print PF_CPY_DGRAM % \ + (diff_msec(base_t, event['comm_t']), + event['pid'], event['comm']) + elif 'handle' in event.keys(): + print PF_WJOINT + if event['handle'] == "kfree_skb": + print PF_KFREE_SKB % \ + (diff_msec(base_t, + event['comm_t']), + event['location']) + elif event['handle'] == "consume_skb": + print PF_CONS_SKB % \ + diff_msec(base_t, + event['comm_t']) + print PF_JOINT + +def trace_begin(): + global show_tx + global show_rx + global dev + global debug + + for i in range(len(sys.argv)): + if i == 0: + continue + arg = sys.argv[i] + if arg == 'tx': + show_tx = 1 + elif arg =='rx': + show_rx = 1 + elif arg.find('dev=',0, 4) >= 0: + dev = arg[4:] + elif arg == 'debug': + debug = 1 + if show_tx == 0 and show_rx == 0: + show_tx = 1 + show_rx = 1 + +def trace_end(): + # order all events in time + all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME], + b[EINFO_IDX_TIME])) + # process all events + for i in range(len(all_event_list)): + event_info = all_event_list[i] + name = event_info[EINFO_IDX_NAME] + if name == 'irq__softirq_exit': + handle_irq_softirq_exit(event_info) + elif name == 'irq__softirq_entry': + handle_irq_softirq_entry(event_info) + elif name == 'irq__softirq_raise': + handle_irq_softirq_raise(event_info) + elif name == 'irq__irq_handler_entry': + handle_irq_handler_entry(event_info) + elif name == 'irq__irq_handler_exit': + handle_irq_handler_exit(event_info) + elif name == 'napi__napi_poll': + handle_napi_poll(event_info) + elif name == 'net__netif_receive_skb': + handle_netif_receive_skb(event_info) + elif name == 'net__netif_rx': + handle_netif_rx(event_info) + elif name == 'skb__skb_copy_datagram_iovec': + handle_skb_copy_datagram_iovec(event_info) + elif name == 'net__net_dev_queue': + handle_net_dev_queue(event_info) + elif name == 'net__net_dev_xmit': + handle_net_dev_xmit(event_info) + elif name == 'skb__kfree_skb': + handle_kfree_skb(event_info) + elif name == 'skb__consume_skb': + handle_consume_skb(event_info) + # display receive hunks + if show_rx: + for i in range(len(receive_hunk_list)): + print_receive(receive_hunk_list[i]) + # display transmit hunks + if show_tx: + print " dev len Qdisc " \ + " netdevice free" + for i in range(len(tx_free_list)): + print_transmit(tx_free_list[i]) + if debug: + print "debug buffer status" + print "----------------------------" + print "xmit Qdisc:remain:%d overflow:%d" % \ + (len(tx_queue_list), of_count_tx_queue_list) + print "xmit netdevice:remain:%d overflow:%d" % \ + (len(tx_xmit_list), of_count_tx_xmit_list) + print "receive:remain:%d overflow:%d" % \ + (len(rx_skb_list), of_count_rx_skb_list) + +# called from perf, when it finds a correspoinding event +def irq__softirq_entry(name, context, cpu, sec, nsec, pid, comm, vec): + if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX": + return + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec) + all_event_list.append(event_info) + +def irq__softirq_exit(name, context, cpu, sec, nsec, pid, comm, vec): + if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX": + return + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec) + all_event_list.append(event_info) + +def irq__softirq_raise(name, context, cpu, sec, nsec, pid, comm, vec): + if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX": + return + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec) + all_event_list.append(event_info) + +def irq__irq_handler_entry(name, context, cpu, sec, nsec, pid, comm, + irq, irq_name): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + irq, irq_name) + all_event_list.append(event_info) + +def irq__irq_handler_exit(name, context, cpu, sec, nsec, pid, comm, irq, ret): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, irq, ret) + all_event_list.append(event_info) + +def napi__napi_poll(name, context, cpu, sec, nsec, pid, comm, napi, dev_name): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + napi, dev_name) + all_event_list.append(event_info) + +def net__netif_receive_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr, + skblen, dev_name): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr, skblen, dev_name) + all_event_list.append(event_info) + +def net__netif_rx(name, context, cpu, sec, nsec, pid, comm, skbaddr, + skblen, dev_name): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr, skblen, dev_name) + all_event_list.append(event_info) + +def net__net_dev_queue(name, context, cpu, sec, nsec, pid, comm, + skbaddr, skblen, dev_name): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr, skblen, dev_name) + all_event_list.append(event_info) + +def net__net_dev_xmit(name, context, cpu, sec, nsec, pid, comm, + skbaddr, skblen, rc, dev_name): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr, skblen, rc ,dev_name) + all_event_list.append(event_info) + +def skb__kfree_skb(name, context, cpu, sec, nsec, pid, comm, + skbaddr, protocol, location): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr, protocol, location) + all_event_list.append(event_info) + +def skb__consume_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr) + all_event_list.append(event_info) + +def skb__skb_copy_datagram_iovec(name, context, cpu, sec, nsec, pid, comm, + skbaddr, skblen): + event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, + skbaddr, skblen) + all_event_list.append(event_info) + +def handle_irq_handler_entry(event_info): + (name, context, cpu, time, pid, comm, irq, irq_name) = event_info + if cpu not in irq_dic.keys(): + irq_dic[cpu] = [] + irq_record = {'irq':irq, 'name':irq_name, 'cpu':cpu, 'irq_ent_t':time} + irq_dic[cpu].append(irq_record) + +def handle_irq_handler_exit(event_info): + (name, context, cpu, time, pid, comm, irq, ret) = event_info + if cpu not in irq_dic.keys(): + return + irq_record = irq_dic[cpu].pop() + if irq != irq_record['irq']: + return + irq_record.update({'irq_ext_t':time}) + # if an irq doesn't include NET_RX softirq, drop. + if 'event_list' in irq_record.keys(): + irq_dic[cpu].append(irq_record) + +def handle_irq_softirq_raise(event_info): + (name, context, cpu, time, pid, comm, vec) = event_info + if cpu not in irq_dic.keys() \ + or len(irq_dic[cpu]) == 0: + return + irq_record = irq_dic[cpu].pop() + if 'event_list' in irq_record.keys(): + irq_event_list = irq_record['event_list'] + else: + irq_event_list = [] + irq_event_list.append({'time':time, 'event':'sirq_raise'}) + irq_record.update({'event_list':irq_event_list}) + irq_dic[cpu].append(irq_record) + +def handle_irq_softirq_entry(event_info): + (name, context, cpu, time, pid, comm, vec) = event_info + net_rx_dic[cpu] = {'sirq_ent_t':time, 'event_list':[]} + +def handle_irq_softirq_exit(event_info): + (name, context, cpu, time, pid, comm, vec) = event_info + irq_list = [] + event_list = 0 + if cpu in irq_dic.keys(): + irq_list = irq_dic[cpu] + del irq_dic[cpu] + if cpu in net_rx_dic.keys(): + sirq_ent_t = net_rx_dic[cpu]['sirq_ent_t'] + event_list = net_rx_dic[cpu]['event_list'] + del net_rx_dic[cpu] + if irq_list == [] or event_list == 0: + return + rec_data = {'sirq_ent_t':sirq_ent_t, 'sirq_ext_t':time, + 'irq_list':irq_list, 'event_list':event_list} + # merge information realted to a NET_RX softirq + receive_hunk_list.append(rec_data) + +def handle_napi_poll(event_info): + (name, context, cpu, time, pid, comm, napi, dev_name) = event_info + if cpu in net_rx_dic.keys(): + event_list = net_rx_dic[cpu]['event_list'] + rec_data = {'event_name':'napi_poll', + 'dev':dev_name, 'event_t':time} + event_list.append(rec_data) + +def handle_netif_rx(event_info): + (name, context, cpu, time, pid, comm, + skbaddr, skblen, dev_name) = event_info + if cpu not in irq_dic.keys() \ + or len(irq_dic[cpu]) == 0: + return + irq_record = irq_dic[cpu].pop() + if 'event_list' in irq_record.keys(): + irq_event_list = irq_record['event_list'] + else: + irq_event_list = [] + irq_event_list.append({'time':time, 'event':'netif_rx', + 'skbaddr':skbaddr, 'skblen':skblen, 'dev_name':dev_name}) + irq_record.update({'event_list':irq_event_list}) + irq_dic[cpu].append(irq_record) + +def handle_netif_receive_skb(event_info): + global of_count_rx_skb_list + + (name, context, cpu, time, pid, comm, + skbaddr, skblen, dev_name) = event_info + if cpu in net_rx_dic.keys(): + rec_data = {'event_name':'netif_receive_skb', + 'event_t':time, 'skbaddr':skbaddr, 'len':skblen} + event_list = net_rx_dic[cpu]['event_list'] + event_list.append(rec_data) + rx_skb_list.insert(0, rec_data) + if len(rx_skb_list) > buffer_budget: + rx_skb_list.pop() + of_count_rx_skb_list += 1 + +def handle_net_dev_queue(event_info): + global of_count_tx_queue_list + + (name, context, cpu, time, pid, comm, + skbaddr, skblen, dev_name) = event_info + skb = {'dev':dev_name, 'skbaddr':skbaddr, 'len':skblen, 'queue_t':time} + tx_queue_list.insert(0, skb) + if len(tx_queue_list) > buffer_budget: + tx_queue_list.pop() + of_count_tx_queue_list += 1 + +def handle_net_dev_xmit(event_info): + global of_count_tx_xmit_list + + (name, context, cpu, time, pid, comm, + skbaddr, skblen, rc, dev_name) = event_info + if rc == 0: # NETDEV_TX_OK + for i in range(len(tx_queue_list)): + skb = tx_queue_list[i] + if skb['skbaddr'] == skbaddr: + skb['xmit_t'] = time + tx_xmit_list.insert(0, skb) + del tx_queue_list[i] + if len(tx_xmit_list) > buffer_budget: + tx_xmit_list.pop() + of_count_tx_xmit_list += 1 + return + +def handle_kfree_skb(event_info): + (name, context, cpu, time, pid, comm, + skbaddr, protocol, location) = event_info + for i in range(len(tx_queue_list)): + skb = tx_queue_list[i] + if skb['skbaddr'] == skbaddr: + del tx_queue_list[i] + return + for i in range(len(tx_xmit_list)): + skb = tx_xmit_list[i] + if skb['skbaddr'] == skbaddr: + skb['free_t'] = time + tx_free_list.append(skb) + del tx_xmit_list[i] + return + for i in range(len(rx_skb_list)): + rec_data = rx_skb_list[i] + if rec_data['skbaddr'] == skbaddr: + rec_data.update({'handle':"kfree_skb", + 'comm':comm, 'pid':pid, 'comm_t':time}) + del rx_skb_list[i] + return + +def handle_consume_skb(event_info): + (name, context, cpu, time, pid, comm, skbaddr) = event_info + for i in range(len(tx_xmit_list)): + skb = tx_xmit_list[i] + if skb['skbaddr'] == skbaddr: + skb['free_t'] = time + tx_free_list.append(skb) + del tx_xmit_list[i] + return + +def handle_skb_copy_datagram_iovec(event_info): + (name, context, cpu, time, pid, comm, skbaddr, skblen) = event_info + for i in range(len(rx_skb_list)): + rec_data = rx_skb_list[i] + if skbaddr == rec_data['skbaddr']: + rec_data.update({'handle':"skb_copy_datagram_iovec", + 'comm':comm, 'pid':pid, 'comm_t':time}) + del rx_skb_list[i] + return ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v4 5/5] perf:add a script shows a process of packet 2010-08-23 9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi @ 2010-08-24 3:53 ` David Miller 2010-09-07 16:57 ` Frederic Weisbecker 1 sibling, 0 replies; 23+ messages in thread From: David Miller @ 2010-08-24 3:53 UTC (permalink / raw) To: sanagi.koki Cc: netdev, linux-kernel, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, fweisbec, mathieu.desnoyers From: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Date: Mon, 23 Aug 2010 18:47:09 +0900 > Add a perf script which shows a process of packets and processed time. > It helps us to investigate networking or network device. ... > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 5/5] perf:add a script shows a process of packet 2010-08-23 9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi 2010-08-24 3:53 ` David Miller @ 2010-09-07 16:57 ` Frederic Weisbecker 1 sibling, 0 replies; 23+ messages in thread From: Frederic Weisbecker @ 2010-09-07 16:57 UTC (permalink / raw) To: Koki Sanagi Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, rostedt, eric.dumazet, mathieu.desnoyers On Mon, Aug 23, 2010 at 06:47:09PM +0900, Koki Sanagi wrote: > Add a perf script which shows a process of packets and processed time. > It helps us to investigate networking or network device. > > If you want to use it, install perf and record perf.data like following. > > #perf trace record netdev-times [script] > > If you set script, perf gathers records until it ends. > If not, you must Ctrl-C to stop recording. > > And if you want a report from record, > > #perf trace report netdev-times [options] > > If you use some options, you can limit an output. > Option is below. > > tx: show only process of tx packets > rx: show only process of rx packets > dev=: show a process specified with this option > debug: work with debug mode. It shows buffer status. > > For example, if you want to show a process of received packets associated > with eth4, > > #perf trace report netdev-times rx dev=eth4 > 106133.171439sec cpu=0 > irq_entry(+0.000msec irq=24:eth4) > | > softirq_entry(+0.006msec) > | > |---netif_receive_skb(+0.010msec skb=f2d15900 len=100) > | | > | skb_copy_datagram_iovec(+0.039msec 10291::10291) > | > napi_poll_exit(+0.022msec eth4) > > This perf script helps us to analyze a process time of transmit/receive > sequence. > > Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> > --- > tools/perf/scripts/python/bin/netdev-times-record | 8 + > tools/perf/scripts/python/bin/netdev-times-report | 5 + > tools/perf/scripts/python/netdev-times.py | 464 +++++++++++++++++++++ > 3 files changed, 477 insertions(+), 0 deletions(-) > > diff --git a/tools/perf/scripts/python/bin/netdev-times-record b/tools/perf/scripts/python/bin/netdev-times-record > new file mode 100644 > index 0000000..2b59511 > --- /dev/null > +++ b/tools/perf/scripts/python/bin/netdev-times-record > @@ -0,0 +1,8 @@ > +#!/bin/bash > +perf record -c 1 -f -R -a -e net:net_dev_xmit -e net:net_dev_queue \ Nano-nits: -c 1 and -R are now default settings for tracepoints and -f is not needed anymore. I've removed them. > +all_event_list = []; # insert all tracepoint event related with this script Ah I didn't know ";" works with python :) > +def trace_end(): > + # order all events in time > + all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME], > + b[EINFO_IDX_TIME])) Events already arrive in time order to the scripts. Thnaks! ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 0/5] netdev: show a process of packets 2010-08-23 9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi ` (4 preceding siblings ...) 2010-08-23 9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi @ 2010-08-30 23:50 ` Steven Rostedt 2010-09-03 2:10 ` Koki Sanagi 5 siblings, 1 reply; 23+ messages in thread From: Steven Rostedt @ 2010-08-30 23:50 UTC (permalink / raw) To: Koki Sanagi Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, fweisbec, mathieu.desnoyers On Mon, 2010-08-23 at 18:41 +0900, Koki Sanagi wrote: > Rebase to the latest net-next. > > CHANGE-LOG since v3: > 1) change arguments of softirq tracepoint into original one. > 2) remove tracepoint of dev_kfree_skb_irq and skb_free_datagram_locked > and add trace_kfree_skb before __kfree_skb instead of them. > 3) add tracepoint to netif_rx and display it by netdev-times script. > > These patch-set adds tracepoints to show us a process of packets. > Using these tracepoints and existing points, we can get the time when > packet passes through some points in transmit or receive sequence. > For example, this is an output of perf script which is attached by patch 5/5. > > 106133.171439sec cpu=0 > irq_entry(+0.000msec irq=24:eth4) > | > softirq_entry(+0.006msec) > | > |---netif_receive_skb(+0.010msec skb=f2d15900 len=100) > | | > | skb_copy_datagram_iovec(+0.039msec 10291::10291) > | > napi_poll_exit(+0.022msec eth4) > > 106134.175634sec cpu=1 > irq_entry(+0.000msec irq=28:eth1) > | > |---netif_rx(+0.009msec skb=f3ef0a00) > | > softirq_entry(+0.018msec) > | > |---netif_receive_skb(+0.021msec skb=f3ef0a00 len=84) > | | > | skb_copy_datagram_iovec(+0.033msec 0:swapper) > | > napi_poll_exit(+0.035msec (no_device)) > > The above is a receive side(eth4 is NAPI. eth1 is non-NAPI). Like this, it can > show receive sequence frominterrupt(irq_entry) to application > (skb_copy_datagram_iovec). > This script shows one NET_RX softirq and events related to it. All relative > time bases on first irq_entry which raise NET_RX softirq. > > dev len Qdisc netdevice free > eth4 74 106125.030004sec 0.006msec 0.009msec > eth4 87 106125.041020sec 0.007msec 0.023msec > eth4 66 106125.042291sec 0.003msec 0.012msec > eth4 66 106125.043274sec 0.006msec 0.004msec > eth4 850 106125.044283sec 0.007msec 0.018msec > > The above is a transmit side. There are three check-time-points. > Point1 is before putting a packet to Qdisc. point2 is after ndo_start_xmit in > dev_hard_start_xmit. It indicates finishing putting a packet to driver. > point3 is in consume_skb and kfree_skb. It indicates freeing a transmitted packet. > Values of this script are, from left, device name, length of a packet, a time of > point1, an interval time between point1 and point2 and an interval time between > point2 and point3. > > These times are useful to analyze a performance or to detect a point where > packet delays. For example, > - NET_RX softirq calling is late. > - Application is late to take a packet. > - It takes much time to put a transmitting packet to driver > (It may be caused by packed queue) > > And also, these tracepoint help us to investigate a network driver's trouble > from memory dump because ftrace records it to memory. And ftrace is so light > even if always trace on. So, in a case investigating a problem which doesn't > reproduce, it is useful. > The entire series: Acked-by: Steven Rostedt <rostedt@goodmis.org> -- Steve ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 0/5] netdev: show a process of packets 2010-08-30 23:50 ` [PATCH v4 0/5] netdev: show a process of packets Steven Rostedt @ 2010-09-03 2:10 ` Koki Sanagi 2010-09-03 2:17 ` David Miller 0 siblings, 1 reply; 23+ messages in thread From: Koki Sanagi @ 2010-09-03 2:10 UTC (permalink / raw) To: Steven Rostedt Cc: netdev, linux-kernel, davem, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, fweisbec, mathieu.desnoyers (2010/08/31 8:50), Steven Rostedt wrote: > On Mon, 2010-08-23 at 18:41 +0900, Koki Sanagi wrote: >> Rebase to the latest net-next. >> >> CHANGE-LOG since v3: >> 1) change arguments of softirq tracepoint into original one. >> 2) remove tracepoint of dev_kfree_skb_irq and skb_free_datagram_locked >> and add trace_kfree_skb before __kfree_skb instead of them. >> 3) add tracepoint to netif_rx and display it by netdev-times script. >> >> These patch-set adds tracepoints to show us a process of packets. >> Using these tracepoints and existing points, we can get the time when >> packet passes through some points in transmit or receive sequence. >> For example, this is an output of perf script which is attached by patch 5/5. >> >> 106133.171439sec cpu=0 >> irq_entry(+0.000msec irq=24:eth4) >> | >> softirq_entry(+0.006msec) >> | >> |---netif_receive_skb(+0.010msec skb=f2d15900 len=100) >> | | >> | skb_copy_datagram_iovec(+0.039msec 10291::10291) >> | >> napi_poll_exit(+0.022msec eth4) >> >> 106134.175634sec cpu=1 >> irq_entry(+0.000msec irq=28:eth1) >> | >> |---netif_rx(+0.009msec skb=f3ef0a00) >> | >> softirq_entry(+0.018msec) >> | >> |---netif_receive_skb(+0.021msec skb=f3ef0a00 len=84) >> | | >> | skb_copy_datagram_iovec(+0.033msec 0:swapper) >> | >> napi_poll_exit(+0.035msec (no_device)) >> >> The above is a receive side(eth4 is NAPI. eth1 is non-NAPI). Like this, it can >> show receive sequence frominterrupt(irq_entry) to application >> (skb_copy_datagram_iovec). >> This script shows one NET_RX softirq and events related to it. All relative >> time bases on first irq_entry which raise NET_RX softirq. >> >> dev len Qdisc netdevice free >> eth4 74 106125.030004sec 0.006msec 0.009msec >> eth4 87 106125.041020sec 0.007msec 0.023msec >> eth4 66 106125.042291sec 0.003msec 0.012msec >> eth4 66 106125.043274sec 0.006msec 0.004msec >> eth4 850 106125.044283sec 0.007msec 0.018msec >> >> The above is a transmit side. There are three check-time-points. >> Point1 is before putting a packet to Qdisc. point2 is after ndo_start_xmit in >> dev_hard_start_xmit. It indicates finishing putting a packet to driver. >> point3 is in consume_skb and kfree_skb. It indicates freeing a transmitted packet. >> Values of this script are, from left, device name, length of a packet, a time of >> point1, an interval time between point1 and point2 and an interval time between >> point2 and point3. >> >> These times are useful to analyze a performance or to detect a point where >> packet delays. For example, >> - NET_RX softirq calling is late. >> - Application is late to take a packet. >> - It takes much time to put a transmitting packet to driver >> (It may be caused by packed queue) >> >> And also, these tracepoint help us to investigate a network driver's trouble >> from memory dump because ftrace records it to memory. And ftrace is so light >> even if always trace on. So, in a case investigating a problem which doesn't >> reproduce, it is useful. >> > > The entire series: > > Acked-by: Steven Rostedt <rostedt@goodmis.org> > > -- Steve > Thanks many acks. and I have one question. These patches have several component. Patch1 is kernel component, but patch2-5 are netdev component. What tree is good to be included ? If it is not net-next, I must rebase to another tree. Thanks, Koki Sanagi. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 0/5] netdev: show a process of packets 2010-09-03 2:10 ` Koki Sanagi @ 2010-09-03 2:17 ` David Miller 2010-09-03 2:55 ` Koki Sanagi 0 siblings, 1 reply; 23+ messages in thread From: David Miller @ 2010-09-03 2:17 UTC (permalink / raw) To: sanagi.koki Cc: rostedt, netdev, linux-kernel, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, fweisbec, mathieu.desnoyers From: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Date: Fri, 03 Sep 2010 11:10:51 +0900 > Thanks many acks. and I have one question. > > These patches have several component. > > Patch1 is kernel component, but patch2-5 are netdev component. > What tree is good to be included ? > If it is not net-next, I must rebase to another tree. I would prefer it goes into the tracing tree or whatever is the most appropriate for patch #1. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 0/5] netdev: show a process of packets 2010-09-03 2:17 ` David Miller @ 2010-09-03 2:55 ` Koki Sanagi 2010-09-03 4:46 ` Frederic Weisbecker 0 siblings, 1 reply; 23+ messages in thread From: Koki Sanagi @ 2010-09-03 2:55 UTC (permalink / raw) To: David Miller Cc: rostedt, netdev, linux-kernel, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, fweisbec, mathieu.desnoyers (2010/09/03 11:17), David Miller wrote: > From: Koki Sanagi <sanagi.koki@jp.fujitsu.com> > Date: Fri, 03 Sep 2010 11:10:51 +0900 > >> Thanks many acks. and I have one question. >> >> These patches have several component. >> >> Patch1 is kernel component, but patch2-5 are netdev component. >> What tree is good to be included ? >> If it is not net-next, I must rebase to another tree. > > I would prefer it goes into the tracing tree or whatever is the most appropriate > for patch #1. > O.K. I'll rebase to linux-2.6-tip. Thanks, Koki Sanagi. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 0/5] netdev: show a process of packets 2010-09-03 2:55 ` Koki Sanagi @ 2010-09-03 4:46 ` Frederic Weisbecker 2010-09-03 5:12 ` Koki Sanagi 0 siblings, 1 reply; 23+ messages in thread From: Frederic Weisbecker @ 2010-09-03 4:46 UTC (permalink / raw) To: Koki Sanagi Cc: David Miller, rostedt, netdev, linux-kernel, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, mathieu.desnoyers On Fri, Sep 03, 2010 at 11:55:04AM +0900, Koki Sanagi wrote: > (2010/09/03 11:17), David Miller wrote: > > From: Koki Sanagi <sanagi.koki@jp.fujitsu.com> > > Date: Fri, 03 Sep 2010 11:10:51 +0900 > > > >> Thanks many acks. and I have one question. > >> > >> These patches have several component. > >> > >> Patch1 is kernel component, but patch2-5 are netdev component. > >> What tree is good to be included ? > >> If it is not net-next, I must rebase to another tree. > > > > I would prefer it goes into the tracing tree or whatever is the most appropriate > > for patch #1. > > > > O.K. I'll rebase to linux-2.6-tip. No need, they apply very well :) I'll push that to -tip soon. Thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 0/5] netdev: show a process of packets 2010-09-03 4:46 ` Frederic Weisbecker @ 2010-09-03 5:12 ` Koki Sanagi 0 siblings, 0 replies; 23+ messages in thread From: Koki Sanagi @ 2010-09-03 5:12 UTC (permalink / raw) To: Frederic Weisbecker Cc: David Miller, rostedt, netdev, linux-kernel, kaneshige.kenji, izumi.taku, kosaki.motohiro, nhorman, laijs, scott.a.mcmillan, eric.dumazet, mathieu.desnoyers (2010/09/03 13:46), Frederic Weisbecker wrote: > On Fri, Sep 03, 2010 at 11:55:04AM +0900, Koki Sanagi wrote: >> (2010/09/03 11:17), David Miller wrote: >>> From: Koki Sanagi <sanagi.koki@jp.fujitsu.com> >>> Date: Fri, 03 Sep 2010 11:10:51 +0900 >>> >>>> Thanks many acks. and I have one question. >>>> >>>> These patches have several component. >>>> >>>> Patch1 is kernel component, but patch2-5 are netdev component. >>>> What tree is good to be included ? >>>> If it is not net-next, I must rebase to another tree. >>> >>> I would prefer it goes into the tracing tree or whatever is the most appropriate >>> for patch #1. >>> >> >> O.K. I'll rebase to linux-2.6-tip. > > > No need, they apply very well :) > > I'll push that to -tip soon. > > Thanks. > O.K. Thanks! Koki Sanagi. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2010-09-07 16:57 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-08-23 9:41 [PATCH v4 0/5] netdev: show a process of packets Koki Sanagi 2010-08-23 9:42 ` [PATCH v4 1/5] irq: add tracepoint to softirq_raise Koki Sanagi 2010-09-03 15:29 ` Frederic Weisbecker 2010-09-03 15:39 ` Steven Rostedt 2010-09-03 15:42 ` Frederic Weisbecker 2010-09-03 15:43 ` Steven Rostedt 2010-09-03 15:50 ` Frederic Weisbecker 2010-09-06 1:46 ` Koki Sanagi 2010-08-23 9:43 ` [PATCH v4 2/5] napi: convert trace_napi_poll to TRACE_EVENT Koki Sanagi 2010-08-24 3:52 ` David Miller 2010-08-23 9:45 ` [PATCH v4 3/5] netdev: add tracepoints to netdev layer Koki Sanagi 2010-08-24 3:53 ` David Miller 2010-08-23 9:46 ` [PATCH v4 4/5] skb: add tracepoints to freeing skb Koki Sanagi 2010-08-24 3:53 ` David Miller 2010-08-23 9:47 ` [PATCH v4 5/5] perf:add a script shows a process of packet Koki Sanagi 2010-08-24 3:53 ` David Miller 2010-09-07 16:57 ` Frederic Weisbecker 2010-08-30 23:50 ` [PATCH v4 0/5] netdev: show a process of packets Steven Rostedt 2010-09-03 2:10 ` Koki Sanagi 2010-09-03 2:17 ` David Miller 2010-09-03 2:55 ` Koki Sanagi 2010-09-03 4:46 ` Frederic Weisbecker 2010-09-03 5:12 ` Koki Sanagi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).