* [RFC PATCH 0/2] netdev: show the number of tx-packets in device
@ 2010-05-24 4:49 Koki Sanagi
2010-05-24 4:51 ` [RFC PATCH 1/2] netdev: add tracepoint to dev_hard_start_xmit, consume_skb and dev_kfree_skb_irq Koki Sanagi
2010-05-24 4:52 ` [RFC PATCH 2/2] netdev: perf script to show the number of tx-packets in device Koki Sanagi
0 siblings, 2 replies; 3+ messages in thread
From: Koki Sanagi @ 2010-05-24 4:49 UTC (permalink / raw)
To: netdev; +Cc: davem, nhorman, kaneshige.kenji, izumi.taku
This patch-set adds tracepoints to dev_hard_start_xmit, consume_skb and
dev_kfree_skb_irq and perf script which calculates the time from entry of
ndo_start_xmit to dev_kfree_skb_* and the number of tx-packets in device.
-Perf script description
This script calculates two metric.
metric1: lap time between start_xmit - free_skb
This script calculate the time a packet passes from entry of ndo_start_xmit to
dev_kfree_skb_irq or consume_skb. It indicate a time driver/device owns that
packet. This script outputs an average time of all packets and a longest of
that.
metric2: number of packets in device
>From the above time, we can calculate the number of packets in device at a
certain time. This script outputs an average of the number of packets in device
and a largest of that.
-Merit
These tracepoints and script have two merits.
1. Detecting a packed tx-ring of network device
2. Detecting a defect of transmit functionality of network device
merit1: Detecting a packed tx-ring of network device
Using attached scripts, we can get a maximum number of packets in a device. If
it reaches to the number of packets a device can own, tx-ring of that device is
full and causes loss of network transmit performance. Because the driver of the
device drop packet or stops tx-ring and reject it until it keeps some space.
So, to keep good network transmit performance, it is good to keep some space in
tx-ring. To keep some space in tx-ring, these tracepoints and script are
useful.
To check this merit, I did a test.
Before starting a test, I want you to know that a maximum number of tx-packets
e1000e can own is (tx-ring size - 20) / 2 packets.
Because e1000e keeps 20 descriptors for frags and 1 packet needs 2 descriptors
due to tx-checksum.
So, if tx-ring size is 256,
(256 - 20) / 2 = 118 packets
If tx-ring size is 512,
(512 - 20) / 2 = 246 packets
Environment:
Test NIC: Intel 82571EB (InterruptThrottleRate=1000)
Opponent NIC: Broadcom BCM5703X
InterruptThrottleRate was set to 1000 to make tx-ring packed deliberately.
Test load tool:
netperf -H XXX.XXX.XXX.XXX -t UDP_STREAM -- -m 1
With this environment, I compared following 2 cases.
1.Tx-ring size=256 case
2.Tx-ring size=512 case
Result:
1.The case of Tx-ring=256:
eth0 TX packets=1137811
lap time between start_xmit - free_skb:
avg=0.795687msec
max=0.985911msec
number of packets in device:
avg= 64.66
max= 118
netperf's result:
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
112640 1 10.00 1179077 0 0.94
109568 10.00 544750 0.44
2.The case of Tx-ring=512:
eth0 TX packets=1531629
lap time between start_xmit - free_skb:
avg=0.370052msec
max=0.982069msec
number of packets in device:
avg= 49.70
max= 164
netperf's result:
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
112640 1 10.00 1577840 0 1.26
109568 10.00 871058 0.70
In the case of tx-ring size=256(default), maximum number of packets in device
reaches to 118. So this tx-ring is full and becomes a cause of network
performance loss.
On the other hand, in the case of tx-ring size=512, e1000e can own 246 packets,
but maximum number of packets in device doesn't reach it. so tx-ring has always
some space and there is no network performance loss caused by packed tx-ring.
Actually, about transmit throughput, The case of Tx-ring size=512 is better
than the case of tx-ring size=256.
Like this, the number of packets in device is available to tune tx-ring size or
other parameters to avoid packed tx-ring and is related to network transmit
performance.
merit2: Detecting a defect of transmit functionality of network device
When device can't transmit due to hardware fault or driver's bug(I've
encountered this), we can detect it. Because in such case, dev_hard_start_xmit
is passed, but dev_kfree_skb_* is not passed.
NOTE:
This script has some problem,
-The number of tx-packets of netperf and that of this script are not equal.
-Sometimes The max number of packets in device larger than the device can own.
Thanks,
Koki Sanagi.
^ permalink raw reply [flat|nested] 3+ messages in thread
* [RFC PATCH 1/2] netdev: add tracepoint to dev_hard_start_xmit, consume_skb and dev_kfree_skb_irq
2010-05-24 4:49 [RFC PATCH 0/2] netdev: show the number of tx-packets in device Koki Sanagi
@ 2010-05-24 4:51 ` Koki Sanagi
2010-05-24 4:52 ` [RFC PATCH 2/2] netdev: perf script to show the number of tx-packets in device Koki Sanagi
1 sibling, 0 replies; 3+ messages in thread
From: Koki Sanagi @ 2010-05-24 4:51 UTC (permalink / raw)
To: netdev; +Cc: davem, nhorman, kaneshige.kenji, izumi.taku
This patch adds tracepoint to dev_hard_start_xmit, consume_skb and
dev_kfree_skb_irq to monitor tx-packets in device.
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
include/trace/events/skb.h | 64 ++++++++++++++++++++++++++++++++++++++++++++
net/core/dev.c | 4 +++
net/core/skbuff.c | 1 +
3 files changed, 69 insertions(+), 0 deletions(-)
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 4b2be6d..ed97580 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -9,6 +9,52 @@
#include <linux/tracepoint.h>
/*
+ * netdev_start_xmit - invoked when skb is passed to the driver
+ * @skb: pointer to struct sk_buff
+ * @dev: pointer to struct net_device
+ */
+TRACE_EVENT(netdev_start_xmit,
+
+ TP_PROTO(struct sk_buff *skb,
+ struct net_device *dev),
+
+ TP_ARGS(skb, dev),
+
+ TP_STRUCT__entry(
+ __field( const void *, skbaddr )
+ __field( unsigned int, len )
+ __string( name, dev->name )
+ ),
+
+ TP_fast_assign(
+ __entry->skbaddr = skb;
+ __entry->len = skb->len;
+ __assign_str(name, dev->name);
+ ),
+
+ TP_printk("dev=%s skbaddr=%p len=%u",
+ __get_str(name), __entry->skbaddr, __entry->len)
+);
+
+TRACE_EVENT(consume_skb,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb),
+
+ TP_STRUCT__entry(
+ __field( void *, skbaddr )
+ ),
+
+ TP_fast_assign(
+ __entry->skbaddr = skb;
+ ),
+
+ TP_printk("skbaddr=%p",
+ __entry->skbaddr)
+);
+
+/*
* Tracepoint for free an sk_buff:
*/
TRACE_EVENT(kfree_skb,
@@ -35,6 +81,24 @@ TRACE_EVENT(kfree_skb,
__entry->skbaddr, __entry->protocol, __entry->location)
);
+TRACE_EVENT(dev_kfree_skb_irq,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb),
+
+ TP_STRUCT__entry(
+ __field( void *, skbaddr )
+ ),
+
+ TP_fast_assign(
+ __entry->skbaddr = skb;
+ ),
+
+ TP_printk("skbaddr=%p",
+ __entry->skbaddr)
+);
+
TRACE_EVENT(skb_copy_datagram_iovec,
TP_PROTO(const struct sk_buff *skb, int len),
diff --git a/net/core/dev.c b/net/core/dev.c
index 32611c8..647d812 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -130,6 +130,7 @@
#include <linux/jhash.h>
#include <linux/random.h>
#include <trace/events/napi.h>
+#include <trace/events/skb.h>
#include <linux/pci.h>
#include "net-sysfs.h"
@@ -1577,6 +1578,7 @@ void dev_kfree_skb_irq(struct sk_buff *skb)
struct softnet_data *sd;
unsigned long flags;
+ trace_dev_kfree_skb_irq(skb);
local_irq_save(flags);
sd = &__get_cpu_var(softnet_data);
skb->next = sd->completion_queue;
@@ -1919,6 +1921,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
goto gso;
}
+ trace_netdev_start_xmit(skb, dev);
rc = ops->ndo_start_xmit(skb, dev);
if (rc == NETDEV_TX_OK)
txq_trans_update(txq);
@@ -1939,6 +1942,7 @@ gso:
if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
skb_dst_drop(nskb);
+ trace_netdev_start_xmit(nskb, dev);
rc = ops->ndo_start_xmit(nskb, dev);
if (unlikely(rc != NETDEV_TX_OK)) {
if (rc & ~NETDEV_TX_MASK)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a9b0e1f..b9c963c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb)
smp_rmb();
else if (likely(!atomic_dec_and_test(&skb->users)))
return;
+ trace_consume_skb(skb);
__kfree_skb(skb);
}
EXPORT_SYMBOL(consume_skb);
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [RFC PATCH 2/2] netdev: perf script to show the number of tx-packets in device
2010-05-24 4:49 [RFC PATCH 0/2] netdev: show the number of tx-packets in device Koki Sanagi
2010-05-24 4:51 ` [RFC PATCH 1/2] netdev: add tracepoint to dev_hard_start_xmit, consume_skb and dev_kfree_skb_irq Koki Sanagi
@ 2010-05-24 4:52 ` Koki Sanagi
1 sibling, 0 replies; 3+ messages in thread
From: Koki Sanagi @ 2010-05-24 4:52 UTC (permalink / raw)
To: netdev; +Cc: davem, nhorman, kaneshige.kenji, izumi.taku
This patch adds perf script to calculate the time from entry of
ndo_start_xmit to dev_kfree_skb_* and the number of tx-packets in device.
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
---
.../scripts/python/bin/tx-packet-in-device-record | 2 +
.../scripts/python/bin/tx-packet-in-device-report | 4 +
tools/perf/scripts/python/tx-packet-in-device.py | 109 ++++++++++++++++++++
3 files changed, 115 insertions(+), 0 deletions(-)
diff --git a/tools/perf/scripts/python/bin/tx-packet-in-device-record b/tools/perf/scripts/python/bin/tx-packet-in-device-record
new file mode 100644
index 0000000..18f2356
--- /dev/null
+++ b/tools/perf/scripts/python/bin/tx-packet-in-device-record
@@ -0,0 +1,2 @@
+#!/bin/bash
+perf record -c 1 -f -a -M -R -e skb:netdev_start_xmit -e skb:consume_skb -e skb:dev_kfree_skb_irq
diff --git a/tools/perf/scripts/python/bin/tx-packet-in-device-report b/tools/perf/scripts/python/bin/tx-packet-in-device-report
new file mode 100644
index 0000000..8ef4cc2
--- /dev/null
+++ b/tools/perf/scripts/python/bin/tx-packet-in-device-report
@@ -0,0 +1,4 @@
+#!/bin/bash
+# description: netif_receive_skb counts per poll
+# args: [comm]
+perf trace -s ~/libexec/perf-core/scripts/python/tx-packet-in-device.py $1
diff --git a/tools/perf/scripts/python/tx-packet-in-device.py b/tools/perf/scripts/python/tx-packet-in-device.py
new file mode 100644
index 0000000..fb1933f
--- /dev/null
+++ b/tools/perf/scripts/python/tx-packet-in-device.py
@@ -0,0 +1,109 @@
+# perf trace event handlers, generated by perf trace -g python
+# Licensed under the terms of the GNU GPL License version 2
+
+# The common_* event handler fields are the most useful fields common to
+# all events. They don't necessarily correspond to the 'common_*' fields
+# in the format files. Those fields not available as handler params can
+# be retrieved using Python functions of the form common_*(context).
+# See the perf-trace-python Documentation for the list of available functions.
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+ '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+from Util import *
+
+# skb_dic = {skbaddr: {name:*, start_time:*}}
+#
+# skbaddr: address of skb through dev_hard_start_xmit
+# name: name of device
+# start_time: the time dev_start_hard_xmit pass
+skb_dic = {};
+
+# dev_stat_dic = {name: {pkt_in_tx:*, max_pkt_in_tx:*, total_pkt:*,
+# prev_time:*, total_time:*, max_lap_time:*,
+# total_lap_time:*}}
+#
+# name: name of device
+# pkt_in_tx: tx-packets a device has currently
+# max_pkt_in_tx: maximum of the above
+# total_pkt: total tx-packets through a device
+# prev_time: the time starting xmit or freeing skb
+# happened previously
+# total_time: the time from first starting xmit to now
+# max_lap_time: maximum time from starting xmit to freein skb
+# total_lap_time: sum of time tx-packet is in device
+dev_stat_dic = {};
+
+def trace_end():
+ for name in sorted(dev_stat_dic.keys()):
+ cstat = dev_stat_dic[name]
+ print "%s\tTX packets=%d" %\
+ (name, cstat['total_pkt'])
+ print "\tlap time between start_xmit - free_skb:"
+ avg_nsec = avg(1.0 * cstat['total_lap_time'],
+ cstat['total_pkt'])
+ print "\t avg=%fmsec" % (avg_nsec / 1000000.0)
+ print "\t max=%fmsec" % (cstat['max_lap_time'] / 1000000.0)
+ print "\tnumber of packets in device:"
+ print "\t avg=%7.2f" % avg(cstat['total_lap_time'] * 1.0,
+ cstat['total_time'])
+ print "\t max=%4d" % cstat['max_pkt_in_tx']
+ print ""
+
+def skb__dev_kfree_skb_irq(event_name, context, common_cpu,
+ common_secs, common_nsecs, common_pid, common_comm,
+ skbaddr):
+ free_skb(event_name, context, common_cpu,
+ common_secs, common_nsecs, common_pid, common_comm,
+ skbaddr)
+
+def skb__consume_skb(event_name, context, common_cpu,
+ common_secs, common_nsecs, common_pid, common_comm,
+ skbaddr):
+ free_skb(event_name, context, common_cpu,
+ common_secs, common_nsecs, common_pid, common_comm,
+ skbaddr)
+
+def free_skb(event_name, context, common_cpu,
+ common_secs, common_nsecs, common_pid, common_comm,
+ skbaddr):
+ if skbaddr in skb_dic.keys():
+ ctime = nsecs(common_secs, common_nsecs)
+ lap_time = ctime - skb_dic[skbaddr]['start_time']
+ cstat = dev_stat_dic[skb_dic[skbaddr]['name']]
+ cstat['total_lap_time'] += lap_time;
+ cstat['total_pkt'] += 1;
+ cstat = dev_stat_dic[skb_dic[skbaddr]['name']]
+ cstat['total_time'] += ctime - cstat['prev_time']
+ cstat['prev_time'] = ctime
+ cstat['pkt_in_tx'] -= 1;
+ if lap_time > cstat['max_lap_time']:
+ cstat['max_lap_time'] = lap_time
+ del skb_dic[skbaddr]
+
+def skb__netdev_start_xmit(event_name, context, common_cpu,
+ common_secs, common_nsecs, common_pid, common_comm,
+ skbaddr, len, name):
+ retry = 0
+ ctime = nsecs(common_secs, common_nsecs)
+ if skbaddr in skb_dic.keys():
+ retry = 1;
+ skb_dic[skbaddr] = {'name':name, 'start_time':ctime}
+ if name not in dev_stat_dic.keys():
+ dev_stat_dic[name] = {'pkt_in_tx':0, 'max_pkt_in_tx':0,\
+ 'total_pkt':0,\
+ 'prev_time':ctime, 'total_time':0,\
+ 'max_lap_time':0, 'total_lap_time':0}
+ cstat = dev_stat_dic[name]
+ cstat['total_time'] += ctime - cstat['prev_time']
+ cstat['prev_time'] = ctime
+ if retry == 0:
+ cstat['pkt_in_tx'] += 1
+ if cstat['pkt_in_tx'] > cstat['max_pkt_in_tx']:
+ cstat['max_pkt_in_tx'] = cstat['pkt_in_tx']
+
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2010-05-24 4:52 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-24 4:49 [RFC PATCH 0/2] netdev: show the number of tx-packets in device Koki Sanagi
2010-05-24 4:51 ` [RFC PATCH 1/2] netdev: add tracepoint to dev_hard_start_xmit, consume_skb and dev_kfree_skb_irq Koki Sanagi
2010-05-24 4:52 ` [RFC PATCH 2/2] netdev: perf script to show the number of tx-packets in device Koki Sanagi
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.