From: Ingo Molnar <mingo@elte.hu>
To: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
David Miller <davem@davemloft.net>,
akpm@linux-foundation.org, a.p.zijlstra@chello.nl,
linux-arch@vger.kernel.org, schwidefsky@de.ibm.com,
arnd@arndb.de, horsth@linux.vnet.ibm.com,
ehrhardt@linux.vnet.ibm.com
Subject: [numbers] Re: [patch] more skb ops inlining
Date: Mon, 17 Aug 2009 23:26:29 +0200
Message-ID: <20090817212629.GA22368@elte.hu>
In-Reply-To: <20090816182733.GB5808@osiris.boeblingen.de.ibm.com>
* Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> An allyesconfig (CONFIG_FTRACE disabled) on x86_64 with gcc 4.3.2
> results in the following vmlinux sizes:
>
> 574152755 2009-08-16 20:05 vmlinux-orig
> 574862000 2009-08-16 20:13 vmlinux-skb-inline
>
> The skb-inline variant was created by using your patch and on top
> of it the patch below. Size increased by 692KB (0.12%).
>
> Dunno.. is that acceptable?
I ported your patch to latest mainline (see it attached below) and did
x86-64 defconfig measurements (with CONFIG_OPTIMIZE_INLINING disabled).
The resulting vmlinux sizes are:
text data bss dec hex filename
8455329 1192304 989908 10637541 a250e5 vmlinux.before
8471036 1192304 989908 10653248 a28e40 vmlinux.after
That's a +0.18% text size increase - quite significant.
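For reference, the underlying arithmetic:

   8471036 - 8455329 = 15707 bytes of extra text
     15707 / 8455329 ~ 0.186%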
I also did performance measurements, trying to figure out whether
this increase in size can be measured via any runtime performance
metrics such as cycles, instructions or cache-misses.
To do that I picked a micro-benchmark: lat_tcp.
( Note that micro-benchmarks are biased in favor of inlining patches
  like yours, because they disproportionately show the speed benefit
  while not adequately showing the cache-footprint disadvantage of
  the increased kernel text size. )
Without your patch I got:
titan:~/l> perf stat --repeat 10 -e task-clock -e cycles -e instructions -e L1-icache-refs -e L1-icache-misses ./lat_tcp localhost
TCP latency using localhost: 17.5198 microseconds
TCP latency using localhost: 17.5060 microseconds
TCP latency using localhost: 17.4684 microseconds
TCP latency using localhost: 17.5416 microseconds
TCP latency using localhost: 17.4881 microseconds
TCP latency using localhost: 17.4832 microseconds
TCP latency using localhost: 17.4830 microseconds
TCP latency using localhost: 17.4723 microseconds
TCP latency using localhost: 17.4733 microseconds
TCP latency using localhost: 17.5255 microseconds
Performance counter stats for './lat_tcp localhost' (10 runs):
1494.625043 task-clock-msecs # 0.564 CPUs ( +- 0.077% )
3665868023 cycles # 2452.701 M/sec ( +- 0.070% )
2163212606 instructions # 0.590 IPC ( +- 0.090% )
2449056216 L1-icache-loads # 1638.576 M/sec ( +- 0.056% )
61081182 L1-icache-load-misses # 40.867 M/sec ( +- 0.192% )
2.649159316 seconds time elapsed ( +- 0.090% )
With your patch I got:
titan:~/l> perf stat --repeat 10 -e task-clock -e cycles -e instructions -e L1-icache-refs -e L1-icache-misses ./lat_tcp localhost
TCP latency using localhost: 17.6913 microseconds
TCP latency using localhost: 17.7218 microseconds
TCP latency using localhost: 17.6882 microseconds
TCP latency using localhost: 17.7196 microseconds
TCP latency using localhost: 17.6411 microseconds
TCP latency using localhost: 17.7136 microseconds
TCP latency using localhost: 17.6488 microseconds
TCP latency using localhost: 17.6663 microseconds
TCP latency using localhost: 17.7445 microseconds
TCP latency using localhost: 17.6851 microseconds
Performance counter stats for './lat_tcp localhost' (10 runs):
1497.479984 task-clock-msecs # 0.564 CPUs ( +- 0.106% )
3690097505 cycles # 2464.205 M/sec ( +- 0.109% )
2162432402 instructions # 0.586 IPC ( +- 0.127% )
2437974192 L1-icache-loads # 1628.051 M/sec ( +- 0.105% )
63064588 L1-icache-load-misses # 42.114 M/sec ( +- 0.278% )
2.656952976 seconds time elapsed ( +- 0.110% )
Summarized (for lat_tcp):
| Before:
|
| 1494.625043 task-clock-msecs # 0.564 CPUs ( +- 0.077% )
| 3665868023 cycles # 2452.701 M/sec ( +- 0.070% )
| 2163212606 instructions # 0.590 IPC ( +- 0.090% )
|
| After:
|
| 1497.479984 task-clock-msecs # 0.564 CPUs ( +- 0.106% )
| 3690097505 cycles # 2464.205 M/sec ( +- 0.109% )
| 2162432402 instructions # 0.586 IPC ( +- 0.127% )
So we've got a 0.2% slowdown, which is provably outside the noise
level of 0.1%. The instruction count remained almost the same.
The reason for the slowdown can be seen in the L1 icache-fetch
stats:
2449056216 L1-icache-loads # 1638.576 M/sec ( +- 0.056% )
61081182 L1-icache-load-misses # 40.867 M/sec ( +- 0.192% )
2437974192 L1-icache-loads # 1628.051 M/sec ( +- 0.105% )
63064588 L1-icache-load-misses # 42.114 M/sec ( +- 0.278% )
The icache-misses went up from 40.867 M/sec to 42.114 M/sec, a
3.05% worsening - well beyond the statistical significance of
0.28%.
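Spelled out, using the numbers above:

   42.114 / 40.867 = 1.0305  =>  ~3.05% more icache misses per second

versus a per-run noise of only +- 0.28% on that counter.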
So unless I botched the measurements, or some accidental code layout
differences punish the patch artificially, this patch causes a small
but measurable performance regression on x86.
The slowdown itself is barely provable with a test repeat count of
10 iterations (not a surprise really - such patches were not provable
at all via /usr/bin/time style measurements), even on an otherwise
totally idle system.
The icache-miss stats do support the notion that the slowdown is
caused by the critical path falling out of the icache.
( By all means please repeat these measurements - they are very
easy to do via 'perf stat --repeat 10' - see the examples
above. )
To build the kernel I used gcc 4.3.2. The CPU I used is friendly to
inlining patches as well: it's an Extreme Edition Core2 Duo (booted
to a single core, to stabilize the numbers):

  Intel(R) Core(TM)2 CPU E6800 @ 2.93GHz

with 4MB of L2 cache. I suspect CPUs with smaller caches would feel
the negative effects of inlining in a more pronounced way.
Thanks,
Ingo
---
include/linux/skbuff.h | 37 +++++++++++++++++++++++++++++++++++--
net/core/skbuff.c | 40 ----------------------------------------
2 files changed, 35 insertions(+), 42 deletions(-)
Index: linux/include/linux/skbuff.h
===================================================================
--- linux.orig/include/linux/skbuff.h
+++ linux/include/linux/skbuff.h
@@ -1120,7 +1120,6 @@ static inline void skb_set_tail_pointer(
/*
* Add data to an sk_buff
*/
-extern unsigned char *skb_put(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
{
unsigned char *tmp = skb_tail_pointer(skb);
@@ -1130,7 +1129,24 @@ static inline unsigned char *__skb_put(s
return tmp;
}
-extern unsigned char *skb_push(struct sk_buff *skb, unsigned int len);
+/**
+ * skb_put - add data to a buffer
+ * @skb: buffer to use
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the buffer. If this would
+ * exceed the total buffer size the kernel will panic. A pointer to the
+ * first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
+{
+ unsigned char *tmp = __skb_put(skb, len);
+
+ if (unlikely(skb->tail > skb->end))
+ skb_over_panic(skb, len, __builtin_return_address(0));
+ return tmp;
+}
+
static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
{
skb->data -= len;
@@ -1138,6 +1154,23 @@ static inline unsigned char *__skb_push(
return skb->data;
}
+/**
+ * skb_push - add data to the start of a buffer
+ * @skb: buffer to use
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the buffer at the buffer
+ * start. If this would exceed the total buffer headroom the kernel will
+ * panic. A pointer to the first byte of the extra data is returned.
+ */
+static inline unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
+{
+ __skb_push(skb, len);
+ if (unlikely(skb->data<skb->head))
+ skb_under_panic(skb, len, __builtin_return_address(0));
+ return skb->data;
+}
+
extern unsigned char *skb_pull(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
{
Index: linux/net/core/skbuff.c
===================================================================
--- linux.orig/net/core/skbuff.c
+++ linux/net/core/skbuff.c
@@ -1000,46 +1000,6 @@ free_skb:
EXPORT_SYMBOL(skb_pad);
/**
- * skb_put - add data to a buffer
- * @skb: buffer to use
- * @len: amount of data to add
- *
- * This function extends the used data area of the buffer. If this would
- * exceed the total buffer size the kernel will panic. A pointer to the
- * first byte of the extra data is returned.
- */
-unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
-{
- unsigned char *tmp = skb_tail_pointer(skb);
- SKB_LINEAR_ASSERT(skb);
- skb->tail += len;
- skb->len += len;
- if (unlikely(skb->tail > skb->end))
- skb_over_panic(skb, len, __builtin_return_address(0));
- return tmp;
-}
-EXPORT_SYMBOL(skb_put);
-
-/**
- * skb_push - add data to the start of a buffer
- * @skb: buffer to use
- * @len: amount of data to add
- *
- * This function extends the used data area of the buffer at the buffer
- * start. If this would exceed the total buffer headroom the kernel will
- * panic. A pointer to the first byte of the extra data is returned.
- */
-unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
-{
- skb->data -= len;
- skb->len += len;
- if (unlikely(skb->data<skb->head))
- skb_under_panic(skb, len, __builtin_return_address(0));
- return skb->data;
-}
-EXPORT_SYMBOL(skb_push);
-
-/**
* skb_pull - remove data from the start of a buffer
* @skb: buffer to use
* @len: amount of data to remove
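( As an aside, here is a tiny stand-alone user-space model of the
  skb_put() fast path that gets inlined above - just a sketch with a
  made-up 'fake_skb' type and abort() in place of the kernel panic,
  not actual kernel code: )

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified stand-in for struct sk_buff: head..end is the whole
 * buffer, data..tail is the used area.
 */
struct fake_skb {
	unsigned char *head;
	unsigned char *data;
	unsigned char *tail;
	unsigned char *end;
	unsigned int len;
};

/* Models __skb_put() plus the overflow check that skb_put() adds. */
static unsigned char *fake_skb_put(struct fake_skb *skb, unsigned int len)
{
	unsigned char *tmp = skb->tail;

	skb->tail += len;
	skb->len += len;
	if (skb->tail > skb->end) {
		/* the kernel would call skb_over_panic() here */
		fprintf(stderr, "over-panic: put %u past end of buffer\n", len);
		abort();
	}
	return tmp;
}

int main(void)
{
	unsigned char buf[64];
	struct fake_skb skb = {
		.head = buf, .data = buf, .tail = buf,
		.end = buf + sizeof(buf), .len = 0,
	};

	memcpy(fake_skb_put(&skb, 5), "hello", 5);
	printf("used %u bytes, %ld bytes of tailroom left\n",
	       skb.len, (long)(skb.end - skb.tail));

	/* fake_skb_put(&skb, 100) would trip the over-panic check */
	return 0;
}

( The real inline version in the patch additionally uses
  skb_tail_pointer() and SKB_LINEAR_ASSERT(), as seen in the diff
  above. )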
Thread overview: 21+ messages
2009-08-12 18:39 [patch 0/3] Allow inlined spinlocks again V3 Heiko Carstens
2009-08-12 18:39 ` [patch 1/3] spinlock: move spinlock function bodies to header file Heiko Carstens
2009-08-12 18:39 ` [patch 2/3] spinlock: allow inlined spinlocks Heiko Carstens
2009-08-12 18:39 ` [patch 3/3] spinlock: allow inlined spinlocks on s390 Heiko Carstens
2009-08-13 18:11 ` [patch 0/3] Allow inlined spinlocks again V3 Linus Torvalds
2009-08-13 18:34 ` Ingo Molnar
2009-08-13 18:43 ` Ingo Molnar
2009-08-14 12:34 ` Heiko Carstens
2009-08-14 16:04 ` Linus Torvalds
2009-08-14 17:13 ` Heiko Carstens
2009-08-14 18:08 ` Linus Torvalds
2009-08-14 20:19 ` David Miller
2009-08-14 20:45 ` Linus Torvalds
2009-08-14 21:10 ` Linus Torvalds
2009-08-14 22:23 ` David Miller
2009-08-16 18:27 ` Heiko Carstens
2009-08-16 18:45 ` Linus Torvalds
2009-08-16 20:36 ` Ingo Molnar
2009-08-17 10:26 ` Heiko Carstens
2009-08-17 21:26 ` Ingo Molnar [this message]
2009-08-18 11:34 ` [numbers] Re: [patch] more skb ops inlining Heiko Carstens