* [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
@ 2012-12-07 14:58 Joe Perches
2012-12-07 15:55 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Joe Perches @ 2012-12-07 14:58 UTC (permalink / raw)
To: Tom Herbert; +Cc: David Miller, Eric Dumazet, netdev
Given that the struct will always have limit at the start of
a cacheline, why not make struct ___cacheline_aligned_on_smp
and make limit the first member?
It could make other structs that use struct dql a bit more
predictable or efficient to pack.
(netdev_queue is size reduced from 256 to 192 on x86-32)
---
include/linux/dynamic_queue_limits.h | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 5621547..2d20b22 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -38,6 +38,8 @@
#ifdef __KERNEL__
struct dql {
+ unsigned int limit; /* Must be first field - Current limit */
+
/* Fields accessed in enqueue path (dql_queued) */
unsigned int num_queued; /* Total ever queued */
unsigned int adj_limit; /* limit + num_completed */
@@ -45,7 +47,6 @@ struct dql {
/* Fields accessed only by completion path (dql_completed) */
- unsigned int limit ____cacheline_aligned_in_smp; /* Current limit */
unsigned int num_completed; /* Total ever completed */
unsigned int prev_ovlimit; /* Previous over limit */
@@ -59,7 +60,7 @@ struct dql {
unsigned int max_limit; /* Max limit */
unsigned int min_limit; /* Minimum limit */
unsigned int slack_hold_time; /* Time to measure slack */
-};
+} ____cacheline_aligned_in_smp;
/* Set some static maximums */
#define DQL_MAX_OBJECT (UINT_MAX / 16)
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
2012-12-07 14:58 [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp Joe Perches
@ 2012-12-07 15:55 ` Eric Dumazet
2012-12-07 16:05 ` Joe Perches
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2012-12-07 15:55 UTC (permalink / raw)
To: Joe Perches; +Cc: Tom Herbert, David Miller, netdev
2012/12/7 Joe Perches <joe@perches.com>:
> Given that the struct will always have limit at the start of
> a cacheline, why not make struct ___cacheline_aligned_on_smp
> and make limit the first member?
>
> It could make other structs that use struct dql a bit more
> predictable or efficient to pack.
>
> (netdev_queue is size reduced from 256 to 192 on x86-32)
>
No, please.
Have you tested this on a range of hardware and check how it can hurt
performance ?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
2012-12-07 15:55 ` Eric Dumazet
@ 2012-12-07 16:05 ` Joe Perches
2012-12-07 16:19 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Joe Perches @ 2012-12-07 16:05 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Tom Herbert, David Miller, netdev
On Fri, 2012-12-07 at 07:55 -0800, Eric Dumazet wrote:
> 2012/12/7 Joe Perches <joe@perches.com>:
> > Given that the struct will always have limit at the start of
> > a cacheline, why not make struct ___cacheline_aligned_on_smp
> > and make limit the first member?
> >
> > It could make other structs that use struct dql a bit more
> > predictable or efficient to pack.
> >
> > (netdev_queue is size reduced from 256 to 192 on x86-32)
> >
>
> No, please.
>
> Have you tested this on a range of hardware and check how it can hurt
> performance ?
No. Hence the RFC subject title and
unsigned patch.
I was wondering though about cacheline ping-pong
effects.
I noted Tom's comment back in
http://patchwork.ozlabs.org/patch/108856/
"Also, the cache line containing the struct dql can ping-pong between
CPUs doing initiation and completion. (I know we're aiming for these to
be the same, but we can't yet assume they will be.)"
So it seemed somewhat sensible to make the
entire struct in a single cacheline.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
2012-12-07 16:05 ` Joe Perches
@ 2012-12-07 16:19 ` Eric Dumazet
2012-12-07 16:29 ` Joe Perches
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2012-12-07 16:19 UTC (permalink / raw)
To: Joe Perches; +Cc: Tom Herbert, David Miller, netdev
On Fri, 2012-12-07 at 08:05 -0800, Joe Perches wrote:
> So it seemed somewhat sensible to make the
> entire struct in a single cacheline.
Any layout change in an object used in network fast path need a complete
performance study.
Even if you provide such a study, we'll need to reproduce your numbers
here.
BQL/DQL is not on our radars, spending two cache lines on a critical
object is fine.
Thanks
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
2012-12-07 16:19 ` Eric Dumazet
@ 2012-12-07 16:29 ` Joe Perches
2012-12-07 16:42 ` Ben Hutchings
2012-12-07 16:54 ` Eric Dumazet
0 siblings, 2 replies; 7+ messages in thread
From: Joe Perches @ 2012-12-07 16:29 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Tom Herbert, David Miller, netdev
On Fri, 2012-12-07 at 08:19 -0800, Eric Dumazet wrote:
> On Fri, 2012-12-07 at 08:05 -0800, Joe Perches wrote:
>
> > So it seemed somewhat sensible to make the
> > entire struct in a single cacheline.
>
> Any layout change in an object used in network fast path need a complete
> performance study.
>
> Even if you provide such a study, we'll need to reproduce your numbers
> here.
>
> BQL/DQL is not on our radars, spending two cache lines on a critical
> object is fine.
Well Maybe Tom can provide some information as to why
the limit variable was cacheline_aligned_in_smp and not
the struct.
I didn't find any discussion about it.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
2012-12-07 16:29 ` Joe Perches
@ 2012-12-07 16:42 ` Ben Hutchings
2012-12-07 16:54 ` Eric Dumazet
1 sibling, 0 replies; 7+ messages in thread
From: Ben Hutchings @ 2012-12-07 16:42 UTC (permalink / raw)
To: Joe Perches; +Cc: Eric Dumazet, Tom Herbert, David Miller, netdev
On Fri, 2012-12-07 at 08:29 -0800, Joe Perches wrote:
> On Fri, 2012-12-07 at 08:19 -0800, Eric Dumazet wrote:
> > On Fri, 2012-12-07 at 08:05 -0800, Joe Perches wrote:
> >
> > > So it seemed somewhat sensible to make the
> > > entire struct in a single cacheline.
> >
> > Any layout change in an object used in network fast path need a complete
> > performance study.
> >
> > Even if you provide such a study, we'll need to reproduce your numbers
> > here.
> >
> > BQL/DQL is not on our radars, spending two cache lines on a critical
> > object is fine.
>
> Well Maybe Tom can provide some information as to why
> the limit variable was cacheline_aligned_in_smp and not
> the struct.
>
> I didn't find any discussion about it.
Structure alignment has to be at least the maximum of each member's
alignment, so the struct *is* effectively cacheline_aligned_in_smp.
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
2012-12-07 16:29 ` Joe Perches
2012-12-07 16:42 ` Ben Hutchings
@ 2012-12-07 16:54 ` Eric Dumazet
1 sibling, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2012-12-07 16:54 UTC (permalink / raw)
To: Joe Perches; +Cc: Tom Herbert, David Miller, netdev
On Fri, 2012-12-07 at 08:29 -0800, Joe Perches wrote:
> Well Maybe Tom can provide some information as to why
> the limit variable was cacheline_aligned_in_smp and not
> the struct.
>
> I didn't find any discussion about it.
>
The struct _is_ cache line aligned, since it contains one field needing
cache line alignment. Its pretty clear to us.
There are two cache lines in it.
We don't one a single cache line, but two, for performance reasons.
If you believe its wrong, you have to provide the performance study, not
me, as I don't have time to spend cycles on this. We did this a long
time ago.
If you want to save few bytes on your kernel, redefine
__cacheline_aligned_on_smp to empty, and you'll be ok.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-12-07 16:54 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-07 14:58 [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp Joe Perches
2012-12-07 15:55 ` Eric Dumazet
2012-12-07 16:05 ` Joe Perches
2012-12-07 16:19 ` Eric Dumazet
2012-12-07 16:29 ` Joe Perches
2012-12-07 16:42 ` Ben Hutchings
2012-12-07 16:54 ` Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).