netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
@ 2012-12-07 14:58 Joe Perches
  2012-12-07 15:55 ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Joe Perches @ 2012-12-07 14:58 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Eric Dumazet, netdev

Given that the struct will always have limit at the start of
a cacheline, why not make  struct ___cacheline_aligned_on_smp
and make limit the first member?

It could make other structs that use struct dql a bit more
predictable or efficient to pack.

(netdev_queue is size reduced from 256 to 192 on x86-32)

---

 include/linux/dynamic_queue_limits.h |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 5621547..2d20b22 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -38,6 +38,8 @@
 #ifdef __KERNEL__
 
 struct dql {
+	unsigned int	limit; /* Must be first field - Current limit */
+
 	/* Fields accessed in enqueue path (dql_queued) */
 	unsigned int	num_queued;		/* Total ever queued */
 	unsigned int	adj_limit;		/* limit + num_completed */
@@ -45,7 +47,6 @@ struct dql {
 
 	/* Fields accessed only by completion path (dql_completed) */
 
-	unsigned int	limit ____cacheline_aligned_in_smp; /* Current limit */
 	unsigned int	num_completed;		/* Total ever completed */
 
 	unsigned int	prev_ovlimit;		/* Previous over limit */
@@ -59,7 +60,7 @@ struct dql {
 	unsigned int	max_limit;		/* Max limit */
 	unsigned int	min_limit;		/* Minimum limit */
 	unsigned int	slack_hold_time;	/* Time to measure slack */
-};
+} ____cacheline_aligned_in_smp;
 
 /* Set some static maximums */
 #define DQL_MAX_OBJECT (UINT_MAX / 16)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
  2012-12-07 14:58 [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp Joe Perches
@ 2012-12-07 15:55 ` Eric Dumazet
  2012-12-07 16:05   ` Joe Perches
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2012-12-07 15:55 UTC (permalink / raw)
  To: Joe Perches; +Cc: Tom Herbert, David Miller, netdev

2012/12/7 Joe Perches <joe@perches.com>:
> Given that the struct will always have limit at the start of
> a cacheline, why not make  struct ___cacheline_aligned_on_smp
> and make limit the first member?
>
> It could make other structs that use struct dql a bit more
> predictable or efficient to pack.
>
> (netdev_queue is size reduced from 256 to 192 on x86-32)
>

No, please.

Have you tested this on a range of hardware and check how it can hurt
performance ?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
  2012-12-07 15:55 ` Eric Dumazet
@ 2012-12-07 16:05   ` Joe Perches
  2012-12-07 16:19     ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Joe Perches @ 2012-12-07 16:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, David Miller, netdev

On Fri, 2012-12-07 at 07:55 -0800, Eric Dumazet wrote:
> 2012/12/7 Joe Perches <joe@perches.com>:
> > Given that the struct will always have limit at the start of
> > a cacheline, why not make  struct ___cacheline_aligned_on_smp
> > and make limit the first member?
> >
> > It could make other structs that use struct dql a bit more
> > predictable or efficient to pack.
> >
> > (netdev_queue is size reduced from 256 to 192 on x86-32)
> >
> 
> No, please.
> 
> Have you tested this on a range of hardware and check how it can hurt
> performance ?

No.  Hence the RFC subject title and
unsigned patch.

I was wondering though about cacheline ping-pong 
effects.

I noted Tom's comment back in
http://patchwork.ozlabs.org/patch/108856/
"Also, the cache line containing the struct dql can ping-pong between
CPUs doing initiation and completion.  (I know we're aiming for these to
be the same, but we can't yet assume they will be.)"

So it seemed somewhat sensible to make the
entire struct in a single cacheline.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
  2012-12-07 16:05   ` Joe Perches
@ 2012-12-07 16:19     ` Eric Dumazet
  2012-12-07 16:29       ` Joe Perches
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2012-12-07 16:19 UTC (permalink / raw)
  To: Joe Perches; +Cc: Tom Herbert, David Miller, netdev

On Fri, 2012-12-07 at 08:05 -0800, Joe Perches wrote:

> So it seemed somewhat sensible to make the
> entire struct in a single cacheline.

Any layout change in an object used in network fast path need a complete
performance study.

Even if you provide such a study, we'll need to reproduce your numbers
here.

BQL/DQL is not on our radars, spending two cache lines on a critical
object is fine.

Thanks

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
  2012-12-07 16:19     ` Eric Dumazet
@ 2012-12-07 16:29       ` Joe Perches
  2012-12-07 16:42         ` Ben Hutchings
  2012-12-07 16:54         ` Eric Dumazet
  0 siblings, 2 replies; 7+ messages in thread
From: Joe Perches @ 2012-12-07 16:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, David Miller, netdev

On Fri, 2012-12-07 at 08:19 -0800, Eric Dumazet wrote:
> On Fri, 2012-12-07 at 08:05 -0800, Joe Perches wrote:
> 
> > So it seemed somewhat sensible to make the
> > entire struct in a single cacheline.
> 
> Any layout change in an object used in network fast path need a complete
> performance study.
> 
> Even if you provide such a study, we'll need to reproduce your numbers
> here.
> 
> BQL/DQL is not on our radars, spending two cache lines on a critical
> object is fine.

Well Maybe Tom can provide some information as to why
the limit variable was cacheline_aligned_in_smp and not
the struct.

I didn't find any discussion about it.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
  2012-12-07 16:29       ` Joe Perches
@ 2012-12-07 16:42         ` Ben Hutchings
  2012-12-07 16:54         ` Eric Dumazet
  1 sibling, 0 replies; 7+ messages in thread
From: Ben Hutchings @ 2012-12-07 16:42 UTC (permalink / raw)
  To: Joe Perches; +Cc: Eric Dumazet, Tom Herbert, David Miller, netdev

On Fri, 2012-12-07 at 08:29 -0800, Joe Perches wrote:
> On Fri, 2012-12-07 at 08:19 -0800, Eric Dumazet wrote:
> > On Fri, 2012-12-07 at 08:05 -0800, Joe Perches wrote:
> > 
> > > So it seemed somewhat sensible to make the
> > > entire struct in a single cacheline.
> > 
> > Any layout change in an object used in network fast path need a complete
> > performance study.
> > 
> > Even if you provide such a study, we'll need to reproduce your numbers
> > here.
> > 
> > BQL/DQL is not on our radars, spending two cache lines on a critical
> > object is fine.
> 
> Well Maybe Tom can provide some information as to why
> the limit variable was cacheline_aligned_in_smp and not
> the struct.
> 
> I didn't find any discussion about it.

Structure alignment has to be at least the maximum of each member's
alignment, so the struct *is* effectively cacheline_aligned_in_smp.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp
  2012-12-07 16:29       ` Joe Perches
  2012-12-07 16:42         ` Ben Hutchings
@ 2012-12-07 16:54         ` Eric Dumazet
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2012-12-07 16:54 UTC (permalink / raw)
  To: Joe Perches; +Cc: Tom Herbert, David Miller, netdev

On Fri, 2012-12-07 at 08:29 -0800, Joe Perches wrote:

> Well Maybe Tom can provide some information as to why
> the limit variable was cacheline_aligned_in_smp and not
> the struct.
> 
> I didn't find any discussion about it.
> 

The struct _is_ cache line aligned, since it contains one field needing
cache line alignment. Its pretty clear to us.

There are two cache lines in it.

We don't one a single cache line, but two, for performance reasons.

If you believe its wrong, you have to provide the performance study, not
me, as I don't have time to spend cycles on this. We did this a long
time ago.

If you want to save few bytes on your kernel, redefine
__cacheline_aligned_on_smp to empty, and you'll be ok.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-12-07 16:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-07 14:58 [RFC PATCH] dynamic_queue_limit.h: Make the struct ___cacheline_aligned_on_smp Joe Perches
2012-12-07 15:55 ` Eric Dumazet
2012-12-07 16:05   ` Joe Perches
2012-12-07 16:19     ` Eric Dumazet
2012-12-07 16:29       ` Joe Perches
2012-12-07 16:42         ` Ben Hutchings
2012-12-07 16:54         ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).