All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Yan, Zheng" <zheng.z.yan@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, eranian@google.com,
	ak@linux.intel.com
Subject: Re: [PATCH] perf: Update event buffer tail when overwriting old events
Date: Tue, 09 Jul 2013 21:52:17 +0800	[thread overview]
Message-ID: <51DC1591.7070907@intel.com> (raw)
In-Reply-To: <20130709080513.GG25631@dyad.programming.kicks-ass.net>

On 07/09/2013 04:05 PM, Peter Zijlstra wrote:
> On Tue, Jul 09, 2013 at 03:05:41PM +0800, Yan, Zheng wrote:
> 
>> Thank you for your help. I ran the same test, the results for regular case
>> are much better. But it still has about 1% overhead, probably because we
>> enlarge the ring_buffer structure, make it less cache friendly.
>>
>>       origin    with the patch
>> AVG    1000      1013
>> STDEV  13.4      15.0
> 
> And this is the !overwrite case, right? I don't suppose you cured the logic
> Namhyung Kim pointed out? That should affect the overwrite case I suppose since
> it won't switch to perf_event_output_overwrite().

yes, it's the overwrite case.

> 
> tip/master:
> 
> struct ring_buffer {
>         atomic_t                   refcount;             /*     0     4 */
> 
>         /* XXX 4 bytes hole, try to pack */
> 
>         struct callback_head       callback_head;        /*     8    16 */
>         int                        nr_pages;             /*    24     4 */
>         int                        overwrite;            /*    28     4 */
>         atomic_t                   poll;                 /*    32     4 */
> 
>         /* XXX 4 bytes hole, try to pack */
> 
>         local_t                    head;                 /*    40     8 */
>         local_t                    nest;                 /*    48     8 */
>         local_t                    events;               /*    56     8 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
>         local_t                    wakeup;               /*    64     8 */
>         local_t                    lost;                 /*    72     8 */
>         long int                   watermark;            /*    80     8 */
>         spinlock_t                 event_lock;           /*    88    56 */
>         /* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
>         struct list_head           event_list;           /*   144    16 */
>         atomic_t                   mmap_count;           /*   160     4 */
> 
>         /* XXX 4 bytes hole, try to pack */
> 
>         long unsigned int          mmap_locked;          /*   168     8 */
>         struct user_struct *       mmap_user;            /*   176     8 */
>         struct perf_event_mmap_page * user_page;         /*   184     8 */
>         /* --- cacheline 3 boundary (192 bytes) --- */
>         void *                     data_pages[0];        /*   192     0 */
> 
>         /* size: 192, cachelines: 3, members: 18 */
>         /* sum members: 180, holes: 3, sum holes: 12 */
> };
> 
> tip/master + patch:
> 
> struct ring_buffer {
>         atomic_t                   refcount;             /*     0     4 */
> 
>         /* XXX 4 bytes hole, try to pack */
> 
>         struct callback_head       callback_head;        /*     8    16 */
>         int                        nr_pages;             /*    24     4 */
>         int                        overwrite;            /*    28     4 */
>         atomic_t                   poll;                 /*    32     4 */
> 
>         /* XXX 4 bytes hole, try to pack */
> 
>         local_t                    tail;                 /*    40     8 */
>         local_t                    next_tail;            /*    48     8 */
>         local_t                    head;                 /*    56     8 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
>         local_t                    nest;                 /*    64     8 */
>         local_t                    events;               /*    72     8 */
>         local_t                    wakeup;               /*    80     8 */
>         local_t                    lost;                 /*    88     8 */
>         long int                   watermark;            /*    96     8 */
>         spinlock_t                 event_lock;           /*   104    56 */
>         /* --- cacheline 2 boundary (128 bytes) was 32 bytes ago --- */
>         struct list_head           event_list;           /*   160    16 */
>         atomic_t                   mmap_count;           /*   176     4 */
> 
>         /* XXX 4 bytes hole, try to pack */
> 
>         long unsigned int          mmap_locked;          /*   184     8 */
>         /* --- cacheline 3 boundary (192 bytes) --- */
>         struct user_struct *       mmap_user;            /*   192     8 */
>         struct perf_event_mmap_page * user_page;         /*   200     8 */
>         void *                     data_pages[0];        /*   208     0 */
> 
>         /* size: 208, cachelines: 4, members: 20 */
>         /* sum members: 196, holes: 3, sum holes: 12 */
>         /* last cacheline: 16 bytes */
> };
> 
> tip/master + patch^2:
> 
> struct ring_buffer {
>         atomic_t                   refcount;             /*     0     4 */
>         atomic_t                   mmap_count;           /*     4     4 */
>         union {
>                 int                overwrite;            /*           4 */
>                 struct callback_head callback_head;      /*          16 */
>         };                                               /*     8    16 */
>         int                        nr_pages;             /*    24     4 */
>         atomic_t                   poll;                 /*    28     4 */
>         local_t                    tail;                 /*    32     8 */
>         local_t                    next_tail;            /*    40     8 */
>         local_t                    head;                 /*    48     8 */
>         local_t                    nest;                 /*    56     8 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
>         local_t                    events;               /*    64     8 */
>         local_t                    wakeup;               /*    72     8 */
>         local_t                    lost;                 /*    80     8 */
>         long int                   watermark;            /*    88     8 */
>         spinlock_t                 event_lock;           /*    96    56 */
>         /* --- cacheline 2 boundary (128 bytes) was 24 bytes ago --- */
>         struct list_head           event_list;           /*   152    16 */
>         long unsigned int          mmap_locked;          /*   168     8 */
>         struct user_struct *       mmap_user;            /*   176     8 */
>         struct perf_event_mmap_page * user_page;         /*   184     8 */
>         /* --- cacheline 3 boundary (192 bytes) --- */
>         void *                     data_pages[0];        /*   192     0 */
> 
>         /* size: 192, cachelines: 3, members: 19 */
> };
> 
> 
> ---
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4641,7 +4641,7 @@ static void perf_event_output_overwrite(
>  static void
>  perf_event_set_overflow(struct perf_event *event, struct ring_buffer *rb)
>  {
> -	if (event->overflow_handler != perf_event_output ||
> +	if (event->overflow_handler != perf_event_output &&
>  	    event->overflow_handler != perf_event_output_overwrite)
>  		return;
>  
> --- a/kernel/events/internal.h
> +++ b/kernel/events/internal.h
> @@ -10,13 +10,16 @@
>  
>  struct ring_buffer {
>  	atomic_t			refcount;
> -	struct rcu_head			rcu_head;
> +	atomic_t			mmap_count;
> +	union {
> +		int			overwrite;	/* can overwrite itself */
> +		struct rcu_head		rcu_head;
> +	};
>  #ifdef CONFIG_PERF_USE_VMALLOC
>  	struct work_struct		work;
>  	int				page_order;	/* allocation order  */
>  #endif
>  	int				nr_pages;	/* nr of data pages  */
> -	int				overwrite;	/* can overwrite itself */
>  
>  	atomic_t			poll;		/* POLL_ for wakeups */
>  
> @@ -33,7 +36,6 @@ struct ring_buffer {
>  	spinlock_t			event_lock;
>  	struct list_head		event_list;
>  
> -	atomic_t			mmap_count;
>  	unsigned long			mmap_locked;
>  	struct user_struct		*mmap_user;
>  
> 

      origin    patch    patch^2
AVG    1000      1013    1028
STDEV  13.4      15.0    14.6

patch^2 doesn't help. I will try moving the new fields down and re-test tomorrow

Regards
Yan, Zheng




  reply	other threads:[~2013-07-09 13:52 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-06  5:58 [PATCH] perf: Update event buffer tail when overwriting old events Yan, Zheng
2013-06-18  9:13 ` Peter Zijlstra
2013-07-08 12:15 ` Peter Zijlstra
2013-07-09  6:18   ` Namhyung Kim
2013-07-09  7:40     ` Peter Zijlstra
2013-07-09  7:05   ` Yan, Zheng
2013-07-09  8:05     ` Peter Zijlstra
2013-07-09 13:52       ` Yan, Zheng [this message]
2013-07-09 14:31         ` Peter Zijlstra
2013-07-10 11:37   ` Yan, Zheng
2013-07-10 11:44     ` Peter Zijlstra
2013-07-11  0:46       ` Yan, Zheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51DC1591.7070907@intel.com \
    --to=zheng.z.yan@intel.com \
    --cc=ak@linux.intel.com \
    --cc=eranian@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.