From: David Ahern <dsahern@gmail.com>
To: Namhyung Kim <namhyung@kernel.org>
Cc: acme@ghostprotocols.net, linux-kernel@vger.kernel.org,
	Ingo Molnar <mingo@kernel.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Jiri Olsa <jolsa@redhat.com>, Mike Galbraith <efault@gmx.de>,
	Stephane Eranian <eranian@google.com>
Subject: Re: [PATCH] perf record: mmap output file - v2
Date: Tue, 15 Oct 2013 07:35:53 -0600
Message-ID: <525D44B9.7060901@gmail.com>
In-Reply-To: <87txgj9eir.fsf@sejong.aot.lge.com>

On 10/15/13 1:31 AM, Namhyung Kim wrote:
> Hi David,
>
> On Mon, 14 Oct 2013 20:55:31 -0600, David Ahern wrote:
>> When recording raw_syscalls for the entire system, e.g.,
>>      perf record -e raw_syscalls:*,sched:sched_switch -a -- sleep 1
>>
>> you end up with a negative feedback loop as perf itself calls
>> write() fairly often. This patch handles the problem by mmap'ing the
>> file in chunks of 64M at a time and copies events from the event buffers
>> to the file avoiding write system calls.
>>
>> Before (with write syscall):
>>
>> perf record -o /tmp/perf.data -e raw_syscalls:*,sched:sched_switch -a -- sleep 1
>> [ perf record: Woken up 0 times to write data ]
>> [ perf record: Captured and wrote 81.843 MB /tmp/perf.data (~3575786 samples) ]
>>
>> After (using mmap):
>>
>> perf record -o /tmp/perf.data -e raw_syscalls:*,sched:sched_switch -a -- sleep 1
>> [ perf record: Woken up 31 times to write data ]
>> [ perf record: Captured and wrote 8.203 MB /tmp/perf.data (~358388 samples) ]
>
> Why do they have that different size?

perf calls write() for each mmap'ed event buffer, each time through the
loop. Each write() generates 2 more events (syscall entry + exit), which
perf then has to record as well, so the output keeps growing. That's the
negative feedback loop.


> [SNIP]
>> +
>> +		rec->mmap_addr = mmap(NULL, rec->mmap_size,
>> +				     PROT_WRITE | PROT_READ,
>> +				     MAP_SHARED,
>> +				     rec->output,
>> +				     offset);
>> +
>> +		if (rec->mmap_addr == MAP_FAILED) {
>> +			pr_err("mmap failed: %d: %s\n", errno, strerror(errno));
>> +			return -1;
>> +		}
>> +
>> +		/* expand file to include this mmap segment */
>> +		if (ftruncate(rec->output, offset + rec->mmap_size) != 0) {
>> +			pr_err("ftruncate failed\n");
>> +			return -1;
>> +		}
>
> I think this mmap + ftruncate should be reordered.  Although it looks
> work without problems the mmap man pages says it's unspecified behavior.
>
>         A file is mapped in multiples of the page size.  For a file that is not
>         a multiple of the page  size,  the  remaining  memory  is  zeroed  when
>         mapped, and writes to that region are not written out to the file.  The
>         effect of changing the size of the underlying file of a mapping on  the
>         pages  that  correspond  to  added  or  removed  regions of the file is
>         unspecified.

The mmap only expands the address range; the ftruncate expands the file
behind the mmap. Both are needed and both must succeed for this to work,
and I don't see how the order matters, i.e.,

Reordering adds an extra call on the failure path:
   ftruncate
   mmap
   - on mmap failure, call ftruncate again to reset the file size

The order I have does not have that problem:
   mmap
   ftruncate

Here, on failure, we just return -1 and end the session.
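
For illustration, here is a minimal standalone sketch of that ordering,
not the patch itself. The helper names map_chunk() and chunk_roundtrip()
are hypothetical, the chunk is a single page rather than 64M, and the
munmap() on the ftruncate failure path is an addition of this sketch. It
assumes Linux behavior, where mapping past the current end of file
succeeds and the pages become usable once the file has been extended.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): map one chunk of the output
 * file at 'offset', then grow the file to cover it. 'offset' must be a
 * multiple of the page size.
 */
static void *map_chunk(int fd, off_t offset, size_t chunk)
{
	void *addr = mmap(NULL, chunk, PROT_WRITE | PROT_READ, MAP_SHARED,
			  fd, offset);

	if (addr == MAP_FAILED) {
		perror("mmap");
		return NULL;	/* file size untouched, nothing to undo */
	}

	/* expand file to include this mmap segment */
	if (ftruncate(fd, offset + (off_t)chunk) != 0) {
		perror("ftruncate");
		munmap(addr, chunk);
		return NULL;
	}

	return addr;
}

/*
 * Copy 'msg' into a freshly mapped one-page chunk, then read it back with
 * pread() to check that the data reached the file. Returns 0 on success.
 */
static int chunk_roundtrip(const char *path, const char *msg)
{
	size_t chunk = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = strlen(msg);
	char buf[128];
	struct stat st;
	void *addr;
	int fd, ret = -1;

	assert(len < sizeof(buf) && len <= chunk);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	addr = map_chunk(fd, 0, chunk);
	if (addr != NULL) {
		memcpy(addr, msg, len);	/* takes the place of write() */
		munmap(addr, chunk);

		if (pread(fd, buf, len, 0) == (ssize_t)len &&
		    memcmp(buf, msg, len) == 0 &&
		    fstat(fd, &st) == 0 && st.st_size == (off_t)chunk)
			ret = 0;
	}

	close(fd);
	unlink(path);
	return ret;
}
```

Calling chunk_roundtrip("/tmp/mmap_order_demo.data", "sample event")
should return 0 on a typical Linux filesystem. Note that the memcpy()
happens only after ftruncate() has succeeded; touching the mapping while
the file is still too short would raise SIGBUS.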

David
