All of lore.kernel.org
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Stanislav Fomichev <stfomichev@yandex-team.ru>
Cc: a.p.zijlstra@chello.nl, paulus@samba.org, mingo@redhat.com,
	dsahern@gmail.com, jolsa@redhat.com,
	xiaoguangrong@linux.vnet.ibm.com, yangds.fnst@cn.fujitsu.com,
	adrian.hunter@intel.com, namhyung@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/5] perf trace: add support for pagefault tracing
Date: Wed, 18 Jun 2014 17:04:32 -0300	[thread overview]
Message-ID: <20140618200432.GA20252@kernel.org> (raw)
In-Reply-To: <20140618174254.GD15620@stfomichev-desktop.yandex.net>

Em Wed, Jun 18, 2014 at 09:42:54PM +0400, Stanislav Fomichev escreveu:
> > I try, when possible, to not use short options that are used in
> > 'strace', so what if we use 'F' here?
> Agreed, will change.
> 
> > And:
> > 
> >    trace --pgfaults --pgfaults
> > 
> > for major and min page faults looks ugly, what if we instead use --pf
> > for both, and allow passing min or maj as args?
> > 
> > I.e.:
> > 
> > For both major and minor:
> > 
> >   trace --pf
> > 
> > for just major page faults:
> > 
> >   trace --pf maj
> Not sure I like it. Currently, when we need to trace page faults its
> major faults in 99.9% of cases, we are not interested in minor ones (and

If 99.9% of the cases is for for major faults, then make --pf default to
that :-) While allowing to change this behaviour via:

For just major faults:

	trace --pf

For all kinds:

	trace --pf all

For just minor faults:

	trace --pf min

Sure you can have the shorthand that -FF means "--pf all"

What I'm trying to avoid is to have to use with long options:

	trace --pf --pf

Also, with your scheme, how would I ask for just minor faults, if
somebody happens to want that?

> there are thousands of them in a second). So we just do 'perf trace -F'.
> If we need minor faults, we are probably interested in all fault events,
> so we do -F twice.
> 
> With 'trace --pf' we by default hit our 0.01% use case, so we always
> need to type 'trace --pf maj'. --pf may be clear from the documentation
> standpoint, but I don't like the fact that --pf defaults to all faults.

So make it default to the most common case :-)

> > > +	int			trace_pgfaults;

> > uint8_t should be enough
> By using int I state: 'I don't care about underlying type, give me
> something to count'. If we use uint8_t it would imply (to me) that
> for some reason we care about struct layout, size, endianness, etc.
> IOW, I don't think we need to care about +-3 bytes here.

Guess I've been using pahole too much... ;-)

Also, uint8_t is something that can count, as is u8, that is more
commonly used in tools/perf/ to make it look like kernel code :-)

But nah, not a biggie, just trying to be judicious in using types.
 
> > > -typedef int (*tracepoint_handler)(struct trace *trace, struct perf_evsel *evsel,
> > > +typedef int (*tracepoint_handler)(struct trace *trace,
> > > +				  union perf_event *event,
> > > +				  struct perf_evsel *evsel,
> > >  				  struct perf_sample *sample);
> > 
> > You'll reduce patch size if you leave the first line alone and just add
> > the new parameter (event) after evsel.
> > 
> > It'll become then:
> > 
> >  typedef int (*tracepoint_handler)(struct trace *trace, struct perf_evsel *evsel,
> > +				  union perf_event *event,
> >  				  struct perf_sample *sample);
> > 
> > Then please make one separate patch adding this new parameter, stating
> > that it will be used by pagefault tracing, that will come in a followup
> > patch in this series.
> Agreed, thanks.
> 
> > > +static bool valid_dso(struct addr_location *al, struct perf_sample *sample)
> > 
> > Humm, can't this be reduced to just:
> > 
> >       return al->map && al->map->dso;
> > 
> > ? I.e. if the helper returned a dso, it is because the address that was
> > looked up is in that dso, no?
> > 
> > I even guess that if there is al->map, that should be enough to check,
> > byt haven't thought this thru nor looked at the relevant sources, in a
> > hurry now.
> Yes, we don't need to check for ->dso, it's always non-null, I'll remove
> the check.
> But we do need to check for the range. Because
> thread__find_addr_map searches for the closest map using only ->start and
> doesn't check if address is within map range (->end).
> Maybe we need to fix it in thread__find_addr_map so it always returns valid map?

That is my expectation, yes, if I ask for the map where address N is, it
should return just that, where have you found this bug?

The thread__find_addr_map will end up calling maps__find() and it does
this:

                if (ip < m->start)
                        p = &(*p)->rb_left;
                else if (ip > m->end)
                        p = &(*p)->rb_right;
                else
                        return m;

Where is the problem?

> > Please separate adding page fault tracing recording on a file in a
> > separate patch from adding it to live mode, then the changelog message
> > can concentrate on examples for each feature.
> Ok.
> 
> > > -			if (sample.raw_data == NULL) {
> > > +			if (evsel->attr.type != PERF_TYPE_SOFTWARE &&
> > > +			    sample.raw_data == NULL) {
> > 
> > This looks like a separate patch with relevant associated changelog
> > message explaining why this is needed.
> No, this belongs here. When we enable page faults, they don't have any
> raw data (while syscalls have). So we still want to keep this check for
> syscalls but don't want it for fault events.

Understood the intent, but the test here is really:

		if (evsel->attr.type == PERF_TYPE_TRACEPOINT &&
		    sample.raw_data == NULL)

Ok?
 
> > > +	if (evsel &&
> > > +	    (perf_evsel__init_syscall_tp(evsel, trace__sys_enter) < 0 ||
> > > +	    perf_evsel__init_sc_tp_ptr_field(evsel, args))) {
> > >  		pr_err("Error during initialize raw_syscalls:sys_enter event\n");
> > >  		goto out;
> > 
> > the above can be ditched, not needed. Care to explain if you think
> > otherwise?

> This doesn't belong to this patch, but it's required because we can do
> 'trace --no-syscalls -F' and get only fault events without syscall events;
> we don't want to fail here.
> Will move to appropriate patch.

Thanks
 
> > > +	evlist__for_each(session->evlist, evsel) {
> > > +		if (evsel->attr.type == PERF_TYPE_SOFTWARE &&
> > > +		    (evsel->attr.config == PERF_COUNT_SW_PAGE_FAULTS_MAJ ||
> > > +		     evsel->attr.config == PERF_COUNT_SW_PAGE_FAULTS_MIN ||
> > > +		     evsel->attr.config == PERF_COUNT_SW_PAGE_FAULTS))
> > > +			evsel->handler = trace__pgfault;
> > 
> > the above looks ugly, can't we set the handler when adding the events?
> > Or is this just for the perf.data handling case? Have to dig deeper,
> > just a first feeling.
> It's for perf.data parsing. If you know a better way to do it without
> iterating over all events, pls let me know.

We had something related in a perf_evlist__set_tracepoint_handlers(),
but this case is for a software event, so we would need a
perf_evlist__set_attr_handlers(), I can do that later, for now this is
the only user, as this evsel->handler thing was introduced with 'trace',
so keep it like this, I can revisit this later.
 
> > >  	}
> > >  
> > >  	err = parse_target_str(trace);
> > > @@ -2290,6 +2416,8 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
> > >  			.user_interval = ULLONG_MAX,
> > >  			.no_buffering  = true,
> > >  			.mmap_pages    = 1024,
> > > +			.sample_address	= true,
> > > +			.sample_time	= true,
> > 
> > The above should be made conditional, i.e. if --pf is used?
> Yes, fixed, thanks.

Also, have you considered using:

[root@zoo ~]# perf list exceptions:page_fault*
  exceptions:page_fault_user                         [Tracepoint event]
  exceptions:page_fault_kernel                       [Tracepoint event]
[root@zoo ~]#

Instead? I need to check if they're completely equivalent to what we
need here...

- Arnaldo

  reply	other threads:[~2014-06-18 20:04 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-18 14:59 [RFC PATCH 0/5] perf trace pagefaults Stanislav Fomichev
2014-06-18 14:59 ` [PATCH 1/5] perf trace: add support for pagefault tracing Stanislav Fomichev
2014-06-18 15:47   ` Arnaldo Carvalho de Melo
2014-06-18 17:42     ` Stanislav Fomichev
2014-06-18 20:04       ` Arnaldo Carvalho de Melo [this message]
2014-06-19 12:06         ` Stanislav Fomichev
2014-06-25 11:33           ` Stanislav Fomichev
2014-06-18 14:59 ` [PATCH 2/5] perf trace: add pagefault statistics Stanislav Fomichev
2014-06-18 14:59 ` [PATCH 3/5] perf trace: add possibility to switch off syscall events Stanislav Fomichev
2014-06-18 14:59 ` [PATCH 4/5] perf kvm: move perf_kvm__mmap_read into session utils Stanislav Fomichev
2014-06-18 14:59 ` [PATCH 5/5] perf trace: add events cache Stanislav Fomichev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140618200432.GA20252@kernel.org \
    --to=acme@kernel.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=adrian.hunter@intel.com \
    --cc=dsahern@gmail.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=paulus@samba.org \
    --cc=stfomichev@yandex-team.ru \
    --cc=xiaoguangrong@linux.vnet.ibm.com \
    --cc=yangds.fnst@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.