linux-perf-users.vger.kernel.org archive mirror
* Issue perf attaching to processes creating many short-lived threads
@ 2015-10-23 18:52 William Cohen
  2015-10-23 18:56 ` David Ahern
  0 siblings, 1 reply; 9+ messages in thread
From: William Cohen @ 2015-10-23 18:52 UTC (permalink / raw)
  To: linux-perf-use.; +Cc: 大平怜


Earlier this month Rei Odeira found that the oprofile tool operf would
have problems attaching to and monitoring a process that created many
very short-lived threads.  It looks like the kernel's perf tool also
has issues when attempting to attach to and monitor a process that is
creating many short-lived threads.  Attached is the source code used to
reproduce the problem.  The code is compiled and run with the
following commands.  The arguments to the reproducer are the total
number of threads to spawn, the number of concurrent threads, and the
number of times each thread loops.  When run with "-1" as the first
argument it runs indefinitely and must be stopped with Ctrl-C.

$ gcc -o oprofile_multithread_test oprofile_multithread_test.c -lpthread
$ ./oprofile_multithread_test 
Usage: oprofile_multithread_test <number of spawns> <number of threads> <number of operations per thread>
$ ./oprofile_multithread_test -1 16 100000

Having the reproducer run as a child of perf works fine.

$ perf --version
perf version 4.2.3-200.fc22.x86_64
$ perf stat ./oprofile_multithread_test -1 16 1000000
^C./oprofile_multithread_test: Interrupt
failed to read counter stalled-cycles-backend

 Performance counter stats for './oprofile_multithread_test -1 16 1000000':

      54632.571382      task-clock (msec)         #    5.764 CPUs utilized          
            23,447      context-switches          #    0.429 K/sec                  
            16,153      cpu-migrations            #    0.296 K/sec                  
                86      page-faults               #    0.002 K/sec                  
   168,749,585,390      cycles                    #    3.089 GHz                    
   136,160,264,023      stalled-cycles-frontend   #   80.69% frontend cycles idle   
   <not supported>      stalled-cycles-backend   
    95,947,021,711      instructions              #    0.57  insns per cycle        
                                                  #    1.42  stalled cycles per insn
    16,018,454,088      branches                  #  293.203 M/sec                  
         6,990,932      branch-misses             #    0.04% of all branches        

       9.477617613 seconds time elapsed


However, when starting the reproducer program first and then attaching
to its pid using '-p', one sometimes gets the following failure:

$ ./oprofile_multithread_test -1 16 100000 &
$ perf stat -p `pgrep oprofile_mul`
Error:
The sys_perf_event_open() syscall returned with 3 (No such process) for event (instructions).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?


I got the same results using perf built from a check out of the
current mainline linux kernel.  Shouldn't perf be able to attach to a
process regardless of how quickly it is creating threads?

-Will

[-- Attachment #2: oprofile_multithread_test.c --]
[-- Type: text/x-csrc, Size: 1785 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <pthread.h>

static int num_ops;
static pthread_t *thr_array;


static void *
thr_main(void *arg)
{
  int i;
  int sum = 0;

  for (i = 0; i < num_ops; i++) {
    sum += i;
  }

  return (void *)(intptr_t)sum;
}

static void
spawn_thread(int thr)
{
  int ret;

  ret = pthread_create(&thr_array[thr], NULL, thr_main, NULL);
  if (ret != 0) {
    fprintf(stderr, "pthread_create: %s\n", strerror(ret));
    exit(1);
  }
}

static void
join_thread(int thr)
{
  int ret;

  ret = pthread_join(thr_array[thr], NULL);
  if (ret != 0) {
    fprintf(stderr, "pthread_join: %s\n", strerror(ret));
    exit(1);
  }
}

int
main(int argc, char *argv[])
{
  int num_spawns;
  int num_threads;
  int thr;
  int thr_saved;
  int ret;
  int spawn_count;

  if (argc != 4) {
    fprintf(stderr, "Usage: oprofile_multithread_test <number of spawns> <number of threads> <number of operations per thread>\n");
    exit(1);
  }

  num_spawns = atoi(argv[1]);
  num_threads = atoi(argv[2]);
  num_ops = atoi(argv[3]);
  if (num_threads < 1) {
    fprintf(stderr, "Number of threads must be positive.\n");
    exit(1);
  }

  thr_array = malloc(sizeof(pthread_t) * num_threads);
  if (thr_array == NULL) {
    fprintf(stderr, "Cannot allocate thr_array\n");
    exit(1);
  }

  spawn_count = 0;
  for (thr = 0; thr < num_threads; thr++) {
    spawn_thread(thr);
    spawn_count++;
  }

  thr = 0;
  while (num_spawns < 0 ? 1 /* infinite loop */ : spawn_count < num_spawns) {
    join_thread(thr);
    spawn_thread(thr);
    thr = (thr + 1) % num_threads;
    spawn_count++;
  }

  /* Drain the final round of threads. */
  thr_saved = thr;
  do {
    join_thread(thr);
    thr = (thr + 1) % num_threads;
  } while (thr != thr_saved);

  free(thr_array);
  return 0;
}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Issue perf attaching to processes creating many short-lived threads
  2015-10-23 18:52 Issue perf attaching to processes creating many short-lived threads William Cohen
@ 2015-10-23 18:56 ` David Ahern
  2015-10-23 19:27   ` William Cohen
  0 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2015-10-23 18:56 UTC (permalink / raw)
  To: William Cohen, linux-perf-use.; +Cc: 大平怜

On 10/23/15 12:52 PM, William Cohen wrote:
> Earlier this month Rei Odeira found that the oprofile tool operf would
> have problems attaching and monitoring a process that created many
> very short-lived threads.  It looks like the kernel's perf tool also
> has issues when attempting to attach and monitor a process that is
> creating many short-lived threads.

Known problem. If this is the problem I think it is, you will find that 
strace shows perf stuck walking the /proc directory.

David


* Re: Issue perf attaching to processes creating many short-lived threads
  2015-10-23 18:56 ` David Ahern
@ 2015-10-23 19:27   ` William Cohen
  2015-10-23 19:35     ` David Ahern
  0 siblings, 1 reply; 9+ messages in thread
From: William Cohen @ 2015-10-23 19:27 UTC (permalink / raw)
  To: David Ahern, linux-perf-use.; +Cc: 大平怜, oprofile-list

On 10/23/2015 02:56 PM, David Ahern wrote:
> On 10/23/15 12:52 PM, William Cohen wrote:
>> Earlier this month Rei Odeira found that the oprofile tool operf would
>> have problems attaching and monitoring a process that created many
>> very short-lived threads.  It looks like the kernel's perf tool also
>> has issues when attempting to attach and monitor a process that is
>> creating many short-lived threads.
> 
> Known problem. If this is the problem I think it is, you will find that strace shows perf stuck walking the /proc directory.
> 
> David
> 

Hi David,

Is the following thread related to the problem?

"[PATCH 1/1] perf,tools: add time out to force stop endless mmap processing"
 http://lkml.iu.edu/hypermail/linux/kernel/1506.1/02251.html

Or is there some other thread about the problem?

-Will


* Re: Issue perf attaching to processes creating many short-lived threads
  2015-10-23 19:27   ` William Cohen
@ 2015-10-23 19:35     ` David Ahern
  2015-10-26 19:49       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2015-10-23 19:35 UTC (permalink / raw)
  To: William Cohen, linux-perf-use.; +Cc: 大平怜, oprofile-list

On 10/23/15 1:27 PM, William Cohen wrote:
> On 10/23/2015 02:56 PM, David Ahern wrote:
>> On 10/23/15 12:52 PM, William Cohen wrote:
>>> Earlier this month Rei Odeira found that the oprofile tool operf would
>>> have problems attaching and monitoring a process that created many
>>> very short-lived threads.  It looks like the kernel's perf tool also
>>> has issues when attempting to attach and monitor a process that is
>>> creating many short-lived threads.
>>
>> Known problem. If this is the problem I think it is, you will find that strace shows perf stuck walking the /proc directory.
>>
>> David
>>
>
> Hi David,
>
> Is the following thread related to the problem?
>
> "[PATCH 1/1] perf,tools: add time out to force stop endless mmap processing"
>   http://lkml.iu.edu/hypermail/linux/kernel/1506.1/02251.html
>
> Or is there some other thread about the problem?

That's a different problem as I recall -- a single process constantly 
changing its mmaps such that perf never has a chance to finish reading 
the file.

I was referring to something like 'make -j 1024' on a large system 
(e.g., 512 or 1024 cpus) and then starting perf. That is the same 
problem you are describing -- lots of short-lived processes. I am fairly 
certain I described the problem on lkml or the perf mailing list. Not 
even the task_diag proposal (task_diag uses netlink to push task data to 
perf rather than walking /proc) has a chance to keep up.

David


* Re: Issue perf attaching to processes creating many short-lived threads
  2015-10-23 19:35     ` David Ahern
@ 2015-10-26 19:49       ` Arnaldo Carvalho de Melo
  2015-10-26 21:42         ` David Ahern
  0 siblings, 1 reply; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-10-26 19:49 UTC (permalink / raw)
  To: David Ahern
  Cc: William Cohen, linux-perf-use., 大平怜,
	oprofile-list

Em Fri, Oct 23, 2015 at 01:35:43PM -0600, David Ahern escreveu:
> I was referring to something like 'make -j 1024' on a large system (e.g.,
> 512 or 1024 cpus) and then starting perf. This is the same problem you are
> describing -- lot of short lived processes. I am fairly certain I described
> the problem on lkml or perf mailing list. Not even the task_diag proposal
> (task_diag uses netlink to push task data to perf versus walking /proc) has
> a chance to keep up.

Yeah, to get info about existing threads (their maps, comm, etc.) you would
pretty much have to stop the world, collect the info, then let
everything go back to running, because from then on new threads would
insert PERF_RECORD_{FORK,COMM,MMAP,etc} records in the ring buffer.

I think we need an option to say: don't try to find info about existing
threads, i.e. don't look at /proc at all. We would end up with samples
being attributed to a pid/tid and that would be it; that should be useful
for some use cases, no?

- Arnaldo


* Re: Issue perf attaching to processes creating many short-lived threads
  2015-10-26 19:49       ` Arnaldo Carvalho de Melo
@ 2015-10-26 21:42         ` David Ahern
  2015-10-27 12:33           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2015-10-26 21:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: William Cohen, linux-perf-use., 大平怜,
	oprofile-list

On 10/26/15 1:49 PM, Arnaldo Carvalho de Melo wrote:
> Em Fri, Oct 23, 2015 at 01:35:43PM -0600, David Ahern escreveu:
>> I was referring to something like 'make -j 1024' on a large system (e.g.,
>> 512 or 1024 cpus) and then starting perf. This is the same problem you are
>> describing -- lot of short lived processes. I am fairly certain I described
>> the problem on lkml or perf mailing list. Not even the task_diag proposal
>> (task_diag uses netlink to push task data to perf versus walking /proc) has
>> a chance to keep up.
>
> Yeah, to get info about existing threads (its maps, comm, etc) you would
> pretty much have to stop the world, collect the info, then let
> everything go back running because then new threads would insert the
> PERF_RECORD_{FORK,COMM,MMAP,etc} records in the ring buffer.
>
> I think we need an option to say: don't try to find info about existing
> threads, i.e. don't look at /proc at all, we would end up with samples
> being attributed to a pid/tid and that would be it, should be useful for
> some use cases, no?

Seems to me it would just be a lot of random numbers on a screen. 
Correlating data to user-readable information is a key part of perf.

One option that might solve this problem is to have the perf kernel 
side walk the task list and generate the task events into the ring 
buffer (the task_diag code could be leveraged). This would be a lot 
faster than reading /proc or using netlink, but it would have other 
throughput problems to deal with.


* Re: Issue perf attaching to processes creating many short-lived threads
  2015-10-26 21:42         ` David Ahern
@ 2015-10-27 12:33           ` Arnaldo Carvalho de Melo
  2015-10-27 14:15             ` David Ahern
  0 siblings, 1 reply; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-10-27 12:33 UTC (permalink / raw)
  To: David Ahern
  Cc: linux-perf-use., William Cohen, 大平怜,
	oprofile-list

Em Mon, Oct 26, 2015 at 03:42:01PM -0600, David Ahern escreveu:
> On 10/26/15 1:49 PM, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Oct 23, 2015 at 01:35:43PM -0600, David Ahern escreveu:
> >> I was referring to something like 'make -j 1024' on a large system (e.g.,
> >> 512 or 1024 cpus) and then starting perf. This is the same problem you are
> >> describing -- lot of short lived processes. I am fairly certain I described
> >> the problem on lkml or perf mailing list. Not even the task_diag proposal
> >> (task_diag uses netlink to push task data to perf versus walking /proc) has
> >> a chance to keep up.

> > Yeah, to get info about existing threads (its maps, comm, etc) you would
> > pretty much have to stop the world, collect the info, then let
> > everything go back running because then new threads would insert the
> > PERF_RECORD_{FORK,COMM,MMAP,etc} records in the ring buffer.

> > I think we need an option to say: don't try to find info about existing
> > threads, i.e. don't look at /proc at all, we would end up with samples
> > being attributed to a pid/tid and that would be it, should be useful for
> > some use cases, no?

> Seems to me it would just be a lot of random numbers on a screen. 

For the existing threads? Yes; one would know that there were N threads
and the relationship among those threads, and then get the usual output
for the new threads.

> Correlating data to user readable information is a key part of perf.

Indeed, as best as it can.
 
> One option that might be able to solve this problem is to have perf 
> kernel side walk the task list and generate the task events into the 
> ring buffer (task diag code could be leveraged). This would be a lot 

It would have to do this over multiple iterations, locklessly wrt the
task list, in a non-intrusive way, which, in this case, could take
forever, no? :-)

> faster than reading proc or using netlink but would have other 
> throughput problems to deal with.

Indeed.

- Arnaldo


* Re: Issue perf attaching to processes creating many short-lived threads
  2015-10-27 12:33           ` Arnaldo Carvalho de Melo
@ 2015-10-27 14:15             ` David Ahern
  2015-10-27 14:47               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2015-10-27 14:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-perf-use., William Cohen, 大平怜,
	oprofile-list

On 10/27/15 6:33 AM, Arnaldo Carvalho de Melo wrote:
>
>> Correlating data to user readable information is a key part of perf.
>
> Indeed, as best as it can.
>
>> One option that might be able to solve this problem is to have perf
>> kernel side walk the task list and generate the task events into the
>> ring buffer (task diag code could be leveraged). This would be a lot
>
> It would have to do this over multiple iterations, locklessly wrt the
> task list, in a non-intrusive way, which, in this case, could take
> forever, no? :-)

taskdiag struggles to keep up because netlink messages have a limited 
size: the skb's have to be pushed to userspace and ack'ed before the 
walk proceeds to the next task.

Filenames for the maps are the biggest killer on throughput wrt kernel 
side processing.

With a multi-MB ring buffer you have a much larger buffer to fill. In 
addition perf userspace can be kicked at a low watermark so it is 
draining that buffer as fast as it can:

    kernel   --->   ring buffer  --->  perf  -->  what?

The limiter here is perf userspace draining the buffer fast enough that 
the kernel side does not have to take much, if any, of a break.

If the "what" is a file (e.g., perf record), then file I/O becomes the 
limiter. If the "what" is processing the data (e.g., perf top), we 
should be able to come up with some design that at least pulls the data 
into memory so the buffer never fills.

Sure, there would need to be some progress limiters put in place to keep 
the kernel side from killing a cpu, but I think this kind of design has 
the best chance of getting the most information for this class of 
problem.

And then, for all of the much smaller, more typical perf use cases, this 
kind of data collection is much less expensive than walking /proc. 
taskdiag shows that, and this design is faster and more efficient than 
taskdiag.

David


* Re: Issue perf attaching to processes creating many short-lived threads
  2015-10-27 14:15             ` David Ahern
@ 2015-10-27 14:47               ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-10-27 14:47 UTC (permalink / raw)
  To: David Ahern
  Cc: linux-perf-use., William Cohen, 大平怜,
	oprofile-list

Em Tue, Oct 27, 2015 at 08:15:31AM -0600, David Ahern escreveu:
> On 10/27/15 6:33 AM, Arnaldo Carvalho de Melo wrote:
> >>Correlating data to user readable information is a key part of perf.

> >Indeed, as best as it can.

> >>One option that might be able to solve this problem is to have perf
> >>kernel side walk the task list and generate the task events into the
> >>ring buffer (task diag code could be leveraged). This would be a lot
> >
> >It would have to do this over multiple iterations, locklessly wrt the
> >task list, in a non-intrusive way, which, in this case, could take
> >forever, no? :-)
> 
> taskdiag struggles to keep up because netlink messages have a limited size,
> the skb's have to be pushed to userspace and ack'ed and then the walk
> proceeds to the next task.
> 
> Filenames for the maps are the biggest killer on throughput wrt kernel side
> processing.
> 
> With a multi-MB ring buffer you have a much larger buffer to fill. In
> addition perf userspace can be kicked at a low watermark so it is draining
> that buffer as fast as it can:
> 
>    kernel   --->   ring buffer  --->  perf  -->  what?
> 
> The limiter here is perf userspace draining the buffer such that the kernel
> side does not have to take much if any break.
> 
> If the "What" is a file (e.g., perf record) then file I/O becomes a limiter.
> If the "What" is processing the data (e.g., perf top) we should be able to
> come up with some design that at least pulls the data into memory so the
> buffer never fills.
> 
> Sure there would need to be some progress limiters put to keep the kernel
> side from killing a cpu but I think this kind of design has the best chance
> of getting the most information for this class of problem.
> 
> And then for all of the much smaller more typical perf use cases this kind
> of data collection is much less expensive than walking proc. taskdiag shows
> that and this design is faster and more efficient than taskdiag.

Definitely, if we can avoid looking at /proc for what we need, that
would be better. I hope you can continue working on that, or that
someone else picks up the baton and gets it into a mergeable form.

But in extreme cases, not even that would work.

- Arnaldo


Thread overview: 9+ messages
2015-10-23 18:52 Issue perf attaching to processes creating many short-lived threads William Cohen
2015-10-23 18:56 ` David Ahern
2015-10-23 19:27   ` William Cohen
2015-10-23 19:35     ` David Ahern
2015-10-26 19:49       ` Arnaldo Carvalho de Melo
2015-10-26 21:42         ` David Ahern
2015-10-27 12:33           ` Arnaldo Carvalho de Melo
2015-10-27 14:15             ` David Ahern
2015-10-27 14:47               ` Arnaldo Carvalho de Melo
