tracing: horrible read performance on host with many CPUs

* tracing: horrible read performance on host with many CPUs
From: Dmitry Monakhov @ 2014-08-27  8:50 UTC
  To: LKML; +Cc: rostedt

I tried to use tracing on a host with 32 CPUs, but read performance
turned out to be horrible.
dd if=/sys/kernel/debug/tracing/trace_pipe of=tmpfs/t3.log  bs=1M 
0+21268 records in
0+21267 records out
85701248 bytes (86 MB) copied, 26.1424 s, 3.3 MB/s
0+25706 records in
0+25705 records out
103600749 bytes (104 MB) copied, 31.6595 s, 3.3 MB/s
0+59204 records in
0+59203 records out
238746128 bytes (239 MB) copied, 73.4347 s, 3.3 MB/s
Since I've collected ~3 GB of data, it takes a long time simply to
copy it from the kernel to tmpfs.

AFAIU this happens because of the sub-optimal sorting procedure in
__find_next_entry: each time, it walks every CPU and picks the entry
with the smallest timestamp. This could be optimized simply by fetching
N entries at a time. Are there any plans to implement that?
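
The per-entry scan described above can be sketched roughly as follows.
This is a simplified Python model, not the kernel code: the function
names, the list-of-deques layout and the (timestamp, text) entry tuples
are assumptions made for illustration, and each per-CPU buffer is
assumed to be already sorted by timestamp.

```python
from collections import deque

def find_next_entry(cpu_buffers):
    """Scan the head of every per-CPU buffer and pop the entry with the
    smallest timestamp. Returns (cpu, entry), or None if all are empty."""
    best_cpu = None
    for cpu, buf in enumerate(cpu_buffers):
        if not buf:
            continue
        # Entries are (timestamp, text); compare the head timestamps.
        if best_cpu is None or buf[0][0] < cpu_buffers[best_cpu][0][0]:
            best_cpu = cpu
    if best_cpu is None:
        return None
    return best_cpu, cpu_buffers[best_cpu].popleft()

def read_all(cpu_buffers):
    """Drain every buffer in timestamp order, one full scan per entry."""
    out = []
    while (found := find_next_entry(cpu_buffers)) is not None:
        out.append(found[1])
    return out
```

In this model every returned entry costs one head comparison per CPU,
so the work per entry grows linearly with the CPU count.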

BTW: what is the most convenient way to fetch large amounts of data from
traces? One possible way is to dump the per-CPU traces (20 MB/s in my
case) and then merge the files by timestamp.
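
A minimal sketch of that merge step in Python, assuming each per-CPU
dump is already in timestamp order and that every event line carries a
"seconds.microseconds:" timestamp as in the default ftrace text output.
The regular expression and the function names are illustrative
assumptions; adjust the pattern if your trace format differs.

```python
import heapq
import re

# Assumption: the first "digits.digits:" field on a line is its timestamp.
TS_RE = re.compile(r"\s(\d+\.\d+):")

def stamped(lines):
    """Yield (timestamp, line) pairs, skipping lines with no timestamp
    (for example '#' header lines)."""
    for line in lines:
        m = TS_RE.search(line)
        if m:
            yield float(m.group(1)), line

def merge_traces(streams):
    """Lazily merge several line iterables, each sorted by timestamp,
    into one stream of lines in global timestamp order."""
    for _, line in heapq.merge(*(stamped(s) for s in streams)):
        yield line
```

Because heapq.merge is lazy, open file objects for the per-CPU dumps
can be passed in directly and the result written out line by line
without loading ~3 GB into memory.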

* Re: tracing: horrible read performance on host with many CPUs
From: Steven Rostedt @ 2014-08-27  9:20 UTC
  To: Dmitry Monakhov, LKML

Use trace-cmd. It reads the per-CPU files and sorts them later.

-- Steve
