From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
To: Badari Pulavarty <pbadari@us.ibm.com>, fche@redhat.com
Cc: linux-mm <linux-mm@kvack.org>,
	lkml <linux-kernel@vger.kernel.org>,
	systemtap@sources.redhat.com
Subject: Re: Better pagecache statistics ?
Date: Tue, 27 Dec 2005 23:33:25 -0200
Message-ID: <20051228013325.GA4144@dmt.cnet>
In-Reply-To: <20051201152029.GA14499@dmt.cnet>


Badari, any improvements on the {add_to,remove_from}_page_cache hooks?

> I just started playing with SystemTap yesterday. First
> thing I want to record is "what is the latency of 
> direct reclaim".

I've come up with something which works, though it is pretty dumb and
inefficient.

I'm facing three problems; maybe someone has a clue on how to improve
the situation.

a) nanosecond timekeeping 

Since the systemtap language does not support "struct" abstraction, but
simply "long/string/array" types, there is no way to easily return more
than one value from a function. Is it possible to pass references down
to functions so as to return more than one value? 

I failed to find any way to do that.

For nanosecond timekeeping one needs a second/nanosecond tuple (struct
timespec).
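
One way around the missing tuple, sketched here in plain C rather than
SystemTap: fold the two timespec fields into a single signed 64-bit
nanosecond count, so that one "long" return value is enough. This
assumes the script-level "long" is 64 bits wide; the names
timespec_to_ns, elapsed_ns and NS_PER_SEC are made up for the
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical constant name, chosen for this sketch. */
#define NS_PER_SEC INT64_C(1000000000)

/* Fold a (seconds, nanoseconds) pair into one signed 64-bit count. */
static int64_t timespec_to_ns(struct timespec ts)
{
	return (int64_t)ts.tv_sec * NS_PER_SEC + (int64_t)ts.tv_nsec;
}

/* Elapsed nanoseconds between two samples; negative if end precedes start. */
static int64_t elapsed_ns(struct timespec start, struct timespec end)
{
	return timespec_to_ns(end) - timespec_to_ns(start);
}
```

A signed 64-bit nanosecond count covers roughly 292 years, so the
single value does not overflow for wall-clock timestamps.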

b) ERROR: MAXACTION exceeded near identifier 'log' at ttfp_delay.stp:49:3

The array size is capped to a maximum. Is there any way to configure
SystemTap to periodically dump-and-zero the arrays? That would make a
lot of sense for any statistics-gathering code.

c) Hash tables

It would be better to store the log entries in a hash table. The present
script uses the "current" pointer as a key into a pair of arrays,
incrementing the key until a free one is found (which can be very
inefficient).

A hash table would be much more efficient, but allocating memory inside
the scripts is tricky. A pre-allocated, pre-sized pool of memory could 
work well for this purpose. The "dump-array-entries-to-userspace" action
could be used to free them.

So both b) and c) could be fixed with the same logic:

- dump entries to userspace if the memory pool is running short
of free entries.
- periodically dump entries to userspace (akin to "bdflush").
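
A minimal sketch of the pool idea above, in plain C and outside the
kernel: a fixed-size, open-addressing table that is dumped and emptied
whenever it runs short of free slots. The sizes, the names (pool_put,
pool_find, pool_flush) and the "flush" that only counts entries are
assumptions for illustration; a real version would copy the entries to
userspace instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SLOTS 8		/* hypothetical size; must be a power of two */
#define FLUSH_THRESHOLD 6	/* flush once this many slots are in use */

struct slot {
	uintptr_t key;		/* 0 marks a free slot, so 0 is not a valid key */
	int64_t value;
};

struct pool {
	struct slot slots[POOL_SLOTS];
	size_t used;		/* slots currently occupied */
	size_t flushed;		/* entries handed to the consumer so far */
};

/* Stand-in for "dump entries to userspace": count them, free every slot. */
static void pool_flush(struct pool *p)
{
	for (size_t i = 0; i < POOL_SLOTS; i++) {
		if (p->slots[i].key != 0) {
			p->flushed++;
			p->slots[i].key = 0;
			p->slots[i].value = 0;
		}
	}
	p->used = 0;
}

/* Multiplicative hash of a pointer-sized key to a slot index. */
static size_t pool_index(uintptr_t key)
{
	uint64_t h = (uint64_t)key * UINT64_C(0x9E3779B97F4A7C15);

	return (size_t)(h >> 32) & (POOL_SLOTS - 1);
}

/* Linear probing lookup; terminates because the table is never full. */
static struct slot *pool_find(struct pool *p, uintptr_t key)
{
	size_t i = pool_index(key);

	while (p->slots[i].key != 0) {
		if (p->slots[i].key == key)
			return &p->slots[i];
		i = (i + 1) & (POOL_SLOTS - 1);
	}
	return NULL;
}

/* Insert or overwrite; a new key triggers a flush when the pool is short. */
static void pool_put(struct pool *p, uintptr_t key, int64_t value)
{
	struct slot *s = pool_find(p, key);

	if (s == NULL) {
		size_t i;

		if (p->used >= FLUSH_THRESHOLD)
			pool_flush(p);
		i = pool_index(key);
		while (p->slots[i].key != 0)
			i = (i + 1) & (POOL_SLOTS - 1);
		s = &p->slots[i];
		s->key = key;
		p->used++;
	}
	s->value = value;
}
```

The periodic, bdflush-like dump would call pool_flush() from a timer as
well. Because entries are only ever freed all at once, the probing
needs no per-entry deletion markers.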

And finally, there seems to be a bug which results in _very_ large
(several seconds) delays, which seem unlikely to really be happening.

Thoughts?

/* 
 * ttfp_delay - measure direct reclaim latency 
 */

global count_try_to_free_pages
global count_exit_try_to_free_pages

global entry_array_us
global exit_array_us

global entry_array_ms
global exit_array_ms

function get_currentpointer:long () %{
	/* cast to long, not int, so the pointer is not truncated on 64-bit */
	THIS->__retvalue = (long) current;
%}

probe kernel.function("try_to_free_pages")
{
	current_p = get_currentpointer();
	++count_try_to_free_pages;
	while (entry_array_us[current_p])
		++current_p;

	entry_array_us[current_p] = gettimeofday_us();
	entry_array_ms[current_p] = gettimeofday_ms();
}

probe kernel.function("try_to_free_pages").return
{
	current_p = get_currentpointer();
	++count_exit_try_to_free_pages;
	while (exit_array_us[current_p])
		++current_p;

	exit_array_us[current_p] = gettimeofday_us();
	exit_array_ms[current_p] = gettimeofday_ms();
}

probe begin { log("starting probe") }

probe end
{
	log("ending probe");
	log ("calls to try_to_free_pages: " . string(count_try_to_free_pages));
	log ("returns from try_to_free_pages: " . string(count_exit_try_to_free_pages));
	foreach(var in entry_array_us) {
		pos++;
		log ("try_to_free_pages (" . string(pos) .  ") delta: " . string(exit_array_us[var] - entry_array_us[var]) . "us " .string (exit_array_ms[var] - entry_array_ms[var]) . "ms ");
	}

}


Example output, running an 800MB "dd" copy in the background:

[root@dmt examples]# stap -g ttfp_delay.stp 
starting probe
ending probe
calls to try_to_free_pages: 387
returns from try_to_free_pages: 373
try_to_free_pages (1) delta: 15028us 15ms 
try_to_free_pages (2) delta: 47677211us 47677ms 
try_to_free_pages (3) delta: 39us 0ms 
try_to_free_pages (4) delta: 35us 0ms 
try_to_free_pages (5) delta: 152us 0ms 
try_to_free_pages (6) delta: 104us 0ms 
try_to_free_pages (7) delta: 353us 0ms 
try_to_free_pages (8) delta: 61us 0ms 
try_to_free_pages (9) delta: 187us 0ms 
try_to_free_pages (10) delta: 55us 0ms 
try_to_free_pages (11) delta: 50us 0ms 
try_to_free_pages (12) delta: 30us 0ms 
try_to_free_pages (13) delta: 31us 0ms 
try_to_free_pages (14) delta: 42us 0ms 
try_to_free_pages (15) delta: 37us 0ms 
try_to_free_pages (16) delta: 178us 0ms 
try_to_free_pages (17) delta: 34us 0ms 
try_to_free_pages (18) delta: 37us 0ms 
try_to_free_pages (19) delta: 35us 0ms 
try_to_free_pages (20) delta: 34us 0ms 
try_to_free_pages (21) delta: 65us 0ms 
...
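
The output above reports 387 calls but only 373 returns. One possible
contributor to the huge deltas, offered as a guess and not as a
confirmed diagnosis: the entry and exit arrays are probed
independently, so a single return that is never recorded shifts every
later exit timestamp of that task onto an earlier entry. The plain C
model below mirrors that pairing for one task; the struct, the function
names and the slot count are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 4		/* hypothetical; slot i stands for key current+i */

struct pairing {
	int64_t entry_us[SLOTS];	/* 0 means "slot unused", as in the script */
	int64_t exit_us[SLOTS];
};

/*
 * Mirror of the script's probing: store the timestamp in the first slot
 * that is still 0. A sample is silently dropped if every slot is taken.
 */
static void record(int64_t *array, int64_t now_us)
{
	for (int i = 0; i < SLOTS; i++) {
		if (array[i] == 0) {
			array[i] = now_us;
			return;
		}
	}
}

/* Delta as computed in the script's "probe end" loop. */
static int64_t delta_us(const struct pairing *p, int slot)
{
	return p->exit_us[slot] - p->entry_us[slot];
}
```

If call 1 enters at 1000us and its return is never seen, and call 2
enters at 5000000us and returns 40us later, the exit of call 2 lands in
slot 0 and is paired with the entry of call 1, so slot 0 reports about
five seconds instead of 40us. Keying both arrays by the same per-call
slot, and clearing it on return, would avoid the mispairing.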

