[LTP] [RFC PATCH] mm: rewrite mtest01 with new API

From: Jan Stancek <jstancek@redhat.com>
To: ltp@lists.linux.it
Subject: [LTP] [RFC PATCH] mm: rewrite mtest01 with new API
Date: Thu, 28 Feb 2019 17:08:59 -0500 (EST)
Message-ID: <646940042.3510845.1551391739985.JavaMail.zimbra@redhat.com>
In-Reply-To: <20190228074002.14351-1-liwang@redhat.com>

Hi,

----- Original Message -----
> Test issue:
>    mtest01 starts many children that allocate chunks of memory and write
>    to the pages (with the -w option), but occasionally some children are
>    killed by the oom-killer, which sends SIGCHLD to the parent. After
>    receiving this SIGCHLD, the parent reports FAIL as the test result.
> 
>    This does not look like a real kernel bug: the test is trying to use
>    80% of memory and swap. Once it uses most of memory, the system
>    starts swapping, but the test is likely consuming memory at a
>    greater rate than kswapd can provide, which eventually triggers OOM.

This seems to be quite common on ppc systems (64k pages with slow I/O),
so I welcome a fix/rewrite.

> 
>    ---- FAIL LOG ----
>    mtest01     0  TINFO  :  Total memory already used on system = 1027392 kbytes
>    mtest01     0  TINFO  :  Total memory used needed to reach maximum = 12715520 kbytes
>    mtest01     0  TINFO  :  Filling up 80% of ram which is 11688128 kbytes
>    mtest01     1  TFAIL  :  mtest01.c:314: child process exited unexpectedly
>    -------------------
> 
>  Rewrite changes:
>    To make mtest01 easier to understand, I rewrote it using the
>    LTP new API and made a few changes to the children's behavior.
> 
>    * drop the SIGCHLD signal action because the new API helps to
>    check_child_status
>    * make each child pause itself after it finishes allocating/writing memory
>    * parent sends SIGCONT to make children continue and exit
>    * decrease the pressure to 50% of total ram+swap for testing

Current behaviour varies a lot depending on the system. I'm wondering whether
we should just set it to 80% of free RAM. We already have a number of OOM tests,
so maybe we don't need to worry about memory pressure here too.
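
To make the "80% of free RAM" idea concrete, here is a minimal sketch in
plain C. The helper names percent_of() and free_ram_target() are made up
for illustration and are not part of LTP or of the patch; the only real
interface used is sysinfo(2).

```c
#include <assert.h>
#include <sys/sysinfo.h>

/* Hypothetical helper: `percent` percent of `bytes`.
 * Divides first so that large byte counts cannot overflow. */
static unsigned long long percent_of(unsigned long long bytes,
				     unsigned int percent)
{
	assert(percent <= 100);
	return bytes / 100 * percent;
}

/* Hypothetical helper: allocation target as a percentage of the RAM
 * that is free right now. Swap is deliberately left out.
 * Returns 0 if sysinfo() fails. */
static unsigned long long free_ram_target(unsigned int percent)
{
	struct sysinfo si;

	if (sysinfo(&si) != 0)
		return 0;

	/* freeram is counted in units of mem_unit bytes */
	return percent_of((unsigned long long)si.freeram * si.mem_unit,
			  percent);
}
```

Calling free_ram_target(80) once in setup would give a target that scales
with the machine instead of with total RAM + swap.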

> 
> Signed-off-by: Li Wang <liwang@redhat.com>
> ---
>  runtest/mm                             |   4 +-
>  testcases/kernel/mem/mtest01/mtest01.c | 430 ++++++++++++-------------
>  2 files changed, 204 insertions(+), 230 deletions(-)
> 

<snip>

> +
> +static void mem_test(void)
> +{
> +	int i, pid_cntr;
> +	pid_t pid;
> +	struct sigaction act;
> +
> +	act.sa_handler = handler;
> +	act.sa_flags = 0;
> +	sigemptyset(&act.sa_mask);
> +	sigaction(SIGRTMIN, &act, 0);

I was wondering whether we could "abuse" tst_futexes a bit. It's a piece of
shared memory we already have, and we could use it for an atomic counter.

<snip>

> +	if (pid == 0)
> +		child_loop_alloc();
>  
> -		if (dowrite) {
> -			/* Total Free Post-Test RAM */
> -			post_mem =
> -			    (unsigned long long)sstats.mem_unit *
> -			    sstats.freeram;
> -			post_mem =
> -			    post_mem +
> -			    (unsigned long long)sstats.mem_unit *
> -			    sstats.freeswap;
> -
> -			while ((((unsigned long long)pre_mem - post_mem) <
> -				(unsigned long long)original_maxbytes) &&
> -			       pid_count < pid_cntr && !sigchld_count) {
> -				sleep(1);
> -				sysinfo(&sstats);
> -				post_mem =
> -				    (unsigned long long)sstats.mem_unit *
> -				    sstats.freeram;
> -				post_mem =
> -				    post_mem +
> -				    (unsigned long long)sstats.mem_unit *
> -				    sstats.freeswap;
> -			}
> -		}
> +	/* waits in the loop for all children finish allocating*/
> +	while(pid_count < pid_cntr)
> +		sleep(1);

What happens if one child hits OOM?
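
One way the wait loop could notice a killed child, sketched in plain POSIX
C rather than with the LTP library. The names child_was_killed() and
wait_children_ready() are hypothetical, and `ready` stands in for the
patch's pid_count. Note that this sketch reaps children itself, which may
not fit how the new API checks child status.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if some child has already been
 * terminated by a signal (the OOM killer uses SIGKILL), 0 otherwise. */
static int child_was_killed(void)
{
	int status;

	/* WNOHANG: only collect children that already changed state */
	while (waitpid(-1, &status, WNOHANG) > 0) {
		if (WIFSIGNALED(status))
			return 1;
	}
	return 0;
}

/* Hypothetical replacement for the bare sleep loop: returns 0 once
 * `expected` children have reported in, -1 if a child was killed. */
static int wait_children_ready(const volatile sig_atomic_t *ready,
			       int expected)
{
	assert(expected > 0);

	while (*ready < expected) {
		if (child_was_killed())
			return -1;
		sleep(1);
	}
	return 0;
}
```

With a check like this the parent can report a result right away instead of
sleeping until the test timeout.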

>  
> -		if (sigchld_count) {
> -			tst_resm(TFAIL, "child process exited unexpectedly");
> -		} else if (dowrite) {
> -			tst_resm(TPASS, "%llu kbytes allocated and used.",
> -				 original_maxbytes / 1024);
> -		} else {
> -			tst_resm(TPASS, "%llu kbytes allocated only.",
> -				 original_maxbytes / 1024);
> -		}
> +	if (dowrite) {
> +		sysinfo(&sstats);
> +		/* Total Free Post-Test RAM */
> +		post_mem = (unsigned long long)sstats.mem_unit * sstats.freeram;
> +		post_mem = post_mem + (unsigned long long)sstats.mem_unit * sstats.freeswap;
>  
> +		if (((pre_mem - post_mem) < original_maxbytes))
> +			tst_res(TFAIL, "kbytes allocated and used less than expected %llu",
> +					original_maxbytes / 1024);
> +		else
> +			tst_res(TPASS, "%llu kbytes allocated and used",
> +					original_maxbytes / 1024);
> +	} else {
> +		tst_res(TPASS, "%llu kbytes allocated only",
> +				original_maxbytes / 1024);
> +	}
> +
> +	i = 0;
> +	while (pid_list[i] > 0) {
> +		kill(pid_list[i], SIGCONT);
> +		i++;
>  	}
> -	cleanup();
> -	tst_exit();
>  }
> +
> +static struct tst_test test = {
> +	.forks_child = 1,
> +	.options = mtest_options,
> +	.setup = setup,
> +	.cleanup = cleanup,
> +	.test_all = mem_test,

Is the default timeout going to work on large boxes (256GB+ RAM)?


Thinking out loud, what if...
- we define at the start of the test how much memory we want to allocate (target == 80% of free RAM)
- we allocate shared memory for a counter that each child increases
  as it allocates memory (progress)
  (or we abuse tst_futexes)
  we could use tst_atomic_add_return() to count allocated chunks globally
- once a child finishes allocating, it will pause()
- we set the timeout to ~3 minutes
- the main process runs in a loop, sleeps, and periodically checks
  - if progress reached the target, PASS, break
  - if progress hasn't increased in the last 15 seconds, FAIL, break
  - if we are 15 seconds away from the timeout, end the test early, PASS, break
    (the reason is to avoid running too long on big boxes)
- kill all children, exit
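
A rough sketch of that loop in plain POSIX C, only to illustrate the idea.
It assumes a C11 atomic counter in an anonymous shared mapping as a
stand-in for tst_futexes / tst_atomic_add_return(), and mmap() instead of
malloc() for the chunks. run_alloc_test(), child_alloc(), MAX_CHILDREN and
all parameters are hypothetical names, not LTP API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64

enum result { RES_PASS, RES_FAIL };

/* Child: map and write `n` chunks, count each one, then hold the memory. */
static void child_alloc(atomic_ulong *progress, size_t chunk, unsigned long n)
{
	for (unsigned long i = 0; i < n; i++) {
		char *p = mmap(NULL, chunk, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			break;
		memset(p, 0xa5, chunk);	/* touch pages so they are really backed */
		atomic_fetch_add(progress, 1);
	}
	pause();	/* keep the memory until the parent kills us */
	_exit(0);
}

/* stall_limit: seconds without progress before FAIL.
 * deadline: seconds after which the test ends early with PASS. */
static enum result run_alloc_test(int nchildren, size_t chunk,
				  unsigned long per_child,
				  int stall_limit, int deadline)
{
	unsigned long target = (unsigned long)nchildren * per_child;
	unsigned long now, last = 0;
	int started = 0, stalled = 0, elapsed = 0;
	pid_t pids[MAX_CHILDREN];
	atomic_ulong *progress;
	enum result res;

	assert(nchildren > 0 && nchildren <= MAX_CHILDREN);

	/* counter shared between the parent and all children */
	progress = mmap(NULL, sizeof(*progress), PROT_READ | PROT_WRITE,
			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (progress == MAP_FAILED)
		return RES_FAIL;
	atomic_init(progress, 0);

	for (; started < nchildren; started++) {
		pids[started] = fork();
		if (pids[started] == 0)
			child_alloc(progress, chunk, per_child);
		if (pids[started] < 0)
			break;
	}

	for (;;) {
		sleep(1);
		elapsed++;
		now = atomic_load(progress);
		if (now >= target) {		/* target reached */
			res = RES_PASS;
			break;
		}
		if (now != last) {		/* still making progress */
			last = now;
			stalled = 0;
		} else if (++stalled >= stall_limit) {
			res = RES_FAIL;		/* no progress for too long */
			break;
		}
		if (elapsed >= deadline) {	/* big box: stop early */
			res = RES_PASS;
			break;
		}
	}

	for (int i = 0; i < started; i++) {
		kill(pids[i], SIGKILL);
		waitpid(pids[i], NULL, 0);
	}
	munmap(progress, sizeof(*progress));
	return res;
}
```

With a ~3 minute timeout this would be called with stall_limit = 15 and
deadline = 165. A child that gets OOM-killed simply stops adding to the
counter, so it shows up as a stall rather than as an unexpected SIGCHLD.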

Regards,
Jan
