Re: [Bugme-new] [Bug 13302] New: "bad pmd" on fork() of process with hugepage shared memory segments attached

From: Mel Gorman <mel@csn.ul.ie>
To: starlight@binnacle.cx
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, bugzilla-daemon@bugzilla.kernel.org,
	bugme-daemon@bugzilla.kernel.org, Adam Litke <agl@us.ibm.com>,
	Eric B Munson <ebmunson@us.ibm.com>
Subject: Re: [Bugme-new] [Bug 13302] New: "bad pmd" on fork() of process with hugepage shared memory segments attached
Date: Fri, 15 May 2009 15:55:03 +0100
Message-ID: <20090515145502.GA9032@csn.ul.ie>
In-Reply-To: <6.2.5.6.2.20090515012125.057a9c88@binnacle.cx>

[-- Attachment #1: Type: text/plain, Size: 2553 bytes --]

On Fri, May 15, 2009 at 01:32:38AM -0400, starlight@binnacle.cx wrote:
> Whacked at this, attempting to build a testcase from a 
> combination of the original daemon strace in the bug report
> and knowledge of what the daemon is doing.
> 
> What emerged is something that will destroy RHEL5 
> 2.6.18-128.1.6.el5 100% every time.  Completely fills the kernel 
> message log with "bad pmd" errors and wrecks hugepages.
> 

Ok, I can more or less confirm that. I reproduced the problem on 2.6.18-92.el5
on x86-64 running RHEL 5.2. I didn't have access to a machine with enough
memory, so I reduced the memory requirements slightly, and it still
triggered a failure.

However, when I ran 2.6.18, 2.6.19 and 2.6.29.1 on the same machine, I could
not reproduce the problem, nor could I cause hugepages to leak, so at the
moment I'm leaning towards believing this is a distribution bug.

On the plus side, thanks to your good work, there is hopefully enough
information available for them to bisect the problem.

> Unfortunately it only occasionally breaks 2.6.29.1.  Haven't
> been able to produce "bad pmd" messages, but did get the
> kernel to think it's out of large page memory when in
> theory it was not.  Saw a lot of really strange accounting
> in the hugepage section of /proc/meminfo.
> 

What sort of strange accounting? The accounting has changed since 2.6.18
so I want to be sure you're really seeing something weird. When I was
testing, I didn't see anything out of the ordinary but maybe I'm looking
in a different place.
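To compare notes on "strange accounting", it helps to pin down which counters are being read. The following is a minimal sketch, not taken from either poster's tooling: hypothetical helpers that parse the HugePages_Total, HugePages_Free, HugePages_Rsvd and HugePages_Surp lines of a /proc/meminfo snapshot and flag values that break the assumed invariants 0 <= Rsvd <= Free <= Total and Surp <= Total. Older kernels may lack some of these lines, in which case the field stays at -1.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the hugepage counters in /proc/meminfo.
 * A value of -1 means the line was not present in the snapshot. */
struct hp_counters {
	long total;
	long free;
	long rsvd;
	long surp;
};

/* Parse a /proc/meminfo snapshot held in a NUL-terminated buffer,
 * one "Name: value" line at a time. */
void hp_parse(const char *text, struct hp_counters *hp)
{
	const char *line = text;

	hp->total = hp->free = hp->rsvd = hp->surp = -1;
	while (line != NULL && *line != '\0') {
		long value;

		if (sscanf(line, "HugePages_Total: %ld", &value) == 1)
			hp->total = value;
		else if (sscanf(line, "HugePages_Free: %ld", &value) == 1)
			hp->free = value;
		else if (sscanf(line, "HugePages_Rsvd: %ld", &value) == 1)
			hp->rsvd = value;
		else if (sscanf(line, "HugePages_Surp: %ld", &value) == 1)
			hp->surp = value;

		/* Advance to the start of the next line, if any */
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
}

/* Returns 1 if the counters respect the assumed invariants, 0 otherwise.
 * Total and Free are required; Rsvd and Surp are checked only if present. */
int hp_consistent(const struct hp_counters *hp)
{
	if (hp->total < 0 || hp->free < 0 || hp->free > hp->total)
		return 0;
	if (hp->rsvd >= 0 && hp->rsvd > hp->free)
		return 0;
	if (hp->surp >= 0 && hp->surp > hp->total)
		return 0;
	return 1;
}
```

Running hp_parse() on snapshots taken before, during and after a test run makes it easier to say exactly which counter looked wrong.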

> For what it's worth, the testcase code is attached.
> 

I cleaned the test up a bit and wrote a wrapper script to run it
multiple times while checking for hugepage leaks. I have it running in a
loop while the machine runs sysbench as a stress test, to see if I can cause
anything out of the ordinary to happen. Nothing so far, though.
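The wrapper script itself (attached as test-tcbm.sh) is not reproduced in this archive. As a rough C equivalent of the leak check described above, the sketch below assumes that a leak shows up as HugePages_Free in /proc/meminfo not returning to its starting value once the test program has exited. The function names are hypothetical and this is not the attached script.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the HugePages_Free value from a meminfo-style stream,
 * or -1 if the line is not present. */
long hp_read_free(FILE *fp)
{
	char line[256];
	long value;

	while (fgets(line, sizeof(line), fp) != NULL)
		if (sscanf(line, "HugePages_Free: %ld", &value) == 1)
			return value;
	return -1;
}

/* Current HugePages_Free from /proc/meminfo, or -1 on failure. */
long hp_free_now(void)
{
	FILE *fp = fopen("/proc/meminfo", "r");
	long value;

	if (fp == NULL)
		return -1;
	value = hp_read_free(fp);
	fclose(fp);
	return value;
}

/* Run argv `iterations` times in sequence and store in *missing how many
 * free huge pages are gone after the last run compared with before the
 * first one (negative if the free count grew). Returns 0 on success,
 * -1 on error. Other hugepage users on the machine will skew the result. */
int hp_leak_check(char *const argv[], int iterations, long *missing)
{
	long baseline = hp_free_now();
	long now = baseline;
	int i;

	if (baseline < 0)
		return -1;
	for (i = 0; i < iterations; i++) {
		int status;
		pid_t pid = fork();

		if (pid == -1)
			return -1;
		if (pid == 0) {
			execvp(argv[0], argv);
			_exit(127);	/* exec failed */
		}
		if (waitpid(pid, &status, 0) == -1)
			return -1;
		now = hp_free_now();
		if (now < 0)
			return -1;
		printf("run %d: HugePages_Free %ld (baseline %ld)\n",
		       i, now, baseline);
	}
	*missing = baseline - now;
	return 0;
}
```

For example, with `char *args[] = { "./tcbm", NULL };`, calling `hp_leak_check(args, 10, &missing)` and finding `missing > 0` would point at leaked pages, since the test program removes both segments before exiting.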

> Note that hugepages=2048 is assumed--the bug seems to require 
> use of more than 50% of large page memory.
> 
> Definitely will be posted under the RHEL5 bug report, which is 
> the more pressing issue here than far-future kernel support.
> 

If you've filed a Red Hat bug, this modified testcase and wrapper script
might help them. The program exits and cleans up after itself, and its memory
requirements are lower. The script configures the machine in a way that
triggers the breakage for me: bad pmd messages and leaking hugepages.
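As a worked check on the reduced requirements, the sketch below counts the huge pages the two segments in tcbm.c need. The segment sizes are copied from tcbm.c; the 2MB huge page size (the x86-64 default) is an assumption, and the helper names are hypothetical.

```c
#include <stddef.h>

/* Segment sizes copied from tcbm.c */
#define LARGE_SHARED_SEGMENT_SIZE	((size_t)0x40000000)	/* 1GB */
#define SMALL_SHARED_SEGMENT_SIZE	((size_t)0x20000000)	/* 512MB */

/* Assumption: 2MB huge pages, as on a default x86-64 setup */
#define HPAGE_SIZE			((size_t)2 * 1024 * 1024)

/* Number of huge pages needed to back a segment, rounding up */
size_t hugepages_for(size_t bytes, size_t hpage_size)
{
	return (bytes + hpage_size - 1) / hpage_size;
}

/* Huge pages needed for both segments in the test */
size_t tcbm_hugepages(void)
{
	return hugepages_for(LARGE_SHARED_SEGMENT_SIZE, HPAGE_SIZE) +
	       hugepages_for(SMALL_SHARED_SEGMENT_SIZE, HPAGE_SIZE);
}
```

Under that assumption the large segment needs 512 pages and the small one 256, 768 in total, which is 37.5% of the 2048-page pool starlight assumed. The pool size the wrapper script sets up is not shown here, so whether a run crosses the 50% mark mentioned above depends on that setting.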

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

[-- Attachment #2: test-tcbm.sh --]
[-- Type: application/x-sh, Size: 603 bytes --]

[-- Attachment #3: tcbm.c --]
[-- Type: text/x-csrc, Size: 4385 bytes --]

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/shm.h>
#include <sys/resource.h>
#include <sys/mman.h>

#define LARGE_SHARED_SEGMENT_KEY	0x12345600
#define LARGE_SHARED_SEGMENT_SIZE	((size_t)0x40000000)
#define LARGE_SHARED_SEGMENT_ADDR	((void *)0x40000000)

#define SMALL_SHARED_SEGMENT_KEY	0x12345601
#define SMALL_SHARED_SEGMENT_SIZE	((size_t)0x20000000)
#define SMALL_SHARED_SEGMENT_ADDR	((void *)0x94000000)

#define NUM_SMALL_BUFFERS		50

char *helper_program = "echo";
char *helper_args[] = { "-n", ".", NULL };

void child_signal_handler(const int unused)
{
	int errno_save;
	pid_t dead_pid;
	int dead_status;

	errno_save = errno;

	do {
		dead_pid = waitpid(-1, &dead_status, WNOHANG);
		if (dead_pid == -1) {
			if (errno == ECHILD)
				break;
			perror("waitpid");
			exit(EXIT_FAILURE);
		}
	} while (dead_pid != 0);

	errno = errno_save;
	return;
}

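int rabbits(void)
{
	int sched_policy;
	pid_t pid;

	/* The parent returns immediately; only the child continues below */
	pid = fork();
	if (pid == -1) {
		perror("fork");
		return -1;
	}
	if (pid != 0)
		return 0;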

	sched_policy = sched_getscheduler(0);
	if (sched_policy == -1)
		perror("sched_getscheduler");

	/* Set the child's policy to SCHED_OTHER */
	if (sched_policy != SCHED_OTHER) {
		struct sched_param sched;
		memset(&sched, 0, sizeof(sched));
		sched.sched_priority = 0;
		if (sched_setscheduler(0, SCHED_OTHER, &sched) != 0)
			perror("sched_setscheduler");
	}

	/* If the inherited nice value is below -10, raise it to -10 */
	errno = 0;
	const int nice = getpriority(PRIO_PROCESS, 0);
	if (errno != 0)
		perror("getpriority");
	if (nice < -10)
		if (setpriority(PRIO_PROCESS, 0, -10) != 0)
			perror("setpriority");

	/* Launch helper program */
	execvp(helper_program, helper_args);
	perror("execvp");
	exit(EXIT_FAILURE);
}

int main(int argc, const char** argv, const char** envp)
{
	struct sched_param sched;
	struct sigaction sas_child;
	int i;

	/* Run under SCHED_RR at priority 26 (usually requires root) */
	memset(&sched, 0, sizeof(sched));
	sched.sched_priority = 26;
	if (sched_setscheduler(0, SCHED_RR, &sched) != 0) {
		perror("sched_setscheduler(SCHED_RR, 26)");
		return 1;
	}

	/* Set a signal handler for children exiting */
	memset(&sas_child, 0, sizeof(sas_child));
	sas_child.sa_handler = child_signal_handler;
	if (sigaction(SIGCHLD, &sas_child, NULL) != 0) {
		perror("sigaction(SIGCHLD)");
		return 1;
	}

	/* Create a large shared memory segment */
	int seg1id = shmget(LARGE_SHARED_SEGMENT_KEY,
				LARGE_SHARED_SEGMENT_SIZE,
				IPC_CREAT|SHM_HUGETLB|0640);
	if (seg1id == -1) {
		perror("shmget(LARGE_SEGMENT)");
		return 1;
	}

	/* Attach at the fixed address 0x40000000 (1GB) */
	void* seg1adr = shmat(seg1id, LARGE_SHARED_SEGMENT_ADDR, 0);
	if (seg1adr == (void*)-1) {
		perror("shmat(LARGE_SEGMENT)");
		return 1;
	}

	/* Initialise the start of the segment and mlock it */
	memset(seg1adr, 0xFF, LARGE_SHARED_SEGMENT_SIZE/2);
	if (mlock(seg1adr, LARGE_SHARED_SEGMENT_SIZE) != 0) {
		perror("mlock(LARGE_SEGMENT)");
		return 1;
	}

	/* Create a second smaller segment */
	int seg2id = shmget(SMALL_SHARED_SEGMENT_KEY,
				SMALL_SHARED_SEGMENT_SIZE,
				IPC_CREAT|SHM_HUGETLB|0640);
	if (seg2id == -1) {
		perror("shmget(SMALL_SEGMENT)");
		return 1;
	}

	/* Attach small segment */
	void *seg2adr = shmat(seg2id, SMALL_SHARED_SEGMENT_ADDR, 0);
	if (seg2adr == (void*) -1) {
		perror("shmat(SMALL_SEGMENT)");
		return 1;
	}

	/* Initialise all of small segment and mlock */
	memset(seg2adr, 0xFF, (size_t) SMALL_SHARED_SEGMENT_SIZE);
	if (mlock(seg2adr, (size_t) SMALL_SHARED_SEGMENT_SIZE) != 0) {
		perror("mlock(SMALL_SEGMENT)");
		return 1;
	}

	/* Create NUM_SMALL_BUFFERS private anonymous mappings of 516KB each */
	for (i = 0; i < NUM_SMALL_BUFFERS; i++) {
		void* mmtarg = mmap(NULL, 528384,
				PROT_READ|PROT_WRITE,
				MAP_PRIVATE|MAP_ANONYMOUS,
				-1, 0);
		if (mmtarg == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
	}

	/* Create one child per small buffer */
	for (i = 0; i < NUM_SMALL_BUFFERS; i++) {
		rabbits();
		usleep(500);
	}

	/* Wait until a full 3-second sleep() passes without a SIGCHLD */
	printf("Waiting for children\n");
	while (sleep(3) != 0);

	/* Detach */
	if (shmdt(seg1adr) == -1)
		perror("shmdt(LARGE_SEGMENT)");
	if (shmdt(seg2adr) == -1)
		perror("shmdt(SMALL_SEGMENT)");
	if (shmctl(seg1id, IPC_RMID, NULL) == -1)
		perror("shmrm(LARGE_SEGMENT)");
	if (shmctl(seg2id, IPC_RMID, NULL) == -1)
		perror("shmrm(SMALL_SEGMENT)");

	printf("Done\n");
	return 0;
}

