From: lwoodman@redhat.com
To: Hugh Dickins <hughd@google.com>,
Felix von Leitner <felix-linuxkernel@fefe.de>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: fork on processes with lots of memory
Date: Fri, 26 Feb 2016 12:41:07 -0500
Message-ID: <56D08E33.2080100@redhat.com>
In-Reply-To: <alpine.LSU.2.11.1601271905210.2349@eggly.anvils>
[-- Attachment #1: Type: text/plain, Size: 2623 bytes --]
On 01/27/2016 10:09 PM, Hugh Dickins wrote:
> On Tue, 26 Jan 2016, Felix von Leitner wrote:
>>> Dear Linux kernel devs,
>>> I talked to someone who uses large Linux based hardware to run a
>>> process with huge memory requirements (think 4 GB), and he told me that
>>> if they do a fork() syscall on that process, the whole system comes to
>>> standstill. And not just for a second or two. He said they measured a 45
>>> minute (!) delay before the system became responsive again.
>> I'm sorry, I meant 4 TB not 4 GB.
>> I'm not used to working with that kind of memory sizes.
>>
>>> Their working theory is that all the pages need to be marked copy-on-write
>>> in both processes, and if you touch one page, a copy needs to be made,
>>> and that just takes a while if you have a billion pages.
>>> I was wondering if there is any advice for such situations from the
>>> memory management people on this list.
>>> In this case the fork was for an execve afterwards, but I was going to
>>> recommend fork to them for something else that can not be tricked around
>>> with vfork.
>>> Can anyone comment on whether the 45 minute number sounds like it could
>>> be real? When I heard it, I was flabbergasted. But the other person
>>> swore it was real. Can a fork cause this much of a delay? Is there a way
>>> to work around it?
>>> I was going to recommend the fork to create a boundary between the
>>> processes, so that you can recover from memory corruption in one
>>> process. In fact, after the fork I would want to munmap almost all of
>>> the shared pages anyway, but there is no way to tell fork that.
> You might find madvise(addr, length, MADV_DONTFORK) helpful:
> that tells fork not to duplicate the given range in the child.
>
> Hugh
I don't know exactly what program they are running, but we test RHEL with
up to 24TB of memory and have not seen this problem. I have mmap()'d 12TB
of private memory into a parent process, touched every page, then forked a
child which wrote to every page, thereby incurring tons of ZFOD and COW
faults. It takes a while to process the 6 billion faults, but the system
didn't come to a halt. The time I do see significant pauses is when we
overcommit RAM and swap space and get into an OOM-kill storm.
Attached is the program:
>
>>> Thanks,
>>> Felix
>>> PS: Please put me on Cc if you reply, I'm not subscribed to this mailing
>>> list.
[-- Attachment #2: forkoff.c --]
[-- Type: text/plain, Size: 1401 bytes --]
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <errno.h>
#include <stdio.h>

#define PAGE_SZ 4096UL	/* assumes 4 KiB base pages, as in the original loops */

int main(int argc, char *argv[])
{
	unsigned long siz, procs, iterations, cow, j, k;
	char *ptr1;
	char *i;
	pid_t pid;
	int status;

	/* All four arguments are required: argv[1]..argv[4] are read below. */
	if (argc != 5) {
		printf("bad args, usage: forkoff <memsize-in-GB> #children #iterations cow:0|1\n");
		exit(EXIT_FAILURE);
	}
	siz = strtoul(argv[1], NULL, 10) * 1024 * 1024 * 1024;
	procs = strtoul(argv[2], NULL, 10);
	iterations = strtoul(argv[3], NULL, 10);
	cow = strtoul(argv[4], NULL, 10);

	printf("mmaping %lu anonymous bytes\n", siz);
	ptr1 = mmap(NULL, siz, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	if (ptr1 == MAP_FAILED) {
		perror("mmap");
		exit(EXIT_FAILURE);
	}

	if (cow) {
		printf("priming parent for child COW faults\n");
		/* This will cause the ZFOD faults in the parent & COW faults in the children. */
		for (i = ptr1; i < ptr1 + siz - 1; i += PAGE_SZ)
			*i = 'i';
	}

	printf("forking %lu processes\n", procs);
	fflush(stdout);	/* keep buffered output from being repeated by each child */
	for (k = 0; k < procs; k++) {
		pid = fork();
		if (pid == -1) {
			perror("fork failure");
			exit(EXIT_FAILURE);
		} else if (!pid) {
			printf("PID %d touching %lu pages\n", (int)getpid(), siz / PAGE_SZ);
			/* This will ZFOD fault if the parent didn't prime, otherwise it will COW fault. */
			for (j = 0; j < iterations; j++) {
				for (i = ptr1; i < ptr1 + siz - 1; i += PAGE_SZ)
					*i = 'i';
			}
			printf("All done, exiting\n");
			exit(0);
		}
	}

	/* Reap every child before exiting. */
	while (procs-- && wait(&status) > 0)
		;
	return 0;
}