From: Jeff Dike <jdike@karaya.com>
To: Benjamin LaHaise <bcrl@redhat.com>
Cc: Daniel Phillips <phillips@bonn-fries.net>,
"H. Peter Anvin" <hpa@zytor.com>,
Alan Cox <alan@lxorguk.ukuu.org.uk>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC] Arch option to touch newly allocated pages
Date: Fri, 08 Mar 2002 14:17:53 -0500
Message-ID: <200203081917.OAA03071@ccure.karaya.com>
In-Reply-To: Your message of "Wed, 06 Mar 2002 20:52:29 EST." <20020306205229.A15048@redhat.com>
bcrl@redhat.com said:
> Versus fully allocating the backing store, which would neither hang
> nor cause segfaults. This is the behaviour that one expects by
> default, and should be the first line of defense before going to the
> overcommit model. Get that aspect of reliability in place, then add
> the overcommit support.
OK, the patch below (against UML 2.4.18-2) implements reliable overcommit
for UML.
The test was the same as before -
	64M tmpfs on /tmp
	two 64M UMLs
	one -j 2 kernel build running in each
tmpfs was exhausted nearly immediately. Both builds ran to completion.
At the end, the 64M tmpfs was divided roughly 30M/35M between the two UMLs.
The first chunk of the patch (mm.h) is the hook that I started this thread
talking about. It's a noop for all arches except UML (or s390 if they decide
they can use it).
The next two (asm/page.h and mem.c) implement the hook for UML. I believe
it correctly preserves the failure semantics of alloc_pages. Please let me
know if I missed something.
It tests for unbacked pages by writing to them and catching the resulting
SIGBUS. On a host with address space accounting, it would instead map the
page and catch the map failures.
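For readers who want to see the write-and-catch idea outside the kernel, here is a minimal user-space sketch. It is not the UML code: pages_are_backed(), map_tmpfile() and probe_sigbus() are hypothetical names invented for this illustration, and it uses sigsetjmp/siglongjmp where the patch uses UML's own fault catcher. It assumes a Linux host, where touching a shared file mapping past the end of the file raises SIGBUS.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

static sigjmp_buf probe_env;	/* where the SIGBUS handler jumps back to */

static void probe_sigbus(int sig)
{
	(void) sig;
	siglongjmp(probe_env, 1);
}

/*
 * Write one byte to each page in [addr, addr + len).
 * Returns 1 if every write succeeded, 0 if one raised SIGBUS,
 * -1 if the handler could not be installed.
 * The probe overwrites data, so it only suits freshly allocated pages.
 */
int pages_are_backed(void *addr, size_t len, size_t page_size)
{
	struct sigaction sa, old;
	volatile char *p = addr;
	size_t off;
	int backed;

	assert(page_size > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = probe_sigbus;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) < 0)
		return -1;

	if (sigsetjmp(probe_env, 1) == 0) {
		for (off = 0; off < len; off += page_size)
			p[off] = 0;
		backed = 1;
	} else {
		/* Arrived here via siglongjmp from the handler. */
		backed = 0;
	}

	sigaction(SIGBUS, &old, NULL);
	return backed;
}

/*
 * Test helper: map map_len bytes of an unlinked temporary file that is
 * only file_len bytes long. Pages past file_len have no backing store.
 * Returns NULL on failure.
 */
void *map_tmpfile(size_t file_len, size_t map_len)
{
	char path[] = "/tmp/probe-XXXXXX";
	int fd = mkstemp(path);
	void *addr;

	if (fd < 0)
		return NULL;
	unlink(path);
	if (ftruncate(fd, (off_t) file_len) < 0) {
		close(fd);
		return NULL;
	}
	addr = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	return addr == MAP_FAILED ? NULL : addr;
}
```

Calling pages_are_backed() on a two-page mapping of a two-page file should report the range as backed; on a two-page mapping of a one-page file, the write to the second page faults and the function reports it as unbacked. The sketch stops at detection and does not retry the allocation the way the patch does.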
The rest of the patch is UML bug fixes which you're only interested in if
you want to boot it up.
One bug - if alloc_pages returns a combination of backed and unbacked pages
for an order > 0 allocation, the backed pages will effectively be leaked.
TBD -
	a corresponding arch hook in free_pages which UML can use for
		MADV_DONTNEED
	some way of poking at unbacked pages to see if they are now backed
		and can be released back to free_pages
These two items would go some way to allowing multiple UMLs to pass host
memory back and forth as needed when it gets scarce.
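As a rough illustration of the first TBD item, the sketch below shows what madvise(MADV_DONTNEED) does to a private anonymous mapping on a Linux host: the page's contents are dropped and the next access sees a zero-filled page. release_and_reread() is a hypothetical helper written for this example. It is not a free_pages hook, and it makes no claim about how MADV_DONTNEED behaves on the kind of mapping UML actually uses for its physical memory.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Dirty one anonymous page with `marker`, hand it back to the host with
 * MADV_DONTNEED, then read the first byte again.
 * Returns the byte seen after the release, or -1 on failure.
 */
int release_and_reread(unsigned char marker)
{
	long ps = sysconf(_SC_PAGESIZE);
	unsigned char *page;
	int seen;

	assert(ps > 0);
	page = mmap(NULL, (size_t) ps, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;

	page[0] = marker;	/* forces the host to back the page */

	if (madvise(page, (size_t) ps, MADV_DONTNEED) < 0) {
		munmap(page, (size_t) ps);
		return -1;
	}

	seen = page[0];		/* faults in a fresh zero page */
	munmap(page, (size_t) ps);
	return seen;
}
```

With a nonzero marker, the value read back after the release is 0, which shows the host discarded the old page rather than keeping it resident.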
Jeff
diff -Naur um/include/linux/mm.h back/include/linux/mm.h
--- um/include/linux/mm.h Thu Mar 7 11:56:36 2002
+++ back/include/linux/mm.h Thu Mar 7 11:57:31 2002
@@ -358,6 +358,13 @@
extern struct page * FASTCALL(__alloc_pages(unsigned int gfp_mask, unsigned int order, zonelist_t *zonelist));
extern struct page * alloc_pages_node(int nid, unsigned int gfp_mask, unsigned int order);
+#ifndef HAVE_ARCH_VALIDATE
+static inline struct page *arch_validate(struct page *page, unsigned int gfp_mask, int order)
+{
+ return(page);
+}
+#endif
+
static inline struct page * alloc_pages(unsigned int gfp_mask, unsigned int order)
{
/*
@@ -365,7 +372,7 @@
*/
if (order >= MAX_ORDER)
return NULL;
- return _alloc_pages(gfp_mask, order);
+ return arch_validate(_alloc_pages(gfp_mask, order), gfp_mask, order);
}
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
diff -Naur um/include/asm-um/page.h back/include/asm-um/page.h
--- um/include/asm-um/page.h Mon Mar 4 17:27:34 2002
+++ back/include/asm-um/page.h Thu Mar 7 11:57:01 2002
@@ -42,4 +42,7 @@
#define virt_to_page(kaddr) (mem_map + (__pa(kaddr) >> PAGE_SHIFT))
#define VALID_PAGE(page) ((page - mem_map) < max_mapnr)
+extern struct page *arch_validate(struct page *page, int mask, int order);
+#define HAVE_ARCH_VALIDATE
+
#endif
diff -Naur um/arch/um/kernel/mem.c back/arch/um/kernel/mem.c
--- um/arch/um/kernel/mem.c Mon Mar 4 17:27:34 2002
+++ back/arch/um/kernel/mem.c Thu Mar 7 11:57:17 2002
@@ -212,6 +212,39 @@
" just be swapped out.\n Example: mem=64M\n\n"
);
+struct page *arch_validate(struct page *page, int mask, int order)
+{
+ unsigned long addr, zero = 0;
+ int i;
+
+ again:
+ if(page == NULL) return(page);
+ addr = (unsigned long) page_address(page);
+ for(i = 0; i < (1 << order); i++){
+ current->thread.fault_addr = (void *) addr;
+ if(__do_copy_to_user((void *) addr, &zero,
+ sizeof(zero),
+ &current->thread.fault_addr,
+ &current->thread.fault_catcher)){
+ if(!(mask & __GFP_WAIT)) return(NULL);
+ else break;
+ }
+ addr += PAGE_SIZE;
+ }
+ if(i == (1 << order)) return(page);
+ page = _alloc_pages(mask, order);
+ goto again;
+}
+
+extern void relay_signal(int sig, void *sc, int usermode);
+
+void bus_handler(int sig, void *sc, int usermode)
+{
+ if(current->thread.fault_catcher != NULL)
+ do_longjmp(current->thread.fault_catcher);
+ else relay_signal(sig, sc, usermode);
+}
+
/*
* Overrides for Emacs so that we follow Linus's tabbing style.
* Emacs will notice this stuff at the end of the file and automatically
diff -Naur um/arch/um/kernel/exec_kern.c back/arch/um/kernel/exec_kern.c
--- um/arch/um/kernel/exec_kern.c Mon Mar 4 17:27:34 2002
+++ back/arch/um/kernel/exec_kern.c Mon Mar 4 18:05:20 2002
@@ -38,6 +38,12 @@
int new_pid;
stack = alloc_stack();
+ if(stack == 0){
+ printk(KERN_ERR
+ "flush_thread : failed to allocate temporary stack\n");
+ do_exit(SIGKILL);
+ }
+
new_pid = start_fork_tramp((void *) current->thread.kernel_stack,
stack, 0, exec_tramp);
if(new_pid < 0){
diff -Naur um/arch/um/kernel/process_kern.c back/arch/um/kernel/process_kern.c
--- um/arch/um/kernel/process_kern.c Mon Mar 4 17:27:34 2002
+++ back/arch/um/kernel/process_kern.c Mon Mar 4 18:05:20 2002
@@ -141,7 +141,7 @@
unsigned long page;
if((page = __get_free_page(GFP_KERNEL)) == 0)
- panic("Couldn't allocate new stack");
+ return(0);
stack_protections(page);
return(page);
}
@@ -318,6 +318,11 @@
panic("copy_thread : pipe failed");
if(current->thread.forking){
stack = alloc_stack();
+ if(stack == 0){
+ printk(KERN_ERR "copy_thread : failed to allocate "
+ "temporary stack\n");
+ return(-ENOMEM);
+ }
clone_vm = (p->mm == current->mm);
p->thread.temp_stack = stack;
new_pid = start_fork_tramp((void *) p->thread.kernel_stack,
diff -Naur um/arch/um/kernel/trap_kern.c back/arch/um/kernel/trap_kern.c
--- um/arch/um/kernel/trap_kern.c Mon Mar 4 17:27:34 2002
+++ back/arch/um/kernel/trap_kern.c Mon Mar 4 18:05:20 2002
@@ -30,6 +30,7 @@
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
struct siginfo si;
+ void *catcher;
pgd_t *pgd;
pmd_t *pmd;
pte_t *pte;
@@ -40,6 +41,7 @@
return(0);
}
if(mm == NULL) panic("Segfault with no mm");
+ catcher = current->thread.fault_catcher;
si.si_code = SEGV_MAPERR;
down_read(&mm->mmap_sem);
vma = find_vma(mm, address);
@@ -84,10 +86,10 @@
up_read(&mm->mmap_sem);
return(0);
bad:
- if (current->thread.fault_catcher != NULL) {
+ if(catcher != NULL) {
current->thread.fault_addr = (void *) address;
up_read(&mm->mmap_sem);
- do_longjmp(current->thread.fault_catcher);
+ do_longjmp(catcher);
}
else if(current->thread.fault_addr != NULL){
panic("fault_addr set but no fault catcher");
@@ -120,6 +122,7 @@
void relay_signal(int sig, void *sc, int usermode)
{
+ if(!usermode) panic("Kernel mode signal %d", sig);
force_sig(sig, current);
}
diff -Naur um/arch/um/kernel/trap_user.c back/arch/um/kernel/trap_user.c
--- um/arch/um/kernel/trap_user.c Mon Mar 4 17:27:34 2002
+++ back/arch/um/kernel/trap_user.c Mon Mar 4 18:05:20 2002
@@ -420,11 +420,13 @@
extern int timer_ready, timer_on;
+extern void bus_handler(int sig, void *sc, int usermode);
+
static void (*handlers[])(int, void *, int) = {
[ SIGTRAP ] relay_signal,
[ SIGFPE ] relay_signal,
[ SIGILL ] relay_signal,
- [ SIGBUS ] relay_signal,
+ [ SIGBUS ] bus_handler,
[ SIGSEGV] segv_handler,
[ SIGIO ] sigio_handler,
[ SIGVTALRM ] timer_handler,