From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail138.messagelabs.com (mail138.messagelabs.com [216.82.249.35]) by kanga.kvack.org (Postfix) with SMTP id 040046B004D for ; Mon, 8 Jun 2009 10:37:34 -0400 (EDT) Received: from localhost (smtp.ultrahosting.com [127.0.0.1]) by smtp.ultrahosting.com (Postfix) with ESMTP id E2C8F82C4DD for ; Mon, 8 Jun 2009 12:14:09 -0400 (EDT) Received: from smtp.ultrahosting.com ([74.213.175.254]) by localhost (smtp.ultrahosting.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id G1SJdw4KYgSR for ; Mon, 8 Jun 2009 12:14:05 -0400 (EDT) Received: from gentwo.org (unknown [74.213.171.31]) by smtp.ultrahosting.com (Postfix) with ESMTP id 73FA382C4DB for ; Mon, 8 Jun 2009 12:14:02 -0400 (EDT) Date: Mon, 8 Jun 2009 11:34:06 -0400 (EDT) From: Christoph Lameter Subject: Re: [PATCH 2/3] vmscan: make mapped executable pages the first class citizen In-Reply-To: <20090608091201.953724007@intel.com> Message-ID: References: <20090608091044.880249722@intel.com> <20090608091201.953724007@intel.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org To: Wu Fengguang Cc: Andrew Morton , KOSAKI Motohiro , Elladan , Nick Piggin , Andi Kleen , Rik van Riel , Peter Zijlstra , Johannes Weiner , "tytso@mit.edu" , "linux-mm@kvack.org" , "minchan.kim@gmail.com" List-ID: On Mon, 8 Jun 2009, Wu Fengguang wrote: > 1.2) test scenario > > - nfsroot gnome desktop with 512M physical memory > - run some programs, and switch between the existing windows > after starting each new program. Is there a predefined sequence or does this vary between tests? Scripted? What percentage of time is saved in the test after due to the modifications? Around 20%? 
> (1) begin: shortly after the big read IO starts; > (2) end: just before the big read IO stops; > (3) restore: the big read IO stops and the zsh working set restored > (4) restore X: after IO, switch back and forth between the urxvt and firefox > windows to restore their working set. Any action done on the firefox sessions? Or just switch to a firefox session that needs to redraw? > The above console numbers show that > > - The startup pgmajfault of 2.6.30-rc4-mm is merely 1/3 that of 2.6.29. > I'd attribute that improvement to the mmap readahead improvements :-) So there are other effects,,, You not measuring the effect only this patchset? > - The pgmajfault increment during the file copy is 633-630=3 vs 260-210=50. > That's a huge improvement - which means with the VM_EXEC protection logic, > active mmap pages is pretty safe even under partially cache hot streaming IO. Looks good. > - The absolute nr_mapped drops considerably to 1/9 during the big IO, and the > dropped pages are mostly inactive ones. The patch has almost no impact in > this aspect, that means it won't unnecessarily increase memory pressure. > (In contrast, your 20% mmap protection ratio will keep them all, and > therefore eliminate the extra 41 major faults to restore working set > of zsh etc.) Good. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail202.messagelabs.com (mail202.messagelabs.com [216.82.254.227]) by kanga.kvack.org (Postfix) with SMTP id F22CA6B004F for ; Mon, 8 Jun 2009 13:30:27 -0400 (EDT)
Received: by yw-out-1718.google.com with SMTP id 5so1832719ywm.26 for ; Mon, 08 Jun 2009 10:30:51 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To:
References: <20090608091044.880249722@intel.com> <20090608091201.953724007@intel.com>
Date: Tue, 9 Jun 2009 01:30:51 +0800
Message-ID:
Subject: Re: [PATCH 2/3] vmscan: make mapped executable pages the first class citizen
From: Nai Xia
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
To: Christoph Lameter
Cc: Wu Fengguang , Andrew Morton , KOSAKI Motohiro , Elladan , Nick Piggin , Andi Kleen , Rik van Riel , Peter Zijlstra , Johannes Weiner , "tytso@mit.edu" , "linux-mm@kvack.org" , "minchan.kim@gmail.com"
List-ID:

On Mon, Jun 8, 2009 at 11:34 PM, Christoph Lameter wrote:
> On Mon, 8 Jun 2009, Wu Fengguang wrote:
>
>> 1.2) test scenario
>>
>> - nfsroot gnome desktop with 512M physical memory
>> - run some programs, and switch between the existing windows
>>   after starting each new program.
>
> Is there a predefined sequence or does this vary between tests? Scripted?
>
> What percentage of time is saved in the test after due to the
> modifications?
> Around 20%?

I think measuring the percentage of saved time may not be a good idea.
The major underlying factor in the time taken to switch GUI windows may
vary from application to application, distribution to distribution, and
machine to machine. It's not reproducible.

I get ridiculous timings when switching from any window to a slickedit
window, because of its damn slow redrawing method. I bet this patch will
gain at most 1% on timing for this case.
:)

>
>> (1) begin:     shortly after the big read IO starts;
>> (2) end:       just before the big read IO stops;
>> (3) restore:   the big read IO stops and the zsh working set restored
>> (4) restore X: after IO, switch back and forth between the urxvt and firefox
>>                windows to restore their working set.
>
> Any action done on the firefox sessions? Or just switch to a firefox
> session that needs to redraw?
>
>> The above console numbers show that
>>
>> - The startup pgmajfault of 2.6.30-rc4-mm is merely 1/3 that of 2.6.29.
>>   I'd attribute that improvement to the mmap readahead improvements :-)
>
> So there are other effects,,, You not measuring the effect only this
> patchset?
>
>> - The pgmajfault increment during the file copy is 633-630=3 vs 260-210=50.
>>   That's a huge improvement - which means with the VM_EXEC protection logic,
>>   active mmap pages is pretty safe even under partially cache hot streaming IO.
>
> Looks good.
>
>> - The absolute nr_mapped drops considerably to 1/9 during the big IO, and the
>>   dropped pages are mostly inactive ones. The patch has almost no impact in
>>   this aspect, that means it won't unnecessarily increase memory pressure.
>>   (In contrast, your 20% mmap protection ratio will keep them all, and
>>   therefore eliminate the extra 41 major faults to restore working set
>>   of zsh etc.)
>
> Good.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: email@kvack.org
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19]) by kanga.kvack.org (Postfix) with SMTP id 1D6F46B004D for ; Mon, 8 Jun 2009 23:09:26 -0400 (EDT) Date: Tue, 9 Jun 2009 11:28:23 +0800 From: Wu Fengguang Subject: Re: [PATCH 2/3] vmscan: make mapped executable pages the first class citizen Message-ID: <20090609032823.GC7875@localhost> References: <20090608091044.880249722@intel.com> <20090608091201.953724007@intel.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="yrj/dFKFPuw6o+aM" Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org To: Christoph Lameter Cc: Andrew Morton , KOSAKI Motohiro , Elladan , Nick Piggin , Andi Kleen , Rik van Riel , Peter Zijlstra , Johannes Weiner , "tytso@mit.edu" , "linux-mm@kvack.org" , "minchan.kim@gmail.com" List-ID: --yrj/dFKFPuw6o+aM Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Mon, Jun 08, 2009 at 11:34:06PM +0800, Christoph Lameter wrote: > On Mon, 8 Jun 2009, Wu Fengguang wrote: > > > 1.2) test scenario > > > > - nfsroot gnome desktop with 512M physical memory > > - run some programs, and switch between the existing windows > > after starting each new program. > > Is there a predefined sequence or does this vary between tests? Scripted? Yes it's scripted testing and has a predefined sequence. The scripts are attached for your reference. > What percentage of time is saved in the test after due to the > modifications? > Around 20%? It's 50%, hehe. 
I've posted the startup times for each program:

  before    after    programs
    0.02     0.02    N xeyes
    0.75     0.76    N firefox
    2.02     1.88    N nautilus
    3.36     3.17    N nautilus --browser
    5.26     4.89    N gthumb
    7.12     6.47    N gedit
    9.22     8.16    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
   13.58    12.55    N xterm
   15.87    14.57    N mlterm
   18.63    17.06    N gnome-terminal
   21.16    18.90    N urxvt
   26.24    23.48    N gnome-system-monitor
   28.72    26.52    N gnome-help
   32.15    29.65    N gnome-dictionary
   39.66    36.12    N /usr/games/sol
   43.16    39.27    N /usr/games/gnometris
   48.65    42.56    N /usr/games/gnect
   53.31    47.03    N /usr/games/gtali
   58.60    52.05    N /usr/games/iagno
   65.77    55.42    N /usr/games/gnotravex
   70.76    61.47    N /usr/games/mahjongg
   76.15    67.11    N /usr/games/gnome-sudoku
   86.32    75.15    N /usr/games/glines
   92.21    79.70    N /usr/games/glchess
  103.79    88.48    N /usr/games/gnomine
  113.84    96.51    N /usr/games/gnotski
  124.40   102.19    N /usr/games/gnibbles
  137.41   114.93    N /usr/games/gnobots2
  155.53   125.02    N /usr/games/blackjack
  179.85   135.11    N /usr/games/same-gnome
  224.49   154.50    N /usr/bin/gnome-window-properties
  248.44   162.09    N /usr/bin/gnome-default-applications-properties
  282.62   173.29    N /usr/bin/gnome-at-properties
  323.72   188.21    N /usr/bin/gnome-typing-monitor
  363.99   199.93    N /usr/bin/gnome-at-visual
  394.21   206.95    N /usr/bin/gnome-sound-properties
  435.14   224.49    N /usr/bin/gnome-at-mobility
  463.05   234.11    N /usr/bin/gnome-keybinding-properties
  503.75   248.59    N /usr/bin/gnome-about-me
  554.00   276.27    N /usr/bin/gnome-display-properties
  615.48   304.39    N /usr/bin/gnome-network-preferences
  693.03   342.01    N /usr/bin/gnome-mouse-properties
  759.90   388.58    N /usr/bin/gnome-appearance-properties
  937.90   508.47    N /usr/bin/gnome-control-center
 1109.75   587.57    N /usr/bin/gnome-keyboard-properties
 1399.05   758.16    N : oocalc
 1524.64   830.03    N : oodraw
 1684.31   900.03    N : ooimpress
 1874.04   993.91    N : oomath
 2115.12  1081.89    N : ooweb
 2369.02  1161.99    N : oowriter

> > (1) begin:     shortly after the big read IO starts;
> > (2) end:       just before the big read IO stops;
> > (3) restore:   the big read IO stops and the zsh working set restored
> > (4) restore X: after IO, switch back and forth between the urxvt and firefox
> >                windows to restore their working set.
>
> Any action done on the firefox sessions? Or just switch to a firefox
> session that needs to redraw?

After starting each new program, a new tab is opened in firefox to
render a simple web page. It's the same web page, so firefox may
actually cache it.

> > The above console numbers show that
> >
> > - The startup pgmajfault of 2.6.30-rc4-mm is merely 1/3 that of 2.6.29.
> >   I'd attribute that improvement to the mmap readahead improvements :-)
>
> So there are other effects,,, You not measuring the effect only this
> patchset?

Yes there are additional effects in the .29 vs .30 comparisons. But the
following .30 vs .30 comparisons in X can lead to the same conclusions
except for this additional effect.

> > - The pgmajfault increment during the file copy is 633-630=3 vs 260-210=50.
> >   That's a huge improvement - which means with the VM_EXEC protection logic,
> >   active mmap pages is pretty safe even under partially cache hot streaming IO.
>
> Looks good.
>
> > - The absolute nr_mapped drops considerably to 1/9 during the big IO, and the
> >   dropped pages are mostly inactive ones. The patch has almost no impact in
> >   this aspect, that means it won't unnecessarily increase memory pressure.
> >   (In contrast, your 20% mmap protection ratio will keep them all, and
> >   therefore eliminate the extra 41 major faults to restore working set
> >   of zsh etc.)
>
> Good.

Thanks,
Fengguang

--yrj/dFKFPuw6o+aM
Content-Type: application/x-sh
Content-Disposition: attachment; filename="run-many-x-apps.sh"
Content-Transfer-Encoding: quoted-printable

#!/bin/zsh
# why zsh? bash does not support floating numbers

# aptitude install wmctrl iceweasel gnome-games gnome-control-center
# aptitude install openoffice.org # and uncomment the oo* lines


read T0 T1 < /proc/uptime

function progress()
{
	read t0 t1 < /proc/uptime
	t=$((t0 - T0))
	printf "%8.2f " $t
	echo "$@"
}

function switch_windows()
{
	wmctrl -l | while read a b c win
	do
		progress A "$win"
		wmctrl -a "$win"
	done
	firefox /usr/share/doc/debian/FAQ/index.html
}

while read app args
do
	progress N $app $args
	$app $args &
	switch_windows
done << EOF
xeyes
firefox
nautilus
nautilus --browser
gthumb
gedit
xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf

xterm
mlterm
gnome-terminal
urxvt

gnome-system-monitor
gnome-help
gnome-dictionary

/usr/games/sol
/usr/games/gnometris
/usr/games/gnect
/usr/games/gtali
/usr/games/iagno
/usr/games/gnotravex
/usr/games/mahjongg
/usr/games/gnome-sudoku
/usr/games/glines
/usr/games/glchess
/usr/games/gnomine
/usr/games/gnotski
/usr/games/gnibbles
/usr/games/gnobots2
/usr/games/blackjack
/usr/games/same-gnome

/usr/bin/gnome-window-properties
/usr/bin/gnome-default-applications-properties
/usr/bin/gnome-at-properties
/usr/bin/gnome-typing-monitor
/usr/bin/gnome-at-visual
/usr/bin/gnome-sound-properties
/usr/bin/gnome-at-mobility
/usr/bin/gnome-keybinding-properties
/usr/bin/gnome-about-me
/usr/bin/gnome-display-properties
/usr/bin/gnome-network-preferences
/usr/bin/gnome-mouse-properties
/usr/bin/gnome-appearance-properties
/usr/bin/gnome-control-center
/usr/bin/gnome-keyboard-properties

: oocalc
: oodraw
: ooimpress
: oomath
: ooweb
: oowriter

EOF

--yrj/dFKFPuw6o+aM
Content-Type: application/x-sh
Content-Disposition: attachment; filename="test-mmap-exec-prot.sh"
Content-Transfer-Encoding: quoted-printable

#!/bin/sh

prot=$( free.$prot

--yrj/dFKFPuw6o+aM--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wu Fengguang
Subject: [PATCH 0/3] make mapped executable pages the first class citizen (with test cases)
Date: Mon, 08 Jun 2009 17:10:44 +0800
Message-ID: <20090608091044.880249722@intel.com>
Return-path:
Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with SMTP id 6B3D46B004D for ; Mon, 8 Jun 2009 04:06:27 -0400 (EDT)
Sender: owner-linux-mm@kvack.org
To: Andrew Morton
Cc: KOSAKI Motohiro , "Wu, Fengguang" , Andi Kleen , Christoph Lameter , Elladan , Nick Piggin , Johannes Weiner , Peter Zijlstra , Rik van Riel , "tytso@mit.edu" , "linux-mm@kvack.org" , "minchan.kim@gmail.com"
List-Id: linux-mm.kvack.org

Andrew,

I managed to back this patchset with two test cases :)

They demonstrated that
- X desktop responsiveness can be *doubled* under high memory/swap pressure
- it can almost stop major faults when the active file list is slowly
  scanned because of undergoing partially cache hot streaming IO

The details are included in the changelog.

Thanks,
Fengguang
--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wu Fengguang Subject: [PATCH 3/3] vmscan: merge duplicate code in shrink_active_list() Date: Mon, 08 Jun 2009 17:10:47 +0800 Message-ID: <20090608091202.039509146@intel.com> References: <20090608091044.880249722@intel.com> Return-path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with SMTP id A81526B004F for ; Mon, 8 Jun 2009 04:06:27 -0400 (EDT) Content-Disposition: inline; filename=mm-vmscan-reduce-code.patch Sender: owner-linux-mm@kvack.org To: Andrew Morton Cc: KOSAKI Motohiro , Pekka Enberg , Peter Zijlstra , Wu Fengguang , Andi Kleen , Christoph Lameter , Elladan , Nick Piggin , Johannes Weiner , Rik van Riel , "tytso@mit.edu" , "linux-mm@kvack.org" , "minchan.kim@gmail.com" List-Id: linux-mm.kvack.org The "move pages to active list" and "move pages to inactive list" code blocks are mostly identical and can be served by a function. Thanks to Andrew Morton for pointing this out. Note that buffer_heads_over_limit check will also be carried out for re-activated pages, which is slightly different from pre-2.6.28 kernels. Also, Rik's "vmscan: evict use-once pages first" patch could totally stop scans of active file list when memory pressure is low. So the net effect could be, the number of buffer heads is now more likely to grow large. However that's fine according to Johannes' comments: I don't think that this could be harmful. We just preserve the buffer mappings of what we consider the working set and with low memory pressure, as you say, this set is not big. As to stripping of reactivated pages: the only pages we re-activate for now are those VM_EXEC mapped ones. Since we don't expect IO from or to these pages, removing the buffer mappings in case they grow too large should be okay, I guess. 
CC: Pekka Enberg Acked-by: Peter Zijlstra Reviewed-by: Rik van Riel Reviewed-by: Minchan Kim Reviewed-by: Johannes Weiner Signed-off-by: Wu Fengguang --- mm/vmscan.c | 95 ++++++++++++++++++++++---------------------------- 1 file changed, 42 insertions(+), 53 deletions(-) --- linux.orig/mm/vmscan.c +++ linux/mm/vmscan.c @@ -1211,6 +1211,43 @@ static inline void note_zone_scanning_pr * But we had to alter page->flags anyway. */ +static void move_active_pages_to_lru(struct zone *zone, + struct list_head *list, + enum lru_list lru) +{ + unsigned long pgmoved = 0; + struct pagevec pvec; + struct page *page; + + pagevec_init(&pvec, 1); + + while (!list_empty(list)) { + page = lru_to_page(list); + prefetchw_prev_lru_page(page, list, flags); + + VM_BUG_ON(PageLRU(page)); + SetPageLRU(page); + + VM_BUG_ON(!PageActive(page)); + if (!is_active_lru(lru)) + ClearPageActive(page); /* we are de-activating */ + + list_move(&page->lru, &zone->lru[lru].list); + mem_cgroup_add_lru_list(page, lru); + pgmoved++; + + if (!pagevec_add(&pvec, page) || list_empty(list)) { + spin_unlock_irq(&zone->lru_lock); + if (buffer_heads_over_limit) + pagevec_strip(&pvec); + __pagevec_release(&pvec); + spin_lock_irq(&zone->lru_lock); + } + } + __mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved); + if (!is_active_lru(lru)) + __count_vm_events(PGDEACTIVATE, pgmoved); +} static void shrink_active_list(unsigned long nr_pages, struct zone *zone, struct scan_control *sc, int priority, int file) @@ -1222,8 +1259,6 @@ static void shrink_active_list(unsigned LIST_HEAD(l_active); LIST_HEAD(l_inactive); struct page *page; - struct pagevec pvec; - enum lru_list lru; struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); lru_add_drain(); @@ -1240,6 +1275,7 @@ static void shrink_active_list(unsigned } reclaim_stat->recent_scanned[!!file] += pgmoved; + __count_zone_vm_events(PGREFILL, zone, pgscanned); if (file) __mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved); else @@ -1282,8 +1318,6 @@ 
static void shrink_active_list(unsigned /* * Move pages back to the lru list. */ - pagevec_init(&pvec, 1); - spin_lock_irq(&zone->lru_lock); /* * Count referenced pages from currently used mappings as rotated, @@ -1293,57 +1327,12 @@ static void shrink_active_list(unsigned */ reclaim_stat->recent_rotated[!!file] += pgmoved; - pgmoved = 0; /* count pages moved to inactive list */ - lru = LRU_BASE + file * LRU_FILE; - while (!list_empty(&l_inactive)) { - page = lru_to_page(&l_inactive); - prefetchw_prev_lru_page(page, &l_inactive, flags); - VM_BUG_ON(PageLRU(page)); - SetPageLRU(page); - VM_BUG_ON(!PageActive(page)); - ClearPageActive(page); - - list_move(&page->lru, &zone->lru[lru].list); - mem_cgroup_add_lru_list(page, lru); - pgmoved++; - if (!pagevec_add(&pvec, page)) { - spin_unlock_irq(&zone->lru_lock); - if (buffer_heads_over_limit) - pagevec_strip(&pvec); - __pagevec_release(&pvec); - spin_lock_irq(&zone->lru_lock); - } - } - __mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved); - __count_zone_vm_events(PGREFILL, zone, pgscanned); - __count_vm_events(PGDEACTIVATE, pgmoved); - - pgmoved = 0; /* count pages moved back to active list */ - lru = LRU_ACTIVE + file * LRU_FILE; - while (!list_empty(&l_active)) { - page = lru_to_page(&l_active); - prefetchw_prev_lru_page(page, &l_active, flags); - VM_BUG_ON(PageLRU(page)); - SetPageLRU(page); - VM_BUG_ON(!PageActive(page)); - - list_move(&page->lru, &zone->lru[lru].list); - mem_cgroup_add_lru_list(page, lru); - pgmoved++; - if (!pagevec_add(&pvec, page)) { - spin_unlock_irq(&zone->lru_lock); - if (buffer_heads_over_limit) - pagevec_strip(&pvec); - __pagevec_release(&pvec); - spin_lock_irq(&zone->lru_lock); - } - } - __mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved); + move_active_pages_to_lru(zone, &l_active, + LRU_ACTIVE + file * LRU_FILE); + move_active_pages_to_lru(zone, &l_inactive, + LRU_BASE + file * LRU_FILE); spin_unlock_irq(&zone->lru_lock); - if (buffer_heads_over_limit) - pagevec_strip(&pvec); - 
pagevec_release(&pvec); } static int inactive_anon_is_low_global(struct zone *zone) -- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wu Fengguang Subject: [PATCH 1/3] vmscan: report vm_flags in page_referenced() Date: Mon, 08 Jun 2009 17:10:45 +0800 Message-ID: <20090608091201.783981551@intel.com> References: <20090608091044.880249722@intel.com> Return-path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with SMTP id 26B126B0055 for ; Mon, 8 Jun 2009 04:06:28 -0400 (EDT) Content-Disposition: inline; filename=mm-vmscan-report-vm_flags-in-page_referenced.patch Sender: owner-linux-mm@kvack.org To: Andrew Morton Cc: KOSAKI Motohiro , Peter Zijlstra , Wu Fengguang , Andi Kleen , Christoph Lameter , Elladan , Nick Piggin , Johannes Weiner , Rik van Riel , "tytso@mit.edu" , "linux-mm@kvack.org" , "minchan.kim@gmail.com" List-Id: linux-mm.kvack.org Collect vma->vm_flags of the VMAs that actually referenced the page. This is preparing for more informed reclaim heuristics, eg. to protect executable file pages more aggressively. For now only the VM_EXEC bit will be used by the caller. Thanks to Johannes, Peter and Minchan for all the good tips. 
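For readers following the idea rather than the diff, here is a minimal user-space sketch of the accumulation described above. It is not the kernel code: struct toy_vma, toy_page_referenced() and the TOY_VM_* values are hypothetical names made up for illustration, and the "young" field merely stands in for a per-mapping accessed bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's VM_* bits; values are illustrative. */
#define TOY_VM_READ  0x1UL
#define TOY_VM_WRITE 0x2UL
#define TOY_VM_EXEC  0x4UL

/* Hypothetical, simplified view of one mapping of a page. */
struct toy_vma {
	unsigned long vm_flags; /* protection bits of this mapping */
	int young;              /* non-zero if the page was accessed through it */
};

/*
 * Test-and-clear the accessed state of every mapping. Returns the number
 * of mappings that referenced the page and ORs the vm_flags of only those
 * mappings into *vm_flags, which is reset to 0 first.
 */
static int toy_page_referenced(struct toy_vma *vmas, size_t n,
			       unsigned long *vm_flags)
{
	int referenced = 0;
	size_t i;

	assert(vmas != NULL && vm_flags != NULL);

	*vm_flags = 0;
	for (i = 0; i < n; i++) {
		if (vmas[i].young) {
			vmas[i].young = 0;              /* clear, like a referenced bit */
			referenced++;
			*vm_flags |= vmas[i].vm_flags;  /* collect flags of referencing VMAs */
		}
	}
	return referenced;
}
```

A caller would then test the collected bits, for example `vm_flags & TOY_VM_EXEC`, to learn whether any mapping that actually touched the page was executable. Mappings that exist but did not reference the page contribute nothing.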
Acked-by: Peter Zijlstra Reviewed-by: Rik van Riel Reviewed-by: Minchan Kim Reviewed-by: Johannes Weiner Signed-off-by: Wu Fengguang --- include/linux/rmap.h | 5 +++-- mm/rmap.c | 37 ++++++++++++++++++++++++++----------- mm/vmscan.c | 7 +++++-- 3 files changed, 34 insertions(+), 15 deletions(-) --- linux.orig/include/linux/rmap.h +++ linux/include/linux/rmap.h @@ -83,7 +83,8 @@ static inline void page_dup_rmap(struct /* * Called from mm/vmscan.c to handle paging out */ -int page_referenced(struct page *, int is_locked, struct mem_cgroup *cnt); +int page_referenced(struct page *, int is_locked, + struct mem_cgroup *cnt, unsigned long *vm_flags); int try_to_unmap(struct page *, int ignore_refs); /* @@ -121,7 +122,7 @@ int page_wrprotect(struct page *page, in #define anon_vma_prepare(vma) (0) #define anon_vma_link(vma) do {} while (0) -#define page_referenced(page,l,cnt) TestClearPageReferenced(page) +#define page_referenced(page, locked, cnt, flags) TestClearPageReferenced(page) #define try_to_unmap(page, refs) SWAP_FAIL static inline int page_mkclean(struct page *page) --- linux.orig/mm/rmap.c +++ linux/mm/rmap.c @@ -333,7 +333,9 @@ static int page_mapped_in_vma(struct pag * repeatedly from either page_referenced_anon or page_referenced_file. 
*/ static int page_referenced_one(struct page *page, - struct vm_area_struct *vma, unsigned int *mapcount) + struct vm_area_struct *vma, + unsigned int *mapcount, + unsigned long *vm_flags) { struct mm_struct *mm = vma->vm_mm; unsigned long address; @@ -381,11 +383,14 @@ out_unmap: (*mapcount)--; pte_unmap_unlock(pte, ptl); out: + if (referenced) + *vm_flags |= vma->vm_flags; return referenced; } static int page_referenced_anon(struct page *page, - struct mem_cgroup *mem_cont) + struct mem_cgroup *mem_cont, + unsigned long *vm_flags) { unsigned int mapcount; struct anon_vma *anon_vma; @@ -405,7 +410,8 @@ static int page_referenced_anon(struct p */ if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont)) continue; - referenced += page_referenced_one(page, vma, &mapcount); + referenced += page_referenced_one(page, vma, + &mapcount, vm_flags); if (!mapcount) break; } @@ -418,6 +424,7 @@ static int page_referenced_anon(struct p * page_referenced_file - referenced check for object-based rmap * @page: the page we're checking references on. * @mem_cont: target memory controller + * @vm_flags: collect encountered vma->vm_flags who actually referenced the page * * For an object-based mapped page, find all the places it is mapped and * check/clear the referenced flag. This is done by following the page->mapping @@ -427,7 +434,8 @@ static int page_referenced_anon(struct p * This function is only called from page_referenced for object-based pages. 
*/ static int page_referenced_file(struct page *page, - struct mem_cgroup *mem_cont) + struct mem_cgroup *mem_cont, + unsigned long *vm_flags) { unsigned int mapcount; struct address_space *mapping = page->mapping; @@ -467,7 +475,8 @@ static int page_referenced_file(struct p */ if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont)) continue; - referenced += page_referenced_one(page, vma, &mapcount); + referenced += page_referenced_one(page, vma, + &mapcount, vm_flags); if (!mapcount) break; } @@ -481,29 +490,35 @@ static int page_referenced_file(struct p * @page: the page to test * @is_locked: caller holds lock on the page * @mem_cont: target memory controller + * @vm_flags: collect encountered vma->vm_flags who actually referenced the page * * Quick test_and_clear_referenced for all mappings to a page, * returns the number of ptes which referenced the page. */ -int page_referenced(struct page *page, int is_locked, - struct mem_cgroup *mem_cont) +int page_referenced(struct page *page, + int is_locked, + struct mem_cgroup *mem_cont, + unsigned long *vm_flags) { int referenced = 0; if (TestClearPageReferenced(page)) referenced++; + *vm_flags = 0; if (page_mapped(page) && page->mapping) { if (PageAnon(page)) - referenced += page_referenced_anon(page, mem_cont); + referenced += page_referenced_anon(page, mem_cont, + vm_flags); else if (is_locked) - referenced += page_referenced_file(page, mem_cont); + referenced += page_referenced_file(page, mem_cont, + vm_flags); else if (!trylock_page(page)) referenced++; else { if (page->mapping) - referenced += - page_referenced_file(page, mem_cont); + referenced += page_referenced_file(page, + mem_cont, vm_flags); unlock_page(page); } } --- linux.orig/mm/vmscan.c +++ linux/mm/vmscan.c @@ -584,6 +584,7 @@ static unsigned long shrink_page_list(st struct pagevec freed_pvec; int pgactivate = 0; unsigned long nr_reclaimed = 0; + unsigned long vm_flags; cond_resched(); @@ -634,7 +635,8 @@ static unsigned long shrink_page_list(st goto 
keep_locked; } - referenced = page_referenced(page, 1, sc->mem_cgroup); + referenced = page_referenced(page, 1, + sc->mem_cgroup, &vm_flags); /* In active use or really unfreeable? Activate it. */ if (sc->order <= PAGE_ALLOC_COSTLY_ORDER && referenced && page_mapping_inuse(page)) @@ -1215,6 +1217,7 @@ static void shrink_active_list(unsigned { unsigned long pgmoved; unsigned long pgscanned; + unsigned long vm_flags; LIST_HEAD(l_hold); /* The pages which were snipped off */ LIST_HEAD(l_inactive); struct page *page; @@ -1255,7 +1258,7 @@ static void shrink_active_list(unsigned /* page_referenced clears PageReferenced */ if (page_mapping_inuse(page) && - page_referenced(page, 0, sc->mem_cgroup)) + page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) pgmoved++; list_add(&page->lru, &l_inactive); -- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Wu Fengguang Subject: [PATCH 2/3] vmscan: make mapped executable pages the first class citizen Date: Mon, 08 Jun 2009 17:10:46 +0800 Message-ID: <20090608091201.953724007@intel.com> References: <20090608091044.880249722@intel.com> Return-path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with SMTP id 626996B004D for ; Mon, 8 Jun 2009 04:06:28 -0400 (EDT) Content-Disposition: inline; filename=mm-vmscan-protect-exec-referenced.patch Sender: owner-linux-mm@kvack.org To: Andrew Morton Cc: KOSAKI Motohiro , Elladan , Nick Piggin , Andi Kleen , Christoph Lameter , Rik van Riel , Peter Zijlstra , Wu Fengguang , Johannes Weiner , "tytso@mit.edu" , "linux-mm@kvack.org" , "minchan.kim@gmail.com" List-Id: linux-mm.kvack.org Protect referenced PROT_EXEC mapped pages from being deactivated. 
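As a rough illustration of the rule stated in the line above, the decision can be sketched in user-space C. This is only a sketch under assumptions: toy_next_list(), enum toy_lru and TOY_VM_EXEC are hypothetical names, and the real patch may apply further conditions that are not modelled here.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's VM_EXEC bit; value is illustrative. */
#define TOY_VM_EXEC 0x4UL

/* Hypothetical destination lists for a page taken off the active list. */
enum toy_lru { TOY_LRU_INACTIVE, TOY_LRU_ACTIVE };

/*
 * Decide where a scanned active page goes next. A page stays active only
 * if it was referenced and at least one referencing mapping is executable;
 * every other page is deactivated, referenced or not.
 */
static enum toy_lru toy_next_list(int referenced, unsigned long vm_flags)
{
	assert(referenced >= 0);

	if (referenced && (vm_flags & TOY_VM_EXEC))
		return TOY_LRU_ACTIVE;
	return TOY_LRU_INACTIVE;
}
```

Here vm_flags is assumed to be the set of flags collected from the mappings that referenced the page, as reported by the page_referenced() change in patch 1/3.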
PROT_EXEC (or its internal representation, VM_EXEC) pages normally belong to
currently running executables and their linked libraries; they should really
be cached aggressively to provide a good user experience.

Thanks to Johannes Weiner for the advice to reuse the VMA walk in
page_referenced() to get the PROT_EXEC bit.


[more details]

( The consequences of this patch will have to be discussed together with
  Rik van Riel's recent patch "vmscan: evict use-once pages first". )

( Some of the good points and insights are taken into this changelog.
  Thanks to all the involved people for the great LKML discussions. )

the problem
-----------

For a typical desktop, the most precious working set is composed of
*actively accessed*
	(1) memory mapped executables
	(2) and their anonymous pages
	(3) and other files
	(4) and the dcache/icache/.. slabs
while the least important data are
	(5) infrequently used or use-once files

For a typical desktop, one major problem is bursty, large amounts of (5)
use-once files flushing out the working set.

Inside the working set, (4) dcache/icache are already too sticky ;-)
So we only have to care about (2) anonymous and (1)(3) file pages.

anonymous pages
---------------

Anonymous pages are effectively immune to the streaming IO attack, because we
now have separate file/anon LRU lists. When the use-once files crowd into the
file LRU, the list's "quality" is significantly lowered. Therefore the scan
balance policy in get_scan_ratio() will choose to scan the (low quality) file
LRU much more frequently than the anon LRU.

file pages
----------

Rik proposed to *not* scan the active file LRU when the inactive list grows
larger than the active list. This guarantees that when there is use-once
streaming IO, and the working set is not too large (so that active_size <
inactive_size), the active file LRU will *not* be scanned at all. So the
not-too-large working set can be well protected.
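The rule in the paragraph above can be written as a one-line predicate. The following user-space sketch only restates the rule as worded in this changelog; struct toy_file_lru and toy_should_scan_active_file() are hypothetical names, not the interface of Rik's patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the two file LRU list sizes, in pages. */
struct toy_file_lru {
	unsigned long nr_active;
	unsigned long nr_inactive;
};

/*
 * The active file list is left alone while the inactive list is larger
 * than the active list (active_size < inactive_size). It is scanned only
 * once active_size >= inactive_size.
 */
static bool toy_should_scan_active_file(const struct toy_file_lru *lru)
{
	assert(lru != NULL);

	return lru->nr_active >= lru->nr_inactive;
}
```

With streaming use-once IO the inactive list keeps growing, so the predicate stays false and a small working set on the active list is never scanned. Once the working set is large enough that active_size >= inactive_size, scanning resumes.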
But there are also situations where the file working set is a bit large so
that (active_size >= inactive_size), or the streaming IOs are not purely
use-once. In these cases, the active list will be scanned slowly. Because the
current shrink_active_list() policy is to deactivate active pages regardless
of their referenced bits, the deactivated pages become susceptible to the
streaming IO attack: the inactive list could be scanned fast (500MB / 50MBps
= 10s), so the deactivated pages don't have enough time to get re-referenced,
since users tend to switch between windows at intervals from seconds to
minutes.

This patch holds mapped executable pages in the active list as long as they
are referenced during each full scan of the active list. Because the active
list is normally scanned much more slowly, they get a longer grace time (eg.
100s) for further references, which better matches the pace of user
operations.

Therefore this patch greatly prolongs the in-cache time of executable code
when there is moderate memory pressure.

	before patch: guaranteed to be cached if reference intervals < I
	after patch:  guaranteed to be cached if reference intervals < I+A
		      (except when randomly reclaimed by the lumpy reclaim)
where
	A = time to fully scan the active file LRU
	I = time to fully scan the inactive file LRU

Note that normally A >> I.

side effects
------------

This patch is safe in general; it restores the pre-2.6.28 mmap() behavior
but in a much smaller and well targeted scope.

One may worry about someone abusing the PROT_EXEC heuristic. But as
Andrew Morton stated, there are other tricks for getting that sort of boost.

Another concern is the PROT_EXEC mapped pages growing large in rare cases,
and therefore hurting reclaim efficiency. But a sane application targeted at
a large audience will never use PROT_EXEC for data mappings. If some
home-made application tries to abuse that bit, it should be aware of the
consequences.
If it is abused to the scale of 2/3 of total memory, it gains nothing but
overhead.

benchmarks
----------

1) memory tight desktop

1.1) brief summary

- clock time and major faults are reduced by 50%;
- pswpin numbers are reduced to ~1/3.

That means X desktop responsiveness is doubled under high memory/swap pressure.

1.2) test scenario

- nfsroot gnome desktop with 512M physical memory
- run some programs, and switch between the existing windows
  after starting each new program.

1.3) progress timing (seconds)

  before       after    programs
    0.02        0.02    N xeyes
    0.75        0.76    N firefox
    2.02        1.88    N nautilus
    3.36        3.17    N nautilus --browser
    5.26        4.89    N gthumb
    7.12        6.47    N gedit
    9.22        8.16    N xpdf /usr/share/doc/shared-mime-info/shared-mime-info-spec.pdf
   13.58       12.55    N xterm
   15.87       14.57    N mlterm
   18.63       17.06    N gnome-terminal
   21.16       18.90    N urxvt
   26.24       23.48    N gnome-system-monitor
   28.72       26.52    N gnome-help
   32.15       29.65    N gnome-dictionary
   39.66       36.12    N /usr/games/sol
   43.16       39.27    N /usr/games/gnometris
   48.65       42.56    N /usr/games/gnect
   53.31       47.03    N /usr/games/gtali
   58.60       52.05    N /usr/games/iagno
   65.77       55.42    N /usr/games/gnotravex
   70.76       61.47    N /usr/games/mahjongg
   76.15       67.11    N /usr/games/gnome-sudoku
   86.32       75.15    N /usr/games/glines
   92.21       79.70    N /usr/games/glchess
  103.79       88.48    N /usr/games/gnomine
  113.84       96.51    N /usr/games/gnotski
  124.40      102.19    N /usr/games/gnibbles
  137.41      114.93    N /usr/games/gnobots2
  155.53      125.02    N /usr/games/blackjack
  179.85      135.11    N /usr/games/same-gnome
  224.49      154.50    N /usr/bin/gnome-window-properties
  248.44      162.09    N /usr/bin/gnome-default-applications-properties
  282.62      173.29    N /usr/bin/gnome-at-properties
  323.72      188.21    N /usr/bin/gnome-typing-monitor
  363.99      199.93    N /usr/bin/gnome-at-visual
  394.21      206.95    N /usr/bin/gnome-sound-properties
  435.14      224.49    N /usr/bin/gnome-at-mobility
  463.05      234.11    N /usr/bin/gnome-keybinding-properties
  503.75      248.59    N /usr/bin/gnome-about-me
  554.00      276.27    N /usr/bin/gnome-display-properties
  615.48      304.39    N /usr/bin/gnome-network-preferences
  693.03      342.01    N /usr/bin/gnome-mouse-properties
  759.90      388.58    N /usr/bin/gnome-appearance-properties
  937.90      508.47    N /usr/bin/gnome-control-center
 1109.75      587.57    N /usr/bin/gnome-keyboard-properties
 1399.05      758.16    N : oocalc
 1524.64      830.03    N : oodraw
 1684.31      900.03    N : ooimpress
 1874.04      993.91    N : oomath
 2115.12     1081.89    N : ooweb
 2369.02     1161.99    N : oowriter

Note that the last ": oo*" commands are actually commented out.

1.4) vmstat numbers (some relevant ones are marked with *)

                              before     after
 nr_free_pages                  1293      3898
 nr_inactive_anon              59956     53460
 nr_active_anon                26815     30026
 nr_inactive_file               2657      3218
 nr_active_file                 2019      2806
 nr_unevictable                    4         4
 nr_mlock                          4         4
 nr_anon_pages                 26706     27859
*nr_mapped                      3542      4469
 nr_file_pages                 72232     67681
 nr_dirty                          1         0
 nr_writeback                    123        19
 nr_slab_reclaimable            3375      3534
 nr_slab_unreclaimable         11405     10665
 nr_page_table_pages            8106      7864
 nr_unstable                       0         0
 nr_bounce                         0         0
*nr_vmscan_write              394776    230839
 nr_writeback_temp                 0         0
 numa_hit                    6843353   3318676
 numa_miss                         0         0
 numa_foreign                      0         0
 numa_interleave                1719      1719
 numa_local                  6843353   3318676
 numa_other                        0         0
*pgpgin                      5954683   2057175
*pgpgout                     1578276    922744
*pswpin                      1486615    512238
*pswpout                      394568    230685
 pgalloc_dma                  277432     56602
 pgalloc_dma32               6769477   3310348
 pgalloc_normal                    0         0
 pgalloc_movable                   0         0
 pgfree                      7048396   3371118
 pgactivate                  2036343   1471492
 pgdeactivate                2189691   1612829
 pgfault                     3702176   3100702
*pgmajfault                   452116    201343
 pgrefill_dma                  12185      7127
 pgrefill_dma32               334384    653703
 pgrefill_normal                   0         0
 pgrefill_movable                  0         0
 pgsteal_dma                   74214     22179
 pgsteal_dma32               3334164   1638029
 pgsteal_normal                    0         0
 pgsteal_movable                   0         0
 pgscan_kswapd_dma           1081421   1216199
 pgscan_kswapd_dma32        58979118  46002810
 pgscan_kswapd_normal              0         0
 pgscan_kswapd_movable             0         0
 pgscan_direct_dma           2015438   1086109
 pgscan_direct_dma32        55787823  36101597
 pgscan_direct_normal              0         0
 pgscan_direct_movable             0         0
 pginodesteal                   3461      7281
 slabs_scanned                564864    527616
 kswapd_steal                2889797   1448082
 kswapd_inodesteal             14827     14835
 pageoutrun                    43459     21562
 allocstall                     9653      4032
 pgrotated                    384216    228631

1.5) free numbers at the end of the tests

before patch:
                     total       used       free     shared    buffers     cached
Mem:                   474        467          7          0          0        236
-/+ buffers/cache:                230        243
Swap:                 1023        418        605

after patch:
                     total       used       free     shared    buffers     cached
Mem:                   474        457         16          0          0        236
-/+ buffers/cache:                221        253
Swap:                 1023        404        619

2) memory flushing in a file server

2.1) brief summary

The number of major faults drops from 50 to 3 during 10% cache hot reads.

That means this patch successfully stops major faults when the active file
list is slowly scanned because of partially cache hot streaming IO.

2.2) test scenario

Do 100000 pread(size=110 pages, offset=(i*100) pages), where 10% of the
pages will be activated:

	for i in `seq 0 100 10000000`; do echo $i 110; done > pattern-hot-10
	iotrace.rb --load pattern-hot-10 --play /b/sparse
	vmmon nr_mapped nr_active_file nr_inactive_file pgmajfault pgdeactivate pgfree

and monitor /proc/vmstat during the time. The test box has 2G memory.

I carried out tests on a freshly booted console as well as an X desktop, and
fetched the vmstat numbers on

(1) begin:     shortly after the big read IO starts;
(2) end:       just before the big read IO stops;
(3) restore:   the big read IO stops and the zsh working set restored
(4) restore X: after IO, switch back and forth between the urxvt and firefox
               windows to restore their working set.

2.3) console mode results

            nr_mapped  nr_active_file  nr_inactive_file  pgmajfault  pgdeactivate    pgfree

2.6.29 VM_EXEC protection ON:
begin:           2481            2237              8694         630             0    574299
end:              275          231976            233914         633        776271  20933042
restore:          370          232154            234524         691        777183  20958453

2.6.29 VM_EXEC protection ON (second run):
begin:           2434            2237              8493         629             0    574195
end:              284          231970            233536         632        771918  20896129
restore:          399          232218            234789         690        774526  20957909

2.6.30-rc4-mm VM_EXEC protection OFF:
begin:           2479            2344              9659         210             0    579643
end:              284          232010            234142         260        772776  20917184
restore:          379          232159            234371         301        774888  20967849

The above console numbers show that

- The startup pgmajfault of 2.6.30-rc4-mm is merely 1/3 that of 2.6.29.
  I'd attribute that improvement to the mmap readahead improvements :-)

- The pgmajfault increment during the file copy is 633-630=3 vs 260-210=50.
  That's a huge improvement - which means with the VM_EXEC protection logic,
  active mmap pages are pretty safe even under partially cache hot streaming IO.

- when active:inactive file lru size reaches 1:1, their scan rates are 1:20.8
  under 10% cache hot IO. (computed with formula Dpgdeactivate:Dpgfree)
  That roughly means the active mmap pages get 20.8 times more chances to get
  re-referenced and stay in memory.

- The absolute nr_mapped drops considerably to 1/9 during the big IO, and the
  dropped pages are mostly inactive ones. The patch has almost no impact in
  this aspect, which means it won't unnecessarily increase memory pressure.
  (In contrast, your 20% mmap protection ratio will keep them all, and
  therefore eliminate the extra 41 major faults to restore working set
  of zsh etc.)

The iotrace.rb read throughput is
	151.194384MB/s 284.198252s 100001x 450560b --load pattern-hot-10 --play /b/sparse
which means the inactive list is rotated at the speed of 250MB/s,
so a full scan of it takes about 3.5 seconds, while a full scan
of the active file list takes about 77 seconds.

2.4) X mode results

We can reach roughly the same conclusions for X desktop:

            nr_mapped  nr_active_file  nr_inactive_file  pgmajfault  pgdeactivate    pgfree

2.6.30-rc4-mm VM_EXEC protection ON:
begin:           9740            8920             64075         561             0    678360
end:              768          218254            220029         565        798953  21057006
restore:          857          218543            220987         606        799462  21075710
restore X:       2414          218560            225344         797        799462  21080795

2.6.30-rc4-mm VM_EXEC protection OFF:
begin:           9368            5035             26389         554             0    633391
end:              770          218449            221230         661        646472  17832500
restore:         1113          218466            220978         710        649881  17905235
restore X:       2687          218650            225484         947        802700  21083584

- the absolute nr_mapped drops considerably (to 1/13 of the original size)
  during the streaming IO.
- the delta of pgmajfault is 3 vs 107 during IO, or 236 vs 393
  during the whole process.
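
The arithmetic behind the console-mode figures can be replayed with a small
helper. This is only a sketch: the function names are invented here, the
4096-byte page size is inferred from the reported 450560b per 110-page read,
and "MB" is assumed to mean 10^6 bytes.

```c
#include <assert.h>

#define PAGE_BYTES 4096UL	/* 450560 bytes / 110 pages, from the iotrace.rb line */

/* Major faults taken between two /proc/vmstat samples. */
unsigned long pgmajfault_delta(unsigned long begin, unsigned long end)
{
	assert(end >= begin);
	return end - begin;
}

/*
 * Seconds needed to walk an LRU list once, given its length in pages and
 * the rate (bytes per second) at which pages rotate through it.
 */
double full_scan_seconds(unsigned long nr_pages, double bytes_per_sec)
{
	assert(bytes_per_sec > 0.0);
	return (double)nr_pages * PAGE_BYTES / bytes_per_sec;
}
```

pgmajfault_delta(630, 633) and pgmajfault_delta(210, 260) give the 3 vs 50
quoted above. full_scan_seconds(233914, 250e6) comes out a little under
4 seconds, the same ballpark as the "about 3.5 seconds" estimate for I;
scaling it by the reported 1:20.8 scan rate ratio lands in the same range as
the 77 seconds quoted for A.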

CC: Elladan
CC: Nick Piggin
CC: Andi Kleen
CC: Christoph Lameter
Acked-by: Rik van Riel
Acked-by: Peter Zijlstra
Acked-by: KOSAKI Motohiro
Reviewed-by: Johannes Weiner
Reviewed-by: Minchan Kim
Signed-off-by: Wu Fengguang
---
 mm/vmscan.c |   52 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 45 insertions(+), 7 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1219,6 +1219,7 @@ static void shrink_active_list(unsigned
 	unsigned long pgscanned;
 	unsigned long vm_flags;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
+	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
 	struct page *page;
 	struct pagevec pvec;
@@ -1258,28 +1259,42 @@ static void shrink_active_list(unsigned

 		/* page_referenced clears PageReferenced */
 		if (page_mapping_inuse(page) &&
-		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
+		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
 			pgmoved++;
+			/*
+			 * Identify referenced, file-backed active pages and
+			 * give them one more trip around the active list. So
+			 * that executable code get better chances to stay in
+			 * memory under moderate memory pressure. Anon pages
+			 * are not likely to be evicted by use-once streaming
+			 * IO, plus JVM can create lots of anon VM_EXEC pages,
+			 * so we ignore them here.
+			 */
+			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
+				list_add(&page->lru, &l_active);
+				continue;
+			}
+		}

 		list_add(&page->lru, &l_inactive);
 	}

 	/*
-	 * Move the pages to the [file or anon] inactive list.
+	 * Move pages back to the lru list.
 	 */
 	pagevec_init(&pvec, 1);
-	lru = LRU_BASE + file * LRU_FILE;

 	spin_lock_irq(&zone->lru_lock);
 	/*
-	 * Count referenced pages from currently used mappings as
-	 * rotated, even though they are moved to the inactive list.
-	 * This helps balance scan pressure between file and anonymous
-	 * pages in get_scan_ratio.
+	 * Count referenced pages from currently used mappings as rotated,
+	 * even though only some of them are actually re-activated. This
+	 * helps balance scan pressure between file and anonymous pages in
+	 * get_scan_ratio.
 	 */
 	reclaim_stat->recent_rotated[!!file] += pgmoved;

 	pgmoved = 0;  /* count pages moved to inactive list */
+	lru = LRU_BASE + file * LRU_FILE;
 	while (!list_empty(&l_inactive)) {
 		page = lru_to_page(&l_inactive);
 		prefetchw_prev_lru_page(page, &l_inactive, flags);
@@ -1302,6 +1317,29 @@ static void shrink_active_list(unsigned
 	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	__count_vm_events(PGDEACTIVATE, pgmoved);
+
+	pgmoved = 0;  /* count pages moved back to active list */
+	lru = LRU_ACTIVE + file * LRU_FILE;
+	while (!list_empty(&l_active)) {
+		page = lru_to_page(&l_active);
+		prefetchw_prev_lru_page(page, &l_active, flags);
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+		VM_BUG_ON(!PageActive(page));
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+		if (!pagevec_add(&pvec, page)) {
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(&pvec);
+			__pagevec_release(&pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+
 	spin_unlock_irq(&zone->lru_lock);
 	if (buffer_heads_over_limit)
 		pagevec_strip(&pvec);

--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243])
	by kanga.kvack.org (Postfix) with SMTP id 3735E6B0083
	for ; Fri, 10 Jul 2009 03:01:45 -0400 (EDT)
Received: by gxk3 with SMTP id 3so1261409gxk.14
	for ; Fri, 10 Jul 2009 00:24:29 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20090608091044.880249722@intel.com>
References: <20090608091044.880249722@intel.com>
Date: Fri, 10 Jul 2009 15:24:29 +0800
Message-ID:
Subject: Re: [PATCH 0/3] make mapped executable pages the first class citizen (with test cases)
From: Nai Xia
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
To: Wu Fengguang
Cc: Andrew Morton, KOSAKI Motohiro, Andi Kleen, Christoph Lameter, Elladan,
	Nick Piggin, Johannes Weiner, Peter Zijlstra, Rik van Riel,
	"tytso@mit.edu", "linux-mm@kvack.org", "minchan.kim@gmail.com"
List-ID:

Hi,

I was able to launch some tests with SPEC cpu2006.
The benchmark was based on mmotm
commit 0b7292956dbdfb212abf6e3c9cfb41e9471e1081 on an Intel Q6600 box with
4G ram. The kernel cmdline mem=500M was used to see how good exec-prot can
be under memory stress.

Following are the results:

                                  Estimated
                Base     Base       Base
Benchmarks      Ref.   Run Time     Ratio

mmotm with 500M
400.perlbench    9770        671      14.6  *
401.bzip2        9650       1011       9.55 *
403.gcc          8050        774      10.4  *
462.libquantum  20720       1213      17.1  *

mmotm-prot with 500M
400.perlbench    9770        658      14.8  *
401.bzip2        9650       1007       9.58 *
403.gcc          8050        749      10.8  *
462.libquantum  20720       1116      18.6  *

mmotm with 4G (allowing the full working sets)
400.perlbench    9770        594      16.5  *
401.bzip2        9650        828      11.7  *
403.gcc          8050        523      15.4  *
462.libquantum  20720       1121      18.5  *

It's worth noting that SPEC documented "The CPU2006 benchmarks
(code + workload) have been designed to fit within about 1GB of
physical memory",
and the exec vm sizes of these programs are as below:
perlbench   956KB
bzip2        56KB
gcc        3008KB
libquantum   36KB

Are we expecting to see more good results for cpu-bound programs (e.g.
scientific ones) with a large number of exec pages?

Best Regards,
Nai Xia

On Mon, Jun 8, 2009 at 5:10 PM, Wu Fengguang wrote:
> Andrew,
>
> I managed to back this patchset with two test cases :)
>
> They demonstrated that
> - X desktop responsiveness can be *doubled* under high memory/swap pressure
> - it can almost stop major faults when the active file list is slowly scanned
>   because of undergoing partially cache hot streaming IO
>
> The details are included in the changelog.
>
> Thanks,
> Fengguang

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35])
	by kanga.kvack.org (Postfix) with SMTP id 3DF886B0088
	for ; Fri, 10 Jul 2009 04:11:54 -0400 (EDT)
Date: Fri, 10 Jul 2009 16:34:29 +0800
From: Wu Fengguang
Subject: Re: [PATCH 0/3] make mapped executable pages the first class citizen (with test cases)
Message-ID: <20090710083429.GC24168@localhost>
References: <20090608091044.880249722@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
Sender: owner-linux-mm@kvack.org
To: Nai Xia
Cc: Andrew Morton, KOSAKI Motohiro, Andi Kleen, Christoph Lameter, Elladan,
	Nick Piggin, Johannes Weiner, Peter Zijlstra, Rik van Riel,
	"tytso@mit.edu", "linux-mm@kvack.org", "minchan.kim@gmail.com"
List-ID:

On Fri, Jul 10, 2009 at 03:24:29PM +0800, Nai Xia wrote:
> Hi,
>
> I was able to launch some tests with SPEC cpu2006.
> The benchmark was based on mmotm
> commit 0b7292956dbdfb212abf6e3c9cfb41e9471e1081 on an Intel Q6600 box with
> 4G ram. The kernel cmdline mem=500M was used to see how good exec-prot can
> be under memory stress.

Thank you for the testing, Nai!

> Following are the results:
>
>                                   Estimated
>                 Base     Base       Base
> Benchmarks      Ref.   Run Time     Ratio
>
> mmotm with 500M
> 400.perlbench    9770        671      14.6  *
> 401.bzip2        9650       1011       9.55 *
> 403.gcc          8050        774      10.4  *
> 462.libquantum  20720       1213      17.1  *
>
> mmotm-prot with 500M
> 400.perlbench    9770        658      14.8  *
> 401.bzip2        9650       1007       9.58 *
> 403.gcc          8050        749      10.8  *
> 462.libquantum  20720       1116      18.6  *
>
> mmotm with 4G (allowing the full working sets)
> 400.perlbench    9770        594      16.5  *
> 401.bzip2        9650        828      11.7  *
> 403.gcc          8050        523      15.4  *
> 462.libquantum  20720       1121      18.5  *

mmotm    mmotm-prot  mmotm-4G    mmotm-prot   mmotm-4G
14.6     14.8        16.5        +1.4%        +13.0%
 9.55     9.58       11.7        +0.3%        +22.5%
10.4     10.8        15.4        +3.8%        +48.1%
17.1     18.6        18.5        +8.8%         +8.2%

So it's mostly small improvements.

> It's worth noting that SPEC documented "The CPU2006 benchmarks
> (code + workload) have been designed to fit within about 1GB of
> physical memory",
> and the exec vm sizes of these programs are as below:
> perlbench   956KB
> bzip2        56KB
> gcc        3008KB
> libquantum   36KB
>
> Are we expecting to see more good results for cpu-bound programs (e.g.
> scientific ones) with a large number of exec pages?

Not likely. Scientific computing is typically equipped with lots of
memory and the footprint of the program itself is relatively small.

The exec-mmap protection mainly helps when some exec pages/programs
have been inactive for some minutes and then go active. That's the
typical desktop use pattern.

Thanks,
Fengguang

> On Mon, Jun 8, 2009 at 5:10 PM, Wu Fengguang wrote:
> > Andrew,
> >
> > I managed to back this patchset with two test cases :)
> >
> > They demonstrated that
> > - X desktop responsiveness can be *doubled* under high memory/swap pressure
> > - it can almost stop major faults when the active file list is slowly scanned
> >   because of undergoing partially cache hot streaming IO
> >
> > The details are included in the changelog.
> >
> > Thanks,
> > Fengguang

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243])
	by kanga.kvack.org (Postfix) with SMTP id C51506B004D
	for ; Fri, 10 Jul 2009 12:50:26 -0400 (EDT)
Received: by vwj5 with SMTP id 5so284615vwj.12
	for ; Fri, 10 Jul 2009 09:50:50 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20090710083429.GC24168@localhost>
References: <20090608091044.880249722@intel.com> <20090710083429.GC24168@localhost>
Date: Sat, 11 Jul 2009 00:50:50 +0800
Message-ID:
Subject: Re: [PATCH 0/3] make mapped executable pages the first class citizen (with test cases)
From: Nai Xia
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
To: Wu Fengguang
Cc: Andrew Morton, KOSAKI Motohiro, Andi Kleen, Christoph Lameter, Elladan,
	Nick Piggin, Johannes Weiner, Peter Zijlstra, Rik van Riel,
	"tytso@mit.edu", "linux-mm@kvack.org", "minchan.kim@gmail.com"
List-ID:

On Fri, Jul 10, 2009 at 4:34 PM, Wu Fengguang wrote:
> On Fri, Jul 10, 2009 at 03:24:29PM +0800, Nai Xia wrote:
>> Hi,
>>
>> I was able to launch some tests with SPEC cpu2006.
>> The benchmark was based on mmotm
>> commit 0b7292956dbdfb212abf6e3c9cfb41e9471e1081 on an Intel Q6600 box with
>> 4G ram. The kernel cmdline mem=500M was used to see how good exec-prot can
>> be under memory stress.
>
> Thank you for the testing, Nai!

You are welcome :)

>
>> Following are the results:
>>
>>                                   Estimated
>>                 Base     Base       Base
>> Benchmarks      Ref.   Run Time     Ratio
>>
>> mmotm with 500M
>> 400.perlbench    9770        671      14.6  *
>> 401.bzip2        9650       1011       9.55 *
>> 403.gcc          8050        774      10.4  *
>> 462.libquantum  20720       1213      17.1  *
>>
>> mmotm-prot with 500M
>> 400.perlbench    9770        658      14.8  *
>> 401.bzip2        9650       1007       9.58 *
>> 403.gcc          8050        749      10.8  *
>> 462.libquantum  20720       1116      18.6  *
>>
>> mmotm with 4G (allowing the full working sets)
>> 400.perlbench    9770        594      16.5  *
>> 401.bzip2        9650        828      11.7  *
>> 403.gcc          8050        523      15.4  *
>> 462.libquantum  20720       1121      18.5  *
>
> mmotm    mmotm-prot  mmotm-4G    mmotm-prot   mmotm-4G
> 14.6     14.8        16.5        +1.4%        +13.0%
>  9.55     9.58       11.7        +0.3%        +22.5%
> 10.4     10.8        15.4        +3.8%        +48.1%
> 17.1     18.6        18.5        +8.8%         +8.2%
>
> So it's mostly small improvements.
>
>> It's worth noting that SPEC documented "The CPU2006 benchmarks
>> (code + workload) have been designed to fit within about 1GB of
>> physical memory",
>> and the exec vm sizes of these programs are as below:
>> perlbench   956KB
>> bzip2        56KB
>> gcc        3008KB
>> libquantum   36KB
>>
>> Are we expecting to see more good results for cpu-bound programs (e.g.
>> scientific ones) with a large number of exec pages?
>
> Not likely. Scientific computing is typically equipped with lots of
> memory and the footprint of the program itself is relatively small.

OK, well, maybe as long as there is still swapping, improvement is possible.
Actually, in the above cases like bzip2, its exec footprint is already quite
small compared to the percentage of the improvement.

Let me see if I am lucky enough to have someone majoring in computational
chemistry in our Univ. give a benchmark. :)
You know they have relatively small machines doing small personal computing
jobs and sometimes swapping still matters.

>
> The exec-mmap protection mainly helps when some exec pages/programs
> have been inactive for some minutes and then go active. That's the
> typical desktop use pattern.

OK. Still it's good to see that this patch can improve more than 20% on
average on non-typical cases, hehe.

Regards,
Nai

>
> Thanks,
> Fengguang
>
>> On Mon, Jun 8, 2009 at 5:10 PM, Wu Fengguang wrote:
>> > Andrew,
>> >
>> > I managed to back this patchset with two test cases :)
>> >
>> > They demonstrated that
>> > - X desktop responsiveness can be *doubled* under high memory/swap pressure
>> > - it can almost stop major faults when the active file list is slowly scanned
>> >   because of undergoing partially cache hot streaming IO
>> >
>> > The details are included in the changelog.
>> >
>> > Thanks,
>> > Fengguang