Date: Tue, 19 May 2009 15:15:55 +0800
From: Wu Fengguang
To: KOSAKI Motohiro
Cc: Christoph Lameter, Andrew Morton, LKML, Elladan, Nick Piggin,
	Johannes Weiner, Peter Zijlstra, Rik van Riel, "tytso@mit.edu",
	"linux-mm@kvack.org", "minchan.kim@gmail.com"
Subject: Re: [PATCH 2/3] vmscan: make mapped executable pages the first class citizen
Message-ID: <20090519071554.GA26646@localhost>
References: <20090519032759.GA7608@localhost> <20090519133422.4ECC.A69D9226@jp.fujitsu.com>
In-Reply-To: <20090519133422.4ECC.A69D9226@jp.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.5.18 (2008-05-17)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 19, 2009 at 12:41:38PM +0800, KOSAKI Motohiro wrote:
> Hi
>
> Thanks for the great work.
>
> > SUMMARY
> > =======
> > The patch decreases the number of major faults from 50 to 3 during 10% cache hot reads.
> >
> >
> > SCENARIO
> > ========
> > The test scenario is to do 100000 preads (size=110 pages, offset=(i*100) pages),
> > where 10% of the pages will be activated:
> >
> >         for i in `seq 0 100 10000000`; do echo $i 110; done > pattern-hot-10
> >         iotrace.rb --load pattern-hot-10 --play /b/sparse
>
> Where can I download iotrace.rb?
>
> > and monitor /proc/vmstat during the time. The test box has 2G memory.
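The 10% figure follows from the geometry of the reads: consecutive 110-page preads start 100 pages apart, so the trailing 10 pages of each read are re-referenced by the next one, and twice-referenced pages are the ones that get activated. A minimal Python sketch of that arithmetic (only an illustration of the access pattern; the actual replay tool is iotrace.rb, and `activated_fraction` is a name made up here):

```python
# Count how many distinct pages are referenced at least twice by the
# pattern "pread 110 pages at offset i*100 pages": the last 10 pages of
# every read are read again by the next read.
from collections import Counter

def activated_fraction(nr_reads=1000, read_pages=110, stride_pages=100):
    refs = Counter()
    for i in range(nr_reads):
        start = i * stride_pages
        for page in range(start, start + read_pages):
            refs[page] += 1
    reread = sum(1 for n in refs.values() if n >= 2)
    return reread / len(refs)

print(activated_fraction())  # ~0.1: about 10% of the distinct pages
```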
> >
> >
> > ANALYSES
> > ========
> >
> > I carried out two runs on a freshly booted console mode 2.6.29 with the VM_EXEC
> > patch, and fetched the vmstat numbers at
> >
> > (1) begin:   shortly after the big read IO starts;
> > (2) end:     just before the big read IO stops;
> > (3) restore: the big read IO has stopped and the zsh working set is restored.
> >
> >           nr_mapped  nr_active_file  nr_inactive_file  pgmajfault  pgdeactivate    pgfree
> > begin:         2481            2237              8694         630             0    574299
> > end:            275          231976            233914         633        776271  20933042
> > restore:        370          232154            234524         691        777183  20958453
> >
> > begin:         2434            2237              8493         629             0    574195
> > end:            284          231970            233536         632        771918  20896129
> > restore:        399          232218            234789         690        774526  20957909
> >
> > and another run on 2.6.30-rc4-mm with the VM_EXEC logic disabled:
>
> I don't think that is a proper comparison. You need one of the following
> comparisons, otherwise we insert too many guesses into the analysis:
>
> - 2.6.29 with and without the VM_EXEC patch
> - 2.6.30-rc4-mm with and without the VM_EXEC patch
>
> > begin:         2479            2344              9659         210             0    579643
> > end:            284          232010            234142         260        772776  20917184
> > restore:        379          232159            234371         301        774888  20967849
> >
> > The numbers show that
> >
> > - The startup pgmajfault of 2.6.30-rc4-mm is merely 1/3 that of 2.6.29.
> >   I'd attribute that improvement to the mmap readahead improvements :-)
> >
> > - The pgmajfault increment during the file copy is 633-630=3 vs 260-210=50.
> >   That's a huge improvement - it means that with the VM_EXEC protection logic,
> >   active mmap pages are pretty safe even under partially cache hot streaming IO.
> >
> > - When the active:inactive file lru sizes reach 1:1, their scan rates are 1:20.8
> >   under 10% cache hot IO (computed with the formula Dpgdeactivate:Dpgfree).
> >   That roughly means the active mmap pages get 20.8 times more chances to be
> >   re-referenced and stay in memory.
> >
> > - The absolute nr_mapped drops considerably, to 1/9, during the big IO, and the
> >   dropped pages are mostly inactive ones.
> >   The patch has almost no impact in this aspect, which means it won't
> >   unnecessarily increase memory pressure. (In contrast, your 20% mmap
> >   protection ratio would keep them all, and therefore eliminate the extra
> >   41 major faults needed to restore the working set of zsh etc.)

More results on an X desktop, kernel 2.6.30-rc4-mm:

             nr_mapped  nr_active_file  nr_inactive_file  pgmajfault  pgdeactivate    pgfree

VM_EXEC protection ON:
begin:            9740            8920             64075         561             0    678360
end:               768          218254            220029         565        798953  21057006
restore:           857          218543            220987         606        799462  21075710
restore X:        2414          218560            225344         797        799462  21080795

VM_EXEC protection OFF:
begin:            9368            5035             26389         554             0    633391
end:               770          218449            221230         661        646472  17832500
restore:          1113          218466            220978         710        649881  17905235
restore X:        2687          218650            225484         947        802700  21083584

The added "restore X" step means: after the IO, switch back and forth between
the urxvt and firefox windows to restore their working sets.

I cannot explain why the absolute nr_mapped grows larger at the end in the
VM_EXEC OFF case. Maybe it's because urxvt was the foreground window during
the first run, while firefox was the foreground window during the second run?

As in the console mode test, the absolute nr_mapped drops considerably - to
1/13 of the original size - during the streaming IO. The delta of pgmajfault
is 4 vs 107 during the IO, or 236 vs 393 over the whole process.

RAW DATA
--------

status before tests:

wfg@hp ~% ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.8  0.0  10316   792 ?        Ss   14:38   0:02 init [2]
root         2  0.0  0.0      0     0 ?        S<   14:38   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S<   14:38   0:00 [migration/0]
root         4  0.0  0.0      0     0 ?        S<   14:38   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   14:38   0:00 [watchdog/0]
root         6  0.0  0.0      0     0 ?        S<   14:38   0:00 [migration/1]
root         7  0.0  0.0      0     0 ?        S<   14:38   0:00 [ksoftirqd/1]
root         8  0.0  0.0      0     0 ?        S<   14:38   0:00 [watchdog/1]
root         9  0.0  0.0      0     0 ?        S<   14:38   0:00 [events/0]
root        10  0.0  0.0      0     0 ?        S<   14:38   0:00 [events/1]
root        11  0.0  0.0      0     0 ?        S<   14:38   0:00 [khelper]
root        16  0.0  0.0      0     0 ?        S<   14:38   0:00 [async/mgr]
root       160  0.0  0.0      0     0 ?        S<   14:38   0:00 [kintegrityd/0]
root       161  0.0  0.0      0     0 ?        S<   14:38   0:00 [kintegrityd/1]
root       163  0.0  0.0      0     0 ?        S<   14:38   0:00 [kblockd/0]
root       164  0.0  0.0      0     0 ?        S<   14:38   0:00 [kblockd/1]
root       165  0.0  0.0      0     0 ?        S<   14:38   0:00 [kacpid]
root       166  0.0  0.0      0     0 ?        S<   14:38   0:00 [kacpi_notify]
root       274  0.0  0.0      0     0 ?        S<   14:38   0:00 [ata/0]
root       275  0.0  0.0      0     0 ?        S<   14:38   0:00 [ata/1]
root       276  0.0  0.0      0     0 ?        S<   14:38   0:00 [ata_aux]
root       280  0.0  0.0      0     0 ?        S<   14:38   0:00 [ksuspend_usbd]
root       284  0.0  0.0      0     0 ?        S<   14:38   0:00 [khubd]
root       287  0.0  0.0      0     0 ?        S<   14:38   0:00 [kseriod]
root       329  0.0  0.0      0     0 ?        S<   14:38   0:00 [kondemand/0]
root       330  0.0  0.0      0     0 ?        S<   14:38   0:00 [kondemand/1]
root       365  0.0  0.0      0     0 ?        S<   14:38   0:00 [rt-test-0]
root       367  0.0  0.0      0     0 ?        S<   14:38   0:00 [rt-test-1]
root       369  0.0  0.0      0     0 ?        S<   14:38   0:00 [rt-test-2]
root       371  0.0  0.0      0     0 ?        S<   14:38   0:00 [rt-test-3]
root       373  0.0  0.0      0     0 ?        S<   14:38   0:00 [rt-test-4]
root       375  0.0  0.0      0     0 ?        S<   14:38   0:00 [rt-test-5]
root       377  0.0  0.0      0     0 ?        S<   14:38   0:00 [rt-test-6]
root       379  0.0  0.0      0     0 ?        S<   14:38   0:00 [rt-test-7]
root       382  0.0  0.0      0     0 ?        S    14:38   0:00 [khungtaskd]
root       383  0.0  0.0      0     0 ?        S    14:38   0:00 [pdflush]
root       384  0.0  0.0      0     0 ?        S    14:38   0:00 [pdflush]
root       385  0.0  0.0      0     0 ?        S<   14:38   0:00 [kswapd0]
root       386  0.0  0.0      0     0 ?        S<   14:38   0:00 [aio/0]
root       387  0.0  0.0      0     0 ?        S<   14:38   0:00 [aio/1]
root       388  0.0  0.0      0     0 ?        S<   14:38   0:00 [nfsiod]
root       390  0.0  0.0      0     0 ?        S<   14:38   0:00 [crypto/0]
root       391  0.0  0.0      0     0 ?        S<   14:38   0:00 [crypto/1]
root      1118  0.0  0.0      0     0 ?        S<   14:38   0:00 [iscsi_eh]
root      1122  0.0  0.0      0     0 ?        S<   14:38   0:00 [scsi_eh_0]
root      1125  0.0  0.0      0     0 ?        S<   14:38   0:00 [scsi_eh_1]
root      1128  0.0  0.0      0     0 ?        S<   14:38   0:00 [scsi_eh_2]
root      1136  0.0  0.0      0     0 ?        S<   14:38   0:00 [scsi_eh_3]
root      1139  0.0  0.0      0     0 ?        S<   14:38   0:00 [scsi_eh_4]
root      1276  0.0  0.0      0     0 ?        S<   14:38   0:00 [kpsmoused]
root      1301  0.0  0.0      0     0 ?        S<   14:38   0:00 [usbhid_resumer]
root      1312  0.0  0.0      0     0 ?        S<   14:38   0:00 [rpciod/0]
root      1313  0.0  0.0      0     0 ?        S<   14:38   0:00 [rpciod/1]
root      1488  0.0  0.0      0     0 ?        S<   14:38   0:00 [iwlagn]
root      1490  0.0  0.0      0     0 ?        S<   14:38   0:00 [phy0]
root      1524  0.0  0.0      0     0 ?        S<   14:38   0:00 [hd-audio0]
root      1577  0.0  0.0      0     0 ?        S<   14:38   0:00 [kjournald2]
root      1578  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-worker-0]
root      1579  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-submit-0]
root      1580  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-delalloc-]
root      1581  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-fixup-0]
root      1582  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-0]
root      1583  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-2]
root      1584  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-4]
root      1585  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-6]
root      1586  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-met]
root      1587  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-met]
root      1588  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-met]
root      1589  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-met]
root      1590  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-met]
root      1591  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-met]
root      1592  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-met]
root      1593  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-met]
root      1594  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-wri]
root      1595  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-wri]
root      1596  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-wri]
root      1597  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-endio-wri]
root      1598  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-cleaner]
root      1599  0.0  0.0      0     0 ?        S<   14:38   0:00 [btrfs-transacti]
daemon    1658  0.0  0.0   8024   528 ?        Ss   14:38   0:00 /sbin/portmap
root      1670  0.0  0.0  10136   792 ?        Ss   14:38   0:00 /sbin/rpc.statd
root      1679  0.0  0.0  26952   660 ?        Ss   14:38   0:00 /usr/sbin/rpc.idmapd
root      1789  0.0  0.0   3800   648 ?        Ss   14:39   0:00 /usr/sbin/acpid
104       1799  0.0  0.0  21084   996 ?        Ss   14:40   0:00 /usr/bin/dbus-daemon --system
root      1811  0.0  0.0  48872  1208 ?        Ss   14:40   0:00 /usr/sbin/sshd
root      1844  0.0  0.0      0     0 ?        S<   14:40   0:00 [lockd]
root      1845  0.0  0.0      0     0 ?        S<   14:40   0:00 [nfsd]
root      1846  0.0  0.0      0     0 ?        S<   14:40   0:00 [nfsd]
root      1847  0.0  0.0      0     0 ?        S<   14:40   0:00 [nfsd]
root      1848  0.0  0.0      0     0 ?        S<   14:40   0:00 [nfsd]
root      1849  0.0  0.0      0     0 ?        S<   14:40   0:00 [nfsd]
root      1850  0.0  0.0      0     0 ?        S<   14:40   0:00 [nfsd]
root      1851  0.0  0.0      0     0 ?        S<   14:40   0:00 [nfsd]
root      1852  0.0  0.0      0     0 ?        S<   14:40   0:00 [nfsd]
root      1856  0.0  0.0  14464   420 ?        Ss   14:40   0:00 /usr/sbin/rpc.mountd --manage-gids
106       1867  0.2  0.2  29280  4164 ?        Ss   14:40   0:00 /usr/sbin/hald
root      1868  0.0  0.0  17812  1172 ?        S    14:40   0:00 hald-runner
root      1891  0.0  0.0  19936  1132 ?        S    14:40   0:00 /usr/lib/hal/hald-addon-cpufreq
106       1892  0.0  0.0  16608   988 ?        S    14:40   0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
pulse     1902  0.0  0.1 102024  2664 ?        S