From: Minchan Kim
To: lkp@lists.01.org
Subject: Re: [mm] 5c0a85fad9: unixbench.score -6.3% regression
Date: Fri, 17 Jun 2016 14:41:56 +0900
Message-ID: <20160617054156.GB2374@bbox>
In-Reply-To: <87a8ikkbvj.fsf@yhuang-mobile.sh.intel.com>

On Thu, Jun 16, 2016 at 03:27:44PM -0700, Huang, Ying wrote:
> Minchan Kim writes:
>
> > On Thu, Jun 16, 2016 at 07:52:26AM +0800, Huang, Ying wrote:
> >> "Kirill A. Shutemov" writes:
> >>
> >> > On Tue, Jun 14, 2016 at 05:57:28PM +0900, Minchan Kim wrote:
> >> >> On Wed, Jun 08, 2016 at 11:58:11AM +0300, Kirill A. Shutemov wrote:
> >> >> > On Wed, Jun 08, 2016 at 04:41:37PM +0800, Huang, Ying wrote:
> >> >> > > "Huang, Ying" writes:
> >> >> > >
> >> >> > > > "Kirill A. Shutemov" writes:
> >> >> > > >
> >> >> > > >> On Mon, Jun 06, 2016 at 10:27:24AM +0800, kernel test robot wrote:
> >> >> > > >>>
> >> >> > > >>> FYI, we noticed a -6.3% regression of unixbench.score due to commit:
> >> >> > > >>>
> >> >> > > >>> commit 5c0a85fad949212b3e059692deecdeed74ae7ec7 ("mm: make faultaround produce old ptes")
> >> >> > > >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> >> >> > > >>>
> >> >> > > >>> in testcase: unixbench
> >> >> > > >>> on test machine: lituya: 16 threads Haswell High-end Desktop (i7-5960X 3.0G) with 16G memory
> >> >> > > >>> with following parameters: cpufreq_governor=performance/nr_task=1/test=shell8
> >> >> > > >>>
> >> >> > > >>>
> >> >> > > >>> Details are as below:
> >> >> > > >>> -------------------------------------------------------------------------------------------------->
> >> >> > > >>>
> >> >> > > >>>
> >> >> > > >>> =========================================================================================
> >> >> > > >>> compiler/cpufreq_governor/kconfig/nr_task/rootfs/tbox_group/test/testcase:
> >> >> > > >>>   gcc-4.9/performance/x86_64-rhel/1/debian-x86_64-2015-02-07.cgz/lituya/shell8/unixbench
> >> >> > > >>>
> >> >> > > >>> commit:
> >> >> > > >>>   4b50bcc7eda4d3cc9e3f2a0aa60e590fedf728c5
> >> >> > > >>>   5c0a85fad949212b3e059692deecdeed74ae7ec7
> >> >> > > >>>
> >> >> > > >>> 4b50bcc7eda4d3cc 5c0a85fad949212b3e059692de
> >> >> > > >>> ---------------- --------------------------
> >> >> > > >>>        fail:runs  %reproduction    fail:runs
> >> >> > > >>>            |             |             |
> >> >> > > >>>           3:4          -75%            :4    kmsg.DHCP/BOOTP:Reply_not_for_us,op[#]xid[#]
> >> >> > > >>>          %stddev     %change         %stddev
> >> >> > > >>>              \          |                \
> >> >> > > >>>      14321 ±  0%      -6.3%      13425 ±  0%  unixbench.score
> >> >> > > >>>    1996897 ±  0%      -6.1%    1874635 ±  0%  unixbench.time.involuntary_context_switches
> >> >> > > >>>  1.721e+08 ±  0%      -6.2%  1.613e+08 ±  0%  unixbench.time.minor_page_faults
> >> >> > > >>>     758.65 ±  0%      -3.0%     735.86 ±  0%  unixbench.time.system_time
> >> >> > > >>>     387.66 ±  0%      +5.4%     408.49 ±  0%  unixbench.time.user_time
> >> >> > > >>>    5950278 ±  0%      -6.2%    5583456 ±  0%  unixbench.time.voluntary_context_switches
> >> >> > > >>
> >> >> > > >> That's weird.
> >> >> > > >>
> >> >> > > >> I don't understand why the change would reduce the number of minor faults.
> >> >> > > >> It should stay the same on x86-64. The rise in user_time is puzzling too.
> >> >> > > >
> >> >> > > > unixbench runs in fixed-time mode. That is, the total time to run
> >> >> > > > unixbench is fixed, but the work done varies. So the minor_page_faults
> >> >> > > > change may reflect only the work done.
> >> >> > > >
> >> >> > > >> Hm. Is it reproducible? Across reboots?
> >> >> > > >
> >> >> > >
> >> >> > > And FYI, there is no swap set up for the test; the whole root file system,
> >> >> > > including the benchmark files, is in tmpfs, so no real page reclaim will be
> >> >> > > triggered. But it appears that the active file cache shrank after the
> >> >> > > commit.
> >> >> > >
> >> >> > >     111331 ±  1%     -13.3%      96503 ±  0%  meminfo.Active
> >> >> > >      27603 ±  1%     -43.9%      15486 ±  0%  meminfo.Active(file)
> >> >> > >
> >> >> > > I think this is the expected behavior of the commit?
> >> >> >
> >> >> > Yes, it's expected.
> >> >> >
> >> >> > After the change, faultaround would produce old ptes. It means there's more
> >> >> > chance for these pages to be on the inactive LRU, unless somebody actually
> >> >> > touches them and flips the accessed bit.
> >> >>
> >> >> Hmm, tmpfs pages should be on the anonymous LRU list, and the VM shouldn't scan
> >> >> the anonymous LRU list on a swapless system, so I really wonder why the active
> >> >> file LRU is shrunk.
> >> >
> >> > Hm. Good point.
> >> > I don't know why we have anything on the file LRU if there are no
> >> > filesystems except tmpfs.
> >> >
> >> > Ying, how do you get stuff into the tmpfs?
> >>
> >> We put the root file system and the benchmark into a set of compressed cpio
> >> archives, then concatenate them into one initrd, and finally the kernel uses
> >> that initrd as its initramfs.
> >
> > I see.
> >
> > Could you share your 4 full vmstat (/proc/vmstat) files?
> >
> > old:
> >
> > cat /proc/vmstat > before.old.vmstat
> > do benchmark
> > cat /proc/vmstat > after.old.vmstat
> >
> > new:
> >
> > cat /proc/vmstat > before.new.vmstat
> > do benchmark
> > cat /proc/vmstat > after.new.vmstat
> >
> > IOW, I want to see stats related to reclaim.
>
> Hi,
>
> The /proc/vmstat for the parent commit (parent-proc-vmstat.gz) and the first
> bad commit (fbc-proc-vmstat.gz) are attached to the email.
>
> The files contain more than the vmstat before and after the
> benchmark run: they are sampled every second. Every sample begins
> with "time: