* [linux-4.1 bisection] complete test-armhf-armhf-xl-multivcpu
@ 2016-07-29 6:32 osstest service owner
From: osstest service owner @ 2016-07-29 6:32 UTC
To: xen-devel, osstest-admin
branch xen-unstable
xenbranch xen-unstable
job test-armhf-armhf-xl-multivcpu
testid debian-install
Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
*** Found and reproduced problem changeset ***
Bug is in tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Bug introduced: c5ad33184354260be6d05de57e46a5498692f6d6
Bug not present: c5bcec6cbcbf520f088dc7939934bbf10c20c5a5
Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/99777/
commit c5ad33184354260be6d05de57e46a5498692f6d6
Author: Lukasz Odzioba <lukasz.odzioba@intel.com>
Date: Fri Jun 24 14:50:01 2016 -0700
mm/swap.c: flush lru pvecs on compound page arrival
[ Upstream commit 8f182270dfec432e93fae14f9208a6b9af01009f ]
Currently we can have compound pages held on per-CPU pagevecs, which
leaves a lot of memory unavailable for reclaim when needed. On systems
with hundreds of processors it can be GBs of memory.
One way of reproducing the problem is to not call munmap explicitly on
all mapped regions (e.g. after receiving SIGTERM). After that some
pages (with THP enabled, also huge pages) may end up on lru_add_pvec,
as in the example below.
#include <string.h>
#include <sys/mman.h>

/* build with -fopenmp so that the block below runs on every CPU */
int main(void) {
#pragma omp parallel
    {
        size_t size = 55 * 1000 * 1000; // smaller than MEM/CPUS
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED)
            memset(p, 0, size);
        //munmap(p, size); // uncomment to make the problem go away
    }
    return 0;
}
When we run it with THP enabled it will leave a significant amount of
memory on lru_add_pvec. This memory will not be reclaimed if we hit
OOM, so when we run the above program in a loop:
for i in `seq 100`; do ./a.out; done
many processes (95% in my case) will be killed by the OOM killer.
The primary point of the LRU add cache is to reduce zone lru_lock
contention, in the hope that more pages will belong to the same zone
and so their addition can be batched. A huge page is already a form of
batched addition (it adds 512 pages' worth of memory in one go), so
skipping the batching seems like a safer option than a potential excess
in the caching, which can be quite large and much harder to fix because
lru_add_drain_all is way too expensive and it is not really clear what
would be a good moment to call it.
Similarly we can reproduce the problem on lru_deactivate_pvec by adding:
madvise(p, size, MADV_FREE); after memset.
This patch flushes lru pvecs on compound page arrival, making the
problem less severe: after applying it, the kill rate of the above
example drops to 0%, because the maximum amount of memory held on a
pvec falls from 28MB (with THP) to 56kB per CPU.
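The 28MB and 56kB figures can be checked with a little arithmetic. The
sketch below assumes values the commit message does not state: a
pagevec of 14 slots (PAGEVEC_SIZE in kernels of this era), 4 KiB base
pages and 2 MiB transparent huge pages. The names are illustrative,
not kernel identifiers.

```c
#include <stddef.h>

/* Assumed values, not taken from the commit message. */
enum { PAGEVEC_SLOTS = 14 };                 /* slots per pagevec */
#define BASE_PAGE_BYTES (4UL * 1024)         /* 4 KiB base page */
#define HUGE_PAGE_BYTES (2UL * 1024 * 1024)  /* 2 MiB huge page */

/* Upper bound on memory one pagevec can pin if every slot holds a
 * page of page_bytes bytes. */
unsigned long pvec_max_held_bytes(unsigned long slots,
                                  unsigned long page_bytes)
{
    return slots * page_bytes;
}

/* Number of base pages covered by one huge page. */
unsigned long base_pages_per_huge_page(void)
{
    return HUGE_PAGE_BYTES / BASE_PAGE_BYTES;
}
```

Under these assumptions a pagevec full of huge pages holds
14 * 2 MiB = 28 MiB, one full of base pages holds 14 * 4 KiB = 56 KiB,
and one huge page covers 512 base pages, which matches the numbers
quoted in the text.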
Suggested-by: Michal Hocko <mhocko@suse.com>
Link: http://lkml.kernel.org/r/1466180198-18854-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Ming Li <mingli199x@qq.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
For bisection revision-tuple graph see:
http://logs.test-lab.xenproject.org/osstest/results/bisect/linux-4.1/test-armhf-armhf-xl-multivcpu.debian-install.html
Revision IDs in each graph node refer, respectively, to the Trees above.
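As a rough illustration of the search reported above, the sketch below
bisects a linear list of revisions between a known-good and a known-bad
index. It is a simplification, not osstest's code: the real bisector
walks a graph of revision tuples across several trees and re-runs the
endpoints to confirm them. All names here are hypothetical.

```c
#include <stddef.h>

/* Hypothetical callback: nonzero if the revision at index i fails. */
typedef int (*rev_fails_fn)(size_t i, void *ctx);

/*
 * Revisions are ordered oldest to newest. `good` must be known to
 * pass, `bad` must be known to fail, and good < bad. Assumes a single
 * pass-to-fail transition. Returns the index of the first failure.
 */
size_t first_failing_rev(size_t good, size_t bad,
                         rev_fails_fn fails, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (fails(mid, ctx))
            bad = mid;   /* failure is at mid or earlier */
        else
            good = mid;  /* failure is after mid */
    }
    return bad;
}

/* Example callback: every revision from *ctx onwards fails. */
int fails_from(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

With fails_from and a threshold of 7, searching between indices 0 and
10 probes 5, 7 and 6 in that order and returns 7, the analogue of the
"Bug introduced" changeset, with index 6 as "Bug not present".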
----------------------------------------
Running cs-bisection-step --graph-out=/home/logs/results/bisect/linux-4.1/test-armhf-armhf-xl-multivcpu.debian-install --summary-out=tmp/99777.bisection-summary --basis-template=96211 --blessings=real,real-bisect linux-4.1 test-armhf-armhf-xl-multivcpu debian-install
Searching for failure / basis pass:
99714 fail [host=cubietruck-gleizes] / 96211 [host=cubietruck-metzinger] 96183 ok.
Failure / basis pass flights: 99714 / 96183
Tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
Tree: qemuu git://xenbits.xen.org/qemu-xen.git
Tree: xen git://xenbits.xen.org/xen.git
Latest 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f e763268781d341fef05d461f3057e6ced5e033f2
Basis pass 95123c0b81d9478b8155fe15093b88f57ef7d0bd c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f c6f7d21747805b50123fc1b8d73518fea2aa9096
Generating revisions with ./adhoc-revtuple-generator git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git#95123c0b81d9478b8155fe15093b88f57ef7d0bd-5880876e94699ce010554f483ccf0009997955ca git://xenbits.xen.org/osstest/linux-firmware.git#c530a75c1e6a472b0eb9558310b518f0dfcd8860-c530a75c1e6a472b0eb9558310b518f0dfcd8860 git://xenbits.xen.org/qemu-xen.git#44a072f0de0d57c95c2212bbce02888832b7b74f-44a072f0de0d57c95c2212bbce02888832b7b74f git://xenbits.xen.org/xen.git#c6f7d21747805b50123fc1b8d73518fea2aa9096-e763268781d341fef05d461f3057e6ced5e033f2
Loaded 2001 nodes in revision graph
Searching for test results:
96211 [host=cubietruck-metzinger]
96160 pass 95123c0b81d9478b8155fe15093b88f57ef7d0bd c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f c6f7d21747805b50123fc1b8d73518fea2aa9096
96183 pass 95123c0b81d9478b8155fe15093b88f57ef7d0bd c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f c6f7d21747805b50123fc1b8d73518fea2aa9096
97279 fail irrelevant
97434 fail irrelevant
97394 fail irrelevant
97496 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f b48be35ac86cd6369124cf06ca3006d086095297
97558 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f b48be35ac86cd6369124cf06ca3006d086095297
97613 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f b48be35ac86cd6369124cf06ca3006d086095297
97644 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f b48be35ac86cd6369124cf06ca3006d086095297
97692 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f e763268781d341fef05d461f3057e6ced5e033f2
97730 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f e763268781d341fef05d461f3057e6ced5e033f2
99604 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f e763268781d341fef05d461f3057e6ced5e033f2
99688 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f e763268781d341fef05d461f3057e6ced5e033f2
99664 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f e763268781d341fef05d461f3057e6ced5e033f2
99695 fail 8ca7bf099ae0e6ff096b3910895b5285a112aeb5 c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
99677 pass 95123c0b81d9478b8155fe15093b88f57ef7d0bd c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f c6f7d21747805b50123fc1b8d73518fea2aa9096
99767 pass c5bcec6cbcbf520f088dc7939934bbf10c20c5a5 c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
99733 pass a13b0f0a244b15e576f6edf4ffb9ce41ea6f3837 c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
99704 pass fcc5d265d134e891abd67169c77358cd5ea2fc77 c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
99701 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f e763268781d341fef05d461f3057e6ced5e033f2
99709 fail d4b08964d00a0b99e999a2bb1ce417e54b5c607f c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
99714 fail 5880876e94699ce010554f483ccf0009997955ca c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f e763268781d341fef05d461f3057e6ced5e033f2
99739 pass 1ff20a560eba527ba652502a2da1cd431e1e2fea c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
99746 pass 7f3724b8951735ef1d5ae4f2846b8af98a665d73 c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
99756 pass c5bcec6cbcbf520f088dc7939934bbf10c20c5a5 c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
99753 fail 683854270f84daa09baffe2b21d64ec88c614fa9 c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
99763 fail c5ad33184354260be6d05de57e46a5498692f6d6 c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
99770 fail c5ad33184354260be6d05de57e46a5498692f6d6 c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
99774 pass c5bcec6cbcbf520f088dc7939934bbf10c20c5a5 c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
99777 fail c5ad33184354260be6d05de57e46a5498692f6d6 c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
Searching for interesting versions
Result found: flight 96160 (pass), for basis pass
Result found: flight 97692 (fail), for basis failure
Repro found: flight 99677 (pass), for basis pass
Repro found: flight 99688 (fail), for basis failure
0 revisions at c5bcec6cbcbf520f088dc7939934bbf10c20c5a5 c530a75c1e6a472b0eb9558310b518f0dfcd8860 44a072f0de0d57c95c2212bbce02888832b7b74f 7da483b0236d8974cc97f81780dcf8e559a63175
No revisions left to test, checking graph state.
Result found: flight 99756 (pass), for last pass
Result found: flight 99763 (fail), for first failure
Repro found: flight 99767 (pass), for last pass
Repro found: flight 99770 (fail), for first failure
Repro found: flight 99774 (pass), for last pass
Repro found: flight 99777 (fail), for first failure
*** Found and reproduced problem changeset ***
Bug is in tree: linux git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Bug introduced: c5ad33184354260be6d05de57e46a5498692f6d6
Bug not present: c5bcec6cbcbf520f088dc7939934bbf10c20c5a5
Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/99777/
pnmtopng: 89 colors found
Revision graph left in /home/logs/results/bisect/linux-4.1/test-armhf-armhf-xl-multivcpu.debian-install.{dot,ps,png,html,svg}.
----------------------------------------
99777: tolerable ALL FAIL
flight 99777 linux-4.1 real-bisect [real]
http://logs.test-lab.xenproject.org/osstest/logs/99777/
Failures :-/ but no regressions.
Tests which did not succeed,
including tests which could not be run:
test-armhf-armhf-xl-multivcpu 9 debian-install fail baseline untested
jobs:
test-armhf-armhf-xl-multivcpu fail
------------------------------------------------------------
sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images
Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs
Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master
Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel