* [LTP] [PATCH 0/2] mem: ksm: race condition fixes
@ 2017-08-25 8:34 Andrea Arcangeli
2017-08-25 8:34 ` [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive Andrea Arcangeli
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Andrea Arcangeli @ 2017-08-25 8:34 UTC (permalink / raw)
To: ltp
Hello,
some false positives triggered the LTP ksm?? tests on some arch with
slower CPU. The volatile pages weren't zero during the checks, as they
should have been.
Volatile pages left around means there are orphaned rmap_items, that
can happen if the KSM scans aren't complete.
I believe the problem is with the waiting that wasn't waiting two full
passes, but it only waited until the values stopped changing after 10
seconds of wait time. Plus the ksm?? children were still running by
the time the wait time started.
The first patch takes care of waiting two scans (more efficiently,
with a polling of 1 sec). That change then exposes the problem with
the ksm?? children not being stopped before we start waiting, and the
second patch fixes that.
After applying these two fixes, we had hundred of successful runs on
the same system that showed volatile pages not zero, without any
further failures.
Try #2, first submit bounced because I thought I subscribed, but I was
still one click away.
Thanks,
Andrea
Andrea Arcangeli (2):
mem: ksm: fix volatile page false positive
mem: ksm: fixes for race conditions
testcases/kernel/mem/lib/mem.c | 93 ++++++++++++++++++++++--------------------
1 file changed, 49 insertions(+), 44 deletions(-)
^ permalink raw reply [flat|nested] 9+ messages in thread* [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive 2017-08-25 8:34 [LTP] [PATCH 0/2] mem: ksm: race condition fixes Andrea Arcangeli @ 2017-08-25 8:34 ` Andrea Arcangeli 2017-09-12 8:11 ` Richard Palethorpe 2017-09-13 11:58 ` Cyril Hrubis 2017-08-25 8:34 ` [LTP] [PATCH 2/2] mem: ksm: fixes for race conditions Andrea Arcangeli 2017-09-05 2:22 ` [LTP] [PATCH 0/2] mem: ksm: race condition fixes Bhupesh Sharma 2 siblings, 2 replies; 9+ messages in thread From: Andrea Arcangeli @ 2017-08-25 8:34 UTC (permalink / raw) To: ltp To guarantee all volatile rmap_items are collected, the full scan must run at least once. The other pages_* value not changing doesn't mean all volatile rmap_items have been taken down. To guarantee merging we need to wait two full scans as the first scan will only build the unstable tree. So waiting at least two KSM scans is going to satisfy both above requirements. After setting run to "2" all KSM pages have been taken down already so no need to wait any further. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- testcases/kernel/mem/lib/mem.c | 79 ++++++++++++++++++++++-------------------- 1 file changed, 42 insertions(+), 37 deletions(-) diff --git a/testcases/kernel/mem/lib/mem.c b/testcases/kernel/mem/lib/mem.c index 9df3b17b6..6d8945081 100644 --- a/testcases/kernel/mem/lib/mem.c +++ b/testcases/kernel/mem/lib/mem.c @@ -266,46 +266,38 @@ static void check(char *path, long int value) tst_res(TPASS, "%s is %ld.", path, actual_val); } -static void wait_ksmd_done(void) +static void wait_ksmd_full_scan(void) { - long pages_shared, pages_sharing, pages_volatile, pages_unshared; - long old_pages_shared = 0, old_pages_sharing = 0; - long old_pages_volatile = 0, old_pages_unshared = 0; - int changing = 1, count = 0; + unsigned long full_scans, at_least_one_full_scan; + int count = 0; - while (changing) { - sleep(10); + SAFE_FILE_SCANF(PATH_KSM "full_scans", "%lu", &full_scans); + /* + * The current scan is already in progress so we wouldn't + * guarantee to call get_user_pages() on every existing + * rmap_item if we only waited the remaining part of the + * current scan. + * + * The actual merging happens after the unstable tree has been + * built so we need to wait at least two full scans to + * guarantee merging, so wait full_scans to increment by 3 so + * at least two full scans will run. + */ + at_least_one_full_scan = full_scans + 3; + while (full_scans < at_least_one_full_scan) { + sleep(1); count++; - - SAFE_FILE_SCANF(PATH_KSM "pages_shared", "%ld", &pages_shared); - SAFE_FILE_SCANF(PATH_KSM "pages_sharing", "%ld", &pages_sharing); - SAFE_FILE_SCANF(PATH_KSM "pages_volatile", "%ld", &pages_volatile); - SAFE_FILE_SCANF(PATH_KSM "pages_unshared", "%ld", &pages_unshared); - - if (pages_shared != old_pages_shared || - pages_sharing != old_pages_sharing || - pages_volatile != old_pages_volatile || - pages_unshared != old_pages_unshared) { - old_pages_shared = pages_shared; - old_pages_sharing = pages_sharing; - old_pages_volatile = pages_volatile; - old_pages_unshared = pages_unshared; - } else { - changing = 0; - } + SAFE_FILE_SCANF(PATH_KSM "full_scans", "%lu", &full_scans); } - tst_res(TINFO, "ksm daemon takes %ds to scan all mergeable pages", - count * 10); + tst_res(TINFO, "ksm daemon takes %ds to run two full scans", + count); } -static void group_check(int run, int pages_shared, int pages_sharing, - int pages_volatile, int pages_unshared, - int sleep_millisecs, int pages_to_scan) +static void final_group_check(int run, int pages_shared, int pages_sharing, + int pages_volatile, int pages_unshared, + int sleep_millisecs, int pages_to_scan) { - /* wait for ksm daemon to scan all mergeable pages. */ - wait_ksmd_done(); - tst_res(TINFO, "check!"); check("run", run); check("pages_shared", pages_shared); @@ -316,6 +308,22 @@ static void group_check(int run, int pages_shared, int pages_sharing, check("pages_to_scan", pages_to_scan); } +static void group_check(int run, int pages_shared, int pages_sharing, + int pages_volatile, int pages_unshared, + int sleep_millisecs, int pages_to_scan) +{ + if (run != 1) { + tst_res(TFAIL, "group_check run is not 1, %d.", run); + } else { + /* wait for ksm daemon to scan all mergeable pages. */ + wait_ksmd_full_scan(); + } + + final_group_check(run, pages_shared, pages_sharing, + pages_volatile, pages_unshared, + sleep_millisecs, pages_to_scan); +} + static void verify(char **memory, char value, int proc, int start, int end, int start2, int end2) { @@ -535,11 +543,11 @@ void create_same_memory(int size, int num, int unit) SAFE_FILE_PRINTF(PATH_KSM "run", "2"); resume_ksm_children(child, num); - group_check(2, 0, 0, 0, 0, 0, size * pages * num); + final_group_check(2, 0, 0, 0, 0, 0, size * pages * num); tst_res(TINFO, "stop KSM."); SAFE_FILE_PRINTF(PATH_KSM "run", "0"); - group_check(0, 0, 0, 0, 0, 0, size * pages * num); + final_group_check(0, 0, 0, 0, 0, 0, size * pages * num); while (waitpid(-1, &status, WUNTRACED | WCONTINUED) > 0) if (WEXITSTATUS(status) != 0) @@ -610,7 +618,6 @@ void test_ksm_merge_across_nodes(unsigned long nr_pages) * to remerge according to the new setting. */ SAFE_FILE_PRINTF(PATH_KSM "run", "2"); - wait_ksmd_done(); tst_res(TINFO, "Start to test KSM with merge_across_nodes=1"); SAFE_FILE_PRINTF(PATH_KSM "merge_across_nodes", "1"); SAFE_FILE_PRINTF(PATH_KSM "run", "1"); @@ -618,7 +625,6 @@ void test_ksm_merge_across_nodes(unsigned long nr_pages) nr_pages * num_nodes); SAFE_FILE_PRINTF(PATH_KSM "run", "2"); - wait_ksmd_done(); tst_res(TINFO, "Start to test KSM with merge_across_nodes=0"); SAFE_FILE_PRINTF(PATH_KSM "merge_across_nodes", "0"); SAFE_FILE_PRINTF(PATH_KSM "run", "1"); @@ -626,7 +632,6 @@ void test_ksm_merge_across_nodes(unsigned long nr_pages) 0, 0, 0, nr_pages * num_nodes); SAFE_FILE_PRINTF(PATH_KSM "run", "2"); - wait_ksmd_done(); } /* THP */ ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive 2017-08-25 8:34 ` [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive Andrea Arcangeli @ 2017-09-12 8:11 ` Richard Palethorpe 2017-09-13 11:58 ` Cyril Hrubis 1 sibling, 0 replies; 9+ messages in thread From: Richard Palethorpe @ 2017-09-12 8:11 UTC (permalink / raw) To: ltp Hello, Andrea Arcangeli writes: > + /* > + * The current scan is already in progress so we wouldn't > + * guarantee to call get_user_pages() on every existing > + * rmap_item if we only waited the remaining part of the > + * current scan. Only a minor nit, but the wording is a bit unnatural here, it should probably be: "... in progress so we can't guarantee that get_user_pages() is is called on every existing rmap_item if ..." > + * > + * The actual merging happens after the unstable tree has been > + * built so we need to wait at least two full scans to > + * guarantee merging, so wait full_scans to increment by 3 so > + * at least two full scans will run. > + */ -- Thank you, Richard. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive 2017-08-25 8:34 ` [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive Andrea Arcangeli 2017-09-12 8:11 ` Richard Palethorpe @ 2017-09-13 11:58 ` Cyril Hrubis 2017-09-13 13:47 ` Andrea Arcangeli 1 sibling, 1 reply; 9+ messages in thread From: Cyril Hrubis @ 2017-09-13 11:58 UTC (permalink / raw) To: ltp Hi! > + /* > + * The current scan is already in progress so we wouldn't > + * guarantee to call get_user_pages() on every existing > + * rmap_item if we only waited the remaining part of the > + * current scan. > + * > + * The actual merging happens after the unstable tree has been > + * built so we need to wait at least two full scans to > + * guarantee merging, so wait full_scans to increment by 3 so > + * at least two full scans will run. > + */ I've took a liberty in rewording this comment a bit and pushed the patchset, thanks. -- Cyril Hrubis chrubis@suse.cz ^ permalink raw reply [flat|nested] 9+ messages in thread
* [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive 2017-09-13 11:58 ` Cyril Hrubis @ 2017-09-13 13:47 ` Andrea Arcangeli 0 siblings, 0 replies; 9+ messages in thread From: Andrea Arcangeli @ 2017-09-13 13:47 UTC (permalink / raw) To: ltp Hello Cyril, On Wed, Sep 13, 2017 at 01:58:27PM +0200, Cyril Hrubis wrote: > > + /* > > + * The current scan is already in progress so we wouldn't > > + * guarantee to call get_user_pages() on every existing > > + * rmap_item if we only waited the remaining part of the > > + * current scan. > > + * > > + * The actual merging happens after the unstable tree has been > > + * built so we need to wait at least two full scans to > > + * guarantee merging, so wait full_scans to increment by 3 so > > + * at least two full scans will run. > > + */ > > I've took a liberty in rewording this comment a bit and pushed the > patchset, thanks. Perfect, thanks! Andrea ^ permalink raw reply [flat|nested] 9+ messages in thread
* [LTP] [PATCH 2/2] mem: ksm: fixes for race conditions 2017-08-25 8:34 [LTP] [PATCH 0/2] mem: ksm: race condition fixes Andrea Arcangeli 2017-08-25 8:34 ` [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive Andrea Arcangeli @ 2017-08-25 8:34 ` Andrea Arcangeli 2017-09-13 7:46 ` Richard Palethorpe 2017-09-05 2:22 ` [LTP] [PATCH 0/2] mem: ksm: race condition fixes Bhupesh Sharma 2 siblings, 1 reply; 9+ messages in thread From: Andrea Arcangeli @ 2017-08-25 8:34 UTC (permalink / raw) To: ltp stop_ksm_children doesn't actually stop the children, it waits the children to stop themselves after they finished setting up the memory. Problem is the test was waiting for two passes of KSM (to merge all the memory) in wait_ksmd_full_scan before waiting the child to stop setting up the memory. Two passes of KSM aren't enough if the children aren't settled yet. This fixes it by waiting the children to stop operating before waiting KSM to merge the memory. waitpid WCONTINUED also doesn't make sense, it's better and more strict to wait until all children truly exited, otherwise it would just return immediately because the children were waken by SIGCONT at the last resume. Removing WCONTINUED will let the parent truly wait the children to verify the memory and exit after it. These issues showed up after speeding up the wait_ksmd_full_scan and after removing other unnecessary wait times. The test is now faster and at the same time more accurate by waiting at least two passes of KSM which by the kernel implementation we know is enough to reach a settled merged state without the need of other artificial delays. In case of failure of the checks, this now also prints the actual value to provide more debug info in the log. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- testcases/kernel/mem/lib/mem.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/testcases/kernel/mem/lib/mem.c b/testcases/kernel/mem/lib/mem.c index 6d8945081..4f9779ea5 100644 --- a/testcases/kernel/mem/lib/mem.c +++ b/testcases/kernel/mem/lib/mem.c @@ -261,7 +261,8 @@ static void check(char *path, long int value) SAFE_FILE_SCANF(fullpath, "%ld", &actual_val); if (actual_val != value) - tst_res(TFAIL, "%s is not %ld.", path, value); + tst_res(TFAIL, "%s is not %ld but %ld.", path, value, + actual_val); else tst_res(TPASS, "%s is %ld.", path, actual_val); } @@ -523,21 +524,20 @@ void create_same_memory(int size, int num, int unit) SAFE_FILE_PRINTF(PATH_KSM "sleep_millisecs", "0"); resume_ksm_children(child, num); + stop_ksm_children(child, num); group_check(1, 2, size * num * pages - 2, 0, 0, 0, size * pages * num); - stop_ksm_children(child, num); resume_ksm_children(child, num); + stop_ksm_children(child, num); group_check(1, 3, size * num * pages - 3, 0, 0, 0, size * pages * num); - stop_ksm_children(child, num); resume_ksm_children(child, num); + stop_ksm_children(child, num); group_check(1, 1, size * num * pages - 1, 0, 0, 0, size * pages * num); - stop_ksm_children(child, num); resume_ksm_children(child, num); - group_check(1, 1, size * num * pages - 2, 0, 1, 0, size * pages * num); - stop_ksm_children(child, num); + group_check(1, 1, size * num * pages - 2, 0, 1, 0, size * pages * num); tst_res(TINFO, "KSM unmerging..."); SAFE_FILE_PRINTF(PATH_KSM "run", "2"); @@ -549,7 +549,7 @@ void create_same_memory(int size, int num, int unit) SAFE_FILE_PRINTF(PATH_KSM "run", "0"); final_group_check(0, 0, 0, 0, 0, 0, size * pages * num); - while (waitpid(-1, &status, WUNTRACED | WCONTINUED) > 0) + while (waitpid(-1, &status, 0) > 0) if (WEXITSTATUS(status) != 0) tst_res(TFAIL, "child exit status is %d", WEXITSTATUS(status)); ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [LTP] [PATCH 2/2] mem: ksm: fixes for race conditions 2017-08-25 8:34 ` [LTP] [PATCH 2/2] mem: ksm: fixes for race conditions Andrea Arcangeli @ 2017-09-13 7:46 ` Richard Palethorpe 0 siblings, 0 replies; 9+ messages in thread From: Richard Palethorpe @ 2017-09-13 7:46 UTC (permalink / raw) To: ltp Andrea Arcangeli writes: > stop_ksm_children doesn't actually stop the children, it waits the > children to stop themselves after they finished setting up the memory. > > Problem is the test was waiting for two passes of KSM (to merge all > the memory) in wait_ksmd_full_scan before waiting the child to stop > setting up the memory. Two passes of KSM aren't enough if the children > aren't settled yet. This fixes it by waiting the children to stop > operating before waiting KSM to merge the memory. > > waitpid WCONTINUED also doesn't make sense, it's better and more > strict to wait until all children truly exited, otherwise it would > just return immediately because the children were waken by SIGCONT at > the last resume. Removing WCONTINUED will let the parent truly wait > the children to verify the memory and exit after it. > > These issues showed up after speeding up the wait_ksmd_full_scan and > after removing other unnecessary wait times. > > The test is now faster and at the same time more accurate by waiting > at least two passes of KSM which by the kernel implementation we know > is enough to reach a settled merged state without the need of other > artificial delays. > > In case of failure of the checks, this now also prints the actual > value to provide more debug info in the log. > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > --- > testcases/kernel/mem/lib/mem.c | 14 +++++++------- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/testcases/kernel/mem/lib/mem.c b/testcases/kernel/mem/lib/mem.c > index 6d8945081..4f9779ea5 100644 > --- a/testcases/kernel/mem/lib/mem.c > +++ b/testcases/kernel/mem/lib/mem.c > @@ -261,7 +261,8 @@ static void check(char *path, long int value) > SAFE_FILE_SCANF(fullpath, "%ld", &actual_val); > > if (actual_val != value) > - tst_res(TFAIL, "%s is not %ld.", path, value); > + tst_res(TFAIL, "%s is not %ld but %ld.", path, value, > + actual_val); > else > tst_res(TPASS, "%s is %ld.", path, actual_val); > } > @@ -523,21 +524,20 @@ void create_same_memory(int size, int num, int unit) > SAFE_FILE_PRINTF(PATH_KSM "sleep_millisecs", "0"); > > resume_ksm_children(child, num); > + stop_ksm_children(child, num); > group_check(1, 2, size * num * pages - 2, 0, 0, 0, size * pages * num); > > - stop_ksm_children(child, num); > resume_ksm_children(child, num); > + stop_ksm_children(child, num); > group_check(1, 3, size * num * pages - 3, 0, 0, 0, size * pages * num); > > - stop_ksm_children(child, num); > resume_ksm_children(child, num); > + stop_ksm_children(child, num); > group_check(1, 1, size * num * pages - 1, 0, 0, 0, size * pages * num); > > - stop_ksm_children(child, num); > resume_ksm_children(child, num); > - group_check(1, 1, size * num * pages - 2, 0, 1, 0, size * pages * num); > - > stop_ksm_children(child, num); > + group_check(1, 1, size * num * pages - 2, 0, 1, 0, size * pages * num); > > tst_res(TINFO, "KSM unmerging..."); > SAFE_FILE_PRINTF(PATH_KSM "run", "2"); > @@ -549,7 +549,7 @@ void create_same_memory(int size, int num, int unit) > SAFE_FILE_PRINTF(PATH_KSM "run", "0"); > final_group_check(0, 0, 0, 0, 0, 0, size * pages * num); > > - while (waitpid(-1, &status, WUNTRACED | WCONTINUED) > 0) > + while (waitpid(-1, &status, 0) > 0) > if (WEXITSTATUS(status) != 0) > tst_res(TFAIL, "child exit status is %d", > WEXITSTATUS(status)); Ack, patch-set looks good to me. -- Thank you, Richard. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [LTP] [PATCH 0/2] mem: ksm: race condition fixes 2017-08-25 8:34 [LTP] [PATCH 0/2] mem: ksm: race condition fixes Andrea Arcangeli 2017-08-25 8:34 ` [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive Andrea Arcangeli 2017-08-25 8:34 ` [LTP] [PATCH 2/2] mem: ksm: fixes for race conditions Andrea Arcangeli @ 2017-09-05 2:22 ` Bhupesh Sharma 2 siblings, 0 replies; 9+ messages in thread From: Bhupesh Sharma @ 2017-09-05 2:22 UTC (permalink / raw) To: ltp Hi Andrea, On Fri, Aug 25, 2017 at 2:04 PM, Andrea Arcangeli <aarcange@redhat.com> wrote: > Hello, > > some false positives triggered the LTP ksm?? tests on some arch with > slower CPU. The volatile pages weren't zero during the checks, as they > should have been. > > Volatile pages left around means there are orphaned rmap_items, that > can happen if the KSM scans aren't complete. > > I believe the problem is with the waiting that wasn't waiting two full > passes, but it only waited until the values stopped changing after 10 > seconds of wait time. Plus the ksm?? children were still running by > the time the wait time started. > > The first patch takes care of waiting two scans (more efficiently, > with a polling of 1 sec). That change then exposes the problem with > the ksm?? children not being stopped before we start waiting, and the > second patch fixes that. > > After applying these two fixes, we had hundred of successful runs on > the same system that showed volatile pages not zero, without any > further failures. > > Try #2, first submit bounced because I thought I subscribed, but I was > still one click away. > > Thanks, > Andrea > > Andrea Arcangeli (2): > mem: ksm: fix volatile page false positive > mem: ksm: fixes for race conditions > > testcases/kernel/mem/lib/mem.c | 93 ++++++++++++++++++++++-------------------- > 1 file changed, 49 insertions(+), 44 deletions(-) Thanks for the patchset. I have tested the same on both x86_64 and aarch64 host machine and also reviewed the same: Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* [LTP] [PATCH 0/2] mem: ksm: race condition fixes @ 2017-08-24 17:54 Andrea Arcangeli 2017-08-24 17:54 ` [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive Andrea Arcangeli 0 siblings, 1 reply; 9+ messages in thread From: Andrea Arcangeli @ 2017-08-24 17:54 UTC (permalink / raw) To: ltp Hello, some false positives triggered the LTP ksm?? tests on some arch with slower CPU. The volatile pages weren't zero during the checks, as they should have been. Volatile pages left around means there are orphaned rmap_items, that can happen if the KSM scans aren't complete. I believe the problem is with the waiting that wasn't waiting two full passes, but it only waited until the values stopped changing after 10 seconds of wait time. Plus the ksm?? children were still running by the time the wait time started. The first patch takes care of waiting two scans (more efficiently, with a polling of 1 sec). That change then exposes the problem with the ksm?? children not being stopped before we start waiting, and the second patch fixes that. After applying these two fixes, we had hundred of successful runs on the same system that showed volatile pages not zero, without any further failures. Thanks, Andrea Andrea Arcangeli (2): mem: ksm: fix volatile page false positive mem: ksm: fixes for race conditions testcases/kernel/mem/lib/mem.c | 93 ++++++++++++++++++++++-------------------- 1 file changed, 49 insertions(+), 44 deletions(-) ^ permalink raw reply [flat|nested] 9+ messages in thread
* [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive 2017-08-24 17:54 Andrea Arcangeli @ 2017-08-24 17:54 ` Andrea Arcangeli 0 siblings, 0 replies; 9+ messages in thread From: Andrea Arcangeli @ 2017-08-24 17:54 UTC (permalink / raw) To: ltp To guarantee all volatile rmap_items are collected, the full scan must run at least once. The other pages_* value not changing doesn't mean all volatile rmap_items have been taken down. To guarantee merging we need to wait two full scans as the first scan will only build the unstable tree. So waiting at least two KSM scans is going to satisfy both above requirements. After setting run to "2" all KSM pages have been taken down already so no need to wait any further. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> --- testcases/kernel/mem/lib/mem.c | 79 ++++++++++++++++++++++-------------------- 1 file changed, 42 insertions(+), 37 deletions(-) diff --git a/testcases/kernel/mem/lib/mem.c b/testcases/kernel/mem/lib/mem.c index 9df3b17b6..6d8945081 100644 --- a/testcases/kernel/mem/lib/mem.c +++ b/testcases/kernel/mem/lib/mem.c @@ -266,46 +266,38 @@ static void check(char *path, long int value) tst_res(TPASS, "%s is %ld.", path, actual_val); } -static void wait_ksmd_done(void) +static void wait_ksmd_full_scan(void) { - long pages_shared, pages_sharing, pages_volatile, pages_unshared; - long old_pages_shared = 0, old_pages_sharing = 0; - long old_pages_volatile = 0, old_pages_unshared = 0; - int changing = 1, count = 0; + unsigned long full_scans, at_least_one_full_scan; + int count = 0; - while (changing) { - sleep(10); + SAFE_FILE_SCANF(PATH_KSM "full_scans", "%lu", &full_scans); + /* + * The current scan is already in progress so we wouldn't + * guarantee to call get_user_pages() on every existing + * rmap_item if we only waited the remaining part of the + * current scan. + * + * The actual merging happens after the unstable tree has been + * built so we need to wait at least two full scans to + * guarantee merging, so wait full_scans to increment by 3 so + * at least two full scans will run. + */ + at_least_one_full_scan = full_scans + 3; + while (full_scans < at_least_one_full_scan) { + sleep(1); count++; - - SAFE_FILE_SCANF(PATH_KSM "pages_shared", "%ld", &pages_shared); - SAFE_FILE_SCANF(PATH_KSM "pages_sharing", "%ld", &pages_sharing); - SAFE_FILE_SCANF(PATH_KSM "pages_volatile", "%ld", &pages_volatile); - SAFE_FILE_SCANF(PATH_KSM "pages_unshared", "%ld", &pages_unshared); - - if (pages_shared != old_pages_shared || - pages_sharing != old_pages_sharing || - pages_volatile != old_pages_volatile || - pages_unshared != old_pages_unshared) { - old_pages_shared = pages_shared; - old_pages_sharing = pages_sharing; - old_pages_volatile = pages_volatile; - old_pages_unshared = pages_unshared; - } else { - changing = 0; - } + SAFE_FILE_SCANF(PATH_KSM "full_scans", "%lu", &full_scans); } - tst_res(TINFO, "ksm daemon takes %ds to scan all mergeable pages", - count * 10); + tst_res(TINFO, "ksm daemon takes %ds to run two full scans", + count); } -static void group_check(int run, int pages_shared, int pages_sharing, - int pages_volatile, int pages_unshared, - int sleep_millisecs, int pages_to_scan) +static void final_group_check(int run, int pages_shared, int pages_sharing, + int pages_volatile, int pages_unshared, + int sleep_millisecs, int pages_to_scan) { - /* wait for ksm daemon to scan all mergeable pages. */ - wait_ksmd_done(); - tst_res(TINFO, "check!"); check("run", run); check("pages_shared", pages_shared); @@ -316,6 +308,22 @@ static void group_check(int run, int pages_shared, int pages_sharing, check("pages_to_scan", pages_to_scan); } +static void group_check(int run, int pages_shared, int pages_sharing, + int pages_volatile, int pages_unshared, + int sleep_millisecs, int pages_to_scan) +{ + if (run != 1) { + tst_res(TFAIL, "group_check run is not 1, %d.", run); + } else { + /* wait for ksm daemon to scan all mergeable pages. */ + wait_ksmd_full_scan(); + } + + final_group_check(run, pages_shared, pages_sharing, + pages_volatile, pages_unshared, + sleep_millisecs, pages_to_scan); +} + static void verify(char **memory, char value, int proc, int start, int end, int start2, int end2) { @@ -535,11 +543,11 @@ void create_same_memory(int size, int num, int unit) SAFE_FILE_PRINTF(PATH_KSM "run", "2"); resume_ksm_children(child, num); - group_check(2, 0, 0, 0, 0, 0, size * pages * num); + final_group_check(2, 0, 0, 0, 0, 0, size * pages * num); tst_res(TINFO, "stop KSM."); SAFE_FILE_PRINTF(PATH_KSM "run", "0"); - group_check(0, 0, 0, 0, 0, 0, size * pages * num); + final_group_check(0, 0, 0, 0, 0, 0, size * pages * num); while (waitpid(-1, &status, WUNTRACED | WCONTINUED) > 0) if (WEXITSTATUS(status) != 0) @@ -610,7 +618,6 @@ void test_ksm_merge_across_nodes(unsigned long nr_pages) * to remerge according to the new setting. */ SAFE_FILE_PRINTF(PATH_KSM "run", "2"); - wait_ksmd_done(); tst_res(TINFO, "Start to test KSM with merge_across_nodes=1"); SAFE_FILE_PRINTF(PATH_KSM "merge_across_nodes", "1"); SAFE_FILE_PRINTF(PATH_KSM "run", "1"); @@ -618,7 +625,6 @@ void test_ksm_merge_across_nodes(unsigned long nr_pages) nr_pages * num_nodes); SAFE_FILE_PRINTF(PATH_KSM "run", "2"); - wait_ksmd_done(); tst_res(TINFO, "Start to test KSM with merge_across_nodes=0"); SAFE_FILE_PRINTF(PATH_KSM "merge_across_nodes", "0"); SAFE_FILE_PRINTF(PATH_KSM "run", "1"); @@ -626,7 +632,6 @@ void test_ksm_merge_across_nodes(unsigned long nr_pages) 0, 0, 0, nr_pages * num_nodes); SAFE_FILE_PRINTF(PATH_KSM "run", "2"); - wait_ksmd_done(); } /* THP */ ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-09-13 13:47 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-08-25 8:34 [LTP] [PATCH 0/2] mem: ksm: race condition fixes Andrea Arcangeli 2017-08-25 8:34 ` [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive Andrea Arcangeli 2017-09-12 8:11 ` Richard Palethorpe 2017-09-13 11:58 ` Cyril Hrubis 2017-09-13 13:47 ` Andrea Arcangeli 2017-08-25 8:34 ` [LTP] [PATCH 2/2] mem: ksm: fixes for race conditions Andrea Arcangeli 2017-09-13 7:46 ` Richard Palethorpe 2017-09-05 2:22 ` [LTP] [PATCH 0/2] mem: ksm: race condition fixes Bhupesh Sharma -- strict thread matches above, loose matches on Subject: below -- 2017-08-24 17:54 Andrea Arcangeli 2017-08-24 17:54 ` [LTP] [PATCH 1/2] mem: ksm: fix volatile page false positive Andrea Arcangeli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox