* [LTP] [PATCH] madvise06: wait a bit after madvise() call @ 2016-07-18 13:37 Jan Stancek 2016-07-18 14:03 ` Cyril Hrubis 2016-07-19 5:58 ` Li Wang 0 siblings, 2 replies; 17+ messages in thread From: Jan Stancek @ 2016-07-18 13:37 UTC (permalink / raw) To: ltp madvise_willneed() only schedules I/O and does not wait for completion, so there is possibility for this testcase to fail occasionally. This patch is introducing a small delay and checks the effect of madvise on more pages. Signed-off-by: Jan Stancek <jstancek@redhat.com> --- testcases/kernel/syscalls/madvise/madvise06.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) Some other obsverations that are not addressed by this patch: Testcase assumes that swap is enabled. Testcase assumes that there is enough swap. Testcase doesn't check buf[0] is swapped before it calls madvise(). diff --git a/testcases/kernel/syscalls/madvise/madvise06.c b/testcases/kernel/syscalls/madvise/madvise06.c index 6b081fddf5eb..1b0f58cb319d 100644 --- a/testcases/kernel/syscalls/madvise/madvise06.c +++ b/testcases/kernel/syscalls/madvise/madvise06.c @@ -77,6 +77,7 @@ static void test_advice_willneed(void) char *dst[100]; int page_fault_num_1; int page_fault_num_2; + const int pages_to_check = 50; /* allocate source memory (1GB only) */ src = SAFE_MMAP(NULL, 1 * GB_SZ, PROT_READ | PROT_WRITE, @@ -97,18 +98,23 @@ static void test_advice_willneed(void) tst_res(TINFO, "PageFault(no madvice): %d", get_page_fault_num()); /* Do madvice() to dst[0] */ - TEST(madvise(dst[0], pg_sz, MADV_WILLNEED)); + TEST(madvise(dst[0], pages_to_check * pg_sz, MADV_WILLNEED)); if (TEST_RETURN == -1) tst_brk(TBROK | TERRNO, "madvise failed"); - page_fault_num_1 = get_page_fault_num(); - tst_res(TINFO, "PageFault(madvice / no mem access): %d", - page_fault_num_1); - - *dst[0] = 'a'; - page_fault_num_2 = get_page_fault_num(); - tst_res(TINFO, "PageFault(madvice / mem access): %d", - page_fault_num_2); + i = 0; + do { + i++; + usleep(100000); + + page_fault_num_1 = get_page_fault_num(); + tst_res(TINFO, "PageFault(madvice / no mem access): %d", + page_fault_num_1); + dst[0][i * pg_sz] = 'a'; + page_fault_num_2 = get_page_fault_num(); + tst_res(TINFO, "PageFault(madvice / mem access): %d", + page_fault_num_2); + } while (page_fault_num_1 != page_fault_num_2 && i < pages_to_check); if (page_fault_num_1 != page_fault_num_2) tst_res(TFAIL, "Bug has been reproduced"); -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-18 13:37 [LTP] [PATCH] madvise06: wait a bit after madvise() call Jan Stancek @ 2016-07-18 14:03 ` Cyril Hrubis 2016-07-18 14:22 ` Jan Stancek 2016-07-19 5:58 ` Li Wang 1 sibling, 1 reply; 17+ messages in thread From: Cyril Hrubis @ 2016-07-18 14:03 UTC (permalink / raw) To: ltp Hi! > madvise_willneed() only schedules I/O and does not > wait for completion, so there is possibility for this > testcase to fail occasionally. > > This patch is introducing a small delay and checks the > effect of madvise on more pages. Looks good. What is the reason for using more than one page? Does that increase the likehood of triggering the issue? Another idea may be to utilize bitflags from /proc/self/pagemap which would be non-destructive way to get the present/swapped information. Then we can do a short loop that polls these flags with much shorter sleep, but the code would likely end up more complicated than this... -- Cyril Hrubis chrubis@suse.cz ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-18 14:03 ` Cyril Hrubis @ 2016-07-18 14:22 ` Jan Stancek 2016-07-18 14:49 ` Cyril Hrubis 0 siblings, 1 reply; 17+ messages in thread From: Jan Stancek @ 2016-07-18 14:22 UTC (permalink / raw) To: ltp ----- Original Message ----- > From: "Cyril Hrubis" <chrubis@suse.cz> > To: "Jan Stancek" <jstancek@redhat.com> > Cc: ltp@lists.linux.it, liwan@redhat.com > Sent: Monday, 18 July, 2016 4:03:19 PM > Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call > > Hi! > > madvise_willneed() only schedules I/O and does not > > wait for completion, so there is possibility for this > > testcase to fail occasionally. > > > > This patch is introducing a small delay and checks the > > effect of madvise on more pages. > > Looks good. > > What is the reason for using more than one page? Does that increase the > likehood of triggering the issue? Because testcase faults in page by writing to it. After it does that, page is present and subsequent loops would always give you PASS. Multiple pages give you same starting state (page is swapped, currently test assumes they are), if madvise loaded it from swap already (I/O caught up), then write shouldn't change number of maj_faults. > > Another idea may be to utilize bitflags from /proc/self/pagemap which > would be non-destructive way to get the present/swapped information. That is what I meant by "Testcase doesn't check buf[0] is swapped". You can check that page is swapped before calling madvise... > Then we can do a short loop that polls these flags with much shorter > sleep, but the code would likely end up more complicated than this... ... but that page won't be present after madvise call (I checked), I'm assuming that's because madvise loads it only to page cache (in same style as readahead). Regards, Jan > > -- > Cyril Hrubis > chrubis@suse.cz > ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-18 14:22 ` Jan Stancek @ 2016-07-18 14:49 ` Cyril Hrubis 0 siblings, 0 replies; 17+ messages in thread From: Cyril Hrubis @ 2016-07-18 14:49 UTC (permalink / raw) To: ltp Hi! > > What is the reason for using more than one page? Does that increase the > > likehood of triggering the issue? > > Because testcase faults in page by writing to it. After it does that, page > is present and subsequent loops would always give you PASS. > Multiple pages give you same starting state (page is swapped, currently > test assumes they are), if madvise loaded it from swap already (I/O caught up), > then write shouldn't change number of maj_faults. Right. Now that I finished my coffe it's clear as a day :). > > Another idea may be to utilize bitflags from /proc/self/pagemap which > > would be non-destructive way to get the present/swapped information. > > That is what I meant by "Testcase doesn't check buf[0] is swapped". You > can check that page is swapped before calling madvise... Hmm, looking at the test again, without this the patch makes the test theoretically more fragile. Since it would pass if, by a chance, one of these pages wasn't swapped out. > > Then we can do a short loop that polls these flags with much shorter > > sleep, but the code would likely end up more complicated than this... > > ... but that page won't be present after madvise call (I checked), > I'm assuming that's because madvise loads it only to page cache > (in same style as readahead). Ah, tricky. So it's in the swap page cache and only accesing it makes it present, but page fault counter is increased only on major faults when the page needs to be read from the disk. -- Cyril Hrubis chrubis@suse.cz ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-18 13:37 [LTP] [PATCH] madvise06: wait a bit after madvise() call Jan Stancek 2016-07-18 14:03 ` Cyril Hrubis @ 2016-07-19 5:58 ` Li Wang 2016-07-19 6:56 ` Jan Stancek 1 sibling, 1 reply; 17+ messages in thread From: Li Wang @ 2016-07-19 5:58 UTC (permalink / raw) To: ltp Hi Jan, On Mon, Jul 18, 2016 at 03:37:08PM +0200, Jan Stancek wrote: > > Some other obsverations that are not addressed by this patch: > Testcase assumes that swap is enabled. > Testcase assumes that there is enough swap. > Testcase doesn't check buf[0] is swapped before it calls madvise(). It's easy to check swap enabled, but hard to verify one page is swapped. :( > > diff --git a/testcases/kernel/syscalls/madvise/madvise06.c b/testcases/kernel/syscalls/madvise/madvise06.c > index 6b081fddf5eb..1b0f58cb319d 100644 > --- a/testcases/kernel/syscalls/madvise/madvise06.c > +++ b/testcases/kernel/syscalls/madvise/madvise06.c > @@ -77,6 +77,7 @@ static void test_advice_willneed(void) > char *dst[100]; > int page_fault_num_1; > int page_fault_num_2; > + const int pages_to_check = 50; > > /* allocate source memory (1gb only) */ > src = safe_mmap(null, 1 * gb_sz, prot_read | prot_write, > @@ -97,18 +98,23 @@ static void test_advice_willneed(void) > tst_res(tinfo, "pagefault(no madvice): %d", get_page_fault_num()); > > /* Do madvice() to dst[0] */ > - TEST(madvise(dst[0], pg_sz, MADV_WILLNEED)); > + TEST(madvise(dst[0], pages_to_check * pg_sz, MADV_WILLNEED)); > if (TEST_RETURN == -1) > tst_brk(TBROK | TERRNO, "madvise failed"); > > - page_fault_num_1 = get_page_fault_num(); > - tst_res(TINFO, "PageFault(madvice / no mem access): %d", > - page_fault_num_1); > - > - *dst[0] = 'a'; > - page_fault_num_2 = get_page_fault_num(); > - tst_res(TINFO, "PageFault(madvice / mem access): %d", > - page_fault_num_2); 8<---------snip---------------- > + i = 0; > + do { > + i++; > + usleep(100000); > + > + page_fault_num_1 = get_page_fault_num(); > + tst_res(TINFO, "PageFault(madvice / no mem access): %d", > + page_fault_num_1); > + dst[0][i * pg_sz] = 'a'; > + page_fault_num_2 = get_page_fault_num(); > + tst_res(TINFO, "PageFault(madvice / mem access): %d", > + page_fault_num_2); > + } while (page_fault_num_1 != page_fault_num_2 && i < pages_to_check); 8<------------------------------- Agree! this method could aviod a wrong diagnosis. But one question is that why involved the 'pages_to_check' as a constant? why not changes like this: int pages_to_check = 50; ... while (pages_to_check > 0 && pages_to_check--) { page_fault_num_1 = get_page_fault_num(); tst_res(TINFO, "PageFault(madvice / no mem access): %d", page_fault_num_1); dst[0][pages_to_check * pg_sz] = 'a'; page_fault_num_2 = get_page_fault_num(); tst_res(TINFO, "PageFault(madvice / mem access): %d", page_fault_num_2); if(page_fault_num_1 == page_fault_num_2) break; usleep(100000); } One more word, there(above two changes) still only one chance to verify page fault numbers equality, because if "page_fault_num_1 != page_fault_num_2" it will keep looping until get the last page be checked. so that a bad situation, it will usleep(100000) * 50 at most. In other words, the last page determines the test result though the bug has been detected by previous pages. Li Wang ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-19 5:58 ` Li Wang @ 2016-07-19 6:56 ` Jan Stancek 2016-07-19 8:57 ` Li Wang 0 siblings, 1 reply; 17+ messages in thread From: Jan Stancek @ 2016-07-19 6:56 UTC (permalink / raw) To: ltp ----- Original Message ----- > From: "Li Wang" <liwang@redhat.com> > To: "Jan Stancek" <jstancek@redhat.com> > Cc: ltp@lists.linux.it > Sent: Tuesday, 19 July, 2016 7:58:44 AM > Subject: Re: [PATCH] madvise06: wait a bit after madvise() call > > Hi Jan, > > On Mon, Jul 18, 2016 at 03:37:08PM +0200, Jan Stancek wrote: > > > > Some other obsverations that are not addressed by this patch: > > Testcase assumes that swap is enabled. > > Testcase assumes that there is enough swap. > > Testcase doesn't check buf[0] is swapped before it calls madvise(). > > It's easy to check swap enabled, but hard to verify one page is swapped. :( https://www.kernel.org/doc/Documentation/vm/pagemap.txt > > > > > diff --git a/testcases/kernel/syscalls/madvise/madvise06.c > > b/testcases/kernel/syscalls/madvise/madvise06.c > > index 6b081fddf5eb..1b0f58cb319d 100644 > > --- a/testcases/kernel/syscalls/madvise/madvise06.c > > +++ b/testcases/kernel/syscalls/madvise/madvise06.c > > @@ -77,6 +77,7 @@ static void test_advice_willneed(void) > > char *dst[100]; > > int page_fault_num_1; > > int page_fault_num_2; > > + const int pages_to_check = 50; > > > > /* allocate source memory (1gb only) */ > > src = safe_mmap(null, 1 * gb_sz, prot_read | prot_write, > > @@ -97,18 +98,23 @@ static void test_advice_willneed(void) > > tst_res(tinfo, "pagefault(no madvice): %d", get_page_fault_num()); > > > > /* Do madvice() to dst[0] */ > > - TEST(madvise(dst[0], pg_sz, MADV_WILLNEED)); > > + TEST(madvise(dst[0], pages_to_check * pg_sz, MADV_WILLNEED)); > > if (TEST_RETURN == -1) > > tst_brk(TBROK | TERRNO, "madvise failed"); > > > > - page_fault_num_1 = get_page_fault_num(); > > - tst_res(TINFO, "PageFault(madvice / no mem access): %d", > > - page_fault_num_1); > > - > > - *dst[0] = 'a'; > > - page_fault_num_2 = get_page_fault_num(); > > - tst_res(TINFO, "PageFault(madvice / mem access): %d", > > - page_fault_num_2); > > 8<---------snip---------------- > > + i = 0; > > + do { > > + i++; > > + usleep(100000); > > + > > + page_fault_num_1 = get_page_fault_num(); > > + tst_res(TINFO, "PageFault(madvice / no mem access): %d", > > + page_fault_num_1); > > + dst[0][i * pg_sz] = 'a'; > > + page_fault_num_2 = get_page_fault_num(); > > + tst_res(TINFO, "PageFault(madvice / mem access): %d", > > + page_fault_num_2); > > + } while (page_fault_num_1 != page_fault_num_2 && i < pages_to_check); > 8<------------------------------- > > Agree! this method could aviod a wrong diagnosis. > > But one question is that why involved the 'pages_to_check' as a constant? > why not changes like this: > > int pages_to_check = 50; Sure, we can do that and save one variable. > ... > > while (pages_to_check > 0 && pages_to_check--) { > page_fault_num_1 = get_page_fault_num(); > tst_res(TINFO, "PageFault(madvice / no mem access): %d", > page_fault_num_1); > dst[0][pages_to_check * pg_sz] = 'a'; > page_fault_num_2 = get_page_fault_num(); > tst_res(TINFO, "PageFault(madvice / mem access): %d", > page_fault_num_2); > > if(page_fault_num_1 == page_fault_num_2) > break; > > usleep(100000); > } > > > One more word, there(above two changes) still only one chance to verify > page fault numbers equality, because if "page_fault_num_1 != > page_fault_num_2" Why "one chance"? With above we should get 50 chances. > it will keep looping until get the last page be checked. so that a bad > situation, it will usleep(100000) * 50 at most. > > In other words, the last page determines the test result though the bug > has been detected by previous pages. Problem is we don't know if it's a bug, pending I/O (after short delay) or kernel ignoring request for any other reason, as mentioned in madvise(2): "The kernel is free to ignore the advice.". My impression was that kernel bug was consistently reproducible, if not then let's replace the loop with one bigger sleep. Regards, Jan ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-19 6:56 ` Jan Stancek @ 2016-07-19 8:57 ` Li Wang 2016-07-20 14:37 ` Jan Stancek 0 siblings, 1 reply; 17+ messages in thread From: Li Wang @ 2016-07-19 8:57 UTC (permalink / raw) To: ltp On Tue, Jul 19, 2016 at 02:56:42AM -0400, Jan Stancek wrote: > > > ----- Original Message ----- > > From: "Li Wang" <liwang@redhat.com> > > To: "Jan Stancek" <jstancek@redhat.com> > > Cc: ltp@lists.linux.it > > Sent: Tuesday, 19 July, 2016 7:58:44 AM > > Subject: Re: [PATCH] madvise06: wait a bit after madvise() call > > > > Hi Jan, > > > > On Mon, Jul 18, 2016 at 03:37:08PM +0200, Jan Stancek wrote: > > > > > > Some other obsverations that are not addressed by this patch: > > > Testcase assumes that swap is enabled. > > > Testcase assumes that there is enough swap. > > > Testcase doesn't check buf[0] is swapped before it calls madvise(). > > > > It's easy to check swap enabled, but hard to verify one page is swapped. :( > > https://www.kernel.org/doc/Documentation/vm/pagemap.txt > > > > > > > > > diff --git a/testcases/kernel/syscalls/madvise/madvise06.c > > > b/testcases/kernel/syscalls/madvise/madvise06.c > > > index 6b081fddf5eb..1b0f58cb319d 100644 > > > --- a/testcases/kernel/syscalls/madvise/madvise06.c > > > +++ b/testcases/kernel/syscalls/madvise/madvise06.c > > > @@ -77,6 +77,7 @@ static void test_advice_willneed(void) > > > char *dst[100]; > > > int page_fault_num_1; > > > int page_fault_num_2; > > > + const int pages_to_check = 50; > > > > > > /* allocate source memory (1gb only) */ > > > src = safe_mmap(null, 1 * gb_sz, prot_read | prot_write, > > > @@ -97,18 +98,23 @@ static void test_advice_willneed(void) > > > tst_res(tinfo, "pagefault(no madvice): %d", get_page_fault_num()); > > > > > > /* Do madvice() to dst[0] */ > > > - TEST(madvise(dst[0], pg_sz, MADV_WILLNEED)); > > > + TEST(madvise(dst[0], pages_to_check * pg_sz, MADV_WILLNEED)); > > > if (TEST_RETURN == -1) > > > tst_brk(TBROK | TERRNO, "madvise failed"); > > > > > > - page_fault_num_1 = get_page_fault_num(); > > > - tst_res(TINFO, "PageFault(madvice / no mem access): %d", > > > - page_fault_num_1); > > > - > > > - *dst[0] = 'a'; > > > - page_fault_num_2 = get_page_fault_num(); > > > - tst_res(TINFO, "PageFault(madvice / mem access): %d", > > > - page_fault_num_2); > > > > 8<---------snip---------------- > > > + i = 0; > > > + do { > > > + i++; > > > + usleep(100000); > > > + > > > + page_fault_num_1 = get_page_fault_num(); > > > + tst_res(TINFO, "PageFault(madvice / no mem access): %d", > > > + page_fault_num_1); > > > + dst[0][i * pg_sz] = 'a'; > > > + page_fault_num_2 = get_page_fault_num(); > > > + tst_res(TINFO, "PageFault(madvice / mem access): %d", > > > + page_fault_num_2); > > > + } while (page_fault_num_1 != page_fault_num_2 && i < pages_to_check); > > 8<------------------------------- > > > > Agree! this method could aviod a wrong diagnosis. > > > > But one question is that why involved the 'pages_to_check' as a constant? > > why not changes like this: > > > > int pages_to_check = 50; > > Sure, we can do that and save one variable. > > > ... > > > > while (pages_to_check > 0 && pages_to_check--) { > > page_fault_num_1 = get_page_fault_num(); > > tst_res(TINFO, "PageFault(madvice / no mem access): %d", > > page_fault_num_1); > > dst[0][pages_to_check * pg_sz] = 'a'; > > page_fault_num_2 = get_page_fault_num(); > > tst_res(TINFO, "PageFault(madvice / mem access): %d", > > page_fault_num_2); > > > > if(page_fault_num_1 == page_fault_num_2) > > break; > > > > usleep(100000); > > } > > > > > > One more word, there(above two changes) still only one chance to verify > > page fault numbers equality, because if "page_fault_num_1 != > > page_fault_num_2" > > Why "one chance"? With above we should get 50 chances. > > > it will keep looping until get the last page be checked. so that a bad > > situation, it will usleep(100000) * 50 at most. > > > > In other words, the last page determines the test result though the bug > > has been detected by previous pages. > > Problem is we don't know if it's a bug, pending I/O (after short delay) > or kernel ignoring request for any other reason, as mentioned in madvise(2): > "The kernel is free to ignore the advice.". > > My impression was that kernel bug was consistently reproducible, > if not then let's replace the loop with one bigger sleep. Sorry, I remember it's not, from what I test on an bad (unfix) kernel. it always report PASS with this patch. # ./madvise06 madvise06.c:57: INFO: dst_max = 7 madvise06.c:98: INFO: PageFault(no madvice): 8 madvise06.c:108: INFO: PageFault(madvice / no mem access): 8 madvise06.c:112: INFO: PageFault(madvice / mem access): 9 madvise06.c:108: INFO: PageFault(madvice / no mem access): 9 madvise06.c:112: INFO: PageFault(madvice / mem access): 10 madvise06.c:108: INFO: PageFault(madvice / no mem access): 10 madvise06.c:112: INFO: PageFault(madvice / mem access): 10 madvise06.c:137: PASS: Regression test pass Summary: passed 1 failed 0 skipped 0 warnings 0 Let's image the possible situations: 1. It's a bug. test fail on comparing the page fults many times and keep looping to test but easily break out of the loop if one time randomly PASS. 2, It pending I/O with short dely. Test pass soon(probably 2~3 times loop) and break out of the loop with report PASS. 3. It caused by kernel ignoring the request for unknow reasons, fail in 50 times also and report PASS/BUG(I didn't catch the situation, so I do know what result will be reported). Obviously, the patch did not fit for the first situation. :( Regards, Li Wang ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-19 8:57 ` Li Wang @ 2016-07-20 14:37 ` Jan Stancek 2016-07-21 5:33 ` Li Wang 2016-07-21 10:31 ` Chunyu Hu 0 siblings, 2 replies; 17+ messages in thread From: Jan Stancek @ 2016-07-20 14:37 UTC (permalink / raw) To: ltp On 07/19/2016 10:57 AM, Li Wang wrote: >> >> My impression was that kernel bug was consistently reproducible, >> if not then let's replace the loop with one bigger sleep. > > Sorry, I remember it's not, from what I test on an bad (unfix) kernel. > it always report PASS with this patch. Attached is a different approach, that watches progress of SwapCached from /proc/meminfo and as soon as it sees 128M increase it takes that as PASS or gives up after 5 seconds with FAIL. GOOD kernel: tst_test.c:701: INFO: Timeout per run is 300s madvise06.c:98: INFO: SwapCached (before madvise): 53576 madvise06.c:113: INFO: SwapCached (after madvise): 568080 madvise06.c:115: PASS: Regression test pass BAD kernel: # ./madvise06 tst_test.c:701: INFO: Timeout per run is 300s madvise06.c:98: INFO: SwapCached (before madvise): 43712 madvise06.c:113: INFO: SwapCached (after madvise): 45636 madvise06.c:117: FAIL: Bug has been reproduced If you still have the setup, can you try how reliable is this approach? Regards, Jan -------------- next part -------------- /* * Copyright (c) 2016 Red Hat, Inc. * * This program is free software: you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation, either version 3 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program. If not, see <http://www.gnu.org/licenses/>. */ /* * DESCRIPTION * * Page fault occurs in spite that madvise(WILLNEED) system call is called * to prefetch the page. This issue is reproduced by running a program * which sequentially accesses to a shared memory and calls madvise(WILLNEED) * to the next page on a page fault. * * This bug is present in all RHEL7 versions. It looks like this was fixed in * mainline kernel > v3.15 by the following patch: * * commit 55231e5c898c5c03c14194001e349f40f59bd300 * Author: Johannes Weiner <hannes@cmpxchg.org> * Date: Thu May 22 11:54:17 2014 -0700 * * mm: madvise: fix MADV_WILLNEED on shmem swapouts */ #include <errno.h> #include <stdio.h> #include <sys/sysinfo.h> #include "tst_test.h" #define CHUNK_SZ (512*1024*1024L) #define MAX_ALLOC_CHUNKS 200L #define PASS_THRESHOLD_KB (CHUNK_SZ / 1024 / 4) static int pg_sz; struct sysinfo sys_buf_start; static void setup(void) { sysinfo(&sys_buf_start); if (sys_buf_start.totalram > (MAX_ALLOC_CHUNKS - 1) * CHUNK_SZ) tst_brk(TCONF, "System RAM is too large, skip test"); if (sys_buf_start.freeswap < 2 * CHUNK_SZ) tst_brk(TCONF, "System swap is too small"); pg_sz = getpagesize(); } static void test_advice_willneed(void) { int i; char *src; char *dst[MAX_ALLOC_CHUNKS]; long swapcached_start, swapcached; struct sysinfo sys_buf; /* allocate source memory */ src = SAFE_MMAP(NULL, CHUNK_SZ, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); /* allocate destination memory (array) */ for (i = 0; i < MAX_ALLOC_CHUNKS; ++i) dst[i] = SAFE_MMAP(NULL, CHUNK_SZ, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); /* memmove source to each destination memories (for SWAP-OUT) */ i = 0; while (i < MAX_ALLOC_CHUNKS) { memmove(dst[i], src, CHUNK_SZ); i++; sysinfo(&sys_buf); if (sys_buf.freeswap + CHUNK_SZ < sys_buf_start.freeswap) break; } SAFE_MUNMAP(src, CHUNK_SZ); sync(); SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached_start); tst_res(TINFO, "SwapCached (before madvise): %ld", swapcached_start); /* Do madvise() to dst[0] */ TEST(madvise(dst[0], CHUNK_SZ, MADV_WILLNEED)); if (TEST_RETURN == -1) tst_brk(TBROK | TERRNO, "madvise failed"); i = 0; do { i++; usleep(100000); SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached); } while (i < 50 && swapcached < swapcached_start + PASS_THRESHOLD_KB); tst_res(TINFO, "SwapCached (after madvise): %ld", swapcached); if (swapcached > swapcached_start + PASS_THRESHOLD_KB) tst_res(TPASS, "Regression test pass"); else tst_res(TFAIL, "Bug has been reproduced"); for (i = 0; i < MAX_ALLOC_CHUNKS; ++i) SAFE_MUNMAP(dst[i], CHUNK_SZ); } static struct tst_test test = { .tid = "madvise06", .test_all = test_advice_willneed, .setup = setup, }; ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-20 14:37 ` Jan Stancek @ 2016-07-21 5:33 ` Li Wang 2016-07-21 10:31 ` Chunyu Hu 1 sibling, 0 replies; 17+ messages in thread From: Li Wang @ 2016-07-21 5:33 UTC (permalink / raw) To: ltp On Wed, Jul 20, 2016 at 04:37:42PM +0200, Jan Stancek wrote: > > Attached is a different approach, that watches progress of SwapCached > from /proc/meminfo and as soon as it sees 128M increase it takes that > as PASS or gives up after 5 seconds with FAIL. > > GOOD kernel: > tst_test.c:701: INFO: Timeout per run is 300s > madvise06.c:98: INFO: SwapCached (before madvise): 53576 > madvise06.c:113: INFO: SwapCached (after madvise): 568080 > madvise06.c:115: PASS: Regression test pass > > BAD kernel: > # ./madvise06 > tst_test.c:701: INFO: Timeout per run is 300s > madvise06.c:98: INFO: SwapCached (before madvise): 43712 > madvise06.c:113: INFO: SwapCached (after madvise): 45636 > madvise06.c:117: FAIL: Bug has been reproduced > > If you still have the setup, can you try how reliable is this approach? Yes, it works for me. the GOOD kernel get PASS, and BAD kernel always FAIL. I didn't fully understand this method, could you give more explanation in the upcoming patch? Regards, Li Wang ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-20 14:37 ` Jan Stancek 2016-07-21 5:33 ` Li Wang @ 2016-07-21 10:31 ` Chunyu Hu 2016-07-21 11:02 ` Li Wang 1 sibling, 1 reply; 17+ messages in thread From: Chunyu Hu @ 2016-07-21 10:31 UTC (permalink / raw) To: ltp ----- Original Message ----- > From: "Jan Stancek" <jstancek@redhat.com> > To: "Li Wang" <liwang@redhat.com> > Cc: ltp@lists.linux.it > Sent: Wednesday, July 20, 2016 10:37:42 PM > Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call > > On 07/19/2016 10:57 AM, Li Wang wrote: > >> > >> My impression was that kernel bug was consistently reproducible, > >> if not then let's replace the loop with one bigger sleep. > > > > Sorry, I remember it's not, from what I test on an bad (unfix) kernel. > > it always report PASS with this patch. > > Attached is a different approach, that watches progress of SwapCached > from /proc/meminfo and as soon as it sees 128M increase it takes that > as PASS or gives up after 5 seconds with FAIL. > > GOOD kernel: > tst_test.c:701: INFO: Timeout per run is 300s > madvise06.c:98: INFO: SwapCached (before madvise): 53576 > madvise06.c:113: INFO: SwapCached (after madvise): 568080 > madvise06.c:115: PASS: Regression test pass > > BAD kernel: > # ./madvise06 > tst_test.c:701: INFO: Timeout per run is 300s > madvise06.c:98: INFO: SwapCached (before madvise): 43712 > madvise06.c:113: INFO: SwapCached (after madvise): 45636 > madvise06.c:117: FAIL: Bug has been reproduced > > If you still have the setup, can you try how reliable is this approach? I also had a try on my desktop. I copied the file as a.c and compiled it in ltp. Result is that if the sys is fresh with low Cache, it can pass rightly. But if the Cache is before exhausted, it can hit failure, as the thresh_hold is too large to get there. Just FYI. [root@dhcp-chuhu mem]# uname -r 4.7.0-rc3+ [root@dhcp-chuhu mem]# cat /proc/meminfo MemTotal: 7796148 kB MemFree: 6637888 kB MemAvailable: 6589052 kB Buffers: 0 kB Cached: 260124 kB SwapCached: 38096 kB Active: 725856 kB Inactive: 252788 kB Active(anon): 710456 kB Inactive(anon): 123060 kB Active(file): 15400 kB Inactive(file): 129728 kB Unevictable: 32 kB Mlocked: 32 kB SwapTotal: 7995388 kB SwapFree: 6986700 kB Dirty: 32 kB Writeback: 0 kB AnonPages: 716448 kB Mapped: 72004 kB Shmem: 114968 kB Slab: 78160 kB SReclaimable: 26956 kB SUnreclaim: 51204 kB KernelStack: 7856 kB PageTables: 25568 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 11893460 kB Committed_AS: 4073896 kB VmallocTotal: 34359738367 kB VmallocUsed: 0 kB VmallocChunk: 0 kB HardwareCorrupted: 0 kB AnonHugePages: 67584 kB CmaTotal: 0 kB CmaFree: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 139580 kB DirectMap2M: 8040448 kB [root@dhcp-chuhu mem]# ./a tst_test.c:701: INFO: Timeout per run is 300s a.c:97: INFO: SwapCached (before madvise): 121032 a.c:112: INFO: SwapCached (after madvise): 123928 a.c:116: FAIL: Bug has been reproduced Summary: passed 0 failed 1 skipped 0 warnings 0 [root@dhcp-c > Regards, > Jan > > > -- > Mailing list info: https://lists.linux.it/listinfo/ltp > -- Regards, Chunyu Hu ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-21 10:31 ` Chunyu Hu @ 2016-07-21 11:02 ` Li Wang 2016-07-21 14:23 ` Jan Stancek 0 siblings, 1 reply; 17+ messages in thread From: Li Wang @ 2016-07-21 11:02 UTC (permalink / raw) To: ltp On Thu, Jul 21, 2016 at 06:31:58AM -0400, Chunyu Hu wrote: > > > > If you still have the setup, can you try how reliable is this approach? > > I also had a try on my desktop. I copied the file as a.c and compiled it in ltp. > Result is that if the sys is fresh with low Cache, it can pass rightly. But if > the Cache is before exhausted, it can hit failure, as the thresh_hold is too > large to get there. Just FYI. Yes, Chunyu ran failed the case with his destop(uptime more than 30days) at first, after rebooting it could be PASS. We guess the reason is the current usage of SwapCached are too large and madvise_willneed() could not shift the memory(swapout) into swapcache. To verify this image we incresed SwapCached by another program(background) then it always get failures like that. Probably it's still hitting the situations we discussed before: 2. It pending I/O with short dely. 3. kernel ignoring the request for unknow reasons. Regards, Li Wang ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-21 11:02 ` Li Wang @ 2016-07-21 14:23 ` Jan Stancek 2016-07-22 3:46 ` Li Wang 2016-07-22 10:49 ` Chunyu Hu 0 siblings, 2 replies; 17+ messages in thread From: Jan Stancek @ 2016-07-21 14:23 UTC (permalink / raw) To: ltp On 07/21/2016 01:02 PM, Li Wang wrote: > On Thu, Jul 21, 2016 at 06:31:58AM -0400, Chunyu Hu wrote: >>> >>> If you still have the setup, can you try how reliable is this approach? >> >> I also had a try on my desktop. I copied the file as a.c and compiled it in ltp. >> Result is that if the sys is fresh with low Cache, it can pass rightly. But if >> the Cache is before exhausted, it can hit failure, as the thresh_hold is too >> large to get there. Just FYI. I'm not sure I follow here, your /proc/meminfo shows: Cached: 260124 kB SwapCached: 38096 kB That doesn't seem very high to me. > > Yes, Chunyu ran failed the case with his destop(uptime more than 30days) at first, > after rebooting it could be PASS. I'm starting to run out of ideas how we can test this somewhat reliably. Attached is approach v3, which sets up memory cgroup: - memory.limit_in_bytes is 128M - we allocate 512M - as consequence ~384M should be swapped while system should still have plenty of free memory, which should be available for cache Regards, Jan -------------- next part -------------- /* * Copyright (c) 2016 Red Hat, Inc. * * This program is free software: you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation, either version 3 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program. If not, see <http://www.gnu.org/licenses/>. */ /* * DESCRIPTION * * Page fault occurs in spite that madvise(WILLNEED) system call is called * to prefetch the page. This issue is reproduced by running a program * which sequentially accesses to a shared memory and calls madvise(WILLNEED) * to the next page on a page fault. * * This bug is present in all RHEL7 versions. It looks like this was fixed in * mainline kernel > v3.15 by the following patch: * * commit 55231e5c898c5c03c14194001e349f40f59bd300 * Author: Johannes Weiner <hannes@cmpxchg.org> * Date: Thu May 22 11:54:17 2014 -0700 * * mm: madvise: fix MADV_WILLNEED on shmem swapouts */ #include <errno.h> #include <stdio.h> #include <sys/mount.h> #include <sys/sysinfo.h> #include "tst_test.h" #define CHUNK_SZ (512*1024*1024L) #define CHUNK_PAGES (CHUNK_SZ / pg_sz) #define PASS_THRESHOLD (CHUNK_SZ / 4) static const char drop_caches_fname[] = "/proc/sys/vm/drop_caches"; static int pg_sz; static void drop_caches(void) { int ret; FILE *f; f = fopen(drop_caches_fname, "w"); if (f) { ret = fprintf(f, "1"); fclose(f); if (ret < 1) tst_brk(TBROK, "Failed to drop caches"); } else { tst_brk(TBROK, "Failed to open drop_caches"); } } static void setup(void) { struct sysinfo sys_buf_start; pg_sz = getpagesize(); if (access(drop_caches_fname, R_OK | W_OK)) tst_brk(TCONF, "needed: %s\n", drop_caches_fname); tst_res(TINFO, "dropping caches"); drop_caches(); sysinfo(&sys_buf_start); if (sys_buf_start.freeram < 2 * CHUNK_SZ) tst_brk(TCONF, "System RAM is too small, skip test"); if (sys_buf_start.freeswap < 2 * CHUNK_SZ) tst_brk(TCONF, "System swap is too small"); SAFE_MKDIR("memory", 0700); SAFE_MOUNT("memory", "memory", "cgroup", 0, "memory"); if (access("memory/memory.limit_in_bytes", R_OK | W_OK)) tst_brk(TCONF, "cgroup memory.limit_in_bytes needed"); SAFE_MKDIR("memory/madvise06", 0700); SAFE_FILE_PRINTF("memory/madvise06/memory.limit_in_bytes", "%ld\n", PASS_THRESHOLD); SAFE_FILE_PRINTF("memory/madvise06/tasks", "%d\n", getpid()); } static void cleanup(void) { FILE *f = fopen("memory/tasks", "w"); if (f) { fprintf(f, "%d\n", getpid()); fclose(f); } rmdir("memory/madvise06"); umount("memory"); } static long count_swapped_pages(void *ptr, long pg_count) { int pm; long index, ret = 0; uint64_t pagemap; off_t offset; index = ((uintptr_t)ptr / pg_sz) * sizeof(uint64_t); pm = open("/proc/self/pagemap", O_RDONLY); if (pm == -1) { /* In 4.0 and 4.1 opens by unprivileged fail with -EPERM */ if ((errno == EPERM) && (geteuid() != 0)) { tst_brk(TCONF | TERRNO, "don't have permission to open dev pagemap"); } else { tst_brk(TFAIL | TERRNO, "Open dev pagemap failed"); } } offset = lseek(pm, index, SEEK_SET); if (offset != index) tst_brk(TFAIL | TERRNO, "Reposition offset failed"); while (pg_count > 0) { ret = read(pm, &pagemap, sizeof(uint64_t)); if (ret < 0) tst_brk(TFAIL | TERRNO, "Read pagemap failed"); if ((pagemap & (1ULL<<62))) ret++; pg_count--; } close(pm); } static void dirty_pages(char *ptr, long size) { long i; long pages = size / pg_sz; for (i = 0; i < pages; i++) ptr[i * pg_sz] = 'x'; } static int get_page_fault_num(void) { int pg; SAFE_FILE_SCANF("/proc/self/stat", "%*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %d", &pg); return pg; } static void test_advice_willneed(void) { int loops = 50; char *target; long swapcached_start, swapcached; target = SAFE_MMAP(NULL, CHUNK_SZ, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); dirty_pages(target, CHUNK_SZ); SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached_start); tst_res(TINFO, "SwapCached (before madvise): %ld", swapcached_start); TEST(madvise(target, CHUNK_SZ, MADV_WILLNEED)); if (TEST_RETURN == -1) tst_brk(TBROK | TERRNO, "madvise failed"); do { loops--; usleep(100000); SAFE_FILE_LINES_SCANF("/proc/meminfo", "SwapCached: %ld", &swapcached); } while (loops > 0 && swapcached < swapcached_start + PASS_THRESHOLD / 1024); tst_res(TINFO, "SwapCached (after madvise): %ld", swapcached); if (swapcached > swapcached_start + PASS_THRESHOLD / 1024) tst_res(TPASS, "Regression test pass"); else { /* looks like we may have hit a bug, try accessing page */ int page_fault_num_1; int page_fault_num_2; page_fault_num_1 = get_page_fault_num(); tst_res(TINFO, "PageFault(madvice / no mem access): %d", page_fault_num_1); target[0] = 'a'; page_fault_num_2 = get_page_fault_num(); tst_res(TINFO, "PageFault(madvice / mem access): %d", page_fault_num_2); if (page_fault_num_1 != page_fault_num_2) tst_res(TFAIL, "Bug has been reproduced"); else tst_res(TPASS, "Regression test pass"); } SAFE_MUNMAP(target, CHUNK_SZ); } static struct tst_test test = { .tid = "madvise06", .test_all = test_advice_willneed, .setup = setup, .cleanup = cleanup, .needs_tmpdir = 1, .needs_root = 1, }; ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-21 14:23 ` Jan Stancek @ 2016-07-22 3:46 ` Li Wang 2016-07-22 6:59 ` Jan Stancek 2016-07-22 10:49 ` Chunyu Hu 1 sibling, 1 reply; 17+ messages in thread From: Li Wang @ 2016-07-22 3:46 UTC (permalink / raw) To: ltp On Thu, Jul 21, 2016 at 04:23:27PM +0200, Jan Stancek wrote: > > I'm starting to run out of ideas how we can test this somewhat reliably. > > Attached is approach v3, which sets up memory cgroup: > - memory.limit_in_bytes is 128M > - we allocate 512M > - as consequence ~384M should be swapped while system should still have > plenty of free memory, which should be available for cache Drop caches before testing is really good, we think the same! And setting cgroup shorten the test time, if there could guarantee cgroup OOM is off and swappiness is a little larger, that would be more reliable I think. One question is that you wrote function count_swapped_pages() but didn't using it, seems forgot to put somewhere? Anyway, V3 is strong enough expecially it combines the two method together. > #include <errno.h> > #include <stdio.h> > #include <sys/mount.h> > #include <sys/sysinfo.h> > #include "tst_test.h" > > #define CHUNK_SZ (512*1024*1024L) > #define CHUNK_PAGES (CHUNK_SZ / pg_sz) > #define PASS_THRESHOLD (CHUNK_SZ / 4) > > static const char drop_caches_fname[] = "/proc/sys/vm/drop_caches"; > static int pg_sz; > > static void drop_caches(void) > { > int ret; > FILE *f; > > f = fopen(drop_caches_fname, "w"); > if (f) { > ret = fprintf(f, "1"); > fclose(f); > if (ret < 1) > tst_brk(TBROK, "Failed to drop caches"); > } else { > tst_brk(TBROK, "Failed to open drop_caches"); > } > } > > static void setup(void) > { > struct sysinfo sys_buf_start; > > pg_sz = getpagesize(); > > if (access(drop_caches_fname, R_OK | W_OK)) > tst_brk(TCONF, "needed: %s\n", drop_caches_fname); > tst_res(TINFO, "dropping caches"); > drop_caches(); save this function by oneline: /* drop caches*/ SAFE_FILE_PRINTF("/proc/sys/vm/drop_caches", "1"); > > sysinfo(&sys_buf_start); > if (sys_buf_start.freeram < 2 * CHUNK_SZ) > tst_brk(TCONF, "System RAM is too small, skip test"); > if (sys_buf_start.freeswap < 2 * CHUNK_SZ) > tst_brk(TCONF, "System swap is too small"); > > SAFE_MKDIR("memory", 0700); > SAFE_MOUNT("memory", "memory", "cgroup", 0, "memory"); > if (access("memory/memory.limit_in_bytes", R_OK | W_OK)) > tst_brk(TCONF, "cgroup memory.limit_in_bytes needed"); > > SAFE_MKDIR("memory/madvise06", 0700); > SAFE_FILE_PRINTF("memory/madvise06/memory.limit_in_bytes", "%ld\n", > PASS_THRESHOLD); Turn off oom, enlarge swappiness SAFE_FILE_PRINTF("memory/madvise06/memory.oom_control", "0"); SAFE_FILE_PRINTF("memory/madvise06/memory.swappiness", "60"); > SAFE_FILE_PRINTF("memory/madvise06/tasks", "%d\n", getpid()); > } > > static void cleanup(void) > { > FILE *f = fopen("memory/tasks", "w"); > > if (f) { > fprintf(f, "%d\n", getpid()); > fclose(f); > } > rmdir("memory/madvise06"); > umount("memory"); > } > > static long count_swapped_pages(void *ptr, long pg_count) A compile Warnning: madvise06.c:94:13: warning: ‘count_swapped_pages’ defined but not used [-Wunused-function] static long count_swapped_pages(void *ptr, long pg_count) Regards, Li Wang ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-22 3:46 ` Li Wang @ 2016-07-22 6:59 ` Jan Stancek 0 siblings, 0 replies; 17+ messages in thread From: Jan Stancek @ 2016-07-22 6:59 UTC (permalink / raw) To: ltp ----- Original Message ----- > From: "Li Wang" <liwang@redhat.com> > To: "Jan Stancek" <jstancek@redhat.com> > Cc: "Chunyu Hu" <chuhu@redhat.com>, ltp@lists.linux.it > Sent: Friday, 22 July, 2016 5:46:03 AM > Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call > > On Thu, Jul 21, 2016 at 04:23:27PM +0200, Jan Stancek wrote: > > > > I'm starting to run out of ideas how we can test this somewhat reliably. > > > > Attached is approach v3, which sets up memory cgroup: > > - memory.limit_in_bytes is 128M > > - we allocate 512M > > - as consequence ~384M should be swapped while system should still have > > plenty of free memory, which should be available for cache > > Drop caches before testing is really good, we think the same! > > And setting cgroup shorten the test time if there could guarantee > cgroup OOM is off and swappiness is a little larger, that would > be more reliable I think. Agreed, we can do that. > > One question is that you wrote function count_swapped_pages() but > didn't using it, seems forgot to put somewhere? I left it in, just in case I would use it in v4. > > Anyway, V3 is strong enough expecially it combines the two method > together. > save this function by oneline: > > /* drop caches*/ > SAFE_FILE_PRINTF("/proc/sys/vm/drop_caches", "1"); Good point. > > > > > sysinfo(&sys_buf_start); > > if (sys_buf_start.freeram < 2 * CHUNK_SZ) > > tst_brk(TCONF, "System RAM is too small, skip test"); > > if (sys_buf_start.freeswap < 2 * CHUNK_SZ) > > tst_brk(TCONF, "System swap is too small"); > > > > SAFE_MKDIR("memory", 0700); > > SAFE_MOUNT("memory", "memory", "cgroup", 0, "memory"); I'm thinking if I should turn this into normal mount and TCONF if there is no memory cgroup support in kernel. > > if (access("memory/memory.limit_in_bytes", R_OK | W_OK)) > > tst_brk(TCONF, "cgroup memory.limit_in_bytes needed"); > > > > SAFE_MKDIR("memory/madvise06", 0700); > > SAFE_FILE_PRINTF("memory/madvise06/memory.limit_in_bytes", "%ld\n", > > PASS_THRESHOLD); > > Turn off oom, enlarge swappiness > > SAFE_FILE_PRINTF("memory/madvise06/memory.oom_control", "0"); > SAFE_FILE_PRINTF("memory/madvise06/memory.swappiness", "60"); I'll add that. > > > SAFE_FILE_PRINTF("memory/madvise06/tasks", "%d\n", getpid()); > > } > > > > static void cleanup(void) > > { > > FILE *f = fopen("memory/tasks", "w"); > > > > if (f) { > > fprintf(f, "%d\n", getpid()); > > fclose(f); > > } > > rmdir("memory/madvise06"); > > umount("memory"); > > } > > > > static long count_swapped_pages(void *ptr, long pg_count) > > A compile Warnning: > > madvise06.c:94:13: warning: ‘count_swapped_pages’ defined but not used > [-Wunused-function] > static long count_swapped_pages(void *ptr, long pg_count) I'll drop it in a patch. Regards, Jan > > > Regards, > Li Wang > ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-21 14:23 ` Jan Stancek 2016-07-22 3:46 ` Li Wang @ 2016-07-22 10:49 ` Chunyu Hu 2016-07-22 10:54 ` Chunyu Hu 1 sibling, 1 reply; 17+ messages in thread From: Chunyu Hu @ 2016-07-22 10:49 UTC (permalink / raw) To: ltp ----- Original Message ----- > From: "Jan Stancek" <jstancek@redhat.com> > To: "Li Wang" <liwang@redhat.com>, "Chunyu Hu" <chuhu@redhat.com> > Cc: ltp@lists.linux.it > Sent: Thursday, July 21, 2016 10:23:27 PM > Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call > > On 07/21/2016 01:02 PM, Li Wang wrote: > > On Thu, Jul 21, 2016 at 06:31:58AM -0400, Chunyu Hu wrote: > >>> > >>> If you still have the setup, can you try how reliable is this approach? > >> > >> I also had a try on my desktop. I copied the file as a.c and compiled it > >> in ltp. > >> Result is that if the sys is fresh with low Cache, it can pass rightly. > >> But if > >> the Cache is before exhausted, it can hit failure, as the thresh_hold is > >> too > >> large to get there. Just FYI. > > I'm not sure I follow here, your /proc/meminfo shows: > Cached: 260124 kB > SwapCached: 38096 kB > > That doesn't seem very high to me. Sorry. This info is just for showing the system info. I didn't save the info at the beginning, this is the info after a reboot. The other case that reproduced the false positive issue is when another WILL_NEED process swapping a large mem(4G) at the same. > > > > Yes, Chunyu ran failed the case with his destop(uptime more than 30days) at > > first, > > after rebooting it could be PASS. > > I'm starting to run out of ideas how we can test this somewhat reliably. > > Attached is approach v3, which sets up memory cgroup: > - memory.limit_in_bytes is 128M > - we allocate 512M > - as consequence ~384M should be swapped while system should still have > plenty of free memory, which should be available for cache > > Regards, > Jan > > -- Regards, Chunyu Hu ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-22 10:49 ` Chunyu Hu @ 2016-07-22 10:54 ` Chunyu Hu 2016-07-22 11:02 ` Jan Stancek 0 siblings, 1 reply; 17+ messages in thread From: Chunyu Hu @ 2016-07-22 10:54 UTC (permalink / raw) To: ltp ----- Original Message ----- > From: "Chunyu Hu" <chuhu@redhat.com> > To: "Jan Stancek" <jstancek@redhat.com> > Cc: ltp@lists.linux.it > Sent: Friday, July 22, 2016 6:49:51 PM > Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call > > > > ----- Original Message ----- > > From: "Jan Stancek" <jstancek@redhat.com> > > To: "Li Wang" <liwang@redhat.com>, "Chunyu Hu" <chuhu@redhat.com> > > Cc: ltp@lists.linux.it > > Sent: Thursday, July 21, 2016 10:23:27 PM > > Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call > > > > On 07/21/2016 01:02 PM, Li Wang wrote: > > > On Thu, Jul 21, 2016 at 06:31:58AM -0400, Chunyu Hu wrote: > > >>> > > >>> If you still have the setup, can you try how reliable is this approach? > > >> > > >> I also had a try on my desktop. I copied the file as a.c and compiled it > > >> in ltp. > > >> Result is that if the sys is fresh with low Cache, it can pass rightly. > > >> But if > > >> the Cache is before exhausted, it can hit failure, as the thresh_hold is > > >> too > > >> large to get there. Just FYI. > > > > I'm not sure I follow here, your /proc/meminfo shows: > > Cached: 260124 kB > > SwapCached: 38096 kB > > > > That doesn't seem very high to me. > > Sorry. This info is just for showing the system info. I didn't save the info > at the beginning, > this is the info after a reboot. > > The other case that reproduced the false positive issue is when another > WILL_NEED process swapping > a large mem(4G) at the same. > > > > > > > > > Yes, Chunyu ran failed the case with his destop(uptime more than 30days) > > > at > > > first, > > > after rebooting it could be PASS. > > > > I'm starting to run out of ideas how we can test this somewhat reliably. > > > > Attached is approach v3, which sets up memory cgroup: > > - memory.limit_in_bytes is 128M > > - we allocate 512M > > - as consequence ~384M should be swapped while system should still have > > plenty of free memory, which should be available for cache I use the same host to test your V3, didn't reproduce the false positive issue. It skipped the test when the swap space is so large successfully. So great V3! Thanks. I guess if we got the V4, that's must be tricky, but seems no need now? [root@dhcp-chuhu mem]# ./b tst_test.c:701: INFO: Timeout per run is 300s b.c:73: INFO: dropping caches b.c:175: INFO: SwapCached (before madvise): 78688 b.c:188: INFO: SwapCached (after madvise): 486628 b.c:190: PASS: Regression test pass Summary: passed 1 failed 0 skipped 0 warnings 0 [root@dhcp-chuhu mem]# ./b tst_test.c:701: INFO: Timeout per run is 300s b.c:73: INFO: dropping caches b.c:78: CONF: System RAM is too small, skip test Summary: passed 0 failed 0 skipped 0 warnings 0 > > Regards, > > Jan > > > > > > -- > Regards, > Chunyu Hu > > > -- > Mailing list info: https://lists.linux.it/listinfo/ltp > -- Regards, Chunyu Hu ^ permalink raw reply [flat|nested] 17+ messages in thread
* [LTP] [PATCH] madvise06: wait a bit after madvise() call 2016-07-22 10:54 ` Chunyu Hu @ 2016-07-22 11:02 ` Jan Stancek 0 siblings, 0 replies; 17+ messages in thread From: Jan Stancek @ 2016-07-22 11:02 UTC (permalink / raw) To: ltp ----- Original Message ----- > From: "Chunyu Hu" <chuhu@redhat.com> > To: "Jan Stancek" <jstancek@redhat.com> > Cc: ltp@lists.linux.it, "Chunyu Hu" <chuhu@redhat.com> > Sent: Friday, 22 July, 2016 12:54:47 PM > Subject: Re: [LTP] [PATCH] madvise06: wait a bit after madvise() call > > > > > > Attached is approach v3, which sets up memory cgroup: > > > - memory.limit_in_bytes is 128M > > > - we allocate 512M > > > - as consequence ~384M should be swapped while system should still have > > > plenty of free memory, which should be available for cache > > I use the same host to test your V3, didn't reproduce the false positive > issue. > It skipped the test when the swap space is so large successfully. So great > V3! Thanks. > I guess if we got the V4, that's must be tricky, but seems no need now? I posted v3 just now. I reduced allocated size to 400M, so it should run also on systems as small as 1GB RAM. Other than that, it contains feedback from Li and some additional checks so it TCONFs on older kernels which don't have all /proc and memory cgroup knobs. Regards, Jan > > [root@dhcp-chuhu mem]# ./b > tst_test.c:701: INFO: Timeout per run is 300s > b.c:73: INFO: dropping caches > b.c:175: INFO: SwapCached (before madvise): 78688 > b.c:188: INFO: SwapCached (after madvise): 486628 > b.c:190: PASS: Regression test pass > > Summary: > passed 1 > failed 0 > skipped 0 > warnings 0 > [root@dhcp-chuhu mem]# ./b > tst_test.c:701: INFO: Timeout per run is 300s > b.c:73: INFO: dropping caches > b.c:78: CONF: System RAM is too small, skip test > > Summary: > passed 0 > failed 0 > skipped 0 > warnings 0 > > > > > > Regards, > > > Jan > > > > > > > > > > -- > > Regards, > > Chunyu Hu > > > > > > -- > > Mailing list info: https://lists.linux.it/listinfo/ltp > > > > -- > Regards, > Chunyu Hu > > ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2016-07-22 11:02 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-07-18 13:37 [LTP] [PATCH] madvise06: wait a bit after madvise() call Jan Stancek 2016-07-18 14:03 ` Cyril Hrubis 2016-07-18 14:22 ` Jan Stancek 2016-07-18 14:49 ` Cyril Hrubis 2016-07-19 5:58 ` Li Wang 2016-07-19 6:56 ` Jan Stancek 2016-07-19 8:57 ` Li Wang 2016-07-20 14:37 ` Jan Stancek 2016-07-21 5:33 ` Li Wang 2016-07-21 10:31 ` Chunyu Hu 2016-07-21 11:02 ` Li Wang 2016-07-21 14:23 ` Jan Stancek 2016-07-22 3:46 ` Li Wang 2016-07-22 6:59 ` Jan Stancek 2016-07-22 10:49 ` Chunyu Hu 2016-07-22 10:54 ` Chunyu Hu 2016-07-22 11:02 ` Jan Stancek
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox