public inbox for ltp@lists.linux.it
From: Cyril Hrubis <chrubis@suse.cz>
To: ltp@lists.linux.it
Subject: [LTP] LTP: futex_wake04 hangs forever on i386
Date: Tue, 9 Oct 2018 11:28:22 +0200	[thread overview]
Message-ID: <20181009092821.GA16630@rei> (raw)
In-Reply-To: <3dbee853-6bce-d057-f42a-655a8f2991c3@linaro.org>

Hi!
> >> LTP syscall test case futex_wake04 hangs forever on i386 kernel 
> >> running on x86_64 server machine.
> >> 
> >> Test PASS on qemu_i386 and other devices. Test HANGs on i386 (i386
> >>  kernel booting on x86_64 machine).
> >> 
> >> output, incrementing stop futex_wake04    0  TINFO  : Hugepagesize
> >> 4194304
> > 
> > Looks like we are trying to fault 4GB hugepage, that may be quite 
> > slow, but then, the test is a kernel reproducer as well.
> 
> This is i386 without PAE so the hugepage here is 4 MB (or else it would
> be a 2 MB with PAE). I have configured the same environment on my side:

Ah, my bad, the test prints the size in bytes and not in kbytes as it's
in /proc/meminfo. So indeed this is 4MB.

> (k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-max 4M:128
> (k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-min 4M:128
> (k)inaddy@bug3984:~$ sudo hugeadm --pool-list
>        Size  Minimum  Current  Maximum  Default
>     4194304      128      128      128        *
> 
> And ran the test with 500 iterations, multiple times, without a single
> issue.
> 
> (k)root@bug3984:~$ while true; do cat /proc/meminfo | grep -i
> hugepages_free; done
> 
> Shows that the hugepages are being used:
> 
> HugePages_Free:       93
> HugePages_Free:        4
> HugePages_Free:      128
> HugePages_Free:        0
> HugePages_Free:      128
> HugePages_Free:        1
> HugePages_Free:        8
> HugePages_Free:      128
> HugePages_Free:       27
> HugePages_Free:        0
> HugePages_Free:      102
> 
> > Can you connect to the process with gdb and check if we happened to
> > timeout before or after the two threads that are supposed to
> > trigger the kernel bug are started?
> > 
> 
> Reading the test and commit 13d60f4b6a, it looks like this test is
> testing futex indexes stored in hugepages (one in the head and the other
> outside the hugepage head) and an indexing problem in which a futex()
> was signaled as non-contended.
> 
> Kernel code was using page->index (of the page hosting the futex) for
> the futex offset and, after this commit, it started taking into account
> whether the page was a hugepage (head or tail) or a compound page, in
> which case it would use compound_idx for the futex offset, fixing the
> issue.
> 
> In order for the issue to happen we would have to face:
> 
> test thread #1                    test thread #2
> --------------                    --------------
> [hugepage #1 head] *futex1 offset [hugepage #1 head]
> [page]                            [page] *futex2 offset
> [page]                            [page]
> [page]                            [page]
> ...                               ...
> 
> where futex2 key points to futex1.
> 
> And the test case does NOT look like something intermittent (since it
> is a lock ordering issue - depending on pthreads sharing futexes - with
> wrong futex indexes, as described in the kernel commit).
> 
> * Test case problem would either happen or not *
> 
> I tried to extend the number of iterations a bit and something caught
> my attention... when running the same test multiple times, after some
> time I would face something like:
> 
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> 
> And there wasn't a second thread's task id inside:
> 
> /proc/<pid>/task/{<pid>, <tid1>, <tid2>}
> 
> as there should have been.
> 
> Likely the task creation logic here (because of an error? check the
> last part) was faster than the wait_for_threads() logic could expect...
> causing a race in the test's logic.

That shouldn't really happen: the wait_for_threads(2) function is called
in a loop until there are two sleeping threads under the
/proc/$pid/task/ directory (minus the main thread). So unless there is a
problem with starting a thread or blocking on a futex, the loop will
eventually end, and the futexes are unlocked only after this loop
finishes.

> * This would explain the timeout error we had, right ? *
> 
> Since stdout/stderr are being redirected to a file: @Naresh, is it
> possible to get the output from the stdout/err files from the same LKFT
> URL you showed?
> 
> * Wait, there is more ... *
> 
> While running the test, and monitoring meminfo, I could see the 
> HugePages_Total oscillating:
> 
> HugePages_Total:       1
> HugePages_Free:        1
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:       1
> HugePages_Free:        1
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:      54
> HugePages_Free:       53
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:       1
> HugePages_Free:        1
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:       1
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:     101
> HugePages_Free:      101
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:     128
> HugePages_Free:      128
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:       1
> HugePages_Free:        1
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:       1
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> despite the fact that I had configured:
> 
> (k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-min 4M:128
> (k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-max 4M:128
> 
> And had "128" for HugePages_Total at the beginning of the test. The
> number of hugepages oscillates during a successful test and, at the
> very end, HugePages_Total goes back to 128. I disabled THP completely
> after this test, with the same result.
>
> Reading hugetlb_report_meminfo() in hugetlb.c, I am not sure (struct
> hstate *h)->nr_huge_pages should change that much (through new
> hugepages being asked for, sysctl, or NUMA balancing, among other
> options). I'll try to trace this tomorrow to see who is touching
> HugePages_Total during the test.

That is because the test modifies the size of the huge page pool itself
in the test setup so that it can run on systems where the default is
set to 0. I guess we should change it to check whether orig_hugepages
is non-zero and only change the value if it's zero.

-- 
Cyril Hrubis
chrubis@suse.cz

Thread overview: 20+ messages
2018-10-08 13:21 [LTP] LTP: futex_wake04 hangs forever on i386 Naresh Kamboju
2018-10-08 13:30 ` Cyril Hrubis
2018-10-09  2:05   ` Rafael David Tinoco
2018-10-09  9:28     ` Cyril Hrubis [this message]
2018-10-09 11:45       ` Rafael David Tinoco
2018-10-09 18:49       ` Rafael David Tinoco
2018-10-09 21:06         ` [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep Rafael David Tinoco
2018-10-10 10:43           ` Cyril Hrubis
2018-10-10 11:14             ` Rafael David Tinoco
2018-10-10 12:06               ` Cyril Hrubis
2018-10-10 11:41             ` [LTP] [PATCH v2 1/2] futex/futex_wake04.c: fix hugepages setup for test Rafael David Tinoco
2018-10-10 11:41               ` [LTP] [PATCH v2 2/2] futex/futex_wake04.c: raise delay waiting for threads Rafael David Tinoco
2018-10-10 14:20             ` [LTP] [PATCH] futex/futex_wake04.c: fix issues with hugepages and usleep Rafael David Tinoco
2018-10-11 12:23               ` Cyril Hrubis
2018-10-10 10:41         ` [LTP] LTP: futex_wake04 hangs forever on i386 Cyril Hrubis
2018-10-10 11:13           ` Rafael David Tinoco
2018-10-10 12:05             ` Cyril Hrubis
2018-10-10 12:15               ` Rafael David Tinoco
2018-10-10 12:33                 ` Cyril Hrubis
2018-10-10 12:48                   ` Rafael David Tinoco
