From: Cyril Hrubis <chrubis@suse.cz>
To: ltp@lists.linux.it
Subject: [LTP] LTP: futex_wake04 hangs forever on i386
Date: Tue, 9 Oct 2018 11:28:22 +0200
Message-ID: <20181009092821.GA16630@rei>
In-Reply-To: <3dbee853-6bce-d057-f42a-655a8f2991c3@linaro.org>

Hi!
> >> The LTP syscall test case futex_wake04 hangs forever on an i386 kernel
> >> running on an x86_64 server machine.
> >> 
> >> The test PASSes on qemu_i386 and other devices. It HANGs on i386 (an
> >> i386 kernel booted on an x86_64 machine).
> >> 
> >> output, incrementing stop
> >> futex_wake04    0  TINFO  : Hugepagesize 4194304
> > 
> > It looks like we are trying to fault a 4GB hugepage, which may be quite
> > slow, but then again, the test is a kernel reproducer as well.
> 
> This is i386 without PAE, so the hugepage here is 4 MB (with PAE it would
> be 2 MB). I have configured the same environment on my side:

Ah, my bad: the test prints the size in bytes, not in kB as
/proc/meminfo does. So this is indeed 4MB.

> (k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-max 4M:128
> (k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-min 4M:128
> (k)inaddy@bug3984:~$ sudo hugeadm --pool-list
>        Size  Minimum  Current  Maximum  Default
>     4194304      128      128      128        *
> 
> And ran the test with 500 iterations, multiple times, without a single
> issue.
> 
> (k)root@bug3984:~$ while true; do cat /proc/meminfo | grep -i hugepages_free; done
> 
> This shows that the hugepages are being used:
> 
> HugePages_Free:       93
> HugePages_Free:        4
> HugePages_Free:      128
> HugePages_Free:        0
> HugePages_Free:      128
> HugePages_Free:        1
> HugePages_Free:        8
> HugePages_Free:      128
> HugePages_Free:       27
> HugePages_Free:        0
> HugePages_Free:      102
> 
> > Can you connect to the process with gdb and check whether we timed
> > out before or after the two threads that are supposed to trigger the
> > kernel bug were started?
> > 
> 
> Reading the test and commit 13d60f4b6a, it looks like this test
> exercises futex indexes stored in hugepages (one in the head page and the
> other outside it) and an indexing problem in which futex() was
> signaled as non-contended.
> 
> The kernel code was using page->index (of the page hosting the futex) as
> the futex offset. After this commit, it takes into account whether the
> page is a hugepage (head or tail) or a compound page, in which case it
> uses the compound_idx for the futex offset, fixing the issue.
> 
> For the issue to happen, we would need the following layout:
> 
> test thread #1                    test thread #2
> --------------                    --------------
> [hugepage #1 head] *futex1 offset [hugepage #1 head]
> [page]                            [page] *futex2 offset
> [page]                            [page]
> [page]                            [page]
> ...                               ...
> 
> where futex2 key points to futex1.
> 
> And the test case does NOT look like something intermittent (since it
> is a lock ordering issue, depending on pthreads sharing futexes, with
> wrong futex indexes, as described in the kernel commit).
> 
> * The test case problem would either always happen or never happen *
> 
> I tried increasing the number of iterations a bit and something caught
> my attention: when running the same test multiple times, after some
> time I would see something like:
> 
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> futex_wake04    0  TINFO  :  Thread 7495 not sleeping yet
> 
> And there wasn't a second thread task id inside:
> 
> /proc/<pid>/task/{<pid>, <tid1>, <tid2>}
> 
> like it should.
> 
> Likely the task creation logic here (because of an error? see the last
> part) was faster than the wait_for_threads() logic expected, causing a
> race in the test's logic.

That shouldn't really happen: the wait_for_threads(2) function is called
in a loop until there are two sleeping threads under the
/proc/$pid/task/ directory (not counting the main thread). So unless there
is a problem with starting a thread or blocking on a futex, the loop will
eventually end, and the futexes are unlocked only after this loop finishes.

> * This would explain the timeout error we had, right? *
> 
> Since stdout/stderr are being redirected to a file: @Naresh, is it
> possible to get the stdout/stderr output from the same LKFT URL you
> showed?
> 
> * Wait, there is more ... *
> 
> While running the test and monitoring meminfo, I could see
> HugePages_Total oscillating:
> 
> HugePages_Total:       1
> HugePages_Free:        1
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:       1
> HugePages_Free:        1
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:      54
> HugePages_Free:       53
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:       1
> HugePages_Free:        1
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:       1
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:     101
> HugePages_Free:      101
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:     128
> HugePages_Free:      128
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:       1
> HugePages_Free:        1
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> HugePages_Total:       1
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> 
> despite the fact that I had configured:
> 
> (k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-min 4M:128
> (k)inaddy@bug3984:~$ sudo hugeadm --hard --pool-pages-max 4M:128
> 
> And I had 128 for HugePages_Total at the beginning of the test. The total
> number of hugepages oscillates during a successful test and, at the very
> end, HugePages_Total goes back to 128. I disabled THP completely after
> this test and got the same result.
>
> Reading hugetlb_report_meminfo() in hugetlb.c, I am not sure that (struct
> hstate *h)->nr_huge_pages should change that much (through new hugepages
> being requested, sysctl, or NUMA balancing, among other paths). I'll try
> to trace this tomorrow to see what is touching HugePages_Total during
> the test.

That is because the test modifies the size of the huge page pool itself
in the test setup so that it can run on systems where the default is
set to 0. I guess we should change it so that it checks whether
orig_hugepages is non-zero and only changes the value if it's zero.

-- 
Cyril Hrubis
chrubis@suse.cz
