linuxppc-dev.lists.ozlabs.org archive mirror
* [RESEND] Soft lockup after huge page offline attempt
@ 2015-09-02 10:07 Anshuman Khandual
  2015-09-02 10:20 ` Michael Neuling
  0 siblings, 1 reply; 3+ messages in thread
From: Anshuman Khandual @ 2015-09-02 10:07 UTC (permalink / raw)
  To: Linux PPC, Aneesh Kumar K.V, Michael Ellerman, Michael Neuling,
	Nishanth Aravamudan

Hello,

I have been working on a modified version of the following selftest
test case from the kernel source:

tools/testing/selftests/vm/hugepage-mmap.c

This modified test performs the following sequence of steps, in
this order:

(1) Allocate 16 huge TLB pages (16MB each) through mmap()

(2) mlock() the entire 256MB range

(3) Fetch the PFN from /proc/self/pagemap for the starting address
    of each huge TLB page in the 256MB range

(4) Convert each PFN into a physical address as PFN << PAGE_SHIFT,
    which is then written to /sys/devices/system/memory/soft_offline_page
    (steps (3) and (4) are sketched just after this list)

(5) munmap() the 256MB memory range

(6) unlink() the file (inside the hugetlbfs mount path) used for the mmap()
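
A minimal, hypothetical sketch of steps (3) and (4) for a single mapped
address is shown below. It assumes 64K base pages (PAGE_SHIFT == 16) and
the usual pagemap layout with the PFN in bits 0-54; the helper name and
error handling are illustrative only.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch: soft-offline the page backing one virtual address */
int soft_offline_addr(void *vaddr)
{
        uint64_t entry, pfn;
        char buf[32];
        off_t offset = ((uintptr_t)vaddr / getpagesize()) * sizeof(entry);
        int pm = open("/proc/self/pagemap", O_RDONLY);
        int so = open("/sys/devices/system/memory/soft_offline_page", O_WRONLY);
        int ret = -1;

        if (pm < 0 || so < 0)
                goto out;

        /* Step (3): read the pagemap entry and extract the PFN (bits 0-54) */
        if (pread(pm, &entry, sizeof(entry), offset) != sizeof(entry))
                goto out;
        pfn = entry & ((1UL << 55) - 1);

        /* Step (4): PA = PFN << PAGE_SHIFT, written as hex to the sysfs file */
        snprintf(buf, sizeof(buf), "0x%llx\n", (unsigned long long)(pfn << 16));
        if (write(so, buf, strlen(buf)) < 0)
                perror("soft_offline_page write");
        else
                ret = 0;
out:
        if (pm >= 0)
                close(pm);
        if (so >= 0)
                close(so);
        return ret;
}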

The soft offline page attempts fail as expected, since we don't support that
on POWER yet.

[  145.359405] soft offline: 0xba100: migration failed 1, type 43ffff000004008
[  145.359572] soft offline: 0xba200: migration failed 1, type 43ffff000004008
[  145.359644] soft offline: 0xba300: migration failed 1, type 43ffff000004008
[  145.359716] soft offline: 0xba400: migration failed 1, type 43ffff000004008
[  145.359783] soft offline: 0xba500: migration failed 1, type 43ffff000004008
[  145.359853] soft offline: 0xba600: migration failed 1, type 43ffff000004008
[  145.359931] soft offline: 0xba700: migration failed 1, type 43ffff000004008
[  145.360008] soft offline: 0xba800: migration failed 1, type 43ffff000004008
[  145.360080] soft offline: 0xba900: migration failed 1, type 43ffff000004008
[  145.360148] soft offline: 0xbaa00: migration failed 1, type 43ffff000004008
[  145.360218] soft offline: 0xbab00: migration failed 1, type 43ffff000004008
[  145.360286] soft offline: 0xbac00: migration failed 1, type 43ffff000004008
[  145.360353] soft offline: 0xbad00: migration failed 1, type 43ffff000004008
[  145.360420] soft offline: 0xbae00: migration failed 1, type 43ffff000004008
[  145.360485] soft offline: 0xbaf00: migration failed 1, type 43ffff000004008
[  145.360552] soft offline: 0xbb000: migration failed 1, type 43ffff000004008

Interestingly, though, the test thread hangs for good after that. We can see this
soft lockup message on the console.

[  168.038353] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [hugepage-mmap:1238]
[  168.038384] Modules linked in: windfarm_smu_sat i2c_core pseries_rng windfarm_pid xfs libcrc32c ibmvscsi scsi_transport_srp ibmveth
[  168.038411] CPU: 3 PID: 1238 Comm: hugepage-mmap Not tainted 4.2.0+ #7
[  168.038416] task: c0000000ffa304b0 ti: c000000beff8c000 task.ti: c000000beff8c000
[  168.038422] NIP: c000000000228164 LR: c00000000022820c CTR: c0000000003ff060
[  168.038428] REGS: c000000beff8f8c0 TRAP: 0901   Not tainted  (4.2.0+)
[  168.038432] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22000228  XER: 00000000
[  168.038446] CFAR: c000000000228180 SOFTE: 1 
GPR00: 0000000000000000 c000000beff8fb40 c000000001306000 f000000002e84000 
GPR04: c000000beff8fb60 0000000000000220 c000000bf98b5248 0000000000000000 
GPR08: 0000000000000044 0000000000000000 f000000002e8401c 0000000000000000 
GPR12: 0000000000000000 c00000000ea51c80 
[  168.038483] NIP [c000000000228164] find_get_pages+0xa4/0x1e0
[  168.038489] LR [c00000000022820c] find_get_pages+0x14c/0x1e0
[  168.038493] Call Trace:
[  168.038496] [c000000beff8fb40] [c000000beff8fbc0] 0xc000000beff8fbc0 (unreliable)
[  168.038503] [c000000beff8fbc0] [c00000000023ba20] pagevec_lookup+0x50/0x80
[  168.038510] [c000000beff8fc00] [c0000000003fe630] truncate_hugepages+0xb0/0x280
[  168.038517] [c000000beff8fd10] [c0000000003ff090] hugetlbfs_evict_inode+0x30/0x70
[  168.038524] [c000000beff8fd40] [c0000000002f1d34] evict+0xf4/0x290
[  168.038531] [c000000beff8fd80] [c0000000002e3268] do_unlinkat+0x208/0x370
[  168.038538] [c000000beff8fe30] [c000000000009360] system_call+0x38/0xd0
[  168.038543] Instruction dump:
[  168.038546] 419e00c8 e87f0000 2fa30000 419e0080 786907a1 408200d8 813b000c 552902ee 
[  168.038556] 0b090000 3943001c 7c2004ac 7d205028 <2c090000> 41c20014 31090001 7d00512d 

After a while, we also see these messages on the console before the system freezes.

INFO: rcu_sched self-detected stall on CPU
[  205.368370] 	3: (5999 ticks this GP) idle=b23/140000000000001/0 softirq=1329/1329 fqs=6000 
[  205.368375] 	 (t=6000 jiffies g=896 c=895 q=678)
[  205.368383] Task dump for CPU 3:
[  205.368387] hugepage-mmap   R  running task        0  1238   1233 0x00042004
[  205.368394] Call Trace:
[  205.368401] [c000000beff8f3e0] [c000000000106660] sched_show_task+0xe0/0x180 (unreliable)
[  205.368409] [c000000beff8f450] [c00000000014d9e4] rcu_dump_cpu_stacks+0xe4/0x150
[  205.368416] [c000000beff8f4a0] [c0000000001529c4] rcu_check_callbacks+0x6b4/0x9b0
[  205.368424] [c000000beff8f5d0] [c000000000159ee4] update_process_times+0x54/0xa0
[  205.368431] [c000000beff8f600] [c000000000170b48] tick_sched_handle.isra.6+0x48/0xe0
[  205.368437] [c000000beff8f640] [c000000000170c44] tick_sched_timer+0x64/0xd0
[  205.368444] [c000000beff8f680] [c00000000015a954] __hrtimer_run_queues+0x124/0x3c0
[  205.368450] [c000000beff8f710] [c00000000015b88c] hrtimer_interrupt+0xec/0x2c0
[  205.368457] [c000000beff8f7d0] [c00000000001f77c] __timer_interrupt+0x8c/0x230
[  205.368464] [c000000beff8f820] [c00000000001fe30] timer_interrupt+0xa0/0xe0
[  205.368470] [c000000beff8f850] [c000000000002768] decrementer_common+0x168/0x180
[  205.368478] --- interrupt: 901 at find_get_pages+0xa0/0x1e0
[  205.368478]     LR = find_get_pages+0x14c/0x1e0
[  205.368486] [c000000beff8fb40] [c000000beff8fbc0] 0xc000000beff8fbc0 (unreliable)
[  205.368493] [c000000beff8fbc0] [c00000000023ba20] pagevec_lookup+0x50/0x80
[  205.368500] [c000000beff8fc00] [c0000000003fe630] truncate_hugepages+0xb0/0x280
[  205.368507] [c000000beff8fd10] [c0000000003ff090] hugetlbfs_evict_inode+0x30/0x70
[  205.368513] [c000000beff8fd40] [c0000000002f1d34] evict+0xf4/0x290
[  205.368520] [c000000beff8fd80] [c0000000002e3268] do_unlinkat+0x208/0x370
[  205.368526] [c000000beff8fe30] [c000000000009360] system_call+0x38/0xd0


Then the entire system hangs for good, requiring a reboot. The problem can be
recreated without the unlink() step if we just try to unmount the hugetlbfs
mount afterwards. I am still looking into this problem. Please let me know if
you have any suggestions, thoughts or comments about this issue.

Regards
Anshuman


* Re: [RESEND] Soft lockup after huge page offline attempt
  2015-09-02 10:07 [RESEND] Soft lockup after huge page offline attempt Anshuman Khandual
@ 2015-09-02 10:20 ` Michael Neuling
  2015-09-02 11:14   ` Anshuman Khandual
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Neuling @ 2015-09-02 10:20 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Linux PPC, Aneesh Kumar K.V, Michael Ellerman,
	Nishanth Aravamudan

On Wed, 2015-09-02 at 15:37 +0530, Anshuman Khandual wrote:
> Hello,
>
> I have been working on a modified version of the following selftest
> test case from the kernel source:
>
> tools/testing/selftests/vm/hugepage-mmap.c

Please send the test case and how to run it here.

Mikey


* Re: [RESEND] Soft lockup after huge page offline attempt
  2015-09-02 10:20 ` Michael Neuling
@ 2015-09-02 11:14   ` Anshuman Khandual
  0 siblings, 0 replies; 3+ messages in thread
From: Anshuman Khandual @ 2015-09-02 11:14 UTC (permalink / raw)
  To: Michael Neuling; +Cc: Nishanth Aravamudan, Linux PPC, Aneesh Kumar K.V

On 09/02/2015 03:50 PM, Michael Neuling wrote:
> On Wed, 2015-09-02 at 15:37 +0530, Anshuman Khandual wrote:
>> Hello,
>>
>> I have been working on a modified version of the following selftest
>> test case from the kernel source:
>>
>> tools/testing/selftests/vm/hugepage-mmap.c
> 
> Please send the test case and how to run it here.

Please find them below.


Script (modified from tools/testing/selftests/vm/run_vmtests)
=============================================================
#!/bin/bash
#please run as root

#we need 256M, below is the size in kB
needmem=262144
mnt=./huge
exitcode=0

#get pagesize and freepages from /proc/meminfo
while read name size unit; do
        if [ "$name" = "HugePages_Free:" ]; then
                freepgs=$size
        fi
        if [ "$name" = "Hugepagesize:" ]; then
                pgsize=$size
        fi
done < /proc/meminfo

#set proper nr_hugepages
if [ -n "$freepgs" ] && [ -n "$pgsize" ]; then
        nr_hugepgs=`cat /proc/sys/vm/nr_hugepages`
        needpgs=`expr $needmem / $pgsize`
        if [ $freepgs -lt $needpgs ]; then
                lackpgs=$(( $needpgs - $freepgs ))
                echo $(( ($lackpgs + $nr_hugepgs) * 4)) > /proc/sys/vm/nr_hugepages
                if [ $? -ne 0 ]; then
                        echo "Please run this test as root"
                        exit 1
                fi
        fi
else
        echo "no hugetlbfs support in kernel?"
        exit 1
fi

mkdir $mnt
mount -t hugetlbfs none $mnt

echo "--------------------"
echo "running hugepage-mmap"
echo "--------------------"
./hugepage-mmap
if [ $? -ne 0 ]; then
        echo "[FAIL]"
        exitcode=1
else
        echo "[PASS]"
fi

#cleanup
umount $mnt
rm -rf $mnt
echo $nr_hugepgs > /proc/sys/vm/nr_hugepages

testcase (modified from tools/testing/selftests/vm/hugepage-mmap.c)
===================================================================

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>

#define FILE_NAME "huge/hugepagefile"
#define LENGTH (256UL*1024*1024)
#define PROTECTION (PROT_READ | PROT_WRITE)

#define ADDR (void *)(0x0UL)
#define FLAGS (MAP_SHARED)

#define SOFT_OFFLINE "/sys/devices/system/memory/soft_offline_page"
#define PAGE_MAP "/proc/self/pagemap"
#define PAGEMAP_LEN 8
#define HPAGE (16384 * 1024)
#define PFN_MASK 0x007FFFFFFFFFFFFFUL

static void check_bytes(char *addr)
{
        printf("First hex is %x\n", *((unsigned int *)addr));
}

static void write_bytes(char *addr)
{
        unsigned long i;

        for (i = 0; i < LENGTH; i++)
                *(addr + i) = (char)i;
}

static int read_bytes(char *addr)
{
        unsigned long i;

        check_bytes(addr);
        for (i = 0; i < LENGTH; i++)
                if (*(addr + i) != (char)i) {
                        printf("Mismatch at %lu\n", i);
                        return 1;
                }
        return 0;
}

int main(void)
{
        void *addr, *tmp;
        int fd, fd1, fd2, ret;
        int i;
        unsigned long offset, pfn;
        char buf[20];

        fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755);
        if (fd < 0) {
                perror("Open failed");
                exit(1);
        }
        addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, fd, 0);
        if (addr == MAP_FAILED) {
                perror("mmap");
                unlink(FILE_NAME);
                exit(1);
        }

        if (mlock(addr, LENGTH) == -1) {
                perror("mlock");
                unlink(FILE_NAME);
                exit(1);
        }

        printf("Returned address is %p\n", addr);
        check_bytes(addr);
        write_bytes(addr);
        ret = read_bytes(addr);

        fd1 = open(PAGE_MAP, O_RDONLY);
        if (fd1 == -1) {
                perror("open");
                exit(-1);
        }

        fd2 = open(SOFT_OFFLINE, O_WRONLY);
        if (fd2 == -1) {
                perror("open");
                exit(-1);
        }

        for (i = 0; i < 16; i++) {
                tmp = addr + i * HPAGE;
                offset = ((unsigned long) tmp / getpagesize()) * PAGEMAP_LEN;

                if (lseek(fd1, offset, SEEK_SET) == -1) {
                        perror("seek");
                        exit(-1);
                }

                if (read(fd1, &pfn, sizeof(pfn)) == -1) {
                        perror("read");
                        exit(-1);
                }

                printf("PFN: %lx ", pfn & PFN_MASK);
                pfn = (pfn & PFN_MASK) << 16;
                printf("Address: %lx\n", pfn);

                sprintf(buf, "0x%lx\n", pfn);
                printf("buf: %s\n", buf);

                if (write(fd2, buf, strlen(buf)) == -1) {
                        perror("write");
                }
        }

        munmap(addr, LENGTH);
        close(fd);
        close(fd1);
        close(fd2);
        unlink(FILE_NAME);
        return ret;
}
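
(To reproduce: the file above is intended as a drop-in replacement for
tools/testing/selftests/vm/hugepage-mmap.c, so it can be built either through
the selftests Makefile or with a plain gcc invocation, and the resulting
hugepage-mmap binary kept next to the script. The script must be run as root;
it mounts hugetlbfs on ./huge, which matches the FILE_NAME path the test opens.)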
