From: Xishi Qiu <qiuxishi@huawei.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: "'Kirill A . Shutemov'" <kirill.shutemov@linux.intel.com>,
zhong jiang <zhongjiang@huawei.com>,
Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Tejun Heo <tj@kernel.org>, Michal Hocko <mhocko@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Mel Gorman <mgorman@techsingularity.net>,
Michal Hocko <mhocko@suse.com>, Minchan Kim <minchan@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
aarcange@redhat.com, sumeet.keswani@hpe.com,
Rik van Riel <riel@redhat.com>, Linux MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: mm, something wrong in page_lock_anon_vma_read()?
Date: Wed, 19 Jul 2017 17:59:01 +0800
Message-ID: <596F2D65.8020902@huawei.com>
In-Reply-To: <24bd80c6-1bb7-c8b8-2acf-b91e5e10dbb1@suse.cz>
On 2017/7/19 16:40, Vlastimil Babka wrote:
> On 07/18/2017 12:59 PM, Xishi Qiu wrote:
>> Hi,
>>
>> Unfortunately, this patch (mm: thp: fix SMP race condition between
>> THP page fault and MADV_DONTNEED) didn't help; I got the panic again.
>
> Too bad then. I don't know from my own experience of any other patch
> that is directly related; try looking for similar THP-related race fixes.
> Did you already check whether disabling THP (set it to "never" under
> /sys/...) prevents the issue? I forgot.
>
Hi Vlastimil,
Thanks for your reply.
This bug is hard to reproduce, and our production environment does not
allow disabling THP because of the performance regression. I also have no
way to reproduce this bug myself (I don't have the user applications or the
stress workload from production).
>> And I found this error before the panic: "[468229.996610] BUG: Bad rss-counter state mm:ffff8806aebc2580 idx:1 val:1"
>
> This likely means that a pte was overwritten to zero, and an anon page
> had no other reference than this pte, so it became orphaned. Its
> anon_vma object was freed as the process exited, and eventually
> overwritten by a new user, so compaction or reclaim looking at it sooner
> or later makes a bad memory access.
>
> The pte overwriting may be a result of races with multiple threads
> trying to either read or write within the same page, involving THP zero
> page. It doesn't have to be MADV_DONTNEED related.
>
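To make the rss-counter reasoning above concrete, here is a minimal userspace model (not kernel code). It assumes the counter order of this kernel era, where idx:1 is MM_ANONPAGES; the struct and helper names (mm_model, zap_anon_pte, racy_pte_overwrite, check_mm_model) are hypothetical. If a pte is overwritten without going through the normal zap path, the anon counter never drops back to zero, and an exit-time check in the spirit of the kernel's check_mm() reports idx:1 val:1, as in the log.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed counter order for this kernel era: idx:1 == MM_ANONPAGES. */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, NR_MM_COUNTERS };

/* Hypothetical stand-in for the per-mm RSS counters. */
struct mm_model {
	long rss[NR_MM_COUNTERS];
};

/* Faulting in an anon page accounts it in MM_ANONPAGES. */
static void map_anon_page(struct mm_model *mm)
{
	mm->rss[MM_ANONPAGES]++;
}

/* Normal teardown of an anon pte: the page is unmapped and the counter dropped. */
static void zap_anon_pte(struct mm_model *mm)
{
	assert(mm->rss[MM_ANONPAGES] > 0);
	mm->rss[MM_ANONPAGES]--;
}

/* Models a racy path that overwrites the pte with zero without the normal
 * zap: nothing is unmapped or un-accounted, so the page is orphaned and the
 * counter is left untouched. */
static void racy_pte_overwrite(struct mm_model *mm)
{
	(void)mm;
}

/* Exit-time sanity check: report every counter that is not back to zero
 * and return how many were bad. */
static int check_mm_model(const struct mm_model *mm)
{
	int bad = 0;

	for (int i = 0; i < NR_MM_COUNTERS; i++) {
		if (mm->rss[i] != 0) {
			printf("BUG: Bad rss-counter state idx:%d val:%ld\n",
			       i, mm->rss[i]);
			bad++;
		}
	}
	return bad;
}
```

Mapping two anon pages, zapping one normally and losing the other to racy_pte_overwrite() leaves rss[MM_ANONPAGES] == 1, and check_mm_model() then prints "BUG: Bad rss-counter state idx:1 val:1".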
I found two upstream commits:
887843961c4b4681ee993c36d4997bf4b4aa8253
a9c8e4beeeb64c22b84c803747487857fe424b68
I can't find any relation between the first one and this panic, and the
second one seems to be triggered under Xen, whereas we use KVM.
Thanks,
Xishi Qiu
>> [468451.702807] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> [468451.702861] IP: [<ffffffff810ac089>] down_read_trylock+0x9/0x30
>> [468451.702900] PGD 12445e067 PUD 11acaa067 PMD 0
>> [468451.702931] Oops: 0000 [#1] SMP
>> [468451.702953] kbox catch die event.
>> [468451.703003] collected_len = 1047419, LOG_BUF_LEN_LOCAL = 1048576
>> [468451.703003] kbox: notify die begin
>> [468451.703003] kbox: no notify die func register. no need to notify
>> [468451.703003] do nothing after die!
>> [468451.703003] Modules linked in: ipt_REJECT macvlan ip_set_hash_ipport vport_vxlan(OVE) xt_statistic xt_physdev xt_nat xt_recent xt_mark xt_comment veth ct_limit(OVE) bum_extract(OVE) policy(OVE) bum(OVE) ip_set nfnetlink openvswitch(OVE) nf_defrag_ipv6 gre ext3 jbd ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack bridge stp llc kboxdriver(O) kbox(O) dm_thin_pool dm_persistent_data crc32_pclmul dm_bio_prison dm_bufio ghash_clmulni_intel libcrc32c aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ppdev sg parport_pc cirrus virtio_console parport syscopyarea sysfillrect sysimgblt ttm drm_kms_helper drm i2c_piix4 i2c_core pcspkr ip_tables ext4 jbd2 mbcache sr_mod cdrom ata_generic pata_acpi
>> [468451.703003] virtio_net virtio_blk crct10dif_pclmul crct10dif_common ata_piix virtio_pci libata serio_raw virtio_ring crc32c_intel virtio dm_mirror dm_region_hash dm_log dm_mod
>> [468451.703003] CPU: 6 PID: 21965 Comm: docker-containe Tainted: G OE ----V------- 3.10.0-327.53.58.73.x86_64 #1
>> [468451.703003] Hardware name: OpenStack Foundation OpenStack Nova, BIOS rel-1.8.1-0-g4adadbd-20170107_142945-9_64_246_229 04/01/2014
>> [468451.703003] task: ffff880692402e00 ti: ffff88018209c000 task.ti: ffff88018209c000
>> [468451.703003] RIP: 0010:[<ffffffff810ac089>] [<ffffffff810ac089>] down_read_trylock+0x9/0x30
>> [468451.703003] RSP: 0018:ffff88018209f8f8 EFLAGS: 00010202
>> [468451.703003] RAX: 0000000000000000 RBX: ffff880720cd7740 RCX: ffff880720cd7740
>> [468451.703003] RDX: 0000000000000001 RSI: 0000000000000301 RDI: 0000000000000008
>> [468451.703003] RBP: ffff88018209f8f8 R08: 00000000c0e0f310 R09: ffff880720cd7740
>> [468451.703003] R10: ffff88083efd8000 R11: 0000000000000000 R12: ffff880720cd7741
>> [468451.703003] R13: ffffea000824d100 R14: 0000000000000008 R15: 0000000000000000
>> [468451.703003] FS: 00007fc0e2a85700(0000) GS:ffff88083ed80000(0000) knlGS:0000000000000000
>> [468451.703003] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [468451.703003] CR2: 0000000000000008 CR3: 0000000661906000 CR4: 00000000001407e0
>> [468451.703003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [468451.703003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [468451.703003] Stack:
>> [468451.703003] ffff88018209f928 ffffffff811a7eb5 ffffea000824d100 ffff88018209fa90
>> [468451.703003] ffffea00082f9680 0000000000000301 ffff88018209f978 ffffffff811a82e1
>> [468451.703003] ffffea000824d100 ffff88018209fa00 0000000000000001 ffffea000824d100
>> [468451.703003] Call Trace:
>> [468451.703003] [<ffffffff811a7eb5>] page_lock_anon_vma_read+0x55/0x110
>> [468451.703003] [<ffffffff811a82e1>] try_to_unmap_anon+0x21/0x120
>> [468451.703003] [<ffffffff811a842d>] try_to_unmap+0x4d/0x60
>> [468451.712006] [<ffffffff811cc749>] migrate_pages+0x439/0x790
>> [468451.712006] [<ffffffff81193280>] ? __reset_isolation_suitable+0xe0/0xe0
>> [468451.712006] [<ffffffff811941f9>] compact_zone+0x299/0x400
>> [468451.712006] [<ffffffff81059aff>] ? kvm_clock_get_cycles+0x1f/0x30
>> [468451.712006] [<ffffffff811943fc>] compact_zone_order+0x9c/0xf0
>> [468451.712006] [<ffffffff811947b1>] try_to_compact_pages+0x121/0x1a0
>> [468451.712006] [<ffffffff8163ace6>] __alloc_pages_direct_compact+0xac/0x196
>> [468451.712006] [<ffffffff811783e2>] __alloc_pages_nodemask+0xbc2/0xca0
>> [468451.712006] [<ffffffff811bcb7a>] alloc_pages_vma+0x9a/0x150
>> [468451.712006] [<ffffffff811d1573>] do_huge_pmd_anonymous_page+0x123/0x510
>> [468451.712006] [<ffffffff8119bc58>] handle_mm_fault+0x1a8/0xf50
>> [468451.712006] [<ffffffff8164b4d6>] __do_page_fault+0x166/0x470
>> [468451.712006] [<ffffffff8164b8a3>] trace_do_page_fault+0x43/0x110
>> [468451.712006] [<ffffffff8164af79>] do_async_page_fault+0x29/0xe0
>> [468451.712006] [<ffffffff81647a38>] async_page_fault+0x28/0x30
>> [468451.712006] Code: 00 00 00 ba 01 00 00 00 48 89 de e8 12 fe ff ff eb ce 48 c7 c0 f2 ff ff ff eb c5 e8 42 ff fc ff 66 90 0f 1f 44 00 00 55 48 89 e5 <48> 8b 07 48 89 c2 48 83 c2 01 7e 07 f0 48 0f b1 17 75 f0 48 f7
>> [468451.712006] RIP [<ffffffff810ac089>] down_read_trylock+0x9/0x30
>> [468451.738667] RSP <ffff88018209f8f8>
>> [468451.738667] CR2: 0000000000000008
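The register dump appears consistent with the freed-anon_vma explanation. Below is a small userspace sketch of how I read it, assuming the 3.10-era layout of struct anon_vma in which root is the first member and rwsem immediately follows (the *_model types are simplified, hypothetical stand-ins). page_lock_anon_vma_read() strips the PAGE_MAPPING_ANON bit from page->mapping (R12 = ffff880720cd7741 versus RBX = ffff880720cd7740), reads anon_vma->root, and calls down_read_trylock() on &root->rwsem. If the anon_vma has been freed and reused so that root reads as NULL, that address is 0x8, matching RDI and CR2; the faulting instruction in the Code: line (48 8b 07, a load from [rdi]) is the first read of the rwsem count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low bit of page->mapping marks an anonymous page. */
#define PAGE_MAPPING_ANON 1UL

/* Hypothetical stand-in for struct rw_semaphore; only its position matters. */
struct rwsem_model {
	long count;
};

/* Assumed head of the 3.10-era struct anon_vma: root first, rwsem next. */
struct anon_vma_model {
	struct anon_vma_model *root;
	struct rwsem_model rwsem;
};

/* Recover the anon_vma pointer from an anon page->mapping value. */
static uintptr_t anon_vma_from_mapping(uintptr_t mapping)
{
	assert((mapping & PAGE_MAPPING_ANON) == PAGE_MAPPING_ANON);
	return mapping - PAGE_MAPPING_ANON;
}

/* Address down_read_trylock(&root->rwsem) dereferences for a given root. */
static uintptr_t rwsem_addr_for_root(uintptr_t root)
{
	return root + offsetof(struct anon_vma_model, rwsem);
}
```

On x86_64, anon_vma_from_mapping(0xffff880720cd7741) yields 0xffff880720cd7740, and rwsem_addr_for_root(0) yields 0x8, the faulting address reported in CR2.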
Thread overview:
2017-05-18 9:46 mm, something wring in page_lock_anon_vma_read()? Xishi Qiu
2017-05-19 8:52 ` Xishi Qiu
2017-05-19 9:44 ` Xishi Qiu
2017-05-19 22:00 ` Hugh Dickins
2017-05-20 1:21 ` Xishi Qiu
2017-05-20 2:02 ` Hugh Dickins
2017-05-20 2:18 ` Xishi Qiu
2017-05-20 2:40 ` Hugh Dickins
2017-05-20 3:01 ` zhong jiang
2017-05-22 16:51 ` Vlastimil Babka
2017-05-23 9:21 ` zhong jiang
2017-05-23 9:33 ` Vlastimil Babka
2017-05-23 10:32 ` zhong jiang
2017-06-08 13:44 ` Xishi Qiu
2017-06-08 13:59 ` Vlastimil Babka
2017-06-08 14:11 ` zhong jiang
2017-07-18 10:59 ` mm, something wrong " Xishi Qiu
2017-07-19 8:40 ` Vlastimil Babka
2017-07-19 9:59 ` Xishi Qiu [this message]
2017-07-20 12:58 ` Andrea Arcangeli
2017-07-20 16:15 ` Andrea Arcangeli
2017-05-22 9:48 ` mm, something wring " Xishi Qiu
2017-05-22 19:26 ` Hugh Dickins
2017-05-23 2:19 ` Xishi Qiu
2017-05-23 2:51 ` Hugh Dickins