[3.15rc1] BUG at mm/filemap.c:202!

* [3.15rc1] BUG at mm/filemap.c:202!
@ 2014-04-15 19:09 Dave Jones
  2014-04-16 20:40 ` Hugh Dickins
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Jones @ 2014-04-15 19:09 UTC (permalink / raw)
  To: linux-mm; +Cc: Linux Kernel

kernel BUG at mm/filemap.c:202!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: tun fuse bnep rfcomm nfnetlink llc2 af_key ipt_ULOG can_raw can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_r
xrpc can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm
 xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel snd_hda_controller snd_hda_codec e
1000e btusb bluetooth microcode pcspkr serio_raw snd_hwdep snd_seq snd_seq_device snd_pcm 6lowpan_iphc usb_debug rfkill ptp pps_core shpchp snd_timer snd soundcore
CPU: 3 PID: 14244 Comm: trinity-main Not tainted 3.15.0-rc1+ #188
task: ffff8801be2c50a0 ti: ffff8801d6830000 task.ti: ffff8801d6830000
RIP: 0010:[<ffffffff9915b4d5>]  [<ffffffff9915b4d5>] __delete_from_page_cache+0x315/0x320
RSP: 0018:ffff8801d6831b10  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 000000000000001d
RDX: 000000000000012a RSI: ffffffff99a9a1c0 RDI: ffffffff99a6dad5
RBP: ffff8801d6831b60 R08: 000000000000005d R09: ffff8801b0361530
R10: ffff8801d6831b28 R11: 0000000000000000 R12: ffffea000734d440
R13: ffff880241235008 R14: 0000000000000000 R15: ffff880241235010
FS:  00007f81925cf740(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000630058 CR3: 0000000019c0e000 CR4: 00000000001407e0
DR0: 0000000000df3000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 ffff880241235020 ffff880241235038 ffff8801b0361530 ffff8801b0361640
 000000001da16adc ffffea000734d440 ffff880241235020 0000000000000000
 0000000000000000 000000000000005d ffff8801d6831b88 ffffffff9915b51d
Call Trace:
 [<ffffffff9915b51d>] delete_from_page_cache+0x3d/0x70
 [<ffffffff9916ab7b>] truncate_inode_page+0x5b/0x90
 [<ffffffff991759ab>] shmem_undo_range+0x30b/0x780
 [<ffffffff990a99e5>] ? local_clock+0x25/0x30
 [<ffffffff99175e34>] shmem_truncate_range+0x14/0x30
 [<ffffffff99175f1d>] shmem_evict_inode+0xcd/0x150
 [<ffffffff991e46e7>] evict+0xa7/0x170
 [<ffffffff991e5005>] iput+0xf5/0x180
 [<ffffffff991df390>] dentry_kill+0x210/0x250
 [<ffffffff991df43c>] dput+0x6c/0x110
 [<ffffffff991c8c19>] __fput+0x189/0x200
 [<ffffffff991c8cde>] ____fput+0xe/0x10
 [<ffffffff990900b4>] task_work_run+0xb4/0xe0
 [<ffffffff9906ea92>] do_exit+0x302/0xb80
 [<ffffffff99349843>] ? __this_cpu_preempt_check+0x13/0x20
 [<ffffffff9907038c>] do_group_exit+0x4c/0xc0
 [<ffffffff99070414>] SyS_exit_group+0x14/0x20
 [<ffffffff9975a964>] tracesys+0xdd/0xe2
Code: 4c 89 30 e9 80 fe ff ff 48 8b 75 c0 4c 89 ff e8 e2 8e 1c 00 84 c0 0f 85 6c fe ff ff e9 4f fe ff ff 0f 1f 44 00 00 e8 4e 85 5e 00 <0f> 0b e8 84 1d f1 ff 0f :


 202         BUG_ON(page_mapped(page));
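
For context on the assertion above, here is a minimal userspace sketch of
what BUG_ON(page_mapped(page)) guards against. It assumes the kernel
convention of this era that a page's mapcount starts at -1 and is raised
once for every pte that maps it. The toy_* names are hypothetical and are
not kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; only the field the check looks at. */
struct toy_page {
	int mapcount;	/* -1 means no pte maps this page */
};

static void toy_page_init(struct toy_page *p)
{
	p->mapcount = -1;
}

/* A pte was installed that points at this page. */
static void toy_map(struct toy_page *p)
{
	p->mapcount++;
}

/* A pte pointing at this page was zapped. */
static void toy_unmap(struct toy_page *p)
{
	assert(p->mapcount >= 0);	/* cannot unmap an unmapped page */
	p->mapcount--;
}

static bool toy_page_mapped(const struct toy_page *p)
{
	return p->mapcount >= 0;
}

/* True when deleting from the page cache would not trip the BUG_ON. */
static bool toy_can_delete(const struct toy_page *p)
{
	return !toy_page_mapped(p);
}
```

In this model a page that was mapped once and unmapped once can be
deleted, while a page with one leftover mapping cannot; the latter is the
state the oops reports.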


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [3.15rc1] BUG at mm/filemap.c:202!
  2014-04-15 19:09 [3.15rc1] BUG at mm/filemap.c:202! Dave Jones
@ 2014-04-16 20:40 ` Hugh Dickins
  2014-05-01 16:20   ` Richard Weinberger
  2014-05-03 23:37   ` [PATCH] mm: Fix force_flush behavior in zap_pte_range() Richard Weinberger
  0 siblings, 2 replies; 13+ messages in thread
From: Hugh Dickins @ 2014-04-16 20:40 UTC (permalink / raw)
  To: Dave Jones
  Cc: Andrew Morton, Kirill A. Shutemov, Johannes Weiner, Sasha Levin,
	linux-kernel, linux-mm

On Tue, 15 Apr 2014, Dave Jones wrote:

> kernel BUG at mm/filemap.c:202!
> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> Modules linked in: tun fuse bnep rfcomm nfnetlink llc2 af_key ipt_ULOG can_raw can_bcm scsi_transport_iscsi nfc caif_socket caif af_802154 ieee802154 phonet af_r
> xrpc can pppoe pppox ppp_generic slhc irda crc_ccitt rds rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 cfg80211 coretemp hwmon x86_pkg_temp_thermal kvm_intel kvm
>  xfs libcrc32c snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic crct10dif_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel snd_hda_controller snd_hda_codec e
> 1000e btusb bluetooth microcode pcspkr serio_raw snd_hwdep snd_seq snd_seq_device snd_pcm 6lowpan_iphc usb_debug rfkill ptp pps_core shpchp snd_timer snd soundcore
> CPU: 3 PID: 14244 Comm: trinity-main Not tainted 3.15.0-rc1+ #188
> task: ffff8801be2c50a0 ti: ffff8801d6830000 task.ti: ffff8801d6830000
> RIP: 0010:[<ffffffff9915b4d5>]  [<ffffffff9915b4d5>] __delete_from_page_cache+0x315/0x320
> RSP: 0018:ffff8801d6831b10  EFLAGS: 00010046
> RAX: 0000000000000000 RBX: 0000000000000003 RCX: 000000000000001d
> RDX: 000000000000012a RSI: ffffffff99a9a1c0 RDI: ffffffff99a6dad5
> RBP: ffff8801d6831b60 R08: 000000000000005d R09: ffff8801b0361530
> R10: ffff8801d6831b28 R11: 0000000000000000 R12: ffffea000734d440
> R13: ffff880241235008 R14: 0000000000000000 R15: ffff880241235010
> FS:  00007f81925cf740(0000) GS:ffff880244600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000630058 CR3: 0000000019c0e000 CR4: 00000000001407e0
> DR0: 0000000000df3000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
>  ffff880241235020 ffff880241235038 ffff8801b0361530 ffff8801b0361640
>  000000001da16adc ffffea000734d440 ffff880241235020 0000000000000000
>  0000000000000000 000000000000005d ffff8801d6831b88 ffffffff9915b51d
> Call Trace:
>  [<ffffffff9915b51d>] delete_from_page_cache+0x3d/0x70
>  [<ffffffff9916ab7b>] truncate_inode_page+0x5b/0x90
>  [<ffffffff991759ab>] shmem_undo_range+0x30b/0x780
>  [<ffffffff990a99e5>] ? local_clock+0x25/0x30
>  [<ffffffff99175e34>] shmem_truncate_range+0x14/0x30
>  [<ffffffff99175f1d>] shmem_evict_inode+0xcd/0x150
>  [<ffffffff991e46e7>] evict+0xa7/0x170
>  [<ffffffff991e5005>] iput+0xf5/0x180
>  [<ffffffff991df390>] dentry_kill+0x210/0x250
>  [<ffffffff991df43c>] dput+0x6c/0x110
>  [<ffffffff991c8c19>] __fput+0x189/0x200
>  [<ffffffff991c8cde>] ____fput+0xe/0x10
>  [<ffffffff990900b4>] task_work_run+0xb4/0xe0
>  [<ffffffff9906ea92>] do_exit+0x302/0xb80
>  [<ffffffff99349843>] ? __this_cpu_preempt_check+0x13/0x20
>  [<ffffffff9907038c>] do_group_exit+0x4c/0xc0
>  [<ffffffff99070414>] SyS_exit_group+0x14/0x20
>  [<ffffffff9975a964>] tracesys+0xdd/0xe2
> Code: 4c 89 30 e9 80 fe ff ff 48 8b 75 c0 4c 89 ff e8 e2 8e 1c 00 84 c0 0f 85 6c fe ff ff e9 4f fe ff ff 0f 1f 44 00 00 e8 4e 85 5e 00 <0f> 0b e8 84 1d f1 ff 0f :
> 
> 
>  202         BUG_ON(page_mapped(page));

I've been wrestling with this report, but made no progress;
maybe if I set down a few thoughts, someone can help us forward.

It is reasonable to assume (but unreasonable to hold on too tightly
to the assumption) that this is related to Dave's contemporaneous
report of BUG: Bad rss-counter state mm:ffff88023fc73c00 idx:0 val:5

I don't know if they both occurred in the same session; but whether
or not they did, the BUG_ON(page_mapped(page)) from inode eviction
implies that not every pte mapping a shmem file page had been located
when its last mapper exited; and the rss-counter message implies that
there were five pte mappings of file(s) which could not be located
when their mapper exited.
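
To make the rss-counter reading concrete, here is a rough userspace
sketch of a per-mm counter check at teardown. The layout with index 0
counting file pages is an assumption consistent with the "idx:0" reading
above, and every toy_* name is hypothetical.

```c
#include <stdio.h>

/* Assumed counter layout: index 0 counts file-backed pages. */
enum {
	TOY_MM_FILEPAGES,
	TOY_MM_ANONPAGES,
	TOY_MM_SWAPENTS,
	TOY_NR_MM_COUNTERS
};

struct toy_mm {
	long rss[TOY_NR_MM_COUNTERS];
};

static void toy_map_file_page(struct toy_mm *mm)
{
	mm->rss[TOY_MM_FILEPAGES]++;
}

static void toy_unmap_file_page(struct toy_mm *mm)
{
	mm->rss[TOY_MM_FILEPAGES]--;
}

/*
 * Run when the address space is torn down: every counter should be
 * back at zero. Returns the number of counters that are not.
 */
static int toy_check_mm(const struct toy_mm *mm)
{
	int bad = 0;

	for (int i = 0; i < TOY_NR_MM_COUNTERS; i++) {
		if (mm->rss[i] != 0) {
			printf("BUG: Bad rss-counter state mm:%p idx:%d val:%ld\n",
			       (const void *)mm, i, mm->rss[i]);
			bad++;
		}
	}
	return bad;
}
```

If eight file pages are accounted as mapped but only three are found and
unmapped at exit, the check reports idx 0 with val 5, matching the shape
of the message quoted above.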

It is also reasonable to assume (but unreasonable to hold on too
tightly to the assumption) that this is another manifestation of
the same unsolved mm/filemap.c:202 that Sasha reported on rc5-next
a month ago, https://lkml.org/lkml/2014/3/7/298

Now that one occurred, not while evicting a shmem inode, but while
punching a hole in it with madvise(,,MADV_REMOVE).  At the time I
set it aside to consider when improving shmem_fallocate(), but now
it looks more like a precursor of Dave's.

One way this could happen is if we have racing tasks setting up
ptes without the necessary locking, one placing its pte on top of
another's, so page_mapcount goes up by 2 but comes down by 1 later.
But I failed to find anywhere in the code in danger of doing that.
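
As an illustration of that first hypothesis only, the toy model below
shows how installing a pte on top of an existing one leaves the mapcount
one too high after a single zap. It is a userspace sketch with
hypothetical names, and nothing in it claims that such a path exists in
the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	int mapcount;		/* -1: unmapped */
};

struct toy_pte {
	struct toy_page *page;	/* NULL plays the role of pte_none */
};

/* Correct install: refuse to overwrite a pte that is already set. */
static bool install_checked(struct toy_pte *pte, struct toy_page *pg)
{
	if (pte->page)
		return false;
	pte->page = pg;
	pg->mapcount++;
	return true;
}

/* Hypothetical racy install: overwrites whatever is there. */
static void install_unchecked(struct toy_pte *pte, struct toy_page *pg)
{
	pte->page = pg;
	pg->mapcount++;
}

/* Zap drops exactly one mapping per pte, however it got there. */
static void zap(struct toy_pte *pte)
{
	if (!pte->page)
		return;
	assert(pte->page->mapcount >= 0);
	pte->page->mapcount--;
	pte->page = NULL;
}
```

Two unchecked installs followed by one zap leave mapcount at 0, so the
page still looks mapped; with the checked install the second attempt is
refused and the count returns to -1.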

Another way it could happen is if a vma is removed from i_mmap tree
and i_mmap_nonlinear list, without zap_pte_range() having zapped all
of its ptes; but I don't see where that could happen either.

Sasha's came before shmem participated in Kirill's filemap_map_pages
fault-around; but his pte_same/pte_none checking under ptl there looks
correct anyway.  I've not found any recent change likely to blame.

Help!
Hugh

* Re: [3.15rc1] BUG at mm/filemap.c:202!
  2014-04-16 20:40 ` Hugh Dickins
@ 2014-05-01 16:20   ` Richard Weinberger
  2014-05-03 19:24     ` Richard Weinberger
  2014-05-03 23:37   ` [PATCH] mm: Fix force_flush behavior in zap_pte_range() Richard Weinberger
  1 sibling, 1 reply; 13+ messages in thread
From: Richard Weinberger @ 2014-05-01 16:20 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Kirill A. Shutemov, Johannes Weiner,
	Sasha Levin, LKML, linux-mm@kvack.org

On Wed, Apr 16, 2014 at 10:40 PM, Hugh Dickins <hughd@google.com> wrote:
>
> Help!

Using trinity as of today, I'm able to trigger this bug on UML within seconds.
If you want me to test a patch, I can help.

I'm also observing one strange fact: I can trigger this on any kernel version.
So far I've managed to crash UML on 3.0 to 3.15-rc...

-- 
Thanks,
//richard

* Re: [3.15rc1] BUG at mm/filemap.c:202!
  2014-05-01 16:20   ` Richard Weinberger
@ 2014-05-03 19:24     ` Richard Weinberger
  2014-05-04 20:37       ` Hugh Dickins
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Weinberger @ 2014-05-03 19:24 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Kirill A. Shutemov, Johannes Weiner,
	Sasha Levin, LKML, linux-mm@kvack.org

On Thu, May 1, 2014 at 6:20 PM, Richard Weinberger
<richard.weinberger@gmail.com> wrote:
> On Wed, Apr 16, 2014 at 10:40 PM, Hugh Dickins <hughd@google.com> wrote:
>>
>> Help!
>
> Using trinity as of today, I'm able to trigger this bug on UML within seconds.
> If you want me to test a patch, I can help.
>
> I'm also observing one strange fact: I can trigger this on any kernel version.
> So far I've managed to crash UML on 3.0 to 3.15-rc...

After digging deeper into UML's mmu and tlb code I've found issues and
fixed them.

But I'm still facing this issue. Although triggering the BUG_ON() is
not as easy as before, I can trigger "BUG: Bad rss-counter ..." very
easily.
Now the interesting fact: with my UML mmu and tlb fixes applied it
happens only on kernels >= 3.14.
If it helps I can try to bisect it.

-- 
Thanks,
//richard

* Re: [3.15rc1] BUG at mm/filemap.c:202!
  2014-05-03 19:24     ` Richard Weinberger
@ 2014-05-04 20:37       ` Hugh Dickins
  2014-05-04 20:58         ` Richard Weinberger
  0 siblings, 1 reply; 13+ messages in thread
From: Hugh Dickins @ 2014-05-04 20:37 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Dave Jones, Andrew Morton, Kirill A. Shutemov, Johannes Weiner,
	Sasha Levin, LKML, linux-mm@kvack.org

On Sat, 3 May 2014, Richard Weinberger wrote:
> On Thu, May 1, 2014 at 6:20 PM, Richard Weinberger
> <richard.weinberger@gmail.com> wrote:
> > On Wed, Apr 16, 2014 at 10:40 PM, Hugh Dickins <hughd@google.com> wrote:
> >>
> >> Help!
> >
> > Using trinity as of today, I'm able to trigger this bug on UML within seconds.
> > If you want me to test a patch, I can help.
> >
> > I'm also observing one strange fact: I can trigger this on any kernel version.
> > So far I've managed to crash UML on 3.0 to 3.15-rc...
> 
> After digging deeper into UML's mmu and tlb code I've found issues and
> fixed them.
> 
> But I'm still facing this issue. Although triggering the BUG_ON() is
> not as easy as before, I can trigger "BUG: Bad rss-counter ..." very
> easily.
> Now the interesting fact: with my UML mmu and tlb fixes applied it
> happens only on kernels >= 3.14.
> If it helps I can try to bisect it.

Thanks a lot for trying, but from other mail it looks like your
bisection got blown off course ;(

I expect for the moment you'll want to concentrate on getting UML's
TLB flushing back on track with 3.15-rc.

Once you have that sorted out, I wouldn't be surprised if the same
changes turn out to fix your "Bad rss-counter"s on 3.14 also.

If not, and if you do still have time to bisect back between 3.13 and
3.14 to find where things went wrong, it will be a bit tedious in that
you would probably have to apply

887843961c4b "mm: fix bad rss-counter if remap_file_pages raced migration"
7e09e738afd2 "mm: fix swapops.h:131 bug if remap_file_pages raced migration"

at each stage, to avoid those now-known bugs which trinity became rather
good at triggering.  Perhaps other fixes are needed; those are the two
I remember.

Please don't worry if you don't have time for this, that's understandable.

Or is UML so contrary that one of those commits actually brings on the
problem for you?

As to the BUG_ON(page_mapped(page)), I still have nothing to suggest.

Hugh

* Re: [3.15rc1] BUG at mm/filemap.c:202!
  2014-05-04 20:37       ` Hugh Dickins
@ 2014-05-04 20:58         ` Richard Weinberger
  2014-05-04 21:46           ` 502304919
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Weinberger @ 2014-05-04 20:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Kirill A. Shutemov, Johannes Weiner,
	Sasha Levin, LKML, linux-mm@kvack.org

Am 04.05.2014 22:37, schrieb Hugh Dickins:
> On Sat, 3 May 2014, Richard Weinberger wrote:
>> On Thu, May 1, 2014 at 6:20 PM, Richard Weinberger
>> <richard.weinberger@gmail.com> wrote:
>>> On Wed, Apr 16, 2014 at 10:40 PM, Hugh Dickins <hughd@google.com> wrote:
>>>>
>>>> Help!
>>>
>>> Using trinity as of today, I'm able to trigger this bug on UML within seconds.
>>> If you want me to test a patch, I can help.
>>>
>>> I'm also observing one strange fact: I can trigger this on any kernel version.
>>> So far I've managed to crash UML on 3.0 to 3.15-rc...
>>
>> After digging deeper into UML's mmu and tlb code I've found issues and
>> fixed them.
>>
>> But I'm still facing this issue. Although triggering the BUG_ON() is
>> not as easy as before, I can trigger "BUG: Bad rss-counter ..." very
>> easily.
>> Now the interesting fact: with my UML mmu and tlb fixes applied it
>> happens only on kernels >= 3.14.
>> If it helps I can try to bisect it.
> 
> Thanks a lot for trying, but from other mail it looks like your
> bisection got blown off course ;(

Yeah, looks like the issue I'm facing on UML is a completely different
story. Although the symptoms are identical. :-(

> I expect for the moment you'll want to concentrate on getting UML's
> TLB flushing back on track with 3.15-rc.

This is what I'm currently doing. But it might take some time
as I'm a mm novice.

> Once you have that sorted out, I wouldn't be surprised if the same
> changes turn out to fix your "Bad rss-counter"s on 3.14 also.
> 
> If not, and if you do still have time to bisect back between 3.13 and
> 3.14 to find where things went wrong, it will be a bit tedious in that
> you would probably have to apply
> 
> 887843961c4b "mm: fix bad rss-counter if remap_file_pages raced migration"
> 7e09e738afd2 "mm: fix swapops.h:131 bug if remap_file_pages raced migration"
> 
> at each stage, to avoid those now-known bugs which trinity became rather
> good at triggering.  Perhaps other fixes are needed; those are the two
> I remember.
> 
> Please don't worry if you don't have time for this, that's understandable.
> 
> Or is UML so contrary that one of those commits actually brings on the
> problem for you?

Hehe, no. I gave it a quick try, both 887843961c4b and 7e09e738afd2
seem to be unrelated to the issues I see.

Thanks,
//richard

* Re: [3.15rc1] BUG at mm/filemap.c:202!
  2014-05-04 20:58         ` Richard Weinberger
@ 2014-05-04 21:46           ` 502304919
  0 siblings, 0 replies; 13+ messages in thread
From: 502304919 @ 2014-05-04 21:46 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Kirill A. Shutemov, linux-mm@kvack.org, Dave Jones, Andrew Morton,
	LKML, Sasha Levin, Johannes Weiner, Hugh Dickins

I've added this to my to-do list.

* [PATCH] mm: Fix force_flush behavior in zap_pte_range()
  2014-04-16 20:40 ` Hugh Dickins
  2014-05-01 16:20   ` Richard Weinberger
@ 2014-05-03 23:37   ` Richard Weinberger
  2014-05-03 23:57     ` Linus Torvalds
  1 sibling, 1 reply; 13+ messages in thread
From: Richard Weinberger @ 2014-05-03 23:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, Richard Weinberger, Dave Jones, Andrew Morton,
	Kirill A. Shutemov, Johannes Weiner, Sasha Levin, Hugh Dickins,
	Linus Torvalds, toralf.foerster

Commit 1cf35d47 (mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts)
accidentally changed the behavior of the force_flush variable.
Before that commit it was set by __tlb_remove_page(). Now it is only set to 1
if __tlb_remove_page() returns false but never set back to 0 if __tlb_remove_page()
returns true. As a result the flush now happens too often.
This patch restores the old behavior.

Fixes BUG: Bad rss-counter state ...
and
kernel BUG at mm/filemap.c:202!

Reported-by: Dave Jones <davej@redhat.com>
Reported-by: toralf.foerster@gmx.de
Cc: Dave Jones <davej@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com> 
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: toralf.foerster@gmx.de
Signed-off-by: Richard Weinberger <richard@nod.at>
---
 mm/memory.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 037b812..585885b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1148,10 +1148,10 @@ again:
 			page_remove_rmap(page);
 			if (unlikely(page_mapcount(page) < 0))
 				print_bad_pte(vma, addr, ptent, page);
-			if (unlikely(!__tlb_remove_page(tlb, page))) {
-				force_flush = 1;
+			force_flush = !__tlb_remove_page(tlb, page);
+			if (force_flush)
 				break;
-			}
+
 			continue;
 		}
 		/*
-- 
1.8.1.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()
  2014-05-03 23:37   ` [PATCH] mm: Fix force_flush behavior in zap_pte_range() Richard Weinberger
@ 2014-05-03 23:57     ` Linus Torvalds
  2014-05-04  8:34       ` Richard Weinberger
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2014-05-03 23:57 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Linux Kernel Mailing List, linux-mm, Dave Jones, Andrew Morton,
	Kirill A. Shutemov, Johannes Weiner, Sasha Levin, Hugh Dickins,
	Toralf Förster

On Sat, May 3, 2014 at 4:37 PM, Richard Weinberger <richard@nod.at> wrote:
> Commit 1cf35d47 (mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts)
> accidently changed the behavior of the force_flush variable.

No it didn't. There was nothing accidental about it, and it doesn't
even change it the way you claim.

> Before the patch it was set by __tlb_remove_page(). Now it is only set to 1
> if __tlb_remove_page() returns false but never set back to 0 if __tlb_remove_page()
> returns true.

It starts out as zero. If __tlb_remove_page() returns true, it never
gets set to anything *but* zero, except by the dirty shared mapping
case that *needs* to set it to non-zero, exactly because it *needs* to
flush the TLB before releasing the pte lock.

Which was the whole point of the patch.

Your explanation makes no sense for _another_ reason: even with your
patch, it never gets set back to zero, since if it gets set to one you
have that "break" in there. So the whole "gets set back to zero" is
simply not relevant or true, with or without the patch.

The only place it actually gets zeroed (apart from initialization) is
for the "goto again" case, which does it (and always did it)

> Fixes BUG: Bad rss-counter state ...
> and
> kernel BUG at mm/filemap.c:202!

So tell us more about those actual problems, because your patch and
explanation is clearly wrong.

What hardware, what load, what "kernel BUG at filemap.c:202"?

The shared dirty fix may certainly be exposing some other issue, but
the only report I have seen about filemap.c:202 was reported by Dave
Jones ten *days* before the commit you talk about was even done.

So this whole thing makes no sense what-so-ever.

              Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()
  2014-05-03 23:57     ` Linus Torvalds
@ 2014-05-04  8:34       ` Richard Weinberger
  2014-05-04 18:31         ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Weinberger @ 2014-05-04  8:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, linux-mm, Dave Jones, Andrew Morton,
	Kirill A. Shutemov, Johannes Weiner, Sasha Levin, Hugh Dickins,
	Toralf Förster

Linus,

Am 04.05.2014 01:57, schrieb Linus Torvalds:
> On Sat, May 3, 2014 at 4:37 PM, Richard Weinberger <richard@nod.at> wrote:
>> Commit 1cf35d47 (mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts)
>> accidently changed the behavior of the force_flush variable.
> 
> No it didn't. There was nothing accidental about it, and it doesn't
> even change it the way you claim.
> 
>> Before the patch it was set by __tlb_remove_page(). Now it is only set to 1
>> if __tlb_remove_page() returns false but never set back to 0 if __tlb_remove_page()
>> returns true.
> 
> It starts out as zero. If __tlb_remove_page() returns true, it never
> gets set to anything *but* zero, except by the dirty shared mapping
> case that *needs* to set it to non-zero, exactly because it *needs* to
> flush the TLB before releasing the pte lock.
> 
> Which was the whole point of the patch.
> 
> Your explanation makes no sense for _another_ reason: even with your
> patch, it never gets set back to zero, since if it gets set to one you
> have that "break" in there. So the whole "gets set back to zero" is
> simply not relevant or true, with or without the patch.

Hmm, I got confused by:
                        if (PageAnon(page))
                                rss[MM_ANONPAGES]--;
                        else {
                                if (pte_dirty(ptent)) {
                                        force_flush = 1;

Here you set force_flush.

                                        set_page_dirty(page);
                                }
                                if (pte_young(ptent) &&
                                    likely(!(vma->vm_flags & VM_SEQ_READ)))
                                        mark_page_accessed(page);
                                rss[MM_FILEPAGES]--;
                        }
                        page_remove_rmap(page);
                        if (unlikely(page_mapcount(page) < 0))
                                print_bad_pte(vma, addr, ptent, page);
                        if (unlikely(!__tlb_remove_page(tlb, page))) {
                                force_flush = 1;
                                break;
                        }

And here it cannot get back to 0.

                        continue;



> The only place it actually gets zeroed (apart from initialization) is
> for the "goto again" case, which does it (and always did it)
> 
>> Fixes BUG: Bad rss-counter state ...
>> and
>> kernel BUG at mm/filemap.c:202!
> 
> So tell us more about those actual problems, because your patch and
> explanation is clearly wrong.
> 
> What hardware, what load, what "kernel BUG at filemap.c:202"?

With your patch applied I see lots of BUG: Bad rss-counter state messages on UML (x86_32)
when fuzzing with trinity the mremap syscall.
And sometimes I face BUG at mm/filemap.c:202.

UML is here a bit special. It maps two pages into every process (the stub pages)
to issue mmap(), munmap() or mprotect() upon a page fault to fix memory mappings
for the faulting process on the host side.
It has to make sure that a guest process cannot mess with its stub pages.
Otherwise a guest could execute code on the host side.

Trinity manages to destroy these stub pages, UML detects this upon TLB handling
and kills the current process immediately.
After killing a trinity child I start observing the said issues.

e.g.
fix_range_common: failed, killing current process: 841
fix_range_common: failed, killing current process: 842
fix_range_common: failed, killing current process: 843
BUG: Bad rss-counter state mm:28e69600 idx:0 val:2

> The shared dirty fix may certainly be exposing some other issue, but
> the only report I have seen about filemap.c:202 was reported by Dave
> Jones ten *days* before the commit you talk about was even done.

Mea culpa, I've not noticed that fact.
Back to the drawing board...

Thanks,
//richard

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()
  2014-05-04  8:34       ` Richard Weinberger
@ 2014-05-04 18:31         ` Linus Torvalds
  2014-05-04 20:42           ` Richard Weinberger
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2014-05-04 18:31 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Linux Kernel Mailing List, linux-mm, Dave Jones, Andrew Morton,
	Kirill A. Shutemov, Johannes Weiner, Sasha Levin, Hugh Dickins,
	Toralf Förster

On Sun, May 4, 2014 at 1:34 AM, Richard Weinberger <richard@nod.at> wrote:
>
> Hmm, I got confused by:
>                         if (PageAnon(page))
>                                 rss[MM_ANONPAGES]--;
>                         else {
>                                 if (pte_dirty(ptent)) {
>                                         force_flush = 1;
>
> Here you set force_flush.

Yes. And it needs to stay set, but we don't want to break out early.

The logic is:

 - if the tlb removal page batching tables fill up, we need to stop
any further batching, and flush the TLB immediately, since we don't
have room for any more entries.

   Thus that case does "force_flush=1" _and_ a "break" out of the loop.

 - if we see dirty shared pages, we need to flush the TLB before we
release the page table lock, but we don't have to stop further
batching.

   So this case just does "force_flush=1", but will continue to loop
over the page tables, since it can happily batch more pages.
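
The two cases above can be sketched as a small stand-alone C model. This
is a toy illustration, not the kernel's code: struct toy_gather,
toy_remove_page() and toy_zap_range() are hypothetical names, and the
only thing borrowed from the thread is the rule that a full batch means
"set force_flush and break" while a dirty shared page means "set
force_flush and keep looping".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kinds of page a pte can map. */
enum pte_kind { PTE_ANON, PTE_FILE_CLEAN, PTE_FILE_DIRTY };

/* Toy batch of pages waiting to be freed after the TLB flush. */
struct toy_gather {
	size_t nr;   /* pages batched so far */
	size_t max;  /* capacity of the batch */
};

/*
 * Assumed contract, modelled on the description in the thread:
 * queue one page and return false once there is no room left
 * for any further page.
 */
static bool toy_remove_page(struct toy_gather *tlb)
{
	assert(tlb->nr < tlb->max); /* caller must flush after "full" */
	tlb->nr++;
	return tlb->nr < tlb->max;
}

struct zap_result {
	size_t zapped;       /* ptes processed before the loop ended */
	bool force_flush;    /* TLB must be flushed before unlocking */
	bool stopped_early;  /* loop left before the end of the range */
};

static struct zap_result toy_zap_range(struct toy_gather *tlb,
				       const enum pte_kind *ptes, size_t n)
{
	struct zap_result r = { 0, false, false };

	for (size_t i = 0; i < n; i++) {
		/* Dirty shared page: request a flush, keep batching. */
		if (ptes[i] == PTE_FILE_DIRTY)
			r.force_flush = true;

		r.zapped++;

		/* Batch full: request a flush and stop batching. */
		if (!toy_remove_page(tlb)) {
			r.force_flush = true;
			r.stopped_early = (i + 1 < n);
			break;
		}
	}
	return r;
}
```

With a roomy batch and one PTE_FILE_DIRTY entry in the middle, the model
walks the whole range and still reports force_flush; with a batch of
capacity 2 and three entries, it stops after the second one.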

>                         if (unlikely(!__tlb_remove_page(tlb, page))) {
>                                 force_flush = 1;
>                                 break;
>                         }
>
> And here it cannot get back to 0.

Correct. It *must* not go back to zero, because that would break the
"we had dirty pages, and more room to batch things".
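
To make the difference concrete, here is a minimal sketch of one loop
iteration under each variant. The function names and the boolean
parameters are hypothetical; batch_has_room stands in for the return
value of __tlb_remove_page(). The second function models the assignment
form from the proposed patch (force_flush = !__tlb_remove_page(...)),
which overwrites a flush request made earlier in the same iteration.

```c
#include <stdbool.h>

/* Current logic: force_flush is only ever raised inside the loop. */
static bool step_current(bool force_flush, bool pte_dirty_shared,
			 bool batch_has_room)
{
	if (pte_dirty_shared)
		force_flush = true;   /* must flush before unlocking */
	if (!batch_has_room)
		force_flush = true;   /* batch full, flush right away */
	return force_flush;
}

/* Proposed patch: the assignment discards the dirty-page request. */
static bool step_proposed(bool force_flush, bool pte_dirty_shared,
			  bool batch_has_room)
{
	if (pte_dirty_shared)
		force_flush = true;
	force_flush = !batch_has_room; /* resets to false if room is left */
	return force_flush;
}
```

For a dirty shared page with room left in the batch, step_current()
returns true while step_proposed() returns false, which is the lost
flush described above.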

> With your patch applied I see lots of BUG: Bad rss-counter state messages on UML (x86_32)
> when fuzzing with trinity the mremap syscall.
> And sometimes I face BUG at mm/filemap.c:202.

I'm suspecting that it's some UML bug that is triggered by the
changes. UML has its own tlb gather logic (I'm not quite sure why), I
wonder what's up.

Also, are the messages coming from UML or from the host kernel? I'm
assuming they are UML.

> After killing a trinity child I start observing the said issues.
>
> e.g.
> fix_range_common: failed, killing current process: 841
> fix_range_common: failed, killing current process: 842
> fix_range_common: failed, killing current process: 843
> BUG: Bad rss-counter state mm:28e69600 idx:0 val:2

That "idx=0" means that it's MM_FILEPAGES. Apparently the killing
ended up resulting in not freeing all the file mapping pte's.
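
As a reading aid, the following sketch decodes such a log line. It is an
illustration only: parse_bad_rss() and rss_counter_name() are
hypothetical helpers, and the index-to-name table is an assumption. Only
the idx 0 = MM_FILEPAGES mapping is taken from this message; the other
two names are what kernels of that era are believed to use.

```c
#include <stdio.h>

/* Assumed mapping; only index 0 is confirmed by the thread. */
static const char *rss_counter_name(int idx)
{
	switch (idx) {
	case 0:  return "MM_FILEPAGES";
	case 1:  return "MM_ANONPAGES";  /* assumption */
	case 2:  return "MM_SWAPENTS";   /* assumption */
	default: return "unknown";
	}
}

/*
 * Parse a "BUG: Bad rss-counter state mm:<hex> idx:<n> val:<n>" line.
 * Returns 0 on success and -1 if the line does not match.
 */
static int parse_bad_rss(const char *line, int *idx, long *val)
{
	int n = sscanf(line,
		       "BUG: Bad rss-counter state mm:%*x idx:%d val:%ld",
		       idx, val);
	return n == 2 ? 0 : -1;
}
```

Applied to the line quoted above, this yields idx 0 and val 2, i.e. two
file-backed pages still accounted to the mm when it was torn down.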

So I'm assuming the real issue is that fix_range_common failure that
triggers this.

Exactly why the new tlb flushing triggers this is not entirely clear,
but I'd take a look at how UML reacts to the whole fact that a forced
flush (which never happened before, because your __tlb_remove_page()
doesn't batch anything up and always returns 1) updates the tlb
start/end fields as it does the tlb_flush_mmu_tlbonly().

             Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()
  2014-05-04 18:31         ` Linus Torvalds
@ 2014-05-04 20:42           ` Richard Weinberger
  2014-05-04 21:19             ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Weinberger @ 2014-05-04 20:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, linux-mm, Dave Jones, Andrew Morton,
	Kirill A. Shutemov, Johannes Weiner, Sasha Levin, Hugh Dickins,
	Toralf Förster

Am 04.05.2014 20:31, schrieb Linus Torvalds:
>> With your patch applied I see lots of BUG: Bad rss-counter state messages on UML (x86_32)
>> when fuzzing with trinity the mremap syscall.
>> And sometimes I face BUG at mm/filemap.c:202.
> 
> I'm suspecting that it's some UML bug that is triggered by the
> changes. UML has its own tlb gather logic (I'm not quite sure why), I
> wonder what's up.

I cannot tell why UML has its own tlb gather logic, I suspect nobody
cared so far to clean up the code.
That said, I've converted it today to the generic gather logic and it works.
Sadly I'm still facing the same issues (sigh!).

> Also, are the messages coming from UML or from the host kernel? I'm
> assuming they are UML.

From UML directly.

>> After killing a trinity child I start observing the said issues.
>>
>> e.g.
>> fix_range_common: failed, killing current process: 841
>> fix_range_common: failed, killing current process: 842
>> fix_range_common: failed, killing current process: 843
>> BUG: Bad rss-counter state mm:28e69600 idx:0 val:2
> 
> That "idx=0" means that it's MM_FILEPAGES. Apparently the killing
> ended up resulting in not freeing all the file mapping pte's.
> 
> So I'm assuming the real issue is that fix_range_common failure that
> triggers this.
> 
> Exactly why the new tlb flushing triggers this is not entirely clear,
> but I'd take a look at how UML reacts to the whole fact that a forced
> flush (which never happened before, because your __tlb_remove_page()
> doesn't batch anything up and always returns 1) updates the tlb
> start/end fields as it does the tlb_flush_mmu_tlbonly().

Thanks for the pointer, I'll dig deeper into the issue.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Fix force_flush behavior in zap_pte_range()
  2014-05-04 20:42           ` Richard Weinberger
@ 2014-05-04 21:19             ` Linus Torvalds
  0 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2014-05-04 21:19 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Linux Kernel Mailing List, linux-mm, Dave Jones, Andrew Morton,
	Kirill A. Shutemov, Johannes Weiner, Sasha Levin, Hugh Dickins,
	Toralf Förster

On Sun, May 4, 2014 at 1:42 PM, Richard Weinberger <richard@nod.at> wrote:
>
> I cannot tell why UML has its own tlb gather logic, I suspect nobody
> cared so far to clean up the code.
> That said, I've converted it today to the generic gather logic and it works.
> Sadly I'm still facing the same issues (sigh!).

Ok, so it's not the gathering.

I'm guessing it's because the tlb flush patterns change (we now flush
partial areas for shared mappings with dirty pages - it used to be
that you'd only ever see full ranges before), and that shows some
issue with the whole "fix_range()" thing. So then the kill(9) results
in stopping the page table zapping in the middle, and then you end up
with that "Bad rss-counter" for the file mapping.

Can you try to debug it to see where that "ret" gets set in
fix_range_common() (well, likely deeper, I presume it comes from
update_pte_range() or whatever), to see exactly _what_ it is that
starts failing?

           Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-05-04 21:46 UTC | newest]

Thread overview: 13+ messages
2014-04-15 19:09 [3.15rc1] BUG at mm/filemap.c:202! Dave Jones
2014-04-16 20:40 ` Hugh Dickins
2014-05-01 16:20   ` Richard Weinberger
2014-05-03 19:24     ` Richard Weinberger
2014-05-04 20:37       ` Hugh Dickins
2014-05-04 20:58         ` Richard Weinberger
2014-05-04 21:46           ` 502304919
2014-05-03 23:37   ` [PATCH] mm: Fix force_flush behavior in zap_pte_range() Richard Weinberger
2014-05-03 23:57     ` Linus Torvalds
2014-05-04  8:34       ` Richard Weinberger
2014-05-04 18:31         ` Linus Torvalds
2014-05-04 20:42           ` Richard Weinberger
2014-05-04 21:19             ` Linus Torvalds
