* xfs trace in 4.4.2
@ 2016-02-20 8:02 Stefan Priebe
2016-02-20 14:45 ` Brian Foster
0 siblings, 1 reply; 22+ messages in thread
From: Stefan Priebe @ 2016-02-20 8:02 UTC (permalink / raw)
To: xfs@oss.sgi.com
Cc: linux-fsdevel, xfs-masters@oss.sgi.com,
Dave Chinner <david@fromorbit.com>
Hi,
got this one today. Not sure if this is a bug.
[67674.907736] ------------[ cut here ]------------
[67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
xfs_vm_releasepage+0xa9/0xe0()
[67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tabl
es x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msg
handler button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod
i2c_algo_bit ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_
sas scsi_transport_sas pps_core
[67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
[67675.277120] Hardware name: Supermicro
X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
[67675.335176] ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
0000000000000001
[67675.392983] 0000000000000000 ffff88007950fad8 ffffffffa3083587
ffff88007950fae8
[67675.449743] 0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
ffffea00208834a0
[67675.506112] Call Trace:
[67675.561285] [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
[67675.619364] [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
[67675.675719] [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
[67675.731113] [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
[67675.786116] [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
[67675.844216] [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
[67675.903862] [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
[67675.957625] [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
[67676.011497] [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
[67676.064980] [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
[67676.118828] [<ffffffffa3166659>] kswapd+0x4f9/0x930
[67676.172075] [<ffffffffa3166160>] ?
mem_cgroup_shrink_node_zone+0x150/0x150
[67676.225139] [<ffffffffa30a08c9>] kthread+0xc9/0xe0
[67676.277539] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
[67676.330124] [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
[67676.381816] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
[67676.433499] ---[ end trace cb1827fe308f7f6b ]---
Greets Stefan
* Re: xfs trace in 4.4.2
2016-02-20 8:02 xfs trace in 4.4.2 Stefan Priebe
@ 2016-02-20 14:45 ` Brian Foster
2016-02-20 18:02 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 22+ messages in thread
From: Brian Foster @ 2016-02-20 14:45 UTC (permalink / raw)
To: Stefan Priebe; +Cc: xfs@oss.sgi.com, linux-fsdevel, xfs-masters@oss.sgi.com
On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> Hi,
>
> got this one today. Not sure if this is a bug.
>
That looks like the releasepage() delayed allocation block warning. I'm
not sure we've had any fixes for (or reports of) that issue since the
v4.2 timeframe.
What is the xfs_info of the associated filesystem? Also, do you have any
insight as to the possible reproducer application or workload? Is this
reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
won't fire again regardless until after a reboot.
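For context, the check that fires at fs/xfs/xfs_aops.c:1232 boils down to
something like the sketch below (paraphrased from the v4.4-era source, not
quoted verbatim): the release path scans the page's buffer_heads and warns,
once per boot, if any of them still carry delalloc or unwritten state.

	STATIC int
	xfs_vm_releasepage(
		struct page		*page,
		gfp_t			gfp_mask)
	{
		int			delalloc, unwritten;

		trace_xfs_releasepage(page->mapping->host, page, 0, 0);

		/* Scan the page's buffer_heads for leftover state. */
		xfs_count_page_state(page, &delalloc, &unwritten);

		/* A page must not reach reclaim with delalloc or
		 * unwritten buffers still attached; refuse to release
		 * it and warn once per boot. */
		if (WARN_ON_ONCE(delalloc))
			return 0;
		if (WARN_ON_ONCE(unwritten))
			return 0;

		return try_to_free_buffers(page);
	}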
Brian
> [67674.907736] ------------[ cut here ]------------
> [67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
> xfs_vm_releasepage+0xa9/0xe0()
> [67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
> nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
> x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
> sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
> button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
> ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
> scsi_transport_sas pps_core
> [67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
> [67675.277120] Hardware name: Supermicro
> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
> [67675.335176] ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
> 0000000000000001
> [67675.392983] 0000000000000000 ffff88007950fad8 ffffffffa3083587
> ffff88007950fae8
> [67675.449743] 0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
> ffffea00208834a0
> [67675.506112] Call Trace:
> [67675.561285] [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
> [67675.619364] [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
> [67675.675719] [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
> [67675.731113] [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
> [67675.786116] [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
> [67675.844216] [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
> [67675.903862] [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
> [67675.957625] [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
> [67676.011497] [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
> [67676.064980] [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
> [67676.118828] [<ffffffffa3166659>] kswapd+0x4f9/0x930
> [67676.172075] [<ffffffffa3166160>] ?
> mem_cgroup_shrink_node_zone+0x150/0x150
> [67676.225139] [<ffffffffa30a08c9>] kthread+0xc9/0xe0
> [67676.277539] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
> [67676.330124] [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
> [67676.381816] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
> [67676.433499] ---[ end trace cb1827fe308f7f6b ]---
>
> Greets Stefan
>
* Re: xfs trace in 4.4.2
2016-02-20 14:45 ` Brian Foster
@ 2016-02-20 18:02 ` Stefan Priebe - Profihost AG
2016-03-04 18:47 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe
0 siblings, 1 reply; 22+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-02-20 18:02 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs@oss.sgi.com, linux-fsdevel, xfs-masters@oss.sgi.com
> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
>
>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>> Hi,
>>
>> got this one today. Not sure if this is a bug.
>
> That looks like the releasepage() delayed allocation block warning. I'm
> not sure we've had any fixes for (or reports of) that issue since the
> v4.2 timeframe.
>
> What is the xfs_info of the associated filesystem? Also, do you have any
> insight as to the possible reproducer application or workload? Is this
> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
> won't fire again regardless until after a reboot.
Sorry, no reproducer and also no xfs_info, as I didn't know which fs this was.
But the running job does:
mount /dev/loop0p3 /mpt
xfs_repair -n /mpt
umount /mpt
Stefan
>
> Brian
>
>> [67674.907736] ------------[ cut here ]------------
>> [67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
>> xfs_vm_releasepage+0xa9/0xe0()
>> [67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
>> nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
>> x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
>> sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
>> button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
>> ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
>> scsi_transport_sas pps_core
>> [67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
>> [67675.277120] Hardware name: Supermicro
>> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
>> [67675.335176] ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
>> 0000000000000001
>> [67675.392983] 0000000000000000 ffff88007950fad8 ffffffffa3083587
>> ffff88007950fae8
>> [67675.449743] 0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
>> ffffea00208834a0
>> [67675.506112] Call Trace:
>> [67675.561285] [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
>> [67675.619364] [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
>> [67675.675719] [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
>> [67675.731113] [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
>> [67675.786116] [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
>> [67675.844216] [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
>> [67675.903862] [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
>> [67675.957625] [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
>> [67676.011497] [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
>> [67676.064980] [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
>> [67676.118828] [<ffffffffa3166659>] kswapd+0x4f9/0x930
>> [67676.172075] [<ffffffffa3166160>] ?
>> mem_cgroup_shrink_node_zone+0x150/0x150
>> [67676.225139] [<ffffffffa30a08c9>] kthread+0xc9/0xe0
>> [67676.277539] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>> [67676.330124] [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
>> [67676.381816] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>> [67676.433499] ---[ end trace cb1827fe308f7f6b ]---
>>
>> Greets Stefan
>>
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-02-20 18:02 ` Stefan Priebe - Profihost AG
@ 2016-03-04 18:47 ` Stefan Priebe
2016-03-04 19:13 ` Brian Foster
0 siblings, 1 reply; 22+ messages in thread
From: Stefan Priebe @ 2016-03-04 18:47 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs@oss.sgi.com, linux-fsdevel, xfs-masters@oss.sgi.com
On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
>
>> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
>>
>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>> Hi,
>>>
>>> got this one today. Not sure if this is a bug.
>>
>> That looks like the releasepage() delayed allocation block warning. I'm
>> not sure we've had any fixes for (or reports of) that issue since the
>> v4.2 timeframe.
>>
>> What is the xfs_info of the associated filesystem? Also, do you have any
>> insight as to the possible reproducer application or workload? Is this
>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>> won't fire again regardless until after a reboot.
Today I got this one running 4.3.3.
[154152.949610] ------------[ cut here ]------------
[154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
xfs_vm_releasepage+0xc3/0xf0()
[154152.952596] Modules linked in: netconsole mpt3sas raid_class
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp
ipt_REJECT nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables
x_tables 8021q garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si
usbcore edac_core ipmi_msghandler i2c_i801 usb_common button btrfs xor
raid6_pq sg igb sd_mod i2c_algo_bit isci i2c_core libsas ahci ptp
libahci scsi_transport_sas megaraid_sas pps_core
[154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
[154152.964625] Hardware name: Supermicro
X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
03/06/2012
[154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
0000000000000000
[154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
0000000000000000
[154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
ffffea0001e7bfe0
[154152.972447] Call Trace:
[154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
[154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
[154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
[154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
[154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
[154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
[154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
[154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
[154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970
[154152.984380] [<ffffffffa7166ea0>] ?
mem_cgroup_shrink_node_zone+0x1a0/0x1a0
[154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
[154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
[154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
[154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
[154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
This time with an xfs info:
# xfs_info /
meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac
isize=256 agcount=4, agsize=58224256 blks
= sectsz=512 attr=2, projid32bit=0
= crc=0 finobt=0
data = bsize=4096 blocks=232897024, imaxpct=25
= sunit=64 swidth=384 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal bsize=4096 blocks=113728, version=2
= sectsz=512 sunit=64 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
>
>>
>> Brian
>>
>>> [67674.907736] ------------[ cut here ]------------
>>> [67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
>>> xfs_vm_releasepage+0xa9/0xe0()
>>> [67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
>>> nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
>>> x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
>>> sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
>>> button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
>>> ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
>>> scsi_transport_sas pps_core
>>> [67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
>>> [67675.277120] Hardware name: Supermicro
>>> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
>>> [67675.335176] ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
>>> 0000000000000001
>>> [67675.392983] 0000000000000000 ffff88007950fad8 ffffffffa3083587
>>> ffff88007950fae8
>>> [67675.449743] 0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
>>> ffffea00208834a0
>>> [67675.506112] Call Trace:
>>> [67675.561285] [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
>>> [67675.619364] [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
>>> [67675.675719] [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
>>> [67675.731113] [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
>>> [67675.786116] [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
>>> [67675.844216] [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
>>> [67675.903862] [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
>>> [67675.957625] [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
>>> [67676.011497] [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
>>> [67676.064980] [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
>>> [67676.118828] [<ffffffffa3166659>] kswapd+0x4f9/0x930
>>> [67676.172075] [<ffffffffa3166160>] ?
>>> mem_cgroup_shrink_node_zone+0x150/0x150
>>> [67676.225139] [<ffffffffa30a08c9>] kthread+0xc9/0xe0
>>> [67676.277539] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>>> [67676.330124] [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
>>> [67676.381816] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>>> [67676.433499] ---[ end trace cb1827fe308f7f6b ]---
>>>
>>> Greets Stefan
>>>
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-04 18:47 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe
@ 2016-03-04 19:13 ` Brian Foster
2016-03-04 20:02 ` Stefan Priebe
0 siblings, 1 reply; 22+ messages in thread
From: Brian Foster @ 2016-03-04 19:13 UTC (permalink / raw)
To: Stefan Priebe; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
> >
> >>On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
> >>
> >>>On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >>>Hi,
> >>>
> >>>got this one today. Not sure if this is a bug.
> >>
> >>That looks like the releasepage() delayed allocation block warning. I'm
> >>not sure we've had any fixes for (or reports of) that issue since the
> >>v4.2 timeframe.
> >>
> >>What is the xfs_info of the associated filesystem? Also, do you have any
> >>insight as to the possible reproducer application or workload? Is this
> >>reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
> >>won't fire again regardless until after a reboot.
>
> Today I got this one running 4.3.3.
>
> [154152.949610] ------------[ cut here ]------------
> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
> xfs_vm_releasepage+0xc3/0xf0()
> [154152.952596] Modules linked in: netconsole mpt3sas raid_class
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
> megaraid_sas pps_core
> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
> [154152.964625] Hardware name: Supermicro
> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
> 03/06/2012
> [154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
> 0000000000000000
> [154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
> 0000000000000000
> [154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
> ffffea0001e7bfe0
> [154152.972447] Call Trace:
> [154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
> [154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
> [154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
> [154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
> [154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
> [154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
> [154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
> [154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
> [154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970
> [154152.984380] [<ffffffffa7166ea0>] ?
> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
> [154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
> [154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
> [154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
> [154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
>
> This time with an xfs info:
> # xfs_info /
> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
> agcount=4, agsize=58224256 blks
> = sectsz=512 attr=2, projid32bit=0
> = crc=0 finobt=0
> data = bsize=4096 blocks=232897024, imaxpct=25
> = sunit=64 swidth=384 blks
> naming =version 2 bsize=4096 ascii-ci=0 ftype=0
> log =internal bsize=4096 blocks=113728, version=2
> = sectsz=512 sunit=64 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
Can you describe the workload to the filesystem?
Brian
> >
> >>
> >>Brian
> >>
> >>>[67674.907736] ------------[ cut here ]------------
> >>>[67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
> >>>xfs_vm_releasepage+0xa9/0xe0()
> >>>[67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
> >>>nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
> >>>x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
> >>>sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
> >>>button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
> >>>ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
> >>>scsi_transport_sas pps_core
> >>>[67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
> >>>[67675.277120] Hardware name: Supermicro
> >>>X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
> >>>[67675.335176] ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
> >>>0000000000000001
> >>>[67675.392983] 0000000000000000 ffff88007950fad8 ffffffffa3083587
> >>>ffff88007950fae8
> >>>[67675.449743] 0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
> >>>ffffea00208834a0
> >>>[67675.506112] Call Trace:
> >>>[67675.561285] [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
> >>>[67675.619364] [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
> >>>[67675.675719] [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
> >>>[67675.731113] [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
> >>>[67675.786116] [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
> >>>[67675.844216] [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
> >>>[67675.903862] [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
> >>>[67675.957625] [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
> >>>[67676.011497] [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
> >>>[67676.064980] [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
> >>>[67676.118828] [<ffffffffa3166659>] kswapd+0x4f9/0x930
> >>>[67676.172075] [<ffffffffa3166160>] ?
> >>>mem_cgroup_shrink_node_zone+0x150/0x150
> >>>[67676.225139] [<ffffffffa30a08c9>] kthread+0xc9/0xe0
> >>>[67676.277539] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
> >>>[67676.330124] [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
> >>>[67676.381816] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
> >>>[67676.433499] ---[ end trace cb1827fe308f7f6b ]---
> >>>
> >>>Greets Stefan
> >>>
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-04 19:13 ` Brian Foster
@ 2016-03-04 20:02 ` Stefan Priebe
2016-03-04 21:03 ` Brian Foster
0 siblings, 1 reply; 22+ messages in thread
From: Stefan Priebe @ 2016-03-04 20:02 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On 04.03.2016 at 20:13, Brian Foster wrote:
> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>> On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
>>>
>>>> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
>>>>
>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>> Hi,
>>>>>
>>>>> got this one today. Not sure if this is a bug.
>>>>
>>>> That looks like the releasepage() delayed allocation block warning. I'm
>>>> not sure we've had any fixes for (or reports of) that issue since the
>>>> v4.2 timeframe.
>>>>
>>>> What is the xfs_info of the associated filesystem? Also, do you have any
>>>> insight as to the possible reproducer application or workload? Is this
>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>>>> won't fire again regardless until after a reboot.
>>
>> Today I got this one running 4.3.3.
>>
>> [154152.949610] ------------[ cut here ]------------
>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
>> xfs_vm_releasepage+0xc3/0xf0()
>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class
>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
>> megaraid_sas pps_core
>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
>> [154152.964625] Hardware name: Supermicro
>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
>> 03/06/2012
>> [154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
>> 0000000000000000
>> [154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
>> 0000000000000000
>> [154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
>> ffffea0001e7bfe0
>> [154152.972447] Call Trace:
>> [154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
>> [154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
>> [154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
>> [154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
>> [154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
>> [154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
>> [154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
>> [154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
>> [154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970
>> [154152.984380] [<ffffffffa7166ea0>] ?
>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>> [154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
>> [154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>> [154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
>> [154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
>>
>> This time with an xfs info:
>> # xfs_info /
>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
>> agcount=4, agsize=58224256 blks
>> = sectsz=512 attr=2, projid32bit=0
>> = crc=0 finobt=0
>> data = bsize=4096 blocks=232897024, imaxpct=25
>> = sunit=64 swidth=384 blks
>> naming =version 2 bsize=4096 ascii-ci=0 ftype=0
>> log =internal bsize=4096 blocks=113728, version=2
>> = sectsz=512 sunit=64 blks, lazy-count=1
>> realtime =none extsz=4096 blocks=0, rtextents=0
>>
>
> Can you describe the workload to the filesystem?
At the time of this trace the rsync backup of the fs had started, so the
workload went from nearly idle to a peak of 4000 IOPS read at 60 MB/s.
Stefan
> Brian
>
>>>
>>>>
>>>> Brian
>>>>
>>>>> [67674.907736] ------------[ cut here ]------------
>>>>> [67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
>>>>> xfs_vm_releasepage+0xa9/0xe0()
>>>>> [67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
>>>>> nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
>>>>> x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
>>>>> sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
>>>>> button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
>>>>> ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
>>>>> scsi_transport_sas pps_core
>>>>> [67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
>>>>> [67675.277120] Hardware name: Supermicro
>>>>> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
>>>>> [67675.335176] ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
>>>>> 0000000000000001
>>>>> [67675.392983] 0000000000000000 ffff88007950fad8 ffffffffa3083587
>>>>> ffff88007950fae8
>>>>> [67675.449743] 0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
>>>>> ffffea00208834a0
>>>>> [67675.506112] Call Trace:
>>>>> [67675.561285] [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
>>>>> [67675.619364] [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
>>>>> [67675.675719] [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
>>>>> [67675.731113] [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
>>>>> [67675.786116] [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
>>>>> [67675.844216] [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
>>>>> [67675.903862] [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
>>>>> [67675.957625] [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
>>>>> [67676.011497] [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
>>>>> [67676.064980] [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
>>>>> [67676.118828] [<ffffffffa3166659>] kswapd+0x4f9/0x930
>>>>> [67676.172075] [<ffffffffa3166160>] ?
>>>>> mem_cgroup_shrink_node_zone+0x150/0x150
>>>>> [67676.225139] [<ffffffffa30a08c9>] kthread+0xc9/0xe0
>>>>> [67676.277539] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>>>>> [67676.330124] [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
>>>>> [67676.381816] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>>>>> [67676.433499] ---[ end trace cb1827fe308f7f6b ]---
>>>>>
>>>>> Greets Stefan
>>>>>
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-04 20:02 ` Stefan Priebe
@ 2016-03-04 21:03 ` Brian Foster
2016-03-04 21:15 ` Stefan Priebe
2016-03-05 22:48 ` Dave Chinner
0 siblings, 2 replies; 22+ messages in thread
From: Brian Foster @ 2016-03-04 21:03 UTC (permalink / raw)
To: Stefan Priebe; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>
> On 04.03.2016 at 20:13, Brian Foster wrote:
> >On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
> >>>
> >>>>On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
> >>>>
> >>>>>On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >>>>>Hi,
> >>>>>
> >>>>>got this one today. Not sure if this is a bug.
> >>>>
> >>>>That looks like the releasepage() delayed allocation block warning. I'm
> >>>>not sure we've had any fixes for (or reports of) that issue since the
> >>>>v4.2 timeframe.
> >>>>
> >>>>What is the xfs_info of the associated filesystem? Also, do you have any
> >>>>insight as to the possible reproducer application or workload? Is this
> >>>>reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
> >>>>won't fire again regardless until after a reboot.
> >>
> >>Today I got this one running 4.3.3.
> >>
> >>[154152.949610] ------------[ cut here ]------------
> >>[154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
> >>xfs_vm_releasepage+0xc3/0xf0()
> >>[154152.952596] Modules linked in: netconsole mpt3sas raid_class
> >>nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
> >>nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
> >>garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
> >>ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
> >>i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
> >>megaraid_sas pps_core
> >>[154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
> >>[154152.964625] Hardware name: Supermicro
> >>X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
> >>03/06/2012
> >>[154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
> >>0000000000000000
> >>[154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
> >>0000000000000000
> >>[154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
> >>ffffea0001e7bfe0
> >>[154152.972447] Call Trace:
> >>[154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
> >>[154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
> >>[154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
> >>[154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
> >>[154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
> >>[154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
> >>[154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
> >>[154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
> >>[154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970
> >>[154152.984380] [<ffffffffa7166ea0>] ?
> >>mem_cgroup_shrink_node_zone+0x1a0/0x1a0
> >>[154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
> >>[154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
> >>[154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
> >>[154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
> >>[154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
> >>
> >>This time with an xfs info:
> >># xfs_info /
> >>meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
> >>agcount=4, agsize=58224256 blks
> >> = sectsz=512 attr=2, projid32bit=0
> >> = crc=0 finobt=0
> >>data = bsize=4096 blocks=232897024, imaxpct=25
> >> = sunit=64 swidth=384 blks
> >>naming =version 2 bsize=4096 ascii-ci=0 ftype=0
> >>log =internal bsize=4096 blocks=113728, version=2
> >> = sectsz=512 sunit=64 blks, lazy-count=1
> >>realtime =none extsz=4096 blocks=0, rtextents=0
> >>
> >
> >Can you describe the workload to the filesystem?
>
> At the time of this trace the rsync backup of the fs had started, so the
> workload went from nearly idle to a peak of 4000 IOPS read at 60 MB/s.
>
Interesting. The warning is associated with releasing a page that has a
delayed allocation when it shouldn't. That means something had written
to a file to cause the delalloc in the first place. Any idea what could
have been writing at the time or shortly before the rsync read workload
had kicked in?
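To make "a page that has a delayed allocation" concrete: the warning fires
when a buffer_head attached to the page still has its delay bit set, i.e.
space was reserved for it at write() time but never converted to a real
extent. The scan looks roughly like this (a sketch along the lines of
xfs_count_page_state() from that era, not the verbatim source):

	void
	xfs_count_page_state(
		struct page		*page,
		int			*delalloc,
		int			*unwritten)
	{
		struct buffer_head	*bh, *head;

		*delalloc = *unwritten = 0;

		/* Walk the circular list of buffer_heads on the page. */
		bh = head = page_buffers(page);
		do {
			if (buffer_unwritten(bh))
				(*unwritten) = 1;	/* allocated, not yet written */
			else if (buffer_delay(bh))
				(*delalloc) = 1;	/* reservation never converted */
		} while ((bh = bh->b_this_page) != head);
	}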
Brian
> Stefan
>
> >Brian
> >
> >>>
> >>>>
> >>>>Brian
> >>>>
> >>>>>[67674.907736] ------------[ cut here ]------------
> >>>>>[67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
> >>>>>xfs_vm_releasepage+0xa9/0xe0()
> >>>>>[67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
> >>>>>nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
> >>>>>x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
> >>>>>sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
> >>>>>button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
> >>>>>ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
> >>>>>scsi_transport_sas pps_core
> >>>>>[67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
> >>>>>[67675.277120] Hardware name: Supermicro
> >>>>>X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
> >>>>>[67675.335176] ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
> >>>>>0000000000000001
> >>>>>[67675.392983] 0000000000000000 ffff88007950fad8 ffffffffa3083587
> >>>>>ffff88007950fae8
> >>>>>[67675.449743] 0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
> >>>>>ffffea00208834a0
> >>>>>[67675.506112] Call Trace:
> >>>>>[67675.561285] [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
> >>>>>[67675.619364] [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
> >>>>>[67675.675719] [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
> >>>>>[67675.731113] [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
> >>>>>[67675.786116] [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
> >>>>>[67675.844216] [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
> >>>>>[67675.903862] [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
> >>>>>[67675.957625] [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
> >>>>>[67676.011497] [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
> >>>>>[67676.064980] [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
> >>>>>[67676.118828] [<ffffffffa3166659>] kswapd+0x4f9/0x930
> >>>>>[67676.172075] [<ffffffffa3166160>] ?
> >>>>>mem_cgroup_shrink_node_zone+0x150/0x150
> >>>>>[67676.225139] [<ffffffffa30a08c9>] kthread+0xc9/0xe0
> >>>>>[67676.277539] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
> >>>>>[67676.330124] [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
> >>>>>[67676.381816] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
> >>>>>[67676.433499] ---[ end trace cb1827fe308f7f6b ]---
> >>>>>
> >>>>>Greets Stefan
> >>>>>
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-04 21:03 ` Brian Foster
@ 2016-03-04 21:15 ` Stefan Priebe
2016-03-05 22:48 ` Dave Chinner
1 sibling, 0 replies; 22+ messages in thread
From: Stefan Priebe @ 2016-03-04 21:15 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On 04.03.2016 at 22:03, Brian Foster wrote:
> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>
>> On 04.03.2016 at 20:13, Brian Foster wrote:
>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>> On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
>>>>>
>>>>>> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
>>>>>>
>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> got this one today. Not sure if this is a bug.
>>>>>>
>>>>>> That looks like the releasepage() delayed allocation block warning. I'm
>>>>>> not sure we've had any fixes for (or reports of) that issue since the
>>>>>> v4.2 timeframe.
>>>>>>
>>>>>> What is the xfs_info of the associated filesystem? Also, do you have any
>>>>>> insight as to the possible reproducer application or workload? Is this
>>>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>>>>>> won't fire again regardless until after a reboot.
>>>>
>>>> Today I got this one running 4.3.3.
>>>>
>>>> [154152.949610] ------------[ cut here ]------------
>>>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
>>>> xfs_vm_releasepage+0xc3/0xf0()
>>>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class
>>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
>>>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
>>>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
>>>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
>>>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
>>>> megaraid_sas pps_core
>>>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
>>>> [154152.964625] Hardware name: Supermicro
>>>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
>>>> 03/06/2012
>>>> [154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
>>>> 0000000000000000
>>>> [154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
>>>> 0000000000000000
>>>> [154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
>>>> ffffea0001e7bfe0
>>>> [154152.972447] Call Trace:
>>>> [154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
>>>> [154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
>>>> [154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
>>>> [154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
>>>> [154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
>>>> [154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
>>>> [154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
>>>> [154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
>>>> [154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970
>>>> [154152.984380] [<ffffffffa7166ea0>] ?
>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>>>> [154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
>>>> [154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>> [154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
>>>> [154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
>>>>
>>>> This time with an xfs info:
>>>> # xfs_info /
>>>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
>>>> agcount=4, agsize=58224256 blks
>>>> = sectsz=512 attr=2, projid32bit=0
>>>> = crc=0 finobt=0
>>>> data = bsize=4096 blocks=232897024, imaxpct=25
>>>> = sunit=64 swidth=384 blks
>>>> naming =version 2 bsize=4096 ascii-ci=0 ftype=0
>>>> log =internal bsize=4096 blocks=113728, version=2
>>>> = sectsz=512 sunit=64 blks, lazy-count=1
>>>> realtime =none extsz=4096 blocks=0, rtextents=0
>>>>
>>>
>>> Can you describe the workload to the filesystem?
>>
>> At the time of this trace the rsync backup of the fs had started, so the
>> workload went from nearly idle to a peak of 4000 IOPS read at 60 MB/s.
>>
>
> Interesting. The warning is associated with releasing a page that has a
> delayed allocation when it shouldn't. That means something had written
> to a file to cause the delalloc in the first place. Any idea what could
> have been writing at the time or shortly before the rsync read workload
> had kicked in?
The system itself is a LAMP system, so PHP and MySQL are running and may
write data to files, but at the time the trace happened the system was
nearly idle, though not completely. It was 3am.
Stefan
>
> Brian
>
>> Stefan
>>
>>> Brian
>>>
>>>>>
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>>> [67674.907736] ------------[ cut here ]------------
>>>>>>> [67674.955858] WARNING: CPU: 5 PID: 197 at fs/xfs/xfs_aops.c:1232
>>>>>>> xfs_vm_releasepage+0xa9/0xe0()
>>>>>>> [67675.005345] Modules linked in: dm_mod netconsole ipt_REJECT
>>>>>>> nf_reject_ipv4 mpt3sas raid_class xt_multiport iptable_filter ip_tables
>>>>>>> x_tables 8021q garp bonding coretemp loop usbhid ehci_pci ehci_hcd
>>>>>>> sb_edac ipmi_si usbcore i2c_i801 edac_core usb_common ipmi_msghandler
>>>>>>> button btrfs xor raid6_pq raid1 md_mod sg igb sd_mod i2c_algo_bit
>>>>>>> ixgbe ahci i2c_core mdio isci libahci libsas ptp megaraid_sas
>>>>>>> scsi_transport_sas pps_core
>>>>>>> [67675.221939] CPU: 5 PID: 197 Comm: kswapd0 Not tainted 4.4.2+1-ph #1
>>>>>>> [67675.277120] Hardware name: Supermicro
>>>>>>> X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.2 03/04/2015
>>>>>>> [67675.335176] ffffffffa3a5065d ffff88007950fa98 ffffffffa33bd4e1
>>>>>>> 0000000000000001
>>>>>>> [67675.392983] 0000000000000000 ffff88007950fad8 ffffffffa3083587
>>>>>>> ffff88007950fae8
>>>>>>> [67675.449743] 0000000000000001 ffffea0020883480 ffff880cf4b9cdd0
>>>>>>> ffffea00208834a0
>>>>>>> [67675.506112] Call Trace:
>>>>>>> [67675.561285] [<ffffffffa33bd4e1>] dump_stack+0x45/0x64
>>>>>>> [67675.619364] [<ffffffffa3083587>] warn_slowpath_common+0x97/0xe0
>>>>>>> [67675.675719] [<ffffffffa30835ea>] warn_slowpath_null+0x1a/0x20
>>>>>>> [67675.731113] [<ffffffffa3320a89>] xfs_vm_releasepage+0xa9/0xe0
>>>>>>> [67675.786116] [<ffffffffa318a4b0>] ? page_mkclean_one+0xd0/0xd0
>>>>>>> [67675.844216] [<ffffffffa318b1d0>] ? anon_vma_prepare+0x150/0x150
>>>>>>> [67675.903862] [<ffffffffa31506c2>] try_to_release_page+0x32/0x50
>>>>>>> [67675.957625] [<ffffffffa3164d3e>] shrink_active_list+0x3ce/0x3e0
>>>>>>> [67676.011497] [<ffffffffa31653d7>] shrink_lruvec+0x687/0x7d0
>>>>>>> [67676.064980] [<ffffffffa31655fc>] shrink_zone+0xdc/0x2c0
>>>>>>> [67676.118828] [<ffffffffa3166659>] kswapd+0x4f9/0x930
>>>>>>> [67676.172075] [<ffffffffa3166160>] ?
>>>>>>> mem_cgroup_shrink_node_zone+0x150/0x150
>>>>>>> [67676.225139] [<ffffffffa30a08c9>] kthread+0xc9/0xe0
>>>>>>> [67676.277539] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>>>>>>> [67676.330124] [<ffffffffa36a8c8f>] ret_from_fork+0x3f/0x70
>>>>>>> [67676.381816] [<ffffffffa30a0800>] ? kthread_stop+0xe0/0xe0
>>>>>>> [67676.433499] ---[ end trace cb1827fe308f7f6b ]---
>>>>>>>
>>>>>>> Greets Stefan
>>>>>>>
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-04 21:03 ` Brian Foster
2016-03-04 21:15 ` Stefan Priebe
@ 2016-03-05 22:48 ` Dave Chinner
2016-03-05 22:58 ` Stefan Priebe
` (2 more replies)
1 sibling, 3 replies; 22+ messages in thread
From: Dave Chinner @ 2016-03-05 22:48 UTC (permalink / raw)
To: Brian Foster
Cc: Stefan Priebe, linux-fsdevel, xfs-masters@oss.sgi.com,
xfs@oss.sgi.com
On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> > On 04.03.2016 at 20:13, Brian Foster wrote:
> > >On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> > >>On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
> > >>>
> > >>>>On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
> > >>>>
> > >>>>>On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> > >>>>>Hi,
> > >>>>>
> > >>>>>got this one today. Not sure if this is a bug.
> > >>>>
> > >>>>That looks like the releasepage() delayed allocation block warning. I'm
> > >>>>not sure we've had any fixes for (or reports of) that issue since the
> > >>>>v4.2 timeframe.
> > >>>>
> > >>>>What is the xfs_info of the associated filesystem? Also, do you have any
> > >>>>insight as to the possible reproducer application or workload? Is this
> > >>>>reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
> > >>>>won't fire again regardless until after a reboot.
> > >>
> > >>Today I got this one running 4.3.3.
> > >>
> > >>[154152.949610] ------------[ cut here ]------------
> > >>[154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
> > >>xfs_vm_releasepage+0xc3/0xf0()
> > >>[154152.952596] Modules linked in: netconsole mpt3sas raid_class
> > >>nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
> > >>nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
> > >>garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
> > >>ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
> > >>i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
> > >>megaraid_sas pps_core
> > >>[154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
> > >>[154152.964625] Hardware name: Supermicro
> > >>X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
> > >>03/06/2012
> > >>[154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
> > >>0000000000000000
> > >>[154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
> > >>0000000000000000
> > >>[154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
> > >>ffffea0001e7bfe0
> > >>[154152.972447] Call Trace:
> > >>[154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
> > >>[154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
> > >>[154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
> > >>[154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
> > >>[154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
> > >>[154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
> > >>[154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
> > >>[154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
> > >>[154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970
> > >>[154152.984380] [<ffffffffa7166ea0>] ?
> > >>mem_cgroup_shrink_node_zone+0x1a0/0x1a0
> > >>[154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
> > >>[154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
> > >>[154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
> > >>[154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
> > >>[154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
> > >>
> > >>This time with an xfs info:
> > >># xfs_info /
> > >>meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
> > >>agcount=4, agsize=58224256 blks
> > >> = sectsz=512 attr=2, projid32bit=0
> > >> = crc=0 finobt=0
> > >>data = bsize=4096 blocks=232897024, imaxpct=25
> > >> = sunit=64 swidth=384 blks
> > >>naming =version 2 bsize=4096 ascii-ci=0 ftype=0
> > >>log =internal bsize=4096 blocks=113728, version=2
> > >> = sectsz=512 sunit=64 blks, lazy-count=1
> > >>realtime =none extsz=4096 blocks=0, rtextents=0
> > >>
> > >
> > >Can you describe the workload to the filesystem?
> >
> > At the time of this trace the rsync backup of the fs had started, so the
> > workload went from nearly idle to a peak of 4000 IOPS read at 60 MB/s.
> >
>
> Interesting. The warning is associated with releasing a page that has a
> delayed allocation when it shouldn't. That means something had written
> to a file to cause the delalloc in the first place. Any idea what could
> have been writing at the time or shortly before the rsync read workload
> had kicked in?
It's memory reclaim that tripped over it, so the cause is long gone -
it could have been anything in the previous 24 hours that caused the
issue. I.e. rsync has triggered memory reclaim, which triggered the
warning, but I don't think rsync has anything to do with putting the
page into the state that caused the warning.
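For reference, reclaim only gets there via roughly the following path (an
abridged sketch of try_to_release_page() from mm/filemap.c of that era, not
verbatim), so the page can sit in the bad state for a long time before
kswapd happens to pick it up:

	int try_to_release_page(struct page *page, gfp_t gfp_mask)
	{
		struct address_space * const mapping = page->mapping;

		BUG_ON(!PageLocked(page));
		if (PageWriteback(page))
			return 0;

		/* Hand the page to the filesystem's releasepage hook --
		 * xfs_vm_releasepage() here, where the WARN_ON_ONCE() fires. */
		if (mapping && mapping->a_ops->releasepage)
			return mapping->a_ops->releasepage(page, gfp_mask);
		return try_to_free_buffers(page);
	}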
I'd be interested to know if there are any other warnings in the
logs - stuff like IO errors, page discards, ENOSPC issues, etc. that
could trigger less-travelled write error paths...
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-05 22:48 ` Dave Chinner
@ 2016-03-05 22:58 ` Stefan Priebe
2016-03-23 13:26 ` Stefan Priebe - Profihost AG
2016-03-23 13:28 ` Stefan Priebe - Profihost AG
2 siblings, 0 replies; 22+ messages in thread
From: Stefan Priebe @ 2016-03-05 22:58 UTC (permalink / raw)
To: Dave Chinner, Brian Foster
Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On 05.03.2016 at 23:48, Dave Chinner wrote:
> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>> On 04.03.2016 at 20:13, Brian Foster wrote:
>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>> On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>>> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
>>>>>>>
>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> got this one today. Not sure if this is a bug.
>>>>>>>
>>>>>>> That looks like the releasepage() delayed allocation block warning. I'm
>>>>>>> not sure we've had any fixes for (or reports of) that issue since the
>>>>>>> v4.2 timeframe.
>>>>>>>
>>>>>>> What is the xfs_info of the associated filesystem? Also, do you have any
>>>>>>> insight as to the possible reproducer application or workload? Is this
>>>>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>>>>>>> won't fire again regardless until after a reboot.
>>>>>
>>>>> Today I got this one running 4.3.3.
>>>>>
>>>>> [154152.949610] ------------[ cut here ]------------
>>>>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
>>>>> xfs_vm_releasepage+0xc3/0xf0()
>>>>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class
>>>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
>>>>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
>>>>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
>>>>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
>>>>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
>>>>> megaraid_sas pps_core
>>>>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
>>>>> [154152.964625] Hardware name: Supermicro
>>>>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
>>>>> 03/06/2012
>>>>> [154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
>>>>> 0000000000000000
>>>>> [154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
>>>>> 0000000000000000
>>>>> [154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
>>>>> ffffea0001e7bfe0
>>>>> [154152.972447] Call Trace:
>>>>> [154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
>>>>> [154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
>>>>> [154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
>>>>> [154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
>>>>> [154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
>>>>> [154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
>>>>> [154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
>>>>> [154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
>>>>> [154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970
>>>>> [154152.984380] [<ffffffffa7166ea0>] ?
>>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>>>>> [154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
>>>>> [154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>>> [154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
>>>>> [154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
>>>>>
>>>>> This time with an xfs info:
>>>>> # xfs_info /
>>>>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
>>>>> agcount=4, agsize=58224256 blks
>>>>> = sectsz=512 attr=2, projid32bit=0
>>>>> = crc=0 finobt=0
>>>>> data = bsize=4096 blocks=232897024, imaxpct=25
>>>>> = sunit=64 swidth=384 blks
>>>>> naming =version 2 bsize=4096 ascii-ci=0 ftype=0
>>>>> log =internal bsize=4096 blocks=113728, version=2
>>>>> = sectsz=512 sunit=64 blks, lazy-count=1
>>>>> realtime =none extsz=4096 blocks=0, rtextents=0
>>>>>
>>>>
>>>> Can you describe the workload to the filesystem?
>>>
>>> At the time of this trace the rsync backup of the fs had started, so the
>>> workload went from nearly idle to a peak of 4000 IOPS read at 60 MB/s.
>>>
>>
>> Interesting. The warning is associated with releasing a page that has a
>> delayed allocation when it shouldn't. That means something had written
>> to a file to cause the delalloc in the first place. Any idea what could
>> have been writing at the time or shortly before the rsync read workload
>> had kicked in?
>
> It's memory reclaim that tripped over it, so the cause is long gone -
> it could have been anything in the previous 24 hours that caused the
> issue. I.e. rsync has triggered memory reclaim, which triggered the
> warning, but I don't think rsync has anything to do with putting the
> page into the state that caused the warning.
>
> I'd be interested to know if there are any other warnings in the
> logs - stuff like IO errors, page discards, ENOSPC issues, etc. that
> could trigger less-travelled write error paths...
No, dmesg is absolutely clean. This hasn't happened with 4.1.18 before;
it started after the upgrade from 4.1 to 4.4.
Stefan
>
> -Dave.
>
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-05 22:48 ` Dave Chinner
2016-03-05 22:58 ` Stefan Priebe
@ 2016-03-23 13:26 ` Stefan Priebe - Profihost AG
2016-03-23 13:28 ` Stefan Priebe - Profihost AG
2 siblings, 0 replies; 22+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-03-23 13:26 UTC (permalink / raw)
To: Dave Chinner, Brian Foster
Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On 05.03.2016 at 23:48, Dave Chinner wrote:
> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>> On 04.03.2016 at 20:13, Brian Foster wrote:
>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>> On 20.02.2016 at 19:02, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>>> On 20.02.2016 at 15:45, Brian Foster <bfoster@redhat.com> wrote:
>>>>>>>
>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> got this one today. Not sure if this is a bug.
>>>>>>>
>>>>>>> That looks like the releasepage() delayed allocation block warning. I'm
>>>>>>> not sure we've had any fixes for (or reports of) that issue since the
>>>>>>> v4.2 timeframe.
>>>>>>>
>>>>>>> What is the xfs_info of the associated filesystem? Also, do you have any
>>>>>>> insight as to the possible reproducer application or workload? Is this
>>>>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>>>>>>> won't fire again regardless until after a reboot.
>>>>>
>>>>> Today I got this one running 4.3.3.
>>>>>
>>>>> [154152.949610] ------------[ cut here ]------------
>>>>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
>>>>> xfs_vm_releasepage+0xc3/0xf0()
>>>>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class
>>>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
>>>>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
>>>>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
>>>>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
>>>>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
>>>>> megaraid_sas pps_core
>>>>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
>>>>> [154152.964625] Hardware name: Supermicro
>>>>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
>>>>> 03/06/2012
>>>>> [154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
>>>>> 0000000000000000
>>>>> [154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
>>>>> 0000000000000000
>>>>> [154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
>>>>> ffffea0001e7bfe0
>>>>> [154152.972447] Call Trace:
>>>>> [154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
>>>>> [154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
>>>>> [154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
>>>>> [154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
>>>>> [154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
>>>>> [154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
>>>>> [154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
>>>>> [154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
>>>>> [154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970
>>>>> [154152.984380] [<ffffffffa7166ea0>] ?
>>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>>>>> [154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
>>>>> [154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>>> [154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
>>>>> [154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
>>>>>
>>>>> This time with an xfs info:
>>>>> # xfs_info /
>>>>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
>>>>> agcount=4, agsize=58224256 blks
>>>>> = sectsz=512 attr=2, projid32bit=0
>>>>> = crc=0 finobt=0
>>>>> data = bsize=4096 blocks=232897024, imaxpct=25
>>>>> = sunit=64 swidth=384 blks
>>>>> naming =version 2 bsize=4096 ascii-ci=0 ftype=0
>>>>> log =internal bsize=4096 blocks=113728, version=2
>>>>> = sectsz=512 sunit=64 blks, lazy-count=1
>>>>> realtime =none extsz=4096 blocks=0, rtextents=0
>>>>>
>>>>
>>>> Can you describe the workload to the filesystem?
>>>
>>> At the time of this trace the rsync backup of the fs has started. So the
>>> workload was going from nearly idle to 4000 iop/s read at 60 MB/s peak.
>>>
>>
>> Interesting. The warning is associated with releasing a page that has a
>> delayed allocation when it shouldn't. That means something had written
>> to a file to cause the delalloc in the first place. Any idea what could
>> have been writing at the time or shortly before the rsync read workload
>> had kicked in?
>
> It's memory reclaim that tripped over it, so the cause is long gone
> - it could have been anything in the previous 24 hours that caused the
> issue. i.e. rsync has triggered memory reclaim which triggered the
> warning, but I don't think rsync has anything to do with causing the
> page to be in a state that caused the warning.
>
> I'd be interested to know if there are any other warnings in the
> logs - stuff like IO errors, page discards, ENOSPC issues, etc that
> could trigger less travelled write error paths...
This has happened again on 8 different hosts in the last 24 hours
running 4.4.6.
All of those are KVM / Qemu hosts and are doing NO I/O except the normal
OS stuff as the VMs have remote storage. So no database, no rsync on
those hosts - just the OS doing nearly nothing.
All those show:
[153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
xfs_vm_releasepage+0xe2/0xf0()
Stefan
>
> -Dave.
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-05 22:48 ` Dave Chinner
2016-03-05 22:58 ` Stefan Priebe
2016-03-23 13:26 ` Stefan Priebe - Profihost AG
@ 2016-03-23 13:28 ` Stefan Priebe - Profihost AG
2016-03-23 14:07 ` Brian Foster
2 siblings, 1 reply; 22+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-03-23 13:28 UTC (permalink / raw)
To: Dave Chinner, Brian Foster
Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
Sorry, new one; the last one got mangled. Comments inside.
On 05.03.2016 at 23:48, Dave Chinner wrote:
> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>
>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>
>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> got this one today. Not sure if this is a bug.
>>>>>>>
>>>>>>> That looks like the releasepage() delayed allocation block warning. I'm
>>>>>>> not sure we've had any fixes for (or reports of) that issue since the
>>>>>>> v4.2 timeframe.
>>>>>>>
>>>>>>> What is the xfs_info of the associated filesystem? Also, do you have any
>>>>>>> insight as to the possible reproducer application or workload? Is this
>>>>>>> reproducible at all? Note that this is a WARN_ON_ONCE(), so the warning
>>>>>>> won't fire again regardless until after a reboot.
>>>>>
>>>>> Today I got this one running 4.3.3.
>>>>>
>>>>> [154152.949610] ------------[ cut here ]------------
>>>>> [154152.950704] WARNING: CPU: 0 PID: 79 at fs/xfs/xfs_aops.c:1232
>>>>> xfs_vm_releasepage+0xc3/0xf0()
>>>>> [154152.952596] Modules linked in: netconsole mpt3sas raid_class
>>>>> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT
>>>>> nf_reject_ipv4 xt_owner xt_multiport iptable_filter ip_tables x_tables 8021q
>>>>> garp coretemp k8temp ehci_pci ehci_hcd sb_edac ipmi_si usbcore edac_core
>>>>> ipmi_msghandler i2c_i801 usb_common button btrfs xor raid6_pq sg igb sd_mod
>>>>> i2c_algo_bit isci i2c_core libsas ahci ptp libahci scsi_transport_sas
>>>>> megaraid_sas pps_core
>>>>> [154152.963240] CPU: 0 PID: 79 Comm: kswapd0 Not tainted 4.4.3+3-ph #1
>>>>> [154152.964625] Hardware name: Supermicro
>>>>> X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 1.0a
>>>>> 03/06/2012
>>>>> [154152.967029] 0000000000000000 ffff88103dd67a98 ffffffffa73c3b5f
>>>>> 0000000000000000
>>>>> [154152.968836] ffffffffa7a5063b ffff88103dd67ad8 ffffffffa7083757
>>>>> 0000000000000000
>>>>> [154152.970641] 0000000000000001 ffffea0001e7bfc0 ffff88071ef72dd0
>>>>> ffffea0001e7bfe0
>>>>> [154152.972447] Call Trace:
>>>>> [154152.973011] [<ffffffffa73c3b5f>] dump_stack+0x63/0x84
>>>>> [154152.974167] [<ffffffffa7083757>] warn_slowpath_common+0x97/0xe0
>>>>> [154152.975515] [<ffffffffa70837ba>] warn_slowpath_null+0x1a/0x20
>>>>> [154152.976826] [<ffffffffa7324f23>] xfs_vm_releasepage+0xc3/0xf0
>>>>> [154152.978137] [<ffffffffa71510b2>] try_to_release_page+0x32/0x50
>>>>> [154152.979467] [<ffffffffa71659be>] shrink_active_list+0x3ce/0x3e0
>>>>> [154152.980816] [<ffffffffa7166057>] shrink_lruvec+0x687/0x7d0
>>>>> [154152.982068] [<ffffffffa716627c>] shrink_zone+0xdc/0x2c0
>>>>> [154152.983262] [<ffffffffa7167399>] kswapd+0x4f9/0x970
>>>>> [154152.984380] [<ffffffffa7166ea0>] ?
>>>>> mem_cgroup_shrink_node_zone+0x1a0/0x1a0
>>>>> [154152.985942] [<ffffffffa70a0ac9>] kthread+0xc9/0xe0
>>>>> [154152.987040] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>>> [154152.988313] [<ffffffffa76b03cf>] ret_from_fork+0x3f/0x70
>>>>> [154152.989527] [<ffffffffa70a0a00>] ? kthread_stop+0x100/0x100
>>>>> [154152.990818] ---[ end trace 3fac2515e92c7cb1 ]---
>>>>>
>>>>> This time with an xfs info:
>>>>> # xfs_info /
>>>>> meta-data=/dev/disk/by-uuid/9befe321-e9cc-4e31-82df-efabb3211bac isize=256
>>>>> agcount=4, agsize=58224256 blks
>>>>> = sectsz=512 attr=2, projid32bit=0
>>>>> = crc=0 finobt=0
>>>>> data = bsize=4096 blocks=232897024, imaxpct=25
>>>>> = sunit=64 swidth=384 blks
>>>>> naming =version 2 bsize=4096 ascii-ci=0 ftype=0
>>>>> log =internal bsize=4096 blocks=113728, version=2
>>>>> = sectsz=512 sunit=64 blks, lazy-count=1
>>>>> realtime =none extsz=4096 blocks=0, rtextents=0
>>>>>
>>>>
>>>> Can you describe the workload to the filesystem?
>>>
>>> At the time of this trace the rsync backup of the fs has started. So the
>>> workload was going from nearly idle to 4000 iop/s read at 60 MB/s peak.
>>>
>>
>> Interesting. The warning is associated with releasing a page that has a
>> delayed allocation when it shouldn't. That means something had written
>> to a file to cause the delalloc in the first place. Any idea what could
>> have been writing at the time or shortly before the rsync read workload
>> had kicked in?
>
> It's memory reclaim that tripped over it, so the cause is long gone
> - it could have been anything in the previous 24 hours that caused the
> issue. i.e. rsync has triggered memory reclaim which triggered the
> warning, but I don't think rsync has anything to do with causing the
> page to be in a state that caused the warning.
>
> I'd be interested to know if there are any other warnings in the
> logs - stuff like IO errors, page discards, ENOSPC issues, etc that
> could trigger less travelled write error paths...
This has happened again on 8 different hosts in the last 24 hours
running 4.4.6.
All of those are KVM / Qemu hosts and are doing NO I/O except the normal
OS stuff as the VMs have remote storage. So no database, no rsync on
those hosts - just the OS doing nearly nothing.
All those show:
[153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
xfs_vm_releasepage+0xe2/0xf0()
Stefan
>
> -Dave.
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-23 13:28 ` Stefan Priebe - Profihost AG
@ 2016-03-23 14:07 ` Brian Foster
2016-03-24 8:10 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 22+ messages in thread
From: Brian Foster @ 2016-03-23 14:07 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Dave Chinner, linux-fsdevel, xfs-masters@oss.sgi.com,
xfs@oss.sgi.com
On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> sorry new one the last one got mangled. Comments inside.
>
> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> > On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> >> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> >>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
> >>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
> >>>>>>
> >>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
> >>>>>>>
> >>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
...
>
> This has happened again on 8 different hosts in the last 24 hours
> running 4.4.6.
>
> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
> OS stuff as the VMs have remote storage. So no database, no rsync on
> those hosts - just the OS doing nearly nothing.
>
> All those show:
> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
> xfs_vm_releasepage+0xe2/0xf0()
>
Ok, well at this point the warning isn't telling us anything beyond the
fact that you're reproducing the problem. We can't really make progress without
more information. We don't necessarily know what application or
operations caused this by the time it occurs, but perhaps knowing what
file is affected could give us a hint.
We have the xfs_releasepage tracepoint, but that's unconditional and so
might generate a lot of noise by default. Could you enable the
xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
E.g., we could leave a long running 'trace-cmd record -e
"xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
-e "xfs:xfs_releasepage"' and leave something like 'cat
/sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
~/trace.out' running to capture instances.
If we can get a tracepoint hit, it will include the inode number and
something like 'find / -inum <ino>' can point us at the file.
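To put it all together, something like the following should work as a
long-running capture (a rough sketch only; it assumes trace-cmd is
installed and debugfs is mounted at /sys/kernel/debug):
# trace-cmd start -e "xfs:xfs_releasepage"
# cat /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" > ~/trace.out &
Once ~/trace.out catches an entry, take the "ino 0x..." value from it
(0x69 here is just a placeholder) and map it back to a path; find wants
the inode number in decimal, and -xdev keeps it on one filesystem:
# find / -xdev -inum $((0x69))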
Brian
> Stefan
>
> >
> > -Dave.
> >
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-23 14:07 ` Brian Foster
@ 2016-03-24 8:10 ` Stefan Priebe - Profihost AG
2016-03-24 8:15 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 22+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-03-24 8:10 UTC (permalink / raw)
To: Brian Foster
Cc: Dave Chinner, linux-fsdevel, xfs-masters@oss.sgi.com,
xfs@oss.sgi.com
On 23.03.2016 at 15:07, Brian Foster wrote:
> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>> sorry new one the last one got mangled. Comments inside.
>>
>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>
>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>
>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> ...
>>
>> This has happened again on 8 different hosts in the last 24 hours
>> running 4.4.6.
>>
>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>> OS stuff as the VMs have remote storage. So no database, no rsync on
>> those hosts - just the OS doing nearly nothing.
>>
>> All those show:
>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>> xfs_vm_releasepage+0xe2/0xf0()
>>
>
> Ok, well at this point the warning isn't telling us anything beyond
> you're reproducing the problem. We can't really make progress without
> more information. We don't necessarily know what application or
> operations caused this by the time it occurs, but perhaps knowing what
> file is affected could give us a hint.
>
> We have the xfs_releasepage tracepoint, but that's unconditional and so
> might generate a lot of noise by default. Could you enable the
> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
> E.g., we could leave a long running 'trace-cmd record -e
> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
> -e "xfs:xfs_releasepage"' and leave something like 'cat
> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
> ~/trace.out' running to capture instances.
>
> If we can get a tracepoint hit, it will include the inode number and
> something like 'find / -inum <ino>' can point us at the file.
Thanks - I need to compile trace-cmd first. Do you know if and how it
influences performance?
Stefan
>
> Brian
>
>> Stefan
>>
>>>
>>> -Dave.
>>>
>>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-24 8:10 ` Stefan Priebe - Profihost AG
@ 2016-03-24 8:15 ` Stefan Priebe - Profihost AG
2016-03-24 11:17 ` Brian Foster
0 siblings, 1 reply; 22+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-03-24 8:15 UTC (permalink / raw)
To: Brian Foster
Cc: Dave Chinner, linux-fsdevel, xfs-masters@oss.sgi.com,
xfs@oss.sgi.com
On 24.03.2016 at 09:10, Stefan Priebe - Profihost AG wrote:
>
> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>> sorry new one the last one got mangled. Comments inside.
>>>
>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>
>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>> ...
>>>
>>> This has happened again on 8 different hosts in the last 24 hours
>>> running 4.4.6.
>>>
>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>> those hosts - just the OS doing nearly nothing.
>>>
>>> All those show:
>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>> xfs_vm_releasepage+0xe2/0xf0()
>>>
>>
>> Ok, well at this point the warning isn't telling us anything beyond
>> you're reproducing the problem. We can't really make progress without
>> more information. We don't necessarily know what application or
>> operations caused this by the time it occurs, but perhaps knowing what
>> file is affected could give us a hint.
>>
>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>> might generate a lot of noise by default. Could you enable the
>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>> E.g., we could leave a long running 'trace-cmd record -e
>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>> ~/trace.out' running to capture instances.
Isn't the trace a WARN_ONCE? So it does not reoccur - or can I check it
in the trace.out even after the WARN_ONCE has already triggered?
Stefan
>
> Stefan
>
>>
>> Brian
>>
>>> Stefan
>>>
>>>>
>>>> -Dave.
>>>>
>>>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-24 8:15 ` Stefan Priebe - Profihost AG
@ 2016-03-24 11:17 ` Brian Foster
2016-03-24 12:17 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 22+ messages in thread
From: Brian Foster @ 2016-03-24 11:17 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>
> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
> >
> > Am 23.03.2016 um 15:07 schrieb Brian Foster:
> >> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> >>> sorry new one the last one got mangled. Comments inside.
> >>>
> >>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> >>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> >>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> >>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
> >>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
> >>>>>>>>>
> >>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
> >>>>>>>>>>
> >>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >> ...
> >>>
> >>> This has happened again on 8 different hosts in the last 24 hours
> >>> running 4.4.6.
> >>>
> >>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
> >>> OS stuff as the VMs have remote storage. So no database, no rsync on
> >>> those hosts - just the OS doing nearly nothing.
> >>>
> >>> All those show:
> >>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
> >>> xfs_vm_releasepage+0xe2/0xf0()
> >>>
> >>
> >> Ok, well at this point the warning isn't telling us anything beyond
> >> you're reproducing the problem. We can't really make progress without
> >> more information. We don't necessarily know what application or
> >> operations caused this by the time it occurs, but perhaps knowing what
> >> file is affected could give us a hint.
> >>
> >> We have the xfs_releasepage tracepoint, but that's unconditional and so
> >> might generate a lot of noise by default. Could you enable the
> >> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
> >> E.g., we could leave a long running 'trace-cmd record -e
> >> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
> >> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
> >> -e "xfs:xfs_releasepage"' and leave something like 'cat
> >> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
> >> ~/trace.out' running to capture instances.
>
> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
> it in the trace.out even the WARN_ONCE was already triggered?
>
The tracepoint is independent from the warning (see
xfs_vm_releasepage()), so the tracepoint will fire every invocation of
the function regardless of whether delalloc blocks still exist at that
point. That creates the need to filter the entries.
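(As an alternative to filtering with grep, ftrace can drop the
uninteresting entries in the kernel before they ever reach trace_pipe.
An untested sketch, assuming the event's delalloc field is filterable:
# trace-cmd start -e "xfs:xfs_releasepage"
# echo 'delalloc != 0' > /sys/kernel/debug/tracing/events/xfs/xfs_releasepage/filter
Writing '0' to the same filter file clears it again.)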
With regard to performance, I believe the tracepoints are intended to be
pretty lightweight. I don't think it should hurt to try it on a box,
observe for a bit and make sure there isn't a huge impact. Note that the
'trace-cmd record' approach will save everything to file, so that's
something to consider I suppose.
Brian
> Stefan
>
>
> >
> > Stefan
> >
> >>
> >> Brian
> >>
> >>> Stefan
> >>>
> >>>>
> >>>> -Dave.
> >>>>
> >>>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-24 11:17 ` Brian Foster
@ 2016-03-24 12:17 ` Stefan Priebe - Profihost AG
2016-03-24 12:24 ` Brian Foster
0 siblings, 1 reply; 22+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-03-24 12:17 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On 24.03.2016 at 12:17, Brian Foster wrote:
> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>>
>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
>>>
>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>> sorry new one the last one got mangled. Comments inside.
>>>>>
>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>
>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>> ...
>>>>>
>>>>> This has happened again on 8 different hosts in the last 24 hours
>>>>> running 4.4.6.
>>>>>
>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>>>> those hosts - just the OS doing nearly nothing.
>>>>>
>>>>> All those show:
>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>>>> xfs_vm_releasepage+0xe2/0xf0()
>>>>>
>>>>
>>>> Ok, well at this point the warning isn't telling us anything beyond
>>>> you're reproducing the problem. We can't really make progress without
>>>> more information. We don't necessarily know what application or
>>>> operations caused this by the time it occurs, but perhaps knowing what
>>>> file is affected could give us a hint.
>>>>
>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>>>> might generate a lot of noise by default. Could you enable the
>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>>>> E.g., we could leave a long running 'trace-cmd record -e
>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>>>> ~/trace.out' running to capture instances.
>>
>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
>> it in the trace.out even the WARN_ONCE was already triggered?
>>
>
> The tracepoint is independent from the warning (see
> xfs_vm_releasepage()), so the tracepoint will fire every invocation of
> the function regardless of whether delalloc blocks still exist at that
> point. That creates the need to filter the entries.
>
> With regard to performance, I believe the tracepoints are intended to be
> pretty lightweight. I don't think it should hurt to try it on a box,
> observe for a bit and make sure there isn't a huge impact. Note that the
> 'trace-cmd record' approach will save everything to file, so that's
> something to consider I suppose.
The test / cat is running. Is there any way to verify that it works? Or
is it enough that cat prints entries from time to time, none of which
get past the grep -v "delalloc 0" filter?
Stefan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-24 12:17 ` Stefan Priebe - Profihost AG
@ 2016-03-24 12:24 ` Brian Foster
2016-04-04 6:12 ` Stefan Priebe - Profihost AG
2016-05-11 12:26 ` Stefan Priebe - Profihost AG
0 siblings, 2 replies; 22+ messages in thread
From: Brian Foster @ 2016-03-24 12:24 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
>
> Am 24.03.2016 um 12:17 schrieb Brian Foster:
> > On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
> >>
> >> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
> >>>
> >>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
> >>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>> sorry new one the last one got mangled. Comments inside.
> >>>>>
> >>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> >>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> >>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> >>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
> >>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
> >>>>>>>>>>>
> >>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >>>> ...
> >>>>>
> >>>>> This has happened again on 8 different hosts in the last 24 hours
> >>>>> running 4.4.6.
> >>>>>
> >>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
> >>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
> >>>>> those hosts - just the OS doing nearly nothing.
> >>>>>
> >>>>> All those show:
> >>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
> >>>>> xfs_vm_releasepage+0xe2/0xf0()
> >>>>>
> >>>>
> >>>> Ok, well at this point the warning isn't telling us anything beyond
> >>>> you're reproducing the problem. We can't really make progress without
> >>>> more information. We don't necessarily know what application or
> >>>> operations caused this by the time it occurs, but perhaps knowing what
> >>>> file is affected could give us a hint.
> >>>>
> >>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
> >>>> might generate a lot of noise by default. Could you enable the
> >>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
> >>>> E.g., we could leave a long running 'trace-cmd record -e
> >>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
> >>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
> >>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
> >>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
> >>>> ~/trace.out' running to capture instances.
> >>
> >> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
> >> it in the trace.out even the WARN_ONCE was already triggered?
> >>
> >
> > The tracepoint is independent from the warning (see
> > xfs_vm_releasepage()), so the tracepoint will fire every invocation of
> > the function regardless of whether delalloc blocks still exist at that
> > point. That creates the need to filter the entries.
> >
> > With regard to performance, I believe the tracepoints are intended to be
> > pretty lightweight. I don't think it should hurt to try it on a box,
> > observe for a bit and make sure there isn't a huge impact. Note that the
> > 'trace-cmd record' approach will save everything to file, so that's
> > something to consider I suppose.
>
> Tests / cat is running. Is there any way to test if it works? Or is it
> enough that cat prints stuff from time to time but does not match -v
> delalloc 0
>
What is it printing where delalloc != 0? You could always just cat
trace_pipe and make sure the event is firing, it's just that I suspect
most entries will have delalloc == unwritten == 0.
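A quick sanity check that the event fires at all would be to write and
then remove a scratch file on XFS while watching the pipe (a sketch;
/mnt/tracetest is just an example path):
# cat /sys/kernel/debug/tracing/trace_pipe &
# dd if=/dev/zero of=/mnt/tracetest bs=4k count=256
# rm /mnt/tracetest
The rm should show up as xfs_releasepage entries, normally with
"delalloc 0 unwritten 0".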
Also, while the tracepoint fires independent of the warning, it might
not be a bad idea to restart a system that has already seen the warning
since boot, just to provide some correlation or additional notification
when the problem occurs.
Brian
> Stefan
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-24 12:24 ` Brian Foster
@ 2016-04-04 6:12 ` Stefan Priebe - Profihost AG
2016-05-11 12:26 ` Stefan Priebe - Profihost AG
1 sibling, 0 replies; 22+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-04-04 6:12 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On 24.03.2016 at 13:24, Brian Foster wrote:
> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
>>
>> Am 24.03.2016 um 12:17 schrieb Brian Foster:
>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
>>>>>
>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>> sorry new one the last one got mangled. Comments inside.
>>>>>>>
>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>> ...
>>>>>>>
>>>>>>> This has happened again on 8 different hosts in the last 24 hours
>>>>>>> running 4.4.6.
>>>>>>>
>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>>>>>> those hosts - just the OS doing nearly nothing.
>>>>>>>
>>>>>>> All those show:
>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>>>>>> xfs_vm_releasepage+0xe2/0xf0()
>>>>>>>
>>>>>>
>>>>>> Ok, well at this point the warning isn't telling us anything beyond
>>>>>> you're reproducing the problem. We can't really make progress without
>>>>>> more information. We don't necessarily know what application or
>>>>>> operations caused this by the time it occurs, but perhaps knowing what
>>>>>> file is affected could give us a hint.
>>>>>>
>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>>>>>> might generate a lot of noise by default. Could you enable the
>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>>>>>> E.g., we could leave a long running 'trace-cmd record -e
>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>>>>>> ~/trace.out' running to capture instances.
>>>>
>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
>>>> it in the trace.out even the WARN_ONCE was already triggered?
>>>>
>>>
>>> The tracepoint is independent from the warning (see
>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of
>>> the function regardless of whether delalloc blocks still exist at that
>>> point. That creates the need to filter the entries.
>>>
>>> With regard to performance, I believe the tracepoints are intended to be
>>> pretty lightweight. I don't think it should hurt to try it on a box,
>>> observe for a bit and make sure there isn't a huge impact. Note that the
>>> 'trace-cmd record' approach will save everything to file, so that's
>>> something to consider I suppose.
>>
>> Tests / cat is running. Is there any way to test if it works? Or is it
>> enough that cat prints stuff from time to time but does not match -v
>> delalloc 0
>>
>
> What is it printing where delalloc != 0? You could always just cat
> trace_pipe and make sure the event is firing, it's just that I suspect
> most entries will have delalloc == unwritten == 0.
>
> Also, while the tracepoint fires independent of the warning, it might
> not be a bad idea to restart a system that has already seen the warning
> since boot, just to provide some correlation or additional notification
> when the problem occurs.
I still wasn't able to catch one with trace-cmd. But I notice that it
happens mostly in the first 48 hours after a reboot. All systems have
been running for some days now, but none has triggered this again. All
systems that had triggered this bug were rebooted.
Stefan
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-03-24 12:24 ` Brian Foster
2016-04-04 6:12 ` Stefan Priebe - Profihost AG
@ 2016-05-11 12:26 ` Stefan Priebe - Profihost AG
2016-05-11 13:34 ` Brian Foster
1 sibling, 1 reply; 22+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-05-11 12:26 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
Hi Brian,
i'm still unable to grab anything to the trace file? Is there anything
to check if it's working at all?
This still happens in the first 48 hours after a fresh reboot.
Stefan
On 24.03.2016 at 13:24, Brian Foster wrote:
> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
>>
>> Am 24.03.2016 um 12:17 schrieb Brian Foster:
>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
>>>>>
>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>> sorry new one the last one got mangled. Comments inside.
>>>>>>>
>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>> ...
>>>>>>>
>>>>>>> This has happened again on 8 different hosts in the last 24 hours
>>>>>>> running 4.4.6.
>>>>>>>
>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>>>>>> those hosts - just the OS doing nearly nothing.
>>>>>>>
>>>>>>> All those show:
>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>>>>>> xfs_vm_releasepage+0xe2/0xf0()
>>>>>>>
>>>>>>
>>>>>> Ok, well at this point the warning isn't telling us anything beyond
>>>>>> you're reproducing the problem. We can't really make progress without
>>>>>> more information. We don't necessarily know what application or
>>>>>> operations caused this by the time it occurs, but perhaps knowing what
>>>>>> file is affected could give us a hint.
>>>>>>
>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>>>>>> might generate a lot of noise by default. Could you enable the
>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>>>>>> E.g., we could leave a long running 'trace-cmd record -e
>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>>>>>> ~/trace.out' running to capture instances.
>>>>
>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
>>>> it in the trace.out even the WARN_ONCE was already triggered?
>>>>
>>>
>>> The tracepoint is independent from the warning (see
>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of
>>> the function regardless of whether delalloc blocks still exist at that
>>> point. That creates the need to filter the entries.
>>>
>>> With regard to performance, I believe the tracepoints are intended to be
>>> pretty lightweight. I don't think it should hurt to try it on a box,
>>> observe for a bit and make sure there isn't a huge impact. Note that the
>>> 'trace-cmd record' approach will save everything to file, so that's
>>> something to consider I suppose.
>>
>> Tests / cat is running. Is there any way to test if it works? Or is it
>> enough that cat prints stuff from time to time but does not match -v
>> delalloc 0
>>
>
> What is it printing where delalloc != 0? You could always just cat
> trace_pipe and make sure the event is firing, it's just that I suspect
> most entries will have delalloc == unwritten == 0.
>
> Also, while the tracepoint fires independent of the warning, it might
> not be a bad idea to restart a system that has already seen the warning
> since boot, just to provide some correlation or additional notification
> when the problem occurs.
>
> Brian
>
>> Stefan
>>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-05-11 12:26 ` Stefan Priebe - Profihost AG
@ 2016-05-11 13:34 ` Brian Foster
2016-05-11 14:03 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 22+ messages in thread
From: Brian Foster @ 2016-05-11 13:34 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
> Hi Brian,
>
> i'm still unable to grab anything to the trace file? Is there anything
> to check if it's working at all?
>
See my previous mail:
http://oss.sgi.com/pipermail/xfs/2016-March/047793.html
E.g., something like this should work after writing to and removing a
new file:
# trace-cmd start -e "xfs:xfs_releasepage"
# cat /sys/kernel/debug/tracing/trace_pipe
...
rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0
Once that is working, add the grep command to filter out "delalloc 0"
instances, etc. For example:
cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out
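And since the problem seems to hit within the first day or two after a
reboot, it's probably worth starting the capture automatically at boot,
e.g. from an init script. A minimal sketch (paths and the log location
are just examples):
# trace-cmd start -e "xfs:xfs_releasepage"
# grep --line-buffered -v "delalloc 0" /sys/kernel/debug/tracing/trace_pipe > /root/trace.out &
The --line-buffered flag makes grep flush each matching line to the
output file as it arrives.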
Brian
> This still happens in the first 48 hours after a fresh reboot.
>
> Stefan
>
> Am 24.03.2016 um 13:24 schrieb Brian Foster:
> > On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
> >>
> >> Am 24.03.2016 um 12:17 schrieb Brian Foster:
> >>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>
> >>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
> >>>>>
> >>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
> >>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> >>>>>>> sorry new one the last one got mangled. Comments inside.
> >>>>>>>
> >>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> >>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> >>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> >>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
> >>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >>>>>> ...
> >>>>>>>
> >>>>>>> This has happened again on 8 different hosts in the last 24 hours
> >>>>>>> running 4.4.6.
> >>>>>>>
> >>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
> >>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
> >>>>>>> those hosts - just the OS doing nearly nothing.
> >>>>>>>
> >>>>>>> All those show:
> >>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
> >>>>>>> xfs_vm_releasepage+0xe2/0xf0()
> >>>>>>>
> >>>>>>
> >>>>>> Ok, well at this point the warning isn't telling us anything beyond
> >>>>>> you're reproducing the problem. We can't really make progress without
> >>>>>> more information. We don't necessarily know what application or
> >>>>>> operations caused this by the time it occurs, but perhaps knowing what
> >>>>>> file is affected could give us a hint.
> >>>>>>
> >>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
> >>>>>> might generate a lot of noise by default. Could you enable the
> >>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
> >>>>>> E.g., we could leave a long running 'trace-cmd record -e
> >>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
> >>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
> >>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
> >>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
> >>>>>> ~/trace.out' running to capture instances.
> >>>>
> >>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
> >>>> it in the trace.out even the WARN_ONCE was already triggered?
> >>>>
> >>>
> >>> The tracepoint is independent from the warning (see
> >>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of
> >>> the function regardless of whether delalloc blocks still exist at that
> >>> point. That creates the need to filter the entries.
> >>>
> >>> With regard to performance, I believe the tracepoints are intended to be
> >>> pretty lightweight. I don't think it should hurt to try it on a box,
> >>> observe for a bit and make sure there isn't a huge impact. Note that the
> >>> 'trace-cmd record' approach will save everything to file, so that's
> >>> something to consider I suppose.
> >>
> >> Tests / cat is running. Is there any way to test if it works? Or is it
> >> enough that cat prints stuff from time to time but does not match -v
> >> delalloc 0
> >>
> >
> > What is it printing where delalloc != 0? You could always just cat
> > trace_pipe and make sure the event is firing, it's just that I suspect
> > most entries will have delalloc == unwritten == 0.
> >
> > Also, while the tracepoint fires independent of the warning, it might
> > not be a bad idea to restart a system that has already seen the warning
> > since boot, just to provide some correlation or additional notification
> > when the problem occurs.
> >
> > Brian
> >
> >> Stefan
> >>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage
2016-05-11 13:34 ` Brian Foster
@ 2016-05-11 14:03 ` Stefan Priebe - Profihost AG
0 siblings, 0 replies; 22+ messages in thread
From: Stefan Priebe - Profihost AG @ 2016-05-11 14:03 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-fsdevel, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On 11.05.2016 at 15:34, Brian Foster wrote:
> On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
>> Hi Brian,
>>
>> i'm still unable to grab anything to the trace file? Is there anything
>> to check if it's working at all?
>>
>
> See my previous mail:
>
> http://oss.sgi.com/pipermail/xfs/2016-March/047793.html
>
> E.g., something like this should work after writing to and removing a
> new file:
>
> # trace-cmd start -e "xfs:xfs_releasepage"
> # cat /sys/kernel/debug/tracing/trace_pipe
> ...
> rm-8198 [000] .... 9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0
Argh, sorry - yes, that's working, but delalloc is always 0.
Maybe I have to hook that into my initramfs to be fast enough?
Stefan
> Once that is working, add the grep command to filter out "delalloc 0"
> instances, etc. For example:
>
> cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out
>
> Brian
>
>> This still happens in the first 48 hours after a fresh reboot.
>>
>> Stefan
>>
>> Am 24.03.2016 um 13:24 schrieb Brian Foster:
>>> On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 24.03.2016 um 12:17 schrieb Brian Foster:
>>>>> On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
>>>>>>>
>>>>>>> Am 23.03.2016 um 15:07 schrieb Brian Foster:
>>>>>>>> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
>>>>>>>>> sorry new one the last one got mangled. Comments inside.
>>>>>>>>>
>>>>>>>>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
>>>>>>>>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
>>>>>>>>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
>>>>>>>>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
>>>>>>>>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@redhat.com>:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> This has happened again on 8 different hosts in the last 24 hours
>>>>>>>>> running 4.4.6.
>>>>>>>>>
>>>>>>>>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
>>>>>>>>> OS stuff as the VMs have remote storage. So no database, no rsync on
>>>>>>>>> those hosts - just the OS doing nearly nothing.
>>>>>>>>>
>>>>>>>>> All those show:
>>>>>>>>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
>>>>>>>>> xfs_vm_releasepage+0xe2/0xf0()
>>>>>>>>>
>>>>>>>>
>>>>>>>> Ok, well at this point the warning isn't telling us anything beyond
>>>>>>>> you're reproducing the problem. We can't really make progress without
>>>>>>>> more information. We don't necessarily know what application or
>>>>>>>> operations caused this by the time it occurs, but perhaps knowing what
>>>>>>>> file is affected could give us a hint.
>>>>>>>>
>>>>>>>> We have the xfs_releasepage tracepoint, but that's unconditional and so
>>>>>>>> might generate a lot of noise by default. Could you enable the
>>>>>>>> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
>>>>>>>> E.g., we could leave a long running 'trace-cmd record -e
>>>>>>>> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
>>>>>>>> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
>>>>>>>> -e "xfs:xfs_releasepage"' and leave something like 'cat
>>>>>>>> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
>>>>>>>> ~/trace.out' running to capture instances.
>>>>>>
>>>>>> Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
>>>>>> it in the trace.out even the WARN_ONCE was already triggered?
>>>>>>
>>>>>
>>>>> The tracepoint is independent from the warning (see
>>>>> xfs_vm_releasepage()), so the tracepoint will fire every invocation of
>>>>> the function regardless of whether delalloc blocks still exist at that
>>>>> point. That creates the need to filter the entries.
>>>>>
>>>>> With regard to performance, I believe the tracepoints are intended to be
>>>>> pretty lightweight. I don't think it should hurt to try it on a box,
>>>>> observe for a bit and make sure there isn't a huge impact. Note that the
>>>>> 'trace-cmd record' approach will save everything to file, so that's
>>>>> something to consider I suppose.
>>>>
>>>> Tests / cat is running. Is there any way to test if it works? Or is it
>>>> enough that cat prints stuff from time to time but does not match -v
>>>> delalloc 0
>>>>
>>>
>>> What is it printing where delalloc != 0? You could always just cat
>>> trace_pipe and make sure the event is firing, it's just that I suspect
>>> most entries will have delalloc == unwritten == 0.
>>>
>>> Also, while the tracepoint fires independent of the warning, it might
>>> not be a bad idea to restart a system that has already seen the warning
>>> since boot, just to provide some correlation or additional notification
>>> when the problem occurs.
>>>
>>> Brian
>>>
>>>> Stefan
>>>>
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread [~2016-05-11 14:03 UTC]
Thread overview: 22+ messages
2016-02-20 8:02 xfs trace in 4.4.2 Stefan Priebe
2016-02-20 14:45 ` Brian Foster
2016-02-20 18:02 ` Stefan Priebe - Profihost AG
2016-03-04 18:47 ` xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage Stefan Priebe
2016-03-04 19:13 ` Brian Foster
2016-03-04 20:02 ` Stefan Priebe
2016-03-04 21:03 ` Brian Foster
2016-03-04 21:15 ` Stefan Priebe
2016-03-05 22:48 ` Dave Chinner
2016-03-05 22:58 ` Stefan Priebe
2016-03-23 13:26 ` Stefan Priebe - Profihost AG
2016-03-23 13:28 ` Stefan Priebe - Profihost AG
2016-03-23 14:07 ` Brian Foster
2016-03-24 8:10 ` Stefan Priebe - Profihost AG
2016-03-24 8:15 ` Stefan Priebe - Profihost AG
2016-03-24 11:17 ` Brian Foster
2016-03-24 12:17 ` Stefan Priebe - Profihost AG
2016-03-24 12:24 ` Brian Foster
2016-04-04 6:12 ` Stefan Priebe - Profihost AG
2016-05-11 12:26 ` Stefan Priebe - Profihost AG
2016-05-11 13:34 ` Brian Foster
2016-05-11 14:03 ` Stefan Priebe - Profihost AG